Re: [Qemu-devel] [PATCH] pci: change typename of q35 to pci-q35
On Thu, May 30, 2013 at 10:47:48AM +0800, Amos Kong wrote: In QEMU, we set the firmware name of pc-i440fx is 'pci', and set the firmware name of q35 is 'q35-pcihost'. The seabios pattern matching code isn't sophisticated, and we want to match those two devices with exact pattern. This patch changes the typename of q35 to 'pci-q35', all pci devices use same prefix 'pci' in fireware name. And I will change pattern in seabios to /pci*@i0cf8. Signed-off-by: Amos Kong ak...@redhat.com Let's just not leak name to the guest - there's no reason to. Won't the following address the issue without bios changes? Maybe we should find some way to set fw_name in hw/pci/pci_host.c, worth looking into. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c index 8467f86..24df6b5 100644 --- a/hw/pci-host/q35.c +++ b/hw/pci-host/q35.c @@ -76,6 +76,7 @@ static void q35_host_class_init(ObjectClass *klass, void *data) k-init = q35_host_init; dc-props = mch_props; +dc-fw_name = pci; } static void q35_host_initfn(Object *obj)
Re: [Qemu-devel] [SeaBIOS] KVM call agenda for 2013-05-28
Hi, Raised that QOM interface should be sufficient. Agree on this one. Ideally the acpi table generation code should be able to gather all information it needs from the qom tree, so it can be a standalone C file instead of being scattered over all qemu. Ack. So my basic argument is why not expose the QOM interfaces to firmware and move the generation code there? Seems like it would be more or less a copy/paste once we had a proper implementation in QEMU. Well, no. Firmware is a quite simple environment without standard libc etc, so moving code from qemu to firmware certainly isn't as easy as copying over a file. There were discussions on potentially introducing a middle component to generate the tables. Coreboot was raised as a possibility, and David thought it would be okay to use coreboot for both OVMF and SeaBIOS. Certainly an option, but that is a long-term project. Out of curiousity, are there other benefits to using coreboot as a core firmware in QEMU? Short-term it's alot of work as we have to bring coreboot's qemu support to feature parity with seabios. I suspect most of this is acpi related though, so when qemu provides the tables and coreboot uses them we could be pretty close already. Long-term it should simplify firmware maintainance as we have only *one* place which handles the hardware bringup, and seabios/ovmf have less work to do. Is there a payload we would ever plausibly use besides OVMF and SeaBIOS? I wouldn't be surprised if people start using other coreboot payloads and/or features such as direct linux kernel boot once it works well on qemu. We might even run qemu test suites as coreboot payload. cheers, Gerd
Re: [Qemu-devel] updated: kvm networking todo wiki
On Thu, May 30, 2013 at 7:23 AM, Rusty Russell ru...@rustcorp.com.au wrote: Anthony Liguori anth...@codemonkey.ws writes: Rusty Russell ru...@rustcorp.com.au writes: On Fri, May 24, 2013 at 08:47:58AM -0500, Anthony Liguori wrote: FWIW, I think what's more interesting is using vhost-net as a networking backend with virtio-net in QEMU being what's guest facing. In theory, this gives you the best of both worlds: QEMU acts as a first line of defense against a malicious guest while still getting the performance advantages of vhost-net (zero-copy). It would be an interesting idea if we didn't already have the vhost model where we don't need the userspace bounce. The model is very interesting for QEMU because then we can use vhost as a backend for other types of network adapters (like vmxnet3 or even e1000). It also helps for things like fault tolerance where we need to be able to control packet flow within QEMU. (CC's reduced, context added, Dmitry Fleytman added for vmxnet3 thoughts). Then I'm really confused as to what this would look like. A zero copy sendmsg? We should be able to implement that today. On the receive side, what can we do better than readv? If we need to return to userspace to tell the guest that we've got a new packet, we don't win on latency. We might reduce syscall overhead with a multi-dimensional readv to read multiple packets at once? Sounds like recvmmsg(2). Stefan
Re: [Qemu-devel] KVM call agenda for 2013-05-28
Hi, Why should this be true? Shouldn't we be allowed to increase the amount of memory the guest has across reboots? That's equivalent to adding another DIMM after power off. poweroff is equivalent to exiting qemu, not to guest reset. Not generating tables on reset does limit what we can do in a pretty fundamental way. Even if you can argue it in the short term, I don't think it's viable in the long term. I don't think so. The procedure for adding/removing non-hotpluggable hardware is: poweroff, plugin/-out hardware (change config in qemu), boot. Hotpluggable hardware doesn't need acpi table updates. cheers, Gerd
Re: [Qemu-devel] snabbswitch integration with QEMU for userspace ethernet I/O
On Wed, May 29, 2013 at 6:02 PM, Julian Stecklina jstec...@os.inf.tu-dresden.de wrote: On 05/29/2013 04:21 PM, Stefan Hajnoczi wrote: The fact that a single switch process has shared memory access to all guests' RAM is critical. If the switch process is exploited, then that exposes other guests' data! (Think of a multi-tenant host with guests belonging to different users.) True. But people don't mind having instruction decoding and half of virtio in the kernel these days, so it can't be that security critical... No, it's still security critical. If there were equivalent solutions with better security then I'm sure people would accept them. It's just that there isn't an equivalent solution yet :). Stefan
Re: [Qemu-devel] snabbswitch integration with QEMU for userspace ethernet I/O
On Thu, May 30, 2013 at 08:46:42AM +0200, Stefan Hajnoczi wrote: On Wed, May 29, 2013 at 6:02 PM, Julian Stecklina jstec...@os.inf.tu-dresden.de wrote: On 05/29/2013 04:21 PM, Stefan Hajnoczi wrote: The fact that a single switch process has shared memory access to all guests' RAM is critical. If the switch process is exploited, then that exposes other guests' data! (Think of a multi-tenant host with guests belonging to different users.) True. But people don't mind having instruction decoding and half of virtio in the kernel these days, so it can't be that security critical... No, it's still security critical. If there were equivalent solutions with better security then I'm sure people would accept them. It's just that there isn't an equivalent solution yet :). Stefan Some people would accept them. Others run with selinux off ... -- MST
Re: [Qemu-devel] [PATCH v2] monitor: work around delayed CHR_EVENT_OPENED events
Am 30.05.2013 00:29, schrieb mdroth: On Wed, May 29, 2013 at 01:27:33PM -0400, Luiz Capitulino wrote: On Mon, 27 May 2013 12:59:25 -0500 mdroth mdr...@linux.vnet.ibm.com wrote: On Mon, May 27, 2013 at 11:16:01AM -0500, Anthony Liguori wrote: Luiz Capitulino lcapitul...@redhat.com writes: On Sun, 26 May 2013 10:33:39 -0500 Michael Roth mdr...@linux.vnet.ibm.com wrote: In the past, CHR_EVENT_OPENED events were emitted via a pre-expired QEMUTimer. Due to timers being processing at the tail end of each main loop iteration, this generally meant that such events would be emitted within the same main loop iteration, prior any client data being read by tcp/unix socket server backends. With 9f939df955a4152aad69a19a77e0898631bb2c18, this was switched to a bottom-half that is registered via g_idle_add(). This makes it likely that the event won't be sent until a subsequent iteration, and also adds the possibility that such events will race with the processing of client data. In cases where we rely on the CHR_EVENT_OPENED event being delivered prior to any client data being read, this may cause unexpected behavior. In the case of a QMP monitor session using a socket backend, the delayed processing of the CHR_EVENT_OPENED event can lead to a situation where a previous session, where 'qmp_capabilities' has already processed, can cause the rejection of 'qmp_capabilities' for a subsequent session, since we reset capabilities in response to CHR_EVENT_OPENED, which may not yet have been delivered. This can be reproduced with the following command, generally within 50 or so iterations: mdroth@loki:~$ cat cmds {'execute':'qmp_capabilities'} {'execute':'query-cpus'} mdroth@loki:~$ while true; do if socat stdio unix-connect:/tmp/qmp.sock cmds | grep CommandNotFound /dev/null; then echo failed; break; else echo ok; fi; done; ok ok failed mdroth@loki:~$ As a work-around, we reset capabilities as part of the CHR_EVENT_CLOSED hook, which gets called as part of the GIOChannel cb associated with the client rather than a bottom-half, and is thus guaranteed to be delivered prior to accepting any subsequent client sessions. This does not address the more general problem of whether or not there are chardev users that rely on CHR_EVENT_OPENED being delivered prior to client data, and whether or not we should consider moving CHR_EVENT_OPENED in-band with connection establishment as a general solution, but fixes QMP for the time being. Reported-by: Stefan Priebe s.pri...@profihost.ag Cc: qemu-sta...@nongnu.org Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com Thanks for debugging this Michael. I'm going to apply your patch to the qmp branch because we can't let this broken (esp. in -stable) but this is papering over the real bug... Do we really need OPENED to happen in a bottom half? Shouldn't the open event handlers register the callback instead if they need it? I think that's the more general fix, but wasn't sure if there were historical reasons why we didn't do this in the first place. Looking at it more closely it does seem like we need a general fix sooner rather than later though, and I don't see any reason why we can't move BH registration into the actual OPENED hooks as you suggest: hw/usb/ccid-card-passthru.c - currently affected? no debug hook only - needs OPENED BH? no hw/usb/redirect.c - currently affected? yes can_read handler checks for dev-parser != NULL, which may be true if CLOSED BH has been executed yet. In the past, OPENED quiesced outstanding CLOSED events prior to us reading client data. - need OPENED BH? possibly seems to only be needed by CLOSED events, which can be issued by USBDevice, so we do teardown/usb_detach in BH. For OPENED, this may not apply. if we do issue a BH, we'd need to modify can_read handler to avoid race. hw/usb/dev-serial.c - currently affected? no can_read checks for dev.attached != NULL. set to NULL synchronously in CLOSED, will not accept reads until OPENED gets called and sets dev.attached - need opened BH? no hw/char/sclpconsole.c - currently affected? no guarded by can_read handler - need opened BH? no hw/char/ipoctal232.c - currently affected? no debug hook only - need opened BH? no hw/char/virtio-console.c - currently affected? no OPENED/CLOSED map to vq events handled asynchronously. can_read checks for guest_connected state rather than anything set by OPENED - need opened BH? no qtest.c - currently affected? yes can_read handler doesn't check for qtest_opened == true, can read data before OPENED event is processed - need opened BH? no monitor.c - current affected? yes negotiated session state can temporarilly leak into a new session - need opened BH? no gdbstub.c - currently affected? yes can fail to pause machine prior to starting gdb
Re: [Qemu-devel] [snabb-devel:308] Re: snabbswitch integration with QEMU for userspace ethernet I/O
On 30 May 2013 08:46, Stefan Hajnoczi stefa...@gmail.com wrote: No, it's still security critical. If there were equivalent solutions with better security then I'm sure people would accept them. It's just that there isn't an equivalent solution yet :). Security-wise this is where I would like to be in the long term: 1. Userspace switch running non-root. 2. Shared memory mappings with VMs for exactly the memory that will be used for packet buffers. (They could map it from me...) 3. IOMMU access to exactly the PCI devices under the switch's control. Again my perspective is pretty hardware-centric rather than kernel-centric i.e. I'm thinking more in terms of MMU/IOMMU than Unix system calls and permissions. Short-term agenda is to build something good enough that people will _want_ to spend the energy to make it highly secure (and highly portable, etc). So I run as root and use every trick in the sysfs book.
Re: [Qemu-devel] [PATCH v3 4/5] xilinx_devcfg: Zynq devcfg device model
On Thu, May 30, 2013 at 3:08 AM, Paolo Bonzini pbonz...@redhat.com wrote: Il 29/05/2013 19:04, Edgar E. Iglesias ha scritto: +for (i = 0; i R_MAX; ++i) { +RegisterInfo *r = s-regs_info[i]; + +*r = (RegisterInfo) { +.data = s-regs[i], +.data_size = sizeof(uint32_t), +.access = xilinx_devcfg_regs_info[i], +.debug = XILINX_DEVCFG_ERR_DEBUG, +.prefix = prefix, +.opaque = s, +}; +memory_region_init_io(r-mem, devcfg_reg_ops, r, devcfg-regs, 4); Hi Peter, Should we be putting r-access-name here instead of devcfg-regs? Yes, that's why I preferred to wrap the memory_region_init_io into an API that takes a RegisterInfo. :) ACK, You've convinced me :). Will be in v4 (pending outcome of discussion with Anthony RE decoding) Regards, Peter Paolo
Re: [Qemu-devel] [PATCH 3/3] ide-test: Add FLUSH CACHE test case
On Wed, May 29, 2013 at 01:34:06PM +0200, Kevin Wolf wrote: +/* Check registers */ +data = inb(IDE_BASE + reg_device); +g_assert_cmpint(data 0x10, ==, 0); assert_bit_clear() with a constant instead of the 0x10 magic number?
Re: [Qemu-devel] [PATCH 0/3] ide: Set BSY bit during FLUSH
On Wed, May 29, 2013 at 01:34:03PM +0200, Kevin Wolf wrote: The test case depends on the qemu-io series I sent yesterday. ([PATCH 00/16] Make qemu-io commands available in the monitor) Andreas Färber (1): ide: Set BSY bit during FLUSH Kevin Wolf (2): blkdebug: Add BLKDBG_FLUSH_TO_OS/DISK events ide-test: Add FLUSH CACHE test case block.c | 8 block/blkdebug.c | 3 +++ hw/ide/core.c | 3 +++ include/block/block.h | 3 +++ tests/ide-test.c | 40 5 files changed, 53 insertions(+), 4 deletions(-) Looks good. I really like the test case. Posted a minor comment on the test case. Stefan
[Qemu-devel] [QEMU PATCH v2] pci: set firmware name of q35 to pci
In QEMU, we set the firmware name of pc-i440fx to 'pci', and the typename 'q35-pcihost' is set to the firmware name of q35 by default. The seabios matches pci devices by /pci/@i0cf8, so q35 device could not be identified, seabios fails to adjust the boot priority of q35 devices. This patch sets firmware name of q35 to 'pci'. Signed-off-by: Michael S. Tsirkin m...@redhat.com Signed-off-by: Amos Kong ak...@redhat.com --- v2: just set the fw_name, no need to change seabios --- hw/pci-host/q35.c | 1 + 1 file changed, 1 insertion(+) diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c index 8467f86..24df6b5 100644 --- a/hw/pci-host/q35.c +++ b/hw/pci-host/q35.c @@ -76,6 +76,7 @@ static void q35_host_class_init(ObjectClass *klass, void *data) k-init = q35_host_init; dc-props = mch_props; +dc-fw_name = pci; } static void q35_host_initfn(Object *obj) -- 1.8.1.4
Re: [Qemu-devel] [PATCH] pci: change typename of q35 to pci-q35
On Thu, May 30, 2013 at 09:09:05AM +0300, Michael S. Tsirkin wrote: On Thu, May 30, 2013 at 10:47:48AM +0800, Amos Kong wrote: In QEMU, we set the firmware name of pc-i440fx is 'pci', and set the firmware name of q35 is 'q35-pcihost'. The seabios pattern matching code isn't sophisticated, and we want to match those two devices with exact pattern. This patch changes the typename of q35 to 'pci-q35', all pci devices use same prefix 'pci' in fireware name. And I will change pattern in seabios to /pci*@i0cf8. Signed-off-by: Amos Kong ak...@redhat.com Let's just not leak name to the guest - there's no reason to. Won't the following address the issue without bios changes? Maybe we should find some way to set fw_name in hw/pci/pci_host.c, worth looking into. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c index 8467f86..24df6b5 100644 --- a/hw/pci-host/q35.c +++ b/hw/pci-host/q35.c @@ -76,6 +76,7 @@ static void q35_host_class_init(ObjectClass *klass, void *data) k-init = q35_host_init; dc-props = mch_props; +dc-fw_name = pci; It's a light change, works with current seabios. } static void q35_host_initfn(Object *obj) -- Amos.
[Qemu-devel] [PATCH v3 0/2] i.MX: Improve GPT driver implementation
This patch is improving the completness of the GPT timer implementation. It adds compare 2 and 3 register support to the already exiting compare 1 register. This patch is also moving to a more modern/robust/abstract implementation Last a global more meaningfull naming is applied. Note: We still don't yet support the input register feature allowing to times stamp a level change on a GPIO. Jean-Christophe DUBOIS (2): i.MX: Implement a more complete version of the GPT timer. i.MX: Rework functions/types name and use new styme initialization hw/timer/imx_gpt.c | 558 +++-- 1 file changed, 325 insertions(+), 233 deletions(-) -- 1.8.1.2
[Qemu-devel] [PATCH v3 2/2] i.MX: Rework functions/types name and use new styme initialization
* use dynamic cast whenever possible. * Change function names to some more meanigfull prefix * Change type names to a more meanigfull one * use new style device intialization. Signed-off-by: Jean-Christophe DUBOIS j...@tribudubois.net --- Change since v1: * The patch has been divided in 2 sub-patches. One that deals with the added features and one dealing with the renaming (this one). Change since v2: * use a shorter prefix * move new style initialization and dynamic cast to this patch. hw/timer/imx_gpt.c | 171 ++--- 1 file changed, 84 insertions(+), 87 deletions(-) diff --git a/hw/timer/imx_gpt.c b/hw/timer/imx_gpt.c index 267a763..de53b13 100644 --- a/hw/timer/imx_gpt.c +++ b/hw/timer/imx_gpt.c @@ -27,7 +27,7 @@ #define DEBUG_TIMER 0 #if DEBUG_TIMER -static char const *imx_timerg_reg_name(uint32_t reg) +static char const *imx_gpt_reg_name(uint32_t reg) { switch (reg) { case 0: @@ -67,12 +67,14 @@ static char const *imx_timerg_reg_name(uint32_t reg) */ #define DEBUG_IMPLEMENTATION 1 #if DEBUG_IMPLEMENTATION -# define IPRINTF(fmt, args...) \ +# define IPRINTF(fmt, args...) \ do { fprintf(stderr, %s: fmt, __func__, ##args); } while (0) #else # define IPRINTF(fmt, args...) do {} while (0) #endif +#define IMX_GPT(obj) \ +OBJECT_CHECK(IMXGPTState, (obj), TYPE_IMX_GPT) /* * GPT : General purpose timer * @@ -137,33 +139,33 @@ typedef struct { uint32_t freq; qemu_irq irq; -} IMXTimerGState; +} IMXGPTState; -static const VMStateDescription vmstate_imx_timerg = { +static const VMStateDescription vmstate_imx_timer_gpt = { .name = TYPE_IMX_GPT, .version_id = 3, .minimum_version_id = 3, .minimum_version_id_old = 3, .fields = (VMStateField[]) { -VMSTATE_UINT32(cr, IMXTimerGState), -VMSTATE_UINT32(pr, IMXTimerGState), -VMSTATE_UINT32(sr, IMXTimerGState), -VMSTATE_UINT32(ir, IMXTimerGState), -VMSTATE_UINT32(ocr1, IMXTimerGState), -VMSTATE_UINT32(ocr2, IMXTimerGState), -VMSTATE_UINT32(ocr3, IMXTimerGState), -VMSTATE_UINT32(icr1, IMXTimerGState), -VMSTATE_UINT32(icr2, IMXTimerGState), -VMSTATE_UINT32(cnt, IMXTimerGState), -VMSTATE_UINT32(next_timeout, IMXTimerGState), -VMSTATE_UINT32(next_int, IMXTimerGState), -VMSTATE_UINT32(freq, IMXTimerGState), -VMSTATE_PTIMER(timer, IMXTimerGState), +VMSTATE_UINT32(cr, IMXGPTState), +VMSTATE_UINT32(pr, IMXGPTState), +VMSTATE_UINT32(sr, IMXGPTState), +VMSTATE_UINT32(ir, IMXGPTState), +VMSTATE_UINT32(ocr1, IMXGPTState), +VMSTATE_UINT32(ocr2, IMXGPTState), +VMSTATE_UINT32(ocr3, IMXGPTState), +VMSTATE_UINT32(icr1, IMXGPTState), +VMSTATE_UINT32(icr2, IMXGPTState), +VMSTATE_UINT32(cnt, IMXGPTState), +VMSTATE_UINT32(next_timeout, IMXGPTState), +VMSTATE_UINT32(next_int, IMXGPTState), +VMSTATE_UINT32(freq, IMXGPTState), +VMSTATE_PTIMER(timer, IMXGPTState), VMSTATE_END_OF_LIST() } }; -static const IMXClk imx_timerg_clocks[] = { +static const IMXClk imx_gpt_clocks[] = { NOCLK,/* 000 No clock source */ IPG, /* 001 ipg_clk, 532MHz*/ IPG, /* 010 ipg_clk_highfreq */ @@ -174,10 +176,10 @@ static const IMXClk imx_timerg_clocks[] = { NOCLK,/* 111 not defined */ }; -static void imx_timerg_set_freq(IMXTimerGState *s) +static void imx_gpt_set_freq(IMXGPTState *s) { uint32_t clksrc = extract32(s-cr, GPT_CR_CLKSRC_SHIFT, 3); -uint32_t freq = imx_clock_frequency(s-ccm, imx_timerg_clocks[clksrc]) +uint32_t freq = imx_clock_frequency(s-ccm, imx_gpt_clocks[clksrc]) / (1 + s-pr); s-freq = freq; @@ -188,7 +190,7 @@ static void imx_timerg_set_freq(IMXTimerGState *s) } } -static void imx_timerg_update(IMXTimerGState *s) +static void imx_gpt_update_int(IMXGPTState *s) { if ((s-sr s-ir) (s-cr GPT_CR_EN)) { qemu_irq_raise(s-irq); @@ -197,14 +199,14 @@ static void imx_timerg_update(IMXTimerGState *s) } } -static uint32_t imx_timerg_update_counts(IMXTimerGState *s) +static uint32_t imx_gpt_update_count(IMXGPTState *s) { s-cnt = s-next_timeout - (uint32_t)ptimer_get_count(s-timer); return s-cnt; } -static inline uint32_t imx_timerg_find_limit(uint32_t count, uint32_t reg, +static inline uint32_t imx_gpt_find_limit(uint32_t count, uint32_t reg, uint32_t timeout) { if ((count reg) (timeout reg)) { @@ -214,7 +216,7 @@ static inline uint32_t imx_timerg_find_limit(uint32_t count, uint32_t reg, return timeout; } -static void imx_timerg_compute_next_timeout(IMXTimerGState *s, bool event) +static void imx_gpt_compute_next_timeout(IMXGPTState *s, bool event) { uint32_t
[Qemu-devel] [PATCH v3 1/2] i.MX: Implement a more complete version of the GPT timer.
* implement compare 1 2 and 3 registers * simplify Debug printf Signed-off-by: Jean-Christophe DUBOIS j...@tribudubois.net --- Change since v1: * The patch has been divided in 2 sub-patches. One that deals with the added features (this one) and one dealing with the renaming. Change since v2: * move new style initialization and dynamic cast to the other patch hw/timer/imx_gpt.c | 443 - 1 file changed, 269 insertions(+), 174 deletions(-) diff --git a/hw/timer/imx_gpt.c b/hw/timer/imx_gpt.c index d8c4f0b..267a763 100644 --- a/hw/timer/imx_gpt.c +++ b/hw/timer/imx_gpt.c @@ -5,6 +5,7 @@ * Copyright (c) 2011 NICTA Pty Ltd * Originally written by Hans Jiang * Updated by Peter Chubb + * Updated by Jean-Christophe Dubois * * This code is licensed under GPL version 2 or later. See * the COPYING file in the top-level directory. @@ -18,10 +19,44 @@ #include hw/sysbus.h #include hw/arm/imx.h -//#define DEBUG_TIMER 1 -#ifdef DEBUG_TIMER +#define TYPE_IMX_GPT imx.gpt + +/* + * Define to 1 for debug messages + */ +#define DEBUG_TIMER 0 +#if DEBUG_TIMER + +static char const *imx_timerg_reg_name(uint32_t reg) +{ +switch (reg) { +case 0: +return CR; +case 1: +return PR; +case 2: +return SR; +case 3: +return IR; +case 4: +return OCR1; +case 5: +return OCR2; +case 6: +return OCR3; +case 7: +return ICR1; +case 8: +return ICR2; +case 9: +return CNT; +default: +return [?]; +} +} + # define DPRINTF(fmt, args...) \ - do { printf(imx_timer: fmt , ##args); } while (0) + do { printf(%s: fmt , __func__, ##args); } while (0) #else # define DPRINTF(fmt, args...) do {} while (0) #endif @@ -33,7 +68,7 @@ #define DEBUG_IMPLEMENTATION 1 #if DEBUG_IMPLEMENTATION # define IPRINTF(fmt, args...) \ -do { fprintf(stderr, imx_timer: fmt, ##args); } while (0) + do { fprintf(stderr, %s: fmt, __func__, ##args); } while (0) #else # define IPRINTF(fmt, args...) do {} while (0) #endif @@ -43,16 +78,7 @@ * * This timer counts up continuously while it is enabled, resetting itself * to 0 when it reaches TIMER_MAX (in freerun mode) or when it - * reaches the value of ocr1 (in periodic mode). WE simulate this using a - * QEMU ptimer counting down from ocr1 and reloading from ocr1 in - * periodic mode, or counting from ocr1 to zero, then TIMER_MAX - ocr1. - * waiting_rov is set when counting from TIMER_MAX. - * - * In the real hardware, there are three comparison registers that can - * trigger interrupts, and compare channel 1 can be used to - * force-reset the timer. However, this is a `bare-bones' - * implementation: only what Linux 3.x uses has been implemented - * (free-running timer from 0 to OCR1 or TIMER_MAX) . + * reaches the value of one of the ocrX (in periodic mode). */ #define TIMER_MAX 0XUL @@ -79,9 +105,13 @@ #define GPT_CR_FO3(1 31) /* Force Output Compare Channel 3 */ #define GPT_SR_OF1 (1 0) +#define GPT_SR_OF2 (1 1) +#define GPT_SR_OF3 (1 2) #define GPT_SR_ROV (1 5) #define GPT_IR_OF1IE (1 0) +#define GPT_IR_OF2IE (1 1) +#define GPT_IR_OF3IE (1 2) #define GPT_IR_ROVIE (1 5) typedef struct { @@ -101,15 +131,19 @@ typedef struct { uint32_t icr2; uint32_t cnt; -uint32_t waiting_rov; +uint32_t next_timeout; +uint32_t next_int; + +uint32_t freq; + qemu_irq irq; } IMXTimerGState; static const VMStateDescription vmstate_imx_timerg = { -.name = imx-timerg, -.version_id = 2, -.minimum_version_id = 2, -.minimum_version_id_old = 2, +.name = TYPE_IMX_GPT, +.version_id = 3, +.minimum_version_id = 3, +.minimum_version_id_old = 3, .fields = (VMStateField[]) { VMSTATE_UINT32(cr, IMXTimerGState), VMSTATE_UINT32(pr, IMXTimerGState), @@ -121,7 +155,9 @@ static const VMStateDescription vmstate_imx_timerg = { VMSTATE_UINT32(icr1, IMXTimerGState), VMSTATE_UINT32(icr2, IMXTimerGState), VMSTATE_UINT32(cnt, IMXTimerGState), -VMSTATE_UINT32(waiting_rov, IMXTimerGState), +VMSTATE_UINT32(next_timeout, IMXTimerGState), +VMSTATE_UINT32(next_int, IMXTimerGState), +VMSTATE_UINT32(freq, IMXTimerGState), VMSTATE_PTIMER(timer, IMXTimerGState), VMSTATE_END_OF_LIST() } @@ -138,16 +174,14 @@ static const IMXClk imx_timerg_clocks[] = { NOCLK,/* 111 not defined */ }; - static void imx_timerg_set_freq(IMXTimerGState *s) { -int clksrc; -uint32_t freq; - -clksrc = (s-cr GPT_CR_CLKSRC_SHIFT) GPT_CR_CLKSRC_MASK; -freq = imx_clock_frequency(s-ccm, imx_timerg_clocks[clksrc]) / (1 + s-pr); +uint32_t clksrc = extract32(s-cr, GPT_CR_CLKSRC_SHIFT, 3); +uint32_t freq = imx_clock_frequency(s-ccm, imx_timerg_clocks[clksrc]) +
Re: [Qemu-devel] [QEMU PATCH v2] pci: set firmware name of q35 to pci
On Thu, May 30, 2013 at 03:25:17PM +0800, Amos Kong wrote: In QEMU, we set the firmware name of pc-i440fx to 'pci', and the typename 'q35-pcihost' is set to the firmware name of q35 by default. The seabios matches pci devices by /pci/@i0cf8, so q35 device could not be identified, seabios fails to adjust the boot priority of q35 devices. This patch sets firmware name of q35 to 'pci'. Signed-off-by: Michael S. Tsirkin m...@redhat.com Signed-off-by: Amos Kong ak...@redhat.com Does this imply Tested-by: Amos Kong ak...@redhat.com ? --- v2: just set the fw_name, no need to change seabios --- hw/pci-host/q35.c | 1 + 1 file changed, 1 insertion(+) diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c index 8467f86..24df6b5 100644 --- a/hw/pci-host/q35.c +++ b/hw/pci-host/q35.c @@ -76,6 +76,7 @@ static void q35_host_class_init(ObjectClass *klass, void *data) k-init = q35_host_init; dc-props = mch_props; +dc-fw_name = pci; } static void q35_host_initfn(Object *obj) -- 1.8.1.4
[Qemu-devel] broken incoming migration
Hi! I found the migration broken on pseries platform, specifically, this patch broke it: f1c72795af573b24a7da5eb52375c9aba8a37972 migration: do not sent zero pages in bulk stage The idea is not to send zero pages to the destination guest which is expected to have 100% empty RAM. However on pseries plaftorm the guest always has some stuff in the RAM as a part of initialization (device tree, system firmware and rtas (?)) so it is not completely empty. As the source guest cannot detect this, it skips some pages during migration and we get a broken destination guest. Bug. While the idea is ok in general, I do not see any easy way to fix it as neither QEMUMachine::init nor QEMUMachine::reset callbacks has information about whether we are about to receive a migration or not (-incoming parameter) and we cannot move device-tree and system firmware initialization anywhere else. ram_bulk_stage is static and cannot be disabled from the platform initialization code. So what would the community suggest? Thanks! -- Alexey
Re: [Qemu-devel] updated: kvm networking todo wiki
Stefan Hajnoczi stefa...@gmail.com writes: On Thu, May 30, 2013 at 7:23 AM, Rusty Russell ru...@rustcorp.com.au wrote: On the receive side, what can we do better than readv? If we need to return to userspace to tell the guest that we've got a new packet, we don't win on latency. We might reduce syscall overhead with a multi-dimensional readv to read multiple packets at once? Sounds like recvmmsg(2). Wow... the future is here, today! Thanks, Rusty.
Re: [Qemu-devel] broken incoming migration
Il 30/05/2013 09:44, Alexey Kardashevskiy ha scritto: Hi! I found the migration broken on pseries platform, specifically, this patch broke it: f1c72795af573b24a7da5eb52375c9aba8a37972 migration: do not sent zero pages in bulk stage The idea is not to send zero pages to the destination guest which is expected to have 100% empty RAM. However on pseries plaftorm the guest always has some stuff in the RAM as a part of initialization (device tree, system firmware and rtas (?)) so it is not completely empty. As the source guest cannot detect this, it skips some pages during migration and we get a broken destination guest. Bug. While the idea is ok in general, I do not see any easy way to fix it as neither QEMUMachine::init nor QEMUMachine::reset callbacks has information about whether we are about to receive a migration or not (-incoming parameter) and we cannot move device-tree and system firmware initialization anywhere else. ram_bulk_stage is static and cannot be disabled from the platform initialization code. So what would the community suggest? Revert the patch. :) Paolo
Re: [Qemu-devel] broken incoming migration
Forgot some cc:. On 05/30/2013 05:44 PM, Alexey Kardashevskiy wrote: Hi! I found the migration broken on pseries platform, specifically, this patch broke it: f1c72795af573b24a7da5eb52375c9aba8a37972 migration: do not sent zero pages in bulk stage The idea is not to send zero pages to the destination guest which is expected to have 100% empty RAM. However on pseries plaftorm the guest always has some stuff in the RAM as a part of initialization (device tree, system firmware and rtas (?)) so it is not completely empty. As the source guest cannot detect this, it skips some pages during migration and we get a broken destination guest. Bug. While the idea is ok in general, I do not see any easy way to fix it as neither QEMUMachine::init nor QEMUMachine::reset callbacks has information about whether we are about to receive a migration or not (-incoming parameter) and we cannot move device-tree and system firmware initialization anywhere else. ram_bulk_stage is static and cannot be disabled from the platform initialization code. So what would the community suggest? Thanks! -- Alexey
Re: [Qemu-devel] snabbswitch integration with QEMU for userspace ethernet I/O
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 05/30/2013 08:46 AM, Stefan Hajnoczi wrote: On Wed, May 29, 2013 at 6:02 PM, Julian Stecklina jstec...@os.inf.tu-dresden.de wrote: On 05/29/2013 04:21 PM, Stefan Hajnoczi wrote: The fact that a single switch process has shared memory access to all guests' RAM is critical. If the switch process is exploited, then that exposes other guests' data! (Think of a multi-tenant host with guests belonging to different users.) True. But people don't mind having instruction decoding and half of virtio in the kernel these days, so it can't be that security critical... No, it's still security critical. If there were equivalent solutions with better security then I'm sure people would accept them. It's just that there isn't an equivalent solution yet :). My comment was more or less meant in a resigning way. ;) At least we are not putting HTTP servers in there any more. Julian -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.19 (GNU/Linux) iEYEARECAAYFAlGnCRMACgkQ2EtjUdW3H9mzFwCghZxvckYgZ4atLm3HLPPWF/Lb 688AnRXm12jbBlmCVOKSaDUHHejEdh7O =csrK -END PGP SIGNATURE-
Re: [Qemu-devel] broken incoming migration
On 05/30/2013 05:49 PM, Paolo Bonzini wrote: Il 30/05/2013 09:44, Alexey Kardashevskiy ha scritto: Hi! I found the migration broken on pseries platform, specifically, this patch broke it: f1c72795af573b24a7da5eb52375c9aba8a37972 migration: do not sent zero pages in bulk stage The idea is not to send zero pages to the destination guest which is expected to have 100% empty RAM. However on pseries plaftorm the guest always has some stuff in the RAM as a part of initialization (device tree, system firmware and rtas (?)) so it is not completely empty. As the source guest cannot detect this, it skips some pages during migration and we get a broken destination guest. Bug. While the idea is ok in general, I do not see any easy way to fix it as neither QEMUMachine::init nor QEMUMachine::reset callbacks has information about whether we are about to receive a migration or not (-incoming parameter) and we cannot move device-tree and system firmware initialization anywhere else. ram_bulk_stage is static and cannot be disabled from the platform initialization code. So what would the community suggest? Revert the patch. :) I'll wait for 24 hours (forgot to cc: the author) and then post a revert patch :) -- Alexey
Re: [Qemu-devel] [QEMU PATCH v2] pci: set firmware name of q35 to pci
On Thu, May 30, 2013 at 03:25:17PM +0800, Amos Kong wrote: In QEMU, we set the firmware name of pc-i440fx to 'pci', and the typename 'q35-pcihost' is set to the firmware name of q35 by default. The seabios matches pci devices by /pci/@i0cf8, so q35 device could not be identified, seabios fails to adjust the boot priority of q35 devices. This patch sets firmware name of q35 to 'pci'. Signed-off-by: Michael S. Tsirkin m...@redhat.com Signed-off-by: Amos Kong ak...@redhat.com I've applied this already. --- v2: just set the fw_name, no need to change seabios --- hw/pci-host/q35.c | 1 + 1 file changed, 1 insertion(+) diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c index 8467f86..24df6b5 100644 --- a/hw/pci-host/q35.c +++ b/hw/pci-host/q35.c @@ -76,6 +76,7 @@ static void q35_host_class_init(ObjectClass *klass, void *data) k-init = q35_host_init; dc-props = mch_props; +dc-fw_name = pci; } static void q35_host_initfn(Object *obj) -- 1.8.1.4
Re: [Qemu-devel] Fwd: [Qemu-discuss] pci-assign error
Dear Alex, Thank you for your answer. Unfortunately, I had not yet the opportunity to test it as I don't have a Kepler arch (I suppose it depends on it ?). I will keep you informed of the results. Regards, Jean Le 03/04/2013 20:16, Alex Williamson a écrit : On Wed, 2013-04-03 at 16:26 +0200, Jean Parpaillon wrote: Message original Sujet: [Qemu-discuss] pci-assign error Date : Wed, 03 Apr 2013 12:12:15 +0200 De : Jean Parpaillon jean.parpail...@free.fr Pour : qemu-disc...@nongnu.org Dear all, I'm trying to assign a PCI graphic card to a qemu guest. Hi, A) Graphics assignment isn't supported and this likely won't be the only problem you find, but if you want to experiment with VFIO-based VGA support please try: kernel: git://github.com/awilliam/linux-vfio.git (branch next) qemu: git://github.com/awilliam/qemu-vfio.git (tag vfio-pci-for-qemu-20130401.0) I sent a qemu pull request for the latter and the kernel updates should be in 3.10. B) Does assignment of anything work on your system or do you perhaps see an error in dmesg like: No interrupt remapping support, disallowing device assignment. Re-enble with allow_unsafe_assigned_interrupts=1 module option. If so, try that. Thanks, Alex When I start the guest with libvirt, I have the following error in /var/log/libvirt/qemu/sofa.log: LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin HOME=/root USER=root LOGNAME=root QEMU_AUDIO_DRV=spice /usr/bin/kvm -name sofa -S -M pc-1.1 -m 1500 -smp 2,sockets=2,cores=1,threads=1 -uuid 1234beef -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/sofa.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/dev/vgsurcouf/sofa,if=none,id=drive-virtio-disk0,format=raw -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/var/lib/libvirt/images/iso/debian-testing-amd64-netinst.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=...,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -device usb-tablet,id=input0 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device pci-assign,host=02:00.0,id=hostdev0,configfd=25,bus=pci.0,multifunction=on,addr=0x8 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 W: kvm binary is deprecated, please use qemu-system-x86_64 instead char device redirected to /dev/pts/2 (label charserial0) qemu-system-x86_64: -device pci-assign,host=02:00.0,id=hostdev0,configfd=25,bus=pci.0,multifunction=on,addr=0x8: Failed to assign device hostdev0 : Operation not permitted qemu-system-x86_64: -device pci-assign,host=02:00.0,id=hostdev0,configfd=25,bus=pci.0,multifunction=on,addr=0x8: Device 'kvm-pci-assign' could not be initialized 2013-04-03 09:44:35.398+: shutting down My host setup: Debian Wheezy + qemu 1.4.0, libvirt 1.0.4, kernel 3.8.5 My host gfx: $ lspci|grep VGA 02:00.0 VGA compatible controller: NVIDIA Corporation GT218 [Quadro NVS 300] (rev a2) I run the following to enable handle device with PCI stub, includind second function: echo 10de 10d8 /sys/bus/pci/drivers/pci-stub/new_id echo 10de 0be3 /sys/bus/pci/drivers/pci-stub/new_id echo :02:00.0 /sys/bus/pci/devices/:02:00.0/driver/unbind echo :02:00.1 /sys/bus/pci/devices/:02:00.1/driver/unbind sleep 3 echo :02:00.0 /sys/bus/pci/drivers/pci-stub/bind echo :02:00.1 /sys/bus/pci/drivers/pci-stub/bind Intel VT-d is enabled: $ dmesg |grep Directed [1.167179] PCI-DMA: Intel(R) Virtualization Technology for Directed I/O $ dmesg |grep IOMMU [0.00] Intel-IOMMU: enabled [0.171520] dmar: IOMMU 0: reg_base_addr ebffe000 ver 1:0 cap c90780106f0462 ecap f020f6 [0.171529] dmar: IOMMU 1: reg_base_addr fedc ver 1:0 cap c90780106f0462 ecap f020f6 [1.166855] IOMMU 0 0xebffe000: using Queued invalidation [1.166861] IOMMU 1 0xfedc: using Queued invalidation [1.166889] IOMMU: hardware identity mapping for device :00:00.0 [1.166891] IOMMU: hardware identity mapping for device :00:03.0 [1.166893] IOMMU: hardware identity mapping for device :00:07.0 [1.166895]
Re: [Qemu-devel] broken incoming migration
Am 30.05.2013 10:18, schrieb Alexey Kardashevskiy: On 05/30/2013 05:49 PM, Paolo Bonzini wrote: Il 30/05/2013 09:44, Alexey Kardashevskiy ha scritto: Hi! I found the migration broken on pseries platform, specifically, this patch broke it: f1c72795af573b24a7da5eb52375c9aba8a37972 migration: do not sent zero pages in bulk stage The idea is not to send zero pages to the destination guest which is expected to have 100% empty RAM. However on pseries plaftorm the guest always has some stuff in the RAM as a part of initialization (device tree, system firmware and rtas (?)) so it is not completely empty. As the source guest cannot detect this, it skips some pages during migration and we get a broken destination guest. Bug. While the idea is ok in general, I do not see any easy way to fix it as neither QEMUMachine::init nor QEMUMachine::reset callbacks has information about whether we are about to receive a migration or not (-incoming parameter) and we cannot move device-tree and system firmware initialization anywhere else. ram_bulk_stage is static and cannot be disabled from the platform initialization code. So what would the community suggest? Revert the patch. :) I'll wait for 24 hours (forgot to cc: the author) and then post a revert patch :) does this problem only occur on pseries emulation? not sending zero pages is not only a performance benefit it also makes overcomitted memory usable. the madv_dontneed seems to kick in asynchronously and memory is not available immediately. what I do not understand if the a memory region is not empty at destination due to device tree, firmware etc. it shouldn't be empty at the source as well so in theory this should not be a problem. Peter
[Qemu-devel] [PATCH] target-ppc kvm: missing kvm_arch_init_irq_routing
This adds an empty stub to make the compiler happy. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- target-ppc/kvm.c |4 1 file changed, 4 insertions(+) diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c index 3ab2946..2bbc3b8 100644 --- a/target-ppc/kvm.c +++ b/target-ppc/kvm.c @@ -1797,3 +1797,7 @@ int kvm_arch_on_sigbus(int code, void *addr) { return 1; } + +void kvm_arch_init_irq_routing(KVMState *s) +{ +} -- 1.7.10.4
[Qemu-devel] [PATCH] target-ppc kvm: save cr register
This adds a missing code to save CR (condition register) via kvm_arch_put_registers(). kvm_arch_get_registers() already has it. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- target-ppc/kvm.c |5 + 1 file changed, 5 insertions(+) diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c index 2bbc3b8..c89dd58 100644 --- a/target-ppc/kvm.c +++ b/target-ppc/kvm.c @@ -791,6 +791,11 @@ int kvm_arch_put_registers(CPUState *cs, int level) for (i = 0;i 32; i++) regs.gpr[i] = env-gpr[i]; +regs.cr = 0; +for (i = 0; i 8; i++) { +regs.cr |= (env-crf[i] 15) (4 * (7 - i)); +} + ret = kvm_vcpu_ioctl(cs, KVM_SET_REGS, regs); if (ret 0) return ret; -- 1.7.10.4
Re: [Qemu-devel] [SeaBIOS] KVM call agenda for 2013-05-28
On Wed, 2013-05-29 at 11:18 -0500, Anthony Liguori wrote: Certainly an option, but that is a long-term project. Out of curiousity, are there other benefits to using coreboot as a core firmware in QEMU? Is there a payload we would ever plausibly use besides OVMF and SeaBIOS? I like the idea of using Coreboot on the UEFI side — if the most actively used TianoCore platform is CorebootPkg instead of OvmfPkg, that makes it a lot easier for people using *real* hardware with Coreboot to use TianoCore. And it helps to dispel the stupid misconception in some quarters that Coreboot *competes* with UEFI and thus cannot possibly be supported because helping something that competes with UEFI would be bad. Is there a payload we would ever plausibly use besides OVMF and SeaBIOS? For my part I want to get to the point where the default firmware shipped with qemu can be OVMF. We have SeaBIOS-as-CSM working, which was one of the biggest barriers. There are a few more things (like NV variable storage, in particular) that I need to fix before I can actually make that suggestion with a straight face though... -- dwmw2 smime.p7s Description: S/MIME cryptographic signature
[Qemu-devel] [PATCH] virtio-pci: drop unused wmb macro
The implementation is wrong for kvm, and it's unused anyway. Drop it. Signed-off-by: Michael S. Tsirkin m...@redhat.com Message-id: 20130528102023.ga30...@redhat.com Signed-off-by: Anthony Liguori aligu...@us.ibm.com --- hw/virtio/virtio-pci.c | 6 -- 1 file changed, 6 deletions(-) diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index 70d2c6b..444b71a 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -89,12 +89,6 @@ /* Flags track per-device state like workarounds for quirks in older guests. */ #define VIRTIO_PCI_FLAG_BUS_MASTER_BUG (1 0) -/* QEMU doesn't strictly need write barriers since everything runs in - * lock-step. We'll leave the calls to wmb() in though to make it obvious for - * KVM or if kqemu gets SMP support. - */ -#define wmb() do { } while (0) - /* HACK for virtio to determine if it's running a big endian guest */ bool virtio_is_big_endian(void); -- MST
[Qemu-devel] [PATCH 0/2] fixup some fallback from tree reorg
Here are some patches fixing up minor issues in the tree reorganization. Michael S. Tsirkin (2): dec.c - move to pci-bridge firmware_abi: move to include/hw/nvram/ hw/pci-bridge/Makefile.objs | 2 + hw/pci-bridge/dec.c | 156 +++ hw/pci-bridge/dec.h | 10 ++ hw/pci-host/Makefile.objs| 1 - hw/pci-host/dec.c| 156 --- hw/pci-host/dec.h| 10 -- hw/sparc/sun4m.c | 2 +- hw/sparc64/sun4u.c | 2 +- include/hw/nvram/openbios_firmware_abi.h | 73 +++ include/hw/sparc/firmware_abi.h | 73 --- 10 files changed, 243 insertions(+), 242 deletions(-) create mode 100644 hw/pci-bridge/dec.c create mode 100644 hw/pci-bridge/dec.h delete mode 100644 hw/pci-host/dec.c delete mode 100644 hw/pci-host/dec.h create mode 100644 include/hw/nvram/openbios_firmware_abi.h delete mode 100644 include/hw/sparc/firmware_abi.h -- MST
[Qemu-devel] [PATCH 1/2] dec.c - move to pci-bridge
Looks like dec.c is in pci-host by mistake. Moving it over to pci-bridge. Cc: Andreas Färber afaer...@suse.de Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/pci-bridge/Makefile.objs | 2 + hw/pci-bridge/dec.c | 156 hw/pci-bridge/dec.h | 10 +++ hw/pci-host/Makefile.objs | 1 - hw/pci-host/dec.c | 156 hw/pci-host/dec.h | 10 --- 6 files changed, 168 insertions(+), 167 deletions(-) create mode 100644 hw/pci-bridge/dec.c create mode 100644 hw/pci-bridge/dec.h delete mode 100644 hw/pci-host/dec.c delete mode 100644 hw/pci-host/dec.h diff --git a/hw/pci-bridge/Makefile.objs b/hw/pci-bridge/Makefile.objs index 5dd92d2..968b369 100644 --- a/hw/pci-bridge/Makefile.objs +++ b/hw/pci-bridge/Makefile.objs @@ -1,3 +1,5 @@ common-obj-y += pci_bridge_dev.o common-obj-y += ioh3420.o xio3130_upstream.o xio3130_downstream.o common-obj-y += i82801b11.o +# NewWorld PowerMac +common-obj-$(CONFIG_DEC_PCI) += dec.o diff --git a/hw/pci-bridge/dec.c b/hw/pci-bridge/dec.c new file mode 100644 index 000..cff458b --- /dev/null +++ b/hw/pci-bridge/dec.c @@ -0,0 +1,156 @@ +/* + * QEMU DEC 21154 PCI bridge + * + * Copyright (c) 2006-2007 Fabrice Bellard + * Copyright (c) 2007 Jocelyn Mayer + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the Software), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include dec.h +#include hw/sysbus.h +#include hw/pci/pci.h +#include hw/pci/pci_host.h +#include hw/pci/pci_bridge.h +#include hw/pci/pci_bus.h + +/* debug DEC */ +//#define DEBUG_DEC + +#ifdef DEBUG_DEC +#define DEC_DPRINTF(fmt, ...) \ +do { printf(DEC: fmt , ## __VA_ARGS__); } while (0) +#else +#define DEC_DPRINTF(fmt, ...) +#endif + +#define DEC_21154(obj) OBJECT_CHECK(DECState, (obj), TYPE_DEC_21154) + +typedef struct DECState { +PCIHostState parent_obj; +} DECState; + +static int dec_map_irq(PCIDevice *pci_dev, int irq_num) +{ +return irq_num; +} + +static int dec_pci_bridge_initfn(PCIDevice *pci_dev) +{ +return pci_bridge_initfn(pci_dev, TYPE_PCI_BUS); +} + +static void dec_21154_pci_bridge_class_init(ObjectClass *klass, void *data) +{ +DeviceClass *dc = DEVICE_CLASS(klass); +PCIDeviceClass *k = PCI_DEVICE_CLASS(klass); + +k-init = dec_pci_bridge_initfn; +k-exit = pci_bridge_exitfn; +k-vendor_id = PCI_VENDOR_ID_DEC; +k-device_id = PCI_DEVICE_ID_DEC_21154; +k-config_write = pci_bridge_write_config; +k-is_bridge = 1; +dc-desc = DEC 21154 PCI-PCI bridge; +dc-reset = pci_bridge_reset; +dc-vmsd = vmstate_pci_device; +} + +static const TypeInfo dec_21154_pci_bridge_info = { +.name = dec-21154-p2p-bridge, +.parent= TYPE_PCI_DEVICE, +.instance_size = sizeof(PCIBridge), +.class_init= dec_21154_pci_bridge_class_init, +}; + +PCIBus *pci_dec_21154_init(PCIBus *parent_bus, int devfn) +{ +PCIDevice *dev; +PCIBridge *br; + +dev = pci_create_multifunction(parent_bus, devfn, false, + dec-21154-p2p-bridge); +br = DO_UPCAST(PCIBridge, dev, dev); +pci_bridge_map_irq(br, DEC 21154 PCI-PCI bridge, dec_map_irq); +qdev_init_nofail(dev-qdev); +return pci_bridge_get_sec_bus(br); +} + +static int pci_dec_21154_device_init(SysBusDevice *dev) +{ +PCIHostState *phb; + +phb = PCI_HOST_BRIDGE(dev); + +memory_region_init_io(phb-conf_mem, pci_host_conf_le_ops, + dev, pci-conf-idx, 0x1000); +memory_region_init_io(phb-data_mem, pci_host_data_le_ops, + dev, pci-data-idx, 0x1000); +sysbus_init_mmio(dev, phb-conf_mem); +sysbus_init_mmio(dev, phb-data_mem); +return 0; +} + +static int dec_21154_pci_host_init(PCIDevice *d) +{ +/* PCI2PCI bridge same values as PearPC - check this */ +return 0; +} + +static void dec_21154_pci_host_class_init(ObjectClass *klass, void *data) +{ +
Re: [Qemu-devel] broken incoming migration
On 05/30/2013 07:08 PM, Peter Lieven wrote: Am 30.05.2013 10:18, schrieb Alexey Kardashevskiy: On 05/30/2013 05:49 PM, Paolo Bonzini wrote: Il 30/05/2013 09:44, Alexey Kardashevskiy ha scritto: Hi! I found the migration broken on pseries platform, specifically, this patch broke it: f1c72795af573b24a7da5eb52375c9aba8a37972 migration: do not sent zero pages in bulk stage The idea is not to send zero pages to the destination guest which is expected to have 100% empty RAM. However on pseries plaftorm the guest always has some stuff in the RAM as a part of initialization (device tree, system firmware and rtas (?)) so it is not completely empty. As the source guest cannot detect this, it skips some pages during migration and we get a broken destination guest. Bug. While the idea is ok in general, I do not see any easy way to fix it as neither QEMUMachine::init nor QEMUMachine::reset callbacks has information about whether we are about to receive a migration or not (-incoming parameter) and we cannot move device-tree and system firmware initialization anywhere else. ram_bulk_stage is static and cannot be disabled from the platform initialization code. So what would the community suggest? Revert the patch. :) I'll wait for 24 hours (forgot to cc: the author) and then post a revert patch :) does this problem only occur on pseries emulation? No idea, really. not sending zero pages is not only a performance benefit it also makes overcomitted memory usable. the madv_dontneed seems to kick in asynchronously and memory is not available immediately. Ok, I do not mind :) what I do not understand if the a memory region is not empty at destination due to device tree, firmware etc. it shouldn't be empty at the source as well so in theory this should not be a problem. This is how it works - first QEMU allocates RAM and put devicetree+firmware somewhere. Then QEMU starts the guest so the firmware starts, loads the kernel and then the kernel zeroes the whole (most of?) RAM including the area where the firmware used to be. Now we migrate. If the source guest is in the kernel already, then it does not know about the memory area previously occupied by the firmware, it is just an empty page. If the source guest is still in the firmware, then those pages are not empty and they are perfectly migrated. -- Alexey
[Qemu-devel] [PATCH 2/2] firmware_abi: move to include/hw/nvram/
firmware_abi.h with structs for OpenBIOS landed in hw/sparc/ by mistake - move it to hw/nvram/ alongside fw_cfg.h. In addition to sparc it's included from ppc mac_nvram.c and will need to include it from prep.c in the future. Cc: Andreas Färber afaer...@suse.de Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/sparc/sun4m.c | 2 +- hw/sparc64/sun4u.c | 2 +- include/hw/nvram/openbios_firmware_abi.h | 73 include/hw/sparc/firmware_abi.h | 73 4 files changed, 75 insertions(+), 75 deletions(-) create mode 100644 include/hw/nvram/openbios_firmware_abi.h delete mode 100644 include/hw/sparc/firmware_abi.h diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c index 8840881..6e62d17 100644 --- a/hw/sparc/sun4m.c +++ b/hw/sparc/sun4m.c @@ -30,7 +30,7 @@ #include sysemu/sysemu.h #include net/net.h #include hw/boards.h -#include hw/sparc/firmware_abi.h +#include hw/nvram/openbios_firmware_abi.h #include hw/scsi/esp.h #include hw/i386/pc.h #include hw/isa/isa.h diff --git a/hw/sparc64/sun4u.c b/hw/sparc64/sun4u.c index 5c2bbd4..13672de 100644 --- a/hw/sparc64/sun4u.c +++ b/hw/sparc64/sun4u.c @@ -32,7 +32,7 @@ #include qemu/timer.h #include sysemu/sysemu.h #include hw/boards.h -#include hw/sparc/firmware_abi.h +#include hw/nvram/openbios_firmware_abi.h #include hw/nvram/fw_cfg.h #include hw/sysbus.h #include hw/ide.h diff --git a/include/hw/nvram/openbios_firmware_abi.h b/include/hw/nvram/openbios_firmware_abi.h new file mode 100644 index 000..5e6e5d4 --- /dev/null +++ b/include/hw/nvram/openbios_firmware_abi.h @@ -0,0 +1,73 @@ +#ifndef FIRMWARE_ABI_H +#define FIRMWARE_ABI_H + +/* OpenBIOS NVRAM partition */ +struct OpenBIOS_nvpart_v1 { +uint8_t signature; +uint8_t checksum; +uint16_t len; // BE, length divided by 16 +char name[12]; +}; + +#define OPENBIOS_PART_SYSTEM 0x70 +#define OPENBIOS_PART_FREE 0x7f + +static inline void +OpenBIOS_finish_partition(struct OpenBIOS_nvpart_v1 *header, uint32_t size) +{ +unsigned int i, sum; +uint8_t *tmpptr; + +// Length divided by 16 +header-len = cpu_to_be16(size 4); + +// Checksum +tmpptr = (uint8_t *)header; +sum = *tmpptr; +for (i = 0; i 14; i++) { +sum += tmpptr[2 + i]; +sum = (sum + ((sum 0xff00) 8)) 0xff; +} +header-checksum = sum 0xff; +} + +static inline uint32_t +OpenBIOS_set_var(uint8_t *nvram, uint32_t addr, const char *str) +{ +uint32_t len; + +len = strlen(str) + 1; +memcpy(nvram[addr], str, len); + +return addr + len; +} + +/* Sun IDPROM structure at the end of NVRAM */ +/* from http://www.squirrel.com/squirrel/sun-nvram-hostid.faq.html */ +struct Sun_nvram { +uint8_t type; /* always 01 */ +uint8_t machine_id; /* first byte of host id (machine type) */ +uint8_t macaddr[6]; /* 6 byte ethernet address (first 3 bytes 08, 00, 20) */ +uint8_t date[4];/* date of manufacture */ +uint8_t hostid[3]; /* remaining 3 bytes of host id (serial number) */ +uint8_t checksum; /* bitwise xor of previous bytes */ +}; + +static inline void +Sun_init_header(struct Sun_nvram *header, const uint8_t *macaddr, int machine_id) +{ +uint8_t tmp, *tmpptr; +unsigned int i; + +header-type = 1; +header-machine_id = machine_id 0xff; +memcpy(header-macaddr, macaddr, 6); +/* Calculate checksum */ +tmp = 0; +tmpptr = (uint8_t *)header; +for (i = 0; i 15; i++) +tmp ^= tmpptr[i]; + +header-checksum = tmp; +} +#endif /* FIRMWARE_ABI_H */ diff --git a/include/hw/sparc/firmware_abi.h b/include/hw/sparc/firmware_abi.h deleted file mode 100644 index 5e6e5d4..000 --- a/include/hw/sparc/firmware_abi.h +++ /dev/null @@ -1,73 +0,0 @@ -#ifndef FIRMWARE_ABI_H -#define FIRMWARE_ABI_H - -/* OpenBIOS NVRAM partition */ -struct OpenBIOS_nvpart_v1 { -uint8_t signature; -uint8_t checksum; -uint16_t len; // BE, length divided by 16 -char name[12]; -}; - -#define OPENBIOS_PART_SYSTEM 0x70 -#define OPENBIOS_PART_FREE 0x7f - -static inline void -OpenBIOS_finish_partition(struct OpenBIOS_nvpart_v1 *header, uint32_t size) -{ -unsigned int i, sum; -uint8_t *tmpptr; - -// Length divided by 16 -header-len = cpu_to_be16(size 4); - -// Checksum -tmpptr = (uint8_t *)header; -sum = *tmpptr; -for (i = 0; i 14; i++) { -sum += tmpptr[2 + i]; -sum = (sum + ((sum 0xff00) 8)) 0xff; -} -header-checksum = sum 0xff; -} - -static inline uint32_t -OpenBIOS_set_var(uint8_t *nvram, uint32_t addr, const char *str) -{ -uint32_t len; - -len = strlen(str) + 1; -memcpy(nvram[addr], str, len); - -return addr + len; -} - -/* Sun IDPROM structure at the end of NVRAM */ -/* from http://www.squirrel.com/squirrel/sun-nvram-hostid.faq.html */ -struct Sun_nvram { -uint8_t type; /* always 01 */ -uint8_t machine_id; /* first
[Qemu-devel] [PATCH V15 1/6] add def_value_str in QemuOptDesc struct and rewrite qemu_opts_print
From: Dong Xu Wang wdon...@linux.vnet.ibm.com qemu_opts_print has no user now, so can re-write the function safely. qemu_opts_print will be used while using qemu-img create, it will produce the same output as previous code. The behavior of this function has changed: 1. Print every possible option, whether a value has been set or not. 2. Option descriptors may provide a default value. 3. Print to stdout instead of stderr. Previously the behavior was to print every option that has been set. Options that have not been set would be skipped. v13-v14: 1) fix memory leak. 2) make opt_set do not accpet null value argument. v12-v13 1) re-write commit message. v11-v12 1) make def_value_str become the real default value string in opt_set function. v10-v11: 1) print all values that have actually been assigned while accept-any cases. v7-v8: 1) print elements = accept any params while opts_accepts_any() == true. 2) since def_print_str is the default value if an option isn't set, so rename it to def_value_str. Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com Signed-off-by: Dongxu Wang wdon...@linux.vnet.ibm.com --- include/qemu/option.h | 3 ++- util/qemu-option.c| 32 ++-- 2 files changed, 28 insertions(+), 7 deletions(-) diff --git a/include/qemu/option.h b/include/qemu/option.h index bdb6d21..b928ab0 100644 --- a/include/qemu/option.h +++ b/include/qemu/option.h @@ -96,6 +96,7 @@ typedef struct QemuOptDesc { const char *name; enum QemuOptType type; const char *help; +const char *def_value_str; } QemuOptDesc; struct QemuOptsList { @@ -152,7 +153,7 @@ QDict *qemu_opts_to_qdict(QemuOpts *opts, QDict *qdict); void qemu_opts_absorb_qdict(QemuOpts *opts, QDict *qdict, Error **errp); typedef int (*qemu_opts_loopfunc)(QemuOpts *opts, void *opaque); -int qemu_opts_print(QemuOpts *opts, void *dummy); +int qemu_opts_print(QemuOpts *opts); int qemu_opts_foreach(QemuOptsList *list, qemu_opts_loopfunc func, void *opaque, int abort_on_failure); diff --git a/util/qemu-option.c b/util/qemu-option.c index 8b74bf1..84d8c8b 100644 --- a/util/qemu-option.c +++ b/util/qemu-option.c @@ -860,16 +860,36 @@ void qemu_opts_del(QemuOpts *opts) g_free(opts); } -int qemu_opts_print(QemuOpts *opts, void *dummy) +int qemu_opts_print(QemuOpts *opts) { QemuOpt *opt; +QemuOptDesc *desc = opts-list-desc; -fprintf(stderr, %s: %s:, opts-list-name, -opts-id ? opts-id : noid); -QTAILQ_FOREACH(opt, opts-head, next) { -fprintf(stderr, %s=\%s\, opt-name, opt-str); +if (desc[0].name == NULL) { +QTAILQ_FOREACH(opt, opts-head, next) { +printf(%s=\%s\ , opt-name, opt-str); +} +return 0; +} +for (; desc desc-name; desc++) { +const char *value = desc-def_value_str; +QemuOpt *opt; + +opt = qemu_opt_find(opts, desc-name); +if (opt) { +value = opt-str; +} + +if (!value) { +continue; +} + +if (desc-type == QEMU_OPT_STRING) { +printf(%s='%s' , desc-name, value); +} else { +printf(%s=%s , desc-name, value); +} } -fprintf(stderr, \n); return 0; } -- 1.7.11.7
[Qemu-devel] [PATCH V15 2/6] avoid duplication of default value in QemuOpts
From: Dong Xu Wang wdon...@linux.vnet.ibm.com This patch will move the default value entirely to QemuOptDesc. When getting the value of an option that hasn't been set, and QemuOptDesc has a default value, return that. Else, behave as before. Example: qemu_opt_get_number(opts, foo, 42) If foo has been set in opts, return its value. Else, if opt's QemuOptDesc has a default value for foo, return that. Else, return 42. Note that the last argument is useless when QemuOptDesc has a default value. Ugly. If it bothers us, we could assert. Example: qemu_opt_get(opts, bar) If bar has been set in opts, return its value. Else, if opt's QemuOptDesc has a default value for bar, return that. Else, return NULL. Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com v13-v14: 1) change code style. 2) assert errors. Signed-off-by: Dongxu Wang wdon...@linux.vnet.ibm.com --- util/qemu-option.c | 66 -- 1 file changed, 49 insertions(+), 17 deletions(-) diff --git a/util/qemu-option.c b/util/qemu-option.c index 84d8c8b..bd2acdc 100644 --- a/util/qemu-option.c +++ b/util/qemu-option.c @@ -525,9 +525,31 @@ static QemuOpt *qemu_opt_find(QemuOpts *opts, const char *name) return NULL; } +static const QemuOptDesc *find_desc_by_name(const QemuOptDesc *desc, +const char *name) +{ +int i; + +for (i = 0; desc[i].name != NULL; i++) { +if (strcmp(desc[i].name, name) == 0) { +return desc[i]; +} +} + +return NULL; +} + const char *qemu_opt_get(QemuOpts *opts, const char *name) { QemuOpt *opt = qemu_opt_find(opts, name); +const QemuOptDesc *desc; + +if (!opt) { +desc = find_desc_by_name(opts-list-desc, name); +if (desc desc-def_value_str) { +return desc-def_value_str; +} +} return opt ? opt-str : NULL; } @@ -546,9 +568,17 @@ bool qemu_opt_has_help_opt(QemuOpts *opts) bool qemu_opt_get_bool(QemuOpts *opts, const char *name, bool defval) { QemuOpt *opt = qemu_opt_find(opts, name); +const QemuOptDesc *desc; +Error *local_err = NULL; -if (opt == NULL) +if (opt == NULL) { +desc = find_desc_by_name(opts-list-desc, name); +if (desc desc-def_value_str) { +parse_option_bool(name, desc-def_value_str, defval, local_err); +assert(!local_err); +} return defval; +} assert(opt-desc opt-desc-type == QEMU_OPT_BOOL); return opt-value.boolean; } @@ -556,9 +586,17 @@ bool qemu_opt_get_bool(QemuOpts *opts, const char *name, bool defval) uint64_t qemu_opt_get_number(QemuOpts *opts, const char *name, uint64_t defval) { QemuOpt *opt = qemu_opt_find(opts, name); +const QemuOptDesc *desc; +Error *local_err = NULL; -if (opt == NULL) +if (opt == NULL) { +desc = find_desc_by_name(opts-list-desc, name); +if (desc desc-def_value_str) { +parse_option_number(name, desc-def_value_str, defval, local_err); +assert(!local_err); +} return defval; +} assert(opt-desc opt-desc-type == QEMU_OPT_NUMBER); return opt-value.uint; } @@ -566,9 +604,17 @@ uint64_t qemu_opt_get_number(QemuOpts *opts, const char *name, uint64_t defval) uint64_t qemu_opt_get_size(QemuOpts *opts, const char *name, uint64_t defval) { QemuOpt *opt = qemu_opt_find(opts, name); +const QemuOptDesc *desc; +Error *local_err = NULL; -if (opt == NULL) +if (opt == NULL) { +desc = find_desc_by_name(opts-list-desc, name); +if (desc desc-def_value_str) { +parse_option_size(name, desc-def_value_str, defval, local_err); +assert(!local_err); +} return defval; +} assert(opt-desc opt-desc-type == QEMU_OPT_SIZE); return opt-value.uint; } @@ -609,20 +655,6 @@ static bool opts_accepts_any(const QemuOpts *opts) return opts-list-desc[0].name == NULL; } -static const QemuOptDesc *find_desc_by_name(const QemuOptDesc *desc, -const char *name) -{ -int i; - -for (i = 0; desc[i].name != NULL; i++) { -if (strcmp(desc[i].name, name) == 0) { -return desc[i]; -} -} - -return NULL; -} - static void opt_set(QemuOpts *opts, const char *name, const char *value, bool prepend, Error **errp) { -- 1.7.11.7
[Qemu-devel] [PATCH V15 0/6] replace QEMUOptionParameter with QemuOpts parser
These patches will replace QEMUOptionParameter with QemuOpts. Change logs please go to each patch's commit message. V14-V15: 1) Only delete enum QEMUOptionParType. Dong Xu Wang (6): add def_value_str in QemuOptDesc struct and rewrite qemu_opts_print avoid duplication of default value in QemuOpts Create four QemuOptsList related functions Create some QemuOpts functons Use QemuOpts support in block layer remove QEMUOptionParameter related functions and struct block.c | 100 block/cow.c | 52 ++--- block/gluster.c | 37 ++- block/iscsi.c | 31 ++- block/qcow.c | 67 +++--- block/qcow2.c | 199 block/qed.c | 108 + block/qed.h | 2 +- block/raw-posix.c | 59 +++-- block/raw-win32.c | 31 +-- block/raw.c | 30 +-- block/rbd.c | 62 +++-- block/sheepdog.c | 81 --- block/ssh.c | 29 ++- block/vdi.c | 70 +++--- block/vmdk.c | 128 ++- block/vpc.c | 65 +++--- block/vvfat.c | 11 +- include/block/block.h | 5 +- include/block/block_int.h | 6 +- include/qemu/option.h | 56 ++--- qemu-img.c| 65 +++--- util/qemu-option.c| 571 +- 23 files changed, 904 insertions(+), 961 deletions(-) -- 1.7.11.7
[Qemu-devel] [PATCH V15 4/6] Create some QemuOpts functons
From: Dong Xu Wang wdon...@linux.vnet.ibm.com These functions will be used in next commit. qemu_opt_get_(*)_del functions are used to make sure we have the same behaviors as before: in block layer, after parsing a parameter value, parameter list will delete it to avoid parsing it twice. Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com v13-v14: 1) rewrite commit message. 2) use def_value_str in qemu_opt_get_FOO_del() and qemu_opt_get_del(). 3) delete redundant qemu_opt_del(opt). Signed-off-by: Dongxu Wang wdon...@linux.vnet.ibm.com --- include/qemu/option.h | 11 - util/qemu-option.c| 116 +++--- 2 files changed, 119 insertions(+), 8 deletions(-) diff --git a/include/qemu/option.h b/include/qemu/option.h index c7a5c14..d63e447 100644 --- a/include/qemu/option.h +++ b/include/qemu/option.h @@ -108,6 +108,7 @@ struct QemuOptsList { }; const char *qemu_opt_get(QemuOpts *opts, const char *name); +const char *qemu_opt_get_del(QemuOpts *opts, const char *name); /** * qemu_opt_has_help_opt: * @opts: options to search for a help request @@ -121,13 +122,18 @@ const char *qemu_opt_get(QemuOpts *opts, const char *name); */ bool qemu_opt_has_help_opt(QemuOpts *opts); bool qemu_opt_get_bool(QemuOpts *opts, const char *name, bool defval); +bool qemu_opt_get_bool_del(QemuOpts *opts, const char *name, bool defval); uint64_t qemu_opt_get_number(QemuOpts *opts, const char *name, uint64_t defval); uint64_t qemu_opt_get_size(QemuOpts *opts, const char *name, uint64_t defval); +uint64_t qemu_opt_get_size_del(QemuOpts *opts, const char *name, + uint64_t defval); int qemu_opt_set(QemuOpts *opts, const char *name, const char *value); +int qemu_opt_replace_set(QemuOpts *opts, const char *name, const char *value); void qemu_opt_set_err(QemuOpts *opts, const char *name, const char *value, Error **errp); int qemu_opt_set_bool(QemuOpts *opts, const char *name, bool val); int qemu_opt_set_number(QemuOpts *opts, const char *name, int64_t val); +int qemu_opt_replace_set_number(QemuOpts *opts, const char *name, int64_t val); typedef int (*qemu_opt_loopfunc)(const char *name, const char *value, void *opaque); int qemu_opt_foreach(QemuOpts *opts, qemu_opt_loopfunc func, void *opaque, int abort_on_failure); @@ -144,7 +150,10 @@ const char *qemu_opts_id(QemuOpts *opts); void qemu_opts_del(QemuOpts *opts); void qemu_opts_validate(QemuOpts *opts, const QemuOptDesc *desc, Error **errp); int qemu_opts_do_parse(QemuOpts *opts, const char *params, const char *firstname); -QemuOpts *qemu_opts_parse(QemuOptsList *list, const char *params, int permit_abbrev); +int qemu_opts_do_parse_replace(QemuOpts *opts, const char *params, + const char *firstname); +QemuOpts *qemu_opts_parse(QemuOptsList *list, const char *params, + int permit_abbrev); void qemu_opts_set_defaults(QemuOptsList *list, const char *params, int permit_abbrev); QemuOpts *qemu_opts_from_qdict(QemuOptsList *list, const QDict *qdict, diff --git a/util/qemu-option.c b/util/qemu-option.c index e3f482f..3251adc 100644 --- a/util/qemu-option.c +++ b/util/qemu-option.c @@ -33,6 +33,8 @@ #include qapi/qmp/qerror.h #include qemu/option_int.h +static void qemu_opt_del(QemuOpt *opt); + /* * Extracts the name of an option from the parameter string (p points at the * first byte of the option name) @@ -553,6 +555,24 @@ const char *qemu_opt_get(QemuOpts *opts, const char *name) return opt ? opt-str : NULL; } +const char *qemu_opt_get_del(QemuOpts *opts, const char *name) +{ +QemuOpt *opt = qemu_opt_find(opts, name); +const QemuOptDesc *desc; +const char *str = NULL; + +if (!opt) { +desc = find_desc_by_name(opts-list-desc, name); +if (desc desc-def_value_str) { +str = g_strdup(desc-def_value_str); +} +} else { +str = g_strdup(opt-str); +qemu_opt_del(opt); +} +return str; +} + bool qemu_opt_has_help_opt(QemuOpts *opts) { QemuOpt *opt; @@ -583,6 +603,27 @@ bool qemu_opt_get_bool(QemuOpts *opts, const char *name, bool defval) return opt-value.boolean; } +bool qemu_opt_get_bool_del(QemuOpts *opts, const char *name, bool defval) +{ +QemuOpt *opt = qemu_opt_find(opts, name); +const QemuOptDesc *desc; +Error *local_err = NULL; +bool ret = defval; + +if (opt == NULL) { +desc = find_desc_by_name(opts-list-desc, name); +if (desc desc-def_value_str) { +parse_option_bool(name, desc-def_value_str, ret, local_err); +assert(!local_err); +} +return ret; +} +assert(opt-desc opt-desc-type == QEMU_OPT_BOOL); +ret = opt-value.boolean; +qemu_opt_del(opt); +return ret; +} + uint64_t qemu_opt_get_number(QemuOpts *opts, const char *name, uint64_t defval) {
[Qemu-devel] [PATCH V15 3/6] Create four QemuOptsList related functions
From: Dong Xu Wang wdon...@linux.vnet.ibm.com This patch will create 4 functions, count_opts_list, qemu_opts_append, qemu_opts_free and qemu_opts_print_help, they will be used in following commits. v12-v13: 1) simply assert that neither argument has merge_lists set. 2) drop superfluous paranthesesis around p == first. v11-v12: 1) renmae functions. 2) fix loop styles and code styles. 3) qemu_opts_apend will not return NULL now. 4) merge_lists value is from arguments in qemu_opts_append. v6-v7: 1) Fix typo. v5-v6: 1) allocate enough space in append_opts_list function. Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com Signed-off-by: Dongxu Wang wdon...@linux.vnet.ibm.com --- include/qemu/option.h | 3 ++ util/qemu-option.c| 82 +++ 2 files changed, 85 insertions(+) diff --git a/include/qemu/option.h b/include/qemu/option.h index b928ab0..c7a5c14 100644 --- a/include/qemu/option.h +++ b/include/qemu/option.h @@ -157,4 +157,7 @@ int qemu_opts_print(QemuOpts *opts); int qemu_opts_foreach(QemuOptsList *list, qemu_opts_loopfunc func, void *opaque, int abort_on_failure); +QemuOptsList *qemu_opts_append(QemuOptsList *first, QemuOptsList *second); +void qemu_opts_free(QemuOptsList *list); +void qemu_opts_print_help(QemuOptsList *list); #endif diff --git a/util/qemu-option.c b/util/qemu-option.c index bd2acdc..e3f482f 100644 --- a/util/qemu-option.c +++ b/util/qemu-option.c @@ -1218,3 +1218,85 @@ int qemu_opts_foreach(QemuOptsList *list, qemu_opts_loopfunc func, void *opaque, loc_pop(loc); return rc; } + +static size_t count_opts_list(QemuOptsList *list) +{ +size_t i = 0; + +for (i = 0; list list-desc[i].name; i++) { +; +} + +return i; +} + +/* Create a new QemuOptsList and make its desc to the merge of first + * and second. It will allocate space for one new QemuOptsList plus + * enough space for QemuOptDesc in first and second QemuOptsList. + * First argument's QemuOptDesc members take precedence over second's. + * The result's name and implied_opt_name are not copied from them. + * Both merge_lists should not be set. Both list can be NULL. + */ +QemuOptsList *qemu_opts_append(QemuOptsList *first, + QemuOptsList *second) +{ +size_t num_first_opts, num_second_opts; +QemuOptsList *dest = NULL; +int i = 0; +int index = 0; +QemuOptsList *p = first; + +num_first_opts = count_opts_list(first); +num_second_opts = count_opts_list(second); + +dest = g_malloc0(sizeof(QemuOptsList) ++ (num_first_opts + num_second_opts + 1) * sizeof(QemuOptDesc)); + +dest-name = append_opts_list; +dest-implied_opt_name = NULL; +assert((!first || !first-merge_lists) + (!second || !second-merge_lists)); +QTAILQ_INIT(dest-head); + +for (i = 0; p p-desc[i].name; i++) { +if (!find_desc_by_name(dest-desc, p-desc[i].name)) { +dest-desc[index].name = g_strdup(p-desc[i].name); +dest-desc[index].help = g_strdup(p-desc[i].help); +dest-desc[index].type = p-desc[i].type; +dest-desc[index].def_value_str = +g_strdup(p-desc[i].def_value_str); +index++; +} +if (p == first p !p-desc[i].name) { +p = second; +i = 0; +} +} +dest-desc[index].name = NULL; +return dest; +} + +/* free a QemuOptsList, can accept NULL as arguments */ +void qemu_opts_free(QemuOptsList *list) +{ +int i = 0; + +for (i = 0; list list-desc[i].name; i++) { +g_free((char *)list-desc[i].name); +g_free((char *)list-desc[i].help); +g_free((char *)list-desc[i].def_value_str); +} + +g_free(list); +} + +void qemu_opts_print_help(QemuOptsList *list) +{ +int i = 0; +printf(Supported options:\n); +for (i = 0; list list-desc[i].name; i++) { +printf(%-16s %s\n, list-desc[i].name, +list-desc[i].help ? +list-desc[i].help : No description available); +} +} -- 1.7.11.7
[Qemu-devel] [PATCH V15 6/6] remove QEMUOptionParameter related functions and struct
From: Dong Xu Wang wdon...@linux.vnet.ibm.com Signed-off-by: Dongxu Wang wdon...@linux.vnet.ibm.com --- include/qemu/option.h | 39 --- util/qemu-option.c| 285 -- 2 files changed, 324 deletions(-) diff --git a/include/qemu/option.h b/include/qemu/option.h index d63e447..d2d3f16 100644 --- a/include/qemu/option.h +++ b/include/qemu/option.h @@ -31,24 +31,6 @@ #include qapi/error.h #include qapi/qmp/qdict.h -enum QEMUOptionParType { -OPT_FLAG, -OPT_NUMBER, -OPT_SIZE, -OPT_STRING, -}; - -typedef struct QEMUOptionParameter { -const char *name; -enum QEMUOptionParType type; -union { -uint64_t n; -char* s; -} value; -const char *help; -} QEMUOptionParameter; - - const char *get_opt_name(char *buf, int buf_size, const char *p, char delim); const char *get_opt_value(char *buf, int buf_size, const char *p); int get_next_param_value(char *buf, int buf_size, @@ -58,27 +40,6 @@ int get_param_value(char *buf, int buf_size, int check_params(char *buf, int buf_size, const char * const *params, const char *str); - -/* - * The following functions take a parameter list as input. This is a pointer to - * the first element of a QEMUOptionParameter array which is terminated by an - * entry with entry-name == NULL. - */ - -QEMUOptionParameter *get_option_parameter(QEMUOptionParameter *list, -const char *name); -int set_option_parameter(QEMUOptionParameter *list, const char *name, -const char *value); -int set_option_parameter_int(QEMUOptionParameter *list, const char *name, -uint64_t value); -QEMUOptionParameter *append_option_parameters(QEMUOptionParameter *dest, -QEMUOptionParameter *list); -QEMUOptionParameter *parse_option_parameters(const char *param, -QEMUOptionParameter *list, QEMUOptionParameter *dest); -void free_option_parameters(QEMUOptionParameter *list); -void print_option_parameters(QEMUOptionParameter *list); -void print_option_help(QEMUOptionParameter *list); - /* -- */ typedef struct QemuOpt QemuOpt; diff --git a/util/qemu-option.c b/util/qemu-option.c index 3251adc..e1b2a3b 100644 --- a/util/qemu-option.c +++ b/util/qemu-option.c @@ -155,22 +155,6 @@ int check_params(char *buf, int buf_size, return 0; } -/* - * Searches an option list for an option with the given name - */ -QEMUOptionParameter *get_option_parameter(QEMUOptionParameter *list, -const char *name) -{ -while (list list-name) { -if (!strcmp(list-name, name)) { -return list; -} -list++; -} - -return NULL; -} - static void parse_option_bool(const char *name, const char *value, bool *ret, Error **errp) { @@ -244,275 +228,6 @@ static void parse_option_size(const char *name, const char *value, } } -/* - * Sets the value of a parameter in a given option list. The parsing of the - * value depends on the type of option: - * - * OPT_FLAG (uses value.n): - * If no value is given, the flag is set to 1. - * Otherwise the value must be on (set to 1) or off (set to 0) - * - * OPT_STRING (uses value.s): - * value is strdup()ed and assigned as option value - * - * OPT_SIZE (uses value.n): - * The value is converted to an integer. Suffixes for kilobytes etc. are - * allowed (powers of 1024). - * - * Returns 0 on succes, -1 in error cases - */ -int set_option_parameter(QEMUOptionParameter *list, const char *name, -const char *value) -{ -bool flag; -Error *local_err = NULL; - -// Find a matching parameter -list = get_option_parameter(list, name); -if (list == NULL) { -fprintf(stderr, Unknown option '%s'\n, name); -return -1; -} - -// Process parameter -switch (list-type) { -case OPT_FLAG: -parse_option_bool(name, value, flag, local_err); -if (!error_is_set(local_err)) { -list-value.n = flag; -} -break; - -case OPT_STRING: -if (value != NULL) { -list-value.s = g_strdup(value); -} else { -fprintf(stderr, Option '%s' needs a parameter\n, name); -return -1; -} -break; - -case OPT_SIZE: -parse_option_size(name, value, list-value.n, local_err); -break; - -default: -fprintf(stderr, Bug: Option '%s' has an unknown type\n, name); -return -1; -} - -if (error_is_set(local_err)) { -qerror_report_err(local_err); -error_free(local_err); -return -1; -} - -return 0; -} - -/* - * Sets the given parameter to an integer instead of a string. - * This function cannot be used to set string options. - * - * Returns 0 on success, -1 in error cases - */ -int set_option_parameter_int(QEMUOptionParameter *list, const char *name, -uint64_t value) -{ -// Find a matching parameter -list
[Qemu-devel] Unable to parse -device drivers containing commas?
Hi all, I found that the QEMU -device command line parser doesn't seem to like driver names containing a comma such as SUNW,tcx for the video driver on qemu-system-sparc: $ ./qemu-system-sparc -device SUNW,tcx,help qemu-system-sparc: -device SUNW,tcx,help: Parameter 'driver' expects device type $ ./qemu-system-sparc -device 'SUNW,tcx',help qemu-system-sparc: -device SUNW,tcx,help: Parameter 'driver' expects device type $ ./qemu-system-sparc -device SUNW,tcx,help qemu-system-sparc: -device SUNW,tcx,help: Parameter 'driver' expects device type If I try temporarily removing the comma from the TypeInfo name field in hw/display/tcx.c then all is fine: $ ./qemu-system-sparc -device 'SUNWtcx',help SUNWtcx.vram_size=hex32 SUNWtcx.width=uint16 SUNWtcx.height=uint16 SUNWtcx.depth=uint16 $ ./qemu-system-sparc -device SUNWtcx,help SUNWtcx.vram_size=hex32 SUNWtcx.width=uint16 SUNWtcx.height=uint16 SUNWtcx.depth=uint16 Note that there are a couple of other devices in the SPARC32 device tree with this problem, since the general device naming convention on SPARC is in the form manufacturer,device. Is there a way of escaping the commas on the command line so that it is possible to list properties for drivers named in this way? Many thanks, Mark.
[Qemu-devel] [PATCH V19 0/8] add-cow file format
It will introduce a new file format: add-cow. The add-cow file format makes it possible to perform copy-on-write on top of a raw disk image. When we know that no backing file clusters remain visible (e.g. we have streamed the entire image and copied all data from the backing file), then it is possible to discard the add-cow file and use the raw image file directly. This feature adds the copy-on-write feature to raw files (which cannot support it natively) while allowing us to get full performance again later when we no longer need copy-on-write. add-cow can benefit from other available functions, such as path_has_protocol and qed_read_string, so we will make them public. snapshot_blkdev are not supported now for add-cow. Will add it in futher patches. These patches are using QemuOpts parser, former patches could be found here: http://patchwork.ozlabs.org/patch/247508/ v18-v19: 1) support parallel aio write. 2) fix flush method. 3) other small fix. v17 - v18: 1) remove version field. 2) header size is maximum value and cluster size value. 3) fix type. 4) move struct to source file. 5) cluster_size-table_size. 6) use error_report, not fprintf. 7) remove version field from header. 8) header_size is MAX(cluster_size, 4096). 9) introduce s-cluster_sectors. 10) use BLKDBG_L2_LOAD/UPDATE. 11) add 037 and 038 tests. v16-v17): 1) Use stringify. v15-v16): 1) Rebased on QEMU upstream source tree. 2) Judge if opts is null in add_cow_create function. v14-v15: 1) Fix typo and make some sentences more readable in docs. 2) Introduce STRINGIZER macro. v13-v14: 1) Make some sentences more clear in docs. 2) Make MAGIC from 8 bytes to 4 bytes. 3) Fix some bugs. v12-v13: 1) Use QemuOpts, not QEMUOptionParameter 2) cluster_size configuable 3) Refactor block-cache.c 4) Correct qemu-iotests script. 5) Other bug fix. v11-v12: 1) Removed un-used feature bit. 2) Share cache code with qcow2.c. 3) Remove snapshot_blkdev support, will add it in another patch. 5) COW Bitmap field in add-cow file will be multiple of 65536. 6) fix grammer and typo. Dong Xu Wang (8): V18: docs: document for add-cow file format make path_has_protocol non static qed_read_string to bdrv_read_string rename qcow2-cache.c to block-cache.c Make block-cache.c be common interface add debug event for add-cow add-cow file format core code. qemu-iotests: add add-cow iotests support block.c | 29 +- block/Makefile.objs | 4 +- block/add-cow.c | 812 +++ block/blkdebug.c | 3 + block/block-cache.c | 342 ++ block/qcow2-cache.c | 323 - block/qcow2-cluster.c| 52 +-- block/qcow2-refcount.c | 62 ++-- block/qcow2.c| 21 +- block/qcow2.h| 24 +- block/qed.c | 34 +- docs/specs/add-cow.txt | 172 + include/block/block-cache.h | 59 include/block/block.h| 6 + include/block/block_int.h| 2 + tests/qemu-iotests/017 | 2 +- tests/qemu-iotests/020 | 2 +- tests/qemu-iotests/037 | 2 +- tests/qemu-iotests/038 | 2 +- tests/qemu-iotests/common| 6 + tests/qemu-iotests/common.rc | 15 +- trace-events | 13 +- 22 files changed, 1536 insertions(+), 451 deletions(-) create mode 100644 block/add-cow.c create mode 100644 block/block-cache.c delete mode 100644 block/qcow2-cache.c create mode 100644 docs/specs/add-cow.txt create mode 100644 include/block/block-cache.h -- 1.7.11.7
[Qemu-devel] [PATCH V19 1/8] V18: docs: document for add-cow file format
From: Dong Xu Wang wdon...@linux.vnet.ibm.com Document for add-cow format, the usage and spec of add-cow are introduced. v18-v19: 1) backing_fmt and image_fmt NUL-terminated. 2) other fix. V17-V18: 1) remove version field. 2) header size is maximum value and cluster size value. 3) fix type. Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com Signed-off-by: Dongxu Wang wdon...@linux.vnet.ibm.com --- docs/specs/add-cow.txt | 172 + 1 file changed, 172 insertions(+) create mode 100644 docs/specs/add-cow.txt diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt new file mode 100644 index 000..fba74dd --- /dev/null +++ b/docs/specs/add-cow.txt @@ -0,0 +1,172 @@ +== General == + +The raw file format does not support backing files or copy on write +feature. The add-cow image format makes it possible to use backing +files with an image by keeping a separate .add-cow metadata file. +Once all clusters have been written into the image it is safe to +discard the .add-cow and backing files, then we can use the image +directly. + +An example usage of add-cow would look like: +(ubuntu.img is a disk image which has an installed OS.) +1) Create an image, such as raw format, with the same size of +ubuntu.img: +qemu-img create -f raw test.raw 8G +2) Create an add-cow image which will store dirty bitmap +qemu-img create -f add-cow test.add-cow \ +-o backing_file=ubuntu.img,image_file=test.raw +3) Run qemu with add-cow image +qemu -drive if=virtio,file=test.add-cow + +test.raw may be larger than ubuntu.img, in that case, the size of +test.add-cow will be calculated from the size of test.raw. + +image_fmt can be omitted, in that case image_fmt is assumed to be +raw. backing_fmt can also be omitted, add-cow should do a probe +operation and determine what the backing file's format is. It is +recommended to always specify the format for any raw file, because +probing a raw file is a security hole. + +=Specification= + +The file format looks like this: + + +---+---+ + | Header| COW bitmap | + +---+---+ + +All numbers in add-cow are stored in Little Endian byte order. + +== Header == + +The Header is included in the first bytes: +(HEADER_SIZE is defined in 40-43 bytes.) +Byte0 - 3:magic +add-cow magic string (ACOW). It is coded in +free-form ASCII. + +4 - 7:backing file name offset +Offset in the add-cow file at which the backing +file name is stored (NB: The string is NOT +NUL-terminated). +If backing file name does NOT exist, this field +will be 0. Must be between 76 and [HEADER_SIZE +- 2](a file name must be at least 1 byte). + +8 - 11:backing file name size +Length of the backing file name in bytes. It +will be 0 if the backing file name offset is +0. If backing file name offset is non-zero, +then it must be non-zero. Must be less than +[HEADER_SIZE - 76] to fit in the reserved +part of the header. Backing file name offset ++ size must be no more than HEADER_SIZE. + +12 - 15:image file name offset +Offset in the add-cow file at which the image +file name is stored (NB: The string is NOT +NUL-terminated). It must be between 76 and +[HEADER_SIZE - 2]. Image file name size + offset +must be no more than HEADER_SIZE. + +16 - 19:image file name size +Length of the image file name in bytes. +Must be less than [HEADER_SIZE - 76] to fit in +the reserved part of the header. + +20 - 23:cluster bits +Number of bits that are used for addressing an +offset within a cluster (1 cluster_bits is +the cluster size). Must not be less than 12 +(i.e. 4096 byte clusters). + +Note: qemu as of today has an implementation +limit of 2 MB as the maximum cluster size and +won't be able to open images with larger cluster +sizes. + +24 - 31:features +Bitmask of features. If a feature bit is set +but not recognized, the opening add-cow file must +fail. No features bits are currently defined. + +
[Qemu-devel] [PATCH V19 4/8] rename qcow2-cache.c to block-cache.c
From: Dong Xu Wang wdon...@linux.vnet.ibm.com Block layer will use qcow2-cache as common cache code, so rename it to block-cache.c. v18-v19: 1) only rename, did not touch other code. Signed-off-by: Dongxu Wang wdon...@linux.vnet.ibm.com --- block/Makefile.objs| 3 +- block/block-cache.c| 323 + block/qcow2-cache.c| 323 - block/qcow2-cluster.c | 48 block/qcow2-refcount.c | 42 +++ block/qcow2.c | 18 +-- block/qcow2.h | 28 ++--- trace-events | 12 +- 8 files changed, 399 insertions(+), 398 deletions(-) create mode 100644 block/block-cache.c delete mode 100644 block/qcow2-cache.c diff --git a/block/Makefile.objs b/block/Makefile.objs index 5f0358a..16e574a 100644 --- a/block/Makefile.objs +++ b/block/Makefile.objs @@ -1,5 +1,6 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o -block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o +block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o +block-obj-y += block-cache.o block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o block-obj-y += qed-check.o block-obj-y += vhdx.o diff --git a/block/block-cache.c b/block/block-cache.c new file mode 100644 index 000..bc057a8 --- /dev/null +++ b/block/block-cache.c @@ -0,0 +1,323 @@ +/* + * L2/refcount table cache for the QCOW2 format + * + * Copyright (c) 2010 Kevin Wolf kw...@redhat.com + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the Software), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include block/block_int.h +#include qemu-common.h +#include qcow2.h +#include trace.h + +typedef struct BlockCachedTable { +void* table; +int64_t offset; +booldirty; +int cache_hits; +int ref; +} BlockCachedTable; + +struct BlockCache { +BlockCachedTable* entries; +struct BlockCache* depends; +int size; +booldepends_on_flush; +}; + +BlockCache *block_cache_create(BlockDriverState *bs, int num_tables) +{ +BDRVQcowState *s = bs-opaque; +BlockCache *c; +int i; + +c = g_malloc0(sizeof(*c)); +c-size = num_tables; +c-entries = g_malloc0(sizeof(*c-entries) * num_tables); + +for (i = 0; i c-size; i++) { +c-entries[i].table = qemu_blockalign(bs, s-cluster_size); +} + +return c; +} + +int block_cache_destroy(BlockDriverState* bs, BlockCache *c) +{ +int i; + +for (i = 0; i c-size; i++) { +assert(c-entries[i].ref == 0); +qemu_vfree(c-entries[i].table); +} + +g_free(c-entries); +g_free(c); + +return 0; +} + +static int block_cache_flush_dependency(BlockDriverState *bs, BlockCache *c) +{ +int ret; + +ret = block_cache_flush(bs, c-depends); +if (ret 0) { +return ret; +} + +c-depends = NULL; +c-depends_on_flush = false; + +return 0; +} + +static int block_cache_entry_flush(BlockDriverState *bs, BlockCache *c, int i) +{ +BDRVQcowState *s = bs-opaque; +int ret = 0; + +if (!c-entries[i].dirty || !c-entries[i].offset) { +return 0; +} + +trace_block_cache_entry_flush(qemu_coroutine_self(), + c == s-l2_table_cache, i); + +if (c-depends) { +ret = block_cache_flush_dependency(bs, c); +} else if (c-depends_on_flush) { +ret = bdrv_flush(bs-file); +if (ret = 0) { +c-depends_on_flush = false; +} +} + +if (ret 0) { +return ret; +} + +if (c == s-refcount_block_cache) { +BLKDBG_EVENT(bs-file, BLKDBG_REFBLOCK_UPDATE_PART); +} else if (c == s-l2_table_cache) { +BLKDBG_EVENT(bs-file, BLKDBG_L2_UPDATE); +} + +ret = bdrv_pwrite(bs-file, c-entries[i].offset, c-entries[i].table, +s-cluster_size); +if (ret 0) { +return
[Qemu-devel] [PATCH V19 3/8] qed_read_string to bdrv_read_string
From: Dong Xu Wang wdon...@linux.vnet.ibm.com Make qed_read_string function to a common interface, so move it to block.c. Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com Signed-off-by: Dongxu Wang wdon...@linux.vnet.ibm.com --- block.c | 27 +++ block/qed.c | 34 -- include/block/block.h | 2 ++ 3 files changed, 33 insertions(+), 30 deletions(-) diff --git a/block.c b/block.c index 5135044..65c5e76 100644 --- a/block.c +++ b/block.c @@ -210,6 +210,33 @@ int path_has_protocol(const char *path) return *p == ':'; } +/** + * Read a string of known length from the image file + * + * @bs: Image file + * @offset: File offset to start of string, in bytes + * @n: String length in bytes + * @buf:Destination buffer + * @buflen: Destination buffer length in bytes + * @ret:0 on success, -errno on failure + * + * The string is NUL-terminated. + */ +int bdrv_read_string(BlockDriverState *bs, uint64_t offset, size_t n, + char *buf, size_t buflen) +{ +int ret; +if (n = buflen) { +return -EINVAL; +} +ret = bdrv_pread(bs, offset, buf, n); +if (ret 0) { +return ret; +} +buf[n] = '\0'; +return 0; +} + int path_is_absolute(const char *path) { #ifdef _WIN32 diff --git a/block/qed.c b/block/qed.c index 88daf22..077da76 100644 --- a/block/qed.c +++ b/block/qed.c @@ -217,33 +217,6 @@ static bool qed_is_image_size_valid(uint64_t image_size, uint32_t cluster_size, } /** - * Read a string of known length from the image file - * - * @file: Image file - * @offset: File offset to start of string, in bytes - * @n: String length in bytes - * @buf:Destination buffer - * @buflen: Destination buffer length in bytes - * @ret:0 on success, -errno on failure - * - * The string is NUL-terminated. - */ -static int qed_read_string(BlockDriverState *file, uint64_t offset, size_t n, - char *buf, size_t buflen) -{ -int ret; -if (n = buflen) { -return -EINVAL; -} -ret = bdrv_pread(file, offset, buf, n); -if (ret 0) { -return ret; -} -buf[n] = '\0'; -return 0; -} - -/** * Allocate new clusters * * @s: QED state @@ -437,9 +410,10 @@ static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags) return -EINVAL; } -ret = qed_read_string(bs-file, s-header.backing_filename_offset, - s-header.backing_filename_size, bs-backing_file, - sizeof(bs-backing_file)); +ret = bdrv_read_string(bs-file, s-header.backing_filename_offset, + s-header.backing_filename_size, + bs-backing_file, + sizeof(bs-backing_file)); if (ret 0) { return ret; } diff --git a/include/block/block.h b/include/block/block.h index fd776a1..2989da6 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -175,6 +175,8 @@ int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset, const void *buf, int count); int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num, int nb_sectors, QEMUIOVector *qiov); +int bdrv_read_string(BlockDriverState *bs, uint64_t offset, size_t n, + char *buf, size_t buflen); int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs, int64_t sector_num, int nb_sectors, QEMUIOVector *qiov); int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num, -- 1.7.11.7
[Qemu-devel] [PATCH V19 2/8] make path_has_protocol non static
From: Dong Xu Wang wdon...@linux.vnet.ibm.com We will use path_has_protocol outside block.c, so just make it public. Reviewed-by: Michael Roth mdr...@linux.vnet.ibm.com Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com Signed-off-by: Dongxu Wang wdon...@linux.vnet.ibm.com --- block.c | 2 +- include/block/block.h | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/block.c b/block.c index 710362f..5135044 100644 --- a/block.c +++ b/block.c @@ -193,7 +193,7 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs, } /* check if the path starts with protocol: */ -static int path_has_protocol(const char *path) +int path_has_protocol(const char *path) { const char *p; diff --git a/include/block/block.h b/include/block/block.h index 90f95ef..fd776a1 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -345,6 +345,7 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, QEMUSnapshotInfo *sn); char *get_human_readable_size(char *buf, int buf_size, int64_t size); int path_is_absolute(const char *path); +int path_has_protocol(const char *path); void path_combine(char *dest, int dest_size, const char *base_path, const char *filename); -- 1.7.11.7
[Qemu-devel] [PATCH V19 8/8] qemu-iotests: add add-cow iotests support
From: Dong Xu Wang wdon...@linux.vnet.ibm.com This patch will use qemu-iotests to test add-cow file format. v17-v18: 1) add 037 and 038 tests. Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com Signed-off-by: Dongxu Wang wdon...@linux.vnet.ibm.com --- tests/qemu-iotests/017 | 2 +- tests/qemu-iotests/020 | 2 +- tests/qemu-iotests/037 | 2 +- tests/qemu-iotests/038 | 2 +- tests/qemu-iotests/common| 6 ++ tests/qemu-iotests/common.rc | 15 ++- 6 files changed, 24 insertions(+), 5 deletions(-) diff --git a/tests/qemu-iotests/017 b/tests/qemu-iotests/017 index 45f2c0b..ae0e627 100755 --- a/tests/qemu-iotests/017 +++ b/tests/qemu-iotests/017 @@ -40,7 +40,7 @@ trap _cleanup; exit \$status 0 1 2 3 15 . ./common.pattern # Any format supporting backing files -_supported_fmt qcow qcow2 vmdk qed +_supported_fmt qcow qcow2 vmdk qed add-cow _supported_proto generic _supported_os Linux diff --git a/tests/qemu-iotests/020 b/tests/qemu-iotests/020 index 2fb0ff8..3dbb495 100755 --- a/tests/qemu-iotests/020 +++ b/tests/qemu-iotests/020 @@ -42,7 +42,7 @@ trap _cleanup; exit \$status 0 1 2 3 15 . ./common.pattern # Any format supporting backing files -_supported_fmt qcow qcow2 vmdk qed +_supported_fmt qcow qcow2 vmdk qed add-cow _supported_proto generic _supported_os Linux diff --git a/tests/qemu-iotests/037 b/tests/qemu-iotests/037 index c11460b..683e73d 100755 --- a/tests/qemu-iotests/037 +++ b/tests/qemu-iotests/037 @@ -38,7 +38,7 @@ trap _cleanup; exit \$status 0 1 2 3 15 . ./common.rc . ./common.filter -_supported_fmt qcow qcow2 vmdk qed +_supported_fmt qcow qcow2 vmdk qed add-cow _supported_proto generic _supported_os Linux diff --git a/tests/qemu-iotests/038 b/tests/qemu-iotests/038 index 36125ea..10bacfb 100755 --- a/tests/qemu-iotests/038 +++ b/tests/qemu-iotests/038 @@ -38,7 +38,7 @@ trap _cleanup; exit \$status 0 1 2 3 15 . ./common.rc . ./common.filter -_supported_fmt qcow2 qed +_supported_fmt qcow2 qed add-cow _supported_proto generic _supported_os Linux diff --git a/tests/qemu-iotests/common b/tests/qemu-iotests/common index 6826ea7..5110fc1 100644 --- a/tests/qemu-iotests/common +++ b/tests/qemu-iotests/common @@ -128,6 +128,7 @@ common options check options -rawtest raw (default) -cowtest cow +-add-cowtest add-cow -qcow test qcow -qcow2 test qcow2 -qedtest qed @@ -165,6 +166,11 @@ testlist options xpand=false ;; +-add-cow) +IMGFMT=add-cow +xpand=false +;; + -qcow) IMGFMT=qcow xpand=false diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc index e9ba358..b203f04 100644 --- a/tests/qemu-iotests/common.rc +++ b/tests/qemu-iotests/common.rc @@ -110,6 +110,16 @@ _make_test_img() fi if [ \( $IMGFMT = qcow2 -o $IMGFMT = qed \) -a -n $CLUSTER_SIZE ]; then optstr=$(_optstr_add $optstr cluster_size=$CLUSTER_SIZE) +elif [ $IMGFMT = add-cow ]; then +local IMG=$TEST_IMG.raw +if [ $1 = -b ]; then +IMG=$IMG.b +$QEMU_IMG create -f raw $IMG $image_size/dev/null +extra_img_options=-o image_file=$IMG $extra_img_options +else +$QEMU_IMG create -f raw $IMG $image_size/dev/null +extra_img_options=-o image_file=$IMG +fi fi if [ -n $optstr ]; then @@ -130,7 +140,8 @@ _make_test_img() -e s# zeroed_grain=\\(on\\|off\\)##g \ -e s# subformat='[^']*'##g \ -e s# adapter_type='[^']*'##g \ --e s# lazy_refcounts=\\(on\\|off\\)##g +-e s# lazy_refcounts=\\(on\\|off\\)##g \ +-e s# image_file='[^']*'##g # Start an NBD server on the image file, which is what we'll be talking to if [ $IMGPROTO = nbd ]; then @@ -152,6 +163,8 @@ _cleanup_test_img() rm -f $TEST_DIR/t.$IMGFMT rm -f $TEST_DIR/t.$IMGFMT.orig rm -f $TEST_DIR/t.$IMGFMT.base +rm -f $TEST_DIR/t.$IMGFMT.raw +rm -f $TEST_DIR/t.$IMGFMT.raw.b ;; rbd) -- 1.7.11.7
[Qemu-devel] [PATCH V19 7/8] add-cow file format core code.
From: Dong Xu Wang wdon...@linux.vnet.ibm.com add-cow file format core code. It use block-cache.c as cache code. It lacks of snapshot_blkdev support. v18-v19: 1) add aio parallel write support. 2) fix flush method. v17-v18: 1) use error_report, not fprintf. 2) remove version field from header. 3) header_size is MAX(cluster_size, 4096). 4) introduce s-cluster_sectors. 5) use BLKDBG_L2_LOAD/UPDATE. v16-v17: 1) Use stringify. v15-v16: 1) Judge if opts is null in add_cow_create function. Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com Conflicts: block/block-cache.c Signed-off-by: Dongxu Wang wdon...@linux.vnet.ibm.com --- block/Makefile.objs | 1 + block/add-cow.c | 812 block/block-cache.c | 16 +- include/block/block-cache.h | 4 + include/block/block_int.h | 2 + 5 files changed, 831 insertions(+), 4 deletions(-) create mode 100644 block/add-cow.c diff --git a/block/Makefile.objs b/block/Makefile.objs index 16e574a..f666a7a 100644 --- a/block/Makefile.objs +++ b/block/Makefile.objs @@ -1,6 +1,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o block-obj-y += block-cache.o +block-obj-y += add-cow.o block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o block-obj-y += qed-check.o block-obj-y += vhdx.o diff --git a/block/add-cow.c b/block/add-cow.c new file mode 100644 index 000..290da3c --- /dev/null +++ b/block/add-cow.c @@ -0,0 +1,812 @@ +/* + * QEMU ADD-COW Disk Format + * + * Copyright IBM, Corp. 2012 + * + * Authors: + * Dong Xu Wang wdon...@linux.vnet.ibm.com + * + * This work is licensed under the terms of the GNU LGPL, version 2 or later. + * See the COPYING.LIB file in the top-level directory. + * + */ + +#include qemu-common.h +#include block/block_int.h +#include qemu/module.h +#include block/block-cache.h + +#define ACOW_CLUSTER_SIZE 65536 + +typedef struct AddCowMeta +{ +uint64_t start_sector; +int remaining_sectors; +CoQueue dependent_requests; +struct AddCowMeta *next; + +QLIST_ENTRY(AddCowMeta) next_in_flight; +} AddCowMeta; + +enum { +/* compat_features bit */ +ACOW_F_ALL_ALLOCATED= 0X01, + +/* none feature bit used now */ +ACOW_FEATURE_MASK = 0, + +ACOW_MAGIC = 'A' | 'C' 8 | 'O' 16 | 'W' 24, +ACOW_CACHE_SIZE = 16, +DEFAULT_HEADER_SIZE = 4096, +MIN_CLUSTER_BITS= 12, +MAX_CLUSTER_BITS= 21, +}; + +typedef struct AddCowHeader { +uint32_tmagic; + +uint32_tbacking_offset; +uint32_tbacking_size; +uint32_timage_offset; +uint32_timage_size; + +uint32_tcluster_bits; +uint64_tfeatures; +uint64_tcompat_features; +uint32_theader_size; + +charbacking_fmt[16]; +charimage_fmt[16]; +} QEMU_PACKED AddCowHeader; + +typedef struct BDRVAddCowState { +BlockDriverState*image_hd; +CoMutex lock; +int cluster_size; +int cluster_sectors; +BlockCache *bitmap_cache; +uint64_tbitmap_size; +AddCowHeaderheader; +charbacking_fmt[16]; +charimage_fmt[16]; +BlockCache *image_hd_cache; +QLIST_HEAD(AddCowAlloc, AddCowMeta) cluster_allocs; +} BDRVAddCowState; + +/* Convert sector_num to offset in bitmap */ +static inline int64_t offset_in_bitmap(int64_t sector_num, + int64_t cluster_sectors) +{ +int64_t cluster_num = sector_num / cluster_sectors; +return cluster_num / 8; +} + +static inline bool is_cluster_head(int64_t sector_num, + int64_t cluster_sectors) +{ +return sector_num % cluster_sectors == 0; +} + +static inline bool is_cluster_tail(int64_t sector_num, + int64_t cluster_sectors) +{ +return (sector_num + 1) % cluster_sectors == 0; +} + +static void add_cow_header_le_to_cpu(const AddCowHeader *le, AddCowHeader *cpu) +{ +cpu-magic = le32_to_cpu(le-magic); + +cpu-backing_offset = le32_to_cpu(le-backing_offset); +cpu-backing_size = le32_to_cpu(le-backing_size); +cpu-image_offset = le32_to_cpu(le-image_offset); +cpu-image_size = le32_to_cpu(le-image_size); + +cpu-cluster_bits = le32_to_cpu(le-cluster_bits); +cpu-features = le64_to_cpu(le-features); +cpu-compat_features= le64_to_cpu(le-compat_features); +cpu-header_size= le32_to_cpu(le-header_size); + +memcpy(cpu-backing_fmt, le-backing_fmt, sizeof(cpu-backing_fmt)); +memcpy(cpu-image_fmt, le-image_fmt, sizeof(cpu-image_fmt)); +} + +static void add_cow_header_cpu_to_le(const AddCowHeader *cpu, AddCowHeader *le) +{ +le-magic =
[Qemu-devel] [PATCH V19 5/8] Make block-cache.c be common interface
From: Dong Xu Wang wdon...@linux.vnet.ibm.com To use block-cache.c be common cache interface, we need to add some parameter to identify cache's type. Define a struct named BlockTableType, pass BlockTableType and table size parameters to block cache initialization function. v17-v18: 1) move struct to source file. 2) cluster_size-table_size. Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com Signed-off-by: Dongxu Wang wdon...@linux.vnet.ibm.com --- block/block-cache.c | 75 ++--- block/qcow2-cluster.c | 30 +- block/qcow2-refcount.c | 30 +- block/qcow2.c | 7 +++-- block/qcow2.h | 24 ++- include/block/block-cache.h | 55 + trace-events| 1 + 7 files changed, 139 insertions(+), 83 deletions(-) create mode 100644 include/block/block-cache.h diff --git a/block/block-cache.c b/block/block-cache.c index bc057a8..f5d75d1 100644 --- a/block/block-cache.c +++ b/block/block-cache.c @@ -1,4 +1,8 @@ /* + * QEMU Block Layer Cache + * + * This file is based on qcow2-cache.c, see its copyrights below: + * * L2/refcount table cache for the QCOW2 format * * Copyright (c) 2010 Kevin Wolf kw...@redhat.com @@ -24,11 +28,11 @@ #include block/block_int.h #include qemu-common.h -#include qcow2.h #include trace.h +#include block/block-cache.h typedef struct BlockCachedTable { -void* table; +void*table; int64_t offset; booldirty; int cache_hits; @@ -36,30 +40,34 @@ typedef struct BlockCachedTable { } BlockCachedTable; struct BlockCache { -BlockCachedTable* entries; -struct BlockCache* depends; -int size; -booldepends_on_flush; +BlockCachedTable*entries; +struct BlockCache *depends; +int size; +size_t table_size; +BlockTableType table_type; +booldepends_on_flush; }; -BlockCache *block_cache_create(BlockDriverState *bs, int num_tables) +BlockCache *block_cache_create(BlockDriverState *bs, int num_tables, + size_t table_size, BlockTableType type) { -BDRVQcowState *s = bs-opaque; BlockCache *c; int i; c = g_malloc0(sizeof(*c)); c-size = num_tables; c-entries = g_malloc0(sizeof(*c-entries) * num_tables); +c-table_type = type; +c-table_size = table_size; for (i = 0; i c-size; i++) { -c-entries[i].table = qemu_blockalign(bs, s-cluster_size); +c-entries[i].table = qemu_blockalign(bs, table_size); } return c; } -int block_cache_destroy(BlockDriverState* bs, BlockCache *c) +int block_cache_destroy(BlockDriverState *bs, BlockCache *c) { int i; @@ -91,15 +99,13 @@ static int block_cache_flush_dependency(BlockDriverState *bs, BlockCache *c) static int block_cache_entry_flush(BlockDriverState *bs, BlockCache *c, int i) { -BDRVQcowState *s = bs-opaque; int ret = 0; if (!c-entries[i].dirty || !c-entries[i].offset) { return 0; } -trace_block_cache_entry_flush(qemu_coroutine_self(), - c == s-l2_table_cache, i); +trace_block_cache_entry_flush(qemu_coroutine_self(), c-table_type, i); if (c-depends) { ret = block_cache_flush_dependency(bs, c); @@ -114,14 +120,16 @@ static int block_cache_entry_flush(BlockDriverState *bs, BlockCache *c, int i) return ret; } -if (c == s-refcount_block_cache) { +if (c-table_type == BLOCK_TABLE_REF) { BLKDBG_EVENT(bs-file, BLKDBG_REFBLOCK_UPDATE_PART); -} else if (c == s-l2_table_cache) { +} else if (c-table_type == BLOCK_TABLE_L2) { BLKDBG_EVENT(bs-file, BLKDBG_L2_UPDATE); +} else if (c-table_type == BLOCK_TABLE_BITMAP) { +BLKDBG_EVENT(bs-file, BLKDBG_COW_WRITE); } -ret = bdrv_pwrite(bs-file, c-entries[i].offset, c-entries[i].table, -s-cluster_size); +ret = bdrv_pwrite(bs-file, c-entries[i].offset, + c-entries[i].table, c-table_size); if (ret 0) { return ret; } @@ -133,12 +141,11 @@ static int block_cache_entry_flush(BlockDriverState *bs, BlockCache *c, int i) int block_cache_flush(BlockDriverState *bs, BlockCache *c) { -BDRVQcowState *s = bs-opaque; int result = 0; int ret; int i; -trace_block_cache_flush(qemu_coroutine_self(), c == s-l2_table_cache); +trace_block_cache_flush(qemu_coroutine_self(), c-table_type); for (i = 0; i c-size; i++) { ret = block_cache_entry_flush(bs, c, i); @@ -157,8 +164,9 @@ int block_cache_flush(BlockDriverState *bs, BlockCache *c) return result; } -int block_cache_set_dependency(BlockDriverState *bs, BlockCache *c, -BlockCache *dependency) +int block_cache_set_dependency(BlockDriverState *bs, +
Re: [Qemu-devel] [PATCH 5/5] memory: able to pin guest node memory to host node manually
Any comments? Use mbind to pin guest numa node memory to host nodes manually. If we are not able to pin memory to host node, we may meet the cross node memory access performance regression. With this patch, we can add manual pinning host node like this: -m 1024 -numa node,cpus=0,nodeid=0,mem=512,pin=0 -numa node,nodeid=1,cpus=1,mem=512,pin=1 And, if PCI-passthrough is used, direct-attached-device uses DMA transfer between device and qemu process. All pages of the guest will be pinned by get_user_pages(). KVM_ASSIGN_PCI_DEVICE ioctl kvm_vm_ioctl_assign_device() =kvm_assign_device() = kvm_iommu_map_memslots() = kvm_iommu_map_pages() = kvm_pin_pages() So, with direct-attached-device, all guest page's page count will be +1 and any page migration will not work. AutoNUMA won't too. And direction by libvirt is *ignored*. Above all, we need manual pinning memory to host node to avoid such cross nodes memmory access performance regression. Signed-off-by: Wanlong Gao gaowanl...@cn.fujitsu.com --- exec.c | 21 + include/sysemu/sysemu.h | 1 + vl.c| 13 + 3 files changed, 35 insertions(+) diff --git a/exec.c b/exec.c index aec65c5..fe929ef 100644 --- a/exec.c +++ b/exec.c @@ -36,6 +36,8 @@ #include qemu/config-file.h #include exec/memory.h #include sysemu/dma.h +#include sysemu/sysemu.h +#include qemu/bitops.h #include exec/address-spaces.h #if defined(CONFIG_USER_ONLY) #include qemu.h @@ -1081,6 +1083,25 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host, memory_try_enable_merging(new_block-host, size); } } + +if (nb_numa_nodes 0 !strcmp(mr-name, pc.ram)) { +int i; +uint64_t nodes_mem = 0; +unsigned long *maskp = g_malloc0(sizeof(*maskp)); +for (i = 0; i nb_numa_nodes; i++) { +*maskp = 0; +if (node_pin[i] != -1) { +set_bit(node_pin[i], maskp); +if (qemu_mbind(new_block-host + nodes_mem, node_mem[i], + QEMU_MPOL_BIND, maskp, MAX_NODES, 0)) { +perror(qemu_mbind); +exit(1); +} +} +nodes_mem += node_mem[i]; +} +} + new_block-length = size; /* Keep the list sorted from biggest to smallest block. */ diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h index 2fb71af..ebf6580 100644 --- a/include/sysemu/sysemu.h +++ b/include/sysemu/sysemu.h @@ -131,6 +131,7 @@ extern QEMUClock *rtc_clock; #define MAX_CPUMASK_BITS 255 extern int nb_numa_nodes; extern uint64_t node_mem[MAX_NODES]; +extern int node_pin[MAX_NODES]; extern unsigned long *node_cpumask[MAX_NODES]; #define MAX_OPTION_ROMS 16 diff --git a/vl.c b/vl.c index b1d..3768002 100644 --- a/vl.c +++ b/vl.c @@ -253,6 +253,7 @@ static QTAILQ_HEAD(, FWBootEntry) fw_boot_order = int nb_numa_nodes; uint64_t node_mem[MAX_NODES]; +int node_pin[MAX_NODES]; unsigned long *node_cpumask[MAX_NODES]; uint8_t qemu_uuid[16]; @@ -1390,6 +1391,17 @@ static void numa_add(const char *optarg) } node_mem[nodenr] = sval; } + +if (get_param_value(option, 128, pin, optarg) != 0) { +int unsigned long long pin_node; +if (parse_uint_full(option, pin_node, 10) 0) { +fprintf(stderr, qemu: Invalid pinning nodeid: %s\n, optarg); +exit(1); +} else { +node_pin[nodenr] = pin_node; +} +} + if (get_param_value(option, 128, cpus, optarg) != 0) { numa_node_parse_cpus(nodenr, option); } @@ -2921,6 +2933,7 @@ int main(int argc, char **argv, char **envp) for (i = 0; i MAX_NODES; i++) { node_mem[i] = 0; +node_pin[i] = -1; node_cpumask[i] = bitmap_new(MAX_CPUMASK_BITS); }
[Qemu-devel] [RFC] Check backing_file chain's loop
Hi, Now block layer does not check whether backing_file chain can be a circle, do you think it is necessary to do a check? For example, 1.qcow2's backing_file is 2.qcow2 and 2.qcow2's backing_file is 1.qcow2, then any IO operation will be a fault. If necessary, I will post a patch to fix. [$] qemu-img create -f qcow2 1.qcow2 qemu-img: Image creation needs a size parameter [$] qemu-img create -f qcow2 1.qcow2 8M Formatting '1.qcow2', fmt=qcow2 size=8388608 encryption=off cluster_size=65536 lazy_refcounts=off [$] qemu-img create -f qcow2 2.qcow2 -o backing_file=1.qcow2 Formatting '2.qcow2', fmt=qcow2 size=8388608 backing_file='1.qcow2' encryption=off cluster_size=65536 lazy_refcounts=off [$] qemu-img create -f qcow2 1.qcow2 -o backing_file=2.qcow2 Formatting '1.qcow2', fmt=qcow2 size=8388608 backing_file='2.qcow2' encryption=off cluster_size=65536 lazy_refcounts=off [$] qemu-io -c read -v 0 512 1.qcow2 ^CSegmentation fault (core dumped) Thanks.
Re: [Qemu-devel] broken incoming migration
On 30 May 2013 08:44, Alexey Kardashevskiy a...@ozlabs.ru wrote: I found the migration broken on pseries platform, specifically, this patch broke it: f1c72795af573b24a7da5eb52375c9aba8a37972 migration: do not sent zero pages in bulk stage The idea is not to send zero pages to the destination guest which is expected to have 100% empty RAM. This just seems like a broken assumption to me. If the migration protocol is going to skip sending zero pages it should do it by saying there's a zero page here and have the destination memset the page to zero. thanks -- PMM
[Qemu-devel] [PATCH V19 6/8] add debug event for add-cow
From: Dong Xu Wang wdon...@linux.vnet.ibm.com Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com Signed-off-by: Dongxu Wang wdon...@linux.vnet.ibm.com --- block/blkdebug.c | 3 +++ block/block-cache.c | 4 ++-- include/block/block.h | 3 +++ 3 files changed, 8 insertions(+), 2 deletions(-) diff --git a/block/blkdebug.c b/block/blkdebug.c index 71f99e4..2bd6a53 100644 --- a/block/blkdebug.c +++ b/block/blkdebug.c @@ -182,6 +182,9 @@ static const char *event_names[BLKDBG_EVENT_MAX] = { [BLKDBG_CLUSTER_ALLOC] = cluster_alloc, [BLKDBG_CLUSTER_ALLOC_BYTES]= cluster_alloc_bytes, [BLKDBG_CLUSTER_FREE] = cluster_free, + +[BLKDBG_ADDCOW_READ]= add_cow_read, +[BLKDBG_ADDCOW_WRITE] = add_cow_write, }; static int get_event_by_name(const char *name, BlkDebugEvent *event) diff --git a/block/block-cache.c b/block/block-cache.c index f5d75d1..454269c 100644 --- a/block/block-cache.c +++ b/block/block-cache.c @@ -125,7 +125,7 @@ static int block_cache_entry_flush(BlockDriverState *bs, BlockCache *c, int i) } else if (c-table_type == BLOCK_TABLE_L2) { BLKDBG_EVENT(bs-file, BLKDBG_L2_UPDATE); } else if (c-table_type == BLOCK_TABLE_BITMAP) { -BLKDBG_EVENT(bs-file, BLKDBG_COW_WRITE); +BLKDBG_EVENT(bs-file, BLKDBG_ADDCOW_WRITE); } ret = bdrv_pwrite(bs-file, c-entries[i].offset, @@ -260,7 +260,7 @@ static int block_cache_do_get(BlockDriverState *bs, BlockCache *c, if (c-table_type == BLOCK_TABLE_L2) { BLKDBG_EVENT(bs-file, BLKDBG_L2_LOAD); } else if (c-table_type == BLOCK_TABLE_BITMAP) { -BLKDBG_EVENT(bs-file, BLKDBG_COW_READ); +BLKDBG_EVENT(bs-file, BLKDBG_ADDCOW_READ); } ret = bdrv_pread(bs-file, offset, c-entries[i].table, diff --git a/include/block/block.h b/include/block/block.h index 2989da6..3573e3e 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -451,6 +451,9 @@ typedef enum { BLKDBG_CLUSTER_ALLOC_BYTES, BLKDBG_CLUSTER_FREE, +BLKDBG_ADDCOW_READ, +BLKDBG_ADDCOW_WRITE, + BLKDBG_EVENT_MAX, } BlkDebugEvent; -- 1.7.11.7
[Qemu-devel] Why some test suite in kvm-unit-tests designed for 64bit only?
Hi there, I'm now reading codes of kvm-unit-tests and I found that some of the test cases for x86 is only designed for x86_64 (including access.flat, apic.flat, emulator.flat, idt_test.flat and so on). I wonder why these cases are not designed for i386? Or is there any other concerns? Thanks, Arthur -- Arthur Chunqi Li Department of Computer Science School of EECS Peking University Beijing, China
[Qemu-devel] [PATCH v2 2/5] pci: store PCI hole ranges in guestinfo structure
Will be used to pass hole ranges to guests. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/i386/pc.c | 39 ++- hw/i386/pc_piix.c | 14 +- hw/i386/pc_q35.c | 6 +- hw/pci-host/q35.c | 4 include/hw/i386/pc.h | 19 ++- include/hw/pci-host/q35.h | 2 ++ include/qemu/typedefs.h | 1 + 7 files changed, 81 insertions(+), 4 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 4844a6b..c233161 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -978,6 +978,41 @@ void pc_cpus_init(const char *cpu_model, DeviceState *icc_bridge) } } +typedef struct PcGuestInfoState { +PcGuestInfo info; +Notifier machine_done; +} PcGuestInfoState; + +static +void pc_guest_info_machine_done(Notifier *notifier, void *data) +{ +PcGuestInfoState *guest_info_state = container_of(notifier, + PcGuestInfoState, + machine_done); +} + +PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size, +ram_addr_t above_4g_mem_size) +{ +PcGuestInfoState *guest_info_state = g_malloc0(sizeof *guest_info_state); +PcGuestInfo *guest_info = guest_info_state-info; + +guest_info-pci_info.w32.end = IO_APIC_DEFAULT_ADDRESS; +if (sizeof(hwaddr) == 4) { +guest_info-pci_info.w64.begin = 0; +guest_info-pci_info.w64.end = 0; +} else { +guest_info-pci_info.w64.begin = 0x1ULL + above_4g_mem_size; +guest_info-pci_info.w64.end = guest_info-pci_info.w64.begin + +(0x1ULL 62); +assert(guest_info-pci_info.w64.begin = guest_info-pci_info.w64.end); +} + +guest_info_state-machine_done.notify = pc_guest_info_machine_done; +qemu_add_machine_init_done_notifier(guest_info_state-machine_done); +return guest_info; +} + void pc_acpi_init(const char *default_dsdt) { char *filename; @@ -1019,7 +1054,8 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory, ram_addr_t below_4g_mem_size, ram_addr_t above_4g_mem_size, MemoryRegion *rom_memory, - MemoryRegion **ram_memory) + MemoryRegion **ram_memory, + PcGuestInfo *guest_info) { int linux_boot, i; MemoryRegion *ram, *option_rom_mr; @@ -1071,6 +1107,7 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory, for (i = 0; i nb_option_roms; i++) { rom_add_option(option_rom[i].name, option_rom[i].bootindex); } +guest_info-fw_cfg = fw_cfg; return fw_cfg; } diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index d547548..eaff0b6 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -90,6 +90,7 @@ static void pc_init1(MemoryRegion *system_memory, MemoryRegion *rom_memory; DeviceState *icc_bridge; FWCfgState *fw_cfg = NULL; +PcGuestInfo *guest_info; icc_bridge = qdev_create(NULL, TYPE_ICC_BRIDGE); object_property_add_child(qdev_get_machine(), icc-bridge, @@ -119,12 +120,23 @@ static void pc_init1(MemoryRegion *system_memory, rom_memory = system_memory; } +guest_info = pc_guest_info_init(below_4g_mem_size, above_4g_mem_size); + +/* Set PCI window size the way seabios has always done it. */ +/* TODO: consider just starting at below_4g_mem_size */ +if (ram_size = 0x8000) +guest_info-pci_info.w32.begin = 0x8000; +else if (ram_size = 0xc000) +guest_info-pci_info.w32.begin = 0xc000; +else +guest_info-pci_info.w32.begin = 0xe000; + /* allocate ram and load rom/bios */ if (!xen_enabled()) { fw_cfg = pc_memory_init(system_memory, kernel_filename, kernel_cmdline, initrd_filename, below_4g_mem_size, above_4g_mem_size, - rom_memory, ram_memory); + rom_memory, ram_memory, guest_info); } gsi_state = g_malloc0(sizeof(*gsi_state)); diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index 7888dfe..32d6357 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -77,6 +77,7 @@ static void pc_q35_init(QEMUMachineInitArgs *args) ICH9LPCState *ich9_lpc; PCIDevice *ahci; DeviceState *icc_bridge; +PcGuestInfo *guest_info; icc_bridge = qdev_create(NULL, TYPE_ICC_BRIDGE); object_property_add_child(qdev_get_machine(), icc-bridge, @@ -105,11 +106,13 @@ static void pc_q35_init(QEMUMachineInitArgs *args) rom_memory = get_system_memory(); } +guest_info = pc_guest_info_init(below_4g_mem_size, above_4g_mem_size); + /* allocate ram and load rom/bios */ if (!xen_enabled()) { pc_memory_init(get_system_memory(), kernel_filename, kernel_cmdline,
[Qemu-devel] [PATCH v2 1/5] range: add Range structure
Sometimes we need to pass ranges around, add a handy structure for this purpose. Note: memory.c defines its own concept of AddrRange structure for working with 128 addresses. It's necessary there for doing range math. This is not needed for most users: struct Range is much simpler, and is only used for passing the range around. Cc: Peter Maydell peter.mayd...@linaro.org Signed-off-by: Michael S. Tsirkin m...@redhat.com --- include/qemu/range.h | 16 1 file changed, 16 insertions(+) diff --git a/include/qemu/range.h b/include/qemu/range.h index 3502372..b76cc0d 100644 --- a/include/qemu/range.h +++ b/include/qemu/range.h @@ -1,6 +1,22 @@ #ifndef QEMU_RANGE_H #define QEMU_RANGE_H +#include inttypes.h + +/* + * Operations on 64 bit address ranges. + * Notes: + * - ranges must not wrap around 0, but can include the last byte ~0x0LL. + * - this can not represent a full 0 to ~0x0LL range. + */ + +/* A structure representing a range of addresses. */ +struct Range { +uint64_t begin; /* First byte of the range, or 0 if empty. */ +uint64_t end; /* 1 + the last byte. 0 if range empty or ends at ~0x0LL. */ +}; +typedef struct Range Range; + /* Get last byte of a range from offset + length. * Undefined for ranges that wrap around 0. */ static inline uint64_t range_get_last(uint64_t offset, uint64_t len) -- MST
[Qemu-devel] [PATCH v2 0/5] pc: pass pci window data to guests
This makes it possible for bios to load pci window data from host. This makes it possible for host to make sure setup matches hardware exactly. This will also make it easier to add more chipsets down the road. Ranges are passed within a generic GuestInfo structure, can add more fields of interest to Guests in the future. Note: this is on top of my PCI branch, if no one objects I'd like to merge it through there as there are some trivial dependencies on that. Changes from v1: - fix v1.5-v1.6 migration compatibility - address Peter Maydell's comments on range.h - make addresses a bit smaller, compatible to what seabios does at the moment. We can increase the windows, carefully, at a later time. Michael S. Tsirkin (5): range: add Range structure pci: store PCI hole ranges in guestinfo structure pc: pass PCI hole ranges to Guests pc: add 1.6 compat type pc: pci-info add compat support hw/i386/pc.c | 65 ++- hw/i386/pc_piix.c | 37 --- hw/i386/pc_q35.c | 6 - hw/pci-host/q35.c | 4 +++ include/hw/i386/pc.h | 20 ++- include/hw/pci-host/q35.h | 2 ++ include/qemu/range.h | 16 include/qemu/typedefs.h | 1 + 8 files changed, 145 insertions(+), 6 deletions(-) -- MST
[Qemu-devel] [PATCH v2 5/5] pc: pci-info add compat support
We can't change fw cfg entries or add new ones without breaking cross version migration. Add a flag to skip adding new entry when running with 1.5 compat machine type. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/i386/pc.c | 7 ++- hw/i386/pc_piix.c| 12 ++-- include/hw/i386/pc.h | 1 + 3 files changed, 17 insertions(+), 3 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 64101cb..8b548d0 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -988,7 +988,12 @@ typedef struct PcRomPciInfo { static void pc_fw_cfg_guest_info(PcGuestInfo *guest_info) { -PcRomPciInfo *info = g_malloc(sizeof *info); +PcRomPciInfo *info; +if (guest_info-compat_v1_5) { +return; +} + +info = g_malloc(sizeof *info); info-w32_min = cpu_to_le64(guest_info-pci_info.w32.begin); info-w32_max = cpu_to_le64(guest_info-pci_info.w32.end); info-w64_min = cpu_to_le64(guest_info-pci_info.w64.begin); diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 2717d83..5b281d1 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -57,7 +57,7 @@ static const int ide_iobase2[MAX_IDE_BUS] = { 0x3f6, 0x376 }; static const int ide_irq[MAX_IDE_BUS] = { 14, 15 }; static bool has_pvpanic = true; -static bool has_pci_info = true; +static bool guest_info_compat_v1_5 = false; /* PC hardware initialisation */ static void pc_init1(MemoryRegion *system_memory, @@ -122,6 +122,7 @@ static void pc_init1(MemoryRegion *system_memory, } guest_info = pc_guest_info_init(below_4g_mem_size, above_4g_mem_size); +guest_info-compat_v1_5 = guest_info_compat_v1_5; /* Set PCI window size the way seabios has always done it. */ /* TODO: consider just starting at below_4g_mem_size */ @@ -259,6 +260,12 @@ static void pc_init_pci(QEMUMachineInitArgs *args) initrd_filename, cpu_model, 1, 1); } +static void pc_init_pci_1_5(QEMUMachineInitArgs *args) +{ +guest_info_compat_v1_5 = true; +pc_init_pci(args); +} + static void pc_init_pci_1_4(QEMUMachineInitArgs *args) { has_pvpanic = false; @@ -355,7 +362,7 @@ static QEMUMachine pc_i440fx_machine_v1_6 = { static QEMUMachine pc_i440fx_machine_v1_5 = { .name = pc-i440fx-1.5, .desc = Standard PC (i440FX + PIIX, 1996), -.init = pc_init_pci, +.init = pc_init_pci_1_5, .hot_add_cpu = pc_hot_add_cpu, .max_cpus = 255, .is_default = 1, @@ -759,6 +766,7 @@ static QEMUMachine xenfv_machine = { static void pc_machine_init(void) { +qemu_register_machine(pc_i440fx_machine_v1_6); qemu_register_machine(pc_i440fx_machine_v1_5); qemu_register_machine(pc_i440fx_machine_v1_4); qemu_register_machine(pc_machine_v1_3); diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index 1bf5219..6419da8 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -20,6 +20,7 @@ typedef struct PcPciInfo { struct PcGuestInfo { PcPciInfo pci_info; +bool compat_v1_5; FWCfgState *fw_cfg; }; -- MST
[Qemu-devel] [PATCH v2 4/5] pc: add 1.6 compat type
Identical to 1.5 ATM, but changes will accumulate. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/i386/pc_piix.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index eaff0b6..2717d83 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -57,6 +57,7 @@ static const int ide_iobase2[MAX_IDE_BUS] = { 0x3f6, 0x376 }; static const int ide_irq[MAX_IDE_BUS] = { 14, 15 }; static bool has_pvpanic = true; +static bool has_pci_info = true; /* PC hardware initialisation */ static void pc_init1(MemoryRegion *system_memory, @@ -340,9 +341,19 @@ static void pc_xen_hvm_init(QEMUMachineInitArgs *args) } #endif +static QEMUMachine pc_i440fx_machine_v1_6 = { +.name = pc-i440fx-1.6, +.alias = pc, +.desc = Standard PC (i440FX + PIIX, 1996), +.init = pc_init_pci, +.hot_add_cpu = pc_hot_add_cpu, +.max_cpus = 255, +.is_default = 1, +DEFAULT_MACHINE_OPTIONS, +}; + static QEMUMachine pc_i440fx_machine_v1_5 = { .name = pc-i440fx-1.5, -.alias = pc, .desc = Standard PC (i440FX + PIIX, 1996), .init = pc_init_pci, .hot_add_cpu = pc_hot_add_cpu, -- MST
[Qemu-devel] [PATCH v2 3/5] pc: pass PCI hole ranges to Guests
Guest currently has to jump through lots of hoops to guess the PCI hole ranges. It's fragile, and makes us change BIOS each time we add a new chipset. Let's report the window in a ROM file, to make BIOS do exactly what QEMU intends. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/i386/pc.c | 21 + 1 file changed, 21 insertions(+) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index c233161..64101cb 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -978,6 +978,26 @@ void pc_cpus_init(const char *cpu_model, DeviceState *icc_bridge) } } +/* pci-info ROM file. Little endian format */ +typedef struct PcRomPciInfo { +uint64_t w32_min; +uint64_t w32_max; +uint64_t w64_min; +uint64_t w64_max; +} PcRomPciInfo; + +static void pc_fw_cfg_guest_info(PcGuestInfo *guest_info) +{ +PcRomPciInfo *info = g_malloc(sizeof *info); +info-w32_min = cpu_to_le64(guest_info-pci_info.w32.begin); +info-w32_max = cpu_to_le64(guest_info-pci_info.w32.end); +info-w64_min = cpu_to_le64(guest_info-pci_info.w64.begin); +info-w64_max = cpu_to_le64(guest_info-pci_info.w64.end); +/* Pass PCI hole info to guest via a side channel. + * Required so guest PCI enumeration does the right thing. */ +fw_cfg_add_file(guest_info-fw_cfg, etc/pci-info, info, sizeof *info); +} + typedef struct PcGuestInfoState { PcGuestInfo info; Notifier machine_done; @@ -989,6 +1009,7 @@ void pc_guest_info_machine_done(Notifier *notifier, void *data) PcGuestInfoState *guest_info_state = container_of(notifier, PcGuestInfoState, machine_done); +pc_fw_cfg_guest_info(guest_info_state-info); } PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size, -- MST
Re: [Qemu-devel] [PATCH v4 0/3] ARM aarch64 TCG target
On 29 May 2013 10:04, Claudio Fontana claudio.font...@huawei.com wrote: This series implements preliminary support for the ARM aarch64 TCG target. Reviewed-by: Peter Maydell peter.mayd...@linaro.org Cc'd some people who might like to commit the patchset. thanks -- PMM
Re: [Qemu-devel] [SeaBIOS] KVM call agenda for 2013-05-28
On 05/30/13 11:23, David Woodhouse wrote: On Wed, 2013-05-29 at 11:18 -0500, Anthony Liguori wrote: Certainly an option, but that is a long-term project. Out of curiousity, are there other benefits to using coreboot as a core firmware in QEMU? Is there a payload we would ever plausibly use besides OVMF and SeaBIOS? I like the idea of using Coreboot on the UEFI side — if the most actively used TianoCore platform is CorebootPkg instead of OvmfPkg, that makes it a lot easier for people using *real* hardware with Coreboot to use TianoCore. Where is CorebootPkg available from? And it helps to dispel the stupid misconception in some quarters that Coreboot *competes* with UEFI and thus cannot possibly be supported because helping something that competes with UEFI would be bad. I'm not sure who do you mean by some quarters, but for some distributions Coreboot would be yet another component (package) to support, for no obvious benefit. (Gerd said it better than I possibly could: http://thread.gmane.org/gmane.comp.bios.coreboot.seabios/5685/focus=5705.) Is there a payload we would ever plausibly use besides OVMF and SeaBIOS? For my part I want to get to the point where the default firmware shipped with qemu can be OVMF. For some distributions this is a licensing question as well. See FatBinPkg/License.txt. (The same applies if you rebuild it from source (FatPkgDev), based on FatBinPkg/ReadMe.txt.) For example Fedora can't ship OVMF because of it. If you implement a UEFI FAT driver based on Microsoft's official specification, you're bound by the same restrictions on use and redistribution. If you implement a private UEFI FAT driver from scratch, or port a free software FAT implementation (eg. the r/o one in grub or the r/w one in mtools), you could still run into legal problems, I've been told. If you rip out the FAT driver, then OVMF won't be UEFI compliant and no installer media will boot on it. Interestingly, Ubuntu has OVMF in Universe http://packages.ubuntu.com/raring/ovmf. I think they missed the FatBinPkg license (I would have missed it too, after all you have to track down the licenses of every module included in the FDF file -- it was Paolo who told me about it) and I believe they should actually ship OVMF in Multiverse or Restricted https://help.ubuntu.com/community/Repositories/Ubuntu. We have SeaBIOS-as-CSM working, which was one of the biggest barriers. Agreed, and I could have never done that. Thanks for implementing it with Kevin. We need at least one out-of-tree edk2 patch for now (from you) but apparently that's no problem. There are a few more things (like NV variable storage, in particular) that I need to fix before I can actually make that suggestion with a straight face though... As far as I understand: - Jordan is nearing completion of flash support on KVM, - he also has WIP NvVar storage on top of qemu flash. http://thread.gmane.org/gmane.comp.emulators.qemu/213690 http://thread.gmane.org/gmane.comp.bios.tianocore.devel/2781/focus=2798 (Please coordinate I guess :)) Laszlo
[Qemu-devel] [PATCH] gdbstub: do not restart crashed guest
If a guest has crashed with an internal error or similar, detaching gdb (or any other debugger action) should not restart it. Cc: Jan Kiszka jan.kis...@siemens.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- gdbstub.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/gdbstub.c b/gdbstub.c index e80e1d3..90e54cb 100644 --- a/gdbstub.c +++ b/gdbstub.c @@ -371,7 +371,9 @@ static inline void gdb_continue(GDBState *s) #ifdef CONFIG_USER_ONLY s-running_state = 1; #else -vm_start(); +if (runstate_check(RUN_STATE_DEBUG)) { +vm_start(); +} #endif } -- 1.8.1.4
[Qemu-devel] [PVSCSI]How to unplug scsi disk simulated by Qemu, just like unplug the ide disk?
Hi all, My environment is xen-4.1.2 + qemu-1.2.2 I made a pvscsi driver for Redhat guest, but I encountered a problem that I could see two scsi disks, one was simulated by QEMU, another was passthrough. Actually I want to unplug the scsi disk simulated. Any methods can solve the problem on the qemu upstream. Thanks!
Re: [Qemu-devel] [PVSCSI]How to unplug scsi disk simulated by Qemu, just like unplug the ide disk?
On 05/30/13 13:23, Gonglei (Arei) wrote: Hi all, My environment is xen-4.1.2 + qemu-1.2.2 I made a pvscsi driver for Redhat guest, but I encountered a problem that I could see two scsi disks, one was simulated by QEMU, another was passthrough. Actually I want to unplug the scsi disk simulated. Any methods can solve the problem on the qemu upstream. Thanks! Unless I'm misunderstanding you (which is more likely than not): emulated devices (eg. NIC and IDE disks) are unplugged by the guest kernel. See arch/x86/xen/platform-pci-unplug.c. The qemu side seems to be in hw/xen/xen_platform.c. Laszlo
Re: [Qemu-devel] [PATCH] gdbstub: do not restart crashed guest
On 05/30/13 13:20, Paolo Bonzini wrote: If a guest has crashed with an internal error or similar, detaching gdb (or any other debugger action) should not restart it. Cc: Jan Kiszka jan.kis...@siemens.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- gdbstub.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/gdbstub.c b/gdbstub.c index e80e1d3..90e54cb 100644 --- a/gdbstub.c +++ b/gdbstub.c @@ -371,7 +371,9 @@ static inline void gdb_continue(GDBState *s) #ifdef CONFIG_USER_ONLY s-running_state = 1; #else -vm_start(); +if (runstate_check(RUN_STATE_DEBUG)) { +vm_start(); +} #endif } I sought to check the gdb_continue() call sites, and uses of RUN_STATE_DEBUG. Seems reasonable. Reviewed-by: Laszlo Ersek ler...@redhat.com ( FWIW I wonder why in commit ad02b96a Luiz allowed DEBUG - SUSPENDED. As far as I understand, when the debugger is attached, the guest is not running, hence it can't go directly to RUN_STATE_SUSPENDED. Maybe due to a concurrent monitor command? Technically it does seem possible; from main_loop_should_exit(): if (qemu_debug_requested()) { vm_stop(RUN_STATE_DEBUG); } if (qemu_suspend_requested()) { qemu_system_suspend(); } Both requests could become pending during one iteration of the loop, and the next iteration will see both of them. OK. ) Laszlo
Re: [Qemu-devel] Unable to parse -device drivers containing commas?
On 05/30/2013 04:00 AM, Mark Cave-Ayland wrote: Hi all, I found that the QEMU -device command line parser doesn't seem to like driver names containing a comma such as SUNW,tcx for the video driver on qemu-system-sparc: $ ./qemu-system-sparc -device SUNW,tcx,help Is there a way of escaping the commas on the command line so that it is possible to list properties for drivers named in this way? Commas are escaped by doubling them. Try ./qemu-system-sparc -device SUNW,,tcx,help That's the generic quoting we've used for escaping commas in all new command line options, although there may be some older options that still need to be taught to honor that escaping. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [Qemu-stable] [QEMU PATCH v3] qdev: fix get_fw_dev_path to support to add nothing to fw_dev_path
Am 29.05.2013 09:56, schrieb Amos Kong: Recent virtio refactoring in QEMU made virtio-bus become the parent bus of scsi-bus, and virtio-bus doesn't have get_fw_dev_path implementation, typename will be added to fw_dev_path by default, the new fw_dev_path could not be identified by seabios. It causes that bootindex parameter of scsi device doesn't work. This patch implements get_fw_dev_path() in BusClass, it will be called if bus doesn't implement the method, tyename will be added to fw_dev_path. If the implemented method returns NULL, nothing will be added to fw_dev_path. It also implements virtio_bus_get_fw_dev_path() to return NULL. Then QEMU will still pass original style of fw_dev_path to seabios. Signed-off-by: Amos Kong ak...@redhat.com -- v2: only add nothing to fw_dev_path when get_fw_dev_path() is implemented and returns NULL. then it will not effect other devices don't have get_fw_dev_path() implementation. v3: implement default get_fw_dev_path() in BusClass --- hw/core/qdev.c | 10 +- hw/virtio/virtio-bus.c | 6 ++ 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/hw/core/qdev.c b/hw/core/qdev.c index 6985ad8..9190a7e 100644 --- a/hw/core/qdev.c +++ b/hw/core/qdev.c @@ -515,7 +515,7 @@ static int qdev_get_fw_dev_path_helper(DeviceState *dev, char *p, int size) l += snprintf(p + l, size - l, %s, d); g_free(d); } else { -l += snprintf(p + l, size - l, %s, object_get_typename(OBJECT(dev))); +return l; } } l += snprintf(p + l , size - l, /); @@ -867,9 +867,17 @@ static void qbus_initfn(Object *obj) QTAILQ_INIT(bus-children); } +static char *default_bus_get_fw_dev_path(DeviceState *dev) +{ +return g_strdup(object_get_typename(OBJECT(dev))); +} + static void bus_class_init(ObjectClass *class, void *data) { +BusClass *bc = BUS_CLASS(class); + class-unparent = bus_unparent; +bc-get_fw_dev_path = default_bus_get_fw_dev_path; } static void qbus_finalize(Object *obj) diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c index ea2e11a..6849a01 100644 --- a/hw/virtio/virtio-bus.c +++ b/hw/virtio/virtio-bus.c @@ -161,10 +161,16 @@ static char *virtio_bus_get_dev_path(DeviceState *dev) return qdev_get_dev_path(proxy); } +static char *virtio_bus_get_fw_dev_path(DeviceState *dev) +{ +return NULL; +} + static void virtio_bus_class_init(ObjectClass *klass, void *data) { BusClass *bus_class = BUS_CLASS(klass); bus_class-get_dev_path = virtio_bus_get_dev_path; +bus_class-get_fw_dev_path = virtio_bus_get_fw_dev_path; } static const TypeInfo virtio_bus_info = { To me booting VMs with more than one SCSI disk does still not work. net: bootindex=100 scsi0: bootindex=201 does not work this one works fine: net: bootindex=200 scsi0: bootindex=101 Stefan
Re: [Qemu-devel] [RFC] Check backing_file chain's loop
On 05/30/2013 04:14 AM, Dongxu Wang wrote: Hi, Now block layer does not check whether backing_file chain can be a circle, do you think it is necessary to do a check? For example, 1.qcow2's backing_file is 2.qcow2 and 2.qcow2's backing_file is 1.qcow2, then any IO operation will be a fault. If necessary, I will post a patch to fix. Yes, it's worth patching. [$] qemu-img create -f qcow2 1.qcow2 qemu-img: Image creation needs a size parameter [$] qemu-img create -f qcow2 1.qcow2 8M Formatting '1.qcow2', fmt=qcow2 size=8388608 encryption=off cluster_size=65536 lazy_refcounts=off [$] qemu-img create -f qcow2 2.qcow2 -o backing_file=1.qcow2 Formatting '2.qcow2', fmt=qcow2 size=8388608 backing_file='1.qcow2' encryption=off cluster_size=65536 lazy_refcounts=off [$] qemu-img create -f qcow2 1.qcow2 -o backing_file=2.qcow2 Formatting '1.qcow2', fmt=qcow2 size=8388608 backing_file='2.qcow2' encryption=off cluster_size=65536 lazy_refcounts=off [$] qemu-io -c read -v 0 512 1.qcow2 ^CSegmentation fault (core dumped) Also make sure that 'qemu-img info 1.qcow2' and 'qemu-img info --backing-chain 1.qcow2' don't crash. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH v2 2/5] pci: store PCI hole ranges in guestinfo structure
Hi, +} else { +guest_info-pci_info.w64.begin = 0x1ULL + above_4g_mem_size; +guest_info-pci_info.w64.end = guest_info-pci_info.w64.begin + +(0x1ULL 62); Doesn't this give unaligned windows? +/* Set PCI window size the way seabios has always done it. */ +/* TODO: consider just starting at below_4g_mem_size */ Used to be that way. Was changed for alignment reasons (i.e. 1G window starts at 1G border etc). cheers, Gerd
Re: [Qemu-devel] [PATCH v2 2/5] pci: store PCI hole ranges in guestinfo structure
On Thu, May 30, 2013 at 02:16:13PM +0200, Gerd Hoffmann wrote: Hi, +} else { +guest_info-pci_info.w64.begin = 0x1ULL + above_4g_mem_size; +guest_info-pci_info.w64.end = guest_info-pci_info.w64.begin + +(0x1ULL 62); Doesn't this give unaligned windows? PCI Bridge windows do not need to be size aligned. In any case, the windows are *exactly* as calculated by seabios - apparently it does not size-align windows either. +/* Set PCI window size the way seabios has always done it. */ +/* TODO: consider just starting at below_4g_mem_size */ Used to be that way. Was changed for alignment reasons (i.e. 1G window starts at 1G border etc). Where's the alignment requirement coming from? cheers, Gerd
Re: [Qemu-devel] [SeaBIOS] KVM call agenda for 2013-05-28
On Thu, 2013-05-30 at 13:13 +0200, Laszlo Ersek wrote: Where is CorebootPkg available from? https://github.com/pgeorgi/edk2/tree/coreboot-pkg And it helps to dispel the stupid misconception in some quarters that Coreboot *competes* with UEFI and thus cannot possibly be supported because helping something that competes with UEFI would be bad. I'm not sure who do you mean by some quarters, but for some distributions Coreboot would be yet another component (package) to support, for no obvious benefit. (Gerd said it better than I possibly could: http://thread.gmane.org/gmane.comp.bios.coreboot.seabios/5685/focus=5705.) Yeah, but if we're shoving a lot of hardware-specific ACPI table generation into the guest's firmware, instead of just doing it on the qemu side where a number of us seem to think it belongs, then there *is* a benefit to using Coreboot. When stuff changes on the qemu side and we have to update the table generation to match, you end up having to update just the Coreboot package, and *not* having to patch both SeaBIOS and OVMF. The extra package in the distro really isn't painful to handle, and I suspect it would end up *reducing* the amount of work that we have to do to update. You update *just* Coreboot, not *both* of SeaBIOS and OVMF. If you implement a private UEFI FAT driver from scratch, or port a free software FAT implementation (eg. the r/o one in grub or the r/w one in mtools), you could still run into legal problems, I've been told. That has been said, and it's been said for the FAT implementation in the kernel too. If a distribution is happy to ship the kernel without ripping out its FAT support, then I see no reason why that distribution wouldn't also be happy to ship a version of OVMF with a clean implementation of FAT support. It doesn't make sense to be happy with one but not the other. We need at least one out-of-tree edk2 patch for now (from you) but apparently that's no problem. That'll get merged soon. We are working on the corresponding spec update... As far as I understand: - Jordan is nearing completion of flash support on KVM, - he also has WIP NvVar storage on top of qemu flash. http://thread.gmane.org/gmane.comp.emulators.qemu/213690 http://thread.gmane.org/gmane.comp.bios.tianocore.devel/2781/focus=2798 (Please coordinate I guess :)) Ooh, shiny. Yeah, when I get back to actually working on it rather than just heckling, I'll make sure I do :) -- dwmw2 smime.p7s Description: S/MIME cryptographic signature
Re: [Qemu-devel] [PATCH v2 2/5] pci: store PCI hole ranges in guestinfo structure
I can't offer any opinion about the values you put into w32 and w64, but I have some remarks. First, please consider passing -O/path/to/some/order_file to git-format-patch, so that .h files show up at the top of each patch. On 05/30/13 13:07, Michael S. Tsirkin wrote: Will be used to pass hole ranges to guests. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/i386/pc.c | 39 ++- hw/i386/pc_piix.c | 14 +- hw/i386/pc_q35.c | 6 +- hw/pci-host/q35.c | 4 include/hw/i386/pc.h | 19 ++- include/hw/pci-host/q35.h | 2 ++ include/qemu/typedefs.h | 1 + 7 files changed, 81 insertions(+), 4 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 4844a6b..c233161 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -978,6 +978,41 @@ void pc_cpus_init(const char *cpu_model, DeviceState *icc_bridge) } } +typedef struct PcGuestInfoState { +PcGuestInfo info; +Notifier machine_done; +} PcGuestInfoState; + +static +void pc_guest_info_machine_done(Notifier *notifier, void *data) +{ +PcGuestInfoState *guest_info_state = container_of(notifier, + PcGuestInfoState, + machine_done); +} + +PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size, +ram_addr_t above_4g_mem_size) +{ +PcGuestInfoState *guest_info_state = g_malloc0(sizeof *guest_info_state); +PcGuestInfo *guest_info = guest_info_state-info; + +guest_info-pci_info.w32.end = IO_APIC_DEFAULT_ADDRESS; +if (sizeof(hwaddr) == 4) { +guest_info-pci_info.w64.begin = 0; +guest_info-pci_info.w64.end = 0; +} else { +guest_info-pci_info.w64.begin = 0x1ULL + above_4g_mem_size; +guest_info-pci_info.w64.end = guest_info-pci_info.w64.begin + +(0x1ULL 62); +assert(guest_info-pci_info.w64.begin = guest_info-pci_info.w64.end); +} + +guest_info_state-machine_done.notify = pc_guest_info_machine_done; +qemu_add_machine_init_done_notifier(guest_info_state-machine_done); +return guest_info; +} + void pc_acpi_init(const char *default_dsdt) { char *filename; @@ -1019,7 +1054,8 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory, ram_addr_t below_4g_mem_size, ram_addr_t above_4g_mem_size, MemoryRegion *rom_memory, - MemoryRegion **ram_memory) + MemoryRegion **ram_memory, + PcGuestInfo *guest_info) { int linux_boot, i; MemoryRegion *ram, *option_rom_mr; @@ -1071,6 +1107,7 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory, for (i = 0; i nb_option_roms; i++) { rom_add_option(option_rom[i].name, option_rom[i].bootindex); } +guest_info-fw_cfg = fw_cfg; return fw_cfg; } I find this suboptimal style, ie. passing guest_info to pc_memory_init() just so that pc_memory_init() can set guest_info-fw_cfg to fw_cfg, when pc_memory_init() returns fw_cfg anyway. More philosophically: diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index b4c8a74..1bf5219 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -9,8 +9,20 @@ #include net/net.h #include hw/i386/ioapic.h +#include qemu/range.h + /* PC-style peripherals (also used by other machines). */ +typedef struct PcPciInfo { +Range w32; +Range w64; +} PcPciInfo; + +struct PcGuestInfo { +PcPciInfo pci_info; +FWCfgState *fw_cfg; +}; Are you sure you need a private link to the global fw_cfg object? The pvpanic series added a global lookup possibility in commit 10a584b2 (object_resolve_path()). Anyway these are just subjective style notes. + /* parallel.c */ static inline bool parallel_init(ISABus *bus, int index, CharDriverState *chr) { @@ -80,6 +92,10 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int level); void pc_cpus_init(const char *cpu_model, DeviceState *icc_bridge); void pc_hot_add_cpu(const int64_t id, Error **errp); void pc_acpi_init(const char *default_dsdt); + +PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size, +ram_addr_t above_4g_mem_size); + FWCfgState *pc_memory_init(MemoryRegion *system_memory, const char *kernel_filename, const char *kernel_cmdline, @@ -87,7 +103,8 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory, ram_addr_t below_4g_mem_size, ram_addr_t above_4g_mem_size, MemoryRegion *rom_memory, - MemoryRegion **ram_memory); +
Re: [Qemu-devel] [SeaBIOS] KVM call agenda for 2013-05-28
On Thu, May 30, 2013 at 01:19:18PM +0100, David Woodhouse wrote: Yeah, but if we're shoving a lot of hardware-specific ACPI table generation into the guest's firmware, instead of just doing it on the qemu side where a number of us seem to think it belongs, Hopefully this is not yet set in stone. then there *is* a benefit to using Coreboot. When stuff changes on the qemu side and we have to update the table generation to match, you end up having to update just the Coreboot package, and *not* having to patch both SeaBIOS and OVMF. We have all kind of logic in qemu. Some of it can thinkably be moved to a separate VM - it doesn't even need to run in the same VM as the guest - we could do it e.g. like kvm unit-test does, with less pain than running it in firmware. Not clear why would generating ACPI tables - which merely fills up an array of bytes from internal QEMU datastructures - should be the part where we start this modularization. -- MST
Re: [Qemu-devel] [PATCH v2 1/3] block: add basic backup support to block driver
On Thu, May 30, 2013 at 5:37 AM, Fam Zheng f...@redhat.com wrote: On Mon, 04/29 09:42, Stefan Hajnoczi wrote: + +static void coroutine_fn backup_run(void *opaque) +{ +BackupBlockJob *job = opaque; +BlockDriverState *bs = job-common.bs; +assert(bs); + +int64_t start, end; + +start = 0; +end = DIV_ROUND_UP(bdrv_getlength(bs) / BDRV_SECTOR_SIZE, + BACKUP_BLOCKS_PER_CLUSTER); + +job-bitmap = hbitmap_alloc(end, 0); + +DPRINTF(backup_run start %s % PRId64 % PRId64 \n, +bdrv_get_device_name(bs), start, end); + +int ret = 0; + +for (; start end; start++) { +if (block_job_is_cancelled(job-common)) { +ret = -1; +break; +} + +/* we need to yield so that qemu_aio_flush() returns. + * (without, VM does not reboot) + */ +if (job-common.speed) { +uint64_t delay_ns = ratelimit_calculate_delay( +job-limit, job-sectors_read); +job-sectors_read = 0; +block_job_sleep_ns(job-common, rt_clock, delay_ns); +} else { +block_job_sleep_ns(job-common, rt_clock, 0); +} + +if (block_job_is_cancelled(job-common)) { +ret = -1; +break; +} + +if (hbitmap_get(job-bitmap, start)) { +continue; /* already copied */ +} + +DPRINTF(backup_run loop C% PRId64 \n, start); + +/** + * This triggers a cluster copy + * Note: avoid direct call to brdv_co_backup_cow, because + * this does not call tracked_request_begin() + */ +ret = bdrv_co_backup(bs, start*BACKUP_BLOCKS_PER_CLUSTER, 1); +if (ret 0) { +break; +} +/* Publish progress */ +job-common.offset += BACKUP_CLUSTER_SIZE; +} + +/* wait until pending backup_do_cow()calls have completed */ +qemu_co_rwlock_wrlock(job-rwlock); +qemu_co_rwlock_unlock(job-rwlock); + +hbitmap_free(job-bitmap); + +bdrv_delete(job-target); + +DPRINTF(backup_run complete %d\n, ret); +block_job_completed(job-common, ret); +} What will ret value be when the source block is not aligned to cluster size? Good catch, will fix. Stefan
Re: [Qemu-devel] [PATCH v2 2/5] pci: store PCI hole ranges in guestinfo structure
On 05/30/13 14:19, Michael S. Tsirkin wrote: On Thu, May 30, 2013 at 02:16:13PM +0200, Gerd Hoffmann wrote: Hi, +} else { +guest_info-pci_info.w64.begin = 0x1ULL + above_4g_mem_size; +guest_info-pci_info.w64.end = guest_info-pci_info.w64.begin + +(0x1ULL 62); Doesn't this give unaligned windows? PCI Bridge windows do not need to be size aligned. In any case, the windows are *exactly* as calculated by seabios - apparently it does not size-align windows either. Surely not. SeaBIOS sizes the 64bit window according to the space needed by the 64bit bars it wants to map there. +/* Set PCI window size the way seabios has always done it. */ +/* TODO: consider just starting at below_4g_mem_size */ Used to be that way. Was changed for alignment reasons (i.e. 1G window starts at 1G border etc). Where's the alignment requirement coming from? seabios creates a mtrr entry for the window, which doesn't work in case it isn't aligned (at least not with a single entry). Also real hardware tends to do it this way. cheers, Gerd
[Qemu-devel] [PATCH v5 10/11] blockdev: add Abort transaction
The Abort action can be used to test QMP 'transaction' failure. Add it as the last action to exercise the .abort() and .cleanup() code paths for all previous actions. Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- blockdev.c | 15 +++ qapi-schema.json | 13 - 2 files changed, 27 insertions(+), 1 deletion(-) diff --git a/blockdev.c b/blockdev.c index 18a2012..e3393d2 100644 --- a/blockdev.c +++ b/blockdev.c @@ -963,6 +963,16 @@ static void drive_backup_abort(BlkTransactionState *common) } } +static void abort_prepare(BlkTransactionState *common, Error **errp) +{ +error_setg(errp, Transaction aborted using Abort action); +} + +static void abort_commit(BlkTransactionState *common) +{ +assert(false); /* this action never succeeds */ +} + static const BdrvActionOps actions[] = { [TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC] = { .instance_size = sizeof(ExternalSnapshotState), @@ -975,6 +985,11 @@ static const BdrvActionOps actions[] = { .prepare = drive_backup_prepare, .abort = drive_backup_abort, }, +[TRANSACTION_ACTION_KIND_ABORT] = { +.instance_size = sizeof(BlkTransactionState), +.prepare = abort_prepare, +.commit = abort_commit, +}, }; /* diff --git a/qapi-schema.json b/qapi-schema.json index 6bf96aa..0da1b98 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -1646,6 +1646,16 @@ '*on-target-error': 'BlockdevOnError' } } ## +# @Abort +# +# This action can be used to test transaction failure. +# +# Since: 1.6 +### +{ 'type': 'Abort', + 'data': { } } + +## # @TransactionAction # # A discriminated record of operations that can be performed with @@ -1654,7 +1664,8 @@ { 'union': 'TransactionAction', 'data': { 'blockdev-snapshot-sync': 'BlockdevSnapshot', - 'drive-backup': 'DriveBackup' + 'drive-backup': 'DriveBackup', + 'abort': 'Abort' } } ## -- 1.8.1.4
[Qemu-devel] [PATCH v5 09/11] blockdev: add DriveBackup transaction
This patch adds a transactional version of the drive-backup QMP command. It allows atomic snapshots of multiple drives along with automatic cleanup if there is a failure to start one of the backup jobs. Note that QMP events are emitted for block job completion/cancellation and the block job will be listed by query-block-jobs. @device: the name of the device whose writes should be mirrored. @target: the target of the new image. If the file exists, or if it is a device, the existing file/device will be used as the new destination. If it does not exist, a new file will be created. @format: #optional the format of the new destination, default is to probe if @mode is 'existing', else the format of the source @mode: #optional whether and how QEMU should create a new image, default is 'absolute-paths'. @speed: #optional the maximum speed, in bytes per second @on-source-error: #optional the action to take on an error on the source, default 'report'. 'stop' and 'enospc' can only be used if the block device supports io-status (see BlockInfo). @on-target-error: #optional the action to take on an error on the target, default 'report' (no limitations, since this applies to a different block device than @device). Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- blockdev.c | 49 + qapi-schema.json | 40 +++- 2 files changed, 88 insertions(+), 1 deletion(-) diff --git a/blockdev.c b/blockdev.c index 0e649df..18a2012 100644 --- a/blockdev.c +++ b/blockdev.c @@ -919,6 +919,50 @@ static void external_snapshot_abort(BlkTransactionState *common) } } +typedef struct DriveBackupState { +BlkTransactionState common; +BlockDriverState *bs; +BlockJob *job; +} DriveBackupState; + +static void drive_backup_prepare(BlkTransactionState *common, Error **errp) +{ +DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common); +DriveBackup *backup; +Error *local_err = NULL; + +assert(common-action-kind == TRANSACTION_ACTION_KIND_DRIVE_BACKUP); +backup = common-action-drive_backup; + +qmp_drive_backup(backup-device, backup-target, + backup-has_format, backup-format, + backup-has_mode, backup-mode, + backup-has_speed, backup-speed, + backup-has_on_source_error, backup-on_source_error, + backup-has_on_target_error, backup-on_target_error, + local_err); +if (error_is_set(local_err)) { +error_propagate(errp, local_err); +state-bs = NULL; +state-job = NULL; +return; +} + +state-bs = bdrv_find(backup-device); +state-job = state-bs-job; +} + +static void drive_backup_abort(BlkTransactionState *common) +{ +DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common); +BlockDriverState *bs = state-bs; + +/* Only cancel if it's the job we started */ +if (bs bs-job bs-job == state-job) { +block_job_cancel_sync(bs-job); +} +} + static const BdrvActionOps actions[] = { [TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC] = { .instance_size = sizeof(ExternalSnapshotState), @@ -926,6 +970,11 @@ static const BdrvActionOps actions[] = { .commit = external_snapshot_commit, .abort = external_snapshot_abort, }, +[TRANSACTION_ACTION_KIND_DRIVE_BACKUP] = { +.instance_size = sizeof(DriveBackupState), +.prepare = drive_backup_prepare, +.abort = drive_backup_abort, +}, }; /* diff --git a/qapi-schema.json b/qapi-schema.json index 55b1a37..6bf96aa 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -1609,6 +1609,43 @@ '*mode': 'NewImageMode' } } ## +# @DriveBackup +# +# @device: the name of the device which should be copied. +# +# @target: the target of the new image. If the file exists, or if it +# is a device, the existing file/device will be used as the new +# destination. If it does not exist, a new file will be created. +# +# @format: #optional the format of the new destination, default is to +# probe if @mode is 'existing', else the format of the source +# +# @mode: #optional whether and how QEMU should create a new image, default is +#'absolute-paths'. +# +# @speed: #optional the maximum speed, in bytes per second +# +# @on-source-error: #optional the action to take on an error on the source, +# default 'report'. 'stop' and 'enospc' can only be used +# if the block device supports io-status (see BlockInfo). +# +# @on-target-error: #optional the action to take on an error on the target, +# default 'report' (no limitations, since this applies to +# a different block device than @device). +# +# Note
[Qemu-devel] [PATCH v5 08/11] blockdev: allow BdrvActionOps-commit() to be NULL
Some QMP 'transaction' types don't need to do anything on .commit(). Make .commit() optional just like .abort(). The drive-backup action will take advantage of this, it only needs to cancel the block job on .abort(). Other block job actions will probably follow the same pattern, so allow .commit() to be NULL. Suggested-by: Eric Blake ebl...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- blockdev.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/blockdev.c b/blockdev.c index 94ad829..0e649df 100644 --- a/blockdev.c +++ b/blockdev.c @@ -789,7 +789,7 @@ typedef struct BdrvActionOps { size_t instance_size; /* Prepare the work, must NOT be NULL. */ void (*prepare)(BlkTransactionState *common, Error **errp); -/* Commit the changes, must NOT be NULL. */ +/* Commit the changes, can be NULL. */ void (*commit)(BlkTransactionState *common); /* Abort the changes on fail, can be NULL. */ void (*abort)(BlkTransactionState *common); @@ -969,7 +969,9 @@ void qmp_transaction(TransactionActionList *dev_list, Error **errp) } QSIMPLEQ_FOREACH(state, snap_bdrv_states, entry) { -state-ops-commit(state); +if (state-ops-commit) { +state-ops-commit(state); +} } /* success */ -- 1.8.1.4
[Qemu-devel] [PATCH v5 04/11] blockdev: drop redundant proto_drv check
It is not necessary to check that we can find a protocol block driver since we create or open the image file. This produces the error that we need anyway. Besides, the QERR_INVALID_BLOCK_FORMAT is inappropriate since the protocol is incorrect rather than the format. Also drop an empty line between bdrv_open() and checking its return value. This may be due to copy-pasting from earlier code that performed other operations before handling errors. Reported-by: Kevin Wolf kw...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- blockdev.c | 15 --- 1 file changed, 15 deletions(-) diff --git a/blockdev.c b/blockdev.c index b9b2d10..01db519 100644 --- a/blockdev.c +++ b/blockdev.c @@ -818,7 +818,6 @@ typedef struct ExternalSnapshotStates { static void external_snapshot_prepare(BlkTransactionStates *common, Error **errp) { -BlockDriver *proto_drv; BlockDriver *drv; int flags, ret; Error *local_err = NULL; @@ -874,12 +873,6 @@ static void external_snapshot_prepare(BlkTransactionStates *common, flags = states-old_bs-open_flags; -proto_drv = bdrv_find_protocol(new_image_file); -if (!proto_drv) { -error_set(errp, QERR_INVALID_BLOCK_FORMAT, format); -return; -} - /* create new image w/backing file */ if (mode != NEW_IMAGE_MODE_EXISTING) { bdrv_img_create(new_image_file, format, @@ -1368,7 +1361,6 @@ void qmp_drive_mirror(const char *device, const char *target, { BlockDriverState *bs; BlockDriverState *source, *target_bs; -BlockDriver *proto_drv; BlockDriver *drv = NULL; Error *local_err = NULL; int flags; @@ -1436,12 +1428,6 @@ void qmp_drive_mirror(const char *device, const char *target, sync = MIRROR_SYNC_MODE_FULL; } -proto_drv = bdrv_find_protocol(target); -if (!proto_drv) { -error_set(errp, QERR_INVALID_BLOCK_FORMAT, format); -return; -} - bdrv_get_geometry(bs, size); size *= 512; if (sync == MIRROR_SYNC_MODE_FULL mode != NEW_IMAGE_MODE_EXISTING) { @@ -1476,7 +1462,6 @@ void qmp_drive_mirror(const char *device, const char *target, */ target_bs = bdrv_new(); ret = bdrv_open(target_bs, target, NULL, flags | BDRV_O_NO_BACKING, drv); - if (ret 0) { bdrv_delete(target_bs); error_set(errp, QERR_OPEN_FILE_FAILED, target); -- 1.8.1.4
[Qemu-devel] [PATCH v5 07/11] blockdev: rename BlkTransactionStates to singular
The QMP 'transaction' command keeps a list of in-flight transactions. The transaction state structure is called BlkTransactionStates even though it only deals with a single transaction. The only plural thing is the linked list of transaction states. I find it confusing to call the single structure States. This patch renames it to State, just like BlockDriverState is singular. Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- blockdev.c | 104 ++--- 1 file changed, 52 insertions(+), 52 deletions(-) diff --git a/blockdev.c b/blockdev.c index 5de33f5..94ad829 100644 --- a/blockdev.c +++ b/blockdev.c @@ -780,7 +780,7 @@ void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file, /* New and old BlockDriverState structs for group snapshots */ -typedef struct BlkTransactionStates BlkTransactionStates; +typedef struct BlkTransactionState BlkTransactionState; /* Only prepare() may fail. In a single transaction, only one of commit() or abort() will be called, clean() will always be called if it present. */ @@ -788,13 +788,13 @@ typedef struct BdrvActionOps { /* Size of state struct, in bytes. */ size_t instance_size; /* Prepare the work, must NOT be NULL. */ -void (*prepare)(BlkTransactionStates *common, Error **errp); +void (*prepare)(BlkTransactionState *common, Error **errp); /* Commit the changes, must NOT be NULL. */ -void (*commit)(BlkTransactionStates *common); +void (*commit)(BlkTransactionState *common); /* Abort the changes on fail, can be NULL. */ -void (*abort)(BlkTransactionStates *common); +void (*abort)(BlkTransactionState *common); /* Clean up resource in the end, can be NULL. */ -void (*clean)(BlkTransactionStates *common); +void (*clean)(BlkTransactionState *common); } BdrvActionOps; /* @@ -802,20 +802,20 @@ typedef struct BdrvActionOps { * that compiler will also arrange it to the same address with parent instance. * Later it will be used in free(). */ -struct BlkTransactionStates { +struct BlkTransactionState { TransactionAction *action; const BdrvActionOps *ops; -QSIMPLEQ_ENTRY(BlkTransactionStates) entry; +QSIMPLEQ_ENTRY(BlkTransactionState) entry; }; /* external snapshot private data */ -typedef struct ExternalSnapshotStates { -BlkTransactionStates common; +typedef struct ExternalSnapshotState { +BlkTransactionState common; BlockDriverState *old_bs; BlockDriverState *new_bs; -} ExternalSnapshotStates; +} ExternalSnapshotState; -static void external_snapshot_prepare(BlkTransactionStates *common, +static void external_snapshot_prepare(BlkTransactionState *common, Error **errp) { BlockDriver *drv; @@ -825,8 +825,8 @@ static void external_snapshot_prepare(BlkTransactionStates *common, const char *new_image_file; const char *format = qcow2; enum NewImageMode mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS; -ExternalSnapshotStates *states = - DO_UPCAST(ExternalSnapshotStates, common, common); +ExternalSnapshotState *state = + DO_UPCAST(ExternalSnapshotState, common, common); TransactionAction *action = common-action; /* get parameters */ @@ -848,36 +848,36 @@ static void external_snapshot_prepare(BlkTransactionStates *common, return; } -states-old_bs = bdrv_find(device); -if (!states-old_bs) { +state-old_bs = bdrv_find(device); +if (!state-old_bs) { error_set(errp, QERR_DEVICE_NOT_FOUND, device); return; } -if (!bdrv_is_inserted(states-old_bs)) { +if (!bdrv_is_inserted(state-old_bs)) { error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device); return; } -if (bdrv_in_use(states-old_bs)) { +if (bdrv_in_use(state-old_bs)) { error_set(errp, QERR_DEVICE_IN_USE, device); return; } -if (!bdrv_is_read_only(states-old_bs)) { -if (bdrv_flush(states-old_bs)) { +if (!bdrv_is_read_only(state-old_bs)) { +if (bdrv_flush(state-old_bs)) { error_set(errp, QERR_IO_ERROR); return; } } -flags = states-old_bs-open_flags; +flags = state-old_bs-open_flags; /* create new image w/backing file */ if (mode != NEW_IMAGE_MODE_EXISTING) { bdrv_img_create(new_image_file, format, -states-old_bs-filename, -states-old_bs-drv-format_name, +state-old_bs-filename, +state-old_bs-drv-format_name, NULL, -1, flags, local_err, false); if (error_is_set(local_err)) { error_propagate(errp, local_err); @@ -886,42 +886,42 @@ static void external_snapshot_prepare(BlkTransactionStates *common, } /* We will manually add the backing_hd field to the bs
Re: [Qemu-devel] [SeaBIOS] KVM call agenda for 2013-05-28
On 05/30/13 14:19, David Woodhouse wrote: Yeah, but if we're shoving a lot of hardware-specific ACPI table generation into the guest's firmware, instead of just doing it on the qemu side where a number of us seem to think it belongs, then there *is* a benefit to using Coreboot. When stuff changes on the qemu side and we have to update the table generation to match, you end up having to update just the Coreboot package, and *not* having to patch both SeaBIOS and OVMF. The extra package in the distro really isn't painful to handle, and I suspect it would end up *reducing* the amount of work that we have to do to update. You update *just* Coreboot, not *both* of SeaBIOS and OVMF. I can't deny there's logic in that, but it still feels like tying ourselves up in some intricate bondage choreography. We'd like to move ACPI tables out of firmware, but we can't move them to qemu due to project direction disagreement, so let's adopt a middleman. (I'm not trying to denigrate coreboot -- I don't know it at all --, but introducing it in a (granted, distro-specific) stack just for this purpose seems quite arbitrary.) If you implement a private UEFI FAT driver from scratch, or port a free software FAT implementation (eg. the r/o one in grub or the r/w one in mtools), you could still run into legal problems, I've been told. That has been said, and it's been said for the FAT implementation in the kernel too. If a distribution is happy to ship the kernel without ripping out its FAT support, then I see no reason why that distribution wouldn't also be happy to ship a version of OVMF with a clean implementation of FAT support. It doesn't make sense to be happy with one but not the other. Under my *personal* impression, logic and Common Law don't really mix, especially not in the US. Still under my *personal* impression, someone might not feel convenient suing you for redistributing code that already exists in the upstream Linux kernel, but might happily drag you to court for an original clean implementation, and then you can explain it's illogical for them to do so. The best I can do with your suggestion is to take it to our legal dept. I would be happy to work on a new FAT driver. (I used to feel differently earlier but I've changed my mind.) We need at least one out-of-tree edk2 patch for now (from you) but apparently that's no problem. That'll get merged soon. We are working on the corresponding spec update... Great news! Thanks, Laszlo
[Qemu-devel] [PATCH v5 05/11] blockdev: use bdrv_getlength() in qmp_drive_mirror()
Use bdrv_getlength() for its byte units and error return instead of bdrv_get_geometry(). Reported-by: Kevin Wolf kw...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- blockdev.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/blockdev.c b/blockdev.c index 01db519..73b175b 100644 --- a/blockdev.c +++ b/blockdev.c @@ -1364,7 +1364,7 @@ void qmp_drive_mirror(const char *device, const char *target, BlockDriver *drv = NULL; Error *local_err = NULL; int flags; -uint64_t size; +int64_t size; int ret; if (!has_speed) { @@ -1428,8 +1428,12 @@ void qmp_drive_mirror(const char *device, const char *target, sync = MIRROR_SYNC_MODE_FULL; } -bdrv_get_geometry(bs, size); -size *= 512; +size = bdrv_getlength(bs); +if (size 0) { +error_setg_errno(errp, -size, bdrv_getlength failed); +return; +} + if (sync == MIRROR_SYNC_MODE_FULL mode != NEW_IMAGE_MODE_EXISTING) { /* create new image w/o backing file */ assert(format drv); -- 1.8.1.4
[Qemu-devel] [PATCH v5 02/11] block: add bdrv_add_before_write_notifier()
The bdrv_add_before_write_notifier() function installs a callback that is invoked before a write request is processed. This will be used to implement copy-on-write point-in-time snapshots where we need to copy out old data before overwriting it. Note that BdrvTrackedRequest is moved to block_int.h since it is passed to .notify() functions. Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- block.c | 23 --- include/block/block_int.h | 23 ++- 2 files changed, 34 insertions(+), 12 deletions(-) diff --git a/block.c b/block.c index 65c0b60..9e43f20 100644 --- a/block.c +++ b/block.c @@ -308,6 +308,7 @@ BlockDriverState *bdrv_new(const char *device_name) } bdrv_iostatus_disable(bs); notifier_list_init(bs-close_notifiers); +notifier_with_return_list_init(bs-before_write_notifiers); return bs; } @@ -1850,16 +1851,6 @@ int bdrv_commit_all(void) return 0; } -struct BdrvTrackedRequest { -BlockDriverState *bs; -int64_t sector_num; -int nb_sectors; -bool is_write; -QLIST_ENTRY(BdrvTrackedRequest) list; -Coroutine *co; /* owner, used for deadlock detection */ -CoQueue wait_queue; /* coroutines blocked on this request */ -}; - /** * Remove an active request from the tracked requests list * @@ -2630,7 +2621,11 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs, tracked_request_begin(req, bs, sector_num, nb_sectors, true); -if (flags BDRV_REQ_ZERO_WRITE) { +ret = notifier_with_return_list_notify(bs-before_write_notifiers, req); + +if (ret 0) { +/* Do nothing, write notifier decided to fail this request */ +} else if (flags BDRV_REQ_ZERO_WRITE) { ret = bdrv_co_do_write_zeroes(bs, sector_num, nb_sectors); } else { ret = drv-bdrv_co_writev(bs, sector_num, nb_sectors, qiov); @@ -4894,3 +4889,9 @@ AioContext *bdrv_get_aio_context(BlockDriverState *bs) /* Currently BlockDriverState always uses the main loop AioContext */ return qemu_get_aio_context(); } + +void bdrv_add_before_write_notifier(BlockDriverState *bs, +NotifierWithReturn *notifier) +{ +notifier_with_return_list_add(bs-before_write_notifiers, notifier); +} diff --git a/include/block/block_int.h b/include/block/block_int.h index 6078dd3..440d510 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -58,7 +58,16 @@ #define BLOCK_OPT_LAZY_REFCOUNTSlazy_refcounts #define BLOCK_OPT_ADAPTER_TYPE adapter_type -typedef struct BdrvTrackedRequest BdrvTrackedRequest; +typedef struct BdrvTrackedRequest { +BlockDriverState *bs; +int64_t sector_num; +int nb_sectors; +bool is_write; +QLIST_ENTRY(BdrvTrackedRequest) list; +Coroutine *co; /* owner, used for deadlock detection */ +CoQueue wait_queue; /* coroutines blocked on this request */ +} BdrvTrackedRequest; + typedef struct BlockIOLimit { int64_t bps[3]; @@ -247,6 +256,9 @@ struct BlockDriverState { NotifierList close_notifiers; +/* Callback before write request is processed */ +NotifierWithReturnList before_write_notifiers; + /* number of in-flight copy-on-read requests */ unsigned int copy_on_read_in_flight; @@ -298,6 +310,15 @@ void bdrv_set_io_limits(BlockDriverState *bs, BlockIOLimit *io_limits); /** + * bdrv_add_before_write_notifier: + * + * Register a callback that is invoked before write requests are processed but + * after any throttling or waiting for overlapping requests. + */ +void bdrv_add_before_write_notifier(BlockDriverState *bs, +NotifierWithReturn *notifier); + +/** * bdrv_get_aio_context: * * Returns: the currently bound #AioContext -- 1.8.1.4
[Qemu-devel] [PATCH v5 11/11] qemu-iotests: add 055 drive-backup test case
Testing drive-backup is similar to image streaming and drive mirroring. This test case is based on 041. Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- tests/qemu-iotests/055 | 256 + tests/qemu-iotests/055.out | 5 + tests/qemu-iotests/group | 1 + 3 files changed, 262 insertions(+) create mode 100755 tests/qemu-iotests/055 create mode 100644 tests/qemu-iotests/055.out diff --git a/tests/qemu-iotests/055 b/tests/qemu-iotests/055 new file mode 100755 index 000..0341301 --- /dev/null +++ b/tests/qemu-iotests/055 @@ -0,0 +1,256 @@ +#!/usr/bin/env python +# +# Tests for drive-backup +# +# Copyright (C) 2013 Red Hat, Inc. +# +# Based on 041. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. +# + +import time +import os +import iotests +from iotests import qemu_img, qemu_io + +test_img = os.path.join(iotests.test_dir, 'test.img') +target_img = os.path.join(iotests.test_dir, 'target.img') + +class TestSingleDrive(iotests.QMPTestCase): +image_len = 120 * 1024 * 1024 # MB + +def setUp(self): +qemu_img('create', '-f', iotests.imgfmt, test_img, str(TestSingleDrive.image_len)) +self.vm = iotests.VM().add_drive(test_img) +self.vm.launch() + +def tearDown(self): +self.vm.shutdown() +os.remove(test_img) +try: +os.remove(target_img) +except OSError: +pass + +def test_cancel(self): +self.assert_no_active_block_jobs() + +result = self.vm.qmp('drive-backup', device='drive0', + target=target_img) +self.assert_qmp(result, 'return', {}) + +self.cancel_and_wait() + +def test_pause(self): +self.assert_no_active_block_jobs() + +result = self.vm.qmp('drive-backup', device='drive0', + target=target_img) +self.assert_qmp(result, 'return', {}) + +result = self.vm.qmp('block-job-pause', device='drive0') +self.assert_qmp(result, 'return', {}) + +time.sleep(1) +result = self.vm.qmp('query-block-jobs') +offset = self.dictpath(result, 'return[0]/offset') + +time.sleep(1) +result = self.vm.qmp('query-block-jobs') +self.assert_qmp(result, 'return[0]/offset', offset) + +self.vm.shutdown() +self.assertTrue(iotests.compare_images(test_img, target_img), +'target image does not match source after backup') + +def test_medium_not_found(self): +result = self.vm.qmp('drive-backup', device='ide1-cd0', + target=target_img) +self.assert_qmp(result, 'error/class', 'GenericError') + +def test_image_not_found(self): +result = self.vm.qmp('drive-backup', device='drive0', + mode='existing', target=target_img) +self.assert_qmp(result, 'error/class', 'GenericError') + +def test_device_not_found(self): +result = self.vm.qmp('drive-backup', device='nonexistent', + target=target_img) +self.assert_qmp(result, 'error/class', 'DeviceNotFound') + +class TestSetSpeed(iotests.QMPTestCase): +image_len = 80 * 1024 * 1024 # MB + +def setUp(self): +qemu_img('create', '-f', iotests.imgfmt, test_img, str(TestSetSpeed.image_len)) +self.vm = iotests.VM().add_drive(test_img) +self.vm.launch() + +def tearDown(self): +self.vm.shutdown() +os.remove(test_img) +os.remove(target_img) + +def test_set_speed(self): +self.assert_no_active_block_jobs() + +result = self.vm.qmp('drive-backup', device='drive0', + target=target_img) +self.assert_qmp(result, 'return', {}) + +# Default speed is 0 +result = self.vm.qmp('query-block-jobs') +self.assert_qmp(result, 'return[0]/device', 'drive0') +self.assert_qmp(result, 'return[0]/speed', 0) + +result = self.vm.qmp('block-job-set-speed', device='drive0', speed=8 * 1024 * 1024) +self.assert_qmp(result, 'return', {}) + +# Ensure the speed we set was accepted +result = self.vm.qmp('query-block-jobs') +self.assert_qmp(result, 'return[0]/device', 'drive0') +self.assert_qmp(result, 'return[0]/speed', 8 * 1024 * 1024) + +self.cancel_and_wait() +
Re: [Qemu-devel] [PATCH] gdbstub: do not restart crashed guest
On Thu, 30 May 2013 13:55:50 +0200 Laszlo Ersek ler...@redhat.com wrote: On 05/30/13 13:20, Paolo Bonzini wrote: If a guest has crashed with an internal error or similar, detaching gdb (or any other debugger action) should not restart it. Cc: Jan Kiszka jan.kis...@siemens.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- gdbstub.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/gdbstub.c b/gdbstub.c index e80e1d3..90e54cb 100644 --- a/gdbstub.c +++ b/gdbstub.c @@ -371,7 +371,9 @@ static inline void gdb_continue(GDBState *s) #ifdef CONFIG_USER_ONLY s-running_state = 1; #else -vm_start(); +if (runstate_check(RUN_STATE_DEBUG)) { +vm_start(); +} #endif } I sought to check the gdb_continue() call sites, and uses of RUN_STATE_DEBUG. Seems reasonable. Reviewed-by: Laszlo Ersek ler...@redhat.com ( FWIW I wonder why in commit ad02b96a Luiz allowed DEBUG - SUSPENDED. As far as I understand, when the debugger is attached, the guest is not running, hence it can't go directly to RUN_STATE_SUSPENDED. Maybe due to a concurrent monitor command? I honestly don't remember if I took into account the explanation you gave below. You're right that a stopped guest can't suspend. Technically it does seem possible; from main_loop_should_exit(): if (qemu_debug_requested()) { vm_stop(RUN_STATE_DEBUG); } if (qemu_suspend_requested()) { qemu_system_suspend(); } Both requests could become pending during one iteration of the loop, and the next iteration will see both of them. OK. ) Laszlo
[Qemu-devel] [PATCH v5 06/11] block: add drive-backup QMP command
@drive-backup Start a point-in-time copy of a block device to a new destination. The status of ongoing drive-backup operations can be checked with query-block-jobs where the BlockJobInfo.type field has the value 'backup'. The operation can be stopped before it has completed using the block-job-cancel command. @device: the name of the device which should be copied. @target: the target of the new image. If the file exists, or if it is a device, the existing file/device will be used as the new destination. If it does not exist, a new file will be created. @format: #optional the format of the new destination, default is to probe if @mode is 'existing', else the format of the source @mode: #optional whether and how QEMU should create a new image, default is 'absolute-paths'. @speed: #optional the maximum speed, in bytes per second @on-source-error: #optional the action to take on an error on the source, default 'report'. 'stop' and 'enospc' can only be used if the block device supports io-status (see BlockInfo). @on-target-error: #optional the action to take on an error on the target, default 'report' (no limitations, since this applies to a different block device than @device). Note that @on-source-error and @on-target-error only affect background I/O. If an error occurs during a guest write request, the device's rerror/werror actions will be used. Returns: nothing on success If @device is not a valid block device, DeviceNotFound Since 1.6 Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- blockdev.c | 97 qapi-schema.json | 46 +++ qmp-commands.hx | 46 +++ 3 files changed, 189 insertions(+) diff --git a/blockdev.c b/blockdev.c index 73b175b..5de33f5 100644 --- a/blockdev.c +++ b/blockdev.c @@ -1346,6 +1346,103 @@ void qmp_block_commit(const char *device, drive_get_ref(drive_get_by_blockdev(bs)); } +void qmp_drive_backup(const char *device, const char *target, + bool has_format, const char *format, + bool has_mode, enum NewImageMode mode, + bool has_speed, int64_t speed, + bool has_on_source_error, BlockdevOnError on_source_error, + bool has_on_target_error, BlockdevOnError on_target_error, + Error **errp) +{ +BlockDriverState *bs; +BlockDriverState *target_bs; +BlockDriver *drv = NULL; +Error *local_err = NULL; +int flags; +int64_t size; +int ret; + +if (!has_speed) { +speed = 0; +} +if (!has_on_source_error) { +on_source_error = BLOCKDEV_ON_ERROR_REPORT; +} +if (!has_on_target_error) { +on_target_error = BLOCKDEV_ON_ERROR_REPORT; +} +if (!has_mode) { +mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS; +} + +bs = bdrv_find(device); +if (!bs) { +error_set(errp, QERR_DEVICE_NOT_FOUND, device); +return; +} + +if (!bdrv_is_inserted(bs)) { +error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device); +return; +} + +if (!has_format) { +format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : bs-drv-format_name; +} +if (format) { +drv = bdrv_find_format(format); +if (!drv) { +error_set(errp, QERR_INVALID_BLOCK_FORMAT, format); +return; +} +} + +if (bdrv_in_use(bs)) { +error_set(errp, QERR_DEVICE_IN_USE, device); +return; +} + +flags = bs-open_flags | BDRV_O_RDWR; + +size = bdrv_getlength(bs); +if (size 0) { +error_setg_errno(errp, -size, bdrv_getlength failed); +return; +} + +if (mode != NEW_IMAGE_MODE_EXISTING) { +assert(format drv); +bdrv_img_create(target, format, +NULL, NULL, NULL, size, flags, local_err, false); +} + +if (error_is_set(local_err)) { +error_propagate(errp, local_err); +return; +} + +target_bs = bdrv_new(); +ret = bdrv_open(target_bs, target, NULL, flags, drv); +if (ret 0) { +bdrv_delete(target_bs); +error_set(errp, QERR_OPEN_FILE_FAILED, target); +return; +} + +backup_start(bs, target_bs, speed, on_source_error, on_target_error, + block_job_cb, bs, local_err); +if (local_err != NULL) { +bdrv_delete(target_bs); +error_propagate(errp, local_err); +return; +} + +/* Grab a reference so hotplug does not delete the BlockDriverState from + * underneath us. + */ +drive_get_ref(drive_get_by_blockdev(bs)); +} + #define DEFAULT_MIRROR_BUF_SIZE (10 20) void qmp_drive_mirror(const char *device, const char *target, diff --git a/qapi-schema.json b/qapi-schema.json index ef1f657..55b1a37 100644 ---
[Qemu-devel] [PATCH v5 03/11] block: add basic backup support to block driver
From: Dietmar Maurer diet...@proxmox.com backup_start() creates a block job that copies a point-in-time snapshot of a block device to a target block device. We call backup_do_cow() for each write during backup. That function reads the original data from the block device before it gets overwritten. The data is then written to the target device. Currently backup cluster size is hardcoded to 65536 bytes. [I made a number of changes to Dietmar's original patch and folded them in to make code review easy. Here is the full list: * Drop BackupDumpFunc interface in favor of a target block device * Detect zero clusters with buffer_is_zero() and use bdrv_co_write_zeroes() * Use 0 delay instead of 1us, like other block jobs * Unify creation/start functions into backup_start() * Simplify cleanup, free bitmap in backup_run() instead of cb * function * Use HBitmap to avoid duplicating bitmap code * Use bdrv_getlength() instead of accessing -total_sectors * directly * Delete the backup.h header file, it is no longer necessary * Move ./backup.c to block/backup.c * Remove #ifdefed out code * Coding style and whitespace cleanups * Use bdrv_add_before_write_notifier() instead of blockjob-specific hooks * Keep our own in-flight CowRequest list instead of using block.c tracked requests. This means a little code duplication but is much simpler than trying to share the tracked requests list and use the backup block size. * Add on_source_error and on_target_error error handling. -- stefanha] Signed-off-by: Dietmar Maurer diet...@proxmox.com Signed-off-by: Stefan Hajnoczi stefa...@redhat.com backup size fixes Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- block/Makefile.objs | 1 + block/backup.c| 355 ++ include/block/block_int.h | 19 +++ 3 files changed, 375 insertions(+) create mode 100644 block/backup.c diff --git a/block/Makefile.objs b/block/Makefile.objs index 5f0358a..88bd101 100644 --- a/block/Makefile.objs +++ b/block/Makefile.objs @@ -20,5 +20,6 @@ endif common-obj-y += stream.o common-obj-y += commit.o common-obj-y += mirror.o +common-obj-y += backup.o $(obj)/curl.o: QEMU_CFLAGS+=$(CURL_CFLAGS) diff --git a/block/backup.c b/block/backup.c new file mode 100644 index 000..466744a --- /dev/null +++ b/block/backup.c @@ -0,0 +1,355 @@ +/* + * QEMU backup + * + * Copyright (C) 2013 Proxmox Server Solutions + * + * Authors: + * Dietmar Maurer (diet...@proxmox.com) + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + * + */ + +#include stdio.h +#include errno.h +#include unistd.h + +#include block/block.h +#include block/block_int.h +#include block/blockjob.h +#include qemu/ratelimit.h + +#define DEBUG_BACKUP 0 + +#define DPRINTF(fmt, ...) \ +do { \ +if (DEBUG_BACKUP) { \ +fprintf(stderr, backup: fmt, ## __VA_ARGS__); \ +} \ +} while (0) + +#define BACKUP_CLUSTER_BITS 16 +#define BACKUP_CLUSTER_SIZE (1 BACKUP_CLUSTER_BITS) +#define BACKUP_SECTORS_PER_CLUSTER (BACKUP_CLUSTER_SIZE / BDRV_SECTOR_SIZE) + +#define SLICE_TIME 1ULL /* ns */ + +typedef struct CowRequest { +int64_t start; +int64_t end; +QLIST_ENTRY(CowRequest) list; +CoQueue wait_queue; /* coroutines blocked on this request */ +} CowRequest; + +typedef struct BackupBlockJob { +BlockJob common; +BlockDriverState *target; +RateLimit limit; +BlockdevOnError on_source_error; +BlockdevOnError on_target_error; +CoRwlock flush_rwlock; +uint64_t sectors_read; +HBitmap *bitmap; +QLIST_HEAD(, CowRequest) inflight_reqs; +} BackupBlockJob; + +/* See if in-flight requests overlap and wait for them to complete */ +static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job, + int64_t start, + int64_t end) +{ +CowRequest *req; +bool retry; + +do { +retry = false; +QLIST_FOREACH(req, job-inflight_reqs, list) { +if (end req-start start req-end) { +qemu_co_queue_wait(req-wait_queue); +retry = true; +break; +} +} +} while (retry); +} + +/* Keep track of an in-flight request */ +static void cow_request_begin(CowRequest *req, BackupBlockJob *job, + int64_t start, int64_t end) +{ +req-start = start; +req-end = end; +qemu_co_queue_init(req-wait_queue); +QLIST_INSERT_HEAD(job-inflight_reqs, req, list); +} + +/* Forget about a completed request */ +static void cow_request_end(CowRequest *req) +{ +QLIST_REMOVE(req, list); +qemu_co_queue_restart_all(req-wait_queue); +} + +static int coroutine_fn backup_do_cow(BlockDriverState *bs, + int64_t sector_num, int
Re: [Qemu-devel] [PATCH v2 2/5] pci: store PCI hole ranges in guestinfo structure
On Thu, May 30, 2013 at 02:25:41PM +0200, Laszlo Ersek wrote: I can't offer any opinion about the values you put into w32 and w64, but I have some remarks. First, please consider passing -O/path/to/some/order_file to git-format-patch, so that .h files show up at the top of each patch. On 05/30/13 13:07, Michael S. Tsirkin wrote: Will be used to pass hole ranges to guests. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/i386/pc.c | 39 ++- hw/i386/pc_piix.c | 14 +- hw/i386/pc_q35.c | 6 +- hw/pci-host/q35.c | 4 include/hw/i386/pc.h | 19 ++- include/hw/pci-host/q35.h | 2 ++ include/qemu/typedefs.h | 1 + 7 files changed, 81 insertions(+), 4 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 4844a6b..c233161 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -978,6 +978,41 @@ void pc_cpus_init(const char *cpu_model, DeviceState *icc_bridge) } } +typedef struct PcGuestInfoState { +PcGuestInfo info; +Notifier machine_done; +} PcGuestInfoState; + +static +void pc_guest_info_machine_done(Notifier *notifier, void *data) +{ +PcGuestInfoState *guest_info_state = container_of(notifier, + PcGuestInfoState, + machine_done); +} + +PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size, +ram_addr_t above_4g_mem_size) +{ +PcGuestInfoState *guest_info_state = g_malloc0(sizeof *guest_info_state); +PcGuestInfo *guest_info = guest_info_state-info; + +guest_info-pci_info.w32.end = IO_APIC_DEFAULT_ADDRESS; +if (sizeof(hwaddr) == 4) { +guest_info-pci_info.w64.begin = 0; +guest_info-pci_info.w64.end = 0; +} else { +guest_info-pci_info.w64.begin = 0x1ULL + above_4g_mem_size; +guest_info-pci_info.w64.end = guest_info-pci_info.w64.begin + +(0x1ULL 62); +assert(guest_info-pci_info.w64.begin = guest_info-pci_info.w64.end); +} + +guest_info_state-machine_done.notify = pc_guest_info_machine_done; +qemu_add_machine_init_done_notifier(guest_info_state-machine_done); +return guest_info; +} + void pc_acpi_init(const char *default_dsdt) { char *filename; @@ -1019,7 +1054,8 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory, ram_addr_t below_4g_mem_size, ram_addr_t above_4g_mem_size, MemoryRegion *rom_memory, - MemoryRegion **ram_memory) + MemoryRegion **ram_memory, + PcGuestInfo *guest_info) { int linux_boot, i; MemoryRegion *ram, *option_rom_mr; @@ -1071,6 +1107,7 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory, for (i = 0; i nb_option_roms; i++) { rom_add_option(option_rom[i].name, option_rom[i].bootindex); } +guest_info-fw_cfg = fw_cfg; return fw_cfg; } I find this suboptimal style, ie. passing guest_info to pc_memory_init() just so that pc_memory_init() can set guest_info-fw_cfg to fw_cfg, when pc_memory_init() returns fw_cfg anyway. Well otherwise all callers have to remember to set it. More philosophically: diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index b4c8a74..1bf5219 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -9,8 +9,20 @@ #include net/net.h #include hw/i386/ioapic.h +#include qemu/range.h + /* PC-style peripherals (also used by other machines). */ +typedef struct PcPciInfo { +Range w32; +Range w64; +} PcPciInfo; + +struct PcGuestInfo { +PcPciInfo pci_info; +FWCfgState *fw_cfg; +}; Are you sure you need a private link to the global fw_cfg object? The pvpanic series added a global lookup possibility in commit 10a584b2 (object_resolve_path()). Anyway these are just subjective style notes. Yes. I personally prefer not using global lookups: passing a pointer makes dependencies clearer IMO. If we do switch to that, I think it's cleaner to have a wrapper so that everyone does not need to hard-code strings like /machine/fw_cfg. + /* parallel.c */ static inline bool parallel_init(ISABus *bus, int index, CharDriverState *chr) { @@ -80,6 +92,10 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int level); void pc_cpus_init(const char *cpu_model, DeviceState *icc_bridge); void pc_hot_add_cpu(const int64_t id, Error **errp); void pc_acpi_init(const char *default_dsdt); + +PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size, +ram_addr_t
[Qemu-devel] [PATCH v5 01/11] notify: add NotiferWithReturn so notifier list can abort
notifier_list_notify() has no return value. This is fine when we just want to invoke side-effects. Sometimes it's useful for notifiers to produce a return value. This allows notifiers to veto an operation and will be used by the block layer before-write notifier. Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- include/qemu/notify.h | 25 + util/notify.c | 30 ++ 2 files changed, 55 insertions(+) diff --git a/include/qemu/notify.h b/include/qemu/notify.h index 4e2e7f0..d3103e7 100644 --- a/include/qemu/notify.h +++ b/include/qemu/notify.h @@ -40,4 +40,29 @@ void notifier_remove(Notifier *notifier); void notifier_list_notify(NotifierList *list, void *data); +/* Same as Notifier but allows .notify() to return errors */ +typedef struct NotifierWithReturn NotifierWithReturn; + +struct NotifierWithReturn { +int (*notify)(NotifierWithReturn *notifier, void *data); +QLIST_ENTRY(NotifierWithReturn) node; +}; + +typedef struct NotifierWithReturnList { +QLIST_HEAD(, NotifierWithReturn) notifiers; +} NotifierWithReturnList; + +#define NOTIFIER_WITH_RETURN_LIST_INITIALIZER(head) \ +{ QLIST_HEAD_INITIALIZER((head).notifiers) } + +void notifier_with_return_list_init(NotifierWithReturnList *list); + +void notifier_with_return_list_add(NotifierWithReturnList *list, + NotifierWithReturn *notifier); + +void notifier_with_return_remove(NotifierWithReturn *notifier); + +int notifier_with_return_list_notify(NotifierWithReturnList *list, + void *data); + #endif diff --git a/util/notify.c b/util/notify.c index 7b7692a..f215dfc 100644 --- a/util/notify.c +++ b/util/notify.c @@ -39,3 +39,33 @@ void notifier_list_notify(NotifierList *list, void *data) notifier-notify(notifier, data); } } + +void notifier_with_return_list_init(NotifierWithReturnList *list) +{ +QLIST_INIT(list-notifiers); +} + +void notifier_with_return_list_add(NotifierWithReturnList *list, + NotifierWithReturn *notifier) +{ +QLIST_INSERT_HEAD(list-notifiers, notifier, node); +} + +void notifier_with_return_remove(NotifierWithReturn *notifier) +{ +QLIST_REMOVE(notifier, node); +} + +int notifier_with_return_list_notify(NotifierWithReturnList *list, void *data) +{ +NotifierWithReturn *notifier, *next; +int ret = 0; + +QLIST_FOREACH_SAFE(notifier, list-notifiers, node, next) { +ret = notifier-notify(notifier, data); +if (ret != 0) { +break; +} +} +return ret; +} -- 1.8.1.4
[Qemu-devel] [PATCH v5 00/11] block: drive-backup live backup command
Note: These patches apply to kevin/block. You can also grab the code from git here: git://github.com/stefanha/qemu.git block-backup-core This series adds a new QMP command, drive-backup, which takes a point-in-time snapshot of a block device. The snapshot is copied out to a target block device. A simple example is: drive-backup device=virtio0 format=qcow2 target=backup-20130401.qcow2 The original drive-backup blockjob was written by Dietmar Maurer diet...@proxmox.com. He is currently busy but I feel the feature is worth pushing into QEMU since there has been interest. This is my version of his patch, plus the QMP command and qemu-iotests test case. QMP 'transaction' support is included since v3. It adds support for atomic snapshots of multiple block devices. I also added an 'abort' transaction to allow testing of the .abort()/.cleanup() code path. Thanks to Wenchao for making qmp_transaction() extensible. How is this different from block-stream and drive-mirror? - Both block-stream and drive-mirror do not provide immediate point-in-time snapshots. Instead they copy data into a new file and then switch to it. In other words, the point at which the snapshot is taken cannot be controlled directly. drive-backup intercepts guest writes and saves data into the target block device before it is overwritten. The target block device can be a raw image file, backing files are not used to implement this feature. How can drive-backup be used? - The simplest use-case is to copy a point-in-time snapshot to a local file. More advanced users may wish to make the target an NBD URL. The NBD server listening on the other side can process the backup writes any way it wishes. I previously posted an RFC series with a backup server that streamed Dietmar's VMA backup archive format. What's next for drive-backup? - 1. Sync modes like drive-mirror (top, full, none). This makes it possible to preserve the backing file chain. v5: * Use bdrv_co_write_zeroes(job-target) [kwolf] This change means that we write zeroes over NBD again. The optimization can be reintroduced by skipping zeroes when bdrv_has_zero_init() is true and the sectors are allocated. Leave that for a future series, if we decide to do this optimization again because it may require extra block driver configuration to indicate that an image has zero init. * iostatus error handling [kwolf/pbonzini] Add configurable on-source-error and on-target-error actions just like drive-mirror. These are used when the block job coroutine hits an error. If we are in guest write request context, return the errno and let the usual guest error handling take over. * Allow BdrvActionOps-commit() to be NULL [eblake] * Use bdrv_getlength() in qmp_drive_mirror() [kwolf] * Drop redundant proto_drv check [kwolf] * Fix outdated DPRINTF() function names * Drop comment about non-existent bitmap coroutine race [kwolf] * Rename BACKUP_SECTORS_PER_CLUSTER [kwolf] * Fix completion when image is not multiple of backup cluster size [fam] v4: * Use notifier lists and BdrvTrackedRequest instead of custom callbacks [bonzini] * Add drive-backup QMP example JSON [eblake] * Add Since: 1.6 to QMP schema changes [eblake] v3: * Rename to drive-backup for consistency with drive-mirror [kwolf] * Add QMP transaction support [kwolf] * Introduce bdrv_add_before_write_cb() to hook writes * Mention 'query-block-jobs' lists job of type 'backup' [eblake] * Rename rwlock to flush_rwlock [kwolf] * Fix space in block/backup.c comment [kwolf] v2: * s/block_backup/block-backup/ in commit message [eblake] * Avoid funny spacing in QMP docs [eblake] * Document query-block-jobs and block-job-cancel usage [eblake] Dietmar Maurer (1): block: add basic backup support to block driver Stefan Hajnoczi (10): notify: add NotiferWithReturn so notifier list can abort block: add bdrv_add_before_write_notifier() blockdev: drop redundant proto_drv check blockdev: use bdrv_getlength() in qmp_drive_mirror() block: add drive-backup QMP command blockdev: rename BlkTransactionStates to singular blockdev: allow BdrvActionOps-commit() to be NULL blockdev: add DriveBackup transaction blockdev: add Abort transaction qemu-iotests: add 055 drive-backup test case block.c| 23 +-- block/Makefile.objs| 1 + block/backup.c | 355 + blockdev.c | 288 +++- include/block/block_int.h | 42 +- include/qemu/notify.h | 25 qapi-schema.json | 97 - qmp-commands.hx| 46 ++ tests/qemu-iotests/055 | 256 tests/qemu-iotests/055.out | 5 + tests/qemu-iotests/group | 1 + util/notify.c | 30 12 files
Re: [Qemu-devel] [Xen-devel] [PVSCSI]How to unplug scsi disk simulated by Qemu, just like unplug the ide disk?
On Thu, May 30, 2013 at 11:23:44AM +, Gonglei (Arei) wrote: Hi all, My environment is xen-4.1.2 + qemu-1.2.2 I made a pvscsi driver for Redhat guest, but I encountered a problem that I could see two scsi disks, one was simulated by QEMU, another was passthrough. Actually I want to unplug the scsi disk simulated. Any methods can solve the problem on the qemu upstream. Thanks! Are you actually using Xen *PVSCSI* (http://wiki.xen.org/wiki/Paravirtualized_SCSI), or are you talking about Xen HVM guest with PV drivers for disk? -- Pasi
Re: [Qemu-devel] [RFC] Check backing_file chain's loop
On Thu, May 30, 2013 at 06:12:00AM -0600, Eric Blake wrote: On 05/30/2013 04:14 AM, Dongxu Wang wrote: Hi, Now block layer does not check whether backing_file chain can be a circle, do you think it is necessary to do a check? For example, 1.qcow2's backing_file is 2.qcow2 and 2.qcow2's backing_file is 1.qcow2, then any IO operation will be a fault. If necessary, I will post a patch to fix. Yes, it's worth patching. [$] qemu-img create -f qcow2 1.qcow2 qemu-img: Image creation needs a size parameter [$] qemu-img create -f qcow2 1.qcow2 8M Formatting '1.qcow2', fmt=qcow2 size=8388608 encryption=off cluster_size=65536 lazy_refcounts=off [$] qemu-img create -f qcow2 2.qcow2 -o backing_file=1.qcow2 Formatting '2.qcow2', fmt=qcow2 size=8388608 backing_file='1.qcow2' encryption=off cluster_size=65536 lazy_refcounts=off [$] qemu-img create -f qcow2 1.qcow2 -o backing_file=2.qcow2 Formatting '1.qcow2', fmt=qcow2 size=8388608 backing_file='2.qcow2' encryption=off cluster_size=65536 lazy_refcounts=off [$] qemu-io -c read -v 0 512 1.qcow2 ^CSegmentation fault (core dumped) Also make sure that 'qemu-img info 1.qcow2' and 'qemu-img info --backing-chain 1.qcow2' don't crash. qemu-img info --backing-chain detects cycles and returns an error, see the hash table in collect_image_info_list(). We should protect bdrv_open() too. Stefan
Re: [Qemu-devel] [PATCH v2 2/5] pci: store PCI hole ranges in guestinfo structure
On Thu, May 30, 2013 at 02:32:01PM +0200, Gerd Hoffmann wrote: On 05/30/13 14:19, Michael S. Tsirkin wrote: On Thu, May 30, 2013 at 02:16:13PM +0200, Gerd Hoffmann wrote: Hi, +} else { +guest_info-pci_info.w64.begin = 0x1ULL + above_4g_mem_size; +guest_info-pci_info.w64.end = guest_info-pci_info.w64.begin + +(0x1ULL 62); Doesn't this give unaligned windows? PCI Bridge windows do not need to be size aligned. In any case, the windows are *exactly* as calculated by seabios - apparently it does not size-align windows either. Surely not. SeaBIOS sizes the 64bit window according to the space needed by the 64bit bars it wants to map there. Ah, it's 64 bit. True. That's a seabios bug by the way: if we add more devices by hotplug later, we want more pci memory. +/* Set PCI window size the way seabios has always done it. */ +/* TODO: consider just starting at below_4g_mem_size */ Used to be that way. Was changed for alignment reasons (i.e. 1G window starts at 1G border etc). Where's the alignment requirement coming from? seabios creates a mtrr entry for the window, which doesn't work in case it isn't aligned (at least not with a single entry). Also real hardware tends to do it this way. cheers, Gerd I see. I'll figure out the details and add a comment to this end. But that's for the 32 bit window - I don't see it playing with mtrrs for the 64 bit ranges. So I'm guessing alignment isn't needed there, right? -- MST
Re: [Qemu-devel] [PATCH] walk_pml4e(): fix abort on bad PML4E/PDPTE/PDE/PTE addresses
On Tue, 28 May 2013 14:19:22 -0400 Luiz Capitulino lcapitul...@redhat.com wrote: The code used to walk IA-32e page-tables, and possibly PAE page-tables, uses the bit mask ~0xfff to get the next PML4E/PDPTE/PDE/PTE address. However, as we use a uint64_t to store the resulting address, that mask gets expanded to 0xf000 which not only ends up selecting reserved bits but also selects the XD bit (execute-disable) which happens to be enabled by Windows 8, causing qemu_get_ram_ptr() to abort. This commit fixes that problem by replacing ~0xfff by a correct mask that only selects the address bit range (ie. bits 51:12). Signed-off-by: Luiz Capitulino lcapitul...@redhat.com Ping? Wen? Would be nice get a Reviewed-by before merging... --- PS: I (obviously) don't any more core dumps with this patch applied, but I couldn't check if the Windows dump is correct (does anyone know how to do this?). I did quickly check on Linux though. target-i386/arch_memory_mapping.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/target-i386/arch_memory_mapping.c b/target-i386/arch_memory_mapping.c index 844893f..24884bd 100644 --- a/target-i386/arch_memory_mapping.c +++ b/target-i386/arch_memory_mapping.c @@ -75,6 +75,8 @@ static void walk_pte2(MemoryMappingList *list, } /* PAE Paging or IA-32e Paging */ +#define PLM4_ADDR_MASK 0xff000 /* selects bits 51:12 */ + static void walk_pde(MemoryMappingList *list, hwaddr pde_start_addr, int32_t a20_mask, target_ulong start_line_addr) { @@ -105,7 +107,7 @@ static void walk_pde(MemoryMappingList *list, hwaddr pde_start_addr, continue; } -pte_start_addr = (pde ~0xfff) a20_mask; +pte_start_addr = (pde PLM4_ADDR_MASK) a20_mask; walk_pte(list, pte_start_addr, a20_mask, line_addr); } } @@ -208,7 +210,7 @@ static void walk_pdpe(MemoryMappingList *list, continue; } -pde_start_addr = (pdpe ~0xfff) a20_mask; +pde_start_addr = (pdpe PLM4_ADDR_MASK) a20_mask; walk_pde(list, pde_start_addr, a20_mask, line_addr); } } @@ -231,7 +233,7 @@ static void walk_pml4e(MemoryMappingList *list, } line_addr = ((i 0x1ffULL) 39) | (0xULL 48); -pdpe_start_addr = (pml4e ~0xfff) a20_mask; +pdpe_start_addr = (pml4e PLM4_ADDR_MASK) a20_mask; walk_pdpe(list, pdpe_start_addr, a20_mask, line_addr); } } @@ -249,7 +251,7 @@ int cpu_get_memory_mapping(MemoryMappingList *list, CPUArchState *env) if (env-hflags HF_LMA_MASK) { hwaddr pml4e_addr; -pml4e_addr = (env-cr[3] ~0xfff) env-a20_mask; +pml4e_addr = (env-cr[3] PLM4_ADDR_MASK) env-a20_mask; walk_pml4e(list, pml4e_addr, env-a20_mask); } else #endif
Re: [Qemu-devel] [PATCH v19 4/7] pvpanic: pass configurable ioport to seabios
On Thu, Apr 18, 2013 at 10:14:43AM +0800, Hu Tao wrote: This lets seabios patch the corresponding SSDT entry. Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- hw/misc/pvpanic.c | 14 ++ hw/nvram/fw_cfg.c | 8 +++- include/hw/nvram/fw_cfg.h | 2 ++ 3 files changed, 23 insertions(+), 1 deletion(-) diff --git a/hw/misc/pvpanic.c b/hw/misc/pvpanic.c index c3adcdf..23599a3 100644 --- a/hw/misc/pvpanic.c +++ b/hw/misc/pvpanic.c @@ -18,6 +18,8 @@ #include sysemu/sysemu.h #include sysemu/kvm.h +#include hw/nvram/fw_cfg.h + /* The bit of supported pv event */ #define PVPANIC_F_PANICKED 0 @@ -86,10 +88,22 @@ static const MemoryRegionOps pvpanic_ops = { static int pvpanic_isa_initfn(ISADevice *dev) { PVPanicState *s = ISA_PVPANIC_DEVICE(dev); +static bool port_configured; +void *fw_cfg; memory_region_init_io(s-io, pvpanic_ops, s, pvpanic, 1); isa_register_ioport(dev, s-io, s-ioport); +if (!port_configured) { +fw_cfg = object_resolve_path(/machine/fw_cfg, NULL); +if (fw_cfg) { +fw_cfg_add_file(fw_cfg, etc/pvpanic-port, +g_memdup(s-ioport, sizeof(s-ioport)), +sizeof(s-ioport)); +port_configured = true; +} +} + return 0; } Sorry about missing this when this was posted/applied. Wouldn't it make more sense to have /machine/fw_cfg string in a header. diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c index 97bba87..1a7e49c 100644 --- a/hw/nvram/fw_cfg.c +++ b/hw/nvram/fw_cfg.c @@ -489,11 +489,17 @@ FWCfgState *fw_cfg_init(uint32_t ctl_port, uint32_t data_port, dev = qdev_create(NULL, fw_cfg); qdev_prop_set_uint32(dev, ctl_iobase, ctl_port); qdev_prop_set_uint32(dev, data_iobase, data_port); -qdev_init_nofail(dev); d = SYS_BUS_DEVICE(dev); s = DO_UPCAST(FWCfgState, busdev.qdev, dev); +if (!object_resolve_path(/machine/fw_cfg, NULL)) { Why is this tested here? Should be an assert(!object_resolve_path(/machine/fw_cfg, NULL)) surely? +object_property_add_child(qdev_get_machine(), fw_cfg, OBJECT(s), + NULL); +} + +qdev_init_nofail(dev); + if (ctl_addr) { sysbus_mmio_map(d, 0, ctl_addr); } diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h index 05c8df1..07cc941 100644 --- a/include/hw/nvram/fw_cfg.h +++ b/include/hw/nvram/fw_cfg.h @@ -1,6 +1,8 @@ #ifndef FW_CFG_H #define FW_CFG_H +#include exec/hwaddr.h + #define FW_CFG_SIGNATURE0x00 #define FW_CFG_ID 0x01 #define FW_CFG_UUID 0x02 -- 1.8.1.4
Re: [Qemu-devel] broken incoming migration
Il 30/05/2013 11:08, Peter Lieven ha scritto: Am 30.05.2013 10:18, schrieb Alexey Kardashevskiy: On 05/30/2013 05:49 PM, Paolo Bonzini wrote: Il 30/05/2013 09:44, Alexey Kardashevskiy ha scritto: Hi! I found the migration broken on pseries platform, specifically, this patch broke it: f1c72795af573b24a7da5eb52375c9aba8a37972 migration: do not sent zero pages in bulk stage The idea is not to send zero pages to the destination guest which is expected to have 100% empty RAM. However on pseries plaftorm the guest always has some stuff in the RAM as a part of initialization (device tree, system firmware and rtas (?)) so it is not completely empty. As the source guest cannot detect this, it skips some pages during migration and we get a broken destination guest. Bug. While the idea is ok in general, I do not see any easy way to fix it as neither QEMUMachine::init nor QEMUMachine::reset callbacks has information about whether we are about to receive a migration or not (-incoming parameter) and we cannot move device-tree and system firmware initialization anywhere else. ram_bulk_stage is static and cannot be disabled from the platform initialization code. So what would the community suggest? Revert the patch. :) I'll wait for 24 hours (forgot to cc: the author) and then post a revert patch :) does this problem only occur on pseries emulation? Probably not. On a PC, it would occur if you had 4K of zeros in the source BIOS but not in the destination BIOS. When you reboot, the BIOS image is wrong. not sending zero pages is not only a performance benefit it also makes overcomitted memory usable. the madv_dontneed seems to kick in asynchronously and memory is not available immediately. You could also scan the page for nonzero values before writing it. Paolo
Re: [Qemu-devel] [PATCH 0/2] fixup some fallback from tree reorg
Il 30/05/2013 11:30, Michael S. Tsirkin ha scritto: Here are some patches fixing up minor issues in the tree reorganization. Michael S. Tsirkin (2): dec.c - move to pci-bridge firmware_abi: move to include/hw/nvram/ hw/pci-bridge/Makefile.objs | 2 + hw/pci-bridge/dec.c | 156 +++ hw/pci-bridge/dec.h | 10 ++ hw/pci-host/Makefile.objs| 1 - hw/pci-host/dec.c| 156 --- hw/pci-host/dec.h| 10 -- hw/sparc/sun4m.c | 2 +- hw/sparc64/sun4u.c | 2 +- include/hw/nvram/openbios_firmware_abi.h | 73 +++ include/hw/sparc/firmware_abi.h | 73 --- 10 files changed, 243 insertions(+), 242 deletions(-) create mode 100644 hw/pci-bridge/dec.c create mode 100644 hw/pci-bridge/dec.h delete mode 100644 hw/pci-host/dec.c delete mode 100644 hw/pci-host/dec.h create mode 100644 include/hw/nvram/openbios_firmware_abi.h delete mode 100644 include/hw/sparc/firmware_abi.h Thanks. Please give Blue a couple of days to comment. Paolo
Re: [Qemu-devel] [PATCH 0/2] fixup some fallback from tree reorg
On Thu, May 30, 2013 at 03:04:39PM +0200, Paolo Bonzini wrote: Il 30/05/2013 11:30, Michael S. Tsirkin ha scritto: Here are some patches fixing up minor issues in the tree reorganization. Michael S. Tsirkin (2): dec.c - move to pci-bridge firmware_abi: move to include/hw/nvram/ hw/pci-bridge/Makefile.objs | 2 + hw/pci-bridge/dec.c | 156 +++ hw/pci-bridge/dec.h | 10 ++ hw/pci-host/Makefile.objs| 1 - hw/pci-host/dec.c| 156 --- hw/pci-host/dec.h| 10 -- hw/sparc/sun4m.c | 2 +- hw/sparc64/sun4u.c | 2 +- include/hw/nvram/openbios_firmware_abi.h | 73 +++ include/hw/sparc/firmware_abi.h | 73 --- 10 files changed, 243 insertions(+), 242 deletions(-) create mode 100644 hw/pci-bridge/dec.c create mode 100644 hw/pci-bridge/dec.h delete mode 100644 hw/pci-host/dec.c delete mode 100644 hw/pci-host/dec.h create mode 100644 include/hw/nvram/openbios_firmware_abi.h delete mode 100644 include/hw/sparc/firmware_abi.h Thanks. Please give Blue a couple of days to comment. Paolo I always do this: pci tree tip is documented as being rebased so I drop stuff from there if there are nacks.
Re: [Qemu-devel] [PATCH v3 4/5] xilinx_devcfg: Zynq devcfg device model
Il 30/05/2013 09:15, Peter Crosthwaite ha scritto: Hi Peter, Should we be putting r-access-name here instead of devcfg-regs? Yes, that's why I preferred to wrap the memory_region_init_io into an API that takes a RegisterInfo. :) ACK, You've convinced me :). Will be in v4 (pending outcome of discussion with Anthony RE decoding) I would also have preferred (I was too terse in mentioning .accepts.valid earlier so this is the less concise explanation) an API that does a single memory_region_init_io region for a whole array of registers. Basically having a RegisterArrayInfo in addition to the RegisterInfo. Paolo