date:20130530

Re: [Qemu-devel] [PATCH] pci: change typename of q35 to pci-q35

2013-05-30 Thread Michael S. Tsirkin

On Thu, May 30, 2013 at 10:47:48AM +0800, Amos Kong wrote:
 In QEMU, we set the firmware name of pc-i440fx is 'pci', and set the
 firmware name of q35 is 'q35-pcihost'.
 
 The seabios pattern matching code isn't sophisticated, and we want to
 match those two devices with exact pattern.
 
 This patch changes the typename of q35 to 'pci-q35', all pci devices
 use same prefix 'pci' in fireware name. And I will change pattern in
 seabios to /pci*@i0cf8.
 
 Signed-off-by: Amos Kong ak...@redhat.com

Let's just not leak name to the guest - there's no reason to.

Won't the following address the issue without bios changes?
Maybe we should find some way to set fw_name in hw/pci/pci_host.c,
worth looking into.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

---


diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index 8467f86..24df6b5 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -76,6 +76,7 @@ static void q35_host_class_init(ObjectClass *klass, void 
*data)
 
 k-init = q35_host_init;
 dc-props = mch_props;
+dc-fw_name = pci;
 }
 
 static void q35_host_initfn(Object *obj)

Re: [Qemu-devel] [SeaBIOS] KVM call agenda for 2013-05-28

2013-05-30 Thread Gerd Hoffmann

  Hi,

 Raised
 that QOM interface should be sufficient.

 Agree on this one.  Ideally the acpi table generation code should be
 able to gather all information it needs from the qom tree, so it can be
 a standalone C file instead of being scattered over all qemu.
 
 Ack.  So my basic argument is why not expose the QOM interfaces to
 firmware and move the generation code there?  Seems like it would be
 more or less a copy/paste once we had a proper implementation in QEMU.

Well, no.  Firmware is a quite simple environment without standard libc
etc, so moving code from qemu to firmware certainly isn't as easy as
copying over a file.

 There were discussions on potentially introducing a middle component
 to generate the tables.  Coreboot was raised as a possibility, and
 David thought it would be okay to use coreboot for both OVMF and
 SeaBIOS.

 Certainly an option, but that is a long-term project.
 
 Out of curiousity, are there other benefits to using coreboot as a core
 firmware in QEMU?

Short-term it's alot of work as we have to bring coreboot's qemu support
to feature parity with seabios.  I suspect most of this is acpi related
though, so when qemu provides the tables and coreboot uses them we could
be pretty close already.

Long-term it should simplify firmware maintainance as we have only *one*
place which handles the hardware bringup, and seabios/ovmf have less
work to do.

 Is there a payload we would ever plausibly use besides OVMF and SeaBIOS?

I wouldn't be surprised if people start using other coreboot payloads
and/or features such as direct linux kernel boot once it works well on qemu.

We might even run qemu test suites as coreboot payload.

cheers,
  Gerd

Re: [Qemu-devel] updated: kvm networking todo wiki

2013-05-30 Thread Stefan Hajnoczi

On Thu, May 30, 2013 at 7:23 AM, Rusty Russell ru...@rustcorp.com.au wrote:
 Anthony Liguori anth...@codemonkey.ws writes:
 Rusty Russell ru...@rustcorp.com.au writes:
 On Fri, May 24, 2013 at 08:47:58AM -0500, Anthony Liguori wrote:
 FWIW, I think what's more interesting is using vhost-net as a networking
 backend with virtio-net in QEMU being what's guest facing.

 In theory, this gives you the best of both worlds: QEMU acts as a first
 line of defense against a malicious guest while still getting the
 performance advantages of vhost-net (zero-copy).

 It would be an interesting idea if we didn't already have the vhost
 model where we don't need the userspace bounce.

 The model is very interesting for QEMU because then we can use vhost as
 a backend for other types of network adapters (like vmxnet3 or even
 e1000).

 It also helps for things like fault tolerance where we need to be able
 to control packet flow within QEMU.

 (CC's reduced, context added, Dmitry Fleytman added for vmxnet3 thoughts).

 Then I'm really confused as to what this would look like.  A zero copy
 sendmsg?  We should be able to implement that today.

 On the receive side, what can we do better than readv?  If we need to
 return to userspace to tell the guest that we've got a new packet, we
 don't win on latency.  We might reduce syscall overhead with a
 multi-dimensional readv to read multiple packets at once?

Sounds like recvmmsg(2).

Stefan

Re: [Qemu-devel] KVM call agenda for 2013-05-28

2013-05-30 Thread Gerd Hoffmann

  Hi,

 Why should this be true?  Shouldn't we be allowed to increase the amount
 of memory the guest has across reboots?  That's equivalent to adding
 another DIMM after power off.

poweroff is equivalent to exiting qemu, not to guest reset.

 Not generating tables on reset does limit what we can do in a pretty
 fundamental way.  Even if you can argue it in the short term, I don't
 think it's viable in the long term.

I don't think so.  The procedure for adding/removing non-hotpluggable
hardware is:  poweroff, plugin/-out hardware (change config in qemu),
boot.  Hotpluggable hardware doesn't need acpi table updates.

cheers,
  Gerd

Re: [Qemu-devel] snabbswitch integration with QEMU for userspace ethernet I/O

2013-05-30 Thread Stefan Hajnoczi

On Wed, May 29, 2013 at 6:02 PM, Julian Stecklina
jstec...@os.inf.tu-dresden.de wrote:
 On 05/29/2013 04:21 PM, Stefan Hajnoczi wrote:
 The fact that a single switch process has shared memory access to all
 guests' RAM is critical.  If the switch process is exploited, then that
 exposes other guests' data!  (Think of a multi-tenant host with guests
 belonging to different users.)

 True. But people don't mind having instruction decoding and half of
 virtio in the kernel these days, so it can't be that security critical...

No, it's still security critical.  If there were equivalent solutions
with better security then I'm sure people would accept them.  It's
just that there isn't an equivalent solution yet :).

Stefan

Re: [Qemu-devel] snabbswitch integration with QEMU for userspace ethernet I/O

2013-05-30 Thread Michael S. Tsirkin

On Thu, May 30, 2013 at 08:46:42AM +0200, Stefan Hajnoczi wrote:
 On Wed, May 29, 2013 at 6:02 PM, Julian Stecklina
 jstec...@os.inf.tu-dresden.de wrote:
  On 05/29/2013 04:21 PM, Stefan Hajnoczi wrote:
  The fact that a single switch process has shared memory access to all
  guests' RAM is critical.  If the switch process is exploited, then that
  exposes other guests' data!  (Think of a multi-tenant host with guests
  belonging to different users.)
 
  True. But people don't mind having instruction decoding and half of
  virtio in the kernel these days, so it can't be that security critical...
 
 No, it's still security critical.  If there were equivalent solutions
 with better security then I'm sure people would accept them.  It's
 just that there isn't an equivalent solution yet :).
 
 Stefan

Some people would accept them.
Others run with selinux off ...

-- 
MST

Re: [Qemu-devel] [PATCH v2] monitor: work around delayed CHR_EVENT_OPENED events

2013-05-30 Thread Stefan Priebe - Profihost AG

Am 30.05.2013 00:29, schrieb mdroth:
 On Wed, May 29, 2013 at 01:27:33PM -0400, Luiz Capitulino wrote:
 On Mon, 27 May 2013 12:59:25 -0500
 mdroth mdr...@linux.vnet.ibm.com wrote:

 On Mon, May 27, 2013 at 11:16:01AM -0500, Anthony Liguori wrote:
 Luiz Capitulino lcapitul...@redhat.com writes:

 On Sun, 26 May 2013 10:33:39 -0500
 Michael Roth mdr...@linux.vnet.ibm.com wrote:

 In the past, CHR_EVENT_OPENED events were emitted via a pre-expired
 QEMUTimer. Due to timers being processing at the tail end of each main
 loop iteration, this generally meant that such events would be emitted
 within the same main loop iteration, prior any client data being read
 by tcp/unix socket server backends.

 With 9f939df955a4152aad69a19a77e0898631bb2c18, this was switched to
 a bottom-half that is registered via g_idle_add(). This makes it
 likely that the event won't be sent until a subsequent iteration, and
 also adds the possibility that such events will race with the
 processing of client data.

 In cases where we rely on the CHR_EVENT_OPENED event being delivered
 prior to any client data being read, this may cause unexpected behavior.

 In the case of a QMP monitor session using a socket backend, the delayed
 processing of the CHR_EVENT_OPENED event can lead to a situation where
 a previous session, where 'qmp_capabilities' has already processed, can
 cause the rejection of 'qmp_capabilities' for a subsequent session,
 since we reset capabilities in response to CHR_EVENT_OPENED, which may
 not yet have been delivered. This can be reproduced with the following
 command, generally within 50 or so iterations:

   mdroth@loki:~$ cat cmds
   {'execute':'qmp_capabilities'}
   {'execute':'query-cpus'}
   mdroth@loki:~$ while true; do if socat stdio unix-connect:/tmp/qmp.sock
   cmds | grep CommandNotFound /dev/null; then echo failed; break; else
   echo ok; fi; done;
   ok
   ok
   failed
   mdroth@loki:~$

 As a work-around, we reset capabilities as part of the CHR_EVENT_CLOSED
 hook, which gets called as part of the GIOChannel cb associated with the
 client rather than a bottom-half, and is thus guaranteed to be delivered
 prior to accepting any subsequent client sessions.

 This does not address the more general problem of whether or not there
 are chardev users that rely on CHR_EVENT_OPENED being delivered prior to
 client data, and whether or not we should consider moving 
 CHR_EVENT_OPENED
 in-band with connection establishment as a general solution, but fixes
 QMP for the time being.

 Reported-by: Stefan Priebe s.pri...@profihost.ag
 Cc: qemu-sta...@nongnu.org
 Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com

 Thanks for debugging this Michael. I'm going to apply your patch to the 
 qmp
 branch because we can't let this broken (esp. in -stable) but this is 
 papering
 over the real bug...

 Do we really need OPENED to happen in a bottom half?  Shouldn't the open
 event handlers register the callback instead if they need it?

 I think that's the more general fix, but wasn't sure if there were
 historical reasons why we didn't do this in the first place.

 Looking at it more closely it does seem like we need a general fix
 sooner rather than later though, and I don't see any reason why we can't
 move BH registration into the actual OPENED hooks as you suggest:

 hw/usb/ccid-card-passthru.c
  - currently affected? no
 debug hook only
  - needs OPENED BH? no

 hw/usb/redirect.c
  - currently affected? yes
 can_read handler checks for dev-parser != NULL, which may be
 true if CLOSED BH has been executed yet. In the past, OPENED
 quiesced outstanding CLOSED events prior to us reading client data.
  - need OPENED BH? possibly
 seems to only be needed by CLOSED events, which can be issued by
 USBDevice, so we do teardown/usb_detach in BH. For OPENED, this
 may not apply. if we do issue a BH, we'd need to modify can_read
 handler to avoid race.

 hw/usb/dev-serial.c
  - currently affected? no
 can_read checks for dev.attached != NULL. set to NULL
 synchronously in CLOSED, will not accept reads until OPENED gets
 called and sets dev.attached
  - need opened BH?  no

 hw/char/sclpconsole.c
  - currently affected? no
 guarded by can_read handler
  - need opened BH? no

 hw/char/ipoctal232.c
  - currently affected? no
 debug hook only
  - need opened BH? no

 hw/char/virtio-console.c
  - currently affected? no
 OPENED/CLOSED map to vq events handled asynchronously. can_read
 checks for guest_connected state rather than anything set by OPENED
  - need opened BH? no

 qtest.c
  - currently affected? yes
 can_read handler doesn't check for qtest_opened == true, can read
 data before OPENED event is processed
  - need opened BH? no

 monitor.c
  - current affected? yes
 negotiated session state can temporarilly leak into a new session
  - need opened BH? no

 gdbstub.c
  - currently affected? yes
 can fail to pause machine prior to starting gdb

Re: [Qemu-devel] [snabb-devel:308] Re: snabbswitch integration with QEMU for userspace ethernet I/O

2013-05-30 Thread Luke Gorrie

On 30 May 2013 08:46, Stefan Hajnoczi stefa...@gmail.com wrote:

 No, it's still security critical.  If there were equivalent solutions
 with better security then I'm sure people would accept them.  It's
 just that there isn't an equivalent solution yet :).


Security-wise this is where I would like to be in the long term:

1. Userspace switch running non-root.
2. Shared memory mappings with VMs for exactly the memory that will be used
for packet buffers. (They could map it from me...)
3. IOMMU access to exactly the PCI devices under the switch's control.

Again my perspective is pretty hardware-centric rather than kernel-centric
i.e. I'm thinking more in terms of MMU/IOMMU than Unix system calls and
permissions.

Short-term agenda is to build something good enough that people will _want_
to spend the energy to make it highly secure (and highly portable, etc). So
I run as root and use every trick in the sysfs book.

Re: [Qemu-devel] [PATCH v3 4/5] xilinx_devcfg: Zynq devcfg device model

2013-05-30 Thread Peter Crosthwaite

On Thu, May 30, 2013 at 3:08 AM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 29/05/2013 19:04, Edgar E. Iglesias ha scritto:
  +for (i = 0; i  R_MAX; ++i) {
  +RegisterInfo *r = s-regs_info[i];
  +
  +*r = (RegisterInfo) {
  +.data = s-regs[i],
  +.data_size = sizeof(uint32_t),
  +.access = xilinx_devcfg_regs_info[i],
  +.debug = XILINX_DEVCFG_ERR_DEBUG,
  +.prefix = prefix,
  +.opaque = s,
  +};
  +memory_region_init_io(r-mem, devcfg_reg_ops, r, 
  devcfg-regs, 4);
 Hi Peter, Should we be putting r-access-name here instead of devcfg-regs?

 Yes, that's why I preferred to wrap the memory_region_init_io into an
 API that takes a RegisterInfo. :)

ACK,

You've convinced me :). Will be in v4 (pending outcome of discussion
with Anthony RE decoding)

Regards,
Peter


 Paolo

Re: [Qemu-devel] [PATCH 3/3] ide-test: Add FLUSH CACHE test case

2013-05-30 Thread Stefan Hajnoczi

On Wed, May 29, 2013 at 01:34:06PM +0200, Kevin Wolf wrote:
 +/* Check registers */
 +data = inb(IDE_BASE + reg_device);
 +g_assert_cmpint(data  0x10, ==, 0);

assert_bit_clear() with a constant instead of the 0x10 magic number?

Re: [Qemu-devel] [PATCH 0/3] ide: Set BSY bit during FLUSH

2013-05-30 Thread Stefan Hajnoczi

On Wed, May 29, 2013 at 01:34:03PM +0200, Kevin Wolf wrote:
 The test case depends on the qemu-io series I sent yesterday.
 ([PATCH 00/16] Make qemu-io commands available in the monitor)
 
 Andreas Färber (1):
   ide: Set BSY bit during FLUSH
 
 Kevin Wolf (2):
   blkdebug: Add BLKDBG_FLUSH_TO_OS/DISK events
   ide-test: Add FLUSH CACHE test case
 
  block.c   |  8 
  block/blkdebug.c  |  3 +++
  hw/ide/core.c |  3 +++
  include/block/block.h |  3 +++
  tests/ide-test.c  | 40 
  5 files changed, 53 insertions(+), 4 deletions(-)

Looks good.  I really like the test case.

Posted a minor comment on the test case.

Stefan

[Qemu-devel] [QEMU PATCH v2] pci: set firmware name of q35 to pci

2013-05-30 Thread Amos Kong

In QEMU, we set the firmware name of pc-i440fx to 'pci', and the
typename 'q35-pcihost' is set to the firmware name of q35 by default.

The seabios matches pci devices by /pci/@i0cf8, so q35 device
could not be identified, seabios fails to adjust the boot priority
of q35 devices.

This patch sets firmware name of q35 to 'pci'.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Amos Kong ak...@redhat.com
---
v2: just set the fw_name, no need to change seabios
---
 hw/pci-host/q35.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index 8467f86..24df6b5 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -76,6 +76,7 @@ static void q35_host_class_init(ObjectClass *klass, void 
*data)
 
 k-init = q35_host_init;
 dc-props = mch_props;
+dc-fw_name = pci;
 }
 
 static void q35_host_initfn(Object *obj)
-- 
1.8.1.4

Re: [Qemu-devel] [PATCH] pci: change typename of q35 to pci-q35

2013-05-30 Thread Amos Kong

On Thu, May 30, 2013 at 09:09:05AM +0300, Michael S. Tsirkin wrote:
 On Thu, May 30, 2013 at 10:47:48AM +0800, Amos Kong wrote:
  In QEMU, we set the firmware name of pc-i440fx is 'pci', and set the
  firmware name of q35 is 'q35-pcihost'.
  
  The seabios pattern matching code isn't sophisticated, and we want to
  match those two devices with exact pattern.
  
  This patch changes the typename of q35 to 'pci-q35', all pci devices
  use same prefix 'pci' in fireware name. And I will change pattern in
  seabios to /pci*@i0cf8.
  
  Signed-off-by: Amos Kong ak...@redhat.com
 
 Let's just not leak name to the guest - there's no reason to.
 
 Won't the following address the issue without bios changes?
 Maybe we should find some way to set fw_name in hw/pci/pci_host.c,
 worth looking into.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 
 ---
 
 
 diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
 index 8467f86..24df6b5 100644
 --- a/hw/pci-host/q35.c
 +++ b/hw/pci-host/q35.c
 @@ -76,6 +76,7 @@ static void q35_host_class_init(ObjectClass *klass, void 
 *data)
  
  k-init = q35_host_init;
  dc-props = mch_props;
 +dc-fw_name = pci;

It's a light change, works with current seabios.

  }
  
  static void q35_host_initfn(Object *obj)

-- 
Amos.

[Qemu-devel] [PATCH v3 0/2] i.MX: Improve GPT driver implementation

2013-05-30 Thread Jean-Christophe DUBOIS

This patch is improving the completness of the GPT timer implementation.

It adds compare 2 and 3 register support to the already exiting compare 1
register.
This patch is also moving to a more modern/robust/abstract implementation
Last a global more meaningfull naming is applied.

Note: We still don't yet support the input register feature allowing to
  times stamp a level change on a GPIO.

Jean-Christophe DUBOIS (2):
  i.MX: Implement a more complete version of the GPT timer.
  i.MX: Rework functions/types name and use new styme initialization

 hw/timer/imx_gpt.c | 558 +++--
 1 file changed, 325 insertions(+), 233 deletions(-)

-- 
1.8.1.2

[Qemu-devel] [PATCH v3 2/2] i.MX: Rework functions/types name and use new styme initialization

2013-05-30 Thread Jean-Christophe DUBOIS

* use dynamic cast whenever possible.
* Change function names to some more meanigfull prefix
* Change type names to a more meanigfull one
* use new style device intialization.

Signed-off-by: Jean-Christophe DUBOIS j...@tribudubois.net
---

Change since v1:
* The patch has been divided in 2 sub-patches. One that deals with the added
  features and one dealing with the renaming (this one).

Change since v2:
* use a shorter prefix
* move new style initialization and dynamic cast to this patch.

 hw/timer/imx_gpt.c | 171 ++---
 1 file changed, 84 insertions(+), 87 deletions(-)

diff --git a/hw/timer/imx_gpt.c b/hw/timer/imx_gpt.c
index 267a763..de53b13 100644
--- a/hw/timer/imx_gpt.c
+++ b/hw/timer/imx_gpt.c
@@ -27,7 +27,7 @@
 #define DEBUG_TIMER 0
 #if DEBUG_TIMER
 
-static char const *imx_timerg_reg_name(uint32_t reg)
+static char const *imx_gpt_reg_name(uint32_t reg)
 {
 switch (reg) {
 case 0:
@@ -67,12 +67,14 @@ static char const *imx_timerg_reg_name(uint32_t reg)
  */
 #define DEBUG_IMPLEMENTATION 1
 #if DEBUG_IMPLEMENTATION
-#  define IPRINTF(fmt, args...) \
+#  define IPRINTF(fmt, args...) \
   do { fprintf(stderr, %s:  fmt, __func__, ##args); } while (0)
 #else
 #  define IPRINTF(fmt, args...) do {} while (0)
 #endif
 
+#define IMX_GPT(obj) \
+OBJECT_CHECK(IMXGPTState, (obj), TYPE_IMX_GPT)
 /*
  * GPT : General purpose timer
  *
@@ -137,33 +139,33 @@ typedef struct {
 uint32_t freq;
 
 qemu_irq irq;
-} IMXTimerGState;
+} IMXGPTState;
 
-static const VMStateDescription vmstate_imx_timerg = {
+static const VMStateDescription vmstate_imx_timer_gpt = {
 .name = TYPE_IMX_GPT,
 .version_id = 3,
 .minimum_version_id = 3,
 .minimum_version_id_old = 3,
 .fields  = (VMStateField[]) {
-VMSTATE_UINT32(cr, IMXTimerGState),
-VMSTATE_UINT32(pr, IMXTimerGState),
-VMSTATE_UINT32(sr, IMXTimerGState),
-VMSTATE_UINT32(ir, IMXTimerGState),
-VMSTATE_UINT32(ocr1, IMXTimerGState),
-VMSTATE_UINT32(ocr2, IMXTimerGState),
-VMSTATE_UINT32(ocr3, IMXTimerGState),
-VMSTATE_UINT32(icr1, IMXTimerGState),
-VMSTATE_UINT32(icr2, IMXTimerGState),
-VMSTATE_UINT32(cnt, IMXTimerGState),
-VMSTATE_UINT32(next_timeout, IMXTimerGState),
-VMSTATE_UINT32(next_int, IMXTimerGState),
-VMSTATE_UINT32(freq, IMXTimerGState),
-VMSTATE_PTIMER(timer, IMXTimerGState),
+VMSTATE_UINT32(cr, IMXGPTState),
+VMSTATE_UINT32(pr, IMXGPTState),
+VMSTATE_UINT32(sr, IMXGPTState),
+VMSTATE_UINT32(ir, IMXGPTState),
+VMSTATE_UINT32(ocr1, IMXGPTState),
+VMSTATE_UINT32(ocr2, IMXGPTState),
+VMSTATE_UINT32(ocr3, IMXGPTState),
+VMSTATE_UINT32(icr1, IMXGPTState),
+VMSTATE_UINT32(icr2, IMXGPTState),
+VMSTATE_UINT32(cnt, IMXGPTState),
+VMSTATE_UINT32(next_timeout, IMXGPTState),
+VMSTATE_UINT32(next_int, IMXGPTState),
+VMSTATE_UINT32(freq, IMXGPTState),
+VMSTATE_PTIMER(timer, IMXGPTState),
 VMSTATE_END_OF_LIST()
 }
 };
 
-static const IMXClk imx_timerg_clocks[] = {
+static const IMXClk imx_gpt_clocks[] = {
 NOCLK,/* 000 No clock source */
 IPG,  /* 001 ipg_clk, 532MHz*/
 IPG,  /* 010 ipg_clk_highfreq */
@@ -174,10 +176,10 @@ static const IMXClk imx_timerg_clocks[] = {
 NOCLK,/* 111 not defined */
 };
 
-static void imx_timerg_set_freq(IMXTimerGState *s)
+static void imx_gpt_set_freq(IMXGPTState *s)
 {
 uint32_t clksrc = extract32(s-cr, GPT_CR_CLKSRC_SHIFT, 3);
-uint32_t freq = imx_clock_frequency(s-ccm, imx_timerg_clocks[clksrc]) 
+uint32_t freq = imx_clock_frequency(s-ccm, imx_gpt_clocks[clksrc])
 / (1 + s-pr);
 s-freq = freq;
 
@@ -188,7 +190,7 @@ static void imx_timerg_set_freq(IMXTimerGState *s)
 }
 }
 
-static void imx_timerg_update(IMXTimerGState *s)
+static void imx_gpt_update_int(IMXGPTState *s)
 {
 if ((s-sr  s-ir)  (s-cr  GPT_CR_EN)) {
 qemu_irq_raise(s-irq);
@@ -197,14 +199,14 @@ static void imx_timerg_update(IMXTimerGState *s)
 }
 }
 
-static uint32_t imx_timerg_update_counts(IMXTimerGState *s)
+static uint32_t imx_gpt_update_count(IMXGPTState *s)
 {
 s-cnt = s-next_timeout - (uint32_t)ptimer_get_count(s-timer);
 
 return s-cnt;
 }
 
-static inline uint32_t imx_timerg_find_limit(uint32_t count, uint32_t reg,
+static inline uint32_t imx_gpt_find_limit(uint32_t count, uint32_t reg,
  uint32_t timeout)
 {
 if ((count  reg)  (timeout  reg)) {
@@ -214,7 +216,7 @@ static inline uint32_t imx_timerg_find_limit(uint32_t 
count, uint32_t reg,
 return timeout;
 }
 
-static void imx_timerg_compute_next_timeout(IMXTimerGState *s, bool event)
+static void imx_gpt_compute_next_timeout(IMXGPTState *s, bool event)
 {
 uint32_t

[Qemu-devel] [PATCH v3 1/2] i.MX: Implement a more complete version of the GPT timer.

2013-05-30 Thread Jean-Christophe DUBOIS

* implement compare 1 2 and 3 registers
* simplify Debug printf

Signed-off-by: Jean-Christophe DUBOIS j...@tribudubois.net
---

Change since v1:
* The patch has been divided in 2 sub-patches. One that deals with the added
  features (this one) and one dealing with the renaming.

Change since v2:
* move new style initialization and dynamic cast to the other patch

 hw/timer/imx_gpt.c | 443 -
 1 file changed, 269 insertions(+), 174 deletions(-)

diff --git a/hw/timer/imx_gpt.c b/hw/timer/imx_gpt.c
index d8c4f0b..267a763 100644
--- a/hw/timer/imx_gpt.c
+++ b/hw/timer/imx_gpt.c
@@ -5,6 +5,7 @@
  * Copyright (c) 2011 NICTA Pty Ltd
  * Originally written by Hans Jiang
  * Updated by Peter Chubb
+ * Updated by Jean-Christophe Dubois
  *
  * This code is licensed under GPL version 2 or later.  See
  * the COPYING file in the top-level directory.
@@ -18,10 +19,44 @@
 #include hw/sysbus.h
 #include hw/arm/imx.h
 
-//#define DEBUG_TIMER 1
-#ifdef DEBUG_TIMER
+#define TYPE_IMX_GPT imx.gpt
+
+/*
+ * Define to 1 for debug messages
+ */
+#define DEBUG_TIMER 0
+#if DEBUG_TIMER
+
+static char const *imx_timerg_reg_name(uint32_t reg)
+{
+switch (reg) {
+case 0:
+return CR;
+case 1:
+return PR;
+case 2:
+return SR;
+case 3:
+return IR;
+case 4:
+return OCR1;
+case 5:
+return OCR2;
+case 6:
+return OCR3;
+case 7:
+return ICR1;
+case 8:
+return ICR2;
+case 9:
+return CNT;
+default:
+return [?];
+}
+}
+
 #  define DPRINTF(fmt, args...) \
-  do { printf(imx_timer:  fmt , ##args); } while (0)
+  do { printf(%s:  fmt , __func__, ##args); } while (0)
 #else
 #  define DPRINTF(fmt, args...) do {} while (0)
 #endif
@@ -33,7 +68,7 @@
 #define DEBUG_IMPLEMENTATION 1
 #if DEBUG_IMPLEMENTATION
 #  define IPRINTF(fmt, args...) \
-do  { fprintf(stderr, imx_timer:  fmt, ##args); } while (0)
+  do { fprintf(stderr, %s:  fmt, __func__, ##args); } while (0)
 #else
 #  define IPRINTF(fmt, args...) do {} while (0)
 #endif
@@ -43,16 +78,7 @@
  *
  * This timer counts up continuously while it is enabled, resetting itself
  * to 0 when it reaches TIMER_MAX (in freerun mode) or when it
- * reaches the value of ocr1 (in periodic mode).  WE simulate this using a
- * QEMU ptimer counting down from ocr1 and reloading from ocr1 in
- * periodic mode, or counting from ocr1 to zero, then TIMER_MAX - ocr1.
- * waiting_rov is set when counting from TIMER_MAX.
- *
- * In the real hardware, there are three comparison registers that can
- * trigger interrupts, and compare channel 1 can be used to
- * force-reset the timer. However, this is a `bare-bones'
- * implementation: only what Linux 3.x uses has been implemented
- * (free-running timer from 0 to OCR1 or TIMER_MAX) .
+ * reaches the value of one of the ocrX (in periodic mode).
  */
 
 #define TIMER_MAX  0XUL
@@ -79,9 +105,13 @@
 #define GPT_CR_FO3(1  31) /* Force Output Compare Channel 3 */
 
 #define GPT_SR_OF1  (1  0)
+#define GPT_SR_OF2  (1  1)
+#define GPT_SR_OF3  (1  2)
 #define GPT_SR_ROV  (1  5)
 
 #define GPT_IR_OF1IE  (1  0)
+#define GPT_IR_OF2IE  (1  1)
+#define GPT_IR_OF3IE  (1  2)
 #define GPT_IR_ROVIE  (1  5)
 
 typedef struct {
@@ -101,15 +131,19 @@ typedef struct {
 uint32_t icr2;
 uint32_t cnt;
 
-uint32_t waiting_rov;
+uint32_t next_timeout;
+uint32_t next_int;
+
+uint32_t freq;
+
 qemu_irq irq;
 } IMXTimerGState;
 
 static const VMStateDescription vmstate_imx_timerg = {
-.name = imx-timerg,
-.version_id = 2,
-.minimum_version_id = 2,
-.minimum_version_id_old = 2,
+.name = TYPE_IMX_GPT,
+.version_id = 3,
+.minimum_version_id = 3,
+.minimum_version_id_old = 3,
 .fields  = (VMStateField[]) {
 VMSTATE_UINT32(cr, IMXTimerGState),
 VMSTATE_UINT32(pr, IMXTimerGState),
@@ -121,7 +155,9 @@ static const VMStateDescription vmstate_imx_timerg = {
 VMSTATE_UINT32(icr1, IMXTimerGState),
 VMSTATE_UINT32(icr2, IMXTimerGState),
 VMSTATE_UINT32(cnt, IMXTimerGState),
-VMSTATE_UINT32(waiting_rov, IMXTimerGState),
+VMSTATE_UINT32(next_timeout, IMXTimerGState),
+VMSTATE_UINT32(next_int, IMXTimerGState),
+VMSTATE_UINT32(freq, IMXTimerGState),
 VMSTATE_PTIMER(timer, IMXTimerGState),
 VMSTATE_END_OF_LIST()
 }
@@ -138,16 +174,14 @@ static const IMXClk imx_timerg_clocks[] = {
 NOCLK,/* 111 not defined */
 };
 
-
 static void imx_timerg_set_freq(IMXTimerGState *s)
 {
-int clksrc;
-uint32_t freq;
-
-clksrc = (s-cr  GPT_CR_CLKSRC_SHIFT)  GPT_CR_CLKSRC_MASK;
-freq = imx_clock_frequency(s-ccm, imx_timerg_clocks[clksrc]) / (1 + 
s-pr);
+uint32_t clksrc = extract32(s-cr, GPT_CR_CLKSRC_SHIFT, 3);
+uint32_t freq = imx_clock_frequency(s-ccm, imx_timerg_clocks[clksrc]) 
+

Re: [Qemu-devel] [QEMU PATCH v2] pci: set firmware name of q35 to pci

2013-05-30 Thread Michael S. Tsirkin

On Thu, May 30, 2013 at 03:25:17PM +0800, Amos Kong wrote:
 In QEMU, we set the firmware name of pc-i440fx to 'pci', and the
 typename 'q35-pcihost' is set to the firmware name of q35 by default.
 
 The seabios matches pci devices by /pci/@i0cf8, so q35 device
 could not be identified, seabios fails to adjust the boot priority
 of q35 devices.
 
 This patch sets firmware name of q35 to 'pci'.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Amos Kong ak...@redhat.com

Does this imply
Tested-by: Amos Kong ak...@redhat.com

?

 ---
 v2: just set the fw_name, no need to change seabios
 ---
  hw/pci-host/q35.c | 1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
 index 8467f86..24df6b5 100644
 --- a/hw/pci-host/q35.c
 +++ b/hw/pci-host/q35.c
 @@ -76,6 +76,7 @@ static void q35_host_class_init(ObjectClass *klass, void 
 *data)
  
  k-init = q35_host_init;
  dc-props = mch_props;
 +dc-fw_name = pci;
  }
  
  static void q35_host_initfn(Object *obj)
 -- 
 1.8.1.4

[Qemu-devel] broken incoming migration

2013-05-30 Thread Alexey Kardashevskiy

Hi!

I found the migration broken on pseries platform, specifically, this patch
broke it:

f1c72795af573b24a7da5eb52375c9aba8a37972
migration: do not sent zero pages in bulk stage

The idea is not to send zero pages to the destination guest which is
expected to have 100% empty RAM.

However on pseries plaftorm the guest always has some stuff in the RAM as a
part of initialization (device tree, system firmware and rtas (?)) so it is
not completely empty. As the source guest cannot detect this, it skips some
pages during migration and we get a broken destination guest. Bug.

While the idea is ok in general, I do not see any easy way to fix it as
neither QEMUMachine::init nor QEMUMachine::reset callbacks has information
about whether we are about to receive a migration or not (-incoming
parameter) and we cannot move device-tree and system firmware
initialization anywhere else.

ram_bulk_stage is static and cannot be disabled from the platform
initialization code.

So what would the community suggest? Thanks!


-- 
Alexey

Re: [Qemu-devel] updated: kvm networking todo wiki

2013-05-30 Thread Rusty Russell

Stefan Hajnoczi stefa...@gmail.com writes:
 On Thu, May 30, 2013 at 7:23 AM, Rusty Russell ru...@rustcorp.com.au wrote:
 On the receive side, what can we do better than readv?  If we need to
 return to userspace to tell the guest that we've got a new packet, we
 don't win on latency.  We might reduce syscall overhead with a
 multi-dimensional readv to read multiple packets at once?

 Sounds like recvmmsg(2).

Wow... the future is here, today!

Thanks,
Rusty.

Re: [Qemu-devel] broken incoming migration

2013-05-30 Thread Paolo Bonzini

Il 30/05/2013 09:44, Alexey Kardashevskiy ha scritto:
 Hi!
 
 I found the migration broken on pseries platform, specifically, this patch
 broke it:
 
 f1c72795af573b24a7da5eb52375c9aba8a37972
 migration: do not sent zero pages in bulk stage
 
 The idea is not to send zero pages to the destination guest which is
 expected to have 100% empty RAM.
 
 However on pseries plaftorm the guest always has some stuff in the RAM as a
 part of initialization (device tree, system firmware and rtas (?)) so it is
 not completely empty. As the source guest cannot detect this, it skips some
 pages during migration and we get a broken destination guest. Bug.
 
 While the idea is ok in general, I do not see any easy way to fix it as
 neither QEMUMachine::init nor QEMUMachine::reset callbacks has information
 about whether we are about to receive a migration or not (-incoming
 parameter) and we cannot move device-tree and system firmware
 initialization anywhere else.
 
 ram_bulk_stage is static and cannot be disabled from the platform
 initialization code.
 
 So what would the community suggest?

Revert the patch. :)

Paolo

Re: [Qemu-devel] broken incoming migration

2013-05-30 Thread Alexey Kardashevskiy

Forgot some cc:.

On 05/30/2013 05:44 PM, Alexey Kardashevskiy wrote:
 Hi!
 
 I found the migration broken on pseries platform, specifically, this patch
 broke it:
 
 f1c72795af573b24a7da5eb52375c9aba8a37972
 migration: do not sent zero pages in bulk stage
 
 The idea is not to send zero pages to the destination guest which is
 expected to have 100% empty RAM.
 
 However on pseries plaftorm the guest always has some stuff in the RAM as a
 part of initialization (device tree, system firmware and rtas (?)) so it is
 not completely empty. As the source guest cannot detect this, it skips some
 pages during migration and we get a broken destination guest. Bug.
 
 While the idea is ok in general, I do not see any easy way to fix it as
 neither QEMUMachine::init nor QEMUMachine::reset callbacks has information
 about whether we are about to receive a migration or not (-incoming
 parameter) and we cannot move device-tree and system firmware
 initialization anywhere else.
 
 ram_bulk_stage is static and cannot be disabled from the platform
 initialization code.
 
 So what would the community suggest? Thanks!
 
 


-- 
Alexey

Re: [Qemu-devel] snabbswitch integration with QEMU for userspace ethernet I/O

2013-05-30 Thread Julian Stecklina

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 05/30/2013 08:46 AM, Stefan Hajnoczi wrote:
 On Wed, May 29, 2013 at 6:02 PM, Julian Stecklina 
 jstec...@os.inf.tu-dresden.de wrote:
 On 05/29/2013 04:21 PM, Stefan Hajnoczi wrote:
 The fact that a single switch process has shared memory access
 to all guests' RAM is critical.  If the switch process is
 exploited, then that exposes other guests' data!  (Think of a
 multi-tenant host with guests belonging to different users.)
 
 True. But people don't mind having instruction decoding and half
 of virtio in the kernel these days, so it can't be that security
 critical...
 
 No, it's still security critical.  If there were equivalent
 solutions with better security then I'm sure people would accept
 them.  It's just that there isn't an equivalent solution yet :).

My comment was more or less meant in a resigning way. ;) At least we
are not putting HTTP servers in there any more.

Julian

-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.19 (GNU/Linux)

iEYEARECAAYFAlGnCRMACgkQ2EtjUdW3H9mzFwCghZxvckYgZ4atLm3HLPPWF/Lb
688AnRXm12jbBlmCVOKSaDUHHejEdh7O
=csrK
-END PGP SIGNATURE-

Re: [Qemu-devel] broken incoming migration

2013-05-30 Thread Alexey Kardashevskiy

On 05/30/2013 05:49 PM, Paolo Bonzini wrote:
 Il 30/05/2013 09:44, Alexey Kardashevskiy ha scritto:
 Hi!

 I found the migration broken on pseries platform, specifically, this patch
 broke it:

 f1c72795af573b24a7da5eb52375c9aba8a37972
 migration: do not sent zero pages in bulk stage

 The idea is not to send zero pages to the destination guest which is
 expected to have 100% empty RAM.

 However on pseries plaftorm the guest always has some stuff in the RAM as a
 part of initialization (device tree, system firmware and rtas (?)) so it is
 not completely empty. As the source guest cannot detect this, it skips some
 pages during migration and we get a broken destination guest. Bug.

 While the idea is ok in general, I do not see any easy way to fix it as
 neither QEMUMachine::init nor QEMUMachine::reset callbacks has information
 about whether we are about to receive a migration or not (-incoming
 parameter) and we cannot move device-tree and system firmware
 initialization anywhere else.

 ram_bulk_stage is static and cannot be disabled from the platform
 initialization code.

 So what would the community suggest?
 
 Revert the patch. :)

I'll wait for 24 hours (forgot to cc: the author) and then post a revert
patch :)



-- 
Alexey

Re: [Qemu-devel] [QEMU PATCH v2] pci: set firmware name of q35 to pci

2013-05-30 Thread Michael S. Tsirkin

On Thu, May 30, 2013 at 03:25:17PM +0800, Amos Kong wrote:
 In QEMU, we set the firmware name of pc-i440fx to 'pci', and the
 typename 'q35-pcihost' is set to the firmware name of q35 by default.
 
 The seabios matches pci devices by /pci/@i0cf8, so q35 device
 could not be identified, seabios fails to adjust the boot priority
 of q35 devices.
 
 This patch sets firmware name of q35 to 'pci'.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Amos Kong ak...@redhat.com

I've applied this already.

 ---
 v2: just set the fw_name, no need to change seabios
 ---
  hw/pci-host/q35.c | 1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
 index 8467f86..24df6b5 100644
 --- a/hw/pci-host/q35.c
 +++ b/hw/pci-host/q35.c
 @@ -76,6 +76,7 @@ static void q35_host_class_init(ObjectClass *klass, void 
 *data)
  
  k-init = q35_host_init;
  dc-props = mch_props;
 +dc-fw_name = pci;
  }
  
  static void q35_host_initfn(Object *obj)
 -- 
 1.8.1.4

Re: [Qemu-devel] Fwd: [Qemu-discuss] pci-assign error

2013-05-30 Thread Jean Parpaillon

Dear Alex,
Thank you for your answer. Unfortunately, I had not yet the opportunity
to test it as I don't have a Kepler arch (I suppose it depends on it ?).

I will keep you informed of the results.


Regards,
Jean


Le 03/04/2013 20:16, Alex Williamson a écrit :
 On Wed, 2013-04-03 at 16:26 +0200, Jean Parpaillon wrote:


  Message original 
 Sujet: [Qemu-discuss] pci-assign error
 Date : Wed, 03 Apr 2013 12:12:15 +0200
 De : Jean Parpaillon jean.parpail...@free.fr
 Pour : qemu-disc...@nongnu.org

 Dear all,
 I'm trying to assign a PCI graphic card to a qemu guest.
 
 Hi,
 
 A) Graphics assignment isn't supported and this likely won't be the only
 problem you find, but if you want to experiment with VFIO-based VGA
 support please try:
 
 kernel:
 git://github.com/awilliam/linux-vfio.git (branch next)
 qemu:
 git://github.com/awilliam/qemu-vfio.git (tag vfio-pci-for-qemu-20130401.0)
 
 I sent a qemu pull request for the latter and the kernel updates should
 be in 3.10.
 
 B) Does assignment of anything work on your system or do you perhaps see
 an error in dmesg like:
 
 No interrupt remapping support, disallowing device assignment.
 Re-enble with allow_unsafe_assigned_interrupts=1 module
 option.
 
 If so, try that.  Thanks,
 
 Alex
 
 When I start the guest with libvirt, I have the following error in
 /var/log/libvirt/qemu/sofa.log:
 LC_ALL=C
 PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
 HOME=/root USER=root LOGNAME=root QEMU_AUDIO_DRV=spice /usr/bin/kvm
 -name sofa -S -M pc-1.1 -m 1500 -smp 2,sockets=2,cores=1,threads=1 -uuid
 1234beef -no-user-config -nodefaults -chardev
 socket,id=charmonitor,path=/var/lib/libvirt/qemu/sofa.monitor,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
 -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive
 file=/dev/vgsurcouf/sofa,if=none,id=drive-virtio-disk0,format=raw
 -device
 virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -drive
 file=/var/lib/libvirt/images/iso/debian-testing-amd64-netinst.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw
 -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev
 tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 -device
 virtio-net-pci,netdev=hostnet0,id=net0,mac=...,bus=pci.0,addr=0x3
 -chardev pty,id=charserial0 -device
 isa-serial,chardev=charserial0,id=serial0 -chardev
 spicevmc,id=charchannel0,name=vdagent -device
 virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0
 -device usb-tablet,id=input0 -spice
 port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -vga
 qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864
 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device
 hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device
 pci-assign,host=02:00.0,id=hostdev0,configfd=25,bus=pci.0,multifunction=on,addr=0x8
 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7
 W: kvm binary is deprecated, please use qemu-system-x86_64 instead
 char device redirected to /dev/pts/2 (label charserial0)
 qemu-system-x86_64: -device
 pci-assign,host=02:00.0,id=hostdev0,configfd=25,bus=pci.0,multifunction=on,addr=0x8:
 Failed to assign device hostdev0 : Operation not permitted
 qemu-system-x86_64: -device
 pci-assign,host=02:00.0,id=hostdev0,configfd=25,bus=pci.0,multifunction=on,addr=0x8:
 Device 'kvm-pci-assign' could not be initialized
 2013-04-03 09:44:35.398+: shutting down



 My host setup: Debian Wheezy + qemu 1.4.0, libvirt 1.0.4, kernel 3.8.5
 My host gfx:
  $ lspci|grep VGA
 02:00.0 VGA compatible controller: NVIDIA Corporation GT218 [Quadro NVS
 300] (rev a2)

 I run the following to enable handle device with PCI stub, includind
 second function:
 echo 10de 10d8  /sys/bus/pci/drivers/pci-stub/new_id
 echo 10de 0be3  /sys/bus/pci/drivers/pci-stub/new_id
 echo :02:00.0  /sys/bus/pci/devices/:02:00.0/driver/unbind
 echo :02:00.1  /sys/bus/pci/devices/:02:00.1/driver/unbind
 sleep 3
 echo :02:00.0  /sys/bus/pci/drivers/pci-stub/bind
 echo :02:00.1  /sys/bus/pci/drivers/pci-stub/bind

 Intel VT-d is enabled:
 $ dmesg |grep Directed
 [1.167179] PCI-DMA: Intel(R) Virtualization Technology for Directed I/O

 $ dmesg |grep IOMMU
 [0.00] Intel-IOMMU: enabled
 [0.171520] dmar: IOMMU 0: reg_base_addr ebffe000 ver 1:0 cap
 c90780106f0462 ecap f020f6
 [0.171529] dmar: IOMMU 1: reg_base_addr fedc ver 1:0 cap
 c90780106f0462 ecap f020f6
 [1.166855] IOMMU 0 0xebffe000: using Queued invalidation
 [1.166861] IOMMU 1 0xfedc: using Queued invalidation
 [1.166889] IOMMU: hardware identity mapping for device :00:00.0
 [1.166891] IOMMU: hardware identity mapping for device :00:03.0
 [1.166893] IOMMU: hardware identity mapping for device :00:07.0
 [1.166895]

Re: [Qemu-devel] broken incoming migration

2013-05-30 Thread Peter Lieven

Am 30.05.2013 10:18, schrieb Alexey Kardashevskiy:
 On 05/30/2013 05:49 PM, Paolo Bonzini wrote:
 Il 30/05/2013 09:44, Alexey Kardashevskiy ha scritto:
 Hi!

 I found the migration broken on pseries platform, specifically, this patch
 broke it:

 f1c72795af573b24a7da5eb52375c9aba8a37972
 migration: do not sent zero pages in bulk stage

 The idea is not to send zero pages to the destination guest which is
 expected to have 100% empty RAM.

 However on pseries plaftorm the guest always has some stuff in the RAM as a
 part of initialization (device tree, system firmware and rtas (?)) so it is
 not completely empty. As the source guest cannot detect this, it skips some
 pages during migration and we get a broken destination guest. Bug.

 While the idea is ok in general, I do not see any easy way to fix it as
 neither QEMUMachine::init nor QEMUMachine::reset callbacks has information
 about whether we are about to receive a migration or not (-incoming
 parameter) and we cannot move device-tree and system firmware
 initialization anywhere else.

 ram_bulk_stage is static and cannot be disabled from the platform
 initialization code.

 So what would the community suggest?
 Revert the patch. :)
 I'll wait for 24 hours (forgot to cc: the author) and then post a revert
 patch :)



does this problem only occur on pseries emulation?

not sending zero pages is not only a performance benefit it also makes
overcomitted memory usable. the madv_dontneed seems to kick in asynchronously
and memory is not available immediately.

what I do not understand if the a memory region is not empty at destination
due to device tree, firmware etc. it shouldn't be empty at the source as well so
in theory this should not be a problem.

Peter

[Qemu-devel] [PATCH] target-ppc kvm: missing kvm_arch_init_irq_routing

2013-05-30 Thread Alexey Kardashevskiy

This adds an empty stub to make the compiler happy.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 target-ppc/kvm.c |4 
 1 file changed, 4 insertions(+)

diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 3ab2946..2bbc3b8 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -1797,3 +1797,7 @@ int kvm_arch_on_sigbus(int code, void *addr)
 {
 return 1;
 }
+
+void kvm_arch_init_irq_routing(KVMState *s)
+{
+}
-- 
1.7.10.4

[Qemu-devel] [PATCH] target-ppc kvm: save cr register

2013-05-30 Thread Alexey Kardashevskiy

This adds a missing code to save CR (condition register) via
kvm_arch_put_registers(). kvm_arch_get_registers() already has it.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 target-ppc/kvm.c |5 +
 1 file changed, 5 insertions(+)

diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 2bbc3b8..c89dd58 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -791,6 +791,11 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 for (i = 0;i  32; i++)
 regs.gpr[i] = env-gpr[i];
 
+regs.cr = 0;
+for (i = 0; i  8; i++) {
+regs.cr |= (env-crf[i]  15)  (4 * (7 - i));
+}
+
 ret = kvm_vcpu_ioctl(cs, KVM_SET_REGS, regs);
 if (ret  0)
 return ret;
-- 
1.7.10.4

Re: [Qemu-devel] [SeaBIOS] KVM call agenda for 2013-05-28

2013-05-30 Thread David Woodhouse

On Wed, 2013-05-29 at 11:18 -0500, Anthony Liguori wrote:
 
  Certainly an option, but that is a long-term project.
 
 Out of curiousity, are there other benefits to using coreboot as a core
 firmware in QEMU?
 
 Is there a payload we would ever plausibly use besides OVMF and SeaBIOS?

I like the idea of using Coreboot on the UEFI side — if the most
actively used TianoCore platform is CorebootPkg instead of OvmfPkg, that
makes it a lot easier for people using *real* hardware with Coreboot to
use TianoCore.

And it helps to dispel the stupid misconception in some quarters that
Coreboot *competes* with UEFI and thus cannot possibly be supported
because helping something that competes with UEFI would be bad.

 Is there a payload we would ever plausibly use besides OVMF and
 SeaBIOS?

For my part I want to get to the point where the default firmware
shipped with qemu can be OVMF. We have SeaBIOS-as-CSM working, which was
one of the biggest barriers. There are a few more things (like NV
variable storage, in particular) that I need to fix before I can
actually make that suggestion with a straight face though...

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature

[Qemu-devel] [PATCH] virtio-pci: drop unused wmb macro

2013-05-30 Thread Michael S. Tsirkin

The implementation is wrong for kvm, and it's unused anyway.
Drop it.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Message-id: 20130528102023.ga30...@redhat.com
Signed-off-by: Anthony Liguori aligu...@us.ibm.com
---
 hw/virtio/virtio-pci.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 70d2c6b..444b71a 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -89,12 +89,6 @@
 /* Flags track per-device state like workarounds for quirks in older guests. */
 #define VIRTIO_PCI_FLAG_BUS_MASTER_BUG  (1  0)
 
-/* QEMU doesn't strictly need write barriers since everything runs in
- * lock-step.  We'll leave the calls to wmb() in though to make it obvious for
- * KVM or if kqemu gets SMP support.
- */
-#define wmb() do { } while (0)
-
 /* HACK for virtio to determine if it's running a big endian guest */
 bool virtio_is_big_endian(void);
 
-- 
MST

[Qemu-devel] [PATCH 0/2] fixup some fallback from tree reorg

2013-05-30 Thread Michael S. Tsirkin

Here are some patches fixing up minor issues
in the tree reorganization.

Michael S. Tsirkin (2):
  dec.c - move to pci-bridge
  firmware_abi: move to include/hw/nvram/

 hw/pci-bridge/Makefile.objs  |   2 +
 hw/pci-bridge/dec.c  | 156 +++
 hw/pci-bridge/dec.h  |  10 ++
 hw/pci-host/Makefile.objs|   1 -
 hw/pci-host/dec.c| 156 ---
 hw/pci-host/dec.h|  10 --
 hw/sparc/sun4m.c |   2 +-
 hw/sparc64/sun4u.c   |   2 +-
 include/hw/nvram/openbios_firmware_abi.h |  73 +++
 include/hw/sparc/firmware_abi.h  |  73 ---
 10 files changed, 243 insertions(+), 242 deletions(-)
 create mode 100644 hw/pci-bridge/dec.c
 create mode 100644 hw/pci-bridge/dec.h
 delete mode 100644 hw/pci-host/dec.c
 delete mode 100644 hw/pci-host/dec.h
 create mode 100644 include/hw/nvram/openbios_firmware_abi.h
 delete mode 100644 include/hw/sparc/firmware_abi.h

-- 
MST

[Qemu-devel] [PATCH 1/2] dec.c - move to pci-bridge

2013-05-30 Thread Michael S. Tsirkin

Looks like dec.c is in pci-host by mistake.
Moving it over to pci-bridge.

Cc: Andreas Färber afaer...@suse.de
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/pci-bridge/Makefile.objs |   2 +
 hw/pci-bridge/dec.c | 156 
 hw/pci-bridge/dec.h |  10 +++
 hw/pci-host/Makefile.objs   |   1 -
 hw/pci-host/dec.c   | 156 
 hw/pci-host/dec.h   |  10 ---
 6 files changed, 168 insertions(+), 167 deletions(-)
 create mode 100644 hw/pci-bridge/dec.c
 create mode 100644 hw/pci-bridge/dec.h
 delete mode 100644 hw/pci-host/dec.c
 delete mode 100644 hw/pci-host/dec.h

diff --git a/hw/pci-bridge/Makefile.objs b/hw/pci-bridge/Makefile.objs
index 5dd92d2..968b369 100644
--- a/hw/pci-bridge/Makefile.objs
+++ b/hw/pci-bridge/Makefile.objs
@@ -1,3 +1,5 @@
 common-obj-y += pci_bridge_dev.o
 common-obj-y += ioh3420.o xio3130_upstream.o xio3130_downstream.o
 common-obj-y += i82801b11.o
+# NewWorld PowerMac
+common-obj-$(CONFIG_DEC_PCI) += dec.o
diff --git a/hw/pci-bridge/dec.c b/hw/pci-bridge/dec.c
new file mode 100644
index 000..cff458b
--- /dev/null
+++ b/hw/pci-bridge/dec.c
@@ -0,0 +1,156 @@
+/*
+ * QEMU DEC 21154 PCI bridge
+ *
+ * Copyright (c) 2006-2007 Fabrice Bellard
+ * Copyright (c) 2007 Jocelyn Mayer
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the Software), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include dec.h
+#include hw/sysbus.h
+#include hw/pci/pci.h
+#include hw/pci/pci_host.h
+#include hw/pci/pci_bridge.h
+#include hw/pci/pci_bus.h
+
+/* debug DEC */
+//#define DEBUG_DEC
+
+#ifdef DEBUG_DEC
+#define DEC_DPRINTF(fmt, ...)   \
+do { printf(DEC:  fmt , ## __VA_ARGS__); } while (0)
+#else
+#define DEC_DPRINTF(fmt, ...)
+#endif
+
+#define DEC_21154(obj) OBJECT_CHECK(DECState, (obj), TYPE_DEC_21154)
+
+typedef struct DECState {
+PCIHostState parent_obj;
+} DECState;
+
+static int dec_map_irq(PCIDevice *pci_dev, int irq_num)
+{
+return irq_num;
+}
+
+static int dec_pci_bridge_initfn(PCIDevice *pci_dev)
+{
+return pci_bridge_initfn(pci_dev, TYPE_PCI_BUS);
+}
+
+static void dec_21154_pci_bridge_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+k-init = dec_pci_bridge_initfn;
+k-exit = pci_bridge_exitfn;
+k-vendor_id = PCI_VENDOR_ID_DEC;
+k-device_id = PCI_DEVICE_ID_DEC_21154;
+k-config_write = pci_bridge_write_config;
+k-is_bridge = 1;
+dc-desc = DEC 21154 PCI-PCI bridge;
+dc-reset = pci_bridge_reset;
+dc-vmsd = vmstate_pci_device;
+}
+
+static const TypeInfo dec_21154_pci_bridge_info = {
+.name  = dec-21154-p2p-bridge,
+.parent= TYPE_PCI_DEVICE,
+.instance_size = sizeof(PCIBridge),
+.class_init= dec_21154_pci_bridge_class_init,
+};
+
+PCIBus *pci_dec_21154_init(PCIBus *parent_bus, int devfn)
+{
+PCIDevice *dev;
+PCIBridge *br;
+
+dev = pci_create_multifunction(parent_bus, devfn, false,
+   dec-21154-p2p-bridge);
+br = DO_UPCAST(PCIBridge, dev, dev);
+pci_bridge_map_irq(br, DEC 21154 PCI-PCI bridge, dec_map_irq);
+qdev_init_nofail(dev-qdev);
+return pci_bridge_get_sec_bus(br);
+}
+
+static int pci_dec_21154_device_init(SysBusDevice *dev)
+{
+PCIHostState *phb;
+
+phb = PCI_HOST_BRIDGE(dev);
+
+memory_region_init_io(phb-conf_mem, pci_host_conf_le_ops,
+  dev, pci-conf-idx, 0x1000);
+memory_region_init_io(phb-data_mem, pci_host_data_le_ops,
+  dev, pci-data-idx, 0x1000);
+sysbus_init_mmio(dev, phb-conf_mem);
+sysbus_init_mmio(dev, phb-data_mem);
+return 0;
+}
+
+static int dec_21154_pci_host_init(PCIDevice *d)
+{
+/* PCI2PCI bridge same values as PearPC - check this */
+return 0;
+}
+
+static void dec_21154_pci_host_class_init(ObjectClass *klass, void *data)
+{
+

Re: [Qemu-devel] broken incoming migration

2013-05-30 Thread Alexey Kardashevskiy

On 05/30/2013 07:08 PM, Peter Lieven wrote:
 Am 30.05.2013 10:18, schrieb Alexey Kardashevskiy:
 On 05/30/2013 05:49 PM, Paolo Bonzini wrote:
 Il 30/05/2013 09:44, Alexey Kardashevskiy ha scritto:
 Hi!

 I found the migration broken on pseries platform, specifically, this patch
 broke it:

 f1c72795af573b24a7da5eb52375c9aba8a37972
 migration: do not sent zero pages in bulk stage

 The idea is not to send zero pages to the destination guest which is
 expected to have 100% empty RAM.

 However on pseries plaftorm the guest always has some stuff in the RAM as a
 part of initialization (device tree, system firmware and rtas (?)) so it is
 not completely empty. As the source guest cannot detect this, it skips some
 pages during migration and we get a broken destination guest. Bug.

 While the idea is ok in general, I do not see any easy way to fix it as
 neither QEMUMachine::init nor QEMUMachine::reset callbacks has information
 about whether we are about to receive a migration or not (-incoming
 parameter) and we cannot move device-tree and system firmware
 initialization anywhere else.

 ram_bulk_stage is static and cannot be disabled from the platform
 initialization code.

 So what would the community suggest?
 Revert the patch. :)
 I'll wait for 24 hours (forgot to cc: the author) and then post a revert
 patch :)



 does this problem only occur on pseries emulation?

No idea, really.

 not sending zero pages is not only a performance benefit it also makes
 overcomitted memory usable. the madv_dontneed seems to kick in asynchronously
 and memory is not available immediately.

Ok, I do not mind :)

 what I do not understand if the a memory region is not empty at destination
 due to device tree, firmware etc. it shouldn't be empty at the source as well 
 so
 in theory this should not be a problem.

This is how it works - first QEMU allocates RAM and put devicetree+firmware
somewhere. Then QEMU starts the guest so the firmware starts, loads the
kernel and then the kernel zeroes the whole (most of?) RAM including the
area where the firmware used to be.

Now we migrate. If the source guest is in the kernel already, then it does
not know about the memory area previously occupied by the firmware, it is
just an empty page. If the source guest is still in the firmware, then
those pages are not empty and they are perfectly migrated.


-- 
Alexey

[Qemu-devel] [PATCH 2/2] firmware_abi: move to include/hw/nvram/

2013-05-30 Thread Michael S. Tsirkin

firmware_abi.h with structs for OpenBIOS landed in hw/sparc/ by mistake
- move it to hw/nvram/ alongside fw_cfg.h.  In addition to sparc it's
included from ppc mac_nvram.c and will need to include it from prep.c in
the future.

Cc: Andreas Färber afaer...@suse.de
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/sparc/sun4m.c |  2 +-
 hw/sparc64/sun4u.c   |  2 +-
 include/hw/nvram/openbios_firmware_abi.h | 73 
 include/hw/sparc/firmware_abi.h  | 73 
 4 files changed, 75 insertions(+), 75 deletions(-)
 create mode 100644 include/hw/nvram/openbios_firmware_abi.h
 delete mode 100644 include/hw/sparc/firmware_abi.h

diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c
index 8840881..6e62d17 100644
--- a/hw/sparc/sun4m.c
+++ b/hw/sparc/sun4m.c
@@ -30,7 +30,7 @@
 #include sysemu/sysemu.h
 #include net/net.h
 #include hw/boards.h
-#include hw/sparc/firmware_abi.h
+#include hw/nvram/openbios_firmware_abi.h
 #include hw/scsi/esp.h
 #include hw/i386/pc.h
 #include hw/isa/isa.h
diff --git a/hw/sparc64/sun4u.c b/hw/sparc64/sun4u.c
index 5c2bbd4..13672de 100644
--- a/hw/sparc64/sun4u.c
+++ b/hw/sparc64/sun4u.c
@@ -32,7 +32,7 @@
 #include qemu/timer.h
 #include sysemu/sysemu.h
 #include hw/boards.h
-#include hw/sparc/firmware_abi.h
+#include hw/nvram/openbios_firmware_abi.h
 #include hw/nvram/fw_cfg.h
 #include hw/sysbus.h
 #include hw/ide.h
diff --git a/include/hw/nvram/openbios_firmware_abi.h 
b/include/hw/nvram/openbios_firmware_abi.h
new file mode 100644
index 000..5e6e5d4
--- /dev/null
+++ b/include/hw/nvram/openbios_firmware_abi.h
@@ -0,0 +1,73 @@
+#ifndef FIRMWARE_ABI_H
+#define FIRMWARE_ABI_H
+
+/* OpenBIOS NVRAM partition */
+struct OpenBIOS_nvpart_v1 {
+uint8_t signature;
+uint8_t checksum;
+uint16_t len; // BE, length divided by 16
+char name[12];
+};
+
+#define OPENBIOS_PART_SYSTEM 0x70
+#define OPENBIOS_PART_FREE 0x7f
+
+static inline void
+OpenBIOS_finish_partition(struct OpenBIOS_nvpart_v1 *header, uint32_t size)
+{
+unsigned int i, sum;
+uint8_t *tmpptr;
+
+// Length divided by 16
+header-len = cpu_to_be16(size  4);
+
+// Checksum
+tmpptr = (uint8_t *)header;
+sum = *tmpptr;
+for (i = 0; i  14; i++) {
+sum += tmpptr[2 + i];
+sum = (sum + ((sum  0xff00)  8))  0xff;
+}
+header-checksum = sum  0xff;
+}
+
+static inline uint32_t
+OpenBIOS_set_var(uint8_t *nvram, uint32_t addr, const char *str)
+{
+uint32_t len;
+
+len = strlen(str) + 1;
+memcpy(nvram[addr], str, len);
+
+return addr + len;
+}
+
+/* Sun IDPROM structure at the end of NVRAM */
+/* from http://www.squirrel.com/squirrel/sun-nvram-hostid.faq.html */
+struct Sun_nvram {
+uint8_t type;   /* always 01 */
+uint8_t machine_id; /* first byte of host id (machine type) */
+uint8_t macaddr[6]; /* 6 byte ethernet address (first 3 bytes 08, 00, 20) 
*/
+uint8_t date[4];/* date of manufacture */
+uint8_t hostid[3];  /* remaining 3 bytes of host id (serial number) */
+uint8_t checksum;   /* bitwise xor of previous bytes */
+};
+
+static inline void
+Sun_init_header(struct Sun_nvram *header, const uint8_t *macaddr, int 
machine_id)
+{
+uint8_t tmp, *tmpptr;
+unsigned int i;
+
+header-type = 1;
+header-machine_id = machine_id  0xff;
+memcpy(header-macaddr, macaddr, 6);
+/* Calculate checksum */
+tmp = 0;
+tmpptr = (uint8_t *)header;
+for (i = 0; i  15; i++)
+tmp ^= tmpptr[i];
+
+header-checksum = tmp;
+}
+#endif /* FIRMWARE_ABI_H */
diff --git a/include/hw/sparc/firmware_abi.h b/include/hw/sparc/firmware_abi.h
deleted file mode 100644
index 5e6e5d4..000
--- a/include/hw/sparc/firmware_abi.h
+++ /dev/null
@@ -1,73 +0,0 @@
-#ifndef FIRMWARE_ABI_H
-#define FIRMWARE_ABI_H
-
-/* OpenBIOS NVRAM partition */
-struct OpenBIOS_nvpart_v1 {
-uint8_t signature;
-uint8_t checksum;
-uint16_t len; // BE, length divided by 16
-char name[12];
-};
-
-#define OPENBIOS_PART_SYSTEM 0x70
-#define OPENBIOS_PART_FREE 0x7f
-
-static inline void
-OpenBIOS_finish_partition(struct OpenBIOS_nvpart_v1 *header, uint32_t size)
-{
-unsigned int i, sum;
-uint8_t *tmpptr;
-
-// Length divided by 16
-header-len = cpu_to_be16(size  4);
-
-// Checksum
-tmpptr = (uint8_t *)header;
-sum = *tmpptr;
-for (i = 0; i  14; i++) {
-sum += tmpptr[2 + i];
-sum = (sum + ((sum  0xff00)  8))  0xff;
-}
-header-checksum = sum  0xff;
-}
-
-static inline uint32_t
-OpenBIOS_set_var(uint8_t *nvram, uint32_t addr, const char *str)
-{
-uint32_t len;
-
-len = strlen(str) + 1;
-memcpy(nvram[addr], str, len);
-
-return addr + len;
-}
-
-/* Sun IDPROM structure at the end of NVRAM */
-/* from http://www.squirrel.com/squirrel/sun-nvram-hostid.faq.html */
-struct Sun_nvram {
-uint8_t type;   /* always 01 */
-uint8_t machine_id; /* first

[Qemu-devel] [PATCH V15 1/6] add def_value_str in QemuOptDesc struct and rewrite qemu_opts_print

2013-05-30 Thread Dongxu Wang

From: Dong Xu Wang wdon...@linux.vnet.ibm.com

qemu_opts_print has no user now, so can re-write the function safely.

qemu_opts_print will be used while using qemu-img create, it will
produce the same output as previous code.

The behavior of this function has changed:

1. Print every possible option, whether a value has been set or not.
2. Option descriptors may provide a default value.
3. Print to stdout instead of stderr.

Previously the behavior was to print every option that has been set.
Options that have not been set would be skipped.

v13-v14:
1) fix memory leak.
2) make opt_set do not accpet null value argument.

v12-v13
1) re-write commit message.

v11-v12
1) make def_value_str become the real default value string in opt_set
function.

v10-v11:
1) print all values that have actually been assigned while accept-any
cases.

v7-v8:
1) print elements = accept any params while opts_accepts_any() ==
true.
2) since def_print_str is the default value if an option isn't set,
so rename it to def_value_str.

Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com
Signed-off-by: Dongxu Wang wdon...@linux.vnet.ibm.com
---
 include/qemu/option.h |  3 ++-
 util/qemu-option.c| 32 ++--
 2 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/include/qemu/option.h b/include/qemu/option.h
index bdb6d21..b928ab0 100644
--- a/include/qemu/option.h
+++ b/include/qemu/option.h
@@ -96,6 +96,7 @@ typedef struct QemuOptDesc {
 const char *name;
 enum QemuOptType type;
 const char *help;
+const char *def_value_str;
 } QemuOptDesc;
 
 struct QemuOptsList {
@@ -152,7 +153,7 @@ QDict *qemu_opts_to_qdict(QemuOpts *opts, QDict *qdict);
 void qemu_opts_absorb_qdict(QemuOpts *opts, QDict *qdict, Error **errp);
 
 typedef int (*qemu_opts_loopfunc)(QemuOpts *opts, void *opaque);
-int qemu_opts_print(QemuOpts *opts, void *dummy);
+int qemu_opts_print(QemuOpts *opts);
 int qemu_opts_foreach(QemuOptsList *list, qemu_opts_loopfunc func, void 
*opaque,
   int abort_on_failure);
 
diff --git a/util/qemu-option.c b/util/qemu-option.c
index 8b74bf1..84d8c8b 100644
--- a/util/qemu-option.c
+++ b/util/qemu-option.c
@@ -860,16 +860,36 @@ void qemu_opts_del(QemuOpts *opts)
 g_free(opts);
 }
 
-int qemu_opts_print(QemuOpts *opts, void *dummy)
+int qemu_opts_print(QemuOpts *opts)
 {
 QemuOpt *opt;
+QemuOptDesc *desc = opts-list-desc;
 
-fprintf(stderr, %s: %s:, opts-list-name,
-opts-id ? opts-id : noid);
-QTAILQ_FOREACH(opt, opts-head, next) {
-fprintf(stderr,  %s=\%s\, opt-name, opt-str);
+if (desc[0].name == NULL) {
+QTAILQ_FOREACH(opt, opts-head, next) {
+printf(%s=\%s\ , opt-name, opt-str);
+}
+return 0;
+}
+for (; desc  desc-name; desc++) {
+const char *value = desc-def_value_str;
+QemuOpt *opt;
+
+opt = qemu_opt_find(opts, desc-name);
+if (opt) {
+value = opt-str;
+}
+
+if (!value) {
+continue;
+}
+
+if (desc-type == QEMU_OPT_STRING) {
+printf(%s='%s' , desc-name, value);
+} else {
+printf(%s=%s , desc-name, value);
+}
 }
-fprintf(stderr, \n);
 return 0;
 }
 
-- 
1.7.11.7

[Qemu-devel] [PATCH V15 2/6] avoid duplication of default value in QemuOpts

2013-05-30 Thread Dongxu Wang

From: Dong Xu Wang wdon...@linux.vnet.ibm.com

This patch will move the default value entirely to QemuOptDesc.

When getting the value of an option that hasn't been set, and
QemuOptDesc has a default value, return that.  Else, behave as
before.

Example: qemu_opt_get_number(opts, foo, 42)

   If foo has been set in opts, return its value.

   Else, if opt's QemuOptDesc has a default value for foo, return
   that.

   Else, return 42.

   Note that the last argument is useless when QemuOptDesc has a
   default value.  Ugly.  If it bothers us, we could assert.

Example: qemu_opt_get(opts, bar)

   If bar has been set in opts, return its value.

   Else, if opt's QemuOptDesc has a default value for bar, return
   that.

   Else, return NULL.

Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com

v13-v14:
1) change code style.
2) assert errors.
Signed-off-by: Dongxu Wang wdon...@linux.vnet.ibm.com
---
 util/qemu-option.c | 66 --
 1 file changed, 49 insertions(+), 17 deletions(-)

diff --git a/util/qemu-option.c b/util/qemu-option.c
index 84d8c8b..bd2acdc 100644
--- a/util/qemu-option.c
+++ b/util/qemu-option.c
@@ -525,9 +525,31 @@ static QemuOpt *qemu_opt_find(QemuOpts *opts, const char 
*name)
 return NULL;
 }
 
+static const QemuOptDesc *find_desc_by_name(const QemuOptDesc *desc,
+const char *name)
+{
+int i;
+
+for (i = 0; desc[i].name != NULL; i++) {
+if (strcmp(desc[i].name, name) == 0) {
+return desc[i];
+}
+}
+
+return NULL;
+}
+
 const char *qemu_opt_get(QemuOpts *opts, const char *name)
 {
 QemuOpt *opt = qemu_opt_find(opts, name);
+const QemuOptDesc *desc;
+
+if (!opt) {
+desc = find_desc_by_name(opts-list-desc, name);
+if (desc  desc-def_value_str) {
+return desc-def_value_str;
+}
+}
 return opt ? opt-str : NULL;
 }
 
@@ -546,9 +568,17 @@ bool qemu_opt_has_help_opt(QemuOpts *opts)
 bool qemu_opt_get_bool(QemuOpts *opts, const char *name, bool defval)
 {
 QemuOpt *opt = qemu_opt_find(opts, name);
+const QemuOptDesc *desc;
+Error *local_err = NULL;
 
-if (opt == NULL)
+if (opt == NULL) {
+desc = find_desc_by_name(opts-list-desc, name);
+if (desc  desc-def_value_str) {
+parse_option_bool(name, desc-def_value_str, defval, local_err);
+assert(!local_err);
+}
 return defval;
+}
 assert(opt-desc  opt-desc-type == QEMU_OPT_BOOL);
 return opt-value.boolean;
 }
@@ -556,9 +586,17 @@ bool qemu_opt_get_bool(QemuOpts *opts, const char *name, 
bool defval)
 uint64_t qemu_opt_get_number(QemuOpts *opts, const char *name, uint64_t defval)
 {
 QemuOpt *opt = qemu_opt_find(opts, name);
+const QemuOptDesc *desc;
+Error *local_err = NULL;
 
-if (opt == NULL)
+if (opt == NULL) {
+desc = find_desc_by_name(opts-list-desc, name);
+if (desc  desc-def_value_str) {
+parse_option_number(name, desc-def_value_str, defval, 
local_err);
+assert(!local_err);
+}
 return defval;
+}
 assert(opt-desc  opt-desc-type == QEMU_OPT_NUMBER);
 return opt-value.uint;
 }
@@ -566,9 +604,17 @@ uint64_t qemu_opt_get_number(QemuOpts *opts, const char 
*name, uint64_t defval)
 uint64_t qemu_opt_get_size(QemuOpts *opts, const char *name, uint64_t defval)
 {
 QemuOpt *opt = qemu_opt_find(opts, name);
+const QemuOptDesc *desc;
+Error *local_err = NULL;
 
-if (opt == NULL)
+if (opt == NULL) {
+desc = find_desc_by_name(opts-list-desc, name);
+if (desc  desc-def_value_str) {
+parse_option_size(name, desc-def_value_str, defval, local_err);
+assert(!local_err);
+}
 return defval;
+}
 assert(opt-desc  opt-desc-type == QEMU_OPT_SIZE);
 return opt-value.uint;
 }
@@ -609,20 +655,6 @@ static bool opts_accepts_any(const QemuOpts *opts)
 return opts-list-desc[0].name == NULL;
 }
 
-static const QemuOptDesc *find_desc_by_name(const QemuOptDesc *desc,
-const char *name)
-{
-int i;
-
-for (i = 0; desc[i].name != NULL; i++) {
-if (strcmp(desc[i].name, name) == 0) {
-return desc[i];
-}
-}
-
-return NULL;
-}
-
 static void opt_set(QemuOpts *opts, const char *name, const char *value,
 bool prepend, Error **errp)
 {
-- 
1.7.11.7

[Qemu-devel] [PATCH V15 0/6] replace QEMUOptionParameter with QemuOpts parser

2013-05-30 Thread Dongxu Wang

These patches will replace QEMUOptionParameter with QemuOpts. Change logs
please go to each patch's commit message.

V14-V15:
1) Only delete enum QEMUOptionParType.

Dong Xu Wang (6):
  add def_value_str in QemuOptDesc struct and rewrite qemu_opts_print
  avoid duplication of default value in QemuOpts
  Create four QemuOptsList related functions
  Create some QemuOpts functons
  Use QemuOpts support in block layer
  remove QEMUOptionParameter related functions and struct

 block.c   | 100 
 block/cow.c   |  52 ++---
 block/gluster.c   |  37 ++-
 block/iscsi.c |  31 ++-
 block/qcow.c  |  67 +++---
 block/qcow2.c | 199 
 block/qed.c   | 108 +
 block/qed.h   |   2 +-
 block/raw-posix.c |  59 +++--
 block/raw-win32.c |  31 +--
 block/raw.c   |  30 +--
 block/rbd.c   |  62 +++--
 block/sheepdog.c  |  81 ---
 block/ssh.c   |  29 ++-
 block/vdi.c   |  70 +++---
 block/vmdk.c  | 128 ++-
 block/vpc.c   |  65 +++---
 block/vvfat.c |  11 +-
 include/block/block.h |   5 +-
 include/block/block_int.h |   6 +-
 include/qemu/option.h |  56 ++---
 qemu-img.c|  65 +++---
 util/qemu-option.c| 571 +-
 23 files changed, 904 insertions(+), 961 deletions(-)

-- 
1.7.11.7

[Qemu-devel] [PATCH V15 4/6] Create some QemuOpts functons

2013-05-30 Thread Dongxu Wang

From: Dong Xu Wang wdon...@linux.vnet.ibm.com

These functions will be used in next commit.

qemu_opt_get_(*)_del functions are used to make sure we
have the same behaviors as before: in block layer, after
parsing a parameter value, parameter list will delete it
to avoid parsing it twice.

Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com

v13-v14:
1) rewrite commit message.
2) use def_value_str in qemu_opt_get_FOO_del() and qemu_opt_get_del().
3) delete redundant qemu_opt_del(opt).
Signed-off-by: Dongxu Wang wdon...@linux.vnet.ibm.com
---
 include/qemu/option.h |  11 -
 util/qemu-option.c| 116 +++---
 2 files changed, 119 insertions(+), 8 deletions(-)

diff --git a/include/qemu/option.h b/include/qemu/option.h
index c7a5c14..d63e447 100644
--- a/include/qemu/option.h
+++ b/include/qemu/option.h
@@ -108,6 +108,7 @@ struct QemuOptsList {
 };
 
 const char *qemu_opt_get(QemuOpts *opts, const char *name);
+const char *qemu_opt_get_del(QemuOpts *opts, const char *name);
 /**
  * qemu_opt_has_help_opt:
  * @opts: options to search for a help request
@@ -121,13 +122,18 @@ const char *qemu_opt_get(QemuOpts *opts, const char 
*name);
  */
 bool qemu_opt_has_help_opt(QemuOpts *opts);
 bool qemu_opt_get_bool(QemuOpts *opts, const char *name, bool defval);
+bool qemu_opt_get_bool_del(QemuOpts *opts, const char *name, bool defval);
 uint64_t qemu_opt_get_number(QemuOpts *opts, const char *name, uint64_t 
defval);
 uint64_t qemu_opt_get_size(QemuOpts *opts, const char *name, uint64_t defval);
+uint64_t qemu_opt_get_size_del(QemuOpts *opts, const char *name,
+   uint64_t defval);
 int qemu_opt_set(QemuOpts *opts, const char *name, const char *value);
+int qemu_opt_replace_set(QemuOpts *opts, const char *name, const char *value);
 void qemu_opt_set_err(QemuOpts *opts, const char *name, const char *value,
   Error **errp);
 int qemu_opt_set_bool(QemuOpts *opts, const char *name, bool val);
 int qemu_opt_set_number(QemuOpts *opts, const char *name, int64_t val);
+int qemu_opt_replace_set_number(QemuOpts *opts, const char *name, int64_t val);
 typedef int (*qemu_opt_loopfunc)(const char *name, const char *value, void 
*opaque);
 int qemu_opt_foreach(QemuOpts *opts, qemu_opt_loopfunc func, void *opaque,
  int abort_on_failure);
@@ -144,7 +150,10 @@ const char *qemu_opts_id(QemuOpts *opts);
 void qemu_opts_del(QemuOpts *opts);
 void qemu_opts_validate(QemuOpts *opts, const QemuOptDesc *desc, Error **errp);
 int qemu_opts_do_parse(QemuOpts *opts, const char *params, const char 
*firstname);
-QemuOpts *qemu_opts_parse(QemuOptsList *list, const char *params, int 
permit_abbrev);
+int qemu_opts_do_parse_replace(QemuOpts *opts, const char *params,
+   const char *firstname);
+QemuOpts *qemu_opts_parse(QemuOptsList *list, const char *params,
+  int permit_abbrev);
 void qemu_opts_set_defaults(QemuOptsList *list, const char *params,
 int permit_abbrev);
 QemuOpts *qemu_opts_from_qdict(QemuOptsList *list, const QDict *qdict,
diff --git a/util/qemu-option.c b/util/qemu-option.c
index e3f482f..3251adc 100644
--- a/util/qemu-option.c
+++ b/util/qemu-option.c
@@ -33,6 +33,8 @@
 #include qapi/qmp/qerror.h
 #include qemu/option_int.h
 
+static void qemu_opt_del(QemuOpt *opt);
+
 /*
  * Extracts the name of an option from the parameter string (p points at the
  * first byte of the option name)
@@ -553,6 +555,24 @@ const char *qemu_opt_get(QemuOpts *opts, const char *name)
 return opt ? opt-str : NULL;
 }
 
+const char *qemu_opt_get_del(QemuOpts *opts, const char *name)
+{
+QemuOpt *opt = qemu_opt_find(opts, name);
+const QemuOptDesc *desc;
+const char *str = NULL;
+
+if (!opt) {
+desc = find_desc_by_name(opts-list-desc, name);
+if (desc  desc-def_value_str) {
+str = g_strdup(desc-def_value_str);
+}
+} else {
+str = g_strdup(opt-str);
+qemu_opt_del(opt);
+}
+return str;
+}
+
 bool qemu_opt_has_help_opt(QemuOpts *opts)
 {
 QemuOpt *opt;
@@ -583,6 +603,27 @@ bool qemu_opt_get_bool(QemuOpts *opts, const char *name, 
bool defval)
 return opt-value.boolean;
 }
 
+bool qemu_opt_get_bool_del(QemuOpts *opts, const char *name, bool defval)
+{
+QemuOpt *opt = qemu_opt_find(opts, name);
+const QemuOptDesc *desc;
+Error *local_err = NULL;
+bool ret = defval;
+
+if (opt == NULL) {
+desc = find_desc_by_name(opts-list-desc, name);
+if (desc  desc-def_value_str) {
+parse_option_bool(name, desc-def_value_str, ret, local_err);
+assert(!local_err);
+}
+return ret;
+}
+assert(opt-desc  opt-desc-type == QEMU_OPT_BOOL);
+ret = opt-value.boolean;
+qemu_opt_del(opt);
+return ret;
+}
+
 uint64_t qemu_opt_get_number(QemuOpts *opts, const char *name, uint64_t defval)
 {

[Qemu-devel] [PATCH V15 3/6] Create four QemuOptsList related functions

2013-05-30 Thread Dongxu Wang

From: Dong Xu Wang wdon...@linux.vnet.ibm.com

This patch will create 4 functions, count_opts_list, qemu_opts_append,
qemu_opts_free and qemu_opts_print_help, they will be used in following
commits.

v12-v13:
1) simply assert that neither argument has merge_lists set.
2) drop superfluous paranthesesis around p == first.

v11-v12:
1) renmae functions.
2) fix loop styles and code styles.
3) qemu_opts_apend will not return NULL now.
4) merge_lists value is from arguments in qemu_opts_append.

v6-v7:
1) Fix typo.

v5-v6:
1) allocate enough space in append_opts_list function.

Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com
Signed-off-by: Dongxu Wang wdon...@linux.vnet.ibm.com
---
 include/qemu/option.h |  3 ++
 util/qemu-option.c| 82 +++
 2 files changed, 85 insertions(+)

diff --git a/include/qemu/option.h b/include/qemu/option.h
index b928ab0..c7a5c14 100644
--- a/include/qemu/option.h
+++ b/include/qemu/option.h
@@ -157,4 +157,7 @@ int qemu_opts_print(QemuOpts *opts);
 int qemu_opts_foreach(QemuOptsList *list, qemu_opts_loopfunc func, void 
*opaque,
   int abort_on_failure);
 
+QemuOptsList *qemu_opts_append(QemuOptsList *first, QemuOptsList *second);
+void qemu_opts_free(QemuOptsList *list);
+void qemu_opts_print_help(QemuOptsList *list);
 #endif
diff --git a/util/qemu-option.c b/util/qemu-option.c
index bd2acdc..e3f482f 100644
--- a/util/qemu-option.c
+++ b/util/qemu-option.c
@@ -1218,3 +1218,85 @@ int qemu_opts_foreach(QemuOptsList *list, 
qemu_opts_loopfunc func, void *opaque,
 loc_pop(loc);
 return rc;
 }
+
+static size_t count_opts_list(QemuOptsList *list)
+{
+size_t i = 0;
+
+for (i = 0; list  list-desc[i].name; i++) {
+;
+}
+
+return i;
+}
+
+/* Create a new QemuOptsList and make its desc to the merge of first
+ * and second. It will allocate space for one new QemuOptsList plus
+ * enough space for QemuOptDesc in first and second QemuOptsList.
+ * First argument's QemuOptDesc members take precedence over second's.
+ * The result's name and implied_opt_name are not copied from them.
+ * Both merge_lists should not be set. Both list can be NULL.
+ */
+QemuOptsList *qemu_opts_append(QemuOptsList *first,
+   QemuOptsList *second)
+{
+size_t num_first_opts, num_second_opts;
+QemuOptsList *dest = NULL;
+int i = 0;
+int index = 0;
+QemuOptsList *p = first;
+
+num_first_opts = count_opts_list(first);
+num_second_opts = count_opts_list(second);
+
+dest = g_malloc0(sizeof(QemuOptsList)
++ (num_first_opts + num_second_opts + 1) * sizeof(QemuOptDesc));
+
+dest-name = append_opts_list;
+dest-implied_opt_name = NULL;
+assert((!first || !first-merge_lists)
+ (!second || !second-merge_lists));
+QTAILQ_INIT(dest-head);
+
+for (i = 0; p  p-desc[i].name; i++) {
+if (!find_desc_by_name(dest-desc, p-desc[i].name)) {
+dest-desc[index].name = g_strdup(p-desc[i].name);
+dest-desc[index].help = g_strdup(p-desc[i].help);
+dest-desc[index].type = p-desc[i].type;
+dest-desc[index].def_value_str =
+g_strdup(p-desc[i].def_value_str);
+index++;
+}
+if (p == first  p  !p-desc[i].name) {
+p = second;
+i = 0;
+}
+}
+dest-desc[index].name = NULL;
+return dest;
+}
+
+/* free a QemuOptsList, can accept NULL as arguments */
+void qemu_opts_free(QemuOptsList *list)
+{
+int i = 0;
+
+for (i = 0; list  list-desc[i].name; i++) {
+g_free((char *)list-desc[i].name);
+g_free((char *)list-desc[i].help);
+g_free((char *)list-desc[i].def_value_str);
+}
+
+g_free(list);
+}
+
+void qemu_opts_print_help(QemuOptsList *list)
+{
+int i = 0;
+printf(Supported options:\n);
+for (i = 0; list  list-desc[i].name; i++) {
+printf(%-16s %s\n, list-desc[i].name,
+list-desc[i].help ?
+list-desc[i].help : No description available);
+}
+}
-- 
1.7.11.7

[Qemu-devel] [PATCH V15 6/6] remove QEMUOptionParameter related functions and struct

2013-05-30 Thread Dongxu Wang

From: Dong Xu Wang wdon...@linux.vnet.ibm.com


Signed-off-by: Dongxu Wang wdon...@linux.vnet.ibm.com
---
 include/qemu/option.h |  39 ---
 util/qemu-option.c| 285 --
 2 files changed, 324 deletions(-)

diff --git a/include/qemu/option.h b/include/qemu/option.h
index d63e447..d2d3f16 100644
--- a/include/qemu/option.h
+++ b/include/qemu/option.h
@@ -31,24 +31,6 @@
 #include qapi/error.h
 #include qapi/qmp/qdict.h
 
-enum QEMUOptionParType {
-OPT_FLAG,
-OPT_NUMBER,
-OPT_SIZE,
-OPT_STRING,
-};
-
-typedef struct QEMUOptionParameter {
-const char *name;
-enum QEMUOptionParType type;
-union {
-uint64_t n;
-char* s;
-} value;
-const char *help;
-} QEMUOptionParameter;
-
-
 const char *get_opt_name(char *buf, int buf_size, const char *p, char delim);
 const char *get_opt_value(char *buf, int buf_size, const char *p);
 int get_next_param_value(char *buf, int buf_size,
@@ -58,27 +40,6 @@ int get_param_value(char *buf, int buf_size,
 int check_params(char *buf, int buf_size,
  const char * const *params, const char *str);
 
-
-/*
- * The following functions take a parameter list as input. This is a pointer to
- * the first element of a QEMUOptionParameter array which is terminated by an
- * entry with entry-name == NULL.
- */
-
-QEMUOptionParameter *get_option_parameter(QEMUOptionParameter *list,
-const char *name);
-int set_option_parameter(QEMUOptionParameter *list, const char *name,
-const char *value);
-int set_option_parameter_int(QEMUOptionParameter *list, const char *name,
-uint64_t value);
-QEMUOptionParameter *append_option_parameters(QEMUOptionParameter *dest,
-QEMUOptionParameter *list);
-QEMUOptionParameter *parse_option_parameters(const char *param,
-QEMUOptionParameter *list, QEMUOptionParameter *dest);
-void free_option_parameters(QEMUOptionParameter *list);
-void print_option_parameters(QEMUOptionParameter *list);
-void print_option_help(QEMUOptionParameter *list);
-
 /* -- */
 
 typedef struct QemuOpt QemuOpt;
diff --git a/util/qemu-option.c b/util/qemu-option.c
index 3251adc..e1b2a3b 100644
--- a/util/qemu-option.c
+++ b/util/qemu-option.c
@@ -155,22 +155,6 @@ int check_params(char *buf, int buf_size,
 return 0;
 }
 
-/*
- * Searches an option list for an option with the given name
- */
-QEMUOptionParameter *get_option_parameter(QEMUOptionParameter *list,
-const char *name)
-{
-while (list  list-name) {
-if (!strcmp(list-name, name)) {
-return list;
-}
-list++;
-}
-
-return NULL;
-}
-
 static void parse_option_bool(const char *name, const char *value, bool *ret,
   Error **errp)
 {
@@ -244,275 +228,6 @@ static void parse_option_size(const char *name, const 
char *value,
 }
 }
 
-/*
- * Sets the value of a parameter in a given option list. The parsing of the
- * value depends on the type of option:
- *
- * OPT_FLAG (uses value.n):
- *  If no value is given, the flag is set to 1.
- *  Otherwise the value must be on (set to 1) or off (set to 0)
- *
- * OPT_STRING (uses value.s):
- *  value is strdup()ed and assigned as option value
- *
- * OPT_SIZE (uses value.n):
- *  The value is converted to an integer. Suffixes for kilobytes etc. are
- *  allowed (powers of 1024).
- *
- * Returns 0 on succes, -1 in error cases
- */
-int set_option_parameter(QEMUOptionParameter *list, const char *name,
-const char *value)
-{
-bool flag;
-Error *local_err = NULL;
-
-// Find a matching parameter
-list = get_option_parameter(list, name);
-if (list == NULL) {
-fprintf(stderr, Unknown option '%s'\n, name);
-return -1;
-}
-
-// Process parameter
-switch (list-type) {
-case OPT_FLAG:
-parse_option_bool(name, value, flag, local_err);
-if (!error_is_set(local_err)) {
-list-value.n = flag;
-}
-break;
-
-case OPT_STRING:
-if (value != NULL) {
-list-value.s = g_strdup(value);
-} else {
-fprintf(stderr, Option '%s' needs a parameter\n, name);
-return -1;
-}
-break;
-
-case OPT_SIZE:
-parse_option_size(name, value, list-value.n, local_err);
-break;
-
-default:
-fprintf(stderr, Bug: Option '%s' has an unknown type\n, name);
-return -1;
-}
-
-if (error_is_set(local_err)) {
-qerror_report_err(local_err);
-error_free(local_err);
-return -1;
-}
-
-return 0;
-}
-
-/*
- * Sets the given parameter to an integer instead of a string.
- * This function cannot be used to set string options.
- *
- * Returns 0 on success, -1 in error cases
- */
-int set_option_parameter_int(QEMUOptionParameter *list, const char *name,
-uint64_t value)
-{
-// Find a matching parameter
-list

[Qemu-devel] Unable to parse -device drivers containing commas?

2013-05-30 Thread Mark Cave-Ayland


Hi all,

I found that the QEMU -device command line parser doesn't seem to like 
driver names containing a comma such as SUNW,tcx for the video driver 
on qemu-system-sparc:


$ ./qemu-system-sparc -device SUNW,tcx,help
qemu-system-sparc: -device SUNW,tcx,help: Parameter 'driver' expects 
device type

$ ./qemu-system-sparc -device 'SUNW,tcx',help
qemu-system-sparc: -device SUNW,tcx,help: Parameter 'driver' expects 
device type

$ ./qemu-system-sparc -device SUNW,tcx,help
qemu-system-sparc: -device SUNW,tcx,help: Parameter 'driver' expects 
device type


If I try temporarily removing the comma from the TypeInfo name field in 
hw/display/tcx.c then all is fine:


$ ./qemu-system-sparc -device 'SUNWtcx',help
SUNWtcx.vram_size=hex32
SUNWtcx.width=uint16
SUNWtcx.height=uint16
SUNWtcx.depth=uint16
$ ./qemu-system-sparc -device SUNWtcx,help
SUNWtcx.vram_size=hex32
SUNWtcx.width=uint16
SUNWtcx.height=uint16
SUNWtcx.depth=uint16

Note that there are a couple of other devices in the SPARC32 device tree 
with this problem, since the general device naming convention on SPARC 
is in the form manufacturer,device. Is there a way of escaping the 
commas on the command line so that it is possible to list properties for 
drivers named in this way?



Many thanks,

Mark.

[Qemu-devel] [PATCH V19 0/8] add-cow file format

2013-05-30 Thread Dongxu Wang

It will introduce a new file format: add-cow.

The add-cow file format makes it possible to perform copy-on-write on top of
a raw disk image.  When we know that no backing file clusters remain visible
(e.g. we have streamed the entire image and copied all data from the backing
file), then it is possible to discard the add-cow file and use the raw image
file directly.

This feature adds the copy-on-write feature to raw files (which cannot support
it natively) while allowing us to get full performance again later when we no
longer need copy-on-write.

add-cow can benefit from other available functions, such as path_has_protocol
and qed_read_string, so we will make them public.

snapshot_blkdev are not supported now for add-cow. Will add it in futher 
patches.

These patches are using QemuOpts parser, former patches could be found here:
http://patchwork.ozlabs.org/patch/247508/


v18-v19:
1) support parallel aio write.
2) fix flush method.
3) other small fix.
v17 - v18:
1) remove version field.
2) header size is maximum value and cluster size value.
3) fix type.
4) move struct to source file.
5) cluster_size-table_size.
6) use error_report, not fprintf.
7) remove version field from header.
8) header_size is MAX(cluster_size, 4096).
9) introduce s-cluster_sectors.
10) use BLKDBG_L2_LOAD/UPDATE.
11) add 037 and 038 tests.

v16-v17):
1) Use stringify.

v15-v16):
1) Rebased on QEMU upstream source tree.
2) Judge if opts is null in add_cow_create function.

v14-v15:
1) Fix typo and make some sentences more readable in docs.
2) Introduce STRINGIZER macro.

v13-v14:
1) Make some sentences more clear in docs.
2) Make MAGIC from 8 bytes to 4 bytes.
3) Fix some bugs.

v12-v13:
1) Use QemuOpts, not QEMUOptionParameter
2) cluster_size configuable
3) Refactor block-cache.c
4) Correct qemu-iotests script.
5) Other bug fix.

v11-v12:
1) Removed un-used feature bit.
2) Share cache code with qcow2.c.
3) Remove snapshot_blkdev support, will add it in another patch.
5) COW Bitmap field in add-cow file will be multiple of 65536.
6) fix grammer and typo.

Dong Xu Wang (8):
  V18: docs: document for add-cow file format
  make path_has_protocol non static
  qed_read_string to bdrv_read_string
  rename qcow2-cache.c to block-cache.c
  Make block-cache.c be common interface
  add debug event for add-cow
  add-cow file format core code.
  qemu-iotests: add add-cow iotests support

 block.c  |  29 +-
 block/Makefile.objs  |   4 +-
 block/add-cow.c  | 812 +++
 block/blkdebug.c |   3 +
 block/block-cache.c  | 342 ++
 block/qcow2-cache.c  | 323 -
 block/qcow2-cluster.c|  52 +--
 block/qcow2-refcount.c   |  62 ++--
 block/qcow2.c|  21 +-
 block/qcow2.h|  24 +-
 block/qed.c  |  34 +-
 docs/specs/add-cow.txt   | 172 +
 include/block/block-cache.h  |  59 
 include/block/block.h|   6 +
 include/block/block_int.h|   2 +
 tests/qemu-iotests/017   |   2 +-
 tests/qemu-iotests/020   |   2 +-
 tests/qemu-iotests/037   |   2 +-
 tests/qemu-iotests/038   |   2 +-
 tests/qemu-iotests/common|   6 +
 tests/qemu-iotests/common.rc |  15 +-
 trace-events |  13 +-
 22 files changed, 1536 insertions(+), 451 deletions(-)
 create mode 100644 block/add-cow.c
 create mode 100644 block/block-cache.c
 delete mode 100644 block/qcow2-cache.c
 create mode 100644 docs/specs/add-cow.txt
 create mode 100644 include/block/block-cache.h

-- 
1.7.11.7

[Qemu-devel] [PATCH V19 1/8] V18: docs: document for add-cow file format

2013-05-30 Thread Dongxu Wang

From: Dong Xu Wang wdon...@linux.vnet.ibm.com

Document for add-cow format, the usage and spec of add-cow are
introduced.

v18-v19:
1) backing_fmt and image_fmt NUL-terminated.
2) other fix.
V17-V18:
1) remove version field.
2) header size is maximum value and cluster size value.
3) fix type.

Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com
Signed-off-by: Dongxu Wang wdon...@linux.vnet.ibm.com
---
 docs/specs/add-cow.txt | 172 +
 1 file changed, 172 insertions(+)
 create mode 100644 docs/specs/add-cow.txt

diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt
new file mode 100644
index 000..fba74dd
--- /dev/null
+++ b/docs/specs/add-cow.txt
@@ -0,0 +1,172 @@
+== General ==
+
+The raw file format does not support backing files or copy on write
+feature. The add-cow image format makes it possible to use backing
+files with an image by keeping a separate .add-cow metadata file.
+Once all clusters have been written into the image it is safe to
+discard the .add-cow and backing files, then we can use the image
+directly.
+
+An example usage of add-cow would look like:
+(ubuntu.img is a disk image which has an installed OS.)
+1)  Create an image, such as raw format, with the same size of
+ubuntu.img:
+qemu-img create -f raw test.raw 8G
+2)  Create an add-cow image which will store dirty bitmap
+qemu-img create -f add-cow test.add-cow \
+-o backing_file=ubuntu.img,image_file=test.raw
+3)  Run qemu with add-cow image
+qemu -drive if=virtio,file=test.add-cow
+
+test.raw may be larger than ubuntu.img, in that case, the size of
+test.add-cow will be calculated from the size of test.raw.
+
+image_fmt can be omitted, in that case image_fmt is assumed to be
+raw. backing_fmt can also be omitted, add-cow should do a probe
+operation and determine what the backing file's format is. It is
+recommended to always specify the format for any raw file, because
+probing a raw file is a security hole.
+
+=Specification=
+
+The file format looks like this:
+
+ +---+---+
+ | Header|   COW bitmap  |
+ +---+---+
+
+All numbers in add-cow are stored in Little Endian byte order.
+
+== Header ==
+
+The Header is included in the first bytes:
+(HEADER_SIZE is defined in 40-43 bytes.)
+Byte0  -  3:magic
+add-cow magic string (ACOW). It is coded in
+free-form ASCII.
+
+4  -  7:backing file name offset
+Offset in the add-cow file at which the backing
+file name is stored (NB: The string is NOT
+NUL-terminated).
+If backing file name does NOT exist, this field
+will be 0. Must be between 76 and [HEADER_SIZE
+- 2](a file name must be at least 1 byte).
+
+8  - 11:backing file name size
+Length of the backing file name in bytes. It
+will be 0 if the backing file name offset is
+0. If backing file name offset is non-zero,
+then it must be non-zero. Must be less than
+[HEADER_SIZE - 76] to fit in the reserved
+part of the header. Backing file name offset
++ size must be no more than HEADER_SIZE.
+
+12 - 15:image file name offset
+Offset in the add-cow file at which the image
+file name is stored (NB: The string is NOT
+NUL-terminated). It must be between 76 and
+[HEADER_SIZE - 2]. Image file name size + offset
+must be no more than HEADER_SIZE.
+
+16 - 19:image file name size
+Length of the image file name in bytes.
+Must be less than [HEADER_SIZE - 76] to fit in
+the reserved part of the header.
+
+20 - 23:cluster bits
+Number of bits that are used for addressing an
+offset within a cluster (1  cluster_bits is
+the cluster size). Must not be less than 12
+(i.e. 4096 byte clusters).
+
+Note: qemu as of today has an implementation
+limit of 2 MB as the maximum cluster size and
+won't be able to open images with larger cluster
+sizes.
+
+24 - 31:features
+Bitmask of features. If a feature bit is set
+but not recognized, the opening add-cow file must
+fail.  No features bits are currently defined.
+
+

[Qemu-devel] [PATCH V19 4/8] rename qcow2-cache.c to block-cache.c

2013-05-30 Thread Dongxu Wang

From: Dong Xu Wang wdon...@linux.vnet.ibm.com

Block layer will use qcow2-cache as common cache code, so rename
it to block-cache.c.

v18-v19:
1) only rename, did not touch other code.

Signed-off-by: Dongxu Wang wdon...@linux.vnet.ibm.com
---
 block/Makefile.objs|   3 +-
 block/block-cache.c| 323 +
 block/qcow2-cache.c| 323 -
 block/qcow2-cluster.c  |  48 
 block/qcow2-refcount.c |  42 +++
 block/qcow2.c  |  18 +--
 block/qcow2.h  |  28 ++---
 trace-events   |  12 +-
 8 files changed, 399 insertions(+), 398 deletions(-)
 create mode 100644 block/block-cache.c
 delete mode 100644 block/qcow2-cache.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 5f0358a..16e574a 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -1,5 +1,6 @@
 block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o 
vvfat.o
-block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o 
qcow2-cache.o
+block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
+block-obj-y += block-cache.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
 block-obj-y += vhdx.o
diff --git a/block/block-cache.c b/block/block-cache.c
new file mode 100644
index 000..bc057a8
--- /dev/null
+++ b/block/block-cache.c
@@ -0,0 +1,323 @@
+/*
+ * L2/refcount table cache for the QCOW2 format
+ *
+ * Copyright (c) 2010 Kevin Wolf kw...@redhat.com
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the Software), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include block/block_int.h
+#include qemu-common.h
+#include qcow2.h
+#include trace.h
+
+typedef struct BlockCachedTable {
+void*   table;
+int64_t offset;
+booldirty;
+int cache_hits;
+int ref;
+} BlockCachedTable;
+
+struct BlockCache {
+BlockCachedTable*   entries;
+struct BlockCache*  depends;
+int size;
+booldepends_on_flush;
+};
+
+BlockCache *block_cache_create(BlockDriverState *bs, int num_tables)
+{
+BDRVQcowState *s = bs-opaque;
+BlockCache *c;
+int i;
+
+c = g_malloc0(sizeof(*c));
+c-size = num_tables;
+c-entries = g_malloc0(sizeof(*c-entries) * num_tables);
+
+for (i = 0; i  c-size; i++) {
+c-entries[i].table = qemu_blockalign(bs, s-cluster_size);
+}
+
+return c;
+}
+
+int block_cache_destroy(BlockDriverState* bs, BlockCache *c)
+{
+int i;
+
+for (i = 0; i  c-size; i++) {
+assert(c-entries[i].ref == 0);
+qemu_vfree(c-entries[i].table);
+}
+
+g_free(c-entries);
+g_free(c);
+
+return 0;
+}
+
+static int block_cache_flush_dependency(BlockDriverState *bs, BlockCache *c)
+{
+int ret;
+
+ret = block_cache_flush(bs, c-depends);
+if (ret  0) {
+return ret;
+}
+
+c-depends = NULL;
+c-depends_on_flush = false;
+
+return 0;
+}
+
+static int block_cache_entry_flush(BlockDriverState *bs, BlockCache *c, int i)
+{
+BDRVQcowState *s = bs-opaque;
+int ret = 0;
+
+if (!c-entries[i].dirty || !c-entries[i].offset) {
+return 0;
+}
+
+trace_block_cache_entry_flush(qemu_coroutine_self(),
+  c == s-l2_table_cache, i);
+
+if (c-depends) {
+ret = block_cache_flush_dependency(bs, c);
+} else if (c-depends_on_flush) {
+ret = bdrv_flush(bs-file);
+if (ret = 0) {
+c-depends_on_flush = false;
+}
+}
+
+if (ret  0) {
+return ret;
+}
+
+if (c == s-refcount_block_cache) {
+BLKDBG_EVENT(bs-file, BLKDBG_REFBLOCK_UPDATE_PART);
+} else if (c == s-l2_table_cache) {
+BLKDBG_EVENT(bs-file, BLKDBG_L2_UPDATE);
+}
+
+ret = bdrv_pwrite(bs-file, c-entries[i].offset, c-entries[i].table,
+s-cluster_size);
+if (ret  0) {
+return

[Qemu-devel] [PATCH V19 3/8] qed_read_string to bdrv_read_string

2013-05-30 Thread Dongxu Wang

From: Dong Xu Wang wdon...@linux.vnet.ibm.com

Make qed_read_string function to a common interface, so move it to
block.c.

Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com
Signed-off-by: Dongxu Wang wdon...@linux.vnet.ibm.com
---
 block.c   | 27 +++
 block/qed.c   | 34 --
 include/block/block.h |  2 ++
 3 files changed, 33 insertions(+), 30 deletions(-)

diff --git a/block.c b/block.c
index 5135044..65c5e76 100644
--- a/block.c
+++ b/block.c
@@ -210,6 +210,33 @@ int path_has_protocol(const char *path)
 return *p == ':';
 }
 
+/**
+ * Read a string of known length from the image file
+ *
+ * @bs: Image file
+ * @offset: File offset to start of string, in bytes
+ * @n:  String length in bytes
+ * @buf:Destination buffer
+ * @buflen: Destination buffer length in bytes
+ * @ret:0 on success, -errno on failure
+ *
+ * The string is NUL-terminated.
+ */
+int bdrv_read_string(BlockDriverState *bs, uint64_t offset, size_t n,
+ char *buf, size_t buflen)
+{
+int ret;
+if (n = buflen) {
+return -EINVAL;
+}
+ret = bdrv_pread(bs, offset, buf, n);
+if (ret  0) {
+return ret;
+}
+buf[n] = '\0';
+return 0;
+}
+
 int path_is_absolute(const char *path)
 {
 #ifdef _WIN32
diff --git a/block/qed.c b/block/qed.c
index 88daf22..077da76 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -217,33 +217,6 @@ static bool qed_is_image_size_valid(uint64_t image_size, 
uint32_t cluster_size,
 }
 
 /**
- * Read a string of known length from the image file
- *
- * @file:   Image file
- * @offset: File offset to start of string, in bytes
- * @n:  String length in bytes
- * @buf:Destination buffer
- * @buflen: Destination buffer length in bytes
- * @ret:0 on success, -errno on failure
- *
- * The string is NUL-terminated.
- */
-static int qed_read_string(BlockDriverState *file, uint64_t offset, size_t n,
-   char *buf, size_t buflen)
-{
-int ret;
-if (n = buflen) {
-return -EINVAL;
-}
-ret = bdrv_pread(file, offset, buf, n);
-if (ret  0) {
-return ret;
-}
-buf[n] = '\0';
-return 0;
-}
-
-/**
  * Allocate new clusters
  *
  * @s:  QED state
@@ -437,9 +410,10 @@ static int bdrv_qed_open(BlockDriverState *bs, QDict 
*options, int flags)
 return -EINVAL;
 }
 
-ret = qed_read_string(bs-file, s-header.backing_filename_offset,
-  s-header.backing_filename_size, 
bs-backing_file,
-  sizeof(bs-backing_file));
+ret = bdrv_read_string(bs-file, s-header.backing_filename_offset,
+   s-header.backing_filename_size,
+   bs-backing_file,
+   sizeof(bs-backing_file));
 if (ret  0) {
 return ret;
 }
diff --git a/include/block/block.h b/include/block/block.h
index fd776a1..2989da6 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -175,6 +175,8 @@ int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
 const void *buf, int count);
 int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
 int nb_sectors, QEMUIOVector *qiov);
+int bdrv_read_string(BlockDriverState *bs, uint64_t offset, size_t n,
+ char *buf, size_t buflen);
 int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
 int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num,
-- 
1.7.11.7

[Qemu-devel] [PATCH V19 2/8] make path_has_protocol non static

2013-05-30 Thread Dongxu Wang

From: Dong Xu Wang wdon...@linux.vnet.ibm.com

We will use path_has_protocol outside block.c, so just make it public.

Reviewed-by: Michael Roth mdr...@linux.vnet.ibm.com
Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com
Signed-off-by: Dongxu Wang wdon...@linux.vnet.ibm.com
---
 block.c   | 2 +-
 include/block/block.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 710362f..5135044 100644
--- a/block.c
+++ b/block.c
@@ -193,7 +193,7 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
 }
 
 /* check if the path starts with protocol: */
-static int path_has_protocol(const char *path)
+int path_has_protocol(const char *path)
 {
 const char *p;
 
diff --git a/include/block/block.h b/include/block/block.h
index 90f95ef..fd776a1 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -345,6 +345,7 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, 
QEMUSnapshotInfo *sn);
 
 char *get_human_readable_size(char *buf, int buf_size, int64_t size);
 int path_is_absolute(const char *path);
+int path_has_protocol(const char *path);
 void path_combine(char *dest, int dest_size,
   const char *base_path,
   const char *filename);
-- 
1.7.11.7

[Qemu-devel] [PATCH V19 8/8] qemu-iotests: add add-cow iotests support

2013-05-30 Thread Dongxu Wang

From: Dong Xu Wang wdon...@linux.vnet.ibm.com

This patch will use qemu-iotests to test add-cow file format.

v17-v18:
1) add 037 and 038 tests.

Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com
Signed-off-by: Dongxu Wang wdon...@linux.vnet.ibm.com
---
 tests/qemu-iotests/017   |  2 +-
 tests/qemu-iotests/020   |  2 +-
 tests/qemu-iotests/037   |  2 +-
 tests/qemu-iotests/038   |  2 +-
 tests/qemu-iotests/common|  6 ++
 tests/qemu-iotests/common.rc | 15 ++-
 6 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/017 b/tests/qemu-iotests/017
index 45f2c0b..ae0e627 100755
--- a/tests/qemu-iotests/017
+++ b/tests/qemu-iotests/017
@@ -40,7 +40,7 @@ trap _cleanup; exit \$status 0 1 2 3 15
 . ./common.pattern
 
 # Any format supporting backing files
-_supported_fmt qcow qcow2 vmdk qed
+_supported_fmt qcow qcow2 vmdk qed add-cow
 _supported_proto generic
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/020 b/tests/qemu-iotests/020
index 2fb0ff8..3dbb495 100755
--- a/tests/qemu-iotests/020
+++ b/tests/qemu-iotests/020
@@ -42,7 +42,7 @@ trap _cleanup; exit \$status 0 1 2 3 15
 . ./common.pattern
 
 # Any format supporting backing files
-_supported_fmt qcow qcow2 vmdk qed
+_supported_fmt qcow qcow2 vmdk qed add-cow
 _supported_proto generic
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/037 b/tests/qemu-iotests/037
index c11460b..683e73d 100755
--- a/tests/qemu-iotests/037
+++ b/tests/qemu-iotests/037
@@ -38,7 +38,7 @@ trap _cleanup; exit \$status 0 1 2 3 15
 . ./common.rc
 . ./common.filter
 
-_supported_fmt qcow qcow2 vmdk qed
+_supported_fmt qcow qcow2 vmdk qed add-cow
 _supported_proto generic
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/038 b/tests/qemu-iotests/038
index 36125ea..10bacfb 100755
--- a/tests/qemu-iotests/038
+++ b/tests/qemu-iotests/038
@@ -38,7 +38,7 @@ trap _cleanup; exit \$status 0 1 2 3 15
 . ./common.rc
 . ./common.filter
 
-_supported_fmt qcow2 qed
+_supported_fmt qcow2 qed add-cow
 _supported_proto generic
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/common b/tests/qemu-iotests/common
index 6826ea7..5110fc1 100644
--- a/tests/qemu-iotests/common
+++ b/tests/qemu-iotests/common
@@ -128,6 +128,7 @@ common options
 check options
 -rawtest raw (default)
 -cowtest cow
+-add-cowtest add-cow
 -qcow   test qcow
 -qcow2  test qcow2
 -qedtest qed
@@ -165,6 +166,11 @@ testlist options
xpand=false
;;
 
+-add-cow)
+IMGFMT=add-cow
+xpand=false
+;;
+
-qcow)
IMGFMT=qcow
xpand=false
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index e9ba358..b203f04 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -110,6 +110,16 @@ _make_test_img()
 fi
 if [ \( $IMGFMT = qcow2 -o $IMGFMT = qed \) -a -n $CLUSTER_SIZE 
]; then
 optstr=$(_optstr_add $optstr cluster_size=$CLUSTER_SIZE)
+elif [ $IMGFMT = add-cow ]; then
+local IMG=$TEST_IMG.raw
+if [ $1 = -b ]; then
+IMG=$IMG.b
+$QEMU_IMG create -f raw $IMG $image_size/dev/null
+extra_img_options=-o image_file=$IMG $extra_img_options
+else
+$QEMU_IMG create -f raw $IMG $image_size/dev/null
+extra_img_options=-o image_file=$IMG
+fi
 fi
 
 if [ -n $optstr ]; then
@@ -130,7 +140,8 @@ _make_test_img()
 -e s# zeroed_grain=\\(on\\|off\\)##g \
 -e s# subformat='[^']*'##g \
 -e s# adapter_type='[^']*'##g \
--e s# lazy_refcounts=\\(on\\|off\\)##g
+-e s# lazy_refcounts=\\(on\\|off\\)##g \
+-e s# image_file='[^']*'##g
 
 # Start an NBD server on the image file, which is what we'll be talking to
 if [ $IMGPROTO = nbd ]; then
@@ -152,6 +163,8 @@ _cleanup_test_img()
 rm -f $TEST_DIR/t.$IMGFMT
 rm -f $TEST_DIR/t.$IMGFMT.orig
 rm -f $TEST_DIR/t.$IMGFMT.base
+rm -f $TEST_DIR/t.$IMGFMT.raw
+rm -f $TEST_DIR/t.$IMGFMT.raw.b
 ;;
 
 rbd)
-- 
1.7.11.7

[Qemu-devel] [PATCH V19 7/8] add-cow file format core code.

2013-05-30 Thread Dongxu Wang

From: Dong Xu Wang wdon...@linux.vnet.ibm.com

add-cow file format core code. It use block-cache.c as cache code.
It lacks of snapshot_blkdev support.

v18-v19:
1) add aio parallel write support.
2) fix flush method.

v17-v18:
1) use error_report, not fprintf.
2) remove version field from header.
3) header_size is MAX(cluster_size, 4096).
4) introduce s-cluster_sectors.
5) use BLKDBG_L2_LOAD/UPDATE.

v16-v17:
1) Use stringify.

v15-v16:
1) Judge if opts is null in add_cow_create function.

Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com

Conflicts:
block/block-cache.c
Signed-off-by: Dongxu Wang wdon...@linux.vnet.ibm.com
---
 block/Makefile.objs |   1 +
 block/add-cow.c | 812 
 block/block-cache.c |  16 +-
 include/block/block-cache.h |   4 +
 include/block/block_int.h   |   2 +
 5 files changed, 831 insertions(+), 4 deletions(-)
 create mode 100644 block/add-cow.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 16e574a..f666a7a 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -1,6 +1,7 @@
 block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o 
vvfat.o
 block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
 block-obj-y += block-cache.o
+block-obj-y += add-cow.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
 block-obj-y += vhdx.o
diff --git a/block/add-cow.c b/block/add-cow.c
new file mode 100644
index 000..290da3c
--- /dev/null
+++ b/block/add-cow.c
@@ -0,0 +1,812 @@
+/*
+ * QEMU ADD-COW Disk Format
+ *
+ * Copyright IBM, Corp. 2012
+ *
+ * Authors:
+ *  Dong Xu Wang wdon...@linux.vnet.ibm.com
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include qemu-common.h
+#include block/block_int.h
+#include qemu/module.h
+#include block/block-cache.h
+
+#define ACOW_CLUSTER_SIZE 65536
+
+typedef struct AddCowMeta
+{
+uint64_t start_sector;
+int remaining_sectors;
+CoQueue dependent_requests;
+struct AddCowMeta *next;
+
+QLIST_ENTRY(AddCowMeta) next_in_flight;
+} AddCowMeta;
+
+enum {
+/* compat_features bit */
+ACOW_F_ALL_ALLOCATED= 0X01,
+
+/* none feature bit used now */
+ACOW_FEATURE_MASK   = 0,
+
+ACOW_MAGIC  = 'A' | 'C'  8 | 'O'  16 | 'W'  24,
+ACOW_CACHE_SIZE = 16,
+DEFAULT_HEADER_SIZE = 4096,
+MIN_CLUSTER_BITS= 12,
+MAX_CLUSTER_BITS= 21,
+};
+
+typedef struct AddCowHeader {
+uint32_tmagic;
+
+uint32_tbacking_offset;
+uint32_tbacking_size;
+uint32_timage_offset;
+uint32_timage_size;
+
+uint32_tcluster_bits;
+uint64_tfeatures;
+uint64_tcompat_features;
+uint32_theader_size;
+
+charbacking_fmt[16];
+charimage_fmt[16];
+} QEMU_PACKED AddCowHeader;
+
+typedef struct BDRVAddCowState {
+BlockDriverState*image_hd;
+CoMutex lock;
+int cluster_size;
+int cluster_sectors;
+BlockCache  *bitmap_cache;
+uint64_tbitmap_size;
+AddCowHeaderheader;
+charbacking_fmt[16];
+charimage_fmt[16];
+BlockCache  *image_hd_cache;
+QLIST_HEAD(AddCowAlloc, AddCowMeta) cluster_allocs;
+} BDRVAddCowState;
+
+/* Convert sector_num to offset in bitmap */
+static inline int64_t offset_in_bitmap(int64_t sector_num,
+   int64_t cluster_sectors)
+{
+int64_t cluster_num = sector_num / cluster_sectors;
+return cluster_num / 8;
+}
+
+static inline bool is_cluster_head(int64_t sector_num,
+   int64_t cluster_sectors)
+{
+return sector_num % cluster_sectors == 0;
+}
+
+static inline bool is_cluster_tail(int64_t sector_num,
+   int64_t cluster_sectors)
+{
+return (sector_num + 1) % cluster_sectors == 0;
+}
+
+static void add_cow_header_le_to_cpu(const AddCowHeader *le, AddCowHeader *cpu)
+{
+cpu-magic  = le32_to_cpu(le-magic);
+
+cpu-backing_offset = le32_to_cpu(le-backing_offset);
+cpu-backing_size   = le32_to_cpu(le-backing_size);
+cpu-image_offset   = le32_to_cpu(le-image_offset);
+cpu-image_size = le32_to_cpu(le-image_size);
+
+cpu-cluster_bits   = le32_to_cpu(le-cluster_bits);
+cpu-features   = le64_to_cpu(le-features);
+cpu-compat_features= le64_to_cpu(le-compat_features);
+cpu-header_size= le32_to_cpu(le-header_size);
+
+memcpy(cpu-backing_fmt, le-backing_fmt, sizeof(cpu-backing_fmt));
+memcpy(cpu-image_fmt, le-image_fmt, sizeof(cpu-image_fmt));
+}
+
+static void add_cow_header_cpu_to_le(const AddCowHeader *cpu, AddCowHeader *le)
+{
+le-magic   =

[Qemu-devel] [PATCH V19 5/8] Make block-cache.c be common interface

2013-05-30 Thread Dongxu Wang

From: Dong Xu Wang wdon...@linux.vnet.ibm.com

To use block-cache.c be common cache interface, we need to add
some parameter to identify cache's type.

Define a struct named BlockTableType, pass BlockTableType and
table size parameters to block cache initialization function.

v17-v18:
1) move struct to source file.
2) cluster_size-table_size.

Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com
Signed-off-by: Dongxu Wang wdon...@linux.vnet.ibm.com
---
 block/block-cache.c | 75 ++---
 block/qcow2-cluster.c   | 30 +-
 block/qcow2-refcount.c  | 30 +-
 block/qcow2.c   |  7 +++--
 block/qcow2.h   | 24 ++-
 include/block/block-cache.h | 55 +
 trace-events|  1 +
 7 files changed, 139 insertions(+), 83 deletions(-)
 create mode 100644 include/block/block-cache.h

diff --git a/block/block-cache.c b/block/block-cache.c
index bc057a8..f5d75d1 100644
--- a/block/block-cache.c
+++ b/block/block-cache.c
@@ -1,4 +1,8 @@
 /*
+ * QEMU Block Layer Cache
+ *
+ * This file is based on qcow2-cache.c, see its copyrights below:
+ *
  * L2/refcount table cache for the QCOW2 format
  *
  * Copyright (c) 2010 Kevin Wolf kw...@redhat.com
@@ -24,11 +28,11 @@
 
 #include block/block_int.h
 #include qemu-common.h
-#include qcow2.h
 #include trace.h
+#include block/block-cache.h
 
 typedef struct BlockCachedTable {
-void*   table;
+void*table;
 int64_t offset;
 booldirty;
 int cache_hits;
@@ -36,30 +40,34 @@ typedef struct BlockCachedTable {
 } BlockCachedTable;
 
 struct BlockCache {
-BlockCachedTable*   entries;
-struct BlockCache*  depends;
-int size;
-booldepends_on_flush;
+BlockCachedTable*entries;
+struct BlockCache   *depends;
+int size;
+size_t  table_size;
+BlockTableType  table_type;
+booldepends_on_flush;
 };
 
-BlockCache *block_cache_create(BlockDriverState *bs, int num_tables)
+BlockCache *block_cache_create(BlockDriverState *bs, int num_tables,
+   size_t table_size, BlockTableType type)
 {
-BDRVQcowState *s = bs-opaque;
 BlockCache *c;
 int i;
 
 c = g_malloc0(sizeof(*c));
 c-size = num_tables;
 c-entries = g_malloc0(sizeof(*c-entries) * num_tables);
+c-table_type = type;
+c-table_size = table_size;
 
 for (i = 0; i  c-size; i++) {
-c-entries[i].table = qemu_blockalign(bs, s-cluster_size);
+c-entries[i].table = qemu_blockalign(bs, table_size);
 }
 
 return c;
 }
 
-int block_cache_destroy(BlockDriverState* bs, BlockCache *c)
+int block_cache_destroy(BlockDriverState *bs, BlockCache *c)
 {
 int i;
 
@@ -91,15 +99,13 @@ static int block_cache_flush_dependency(BlockDriverState 
*bs, BlockCache *c)
 
 static int block_cache_entry_flush(BlockDriverState *bs, BlockCache *c, int i)
 {
-BDRVQcowState *s = bs-opaque;
 int ret = 0;
 
 if (!c-entries[i].dirty || !c-entries[i].offset) {
 return 0;
 }
 
-trace_block_cache_entry_flush(qemu_coroutine_self(),
-  c == s-l2_table_cache, i);
+trace_block_cache_entry_flush(qemu_coroutine_self(), c-table_type, i);
 
 if (c-depends) {
 ret = block_cache_flush_dependency(bs, c);
@@ -114,14 +120,16 @@ static int block_cache_entry_flush(BlockDriverState *bs, 
BlockCache *c, int i)
 return ret;
 }
 
-if (c == s-refcount_block_cache) {
+if (c-table_type == BLOCK_TABLE_REF) {
 BLKDBG_EVENT(bs-file, BLKDBG_REFBLOCK_UPDATE_PART);
-} else if (c == s-l2_table_cache) {
+} else if (c-table_type == BLOCK_TABLE_L2) {
 BLKDBG_EVENT(bs-file, BLKDBG_L2_UPDATE);
+} else if (c-table_type == BLOCK_TABLE_BITMAP) {
+BLKDBG_EVENT(bs-file, BLKDBG_COW_WRITE);
 }
 
-ret = bdrv_pwrite(bs-file, c-entries[i].offset, c-entries[i].table,
-s-cluster_size);
+ret = bdrv_pwrite(bs-file, c-entries[i].offset,
+  c-entries[i].table, c-table_size);
 if (ret  0) {
 return ret;
 }
@@ -133,12 +141,11 @@ static int block_cache_entry_flush(BlockDriverState *bs, 
BlockCache *c, int i)
 
 int block_cache_flush(BlockDriverState *bs, BlockCache *c)
 {
-BDRVQcowState *s = bs-opaque;
 int result = 0;
 int ret;
 int i;
 
-trace_block_cache_flush(qemu_coroutine_self(), c == s-l2_table_cache);
+trace_block_cache_flush(qemu_coroutine_self(), c-table_type);
 
 for (i = 0; i  c-size; i++) {
 ret = block_cache_entry_flush(bs, c, i);
@@ -157,8 +164,9 @@ int block_cache_flush(BlockDriverState *bs, BlockCache *c)
 return result;
 }
 
-int block_cache_set_dependency(BlockDriverState *bs, BlockCache *c,
-BlockCache *dependency)
+int block_cache_set_dependency(BlockDriverState *bs,
+

Re: [Qemu-devel] [PATCH 5/5] memory: able to pin guest node memory to host node manually

2013-05-30 Thread Wanlong Gao

Any comments?


 Use mbind to pin guest numa node memory to host nodes manually.
 
 If we are not able to pin memory to host node, we may meet the
 cross node memory access performance regression.
 
 With this patch, we can add manual pinning host node like this:
 -m 1024 -numa node,cpus=0,nodeid=0,mem=512,pin=0 -numa 
 node,nodeid=1,cpus=1,mem=512,pin=1
 
 And, if PCI-passthrough is used, direct-attached-device uses DMA transfer
 between device and qemu process. All pages of the guest will be pinned by 
 get_user_pages().
 
 KVM_ASSIGN_PCI_DEVICE ioctl
   kvm_vm_ioctl_assign_device()
 =kvm_assign_device()
   = kvm_iommu_map_memslots()
 = kvm_iommu_map_pages()
= kvm_pin_pages()
 
 So, with direct-attached-device, all guest page's page count will be +1 and
 any page migration will not work. AutoNUMA won't too. And direction by 
 libvirt is *ignored*.
 
 Above all, we need manual pinning memory to host node to avoid
 such cross nodes memmory access performance regression.
 
 Signed-off-by: Wanlong Gao gaowanl...@cn.fujitsu.com
 ---
  exec.c  | 21 +
  include/sysemu/sysemu.h |  1 +
  vl.c| 13 +
  3 files changed, 35 insertions(+)
 
 diff --git a/exec.c b/exec.c
 index aec65c5..fe929ef 100644
 --- a/exec.c
 +++ b/exec.c
 @@ -36,6 +36,8 @@
  #include qemu/config-file.h
  #include exec/memory.h
  #include sysemu/dma.h
 +#include sysemu/sysemu.h
 +#include qemu/bitops.h
  #include exec/address-spaces.h
  #if defined(CONFIG_USER_ONLY)
  #include qemu.h
 @@ -1081,6 +1083,25 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, 
 void *host,
  memory_try_enable_merging(new_block-host, size);
  }
  }
 +
 +if (nb_numa_nodes  0  !strcmp(mr-name, pc.ram)) {
 +int i;
 +uint64_t nodes_mem = 0;
 +unsigned long *maskp = g_malloc0(sizeof(*maskp));
 +for (i = 0; i  nb_numa_nodes; i++) {
 +*maskp = 0;
 +if (node_pin[i] != -1) {
 +set_bit(node_pin[i], maskp);
 +if (qemu_mbind(new_block-host + nodes_mem, node_mem[i],
 +   QEMU_MPOL_BIND, maskp, MAX_NODES, 0)) {
 +perror(qemu_mbind);
 +exit(1);
 +}
 +}
 +nodes_mem += node_mem[i];
 +}
 +}
 +
  new_block-length = size;
  
  /* Keep the list sorted from biggest to smallest block.  */
 diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
 index 2fb71af..ebf6580 100644
 --- a/include/sysemu/sysemu.h
 +++ b/include/sysemu/sysemu.h
 @@ -131,6 +131,7 @@ extern QEMUClock *rtc_clock;
  #define MAX_CPUMASK_BITS 255
  extern int nb_numa_nodes;
  extern uint64_t node_mem[MAX_NODES];
 +extern int node_pin[MAX_NODES];
  extern unsigned long *node_cpumask[MAX_NODES];
  
  #define MAX_OPTION_ROMS 16
 diff --git a/vl.c b/vl.c
 index b1d..3768002 100644
 --- a/vl.c
 +++ b/vl.c
 @@ -253,6 +253,7 @@ static QTAILQ_HEAD(, FWBootEntry) fw_boot_order =
  
  int nb_numa_nodes;
  uint64_t node_mem[MAX_NODES];
 +int node_pin[MAX_NODES];
  unsigned long *node_cpumask[MAX_NODES];
  
  uint8_t qemu_uuid[16];
 @@ -1390,6 +1391,17 @@ static void numa_add(const char *optarg)
  }
  node_mem[nodenr] = sval;
  }
 +
 +if (get_param_value(option, 128, pin, optarg) != 0) {
 +int unsigned long long pin_node;
 +if (parse_uint_full(option, pin_node, 10)  0) {
 +fprintf(stderr, qemu: Invalid pinning nodeid: %s\n, 
 optarg);
 +exit(1);
 +} else {
 +node_pin[nodenr] = pin_node;
 +}
 +}
 +
  if (get_param_value(option, 128, cpus, optarg) != 0) {
  numa_node_parse_cpus(nodenr, option);
  }
 @@ -2921,6 +2933,7 @@ int main(int argc, char **argv, char **envp)
  
  for (i = 0; i  MAX_NODES; i++) {
  node_mem[i] = 0;
 +node_pin[i] = -1;
  node_cpumask[i] = bitmap_new(MAX_CPUMASK_BITS);
  }

[Qemu-devel] [RFC] Check backing_file chain's loop

2013-05-30 Thread Dongxu Wang


Hi,

Now block layer does not check whether backing_file chain can be a 
circle, do you think it is necessary to do a check?


For example,  1.qcow2's backing_file is 2.qcow2 and 2.qcow2's 
backing_file is 1.qcow2, then any IO operation will be a fault.


If necessary, I will post a patch to fix.

[$] qemu-img create -f qcow2 1.qcow2
qemu-img: Image creation needs a size parameter
[$] qemu-img create -f qcow2 1.qcow2 8M
Formatting '1.qcow2', fmt=qcow2 size=8388608 encryption=off 
cluster_size=65536 lazy_refcounts=off

[$] qemu-img create -f qcow2 2.qcow2 -o backing_file=1.qcow2
Formatting '2.qcow2', fmt=qcow2 size=8388608 backing_file='1.qcow2' 
encryption=off cluster_size=65536 lazy_refcounts=off

[$] qemu-img create -f qcow2 1.qcow2 -o backing_file=2.qcow2
Formatting '1.qcow2', fmt=qcow2 size=8388608 backing_file='2.qcow2' 
encryption=off cluster_size=65536 lazy_refcounts=off

[$] qemu-io -c read -v 0 512 1.qcow2
^CSegmentation fault (core dumped)



Thanks.

Re: [Qemu-devel] broken incoming migration

2013-05-30 Thread Peter Maydell

On 30 May 2013 08:44, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 I found the migration broken on pseries platform, specifically, this patch
 broke it:

 f1c72795af573b24a7da5eb52375c9aba8a37972
 migration: do not sent zero pages in bulk stage

 The idea is not to send zero pages to the destination guest which is
 expected to have 100% empty RAM.

This just seems like a broken assumption to me. If the migration
protocol is going to skip sending zero pages it should do it by saying
there's a zero page here and have the destination memset the
page to zero.

thanks
-- PMM

[Qemu-devel] [PATCH V19 6/8] add debug event for add-cow

2013-05-30 Thread Dongxu Wang

From: Dong Xu Wang wdon...@linux.vnet.ibm.com

Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com
Signed-off-by: Dongxu Wang wdon...@linux.vnet.ibm.com
---
 block/blkdebug.c  | 3 +++
 block/block-cache.c   | 4 ++--
 include/block/block.h | 3 +++
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 71f99e4..2bd6a53 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -182,6 +182,9 @@ static const char *event_names[BLKDBG_EVENT_MAX] = {
 [BLKDBG_CLUSTER_ALLOC]  = cluster_alloc,
 [BLKDBG_CLUSTER_ALLOC_BYTES]= cluster_alloc_bytes,
 [BLKDBG_CLUSTER_FREE]   = cluster_free,
+
+[BLKDBG_ADDCOW_READ]= add_cow_read,
+[BLKDBG_ADDCOW_WRITE]   = add_cow_write,
 };
 
 static int get_event_by_name(const char *name, BlkDebugEvent *event)
diff --git a/block/block-cache.c b/block/block-cache.c
index f5d75d1..454269c 100644
--- a/block/block-cache.c
+++ b/block/block-cache.c
@@ -125,7 +125,7 @@ static int block_cache_entry_flush(BlockDriverState *bs, 
BlockCache *c, int i)
 } else if (c-table_type == BLOCK_TABLE_L2) {
 BLKDBG_EVENT(bs-file, BLKDBG_L2_UPDATE);
 } else if (c-table_type == BLOCK_TABLE_BITMAP) {
-BLKDBG_EVENT(bs-file, BLKDBG_COW_WRITE);
+BLKDBG_EVENT(bs-file, BLKDBG_ADDCOW_WRITE);
 }
 
 ret = bdrv_pwrite(bs-file, c-entries[i].offset,
@@ -260,7 +260,7 @@ static int block_cache_do_get(BlockDriverState *bs, 
BlockCache *c,
 if (c-table_type == BLOCK_TABLE_L2) {
 BLKDBG_EVENT(bs-file, BLKDBG_L2_LOAD);
 } else if (c-table_type == BLOCK_TABLE_BITMAP) {
-BLKDBG_EVENT(bs-file, BLKDBG_COW_READ);
+BLKDBG_EVENT(bs-file, BLKDBG_ADDCOW_READ);
 }
 
 ret = bdrv_pread(bs-file, offset, c-entries[i].table,
diff --git a/include/block/block.h b/include/block/block.h
index 2989da6..3573e3e 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -451,6 +451,9 @@ typedef enum {
 BLKDBG_CLUSTER_ALLOC_BYTES,
 BLKDBG_CLUSTER_FREE,
 
+BLKDBG_ADDCOW_READ,
+BLKDBG_ADDCOW_WRITE,
+
 BLKDBG_EVENT_MAX,
 } BlkDebugEvent;
 
-- 
1.7.11.7

[Qemu-devel] Why some test suite in kvm-unit-tests designed for 64bit only?

2013-05-30 Thread 李春奇

Hi there,
I'm now reading codes of kvm-unit-tests and I found that some of the
test cases for x86 is only designed for x86_64 (including access.flat,
apic.flat, emulator.flat, idt_test.flat and so on). I wonder why these
cases are not designed for i386? Or is there any other concerns?

Thanks,
Arthur

--
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China

[Qemu-devel] [PATCH v2 2/5] pci: store PCI hole ranges in guestinfo structure

2013-05-30 Thread Michael S. Tsirkin

Will be used to pass hole ranges to guests.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/i386/pc.c  | 39 ++-
 hw/i386/pc_piix.c | 14 +-
 hw/i386/pc_q35.c  |  6 +-
 hw/pci-host/q35.c |  4 
 include/hw/i386/pc.h  | 19 ++-
 include/hw/pci-host/q35.h |  2 ++
 include/qemu/typedefs.h   |  1 +
 7 files changed, 81 insertions(+), 4 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 4844a6b..c233161 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -978,6 +978,41 @@ void pc_cpus_init(const char *cpu_model, DeviceState 
*icc_bridge)
 }
 }
 
+typedef struct PcGuestInfoState {
+PcGuestInfo info;
+Notifier machine_done;
+} PcGuestInfoState;
+
+static
+void pc_guest_info_machine_done(Notifier *notifier, void *data)
+{
+PcGuestInfoState *guest_info_state = container_of(notifier,
+  PcGuestInfoState,
+  machine_done);
+}
+
+PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size,
+ram_addr_t above_4g_mem_size)
+{
+PcGuestInfoState *guest_info_state = g_malloc0(sizeof *guest_info_state);
+PcGuestInfo *guest_info = guest_info_state-info;
+
+guest_info-pci_info.w32.end = IO_APIC_DEFAULT_ADDRESS;
+if (sizeof(hwaddr) == 4) {
+guest_info-pci_info.w64.begin = 0;
+guest_info-pci_info.w64.end = 0;
+} else {
+guest_info-pci_info.w64.begin = 0x1ULL + above_4g_mem_size;
+guest_info-pci_info.w64.end =  guest_info-pci_info.w64.begin +
+(0x1ULL  62);
+assert(guest_info-pci_info.w64.begin = guest_info-pci_info.w64.end);
+}
+
+guest_info_state-machine_done.notify = pc_guest_info_machine_done;
+qemu_add_machine_init_done_notifier(guest_info_state-machine_done);
+return guest_info;
+}
+
 void pc_acpi_init(const char *default_dsdt)
 {
 char *filename;
@@ -1019,7 +1054,8 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory,
ram_addr_t below_4g_mem_size,
ram_addr_t above_4g_mem_size,
MemoryRegion *rom_memory,
-   MemoryRegion **ram_memory)
+   MemoryRegion **ram_memory,
+   PcGuestInfo *guest_info)
 {
 int linux_boot, i;
 MemoryRegion *ram, *option_rom_mr;
@@ -1071,6 +1107,7 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory,
 for (i = 0; i  nb_option_roms; i++) {
 rom_add_option(option_rom[i].name, option_rom[i].bootindex);
 }
+guest_info-fw_cfg = fw_cfg;
 return fw_cfg;
 }
 
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index d547548..eaff0b6 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -90,6 +90,7 @@ static void pc_init1(MemoryRegion *system_memory,
 MemoryRegion *rom_memory;
 DeviceState *icc_bridge;
 FWCfgState *fw_cfg = NULL;
+PcGuestInfo *guest_info;
 
 icc_bridge = qdev_create(NULL, TYPE_ICC_BRIDGE);
 object_property_add_child(qdev_get_machine(), icc-bridge,
@@ -119,12 +120,23 @@ static void pc_init1(MemoryRegion *system_memory,
 rom_memory = system_memory;
 }
 
+guest_info = pc_guest_info_init(below_4g_mem_size, above_4g_mem_size);
+
+/* Set PCI window size the way seabios has always done it. */
+/* TODO: consider just starting at below_4g_mem_size */
+if (ram_size = 0x8000)
+guest_info-pci_info.w32.begin = 0x8000;
+else if (ram_size = 0xc000)
+guest_info-pci_info.w32.begin = 0xc000;
+else
+guest_info-pci_info.w32.begin = 0xe000;
+
 /* allocate ram and load rom/bios */
 if (!xen_enabled()) {
 fw_cfg = pc_memory_init(system_memory,
kernel_filename, kernel_cmdline, initrd_filename,
below_4g_mem_size, above_4g_mem_size,
-   rom_memory, ram_memory);
+   rom_memory, ram_memory, guest_info);
 }
 
 gsi_state = g_malloc0(sizeof(*gsi_state));
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 7888dfe..32d6357 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -77,6 +77,7 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
 ICH9LPCState *ich9_lpc;
 PCIDevice *ahci;
 DeviceState *icc_bridge;
+PcGuestInfo *guest_info;
 
 icc_bridge = qdev_create(NULL, TYPE_ICC_BRIDGE);
 object_property_add_child(qdev_get_machine(), icc-bridge,
@@ -105,11 +106,13 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
 rom_memory = get_system_memory();
 }
 
+guest_info = pc_guest_info_init(below_4g_mem_size, above_4g_mem_size);
+
 /* allocate ram and load rom/bios */
 if (!xen_enabled()) {
 pc_memory_init(get_system_memory(), kernel_filename, kernel_cmdline,

[Qemu-devel] [PATCH v2 1/5] range: add Range structure

2013-05-30 Thread Michael S. Tsirkin

Sometimes we need to pass ranges around, add a
handy structure for this purpose.

Note: memory.c defines its own concept of AddrRange structure for
working with 128 addresses.  It's necessary there for doing range math.
This is not needed for most users: struct Range is
much simpler, and is only used for passing the range around.

Cc: Peter Maydell peter.mayd...@linaro.org
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 include/qemu/range.h | 16 
 1 file changed, 16 insertions(+)

diff --git a/include/qemu/range.h b/include/qemu/range.h
index 3502372..b76cc0d 100644
--- a/include/qemu/range.h
+++ b/include/qemu/range.h
@@ -1,6 +1,22 @@
 #ifndef QEMU_RANGE_H
 #define QEMU_RANGE_H
 
+#include inttypes.h
+
+/*
+ * Operations on 64 bit address ranges.
+ * Notes:
+ *   - ranges must not wrap around 0, but can include the last byte ~0x0LL.
+ *   - this can not represent a full 0 to ~0x0LL range.
+ */
+
+/* A structure representing a range of addresses. */
+struct Range {
+uint64_t begin; /* First byte of the range, or 0 if empty. */
+uint64_t end;   /* 1 + the last byte. 0 if range empty or ends at ~0x0LL. 
*/
+};
+typedef struct Range Range;
+
 /* Get last byte of a range from offset + length.
  * Undefined for ranges that wrap around 0. */
 static inline uint64_t range_get_last(uint64_t offset, uint64_t len)
-- 
MST

[Qemu-devel] [PATCH v2 0/5] pc: pass pci window data to guests

2013-05-30 Thread Michael S. Tsirkin

This makes it possible for bios to load pci window
data from host.

This makes it possible for host to make sure
setup matches hardware exactly.
This will also make it easier to add more chipsets
down the road.

Ranges are passed within a generic GuestInfo
structure, can add more fields of interest
to Guests in the future.

Note: this is on top of my PCI branch,
if no one objects I'd like to merge it through
there as there are some trivial dependencies on that.

Changes from v1:
- fix v1.5-v1.6 migration compatibility
- address Peter Maydell's comments on range.h
- make addresses a bit smaller, compatible to what seabios does at the
  moment. We can increase the windows, carefully, at a later time.

Michael S. Tsirkin (5):
  range: add Range structure
  pci: store PCI hole ranges in guestinfo structure
  pc: pass PCI hole ranges to Guests
  pc: add 1.6 compat type
  pc: pci-info add compat support

 hw/i386/pc.c  | 65 ++-
 hw/i386/pc_piix.c | 37 ---
 hw/i386/pc_q35.c  |  6 -
 hw/pci-host/q35.c |  4 +++
 include/hw/i386/pc.h  | 20 ++-
 include/hw/pci-host/q35.h |  2 ++
 include/qemu/range.h  | 16 
 include/qemu/typedefs.h   |  1 +
 8 files changed, 145 insertions(+), 6 deletions(-)

-- 
MST

[Qemu-devel] [PATCH v2 5/5] pc: pci-info add compat support

2013-05-30 Thread Michael S. Tsirkin

We can't change fw cfg entries or add new ones
without breaking cross version migration.
Add a flag to skip adding new entry when
running with 1.5 compat machine type.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/i386/pc.c |  7 ++-
 hw/i386/pc_piix.c| 12 ++--
 include/hw/i386/pc.h |  1 +
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 64101cb..8b548d0 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -988,7 +988,12 @@ typedef struct PcRomPciInfo {
 
 static void pc_fw_cfg_guest_info(PcGuestInfo *guest_info)
 {
-PcRomPciInfo *info = g_malloc(sizeof *info);
+PcRomPciInfo *info;
+if (guest_info-compat_v1_5) {
+return;
+}
+
+info = g_malloc(sizeof *info);
 info-w32_min = cpu_to_le64(guest_info-pci_info.w32.begin);
 info-w32_max = cpu_to_le64(guest_info-pci_info.w32.end);
 info-w64_min = cpu_to_le64(guest_info-pci_info.w64.begin);
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 2717d83..5b281d1 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -57,7 +57,7 @@ static const int ide_iobase2[MAX_IDE_BUS] = { 0x3f6, 0x376 };
 static const int ide_irq[MAX_IDE_BUS] = { 14, 15 };
 
 static bool has_pvpanic = true;
-static bool has_pci_info = true;
+static bool guest_info_compat_v1_5 = false;
 
 /* PC hardware initialisation */
 static void pc_init1(MemoryRegion *system_memory,
@@ -122,6 +122,7 @@ static void pc_init1(MemoryRegion *system_memory,
 }
 
 guest_info = pc_guest_info_init(below_4g_mem_size, above_4g_mem_size);
+guest_info-compat_v1_5 = guest_info_compat_v1_5;
 
 /* Set PCI window size the way seabios has always done it. */
 /* TODO: consider just starting at below_4g_mem_size */
@@ -259,6 +260,12 @@ static void pc_init_pci(QEMUMachineInitArgs *args)
  initrd_filename, cpu_model, 1, 1);
 }
 
+static void pc_init_pci_1_5(QEMUMachineInitArgs *args)
+{
+guest_info_compat_v1_5 = true;
+pc_init_pci(args);
+}
+
 static void pc_init_pci_1_4(QEMUMachineInitArgs *args)
 {
 has_pvpanic = false;
@@ -355,7 +362,7 @@ static QEMUMachine pc_i440fx_machine_v1_6 = {
 static QEMUMachine pc_i440fx_machine_v1_5 = {
 .name = pc-i440fx-1.5,
 .desc = Standard PC (i440FX + PIIX, 1996),
-.init = pc_init_pci,
+.init = pc_init_pci_1_5,
 .hot_add_cpu = pc_hot_add_cpu,
 .max_cpus = 255,
 .is_default = 1,
@@ -759,6 +766,7 @@ static QEMUMachine xenfv_machine = {
 
 static void pc_machine_init(void)
 {
+qemu_register_machine(pc_i440fx_machine_v1_6);
 qemu_register_machine(pc_i440fx_machine_v1_5);
 qemu_register_machine(pc_i440fx_machine_v1_4);
 qemu_register_machine(pc_machine_v1_3);
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 1bf5219..6419da8 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -20,6 +20,7 @@ typedef struct PcPciInfo {
 
 struct PcGuestInfo {
 PcPciInfo pci_info;
+bool compat_v1_5;
 FWCfgState *fw_cfg;
 };
 
-- 
MST

[Qemu-devel] [PATCH v2 4/5] pc: add 1.6 compat type

2013-05-30 Thread Michael S. Tsirkin

Identical to 1.5 ATM, but changes will accumulate.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/i386/pc_piix.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index eaff0b6..2717d83 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -57,6 +57,7 @@ static const int ide_iobase2[MAX_IDE_BUS] = { 0x3f6, 0x376 };
 static const int ide_irq[MAX_IDE_BUS] = { 14, 15 };
 
 static bool has_pvpanic = true;
+static bool has_pci_info = true;
 
 /* PC hardware initialisation */
 static void pc_init1(MemoryRegion *system_memory,
@@ -340,9 +341,19 @@ static void pc_xen_hvm_init(QEMUMachineInitArgs *args)
 }
 #endif
 
+static QEMUMachine pc_i440fx_machine_v1_6 = {
+.name = pc-i440fx-1.6,
+.alias = pc,
+.desc = Standard PC (i440FX + PIIX, 1996),
+.init = pc_init_pci,
+.hot_add_cpu = pc_hot_add_cpu,
+.max_cpus = 255,
+.is_default = 1,
+DEFAULT_MACHINE_OPTIONS,
+};
+
 static QEMUMachine pc_i440fx_machine_v1_5 = {
 .name = pc-i440fx-1.5,
-.alias = pc,
 .desc = Standard PC (i440FX + PIIX, 1996),
 .init = pc_init_pci,
 .hot_add_cpu = pc_hot_add_cpu,
-- 
MST

[Qemu-devel] [PATCH v2 3/5] pc: pass PCI hole ranges to Guests

2013-05-30 Thread Michael S. Tsirkin

Guest currently has to jump through lots of hoops to guess the PCI hole
ranges.  It's fragile, and makes us change BIOS each time we add a new
chipset.  Let's report the window in a ROM file, to make BIOS do exactly
what QEMU intends.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/i386/pc.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index c233161..64101cb 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -978,6 +978,26 @@ void pc_cpus_init(const char *cpu_model, DeviceState 
*icc_bridge)
 }
 }
 
+/* pci-info ROM file. Little endian format */
+typedef struct PcRomPciInfo {
+uint64_t w32_min;
+uint64_t w32_max;
+uint64_t w64_min;
+uint64_t w64_max;
+} PcRomPciInfo;
+
+static void pc_fw_cfg_guest_info(PcGuestInfo *guest_info)
+{
+PcRomPciInfo *info = g_malloc(sizeof *info);
+info-w32_min = cpu_to_le64(guest_info-pci_info.w32.begin);
+info-w32_max = cpu_to_le64(guest_info-pci_info.w32.end);
+info-w64_min = cpu_to_le64(guest_info-pci_info.w64.begin);
+info-w64_max = cpu_to_le64(guest_info-pci_info.w64.end);
+/* Pass PCI hole info to guest via a side channel.
+ * Required so guest PCI enumeration does the right thing. */
+fw_cfg_add_file(guest_info-fw_cfg, etc/pci-info, info, sizeof *info);
+}
+
 typedef struct PcGuestInfoState {
 PcGuestInfo info;
 Notifier machine_done;
@@ -989,6 +1009,7 @@ void pc_guest_info_machine_done(Notifier *notifier, void 
*data)
 PcGuestInfoState *guest_info_state = container_of(notifier,
   PcGuestInfoState,
   machine_done);
+pc_fw_cfg_guest_info(guest_info_state-info);
 }
 
 PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size,
-- 
MST

Re: [Qemu-devel] [PATCH v4 0/3] ARM aarch64 TCG target

2013-05-30 Thread Peter Maydell

On 29 May 2013 10:04, Claudio Fontana claudio.font...@huawei.com wrote:

 This series implements preliminary support for the ARM aarch64 TCG target.

Reviewed-by: Peter Maydell peter.mayd...@linaro.org

Cc'd some people who might like to commit the patchset.

thanks
-- PMM

Re: [Qemu-devel] [SeaBIOS] KVM call agenda for 2013-05-28

2013-05-30 Thread Laszlo Ersek

On 05/30/13 11:23, David Woodhouse wrote:
 On Wed, 2013-05-29 at 11:18 -0500, Anthony Liguori wrote:

 Certainly an option, but that is a long-term project.

 Out of curiousity, are there other benefits to using coreboot as a core
 firmware in QEMU?

 Is there a payload we would ever plausibly use besides OVMF and SeaBIOS?
 
 I like the idea of using Coreboot on the UEFI side — if the most
 actively used TianoCore platform is CorebootPkg instead of OvmfPkg, that
 makes it a lot easier for people using *real* hardware with Coreboot to
 use TianoCore.

Where is CorebootPkg available from?

 And it helps to dispel the stupid misconception in some quarters that
 Coreboot *competes* with UEFI and thus cannot possibly be supported
 because helping something that competes with UEFI would be bad.

I'm not sure who do you mean by some quarters, but for some
distributions Coreboot would be yet another component (package) to
support, for no obvious benefit.

(Gerd said it better than I possibly could:
http://thread.gmane.org/gmane.comp.bios.coreboot.seabios/5685/focus=5705.)

 
 Is there a payload we would ever plausibly use besides OVMF and
 SeaBIOS?
 
 For my part I want to get to the point where the default firmware
 shipped with qemu can be OVMF.

For some distributions this is a licensing question as well. See
FatBinPkg/License.txt. (The same applies if you rebuild it from source
(FatPkgDev), based on FatBinPkg/ReadMe.txt.)  For example Fedora can't
ship OVMF because of it.

If you implement a UEFI FAT driver based on Microsoft's official
specification, you're bound by the same restrictions on use and
redistribution.

If you implement a private UEFI FAT driver from scratch, or port a free
software FAT implementation (eg. the r/o one in grub or the r/w one in
mtools), you could still run into legal problems, I've been told.

If you rip out the FAT driver, then OVMF won't be UEFI compliant and no
installer media will boot on it.

Interestingly, Ubuntu has OVMF in Universe
http://packages.ubuntu.com/raring/ovmf. I think they missed the
FatBinPkg license (I would have missed it too, after all you have to
track down the licenses of every module included in the FDF file -- it
was Paolo who told me about it) and I believe they should actually ship
OVMF in Multiverse or Restricted
https://help.ubuntu.com/community/Repositories/Ubuntu.

 We have SeaBIOS-as-CSM working, which was
 one of the biggest barriers.

Agreed, and I could have never done that. Thanks for implementing it
with Kevin.

We need at least one out-of-tree edk2 patch for now (from you) but
apparently that's no problem.

 There are a few more things (like NV
 variable storage, in particular) that I need to fix before I can
 actually make that suggestion with a straight face though...

As far as I understand:
- Jordan is nearing completion of flash support on KVM,
- he also has WIP NvVar storage on top of qemu flash.

http://thread.gmane.org/gmane.comp.emulators.qemu/213690
http://thread.gmane.org/gmane.comp.bios.tianocore.devel/2781/focus=2798

(Please coordinate I guess :))

Laszlo

[Qemu-devel] [PATCH] gdbstub: do not restart crashed guest

2013-05-30 Thread Paolo Bonzini

If a guest has crashed with an internal error or similar, detaching
gdb (or any other debugger action) should not restart it.

Cc: Jan Kiszka jan.kis...@siemens.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 gdbstub.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gdbstub.c b/gdbstub.c
index e80e1d3..90e54cb 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -371,7 +371,9 @@ static inline void gdb_continue(GDBState *s)
 #ifdef CONFIG_USER_ONLY
 s-running_state = 1;
 #else
-vm_start();
+if (runstate_check(RUN_STATE_DEBUG)) {
+vm_start();
+}
 #endif
 }
 
-- 
1.8.1.4

[Qemu-devel] [PVSCSI]How to unplug scsi disk simulated by Qemu, just like unplug the ide disk?

2013-05-30 Thread Gonglei (Arei)

Hi all,
My environment is xen-4.1.2 + qemu-1.2.2
I made a pvscsi driver for Redhat guest, but I encountered a problem 
that I could see two scsi disks, one was simulated by QEMU, another was 
passthrough.
Actually I want to unplug the scsi disk simulated. Any methods can solve 
the problem on the qemu upstream. Thanks!

Re: [Qemu-devel] [PVSCSI]How to unplug scsi disk simulated by Qemu, just like unplug the ide disk?

2013-05-30 Thread Laszlo Ersek

On 05/30/13 13:23, Gonglei (Arei) wrote:
 Hi all,
 My environment is xen-4.1.2 + qemu-1.2.2
   I made a pvscsi driver for Redhat guest, but I encountered a problem 
 that I could see two scsi disks, one was simulated by QEMU, another was 
 passthrough.
 Actually I want to unplug the scsi disk simulated. Any methods can solve 
 the problem on the qemu upstream. Thanks!

Unless I'm misunderstanding you (which is more likely than not):
emulated devices (eg. NIC and IDE disks) are unplugged by the guest
kernel. See arch/x86/xen/platform-pci-unplug.c.

The qemu side seems to be in hw/xen/xen_platform.c.

Laszlo

Re: [Qemu-devel] [PATCH] gdbstub: do not restart crashed guest

2013-05-30 Thread Laszlo Ersek

On 05/30/13 13:20, Paolo Bonzini wrote:
 If a guest has crashed with an internal error or similar, detaching
 gdb (or any other debugger action) should not restart it.
 
 Cc: Jan Kiszka jan.kis...@siemens.com
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  gdbstub.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)
 
 diff --git a/gdbstub.c b/gdbstub.c
 index e80e1d3..90e54cb 100644
 --- a/gdbstub.c
 +++ b/gdbstub.c
 @@ -371,7 +371,9 @@ static inline void gdb_continue(GDBState *s)
  #ifdef CONFIG_USER_ONLY
  s-running_state = 1;
  #else
 -vm_start();
 +if (runstate_check(RUN_STATE_DEBUG)) {
 +vm_start();
 +}
  #endif
  }
  
 

I sought to check the gdb_continue() call sites, and uses of
RUN_STATE_DEBUG. Seems reasonable.

Reviewed-by: Laszlo Ersek ler...@redhat.com

(
FWIW I wonder why in commit ad02b96a Luiz allowed DEBUG - SUSPENDED. As
far as I understand, when the debugger is attached, the guest is not
running, hence it can't go directly to RUN_STATE_SUSPENDED. Maybe due to
a concurrent monitor command?

Technically it does seem possible; from main_loop_should_exit():

if (qemu_debug_requested()) {
vm_stop(RUN_STATE_DEBUG);
}
if (qemu_suspend_requested()) {
qemu_system_suspend();
}

Both requests could become pending during one iteration of the loop, and
the next iteration will see both of them. OK.
)

Laszlo

Re: [Qemu-devel] Unable to parse -device drivers containing commas?

2013-05-30 Thread Eric Blake

On 05/30/2013 04:00 AM, Mark Cave-Ayland wrote:
 Hi all,
 
 I found that the QEMU -device command line parser doesn't seem to like
 driver names containing a comma such as SUNW,tcx for the video driver
 on qemu-system-sparc:
 
 $ ./qemu-system-sparc -device SUNW,tcx,help

  Is there a way of escaping the
 commas on the command line so that it is possible to list properties for
 drivers named in this way?

Commas are escaped by doubling them.  Try ./qemu-system-sparc -device
SUNW,,tcx,help

That's the generic quoting we've used for escaping commas in all new
command line options, although there may be some older options that
still need to be taught to honor that escaping.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [Qemu-stable] [QEMU PATCH v3] qdev: fix get_fw_dev_path to support to add nothing to fw_dev_path

2013-05-30 Thread Stefan Priebe - Profihost AG

Am 29.05.2013 09:56, schrieb Amos Kong:
 Recent virtio refactoring in QEMU made virtio-bus become the parent bus
 of scsi-bus, and virtio-bus doesn't have get_fw_dev_path implementation,
 typename will be added to fw_dev_path by default, the new fw_dev_path
 could not be identified by seabios. It causes that bootindex parameter
 of scsi device doesn't work.
 
 This patch implements get_fw_dev_path() in BusClass, it will be called
 if bus doesn't implement the method, tyename will be added to
 fw_dev_path. If the implemented method returns NULL, nothing will be
 added to fw_dev_path.
 
 It also implements virtio_bus_get_fw_dev_path() to return NULL. Then
 QEMU will still pass original style of fw_dev_path to seabios.
 
 Signed-off-by: Amos Kong ak...@redhat.com
 --
 v2: only add nothing to fw_dev_path when get_fw_dev_path() is
 implemented and returns NULL. then it will not effect other devices
 don't have get_fw_dev_path() implementation.
 v3: implement default get_fw_dev_path() in BusClass
 ---
  hw/core/qdev.c | 10 +-
  hw/virtio/virtio-bus.c |  6 ++
  2 files changed, 15 insertions(+), 1 deletion(-)
 
 diff --git a/hw/core/qdev.c b/hw/core/qdev.c
 index 6985ad8..9190a7e 100644
 --- a/hw/core/qdev.c
 +++ b/hw/core/qdev.c
 @@ -515,7 +515,7 @@ static int qdev_get_fw_dev_path_helper(DeviceState *dev, 
 char *p, int size)
  l += snprintf(p + l, size - l, %s, d);
  g_free(d);
  } else {
 -l += snprintf(p + l, size - l, %s, 
 object_get_typename(OBJECT(dev)));
 +return l;
  }
  }
  l += snprintf(p + l , size - l, /);
 @@ -867,9 +867,17 @@ static void qbus_initfn(Object *obj)
  QTAILQ_INIT(bus-children);
  }
  
 +static char *default_bus_get_fw_dev_path(DeviceState *dev)
 +{
 +return g_strdup(object_get_typename(OBJECT(dev)));
 +}
 +
  static void bus_class_init(ObjectClass *class, void *data)
  {
 +BusClass *bc = BUS_CLASS(class);
 +
  class-unparent = bus_unparent;
 +bc-get_fw_dev_path = default_bus_get_fw_dev_path;
  }
  
  static void qbus_finalize(Object *obj)
 diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
 index ea2e11a..6849a01 100644
 --- a/hw/virtio/virtio-bus.c
 +++ b/hw/virtio/virtio-bus.c
 @@ -161,10 +161,16 @@ static char *virtio_bus_get_dev_path(DeviceState *dev)
  return qdev_get_dev_path(proxy);
  }
  
 +static char *virtio_bus_get_fw_dev_path(DeviceState *dev)
 +{
 +return NULL;
 +}
 +
  static void virtio_bus_class_init(ObjectClass *klass, void *data)
  {
  BusClass *bus_class = BUS_CLASS(klass);
  bus_class-get_dev_path = virtio_bus_get_dev_path;
 +bus_class-get_fw_dev_path = virtio_bus_get_fw_dev_path;
  }
  
  static const TypeInfo virtio_bus_info = {
 

To me booting VMs with more than one SCSI disk does still not work.

net: bootindex=100
scsi0: bootindex=201

does not work

this one works fine:
net: bootindex=200
scsi0: bootindex=101


Stefan

Re: [Qemu-devel] [RFC] Check backing_file chain's loop

2013-05-30 Thread Eric Blake

On 05/30/2013 04:14 AM, Dongxu Wang wrote:
 Hi,
 
 Now block layer does not check whether backing_file chain can be a
 circle, do you think it is necessary to do a check?
 
 For example,  1.qcow2's backing_file is 2.qcow2 and 2.qcow2's
 backing_file is 1.qcow2, then any IO operation will be a fault.
 
 If necessary, I will post a patch to fix.

Yes, it's worth patching.

 
 [$] qemu-img create -f qcow2 1.qcow2
 qemu-img: Image creation needs a size parameter
 [$] qemu-img create -f qcow2 1.qcow2 8M
 Formatting '1.qcow2', fmt=qcow2 size=8388608 encryption=off
 cluster_size=65536 lazy_refcounts=off
 [$] qemu-img create -f qcow2 2.qcow2 -o backing_file=1.qcow2
 Formatting '2.qcow2', fmt=qcow2 size=8388608 backing_file='1.qcow2'
 encryption=off cluster_size=65536 lazy_refcounts=off
 [$] qemu-img create -f qcow2 1.qcow2 -o backing_file=2.qcow2
 Formatting '1.qcow2', fmt=qcow2 size=8388608 backing_file='2.qcow2'
 encryption=off cluster_size=65536 lazy_refcounts=off
 [$] qemu-io -c read -v 0 512 1.qcow2
 ^CSegmentation fault (core dumped)

Also make sure that 'qemu-img info 1.qcow2' and 'qemu-img info
--backing-chain 1.qcow2' don't crash.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v2 2/5] pci: store PCI hole ranges in guestinfo structure

2013-05-30 Thread Gerd Hoffmann

  Hi,

 +} else {
 +guest_info-pci_info.w64.begin = 0x1ULL + above_4g_mem_size;
 +guest_info-pci_info.w64.end =  guest_info-pci_info.w64.begin +
 +(0x1ULL  62);

Doesn't this give unaligned windows?

 +/* Set PCI window size the way seabios has always done it. */
 +/* TODO: consider just starting at below_4g_mem_size */

Used to be that way.  Was changed for alignment reasons (i.e. 1G window
starts at 1G border etc).

cheers,
  Gerd

Re: [Qemu-devel] [PATCH v2 2/5] pci: store PCI hole ranges in guestinfo structure

2013-05-30 Thread Michael S. Tsirkin

On Thu, May 30, 2013 at 02:16:13PM +0200, Gerd Hoffmann wrote:
   Hi,
 
  +} else {
  +guest_info-pci_info.w64.begin = 0x1ULL + 
  above_4g_mem_size;
  +guest_info-pci_info.w64.end =  guest_info-pci_info.w64.begin +
  +(0x1ULL  62);
 
 Doesn't this give unaligned windows?

PCI Bridge windows do not need to be size aligned.

In any case, the windows are *exactly* as calculated
by seabios - apparently it does not size-align windows either.

  +/* Set PCI window size the way seabios has always done it. */
  +/* TODO: consider just starting at below_4g_mem_size */
 
 Used to be that way.  Was changed for alignment reasons (i.e. 1G window
 starts at 1G border etc).

Where's the alignment requirement coming from?

 
 cheers,
   Gerd

Re: [Qemu-devel] [SeaBIOS] KVM call agenda for 2013-05-28

2013-05-30 Thread David Woodhouse

On Thu, 2013-05-30 at 13:13 +0200, Laszlo Ersek wrote:
 Where is CorebootPkg available from?

https://github.com/pgeorgi/edk2/tree/coreboot-pkg

  And it helps to dispel the stupid misconception in some quarters that
  Coreboot *competes* with UEFI and thus cannot possibly be supported
  because helping something that competes with UEFI would be bad.
 
 I'm not sure who do you mean by some quarters, but for some
 distributions Coreboot would be yet another component (package) to
 support, for no obvious benefit.
 
 (Gerd said it better than I possibly could:
 http://thread.gmane.org/gmane.comp.bios.coreboot.seabios/5685/focus=5705.)

Yeah, but if we're shoving a lot of hardware-specific ACPI table
generation into the guest's firmware, instead of just doing it on the
qemu side where a number of us seem to think it belongs, then there *is*
a benefit to using Coreboot. When stuff changes on the qemu side and we
have to update the table generation to match, you end up having to
update just the Coreboot package, and *not* having to patch both SeaBIOS
and OVMF.

The extra package in the distro really isn't painful to handle, and I
suspect it would end up *reducing* the amount of work that we have to do
to update. You update *just* Coreboot, not *both* of SeaBIOS and OVMF.

 If you implement a private UEFI FAT driver from scratch, or port a free
 software FAT implementation (eg. the r/o one in grub or the r/w one in
 mtools), you could still run into legal problems, I've been told.

That has been said, and it's been said for the FAT implementation in the
kernel too. If a distribution is happy to ship the kernel without
ripping out its FAT support, then I see no reason why that distribution
wouldn't also be happy to ship a version of OVMF with a clean
implementation of FAT support. It doesn't make sense to be happy with
one but not the other.

 We need at least one out-of-tree edk2 patch for now (from you) but
 apparently that's no problem.

That'll get merged soon. We are working on the corresponding spec
update...

 As far as I understand:
 - Jordan is nearing completion of flash support on KVM,
 - he also has WIP NvVar storage on top of qemu flash.
 
 http://thread.gmane.org/gmane.comp.emulators.qemu/213690
 http://thread.gmane.org/gmane.comp.bios.tianocore.devel/2781/focus=2798
 
 (Please coordinate I guess :))

Ooh, shiny. Yeah, when I get back to actually working on it rather than
just heckling, I'll make sure I do :)

-- 
dwmw2


smime.p7s
Description: S/MIME cryptographic signature

Re: [Qemu-devel] [PATCH v2 2/5] pci: store PCI hole ranges in guestinfo structure

2013-05-30 Thread Laszlo Ersek

I can't offer any opinion about the values you put into w32 and w64, but I have 
some remarks.

First, please consider passing -O/path/to/some/order_file to git-format-patch, 
so that .h files show up at the top of each patch.

On 05/30/13 13:07, Michael S. Tsirkin wrote:
 Will be used to pass hole ranges to guests.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  hw/i386/pc.c  | 39 ++-
  hw/i386/pc_piix.c | 14 +-
  hw/i386/pc_q35.c  |  6 +-
  hw/pci-host/q35.c |  4 
  include/hw/i386/pc.h  | 19 ++-
  include/hw/pci-host/q35.h |  2 ++
  include/qemu/typedefs.h   |  1 +
  7 files changed, 81 insertions(+), 4 deletions(-)
 
 diff --git a/hw/i386/pc.c b/hw/i386/pc.c
 index 4844a6b..c233161 100644
 --- a/hw/i386/pc.c
 +++ b/hw/i386/pc.c
 @@ -978,6 +978,41 @@ void pc_cpus_init(const char *cpu_model, DeviceState 
 *icc_bridge)
  }
  }
  
 +typedef struct PcGuestInfoState {
 +PcGuestInfo info;
 +Notifier machine_done;
 +} PcGuestInfoState;
 +
 +static
 +void pc_guest_info_machine_done(Notifier *notifier, void *data)
 +{
 +PcGuestInfoState *guest_info_state = container_of(notifier,
 +  PcGuestInfoState,
 +  machine_done);
 +}
 +
 +PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size,
 +ram_addr_t above_4g_mem_size)
 +{
 +PcGuestInfoState *guest_info_state = g_malloc0(sizeof *guest_info_state);
 +PcGuestInfo *guest_info = guest_info_state-info;
 +
 +guest_info-pci_info.w32.end = IO_APIC_DEFAULT_ADDRESS;
 +if (sizeof(hwaddr) == 4) {
 +guest_info-pci_info.w64.begin = 0;
 +guest_info-pci_info.w64.end = 0;
 +} else {
 +guest_info-pci_info.w64.begin = 0x1ULL + above_4g_mem_size;
 +guest_info-pci_info.w64.end =  guest_info-pci_info.w64.begin +
 +(0x1ULL  62);
 +assert(guest_info-pci_info.w64.begin = 
 guest_info-pci_info.w64.end);
 +}
 +
 +guest_info_state-machine_done.notify = pc_guest_info_machine_done;
 +qemu_add_machine_init_done_notifier(guest_info_state-machine_done);
 +return guest_info;
 +}
 +
  void pc_acpi_init(const char *default_dsdt)
  {
  char *filename;
 @@ -1019,7 +1054,8 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory,
 ram_addr_t below_4g_mem_size,
 ram_addr_t above_4g_mem_size,
 MemoryRegion *rom_memory,
 -   MemoryRegion **ram_memory)
 +   MemoryRegion **ram_memory,
 +   PcGuestInfo *guest_info)
  {
  int linux_boot, i;
  MemoryRegion *ram, *option_rom_mr;
 @@ -1071,6 +1107,7 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory,
  for (i = 0; i  nb_option_roms; i++) {
  rom_add_option(option_rom[i].name, option_rom[i].bootindex);
  }
 +guest_info-fw_cfg = fw_cfg;
  return fw_cfg;
  }

I find this suboptimal style, ie. passing guest_info to pc_memory_init() just 
so that pc_memory_init() can set guest_info-fw_cfg to fw_cfg, when 
pc_memory_init() returns fw_cfg anyway.

More philosophically:

 diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
 index b4c8a74..1bf5219 100644
 --- a/include/hw/i386/pc.h
 +++ b/include/hw/i386/pc.h
 @@ -9,8 +9,20 @@
  #include net/net.h
  #include hw/i386/ioapic.h
  
 +#include qemu/range.h
 +
  /* PC-style peripherals (also used by other machines).  */
  
 +typedef struct PcPciInfo {
 +Range w32;
 +Range w64;
 +} PcPciInfo;
 +
 +struct PcGuestInfo {
 +PcPciInfo pci_info;
 +FWCfgState *fw_cfg;
 +};

Are you sure you need a private link to the global fw_cfg object? The pvpanic 
series added a global lookup possibility in commit 10a584b2 
(object_resolve_path()).

Anyway these are just subjective style notes.

 +
  /* parallel.c */
  static inline bool parallel_init(ISABus *bus, int index, CharDriverState 
 *chr)
  {
 @@ -80,6 +92,10 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int 
 level);
  void pc_cpus_init(const char *cpu_model, DeviceState *icc_bridge);
  void pc_hot_add_cpu(const int64_t id, Error **errp);
  void pc_acpi_init(const char *default_dsdt);
 +
 +PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size,
 +ram_addr_t above_4g_mem_size);
 +
  FWCfgState *pc_memory_init(MemoryRegion *system_memory,
 const char *kernel_filename,
 const char *kernel_cmdline,
 @@ -87,7 +103,8 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory,
 ram_addr_t below_4g_mem_size,
 ram_addr_t above_4g_mem_size,
 MemoryRegion *rom_memory,
 -   MemoryRegion **ram_memory);
 +

Re: [Qemu-devel] [SeaBIOS] KVM call agenda for 2013-05-28

2013-05-30 Thread Michael S. Tsirkin

On Thu, May 30, 2013 at 01:19:18PM +0100, David Woodhouse wrote:
 Yeah, but if we're shoving a lot of hardware-specific ACPI table
 generation into the guest's firmware, instead of just doing it on the
 qemu side where a number of us seem to think it belongs,

Hopefully this is not yet set in stone.

 then there *is*
 a benefit to using Coreboot. When stuff changes on the qemu side and we
 have to update the table generation to match, you end up having to
 update just the Coreboot package, and *not* having to patch both SeaBIOS
 and OVMF.

We have all kind of logic in qemu. Some of it can thinkably
be moved to a separate VM - it doesn't even need to
run in the same VM as the guest - we could do it e.g. like
kvm unit-test does, with less pain than running it in firmware.
Not clear why would generating ACPI tables - which merely
fills up an array of bytes from internal QEMU
datastructures - should be the part where we start
this modularization.

-- 
MST

Re: [Qemu-devel] [PATCH v2 1/3] block: add basic backup support to block driver

2013-05-30 Thread Stefan Hajnoczi

On Thu, May 30, 2013 at 5:37 AM, Fam Zheng f...@redhat.com wrote:
 On Mon, 04/29 09:42, Stefan Hajnoczi wrote:
 +
 +static void coroutine_fn backup_run(void *opaque)
 +{
 +BackupBlockJob *job = opaque;
 +BlockDriverState *bs = job-common.bs;
 +assert(bs);
 +
 +int64_t start, end;
 +
 +start = 0;
 +end = DIV_ROUND_UP(bdrv_getlength(bs) / BDRV_SECTOR_SIZE,
 +   BACKUP_BLOCKS_PER_CLUSTER);
 +
 +job-bitmap = hbitmap_alloc(end, 0);
 +
 +DPRINTF(backup_run start %s % PRId64  % PRId64 \n,
 +bdrv_get_device_name(bs), start, end);
 +
 +int ret = 0;
 +
 +for (; start  end; start++) {
 +if (block_job_is_cancelled(job-common)) {
 +ret = -1;
 +break;
 +}
 +
 +/* we need to yield so that qemu_aio_flush() returns.
 + * (without, VM does not reboot)
 + */
 +if (job-common.speed) {
 +uint64_t delay_ns = ratelimit_calculate_delay(
 +job-limit, job-sectors_read);
 +job-sectors_read = 0;
 +block_job_sleep_ns(job-common, rt_clock, delay_ns);
 +} else {
 +block_job_sleep_ns(job-common, rt_clock, 0);
 +}
 +
 +if (block_job_is_cancelled(job-common)) {
 +ret = -1;
 +break;
 +}
 +
 +if (hbitmap_get(job-bitmap, start)) {
 +continue; /* already copied */
 +}
 +
 +DPRINTF(backup_run loop C% PRId64 \n, start);
 +
 +/**
 + * This triggers a cluster copy
 + * Note: avoid direct call to brdv_co_backup_cow, because
 + * this does not call tracked_request_begin()
 + */
 +ret = bdrv_co_backup(bs, start*BACKUP_BLOCKS_PER_CLUSTER, 1);
 +if (ret  0) {
 +break;
 +}
 +/* Publish progress */
 +job-common.offset += BACKUP_CLUSTER_SIZE;
 +}
 +
 +/* wait until pending backup_do_cow()calls have completed */
 +qemu_co_rwlock_wrlock(job-rwlock);
 +qemu_co_rwlock_unlock(job-rwlock);
 +
 +hbitmap_free(job-bitmap);
 +
 +bdrv_delete(job-target);
 +
 +DPRINTF(backup_run complete %d\n, ret);
 +block_job_completed(job-common, ret);
 +}

 What will ret value be when the source block is not aligned to cluster
 size?

Good catch, will fix.

Stefan

Re: [Qemu-devel] [PATCH v2 2/5] pci: store PCI hole ranges in guestinfo structure

2013-05-30 Thread Gerd Hoffmann

On 05/30/13 14:19, Michael S. Tsirkin wrote:
 On Thu, May 30, 2013 at 02:16:13PM +0200, Gerd Hoffmann wrote:
   Hi,

 +} else {
 +guest_info-pci_info.w64.begin = 0x1ULL + 
 above_4g_mem_size;
 +guest_info-pci_info.w64.end =  guest_info-pci_info.w64.begin +
 +(0x1ULL  62);

 Doesn't this give unaligned windows?
 
 PCI Bridge windows do not need to be size aligned.
 
 In any case, the windows are *exactly* as calculated
 by seabios - apparently it does not size-align windows either.

Surely not.  SeaBIOS sizes the 64bit window according to the space
needed by the 64bit bars it wants to map there.

 +/* Set PCI window size the way seabios has always done it. */
 +/* TODO: consider just starting at below_4g_mem_size */

 Used to be that way.  Was changed for alignment reasons (i.e. 1G window
 starts at 1G border etc).
 
 Where's the alignment requirement coming from?

seabios creates a mtrr entry for the window, which doesn't work in case
it isn't aligned (at least not with a single entry).

Also real hardware tends to do it this way.

cheers,
  Gerd

[Qemu-devel] [PATCH v5 10/11] blockdev: add Abort transaction

2013-05-30 Thread Stefan Hajnoczi

The Abort action can be used to test QMP 'transaction' failure.  Add it
as the last action to exercise the .abort() and .cleanup() code paths
for all previous actions.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 blockdev.c   | 15 +++
 qapi-schema.json | 13 -
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index 18a2012..e3393d2 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -963,6 +963,16 @@ static void drive_backup_abort(BlkTransactionState *common)
 }
 }
 
+static void abort_prepare(BlkTransactionState *common, Error **errp)
+{
+error_setg(errp, Transaction aborted using Abort action);
+}
+
+static void abort_commit(BlkTransactionState *common)
+{
+assert(false); /* this action never succeeds */
+}
+
 static const BdrvActionOps actions[] = {
 [TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC] = {
 .instance_size = sizeof(ExternalSnapshotState),
@@ -975,6 +985,11 @@ static const BdrvActionOps actions[] = {
 .prepare = drive_backup_prepare,
 .abort = drive_backup_abort,
 },
+[TRANSACTION_ACTION_KIND_ABORT] = {
+.instance_size = sizeof(BlkTransactionState),
+.prepare = abort_prepare,
+.commit = abort_commit,
+},
 };
 
 /*
diff --git a/qapi-schema.json b/qapi-schema.json
index 6bf96aa..0da1b98 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1646,6 +1646,16 @@
 '*on-target-error': 'BlockdevOnError' } }
 
 ##
+# @Abort
+#
+# This action can be used to test transaction failure.
+#
+# Since: 1.6
+###
+{ 'type': 'Abort',
+  'data': { } }
+
+##
 # @TransactionAction
 #
 # A discriminated record of operations that can be performed with
@@ -1654,7 +1664,8 @@
 { 'union': 'TransactionAction',
   'data': {
'blockdev-snapshot-sync': 'BlockdevSnapshot',
-   'drive-backup': 'DriveBackup'
+   'drive-backup': 'DriveBackup',
+   'abort': 'Abort'
} }
 
 ##
-- 
1.8.1.4

[Qemu-devel] [PATCH v5 09/11] blockdev: add DriveBackup transaction

2013-05-30 Thread Stefan Hajnoczi

This patch adds a transactional version of the drive-backup QMP command.
It allows atomic snapshots of multiple drives along with automatic
cleanup if there is a failure to start one of the backup jobs.

Note that QMP events are emitted for block job completion/cancellation
and the block job will be listed by query-block-jobs.

@device: the name of the device whose writes should be mirrored.

@target: the target of the new image. If the file exists, or if it
 is a device, the existing file/device will be used as the new
 destination.  If it does not exist, a new file will be created.

@format: #optional the format of the new destination, default is to
 probe if @mode is 'existing', else the format of the source

@mode: #optional whether and how QEMU should create a new image, default is
   'absolute-paths'.

@speed: #optional the maximum speed, in bytes per second

@on-source-error: #optional the action to take on an error on the source,
  default 'report'.  'stop' and 'enospc' can only be used
  if the block device supports io-status (see BlockInfo).

@on-target-error: #optional the action to take on an error on the target,
  default 'report' (no limitations, since this applies to
  a different block device than @device).

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 blockdev.c   | 49 +
 qapi-schema.json | 40 +++-
 2 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index 0e649df..18a2012 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -919,6 +919,50 @@ static void external_snapshot_abort(BlkTransactionState 
*common)
 }
 }
 
+typedef struct DriveBackupState {
+BlkTransactionState common;
+BlockDriverState *bs;
+BlockJob *job;
+} DriveBackupState;
+
+static void drive_backup_prepare(BlkTransactionState *common, Error **errp)
+{
+DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
+DriveBackup *backup;
+Error *local_err = NULL;
+
+assert(common-action-kind == TRANSACTION_ACTION_KIND_DRIVE_BACKUP);
+backup = common-action-drive_backup;
+
+qmp_drive_backup(backup-device, backup-target,
+ backup-has_format, backup-format,
+ backup-has_mode, backup-mode,
+ backup-has_speed, backup-speed,
+ backup-has_on_source_error, backup-on_source_error,
+ backup-has_on_target_error, backup-on_target_error,
+ local_err);
+if (error_is_set(local_err)) {
+error_propagate(errp, local_err);
+state-bs = NULL;
+state-job = NULL;
+return;
+}
+
+state-bs = bdrv_find(backup-device);
+state-job = state-bs-job;
+}
+
+static void drive_backup_abort(BlkTransactionState *common)
+{
+DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
+BlockDriverState *bs = state-bs;
+
+/* Only cancel if it's the job we started */
+if (bs  bs-job  bs-job == state-job) {
+block_job_cancel_sync(bs-job);
+}
+}
+
 static const BdrvActionOps actions[] = {
 [TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC] = {
 .instance_size = sizeof(ExternalSnapshotState),
@@ -926,6 +970,11 @@ static const BdrvActionOps actions[] = {
 .commit   = external_snapshot_commit,
 .abort = external_snapshot_abort,
 },
+[TRANSACTION_ACTION_KIND_DRIVE_BACKUP] = {
+.instance_size = sizeof(DriveBackupState),
+.prepare = drive_backup_prepare,
+.abort = drive_backup_abort,
+},
 };
 
 /*
diff --git a/qapi-schema.json b/qapi-schema.json
index 55b1a37..6bf96aa 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1609,6 +1609,43 @@
 '*mode': 'NewImageMode' } }
 
 ##
+# @DriveBackup
+#
+# @device: the name of the device which should be copied.
+#
+# @target: the target of the new image. If the file exists, or if it
+#  is a device, the existing file/device will be used as the new
+#  destination.  If it does not exist, a new file will be created.
+#
+# @format: #optional the format of the new destination, default is to
+#  probe if @mode is 'existing', else the format of the source
+#
+# @mode: #optional whether and how QEMU should create a new image, default is
+#'absolute-paths'.
+#
+# @speed: #optional the maximum speed, in bytes per second
+#
+# @on-source-error: #optional the action to take on an error on the source,
+#   default 'report'.  'stop' and 'enospc' can only be used
+#   if the block device supports io-status (see BlockInfo).
+#
+# @on-target-error: #optional the action to take on an error on the target,
+#   default 'report' (no limitations, since this applies to
+#   a different block device than @device).
+#
+# Note

[Qemu-devel] [PATCH v5 08/11] blockdev: allow BdrvActionOps-commit() to be NULL

2013-05-30 Thread Stefan Hajnoczi

Some QMP 'transaction' types don't need to do anything on .commit().
Make .commit() optional just like .abort().

The drive-backup action will take advantage of this, it only needs to
cancel the block job on .abort().  Other block job actions will probably
follow the same pattern, so allow .commit() to be NULL.

Suggested-by: Eric Blake ebl...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 blockdev.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 94ad829..0e649df 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -789,7 +789,7 @@ typedef struct BdrvActionOps {
 size_t instance_size;
 /* Prepare the work, must NOT be NULL. */
 void (*prepare)(BlkTransactionState *common, Error **errp);
-/* Commit the changes, must NOT be NULL. */
+/* Commit the changes, can be NULL. */
 void (*commit)(BlkTransactionState *common);
 /* Abort the changes on fail, can be NULL. */
 void (*abort)(BlkTransactionState *common);
@@ -969,7 +969,9 @@ void qmp_transaction(TransactionActionList *dev_list, Error 
**errp)
 }
 
 QSIMPLEQ_FOREACH(state, snap_bdrv_states, entry) {
-state-ops-commit(state);
+if (state-ops-commit) {
+state-ops-commit(state);
+}
 }
 
 /* success */
-- 
1.8.1.4

[Qemu-devel] [PATCH v5 04/11] blockdev: drop redundant proto_drv check

2013-05-30 Thread Stefan Hajnoczi

It is not necessary to check that we can find a protocol block driver
since we create or open the image file.  This produces the error that we
need anyway.

Besides, the QERR_INVALID_BLOCK_FORMAT is inappropriate since the
protocol is incorrect rather than the format.

Also drop an empty line between bdrv_open() and checking its return
value.  This may be due to copy-pasting from earlier code that performed
other operations before handling errors.

Reported-by: Kevin Wolf kw...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 blockdev.c | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index b9b2d10..01db519 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -818,7 +818,6 @@ typedef struct ExternalSnapshotStates {
 static void external_snapshot_prepare(BlkTransactionStates *common,
   Error **errp)
 {
-BlockDriver *proto_drv;
 BlockDriver *drv;
 int flags, ret;
 Error *local_err = NULL;
@@ -874,12 +873,6 @@ static void external_snapshot_prepare(BlkTransactionStates 
*common,
 
 flags = states-old_bs-open_flags;
 
-proto_drv = bdrv_find_protocol(new_image_file);
-if (!proto_drv) {
-error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
-return;
-}
-
 /* create new image w/backing file */
 if (mode != NEW_IMAGE_MODE_EXISTING) {
 bdrv_img_create(new_image_file, format,
@@ -1368,7 +1361,6 @@ void qmp_drive_mirror(const char *device, const char 
*target,
 {
 BlockDriverState *bs;
 BlockDriverState *source, *target_bs;
-BlockDriver *proto_drv;
 BlockDriver *drv = NULL;
 Error *local_err = NULL;
 int flags;
@@ -1436,12 +1428,6 @@ void qmp_drive_mirror(const char *device, const char 
*target,
 sync = MIRROR_SYNC_MODE_FULL;
 }
 
-proto_drv = bdrv_find_protocol(target);
-if (!proto_drv) {
-error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
-return;
-}
-
 bdrv_get_geometry(bs, size);
 size *= 512;
 if (sync == MIRROR_SYNC_MODE_FULL  mode != NEW_IMAGE_MODE_EXISTING) {
@@ -1476,7 +1462,6 @@ void qmp_drive_mirror(const char *device, const char 
*target,
  */
 target_bs = bdrv_new();
 ret = bdrv_open(target_bs, target, NULL, flags | BDRV_O_NO_BACKING, drv);
-
 if (ret  0) {
 bdrv_delete(target_bs);
 error_set(errp, QERR_OPEN_FILE_FAILED, target);
-- 
1.8.1.4

[Qemu-devel] [PATCH v5 07/11] blockdev: rename BlkTransactionStates to singular

2013-05-30 Thread Stefan Hajnoczi

The QMP 'transaction' command keeps a list of in-flight transactions.
The transaction state structure is called BlkTransactionStates even
though it only deals with a single transaction.  The only plural thing
is the linked list of transaction states.

I find it confusing to call the single structure States.  This patch
renames it to State, just like BlockDriverState is singular.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 blockdev.c | 104 ++---
 1 file changed, 52 insertions(+), 52 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 5de33f5..94ad829 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -780,7 +780,7 @@ void qmp_blockdev_snapshot_sync(const char *device, const 
char *snapshot_file,
 
 /* New and old BlockDriverState structs for group snapshots */
 
-typedef struct BlkTransactionStates BlkTransactionStates;
+typedef struct BlkTransactionState BlkTransactionState;
 
 /* Only prepare() may fail. In a single transaction, only one of commit() or
abort() will be called, clean() will always be called if it present. */
@@ -788,13 +788,13 @@ typedef struct BdrvActionOps {
 /* Size of state struct, in bytes. */
 size_t instance_size;
 /* Prepare the work, must NOT be NULL. */
-void (*prepare)(BlkTransactionStates *common, Error **errp);
+void (*prepare)(BlkTransactionState *common, Error **errp);
 /* Commit the changes, must NOT be NULL. */
-void (*commit)(BlkTransactionStates *common);
+void (*commit)(BlkTransactionState *common);
 /* Abort the changes on fail, can be NULL. */
-void (*abort)(BlkTransactionStates *common);
+void (*abort)(BlkTransactionState *common);
 /* Clean up resource in the end, can be NULL. */
-void (*clean)(BlkTransactionStates *common);
+void (*clean)(BlkTransactionState *common);
 } BdrvActionOps;
 
 /*
@@ -802,20 +802,20 @@ typedef struct BdrvActionOps {
  * that compiler will also arrange it to the same address with parent instance.
  * Later it will be used in free().
  */
-struct BlkTransactionStates {
+struct BlkTransactionState {
 TransactionAction *action;
 const BdrvActionOps *ops;
-QSIMPLEQ_ENTRY(BlkTransactionStates) entry;
+QSIMPLEQ_ENTRY(BlkTransactionState) entry;
 };
 
 /* external snapshot private data */
-typedef struct ExternalSnapshotStates {
-BlkTransactionStates common;
+typedef struct ExternalSnapshotState {
+BlkTransactionState common;
 BlockDriverState *old_bs;
 BlockDriverState *new_bs;
-} ExternalSnapshotStates;
+} ExternalSnapshotState;
 
-static void external_snapshot_prepare(BlkTransactionStates *common,
+static void external_snapshot_prepare(BlkTransactionState *common,
   Error **errp)
 {
 BlockDriver *drv;
@@ -825,8 +825,8 @@ static void external_snapshot_prepare(BlkTransactionStates 
*common,
 const char *new_image_file;
 const char *format = qcow2;
 enum NewImageMode mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
-ExternalSnapshotStates *states =
- DO_UPCAST(ExternalSnapshotStates, common, common);
+ExternalSnapshotState *state =
+ DO_UPCAST(ExternalSnapshotState, common, common);
 TransactionAction *action = common-action;
 
 /* get parameters */
@@ -848,36 +848,36 @@ static void 
external_snapshot_prepare(BlkTransactionStates *common,
 return;
 }
 
-states-old_bs = bdrv_find(device);
-if (!states-old_bs) {
+state-old_bs = bdrv_find(device);
+if (!state-old_bs) {
 error_set(errp, QERR_DEVICE_NOT_FOUND, device);
 return;
 }
 
-if (!bdrv_is_inserted(states-old_bs)) {
+if (!bdrv_is_inserted(state-old_bs)) {
 error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
 return;
 }
 
-if (bdrv_in_use(states-old_bs)) {
+if (bdrv_in_use(state-old_bs)) {
 error_set(errp, QERR_DEVICE_IN_USE, device);
 return;
 }
 
-if (!bdrv_is_read_only(states-old_bs)) {
-if (bdrv_flush(states-old_bs)) {
+if (!bdrv_is_read_only(state-old_bs)) {
+if (bdrv_flush(state-old_bs)) {
 error_set(errp, QERR_IO_ERROR);
 return;
 }
 }
 
-flags = states-old_bs-open_flags;
+flags = state-old_bs-open_flags;
 
 /* create new image w/backing file */
 if (mode != NEW_IMAGE_MODE_EXISTING) {
 bdrv_img_create(new_image_file, format,
-states-old_bs-filename,
-states-old_bs-drv-format_name,
+state-old_bs-filename,
+state-old_bs-drv-format_name,
 NULL, -1, flags, local_err, false);
 if (error_is_set(local_err)) {
 error_propagate(errp, local_err);
@@ -886,42 +886,42 @@ static void 
external_snapshot_prepare(BlkTransactionStates *common,
 }
 
 /* We will manually add the backing_hd field to the bs

Re: [Qemu-devel] [SeaBIOS] KVM call agenda for 2013-05-28

2013-05-30 Thread Laszlo Ersek

On 05/30/13 14:19, David Woodhouse wrote:

 Yeah, but if we're shoving a lot of hardware-specific ACPI table
 generation into the guest's firmware, instead of just doing it on the
 qemu side where a number of us seem to think it belongs, then there *is*
 a benefit to using Coreboot. When stuff changes on the qemu side and we
 have to update the table generation to match, you end up having to
 update just the Coreboot package, and *not* having to patch both SeaBIOS
 and OVMF.
 
 The extra package in the distro really isn't painful to handle, and I
 suspect it would end up *reducing* the amount of work that we have to do
 to update. You update *just* Coreboot, not *both* of SeaBIOS and OVMF.

I can't deny there's logic in that, but it still feels like tying
ourselves up in some intricate bondage choreography. We'd like to move
ACPI tables out of firmware, but we can't move them to qemu due to
project direction disagreement, so let's adopt a middleman. (I'm not
trying to denigrate coreboot -- I don't know it at all --, but
introducing it in a (granted, distro-specific) stack just for this
purpose seems quite arbitrary.)

 If you implement a private UEFI FAT driver from scratch, or port a free
 software FAT implementation (eg. the r/o one in grub or the r/w one in
 mtools), you could still run into legal problems, I've been told.
 
 That has been said, and it's been said for the FAT implementation in the
 kernel too. If a distribution is happy to ship the kernel without
 ripping out its FAT support, then I see no reason why that distribution
 wouldn't also be happy to ship a version of OVMF with a clean
 implementation of FAT support. It doesn't make sense to be happy with
 one but not the other.

Under my *personal* impression, logic and Common Law don't really mix,
especially not in the US. Still under my *personal* impression, someone
might not feel convenient suing you for redistributing code that already
exists in the upstream Linux kernel, but might happily drag you to court
for an original clean implementation, and then you can explain it's
illogical for them to do so.

The best I can do with your suggestion is to take it to our legal dept.
I would be happy to work on a new FAT driver. (I used to feel
differently earlier but I've changed my mind.)


 We need at least one out-of-tree edk2 patch for now (from you) but
 apparently that's no problem.
 
 That'll get merged soon. We are working on the corresponding spec
 update...

Great news!

Thanks,
Laszlo

[Qemu-devel] [PATCH v5 05/11] blockdev: use bdrv_getlength() in qmp_drive_mirror()

2013-05-30 Thread Stefan Hajnoczi

Use bdrv_getlength() for its byte units and error return instead of
bdrv_get_geometry().

Reported-by: Kevin Wolf kw...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 blockdev.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 01db519..73b175b 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1364,7 +1364,7 @@ void qmp_drive_mirror(const char *device, const char 
*target,
 BlockDriver *drv = NULL;
 Error *local_err = NULL;
 int flags;
-uint64_t size;
+int64_t size;
 int ret;
 
 if (!has_speed) {
@@ -1428,8 +1428,12 @@ void qmp_drive_mirror(const char *device, const char 
*target,
 sync = MIRROR_SYNC_MODE_FULL;
 }
 
-bdrv_get_geometry(bs, size);
-size *= 512;
+size = bdrv_getlength(bs);
+if (size  0) {
+error_setg_errno(errp, -size, bdrv_getlength failed);
+return;
+}
+
 if (sync == MIRROR_SYNC_MODE_FULL  mode != NEW_IMAGE_MODE_EXISTING) {
 /* create new image w/o backing file */
 assert(format  drv);
-- 
1.8.1.4

[Qemu-devel] [PATCH v5 02/11] block: add bdrv_add_before_write_notifier()

2013-05-30 Thread Stefan Hajnoczi

The bdrv_add_before_write_notifier() function installs a callback that
is invoked before a write request is processed.  This will be used to
implement copy-on-write point-in-time snapshots where we need to copy
out old data before overwriting it.

Note that BdrvTrackedRequest is moved to block_int.h since it is passed
to .notify() functions.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block.c   | 23 ---
 include/block/block_int.h | 23 ++-
 2 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index 65c0b60..9e43f20 100644
--- a/block.c
+++ b/block.c
@@ -308,6 +308,7 @@ BlockDriverState *bdrv_new(const char *device_name)
 }
 bdrv_iostatus_disable(bs);
 notifier_list_init(bs-close_notifiers);
+notifier_with_return_list_init(bs-before_write_notifiers);
 
 return bs;
 }
@@ -1850,16 +1851,6 @@ int bdrv_commit_all(void)
 return 0;
 }
 
-struct BdrvTrackedRequest {
-BlockDriverState *bs;
-int64_t sector_num;
-int nb_sectors;
-bool is_write;
-QLIST_ENTRY(BdrvTrackedRequest) list;
-Coroutine *co; /* owner, used for deadlock detection */
-CoQueue wait_queue; /* coroutines blocked on this request */
-};
-
 /**
  * Remove an active request from the tracked requests list
  *
@@ -2630,7 +2621,11 @@ static int coroutine_fn 
bdrv_co_do_writev(BlockDriverState *bs,
 
 tracked_request_begin(req, bs, sector_num, nb_sectors, true);
 
-if (flags  BDRV_REQ_ZERO_WRITE) {
+ret = notifier_with_return_list_notify(bs-before_write_notifiers, req);
+
+if (ret  0) {
+/* Do nothing, write notifier decided to fail this request */
+} else if (flags  BDRV_REQ_ZERO_WRITE) {
 ret = bdrv_co_do_write_zeroes(bs, sector_num, nb_sectors);
 } else {
 ret = drv-bdrv_co_writev(bs, sector_num, nb_sectors, qiov);
@@ -4894,3 +4889,9 @@ AioContext *bdrv_get_aio_context(BlockDriverState *bs)
 /* Currently BlockDriverState always uses the main loop AioContext */
 return qemu_get_aio_context();
 }
+
+void bdrv_add_before_write_notifier(BlockDriverState *bs,
+NotifierWithReturn *notifier)
+{
+notifier_with_return_list_add(bs-before_write_notifiers, notifier);
+}
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 6078dd3..440d510 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -58,7 +58,16 @@
 #define BLOCK_OPT_LAZY_REFCOUNTSlazy_refcounts
 #define BLOCK_OPT_ADAPTER_TYPE  adapter_type
 
-typedef struct BdrvTrackedRequest BdrvTrackedRequest;
+typedef struct BdrvTrackedRequest {
+BlockDriverState *bs;
+int64_t sector_num;
+int nb_sectors;
+bool is_write;
+QLIST_ENTRY(BdrvTrackedRequest) list;
+Coroutine *co; /* owner, used for deadlock detection */
+CoQueue wait_queue; /* coroutines blocked on this request */
+} BdrvTrackedRequest;
+
 
 typedef struct BlockIOLimit {
 int64_t bps[3];
@@ -247,6 +256,9 @@ struct BlockDriverState {
 
 NotifierList close_notifiers;
 
+/* Callback before write request is processed */
+NotifierWithReturnList before_write_notifiers;
+
 /* number of in-flight copy-on-read requests */
 unsigned int copy_on_read_in_flight;
 
@@ -298,6 +310,15 @@ void bdrv_set_io_limits(BlockDriverState *bs,
 BlockIOLimit *io_limits);
 
 /**
+ * bdrv_add_before_write_notifier:
+ *
+ * Register a callback that is invoked before write requests are processed but
+ * after any throttling or waiting for overlapping requests.
+ */
+void bdrv_add_before_write_notifier(BlockDriverState *bs,
+NotifierWithReturn *notifier);
+
+/**
  * bdrv_get_aio_context:
  *
  * Returns: the currently bound #AioContext
-- 
1.8.1.4

[Qemu-devel] [PATCH v5 11/11] qemu-iotests: add 055 drive-backup test case

2013-05-30 Thread Stefan Hajnoczi

Testing drive-backup is similar to image streaming and drive mirroring.
This test case is based on 041.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 tests/qemu-iotests/055 | 256 +
 tests/qemu-iotests/055.out |   5 +
 tests/qemu-iotests/group   |   1 +
 3 files changed, 262 insertions(+)
 create mode 100755 tests/qemu-iotests/055
 create mode 100644 tests/qemu-iotests/055.out

diff --git a/tests/qemu-iotests/055 b/tests/qemu-iotests/055
new file mode 100755
index 000..0341301
--- /dev/null
+++ b/tests/qemu-iotests/055
@@ -0,0 +1,256 @@
+#!/usr/bin/env python
+#
+# Tests for drive-backup
+#
+# Copyright (C) 2013 Red Hat, Inc.
+#
+# Based on 041.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see http://www.gnu.org/licenses/.
+#
+
+import time
+import os
+import iotests
+from iotests import qemu_img, qemu_io
+
+test_img = os.path.join(iotests.test_dir, 'test.img')
+target_img = os.path.join(iotests.test_dir, 'target.img')
+
+class TestSingleDrive(iotests.QMPTestCase):
+image_len = 120 * 1024 * 1024 # MB
+
+def setUp(self):
+qemu_img('create', '-f', iotests.imgfmt, test_img, 
str(TestSingleDrive.image_len))
+self.vm = iotests.VM().add_drive(test_img)
+self.vm.launch()
+
+def tearDown(self):
+self.vm.shutdown()
+os.remove(test_img)
+try:
+os.remove(target_img)
+except OSError:
+pass
+
+def test_cancel(self):
+self.assert_no_active_block_jobs()
+
+result = self.vm.qmp('drive-backup', device='drive0',
+ target=target_img)
+self.assert_qmp(result, 'return', {})
+
+self.cancel_and_wait()
+
+def test_pause(self):
+self.assert_no_active_block_jobs()
+
+result = self.vm.qmp('drive-backup', device='drive0',
+ target=target_img)
+self.assert_qmp(result, 'return', {})
+
+result = self.vm.qmp('block-job-pause', device='drive0')
+self.assert_qmp(result, 'return', {})
+
+time.sleep(1)
+result = self.vm.qmp('query-block-jobs')
+offset = self.dictpath(result, 'return[0]/offset')
+
+time.sleep(1)
+result = self.vm.qmp('query-block-jobs')
+self.assert_qmp(result, 'return[0]/offset', offset)
+
+self.vm.shutdown()
+self.assertTrue(iotests.compare_images(test_img, target_img),
+'target image does not match source after backup')
+
+def test_medium_not_found(self):
+result = self.vm.qmp('drive-backup', device='ide1-cd0',
+ target=target_img)
+self.assert_qmp(result, 'error/class', 'GenericError')
+
+def test_image_not_found(self):
+result = self.vm.qmp('drive-backup', device='drive0',
+ mode='existing', target=target_img)
+self.assert_qmp(result, 'error/class', 'GenericError')
+
+def test_device_not_found(self):
+result = self.vm.qmp('drive-backup', device='nonexistent',
+ target=target_img)
+self.assert_qmp(result, 'error/class', 'DeviceNotFound')
+
+class TestSetSpeed(iotests.QMPTestCase):
+image_len = 80 * 1024 * 1024 # MB
+
+def setUp(self):
+qemu_img('create', '-f', iotests.imgfmt, test_img, 
str(TestSetSpeed.image_len))
+self.vm = iotests.VM().add_drive(test_img)
+self.vm.launch()
+
+def tearDown(self):
+self.vm.shutdown()
+os.remove(test_img)
+os.remove(target_img)
+
+def test_set_speed(self):
+self.assert_no_active_block_jobs()
+
+result = self.vm.qmp('drive-backup', device='drive0',
+ target=target_img)
+self.assert_qmp(result, 'return', {})
+
+# Default speed is 0
+result = self.vm.qmp('query-block-jobs')
+self.assert_qmp(result, 'return[0]/device', 'drive0')
+self.assert_qmp(result, 'return[0]/speed', 0)
+
+result = self.vm.qmp('block-job-set-speed', device='drive0', speed=8 * 
1024 * 1024)
+self.assert_qmp(result, 'return', {})
+
+# Ensure the speed we set was accepted
+result = self.vm.qmp('query-block-jobs')
+self.assert_qmp(result, 'return[0]/device', 'drive0')
+self.assert_qmp(result, 'return[0]/speed', 8 * 1024 * 1024)
+
+self.cancel_and_wait()
+

Re: [Qemu-devel] [PATCH] gdbstub: do not restart crashed guest

2013-05-30 Thread Luiz Capitulino

On Thu, 30 May 2013 13:55:50 +0200
Laszlo Ersek ler...@redhat.com wrote:

 On 05/30/13 13:20, Paolo Bonzini wrote:
  If a guest has crashed with an internal error or similar, detaching
  gdb (or any other debugger action) should not restart it.
  
  Cc: Jan Kiszka jan.kis...@siemens.com
  Signed-off-by: Paolo Bonzini pbonz...@redhat.com
  ---
   gdbstub.c | 4 +++-
   1 file changed, 3 insertions(+), 1 deletion(-)
  
  diff --git a/gdbstub.c b/gdbstub.c
  index e80e1d3..90e54cb 100644
  --- a/gdbstub.c
  +++ b/gdbstub.c
  @@ -371,7 +371,9 @@ static inline void gdb_continue(GDBState *s)
   #ifdef CONFIG_USER_ONLY
   s-running_state = 1;
   #else
  -vm_start();
  +if (runstate_check(RUN_STATE_DEBUG)) {
  +vm_start();
  +}
   #endif
   }
   
  
 
 I sought to check the gdb_continue() call sites, and uses of
 RUN_STATE_DEBUG. Seems reasonable.
 
 Reviewed-by: Laszlo Ersek ler...@redhat.com
 
 (
 FWIW I wonder why in commit ad02b96a Luiz allowed DEBUG - SUSPENDED. As
 far as I understand, when the debugger is attached, the guest is not
 running, hence it can't go directly to RUN_STATE_SUSPENDED. Maybe due to
 a concurrent monitor command?

I honestly don't remember if I took into account the explanation you gave
below. You're right that a stopped guest can't suspend.

 
 Technically it does seem possible; from main_loop_should_exit():
 
 if (qemu_debug_requested()) {
 vm_stop(RUN_STATE_DEBUG);
 }
 if (qemu_suspend_requested()) {
 qemu_system_suspend();
 }
 
 Both requests could become pending during one iteration of the loop, and
 the next iteration will see both of them. OK.
 )
 
 Laszlo

[Qemu-devel] [PATCH v5 06/11] block: add drive-backup QMP command

2013-05-30 Thread Stefan Hajnoczi

@drive-backup

Start a point-in-time copy of a block device to a new destination.  The
status of ongoing drive-backup operations can be checked with
query-block-jobs where the BlockJobInfo.type field has the value 'backup'.
The operation can be stopped before it has completed using the
block-job-cancel command.

@device: the name of the device which should be copied.

@target: the target of the new image. If the file exists, or if it
 is a device, the existing file/device will be used as the new
 destination.  If it does not exist, a new file will be created.

@format: #optional the format of the new destination, default is to
 probe if @mode is 'existing', else the format of the source

@mode: #optional whether and how QEMU should create a new image, default is
   'absolute-paths'.

@speed: #optional the maximum speed, in bytes per second

@on-source-error: #optional the action to take on an error on the source,
  default 'report'.  'stop' and 'enospc' can only be used
  if the block device supports io-status (see BlockInfo).

@on-target-error: #optional the action to take on an error on the target,
  default 'report' (no limitations, since this applies to
  a different block device than @device).

Note that @on-source-error and @on-target-error only affect background I/O.
If an error occurs during a guest write request, the device's rerror/werror
actions will be used.

Returns: nothing on success
 If @device is not a valid block device, DeviceNotFound

Since 1.6

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 blockdev.c   | 97 
 qapi-schema.json | 46 +++
 qmp-commands.hx  | 46 +++
 3 files changed, 189 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 73b175b..5de33f5 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1346,6 +1346,103 @@ void qmp_block_commit(const char *device,
 drive_get_ref(drive_get_by_blockdev(bs));
 }
 
+void qmp_drive_backup(const char *device, const char *target,
+  bool has_format, const char *format,
+  bool has_mode, enum NewImageMode mode,
+  bool has_speed, int64_t speed,
+  bool has_on_source_error, BlockdevOnError 
on_source_error,
+  bool has_on_target_error, BlockdevOnError 
on_target_error,
+  Error **errp)
+{
+BlockDriverState *bs;
+BlockDriverState *target_bs;
+BlockDriver *drv = NULL;
+Error *local_err = NULL;
+int flags;
+int64_t size;
+int ret;
+
+if (!has_speed) {
+speed = 0;
+}
+if (!has_on_source_error) {
+on_source_error = BLOCKDEV_ON_ERROR_REPORT;
+}
+if (!has_on_target_error) {
+on_target_error = BLOCKDEV_ON_ERROR_REPORT;
+}
+if (!has_mode) {
+mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
+}
+
+bs = bdrv_find(device);
+if (!bs) {
+error_set(errp, QERR_DEVICE_NOT_FOUND, device);
+return;
+}
+
+if (!bdrv_is_inserted(bs)) {
+error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
+return;
+}
+
+if (!has_format) {
+format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : bs-drv-format_name;
+}
+if (format) {
+drv = bdrv_find_format(format);
+if (!drv) {
+error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
+return;
+}
+}
+
+if (bdrv_in_use(bs)) {
+error_set(errp, QERR_DEVICE_IN_USE, device);
+return;
+}
+
+flags = bs-open_flags | BDRV_O_RDWR;
+
+size = bdrv_getlength(bs);
+if (size  0) {
+error_setg_errno(errp, -size, bdrv_getlength failed);
+return;
+}
+
+if (mode != NEW_IMAGE_MODE_EXISTING) {
+assert(format  drv);
+bdrv_img_create(target, format,
+NULL, NULL, NULL, size, flags, local_err, false);
+}
+
+if (error_is_set(local_err)) {
+error_propagate(errp, local_err);
+return;
+}
+
+target_bs = bdrv_new();
+ret = bdrv_open(target_bs, target, NULL, flags, drv);
+if (ret  0) {
+bdrv_delete(target_bs);
+error_set(errp, QERR_OPEN_FILE_FAILED, target);
+return;
+}
+
+backup_start(bs, target_bs, speed, on_source_error, on_target_error,
+ block_job_cb, bs, local_err);
+if (local_err != NULL) {
+bdrv_delete(target_bs);
+error_propagate(errp, local_err);
+return;
+}
+
+/* Grab a reference so hotplug does not delete the BlockDriverState from
+ * underneath us.
+ */
+drive_get_ref(drive_get_by_blockdev(bs));
+}
+
 #define DEFAULT_MIRROR_BUF_SIZE   (10  20)
 
 void qmp_drive_mirror(const char *device, const char *target,
diff --git a/qapi-schema.json b/qapi-schema.json
index ef1f657..55b1a37 100644
---

[Qemu-devel] [PATCH v5 03/11] block: add basic backup support to block driver

2013-05-30 Thread Stefan Hajnoczi

From: Dietmar Maurer diet...@proxmox.com

backup_start() creates a block job that copies a point-in-time snapshot
of a block device to a target block device.

We call backup_do_cow() for each write during backup. That function
reads the original data from the block device before it gets
overwritten.  The data is then written to the target device.

Currently backup cluster size is hardcoded to 65536 bytes.

[I made a number of changes to Dietmar's original patch and folded them
in to make code review easy.  Here is the full list:

 * Drop BackupDumpFunc interface in favor of a target block device
 * Detect zero clusters with buffer_is_zero() and use bdrv_co_write_zeroes()
 * Use 0 delay instead of 1us, like other block jobs
 * Unify creation/start functions into backup_start()
 * Simplify cleanup, free bitmap in backup_run() instead of cb
 * function
 * Use HBitmap to avoid duplicating bitmap code
 * Use bdrv_getlength() instead of accessing -total_sectors
 * directly
 * Delete the backup.h header file, it is no longer necessary
 * Move ./backup.c to block/backup.c
 * Remove #ifdefed out code
 * Coding style and whitespace cleanups
 * Use bdrv_add_before_write_notifier() instead of blockjob-specific hooks
 * Keep our own in-flight CowRequest list instead of using block.c
   tracked requests.  This means a little code duplication but is much
   simpler than trying to share the tracked requests list and use the
   backup block size.
 * Add on_source_error and on_target_error error handling.

-- stefanha]

Signed-off-by: Dietmar Maurer diet...@proxmox.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com

backup size fixes

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block/Makefile.objs   |   1 +
 block/backup.c| 355 ++
 include/block/block_int.h |  19 +++
 3 files changed, 375 insertions(+)
 create mode 100644 block/backup.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 5f0358a..88bd101 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -20,5 +20,6 @@ endif
 common-obj-y += stream.o
 common-obj-y += commit.o
 common-obj-y += mirror.o
+common-obj-y += backup.o
 
 $(obj)/curl.o: QEMU_CFLAGS+=$(CURL_CFLAGS)
diff --git a/block/backup.c b/block/backup.c
new file mode 100644
index 000..466744a
--- /dev/null
+++ b/block/backup.c
@@ -0,0 +1,355 @@
+/*
+ * QEMU backup
+ *
+ * Copyright (C) 2013 Proxmox Server Solutions
+ *
+ * Authors:
+ *  Dietmar Maurer (diet...@proxmox.com)
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include stdio.h
+#include errno.h
+#include unistd.h
+
+#include block/block.h
+#include block/block_int.h
+#include block/blockjob.h
+#include qemu/ratelimit.h
+
+#define DEBUG_BACKUP 0
+
+#define DPRINTF(fmt, ...) \
+do { \
+if (DEBUG_BACKUP) { \
+fprintf(stderr, backup:  fmt, ## __VA_ARGS__); \
+} \
+} while (0)
+
+#define BACKUP_CLUSTER_BITS 16
+#define BACKUP_CLUSTER_SIZE (1  BACKUP_CLUSTER_BITS)
+#define BACKUP_SECTORS_PER_CLUSTER (BACKUP_CLUSTER_SIZE / BDRV_SECTOR_SIZE)
+
+#define SLICE_TIME 1ULL /* ns */
+
+typedef struct CowRequest {
+int64_t start;
+int64_t end;
+QLIST_ENTRY(CowRequest) list;
+CoQueue wait_queue; /* coroutines blocked on this request */
+} CowRequest;
+
+typedef struct BackupBlockJob {
+BlockJob common;
+BlockDriverState *target;
+RateLimit limit;
+BlockdevOnError on_source_error;
+BlockdevOnError on_target_error;
+CoRwlock flush_rwlock;
+uint64_t sectors_read;
+HBitmap *bitmap;
+QLIST_HEAD(, CowRequest) inflight_reqs;
+} BackupBlockJob;
+
+/* See if in-flight requests overlap and wait for them to complete */
+static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
+   int64_t start,
+   int64_t end)
+{
+CowRequest *req;
+bool retry;
+
+do {
+retry = false;
+QLIST_FOREACH(req, job-inflight_reqs, list) {
+if (end  req-start  start  req-end) {
+qemu_co_queue_wait(req-wait_queue);
+retry = true;
+break;
+}
+}
+} while (retry);
+}
+
+/* Keep track of an in-flight request */
+static void cow_request_begin(CowRequest *req, BackupBlockJob *job,
+ int64_t start, int64_t end)
+{
+req-start = start;
+req-end = end;
+qemu_co_queue_init(req-wait_queue);
+QLIST_INSERT_HEAD(job-inflight_reqs, req, list);
+}
+
+/* Forget about a completed request */
+static void cow_request_end(CowRequest *req)
+{
+QLIST_REMOVE(req, list);
+qemu_co_queue_restart_all(req-wait_queue);
+}
+
+static int coroutine_fn backup_do_cow(BlockDriverState *bs,
+  int64_t sector_num, int

Re: [Qemu-devel] [PATCH v2 2/5] pci: store PCI hole ranges in guestinfo structure

2013-05-30 Thread Michael S. Tsirkin

On Thu, May 30, 2013 at 02:25:41PM +0200, Laszlo Ersek wrote:
 I can't offer any opinion about the values you put into w32 and w64, but I 
 have some remarks.
 
 First, please consider passing -O/path/to/some/order_file to 
 git-format-patch, so that .h files show up at the top of each patch.
 
 On 05/30/13 13:07, Michael S. Tsirkin wrote:
  Will be used to pass hole ranges to guests.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   hw/i386/pc.c  | 39 ++-
   hw/i386/pc_piix.c | 14 +-
   hw/i386/pc_q35.c  |  6 +-
   hw/pci-host/q35.c |  4 
   include/hw/i386/pc.h  | 19 ++-
   include/hw/pci-host/q35.h |  2 ++
   include/qemu/typedefs.h   |  1 +
   7 files changed, 81 insertions(+), 4 deletions(-)
  
  diff --git a/hw/i386/pc.c b/hw/i386/pc.c
  index 4844a6b..c233161 100644
  --- a/hw/i386/pc.c
  +++ b/hw/i386/pc.c
  @@ -978,6 +978,41 @@ void pc_cpus_init(const char *cpu_model, DeviceState 
  *icc_bridge)
   }
   }
   
  +typedef struct PcGuestInfoState {
  +PcGuestInfo info;
  +Notifier machine_done;
  +} PcGuestInfoState;
  +
  +static
  +void pc_guest_info_machine_done(Notifier *notifier, void *data)
  +{
  +PcGuestInfoState *guest_info_state = container_of(notifier,
  +  PcGuestInfoState,
  +  machine_done);
  +}
  +
  +PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size,
  +ram_addr_t above_4g_mem_size)
  +{
  +PcGuestInfoState *guest_info_state = g_malloc0(sizeof 
  *guest_info_state);
  +PcGuestInfo *guest_info = guest_info_state-info;
  +
  +guest_info-pci_info.w32.end = IO_APIC_DEFAULT_ADDRESS;
  +if (sizeof(hwaddr) == 4) {
  +guest_info-pci_info.w64.begin = 0;
  +guest_info-pci_info.w64.end = 0;
  +} else {
  +guest_info-pci_info.w64.begin = 0x1ULL + 
  above_4g_mem_size;
  +guest_info-pci_info.w64.end =  guest_info-pci_info.w64.begin +
  +(0x1ULL  62);
  +assert(guest_info-pci_info.w64.begin = 
  guest_info-pci_info.w64.end);
  +}
  +
  +guest_info_state-machine_done.notify = pc_guest_info_machine_done;
  +qemu_add_machine_init_done_notifier(guest_info_state-machine_done);
  +return guest_info;
  +}
  +
   void pc_acpi_init(const char *default_dsdt)
   {
   char *filename;
  @@ -1019,7 +1054,8 @@ FWCfgState *pc_memory_init(MemoryRegion 
  *system_memory,
  ram_addr_t below_4g_mem_size,
  ram_addr_t above_4g_mem_size,
  MemoryRegion *rom_memory,
  -   MemoryRegion **ram_memory)
  +   MemoryRegion **ram_memory,
  +   PcGuestInfo *guest_info)
   {
   int linux_boot, i;
   MemoryRegion *ram, *option_rom_mr;
  @@ -1071,6 +1107,7 @@ FWCfgState *pc_memory_init(MemoryRegion 
  *system_memory,
   for (i = 0; i  nb_option_roms; i++) {
   rom_add_option(option_rom[i].name, option_rom[i].bootindex);
   }
  +guest_info-fw_cfg = fw_cfg;
   return fw_cfg;
   }
 
 I find this suboptimal style, ie. passing guest_info to pc_memory_init() 
 just so that pc_memory_init() can set guest_info-fw_cfg to fw_cfg, when 
 pc_memory_init() returns fw_cfg anyway.

Well otherwise all callers have to remember to set it.

 More philosophically:
 
  diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
  index b4c8a74..1bf5219 100644
  --- a/include/hw/i386/pc.h
  +++ b/include/hw/i386/pc.h
  @@ -9,8 +9,20 @@
   #include net/net.h
   #include hw/i386/ioapic.h
   
  +#include qemu/range.h
  +
   /* PC-style peripherals (also used by other machines).  */
   
  +typedef struct PcPciInfo {
  +Range w32;
  +Range w64;
  +} PcPciInfo;
  +
  +struct PcGuestInfo {
  +PcPciInfo pci_info;
  +FWCfgState *fw_cfg;
  +};
 
 Are you sure you need a private link to the global fw_cfg object? The pvpanic 
 series added a global lookup possibility in commit 10a584b2 
 (object_resolve_path()).
 
 Anyway these are just subjective style notes.

Yes.

I personally prefer not using global lookups: passing a pointer makes
dependencies clearer IMO.

If we do switch to that, I think it's cleaner to have a wrapper so
that everyone does not need to hard-code strings like /machine/fw_cfg.


  +
   /* parallel.c */
   static inline bool parallel_init(ISABus *bus, int index, CharDriverState 
  *chr)
   {
  @@ -80,6 +92,10 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int 
  level);
   void pc_cpus_init(const char *cpu_model, DeviceState *icc_bridge);
   void pc_hot_add_cpu(const int64_t id, Error **errp);
   void pc_acpi_init(const char *default_dsdt);
  +
  +PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size,
  +ram_addr_t

[Qemu-devel] [PATCH v5 01/11] notify: add NotiferWithReturn so notifier list can abort

2013-05-30 Thread Stefan Hajnoczi

notifier_list_notify() has no return value.  This is fine when we just
want to invoke side-effects.

Sometimes it's useful for notifiers to produce a return value.  This
allows notifiers to veto an operation and will be used by the block
layer before-write notifier.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 include/qemu/notify.h | 25 +
 util/notify.c | 30 ++
 2 files changed, 55 insertions(+)

diff --git a/include/qemu/notify.h b/include/qemu/notify.h
index 4e2e7f0..d3103e7 100644
--- a/include/qemu/notify.h
+++ b/include/qemu/notify.h
@@ -40,4 +40,29 @@ void notifier_remove(Notifier *notifier);
 
 void notifier_list_notify(NotifierList *list, void *data);
 
+/* Same as Notifier but allows .notify() to return errors */
+typedef struct NotifierWithReturn NotifierWithReturn;
+
+struct NotifierWithReturn {
+int (*notify)(NotifierWithReturn *notifier, void *data);
+QLIST_ENTRY(NotifierWithReturn) node;
+};
+
+typedef struct NotifierWithReturnList {
+QLIST_HEAD(, NotifierWithReturn) notifiers;
+} NotifierWithReturnList;
+
+#define NOTIFIER_WITH_RETURN_LIST_INITIALIZER(head) \
+{ QLIST_HEAD_INITIALIZER((head).notifiers) }
+
+void notifier_with_return_list_init(NotifierWithReturnList *list);
+
+void notifier_with_return_list_add(NotifierWithReturnList *list,
+   NotifierWithReturn *notifier);
+
+void notifier_with_return_remove(NotifierWithReturn *notifier);
+
+int notifier_with_return_list_notify(NotifierWithReturnList *list,
+ void *data);
+
 #endif
diff --git a/util/notify.c b/util/notify.c
index 7b7692a..f215dfc 100644
--- a/util/notify.c
+++ b/util/notify.c
@@ -39,3 +39,33 @@ void notifier_list_notify(NotifierList *list, void *data)
 notifier-notify(notifier, data);
 }
 }
+
+void notifier_with_return_list_init(NotifierWithReturnList *list)
+{
+QLIST_INIT(list-notifiers);
+}
+
+void notifier_with_return_list_add(NotifierWithReturnList *list,
+   NotifierWithReturn *notifier)
+{
+QLIST_INSERT_HEAD(list-notifiers, notifier, node);
+}
+
+void notifier_with_return_remove(NotifierWithReturn *notifier)
+{
+QLIST_REMOVE(notifier, node);
+}
+
+int notifier_with_return_list_notify(NotifierWithReturnList *list, void *data)
+{
+NotifierWithReturn *notifier, *next;
+int ret = 0;
+
+QLIST_FOREACH_SAFE(notifier, list-notifiers, node, next) {
+ret = notifier-notify(notifier, data);
+if (ret != 0) {
+break;
+}
+}
+return ret;
+}
-- 
1.8.1.4

[Qemu-devel] [PATCH v5 00/11] block: drive-backup live backup command

2013-05-30 Thread Stefan Hajnoczi

Note: These patches apply to kevin/block.  You can also grab the code from git
here:
git://github.com/stefanha/qemu.git block-backup-core

This series adds a new QMP command, drive-backup, which takes a point-in-time
snapshot of a block device.  The snapshot is copied out to a target block
device.  A simple example is:

  drive-backup device=virtio0 format=qcow2 target=backup-20130401.qcow2

The original drive-backup blockjob was written by Dietmar Maurer
diet...@proxmox.com.  He is currently busy but I feel the feature is worth
pushing into QEMU since there has been interest.  This is my version of his
patch, plus the QMP command and qemu-iotests test case.

QMP 'transaction' support is included since v3.  It adds support for atomic
snapshots of multiple block devices.  I also added an 'abort' transaction to
allow testing of the .abort()/.cleanup() code path.  Thanks to Wenchao for
making qmp_transaction() extensible.

How is this different from block-stream and drive-mirror?
-
Both block-stream and drive-mirror do not provide immediate point-in-time
snapshots.  Instead they copy data into a new file and then switch to it.  In
other words, the point at which the snapshot is taken cannot be controlled
directly.

drive-backup intercepts guest writes and saves data into the target block
device before it is overwritten.  The target block device can be a raw image
file, backing files are not used to implement this feature.

How can drive-backup be used?
-
The simplest use-case is to copy a point-in-time snapshot to a local file.

More advanced users may wish to make the target an NBD URL.  The NBD server
listening on the other side can process the backup writes any way it wishes.  I
previously posted an RFC series with a backup server that streamed Dietmar's
VMA backup archive format.

What's next for drive-backup?
-
1. Sync modes like drive-mirror (top, full, none).  This makes it possible to
   preserve the backing file chain.

v5:
 * Use bdrv_co_write_zeroes(job-target) [kwolf]

   This change means that we write zeroes over NBD again.  The optimization can
   be reintroduced by skipping zeroes when bdrv_has_zero_init() is true and the
   sectors are allocated.  Leave that for a future series, if we decide to do
   this optimization again because it may require extra block driver
   configuration to indicate that an image has zero init.

 * iostatus error handling [kwolf/pbonzini]

   Add configurable on-source-error and on-target-error actions just like
   drive-mirror.  These are used when the block job coroutine hits an error.
   If we are in guest write request context, return the errno and let the usual
   guest error handling take over.

 * Allow BdrvActionOps-commit() to be NULL [eblake]
 * Use bdrv_getlength() in qmp_drive_mirror() [kwolf]
 * Drop redundant proto_drv check [kwolf]
 * Fix outdated DPRINTF() function names
 * Drop comment about non-existent bitmap coroutine race [kwolf]
 * Rename BACKUP_SECTORS_PER_CLUSTER [kwolf]
 * Fix completion when image is not multiple of backup cluster size [fam]

v4:
 * Use notifier lists and BdrvTrackedRequest instead of custom callbacks 
[bonzini]
 * Add drive-backup QMP example JSON [eblake]
 * Add Since: 1.6 to QMP schema changes [eblake]

v3:
 * Rename to drive-backup for consistency with drive-mirror [kwolf]
 * Add QMP transaction support [kwolf]
 * Introduce bdrv_add_before_write_cb() to hook writes
 * Mention 'query-block-jobs' lists job of type 'backup' [eblake]
 * Rename rwlock to flush_rwlock [kwolf]
 * Fix space in block/backup.c comment [kwolf]

v2:
 * s/block_backup/block-backup/ in commit message [eblake]
 * Avoid funny spacing in QMP docs [eblake]
 * Document query-block-jobs and block-job-cancel usage [eblake]

Dietmar Maurer (1):
  block: add basic backup support to block driver

Stefan Hajnoczi (10):
  notify: add NotiferWithReturn so notifier list can abort
  block: add bdrv_add_before_write_notifier()
  blockdev: drop redundant proto_drv check
  blockdev: use bdrv_getlength() in qmp_drive_mirror()
  block: add drive-backup QMP command
  blockdev: rename BlkTransactionStates to singular
  blockdev: allow BdrvActionOps-commit() to be NULL
  blockdev: add DriveBackup transaction
  blockdev: add Abort transaction
  qemu-iotests: add 055 drive-backup test case

 block.c|  23 +--
 block/Makefile.objs|   1 +
 block/backup.c | 355 +
 blockdev.c | 288 +++-
 include/block/block_int.h  |  42 +-
 include/qemu/notify.h  |  25 
 qapi-schema.json   |  97 -
 qmp-commands.hx|  46 ++
 tests/qemu-iotests/055 | 256 
 tests/qemu-iotests/055.out |   5 +
 tests/qemu-iotests/group   |   1 +
 util/notify.c  |  30 
 12 files

Re: [Qemu-devel] [Xen-devel] [PVSCSI]How to unplug scsi disk simulated by Qemu, just like unplug the ide disk?

2013-05-30 Thread Pasi Kärkkäinen

On Thu, May 30, 2013 at 11:23:44AM +, Gonglei (Arei) wrote:
 Hi all,
 My environment is xen-4.1.2 + qemu-1.2.2
   I made a pvscsi driver for Redhat guest, but I encountered a problem 
 that I could see two scsi disks, one was simulated by QEMU, another was 
 passthrough.
 Actually I want to unplug the scsi disk simulated. Any methods can solve 
 the problem on the qemu upstream. Thanks!


Are you actually using Xen *PVSCSI* 
(http://wiki.xen.org/wiki/Paravirtualized_SCSI),
or are you talking about Xen HVM guest with PV drivers for disk? 

-- Pasi

Re: [Qemu-devel] [RFC] Check backing_file chain's loop

2013-05-30 Thread Stefan Hajnoczi

On Thu, May 30, 2013 at 06:12:00AM -0600, Eric Blake wrote:
 On 05/30/2013 04:14 AM, Dongxu Wang wrote:
  Hi,
  
  Now block layer does not check whether backing_file chain can be a
  circle, do you think it is necessary to do a check?
  
  For example,  1.qcow2's backing_file is 2.qcow2 and 2.qcow2's
  backing_file is 1.qcow2, then any IO operation will be a fault.
  
  If necessary, I will post a patch to fix.
 
 Yes, it's worth patching.
 
  
  [$] qemu-img create -f qcow2 1.qcow2
  qemu-img: Image creation needs a size parameter
  [$] qemu-img create -f qcow2 1.qcow2 8M
  Formatting '1.qcow2', fmt=qcow2 size=8388608 encryption=off
  cluster_size=65536 lazy_refcounts=off
  [$] qemu-img create -f qcow2 2.qcow2 -o backing_file=1.qcow2
  Formatting '2.qcow2', fmt=qcow2 size=8388608 backing_file='1.qcow2'
  encryption=off cluster_size=65536 lazy_refcounts=off
  [$] qemu-img create -f qcow2 1.qcow2 -o backing_file=2.qcow2
  Formatting '1.qcow2', fmt=qcow2 size=8388608 backing_file='2.qcow2'
  encryption=off cluster_size=65536 lazy_refcounts=off
  [$] qemu-io -c read -v 0 512 1.qcow2
  ^CSegmentation fault (core dumped)
 
 Also make sure that 'qemu-img info 1.qcow2' and 'qemu-img info
 --backing-chain 1.qcow2' don't crash.

qemu-img info --backing-chain detects cycles and returns an error, see
the hash table in collect_image_info_list().

We should protect bdrv_open() too.

Stefan

Re: [Qemu-devel] [PATCH v2 2/5] pci: store PCI hole ranges in guestinfo structure

2013-05-30 Thread Michael S. Tsirkin

On Thu, May 30, 2013 at 02:32:01PM +0200, Gerd Hoffmann wrote:
 On 05/30/13 14:19, Michael S. Tsirkin wrote:
  On Thu, May 30, 2013 at 02:16:13PM +0200, Gerd Hoffmann wrote:
Hi,
 
  +} else {
  +guest_info-pci_info.w64.begin = 0x1ULL + 
  above_4g_mem_size;
  +guest_info-pci_info.w64.end =  guest_info-pci_info.w64.begin +
  +(0x1ULL  62);
 
  Doesn't this give unaligned windows?
  
  PCI Bridge windows do not need to be size aligned.
  
  In any case, the windows are *exactly* as calculated
  by seabios - apparently it does not size-align windows either.
 
 Surely not.  SeaBIOS sizes the 64bit window according to the space
 needed by the 64bit bars it wants to map there.

Ah, it's 64 bit. True. That's a seabios bug by the way:
if we add more devices by hotplug later, we want more
pci memory.

  +/* Set PCI window size the way seabios has always done it. */
  +/* TODO: consider just starting at below_4g_mem_size */
 
  Used to be that way.  Was changed for alignment reasons (i.e. 1G window
  starts at 1G border etc).
  
  Where's the alignment requirement coming from?
 
 seabios creates a mtrr entry for the window, which doesn't work in case
 it isn't aligned (at least not with a single entry).
 
 Also real hardware tends to do it this way.
 
 cheers,
   Gerd

I see. I'll figure out the details and add a comment to this end.

But that's for the 32 bit window - I don't see it playing
with mtrrs for the 64 bit ranges.
So I'm guessing alignment isn't needed there, right?

-- 
MST

Re: [Qemu-devel] [PATCH] walk_pml4e(): fix abort on bad PML4E/PDPTE/PDE/PTE addresses

2013-05-30 Thread Luiz Capitulino

On Tue, 28 May 2013 14:19:22 -0400
Luiz Capitulino lcapitul...@redhat.com wrote:

 The code used to walk IA-32e page-tables, and possibly PAE page-tables,
 uses the bit mask ~0xfff to get the next PML4E/PDPTE/PDE/PTE address.
 
 However, as we use a uint64_t to store the resulting address, that mask
 gets expanded to 0xf000 which not only ends up selecting
 reserved bits but also selects the XD bit (execute-disable) which
 happens to be enabled by Windows 8, causing qemu_get_ram_ptr() to abort.
 
 This commit fixes that problem by replacing ~0xfff by a correct mask
 that only selects the address bit range (ie. bits 51:12).
 
 Signed-off-by: Luiz Capitulino lcapitul...@redhat.com

Ping? Wen?

Would be nice get a Reviewed-by before merging...

 ---
 
 PS: I (obviously) don't any more core dumps with this patch applied, but
 I couldn't check if the Windows dump is correct (does anyone know
 how to do this?). I did quickly check on Linux though.
 
  target-i386/arch_memory_mapping.c | 10 ++
  1 file changed, 6 insertions(+), 4 deletions(-)
 
 diff --git a/target-i386/arch_memory_mapping.c 
 b/target-i386/arch_memory_mapping.c
 index 844893f..24884bd 100644
 --- a/target-i386/arch_memory_mapping.c
 +++ b/target-i386/arch_memory_mapping.c
 @@ -75,6 +75,8 @@ static void walk_pte2(MemoryMappingList *list,
  }
  
  /* PAE Paging or IA-32e Paging */
 +#define PLM4_ADDR_MASK 0xff000 /* selects bits 51:12 */
 +
  static void walk_pde(MemoryMappingList *list, hwaddr pde_start_addr,
   int32_t a20_mask, target_ulong start_line_addr)
  {
 @@ -105,7 +107,7 @@ static void walk_pde(MemoryMappingList *list, hwaddr 
 pde_start_addr,
  continue;
  }
  
 -pte_start_addr = (pde  ~0xfff)  a20_mask;
 +pte_start_addr = (pde  PLM4_ADDR_MASK)  a20_mask;
  walk_pte(list, pte_start_addr, a20_mask, line_addr);
  }
  }
 @@ -208,7 +210,7 @@ static void walk_pdpe(MemoryMappingList *list,
  continue;
  }
  
 -pde_start_addr = (pdpe  ~0xfff)  a20_mask;
 +pde_start_addr = (pdpe  PLM4_ADDR_MASK)  a20_mask;
  walk_pde(list, pde_start_addr, a20_mask, line_addr);
  }
  }
 @@ -231,7 +233,7 @@ static void walk_pml4e(MemoryMappingList *list,
  }
  
  line_addr = ((i  0x1ffULL)  39) | (0xULL  48);
 -pdpe_start_addr = (pml4e  ~0xfff)  a20_mask;
 +pdpe_start_addr = (pml4e  PLM4_ADDR_MASK)  a20_mask;
  walk_pdpe(list, pdpe_start_addr, a20_mask, line_addr);
  }
  }
 @@ -249,7 +251,7 @@ int cpu_get_memory_mapping(MemoryMappingList *list, 
 CPUArchState *env)
  if (env-hflags  HF_LMA_MASK) {
  hwaddr pml4e_addr;
  
 -pml4e_addr = (env-cr[3]  ~0xfff)  env-a20_mask;
 +pml4e_addr = (env-cr[3]  PLM4_ADDR_MASK)  env-a20_mask;
  walk_pml4e(list, pml4e_addr, env-a20_mask);
  } else
  #endif

Re: [Qemu-devel] [PATCH v19 4/7] pvpanic: pass configurable ioport to seabios

2013-05-30 Thread Michael S. Tsirkin

On Thu, Apr 18, 2013 at 10:14:43AM +0800, Hu Tao wrote:
 This lets seabios patch the corresponding SSDT entry.
 
 Signed-off-by: Hu Tao hu...@cn.fujitsu.com
 ---
  hw/misc/pvpanic.c | 14 ++
  hw/nvram/fw_cfg.c |  8 +++-
  include/hw/nvram/fw_cfg.h |  2 ++
  3 files changed, 23 insertions(+), 1 deletion(-)
 
 diff --git a/hw/misc/pvpanic.c b/hw/misc/pvpanic.c
 index c3adcdf..23599a3 100644
 --- a/hw/misc/pvpanic.c
 +++ b/hw/misc/pvpanic.c
 @@ -18,6 +18,8 @@
  #include sysemu/sysemu.h
  #include sysemu/kvm.h
  
 +#include hw/nvram/fw_cfg.h
 +
  /* The bit of supported pv event */
  #define PVPANIC_F_PANICKED  0
  
 @@ -86,10 +88,22 @@ static const MemoryRegionOps pvpanic_ops = {
  static int pvpanic_isa_initfn(ISADevice *dev)
  {
  PVPanicState *s = ISA_PVPANIC_DEVICE(dev);
 +static bool port_configured;
 +void *fw_cfg;
  
  memory_region_init_io(s-io, pvpanic_ops, s, pvpanic, 1);
  isa_register_ioport(dev, s-io, s-ioport);
  
 +if (!port_configured) {
 +fw_cfg = object_resolve_path(/machine/fw_cfg, NULL);
 +if (fw_cfg) {
 +fw_cfg_add_file(fw_cfg, etc/pvpanic-port,
 +g_memdup(s-ioport, sizeof(s-ioport)),
 +sizeof(s-ioport));
 +port_configured = true;
 +}
 +}
 +
  return 0;
  }
  

Sorry about missing this when this was posted/applied.
Wouldn't it make more sense to have /machine/fw_cfg
string in a header.


 diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
 index 97bba87..1a7e49c 100644
 --- a/hw/nvram/fw_cfg.c
 +++ b/hw/nvram/fw_cfg.c
 @@ -489,11 +489,17 @@ FWCfgState *fw_cfg_init(uint32_t ctl_port, uint32_t 
 data_port,
  dev = qdev_create(NULL, fw_cfg);
  qdev_prop_set_uint32(dev, ctl_iobase, ctl_port);
  qdev_prop_set_uint32(dev, data_iobase, data_port);
 -qdev_init_nofail(dev);
  d = SYS_BUS_DEVICE(dev);
  
  s = DO_UPCAST(FWCfgState, busdev.qdev, dev);
  
 +if (!object_resolve_path(/machine/fw_cfg, NULL)) {

Why is this tested here?
Should be an assert(!object_resolve_path(/machine/fw_cfg, NULL))
surely?

 +object_property_add_child(qdev_get_machine(), fw_cfg, OBJECT(s),
 +  NULL);
 +}
 +
 +qdev_init_nofail(dev);
 +
  if (ctl_addr) {
  sysbus_mmio_map(d, 0, ctl_addr);
  }
 diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h
 index 05c8df1..07cc941 100644
 --- a/include/hw/nvram/fw_cfg.h
 +++ b/include/hw/nvram/fw_cfg.h
 @@ -1,6 +1,8 @@
  #ifndef FW_CFG_H
  #define FW_CFG_H
  
 +#include exec/hwaddr.h
 +
  #define FW_CFG_SIGNATURE0x00
  #define FW_CFG_ID   0x01
  #define FW_CFG_UUID 0x02
 -- 
 1.8.1.4

Re: [Qemu-devel] broken incoming migration

2013-05-30 Thread Paolo Bonzini

Il 30/05/2013 11:08, Peter Lieven ha scritto:
 Am 30.05.2013 10:18, schrieb Alexey Kardashevskiy:
 On 05/30/2013 05:49 PM, Paolo Bonzini wrote:
 Il 30/05/2013 09:44, Alexey Kardashevskiy ha scritto:
 Hi!

 I found the migration broken on pseries platform, specifically, this patch
 broke it:

 f1c72795af573b24a7da5eb52375c9aba8a37972
 migration: do not sent zero pages in bulk stage

 The idea is not to send zero pages to the destination guest which is
 expected to have 100% empty RAM.

 However on pseries plaftorm the guest always has some stuff in the RAM as a
 part of initialization (device tree, system firmware and rtas (?)) so it is
 not completely empty. As the source guest cannot detect this, it skips some
 pages during migration and we get a broken destination guest. Bug.

 While the idea is ok in general, I do not see any easy way to fix it as
 neither QEMUMachine::init nor QEMUMachine::reset callbacks has information
 about whether we are about to receive a migration or not (-incoming
 parameter) and we cannot move device-tree and system firmware
 initialization anywhere else.

 ram_bulk_stage is static and cannot be disabled from the platform
 initialization code.

 So what would the community suggest?
 Revert the patch. :)
 I'll wait for 24 hours (forgot to cc: the author) and then post a revert
 patch :)

 does this problem only occur on pseries emulation?

Probably not.  On a PC, it would occur if you had 4K of zeros in the
source BIOS but not in the destination BIOS.  When you reboot, the BIOS
image is wrong.

 not sending zero pages is not only a performance benefit it also makes
 overcomitted memory usable. the madv_dontneed seems to kick in asynchronously
 and memory is not available immediately.

You could also scan the page for nonzero values before writing it.

Paolo

Re: [Qemu-devel] [PATCH 0/2] fixup some fallback from tree reorg

2013-05-30 Thread Paolo Bonzini

Il 30/05/2013 11:30, Michael S. Tsirkin ha scritto:
 Here are some patches fixing up minor issues
 in the tree reorganization.
 
 Michael S. Tsirkin (2):
   dec.c - move to pci-bridge
   firmware_abi: move to include/hw/nvram/
 
  hw/pci-bridge/Makefile.objs  |   2 +
  hw/pci-bridge/dec.c  | 156 
 +++
  hw/pci-bridge/dec.h  |  10 ++
  hw/pci-host/Makefile.objs|   1 -
  hw/pci-host/dec.c| 156 
 ---
  hw/pci-host/dec.h|  10 --
  hw/sparc/sun4m.c |   2 +-
  hw/sparc64/sun4u.c   |   2 +-
  include/hw/nvram/openbios_firmware_abi.h |  73 +++
  include/hw/sparc/firmware_abi.h  |  73 ---
  10 files changed, 243 insertions(+), 242 deletions(-)
  create mode 100644 hw/pci-bridge/dec.c
  create mode 100644 hw/pci-bridge/dec.h
  delete mode 100644 hw/pci-host/dec.c
  delete mode 100644 hw/pci-host/dec.h
  create mode 100644 include/hw/nvram/openbios_firmware_abi.h
  delete mode 100644 include/hw/sparc/firmware_abi.h
 

Thanks.  Please give Blue a couple of days to comment.

Paolo

Re: [Qemu-devel] [PATCH 0/2] fixup some fallback from tree reorg

2013-05-30 Thread Michael S. Tsirkin

On Thu, May 30, 2013 at 03:04:39PM +0200, Paolo Bonzini wrote:
 Il 30/05/2013 11:30, Michael S. Tsirkin ha scritto:
  Here are some patches fixing up minor issues
  in the tree reorganization.
  
  Michael S. Tsirkin (2):
dec.c - move to pci-bridge
firmware_abi: move to include/hw/nvram/
  
   hw/pci-bridge/Makefile.objs  |   2 +
   hw/pci-bridge/dec.c  | 156 
  +++
   hw/pci-bridge/dec.h  |  10 ++
   hw/pci-host/Makefile.objs|   1 -
   hw/pci-host/dec.c| 156 
  ---
   hw/pci-host/dec.h|  10 --
   hw/sparc/sun4m.c |   2 +-
   hw/sparc64/sun4u.c   |   2 +-
   include/hw/nvram/openbios_firmware_abi.h |  73 +++
   include/hw/sparc/firmware_abi.h  |  73 ---
   10 files changed, 243 insertions(+), 242 deletions(-)
   create mode 100644 hw/pci-bridge/dec.c
   create mode 100644 hw/pci-bridge/dec.h
   delete mode 100644 hw/pci-host/dec.c
   delete mode 100644 hw/pci-host/dec.h
   create mode 100644 include/hw/nvram/openbios_firmware_abi.h
   delete mode 100644 include/hw/sparc/firmware_abi.h
  
 
 Thanks.  Please give Blue a couple of days to comment.
 
 Paolo

I always do this: pci tree tip is documented as
being rebased so I drop stuff from there
if there are nacks.

Re: [Qemu-devel] [PATCH v3 4/5] xilinx_devcfg: Zynq devcfg device model

2013-05-30 Thread Paolo Bonzini

Il 30/05/2013 09:15, Peter Crosthwaite ha scritto:
  Hi Peter, Should we be putting r-access-name here instead of 
  devcfg-regs?
 
  Yes, that's why I preferred to wrap the memory_region_init_io into an
  API that takes a RegisterInfo. :)
 ACK,
 
 You've convinced me :). Will be in v4 (pending outcome of discussion
 with Anthony RE decoding)

I would also have preferred (I was too terse in mentioning
.accepts.valid earlier so this is the less concise explanation) an API
that does a single memory_region_init_io region for a whole array of
registers.  Basically having a RegisterArrayInfo in addition to the
RegisterInfo.

Paolo

1 2 3 4 >

1 - 100 of 300 matches

Mail list logo