date:20150730

[Qemu-devel] [PATCH 26/53] hw/core: rebase sysbus_get_fw_dev_path() to g_strdup_printf()

2015-07-30 Thread Michael Roth

From: Laszlo Ersek ler...@redhat.com

This is done mainly for improving readability, and in preparation for the
next patch, but Markus pointed out another bonus for the string being
returned:

"No arbitrary length limit. Before the patch, it's 39 characters, and the
code breaks catastrophically when qdev_fw_name() is longer: the second
snprintf() is called with its first argument pointing beyond path[], and
its second argument underflowing to a huge size."

Cc: qemu-sta...@nongnu.org
Signed-off-by: Laszlo Ersek ler...@redhat.com
Tested-by: Marcel Apfelbaum mar...@redhat.com
Reviewed-by: Marcel Apfelbaum mar...@redhat.com
Reviewed-by: Markus Armbruster arm...@redhat.com
Reviewed-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
(cherry picked from commit 5ba03e2dd785362026917e4cc8a1fd2c64e8e62c)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 hw/core/sysbus.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/hw/core/sysbus.c b/hw/core/sysbus.c
index b53c351..92eced9 100644
--- a/hw/core/sysbus.c
+++ b/hw/core/sysbus.c
@@ -281,19 +281,15 @@ static void sysbus_dev_print(Monitor *mon, DeviceState *dev, int indent)
 static char *sysbus_get_fw_dev_path(DeviceState *dev)
 {
     SysBusDevice *s = SYS_BUS_DEVICE(dev);
-    char path[40];
-    int off;
-
-    off = snprintf(path, sizeof(path), "%s", qdev_fw_name(dev));
 
     if (s->num_mmio) {
-        snprintf(path + off, sizeof(path) - off, "@"TARGET_FMT_plx,
-                 s->mmio[0].addr);
-    } else if (s->num_pio) {
-        snprintf(path + off, sizeof(path) - off, "@i%04x", s->pio[0]);
+        return g_strdup_printf("%s@" TARGET_FMT_plx, qdev_fw_name(dev),
+                               s->mmio[0].addr);
     }
-
-    return g_strdup(path);
+    if (s->num_pio) {
+        return g_strdup_printf("%s@i%04x", qdev_fw_name(dev), s->pio[0]);
+    }
+    return g_strdup(qdev_fw_name(dev));
 }
 
 void sysbus_add_io(SysBusDevice *dev, hwaddr addr,
-- 
1.9.1
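
The failure mode quoted in the commit message above can be reproduced in plain C without GLib. The sketch below is an illustration only: old_remaining() and fw_path_alloc() are hypothetical names, not QEMU functions, and fw_path_alloc() merely stands in for g_strdup_printf() by measuring the string first and then allocating exactly that much.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: the size the old code would have passed to its
 * second snprintf(), i.e. "sizeof(path) - off" evaluated as a size_t. */
static size_t old_remaining(size_t bufsize, const char *name)
{
    char scratch[1];
    /* snprintf() returns the length it *would* have written, which can
     * be larger than the buffer it was given. */
    int off = snprintf(scratch, sizeof(scratch), "%s", name);

    return bufsize - (size_t)off;   /* wraps around when off > bufsize */
}

/* Hypothetical stand-in for g_strdup_printf("%s@%lx", name, addr):
 * measure first, then allocate, so there is no fixed length limit. */
static char *fw_path_alloc(const char *name, unsigned long addr)
{
    int len = snprintf(NULL, 0, "%s@%lx", name, addr);
    char *out;

    if (len < 0) {
        return NULL;
    }
    out = malloc((size_t)len + 1);
    if (!out) {
        return NULL;
    }
    snprintf(out, (size_t)len + 1, "%s@%lx", name, addr);
    return out;
}
```

With a 40-byte buffer and a name longer than the buffer, old_remaining() wraps around to a value near SIZE_MAX, which matches the "huge size" the message describes. The allocating variant has no fixed limit to exceed; the caller frees the result.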

[Qemu-devel] [PATCH 04/53] nbd/trivial: fix type cast for ioctl

2015-07-30 Thread Michael Roth

From: Bogdan Purcareata bogdan.purcare...@freescale.com

This fixes ioctl behavior on powerpc e6500 platforms with 64bit kernel and 32bit
userspace. The current type cast has no effect there and the value passed to the
kernel is still 0. Probably an issue related to the compiler, since I'm assuming
the same configuration works on a similar setup on x86.

Also ensure consistency with previous type cast in TRACE message.

Signed-off-by: Bogdan Purcareata bogdan.purcare...@freescale.com
Message-Id: 1428058914-32050-1-git-send-email-bogdan.purcare...@freescale.com
Cc: qemu-sta...@nongnu.org
[Fix parens as noticed by Michael. - Paolo]
Signed-off-by: Paolo Bonzini pbonz...@redhat.com

(cherry picked from commit d064d9f381b00538e41f14104b88a1ae85d78865)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 nbd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/nbd.c b/nbd.c
index 91b7d56..cb1b9bb 100644
--- a/nbd.c
+++ b/nbd.c
@@ -681,7 +681,7 @@ int nbd_init(int fd, int csock, uint32_t flags, off_t size)
 
     TRACE("Setting size to %zd block(s)", (size_t)(size / BDRV_SECTOR_SIZE));
 
-    if (ioctl(fd, NBD_SET_SIZE_BLOCKS, size / (size_t)BDRV_SECTOR_SIZE) < 0) {
+    if (ioctl(fd, NBD_SET_SIZE_BLOCKS, (size_t)(size / BDRV_SECTOR_SIZE)) < 0) {
         int serrno = errno;
         LOG("Failed setting size (in blocks)");
         return -serrno;
-- 
1.9.1
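
The commit message above only guesses at the cause. One plausible reading, offered here as an assumption rather than a confirmed diagnosis, is that the two casts produce argument expressions of different widths: with a 64-bit off_t and a 32-bit size_t, casting only the divisor leaves a 64-bit result, while casting the whole quotient yields a 32-bit one. The sketch below models the two types with fixed-width typedefs (model_off_t and model_size_t are hypothetical names) so the effect is visible on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512ULL          /* stands in for BDRV_SECTOR_SIZE */

typedef int64_t  model_off_t;       /* off_t with 64-bit file offsets */
typedef uint32_t model_size_t;      /* size_t in a 32-bit userspace   */

/* Width in bytes of the argument expression before the patch:
 * only the divisor is cast, so the quotient keeps the 64-bit type. */
static size_t arg_width_before(model_off_t size)
{
    return sizeof(size / (model_size_t)SECTOR_SIZE);
}

/* Width in bytes after the patch: the whole quotient is cast. */
static size_t arg_width_after(model_off_t size)
{
    return sizeof((model_size_t)(size / SECTOR_SIZE));
}

/* The block count actually computed by the patched expression. */
static model_size_t blocks_after(model_off_t size)
{
    return (model_size_t)(size / SECTOR_SIZE);
}
```

Because ioctl() takes its third argument through a variadic parameter list, the callee has no way to correct a width mismatch, which would be consistent with the kernel seeing 0 on a big-endian 32-bit userspace.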

[Qemu-devel] [PATCH 05/53] vmdk: Fix next_cluster_sector for compressed write

2015-07-30 Thread Michael Roth

From: Fam Zheng f...@redhat.com

This fixes the bug introduced by commit c6ac36e (vmdk: Optimize cluster
allocation).

Sometimes, write_len could be larger than cluster size, because it
contains both data and marker.  We must advance next_cluster_sector in
this case, otherwise the image gets corrupted.

Cc: qemu-sta...@nongnu.org
Reported-by: Antoni Villalonga qemu-l...@friki.cat
Signed-off-by: Fam Zheng f...@redhat.com
Reviewed-by: Max Reitz mre...@redhat.com
Signed-off-by: Kevin Wolf kw...@redhat.com
(cherry picked from commit 5e82a31eb967db135fc4e688b134fb0972d62de3)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 block/vmdk.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 8410a15..bb093dd 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1302,6 +1302,8 @@ static int vmdk_write_extent(VmdkExtent *extent, int64_t cluster_offset,
     uLongf buf_len;
     const uint8_t *write_buf = buf;
     int write_len = nb_sectors * 512;
+    int64_t write_offset;
+    int64_t write_end_sector;
 
     if (extent->compressed) {
         if (!extent->has_marker) {
@@ -1320,10 +1322,14 @@ static int vmdk_write_extent(VmdkExtent *extent, int64_t cluster_offset,
         write_buf = (uint8_t *)data;
         write_len = buf_len + sizeof(VmdkGrainMarker);
     }
-    ret = bdrv_pwrite(extent->file,
-                        cluster_offset + offset_in_cluster,
-                        write_buf,
-                        write_len);
+    write_offset = cluster_offset + offset_in_cluster,
+    ret = bdrv_pwrite(extent->file, write_offset, write_buf, write_len);
+
+    write_end_sector = DIV_ROUND_UP(write_offset + write_len, BDRV_SECTOR_SIZE);
+
+    extent->next_cluster_sector = MAX(extent->next_cluster_sector,
+                                      write_end_sector);
+
     if (ret != write_len) {
         ret = ret < 0 ? ret : -EIO;
         goto out;
-- 
1.9.1
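
The bookkeeping added by the patch above is small enough to model in isolation. The following is a minimal sketch, assuming 512-byte sectors; advance_next_cluster_sector() is a hypothetical helper that mirrors the DIV_ROUND_UP/MAX pair in the diff, not a function in block/vmdk.c.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512

/* Round n up to the next multiple of d, expressed in units of d. */
static int64_t div_round_up(int64_t n, int64_t d)
{
    return (n + d - 1) / d;
}

/* Hypothetical model of the patched logic: the next free sector never
 * moves backwards, and moves forward whenever a write ends beyond it. */
static int64_t advance_next_cluster_sector(int64_t next_cluster_sector,
                                           int64_t write_offset,
                                           int64_t write_len)
{
    int64_t write_end_sector = div_round_up(write_offset + write_len,
                                            SECTOR_SIZE);

    return write_end_sector > next_cluster_sector ? write_end_sector
                                                  : next_cluster_sector;
}
```

For example, take a 64 KiB cluster starting at byte 65536 with the next free sector recorded as 256. A write of 65536 data bytes plus an illustrative 12-byte marker ends in sector 257, so the next allocation must start there; a short compressed write that stays inside the cluster leaves the value at 256.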

[Qemu-devel] [PATCH for-2.4 0/3] scsi: fixes for failed requests

2015-07-30 Thread Stefan Hajnoczi

When requests fail, the error policy (-drive rerror=,werror=) determines what
happens.  The 'stop' policy pauses the guest and waits for the administrator to
resolve the storage problem.  It is possible to live migrate during this time,
and the failed requests can be restarted on the destination host.

Two bugs:
1. Segfault due to missing sgs mapping when loading migrated failed requests.
2. Incorrect error action due to broken is_read logic.

I also noticed that the unaligned WRITE SAME test case in
tests/virtio-scsi-test.c is broken.  I've included a fix for that too.

Stefan Hajnoczi (3):
  virtio-scsi: use virtqueue_map_sg() when loading requests
  scsi-disk: fix cmd.mode field typo
  tests: virtio-scsi: clear unit attention after reset

 hw/scsi/scsi-disk.c  |  2 +-
 hw/scsi/virtio-scsi.c|  5 +++
 tests/virtio-scsi-test.c | 90 +---
 3 files changed, 60 insertions(+), 37 deletions(-)

-- 
2.4.3

[Qemu-devel] [PATCH for-2.4 1/3] virtio-scsi: use virtqueue_map_sg() when loading requests

2015-07-30 Thread Stefan Hajnoczi

The VirtQueueElement struct is serialized during migration but the
in_sg[]/out_sg[] iovec arrays are not usable on the destination host
because the pointers are meaningless.

Use virtqueue_map_sg() to refresh in_sg[]/out_sg[] to valid pointers
based on in_addr[]/out_addr[] hwaddrs.

Cc: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/scsi/virtio-scsi.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 811c3da..a8bb1c6 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -217,6 +217,11 @@ static void *virtio_scsi_load_request(QEMUFile *f, SCSIRequest *sreq)
     assert(req->elem.in_num <= ARRAY_SIZE(req->elem.in_sg));
     assert(req->elem.out_num <= ARRAY_SIZE(req->elem.out_sg));
 
+    virtqueue_map_sg(req->elem.in_sg, req->elem.in_addr,
+                     req->elem.in_num, 1);
+    virtqueue_map_sg(req->elem.out_sg, req->elem.out_addr,
+                     req->elem.out_num, 0);
+
     if (virtio_scsi_parse_req(req, sizeof(VirtIOSCSICmdReq) + vs->cdb_size,
                               sizeof(VirtIOSCSICmdResp) + vs->sense_size) < 0) {
         error_report("invalid SCSI request migration data");
-- 
2.4.3
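
The idea behind the patch above, that serialized host pointers are stale after migration and must be re-derived from guest addresses, can be shown with a toy model. This is a sketch under loud assumptions: guest RAM is modelled as one flat host buffer, and struct sg_entry and remap_sg() are hypothetical; the real virtqueue_map_sg() goes through QEMU's memory API and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scatter-gather entry, shaped like struct iovec. */
struct sg_entry {
    void  *base;    /* host pointer: meaningless after migration */
    size_t len;     /* length: still valid after migration       */
};

/* Recompute every host pointer from its guest address. Lengths are
 * kept as loaded; only the pointers are refreshed. */
static void remap_sg(struct sg_entry *sg, const uint64_t *addr, size_t num,
                     uint8_t *guest_ram, size_t ram_size)
{
    size_t i;

    for (i = 0; i < num; i++) {
        /* Reject entries that would fall outside the modelled RAM. */
        assert(addr[i] <= ram_size && sg[i].len <= ram_size - addr[i]);
        sg[i].base = guest_ram + addr[i];
    }
}
```

The point of the design is that the guest addresses are the stable identity of a buffer; host pointers are a per-process cache of that identity and have to be rebuilt on the destination.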

[Qemu-devel] [PATCH for-2.4 3/3] tests: virtio-scsi: clear unit attention after reset

2015-07-30 Thread Stefan Hajnoczi

The unit attention after reset (power on) prevents normal commands from
running.  The unaligned WRITE SAME test never executed its command!

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 tests/virtio-scsi-test.c | 90 +---
 1 file changed, 54 insertions(+), 36 deletions(-)

diff --git a/tests/virtio-scsi-test.c b/tests/virtio-scsi-test.c
index 11ccdd6..6e91ca9 100644
--- a/tests/virtio-scsi-test.c
+++ b/tests/virtio-scsi-test.c
@@ -13,6 +13,7 @@
 #include "libqtest.h"
 #include "qemu/osdep.h"
 #include <stdio.h>
+#include "block/scsi.h"
 #include "libqos/virtio.h"
 #include "libqos/virtio-pci.h"
 #include "libqos/pci-pc.h"
@@ -71,40 +72,6 @@ static void qvirtio_scsi_stop(void)
     qtest_end();
 }
 
-static QVirtIOSCSI *qvirtio_scsi_pci_init(int slot)
-{
-    QVirtIOSCSI *vs;
-    QVirtioPCIDevice *dev;
-    void *addr;
-    int i;
-
-    vs = g_new0(QVirtIOSCSI, 1);
-    vs->alloc = pc_alloc_init();
-    vs->bus = qpci_init_pc();
-
-    dev = qvirtio_pci_device_find(vs->bus, QVIRTIO_SCSI_DEVICE_ID);
-    vs->dev = (QVirtioDevice *)dev;
-    g_assert(dev != NULL);
-    g_assert_cmphex(vs->dev->device_type, ==, QVIRTIO_SCSI_DEVICE_ID);
-
-    qvirtio_pci_device_enable(dev);
-    qvirtio_reset(&qvirtio_pci, vs->dev);
-    qvirtio_set_acknowledge(&qvirtio_pci, vs->dev);
-    qvirtio_set_driver(&qvirtio_pci, vs->dev);
-
-    addr = dev->addr + QVIRTIO_PCI_DEVICE_SPECIFIC_NO_MSIX;
-    vs->num_queues = qvirtio_config_readl(&qvirtio_pci, vs->dev,
-                                          (uint64_t)(uintptr_t)addr);
-
-    g_assert_cmpint(vs->num_queues, <, MAX_NUM_QUEUES);
-
-    for (i = 0; i < vs->num_queues + 2; i++) {
-        vs->vq[i] = qvirtqueue_setup(&qvirtio_pci, vs->dev, vs->alloc, i);
-    }
-
-    return vs;
-}
-
 static void qvirtio_scsi_pci_free(QVirtIOSCSI *vs)
 {
     int i;
@@ -134,7 +101,8 @@ static uint64_t qvirtio_scsi_alloc(QVirtIOSCSI *vs, size_t alloc_size,
 static uint8_t virtio_scsi_do_command(QVirtIOSCSI *vs, const uint8_t *cdb,
                                       const uint8_t *data_in,
                                       size_t data_in_len,
-                                      uint8_t *data_out, size_t data_out_len)
+                                      uint8_t *data_out, size_t data_out_len,
+                                      QVirtIOSCSICmdResp *resp_out)
 {
     QVirtQueue *vq;
     QVirtIOSCSICmdReq req = { { 0 } };
@@ -174,6 +142,10 @@ static uint8_t virtio_scsi_do_command(QVirtIOSCSI *vs, const uint8_t *cdb,
 
     response = readb(resp_addr + offsetof(QVirtIOSCSICmdResp, response));
 
+    if (resp_out) {
+        memread(resp_addr, resp_out, sizeof(*resp_out));
+    }
+
     guest_free(vs->alloc, req_addr);
     guest_free(vs->alloc, resp_addr);
     guest_free(vs->alloc, data_in_addr);
@@ -181,6 +153,52 @@ static uint8_t virtio_scsi_do_command(QVirtIOSCSI *vs, const uint8_t *cdb,
     return response;
 }
 
+static QVirtIOSCSI *qvirtio_scsi_pci_init(int slot)
+{
+    const uint8_t test_unit_ready_cdb[CDB_SIZE] = {};
+    QVirtIOSCSI *vs;
+    QVirtioPCIDevice *dev;
+    QVirtIOSCSICmdResp resp;
+    void *addr;
+    int i;
+
+    vs = g_new0(QVirtIOSCSI, 1);
+    vs->alloc = pc_alloc_init();
+    vs->bus = qpci_init_pc();
+
+    dev = qvirtio_pci_device_find(vs->bus, QVIRTIO_SCSI_DEVICE_ID);
+    vs->dev = (QVirtioDevice *)dev;
+    g_assert(dev != NULL);
+    g_assert_cmphex(vs->dev->device_type, ==, QVIRTIO_SCSI_DEVICE_ID);
+
+    qvirtio_pci_device_enable(dev);
+    qvirtio_reset(&qvirtio_pci, vs->dev);
+    qvirtio_set_acknowledge(&qvirtio_pci, vs->dev);
+    qvirtio_set_driver(&qvirtio_pci, vs->dev);
+
+    addr = dev->addr + QVIRTIO_PCI_DEVICE_SPECIFIC_NO_MSIX;
+    vs->num_queues = qvirtio_config_readl(&qvirtio_pci, vs->dev,
+                                          (uint64_t)(uintptr_t)addr);
+
+    g_assert_cmpint(vs->num_queues, <, MAX_NUM_QUEUES);
+
+    for (i = 0; i < vs->num_queues + 2; i++) {
+        vs->vq[i] = qvirtqueue_setup(&qvirtio_pci, vs->dev, vs->alloc, i);
+    }
+
+    /* Clear the POWER ON OCCURRED unit attention */
+    g_assert_cmpint(virtio_scsi_do_command(vs, test_unit_ready_cdb,
+                                           NULL, 0, NULL, 0, &resp),
+                    ==, 0);
+    g_assert_cmpint(resp.status, ==, CHECK_CONDITION);
+    g_assert_cmpint(resp.sense[0], ==, 0x70); /* Fixed format sense buffer */
+    g_assert_cmpint(resp.sense[2], ==, UNIT_ATTENTION);
+    g_assert_cmpint(resp.sense[12], ==, 0x29); /* POWER ON */
+    g_assert_cmpint(resp.sense[13], ==, 0x00);
+
+    return vs;
+}
+
 /* Tests only initialization so far. TODO: Replace with functional tests */
 static void pci_nop(void)
 {
@@ -231,7 +249,7 @@ static void test_unaligned_write_same(void)
     vs = qvirtio_scsi_pci_init(PCI_SLOT);
 
     g_assert_cmphex(0, ==,
-        virtio_scsi_do_command(vs, write_same_cdb, NULL, 0, buf, 512));
+        virtio_scsi_do_command(vs, write_same_cdb, NULL, 0, buf, 512, NULL));
 
     qvirtio_scsi_pci_free(vs);

Re: [Qemu-devel] help

2015-07-30 Thread Stefan Hajnoczi

On Thu, Jul 30, 2015 at 11:17 AM, Serigne Baytir DIENG
yabri...@gmail.com wrote:
> I am new into this qemu, but i have a pretty good understanding of how it
> works. What i am trying to do is to attach a new pci device into the virtio
> pci bus. I don't know much what files need to be changed or how could i
> achieved that.

I don't understand what you are trying to do.  Do you want to
implement a new type of virtio device?

That is largely independent of PCI, although you'll probably want to
hook up the new device following the same pattern as the others in
hw/virtio/virtio-pci.c.

Take a look at the existing virtio device types (blk, net, scsi, rng,
9p, etc) for examples of how to do it.

Stefan

[Qemu-devel] [PATCH 10/53] target-arm: Avoid buffer overrun on UNPREDICTABLE ldrd/strd

2015-07-30 Thread Michael Roth

From: Peter Maydell peter.mayd...@linaro.org

A LDRD or STRD where rd is not an even number is UNPREDICTABLE.
We were letting this fall through, which is OK unless rd is 15,
in which case we would attempt to do a load_reg or store_reg
to a nonexistent r16 for the second half of the double-word.
Catch the odd-numbered-rd cases and UNDEF them instead.

To do this we rearrange the structure of the code a little
so we can put the UNDEF catches at the top before we've
allocated TCG temporaries.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Maydell peter.mayd...@linaro.org
Message-id: 1431348973-21315-1-git-send-email-peter.mayd...@linaro.org
(cherry picked from commit 3960c336ad96c2183549c8bf32bbff93ecda7ea4)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 target-arm/translate.c | 56 --
 1 file changed, 32 insertions(+), 24 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index 9116529..f8f72be 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -8423,34 +8423,30 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                 }
             } else {
                 int address_offset;
-                int load;
+                bool load = insn & (1 << 20);
+                bool doubleword = false;
                 /* Misc load/store */
                 rn = (insn >> 16) & 0xf;
                 rd = (insn >> 12) & 0xf;
+
+                if (!load && (sh & 2)) {
+                    /* doubleword */
+                    ARCH(5TE);
+                    if (rd & 1) {
+                        /* UNPREDICTABLE; we choose to UNDEF */
+                        goto illegal_op;
+                    }
+                    load = (sh & 1) == 0;
+                    doubleword = true;
+                }
+
                 addr = load_reg(s, rn);
                 if (insn & (1 << 24))
                     gen_add_datah_offset(s, insn, 0, addr);
                 address_offset = 0;
-                if (insn & (1 << 20)) {
-                    /* load */
-                    tmp = tcg_temp_new_i32();
-                    switch(sh) {
-                    case 1:
-                        gen_aa32_ld16u(tmp, addr, get_mem_index(s));
-                        break;
-                    case 2:
-                        gen_aa32_ld8s(tmp, addr, get_mem_index(s));
-                        break;
-                    default:
-                    case 3:
-                        gen_aa32_ld16s(tmp, addr, get_mem_index(s));
-                        break;
-                    }
-                    load = 1;
-                } else if (sh & 2) {
-                    ARCH(5TE);
-                    /* doubleword */
-                    if (sh & 1) {
+
+                if (doubleword) {
+                    if (!load) {
                         /* store */
                         tmp = load_reg(s, rd);
                         gen_aa32_st32(tmp, addr, get_mem_index(s));
@@ -8459,7 +8455,6 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                         tmp = load_reg(s, rd + 1);
                         gen_aa32_st32(tmp, addr, get_mem_index(s));
                         tcg_temp_free_i32(tmp);
-                        load = 0;
                     } else {
                         /* load */
                         tmp = tcg_temp_new_i32();
@@ -8469,15 +8464,28 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                         tmp = tcg_temp_new_i32();
                         gen_aa32_ld32u(tmp, addr, get_mem_index(s));
                         rd++;
-                        load = 1;
                     }
                     address_offset = -4;
+                } else if (load) {
+                    /* load */
+                    tmp = tcg_temp_new_i32();
+                    switch (sh) {
+                    case 1:
+                        gen_aa32_ld16u(tmp, addr, get_mem_index(s));
+                        break;
+                    case 2:
+                        gen_aa32_ld8s(tmp, addr, get_mem_index(s));
+                        break;
+                    default:
+                    case 3:
+                        gen_aa32_ld16s(tmp, addr, get_mem_index(s));
+                        break;
+                    }
                 } else {
                     /* store */
                     tmp = load_reg(s, rd);
                     gen_aa32_st16(tmp, addr, get_mem_index(s));
                     tcg_temp_free_i32(tmp);
-                    load = 0;
                 }
                 /* Perform base writeback before the loaded value to
                    ensure correct behavior with overlapping index registers.
-- 
1.9.1
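
The reordered decode in the patch above can be read as a small pure function: decide "load", "doubleword" and "UNDEF" before touching any state. The sketch below restates that logic outside the translator. struct misc_ls_decode and decode_misc_load_store() are hypothetical names, and the position of the sh field (bits 6:5) is an assumption of this sketch, since the diff context does not show where sh is extracted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result of decoding a "misc load/store" instruction. */
struct misc_ls_decode {
    bool undef;        /* treat as an undefined instruction          */
    bool load;         /* true for loads, false for stores           */
    bool doubleword;   /* true for the two-register LDRD/STRD forms  */
};

static struct misc_ls_decode decode_misc_load_store(uint32_t insn)
{
    struct misc_ls_decode d = { false, false, false };
    unsigned sh = (insn >> 5) & 3;      /* assumed field position */
    unsigned rd = (insn >> 12) & 0xf;

    d.load = (insn & (1u << 20)) != 0;
    if (!d.load && (sh & 2)) {
        /* Doubleword form: the second register is rd + 1, so an odd
         * rd (including 15, whose partner would be r16) is rejected
         * before anything else happens. */
        if (rd & 1) {
            d.undef = true;
            return d;
        }
        d.load = (sh & 1) == 0;
        d.doubleword = true;
    }
    return d;
}
```

Doing the rejection first is what lets the real code bail out before any TCG temporaries have been allocated, as the commit message notes.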

[Qemu-devel] [PATCH 39/53] spapr_vty: lookup should only return valid VTY objects

2015-07-30 Thread Michael Roth

From: David Gibson da...@gibson.dropbear.id.au

If a guest passes the reg property of a valid VIO object that is not a VTY
to either H_GET_TERM_CHAR or H_PUT_TERM_CHAR, QEMU hits a dynamic cast
assertion and aborts.

PAPR+ says "Hypervisor checks the termno parameter for validity against the
Vterm IOA unit addresses assigned to the partition, else return H_Parameter."

This patch adds a type check to ensure vty_lookup() either returns a pointer
to a valid VTY object or NULL.  H_GET_TERM_CHAR and H_PUT_TERM_CHAR will
now return H_PARAMETER to the guest instead of crashing.

The patch has no effect on the reg == 0 hack used to implement the RTAS call
display-character.

Signed-off-by: Greg Kurz gk...@linux.vnet.ibm.com
Signed-off-by: David Gibson da...@gibson.dropbear.id.au
Signed-off-by: Alexander Graf ag...@suse.de
(cherry picked from commit 0f888bfaddfc5f55b0d82cde2e1164658a672375)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 hw/char/spapr_vty.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/char/spapr_vty.c b/hw/char/spapr_vty.c
index 4e464bd..c7f824e 100644
--- a/hw/char/spapr_vty.c
+++ b/hw/char/spapr_vty.c
@@ -228,6 +228,10 @@ VIOsPAPRDevice *vty_lookup(sPAPREnvironment *spapr, target_ulong reg)
         return spapr_vty_get_default(spapr->vio_bus);
     }
 
+    if (!object_dynamic_cast(OBJECT(sdev), TYPE_VIO_SPAPR_VTY_DEVICE)) {
+        return NULL;
+    }
+
     return sdev;
 }
 
-- 
1.9.1

[Qemu-devel] [PATCH 28/53] virtio-ccw: complete handling of guest-initiated resets

2015-07-30 Thread Michael Roth

From: Cornelia Huck cornelia.h...@de.ibm.com

For a guest-initiated reset, we need to not only reset the virtio device,
but also reset the VirtioCcwDevice into a clean state. This includes
resetting the indicators, or else a guest will not be able to e.g.
switch from classic interrupts to adapter interrupts.

Split off this routine into a new function virtio_ccw_reset_virtio()
to make the distinction between resetting the virtio-related devices
and the base subchannel device clear.

CC: qemu-sta...@nongnu.org
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com
(cherry picked from commit fa8b0ca5d1b69975b715a259d3586cadf7a5280f)
Conflicts:
hw/s390x/virtio-ccw.c

*removed context dependency on 0b352fd

Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 hw/s390x/virtio-ccw.c | 39 ++-
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index d32ecaf..d8fde77 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -295,6 +295,25 @@ static int virtio_ccw_set_vqs(SubchDev *sch, uint64_t addr, uint32_t align,
     return 0;
 }
 
+static void virtio_ccw_reset_virtio(VirtioCcwDevice *dev, VirtIODevice *vdev)
+{
+    virtio_ccw_stop_ioeventfd(dev);
+    virtio_reset(vdev);
+    if (dev->indicators) {
+        release_indicator(&dev->routes.adapter, dev->indicators);
+        dev->indicators = NULL;
+    }
+    if (dev->indicators2) {
+        release_indicator(&dev->routes.adapter, dev->indicators2);
+        dev->indicators2 = NULL;
+    }
+    if (dev->summary_indicator) {
+        release_indicator(&dev->routes.adapter, dev->summary_indicator);
+        dev->summary_indicator = NULL;
+    }
+    dev->sch->thinint_active = false;
+}
+
 static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 {
     int ret;
@@ -351,8 +370,7 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
         }
         break;
     case CCW_CMD_VDEV_RESET:
-        virtio_ccw_stop_ioeventfd(dev);
-        virtio_reset(vdev);
+        virtio_ccw_reset_virtio(dev, vdev);
         ret = 0;
         break;
     case CCW_CMD_READ_FEAT:
@@ -480,7 +498,7 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
                 }
                 virtio_set_status(vdev, status);
                 if (vdev->status == 0) {
-                    virtio_reset(vdev);
+                    virtio_ccw_reset_virtio(dev, vdev);
                 }
                 if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
                     virtio_ccw_start_ioeventfd(dev);
@@ -1098,21 +1116,8 @@ static void virtio_ccw_reset(DeviceState *d)
     VirtioCcwDevice *dev = VIRTIO_CCW_DEVICE(d);
     VirtIODevice *vdev = virtio_bus_get_device(&dev->bus);
 
-    virtio_ccw_stop_ioeventfd(dev);
-    virtio_reset(vdev);
+    virtio_ccw_reset_virtio(dev, vdev);
     css_reset_sch(dev->sch);
-    if (dev->indicators) {
-        release_indicator(&dev->routes.adapter, dev->indicators);
-        dev->indicators = NULL;
-    }
-    if (dev->indicators2) {
-        release_indicator(&dev->routes.adapter, dev->indicators2);
-        dev->indicators2 = NULL;
-    }
-    if (dev->summary_indicator) {
-        release_indicator(&dev->routes.adapter, dev->summary_indicator);
-        dev->summary_indicator = NULL;
-    }
 }
 
 static void virtio_ccw_vmstate_change(DeviceState *d, bool running)
-- 
1.9.1

[Qemu-devel] [PATCH 23/53] sdl2: fix crash in handle_windowevent() when restoring the screen size

2015-07-30 Thread Michael Roth

From: Alberto Garcia be...@igalia.com

The Ctrl-Alt-u keyboard shortcut restores the screen to its original
size. In the SDL2 UI this is done by destroying the window and
creating a new one. The old window emits SDL_WINDOWEVENT_HIDDEN when
it's destroyed, but trying to call SDL_GetWindowFromID() from that
event's window ID returns a null pointer. handle_windowevent() assumes
that the pointer is never null so it results in a crash.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Alberto Garcia be...@igalia.com
Signed-off-by: Gerd Hoffmann kra...@redhat.com
(cherry picked from commit 08d49df0dbaacc220a099dbfb644e1dc0eda57be)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 ui/sdl2.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/ui/sdl2.c b/ui/sdl2.c
index 60e3c3b..f10c6a4 100644
--- a/ui/sdl2.c
+++ b/ui/sdl2.c
@@ -511,6 +511,10 @@ static void handle_windowevent(SDL_Event *ev)
 {
     struct sdl2_console *scon = get_scon_from_window(ev->window.windowID);
 
+    if (!scon) {
+        return;
+    }
+
     switch (ev->window.event) {
     case SDL_WINDOWEVENT_RESIZED:
         {
-- 
1.9.1

[Qemu-devel] [PATCH 31/53] mirror: Do zero write on target if sectors not allocated

2015-07-30 Thread Michael Roth

From: Fam Zheng f...@redhat.com

If guest discards a source cluster, mirroring with bdrv_aio_readv is overkill.
Some protocols do zero upon discard, where it's best to use
bdrv_aio_write_zeroes, otherwise, bdrv_aio_discard will be enough.

Signed-off-by: Fam Zheng f...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
(cherry picked from commit dcfb3beb5130694b76b57de109619fcbf9c7e5b5)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 block/mirror.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 1814523..bd079a4 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -168,6 +168,8 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
     int64_t end, sector_num, next_chunk, next_sector, hbitmap_next_sector;
     uint64_t delay_ns = 0;
     MirrorOp *op;
+    int pnum;
+    int64_t ret;
 
     s->sector_num = hbitmap_iter_next(&s->hbi);
     if (s->sector_num < 0) {
@@ -296,8 +298,22 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
     s->in_flight++;
     s->sectors_in_flight += nb_sectors;
     trace_mirror_one_iteration(s, sector_num, nb_sectors);
-    bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
-                   mirror_read_complete, op);
+
+    ret = bdrv_get_block_status_above(source, NULL, sector_num,
+                                      nb_sectors, &pnum);
+    if (ret < 0 || pnum < nb_sectors ||
+            (ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) {
+        bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
+                       mirror_read_complete, op);
+    } else if (ret & BDRV_BLOCK_ZERO) {
+        bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
+                              s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
+                              mirror_write_complete, op);
+    } else {
+        assert(!(ret & BDRV_BLOCK_DATA));
+        bdrv_aio_discard(s->target, sector_num, op->nb_sectors,
+                         mirror_write_complete, op);
+    }
     return delay_ns;
 }
 
-- 
1.9.1
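
The three-way choice made by the patch above (read and copy, write zeroes, or discard) is easier to check when pulled out into a pure function. This is a minimal sketch: choose_mirror_action() and the mirror_action enum are hypothetical, and the two flag values are illustrative stand-ins for BDRV_BLOCK_DATA and BDRV_BLOCK_ZERO.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_DATA 0x01     /* illustrative flag values */
#define BLOCK_ZERO 0x02

typedef enum {
    MIRROR_READ,            /* fall back to reading the source      */
    MIRROR_WRITE_ZEROES,    /* source range reads as zeroes         */
    MIRROR_DISCARD          /* source range is unallocated, no data */
} mirror_action;

/* status: negative on error, otherwise a set of BLOCK_* flags.
 * pnum:   number of sectors the status applies to.              */
static mirror_action choose_mirror_action(int64_t status, int pnum,
                                          int nb_sectors)
{
    if (status < 0 || pnum < nb_sectors ||
        ((status & BLOCK_DATA) && !(status & BLOCK_ZERO))) {
        return MIRROR_READ;
    }
    if (status & BLOCK_ZERO) {
        return MIRROR_WRITE_ZEROES;
    }
    return MIRROR_DISCARD;
}
```

Note that any uncertainty (an error, or a status that covers only part of the request) falls back to the plain read, so the optimisation applies only when the whole range is known to be zero or unallocated.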

[Qemu-devel] [PATCH 40/53] target-ppc: fix hugepage support when using memory-backend-file

2015-07-30 Thread Michael Roth

Current PPC code relies on -mem-path being used in order for
hugepage support to be detected. With the introduction of
MemoryBackendFile we can now handle this via:
  -object memory-backend-file,mem-path=...,id=hugemem0 \
  -numa node,id=mem0,memdev=hugemem0

Management tools like libvirt treat the 2 approaches as
interchangeable in some cases, which can lead to user-visible
regressions even for previously supported guest configurations.

Fix these by also iterating through any configured memory
backends that may be backed by hugepages.

Since the old code assumed hugepages always backed the entirety
of guest memory, play it safe and pick the minimum across the
max page sizes for all backends, even ones that aren't backed
by hugepages.

Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
Reviewed-by: David Gibson da...@gibson.dropbear.id.au
Signed-off-by: Alexander Graf ag...@suse.de
(cherry picked from commit 2d103aae876518a91636ad6f4a4d866269c0d953)
Conflicts:
target-ppc/kvm.c

*remove context dependency on header includes not in 2.3.0

Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 target-ppc/kvm.c | 57 ++--
 1 file changed, 51 insertions(+), 6 deletions(-)

diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 12328a4..84ae447 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -39,6 +39,7 @@
 #include "sysemu/watchdog.h"
 #include "trace.h"
 #include "exec/gdbstub.h"
+#include "sysemu/hostmem.h"
 
 //#define DEBUG_KVM
 
@@ -302,16 +303,11 @@ static void kvm_get_smmu_info(PowerPCCPU *cpu, struct kvm_ppc_smmu_info *info)
     kvm_get_fallback_smmu_info(cpu, info);
 }
 
-static long getrampagesize(void)
+static long gethugepagesize(const char *mem_path)
 {
     struct statfs fs;
     int ret;
 
-    if (!mem_path) {
-        /* guest RAM is backed by normal anonymous pages */
-        return getpagesize();
-    }
-
     do {
         ret = statfs(mem_path, &fs);
     } while (ret != 0 && errno == EINTR);
@@ -333,6 +329,55 @@ static long getrampagesize(void)
     return fs.f_bsize;
 }
 
+static int find_max_supported_pagesize(Object *obj, void *opaque)
+{
+    char *mem_path;
+    long *hpsize_min = opaque;
+
+    if (object_dynamic_cast(obj, TYPE_MEMORY_BACKEND)) {
+        mem_path = object_property_get_str(obj, "mem-path", NULL);
+        if (mem_path) {
+            long hpsize = gethugepagesize(mem_path);
+            if (hpsize < *hpsize_min) {
+                *hpsize_min = hpsize;
+            }
+        } else {
+            *hpsize_min = getpagesize();
+        }
+    }
+
+    return 0;
+}
+
+static long getrampagesize(void)
+{
+    long hpsize = LONG_MAX;
+    Object *memdev_root;
+
+    if (mem_path) {
+        return gethugepagesize(mem_path);
+    }
+
+    /* it's possible we have memory-backend objects with
+     * hugepage-backed RAM. these may get mapped into system
+     * address space via -numa parameters or memory hotplug
+     * hooks. we want to take these into account, but we
+     * also want to make sure these supported hugepage
+     * sizes are applicable across the entire range of memory
+     * we may boot from, so we take the min across all
+     * backends, and assume normal pages in cases where a
+     * backend isn't backed by hugepages.
+     */
+    memdev_root = object_resolve_path("/objects", NULL);
+    if (!memdev_root) {
+        return getpagesize();
+    }
+
+    object_child_foreach(memdev_root, find_max_supported_pagesize, &hpsize);
+
+    return (hpsize == LONG_MAX) ? getpagesize() : hpsize;
+}
+
 static bool kvm_valid_page_size(uint32_t flags, long rampgsize, uint32_t shift)
 {
     if (!(flags & KVM_PPC_PAGE_SIZES_REAL)) {
-- 
1.9.1
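
The "minimum across all backends" rule described in the commit message above can be sketched without any QOM machinery. This is a simplified model, not the patch itself: min_backend_pagesize() is a hypothetical helper, each backend is reduced to a single number (its hugepage size, or 0 if it is not hugepage-backed), and the base page size is passed in rather than queried with getpagesize().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* hugepage_sizes[i] is the hugepage size of backend i in bytes, or 0
 * when that backend uses normal pages. Returns the page size that is
 * safe to assume for all of guest RAM. */
static long min_backend_pagesize(const long *hugepage_sizes, size_t n,
                                 long base_pagesize)
{
    long hpsize = LONG_MAX;
    size_t i;

    for (i = 0; i < n; i++) {
        long sz = hugepage_sizes[i] ? hugepage_sizes[i] : base_pagesize;

        if (sz < hpsize) {
            hpsize = sz;
        }
    }
    /* No backends at all: fall back to normal pages. */
    return hpsize == LONG_MAX ? base_pagesize : hpsize;
}
```

Starting from LONG_MAX as a sentinel keeps the "no backends configured" case distinguishable from "every backend is hugepage-backed", which is the same device the patch uses.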

[Qemu-devel] [PATCH 07/53] qcow2: Flush pending discards before allocating cluster

2015-07-30 Thread Michael Roth

From: Kevin Wolf kw...@redhat.com

Before a freed cluster can be reused, pending discards for this cluster
must be processed.

The original assumption was that this was not a problem because discards
are only cached during discard/write zeroes operations, which are
synchronous so that no concurrent write requests can cause cluster
allocations.

However, the discard/write zeroes operation itself can allocate a new L2
table (and it has to in order to put zero flags there), so make sure we
can cope with the situation.

This fixes https://bugs.launchpad.net/bugs/1349972.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Kevin Wolf kw...@redhat.com
Reviewed-by: Max Reitz mre...@redhat.com
(cherry picked from commit ecbda7a22576591a84f44de1be0150faf6001f1c)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 block/qcow2-refcount.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 6cbae1d..63c0085 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -833,6 +833,11 @@ static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size)
     uint64_t i, nb_clusters, refcount;
     int ret;
 
+    /* We can't allocate clusters if they may still be queued for discard. */
+    if (s->cache_discards) {
+        qcow2_process_discards(bs, 0);
+    }
+
     nb_clusters = size_to_clusters(s, size);
 retry:
     for(i = 0; i < nb_clusters; i++) {
-- 
1.9.1

Re: [Qemu-devel] [POC] colo-proxy in qemu

2015-07-30 Thread Gonglei

On 2015/7/30 19:56, Dr. David Alan Gilbert wrote:
 * Jason Wang (jasow...@redhat.com) wrote:


 On 07/30/2015 04:03 PM, Dr. David Alan Gilbert wrote:
 * Dong, Eddie (eddie.d...@intel.com) wrote:
 A question here, the packet comparing may be very tricky. For example,
 some protocol use random data to generate unpredictable id or
 something else. One example is ipv6_select_ident() in Linux. So COLO
 needs a mechanism to make sure PVM and SVM can generate same random
 data?
 Good question, the random data connection is a big problem for COLO. At
 present, it will trigger checkpoint processing because of the different 
 random
 data.
 I don't think any mechanisms can assure two different machines generate 
 the
 same random data. If you have any ideas, pls tell us :)

 Frequent checkpoint can handle this scenario, but maybe will cause the
 performance poor. :(

 The assumption is that, after VM checkpoint, SVM and PVM have identical 
 internal state, so the pattern used to generate random data has high 
 possibility to generate identical data at short time, at least...
 They do diverge pretty quickly though; I have simple examples which
 reliably cause a checkpoint because of simple randomness in applications.

 Dave


 And it will become even worse if hwrng is used in guest.
 
 Yes; it seems quite application dependent;  (on IPv4) an ssh connection,
 once established, tends to work well without triggering checkpoints;
 and static web pages also work well.  Examples of things that do cause
 more checkpoints are, displaying guest statistics (e.g. running top
 in that ssh) which is timing dependent, and dynamically generated
 web pages that include a unique ID (bugzilla's password reset link in
 it's front page was a fun one), I think also establishing
 new encrypted connections cause the same randomness.
 
 However, it's worth remembering that COLO is trying to reduce the
 number of checkpoints compared to a simple checkpointing world
 which would be aiming to do a checkpoint ~100 times a second,
 and for compute bound workloads, or ones that don't expose
 the randomness that much, it can get checkpoints of a few seconds
 in length which greatly reduces the overhead.
 

Yes, that's true.
We could provide two modes for different scenarios, perhaps named
1) frequent checkpoint mode, for multi-connection and randomness-heavy scenarios,
and 2) non-frequent checkpoint mode, for other scenarios.

But that's for a later stage; we are still thinking about it.

Regards,
-Gonglei

[Qemu-devel] [PATCH for-2.4 2/3] scsi-disk: fix cmd.mode field typo

2015-07-30 Thread Stefan Hajnoczi

The cmd.xfer field is the data length.  The cmd.mode field is the data
transfer direction.

scsi_handle_rw_error() was using the wrong error policy for read
requests.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/scsi/scsi-disk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 64f0694..73fed3f 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -399,7 +399,7 @@ static void scsi_read_data(SCSIRequest *req)
  */
 static int scsi_handle_rw_error(SCSIDiskReq *r, int error)
 {
-    bool is_read = (r->req.cmd.xfer == SCSI_XFER_FROM_DEV);
+    bool is_read = (r->req.cmd.mode == SCSI_XFER_FROM_DEV);
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
     BlockErrorAction action = blk_get_error_action(s->qdev.conf.blk,
                                                    is_read, error);
-- 
2.4.3

Re: [Qemu-devel] [POC] colo-proxy in qemu

2015-07-30 Thread Dr. David Alan Gilbert

* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:
 On 2015/7/30 20:30, Dr. David Alan Gilbert wrote:
 * Gonglei (arei.gong...@huawei.com) wrote:
 On 2015/7/30 19:56, Dr. David Alan Gilbert wrote:
 * Jason Wang (jasow...@redhat.com) wrote:
 
 
 On 07/30/2015 04:03 PM, Dr. David Alan Gilbert wrote:
 * Dong, Eddie (eddie.d...@intel.com) wrote:
 A question here, the packet comparing may be very tricky. For example,
 some protocol use random data to generate unpredictable id or
 something else. One example is ipv6_select_ident() in Linux. So COLO
 needs a mechanism to make sure PVM and SVM can generate same random
 data?
 Good question, the random data connection is a big problem for COLO. At
 present, it will trigger checkpoint processing because of the 
 different random
 data.
 I don't think any mechanisms can assure two different machines 
 generate the
 same random data. If you have any ideas, pls tell us :)
 
 Frequent checkpoint can handle this scenario, but maybe will cause the
 performance poor. :(
 
 The assumption is that, after VM checkpoint, SVM and PVM have identical 
 internal state, so the pattern used to generate random data has high 
 possibility to generate identical data at short time, at least...
 They do diverge pretty quickly though; I have simple examples which
 reliably cause a checkpoint because of simple randomness in applications.
 
 Dave
 
 
 And it will become even worse if hwrng is used in guest.
 
 Yes; it seems quite application dependent;  (on IPv4) an ssh connection,
 once established, tends to work well without triggering checkpoints;
 and static web pages also work well.  Examples of things that do cause
 more checkpoints are, displaying guest statistics (e.g. running top
 in that ssh) which is timing dependent, and dynamically generated
 web pages that include a unique ID (bugzilla's password reset link in
 it's front page was a fun one), I think also establishing
 new encrypted connections cause the same randomness.
 
 However, it's worth remembering that COLO is trying to reduce the
 number of checkpoints compared to a simple checkpointing world
 which would be aiming to do a checkpoint ~100 times a second,
 and for compute bound workloads, or ones that don't expose
 the randomness that much, it can get checkpoints of a few seconds
 in length which greatly reduces the overhead.
 
 
 Yes. That's the truth.
 We can set two different modes for different scenarios. Maybe Named
 1) frequent checkpoint mode for multi-connections and randomness scenarios
 and 2) non-frequent checkpoint mode for other scenarios.
 
 But that's the next plan, we are thinking about that.
 
 I have some code that tries to automatically switch between those;
 it measures the checkpoint lengths, and if they're consistently short
 it sends a different message byte to the secondary at the start of the
 checkpoint, so that it doesn't bother running.   Every so often it
 then flips back to a COLO checkpoint to see if the checkpoints
 are still really fast.
 
 
 Do you mean if there are consistent checkpoint requests, not do checkpoint 
 but just send a special message to SVM?
 Resume to common COLO mode until the checkpoint lengths is so not short ?

We still have to do checkpoints, but we send a special message to the SVM so
that the SVM just takes the checkpoint but does not run.

  I'll send the code after I've updated it to your current version; but it's
quite rough/experimental.

It works something like

 ----------  run PVM      run SVM
  COLO            long gap
  mode           miscompare
                 checkpoint
 ----------  run PVM      run SVM
  COLO            short gap
  mode           miscompare
                 checkpoint
 ----------  run PVM      run SVM
  COLO            short gap
  mode           miscompare          After a few short runs
                 checkpoint
 ----------  run PVM      SVM idle   \
  Passive         fixed delay         |  - repeat 'n' times
  mode            checkpoint         /
 ----------  run PVM      run SVM
  COLO            short gap          Still a short gap
  mode           miscompare
 ----------  run PVM      SVM idle   \
  Passive         fixed delay         |  - repeat 'n' times
  mode            checkpoint         /
 ----------  run PVM      run SVM
  COLO            long gap           long gap now, stay in COLO
  mode           miscompare
                 checkpoint
 ----------  run PVM      run SVM
  COLO            long gap
  mode           miscompare
                 checkpoint
 
So it saves the CPU time on the SVM, and the comparison traffic, and is
automatic at switching into the passive mode.
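
To make the switching rule concrete, here is a rough Python sketch of the
state machine in the diagram. It is only a toy model: the class name, the
0.5 second threshold, the number of short runs and the repeat count are all
assumptions picked for illustration, not values from the real code.

```python
class ModeSwitcher:
    """Toy model of switching between COLO and passive checkpointing.

    All thresholds are illustrative assumptions, not values from the thread.
    """

    def __init__(self, short_gap=0.5, short_runs=3, passive_repeats=2):
        self.short_gap = short_gap              # gaps below this count as "short"
        self.short_runs = short_runs            # short COLO gaps before going passive
        self.passive_repeats = passive_repeats  # the 'n' in the diagram
        self.short_seen = 0                     # consecutive short COLO gaps so far
        self.passive_left = 0                   # passive checkpoints still to run
        self.probing = False                    # True for the one COLO run after passive

    def next_mode(self):
        """Mode to use for the next run: 'colo' or 'passive'."""
        return "passive" if self.passive_left > 0 else "colo"

    def checkpoint_done(self, gap):
        """Record a finished checkpoint; gap is the run length that preceded it."""
        if self.passive_left > 0:
            # Passive runs use a fixed delay, so their length tells us nothing.
            self.passive_left -= 1
            self.probing = self.passive_left == 0
            return
        if gap >= self.short_gap:
            # Long gap: COLO is paying off, stay in COLO mode.
            self.short_seen = 0
            self.probing = False
            return
        self.short_seen += 1
        if self.probing or self.short_seen >= self.short_runs:
            # Still short after a probe, or several short runs in a row.
            self.passive_left = self.passive_repeats
            self.short_seen = 0
            self.probing = False


def simulate(gaps):
    """Feed a list of gap lengths through the switcher; return the modes used."""
    switcher = ModeSwitcher()
    modes = []
    for gap in gaps:
        modes.append(switcher.next_mode())
        switcher.checkpoint_done(gap)
    return modes
```

Feeding it one long gap, three short ones, a short probe and then long gaps
walks through the same sequence as the diagram: COLO, passive for 'n' runs,
one COLO probe, passive again, then back to COLO for good.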

It used to be more useful, but your minimum COLO run time that you
added a few versions ago helps a lot in the cases where there are miscompares,
and the delay after the miscompare before you take the checkpoint also helps
in the case where the data is very random.

Dave

 
 Thanks.
 
 Dave
 
 
 Regards,
 -Gonglei
 
 --
 Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
 
 .
 
 
 
--
Dr.

Re: [Qemu-devel] Call Trace for QEMU functions

2015-07-30 Thread Peter Maydell

On 30 July 2015 at 13:20, Naman patel naman...@gmail.com wrote:
> Hi,
>
>  I have compiled QEMU (2.0) for x86_64 on Fedora 22 with tracing enabled
> and the tracing option I chose was dtrace. I have this script called
> callTrace.stp in which I try and get the Call Trace of the function
> helper_invlpg and later tlb_flush.  But I am not able to get the function
> name of the caller function and the call trace depth is only limited to 2.

The helper_invlpg function is called directly from code generated
by QEMU's built-in JIT, not from any other C function.

If you use a newer version of QEMU than 2.0 then I think we have
fixed some of the stack frame information up so that you can
get a backtrace that looks like:
 * helper function
 * [generated code]
 * QEMU execution loop code that handles executing guest code
 * other QEMU functions

This is not likely to be very useful for profiling why or when
we're calling a particular helper function, though.

thanks
-- PMM

[Qemu-devel] [PATCH 34/53] qemu-iotests: Add test case for mirror with unmap

2015-07-30 Thread Michael Roth

From: Fam Zheng f...@redhat.com

This checks that the discard on mirror source that effectively zeroes
data is also reflected by the data of target.

Signed-off-by: Fam Zheng f...@redhat.com
Reviewed-by: John Snow js...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
(cherry picked from commit c615091793f53ff33b8f6c1b1ba711cf7c93e97b)
Conflicts:
tests/qemu-iotests/group

*remove context dependencies on newer block tests

Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 tests/qemu-iotests/132 | 59 ++
 tests/qemu-iotests/132.out |  5 
 tests/qemu-iotests/group   |  1 +
 3 files changed, 65 insertions(+)
 create mode 100644 tests/qemu-iotests/132
 create mode 100644 tests/qemu-iotests/132.out

diff --git a/tests/qemu-iotests/132 b/tests/qemu-iotests/132
new file mode 100644
index 000..f53ef6e
--- /dev/null
+++ b/tests/qemu-iotests/132
@@ -0,0 +1,59 @@
+#!/usr/bin/env python
+#
+# Test mirror with unmap
+#
+# Copyright (C) 2015 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see http://www.gnu.org/licenses/.
+#
+
+import time
+import os
+import iotests
+from iotests import qemu_img, qemu_io
+
+test_img = os.path.join(iotests.test_dir, 'test.img')
+target_img = os.path.join(iotests.test_dir, 'target.img')
+
+class TestSingleDrive(iotests.QMPTestCase):
+    image_len = 2 * 1024 * 1024 # MB
+
+    def setUp(self):
+        # Write data to the image so we can compare later
+        qemu_img('create', '-f', iotests.imgfmt, test_img, str(TestSingleDrive.image_len))
+        qemu_io('-f', iotests.imgfmt, '-c', 'write -P0x5d 0 2M', test_img)
+
+        self.vm = iotests.VM().add_drive(test_img, 'discard=unmap')
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(test_img)
+        try:
+            os.remove(target_img)
+        except OSError:
+            pass
+
+    def test_mirror_discard(self):
+        result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+                             target=target_img)
+        self.assert_qmp(result, 'return', {})
+        self.vm.hmp_qemu_io('drive0', 'discard 0 64k')
+        self.complete_and_wait('drive0')
+        self.vm.shutdown()
+        self.assertTrue(iotests.compare_images(test_img, target_img),
+                        'target image does not match source after mirroring')
+
+if __name__ == '__main__':
+    iotests.main(supported_fmts=['raw', 'qcow2'])
diff --git a/tests/qemu-iotests/132.out b/tests/qemu-iotests/132.out
new file mode 100644
index 000..ae1213e
--- /dev/null
+++ b/tests/qemu-iotests/132.out
@@ -0,0 +1,5 @@
+.
+--
+Ran 1 tests
+
+OK
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index bcf2578..09e45f4 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -125,3 +125,4 @@
 123 rw auto quick
 128 rw auto quick
 130 rw auto quick
+132 rw auto quick
-- 
1.9.1

[Qemu-devel] [PATCH 22/53] vmdk: Use vmdk_find_index_in_cluster everywhere

2015-07-30 Thread Michael Roth

From: Fam Zheng f...@redhat.com

Signed-off-by: Fam Zheng f...@redhat.com
Reviewed-by: Max Reitz mre...@redhat.com
Signed-off-by: Kevin Wolf kw...@redhat.com
(cherry picked from commit 90df601f06de14f062d2e8dc1bc57f0decf86fd1)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 block/vmdk.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 49a332d..4c71cde 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1424,7 +1424,6 @@ static int vmdk_read(BlockDriverState *bs, int64_t sector_num,
     BDRVVmdkState *s = bs->opaque;
     int ret;
     uint64_t n, index_in_cluster;
-    uint64_t extent_begin_sector, extent_relative_sector_num;
     VmdkExtent *extent = NULL;
     uint64_t cluster_offset;
 
@@ -1436,9 +1435,7 @@ static int vmdk_read(BlockDriverState *bs, int64_t sector_num,
         ret = get_cluster_offset(bs, extent, NULL,
                                  sector_num << 9, false, &cluster_offset,
                                  0, 0);
-        extent_begin_sector = extent->end_sector - extent->sectors;
-        extent_relative_sector_num = sector_num - extent_begin_sector;
-        index_in_cluster = extent_relative_sector_num % extent->cluster_sectors;
+        index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
         n = extent->cluster_sectors - index_in_cluster;
         if (n > nb_sectors) {
             n = nb_sectors;
@@ -1500,7 +1497,6 @@ static int vmdk_write(BlockDriverState *bs, int64_t sector_num,
     VmdkExtent *extent = NULL;
     int ret;
     int64_t index_in_cluster, n;
-    uint64_t extent_begin_sector, extent_relative_sector_num;
     uint64_t cluster_offset;
     VmdkMetaData m_data;
 
@@ -1516,9 +1512,7 @@ static int vmdk_write(BlockDriverState *bs, int64_t sector_num,
         if (!extent) {
             return -EIO;
         }
-        extent_begin_sector = extent->end_sector - extent->sectors;
-        extent_relative_sector_num = sector_num - extent_begin_sector;
-        index_in_cluster = extent_relative_sector_num % extent->cluster_sectors;
+        index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
         n = extent->cluster_sectors - index_in_cluster;
         if (n > nb_sectors) {
             n = nb_sectors;
-- 
1.9.1

[Qemu-devel] [PATCH 25/53] i8254: fix out-of-bounds memory access in pit_ioport_read()

2015-07-30 Thread Michael Roth

From: Petr Matousek pmato...@redhat.com

Due to converting PIO to the new memory read/write API we no longer provide
separate I/O region lengths for read and write operations. As a result,
reading from the PIT Mode/Command register will end with accessing
pit->channels with an invalid index.

Fix this by ignoring read from the Mode/Command register.
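
As a rough illustration of the index problem, the following Python sketch
models the read path. It assumes three counter channels (so offset 3, the
Mode/Command register, has no channel behind it); the function names and the
list standing in for pit->channels are made up for this example.

```python
NUM_CHANNELS = 3  # assumption: three counters; offset 3 is the Mode/Command register


def channel_index_unfixed(addr):
    """Index used before the fix: the low two bits of the port address."""
    return addr & 3  # may be 3, one past the last channel


def pit_read_fixed(channels, addr):
    """Model of the fixed read path; channels stands in for pit->channels."""
    addr &= 3
    if addr == 3:
        # Mode/Command register is write only, so the read is ignored.
        return 0
    return channels[addr]
```

In C the unfixed index reads past the end of the array without complaint; in
this model the same index would simply be out of range for a 3-element list.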

This is CVE-2015-3214.

Reported-by: Matt Tait mattt...@google.com
Fixes: 0505bcdec8228d8de39ab1a02644e71999e7c052
Cc: qemu-sta...@nongnu.org
Signed-off-by: Petr Matousek pmato...@redhat.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
(cherry picked from commit d4862a87e31a51de9eb260f25c9e99a75efe3235)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 hw/timer/i8254.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/timer/i8254.c b/hw/timer/i8254.c
index 3450c98..9b65a33 100644
--- a/hw/timer/i8254.c
+++ b/hw/timer/i8254.c
@@ -196,6 +196,12 @@ static uint64_t pit_ioport_read(void *opaque, hwaddr addr,
 PITChannelState *s;
 
     addr &= 3;
+
+    if (addr == 3) {
+        /* Mode/Command register is write only, read is ignored */
+        return 0;
+    }
+
     s = &pit->channels[addr];
     if (s->status_latched) {
         s->status_latched = 0;
-- 
1.9.1

[Qemu-devel] [PATCH 47/53] vfio/pci: Fix bootindex

2015-07-30 Thread Michael Roth

From: Alex Williamson alex.william...@redhat.com

bootindex was incorrectly changed to a device Property during the
platform code split, resulting in it no longer working.  Remove it.

Signed-off-by: Alex Williamson alex.william...@redhat.com
Cc: qemu-sta...@nongnu.org # v2.3+
(cherry picked from commit 759b484c5d7f92bd01f98797c07e8543ee187888)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 hw/vfio/pci.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 73fd89e..beaa306 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3556,7 +3556,6 @@ static Property vfio_pci_dev_properties[] = {
 VFIO_FEATURE_ENABLE_VGA_BIT, false),
     DEFINE_PROP_BIT("x-req", VFIOPCIDevice, features,
                     VFIO_FEATURE_ENABLE_REQ_BIT, true),
-    DEFINE_PROP_INT32("bootindex", VFIOPCIDevice, bootindex, -1),
     DEFINE_PROP_BOOL("x-mmap", VFIOPCIDevice, vbasedev.allow_mmap, true),
 /*
  * TODO - support passed fds... is this necessary?
-- 
1.9.1

[Qemu-devel] [PATCH 41/53] Fix irq route entries exceeding KVM_MAX_IRQ_ROUTES

2015-07-30 Thread Michael Roth

From: 马文霜 kevin...@tencent.com

Last month, we experienced several guest crashes (guests with 6 to 8 cores).
The QEMU logs display the following message:

qemu-system-x86_64: /build/qemu-2.1.2/kvm-all.c:976:
kvm_irqchip_commit_routes: Assertion `ret == 0' failed.

After analysis and verification, we can confirm that the irq-balance
daemon (in the guest) leads to the assertion failure. Starting an 8-core guest
with two disks and running the following script reproduces the bug quickly:

irq_affinity.sh


vda_irq_num=25
vdb_irq_num=27
while [ 1 ]
do
    for irq in {1,2,4,8,10,20,40,80}
    do
        echo $irq > /proc/irq/$vda_irq_num/smp_affinity
        echo $irq > /proc/irq/$vdb_irq_num/smp_affinity
        dd if=/dev/vda of=/dev/zero bs=4K count=100 iflag=direct
        dd if=/dev/vdb of=/dev/zero bs=4K count=100 iflag=direct
    done
done


QEMU sets up static irq route entries in kvm_pc_setup_irq_routing(). PIC and
IOAPIC share the first 15 GSI numbers, so together they take up 23 GSI numbers
but 38 irq route entries. When the guest changes an irq's smp_affinity, a
dynamic route entry may be set up; the current logic is that if allocating a
GSI number succeeds, a new route entry can be added. There are
1001 (KVM_MAX_IRQ_ROUTES-23) dynamic GSI numbers available, but only
986 (KVM_MAX_IRQ_ROUTES-38) irq route entries, so there are more GSI numbers
than route entries. irq-balance's behavior eventually makes the total number
of irq route entries exceed KVM_MAX_IRQ_ROUTES; ioctl(KVM_SET_GSI_ROUTING)
then fails and kvm_irqchip_commit_routes() triggers the assertion failure.

This patch fixes the bug.
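
The counting argument can be checked with a few lines of Python. This is only
a worked example: it assumes KVM_MAX_IRQ_ROUTES is 1024, which is the value
consistent with the 986 figure in this message, and the constant names for
the static counts are made up here.

```python
KVM_MAX_IRQ_ROUTES = 1024  # assumption: consistent with 1024 - 38 = 986
STATIC_GSIS = 23           # GSI numbers taken by PIC + IOAPIC (from the message)
STATIC_ROUTES = 38         # route entries taken by PIC + IOAPIC (from the message)

dynamic_gsis = KVM_MAX_IRQ_ROUTES - STATIC_GSIS      # GSI numbers left to allocate
dynamic_routes = KVM_MAX_IRQ_ROUTES - STATIC_ROUTES  # route entries left to add

# GSI allocations that can succeed with no route entry left to hold them.
shortfall = dynamic_gsis - dynamic_routes
```

Under that assumption the shortfall equals the 15 GSI numbers that PIC and
IOAPIC share, which is why a successful GSI allocation alone does not
guarantee that a route entry can be added.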

Signed-off-by: Wenshuang Ma kevin...@tencent.com
Cc: qemu-sta...@nongnu.org
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
(cherry picked from commit bdf026317daa3b9dfa281f29e96fbb6fd48394c8)
Conflicts:
kvm-all.c

* remove context dependency on bd2a8884
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 kvm-all.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index dd44f8c..481c560 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1142,9 +1142,17 @@ static int kvm_irqchip_get_virq(KVMState *s)
     uint32_t *word = s->used_gsi_bitmap;
     int max_words = ALIGN(s->gsi_count, 32) / 32;
     int i, bit;
-    bool retry = true;
 
-again:
+    /*
+     * PIC and IOAPIC share the first 16 GSI numbers, thus the available
+     * GSI numbers are more than the number of IRQ route. Allocating a GSI
+     * number can succeed even though a new route entry cannot be added.
+     * When this happens, flush dynamic MSI entries to free IRQ route entries.
+     */
+    if (!s->direct_msi && s->irq_routes->nr == s->gsi_count) {
+        kvm_flush_dynamic_msi_routes(s);
+    }
+
     /* Return the lowest unused GSI in the bitmap */
     for (i = 0; i < max_words; i++) {
         bit = ffs(~word[i]);
@@ -1154,11 +1162,6 @@ again:
 
         return bit - 1 + i * 32;
     }
-    if (!s->direct_msi && retry) {
-        retry = false;
-        kvm_flush_dynamic_msi_routes(s);
-        goto again;
-    }
     return -ENOSPC;
 
 }
-- 
1.9.1

[Qemu-devel] [PATCH 32/53] block: Fix dirty bitmap in bdrv_co_discard

2015-07-30 Thread Michael Roth

From: Fam Zheng f...@redhat.com

Unsetting dirty globally with discard is not very correct. The discard may zero
out sectors (depending on can_write_zeroes_with_unmap), we should replicate
this change to destination side to make sure that the guest sees the same data.

Calling bdrv_reset_dirty also troubles mirror job because the hbitmap iterator
doesn't expect unsetting of bits after current position.

So let's do it the opposite way which fixes both problems: set the dirty bits
if we are to discard it.

Reported-by: wangxiaol...@ucloud.cn
Signed-off-by: Fam Zheng f...@redhat.com
Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
(cherry picked from commit 508249952c0ea7472c62e17bf8132295dab4912d)
Conflicts:
block/io.c

* applied manually to avoid dependency on 61007b316
* squashed in 6e82e4b bdrv_reset_dirty() is static in
  2.3.0 and becomes unused as of this patch
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 block.c | 15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/block.c b/block.c
index 63dd460..4f52d7a 100644
--- a/block.c
+++ b/block.c
@@ -99,8 +99,6 @@ static QLIST_HEAD(, BlockDriver) bdrv_drivers =
 
 static void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
                            int nr_sectors);
-static void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
-                             int nr_sectors);
 /* If non-zero, use only whitelisted block drivers */
 static int use_bdrv_whitelist;
 
@@ -5299,8 +5297,6 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
         return -EROFS;
     }
 
-    bdrv_reset_dirty(bs, sector_num, nb_sectors);
-
     /* Do nothing if disabled.  */
     if (!(bs->open_flags & BDRV_O_UNMAP)) {
         return 0;
@@ -5310,6 +5306,8 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
         return 0;
     }
 
+    bdrv_set_dirty(bs, sector_num, nb_sectors);
+
     max_discard = MIN_NON_ZERO(bs->bl.max_discard, BDRV_REQUEST_MAX_SECTORS);
     while (nb_sectors > 0) {
         int ret;
@@ -5620,15 +5618,6 @@ static void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
     }
 }
 
-static void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
-                             int nr_sectors)
-{
-    BdrvDirtyBitmap *bitmap;
-    QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
-        hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
-    }
-}
-
 int64_t bdrv_get_dirty_count(BlockDriverState *bs, BdrvDirtyBitmap *bitmap)
 {
     return hbitmap_count(bitmap->bitmap);
-- 
1.9.1

Re: [Qemu-devel] [PATCH v2 06/45] ivshmem: remove unnecessary dup()

2015-07-30 Thread Marc-André Lureau

hi

On Wed, Jul 29, 2015 at 9:10 PM, Eric Blake ebl...@redhat.com wrote:
> On 07/27/2015 06:32 PM, Marc-André Lureau wrote:
>> From: Marc-André Lureau marcandre.lur...@gmail.com
>>
>> qemu_chr_fe_get_msgfd() transfer ownership, there is no need to dup the fd.
>
> s/transfer/transfers/
>
>>
>> Signed-off-by: Marc-André Lureau marcandre.lur...@redhat.com
>
> Interesting difference in From: vs. S-o-b:; you may want to check your
> configuration, and/or update .mailmap so that we can consolidate all
> contributions from you under different addresses into a single author
> lookup.


both fixed, thanks

-- 
Marc-André Lureau

Re: [Qemu-devel] [PATCH v2 11/11] block: Only poll block layer fds in bdrv_aio_poll

2015-07-30 Thread Paolo Bonzini



On 30/07/2015 03:35, Fam Zheng wrote:
>     block.c                bdrv_create
>     block/curl.c           curl_init_state
>     block/io.c             bdrv_drain
>     block/io.c             bdrv_drain_all
>     block/io.c             bdrv_prwv_co
>     block/io.c             bdrv_get_block_status_above
>     block/io.c             bdrv_aio_cancel
>     block/io.c             bdrv_flush
>     block/io.c             bdrv_discard
>     block/io.c             bdrv_flush_io_queue
>     block/nfs.c            nfs_get_allocated_file_size
>     block/qed-table.c      qed_read_l1_table_sync
>     block/qed-table.c      qed_write_l1_table_sync
>     block/qed-table.c      qed_read_l2_table_sync
>     block/qed-table.c      qed_write_l2_table_sync
>     blockjob.c             block_job_finish_sync
>     include/block/block.h  bdrv_get_stats
>     qemu-img.c             run_block_job
>     qemu-io-cmds.c         do_co_write_zeroes
>     qemu-io-cmds.c         wait_break_f
>
> Most of them make some sense to me, but not many make a real difference.  The
> most important ones should be bdrv_drain* and bdrv_flush, and can be taken care
> of from caller side.

Even just bdrv_drain*.  bdrv_flush is okay with incoming requests, it
should be handled in the caller side.

Paolo

[Qemu-devel] [PATCH 01/53] bt-sdp: fix broken uuids power-of-2 calculation

2015-07-30 Thread Michael Roth

From: Stefan Hajnoczi stefa...@redhat.com

The binary search in sdp_uuid_match() only works when the number of
elements to search is a power of two.

  lo = record->uuid;
  hi = record->uuids;
  while (hi >>= 1)
      if (lo[hi] <= val)
          lo += hi;

  return *lo == val;

I noticed that the record->uuids calculation in
sdp_service_record_build() was suspect:

  record->uuids = 1 << ffs(record->uuids - 1);

Unlike most ffs(val) - 1 users, the expression is ffs(val - 1)!

Actually ffs() is the wrong function to use for power-of-2.  Use
pow2ceil() to achieve the correct effect.  Now the record->uuid[] array
is sized correctly and the binary search in sdp_uuid_match() should
work.
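
To see how far off the old expression is, here is a small Python comparison.
The ffs() and pow2ceil() below are stand-ins written for this example (they
mimic C's ffs() and a round-up-to-power-of-two helper); they are not the QEMU
implementations.

```python
def ffs(x):
    """Like C ffs(): 1-based index of the least significant set bit, 0 if x == 0."""
    return (x & -x).bit_length()


def old_size(n):
    """The old calculation: 1 << ffs(n - 1)."""
    return 1 << ffs(n - 1)


def pow2ceil(n):
    """Smallest power of two that is >= n, for n >= 1 (stand-in for the helper)."""
    return 1 << (n - 1).bit_length()


old = [old_size(n) for n in range(1, 9)]
new = [pow2ceil(n) for n in range(1, 9)]
```

For counts 1 through 8 the old expression only happens to be right for 1, 2,
3 and 5; for 4, 6, 7 and 8 it yields a size smaller than the count itself.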

I'm not sure how to run/test this code.

Cc: Andrzej Zaborowski bal...@zabor.org
Cc: qemu-sta...@nongnu.org
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
Message-id: 1427124571-28598-2-git-send-email-stefa...@redhat.com
Signed-off-by: Kevin Wolf kw...@redhat.com
(cherry picked from commit 588ef9d411339012fc3c94bfad8911e9d0a517a2)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 hw/bt/sdp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/bt/sdp.c b/hw/bt/sdp.c
index 218e075..c903747 100644
--- a/hw/bt/sdp.c
+++ b/hw/bt/sdp.c
@@ -707,7 +707,7 @@ static void sdp_service_record_build(struct sdp_service_record_s *record,
         len += sdp_attr_max_size(&def->attributes[record->attributes ++].data,
                         &record->uuids);
     }
-    record->uuids = 1 << ffs(record->uuids - 1);
+    record->uuids = pow2ceil(record->uuids);
     record->attribute_list =
             g_malloc0(record->attributes * sizeof(*record->attribute_list));
     record->uuid =
-- 
1.9.1

[Qemu-devel] [PATCH 21/53] vmdk: Fix index_in_cluster calculation in vmdk_co_get_block_status

2015-07-30 Thread Michael Roth

From: Fam Zheng f...@redhat.com

It has a similar issue to b1649fae49a8. Since the calculation
is already repeated a few times, introduce a function so it can be
reused.
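
A quick Python rendering of the calculation shows why the extent start
matters. The field names follow the patch; the function name and the numbers
are made up for illustration.

```python
def index_in_cluster(sector_num, end_sector, sectors, cluster_sectors):
    """Sector offset inside its cluster, measured from the start of the extent."""
    extent_begin_sector = end_sector - sectors
    extent_relative_sector_num = sector_num - extent_begin_sector
    return extent_relative_sector_num % cluster_sectors


# Hypothetical extent covering sectors 100..1123 with 128-sector clusters.
fixed = index_in_cluster(100, end_sector=1124, sectors=1024, cluster_sectors=128)
naive = 100 % 128  # what "sector_num % cluster_sectors" computes
```

When an extent does not begin on a cluster boundary of the whole image, the
naive modulo places the extent's first sector in the middle of a cluster,
while the extent-relative form places it at offset 0.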

Signed-off-by: Fam Zheng f...@redhat.com
Reviewed-by: Max Reitz mre...@redhat.com
Signed-off-by: Kevin Wolf kw...@redhat.com
(cherry picked from commit 61f0ed1d54601b91b8195c1a30d7046f83283b40)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 block/vmdk.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index bd74050..49a332d 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1248,6 +1248,17 @@ static VmdkExtent *find_extent(BDRVVmdkState *s,
     return NULL;
 }
 
+static inline uint64_t vmdk_find_index_in_cluster(VmdkExtent *extent,
+                                                  int64_t sector_num)
+{
+    uint64_t index_in_cluster, extent_begin_sector, extent_relative_sector_num;
+
+    extent_begin_sector = extent->end_sector - extent->sectors;
+    extent_relative_sector_num = sector_num - extent_begin_sector;
+    index_in_cluster = extent_relative_sector_num % extent->cluster_sectors;
+    return index_in_cluster;
+}
+
 static int64_t coroutine_fn vmdk_co_get_block_status(BlockDriverState *bs,
         int64_t sector_num, int nb_sectors, int *pnum)
 {
@@ -1285,7 +1296,7 @@ static int64_t coroutine_fn vmdk_co_get_block_status(BlockDriverState *bs,
         break;
     }
 
-    index_in_cluster = sector_num % extent->cluster_sectors;
+    index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
     n = extent->cluster_sectors - index_in_cluster;
     if (n > nb_sectors) {
         n = nb_sectors;
-- 
1.9.1

[Qemu-devel] [PATCH 16/53] qga/commands-posix: Fix bug in guest-fstrim

2015-07-30 Thread Michael Roth

From: Justin Ossevoort jus...@quarantainenet.nl

The FITRIM ioctl updates the fstrim_range structure it receives. This
way the caller can determine how many bytes were trimmed. The
guest-fstrim logic reuses the same fstrim_range for each filesystem,
effectively limiting each filesystem to trim at most as much as the
previous was able to trim.

If a previous filesystem trimmed 0 bytes, then the next
filesystem would report an error 'Invalid argument' because a FITRIM
request with length 0 is not valid.

This change resets the fstrim_range structure for each filesystem.
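
The effect of reusing the structure can be modelled in a few lines of Python.
This is a sketch only: fake_fitrim() is a made-up stand-in for the ioctl that
reproduces just the two behaviours described above (it rejects a zero length
and writes the trimmed byte count back into the range), and the dictionary
stands in for struct fstrim_range.

```python
import errno

U64_MAX = 2**64 - 1  # what .len = -1 becomes in an unsigned 64-bit field


def fake_fitrim(r, trimmable):
    """Stand-in for ioctl(fd, FITRIM, &r) on a filesystem with 'trimmable' bytes."""
    if r["len"] == 0:
        raise OSError(errno.EINVAL, "Invalid argument")
    r["len"] = min(r["len"], trimmable)  # the call updates the caller's range
    return r["len"]


def trim_all_reusing(filesystems):
    """Old behaviour: one range initialised once and shared by every filesystem."""
    r = {"start": 0, "len": U64_MAX, "minlen": 0}
    return [fake_fitrim(r, t) for t in filesystems]


def trim_all_resetting(filesystems):
    """New behaviour: the range is reset before each filesystem."""
    results = []
    for t in filesystems:
        r = {"start": 0, "len": U64_MAX, "minlen": 0}
        results.append(fake_fitrim(r, t))
    return results
```

With two filesystems that could trim 100 and 500 bytes, the shared range
limits the second one to 100; if the first trims 0 bytes, the second call
fails outright. Resetting the range avoids both effects.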

Signed-off-by: Justin Ossevoort jus...@quarantainenet.nl
Reviewed-by: Thomas Huth th...@redhat.com
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
(cherry picked from commit 73a652a1b08445e8d91e50cdbb2da50e571c61b3)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 qga/commands-posix.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index ba8de62..4449628 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -1332,11 +1332,7 @@ void qmp_guest_fstrim(bool has_minimum, int64_t minimum, Error **errp)
     struct FsMount *mount;
     int fd;
     Error *local_err = NULL;
-    struct fstrim_range r = {
-        .start = 0,
-        .len = -1,
-        .minlen = has_minimum ? minimum : 0,
-    };
+    struct fstrim_range r;
 
     slog("guest-fstrim called");
 
@@ -1360,6 +1356,9 @@ void qmp_guest_fstrim(bool has_minimum, int64_t minimum, Error **errp)
          * error means an unexpected error, so return it in those cases.  In
          * some other cases ENOTTY will be reported (e.g. CD-ROMs).
          */
+        r.start = 0;
+        r.len = -1;
+        r.minlen = has_minimum ? minimum : 0;
         ret = ioctl(fd, FITRIM, &r);
         if (ret == -1) {
             if (errno != ENOTTY && errno != EOPNOTSUPP) {
-- 
1.9.1

[Qemu-devel] [PATCH 30/53] qmp: Add optional bool unmap to drive-mirror

2015-07-30 Thread Michael Roth

From: Fam Zheng f...@redhat.com

If specified as true, it allows discarding on target sectors where source is
not allocated.

Signed-off-by: Fam Zheng f...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
(cherry picked from commit 0fc9f8ea2800b76eaea20a8a3a91fbeeb4bfa81b)

* added to maintain any interdependencies between patches in the
  set. not intended as a new feature for 2.3.1, though it's there
  for anyone interested

Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 block/mirror.c| 8 ++--
 blockdev.c| 5 +
 hmp.c | 2 +-
 include/block/block_int.h | 2 ++
 qapi/block-core.json  | 8 +++-
 qmp-commands.hx   | 3 +++
 6 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 4056164..1814523 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -57,6 +57,7 @@ typedef struct MirrorBlockJob {
 int in_flight;
 int sectors_in_flight;
 int ret;
+bool unmap;
 } MirrorBlockJob;
 
 typedef struct MirrorOp {
@@ -660,6 +661,7 @@ static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
                              int64_t buf_size,
                              BlockdevOnError on_source_error,
                              BlockdevOnError on_target_error,
+                             bool unmap,
                              BlockCompletionFunc *cb,
                              void *opaque, Error **errp,
                              const BlockJobDriver *driver,
@@ -702,6 +704,7 @@ static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
     s->base = base;
     s->granularity = granularity;
     s->buf_size = MAX(buf_size, granularity);
+    s->unmap = unmap;
 
     s->dirty_bitmap = bdrv_create_dirty_bitmap(bs, granularity, errp);
     if (!s->dirty_bitmap) {
@@ -720,6 +723,7 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
                   int64_t speed, int64_t granularity, int64_t buf_size,
                   MirrorSyncMode mode, BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
+                  bool unmap,
                   BlockCompletionFunc *cb,
                   void *opaque, Error **errp)
 {
@@ -730,7 +734,7 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
     base = mode == MIRROR_SYNC_MODE_TOP ? bs->backing_hd : NULL;
     mirror_start_job(bs, target, replaces,
                      speed, granularity, buf_size,
-                     on_source_error, on_target_error, cb, opaque, errp,
+                     on_source_error, on_target_error, unmap, cb, opaque, errp,
                      &mirror_job_driver, is_none_mode, base);
 }
 
@@ -778,7 +782,7 @@ void commit_active_start(BlockDriverState *bs, BlockDriverState *base,
 
     bdrv_ref(base);
     mirror_start_job(bs, base, NULL, speed, 0, 0,
-                     on_error, on_error, cb, opaque, &local_err,
+                     on_error, on_error, false, cb, opaque, &local_err,
                      &commit_active_job_driver, false, base);
     if (local_err) {
         error_propagate(errp, local_err);
diff --git a/blockdev.c b/blockdev.c
index fbb3a79..dde4061 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2461,6 +2461,7 @@ void qmp_drive_mirror(const char *device, const char *target,
                       bool has_buf_size, int64_t buf_size,
                       bool has_on_source_error, BlockdevOnError on_source_error,
                       bool has_on_target_error, BlockdevOnError on_target_error,
+                      bool has_unmap, bool unmap,
                       Error **errp)
 {
     BlockBackend *blk;
@@ -2492,6 +2493,9 @@ void qmp_drive_mirror(const char *device, const char *target,
     if (!has_buf_size) {
         buf_size = DEFAULT_MIRROR_BUF_SIZE;
     }
+    if (!has_unmap) {
+        unmap = true;
+    }
 
     if (granularity != 0 && (granularity < 512 || granularity > 1048576 * 64)) {
         error_set(errp, QERR_INVALID_PARAMETER_VALUE, "granularity",
@@ -2631,6 +2635,7 @@ void qmp_drive_mirror(const char *device, const char *target,
                  has_replaces ? replaces : NULL,
                  speed, granularity, buf_size, sync,
                  on_source_error, on_target_error,
+                 unmap,
                  block_job_cb, bs, &local_err);
     if (local_err != NULL) {
         bdrv_unref(target_bs);
diff --git a/hmp.c b/hmp.c
index f142d36..1b9a317 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1034,7 +1034,7 @@ void hmp_drive_mirror(Monitor *mon, const QDict *qdict)
                      false, NULL, false, NULL,
                      full ? MIRROR_SYNC_MODE_FULL : MIRROR_SYNC_MODE_TOP,
                      true, mode, false, 0, false, 0, false, 0,
-                     false, 0, false, 0, &err);
+                     false, 0, false, 0, false, true, &err);
     hmp_handle_error(mon, &err);
 }
 
diff --git a/include/block/block_int.h

[Qemu-devel] [PATCH 33/53] qemu-iotests: Make block job methods common

2015-07-30 Thread Michael Roth

From: Fam Zheng f...@redhat.com

Signed-off-by: Fam Zheng f...@redhat.com
Reviewed-by: John Snow js...@redhat.com
Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
(cherry picked from commit 866323f39d5c7bb053f5e5bf753908ad9f5abec7)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 tests/qemu-iotests/041| 66 ++-
 tests/qemu-iotests/iotests.py | 28 ++
 2 files changed, 43 insertions(+), 51 deletions(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index 59a8f73..3d46ed7 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -34,38 +34,8 @@ quorum_img3 = os.path.join(iotests.test_dir, 'quorum3.img')
 quorum_repair_img = os.path.join(iotests.test_dir, 'quorum_repair.img')
 quorum_snapshot_file = os.path.join(iotests.test_dir, 'quorum_snapshot.img')
 
-class ImageMirroringTestCase(iotests.QMPTestCase):
-    '''Abstract base class for image mirroring test cases'''
 
-    def wait_ready(self, drive='drive0'):
-        '''Wait until a block job BLOCK_JOB_READY event'''
-        ready = False
-        while not ready:
-            for event in self.vm.get_qmp_events(wait=True):
-                if event['event'] == 'BLOCK_JOB_READY':
-                    self.assert_qmp(event, 'data/type', 'mirror')
-                    self.assert_qmp(event, 'data/device', drive)
-                    ready = True
-
-    def wait_ready_and_cancel(self, drive='drive0'):
-        self.wait_ready(drive=drive)
-        event = self.cancel_and_wait(drive=drive)
-        self.assertEquals(event['event'], 'BLOCK_JOB_COMPLETED')
-        self.assert_qmp(event, 'data/type', 'mirror')
-        self.assert_qmp(event, 'data/offset', event['data']['len'])
-
-    def complete_and_wait(self, drive='drive0', wait_ready=True):
-        '''Complete a block job and wait for it to finish'''
-        if wait_ready:
-            self.wait_ready(drive=drive)
-
-        result = self.vm.qmp('block-job-complete', device=drive)
-        self.assert_qmp(result, 'return', {})
-
-        event = self.wait_until_completed(drive=drive)
-        self.assert_qmp(event, 'data/type', 'mirror')
-
-class TestSingleDrive(ImageMirroringTestCase):
+class TestSingleDrive(iotests.QMPTestCase):
     image_len = 1 * 1024 * 1024 # MB
 
     def setUp(self):
@@ -221,17 +191,9 @@ class TestSingleDriveUnalignedLength(TestSingleDrive):
     test_small_buffer2 = None
     test_large_cluster = None
 
-class TestMirrorNoBacking(ImageMirroringTestCase):
+class TestMirrorNoBacking(iotests.QMPTestCase):
     image_len = 2 * 1024 * 1024 # MB
 
-    def complete_and_wait(self, drive='drive0', wait_ready=True):
-        iotests.create_image(target_backing_img, TestMirrorNoBacking.image_len)
-        return ImageMirroringTestCase.complete_and_wait(self, drive, wait_ready)
-
-    def compare_images(self, img1, img2):
-        iotests.create_image(target_backing_img, TestMirrorNoBacking.image_len)
-        return iotests.compare_images(img1, img2)
-
     def setUp(self):
         iotests.create_image(backing_img, TestMirrorNoBacking.image_len)
         qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % backing_img, test_img)
@@ -242,7 +204,10 @@ class TestMirrorNoBacking(ImageMirroringTestCase):
         self.vm.shutdown()
         os.remove(test_img)
         os.remove(backing_img)
-        os.remove(target_backing_img)
+        try:
+            os.remove(target_backing_img)
+        except:
+            pass
         os.remove(target_img)
 
     def test_complete(self):
@@ -257,7 +222,7 @@ class TestMirrorNoBacking(ImageMirroringTestCase):
         result = self.vm.qmp('query-block')
         self.assert_qmp(result, 'return[0]/inserted/file', target_img)
         self.vm.shutdown()
-        self.assertTrue(self.compare_images(test_img, target_img),
+        self.assertTrue(iotests.compare_images(test_img, target_img),
                         'target image does not match source after mirroring')
 
     def test_cancel(self):
@@ -272,7 +237,7 @@ class TestMirrorNoBacking(ImageMirroringTestCase):
         result = self.vm.qmp('query-block')
         self.assert_qmp(result, 'return[0]/inserted/file', test_img)
         self.vm.shutdown()
-        self.assertTrue(self.compare_images(test_img, target_img),
+        self.assertTrue(iotests.compare_images(test_img, target_img),
                         'target image does not match source after mirroring')
 
     def test_large_cluster(self):
@@ -283,7 +248,6 @@ class TestMirrorNoBacking(ImageMirroringTestCase):
                         %(TestMirrorNoBacking.image_len), target_backing_img)
         qemu_img('create', '-f', iotests.imgfmt, '-o', 'cluster_size=%d,backing_file=%s'
                         % (TestMirrorNoBacking.image_len, target_backing_img), target_img)
-        os.remove(target_backing_img)
 
         result = self.vm.qmp('drive-mirror', device='drive0',

[Qemu-devel] [PATCH 49/53] block: vpc - prevent overflow if max_table_entries >= 0x40000000

2015-07-30 Thread Michael Roth

From: Jeff Cody jc...@redhat.com

When we allocate the pagetable based on max_table_entries, we multiply
the max table entry value by 4 to accommodate a table of 32-bit integers.
However, max_table_entries is a uint32_t, and the VPC driver accepts
ranges for that entry over 0x40000000.  So during this allocation:

    s->pagetable = qemu_try_blockalign(bs->file, s->max_table_entries * 4);

The size arg overflows, allocating significantly less memory than
expected.

Since qemu_try_blockalign() size argument is size_t, cast the
multiplication correctly to prevent overflow.

The value of max_table_entries * 4 is used elsewhere in the code as
well, so store the correct value for use in all those cases.

We also check the Max Table Entries value, to make sure that it is less
than SIZE_MAX / 4, so we know the pagetable size will fit in size_t.
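
The wraparound described above can be reproduced outside the driver. The following is only a minimal sketch under stated assumptions: the helper names pagetable_size_unsafe() and pagetable_size_checked() are hypothetical and do not exist in block/vpc.c; they just isolate the 32-bit multiplication and the widened, range-checked variant.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: multiply in 32 bits, as the unpatched code did. */
static uint32_t pagetable_size_unsafe(uint32_t max_table_entries)
{
    /* The product is truncated to 32 bits, so 0x40000000 * 4 becomes 0. */
    return (uint32_t)(max_table_entries * 4u);
}

/*
 * Hypothetical helper: reject entry counts whose byte size would not fit
 * in size_t or INT_MAX, then multiply in 64 bits.
 * Returns false if the entry count is too large.
 */
static bool pagetable_size_checked(uint32_t max_table_entries, uint64_t *size)
{
    assert(size != NULL);

    if (max_table_entries > SIZE_MAX / 4 ||
        max_table_entries > INT_MAX / 4) {
        return false;
    }
    *size = (uint64_t)max_table_entries * 4;
    return true;
}
```

With these helpers, an entry count of 0x40000000 yields a size of 0 in the unsafe variant, while the checked variant refuses it outright.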

Cc: qemu-sta...@nongnu.org
Reported-by: Richard W.M. Jones rjo...@redhat.com
Signed-off-by: Jeff Cody jc...@redhat.com
Signed-off-by: Kevin Wolf kw...@redhat.com
(cherry picked from commit b15deac79530d818092cb49a8021bcce83d71b5b)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 block/vpc.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/block/vpc.c b/block/vpc.c
index 43e768e..8ab30d6 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -168,6 +168,7 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
     uint8_t buf[HEADER_SIZE];
     uint32_t checksum;
     uint64_t computed_size;
+    uint64_t pagetable_size;
     int disk_type = VHD_DYNAMIC;
     int ret;
 
@@ -269,7 +270,17 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
             goto fail;
         }
 
-        s->pagetable = qemu_try_blockalign(bs->file, s->max_table_entries * 4);
+        if (s->max_table_entries > SIZE_MAX / 4 ||
+            s->max_table_entries > (int) INT_MAX / 4) {
+            error_setg(errp, "Max Table Entries too large (%" PRId32 ")",
+                        s->max_table_entries);
+            ret = -EINVAL;
+            goto fail;
+        }
+
+        pagetable_size = (uint64_t) s->max_table_entries * 4;
+
+        s->pagetable = qemu_try_blockalign(bs->file, pagetable_size);
         if (s->pagetable == NULL) {
             ret = -ENOMEM;
             goto fail;
@@ -277,14 +288,13 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
 
         s->bat_offset = be64_to_cpu(dyndisk_header->table_offset);
 
-        ret = bdrv_pread(bs->file, s->bat_offset, s->pagetable,
-                         s->max_table_entries * 4);
+        ret = bdrv_pread(bs->file, s->bat_offset, s->pagetable, pagetable_size);
         if (ret < 0) {
             goto fail;
         }
 
         s->free_data_block_offset =
-            (s->bat_offset + (s->max_table_entries * 4) + 511) & ~511;
+            ROUND_UP(s->bat_offset + pagetable_size, 512);
 
         for (i = 0; i < s->max_table_entries; i++) {
             be32_to_cpus(&s->pagetable[i]);
-- 
1.9.1

Re: [Qemu-devel] [POC] colo-proxy in qemu

2015-07-30 Thread Dr. David Alan Gilbert

* Gonglei (arei.gong...@huawei.com) wrote:
 On 2015/7/30 19:56, Dr. David Alan Gilbert wrote:
  * Jason Wang (jasow...@redhat.com) wrote:
 
 
  On 07/30/2015 04:03 PM, Dr. David Alan Gilbert wrote:
  * Dong, Eddie (eddie.d...@intel.com) wrote:
  A question here, the packet comparing may be very tricky. For example,
  some protocol use random data to generate unpredictable id or
  something else. One example is ipv6_select_ident() in Linux. So COLO
  needs a mechanism to make sure PVM and SVM can generate same random
  data?
  Good question, the random data connection is a big problem for COLO. At
  present, it will trigger checkpoint processing because of the different 
  random
  data.
  I don't think any mechanisms can assure two different machines generate 
  the
  same random data. If you have any ideas, pls tell us :)
 
  Frequent checkpoint can handle this scenario, but maybe will cause the
  performance poor. :(
 
  The assumption is that, after VM checkpoint, SVM and PVM have identical 
  internal state, so the pattern used to generate random data has high 
  possibility to generate identical data at short time, at least...
  They do diverge pretty quickly though; I have simple examples which
  reliably cause a checkpoint because of simple randomness in applications.
 
  Dave
 
 
  And it will become even worse if hwrng is used in guest.
  
  Yes; it seems quite application dependent;  (on IPv4) an ssh connection,
  once established, tends to work well without triggering checkpoints;
  and static web pages also work well.  Examples of things that do cause
  more checkpoints are, displaying guest statistics (e.g. running top
  in that ssh) which is timing dependent, and dynamically generated
  web pages that include a unique ID (bugzilla's password reset link in
  it's front page was a fun one), I think also establishing
  new encrypted connections cause the same randomness.
  
  However, it's worth remembering that COLO is trying to reduce the
  number of checkpoints compared to a simple checkpointing world
  which would be aiming to do a checkpoint ~100 times a second,
  and for compute bound workloads, or ones that don't expose
  the randomness that much, it can get checkpoints of a few seconds
  in length which greatly reduces the overhead.
  
 
 Yes. That's the truth.
 We can set two different modes for different scenarios. Maybe Named
 1) frequent checkpoint mode for multi-connections and randomness scenarios
 and 2) non-frequent checkpoint mode for other scenarios.
 
 But that's the next plan, we are thinking about that.

I have some code that tries to automatically switch between those;
it measures the checkpoint lengths, and if they're consistently short
it sends a different message byte to the secondary at the start of the
checkpoint, so that it doesn't bother running.   Every so often it
then flips back to a COLO checkpoint to see if the checkpoints
are still really fast.

Dave

 
 Regards,
 -Gonglei
 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [POC] colo-proxy in qemu

2015-07-30 Thread zhanghailiang


On 2015/7/30 20:30, Dr. David Alan Gilbert wrote:

* Gonglei (arei.gong...@huawei.com) wrote:

On 2015/7/30 19:56, Dr. David Alan Gilbert wrote:

* Jason Wang (jasow...@redhat.com) wrote:



On 07/30/2015 04:03 PM, Dr. David Alan Gilbert wrote:

* Dong, Eddie (eddie.d...@intel.com) wrote:

A question here, the packet comparing may be very tricky. For example,
some protocol use random data to generate unpredictable id or
something else. One example is ipv6_select_ident() in Linux. So COLO
needs a mechanism to make sure PVM and SVM can generate same random

data?
Good question, the random data connection is a big problem for COLO. At
present, it will trigger checkpoint processing because of the different random
data.
I don't think any mechanisms can assure two different machines generate the
same random data. If you have any ideas, pls tell us :)

Frequent checkpoint can handle this scenario, but maybe will cause the
performance poor. :(


The assumption is that, after VM checkpoint, SVM and PVM have identical 
internal state, so the pattern used to generate random data has high 
possibility to generate identical data at short time, at least...

They do diverge pretty quickly though; I have simple examples which
reliably cause a checkpoint because of simple randomness in applications.

Dave



And it will become even worse if hwrng is used in guest.


Yes; it seems quite application dependent;  (on IPv4) an ssh connection,
once established, tends to work well without triggering checkpoints;
and static web pages also work well.  Examples of things that do cause
more checkpoints are, displaying guest statistics (e.g. running top
in that ssh) which is timing dependent, and dynamically generated
web pages that include a unique ID (bugzilla's password reset link in
it's front page was a fun one), I think also establishing
new encrypted connections cause the same randomness.

However, it's worth remembering that COLO is trying to reduce the
number of checkpoints compared to a simple checkpointing world
which would be aiming to do a checkpoint ~100 times a second,
and for compute bound workloads, or ones that don't expose
the randomness that much, it can get checkpoints of a few seconds
in length which greatly reduces the overhead.



Yes. That's the truth.
We can set two different modes for different scenarios. Maybe Named
1) frequent checkpoint mode for multi-connections and randomness scenarios
and 2) non-frequent checkpoint mode for other scenarios.

But that's the next plan, we are thinking about that.


I have some code that tries to automatically switch between those;
it measures the checkpoint lengths, and if they're consistently short
it sends a different message byte to the secondary at the start of the
checkpoint, so that it doesn't bother running.   Every so often it
then flips back to a COLO checkpoint to see if the checkpoints
are still really fast.



Do you mean that if checkpoint requests keep arriving, you do not do a COLO
checkpoint but just send a special message to the SVM, and then resume the
common COLO mode once the checkpoint lengths are no longer short?

Thanks.


Dave



Regards,
-Gonglei


--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


Re: [Qemu-devel] [sheepdog] [PATCH] sheepdog: fix overlapping metadata update

2015-07-30 Thread Jeff Cody

On Thu, Jul 30, 2015 at 09:41:08AM +0300, Vasiliy Tolstov wrote:
 2015-07-29 12:31 GMT+03:00 Liu Yuan namei.u...@gmail.com:
  Technically, it won't affect the performance because index updates are not 
  range
  but concrete in terms of underlying 4M block size. Only 2 or 3 indexes in a
  range will be updated and 90+% updates will be only 1. So if 2 updates 
  stride a
  large range, it will actually worse the performance of sheepdog because many
  additional unref of object will be executed by sheep internally.
 
  It is not a performance problem but more the right fix. Even with your 
  patch,
  updates of inode can overlap. You just don't allow overlapped requests go to
  sheepdog, which is a overkill approach. IMHO, we should only adjust to avoid
  the overlapped inode updates, which can be done easily and incrementally on 
  top
  of old code, rather than take on a complete new untested overkill 
  mechanism. So
  what we get from your patch? Covering the problem and lock every requests?
 
  Your patch actually fix nothing but just cover the problem by slowing down 
  the
  request and even with your patch, the problem still exists because inode 
  updates
  can overlap. Your commit log doesn't explain what is the real problem and 
  why
  your fix works. This is not your toy project that can commit whatever you 
  want.
 
  BTW, sheepdog project was already forked, why don't you fork the block
  driver, too?
 
  What makes you think you own the block driver?
 
  We forked the sheepdog project because it is low quality of code partly and 
  mostly
  some company tries to make it a private project. It is not as open source 
  friendly
  as before and that is the main reason Kazutaka and I chose to fork the 
  sheepdog
  project. But this doesn't mean we need to fork the QEMU project, it is an
  open source project and not your home toy.
 
  Kazutaka and I are the biggest contributers of both sheepdog and QEMU 
  sheepdog
  block driver for years, so I think I am eligible to review the patch and
  responsible to suggest the right fix. If you are pissed off when someone 
  else
  have other opinions, you can just fork the code and play with it at home or 
  you
  follow the rule of open source project.
 
 
 Jeff Cody, please be the judge, patch from Hitoshi solved my problem
 that i emailed in sheepdog list (i have test environment with 8 hosts
 on each 6 SSD disks and infiniband interconnect between hosts) before
 Hitoshi patch, massive writing to sheepdog storage breaks file system
 and corrupt it.
 After the patch i don't see issues.


I'd rather see some sort of consensus amongst Liu, Hitoshi, yourself, or
others more intimately familiar with sheepdog.

Right now, we have Hitoshi's patch in the main git repo, slated for
2.4 release (which is Monday).  It sounds, from Liu's email, as though
this may not fix the root cause.

Vasiliy said he would test Liu's patch; if he can confirm that this new
patch fixes the issue, then I would be inclined to use Liu's patch, based
on the detailed analysis of the issue in the commit message.

Thanks,
Jeff

[Qemu-devel] [PATCH 12/53] Revert "block: Fix unaligned zero write"

2015-07-30 Thread Michael Roth

This reverts commit fc3959e4669a1c2149b91ccb05101cfc7ae1fc05.

From upstream commit d01c07f:
  This reverts commit fc3959e4669a1c2149b91ccb05101cfc7ae1fc05.

  The core write code already handles the case, so remove this
  duplication.

  Because commit 61007b316 moved the touched code from block.c to
  block/io.c, the change is manually reverted.

  Signed-off-by: Fam Zheng f...@redhat.com
  Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
  Reviewed-by: Kevin Wolf kw...@redhat.com

v2.3.0 does not contain 61007b316 so we can revert the change
directly.

Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 block.c | 45 ++---
 1 file changed, 6 insertions(+), 39 deletions(-)

diff --git a/block.c b/block.c
index f2f8ae7..0fe97de 100644
--- a/block.c
+++ b/block.c
@@ -3118,19 +3118,6 @@ out:
     return ret;
 }
 
-static inline uint64_t bdrv_get_align(BlockDriverState *bs)
-{
-    /* TODO Lift BDRV_SECTOR_SIZE restriction in BlockDriver interface */
-    return MAX(BDRV_SECTOR_SIZE, bs->request_alignment);
-}
-
-static inline bool bdrv_req_is_aligned(BlockDriverState *bs,
-                                       int64_t offset, size_t bytes)
-{
-    int64_t align = bdrv_get_align(bs);
-    return !(offset & (align - 1) || (bytes & (align - 1)));
-}
-
 /*
  * Handle a read request in coroutine context
  */
@@ -3141,7 +3128,8 @@ static int coroutine_fn bdrv_co_do_preadv(BlockDriverState *bs,
     BlockDriver *drv = bs->drv;
     BdrvTrackedRequest req;
 
-    uint64_t align = bdrv_get_align(bs);
+    /* TODO Lift BDRV_SECTOR_SIZE restriction in BlockDriver interface */
+    uint64_t align = MAX(BDRV_SECTOR_SIZE, bs->request_alignment);
     uint8_t *head_buf = NULL;
     uint8_t *tail_buf = NULL;
     QEMUIOVector local_qiov;
@@ -3383,7 +3371,8 @@ static int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
     BdrvRequestFlags flags)
 {
     BdrvTrackedRequest req;
-    uint64_t align = bdrv_get_align(bs);
+    /* TODO Lift BDRV_SECTOR_SIZE restriction in BlockDriver interface */
+    uint64_t align = MAX(BDRV_SECTOR_SIZE, bs->request_alignment);
     uint8_t *head_buf = NULL;
     uint8_t *tail_buf = NULL;
     QEMUIOVector local_qiov;
@@ -3482,10 +3471,6 @@ static int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
         bytes = ROUND_UP(bytes, align);
     }
 
-    if (use_local_qiov) {
-        /* Local buffer may have non-zero data. */
-        flags &= ~BDRV_REQ_ZERO_WRITE;
-    }
     ret = bdrv_aligned_pwritev(bs, &req, offset, bytes,
                                use_local_qiov ? &local_qiov : qiov,
                                flags);
@@ -3526,32 +3511,14 @@ int coroutine_fn bdrv_co_write_zeroes(BlockDriverState *bs,
                                       int64_t sector_num, int nb_sectors,
                                       BdrvRequestFlags flags)
 {
-    int ret;
-
     trace_bdrv_co_write_zeroes(bs, sector_num, nb_sectors, flags);
 
     if (!(bs->open_flags & BDRV_O_UNMAP)) {
         flags &= ~BDRV_REQ_MAY_UNMAP;
     }
-    if (bdrv_req_is_aligned(bs, sector_num << BDRV_SECTOR_BITS,
-                            nb_sectors << BDRV_SECTOR_BITS)) {
-        ret = bdrv_co_do_writev(bs, sector_num, nb_sectors, NULL,
-                                BDRV_REQ_ZERO_WRITE | flags);
-    } else {
-        uint8_t *buf;
-        QEMUIOVector local_qiov;
-        size_t bytes = nb_sectors << BDRV_SECTOR_BITS;
-
-        buf = qemu_memalign(bdrv_opt_mem_align(bs), bytes);
-        memset(buf, 0, bytes);
-        qemu_iovec_init(&local_qiov, 1);
-        qemu_iovec_add(&local_qiov, buf, bytes);
 
-        ret = bdrv_co_do_writev(bs, sector_num, nb_sectors, &local_qiov,
-                                BDRV_REQ_ZERO_WRITE | flags);
-        qemu_vfree(buf);
-    }
-    return ret;
+    return bdrv_co_do_writev(bs, sector_num, nb_sectors, NULL,
+                             BDRV_REQ_ZERO_WRITE | flags);
 }
 
 /**
-- 
1.9.1

[Qemu-devel] [PATCH 18/53] kbd: add brazil kbd keys to x11 evdev map

2015-07-30 Thread Michael Roth

From: Gerd Hoffmann kra...@redhat.com

This patch adds the two extra brazilian keys to the evdev keymap for
X11.  This patch gets the two keys going with the vnc, gtk and sdl1
UIs.

The SDL2 library complains it doesn't know these keys, so the SDL2
library must be fixed before we can update ui/sdl2-keymap.h

Cc: qemu-sta...@nongnu.org
Signed-off-by: Gerd Hoffmann kra...@redhat.com
Reviewed-by: Markus Armbruster arm...@redhat.com
Reviewed-by: Daniel P. Berrange berra...@redhat.com
Reviewed-by: Michael Tokarev m...@tls.msk.ru
(cherry picked from commit 33aa30cafcce053b833f9fe09fbb88e2f54b93aa)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 ui/x_keymap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ui/x_keymap.c b/ui/x_keymap.c
index b9b0944..1a77317 100644
--- a/ui/x_keymap.c
+++ b/ui/x_keymap.c
@@ -94,7 +94,7 @@ static const uint8_t x_keycode_to_pc_keycode[115] = {
  */
 
 static const uint8_t evdev_keycode_to_pc_keycode[61] = {
-0, /*  97 EVDEV - RO   (Internet Keyboards) */
+0x73,  /*  97 EVDEV - RO   (Internet Keyboards) */
 0, /*  98 EVDEV - KATA (Katakana) */
 0, /*  99 EVDEV - HIRA (Hiragana) */
 0x79,  /* 100 EVDEV - HENK (Henkan) */
@@ -126,7 +126,7 @@ static const uint8_t evdev_keycode_to_pc_keycode[61] = {
 0, /* 126 EVDEV - I126 (Internet Keyboards) */
 0, /* 127 EVDEV - PAUS */
 0, /* 128 EVDEV -  */
-0, /* 129 EVDEV - I129 (Internet Keyboards) */
+0x7e,  /* 129 EVDEV - KP_COMMA (brazilian) */
 0xf1,  /* 130 EVDEV - HNGL (Korean Hangul Latin toggle) */
 0xf2,  /* 131 EVDEV - HJCV (Korean Hangul Hanja toggle) */
 0x7d,  /* 132 AE13 (Yen)*/
-- 
1.9.1

[Qemu-devel] [PATCH 17/53] kbd: add brazil kbd keys to qemu

2015-07-30 Thread Michael Roth

From: Gerd Hoffmann kra...@redhat.com

The brazilian computer keyboard layout has two extra keys (compared to
the usual 105-key intl ps/2 keyboard).  This patch makes these two keys
known to qemu.

For historic reasons qemu has two ways to specify a key:  A QKeyCode
(name-based) or a number (ps/2 scancode based).  Therefore we have to
update multiple places to make new keys known to qemu:

  (1) The QKeyCode definition in qapi-schema.json
  (2) The QKeyCode <-> number mapping table in ui/input-keymap.c

This patch does just that.  With this patch applied you can send those
two keys to the guest using the send-key monitor command.
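
The name-to-number mapping mentioned in (2) is a designated-initializer table. The following is a cut-down sketch of that idea, not the real table: the demo_ enum, array and lookup helpers are hypothetical stand-ins (the real QKeyCode enum is generated from qapi-schema.json), and only the four scancode values are taken from this series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the generated QKeyCode enum. */
enum demo_qcode {
    DEMO_QCODE_UNMAPPED,
    DEMO_QCODE_INSERT,
    DEMO_QCODE_DELETE,
    DEMO_QCODE_RO,
    DEMO_QCODE_KP_COMMA,
    DEMO_QCODE_MAX,
};

/* Name-based code -> ps/2 scancode number; unlisted entries default to 0. */
static const int demo_qcode_to_number[] = {
    [DEMO_QCODE_INSERT]   = 0xd2,
    [DEMO_QCODE_DELETE]   = 0xd3,
    [DEMO_QCODE_RO]       = 0x73,
    [DEMO_QCODE_KP_COMMA] = 0x7e,
    [DEMO_QCODE_MAX]      = 0,
};

/* Forward lookup: a plain bounds-checked array index. */
static int demo_lookup_number(int qcode)
{
    assert(qcode >= 0 && qcode < DEMO_QCODE_MAX);
    return demo_qcode_to_number[qcode];
}

/* Reverse lookup: linear scan, falling back to "unmapped". */
static int demo_lookup_qcode(int number)
{
    for (int i = 0; i < DEMO_QCODE_MAX; i++) {
        if (demo_qcode_to_number[i] != 0 &&
            demo_qcode_to_number[i] == number) {
            return i;
        }
    }
    return DEMO_QCODE_UNMAPPED;
}
```

Because unlisted slots are zero-initialized, a key that is added to the enum but not to the table silently maps to 0, which is why both places have to be updated together.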

Cc: qemu-sta...@nongnu.org
Signed-off-by: Gerd Hoffmann kra...@redhat.com
Reviewed-by: Markus Armbruster arm...@redhat.com
Reviewed-by: Daniel P. Berrange berra...@redhat.com
Reviewed-by: Michael Tokarev m...@tls.msk.ru
(cherry picked from commit b771f470f3e2f99f585eaae68147f0c849fd1f8d)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 qapi-schema.json  | 4 +++-
 ui/input-keymap.c | 4 
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/qapi-schema.json b/qapi-schema.json
index ac9594d..ddccd36 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2669,6 +2669,7 @@
 # Since: 1.3.0
 #
 # 'unmapped' and 'pause' since 2.0
+# 'ro' and 'kp_comma' since 2.4
 ##
 { 'enum': 'QKeyCode',
   'data': [ 'unmapped',
@@ -2686,7 +2687,8 @@
 'kp_9', 'less', 'f11', 'f12', 'print', 'home', 'pgup', 'pgdn', 'end',
 'left', 'up', 'down', 'right', 'insert', 'delete', 'stop', 'again',
 'props', 'undo', 'front', 'copy', 'open', 'paste', 'find', 'cut',
- 'lf', 'help', 'meta_l', 'meta_r', 'compose', 'pause' ] }
+'lf', 'help', 'meta_l', 'meta_r', 'compose', 'pause', 'ro',
+'kp_comma' ] }
 
 ##
 # @KeyValue
diff --git a/ui/input-keymap.c b/ui/input-keymap.c
index 5d29935..7635cb0 100644
--- a/ui/input-keymap.c
+++ b/ui/input-keymap.c
@@ -128,6 +128,10 @@ static const int qcode_to_number[] = {
 
 [Q_KEY_CODE_INSERT] = 0xd2,
 [Q_KEY_CODE_DELETE] = 0xd3,
+
+[Q_KEY_CODE_RO] = 0x73,
+[Q_KEY_CODE_KP_COMMA] = 0x7e,
+
 [Q_KEY_CODE_MAX] = 0,
 };
 
-- 
1.9.1

[Qemu-devel] [PATCH 19/53] qcow2: Set MIN_L2_CACHE_SIZE to 2

2015-07-30 Thread Michael Roth

From: Max Reitz mre...@redhat.com

The L2 cache must cover at least two L2 tables, because during COW two
L2 tables are accessed simultaneously.

Reported-by: Alexander Graf ag...@suse.de
Cc: qemu-stable qemu-sta...@nongnu.org
Signed-off-by: Max Reitz mre...@redhat.com
Tested-by: Alexander Graf ag...@suse.de
Reviewed-by: Alberto Garcia be...@igalia.com
Signed-off-by: Kevin Wolf kw...@redhat.com
(cherry picked from commit 57e216695948a79d9ced82fc217a37cce70fd986)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 block/qcow2.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 422b825..2f20949 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -62,7 +62,8 @@
 #define MIN_CLUSTER_BITS 9
 #define MAX_CLUSTER_BITS 21
 
-#define MIN_L2_CACHE_SIZE 1 /* cluster */
+/* Must be at least 2 to cover COW */
+#define MIN_L2_CACHE_SIZE 2 /* clusters */
 
 /* Must be at least 4 to cover all cases of refcount table growth */
 #define MIN_REFCOUNT_CACHE_SIZE 4 /* clusters */
-- 
1.9.1

[Qemu-devel] [PATCH 13/53] block: Fix NULL deference for unaligned write if qiov is NULL

2015-07-30 Thread Michael Roth

From: Fam Zheng f...@redhat.com

For zero write, callers pass in NULL qiov (qemu-io write -z or
scsi-disk write same).

Commit fc3959e466 fixed bdrv_co_write_zeroes which is the common case
for this bug, but it still exists in bdrv_aio_write_zeroes. A simpler
fix would be in bdrv_co_do_pwritev which is the NULL dereference point
and covers both cases.

So don't access it in bdrv_co_do_pwritev in this case, use three aligned
writes.
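
To make the "three aligned writes" concrete, here is a minimal sketch of how a zero-write request can be split into an unaligned head, an aligned middle and an unaligned tail. It only does the arithmetic and performs no I/O; struct zero_split and split_zero_write() are hypothetical names for this example, not functions from block.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: how many bytes each of the three writes zeroes. */
struct zero_split {
    uint64_t head;    /* bytes zeroed in the leading partial block (RMW) */
    uint64_t middle;  /* bytes covered by whole aligned blocks */
    uint64_t tail;    /* bytes zeroed in the trailing partial block (RMW) */
};

/* Split [offset, offset + bytes) for a power-of-two alignment. */
static struct zero_split split_zero_write(uint64_t offset, uint64_t bytes,
                                          uint64_t align)
{
    struct zero_split s = { 0, 0, 0 };
    uint64_t head_padding;

    assert(align != 0 && (align & (align - 1)) == 0);

    /* Distance of the start offset from the previous aligned boundary. */
    head_padding = offset & (align - 1);
    if (head_padding) {
        uint64_t room = align - head_padding;
        s.head = bytes < room ? bytes : room;
        bytes -= s.head;
    }

    /* Whatever is left now starts on an aligned boundary (or is empty). */
    s.middle = bytes & ~(align - 1);
    s.tail = bytes - s.middle;
    return s;
}
```

For a zero write of 512 bytes at offset 512 with 4k alignment, the whole request falls into the head part, so a single read-modify-write of the first 4k block is enough.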

[Initialize ret to 0 in bdrv_co_do_zero_pwritev() to avoid uninitialized
variable warning with gcc 4.9.2.
--Stefan]

Signed-off-by: Fam Zheng f...@redhat.com
Message-id: 1431522721-3266-3-git-send-email-f...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
(cherry picked from commit 9eeb6dd1b27bd57eb4e3869290e87feac8e8b226)
Conflicts:
block/io.c

* moved hunks into corresponding location in block.c due to lack of
  61007b316 in v2.3.0
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 block.c | 97 +++--
 1 file changed, 95 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 0fe97de..2b50dc7 100644
--- a/block.c
+++ b/block.c
@@ -3363,6 +3363,94 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
     return ret;
 }
 
+static int coroutine_fn bdrv_co_do_zero_pwritev(BlockDriverState *bs,
+                                                int64_t offset,
+                                                unsigned int bytes,
+                                                BdrvRequestFlags flags,
+                                                BdrvTrackedRequest *req)
+{
+    uint8_t *buf = NULL;
+    QEMUIOVector local_qiov;
+    struct iovec iov;
+    uint64_t align = MAX(BDRV_SECTOR_SIZE, bs->request_alignment);
+    unsigned int head_padding_bytes, tail_padding_bytes;
+    int ret = 0;
+
+    head_padding_bytes = offset & (align - 1);
+    tail_padding_bytes = align - ((offset + bytes) & (align - 1));
+
+
+    assert(flags & BDRV_REQ_ZERO_WRITE);
+    if (head_padding_bytes || tail_padding_bytes) {
+        buf = qemu_blockalign(bs, align);
+        iov = (struct iovec) {
+            .iov_base   = buf,
+            .iov_len    = align,
+        };
+        qemu_iovec_init_external(&local_qiov, &iov, 1);
+    }
+    if (head_padding_bytes) {
+        uint64_t zero_bytes = MIN(bytes, align - head_padding_bytes);
+
+        /* RMW the unaligned part before head. */
+        mark_request_serialising(req, align);
+        wait_serialising_requests(req);
+        BLKDBG_EVENT(bs, BLKDBG_PWRITEV_RMW_HEAD);
+        ret = bdrv_aligned_preadv(bs, req, offset & ~(align - 1), align,
+                                  align, &local_qiov, 0);
+        if (ret < 0) {
+            goto fail;
+        }
+        BLKDBG_EVENT(bs, BLKDBG_PWRITEV_RMW_AFTER_HEAD);
+
+        memset(buf + head_padding_bytes, 0, zero_bytes);
+        ret = bdrv_aligned_pwritev(bs, req, offset & ~(align - 1), align,
+                                   &local_qiov,
+                                   flags & ~BDRV_REQ_ZERO_WRITE);
+        if (ret < 0) {
+            goto fail;
+        }
+        offset += zero_bytes;
+        bytes -= zero_bytes;
+    }
+
+    assert(!bytes || (offset & (align - 1)) == 0);
+    if (bytes >= align) {
+        /* Write the aligned part in the middle. */
+        uint64_t aligned_bytes = bytes & ~(align - 1);
+        ret = bdrv_aligned_pwritev(bs, req, offset, aligned_bytes,
+                                   NULL, flags);
+        if (ret < 0) {
+            goto fail;
+        }
+        bytes -= aligned_bytes;
+        offset += aligned_bytes;
+    }
+
+    assert(!bytes || (offset & (align - 1)) == 0);
+    if (bytes) {
+        assert(align == tail_padding_bytes + bytes);
+        /* RMW the unaligned part after tail. */
+        mark_request_serialising(req, align);
+        wait_serialising_requests(req);
+        BLKDBG_EVENT(bs, BLKDBG_PWRITEV_RMW_TAIL);
+        ret = bdrv_aligned_preadv(bs, req, offset, align,
+                                  align, &local_qiov, 0);
+        if (ret < 0) {
+            goto fail;
+        }
+        BLKDBG_EVENT(bs, BLKDBG_PWRITEV_RMW_AFTER_TAIL);
+
+        memset(buf, 0, bytes);
+        ret = bdrv_aligned_pwritev(bs, req, offset, align,
+                                   &local_qiov, flags & ~BDRV_REQ_ZERO_WRITE);
+    }
+fail:
+    qemu_vfree(buf);
+    return ret;
+
+}
+
 /*
  * Handle a write request in coroutine context
  */
@@ -3403,6 +3491,11 @@ static int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
      */
     tracked_request_begin(&req, bs, offset, bytes, true);
 
+    if (!qiov) {
+        ret = bdrv_co_do_zero_pwritev(bs, offset, bytes, flags, &req);
+        goto out;
+    }
+
     if (offset & (align - 1)) {
         QEMUIOVector head_qiov;
         struct iovec head_iov;
@@ -3476,14 +3569,14 @@ static int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
                                flags);
 
 fail:
-

[Qemu-devel] [PATCH 20/53] iotests: qcow2 COW with minimal L2 cache size

2015-07-30 Thread Michael Roth

From: Max Reitz mre...@redhat.com

This adds a test case to test 103 for performing a COW operation in a
qcow2 image using an L2 cache with minimal size (which should be at
least two clusters so the COW can access both source and destination
simultaneously).

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Alberto Garcia be...@igalia.com
Signed-off-by: Kevin Wolf kw...@redhat.com
(cherry picked from commit a4291eafc597c0944057930acf3e51d899f79c2e)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 tests/qemu-iotests/103 | 10 ++
 tests/qemu-iotests/103.out |  5 +
 2 files changed, 15 insertions(+)

diff --git a/tests/qemu-iotests/103 b/tests/qemu-iotests/103
index ccab551..fa9a3c1 100755
--- a/tests/qemu-iotests/103
+++ b/tests/qemu-iotests/103
@@ -93,6 +93,16 @@ $QEMU_IO -c "open -o l2-cache-size=1M,refcount-cache-size=0.25M $TEST_IMG" \
          -c 'read -P 42 0 64k' \
     | _filter_qemu_io
 
+echo
+echo '=== Testing minimal L2 cache and COW ==='
+echo
+
+$QEMU_IMG snapshot -c foo "$TEST_IMG"
+# This requires a COW operation, which accesses two L2 tables simultaneously
+# (COW source and destination), so there must be enough space in the cache to
+# place both tables there (and qemu should not crash)
+$QEMU_IO -c "open -o cache-size=0 $TEST_IMG" -c 'write 0 64k' | _filter_qemu_io
+
 # success, all done
 echo '*** done'
 rm -f $seq.full
diff --git a/tests/qemu-iotests/103.out b/tests/qemu-iotests/103.out
index ee705b0..d05f49f 100644
--- a/tests/qemu-iotests/103.out
+++ b/tests/qemu-iotests/103.out
@@ -26,4 +26,9 @@ read 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Testing minimal L2 cache and COW ===
+
+wrote 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 *** done
-- 
1.9.1

[Qemu-devel] [PATCH 14/53] qemu-iotests: Test unaligned sub-block zero write

2015-07-30 Thread Michael Roth

From: Fam Zheng f...@redhat.com

Test zero write in byte range 512~1024 for 4k alignment.

Signed-off-by: Fam Zheng f...@redhat.com
Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
Reviewed-by: Kevin Wolf kw...@redhat.com
Message-id: 1431522721-3266-4-git-send-email-f...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
(cherry picked from commit ab53c44718305d3fde3d9d2251889f1cab694be2)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 tests/qemu-iotests/033 | 13 +
 tests/qemu-iotests/033.out | 30 ++
 2 files changed, 43 insertions(+)

diff --git a/tests/qemu-iotests/033 b/tests/qemu-iotests/033
index 4008f10..a61d8ce 100755
--- a/tests/qemu-iotests/033
+++ b/tests/qemu-iotests/033
@@ -78,6 +78,19 @@ for align in 512 4k; do
    echo
    echo "== verifying patterns (2) =="
    do_test $align "read -P 0x0 0x400 0x20000" "$TEST_IMG" | _filter_qemu_io
+
+   echo
+   echo "== rewriting unaligned zeroes =="
+   do_test $align "write -P 0xb 0x0 0x1000" "$TEST_IMG" | _filter_qemu_io
+   do_test $align "write -z 0x200 0x200" "$TEST_IMG" | _filter_qemu_io
+
+   echo
+   echo "== verifying patterns (3) =="
+   do_test $align "read -P 0xb 0x0 0x200" "$TEST_IMG" | _filter_qemu_io
+   do_test $align "read -P 0x0 0x200 0x200" "$TEST_IMG" | _filter_qemu_io
+   do_test $align "read -P 0xb 0x400 0xc00" "$TEST_IMG" | _filter_qemu_io
+
+   echo
 done
 
 # success, all done
diff --git a/tests/qemu-iotests/033.out b/tests/qemu-iotests/033.out
index 305949f..c3d18aa 100644
--- a/tests/qemu-iotests/033.out
+++ b/tests/qemu-iotests/033.out
@@ -27,6 +27,21 @@ wrote 65536/65536 bytes at offset 65536
 read 131072/131072 bytes at offset 1024
 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
+== rewriting unaligned zeroes ==
+wrote 4096/4096 bytes at offset 0
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 512
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== verifying patterns (3) ==
+read 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 512
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 3072/3072 bytes at offset 1024
+3 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+
 == preparing image ==
 wrote 1024/1024 bytes at offset 512
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
@@ -52,4 +67,19 @@ wrote 65536/65536 bytes at offset 65536
 == verifying patterns (2) ==
 read 131072/131072 bytes at offset 1024
 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== rewriting unaligned zeroes ==
+wrote 4096/4096 bytes at offset 0
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 512
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== verifying patterns (3) ==
+read 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 512
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 3072/3072 bytes at offset 1024
+3 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
 *** done
-- 
1.9.1

[Qemu-devel] [PATCH 27/53] vhost: correctly pass error to caller in vhost_dev_enable_notifiers()

2015-07-30 Thread Michael Roth

From: Jason Wang jasow...@redhat.com

We overwrite the error value r in fail_vq, so the caller cannot detect
the failure and may disable the notifiers twice if vhost fails to
start. Fix this by using another variable to keep track of the return
value of set_host_notifier().

Fixes b0b3db79559e57db340b292621c397e7a6cdbdc5 (vhost-net: cleanup
host notifiers at last step)

Cc: qemu-sta...@nongnu.org
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
Reviewed-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
(cherry picked from commit 16617e36b02ebdc83f215d89db9ac00f7d6d6d83)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 hw/virtio/vhost.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 54851b7..a7858d3 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -921,7 +921,7 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
     BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
     VirtioBusState *vbus = VIRTIO_BUS(qbus);
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
-    int i, r;
+    int i, r, e;
     if (!k->set_host_notifier) {
         fprintf(stderr, "binding does not support host notifiers\n");
         r = -ENOSYS;
@@ -939,12 +939,12 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
     return 0;
 fail_vq:
     while (--i >= 0) {
-        r = k->set_host_notifier(qbus->parent, hdev->vq_index + i, false);
-        if (r < 0) {
+        e = k->set_host_notifier(qbus->parent, hdev->vq_index + i, false);
+        if (e < 0) {
             fprintf(stderr, "vhost VQ %d notifier cleanup error: %d\n", i, -r);
             fflush(stderr);
         }
-        assert (r >= 0);
+        assert (e >= 0);
     }
 fail:
     return r;
-- 
1.9.1
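
The pattern this patch relies on (keep the original error in one variable and use a second one for cleanup results) can be sketched in isolation. This is a minimal illustration, not the vhost code: set_notifier(), enable_notifiers() and the fail_at parameter are hypothetical names introduced only for the example.

```c
#include <stdio.h>

/* Hypothetical stand-in for k->set_host_notifier(); always succeeds here. */
static int set_notifier(int queue, int enable)
{
    (void)queue;
    (void)enable;
    return 0;
}

/*
 * Enable notifiers for nvqs queues. fail_at is a hypothetical knob that
 * forces the enable step to fail with -5 at that queue index.
 */
static int enable_notifiers(int nvqs, int fail_at)
{
    int i, r, e;

    for (i = 0; i < nvqs; i++) {
        r = (i == fail_at) ? -5 : set_notifier(i, 1);
        if (r < 0) {
            goto fail_vq;
        }
    }
    return 0;

fail_vq:
    while (--i >= 0) {
        /* A separate variable keeps the original error in r intact. */
        e = set_notifier(i, 0);
        if (e < 0) {
            fprintf(stderr, "cleanup error on queue %d: %d\n", i, -e);
        }
    }
    return r;
}
```

If the cleanup loop stored its result in r instead, a successful cleanup would reset r to 0 and the caller would see success even though the enable step failed.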

[Qemu-devel] [PATCH 50/53] block: qemu-iotests - add check for multiplication overflow in vpc

2015-07-30 Thread Michael Roth

From: Jeff Cody jc...@redhat.com

This checks that VPC is able to successfully fail (without segfault)
on an image file with a max_table_entries that exceeds 0x40000000.

This table entry is within the valid range for VPC (although too large
for this sample image).

Cc: qemu-sta...@nongnu.org
Signed-off-by: Jeff Cody jc...@redhat.com
Signed-off-by: Kevin Wolf kw...@redhat.com
(cherry picked from commit 77c102c26ead946fe7589d4bddcdfa5cb431ebfe)
Conflicts:
tests/qemu-iotests/group

* removed context dependency on iotest not present in 2.3.0 group
  file

Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 tests/qemu-iotests/135|  54 ++
 tests/qemu-iotests/135.out|   5 +++
 tests/qemu-iotests/group  |   1 +
 tests/qemu-iotests/sample_images/afl5.img.bz2 | Bin 0 -> 175 bytes
 4 files changed, 60 insertions(+)
 create mode 100755 tests/qemu-iotests/135
 create mode 100644 tests/qemu-iotests/135.out
 create mode 100644 tests/qemu-iotests/sample_images/afl5.img.bz2

diff --git a/tests/qemu-iotests/135 b/tests/qemu-iotests/135
new file mode 100755
index 000..16bf736
--- /dev/null
+++ b/tests/qemu-iotests/135
@@ -0,0 +1,54 @@
+#!/bin/bash
+#
+# Test VPC open of image with large Max Table Entries value.
+#
+# Copyright (C) 2015 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see http://www.gnu.org/licenses/.
+#
+
+# creator
+owner=jc...@redhat.com
+
+seq=`basename $0`
+echo QA output created by $seq
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+
+_cleanup()
+{
+_cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt vpc
+_supported_proto generic
+_supported_os Linux
+
+_use_sample_img afl5.img.bz2
+
+echo
+echo === Verify image open and failure 
+$QEMU_IMG info "$TEST_IMG" 2>&1| _filter_testdir
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/135.out b/tests/qemu-iotests/135.out
new file mode 100644
index 000..793898b
--- /dev/null
+++ b/tests/qemu-iotests/135.out
@@ -0,0 +1,5 @@
+QA output created by 135
+
+=== Verify image open and failure 
+qemu-img: Could not open 'TEST_DIR/afl5.img': Max Table Entries too large (1073741825)
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 09e45f4..4c6d9ef 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -126,3 +126,4 @@
 128 rw auto quick
 130 rw auto quick
 132 rw auto quick
+135 rw auto
diff --git a/tests/qemu-iotests/sample_images/afl5.img.bz2 
b/tests/qemu-iotests/sample_images/afl5.img.bz2
new file mode 100644
index 
..1614348865e5b2cfcb0340eab9474841717be2c5
GIT binary patch
literal 175
zcmV;g08sxzT4*^jL0KkKSqT!KVgLXwfB*jgAVdfNFaTf(B!Frw|3pDR00;sy03ZSY
z3IG5B1Sp^YbSh$=r=c!nr#TgFeYj0XnKQ9RLxB?V^M@KMwpn4hJ!zOVwhn^RnWP
zlo2h)1BC$~JU|O6a~hBm5oG|a2~t!d6NaCTwor7gVwVQS9W$Kd2?dJH~Ej+J=Q^
dtom#MI_=bg;S5HeF^MqnF64@Ep$|^KE!naKsf*a

literal 0
HcmV?d1

-- 
1.9.1

[Qemu-devel] [PATCH 51/53] ide: Check array bounds before writing to io_buffer (CVE-2015-5154)

2015-07-30 Thread Michael Roth

From: Kevin Wolf kw...@redhat.com

If the end_transfer_func of a command is called because enough data has
been read or written for the current PIO transfer, and it fails to
correctly call the command completion functions, the DRQ bit in the
status register and s->end_transfer_func may remain set. This allows the
guest to access further bytes in s->io_buffer beyond s->data_end, and
eventually overflow the io_buffer.

One case where this currently happens is emulation of the ATAPI command
START STOP UNIT.

This patch fixes the problem by adding explicit array bounds checks
before accessing the buffer instead of relying on end_transfer_func to
function correctly.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Kevin Wolf kw...@redhat.com
Reviewed-by: John Snow js...@redhat.com
(cherry picked from commit d2ff85854512574e7209f295e87b0835d5b032c6)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 hw/ide/core.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index a895fd8..17153f5 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2021,6 +2021,10 @@ void ide_data_writew(void *opaque, uint32_t addr, uint32_t val)
     }
 
     p = s->data_ptr;
+    if (p + 2 > s->data_end) {
+        return;
+    }
+
     *(uint16_t *)p = le16_to_cpu(val);
     p += 2;
     s->data_ptr = p;
@@ -2042,6 +2046,10 @@ uint32_t ide_data_readw(void *opaque, uint32_t addr)
     }
 
     p = s->data_ptr;
+    if (p + 2 > s->data_end) {
+        return 0;
+    }
+
     ret = cpu_to_le16(*(uint16_t *)p);
     p += 2;
     s->data_ptr = p;
@@ -2063,6 +2071,10 @@ void ide_data_writel(void *opaque, uint32_t addr, uint32_t val)
     }
 
     p = s->data_ptr;
+    if (p + 4 > s->data_end) {
+        return;
+    }
+
     *(uint32_t *)p = le32_to_cpu(val);
     p += 4;
     s->data_ptr = p;
@@ -2084,6 +2096,10 @@ uint32_t ide_data_readl(void *opaque, uint32_t addr)
     }
 
     p = s->data_ptr;
+    if (p + 4 > s->data_end) {
+        return 0;
+    }
+
     ret = cpu_to_le32(*(uint32_t *)p);
     p += 4;
     s->data_ptr = p;
-- 
1.9.1
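
The bounds check added here can be sketched outside of QEMU as follows. This is an illustration under stated assumptions: struct pio_state and pio_writew() are hypothetical, simplified stand-ins for IDEState and ide_data_writew(), and the sketch compares the remaining space instead of forming p + 2 as the patch does.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the PIO fields of IDEState. */
struct pio_state {
    uint8_t buf[8];
    uint8_t *data_ptr;   /* next byte to transfer */
    uint8_t *data_end;   /* one past the last valid byte */
};

/* Store one 16-bit word; return 0 on success, -1 if it would not fit. */
static int pio_writew(struct pio_state *s, uint16_t val)
{
    uint8_t *p = s->data_ptr;

    /* Refuse the access explicitly instead of trusting a callback to stop it. */
    if ((size_t)(s->data_end - p) < sizeof(val)) {
        return -1;
    }
    memcpy(p, &val, sizeof(val));   /* memcpy avoids unaligned stores */
    s->data_ptr = p + sizeof(val);
    return 0;
}
```

The point is the same as in the patch: the accessor itself rejects an out-of-range transfer, so a missing completion call elsewhere cannot turn into a buffer overflow.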

[Qemu-devel] [PATCH 37/53] block/nfs: limit maximum readahead size to 1MB

2015-07-30 Thread Michael Roth

From: Peter Lieven p...@kamp.de

A malicious caller could otherwise specify a very
large value via the URI and force libnfs to allocate
a large amount of memory for the readahead buffer.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Lieven p...@kamp.de
Message-id: 1435317241-25585-1-git-send-email...@kamp.de
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
(cherry picked from commit 29c838cdc96c4d117f00c75bbcb941e1be9590fb)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 block/nfs.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/block/nfs.c b/block/nfs.c
index ca9e24e..c026ff6 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -35,6 +35,8 @@
 #include "sysemu/sysemu.h"
 #include <nfsc/libnfs.h>
 
+#define QEMU_NFS_MAX_READAHEAD_SIZE 1048576
+
 typedef struct NFSClient {
     struct nfs_context *context;
     struct nfsfh *fh;
@@ -327,6 +329,11 @@ static int64_t nfs_client_open(NFSClient *client, const char *filename,
             nfs_set_tcp_syncnt(client->context, val);
 #ifdef LIBNFS_FEATURE_READAHEAD
         } else if (!strcmp(qp->p[i].name, "readahead")) {
+            if (val > QEMU_NFS_MAX_READAHEAD_SIZE) {
+                error_report("NFS Warning: Truncating NFS readahead"
+                             " size to %d", QEMU_NFS_MAX_READAHEAD_SIZE);
+                val = QEMU_NFS_MAX_READAHEAD_SIZE;
+            }
             nfs_set_readahead(client->context, val);
 #endif
         } else {
-- 
1.9.1
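
Clamping an externally supplied size before handing it to an allocator can be sketched as below. The names MAX_READAHEAD_SIZE and clamp_readahead() are hypothetical and chosen only for this example; the limit mirrors the 1 MB value used in the patch.

```c
#include <stdio.h>

#define MAX_READAHEAD_SIZE 1048576   /* 1 MiB, mirroring the patch's limit */

/* Clamp a user-supplied readahead value and warn when it is truncated. */
static unsigned long clamp_readahead(unsigned long val)
{
    if (val > MAX_READAHEAD_SIZE) {
        fprintf(stderr, "warning: truncating readahead size to %d\n",
                MAX_READAHEAD_SIZE);
        return MAX_READAHEAD_SIZE;
    }
    return val;
}
```

Values at or below the limit pass through unchanged, so only oversized requests from the URI are affected.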

[Qemu-devel] [PATCH 35/53] iotests: Use event_wait in wait_ready

2015-07-30 Thread Michael Roth

From: Fam Zheng f...@redhat.com

Only poll the specific type of event we are interested in, to avoid
stealing events that should be consumed by someone else.

Suggested-by: John Snow js...@redhat.com
Signed-off-by: Fam Zheng f...@redhat.com
Reviewed-by: John Snow js...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
(cherry picked from commit d7b25297920d18fa2a2cde1ed21fde38a88c935f)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 tests/qemu-iotests/iotests.py | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index b1d0c51..05909da 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -290,13 +290,8 @@ class QMPTestCase(unittest.TestCase):
 
     def wait_ready(self, drive='drive0'):
         '''Wait until a block job BLOCK_JOB_READY event'''
-        ready = False
-        while not ready:
-            for event in self.vm.get_qmp_events(wait=True):
-                if event['event'] == 'BLOCK_JOB_READY':
-                    self.assert_qmp(event, 'data/type', 'mirror')
-                    self.assert_qmp(event, 'data/device', drive)
-                    ready = True
+        f = {'data': {'type': 'mirror', 'device': drive } }
+        event = self.vm.event_wait(name='BLOCK_JOB_READY', match=f)
 
     def wait_ready_and_cancel(self, drive='drive0'):
         self.wait_ready(drive=drive)
-- 
1.9.1

[Qemu-devel] [PATCH 52/53] ide/atapi: Fix START STOP UNIT command completion

2015-07-30 Thread Michael Roth

From: Kevin Wolf kw...@redhat.com

The command must be completed on all code paths. START STOP UNIT with
pwrcnd set should succeed without doing anything.

Signed-off-by: Kevin Wolf kw...@redhat.com
Reviewed-by: John Snow js...@redhat.com
(cherry picked from commit 03441c3a4a42beb25460dd11592539030337d0f8)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 hw/ide/atapi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index 950e311..79dd167 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -983,6 +983,7 @@ static void cmd_start_stop_unit(IDEState *s, uint8_t* buf)
 
 if (pwrcnd) {
 /* eject/load only happens for power condition == 0 */
+ide_atapi_cmd_ok(s);
 return;
 }
 
-- 
1.9.1

Re: [Qemu-devel] [POC] colo-proxy in qemu

2015-07-30 Thread Dr. David Alan Gilbert

* Jason Wang (jasow...@redhat.com) wrote:
 
 
 On 07/30/2015 04:03 PM, Dr. David Alan Gilbert wrote:
  * Dong, Eddie (eddie.d...@intel.com) wrote:
  A question here, the packet comparing may be very tricky. For example,
  some protocol use random data to generate unpredictable id or
  something else. One example is ipv6_select_ident() in Linux. So COLO
  needs a mechanism to make sure PVM and SVM can generate same random
  data?
  Good question, the random data connection is a big problem for COLO. At
  present, it will trigger checkpoint processing because of the different 
  random
  data.
  I don't think any mechanisms can assure two different machines generate 
  the
  same random data. If you have any ideas, pls tell us :)
 
  Frequent checkpoint can handle this scenario, but maybe will cause the
  performance poor. :(
 
  The assumption is that, after VM checkpoint, SVM and PVM have identical 
  internal state, so the pattern used to generate random data has high 
  possibility to generate identical data at short time, at least...
  They do diverge pretty quickly though; I have simple examples which
  reliably cause a checkpoint because of simple randomness in applications.
 
  Dave
 
 
 And it will become even worse if hwrng is used in guest.

Yes; it seems quite application dependent;  (on IPv4) an ssh connection,
once established, tends to work well without triggering checkpoints;
and static web pages also work well.  Examples of things that do cause
more checkpoints are, displaying guest statistics (e.g. running top
in that ssh) which is timing dependent, and dynamically generated
web pages that include a unique ID (bugzilla's password reset link in
its front page was a fun one), I think also establishing
new encrypted connections causes the same randomness.

However, it's worth remembering that COLO is trying to reduce the
number of checkpoints compared to a simple checkpointing world
which would be aiming to do a checkpoint ~100 times a second,
and for compute bound workloads, or ones that don't expose
the randomness that much, it can get checkpoints of a few seconds
in length which greatly reduces the overhead.

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH RFC v2 26/47] qapi-types: Convert to QAPISchemaVisitor, fixing flat unions

2015-07-30 Thread Eric Blake

On 07/30/2015 12:42 AM, Markus Armbruster wrote:

 But what happens if the 0th branch is mapped to a different parser, as
 would be the case if one of the alternate's branches is 'number'?  In
 particular, qmp_input_type_number() accepts BOTH QFloat and QInt types.
  So, if we have this qapi:
  { 'alternate': 'Foo', 'data': { 'a': 'str', 'b': 'number' } }
 but pass in an integer, visit_get_next_type() will see a qtype of QInt,
 but Foo_qtypes[QTYPE_QINT] will be 0 (due to default initialization) and
 we will wrongly try to visit the 0th branch (FOO_KIND_A) and fail (the
 string parser doesn't like ints) even though the parse should succeed by
 using the FOO_KIND_B branch.
 
 Yup, bug.

And it's an order-dependent bug - merely declaring 'b' first makes it
appear to work correctly.

 
 Interestingly, this means that if we ever write an alternate type that
 accepts both 'int' and 'number' (we have not attempted that so far),
 then the number branch will only be taken for inputs that don't also
 look like ints (normally, 'number' accepts anything numeric). Maybe that
 means we should document and enforce that 'number' and 'int' cannot be
 mixed in the same alternate?
 
 Even if we outlaw mixing the two, I'm afraid we still have this bug: an
 alternate with a 'number' member rejects input that gets parsed as
 QTYPE_QINT.
 
 Let's simply make alternates behave sanely:
 
   alternate has      case selected for
   'int'  'number'    QTYPE_QINT  QTYPE_QFLOAT
     no      no       error       error
     no      yes      'number'    'number'
     yes     no       'int'       error
     yes     yes      'int'       'number'

Works for me.
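
The table above amounts to a small selection rule, sketched here in C. The enum names and select_branch() are hypothetical, introduced only for this illustration; this is not the generated QAPI code.

```c
/* Hypothetical names for the input type and the chosen alternate branch. */
enum qtype { QT_INT, QT_FLOAT };
enum branch { BR_ERROR, BR_INT, BR_NUMBER };

/*
 * Pick the branch for an input of type in, given whether the alternate
 * has an 'int' member and/or a 'number' member.
 */
static enum branch select_branch(int has_int, int has_number, enum qtype in)
{
    if (in == QT_INT && has_int) {
        return BR_INT;          /* 'int' wins for integer input */
    }
    /* 'number' accepts both integer and floating-point input. */
    return has_number ? BR_NUMBER : BR_ERROR;
}
```

Each of the eight cells in the table corresponds to one call, which makes the cases straightforward to enumerate in a test.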


 
 + 1 works, because the element type is int, not BlockdevRefKind.  It's
 int so it can serve as argument for visit_get_next_type()'s parameter
 const int *qtypes.
 
 The + 1, - 1 business could be mildly confusing.  We could set all
 unused elements to -1 instead:

Or, we could ditch the qtypes lookup altogether, and merely create the
alternate enum as a non-consecutive QTYPE mapping, for one less level of
indirection, as in:

typedef enum BlockdevRefKind {
    BLOCKDEV_REF_DEFINITION = QTYPE_QOBJECT,
    BLOCKDEV_REF_REFERENCE = QTYPE_QSTRING,
} BlockdevRefKind;

then rewrite visit_get_next_type() to directly return the qtype, as well
as rewrite the generated switch statement in visit_type_BlockdevRef() to
directly inspect the qtypes it cares about.  In fact, that's the
approach I'm currently playing with.

 Add eight test cases from my table above, then fix the generator to
 make them pass.

I hope to post an RFC followup patch along those lines later today.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH for-2.4 0/3] scsi: fixes for failed requests

2015-07-30 Thread Paolo Bonzini



On 30/07/2015 15:16, Stefan Hajnoczi wrote:
 When requests fail the error policy (-drive rerror=,werror=) determines what
 happens.  The 'stop' policy pauses the guest and waits for the administrator 
 to
 resolve the storage problem.  It is possible to live migrate during this time
 and the failed requests can be restarted on the destination host.
 
 Two bugs:
 1. Segfault due to missing sgs mapping when loading migrated failed requests.
 2. Incorrect error action due to broken is_read logic.
 
 I also noticed that the unaligned WRITE SAME test case in
 tests/virtio-scsi-test.c is broken.  I've included a fix for that too.
 
 Stefan Hajnoczi (3):
   virtio-scsi: use virtqueue_map_sg() when loading requests
   scsi-disk: fix cmd.mode field typo
   tests: virtio-scsi: clear unit attention after reset
 
  hw/scsi/scsi-disk.c  |  2 +-
  hw/scsi/virtio-scsi.c|  5 +++
  tests/virtio-scsi-test.c | 90 
 +---
  3 files changed, 60 insertions(+), 37 deletions(-)
 

All good, thanks!  I'll send a pull request asap.

Paolo

[Qemu-devel] [PATCH] vhost/scsi: call vhost_dev_cleanup() at unrealize() time

2015-07-30 Thread Igor Mammedov

vhost-scsi calls vhost_dev_init() at realize() time
but forgets to call its counterpart vhost_dev_cleanup()
at unrealize() time.

Calling it should fix the leak of the memory table and
mem_sections table in the vhost device. It also unregisters
vhost's memory listener to prevent access from the
memory core to freed memory.

Signed-off-by: Igor Mammedov imamm...@redhat.com
---
 hw/scsi/vhost-scsi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index a69918b..0dd57ff 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -277,6 +277,7 @@ static void vhost_scsi_unrealize(DeviceState *dev, Error **errp)
     /* This will stop vhost backend. */
     vhost_scsi_set_status(vdev, 0);
 
+    vhost_dev_cleanup(&s->dev);
     g_free(s->dev.vqs);
 
     virtio_scsi_common_unrealize(dev, errp);
-- 
1.8.3.1

[Qemu-devel] [PATCH 44/53] mips/kvm: Sign extend registers written to KVM

2015-07-30 Thread Michael Roth

From: James Hogan james.ho...@imgtec.com

In case we're running on a 64-bit host, be sure to sign extend the
general purpose registers and hi/lo/pc before writing them to KVM, so as
to take advantage of MIPS32/MIPS64 compatibility.

Signed-off-by: James Hogan james.ho...@imgtec.com
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: Leon Alrae leon.al...@imgtec.com
Cc: Aurelien Jarno aurel...@aurel32.net
Cc: k...@vger.kernel.org
Cc: qemu-sta...@nongnu.org
Message-Id: 1429871214-23514-3-git-send-email-james.ho...@imgtec.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
(cherry picked from commit 02dae26ac4ceb1e82c432cfca4d9b65ae82343c6)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 target-mips/kvm.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/target-mips/kvm.c b/target-mips/kvm.c
index 1597bbe..d5388ca 100644
--- a/target-mips/kvm.c
+++ b/target-mips/kvm.c
@@ -633,12 +633,12 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 
     /* Set the registers based on QEMU's view of things */
     for (i = 0; i < 32; i++) {
-        regs.gpr[i] = env->active_tc.gpr[i];
+        regs.gpr[i] = (int64_t)(target_long)env->active_tc.gpr[i];
     }
 
-    regs.hi = env->active_tc.HI[0];
-    regs.lo = env->active_tc.LO[0];
-    regs.pc = env->active_tc.PC;
+    regs.hi = (int64_t)(target_long)env->active_tc.HI[0];
+    regs.lo = (int64_t)(target_long)env->active_tc.LO[0];
+    regs.pc = (int64_t)(target_long)env->active_tc.PC;
 
     ret = kvm_vcpu_ioctl(cs, KVM_SET_REGS, &regs);
 
-- 
1.9.1
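
The effect of the double cast can be seen in a small standalone sketch. It assumes a 32-bit target, where target_long would be a signed 32-bit type; target_long32, zero_extend() and sign_extend() are hypothetical names used only here.

```c
#include <stdint.h>

/* Assumption: on a 32-bit target, target_long is a signed 32-bit type. */
typedef int32_t target_long32;

/* Plain assignment of an unsigned 32-bit register value zero-extends. */
static uint64_t zero_extend(uint32_t reg)
{
    return reg;
}

/*
 * Going through the signed 32-bit type first replicates bit 31 into the
 * upper half. The uint32_t to int32_t conversion is implementation-defined
 * for values above INT32_MAX; mainstream compilers wrap as expected.
 */
static uint64_t sign_extend(uint32_t reg)
{
    return (uint64_t)(int64_t)(target_long32)reg;
}
```

For a kernel-segment address such as 0x80001000 the two functions differ in the upper 32 bits, which is the difference the patch cares about on a 64-bit host.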

[Qemu-devel] [PATCH 46/53] virtio-net: unbreak any layout

2015-07-30 Thread Michael Roth

From: Jason Wang jasow...@redhat.com

Commit 032a74a1c0fcdd5fd1c69e56126b4c857ee36611
(virtio-net: byteswap virtio-net header) breaks any layout by
requiring out_sg[0].iov_len >= n->guest_hdr_len. Fix this by
copying the header to a temporary buffer if a swap is needed, and then
using this buffer as part of out_sg.

Fixes 032a74a1c0fcdd5fd1c69e56126b4c857ee36611
(virtio-net: byteswap virtio-net header)
Cc: qemu-sta...@nongnu.org
Cc: c...@fr.ibm.com
Signed-off-by: Jason Wang jasow...@redhat.com
Reviewed-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com

(cherry picked from commit feb93f361739071778ca2d23df3876db399548f7)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 hw/net/virtio-net.c   | 23 ++-
 include/hw/virtio/virtio-access.h |  9 +
 2 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index b6fac9c..2d570e4 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1138,7 +1138,8 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
         ssize_t ret, len;
         unsigned int out_num = elem.out_num;
         struct iovec *out_sg = &elem.out_sg[0];
-        struct iovec sg[VIRTQUEUE_MAX_SIZE];
+        struct iovec sg[VIRTQUEUE_MAX_SIZE], sg2[VIRTQUEUE_MAX_SIZE + 1];
+        struct virtio_net_hdr_mrg_rxbuf mhdr;
 
         if (out_num < 1) {
             error_report("virtio-net header not in first element");
@@ -1146,13 +1147,25 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
         }
 
         if (n->has_vnet_hdr) {
-            if (out_sg[0].iov_len < n->guest_hdr_len) {
+            if (iov_to_buf(out_sg, out_num, 0, &mhdr, n->guest_hdr_len) <
+                n->guest_hdr_len) {
                 error_report("virtio-net header incorrect");
                 exit(1);
             }
-            virtio_net_hdr_swap(vdev, (void *) out_sg[0].iov_base);
+            if (virtio_needs_swap(vdev)) {
+                virtio_net_hdr_swap(vdev, (void *) &mhdr);
+                sg2[0].iov_base = &mhdr;
+                sg2[0].iov_len = n->guest_hdr_len;
+                out_num = iov_copy(&sg2[1], ARRAY_SIZE(sg2) - 1,
+                                   out_sg, out_num,
+                                   n->guest_hdr_len, -1);
+                if (out_num == VIRTQUEUE_MAX_SIZE) {
+                    goto drop;
+                }
+                out_num += 1;
+                out_sg = sg2;
+            }
         }
-
         /*
          * If host wants to see the guest header as is, we can
          * pass it on unchanged. Otherwise, copy just the parts
@@ -1182,7 +1195,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
         }
 
         len += ret;
-
+drop:
         virtqueue_push(q->tx_vq, &elem, 0);
         virtio_notify(vdev, q->tx_vq);
 
diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h
index 46456fd..f88f731 100644
--- a/include/hw/virtio/virtio-access.h
+++ b/include/hw/virtio/virtio-access.h
@@ -126,6 +126,15 @@ static inline uint64_t virtio_ldq_p(VirtIODevice *vdev, const void *ptr)
 }
 }
 
+static inline bool virtio_needs_swap(VirtIODevice *vdev)
+{
+#ifdef HOST_WORDS_BIGENDIAN
+return virtio_access_is_big_endian(vdev) ? false : true;
+#else
+return virtio_access_is_big_endian(vdev) ? true : false;
+#endif
+}
+
 static inline uint16_t virtio_tswap16(VirtIODevice *vdev, uint16_t s)
 {
 #ifdef HOST_WORDS_BIGENDIAN
-- 
1.9.1
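
The rule behind virtio_needs_swap() is that a swap is required exactly when device and host byte order differ. A rough standalone sketch follows; it detects host endianness at run time instead of using HOST_WORDS_BIGENDIAN, and host_is_big_endian(), needs_swap() and swap16_if_needed() are hypothetical names for this example only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Detect host byte order at run time by inspecting the first byte of 1. */
static bool host_is_big_endian(void)
{
    const uint16_t probe = 1;
    return *(const uint8_t *)&probe == 0;
}

/* A swap is needed exactly when device and host byte order differ. */
static bool needs_swap(bool device_is_big_endian)
{
    return device_is_big_endian != host_is_big_endian();
}

/* Byte-swap a 16-bit value only if the two byte orders differ. */
static uint16_t swap16_if_needed(bool device_is_big_endian, uint16_t v)
{
    return needs_swap(device_is_big_endian)
           ? (uint16_t)((v >> 8) | (v << 8))
           : v;
}
```

When no swap is needed the header can be passed through untouched, which is why the patch only copies it into a temporary buffer in the swapping case.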

[Qemu-devel] [PATCH 38/53] s390x/ipl: Fix boot if no bootindex was specified

2015-07-30 Thread Michael Roth

From: Christian Borntraeger borntrae...@de.ibm.com

commit fa92e218df1d (s390x/ipl: avoid sign extension) introduced
a regression:

qemu-system-s390x -drive file=image.qcow,format=qcow2
does not boot, the bios states
No virtio-blk device found!

adding bootindex=1 does boot.

The reason is that a uint32_t return value does not do the right
thing for return -1 (the default without bootindex).
The bios itself will interpret a 64-bit -1 as autodetect (but it will
interpret a 32-bit -1 as ccw device address ff.ff.ffff).

Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: Cornelia Huck cornelia.h...@de.ibm.com
Cc: qemu-sta...@nongnu.org # v2.3.0
Tested-by: Aurelien Jarno aurel...@aurel32.net
Reviewed-by: Aurelien Jarno aurel...@aurel32.net
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
(cherry picked from commit 6efd2c2a125b4369b8def585b0dac35c849b5eb3)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 hw/s390x/ipl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c
index 2e26d2a..754fb19 100644
--- a/hw/s390x/ipl.c
+++ b/hw/s390x/ipl.c
@@ -218,7 +218,7 @@ static Property s390_ipl_properties[] = {
  * - -1 if no valid boot device was found
  * - ccw id of the boot device otherwise
  */
-static uint32_t s390_update_iplstate(CPUS390XState *env, S390IPLState *ipl)
+static uint64_t s390_update_iplstate(CPUS390XState *env, S390IPLState *ipl)
 {
 DeviceState *dev_st;
 
@@ -248,7 +248,7 @@ static uint32_t s390_update_iplstate(CPUS390XState *env, S390IPLState *ipl)
 
     return -1;
 out:
-    return ipl->cssid << 24 | ipl->ssid << 16 | ipl->devno;
+    return (uint32_t) (ipl->cssid << 24 | ipl->ssid << 16 | ipl->devno);
 }
 
 int s390_ipl_update_diag308(IplParameterBlock *iplb)
-- 
1.9.1
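
Why the return type matters can be shown with a small sketch. The function names are hypothetical and only illustrate the widening behaviour described in the commit message.

```c
#include <stdint.h>

/* -1 returned through a 32-bit type becomes 0x00000000ffffffff when widened. */
static uint32_t not_found_u32(void)
{
    return (uint32_t)-1;
}

/* -1 returned through a 64-bit type keeps all 64 bits set. */
static uint64_t not_found_u64(void)
{
    return (uint64_t)-1;
}

/* Pack a ccw id into the low 32 bits only, as the explicit cast in the patch does. */
static uint64_t pack_ccw_id(uint8_t cssid, uint8_t ssid, uint16_t devno)
{
    return (uint32_t)((uint32_t)cssid << 24 | (uint32_t)ssid << 16 | devno);
}
```

A consumer expecting a 64-bit -1 as the "autodetect" marker would read the 32-bit variant as an ordinary device id with every field set to its maximum.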

[Qemu-devel] [PATCH 03/53] Strip brackets from vnc host

2015-07-30 Thread Michael Roth

From: Ján Tomko jto...@redhat.com

Commit v2.2.0-1530-ge556032 "vnc: switch to inet_listen_opts"
bypassed the use of inet_parse in inet_listen, making literal
IPv6 addresses enclosed in brackets fail:

qemu-kvm: -vnc [::1]:0: Failed to start VNC server on `(null)': address
resolution failed for [::1]:5900: Name or service not known

Strip the brackets to make it work again.

Signed-off-by: Ján Tomko jto...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
Signed-off-by: Gerd Hoffmann kra...@redhat.com
(cherry picked from commit 274c3b52e10466a4771d591f6298ef61e8354ce0)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 ui/vnc.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/ui/vnc.c b/ui/vnc.c
index cffb5b7..f989dfb 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -3482,7 +3482,14 @@ void vnc_display_open(const char *id, Error **errp)
 
     h = strrchr(vnc, ':');
     if (h) {
-        char *host = g_strndup(vnc, h - vnc);
+        char *host;
+        size_t hlen = h - vnc;
+
+        if (vnc[0] == '[' && vnc[hlen - 1] == ']') {
+            host = g_strndup(vnc + 1, hlen - 2);
+        } else {
+            host = g_strndup(vnc, hlen);
+        }
         qemu_opt_set(sopts, "host", host, &error_abort);
         qemu_opt_set(wsopts, "host", host, &error_abort);
         qemu_opt_set(sopts, "port", h+1, &error_abort);
-- 
1.9.1
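
The bracket stripping can be tried out in isolation with plain libc calls. This is a sketch, not the QEMU code: extract_host() is a hypothetical helper that uses malloc() in place of g_strndup().

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of the host part of "host:port", or NULL. */
static char *extract_host(const char *addr)
{
    const char *colon = strrchr(addr, ':');   /* last colon separates the port */
    const char *start = addr;
    size_t len;
    char *host;

    if (!colon) {
        return NULL;
    }
    len = (size_t)(colon - addr);
    /* A bracketed literal is at least "[]", so require len >= 2. */
    if (len >= 2 && addr[0] == '[' && addr[len - 1] == ']') {
        start = addr + 1;
        len -= 2;
    }
    host = malloc(len + 1);
    if (host) {
        memcpy(host, start, len);
        host[len] = '\0';
    }
    return host;
}
```

Using the last colon as the separator is what lets an IPv6 literal, which contains colons itself, pass through with only its brackets removed.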

[Qemu-devel] [PATCH 06/53] vmdk: Fix overflow if l1_size is 0x20000000

2015-07-30 Thread Michael Roth

From: Fam Zheng f...@redhat.com

Richard Jones caught this bug with afl fuzzer.

In fact, that's the only possible value to overflow (extent->l1_size =
0x20000000) l1_size:

l1_size = extent->l1_size * sizeof(long) => 0x80000000;

g_try_malloc returns NULL because l1_size is interpreted as negative
during type casting from 'int' to 'gsize', which yields an enormous
value. Hence, by coincidence, we get a not too bad behavior:

qemu-img: Could not open '/tmp/afl6.img': Could not open
'/tmp/afl6.img': Cannot allocate memory

Values larger than 0x20000000 will be refused by the validation in
vmdk_add_extent.

Values smaller than 0x20000000 will not overflow l1_size.

Cc: qemu-sta...@nongnu.org
Reported-by: Richard W.M. Jones rjo...@redhat.com
Signed-off-by: Fam Zheng f...@redhat.com
Reviewed-by: Max Reitz mre...@redhat.com
Tested-by: Richard W.M. Jones rjo...@redhat.com
Signed-off-by: Kevin Wolf kw...@redhat.com
(cherry picked from commit 13c4941cdd8685d28c7e3a09e393a5579b58db46)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 block/vmdk.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index bb093dd..bd74050 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -451,7 +451,8 @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
                             Error **errp)
 {
     int ret;
-    int l1_size, i;
+    size_t l1_size;
+    int i;
 
     /* read the L1 table */
     l1_size = extent->l1_size * sizeof(uint32_t);
-- 
1.9.1
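
The arithmetic behind the overflow can be checked with a short sketch. It assumes a 32-bit int, as on the platforms QEMU targets; l1_table_bytes() and fits_in_int() are hypothetical helpers for this illustration.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Byte size of an L1 table with the given number of 32-bit entries. */
static size_t l1_table_bytes(uint32_t entries)
{
    return (size_t)entries * sizeof(uint32_t);
}

/* Would that byte size fit in a signed int, as the old variable required? */
static int fits_in_int(uint32_t entries)
{
    return (uint64_t)entries * sizeof(uint32_t) <= (uint64_t)INT_MAX;
}
```

With 0x20000000 entries the table needs 0x80000000 bytes, one more than a 32-bit int can hold, while 0x1fffffff entries still fit. That matches the claim that this is the only entry count able to overflow the old variable.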

[Qemu-devel] [PATCH 08/53] usb: fix usb-net segfault

2015-07-30 Thread Michael Roth

From: Michal Kazior michal.kaz...@tieto.com

The dev->config pointer isn't set until the guest
system initializes USB devices (via
usb_desc_set_config). However, QEMU networking can
go through some motions prior to that, e.g.:

 #0  is_rndis (s=0x57261970) at hw/usb/dev-network.c:653
 #1  0x5585f723 in usbnet_can_receive (nc=0x5641e820) at hw/usb/dev-network.c:1315
 #2  0x5587635e in qemu_can_send_packet (sender=0x572660a0) at net/net.c:470
 #3  0x55878e34 in net_hub_port_can_receive (nc=0x562d7800) at net/hub.c:101
 #4  0x5587635e in qemu_can_send_packet (sender=0x562d7980) at net/net.c:470
 #5  0x5587dbca in tap_can_send (opaque=0x562d7980) at net/tap.c:172

The command to reproduce most reliably was:

 qemu-system-i386 -usb -device usb-net,vlan=0 -net tap,vlan=0

This wasn't strictly a problem with tap. Other
networking endpoints (vde, user) could trigger
this problem as well.

Fixes: https://bugs.launchpad.net/qemu/+bug/1050823
Cc: qemu-sta...@nongnu.org
Signed-off-by: Michal Kazior michal.kaz...@tieto.com
Signed-off-by: Gerd Hoffmann kra...@redhat.com
(cherry picked from commit 278412d0e710e2e848c6e510f8308e5b1ed4d03e)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 hw/usb/dev-network.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/usb/dev-network.c b/hw/usb/dev-network.c
index 1866991..9be3a64 100644
--- a/hw/usb/dev-network.c
+++ b/hw/usb/dev-network.c
@@ -1310,6 +1310,10 @@ static int usbnet_can_receive(NetClientState *nc)
 {
     USBNetState *s = qemu_get_nic_opaque(nc);
 
+    if (!s->dev.config) {
+        return 0;
+    }
+
     if (is_rndis(s) && s->rndis_state != RNDIS_DATA_INITIALIZED) {
         return 1;
     }
-- 
1.9.1

[Qemu-devel] [PATCH 02/53] block/iscsi: do not forget to logout from target

2015-07-30 Thread Michael Roth

From: Peter Lieven p...@kamp.de

We actually were always impolitely dropping the connection and
not cleanly logging out.

CC: qemu-sta...@nongnu.org
Signed-off-by: Peter Lieven p...@kamp.de
Message-id: 1429193313-4263-2-git-send-email...@kamp.de
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
Signed-off-by: Kevin Wolf kw...@redhat.com
(cherry picked from commit 20474e9aa040b9a255c63127f1eb873c29c54f68)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 block/iscsi.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/block/iscsi.c b/block/iscsi.c
index ba33290..be8af46 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1501,6 +1501,9 @@ out:
 
 if (ret) {
 if (iscsi != NULL) {
+if (iscsi_is_logged_in(iscsi)) {
+iscsi_logout_sync(iscsi);
+}
 iscsi_destroy_context(iscsi);
 }
 memset(iscsilun, 0, sizeof(IscsiLun));
@@ -1514,6 +1517,9 @@ static void iscsi_close(BlockDriverState *bs)
 struct iscsi_context *iscsi = iscsilun-iscsi;
 
 iscsi_detach_aio_context(bs);
+if (iscsi_is_logged_in(iscsi)) {
+iscsi_logout_sync(iscsi);
+}
 iscsi_destroy_context(iscsi);
 g_free(iscsilun->zeroblock);
 g_free(iscsilun->allocationmap);
-- 
1.9.1

[Qemu-devel] [PATCH 48/53] scsi: fix buffer overflow in scsi_req_parse_cdb (CVE-2015-5158)

2015-07-30 Thread Michael Roth

From: Paolo Bonzini pbonz...@redhat.com

This is a guest-triggerable buffer overflow present in QEMU 2.2.0
and newer.  scsi_cdb_length returns -1 as an error value, but the
caller does not check it.

Luckily, the massive overflow means that QEMU will just SIGSEGV,
making the impact much smaller.

Reported-by: Zhu Donghai (朱东海) donghai@alibaba-inc.com
Fixes: 1894df02811f6b79ea3ffbf1084599d96f316173
Reviewed-by: Fam Zheng f...@redhat.com
Cc: qemu-sta...@nongnu.org
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
(cherry picked from commit c170aad8b057223b1139d72e5ce7acceafab4fa9)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 hw/scsi/scsi-bus.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index bd2c0e4..0c506db 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -1239,10 +1239,15 @@ int scsi_cdb_length(uint8_t *buf) {
 int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf)
 {
 int rc;
+int len;
 
 cmd->lba = -1;
-cmd->len = scsi_cdb_length(buf);
+len = scsi_cdb_length(buf);
+if (len < 0) {
+return -1;
+}
 
+cmd->len = len;
 switch (dev->type) {
 case TYPE_TAPE:
 rc = scsi_req_stream_xfer(cmd, dev, buf);
-- 
1.9.1
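
The check added by the patch above can be illustrated with a minimal, standalone sketch. This is not QEMU code: the struct, the function names and the simplified group-code table are all assumptions made for illustration. The point is only that an error value of -1 has to be rejected before it is stored in a length field or used as a size.

```c
#include <stdint.h>

/* Hypothetical stand-in for a parsed command; not QEMU's SCSICommand. */
struct sketch_cmd {
    int len;
};

/*
 * Return the CDB length implied by the top three bits of the opcode,
 * or -1 when the group is not recognized (simplified, assumed table).
 */
static int sketch_cdb_length(const uint8_t *buf)
{
    switch (buf[0] >> 5) {
    case 0:
        return 6;
    case 1:
    case 2:
        return 10;
    case 4:
        return 16;
    case 5:
        return 12;
    default:
        return -1;
    }
}

/* Reject the command before the negative length is stored or used. */
static int sketch_parse_cdb(struct sketch_cmd *cmd, const uint8_t *buf)
{
    int len = sketch_cdb_length(buf);

    if (len < 0) {
        return -1;
    }
    cmd->len = len;
    return 0;
}
```

With this shape, an opcode from an unknown group leaves cmd->len untouched and the caller sees -1 instead of copying a buffer of bogus size.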

Re: [Qemu-devel] [PATCH] bsd-user: Fix operand to cpu_x86_exec

2015-07-30 Thread Peter Maydell

On 29 July 2015 at 22:22, Peter Maydell peter.mayd...@linaro.org wrote:
> On 29 July 2015 at 19:40, Richard Henderson r...@twiddle.net wrote:
>> Signed-off-by: Richard Henderson r...@twiddle.net
>> ---
>> This buglet, from whenever we re-orged the parameters, means
>> that all x86-bsd-user invocations die instantly.
>>
>>
>> r~
>> ---
>>  bsd-user/main.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/bsd-user/main.c b/bsd-user/main.c
>> index f46728b..ee68daa 100644
>> --- a/bsd-user/main.c
>> +++ b/bsd-user/main.c
>> @@ -173,7 +173,7 @@ void cpu_loop(CPUX86State *env)
>>  //target_siginfo_t info;
>>
>>  for(;;) {
>> -trapnr = cpu_x86_exec(env);
>> +trapnr = cpu_x86_exec(cs);
>>  switch(trapnr) {
>>  case 0x80:
>>  /* syscall from int $0x80 */
>
> Whoops. This sounds like it's worth putting into 2.4...
>
> Reviewed-by: Peter Maydell peter.mayd...@linaro.org

Applied, thanks.

-- PMM

[Qemu-devel] help

2015-07-30 Thread Serigne Baytir DIENG

Hi all,

I am new to QEMU, but I have a fairly good understanding of how it
works. I am trying to attach a new PCI device to the virtio PCI bus.
I don't know which files need to be changed or how to achieve that.

Can anyone help?

[Qemu-devel] help attach pci device to virtio bus

2015-07-30 Thread Serigne Baytir DIENG

Hi all,

I am new to QEMU, but I have a fairly good understanding of how it
works. I am trying to attach a new PCI device to the virtio PCI bus.
I don't know which files need to be changed or how to achieve that.

Can anyone help?

[Qemu-devel] simulate SMI in Qemu

2015-07-30 Thread Yu-Cheng Liu

Hello,
Does QEMU support SMI/SMM? I am using QEMU with coreboot to trace the
SMI/SMM procedure, but the program returns early from the SMM
initialization function. The reason is the value returned by
pci_read_word: the function always returns 0, whatever address I give it.

I want to simulate the board with QEMU, use coreboot as the BIOS, and then
trigger an SMI by writing a value to port b2h. Can this approach work, or
do I need to flash coreboot.rom onto a real motherboard?

If anyone has experience with this, please share it with me.
Thanks.

Re: [Qemu-devel] [PATCH] vhost/scsi: call vhost_dev_cleanup() at unrealize() time

2015-07-30 Thread Paolo Bonzini



On 30/07/2015 15:29, Igor Mammedov wrote:
> vhost-scsi calls vhost_dev_init() at realize() time
> but forgets to call its counterpart vhost_dev_cleanup()
> at unrealize() time.
>
> Calling it should fix leaking of memory table and
> mem_sections table in vhost device. And also unregister
> vhost's memory listener to prevent access from
> memory core to freed memory.
>
> Signed-off-by: Igor Mammedov imamm...@redhat.com
> ---
>  hw/scsi/vhost-scsi.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
> index a69918b..0dd57ff 100644
> --- a/hw/scsi/vhost-scsi.c
> +++ b/hw/scsi/vhost-scsi.c
> @@ -277,6 +277,7 @@ static void vhost_scsi_unrealize(DeviceState *dev, Error **errp)
>  /* This will stop vhost backend. */
>  vhost_scsi_set_status(vdev, 0);
>
> +vhost_dev_cleanup(&s->dev);
>  g_free(s->dev.vqs);
>
>  virtio_scsi_common_unrealize(dev, errp);
>

Applied to scsi-next, thanks.

Paolo

[Qemu-devel] [PATCH 11/53] fdc: force the fifo access to be in bounds of the allocated buffer

2015-07-30 Thread Michael Roth

From: Petr Matousek pmato...@redhat.com

During processing of certain commands such as FD_CMD_READ_ID and
FD_CMD_DRIVE_SPECIFICATION_COMMAND the fifo memory access could
get out of bounds leading to memory corruption with values coming
from the guest.

Fix this by making sure that the index is always bounded by the
allocated memory.

This is CVE-2015-3456.

Signed-off-by: Petr Matousek pmato...@redhat.com
Reviewed-by: John Snow js...@redhat.com
Signed-off-by: John Snow js...@redhat.com
(cherry picked from commit e907746266721f305d67bc0718795fedee2e824c)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 hw/block/fdc.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 2bf87c9..a9de4ab 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -1512,7 +1512,7 @@ static uint32_t fdctrl_read_data(FDCtrl *fdctrl)
 {
 FDrive *cur_drv;
 uint32_t retval = 0;
-int pos;
+uint32_t pos;
 
 cur_drv = get_cur_drv(fdctrl);
 fdctrl->dsr &= ~FD_DSR_PWRDOWN;
@@ -1521,8 +1521,8 @@ static uint32_t fdctrl_read_data(FDCtrl *fdctrl)
 return 0;
 }
 pos = fdctrl->data_pos;
+pos %= FD_SECTOR_LEN;
 if (fdctrl->msr & FD_MSR_NONDMA) {
-pos %= FD_SECTOR_LEN;
 if (pos == 0) {
 if (fdctrl->data_pos != 0)
 if (!fdctrl_seek_to_next_sect(fdctrl, cur_drv)) {
@@ -1867,10 +1867,13 @@ static void fdctrl_handle_option(FDCtrl *fdctrl, int 
direction)
 static void fdctrl_handle_drive_specification_command(FDCtrl *fdctrl, int 
direction)
 {
 FDrive *cur_drv = get_cur_drv(fdctrl);
+uint32_t pos;
 
-if (fdctrl->fifo[fdctrl->data_pos - 1] & 0x80) {
+pos = fdctrl->data_pos - 1;
+pos %= FD_SECTOR_LEN;
+if (fdctrl->fifo[pos] & 0x80) {
 /* Command parameters done */
-if (fdctrl->fifo[fdctrl->data_pos - 1] & 0x40) {
+if (fdctrl->fifo[pos] & 0x40) {
 fdctrl->fifo[0] = fdctrl->fifo[1];
 fdctrl->fifo[2] = 0;
 fdctrl->fifo[3] = 0;
@@ -1970,7 +1973,7 @@ static uint8_t command_to_handler[256];
 static void fdctrl_write_data(FDCtrl *fdctrl, uint32_t value)
 {
 FDrive *cur_drv;
-int pos;
+uint32_t pos;
 
 /* Reset mode */
 if (!(fdctrl->dor & FD_DOR_nRESET)) {
@@ -2019,7 +2022,9 @@ static void fdctrl_write_data(FDCtrl *fdctrl, uint32_t 
value)
 }
 
 FLOPPY_DPRINTF("%s: %02x\n", __func__, value);
-fdctrl->fifo[fdctrl->data_pos++] = value;
+pos = fdctrl->data_pos++;
+pos %= FD_SECTOR_LEN;
+fdctrl->fifo[pos] = value;
 if (fdctrl->data_pos == fdctrl->data_len) {
 /* We now have all parameters
  * and will be able to treat the command
-- 
1.9.1
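
The bounding technique used in the patch above can be shown in isolation. The sketch below is an illustration only: the helper names are made up, and the buffer length of 512 is an assumed stand-in for FD_SECTOR_LEN. It shows why the index is kept in an unsigned 32-bit variable: the subtraction wraps instead of going negative, and the modulo then always yields a valid index.

```c
#include <stdint.h>

/* Assumed buffer size, standing in for FD_SECTOR_LEN. */
#define SKETCH_FIFO_LEN 512u

/* Map any guest-controlled position onto a valid FIFO index. */
static uint32_t sketch_fifo_index(uint32_t data_pos)
{
    return data_pos % SKETCH_FIFO_LEN;
}

/* Index of the most recently written byte; in bounds even for data_pos == 0. */
static uint32_t sketch_last_index(uint32_t data_pos)
{
    uint32_t pos = data_pos - 1; /* wraps to UINT32_MAX when data_pos is 0 */

    return pos % SKETCH_FIFO_LEN;
}
```

With a signed int, data_pos - 1 would be -1 for data_pos == 0, and the remainder would stay negative; with uint32_t the result is always within 0..511.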

[Qemu-devel] Patch Round-up for stable 2.3.1, freeze on 2015-08-06

2015-07-30 Thread Michael Roth

Hi everyone,

The following new patches are queued for QEMU stable v2.3.1:

  https://github.com/mdroth/qemu/commits/stable-2.3-staging

The release is planned for 2015-08-11:

  http://wiki.qemu.org/Planning/2.3

Please respond here or CC qemu-sta...@nongnu.org on any patches you
think should be included in the release.

Testing/feedback is greatly appreciated.

Thanks!


Alberto Garcia (1):
  sdl2: fix crash in handle_windowevent() when restoring the screen size

Alex Williamson (2):
  vfio/pci: Fix RTL8168 NIC quirks
  vfio/pci: Fix bootindex

Bogdan Purcareata (1):
  nbd/trivial: fix type cast for ioctl

Christian Borntraeger (1):
  s390x/ipl: Fix boot if no bootindex was specified

Cornelia Huck (1):
  virtio-ccw: complete handling of guest-initiated resets

David Gibson (1):
  spapr_vty: lookup should only return valid VTY objects

Fam Zheng (14):
  vmdk: Fix next_cluster_sector for compressed write
  vmdk: Fix overflow if l1_size is 0x2000
  block: Fix NULL deference for unaligned write if qiov is NULL
  qemu-iotests: Test unaligned sub-block zero write
  vmdk: Fix index_in_cluster calculation in vmdk_co_get_block_status
  vmdk: Use vmdk_find_index_in_cluster everywhere
  block: Add bdrv_get_block_status_above
  qmp: Add optional bool "unmap" to drive-mirror
  mirror: Do zero write on target if sectors not allocated
  block: Fix dirty bitmap in bdrv_co_discard
  qemu-iotests: Make block job methods common
  qemu-iotests: Add test case for mirror with unmap
  iotests: Use event_wait in wait_ready
  block: Initialize local_err in bdrv_append_temp_snapshot

Gerd Hoffmann (3):
  kbd: add brazil kbd keys to qemu
  kbd: add brazil kbd keys to x11 evdev map
  spice-display: fix segfault in qemu_spice_create_update

James Hogan (2):
  mips/kvm: Fix Big endian 32-bit register access
  mips/kvm: Sign extend registers written to KVM

Jason Wang (3):
  virtio-net: fix the upper bound when trying to delete queues
  vhost: correctly pass error to caller in vhost_dev_enable_notifiers()
  virtio-net: unbreak any layout

Jeff Cody (2):
  block: vpc - prevent overflow if max_table_entries = 0x4000
  block: qemu-iotests - add check for multiplication overflow in vpc

John Snow (1):
  iotests: add QMP event waiting queue

Justin Ossevoort (1):
  qga/commands-posix: Fix bug in guest-fstrim

Ján Tomko (1):
  Strip brackets from vnc host

Kevin Wolf (4):
  qcow2: Flush pending discards before allocating cluster
  ide: Check array bounds before writing to io_buffer (CVE-2015-5154)
  ide/atapi: Fix START STOP UNIT command completion
  ide: Clear DRQ after handling all expected accesses

Laszlo Ersek (1):
  hw/core: rebase sysbus_get_fw_dev_path() to g_strdup_printf()

Max Reitz (2):
  qcow2: Set MIN_L2_CACHE_SIZE to 2
  iotests: qcow2 COW with minimal L2 cache size

Michael Roth (2):
  Revert "block: Fix unaligned zero write"
  target-ppc: fix hugepage support when using memory-backend-file

Michal Kazior (1):
  usb: fix usb-net segfault

Paolo Bonzini (1):
  scsi: fix buffer overflow in scsi_req_parse_cdb (CVE-2015-5158)

Peter Lieven (2):
  block/iscsi: do not forget to logout from target
  block/nfs: limit maximum readahead size to 1MB

Peter Maydell (1):
  target-arm: Avoid buffer overrun on UNPREDICTABLE ldrd/strd

Petr Matousek (2):
  fdc: force the fifo access to be in bounds of the allocated buffer
  i8254: fix out-of-bounds memory access in pit_ioport_read()

Shannon Zhao (1):
  hw/acpi/aml-build: Fix memory leak

Stefan Hajnoczi (1):
  bt-sdp: fix broken uuids power-of-2 calculation

马文霜 (1):
  Fix irq route entries exceeding KVM_MAX_IRQ_ROUTES

 block.c   | 215 
+++---
 block/iscsi.c |   6 ++
 block/mirror.c|  28 ++--
 block/nfs.c   |   7 ++
 block/qcow2-refcount.c|   5 ++
 block/qcow2.h |   3 +-
 block/vmdk.c  |  40 
 block/vpc.c   |  18 --
 blockdev.c|   5 ++
 hmp.c |   2 +-
 hw/acpi/aml-build.c   |   1 +
 hw/block/fdc.c|  17 +++--
 hw/bt/sdp.c   |   2 +-
 hw/char/spapr_vty.c   |   4 ++
 hw/core/sysbus.c  |  16 ++---
 hw/ide/atapi.c|   1 +
 hw/ide/core.c |  32 --
 hw/net/virtio-net.c   |  25 ++--

[Qemu-devel] [PATCH 09/53] virtio-net: fix the upper bound when trying to delete queues

2015-07-30 Thread Michael Roth

From: Jason Wang jasow...@redhat.com

Virtqueues are indexed from zero, so don't delete the virtqueue whose
index is n->max_queues * 2 + 1.

Cc: Michael S. Tsirkin m...@redhat.com
Cc: qemu-stable qemu-sta...@nongnu.org
Signed-off-by: Jason Wang jasow...@redhat.com
Reviewed-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com

(cherry picked from commit 27a46dcf5038e20451101ed2d5414aebf3846e27)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 hw/net/virtio-net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 59f76bc..b6fac9c 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1309,7 +1309,7 @@ static void virtio_net_set_multiqueue(VirtIONet *n, int 
multiqueue)

 n->multiqueue = multiqueue;
 
-for (i = 2; i <= n->max_queues * 2 + 1; i++) {
+for (i = 2; i < n->max_queues * 2 + 1; i++) {
 virtio_del_queue(vdev, i);
 }

-- 
1.9.1
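
The off-by-one fixed above is easy to check with a small standalone sketch. The helper below is hypothetical and only counts which indices the loop would visit. It assumes, for illustration, that a device with max_queues queue pairs has max_queues * 2 + 1 virtqueues in total, so the valid indices run from 0 to max_queues * 2.

```c
/*
 * Count the queue indices visited by the deletion loop and report the
 * highest one.  'inclusive' selects the old (<=) or the new (<) bound.
 */
static int sketch_visited(int max_queues, int inclusive, int *highest)
{
    int limit = max_queues * 2 + 1;
    int count = 0;
    int i;

    *highest = -1;
    for (i = 2; inclusive ? i <= limit : i < limit; i++) {
        *highest = i;
        count++;
    }
    return count;
}
```

For max_queues = 2 the exclusive bound stops at index 4, the last valid queue under the assumption above, while the inclusive bound also touches index 5.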

[Qemu-devel] [PATCH 15/53] hw/acpi/aml-build: Fix memory leak

2015-07-30 Thread Michael Roth

From: Shannon Zhao shannon.z...@linaro.org

Signed-off-by: Shannon Zhao zhaoshengl...@huawei.com
Signed-off-by: Shannon Zhao shannon.z...@linaro.org
Reviewed-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
Reviewed-by: Igor Mammedov imamm...@redhat.com
(cherry picked from commit afcf905cff7971324c2706600ead35a1f41f417a)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 hw/acpi/aml-build.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index d7945f6..41ff6a3 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -304,6 +304,7 @@ static void aml_free(gpointer data, gpointer user_data)
 {
 Aml *var = data;
 build_free_array(var->buf);
+g_free(var);
 }
 
 Aml *init_aml_allocator(void)
-- 
1.9.1

[Qemu-devel] [PATCH 36/53] iotests: add QMP event waiting queue

2015-07-30 Thread Michael Roth

From: John Snow js...@redhat.com

A filter is added to allow callers to request very specific
events to be pulled from the event queue, while leaving undesired
events still in the stream.

This allows us to poll for completion data for multiple asynchronous
events in any arbitrary order.

A new timeout context is added to the qmp pull_event method's
wait parameter to allow tests to fail if they do not complete
within some expected period of time.

Also fixed is a bug in qmp.pull_event where we try to retrieve an event
from an empty list if we attempt to retrieve an event with wait=False
but no events have occurred.

Signed-off-by: John Snow js...@redhat.com
Reviewed-by: Max Reitz mre...@redhat.com
Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
Message-id: 1429314609-29776-19-git-send-email-js...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
Signed-off-by: Kevin Wolf kw...@redhat.com
(cherry picked from commit 7898f74e78a5900fc079868e255b65d807fa8a8f)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 scripts/qmp/qmp.py| 95 +--
 tests/qemu-iotests/iotests.py | 38 +
 2 files changed, 103 insertions(+), 30 deletions(-)

diff --git a/scripts/qmp/qmp.py b/scripts/qmp/qmp.py
index 20b6ec7..1d38e3e 100644
--- a/scripts/qmp/qmp.py
+++ b/scripts/qmp/qmp.py
@@ -21,6 +21,9 @@ class QMPConnectError(QMPError):
 class QMPCapabilitiesError(QMPError):
 pass
 
+class QMPTimeoutError(QMPError):
+pass
+
 class QEMUMonitorProtocol:
 def __init__(self, address, server=False):
 
@@ -72,6 +75,44 @@ class QEMUMonitorProtocol:
 
 error = socket.error
 
+def __get_events(self, wait=False):
+"""
+Check for new events in the stream and cache them in __events.
+
+@param wait (bool): block until an event is available.
+@param wait (float): If wait is a float, treat it as a timeout value.
+
+@raise QMPTimeoutError: If a timeout float is provided and the timeout
+period elapses.
+@raise QMPConnectError: If wait is True but no events could be retrieved
+or if some other error occurred.
+"""
+
+# Check for new events regardless and pull them into the cache:
+self.__sock.setblocking(0)
+try:
+self.__json_read()
+except socket.error, err:
+if err[0] == errno.EAGAIN:
+# No data available
+pass
+self.__sock.setblocking(1)
+
+# Wait for new events, if needed.
+# if wait is 0.0, this means "no wait" and is also implicitly false.
+if not self.__events and wait:
+if isinstance(wait, float):
+self.__sock.settimeout(wait)
+try:
+ret = self.__json_read(only_event=True)
+except socket.timeout:
+raise QMPTimeoutError("Timeout waiting for event")
+except:
+raise QMPConnectError("Error while reading from socket")
+if ret is None:
+raise QMPConnectError("Error while reading from socket")
+self.__sock.settimeout(None)
+
 def connect(self, negotiate=True):
 
 Connect to the QMP Monitor and perform capabilities negotiation.
@@ -140,43 +181,37 @@ class QEMUMonitorProtocol:
 
 Get and delete the first available QMP event.
 
-@param wait: block until an event is available (bool)
+@param wait (bool): block until an event is available.
+@param wait (float): If wait is a float, treat it as a timeout value.
+
+@raise QMPTimeoutError: If a timeout float is provided and the timeout
+period elapses.
+@raise QMPConnectError: If wait is True but no events could be 
retrieved
+or if some other error occurred.
+
+@return The first available QMP event, or None.
 
-self.__sock.setblocking(0)
-try:
-self.__json_read()
-except socket.error, err:
-if err[0] == errno.EAGAIN:
-# No data available
-pass
-self.__sock.setblocking(1)
-if not self.__events and wait:
-self.__json_read(only_event=True)
-event = self.__events[0]
-del self.__events[0]
-return event
+self.__get_events(wait)
+
+if self.__events:
+return self.__events.pop(0)
+return None
 
 def get_events(self, wait=False):
 
 Get a list of available QMP events.
 
-@param wait: block until an event is available (bool)
-
-self.__sock.setblocking(0)
-try:
-self.__json_read()
-except socket.error, err:
-if err[0] == errno.EAGAIN:
-# No data available
-pass
-self.__sock.setblocking(1)
-if

[Qemu-devel] [PATCH 29/53] block: Add bdrv_get_block_status_above

2015-07-30 Thread Michael Roth

From: Fam Zheng f...@redhat.com

Like bdrv_is_allocated_above, this function follows the backing chain until
seeing BDRV_BLOCK_ALLOCATED.  Base is not included.

Reimplement bdrv_is_allocated on top.

[Initialized bdrv_co_get_block_status_above() ret to 0 to silence
mingw64 compiler warning about the uninitialized variable.  assert(bs !=
base) prevents that case but I suppose the program could be compiled
with -DNDEBUG.
--Stefan]

Signed-off-by: Fam Zheng f...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
(cherry picked from commit ba3f0e2545c365ebe1dbddb0e53058710d41881e)
Conflicts:
block/io.c

* applied manually to avoid dependency on 61007b316
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 block.c   | 56 +--
 include/block/block.h |  4 
 2 files changed, 49 insertions(+), 11 deletions(-)

diff --git a/block.c b/block.c
index 2b50dc7..63dd460 100644
--- a/block.c
+++ b/block.c
@@ -4202,28 +4202,54 @@ static int64_t coroutine_fn 
bdrv_co_get_block_status(BlockDriverState *bs,
 return ret;
 }
 
-/* Coroutine wrapper for bdrv_get_block_status() */
-static void coroutine_fn bdrv_get_block_status_co_entry(void *opaque)
+static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState 
*bs,
+BlockDriverState *base,
+int64_t sector_num,
+int nb_sectors,
+int *pnum)
+{
+BlockDriverState *p;
+int64_t ret = 0;
+
+assert(bs != base);
+for (p = bs; p != base; p = p->backing_hd) {
+ret = bdrv_co_get_block_status(p, sector_num, nb_sectors, pnum);
+if (ret < 0 || ret & BDRV_BLOCK_ALLOCATED) {
+break;
+}
+/* [sector_num, pnum] unallocated on this layer, which could be only
+ * the first part of [sector_num, nb_sectors].  */
+nb_sectors = MIN(nb_sectors, *pnum);
+}
+return ret;
+}
+
+/* Coroutine wrapper for bdrv_get_block_status_above() */
+static void coroutine_fn bdrv_get_block_status_above_co_entry(void *opaque)
 {
 BdrvCoGetBlockStatusData *data = opaque;
-BlockDriverState *bs = data->bs;
 
-data->ret = bdrv_co_get_block_status(bs, data->sector_num, data->nb_sectors,
- data->pnum);
+data->ret = bdrv_co_get_block_status_above(data->bs, data->base,
+   data->sector_num,
+   data->nb_sectors,
+   data->pnum);
 data->done = true;
 }
 
 /*
- * Synchronous wrapper around bdrv_co_get_block_status().
+ * Synchronous wrapper around bdrv_co_get_block_status_above().
  *
- * See bdrv_co_get_block_status() for details.
+ * See bdrv_co_get_block_status_above() for details.
  */
-int64_t bdrv_get_block_status(BlockDriverState *bs, int64_t sector_num,
-  int nb_sectors, int *pnum)
+int64_t bdrv_get_block_status_above(BlockDriverState *bs,
+BlockDriverState *base,
+int64_t sector_num,
+int nb_sectors, int *pnum)
 {
 Coroutine *co;
 BdrvCoGetBlockStatusData data = {
 .bs = bs,
+.base = base,
 .sector_num = sector_num,
 .nb_sectors = nb_sectors,
 .pnum = pnum,
@@ -4232,11 +4258,11 @@ int64_t bdrv_get_block_status(BlockDriverState *bs, 
int64_t sector_num,
 
 if (qemu_in_coroutine()) {
 /* Fast-path if already in coroutine context */
-bdrv_get_block_status_co_entry(&data);
+bdrv_get_block_status_above_co_entry(&data);
 } else {
 AioContext *aio_context = bdrv_get_aio_context(bs);
 
-co = qemu_coroutine_create(bdrv_get_block_status_co_entry);
+co = qemu_coroutine_create(bdrv_get_block_status_above_co_entry);
 qemu_coroutine_enter(co, &data);
 while (!data.done) {
 aio_poll(aio_context, true);
@@ -4245,6 +4271,14 @@ int64_t bdrv_get_block_status(BlockDriverState *bs, 
int64_t sector_num,
 return data.ret;
 }
 
+int64_t bdrv_get_block_status(BlockDriverState *bs,
+  int64_t sector_num,
+  int nb_sectors, int *pnum)
+{
+return bdrv_get_block_status_above(bs, bs->backing_hd,
+   sector_num, nb_sectors, pnum);
+}
+
 int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num,
int nb_sectors, int *pnum)
 {
diff --git a/include/block/block.h b/include/block/block.h
index 4c57d63..98c6703 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -361,6 +361,10 @@ bool bdrv_unallocated_blocks_are_zero(BlockDriverState 
*bs);
 bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
 int64_t bdrv_get_block_status(BlockDriverState *bs, int64_t sector_num,
   int nb_sectors, int *pnum);
+int64_t
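
The backing-chain walk described in this patch can be sketched without any QEMU types. Everything below is hypothetical (the layer struct, its fields and the function name); it only illustrates the idea of following the chain from the top image towards the base, stopping at the first layer that has the data allocated and never inspecting the base itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical image layer; 'backing' points at the next layer down. */
struct sketch_layer {
    bool allocated;
    struct sketch_layer *backing;
};

/*
 * Walk from 'top' towards 'base' (exclusive) and return the first layer
 * that has the data allocated, or NULL if none of them does.
 */
static struct sketch_layer *sketch_find_allocated(struct sketch_layer *top,
                                                  struct sketch_layer *base)
{
    struct sketch_layer *p;

    for (p = top; p != NULL && p != base; p = p->backing) {
        if (p->allocated) {
            return p;
        }
    }
    return NULL;
}
```

Passing the top layer's own backing file as 'base' restricts the walk to the top layer only, which mirrors how the patch reimplements the single-layer query on top of the chain-walking one.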

[Qemu-devel] [PATCH 24/53] spice-display: fix segfault in qemu_spice_create_update

2015-07-30 Thread Michael Roth

From: Gerd Hoffmann kra...@redhat.com

Although it is pretty unusual, the stride for the guest image and the
mirror image maintained by spice-display can be different.  So use
separate variables for them.

https://bugzilla.redhat.com/show_bug.cgi?id=1163047

Cc: qemu-sta...@nongnu.org
Reported-by: perrier vincent clow...@clownix.net
Signed-off-by: Gerd Hoffmann kra...@redhat.com
(cherry picked from commit c6e484707f28b3e115e64122a0570f6b3c585489)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 ui/spice-display.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/ui/spice-display.c b/ui/spice-display.c
index 1644185..5935564 100644
--- a/ui/spice-display.c
+++ b/ui/spice-display.c
@@ -199,7 +199,7 @@ static void qemu_spice_create_update(SimpleSpiceDisplay 
*ssd)
 static const int blksize = 32;
 int blocks = (surface_width(ssd->ds) + blksize - 1) / blksize;
 int dirty_top[blocks];
-int y, yoff, x, xoff, blk, bw;
+int y, yoff1, yoff2, x, xoff, blk, bw;
 int bpp = surface_bytes_per_pixel(ssd->ds);
 uint8_t *guest, *mirror;
 
@@ -214,13 +214,14 @@ static void qemu_spice_create_update(SimpleSpiceDisplay 
*ssd)
 guest = surface_data(ssd->ds);
 mirror = (void *)pixman_image_get_data(ssd->mirror);
 for (y = ssd->dirty.top; y < ssd->dirty.bottom; y++) {
-yoff = y * surface_stride(ssd->ds);
+yoff1 = y * surface_stride(ssd->ds);
+yoff2 = y * pixman_image_get_stride(ssd->mirror);
 for (x = ssd->dirty.left; x < ssd->dirty.right; x += blksize) {
 xoff = x * bpp;
 blk = x / blksize;
 bw = MIN(blksize, ssd->dirty.right - x);
-if (memcmp(guest + yoff + xoff,
-   mirror + yoff + xoff,
+if (memcmp(guest + yoff1 + xoff,
+   mirror + yoff2 + xoff,
bw * bpp) == 0) {
 if (dirty_top[blk] != -1) {
 QXLRect update = {
-- 
1.9.1
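
The stride problem fixed above can be reproduced with two plain byte buffers. The sketch below is not spice or pixman code; the function name and parameters are assumptions for illustration. It compares the same row segment of two images and takes a separate stride for each, which is what the yoff1/yoff2 split achieves.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Compare 'nbytes' bytes at row 'y', byte offset 'xoff', in two images
 * whose rows may be padded differently.  Returns 1 when they match.
 */
static int sketch_row_equal(const uint8_t *guest, size_t guest_stride,
                            const uint8_t *mirror, size_t mirror_stride,
                            size_t y, size_t xoff, size_t nbytes)
{
    return memcmp(guest + y * guest_stride + xoff,
                  mirror + y * mirror_stride + xoff,
                  nbytes) == 0;
}
```

If the guest stride were reused for the mirror, rows after the first would be read from the wrong offset, and with a larger guest stride the read could run past the end of the mirror buffer.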

[Qemu-devel] [PATCH 42/53] block: Initialize local_err in bdrv_append_temp_snapshot

2015-07-30 Thread Michael Roth

From: Fam Zheng f...@redhat.com

Cc: qemu-sta...@nongnu.org
Signed-off-by: Fam Zheng f...@redhat.com
Message-id: 1436156684-16526-1-git-send-email-f...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
(cherry picked from commit c2e0dbbfd7265eb9a7170ab195d8f9f8a1cbd1af)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 block.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 4f52d7a..2366b8a 100644
--- a/block.c
+++ b/block.c
@@ -1380,7 +1380,7 @@ int bdrv_append_temp_snapshot(BlockDriverState *bs, int 
flags, Error **errp)
 QemuOpts *opts = NULL;
 QDict *snapshot_options;
 BlockDriverState *bs_snapshot;
-Error *local_err;
+Error *local_err = NULL;
 int ret;
 
 /* if snapshot, we create a temporary backing file and open it
-- 
1.9.1

[Qemu-devel] [PATCH 53/53] ide: Clear DRQ after handling all expected accesses

2015-07-30 Thread Michael Roth

From: Kevin Wolf kw...@redhat.com

This is additional hardening against an end_transfer_func that fails to
clear the DRQ status bit. The bit must be unset as soon as the PIO
transfer has completed, so it's better to do this in a central place
instead of duplicating the code in all commands (and forgetting it in
some).

Signed-off-by: Kevin Wolf kw...@redhat.com
Reviewed-by: John Snow js...@redhat.com
(cherry picked from commit cb72cba83021fa42719e73a5249c12096a4d1cfc)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 hw/ide/core.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 17153f5..822519b 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2028,8 +2028,10 @@ void ide_data_writew(void *opaque, uint32_t addr, 
uint32_t val)
 *(uint16_t *)p = le16_to_cpu(val);
 p += 2;
 s->data_ptr = p;
-if (p >= s->data_end)
+if (p >= s->data_end) {
+s->status &= ~DRQ_STAT;
 s->end_transfer_func(s);
+}
 }
 
 uint32_t ide_data_readw(void *opaque, uint32_t addr)
@@ -2053,8 +2055,10 @@ uint32_t ide_data_readw(void *opaque, uint32_t addr)
 ret = cpu_to_le16(*(uint16_t *)p);
 p += 2;
 s->data_ptr = p;
-if (p >= s->data_end)
+if (p >= s->data_end) {
+s->status &= ~DRQ_STAT;
 s->end_transfer_func(s);
+}
 return ret;
 }
 
@@ -2078,8 +2082,10 @@ void ide_data_writel(void *opaque, uint32_t addr, 
uint32_t val)
 *(uint32_t *)p = le32_to_cpu(val);
 p += 4;
 s->data_ptr = p;
-if (p >= s->data_end)
+if (p >= s->data_end) {
+s->status &= ~DRQ_STAT;
 s->end_transfer_func(s);
+}
 }
 
 uint32_t ide_data_readl(void *opaque, uint32_t addr)
@@ -2103,8 +2109,10 @@ uint32_t ide_data_readl(void *opaque, uint32_t addr)
 ret = cpu_to_le32(*(uint32_t *)p);
 p += 4;
 s->data_ptr = p;
-if (p >= s->data_end)
+if (p >= s->data_end) {
+s->status &= ~DRQ_STAT;
 s->end_transfer_func(s);
+}
 return ret;
 }
 
-- 
1.9.1
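
The hardening idea in this patch, clearing the status bit in one central place instead of relying on every handler, can be sketched as follows. The struct, the handler and the function names are hypothetical, and the bit value 0x08 is an assumption standing in for DRQ_STAT.

```c
#include <stdint.h>

/* Assumed value of the DRQ status bit. */
#define SKETCH_DRQ 0x08u

struct sketch_ide {
    uint8_t status;
    int data_ptr;
    int data_end;
    void (*end_transfer)(struct sketch_ide *s);
};

/* A handler that forgets to clear DRQ, as the commit message warns about. */
static void sketch_forgetful_end(struct sketch_ide *s)
{
    (void)s;
}

/* Consume 'nbytes'; once the buffer is exhausted, clear DRQ centrally. */
static void sketch_data_access(struct sketch_ide *s, int nbytes)
{
    s->data_ptr += nbytes;
    if (s->data_ptr >= s->data_end) {
        s->status &= (uint8_t)~SKETCH_DRQ;
        s->end_transfer(s);
    }
}
```

Because the bit is cleared before the handler runs, a handler that starts a new transfer can still set it again, while a handler that does nothing leaves it cleared.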

[Qemu-devel] [PATCH 45/53] vfio/pci: Fix RTL8168 NIC quirks

2015-07-30 Thread Michael Roth

From: Alex Williamson alex.william...@redhat.com

The RTL8168 quirk correctly describes using bit 31 as a signal to
mark a latch/completion, but the code mistakenly uses bit 28.  This
causes the Realtek driver to spin on this register for quite a while,
20k cycles on Windows 7 v7.092 driver.  Then it gets frustrated and
tries to set the bit itself and spins for another 20k cycles.  For
some this still results in a working driver, for others not.  About
the only thing the code really does in its current form is protect
the guest from sneaking in writes to the real hardware MSI-X table.
The fix is obviously to use bit 31 as we document that we should.

The other problem doesn't seem to affect current drivers as nobody
seems to use these window registers for writes to the MSI-X table, but
we need to use the stored data when a write is triggered, not the
value of the current write, which only provides the offset.

Note that only the Windows drivers from Realtek seem to use these
registers, the Microsoft drivers provided with Windows 8.1 do not
access them, nor do Linux in-kernel drivers.

Link: https://bugs.launchpad.net/qemu/+bug/1384892
Signed-off-by: Alex Williamson alex.william...@redhat.com
Cc: qemu-sta...@nongnu.org # v2.1+
(cherry picked from commit 69970fcef937bddd7f745efe39501c7716fdfe56)
Conflicts:
hw/vfio/pci.c

* removed dependency on 3b643495

Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 hw/vfio/pci.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 6b80539..73fd89e 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1516,7 +1516,7 @@ static uint64_t vfio_rtl8168_window_quirk_read(void 
*opaque,
 memory_region_name(&quirk->mem),
 vdev->vbasedev.name);
 
-return quirk->data.address_match ^ 0x10000000U;
+return quirk->data.address_match ^ 0x80000000U;
 }
 break;
 case 0: /* data */
@@ -1554,7 +1554,7 @@ static void vfio_rtl8168_window_quirk_write(void *opaque, 
hwaddr addr,
 switch (addr) {
 case 4: /* address */
 if ((data & 0x7fff) == 0x1) {
-if (data & 0x10000000U &&
+if (data & 0x80000000U &&
 vdev->pdev.cap_present & QEMU_PCI_CAP_MSIX) {
 
 trace_vfio_rtl8168_window_quirk_write_table(
@@ -1562,8 +1562,9 @@ static void vfio_rtl8168_window_quirk_write(void *opaque, 
hwaddr addr,
 vdev->vbasedev.name);
 
 io_mem_write(&vdev->pdev.msix_table_mmio,
- (hwaddr)(quirk->data.address_match & 0xfff),
- data, size);
+ (hwaddr)(data & 0xfff),
+ (uint64_t)quirk->data.address_mask,
+ size);
 }
 
 quirk->data.flags = 1;
-- 
1.9.1
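
The bit 28 versus bit 31 mix-up described in the commit message above is easy to verify numerically. The macro and helper names below are made up for this sketch; it only shows which mask corresponds to which bit and how a latch flag in bit 31 would be tested and toggled.

```c
#include <stdint.h>

/* Mask with only bit 'n' set (n in 0..31). */
#define SKETCH_BIT(n) (UINT32_C(1) << (n))

/* Nonzero when the latch/completion flag in bit 31 is set. */
static int sketch_latch_set(uint32_t reg)
{
    return (reg & SKETCH_BIT(31)) != 0;
}

/* Flip the flag, as a read handler might do to signal completion. */
static uint32_t sketch_toggle_latch(uint32_t reg)
{
    return reg ^ SKETCH_BIT(31);
}
```

A value with only bit 28 set (0x10000000) does not register as latched here, which matches the spinning driver behaviour the commit message describes.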

[Qemu-devel] [PATCH 43/53] mips/kvm: Fix Big endian 32-bit register access

2015-07-30 Thread Michael Roth

From: James Hogan james.ho...@imgtec.com

Fix access to 32-bit registers on big endian targets. The pointer passed
to the kernel must be for the actual 32-bit value, not a temporary
64-bit value, otherwise on big endian systems the kernel will only
interpret the upper half.

Signed-off-by: James Hogan james.ho...@imgtec.com
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: Leon Alrae leon.al...@imgtec.com
Cc: Aurelien Jarno aurel...@aurel32.net
Cc: k...@vger.kernel.org
Cc: qemu-sta...@nongnu.org
Message-Id: 1429871214-23514-2-git-send-email-james.ho...@imgtec.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
(cherry picked from commit f8b3e48b2d269551cd40f94770dc20da2f402325)
Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
---
 target-mips/kvm.c | 13 +++--
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/target-mips/kvm.c b/target-mips/kvm.c
index 4d1f7ea..1597bbe 100644
--- a/target-mips/kvm.c
+++ b/target-mips/kvm.c
@@ -240,10 +240,9 @@ int kvm_mips_set_ipi_interrupt(MIPSCPU *cpu, int irq, int 
level)
 static inline int kvm_mips_put_one_reg(CPUState *cs, uint64_t reg_id,
int32_t *addr)
 {
-uint64_t val64 = *addr;
 struct kvm_one_reg cp0reg = {
 .id = reg_id,
-.addr = (uintptr_t)&val64
+.addr = (uintptr_t)addr
 };
 
 return kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &cp0reg);
@@ -275,18 +274,12 @@ static inline int kvm_mips_put_one_reg64(CPUState *cs, 
uint64_t reg_id,
 static inline int kvm_mips_get_one_reg(CPUState *cs, uint64_t reg_id,
int32_t *addr)
 {
-int ret;
-uint64_t val64 = 0;
 struct kvm_one_reg cp0reg = {
 .id = reg_id,
-.addr = (uintptr_t)&val64
+.addr = (uintptr_t)addr
 };
 
-ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &cp0reg);
-if (ret >= 0) {
-*addr = val64;
-}
-return ret;
+return kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &cp0reg);
 }
 
 static inline int kvm_mips_get_one_ulreg(CPUState *cs, uint64 reg_id,
-- 
1.9.1
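
Why a 64-bit temporary breaks 32-bit register access on big endian can be shown independently of the host's byte order. The helpers below are hypothetical; they lay out a 64-bit value the way a big-endian machine stores it and then read 32 bits from its start address, which is what a consumer expecting a pointer to a 32-bit value would do.

```c
#include <stdint.h>

/* Store v in big-endian byte order, as a big-endian host lays out a uint64_t. */
static void sketch_store_be64(uint8_t out[8], uint64_t v)
{
    int i;

    for (i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* Read four bytes starting at p as a big-endian 32-bit value. */
static uint32_t sketch_load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

Reading 32 bits at the start of the temporary returns the upper half, which is zero for a small value; the actual data sits four bytes further on. Passing a pointer to the real 32-bit variable, as the patch does, avoids the problem.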

Re: [Qemu-devel] [PATCH RFC v2 29/47] qapi: Replace dirty is_c_ptr() by method c_null()

2015-07-30 Thread Eric Blake

On 07/29/2015 11:22 AM, Markus Armbruster wrote:
 Eric Blake ebl...@redhat.com writes:
 
 On 07/29/2015 02:32 AM, Markus Armbruster wrote:

 2. We can leak retval only when qmp_FOO() returns non-null and local_err
is non-null.  This must not happen, because:

a. local_err must be null before the call, and

b. the call must not return non-null when it sets local_err.

 We don't state that contract anywhere, but I doubt any of the qmp_FOO()
 functions violate it, so it is worth making it part of the contract.

 It's a general Error API rule: set an error exactly on failure.  It
 applies to any function returning errors through an Error **errp
 parameter, and we generally don't bother to spell it out for the
 individual functions.

 The part that needs to be spelled out is what success and failure mean.
 A qmp_FOO() returning an object returns null on failure.

For qmp_FOO(), this is a reasonable contract.  But our very own
generated code does not follow these rules: visit_type_FOO() can assign
into *obj even when setting an error, if it encounters a parse error
halfway through the struct, leaving the caller responsible to still
clean up the mess if it wants to avoid a memory leak.

Maybe that means our generated code needs to be reworked to properly
clean up on a failed parse, such that *obj is guaranteed to be NULL if
an error is returned.  As a separate patch, of course.
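
To make the contract concrete, here is a minimal sketch of a parse function
that guarantees *obj is NULL whenever it sets an error. It does not use
QEMU's real Error API or the generated visitors: Err, Thing, set_err() and
parse_thing_clean() are hypothetical stand-ins made up for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an error object; not QEMU's Error type. */
typedef struct Err {
    const char *msg;
} Err;

typedef struct Thing {
    char *name;
    int size;
} Thing;

static void set_err(Err **errp, const char *msg)
{
    static Err e;               /* one static error is enough for a sketch */

    if (errp) {
        e.msg = msg;
        *errp = &e;
    }
}

static void thing_free(Thing *t)
{
    if (t) {
        free(t->name);
        free(t);
    }
}

/*
 * Contract: on success *obj is non-NULL and *errp is left untouched;
 * on failure *errp is set and *obj is NULL, so the caller has nothing
 * to clean up and nothing can leak.
 */
static void parse_thing_clean(const char *name, int size,
                              Thing **obj, Err **errp)
{
    Thing *t = calloc(1, sizeof(*t));

    *obj = NULL;
    if (!t) {
        set_err(errp, "out of memory");
        return;
    }
    t->name = malloc(strlen(name) + 1);
    if (!t->name) {
        thing_free(t);
        set_err(errp, "out of memory");
        return;
    }
    strcpy(t->name, name);
    if (size < 0) {
        /* Failure halfway through the struct: undo the partial parse. */
        thing_free(t);
        set_err(errp, "size must not be negative");
        return;
    }
    t->size = size;
    *obj = t;
}
```

With this shape the caller only has to test the error; it never needs to
inspect or free a half-built object.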

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] [PATCH v2] arm: change vendor ID for virtio-mmio

2015-07-30 Thread Michael S. Tsirkin

On Thu, Jul 30, 2015 at 10:24:11AM +0100, Peter Maydell wrote:
 On 30 July 2015 at 09:04, Michael S. Tsirkin m...@redhat.com wrote:
  On Thu, Jul 30, 2015 at 09:23:20AM +0800, Shannon Zhao wrote:
 
  Why do we drop the previous way using QEMU? Something I missed?
 
  So that guests that bind to this interface will work fine with non QEMU
  implementations of virtio-mmio.
 
 I don't understand this sentence. If there are pre-existing
 non-QEMU virtio-mmio implementations, then they're using
 LNRO0005, and we should use it too. If there are going to
 be implementations of virtio-mmio in future, then they will
 use whatever identifier we pick here. Either way, we get
 interoperability. I don't see any difference between our
 saying the ID for virtio-mmio is QEMU0005 and saying
 the ID for virtio-mmio is 1AF4103F.

I agree. It's just that 1AF4 is already reserved for virtio.

 (The latter seems unnecessarily opaque to me, to be honest.
 At least an ID string QEMU gives you a clue where to
 look for who owns the thing.)

Well, the ACPI spec says that if the ID starts with four hex digits, then
the vendor is the one assigned by the PCI SIG, which maintains a database
mapping IDs to vendors.

 
 Note also that strictly you don't mean non-QEMU implementations
 of virtio-mmio, you mean non-QEMU implementations of the
 ACPI tables.

Yes.

 The hardware implementation of virtio-mmio
 doesn't care at all about the ACPI ID. (In fact the most
 plausible other-implementation would be UEFI using its
 own (hard-coded) ACPI tables on top of a QEMU vexpress-a15
 model or something similar.)
 
 -- PMM

Re: [Qemu-devel] Call Trace for QEMU functions

2015-07-30 Thread Alex Bennée


Peter Maydell peter.mayd...@linaro.org writes:

 On 30 July 2015 at 13:20, Naman patel naman...@gmail.com wrote:
 Hi,

  I have compiled QEMU (2.0) for x86_64 on Fedora 22 with tracing enabled
 and the tracing option I chose was dtrace. I have this script called
 callTrace.stp in which I try and get the Call Trace of the function
 helper_invlpg and later tlb_flush.  But I am not able to get the function
 name of the caller function and the call trace depth is only limited to 2.

 The helper_invlpg function is called directly from code generated
 by QEMU's built-in JIT, not from any other C function.

 If you use a newer version of QEMU than 2.0 then I think we have
 fixed some of the stack frame information up so that you can
 get a backtrace that looks like:
  * helper function
  * [generated code]
  * QEMU execution loop code that handles executing guest code
  * other QEMU functions

 This is not likely to be very useful for profiling why or when
 we're calling a particular helper function, though.

With the perf JIT patch you can get a better handle on the profile. I'll
see if I can re-spin them tomorrow for the latest tree.


 thanks
 -- PMM

-- 
Alex Bennée

Re: [Qemu-devel] [POC] colo-proxy in qemu

2015-07-30 Thread Yang Hongyang




On 07/30/2015 09:59 PM, Dr. David Alan Gilbert wrote:

* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:

On 2015/7/30 20:30, Dr. David Alan Gilbert wrote:

* Gonglei (arei.gong...@huawei.com) wrote:

On 2015/7/30 19:56, Dr. David Alan Gilbert wrote:

* Jason Wang (jasow...@redhat.com) wrote:



On 07/30/2015 04:03 PM, Dr. David Alan Gilbert wrote:

* Dong, Eddie (eddie.d...@intel.com) wrote:

A question here: the packet comparison may be very tricky. For example,
some protocols use random data to generate an unpredictable ID or
something else. One example is ipv6_select_ident() in Linux. So does COLO
need a mechanism to make sure the PVM and SVM generate the same random
data?

Good question, connections that depend on random data are a big problem
for COLO. At present, they trigger checkpoint processing because the random
data differs.
I don't think any mechanism can ensure that two different machines generate
the same random data. If you have any ideas, please tell us :)

Frequent checkpoints can handle this scenario, but may cause poor
performance. :(


The assumption is that, after VM checkpoint, SVM and PVM have identical 
internal state, so the pattern used to generate random data has high 
possibility to generate identical data at short time, at least...

They do diverge pretty quickly though; I have simple examples which
reliably cause a checkpoint because of simple randomness in applications.

Dave



And it will become even worse if hwrng is used in guest.


Yes; it seems quite application dependent;  (on IPv4) an ssh connection,
once established, tends to work well without triggering checkpoints;
and static web pages also work well.  Examples of things that do cause
more checkpoints are, displaying guest statistics (e.g. running top
in that ssh) which is timing dependent, and dynamically generated
web pages that include a unique ID (bugzilla's password reset link in
its front page was a fun one). I think establishing new encrypted
connections also causes the same randomness.

However, it's worth remembering that COLO is trying to reduce the
number of checkpoints compared to a simple checkpointing world
which would be aiming to do a checkpoint ~100 times a second,
and for compute bound workloads, or ones that don't expose
the randomness that much, it can get checkpoints of a few seconds
in length which greatly reduces the overhead.



Yes. That's the truth.
We can set two different modes for different scenarios, maybe named
1) frequent checkpoint mode for multi-connection and randomness scenarios
and 2) non-frequent checkpoint mode for other scenarios.

But that's the next step in the plan; we are still thinking about it.


I have some code that tries to automatically switch between those;
it measures the checkpoint lengths, and if they're consistently short
it sends a different message byte to the secondary at the start of the
checkpoint, so that it doesn't bother running.   Every so often it
then flips back to a COLO checkpoint to see if the checkpoints
are still really fast.



Do you mean that if there are consistently frequent checkpoint requests, you
don't do a checkpoint but just send a special message to the SVM, and
resume normal COLO mode once the checkpoint lengths are no longer short?


   We still have to do checkpoints, but we send a special message to the SVM
so that the SVM just takes the checkpoint but does not run.

   I'll send the code after I've updated it to your current version; but it's
quite rough/experimental.

It works something like

  ---------  run PVM      run SVM
  COLO       long gap
  mode       miscompare
             checkpoint
  ---------  run PVM      run SVM
  COLO       short gap
  mode       miscompare
             checkpoint
  ---------  run PVM      run SVM
  COLO       short gap
  mode       miscompare                After a few short runs
             checkpoint
  ---------  run PVM      SVM idle     \
  Passive    fixed delay               |  - repeat 'n' times
  mode       checkpoint                /
  ---------  run PVM      run SVM
  COLO       short gap                 Still a short gap
  mode       miscompare
  ---------  run PVM      SVM idle     \
  Passive    fixed delay               |  - repeat 'n' times
  mode       checkpoint                /
  ---------  run PVM      run SVM
  COLO       long gap                  long gap now, stay in COLO
  mode       miscompare
             checkpoint
  ---------  run PVM      run SVM
  COLO       long gap
  mode       miscompare
             checkpoint

So it saves the CPU time on the SVM, and the comparison traffic, and is
automatic at switching into the passive mode.

It used to be more useful, but your minimum COLO run time that you
added a few versions ago helps a lot in the cases where there are miscompares,
and the delay after the miscompare before you take the checkpoint also helps
in the case where the data is very random.
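
The switching logic described above could be sketched as a small state
machine. This is not the experimental code mentioned in the mail, only an
illustration under stated assumptions: the threshold SHORT_GAP_MS, the
counters SHORT_RUNS_LIMIT and PASSIVE_REPEATS, and all function names are
made up.

```c
#include <stdbool.h>

enum colo_mode { MODE_COLO, MODE_PASSIVE };

struct mode_switch {
    enum colo_mode mode;
    int short_runs;     /* consecutive short COLO gaps seen so far */
    int passive_left;   /* passive checkpoints left before re-probing */
};

/* All three tunables are illustrative assumptions. */
#define SHORT_GAP_MS      100
#define SHORT_RUNS_LIMIT  3
#define PASSIVE_REPEATS   5

static void mode_switch_init(struct mode_switch *ms)
{
    ms->mode = MODE_COLO;
    ms->short_runs = 0;
    ms->passive_left = 0;
}

/*
 * Called once per completed checkpoint. gap_ms is how long the PVM ran
 * before the checkpoint; it is only meaningful in COLO mode, where a
 * miscompare ends the run. Returns the mode to use for the next run.
 */
static enum colo_mode mode_switch_next(struct mode_switch *ms, int gap_ms)
{
    if (ms->mode == MODE_PASSIVE) {
        if (--ms->passive_left > 0) {
            return MODE_PASSIVE;
        }
        /* Probe with one COLO run to see whether gaps are still short. */
        ms->mode = MODE_COLO;
        ms->short_runs = SHORT_RUNS_LIMIT - 1;
        return MODE_COLO;
    }

    if (gap_ms >= SHORT_GAP_MS) {
        ms->short_runs = 0;     /* long gap: stay in COLO mode */
        return MODE_COLO;
    }
    if (++ms->short_runs >= SHORT_RUNS_LIMIT) {
        ms->mode = MODE_PASSIVE;
        ms->passive_left = PASSIVE_REPEATS;
        return MODE_PASSIVE;
    }
    return MODE_COLO;
}
```

A single short probe run sends the machine straight back to passive mode,
while one long gap resets the counter and keeps it in COLO mode, which
mirrors the two outcomes shown in the diagram.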


This is great! This is exactly what we were thinking about, when random
scenario will fallback to MC/Remus

Re: [Qemu-devel] [PATCH] qmp-shell: add documentation

2015-07-30 Thread Luiz Capitulino

On Thu, 23 Jul 2015 09:36:38 +0200
Markus Armbruster arm...@redhat.com wrote:

 John Snow js...@redhat.com writes:
 
  On 07/02/2015 11:31 AM, Luiz Capitulino wrote:
  On Wed,  1 Jul 2015 14:25:49 -0400
  John Snow js...@redhat.com wrote:
  
  I should probably document the changes that were made.
 
 John, what do you mean here?
 
  Signed-off-by: John Snow js...@redhat.com
  
  Looks good to me, CC'ing maintainer.
 
 Luiz, is this a R-by?

Reviewed-by: Luiz Capitulino lcapitul...@redhat.com

This one is :)

 
  Whoops, didn't realize Markus took this file over, too. Sorry Luiz.
 
 Don't worry about our maintainer reshuffling.
 
  Markus, would you consider staging this? It's purely a documentation
  update for only a dev tool, so it doesn't really matter /when/ it lands
  either way, just shoring up some changes I made a while back to the
  interpreter here.
 
  tldr: ping
 
 I'm happy to include this in the next pull after it got reviewed.  I'm
 ignorant about qmp-shell, because I don't use it myself, so I'd have to
 dig through it to verify your documentation is accurate and reasonably
 complete.
 
 Fishing for more qualified reviewers:
 
 $ scripts/get_maintainer.pl --git-blame -f scripts/qmp/qmp-shell 
 Markus Armbruster arm...@redhat.com (supporter:QMP)
 Luiz Capitulino lcapitul...@redhat.com (authored 
 lines:230/390=59%,commits:10/10=100%)
 John Snow js...@redhat.com (authored lines:117/390=30%,commits:4/10=40%)
 Daniel P. Berrange berra...@redhat.com (authored lines:27/390=7%)
 Eric Blake ebl...@redhat.com (commits:6/10=60%)
 Stefan Hajnoczi stefa...@redhat.com (commits:2/10=20%)
 Benoit Canet ben...@irqsave.net (commits:1/10=10%)
 
 Luiz, can you review for accuracy and reasonable completeness?
 
 Of course, I'm the reviewer of last resort for anything I maintain,
 whether I understand it or not :)

Re: [Qemu-devel] help attach pci device to virtio bus

2015-07-30 Thread Stefan Hajnoczi

This thread is a duplicate.

Re: [Qemu-devel] [PATCH 09/12] netfilter: add a netbuffer filter

2015-07-30 Thread Yang Hongyang




On 07/30/2015 10:16 PM, Thomas Huth wrote:

On 30/07/15 12:28, Yang Hongyang wrote:

On 07/30/2015 06:14 PM, Jason Wang wrote:



On 07/30/2015 05:49 PM, Yang Hongyang wrote:

On 07/30/2015 05:33 PM, Jason Wang wrote:

[...]

I see, so the reason is you are using qemu_deliver_packet() for both
enqueuing packet to filter and delivering packet to destination. How
about something like:

E.g for qemu_send_packet_async(), move the hook before
qemu_send_packet_async_with_flags(). Then flush method can call
qemu_send_packet_async_with_flags() without any issue?


I think we can't move the hook earlier, because filters should only deal
with packets that will actually be sent. Take a dump filter, for example:
dumping a packet that probably won't be sent is wrong. Calling
qemu_send_packet_async() or qemu_send_packet_async_with_flags()
doesn't mean the packet is sent; if sent_cb is not provided and
the peer is not able to receive, the packet will be dropped.


It depends on how you define 'actually been sent' and whether or not
we need such accuracy. Packets can be dropped by various layers.
Reaching receive() or receive_iov() does not mean a packet will be sent for
sure. For example, lots of NICs drop packets in their receive()
implementation.


This is true. OK, I'm convinced that we might not need to be this accurate,
but Thomas might have a different opinion. I saw this description in his
dump series:

+/*
+ * Log network traffic into a dump file. Note: This should ideally
+ * be done after calling the ->receive() function below to make sure
+ * that we only log the packets that have really been sent. However,
+ * this does not work right with slirp networking since it immediately
+ * sends reply packets during the receive() function already, so we
+ * would get a wrong order of the packets in the dump file in that case.
+ */

So Thomas, what do you think of this?


IMHO it should be ok if a dump captures a packet multiple times - it's
not nice, but it could theoretically also happen on a physical line when
a packet has to be retransmitted.


Ok, thanks! then I'll move the filter hook earlier.



  Thomas


.



--
Thanks,
Yang.

Re: [Qemu-devel] [PATCH for-2.5 7/8] s390x: Migrate guest storage keys (initial memory only)

2015-07-30 Thread Jason J. Herne


On 07/21/2015 06:37 AM, David Hildenbrand wrote:


So if I've got this code right, you send here a header that announces
a packet with all pages ...


+    while (handled_count < total_count) {
+        cur_count = MIN(total_count - handled_count, S390_SKEYS_BUFFER_SIZE);
+
+        ret = skeyclass->get_skeys(ss, cur_gfn, cur_count, buf);
+        if (ret < 0) {
+            error_report("S390_GET_KEYS error %d\n", ret);
+            break;


... but when an error occurs here, you suddenly stop in the middle of
that packet with all pages ...


Indeed, although that should never fail, we never know.
We don't want to overengineer the protocol but still abort migration at least
on the loading side in that (theoretical) case.



I don't have a strong opinion on this either way. I think it is fine
just the way it is (for the reasons David described above). However, if
people are worried I can see about writing some code that sends fake keys
to the destination as described below. Thoughts?




+        }
+
+        /* write keys to stream */
+        qemu_put_buffer(f, buf, cur_count);
+
+        cur_gfn += cur_count;
+        handled_count += cur_count;
+    }
+
+    g_free(buf);
+end_stream:
+    qemu_put_be64(f, S390_SKEYS_SAVE_FLAG_EOS);


... and send an EOS marker here instead ...


+}
+
+static int s390_storage_keys_load(QEMUFile *f, void *opaque, int version_id)
+{
+    S390SKeysState *ss = S390_SKEYS(opaque);
+    S390SKeysClass *skeyclass = S390_SKEYS_GET_CLASS(ss);
+    int ret = 0;
+
+    while (!ret) {
+        ram_addr_t addr;
+        int flags;
+
+        addr = qemu_get_be64(f);
+        flags = addr & ~TARGET_PAGE_MASK;
+        addr &= TARGET_PAGE_MASK;
+
+        switch (flags) {
+        case S390_SKEYS_SAVE_FLAG_SKEYS: {
+            const uint64_t total_count = qemu_get_be64(f);
+            uint64_t handled_count = 0, cur_count;
+            uint64_t cur_gfn = addr / TARGET_PAGE_SIZE;
+            uint8_t *buf = g_try_malloc(S390_SKEYS_BUFFER_SIZE);
+
+            if (!buf) {
+                error_report("storage key load could not allocate memory\n");
+                ret = -ENOMEM;
+                break;
+            }
+
+            while (handled_count < total_count) {
+                cur_count = MIN(total_count - handled_count,
+                                S390_SKEYS_BUFFER_SIZE);
+                qemu_get_buffer(f, buf, cur_count);


... while the receiver can not handle the EOS marker here.

This looks fishy to me (or I might have just missed something), but
anyway please double check whether your error handling in the sender
really makes sense.


My shot would be, to send invalid storage keys if getting the keys from the
kernel fails. So we can detect it on the loading side and abort migration
gracefully.




+                ret = skeyclass->set_skeys(ss, cur_gfn, cur_count, buf);
+                if (ret < 0) {
+                    error_report("S390_SET_KEYS error %d\n", ret);
+                    break;
+                }
+                handled_count += cur_count;
+                cur_gfn += cur_count;
+            }
+            g_free(buf);
+            break;
+        }
+        case S390_SKEYS_SAVE_FLAG_EOS:
+            /* normal exit */
+            return 0;
+        default:
+            error_report("Unexpected storage key flag data: %#x", flags);
+            ret = -EINVAL;
+        }
+    }
+
+    return ret;
+}


  Thomas


Thanks Thomas!


David



--
-- Jason J. Herne (jjhe...@linux.vnet.ibm.com)

Re: [Qemu-devel] [PATCH for-2.5 7/8] s390x: Migrate guest storage keys (initial memory only)

2015-07-30 Thread Thomas Huth

On 30/07/15 17:00, Jason J. Herne wrote:
 On 07/21/2015 06:37 AM, David Hildenbrand wrote:

 So if I've got this code right, you send here a header that announces
 a packet with all pages ...

 +    while (handled_count < total_count) {
 +        cur_count = MIN(total_count - handled_count, S390_SKEYS_BUFFER_SIZE);
 +
 +        ret = skeyclass->get_skeys(ss, cur_gfn, cur_count, buf);
 +        if (ret < 0) {
 +            error_report("S390_GET_KEYS error %d\n", ret);
 +            break;

 ... but when an error occurs here, you suddenly stop in the middle of
 that packet with all pages ...

 Indeed, although that should never fail, we never know.
 We don't want to overengineer the protocol but still abort migration
 at least on the loading side in that (theoretical) case.

 
 I don't have a strong opinion on this either way. I think it is fine
 just the way it is (for the reasons David described above). However, if
 people are worried I can see about writing some code that sends fake keys
 to the destination as described below. Thoughts?

If David is right and skeyclass->get_skeys() really never fails (I
did not check), then simply do an assert(ret == 0) afterwards - that
way you can be sure that it really never fails. And if it ever fails,
you notice it immediately - and that's certainly much better than
debugging the currently-wrong error handling code.
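
For comparison, the alternative floated earlier in the thread (keep the
announced length and send invalid keys on failure, so the loader can abort
cleanly) could look roughly like the sketch below. It is a toy model, not
the QEMU migration stream: the one-byte framing, FLAG_KEYS, FLAG_EOS, the
BAD_KEY marker value and every function name are assumptions made up for
illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define FLAG_KEYS 0x01
#define FLAG_EOS  0x02
#define BAD_KEY   0xff      /* hypothetical "invalid key" marker */

struct stream {
    uint8_t buf[64];
    size_t wr;
    size_t rd;
};

static void put_u8(struct stream *s, uint8_t v)
{
    if (s->wr < sizeof(s->buf)) {
        s->buf[s->wr++] = v;
    }
}

static uint8_t get_u8(struct stream *s)
{
    /* Reading past the written data yields 0, which is not a valid flag. */
    return s->rd < s->wr ? s->buf[s->rd++] : 0;
}

/*
 * Sender: announce `total` keys and always send exactly that many.
 * fail_at simulates the key fetch failing from that index on (-1: never).
 */
static void save_keys(struct stream *s, uint8_t total, int fail_at)
{
    put_u8(s, FLAG_KEYS);
    put_u8(s, total);
    for (int i = 0; i < total; i++) {
        int failed = fail_at >= 0 && i >= fail_at;

        /* Keep the framing intact, but mark the missing keys as bad. */
        put_u8(s, failed ? BAD_KEY : (uint8_t)(i & 0x7f));
    }
    put_u8(s, FLAG_EOS);
}

/* Receiver: 0 on a clean end of stream, -1 on bad keys or unknown flags. */
static int load_keys(struct stream *s)
{
    for (;;) {
        uint8_t flag = get_u8(s);
        int total, bad = 0;

        if (flag == FLAG_EOS) {
            return 0;
        }
        if (flag != FLAG_KEYS) {
            return -1;
        }
        total = get_u8(s);
        for (int i = 0; i < total; i++) {
            if (get_u8(s) == BAD_KEY) {
                bad = 1;    /* keep reading so the stream stays in sync */
            }
        }
        if (bad) {
            return -1;
        }
    }
}
```

The assert() suggested above is simpler; this variant only pays off if a
failing key fetch is considered a real possibility.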

 Thomas

Re: [Qemu-devel] [PATCH 09/12] netfilter: add a netbuffer filter

2015-07-30 Thread Thomas Huth

On 30/07/15 12:28, Yang Hongyang wrote:
 On 07/30/2015 06:14 PM, Jason Wang wrote:


 On 07/30/2015 05:49 PM, Yang Hongyang wrote:
 On 07/30/2015 05:33 PM, Jason Wang wrote:
[...]
 I see, so the reason is you are using qemu_deliver_packet() for both
 enqueuing packet to filter and delivering packet to destination. How
 about something like:

 E.g for qemu_send_packet_async(), move the hook before
 qemu_send_packet_async_with_flags(). Then flush method can call
 qemu_send_packet_async_with_flags() without any issue?

 I think we can't move the hook earlier, because filters should only deal
 with packets that will actually be sent. Take a dump filter, for example:
 dumping a packet that probably won't be sent is wrong. Calling
 qemu_send_packet_async() or qemu_send_packet_async_with_flags()
 doesn't mean the packet is sent; if sent_cb is not provided and
 the peer is not able to receive, the packet will be dropped.

 It depends on how you define 'actually been sent' and whether or not
 we need such accuracy. Packets can be dropped by various layers.
 Reaching receive() or receive_iov() does not mean a packet will be sent for
 sure. For example, lots of NICs drop packets in their receive()
 implementation.
 
 This is true. OK, I'm convinced that we might not need to be this accurate,
 but Thomas might have a different opinion. I saw this description in his
 dump series:
 
 +/*
 + * Log network traffic into a dump file. Note: This should ideally
 + * be done after calling the ->receive() function below to make sure
 + * that we only log the packets that have really been sent. However,
 + * this does not work right with slirp networking since it immediately
 + * sends reply packets during the receive() function already, so we
 + * would get a wrong order of the packets in the dump file in that case.
 + */
 
 So Thomas, what do you think of this?

IMHO it should be ok if a dump captures a packet multiple times - it's
not nice, but it could theoretically also happen on a physical line when
a packet has to be retransmitted.

 Thomas

Re: [Qemu-devel] [PATCH v2] arm: change vendor ID for virtio-mmio

2015-07-30 Thread Michael S. Tsirkin

On Thu, Jul 30, 2015 at 05:21:51PM +0800, Shannon Zhao wrote:
 
 
 On 2015/7/30 16:04, Michael S. Tsirkin wrote:
  On Thu, Jul 30, 2015 at 09:23:20AM +0800, Shannon Zhao wrote:
 
 
  On 2015/7/30 3:16, Michael S. Tsirkin wrote:
  ACPI spec 5.0 allows the use of PCI vendor IDs.
 
  But virtio-mmio is not a PCI device, it's a platform device.
  
  Yes. ACPI spec 5.0 says:
  
  A valid PNP ID must be of the form "AAA####" where A is an uppercase
  letter and # is a hex digit. A valid ACPI ID must be of the form
  "NNNN####" where N is an uppercase letter or a digit ('0'-'9') and # is
  a hex digit. This specification reserves the string "ACPI" for use only
  with devices defined herein.
  
  It further reserves all strings representing 4 HEX digits for
  exclusive use with PCI-assigned Vendor IDs.
  
  The second paragraph means if PCI SIG assigned you an ID, you
  can use that without need to register it with ASWG.
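
The ID rules quoted above are easy to check mechanically. The following is
a minimal sketch, not code from QEMU or the kernel; the function names are
made up, and it assumes the PNP form is three letters plus four hex digits
and the ACPI form is four letters or digits plus four hex digits, as in the
spec text quoted in this mail.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

static bool all_hex(const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!isxdigit((unsigned char)s[i])) {
            return false;
        }
    }
    return true;
}

/* PNP ID: three uppercase letters followed by four hex digits. */
static bool is_pnp_id(const char *s)
{
    if (strlen(s) != 7) {
        return false;
    }
    for (int i = 0; i < 3; i++) {
        if (!isupper((unsigned char)s[i])) {
            return false;
        }
    }
    return all_hex(s + 3, 4);
}

/* ACPI ID: four uppercase letters or digits followed by four hex digits. */
static bool is_acpi_id(const char *s)
{
    if (strlen(s) != 8) {
        return false;
    }
    for (int i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)s[i];

        if (!isupper(c) && !isdigit(c)) {
            return false;
        }
    }
    return all_hex(s + 4, 4);
}

/*
 * An ACPI ID whose first four characters are all hex digits falls into
 * the range reserved for PCI-assigned vendor IDs.
 */
static bool acpi_id_has_pci_vendor(const char *s)
{
    return is_acpi_id(s) && all_hex(s, 4);
}
```

Under these rules "1AF4103F" is recognised as carrying a PCI vendor ID,
while "QEMU0005" and "LNRO0005" are ordinary vendor-string IDs.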
  
  
  Why do we drop the previous way using QEMU? Something I missed?
  
  So that guests that bind to this interface will work fine with non QEMU
  implementations of virtio-mmio.
  
 
 I think the kernel driver supports multiple IDs. If they don't want to
 use QEMU as the ACPI ID, they are free to add a new one like below.
 
 +static const struct acpi_device_id virtio_mmio_acpi_match[] = {
 +        { "QEMU0005", },
 +        { "1AF4103F", },
 +        { }
 +};
 +MODULE_DEVICE_TABLE(acpi, virtio_mmio_acpi_match);

Yes, but that won't work with existing distro kernels.

  It's just playing nice with others.
  
  We could have done something similar to pvpanic as well, except we
  didn't and guests using the QEMU prefix have been released,
  so we have to keep using that.
  
  Since we have one for virtio, it seems neater to use that
  rather than LNRO. For the device ID, use 103F which is a legacy ID that
  isn't used in virtio PCI spec - seems to make sense since virtio-mmio is
  a legacy device but we don't know the correct device type.
 
  Guests should probably match everything in the range 1000-103F
  (just like legacy pci drivers do) which will allow us to pass in the
  actual ID in the future if we want to.
 
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   hw/arm/virt-acpi-build.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
 
  diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
  index f365140..dea61ba 100644
  --- a/hw/arm/virt-acpi-build.c
  +++ b/hw/arm/virt-acpi-build.c
  @@ -145,7 +145,7 @@ static void acpi_dsdt_add_virtio(Aml *scope,
   
       for (i = 0; i < num; i++) {
           Aml *dev = aml_device("VR%02u", i);
  -        aml_append(dev, aml_name_decl("_HID", aml_string("LNRO0005")));
  +        aml_append(dev, aml_name_decl("_HID", aml_string("1AF4103F")));
           aml_append(dev, aml_name_decl("_UID", aml_int(i)));
   
   Aml *crs = aml_resource_template();
 
 
  -- 
  Shannon
  
  .
  
 
 -- 
 Shannon

Re: [Qemu-devel] [PATCH for-2.4 v3 1/3] vhost: add vhost_has_free_slot() interface

2015-07-30 Thread Michael S. Tsirkin

On Thu, Jul 30, 2015 at 12:11:57PM +0200, Igor Mammedov wrote:
 it will allow for other parts of QEMU check if it's safe
 to map memory region during hotplug/runtime.
 That way hotplug path will have a chance to cancel
 hotplug operation instead of crashing in vhost_commit().
 
 Signed-off-by: Igor Mammedov imamm...@redhat.com
 ---
 v3:
   * add refcounting of limit by # vhost devices,
     and make vhost_has_free_slot() return true if no vhost devices
     are present at current moment
   * move limit initialization to vhost_dev_init()
 v2:
   * replace probing with checking for
     /sys/module/vhost/parameters/max_mem_regions and
     if it's missing or has a wrong value return
     hardcoded legacy limit (64 slots).
 ---
  hw/virtio/vhost-backend.c | 21 -
  hw/virtio/vhost-user.c|  8 +++-
  hw/virtio/vhost.c | 23 +++
  include/hw/virtio/vhost-backend.h |  2 ++
  include/hw/virtio/vhost.h |  1 +
  stubs/Makefile.objs   |  1 +
  stubs/vhost.c |  6 ++
  7 files changed, 60 insertions(+), 2 deletions(-)
  create mode 100644 stubs/vhost.c
 
 diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
 index 4d68a27..11ec669 100644
 --- a/hw/virtio/vhost-backend.c
 +++ b/hw/virtio/vhost-backend.c
 @@ -11,6 +11,7 @@
  #include hw/virtio/vhost.h
  #include hw/virtio/vhost-backend.h
  #include qemu/error-report.h
 +#include linux/vhost.h
  
  #include sys/ioctl.h
  
 @@ -42,11 +43,29 @@ static int vhost_kernel_cleanup(struct vhost_dev *dev)
  return close(fd);
  }
  
 +static int vhost_kernel_memslots_limit(struct vhost_dev *dev)
 +{
 +    int limit = 64;
 +    char *s;
 +
 +    if (g_file_get_contents("/sys/module/vhost/parameters/max_mem_regions",
 +                            &s, NULL, NULL)) {
 +        uint64_t val = g_ascii_strtoull(s, NULL, 10);
 +        if (!((val == G_MAXUINT64 || !val) && errno)) {
 +            return val;
 +        }
 +        error_report("ignoring invalid max_mem_regions value in vhost module:"
 +                     " %s", s);
 +    }
 +    return limit;
 +}
 +
  static const VhostOps kernel_ops = {
  .backend_type = VHOST_BACKEND_TYPE_KERNEL,
  .vhost_call = vhost_kernel_call,
  .vhost_backend_init = vhost_kernel_init,
 -.vhost_backend_cleanup = vhost_kernel_cleanup
 +.vhost_backend_cleanup = vhost_kernel_cleanup,
 +.vhost_backend_memslots_limit = vhost_kernel_memslots_limit
  };
  
  int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType 
 backend_type)
 diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
 index e7ab829..acdfd04 100644
 --- a/hw/virtio/vhost-user.c
 +++ b/hw/virtio/vhost-user.c
 @@ -343,9 +343,15 @@ static int vhost_user_cleanup(struct vhost_dev *dev)
  return 0;
  }
  
 +static int vhost_user_memslots_limit(struct vhost_dev *dev)
 +{
 +return VHOST_MEMORY_MAX_NREGIONS;
 +}
 +
  const VhostOps user_ops = {
  .backend_type = VHOST_BACKEND_TYPE_USER,
  .vhost_call = vhost_user_call,
  .vhost_backend_init = vhost_user_init,
 -.vhost_backend_cleanup = vhost_user_cleanup
 +.vhost_backend_cleanup = vhost_user_cleanup,
 +.vhost_backend_memslots_limit = vhost_user_memslots_limit
  };
 diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
 index 2712c6f..bcbad48 100644
 --- a/hw/virtio/vhost.c
 +++ b/hw/virtio/vhost.c
 @@ -26,6 +26,18 @@
  
  static struct vhost_log *vhost_log;
  
 +static struct {
 +    int used_memslots;
 +    int memslots_limit;
 +    int refcount;
 +} slots_limit;
 +
 +bool vhost_has_free_slot(void)
 +{
 +    return slots_limit.refcount ?
 +        slots_limit.memslots_limit > slots_limit.used_memslots : true;
 +}
 +
  static void vhost_dev_sync_region(struct vhost_dev *dev,
MemoryRegionSection *section,
uint64_t mfirst, uint64_t mlast,
 @@ -457,6 +469,7 @@ static void vhost_set_memory(MemoryListener *listener,
      dev->mem_changed_start_addr = MIN(dev->mem_changed_start_addr, start_addr);
      dev->mem_changed_end_addr = MAX(dev->mem_changed_end_addr, start_addr + size - 1);
      dev->memory_changed = true;
 +    slots_limit.used_memslots = dev->mem->nregions;
  }
  
  static bool vhost_section(MemoryRegionSection *section)
 @@ -916,6 +929,14 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
  return -errno;
  }
  
 +    r = hdev->vhost_ops->vhost_backend_memslots_limit(hdev);
 +    if (slots_limit.refcount > 0) {
 +        slots_limit.memslots_limit = MIN(slots_limit.memslots_limit, r);
 +    } else {
 +        slots_limit.memslots_limit = r;
 +    }
 +    slots_limit.refcount++;
 +
      r = hdev->vhost_ops->vhost_call(hdev, VHOST_SET_OWNER, NULL);
      if (r < 0) {
          goto fail;
 @@ -972,6 +993,7 @@ fail_vq:
  fail:
  r = -errno;
      hdev->vhost_ops->vhost_backend_cleanup(hdev);
 +
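
For reference, the "read the module parameter, fall back to the legacy limit
of 64" idea from the quoted patch can be sketched without glib. This is not
the patch's code: parse_memslots_limit() is a made-up helper that takes the
already-read file contents, and unlike the quoted hunk it resets errno
before the conversion and rejects text with no digits at all.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Convert the text of a max_mem_regions-style parameter to a slot limit.
 * Returns `fallback` when the text is empty, not a number, zero, or too
 * large to fit in an int.
 */
static int parse_memslots_limit(const char *text, int fallback)
{
    char *end;
    unsigned long long val;

    errno = 0;                      /* strtoull() only sets errno on error */
    val = strtoull(text, &end, 10);
    if (errno || end == text || val == 0 || val > INT_MAX) {
        return fallback;
    }
    return (int)val;
}
```

A caller would read the sysfs file into a buffer and pass 64 as the
fallback, e.g. parse_memslots_limit(buf, 64).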

[Qemu-devel] [RFC PATCH] spapr: Provide an error message when migration fails due to htab_shift mismatch

2015-07-30 Thread Bharata B Rao

Include an error message when migration fails due to mismatch in
htab_shift values at source and target. This should provide a bit more
verbose message in addition to the current migration failure message
that reads like:

qemu-system-ppc64: error while loading state for instance 0x0 of device 'spapr/htab'

After this patch, the failure message will look like this:

qemu-system-ppc64: htab_shift mismatch: source 29 target 24
qemu-system-ppc64: error while loading state for instance 0x0 of device 'spapr/htab'

Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
---
Applies against spapr-next branch of David Gibson's tree.

 hw/ppc/spapr.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index dfd808f..2dcab34 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1471,6 +1471,8 @@ static int htab_load(QEMUFile *f, void *opaque, int version_id)
     if (section_hdr) {
         /* First section, just the hash shift */
         if (spapr->htab_shift != section_hdr) {
+            error_report("htab_shift mismatch: source %d target %d",
+                         section_hdr, spapr->htab_shift);
             return -EINVAL;
         }
         return 0;
-- 
2.1.0

[Qemu-devel] [PATCH] hw/pci-host/bonito: Avoid buffer overrun for bad LDMA/COP accesses

2015-07-30 Thread Peter Maydell

The LDMA and COP memory regions represent four 32 bit registers
each, but the memory regions themselves are 0x100 bytes large.
Add guards to the read and write accessors so that bogus accesses
beyond the four defined registers don't just run off the end of
the bonldma and boncop structs and into whatever lies beyond.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
I don't have a fulong2e image, so this is compile tested only...

 hw/pci-host/bonito.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/hw/pci-host/bonito.c b/hw/pci-host/bonito.c
index 3a731fe..4139a2c 100644
--- a/hw/pci-host/bonito.c
+++ b/hw/pci-host/bonito.c
@@ -355,6 +355,10 @@ static uint64_t bonito_ldma_readl(void *opaque, hwaddr addr,
     uint32_t val;
     PCIBonitoState *s = opaque;
 
+    if (addr >= sizeof(s->bonldma)) {
+        return 0;
+    }
+
     val = ((uint32_t *)(&s->bonldma))[addr/sizeof(uint32_t)];
 
     return val;
@@ -365,6 +369,10 @@ static void bonito_ldma_writel(void *opaque, hwaddr addr,
 {
     PCIBonitoState *s = opaque;
 
+    if (addr >= sizeof(s->bonldma)) {
+        return;
+    }
+
     ((uint32_t *)(&s->bonldma))[addr/sizeof(uint32_t)] = val & 0xffffffff;
 }
 
@@ -384,6 +392,10 @@ static uint64_t bonito_cop_readl(void *opaque, hwaddr addr,
     uint32_t val;
     PCIBonitoState *s = opaque;
 
+    if (addr >= sizeof(s->boncop)) {
+        return 0;
+    }
+
     val = ((uint32_t *)(&s->boncop))[addr/sizeof(uint32_t)];
 
     return val;
@@ -394,6 +406,10 @@ static void bonito_cop_writel(void *opaque, hwaddr addr,
 {
     PCIBonitoState *s = opaque;
 
+    if (addr >= sizeof(s->boncop)) {
+        return;
+    }
+
     ((uint32_t *)(&s->boncop))[addr/sizeof(uint32_t)] = val & 0xffffffff;
 }
 
 
-- 
1.9.1
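
The guard added by this patch can be tried in isolation. Below is a minimal
standalone sketch, not the bonito device code: struct four_regs, REGION_SIZE
and the accessor names are hypothetical, and the register block is modelled
as a plain array of four 32-bit words behind a larger 0x100-byte window.

```c
#include <stddef.h>
#include <stdint.h>

/* Four 32-bit registers, as in the commit message; layout is assumed. */
struct four_regs {
    uint32_t reg[4];
};

/* The guest-visible window is larger than the register block. */
#define REGION_SIZE 0x100

static uint32_t regs_read(const struct four_regs *r, uint64_t addr)
{
    if (addr >= sizeof(r->reg)) {
        return 0;               /* beyond the defined registers: reads as 0 */
    }
    return r->reg[addr / sizeof(uint32_t)];
}

static void regs_write(struct four_regs *r, uint64_t addr, uint64_t val)
{
    if (addr >= sizeof(r->reg)) {
        return;                 /* beyond the defined registers: ignored */
    }
    r->reg[addr / sizeof(uint32_t)] = (uint32_t)(val & 0xffffffff);
}
```

Any offset from 0x10 up to REGION_SIZE - 1 is then harmless instead of
indexing past the end of the struct.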

Re: [Qemu-devel] [PATCH RFC v2 07/47] qapi: Generate a nicer struct for flat unions

2015-07-30 Thread Eric Blake

On 07/30/2015 01:11 AM, Markus Armbruster wrote:

 Another name collision bug: our code generates flat unions as:

 struct BlockdevOptions {
 BlockdevDriver driver;
 ...
 /* End fields inherited from BlockdevOptionsBase. */
 /* union tag is BlockdevDriver driver */
 union {
 void *data;
 BlockdevOptionsArchipelago *archipelago;
 ...

 which means that if we name any of the branches 'data' (that is, if
 'data' is a member of the enum discriminator), things fail to compile.
 We could probably fix that by naming our dummy branch '_data'.
 
 I wonder whether member data is actually used.  I'll find out.

The dealloc visitor uses 'data' being non-null as a flag on whether to
deallocate the union even if the tag was invalid for some reason; or
more importantly, if parsing consumed the tag but then detected an error
while parsing the union, leaving the union branch partially allocated.
To avoid a leak, we have to deallocate the branch.

But if the tag was invalid, then why did we ever allocate the union in
the first place, and how do we prove we are calling the correct free-ing
function?  And if the tag is valid, why can't we just guarantee that the
union is 0-initialized and that deleting the branch will work through
the correct branch type instead of worrying about 'data'?

We still need a dummy member if it is valid to do { 'union':'Foo',
'data':{} } since C doesn't like empty unions, but an empty union seems
like something we may want to reject, at which point you are probably
right that deleting the data member altogether should work and still let
us recover from bad partial parses without a leak.
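
A small sketch of what "0-initialized union, freed through the correct
branch" could mean in practice. This is not the generated QAPI code: the
types Driver, FileOpts, NbdOpts, Opts and the function opts_free() are
invented for illustration, and the union deliberately has no 'data' member.

```c
#include <stdlib.h>

typedef enum { DRIVER_FILE, DRIVER_NBD, DRIVER__MAX } Driver;

typedef struct { char *filename; } FileOpts;
typedef struct { char *host; int port; } NbdOpts;

typedef struct {
    Driver driver;              /* union tag */
    union {                     /* no dummy 'data' member needed */
        FileOpts *file;
        NbdOpts *nbd;
    };
} Opts;

/*
 * Free an Opts that was allocated with calloc(), even if parsing stopped
 * halfway. Returns the number of objects released (Opts itself counts as
 * one), purely so the behaviour is easy to observe.
 */
static int opts_free(Opts *o)
{
    int released = 0;

    if (!o) {
        return 0;
    }
    switch (o->driver) {
    case DRIVER_FILE:
        if (o->file) {
            free(o->file->filename);
            free(o->file);
            released++;
        }
        break;
    case DRIVER_NBD:
        if (o->nbd) {
            free(o->nbd->host);
            free(o->nbd);
            released++;
        }
        break;
    default:
        break;                  /* invalid tag: no branch was allocated */
    }
    free(o);
    return released + 1;
}
```

Because the struct starts out zeroed, a parse that consumed the tag but
failed before allocating the branch leaves a NULL pointer, and the switch
on the tag picks the right free path without any generic member.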

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] [PATCH 01/12] qga: misc spelling

2015-07-30 Thread Eric Blake

On 07/01/2015 05:47 AM, Marc-André Lureau wrote:
 ---
  qga/qapi-schema.json | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Eric Blake ebl...@redhat.com

As a doc change, it's trivial enough for inclusion if we wanted it, but
now we're late enough that it's probably also fine to wait for 2.5.

 
 diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
 index b446dc7..fbf983c 100644
 --- a/qga/qapi-schema.json
 +++ b/qga/qapi-schema.json
 @@ -755,7 +755,7 @@
  # scheme. Refer to the documentation of the guest operating system
  # in question to determine what is supported.
  #
 -# Note all guest operating systems will support use of the
 +# Not all guest operating systems will support use of the
  # @crypted flag, as they may require the clear-text password
  #
  # The @password parameter must always be base64 encoded before
 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] [PATCH for-2.4 v3 3/3] vhost: fail backend intialization if memslots number is more than its supported limit

2015-07-30 Thread Michael S. Tsirkin

On Thu, Jul 30, 2015 at 12:11:59PM +0200, Igor Mammedov wrote:
 Signed-off-by: Igor Mammedov imamm...@redhat.com
 ---
  hw/virtio/vhost.c | 6 ++
  1 file changed, 6 insertions(+)
 
 diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
 index bcbad48..48fbac1 100644
 --- a/hw/virtio/vhost.c
 +++ b/hw/virtio/vhost.c
 @@ -985,6 +985,12 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
  hdev->started = false;
  hdev->memory_changed = false;
  memory_listener_register(&hdev->memory_listener, &address_space_memory);
 +if (!vhost_has_free_slot()) {

I think this one needs a different test: we are not adding
a new slot so just checking = there should be
enough.


 +fprintf(stderr, "vhost backend memory slots limit is less"
 +" than current number of present memory slots\n");
 +vhost_dev_cleanup(hdev);
 +return -1;
 +}
  return 0;
  fail_vq:
  while (--i >= 0) {
 -- 
 1.8.3.1
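
One reading of the review comment above, as a sketch only: at init time no slot is being added, so the question is whether the slots already present fit the backend, not whether one more could be added. The helper names below are hypothetical, and the assumption that the intended comparison is "used <= limit" is mine; the real vhost code tracks slots differently.

```c
#include <stdbool.h>

/* Hypothetical helpers for illustration; not the vhost API. */

/* True when one more slot could still be added: used < limit. */
bool has_free_slot(unsigned used, unsigned limit)
{
    return used < limit;
}

/* True when the slots already present fit the backend: used <= limit. */
bool slots_fit(unsigned used, unsigned limit)
{
    return used <= limit;
}
```

The two predicates differ only when the backend is exactly full, which is the case a "has free slot" test at init would reject unnecessarily.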

Re: [Qemu-devel] [PATCH v13 00/19] i.MX: Add i.MX25 support through the PDK evaluation board

2015-07-30 Thread Jean-Christophe DUBOIS


Hi,

Is there any more work needed on this series?

Regards

JC

Le 16/07/2015 23:21, Jean-Christophe Dubois a écrit :

This series of patches adds support for the i.MX25 processor through the
Freescale PDK evaluation board.

For now a limited set of devices is supported.
 * GPT timers (from i.MX31)
 * EPIT timers (from i.MX31)
 * Serial ports (from i.MX31)
 * Ethernet FEC port
 * I2C controller

In the process the KZM platform was split into an i.MX31 SOC
and a platform part.

Also, I2C devices were added to the i.MX31 SOC.

This was tested by:
 * booting a minimal linux system on the i.MX25 PDK platform
 * booting the Xvisor hypervisor on the i.MX25 PDK platform
 * booting a minimal linux system on the KZM platform

Jean-Christophe Dubois (19):
   i.MX: Split UART emulator in a header file and a source file
   i.MX: Move serial initialization to init/realize of DeviceClass.
   i.MX:Fix Coding style for UART emulator.
   i.MX: Split AVIC emulator in a header file and a source file
   i.MX: Fix Coding style for AVIC emulator.
   i.MX: Split CCM emulator in a header file and a source file
   i.MX: Fix Coding style for CCM emulator
   i.MX: Split EPIT emulator in a header file and a source file
   i.MX: Fix Coding style for EPIT emulator
   i.MX: Split GPT emulator in a header file and a source file
   i.MX: Fix Coding style for GPT emulator
   i.MX: Add SOC support for i.MX31
   i.MX: KZM now uses the standalone i.MX31 SOC support
   i.MX: Add I2C controller emulator
   i.MX: Add FEC Ethernet Emulator
   i.MX: Add SOC support for i.MX25
   i.MX: Add the i.MX25 PDK plateform
   i.MX: Add qtest support for I2C device emulator.
   i.MX: Adding i2C devices to i.MX31 SOC

  default-configs/arm-softmmu.mak |   6 +
  hw/arm/Makefile.objs|   4 +-
  hw/arm/fsl-imx25.c  | 260 +++
  hw/arm/fsl-imx31.c  | 233 +
  hw/arm/imx25_pdk.c  | 162 +
  hw/arm/kzm.c| 205 ++--
  hw/char/imx_serial.c| 180 ++
  hw/i2c/Makefile.objs|   1 +
  hw/i2c/imx_i2c.c| 339 +++
  hw/intc/imx_avic.c  |  56 +---
  hw/misc/imx_ccm.c   |  81 +
  hw/net/Makefile.objs|   1 +
  hw/net/imx_fec.c| 709 
  hw/timer/imx_epit.c |  75 +
  hw/timer/imx_gpt.c  |  96 +-
  include/hw/arm/fsl-imx25.h  | 234 +
  include/hw/arm/fsl-imx31.h  | 110 +++
  include/hw/arm/imx.h|  34 --
  include/hw/char/imx_serial.h| 102 ++
  include/hw/i2c/imx_i2c.h|  85 +
  include/hw/intc/imx_avic.h  |  55 
  include/hw/misc/imx_ccm.h   |  91 ++
  include/hw/net/imx_fec.h| 113 +++
  include/hw/timer/imx_epit.h |  79 +
  include/hw/timer/imx_gpt.h  | 107 ++
  tests/Makefile  |   3 +
  tests/ds1338-test.c |  75 +
  tests/libqos/i2c-imx.c  | 209 
  tests/libqos/i2c.h  |   3 +
  29 files changed, 3151 insertions(+), 557 deletions(-)
  create mode 100644 hw/arm/fsl-imx25.c
  create mode 100644 hw/arm/fsl-imx31.c
  create mode 100644 hw/arm/imx25_pdk.c
  create mode 100644 hw/i2c/imx_i2c.c
  create mode 100644 hw/net/imx_fec.c
  create mode 100644 include/hw/arm/fsl-imx25.h
  create mode 100644 include/hw/arm/fsl-imx31.h
  delete mode 100644 include/hw/arm/imx.h
  create mode 100644 include/hw/char/imx_serial.h
  create mode 100644 include/hw/i2c/imx_i2c.h
  create mode 100644 include/hw/intc/imx_avic.h
  create mode 100644 include/hw/misc/imx_ccm.h
  create mode 100644 include/hw/net/imx_fec.h
  create mode 100644 include/hw/timer/imx_epit.h
  create mode 100644 include/hw/timer/imx_gpt.h
  create mode 100644 tests/ds1338-test.c
  create mode 100644 tests/libqos/i2c-imx.c

Re: [Qemu-devel] [PATCH] ahci: fix ICC mask definition

2015-07-30 Thread Peter Maydell

On 30 July 2015 at 19:41, John Snow js...@redhat.com wrote:
 Peter: I assume you still want this for 2.4 to fix the clang warnings, yes?

Yeah, it's safe enough. (I don't actually require these clang
warnings all fixed for 2.4; but it's a nice-to-have.)

thanks
-- PMM

[Qemu-devel] [Bug 1465935] Re: kvm_irqchip_commit_routes: Assertion `ret == 0' failed

2015-07-30 Thread Li Chengyuan

Bader,

Sorry for the late response.
We patched our QEMU 2.0, which is running on Ubuntu 12.04, and will keep it
running for a while.
I'll let you know whether the problem is gone after a few weeks.

Regards,
CY.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1465935

Title:
  kvm_irqchip_commit_routes: Assertion `ret == 0' failed

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Confirmed
Status in qemu source package in Precise:
  New
Status in qemu source package in Trusty:
  New
Status in qemu source package in Utopic:
  New
Status in qemu source package in Vivid:
  New

Bug description:
  Several of my QEMU instances crashed, and in the QEMU log, I can see
  this assertion failure,

 qemu-system-x86_64: /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:984:
  kvm_irqchip_commit_routes: Assertion `ret == 0' failed.

  The QEMU version is 2.0.0, HV OS is ubuntu 12.04, kernel 3.2.0-38.
  Guest OS is RHEL 6.3.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1465935/+subscriptions

Re: [Qemu-devel] [PATCH v13 00/19] i.MX: Add i.MX25 support through the PDK evaluation board

2015-07-30 Thread Peter Maydell

On 30 July 2015 at 22:36, Jean-Christophe DUBOIS j...@tribudubois.net wrote:
 Is there any more work needed on this series?

I was expecting Peter C to review this, but he might be busy just
now, and I see he already reviewed most of the patches in the
set anyway. It's on my todo list; I'll try to get to it tomorrow
or early next week. We have a little time yet until 2.4 releases
and we can put things in master for 2.5 anyway...

thanks
-- PMM

Re: [Qemu-devel] [PATCH RFC v2 26/47] qapi-types: Convert to QAPISchemaVisitor, fixing flat unions

2015-07-30 Thread Eric Blake

On 07/30/2015 10:36 AM, Eric Blake wrote:
 On 07/30/2015 09:53 AM, Markus Armbruster wrote:
 
 Or, we could ditch the qtypes lookup altogether, and merely create the
 alternate enum as a non-consecutive QTYPE mapping, for one less level of
 indirection, as in:

 typedef enum BlockdevRefKind {
 BLOCKDEV_REF_DEFINITION = QTYPE_QOBJECT,

 QTYPE_QDICT, but I get what you mean.

 BLOCKDEV_REF_REFERENCE = QTYPE_QSTRING,
 };


 Hmm, your new BlockdevRefKind is basically a subset of qtype_code with
 the members renamed.  Could we simply use qtype_code directly?
 
 We could, except that clients that manipulate the generated struct then
 have to know the qtype mapping directly; while keeping symbolic names
 lets them do 'foo->type = BLOCKDEV_REF_REFERENCE; foo->reference = xyz;'
 as a nice visual indicator of which union member within the struct is
 being assigned according to the discriminator.
 
 I guess I'll see how much code currently manipulates the generated
 structs (I already recall from other patches in this series that
 blockdev played a bit loose by validating that the QMP was okay and
 then using QDict for everything else rather than the generated struct)
 and make my decision when posting my RFC patch.

Turns out that using it directly was easier, and less code:
http://thread.gmane.org/gmane.comp.emulators.qemu/353204/focus=354008
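
For illustration only, the symbolic-alias variant discussed above can be sketched like this. All names (`QTypeSketch`, `BlockdevRefKindSketch`, `ref_kind_from_qtype`) and enum values are hypothetical stand-ins, not the real `qtype_code` or the generated code; the point is just that aliasing the kind onto the type code removes the lookup table.

```c
/* Hypothetical subset of a qtype-style enum; values are illustrative only. */
typedef enum QTypeSketch {
    QT_NONE,
    QT_QDICT,
    QT_QSTRING,
} QTypeSketch;

/* Symbolic kind names aliased straight onto the type codes. */
typedef enum BlockdevRefKindSketch {
    REF_KIND_DEFINITION = QT_QDICT,
    REF_KIND_REFERENCE = QT_QSTRING,
} BlockdevRefKindSketch;

typedef struct BlockdevRefSketch {
    BlockdevRefKindSketch type;   /* discriminator */
    union {
        void *definition;
        const char *reference;
    } u;
} BlockdevRefSketch;

/* With aliased values the kind is the type code; no table lookup needed. */
BlockdevRefKindSketch ref_kind_from_qtype(QTypeSketch t)
{
    return (BlockdevRefKindSketch)t;
}
```

Using the type code directly, as the follow-up patch does, drops the second enum altogether at the cost of the symbolic names.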

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] [PATCH] hw/pci-host/bonito: Avoid buffer overrun for bad LDMA/COP accesses

2015-07-30 Thread Peter Maydell

On 30 July 2015 at 23:02, Aurelien Jarno aurel...@aurel32.net wrote:
 On 2015-07-30 16:33, Peter Maydell wrote:
 The LDMA and COP memory regions represent four 32 bit registers
 each, but the memory regions themselves are 0x100 bytes large.
 Add guards to the read and write accessors so that bogus accesses
 beyond the four defined registers don't just run off the end of
 the bonldma and boncop structs and into whatever lies beyond.

 Thanks for finding that. I don't know if it is better to reduce the
 memory region or just ignore the access as in your patch. I haven't
 found any documentation about the bonito northbridge, so I think it's
 safer to go like in your patch.

I did find some documentation by random googling -- but it just
defines that there are four valid registers in each region,
and doesn't say anything about what happens in the gaps
in between...

 I have just tested, it still boots fine with the change.

  hw/pci-host/bonito.c | 16 
  1 file changed, 16 insertions(+)

 Acked-by: Aurelien Jarno aurel...@aurel32.net

Thanks. (I haven't marked this as for-2.4 because
it's been like this since forever, and fulong2e isn't a
KVM board we care about security on; this is just a random
cleanup I happened to remember about. I could be persuaded
that it ought to go in, though.)
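
A rough sketch of the kind of guard described in the commit message, with hypothetical names (`RegBank`, `regbank_read`, `regbank_write`) and not the actual bonito.c code. Returning 0 for out-of-range reads and dropping out-of-range writes is an assumption here, since the documentation does not say what the hardware does in the gaps.

```c
#include <stdint.h>

#define NUM_REGS 4   /* four 32-bit registers inside a 0x100-byte region */

typedef struct RegBank {
    uint32_t regs[NUM_REGS];
} RegBank;

/* Read a register; accesses past the defined registers read as 0 (assumed). */
uint32_t regbank_read(const RegBank *b, uint64_t addr)
{
    uint64_t idx = addr / sizeof(uint32_t);

    if (idx >= NUM_REGS) {
        return 0;
    }
    return b->regs[idx];
}

/* Write a register; accesses past the defined registers are ignored (assumed). */
void regbank_write(RegBank *b, uint64_t addr, uint32_t val)
{
    uint64_t idx = addr / sizeof(uint32_t);

    if (idx >= NUM_REGS) {
        return;
    }
    b->regs[idx] = val;
}
```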

-- PMM

Re: [Qemu-devel] [POC] colo-proxy in qemu

2015-07-30 Thread Yang Hongyang


On 07/31/2015 09:28 AM, zhanghailiang wrote:

On 2015/7/31 9:08, Yang Hongyang wrote:



On 07/31/2015 01:53 AM, Dr. David Alan Gilbert wrote:

* Yang Hongyang (yan...@cn.fujitsu.com) wrote:



On 07/30/2015 09:59 PM, Dr. David Alan Gilbert wrote:

* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:

On 2015/7/30 20:30, Dr. David Alan Gilbert wrote:

* Gonglei (arei.gong...@huawei.com) wrote:

On 2015/7/30 19:56, Dr. David Alan Gilbert wrote:

* Jason Wang (jasow...@redhat.com) wrote:



On 07/30/2015 04:03 PM, Dr. David Alan Gilbert wrote:

* Dong, Eddie (eddie.d...@intel.com) wrote:

A question here: the packet comparison may be very tricky. For example,
some protocols use random data to generate an unpredictable ID or
something else. One example is ipv6_select_ident() in Linux. So does COLO
need a mechanism to make sure the PVM and SVM generate the same random
data?

Good question; random data in a connection is a big problem for COLO. At
present, it will trigger checkpoint processing because of the different
random data.
I don't think any mechanism can ensure that two different machines
generate the same random data. If you have any ideas, please tell us :)

Frequent checkpoints can handle this scenario, but may cause poor
performance. :(


The assumption is that, after a VM checkpoint, the SVM and PVM have
identical internal state, so the pattern used to generate random
data is highly likely to generate identical data for a short time,
at least...

They do diverge pretty quickly though; I have simple examples which
reliably cause a checkpoint because of simple randomness in
applications.

Dave



And it will become even worse if hwrng is used in guest.


Yes; it seems quite application dependent;  (on IPv4) an ssh connection,
once established, tends to work well without triggering checkpoints;
and static web pages also work well.  Examples of things that do cause
more checkpoints are, displaying guest statistics (e.g. running top
in that ssh) which is timing dependent, and dynamically generated
web pages that include a unique ID (bugzilla's password reset link in
its front page was a fun one), I think also establishing
new encrypted connections cause the same randomness.

However, it's worth remembering that COLO is trying to reduce the
number of checkpoints compared to a simple checkpointing world
which would be aiming to do a checkpoint ~100 times a second,
and for compute bound workloads, or ones that don't expose
the randomness that much, it can get checkpoints of a few seconds
in length which greatly reduces the overhead.



Yes. That's the truth.
We can set two different modes for different scenarios. Maybe Named
1) frequent checkpoint mode for multi-connections and randomness scenarios
and 2) non-frequent checkpoint mode for other scenarios.

But that's the next plan, we are thinking about that.


I have some code that tries to automatically switch between those;
it measures the checkpoint lengths, and if they're consistently short
it sends a different message byte to the secondary at the start of the
checkpoint, so that it doesn't bother running.   Every so often it
then flips back to a COLO checkpoint to see if the checkpoints
are still really fast.



Do you mean that if there are consistent checkpoint requests, we do not
checkpoint but just send a special message to the SVM?
And then resume the common COLO mode once the checkpoint lengths are no
longer short?


   We still have to do checkpoints, but we send a special message to the
SVM so that the SVM just takes the checkpoint but does not run.

   I'll send the code after I've updated it to your current version; but it's
quite rough/experimental.

It works something like

  -------- run PVM      run SVM
  COLO     long gap
  mode     miscompare
           checkpoint
  -------- run PVM      run SVM
  COLO     short gap
  mode     miscompare
           checkpoint
  -------- run PVM      run SVM
  COLO     short gap
  mode     miscompare                  After a few short runs
           checkpoint
  -------- run PVM      SVM idle   \
  Passive  fixed delay              |  - repeat 'n' times
  mode     checkpoint              /
  -------- run PVM      run SVM
  COLO     short gap                   Still a short gap
  mode     miscompare
  -------- run PVM      SVM idle   \
  Passive  fixed delay              |  - repeat 'n' times
  mode     checkpoint              /
  -------- run PVM      run SVM
  COLO     long gap                    long gap now, stay in COLO
  mode     miscompare
           checkpoint
  -------- run PVM      run SVM
  COLO     long gap
  mode     miscompare
           checkpoint

So it saves the CPU time on the SVM, and the comparison traffic, and is
automatic at switching into the passive mode.
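
The switching described above could be sketched roughly as follows. This is not the experimental code mentioned in the thread; the names (`Switcher`, `switcher_next`) and all thresholds are hypothetical, chosen only to make the state machine concrete.

```c
#include <stdbool.h>

/* Illustrative thresholds; the thread gives no concrete numbers. */
enum {
    SHORT_GAP_MS = 100,    /* gaps below this count as "short" */
    SHORT_RUNS = 3,        /* short COLO runs before going passive */
    PASSIVE_REPEATS = 2,   /* the 'n' passive checkpoints per phase */
};

typedef enum { MODE_COLO, MODE_PASSIVE } Mode;

typedef struct Switcher {
    Mode mode;
    int short_runs;     /* consecutive short gaps seen in COLO mode */
    int passive_left;   /* passive checkpoints still to run */
    bool probing;       /* single COLO run right after a passive phase */
} Switcher;

void switcher_init(Switcher *s)
{
    s->mode = MODE_COLO;
    s->short_runs = 0;
    s->passive_left = 0;
    s->probing = false;
}

/* Called at each checkpoint with the length of the run that just ended;
 * returns the mode to use for the next run. */
Mode switcher_next(Switcher *s, int gap_ms)
{
    if (s->mode == MODE_PASSIVE) {
        if (--s->passive_left > 0) {
            return MODE_PASSIVE;
        }
        s->mode = MODE_COLO;      /* flip back to see if gaps are still short */
        s->probing = true;
        return MODE_COLO;
    }
    if (gap_ms >= SHORT_GAP_MS) { /* long gap: stay in COLO */
        s->short_runs = 0;
        s->probing = false;
        return MODE_COLO;
    }
    if (s->probing || ++s->short_runs >= SHORT_RUNS) {
        s->mode = MODE_PASSIVE;   /* consistently short: SVM idles */
        s->passive_left = PASSIVE_REPEATS;
        s->short_runs = 0;
        s->probing = false;
        return MODE_PASSIVE;
    }
    return MODE_COLO;
}
```

In this sketch a single short probe run is enough to return to passive mode, matching the "Still a short gap" step in the diagram, while one long gap keeps the pair in COLO.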

It used to be more useful, but your minimum COLO run time that you
added a few versions ago helps a lot in the cases where there are miscompares,
and the delay after the miscompare before
