[PATCH v4 1/2] virtio-blk: delete vqs on the error path in realize()

2020-03-27 Thread Pan Nengyuan
The virtio_vqs were not freed on the error path in realize(). Fix that.

The asan stack:
Direct leak of 14336 byte(s) in 1 object(s) allocated from:
#0 0x7f58b93fd970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970)
#1 0x7f58b858249d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d)
#2 0x5562cc627f49 in virtio_add_queue /mnt/sdb/qemu/hw/virtio/virtio.c:2413
#3 0x5562cc4b524a in virtio_blk_device_realize /mnt/sdb/qemu/hw/block/virtio-blk.c:1202
#4 0x5562cc613050 in virtio_device_realize /mnt/sdb/qemu/hw/virtio/virtio.c:3615
#5 0x5562ccb7a568 in device_set_realized /mnt/sdb/qemu/hw/core/qdev.c:891
#6 0x5562cd39cd45 in property_set_bool /mnt/sdb/qemu/qom/object.c:2238

Reported-by: Euler Robot 
Signed-off-by: Pan Nengyuan 
Reviewed-by: Stefano Garzarella 
---
v1->v2:
- Fix the incorrect free in v1, which caused a use-after-free.
---
Cc: Stefan Hajnoczi 
Cc: Kevin Wolf 
Cc: Max Reitz 
Cc: qemu-block@nongnu.org
---
 hw/block/virtio-blk.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 142863a3b2..97ba8a2187 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1204,6 +1204,9 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
     virtio_blk_data_plane_create(vdev, conf, &s->dataplane, &err);
     if (err != NULL) {
         error_propagate(errp, err);
+        for (i = 0; i < conf->num_queues; i++) {
+            virtio_del_queue(vdev, i);
+        }
         virtio_cleanup(vdev);
         return;
     }
-- 
2.18.2




[PULL 2/5] hw/ide/sii3112: Use qdev gpio rather than qemu_allocate_irqs()

2020-03-27 Thread John Snow
From: Peter Maydell 

Coverity points out (CID 1421984) that we are leaking the
memory returned by qemu_allocate_irqs(). We can avoid this
leak by switching to using qdev_init_gpio_in(); the base
class finalize will free the irqs that this allocates under
the hood.
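
For reference, the qdev pattern the diff below switches to looks like this in isolation (a sketch with made-up device names, not QEMU code; only qdev_init_gpio_in()/qdev_get_gpio_in() are the real API):

static void my_dev_set_irq(void *opaque, int n, int level)
{
    /* react to input line n changing to 'level' */
}

static void my_dev_realize(DeviceState *ds, Error **errp)
{
    /* Register two input lines; qdev keeps them in the device's GPIO
     * list and frees them from the base-class finalize, so there is no
     * irq array for this code to track, g_free() or leak. */
    qdev_init_gpio_in(ds, my_dev_set_irq, 2);

    /* Fetch line i on demand wherever an interrupt sink is needed. */
    qemu_irq line0 = qdev_get_gpio_in(ds, 0);
    (void)line0;  /* would be passed to e.g. ide_init2() */
}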

Signed-off-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: John Snow 
Tested-by: BALATON Zoltan 
Message-id: 20200323151715.29454-1-peter.mayd...@linaro.org
[Maintainer edit: replace `DEVICE(dev)` by `ds` --js]
Signed-off-by: John Snow 
---
 hw/ide/sii3112.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/ide/sii3112.c b/hw/ide/sii3112.c
index 06605d7af2..d69079c3d9 100644
--- a/hw/ide/sii3112.c
+++ b/hw/ide/sii3112.c
@@ -251,8 +251,8 @@ static void sii3112_pci_realize(PCIDevice *dev, Error **errp)
 {
     SiI3112PCIState *d = SII3112_PCI(dev);
     PCIIDEState *s = PCI_IDE(dev);
+    DeviceState *ds = DEVICE(dev);
     MemoryRegion *mr;
-    qemu_irq *irq;
     int i;
 
     pci_config_set_interrupt_pin(dev->config, 1);
@@ -280,10 +280,10 @@ static void sii3112_pci_realize(PCIDevice *dev, Error **errp)
     memory_region_init_alias(mr, OBJECT(d), "sii3112.bar4", &d->mmio, 0, 16);
     pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, mr);
 
-    irq = qemu_allocate_irqs(sii3112_set_irq, d, 2);
+    qdev_init_gpio_in(ds, sii3112_set_irq, 2);
     for (i = 0; i < 2; i++) {
-        ide_bus_new(&s->bus[i], sizeof(s->bus[i]), DEVICE(dev), i, 1);
-        ide_init2(&s->bus[i], irq[i]);
+        ide_bus_new(&s->bus[i], sizeof(s->bus[i]), ds, i, 1);
+        ide_init2(&s->bus[i], qdev_get_gpio_in(ds, i));
 
         bmdma_init(&s->bus[i], &s->bmdma[i], s);
         s->bmdma[i].bus = &s->bus[i];
-- 
2.21.1




[PULL 3/5] via-ide: don't use PCI level for legacy IRQs

2020-03-27 Thread John Snow
From: Mark Cave-Ayland 

The PCI level calculation was accidentally left in when rebasing from a
previous patchset. Since both IRQs are driven separately, the value
being passed into the IRQ handler should be used directly.

Signed-off-by: Mark Cave-Ayland 
Message-id: 20200324210519.2974-2-mark.cave-ayl...@ilande.co.uk
Signed-off-by: John Snow 
---
 hw/ide/via.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/ide/via.c b/hw/ide/via.c
index 8de4945cc1..2a55b7fbc6 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -112,7 +112,6 @@ static void via_ide_set_irq(void *opaque, int n, int level)
 d->config[0x70 + n * 8] &= ~0x80;
 }
 
-level = (d->config[0x70] & 0x80) || (d->config[0x78] & 0x80);
 qemu_set_irq(isa_get_irq(NULL, 14 + n), level);
 }
 
-- 
2.21.1




[PULL 1/5] fdc/i8257: implement verify transfer mode

2020-03-27 Thread John Snow
From: Sven Schnelle 

While working on the Tulip driver I tried to write some Teledisk images to
a floppy image which didn't work. Turned out that Teledisk checks the written
data by issuing a READ command to the FDC but running the DMA controller
in VERIFY mode. As we ignored the DMA request in that case, the DMA transfer
never finished, and Teledisk reported an error.

The i8257 spec says about verify transfers:

3) DMA verify, which does not actually involve the transfer of data. When an
8257 channel is in the DMA verify mode, it will respond the same as described
for transfer operations, except that no memory or I/O read/write control signals
will be generated.

Hervé proposed to remove all the dma_mode_ok stuff from fdc to have a
clearer boundary between DMA and FDC, so this patch also does that.
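
To make the verify semantics concrete, here is a tiny stand-alone model (all names invented; this is not the QEMU i8257 code): the channel sequences through the whole transfer as usual, it just never issues a memory or I/O cycle.

#include <stdio.h>

typedef enum { XFER_VERIFY, XFER_READ, XFER_WRITE } XferMode;

static void run_channel(XferMode mode, int nbytes)
{
    for (int pos = 0; pos < nbytes; pos++) {
        /* DREQ/DACK handshake and terminal-count bookkeeping happen in
         * every mode, so the requesting device sees the transfer finish... */
        if (mode != XFER_VERIFY) {
            /* ...but only read/write transfers touch memory or I/O. */
            printf("memory cycle for byte %d\n", pos);
        }
    }
    printf("channel done after %d bytes (mode %d)\n", nbytes, mode);
}

int main(void)
{
    run_channel(XFER_VERIFY, 4);   /* completes with no memory cycles */
    run_channel(XFER_READ, 4);     /* completes with four memory cycles */
    return 0;
}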

Suggested-by: Hervé Poussineau 
Signed-off-by: Sven Schnelle 
Reviewed-by: Hervé Poussineau 
Signed-off-by: John Snow 
---
 include/hw/isa/isa.h |  1 -
 hw/block/fdc.c   | 61 +---
 hw/dma/i8257.c   | 20 ++-
 3 files changed, 31 insertions(+), 51 deletions(-)

diff --git a/include/hw/isa/isa.h b/include/hw/isa/isa.h
index e9ac1f1205..59a4d4b50a 100644
--- a/include/hw/isa/isa.h
+++ b/include/hw/isa/isa.h
@@ -56,7 +56,6 @@ typedef int (*IsaDmaTransferHandler)(void *opaque, int nchan, int pos,
 typedef struct IsaDmaClass {
 InterfaceClass parent;
 
-IsaDmaTransferMode (*get_transfer_mode)(IsaDma *obj, int nchan);
 bool (*has_autoinitialization)(IsaDma *obj, int nchan);
 int (*read_memory)(IsaDma *obj, int nchan, void *buf, int pos, int len);
 int (*write_memory)(IsaDma *obj, int nchan, void *buf, int pos, int len);
diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 22e954e0dc..33bc9e2f92 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -1714,53 +1714,28 @@ static void fdctrl_start_transfer(FDCtrl *fdctrl, int direction)
 }
 fdctrl->eot = fdctrl->fifo[6];
 if (fdctrl->dor & FD_DOR_DMAEN) {
-IsaDmaTransferMode dma_mode;
+/* DMA transfer is enabled. */
 IsaDmaClass *k = ISADMA_GET_CLASS(fdctrl->dma);
-bool dma_mode_ok;
-/* DMA transfer are enabled. Check if DMA channel is well programmed */
-dma_mode = k->get_transfer_mode(fdctrl->dma, fdctrl->dma_chann);
-FLOPPY_DPRINTF("dma_mode=%d direction=%d (%d - %d)\n",
-   dma_mode, direction,
-   (128 << fdctrl->fifo[5]) *
+
+FLOPPY_DPRINTF("direction=%d (%d - %d)\n",
+   direction, (128 << fdctrl->fifo[5]) *
(cur_drv->last_sect - ks + 1), fdctrl->data_len);
-switch (direction) {
-case FD_DIR_SCANE:
-case FD_DIR_SCANL:
-case FD_DIR_SCANH:
-dma_mode_ok = (dma_mode == ISADMA_TRANSFER_VERIFY);
-break;
-case FD_DIR_WRITE:
-dma_mode_ok = (dma_mode == ISADMA_TRANSFER_WRITE);
-break;
-case FD_DIR_READ:
-dma_mode_ok = (dma_mode == ISADMA_TRANSFER_READ);
-break;
-case FD_DIR_VERIFY:
-dma_mode_ok = true;
-break;
-default:
-dma_mode_ok = false;
-break;
-}
-if (dma_mode_ok) {
-/* No access is allowed until DMA transfer has completed */
-fdctrl->msr &= ~FD_MSR_RQM;
-if (direction != FD_DIR_VERIFY) {
-/* Now, we just have to wait for the DMA controller to
- * recall us...
- */
-k->hold_DREQ(fdctrl->dma, fdctrl->dma_chann);
-k->schedule(fdctrl->dma);
-} else {
-/* Start transfer */
-fdctrl_transfer_handler(fdctrl, fdctrl->dma_chann, 0,
-fdctrl->data_len);
-}
-return;
+
+/* No access is allowed until DMA transfer has completed */
+fdctrl->msr &= ~FD_MSR_RQM;
+if (direction != FD_DIR_VERIFY) {
+/*
+ * Now, we just have to wait for the DMA controller to
+ * recall us...
+ */
+k->hold_DREQ(fdctrl->dma, fdctrl->dma_chann);
+k->schedule(fdctrl->dma);
 } else {
-FLOPPY_DPRINTF("bad dma_mode=%d direction=%d\n", dma_mode,
-   direction);
+/* Start transfer */
+fdctrl_transfer_handler(fdctrl, fdctrl->dma_chann, 0,
+fdctrl->data_len);
 }
+return;
 }
 FLOPPY_DPRINTF("start non-DMA transfer\n");
 fdctrl->msr |= FD_MSR_NONDMA | FD_MSR_RQM;
diff --git a/hw/dma/i8257.c b/hw/dma/i8257.c
index ef15c06d77..1b3435ab58 100644
--- a/hw/dma/i8257.c
+++ b/hw/dma/i8257.c
@@ -292,12 +292,6 @@ static uint64_t i8257_read_cont(void *opaque, hwaddr nport, unsigned size)
 return val;
 }
 
-static IsaDmaTransferMode 

[PULL 4/5] via-ide: use qdev gpio rather than qemu_allocate_irqs()

2020-03-27 Thread John Snow
From: Mark Cave-Ayland 

This prevents the memory returned by qemu_allocate_irqs() from being leaked,
which can in some cases be spotted by Coverity (CID 1421984).

Signed-off-by: Mark Cave-Ayland 
Message-id: 20200324210519.2974-3-mark.cave-ayl...@ilande.co.uk
Signed-off-by: John Snow 
---
 hw/ide/via.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/ide/via.c b/hw/ide/via.c
index 2a55b7fbc6..be09912b33 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -160,6 +160,7 @@ static void via_ide_reset(DeviceState *dev)
 static void via_ide_realize(PCIDevice *dev, Error **errp)
 {
 PCIIDEState *d = PCI_IDE(dev);
+DeviceState *ds = DEVICE(dev);
 uint8_t *pci_conf = dev->config;
 int i;
 
@@ -187,9 +187,10 @@ static void via_ide_realize(PCIDevice *dev, Error **errp)
     bmdma_setup_bar(d);
     pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, &d->bmdma_bar);
 
+    qdev_init_gpio_in(ds, via_ide_set_irq, 2);
     for (i = 0; i < 2; i++) {
-        ide_bus_new(&d->bus[i], sizeof(d->bus[i]), DEVICE(d), i, 2);
-        ide_init2(&d->bus[i], qemu_allocate_irq(via_ide_set_irq, d, i));
+        ide_bus_new(&d->bus[i], sizeof(d->bus[i]), ds, i, 2);
+        ide_init2(&d->bus[i], qdev_get_gpio_in(ds, i));
 
         bmdma_init(&d->bus[i], &d->bmdma[i], d);
         d->bmdma[i].bus = &d->bus[i];
-- 
2.21.1




[PULL 5/5] cmd646-ide: use qdev gpio rather than qemu_allocate_irqs()

2020-03-27 Thread John Snow
From: Mark Cave-Ayland 

This prevents the memory returned by qemu_allocate_irqs() from being leaked,
which can in some cases be spotted by Coverity (CID 1421984).

Signed-off-by: Mark Cave-Ayland 
Message-id: 20200324210519.2974-4-mark.cave-ayl...@ilande.co.uk
Signed-off-by: John Snow 
---
 hw/ide/cmd646.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index 699f25824d..c254631485 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -249,8 +249,8 @@ static void cmd646_pci_config_write(PCIDevice *d, uint32_t addr, uint32_t val,
 static void pci_cmd646_ide_realize(PCIDevice *dev, Error **errp)
 {
     PCIIDEState *d = PCI_IDE(dev);
+    DeviceState *ds = DEVICE(dev);
     uint8_t *pci_conf = dev->config;
-    qemu_irq *irq;
     int i;
 
     pci_conf[PCI_CLASS_PROG] = 0x8f;
@@ -291,16 +291,15 @@ static void pci_cmd646_ide_realize(PCIDevice *dev, Error **errp)
     /* TODO: RST# value should be 0 */
     pci_conf[PCI_INTERRUPT_PIN] = 0x01; // interrupt on pin 1
 
-    irq = qemu_allocate_irqs(cmd646_set_irq, d, 2);
+    qdev_init_gpio_in(ds, cmd646_set_irq, 2);
     for (i = 0; i < 2; i++) {
-        ide_bus_new(&d->bus[i], sizeof(d->bus[i]), DEVICE(dev), i, 2);
-        ide_init2(&d->bus[i], irq[i]);
+        ide_bus_new(&d->bus[i], sizeof(d->bus[i]), ds, i, 2);
+        ide_init2(&d->bus[i], qdev_get_gpio_in(ds, i));
 
         bmdma_init(&d->bus[i], &d->bmdma[i], d);
         d->bmdma[i].bus = &d->bus[i];
         ide_register_restart_cb(&d->bus[i]);
     }
-    g_free(irq);
 }
 
 static void pci_cmd646_ide_exitfn(PCIDevice *dev)
-- 
2.21.1




[PULL 0/5] Ide patches

2020-03-27 Thread John Snow
The following changes since commit cfe68ae025f704f336d7dd3d1903ce37b445831d:

  Merge remote-tracking branch 'remotes/vivier2/tags/linux-user-for-5.0-pull-request' into staging (2020-03-26 20:55:54 +)

are available in the Git repository at:

  https://github.com/jnsnow/qemu.git tags/ide-pull-request

for you to fetch changes up to cbf4c9ac9c000f7caf1bfee031041b62d2b000c8:

  cmd646-ide: use qdev gpio rather than qemu_allocate_irqs() (2020-03-27 14:30:08 -0400)


Pull request



Mark Cave-Ayland (3):
  via-ide: don't use PCI level for legacy IRQs
  via-ide: use qdev gpio rather than qemu_allocate_irqs()
  cmd646-ide: use qdev gpio rather than qemu_allocate_irqs()

Peter Maydell (1):
  hw/ide/sii3112: Use qdev gpio rather than qemu_allocate_irqs()

Sven Schnelle (1):
  fdc/i8257: implement verify transfer mode

 include/hw/isa/isa.h |  1 -
 hw/block/fdc.c   | 61 +---
 hw/dma/i8257.c   | 20 ++-
 hw/ide/cmd646.c  |  9 +++
 hw/ide/sii3112.c |  8 +++---
 hw/ide/via.c |  7 ++---
 6 files changed, 43 insertions(+), 63 deletions(-)

-- 
2.21.1




Re: [PATCH v4] qcow2: Forbid discard in qcow2 v2 images with backing files

2020-03-27 Thread Eric Blake

On 3/27/20 1:59 PM, Alberto Garcia wrote:

A discard request deallocates the selected clusters so they read back
as zeroes. This is done by clearing the cluster offset field and
setting QCOW_OFLAG_ZERO in the L2 entry.

This flag is however only supported when qcow_version >= 3. In older
images the cluster is simply deallocated, exposing any possible stale
data from the backing file.

Since discard is an advisory operation it's safer to simply forbid it
in this scenario.

Note that we are adding this check to qcow2_co_pdiscard() and not to
qcow2_cluster_discard() or discard_in_l2_slice() because the last
two are also used by qcow2_snapshot_create() to discard the clusters
used by the VM state. In this case there's no risk of exposing stale
data to the guest and we really want the clusters to always be
discarded.

Signed-off-by: Alberto Garcia 
---
v4:
- Show output of qemu-img map when there's no backing file [Eric]


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v3] qcow2: Forbid discard in qcow2 v2 images with backing files

2020-03-27 Thread Alberto Garcia
On Fri 27 Mar 2020 07:57:40 PM CET, Eric Blake wrote:
>> +/* If the image does not support QCOW_OFLAG_ZERO then discarding
>> + * clusters could expose stale data from the backing file. */
>> +if (s->qcow_version < 3 && bs->backing) {
>> +return -ENOTSUP;
>> +}
>
> Hmm. Should we blindly always fail for v2, or can we be a little bit
> smarter and still discard a cluster in the top layer if the backing
> layer does not also have it allocated?

Not sure if that's worth it. I only wanted to fix what looks like a
potential security bug so I prefer to keep it simple. qcow2 v3 has been
out for many years already.

Berto



[PATCH v4] qcow2: Forbid discard in qcow2 v2 images with backing files

2020-03-27 Thread Alberto Garcia
A discard request deallocates the selected clusters so they read back
as zeroes. This is done by clearing the cluster offset field and
setting QCOW_OFLAG_ZERO in the L2 entry.

This flag is however only supported when qcow_version >= 3. In older
images the cluster is simply deallocated, exposing any possible stale
data from the backing file.

Since discard is an advisory operation it's safer to simply forbid it
in this scenario.

Note that we are adding this check to qcow2_co_pdiscard() and not to
qcow2_cluster_discard() or discard_in_l2_slice() because the last
two are also used by qcow2_snapshot_create() to discard the clusters
used by the VM state. In this case there's no risk of exposing stale
data to the guest and we really want the clusters to always be
discarded.
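
As a compact illustration of the difference described above (field semantics per the qcow2 spec; a sketch only, not QEMU's discard_in_l2_slice()):

#include <stdint.h>

#define QCOW_OFLAG_ZERO (1ULL << 0)   /* "reads as zeroes" flag, qcow2 v3+ */

static uint64_t l2_entry_after_discard(int qcow_version)
{
    if (qcow_version >= 3) {
        /* offset cleared and zero flag set: the cluster reads back as zeroes */
        return QCOW_OFLAG_ZERO;
    }
    /* v2 has no zero flag: the entry can only become plain "unallocated",
     * so reads fall through to the backing file -- the stale-data exposure
     * that this patch now refuses to allow */
    return 0;
}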

Signed-off-by: Alberto Garcia 
---
v4:
- Show output of qemu-img map when there's no backing file [Eric]

v3:
- Rebase and change iotest number
- Show output of qemu-img map in iotest 290 [Kevin]
- Use the l2_offset and rb_offset variables in iotest 060

v2:

- Don't create the image with compat=0.10 in iotest 060 [Max]
- Use $TEST_IMG.base for the backing image name in iotest 289 [Max]
- Add list of unsupported options to iotest 289 [Max]

 block/qcow2.c  |  6 +++
 tests/qemu-iotests/060 | 12 ++---
 tests/qemu-iotests/060.out |  2 -
 tests/qemu-iotests/290 | 97 ++
 tests/qemu-iotests/290.out | 61 
 tests/qemu-iotests/group   |  1 +
 6 files changed, 170 insertions(+), 9 deletions(-)
 create mode 100755 tests/qemu-iotests/290
 create mode 100644 tests/qemu-iotests/290.out

diff --git a/block/qcow2.c b/block/qcow2.c
index 2bb536b014..e8cbcc1ec1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3784,6 +3784,12 @@ static coroutine_fn int qcow2_co_pdiscard(BlockDriverState *bs,
 int ret;
 BDRVQcow2State *s = bs->opaque;
 
+/* If the image does not support QCOW_OFLAG_ZERO then discarding
+ * clusters could expose stale data from the backing file. */
+if (s->qcow_version < 3 && bs->backing) {
+return -ENOTSUP;
+}
+
 if (!QEMU_IS_ALIGNED(offset | bytes, s->cluster_size)) {
 assert(bytes < s->cluster_size);
 /* Ignore partial clusters, except for the special case of the
diff --git a/tests/qemu-iotests/060 b/tests/qemu-iotests/060
index 043f12904a..32c0ecce9e 100755
--- a/tests/qemu-iotests/060
+++ b/tests/qemu-iotests/060
@@ -160,18 +160,16 @@ TEST_IMG=$BACKING_IMG _make_test_img 1G
 
 $QEMU_IO -c 'write 0k 64k' "$BACKING_IMG" | _filter_qemu_io
 
-# compat=0.10 is required in order to make the following discard actually
-# unallocate the sector rather than make it a zero sector - we want COW, after
-# all.
-_make_test_img -o 'compat=0.10' -b "$BACKING_IMG" 1G
+_make_test_img -b "$BACKING_IMG" 1G
 # Write two clusters, the second one enforces creation of an L2 table after
 # the first data cluster.
 $QEMU_IO -c 'write 0k 64k' -c 'write 512M 64k' "$TEST_IMG" | _filter_qemu_io
-# Discard the first cluster. This cluster will soon enough be reallocated and
+# Free the first cluster. This cluster will soon enough be reallocated and
 # used for COW.
-$QEMU_IO -c 'discard 0k 64k' "$TEST_IMG" | _filter_qemu_io
+poke_file "$TEST_IMG" "$l2_offset" "\x00\x00\x00\x00\x00\x00\x00\x00"
+poke_file "$TEST_IMG" "$(($rb_offset+10))" "\x00\x00"
 # Now, corrupt the image by marking the second L2 table cluster as free.
-poke_file "$TEST_IMG" '131084' "\x00\x00" # 0x2000c
+poke_file "$TEST_IMG" "$(($rb_offset+12))" "\x00\x00"
 # Start a write operation requiring COW on the image stopping it right before
# doing the read; then, trigger the corruption prevention by writing anything to
 # any unallocated cluster, leading to an attempt to overwrite the second L2
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index d27692a33c..09caaea865 100644
--- a/tests/qemu-iotests/060.out
+++ b/tests/qemu-iotests/060.out
@@ -105,8 +105,6 @@ wrote 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 65536/65536 bytes at offset 536870912
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-discard 65536/65536 bytes at offset 0
-64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
qcow2: Marking image as corrupt: Preventing invalid write on metadata (overlaps with active L2 table); further corruption events will be suppressed
 blkdebug: Suspended request '0'
 write failed: Input/output error
diff --git a/tests/qemu-iotests/290 b/tests/qemu-iotests/290
new file mode 100755
index 00..776b65e915
--- /dev/null
+++ b/tests/qemu-iotests/290
@@ -0,0 +1,97 @@
+#!/usr/bin/env bash
+#
+# Test how 'qemu-io -c discard' behaves on v2 and v3 qcow2 images
+#
+# Copyright (C) 2020 Igalia, S.L.
+# Author: Alberto Garcia 
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software 

Re: [PATCH v3] qcow2: Forbid discard in qcow2 v2 images with backing files

2020-03-27 Thread Eric Blake

On 3/27/20 11:48 AM, Alberto Garcia wrote:

A discard request deallocates the selected clusters so they read back
as zeroes. This is done by clearing the cluster offset field and
setting QCOW_OFLAG_ZERO in the L2 entry.

This flag is however only supported when qcow_version >= 3. In older
images the cluster is simply deallocated, exposing any possible stale
data from the backing file.

Since discard is an advisory operation it's safer to simply forbid it
in this scenario.

Note that we are adding this check to qcow2_co_pdiscard() and not to
qcow2_cluster_discard() or discard_in_l2_slice() because the last
two are also used by qcow2_snapshot_create() to discard the clusters
used by the VM state. In this case there's no risk of exposing stale
data to the guest and we really want the clusters to always be
discarded.

Signed-off-by: Alberto Garcia 
---



+++ b/block/qcow2.c
@@ -3784,6 +3784,12 @@ static coroutine_fn int qcow2_co_pdiscard(BlockDriverState *bs,
  int ret;
  BDRVQcow2State *s = bs->opaque;
  
+/* If the image does not support QCOW_OFLAG_ZERO then discarding

+ * clusters could expose stale data from the backing file. */
+if (s->qcow_version < 3 && bs->backing) {
+return -ENOTSUP;
+}
+


Hmm. Should we blindly always fail for v2, or can we be a little bit 
smarter and still discard a cluster in the top layer if the backing 
layer does not also have it allocated?  In other words, is it also worth 
checking bdrv_is_allocated(), and avoiding the -ENOTSUP failure in the 
case where the backing chain does not have any allocation of the same 
range (at which point, discarding the cluster in the top layer will read 
as zeroes rather than as stale data)?
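
A rough sketch of that idea, for illustration only (not the patch as posted, and it only consults the immediate backing file; a complete version would have to check the whole backing chain):

    /* Allow the v2 discard when the range is known to be unallocated in
     * the backing file, refuse it otherwise. */
    if (s->qcow_version < 3 && bs->backing) {
        int64_t pnum;
        int ret = bdrv_is_allocated(bs->backing->bs, offset, bytes, &pnum);

        if (ret != 0 || pnum != bytes) {
            /* allocated, short answer, or error: stale data could appear */
            return -ENOTSUP;
        }
    }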


But that's a minor optimization for an older file format; it's just as 
easy to tell users that if they want maximum discarding power on their 
active layer, then they should be using v3 images.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v3] qcow2: Forbid discard in qcow2 v2 images with backing files

2020-03-27 Thread Eric Blake

On 3/27/20 1:43 PM, Alberto Garcia wrote:

On Fri 27 Mar 2020 07:13:04 PM CET, Eric Blake wrote:

+for qcow2_compat in 0.10 1.1; do
+echo "# Create an image with compat=$qcow2_compat without a backing file"
+_make_test_img -o "compat=$qcow2_compat" 128k
+
+echo "# Fill all clusters with data and then discard them"
+$QEMU_IO -c 'write -P 0x01 0 128k' "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c 'discard 0 128k' "$TEST_IMG" | _filter_qemu_io
+
+echo "# Read the data from the discarded clusters"
+$QEMU_IO -c 'read -P 0x00 0 128k' "$TEST_IMG" | _filter_qemu_io
+done


Should this loop also inspect qemu-img map output?


I guess we can, although here the image is completely unallocated in
both cases.


Which shows that even for v2 images, discard DOES do something when 
there is no backing file (even if it is now a no-op when there is a 
backing file after this patch).


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 2/3] io: Support shutdown of TLS channel

2020-03-27 Thread Eric Blake

On 3/27/20 12:43 PM, Daniel P. Berrangé wrote:


I don't think it is acceptable to do this loop here. The gnutls_bye()
function triggers several I/O operations which could block. Looping
like this means we busy-wait, blocking this thread for as long as I/O
is blocking on the socket.


Hmm, good point.  Should we give qio_channel_tls_shutdown a bool parameter
that says whether it should wait (good for use when we are being run in a
coroutine and can deal with the additional I/O) or just fail with -EAGAIN
(which the caller can ignore if it is not worried)?


A bool would not be sufficient, because the caller would need to know
which direction to wait for I/O on.

Looking at the code it does a flush of outstanding data, then sends
some bytes, and then receives some bytes. The write will probably
work most of the time, but the receive is almost always going to
return -EAGAIN. So I don't think that failing with EAGAIN is going
to be much of a benefit.


Here, I'm guessing you're talking about gnutls lib/record.c.  But note: 
for GNUTLS_SHUT_WR, it only does _gnutls_io_write_flush and 
gnutls_alert_send; the use of _gnutls_recv_int is reserved for 
GNUTLS_SHUT_RDWR.  When informing the other end that you are not going 
to write any more, you don't have to wait for a reply.





If we must call gnutls_bye(), then it needs to be done in a way that
can integrate with the main loop so it poll()'s / unblocks the current
coroutine/thread.  This makes the whole thing significantly more
complex to deal with, especially if the shutdown is being done in
cleanup paths which ordinarily are expected to execute without
blocking on I/O.  This is the big reason why i never made any attempt
to use gnutls_bye().


We _are_ using gnutls_bye(GNUTLS_SHUT_RDWR) on the close() path (which is


Are you sure ?  AFAIK there is no existing usage of gnutls_bye() at all
in QEMU.


Oh, I'm confusing that with nbdkit, which does use 
gnutls_bye(GNUTLS_SHUT_RDWR) but does not wait for a response (at least, 
not yet).





indeed a cleanup path where not blocking is worthwhile) even without this
patch, but the question is whether using gnutls_bye(GNUTLS_SHUT_WR) in the
normal data path, where we are already using coroutines to manage callbacks,
can benefit the remote endpoint, giving them a chance to see clean shutdown
instead of abrupt termination.


I'm not convinced the clean shutdown at the TLS level does anything important
given that we already have done a clean shutdown at the NBD level.


Okay, then for now, I'll still try and see if I can fix the 
libnbd/nbdkit hang without relying on qemu sending a clean 
gnutls_bye(GNUTLS_SHUT_WR).


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 0/3] nbd: Try for cleaner TLS shutdown

2020-03-27 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20200327161936.2225989-1-ebl...@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

io/channel-tls.o: In function `qio_channel_tls_shutdown':
/tmp/qemu-test/src/io/channel-tls.c:373: undefined reference to `qcrypto_tls_session_shutdown'
/tmp/qemu-test/src/io/channel-tls.c:376: undefined reference to `qcrypto_tls_session_shutdown'
collect2: error: ld returned 1 exit status
make: *** [scsi/qemu-pr-helper] Error 1
make: *** Waiting for unfinished jobs
io/channel-tls.o: In function `qio_channel_tls_shutdown':
/tmp/qemu-test/src/io/channel-tls.c:373: undefined reference to `qcrypto_tls_session_shutdown'
/tmp/qemu-test/src/io/channel-tls.c:376: undefined reference to `qcrypto_tls_session_shutdown'
collect2: error: ld returned 1 exit status
make: *** [qemu-nbd] Error 1
io/channel-tls.o: In function `qio_channel_tls_shutdown':
/tmp/qemu-test/src/io/channel-tls.c:373: undefined reference to `qcrypto_tls_session_shutdown'
/tmp/qemu-test/src/io/channel-tls.c:376: undefined reference to `qcrypto_tls_session_shutdown'
collect2: error: ld returned 1 exit status
make: *** [qemu-io] Error 1
io/channel-tls.o: In function `qio_channel_tls_shutdown':
/tmp/qemu-test/src/io/channel-tls.c:373: undefined reference to `qcrypto_tls_session_shutdown'
/tmp/qemu-test/src/io/channel-tls.c:376: undefined reference to `qcrypto_tls_session_shutdown'
collect2: error: ld returned 1 exit status
make: *** [qemu-storage-daemon] Error 1
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in 
sys.exit(main())
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=89d889bd9dd84a6592e526b967f6a8cc', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-xtkcxxmy/src/docker-src.2020-03-27-14.42.04.29238:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=89d889bd9dd84a6592e526b967f6a8cc
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-xtkcxxmy/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    2m13.487s
user    0m8.419s


The full log is available at
http://patchew.org/logs/20200327161936.2225989-1-ebl...@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v3] qcow2: Forbid discard in qcow2 v2 images with backing files

2020-03-27 Thread Alberto Garcia
On Fri 27 Mar 2020 07:13:04 PM CET, Eric Blake wrote:
>> +for qcow2_compat in 0.10 1.1; do
>> +echo "# Create an image with compat=$qcow2_compat without a backing 
>> file"
>> +_make_test_img -o "compat=$qcow2_compat" 128k
>> +
>> +echo "# Fill all clusters with data and then discard them"
>> +$QEMU_IO -c 'write -P 0x01 0 128k' "$TEST_IMG" | _filter_qemu_io
>> +$QEMU_IO -c 'discard 0 128k' "$TEST_IMG" | _filter_qemu_io
>> +
>> +echo "# Read the data from the discarded clusters"
>> +$QEMU_IO -c 'read -P 0x00 0 128k' "$TEST_IMG" | _filter_qemu_io
>> +done
>
> Should this loop also inspect qemu-img map output?

I guess we can, although here the image is completely unallocated in
both cases.

Berto



Re: [PATCH v3] qcow2: Forbid discard in qcow2 v2 images with backing files

2020-03-27 Thread Eric Blake

On 3/27/20 11:48 AM, Alberto Garcia wrote:

A discard request deallocates the selected clusters so they read back
as zeroes. This is done by clearing the cluster offset field and
setting QCOW_OFLAG_ZERO in the L2 entry.

This flag is however only supported when qcow_version >= 3. In older
images the cluster is simply deallocated, exposing any possible stale
data from the backing file.

Since discard is an advisory operation it's safer to simply forbid it
in this scenario.

Note that we are adding this check to qcow2_co_pdiscard() and not to
qcow2_cluster_discard() or discard_in_l2_slice() because the last
two are also used by qcow2_snapshot_create() to discard the clusters
used by the VM state. In this case there's no risk of exposing stale
data to the guest and we really want the clusters to always be
discarded.

Signed-off-by: Alberto Garcia 
---



+++ b/tests/qemu-iotests/290



+
+echo
+echo "### Test 'qemu-io -c discard' on a QCOW2 image without a backing file"
+echo
+for qcow2_compat in 0.10 1.1; do
+echo "# Create an image with compat=$qcow2_compat without a backing file"
+_make_test_img -o "compat=$qcow2_compat" 128k
+
+echo "# Fill all clusters with data and then discard them"
+$QEMU_IO -c 'write -P 0x01 0 128k' "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c 'discard 0 128k' "$TEST_IMG" | _filter_qemu_io
+
+echo "# Read the data from the discarded clusters"
+$QEMU_IO -c 'read -P 0x00 0 128k' "$TEST_IMG" | _filter_qemu_io
+done


Should this loop also inspect qemu-img map output?


+
+echo
+echo "### Test 'qemu-io -c discard' on a QCOW2 image with a backing file"
+echo
+
+echo "# Create a backing image and fill it with data"
+BACKING_IMG="$TEST_IMG.base"
+TEST_IMG="$BACKING_IMG" _make_test_img 128k
+$QEMU_IO -c 'write -P 0xff 0 128k' "$BACKING_IMG" | _filter_qemu_io
+
+for qcow2_compat in 0.10 1.1; do
+echo "# Create an image with compat=$qcow2_compat and a backing file"
+_make_test_img -o "compat=$qcow2_compat" -b "$BACKING_IMG"
+
+echo "# Fill all clusters with data and then discard them"
+$QEMU_IO -c 'write -P 0x01 0 128k' "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c 'discard 0 128k' "$TEST_IMG" | _filter_qemu_io
+
+echo "# Read the data from the discarded clusters"
+if [ "$qcow2_compat" = "1.1" ]; then
+# In qcow2 v3 clusters are zeroed (with QCOW_OFLAG_ZERO)
+$QEMU_IO -c 'read -P 0x00 0 128k' "$TEST_IMG" | _filter_qemu_io
+else
+# In qcow2 v2 if there's a backing image we cannot zero the clusters
+# without exposing the backing file data so discard does nothing
+$QEMU_IO -c 'read -P 0x01 0 128k' "$TEST_IMG" | _filter_qemu_io
+fi
+
+echo "# Output of qemu-img map"
+$QEMU_IMG map "$TEST_IMG" | _filter_testdir
+done


But I agree this was the more interesting one, so we at least have 
decent coverage of the change itself.


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 3/3] nbd: Use shutdown(SHUT_WR) after last item sent

2020-03-27 Thread Daniel P . Berrangé
On Fri, Mar 27, 2020 at 12:42:21PM -0500, Eric Blake wrote:
> On 3/27/20 11:35 AM, Daniel P. Berrangé wrote:
> > On Fri, Mar 27, 2020 at 11:19:36AM -0500, Eric Blake wrote:
> > > Although the remote end should always be tolerant of a socket being
> > > arbitrarily closed, there are situations where it is a lot easier if
> > > the remote end can be guaranteed to read EOF even before the socket
> > > has closed.  In particular, when using gnutls, if we fail to inform
> > > the remote end about an impending teardown, the remote end cannot
> > > distinguish between our closing the socket as intended vs. a malicious
> > > intermediary interrupting things, and may result in spurious error
> > > messages.
> > 
> > Does this actually matter in the NBD case ?
> > 
> > It has an explicit NBD command for requesting shutdown, and once
> > that's processed, it is fine to just close the socket abruptly - I
> > don't see a benefit to a TLS shutdown sequence on top.
> 
> You're right that the NBD protocol has ways for the client to advertise it
> will be shutting down, AND documents that the server must be robust to
> clients that just abruptly disconnect after that point.  But we don't have
> control over all such servers, and there may very well be a server that logs
> an error on abrupt closure, where it would be silent if we did a proper
> gnutls_bye.  Which is more important: maximum speed in disconnecting after
> we expressed intent, or maximum attempt at catering to all sorts of remote
> implementations that might not be as tolerant as qemu is of an abrupt
> termination?

It is the cost / benefit tradeoff here that matters. Correctly using
gnutls_bye(), in contexts which aren't expected to block is non-trivial
bringing notable extra code complexity. It isn't an obvious win to me
for something that just changes an error message for a scenario that
can already be cleanly handled at the application protocol level.

> 
> > AFAIK, the TLS level clean shutdown is only required if the
> > application protocol does not have any way to determine an
> > unexpected shutdown itself.
> 
> 'man gnutls_bye' states:
> 
>     Note that not all implementations will properly terminate a TLS
>     connection.  Some of them, usually for performance reasons, will
>     terminate only the underlying transport layer, and thus not
>     distinguishing between a malicious party prematurely terminating
>     the connection and normal termination.
> 
> You're right that because the protocol has an explicit message, we can
> reliably distinguish any early termination prior to
> NBD_OPT_ABORT/NBD_CMD_DISC as being malicious, so the only case where it
> matters is if we have a premature termination after we asked for clean
> shutdown, at which point a malicious termination didn't lose any data. So on
> that front, I guess you are right that not using gnutls_bye isn't going to
> have much impact.
> 
> > 
> > This is relevant for HTTP where the connection data stream may not
> > have a well defined end condition.
> > 
> > In the NBD case though, we have an explicit NBD_CMD_DISC to trigger
> > the disconnect. After processing that message, an EOF is acceptable
> > regardless of whether ,
> > before processing that message, any EOF is unexpected.
> > 
> > >Or, we can end up with a deadlock where both ends are stuck
> > > on a read() from the other end but neither gets an EOF.
> > 
> > If the socket has been closed abruptly why would it get stuck in
> > read() - it should see EOF surely ?
> 
> That's what I'm trying to figure out: the nbdkit testsuite definitely hung
> even though 'qemu-nbd --list' exited, but I haven't yet figured out whether
> the bug lies in nbdkit proper or in libnbd, nor whether a cleaner tls
> shutdown would have prevented the hang in a more reliable manner.
> https://www.redhat.com/archives/libguestfs/2020-March/msg00191.html


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 2/3] io: Support shutdown of TLS channel

2020-03-27 Thread Daniel P . Berrangé
On Fri, Mar 27, 2020 at 12:29:39PM -0500, Eric Blake wrote:
> On 3/27/20 11:40 AM, Daniel P. Berrangé wrote:
> > On Fri, Mar 27, 2020 at 11:19:35AM -0500, Eric Blake wrote:
> > > Gnutls documents that while many apps simply yank out the underlying
> > > transport at the end of communication in the name of efficiency, this
> > > is indistinguishable from a malicious actor terminating the connection
> > > prematurely.  Since our channel I/O code already supports the notion of
> > > a graceful shutdown request, it is time to plumb that through to the
> > > TLS layer, and wait for TLS to give the all clear before then
> > > terminating traffic on the underlying channel.
> > > 
> > > Note that channel-tls now always advertises shutdown support,
> > > regardless of whether the underlying channel also has that support.
> > > 
> > > Signed-off-by: Eric Blake 
> > > ---
> > >   io/channel-tls.c | 27 ++-
> > >   1 file changed, 26 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/io/channel-tls.c b/io/channel-tls.c
> > > index 7ec8ceff2f01..f90905823e1d 100644
> > > --- a/io/channel-tls.c
> > > +++ b/io/channel-tls.c
> > > @@ -360,10 +360,35 @@ static int qio_channel_tls_shutdown(QIOChannel *ioc,
> > >   Error **errp)
> > >   {
> > >   QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);
> > > +int ret = 0;
> > > 
> > >   tioc->shutdown |= how;
> > > 
> > > -return qio_channel_shutdown(tioc->master, how, errp);
> > > +do {
> > > +switch (how) {
> > > +case QIO_CHANNEL_SHUTDOWN_READ:
> > > +/* No TLS counterpart */
> > > +break;
> > > +case QIO_CHANNEL_SHUTDOWN_WRITE:
> > > +ret = qcrypto_tls_session_shutdown(tioc->session, 
> > > QCRYPTO_SHUT_WR);
> > > +break;
> > > +case QIO_CHANNEL_SHUTDOWN_BOTH:
> > > +ret = qcrypto_tls_session_shutdown(tioc->session,
> > > +   QCRYPTO_SHUT_RDWR);
> > > +break;
> > > +default:
> > > +abort();
> > > +}
> > > +} while (ret == -EAGAIN);
> > 
> > I don't think it is acceptable to do this loop here. The gnutls_bye()
> > function triggers several I/O operations which could block. Looping
> > like this means we busy-wait, blocking this thread for as long as I/O
> > is blocking on the socket.
> 
> Hmm, good point.  Should we give qio_channel_tls_shutdown a bool parameter
> that says whether it should wait (good for use when we are being run in a
> coroutine and can deal with the additional I/O) or just fail with -EAGAIN
> (which the caller can ignore if it is not worried)?

A bool would not be sufficient, because the caller would need to know
which direction to wait for I/O on.

Looking at the code it does a flush of outstanding data, then sends
some bytes, and then receives some bytes. The write will probably
work most of the time, but the receive is almost always going to
return -EAGAIN. So I don't think that failing with EAGAIN is going
to be much of a benefit.
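
For reference, a minimal sketch of what a blocking caller has to do with the public gnutls API (plain poll() here; in QEMU this would instead have to go through the main loop or yield the current coroutine):

#include <errno.h>
#include <poll.h>
#include <gnutls/gnutls.h>

/* Sketch only: drive gnutls_bye() to completion without busy-waiting.
 * 'fd' is the socket underneath the TLS session. */
static int tls_bye_sketch(gnutls_session_t session, int fd)
{
    int ret;

    while ((ret = gnutls_bye(session, GNUTLS_SHUT_WR)) < 0) {
        if (ret != GNUTLS_E_AGAIN && ret != GNUTLS_E_INTERRUPTED) {
            return ret;                            /* fatal TLS error */
        }
        /* gnutls reports which direction it is blocked on:
         * 0 means it needs to read, 1 means it needs to write. */
        struct pollfd pfd = {
            .fd = fd,
            .events = gnutls_record_get_direction(session) ? POLLOUT : POLLIN,
        };
        if (poll(&pfd, 1, -1) < 0 && errno != EINTR) {
            return GNUTLS_E_PUSH_ERROR;
        }
    }
    return 0;
}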

> > If we must call gnutls_bye(), then it needs to be done in a way that
> > can integrate with the main loop so it poll()'s / unblocks the current
> > coroutine/thread.  This makes the whole thing significantly more
> > complex to deal with, especially if the shutdown is being done in
> > cleanup paths which ordinarily are expected to execute without
> > blocking on I/O.  This is the big reason why i never made any attempt
> > to use gnutls_bye().
> 
> We _are_ using gnutls_bye(GNUTLS_SHUT_RDWR) on the close() path (which is

Are you sure ?  AFAIK there is no existing usage of gnutls_bye() at all
in QEMU.

> indeed a cleanup path where not blocking is worthwhile) even without this
> patch, but the question is whether using gnutls_bye(GNUTLS_SHUT_WR) in the
> normal data path, where we are already using coroutines to manage callbacks,
> can benefit the remote endpoint, giving them a chance to see clean shutdown
> instead of abrupt termination.

I'm not convinced the clean shutdown at the TLS level does anything important
given that we already have done a clean shutdown at the NBD level.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 3/3] nbd: Use shutdown(SHUT_WR) after last item sent

2020-03-27 Thread Eric Blake

On 3/27/20 11:35 AM, Daniel P. Berrangé wrote:

On Fri, Mar 27, 2020 at 11:19:36AM -0500, Eric Blake wrote:

Although the remote end should always be tolerant of a socket being
arbitrarily closed, there are situations where it is a lot easier if
the remote end can be guaranteed to read EOF even before the socket
has closed.  In particular, when using gnutls, if we fail to inform
the remote end about an impending teardown, the remote end cannot
distinguish between our closing the socket as intended vs. a malicious
intermediary interrupting things, and may result in spurious error
messages.


Does this actually matter in the NBD case ?

It has an explicit NBD command for requesting shutdown, and once
that's processed, it is fine to just close the socket abruptly - I
don't see a benefit to a TLS shutdown sequence on top.


You're right that the NBD protocol has ways for the client to advertise 
it will be shutting down, AND documents that the server must be robust 
to clients that just abruptly disconnect after that point.  But we don't 
have control over all such servers, and there may very well be a server 
that logs an error on abrupt closure, where it would be silent if we did 
a proper gnutls_bye.  Which is more important: maximum speed in 
disconnecting after we expressed intent, or maximum attempt at catering 
to all sorts of remote implementations that might not be as tolerant as 
qemu is of an abrupt termination?



AFAIK, the TLS level clean shutdown is only required if the
application protocol does not have any way to determine an
unexpected shutdown itself.


'man gnutls_bye' states:

    Note that not all implementations will properly terminate a TLS
    connection.  Some of them, usually for performance reasons, will
    terminate only the underlying transport layer, and thus not
    distinguishing between a malicious party prematurely terminating
    the connection and normal termination.

You're right that because the protocol has an explicit message, we can 
reliably distinguish any early termination prior to 
NBD_OPT_ABORT/NBD_CMD_DISC as being malicious, so the only case where it 
matters is if we have a premature termination after we asked for clean 
shutdown, at which point a malicious termination didn't lose any data. 
So on that front, I guess you are right that not using gnutls_bye isn't 
going to have much impact.




This is relevant for HTTP where the connection data stream may not
have a well defined end condition.

In the NBD case though, we have an explicit NBD_CMD_DISC to trigger
the disconnect. After processing that message, an EOF is acceptable
regardless of whether ,
before processing that message, any EOF is unexpected.


   Or, we can end up with a deadlock where both ends are stuck
on a read() from the other end but neither gets an EOF.


If the socket has been closed abruptly why would it get stuck in
read() - it should see EOF surely ?


That's what I'm trying to figure out: the nbdkit testsuite definitely 
hung even though 'qemu-nbd --list' exited, but I haven't yet figured out 
whether the bug lies in nbdkit proper or in libnbd, nor whether a 
cleaner tls shutdown would have prevented the hang in a more reliable 
manner. https://www.redhat.com/archives/libguestfs/2020-March/msg00191.html


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PULL 0/6] Block layer patches

2020-03-27 Thread Peter Maydell
On Fri, 27 Mar 2020 at 15:20, Kevin Wolf  wrote:
>
> The following changes since commit cfe68ae025f704f336d7dd3d1903ce37b445831d:
>
>   Merge remote-tracking branch 
> 'remotes/vivier2/tags/linux-user-for-5.0-pull-request' into staging 
> (2020-03-26 20:55:54 +)
>
> are available in the Git repository at:
>
>   git://repo.or.cz/qemu/kevin.git tags/for-upstream
>
> for you to fetch changes up to df74b1d3dff80983e7a30db1346a4a05847d65fa:
>
>   qcow2: Remove unused fields from BDRVQcow2State (2020-03-27 14:47:23 +0100)
>
> 
> Block layer patches:
>
> - Fix another case of mirror block job deadlocks
> - Minor fixes


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.0
for any user-visible changes.

-- PMM



Re: [PATCH 2/3] io: Support shutdown of TLS channel

2020-03-27 Thread Eric Blake

On 3/27/20 11:40 AM, Daniel P. Berrangé wrote:

On Fri, Mar 27, 2020 at 11:19:35AM -0500, Eric Blake wrote:

Gnutls documents that while many apps simply yank out the underlying
transport at the end of communication in the name of efficiency, this
is indistinguishable from a malicious actor terminating the connection
prematurely.  Since our channel I/O code already supports the notion of
a graceful shutdown request, it is time to plumb that through to the
TLS layer, and wait for TLS to give the all clear before then
terminating traffic on the underlying channel.

Note that channel-tls now always advertises shutdown support,
regardless of whether the underlying channel also has that support.

Signed-off-by: Eric Blake 
---
  io/channel-tls.c | 27 ++-
  1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/io/channel-tls.c b/io/channel-tls.c
index 7ec8ceff2f01..f90905823e1d 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -360,10 +360,35 @@ static int qio_channel_tls_shutdown(QIOChannel *ioc,
  Error **errp)
  {
  QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);
+int ret = 0;

  tioc->shutdown |= how;

-return qio_channel_shutdown(tioc->master, how, errp);
+do {
+switch (how) {
+case QIO_CHANNEL_SHUTDOWN_READ:
+/* No TLS counterpart */
+break;
+case QIO_CHANNEL_SHUTDOWN_WRITE:
+ret = qcrypto_tls_session_shutdown(tioc->session, QCRYPTO_SHUT_WR);
+break;
+case QIO_CHANNEL_SHUTDOWN_BOTH:
+ret = qcrypto_tls_session_shutdown(tioc->session,
+   QCRYPTO_SHUT_RDWR);
+break;
+default:
+abort();
+}
+} while (ret == -EAGAIN);


I don't think it is acceptable to do this loop here. The gnutls_bye()
function triggers several I/O operations which could block. Looping
like this means we busy-wait, blocking this thread for as long as I/O
is blocking on the socket.


Hmm, good point.  Should we give qio_channel_tls_shutdown a bool 
parameter that says whether it should wait (good for use when we are 
being run in a coroutine and can deal with the additional I/O) or just 
fail with -EAGAIN (which the caller can ignore if it is not worried)?




If we must call gnutls_bye(), then it needs to be done in a way that
can integrate with the main loop so it poll()'s / unblocks the current
coroutine/thread.  This makes the whole thing significantly more
complex to deal with, especially if the shutdown is being done in
cleanup paths which ordinarily are expected to execute without
blocking on I/O.  This is the big reason why i never made any attempt
to use gnutls_bye().


We _are_ using gnutls_bye(GNUTLS_SHUT_RDWR) on the close() path (which 
is indeed a cleanup path where not blocking is worthwhile) even without 
this patch, but the question is whether using gnutls_bye(GNUTLS_SHUT_WR) 
in the normal data path, where we are already using coroutines to manage 
callbacks, can benefit the remote endpoint, giving them a chance to see 
clean shutdown instead of abrupt termination.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




[PATCH v3] qcow2: Forbid discard in qcow2 v2 images with backing files

2020-03-27 Thread Alberto Garcia
A discard request deallocates the selected clusters so they read back
as zeroes. This is done by clearing the cluster offset field and
setting QCOW_OFLAG_ZERO in the L2 entry.

This flag is however only supported when qcow_version >= 3. In older
images the cluster is simply deallocated, exposing any possible stale
data from the backing file.

Since discard is an advisory operation it's safer to simply forbid it
in this scenario.

Note that we are adding this check to qcow2_co_pdiscard() and not to
qcow2_cluster_discard() or discard_in_l2_slice() because the last
two are also used by qcow2_snapshot_create() to discard the clusters
used by the VM state. In this case there's no risk of exposing stale
data to the guest and we really want the clusters to always be
discarded.

Signed-off-by: Alberto Garcia 
---
v3:
- Rebase and change iotest number
- Show output of qemu-img map in iotest 290 [Kevin]
- Use the l2_offset and rb_offset variables in iotest 060

v2:

- Don't create the image with compat=0.10 in iotest 060 [Max]
- Use $TEST_IMG.base for the backing image name in iotest 289 [Max]
- Add list of unsupported options to iotest 289 [Max]

 block/qcow2.c  |  6 +++
 tests/qemu-iotests/060 | 12 ++---
 tests/qemu-iotests/060.out |  2 -
 tests/qemu-iotests/290 | 94 ++
 tests/qemu-iotests/290.out | 57 +++
 tests/qemu-iotests/group   |  1 +
 6 files changed, 163 insertions(+), 9 deletions(-)
 create mode 100755 tests/qemu-iotests/290
 create mode 100644 tests/qemu-iotests/290.out

diff --git a/block/qcow2.c b/block/qcow2.c
index 2bb536b014..e8cbcc1ec1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3784,6 +3784,12 @@ static coroutine_fn int qcow2_co_pdiscard(BlockDriverState *bs,
 int ret;
 BDRVQcow2State *s = bs->opaque;
 
+/* If the image does not support QCOW_OFLAG_ZERO then discarding
+ * clusters could expose stale data from the backing file. */
+if (s->qcow_version < 3 && bs->backing) {
+return -ENOTSUP;
+}
+
 if (!QEMU_IS_ALIGNED(offset | bytes, s->cluster_size)) {
 assert(bytes < s->cluster_size);
 /* Ignore partial clusters, except for the special case of the
diff --git a/tests/qemu-iotests/060 b/tests/qemu-iotests/060
index 043f12904a..32c0ecce9e 100755
--- a/tests/qemu-iotests/060
+++ b/tests/qemu-iotests/060
@@ -160,18 +160,16 @@ TEST_IMG=$BACKING_IMG _make_test_img 1G
 
 $QEMU_IO -c 'write 0k 64k' "$BACKING_IMG" | _filter_qemu_io
 
-# compat=0.10 is required in order to make the following discard actually
-# unallocate the sector rather than make it a zero sector - we want COW, after
-# all.
-_make_test_img -o 'compat=0.10' -b "$BACKING_IMG" 1G
+_make_test_img -b "$BACKING_IMG" 1G
 # Write two clusters, the second one enforces creation of an L2 table after
 # the first data cluster.
 $QEMU_IO -c 'write 0k 64k' -c 'write 512M 64k' "$TEST_IMG" | _filter_qemu_io
-# Discard the first cluster. This cluster will soon enough be reallocated and
+# Free the first cluster. This cluster will soon enough be reallocated and
 # used for COW.
-$QEMU_IO -c 'discard 0k 64k' "$TEST_IMG" | _filter_qemu_io
+poke_file "$TEST_IMG" "$l2_offset" "\x00\x00\x00\x00\x00\x00\x00\x00"
+poke_file "$TEST_IMG" "$(($rb_offset+10))" "\x00\x00"
 # Now, corrupt the image by marking the second L2 table cluster as free.
-poke_file "$TEST_IMG" '131084' "\x00\x00" # 0x2000c
+poke_file "$TEST_IMG" "$(($rb_offset+12))" "\x00\x00"
 # Start a write operation requiring COW on the image stopping it right before
# doing the read; then, trigger the corruption prevention by writing anything to
 # any unallocated cluster, leading to an attempt to overwrite the second L2
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index d27692a33c..09caaea865 100644
--- a/tests/qemu-iotests/060.out
+++ b/tests/qemu-iotests/060.out
@@ -105,8 +105,6 @@ wrote 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 65536/65536 bytes at offset 536870912
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-discard 65536/65536 bytes at offset 0
-64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
qcow2: Marking image as corrupt: Preventing invalid write on metadata (overlaps with active L2 table); further corruption events will be suppressed
 blkdebug: Suspended request '0'
 write failed: Input/output error
diff --git a/tests/qemu-iotests/290 b/tests/qemu-iotests/290
new file mode 100755
index 00..e41d642c7f
--- /dev/null
+++ b/tests/qemu-iotests/290
@@ -0,0 +1,94 @@
+#!/usr/bin/env bash
+#
+# Test how 'qemu-io -c discard' behaves on v2 and v3 qcow2 images
+#
+# Copyright (C) 2020 Igalia, S.L.
+# Author: Alberto Garcia 
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any 

Re: [PATCH 2/3] io: Support shutdown of TLS channel

2020-03-27 Thread Daniel P . Berrangé
On Fri, Mar 27, 2020 at 11:19:35AM -0500, Eric Blake wrote:
> Gnutls documents that while many apps simply yank out the underlying
> transport at the end of communication in the name of efficiency, this
> is indistinguishable from a malicious actor terminating the connection
> prematurely.  Since our channel I/O code already supports the notion of
> a graceful shutdown request, it is time to plumb that through to the
> TLS layer, and wait for TLS to give the all clear before then
> terminating traffic on the underlying channel.
> 
> Note that channel-tls now always advertises shutdown support,
> regardless of whether the underlying channel also has that support.
> 
> Signed-off-by: Eric Blake 
> ---
>  io/channel-tls.c | 27 ++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/io/channel-tls.c b/io/channel-tls.c
> index 7ec8ceff2f01..f90905823e1d 100644
> --- a/io/channel-tls.c
> +++ b/io/channel-tls.c
> @@ -360,10 +360,35 @@ static int qio_channel_tls_shutdown(QIOChannel *ioc,
>  Error **errp)
>  {
>  QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);
> +int ret = 0;
> 
>  tioc->shutdown |= how;
> 
> -return qio_channel_shutdown(tioc->master, how, errp);
> +do {
> +switch (how) {
> +case QIO_CHANNEL_SHUTDOWN_READ:
> +/* No TLS counterpart */
> +break;
> +case QIO_CHANNEL_SHUTDOWN_WRITE:
> +ret = qcrypto_tls_session_shutdown(tioc->session, 
> QCRYPTO_SHUT_WR);
> +break;
> +case QIO_CHANNEL_SHUTDOWN_BOTH:
> +ret = qcrypto_tls_session_shutdown(tioc->session,
> +   QCRYPTO_SHUT_RDWR);
> +break;
> +default:
> +abort();
> +}
> +} while (ret == -EAGAIN);

I don't think it is acceptable to do this loop here. The gnutls_bye()
function triggers several I/O operations which could block. Looping
like this means we busy-wait, blocking this thread for as long as I/O
is blocking on the socket.

If we must call gnutls_bye(), then it needs to be done in a way that
can integrate with the main loop so it poll()'s / unblocks the current
coroutine/thread.  This makes the whole thing significantly more
complex to deal with, especially if the shutdown is being done in
cleanup paths which ordinarily are expected to execute without
blocking on I/O.  This is the big reason why i never made any attempt
to use gnutls_bye().


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 3/3] nbd: Use shutdown(SHUT_WR) after last item sent

2020-03-27 Thread Daniel P . Berrangé
On Fri, Mar 27, 2020 at 11:19:36AM -0500, Eric Blake wrote:
> Although the remote end should always be tolerant of a socket being
> arbitrarily closed, there are situations where it is a lot easier if
> the remote end can be guaranteed to read EOF even before the socket
> has closed.  In particular, when using gnutls, if we fail to inform
> the remote end about an impending teardown, the remote end cannot
> distinguish between our closing the socket as intended vs. a malicious
> intermediary interrupting things, and may result in spurious error
> messages.

Does this actually matter in the NBD case ?

It has an explicit NBD command for requesting shutdown, and once
that's processed, it is fine to just close the socket abruptly - I
don't see a benefit to a TLS shutdown sequence on top.
AFAIK, the TLS level clean shutdown is only required if the
application protocol does not have any way to determine an
unexpected shutdown itself.

This is relevant for HTTP where the connection data stream may not
have a well defined end condition.

In the NBD case though, we have an explicit NBD_CMD_DISC to trigger
the disconnect. After processing that message, an EOF is acceptable
regardless of whether the TLS layer performed a clean shutdown;
before processing that message, any EOF is unexpected.

>   Or, we can end up with a deadlock where both ends are stuck
> on a read() from the other end but neither gets an EOF.

If the socket has been closed abruptly why would it get stuck in
read() - it should see EOF surely ?

>  Thus, after
> any time a client sends NBD_OPT_ABORT or NBD_CMD_DISC, or a server has
> finished replying (where appropriate) to such a request, it is worth
> informing the channel that we will not be transmitting anything else.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v2] qcow2: Forbid discard in qcow2 v2 images with backing files

2020-03-27 Thread Alberto Garcia
On Fri 27 Mar 2020 03:14:42 PM CET, Kevin Wolf wrote:
>> +echo "# Create a backing image and fill it with data"
>> +BACKING_IMG="$TEST_IMG.base"
>> +TEST_IMG="$BACKING_IMG" _make_test_img 128k
>> +$QEMU_IO -c 'write -P 0xff 0 128k' "$BACKING_IMG" | _filter_qemu_io
>> +
>> +for qcow2_compat in 0.10 1.1; do
>> +echo "# Create an image with compat=$qcow2_compat and a backing file"
>> +_make_test_img -o "compat=$qcow2_compat" -b "$BACKING_IMG"
>
> I would write some non-zero data to the backing file so that you can
> later distinguish zero clusters from unallocated clusters.

I'm doing that already (immediately before the 'for' loop).

I'll add the output of qemu-img map in v3.

Berto



[PATCH 2/3] io: Support shutdown of TLS channel

2020-03-27 Thread Eric Blake
Gnutls documents that while many apps simply yank out the underlying
transport at the end of communication in the name of efficiency, this
is indistinguishable from a malicious actor terminating the connection
prematurely.  Since our channel I/O code already supports the notion of
a graceful shutdown request, it is time to plumb that through to the
TLS layer, and wait for TLS to give the all clear before then
terminating traffic on the underlying channel.

Note that channel-tls now always advertises shutdown support,
regardless of whether the underlying channel also has that support.

Signed-off-by: Eric Blake 
---
 io/channel-tls.c | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/io/channel-tls.c b/io/channel-tls.c
index 7ec8ceff2f01..f90905823e1d 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -360,10 +360,35 @@ static int qio_channel_tls_shutdown(QIOChannel *ioc,
 Error **errp)
 {
 QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);
+int ret = 0;

 tioc->shutdown |= how;

-return qio_channel_shutdown(tioc->master, how, errp);
+do {
+switch (how) {
+case QIO_CHANNEL_SHUTDOWN_READ:
+/* No TLS counterpart */
+break;
+case QIO_CHANNEL_SHUTDOWN_WRITE:
+ret = qcrypto_tls_session_shutdown(tioc->session, QCRYPTO_SHUT_WR);
+break;
+case QIO_CHANNEL_SHUTDOWN_BOTH:
+ret = qcrypto_tls_session_shutdown(tioc->session,
+   QCRYPTO_SHUT_RDWR);
+break;
+default:
+abort();
+}
+} while (ret == -EAGAIN);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Cannot shut down TLS channel");
+return -1;
+}
+
+if (qio_channel_has_feature(tioc->master, QIO_CHANNEL_FEATURE_SHUTDOWN)) {
+return qio_channel_shutdown(tioc->master, how, errp);
+}
+return 0;
 }

 static int qio_channel_tls_close(QIOChannel *ioc,
-- 
2.26.0.rc2




[PATCH 1/3] crypto: Add qcrypto_tls_shutdown()

2020-03-27 Thread Eric Blake
Gnutls documents that applications that want to distinguish between a
clean end-of-communication and a malicious client abruptly tearing the
underlying transport out of under our feet need to use gnutls_bye().
Our channel code is already set up to allow shutdown requests, but we
weren't forwarding those to gnutls.  To make that work, we first need
a new entry point that can isolate the rest of our code from the
gnutls interface.

Signed-off-by: Eric Blake 
---
 qapi/crypto.json| 15 +++
 include/crypto/tlssession.h | 24 
 crypto/tlssession.c | 27 +++
 3 files changed, 66 insertions(+)

diff --git a/qapi/crypto.json b/qapi/crypto.json
index b2a4cff683ff..1df0f4502885 100644
--- a/qapi/crypto.json
+++ b/qapi/crypto.json
@@ -119,6 +119,21 @@
   'prefix': 'QCRYPTO_IVGEN_ALG',
   'data': ['plain', 'plain64', 'essiv']}

+##
+# @QCryptoShutdownMode:
+#
+# The supported modes for requesting shutdown of a crypto
+# communication channel.
+#
+# @shut-wr: No more writes will be sent, but the remote end can still send
+#   data to be read.
+# @shut-rdwr: No more reads or writes should occur.
+# Since: 5.1
+##
+{ 'enum': 'QCryptoShutdownMode',
+  'prefix': 'QCRYPTO',
+  'data': ['shut-wr', 'shut-rdwr']}
+
 ##
 # @QCryptoBlockFormat:
 #
diff --git a/include/crypto/tlssession.h b/include/crypto/tlssession.h
index 15b9cef086cc..10c670e3b6a2 100644
--- a/include/crypto/tlssession.h
+++ b/include/crypto/tlssession.h
@@ -321,4 +321,28 @@ int qcrypto_tls_session_get_key_size(QCryptoTLSSession 
*sess,
  */
 char *qcrypto_tls_session_get_peer_name(QCryptoTLSSession *sess);

+/**
+ * qcrypto_tls_session_shutdown:
+ * @sess: the TLS session object
+ * @how: the desired shutdown mode
+ *
+ * Prepare to terminate the session.  If @how is QCRYPTO_SHUT_WR, this
+ * side will no longer write data, but should still process reads
+ * until EOF; if @how is QCRYPTO_SHUT_RDWR, then the entire session
+ * should shut down.  Use of this function is optional, since closing
+ * the session implies QCRYPTO_SHUT_RDWR.  However, using this
+ * function prior to terminating the underlying transport layer makes
+ * it possible for the remote endpoint to distinguish between a
+ * malicious party prematurely terminating the connection and
+ * normal termination.
+ *
+ * This function should only be used after a successful
+ * qcrypto_tls_session_handshake().
+ *
+ * Returns: 0 for success, or -EAGAIN if more underlying I/O is
+ * required to finish proper session shutdown.
+ */
+int qcrypto_tls_session_shutdown(QCryptoTLSSession *sess,
+ QCryptoShutdownMode how);
+
 #endif /* QCRYPTO_TLSSESSION_H */
diff --git a/crypto/tlssession.c b/crypto/tlssession.c
index 33203e8ca711..903301189069 100644
--- a/crypto/tlssession.c
+++ b/crypto/tlssession.c
@@ -521,6 +521,33 @@ qcrypto_tls_session_get_handshake_status(QCryptoTLSSession 
*session)
 }


+int qcrypto_tls_session_shutdown(QCryptoTLSSession *session,
+ QCryptoShutdownMode how)
+{
+gnutls_close_request_t mode;
+int ret;
+
+assert(session->handshakeComplete);
+switch (how) {
+case QCRYPTO_SHUT_WR:
+mode = GNUTLS_SHUT_WR;
+break;
+case QCRYPTO_SHUT_RDWR:
+mode = GNUTLS_SHUT_RDWR;
+break;
+default:
+abort();
+}
+
+ret = gnutls_bye(session->handle, mode);
+if (ret == GNUTLS_E_INTERRUPTED ||
+ret == GNUTLS_E_AGAIN) {
+return -EAGAIN;
+}
+return 0;
+}
+
+
 int
 qcrypto_tls_session_get_key_size(QCryptoTLSSession *session,
  Error **errp)
-- 
2.26.0.rc2




[PATCH 0/3] nbd: Try for cleaner TLS shutdown

2020-03-27 Thread Eric Blake
I'm currently trying to debug a hang in 'nbdkit nbd' as a client when
TLS is in use, when dealing with three processes:
nbdkit example1[1] <=tls=> nbdkit nbd[2] <=plain=> qemu-nbd --list[3]

In theory, when process [3] exits, the server side of process [2] is
supposed to disconnect as client from process [1].  But we were
hitting sporadic hangs where process [2] was stuck in a poll() while
in gnutls cleanup code, which appears to be because it did not
recognize a clean shutdown from process [3].

I do not yet have a strong reproducible test case (which might include
adding strategic sleep()s in one-off compilations) to prove that the
problem is an unclean TLS shutdown, but at the same time, the gnutls
documentation is clear that without proper use of gnutls_bye(), a
client cannot distinguish between clean end of communications and a
malicious interruption in service.  Shutting down the write side of a
bi-directional socket as soon as possible makes it easier for a client
to begin its own shutdown actions, and can help in avoiding deadlocks
where both sides of a connection are waiting for the other side to
close() the socket first.  If I can come up with a reliable test case,
this series may be a candidate for 5.0; but for now, I'm only
proposing it for 5.1 (we've had more than one release where qemu was
not exploiting the full power of gnutls_bye(), and even if that
triggers poor interaction with other endpoints such as nbdkit, going
one more release with the same behavior isn't making things worse).

Eric Blake (3):
  crypto: Add qcrypto_tls_shutdown()
  io: Support shutdown of TLS channel
  nbd: Use shutdown(SHUT_WR) after last item sent

 qapi/crypto.json| 15 +++
 include/crypto/tlssession.h | 24 
 block/nbd.c |  1 +
 crypto/tlssession.c | 27 +++
 io/channel-tls.c| 27 ++-
 nbd/client.c|  3 ++-
 nbd/server.c|  4 
 7 files changed, 99 insertions(+), 2 deletions(-)

-- 
2.26.0.rc2




[PATCH 3/3] nbd: Use shutdown(SHUT_WR) after last item sent

2020-03-27 Thread Eric Blake
Although the remote end should always be tolerant of a socket being
arbitrarily closed, there are situations where it is a lot easier if
the remote end can be guaranteed to read EOF even before the socket
has closed.  In particular, when using gnutls, if we fail to inform
the remote end about an impending teardown, the remote end cannot
distinguish between our closing the socket as intended vs. a malicious
intermediary interrupting things, and may result in spurious error
messages.  Or, we can end up with a deadlock where both ends are stuck
on a read() from the other end but neither gets an EOF.  Thus, after
any time a client sends NBD_OPT_ABORT or NBD_CMD_DISC, or a server has
finished replying (where appropriate) to such a request, it is worth
informing the channel that we will not be transmitting anything else.

Signed-off-by: Eric Blake 
---
 block/nbd.c  | 1 +
 nbd/client.c | 3 ++-
 nbd/server.c | 4 
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/block/nbd.c b/block/nbd.c
index 2160859f6499..2906484390f9 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1402,6 +1402,7 @@ static void nbd_client_close(BlockDriverState *bs)

 if (s->ioc) {
 nbd_send_request(s->ioc, );
+qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_WRITE, NULL);
 }

 nbd_teardown_connection(bs);
diff --git a/nbd/client.c b/nbd/client.c
index ba173108baab..1b8b3a9ae3bd 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1,5 +1,5 @@
 /*
- *  Copyright (C) 2016-2019 Red Hat, Inc.
+ *  Copyright (C) 2016-2020 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori 
  *
  *  Network Block Device Client Side
@@ -103,6 +103,7 @@ static void nbd_send_opt_abort(QIOChannel *ioc)
  * even care if the request makes it to the server, let alone
  * waiting around for whether the server replies. */
 nbd_send_option_request(ioc, NBD_OPT_ABORT, 0, NULL, NULL);
+qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_WRITE, NULL);
 }


diff --git a/nbd/server.c b/nbd/server.c
index 02b1ed080145..e21a1f662cc2 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1168,6 +1168,8 @@ static int nbd_negotiate_options(NBDClient *client, Error 
**errp)
"Option 0x%" PRIx32
" not permitted before TLS", option);
 if (option == NBD_OPT_ABORT) {
+qio_channel_shutdown(client->ioc,
+ QIO_CHANNEL_SHUTDOWN_WRITE, NULL);
 return 1;
 }
 break;
@@ -1187,6 +1189,8 @@ static int nbd_negotiate_options(NBDClient *client, Error 
**errp)
  * disconnecting, but that we must also tolerate
  * guests that don't wait for our reply. */
 nbd_negotiate_send_rep(client, NBD_REP_ACK, NULL);
+qio_channel_shutdown(client->ioc,
+ QIO_CHANNEL_SHUTDOWN_WRITE, NULL);
 return 1;

 case NBD_OPT_EXPORT_NAME:
-- 
2.26.0.rc2




[PULL 1/6] block/iscsi:use the flags in iscsi_open() prevent Clang warning

2020-03-27 Thread Kevin Wolf
From: Chen Qun 

Clang static code analyzer show warning:
  block/iscsi.c:1920:9: warning: Value stored to 'flags' is never read
flags &= ~BDRV_O_RDWR;
^

iscsi_allocmap_init() only checks BDRV_O_NOCACHE, which
is the same in both flags and bs->open_flags.
We can therefore use flags instead of bs->open_flags to prevent the Clang warning.

Reported-by: Euler Robot 
Signed-off-by: Chen Qun 
Reviewed-by: Kevin Wolf 
Message-Id: <20200311032927.35092-1-kuhn.chen...@huawei.com>
Signed-off-by: Kevin Wolf 
---
 block/iscsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 14680a436a..4e216bd8aa 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -2002,7 +2002,7 @@ static int iscsi_open(BlockDriverState *bs, QDict 
*options, int flags,
 iscsilun->cluster_size = iscsilun->bl.opt_unmap_gran *
 iscsilun->block_size;
 if (iscsilun->lbprz) {
-ret = iscsi_allocmap_init(iscsilun, bs->open_flags);
+ret = iscsi_allocmap_init(iscsilun, flags);
 }
 }
 
-- 
2.20.1




[PULL 4/6] Revert "mirror: Don't let an operation wait for itself"

2020-03-27 Thread Kevin Wolf
This reverts commit 7e6c4ff792734e196c8ca82564c56b5e7c6288ca.

The fix was incomplete as it only protected against requests waiting for
themselves, but not against requests waiting for each other. We need a
different solution.

Signed-off-by: Kevin Wolf 
Message-Id: <20200326153628.4869-2-kw...@redhat.com>
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 block/mirror.c | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 6203e5946e..5879e63473 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -283,14 +283,11 @@ static int mirror_cow_align(MirrorBlockJob *s, int64_t 
*offset,
 }
 
 static inline void coroutine_fn
-mirror_wait_for_any_operation(MirrorBlockJob *s, MirrorOp *self, bool active)
+mirror_wait_for_any_operation(MirrorBlockJob *s, bool active)
 {
 MirrorOp *op;
 
 QTAILQ_FOREACH(op, >ops_in_flight, next) {
-if (self == op) {
-continue;
-}
 /* Do not wait on pseudo ops, because it may in turn wait on
  * some other operation to start, which may in fact be the
  * caller of this function.  Since there is only one pseudo op
@@ -305,10 +302,10 @@ mirror_wait_for_any_operation(MirrorBlockJob *s, MirrorOp 
*self, bool active)
 }
 
 static inline void coroutine_fn
-mirror_wait_for_free_in_flight_slot(MirrorBlockJob *s, MirrorOp *self)
+mirror_wait_for_free_in_flight_slot(MirrorBlockJob *s)
 {
 /* Only non-active operations use up in-flight slots */
-mirror_wait_for_any_operation(s, self, false);
+mirror_wait_for_any_operation(s, false);
 }
 
 /* Perform a mirror copy operation.
@@ -351,7 +348,7 @@ static void coroutine_fn mirror_co_read(void *opaque)
 
 while (s->buf_free_count < nb_chunks) {
 trace_mirror_yield_in_flight(s, op->offset, s->in_flight);
-mirror_wait_for_free_in_flight_slot(s, op);
+mirror_wait_for_free_in_flight_slot(s);
 }
 
 /* Now make a QEMUIOVector taking enough granularity-sized chunks
@@ -558,7 +555,7 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
 
 while (s->in_flight >= MAX_IN_FLIGHT) {
 trace_mirror_yield_in_flight(s, offset, s->in_flight);
-mirror_wait_for_free_in_flight_slot(s, pseudo_op);
+mirror_wait_for_free_in_flight_slot(s);
 }
 
 if (s->ret < 0) {
@@ -612,7 +609,7 @@ static void mirror_free_init(MirrorBlockJob *s)
 static void coroutine_fn mirror_wait_for_all_io(MirrorBlockJob *s)
 {
 while (s->in_flight > 0) {
-mirror_wait_for_free_in_flight_slot(s, NULL);
+mirror_wait_for_free_in_flight_slot(s);
 }
 }
 
@@ -810,7 +807,7 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
 if (s->in_flight >= MAX_IN_FLIGHT) {
 trace_mirror_yield(s, UINT64_MAX, s->buf_free_count,
s->in_flight);
-mirror_wait_for_free_in_flight_slot(s, NULL);
+mirror_wait_for_free_in_flight_slot(s);
 continue;
 }
 
@@ -963,7 +960,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
 /* Do not start passive operations while there are active
  * writes in progress */
 while (s->in_active_write_counter) {
-mirror_wait_for_any_operation(s, NULL, true);
+mirror_wait_for_any_operation(s, true);
 }
 
 if (s->ret < 0) {
@@ -989,7 +986,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
 if (s->in_flight >= MAX_IN_FLIGHT || s->buf_free_count == 0 ||
 (cnt == 0 && s->in_flight > 0)) {
 trace_mirror_yield(s, cnt, s->buf_free_count, s->in_flight);
-mirror_wait_for_free_in_flight_slot(s, NULL);
+mirror_wait_for_free_in_flight_slot(s);
 continue;
 } else if (cnt != 0) {
 delay_ns = mirror_iteration(s);
-- 
2.20.1




[PULL 5/6] mirror: Wait only for in-flight operations

2020-03-27 Thread Kevin Wolf
mirror_wait_for_free_in_flight_slot() just picks a random operation to
wait for. However, a MirrorOp is already in s->ops_in_flight when
mirror_co_read() waits for free slots, so if not enough slots are
immediately available, an operation can end up waiting for itself, or
two or more operations can wait for each other to complete, which
results in a hang.

Fix this by adding a flag to MirrorOp that tells us if the request is
already in flight (and therefore occupies slots that it will later
free), and picking only such operations for waiting.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1794692
Signed-off-by: Kevin Wolf 
Message-Id: <20200326153628.4869-3-kw...@redhat.com>
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 block/mirror.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index 5879e63473..c26fd9260d 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -102,6 +102,7 @@ struct MirrorOp {
 
 bool is_pseudo_op;
 bool is_active_write;
+bool is_in_flight;
 CoQueue waiting_requests;
 Coroutine *co;
 
@@ -293,7 +294,9 @@ mirror_wait_for_any_operation(MirrorBlockJob *s, bool 
active)
  * caller of this function.  Since there is only one pseudo op
  * at any given time, we will always find some real operation
  * to wait on. */
-if (!op->is_pseudo_op && op->is_active_write == active) {
+if (!op->is_pseudo_op && op->is_in_flight &&
+op->is_active_write == active)
+{
 qemu_co_queue_wait(>waiting_requests, NULL);
 return;
 }
@@ -367,6 +370,7 @@ static void coroutine_fn mirror_co_read(void *opaque)
 /* Copy the dirty cluster.  */
 s->in_flight++;
 s->bytes_in_flight += op->bytes;
+op->is_in_flight = true;
 trace_mirror_one_iteration(s, op->offset, op->bytes);
 
 ret = bdrv_co_preadv(s->mirror_top_bs->backing, op->offset, op->bytes,
@@ -382,6 +386,7 @@ static void coroutine_fn mirror_co_zero(void *opaque)
 op->s->in_flight++;
 op->s->bytes_in_flight += op->bytes;
 *op->bytes_handled = op->bytes;
+op->is_in_flight = true;
 
 ret = blk_co_pwrite_zeroes(op->s->target, op->offset, op->bytes,
op->s->unmap ? BDRV_REQ_MAY_UNMAP : 0);
@@ -396,6 +401,7 @@ static void coroutine_fn mirror_co_discard(void *opaque)
 op->s->in_flight++;
 op->s->bytes_in_flight += op->bytes;
 *op->bytes_handled = op->bytes;
+op->is_in_flight = true;
 
 ret = blk_co_pdiscard(op->s->target, op->offset, op->bytes);
 mirror_write_complete(op, ret);
@@ -1319,6 +1325,7 @@ static MirrorOp *coroutine_fn 
active_write_prepare(MirrorBlockJob *s,
 .offset = offset,
 .bytes  = bytes,
 .is_active_write= true,
+.is_in_flight   = true,
 };
 qemu_co_queue_init(>waiting_requests);
 QTAILQ_INSERT_TAIL(>ops_in_flight, op, next);
-- 
2.20.1




[PULL 6/6] qcow2: Remove unused fields from BDRVQcow2State

2020-03-27 Thread Kevin Wolf
These fields were already removed in commit c3c10f72, but then commit
b58deb34 revived them probably due to bad merge conflict resolution.
They are still unused, so remove them again.

Fixes: b58deb344ddff3b9d8b265bf73a65274767ee5f4
Signed-off-by: Kevin Wolf 
Message-Id: <20200326170757.12344-1-kw...@redhat.com>
Reviewed-by: Eric Blake 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Kevin Wolf 
---
 block/qcow2.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 0942126232..f4de0a27d5 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -301,9 +301,6 @@ typedef struct BDRVQcow2State {
 QEMUTimer *cache_clean_timer;
 unsigned cache_clean_interval;
 
-uint8_t *cluster_cache;
-uint8_t *cluster_data;
-uint64_t cluster_cache_offset;
 QLIST_HEAD(, QCowL2Meta) cluster_allocs;
 
 uint64_t *refcount_table;
-- 
2.20.1




[PULL 0/6] Block layer patches

2020-03-27 Thread Kevin Wolf
The following changes since commit cfe68ae025f704f336d7dd3d1903ce37b445831d:

  Merge remote-tracking branch 
'remotes/vivier2/tags/linux-user-for-5.0-pull-request' into staging (2020-03-26 
20:55:54 +)

are available in the Git repository at:

  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to df74b1d3dff80983e7a30db1346a4a05847d65fa:

  qcow2: Remove unused fields from BDRVQcow2State (2020-03-27 14:47:23 +0100)


Block layer patches:

- Fix another case of mirror block job deadlocks
- Minor fixes


Chen Qun (1):
  block/iscsi:use the flags in iscsi_open() prevent Clang warning

Kevin Wolf (3):
  Revert "mirror: Don't let an operation wait for itself"
  mirror: Wait only for in-flight operations
  qcow2: Remove unused fields from BDRVQcow2State

Minwoo Im (1):
  nvme: Print 'cqid' for nvme_del_cq

Vladimir Sementsov-Ogievskiy (1):
  block: fix bdrv_root_attach_child forget to unref child_bs

 block/qcow2.h |  3 ---
 block.c   |  1 +
 block/iscsi.c |  2 +-
 block/mirror.c| 30 +-
 hw/block/trace-events |  2 +-
 5 files changed, 20 insertions(+), 18 deletions(-)




[PULL 2/6] block: fix bdrv_root_attach_child forget to unref child_bs

2020-03-27 Thread Kevin Wolf
From: Vladimir Sementsov-Ogievskiy 

bdrv_root_attach_child() promises to drop the child_bs reference on failure.
It does so on the first handled failure path, but not on the second. Fix
that.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20200324155921.23822-1-vsement...@virtuozzo.com>
Signed-off-by: Kevin Wolf 
---
 block.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block.c b/block.c
index af3faf664e..2e3905c99e 100644
--- a/block.c
+++ b/block.c
@@ -2617,6 +2617,7 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState 
*child_bs,
 error_propagate(errp, local_err);
 g_free(child);
 bdrv_abort_perm_update(child_bs);
+bdrv_unref(child_bs);
 return NULL;
 }
 }
-- 
2.20.1




[PULL 3/6] nvme: Print 'cqid' for nvme_del_cq

2020-03-27 Thread Kevin Wolf
From: Minwoo Im 

The given argument for this trace should be cqid, not sqid.

Signed-off-by: Minwoo Im 
Message-Id: <20200324140646.8274-1-minwoo.im@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Stefano Garzarella 
Signed-off-by: Kevin Wolf 
---
 hw/block/trace-events | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/trace-events b/hw/block/trace-events
index f78939fa9d..bf6d11b58b 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -37,7 +37,7 @@ nvme_rw(const char *verb, uint32_t blk_count, uint64_t 
byte_count, uint64_t lba)
 nvme_create_sq(uint64_t addr, uint16_t sqid, uint16_t cqid, uint16_t qsize, 
uint16_t qflags) "create submission queue, addr=0x%"PRIx64", sqid=%"PRIu16", 
cqid=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16""
 nvme_create_cq(uint64_t addr, uint16_t cqid, uint16_t vector, uint16_t size, 
uint16_t qflags, int ien) "create completion queue, addr=0x%"PRIx64", 
cqid=%"PRIu16", vector=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16", ien=%d"
 nvme_del_sq(uint16_t qid) "deleting submission queue sqid=%"PRIu16""
-nvme_del_cq(uint16_t cqid) "deleted completion queue, sqid=%"PRIu16""
+nvme_del_cq(uint16_t cqid) "deleted completion queue, cqid=%"PRIu16""
 nvme_identify_ctrl(void) "identify controller"
 nvme_identify_ns(uint16_t ns) "identify namespace, nsid=%"PRIu16""
 nvme_identify_nslist(uint16_t ns) "identify namespace list, nsid=%"PRIu16""
-- 
2.20.1




Re: [PATCH v2] qcow2: Forbid discard in qcow2 v2 images with backing files

2020-03-27 Thread Kevin Wolf
Am 24.03.2020 um 13:16 hat Alberto Garcia geschrieben:
> A discard request deallocates the selected clusters so they read back
> as zeroes. This is done by clearing the cluster offset field and
> setting QCOW_OFLAG_ZERO in the L2 entry.
> 
> This flag is however only supported when qcow_version >= 3. In older
> images the cluster is simply deallocated, exposing any possible stale
> data from the backing file.
> 
> Since discard is an advisory operation it's safer to simply forbid it
> in this scenario.
> 
> Note that we are adding this check to qcow2_co_pdiscard() and not to
> qcow2_cluster_discard() or discard_in_l2_slice() because the last
> two are also used by qcow2_snapshot_create() to discard the clusters
> used by the VM state. In this case there's no risk of exposing stale
> data to the guest and we really want that the clusters are always
> discarded.
> 
> Signed-off-by: Alberto Garcia 
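
For context, the guard described above presumably reduces to a condition
like the following sketch (not the actual hunk, which is not quoted in
this reply; s->qcow_version and bs->backing are the existing qcow2
driver fields assumed here):

/* Sketch only: when discard cannot be honoured safely. */
static bool qcow2_discard_would_expose_backing(BlockDriverState *bs)
{
    BDRVQcow2State *s = bs->opaque;

    /* Without QCOW_OFLAG_ZERO (qcow2 v2), a deallocated cluster reads
     * through to the backing file instead of reading back as zeroes. */
    return s->qcow_version < 3 && bs->backing != NULL;
}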

The test number 289 is already taken, so this needs a rebase.

> +echo
> +echo "### Test 'qemu-io -c discard' on a QCOW2 image without a backing file"
> +echo
> +for qcow2_compat in 0.10 1.1; do
> +echo "# Create an image with compat=$qcow2_compat without a backing file"
> +_make_test_img -o "compat=$qcow2_compat" 128k
> +
> +echo "# Fill all clusters with data and then discard them"
> +$QEMU_IO -c 'write -P 0x01 0 128k' "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c 'discard 0 128k' "$TEST_IMG" | _filter_qemu_io
> +
> +echo "# Read the data from the discarded clusters"
> +$QEMU_IO -c 'read -P 0x00 0 128k' "$TEST_IMG" | _filter_qemu_io
> +done

How about adding a 'qemu-img map' call just to have some more direct
information about what happened to the allocation status?

> +
> +echo
> +echo "### Test 'qemu-io -c discard' on a QCOW2 image with a backing file"
> +echo
> +
> +echo "# Create a backing image and fill it with data"
> +BACKING_IMG="$TEST_IMG.base"
> +TEST_IMG="$BACKING_IMG" _make_test_img 128k
> +$QEMU_IO -c 'write -P 0xff 0 128k' "$BACKING_IMG" | _filter_qemu_io
> +
> +for qcow2_compat in 0.10 1.1; do
> +echo "# Create an image with compat=$qcow2_compat and a backing file"
> +_make_test_img -o "compat=$qcow2_compat" -b "$BACKING_IMG"

I would write some non-zero data to the backing file so that you can
later distinguish zero clusters from unallocated clusters.

> +echo "# Fill all clusters with data and then discard them"
> +$QEMU_IO -c 'write -P 0x01 0 128k' "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c 'discard 0 128k' "$TEST_IMG" | _filter_qemu_io
> +
> +echo "# Read the data from the discarded clusters"
> +if [ "$qcow2_compat" = "1.1" ]; then
> +# In qcow2 v3 clusters are zeroed (with QCOW_OFLAG_ZERO)
> +$QEMU_IO -c 'read -P 0x00 0 128k' "$TEST_IMG" | _filter_qemu_io
> +else
> +# In qcow2 v2 if there's a backing image we cannot zero the clusters
> +# without exposing the backing file data so discard does nothing
> +$QEMU_IO -c 'read -P 0x01 0 128k' "$TEST_IMG" | _filter_qemu_io
> +fi
> +done

Kevin




Re: [PATCH v9 3/4] qcow2: add zstd cluster compression

2020-03-27 Thread Vladimir Sementsov-Ogievskiy

27.03.2020 12:40, Denis Plotnikov wrote:



On 27.03.2020 11:43, Vladimir Sementsov-Ogievskiy wrote:

Should we note somehow in qcow2 spec that we use streamed version of zstd with 
specific end byte?

We didn't do it for zlib. zstd does it the same way as zlib, saves the 
compression output to some buffer.


I've looked through the zstd format spec, and I don't think there can be any
fundamental difference between data
compressed by the simple API and by the streaming one, so I don't think we need any
additional comment.
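
As a stand-alone sanity check of that claim (illustrative only, not part
of the patch; it assumes a reasonably recent libzstd), a frame produced
by the simple API decodes fine through the streaming decompressor:

#include <assert.h>
#include <string.h>
#include <zstd.h>

int main(void)
{
    const char src[] = "qcow2 cluster payload";
    char comp[256], out[256];

    /* Compress with the one-shot ("simple") API... */
    size_t clen = ZSTD_compress(comp, sizeof(comp), src, sizeof(src), 3);
    assert(!ZSTD_isError(clen));

    /* ...and decode the resulting frame with the streaming API. */
    ZSTD_DCtx *dctx = ZSTD_createDCtx();
    assert(dctx);
    ZSTD_inBuffer in = { comp, clen, 0 };
    ZSTD_outBuffer o = { out, sizeof(out), 0 };
    size_t ret = ZSTD_decompressStream(dctx, &o, &in);

    assert(!ZSTD_isError(ret) && ret == 0); /* 0 means the frame is complete */
    assert(o.pos == sizeof(src) && memcmp(out, src, sizeof(src)) == 0);

    ZSTD_freeDCtx(dctx);
    return 0;
}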





23.03.2020 17:25, Denis Plotnikov wrote:

zstd significantly reduces cluster compression time.
It provides better compression performance maintaining
the same level of the compression ratio in comparison with
zlib, which, at the moment, is the only compression
method available.

The performance test results:
Test compresses and decompresses qemu qcow2 image with just
installed rhel-7.6 guest.
Image cluster size: 64K. Image on disk size: 2.2G

The test was conducted with brd disk to reduce the influence
of disk subsystem to the test results.
The results is given in seconds.

compress cmd:
   time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib|zstd]
   src.img [zlib|zstd]_compressed.img
decompress cmd
   time ./qemu-img convert -O qcow2
   [zlib|zstd]_compressed.img uncompressed.img

    compression   decompression
  zlib   zstd   zlib zstd

real 65.5   16.3 (-75 %)    1.9  1.6 (-16 %)
user 65.0   15.8    5.3  2.5
sys   3.3    0.2    2.0  2.0

Both ZLIB and ZSTD gave the same compression ratio: 1.57
compressed image size in both cases: 1.4G

Signed-off-by: Denis Plotnikov 
QAPI part:
Acked-by: Markus Armbruster 
---


[..]


+
+/*
+ * qcow2_zstd_compress()
+ *
+ * Compress @src_size bytes of data using zstd compression method
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: compressed size on success
+ *  -ENOMEM destination buffer is not enough to store compressed data
+ *  -EIO    on any other error
+ */
+static ssize_t qcow2_zstd_compress(void *dest, size_t dest_size,
+   const void *src, size_t src_size)
+{
+    size_t ret;
+    ZSTD_outBuffer output = { dest, dest_size, 0 };
+    ZSTD_inBuffer input = { src, src_size, 0 };
+    ZSTD_CCtx *cctx = ZSTD_createCCtx();
+
+    if (!cctx) {
+    return -EIO;
+    }
+    /*
+ * ZSTD spec: "You must continue calling ZSTD_compressStream2()
+ * with ZSTD_e_end until it returns 0, at which point you are
+ * free to start a new frame"
+ */
+    {
+    /*
+ * zstd simple interface requires the exact compressed size.
+ * zstd stream interface reads the comressed size from
+ * the compressed stream frame.
+ * Instruct zstd to compress the whole buffer and write
+ * the frame which includes the compressed size.
+ * This allows as to use zstd streaming semantics and
+ * don't store the compressed size for the zstd decompression.
+ */
+    ret = ZSTD_compressStream2(cctx, , , ZSTD_e_end);
+    if (ZSTD_isError(ret)) {
+    ret = -EIO;
+    goto out;
+    }
+    /* Dest buffer isn't big enough to store compressed content */
+    if (output.pos + ret > output.size) {
+    ret = -ENOMEM;
+    goto out;
+    }
+    } while (ret);
+
+    /* if no error, the input data must be fully consumed */
+    assert(input.pos == input.size);
+    /* make sure we can safely return compressed buffer size with ssize_t *//z
+    assert(output.pos <= SSIZE_MAX);
+    ret = output.pos;
+
+out:
+    ZSTD_freeCCtx(cctx);
+    return ret;
+}
+
+/*
+ * qcow2_zstd_decompress()
+ *
+ * Decompress some data (not more than @src_size bytes) to produce exactly
+ * @dest_size bytes using zstd compression method
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: 0 on success
+ *  -EIO on any error
+ */
+static ssize_t qcow2_zstd_decompress(void *dest, size_t dest_size,
+ const void *src, size_t src_size)
+{
+    size_t ret = 0;
+    ZSTD_outBuffer output = { dest, dest_size, 0 };
+    ZSTD_inBuffer input = { src, src_size, 0 };
+    ZSTD_DCtx *dctx = ZSTD_createDCtx();
+
+    if (!dctx) {
+    return -EIO;
+    }
+
+    {
+    ret = ZSTD_decompressStream(dctx, , );
+    if (ZSTD_isError(ret)) {
+    ret = -EIO;
+    goto out;
+    }
+    /*
+ * Dest buffer size is the image cluster size.
+ * It should be big enough to store uncompressed content.
+ * There shouldn't be any cases when the decompressed content
+ * size is greater then the cluster size.
+ 

Re: [PATCH v2 0/3] Fix some AIO context locking in jobs

2020-03-27 Thread Dietmar Maurer
> I *think* the second patch also fixes the hangs on backup abort that I and
> Dietmar noticed in v1, but I'm not sure, they we're somewhat intermittent
> before too.

After more testing, I am 100% sure the bug (or another one) is still there.
Here is how to trigger it:

1. use the latest qemu sources from github
2. apply those 3 patches from Stefan
3. create a VM with a virtio-scsi-single drive using an IO thread
4. inside the VM, install Debian buster
5. inside the VM, run "stress -d 5"

Then run a series of backups, aborting them after a few seconds:

# start loop 

qmp: { "execute": "drive-backup", "arguments": { "device": "drive-scsi0", 
"sync": "full", "target": "backup-scsi0.raw" } }

sleep for 3 seconds

qmp: { "execute": "block-job-cancel", "arguments": { "device": "drive-scsi0" } }

# end loop

After several iterations (usually fewer than 50) the VM freezes (this time somewhere
inside drive_backup_prepare):


(gdb) bt
#0  0x7f61ea09e916 in __GI_ppoll (fds=0x7f6158130c40, nfds=2, 
timeout=, timeout@entry=0x0, sigmask=sigmask@entry=0x0)
at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0x55f708401c79 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, __fds=)
at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
#2  0x55f708401c79 in qemu_poll_ns (fds=, nfds=, timeout=timeout@entry=-1) at util/qemu-timer.c:335
#3  0x55f708404461 in fdmon_poll_wait (ctx=0x7f61dcd05e80, 
ready_list=0x7ffc4e7fbde8, timeout=-1) at util/fdmon-poll.c:79
#4  0x55f708403a47 in aio_poll (ctx=0x7f61dcd05e80, 
blocking=blocking@entry=true) at util/aio-posix.c:589
#5  0x55f708364c03 in bdrv_do_drained_begin (poll=, 
ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x7f61dcd4c500)
at block/io.c:429
#6  0x55f708364c03 in bdrv_do_drained_begin
(bs=0x7f61dcd4c500, recursive=, parent=0x0, 
ignore_bds_parents=, poll=) at block/io.c:395
#7  0x55f7081016f9 in drive_backup_prepare (common=0x7f61d9a0c280, 
errp=0x7ffc4e7fbf28) at blockdev.c:1755
#8  0x55f708103e6a in qmp_transaction 
(dev_list=dev_list@entry=0x7ffc4e7fbfa0, has_props=has_props@entry=false, 
props=0x7f61d9a304e8, 
props@entry=0x0, errp=errp@entry=0x7ffc4e7fbfd8) at blockdev.c:2401
#9  0x55f708105322 in blockdev_do_action (errp=0x7ffc4e7fbfd8, 
action=0x7ffc4e7fbf90) at blockdev.c:1054
#10 0x55f708105322 in qmp_drive_backup (backup=backup@entry=0x7ffc4e7fbfe0, 
errp=errp@entry=0x7ffc4e7fbfd8) at blockdev.c:3129
#11 0x55f7082c0101 in qmp_marshal_drive_backup (args=, 
ret=, errp=0x7ffc4e7fc0b8)
at qapi/qapi-commands-block-core.c:555
#12 0x55f7083b7338 in qmp_dispatch (cmds=0x55f708904000 , 
request=, allow_oob=)
at qapi/qmp-dispatch.c:155
#13 0x55f7082a1bd1 in monitor_qmp_dispatch (mon=0x7f61dcd15d80, 
req=) at monitor/qmp.c:145
#14 0x55f7082a23ba in monitor_qmp_bh_dispatcher (data=) at 
monitor/qmp.c:234
#15 0x55f708400205 in aio_bh_call (bh=0x7f61dd28f960) at util/async.c:164
#16 0x55f708400205 in aio_bh_poll (ctx=ctx@entry=0x7f61dd33ef80) at 
util/async.c:164
#17 0x55f70840388e in aio_dispatch (ctx=0x7f61dd33ef80) at 
util/aio-posix.c:380
#18 0x55f7084000ee in aio_ctx_dispatch (source=, 
callback=, user_data=) at util/async.c:298
#19 0x7f61ec069f2e in g_main_context_dispatch () at 
/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#20 0x55f708402af8 in glib_pollfds_poll () at util/main-loop.c:219
#21 0x55f708402af8 in os_host_main_loop_wait (timeout=) at 
util/main-loop.c:242
#22 0x55f708402af8 in main_loop_wait (nonblocking=nonblocking@entry=0) at 
util/main-loop.c:518
#23 0x55f70809e589 in qemu_main_loop () at 
/home/dietmar/pve5-devel/mirror_qemu/softmmu/vl.c:1665
#24 0x55f707fa2c3e in main (argc=, argv=, 
envp=)
at /home/dietmar/pve5-devel/mirror_qemu/softmmu/main.c:49




[PATCH v3 1/2] virtio-blk: delete vqs on the error path in realize()

2020-03-27 Thread Pan Nengyuan
virtio_vqs forgot to free on the error path in realize(). Fix that.

The asan stack:
Direct leak of 14336 byte(s) in 1 object(s) allocated from:
#0 0x7f58b93fd970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970)
#1 0x7f58b858249d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d)
#2 0x5562cc627f49 in virtio_add_queue /mnt/sdb/qemu/hw/virtio/virtio.c:2413
#3 0x5562cc4b524a in virtio_blk_device_realize 
/mnt/sdb/qemu/hw/block/virtio-blk.c:1202
#4 0x5562cc613050 in virtio_device_realize 
/mnt/sdb/qemu/hw/virtio/virtio.c:3615
#5 0x5562ccb7a568 in device_set_realized /mnt/sdb/qemu/hw/core/qdev.c:891
#6 0x5562cd39cd45 in property_set_bool /mnt/sdb/qemu/qom/object.c:2238

Reported-by: Euler Robot 
Signed-off-by: Pan Nengyuan 
Reviewed-by: Stefano Garzarella 
---
v2->v1:
- Fix incorrect free in v1, it will cause a uaf.
---
Cc: Stefan Hajnoczi 
Cc: Kevin Wolf 
Cc: Max Reitz 
Cc: qemu-block@nongnu.org
---
 hw/block/virtio-blk.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 142863a3b2..97ba8a2187 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1204,6 +1204,9 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 virtio_blk_data_plane_create(vdev, conf, >dataplane, );
 if (err != NULL) {
 error_propagate(errp, err);
+for (i = 0; i < conf->num_queues; i++) {
+virtio_del_queue(vdev, i);
+}
 virtio_cleanup(vdev);
 return;
 }
-- 
2.18.2



Re: [PATCH v9 3/4] qcow2: add zstd cluster compression

2020-03-27 Thread Denis Plotnikov




On 27.03.2020 11:43, Vladimir Sementsov-Ogievskiy wrote:
Should we note somehow in qcow2 spec that we use streamed version of 
zstd with specific end byte?
We didn't do it for zlib. zstd does it the same way as zlib, saves the 
compression output to some buffer.




23.03.2020 17:25, Denis Plotnikov wrote:

zstd significantly reduces cluster compression time.
It provides better compression performance maintaining
the same level of the compression ratio in comparison with
zlib, which, at the moment, is the only compression
method available.

The performance test results:
Test compresses and decompresses qemu qcow2 image with just
installed rhel-7.6 guest.
Image cluster size: 64K. Image on disk size: 2.2G

The test was conducted with brd disk to reduce the influence
of disk subsystem to the test results.
The results is given in seconds.

compress cmd:
   time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib|zstd]
   src.img [zlib|zstd]_compressed.img
decompress cmd
   time ./qemu-img convert -O qcow2
   [zlib|zstd]_compressed.img uncompressed.img

    compression   decompression
  zlib   zstd   zlib zstd

real 65.5   16.3 (-75 %)    1.9  1.6 (-16 %)
user 65.0   15.8    5.3  2.5
sys   3.3    0.2    2.0  2.0

Both ZLIB and ZSTD gave the same compression ratio: 1.57
compressed image size in both cases: 1.4G

Signed-off-by: Denis Plotnikov 
QAPI part:
Acked-by: Markus Armbruster 
---


[..]


+
+/*
+ * qcow2_zstd_compress()
+ *
+ * Compress @src_size bytes of data using zstd compression method
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: compressed size on success
+ *  -ENOMEM destination buffer is not enough to store 
compressed data

+ *  -EIO    on any other error
+ */
+static ssize_t qcow2_zstd_compress(void *dest, size_t dest_size,
+   const void *src, size_t src_size)
+{
+    size_t ret;
+    ZSTD_outBuffer output = { dest, dest_size, 0 };
+    ZSTD_inBuffer input = { src, src_size, 0 };
+    ZSTD_CCtx *cctx = ZSTD_createCCtx();
+
+    if (!cctx) {
+    return -EIO;
+    }
+    /*
+ * ZSTD spec: "You must continue calling ZSTD_compressStream2()
+ * with ZSTD_e_end until it returns 0, at which point you are
+ * free to start a new frame"
+ */
+    {
+    /*
+ * zstd simple interface requires the exact compressed size.
+ * zstd stream interface reads the comressed size from
+ * the compressed stream frame.
+ * Instruct zstd to compress the whole buffer and write
+ * the frame which includes the compressed size.
+ * This allows as to use zstd streaming semantics and
+ * don't store the compressed size for the zstd decompression.
+ */
+    ret = ZSTD_compressStream2(cctx, , , ZSTD_e_end);
+    if (ZSTD_isError(ret)) {
+    ret = -EIO;
+    goto out;
+    }
+    /* Dest buffer isn't big enough to store compressed content */
+    if (output.pos + ret > output.size) {
+    ret = -ENOMEM;
+    goto out;
+    }
+    } while (ret);
+
+    /* if no error, the input data must be fully consumed */
+    assert(input.pos == input.size);
+    /* make sure we can safely return compressed buffer size with 
ssize_t *//z

+    assert(output.pos <= SSIZE_MAX);
+    ret = output.pos;
+
+out:
+    ZSTD_freeCCtx(cctx);
+    return ret;
+}
+
+/*
+ * qcow2_zstd_decompress()
+ *
+ * Decompress some data (not more than @src_size bytes) to produce 
exactly

+ * @dest_size bytes using zstd compression method
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: 0 on success
+ *  -EIO on any error
+ */
+static ssize_t qcow2_zstd_decompress(void *dest, size_t dest_size,
+ const void *src, size_t src_size)
+{
+    size_t ret = 0;
+    ZSTD_outBuffer output = { dest, dest_size, 0 };
+    ZSTD_inBuffer input = { src, src_size, 0 };
+    ZSTD_DCtx *dctx = ZSTD_createDCtx();
+
+    if (!dctx) {
+    return -EIO;
+    }
+
+    {
+    ret = ZSTD_decompressStream(dctx, , );
+    if (ZSTD_isError(ret)) {
+    ret = -EIO;
+    goto out;
+    }
+    /*
+ * Dest buffer size is the image cluster size.
+ * It should be big enough to store uncompressed content.
+ * There shouldn't be any cases when the decompressed content
+ * size is greater then the cluster size.
+ */
+    if (output.pos + ret > output.size) {
+    ret = -EIO;
+    goto out;
+    }
+    } while (ret);



Hmm. Unfortunately, the zstd spec is not verbose enough to understand how
to use these functions :).

But I found this 

Re: [PATCH v2 1/2] virtio-blk: delete vqs on the error path in realize()

2020-03-27 Thread Stefano Garzarella
On Fri, Mar 27, 2020 at 12:56:19PM +0800, Pan Nengyuan wrote:
> virtio_vqs forgot to free on the error path in realize(). Fix that.
> 
> The asan stack:
> Direct leak of 14336 byte(s) in 1 object(s) allocated from:
> #0 0x7f58b93fd970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970)
> #1 0x7f58b858249d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d)
> #2 0x5562cc627f49 in virtio_add_queue 
> /mnt/sdb/qemu/hw/virtio/virtio.c:2413
> #3 0x5562cc4b524a in virtio_blk_device_realize 
> /mnt/sdb/qemu/hw/block/virtio-blk.c:1202
> #4 0x5562cc613050 in virtio_device_realize 
> /mnt/sdb/qemu/hw/virtio/virtio.c:3615
> #5 0x5562ccb7a568 in device_set_realized /mnt/sdb/qemu/hw/core/qdev.c:891
> #6 0x5562cd39cd45 in property_set_bool /mnt/sdb/qemu/qom/object.c:2238
> 
> Reported-by: Euler Robot 
> Signed-off-by: Pan Nengyuan 
> ---
> v2->v1:
> - Fix incorrect free in v1, it will cause a uaf.
> ---
> Cc: Stefan Hajnoczi 
> Cc: Kevin Wolf 
> Cc: Max Reitz 
> Cc: qemu-block@nongnu.org
> ---
>  hw/block/virtio-blk.c | 3 +++
>  1 file changed, 3 insertions(+)

Reviewed-by: Stefano Garzarella 

> 
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index 142863a3b2..97ba8a2187 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -1204,6 +1204,9 @@ static void virtio_blk_device_realize(DeviceState *dev, 
> Error **errp)
>  virtio_blk_data_plane_create(vdev, conf, >dataplane, );
>  if (err != NULL) {
>  error_propagate(errp, err);
> +for (i = 0; i < conf->num_queues; i++) {
> +virtio_del_queue(vdev, i);
> +}
>  virtio_cleanup(vdev);
>  return;
>  }
> -- 
> 2.18.2
> 
> 




Re: [PATCH 1/2] virtio-blk: delete vqs on the error path in realize()

2020-03-27 Thread Pan Nengyuan



On 3/27/2020 4:41 PM, Stefano Garzarella wrote:
> On Fri, Mar 27, 2020 at 11:56:49AM +0800, Pan Nengyuan wrote:
>> virtio_vqs forgot to free on the error path in realize(). Fix that.
>>
>> The asan stack:
>> Direct leak of 14336 byte(s) in 1 object(s) allocated from:
>> #0 0x7f58b93fd970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970)
>> #1 0x7f58b858249d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d)
>> #2 0x5562cc627f49 in virtio_add_queue 
>> /mnt/sdb/qemu/hw/virtio/virtio.c:2413
>> #3 0x5562cc4b524a in virtio_blk_device_realize 
>> /mnt/sdb/qemu/hw/block/virtio-blk.c:1202
>> #4 0x5562cc613050 in virtio_device_realize 
>> /mnt/sdb/qemu/hw/virtio/virtio.c:3615
>> #5 0x5562ccb7a568 in device_set_realized /mnt/sdb/qemu/hw/core/qdev.c:891
>> #6 0x5562cd39cd45 in property_set_bool /mnt/sdb/qemu/qom/object.c:2238
>>
>> Reported-by: Euler Robot 
>> Signed-off-by: Pan Nengyuan 
>> ---
>> Cc: Stefan Hajnoczi 
>> Cc: Kevin Wolf 
>> Cc: Max Reitz 
>> Cc: qemu-block@nongnu.org
>> ---
>>  hw/block/virtio-blk.c | 8 ++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
>> index 142863a3b2..a6682c2ced 100644
>> --- a/hw/block/virtio-blk.c
>> +++ b/hw/block/virtio-blk.c
>> @@ -1204,8 +1204,7 @@ static void virtio_blk_device_realize(DeviceState 
>> *dev, Error **errp)
>>  virtio_blk_data_plane_create(vdev, conf, >dataplane, );
>>  if (err != NULL) {
>>  error_propagate(errp, err);
>> -virtio_cleanup(vdev);
>> -return;
>> +goto fail;
>>  }
>>  
>>  s->change = qemu_add_vm_change_state_handler(virtio_blk_dma_restart_cb, 
>> s);
>> @@ -1218,6 +1217,11 @@ static void virtio_blk_device_realize(DeviceState 
>> *dev, Error **errp)
>>   conf->conf.lcyls,
>>   conf->conf.lheads,
>>   conf->conf.lsecs);
> 
> I think we should add a return here, otherwise we will remove the virt
> queues also in the success case.

Yes, I have sent a new version (v2) with that change, please review it.

Thanks.

> 
> Thanks,
> Stefano
> 
>> +fail:
>> +for (i = 0; i < conf->num_queues; i++) {
>> +virtio_del_queue(vdev, i);
>> +}
>> +virtio_cleanup(vdev);
>>  }
>>  
>>  static void virtio_blk_device_unrealize(DeviceState *dev, Error **errp)
>> -- 
>> 2.18.2
>>
>>
> 
> .
> 



Re: [PATCH v9 3/4] qcow2: add zstd cluster compression

2020-03-27 Thread Vladimir Sementsov-Ogievskiy

Should we note somehow in qcow2 spec that we use streamed version of zstd with 
specific end byte?

23.03.2020 17:25, Denis Plotnikov wrote:

zstd significantly reduces cluster compression time.
It provides better compression performance maintaining
the same level of the compression ratio in comparison with
zlib, which, at the moment, is the only compression
method available.

The performance test results:
Test compresses and decompresses qemu qcow2 image with just
installed rhel-7.6 guest.
Image cluster size: 64K. Image on disk size: 2.2G

The test was conducted with brd disk to reduce the influence
of disk subsystem to the test results.
The results is given in seconds.

compress cmd:
   time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib|zstd]
   src.img [zlib|zstd]_compressed.img
decompress cmd
   time ./qemu-img convert -O qcow2
   [zlib|zstd]_compressed.img uncompressed.img

compression   decompression
  zlib   zstd   zlib zstd

real 65.5   16.3 (-75 %)1.9  1.6 (-16 %)
user 65.0   15.85.3  2.5
sys   3.30.22.0  2.0

Both ZLIB and ZSTD gave the same compression ratio: 1.57
compressed image size in both cases: 1.4G

Signed-off-by: Denis Plotnikov 
QAPI part:
Acked-by: Markus Armbruster 
---


[..]


+
+/*
+ * qcow2_zstd_compress()
+ *
+ * Compress @src_size bytes of data using zstd compression method
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: compressed size on success
+ *  -ENOMEM destination buffer is not enough to store compressed data
+ *  -EIOon any other error
+ */
+static ssize_t qcow2_zstd_compress(void *dest, size_t dest_size,
+   const void *src, size_t src_size)
+{
+size_t ret;
+ZSTD_outBuffer output = { dest, dest_size, 0 };
+ZSTD_inBuffer input = { src, src_size, 0 };
+ZSTD_CCtx *cctx = ZSTD_createCCtx();
+
+if (!cctx) {
+return -EIO;
+}
+/*
+ * ZSTD spec: "You must continue calling ZSTD_compressStream2()
+ * with ZSTD_e_end until it returns 0, at which point you are
+ * free to start a new frame"
+ */
+{
+/*
+ * zstd simple interface requires the exact compressed size.
+ * zstd stream interface reads the comressed size from
+ * the compressed stream frame.
+ * Instruct zstd to compress the whole buffer and write
+ * the frame which includes the compressed size.
+ * This allows as to use zstd streaming semantics and
+ * don't store the compressed size for the zstd decompression.
+ */
+ret = ZSTD_compressStream2(cctx, , , ZSTD_e_end);
+if (ZSTD_isError(ret)) {
+ret = -EIO;
+goto out;
+}
+/* Dest buffer isn't big enough to store compressed content */
+if (output.pos + ret > output.size) {
+ret = -ENOMEM;
+goto out;
+}
+} while (ret);
+
+/* if no error, the input data must be fully consumed */
+assert(input.pos == input.size);
+/* make sure we can safely return compressed buffer size with ssize_t *//z
+assert(output.pos <= SSIZE_MAX);
+ret = output.pos;
+
+out:
+ZSTD_freeCCtx(cctx);
+return ret;
+}
+
+/*
+ * qcow2_zstd_decompress()
+ *
+ * Decompress some data (not more than @src_size bytes) to produce exactly
+ * @dest_size bytes using zstd compression method
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: 0 on success
+ *  -EIO on any error
+ */
+static ssize_t qcow2_zstd_decompress(void *dest, size_t dest_size,
+ const void *src, size_t src_size)
+{
+size_t ret = 0;
+ZSTD_outBuffer output = { dest, dest_size, 0 };
+ZSTD_inBuffer input = { src, src_size, 0 };
+ZSTD_DCtx *dctx = ZSTD_createDCtx();
+
+if (!dctx) {
+return -EIO;
+}
+
+{
+ret = ZSTD_decompressStream(dctx, , );
+if (ZSTD_isError(ret)) {
+ret = -EIO;
+goto out;
+}
+/*
+ * Dest buffer size is the image cluster size.
+ * It should be big enough to store uncompressed content.
+ * There shouldn't be any cases when the decompressed content
+ * size is greater then the cluster size.
+ */
+if (output.pos + ret > output.size) {
+ret = -EIO;
+goto out;
+}
+} while (ret);



Hmm. Unfortunately, the zstd spec is not verbose enough to understand how to use
these functions :).

But I found this in a comment in
https://github.com/facebook/zstd/blob/dev/examples/streaming_decompression.c :

/* The return code is zero if the frame is complete, but there may
 * 
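
The loop shape that example describes would look roughly like the sketch
below (not code from the patch; it assumes <zstd.h> and <errno.h>, and
relies on a zero return from ZSTD_decompressStream() marking frame
completion, as its documentation states):

/*
 * Sketch: decode exactly one zstd frame from a fully-buffered input.
 * A zero return value means the frame is complete; a nonzero value once
 * no further input can be consumed or output produced means a truncated
 * input or an undersized destination buffer.
 */
static int decode_one_frame(ZSTD_DCtx *dctx,
                            void *dst, size_t dst_size,
                            const void *src, size_t src_size)
{
    ZSTD_outBuffer out = { dst, dst_size, 0 };
    ZSTD_inBuffer in = { src, src_size, 0 };
    size_t ret;

    do {
        ret = ZSTD_decompressStream(dctx, &out, &in);
        if (ZSTD_isError(ret)) {
            return -EIO;
        }
        if (ret != 0 && (in.pos == in.size || out.pos == out.size)) {
            return -EIO; /* cannot make further progress */
        }
    } while (ret != 0);

    return 0;
}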

Re: [PATCH 1/2] virtio-blk: delete vqs on the error path in realize()

2020-03-27 Thread Stefano Garzarella
On Fri, Mar 27, 2020 at 11:56:49AM +0800, Pan Nengyuan wrote:
> virtio_vqs forgot to free on the error path in realize(). Fix that.
> 
> The asan stack:
> Direct leak of 14336 byte(s) in 1 object(s) allocated from:
> #0 0x7f58b93fd970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970)
> #1 0x7f58b858249d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d)
> #2 0x5562cc627f49 in virtio_add_queue 
> /mnt/sdb/qemu/hw/virtio/virtio.c:2413
> #3 0x5562cc4b524a in virtio_blk_device_realize 
> /mnt/sdb/qemu/hw/block/virtio-blk.c:1202
> #4 0x5562cc613050 in virtio_device_realize 
> /mnt/sdb/qemu/hw/virtio/virtio.c:3615
> #5 0x5562ccb7a568 in device_set_realized /mnt/sdb/qemu/hw/core/qdev.c:891
> #6 0x5562cd39cd45 in property_set_bool /mnt/sdb/qemu/qom/object.c:2238
> 
> Reported-by: Euler Robot 
> Signed-off-by: Pan Nengyuan 
> ---
> Cc: Stefan Hajnoczi 
> Cc: Kevin Wolf 
> Cc: Max Reitz 
> Cc: qemu-block@nongnu.org
> ---
>  hw/block/virtio-blk.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index 142863a3b2..a6682c2ced 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -1204,8 +1204,7 @@ static void virtio_blk_device_realize(DeviceState *dev, 
> Error **errp)
>  virtio_blk_data_plane_create(vdev, conf, >dataplane, );
>  if (err != NULL) {
>  error_propagate(errp, err);
> -virtio_cleanup(vdev);
> -return;
> +goto fail;
>  }
>  
>  s->change = qemu_add_vm_change_state_handler(virtio_blk_dma_restart_cb, 
> s);
> @@ -1218,6 +1217,11 @@ static void virtio_blk_device_realize(DeviceState 
> *dev, Error **errp)
>   conf->conf.lcyls,
>   conf->conf.lheads,
>   conf->conf.lsecs);

I think we should add a return here, otherwise we will remove the virt
queues also in the success case.

Thanks,
Stefano

> +fail:
> +for (i = 0; i < conf->num_queues; i++) {
> +virtio_del_queue(vdev, i);
> +}
> +virtio_cleanup(vdev);
>  }
>  
>  static void virtio_blk_device_unrealize(DeviceState *dev, Error **errp)
> -- 
> 2.18.2
> 
> 




Re: [PATCH v2 0/3] Fix some AIO context locking in jobs

2020-03-27 Thread Dietmar Maurer
Wait - maybe this was a bug in my test setup - I am unable to reproduce it now.

@Stefan Reiter: Are you able to trigger this?

> > I *think* the second patch also fixes the hangs on backup abort that I and
> > Dietmar noticed in v1, but I'm not sure, they we're somewhat intermittent
> > before too.
> 
> No, I still get this freeze:
> 
> 0  0x7f0aa4866916 in __GI_ppoll (fds=0x7f0a12935c40, nfds=2, 
> timeout=, timeout@entry=0x0, sigmask=sigmask@entry=0x0)
> at ../sysdeps/unix/sysv/linux/ppoll.c:39
> #1  0x55d3a6c91d29 in ppoll (__ss=0x0, __timeout=0x0, __nfds= out>, __fds=)
> at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
> #2  0x55d3a6c91d29 in qemu_poll_ns (fds=, nfds= out>, timeout=timeout@entry=-1) at util/qemu-timer.c:335
> #3  0x55d3a6c94511 in fdmon_poll_wait (ctx=0x7f0a97505e80, 
> ready_list=0x7fff67e5c358, timeout=-1) at util/fdmon-poll.c:79
> #4  0x55d3a6c93af7 in aio_poll (ctx=0x7f0a97505e80, 
> blocking=blocking@entry=true) at util/aio-posix.c:589
> #5  0x55d3a6bf4cd3 in bdrv_do_drained_begin
> (poll=, ignore_bds_parents=false, parent=0x0, 
> recursive=false, bs=0x7f0a9754c280) at block/io.c:429
> #6  0x55d3a6bf4cd3 in bdrv_do_drained_begin
> (bs=0x7f0a9754c280, recursive=, parent=0x0, 
> ignore_bds_parents=, poll=) at block/io.c:395
> #7  0x55d3a6be5c87 in blk_drain (blk=0x7f0a97abcc00) at 
> block/block-backend.c:1617
> #8  0x55d3a6be686d in blk_unref (blk=0x7f0a97abcc00) at 
> block/block-backend.c:473
> #9  0x55d3a6b9e835 in block_job_free (job=0x7f0a15f44e00) at blockjob.c:89
> #10 0x55d3a6b9fe29 in job_unref (job=0x7f0a15f44e00) at job.c:376
> #11 0x55d3a6b9fe29 in job_unref (job=0x7f0a15f44e00) at job.c:368
> #12 0x55d3a6ba07aa in job_finish_sync (job=job@entry=0x7f0a15f44e00, 
> finish=finish@entry=
> 0x55d3a6ba0cd0 , errp=errp@entry=0x0) at job.c:1004
> #13 0x55d3a6ba0cee in job_cancel_sync (job=job@entry=0x7f0a15f44e00) at 
> job.c:947




Re: [PATCH v2 0/3] Fix some AIO context locking in jobs

2020-03-27 Thread Dietmar Maurer
But I need to mention that it takes some time to reproduce this. This time I
ran/aborted about 500 backup jobs before it triggered.

> > I *think* the second patch also fixes the hangs on backup abort that I and
> > Dietmar noticed in v1, but I'm not sure, they we're somewhat intermittent
> > before too.
> 
> No, I still get this freeze:
> 
> 0  0x7f0aa4866916 in __GI_ppoll (fds=0x7f0a12935c40, nfds=2, 
> timeout=, timeout@entry=0x0, sigmask=sigmask@entry=0x0)
> at ../sysdeps/unix/sysv/linux/ppoll.c:39
> #1  0x55d3a6c91d29 in ppoll (__ss=0x0, __timeout=0x0, __nfds= out>, __fds=)
> at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
> #2  0x55d3a6c91d29 in qemu_poll_ns (fds=, nfds= out>, timeout=timeout@entry=-1) at util/qemu-timer.c:335
> #3  0x55d3a6c94511 in fdmon_poll_wait (ctx=0x7f0a97505e80, 
> ready_list=0x7fff67e5c358, timeout=-1) at util/fdmon-poll.c:79
> #4  0x55d3a6c93af7 in aio_poll (ctx=0x7f0a97505e80, 
> blocking=blocking@entry=true) at util/aio-posix.c:589
> #5  0x55d3a6bf4cd3 in bdrv_do_drained_begin
> (poll=, ignore_bds_parents=false, parent=0x0, 
> recursive=false, bs=0x7f0a9754c280) at block/io.c:429
> #6  0x55d3a6bf4cd3 in bdrv_do_drained_begin
> (bs=0x7f0a9754c280, recursive=, parent=0x0, 
> ignore_bds_parents=, poll=) at block/io.c:395
> #7  0x55d3a6be5c87 in blk_drain (blk=0x7f0a97abcc00) at 
> block/block-backend.c:1617
> #8  0x55d3a6be686d in blk_unref (blk=0x7f0a97abcc00) at 
> block/block-backend.c:473
> #9  0x55d3a6b9e835 in block_job_free (job=0x7f0a15f44e00) at blockjob.c:89
> #10 0x55d3a6b9fe29 in job_unref (job=0x7f0a15f44e00) at job.c:376
> #11 0x55d3a6b9fe29 in job_unref (job=0x7f0a15f44e00) at job.c:368
> #12 0x55d3a6ba07aa in job_finish_sync (job=job@entry=0x7f0a15f44e00, 
> finish=finish@entry=
> 0x55d3a6ba0cd0 , errp=errp@entry=0x0) at job.c:1004
> #13 0x55d3a6ba0cee in job_cancel_sync (job=job@entry=0x7f0a15f44e00) at 
> job.c:947




Re: [PATCH v2 0/3] Fix some AIO context locking in jobs

2020-03-27 Thread Dietmar Maurer


> I *think* the second patch also fixes the hangs on backup abort that I and
> Dietmar noticed in v1, but I'm not sure, they we're somewhat intermittent
> before too.

No, I still get this freeze:

0  0x7f0aa4866916 in __GI_ppoll (fds=0x7f0a12935c40, nfds=2, 
timeout=, timeout@entry=0x0, sigmask=sigmask@entry=0x0)
at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0x55d3a6c91d29 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, __fds=)
at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
#2  0x55d3a6c91d29 in qemu_poll_ns (fds=, nfds=, timeout=timeout@entry=-1) at util/qemu-timer.c:335
#3  0x55d3a6c94511 in fdmon_poll_wait (ctx=0x7f0a97505e80, 
ready_list=0x7fff67e5c358, timeout=-1) at util/fdmon-poll.c:79
#4  0x55d3a6c93af7 in aio_poll (ctx=0x7f0a97505e80, 
blocking=blocking@entry=true) at util/aio-posix.c:589
#5  0x55d3a6bf4cd3 in bdrv_do_drained_begin
(poll=, ignore_bds_parents=false, parent=0x0, 
recursive=false, bs=0x7f0a9754c280) at block/io.c:429
#6  0x55d3a6bf4cd3 in bdrv_do_drained_begin
(bs=0x7f0a9754c280, recursive=, parent=0x0, 
ignore_bds_parents=, poll=) at block/io.c:395
#7  0x55d3a6be5c87 in blk_drain (blk=0x7f0a97abcc00) at 
block/block-backend.c:1617
#8  0x55d3a6be686d in blk_unref (blk=0x7f0a97abcc00) at 
block/block-backend.c:473
#9  0x55d3a6b9e835 in block_job_free (job=0x7f0a15f44e00) at blockjob.c:89
#10 0x55d3a6b9fe29 in job_unref (job=0x7f0a15f44e00) at job.c:376
#11 0x55d3a6b9fe29 in job_unref (job=0x7f0a15f44e00) at job.c:368
#12 0x55d3a6ba07aa in job_finish_sync (job=job@entry=0x7f0a15f44e00, 
finish=finish@entry=
0x55d3a6ba0cd0 , errp=errp@entry=0x0) at job.c:1004
#13 0x55d3a6ba0cee in job_cancel_sync (job=job@entry=0x7f0a15f44e00) at 
job.c:947




Re: [PATCH v2 1/3] backup: don't acquire aio_context in backup_clean

2020-03-27 Thread Vladimir Sementsov-Ogievskiy

26.03.2020 18:56, Stefan Reiter wrote:

All code-paths leading to backup_clean (via job_clean) have the job's
context already acquired. The job's context is guaranteed to be the same
as the one used by backup_top via backup_job_create.


As we already discussed, this is not quite right. So maybe this patch should
be the last one...



Since the previous logic effectively acquired the lock twice, this
broke cleanup of backups for disks using IO threads, since the BDRV_POLL_WHILE
in bdrv_backup_top_drop -> bdrv_do_drained_begin would only release the lock
once, thus deadlocking with the IO thread.

This is a partial revert of 0abf2581717a19.

Signed-off-by: Stefan Reiter 
---
  block/backup.c | 4 
  1 file changed, 4 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 7430ca5883..a7a7dcaf4c 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -126,11 +126,7 @@ static void backup_abort(Job *job)
  static void backup_clean(Job *job)
  {
  BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
-AioContext *aio_context = bdrv_get_aio_context(s->backup_top);
-
-aio_context_acquire(aio_context);
  bdrv_backup_top_drop(s->backup_top);
-aio_context_release(aio_context);
  }
  
  void backup_do_checkpoint(BlockJob *job, Error **errp)





--
Best regards,
Vladimir