Re: [Qemu-block] [Qemu-devel] [PATCH] qcow2: allocate cluster_cache/cluster_data on demand

2017-08-18 Thread Alexey Kardashevskiy
On 19/08/17 12:46, Alexey Kardashevskiy wrote:
> On 19/08/17 01:18, Eric Blake wrote:
>> On 08/18/2017 08:31 AM, Stefan Hajnoczi wrote:
>>> Most qcow2 files are uncompressed so it is wasteful to allocate (32 + 1)
>>> * cluster_size + 512 bytes upfront.  Allocate s->cluster_cache and
>>> s->cluster_data when the first read operation is performed on a
>>> compressed cluster.
>>>
>>> The buffers are freed in .bdrv_close().  .bdrv_open() no longer has any
>>> code paths that can allocate these buffers, so remove the free functions
>>> in the error code path.
>>>
>>> Reported-by: Alexey Kardashevskiy 
>>> Cc: Kevin Wolf 
>>> Signed-off-by: Stefan Hajnoczi 
>>> ---
>>> Alexey: Does this improve your memory profiling results?
>>
>> Is this a regression from earlier versions? 
> 
> Hm, I have not thought about this.
> 
> So, I bisected, and this started happening from
> 9a4c0e220d8a4f82b5665d0ee95ef94d8e1509d5
> "hw/virtio-pci: fix virtio behaviour"
> 
> Before that, the very same command line would take less than 1GB of
> resident memory. That thing basically enforces virtio-1.0 for QEMU <=2.6
> which means that upstream with "-machine pseries-2.6" works fine (less than
> 1GB), "-machine pseries-2.7" does not (close to 7GB, sometimes even 9GB).
> 
> Then I tried bisecting again, with
> "scsi=off,disable-modern=off,disable-legacy=on" on my 150 virtio-block
> devices, started from
> e266d421490e0 "virtio-pci: add flags to enable/disable legacy/modern" (it
> added the disable-modern switch) which uses 2GB of memory.
> 
> I ended up with ada434cd0b44 "virtio-pci: implement cfg capability".
> 
> Then I removed proxy->modern_as on v2.10.0-rc3 (see below) and got 1.5GB of
> used memory (yay!)
> 
> I do not really know how to reinterpret all of this, do you?
> 
> 
> Note: 1GB..9GB numbers from below are the peak values from valgrind's

s/from below/from above/ , sorry, bad cut-n-paste :)


-- 
Alexey





Re: [Qemu-block] [PATCH] qcow2: allocate cluster_cache/cluster_data on demand

2017-08-18 Thread Alexey Kardashevskiy
On 18/08/17 23:31, Stefan Hajnoczi wrote:
> Most qcow2 files are uncompressed so it is wasteful to allocate (32 + 1)
> * cluster_size + 512 bytes upfront.  Allocate s->cluster_cache and
> s->cluster_data when the first read operation is performed on a
> compressed cluster.
> 
> The buffers are freed in .bdrv_close().  .bdrv_open() no longer has any
> code paths that can allocate these buffers, so remove the free functions
> in the error code path.
> 
> Reported-by: Alexey Kardashevskiy 
> Cc: Kevin Wolf 
> Signed-off-by: Stefan Hajnoczi 
> ---
> Alexey: Does this improve your memory profiling results?

Yes, it does:

was:
12.81% (1,023,193,088B)
now:
05.36% (393,893,888B)


Tested-by: Alexey Kardashevskiy 

>  block/qcow2-cluster.c | 17 +
>  block/qcow2.c | 12 
>  2 files changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index f06c08f64c..c47600a44e 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1516,6 +1516,23 @@ int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset)
>  nb_csectors = ((cluster_offset >> s->csize_shift) & s->csize_mask) + 1;
>  sector_offset = coffset & 511;
>  csize = nb_csectors * 512 - sector_offset;
> +
> +/* Allocate buffers on first decompress operation, most images are
> + * uncompressed and the memory overhead can be avoided.  The buffers
> + * are freed in .bdrv_close().
> + */
> +if (!s->cluster_data) {
> +/* one more sector for decompressed data alignment */
> +s->cluster_data = qemu_try_blockalign(bs->file->bs,
> +QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size + 512);
> +if (!s->cluster_data) {
> +return -EIO;
> +}
> +}
> +if (!s->cluster_cache) {
> +s->cluster_cache = g_malloc(s->cluster_size);
> +}
> +
>  BLKDBG_EVENT(bs->file, BLKDBG_READ_COMPRESSED);
>  ret = bdrv_read(bs->file, coffset >> 9, s->cluster_data,
>  nb_csectors);
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 40ba26c111..0ac201910a 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -1360,16 +1360,6 @@ static int qcow2_do_open(BlockDriverState *bs, QDict *options, int flags,
>  goto fail;
>  }
>  
> -s->cluster_cache = g_malloc(s->cluster_size);
> -/* one more sector for decompressed data alignment */
> -s->cluster_data = qemu_try_blockalign(bs->file->bs, QCOW_MAX_CRYPT_CLUSTERS
> -* s->cluster_size + 512);
> -if (s->cluster_data == NULL) {
> -error_setg(errp, "Could not allocate temporary cluster buffer");
> -ret = -ENOMEM;
> -goto fail;
> -}
> -
>  s->cluster_cache_offset = -1;
>  s->flags = flags;
>  
> @@ -1507,8 +1497,6 @@ static int qcow2_do_open(BlockDriverState *bs, QDict *options, int flags,
>  if (s->refcount_block_cache) {
>  qcow2_cache_destroy(bs, s->refcount_block_cache);
>  }
> -g_free(s->cluster_cache);
> -qemu_vfree(s->cluster_data);
>  qcrypto_block_free(s->crypto);
>  qapi_free_QCryptoBlockOpenOptions(s->crypto_opts);
>  return ret;
> 


-- 
Alexey



Re: [Qemu-block] [Qemu-devel] [PATCH] qcow2: allocate cluster_cache/cluster_data on demand

2017-08-18 Thread Alexey Kardashevskiy
On 19/08/17 01:18, Eric Blake wrote:
> On 08/18/2017 08:31 AM, Stefan Hajnoczi wrote:
>> Most qcow2 files are uncompressed so it is wasteful to allocate (32 + 1)
>> * cluster_size + 512 bytes upfront.  Allocate s->cluster_cache and
>> s->cluster_data when the first read operation is performed on a
>> compressed cluster.
>>
>> The buffers are freed in .bdrv_close().  .bdrv_open() no longer has any
>> code paths that can allocate these buffers, so remove the free functions
>> in the error code path.
>>
>> Reported-by: Alexey Kardashevskiy 
>> Cc: Kevin Wolf 
>> Signed-off-by: Stefan Hajnoczi 
>> ---
>> Alexey: Does this improve your memory profiling results?
> 
> Is this a regression from earlier versions? 

Hm, I have not thought about this.

So, I bisected, and this started happening from
9a4c0e220d8a4f82b5665d0ee95ef94d8e1509d5
"hw/virtio-pci: fix virtio behaviour"

Before that, the very same command line would take less than 1GB of
resident memory. That thing basically enforces virtio-1.0 for QEMU <=2.6
which means that upstream with "-machine pseries-2.6" works fine (less than
1GB), "-machine pseries-2.7" does not (close to 7GB, sometimes even 9GB).

Then I tried bisecting again, with
"scsi=off,disable-modern=off,disable-legacy=on" on my 150 virtio-block
devices, started from
e266d421490e0 "virtio-pci: add flags to enable/disable legacy/modern" (it
added the disable-modern switch) which uses 2GB of memory.

I ended up with ada434cd0b44 "virtio-pci: implement cfg capability".

Then I removed proxy->modern_as on v2.10.0-rc3 (see below) and got 1.5GB of
used memory (yay!)

I do not really know how to reinterpret all of this, do you?


Note: 1GB..9GB numbers from below are the peak values from valgrind's
massif. This is pretty much resident memory used by QEMU process. In my
testing I did not enable KVM and I did not start the guest (i.e. used -S).
150 virtio-block devices, 2GB RAM for the guest.

Also, while bisecting, I only paid attention to whether it was 1..2GB or 6..9GB -
all tests fit into these two ranges; for any given sha1 the amount of memory
would be stable, but among "good" commits it could change between 1GB and 2GB.



diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 5d14bd6..7ad447a 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1783,6 +1783,7 @@ static void virtio_pci_realize(PCIDevice *pci_dev,
Error **errp)
/* PCI BAR regions must be powers of 2 */
pow2ceil(proxy->notify.offset + proxy->notify.size));

+#if 0
 memory_region_init_alias(&proxy->modern_cfg,
  OBJECT(proxy),
  "virtio-pci-cfg",
@@ -1791,7 +1792,7 @@ static void virtio_pci_realize(PCIDevice *pci_dev,
Error **errp)
  memory_region_size(&proxy->modern_bar));

 address_space_init(&proxy->modern_as, &proxy->modern_cfg,
"virtio-pci-cfg-as");
-
+#endif
 if (proxy->disable_legacy == ON_OFF_AUTO_AUTO) {
 proxy->disable_legacy = pcie_port ? ON_OFF_AUTO_ON : ON_OFF_AUTO_OFF;
 }
@@ -1860,10 +1861,10 @@ static void virtio_pci_realize(PCIDevice *pci_dev,
Error **errp)

 static void virtio_pci_exit(PCIDevice *pci_dev)
 {
-VirtIOPCIProxy *proxy = VIRTIO_PCI(pci_dev);
+//VirtIOPCIProxy *proxy = VIRTIO_PCI(pci_dev);

 msix_uninit_exclusive_bar(pci_dev);
-address_space_destroy(&proxy->modern_as);
+//address_space_destroy(&proxy->modern_as);
 }

 static void virtio_pci_reset(DeviceState *qdev)





> Likely, it is NOT -rc4
> material, and thus can wait for 2.11; but it should be fine for -stable
> as part of 2.10.1 down the road.
> 
>> +++ b/block/qcow2-cluster.c
>> @@ -1516,6 +1516,23 @@ int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset)
>>  nb_csectors = ((cluster_offset >> s->csize_shift) & s->csize_mask) + 1;
>>  sector_offset = coffset & 511;
>>  csize = nb_csectors * 512 - sector_offset;
>> +
>> +/* Allocate buffers on first decompress operation, most images are
>> + * uncompressed and the memory overhead can be avoided.  The buffers
>> + * are freed in .bdrv_close().
>> + */
>> +if (!s->cluster_data) {
>> +/* one more sector for decompressed data alignment */
>> +s->cluster_data = qemu_try_blockalign(bs->file->bs,
>> +QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size + 512);
>> +if (!s->cluster_data) {
>> +return -EIO;
> 
> Is -ENOMEM any better than -EIO here?
> 


-- 
Alexey





Re: [Qemu-block] [PATCH v5 08/13] tests: Rely more on global_qtest

2017-08-18 Thread John Snow


On 08/18/2017 05:15 PM, Eric Blake wrote:
> libqtest provides two layers of functions: qtest_*() that operate
> on an explicit object, and a plain version that operates on the
> 'global_qtest' object.  However, very few tests care about the
> distinction, and even the tests that manipulate multiple qtest
> connections at once are just fine reassigning global_qtest around
> the blocks of code that will then operate on the updated global,
> rather than calling the verbose form.  Before the next few patches
> get rid of the qtest_* layer, we first need to update the remaining
> few spots that were using the long form where we can instead rely
> on the short form.
> 

Not a big fan of globals and implicit state, but I do at least agree
that we don't need two sets of functions.

(You just happen to be killing the set I like.)

eh, to-may-to to-mah-to.

> Signed-off-by: Eric Blake 

Acked-by: John Snow 



[Qemu-block] [PATCH v5 08/13] tests: Rely more on global_qtest

2017-08-18 Thread Eric Blake
libqtest provides two layers of functions: qtest_*() that operate
on an explicit object, and a plain version that operates on the
'global_qtest' object.  However, very few tests care about the
distinction, and even the tests that manipulate multiple qtest
connections at once are just fine reassigning global_qtest around
the blocks of code that will then operate on the updated global,
rather than calling the verbose form.  Before the next few patches
get rid of the qtest_* layer, we first need to update the remaining
few spots that were using the long form where we can instead rely
on the short form.

Signed-off-by: Eric Blake 
---
 tests/fdc-test.c |  2 +-
 tests/ide-test.c | 10 +-
 tests/ipmi-bt-test.c |  2 +-
 tests/ipmi-kcs-test.c|  2 +-
 tests/libqos/libqos-pc.c |  2 +-
 tests/postcopy-test.c| 14 +++---
 tests/rtc-test.c |  9 +++--
 tests/tco-test.c |  5 ++---
 tests/wdt_ib700-test.c   | 30 +-
 9 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/tests/fdc-test.c b/tests/fdc-test.c
index 325712e0f2..0b68d9aec4 100644
--- a/tests/fdc-test.c
+++ b/tests/fdc-test.c
@@ -565,7 +565,7 @@ int main(int argc, char **argv)
 g_test_init(&argc, &argv, NULL);

 qtest_start("-device floppy,id=floppy0");
-qtest_irq_intercept_in(global_qtest, "ioapic");
+irq_intercept_in("ioapic");
 qtest_add_func("/fdc/cmos", test_cmos);
 qtest_add_func("/fdc/no_media_on_start", test_no_media_on_start);
 qtest_add_func("/fdc/read_without_media", test_read_without_media);
diff --git a/tests/ide-test.c b/tests/ide-test.c
index aa9de065fc..505a800b44 100644
--- a/tests/ide-test.c
+++ b/tests/ide-test.c
@@ -462,7 +462,7 @@ static void test_bmdma_setup(void)
 "-drive file=%s,if=ide,serial=%s,cache=writeback,format=raw "
 "-global ide-hd.ver=%s",
 tmp_path, "testdisk", "version");
-qtest_irq_intercept_in(global_qtest, "ioapic");
+irq_intercept_in("ioapic");
 }

 static void test_bmdma_teardown(void)
@@ -584,7 +584,7 @@ static void test_flush(void)

 dev = get_pci_device(&bmdma_bar, &ide_bar);

-qtest_irq_intercept_in(global_qtest, "ioapic");
+irq_intercept_in("ioapic");

 /* Dirty media so that CMD_FLUSH_CACHE will actually go to disk */
 make_dirty(0);
@@ -635,7 +635,7 @@ static void test_retry_flush(const char *machine)

 dev = get_pci_device(&bmdma_bar, &ide_bar);

-qtest_irq_intercept_in(global_qtest, "ioapic");
+irq_intercept_in("ioapic");

 /* Dirty media so that CMD_FLUSH_CACHE will actually go to disk */
 make_dirty(0);
@@ -826,7 +826,7 @@ static void cdrom_pio_impl(int nblocks)
 ide_test_start("-drive if=none,file=%s,media=cdrom,format=raw,id=sr0,index=0 "
"-device ide-cd,drive=sr0,bus=ide.0", tmp_path);
 dev = get_pci_device(&bmdma_bar, &ide_bar);
-qtest_irq_intercept_in(global_qtest, "ioapic");
+irq_intercept_in("ioapic");

 /* PACKET command on device 0 */
 qpci_io_writeb(dev, ide_bar, reg_device, 0);
@@ -909,7 +909,7 @@ static void test_cdrom_dma(void)

 ide_test_start("-drive if=none,file=%s,media=cdrom,format=raw,id=sr0,index=0 "
"-device ide-cd,drive=sr0,bus=ide.0", tmp_path);
-qtest_irq_intercept_in(global_qtest, "ioapic");
+irq_intercept_in("ioapic");

 guest_buf = guest_alloc(guest_malloc, len);
 prdt[0].addr = cpu_to_le32(guest_buf);
diff --git a/tests/ipmi-bt-test.c b/tests/ipmi-bt-test.c
index 7e21a9bbcb..891f5bfb13 100644
--- a/tests/ipmi-bt-test.c
+++ b/tests/ipmi-bt-test.c
@@ -421,7 +421,7 @@ int main(int argc, char **argv)
   " -device isa-ipmi-bt,bmc=bmc0", emu_port);
 qtest_start(cmdline);
 g_free(cmdline);
-qtest_irq_intercept_in(global_qtest, "ioapic");
+irq_intercept_in("ioapic");
 qtest_add_func("/ipmi/extern/connect", test_connect);
 qtest_add_func("/ipmi/extern/bt_base", test_bt_base);
 qtest_add_func("/ipmi/extern/bt_enable_irq", test_enable_irq);
diff --git a/tests/ipmi-kcs-test.c b/tests/ipmi-kcs-test.c
index 178ffc1797..53127d2884 100644
--- a/tests/ipmi-kcs-test.c
+++ b/tests/ipmi-kcs-test.c
@@ -280,7 +280,7 @@ int main(int argc, char **argv)
   " -device isa-ipmi-kcs,bmc=bmc0");
 qtest_start(cmdline);
 g_free(cmdline);
-qtest_irq_intercept_in(global_qtest, "ioapic");
+irq_intercept_in("ioapic");
 qtest_add_func("/ipmi/local/kcs_base", test_kcs_base);
 qtest_add_func("/ipmi/local/kcs_abort", test_kcs_abort);
 qtest_add_func("/ipmi/local/kcs_enable_irq", test_enable_irq);
diff --git a/tests/libqos/libqos-pc.c b/tests/libqos/libqos-pc.c
index b554758802..6a2ff6608b 100644
--- a/tests/libqos/libqos-pc.c
+++ b/tests/libqos/libqos-pc.c
@@ -25,7 +25,7 @@ QOSState *qtest_pc_boot(const char *cmdline_fmt, ...)
 qs = qtest_vboot(&qos_ops, cmdline_fmt, ap);
 va_end(ap);

-qtest_irq_intercept_in(global_qtest, "ioapic");
+i

Re: [Qemu-block] [RFC v4 08/13] ide: enumerate_slots implementation

2017-08-18 Thread Eduardo Habkost
On Wed, Aug 16, 2017 at 05:46:18PM -0400, John Snow wrote:
> 
> 
> On 08/14/2017 05:57 PM, Eduardo Habkost wrote:
> > Example output when using "-machine q35":
> > 
> >   {
> > "available": true,
> > "count": 1,
> > "device-types": [
> >   "ide-device"
> > ],
> > "hotpluggable": false,
> > "opts": [
> >   { "option": "unit", "values": 0 },
> >   { "option": "bus", "values": "ide.2" }
> > ],
> > "opts-complete": true
> >   }
> >   {
> > "available": false,
> > "count": 1,
> > "device": "/machine/unattached/device[19]",
> > "device-types": [
> >   "ide-device"
> > ],
> > "hotpluggable": false,
> > "opts": [
> >   { "option": "unit", "values": 1 },
> >   { "option": "bus", "values": "ide.2" } ],
> > "opts-complete": true
> >   }
> >   {
> > "available": true,
> > "count": 10,
> > "device-types": [
> >   "ide-device"
> > ],
> > "hotpluggable": false,
> > "opts": [
> >   { "option": "unit", "values": [ [ 0, 1 ] ] },
> 
> Hm, these unit values aren't really correct -- we do not support
> primary/secondary semantics for IDE buses on the AHCI device. I guess
> they technically exist, but you cannot use them for anything.
> 
> Should I do something to "disable" or otherwise hide the unusable
> secondary unit slots for AHCI devices?

If the device is already rejecting -device ...,unit=1, then the
bug is in my implementation of enumerate_devices.  Maybe it
should just look at IDEBus::max_units to find that out?

-- 
Eduardo



Re: [Qemu-block] [Qemu-devel] [PATCH] qcow2: allocate cluster_cache/cluster_data on demand

2017-08-18 Thread Eric Blake
On 08/18/2017 08:31 AM, Stefan Hajnoczi wrote:
> Most qcow2 files are uncompressed so it is wasteful to allocate (32 + 1)
> * cluster_size + 512 bytes upfront.  Allocate s->cluster_cache and
> s->cluster_data when the first read operation is performed on a
> compressed cluster.
> 
> The buffers are freed in .bdrv_close().  .bdrv_open() no longer has any
> code paths that can allocate these buffers, so remove the free functions
> in the error code path.
> 
> Reported-by: Alexey Kardashevskiy 
> Cc: Kevin Wolf 
> Signed-off-by: Stefan Hajnoczi 
> ---
> Alexey: Does this improve your memory profiling results?

Is this a regression from earlier versions?  Likely, it is NOT -rc4
material, and thus can wait for 2.11; but it should be fine for -stable
as part of 2.10.1 down the road.

> +++ b/block/qcow2-cluster.c
> @@ -1516,6 +1516,23 @@ int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset)
>  nb_csectors = ((cluster_offset >> s->csize_shift) & s->csize_mask) + 1;
>  sector_offset = coffset & 511;
>  csize = nb_csectors * 512 - sector_offset;
> +
> +/* Allocate buffers on first decompress operation, most images are
> + * uncompressed and the memory overhead can be avoided.  The buffers
> + * are freed in .bdrv_close().
> + */
> +if (!s->cluster_data) {
> +/* one more sector for decompressed data alignment */
> +s->cluster_data = qemu_try_blockalign(bs->file->bs,
> +QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size + 512);
> +if (!s->cluster_data) {
> +return -EIO;

Is -ENOMEM any better than -EIO here?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-block] [Qemu-devel] [PATCH for-2.10 v2 0/4] block: Fix non-shared storage migration

2017-08-18 Thread Christian Ehrhardt
On Tue, Aug 15, 2017 at 3:07 PM, Fam Zheng  wrote:

> v2: Don't leak blk->vmsh when BB is deleted before the callback is called.
> [Stefan]
> From stub functions, don't return g_malloc0(1) which is risky, return
> NULL.
> [Eric]


Thanks Fam Zheng and Kevin Wolf,
retesting on -rc3 clearly got me further.

Unfortunately the overall test of a libvirt controlled migration with
--copy-storage-all or --copy-storage-inc still does not work.
I have no good pointers yet where to look at, but wanted to give a heads up.

I'm tracking what I have in [1] so far and added upstream qemu as bug task
as well until we know better.

[1]: https://bugs.launchpad.net/qemu/+bug/1711602


[Qemu-block] [PATCH] qcow2: allocate cluster_cache/cluster_data on demand

2017-08-18 Thread Stefan Hajnoczi
Most qcow2 files are uncompressed so it is wasteful to allocate (32 + 1)
* cluster_size + 512 bytes upfront.  Allocate s->cluster_cache and
s->cluster_data when the first read operation is performed on a
compressed cluster.

The buffers are freed in .bdrv_close().  .bdrv_open() no longer has any
code paths that can allocate these buffers, so remove the free functions
in the error code path.

Reported-by: Alexey Kardashevskiy 
Cc: Kevin Wolf 
Signed-off-by: Stefan Hajnoczi 
---
Alexey: Does this improve your memory profiling results?

 block/qcow2-cluster.c | 17 +
 block/qcow2.c | 12 
 2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index f06c08f64c..c47600a44e 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1516,6 +1516,23 @@ int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset)
 nb_csectors = ((cluster_offset >> s->csize_shift) & s->csize_mask) + 1;
 sector_offset = coffset & 511;
 csize = nb_csectors * 512 - sector_offset;
+
+/* Allocate buffers on first decompress operation, most images are
+ * uncompressed and the memory overhead can be avoided.  The buffers
+ * are freed in .bdrv_close().
+ */
+if (!s->cluster_data) {
+/* one more sector for decompressed data alignment */
+s->cluster_data = qemu_try_blockalign(bs->file->bs,
+QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size + 512);
+if (!s->cluster_data) {
+return -EIO;
+}
+}
+if (!s->cluster_cache) {
+s->cluster_cache = g_malloc(s->cluster_size);
+}
+
 BLKDBG_EVENT(bs->file, BLKDBG_READ_COMPRESSED);
 ret = bdrv_read(bs->file, coffset >> 9, s->cluster_data,
 nb_csectors);
diff --git a/block/qcow2.c b/block/qcow2.c
index 40ba26c111..0ac201910a 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1360,16 +1360,6 @@ static int qcow2_do_open(BlockDriverState *bs, QDict *options, int flags,
 goto fail;
 }
 
-s->cluster_cache = g_malloc(s->cluster_size);
-/* one more sector for decompressed data alignment */
-s->cluster_data = qemu_try_blockalign(bs->file->bs, QCOW_MAX_CRYPT_CLUSTERS
-* s->cluster_size + 512);
-if (s->cluster_data == NULL) {
-error_setg(errp, "Could not allocate temporary cluster buffer");
-ret = -ENOMEM;
-goto fail;
-}
-
 s->cluster_cache_offset = -1;
 s->flags = flags;
 
@@ -1507,8 +1497,6 @@ static int qcow2_do_open(BlockDriverState *bs, QDict *options, int flags,
 if (s->refcount_block_cache) {
 qcow2_cache_destroy(bs, s->refcount_block_cache);
 }
-g_free(s->cluster_cache);
-qemu_vfree(s->cluster_data);
 qcrypto_block_free(s->crypto);
 qapi_free_QCryptoBlockOpenOptions(s->crypto_opts);
 return ret;
-- 
2.13.4




Re: [Qemu-block] [PATCH for-2.10] qemu-iotests: step clock after each test iteration

2017-08-18 Thread Alberto Garcia
On Tue 15 Aug 2017 03:05:02 PM CEST, Stefan Hajnoczi wrote:
> The 093 throttling test submits twice as many requests as the throttle
> limit in order to ensure that we reach the limit.  The remaining
> requests are left in-flight at the end of each test iteration.
>
> Commit 452589b6b47e8dc6353df257fc803dfc1383bed8 ("vl.c/exit: pause cpus
> before closing block devices") exposed a hang in 093.  This happens
> because requests are still in flight when QEMU terminates but
> QEMU_CLOCK_VIRTUAL time is frozen.  bdrv_drain_all() hangs forever since
> throttled requests cannot complete.
>
> Step the clock at the end of each test iteration so in-flight requests
> actually finish.  This solves the hang and is cleaner than leaving tests
> in-flight.
>
> Note that this could also be "fixed" by disabling throttling when drives
> are closed in QEMU.  That approach has two issues:
>
> 1. We must drain requests before disabling throttling, so the hang
>cannot be easily avoided!
>
> 2. Any time QEMU disables throttling internally there is a chance that
>malicious users can abuse the code path to bypass throttling limits.
>
> Therefore it makes more sense to fix the test case than to modify QEMU.
>
> Signed-off-by: Stefan Hajnoczi 

Reviewed-by: Alberto Garcia 

Berto



Re: [Qemu-block] [PATCH v5 4/6] block: convert ThrottleGroup to object with QOM

2017-08-18 Thread Alberto Garcia
On Fri 18 Aug 2017 05:10:17 AM CEST, Manos Pitsidianakis wrote:

>   * If no ThrottleGroup is found with the given name a new one is
>   * created.
>   *
> - * @name: the name of the ThrottleGroup
> + * This function edits throttle_groups and must be called under the global
> + * mutex.
> + *
> + * @name: the name of the ThrottleGroup, NULL means a new anonymous group 
> will
> + *be created.
>   * @ret:  the ThrottleState member of the ThrottleGroup
>   */
>  ThrottleState *throttle_group_incref(const char *name)

If we're not going to have anonymous groups in the end this patch needs
changes (name == NULL is no longer allowed).

> +/* This function edits throttle_groups and must be called under the global
> + * mutex */
> +static void throttle_group_obj_complete(UserCreatable *obj, Error **errp)
> +{
> +ThrottleGroup *tg = THROTTLE_GROUP(obj), *iter;
> +ThrottleConfig *cfg = &tg->ts.cfg;
> +
> +/* set group name to object id if it exists */
> +if (!tg->name && tg->parent_obj.parent) {
> +tg->name = object_get_canonical_path_component(OBJECT(obj));
> +}
> +
> +if (tg->name) {
> +/* error if name is duplicate */
> +QTAILQ_FOREACH(iter, &throttle_groups, list) {
> +if (!g_strcmp0(tg->name, iter->name) && tg != iter) {

I'm just nitpicking here :) but if you change this and put 'tg != iter'
first you'll save one unnecessary string comparison.

> +error_setg(errp, "A group with this name already exists");
> +return;
> +}
> +}
> +}
> +
> +/* unfix buckets to check validity */
> +throttle_get_config(&tg->ts, cfg);
> +if (!throttle_is_valid(cfg, errp)) {
> +return;
> +}
> +/* fix buckets again */
> +throttle_config(&tg->ts, tg->clock_type, cfg);

throttle_get_config(ts, cfg) makes a copy of the existing configuration,
but the cfg pointer that you are passing already points to the existing
configuration, so in practice you are doing

*(ts->cfg) = *(ts->cfg);
throttle_unfix_bucket(...);

and since you "unfixed" the configuration, you then need to undo the whole
thing by setting the config again:

*(ts->cfg) = *(ts->cfg);
throttle_fix_bucket(...);

You should declare a local ThrottleConfig variable, copy the config
there, and run throttle_is_valid() on that copy. The final
throttle_config() call is unnecessary.

Once the patches I sent yesterday are merged we'll be able to skip the
throttle_get_config() call and do throttle_is_valid(&tg->ts.cfg, errp)
directly.

> +static void throttle_group_set(Object *obj, Visitor *v, const char * name,
> +   void *opaque, Error **errp)
> +
> +{
> +ThrottleGroup *tg = THROTTLE_GROUP(obj);
> +ThrottleConfig cfg;
> +ThrottleParamInfo *info = opaque;
> +Error *local_err = NULL;
> +int64_t value;
> +
> +/* If we have finished initialization, don't accept individual property
> + * changes through QOM. Throttle configuration limits must be set in one
> + * transaction, as certain combinations are invalid.
> + */
> +if (tg->is_initialized) {
> +error_setg(&local_err, "Property cannot be set after initialization");
> +goto ret;
> +}
> +
> +visit_type_int64(v, name, &value, &local_err);
> +if (local_err) {
> +goto ret;
> +}
> +if (value < 0) {
> +error_setg(&local_err, "Property values cannot be negative");
> +goto ret;
> +}
> +
> +cfg = tg->ts.cfg;
> +switch (info->data_type) {
> +case UINT64:
> +{
> +uint64_t *field = (void *)&cfg.buckets[info->type] + info->offset;
> +*field = value;
> +}
> +break;
> +case DOUBLE:
> +{
> +double *field = (void *)&cfg.buckets[info->type] + info->offset;
> +*field = value;
> +}
> +break;
> +case UNSIGNED:
> +{
> +if (value > UINT_MAX) {
> +error_setg(&local_err, "%s value must be in the "
> +   "range [0, %u]", info->name, UINT_MAX);
> +goto ret;
> +}
> +unsigned *field = (void *)&cfg.buckets[info->type] + info->offset;
> +*field = value;
> +}
> +}
> +
> +tg->ts.cfg = cfg;

There's a bit of black magic here :) you have a user-defined enumeration
(UNSIGNED, DOUBLE, UINT64) to identify C types and I'm worried about
what happens if the data types of LeakyBucket change, will we be able to
detect the problem?

Out of the blue I can think of the following alternative:

   - There's 6 different buckets (we have BucketType listing them)

   - There's 3 values we can set in each bucket (max, avg,
 burst_length). For those we can have an internal enumeration
 (probably with one additional value for iops-size).

   - In the 'properties' array, for each property we know its category
 (and I mean: 

Re: [Qemu-block] [Qemu-devel] [PATCH 5/6] block: Fix write/resize permissions for inactive images

2017-08-18 Thread Fam Zheng
On Fri, 08/18 18:06, Xie Changlong wrote:
> The root cause is that when we run replication on the secondary, the vmstate changes to
> RUN_STATE_INMIGRATE, then blockdev_init() sets bdrv_flags |=
> BDRV_O_INACTIVE. So the whole chain becomes read-only. I've tried on my side,
> but it does not seem easy to fix. I wonder if there is any way to bypass
> this? Any suggestion would be appreciated.

The non-shared storage migration uses "nbd_server_add -w" at the destination side,
where BDRV_O_INACTIVE is set for images as in your case; the way it handles this is
by calling bdrv_invalidate_cache(). See nbd_export_new().

See also commit 3dff24f2dffc5f3aa46dc014122012848bd7959d.

I'm not sure if this is enough for block replication?

Fam



Re: [Qemu-block] [Qemu-devel] [PATCH 5/6] block: Fix write/resize permissions for inactive images

2017-08-18 Thread Xie Changlong

On 5/5/2017 12:52 AM, Kevin Wolf wrote:
  
+/* Returns whether the image file can be written to right now */

+bool bdrv_is_writable(BlockDriverState *bs)
+{
+return !bdrv_is_read_only(bs) && !(bs->open_flags & BDRV_O_INACTIVE);
+}
+


This commit uses BDRV_O_INACTIVE to judge whether the image file can be 
written to or not. But it blocks the replication driver on the secondary node. 
For replication on the secondary, we must ensure that the whole chain is 
writable:



  ||
  ||.--
  ||| Secondary
  ||'--
  ||
  ||

virtio-blk
 ^
-->  3 NBD   |
  || server  2 filter
  ||^^
  ||||
  ||  Secondary disk <- hidden-disk 5 <- active-disk 4
  |||  backing^   backing
  ||| |
  ||| |
  ||'-'
  ||   drive-backup sync=none 6

The root cause is that when we run replication on the secondary, the vmstate 
changes to RUN_STATE_INMIGRATE, then blockdev_init() sets bdrv_flags |= 
BDRV_O_INACTIVE. So the whole chain becomes read-only. I've tried on my 
side, but it does not seem easy to fix. I wonder if there is any way to 
bypass this? Any suggestion would be appreciated.


It's very easy to reproduce this scenario:
(gdb) r
Starting program: /root/.xie/qemu-colo/x86_64-softmmu/qemu-system-x86_64 
-boot c -m 2048 -smp 2 -qmp stdio -vnc :0 -name secondary -enable-kvm 
-cpu qemu64,+kvmclock -device piix3-usb-uhci -device usb-tablet -drive 
if=none,id=colo-disk,file.filename=/root/.xie/suse.qcow2.orgin,file.node-name=secondary_disk,driver=qcow2,node-name=sec-qcow2-driver-for-nbd 
-drive 
if=ide,id=active-disk0,node-name=active-disk111,throttling.bps-total=7000,driver=replication,node-name=secondary-replication-driver,mode=secondary,top-id=active-disk0,file.driver=qcow2,file.node-name=active-qcow2-driver,file.file.filename=/mnt/ramfs/active_disk.img,file.file.node-name=active_disk,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.node-name=hidden-qcow2-driver,file.backing.file.node-name=hidden_disk,file.backing.backing=colo-disk 
-incoming tcp:0:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x74801700 (LWP 25252)]
[New Thread 0x74000700 (LWP 25255)]
qemu-system-x86_64: -drive 
if=ide,id=active-disk0,node-name=active-disk111,throttling.bps-total=7000,driver=replication,node-name=secondary-replication-driver,mode=secondary,top-id=active-disk0,file.driver=qcow2,file.node-name=active-qcow2-driver,file.file.filename=/mnt/ramfs/active_disk.img,file.file.node-name=active_disk,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.node-name=hidden-qcow2-driver,file.backing.file.node-name=hidden_disk,file.backing.backing=colo-disk: 
Block node is read-only

[Thread 0x74000700 (LWP 25255) exited]
[Thread 0x74801700 (LWP 25252) exited]
[Inferior 1 (process 25248) exited with code 01]
Missing separate debuginfos, use: debuginfo-install 
glib2-2.46.2-4.el7.x86_64 glibc-2.17-157.el7_3.4.x86_64 
libacl-2.2.51-12.el7.x86_64 libattr-2.4.46-12.el7.x86_64 
libgcc-4.8.5-11.el7.x86_64 libgcrypt-1.5.3-13.el7_3.1.x86_64 
libgpg-error-1.12-3.el7.x86_64 libstdc++-4.8.5-11.el7.x86_64 
libuuid-2.23.2-33.el7_3.2.x86_64 openssl-libs-1.0.1e-60.el7_3.1.x86_64 
pixman-0.34.0-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64

(gdb)

--
Thanks
-Xie



Re: [Qemu-block] [PATCH v5 5/6] block: add throttle block filter driver

2017-08-18 Thread Alberto Garcia
On Fri 18 Aug 2017 11:07:22 AM CEST, Manos Pitsidianakis wrote:

>>> The driver can be used with the syntax
>>> -drive driver=throttle,file.filename=foo.qcow2, \
>>> limits.iops-total=...,throttle-group=bar
>>
>>I had understood that we would get rid of the limits.* options in this
>>driver, or did I get it wrong?
>>
> I was going to send a patch after this was merged along with adding
> ThrottleGroups to the root container, to speed things up. Do you
> prefer to do this in this patch?

I'm not sure what the benefit is of adding a complete infrastructure that
you are going to remove immediately afterwards :-?

> The root container patch probably has to go to the 'remove legacy'
> series since adding it here means the name collision errors introduce
> error paths in block/block-backend.c that go away in that series, and
> that'd be a waste of effort.

Ok.

Berto



Re: [Qemu-block] [PATCH v5 5/6] block: add throttle block filter driver

2017-08-18 Thread Manos Pitsidianakis

On Fri, Aug 18, 2017 at 10:23:09AM +0200, Alberto Garcia wrote:

On Fri 18 Aug 2017 05:10:18 AM CEST, Manos Pitsidianakis wrote:

block/throttle.c uses existing I/O throttle infrastructure inside a
block filter driver. I/O operations are intercepted in the filter's
read/write coroutines, and referred to block/throttle-groups.c

The driver can be used with the syntax
-drive driver=throttle,file.filename=foo.qcow2, \
limits.iops-total=...,throttle-group=bar


I had understood that we would get rid of the limits.* options in this
driver, or did I get it wrong?

Other than that, the rest of the code looks perfect to me.

Berto



I was going to send a patch after this was merged along with adding 
ThrottleGroups to the root container, to speed things up. Do you prefer 
to do this in this patch?


The root container patch probably has to go to the 'remove legacy' 
series since adding it here means the name collision errors introduce 
error paths in block/block-backend.c that go away in that series, and 
that'd be a waste of effort.




Re: [Qemu-block] [PATCH v5 5/6] block: add throttle block filter driver

2017-08-18 Thread Alberto Garcia
On Fri 18 Aug 2017 05:10:18 AM CEST, Manos Pitsidianakis wrote:
> block/throttle.c uses existing I/O throttle infrastructure inside a
> block filter driver. I/O operations are intercepted in the filter's
> read/write coroutines, and referred to block/throttle-groups.c
>
> The driver can be used with the syntax
> -drive driver=throttle,file.filename=foo.qcow2, \
> limits.iops-total=...,throttle-group=bar

I had understood that we would get rid of the limits.* options in this
driver, or did I get it wrong?

Other than that, the rest of the code looks perfect to me.

Berto