Re: [Qemu-block] [Qemu-devel] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present

2015-09-28 Thread Eric Blake
On 09/28/2015 03:32 PM, Jeff Cody wrote:

> I guess that makes sense.  What about  the case when the target is a raw
> device without zero init?  There is no backing file... Of course, perhaps
> in the raw case the user should be using sync==full anyways.
> 
>>
>> 2) even with mode == "existing" you expect the data to be consistent at
>> the end of the mirroring
>>
> 
> The reason I added the "existing" exception was so the user could avoid the
> time penalty of zeroing out the data if they knew the target had already
> explicitly been zeroed.  Do you think it is fair to assume that if the user
> specified existing, that they take responsibility for setting up the target
> image how they like (including data initialization)?  Or should we add
> another option for mirror, to allow the user to bypass the zero fill?

mode == 'existing' puts the burden on the caller to ensure that the file
they are passing in starts with known contents (either contents don't
matter because we are doing sync == 'full' to write every sector, or
contents MUST initially match what the guest would see looking at the
backing image when doing a shallow clone).  But if there is a way for a
user to pass in an existing file which they have pre-zeroed, even though
the file would normally be treated as though it did not have zero fill,
then the option to bypass a redundant zero fill might be useful.  I'm
not sure it's worth implementing without a known user, though, and I
don't know that libvirt would use it.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 16/16] block: Remove bdrv_swap()

2015-09-28 Thread Alberto Garcia
On Thu 17 Sep 2015 03:48:20 PM CEST, Kevin Wolf wrote:
> bdrv_swap() is unused now. Remove it and all functions that have
> no other users than bdrv_swap(). In particular, this removes the
> .bdrv_rebind callbacks from block drivers.
>
> Signed-off-by: Kevin Wolf 

Reviewed-by: Alberto Garcia 

Berto



Re: [Qemu-block] [Qemu-devel] [PATCH 06/10][TRIVIAL] macio-ide: add to storage category

2015-09-28 Thread Thomas Huth
On 26/09/15 18:22, Laurent Vivier wrote:
> macio-ide is an IDE controller, so add it
> to the storage category.
> 
> Signed-off-by: Laurent Vivier 
> ---
>  hw/ide/macio.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/ide/macio.c b/hw/ide/macio.c
> index 66ac2ba..893c9b9 100644
> --- a/hw/ide/macio.c
> +++ b/hw/ide/macio.c
> @@ -590,6 +590,7 @@ static void macio_ide_class_init(ObjectClass *oc, void 
> *data)
>  dc->realize = macio_ide_realizefn;
>  dc->reset = macio_ide_reset;
>  dc->vmsd = _pmac;
> +set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
>  }

Reviewed-by: Thomas Huth 





Re: [Qemu-block] [Qemu-devel] [PATCH 01/10][TRIVIAL] adb: add to input category

2015-09-28 Thread Thomas Huth
On 26/09/15 18:22, Laurent Vivier wrote:
> The Apple Desktop Bus is used to connect a keyboard and a mouse,
> so add it to the input category.
> 
> Signed-off-by: Laurent Vivier 
> ---
>  hw/input/adb.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/hw/input/adb.c b/hw/input/adb.c
> index a18eea2..09eead9 100644
> --- a/hw/input/adb.c
> +++ b/hw/input/adb.c
> @@ -362,6 +362,7 @@ static void adb_kbd_class_init(ObjectClass *oc, void 
> *data)
>  
>  akc->parent_realize = dc->realize;
>  dc->realize = adb_kbd_realizefn;
> +set_bit(DEVICE_CATEGORY_INPUT, dc->categories);
>  
>  adc->devreq = adb_kbd_request;
>  dc->reset = adb_kbd_reset;
> @@ -566,6 +567,7 @@ static void adb_mouse_class_init(ObjectClass *oc, void 
> *data)
>  
>  amc->parent_realize = dc->realize;
>  dc->realize = adb_mouse_realizefn;
> +set_bit(DEVICE_CATEGORY_INPUT, dc->categories);
>  
>  adc->devreq = adb_mouse_request;
>  dc->reset = adb_mouse_reset;

Reviewed-by: Thomas Huth 





Re: [Qemu-block] [Qemu-devel] [PATCH 03/10][TRIVIAL] escc: add to input category

2015-09-28 Thread Thomas Huth
On 26/09/15 18:22, Laurent Vivier wrote:
> ESCC is a serial port controller, so add it
> to the input category.
> 
> Signed-off-by: Laurent Vivier 
> ---
>  hw/char/escc.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/char/escc.c b/hw/char/escc.c
> index ba653ef..9816154 100644
> --- a/hw/char/escc.c
> +++ b/hw/char/escc.c
> @@ -1035,6 +1035,7 @@ static void escc_class_init(ObjectClass *klass, void 
> *data)
>  dc->reset = escc_reset;
>  dc->vmsd = _escc;
>  dc->props = escc_properties;
> +set_bit(DEVICE_CATEGORY_INPUT, dc->categories);
>  }

Reviewed-by: Thomas Huth 





Re: [Qemu-block] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present

2015-09-28 Thread Paolo Bonzini


On 28/09/2015 05:29, Jeff Cody wrote:
> This only occurs under two conditions:
> 
> 1. 'mode' != "existing"
> 2. bdrv_has_zero_init(target) == NULL
> 

I'm not sure if mode != "existing" actually matters.  I think what
actually matters is sync == "full".

The reasons are:

1) with sync != "full", unallocated target sectors should remain
unallocated on the destination because they are supposed to point to the
backing file.

2) even with mode == "existing" you expect the data to be consistent at
the end of the mirroring

Paolo



Re: [Qemu-block] [PATCH 2/3] block: mirror - split out part of mirror_run()

2015-09-28 Thread Paolo Bonzini


On 28/09/2015 05:29, Jeff Cody wrote:
> This is code relocation, to pull the part of mirror_run() that
> calls mirror_iteration out into a separate function.
> 
> Signed-off-by: Jeff Cody 
> ---
>  block/mirror.c | 206 
> ++---
>  1 file changed, 110 insertions(+), 96 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 8b3e89b..405e5c4 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -318,6 +318,115 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>  return delay_ns;
>  }
>  
> +static int mirror_do_iteration(MirrorBlockJob *s, uint64_t last_pause_ns)
> +{
> +int ret;
> +
> +BlockDriverState *bs = s->common.bs;
> +
> +for (;;) {
> +uint64_t delay_ns = 0;
> +int64_t cnt;
> +bool should_complete;
> +
> +if (s->ret < 0) {
> +ret = s->ret;
> +goto immediate_exit;
> +}

You might as well replace the gotos with returns (either "return ret;"
or "return s->ret;") and the breaks with "return 0;").

> +
> +cnt = bdrv_get_dirty_count(s->dirty_bitmap);
> +/* s->common.offset contains the number of bytes already processed so
> + * far, cnt is the number of dirty sectors remaining and
> + * s->sectors_in_flight is the number of sectors currently being
> + * processed; together those are the current total operation length 
> */
> +s->common.len = s->common.offset +
> +(cnt + s->sectors_in_flight) * BDRV_SECTOR_SIZE;
> +
> +/* Note that even when no rate limit is applied we need to yield
> + * periodically with no pending I/O so that bdrv_drain_all() returns.
> + * We do so every SLICE_TIME nanoseconds, or when there is an error,
> + * or when the source is clean, whichever comes first.
> + */
> +if (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - last_pause_ns < 
> SLICE_TIME
> +&& s->common.iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
> +if (s->in_flight == MAX_IN_FLIGHT || s->buf_free_count == 0 ||
> +(cnt == 0 && s->in_flight > 0)) {
> +trace_mirror_yield(s, s->in_flight, s->buf_free_count, cnt);
> +s->waiting_for_io = true;
> +qemu_coroutine_yield();
> +s->waiting_for_io = false;
> +continue;
> +} else if (cnt != 0) {
> +delay_ns = mirror_iteration(s);
> +}
> +}
> +
> +should_complete = false;
> +if (s->in_flight == 0 && cnt == 0) {
> +trace_mirror_before_flush(s);
> +ret = bdrv_flush(s->target);
> +if (ret < 0) {
> +if (mirror_error_action(s, false, -ret) ==
> +BLOCK_ERROR_ACTION_REPORT) {
> +goto immediate_exit;
> +}
> +} else {
> +/* We're out of the streaming phase.  From now on, if the job
> + * is cancelled we will actually complete all pending I/O and
> + * report completion.  This way, block-job-cancel will leave
> + * the target in a consistent state.
> + */
> +if (!s->synced) {
> +block_job_event_ready(>common);
> +s->synced = true;
> +}
> +
> +should_complete = s->should_complete ||
> +block_job_is_cancelled(>common);
> +cnt = bdrv_get_dirty_count(s->dirty_bitmap);
> +}
> +}
> +
> +if (cnt == 0 && should_complete) {
> +/* The dirty bitmap is not updated while operations are pending.
> + * If we're about to exit, wait for pending operations before
> + * calling bdrv_get_dirty_count(bs), or we may exit while the
> + * source has dirty data to copy!
> + *
> + * Note that I/O can be submitted by the guest while
> + * mirror_populate runs.
> + */
> +trace_mirror_before_drain(s, cnt);
> +bdrv_drain(bs);
> +cnt = bdrv_get_dirty_count(s->dirty_bitmap);
> +}
> +
> +ret = 0;
> +trace_mirror_before_sleep(s, cnt, s->synced, delay_ns);
> +if (!s->synced) {
> +block_job_sleep_ns(>common, QEMU_CLOCK_REALTIME, delay_ns);
> +if (block_job_is_cancelled(>common)) {
> +break;
> +}
> +} else if (!should_complete) {
> +delay_ns = (s->in_flight == 0 && cnt == 0 ? SLICE_TIME : 0);
> +block_job_sleep_ns(>common, QEMU_CLOCK_REALTIME, delay_ns);
> +} else if (cnt == 0) {
> +/* The two disks are in sync.  Exit and report successful
> + * completion.
> + */
> +assert(QLIST_EMPTY(>tracked_requests));
> +   

Re: [Qemu-block] [Qemu-devel] [PATCH 04/10][TRIVIAL] grackle: add to bridge category

2015-09-28 Thread Thomas Huth
On 26/09/15 18:22, Laurent Vivier wrote:
> Grackle is the PCI host controller of oldworld powermac,
> so add it to the bridge category.
> 
> Signed-off-by: Laurent Vivier 
> ---
>  hw/pci-host/grackle.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/hw/pci-host/grackle.c b/hw/pci-host/grackle.c
> index bfe707a..ea31b72 100644
> --- a/hw/pci-host/grackle.c
> +++ b/hw/pci-host/grackle.c
> @@ -146,8 +146,10 @@ static const TypeInfo grackle_pci_info = {
>  static void pci_grackle_class_init(ObjectClass *klass, void *data)
>  {
>  SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
> +DeviceClass *dc = DEVICE_CLASS(klass);
>  
>  k->init = pci_grackle_init_device;
> +set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
>  }

Reviewed-by: Thomas Huth 





Re: [Qemu-block] [PATCH 10/16] block/io: Make bdrv_requests_pending() public

2015-09-28 Thread Alberto Garcia
On Thu 17 Sep 2015 03:48:14 PM CEST, Kevin Wolf wrote:
> Signed-off-by: Kevin Wolf 
> ---
>  block/io.c| 2 +-
>  include/block/block_int.h | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)

Reviewed-by: Alberto Garcia 

Berto



Re: [Qemu-block] [Qemu-devel] [PATCH 10/10][TRIVIAL] openpic: add to misc category

2015-09-28 Thread Thomas Huth
On 26/09/15 18:22, Laurent Vivier wrote:
> openpic is a programmable interrupt controller, so
> add it to the misc category.
> 
> Signed-off-by: Laurent Vivier 
> ---
>  hw/intc/openpic.c | 1 +
>  hw/intc/openpic_kvm.c | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/hw/intc/openpic.c b/hw/intc/openpic.c
> index 14ab0e3..bfcf155 100644
> --- a/hw/intc/openpic.c
> +++ b/hw/intc/openpic.c
> @@ -1643,6 +1643,7 @@ static void openpic_class_init(ObjectClass *oc, void 
> *data)
>  dc->props = openpic_properties;
>  dc->reset = openpic_reset;
>  dc->vmsd = _openpic;
> +set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>  }
>  
>  static const TypeInfo openpic_info = {
> diff --git a/hw/intc/openpic_kvm.c b/hw/intc/openpic_kvm.c
> index f7cac58..649f476 100644
> --- a/hw/intc/openpic_kvm.c
> +++ b/hw/intc/openpic_kvm.c
> @@ -275,6 +275,7 @@ static void kvm_openpic_class_init(ObjectClass *oc, void 
> *data)
>  dc->realize = kvm_openpic_realize;
>  dc->props = kvm_openpic_properties;
>  dc->reset = kvm_openpic_reset;
> +set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>  }

Reviewed-by: Thomas Huth 





Re: [Qemu-block] [Qemu-devel] [PATCH 09/10][TRIVIAL] macio-nvram: add to misc category

2015-09-28 Thread Thomas Huth
On 26/09/15 18:22, Laurent Vivier wrote:
> The macio nvram is a non volatile RAM, so add it
> the misc category.
> 
> Signed-off-by: Laurent Vivier 
> ---
>  hw/nvram/mac_nvram.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/nvram/mac_nvram.c b/hw/nvram/mac_nvram.c
> index d35f8a3..9f16566 100644
> --- a/hw/nvram/mac_nvram.c
> +++ b/hw/nvram/mac_nvram.c
> @@ -123,6 +123,7 @@ static void macio_nvram_class_init(ObjectClass *oc, void 
> *data)
>  dc->reset = macio_nvram_reset;
>  dc->vmsd = _macio_nvram;
>  dc->props = macio_nvram_properties;
> +set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>  }

Reviewed-by: Thomas Huth 





Re: [Qemu-block] [PATCH 2/3] block: mirror - split out part of mirror_run()

2015-09-28 Thread Kevin Wolf
Am 28.09.2015 um 05:29 hat Jeff Cody geschrieben:
> This is code relocation, to pull the part of mirror_run() that
> calls mirror_iteration out into a separate function.
> 
> Signed-off-by: Jeff Cody 
> ---
>  block/mirror.c | 206 
> ++---
>  1 file changed, 110 insertions(+), 96 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 8b3e89b..405e5c4 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -318,6 +318,115 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>  return delay_ns;
>  }
>  
> +static int mirror_do_iteration(MirrorBlockJob *s, uint64_t last_pause_ns)
> +{
> +int ret;
> +
> +BlockDriverState *bs = s->common.bs;
> +
> +for (;;) {
> +uint64_t delay_ns = 0;
> +int64_t cnt;
> +bool should_complete;
> +
> +if (s->ret < 0) {
> +ret = s->ret;
> +goto immediate_exit;
> +}
> +
> +cnt = bdrv_get_dirty_count(s->dirty_bitmap);
> +/* s->common.offset contains the number of bytes already processed so
> + * far, cnt is the number of dirty sectors remaining and
> + * s->sectors_in_flight is the number of sectors currently being
> + * processed; together those are the current total operation length 
> */
> +s->common.len = s->common.offset +
> +(cnt + s->sectors_in_flight) * BDRV_SECTOR_SIZE;
> +
> +/* Note that even when no rate limit is applied we need to yield
> + * periodically with no pending I/O so that bdrv_drain_all() returns.
> + * We do so every SLICE_TIME nanoseconds, or when there is an error,
> + * or when the source is clean, whichever comes first.
> + */
> +if (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - last_pause_ns < 
> SLICE_TIME
> +&& s->common.iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
> +if (s->in_flight == MAX_IN_FLIGHT || s->buf_free_count == 0 ||
> +(cnt == 0 && s->in_flight > 0)) {
> +trace_mirror_yield(s, s->in_flight, s->buf_free_count, cnt);
> +s->waiting_for_io = true;
> +qemu_coroutine_yield();
> +s->waiting_for_io = false;
> +continue;
> +} else if (cnt != 0) {
> +delay_ns = mirror_iteration(s);
> +}
> +}
> +
> +should_complete = false;
> +if (s->in_flight == 0 && cnt == 0) {
> +trace_mirror_before_flush(s);
> +ret = bdrv_flush(s->target);
> +if (ret < 0) {
> +if (mirror_error_action(s, false, -ret) ==
> +BLOCK_ERROR_ACTION_REPORT) {
> +goto immediate_exit;
> +}
> +} else {
> +/* We're out of the streaming phase.  From now on, if the job
> + * is cancelled we will actually complete all pending I/O and
> + * report completion.  This way, block-job-cancel will leave
> + * the target in a consistent state.
> + */
> +if (!s->synced) {
> +block_job_event_ready(>common);
> +s->synced = true;
> +}
> +
> +should_complete = s->should_complete ||
> +block_job_is_cancelled(>common);
> +cnt = bdrv_get_dirty_count(s->dirty_bitmap);
> +}
> +}
> +
> +if (cnt == 0 && should_complete) {
> +/* The dirty bitmap is not updated while operations are pending.
> + * If we're about to exit, wait for pending operations before
> + * calling bdrv_get_dirty_count(bs), or we may exit while the
> + * source has dirty data to copy!
> + *
> + * Note that I/O can be submitted by the guest while
> + * mirror_populate runs.
> + */
> +trace_mirror_before_drain(s, cnt);
> +bdrv_drain(bs);
> +cnt = bdrv_get_dirty_count(s->dirty_bitmap);
> +}
> +
> +ret = 0;
> +trace_mirror_before_sleep(s, cnt, s->synced, delay_ns);
> +if (!s->synced) {
> +block_job_sleep_ns(>common, QEMU_CLOCK_REALTIME, delay_ns);
> +if (block_job_is_cancelled(>common)) {
> +break;
> +}
> +} else if (!should_complete) {
> +delay_ns = (s->in_flight == 0 && cnt == 0 ? SLICE_TIME : 0);
> +block_job_sleep_ns(>common, QEMU_CLOCK_REALTIME, delay_ns);
> +} else if (cnt == 0) {
> +/* The two disks are in sync.  Exit and report successful
> + * completion.
> + */
> +assert(QLIST_EMPTY(>tracked_requests));
> +s->common.cancelled = false;
> +break;
> +}
> +last_pause_ns = 

Re: [Qemu-block] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present

2015-09-28 Thread Kevin Wolf
Am 28.09.2015 um 05:29 hat Jeff Cody geschrieben:
> During mirror, if the target device does not have support zero
> initialization, a mirror may result in a corrupt image.

I think you want to check this sentence. ("During mirror [...], a
mirror may result [...]")

> For instance, on mirror to a host device with format = raw, whatever
> random data is on the target device will still be there for unallocated
> sectors.
> 
> This is because during the mirror, we set the dirty bitmap to copy only
> sectors allocated above 'base'.  In the case of target devices where we
> cannot assume unallocated sectors will be read as zeroes, we need to
> explicitely zero out this data.
> 
> In order to avoid zeroing out all sectors of the target device prior to
> mirroring, we do zeroing as part of the block job.  A second dirty
> bitmap cache is created, to track sectors that are unallocated above
> 'base'.  These sectors are then checked for status of BDRV_BLOCK_ZERO
> on the target - if they are not, then zeroes are explicitly written.

Why do you need a bitmap? You never change the bitmap after initialising
it, so couldn't you instead just check the allocation status when you
need it?

In fact, why do we need two passes? I would have expected that commit
dcfb3beb already does the trick, with checking allocation status and
writing zeroes during the normal single pass.

If that commit fails to solve the problem, I guess I first need to
understand why before I can continue reviewing this one...

> This only occurs under two conditions:
> 
> 1. 'mode' != "existing"
> 2. bdrv_has_zero_init(target) == NULL
> 
> We perform the mirroring through mirror_iteration() as before, except
> in two passes.  If the above two conditions are met, the first pass
> is using the bitmap tracking unallocated sectors, to write the needed
> zeroes.  Then, the second pass is performed, to mirror the actual data
> as before.
> 
> If the above two conditions are not met, then the first pass is skipped,
> and only the second pass (the one with the actual data) is performed.
> 
> Signed-off-by: Jeff Cody 
> ---
>  block/mirror.c| 109 
> ++
>  blockdev.c|   2 +-
>  include/block/block_int.h |   3 +-
>  qapi/block-core.json  |   6 ++-
>  4 files changed, 87 insertions(+), 33 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 405e5c4..b599176 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -50,7 +50,9 @@ typedef struct MirrorBlockJob {
>  int64_t bdev_length;
>  unsigned long *cow_bitmap;
>  BdrvDirtyBitmap *dirty_bitmap;
> -HBitmapIter hbi;
> +HBitmapIter zero_hbi;
> +HBitmapIter allocated_hbi;
> +HBitmapIter *hbi;
>  uint8_t *buf;
>  QSIMPLEQ_HEAD(, MirrorBuffer) buf_free;
>  int buf_free_count;
> @@ -60,6 +62,8 @@ typedef struct MirrorBlockJob {
>  int sectors_in_flight;
>  int ret;
>  bool unmap;
> +bool zero_unallocated;
> +bool zero_cycle;
>  bool waiting_for_io;
>  } MirrorBlockJob;
>  
> @@ -166,10 +170,10 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>  int pnum;
>  int64_t ret;
>  
> -s->sector_num = hbitmap_iter_next(>hbi);
> +s->sector_num = hbitmap_iter_next(s->hbi);
>  if (s->sector_num < 0) {
> -bdrv_dirty_iter_init(s->dirty_bitmap, >hbi);
> -s->sector_num = hbitmap_iter_next(>hbi);
> +bdrv_dirty_iter_init(s->dirty_bitmap, s->hbi);
> +s->sector_num = hbitmap_iter_next(s->hbi);
>  trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
>  assert(s->sector_num >= 0);
>  }
> @@ -287,7 +291,7 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>   */
>  if (next_sector > hbitmap_next_sector
>  && bdrv_get_dirty(source, s->dirty_bitmap, next_sector)) {
> -hbitmap_next_sector = hbitmap_iter_next(>hbi);
> +hbitmap_next_sector = hbitmap_iter_next(s->hbi);
>  }
>  
>  next_sector += sectors_per_chunk;
> @@ -300,25 +304,34 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>  s->sectors_in_flight += nb_sectors;
>  trace_mirror_one_iteration(s, sector_num, nb_sectors);
>  
> -ret = bdrv_get_block_status_above(source, NULL, sector_num,
> -  nb_sectors, );
> -if (ret < 0 || pnum < nb_sectors ||
> -(ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) {
> -bdrv_aio_readv(source, sector_num, >qiov, nb_sectors,
> -   mirror_read_complete, op);
> -} else if (ret & BDRV_BLOCK_ZERO) {
> -bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
> -  s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
> -  mirror_write_complete, op);
> +if (s->zero_cycle) {
> +ret = 

Re: [Qemu-block] [PATCH v2] iotests: disable core dumps in test 061

2015-09-28 Thread Max Reitz
On 28.09.2015 16:23, Alberto Garcia wrote:
> Commit 934659c460 disabled the supression of segmentation faults in
> bash tests. The new output of test 061, however, assumes that a core
> dump will be produced if a program aborts. This is not necessarily the
> case because core dumps can be disabled using ulimit.
> 
> Since we cannot guarantee that abort() will produce a core dump, we
> should use SIGKILL instead (that does not produce any) and update the
> test output accordingly.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  tests/qemu-iotests/061 | 8 
>  tests/qemu-iotests/061.out | 4 ++--
>  2 files changed, 6 insertions(+), 6 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


[Qemu-block] [PATCH v2] iotests: disable core dumps in test 061

2015-09-28 Thread Alberto Garcia
Commit 934659c460 disabled the supression of segmentation faults in
bash tests. The new output of test 061, however, assumes that a core
dump will be produced if a program aborts. This is not necessarily the
case because core dumps can be disabled using ulimit.

Since we cannot guarantee that abort() will produce a core dump, we
should use SIGKILL instead (that does not produce any) and update the
test output accordingly.

Signed-off-by: Alberto Garcia 
---
 tests/qemu-iotests/061 | 8 
 tests/qemu-iotests/061.out | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/061 b/tests/qemu-iotests/061
index 1df887a..e191e65 100755
--- a/tests/qemu-iotests/061
+++ b/tests/qemu-iotests/061
@@ -58,8 +58,8 @@ echo
 echo "=== Testing dirty version downgrade ==="
 echo
 IMGOPTS="compat=1.1,lazy_refcounts=on" _make_test_img 64M
-$QEMU_IO -c "write -P 0x2a 0 128k" -c flush -c abort "$TEST_IMG" 2>&1 \
-| _filter_qemu_io
+$QEMU_IO -c "write -P 0x2a 0 128k" -c flush \
+ -c "sigraise $(kill -l KILL)" "$TEST_IMG" 2>&1 | _filter_qemu_io
 $PYTHON qcow2.py "$TEST_IMG" dump-header
 $QEMU_IMG amend -o "compat=0.10" "$TEST_IMG"
 $PYTHON qcow2.py "$TEST_IMG" dump-header
@@ -92,8 +92,8 @@ echo
 echo "=== Testing dirty lazy_refcounts=off ==="
 echo
 IMGOPTS="compat=1.1,lazy_refcounts=on" _make_test_img 64M
-$QEMU_IO -c "write -P 0x2a 0 128k" -c flush -c abort "$TEST_IMG" 2>&1 \
-| _filter_qemu_io
+$QEMU_IO -c "write -P 0x2a 0 128k" -c flush \
+ -c "sigraise $(kill -l KILL)" "$TEST_IMG" 2>&1 | _filter_qemu_io
 $PYTHON qcow2.py "$TEST_IMG" dump-header
 $QEMU_IMG amend -o "lazy_refcounts=off" "$TEST_IMG"
 $PYTHON qcow2.py "$TEST_IMG" dump-header
diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out
index a683f46..b16bea9 100644
--- a/tests/qemu-iotests/061.out
+++ b/tests/qemu-iotests/061.out
@@ -57,7 +57,7 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 wrote 131072/131072 bytes at offset 0
 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-./common.config: Aborted (core dumped) ( exec "$QEMU_IO_PROG" 
$QEMU_IO_OPTIONS "$@" )
+./common.config: Killed  ( exec "$QEMU_IO_PROG" 
$QEMU_IO_OPTIONS "$@" )
 magic 0x514649fb
 version   3
 backing_file_offset   0x0
@@ -215,7 +215,7 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 wrote 131072/131072 bytes at offset 0
 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-./common.config: Aborted (core dumped) ( exec "$QEMU_IO_PROG" 
$QEMU_IO_OPTIONS "$@" )
+./common.config: Killed  ( exec "$QEMU_IO_PROG" 
$QEMU_IO_OPTIONS "$@" )
 magic 0x514649fb
 version   3
 backing_file_offset   0x0
-- 
2.5.3




Re: [Qemu-block] [Qemu-devel] [PATCH 06/10][TRIVIAL] macio-ide: add to storage category

2015-09-28 Thread John Snow


On 09/26/2015 12:22 PM, Laurent Vivier wrote:
> macio-ide is an IDE controller, so add it
> to the storage category.
> 
> Signed-off-by: Laurent Vivier 
> ---
>  hw/ide/macio.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/ide/macio.c b/hw/ide/macio.c
> index 66ac2ba..893c9b9 100644
> --- a/hw/ide/macio.c
> +++ b/hw/ide/macio.c
> @@ -590,6 +590,7 @@ static void macio_ide_class_init(ObjectClass *oc, void 
> *data)
>  dc->realize = macio_ide_realizefn;
>  dc->reset = macio_ide_reset;
>  dc->vmsd = _pmac;
> +set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
>  }
>  
>  static const TypeInfo macio_ide_type_info = {
> 

Thanks,

Reviewed-by: John Snow 

I'll leave this to be staged by whomever stages the entire series.

--js



Re: [Qemu-block] [PATCH 1/3] block: allow creation of detached dirty bitmaps

2015-09-28 Thread Stefan Hajnoczi
On Sun, Sep 27, 2015 at 11:29:16PM -0400, Jeff Cody wrote:
> This allows the creation of detached dirty bitmaps, so that the
> block driver dirty bitmaps can be used without inserting the
> bitmap into the dirty bitmap list for a BDS.
> 
> To free a bitmap that was created "detached = true", call
> bdrv_release_dirty_bitmap() with the BlockDriverState argument
> as NULL.

I wonder if just disabling the bitmap with bdrv_disable_dirty_bitmap()
is enough?



Re: [Qemu-block] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present

2015-09-28 Thread Kevin Wolf
Am 28.09.2015 um 05:29 hat Jeff Cody geschrieben:
> During mirror, if the target device does not have support zero
> initialization, a mirror may result in a corrupt image.
> 
> For instance, on mirror to a host device with format = raw, whatever
> random data is on the target device will still be there for unallocated
> sectors.
> 
> This is because during the mirror, we set the dirty bitmap to copy only
> sectors allocated above 'base'.  In the case of target devices where we
> cannot assume unallocated sectors will be read as zeroes, we need to
> explicitely zero out this data.
> 
> In order to avoid zeroing out all sectors of the target device prior to
> mirroring, we do zeroing as part of the block job.  A second dirty
> bitmap cache is created, to track sectors that are unallocated above
> 'base'.  These sectors are then checked for status of BDRV_BLOCK_ZERO
> on the target - if they are not, then zeroes are explicitly written.
> 
> This only occurs under two conditions:
> 
> 1. 'mode' != "existing"
> 2. bdrv_has_zero_init(target) == NULL
> 
> We perform the mirroring through mirror_iteration() as before, except
> in two passes.  If the above two conditions are met, the first pass
> is using the bitmap tracking unallocated sectors, to write the needed
> zeroes.  Then, the second pass is performed, to mirror the actual data
> as before.
> 
> If the above two conditions are not met, then the first pass is skipped,
> and only the second pass (the one with the actual data) is performed.
> 
> Signed-off-by: Jeff Cody 

Also, this makes qemu-iotests 097 fail for me.

Kevin



Re: [Qemu-block] [PATCH 03/18] qemu-thread: introduce QemuLockCnt

2015-09-28 Thread Paolo Bonzini


On 28/09/2015 12:15, Stefan Hajnoczi wrote:
> On Thu, Aug 06, 2015 at 03:36:01PM +0200, Paolo Bonzini wrote:
>> > +int qemu_lockcnt_count(QemuLockCnt *lockcnt);
> Why use int here when the counter field is unsigned?

Nice catch, will fix.

Paolo



Re: [Qemu-block] [PATCH 03/18] qemu-thread: introduce QemuLockCnt

2015-09-28 Thread Stefan Hajnoczi
On Thu, Aug 06, 2015 at 03:36:01PM +0200, Paolo Bonzini wrote:
> +int qemu_lockcnt_count(QemuLockCnt *lockcnt);

Why use int here when the counter field is unsigned?



Re: [Qemu-block] [Qemu-devel] [PATCH v2 00/11] aio: Introduce handler type to fix nested aio_poll for dataplane

2015-09-28 Thread Stefan Hajnoczi
On Fri, Sep 11, 2015 at 02:22:14PM +0200, Kevin Wolf wrote:
> Am 11.09.2015 um 13:46 hat Fam Zheng geschrieben:
> > On Fri, 09/11 12:39, Kevin Wolf wrote:
> > > Am 29.07.2015 um 06:42 hat Fam Zheng geschrieben:
> > > > v2: Switch to disable/enable model. [Paolo]
> > > > 
> > > > Most existing nested aio_poll()'s in block layer are inconsiderate of
> > > > dispatching potential new r/w requests from ioeventfds and nbd exports, 
> > > > which
> > > > might result in responsiveness issues (e.g. bdrv_drain_all will not 
> > > > return when
> > > > new requests keep coming), or even wrong semantics (e.g. 
> > > > qmp_transaction cannot
> > > > enforce atomicity due to aio_poll in bdrv_drain_all).
> > > > 
> > > > Previous attampts to address this issue include new op blocker[1], 
> > > > bdrv_lock[2]
> > > > and nested AioContext (patches not posted to qemu-devel).
> > > > 
> > > > This approach is based on the idea proposed by Paolo Bonzini. The 
> > > > original idea
> > > > is introducing "aio_context_disable_client / aio_context_enable_client 
> > > > to
> > > > filter AioContext handlers according to the "client", e.g.
> > > > AIO_CLIENT_DATAPLANE (ioeventfd), AIO_CLIENT_PROTOCOL, 
> > > > AIO_CLIENT_NBD_SERVER,
> > > > AIO_CLIENT_CONTEXT, ... Extend aio_set_{event_notifier,fd}_handler to 
> > > > pass a
> > > > client (type)." 
> > > > 
> > > > After this series, block layer aio_poll() will only process those 
> > > > "protocol"
> > > > fds that are used in block I/O, plus the ctx->notifier for 
> > > > aio_notify();  other
> > > > aio_poll()'s keep unchanged.
> > > > 
> > > > The biggest advantage over approaches [1] and [2] is, no change is 
> > > > needed in
> > > > virtio-{blk,scsi}-dataplane code, also this doesn't depend on 
> > > > converting QMP to
> > > > coroutines.
> > > 
> > > It seems that I haven't replied on the mailing list yet, even though I
> > > think I already said this in person at KVM Forum: This series fixes only
> > > a special case of the real problem, which is that bdrv_drain/all at a
> > > single point doesn't make a lot of sense, but needs to be replaced by a
> > > whole section with exclusive access, like a bdrv_drained_begin/end pair.
> > > 
> > > To be clear: Anything that works with types of users instead of
> > > individual users is bound to fall short of being a complete solution. I
> > > don't prefer partial solutions when we know there is a bigger problem.
> > > 
> > > This series addresses your immediate need of protecting against new data
> > > plane requests, which it arguably achieves. The second case I always
> > > have in mind is Berto's case where he has multiple streaming block jobs
> > > in the same backing file chain [1].
> > > 
> > > This involves a bdrv_reopen() of the target BDS to make it writable, and
> > > bdrv_reopen() uses bdrv_drain_all() so drivers don't have to cope with
> > > running requests while reopening themselves. It can however involve
> > > nested event loops for synchronous operations like bdrv_flush(), and if
> > > those can process completions of block jobs, which can respond by doing
> > > anything to the respective node, things can go wrong.
> > 
> > Just to get a better idea of bdrv_drained_begin/end, could you explain how 
> > to
> > use the pair to fix the above problem?
> 
> How to use it is easy part: In bdrv_reopen_multiple(), you would replace
> the existing bdrv_drain_all() with begin and you would add the
> corresponding end right before the return statement.
> 
> > > You don't solve this by adding client types (then problematic request
> > > would be PROTOCOL in your proposal and you can never exclude that), but
> > > you really need to have bdrv_drained_being/end pairs, where only
> > > requests issued in between are processed and everything else waits.
> > 
> > What do you mean by "only requests issued in between are processed"? Where 
> > are
> > the requests from?
> 
> Generally speaking, you would have code that looks like this:
> 
> bdrv_drain_begin()
> ...
> bdrv_something_synchronous()
> ...
> bdrv_drain_end()
> 
> You want to process everything that is necessary for completing
> bdrv_something_synchronous(), but nothing else.
> 
> The trickier question is how to implement this. I know that it's much
> easier to say that your series doesn't work than actually proposing
> something else that works...
> 
> One relatively obvious answer we found when we discussed this a while
> back was some kind of a recursive CoRwLock (reader = in-flight request;
> writer = drained section), but that requires obviously that you're
> running in a coroutine if you want to do something with a drained
> request queue.

At one point I thought about converting vl.c:main() to run everything in
a coroutine.  It would allow us to eliminate synchronous operations
completely.

The problem is that it's quite a big change that requires protecting
formerly synchronous operations against concurrency.

Stefan



Re: [Qemu-block] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present

2015-09-28 Thread Stefan Hajnoczi
On Sun, Sep 27, 2015 at 11:29:18PM -0400, Jeff Cody wrote:
> +if (s->zero_cycle) {
> +ret = bdrv_get_block_status(s->target, sector_num, nb_sectors, 
> );
> +if (!(ret & BDRV_BLOCK_ZERO)) {
> +bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
> +  s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
> +  mirror_write_complete, op);

mirror_write_complete will advance s->common.offset.  Won't the progress
be incorrect if we do that for both zeroing and regular mirroring?



Re: [Qemu-block] [Qemu-devel] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present

2015-09-28 Thread Jeff Cody
On Sep 28, 2015 4:12 PM, "Paolo Bonzini"  wrote:
>
>

Replying from mobile; please excuse any formatting issues.

>
> On 28/09/2015 05:29, Jeff Cody wrote:
> > This only occurs under two conditions:
> >
> > 1. 'mode' != "existing"
> > 2. bdrv_has_zero_init(target) == NULL
> >
>
> I'm not sure if mode != "existing" actually matters.  I think what
> actually matters is sync == "full".
>
> The reasons are:
>
> 1) with sync != "full", unallocated target sectors should remain
> unallocated on the destination because they are supposed to point to the
> backing file.

I guess that makes sense.  What about  the case when the target is a raw
device without zero init?  There is no backing file... Of course, perhaps
in the raw case the user should be using sync==full anyways.

>
> 2) even with mode == "existing" you expect the data to be consistent at
> the end of the mirroring
>

The reason I added the "existing" exception was so the user could avoid the
time penalty of zeroing out the data if they knew the target had already
explicitly been zeroed.  Do you think it is fair to assume that if the user
specified existing, that they take responsibility for setting up the target
image how they like (including data initialization)?  Or should we add
another option for mirror, to allow the user to bypass the zero fill?

Thanks,
Jeff


Re: [Qemu-block] [PATCH 2/3] block: mirror - split out part of mirror_run()

2015-09-28 Thread Max Reitz
On 28.09.2015 05:29, Jeff Cody wrote:
> This is code relocation, to pull the part of mirror_run() that
> calls mirror_iteration out into a separate function.
> 
> Signed-off-by: Jeff Cody 
> ---
>  block/mirror.c | 206 
> ++---
>  1 file changed, 110 insertions(+), 96 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH 1/3] block: allow creation of detached dirty bitmaps

2015-09-28 Thread Max Reitz
On 28.09.2015 05:29, Jeff Cody wrote:
> This allows the creation of detached dirty bitmaps, so that the
> block driver dirty bitmaps can be used without inserting the
> bitmap into the dirty bitmap list for a BDS.
> 
> To free a bitmap that was created "detached = true", call
> bdrv_release_dirty_bitmap() with the BlockDriverState argument
> as NULL.
> 
> Signed-off-by: Jeff Cody 
> ---
>  block.c   | 26 --
>  block/mirror.c|  3 ++-
>  blockdev.c|  2 +-
>  include/block/block.h |  1 +
>  migration/block.c |  2 +-
>  5 files changed, 25 insertions(+), 9 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


[Qemu-block] [PATCH] qtest/ide-test: ppc64be correction for ATAPI tests

2015-09-28 Thread John Snow
the 16bit ide data register is LE by definition.

Signed-off-by: John Snow 
---
 tests/ide-test.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/ide-test.c b/tests/ide-test.c
index 5594738..b6e9e1a 100644
--- a/tests/ide-test.c
+++ b/tests/ide-test.c
@@ -633,7 +633,7 @@ static void send_scsi_cdb_read10(uint64_t lba, int nblocks)
 
 /* Send Packet */
 for (i = 0; i < sizeof(Read10CDB)/2; i++) {
-outw(IDE_BASE + reg_data, ((uint16_t *))[i]);
+outw(IDE_BASE + reg_data, cpu_to_le16(((uint16_t *))[i]));
 }
 }
 
@@ -733,7 +733,7 @@ static void cdrom_pio_impl(int nblocks)
 size_t offset = i * (limit / 2);
 size_t rem = (rxsize / 2) - offset;
 for (j = 0; j < MIN((limit / 2), rem); j++) {
-rx[offset + j] = inw(IDE_BASE + reg_data);
+rx[offset + j] = le16_to_cpu(inw(IDE_BASE + reg_data));
 }
 ide_wait_intr(IDE_PRIMARY_IRQ);
 }
-- 
2.4.3




Re: [Qemu-block] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present

2015-09-28 Thread Max Reitz
On 28.09.2015 05:29, Jeff Cody wrote:
> During mirror, if the target device does not have support zero
> initialization, a mirror may result in a corrupt image.
> 
> For instance, on mirror to a host device with format = raw, whatever
> random data is on the target device will still be there for unallocated
> sectors.
> 
> This is because during the mirror, we set the dirty bitmap to copy only
> sectors allocated above 'base'.  In the case of target devices where we
> cannot assume unallocated sectors will be read as zeroes, we need to
> explicitely zero out this data.
> 
> In order to avoid zeroing out all sectors of the target device prior to
> mirroring, we do zeroing as part of the block job.  A second dirty
> bitmap cache is created, to track sectors that are unallocated above
> 'base'.  These sectors are then checked for status of BDRV_BLOCK_ZERO
> on the target - if they are not, then zeroes are explicitly written.
> 
> This only occurs under two conditions:
> 
> 1. 'mode' != "existing"
> 2. bdrv_has_zero_init(target) == NULL
> 
> We perform the mirroring through mirror_iteration() as before, except
> in two passes.  If the above two conditions are met, the first pass
> is using the bitmap tracking unallocated sectors, to write the needed
> zeroes.  Then, the second pass is performed, to mirror the actual data
> as before.
> 
> If the above two conditions are not met, then the first pass is skipped,
> and only the second pass (the one with the actual data) is performed.
> 
> Signed-off-by: Jeff Cody 
> ---
>  block/mirror.c| 109 
> ++
>  blockdev.c|   2 +-
>  include/block/block_int.h |   3 +-
>  qapi/block-core.json  |   6 ++-
>  4 files changed, 87 insertions(+), 33 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 405e5c4..b599176 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -50,7 +50,9 @@ typedef struct MirrorBlockJob {
>  int64_t bdev_length;
>  unsigned long *cow_bitmap;
>  BdrvDirtyBitmap *dirty_bitmap;
> -HBitmapIter hbi;
> +HBitmapIter zero_hbi;
> +HBitmapIter allocated_hbi;
> +HBitmapIter *hbi;
>  uint8_t *buf;
>  QSIMPLEQ_HEAD(, MirrorBuffer) buf_free;
>  int buf_free_count;
> @@ -60,6 +62,8 @@ typedef struct MirrorBlockJob {
>  int sectors_in_flight;
>  int ret;
>  bool unmap;
> +bool zero_unallocated;
> +bool zero_cycle;
>  bool waiting_for_io;
>  } MirrorBlockJob;
>  
> @@ -166,10 +170,10 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>  int pnum;
>  int64_t ret;
>  
> -s->sector_num = hbitmap_iter_next(>hbi);
> +s->sector_num = hbitmap_iter_next(s->hbi);
>  if (s->sector_num < 0) {
> -bdrv_dirty_iter_init(s->dirty_bitmap, >hbi);
> -s->sector_num = hbitmap_iter_next(>hbi);
> +bdrv_dirty_iter_init(s->dirty_bitmap, s->hbi);
> +s->sector_num = hbitmap_iter_next(s->hbi);
>  trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
>  assert(s->sector_num >= 0);
>  }
> @@ -287,7 +291,7 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>   */
>  if (next_sector > hbitmap_next_sector
>  && bdrv_get_dirty(source, s->dirty_bitmap, next_sector)) {
> -hbitmap_next_sector = hbitmap_iter_next(>hbi);
> +hbitmap_next_sector = hbitmap_iter_next(s->hbi);
>  }
>  
>  next_sector += sectors_per_chunk;
> @@ -300,25 +304,34 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>  s->sectors_in_flight += nb_sectors;
>  trace_mirror_one_iteration(s, sector_num, nb_sectors);
>  
> -ret = bdrv_get_block_status_above(source, NULL, sector_num,
> -  nb_sectors, );
> -if (ret < 0 || pnum < nb_sectors ||
> -(ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) {
> -bdrv_aio_readv(source, sector_num, >qiov, nb_sectors,
> -   mirror_read_complete, op);
> -} else if (ret & BDRV_BLOCK_ZERO) {
> -bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
> -  s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
> -  mirror_write_complete, op);
> +if (s->zero_cycle) {
> +ret = bdrv_get_block_status(s->target, sector_num, nb_sectors, 
> );
> +if (!(ret & BDRV_BLOCK_ZERO)) {
> +bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
> +  s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
> +  mirror_write_complete, op);
> +}
>  } else {
> -assert(!(ret & BDRV_BLOCK_DATA));
> -bdrv_aio_discard(s->target, sector_num, op->nb_sectors,
> - mirror_write_complete, op);
> +ret = bdrv_get_block_status_above(source, NULL, sector_num,
> +

Re: [Qemu-block] [Qemu-devel] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present

2015-09-28 Thread Eric Blake
On 09/28/2015 08:13 AM, Paolo Bonzini wrote:
> 
> 
> On 28/09/2015 05:29, Jeff Cody wrote:
>> This only occurs under two conditions:
>>
>> 1. 'mode' != "existing"
>> 2. bdrv_has_zero_init(target) == NULL
>>
> 
> I'm not sure if mode != "existing" actually matters.  I think what
> actually matters is sync == "full".

When mode == 'existing' for a shallow mirror (sync != 'full'), that is
the caller stating that the guest-visible contents of the destination
match the guest-visible contents of the backing image.  The only sectors
to be copied are those that differ from the backing file, and we should
not be zeroing unrelated sectors because the user has already promised
they have the same guest-visible content as the backing image would report.

When mode == 'existing' for a full mirror (sync == 'full'), that is the
caller stating that they want every single sector of the destination
written to hold the current state of the source (of course, allowing for
optimizations such as skipping the write where the contents will read
back the same as if the write had been performed).

I think Paolo is right: we care about zeroing unallocated sectors for
sync == 'full', regardless of whether mode == 'existing'.

I also think the reason Jeff confused it for mode == 'existing' is that
the other modes let qemu create the file, but qemu does not create block
devices (the only way to mirror to a block device is via mode ==
'existing'), and it is primarily block devices where zero init is not
guaranteed.

> 
> The reasons are:
> 
> 1) with sync != "full", unallocated target sectors should remain
> unallocated on the destination because they are supposed to point to the
> backing file.
> 
> 2) even with mode == "existing" you expect the data to be consistent at
> the end of the mirroring
> 
> Paolo
> 
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present

2015-09-28 Thread Jeff Cody
On Sep 28, 2015 5:34 PM, "Kevin Wolf"  wrote:
>
> Am 28.09.2015 um 05:29 hat Jeff Cody geschrieben:
> > During mirror, if the target device does not have support zero
> > initialization, a mirror may result in a corrupt image.
> >
> > For instance, on mirror to a host device with format = raw, whatever
> > random data is on the target device will still be there for unallocated
> > sectors.
> >
> > This is because during the mirror, we set the dirty bitmap to copy only
> > sectors allocated above 'base'.  In the case of target devices where we
> > cannot assume unallocated sectors will be read as zeroes, we need to
> > explicitely zero out this data.
> >
> > In order to avoid zeroing out all sectors of the target device prior to
> > mirroring, we do zeroing as part of the block job.  A second dirty
> > bitmap cache is created, to track sectors that are unallocated above
> > 'base'.  These sectors are then checked for status of BDRV_BLOCK_ZERO
> > on the target - if they are not, then zeroes are explicitly written.
> >
> > This only occurs under two conditions:
> >
> > 1. 'mode' != "existing"
> > 2. bdrv_has_zero_init(target) == NULL
> >
> > We perform the mirroring through mirror_iteration() as before, except
> > in two passes.  If the above two conditions are met, the first pass
> > is using the bitmap tracking unallocated sectors, to write the needed
> > zeroes.  Then, the second pass is performed, to mirror the actual data
> > as before.
> >
> > If the above two conditions are not met, then the first pass is skipped,
> > and only the second pass (the one with the actual data) is performed.
> >
> > Signed-off-by: Jeff Cody 
>
> Also, this makes qemu-iotests 097 fail for me.
>

OK, thanks - I'll check that out tomorrow afternoon.  I ran iotests on all
the tests I thought dealt with mirror, but I must have missed that one with
my grep.

Jeff


Re: [Qemu-block] [Qemu-devel] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present

2015-09-28 Thread Jeff Cody
On Sep 28, 2015 5:31 PM, "Kevin Wolf"  wrote:
>

(Responding from mobile phone again)

> Am 28.09.2015 um 05:29 hat Jeff Cody geschrieben:
> > During mirror, if the target device does not have support zero
> > initialization, a mirror may result in a corrupt image.
>
> I think you want to check this sentence. ("During mirror [...], a
> mirror may result [...]")
>

Yes, thanks.

> > For instance, on mirror to a host device with format = raw, whatever
> > random data is on the target device will still be there for unallocated
> > sectors.
> >
> > This is because during the mirror, we set the dirty bitmap to copy only
> > sectors allocated above 'base'.  In the case of target devices where we
> > cannot assume unallocated sectors will be read as zeroes, we need to
> > explicitely zero out this data.
> >
> > In order to avoid zeroing out all sectors of the target device prior to
> > mirroring, we do zeroing as part of the block job.  A second dirty
> > bitmap cache is created, to track sectors that are unallocated above
> > 'base'.  These sectors are then checked for status of BDRV_BLOCK_ZERO
> > on the target - if they are not, then zeroes are explicitly written.
>
> Why do you need a bitmap? You never change the bitmap after initialising
> it, so couldn't you instead just check the allocation status when you
> need it?

The main reason was really to maximize code reuse, and be able to use the
same iteration code in the mirror coroutine.

>
> In fact, why do we need two passes? I would have expected that commit
> dcfb3beb already does the trick, with checking allocation status and
> writing zeroes during the normal single pass.
>
> If that commit fails to solve the problem, I guess I first need to
> understand why before I can continue reviewing this one...
>

Responding from memory right now, but that commit only helps if the guest
unmaps data, changing the sectors to unallocated after the mirror begins.

However, before we get to this point we've already generated our bitmap of
dirty sectors in mirror_run(), and those are explicitly only sectors that
are allocated above the source.  Inside the iteration, we'll only pick up
the unallocated sectors if they have been changed by the guest.

> > This only occurs under two conditions:
> >
> > 1. 'mode' != "existing"
> > 2. bdrv_has_zero_init(target) == NULL
> >
> > We perform the mirroring through mirror_iteration() as before, except
> > in two passes.  If the above two conditions are met, the first pass
> > is using the bitmap tracking unallocated sectors, to write the needed
> > zeroes.  Then, the second pass is performed, to mirror the actual data
> > as before.
> >
> > If the above two conditions are not met, then the first pass is skipped,
> > and only the second pass (the one with the actual data) is performed.
> >
> > Signed-off-by: Jeff Cody 
> > ---
> >  block/mirror.c| 109
++
> >  blockdev.c|   2 +-
> >  include/block/block_int.h |   3 +-
> >  qapi/block-core.json  |   6 ++-
> >  4 files changed, 87 insertions(+), 33 deletions(-)
> >
> > diff --git a/block/mirror.c b/block/mirror.c
> > index 405e5c4..b599176 100644
> > --- a/block/mirror.c
> > +++ b/block/mirror.c
> > @@ -50,7 +50,9 @@ typedef struct MirrorBlockJob {
> >  int64_t bdev_length;
> >  unsigned long *cow_bitmap;
> >  BdrvDirtyBitmap *dirty_bitmap;
> > -HBitmapIter hbi;
> > +HBitmapIter zero_hbi;
> > +HBitmapIter allocated_hbi;
> > +HBitmapIter *hbi;
> >  uint8_t *buf;
> >  QSIMPLEQ_HEAD(, MirrorBuffer) buf_free;
> >  int buf_free_count;
> > @@ -60,6 +62,8 @@ typedef struct MirrorBlockJob {
> >  int sectors_in_flight;
> >  int ret;
> >  bool unmap;
> > +bool zero_unallocated;
> > +bool zero_cycle;
> >  bool waiting_for_io;
> >  } MirrorBlockJob;
> >
> > @@ -166,10 +170,10 @@ static uint64_t coroutine_fn
mirror_iteration(MirrorBlockJob *s)
> >  int pnum;
> >  int64_t ret;
> >
> > -s->sector_num = hbitmap_iter_next(>hbi);
> > +s->sector_num = hbitmap_iter_next(s->hbi);
> >  if (s->sector_num < 0) {
> > -bdrv_dirty_iter_init(s->dirty_bitmap, >hbi);
> > -s->sector_num = hbitmap_iter_next(>hbi);
> > +bdrv_dirty_iter_init(s->dirty_bitmap, s->hbi);
> > +s->sector_num = hbitmap_iter_next(s->hbi);
> >  trace_mirror_restart_iter(s,
bdrv_get_dirty_count(s->dirty_bitmap));
> >  assert(s->sector_num >= 0);
> >  }
> > @@ -287,7 +291,7 @@ static uint64_t coroutine_fn
mirror_iteration(MirrorBlockJob *s)
> >   */
> >  if (next_sector > hbitmap_next_sector
> >  && bdrv_get_dirty(source, s->dirty_bitmap, next_sector)) {
> > -hbitmap_next_sector = hbitmap_iter_next(>hbi);
> > +hbitmap_next_sector = hbitmap_iter_next(s->hbi);
> >  }
> >
> >  next_sector += sectors_per_chunk;
> > @@ -300,25