Re: [Qemu-block] [Qemu-devel] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present
On 09/28/2015 03:32 PM, Jeff Cody wrote: > I guess that makes sense. What about the case when the target is a raw > device without zero init? There is no backing file... Of course, perhaps > in the raw case the user should be using sync==full anyways. > >> >> 2) even with mode == "existing" you expect the data to be consistent at >> the end of the mirroring >> > > The reason I added the "existing" exception was so the user could avoid the > time penalty of zeroing out the data if they knew the target had already > explicitly been zeroed. Do you think it is fair to assume that if the user > specified existing, that they take responsibility for setting up the target > image how they like (including data initialization)? Or should we add > another option for mirror, to allow the user to bypass the zero fill? mode == 'existing' puts the burden on the caller to ensure that the file they are passing in starts with known contents (either contents don't matter because we are doing sync == 'full' to write every sector, or contents MUST initially match what the guest would see looking at the backing image when doing a shallow clone). But if there is a way for a user to pass in an existing file which they have pre-zeroed, even though the file would normally be treated as though it did not have zero fill, then the option to bypass a redundant zero fill might be useful. I'm not sure it's worth implementing without a known user, though, and I don't know that libvirt would use it. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 16/16] block: Remove bdrv_swap()
On Thu 17 Sep 2015 03:48:20 PM CEST, Kevin Wolf wrote: > bdrv_swap() is unused now. Remove it and all functions that have > no other users than bdrv_swap(). In particular, this removes the > .bdrv_rebind callbacks from block drivers. > > Signed-off-by: Kevin WolfReviewed-by: Alberto Garcia Berto
Re: [Qemu-block] [Qemu-devel] [PATCH 06/10][TRIVIAL] macio-ide: add to storage category
On 26/09/15 18:22, Laurent Vivier wrote: > macio-ide is an IDE controller, so add it > to the storage category. > > Signed-off-by: Laurent Vivier> --- > hw/ide/macio.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/hw/ide/macio.c b/hw/ide/macio.c > index 66ac2ba..893c9b9 100644 > --- a/hw/ide/macio.c > +++ b/hw/ide/macio.c > @@ -590,6 +590,7 @@ static void macio_ide_class_init(ObjectClass *oc, void > *data) > dc->realize = macio_ide_realizefn; > dc->reset = macio_ide_reset; > dc->vmsd = _pmac; > +set_bit(DEVICE_CATEGORY_STORAGE, dc->categories); > } Reviewed-by: Thomas Huth
Re: [Qemu-block] [Qemu-devel] [PATCH 01/10][TRIVIAL] adb: add to input category
On 26/09/15 18:22, Laurent Vivier wrote: > The Apple Desktop Bus is used to connect a keyboard and a mouse, > so add it to the input category. > > Signed-off-by: Laurent Vivier> --- > hw/input/adb.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/hw/input/adb.c b/hw/input/adb.c > index a18eea2..09eead9 100644 > --- a/hw/input/adb.c > +++ b/hw/input/adb.c > @@ -362,6 +362,7 @@ static void adb_kbd_class_init(ObjectClass *oc, void > *data) > > akc->parent_realize = dc->realize; > dc->realize = adb_kbd_realizefn; > +set_bit(DEVICE_CATEGORY_INPUT, dc->categories); > > adc->devreq = adb_kbd_request; > dc->reset = adb_kbd_reset; > @@ -566,6 +567,7 @@ static void adb_mouse_class_init(ObjectClass *oc, void > *data) > > amc->parent_realize = dc->realize; > dc->realize = adb_mouse_realizefn; > +set_bit(DEVICE_CATEGORY_INPUT, dc->categories); > > adc->devreq = adb_mouse_request; > dc->reset = adb_mouse_reset; Reviewed-by: Thomas Huth
Re: [Qemu-block] [Qemu-devel] [PATCH 03/10][TRIVIAL] escc: add to input category
On 26/09/15 18:22, Laurent Vivier wrote: > ESCC is a serial port controller, so add it > to the input category. > > Signed-off-by: Laurent Vivier> --- > hw/char/escc.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/hw/char/escc.c b/hw/char/escc.c > index ba653ef..9816154 100644 > --- a/hw/char/escc.c > +++ b/hw/char/escc.c > @@ -1035,6 +1035,7 @@ static void escc_class_init(ObjectClass *klass, void > *data) > dc->reset = escc_reset; > dc->vmsd = _escc; > dc->props = escc_properties; > +set_bit(DEVICE_CATEGORY_INPUT, dc->categories); > } Reviewed-by: Thomas Huth
Re: [Qemu-block] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present
On 28/09/2015 05:29, Jeff Cody wrote: > This only occurs under two conditions: > > 1. 'mode' != "existing" > 2. bdrv_has_zero_init(target) == NULL > I'm not sure if mode != "existing" actually matters. I think what actually matters is sync == "full". The reasons are: 1) with sync != "full", unallocated target sectors should remain unallocated on the destination because they are supposed to point to the backing file. 2) even with mode == "existing" you expect the data to be consistent at the end of the mirroring Paolo
Re: [Qemu-block] [PATCH 2/3] block: mirror - split out part of mirror_run()
On 28/09/2015 05:29, Jeff Cody wrote: > This is code relocation, to pull the part of mirror_run() that > calls mirror_iteration out into a separate function. > > Signed-off-by: Jeff Cody> --- > block/mirror.c | 206 > ++--- > 1 file changed, 110 insertions(+), 96 deletions(-) > > diff --git a/block/mirror.c b/block/mirror.c > index 8b3e89b..405e5c4 100644 > --- a/block/mirror.c > +++ b/block/mirror.c > @@ -318,6 +318,115 @@ static uint64_t coroutine_fn > mirror_iteration(MirrorBlockJob *s) > return delay_ns; > } > > +static int mirror_do_iteration(MirrorBlockJob *s, uint64_t last_pause_ns) > +{ > +int ret; > + > +BlockDriverState *bs = s->common.bs; > + > +for (;;) { > +uint64_t delay_ns = 0; > +int64_t cnt; > +bool should_complete; > + > +if (s->ret < 0) { > +ret = s->ret; > +goto immediate_exit; > +} You might as well replace the gotos with returns (either "return ret;" or "return s->ret;") and the breaks with "return 0;"). > + > +cnt = bdrv_get_dirty_count(s->dirty_bitmap); > +/* s->common.offset contains the number of bytes already processed so > + * far, cnt is the number of dirty sectors remaining and > + * s->sectors_in_flight is the number of sectors currently being > + * processed; together those are the current total operation length > */ > +s->common.len = s->common.offset + > +(cnt + s->sectors_in_flight) * BDRV_SECTOR_SIZE; > + > +/* Note that even when no rate limit is applied we need to yield > + * periodically with no pending I/O so that bdrv_drain_all() returns. > + * We do so every SLICE_TIME nanoseconds, or when there is an error, > + * or when the source is clean, whichever comes first. > + */ > +if (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - last_pause_ns < > SLICE_TIME > +&& s->common.iostatus == BLOCK_DEVICE_IO_STATUS_OK) { > +if (s->in_flight == MAX_IN_FLIGHT || s->buf_free_count == 0 || > +(cnt == 0 && s->in_flight > 0)) { > +trace_mirror_yield(s, s->in_flight, s->buf_free_count, cnt); > +s->waiting_for_io = true; > +qemu_coroutine_yield(); > +s->waiting_for_io = false; > +continue; > +} else if (cnt != 0) { > +delay_ns = mirror_iteration(s); > +} > +} > + > +should_complete = false; > +if (s->in_flight == 0 && cnt == 0) { > +trace_mirror_before_flush(s); > +ret = bdrv_flush(s->target); > +if (ret < 0) { > +if (mirror_error_action(s, false, -ret) == > +BLOCK_ERROR_ACTION_REPORT) { > +goto immediate_exit; > +} > +} else { > +/* We're out of the streaming phase. From now on, if the job > + * is cancelled we will actually complete all pending I/O and > + * report completion. This way, block-job-cancel will leave > + * the target in a consistent state. > + */ > +if (!s->synced) { > +block_job_event_ready(>common); > +s->synced = true; > +} > + > +should_complete = s->should_complete || > +block_job_is_cancelled(>common); > +cnt = bdrv_get_dirty_count(s->dirty_bitmap); > +} > +} > + > +if (cnt == 0 && should_complete) { > +/* The dirty bitmap is not updated while operations are pending. > + * If we're about to exit, wait for pending operations before > + * calling bdrv_get_dirty_count(bs), or we may exit while the > + * source has dirty data to copy! > + * > + * Note that I/O can be submitted by the guest while > + * mirror_populate runs. > + */ > +trace_mirror_before_drain(s, cnt); > +bdrv_drain(bs); > +cnt = bdrv_get_dirty_count(s->dirty_bitmap); > +} > + > +ret = 0; > +trace_mirror_before_sleep(s, cnt, s->synced, delay_ns); > +if (!s->synced) { > +block_job_sleep_ns(>common, QEMU_CLOCK_REALTIME, delay_ns); > +if (block_job_is_cancelled(>common)) { > +break; > +} > +} else if (!should_complete) { > +delay_ns = (s->in_flight == 0 && cnt == 0 ? SLICE_TIME : 0); > +block_job_sleep_ns(>common, QEMU_CLOCK_REALTIME, delay_ns); > +} else if (cnt == 0) { > +/* The two disks are in sync. Exit and report successful > + * completion. > + */ > +assert(QLIST_EMPTY(>tracked_requests)); > +
Re: [Qemu-block] [Qemu-devel] [PATCH 04/10][TRIVIAL] grackle: add to bridge category
On 26/09/15 18:22, Laurent Vivier wrote: > Grackle is the PCI host controller of oldworld powermac, > so add it to the bridge category. > > Signed-off-by: Laurent Vivier> --- > hw/pci-host/grackle.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/hw/pci-host/grackle.c b/hw/pci-host/grackle.c > index bfe707a..ea31b72 100644 > --- a/hw/pci-host/grackle.c > +++ b/hw/pci-host/grackle.c > @@ -146,8 +146,10 @@ static const TypeInfo grackle_pci_info = { > static void pci_grackle_class_init(ObjectClass *klass, void *data) > { > SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass); > +DeviceClass *dc = DEVICE_CLASS(klass); > > k->init = pci_grackle_init_device; > +set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories); > } Reviewed-by: Thomas Huth
Re: [Qemu-block] [PATCH 10/16] block/io: Make bdrv_requests_pending() public
On Thu 17 Sep 2015 03:48:14 PM CEST, Kevin Wolf wrote: > Signed-off-by: Kevin Wolf> --- > block/io.c| 2 +- > include/block/block_int.h | 1 + > 2 files changed, 2 insertions(+), 1 deletion(-) Reviewed-by: Alberto Garcia Berto
Re: [Qemu-block] [Qemu-devel] [PATCH 10/10][TRIVIAL] openpic: add to misc category
On 26/09/15 18:22, Laurent Vivier wrote: > openpic is a programmable interrupt controller, so > add it to the misc category. > > Signed-off-by: Laurent Vivier> --- > hw/intc/openpic.c | 1 + > hw/intc/openpic_kvm.c | 1 + > 2 files changed, 2 insertions(+) > > diff --git a/hw/intc/openpic.c b/hw/intc/openpic.c > index 14ab0e3..bfcf155 100644 > --- a/hw/intc/openpic.c > +++ b/hw/intc/openpic.c > @@ -1643,6 +1643,7 @@ static void openpic_class_init(ObjectClass *oc, void > *data) > dc->props = openpic_properties; > dc->reset = openpic_reset; > dc->vmsd = _openpic; > +set_bit(DEVICE_CATEGORY_MISC, dc->categories); > } > > static const TypeInfo openpic_info = { > diff --git a/hw/intc/openpic_kvm.c b/hw/intc/openpic_kvm.c > index f7cac58..649f476 100644 > --- a/hw/intc/openpic_kvm.c > +++ b/hw/intc/openpic_kvm.c > @@ -275,6 +275,7 @@ static void kvm_openpic_class_init(ObjectClass *oc, void > *data) > dc->realize = kvm_openpic_realize; > dc->props = kvm_openpic_properties; > dc->reset = kvm_openpic_reset; > +set_bit(DEVICE_CATEGORY_MISC, dc->categories); > } Reviewed-by: Thomas Huth
Re: [Qemu-block] [Qemu-devel] [PATCH 09/10][TRIVIAL] macio-nvram: add to misc category
On 26/09/15 18:22, Laurent Vivier wrote: > The macio nvram is a non volatile RAM, so add it > the misc category. > > Signed-off-by: Laurent Vivier> --- > hw/nvram/mac_nvram.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/hw/nvram/mac_nvram.c b/hw/nvram/mac_nvram.c > index d35f8a3..9f16566 100644 > --- a/hw/nvram/mac_nvram.c > +++ b/hw/nvram/mac_nvram.c > @@ -123,6 +123,7 @@ static void macio_nvram_class_init(ObjectClass *oc, void > *data) > dc->reset = macio_nvram_reset; > dc->vmsd = _macio_nvram; > dc->props = macio_nvram_properties; > +set_bit(DEVICE_CATEGORY_MISC, dc->categories); > } Reviewed-by: Thomas Huth
Re: [Qemu-block] [PATCH 2/3] block: mirror - split out part of mirror_run()
Am 28.09.2015 um 05:29 hat Jeff Cody geschrieben: > This is code relocation, to pull the part of mirror_run() that > calls mirror_iteration out into a separate function. > > Signed-off-by: Jeff Cody> --- > block/mirror.c | 206 > ++--- > 1 file changed, 110 insertions(+), 96 deletions(-) > > diff --git a/block/mirror.c b/block/mirror.c > index 8b3e89b..405e5c4 100644 > --- a/block/mirror.c > +++ b/block/mirror.c > @@ -318,6 +318,115 @@ static uint64_t coroutine_fn > mirror_iteration(MirrorBlockJob *s) > return delay_ns; > } > > +static int mirror_do_iteration(MirrorBlockJob *s, uint64_t last_pause_ns) > +{ > +int ret; > + > +BlockDriverState *bs = s->common.bs; > + > +for (;;) { > +uint64_t delay_ns = 0; > +int64_t cnt; > +bool should_complete; > + > +if (s->ret < 0) { > +ret = s->ret; > +goto immediate_exit; > +} > + > +cnt = bdrv_get_dirty_count(s->dirty_bitmap); > +/* s->common.offset contains the number of bytes already processed so > + * far, cnt is the number of dirty sectors remaining and > + * s->sectors_in_flight is the number of sectors currently being > + * processed; together those are the current total operation length > */ > +s->common.len = s->common.offset + > +(cnt + s->sectors_in_flight) * BDRV_SECTOR_SIZE; > + > +/* Note that even when no rate limit is applied we need to yield > + * periodically with no pending I/O so that bdrv_drain_all() returns. > + * We do so every SLICE_TIME nanoseconds, or when there is an error, > + * or when the source is clean, whichever comes first. > + */ > +if (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - last_pause_ns < > SLICE_TIME > +&& s->common.iostatus == BLOCK_DEVICE_IO_STATUS_OK) { > +if (s->in_flight == MAX_IN_FLIGHT || s->buf_free_count == 0 || > +(cnt == 0 && s->in_flight > 0)) { > +trace_mirror_yield(s, s->in_flight, s->buf_free_count, cnt); > +s->waiting_for_io = true; > +qemu_coroutine_yield(); > +s->waiting_for_io = false; > +continue; > +} else if (cnt != 0) { > +delay_ns = mirror_iteration(s); > +} > +} > + > +should_complete = false; > +if (s->in_flight == 0 && cnt == 0) { > +trace_mirror_before_flush(s); > +ret = bdrv_flush(s->target); > +if (ret < 0) { > +if (mirror_error_action(s, false, -ret) == > +BLOCK_ERROR_ACTION_REPORT) { > +goto immediate_exit; > +} > +} else { > +/* We're out of the streaming phase. From now on, if the job > + * is cancelled we will actually complete all pending I/O and > + * report completion. This way, block-job-cancel will leave > + * the target in a consistent state. > + */ > +if (!s->synced) { > +block_job_event_ready(>common); > +s->synced = true; > +} > + > +should_complete = s->should_complete || > +block_job_is_cancelled(>common); > +cnt = bdrv_get_dirty_count(s->dirty_bitmap); > +} > +} > + > +if (cnt == 0 && should_complete) { > +/* The dirty bitmap is not updated while operations are pending. > + * If we're about to exit, wait for pending operations before > + * calling bdrv_get_dirty_count(bs), or we may exit while the > + * source has dirty data to copy! > + * > + * Note that I/O can be submitted by the guest while > + * mirror_populate runs. > + */ > +trace_mirror_before_drain(s, cnt); > +bdrv_drain(bs); > +cnt = bdrv_get_dirty_count(s->dirty_bitmap); > +} > + > +ret = 0; > +trace_mirror_before_sleep(s, cnt, s->synced, delay_ns); > +if (!s->synced) { > +block_job_sleep_ns(>common, QEMU_CLOCK_REALTIME, delay_ns); > +if (block_job_is_cancelled(>common)) { > +break; > +} > +} else if (!should_complete) { > +delay_ns = (s->in_flight == 0 && cnt == 0 ? SLICE_TIME : 0); > +block_job_sleep_ns(>common, QEMU_CLOCK_REALTIME, delay_ns); > +} else if (cnt == 0) { > +/* The two disks are in sync. Exit and report successful > + * completion. > + */ > +assert(QLIST_EMPTY(>tracked_requests)); > +s->common.cancelled = false; > +break; > +} > +last_pause_ns =
Re: [Qemu-block] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present
Am 28.09.2015 um 05:29 hat Jeff Cody geschrieben: > During mirror, if the target device does not have support zero > initialization, a mirror may result in a corrupt image. I think you want to check this sentence. ("During mirror [...], a mirror may result [...]") > For instance, on mirror to a host device with format = raw, whatever > random data is on the target device will still be there for unallocated > sectors. > > This is because during the mirror, we set the dirty bitmap to copy only > sectors allocated above 'base'. In the case of target devices where we > cannot assume unallocated sectors will be read as zeroes, we need to > explicitely zero out this data. > > In order to avoid zeroing out all sectors of the target device prior to > mirroring, we do zeroing as part of the block job. A second dirty > bitmap cache is created, to track sectors that are unallocated above > 'base'. These sectors are then checked for status of BDRV_BLOCK_ZERO > on the target - if they are not, then zeroes are explicitly written. Why do you need a bitmap? You never change the bitmap after initialising it, so couldn't you instead just check the allocation status when you need it? In fact, why do we need two passes? I would have expected that commit dcfb3beb already does the trick, with checking allocation status and writing zeroes during the normal single pass. If that commit fails to solve the problem, I guess I first need to understand why before I can continue reviewing this one... > This only occurs under two conditions: > > 1. 'mode' != "existing" > 2. bdrv_has_zero_init(target) == NULL > > We perform the mirroring through mirror_iteration() as before, except > in two passes. If the above two conditions are met, the first pass > is using the bitmap tracking unallocated sectors, to write the needed > zeroes. Then, the second pass is performed, to mirror the actual data > as before. > > If the above two conditions are not met, then the first pass is skipped, > and only the second pass (the one with the actual data) is performed. > > Signed-off-by: Jeff Cody> --- > block/mirror.c| 109 > ++ > blockdev.c| 2 +- > include/block/block_int.h | 3 +- > qapi/block-core.json | 6 ++- > 4 files changed, 87 insertions(+), 33 deletions(-) > > diff --git a/block/mirror.c b/block/mirror.c > index 405e5c4..b599176 100644 > --- a/block/mirror.c > +++ b/block/mirror.c > @@ -50,7 +50,9 @@ typedef struct MirrorBlockJob { > int64_t bdev_length; > unsigned long *cow_bitmap; > BdrvDirtyBitmap *dirty_bitmap; > -HBitmapIter hbi; > +HBitmapIter zero_hbi; > +HBitmapIter allocated_hbi; > +HBitmapIter *hbi; > uint8_t *buf; > QSIMPLEQ_HEAD(, MirrorBuffer) buf_free; > int buf_free_count; > @@ -60,6 +62,8 @@ typedef struct MirrorBlockJob { > int sectors_in_flight; > int ret; > bool unmap; > +bool zero_unallocated; > +bool zero_cycle; > bool waiting_for_io; > } MirrorBlockJob; > > @@ -166,10 +170,10 @@ static uint64_t coroutine_fn > mirror_iteration(MirrorBlockJob *s) > int pnum; > int64_t ret; > > -s->sector_num = hbitmap_iter_next(>hbi); > +s->sector_num = hbitmap_iter_next(s->hbi); > if (s->sector_num < 0) { > -bdrv_dirty_iter_init(s->dirty_bitmap, >hbi); > -s->sector_num = hbitmap_iter_next(>hbi); > +bdrv_dirty_iter_init(s->dirty_bitmap, s->hbi); > +s->sector_num = hbitmap_iter_next(s->hbi); > trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap)); > assert(s->sector_num >= 0); > } > @@ -287,7 +291,7 @@ static uint64_t coroutine_fn > mirror_iteration(MirrorBlockJob *s) > */ > if (next_sector > hbitmap_next_sector > && bdrv_get_dirty(source, s->dirty_bitmap, next_sector)) { > -hbitmap_next_sector = hbitmap_iter_next(>hbi); > +hbitmap_next_sector = hbitmap_iter_next(s->hbi); > } > > next_sector += sectors_per_chunk; > @@ -300,25 +304,34 @@ static uint64_t coroutine_fn > mirror_iteration(MirrorBlockJob *s) > s->sectors_in_flight += nb_sectors; > trace_mirror_one_iteration(s, sector_num, nb_sectors); > > -ret = bdrv_get_block_status_above(source, NULL, sector_num, > - nb_sectors, ); > -if (ret < 0 || pnum < nb_sectors || > -(ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) { > -bdrv_aio_readv(source, sector_num, >qiov, nb_sectors, > - mirror_read_complete, op); > -} else if (ret & BDRV_BLOCK_ZERO) { > -bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors, > - s->unmap ? BDRV_REQ_MAY_UNMAP : 0, > - mirror_write_complete, op); > +if (s->zero_cycle) { > +ret =
Re: [Qemu-block] [PATCH v2] iotests: disable core dumps in test 061
On 28.09.2015 16:23, Alberto Garcia wrote: > Commit 934659c460 disabled the supression of segmentation faults in > bash tests. The new output of test 061, however, assumes that a core > dump will be produced if a program aborts. This is not necessarily the > case because core dumps can be disabled using ulimit. > > Since we cannot guarantee that abort() will produce a core dump, we > should use SIGKILL instead (that does not produce any) and update the > test output accordingly. > > Signed-off-by: Alberto Garcia> --- > tests/qemu-iotests/061 | 8 > tests/qemu-iotests/061.out | 4 ++-- > 2 files changed, 6 insertions(+), 6 deletions(-) Reviewed-by: Max Reitz signature.asc Description: OpenPGP digital signature
[Qemu-block] [PATCH v2] iotests: disable core dumps in test 061
Commit 934659c460 disabled the supression of segmentation faults in bash tests. The new output of test 061, however, assumes that a core dump will be produced if a program aborts. This is not necessarily the case because core dumps can be disabled using ulimit. Since we cannot guarantee that abort() will produce a core dump, we should use SIGKILL instead (that does not produce any) and update the test output accordingly. Signed-off-by: Alberto Garcia--- tests/qemu-iotests/061 | 8 tests/qemu-iotests/061.out | 4 ++-- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/tests/qemu-iotests/061 b/tests/qemu-iotests/061 index 1df887a..e191e65 100755 --- a/tests/qemu-iotests/061 +++ b/tests/qemu-iotests/061 @@ -58,8 +58,8 @@ echo echo "=== Testing dirty version downgrade ===" echo IMGOPTS="compat=1.1,lazy_refcounts=on" _make_test_img 64M -$QEMU_IO -c "write -P 0x2a 0 128k" -c flush -c abort "$TEST_IMG" 2>&1 \ -| _filter_qemu_io +$QEMU_IO -c "write -P 0x2a 0 128k" -c flush \ + -c "sigraise $(kill -l KILL)" "$TEST_IMG" 2>&1 | _filter_qemu_io $PYTHON qcow2.py "$TEST_IMG" dump-header $QEMU_IMG amend -o "compat=0.10" "$TEST_IMG" $PYTHON qcow2.py "$TEST_IMG" dump-header @@ -92,8 +92,8 @@ echo echo "=== Testing dirty lazy_refcounts=off ===" echo IMGOPTS="compat=1.1,lazy_refcounts=on" _make_test_img 64M -$QEMU_IO -c "write -P 0x2a 0 128k" -c flush -c abort "$TEST_IMG" 2>&1 \ -| _filter_qemu_io +$QEMU_IO -c "write -P 0x2a 0 128k" -c flush \ + -c "sigraise $(kill -l KILL)" "$TEST_IMG" 2>&1 | _filter_qemu_io $PYTHON qcow2.py "$TEST_IMG" dump-header $QEMU_IMG amend -o "lazy_refcounts=off" "$TEST_IMG" $PYTHON qcow2.py "$TEST_IMG" dump-header diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out index a683f46..b16bea9 100644 --- a/tests/qemu-iotests/061.out +++ b/tests/qemu-iotests/061.out @@ -57,7 +57,7 @@ No errors were found on the image. Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 wrote 131072/131072 bytes at offset 0 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) -./common.config: Aborted (core dumped) ( exec "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@" ) +./common.config: Killed ( exec "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@" ) magic 0x514649fb version 3 backing_file_offset 0x0 @@ -215,7 +215,7 @@ No errors were found on the image. Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 wrote 131072/131072 bytes at offset 0 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) -./common.config: Aborted (core dumped) ( exec "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@" ) +./common.config: Killed ( exec "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@" ) magic 0x514649fb version 3 backing_file_offset 0x0 -- 2.5.3
Re: [Qemu-block] [Qemu-devel] [PATCH 06/10][TRIVIAL] macio-ide: add to storage category
On 09/26/2015 12:22 PM, Laurent Vivier wrote: > macio-ide is an IDE controller, so add it > to the storage category. > > Signed-off-by: Laurent Vivier> --- > hw/ide/macio.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/hw/ide/macio.c b/hw/ide/macio.c > index 66ac2ba..893c9b9 100644 > --- a/hw/ide/macio.c > +++ b/hw/ide/macio.c > @@ -590,6 +590,7 @@ static void macio_ide_class_init(ObjectClass *oc, void > *data) > dc->realize = macio_ide_realizefn; > dc->reset = macio_ide_reset; > dc->vmsd = _pmac; > +set_bit(DEVICE_CATEGORY_STORAGE, dc->categories); > } > > static const TypeInfo macio_ide_type_info = { > Thanks, Reviewed-by: John Snow I'll leave this to be staged by whomever stages the entire series. --js
Re: [Qemu-block] [PATCH 1/3] block: allow creation of detached dirty bitmaps
On Sun, Sep 27, 2015 at 11:29:16PM -0400, Jeff Cody wrote: > This allows the creation of detached dirty bitmaps, so that the > block driver dirty bitmaps can be used without inserting the > bitmap into the dirty bitmap list for a BDS. > > To free a bitmap that was created "detached = true", call > bdrv_release_dirty_bitmap() with the BlockDriverState argument > as NULL. I wonder if just disabling the bitmap with bdrv_disable_dirty_bitmap() is enough?
Re: [Qemu-block] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present
Am 28.09.2015 um 05:29 hat Jeff Cody geschrieben: > During mirror, if the target device does not have support zero > initialization, a mirror may result in a corrupt image. > > For instance, on mirror to a host device with format = raw, whatever > random data is on the target device will still be there for unallocated > sectors. > > This is because during the mirror, we set the dirty bitmap to copy only > sectors allocated above 'base'. In the case of target devices where we > cannot assume unallocated sectors will be read as zeroes, we need to > explicitely zero out this data. > > In order to avoid zeroing out all sectors of the target device prior to > mirroring, we do zeroing as part of the block job. A second dirty > bitmap cache is created, to track sectors that are unallocated above > 'base'. These sectors are then checked for status of BDRV_BLOCK_ZERO > on the target - if they are not, then zeroes are explicitly written. > > This only occurs under two conditions: > > 1. 'mode' != "existing" > 2. bdrv_has_zero_init(target) == NULL > > We perform the mirroring through mirror_iteration() as before, except > in two passes. If the above two conditions are met, the first pass > is using the bitmap tracking unallocated sectors, to write the needed > zeroes. Then, the second pass is performed, to mirror the actual data > as before. > > If the above two conditions are not met, then the first pass is skipped, > and only the second pass (the one with the actual data) is performed. > > Signed-off-by: Jeff CodyAlso, this makes qemu-iotests 097 fail for me. Kevin
Re: [Qemu-block] [PATCH 03/18] qemu-thread: introduce QemuLockCnt
On 28/09/2015 12:15, Stefan Hajnoczi wrote: > On Thu, Aug 06, 2015 at 03:36:01PM +0200, Paolo Bonzini wrote: >> > +int qemu_lockcnt_count(QemuLockCnt *lockcnt); > Why use int here when the counter field is unsigned? Nice catch, will fix. Paolo
Re: [Qemu-block] [PATCH 03/18] qemu-thread: introduce QemuLockCnt
On Thu, Aug 06, 2015 at 03:36:01PM +0200, Paolo Bonzini wrote: > +int qemu_lockcnt_count(QemuLockCnt *lockcnt); Why use int here when the counter field is unsigned?
Re: [Qemu-block] [Qemu-devel] [PATCH v2 00/11] aio: Introduce handler type to fix nested aio_poll for dataplane
On Fri, Sep 11, 2015 at 02:22:14PM +0200, Kevin Wolf wrote: > Am 11.09.2015 um 13:46 hat Fam Zheng geschrieben: > > On Fri, 09/11 12:39, Kevin Wolf wrote: > > > Am 29.07.2015 um 06:42 hat Fam Zheng geschrieben: > > > > v2: Switch to disable/enable model. [Paolo] > > > > > > > > Most existing nested aio_poll()'s in block layer are inconsiderate of > > > > dispatching potential new r/w requests from ioeventfds and nbd exports, > > > > which > > > > might result in responsiveness issues (e.g. bdrv_drain_all will not > > > > return when > > > > new requests keep coming), or even wrong semantics (e.g. > > > > qmp_transaction cannot > > > > enforce atomicity due to aio_poll in bdrv_drain_all). > > > > > > > > Previous attampts to address this issue include new op blocker[1], > > > > bdrv_lock[2] > > > > and nested AioContext (patches not posted to qemu-devel). > > > > > > > > This approach is based on the idea proposed by Paolo Bonzini. The > > > > original idea > > > > is introducing "aio_context_disable_client / aio_context_enable_client > > > > to > > > > filter AioContext handlers according to the "client", e.g. > > > > AIO_CLIENT_DATAPLANE (ioeventfd), AIO_CLIENT_PROTOCOL, > > > > AIO_CLIENT_NBD_SERVER, > > > > AIO_CLIENT_CONTEXT, ... Extend aio_set_{event_notifier,fd}_handler to > > > > pass a > > > > client (type)." > > > > > > > > After this series, block layer aio_poll() will only process those > > > > "protocol" > > > > fds that are used in block I/O, plus the ctx->notifier for > > > > aio_notify(); other > > > > aio_poll()'s keep unchanged. > > > > > > > > The biggest advantage over approaches [1] and [2] is, no change is > > > > needed in > > > > virtio-{blk,scsi}-dataplane code, also this doesn't depend on > > > > converting QMP to > > > > coroutines. > > > > > > It seems that I haven't replied on the mailing list yet, even though I > > > think I already said this in person at KVM Forum: This series fixes only > > > a special case of the real problem, which is that bdrv_drain/all at a > > > single point doesn't make a lot of sense, but needs to be replaced by a > > > whole section with exclusive access, like a bdrv_drained_begin/end pair. > > > > > > To be clear: Anything that works with types of users instead of > > > individual users is bound to fall short of being a complete solution. I > > > don't prefer partial solutions when we know there is a bigger problem. > > > > > > This series addresses your immediate need of protecting against new data > > > plane requests, which it arguably achieves. The second case I always > > > have in mind is Berto's case where he has multiple streaming block jobs > > > in the same backing file chain [1]. > > > > > > This involves a bdrv_reopen() of the target BDS to make it writable, and > > > bdrv_reopen() uses bdrv_drain_all() so drivers don't have to cope with > > > running requests while reopening themselves. It can however involve > > > nested event loops for synchronous operations like bdrv_flush(), and if > > > those can process completions of block jobs, which can respond by doing > > > anything to the respective node, things can go wrong. > > > > Just to get a better idea of bdrv_drained_begin/end, could you explain how > > to > > use the pair to fix the above problem? > > How to use it is easy part: In bdrv_reopen_multiple(), you would replace > the existing bdrv_drain_all() with begin and you would add the > corresponding end right before the return statement. > > > > You don't solve this by adding client types (then problematic request > > > would be PROTOCOL in your proposal and you can never exclude that), but > > > you really need to have bdrv_drained_being/end pairs, where only > > > requests issued in between are processed and everything else waits. > > > > What do you mean by "only requests issued in between are processed"? Where > > are > > the requests from? > > Generally speaking, you would have code that looks like this: > > bdrv_drain_begin() > ... > bdrv_something_synchronous() > ... > bdrv_drain_end() > > You want to process everything that is necessary for completing > bdrv_something_synchronous(), but nothing else. > > The trickier question is how to implement this. I know that it's much > easier to say that your series doesn't work than actually proposing > something else that works... > > One relatively obvious answer we found when we discussed this a while > back was some kind of a recursive CoRwLock (reader = in-flight request; > writer = drained section), but that requires obviously that you're > running in a coroutine if you want to do something with a drained > request queue. At one point I thought about converting vl.c:main() to run everything in a coroutine. It would allow us to eliminate synchronous operations completely. The problem is that it's quite a big change that requires protecting formerly synchronous operations against concurrency. Stefan
Re: [Qemu-block] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present
On Sun, Sep 27, 2015 at 11:29:18PM -0400, Jeff Cody wrote: > +if (s->zero_cycle) { > +ret = bdrv_get_block_status(s->target, sector_num, nb_sectors, > ); > +if (!(ret & BDRV_BLOCK_ZERO)) { > +bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors, > + s->unmap ? BDRV_REQ_MAY_UNMAP : 0, > + mirror_write_complete, op); mirror_write_complete will advance s->common.offset. Won't the progress be incorrect if we do that for both zeroing and regular mirroring?
Re: [Qemu-block] [Qemu-devel] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present
On Sep 28, 2015 4:12 PM, "Paolo Bonzini"wrote: > > Replying from mobile; please excuse any formatting issues. > > On 28/09/2015 05:29, Jeff Cody wrote: > > This only occurs under two conditions: > > > > 1. 'mode' != "existing" > > 2. bdrv_has_zero_init(target) == NULL > > > > I'm not sure if mode != "existing" actually matters. I think what > actually matters is sync == "full". > > The reasons are: > > 1) with sync != "full", unallocated target sectors should remain > unallocated on the destination because they are supposed to point to the > backing file. I guess that makes sense. What about the case when the target is a raw device without zero init? There is no backing file... Of course, perhaps in the raw case the user should be using sync==full anyways. > > 2) even with mode == "existing" you expect the data to be consistent at > the end of the mirroring > The reason I added the "existing" exception was so the user could avoid the time penalty of zeroing out the data if they knew the target had already explicitly been zeroed. Do you think it is fair to assume that if the user specified existing, that they take responsibility for setting up the target image how they like (including data initialization)? Or should we add another option for mirror, to allow the user to bypass the zero fill? Thanks, Jeff
Re: [Qemu-block] [PATCH 2/3] block: mirror - split out part of mirror_run()
On 28.09.2015 05:29, Jeff Cody wrote: > This is code relocation, to pull the part of mirror_run() that > calls mirror_iteration out into a separate function. > > Signed-off-by: Jeff Cody> --- > block/mirror.c | 206 > ++--- > 1 file changed, 110 insertions(+), 96 deletions(-) Reviewed-by: Max Reitz signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 1/3] block: allow creation of detached dirty bitmaps
On 28.09.2015 05:29, Jeff Cody wrote: > This allows the creation of detached dirty bitmaps, so that the > block driver dirty bitmaps can be used without inserting the > bitmap into the dirty bitmap list for a BDS. > > To free a bitmap that was created "detached = true", call > bdrv_release_dirty_bitmap() with the BlockDriverState argument > as NULL. > > Signed-off-by: Jeff Cody> --- > block.c | 26 -- > block/mirror.c| 3 ++- > blockdev.c| 2 +- > include/block/block.h | 1 + > migration/block.c | 2 +- > 5 files changed, 25 insertions(+), 9 deletions(-) Reviewed-by: Max Reitz signature.asc Description: OpenPGP digital signature
[Qemu-block] [PATCH] qtest/ide-test: ppc64be correction for ATAPI tests
the 16bit ide data register is LE by definition. Signed-off-by: John Snow--- tests/ide-test.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tests/ide-test.c b/tests/ide-test.c index 5594738..b6e9e1a 100644 --- a/tests/ide-test.c +++ b/tests/ide-test.c @@ -633,7 +633,7 @@ static void send_scsi_cdb_read10(uint64_t lba, int nblocks) /* Send Packet */ for (i = 0; i < sizeof(Read10CDB)/2; i++) { -outw(IDE_BASE + reg_data, ((uint16_t *))[i]); +outw(IDE_BASE + reg_data, cpu_to_le16(((uint16_t *))[i])); } } @@ -733,7 +733,7 @@ static void cdrom_pio_impl(int nblocks) size_t offset = i * (limit / 2); size_t rem = (rxsize / 2) - offset; for (j = 0; j < MIN((limit / 2), rem); j++) { -rx[offset + j] = inw(IDE_BASE + reg_data); +rx[offset + j] = le16_to_cpu(inw(IDE_BASE + reg_data)); } ide_wait_intr(IDE_PRIMARY_IRQ); } -- 2.4.3
Re: [Qemu-block] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present
On 28.09.2015 05:29, Jeff Cody wrote: > During mirror, if the target device does not have support zero > initialization, a mirror may result in a corrupt image. > > For instance, on mirror to a host device with format = raw, whatever > random data is on the target device will still be there for unallocated > sectors. > > This is because during the mirror, we set the dirty bitmap to copy only > sectors allocated above 'base'. In the case of target devices where we > cannot assume unallocated sectors will be read as zeroes, we need to > explicitely zero out this data. > > In order to avoid zeroing out all sectors of the target device prior to > mirroring, we do zeroing as part of the block job. A second dirty > bitmap cache is created, to track sectors that are unallocated above > 'base'. These sectors are then checked for status of BDRV_BLOCK_ZERO > on the target - if they are not, then zeroes are explicitly written. > > This only occurs under two conditions: > > 1. 'mode' != "existing" > 2. bdrv_has_zero_init(target) == NULL > > We perform the mirroring through mirror_iteration() as before, except > in two passes. If the above two conditions are met, the first pass > is using the bitmap tracking unallocated sectors, to write the needed > zeroes. Then, the second pass is performed, to mirror the actual data > as before. > > If the above two conditions are not met, then the first pass is skipped, > and only the second pass (the one with the actual data) is performed. > > Signed-off-by: Jeff Cody> --- > block/mirror.c| 109 > ++ > blockdev.c| 2 +- > include/block/block_int.h | 3 +- > qapi/block-core.json | 6 ++- > 4 files changed, 87 insertions(+), 33 deletions(-) > > diff --git a/block/mirror.c b/block/mirror.c > index 405e5c4..b599176 100644 > --- a/block/mirror.c > +++ b/block/mirror.c > @@ -50,7 +50,9 @@ typedef struct MirrorBlockJob { > int64_t bdev_length; > unsigned long *cow_bitmap; > BdrvDirtyBitmap *dirty_bitmap; > -HBitmapIter hbi; > +HBitmapIter zero_hbi; > +HBitmapIter allocated_hbi; > +HBitmapIter *hbi; > uint8_t *buf; > QSIMPLEQ_HEAD(, MirrorBuffer) buf_free; > int buf_free_count; > @@ -60,6 +62,8 @@ typedef struct MirrorBlockJob { > int sectors_in_flight; > int ret; > bool unmap; > +bool zero_unallocated; > +bool zero_cycle; > bool waiting_for_io; > } MirrorBlockJob; > > @@ -166,10 +170,10 @@ static uint64_t coroutine_fn > mirror_iteration(MirrorBlockJob *s) > int pnum; > int64_t ret; > > -s->sector_num = hbitmap_iter_next(>hbi); > +s->sector_num = hbitmap_iter_next(s->hbi); > if (s->sector_num < 0) { > -bdrv_dirty_iter_init(s->dirty_bitmap, >hbi); > -s->sector_num = hbitmap_iter_next(>hbi); > +bdrv_dirty_iter_init(s->dirty_bitmap, s->hbi); > +s->sector_num = hbitmap_iter_next(s->hbi); > trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap)); > assert(s->sector_num >= 0); > } > @@ -287,7 +291,7 @@ static uint64_t coroutine_fn > mirror_iteration(MirrorBlockJob *s) > */ > if (next_sector > hbitmap_next_sector > && bdrv_get_dirty(source, s->dirty_bitmap, next_sector)) { > -hbitmap_next_sector = hbitmap_iter_next(>hbi); > +hbitmap_next_sector = hbitmap_iter_next(s->hbi); > } > > next_sector += sectors_per_chunk; > @@ -300,25 +304,34 @@ static uint64_t coroutine_fn > mirror_iteration(MirrorBlockJob *s) > s->sectors_in_flight += nb_sectors; > trace_mirror_one_iteration(s, sector_num, nb_sectors); > > -ret = bdrv_get_block_status_above(source, NULL, sector_num, > - nb_sectors, ); > -if (ret < 0 || pnum < nb_sectors || > -(ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) { > -bdrv_aio_readv(source, sector_num, >qiov, nb_sectors, > - mirror_read_complete, op); > -} else if (ret & BDRV_BLOCK_ZERO) { > -bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors, > - s->unmap ? BDRV_REQ_MAY_UNMAP : 0, > - mirror_write_complete, op); > +if (s->zero_cycle) { > +ret = bdrv_get_block_status(s->target, sector_num, nb_sectors, > ); > +if (!(ret & BDRV_BLOCK_ZERO)) { > +bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors, > + s->unmap ? BDRV_REQ_MAY_UNMAP : 0, > + mirror_write_complete, op); > +} > } else { > -assert(!(ret & BDRV_BLOCK_DATA)); > -bdrv_aio_discard(s->target, sector_num, op->nb_sectors, > - mirror_write_complete, op); > +ret = bdrv_get_block_status_above(source, NULL, sector_num, > +
Re: [Qemu-block] [Qemu-devel] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present
On 09/28/2015 08:13 AM, Paolo Bonzini wrote: > > > On 28/09/2015 05:29, Jeff Cody wrote: >> This only occurs under two conditions: >> >> 1. 'mode' != "existing" >> 2. bdrv_has_zero_init(target) == NULL >> > > I'm not sure if mode != "existing" actually matters. I think what > actually matters is sync == "full". When mode == 'existing' for a shallow mirror (sync != 'full'), that is the caller stating that the guest-visible contents of the destination match the guest-visible contents of the backing image. The only sectors to be copied are those that differ from the backing file, and we should not be zeroing unrelated sectors because the user has already promised they have the same guest-visible content as the backing image would report. When mode == 'existing' for a full mirror (sync == 'full'), that is the caller stating that they want every single sector of the destination written to hold the current state of the source (of course, allowing for optimizations such as skipping the write where the contents will read back the same as if the write had been performed). I think Paolo is right: we care about zeroing unallocated sectors for sync == 'full', regardless of whether mode == 'existing'. I also think the reason Jeff confused it for mode == 'existing' is that the other modes let qemu create the file, but qemu does not create block devices (the only way to mirror to a block device is via mode == 'existing'), and it is primarily block devices where zero init is not guaranteed. > > The reasons are: > > 1) with sync != "full", unallocated target sectors should remain > unallocated on the destination because they are supposed to point to the > backing file. > > 2) even with mode == "existing" you expect the data to be consistent at > the end of the mirroring > > Paolo > > -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [Qemu-devel] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present
On Sep 28, 2015 5:34 PM, "Kevin Wolf"wrote: > > Am 28.09.2015 um 05:29 hat Jeff Cody geschrieben: > > During mirror, if the target device does not have support zero > > initialization, a mirror may result in a corrupt image. > > > > For instance, on mirror to a host device with format = raw, whatever > > random data is on the target device will still be there for unallocated > > sectors. > > > > This is because during the mirror, we set the dirty bitmap to copy only > > sectors allocated above 'base'. In the case of target devices where we > > cannot assume unallocated sectors will be read as zeroes, we need to > > explicitely zero out this data. > > > > In order to avoid zeroing out all sectors of the target device prior to > > mirroring, we do zeroing as part of the block job. A second dirty > > bitmap cache is created, to track sectors that are unallocated above > > 'base'. These sectors are then checked for status of BDRV_BLOCK_ZERO > > on the target - if they are not, then zeroes are explicitly written. > > > > This only occurs under two conditions: > > > > 1. 'mode' != "existing" > > 2. bdrv_has_zero_init(target) == NULL > > > > We perform the mirroring through mirror_iteration() as before, except > > in two passes. If the above two conditions are met, the first pass > > is using the bitmap tracking unallocated sectors, to write the needed > > zeroes. Then, the second pass is performed, to mirror the actual data > > as before. > > > > If the above two conditions are not met, then the first pass is skipped, > > and only the second pass (the one with the actual data) is performed. > > > > Signed-off-by: Jeff Cody > > Also, this makes qemu-iotests 097 fail for me. > OK, thanks - I'll check that out tomorrow afternoon. I ran iotests on all the tests I thought dealt with mirror, but I must have missed that one with my grep. Jeff
Re: [Qemu-block] [Qemu-devel] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present
On Sep 28, 2015 5:31 PM, "Kevin Wolf"wrote: > (Responding from mobile phone again) > Am 28.09.2015 um 05:29 hat Jeff Cody geschrieben: > > During mirror, if the target device does not have support zero > > initialization, a mirror may result in a corrupt image. > > I think you want to check this sentence. ("During mirror [...], a > mirror may result [...]") > Yes, thanks. > > For instance, on mirror to a host device with format = raw, whatever > > random data is on the target device will still be there for unallocated > > sectors. > > > > This is because during the mirror, we set the dirty bitmap to copy only > > sectors allocated above 'base'. In the case of target devices where we > > cannot assume unallocated sectors will be read as zeroes, we need to > > explicitely zero out this data. > > > > In order to avoid zeroing out all sectors of the target device prior to > > mirroring, we do zeroing as part of the block job. A second dirty > > bitmap cache is created, to track sectors that are unallocated above > > 'base'. These sectors are then checked for status of BDRV_BLOCK_ZERO > > on the target - if they are not, then zeroes are explicitly written. > > Why do you need a bitmap? You never change the bitmap after initialising > it, so couldn't you instead just check the allocation status when you > need it? The main reason was really to maximize code reuse, and be able to use the same iteration code in the mirror coroutine. > > In fact, why do we need two passes? I would have expected that commit > dcfb3beb already does the trick, with checking allocation status and > writing zeroes during the normal single pass. > > If that commit fails to solve the problem, I guess I first need to > understand why before I can continue reviewing this one... > Responding from memory right now, but that commit only helps if the guest unmaps data, changing the sectors to unallocated after the mirror begins. However, before we get to this point we've already generated our bitmap of dirty sectors in mirror_run(), and those are explicitly only sectors that are allocated above the source. Inside the iteration, we'll only pick up the unallocated sectors if they have been changed by the guest. > > This only occurs under two conditions: > > > > 1. 'mode' != "existing" > > 2. bdrv_has_zero_init(target) == NULL > > > > We perform the mirroring through mirror_iteration() as before, except > > in two passes. If the above two conditions are met, the first pass > > is using the bitmap tracking unallocated sectors, to write the needed > > zeroes. Then, the second pass is performed, to mirror the actual data > > as before. > > > > If the above two conditions are not met, then the first pass is skipped, > > and only the second pass (the one with the actual data) is performed. > > > > Signed-off-by: Jeff Cody > > --- > > block/mirror.c| 109 ++ > > blockdev.c| 2 +- > > include/block/block_int.h | 3 +- > > qapi/block-core.json | 6 ++- > > 4 files changed, 87 insertions(+), 33 deletions(-) > > > > diff --git a/block/mirror.c b/block/mirror.c > > index 405e5c4..b599176 100644 > > --- a/block/mirror.c > > +++ b/block/mirror.c > > @@ -50,7 +50,9 @@ typedef struct MirrorBlockJob { > > int64_t bdev_length; > > unsigned long *cow_bitmap; > > BdrvDirtyBitmap *dirty_bitmap; > > -HBitmapIter hbi; > > +HBitmapIter zero_hbi; > > +HBitmapIter allocated_hbi; > > +HBitmapIter *hbi; > > uint8_t *buf; > > QSIMPLEQ_HEAD(, MirrorBuffer) buf_free; > > int buf_free_count; > > @@ -60,6 +62,8 @@ typedef struct MirrorBlockJob { > > int sectors_in_flight; > > int ret; > > bool unmap; > > +bool zero_unallocated; > > +bool zero_cycle; > > bool waiting_for_io; > > } MirrorBlockJob; > > > > @@ -166,10 +170,10 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s) > > int pnum; > > int64_t ret; > > > > -s->sector_num = hbitmap_iter_next(>hbi); > > +s->sector_num = hbitmap_iter_next(s->hbi); > > if (s->sector_num < 0) { > > -bdrv_dirty_iter_init(s->dirty_bitmap, >hbi); > > -s->sector_num = hbitmap_iter_next(>hbi); > > +bdrv_dirty_iter_init(s->dirty_bitmap, s->hbi); > > +s->sector_num = hbitmap_iter_next(s->hbi); > > trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap)); > > assert(s->sector_num >= 0); > > } > > @@ -287,7 +291,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s) > > */ > > if (next_sector > hbitmap_next_sector > > && bdrv_get_dirty(source, s->dirty_bitmap, next_sector)) { > > -hbitmap_next_sector = hbitmap_iter_next(>hbi); > > +hbitmap_next_sector = hbitmap_iter_next(s->hbi); > > } > > > > next_sector += sectors_per_chunk; > > @@ -300,25