Re: [Qemu-block] [Qemu-devel] [PATCH v10 12/14] block: add transactional properties
On 11/06/2015 01:46 PM, John Snow wrote:
>
> On 11/06/2015 03:32 AM, Kevin Wolf wrote:
>> Am 05.11.2015 um 19:52 hat John Snow geschrieben:
>>>
>>> On 11/05/2015 05:47 AM, Stefan Hajnoczi wrote:
>>>> On Tue, Nov 03, 2015 at 12:27:19PM -0500, John Snow wrote:
>>>>>
>>>>> On 11/03/2015 10:17 AM, Stefan Hajnoczi wrote:
>>>>>> On Fri, Oct 23, 2015 at 07:56:50PM -0400, John Snow wrote:
>>>>>>> @@ -1732,6 +1757,10 @@ static void block_dirty_bitmap_add_prepare(BlkActionState *common,
>>>>>>>      BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
>>>>>>>                                               common, common);
>>>>>>>
>>>>>>> +    if (action_check_cancel_mode(common, errp) < 0) {
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +
>>>>>>>      action = common->action->block_dirty_bitmap_add;
>>>>>>>      /* AIO context taken and released within qmp_block_dirty_bitmap_add */
>>>>>>>      qmp_block_dirty_bitmap_add(action->node, action->name,
>>>>>>> @@ -1767,6 +1796,10 @@ static void block_dirty_bitmap_clear_prepare(BlkActionState *common,
>>>>>>>                                               common, common);
>>>>>>>      BlockDirtyBitmap *action;
>>>>>>>
>>>>>>> +    if (action_check_cancel_mode(common, errp) < 0) {
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +
>>>>>>>      action = common->action->block_dirty_bitmap_clear;
>>>>>>>      state->bitmap = block_dirty_bitmap_lookup(action->node,
>>>>>>>                                                action->name,
>>>>>>
>>>>>> Why do the bitmap add/clear actions not support err-cancel=all?
>>>>>>
>>>>>> I understand why other block jobs don't support it, but it's not clear
>>>>>> why these non-block job actions cannot.
>>>>>
>>>>> Because they don't have a callback to invoke if the rest of the job fails.
>>>>>
>>>>> I could create a BlockJob for them complete with a callback to invoke,
>>>>> but basically it's just because there's no interface to unwind them, or
>>>>> an interface to join them with the transaction.
>>>>>
>>>>> They're small, synchronous non-job actions. Which makes them weird.
>>>>
>>>> Funny, we've been looking at the same picture while seeing different things:
>>>> https://en.wikipedia.org/wiki/Rabbit%E2%80%93duck_illusion
>>>>
>>>> I think I understand your idea: the transaction should include both
>>>> immediate actions as well as block jobs.
>>>>
>>>> My mental model was different: immediate actions commit/abort along with
>>>> the 'transaction' command. Block jobs are separate and complete/cancel
>>>> together in a group.
>>>>
>>>> In practice I think the two end up being similar because we won't be
>>>> able to implement immediate action commit/abort together with
>>>> long-running block jobs because the immediate actions rely on
>>>> quiescing/pausing the guest for atomic commit/abort.
>>>>
>>>> So with your mental model the QMP client has to submit 2 'transaction'
>>>> commands: 1 for the immediate actions, 1 for the block jobs.
>>>>
>>>> In my mental model the QMP client submits 1 command but the immediate
>>>> actions and block jobs are two separate transaction scopes. This means
>>>> if the block jobs fail, the client needs to be aware of the immediate
>>>> actions that have committed. Because of this, it becomes just as much
>>>> client effort as submitting two separate 'transaction' commands in your
>>>> model.
>>>>
>>>> Can anyone see a practical difference? I think I'm happy with John's model.
>>>>
>>>> Stefan
>>>
>>> We discussed this off-list, but for the sake of the archive:
>>>
>>> == How it is now ==
>>>
>>> Currently, transactions have two implicit phases: the first is the
>>> synchronous phase. If this phase completes successfully, we consider the
>>> transaction a success. The second phase is the asynchronous phase where
>>> jobs launched by the synchronous phase run to completion.
>>>
>>> all synchronous commands must complete for the transaction to "succeed."
>>> There are currently (pre-patch) no guarantees about asynchronous command
>>> completion. As long as all synchronous actions complete, asynchronous
>>> actions are free to succeed or fail individually.
>>
>> You're overcomplicating this. All actions are currently synchronous and
>> what you consider asynchronous transaction actions aren't actually part
>> of the transaction at all. The action is "start block job X", not "run
>> block job X".
>
> "OK," except the entire goal of the series was to allow the "aren't
> actually part of the transaction at all" parts to live and die together
> based on the transaction they were launched from. This really implies
> they belong to a second asynchronous phase of the transaction.
>
> Otherwise, why would totally unrelated jobs fail because a different one
> did?
>
> I realize this is an /extension/ of the existing semantics vs a "fix,"
> and part of the confusion is how I and everyone else was looking at it
> differently. How cou
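The `action_check_cancel_mode()` gate quoted in the patch above can be sketched stand-alone. This is only an illustration of the shape of the check discussed in the thread; the enum values, struct layout, and error string below are assumptions for this sketch, not QEMU's actual definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed completion modes from the err-cancel discussion:
 * "none" keeps today's semantics (actions succeed or fail individually
 * once the synchronous phase is done); "all" would require every action
 * to be cancellable as a group. */
typedef enum { COMPLETION_MODE_NONE, COMPLETION_MODE_ALL } CompletionMode;

typedef struct BlkActionState {
    CompletionMode mode;
    const char *errmsg;   /* stand-in for Error **errp */
} BlkActionState;

/* Small synchronous actions (bitmap add/clear) have no callback to
 * unwind them later, so they must refuse any grouped-cancel mode. */
static int action_check_cancel_mode(BlkActionState *s)
{
    if (s->mode != COMPLETION_MODE_NONE) {
        s->errmsg = "Action does not support transactional cancellation";
        return -1;
    }
    return 0;
}
```

The real prepare callbacks in the patch call this first and bail out early, so an unsupported mode fails the whole transaction during the synchronous phase.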
Re: [Qemu-block] [Qemu-devel] [PATCH v4 0/3] qemu-io: clean up cvtnum usage
On 11/06/2015 04:46 AM, Kevin Wolf wrote:
> Am 06.11.2015 um 00:53 hat John Snow geschrieben:
>> cvtnum returns an int64_t, not an int, so correct the lvalue types
>> wherever it is used. While we're at it, make the error messages more
>> meaningful and hopefully less confusing.
>>
>> v4:
>>  - Now missing ALL sweaters
>>
>> v3:
>>  - pulled a lot of loose yarn, now missing my sweater
>>    (Updated patch 1 even further, reported-by Kevin)
>>
>> v2:
>>  - Squashed NSIG error-checking from patch 3 into patch 1
>>  - Reported-by credits for Max and Reviewed-by from Eric added
>
> Thanks, applied to the block branch. (Should we mention in the changelog
> that qemu 2.5 contains some of your sweaters?)
>
> Kevin

If you want to destroy my sweater, NACK this patch as I walk away (as I walk away)

--js
Re: [Qemu-block] [PATCH v6 03/15] block: Release dirty bitmaps in bdrv_close()
On 11/04/2015 01:57 PM, Max Reitz wrote:
> bdrv_delete() is not very happy about deleting BlockDriverStates with
> dirty bitmaps still attached to them. In the past, we got around that
> very easily by relying on bdrv_close_all() bypassing bdrv_delete(), and
> bdrv_close() simply ignoring that condition. We should fix that by
> releasing all dirty bitmaps in bdrv_close() and drop the assertion in
> bdrv_delete().
>
> Signed-off-by: Max Reitz
> ---
>  block.c | 20 +++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/block.c b/block.c
> index 3493501..23448ed 100644
> --- a/block.c
> +++ b/block.c
> @@ -87,6 +87,8 @@ static int bdrv_open_inherit(BlockDriverState **pbs, const char *filename,
>                               const BdrvChildRole *child_role, Error **errp);
>
>  static void bdrv_dirty_bitmap_truncate(BlockDriverState *bs);
> +static void bdrv_release_all_dirty_bitmaps(BlockDriverState *bs);
> +
>  /* If non-zero, use only whitelisted block drivers */
>  static int use_bdrv_whitelist;
>
> @@ -1916,6 +1918,8 @@ void bdrv_close(BlockDriverState *bs)
>      bdrv_drain(bs); /* in case flush left pending I/O */
>      notifier_list_notify(&bs->close_notifiers, bs);
>
> +    bdrv_release_all_dirty_bitmaps(bs);
> +
>      if (bs->blk) {
>          blk_dev_change_media_cb(bs->blk, false);
>      }
> @@ -2123,7 +2127,6 @@ static void bdrv_delete(BlockDriverState *bs)
>      assert(!bs->job);
>      assert(bdrv_op_blocker_is_empty(bs));
>      assert(!bs->refcnt);
> -    assert(QLIST_EMPTY(&bs->dirty_bitmaps));
>
>      bdrv_close(bs);
>
> @@ -3318,6 +3321,21 @@ void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap)
>      }
>  }
>
> +/**
> + * Release all dirty bitmaps attached to a BDS, independently of whether they
> + * are frozen or not (for use in bdrv_close()).
> + */

This comment caught my attention ...

> +static void bdrv_release_all_dirty_bitmaps(BlockDriverState *bs)
> +{
> +    BdrvDirtyBitmap *bm, *next;
> +    QLIST_FOREACH_SAFE(bm, &bs->dirty_bitmaps, list, next) {
> +        QLIST_REMOVE(bm, list);
> +        hbitmap_free(bm->bitmap);
> +        g_free(bm->name);
> +        g_free(bm);
> +    }
> +}
> +
>  void bdrv_disable_dirty_bitmap(BdrvDirtyBitmap *bitmap)
>  {
>      assert(!bdrv_dirty_bitmap_frozen(bitmap));
>

Currently, the only way a bitmap could/should be frozen is if it is in use by a job. If a job is running, you probably shouldn't delete stuff it is using out from under it.

I am assuming by now that it's actually likely that you've canceled any jobs attached to this node before you go through the motions of deleting it, and it should be safe to just call bdrv_release_dirty_bitmap ...

We don't want the two foreach loops though, so maybe just factor out a helper that bdrv_release_all_dirty_bitmaps and bdrv_release_dirty_bitmap can share. You can leave the frozen assertion in just bdrv_release_dirty_bitmap before it invokes the helper, so that bdrv_delete always succeeds in case something gets out-of-sync in the internals.

(Hmm, to prevent leaks, if you delete a bitmap that is frozen, you should probably call the helper on its child object too... or just make the helper recursive.)
Re: [Qemu-block] [Qemu-devel] [PATCH v10 12/14] block: add transactional properties
On 11/06/2015 03:32 AM, Kevin Wolf wrote:
> Am 05.11.2015 um 19:52 hat John Snow geschrieben:
>>
>> On 11/05/2015 05:47 AM, Stefan Hajnoczi wrote:
>>> On Tue, Nov 03, 2015 at 12:27:19PM -0500, John Snow wrote:
>>>>
>>>> On 11/03/2015 10:17 AM, Stefan Hajnoczi wrote:
>>>>> On Fri, Oct 23, 2015 at 07:56:50PM -0400, John Snow wrote:
>>>>>> @@ -1732,6 +1757,10 @@ static void block_dirty_bitmap_add_prepare(BlkActionState *common,
>>>>>>      BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
>>>>>>                                               common, common);
>>>>>>
>>>>>> +    if (action_check_cancel_mode(common, errp) < 0) {
>>>>>> +        return;
>>>>>> +    }
>>>>>> +
>>>>>>      action = common->action->block_dirty_bitmap_add;
>>>>>>      /* AIO context taken and released within qmp_block_dirty_bitmap_add */
>>>>>>      qmp_block_dirty_bitmap_add(action->node, action->name,
>>>>>> @@ -1767,6 +1796,10 @@ static void block_dirty_bitmap_clear_prepare(BlkActionState *common,
>>>>>>                                               common, common);
>>>>>>      BlockDirtyBitmap *action;
>>>>>>
>>>>>> +    if (action_check_cancel_mode(common, errp) < 0) {
>>>>>> +        return;
>>>>>> +    }
>>>>>> +
>>>>>>      action = common->action->block_dirty_bitmap_clear;
>>>>>>      state->bitmap = block_dirty_bitmap_lookup(action->node,
>>>>>>                                                action->name,
>>>>>
>>>>> Why do the bitmap add/clear actions not support err-cancel=all?
>>>>>
>>>>> I understand why other block jobs don't support it, but it's not clear
>>>>> why these non-block job actions cannot.
>>>>>
>>>> Because they don't have a callback to invoke if the rest of the job fails.
>>>>
>>>> I could create a BlockJob for them complete with a callback to invoke,
>>>> but basically it's just because there's no interface to unwind them, or
>>>> an interface to join them with the transaction.
>>>>
>>>> They're small, synchronous non-job actions. Which makes them weird.
>>>
>>> Funny, we've been looking at the same picture while seeing different things:
>>> https://en.wikipedia.org/wiki/Rabbit%E2%80%93duck_illusion
>>>
>>> I think I understand your idea: the transaction should include both
>>> immediate actions as well as block jobs.
>>>
>>> My mental model was different: immediate actions commit/abort along with
>>> the 'transaction' command. Block jobs are separate and complete/cancel
>>> together in a group.
>>>
>>> In practice I think the two end up being similar because we won't be
>>> able to implement immediate action commit/abort together with
>>> long-running block jobs because the immediate actions rely on
>>> quiescing/pausing the guest for atomic commit/abort.
>>>
>>> So with your mental model the QMP client has to submit 2 'transaction'
>>> commands: 1 for the immediate actions, 1 for the block jobs.
>>>
>>> In my mental model the QMP client submits 1 command but the immediate
>>> actions and block jobs are two separate transaction scopes. This means
>>> if the block jobs fail, the client needs to be aware of the immediate
>>> actions that have committed. Because of this, it becomes just as much
>>> client effort as submitting two separate 'transaction' commands in your
>>> model.
>>>
>>> Can anyone see a practical difference? I think I'm happy with John's model.
>>>
>>> Stefan
>>>
>> We discussed this off-list, but for the sake of the archive:
>>
>> == How it is now ==
>>
>> Currently, transactions have two implicit phases: the first is the
>> synchronous phase. If this phase completes successfully, we consider the
>> transaction a success. The second phase is the asynchronous phase where
>> jobs launched by the synchronous phase run to completion.
>>
>> all synchronous commands must complete for the transaction to "succeed."
>> There are currently (pre-patch) no guarantees about asynchronous command
>> completion. As long as all synchronous actions complete, asynchronous
>> actions are free to succeed or fail individually.
>
> You're overcomplicating this. All actions are currently synchronous and
> what you consider asynchronous transaction actions aren't actually part
> of the transaction at all. The action is "start block job X", not "run
> block job X".

"OK," except the entire goal of the series was to allow the "aren't actually part of the transaction at all" parts to live and die together based on the transaction they were launched from. This really implies they belong to a second asynchronous phase of the transaction.

Otherwise, why would totally unrelated jobs fail because a different one did?

I realize this is an /extension/ of the existing semantics vs a "fix," and part of the confusion is how I and everyone else was looking at it differently. How could this happen, though? Let's look at our transaction documentation:

"Operations that are currently supported:"
[...]
"- drive-backup"
[...]
"If there is any failure perf
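The "live and die together" behavior John argues for can be illustrated with a tiny completion-group sketch. The types and names below are stand-ins invented for this illustration (QEMU's real mechanism is the transaction support this series is adding), but they show the core idea: jobs join a group, and the first failure cancels every sibling that has not finished yet:

```c
#include <assert.h>
#include <stdbool.h>

typedef struct Job Job;

/* A group of jobs launched by one transaction. */
typedef struct Txn {
    Job *jobs[8];
    int n;
    bool aborted;
} Txn;

struct Job {
    Txn *txn;
    bool cancelled;
    bool done;
    int ret;
};

void txn_add_job(Txn *t, Job *j)
{
    j->txn = t;
    t->jobs[t->n++] = j;
}

/* Called when one job finishes. On the first failure, cancel all
 * siblings that are still running, so no job commits alone. */
void job_completed(Job *j, int ret)
{
    Txn *t = j->txn;
    j->done = true;
    j->ret = ret;
    if (ret < 0 && !t->aborted) {
        t->aborted = true;
        for (int i = 0; i < t->n; i++) {
            if (!t->jobs[i]->done) {
                t->jobs[i]->cancelled = true;
                t->jobs[i]->done = true;
            }
        }
    }
}
```

This is exactly why "totally unrelated" jobs fail together: they are not unrelated, they share a transaction group.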
Re: [Qemu-block] [PATCH] mirror: Improve zero-write and discard with fragmented image
On 06.11.2015 11:22, Fam Zheng wrote:
> The "pnum < nb_sectors" condition in deciding whether to actually copy
> data is unnecessarily strict, and the qiov initialization is
> unnecessarily too, for both bdrv_aio_write_zeroes and bdrv_aio_discard
> branches.
>
> Reorganize mirror_iteration flow so that we:
>
> 1) Find the contiguous zero/discarded sectors with
>    bdrv_get_block_status_above() before deciding what to do. We query
>    s->buf_size sized blocks at a time.
>
> 2) If the sectors in question are zeroed/discarded and aligned to
>    target cluster, issue zero write or discard accordingly. It's done
>    in mirror_do_zero_or_discard, where we don't add buffer to qiov.
>
> 3) Otherwise, do the same loop as before in mirror_do_read.
>
> Signed-off-by: Fam Zheng
> ---
>  block/mirror.c | 161 +
>  1 file changed, 127 insertions(+), 34 deletions(-)

Looks good overall, some comments below.

Max

> diff --git a/block/mirror.c b/block/mirror.c
> index b1252a1..ac796b4 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -157,28 +157,21 @@ static void mirror_read_complete(void *opaque, int ret)
>                      mirror_write_complete, op);
>  }
>
> -static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
> +static uint64_t mirror_do_read(MirrorBlockJob *s)
>  {
>      BlockDriverState *source = s->common.bs;
> -    int nb_sectors, sectors_per_chunk, nb_chunks;
> -    int64_t end, sector_num, next_chunk, next_sector, hbitmap_next_sector;
> +    int sectors_per_chunk, nb_sectors, nb_chunks;
> +    int64_t end, next_chunk, next_sector, hbitmap_next_sector, sector_num;
>      uint64_t delay_ns = 0;
>      MirrorOp *op;
> -    int pnum;
> -    int64_t ret;
>
> -    s->sector_num = hbitmap_iter_next(&s->hbi);
> -    if (s->sector_num < 0) {
> -        bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi);
> -        s->sector_num = hbitmap_iter_next(&s->hbi);
> -        trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
> -        assert(s->sector_num >= 0);
> -    }
> +    sector_num = s->sector_num;
> +    sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
> +    end = s->bdev_length / BDRV_SECTOR_SIZE;
>
> +    next_sector = sector_num;
> +    next_chunk = sector_num / sectors_per_chunk;

@next_sector and @next_chunk set here...

>      hbitmap_next_sector = s->sector_num;
> -    sector_num = s->sector_num;
> -    sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
> -    end = s->bdev_length / BDRV_SECTOR_SIZE;
>
>      /* Extend the QEMUIOVector to include all adjacent blocks that will
>       * be copied in this operation.
> @@ -198,14 +191,6 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
>      next_sector = sector_num;
>      next_chunk = sector_num / sectors_per_chunk;

...and here already. So the above seems superfluous, considering that they are not read in between.

(If you keep hbitmap_next_sector = s->sector_num; above the sector_num = ... block, that may reduce conflicts further)

>
> -    /* Wait for I/O to this cluster (from a previous iteration) to be done. */
> -    while (test_bit(next_chunk, s->in_flight_bitmap)) {
> -        trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
> -        s->waiting_for_io = true;
> -        qemu_coroutine_yield();
> -        s->waiting_for_io = false;
> -    }
> -
>      do {
>          int added_sectors, added_chunks;
>
> @@ -301,24 +286,132 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
>      s->sectors_in_flight += nb_sectors;
>      trace_mirror_one_iteration(s, sector_num, nb_sectors);
>
> -    ret = bdrv_get_block_status_above(source, NULL, sector_num,
> -                                      nb_sectors, &pnum);
> -    if (ret < 0 || pnum < nb_sectors ||
> -        (ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) {
> -        bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
> -                       mirror_read_complete, op);
> -    } else if (ret & BDRV_BLOCK_ZERO) {
> +    bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
> +                   mirror_read_complete, op);
> +    return delay_ns;
> +}
> +
> +static uint64_t mirror_do_zero_or_discard(MirrorBlockJob *s,
> +                                          int64_t sector_num,
> +                                          int nb_sectors,
> +                                          bool is_discard)
> +{
> +    int sectors_per_chunk, nb_chunks;
> +    int64_t next_chunk, next_sector, hbitmap_next_sector;
> +    uint64_t delay_ns = 0;
> +    MirrorOp *op;
> +
> +    sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
> +    assert(nb_sectors >= sectors_per_chunk);
> +    next_chunk = sector_num / sectors_per_chunk;
> +    nb_chunks = DIV_ROUND_UP(nb_sectors, sectors_per_chunk);
> +    bitmap_set(s->in_flight_bitmap, next_chunk, nb_chunks);
> +    delay_ns = ratelimit_calculate_de
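Point 2 of the commit message only takes the zero/discard fast path when the range is aligned to the target cluster size. The reason is easy to state as a predicate (the helper name is invented for this sketch; it is not a function in the patch): writing zeroes or discarding a partial cluster on the target could clobber data in the rest of that cluster, so partial clusters must go through the ordinary read/write path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch: a zero/discard fast path is only safe when the dirty range
 * covers whole target clusters; anything partial must be copied via
 * the read path so the remainder of the cluster is preserved. */
static bool range_is_cluster_aligned(int64_t sector_num, int nb_sectors,
                                     int sectors_per_cluster)
{
    return (sector_num % sectors_per_cluster) == 0 &&
           (nb_sectors % sectors_per_cluster) == 0;
}
```

For example, with 64-sector clusters, a range starting at sector 32 fails the test and falls back to `mirror_do_read`-style copying.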
Re: [Qemu-block] [Qemu-devel] [PATCH v10 12/14] block: add transactional properties
On Fri, Nov 06, 2015 at 09:32:19AM +0100, Kevin Wolf wrote:
> Am 05.11.2015 um 19:52 hat John Snow geschrieben:
> >
> > On 11/05/2015 05:47 AM, Stefan Hajnoczi wrote:
> > > On Tue, Nov 03, 2015 at 12:27:19PM -0500, John Snow wrote:
> > > >
> > > > On 11/03/2015 10:17 AM, Stefan Hajnoczi wrote:
> > > > > On Fri, Oct 23, 2015 at 07:56:50PM -0400, John Snow wrote:
> > > > > > @@ -1732,6 +1757,10 @@ static void block_dirty_bitmap_add_prepare(BlkActionState *common,
> > > > > >      BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
> > > > > >                                               common, common);
> > > > > >
> > > > > > +    if (action_check_cancel_mode(common, errp) < 0) {
> > > > > > +        return;
> > > > > > +    }
> > > > > > +
> > > > > >      action = common->action->block_dirty_bitmap_add;
> > > > > >      /* AIO context taken and released within qmp_block_dirty_bitmap_add */
> > > > > >      qmp_block_dirty_bitmap_add(action->node, action->name,
> > > > > > @@ -1767,6 +1796,10 @@ static void block_dirty_bitmap_clear_prepare(BlkActionState *common,
> > > > > >                                               common, common);
> > > > > >      BlockDirtyBitmap *action;
> > > > > >
> > > > > > +    if (action_check_cancel_mode(common, errp) < 0) {
> > > > > > +        return;
> > > > > > +    }
> > > > > > +
> > > > > >      action = common->action->block_dirty_bitmap_clear;
> > > > > >      state->bitmap = block_dirty_bitmap_lookup(action->node,
> > > > > >                                                action->name,
> > > > >
> > > > > Why do the bitmap add/clear actions not support err-cancel=all?
> > > > >
> > > > > I understand why other block jobs don't support it, but it's not clear
> > > > > why these non-block job actions cannot.
> > > > >
> > > > Because they don't have a callback to invoke if the rest of the job
> > > > fails.
> > > >
> > > > I could create a BlockJob for them complete with a callback to invoke,
> > > > but basically it's just because there's no interface to unwind them, or
> > > > an interface to join them with the transaction.
> > > >
> > > > They're small, synchronous non-job actions. Which makes them weird.
> > >
> > > Funny, we've been looking at the same picture while seeing different
> > > things:
> > > https://en.wikipedia.org/wiki/Rabbit%E2%80%93duck_illusion
> > >
> > > I think I understand your idea: the transaction should include both
> > > immediate actions as well as block jobs.
> > >
> > > My mental model was different: immediate actions commit/abort along with
> > > the 'transaction' command. Block jobs are separate and complete/cancel
> > > together in a group.
> > >
> > > In practice I think the two end up being similar because we won't be
> > > able to implement immediate action commit/abort together with
> > > long-running block jobs because the immediate actions rely on
> > > quiescing/pausing the guest for atomic commit/abort.
> > >
> > > So with your mental model the QMP client has to submit 2 'transaction'
> > > commands: 1 for the immediate actions, 1 for the block jobs.
> > >
> > > In my mental model the QMP client submits 1 command but the immediate
> > > actions and block jobs are two separate transaction scopes. This means
> > > if the block jobs fail, the client needs to be aware of the immediate
> > > actions that have committed. Because of this, it becomes just as much
> > > client effort as submitting two separate 'transaction' commands in your
> > > model.
> > >
> > > Can anyone see a practical difference? I think I'm happy with John's
> > > model.
> > >
> > > Stefan
> > >
> >
> > We discussed this off-list, but for the sake of the archive:
> >
> > == How it is now ==
> >
> > Currently, transactions have two implicit phases: the first is the
> > synchronous phase. If this phase completes successfully, we consider the
> > transaction a success. The second phase is the asynchronous phase where
> > jobs launched by the synchronous phase run to completion.
> >
> > all synchronous commands must complete for the transaction to "succeed."
> > There are currently (pre-patch) no guarantees about asynchronous command
> > completion. As long as all synchronous actions complete, asynchronous
> > actions are free to succeed or fail individually.
>
> You're overcomplicating this. All actions are currently synchronous and
> what you consider asynchronous transaction actions aren't actually part
> of the transaction at all. The action is "start block job X", not "run
> block job X".

Yes, this is how I've thought of it too.

> > == My Model ==
> >
> > The current behavior is my "err-cancel = none" scenario: we offer no
> > guarantee about the success or failure of the transaction as a whole
> > after the synchronous portion has completed.
> >
> > What I was proposing is "err-cancel = all," which to me means that _ALL_
> > commands in this transaction are to succeed (synchronous or not) before
> > _any_ actions are irrevocably committed. This means that f
Re: [Qemu-block] [PATCH v7 12/35] util: let qemu_fd_getlength support block device
On Mon, Nov 02, 2015 at 05:13:14PM +0800, Xiao Guangrong wrote:
> lseek can not work for all block devices as the man page says:
> | Some devices are incapable of seeking and POSIX does not specify
> | which devices must support lseek().
>
> This patch tries to add the support on Linux by using BLKGETSIZE64
> ioctl
>
> Signed-off-by: Xiao Guangrong

On which cases is this patch necessary? Do you know any examples of Linux block devices that won't work with lseek(SEEK_END)?

> ---
>  util/osdep.c | 20
>  1 file changed, 20 insertions(+)
>
> diff --git a/util/osdep.c b/util/osdep.c
> index 5a61e19..b20c793 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -45,6 +45,11 @@
>  extern int madvise(caddr_t, size_t, int);
>  #endif
>
> +#ifdef CONFIG_LINUX
> +#include
> +#include
> +#endif
> +
>  #include "qemu-common.h"
>  #include "qemu/sockets.h"
>  #include "qemu/error-report.h"
> @@ -433,6 +438,21 @@ int64_t qemu_fd_getlength(int fd)
>  {
>      int64_t size;
>
> +#ifdef CONFIG_LINUX
> +    struct stat stat_buf;
> +    if (fstat(fd, &stat_buf) < 0) {
> +        return -errno;
> +    }
> +
> +    if ((S_ISBLK(stat_buf.st_mode)) && !ioctl(fd, BLKGETSIZE64, &size)) {
> +        /* The size of block device is larger than max int64_t? */
> +        if (size < 0) {
> +            return -EOVERFLOW;
> +        }
> +        return size;
> +    }
> +#endif
> +
>      size = lseek(fd, 0, SEEK_END);
>      if (size < 0) {
>          return -errno;
> --
> 1.8.3.1
>

--
Eduardo
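The patch's fallback logic can be exercised stand-alone: try the `BLKGETSIZE64` ioctl for block devices, otherwise fall back to `lseek(SEEK_END)`. This sketch is Linux-specific and `fd_getlength` is a name chosen for the illustration, not QEMU's function:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>
#ifdef __linux__
#include <linux/fs.h>   /* BLKGETSIZE64 */
#endif

/* Sketch of the patch's logic: some block devices cannot seek, so ask
 * the kernel for the device size directly; regular files use lseek. */
int64_t fd_getlength(int fd)
{
#ifdef __linux__
    struct stat st;
    uint64_t dev_size;

    if (fstat(fd, &st) < 0) {
        return -errno;
    }
    if (S_ISBLK(st.st_mode) && ioctl(fd, BLKGETSIZE64, &dev_size) == 0) {
        if (dev_size > INT64_MAX) {
            return -EOVERFLOW;  /* cannot represent as int64_t */
        }
        return (int64_t)dev_size;
    }
#endif
    off_t size = lseek(fd, 0, SEEK_END);
    return size < 0 ? -errno : (int64_t)size;
}
```

Note one difference from the quoted patch: `BLKGETSIZE64` is read into an unsigned `uint64_t` here, which makes the overflow check explicit rather than relying on a negative signed value.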
Re: [Qemu-block] [PATCH v7 11/35] util: introduce qemu_file_getlength()
As this patch affects raw_getlength(), CCing the raw block driver maintainer and the qemu-block mailing list.

On Mon, Nov 02, 2015 at 05:13:13PM +0800, Xiao Guangrong wrote:
> It is used to get the size of the specified file, also qemu_fd_getlength()
> is introduced to unify the code with raw_getlength() in block/raw-posix.c
>
> Signed-off-by: Xiao Guangrong
> ---
>  block/raw-posix.c    | 7 +--
>  include/qemu/osdep.h | 2 ++
>  util/osdep.c         | 31 +++

I know I was the one who suggested osdep.c, but maybe oslib-posix.c is a more appropriate place for the new function?

>  3 files changed, 34 insertions(+), 6 deletions(-)
>
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 918c756..734e6dd 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -1592,18 +1592,13 @@ static int64_t raw_getlength(BlockDriverState *bs)
>  {
>      BDRVRawState *s = bs->opaque;
>      int ret;
> -    int64_t size;
>
>      ret = fd_open(bs);
>      if (ret < 0) {
>          return ret;
>      }
>
> -    size = lseek(s->fd, 0, SEEK_END);
> -    if (size < 0) {
> -        return -errno;
> -    }
> -    return size;
> +    return qemu_fd_getlength(s->fd);
>  }
>  #endif
>
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index dbc17dc..ca4c3fa 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -303,4 +303,6 @@ int qemu_read_password(char *buf, int buf_size);
>  pid_t qemu_fork(Error **errp);
>
>  size_t qemu_file_get_page_size(const char *mem_path, Error **errp);
> +int64_t qemu_fd_getlength(int fd);
> +size_t qemu_file_getlength(const char *file, Error **errp);
>  #endif
> diff --git a/util/osdep.c b/util/osdep.c
> index 0092bb6..5a61e19 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -428,3 +428,34 @@ writev(int fd, const struct iovec *iov, int iov_cnt)
>      return readv_writev(fd, iov, iov_cnt, true);
>  }
>  #endif
> +
> +int64_t qemu_fd_getlength(int fd)
> +{
> +    int64_t size;
> +
> +    size = lseek(fd, 0, SEEK_END);
> +    if (size < 0) {
> +        return -errno;
> +    }
> +    return size;
> +}
> +
> +size_t qemu_file_getlength(const char *file, Error **errp)
> +{
> +    int64_t size;
> +    int fd = qemu_open(file, O_RDONLY);
> +
> +    if (fd < 0) {
> +        error_setg_file_open(errp, errno, file);
> +        return 0;
> +    }
> +
> +    size = qemu_fd_getlength(fd);
> +    if (size < 0) {
> +        error_setg_errno(errp, -size, "can't get size of file %s", file);
> +        size = 0;
> +    }
> +
> +    qemu_close(fd);
> +    return size;
> +}
> --
> 1.8.3.1
>

--
Eduardo
Re: [Qemu-block] [PATCH v7 12/35] util: let qemu_fd_getlength support block device
As this patch affects raw_getlength(), CCing the raw block driver maintainer and the qemu-block mailing list.

On Mon, Nov 02, 2015 at 05:13:14PM +0800, Xiao Guangrong wrote:
> lseek can not work for all block devices as the man page says:
> | Some devices are incapable of seeking and POSIX does not specify
> | which devices must support lseek().
>
> This patch tries to add the support on Linux by using BLKGETSIZE64
> ioctl
>
> Signed-off-by: Xiao Guangrong
> ---
>  util/osdep.c | 20
>  1 file changed, 20 insertions(+)
>
> diff --git a/util/osdep.c b/util/osdep.c
> index 5a61e19..b20c793 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -45,6 +45,11 @@
>  extern int madvise(caddr_t, size_t, int);
>  #endif
>
> +#ifdef CONFIG_LINUX
> +#include
> +#include
> +#endif
> +
>  #include "qemu-common.h"
>  #include "qemu/sockets.h"
>  #include "qemu/error-report.h"
> @@ -433,6 +438,21 @@ int64_t qemu_fd_getlength(int fd)
>  {
>      int64_t size;
>
> +#ifdef CONFIG_LINUX
> +    struct stat stat_buf;
> +    if (fstat(fd, &stat_buf) < 0) {
> +        return -errno;
> +    }
> +
> +    if ((S_ISBLK(stat_buf.st_mode)) && !ioctl(fd, BLKGETSIZE64, &size)) {
> +        /* The size of block device is larger than max int64_t? */
> +        if (size < 0) {
> +            return -EOVERFLOW;
> +        }
> +        return size;
> +    }
> +#endif
> +
>      size = lseek(fd, 0, SEEK_END);
>      if (size < 0) {
>          return -errno;
> --
> 1.8.3.1
>

--
Eduardo
[Qemu-block] [PATCH v9 11/15] qmp: Introduce blockdev-change-medium
Introduce a new QMP command 'blockdev-change-medium' which is intended to replace the 'change' command for block devices. The existing function qmp_change_blockdev() is accordingly renamed to qmp_blockdev_change_medium().

Signed-off-by: Max Reitz
---
 blockdev.c                | 7 ---
 include/sysemu/blockdev.h | 2 --
 qapi-schema.json          | 6 --
 qapi/block-core.json      | 23 +++
 qmp-commands.hx           | 31 +++
 qmp.c                     | 2 +-
 ui/cocoa.m                | 10 ++
 7 files changed, 69 insertions(+), 12 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index df93173..dec6169 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2133,8 +2133,9 @@ void qmp_blockdev_insert_medium(const char *device, const char *node_name,
     qmp_blockdev_insert_anon_medium(device, bs, errp);
 }
 
-void qmp_change_blockdev(const char *device, const char *filename,
-                         const char *format, Error **errp)
+void qmp_blockdev_change_medium(const char *device, const char *filename,
+                                bool has_format, const char *format,
+                                Error **errp)
 {
     BlockBackend *blk;
     BlockDriverState *medium_bs = NULL;
@@ -2155,7 +2156,7 @@ void qmp_change_blockdev(const char *device, const char *filename,
 
     bdrv_flags = blk_get_open_flags_from_root_state(blk);
 
-    if (format) {
+    if (has_format) {
         options = qdict_new();
         qdict_put(options, "driver", qstring_from_str(format));
     }
diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h
index a00be94..b06a060 100644
--- a/include/sysemu/blockdev.h
+++ b/include/sysemu/blockdev.h
@@ -63,8 +63,6 @@ DriveInfo *drive_new(QemuOpts *arg, BlockInterfaceType block_default_type);
 
 /* device-hotplug */
 
-void qmp_change_blockdev(const char *device, const char *filename,
-                         const char *format, Error **errp);
 void hmp_commit(Monitor *mon, const QDict *qdict);
 void hmp_drive_del(Monitor *mon, const QDict *qdict);
 #endif
diff --git a/qapi-schema.json b/qapi-schema.json
index 702b7b5..058b8ec 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1842,8 +1842,10 @@
 #          device's password.  The behavior of reads and writes to the block
 #          device between when these calls are executed is undefined.
 #
-# Notes:  It is strongly recommended that this interface is not used especially
-#         for changing block devices.
+# Notes:  This interface is deprecated, and it is strongly recommended that you
+#         avoid using it.  For changing block devices, use
+#         blockdev-change-medium; for changing VNC parameters, use
+#         change-vnc-password.
 #
 # Since: 0.14.0
 ##
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 5c4fc72..e9fa649 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1959,6 +1959,29 @@
 ##
+# @blockdev-change-medium:
+#
+# Changes the medium inserted into a block device by ejecting the current medium
+# and loading a new image file which is inserted as the new medium (this command
+# combines blockdev-open-tray, blockdev-remove-medium, blockdev-insert-medium
+# and blockdev-close-tray).
+#
+# @device:    block device name
+#
+# @filename:  filename of the new image to be loaded
+#
+# @format:    #optional, format to open the new image with (defaults to
+#             the probed format)
+#
+# Since: 2.5
+##
+{ 'command': 'blockdev-change-medium',
+  'data': { 'device': 'str',
+            'filename': 'str',
+            '*format': 'str' } }
+
+
+##
 # @BlockErrorAction
 #
 # An enumeration of action that has been taken when a DISK I/O occurs
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 1dfa809..d6e4d7e 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -4163,6 +4163,37 @@ Example:
 EQMP
 
     {
+        .name       = "blockdev-change-medium",
+        .args_type  = "device:B,filename:F,format:s?",
+        .mhandler.cmd_new = qmp_marshal_blockdev_change_medium,
+    },
+
+SQMP
+blockdev-change-medium
+----------------------
+
+Changes the medium inserted into a block device by ejecting the current medium
+and loading a new image file which is inserted as the new medium.
+
+Arguments:
+
+- "device": device name (json-string)
+- "filename": filename of the new image (json-string)
+- "format": format of the new image (json-string, optional)
+
+Examples:
+
+1. Change a removable medium
+
+-> { "execute": "blockdev-change-medium",
+     "arguments": { "device": "ide1-cd0",
+                    "filename": "/srv/images/Fedora-12-x86_64-DVD.iso",
+                    "format": "raw" } }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "query-memdev",
         .args_type  = "",
         .mhandler.cmd_new = qmp_marshal_query_memdev,
diff --git a/qmp.c b/qmp.c
index ff54e5a..4e44f98 100644
--- a/qmp.c
+++ b/qmp.c
@@ -414,7 +414,7 @@ void q
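The signature change from `const char *format` to `bool has_format, const char *format` follows the QAPI convention for optional members: a `'*format'` schema entry arrives in C as a presence flag plus value, so "omitted" and "explicitly NULL" are distinguishable. A minimal stand-alone sketch of the pattern (the helper and the "(probe)" placeholder are invented for illustration, not QEMU code):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* QAPI-style optional argument: the generated marshaller passes a
 * (has_format, format) pair, so the handler can tell a missing
 * argument apart from any particular value. */
static const char *pick_driver(bool has_format, const char *format)
{
    /* When the client omits "format", fall back to format probing. */
    return has_format ? format : "(probe)";
}
```

This mirrors the `if (format)` to `if (has_format)` hunk in the patch: the handler only forces a driver when the client actually supplied one.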
[Qemu-block] [PATCH v9 00/15] blockdev: BlockBackend and media
Patch 11 of the last series broke the OS X build due to ui/cocoa.m directly referencing qmp_change_blockdev() (which was an internal function not directly mapped to any QMP command before, now it becomes an "external" function and is renamed qmp_blockdev_change_medium()). This v9 consists only of that patch, because the rest is completely unchanged from v8. Unfortunately, neither am I rich enough nor do I have the immediate intention of buying Apple hardware any time soon anyway, so I have no way of verifying that this changed version builds on OS X. I would appreciate help with that. Oh, and if anybody ever asks me again why it might be a bad idea to put management code into the GUI layer of one specific operating system... git-backport-diff against v8: Key: [] : patches are identical [] : number of functional differences between upstream/downstream patch [down] : patch is downstream-only The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively 001/15:[] [--] 'block: Add blk_remove_bs()' 002/15:[] [--] 'block: Make bdrv_states public' 003/15:[] [--] 'block: Add functions for inheriting a BBRS' 004/15:[] [--] 'blockdev: Add blockdev-open-tray' 005/15:[] [--] 'blockdev: Add blockdev-close-tray' 006/15:[] [--] 'blockdev: Add blockdev-remove-medium' 007/15:[] [--] 'blockdev: Add blockdev-insert-medium' 008/15:[] [--] 'blockdev: Implement eject with basic operations' 009/15:[] [--] 'blockdev: Implement change with basic operations' 010/15:[] [--] 'block: Inquire tray state before tray-moved events' 011/15:[0010] [FC] 'qmp: Introduce blockdev-change-medium' 012/15:[] [--] 'hmp: Use blockdev-change-medium for change command' 013/15:[] [--] 'blockdev: read-only-mode for blockdev-change-medium' 014/15:[] [--] 'hmp: Add read-only-mode option to change command' 015/15:[] [--] 'iotests: Add test for change-related QMP commands' Max Reitz (15): block: Add blk_remove_bs() block: Make bdrv_states public block: Add functions for inheriting a BBRS blockdev: 
Add blockdev-open-tray blockdev: Add blockdev-close-tray blockdev: Add blockdev-remove-medium blockdev: Add blockdev-insert-medium blockdev: Implement eject with basic operations blockdev: Implement change with basic operations block: Inquire tray state before tray-moved events qmp: Introduce blockdev-change-medium hmp: Use blockdev-change-medium for change command blockdev: read-only-mode for blockdev-change-medium hmp: Add read-only-mode option to change command iotests: Add test for change-related QMP commands block.c| 3 +- block/block-backend.c | 56 +++- blockdev.c | 286 hmp-commands.hx| 20 +- hmp.c | 47 ++- include/block/block_int.h | 2 + include/sysemu/block-backend.h | 3 + include/sysemu/blockdev.h | 2 - qapi-schema.json | 6 +- qapi/block-core.json | 126 qmp-commands.hx| 218 + qmp.c | 3 +- tests/qemu-iotests/118 | 720 + tests/qemu-iotests/118.out | 5 + tests/qemu-iotests/group | 1 + ui/cocoa.m | 10 +- 16 files changed, 1402 insertions(+), 106 deletions(-) create mode 100755 tests/qemu-iotests/118 create mode 100644 tests/qemu-iotests/118.out -- 2.6.2
Re: [Qemu-block] [Qemu-devel] [PULL 00/37] Block layer patches
On 06.11.2015 09:17, Kevin Wolf wrote: > Am 05.11.2015 um 20:01 hat Peter Maydell geschrieben: >> On 5 November 2015 at 18:17, Kevin Wolf wrote: >>> The following changes since commit 8835b9df3bddf332c883c861d6a1defc12c4ebe9: >>> >>> Merge remote-tracking branch >>> 'remotes/mdroth/tags/qga-pull-2015-11-04-tag' into staging (2015-11-05 >>> 10:52:35 +) >>> >>> are available in the git repository at: >>> >>> >>> git://repo.or.cz/qemu/kevin.git tags/for-upstream >>> >>> for you to fetch changes up to 22bdadd23b64af65ac2dd816848dbe2b1240a77a: >>> >>> Merge remote-tracking branch >>> 'mreitz/tags/pull-block-for-kevin-2015-11-05' into queue-block (2015-11-05 >>> 18:01:37 +0100) >>> >>> >>> >>> Block layer patches >> >> Fails to build on OSX, I'm afraid: >> Undefined symbols for architecture x86_64: >> "_qmp_change_blockdev", referenced from: >> -[QemuCocoaAppController changeDeviceMedia:] in cocoa.o > > Max, I think this is yours. Indeed, will fix. Sorry. Max
[Qemu-block] BlockDeviceInfo's backing_file and ImageInfo's backing-filename
Hi all, I noticed that query-named-block-nodes (and HMP 'info block') returns odd results when the effective backing image of a BlockDriverState is different from the string that is stored on the file. There are two fields exposed on the API that show this information: BlockDeviceInfo's backing_file and ImageInfo's backing-filename. From the description I would expect that the former refers to the actual backing file in use (bs->backing) and the latter to the string stored in the file. That's however not what happens. Let's see a few examples: qemu-img create -f qcow2 base.img 4G qemu-img create -f qcow2 -o backing_file=base.img hd0.img qemu-img create -f qcow2 -o backing_file=hd0.img hd1.img 1) { "execute": "blockdev-add", "arguments": { "options": { "driver": "qcow2", "file": { "driver": "file", "filename": "hd1.img" }, "id": "drive0" } } } The output here is normal: BlockDeviceInfo.backing_file = hd0.img BlockDeviceInfo.image.backing-filename: hd0.img 2) { "execute": "blockdev-add", "arguments": { "options": { "driver": "qcow2", "file": { "driver": "file", "filename": "hd1.img" }, "id": "drive0", "backing": { "driver": "qcow2", "file": { "driver": "file", "filename": "base.img" } } } } } Now there's no reference to hd0.img anywhere: BlockDeviceInfo.backing_file = base.img BlockDeviceInfo.image.backing-filename: base.img 3) { "execute": "blockdev-add", "arguments": { "options": { "driver": "qcow2", "file": { "driver": "file", "filename": "hd1.img" }, "id": "drive0", "backing": "" } } } Now both fields point to hd0.img, although it hasn't been opened: BlockDeviceInfo.backing_file = hd0.img BlockDeviceInfo.image.backing-filename: hd0.img I would like to fix this but first I need to know the exact semantics of both fields, and the exact meaning of bs->backing_file. On top of that the actual backing file string is also present in the format-specific structs, e.g. 
BDRVQcow2State.image_backing_file, but that's not directly accessible from the API (it's not in ImageInfoSpecific). Berto
Re: [Qemu-block] [PATCH] virtio-blk: trivial code optimization
On 06/11/2015 11:35, Stefan Hajnoczi wrote: >> > if (niov + req->qiov.niov > IOV_MAX) { >> > merge = false; >> > +goto unmerge; >> > } >> > >> > /* merge would exceed maximum transfer length of backend >> > device */ >> > if (req->qiov.size / BDRV_SECTOR_SIZE + nb_sectors > >> > max_xfer_len) { >> > merge = false; >> > +goto unmerge; >> > } >> > >> > /* requests are not sequential */ >> > if (sector_num + nb_sectors != req->sector_num) { >> > merge = false; >> > } >> > - >> > +unmerge: > C has a way of expressing this without gotos. Please use else if: > > if (a) { > ... > } else if (b) { > ... > } else if (c) { > ... > } Another way is if (niov + req->qiov.niov > IOV_MAX || req->qiov.size / BDRV_SECTOR_SIZE + nb_sectors > max_xfer_len || sector_num + nb_sectors != req->sector_num) { submit_requests(...) ... } While at it, we could reorder the conditions so that the most common ("requests are not sequential") comes first. I'm not sure about handling of overflow. It's probably better to write conditions as "new > max - old" (e.g. "niov > IOV_MAX - req->qiov.niov") rather than "old + new > max". The former is always safe, because we know that old <= max and there can be no integer overflow. Thanks, Paolo
Re: [Qemu-block] [PATCH V4] block/nfs: add support for setting debug level
Am 06.11.2015 um 12:23 schrieb Stefan Hajnoczi: > On Thu, Nov 05, 2015 at 11:25:34PM +0100, Peter Lieven wrote: >> recent libnfs versions support logging debug messages. Add >> support for it in qemu through an URL parameter. >> >> Example: >> qemu -cdrom nfs://127.0.0.1/iso/my.iso?debug=2 >> --- >> v3->v4: revert to the initial version, but limit max debug level >> v2->v3: use a per-drive option instead of a global one. [Stefan] >> v1->v2: reworked patch to accept the debug level as a cmdline >> parameter instead of an URI parameter [Stefan] >> >> block/nfs.c | 10 ++ >> 1 file changed, 10 insertions(+) >> >> diff --git a/block/nfs.c b/block/nfs.c >> index fd79f89..fbea25a 100644 >> --- a/block/nfs.c >> +++ b/block/nfs.c >> @@ -36,6 +36,7 @@ >> #include >> >> #define QEMU_NFS_MAX_READAHEAD_SIZE 1048576 >> +#define QEMU_NFS_MAX_DEBUG_LEVEL 2 >> >> typedef struct NFSClient { >> struct nfs_context *context; >> @@ -334,6 +335,15 @@ static int64_t nfs_client_open(NFSClient *client, const >> char *filename, >> } >> nfs_set_readahead(client->context, val); >> #endif >> +#ifdef LIBNFS_FEATURE_DEBUG >> +} else if (!strcmp(qp->p[i].name, "debug")) { >> +if (val > QEMU_NFS_MAX_DEBUG_LEVEL) { >> +error_report("NFS Warning: Limiting NFS debug level" >> + " to %d", QEMU_NFS_MAX_DEBUG_LEVEL); >> +val = QEMU_NFS_MAX_DEBUG_LEVEL; >> +} >> +nfs_set_debug(client->context, val); >> +#endif > Please include the rationale for limiting the debug level as a comment > in the code. Otherwise people may read the code and not understand why > we do this. Okay. Then we should add this to the readahead limitation as well. > > The error message should indicate how to set high debug levels (if > possible). A simple "You can't do that" error message is frustrating > :). But thats the case at the moment ;-) Peter
Re: [Qemu-block] [PATCH V4] block/nfs: add support for setting debug level
On Thu, Nov 05, 2015 at 11:25:34PM +0100, Peter Lieven wrote: > recent libnfs versions support logging debug messages. Add > support for it in qemu through an URL parameter. > > Example: > qemu -cdrom nfs://127.0.0.1/iso/my.iso?debug=2 > --- > v3->v4: revert to the initial version, but limit max debug level > v2->v3: use a per-drive option instead of a global one. [Stefan] > v1->v2: reworked patch to accept the debug level as a cmdline > parameter instead of an URI parameter [Stefan] > > block/nfs.c | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/block/nfs.c b/block/nfs.c > index fd79f89..fbea25a 100644 > --- a/block/nfs.c > +++ b/block/nfs.c > @@ -36,6 +36,7 @@ > #include > > #define QEMU_NFS_MAX_READAHEAD_SIZE 1048576 > +#define QEMU_NFS_MAX_DEBUG_LEVEL 2 > > typedef struct NFSClient { > struct nfs_context *context; > @@ -334,6 +335,15 @@ static int64_t nfs_client_open(NFSClient *client, const > char *filename, > } > nfs_set_readahead(client->context, val); > #endif > +#ifdef LIBNFS_FEATURE_DEBUG > +} else if (!strcmp(qp->p[i].name, "debug")) { > +if (val > QEMU_NFS_MAX_DEBUG_LEVEL) { > +error_report("NFS Warning: Limiting NFS debug level" > + " to %d", QEMU_NFS_MAX_DEBUG_LEVEL); > +val = QEMU_NFS_MAX_DEBUG_LEVEL; > +} > +nfs_set_debug(client->context, val); > +#endif Please include the rationale for limiting the debug level as a comment in the code. Otherwise people may read the code and not understand why we do this. The error message should indicate how to set high debug levels (if possible). A simple "You can't do that" error message is frustrating :).
Re: [Qemu-block] [PATCH] virtio-blk: trivial code optimization
On Fri, Nov 06, 2015 at 09:04:57AM +0800, arei.gong...@huawei.com wrote: > diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c > index 093e475..752586d 100644 > --- a/hw/block/virtio-blk.c > +++ b/hw/block/virtio-blk.c > @@ -409,18 +409,20 @@ void virtio_blk_submit_multireq(BlockBackend *blk, > MultiReqBuffer *mrb) > /* merge would exceed maximum number of IOVs */ > if (niov + req->qiov.niov > IOV_MAX) { > merge = false; > +goto unmerge; > } > > /* merge would exceed maximum transfer length of backend device > */ > if (req->qiov.size / BDRV_SECTOR_SIZE + nb_sectors > > max_xfer_len) { > merge = false; > +goto unmerge; > } > > /* requests are not sequential */ > if (sector_num + nb_sectors != req->sector_num) { > merge = false; > } > - > +unmerge: C has a way of expressing this without gotos. Please use else if: if (a) { ... } else if (b) { ... } else if (c) { ... }
[Qemu-block] [PATCH] mirror: Improve zero-write and discard with fragmented image
The "pnum < nb_sectors" condition in deciding whether to actually copy data is unnecessarily strict, and the qiov initialization is unnecessarily too, for both bdrv_aio_write_zeroes and bdrv_aio_discard branches. Reorganize mirror_iteration flow so that we: 1) Find the contiguous zero/discarded sectors with bdrv_get_block_status_above() before deciding what to do. We query s->buf_size sized blocks at a time. 2) If the sectors in question are zeroed/discarded and aligned to target cluster, issue zero write or discard accordingly. It's done in mirror_do_zero_or_discard, where we don't add buffer to qiov. 3) Otherwise, do the same loop as before in mirror_do_read. Signed-off-by: Fam Zheng --- block/mirror.c | 161 + 1 file changed, 127 insertions(+), 34 deletions(-) diff --git a/block/mirror.c b/block/mirror.c index b1252a1..ac796b4 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -157,28 +157,21 @@ static void mirror_read_complete(void *opaque, int ret) mirror_write_complete, op); } -static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s) +static uint64_t mirror_do_read(MirrorBlockJob *s) { BlockDriverState *source = s->common.bs; -int nb_sectors, sectors_per_chunk, nb_chunks; -int64_t end, sector_num, next_chunk, next_sector, hbitmap_next_sector; +int sectors_per_chunk, nb_sectors, nb_chunks; +int64_t end, next_chunk, next_sector, hbitmap_next_sector, sector_num; uint64_t delay_ns = 0; MirrorOp *op; -int pnum; -int64_t ret; -s->sector_num = hbitmap_iter_next(&s->hbi); -if (s->sector_num < 0) { -bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi); -s->sector_num = hbitmap_iter_next(&s->hbi); -trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap)); -assert(s->sector_num >= 0); -} +sector_num = s->sector_num; +sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS; +end = s->bdev_length / BDRV_SECTOR_SIZE; +next_sector = sector_num; +next_chunk = sector_num / sectors_per_chunk; hbitmap_next_sector = s->sector_num; -sector_num = s->sector_num; 
-sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS; -end = s->bdev_length / BDRV_SECTOR_SIZE; /* Extend the QEMUIOVector to include all adjacent blocks that will * be copied in this operation. @@ -198,14 +191,6 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s) next_sector = sector_num; next_chunk = sector_num / sectors_per_chunk; -/* Wait for I/O to this cluster (from a previous iteration) to be done. */ -while (test_bit(next_chunk, s->in_flight_bitmap)) { -trace_mirror_yield_in_flight(s, sector_num, s->in_flight); -s->waiting_for_io = true; -qemu_coroutine_yield(); -s->waiting_for_io = false; -} - do { int added_sectors, added_chunks; @@ -301,24 +286,132 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s) s->sectors_in_flight += nb_sectors; trace_mirror_one_iteration(s, sector_num, nb_sectors); -ret = bdrv_get_block_status_above(source, NULL, sector_num, - nb_sectors, &pnum); -if (ret < 0 || pnum < nb_sectors || -(ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) { -bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors, - mirror_read_complete, op); -} else if (ret & BDRV_BLOCK_ZERO) { +bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors, + mirror_read_complete, op); +return delay_ns; +} + +static uint64_t mirror_do_zero_or_discard(MirrorBlockJob *s, + int64_t sector_num, + int nb_sectors, + bool is_discard) +{ +int sectors_per_chunk, nb_chunks; +int64_t next_chunk, next_sector, hbitmap_next_sector; +uint64_t delay_ns = 0; +MirrorOp *op; + +sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS; +assert(nb_sectors >= sectors_per_chunk); +next_chunk = sector_num / sectors_per_chunk; +nb_chunks = DIV_ROUND_UP(nb_sectors, sectors_per_chunk); +bitmap_set(s->in_flight_bitmap, next_chunk, nb_chunks); +delay_ns = ratelimit_calculate_delay(&s->limit, nb_sectors); + +/* Allocate a MirrorOp that is used as an AIO callback. The qiov is zeroed + * out so the freeing in iteration is nop. 
*/ +op = g_new0(MirrorOp, 1); +op->s = s; +op->sector_num = sector_num; +op->nb_sectors = nb_sectors; + +/* Advance the HBitmapIter in parallel, so that we do not examine + * the same sector twice. + */ +hbitmap_next_sector = sector_num; +next_sector = sector_num + nb_sectors; +while (next_sector > hbitmap_next_sector) { +hbitmap_next_sector = hbitmap_iter_next(&s->hbi)
Re: [Qemu-block] [PATCH v4 0/3] qemu-io: clean up cvtnum usage
Am 06.11.2015 um 00:53 hat John Snow geschrieben: > cvtnum returns an int64_t, not an int, so correct the lvalue types > wherever it is used. While we're at it, make the error messages more > meaningful and hopefully less confusing. > > v4: > - Now missing ALL sweaters > > v3: > - pulled a lot of loose yarn, now missing my sweater >(Updated patch 1 even further, reported-by Kevin) > > v2: > - Squashed NSIG error-checking from patch 3 into patch 1 > - Reported-by credits for Max and Reviewed-by from Eric added Thanks, applied to the block branch. (Should we mention in the changelog that qemu 2.5 contains some of your sweaters?) Kevin
[Qemu-block] [PATCH V3 6/6] ide: enable buffered requests for PIO read requests
Signed-off-by: Peter Lieven --- hw/ide/core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/ide/core.c b/hw/ide/core.c index 53f9c2c..d1feae2 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -676,8 +676,8 @@ static void ide_sector_read(IDEState *s) block_acct_start(blk_get_stats(s->blk), &s->acct, n * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ); -s->pio_aiocb = blk_aio_readv(s->blk, sector_num, &s->qiov, n, - ide_sector_read_cb, s); +s->pio_aiocb = ide_buffered_readv(s, sector_num, &s->qiov, n, + ide_sector_read_cb, s); } void dma_buf_commit(IDEState *s, uint32_t tx_bytes) -- 1.9.1
[Qemu-block] [PATCH V3 5/6] ide: enable buffered requests for ATAPI devices
Signed-off-by: Peter Lieven --- hw/ide/atapi.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c index 29fd131..2f6d018 100644 --- a/hw/ide/atapi.c +++ b/hw/ide/atapi.c @@ -190,8 +190,8 @@ static int cd_read_sector(IDEState *s, void *buf) block_acct_start(blk_get_stats(s->blk), &s->acct, 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ); -blk_aio_readv(s->blk, (int64_t)s->lba << 2, &s->qiov, 4, - cd_read_sector_cb, s); +ide_buffered_readv(s, (int64_t)s->lba << 2, &s->qiov, 4, + cd_read_sector_cb, s); s->status |= BUSY_STAT; return 0; @@ -424,9 +424,9 @@ static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret) s->bus->dma->iov.iov_len = n * 4 * 512; qemu_iovec_init_external(&s->bus->dma->qiov, &s->bus->dma->iov, 1); -s->bus->dma->aiocb = blk_aio_readv(s->blk, (int64_t)s->lba << 2, - &s->bus->dma->qiov, n * 4, - ide_atapi_cmd_read_dma_cb, s); +s->bus->dma->aiocb = ide_buffered_readv(s, (int64_t)s->lba << 2, +&s->bus->dma->qiov, n * 4, +ide_atapi_cmd_read_dma_cb, s); return; eot: -- 1.9.1
[Qemu-block] [PATCH V3 0/6] ide: avoid main-loop hang on CDROM/NFS failure
This series aims at avoiding a hanging main-loop if a vserver has a CDROM image mounted from a NFS share and that NFS share goes down. Typical situation is that users mount an CDROM ISO to install something and then forget to eject that CDROM afterwards. As a consequence this mounted CD is able to bring down the whole vserver if the backend NFS share is unreachable. This is bad especially if the CDROM itself is not needed anymore at this point. v2->v3: - adressed Stefans comments on Patch 1 - added patches 2,4,5,6 - avoided the term cancel in Patch 3. Added an iovec, added a BH [Stefan] v1->v2: - fix offset for 2352 byte sector size [Kevin] - use a sync request if we continue an elementary transfer. As John pointed out we enter a race condition between next IDE command and async transfer otherwise. This is sill not optimal, but it fixes the NFS down problems for all cases where the NFS server goes down while there is no PIO CD activity. Of course, it could still happen during a PIO transfer, but I expect this to be the unlikelier case. I spent some effort trying to read more sectors at once and avoiding continuation of elementary transfers, but with whatever I came up it was destroying migration between different Qemu versions. I have a quite hackish patch that works and should survive migration, but I am not happy with it. So I would like to start with this version as it is a big improvement already. - Dropped Patch 5 because it is upstream meanwhile. Peter Lieven (6): ide/atapi: make PIO read requests async block: add blk_abort_aio_request ide: add support for IDEBufferedRequest ide: orphan all buffered requests on DMA cancel ide: enable buffered requests for ATAPI devices ide: enable buffered requests for PIO read requests block/block-backend.c | 17 +++ hw/ide/atapi.c | 103 +++-- hw/ide/core.c | 50 +++- hw/ide/internal.h | 14 ++ hw/ide/pci.c | 19 include/sysemu/block-backend.h | 3 ++ 6 files changed, 181 insertions(+), 25 deletions(-) -- 1.9.1
[Qemu-block] [PATCH V3 1/6] ide/atapi: make PIO read requests async
PIO read requests on the ATAPI interface used to be sync blk requests. This has two significant drawbacks. First the main loop hangs util an I/O request is completed and secondly if the I/O request does not complete (e.g. due to an unresponsive storage) Qemu hangs completely. Note: Due to possible race conditions requests during an ongoing elementary transfer are still sync. Signed-off-by: Peter Lieven --- hw/ide/atapi.c | 97 ++ 1 file changed, 85 insertions(+), 12 deletions(-) diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c index 747f466..29fd131 100644 --- a/hw/ide/atapi.c +++ b/hw/ide/atapi.c @@ -105,33 +105,98 @@ static void cd_data_to_raw(uint8_t *buf, int lba) memset(buf, 0, 288); } -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size) +static int +cd_read_sector_sync(IDEState *s, uint8_t *buf) { int ret; -switch(sector_size) { +#ifdef DEBUG_IDE_ATAPI +printf("cd_read_sector_sync: lba=%d\n", s->lba); +#endif + +switch (s->cd_sector_size) { case 2048: block_acct_start(blk_get_stats(s->blk), &s->acct, 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ); -ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4); +ret = blk_read(s->blk, (int64_t)s->lba << 2, buf, 4); block_acct_done(blk_get_stats(s->blk), &s->acct); break; case 2352: block_acct_start(blk_get_stats(s->blk), &s->acct, 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ); -ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4); +ret = blk_read(s->blk, (int64_t)s->lba << 2, buf + 16, 4); block_acct_done(blk_get_stats(s->blk), &s->acct); if (ret < 0) return ret; -cd_data_to_raw(buf, lba); +cd_data_to_raw(buf, s->lba); break; default: ret = -EIO; break; } + +if (!ret) { +s->lba++; +s->io_buffer_index = 0; +} + return ret; } +static void cd_read_sector_cb(void *opaque, int ret) +{ +IDEState *s = opaque; + +block_acct_done(blk_get_stats(s->blk), &s->acct); + +#ifdef DEBUG_IDE_ATAPI +printf("cd_read_sector_cb: lba=%d ret=%d\n", s->lba, ret); +#endif + +if (ret < 0) { +ide_atapi_io_error(s, ret); +return; +} + +if 
(s->cd_sector_size == 2352) { +cd_data_to_raw(s->io_buffer, s->lba); +} + +s->lba++; +s->io_buffer_index = 0; +s->status &= ~BUSY_STAT; + +ide_atapi_cmd_reply_end(s); +} + +static int cd_read_sector(IDEState *s, void *buf) +{ +if (s->cd_sector_size != 2048 && s->cd_sector_size != 2352) { +return -EINVAL; +} + +s->iov.iov_base = buf; +if (s->cd_sector_size == 2352) { +buf += 16; +} + +s->iov.iov_len = 4 * BDRV_SECTOR_SIZE; +qemu_iovec_init_external(&s->qiov, &s->iov, 1); + +#ifdef DEBUG_IDE_ATAPI +printf("cd_read_sector: lba=%d\n", s->lba); +#endif + +block_acct_start(blk_get_stats(s->blk), &s->acct, + 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ); + +blk_aio_readv(s->blk, (int64_t)s->lba << 2, &s->qiov, 4, + cd_read_sector_cb, s); + +s->status |= BUSY_STAT; +return 0; +} + void ide_atapi_cmd_ok(IDEState *s) { s->error = 0; @@ -182,18 +247,27 @@ void ide_atapi_cmd_reply_end(IDEState *s) ide_atapi_cmd_ok(s); ide_set_irq(s->bus); #ifdef DEBUG_IDE_ATAPI -printf("status=0x%x\n", s->status); +printf("end of transfer, status=0x%x\n", s->status); #endif } else { /* see if a new sector must be read */ if (s->lba != -1 && s->io_buffer_index >= s->cd_sector_size) { -ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size); -if (ret < 0) { -ide_atapi_io_error(s, ret); +if (!s->elementary_transfer_size) { +ret = cd_read_sector(s, s->io_buffer); +if (ret < 0) { +ide_atapi_io_error(s, ret); +} return; +} else { +/* rebuffering within an elementary transfer is + * only possible with a sync request because we + * end up with a race condition otherwise */ +ret = cd_read_sector_sync(s, s->io_buffer); +if (ret < 0) { +ide_atapi_io_error(s, ret); +return; +} } -s->lba++; -s->io_buffer_index = 0; } if (s->elementary_transfer_size > 0) { /* there are some data left to transmit in this elementary @@ -275,7 +349,6 @@ static void ide_atapi_cmd_read_pio(IDEState *s, int lba, int nb_sectors, s->io_buffer_index = sector_size; s->cd_sector_size = sector_size; -s->status = READY_STAT | 
SEEK_STAT; ide_atapi_cmd_reply_end(s); } -- 1.9.1
[Qemu-block] [PATCH V3 3/6] ide: add support for IDEBufferedRequest
this patch adds a new aio readv compatible function which copies all data through a bounce buffer. These buffered requests can be flagged as orphaned which means that their original callback has already been invoked and the request has just not been completed by the backend storage. The bounce buffer guarantees that guest memory corruption is avoided when such a orphaned request is completed by the backend at a later stage. This trick only works for read requests as a write request completed at a later stage might corrupt data as there is no way to control if and what data has already been written to the storage. Signed-off-by: Peter Lieven --- hw/ide/core.c | 46 ++ hw/ide/internal.h | 14 ++ 2 files changed, 60 insertions(+) diff --git a/hw/ide/core.c b/hw/ide/core.c index 364ba21..53f9c2c 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -561,6 +561,52 @@ static bool ide_sect_range_ok(IDEState *s, return true; } +static void ide_buffered_readv_cb(void *opaque, int ret) +{ +IDEBufferedRequest *req = opaque; +if (!req->orphaned) { +if (!ret) { +qemu_iovec_from_buf(req->original_qiov, 0, req->iov.iov_base, +req->original_qiov->size); +} +req->original_cb(req->original_opaque, ret); +} +QLIST_REMOVE(req, list); +g_free(req); +} + +#define MAX_BUFFERED_REQS 16 + +BlockAIOCB *ide_buffered_readv(IDEState *s, int64_t sector_num, + QEMUIOVector *iov, int nb_sectors, + BlockCompletionFunc *cb, void *opaque) +{ +BlockAIOCB *aioreq; +IDEBufferedRequest *req; +int c = 0; + +QLIST_FOREACH(req, &s->buffered_requests, list) { +c++; +} +if (c > MAX_BUFFERED_REQS) { +return blk_abort_aio_request(s->blk, cb, opaque, -EIO); +} + +req = g_new0(IDEBufferedRequest, 1); +req->original_qiov = iov; +req->original_cb = cb; +req->original_opaque = opaque; +req->iov.iov_base = qemu_blockalign(blk_bs(s->blk), iov->size); +req->iov.iov_len = iov->size; +qemu_iovec_init_external(&req->qiov, &req->iov, 1); + +aioreq = blk_aio_readv(s->blk, sector_num, &req->qiov, nb_sectors, + 
ide_buffered_readv_cb, req); + +QLIST_INSERT_HEAD(&s->buffered_requests, req, list); +return aioreq; +} + static void ide_sector_read(IDEState *s); static void ide_sector_read_cb(void *opaque, int ret) diff --git a/hw/ide/internal.h b/hw/ide/internal.h index 05e93ff..3d1116f 100644 --- a/hw/ide/internal.h +++ b/hw/ide/internal.h @@ -343,6 +343,16 @@ enum ide_dma_cmd { #define ide_cmd_is_read(s) \ ((s)->dma_cmd == IDE_DMA_READ) +typedef struct IDEBufferedRequest { +QLIST_ENTRY(IDEBufferedRequest) list; +struct iovec iov; +QEMUIOVector qiov; +QEMUIOVector *original_qiov; +BlockCompletionFunc *original_cb; +void *original_opaque; +bool orphaned; +} IDEBufferedRequest; + /* NOTE: IDEState represents in fact one drive */ struct IDEState { IDEBus *bus; @@ -396,6 +406,7 @@ struct IDEState { BlockAIOCB *pio_aiocb; struct iovec iov; QEMUIOVector qiov; +QLIST_HEAD(, IDEBufferedRequest) buffered_requests; /* ATA DMA state */ int32_t io_buffer_offset; int32_t io_buffer_size; @@ -572,6 +583,9 @@ void ide_set_inactive(IDEState *s, bool more); BlockAIOCB *ide_issue_trim(BlockBackend *blk, int64_t sector_num, QEMUIOVector *qiov, int nb_sectors, BlockCompletionFunc *cb, void *opaque); +BlockAIOCB *ide_buffered_readv(IDEState *s, int64_t sector_num, + QEMUIOVector *iov, int nb_sectors, + BlockCompletionFunc *cb, void *opaque); /* hw/ide/atapi.c */ void ide_atapi_cmd(IDEState *s); -- 1.9.1
[Qemu-block] [PATCH V3 2/6] block: add blk_abort_aio_request
Signed-off-by: Peter Lieven --- block/block-backend.c | 17 + include/sysemu/block-backend.h | 3 +++ 2 files changed, 12 insertions(+), 8 deletions(-) diff --git a/block/block-backend.c b/block/block-backend.c index 19fdaae..b13dc4e 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -627,8 +627,9 @@ static void error_callback_bh(void *opaque) qemu_aio_unref(acb); } -static BlockAIOCB *abort_aio_request(BlockBackend *blk, BlockCompletionFunc *cb, - void *opaque, int ret) +BlockAIOCB *blk_abort_aio_request(BlockBackend *blk, + BlockCompletionFunc *cb, + void *opaque, int ret) { struct BlockBackendAIOCB *acb; QEMUBH *bh; @@ -650,7 +651,7 @@ BlockAIOCB *blk_aio_write_zeroes(BlockBackend *blk, int64_t sector_num, { int ret = blk_check_request(blk, sector_num, nb_sectors); if (ret < 0) { -return abort_aio_request(blk, cb, opaque, ret); +return blk_abort_aio_request(blk, cb, opaque, ret); } return bdrv_aio_write_zeroes(blk->bs, sector_num, nb_sectors, flags, @@ -710,7 +711,7 @@ BlockAIOCB *blk_aio_readv(BlockBackend *blk, int64_t sector_num, { int ret = blk_check_request(blk, sector_num, nb_sectors); if (ret < 0) { -return abort_aio_request(blk, cb, opaque, ret); +return blk_abort_aio_request(blk, cb, opaque, ret); } return bdrv_aio_readv(blk->bs, sector_num, iov, nb_sectors, cb, opaque); @@ -722,7 +723,7 @@ BlockAIOCB *blk_aio_writev(BlockBackend *blk, int64_t sector_num, { int ret = blk_check_request(blk, sector_num, nb_sectors); if (ret < 0) { -return abort_aio_request(blk, cb, opaque, ret); +return blk_abort_aio_request(blk, cb, opaque, ret); } return bdrv_aio_writev(blk->bs, sector_num, iov, nb_sectors, cb, opaque); @@ -732,7 +733,7 @@ BlockAIOCB *blk_aio_flush(BlockBackend *blk, BlockCompletionFunc *cb, void *opaque) { if (!blk_is_available(blk)) { -return abort_aio_request(blk, cb, opaque, -ENOMEDIUM); +return blk_abort_aio_request(blk, cb, opaque, -ENOMEDIUM); } return bdrv_aio_flush(blk->bs, cb, opaque); @@ -744,7 +745,7 @@ BlockAIOCB 
*blk_aio_discard(BlockBackend *blk, { int ret = blk_check_request(blk, sector_num, nb_sectors); if (ret < 0) { -return abort_aio_request(blk, cb, opaque, ret); +return blk_abort_aio_request(blk, cb, opaque, ret); } return bdrv_aio_discard(blk->bs, sector_num, nb_sectors, cb, opaque); @@ -787,7 +788,7 @@ BlockAIOCB *blk_aio_ioctl(BlockBackend *blk, unsigned long int req, void *buf, BlockCompletionFunc *cb, void *opaque) { if (!blk_is_available(blk)) { -return abort_aio_request(blk, cb, opaque, -ENOMEDIUM); +return blk_abort_aio_request(blk, cb, opaque, -ENOMEDIUM); } return bdrv_aio_ioctl(blk->bs, req, buf, cb, opaque); diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h index 9306a52..b5267a8 100644 --- a/include/sysemu/block-backend.h +++ b/include/sysemu/block-backend.h @@ -180,5 +180,8 @@ int blk_save_vmstate(BlockBackend *blk, const uint8_t *buf, int blk_load_vmstate(BlockBackend *blk, uint8_t *buf, int64_t pos, int size); int blk_probe_blocksizes(BlockBackend *blk, BlockSizes *bsz); int blk_probe_geometry(BlockBackend *blk, HDGeometry *geo); +BlockAIOCB *blk_abort_aio_request(BlockBackend *blk, + BlockCompletionFunc *cb, + void *opaque, int ret); #endif -- 1.9.1
[Qemu-block] [PATCH V3 4/6] ide: orphan all buffered requests on DMA cancel
If the guest cancels a DMA request we can prematurely invoke the callbacks of all buffered requests and flag them as orphaned. Ideally this avoids the need to drain all requests. For CDROM devices this works in 100% of cases.

Signed-off-by: Peter Lieven
---
 hw/ide/pci.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index d31ff88..a9e164e 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -240,6 +240,22 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
     /* Ignore writes to SSBM if it keeps the old value */
     if ((val & BM_CMD_START) != (bm->cmd & BM_CMD_START)) {
         if (!(val & BM_CMD_START)) {
+            /* First invoke the callbacks of all buffered requests
+             * and flag those requests as orphaned. Ideally there
+             * are no unbuffered (Scatter Gather DMA Requests or
+             * write requests) pending and we can avoid to drain. */
+            IDEBufferedRequest *req;
+            IDEState *s = idebus_active_if(bm->bus);
+            QLIST_FOREACH(req, &s->buffered_requests, list) {
+                if (!req->orphaned) {
+#ifdef DEBUG_IDE
+                    printf("%s: invoking cb %p of buffered request %p with"
+                           " -ECANCELED\n", __func__, req->original_cb, req);
+#endif
+                    req->original_cb(req->original_opaque, -ECANCELED);
+                }
+                req->orphaned = true;
+            }
             /*
              * We can't cancel Scatter Gather DMA in the middle of the
              * operation or a partial (not full) DMA transfer would reach
@@ -253,6 +269,9 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
              * aio operation with preadv/pwritev.
              */
             if (bm->bus->dma->aiocb) {
+#ifdef DEBUG_IDE
+                printf("%s: draining all remaining requests", __func__);
+#endif
                 blk_drain_all();
                 assert(bm->bus->dma->aiocb == NULL);
             }
-- 
1.9.1
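The core idea of this patch reduces to a small standalone sketch: on cancel, every buffered request's original callback is invoked once with -ECANCELED and the request is flagged as orphaned, so the late completion of the underlying read drops its data instead of calling back into the guest a second time. The names and the list structure here are illustrative, not the actual hw/ide code:

```c
#include <errno.h>
#include <stddef.h>

typedef void BlockCompletionFunc(void *opaque, int ret);

typedef struct BufferedRequest {
    BlockCompletionFunc *original_cb;
    void *original_opaque;
    int orphaned;
    struct BufferedRequest *next; /* stand-in for the QLIST linkage */
} BufferedRequest;

/* On DMA cancel: complete every buffered request with -ECANCELED,
 * exactly once, and flag it as orphaned. */
static void orphan_all(BufferedRequest *head)
{
    for (BufferedRequest *req = head; req; req = req->next) {
        if (!req->orphaned) {
            req->original_cb(req->original_opaque, -ECANCELED);
        }
        req->orphaned = 1;
    }
}

/* What the read-completion path would do when the data finally
 * arrives: deliver it only if the request was not orphaned meanwhile. */
static void buffered_request_cb(BufferedRequest *req, int ret)
{
    if (!req->orphaned) {
        req->original_cb(req->original_opaque, ret);
    }
}

static void record_cb(void *opaque, int ret)
{
    *(int *)opaque = ret;
}

static int demo(void)
{
    int r1 = 0, r2 = 0;
    BufferedRequest b = { record_cb, &r2, 0, NULL };
    BufferedRequest a = { record_cb, &r1, 0, &b };
    orphan_all(&a);             /* both complete with -ECANCELED */
    buffered_request_cb(&a, 0); /* late completion is dropped */
    return r1 == -ECANCELED && r2 == -ECANCELED;
}
```

This is why the commit message can promise early completion only for buffered (read) requests: a write or scatter-gather request has guest-visible side effects in flight, so those still need blk_drain_all().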
Re: [Qemu-block] [Qemu-devel] [PATCH v10 12/14] block: add transactional properties
Am 05.11.2015 um 19:52 hat John Snow geschrieben:
> 
> 
> On 11/05/2015 05:47 AM, Stefan Hajnoczi wrote:
> > On Tue, Nov 03, 2015 at 12:27:19PM -0500, John Snow wrote:
> >>
> >>
> >> On 11/03/2015 10:17 AM, Stefan Hajnoczi wrote:
> >>> On Fri, Oct 23, 2015 at 07:56:50PM -0400, John Snow wrote:
> >>>> @@ -1732,6 +1757,10 @@ static void block_dirty_bitmap_add_prepare(BlkActionState *common,
> >>>>      BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
> >>>>                                               common, common);
> >>>> 
> >>>> +    if (action_check_cancel_mode(common, errp) < 0) {
> >>>> +        return;
> >>>> +    }
> >>>> +
> >>>>      action = common->action->block_dirty_bitmap_add;
> >>>>      /* AIO context taken and released within qmp_block_dirty_bitmap_add */
> >>>>      qmp_block_dirty_bitmap_add(action->node, action->name,
> >>>> @@ -1767,6 +1796,10 @@ static void block_dirty_bitmap_clear_prepare(BlkActionState *common,
> >>>>                                               common, common);
> >>>>      BlockDirtyBitmap *action;
> >>>> 
> >>>> +    if (action_check_cancel_mode(common, errp) < 0) {
> >>>> +        return;
> >>>> +    }
> >>>> +
> >>>>      action = common->action->block_dirty_bitmap_clear;
> >>>>      state->bitmap = block_dirty_bitmap_lookup(action->node,
> >>>>                                                action->name,
> >>> 
> >>> Why do the bitmap add/clear actions not support err-cancel=all?
> >>> 
> >>> I understand why other block jobs don't support it, but it's not clear
> >>> why these non-block job actions cannot.
> >>> 
> >> 
> >> Because they don't have a callback to invoke if the rest of the job fails.
> >> 
> >> I could create a BlockJob for them complete with a callback to invoke,
> >> but basically it's just because there's no interface to unwind them, or
> >> an interface to join them with the transaction.
> >> 
> >> They're small, synchronous non-job actions. Which makes them weird.
> > 
> > Funny, we've been looking at the same picture while seeing different
> > things:
> > https://en.wikipedia.org/wiki/Rabbit%E2%80%93duck_illusion
> > 
> > I think I understand your idea: the transaction should include both
> > immediate actions as well as block jobs.
> > 
> > My mental model was different: immediate actions commit/abort along with
> > the 'transaction' command. Block jobs are separate and complete/cancel
> > together in a group.
> > 
> > In practice I think the two end up being similar because we won't be
> > able to implement immediate action commit/abort together with
> > long-running block jobs, because the immediate actions rely on
> > quiescing/pausing the guest for atomic commit/abort.
> > 
> > So with your mental model the QMP client has to submit 2 'transaction'
> > commands: 1 for the immediate actions, 1 for the block jobs.
> > 
> > In my mental model the QMP client submits 1 command but the immediate
> > actions and block jobs are two separate transaction scopes. This means
> > if the block jobs fail, the client needs to be aware of the immediate
> > actions that have committed. Because of this, it becomes just as much
> > client effort as submitting two separate 'transaction' commands in your
> > model.
> > 
> > Can anyone see a practical difference? I think I'm happy with John's
> > model.
> > 
> > Stefan
> 
> We discussed this off-list, but for the sake of the archive:
> 
> == How it is now ==
> 
> Currently, transactions have two implicit phases: the first is the
> synchronous phase. If this phase completes successfully, we consider the
> transaction a success. The second phase is the asynchronous phase, where
> jobs launched by the synchronous phase run to completion.
> 
> All synchronous commands must complete for the transaction to "succeed."
> There are currently (pre-patch) no guarantees about asynchronous command
> completion. As long as all synchronous actions complete, asynchronous
> actions are free to succeed or fail individually.

You're overcomplicating this. All actions are currently synchronous, and
what you consider asynchronous transaction actions aren't actually part
of the transaction at all. The action is "start block job X", not "run
block job X".
> == My Model ==
> 
> The current behavior is my "err-cancel = none" scenario: we offer no
> guarantee about the success or failure of the transaction as a whole
> after the synchronous portion has completed.
> 
> What I was proposing is "err-cancel = all," which to me means that _ALL_
> commands in this transaction are to succeed (synchronous or not) before
> _any_ actions are irrevocably committed. This means that for a
> hypothetical mixed synchronous-asynchronous transaction, even after
> the transaction succeeded (it passed the synchronous phase), if an
> asynchronous action later fails, all actions both synchronous and non
> are rolled back -- a kind of retroactive failure of the transaction.
> This is clearly not possible in
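For the archive, the two-phase structure being debated in this thread — all synchronous prepares must succeed before anything commits, and a failed prepare unwinds the already-prepared actions — can be sketched as follows. This is an illustrative model only, not QEMU's actual BlkActionState machinery, and all names are hypothetical:

```c
#include <stddef.h>

/* Each action has a synchronous prepare plus a commit/abort pair.
 * In the QEMU discussion, prepare may also *start* a block job,
 * which then runs outside the transaction. */
typedef struct {
    int (*prepare)(void); /* 0 on success, negative on failure */
    void (*commit)(void);
    void (*abort)(void);
} Action;

/* Run all prepares first; only if every one succeeds, commit them all.
 * On the first failure, abort the already-prepared actions in reverse. */
static int transaction(Action *acts, size_t n)
{
    for (size_t done = 0; done < n; done++) {
        if (acts[done].prepare() != 0) {
            while (done--) {
                acts[done].abort();
            }
            return -1;
        }
    }
    for (size_t i = 0; i < n; i++) {
        acts[i].commit();
    }
    return 0;
}

static int committed, aborted;
static int ok_prepare(void)  { return 0; }
static int bad_prepare(void) { return -1; }
static void count_commit(void) { committed++; }
static void count_abort(void)  { aborted++; }

static int demo(void)
{
    Action good[2]  = { { ok_prepare, count_commit, count_abort },
                        { ok_prepare, count_commit, count_abort } };
    Action mixed[2] = { { ok_prepare, count_commit, count_abort },
                        { bad_prepare, count_commit, count_abort } };
    committed = aborted = 0;
    int r1 = transaction(good, 2);  /* both actions commit */
    int r2 = transaction(mixed, 2); /* first action is aborted */
    return r1 == 0 && r2 == -1 && committed == 2 && aborted == 1;
}
```

The "err-cancel = all" proposal would extend the abort path so that a later asynchronous job failure could still trigger it, which is exactly the part John notes is not possible for actions that have no unwind callback.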
Re: [Qemu-block] [Qemu-devel] [PULL 00/37] Block layer patches
Am 05.11.2015 um 20:01 hat Peter Maydell geschrieben:
> On 5 November 2015 at 18:17, Kevin Wolf wrote:
> > The following changes since commit 8835b9df3bddf332c883c861d6a1defc12c4ebe9:
> > 
> >   Merge remote-tracking branch 'remotes/mdroth/tags/qga-pull-2015-11-04-tag'
> >   into staging (2015-11-05 10:52:35 +)
> > 
> > are available in the git repository at:
> > 
> >   git://repo.or.cz/qemu/kevin.git tags/for-upstream
> > 
> > for you to fetch changes up to 22bdadd23b64af65ac2dd816848dbe2b1240a77a:
> > 
> >   Merge remote-tracking branch 'mreitz/tags/pull-block-for-kevin-2015-11-05'
> >   into queue-block (2015-11-05 18:01:37 +0100)
> > 
> > Block layer patches
> 
> Fails to build on OSX, I'm afraid:
> 
> Undefined symbols for architecture x86_64:
>   "_qmp_change_blockdev", referenced from:
>       -[QemuCocoaAppController changeDeviceMedia:] in cocoa.o

Max, I think this is yours.

Kevin