Re: [Qemu-block] [PATCH v2] blkdebug: Add support for latency rules
On 09/14/2018 01:46 PM, John Snow wrote: On 09/13/2018 12:48 PM, Marc Olson wrote: Are there further thoughts on this patch? The CI tools may have missed it since it appears to have been sent in reply to the V1 instead of as a new thread. When you send a revision out, can you send it as its own thread? Ah, sorry. Happy to do that for v3. (We're also in a bit of a logjam at the moment with no pull requests having been processed recently, so we're a bit gummed up.) Some comments below. On 09/04/2018 05:24 PM, Marc Olson wrote: Sometimes storage devices can be slow to respond, due to media errors, firmware issues, SSD garbage collection, etc. This patch adds a new rule type to blkdebug that allows injection of latency to I/O operations. Similar to error injection rules, latency rules can be specified with or without an offset, and can also apply upon state transitions. Signed-off-by: Marc Olson --- v2: - Change so that delay rules are executed before error rules - Add QMP support - Add tests --- block/blkdebug.c | 119 +++-- docs/devel/blkdebug.txt | 30 +--- qapi/block-core.json | 39 +-- tests/qemu-iotests/071 | 63 tests/qemu-iotests/071.out | 31 5 files changed, 244 insertions(+), 38 deletions(-) diff --git a/block/blkdebug.c b/block/blkdebug.c index 0759452..1785fe3 100644 --- a/block/blkdebug.c +++ b/block/blkdebug.c @@ -65,6 +65,7 @@ typedef struct BlkdebugSuspendedReq { enum { ACTION_INJECT_ERROR, + ACTION_DELAY, ACTION_SET_STATE, ACTION_SUSPEND, }; @@ -73,14 +74,17 @@ typedef struct BlkdebugRule { BlkdebugEvent event; int action; int state; + int once; + int64_t offset; Hm, I guess we're just promoting these to universal values that might exist for any of the rule types, but not allowing the user to set them for rules where it doesn't make sense. It seemed a bit odd to me to duplicate them in the union, but happy to move them back if you're concerned about it. 
union { struct { int error; int immediately; - int once; - int64_t offset; } inject; struct { + int64_t latency; + } delay; + struct { int new_state; } set_state; struct { @@ -123,6 +127,33 @@ static QemuOptsList inject_error_opts = { }, }; +static QemuOptsList delay_opts = { + .name = "delay", + .head = QTAILQ_HEAD_INITIALIZER(delay_opts.head), + .desc = { + { + .name = "event", + }, + { + .name = "state", + .type = QEMU_OPT_NUMBER, + }, + { + .name = "latency", + .type = QEMU_OPT_NUMBER, + }, + { + .name = "sector", + .type = QEMU_OPT_NUMBER, + }, + { + .name = "once", + .type = QEMU_OPT_BOOL, + }, + { /* end of list */ } + }, +}; + static QemuOptsList set_state_opts = { .name = "set-state", .head = QTAILQ_HEAD_INITIALIZER(set_state_opts.head), @@ -145,6 +176,7 @@ static QemuOptsList set_state_opts = { static QemuOptsList *config_groups[] = { &inject_error_opts, + &delay_opts, &set_state_opts, NULL }; @@ -182,16 +214,21 @@ static int add_rule(void *opaque, QemuOpts *opts, Error **errp) .state = qemu_opt_get_number(opts, "state", 0), }; + rule->once = qemu_opt_get_bool(opts, "once", 0); + sector = qemu_opt_get_number(opts, "sector", -1); + rule->offset = sector == -1 ? -1 : sector * BDRV_SECTOR_SIZE; + /* Parse action-specific options */ switch (d->action) { case ACTION_INJECT_ERROR: rule->options.inject.error = qemu_opt_get_number(opts, "errno", EIO); - rule->options.inject.once = qemu_opt_get_bool(opts, "once", 0); rule->options.inject.immediately = qemu_opt_get_bool(opts, "immediately", 0); - sector = qemu_opt_get_number(opts, "sector", -1); - rule->options.inject.offset = - sector == -1 ?
-1 : sector * BDRV_SECTOR_SIZE; + break; + + case ACTION_DELAY: + rule->options.delay.latency = + qemu_opt_get_number(opts, "latency", 100) * SCALE_US; break; case ACTION_SET_STATE: @@ -264,6 +301,14 @@ static int read_config(BDRVBlkdebugState *s, const char *filename, goto fail; } + d.action = ACTION_DELAY; + qemu_opts_foreach(&delay_opts, add_rule, &d, &local_err); + if (local_err) { + error_propagate(errp, local_err); + ret = -EINVAL; + goto fail; + } + d.action = ACTION_SET_STATE; qemu_opts_foreach(&set_state_opts, add_rule, &d, &local_err); if (local_err) { @@ -275,6 +320,7 @@ static int read_config(BDRVBlkdebugState *s, const char
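[Editor's note: for a concrete picture of how the new rule type would be driven, here is a sketch of a blkdebug rule file exercising a delay rule. The [delay] group name and the event/latency/sector/once option names follow the delay_opts list in the patch above; per the patch, latency is taken in microseconds (note the SCALE_US scaling, default 100). The exact values are illustrative only.]

```
[delay]
event = "read_aio"
latency = "5000"
once = "on"
```

Attached via the existing blkdebug filename syntax, e.g. -drive file=blkdebug:delay.conf:test.qcow2, this would add roughly 5 ms of latency to the first matching read request.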
Re: [Qemu-block] [PATCH v2] blkdebug: Add support for latency rules
On 09/13/2018 12:48 PM, Marc Olson wrote: > Are there further thoughts on this patch? > The CI tools may have missed it since it appears to have been sent in reply to the V1 instead of as a new thread. When you send a revision out, can you send it as its own thread? (We're also in a bit of a logjam at the moment with no pull requests having been processed recently, so we're a bit gummed up.) Some comments below. > > On 09/04/2018 05:24 PM, Marc Olson wrote: >> Sometimes storage devices can be slow to respond, due to media errors, >> firmware >> issues, SSD garbage collection, etc. This patch adds a new rule type to >> blkdebug that allows injection of latency to I/O operations. Similar >> to error >> injection rules, latency rules can be specified with or without an >> offset, and >> can also apply upon state transitions. >> >> Signed-off-by: Marc Olson >> >> --- >> v2: >> - Change so that delay rules are executed before error rules >> - Add QMP support >> - Add tests >> --- >> block/blkdebug.c | 119 >> +++-- >> docs/devel/blkdebug.txt | 30 +--- >> qapi/block-core.json | 39 +-- >> tests/qemu-iotests/071 | 63 >> tests/qemu-iotests/071.out | 31 >> 5 files changed, 244 insertions(+), 38 deletions(-) >> >> diff --git a/block/blkdebug.c b/block/blkdebug.c >> index 0759452..1785fe3 100644 >> --- a/block/blkdebug.c >> +++ b/block/blkdebug.c >> @@ -65,6 +65,7 @@ typedef struct BlkdebugSuspendedReq { >> enum { >> ACTION_INJECT_ERROR, >> + ACTION_DELAY, >> ACTION_SET_STATE, >> ACTION_SUSPEND, >> }; >> @@ -73,14 +74,17 @@ typedef struct BlkdebugRule { >> BlkdebugEvent event; >> int action; >> int state; >> + int once; >> + int64_t offset; Hm, I guess we're just promoting these to universal values that might exist for any of the rule types, but not allowing the user to set them for rules where it doesn't make sense. 
>> union { >> struct { >> int error; >> int immediately; >> - int once; >> - int64_t offset; >> } inject; >> struct { >> + int64_t latency; >> + } delay; >> + struct { >> int new_state; >> } set_state; >> struct { >> @@ -123,6 +127,33 @@ static QemuOptsList inject_error_opts = { >> }, >> }; >> +static QemuOptsList delay_opts = { >> + .name = "delay", >> + .head = QTAILQ_HEAD_INITIALIZER(delay_opts.head), >> + .desc = { >> + { >> + .name = "event", >> + }, >> + { >> + .name = "state", >> + .type = QEMU_OPT_NUMBER, >> + }, >> + { >> + .name = "latency", >> + .type = QEMU_OPT_NUMBER, >> + }, >> + { >> + .name = "sector", >> + .type = QEMU_OPT_NUMBER, >> + }, >> + { >> + .name = "once", >> + .type = QEMU_OPT_BOOL, >> + }, >> + { /* end of list */ } >> + }, >> +}; >> + >> static QemuOptsList set_state_opts = { >> .name = "set-state", >> .head = QTAILQ_HEAD_INITIALIZER(set_state_opts.head), >> @@ -145,6 +176,7 @@ static QemuOptsList set_state_opts = { >> static QemuOptsList *config_groups[] = { >> &inject_error_opts, >> + &delay_opts, >> &set_state_opts, >> NULL >> }; >> @@ -182,16 +214,21 @@ static int add_rule(void *opaque, QemuOpts >> *opts, Error **errp) >> .state = qemu_opt_get_number(opts, "state", 0), >> }; >> + rule->once = qemu_opt_get_bool(opts, "once", 0); >> + sector = qemu_opt_get_number(opts, "sector", -1); >> + rule->offset = sector == -1 ? -1 : sector * BDRV_SECTOR_SIZE; >> + >> /* Parse action-specific options */ >> switch (d->action) { >> case ACTION_INJECT_ERROR: >> rule->options.inject.error = qemu_opt_get_number(opts, >> "errno", EIO); >> - rule->options.inject.once = qemu_opt_get_bool(opts, "once", 0); >> rule->options.inject.immediately = >> qemu_opt_get_bool(opts, "immediately", 0); >> - sector = qemu_opt_get_number(opts, "sector", -1); >> - rule->options.inject.offset = >> - sector == -1 ?
-1 : sector * BDRV_SECTOR_SIZE; >> + break; >> + >> + case ACTION_DELAY: >> + rule->options.delay.latency = >> + qemu_opt_get_number(opts, "latency", 100) * SCALE_US; >> break; >> case ACTION_SET_STATE: >> @@ -264,6 +301,14 @@ static int read_config(BDRVBlkdebugState *s, >> const char *filename, >> goto fail; >> } >> + d.action = ACTION_DELAY; >> + qemu_opts_foreach(&delay_opts, add_rule, &d, &local_err); >> + if (local_err) { >> + error_propagate(errp, local_err); >> + ret = -EINVAL; >> + goto fail; >> + } >> + >> d.action = ACTION_SET_STATE; >>
Re: [Qemu-block] [Qemu-devel] [PATCH v3 1/8] dirty-bitmap: improve bdrv_dirty_bitmap_next_zero
On 09/14/2018 01:51 PM, Vladimir Sementsov-Ogievskiy wrote: > 14.09.2018 20:39, John Snow wrote: >> >> On 09/10/2018 01:00 PM, Vladimir Sementsov-Ogievskiy wrote: >>> 10.09.2018 19:55, Eric Blake wrote: On 9/10/18 11:49 AM, Vladimir Sementsov-Ogievskiy wrote: >>> -int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t start); >>> +int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t start, >>> int64_t end); >> The interface looks weird because we can define a 'start' that's >> beyond >> the 'end'. >> >> I realize that you need a signed integer for 'end' to signify EOF... >> should we do a 'bytes' parameter instead? (Did you already do that >> in an >> earlier version and we changed it?) >> >> Well, it's not a big deal to me personally. > interface with constant end parameter is more comfortable for loop: > we don't need to update 'bytes' parameter on each iteration But there's still the question of WHO should be calculating end. Your interface argues for the caller: hbitmap_next_zero(start, start + bytes) int64_t hbitmap_next_zero(...) { while (offset != end) ... } while we're asking about a consistent interface for the caller (if most callers already have a 'bytes' rather than an 'end' computed): hbitmap_next_zero(start, bytes) int64_t hbitmap_next_zero(...) { int64_t end = start + bytes; while (offset != end) ... } >>> Yes, that's an issue. Ok, if you are not comfortable with start,end, I >>> can switch to start,bytes. >>> >> The series looks pretty close, I can merge the next version if you think >> it's worth changing the interface. >> >> --js > > I've started to change interface and found a bug in bitmap_to_extents > (patch sent > https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg01804.html). > So, next version will be based on this patch, which will go through > Eric's tree.. > OK, I spoke with Eric and if you resend and I R-B & ACK the patches, he'll stage them through NBD. Thanks, --js
Re: [Qemu-block] [PATCH v7 3/6] qcow2: Reduce REFT_OFFSET_MASK
[revisiting this thread] On 6/29/18 10:43 AM, Kevin Wolf wrote: On 29.06.2018 at 17:22, Eric Blake wrote: On 06/29/2018 03:44 AM, Kevin Wolf wrote: On 28.06.2018 at 21:07, Eric Blake wrote: Match our code to the spec change in the previous patch - there's no reason for the refcount table to allow larger offsets than the L1/L2 tables. What about internal snapshots? And anyway, because of the metadata overhead, the physical image size of a fully allocated image is always going to be at least minimally larger than the virtual disk size. I'm not necessarily opposed to making the change if there is a good reason to make it, but I don't see a real need for it and the justification used here and also in the previous patch is incorrect. The fact that ext4 cannot hold an image this large is already an indication that setting this limit on the refcount table is NOT going to bite real users. Yes, you can argue that with lots of metadata, including internal snapshots, and on a capable file system (such as tmpfs) that can even hold files this large to begin with, then yes, allowing the refcount to exceed this limit will allow slightly more metadata to be crammed into a single image. But will it actually help anyone? Do I need to respin the series to split patch 2 into the obvious changes (stuff unrelated to capping refcount size) vs. the controversial stuff (refcount cap and this code change)? Kevin In practice, no image has more than 64PB of allocated clusters anyways, as anything beyond that can't be expressed via L2 mappings to host offsets. If you're opposed to an exact 56-bit limit on the grounds that 56-bit guest data plus minimal metadata should still be expressible, would it be better to cap refcount at 57-bits? You're arguing that the patch doesn't hurt in practice, but what I'm really looking for is what do we gain from it?
I guess one thing to be gained by having qemu's implementation of qcow2 pull a hard-stop at 56 bits is that it becomes easier to detect errant refcount entries beyond the end of the file as being errant, and not attempt to expand the file during 'qemu-img check' just because a refcount pointed out that far. But that's still a pretty weak argument (we've already mentioned that qemu-img check should do a better job of handling/flagging refcount entries that point way beyond the end of the image - but that's true regardless of what limits may be placed on the host image size) We generally don't apply patches just because they don't hurt, but because they are helpful for something. In the simple, fully compatible cases "helpful" could just mean "we like the code better". This is a slightly different category: "it's technically incompatible, but we don't think anyone uses this". We have done such changes before when they allowed us to improve something substantially, but I think the requirements for this are at least a little higher than just "because we like it better". And I'm not even sure yet if I like it better because it seems like an arbitrary restriction. So apart from not hurting in practice, what does the restriction buy us? As I'm not coming up with a solid reason for tightening the bounds, I will (eventually) respin the series without the spec change. At this point, it's probably easier for me to wait until after the pull request queue has started flushing again. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
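[Editor's note: the 56-bit figure discussed above falls out of the qcow2 cluster descriptor layout, where bits 9-55 of an L2 entry hold the host cluster offset; a quick sanity check of the resulting "64PB" ceiling:]

```python
# A standard qcow2 cluster descriptor stores the host cluster offset in
# bits 9-55 of the L2 entry, so host offsets are capped at 56 bits.
MAX_OFFSET_BITS = 56

max_host_offset = 1 << MAX_OFFSET_BITS  # exclusive upper bound, in bytes
PiB = 1 << 50                           # bytes per pebibyte

print(max_host_offset // PiB)  # 64 -- the "64PB" ceiling from the thread
```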
Re: [Qemu-block] [Qemu-devel] [PATCH] nbd/server: fix bitmap export
On 9/14/18 12:24 PM, Eric Blake wrote: On 9/14/18 11:51 AM, Vladimir Sementsov-Ogievskiy wrote: bitmap_to_extents function is broken: it switches dirty variable after every iteration, however it can process only part of dirty (or zero) area during one iteration in case when this area is too large for one extent. Fortunately, the bug don't produce wrong extents: it just inserts zero-length extents between sequential extents representing large dirty (or zero) area. However, zero-length extents are abandoned by NBD s/abandoned by/forbidden by the/ protocol. So, careful client should consider such replay as server s/replay/reply/ fault and not-careful will likely ignore zero-length extents. Which camp is qemu 3.0 as client in? Does it tolerate the zero-length extent, and still manage to see correct information overall, or does it crash? Hmm - I think we're "safe" with qemu as client - right now, the only way qemu 3.0 accesses the qemu dirty bitmap over NBD is with my x-dirty-bitmap hack (commit 216ee3657), which uses block/nbd-client.c:nbd_client_co_block_status() to read the bitmap, and that always passes NBD_CMD_FLAG_REQ_ONE. qemu will assert() if nbd_client_co_block_status() doesn't make any progress, but from what I'm reading of your bug report, qemu as client never permits the server to answer with more than one extent, and the bug of a zero-length extent is triggered only after the first extent has been sent. A further mitigation - the bug can only occur for a client that requests block status with a length in the range (4G-bitmap_granularity, 4G). Otherwise, the loop terminates after the extent that got truncated to 4G-bitmap_granularity, preventing the next iteration of the loop from detecting a 0-length extent. Thus, the primary reason to accept this patch is not because of qemu 3.0 as client, but for interoperability with other clients. I'm planning on updating the commit message to add these additional details. 
-- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [Qemu-block] [Qemu-devel] [PATCH v3 1/8] dirty-bitmap: improve bdrv_dirty_bitmap_next_zero
On 09/14/2018 01:51 PM, Vladimir Sementsov-Ogievskiy wrote: > 14.09.2018 20:39, John Snow wrote: >> >> On 09/10/2018 01:00 PM, Vladimir Sementsov-Ogievskiy wrote: >>> 10.09.2018 19:55, Eric Blake wrote: On 9/10/18 11:49 AM, Vladimir Sementsov-Ogievskiy wrote: >>> -int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t start); >>> +int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t start, >>> int64_t end); >> The interface looks weird because we can define a 'start' that's >> beyond >> the 'end'. >> >> I realize that you need a signed integer for 'end' to signify EOF... >> should we do a 'bytes' parameter instead? (Did you already do that >> in an >> earlier version and we changed it?) >> >> Well, it's not a big deal to me personally. > interface with constant end parameter is more comfortable for loop: > we don't need to update 'bytes' parameter on each iteration But there's still the question of WHO should be calculating end. Your interface argues for the caller: hbitmap_next_zero(start, start + bytes) int64_t hbitmap_next_zero(...) { while (offset != end) ... } while we're asking about a consistent interface for the caller (if most callers already have a 'bytes' rather than an 'end' computed): hbitmap_next_zero(start, bytes) int64_t hbitmap_next_zero(...) { int64_t end = start + bytes; while (offset != end) ... } >>> Yes, that's an issue. Ok, if you are not comfortable with start,end, I >>> can switch to start,bytes. >>> >> The series looks pretty close, I can merge the next version if you think >> it's worth changing the interface. >> >> --js > > I've started to change interface and found a bug in bitmap_to_extents > (patch sent > https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg01804.html). > So, next version will be based on this patch, which will go through > Eric's tree.. > ah, I see. you can send it to the list anyway with the requires: header and I can have Eric stage it to make the eventual merge easier for Peter. --js
Re: [Qemu-block] [Qemu-devel] [PATCH v3 1/8] dirty-bitmap: improve bdrv_dirty_bitmap_next_zero
14.09.2018 20:39, John Snow wrote: On 09/10/2018 01:00 PM, Vladimir Sementsov-Ogievskiy wrote: 10.09.2018 19:55, Eric Blake wrote: On 9/10/18 11:49 AM, Vladimir Sementsov-Ogievskiy wrote: -int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t start); +int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t start, int64_t end); The interface looks weird because we can define a 'start' that's beyond the 'end'. I realize that you need a signed integer for 'end' to signify EOF... should we do a 'bytes' parameter instead? (Did you already do that in an earlier version and we changed it?) Well, it's not a big deal to me personally. interface with constant end parameter is more comfortable for loop: we don't need to update 'bytes' parameter on each iteration But there's still the question of WHO should be calculating end. Your interface argues for the caller: hbitmap_next_zero(start, start + bytes) int64_t hbitmap_next_zero(...) { while (offset != end) ... } while we're asking about a consistent interface for the caller (if most callers already have a 'bytes' rather than an 'end' computed): hbitmap_next_zero(start, bytes) int64_t hbitmap_next_zero(...) { int64_t end = start + bytes; while (offset != end) ... } Yes, that's an issue. Ok, if you are not comfortable with start,end, I can switch to start,bytes. The series looks pretty close, I can merge the next version if you think it's worth changing the interface. --js I've started to change interface and found a bug in bitmap_to_extents (patch sent https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg01804.html). So, next version will be based on this patch, which will go through Eric's tree.. -- Best regards, Vladimir
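[Editor's note: the two candidate signatures discussed above can be modeled side by side. This is a toy model of the thread's pseudocode, not the real HBitmap API — a list of bools stands in for the bitmap, -1 is the "to the end" sentinel for the limit, and, like the real helper, -1 is returned when no zero bit is found.]

```python
def next_zero_end(bits, start, end):
    """Variant 1: caller computes and passes an exclusive 'end' (start + bytes)."""
    if end == -1:
        end = len(bits)
    for offset in range(start, end):
        if not bits[offset]:
            return offset
    return -1

def next_zero_bytes(bits, start, nbytes):
    """Variant 2: caller passes a length; the helper derives 'end' itself."""
    end = len(bits) if nbytes == -1 else start + nbytes
    for offset in range(start, end):
        if not bits[offset]:
            return offset
    return -1

bits = [True, True, False, True]
# Both variants find the zero bit at offset 2; only who computes 'end' differs.
assert next_zero_end(bits, 0, 4) == next_zero_bytes(bits, 0, 4) == 2
```

The start,end form spares a loop caller from re-deriving the limit each iteration; the start,bytes form matches callers that already hold a byte count — which is exactly the trade-off being weighed in the thread.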
Re: [Qemu-block] [Qemu-devel] [PULL 2/4] block/rbd: Attempt to parse legacy filenames
On 9/12/18 8:18 AM, Jeff Cody wrote: When we converted rbd to get rid of the older key/value-centric encoding format, we broke compatibility with image files with backing file strings encoded in the old format. This leaves a bit of an ugly conundrum, and a hacky solution. +/* Take care whenever deciding to actually deprecate; once this ability + * is removed, we will not be able to open any images with legacy-styled + * backing image strings. */ +error_report("RBD options encoded in the filename as keyvalue pairs " + "is deprecated. Future versions may cease to parse " + "these options in the future."); Fam spotted offline that " in the future" is redundant with "Future versions", if you want to respin the pull request. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [Qemu-block] [PATCH] nbd/server: fix bitmap export
14.09.2018 20:35, Eric Blake wrote: On 9/14/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote: while (begin < overall_end && i < nb_extents) { + bool next_dirty = !dirty; + if (dirty) { end = bdrv_dirty_bitmap_next_zero(bitmap, begin); } else { @@ -1962,6 +1964,7 @@ static unsigned int bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset, end = MIN(bdrv_dirty_bitmap_size(bitmap), begin + UINT32_MAX + 1 - bdrv_dirty_bitmap_granularity(bitmap)); + next_dirty = dirty; } Rather than introducing next_dirty, couldn't you just make this: if (end == -1 || end - begin > UINT32_MAX) { /* Cap ... */ end = MIN(...); } else { dirty = !dirty; } no, dirty variable is used after it, we can't change it here. Ah, right. But we could also hoist the extents[i].flags = cpu_to_be32(dirty ? NBD_STATE_DIRTY : 0) line to occur prior to the 'if' doing the end capping calculation. However, splitting the two assignments into extents[i].* to no longer be consecutive statements, just to avoid the use of a temporary variable, starts to get less aesthetically pleasing. Thus, I'm fine with your version (with commit message improved), unless you want to send a v2. Ok, I don't want:) Be free to update commit message and pull it. Reviewed-by: Eric Blake -- Best regards, Vladimir
Re: [Qemu-block] [PATCH v2 11/17] block-backend: Decrease in_flight only after callback
On 14/09/2018 19:14, Kevin Wolf wrote: >> As you mention, you could have a nested aio_poll() in the main thread, >> for example invoked from a bottom half, but in that case I'd rather >> track the caller that is creating the bottom half and see if it lacks a >> bdrv_ref/bdrv_unref (or perhaps it's even higher in the tree that is >> missing). > I went back to the commit where I first added the patch (it already > contained the ref/unref pair) and tried if I could reproduce a bug with > the pair removed. I couldn't. > > I'm starting to think that maybe I was just overly cautious with the > ref/unref. I may have confused the nested aio_poll() crash with a > different situation. I've dealt with so many crashes and hangs while > working on this series that it's quite possible. Are you going to drop the patch then? Thanks, Paolo
Re: [Qemu-block] [Qemu-devel] [PATCH v3 1/8] dirty-bitmap: improve bdrv_dirty_bitmap_next_zero
On 09/10/2018 01:00 PM, Vladimir Sementsov-Ogievskiy wrote: > 10.09.2018 19:55, Eric Blake wrote: >> On 9/10/18 11:49 AM, Vladimir Sementsov-Ogievskiy wrote: >> > -int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t start); > +int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t start, > int64_t end); The interface looks weird because we can define a 'start' that's beyond the 'end'. I realize that you need a signed integer for 'end' to signify EOF... should we do a 'bytes' parameter instead? (Did you already do that in an earlier version and we changed it?) Well, it's not a big deal to me personally. >>> >>> interface with constant end parameter is more comfortable for loop: >>> we don't need to update 'bytes' parameter on each iteration >> >> But there's still the question of WHO should be calculating end. Your >> interface argues for the caller: >> >> hbitmap_next_zero(start, start + bytes) >> >> int64_t hbitmap_next_zero(...) >> { >> while (offset != end) ... >> } >> >> while we're asking about a consistent interface for the caller (if >> most callers already have a 'bytes' rather than an 'end' computed): >> >> hbitmap_next_zero(start, bytes) >> >> int64_t hbitmap_next_zero(...) >> { >> int64_t end = start + bytes; >> while (offset != end) ... >> } >> > > Yes, that's an issue. Ok, if you are not comfortable with start,end, I > can switch to start,bytes. > The series looks pretty close, I can merge the next version if you think it's worth changing the interface. --js
Re: [Qemu-block] [PATCH] nbd/server: fix bitmap export
14.09.2018 20:30, Vladimir Sementsov-Ogievskiy wrote: 14.09.2018 20:24, Eric Blake wrote: On 9/14/18 11:51 AM, Vladimir Sementsov-Ogievskiy wrote: bitmap_to_extents function is broken: it switches dirty variable after every iteration, however it can process only part of dirty (or zero) area during one iteration in case when this area is too large for one extent. Fortunately, the bug don't produce wrong extents: it just inserts zero-length extents between sequential extents representing large dirty (or zero) area. However, zero-length extents are abandoned by NBD s/abandoned by/forbidden by the/ protocol. So, careful client should consider such replay as server s/replay/reply/ fault and not-careful will likely ignore zero-length extents. Which camp is qemu 3.0 as client in? Does it tolerate the zero-length extent, and still manage to see correct information overall, or does it crash? Hmm - I think we're "safe" with qemu as client - right now, the only way qemu 3.0 accesses the qemu dirty bitmap over NBD is with my x-dirty-bitmap hack (commit 216ee3657), which uses block/nbd-client.c:nbd_client_co_block_status() to read the bitmap, and that always passes NBD_CMD_FLAG_REQ_ONE. qemu will assert() if nbd_client_co_block_status() doesn't make any progress, but from what I'm reading of your bug report, qemu as client never permits the server to answer with more than one extent, and the bug of a zero-length extent is triggered only after the first extent has been sent. Thus, the primary reason to accept this patch is not because of qemu 3.0 as client, but for interoperability with other clients. I'm planning on updating the commit message to add these additional details. Fix this by more careful handling of dirty variable. Bug was introduced in 3d068aff16 "nbd/server: implement dirty bitmap export", with the whole function. and presents in v3.0.0 release. 
s/presents/present/ Signed-off-by: Vladimir Sementsov-Ogievskiy CC: qemu-sta...@nongnu.org --- nbd/server.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/nbd/server.c b/nbd/server.c index ea5fe0eb33..12f721482d 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -1951,6 +1951,8 @@ static unsigned int bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset, assert(begin < overall_end && nb_extents); while (begin < overall_end && i < nb_extents) { + bool next_dirty = !dirty; + if (dirty) { end = bdrv_dirty_bitmap_next_zero(bitmap, begin); } else { @@ -1962,6 +1964,7 @@ static unsigned int bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset, end = MIN(bdrv_dirty_bitmap_size(bitmap), begin + UINT32_MAX + 1 - bdrv_dirty_bitmap_granularity(bitmap)); + next_dirty = dirty; } Rather than introducing next_dirty, couldn't you just make this: if (end == -1 || end - begin > UINT32_MAX) { /* Cap ... */ end = MIN(...); } else { dirty = !dirty; } no, dirty variable is used after it, we can't change it here. however, it can be done if we move extents[i].flags = cpu_to_be32(dirty ? NBD_STATE_DIRTY : 0); to be the first line of the loop body, so, with this movement, I'm ok with your changes and turning it into a pull. if (dont_fragment && end > overall_end) { end = overall_end; @@ -1971,7 +1974,7 @@ static unsigned int bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset, extents[i].flags = cpu_to_be32(dirty ? NBD_STATE_DIRTY : 0); i++; begin = end; - dirty = !dirty; + dirty = next_dirty; } and then you merely delete the assignment to 'dirty' here. bdrv_dirty_iter_free(it); At any rate, the fix makes sense to me. Should I go ahead and incorporate the changes I've suggested and turn it into a pull request through my NBD tree, or would you like to send a v2? -- Best regards, Vladimir
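[Editor's note: the loop being fixed above can be modeled in a few lines. When a run of dirty (or zero) bits is longer than one extent can describe, the run is capped — but the original code unconditionally flipped 'dirty' afterwards, so the next iteration emitted a zero-length extent. MAX_EXTENT stands in for the UINT32_MAX byte cap; this is a toy model of the control flow, not the real server code.]

```python
MAX_EXTENT = 4  # stand-in for the ~UINT32_MAX per-extent length cap

def to_extents(bits):
    extents = []
    begin, dirty = 0, bits[0]
    while begin < len(bits):
        # find the end of the current dirty/zero run
        end = begin
        while end < len(bits) and bits[end] == dirty:
            end += 1
        next_dirty = not dirty        # the fix: remember whether we flip...
        if end - begin > MAX_EXTENT:  # run too long for one extent: cap it
            end = begin + MAX_EXTENT
            next_dirty = dirty        # ...and stay in the same run next time
        extents.append((end - begin, dirty))
        begin = end
        dirty = next_dirty
    return extents

# A 6-bit dirty run is split 4+2, with no zero-length extent in between.
assert to_extents([True] * 6 + [False] * 2) == [(4, True), (2, True), (2, False)]
```

Dropping the two next_dirty lines reproduces the bug: after the capped (4, True) extent the flipped 'dirty' makes the scan start at a dirty bit while looking for a dirty-to-zero boundary, yielding a (0, False) extent.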
Re: [Qemu-block] [Qemu-devel] [PATCH v3 0/4] transaction support for bitmap merge
On 09/14/2018 11:15 AM, Vladimir Sementsov-Ogievskiy wrote: > This is a last brick, necessary to play with nbd bitmap export in > conjunction with image fleecing. > > v3: > 01: fix typo in commit message, add John's r-b > 02: split refactoring > 03: improve commit message, split some refactoring to 02 > 04: add John's r-b > 05: drop extra state variable, make it local instead. John's r-b. > > > v2: don't compare with v1, it is changed a lot, to do the whole thing > in .prepare instead of .commit. It is needed to be compatible with > backup block job transaction actions [John] > > Vladimir Sementsov-Ogievskiy (5): > dirty-bitmap: switch assert-fails to errors in bdrv_merge_dirty_bitmap > dirty-bitmap: rename bdrv_undo_clear_dirty_bitmap > dirty-bitmap: make it possible to restore bitmap after merge > blockdev: rename block-dirty-bitmap-clear transaction handlers > qapi: add transaction support for x-block-dirty-bitmap-merge > > qapi/transaction.json| 2 ++ > include/block/block_int.h| 2 +- > include/block/dirty-bitmap.h | 2 +- > include/qemu/hbitmap.h | 25 +-- > block/dirty-bitmap.c | 36 +- > blockdev.c | 59 +--- > util/hbitmap.c | 11 +-- > 7 files changed, 99 insertions(+), 38 deletions(-) > Thanks, applied to my bitmaps tree: https://github.com/jnsnow/qemu/commits/bitmaps https://github.com/jnsnow/qemu.git --js
Re: [Qemu-block] [PATCH] nbd/server: fix bitmap export
On 9/14/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote: while (begin < overall_end && i < nb_extents) { + bool next_dirty = !dirty; + if (dirty) { end = bdrv_dirty_bitmap_next_zero(bitmap, begin); } else { @@ -1962,6 +1964,7 @@ static unsigned int bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset, end = MIN(bdrv_dirty_bitmap_size(bitmap), begin + UINT32_MAX + 1 - bdrv_dirty_bitmap_granularity(bitmap)); + next_dirty = dirty; } Rather than introducing next_dirty, couldn't you just make this: if (end == -1 || end - begin > UINT32_MAX) { /* Cap ... */ end = MIN(...); } else { dirty = !dirty; } no, dirty variable is used after it, we can't change it here. Ah, right. But we could also hoist the extents[i].flags = cpu_to_be32(dirty ? NBD_STATE_DIRTY : 0) line to occur prior to the 'if' doing the end capping calculation. However, splitting the two assignments into extents[i].* to no longer be consecutive statements, just to avoid the use of a temporary variable, starts to get less aesthetically pleasing. Thus, I'm fine with your version (with commit message improved), unless you want to send a v2. Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
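[Editor's note: the "hoist the flags assignment" alternative Eric describes above can be modeled the same way — record the run's state before the capping 'if', after which the else branch may flip 'dirty' in place with no temporary. A toy model only, with a small cap standing in for the real UINT32_MAX one.]

```python
def to_extents_hoisted(bits, max_extent=4):
    extents = []
    begin, dirty = 0, bits[0]
    while begin < len(bits):
        # find the end of the current dirty/zero run
        end = begin
        while end < len(bits) and bits[end] == dirty:
            end += 1
        flags = dirty                    # "hoisted": capture the run's state first...
        if end - begin > max_extent:
            end = begin + max_extent     # capped: next iteration stays in this run
        else:
            dirty = not dirty            # ...so 'dirty' can be flipped in place
        extents.append((end - begin, flags))
        begin = end
    return extents

# Same result as the next_dirty version: the 6-bit run splits 4+2 cleanly.
assert to_extents_hoisted([True] * 6 + [False] * 2) == [(4, True), (2, True), (2, False)]
```

It is behaviorally equivalent to the merged fix; the thread settles on the temporary-variable form only because it keeps the two extents[i].* assignments adjacent.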
Re: [Qemu-block] [PATCH] nbd/server: fix bitmap export
14.09.2018 20:24, Eric Blake wrote: On 9/14/18 11:51 AM, Vladimir Sementsov-Ogievskiy wrote: bitmap_to_extents function is broken: it switches dirty variable after every iteration, however it can process only part of dirty (or zero) area during one iteration in case when this area is too large for one extent. Fortunately, the bug don't produce wrong extents: it just inserts zero-length extents between sequential extents representing large dirty (or zero) area. However, zero-length extents are abandoned by NBD s/abandoned by/forbidden by the/ protocol. So, careful client should consider such replay as server s/replay/reply/ fault and not-careful will likely ignore zero-length extents. Which camp is qemu 3.0 as client in? Does it tolerate the zero-length extent, and still manage to see correct information overall, or does it crash? Hmm - I think we're "safe" with qemu as client - right now, the only way qemu 3.0 accesses the qemu dirty bitmap over NBD is with my x-dirty-bitmap hack (commit 216ee3657), which uses block/nbd-client.c:nbd_client_co_block_status() to read the bitmap, and that always passes NBD_CMD_FLAG_REQ_ONE. qemu will assert() if nbd_client_co_block_status() doesn't make any progress, but from what I'm reading of your bug report, qemu as client never permits the server to answer with more than one extent, and the bug of a zero-length extent is triggered only after the first extent has been sent. Thus, the primary reason to accept this patch is not because of qemu 3.0 as client, but for interoperability with other clients. I'm planning on updating the commit message to add these additional details. Fix this by more careful handling of dirty variable. Bug was introduced in 3d068aff16 "nbd/server: implement dirty bitmap export", with the whole function. and presents in v3.0.0 release. 
s/presents/present/ Signed-off-by: Vladimir Sementsov-Ogievskiy CC: qemu-sta...@nongnu.org --- nbd/server.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/nbd/server.c b/nbd/server.c index ea5fe0eb33..12f721482d 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -1951,6 +1951,8 @@ static unsigned int bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset, assert(begin < overall_end && nb_extents); while (begin < overall_end && i < nb_extents) { + bool next_dirty = !dirty; + if (dirty) { end = bdrv_dirty_bitmap_next_zero(bitmap, begin); } else { @@ -1962,6 +1964,7 @@ static unsigned int bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset, end = MIN(bdrv_dirty_bitmap_size(bitmap), begin + UINT32_MAX + 1 - bdrv_dirty_bitmap_granularity(bitmap)); + next_dirty = dirty; } Rather than introducing next_dirty, couldn't you just make this: if (end == -1 || end - begin > UINT32_MAX) { /* Cap ... */ end = MIN(...); } else { dirty = !dirty; } no, dirty variable is used after it, we can't change it here. if (dont_fragment && end > overall_end) { end = overall_end; @@ -1971,7 +1974,7 @@ static unsigned int bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset, extents[i].flags = cpu_to_be32(dirty ? NBD_STATE_DIRTY : 0); i++; begin = end; - dirty = !dirty; + dirty = next_dirty; } and then you merely delete the assignment to 'dirty' here. bdrv_dirty_iter_free(it); At any rate, the fix makes sense to me. Should I go ahead and incorporate the changes I've suggested and turn it into a pull request through my NBD tree, or would you like to send a v2? -- Best regards, Vladimir
Re: [Qemu-block] [PATCH] nbd/server: fix bitmap export
On 9/14/18 11:51 AM, Vladimir Sementsov-Ogievskiy wrote: bitmap_to_extents function is broken: it switches dirty variable after every iteration, however it can process only part of dirty (or zero) area during one iteration in case when this area is too large for one extent. Fortunately, the bug don't produce wrong extents: it just inserts zero-length extents between sequential extents representing large dirty (or zero) area. However, zero-length extents are abandoned by NBD s/abandoned by/forbidden by the/ protocol. So, careful client should consider such replay as server s/replay/reply/ fault and not-careful will likely ignore zero-length extents. Which camp is qemu 3.0 as client in? Does it tolerate the zero-length extent, and still manage to see correct information overall, or does it crash? Hmm - I think we're "safe" with qemu as client - right now, the only way qemu 3.0 accesses the qemu dirty bitmap over NBD is with my x-dirty-bitmap hack (commit 216ee3657), which uses block/nbd-client.c:nbd_client_co_block_status() to read the bitmap, and that always passes NBD_CMD_FLAG_REQ_ONE. qemu will assert() if nbd_client_co_block_status() doesn't make any progress, but from what I'm reading of your bug report, qemu as client never permits the server to answer with more than one extent, and the bug of a zero-length extent is triggered only after the first extent has been sent. Thus, the primary reason to accept this patch is not because of qemu 3.0 as client, but for interoperability with other clients. I'm planning on updating the commit message to add these additional details. Fix this by more careful handling of dirty variable. Bug was introduced in 3d068aff16 "nbd/server: implement dirty bitmap export", with the whole function. and presents in v3.0.0 release. 
s/presents/present/ Signed-off-by: Vladimir Sementsov-Ogievskiy CC: qemu-sta...@nongnu.org --- nbd/server.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/nbd/server.c b/nbd/server.c index ea5fe0eb33..12f721482d 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -1951,6 +1951,8 @@ static unsigned int bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset, assert(begin < overall_end && nb_extents); while (begin < overall_end && i < nb_extents) { +bool next_dirty = !dirty; + if (dirty) { end = bdrv_dirty_bitmap_next_zero(bitmap, begin); } else { @@ -1962,6 +1964,7 @@ static unsigned int bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset, end = MIN(bdrv_dirty_bitmap_size(bitmap), begin + UINT32_MAX + 1 - bdrv_dirty_bitmap_granularity(bitmap)); +next_dirty = dirty; } Rather than introducing next_dirty, couldn't you just make this: if (end == -1 || end - begin > UINT32_MAX) { /* Cap ... */ end = MIN(...); } else { dirty = !dirty; } if (dont_fragment && end > overall_end) { end = overall_end; @@ -1971,7 +1974,7 @@ static unsigned int bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset, extents[i].flags = cpu_to_be32(dirty ? NBD_STATE_DIRTY : 0); i++; begin = end; -dirty = !dirty; +dirty = next_dirty; } and then you merely delete the assignment to 'dirty' here. bdrv_dirty_iter_free(it); At any rate, the fix makes sense to me. Should I go ahead and incorporate the changes I've suggested and turn it into a pull request through my NBD tree, or would you like to send a v2? -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [Qemu-block] [Qemu-devel] [PATCH v3 0/4] transaction support for bitmap merge
On 09/14/2018 11:15 AM, Vladimir Sementsov-Ogievskiy wrote: > This is a last brick, necessary to play with nbd bitmap export in > conjunction with image fleecing. > > v3: > 01: fix typo in commit message, add John's r-b > 02: split refactoring > 03: improve commit message, split some refactoring to 02 > 04: add John's r-b > 05: drop extra state variable, make it local instead. John's r-b. > > > v2: don't compare with v1, it is changed a lot, to do the whole thing > in .prepare instead of .commit. It is needed to be compatible with > backup block job transaction actions [John] > > Vladimir Sementsov-Ogievskiy (5): > dirty-bitmap: switch assert-fails to errors in bdrv_merge_dirty_bitmap > dirty-bitmap: rename bdrv_undo_clear_dirty_bitmap > dirty-bitmap: make it possible to restore bitmap after merge > blockdev: rename block-dirty-bitmap-clear transaction handlers > qapi: add transaction support for x-block-dirty-bitmap-merge > > qapi/transaction.json| 2 ++ > include/block/block_int.h| 2 +- > include/block/dirty-bitmap.h | 2 +- > include/qemu/hbitmap.h | 25 +-- > block/dirty-bitmap.c | 36 +- > blockdev.c | 59 +--- > util/hbitmap.c | 11 +-- > 7 files changed, 99 insertions(+), 38 deletions(-) > Reviewed-by: John Snow
Re: [Qemu-block] [PATCH v2 11/17] block-backend: Decrease in_flight only after callback
Am 14.09.2018 um 17:12 hat Paolo Bonzini geschrieben: > On 13/09/2018 18:59, Kevin Wolf wrote: > > Am 13.09.2018 um 17:10 hat Paolo Bonzini geschrieben: > >> On 13/09/2018 14:52, Kevin Wolf wrote: > >>> + if (qemu_get_current_aio_context() == qemu_get_aio_context()) { > >>> + /* If we are in the main thread, the callback is allowed to unref > >>> + * the BlockBackend, so we have to hold an additional reference */ > >>> + blk_ref(acb->rwco.blk); > >>> + } > >>> acb->common.cb(acb->common.opaque, acb->rwco.ret); > >>> + blk_dec_in_flight(acb->rwco.blk); > >>> + if (qemu_get_current_aio_context() == qemu_get_aio_context()) { > >>> + blk_unref(acb->rwco.blk); > >>> + } > >> > >> Is this something that happens only for some specific callers? That is, > >> which callers are sure that the callback is invoked from the main thread? > > > > I can't seem to reproduce the problem I saw any more even when reverting > > the bdrv_ref/unref pair. If I remember correctly it was actually a > > nested aio_poll() that was running a block job completion or something > > like that - which would obviously only happen on the main thread because > > the job intentionally defers to the main thread. > > > > The only reason I made this conditional is that I think bdrv_unref() > > still isn't safe outside the main thread, is it? > > Yes, making it conditional is correct, but it is quite fishy even with > the conditional. > > As you mention, you could have a nested aio_poll() in the main thread, > for example invoked from a bottom half, but in that case I'd rather > track the caller that is creating the bottom half and see if it lacks a > bdrv_ref/bdrv_unref (or perhaps it's even higher in the tree that is > missing). I went back to the commit where I first added the patch (it already contained the ref/unref pair) and tried if I could reproduce a bug with the pair removed. I couldn't. I'm starting to think that maybe I was just overly cautious with the ref/unref. 
I may have confused the nested aio_poll() crash with a different situation. I've dealt with so many crashes and hangs while working on this series that it's quite possible. Kevin
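The hazard Kevin's conditional blk_ref/unref guards against — a completion callback dropping the last reference to an object whose fields are still touched after the call — can be sketched abstractly. This is a minimal Python model for illustration only; the class and method names are mine, not QEMU's API:

```python
class Blk:
    """Toy reference-counted object standing in for a BlockBackend."""
    def __init__(self):
        self.refcnt = 1
        self.in_flight = 0
        self.freed = False
    def ref(self):
        assert not self.freed
        self.refcnt += 1
    def unref(self):
        self.refcnt -= 1
        if self.refcnt == 0:
            self.freed = True   # stands in for the actual g_free()

def complete(blk, cb):
    # Hold an extra reference: the callback may unref the Blk, and we
    # still touch it afterwards to decrement the in-flight counter.
    blk.ref()
    cb(blk)
    assert not blk.freed        # safe: our reference keeps it alive
    blk.in_flight -= 1
    blk.unref()

blk = Blk()
blk.in_flight = 1
complete(blk, lambda b: b.unref())   # callback drops the caller's reference
assert blk.freed and blk.in_flight == 0
```

Without the extra ref() around the callback, the `blk.in_flight -= 1` line would touch freed memory whenever the callback happened to drop the last reference — which, as discussed above, only main-thread callers are currently allowed to do.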
[Qemu-block] [PATCH] nbd/server: fix bitmap export
bitmap_to_extents function is broken: it switches dirty variable after every iteration, however it can process only part of dirty (or zero) area during one iteration in case when this area is too large for one extent. Fortunately, the bug don't produce wrong extents: it just inserts zero-length extents between sequential extents representing large dirty (or zero) area. However, zero-length extents are abandoned by NBD protocol. So, careful client should consider such replay as server fault and not-careful will likely ignore zero-length extents. Fix this by more careful handling of dirty variable. Bug was introduced in 3d068aff16 "nbd/server: implement dirty bitmap export", with the whole function. and presents in v3.0.0 release. Signed-off-by: Vladimir Sementsov-Ogievskiy --- nbd/server.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/nbd/server.c b/nbd/server.c index ea5fe0eb33..12f721482d 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -1951,6 +1951,8 @@ static unsigned int bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset, assert(begin < overall_end && nb_extents); while (begin < overall_end && i < nb_extents) { +bool next_dirty = !dirty; + if (dirty) { end = bdrv_dirty_bitmap_next_zero(bitmap, begin); } else { @@ -1962,6 +1964,7 @@ static unsigned int bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset, end = MIN(bdrv_dirty_bitmap_size(bitmap), begin + UINT32_MAX + 1 - bdrv_dirty_bitmap_granularity(bitmap)); +next_dirty = dirty; } if (dont_fragment && end > overall_end) { end = overall_end; @@ -1971,7 +1974,7 @@ static unsigned int bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset, extents[i].flags = cpu_to_be32(dirty ? NBD_STATE_DIRTY : 0); i++; begin = end; -dirty = !dirty; +dirty = next_dirty; } bdrv_dirty_iter_free(it); -- 2.18.0
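The effect of the toggle bug and of the next_dirty fix can be reproduced with a small model of the loop. This is a Python sketch for illustration; a tiny CAP constant stands in for the UINT32_MAX-per-extent limit that forces a long run to be split:

```python
CAP = 4  # stands in for the UINT32_MAX cap on one extent's length

def to_extents(bitmap, fixed):
    """Walk alternating dirty/clean runs, capping each extent at CAP."""
    extents = []
    begin, dirty = 0, bitmap[0]
    while begin < len(bitmap):
        next_dirty = not dirty
        end = begin
        while end < len(bitmap) and bitmap[end] == dirty:
            end += 1
        if end - begin > CAP:
            end = begin + CAP          # split the run: same state continues
            next_dirty = dirty
        extents.append((end - begin, dirty))
        begin = end
        dirty = next_dirty if fixed else not dirty  # bug: unconditional toggle
    return extents

bitmap = [True] * 10 + [False] * 6     # one long dirty run, one clean run

buggy = to_extents(bitmap, fixed=False)
assert any(length == 0 for length, _ in buggy)   # zero-length extents emitted

good = to_extents(bitmap, fixed=True)
assert all(length > 0 for length, _ in good)
assert good == [(4, True), (4, True), (2, True), (4, False), (2, False)]
```

With the toggle bug, every split of a long run is followed by a spurious zero-length extent of the opposite state; the fix keeps `dirty` unchanged across a split, so only non-empty extents reach the wire.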
Re: [Qemu-block] [PATCH v2 12/17] mirror: Fix potential use-after-free in active commit
Am 13.09.2018 um 22:55 hat Max Reitz geschrieben: > On 13.09.18 14:52, Kevin Wolf wrote: > > When starting an active commit job, other callbacks can run before > > mirror_start_job() calls bdrv_ref() where needed and cause the nodes to > > go away. Add another pair of bdrv_ref/unref() around it to protect > > against this case. > > > > Signed-off-by: Kevin Wolf > > --- > > block/mirror.c | 11 +++ > > 1 file changed, 11 insertions(+) > > Reviewed-by: Max Reitz > > But... How? Tried to reproduce the other bug Paolo was concerned about (good there is something like 'git reflog'!) and dug up this one instead. So the scenario seems to be test_stream_commit_1 in qemu-iotests 030. The backtrace when the BDS is deleted is the following: (rr) bt #0 0x7faeaf6145ec in __memset_sse2_unaligned_erms () at /lib64/libc.so.6 #1 0x7faeaf60414e in _int_free () at /lib64/libc.so.6 #2 0x7faeaf6093be in free () at /lib64/libc.so.6 #3 0x7faecaea6b4e in g_free () at /lib64/libglib-2.0.so.0 #4 0x55c0c94f9d6c in bdrv_delete (bs=0x55c0cbe69240) at block.c:3590 #5 0x55c0c94fbe82 in bdrv_unref (bs=0x55c0cbe69240) at block.c:4638 #6 0x55c0c94f6761 in bdrv_root_unref_child (child=0x55c0cbe57af0) at block.c:2188 #7 0x55c0c94fe239 in block_job_remove_all_bdrv (job=0x55c0ccf9e220) at blockjob.c:200 #8 0x55c0c94fdf1f in block_job_free (job=0x55c0ccf9e220) at blockjob.c:94 #9 0x55c0c94ffe0d in job_unref (job=0x55c0ccf9e220) at job.c:368 #10 0x55c0c95007cd in job_do_dismiss (job=0x55c0ccf9e220) at job.c:641 #11 0x55c0c95008d9 in job_conclude (job=0x55c0ccf9e220) at job.c:667 #12 0x55c0c9500b66 in job_finalize_single (job=0x55c0ccf9e220) at job.c:735 #13 0x55c0c94ff4fa in job_txn_apply (txn=0x55c0cbe19eb0, fn=0x55c0c9500a62 , lock=true) at job.c:151 #14 0x55c0c9500e92 in job_do_finalize (job=0x55c0ccf9e220) at job.c:822 #15 0x55c0c9501040 in job_completed_txn_success (job=0x55c0ccf9e220) at job.c:872 #16 0x55c0c950115c in job_completed (job=0x55c0ccf9e220, ret=0, error=0x0) at job.c:892 #17 
0x55c0c92b572c in stream_complete (job=0x55c0ccf9e220, opaque=0x55c0cc050bc0) at block/stream.c:96 #18 0x55c0c950142b in job_defer_to_main_loop_bh (opaque=0x55c0cbe38a50) at job.c:981 #19 0x55c0c96171fc in aio_bh_call (bh=0x55c0cbe19f20) at util/async.c:90 #20 0x55c0c9617294 in aio_bh_poll (ctx=0x55c0cbd7b8d0) at util/async.c:118 #21 0x55c0c961c150 in aio_poll (ctx=0x55c0cbd7b8d0, blocking=true) at util/aio-posix.c:690 #22 0x55c0c957246c in bdrv_flush (bs=0x55c0cbe4b250) at block/io.c:2693 #23 0x55c0c94f8e46 in bdrv_reopen_prepare (reopen_state=0x55c0cbef4d68, queue=0x55c0cbeab9f0, errp=0x7ffd65617050) at block.c:3206 #24 0x55c0c94f883a in bdrv_reopen_multiple (ctx=0x55c0cbd7b8d0, bs_queue=0x55c0cbeab9f0, errp=0x7ffd656170c8) at block.c:3032 #25 0x55c0c94f8a00 in bdrv_reopen (bs=0x55c0cbe2c250, bdrv_flags=8194, errp=0x7ffd65617220) at block.c:3075 #26 0x55c0c956a23f in commit_active_start (job_id=0x0, bs=0x55c0cbd905b0, base=0x55c0cbe2c250, creation_flags=0, speed=0, on_error=BLOCKDEV_ON_ERROR_REPORT, filter_node_name=0x0, cb=0x0, opaque=0x0, auto_complete=false, errp=0x7ffd65617220) at block/mirror.c:1687 #27 0x55c0c927437c in qmp_block_commit (has_job_id=false, job_id=0x0, device=0x55c0cbef4d20 "drive0", has_base_node=false, base_node=0x0, has_base=true, base=0x55c0cbe38a00 "/home/kwolf/source/qemu/tests/qemu-iotests/scratch/img-3.img", has_top_node=false, top_node=0x0, has_top=false, top=0x0, has_backing_file=false, backing_file=0x0, has_speed=false, speed=0, has_filter_node_name=false, filter_node_name=0x0, errp=0x7ffd656172d8) at blockdev.c:3325 #28 0x55c0c928bb08 in qmp_marshal_block_commit (args=, ret=, errp=0x7ffd656173b8) at qapi/qapi-commands-block-core.c:409 #29 0x55c0c960beef in do_qmp_dispatch (errp=0x7ffd656173b0, allow_oob=false, request=, cmds=0x55c0c9f56f80 ) at qapi/qmp-dispatch.c:129 #30 0x55c0c960beef in qmp_dispatch (cmds=0x55c0c9f56f80 , request=, allow_oob=) at qapi/qmp-dispatch.c:171 #31 0x55c0c9111623 in monitor_qmp_dispatch 
(mon=0x55c0cbdbb930, req=0x55c0ccf90e00, id=0x0) at /home/kwolf/source/qemu/monitor.c:4168 #32 0x55c0c911191f in monitor_qmp_bh_dispatcher (data=0x0) at /home/kwolf/source/qemu/monitor.c:4237 #33 0x55c0c96171fc in aio_bh_call (bh=0x55c0cbccc840) at util/async.c:90 #34 0x55c0c9617294 in aio_bh_poll (ctx=0x55c0cbccc550) at util/async.c:118 #35 0x55c0c961b723 in aio_dispatch (ctx=0x55c0cbccc550) at util/aio-posix.c:436 #36 0x55c0c961762f in aio_ctx_dispatch (source=0x55c0cbccc550, callback=0x0, user_data=0x0) at util/async.c:261 #37 0x7faecaea1257 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0 #38 0x55c0c961a2aa in glib_pollfds_poll () at util/main-loop.c:215 #39 0x55c0c961a2aa in os_host_main_loop_wait (timeout=0) at util/main-loop.c:238 #40 0x55c0c961a2aa in main_loop_wait
Re: [Qemu-block] [PATCH v5 4/6] nbd/server: implement dirty bitmap export
20.06.2018 21:09, Eric Blake wrote: On 06/20/2018 12:04 PM, Vladimir Sementsov-Ogievskiy wrote: 20.06.2018 19:27, Eric Blake wrote: On 06/09/2018 10:17 AM, Vladimir Sementsov-Ogievskiy wrote: Handle new NBD meta namespace: "qemu", and corresponding queries: "qemu:dirty-bitmap:". With new metadata context negotiated, BLOCK_STATUS query will reply with dirty-bitmap data, converted to extents. New public function nbd_export_bitmap selects bitmap to export. For now, only one bitmap may be exported. static int nbd_co_send_extents(NBDClient *client, uint64_t handle, - NBDExtent *extents, unsigned nb_extents, + NBDExtent *extents, unsigned int nb_extents, + uint64_t length, bool last, uint32_t context_id, Error **errp) { NBDStructuredMeta chunk; @@ -1890,7 +1897,9 @@ static int nbd_co_send_extents(NBDClient *client, uint64_t handle, {.iov_base = extents, .iov_len = nb_extents * sizeof(extents[0])} }; - set_be_chunk(&chunk, 0, NBD_REPLY_TYPE_BLOCK_STATUS, + trace_nbd_co_send_extents(handle, nb_extents, context_id, length, last); hm, length variable added only to be traced.. Ok, but a bit strange. Yes. It doesn't affect what goes over the wire, but when it comes to tracing, knowing the sum of the extents can be quite helpful (especially knowing if the server's reply is shorter or longer than the client's request, and the fact that when two or more contexts are negotiated by the client, the different contexts can have different length replies) +static unsigned int bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset, + uint64_t *length, NBDExtent *extents, length - in-out? worth comment? Sure.
+ unsigned int nb_extents, + bool dont_fragment) { uint64_t begin = offset, end; - uint64_t overall_end = offset + length; - unsigned i = 0; + uint64_t overall_end = offset + *length; + unsigned int i = 0; BdrvDirtyBitmapIter *it; bool dirty; @@ -1983,23 +1994,25 @@ static unsigned bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset, bdrv_dirty_bitmap_unlock(bitmap); + *length = end - begin; hm, it will always be zero, as at the end of the previous loop we have "begin = end;" Ah, then it should be end - offset. Thanks for the careful review. In fact, since ONLY the final extent can be larger than 2G (all earlier extents were short of the overall_end, and the incoming length is 32-bit), but the NBD protocol says that at most one extent can go beyond the original request, we do NOT want to split a >2G extent into multiple extents, but rather cap it to just shy of 4G at the granularity offered by the bitmap itself. At which point add_extents() never returns more than 1, and can just be inlined. return i; } static int nbd_co_send_bitmap(NBDClient *client, uint64_t handle, BdrvDirtyBitmap *bitmap, uint64_t offset, - uint64_t length, bool dont_fragment, + uint32_t length, bool dont_fragment, uint32_t context_id, Error **errp) { int ret; - unsigned nb_extents = dont_fragment ? 1 : NBD_MAX_BITMAP_EXTENTS; + unsigned int nb_extents = dont_fragment ? 1 : NBD_MAX_BITMAP_EXTENTS; NBDExtent *extents = g_new(NBDExtent, nb_extents); + uint64_t final_length = length; - nb_extents = bitmap_to_extents(bitmap, offset, length, extents, nb_extents, - dont_fragment); + nb_extents = bitmap_to_extents(bitmap, offset, &final_length, extents, + nb_extents, dont_fragment); - ret = nbd_co_send_extents(client, handle, extents, nb_extents, context_id, - errp); + ret = nbd_co_send_extents(client, handle, extents, nb_extents, + final_length, true, context_id, errp); hmm in-out pointer field only to trace final_length? may be just trace it in bitmap_to_extents?
No, because base:allocation also benefits from tracing final_length. also, it's not trivial, that the function now sends FLAG_DONE. I think, worth add a comment, or a parameter like for nbd_co_send_block_status. It will simplify introducing new contexts in future. Do we anticipate adding any in the near future? But adding a parameter that is always true so that the callsite becomes more obvious on when to pass the done flag may indeed make future additions easier to spot in one place. So here's what else I'll squash in: diff --git i/nbd/server.c w/nbd/server.c index e84aea6cf03..7a96b296839 100644 --- i/nbd/server.c +++ w/nbd/server.c @@ -1881,9 +1881,10 @@ static int blockstatus_to_extent_be(BlockDriverState *bs, uint64_t offset, /* nbd_co_send_extents * - * @last controls
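Eric's capping rule — don't split the one oversized final extent, cap it to the largest granularity-aligned length that fits the 32-bit extent length field — can be checked numerically. A Python sketch for illustration (the helper name is mine, not the final server code):

```python
UINT32_MAX = 2**32 - 1

def cap_extent(begin, run_end, granularity):
    """Largest granularity-aligned end such that end - begin <= UINT32_MAX."""
    if run_end - begin > UINT32_MAX:
        run_end = begin + UINT32_MAX + 1 - granularity
    return run_end

# A 6 GiB dirty run with 64 KiB granularity is capped just shy of 4 GiB...
end = cap_extent(0, 6 << 30, 64 * 1024)
assert end == 2**32 - 64 * 1024
assert end % (64 * 1024) == 0      # ...and stays aligned to the bitmap
assert end - 0 <= UINT32_MAX       # fits the 32-bit length field

# A short run is left alone.
assert cap_extent(0, 1 << 20, 64 * 1024) == 1 << 20
```

This is the same `begin + UINT32_MAX + 1 - bdrv_dirty_bitmap_granularity(bitmap)` arithmetic that appears in the server's MIN() expression in the patches above.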
[Qemu-block] [PATCH v3 5/5] qapi: add transaction support for x-block-dirty-bitmap-merge
New action is like clean action: do the whole thing in .prepare and undo in .abort. This behavior for bitmap-changing actions is needed because backup job actions use bitmap in .prepare. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: John Snow --- qapi/transaction.json | 2 ++ blockdev.c| 37 - 2 files changed, 38 insertions(+), 1 deletion(-) diff --git a/qapi/transaction.json b/qapi/transaction.json index d7e4274550..5875cdb16c 100644 --- a/qapi/transaction.json +++ b/qapi/transaction.json @@ -48,6 +48,7 @@ # - @block-dirty-bitmap-clear: since 2.5 # - @x-block-dirty-bitmap-enable: since 3.0 # - @x-block-dirty-bitmap-disable: since 3.0 +# - @x-block-dirty-bitmap-merge: since 3.1 # - @blockdev-backup: since 2.3 # - @blockdev-snapshot: since 2.5 # - @blockdev-snapshot-internal-sync: since 1.7 @@ -63,6 +64,7 @@ 'block-dirty-bitmap-clear': 'BlockDirtyBitmap', 'x-block-dirty-bitmap-enable': 'BlockDirtyBitmap', 'x-block-dirty-bitmap-disable': 'BlockDirtyBitmap', + 'x-block-dirty-bitmap-merge': 'BlockDirtyBitmapMerge', 'blockdev-backup': 'BlockdevBackup', 'blockdev-snapshot': 'BlockdevSnapshot', 'blockdev-snapshot-internal-sync': 'BlockdevSnapshotInternal', diff --git a/blockdev.c b/blockdev.c index 5348e8ba9b..2917bf6e09 100644 --- a/blockdev.c +++ b/blockdev.c @@ -2112,6 +2112,35 @@ static void block_dirty_bitmap_disable_abort(BlkActionState *common) } } +static void block_dirty_bitmap_merge_prepare(BlkActionState *common, + Error **errp) +{ +BlockDirtyBitmapMerge *action; +BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState, + common, common); +BdrvDirtyBitmap *merge_source; + +if (action_check_completion_mode(common, errp) < 0) { +return; +} + +action = common->action->u.x_block_dirty_bitmap_merge.data; +state->bitmap = block_dirty_bitmap_lookup(action->node, + action->dst_name, + &state->bs, + errp); +if (!state->bitmap) { +return; +} + +merge_source = bdrv_find_dirty_bitmap(state->bs, action->src_name); +if (!merge_source) { +return; +} 
+bdrv_merge_dirty_bitmap(state->bitmap, merge_source, &state->backup, errp); +} + static void abort_prepare(BlkActionState *common, Error **errp) { error_setg(errp, "Transaction aborted using Abort action"); @@ -2182,7 +2211,13 @@ static const BlkActionOps actions[] = { .instance_size = sizeof(BlockDirtyBitmapState), .prepare = block_dirty_bitmap_disable_prepare, .abort = block_dirty_bitmap_disable_abort, - } +}, +[TRANSACTION_ACTION_KIND_X_BLOCK_DIRTY_BITMAP_MERGE] = { +.instance_size = sizeof(BlockDirtyBitmapState), +.prepare = block_dirty_bitmap_merge_prepare, +.commit = block_dirty_bitmap_free_backup, +.abort = block_dirty_bitmap_restore, +} }; /** -- 2.18.0
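The prepare/commit/abort shape of the new action — merge eagerly in .prepare while keeping a backup of the destination, drop the backup on .commit, swap it back on .abort — can be sketched with Python sets standing in for bitmaps (illustrative names, not QEMU's transaction machinery):

```python
class MergeAction:
    """Merge src into dst in prepare; keep dst's old bits as a backup."""
    def __init__(self, dst, src):
        self.dst, self.src = dst, src
        self.backup = None
    def prepare(self):
        self.backup = set(self.dst)    # stands in for the saved HBitmap
        self.dst |= self.src
    def commit(self):
        self.backup = None             # drop the backup; the merge is final
    def abort(self):
        self.dst.clear()
        self.dst |= self.backup        # restore the pre-merge contents

dst, src = {1, 2}, {2, 3}
act = MergeAction(dst, src)
act.prepare()
assert dst == {1, 2, 3}                # visible immediately, as backup
                                       # job actions in .prepare require
act.abort()
assert dst == {1, 2}                   # back to the pre-transaction state
```

Doing the work in .prepare is what lets a backup job action later in the same transaction see the merged bitmap, which is the whole point noted in the commit message.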
[Qemu-block] [PATCH v3 3/5] dirty-bitmap: make it possible to restore bitmap after merge
Add backup parameter to bdrv_merge_dirty_bitmap() to be used then with bdrv_restore_dirty_bitmap() if it needed to restore the bitmap after merge operation. This is needed to implement bitmap merge transaction action in further commit. Signed-off-by: Vladimir Sementsov-Ogievskiy --- include/block/dirty-bitmap.h | 2 +- include/qemu/hbitmap.h | 25 - block/dirty-bitmap.c | 17 ++--- blockdev.c | 2 +- util/hbitmap.c | 11 --- 5 files changed, 40 insertions(+), 17 deletions(-) diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h index 259bd27c40..201ff7f20b 100644 --- a/include/block/dirty-bitmap.h +++ b/include/block/dirty-bitmap.h @@ -71,7 +71,7 @@ void bdrv_dirty_bitmap_set_persistance(BdrvDirtyBitmap *bitmap, bool persistent); void bdrv_dirty_bitmap_set_qmp_locked(BdrvDirtyBitmap *bitmap, bool qmp_locked); void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src, - Error **errp); + HBitmap **backup, Error **errp); /* Functions that require manual locking. */ void bdrv_dirty_bitmap_lock(BdrvDirtyBitmap *bitmap); diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h index ddca52c48e..a7cb780592 100644 --- a/include/qemu/hbitmap.h +++ b/include/qemu/hbitmap.h @@ -73,16 +73,23 @@ void hbitmap_truncate(HBitmap *hb, uint64_t size); /** * hbitmap_merge: - * @a: The bitmap to store the result in. - * @b: The bitmap to merge into @a. - * @return true if the merge was successful, - * false if it was not attempted. - * - * Merge two bitmaps together. - * A := A (BITOR) B. - * B is left unmodified. + * + * Store result of merging @a and @b into @result. + * @result is allowed to be equal to @a or @b. + * + * Return true if the merge was successful, + *false if it was not attempted. + */ +bool hbitmap_merge(const HBitmap *a, const HBitmap *b, HBitmap *result); + +/** + * hbitmap_can_merge: + * + * hbitmap_can_merge(a, b) && hbitmap_can_merge(a, result) is sufficient and + * necessary for hbitmap_merge will not fail. 
+ * */ -bool hbitmap_merge(HBitmap *a, const HBitmap *b); +bool hbitmap_can_merge(const HBitmap *a, const HBitmap *b); /** * hbitmap_empty: diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c index 017ee9db46..8ac933cf1c 100644 --- a/block/dirty-bitmap.c +++ b/block/dirty-bitmap.c @@ -314,7 +314,7 @@ BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap_locked(BlockDriverState *bs, return NULL; } -if (!hbitmap_merge(parent->bitmap, successor->bitmap)) { +if (!hbitmap_merge(parent->bitmap, successor->bitmap, parent->bitmap)) { error_setg(errp, "Merging of parent and successor bitmap failed"); return NULL; } @@ -791,8 +791,10 @@ int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, uint64_t offset) } void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src, - Error **errp) + HBitmap **backup, Error **errp) { +bool ret; + /* only bitmaps from one bds are supported */ assert(dest->mutex == src->mutex); @@ -810,11 +812,20 @@ void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src, goto out; } -if (!hbitmap_merge(dest->bitmap, src->bitmap)) { +if (!hbitmap_can_merge(dest->bitmap, src->bitmap)) { error_setg(errp, "Bitmaps are incompatible and can't be merged"); goto out; } +if (backup) { +*backup = dest->bitmap; +dest->bitmap = hbitmap_alloc(dest->size, hbitmap_granularity(*backup)); +ret = hbitmap_merge(*backup, src->bitmap, dest->bitmap); +} else { +ret = hbitmap_merge(dest->bitmap, src->bitmap, dest->bitmap); +} +assert(ret); + out: qemu_mutex_unlock(dest->mutex); } diff --git a/blockdev.c b/blockdev.c index f133e87414..9cb29ca63e 100644 --- a/blockdev.c +++ b/blockdev.c @@ -2961,7 +2961,7 @@ void qmp_x_block_dirty_bitmap_merge(const char *node, const char *dst_name, return; } -bdrv_merge_dirty_bitmap(dst, src, errp); +bdrv_merge_dirty_bitmap(dst, src, NULL, errp); } BlockDirtyBitmapSha256 *qmp_x_debug_block_dirty_bitmap_sha256(const char *node, diff --git a/util/hbitmap.c b/util/hbitmap.c index bcd304041a..d5aca5159f 
100644 --- a/util/hbitmap.c +++ b/util/hbitmap.c @@ -723,6 +723,10 @@ void hbitmap_truncate(HBitmap *hb, uint64_t size) } } +bool hbitmap_can_merge(const HBitmap *a, const HBitmap *b) +{ +return (a->size == b->size) && (a->granularity == b->granularity); +} /** * Given HBitmaps A and B, let A := A (BITOR) B. @@ -731,14 +735,15 @@ void hbitmap_truncate(HBitmap *hb, uint64_t size) * @return true if the merge was successful, * false if it was not attempted. */ -bool hbitmap_merge(HBitmap *a,
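The new three-argument hbitmap_merge() contract — store `a | b` into `result`, where `result` may alias `a` (or `b`), guarded by a separate can_merge() compatibility check — reduces to a few lines when sketched over plain bit lists (Python for illustration; can_merge here checks only size, standing in for the size-plus-granularity check):

```python
def can_merge(a, b):
    return len(a) == len(b)            # stands in for size + granularity

def merge(a, b, result):
    """result := a | b, bit by bit; result may alias a or b."""
    if not (can_merge(a, b) and can_merge(a, result)):
        return False
    for i in range(len(result)):
        result[i] = a[i] or b[i]
    return True

a, b = [True, False, False], [False, False, True]
fresh = [False] * 3
assert merge(a, b, fresh) and fresh == [True, False, True]   # out-of-place
assert merge(a, b, a) and a == [True, False, True]           # in-place alias
assert merge(a, [True], fresh) is False                      # incompatible
```

The out-of-place form is what the backup path in bdrv_merge_dirty_bitmap() uses: the old destination bitmap survives untouched as `*backup`, so a transaction abort can restore it.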
[Qemu-block] [PATCH v3 1/5] dirty-bitmap: switch assert-fails to errors in bdrv_merge_dirty_bitmap
Move checks from qmp_x_block_dirty_bitmap_merge() to bdrv_merge_dirty_bitmap(), to share them with dirty bitmap merge transaction action in future commit. Note: for now, only qmp_x_block_dirty_bitmap_merge() calls bdrv_merge_dirty_bitmap(). Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: John Snow --- block/dirty-bitmap.c | 15 +-- blockdev.c | 10 -- 2 files changed, 13 insertions(+), 12 deletions(-) diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c index c9b8a6fd52..6c8761e027 100644 --- a/block/dirty-bitmap.c +++ b/block/dirty-bitmap.c @@ -798,12 +798,23 @@ void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src, qemu_mutex_lock(dest->mutex); -assert(bdrv_dirty_bitmap_enabled(dest)); -assert(!bdrv_dirty_bitmap_readonly(dest)); +if (bdrv_dirty_bitmap_frozen(dest)) { +error_setg(errp, "Bitmap '%s' is frozen and cannot be modified", + dest->name); +goto out; +} + +if (bdrv_dirty_bitmap_readonly(dest)) { +error_setg(errp, "Bitmap '%s' is readonly and cannot be modified", + dest->name); +goto out; +} if (!hbitmap_merge(dest->bitmap, src->bitmap)) { error_setg(errp, "Bitmaps are incompatible and can't be merged"); +goto out; } +out: qemu_mutex_unlock(dest->mutex); } diff --git a/blockdev.c b/blockdev.c index 72f5347df5..902338e815 100644 --- a/blockdev.c +++ b/blockdev.c @@ -2955,16 +2955,6 @@ void qmp_x_block_dirty_bitmap_merge(const char *node, const char *dst_name, return; } -if (bdrv_dirty_bitmap_frozen(dst)) { -error_setg(errp, "Bitmap '%s' is frozen and cannot be modified", - dst_name); -return; -} else if (bdrv_dirty_bitmap_readonly(dst)) { -error_setg(errp, "Bitmap '%s' is readonly and cannot be modified", - dst_name); -return; -} - src = bdrv_find_dirty_bitmap(bs, src_name); if (!src) { error_setg(errp, "Dirty bitmap '%s' not found", src_name); -- 2.18.0
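The pattern of this patch — turning caller-visible preconditions from assertion failures into reported errors, so that QMP (and later transaction) callers get a message instead of an abort — in miniature (Python sketch; names are illustrative):

```python
def merge_checked(dest_name, frozen, readonly):
    """Validate and report instead of asserting, as the patch does."""
    if frozen:
        return "Bitmap '%s' is frozen and cannot be modified" % dest_name
    if readonly:
        return "Bitmap '%s' is readonly and cannot be modified" % dest_name
    return None   # success: proceed with the merge

assert merge_checked("b0", frozen=True, readonly=False) \
    == "Bitmap 'b0' is frozen and cannot be modified"
assert merge_checked("b0", frozen=False, readonly=False) is None
```

Centralizing the checks in bdrv_merge_dirty_bitmap() means every future caller (here, the transaction action) inherits them for free.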
[Qemu-block] [PATCH v3 2/5] dirty-bitmap: rename bdrv_undo_clear_dirty_bitmap
Use more generic names to reuse the function for bitmap merge in the following commit. Signed-off-by: Vladimir Sementsov-Ogievskiy --- include/block/block_int.h | 2 +- block/dirty-bitmap.c | 4 ++-- blockdev.c| 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/include/block/block_int.h b/include/block/block_int.h index 903b9c1034..677065ae30 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -1149,7 +1149,7 @@ bool blk_dev_is_medium_locked(BlockBackend *blk); void bdrv_set_dirty(BlockDriverState *bs, int64_t offset, int64_t bytes); void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out); -void bdrv_undo_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap *in); +void bdrv_restore_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap *backup); void bdrv_inc_in_flight(BlockDriverState *bs); void bdrv_dec_in_flight(BlockDriverState *bs); diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c index 6c8761e027..017ee9db46 100644 --- a/block/dirty-bitmap.c +++ b/block/dirty-bitmap.c @@ -633,12 +633,12 @@ void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out) bdrv_dirty_bitmap_unlock(bitmap); } -void bdrv_undo_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap *in) +void bdrv_restore_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap *backup) { HBitmap *tmp = bitmap->bitmap; assert(bdrv_dirty_bitmap_enabled(bitmap)); assert(!bdrv_dirty_bitmap_readonly(bitmap)); -bitmap->bitmap = in; +bitmap->bitmap = backup; hbitmap_free(tmp); } diff --git a/blockdev.c b/blockdev.c index 902338e815..f133e87414 100644 --- a/blockdev.c +++ b/blockdev.c @@ -2032,7 +2032,7 @@ static void block_dirty_bitmap_clear_abort(BlkActionState *common) common, common); if (state->backup) { -bdrv_undo_clear_dirty_bitmap(state->bitmap, state->backup); +bdrv_restore_dirty_bitmap(state->bitmap, state->backup); } } -- 2.18.0
[Qemu-block] [PATCH v3 0/4] transaction support for bitmap merge
This is a last brick, necessary to play with nbd bitmap export in conjunction with image fleecing. v3: 01: fix typo in commit message, add John's r-b 02: split refactoring 03: improve commit message, split some refactoring to 02 04: add John's r-b 05: drop extra state variable, make it local instead. John's r-b. v2: don't compare with v1, it is changed a lot, to do the whole thing in .prepare instead of .commit. It is needed to be compatible with backup block job transaction actions [John] Vladimir Sementsov-Ogievskiy (5): dirty-bitmap: switch assert-fails to errors in bdrv_merge_dirty_bitmap dirty-bitmap: rename bdrv_undo_clear_dirty_bitmap dirty-bitmap: make it possible to restore bitmap after merge blockdev: rename block-dirty-bitmap-clear transaction handlers qapi: add transaction support for x-block-dirty-bitmap-merge qapi/transaction.json| 2 ++ include/block/block_int.h| 2 +- include/block/dirty-bitmap.h | 2 +- include/qemu/hbitmap.h | 25 +-- block/dirty-bitmap.c | 36 +- blockdev.c | 59 +--- util/hbitmap.c | 11 +-- 7 files changed, 99 insertions(+), 38 deletions(-) -- 2.18.0
[Qemu-block] [PATCH v3 4/5] blockdev: rename block-dirty-bitmap-clear transaction handlers
Rename block-dirty-bitmap-clear transaction handlers to reuse them for x-block-dirty-bitmap-merge transaction in the following patch. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: John Snow --- blockdev.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/blockdev.c b/blockdev.c index 9cb29ca63e..5348e8ba9b 100644 --- a/blockdev.c +++ b/blockdev.c @@ -2026,7 +2026,7 @@ static void block_dirty_bitmap_clear_prepare(BlkActionState *common, bdrv_clear_dirty_bitmap(state->bitmap, &state->backup); } -static void block_dirty_bitmap_clear_abort(BlkActionState *common) +static void block_dirty_bitmap_restore(BlkActionState *common) { BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState, common, common); @@ -2036,7 +2036,7 @@ static void block_dirty_bitmap_clear_abort(BlkActionState *common) } } -static void block_dirty_bitmap_clear_commit(BlkActionState *common) +static void block_dirty_bitmap_free_backup(BlkActionState *common) { BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState, common, common); @@ -2170,8 +2170,8 @@ static const BlkActionOps actions[] = { [TRANSACTION_ACTION_KIND_BLOCK_DIRTY_BITMAP_CLEAR] = { .instance_size = sizeof(BlockDirtyBitmapState), .prepare = block_dirty_bitmap_clear_prepare, -.commit = block_dirty_bitmap_clear_commit, -.abort = block_dirty_bitmap_clear_abort, +.commit = block_dirty_bitmap_free_backup, +.abort = block_dirty_bitmap_restore, }, [TRANSACTION_ACTION_KIND_X_BLOCK_DIRTY_BITMAP_ENABLE] = { .instance_size = sizeof(BlockDirtyBitmapState), -- 2.18.0
Re: [Qemu-block] [PATCH v2 03/17] aio-wait: Increase num_waiters even in home thread
On 13/09/2018 19:21, Kevin Wolf wrote: > Am 13.09.2018 um 17:11 hat Paolo Bonzini geschrieben: >> On 13/09/2018 14:52, Kevin Wolf wrote: >>> Even if AIO_WAIT_WHILE() is called in the home context of the >>> AioContext, we still want to allow the condition to change depending on >>> other threads as long as they kick the AioWait. Specifically, block jobs >>> can be running in an I/O thread and should then be able to kick a drain >>> in the main loop context. >> >> I don't understand the scenario very well. Why hasn't the main loop's >> drain incremented num_waiters? > > We are in this path (that didn't increase num_waiters before this patch) > because drain, and therefore AIO_WAIT_WHILE(), was called from the main > thread. But draining depends on a job in a different thread, so we need > to be able to be kicked when that job finally is quiesced. Ah, because AIO_WAIT_WHILE() is invoked with ctx == qemu_get_aio_context(), but the condition is triggered *outside* the main context? Tricky, but seems correct. AIO_WAIT_WHILE() anyway is not a fast path. Paolo > If I revert this, the test /bdrv-drain/blockjob/iothread/drain hangs. > This is a block job that works on two nodes in two different contexts. > > (I think I saw this with mirror, which doesn't take additional locks > when it issues a request, so maybe there's a bit more wrong there... We > clearly need more testing with iothreads, this series probably only > scratches the surface.) > >> Note I'm not against the patch---though I would hoist the >> atomic_inc/atomic_dec outside the if, since it's done in both branches. > > Ok. > > Kevin >
Re: [Qemu-block] [PATCH v2 11/17] block-backend: Decrease in_flight only after callback
On 13/09/2018 18:59, Kevin Wolf wrote: > Am 13.09.2018 um 17:10 hat Paolo Bonzini geschrieben: >> On 13/09/2018 14:52, Kevin Wolf wrote: >>> + if (qemu_get_current_aio_context() == qemu_get_aio_context()) { >>> + /* If we are in the main thread, the callback is allowed to unref >>> + * the BlockBackend, so we have to hold an additional reference */ >>> + blk_ref(acb->rwco.blk); >>> + } >>> acb->common.cb(acb->common.opaque, acb->rwco.ret); >>> + blk_dec_in_flight(acb->rwco.blk); >>> + if (qemu_get_current_aio_context() == qemu_get_aio_context()) { >>> + blk_unref(acb->rwco.blk); >>> + } >> >> Is this something that happens only for some specific callers? That is, >> which callers are sure that the callback is invoked from the main thread? > > I can't seem to reproduce the problem I saw any more even when reverting > the bdrv_ref/unref pair. If I remember correctly it was actually a > nested aio_poll() that was running a block job completion or something > like that - which would obviously only happen on the main thread because > the job intentionally defers to the main thread. > > The only reason I made this conditional is that I think bdrv_unref() > still isn't safe outside the main thread, is it? Yes, making it conditional is correct, but it is quite fishy even with the conditional. As you mention, you could have a nested aio_poll() in the main thread, for example invoked from a bottom half, but in that case I'd rather track the caller that is creating the bottom half and see if it lacks a bdrv_ref/bdrv_unref (or perhaps it's even higher in the tree that is missing). Paolo
Re: [Qemu-block] [Qemu-devel] Can I only commit from active image to corresponding range of its backing file by qemu cmd?
On 9/13/18 9:19 PM, lampahome wrote: Sorry, I need to explain what case I want to do Todo: I want to *backup a block device into qcow2 format image.* I met a problem which is the *file size limit of filesystem* ex: Max is 16TB for any file in ext4, but the block device maybe 32TB or more. I figure out one way is to *divide data of device into 1TB chunk* and save every chunk into qcow2 image cuz I don't change filesystem, and connect with backing chain. A better way would be to use a different filesystem that does not have those limits, or even better to just directly use a raw block device with the size you need instead of worrying about storing a file system on top of the block device just to introduce artificial size limitations into the mix. LVM is great for that. *(That's what I said range is different)* Ex: 1st chunk of device will save into image.000 2nd chunk of device will save into image.001 Nth chunk of device will save into image.(N-1) ...etc I can see all block device data when I mount image.(N-1) by qemu-nbd cuz the chunk doesn't overlap and all chunks connect by backing chain. How exactly did you create those images? I'm trying to verify the steps you used to split the image. I know the concept of the split, but without seeing actual commands used, I don't know that you actually accomplished the split in the manner desired. (It's okay if a reproduction uses smaller scales for speed, such as splitting a 32M image across 1M qcow2 files - the point remains that seeing the actual steps used may offer additional insights into your usage scenario). Or are you trying to ask if it is possible to create such a fragmented design with current tools? (The answer that we've given you is that no, it is not easy to do, because no one has needed it so far). 
There's no way to tell a running qemu that writes to offsets 0-1M go into one file, while writes to offsets 1M to 2M go into another - ALL writes go into the currently active layer, regardless of the offset represented by the write. It would be possible to come up with a new driver (or to add yet another mode to the existing quorum driver) that DOES allow runtime concatenation of multiple subsidiary devices, in order to present a linear view of those images as a single guest device. To an extent, that's what 'qemu-img convert image1 image2 imageout' is doing, except that qemu-img is doing it via manual hacks, rather than something baked into the internal qemu block layer (we'd need it in the qemu block layer for it to work with a running guest with random access, rather than just a one-time conversion pass). But no one has submitted patches for that yet. Now I want to do next thing: *Incremental backup* When I modify data of 1st chunk, what I thought is to write new 1st chunk to new image *image.N* and let *image.(N-1)* be the backing file of *image.N* . That's cuz I want to store the data before modified to roll back anytime. Qemu DOES support incremental backups via persistent bitmaps coupled with NBD exports. See https://bugzilla.redhat.com/show_bug.cgi?id=1207657#c27 for a demonstration of all the steps involved, but it is quite possible to create an NBD export of a point-in-time incremental of a running guest, where you can then query over NBD which portions of the backup represent deltas from your earlier point in time (by using a bitmap to track which clusters were written from the earlier point in time), and where you can read the data from NBD in ANY manner you see fit (including reading dirty clusters from 0-1M to write into backup file .000, reading dirty clusters from 1M-2M to write into backup file .001, and so on).
So if you want to split your backing file into ranges (which I already questioned as to how you plan to do that, given that the subsequent writes are not split), you can at least create incremental backups that are also split. So now I have two *versions of block device (like concept of snapshot)*: One is image.000 to image.(N-1). I can access the data before modify by mount image.(N-1) through qemu-nbd The other one is image.000 to image.N. I can access the data after modify by mount image.N through qemu-nbd (cuz the visible 1st chunk is in image.N) Consider the situation: 000 A - - - - - - - - <--- store the 1st chunk of block device 001 - B - - - - - - - 002 - - C - - - - - - (1st state of block device) 003 A' - - - - - - - - <--- store the 1st chunk of block device, but data is different 004 - - - D - - - - - (2nd state of block device) 005 - - - - E - - - - (3rd state of block device) The original problem is: if I want to remove the 2nd state (003 and 004), I need to keep the data of 003 and 004. If I just commit 003, the A' of 003 must be committed into 002 cuz 002 is the backing file of 003. I try to figure out some way to let it only commit from 003 into 000. I'm not quite following your diagram. My naive read
Re: [Qemu-block] [Qemu-devel] [PATCH v2 4/4] qapi: add transaction support for x-block-dirty-bitmap-merge
11.09.2018 22:45, John Snow wrote: On 07/06/2018 07:36 AM, Vladimir Sementsov-Ogievskiy wrote: The new action is like the clear action: do the whole thing in .prepare and undo it in .abort. This behavior is needed for bitmap-changing actions because backup job actions use the bitmap in .prepare. Signed-off-by: Vladimir Sementsov-Ogievskiy --- qapi/transaction.json | 2 ++ blockdev.c | 38 +- 2 files changed, 39 insertions(+), 1 deletion(-) diff --git a/qapi/transaction.json b/qapi/transaction.json index d7e4274550..5875cdb16c 100644 --- a/qapi/transaction.json +++ b/qapi/transaction.json @@ -48,6 +48,7 @@ # - @block-dirty-bitmap-clear: since 2.5 # - @x-block-dirty-bitmap-enable: since 3.0 # - @x-block-dirty-bitmap-disable: since 3.0 +# - @x-block-dirty-bitmap-merge: since 3.1 # - @blockdev-backup: since 2.3 # - @blockdev-snapshot: since 2.5 # - @blockdev-snapshot-internal-sync: since 1.7 @@ -63,6 +64,7 @@ 'block-dirty-bitmap-clear': 'BlockDirtyBitmap', 'x-block-dirty-bitmap-enable': 'BlockDirtyBitmap', 'x-block-dirty-bitmap-disable': 'BlockDirtyBitmap', + 'x-block-dirty-bitmap-merge': 'BlockDirtyBitmapMerge', 'blockdev-backup': 'BlockdevBackup', 'blockdev-snapshot': 'BlockdevSnapshot', 'blockdev-snapshot-internal-sync': 'BlockdevSnapshotInternal', diff --git a/blockdev.c b/blockdev.c index 5348e8ba9b..feebbb9a9a 100644 --- a/blockdev.c +++ b/blockdev.c @@ -1940,6 +1940,7 @@ static void blockdev_backup_clean(BlkActionState *common) typedef struct BlockDirtyBitmapState { BlkActionState common; BdrvDirtyBitmap *bitmap; +BdrvDirtyBitmap *merge_source; Is this necessary? Looks like it isn't; will drop it.
BlockDriverState *bs; HBitmap *backup; bool prepared; @@ -2112,6 +2113,35 @@ static void block_dirty_bitmap_disable_abort(BlkActionState *common) } } +static void block_dirty_bitmap_merge_prepare(BlkActionState *common, + Error **errp) +{ +BlockDirtyBitmapMerge *action; +BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState, + common, common); + +if (action_check_completion_mode(common, errp) < 0) { +return; +} + +action = common->action->u.x_block_dirty_bitmap_merge.data; +state->bitmap = block_dirty_bitmap_lookup(action->node, + action->dst_name, + &state->bs, + errp); +if (!state->bitmap) { +return; +} + +state->merge_source = bdrv_find_dirty_bitmap(state->bs, action->src_name); +if (!state->merge_source) { +return; +} + +bdrv_merge_dirty_bitmap(state->bitmap, state->merge_source, &state->backup, +errp); +} + static void abort_prepare(BlkActionState *common, Error **errp) { error_setg(errp, "Transaction aborted using Abort action"); @@ -2182,7 +2212,13 @@ static const BlkActionOps actions[] = { .instance_size = sizeof(BlockDirtyBitmapState), .prepare = block_dirty_bitmap_disable_prepare, .abort = block_dirty_bitmap_disable_abort, - } +}, +[TRANSACTION_ACTION_KIND_X_BLOCK_DIRTY_BITMAP_MERGE] = { +.instance_size = sizeof(BlockDirtyBitmapState), +.prepare = block_dirty_bitmap_merge_prepare, +.commit = block_dirty_bitmap_free_backup, +.abort = block_dirty_bitmap_restore, +} }; /** If the new state is not necessary and you remove it: Reviewed-by: John Snow -- Best regards, Vladimir
Re: [Qemu-block] [Qemu-devel] Can I only commit from active image to corresponding range of its backing file by qemu cmd?
Sorry, I need to explain the case I want to handle. Goal: I want to *back up a block device into qcow2 format images.* The problem I hit is the *file size limit of the filesystem*; e.g. the maximum size of any file on ext4 is 16TB, but the block device may be 32TB or more. One way I figured out is to *divide the device data into 1TB chunks* and save every chunk into its own qcow2 image (since I can't change the filesystem), connecting them with a backing chain. *(That's what I meant by the ranges being different.)* Ex: the 1st chunk of the device is saved into image.000, the 2nd chunk into image.001, the Nth chunk into image.(N-1), etc. I can see all the block device data when I mount image.(N-1) via qemu-nbd, because the chunks don't overlap and all chunks are connected by the backing chain. Now I want to do the next thing: *incremental backup.* When I modify data in the 1st chunk, my idea is to write the new 1st chunk to a new image *image.N* and let *image.(N-1)* be the backing file of *image.N*, because I want to keep the pre-modification data so I can roll back at any time. So now I have two *versions of the block device (like the concept of snapshots)*: one is image.000 to image.(N-1); I can access the data before the modification by mounting image.(N-1) through qemu-nbd. The other is image.000 to image.N; I can access the data after the modification by mounting image.N through qemu-nbd (because the visible 1st chunk is in image.N). Consider the situation: 000 A - - - - - - - - <--- stores the 1st chunk of the block device 001 - B - - - - - - - 002 - - C - - - - - - (1st state of the block device) 003 A' - - - - - - - - <--- stores the 1st chunk of the block device, but the data is different 004 - - - D - - - - - (2nd state of the block device) 005 - - - - E - - - - (3rd state of the block device) The original problem: I want to remove the 2nd state (003 and 004) but keep the data of 003 and 004. If I just commit 003, the A' of 003 will be committed into 002, because 002 is the backing file of 003. I am trying to figure out a way to commit only from 003 into 000.
[Qemu-block] [PATCH] curl: Make sslverify=off disable host as well as peer verification.
The sslverify setting is supposed to turn off all TLS certificate checks in libcurl. However, because of the way we use it, it only turns off peer certificate authenticity checks (CURLOPT_SSL_VERIFYPEER). This patch makes it also turn off the check that the server name in the certificate is the same as the server you're connecting to (CURLOPT_SSL_VERIFYHOST). We can use Google's server at 8.8.8.8 which happens to have a bad TLS certificate to demonstrate this: $ ./qemu-img create -q -f qcow2 -b 'json: { "file.sslverify": "off", "file.driver": "https", "file.url": "https://8.8.8.8/foo" }' /var/tmp/file.qcow2 qemu-img: /var/tmp/file.qcow2: CURL: Error opening file: SSL: no alternative certificate subject name matches target host name '8.8.8.8' Could not open backing image to determine size. With this patch applied, qemu-img connects to the server regardless of the bad certificate: $ ./qemu-img create -q -f qcow2 -b 'json: { "file.sslverify": "off", "file.driver": "https", "file.url": "https://8.8.8.8/foo" }' /var/tmp/file.qcow2 qemu-img: /var/tmp/file.qcow2: CURL: Error opening file: The requested URL returned error: 404 Not Found (The 404 error is expected because 8.8.8.8 is not actually serving a file called "/foo".) Of course the default (without sslverify=off) remains to always check the certificate: $ ./qemu-img create -q -f qcow2 -b 'json: { "file.driver": "https", "file.url": "https://8.8.8.8/foo" }' /var/tmp/file.qcow2 qemu-img: /var/tmp/file.qcow2: CURL: Error opening file: SSL: no alternative certificate subject name matches target host name '8.8.8.8' Could not open backing image to determine size. Further information about the two settings is available here: https://curl.haxx.se/libcurl/c/CURLOPT_SSL_VERIFYPEER.html https://curl.haxx.se/libcurl/c/CURLOPT_SSL_VERIFYHOST.html Signed-off-by: Richard W.M. Jones
--- block/curl.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/block/curl.c b/block/curl.c index 229bb84a27..fabb2b4da7 100644 --- a/block/curl.c +++ b/block/curl.c @@ -483,6 +483,8 @@ static int curl_init_state(BDRVCURLState *s, CURLState *state) curl_easy_setopt(state->curl, CURLOPT_URL, s->url); curl_easy_setopt(state->curl, CURLOPT_SSL_VERIFYPEER, (long) s->sslverify); +curl_easy_setopt(state->curl, CURLOPT_SSL_VERIFYHOST, + s->sslverify ? 2L : 0L); if (s->cookie) { curl_easy_setopt(state->curl, CURLOPT_COOKIE, s->cookie); } -- 2.19.0.rc0
Re: [Qemu-block] [PATCH v2 11/17] block-backend: Decrease in_flight only after callback
On Thu, 09/13 18:59, Kevin Wolf wrote: > Am 13.09.2018 um 17:10 hat Paolo Bonzini geschrieben: > > On 13/09/2018 14:52, Kevin Wolf wrote: > > > + if (qemu_get_current_aio_context() == qemu_get_aio_context()) { > > > + /* If we are in the main thread, the callback is allowed to unref > > > + * the BlockBackend, so we have to hold an additional reference */ > > > + blk_ref(acb->rwco.blk); > > > + } > > > acb->common.cb(acb->common.opaque, acb->rwco.ret); > > > + blk_dec_in_flight(acb->rwco.blk); > > > + if (qemu_get_current_aio_context() == qemu_get_aio_context()) { > > > + blk_unref(acb->rwco.blk); > > > + } > > > > Is this something that happens only for some specific callers? That is, > > which callers are sure that the callback is invoked from the main thread? > > I can't seem to reproduce the problem I saw any more even when reverting > the bdrv_ref/unref pair. If I remember correctly it was actually a > nested aio_poll() that was running a block job completion or something > like that - which would obviously only happen on the main thread because > the job intentionally defers to the main thread. > > The only reason I made this conditional is that I think bdrv_unref() > still isn't safe outside the main thread, is it? I think that is correct. Fam