[Qemu-block] [PATCH] ratelimit: don't align wait time with slices
It is possible for rate limited writes to keep overshooting a slice's
quota by a tiny amount causing the slice-aligned waiting period to
effectively halve the rate.

Signed-off-by: Wolfgang Bumiller
---
Copied the Ccs from the discussion thread, hope that's fine, as I also
just noticed that for my reply containing this snippet I had hit reply
on the mail that did not contain those Ccs yet, sorry about that.

 include/qemu/ratelimit.h | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/include/qemu/ratelimit.h b/include/qemu/ratelimit.h
index 8dece483f5..1b38291823 100644
--- a/include/qemu/ratelimit.h
+++ b/include/qemu/ratelimit.h
@@ -36,7 +36,7 @@ typedef struct {
 static inline int64_t ratelimit_calculate_delay(RateLimit *limit, uint64_t n)
 {
     int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
-    uint64_t delay_slices;
+    double delay_slices;
 
     assert(limit->slice_quota && limit->slice_ns);
 
@@ -55,12 +55,11 @@ static inline int64_t ratelimit_calculate_delay(RateLimit *limit, uint64_t n)
         return 0;
     }
 
-    /* Quota exceeded. Calculate the next time slice we may start
-     * sending data again. */
-    delay_slices = (limit->dispatched + limit->slice_quota - 1) /
-        limit->slice_quota;
+    /* Quota exceeded. Wait based on the excess amount and then start a new
+     * slice. */
+    delay_slices = (double)limit->dispatched / limit->slice_quota;
 
     limit->slice_end_time = limit->slice_start_time +
-        delay_slices * limit->slice_ns;
+        (uint64_t)(delay_slices * limit->slice_ns);
 
     return limit->slice_end_time - now;
 }
-- 
2.11.0
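[Editorial illustration, not part of the original thread: a minimal Python sketch of the delay arithmetic the patch changes. With the old ceiling division, overshooting the slice quota by even one unit rounds the wait up to a whole extra slice; the new proportional computation waits only for the excess actually owed.]

```python
# Sketch of the two delay computations from ratelimit_calculate_delay().
# Not QEMU code; names mirror the patch for readability.

def delay_slices_old(dispatched, slice_quota):
    # old code: ceiling division rounds up to whole slices
    return (dispatched + slice_quota - 1) // slice_quota

def delay_slices_new(dispatched, slice_quota):
    # new code: proportional to the amount actually dispatched
    return dispatched / slice_quota

quota = 1000
# Overshoot the quota by a tiny amount (1 unit):
print(delay_slices_old(1001, quota))  # 2 (waits a whole extra slice)
print(delay_slices_new(1001, quota))  # 1.001 (waits only for the excess)
```

If every slice overshoots slightly, the old rounding makes each slice's worth of data take two slices of wall-clock time, which is exactly the rate-halving described in the commit message.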
Re: [Qemu-block] [Qemu-devel] [PATCH v2 3/6] qapi: add nbd-server-remove
"Dr. David Alan Gilbert" writes:

> * Vladimir Sementsov-Ogievskiy (vsement...@virtuozzo.com) wrote:
>> 26.01.2018 18:05, Dr. David Alan Gilbert wrote:
>> > * Vladimir Sementsov-Ogievskiy (vsement...@virtuozzo.com) wrote:

[...]

>> > > most of commands, ported to hmp are done in same style: they just call
>> > > corresponding qmp command.

HMP commands *should* call the QMP command to do the actual work.  That
way, we *know* all the functionality is available in QMP, and HMP is
consistent with it.

Sometimes, calling helpers shared with QMP is more convenient, and
that's okay, but then you have to think about QMP completeness and
HMP/QMP consistency.

The only exceptions are HMP commands that don't make sense in QMP, such
as @cpu.

>> > > Isn't it better to provide common interface for calling qmp commands
>> > > through HMP monitor, to never create hmp versions of new commands?
>> > > they will be available automatically.
>> > It would be nice to do that, but they're not that consistent in how they
>> > convert parameters and options, but I occasionally wonder if we could
>> > automate more of it.
>>
>> What about allowing some new syntax in hmp, directly mapped to qmp?
>>
>> something like
>>
>> >>> blockdev-add id disk driver qcow2 cache {writeback true direct true}
>> aio native discard unmap file {driver file filename /tmp/somedisk}
>>
>> ?
>
> Hmm, I don't particularly find that easy to read either; however the
> actual block device specification for HMP should be the same as what we
> pass on the command line, so we only have to worry about any extra
> things that are part of blockdev_add.
> (I'm sure we can find a way of making the one we pass on the commandline
> more readable as well, there's so much duplication).

Good points.  QMP syntax is different for a good reason: it serves
machines rather than humans.  Both HMP and command line serve the same
humans, yet the syntax they wrap around common functionality is
different.  Sad waste of developer time, sad waste of user brain power.

The former could perhaps be reduced with better tooling, say having QAPI
generate the details.  If you have QAPI generate HMP and command line
from the exact same definitions as QMP, you get what Vladimir wants:
different interfaces to the exact same functionality, without additional
coding.

Note that the needs of humans and machines differ in more ways than just
syntax.  For instance, humans appreciate convenience features to save
typing.  In a machine interface, they'd add unnecessary and
inappropriate complexity.  Adding convenience is a good reason for
actually designing the HMP interface, rather than copying the QMP one
blindly.
Re: [Qemu-block] [PATCH v5 0/5] coroutine-lock: polymorphic CoQueue
On Sat, 02/03 10:39, Paolo Bonzini wrote:
> There are cases in which a queued coroutine must be restarted from
> non-coroutine context (with qemu_co_enter_next).  In these cases,
> qemu_co_enter_next also needs to be thread-safe, but it cannot use a
> CoMutex and so cannot qemu_co_queue_wait.  This happens in curl (which
> right now is rolling its own list of Coroutines) and will happen in
> Fam's NVMe driver as well.
>
> This series extracts the idea of a polymorphic lockable object
> from my "scoped lock guard" proposal, and applies it to CoQueue.
> The implementation of QemuLockable is similar to C11 _Generic, but
> redone using the preprocessor and GCC builtins for compatibility.
>
> In general, while a bit on the esoteric side, the functionality used
> to emulate _Generic is fairly old in GCC, and the builtins are already
> used by include/qemu/atomic.h; the series was tested with Fedora 27 (boot
> Damn Small Linux via http) and CentOS 6 (compiled only).
>
> Paolo
>
> v4->v5: fix checkpatch complaints

Queued, thanks.

Fam
Re: [Qemu-block] [PATCH for-2.9-rc5 v4 2/2] block: Drain BH in bdrv_drained_begin
On Tue, Feb 6, 2018 at 11:32 PM, Kevin Wolf wrote:
> Am 18.04.2017 um 16:30 hat Fam Zheng geschrieben:
>> During block job completion, nothing is preventing
>> block_job_defer_to_main_loop_bh from being called in a nested
>> aio_poll(), which is a trouble, such as in this code path:
>>
>>     qmp_block_commit
>>       commit_active_start
>>         bdrv_reopen
>>           bdrv_reopen_multiple
>>             bdrv_reopen_prepare
>>               bdrv_flush
>>                 aio_poll
>>                   aio_bh_poll
>>                     aio_bh_call
>>                       block_job_defer_to_main_loop_bh
>>                         stream_complete
>>                           bdrv_reopen
>>
>> block_job_defer_to_main_loop_bh is the last step of the stream job,
>> which should have been "paused" by the bdrv_drained_begin/end in
>> bdrv_reopen_multiple, but it is not done because it's in the form of a
>> main loop BH.
>>
>> Similar to why block jobs should be paused between drained_begin and
>> drained_end, BHs they schedule must be excluded as well. To achieve
>> this, this patch forces draining the BH in BDRV_POLL_WHILE.
>>
>> As a side effect this fixes a hang in block_job_detach_aio_context
>> during system_reset when a block job is ready:
>>
>>     #0  0x55aa79f3 in bdrv_drain_recurse
>>     #1  0x55aa825d in bdrv_drained_begin
>>     #2  0x55aa8449 in bdrv_drain
>>     #3  0x55a9c356 in blk_drain
>>     #4  0x55aa3cfd in mirror_drain
>>     #5  0x55a66e11 in block_job_detach_aio_context
>>     #6  0x55a62f4d in bdrv_detach_aio_context
>>     #7  0x55a63116 in bdrv_set_aio_context
>>     #8  0x55a9d326 in blk_set_aio_context
>>     #9  0x557e38da in virtio_blk_data_plane_stop
>>     #10 0x559f9d5f in virtio_bus_stop_ioeventfd
>>     #11 0x559fa49b in virtio_bus_stop_ioeventfd
>>     #12 0x559f6a18 in virtio_pci_stop_ioeventfd
>>     #13 0x559f6a18 in virtio_pci_reset
>>     #14 0x559139a9 in qdev_reset_one
>>     #15 0x55916738 in qbus_walk_children
>>     #16 0x55913318 in qdev_walk_children
>>     #17 0x55916738 in qbus_walk_children
>>     #18 0x559168ca in qemu_devices_reset
>>     #19 0x5581fcbb in pc_machine_reset
>>     #20 0x558a4d96 in qemu_system_reset
>>     #21 0x5577157a in main_loop_should_exit
>>     #22 0x5577157a in main_loop
>>     #23 0x5577157a in main
>>
>> The rationale is that the loop in block_job_detach_aio_context cannot
>> make any progress in pausing/completing the job, because bs->in_flight
>> is 0, so bdrv_drain doesn't process the block_job_defer_to_main_loop
>> BH. With this patch, it does.
>>
>> Reported-by: Jeff Cody
>> Signed-off-by: Fam Zheng
>
> Fam, do you remember whether this was really only about drain? Because
> in that case...

Yes I believe so.

>
>> diff --git a/include/block/block.h b/include/block/block.h
>> index 97d4330..5ddc0cf 100644
>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -381,12 +381,13 @@ void bdrv_drain_all(void);
>>
>>  #define BDRV_POLL_WHILE(bs, cond) ({                       \
>>      bool waited_ = false;                                  \
>> +    bool busy_ = true;                                     \
>>      BlockDriverState *bs_ = (bs);                          \
>>      AioContext *ctx_ = bdrv_get_aio_context(bs_);          \
>>      if (aio_context_in_iothread(ctx_)) {                   \
>> -        while ((cond)) {                                   \
>> -            aio_poll(ctx_, true);                          \
>> -            waited_ = true;                                \
>> +        while ((cond) || busy_) {                          \
>> +            busy_ = aio_poll(ctx_, (cond));                \
>> +            waited_ |= !!(cond) | busy_;                   \
>>          }                                                  \
>>      } else {                                               \
>>          assert(qemu_get_current_aio_context() ==           \
>> @@ -398,11 +399,16 @@ void bdrv_drain_all(void);
>>           */                                                \
>>          assert(!bs_->wakeup);                              \
>>          bs_->wakeup = true;                                \
>> -        while ((cond)) {                                   \
>> -            aio_context_release(ctx_);                     \
>> -            aio_poll(qemu_get_aio_context(), true);        \
>> -            aio_context_acquire(ctx_);                     \
>> -            waited_ = true;                                \
>> +        while (busy_) {                                    \
>> +            if ((cond)) {                                  \
>> +                waited_ = busy_ = true;                    \
>> +                aio_context_release(ctx_);
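[Editorial illustration, not part of the original thread: a hedged Python model of the polling-loop change discussed above. The point of the patch is that the drain loop must keep polling while the last poll made progress, even after the condition itself is false, so that deferred callbacks (BHs) scheduled by a job are still dispatched. The `aio_poll`/`pending_bhs` names below are stand-ins for the real QEMU AioContext machinery.]

```python
# Model of the fixed BDRV_POLL_WHILE loop: poll while the condition holds
# OR the previous poll dispatched something, so pending BHs get drained.
from collections import deque

pending_bhs = deque()  # stands in for the AioContext's bottom-half list

def aio_poll(blocking):
    """Return True iff some callback was dispatched (progress was made)."""
    if pending_bhs:
        pending_bhs.popleft()()
        return True
    return False

def bdrv_poll_while(cond):
    waited = False
    busy = True
    # the old loop was just `while cond():` and would never run the BH below
    while cond() or busy:
        busy = aio_poll(cond())
        waited = waited or cond() or busy
    return waited

done = []
pending_bhs.append(lambda: done.append('deferred-to-main-loop BH ran'))
# The condition is already false, but the BH must still be drained:
bdrv_poll_while(lambda: False)
print(done)  # ['deferred-to-main-loop BH ran']
```

With the old `while cond():` shape, a loop entered with `cond()` false would exit immediately and the queued callback would never run, which mirrors the hang in block_job_detach_aio_context described in the commit message.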
Re: [Qemu-block] [Qemu-devel] [PATCH v8 18/26] block: Add sgfnt_runtime_opts to BlockDriver
On 02/05/2018 09:18 AM, Max Reitz wrote:
> This new field can be set by block drivers to list the runtime options
> they accept that may influence the contents of the respective BDS. As of
> a follow-up patch, this list will be used by the common
> bdrv_refresh_filename() implementation to decide which options to put
> into BDS.full_open_options (and consequently whether a JSON filename has
> to be created), thus freeing the drivers of having to implement that
> logic themselves.
>
> Additionally, this patch adds the field to all of the block drivers that
> need it and sets it accordingly.
>
> Signed-off-by: Max Reitz
> ---

> +    /* Pointer to a NULL-terminated array of names of significant options that
> +     * can be specified for bdrv_open(). A significant option is one that
> +     * changes the data of a BDS.
> +     * If this pointer is NULL, the array is considered empty.
> +     * "filename" and "driver" are always considered significant. */
> +    const char *const *sgfnt_runtime_opts;

Warning: Bikeshedding follows:

s/sgfnt/vital/ might read easier (same number of letters, but has some
vowels rthr thn bng crptc bbrvtn).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
Re: [Qemu-block] [PATCH] block: Simplify bdrv_can_write_zeroes_with_unmap()
On 01/29/2018 05:08 AM, Stefan Hajnoczi wrote:
> On Fri, Jan 26, 2018 at 01:34:39PM -0600, Eric Blake wrote:
>> We don't need the can_write_zeroes_with_unmap field in BlockDriverInfo,
>> because it is redundant information with supported_zero_flags &
>> BDRV_REQ_MAY_UNMAP.  Note that BlockDriverInfo and supported_zero_flags
>> are both per-device settings, rather than global state about the driver
>> as a whole, which means one or both of these bits of information can
>> already be conditional.  Let's audit how they were set:
>> ...
>> Simplify the code by moving the conditional into supported_zero_flags
>> for all drivers, then dropping the now-unused BDI field.
>>
>> For callers that relied on bdrv_can_write_zeroes_with_unmap(), we return
>> the same per-device settings for drivers that had conditions (no
>> observable change in behavior there); and can now return true (instead
>> of false) for drivers that support passthrough (for example, the commit
>> driver) which gives those drivers the same fix as nbd just got in
>> bca80059e.  For callers that relied on supported_zero_flags, we now have
>> a few more places that can avoid a wasted call to pwrite_zeroes() that
>> will just fail with ENOTSUP.
>>
>> Suggested-by: Paolo Bonzini
>> Signed-off-by: Eric Blake
>>
>> ---
>> The commit id mentioned above is dependent on me not having to respin
>> my latest NBD pull request:
>> Based-on: <20180126160411.4033-1-ebl...@redhat.com>
>> ([PULL 0/8] NBD patches through 26 Jan)
>
> Reviewed-by: Stefan Hajnoczi

Thanks.  Since this patch was discovered in relation to NBD code, I'm
fine taking it through my NBD queue; but it's also more generic to the
block layer, so I'm also fine if one of the other block maintainers
grabs it first through their tree.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
Re: [Qemu-block] [PATCH 2/2] qapi: add block latency histogram interface
On 02/06/2018 12:06 PM, Vladimir Sementsov-Ogievskiy wrote:
> 06.02.2018 18:50, Eric Blake wrote:
>> On 02/06/2018 08:07 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Set (and clear) histogram through new command
>>> block-latency-histogram-set and show new statistics in
>>> query-blockstats results.
>>
>> The commit message mentioned that you can set and clear histogram
>> tracking; how does this interface let you clear things?  By passing a
>> 0-length latency array?  If you violate the constraint (pass
>> non-ascending points, for example), does the previous latency map get
>> wiped out or is it still preserved unchanged?
>
> On error nothing is changed.
>
> By "clear" I mean zeroing statistics, not removing the whole histogram.
> And to zero statistics you can call set with the same latency array.
> There is no way to remove histogram at all.. We can add
> block-latency-histogram-unset later if needed.

Maybe "set (or restart) histogram collection points" might read better?

I also don't think we need a new command; if you make 'latency'
optional, then omitting it can serve to stop collecting statistics
altogether, without needing a new command (then again, if you do that,
now the command is used to "set, restart, and clear histogram
collection").

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
Re: [Qemu-block] [PATCH v2] iotests: 205: support luks format
On 02/06/2018 12:26 PM, Daniel P. Berrangé wrote:
> On Tue, Feb 06, 2018 at 09:25:07PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> Support default luks options in VM.add_drive and in new library
>> function qemu_img_create. Use it in 205 iotests.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy
>> ---
>
> Reviewed-by: Daniel P. Berrange

Thanks.  I'll take this through my NBD queue:
git git://repo.or.cz/qemu/ericb.git nbd

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
Re: [Qemu-block] [PATCH v2 3/6] qapi: add nbd-server-remove
* Vladimir Sementsov-Ogievskiy (vsement...@virtuozzo.com) wrote:
> 26.01.2018 18:05, Dr. David Alan Gilbert wrote:
> > * Vladimir Sementsov-Ogievskiy (vsement...@virtuozzo.com) wrote:
> > > 17.01.2018 19:03, Eric Blake wrote:
> > > > On 01/17/2018 09:51 AM, Vladimir Sementsov-Ogievskiy wrote:
> > > > >
> > > > > > > I have a script (for managing libvirt guest, but it can be
> > > > > > > adopted for qemu or even used for qemu monitor), which allows
> > > > > > > me run qmp commands on vms as easy as:
> > > > > > >
> > > > > > > qmp VMNAME query-block-jobs
> > > > > > > or
> > > > > > > qmp VMNAME nbd-server-remove name exp1 mode hard
> > > > > > > or even
> > > > > > > qmp VMNAME blockdev-add id disk driver qcow2 cache {writeback
> > > > > > > true direct true} aio native discard unmap file {driver file
> > > > > > > filename /tmp/somedisk}
> > > > > > Yeah, there are various scripting solutions around QMP that can
> > > > > > make it easier; but HMP is often still an easy front-line
> > > > > > interface for experiments.
> > > > > >
> > > > > isn't it because these solutions are not available directly in
> > > > > monitor, when HMP is?
> > > > QMP can be directly accessed in a monitor; it just requires more
> > > > typing.  If you are developing QMP commands, it may be easier to use
> > > > ./scripts/qmp/qmp-shell (couple it with a readline wrapper, and you
> > > > can even get tab-completion and history across sessions).  There's
> > > > also things like libvirt's 'virsh qmp-monitor-command' for
> > > > shell-scripting access to arbitrary QMP commands, provided your
> > > > guest is run by libvirt.
> > > >
> > > > > may be, we need third type of monitor HQMP which is QMP with
> > > > > simplified syntax? Or allow qmp commands in simplified syntax
> > > > > directly in HMP?
> > > > No, I don't think we need either thing.  Wrappers around existing
> > > > monitors is better than bloating qemu proper with a third flavor of
> > > > monitor.  And HMP is for humans, with no restrictions on back-compat
> > > > changes, so if it doesn't do something you want for quick-and-dirty
> > > > testing, you can either add a new HMP command, or just use QMP (or
> > > > one of its wrappers, like qmp-shell) in the first place.
> > > > Ultimately, our long-term concern is only about the QMP interface;
> > > > HMP is supposed to be convenient.  So if it starts costing too much
> > > > time to port a QMP interface to HMP, then don't worry about it.
> > > >
> > > most of commands, ported to hmp are done in same style: they just call
> > > corresponding qmp command.
> > > Isn't it better to provide common interface for calling qmp commands
> > > through HMP monitor, to never create hmp versions of new commands?
> > > they will be available automatically.
> > It would be nice to do that, but they're not that consistent in how they
> > convert parameters and options, but I occasionally wonder if we could
> > automate more of it.
>
> What about allowing some new syntax in hmp, directly mapped to qmp?
>
> something like
>
> >>> blockdev-add id disk driver qcow2 cache {writeback true direct true}
> aio native discard unmap file {driver file filename /tmp/somedisk}
>
> ?

Hmm, I don't particularly find that easy to read either; however the
actual block device specification for HMP should be the same as what we
pass on the command line, so we only have to worry about any extra
things that are part of blockdev_add.
(I'm sure we can find a way of making the one we pass on the commandline
more readable as well, there's so much duplication).

Dave

> Or it may be realized as a separate hmp command "qmp" (looks more safe
> as a first step, however, I think previous variant (direct call) is
> better):
>
> >>> qmp blockdev-add id disk driver qcow2 cache {writeback true direct
> true} aio native discard unmap file {driver file filename /tmp/somedisk}
>
> what do think? This looks simple to implement and should be useful.
>
> > Dave
> >
> > > --
> > > Best regards,
> > > Vladimir
> >
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>
> --
> Best regards,
> Vladimir
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
[Qemu-block] [PATCH v2] iotests: 205: support luks format
Support default luks options in VM.add_drive and in new library
function qemu_img_create. Use it in 205 iotests.

Signed-off-by: Vladimir Sementsov-Ogievskiy
---

v2: use keysec0 and IMGKEYSECRET

 tests/qemu-iotests/205        |  4 ++--
 tests/qemu-iotests/iotests.py | 31 +++
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/205 b/tests/qemu-iotests/205
index 10388920dc..e7b2eae51d 100644
--- a/tests/qemu-iotests/205
+++ b/tests/qemu-iotests/205
@@ -22,7 +22,7 @@ import os
 import sys
 import iotests
 import time
-from iotests import qemu_img, qemu_io, filter_qemu_io, QemuIoInteractive
+from iotests import qemu_img_create, qemu_io, filter_qemu_io, QemuIoInteractive
 
 nbd_sock = 'nbd_sock'
 nbd_uri = 'nbd+unix:///exp?socket=' + nbd_sock
@@ -31,7 +31,7 @@ disk = os.path.join(iotests.test_dir, 'disk')
 
 class TestNbdServerRemove(iotests.QMPTestCase):
     def setUp(self):
-        qemu_img('create', '-f', iotests.imgfmt, disk, '1M')
+        qemu_img_create('-f', iotests.imgfmt, disk, '1M')
 
         self.vm = iotests.VM().add_drive(disk)
         self.vm.launch()

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 5a10b2d534..1bcc9ca57d 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -58,6 +58,11 @@ qemu_default_machine = os.environ.get('QEMU_DEFAULT_MACHINE')
 socket_scm_helper = os.environ.get('SOCKET_SCM_HELPER', 'socket_scm_helper')
 debug = False
 
+luks_default_secret_object = 'secret,id=keysec0,data=' + \
+                             os.environ['IMGKEYSECRET']
+luks_default_key_secret_opt = 'key-secret=keysec0'
+
+
 def qemu_img(*args):
     '''Run qemu-img and return the exit code'''
     devnull = open('/dev/null', 'r+')
@@ -66,6 +71,25 @@ def qemu_img(*args):
         sys.stderr.write('qemu-img received signal %i: %s\n'
                          % (-exitcode, ' '.join(qemu_img_args + list(args))))
     return exitcode
 
+def qemu_img_create(*args):
+    args = list(args)
+
+    # default luks support
+    if '-f' in args and args[args.index('-f') + 1] == 'luks':
+        if '-o' in args:
+            i = args.index('-o')
+            if 'key-secret' not in args[i + 1]:
+                args[i + 1].append(luks_default_key_secret_opt)
+                args.insert(i + 2, '--object')
+                args.insert(i + 3, luks_default_secret_object)
+        else:
+            args = ['-o', luks_default_key_secret_opt,
+                    '--object', luks_default_secret_object] + args
+
+    args.insert(0, 'create')
+
+    return qemu_img(*args)
+
 def qemu_img_verbose(*args):
     '''Run qemu-img without suppressing its output and return the exit code'''
     exitcode = subprocess.call(qemu_img_args + list(args))
@@ -263,6 +287,13 @@ class VM(qtest.QEMUQtestMachine):
         if opts:
             options.append(opts)
 
+        if format == 'luks' and 'key-secret' not in opts:
+            # default luks support
+            if luks_default_secret_object not in self._args:
+                self.add_object(luks_default_secret_object)
+
+            options.append(luks_default_key_secret_opt)
+
         self._args.append('-drive')
         self._args.append(','.join(options))
         self._num_drives += 1
-- 
2.11.1
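[Editorial illustration, not part of the original thread: a self-contained sketch of the option injection qemu_img_create performs for luks images. Names are simplified and the secret data is a placeholder (the real code reads IMGKEYSECRET from the environment); note the posted patch calls `.append` on the `-o` option string, which reads like it intends string concatenation, so the sketch concatenates explicitly.]

```python
# Standalone model of the luks default-option injection in qemu_img_create.
LUKS_SECRET_OBJECT = 'secret,id=keysec0,data=SECRET'  # placeholder data
LUKS_KEY_SECRET_OPT = 'key-secret=keysec0'

def inject_luks_defaults(args):
    args = list(args)
    if '-f' in args and args[args.index('-f') + 1] == 'luks':
        if '-o' in args:
            i = args.index('-o')
            if 'key-secret' not in args[i + 1]:
                # append the default key-secret to the existing -o string
                args[i + 1] += ',' + LUKS_KEY_SECRET_OPT
                args.insert(i + 2, '--object')
                args.insert(i + 3, LUKS_SECRET_OBJECT)
        else:
            args = ['-o', LUKS_KEY_SECRET_OPT,
                    '--object', LUKS_SECRET_OBJECT] + args
    return ['create'] + args

print(inject_luks_defaults(['-f', 'luks', 'disk.img', '1M']))
# -> ['create', '-o', 'key-secret=keysec0',
#     '--object', 'secret,id=keysec0,data=SECRET', '-f', 'luks', 'disk.img', '1M']
```

A test that never mentions encryption can thus create a luks image with the shared default secret, while a test that passes its own `-o key-secret=...` is left untouched.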
Re: [Qemu-block] [PATCH] iotests: 205: support luks format
On Tue, Feb 06, 2018 at 08:57:38PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 06.02.2018 20:29, Daniel P. Berrangé wrote:
> > On Tue, Feb 06, 2018 at 08:16:42PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > Support default luks options in VM.add_drive and in new library
> > > function qemu_img_create. Use it in 205 iotests.
> > >
> > > Signed-off-by: Vladimir Sementsov-Ogievskiy
> > > ---
> > >
> > > instead of
> > >   [PATCH] iotests: 205: support only raw format
> > >
> > > let's just support luks. This patch also makes it simple to support
> > > luks in any other python iotest.
> > >
> > >  tests/qemu-iotests/205        |  4 ++--
> > >  tests/qemu-iotests/iotests.py | 33 +
> > >  2 files changed, 35 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> > > index 5a10b2d534..4b9a4445cd 100644
> > > --- a/tests/qemu-iotests/iotests.py
> > > +++ b/tests/qemu-iotests/iotests.py
> > > @@ -58,6 +58,13 @@ qemu_default_machine = os.environ.get('QEMU_DEFAULT_MACHINE')
> > >  socket_scm_helper = os.environ.get('SOCKET_SCM_HELPER', 'socket_scm_helper')
> > >  debug = False
> > >
> > > +luks_default_secret_id = 'luks_secret_default_iotests_id'
> >
> > Can we just call this "keysec0", so we match the convention used by
> > the shell script based tests.
>
> Here I'm trying to avoid intersection with some user-defined id.

The "user" here is the person writing individual I/O tests. They already
have to know to avoid keysec0 because that's the standard we've defined
for the shell based scripts. So I don't see any benefit to diverging in
the Python - just doubles the stuff they need to remember.

Regards,
Daniel

-- 
|: https://berrange.com       -o-  https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org        -o-           https://fstop138.berrange.com :|
|: https://entangle-photo.org -o-  https://www.instagram.com/dberrange :|
Re: [Qemu-block] [PATCH] iotests: 205: support luks format
06.02.2018 21:04, Eric Blake wrote:
> On 02/06/2018 11:57 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>> +luks_default_secret_id = 'luks_secret_default_iotests_id'
>>>
>>> Can we just call this "keysec0", so we match the convention used by
>>> the shell script based tests.
>>
>> Here I'm trying to avoid intersection with some user-defined id.
>
> You're overthinking it.  We are only using this in the testsuite, and
> nothing else in the testsuite is using 'keysec0' for anything except the
> id of the secret to pass to encrypted disks.  The longer name doesn't
> add any protection.
>
> It might be different if we were trying to provide a reusable library
> for contexts outside the testsuite, but since we are not doing that, we
> can rely on 'make check' failing as evidence if we have any collisions
> in naming choices that need long name munging as a workaround.

Ok

-- 
Best regards,
Vladimir
Re: [Qemu-block] [PATCH 2/2] qapi: add block latency histogram interface
06.02.2018 18:50, Eric Blake wrote:
> On 02/06/2018 08:07 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Set (and clear) histogram through new command
>> block-latency-histogram-set and show new statistics in
>> query-blockstats results.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy
>> ---
>>  qapi/block-core.json | 62 +++-
>>  block/qapi.c         | 31 ++
>>  blockdev.c           | 15 +
>>  3 files changed, 107 insertions(+), 1 deletion(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 8225308904..4706a934d9 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -451,6 +451,63 @@
>>         'status': 'DirtyBitmapStatus'} }
>>
>>  ##
>> +# @BlockLatencyHistogramInfo:
>> +#
>> +# Block latency histogram.
>> +#
>> +# @latency: list of latency points in microseconds. Equals to @latency
>> +#           parameter of last called block-latency-histogram-set.
>
> Second sentence might read better as: Matches the @latency parameter
> from the last call to block-latency-histogram-set for the given device.
>
>> +#
>> +# @read: list of read-request counts corresponding to latency region.
>> +#        len(@read) = len(@latency) + 1
>> +#        @read[0] corresponds to latency region [0, @latency[0])
>> +#        for 0 < i < len(@latency): @read[i] corresponds to latency
>> +#        region [@latency[i-1], @latency[i])
>> +#        @read.last_element corresponds to latency region
>> +#        [@latency.last_element, +inf)
>> +#
>> +# @write: list of write-request counts (see @read semantics)
>> +#
>> +# @flush: list of flush-request counts (see @read semantics)
>> +#
>> +# Since: 2.12
>> +##
>> +{ 'struct': 'BlockLatencyHistogramInfo',
>> +  'data': {'latency': ['uint64'],
>> +           'read': ['uint64'],
>> +           'write': ['uint64'],
>> +           'flush': ['uint64'] } }
>
> Okay, I can see how this interface works.
>
>> +
>> +##
>> +# @block-latency-histogram-set:
>> +#
>> +# Add latency histogram to block device. If latency histogram alredy
>> +# exists for the device it will be removed and a new one created.
>> +# Latency histogram may be quered through query-blockstats.
>
> s/latency/a latency/ (twice)
> s/alredy/already/
> s/Latency/The latency/
> s/quered/queried/
>
>> +#
>> +# @device: device name to set latency histogram for.
>> +#
>> +# @latency: list of latency points in microseconds. The sequcence must
>> +#           be ascending, elements must be greater than zero. Histogram
>> +#           latency regions would be
>> +#           [0, @latency[0]), ..., [@latency[i], @latency[i+1]), ...,
>> +#           [@latency.last_element, +inf)
>
> s/sequcence/sequence/
>
>> +#
>> +# Returns: error if device is not found.
>> +#
>> +# Since: 2.12
>> +#
>> +# Example:
>> +#
>> +# -> { "execute": "block-latency-histogram-set",
>> +#      "arguments": { "device": "drive0",
>> +#                     "latency": [50, 100, 200] } }
>> +# <- { "return": {} }
>> +##
>> +{ 'command': 'block-latency-histogram-set',
>> +  'data': {'device': 'str', 'latency': ['uint64'] } }
>
> The commit message mentioned that you can set and clear histogram
> tracking; how does this interface let you clear things?  By passing a
> 0-length latency array?  If you violate the constraint (pass
> non-ascending points, for example), does the previous latency map get
> wiped out or is it still preserved unchanged?

On error nothing is changed.

By "clear" I mean zeroing statistics, not removing the whole histogram.
And to zero statistics you can call set with the same latency array.
There is no way to remove histogram at all.. We can add
block-latency-histogram-unset later if needed.

>> +
>> +##
>>  # @BlockInfo:
>>  #
>>  # Block device information. This structure describes a virtual device
>>  # and
>> @@ -730,6 +787,8 @@
>>  #  @timed_stats: Statistics specific to the set of previously defined
>>  #                intervals of time (Since 2.5)
>>  #
>> +# @latency-histogram: @BlockLatencyHistogramInfo. (Since 2.12)
>
> I'd mention that this only appears if block-latency-histogram-set has
> been called.

yes

>> +#
>>  # Since: 0.14.0
>>  ##
>>  { 'struct': 'BlockDeviceStats',
>> @@ -742,7 +801,8 @@
>>             'failed_flush_operations': 'int', 'invalid_rd_operations': 'int',
>>             'invalid_wr_operations': 'int', 'invalid_flush_operations': 'int',
>>             'account_invalid': 'bool', 'account_failed': 'bool',
>> -           'timed_stats': ['BlockDeviceTimedStats'] } }
>> +           'timed_stats': ['BlockDeviceTimedStats'],
>> +           '*latency-histogram': 'BlockLatencyHistogramInfo' } }

-- 
Best regards,
Vladimir
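[Editorial illustration, not part of the original thread: the bin semantics documented for @read above can be exercised with a few lines of Python. For boundaries @latency = [L0, ..., Ln-1] there are n+1 counters, where bin 0 covers [0, L0), bin i covers [L(i-1), L(i)), and the last bin covers [L(n-1), +inf).]

```python
# Map a latency value to its histogram bin per the QAPI doc semantics.
import bisect

def histogram_bin(latency_boundaries, value_us):
    # bisect_right counts boundaries <= value, which is exactly the
    # bin index: value < L0 -> 0, L0 <= value < L1 -> 1, ..., >= Ln-1 -> n
    return bisect.bisect_right(latency_boundaries, value_us)

boundaries = [50, 100, 200]   # the example from block-latency-histogram-set
counts = [0] * (len(boundaries) + 1)
for lat in (10, 50, 99, 150, 200, 5000):
    counts[histogram_bin(boundaries, lat)] += 1
print(counts)  # [1, 2, 1, 2]
```

This also shows why the boundary sequence must be strictly ascending: a repeated boundary would create an empty [L, L) region that can never receive a count.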
Re: [Qemu-block] [PATCH] iotests: 205: support luks format
On 02/06/2018 11:57 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> +luks_default_secret_id = 'luks_secret_default_iotests_id'
>>
>> Can we just call this "keysec0", so we match the convention used by
>> the shell script based tests.
>
> Here I'm trying to avoid intersection with some user-defined id.

You're overthinking it.  We are only using this in the testsuite, and
nothing else in the testsuite is using 'keysec0' for anything except the
id of the secret to pass to encrypted disks.  The longer name doesn't
add any protection.

It might be different if we were trying to provide a reusable library
for contexts outside the testsuite, but since we are not doing that, we
can rely on 'make check' failing as evidence if we have any collisions
in naming choices that need long name munging as a workaround.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
Re: [Qemu-block] [PATCH] iotests: 205: support luks format
06.02.2018 20:29, Daniel P. Berrangé wrote:
> On Tue, Feb 06, 2018 at 08:16:42PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> Support default luks options in VM.add_drive and in new library
>> function qemu_img_create. Use it in 205 iotests.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy
>> ---
>>
>> instead of
>>   [PATCH] iotests: 205: support only raw format
>>
>> let's just support luks. This patch also makes it simple to support
>> luks in any other python iotest.
>>
>>  tests/qemu-iotests/205        |  4 ++--
>>  tests/qemu-iotests/iotests.py | 33 +
>>  2 files changed, 35 insertions(+), 2 deletions(-)
>>
>> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
>> index 5a10b2d534..4b9a4445cd 100644
>> --- a/tests/qemu-iotests/iotests.py
>> +++ b/tests/qemu-iotests/iotests.py
>> @@ -58,6 +58,13 @@ qemu_default_machine = os.environ.get('QEMU_DEFAULT_MACHINE')
>>  socket_scm_helper = os.environ.get('SOCKET_SCM_HELPER', 'socket_scm_helper')
>>  debug = False
>>
>> +luks_default_secret_id = 'luks_secret_default_iotests_id'
>
> Can we just call this "keysec0", so we match the convention used by
> the shell script based tests.

Here I'm trying to avoid intersection with some user-defined id.

>> +luks_default_secret_data = '12345'
>
> The 'check' script exports an environment variable IMGKEYSECRET that is
> intended to be used as the default password for LUKS.

agree, missed this.

>> +luks_default_secret_object = 'secret,id=' + luks_default_secret_id + \
>> +                             ',data=' + luks_default_secret_data
>> +luks_default_key_secret_opt = 'key-secret=' + luks_default_secret_id
>
> Regards,
> Daniel

-- 
Best regards,
Vladimir
Re: [Qemu-block] [PATCH v2 3/6] qapi: add nbd-server-remove
06.02.2018 19:06, Eric Blake wrote:
> On 02/06/2018 09:29 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>> most of commands, ported to hmp are done in same style: they just call
>>>> corresponding qmp command. Isn't it better to provide common interface
>>>> for calling qmp commands through HMP monitor, to never create hmp
>>>> versions of new commands? they will be available automatically.
>>> It would be nice to do that, but they're not that consistent in how they
>>> convert parameters and options, but I occasionally wonder if we could
>>> automate more of it.
>> What about allowing some new syntax in hmp, directly mapped to qmp?
>> something like
>>
>>     >>> blockdev-add id disk driver qcow2 cache {writeback true direct true}
>>         aio native discard unmap file {driver file filename /tmp/somedisk}
>
> Personally, if I'm testing blockdev-add, I'll use QMP directly (or even
> scripts/qmp/qmp-shell or virsh qemu-monitor-command), not an HMP wrapper
> where I have to learn a new syntax of how to write something that will
> convert to QMP. We already have enough different ways to write things
> that I don't need to learn yet another syntax wrapper. Or maybe what I'm
> saying is that instead of inventing a new syntax, that if you DO add an
> HMP command that forwards to QMP, please reuse an existing syntax
> (whether direct JSON as used by QMP, or the syntax used by qmp-shell).

I'm afraid that JSON is too hard to use in the human monitor, and that
would make the whole feature useless.

> If you think writing a new HMP command is worth it, I won't stop you
> from writing it. But at this point, our current approach of writing a
> manual wrapper per command as we have interest, rather than a generic
> wrap-anything, has worked for the cases that HMP users have cared about.
> Remember, QMP is the interface that MUST work, while HMP is only for
> convenience, and if it is not trivial to make HMP do everything that QMP
> can do, it is no real loss.

But we create hmp wrappers on demand, and for each case we actually
invent new syntax. I just search for a way to avoid creating new and new
hmp wrappers, by introducing new syntax only once. And there is almost
nothing to learn:

  command := command-name parameters
  parameters = [key value ]...
  value = simple-value | array | map
  map = '{' parameters '}'
  array = '[' [value ]... ']'

Another variant is to use yaml - like json, but we do not need to put
all keys into quotes.

On the other hand, implementing a new parser in qemu is not a trivial
task (hmm, I don't want to do it =), it should be simpler to create a
direct JSON wrapper in the HMP monitor and use some python wrapper
around the monitor. And this looks useless, as with the same result I
can use a wrapper around the QMP monitor. So maybe the most interesting
solution would be to make some easy-to-use python-based wrapper, which
will give a simple way to use both qmp and hmp commands.. I'll think
about it. However, it doesn't solve the initial problem of creating new
and new hmp wrappers by hand.

>> ?
>> Or it may be realized as a separate hmp command "qmp" (looks more safe
>> as a first step, however, I think previous variant (direct call) is
>> better):
>>
>>     >>> qmp blockdev-add id disk driver qcow2 cache {writeback true
>>         direct true} aio native discard unmap file {driver file
>>         filename /tmp/somedisk}
>>
>> what do think?
>> This looks simple to implement and should be useful.
>
> Up to you if you want to tackle anything like that, but it would be a
> new thread (a generic way to invoke QMP from HMP is independent of
> nbd-server-remove).

-- 
Best regards,
Vladimir
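The sketch grammar Vladimir proposes above is small enough to prototype outside of QEMU. Below is a hypothetical Python sketch (not from any posted patch; the tokenizer and the true/false coercion rule are assumptions) that maps the proposed flat syntax onto a QMP-style command dict:

```python
import re

def tokenize(text):
    # braces/brackets become standalone tokens; everything else splits on whitespace
    return re.findall(r'[{}\[\]]|[^\s{}\[\]]+', text)

def parse_value(tokens):
    tok = tokens.pop(0)
    if tok == '{':                       # map = '{' parameters '}'
        out = {}
        while tokens[0] != '}':
            key = tokens.pop(0)
            out[key] = parse_value(tokens)
        tokens.pop(0)                    # consume '}'
        return out
    if tok == '[':                       # array = '[' [value ]... ']'
        out = []
        while tokens[0] != ']':
            out.append(parse_value(tokens))
        tokens.pop(0)                    # consume ']'
        return out
    if tok in ('true', 'false'):         # assumed coercion rule
        return tok == 'true'
    return tok                           # simple-value stays a string

def parse_command(line):
    tokens = tokenize(line)
    name = tokens.pop(0)                 # command := command-name parameters
    args = {}
    while tokens:
        key = tokens.pop(0)              # parameters = [key value ]...
        args[key] = parse_value(tokens)
    return {'execute': name, 'arguments': args}
```

Real QMP arguments are typed by the QAPI schema, so a production version would coerce integers and validate keys against the schema rather than guessing types from token text.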
Re: [Qemu-block] [PATCH] iotests: 205: support luks format
On Tue, Feb 06, 2018 at 08:16:42PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Support default luks options in VM.add_drive and in new library
> function qemu_img_create. Use it in 205 iotests.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy
> ---
>
> instead of
> [PATCH] iotests: 205: support only raw format
>
> let's just support luks. This patch also makes it simple to support
> luks in any other python iotest.
>
>  tests/qemu-iotests/205        |  4 ++--
>  tests/qemu-iotests/iotests.py | 33 +
>  2 files changed, 35 insertions(+), 2 deletions(-)
>
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index 5a10b2d534..4b9a4445cd 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -58,6 +58,13 @@ qemu_default_machine = os.environ.get('QEMU_DEFAULT_MACHINE')
>  socket_scm_helper = os.environ.get('SOCKET_SCM_HELPER', 'socket_scm_helper')
>  debug = False
>
> +luks_default_secret_id = 'luks_secret_default_iotests_id'

Can we just call this "keysec0", so we match the convention used by the
shell script based tests.

> +luks_default_secret_data = '12345'

The 'check' script exports an environment variable IMGKEYSECRET that is
intended to be used as the default password for LUKS.

> +luks_default_secret_object = 'secret,id=' + luks_default_secret_id + \
> +                             ',data=' + luks_default_secret_data
> +luks_default_key_secret_opt = 'key-secret=' + luks_default_secret_id

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
[Qemu-block] [PATCH] iotests: 205: support luks format
Support default luks options in VM.add_drive and in new library function
qemu_img_create. Use it in 205 iotests.

Signed-off-by: Vladimir Sementsov-Ogievskiy
---

instead of
[PATCH] iotests: 205: support only raw format

let's just support luks. This patch also makes it simple to support luks
in any other python iotest.

 tests/qemu-iotests/205        |  4 ++--
 tests/qemu-iotests/iotests.py | 33 +
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/205 b/tests/qemu-iotests/205
index 10388920dc..e7b2eae51d 100644
--- a/tests/qemu-iotests/205
+++ b/tests/qemu-iotests/205
@@ -22,7 +22,7 @@ import os
 import sys
 import iotests
 import time
-from iotests import qemu_img, qemu_io, filter_qemu_io, QemuIoInteractive
+from iotests import qemu_img_create, qemu_io, filter_qemu_io, QemuIoInteractive

 nbd_sock = 'nbd_sock'
 nbd_uri = 'nbd+unix:///exp?socket=' + nbd_sock
@@ -31,7 +31,7 @@ disk = os.path.join(iotests.test_dir, 'disk')

 class TestNbdServerRemove(iotests.QMPTestCase):
     def setUp(self):
-        qemu_img('create', '-f', iotests.imgfmt, disk, '1M')
+        qemu_img_create('-f', iotests.imgfmt, disk, '1M')

         self.vm = iotests.VM().add_drive(disk)
         self.vm.launch()

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 5a10b2d534..4b9a4445cd 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -58,6 +58,13 @@ qemu_default_machine = os.environ.get('QEMU_DEFAULT_MACHINE')
 socket_scm_helper = os.environ.get('SOCKET_SCM_HELPER', 'socket_scm_helper')
 debug = False

+luks_default_secret_id = 'luks_secret_default_iotests_id'
+luks_default_secret_data = '12345'
+luks_default_secret_object = 'secret,id=' + luks_default_secret_id + \
+                             ',data=' + luks_default_secret_data
+luks_default_key_secret_opt = 'key-secret=' + luks_default_secret_id
+
+
 def qemu_img(*args):
     '''Run qemu-img and return the exit code'''
     devnull = open('/dev/null', 'r+')
@@ -66,6 +73,25 @@ def qemu_img(*args):
         sys.stderr.write('qemu-img received signal %i: %s\n' %
                          (-exitcode, ' '.join(qemu_img_args + list(args))))
     return exitcode

+def qemu_img_create(*args):
+    args = list(args)
+
+    # default luks support
+    if '-f' in args and args[args.index('-f') + 1] == 'luks':
+        if '-o' in args:
+            i = args.index('-o')
+            if 'key-secret' not in args[i + 1]:
+                args[i + 1].append(luks_default_key_secret_opt)
+                args.insert(i + 2, '--object')
+                args.insert(i + 3, luks_default_secret_object)
+        else:
+            args = ['-o', luks_default_key_secret_opt,
+                    '--object', luks_default_secret_object] + args
+
+    args.insert(0, 'create')
+
+    return qemu_img(*args)
+
 def qemu_img_verbose(*args):
     '''Run qemu-img without suppressing its output and return the exit code'''
     exitcode = subprocess.call(qemu_img_args + list(args))
@@ -263,6 +289,13 @@ class VM(qtest.QEMUQtestMachine):
         if opts:
             options.append(opts)

+        if format == 'luks' and 'key-secret' not in opts:
+            # default luks support
+            if luks_default_secret_object not in self._args:
+                self.add_object(luks_default_secret_object)
+
+            options.append(luks_default_key_secret_opt)
+
         self._args.append('-drive')
         self._args.append(','.join(options))
         self._num_drives += 1
-- 
2.11.1
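The option-injection logic in qemu_img_create can be exercised standalone. Here is a hedged Python sketch of the same idea (the constants mirror the patch; note that `args[i + 1].append(...)` in the posted hunk would raise on a string operand, so this sketch uses string concatenation, which appears to be the intent):

```python
LUKS_SECRET_ID = 'luks_secret_default_iotests_id'   # hypothetical defaults,
LUKS_SECRET_DATA = '12345'                          # mirroring the patch
LUKS_SECRET_OBJECT = 'secret,id=%s,data=%s' % (LUKS_SECRET_ID, LUKS_SECRET_DATA)
LUKS_KEY_SECRET_OPT = 'key-secret=' + LUKS_SECRET_ID

def inject_luks_defaults(args):
    """Return qemu-img create args with default LUKS secret options added."""
    args = list(args)
    if '-f' in args and args[args.index('-f') + 1] == 'luks':
        if '-o' in args:
            i = args.index('-o')
            if 'key-secret' not in args[i + 1]:
                # string concatenation, not list .append()
                args[i + 1] += ',' + LUKS_KEY_SECRET_OPT
                args[i + 2:i + 2] = ['--object', LUKS_SECRET_OBJECT]
        else:
            args = ['-o', LUKS_KEY_SECRET_OPT,
                    '--object', LUKS_SECRET_OBJECT] + args
    return ['create'] + args
```

For example, `inject_luks_defaults(['-f', 'luks', 'disk.img', '1M'])` returns a full `qemu-img create` argument list with the secret object attached, while non-luks formats pass through unchanged.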
Re: [Qemu-block] [PATCH] block: fix write with zero flag set and iovector provided
On Thu, Feb 01, 2018 at 05:16:31PM +0300, Anton Nefedov wrote:
> The normal bdrv_co_pwritev() use is either
>   - BDRV_REQ_ZERO_WRITE reset and iovector provided
>   - BDRV_REQ_ZERO_WRITE set and iovector == NULL
>
> while
>   - the flag reset and iovector == NULL is an assertion failure
>     in bdrv_co_do_zero_pwritev()
>   - the flag set and iovector provided is in fact allowed
>     (the flag prevails and zeroes are written)
>
> However the alignment logic does not support the latter case so the padding
> areas get overwritten with zeroes.

Please include a test case.  Berto mentioned that bdrv_pwrite_zeroes()
hits this issue, that might be one way to test it.

> Solution could be to forbid such case or just use bdrv_co_do_zero_pwritev()
> alignment for it which also makes the code a bit more obvious anyway.
>
> Signed-off-by: Anton Nefedov
> ---
>  block/io.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/io.c b/block/io.c
> index 7ea4023..cf63fd0 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1701,7 +1701,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
>       */
>      tracked_request_begin(&req, bs, offset, bytes, BDRV_TRACKED_WRITE);
>
> -    if (!qiov) {
> +    if (flags & BDRV_REQ_ZERO_WRITE) {
>          ret = bdrv_co_do_zero_pwritev(child, offset, bytes, flags, &req);
>          goto out;
>      }

Looks good.
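The one-line fix changes which path handles a request when both the zero flag and an iovector are present. A tiny Python model of the dispatch decision (the flag value and path names are illustrative only, not QEMU's):

```python
BDRV_REQ_ZERO_WRITE = 0x4  # illustrative flag value

def dispatch_pwritev(qiov, flags):
    """Model of the dispatch fix: route to the alignment-aware zero-write
    path whenever the flag is set, not merely when no iovector was supplied.
    Previously 'qiov is None' was the test, so flag+iovector requests took
    the regular path and zeroed the alignment padding."""
    if flags & BDRV_REQ_ZERO_WRITE:      # patched condition
        return 'zero_pwritev'            # alignment-aware zero path
    assert qiov is not None              # flag clear + no qiov is invalid
    return 'aligned_pwritev'
```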
Re: [Qemu-block] [PATCH v2 3/6] qapi: add nbd-server-remove
On 02/06/2018 09:29 AM, Vladimir Sementsov-Ogievskiy wrote: most of commands, ported to hmp are done in same style: they just call corresponding qmp command. Isn't it better to provide common interface for calling qmp commands through HMP monitor, to never create hmp versions of new commands? they will be available automatically. It would be nice to do that, but they're not that consistent in how they convert parameters and options, but I occasionally wonder if we could automate more of it. What about allowing some new syntax in hmp, directly mapped to qmp? something like >>> blockdev-add id disk driver qcow2 cache {writeback true direct true} aio native discard unmap file {driver file filename /tmp/somedisk} Personally, if I'm testing blockdev-add, I'll use QMP directly (or even scripts/qmp/qmp-shell or virsh qemu-monitor-command), not an HMP wrapper where I have to learn a new syntax of how to write something that will convert to QMP. We already have enough different ways to write things that I don't need to learn yet another syntax wrapper. Or maybe what I'm saying is that instead of inventing a new syntax, that if you DO add an HMP command that forwards to QMP, please reuse an existing syntax (whether direct JSON as used by QMP, or the syntax used by qmp-shell). If you think writing a new HMP command is worth it, I won't stop you from writing it. But at this point, our current approach of writing a manual wrapper per command as we have interest, rather than a generic wrap-anything, has worked for the cases that HMP users have cared about. Remember, QMP is the interface that MUST work, while HMP is only for convenience, and if it is not trivial to make HMP do everything that QMP can do, it is no real loss. ? 
Or it may be realized as a separate hmp command "qmp" (looks more safe as a first step, however, I think previous variant (direct call) is better): >>> qmp blockdev-add id disk driver qcow2 cache {writeback true direct true} aio native discard unmap file {driver file filename /tmp/somedisk} what do think? This looks simple to implement and should be useful. Up to you if you want to tackle anything like that, but it would be a new thread (a generic way to invoke QMP from HMP is independent of nbd-server-remove). -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [Qemu-block] [PATCH 2/2] qapi: add block latency histogram interface
On 02/06/2018 08:07 AM, Vladimir Sementsov-Ogievskiy wrote:
> Set (and clear) histogram through new command block-latency-histogram-set
> and show new statistics in query-blockstats results.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy
> ---
>  qapi/block-core.json | 62 +++-
>  block/qapi.c         | 31 ++
>  blockdev.c           | 15 +
>  3 files changed, 107 insertions(+), 1 deletion(-)
>
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 8225308904..4706a934d9 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -451,6 +451,63 @@
>              'status': 'DirtyBitmapStatus'} }
>
>  ##
> +# @BlockLatencyHistogramInfo:
> +#
> +# Block latency histogram.
> +#
> +# @latency: list of latency points in microseconds. Equals to @latency parameter
> +#           of last called block-latency-histogram-set.

Second sentence might read better as: Matches the @latency parameter from
the last call to block-latency-histogram-set for the given device.

> +#
> +# @read: list of read-request counts corresponding to latency region.
> +#        len(@read) = len(@latency) + 1
> +#        @read[0] corresponds to latency region [0, @latency[0])
> +#        for 0 < i < len(@latency): @read[i] corresponds to latency region
> +#        [@latency[i-1], @latency[i])
> +#        @read.last_element corresponds to latency region
> +#        [@latency.last_element, +inf)
> +#
> +# @write: list of write-request counts (see @read semantics)
> +#
> +# @flush: list of flush-request counts (see @read semantics)
> +#
> +# Since: 2.12
> +##
> +{ 'struct': 'BlockLatencyHistogramInfo',
> +  'data': {'latency': ['uint64'],
> +           'read': ['uint64'],
> +           'write': ['uint64'],
> +           'flush': ['uint64'] } }

Okay, I can see how this interface works.

> +
> +##
> +# @block-latency-histogram-set:
> +#
> +# Add latency histogram to block device. If latency histogram alredy exists for

s/latency/a latency/ (twice)
s/alredy/already/

> +# the device it will be removed and a new one created. Latency histogram may be

s/Latency/The latency/

> +# quered through query-blockstats.

s/quered/queried/

> +#
> +# @device: device name to set latency histogram for.
> +#
> +# @latency: list of latency points in microseconds. The sequcence must be

s/sequcence/sequence/

> +#           ascending, elements must be greater than zero. Histogram latency
> +#           regions would be
> +#           [0, @latency[0]), ..., [@latency[i], @latency[i+1]), ...,
> +#           [@latency.last_element, +inf)
> +#
> +# Returns: error if device is not found.
> +#
> +# Since: 2.12
> +#
> +# Example:
> +#
> +# -> { "execute": "block-latency-histogram-set",
> +#      "arguments": { "device": "drive0",
> +#                     "latency": [50, 100, 200] } }
> +# <- { "return": {} }
> +##
> +{ 'command': 'block-latency-histogram-set',
> +  'data': {'device': 'str', 'latency': ['uint64'] } }

The commit message mentioned that you can set and clear histogram
tracking; how does this interface let you clear things?  By passing a
0-length latency array?

If you violate the constraint (pass non-ascending points, for example),
does the previous latency map get wiped out or is it still preserved
unchanged?

> +
> +##
>  # @BlockInfo:
>  #
>  # Block device information. This structure describes a virtual device and
> @@ -730,6 +787,8 @@
>  # @timed_stats: Statistics specific to the set of previously defined
>  #               intervals of time (Since 2.5)
>  #
> +# @latency-histogram: @BlockLatencyHistogramInfo. (Since 2.12)

I'd mention that this only appears if block-latency-histogram-set has
been called.

> +#
>  # Since: 0.14.0
>  ##
>  { 'struct': 'BlockDeviceStats',
> @@ -742,7 +801,8 @@
>             'failed_flush_operations': 'int', 'invalid_rd_operations': 'int',
>             'invalid_wr_operations': 'int', 'invalid_flush_operations': 'int',
>             'account_invalid': 'bool', 'account_failed': 'bool',
> -           'timed_stats': ['BlockDeviceTimedStats'] } }
> +           'timed_stats': ['BlockDeviceTimedStats'],
> +           '*latency-histogram': 'BlockLatencyHistogramInfo' } }

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
Re: [Qemu-block] [PATCH for-2.9-rc5 v4 2/2] block: Drain BH in bdrv_drained_begin
Am 18.04.2017 um 16:30 hat Fam Zheng geschrieben: > During block job completion, nothing is preventing > block_job_defer_to_main_loop_bh from being called in a nested > aio_poll(), which is a trouble, such as in this code path: > > qmp_block_commit > commit_active_start > bdrv_reopen > bdrv_reopen_multiple > bdrv_reopen_prepare > bdrv_flush > aio_poll > aio_bh_poll > aio_bh_call > block_job_defer_to_main_loop_bh > stream_complete > bdrv_reopen > > block_job_defer_to_main_loop_bh is the last step of the stream job, > which should have been "paused" by the bdrv_drained_begin/end in > bdrv_reopen_multiple, but it is not done because it's in the form of a > main loop BH. > > Similar to why block jobs should be paused between drained_begin and > drained_end, BHs they schedule must be excluded as well. To achieve > this, this patch forces draining the BH in BDRV_POLL_WHILE. > > As a side effect this fixes a hang in block_job_detach_aio_context > during system_reset when a block job is ready: > > #0 0x55aa79f3 in bdrv_drain_recurse > #1 0x55aa825d in bdrv_drained_begin > #2 0x55aa8449 in bdrv_drain > #3 0x55a9c356 in blk_drain > #4 0x55aa3cfd in mirror_drain > #5 0x55a66e11 in block_job_detach_aio_context > #6 0x55a62f4d in bdrv_detach_aio_context > #7 0x55a63116 in bdrv_set_aio_context > #8 0x55a9d326 in blk_set_aio_context > #9 0x557e38da in virtio_blk_data_plane_stop > #10 0x559f9d5f in virtio_bus_stop_ioeventfd > #11 0x559fa49b in virtio_bus_stop_ioeventfd > #12 0x559f6a18 in virtio_pci_stop_ioeventfd > #13 0x559f6a18 in virtio_pci_reset > #14 0x559139a9 in qdev_reset_one > #15 0x55916738 in qbus_walk_children > #16 0x55913318 in qdev_walk_children > #17 0x55916738 in qbus_walk_children > #18 0x559168ca in qemu_devices_reset > #19 0x5581fcbb in pc_machine_reset > #20 0x558a4d96 in qemu_system_reset > #21 0x5577157a in main_loop_should_exit > #22 0x5577157a in main_loop > #23 0x5577157a in main > > The rationale is that the loop in block_job_detach_aio_context cannot > 
make any progress in pausing/completing the job, because bs->in_flight > is 0, so bdrv_drain doesn't process the block_job_defer_to_main_loop > BH. With this patch, it does. > > Reported-by: Jeff Cody> Signed-off-by: Fam Zheng Fam, do you remember whether this was really only about drain? Because in that case... > diff --git a/include/block/block.h b/include/block/block.h > index 97d4330..5ddc0cf 100644 > --- a/include/block/block.h > +++ b/include/block/block.h > @@ -381,12 +381,13 @@ void bdrv_drain_all(void); > > #define BDRV_POLL_WHILE(bs, cond) ({ \ > bool waited_ = false; \ > +bool busy_ = true; \ > BlockDriverState *bs_ = (bs); \ > AioContext *ctx_ = bdrv_get_aio_context(bs_); \ > if (aio_context_in_iothread(ctx_)) { \ > -while ((cond)) { \ > -aio_poll(ctx_, true); \ > -waited_ = true;\ > +while ((cond) || busy_) { \ > +busy_ = aio_poll(ctx_, (cond));\ > +waited_ |= !!(cond) | busy_; \ > } \ > } else { \ > assert(qemu_get_current_aio_context() == \ > @@ -398,11 +399,16 @@ void bdrv_drain_all(void); > */\ > assert(!bs_->wakeup); \ > bs_->wakeup = true;\ > -while ((cond)) { \ > -aio_context_release(ctx_); \ > -aio_poll(qemu_get_aio_context(), true);\ > -aio_context_acquire(ctx_); \ > -waited_ = true;\ > +while (busy_) {\ > +if ((cond)) { \ > +waited_ = busy_ = true;\ > +aio_context_release(ctx_); \ > +aio_poll(qemu_get_aio_context(), true);\ > +aio_context_acquire(ctx_); \ > +} else {
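The semantics of the patched BDRV_POLL_WHILE loop can be modeled outside of C. A hypothetical Python sketch of the in-iothread branch (a model for illustration, not QEMU code): keep polling while the condition holds or the last poll made progress, so that scheduled bottom halves are drained even after the condition itself becomes false.

```python
def poll_while(cond, poll):
    """Model of the patched loop: cond() is the drain condition,
    poll(blocking=...) processes one round of events and returns
    True if it made progress (ran a BH, completed I/O, ...)."""
    waited = False
    busy = True
    while cond() or busy:
        # block for events only while the condition still holds
        busy = poll(blocking=cond())
        waited = waited or cond() or busy
    return waited
```

The key difference from the pre-patch loop (`while cond(): poll(blocking=True)`) is the extra `or busy` term: once `cond()` goes false, non-blocking polls continue until nothing is left pending, which is what lets a deferred-to-main-loop BH run inside the drained section.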
Re: [Qemu-block] [PATCH v2 3/6] qapi: add nbd-server-remove
26.01.2018 18:05, Dr. David Alan Gilbert wrote: * Vladimir Sementsov-Ogievskiy (vsement...@virtuozzo.com) wrote: 17.01.2018 19:03, Eric Blake wrote: On 01/17/2018 09:51 AM, Vladimir Sementsov-Ogievskiy wrote: I have a script (for managing libvirt guest, but it can be adopted for qemu or even used for qemu monitor), which allows me run qmp commands on vms as easy as: |qmp VMNAME query-block-jobs or qmp VMNAME nbd-server-remove name exp1 mode hard or even | |qmp VMNAME blockdev-add id disk driver qcow2 cache {writeback true direct true} aio native discard unmap file {driver file filename /tmp/somedisk} ||| Yeah, there are various scripting solutions around QMP that can make it easier; but HMP is often still an easy front-line interface for experiments. isn't it because these solutions are not available directly in monitor, when HMP is? QMP can be directly accessed in a monitor; it just requires more typing. If you are developing QMP commands, it may be easier to use ./scripts/qmp/qmp-shell (couple it with a readline wrapper, and you can even get tab-completion and history across sessions). There's also things like libvirt's 'virsh qmp-monitor-command' for shell-scripting access to arbitrary QMP commands, provided your guest is run by libvirt. may be, we need third type of monitor HQMP which is QMP with simplified syntax? Or allow qmp commands in simplified syntax directly in HMP? No, I don't think we need either thing. Wrappers around existing monitors is better than bloating qemu proper with a third flavor of monitor. And HMP is for humans, with no restrictions on back-compat changes, so if it doesn't do something you want for quick-and-dirty testing, you can either add a new HMP command, or just use QMP (or one of its wrappers, like qmp-shell) in the first place. Ultimately, our long-term concern is only about the QMP interface; HMP is supposed to be convenient. So if it starts costing too much time to port a QMP interface to HMP, then don't worry about it. 
most of commands, ported to hmp are done in same style: they just call corresponding qmp command. Isn't it better to provide common interface for calling qmp commands through HMP monitor, to never create hmp versions of new commands? they will be available automatically. It would be nice to do that, but they're not that consistent in how they convert parameters and options, but I occasionally wonder if we could automate more of it. What about allowing some new syntax in hmp, directly mapped to qmp? something like >>> blockdev-add id disk driver qcow2 cache {writeback true direct true} aio native discard unmap file {driver file filename /tmp/somedisk} ? Or it may be realized as a separate hmp command "qmp" (looks more safe as a first step, however, I think previous variant (direct call) is better): >>> qmp blockdev-add id disk driver qcow2 cache {writeback true direct true} aio native discard unmap file {driver file filename /tmp/somedisk} what do think? This looks simple to implement and should be useful. Dave -- Best regards, Vladimir -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK -- Best regards, Vladimir
Re: [Qemu-block] [Qemu-devel] [PATCH] block: include original filename when reporting invalid URIs
On 02/06/2018 04:52 AM, Daniel P. Berrangé wrote: Consider passing a JSON based block driver to "qemu-img commit" $ qemu-img commit 'json:{"driver":"qcow2","file":{"driver":"gluster",\ "volume":"gv0","path":"sn1.qcow2", "server":[{"type":\ "tcp","host":"10.73.199.197","port":"24007"}]},}' Currently it will commit the content and then report an incredibly useless error message when trying to re-open the committed image: qemu-img: invalid URI Usage: file=gluster[+transport]://[host[:port]]volume/path[?socket=...][,file.debug=N][,file.logfile=/path/filename.log] With this fix we get: qemu-img: invalid URI json:{"server.0.host": "10.73.199.197", "driver": "gluster", "path": "luks.qcow2", "server.0.type": "tcp", "server.0.port": "24007", "volume": "gv0"} Of course the root cause problem still exists, but now we know what actually needs fixing. Signed-off-by: Daniel P. Berrangé--- block/gluster.c | 2 +- block/sheepdog.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [Qemu-block] [Qemu-devel] [PATCH v9 03/13] block/dirty-bitmap: add _locked version of bdrv_reclaim_dirty_bitmap
24.01.2018 13:16, Paolo Bonzini wrote: On 22/01/2018 13:14, Vladimir Sementsov-Ogievskiy wrote: so, accessing the bitmap needs mutex lock? Then what do you mean under accessing the bitmap? Any touch of BdrvDirtyBitmap fields? Then "reading the list" will require bitmap mutex too. Or accessing the bitmap is accessing any field except BdrvDirtyBitmap.list? Then in (2), what do you mean? For example query-block will go through the list, but it touches other fields too, so it should lock mutex. The bitmap mutex is internal to block/dirty-bitmap.c. yes and query-block actually calls bdrv_query_dirty_bitmaps, which locks mutex.. and one more question: What about qmp transactions? Should we lock mutex during the whole transaction? Transactions hold the BQL, but you don't need to lock the bitmap mutex. I don't understand. But "Accessing a bitmap only requires dirty_bitmap_mutex".. So, if we have several operations on one bitmap, each of them will lock/unlock mutex by itself? Then we'll have unlocked "holes" in our transaction. Or this doesn't matter? Paolo -- Best regards, Vladimir
Re: [Qemu-block] [PATCH v8 21/26] block: Purify .bdrv_refresh_filename()
On Mon 05 Feb 2018 04:18:30 PM CET, Max Reitz wrote: > Currently, BlockDriver.bdrv_refresh_filename() is supposed to both > refresh the filename (BDS.exact_filename) and set BDS.full_open_options. > Now that we have generic code in the central bdrv_refresh_filename() for > creating BDS.full_open_options, we can drop the latter part from all > BlockDriver.bdrv_refresh_filename() implementations. > > This also means that we can drop all of the existing default code for > this from the global bdrv_refresh_filename() itself. > > Furthermore, we now have to call BlockDriver.bdrv_refresh_filename() > after having set BDS.full_open_options, because the block driver's > implementation should now be allowed to depend on BDS.full_open_options > being set correctly. > > Finally, with this patch we can drop the @options parameter from > BlockDriver.bdrv_refresh_filename(); also, add a comment on this > function's purpose in block/block_int.h while touching its interface. > > Signed-off-by: Max ReitzReviewed-by: Alberto Garcia Berto
[Qemu-block] [PATCH 1/2] block/accounting: introduce latency histogram
Introduce latency histogram statistics for block devices. For each
accounted operation type, the latency region [0, +inf) is divided into
subregions by several points, and hits are counted for each subregion.

Signed-off-by: Vladimir Sementsov-Ogievskiy
---
 include/block/accounting.h |  8 +
 block/accounting.c         | 83 ++
 2 files changed, 91 insertions(+)

diff --git a/include/block/accounting.h b/include/block/accounting.h
index b833d26d6c..7fbfc86c43 100644
--- a/include/block/accounting.h
+++ b/include/block/accounting.h
@@ -45,6 +45,12 @@ struct BlockAcctTimedStats {
     QSLIST_ENTRY(BlockAcctTimedStats) entries;
 };

+typedef struct BlockLatencyHistogram {
+    int size;
+    uint64_t *points; /* @size-1 points here (all points, except 0 and +inf) */
+    uint64_t *histogram[BLOCK_MAX_IOTYPE]; /* @size elements for each type */
+} BlockLatencyHistogram;
+
 struct BlockAcctStats {
     QemuMutex lock;
     uint64_t nr_bytes[BLOCK_MAX_IOTYPE];
@@ -57,6 +63,7 @@ struct BlockAcctStats {
     QSLIST_HEAD(, BlockAcctTimedStats) intervals;
     bool account_invalid;
     bool account_failed;
+    BlockLatencyHistogram latency_histogram;
 };

 typedef struct BlockAcctCookie {
@@ -82,5 +89,6 @@ void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type,
 int64_t block_acct_idle_time_ns(BlockAcctStats *stats);
 double block_acct_queue_depth(BlockAcctTimedStats *stats,
                               enum BlockAcctType type);
+int block_latency_histogram_set(BlockAcctStats *stats, uint64List *latency);

 #endif

diff --git a/block/accounting.c b/block/accounting.c
index 87ef5bbfaa..a34ef09015 100644
--- a/block/accounting.c
+++ b/block/accounting.c
@@ -94,6 +94,86 @@ void block_acct_start(BlockAcctStats *stats, BlockAcctCookie *cookie,
     cookie->type = type;
 }

+/* block_latency_histogram_compare_func
+ * Compare @key with interval [@el, @el+1), where @el+1 is a next array element
+ * after @el.
+ * Return: -1 if @key < @el
+ *          0 if @key in [@el, @el+1)
+ *         +1 if @key >= @el+1
+ */
+static int block_latency_histogram_compare_func(const void *key, const void *el)
+{
+    uint64_t k = *(uint64_t *)key;
+    uint64_t a = *(uint64_t *)el;
+    uint64_t b = *((uint64_t *)el + 1);
+
+    return k < a ? -1 : (k < b ? 0 : 1);
+}
+
+static void block_latency_histogram_account(BlockLatencyHistogram *hist,
+                                            enum BlockAcctType type,
+                                            int64_t latency_ns)
+{
+    uint64_t *data, *pos;
+
+    if (hist->points == NULL) {
+        /* histogram disabled */
+        return;
+    }
+
+    data = hist->histogram[type];
+
+    if (latency_ns < hist->points[0]) {
+        data[0]++;
+        return;
+    }
+
+    if (latency_ns >= hist->points[hist->size - 2]) {
+        data[hist->size - 1]++;
+        return;
+    }
+
+    pos = bsearch(&latency_ns, hist->points, hist->size - 2,
+                  sizeof(hist->points[0]),
+                  block_latency_histogram_compare_func);
+    assert(pos != NULL);
+
+    data[pos - hist->points + 1]++;
+}
+
+int block_latency_histogram_set(BlockAcctStats *stats, uint64List *latency)
+{
+    BlockLatencyHistogram *hist = &stats->latency_histogram;
+    uint64List *entry;
+    uint64_t *ptr;
+    int i;
+    uint64_t prev = 0;
+
+    hist->size = 1;
+
+    for (entry = latency; entry; entry = entry->next) {
+        if (entry->value <= prev) {
+            return -EINVAL;
+        }
+        hist->size++;
+        prev = entry->value;
+    }
+
+    hist->points = g_renew(uint64_t, hist->points, hist->size - 1);
+    for (entry = latency, ptr = hist->points; entry;
+         entry = entry->next, ptr++)
+    {
+        *ptr = entry->value;
+    }
+
+    for (i = 0; i < BLOCK_MAX_IOTYPE; i++) {
+        hist->histogram[i] = g_renew(uint64_t, hist->histogram[i], hist->size);
+        memset(hist->histogram[i], 0, hist->size * sizeof(uint64_t));
+    }
+
+    return 0;
+}
+
 static void block_account_one_io(BlockAcctStats *stats, BlockAcctCookie *cookie,
                                  bool failed)
 {
@@ -116,6 +196,9 @@ static void block_account_one_io(BlockAcctStats *stats, BlockAcctCookie *cookie,
         stats->nr_ops[cookie->type]++;
     }

+    block_latency_histogram_account(&stats->latency_histogram, cookie->type,
+                                    latency_ns);
+
     if (!failed || stats->account_failed) {
         stats->total_time_ns[cookie->type] += latency_ns;
         stats->last_access_time_ns = time_ns;
-- 
2.11.1
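The bucketing semantics of the patch are: N ascending points define N+1 regions [0, p0), [p0, p1), ..., [p_last, +inf), and the C code bsearches the interval array to find the hit region. The same accounting can be sketched in a few lines of Python with `bisect` (a model for illustration, not the QEMU implementation):

```python
import bisect

def make_histogram(points):
    """points: ascending latency boundaries; buckets are
    [0, p0), [p0, p1), ..., [p_last, +inf): len(points) + 1 counters."""
    assert all(a < b for a, b in zip(points, points[1:]))
    return {'points': list(points), 'counts': [0] * (len(points) + 1)}

def account(hist, latency):
    # bisect_right counts how many boundaries are <= latency, which is
    # exactly the bucket index (boundaries are inclusive on the left,
    # matching the C code's [a, b) intervals)
    hist['counts'][bisect.bisect_right(hist['points'], latency)] += 1
```

With points [50, 100, 200], a latency of exactly 50 lands in the second bucket [50, 100), mirroring the `k < a ? -1 : (k < b ? 0 : 1)` comparison in the C helper.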
Re: [Qemu-block] [PATCH v8 25/26] block/curl: Implement bdrv_refresh_filename()
On Mon 05 Feb 2018 04:18:34 PM CET, Max Reitz wrote: > Signed-off-by: Max ReitzReviewed-by: Alberto Garcia Berto
[Qemu-block] [PATCH 2/2] qapi: add block latency histogram interface
Set (and clear) histogram through new command block-latency-histogram-set and show new statistics in query-blockstats results. Signed-off-by: Vladimir Sementsov-Ogievskiy--- qapi/block-core.json | 62 +++- block/qapi.c | 31 ++ blockdev.c | 15 + 3 files changed, 107 insertions(+), 1 deletion(-) diff --git a/qapi/block-core.json b/qapi/block-core.json index 8225308904..4706a934d9 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -451,6 +451,63 @@ 'status': 'DirtyBitmapStatus'} } ## +# @BlockLatencyHistogramInfo: +# +# Block latency histogram. +# +# @latency: list of latency points in microseconds. Equals to @latency parameter +# of last called block-latency-histogram-set. +# +# @read: list of read-request counts corresponding to latency region. +#len(@read) = len(@latency) + 1 +#@read[0] corresponds to latency region [0, @latency[0]) +#for 0 < i < len(@latency): @read[i] corresponds to latency region +#[@latency[i-1], @latency[i]) +#@read.last_element corresponds to latency region +#[@latency.last_element, +inf) +# +# @write: list of write-request counts (see @read semantics) +# +# @flush: list of flush-request counts (see @read semantics) +# +# Since: 2.12 +## +{ 'struct': 'BlockLatencyHistogramInfo', + 'data': {'latency': ['uint64'], + 'read': ['uint64'], + 'write': ['uint64'], + 'flush': ['uint64'] } } + +## +# @block-latency-histogram-set: +# +# Add latency histogram to block device. If latency histogram alredy exists for +# the device it will be removed and a new one created. Latency histogram may be +# quered through query-blockstats. +# +# @device: device name to set latency histogram for. +# +# @latency: list of latency points in microseconds. The sequcence must be +# ascending, elements must be greater than zero. Histogram latency +# regions would be +# [0, @latency[0]), ..., [@latency[i], @latency[i+1]), ..., +# [@latency.last_element, +inf) +# +# Returns: error if device is not found. 
+# +# Since: 2.12 +# +# Example: +# +# -> { "execute": "block-latency-histogram-set", +# "arguments": { "device": "drive0", +# "latency": [50, 100, 200] } } +# <- { "return": {} } +## +{ 'command': 'block-latency-histogram-set', + 'data': {'device': 'str', 'latency': ['uint64'] } } + +## # @BlockInfo: # # Block device information. This structure describes a virtual device and @@ -730,6 +787,8 @@ # @timed_stats: Statistics specific to the set of previously defined # intervals of time (Since 2.5) # +# @latency-histogram: @BlockLatencyHistogramInfo. (Since 2.12) +# # Since: 0.14.0 ## { 'struct': 'BlockDeviceStats', @@ -742,7 +801,8 @@ 'failed_flush_operations': 'int', 'invalid_rd_operations': 'int', 'invalid_wr_operations': 'int', 'invalid_flush_operations': 'int', 'account_invalid': 'bool', 'account_failed': 'bool', - 'timed_stats': ['BlockDeviceTimedStats'] } } + 'timed_stats': ['BlockDeviceTimedStats'], + '*latency-histogram': 'BlockLatencyHistogramInfo' } } ## # @BlockStats: diff --git a/block/qapi.c b/block/qapi.c index fc10f0a565..715ed17a6b 100644 --- a/block/qapi.c +++ b/block/qapi.c @@ -389,6 +389,24 @@ static void bdrv_query_info(BlockBackend *blk, BlockInfo **p_info, qapi_free_BlockInfo(info); } +static uint64List *uint64_list(uint64_t *list, int size) +{ +int i; +uint64List *out_list = NULL; +uint64List **pout_list = _list; + +for (i = 0; i < size; i++) { +uint64List *entry = g_new(uint64List, 1); +entry->value = list[i]; +*pout_list = entry; +pout_list = >next; +} + +*pout_list = NULL; + +return out_list; +} + static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk) { BlockAcctStats *stats = blk_get_stats(blk); @@ -454,6 +472,19 @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk) dev_stats->avg_wr_queue_depth = block_acct_queue_depth(ts, BLOCK_ACCT_WRITE); } + +ds->has_latency_histogram = stats->latency_histogram.points != NULL; +if (ds->has_latency_histogram) { +BlockLatencyHistogramInfo *info = 
            g_new0(BlockLatencyHistogramInfo, 1);
+        BlockLatencyHistogram *h = &stats->latency_histogram;
+
+        ds->latency_histogram = info;
+
+        info->latency = uint64_list(h->points, h->size - 1);
+        info->read = uint64_list(h->histogram[BLOCK_ACCT_READ], h->size);
+        info->write = uint64_list(h->histogram[BLOCK_ACCT_WRITE], h->size);
+        info->flush = uint64_list(h->histogram[BLOCK_ACCT_FLUSH], h->size);
+    }
 }
 
 static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
diff --git a/blockdev.c
[Qemu-block] [PATCH 0/2] block latency histogram
Vladimir Sementsov-Ogievskiy (2):
  block/accounting: introduce latency histogram
  qapi: add block latency histogram interface

 qapi/block-core.json       | 62 +-
 include/block/accounting.h |  8 +
 block/accounting.c         | 83 ++
 block/qapi.c               | 31 +
 blockdev.c                 | 15 +
 5 files changed, 198 insertions(+), 1 deletion(-)

-- 
2.11.1
Re: [Qemu-block] [PATCH v8 24/26] block/curl: Harmonize option defaults
On Mon 05 Feb 2018 04:18:33 PM CET, Max Reitz wrote:

> Both of the defaults we currently have in the curl driver are named
> based on a slightly different schema, let's unify that and call both
> CURL_BLOCK_OPT_${NAME}_DEFAULT.
>
> While at it, we can add a macro for the third option for which a default
> exists, namely "sslverify".
>
> Signed-off-by: Max Reitz

Reviewed-by: Alberto Garcia

Berto
Re: [Qemu-block] [PATCH v8 20/26] block: Generically refresh runtime options
On Mon 05 Feb 2018 04:18:29 PM CET, Max Reitz wrote:

> Instead of having every block driver which implements
> bdrv_refresh_filename() copy all of the significant runtime options over
> to bs->full_open_options, implement this process generically in
> bdrv_refresh_filename().
>
> This patch only adds this new generic implementation, it does not remove
> the old functionality. This is done in a follow-up patch.
>
> With this patch, some superfluous information (that should never have
> been there) may be removed from some JSON filenames, as can be seen in
> the change to iotest 110's reference output. In case of 191, backing
> nodes that have not been overridden are now removed from the filename.
>
> Signed-off-by: Max Reitz

Reviewed-by: Alberto Garcia

Berto
Re: [Qemu-block] [PATCH v8 05/26] block: Respect backing bs in bdrv_refresh_filename
On Mon 05 Feb 2018 04:18:14 PM CET, Max Reitz wrote:

> Basically, bdrv_refresh_filename() should respect all children of a
> BlockDriverState. However, generally those children are driver-specific,
> so this function cannot handle the general case. On the other hand,
> there are only few drivers which use other children than @file and
> @backing (that being vmdk, quorum, and blkverify).
>
> Most block drivers only use @file and/or @backing (if they use any
> children at all). Both can be implemented directly in
> bdrv_refresh_filename.
>
> The user overriding the file's filename is already handled, however, the
> user overriding the backing file is not. If this is done, opening the
> BDS with the plain filename of its file will not be correct, so we may
> not set bs->exact_filename in that case.
>
> iotests 051 and 191 contain test cases for overwriting the backing file,
> and so their output changes with this patch applied (which I consider a
> good thing).  Note that in the case of 191, the implicitly opened
> (non-overridden) base file is included in the json:{} filename as well
> -- this will be remedied by a later patch.
>
> Signed-off-by: Max Reitz

Reviewed-by: Alberto Garcia

Berto
Re: [Qemu-block] [PATCH v8 04/26] iotests: Drop explicit base blockdev in 191
On Mon 05 Feb 2018 04:18:13 PM CET, Max Reitz wrote:

> Overriding the backing image should result in a json:{} pseudo-filename.
> Then, you can no longer use the commit block job with filename
> parameters. Therefore, do not explicitly add the base and override the
> middle image in iotest 191, since we do not need to anyway. This will
> allow us to continue to use the middle image's filename to identify it.
>
> In the long run, we want block-commit to accept node names for base and
> top (just like block-stream does).
>
> Signed-off-by: Max Reitz

Reviewed-by: Alberto Garcia

Berto
Re: [Qemu-block] [PATCH v2 1/8] qapi: group BlockDeviceStats fields
On Fri 19 Jan 2018 01:50:00 PM CET, Anton Nefedov wrote:

> Make the stat fields definition slightly more readable.
> Cosmetic change only.
>
> Signed-off-by: Anton Nefedov
> ---
>  qapi/block-core.json | 24 ++--
>  1 file changed, 18 insertions(+), 6 deletions(-)
>
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index e94a688..2e0665e 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -733,14 +733,26 @@
>  # Since: 0.14.0
>  ##
>  { 'struct': 'BlockDeviceStats',
> -  'data': {'rd_bytes': 'int', 'wr_bytes': 'int', 'rd_operations': 'int',
> -           'wr_operations': 'int', 'flush_operations': 'int',
> +  'data': {'rd_bytes': 'int', 'wr_bytes': 'int',
> +
> +           'rd_operations': 'int', 'wr_operations': 'int',
> +           'flush_operations': 'int',
> +
>             'flush_total_time_ns': 'int', 'wr_total_time_ns': 'int',
> -           'rd_total_time_ns': 'int', 'wr_highest_offset': 'int',
> -           'rd_merged': 'int', 'wr_merged': 'int', '*idle_time_ns': 'int',
> +           'rd_total_time_ns': 'int',

It would be nice to reorder these following the read-write-flush order. This way these would be rd_total_time_ns, wr_total_time_ns and flush_total_time_ns. The order of the fields in the documentation would also need to be changed.

Berto
Re: [Qemu-block] Block Migration and CPU throttling
On 12.12.2017 at 18:05, Dr. David Alan Gilbert wrote:
* Peter Lieven (p...@kamp.de) wrote:
On 21.09.2017 at 14:36, Dr. David Alan Gilbert wrote:
* Peter Lieven (p...@kamp.de) wrote:
On 19.09.2017 at 16:41, Dr. David Alan Gilbert wrote:
* Peter Lieven (p...@kamp.de) wrote:
On 19.09.2017 at 16:38, Dr. David Alan Gilbert wrote:
* Peter Lieven (p...@kamp.de) wrote:

Hi,

I just noticed that CPU throttling and block migration don't work together very well. During block migration the throttling heuristic detects that we obviously make no progress in RAM transfer. But the reason is the running block migration and not a too high dirty-page rate. The result is that any VM is throttled by 99% during block migration.

Hmm, that's unfortunate; do you have a bandwidth set lower than your actual network connection? I'm just wondering if it's actually going between the block and RAM iterative sections or getting stuck in one.

It happens also if source and dest are on the same machine and speed is set to 100G.

But does it happen if they're not and the speed is set low?

Yes, it does. I noticed it in our test environment between different nodes with a 10G link in between. But it's totally clear why it happens. During block migration we transfer all dirty memory pages in each round (if there is moderate memory load), but all dirty pages are obviously more than 50% of the transferred RAM in that round. It is more exactly 100%. But the current logic triggers on this condition. I think I will go forward and send a patch which disables auto-converge during the block migration bulk stage.

Yes, that's fair; it probably would also make sense to throttle the RAM migration during the block migration bulk stage, since the chances are it's not going to get far. (I think in the NBD setup, the main migration process isn't started until the end of bulk.)

Catching up with the idea of delaying RAM migration until block bulk has completed: what do you think is the easiest way to achieve this?
I think the answer depends on whether we think this is a 'special' or we need a new general-purpose mechanism. If it was really general then we'd probably want to split the iterative stage in two somehow, and only do RAM in the second half. But I'm not sure it's worth it; I suspect the easiest way is:

  a) Add a counter in migration/ram.c or in the RAM state somewhere
  b) Make ram_save_inhibit increment the counter
  c) Check the counter at the head of ram_save_iterate and just exit if it's non-0
  d) Call ram_save_inhibit from block_save_setup
  e) Then release it when you've finished the bulk stage

Make sure you still count the RAM in the pending totals, otherwise migration might think it's finished a bit early.

Is there any culprit I don't see or is it as easy as this?

diff --git a/migration/ram.c b/migration/ram.c
index cb1950f..c67bcf1 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2255,6 +2255,13 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
     int64_t t0;
     int done = 0;
 
+    if (blk_mig_bulk_active()) {
+        /* Avoid transferring RAM during bulk phase of block migration as
+         * the bulk phase will usually take a lot of time and transferring
+         * RAM updates again and again is pointless. */
+        goto out;
+    }
+
     rcu_read_lock();
     if (ram_list.version != rs->last_version) {
         ram_state_reset(rs);
@@ -2301,6 +2308,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
      */
     ram_control_after_iterate(f, RAM_CONTROL_ROUND);
 
+out:
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
     ram_counters.transferred += 8;

Peter
[Qemu-block] [PATCH] block: include original filename when reporting invalid URIs
Consider passing a JSON based block driver to "qemu-img commit":

  $ qemu-img commit 'json:{"driver":"qcow2","file":{"driver":"gluster",\
      "volume":"gv0","path":"sn1.qcow2", "server":[{"type":\
      "tcp","host":"10.73.199.197","port":"24007"}]},}'

Currently it will commit the content and then report an incredibly useless error message when trying to re-open the committed image:

  qemu-img: invalid URI
  Usage: file=gluster[+transport]://[host[:port]]volume/path[?socket=...][,file.debug=N][,file.logfile=/path/filename.log]

With this fix we get:

  qemu-img: invalid URI json:{"server.0.host": "10.73.199.197", "driver": "gluster", "path": "luks.qcow2", "server.0.type": "tcp", "server.0.port": "24007", "volume": "gv0"}

Of course the root cause problem still exists, but now we know what actually needs fixing.

Signed-off-by: Daniel P. Berrangé
---
 block/gluster.c  | 2 +-
 block/sheepdog.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 0f4265a3a4..0215e19087 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -660,7 +660,7 @@ static struct glfs *qemu_gluster_init(BlockdevOptionsGluster *gconf,
     if (filename) {
         ret = qemu_gluster_parse_uri(gconf, filename);
         if (ret < 0) {
-            error_setg(errp, "invalid URI");
+            error_setg(errp, "invalid URI %s", filename);
             error_append_hint(errp, "Usage: file=gluster[+transport]://"
                                     "[host[:port]]volume/path[?socket=...]"
                                     "[,file.debug=N]"
diff --git a/block/sheepdog.c b/block/sheepdog.c
index f684477328..c847ab6c98 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -1050,7 +1050,7 @@ static void sd_parse_uri(SheepdogConfig *cfg, const char *filename,
     cfg->uri = uri = uri_parse(filename);
     if (!uri) {
-        error_setg(&err, "invalid URI");
+        error_setg(&err, "invalid URI '%s'", filename);
         goto out;
     }
-- 
2.14.3