Re: [Qemu-block] [Qemu-devel] [PATCH] nbd/client: add x-block-status hack for testing server

2018-06-20 Thread no-reply
Hi,

This series failed docker-mingw@fedora build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

Type: series
Message-id: 20180621032539.134944-1-ebl...@redhat.com
Subject: [Qemu-devel] [PATCH] nbd/client: add x-block-status hack for testing 
server

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=8
time make docker-test-mingw@fedora
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
a8c78c2842 nbd/client: add x-block-status hack for testing server

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-ecxi0nux/src/dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
  BUILD   fedora
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-ecxi0nux/src'
  GEN 
/var/tmp/patchew-tester-tmp-ecxi0nux/src/docker-src.2018-06-21-01.51.06.25070/qemu.tar
Cloning into 
'/var/tmp/patchew-tester-tmp-ecxi0nux/src/docker-src.2018-06-21-01.51.06.25070/qemu.tar.vroot'...
done.
Your branch is up-to-date with 'origin/test'.
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into 
'/var/tmp/patchew-tester-tmp-ecxi0nux/src/docker-src.2018-06-21-01.51.06.25070/qemu.tar.vroot/dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
Submodule 'ui/keycodemapdb' (git://git.qemu.org/keycodemapdb.git) registered 
for path 'ui/keycodemapdb'
Cloning into 
'/var/tmp/patchew-tester-tmp-ecxi0nux/src/docker-src.2018-06-21-01.51.06.25070/qemu.tar.vroot/ui/keycodemapdb'...
Submodule path 'ui/keycodemapdb': checked out 
'6b3d716e2b6472eb7189d3220552280ef3d832ce'
  COPYRUNNER
RUN test-mingw in qemu:fedora 
Packages installed:
SDL2-devel-2.0.8-5.fc28.x86_64
bc-1.07.1-5.fc28.x86_64
bison-3.0.4-9.fc28.x86_64
bluez-libs-devel-5.49-3.fc28.x86_64
brlapi-devel-0.6.7-12.fc28.x86_64
bzip2-1.0.6-26.fc28.x86_64
bzip2-devel-1.0.6-26.fc28.x86_64
ccache-3.4.2-2.fc28.x86_64
clang-6.0.0-5.fc28.x86_64
device-mapper-multipath-devel-0.7.4-2.git07e7bd5.fc28.x86_64
findutils-4.6.0-19.fc28.x86_64
flex-2.6.1-7.fc28.x86_64
gcc-8.1.1-1.fc28.x86_64
gcc-c++-8.1.1-1.fc28.x86_64
gettext-0.19.8.1-14.fc28.x86_64
git-2.17.1-2.fc28.x86_64
glib2-devel-2.56.1-3.fc28.x86_64
glusterfs-api-devel-4.0.2-1.fc28.x86_64
gnutls-devel-3.6.2-1.fc28.x86_64
gtk3-devel-3.22.30-1.fc28.x86_64
hostname-3.20-3.fc28.x86_64
libaio-devel-0.3.110-11.fc28.x86_64
libasan-8.1.1-1.fc28.x86_64
libattr-devel-2.4.47-23.fc28.x86_64
libcap-devel-2.25-9.fc28.x86_64
libcap-ng-devel-0.7.9-1.fc28.x86_64
libcurl-devel-7.59.0-3.fc28.x86_64
libfdt-devel-1.4.6-4.fc28.x86_64
libpng-devel-1.6.34-3.fc28.x86_64
librbd-devel-12.2.5-1.fc28.x86_64
libssh2-devel-1.8.0-7.fc28.x86_64
libubsan-8.1.1-1.fc28.x86_64
libusbx-devel-1.0.21-6.fc28.x86_64
libxml2-devel-2.9.7-4.fc28.x86_64
llvm-6.0.0-11.fc28.x86_64
lzo-devel-2.08-12.fc28.x86_64
make-4.2.1-6.fc28.x86_64
mingw32-SDL2-2.0.5-3.fc27.noarch
mingw32-bzip2-1.0.6-9.fc27.noarch
mingw32-curl-7.57.0-1.fc28.noarch
mingw32-glib2-2.54.1-1.fc28.noarch
mingw32-gmp-6.1.2-2.fc27.noarch
mingw32-gnutls-3.5.13-2.fc27.noarch
mingw32-gtk3-3.22.16-1.fc27.noarch
mingw32-libjpeg-turbo-1.5.1-3.fc27.noarch
mingw32-libpng-1.6.29-2.fc27.noarch
mingw32-libssh2-1.8.0-3.fc27.noarch
mingw32-libtasn1-4.13-1.fc28.noarch
mingw32-nettle-3.3-3.fc27.noarch
mingw32-pixman-0.34.0-3.fc27.noarch
mingw32-pkg-config-0.28-9.fc27.x86_64
mingw64-SDL2-2.0.5-3.fc27.noarch
mingw64-bzip2-1.0.6-9.fc27.noarch
mingw64-curl-7.57.0-1.fc28.noarch
mingw64-glib2-2.54.1-1.fc28.noarch
mingw64-gmp-6.1.2-2.fc27.noarch
mingw64-gnutls-3.5.13-2.fc27.noarch
mingw64-gtk3-3.22.16-1.fc27.noarch
mingw64-libjpeg-turbo-1.5.1-3.fc27.noarch
mingw64-libpng-1.6.29-2.fc27.noarch
mingw64-libssh2-1.8.0-3.fc27.noarch
mingw64-libtasn1-4.13-1.fc28.noarch
mingw64-nettle-3.3-3.fc27.noarch
mingw64-pixman-0.34.0-3.fc27.noarch
mingw64-pkg-config-0.28-9.fc27.x86_64
ncurses-devel-6.1-5.20180224.fc28.x86_64
nettle-devel-3.4-2.fc28.x86_64
nss-devel-3.36.1-1.1.fc28.x86_64
numactl-devel-2.0.11-8.fc28.x86_64
package PyYAML is not installed
package libjpeg-devel is not installed
perl-5.26.2-411.fc28.x86_64
pixman-devel-0.34.0-8.fc28.x86_64
python3-3.6.5-1.fc28.x86_64
snappy-devel-1.1.7-5.fc28.x86_64
sparse-0.5.2-1.fc28.x86_64
spice-server-devel-0.14.0-4.fc28.x86_64
systemtap-sdt-devel-3.2-11.fc28.x86_64
tar-1.30-3.fc28.x86_64
usbredir-devel-0.7.1-7.fc28.x86_64
virglrenderer-devel-0.6.0-4.20170210git76b3da97b.fc28.x86_64
vte3-devel-0.36.5-6.fc28.x86_64
which-2.21-8.fc28.x86_64
xen-devel-4.10.1-3.fc28.x86_64
zlib-devel-1.2.11-8.fc28.x86_64

Environment variables:
TARGET_LIST=
PACKAGES=ccache gettext git tar PyYAML sparse flex bison python3 bzip2 hostname 
gcc gcc-c++ llvm clang make perl which bc findutils glib2-devel 

Re: [Qemu-block] [Qemu-devel] [PATCH] [RFC v2] aio: properly bubble up errors from initialization

2018-06-20 Thread Nishanth Aravamudan
On 20.06.2018 [12:34:52 -0700], Nishanth Aravamudan wrote:
> On 20.06.2018 [11:57:42 +0200], Kevin Wolf wrote:
> > Am 20.06.2018 um 00:54 hat Nishanth Aravamudan geschrieben:
> > > On 19.06.2018 [15:35:57 -0700], Nishanth Aravamudan wrote:
> > > > On 19.06.2018 [13:14:51 -0700], Nishanth Aravamudan wrote:
> > > > > On 19.06.2018 [14:35:33 -0500], Eric Blake wrote:
> > > > > > On 06/15/2018 12:47 PM, Nishanth Aravamudan via Qemu-devel wrote:
> > > > 
> > > > 
> > > > 
> > > > > > >   } else if (s->use_linux_aio) {
> > > > > > > +int rc;
> > > > > > > +rc = aio_setup_linux_aio(bdrv_get_aio_context(bs));
> > > > > > > +if (rc != 0) {
> > > > > > > +error_report("Unable to use native AIO, falling 
> > > > > > > back to "
> > > > > > > + "thread pool.");
> > > > > > 
> > > > > > In general, error_report() should not output a trailing '.'.
> > > > > 
> > > > > Will fix.
> > > > > 
> > > > > > > +s->use_linux_aio = 0;
> > > > > > > +return rc;
> > > > > > 
> > > > > > Wait - the message claims we are falling back, but the non-zero 
> > > > > > return code
> > > > > > sounds like we are returning an error instead of falling back.  (My
> > > > > > preference - if the user requested something and we can't do it, 
> > > > > > it's better
> > > > > > to error than to fall back to something that does not match the 
> > > > > > user's
> > > > > > request).
> > > > > 
> > > > > I think that makes sense, I hadn't tested this specific case (in my
> > > > > reading of the code, it wasn't clear to me if raw_co_prw() could be
> > > > > called before raw_aio_plug() had been called, but I think returning 
> > > > > the
> > > > > error code up should be handled correctly. What about the cases where
> > > > > there is no error handling (the other two changes in the patch)?
> > > > 
> > > > While looking at doing these changes, I realized that I'm not quite sure
> > > > what the right approach is here. My original rationale for returning
> > > > non-zero was that AIO was requested but could not be completed. I
> > > > haven't fully tracked back the calling paths, but I assumed it would get
> > > > retried at the top level, and since we indicated to not use AIO on
> > > > subsequent calls, it will succeed and use threads then (note, that I do
> > > > now realize this means a mismatch between the qemu command-line and the
> > > > in-use AIO model).
> > > > 
> > > > In practice, with my v2 patch, where I do return a non-zero error-code
> > > > from this function, qemu does not exit (nor is any logging other than
> > > > that I added emitted on the monitor). If I do not fallback, I imagine we
> > > > would just continuously see this error message and IO might not actually
> > > > every occur? Reworking all of the callpath to fail on non-zero returns
> > > > from raw_co_prw() seems like a fair bit of work, but if that is what is
> > > > being requested, I can try that (it will just take a while).
> > > > Alternatively, I can produce a v3 quickly that does not bubble the
> > > > actual errno all the way up (since it does seem like it is ignored
> > > > anyways?).
> > > 
> > > Sorry for the noise, but I had one more thought. Would it be appropriate
> > > to push the _setup() call up to when we parse the arguments about
> > > aio=native? E.g., we already check there if cache=directsync is
> > > specified and error out if not.
> > 
> > We already do this:
> 
> Right, I stated above it already is done, I simply meant adding a second
> check here that we can obtain and setup the AIO context successfully.
>  
> >  /* Currently Linux does AIO only for files opened with O_DIRECT */
> > if (s->use_linux_aio && !(s->open_flags & O_DIRECT)) {
> > error_setg(errp, "aio=native was specified, but it requires "
> >  "cache.direct=on, which was not specified.");
> > ret = -EINVAL;
> > goto fail;
> > }
> > 
> > laio_init() is about other types of errors. But anyway, yes, calling
> > laio_init() already in .bdrv_open() is possible. Returning errors from
> > .bdrv_open() is nice and easy and we should do it.
> 
> Ack.
> 
> > However, we may also need to call laio_init() again when switching to a
> > different I/O thread after the image is already opened. This is what I
> > meant when I commented on v1 that you should do this in the
> > .bdrv_attach_aio_context callback. The problem here is that we can't
> > return an error there and the guest is already using the image. In this
> > case, logging an error and falling back to the thread pool seems to be
> > the best option we have.
> 
> Is this is a request for new functionality? Just trying to understand,
> because aiui, block/file-posix.c does not implement the
> bdrv_attach_aio_context callback currently. Instead, aio_get_linux_aio()
> is called from three places, raw_co_prw, raw_aio_plug and
> raw_aio_unplug, which calls into 

[Qemu-block] [PATCH] nbd/client: add x-block-status hack for testing server

2018-06-20 Thread Eric Blake
In order to test that the NBD server is properly advertising
dirty bitmaps, we need a bare minimum client that can request
and read the context.  This patch is a hack (hence the use of
the x- prefix) that serves two purposes: first, it lets the
client pass a request of more than one context at a time to
the server, to test the reaction of the server to various
contexts (via the list command).  Second, whatever the first
context in the user's list becomes the context wired up to the
results visible in bdrv_block_status(); this has the result
that if you pass in 'qemu:dirty-bitmap:b' instead of the usual
'base:allocation', and the server is currently serving a named
bitmap 'b', then commands like 'qemu-img map' now output status
corresponding to the dirty bitmap (dirty sections look like
holes, while clean sections look like data, based on how the
status bits are mapped over the NBD protocol).

Since the hack corrupts the meaning of bdrv_block_status(), I
would NOT try to run 'qemu-img convert' or any other program
that might misbehave based on thinking clusters have a different
status than what the normal 'base:allocation' would provide.

The hack uses a semicolon-separated list embedded in a single
string, as that was easier to wire into the nbd block driver than
figuring out the right incantation of flattened QDict to represent
an array via the command line.  Oh well, just one more reason that
this hack deserves the 'x-' prefix.

As a demo, I was able to prove things work with the following sequence:

$ qemu-img info file
image: file
file format: qcow2
virtual size: 2.0M (2097152 bytes)
disk size: 2.0M
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false

$ ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -nographic -qmp stdio
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 12, "major": 2}, "package": 
"v2.12.0-1531-g3ab98aa673d"}, "capabilities": []}}
{'execute':'qmp_capabilities'}
{"return": {}}
{'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'n','file':{'driver':'file','filename':'file'}}}
{"return": {}}
{'execute':'block-dirty-bitmap-add','arguments':{'node':'n','name':'b','persistent':true}}
{"return": {}}
{'execute':'quit'}
{"return": {}}
{"timestamp": {"seconds": 1529548814, "microseconds": 472828}, "event": 
"SHUTDOWN", "data": {"guest": false}}

$ ./qemu-io -f qcow2 file
qemu-io> r -v 0 1
:  01  .
read 1/1 bytes at offset 0
1 bytes, 1 ops; 0.0001 sec (4.957 KiB/sec and 5076.1421 ops/sec)
qemu-io> w -P 1 0 1
wrote 1/1 bytes at offset 0
1 bytes, 1 ops; 0.0078 sec (127.502231 bytes/sec and 127.5022 ops/sec)
qemu-io> q

$ ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -nographic -qmp stdio
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 12, "major": 2}, "package": 
"v2.12.0-1531-g3ab98aa673d"}, "capabilities": []}}
{'execute':'qmp_capabilities'}
{"return": {}}
{'execute':'nbd-server-start','arguments':{'addr':{'type':'inet','data':{'host':'localhost','port':'10809'
{"return": {}}
{'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'n','file':{'driver':'file','filename':'file'}}}
{"return": {}}
{'execute':'nbd-server-add','arguments':{'device':'n'}}
{"return": {}}
{'execute':'x-nbd-server-add-bitmap','arguments':{'name':'n','bitmap':'b'}}
{"error": {"class": "GenericError", "desc": "Bitmap 'b' is enabled"}}
{'execute':'x-block-dirty-bitmap-disable','arguments':{'node':'n','name':'b'}}
{"return": {}}
{'execute':'x-nbd-server-add-bitmap','arguments':{'name':'n','bitmap':'b'}}
{"return": {}}
... leave running

$ ./qemu-img map --output=json --image-opts 
driver=nbd,export=n,server.type=inet,server.host=localhost,server.port=10809
[{ "start": 0, "length": 1114112, "depth": 0, "zero": false, "data": true},
{ "start": 1114112, "length": 458752, "depth": 0, "zero": true, "data": false},
{ "start": 1572864, "length": 524288, "depth": 0, "zero": false, "data": true}]

$ ./qemu-img map --output=json --image-opts 
driver=nbd,export=n,server.type=inet,server.host=localhost,server.port=10809,x-block-status=qemu:dirty-bitmap:b
[{ "start": 0, "length": 65536, "depth": 0, "zero": false, "data": false},
{ "start": 65536, "length": 2031616, "depth": 0, "zero": false, "data": true}]

The difference between the two runs shows that base:allocation status
is thus different from the contents of dirty bitmap 'b'; and that the
dirty bitmap 'b' indeed tracked the first 64k of the file as being
dirty due to the qemu-io write at offset 0 performed between the creation
of bitmap b in the first qemu, and the disabling it prior to exporting it
in the second qemu.

Signed-off-by: Eric Blake 
---

Based-on: <20180621031957.134718-1-ebl...@redhat.com>
([PULL 0/7] bitmap export over NBD)

 qapi/block-core.json |  12 -
 block/nbd-client.h   |   1 +
 include/block/nbd.h  |   1 +
 block/nbd-client.c   |   2 +
 block/nbd.c  |   9 +++-
 nbd/client.c | 130 

[Qemu-block] [PULL 4/7] nbd/server: add nbd_meta_empty_or_pattern helper

2018-06-20 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

Add nbd_meta_pattern() and nbd_meta_empty_or_pattern() helpers for
metadata query parsing. nbd_meta_pattern() will be reused for the
"qemu" namespace in following patches.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20180609151758.17343-4-vsement...@virtuozzo.com>
Reviewed-by: Eric Blake 
[eblake: comment tweaks]
Signed-off-by: Eric Blake 
---
 nbd/server.c | 101 ---
 1 file changed, 68 insertions(+), 33 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index bbdc3c01b9f..cea5192addb 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -733,6 +733,71 @@ static int nbd_negotiate_send_meta_context(NBDClient 
*client,
 return qio_channel_writev_all(client->ioc, iov, 2, errp) < 0 ? -EIO : 0;
 }

+/* Read strlen(@pattern) bytes, and set @match to true if they match @pattern.
+ * @match is never set to false.
+ *
+ * Return -errno on I/O error, 0 if option was completely handled by
+ * sending a reply about inconsistent lengths, or 1 on success.
+ *
+ * Note: return code = 1 doesn't mean that we've read exactly @pattern.
+ * It only means that there are no errors.
+ */
+static int nbd_meta_pattern(NBDClient *client, const char *pattern, bool 
*match,
+Error **errp)
+{
+int ret;
+char *query;
+size_t len = strlen(pattern);
+
+assert(len);
+
+query = g_malloc(len);
+ret = nbd_opt_read(client, query, len, errp);
+if (ret <= 0) {
+g_free(query);
+return ret;
+}
+
+if (strncmp(query, pattern, len) == 0) {
+trace_nbd_negotiate_meta_query_parse(pattern);
+*match = true;
+} else {
+trace_nbd_negotiate_meta_query_skip("pattern not matched");
+}
+g_free(query);
+
+return 1;
+}
+
+/*
+ * Read @len bytes, and set @match to true if they match @pattern, or if @len
+ * is 0 and the client is performing _LIST_. @match is never set to false.
+ *
+ * Return -errno on I/O error, 0 if option was completely handled by
+ * sending a reply about inconsistent lengths, or 1 on success.
+ *
+ * Note: return code = 1 doesn't mean that we've read exactly @pattern.
+ * It only means that there are no errors.
+ */
+static int nbd_meta_empty_or_pattern(NBDClient *client, const char *pattern,
+ uint32_t len, bool *match, Error **errp)
+{
+if (len == 0) {
+if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
+*match = true;
+}
+trace_nbd_negotiate_meta_query_parse("empty");
+return 1;
+}
+
+if (len != strlen(pattern)) {
+trace_nbd_negotiate_meta_query_skip("different lengths");
+return nbd_opt_skip(client, len, errp);
+}
+
+return nbd_meta_pattern(client, pattern, match, errp);
+}
+
 /* nbd_meta_base_query
  *
  * Handle queries to 'base' namespace. For now, only the base:allocation
@@ -741,43 +806,12 @@ static int nbd_negotiate_send_meta_context(NBDClient 
*client,
  *
  * Return -errno on I/O error, 0 if option was completely handled by
  * sending a reply about inconsistent lengths, or 1 on success.
- *
- * Note: return code = 1 doesn't mean that we've parsed the "base:allocation"
- * namespace. It only means that there are no errors.
  */
 static int nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
uint32_t len, Error **errp)
 {
-int ret;
-char query[sizeof("allocation") - 1];
-size_t alen = strlen("allocation");
-
-if (len == 0) {
-if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
-meta->base_allocation = true;
-}
-trace_nbd_negotiate_meta_query_parse("base:");
-return 1;
-}
-
-if (len != alen) {
-trace_nbd_negotiate_meta_query_skip("not base:allocation");
-return nbd_opt_skip(client, len, errp);
-}
-
-ret = nbd_opt_read(client, query, len, errp);
-if (ret <= 0) {
-return ret;
-}
-
-if (strncmp(query, "allocation", alen) == 0) {
-trace_nbd_negotiate_meta_query_parse("base:allocation");
-meta->base_allocation = true;
-} else {
-trace_nbd_negotiate_meta_query_skip("not base:allocation");
-}
-
-return 1;
+return nbd_meta_empty_or_pattern(client, "allocation", len,
+ >base_allocation, errp);
 }

 /* nbd_negotiate_meta_query
@@ -823,6 +857,7 @@ static int nbd_negotiate_meta_query(NBDClient *client,
 return nbd_opt_skip(client, len, errp);
 }

+trace_nbd_negotiate_meta_query_parse("base:");
 return nbd_meta_base_query(client, meta, len, errp);
 }

-- 
2.14.4




[Qemu-block] [PULL 5/7] nbd/server: implement dirty bitmap export

2018-06-20 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

Handle a new NBD meta namespace: "qemu", and corresponding queries:
"qemu:dirty-bitmap:".

With the new metadata context negotiated, BLOCK_STATUS query will reply
with dirty-bitmap data, converted to extents. The new public function
nbd_export_bitmap selects which bitmap to export. For now, only one bitmap
may be exported.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20180609151758.17343-5-vsement...@virtuozzo.com>
Reviewed-by: Eric Blake 
[eblake: wording tweaks, minor cleanups, additional tracing]
Signed-off-by: Eric Blake 
---
 include/block/nbd.h |   8 +-
 nbd/server.c| 277 +++-
 nbd/trace-events|   1 +
 3 files changed, 261 insertions(+), 25 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index fcdcd545023..8bb9606c39b 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -229,11 +229,13 @@ enum {
 #define NBD_REPLY_TYPE_ERROR NBD_REPLY_ERR(1)
 #define NBD_REPLY_TYPE_ERROR_OFFSET  NBD_REPLY_ERR(2)

-/* Flags for extents (NBDExtent.flags) of NBD_REPLY_TYPE_BLOCK_STATUS,
- * for base:allocation meta context */
+/* Extent flags for base:allocation in NBD_REPLY_TYPE_BLOCK_STATUS */
 #define NBD_STATE_HOLE (1 << 0)
 #define NBD_STATE_ZERO (1 << 1)

+/* Extent flags for qemu:dirty-bitmap in NBD_REPLY_TYPE_BLOCK_STATUS */
+#define NBD_STATE_DIRTY (1 << 0)
+
 static inline bool nbd_reply_type_is_error(int type)
 {
 return type & (1 << 15);
@@ -315,6 +317,8 @@ void nbd_client_put(NBDClient *client);
 void nbd_server_start(SocketAddress *addr, const char *tls_creds,
   Error **errp);

+void nbd_export_bitmap(NBDExport *exp, const char *bitmap,
+   const char *bitmap_export_name, Error **errp);

 /* nbd_read
  * Reads @size bytes from @ioc. Returns 0 on success.
diff --git a/nbd/server.c b/nbd/server.c
index cea5192addb..f7f1fda4b3f 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -23,6 +23,13 @@
 #include "nbd-internal.h"

 #define NBD_META_ID_BASE_ALLOCATION 0
+#define NBD_META_ID_DIRTY_BITMAP 1
+
+/* NBD_MAX_BITMAP_EXTENTS: 1 mb of extents data. An empirical
+ * constant. If an increase is needed, note that the NBD protocol
+ * recommends no larger than 32 mb, so that the client won't consider
+ * the reply as a denial of service attack. */
+#define NBD_MAX_BITMAP_EXTENTS (0x10 / 8)

 static int system_errno_to_nbd_errno(int err)
 {
@@ -80,6 +87,9 @@ struct NBDExport {

 BlockBackend *eject_notifier_blk;
 Notifier eject_notifier;
+
+BdrvDirtyBitmap *export_bitmap;
+char *export_bitmap_context;
 };

 static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);
@@ -92,6 +102,7 @@ typedef struct NBDExportMetaContexts {
 bool valid; /* means that negotiation of the option finished without
errors */
 bool base_allocation; /* export base:allocation context (block status) */
+bool bitmap; /* export qemu:dirty-bitmap: */
 } NBDExportMetaContexts;

 struct NBDClient {
@@ -814,6 +825,56 @@ static int nbd_meta_base_query(NBDClient *client, 
NBDExportMetaContexts *meta,
  >base_allocation, errp);
 }

+/* nbd_meta_bitmap_query
+ *
+ * Handle query to 'qemu:' namespace.
+ * @len is the amount of text remaining to be read from the current name, after
+ * the 'qemu:' portion has been stripped.
+ *
+ * Return -errno on I/O error, 0 if option was completely handled by
+ * sending a reply about inconsistent lengths, or 1 on success. */
+static int nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,
+   uint32_t len, Error **errp)
+{
+bool dirty_bitmap = false;
+size_t dirty_bitmap_len = strlen("dirty-bitmap:");
+int ret;
+
+if (!meta->exp->export_bitmap) {
+trace_nbd_negotiate_meta_query_skip("no dirty-bitmap exported");
+return nbd_opt_skip(client, len, errp);
+}
+
+if (len == 0) {
+if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
+meta->bitmap = true;
+}
+trace_nbd_negotiate_meta_query_parse("empty");
+return 1;
+}
+
+if (len < dirty_bitmap_len) {
+trace_nbd_negotiate_meta_query_skip("not dirty-bitmap:");
+return nbd_opt_skip(client, len, errp);
+}
+
+len -= dirty_bitmap_len;
+ret = nbd_meta_pattern(client, "dirty-bitmap:", _bitmap, errp);
+if (ret <= 0) {
+return ret;
+}
+if (!dirty_bitmap) {
+trace_nbd_negotiate_meta_query_skip("not dirty-bitmap:");
+return nbd_opt_skip(client, len, errp);
+}
+
+trace_nbd_negotiate_meta_query_parse("dirty-bitmap:");
+
+return nbd_meta_empty_or_pattern(
+client, meta->exp->export_bitmap_context +
+strlen("qemu:dirty_bitmap:"), len, >bitmap, errp);
+}
+
 /* nbd_negotiate_meta_query
  *
  * Parse namespace name and call corresponding function to parse body of the
@@ -829,9 

[Qemu-block] [PULL 7/7] docs/interop: add nbd.txt

2018-06-20 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

Describe new metadata namespace: "qemu".

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20180609151758.17343-7-vsement...@virtuozzo.com>
Reviewed-by: Eric Blake 
[eblake: grammar tweaks]
Signed-off-by: Eric Blake 
---
 docs/interop/nbd.txt | 38 ++
 MAINTAINERS  |  1 +
 2 files changed, 39 insertions(+)
 create mode 100644 docs/interop/nbd.txt

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
new file mode 100644
index 000..77b5f459111
--- /dev/null
+++ b/docs/interop/nbd.txt
@@ -0,0 +1,38 @@
+Qemu supports the NBD protocol, and has an internal NBD client (see
+block/nbd.c), an internal NBD server (see blockdev-nbd.c), and an
+external NBD server tool (see qemu-nbd.c). The common code is placed
+in nbd/*.
+
+The NBD protocol is specified here:
+https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md
+
+The following paragraphs describe some specific properties of NBD
+protocol realization in Qemu.
+
+= Metadata namespaces =
+
+Qemu supports the "base:allocation" metadata context as defined in the
+NBD protocol specification, and also defines an additional metadata
+namespace "qemu".
+
+
+== "qemu" namespace ==
+
+The "qemu" namespace currently contains only one type of context,
+related to exposing the contents of a dirty bitmap alongside the
+associated disk contents.  That context has the following form:
+
+qemu:dirty-bitmap:
+
+Each dirty-bitmap metadata context defines only one flag for extents
+in reply for NBD_CMD_BLOCK_STATUS:
+
+bit 0: NBD_STATE_DIRTY, means that the extent is "dirty"
+
+For NBD_OPT_LIST_META_CONTEXT the following queries are supported
+in addition to "qemu:dirty-bitmap:":
+
+* "qemu:" - returns list of all available metadata contexts in the
+namespace.
+* "qemu:dirty-bitmap:" - returns list of all available dirty-bitmap
+ metadata contexts.
diff --git a/MAINTAINERS b/MAINTAINERS
index da91501c7a6..efb17e6ac0f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1972,6 +1972,7 @@ F: nbd/
 F: include/block/nbd*
 F: qemu-nbd.*
 F: blockdev-nbd.c
+F: docs/interop/nbd.txt
 T: git git://repo.or.cz/qemu/ericb.git nbd

 NFS
-- 
2.14.4




[Qemu-block] [PULL 6/7] qapi: new qmp command nbd-server-add-bitmap

2018-06-20 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

For now, the actual command ix x-nbd-server-add-bitmap, reflecting
the fact that we are still working on libvirt code that proves the
command works as needed, and also the fact that we may remove
bitmap-export-name (and just require that the exported name be the
bitmap name).

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20180609151758.17343-6-vsement...@virtuozzo.com>
Reviewed-by: Eric Blake 
[eblake: make the command experimental by adding x- prefix]
Signed-off-by: Eric Blake 
---
 qapi/block.json | 23 +++
 blockdev-nbd.c  | 23 +++
 2 files changed, 46 insertions(+)

diff --git a/qapi/block.json b/qapi/block.json
index c6945240029..ca807f176ae 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -268,6 +268,29 @@
 { 'command': 'nbd-server-remove',
   'data': {'name': 'str', '*mode': 'NbdServerRemoveMode'} }

+##
+# @x-nbd-server-add-bitmap:
+#
+# Expose a dirty bitmap associated with the selected export. The bitmap search
+# starts at the device attached to the export, and includes all backing files.
+# The exported bitmap is then locked until the NBD export is removed.
+#
+# @name: Export name.
+#
+# @bitmap: Bitmap name to search for.
+#
+# @bitmap-export-name: How the bitmap will be seen by nbd clients
+#  (default @bitmap)
+#
+# Note: the client must use NBD_OPT_SET_META_CONTEXT with a query of
+# "qemu:dirty-bitmap:NAME" (where NAME matches @bitmap-export-name) to access
+# the exposed bitmap.
+#
+# Since: 3.0
+##
+  { 'command': 'x-nbd-server-add-bitmap',
+'data': {'name': 'str', 'bitmap': 'str', '*bitmap-export-name': 'str'} }
+
 ##
 # @nbd-server-stop:
 #
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index 65a84739edc..1ef11041a73 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -220,3 +220,26 @@ void qmp_nbd_server_stop(Error **errp)
 nbd_server_free(nbd_server);
 nbd_server = NULL;
 }
+
+void qmp_x_nbd_server_add_bitmap(const char *name, const char *bitmap,
+ bool has_bitmap_export_name,
+ const char *bitmap_export_name,
+ Error **errp)
+{
+NBDExport *exp;
+
+if (!nbd_server) {
+error_setg(errp, "NBD server not running");
+return;
+}
+
+exp = nbd_export_find(name);
+if (exp == NULL) {
+error_setg(errp, "Export '%s' is not found", name);
+return;
+}
+
+nbd_export_bitmap(exp, bitmap,
+  has_bitmap_export_name ? bitmap_export_name : bitmap,
+  errp);
+}
-- 
2.14.4




[Qemu-block] [PULL 2/7] nbd/server: fix trace

2018-06-20 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

Return code = 1 doesn't mean that we parsed base:allocation. Use
correct traces in both -parsed and -skipped cases.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20180609151758.17343-2-vsement...@virtuozzo.com>
Reviewed-by: Eric Blake 
[eblake: comment tweaks]
Signed-off-by: Eric Blake 
---
 nbd/server.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 9e1f2271784..e71301b8cd7 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -736,12 +736,16 @@ static int nbd_negotiate_send_meta_context(NBDClient 
*client,

 /* nbd_meta_base_query
  *
- * Handle query to 'base' namespace. For now, only base:allocation context is
- * available in it.  'len' is the amount of text remaining to be read from
+ * Handle queries to 'base' namespace. For now, only the base:allocation
+ * context is available.  'len' is the amount of text remaining to be read from
  * the current name, after the 'base:' portion has been stripped.
  *
  * Return -errno on I/O error, 0 if option was completely handled by
- * sending a reply about inconsistent lengths, or 1 on success. */
+ * sending a reply about inconsistent lengths, or 1 on success.
+ *
+ * Note: return code = 1 doesn't mean that we've parsed the "base:allocation"
+ * namespace. It only means that there are no errors.
+ */
 static int nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
uint32_t len, Error **errp)
 {
@@ -768,10 +772,12 @@ static int nbd_meta_base_query(NBDClient *client, 
NBDExportMetaContexts *meta,
 }

 if (strncmp(query, "allocation", alen) == 0) {
+trace_nbd_negotiate_meta_query_parse("base:allocation");
 meta->base_allocation = true;
+} else {
+trace_nbd_negotiate_meta_query_skip("not base:allocation");
 }

-trace_nbd_negotiate_meta_query_parse("base:allocation");
 return 1;
 }

-- 
2.14.4




[Qemu-block] [PULL 3/7] nbd/server: refactor NBDExportMetaContexts

2018-06-20 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

Use NBDExport pointer instead of just export name: there is no need to
store a duplicated name in the struct; moreover, NBDExport will be used
further.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20180609151758.17343-3-vsement...@virtuozzo.com>
Reviewed-by: Eric Blake 
[eblake: commit message grammar tweak]
Signed-off-by: Eric Blake 
---
 nbd/server.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index e71301b8cd7..bbdc3c01b9f 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -88,7 +88,7 @@ static QTAILQ_HEAD(, NBDExport) exports = 
QTAILQ_HEAD_INITIALIZER(exports);
  * as selected by NBD_OPT_SET_META_CONTEXT. Also used for
  * NBD_OPT_LIST_META_CONTEXT. */
 typedef struct NBDExportMetaContexts {
-char export_name[NBD_MAX_NAME_SIZE + 1];
+NBDExport *exp;
 bool valid; /* means that negotiation of the option finished without
errors */
 bool base_allocation; /* export base:allocation context (block status) */
@@ -399,10 +399,9 @@ static int nbd_negotiate_handle_list(NBDClient *client, 
Error **errp)
 return nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
 }

-static void nbd_check_meta_export_name(NBDClient *client)
+static void nbd_check_meta_export(NBDClient *client)
 {
-client->export_meta.valid &= !strcmp(client->exp->name,
- client->export_meta.export_name);
+client->export_meta.valid &= client->exp == client->export_meta.exp;
 }

 /* Send a reply to NBD_OPT_EXPORT_NAME.
@@ -456,7 +455,7 @@ static int nbd_negotiate_handle_export_name(NBDClient 
*client,

 QTAILQ_INSERT_TAIL(>exp->clients, client, next);
 nbd_export_get(client->exp);
-nbd_check_meta_export_name(client);
+nbd_check_meta_export(client);

 return 0;
 }
@@ -650,7 +649,7 @@ static int nbd_negotiate_handle_info(NBDClient *client, 
uint16_t myflags,
 client->exp = exp;
 QTAILQ_INSERT_TAIL(>exp->clients, client, next);
 nbd_export_get(client->exp);
-nbd_check_meta_export_name(client);
+nbd_check_meta_export(client);
 rc = 1;
 }
 return rc;
@@ -835,7 +834,7 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
   NBDExportMetaContexts *meta, Error 
**errp)
 {
 int ret;
-NBDExport *exp;
+char export_name[NBD_MAX_NAME_SIZE + 1];
 NBDExportMetaContexts local_meta;
 uint32_t nb_queries;
 int i;
@@ -854,15 +853,15 @@ static int nbd_negotiate_meta_queries(NBDClient *client,

 memset(meta, 0, sizeof(*meta));

-ret = nbd_opt_read_name(client, meta->export_name, NULL, errp);
+ret = nbd_opt_read_name(client, export_name, NULL, errp);
 if (ret <= 0) {
 return ret;
 }

-exp = nbd_export_find(meta->export_name);
-if (exp == NULL) {
+meta->exp = nbd_export_find(export_name);
+if (meta->exp == NULL) {
 return nbd_opt_drop(client, NBD_REP_ERR_UNKNOWN, errp,
-"export '%s' not present", meta->export_name);
+"export '%s' not present", export_name);
 }

 ret = nbd_opt_read(client, _queries, sizeof(nb_queries), errp);
@@ -871,7 +870,7 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
 }
 cpu_to_be32s(_queries);
 trace_nbd_negotiate_meta_context(nbd_opt_lookup(client->opt),
- meta->export_name, nb_queries);
+ export_name, nb_queries);

 if (client->opt == NBD_OPT_LIST_META_CONTEXT && !nb_queries) {
 /* enable all known contexts */
-- 
2.14.4




Re: [Qemu-block] [Qemu-devel] [PATCH v2 03/10] qcow2/bitmap: cache bm_list

2018-06-20 Thread John Snow



On 06/20/2018 09:04 AM, Vladimir Sementsov-Ogievskiy wrote:
> 13.06.2018 05:06, John Snow wrote:
>> We don't need to re-read this list every time, exactly. We can keep it
>> cached
>> and delete our copy when we flush to disk.
>>
>> Because we don't try to flush bitmaps on close if there's nothing to
>> flush,
>> add a new conditional to delete the state anyway for a clean exit.
>>
>> Signed-off-by: John Snow 
>> ---
>>   block/qcow2-bitmap.c | 74
>> ++--
>>   block/qcow2.c    |  2 ++
>>   block/qcow2.h    |  2 ++
>>   3 files changed, 52 insertions(+), 26 deletions(-)
>>
>> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
>> index 85c1b5afe3..5ae9b17928 100644
>> --- a/block/qcow2-bitmap.c
>> +++ b/block/qcow2-bitmap.c
>> @@ -636,6 +636,34 @@ fail:
>>   return NULL;
>>   }
>>   +static Qcow2BitmapList *get_bitmap_list(BlockDriverState *bs, Error
>> **errp)
>> +{
>> +    BDRVQcow2State *s = bs->opaque;
>> +    Qcow2BitmapList *bm_list;
>> +
>> +    if (s->bitmap_list) {
>> +    return (Qcow2BitmapList *)s->bitmap_list;
>> +    }
>> +
>> +    if (s->nb_bitmaps) {
>> +    bm_list = bitmap_list_load(bs, errp);
>> +    } else {
>> +    bm_list = bitmap_list_new();
>> +    }
>> +    s->bitmap_list = bm_list;
> 
> may be, we shouldn't cache it in inactive mode. However, I think we'll
> finally will not load bitmaps in inactive mode and drop the on
> inactivate, so it would not matter..
> 

Do we not load bitmaps when BDRV_O_INACTIVE is set? it looks like we do?

(From your subsequent email):
>
> really, now it would be a problem: we can start in inactive mode, load
> nothing, and then we can't reload bitmaps; my fix in
> https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg03254.html
> will not work after this patch.
>

So we load nothing because when we opened up the image RO, saw IN_USE
bitmaps (or saw none at all) and decided not to load them. qcow2_do_open
however marks that it has "loaded the bitmaps."

Later, when we reopen RW, we have a cached bm_list that doesn't include
this bitmap, so we don't mark it as IN_USE again or update the header.

(Wait, if you are worried about the bitmap's data having been changed,
why do we not reload the bitmap data here too?)


Now, patch 06 changes the cache so that we load all bitmaps and not just
ones not IN_USE. On RW reload, I reject any such IN_USE bitmaps as a
reason to prohibit the RW reload. However, this is broken too, because I
will miss any new flags that exist on-disk, so this function should
never use the cached data.

I'm confused, though, the comment that calls reopen_bitmaps_rw_hint says:

"It's some kind of reopen. There are no known cases where we need to
reload bitmaps in such a situation, so it's safer to skip them.
Moreover, if we have some readonly bitmaps and we are reopening for rw
we should reopen bitmaps correspondingly."

Why do we assume that bitmap data cannot change while BDRV_O_INACTIVATE
is set? Is that not wrong?

> Hm, I've understood the following problem: cache becomes incorrect after
> failed update_ext_header_and_dir or update_ext_header_and_dir_in_place
> operations. (after failed qcow2_remove_persistent_dirty_bitmap or
> qcow2_reopen_bitmaps_rw_hint)
> 
> And this comes from incomplete architecture after the patch:
> On the one hand, we work with one singleton bitmap_list, and loading
> part is refactored to do it safely. On the other hand, storing functions
> still have old behavior, they work with bitmap list like with their own
> local variable.

Yeah, I see it. Dropping the cache on such errors is fine.

> 
> So, we have safe mechanism to read list through the cache. We need also
> safe mechanism to update list both in cache and in file.
> 
> There are two possible variants:
> 
> 1. drop cache after failed store
> 2. rollback cache after failed store
> 
> 1 looks simpler..
> 
> Also, we should drop cache on inactivate (we do this) and should not
> create cache in inactive mode, because the other process may change the
> image.
> 
> Hm. may be, it is better to work with s->bitmap_list directly? In this
> case it will be more obvious that it is the cache, not local variable.
> And we will work with it like with other "parts of extension cache"
> s->nb_bitmaps, s->bitmap_directory_offset ...
> 
> After the patch, functions update_ext_header_and_dir* becomes strange:
> 
> 1. before the patch, they take external parameter - bm_list, and by this
> parameter they updated the file and cached s->nb_bitmaps, s->bitmap_*, ..
> 2. after the patch, they take parameter (actually s->bitmap_list) of
> same nature like s->nb_bitmap, and update s->nb_bitmap from it.
> 

Yeah, if we do decide that keeping a cache is the right thing, some of
the helper functions could be refactored or simplified a little to take
advantage of the new paradigm.

> Sorry for being late and for disordered stream of thoughts. Is this
> patch really needed for the whole series?
> 


Re: [Qemu-block] [PATCH v5 6/6] docs/interop: add nbd.txt

2018-06-20 Thread John Snow



On 06/20/2018 10:16 AM, Vladimir Sementsov-Ogievskiy wrote:
> 20.06.2018 14:33, Eric Blake wrote:
>> On 06/09/2018 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Describe new metadata namespace: "qemu".
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>   docs/interop/nbd.txt | 37 +
>>>   MAINTAINERS  |  1 +
>>>   2 files changed, 38 insertions(+)
>>>   create mode 100644 docs/interop/nbd.txt
>>>
>>> diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
>>> new file mode 100644
>>> index 00..7366269fc0
>>> --- /dev/null
>>> +++ b/docs/interop/nbd.txt
>>> @@ -0,0 +1,37 @@
>>> +Qemu supports NBD protocol, and has internal NBD client (look at
>>
>> s/supports/supports the/
>>
>>> +block/nbd.c), internal NBD server (look at blockdev-nbd.c) as well as
>>
>> s/internal/an internal/2
>>
>>> +external NBD server tool - qemu-nbd.c. The common code is placed in
>>
>> s/external/an external/
>>
>>> +nbd/*.
>>> +
>>> +NBD protocol is specified here:
>>
>> s/NBD/The NBD/
>>
>>> +https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md
>>> +
>>> +This following paragraphs describe some specific properties of NBD
>>> +protocol realization in Qemu.
>>> +
>>> +
>>> += Metadata namespaces =
>>> +
>>> +Qemu supports "base:allocation" metadata context as defined in the NBD
>>
>> s/supports/supports the/
>>
>>> +protocol specification and defines own metadata namespace: "qemu".
>>
>> s/own/an additional/
>>
>>> +
>>> +
>>> +== "qemu" namespace ==
>>> +
>>> +For now, the only type of metadata context in the namespace is dirty
>>> +bitmap. All available metadata contexts have the following form:
>>
>> maybe:
>>
>> The "qemu" namespace currently contains only one type of context,
>> related to exposing the contents of a dirty bitmap alongside the
>> associated disk contents.  The available metadata context has the
>> following form:
> 
> Ok
> 
>>
>>> +
>>> +   qemu:dirty-bitmap:
>>> +
>>> +Each dirty-bitmap metadata context defines the only one flag for
>>> +extents in reply for NBD_CMD_BLOCK_STATUS:
>>> +
>>> +    bit 0: NBD_STATE_DIRTY, means that the extent is "dirty"
>>> +
>>> +For NBD_OPT_LIST_META_CONTEXT the following queries are supported
>>> +additionally to "qemu:dirty-bitmap:":
>>
>> s/additionally/in addition/
>>
>>> +
>>> +* "qemu:" : returns list of all available metadata contexts in the
>>> +    namespace.
>>> +* "qemu:dirty-bitmap:" : returns list of all available dirty-bitmap
>>> + metadata contexts.
>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>> index e187b1f18f..887b479440 100644
>>> --- a/MAINTAINERS
>>> +++ b/MAINTAINERS
>>> @@ -1923,6 +1923,7 @@ F: nbd/
>>>   F: include/block/nbd*
>>>   F: qemu-nbd.*
>>>   F: blockdev-nbd.c
>>> +F: docs/interop/nbd.txt
>>>   T: git git://repo.or.cz/qemu/ericb.git nbd
>>>     NFS
>>>
>>
>> Reviewed-by: Eric Blake 
>>
>> At this point, I think I'll touch up the issues I've spotted and
>> submit a pull request, in order to make it easier for me to test my
>> libvirt code.
>>
> 
> Ok, thank you!
> 

ACK; the x- prefixes will help us get everything rolling together much
faster and gives us some leeway to change things later as needed.

Vladimir, can you jog our memories and let us know which series still
need to hit QEMU for 3.0 for safe persistence/migration et al?

(Not including any of my own qemu-img patches which I'll get to by freeze.)



Re: [Qemu-block] [Qemu-devel] [PATCH] [RFC v2] aio: properly bubble up errors from initialization

2018-06-20 Thread Nishanth Aravamudan
On 20.06.2018 [11:57:42 +0200], Kevin Wolf wrote:
> Am 20.06.2018 um 00:54 hat Nishanth Aravamudan geschrieben:
> > On 19.06.2018 [15:35:57 -0700], Nishanth Aravamudan wrote:
> > > On 19.06.2018 [13:14:51 -0700], Nishanth Aravamudan wrote:
> > > > On 19.06.2018 [14:35:33 -0500], Eric Blake wrote:
> > > > > On 06/15/2018 12:47 PM, Nishanth Aravamudan via Qemu-devel wrote:
> > > 
> > > 
> > > 
> > > > > >   } else if (s->use_linux_aio) {
> > > > > > +int rc;
> > > > > > +rc = aio_setup_linux_aio(bdrv_get_aio_context(bs));
> > > > > > +if (rc != 0) {
> > > > > > +error_report("Unable to use native AIO, falling 
> > > > > > back to "
> > > > > > + "thread pool.");
> > > > > 
> > > > > In general, error_report() should not output a trailing '.'.
> > > > 
> > > > Will fix.
> > > > 
> > > > > > +s->use_linux_aio = 0;
> > > > > > +return rc;
> > > > > 
> > > > > Wait - the message claims we are falling back, but the non-zero 
> > > > > return code
> > > > > sounds like we are returning an error instead of falling back.  (My
> > > > > preference - if the user requested something and we can't do it, it's 
> > > > > better
> > > > > to error than to fall back to something that does not match the user's
> > > > > request).
> > > > 
> > > > I think that makes sense, I hadn't tested this specific case (in my
> > > > reading of the code, it wasn't clear to me if raw_co_prw() could be
> > > > called before raw_aio_plug() had been called, but I think returning the
> > > > error code up should be handled correctly. What about the cases where
> > > > there is no error handling (the other two changes in the patch)?
> > > 
> > > While looking at doing these changes, I realized that I'm not quite sure
> > > what the right approach is here. My original rationale for returning
> > > non-zero was that AIO was requested but could not be completed. I
> > > haven't fully tracked back the calling paths, but I assumed it would get
> > > retried at the top level, and since we indicated to not use AIO on
> > > subsequent calls, it will succeed and use threads then (note, that I do
> > > now realize this means a mismatch between the qemu command-line and the
> > > in-use AIO model).
> > > 
> > > In practice, with my v2 patch, where I do return a non-zero error-code
> > > from this function, qemu does not exit (nor is any logging other than
> > > that I added emitted on the monitor). If I do not fallback, I imagine we
> > > would just continuously see this error message and IO might not actually
> > > every occur? Reworking all of the callpath to fail on non-zero returns
> > > from raw_co_prw() seems like a fair bit of work, but if that is what is
> > > being requested, I can try that (it will just take a while).
> > > Alternatively, I can produce a v3 quickly that does not bubble the
> > > actual errno all the way up (since it does seem like it is ignored
> > > anyways?).
> > 
> > Sorry for the noise, but I had one more thought. Would it be appropriate
> > to push the _setup() call up to when we parse the arguments about
> > aio=native? E.g., we already check there if cache=directsync is
> > specified and error out if not.
> 
> We already do this:

Right, I stated above it already is done, I simply meant adding a second
check here that we can obtain and setup the AIO context successfully.
 
>  /* Currently Linux does AIO only for files opened with O_DIRECT */
> if (s->use_linux_aio && !(s->open_flags & O_DIRECT)) {
> error_setg(errp, "aio=native was specified, but it requires "
>  "cache.direct=on, which was not specified.");
> ret = -EINVAL;
> goto fail;
> }
> 
> laio_init() is about other types of errors. But anyway, yes, calling
> laio_init() already in .bdrv_open() is possible. Returning errors from
> .bdrv_open() is nice and easy and we should do it.

Ack.

> However, we may also need to call laio_init() again when switching to a
> different I/O thread after the image is already opened. This is what I
> meant when I commented on v1 that you should do this in the
> .bdrv_attach_aio_context callback. The problem here is that we can't
> return an error there and the guest is already using the image. In this
> case, logging an error and falling back to the thread pool seems to be
> the best option we have.

Is this is a request for new functionality? Just trying to understand,
because aiui, block/file-posix.c does not implement the
bdrv_attach_aio_context callback currently. Instead, aio_get_linux_aio()
is called from three places, raw_co_prw, raw_aio_plug and
raw_aio_unplug, which calls into laio_init() and
laio_attach_aio_context(). I can add the callback you suggest with
appropriate error handling (I suppose it would point to
laio_attach_aio_context, possibly with some modifications) and remove
the call from aio_get_linux_aio()? Just trying to understand 

Re: [Qemu-block] [PATCH v5 5/6] qapi: new qmp command nbd-server-add-bitmap

2018-06-20 Thread Eric Blake

On 06/20/2018 09:13 AM, Vladimir Sementsov-Ogievskiy wrote:

20.06.2018 14:26, Eric Blake wrote:

On 06/09/2018 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  qapi/block.json | 23 +++
  blockdev-nbd.c  | 23 +++
  2 files changed, 46 insertions(+)


I'm tempted to temporarily name this x-nbd-server-add-bitmap, until I 
have the counterpart Libvirt patches tested, just in case testing 
turns up any tweaks we need to the interface.





+# @bitmap-export-name: How the bitmap will be seen by nbd clients
+#  (default @bitmap)


Do we really need the flexibility of naming the bitmap differently to 
the NBD client than we do in qemu?


It was needed for our backup-api implementation. Nikolay?


In fact, if we include the ability now, we can't remove it later (unless 
we use the x- prefix); but if we omit it now, we can add it later 
(regardless of using the x- prefix).  So I've gone ahead and added the 
x- prefix; here's what I'm squashing, along with:


Reviewed-by: Eric Blake 

diff --git i/qapi/block.json w/qapi/block.json
index ddbca2e286c..ca807f176ae 100644
--- i/qapi/block.json
+++ w/qapi/block.json
@@ -269,7 +269,7 @@
   'data': {'name': 'str', '*mode': 'NbdServerRemoveMode'} }

 ##
-# @nbd-server-add-bitmap:
+# @x-nbd-server-add-bitmap:
 #
 # Expose a dirty bitmap associated with the selected export. The 
bitmap search
 # starts at the device attached to the export, and includes all 
backing files.

@@ -288,7 +288,7 @@
 #
 # Since: 3.0
 ##
-  { 'command': 'nbd-server-add-bitmap',
+  { 'command': 'x-nbd-server-add-bitmap',
 'data': {'name': 'str', 'bitmap': 'str', '*bitmap-export-name': 
'str'} }


 ##
diff --git i/blockdev-nbd.c w/blockdev-nbd.c
index 6b0c50732c1..1ef11041a73 100644
--- i/blockdev-nbd.c
+++ w/blockdev-nbd.c
@@ -221,10 +221,10 @@ void qmp_nbd_server_stop(Error **errp)
 nbd_server = NULL;
 }

-void qmp_nbd_server_add_bitmap(const char *name, const char *bitmap,
-   bool has_bitmap_export_name,
-   const char *bitmap_export_name,
-   Error **errp)
+void qmp_x_nbd_server_add_bitmap(const char *name, const char *bitmap,
+ bool has_bitmap_export_name,
+ const char *bitmap_export_name,
+ Error **errp)
 {
 NBDExport *exp;



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH v5 4/6] nbd/server: implement dirty bitmap export

2018-06-20 Thread Eric Blake

On 06/20/2018 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:

20.06.2018 19:27, Eric Blake wrote:

On 06/09/2018 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:

Handle new NBD meta namespace: "qemu", and corresponding queries:
"qemu:dirty-bitmap:".

With new metadata context negotiated, BLOCK_STATUS query will reply
with dirty-bitmap data, converted to extents. New public function
nbd_export_bitmap selects bitmap to export. For now, only one bitmap
may be exported.




 static int nbd_co_send_extents(NBDClient *client, uint64_t handle,
-   NBDExtent *extents, unsigned nb_extents,
+   NBDExtent *extents, unsigned int 
nb_extents,

+   uint64_t length, bool last,
    uint32_t context_id, Error **errp)
 {
 NBDStructuredMeta chunk;
@@ -1890,7 +1897,9 @@ static int nbd_co_send_extents(NBDClient 
*client, uint64_t handle,
 {.iov_base = extents, .iov_len = nb_extents * 
sizeof(extents[0])}

 };

-    set_be_chunk(, 0, NBD_REPLY_TYPE_BLOCK_STATUS,
+    trace_nbd_co_send_extents(handle, nb_extents, context_id, length, 
last);


hm, length variable added only to be traced.. Ok, but a bit strange.


Yes. It doesn't affect what goes over the wire, but when it comes to 
tracing, knowing the sum of the extents can be quite helpful (especially 
knowing if the server's reply is shorter or longer than the client's 
request, and the fact that when two or more contexts are negotiated by 
the client, the different contexts can have different length replies)



+static unsigned int bitmap_to_extents(BdrvDirtyBitmap *bitmap, 
uint64_t offset,
+  uint64_t *length, NBDExtent 
*extents,


length - in-out? worth comment?


Sure.




+ unsigned int nb_extents,
+  bool dont_fragment)
 {
 uint64_t begin = offset, end;
-    uint64_t overall_end = offset + length;
-    unsigned i = 0;
+    uint64_t overall_end = offset + *length;
+    unsigned int i = 0;
 BdrvDirtyBitmapIter *it;
 bool dirty;

@@ -1983,23 +1994,25 @@ static unsigned 
bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset,


 bdrv_dirty_bitmap_unlock(bitmap);

+    *length = end - begin;


hm, it will always be zero, as at the end of the previous loop we have 
"begin = end;"


Ah, then it should be end - offset. Thanks for the careful review.

In fact, since ONLY the final extent can be larger than 2G (all earlier 
extents were short of the overall_end, and the incoming length is 
32-bit), but the NBD protocol says that at most one extent can go beyond 
the original request, we do NOT want to split a >2G extent into multiple 
extents, but rather cap it to just shy of 4G at the granularity offered 
by the bitmap itself.  At which point add_extents() never returns more 
than 1, and can just be inlined.





return i;
 }

 static int nbd_co_send_bitmap(NBDClient *client, uint64_t handle,
   BdrvDirtyBitmap *bitmap, uint64_t offset,
-  uint64_t length, bool dont_fragment,
+  uint32_t length, bool dont_fragment,
   uint32_t context_id, Error **errp)
 {
 int ret;
-    unsigned nb_extents = dont_fragment ? 1 : NBD_MAX_BITMAP_EXTENTS;
+    unsigned int nb_extents = dont_fragment ? 1 : 
NBD_MAX_BITMAP_EXTENTS;

 NBDExtent *extents = g_new(NBDExtent, nb_extents);
+    uint64_t final_length = length;

-    nb_extents = bitmap_to_extents(bitmap, offset, length, extents, 
nb_extents,

-   dont_fragment);
+    nb_extents = bitmap_to_extents(bitmap, offset, _length, 
extents,

+   nb_extents, dont_fragment);

-    ret = nbd_co_send_extents(client, handle, extents, nb_extents, 
context_id,

-  errp);
+    ret = nbd_co_send_extents(client, handle, extents, nb_extents,
+  final_length, true, context_id, errp);


hmm in-out pointer field only to trace final_length? may be just trace 
it in bitmap_to_extents?


No, because base:allocation also benefits from tracing final_length.



also, it's not trivial, that the function now sends FLAG_DONE. I think, 
worth add a comment, or
a parameter like for nbd_co_send_block_status. It will simplify 
introducing new contexts in future.


Do we anticipate adding any in the near future? But adding a parameter 
that is always true so that the callsite becomes more obvious on when to 
pass the done flag may indeed make future additions easier to spot in 
one place.


So here's what else I'll squash in:

diff --git i/nbd/server.c w/nbd/server.c
index e84aea6cf03..7a96b296839 100644
--- i/nbd/server.c
+++ w/nbd/server.c
@@ -1881,9 +1881,10 @@ static int 
blockstatus_to_extent_be(BlockDriverState *bs, uint64_t offset,


 /* nbd_co_send_extents
  *
- * @last controls whether NBD_REPLY_FLAG_DONE is sent.
- *

Re: [Qemu-block] [PATCH] block/qcow2: fix logic around dirty_bitmaps_loaded

2018-06-20 Thread Vladimir Sementsov-Ogievskiy

20.06.2018 20:17, Vladimir Sementsov-Ogievskiy wrote:

13.06.2018 01:18, John Snow wrote:


On 06/12/2018 06:11 PM, John Snow wrote:


On 06/12/2018 01:26 PM, Vladimir Sementsov-Ogievskiy wrote:

First: this variable was introduced to handle reopens. We need it on
following qcow2_do_open, to don't try loading bitmaps again. So, we 
are

fixing qcow2_invalidate_cache().

However, if we fix only qcow2_invalidate_cache, iotest 169 fails on
case test__persistent__not_migbitmap__online_shared, because on first
target open, source is not yet closed, and is not yet stored bitmaps,
so, we are load nothing. And on second open, dirty_bitmaps_loaded is
already true, but we have no bitmaps to reopen with
qcow2_reopen_bitmaps_rw_hint(). So, we are fixing qcow2_do_open() too.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/qcow2.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 6fa5e1d71a..b4216a5830 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1499,7 +1499,7 @@ static int coroutine_fn 
qcow2_do_open(BlockDriverState *bs, QDict *options,

  {
  qcow2_reopen_bitmaps_rw_hint(bs, _updated, 
_err);

  }
-    } else {
+    } else if (s->nb_bitmaps > 0) {
  header_updated = qcow2_load_dirty_bitmaps(bs, _err);
  s->dirty_bitmaps_loaded = true;
  }
@@ -2178,6 +2178,7 @@ static void coroutine_fn 
qcow2_co_invalidate_cache(BlockDriverState *bs,

  QDict *options;
  Error *local_err = NULL;
  int ret;
+    bool dirty_bitmaps_loaded = s->dirty_bitmaps_loaded;
    /*
   * Backing files are read-only which makes all of their 
metadata immutable,
@@ -2190,6 +2191,7 @@ static void coroutine_fn 
qcow2_co_invalidate_cache(BlockDriverState *bs,

  qcow2_close(bs);
    memset(s, 0, sizeof(BDRVQcow2State));
+    s->dirty_bitmaps_loaded = dirty_bitmaps_loaded;
  options = qdict_clone_shallow(bs->options);
    flags &= ~BDRV_O_INACTIVE;


Ah, tch... I didn't realize that invalidate completely wipes the qcow2
metadata.

I guess the other three fields, nb_bitmaps, bitmap_directory_size and
bitmap_directory_offset get initialized again on re-open.

(I suppose this means I need to rethink caching bm_list more carefully,
too...)

I think this looks correct mechanically.


Oh, I meant:

Reviewed-by: John Snow 


It is interesting, this patch also fixes iotest 169. It's not broken 
formally, but if we look at vm output of vm_b after migration, we'll see


qemu-system-x86_64: Could not reopen qcow2 layer: Bitmap already 
exists: bitmap0


which leads to failed invalidation and bs->drv = NULL :)

I'll improve iotest somehow, to show this failure.



the bug can be shown with the following workaround:

diff --git a/tests/qemu-iotests/169 b/tests/qemu-iotests/169
index f243db9955..6dce24d29f 100755
--- a/tests/qemu-iotests/169
+++ b/tests/qemu-iotests/169
@@ -131,9 +131,13 @@ class TestDirtyBitmapMigration(iotests.QMPTestCase):
 break

 self.check_bitmap(self.vm_b, sha256 if should_migrate else False)
+    self.vm_b.hmp_qemu_io('drive0', 'write 0 512')

 if should_migrate:
 self.vm_b.shutdown()
+    print ''
+    print self.vm_b.get_log()
+    print ''
 # recreate vm_b, as we don't want -incoming option (this 
will lead

 # to "cat" process left alive after test finish)
 self.vm_b = iotests.VM(path_suffix='b')
@@ -152,7 +156,8 @@ for cmb in list(itertools.product((True, False), 
repeat=4)):

 name += '_online' if cmb[2] else '_offline'
 name += '_shared' if cmb[3] else '_nonshared'

-    inject_test_case(TestDirtyBitmapMigration, name, 'do_test_migration',
+    if name == '_persistent__not_migbitmap__offline_shared':
+    inject_test_case(TestDirtyBitmapMigration, name, 
'do_test_migration',

  *list(cmb))




--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH] block/qcow2: fix logic around dirty_bitmaps_loaded

2018-06-20 Thread Vladimir Sementsov-Ogievskiy

13.06.2018 01:18, John Snow wrote:


On 06/12/2018 06:11 PM, John Snow wrote:


On 06/12/2018 01:26 PM, Vladimir Sementsov-Ogievskiy wrote:

First: this variable was introduced to handle reopens. We need it on
following qcow2_do_open, to don't try loading bitmaps again. So, we are
fixing qcow2_invalidate_cache().

However, if we fix only qcow2_invalidate_cache, iotest 169 fails on
case test__persistent__not_migbitmap__online_shared, because on first
target open, source is not yet closed, and is not yet stored bitmaps,
so, we are load nothing. And on second open, dirty_bitmaps_loaded is
already true, but we have no bitmaps to reopen with
qcow2_reopen_bitmaps_rw_hint(). So, we are fixing qcow2_do_open() too.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/qcow2.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 6fa5e1d71a..b4216a5830 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1499,7 +1499,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState 
*bs, QDict *options,
  {
  qcow2_reopen_bitmaps_rw_hint(bs, _updated, _err);
  }
-} else {
+} else if (s->nb_bitmaps > 0) {
  header_updated = qcow2_load_dirty_bitmaps(bs, _err);
  s->dirty_bitmaps_loaded = true;
  }
@@ -2178,6 +2178,7 @@ static void coroutine_fn 
qcow2_co_invalidate_cache(BlockDriverState *bs,
  QDict *options;
  Error *local_err = NULL;
  int ret;
+bool dirty_bitmaps_loaded = s->dirty_bitmaps_loaded;
  
  /*

   * Backing files are read-only which makes all of their metadata 
immutable,
@@ -2190,6 +2191,7 @@ static void coroutine_fn 
qcow2_co_invalidate_cache(BlockDriverState *bs,
  qcow2_close(bs);
  
  memset(s, 0, sizeof(BDRVQcow2State));

+s->dirty_bitmaps_loaded = dirty_bitmaps_loaded;
  options = qdict_clone_shallow(bs->options);
  
  flags &= ~BDRV_O_INACTIVE;



Ah, tch... I didn't realize that invalidate completely wipes the qcow2
metadata.

I guess the other three fields, nb_bitmaps, bitmap_directory_size and
bitmap_directory_offset get initialized again on re-open.

(I suppose this means I need to rethink caching bm_list more carefully,
too...)

I think this looks correct mechanically.


Oh, I meant:

Reviewed-by: John Snow 


It is interesting, this patch also fixes iotest 169. It's not broken 
formally, but if we look at vm output of vm_b after migration, we'll see


qemu-system-x86_64: Could not reopen qcow2 layer: Bitmap already exists: 
bitmap0


which leads to failed invalidation and bs->drv = NULL :)

I'll improve iotest somehow, to show this failure.

--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH v5 4/6] nbd/server: implement dirty bitmap export

2018-06-20 Thread Vladimir Sementsov-Ogievskiy

20.06.2018 19:27, Eric Blake wrote:

On 06/09/2018 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:

Handle new NBD meta namespace: "qemu", and corresponding queries:
"qemu:dirty-bitmap:".

With new metadata context negotiated, BLOCK_STATUS query will reply
with dirty-bitmap data, converted to extents. New public function
nbd_export_bitmap selects bitmap to export. For now, only one bitmap
may be exported.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/nbd.h |   6 ++
  nbd/server.c    | 271 


  2 files changed, 259 insertions(+), 18 deletions(-)


I'm squashing this in, and adding:
Reviewed-by: Eric Blake 

(Grammar fixes, prefer 'unsigned int' over 'unsigned', add 'last' flag 
for avoiding separate chunk in reply, shorter variable name to fit 80 
columns, tweak 'length' tracking in order to add a trace before 
sending a status reply)


diff --git i/include/block/nbd.h w/include/block/nbd.h
index a653d0fba79..8bb9606c39b 100644
--- i/include/block/nbd.h
+++ w/include/block/nbd.h
@@ -229,13 +229,11 @@ enum {
 #define NBD_REPLY_TYPE_ERROR NBD_REPLY_ERR(1)
 #define NBD_REPLY_TYPE_ERROR_OFFSET  NBD_REPLY_ERR(2)

-/* Flags for extents (NBDExtent.flags) of NBD_REPLY_TYPE_BLOCK_STATUS,
- * for base:allocation meta context */
+/* Extent flags for base:allocation in NBD_REPLY_TYPE_BLOCK_STATUS */
 #define NBD_STATE_HOLE (1 << 0)
 #define NBD_STATE_ZERO (1 << 1)

-/* Flags for extents (NBDExtent.flags) of NBD_REPLY_TYPE_BLOCK_STATUS,
- * for qemu:dirty-bitmap:* meta contexts */
+/* Extent flags for qemu:dirty-bitmap in NBD_REPLY_TYPE_BLOCK_STATUS */
 #define NBD_STATE_DIRTY (1 << 0)

 static inline bool nbd_reply_type_is_error(int type)
diff --git i/nbd/server.c w/nbd/server.c
index 6f662bd4c7f..e84aea6cf03 100644
--- i/nbd/server.c
+++ w/nbd/server.c
@@ -25,9 +25,10 @@
 #define NBD_META_ID_BASE_ALLOCATION 0
 #define NBD_META_ID_DIRTY_BITMAP 1

-/* NBD_MAX_BITMAP_EXTENTS: 1 mb of extents data. An empirical 
constant. If need
- * to increase, note that NBD protocol defines 32 mb as a limit, 
after which the

- * reply may be considered as a denial of service attack. */
+/* NBD_MAX_BITMAP_EXTENTS: 1 mb of extents data. An empirical
+ * constant. If an increase is needed, note that the NBD protocol
+ * recommends no larger than 32 mb, so that the client won't consider
+ * the reply as a denial of service attack. */
 #define NBD_MAX_BITMAP_EXTENTS (0x10 / 8)

 static int system_errno_to_nbd_errno(int err)
@@ -101,7 +102,7 @@ typedef struct NBDExportMetaContexts {
 bool valid; /* means that negotiation of the option finished without
    errors */
 bool base_allocation; /* export base:allocation context (block 
status) */
-    bool dirty_bitmap; /* export qemu:dirty-bitmap:name> */

+    bool bitmap; /* export qemu:dirty-bitmap: */
 } NBDExportMetaContexts;

 struct NBDClient {
@@ -836,16 +837,17 @@ static int nbd_meta_qemu_query(NBDClient 
*client, NBDExportMetaContexts *meta,

    uint32_t len, Error **errp)
 {
 bool dirty_bitmap = false;
-    int dirty_bitmap_len = strlen("dirty-bitmap:");
+    size_t dirty_bitmap_len = strlen("dirty-bitmap:");
 int ret;

 if (!meta->exp->export_bitmap) {
+    trace_nbd_negotiate_meta_query_skip("no dirty-bitmap exported");
 return nbd_opt_skip(client, len, errp);
 }

 if (len == 0) {
 if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
-    meta->dirty_bitmap = true;
+    meta->bitmap = true;
 }
 trace_nbd_negotiate_meta_query_parse("empty");
 return 1;
@@ -870,7 +872,7 @@ static int nbd_meta_qemu_query(NBDClient *client, 
NBDExportMetaContexts *meta,


 return nbd_meta_empty_or_pattern(
 client, meta->exp->export_bitmap_context +
-    strlen("qemu:dirty_bitmap:"), len, >dirty_bitmap, 
errp);

+    strlen("qemu:dirty_bitmap:"), len, >bitmap, errp);
 }

 /* nbd_negotiate_meta_query
@@ -888,11 +890,14 @@ static int nbd_meta_qemu_query(NBDClient 
*client, NBDExportMetaContexts *meta,

 static int nbd_negotiate_meta_query(NBDClient *client,
NBDExportMetaContexts *meta, Error **errp)
 {
+    /*
+ * Both 'qemu' and 'base' namespaces have length = 5 including a
+ * colon. If another length namespace is later introduced, this
+ * should certainly be refactored.
+ */
 int ret;
 size_t ns_len = 5;
-    char ns[5]; /* Both 'qemu' and 'base' namespaces have length = 5 
including a
-   colon. If it's needed to introduce a namespace of 
the other

-   length, this should be certainly refactored. */
+    char ns[5];
 uint32_t len;

 ret = nbd_opt_read(client, , sizeof(len), errp);
@@ -991,7 +996,7 @@ static int nbd_negotiate_meta_queries(NBDClient 
*client,

 }
 }

-    if (meta->dirty_bitmap) {
+    if (meta->bitmap) {
 ret = nbd_negotiate_send_meta_context(client,


Re: [Qemu-block] [PATCH v5 4/6] nbd/server: implement dirty bitmap export

2018-06-20 Thread Eric Blake

On 06/09/2018 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:

Handle new NBD meta namespace: "qemu", and corresponding queries:
"qemu:dirty-bitmap:".

With new metadata context negotiated, BLOCK_STATUS query will reply
with dirty-bitmap data, converted to extents. New public function
nbd_export_bitmap selects bitmap to export. For now, only one bitmap
may be exported.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/nbd.h |   6 ++
  nbd/server.c| 271 
  2 files changed, 259 insertions(+), 18 deletions(-)


I'm squashing this in, and adding:
Reviewed-by: Eric Blake 

(Grammar fixes, prefer 'unsigned int' over 'unsigned', add 'last' flag 
for avoiding separate chunk in reply, shorter variable name to fit 80 
columns, tweak 'length' tracking in order to add a trace before sending 
a status reply)


diff --git i/include/block/nbd.h w/include/block/nbd.h
index a653d0fba79..8bb9606c39b 100644
--- i/include/block/nbd.h
+++ w/include/block/nbd.h
@@ -229,13 +229,11 @@ enum {
 #define NBD_REPLY_TYPE_ERROR NBD_REPLY_ERR(1)
 #define NBD_REPLY_TYPE_ERROR_OFFSET  NBD_REPLY_ERR(2)

-/* Flags for extents (NBDExtent.flags) of NBD_REPLY_TYPE_BLOCK_STATUS,
- * for base:allocation meta context */
+/* Extent flags for base:allocation in NBD_REPLY_TYPE_BLOCK_STATUS */
 #define NBD_STATE_HOLE (1 << 0)
 #define NBD_STATE_ZERO (1 << 1)

-/* Flags for extents (NBDExtent.flags) of NBD_REPLY_TYPE_BLOCK_STATUS,
- * for qemu:dirty-bitmap:* meta contexts */
+/* Extent flags for qemu:dirty-bitmap in NBD_REPLY_TYPE_BLOCK_STATUS */
 #define NBD_STATE_DIRTY (1 << 0)

 static inline bool nbd_reply_type_is_error(int type)
diff --git i/nbd/server.c w/nbd/server.c
index 6f662bd4c7f..e84aea6cf03 100644
--- i/nbd/server.c
+++ w/nbd/server.c
@@ -25,9 +25,10 @@
 #define NBD_META_ID_BASE_ALLOCATION 0
 #define NBD_META_ID_DIRTY_BITMAP 1

-/* NBD_MAX_BITMAP_EXTENTS: 1 mb of extents data. An empirical constant. 
If need
- * to increase, note that NBD protocol defines 32 mb as a limit, after 
which the

- * reply may be considered as a denial of service attack. */
+/* NBD_MAX_BITMAP_EXTENTS: 1 mb of extents data. An empirical
+ * constant. If an increase is needed, note that the NBD protocol
+ * recommends no larger than 32 mb, so that the client won't consider
+ * the reply as a denial of service attack. */
 #define NBD_MAX_BITMAP_EXTENTS (0x10 / 8)

 static int system_errno_to_nbd_errno(int err)
@@ -101,7 +102,7 @@ typedef struct NBDExportMetaContexts {
 bool valid; /* means that negotiation of the option finished without
errors */
 bool base_allocation; /* export base:allocation context (block 
status) */

-bool dirty_bitmap; /* export qemu:dirty-bitmap: */
+bool bitmap; /* export qemu:dirty-bitmap: */
 } NBDExportMetaContexts;

 struct NBDClient {
@@ -836,16 +837,17 @@ static int nbd_meta_qemu_query(NBDClient *client, 
NBDExportMetaContexts *meta,

uint32_t len, Error **errp)
 {
 bool dirty_bitmap = false;
-int dirty_bitmap_len = strlen("dirty-bitmap:");
+size_t dirty_bitmap_len = strlen("dirty-bitmap:");
 int ret;

 if (!meta->exp->export_bitmap) {
+trace_nbd_negotiate_meta_query_skip("no dirty-bitmap exported");
 return nbd_opt_skip(client, len, errp);
 }

 if (len == 0) {
 if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
-meta->dirty_bitmap = true;
+meta->bitmap = true;
 }
 trace_nbd_negotiate_meta_query_parse("empty");
 return 1;
@@ -870,7 +872,7 @@ static int nbd_meta_qemu_query(NBDClient *client, 
NBDExportMetaContexts *meta,


 return nbd_meta_empty_or_pattern(
 client, meta->exp->export_bitmap_context +
-strlen("qemu:dirty_bitmap:"), len, >dirty_bitmap, errp);
+strlen("qemu:dirty_bitmap:"), len, >bitmap, errp);
 }

 /* nbd_negotiate_meta_query
@@ -888,11 +890,14 @@ static int nbd_meta_qemu_query(NBDClient *client, 
NBDExportMetaContexts *meta,

 static int nbd_negotiate_meta_query(NBDClient *client,
 NBDExportMetaContexts *meta, Error 
**errp)

 {
+/*
+ * Both 'qemu' and 'base' namespaces have length = 5 including a
+ * colon. If another length namespace is later introduced, this
+ * should certainly be refactored.
+ */
 int ret;
 size_t ns_len = 5;
-char ns[5]; /* Both 'qemu' and 'base' namespaces have length = 5 
including a
-   colon. If it's needed to introduce a namespace of 
the other

-   length, this should be certainly refactored. */
+char ns[5];
 uint32_t len;

 ret = nbd_opt_read(client, , sizeof(len), errp);
@@ -991,7 +996,7 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
 }
 }

-if (meta->dirty_bitmap) {
+if (meta->bitmap) {
 ret = nbd_negotiate_send_meta_context(client,


Re: [Qemu-block] [PATCH] qapi: drop x- from x-block-latency-histogram-set

2018-06-20 Thread John Snow



On 06/20/2018 11:28 AM, Vladimir Sementsov-Ogievskiy wrote:
> 20.06.2018 18:27, Vladimir Sementsov-Ogievskiy wrote:
>> Libvirt part is ready, let's drop x- prefix.
> 
> libvirt patches will be sent soon I hope.
> 

OK, can you ping this patch with a link to the series when it is posted?

Thank you,
--js

>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>   qapi/block-core.json | 4 ++--
>>   blockdev.c   | 2 +-
>>   2 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index cc3ede0630..dfaa050651 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -483,7 +483,7 @@
>>     'data': {'boundaries': ['uint64'], 'bins': ['uint64'] } }
>>     ##
>> -# @x-block-latency-histogram-set:
>> +# @block-latency-histogram-set:
>>   #
>>   # Manage read, write and flush latency histograms for the device.
>>   #
>> @@ -547,7 +547,7 @@
>>   #  "arguments": { "device": "drive0" } }
>>   # <- { "return": {} }
>>   ##
>> -{ 'command': 'x-block-latency-histogram-set',
>> +{ 'command': 'block-latency-histogram-set',
>>     'data': {'device': 'str',
>>  '*boundaries': ['uint64'],
>>  '*boundaries-read': ['uint64'],
>> diff --git a/blockdev.c b/blockdev.c
>> index 58d7570932..6d4ae77041 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -4299,7 +4299,7 @@ void qmp_x_blockdev_set_iothread(const char
>> *node_name, StrOrNull *iothread,
>>   aio_context_release(old_context);
>>   }
>>   -void qmp_x_block_latency_histogram_set(
>> +void qmp_block_latency_histogram_set(
>>   const char *device,
>>   bool has_boundaries, uint64List *boundaries,
>>   bool has_boundaries_read, uint64List *boundaries_read,
> 
> 



Re: [Qemu-block] [Qemu-devel] [PATCH v5 4/6] nbd/server: implement dirty bitmap export

2018-06-20 Thread Eric Blake

On 06/20/2018 10:43 AM, Eric Blake wrote:

On 06/20/2018 06:24 AM, Eric Blake wrote:

+/* Set several extents, describing region of given @length with 
given @flags.

+ * Do not set more than @nb_extents, return number of set extents.
+ */
+static unsigned add_extents(NBDExtent *extents, unsigned nb_extents,
+    uint64_t length, uint32_t flags)
+{
+    unsigned i = 0;
+    uint32_t max_extent = QEMU_ALIGN_DOWN(INT32_MAX, BDRV_SECTOR_SIZE);


This is too small of a granularity wrong when the server advertised 4k 
alignment during NBD_OPT_GO; it should probably refer to 
bs->bl.request_alignment.


In fact, we can just use INT32_MAX. The dirty bitmap has a granularity 
at least as large as the sector size, but no smaller than the 
request_alignment. We don't have to worry about alignment here, as the 
extents will already be naturally aligned when converting from the 
bitmap into extents in the caller.


Oh, I see. The NBD protocol can only ask for a length of up to 32 bits, 
but if you learn via bitmap query that the entire rest of the image has 
the same state, you can indeed call add_extents() with a value larger 
than 32 bits, that needs to be fragmented back down into sub-32-bit 
chunks.  INT32_MAX isn't quite right, but rounding may still pick the 
wrong granularity on an export with 4k alignment; however, INT32_MAX + 
1U is just fine, and unlikely to run into alignment issues.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [Qemu-devel] [PATCH v5 4/6] nbd/server: implement dirty bitmap export

2018-06-20 Thread Eric Blake

On 06/20/2018 06:24 AM, Eric Blake wrote:

+/* Set several extents, describing region of given @length with given 
@flags.

+ * Do not set more than @nb_extents, return number of set extents.
+ */
+static unsigned add_extents(NBDExtent *extents, unsigned nb_extents,
+    uint64_t length, uint32_t flags)
+{
+    unsigned i = 0;
+    uint32_t max_extent = QEMU_ALIGN_DOWN(INT32_MAX, BDRV_SECTOR_SIZE);


This is too small of a granularity wrong when the server advertised 4k 
alignment during NBD_OPT_GO; it should probably refer to 
bs->bl.request_alignment.


In fact, we can just use INT32_MAX. The dirty bitmap has a granularity 
at least as large as the sector size, but no smaller than the 
request_alignment. We don't have to worry about alignment here, as the 
extents will already be naturally aligned when converting from the 
bitmap into extents in the caller.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH] qapi: drop x- from x-block-latency-histogram-set

2018-06-20 Thread Vladimir Sementsov-Ogievskiy

20.06.2018 18:27, Vladimir Sementsov-Ogievskiy wrote:

Libvirt part is ready, let's drop x- prefix.


libvirt patches will be sent soon I hope.



Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  qapi/block-core.json | 4 ++--
  blockdev.c   | 2 +-
  2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index cc3ede0630..dfaa050651 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -483,7 +483,7 @@
'data': {'boundaries': ['uint64'], 'bins': ['uint64'] } }
  
  ##

-# @x-block-latency-histogram-set:
+# @block-latency-histogram-set:
  #
  # Manage read, write and flush latency histograms for the device.
  #
@@ -547,7 +547,7 @@
  #  "arguments": { "device": "drive0" } }
  # <- { "return": {} }
  ##
-{ 'command': 'x-block-latency-histogram-set',
+{ 'command': 'block-latency-histogram-set',
'data': {'device': 'str',
 '*boundaries': ['uint64'],
 '*boundaries-read': ['uint64'],
diff --git a/blockdev.c b/blockdev.c
index 58d7570932..6d4ae77041 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -4299,7 +4299,7 @@ void qmp_x_blockdev_set_iothread(const char *node_name, 
StrOrNull *iothread,
  aio_context_release(old_context);
  }
  
-void qmp_x_block_latency_histogram_set(

+void qmp_block_latency_histogram_set(
  const char *device,
  bool has_boundaries, uint64List *boundaries,
  bool has_boundaries_read, uint64List *boundaries_read,



--
Best regards,
Vladimir




[Qemu-block] [PATCH] qapi: drop x- from x-block-latency-histogram-set

2018-06-20 Thread Vladimir Sementsov-Ogievskiy
Libvirt part is ready, let's drop x- prefix.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 qapi/block-core.json | 4 ++--
 blockdev.c   | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index cc3ede0630..dfaa050651 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -483,7 +483,7 @@
   'data': {'boundaries': ['uint64'], 'bins': ['uint64'] } }
 
 ##
-# @x-block-latency-histogram-set:
+# @block-latency-histogram-set:
 #
 # Manage read, write and flush latency histograms for the device.
 #
@@ -547,7 +547,7 @@
 #  "arguments": { "device": "drive0" } }
 # <- { "return": {} }
 ##
-{ 'command': 'x-block-latency-histogram-set',
+{ 'command': 'block-latency-histogram-set',
   'data': {'device': 'str',
'*boundaries': ['uint64'],
'*boundaries-read': ['uint64'],
diff --git a/blockdev.c b/blockdev.c
index 58d7570932..6d4ae77041 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -4299,7 +4299,7 @@ void qmp_x_blockdev_set_iothread(const char *node_name, 
StrOrNull *iothread,
 aio_context_release(old_context);
 }
 
-void qmp_x_block_latency_histogram_set(
+void qmp_block_latency_histogram_set(
 const char *device,
 bool has_boundaries, uint64List *boundaries,
 bool has_boundaries_read, uint64List *boundaries_read,
-- 
2.11.1




[Qemu-block] [PATCH v2 3/3] qcow2: add compress threads

2018-06-20 Thread Vladimir Sementsov-Ogievskiy
Do data compression in separate threads. This significantly improve
performance for qemu-img convert with -W (allow async writes) and -c
(compressed) options.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.h |  3 +++
 block/qcow2.c | 62 ++-
 2 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 01b5250415..0bd21623c2 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -326,6 +326,9 @@ typedef struct BDRVQcow2State {
  * override) */
 char *image_backing_file;
 char *image_backing_format;
+
+CoQueue compress_wait_queue;
+int nb_compress_threads;
 } BDRVQcow2State;
 
 typedef struct Qcow2COWRegion {
diff --git a/block/qcow2.c b/block/qcow2.c
index e431c73e0d..362d9452f4 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -44,6 +44,7 @@
 #include "qapi/qobject-input-visitor.h"
 #include "qapi/qapi-visit-block-core.h"
 #include "crypto.h"
+#include "block/thread-pool.h"
 
 /*
   Differences with QCOW:
@@ -1546,6 +1547,9 @@ static int coroutine_fn qcow2_do_open(BlockDriverState 
*bs, QDict *options,
 qcow2_check_refcounts(bs, , 0);
 }
 #endif
+
+qemu_co_queue_init(>compress_wait_queue);
+
 return ret;
 
  fail:
@@ -3714,6 +3718,62 @@ static ssize_t qcow2_compress(void *dest, const void 
*src, size_t size)
 return ret;
 }
 
+#define MAX_COMPRESS_THREADS 4
+
+typedef struct Qcow2CompressData {
+void *dest;
+const void *src;
+size_t size;
+ssize_t ret;
+} Qcow2CompressData;
+
+static int qcow2_compress_pool_func(void *opaque)
+{
+Qcow2CompressData *data = opaque;
+
+data->ret = qcow2_compress(data->dest, data->src, data->size);
+
+return 0;
+}
+
+static void qcow2_compress_complete(void *opaque, int ret)
+{
+qemu_coroutine_enter(opaque);
+}
+
+/* See qcow2_compress definition for parameters description */
+static ssize_t qcow2_co_compress(BlockDriverState *bs,
+ void *dest, const void *src, size_t size)
+{
+BDRVQcow2State *s = bs->opaque;
+BlockAIOCB *acb;
+ThreadPool *pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
+Qcow2CompressData arg = {
+.dest = dest,
+.src = src,
+.size = size,
+};
+
+while (s->nb_compress_threads >= MAX_COMPRESS_THREADS) {
+qemu_co_queue_wait(>compress_wait_queue, NULL);
+}
+
+s->nb_compress_threads++;
+acb = thread_pool_submit_aio(pool, qcow2_compress_pool_func, ,
+ qcow2_compress_complete,
+ qemu_coroutine_self());
+
+if (!acb) {
+s->nb_compress_threads--;
+return -EINVAL;
+}
+qemu_coroutine_yield();
+s->nb_compress_threads--;
+qemu_co_queue_next(>compress_wait_queue);
+
+return arg.ret;
+}
+
 /* XXX: put compressed sectors first, then all the cluster aligned
tables to avoid losing bytes in alignment */
 static coroutine_fn int
@@ -3757,7 +3817,7 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, 
uint64_t offset,
 
 out_buf = g_malloc(s->cluster_size);
 
-out_len = qcow2_compress(out_buf, buf, s->cluster_size);
+out_len = qcow2_co_compress(bs, out_buf, buf, s->cluster_size);
 if (out_len == -2) {
 ret = -EINVAL;
 goto fail;
-- 
2.11.1




[Qemu-block] [PATCH v2 0/3] qcow2 compress threads

2018-06-20 Thread Vladimir Sementsov-Ogievskiy
Hi all!

Here are compress threads for qcow2, to increase performance of
compressed writes.

v2 changes:

02: fix typo in commit msg
keep "qemu/osdep.h" to be the first included header,
fix comment style

===

I've created the following test:

[]# cat ../gen.sh 
#!/bin/bash

echo 'create pattern-file t_pat'

./qemu-img create -f raw t_pat 1000m
./qemu-io -c 'write -P 0xab 0 1000m' t_pat

echo 'create randod t_rand'

dd if=/dev/urandom of=t_rand bs=1M count=1000

[]# cat ../test.sh 
#!/bin/bash

rm -f t_out

echo 'test pattern-file compression'

time ./qemu-img convert -W -f raw -O qcow2 -c t_pat t_out

rm -f t_out

echo 'test random-file compression'

time ./qemu-img convert -W -f raw -O qcow2 -c t_rand t_out

rm -f t_out


and results before the series (and without -W flag):

test pattern-file compression

real0m16.658s
user0m16.450s
sys 0m0.628s
test random-file compression

real0m24.194s
user0m24.361s
sys 0m0.395s

results with -W flag, after first patch:

test pattern-file compression

real0m16.242s
user0m16.895s
sys 0m0.080s
test random-file compression

real0m23.450s
user0m23.767s
sys 0m1.085s

results with -W flag, after third patch:

test pattern-file compression

real0m5.747s
user0m22.637s
sys 0m0.393s
test random-file compression

real0m8.402s
user0m33.315s
sys 0m0.926s

So, we see significant performance gain. But this of course don't work
without -W flag.

results without -W flag, after third patch:

test pattern-file compression

real0m16.908s
user0m16.775s
sys 0m0.589s
test random-file compression

real0m24.913s
user0m24.586s
sys 0m0.898s

Note: my cpu is 4-cores 8-threads i7-4790

Vladimir Sementsov-Ogievskiy (3):
  qemu-img: allow compressed not-in-order writes
  qcow2: refactor data compression
  qcow2: add compress threads

 block/qcow2.h |   3 ++
 block/qcow2.c | 136 ++
 qemu-img.c|   5 ---
 3 files changed, 112 insertions(+), 32 deletions(-)

-- 
2.11.1




[Qemu-block] [PATCH v2 1/3] qemu-img: allow compressed not-in-order writes

2018-06-20 Thread Vladimir Sementsov-Ogievskiy
No reason to forbid them, and they are needed to improve performance
with compress-threads in further patches.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 qemu-img.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index e1a506f7f6..7651d8172c 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2141,11 +2141,6 @@ static int img_convert(int argc, char **argv)
 goto fail_getopt;
 }
 
-if (!s.wr_in_order && s.compressed) {
-error_report("Out of order write and compress are mutually exclusive");
-goto fail_getopt;
-}
-
 if (tgt_image_opts && !skip_create) {
 error_report("--target-image-opts requires use of -n flag");
 goto fail_getopt;
-- 
2.11.1




Re: [Qemu-block] [PATCH v2 1/6] qemu-nbd: add support for authorization of TLS clients

2018-06-20 Thread Dr. David Alan Gilbert
* Daniel P. Berrangé (berra...@redhat.com) wrote:
> On Wed, Jun 20, 2018 at 03:22:53PM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berra...@redhat.com) wrote:
> > > From: "Daniel P. Berrange" 
> > > 
> > > Currently any client which can complete the TLS handshake is able to use
> > > the NBD server. The server admin can turn on the 'verify-peer' option
> > > for the x509 creds to require the client to provide a x509 certificate.
> > > This means the client will have to acquire a certificate from the CA
> > > before they are permitted to use the NBD server. This is still a fairly
> > > low bar to cross.
> > > 
> > > This adds a '--tls-authz OBJECT-ID' option to the qemu-nbd command which
> > > takes the ID of a previously added 'QAuthZ' object instance. This will
> > > be used to validate the client's x509 distinguished name. Clients
> > > failing the authorization check will not be permitted to use the NBD
> > > server.
> > > 
> > > For example to setup authorization that only allows connection from a 
> > > client
> > > whose x509 certificate distinguished name is
> > > 
> > >CN=laptop.example.com,O=Example Org,L=London,ST=London,C=GB
> > > 
> > > use:
> > > 
> > >   qemu-nbd --object tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\
> > > endpoint=server,verify-peer=yes \
> > >--object 
> > > authz-simple,id=auth0,identity=CN=laptop.example.com,,\
> > > O=Example Org,,L=London,,ST=London,,C=GB \
> > 
> > I'm confused about how that gets parsed, what differentiates the ,s
> > that separate the arguments (e.g. ,id=  ,identity=) and the ,s that
> > separate the options within the identity string (e.g. the ,ST=London)
> 
> That's why I've doubled up - eg ',,' must be used when you need to
> include a literal ',' in a value without it being interpreted as
> starting a new option

OK, yeh I forgot about the obscure double-comma rule.

> > Would:
> >   --object authz-simple,identity=CN=laptop.example.com,,O=Example 
> > Org,,L=London,,ST=London,,C=GB,id=auth0
> > 
> > be equivalent?
> 
> Yes

OK.

Dave

> 
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



[Qemu-block] [PATCH v2 2/3] qcow2: refactor data compression

2018-06-20 Thread Vladimir Sementsov-Ogievskiy
Make a separate function for compression to be parallelized later.
 - use .avail_out field instead of .next_out to calculate size of
   compressed data. It looks more natural and it allows to keep dest to
   be void pointer
 - set avail_out to be at least one byte less than input, to be sure
   avoid inefficient compression earlier

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.c | 76 ++-
 1 file changed, 49 insertions(+), 27 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 945132f692..e431c73e0d 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -23,11 +23,14 @@
  */
 
 #include "qemu/osdep.h"
+
+#define ZLIB_CONST
+#include 
+
 #include "block/block_int.h"
 #include "block/qdict.h"
 #include "sysemu/block-backend.h"
 #include "qemu/module.h"
-#include 
 #include "qcow2.h"
 #include "qemu/error-report.h"
 #include "qapi/error.h"
@@ -3671,6 +3674,46 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t 
offset,
 return 0;
 }
 
+/*
+ * qcow2_compress()
+ *
+ * @dest - destination buffer, at least of @size-1 bytes
+ * @src - source buffer, @size bytes
+ *
+ * Returns: compressed size on success
+ *  -1 if compression is inefficient
+ *  -2 on any other error
+ */
+static ssize_t qcow2_compress(void *dest, const void *src, size_t size)
+{
+ssize_t ret;
+z_stream strm;
+
+/* best compression, small window, no zlib header */
+memset(, 0, sizeof(strm));
+ret = deflateInit2(, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
+   -12, 9, Z_DEFAULT_STRATEGY);
+if (ret != 0) {
+return -2;
+}
+
+strm.avail_in = size;
+strm.next_in = src;
+strm.avail_out = size - 1;
+strm.next_out = dest;
+
+ret = deflate(, Z_FINISH);
+if (ret == Z_STREAM_END) {
+ret = size - 1 - strm.avail_out;
+} else {
+ret = (ret == Z_OK ? -1 : -2);
+}
+
+deflateEnd();
+
+return ret;
+}
+
 /* XXX: put compressed sectors first, then all the cluster aligned
tables to avoid losing bytes in alignment */
 static coroutine_fn int
@@ -3680,8 +3723,8 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, 
uint64_t offset,
 BDRVQcow2State *s = bs->opaque;
 QEMUIOVector hd_qiov;
 struct iovec iov;
-z_stream strm;
-int ret, out_len;
+int ret;
+size_t out_len;
 uint8_t *buf, *out_buf;
 int64_t cluster_offset;
 
@@ -3714,32 +3757,11 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, 
uint64_t offset,
 
 out_buf = g_malloc(s->cluster_size);
 
-/* best compression, small window, no zlib header */
-memset(, 0, sizeof(strm));
-ret = deflateInit2(, Z_DEFAULT_COMPRESSION,
-   Z_DEFLATED, -12,
-   9, Z_DEFAULT_STRATEGY);
-if (ret != 0) {
+out_len = qcow2_compress(out_buf, buf, s->cluster_size);
+if (out_len == -2) {
 ret = -EINVAL;
 goto fail;
-}
-
-strm.avail_in = s->cluster_size;
-strm.next_in = (uint8_t *)buf;
-strm.avail_out = s->cluster_size;
-strm.next_out = out_buf;
-
-ret = deflate(, Z_FINISH);
-if (ret != Z_STREAM_END && ret != Z_OK) {
-deflateEnd();
-ret = -EINVAL;
-goto fail;
-}
-out_len = strm.next_out - out_buf;
-
-deflateEnd();
-
-if (ret != Z_STREAM_END || out_len >= s->cluster_size) {
+} else if (out_len == -1) {
 /* could not compress: write normal cluster */
 ret = qcow2_co_pwritev(bs, offset, bytes, qiov, 0);
 if (ret < 0) {
-- 
2.11.1




Re: [Qemu-block] [PATCH v2 1/6] qemu-nbd: add support for authorization of TLS clients

2018-06-20 Thread Eric Blake

On 06/20/2018 09:22 AM, Dr. David Alan Gilbert wrote:


For example to setup authorization that only allows connection from a client
whose x509 certificate distinguished name is

CN=laptop.example.com,O=Example Org,L=London,ST=London,C=GB

use:

   qemu-nbd --object tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\
 endpoint=server,verify-peer=yes \
--object authz-simple,id=auth0,identity=CN=laptop.example.com,,\
 O=Example Org,,L=London,,ST=London,,C=GB \


I'm confused about how that gets parsed, what differentiates the ,s
that separate the arguments (e.g. ,id=  ,identity=) and the ,s that
separate the options within the identity string (e.g. the ,ST=London)
Would:
   --object authz-simple,identity=CN=laptop.example.com,,O=Example 
Org,,L=London,,ST=London,,C=GB,id=auth0

be equivalent?


Yes, once you take care of quoting the space and unfolding indentation. 
Our standard QemuOpt parser treats ',,' as a literal comma, and all 
other ',' as separating args.  So either form is ultimately parsed as:


--object
  [type=]"authz-simple"
  id="auth0"
  identity="CN=laptop.example.com,O=Example Org,L=London,ST=London,C=GB"

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] bug in reopen arch

2018-06-20 Thread Vladimir Sementsov-Ogievskiy

Kevin?

15.06.2018 21:42, Vladimir Sementsov-Ogievskiy wrote:

14.06.2018 13:46, Kevin Wolf wrote:

Am 12.06.2018 um 20:57 hat Vladimir Sementsov-Ogievskiy geschrieben:

Hi all!

I've faced the following problem:

 1. create image with dirty bitmap, a.qcow2 (start qemu and run qmp
 command block-dirty-bitmap-add)

 2. run the following commands:

 qemu-img create -f qcow2 -b a.qcow2 b.qcow2 10M
 qemu-io -c 'write 0 512' b.qcow2
 qemu-img commit b.qcow2

 3. last command fails with the following output:

Formatting 'b.qcow2', fmt=qcow2 size=68719476736 backing_file=a.qcow2
cluster_size=65536 lazy_refcounts=off refcount_bits=16
wrote 512/512 bytes at offset 0
512 bytes, 1 ops; 0.0953 sec (5.243 KiB/sec and 10.4867 ops/sec)
qemu-img: #block397: Failed to make dirty bitmaps writable: Can't 
update

bitmap directory: Operation not permitted
qemu-img: Block job failed: Operation not permitted

And problem is that children are reopened _after_ parent. But qcow2 
reopen

needs write access to its file, to write IN_USE flag to dirty-bitmaps
extension.

I was aware of a different instance of this problem: Assume a qcow2
image with an unknown autoclear flag (so it will be cleared on r/w
open), which is first opened r/o and then reopened r/w. This will fail
because .bdrv_reopen_prepare doesn't have the permissions yet.


Hm.. If I understand correctly qcow2_reopen_prepare doesn't deal with  
autoclear flags, as it doesn't call qcow2_do_open.




Simply changing the order won't fix this because in the r/w -> r/o, the
driver will legitimately flush its caches in .bdrv_reopen_prepare, and
for this it still needs to be able to write.

We may need to have a way for nodes to access both the old and the new
state of their children. I'm not completely sure how to achieve this
best, though.

When I thought only of permissions, the obvious and simple thing to do
was to just get combined permissions for the old and new state, i.e.
'old_perm | new_perm' and 'old_shared & new_shared'. But I don't think
this is actually enough when the child node switches between a r/w and
a r/o file descriptor because even though QEMU's permission system would
allow the write, you still can't successfully write to a r/o file
descriptor.

Kevin


Maybe we want two .bdrv_reopen_prepare: 
.bdrv_reopen_prepare_before_children and 
.bdrv_reopen_prepare_after_children. But to write something in 
reopen_prepare, we need to move bdrv_set_perm from reopen_commit to 
reopen_prepare.. Is it possible?


Now, I've found the following workaround, what do you think about 
something like this as a temporary fix:


diff --git a/include/block/block.h b/include/block/block.h
index e677080c4e..c21392491d 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -266,7 +266,8 @@ BlockDriverState *bdrv_new_open_driver(BlockDriver 
*drv, const char *node_name,

 BlockReopenQueue *bdrv_reopen_queue(BlockReopenQueue *bs_queue,
 BlockDriverState *bs,
 QDict *options, int flags);
-int bdrv_reopen_multiple(AioContext *ctx, BlockReopenQueue *bs_queue, 
Error **errp);

+int bdrv_reopen_multiple(AioContext *ctx, BlockReopenQueue *bs_queue,
+ bool cheat_reopen_rw, Error **errp);
 int bdrv_reopen(BlockDriverState *bs, int bdrv_flags, Error **errp);
 int bdrv_reopen_prepare(BDRVReopenState *reopen_state,
 BlockReopenQueue *queue, Error **errp);
diff --git a/block.c b/block.c
index 50887087f3..9b50828cd2 100644
--- a/block.c
+++ b/block.c
@@ -2988,7 +2988,8 @@ BlockReopenQueue 
*bdrv_reopen_queue(BlockReopenQueue *bs_queue,

  * All affected nodes must be drained between bdrv_reopen_queue() and
  * bdrv_reopen_multiple().
  */
-int bdrv_reopen_multiple(AioContext *ctx, BlockReopenQueue *bs_queue, 
Error **errp)

+int bdrv_reopen_multiple(AioContext *ctx, BlockReopenQueue *bs_queue,
+ bool cheat_reopen_rw, Error **errp)
 {
 int ret = -1;
 BlockReopenQueueEntry *bs_entry, *next;
@@ -3005,6 +3006,14 @@ int bdrv_reopen_multiple(AioContext *ctx, 
BlockReopenQueue *bs_queue, Error **er

 bs_entry->prepared = true;
 }

+    if (cheat_reopen_rw) {
+    /* reverse queue, to reopen children first */
+    QSIMPLEQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
+    QSIMPLEQ_REMOVE(bs_queue, bs_entry, 
BlockReopenQueueEntry, entry);

+    QSIMPLEQ_INSERT_HEAD(bs_queue, bs_entry, entry);
+    }
+    }
+
 /* If we reach this point, we have success and just need to apply 
the

  * changes
  */
@@ -3036,11 +3045,13 @@ int bdrv_reopen(BlockDriverState *bs, int 
bdrv_flags, Error **errp)

 int ret = -1;
 Error *local_err = NULL;
 BlockReopenQueue *queue;
+    bool cheat_reopen_rw = bdrv_is_read_only(bs) && (bdrv_flags & 
BDRV_O_RDWR);


 bdrv_subtree_drained_begin(bs);

 queue = bdrv_reopen_queue(NULL, bs, NULL, bdrv_flags);
-    

Re: [Qemu-block] [PATCH v2 1/6] qemu-nbd: add support for authorization of TLS clients

2018-06-20 Thread Daniel P . Berrangé
On Wed, Jun 20, 2018 at 03:22:53PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berra...@redhat.com) wrote:
> > From: "Daniel P. Berrange" 
> > 
> > Currently any client which can complete the TLS handshake is able to use
> > the NBD server. The server admin can turn on the 'verify-peer' option
> > for the x509 creds to require the client to provide a x509 certificate.
> > This means the client will have to acquire a certificate from the CA
> > before they are permitted to use the NBD server. This is still a fairly
> > low bar to cross.
> > 
> > This adds a '--tls-authz OBJECT-ID' option to the qemu-nbd command which
> > takes the ID of a previously added 'QAuthZ' object instance. This will
> > be used to validate the client's x509 distinguished name. Clients
> > failing the authorization check will not be permitted to use the NBD
> > server.
> > 
> > For example to setup authorization that only allows connection from a client
> > whose x509 certificate distinguished name is
> > 
> >CN=laptop.example.com,O=Example Org,L=London,ST=London,C=GB
> > 
> > use:
> > 
> >   qemu-nbd --object tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\
> > endpoint=server,verify-peer=yes \
> >--object authz-simple,id=auth0,identity=CN=laptop.example.com,,\
> > O=Example Org,,L=London,,ST=London,,C=GB \
> 
> I'm confused about how that gets parsed, what differentiates the ,s
> that separate the arguments (e.g. ,id=  ,identity=) and the ,s that
> separate the options within the identity string (e.g. the ,ST=London)

That's why I've doubled up - eg ',,' must be used when you need to
include a literal ',' in a value without it being interpreted as
starting a new option

> Would:
>   --object authz-simple,identity=CN=laptop.example.com,,O=Example 
> Org,,L=London,,ST=London,,C=GB,id=auth0
> 
> be equivalent?

Yes


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-block] [PATCH v5 5/6] qapi: new qmp command nbd-server-add-bitmap

2018-06-20 Thread Vladimir Sementsov-Ogievskiy

20.06.2018 14:26, Eric Blake wrote:

On 06/09/2018 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  qapi/block.json | 23 +++
  blockdev-nbd.c  | 23 +++
  2 files changed, 46 insertions(+)


I'm tempted to temporarily name this x-nbd-server-add-bitmap, until I 
have the counterpart Libvirt patches tested, just in case testing 
turns up any tweaks we need to the interface.




diff --git a/qapi/block.json b/qapi/block.json
index c694524002..ddbca2e286 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -269,6 +269,29 @@
    'data': {'name': 'str', '*mode': 'NbdServerRemoveMode'} }
    ##
+# @nbd-server-add-bitmap:
+#
+# Expose a dirty bitmap associated with the selected export. The 
bitmap search
+# starts at the device attached to the export, and includes all 
backing files.

+# The exported bitmap is then locked until the NBD export is removed.


The fact that you search the backing chain is at least consistent with 
your code.



+#
+# @name: Export name.
+#
+# @bitmap: Bitmap name to search for.
+#
+# @bitmap-export-name: How the bitmap will be seen by nbd clients
+#  (default @bitmap)


Do we really need the flexibility of naming the bitmap differently to 
the NBD client than we do in qemu?


It was needed for our backup-api implementation. Nikolay?




+#
+# Note: the client must use NBD_OPT_SET_META_CONTEXT with a query of
+# "qemu:dirty-bitmap:NAME" (where NAME matches @bitmap-export-name) 
to access

+# the exposed bitmap.
+#
+# Since: 3.0
+##
+  { 'command': 'nbd-server-add-bitmap',
+    'data': {'name': 'str', 'bitmap': 'str', '*bitmap-export-name': 
'str'} }

+
+##
  # @nbd-server-stop:
  #
  # Stop QEMU's embedded NBD server, and unregister all devices 
previously

diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index 65a84739ed..6b0c50732c 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -220,3 +220,26 @@ void qmp_nbd_server_stop(Error **errp)
  nbd_server_free(nbd_server);
  nbd_server = NULL;
  }
+
+void qmp_nbd_server_add_bitmap(const char *name, const char *bitmap,
+   bool has_bitmap_export_name,
+   const char *bitmap_export_name,
+   Error **errp)
+{
+    NBDExport *exp;
+
+    if (!nbd_server) {
+    error_setg(errp, "NBD server not running");
+    return;
+    }
+
+    exp = nbd_export_find(name);
+    if (exp == NULL) {
+    error_setg(errp, "Export '%s' is not found", name);
+    return;
+    }
+
+    nbd_export_bitmap(exp, bitmap,
+  has_bitmap_export_name ? bitmap_export_name : 
bitmap,

+  errp);
+}






--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH v2 2/6] nbd: allow authorization with nbd-server-start QMP command

2018-06-20 Thread Eric Blake

On 06/20/2018 07:14 AM, Daniel P. Berrangé wrote:

From: "Daniel P. Berrange" 


I thought you preferred the UTF-8 accent in your Author lines these 
days?  Or is this because this patch has been sitting around in your 
local repo prior to the point where you switched your git config author 
spelling? (Also applies to S-o-b in the series)




As with the previous patch to qemu-nbd, the nbd-server-start QMP command
also needs to be able to specify authorization when enabling TLS encryption.

First the client must create a QAuthZ object instance using the
'object-add' command:




They can then reference this in the new 'tls-authz' parameter when
executing the 'nbd-server-start' command:




@@ -132,11 +137,12 @@ void nbd_server_start(SocketAddress *addr, const char 
*tls_creds,
  
  void qmp_nbd_server_start(SocketAddressLegacy *addr,

bool has_tls_creds, const char *tls_creds,
+  bool has_tls_authz, const char *tls_authz,
Error **errp)
  {
  SocketAddress *addr_flat = socket_address_flatten(addr);
  
-nbd_server_start(addr_flat, tls_creds, errp);

+nbd_server_start(addr_flat, tls_creds, tls_authz, errp);


Relies on QMP generated code setting tls_authz = NULL if has_tls_authz 
is false (but no different than the fact that we already relied on it 
for tls_creds).  Someday it would be nice to get rid of the has_FOO for 
optional strings, but that's not your problem.



+++ b/qapi/block.json
@@ -197,6 +197,11 @@
  #
  # @addr: Address on which to listen.
  # @tls-creds: (optional) ID of the TLS credentials object. Since 2.6
+# @tls-authz: ID of the QAuthZ authorization object used to validate
+# the client's x509 distinguished name. This object is
+# is only resolved at time of use, so can be deleted and
+# recreated on the fly while the NBD server is active.
+# If missing, it will default to denying access. Since 3.0
  #
  # Returns: error if the server is already running.
  #
@@ -204,7 +209,8 @@
  ##
  { 'command': 'nbd-server-start',
'data': { 'addr': 'SocketAddressLegacy',
-'*tls-creds': 'str'} }
+'*tls-creds': 'str',
+'*tls-authz': 'str'} }


Reviewed-by: Eric Blake 

Although patch 1 and 2 touch NBD, I'm happy for Dan to be the one that 
merges it as part of the larger series.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH v2 2/6] nbd: allow authorization with nbd-server-start QMP command

2018-06-20 Thread Daniel P . Berrangé
On Wed, Jun 20, 2018 at 09:05:32AM -0500, Eric Blake wrote:
> On 06/20/2018 07:14 AM, Daniel P. Berrangé wrote:
> > From: "Daniel P. Berrange" 
> 
> I thought you preferred the UTF-8 accent in your Author lines these days?
> Or is this because this patch has been sitting around in your local repo
> prior to the point where you switched your git config author spelling? (Also
> applies to S-o-b in the series)

Yeah its "only" been sitting in my tree since late 2016 :-)

> 
> > 
> > As with the previous patch to qemu-nbd, the nbd-server-start QMP command
> > also needs to be able to specify authorization when enabling TLS encryption.
> > 
> > First the client must create a QAuthZ object instance using the
> > 'object-add' command:
> > 
> 
> > They can then reference this in the new 'tls-authz' parameter when
> > executing the 'nbd-server-start' command:
> > 
> 
> > @@ -132,11 +137,12 @@ void nbd_server_start(SocketAddress *addr, const char 
> > *tls_creds,
> >   void qmp_nbd_server_start(SocketAddressLegacy *addr,
> > bool has_tls_creds, const char *tls_creds,
> > +  bool has_tls_authz, const char *tls_authz,
> > Error **errp)
> >   {
> >   SocketAddress *addr_flat = socket_address_flatten(addr);
> > -nbd_server_start(addr_flat, tls_creds, errp);
> > +nbd_server_start(addr_flat, tls_creds, tls_authz, errp);
> 
> Relies on QMP generated code setting tls_authz = NULL if has_tls_authz is
> false (but no different than the fact that we already relied on it for
> tls_creds).  Someday it would be nice to get rid of the has_FOO for optional
> strings, but that's not your problem.
> 
> > +++ b/qapi/block.json
> > @@ -197,6 +197,11 @@
> >   #
> >   # @addr: Address on which to listen.
> >   # @tls-creds: (optional) ID of the TLS credentials object. Since 2.6
> > +# @tls-authz: ID of the QAuthZ authorization object used to validate
> > +# the client's x509 distinguished name. This object is
> > +# is only resolved at time of use, so can be deleted and
> > +# recreated on the fly while the NBD server is active.
> > +# If missing, it will default to denying access. Since 3.0
> >   #
> >   # Returns: error if the server is already running.
> >   #
> > @@ -204,7 +209,8 @@
> >   ##
> >   { 'command': 'nbd-server-start',
> > 'data': { 'addr': 'SocketAddressLegacy',
> > -'*tls-creds': 'str'} }
> > +'*tls-creds': 'str',
> > +'*tls-authz': 'str'} }
> 
> Reviewed-by: Eric Blake 
> 
> Although patch 1 and 2 touch NBD, I'm happy for Dan to be the one that
> merges it as part of the larger series.
> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-block] [PATCH v5 4/6] nbd/server: implement dirty bitmap export

2018-06-20 Thread Vladimir Sementsov-Ogievskiy

20.06.2018 14:24, Eric Blake wrote:

On 06/09/2018 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:

Handle new NBD meta namespace: "qemu", and corresponding queries:


s/new/a new/


"qemu:dirty-bitmap:".

With new metadata context negotiated, BLOCK_STATUS query will reply


s/new/the new/


with dirty-bitmap data, converted to extents. New public function


s/New/The new/


nbd_export_bitmap selects bitmap to export. For now, only one bitmap


s/bitmap/which bitmap/


may be exported.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/nbd.h |   6 ++
  nbd/server.c    | 271 


  2 files changed, 259 insertions(+), 18 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index fcdcd54502..a653d0fba7 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -234,6 +234,10 @@ enum {
  #define NBD_STATE_HOLE (1 << 0)
  #define NBD_STATE_ZERO (1 << 1)
  +/* Flags for extents (NBDExtent.flags) of 
NBD_REPLY_TYPE_BLOCK_STATUS,

+ * for qemu:dirty-bitmap:* meta contexts */


Comment tail.


+#define NBD_STATE_DIRTY (1 << 0)
+
  static inline bool nbd_reply_type_is_error(int type)
  {
  return type & (1 << 15);
@@ -315,6 +319,8 @@ void nbd_client_put(NBDClient *client);
  void nbd_server_start(SocketAddress *addr, const char *tls_creds,
    Error **errp);
  +void nbd_export_bitmap(NBDExport *exp, const char *bitmap,
+   const char *bitmap_export_name, Error **errp);
    /* nbd_read
   * Reads @size bytes from @ioc. Returns 0 on success.
diff --git a/nbd/server.c b/nbd/server.c
index 2d762d7289..569e068fc2 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -23,6 +23,12 @@
  #include "nbd-internal.h"
    #define NBD_META_ID_BASE_ALLOCATION 0
+#define NBD_META_ID_DIRTY_BITMAP 1
+
+/* NBD_MAX_BITMAP_EXTENTS: 1 mb of extents data. An empirical 
constant. If need
+ * to increase, note that NBD protocol defines 32 mb as a limit, 
after which the


s/need to increase/an increase is needed/


+ * reply may be considered as a denial of service attack. */
+#define NBD_MAX_BITMAP_EXTENTS (0x10 / 8)
    static int system_errno_to_nbd_errno(int err)
  {
@@ -80,6 +86,9 @@ struct NBDExport {
    BlockBackend *eject_notifier_blk;
  Notifier eject_notifier;
+
+    BdrvDirtyBitmap *export_bitmap;
+    char *export_bitmap_context;
  };
    static QTAILQ_HEAD(, NBDExport) exports = 
QTAILQ_HEAD_INITIALIZER(exports);

@@ -92,6 +101,7 @@ typedef struct NBDExportMetaContexts {
  bool valid; /* means that negotiation of the option finished 
without

 errors */
  bool base_allocation; /* export base:allocation context (block 
status) */
+    bool dirty_bitmap; /* export qemu:dirty-bitmap:name> */

  } NBDExportMetaContexts;
    struct NBDClient {
@@ -810,6 +820,55 @@ static int nbd_meta_base_query(NBDClient 
*client, NBDExportMetaContexts *meta,

>base_allocation, errp);
  }
  +/* nbd_meta_bitmap_query
+ *
+ * Handle query to 'qemu:' namespace.
+ * @len is the amount of text remaining to be read from the current 
name, after

+ * the 'qemu:' portion has been stripped.
+ *
+ * Return -errno on I/O error, 0 if option was completely handled by
+ * sending a reply about inconsistent lengths, or 1 on success. */
+static int nbd_meta_qemu_query(NBDClient *client, 
NBDExportMetaContexts *meta,

+   uint32_t len, Error **errp)
+{
+    bool dirty_bitmap = false;
+    int dirty_bitmap_len = strlen("dirty-bitmap:");


size_t is better for strlen() values.


+    int ret;
+
+    if (!meta->exp->export_bitmap) {
+    return nbd_opt_skip(client, len, errp);
+    }


Worth a trace?...


+
+    if (len == 0) {
+    if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
+    meta->dirty_bitmap = true;
+    }
+    trace_nbd_negotiate_meta_query_parse("empty");
+    return 1;
+    }
+
+    if (len < dirty_bitmap_len) {
+    trace_nbd_negotiate_meta_query_skip("not dirty-bitmap:");
+    return nbd_opt_skip(client, len, errp);
+    }
+


...especially since other returns have traced the result.

I'm strongly thinking of adding a patch to QMP to add an x-context 
parameter when creating an NBD client, in order to at least make 
testing client/server interactions easier than just code inspection.  
Does not have to be part of this series.



+    len -= dirty_bitmap_len;
+    ret = nbd_meta_pattern(client, "dirty-bitmap:", _bitmap, 
errp);

+    if (ret <= 0) {
+    return ret;
+    }
+    if (!dirty_bitmap) {
+    trace_nbd_negotiate_meta_query_skip("not dirty-bitmap:");
+    return nbd_opt_skip(client, len, errp);
+    }
+
+    trace_nbd_negotiate_meta_query_parse("dirty-bitmap:");
+
+    return nbd_meta_empty_or_pattern(
+    client, meta->exp->export_bitmap_context +
+    strlen("qemu:dirty_bitmap:"), len, >dirty_bitmap, 
errp);

+}
+
  /* nbd_negotiate_meta_query
   *
   * Parse namespace name and call 

Re: [Qemu-block] [PATCH v2 1/6] qemu-nbd: add support for authorization of TLS clients

2018-06-20 Thread Daniel P . Berrangé
On Wed, Jun 20, 2018 at 08:58:40AM -0500, Eric Blake wrote:
> On 06/20/2018 07:14 AM, Daniel P. Berrangé wrote:
> > From: "Daniel P. Berrange" 
> > 
> > Currently any client which can complete the TLS handshake is able to use
> > the NBD server. The server admin can turn on the 'verify-peer' option
> > for the x509 creds to require the client to provide a x509 certificate.
> > This means the client will have to acquire a certificate from the CA
> > before they are permitted to use the NBD server. This is still a fairly
> > low bar to cross.
> > 
> > This adds a '--tls-authz OBJECT-ID' option to the qemu-nbd command which
> > takes the ID of a previously added 'QAuthZ' object instance. This will
> > be used to validate the client's x509 distinguished name. Clients
> > failing the authorization check will not be permitted to use the NBD
> > server.
> > 
> > For example to setup authorization that only allows connection from a client
> > whose x509 certificate distinguished name is
> > 
> > CN=laptop.example.com,O=Example Org,L=London,ST=London,C=GB
> 
> Is the space in O= intentional?

Yes it really does have a space, though perhaps for sake of
illustration I'll remove the space.

> 
> > 
> > use:
> > 
> >qemu-nbd --object tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\
> >  endpoint=server,verify-peer=yes \
> > --object authz-simple,id=auth0,identity=CN=laptop.example.com,,\
> >  O=Example Org,,L=London,,ST=London,,C=GB \
> 
> you need shell quoting to preserve the space. Also, the indentation breaks
> the intent that these long lines are single arguments, not separate
> arguments.

Yeah I'll definitely remove the space, because it just makes this
more confusing than needed.

I know the \ isn't really supposed to separate args, but I don't
see a nicer way to format it for a commit message while respecting
line length. I figured people would understand what I meant, but
the fact that I've indented the second line to vertically align.

> 
> > --tls-creds tls0 \
> > --tls-authz authz0
> >other qemu-nbd args...
> 
> Is it also worth a sample command line using JSON syntax?

Probably overkill for the commit message IMHO.

> 
> > 
> > Signed-off-by: Daniel P. Berrange 
> > ---
> >   include/block/nbd.h |  2 +-
> >   nbd/server.c| 10 +-
> >   qemu-nbd.c  | 13 -
> >   qemu-nbd.texi   |  4 
> >   4 files changed, 22 insertions(+), 7 deletions(-)
> > 
> 
> Reviewed-by: Eric Blake 
> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-block] [PATCH v2 1/6] qemu-nbd: add support for authorization of TLS clients

2018-06-20 Thread Eric Blake

On 06/20/2018 07:14 AM, Daniel P. Berrangé wrote:

From: "Daniel P. Berrange" 

Currently any client which can complete the TLS handshake is able to use
the NBD server. The server admin can turn on the 'verify-peer' option
for the x509 creds to require the client to provide a x509 certificate.
This means the client will have to acquire a certificate from the CA
before they are permitted to use the NBD server. This is still a fairly
low bar to cross.

This adds a '--tls-authz OBJECT-ID' option to the qemu-nbd command which
takes the ID of a previously added 'QAuthZ' object instance. This will
be used to validate the client's x509 distinguished name. Clients
failing the authorization check will not be permitted to use the NBD
server.

For example to setup authorization that only allows connection from a client
whose x509 certificate distinguished name is

CN=laptop.example.com,O=Example Org,L=London,ST=London,C=GB


Is the space in O= intentional?



use:

   qemu-nbd --object tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\
 endpoint=server,verify-peer=yes \
--object authz-simple,id=auth0,identity=CN=laptop.example.com,,\
 O=Example Org,,L=London,,ST=London,,C=GB \


you need shell quoting to preserve the space. Also, the indentation 
breaks the intent that these long lines are single arguments, not 
separate arguments.



--tls-creds tls0 \
--tls-authz authz0
   other qemu-nbd args...


Is it also worth a sample command line using JSON syntax?



Signed-off-by: Daniel P. Berrange 
---
  include/block/nbd.h |  2 +-
  nbd/server.c| 10 +-
  qemu-nbd.c  | 13 -
  qemu-nbd.texi   |  4 
  4 files changed, 22 insertions(+), 7 deletions(-)



Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH v2 03/10] qcow2/bitmap: cache bm_list

2018-06-20 Thread Vladimir Sementsov-Ogievskiy

20.06.2018 16:04, Vladimir Sementsov-Ogievskiy wrote:

13.06.2018 05:06, John Snow wrote:

We don't need to re-read this list every time, exactly. We can keep it cached
and delete our copy when we flush to disk.

Because we don't try to flush bitmaps on close if there's nothing to flush,
add a new conditional to delete the state anyway for a clean exit.

Signed-off-by: John Snow
---
  block/qcow2-bitmap.c | 74 ++--
  block/qcow2.c|  2 ++
  block/qcow2.h|  2 ++
  3 files changed, 52 insertions(+), 26 deletions(-)

diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
index 85c1b5afe3..5ae9b17928 100644
--- a/block/qcow2-bitmap.c
+++ b/block/qcow2-bitmap.c
@@ -636,6 +636,34 @@ fail:
  return NULL;
  }
  
+static Qcow2BitmapList *get_bitmap_list(BlockDriverState *bs, Error **errp)

+{
+BDRVQcow2State *s = bs->opaque;
+Qcow2BitmapList *bm_list;
+
+if (s->bitmap_list) {
+return (Qcow2BitmapList *)s->bitmap_list;
+}
+
+if (s->nb_bitmaps) {
+bm_list = bitmap_list_load(bs, errp);
+} else {
+bm_list = bitmap_list_new();
+}
+s->bitmap_list = bm_list;


may be, we shouldn't cache it in inactive mode. However, I think we'll 
finally will not load bitmaps in inactive mode and drop the on 
inactivate, so it would not matter..


really, now it would be a problem: we can start in inactive mode, load 
nothing, and then we can't reload bitmaps; my fix in

https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg03254.html
will not work after this patch.



Hm, I've understood the following problem: cache becomes incorrect 
after failed update_ext_header_and_dir or 
update_ext_header_and_dir_in_place operations. (after failed 
qcow2_remove_persistent_dirty_bitmap or qcow2_reopen_bitmaps_rw_hint)


And this comes from incomplete architecture after the patch:
On the one hand, we work with one singleton bitmap_list, and loading 
part is refactored to do it safely. On the other hand, storing 
functions still have old behavior, they work with bitmap list like 
with their own local variable.


So, we have safe mechanism to read list through the cache. We need 
also safe mechanism to update list both in cache and in file.


There are two possible variants:

1. drop cache after failed store
2. rollback cache after failed store

1 looks simpler..

Also, we should drop cache on inactivate (we do this) and should not 
create cache in inactive mode, because the other process may change 
the image.


Hm. may be, it is better to work with s->bitmap_list directly? In this 
case it will be more obvious that it is the cache, not local variable. 
And we will work with it like with other "parts of extension cache" 
s->nb_bitmaps, s->bitmap_directory_offset ...


After the patch, functions update_ext_header_and_dir* becomes strange:

1. before the patch, they take external parameter - bm_list, and by 
this parameter they updated the file and cached s->nb_bitmaps, 
s->bitmap_*, ..
2. after the patch, they take parameter (actually s->bitmap_list) of 
same nature like s->nb_bitmap, and update s->nb_bitmap from it.


Sorry for being late and for disordered stream of thoughts. Is this 
patch really needed for the whole series?



+return bm_list;
+}
+
+static void del_bitmap_list(BlockDriverState *bs)
+{
+BDRVQcow2State *s = bs->opaque;
+
+if (s->bitmap_list) {
+bitmap_list_free(s->bitmap_list);
+s->bitmap_list = NULL;
+}
+}
+
  int qcow2_check_bitmaps_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
void **refcount_table,
int64_t *refcount_table_size)
@@ -656,7 +684,7 @@ int qcow2_check_bitmaps_refcounts(BlockDriverState *bs, 
BdrvCheckResult *res,
  return ret;
  }
  
-bm_list = bitmap_list_load(bs, NULL);

+bm_list = get_bitmap_list(bs, NULL);
  if (bm_list == NULL) {
  res->corruptions++;
  return -EINVAL;
@@ -706,8 +734,6 @@ int qcow2_check_bitmaps_refcounts(BlockDriverState *bs, 
BdrvCheckResult *res,
  }
  
  out:

-bitmap_list_free(bm_list);
-
  return ret;
  }
  
@@ -944,7 +970,7 @@ bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, Error **errp)

  return false;
  }
  
-bm_list = bitmap_list_load(bs, errp);

+bm_list = get_bitmap_list(bs, errp);
  if (bm_list == NULL) {
  return false;
  }
@@ -978,8 +1004,6 @@ bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, Error 
**errp)
  }
  }
  
-bitmap_list_free(bm_list);

-
  return needs_update;
  
  fail:

@@ -988,8 +1012,7 @@ fail:
  bdrv_release_dirty_bitmap(bs, bm->dirty_bitmap);
  }
  }
-bitmap_list_free(bm_list);
-
+del_bitmap_list(bs);
  return false;
  }
  
@@ -1016,7 +1039,7 @@ int qcow2_reopen_bitmaps_rw_hint(BlockDriverState *bs, bool *header_updated,

  return -EINVAL;
  }
  
-bm_list = 

Re: [Qemu-block] [PATCH v2 04/10] qcow2/bitmap: cache loaded bitmaps

2018-06-20 Thread Vladimir Sementsov-Ogievskiy

13.06.2018 05:06, John Snow wrote:

For bitmaps that we succeeded in loading, we can cache a reference
to that object. This will let us iterate over the more convenient
form of in-memory bitmaps for qemu-img bitmap manipulation tools.

Signed-off-by: John Snow 
---
  block/qcow2-bitmap.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
index 5ae9b17928..d94b6bd5cf 100644
--- a/block/qcow2-bitmap.c
+++ b/block/qcow2-bitmap.c
@@ -981,6 +981,7 @@ bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, Error 
**errp)
  if (bitmap == NULL) {
  goto fail;
  }
+bm->dirty_bitmap = bitmap;
  
  if (!(bm->flags & BME_FLAG_AUTO)) {

  bdrv_disable_dirty_bitmap(bitmap);
@@ -1382,6 +1383,7 @@ void 
qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error **errp)
  bm->name = g_strdup(name);
  QSIMPLEQ_INSERT_TAIL(bm_list, bm, entry);
  } else {
+assert(bitmap == bm->dirty_bitmap);
  if (!(bm->flags & BME_FLAG_IN_USE)) {
  error_setg(errp, "Bitmap '%s' already exists in the image",
 name);


and now it is already used in 02, so it should be reordered or merged to it

--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH v2 03/10] qcow2/bitmap: cache bm_list

2018-06-20 Thread Vladimir Sementsov-Ogievskiy

13.06.2018 05:06, John Snow wrote:

We don't need to re-read this list every time, exactly. We can keep it cached
and delete our copy when we flush to disk.

Because we don't try to flush bitmaps on close if there's nothing to flush,
add a new conditional to delete the state anyway for a clean exit.

Signed-off-by: John Snow 
---
  block/qcow2-bitmap.c | 74 ++--
  block/qcow2.c|  2 ++
  block/qcow2.h|  2 ++
  3 files changed, 52 insertions(+), 26 deletions(-)

diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
index 85c1b5afe3..5ae9b17928 100644
--- a/block/qcow2-bitmap.c
+++ b/block/qcow2-bitmap.c
@@ -636,6 +636,34 @@ fail:
  return NULL;
  }
  
+static Qcow2BitmapList *get_bitmap_list(BlockDriverState *bs, Error **errp)

+{
+BDRVQcow2State *s = bs->opaque;
+Qcow2BitmapList *bm_list;
+
+if (s->bitmap_list) {
+return (Qcow2BitmapList *)s->bitmap_list;
+}
+
+if (s->nb_bitmaps) {
+bm_list = bitmap_list_load(bs, errp);
+} else {
+bm_list = bitmap_list_new();
+}
+s->bitmap_list = bm_list;


may be, we shouldn't cache it in inactive mode. However, I think we'll 
finally will not load bitmaps in inactive mode and drop the on 
inactivate, so it would not matter..


Hm, I've understood the following problem: cache becomes incorrect after 
failed update_ext_header_and_dir or update_ext_header_and_dir_in_place 
operations. (after failed qcow2_remove_persistent_dirty_bitmap or 
qcow2_reopen_bitmaps_rw_hint)


And this comes from incomplete architecture after the patch:
On the one hand, we work with one singleton bitmap_list, and loading 
part is refactored to do it safely. On the other hand, storing functions 
still have old behavior, they work with bitmap list like with their own 
local variable.


So, we have safe mechanism to read list through the cache. We need also 
safe mechanism to update list both in cache and in file.


There are two possible variants:

1. drop cache after failed store
2. rollback cache after failed store

1 looks simpler..

Also, we should drop cache on inactivate (we do this) and should not 
create cache in inactive mode, because the other process may change the 
image.


Hm. may be, it is better to work with s->bitmap_list directly? In this 
case it will be more obvious that it is the cache, not local variable. 
And we will work with it like with other "parts of extension cache" 
s->nb_bitmaps, s->bitmap_directory_offset ...


After the patch, functions update_ext_header_and_dir* becomes strange:

1. before the patch, they take external parameter - bm_list, and by this 
parameter they updated the file and cached s->nb_bitmaps, s->bitmap_*, ..
2. after the patch, they take parameter (actually s->bitmap_list) of 
same nature like s->nb_bitmap, and update s->nb_bitmap from it.


Sorry for being late and for disordered stream of thoughts. Is this 
patch really needed for the whole series?



+return bm_list;
+}
+
+static void del_bitmap_list(BlockDriverState *bs)
+{
+BDRVQcow2State *s = bs->opaque;
+
+if (s->bitmap_list) {
+bitmap_list_free(s->bitmap_list);
+s->bitmap_list = NULL;
+}
+}
+
  int qcow2_check_bitmaps_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
void **refcount_table,
int64_t *refcount_table_size)
@@ -656,7 +684,7 @@ int qcow2_check_bitmaps_refcounts(BlockDriverState *bs, 
BdrvCheckResult *res,
  return ret;
  }
  
-bm_list = bitmap_list_load(bs, NULL);

+bm_list = get_bitmap_list(bs, NULL);
  if (bm_list == NULL) {
  res->corruptions++;
  return -EINVAL;
@@ -706,8 +734,6 @@ int qcow2_check_bitmaps_refcounts(BlockDriverState *bs, 
BdrvCheckResult *res,
  }
  
  out:

-bitmap_list_free(bm_list);
-
  return ret;
  }
  
@@ -944,7 +970,7 @@ bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, Error **errp)

  return false;
  }
  
-bm_list = bitmap_list_load(bs, errp);

+bm_list = get_bitmap_list(bs, errp);
  if (bm_list == NULL) {
  return false;
  }
@@ -978,8 +1004,6 @@ bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, Error 
**errp)
  }
  }
  
-bitmap_list_free(bm_list);

-
  return needs_update;
  
  fail:

@@ -988,8 +1012,7 @@ fail:
  bdrv_release_dirty_bitmap(bs, bm->dirty_bitmap);
  }
  }
-bitmap_list_free(bm_list);
-
+del_bitmap_list(bs);
  return false;
  }
  
@@ -1016,7 +1039,7 @@ int qcow2_reopen_bitmaps_rw_hint(BlockDriverState *bs, bool *header_updated,

  return -EINVAL;
  }
  
-bm_list = bitmap_list_load(bs, errp);

+bm_list = get_bitmap_list(bs, errp);
  if (bm_list == NULL) {
  return -EINVAL;
  }
@@ -1056,7 +1079,6 @@ int qcow2_reopen_bitmaps_rw_hint(BlockDriverState *bs, 
bool *header_updated,
  
  out:

  g_slist_free(ro_dirty_bitmaps);
-

Re: [Qemu-block] [RFC PATCH 06/10] block: Allow changing the backing file on reopen

2018-06-20 Thread Alberto Garcia
On Wed 20 Jun 2018 12:58:55 PM CEST, Kevin Wolf wrote:
> Am 19.06.2018 um 16:20 hat Alberto Garcia geschrieben:
>> >> Wait, I think the description I gave is inaccurate:
>> >> 
>> >> commit_complete() calls bdrv_drop_intermediate(), and that updates the
>> >> backing image name (c->role->update_filename()). If we're doing this in
>> >> an intermediate node then it needs to be reopened in read-write mode,
>> >> while keeping the rest of the options.
>> >> 
>> >> But then bdrv_reopen_commit() realizes that the backing file (from
>> >> reopen_state->options) is not the current one (because there's a
>> >> bdrv_mirror_top implicit filter) and attempts to remove it. And that's
>> >> the root cause of the crash.
>> >
>> > How did the other (the real?) backing file get into
>> > reopen_state->options? I don't think bdrv_drop_intermediate() wants to
>> > change anything except the read-only flag, so we should surely have
>> > the node name of bdrv_mirror_top there?
>> 
>> No, it doesn't (try to) change anything else. It cannot do it:
>> bdrv_reopen() only takes flags, but keeps the current options. And the
>> current options have 'backing' set to a thing different from the
>> bdrv_mirror_top node.
>
> Okay, so in theory this is expected to just work.
>
> Actually, do we ever use bdrv_reopen() for flags other than read-only?
> Maybe we should get rid of that flags nonsense and simply make it a
> bdrv_reopen_set_readonly() taking a boolean.

I think that's a good idea. There's however one case where other flags
are changed: reopen_backing_file() in block/replication.c that also
clears BDRV_O_INACTIVE. I'd need to take a closer look to that code to
see what we can do about it.

And there's of course qemu-io's "reopen" command, which does take
options but keeps the previous values.

>> > > So my first impression is that we should not try to change backing
>> > > files if a reopen was not requested by the user (blockdev-reopen)
>> > > and perhaps we should forbid it when there are implicit nodes
>> > > involved.
>> > Changing the behaviour depending on who the caller is sounds like a
>> > hack that relies on coincidence and may break sooner or later.
>> 
>> The main difference between the user calling blockdev-reopen and the
>> code doing bdrv_reopen() internally is that in the former case all
>> existing options are ignored (keep_old_opts = false) and in the latter
>> they are kept.
>> 
>> This latter case can have unintended consequences, and I think they're
>> all related to the fact that bs->explicit_options also keeps the options
>> of its children. So if you have
>> 
>>C <- B <- A
>> 
>> and A contains 'backing.driver=qcow2, backing.node-name=foo, ...', and
>> you reopen A (keeping its options) then everything goes fine. However if
>> you remove B from the chain (using e.g. block-stream) then you can't
>> reopen A anymore because backing.* no longer matches the current backing
>> file.
>> 
>> I suppose that we would need to remove the children options from
>> bs->explicit_options in that case? If this all happens because of the
>> user calling blockdev-reopen then we don't need to worry about it becase
>> bs->explicit_options are ignored.
>
> So the problem is that bs->explicit_options (and probably bs->options)
> aren't consistent with the actual state of the graph. The fix for that
> is likely not in bdrv_reopen().

Probably not because the graph can be changed by other means (e.g
block-stream as I just said).

> I think we should already remove nested options of children from the
> dicts in bdrv_open_inherit(). I'm less sure about node-name
> references.  Maybe instead of keeing the dicts up-to-date each time we
> modify the graph, we should just get rid of those in the dicts as
> well, and instead add a function that reconstructs the full dict from
> bs->options and the currently attached children. This requires that
> the child name and the option name match, but I think that's
> true. (Mostly at least - what about quorum? But the child node
> handling of quorum is broken anyway.)

Yes, removing nested options sounds like a good idea. In what cases do
we need the full qdict, though?

>> At the moment there's 
>> 
>> Berto
>
> And it's very good to have a Berto at the moment, but I think you
> wanted to add something else there? ;-)

I think it was just a leftover from a previous paragraph :-)

Berto



[Qemu-block] [PATCH v2 4/6] chardev: add support for authorization for TLS clients

2018-06-20 Thread Daniel P . Berrangé
From: "Daniel P. Berrange" 

Currently any client which can complete the TLS handshake is able to use
a chardev server. The server admin can turn on the 'verify-peer' option
for the x509 creds to require the client to provide a x509
certificate. This means the client will have to acquire a certificate
from the CA before they are permitted to use the chardev server. This is
still a fairly low bar.

This adds a 'tls-authz=OBJECT-ID' option to the socket chardev backend
which takes the ID of a previously added 'QAuthZ' object instance. This
will be used to validate the client's x509 distinguished name. Clients
failing the check will not be permitted to use the chardev server.

For example to setup authorization that only allows connection from a
client whose x509 certificate distinguished name contains 'CN=fred', you
would use:

  $QEMU -object tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\
endpoint=server,verify-peer=yes \
-object authz-simple,id=authz0,identity=CN=laptop.example.com,,\
O=Example Org,,L=London,,ST=London,,C=GB \
-chardev socket,host=127.0.0.1,port=9000,server,\
 tls-creds=tls0,tls-authz=authz0 \
...other qemu args...

Signed-off-by: Daniel P. Berrange 
---
 chardev/char-socket.c | 11 ++-
 chardev/char.c|  3 +++
 qapi/char.json|  6 ++
 qemu-options.hx   |  9 +++--
 4 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 159e69c3b1..d0c9d26025 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -53,6 +53,7 @@ typedef struct {
 QIONetListener *listener;
 GSource *hup_source;
 QCryptoTLSCreds *tls_creds;
+char *tls_authz;
 int connected;
 int max_size;
 int do_telnetopt;
@@ -729,7 +730,7 @@ static void tcp_chr_tls_init(Chardev *chr)
 if (s->is_listen) {
 tioc = qio_channel_tls_new_server(
 s->ioc, s->tls_creds,
-NULL, /* XXX Use an ACL */
+s->tls_authz,
 );
 } else {
 tioc = qio_channel_tls_new_client(
@@ -881,6 +882,7 @@ static void char_socket_finalize(Object *obj)
 if (s->tls_creds) {
 object_unref(OBJECT(s->tls_creds));
 }
+g_free(s->tls_authz);
 
 qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
 }
@@ -986,6 +988,7 @@ static void qmp_chardev_open_socket(Chardev *chr,
 }
 }
 }
+s->tls_authz = g_strdup(sock->tls_authz);
 
 s->addr = addr = socket_address_flatten(sock->addr);
 
@@ -1066,6 +1069,7 @@ static void qemu_chr_parse_socket(QemuOpts *opts, 
ChardevBackend *backend,
 const char *fd = qemu_opt_get(opts, "fd");
 const char *tls_creds = qemu_opt_get(opts, "tls-creds");
 SocketAddressLegacy *addr;
+const char *tls_authz = qemu_opt_get(opts, "tls-authz");
 ChardevSocket *sock;
 
 if ((!!path + !!fd + !!host) != 1) {
@@ -1094,6 +1098,10 @@ static void qemu_chr_parse_socket(QemuOpts *opts, 
ChardevBackend *backend,
 } else {
 g_assert_not_reached();
 }
+if (tls_authz && !tls_creds) {
+error_setg(errp, "Authorization can only be used when TLS is enabled");
+return;
+}
 
 sock = backend->u.socket.data = g_new0(ChardevSocket, 1);
 qemu_chr_parse_common(opts, qapi_ChardevSocket_base(sock));
@@ -,6 +1119,7 @@ static void qemu_chr_parse_socket(QemuOpts *opts, 
ChardevBackend *backend,
 sock->has_reconnect = true;
 sock->reconnect = reconnect;
 sock->tls_creds = g_strdup(tls_creds);
+sock->tls_authz = g_strdup(tls_authz);
 
 addr = g_new0(SocketAddressLegacy, 1);
 if (path) {
diff --git a/chardev/char.c b/chardev/char.c
index 76d866e6fe..d28c16ccca 100644
--- a/chardev/char.c
+++ b/chardev/char.c
@@ -837,6 +837,9 @@ QemuOptsList qemu_chardev_opts = {
 },{
 .name = "tls-creds",
 .type = QEMU_OPT_STRING,
+},{
+.name = "tls-authz",
+.type = QEMU_OPT_STRING,
 },{
 .name = "width",
 .type = QEMU_OPT_NUMBER,
diff --git a/qapi/char.json b/qapi/char.json
index ae19dcd1ed..a4eeecce88 100644
--- a/qapi/char.json
+++ b/qapi/char.json
@@ -242,6 +242,11 @@
 # @addr: socket address to listen on (server=true)
 #or connect to (server=false)
 # @tls-creds: the ID of the TLS credentials object (since 2.6)
+# @tls-authz: the ID of the QAuthZ authorization object against which
+# the client's x509 distinguished name will validated. This
+# object is only resolved at time of use, so can be deleted
+# and recreated on the fly while the chardev server is active.
+# If missing, it will default to denying access (since 3.0)
 # @server: create server socket (default: true)
 # @wait: wait for incoming connection on server
 #sockets (default: false).
@@ -259,6 +264,7 @@
 ##
 { 'struct': 'ChardevSocket', 'data': { 'addr'   : 

[Qemu-block] [PATCH v2 6/6] monitor: deprecate acl_show, acl_reset, acl_policy, acl_add, acl_remove

2018-06-20 Thread Daniel P . Berrangé
The various ACL related commands are obsolete now that the QAuthZ
framework for authorization is fully integrated throughout QEMU network
services. Mark it as deprecated with no replacement to be provided.

Authorization is now provided by using 'object_add' together with
the 'tls-authz' or 'sasl-authz' parameters to the VNC server, and
equivalent for other network services.

Signed-off-by: Daniel P. Berrangé 
---
 monitor.c | 23 +++
 qemu-doc.texi |  8 
 2 files changed, 31 insertions(+)

diff --git a/monitor.c b/monitor.c
index 07d14f53f9..cbcfbf912b 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2112,6 +2112,19 @@ static QAuthZList *find_auth(Monitor *mon, const char 
*name)
 return QAUTHZ_LIST(obj);
 }
 
+static bool warn_acl;
+static void hmp_warn_acl(void)
+{
+if (warn_acl) {
+return;
+}
+error_report("The acl_show, acl_reset, acl_policy, acl_add, acl_remove "
+ "commands are deprecated with no replacement. Authorization "
+ "for VNC should be performed using the pluggable QAuthZ "
+ "objects");
+warn_acl = true;
+}
+
 static void hmp_acl_show(Monitor *mon, const QDict *qdict)
 {
 const char *aclname = qdict_get_str(qdict, "aclname");
@@ -2119,6 +2132,8 @@ static void hmp_acl_show(Monitor *mon, const QDict *qdict)
 QAuthZListRuleList *rules;
 size_t i = 0;
 
+hmp_warn_acl();
+
 if (!auth) {
 return;
 }
@@ -2142,6 +2157,8 @@ static void hmp_acl_reset(Monitor *mon, const QDict 
*qdict)
 const char *aclname = qdict_get_str(qdict, "aclname");
 QAuthZList *auth = find_auth(mon, aclname);
 
+hmp_warn_acl();
+
 if (!auth) {
 return;
 }
@@ -2160,6 +2177,8 @@ static void hmp_acl_policy(Monitor *mon, const QDict 
*qdict)
 int val;
 Error *err = NULL;
 
+hmp_warn_acl();
+
 if (!auth) {
 return;
 }
@@ -2195,6 +2214,8 @@ static void hmp_acl_add(Monitor *mon, const QDict *qdict)
 QAuthZListFormat format;
 size_t i = 0;
 
+hmp_warn_acl();
+
 if (!auth) {
 return;
 }
@@ -2250,6 +2271,8 @@ static void hmp_acl_remove(Monitor *mon, const QDict 
*qdict)
 QAuthZList *auth = find_auth(mon, aclname);
 ssize_t i = 0;
 
+hmp_warn_acl();
+
 if (!auth) {
 return;
 }
diff --git a/qemu-doc.texi b/qemu-doc.texi
index 5639b7ecdb..8327e56db9 100644
--- a/qemu-doc.texi
+++ b/qemu-doc.texi
@@ -2933,6 +2933,14 @@ The ``query-cpus'' command is replaced by the 
``query-cpus-fast'' command.
 The ``arch'' output member of the ``query-cpus-fast'' command is
 replaced by the ``target'' output member.
 
+@section Human Monitor Protocol (HMP) commands
+
+@subsection acl_show, acl_reset, acl_policy, acl_add, acl_remove (since 3.0.0)
+
+The ``acl_show'', ``acl_reset'', ``acl_policy'', ``acl_add'', and
+``acl_remove'' commands are deprecated with no replacement. Authorization
+for VNC should be performed using the pluggable QAuthZ objects.
+
 @section System emulator devices
 
 @subsection ivshmem (since 2.6.0)
-- 
2.17.0




[Qemu-block] [PATCH v2 3/6] migration: add support for a "tls-authz" migration parameter

2018-06-20 Thread Daniel P . Berrangé
From: "Daniel P. Berrange" 

The QEMU instance that runs as the server for the migration data
transport (ie the target QEMU) needs to be able to configure access
control so it can prevent unauthorized clients initiating an incoming
migration. This adds a new 'tls-authz' migration parameter that is used
to provide the QOM ID of a QAuthZ subclass instance that provides the
access control check. This is checked against the x509 certificate
obtained during the TLS handshake.

For example, when starting a QEMU for incoming migration, it is
possible to give an example identity of the source QEMU that is
intended to be connecting later:

  $QEMU \
 -monitor stdio \
 -incoming defer \
 ...other args...

  (qemu) object_add tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\
 endpoint=server,verify-peer=yes \
  (qemu) object_add authz-simple,id=auth0,identity=CN=laptop.example.com,,\
 O=Example Org,,L=London,,ST=London,,C=GB \
  (qemu) migrate_incoming tcp:localhost:9000

Signed-off-by: Daniel P. Berrange 
---
 hmp.c |  9 +
 migration/migration.c |  8 
 migration/tls.c   |  2 +-
 qapi/migration.json   | 14 +-
 4 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/hmp.c b/hmp.c
index b93a871830..b0a7f3a2fb 100644
--- a/hmp.c
+++ b/hmp.c
@@ -373,6 +373,9 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict 
*qdict)
 monitor_printf(mon, "%s: %" PRIu64 "\n",
 MigrationParameter_str(MIGRATION_PARAMETER_MAX_POSTCOPY_BANDWIDTH),
 params->max_postcopy_bandwidth);
+monitor_printf(mon, " %s: '%s'\n",
+MigrationParameter_str(MIGRATION_PARAMETER_TLS_AUTHZ),
+params->has_tls_authz ? params->tls_authz : "");
 }
 
 qapi_free_MigrationParameters(params);
@@ -1635,6 +1638,12 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
*qdict)
 p->tls_hostname->type = QTYPE_QSTRING;
 visit_type_str(v, param, >tls_hostname->u.s, );
 break;
+case MIGRATION_PARAMETER_TLS_AUTHZ:
+p->has_tls_authz = true;
+p->tls_authz = g_new0(StrOrNull, 1);
+p->tls_authz->type = QTYPE_QSTRING;
+visit_type_str(v, param, >tls_authz->u.s, );
+break;
 case MIGRATION_PARAMETER_MAX_BANDWIDTH:
 p->has_max_bandwidth = true;
 /*
diff --git a/migration/migration.c b/migration/migration.c
index e1eaa97df4..40df9a9bc3 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -650,6 +650,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error 
**errp)
 params->tls_creds = g_strdup(s->parameters.tls_creds);
 params->has_tls_hostname = true;
 params->tls_hostname = g_strdup(s->parameters.tls_hostname);
+params->has_tls_authz = true;
+params->tls_authz = g_strdup(s->parameters.tls_authz);
 params->has_max_bandwidth = true;
 params->max_bandwidth = s->parameters.max_bandwidth;
 params->has_downtime_limit = true;
@@ -1116,6 +1118,12 @@ static void migrate_params_apply(MigrateSetParameters 
*params, Error **errp)
 s->parameters.tls_hostname = g_strdup(params->tls_hostname->u.s);
 }
 
+if (params->has_tls_authz) {
+g_free(s->parameters.tls_authz);
+assert(params->tls_authz->type == QTYPE_QSTRING);
+s->parameters.tls_authz = g_strdup(params->tls_authz->u.s);
+}
+
 if (params->has_max_bandwidth) {
 s->parameters.max_bandwidth = params->max_bandwidth;
 if (s->to_dst_file) {
diff --git a/migration/tls.c b/migration/tls.c
index 3b9e8c9263..5171afc6c4 100644
--- a/migration/tls.c
+++ b/migration/tls.c
@@ -94,7 +94,7 @@ void migration_tls_channel_process_incoming(MigrationState *s,
 
 tioc = qio_channel_tls_new_server(
 ioc, creds,
-NULL, /* XXX pass ACL name */
+s->parameters.tls_authz,
 errp);
 if (!tioc) {
 return;
diff --git a/qapi/migration.json b/qapi/migration.json
index 1b4c1db670..3fe0d0c949 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -488,6 +488,12 @@
 #hostname must be provided so that the server's x509
 #certificate identity can be validated. (Since 2.7)
 #
+# @tls-authz: ID of the 'authz' object subclass that provides access control
+# checking of the TLS x509 certificate distinguished name.
+# This object is only resolved at time of use, so can be deleted
+# and recreated on the fly while the migration server is active.
+# If missing, it will default to denying access (Since 3.0)
+#
 # @max-bandwidth: to set maximum speed for migration. maximum speed in
 # bytes per second. (Since 2.8)
 #
@@ -525,7 +531,7 @@
 { 'enum': 'MigrationParameter',
   'data': ['compress-level', 'compress-threads', 'decompress-threads',
'cpu-throttle-initial', 'cpu-throttle-increment',
-   'tls-creds', 'tls-hostname', 'max-bandwidth',
+   

Re: [Qemu-block] [PATCH v6 5/6] iotests: Add new test 214 for max compressed cluster offset

2018-06-20 Thread Alberto Garcia
On Tue 19 Jun 2018 08:51:36 PM CEST, Eric Blake wrote:
> On 04/26/2018 07:10 AM, Alberto Garcia wrote:
>> On Thu 26 Apr 2018 04:51:28 AM CEST, Eric Blake wrote:
>>> If you have a capable file system (tmpfs is good, ext4 not so much;
>>> run ./check with TEST_DIR pointing to a good location so as not
>>> to skip the test), it's actually possible to create a qcow2 file
>>> that expands to a sparse 512T image with just over 38M of content.
>>> The test is not the world's fastest (qemu crawling through 256M
>>> bits of refcount table to find the next cluster to allocate takes
>>> several seconds, as does qemu-img check reporting millions of
>>> leaked clusters); but it DOES catch the problem that the previous
>>> patch just fixed where writing a compressed cluster to a full
>>> image ended up overwriting the wrong cluster.
>>>
>>> Suggested-by: Max Reitz 
>>> Signed-off-by: Eric Blake 
>> 
>> Nice test :-)
>> 
>> Reviewed-by: Alberto Garcia 
>
> 214 is already in the tree in the meantime; this will need a rebase to
> pick the next available test number (220 might be claimed, so 222?)

You can also take 220 if you want, I suspect that it will take a bit
longer to merge the blockdev-reopen patches. Whatever is easier for you.

Berto



[Qemu-block] [PATCH v2 2/6] nbd: allow authorization with nbd-server-start QMP command

2018-06-20 Thread Daniel P . Berrangé
From: "Daniel P. Berrange" 

As with the previous patch to qemu-nbd, the nbd-server-start QMP command
also needs to be able to specify authorization when enabling TLS encryption.

First the client must create a QAuthZ object instance using the
'object-add' command:

   {
 'execute': 'object-add',
 'arguments': {
   'qom-type': 'authz-list',
   'id': 'authz0',
   'parameters': {
 'policy': 'deny',
 'rules': [
   {
 'match': '*CN=fred',
 'policy': 'allow'
   }
 ]
   }
 }
   }

They can then reference this in the new 'tls-authz' parameter when
executing the 'nbd-server-start' command:

   {
 'execute': 'nbd-server-start',
 'arguments': {
   'addr': {
   'type': 'inet',
   'host': '127.0.0.1',
   'port': '9000'
   },
   'tls-creds': 'tls0',
   'tls-authz': 'authz0'
 }
   }

Signed-off-by: Daniel P. Berrange 
---
 blockdev-nbd.c  | 12 +---
 hmp.c   |  2 +-
 include/block/nbd.h |  2 +-
 qapi/block.json |  8 +++-
 4 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index 65a84739ed..2627379ede 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -23,6 +23,7 @@
 typedef struct NBDServerData {
 QIONetListener *listener;
 QCryptoTLSCreds *tlscreds;
+char *tlsauthz;
 } NBDServerData;
 
 static NBDServerData *nbd_server;
@@ -37,7 +38,8 @@ static void nbd_accept(QIONetListener *listener, 
QIOChannelSocket *cioc,
 {
 qio_channel_set_name(QIO_CHANNEL(cioc), "nbd-server");
 nbd_client_new(NULL, cioc,
-   nbd_server->tlscreds, NULL,
+   nbd_server->tlscreds,
+   nbd_server->tlsauthz,
nbd_blockdev_client_closed);
 }
 
@@ -53,6 +55,7 @@ static void nbd_server_free(NBDServerData *server)
 if (server->tlscreds) {
 object_unref(OBJECT(server->tlscreds));
 }
+g_free(server->tlsauthz);
 
 g_free(server);
 }
@@ -88,7 +91,7 @@ static QCryptoTLSCreds *nbd_get_tls_creds(const char *id, 
Error **errp)
 
 
 void nbd_server_start(SocketAddress *addr, const char *tls_creds,
-  Error **errp)
+  const char *tls_authz, Error **errp)
 {
 if (nbd_server) {
 error_setg(errp, "NBD server already running");
@@ -118,6 +121,8 @@ void nbd_server_start(SocketAddress *addr, const char 
*tls_creds,
 }
 }
 
+nbd_server->tlsauthz = g_strdup(tls_authz);
+
 qio_net_listener_set_client_func(nbd_server->listener,
  nbd_accept,
  NULL,
@@ -132,11 +137,12 @@ void nbd_server_start(SocketAddress *addr, const char 
*tls_creds,
 
 void qmp_nbd_server_start(SocketAddressLegacy *addr,
   bool has_tls_creds, const char *tls_creds,
+  bool has_tls_authz, const char *tls_authz,
   Error **errp)
 {
 SocketAddress *addr_flat = socket_address_flatten(addr);
 
-nbd_server_start(addr_flat, tls_creds, errp);
+nbd_server_start(addr_flat, tls_creds, tls_authz, errp);
 qapi_free_SocketAddress(addr_flat);
 }
 
diff --git a/hmp.c b/hmp.c
index f40d8279cf..b93a871830 100644
--- a/hmp.c
+++ b/hmp.c
@@ -2221,7 +2221,7 @@ void hmp_nbd_server_start(Monitor *mon, const QDict 
*qdict)
 goto exit;
 }
 
-nbd_server_start(addr, NULL, _err);
+nbd_server_start(addr, NULL, NULL, _err);
 qapi_free_SocketAddress(addr);
 if (local_err != NULL) {
 goto exit;
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 80ea9d240c..8a8ae8c3a7 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -313,7 +313,7 @@ void nbd_client_get(NBDClient *client);
 void nbd_client_put(NBDClient *client);
 
 void nbd_server_start(SocketAddress *addr, const char *tls_creds,
-  Error **errp);
+  const char *tls_authz, Error **errp);
 
 
 /* nbd_read
diff --git a/qapi/block.json b/qapi/block.json
index c694524002..9403d45d20 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -197,6 +197,11 @@
 #
 # @addr: Address on which to listen.
 # @tls-creds: (optional) ID of the TLS credentials object. Since 2.6
+# @tls-authz: ID of the QAuthZ authorization object used to validate
+# the client's x509 distinguished name. This object is
+# is only resolved at time of use, so can be deleted and
+# recreated on the fly while the NBD server is active.
+# If missing, it will default to denying access. Since 3.0
 #
 # Returns: error if the server is already running.
 #
@@ -204,7 +209,8 @@
 ##
 { 'command': 'nbd-server-start',
   'data': { 'addr': 'SocketAddressLegacy',
-'*tls-creds': 'str'} }
+'*tls-creds': 'str',
+'*tls-authz': 'str'} }
 
 ##
 # @nbd-server-add:
-- 
2.17.0




[Qemu-block] [PATCH v2 0/6] Add authorization support to all network services

2018-06-20 Thread Daniel P . Berrangé
This series builds on the core authorization framework:

  https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg04469.html

enabling its use with the VNC, chardev, NBD and migration network servers.

In combination with TLS x509 client certificates, this allows these
services to whitelist specific clients, which avoids the need to setup
restricted child certificate authorities.

In VNC it also allows whitelisting based on SASL user names.

Changed in v2:

 - Document that authz objects are resolved at time of use, not
   time of network service activation
 - Improve docs for tls-authz parameters on services
 - Fix 2.13 -> 3.0 version tags
 - Remove redundant conditionals around g_strdup
 - Fix arg syntax for qemu-nbd  s/-/--/
 - Remove QAPI (optional) annotation
 - Fix some outdated usage example

Based-on: <20180620103555.1342-1-berra...@redhat.com>

Daniel P. Berrangé (6):
  qemu-nbd: add support for authorization of TLS clients
  nbd: allow authorization with nbd-server-start QMP command
  migration: add support for a "tls-authz" migration parameter
  chardev: add support for authorization for TLS clients
  vnc: allow specifying a custom authorization object name
  monitor: deprecate acl_show, acl_reset, acl_policy, acl_add,
acl_remove

 blockdev-nbd.c| 12 ++---
 chardev/char-socket.c | 11 +++-
 chardev/char.c|  3 +++
 hmp.c | 11 +++-
 include/block/nbd.h   |  4 +--
 migration/migration.c |  8 ++
 migration/tls.c   |  2 +-
 monitor.c | 23 +
 nbd/server.c  | 10 
 qapi/block.json   |  8 +-
 qapi/char.json|  6 +
 qapi/migration.json   | 14 ++-
 qemu-doc.texi | 13 ++
 qemu-nbd.c| 13 +-
 qemu-nbd.texi |  4 +++
 qemu-options.hx   | 44 +++-
 ui/vnc.c  | 58 ---
 17 files changed, 207 insertions(+), 37 deletions(-)

-- 
2.17.0




[Qemu-block] [PATCH v2 1/6] qemu-nbd: add support for authorization of TLS clients

2018-06-20 Thread Daniel P . Berrangé
From: "Daniel P. Berrange" 

Currently any client which can complete the TLS handshake is able to use
the NBD server. The server admin can turn on the 'verify-peer' option
for the x509 creds to require the client to provide a x509 certificate.
This means the client will have to acquire a certificate from the CA
before they are permitted to use the NBD server. This is still a fairly
low bar to cross.

This adds a '--tls-authz OBJECT-ID' option to the qemu-nbd command which
takes the ID of a previously added 'QAuthZ' object instance. This will
be used to validate the client's x509 distinguished name. Clients
failing the authorization check will not be permitted to use the NBD
server.

For example to setup authorization that only allows connection from a client
whose x509 certificate distinguished name is

   CN=laptop.example.com,O=Example Org,L=London,ST=London,C=GB

use:

  qemu-nbd --object tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\
endpoint=server,verify-peer=yes \
   --object authz-simple,id=auth0,identity=CN=laptop.example.com,,\
O=Example Org,,L=London,,ST=London,,C=GB \
   --tls-creds tls0 \
   --tls-authz authz0
   other qemu-nbd args...

Signed-off-by: Daniel P. Berrange 
---
 include/block/nbd.h |  2 +-
 nbd/server.c| 10 +-
 qemu-nbd.c  | 13 -
 qemu-nbd.texi   |  4 
 4 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index fcdcd54502..80ea9d240c 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -307,7 +307,7 @@ void nbd_export_close_all(void);
 void nbd_client_new(NBDExport *exp,
 QIOChannelSocket *sioc,
 QCryptoTLSCreds *tlscreds,
-const char *tlsaclname,
+const char *tlsauthz,
 void (*close_fn)(NBDClient *, bool));
 void nbd_client_get(NBDClient *client);
 void nbd_client_put(NBDClient *client);
diff --git a/nbd/server.c b/nbd/server.c
index 9e1f227178..4f10f08dc0 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -100,7 +100,7 @@ struct NBDClient {
 
 NBDExport *exp;
 QCryptoTLSCreds *tlscreds;
-char *tlsaclname;
+char *tlsauthz;
 QIOChannelSocket *sioc; /* The underlying data channel */
 QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
 
@@ -677,7 +677,7 @@ static QIOChannel *nbd_negotiate_handle_starttls(NBDClient 
*client,
 
 tioc = qio_channel_tls_new_server(ioc,
   client->tlscreds,
-  client->tlsaclname,
+  client->tlsauthz,
   errp);
 if (!tioc) {
 return NULL;
@@ -1250,7 +1250,7 @@ void nbd_client_put(NBDClient *client)
 if (client->tlscreds) {
 object_unref(OBJECT(client->tlscreds));
 }
-g_free(client->tlsaclname);
+g_free(client->tlsauthz);
 if (client->exp) {
 QTAILQ_REMOVE(>exp->clients, client, next);
 nbd_export_put(client->exp);
@@ -2140,7 +2140,7 @@ static coroutine_fn void nbd_co_client_start(void *opaque)
 void nbd_client_new(NBDExport *exp,
 QIOChannelSocket *sioc,
 QCryptoTLSCreds *tlscreds,
-const char *tlsaclname,
+const char *tlsauthz,
 void (*close_fn)(NBDClient *, bool))
 {
 NBDClient *client;
@@ -2153,7 +2153,7 @@ void nbd_client_new(NBDExport *exp,
 if (tlscreds) {
 object_ref(OBJECT(client->tlscreds));
 }
-client->tlsaclname = g_strdup(tlsaclname);
+client->tlsauthz = g_strdup(tlsauthz);
 client->sioc = sioc;
 object_ref(OBJECT(client->sioc));
 client->ioc = QIO_CHANNEL(sioc);
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 51b9d38c72..c0c9c805c0 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -52,6 +52,7 @@
 #define QEMU_NBD_OPT_TLSCREDS  261
 #define QEMU_NBD_OPT_IMAGE_OPTS262
 #define QEMU_NBD_OPT_FORK  263
+#define QEMU_NBD_OPT_TLSAUTHZ  264
 
 #define MBR_SIZE 512
 
@@ -66,6 +67,7 @@ static int shared = 1;
 static int nb_fds;
 static QIONetListener *server;
 static QCryptoTLSCreds *tlscreds;
+static const char *tlsauthz;
 
 static void usage(const char *name)
 {
@@ -355,7 +357,7 @@ static void nbd_accept(QIONetListener *listener, 
QIOChannelSocket *cioc,
 nb_fds++;
 nbd_update_server_watch();
 nbd_client_new(newproto ? NULL : exp, cioc,
-   tlscreds, NULL, nbd_client_closed);
+   tlscreds, tlsauthz, nbd_client_closed);
 }
 
 static void nbd_update_server_watch(void)
@@ -533,6 +535,7 @@ int main(int argc, char **argv)
 { "image-opts", no_argument, NULL, QEMU_NBD_OPT_IMAGE_OPTS },
 { "trace", required_argument, NULL, 'T' },
 { "fork", no_argument, NULL, QEMU_NBD_OPT_FORK },
+

[Qemu-block] [PATCH v2 5/6] vnc: allow specifying a custom authorization object name

2018-06-20 Thread Daniel P . Berrangé
From: "Daniel P. Berrange" 

The VNC server has historically had support for ACLs to check both the
SASL username and the TLS x509 distinguished name. The VNC server was
responsible for creating the initial ACL, and the client app was then
responsible for populating it with rules using the HMP 'acl_add' command.

This is not satisfactory for a variety of reasons. There is no way to
populate the ACLs from the command line, users are forced to use the
HMP. With multiple network services all supporting TLS and ACLs now, it
is desirable to be able to define a single ACL that is referenced by all
services.

To address these limitations, two new options are added to the VNC
server CLI. The 'tls-authz' option takes the ID of a QAuthZ object to
use for checking TLS x509 distinguished names, and the 'sasl-authz'
option takes the ID of another object to use for checking SASL usernames.

In this example, we setup two authorization rules. The first allows any
client with a certificate issued by the 'RedHat' organization in the
'London' locality. The second ACL allows clients with either the
'j...@redhat.com' or  'f...@redhat.com' kerberos usernames. Both checks
must pass for the user to be allowed.

$QEMU -object tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\
  endpoint=server,verify-peer=yes \
  -object authz-simple,id=authz0,policy=deny,\
  rules.0.match=O=RedHat,,L=London,rules.0.policy=allow \
  -object authz-simple,id=authz1,policy=deny,\
  rules.0.match=f...@redhat.com,rules.0.policy=allow \
  rules.0.match=j...@redhat.com,rules.0.policy=allow \
  -vnc 0.0.0.0:1,tls-creds=tls0,tls-authz=authz0,
   sasl,sasl-authz=authz1 \
  ...other QEMU args...

Signed-off-by: Daniel P. Berrange 
---
 qemu-doc.texi   |  5 +
 qemu-options.hx | 35 -
 ui/vnc.c| 58 +
 3 files changed, 79 insertions(+), 19 deletions(-)

diff --git a/qemu-doc.texi b/qemu-doc.texi
index 282bc3dc35..5639b7ecdb 100644
--- a/qemu-doc.texi
+++ b/qemu-doc.texi
@@ -2912,6 +2912,11 @@ Option @option{-virtioconsole} has been replaced by
 The @code{-clock} option is ignored since QEMU version 1.7.0. There is no
 replacement since it is not needed anymore.
 
+@subsection -vnc acl (since 3.0.0)
+
+The @code{acl} option to the @code{-vnc} argument has been replaced
+by the @code{tls-authz} and @code{sasl-authz} options.
+
 @section QEMU Machine Protocol (QMP) commands
 
 @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0)
diff --git a/qemu-options.hx b/qemu-options.hx
index b7337da782..d7a87e8b1e 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1647,6 +1647,14 @@ The @option{tls-creds} parameter obsoletes the 
@option{tls},
 it is not permitted to set both new and old type options at
 the same time.
 
+@item tls-authz=@var{ID}
+
+Provides the ID of the QAuthZ authorization object against which
+the client's x509 distinguished name will validated. This object is
+only resolved at time of use, so can be deleted and recreated on the
+fly while the VNC server is active. If missing, it will default
+to denying access.
+
 @item tls
 
 Require that client use TLS when communicating with the VNC server. This
@@ -1700,18 +1708,25 @@ ensures a data encryption preventing compromise of 
authentication
 credentials. See the @ref{vnc_security} section for details on using
 SASL authentication.
 
+@item sasl-authz=@var{ID}
+
+Provides the ID of the QAuthZ authorization object against which
+the client's SASL username will validated. This object is
+only resolved at time of use, so can be deleted and recreated on the
+fly while the VNC server is active. If missing, it will default
+to denying access.
+
 @item acl
 
-Turn on access control lists for checking of the x509 client certificate
-and SASL party. For x509 certs, the ACL check is made against the
-certificate's distinguished name. This is something that looks like
-@code{C=GB,O=ACME,L=Boston,CN=bob}. For SASL party, the ACL check is
-made against the username, which depending on the SASL plugin, may
-include a realm component, eg @code{bob} or @code{bob@@EXAMPLE.COM}.
-When the @option{acl} flag is set, the initial access list will be
-empty, with a @code{deny} policy. Thus no one will be allowed to
-use the VNC server until the ACLs have been loaded. This can be
-achieved using the @code{acl} monitor command.
+Legacy method for enabling authorization of clients against the
+x509 distinguished name and SASL username. It results in the creation
+of two @code{authz-list} objects with IDs of @code{vnc.username} and
+@code{vnc.x509dname}. The rules for these objects must be configured
+with the HMP ACL commands.
+
+This option is deprecated and should no longer be used. The new
+@option{sasl-authz} and @option{tls-authz} options are a
+replacement.
 
 @item lossy
 
diff --git a/ui/vnc.c b/ui/vnc.c

Re: [Qemu-block] [PATCH v5 6/6] docs/interop: add nbd.txt

2018-06-20 Thread Eric Blake

On 06/09/2018 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:

Describe new metadata namespace: "qemu".

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  docs/interop/nbd.txt | 37 +
  MAINTAINERS  |  1 +
  2 files changed, 38 insertions(+)
  create mode 100644 docs/interop/nbd.txt

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
new file mode 100644
index 00..7366269fc0
--- /dev/null
+++ b/docs/interop/nbd.txt
@@ -0,0 +1,37 @@
+Qemu supports NBD protocol, and has internal NBD client (look at


s/supports/supports the/


+block/nbd.c), internal NBD server (look at blockdev-nbd.c) as well as


s/internal/an internal/2


+external NBD server tool - qemu-nbd.c. The common code is placed in


s/external/an external/


+nbd/*.
+
+NBD protocol is specified here:


s/NBD/The NBD/


+https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md
+
+This following paragraphs describe some specific properties of NBD
+protocol realization in Qemu.
+
+
+= Metadata namespaces =
+
+Qemu supports "base:allocation" metadata context as defined in the NBD


s/supports/supports the/


+protocol specification and defines own metadata namespace: "qemu".


s/own/an additional/


+
+
+== "qemu" namespace ==
+
+For now, the only type of metadata context in the namespace is dirty
+bitmap. All available metadata contexts have the following form:


maybe:

The "qemu" namespace currently contains only one type of context, 
related to exposing the contents of a dirty bitmap alongside the 
associated disk contents.  The available metadata context has the 
following form:



+
+   qemu:dirty-bitmap:
+
+Each dirty-bitmap metadata context defines the only one flag for
+extents in reply for NBD_CMD_BLOCK_STATUS:
+
+bit 0: NBD_STATE_DIRTY, means that the extent is "dirty"
+
+For NBD_OPT_LIST_META_CONTEXT the following queries are supported
+additionally to "qemu:dirty-bitmap:":


s/additionally/in addition/


+
+* "qemu:" : returns list of all available metadata contexts in the
+namespace.
+* "qemu:dirty-bitmap:" : returns list of all available dirty-bitmap
+ metadata contexts.
diff --git a/MAINTAINERS b/MAINTAINERS
index e187b1f18f..887b479440 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1923,6 +1923,7 @@ F: nbd/
  F: include/block/nbd*
  F: qemu-nbd.*
  F: blockdev-nbd.c
+F: docs/interop/nbd.txt
  T: git git://repo.or.cz/qemu/ericb.git nbd
  
  NFS




Reviewed-by: Eric Blake 

At this point, I think I'll touch up the issues I've spotted and submit 
a pull request, in order to make it easier for me to test my libvirt code.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH v5 5/6] qapi: new qmp command nbd-server-add-bitmap

2018-06-20 Thread Eric Blake

On 06/09/2018 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  qapi/block.json | 23 +++
  blockdev-nbd.c  | 23 +++
  2 files changed, 46 insertions(+)


I'm tempted to temporarily name this x-nbd-server-add-bitmap, until I 
have the counterpart Libvirt patches tested, just in case testing turns 
up any tweaks we need to the interface.




diff --git a/qapi/block.json b/qapi/block.json
index c694524002..ddbca2e286 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -269,6 +269,29 @@
'data': {'name': 'str', '*mode': 'NbdServerRemoveMode'} }
  
  ##

+# @nbd-server-add-bitmap:
+#
+# Expose a dirty bitmap associated with the selected export. The bitmap search
+# starts at the device attached to the export, and includes all backing files.
+# The exported bitmap is then locked until the NBD export is removed.


The fact that you search the backing chain is at least consistent with 
your code.



+#
+# @name: Export name.
+#
+# @bitmap: Bitmap name to search for.
+#
+# @bitmap-export-name: How the bitmap will be seen by nbd clients
+#  (default @bitmap)


Do we really need the flexibility of naming the bitmap differently to 
the NBD client than we do in qemu?



+#
+# Note: the client must use NBD_OPT_SET_META_CONTEXT with a query of
+# "qemu:dirty-bitmap:NAME" (where NAME matches @bitmap-export-name) to access
+# the exposed bitmap.
+#
+# Since: 3.0
+##
+  { 'command': 'nbd-server-add-bitmap',
+'data': {'name': 'str', 'bitmap': 'str', '*bitmap-export-name': 'str'} }
+
+##
  # @nbd-server-stop:
  #
  # Stop QEMU's embedded NBD server, and unregister all devices previously
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index 65a84739ed..6b0c50732c 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -220,3 +220,26 @@ void qmp_nbd_server_stop(Error **errp)
  nbd_server_free(nbd_server);
  nbd_server = NULL;
  }
+
+void qmp_nbd_server_add_bitmap(const char *name, const char *bitmap,
+   bool has_bitmap_export_name,
+   const char *bitmap_export_name,
+   Error **errp)
+{
+NBDExport *exp;
+
+if (!nbd_server) {
+error_setg(errp, "NBD server not running");
+return;
+}
+
+exp = nbd_export_find(name);
+if (exp == NULL) {
+error_setg(errp, "Export '%s' is not found", name);
+return;
+}
+
+nbd_export_bitmap(exp, bitmap,
+  has_bitmap_export_name ? bitmap_export_name : bitmap,
+  errp);
+}



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH v5 4/6] nbd/server: implement dirty bitmap export

2018-06-20 Thread Eric Blake

On 06/09/2018 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:

Handle new NBD meta namespace: "qemu", and corresponding queries:


s/new/a new/


"qemu:dirty-bitmap:".

With new metadata context negotiated, BLOCK_STATUS query will reply


s/new/the new/


with dirty-bitmap data, converted to extents. New public function


s/New/The new/


nbd_export_bitmap selects bitmap to export. For now, only one bitmap


s/bitmap/which bitmap/


may be exported.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/nbd.h |   6 ++
  nbd/server.c| 271 
  2 files changed, 259 insertions(+), 18 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index fcdcd54502..a653d0fba7 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -234,6 +234,10 @@ enum {
  #define NBD_STATE_HOLE (1 << 0)
  #define NBD_STATE_ZERO (1 << 1)
  
+/* Flags for extents (NBDExtent.flags) of NBD_REPLY_TYPE_BLOCK_STATUS,

+ * for qemu:dirty-bitmap:* meta contexts */


Comment tail.


+#define NBD_STATE_DIRTY (1 << 0)
+
  static inline bool nbd_reply_type_is_error(int type)
  {
  return type & (1 << 15);
@@ -315,6 +319,8 @@ void nbd_client_put(NBDClient *client);
  void nbd_server_start(SocketAddress *addr, const char *tls_creds,
Error **errp);
  
+void nbd_export_bitmap(NBDExport *exp, const char *bitmap,

+   const char *bitmap_export_name, Error **errp);
  
  /* nbd_read

   * Reads @size bytes from @ioc. Returns 0 on success.
diff --git a/nbd/server.c b/nbd/server.c
index 2d762d7289..569e068fc2 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -23,6 +23,12 @@
  #include "nbd-internal.h"
  
  #define NBD_META_ID_BASE_ALLOCATION 0

+#define NBD_META_ID_DIRTY_BITMAP 1
+
+/* NBD_MAX_BITMAP_EXTENTS: 1 mb of extents data. An empirical constant. If need
+ * to increase, note that NBD protocol defines 32 mb as a limit, after which 
the


s/need to increase/an increase is needed/


+ * reply may be considered as a denial of service attack. */
+#define NBD_MAX_BITMAP_EXTENTS (0x10 / 8)
  
  static int system_errno_to_nbd_errno(int err)

  {
@@ -80,6 +86,9 @@ struct NBDExport {
  
  BlockBackend *eject_notifier_blk;

  Notifier eject_notifier;
+
+BdrvDirtyBitmap *export_bitmap;
+char *export_bitmap_context;
  };
  
  static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);

@@ -92,6 +101,7 @@ typedef struct NBDExportMetaContexts {
  bool valid; /* means that negotiation of the option finished without
 errors */
  bool base_allocation; /* export base:allocation context (block status) */
+bool dirty_bitmap; /* export qemu:dirty-bitmap: */
  } NBDExportMetaContexts;
  
  struct NBDClient {

@@ -810,6 +820,55 @@ static int nbd_meta_base_query(NBDClient *client, 
NBDExportMetaContexts *meta,
   >base_allocation, errp);
  }
  
+/* nbd_meta_bitmap_query

+ *
+ * Handle query to 'qemu:' namespace.
+ * @len is the amount of text remaining to be read from the current name, after
+ * the 'qemu:' portion has been stripped.
+ *
+ * Return -errno on I/O error, 0 if option was completely handled by
+ * sending a reply about inconsistent lengths, or 1 on success. */
+static int nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,
+   uint32_t len, Error **errp)
+{
+bool dirty_bitmap = false;
+int dirty_bitmap_len = strlen("dirty-bitmap:");


size_t is better for strlen() values.


+int ret;
+
+if (!meta->exp->export_bitmap) {
+return nbd_opt_skip(client, len, errp);
+}


Worth a trace?...


+
+if (len == 0) {
+if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
+meta->dirty_bitmap = true;
+}
+trace_nbd_negotiate_meta_query_parse("empty");
+return 1;
+}
+
+if (len < dirty_bitmap_len) {
+trace_nbd_negotiate_meta_query_skip("not dirty-bitmap:");
+return nbd_opt_skip(client, len, errp);
+}
+


...especially since other returns have traced the result.

I'm strongly thinking of adding a patch to QMP to add an x-context 
parameter when creating an NBD client, in order to at least make testing 
client/server interactions easier than just code inspection.  Does not 
have to be part of this series.



+len -= dirty_bitmap_len;
+ret = nbd_meta_pattern(client, "dirty-bitmap:", _bitmap, errp);
+if (ret <= 0) {
+return ret;
+}
+if (!dirty_bitmap) {
+trace_nbd_negotiate_meta_query_skip("not dirty-bitmap:");
+return nbd_opt_skip(client, len, errp);
+}
+
+trace_nbd_negotiate_meta_query_parse("dirty-bitmap:");
+
+return nbd_meta_empty_or_pattern(
+client, meta->exp->export_bitmap_context +
+strlen("qemu:dirty_bitmap:"), len, >dirty_bitmap, errp);
+}
+
  /* nbd_negotiate_meta_query
   *
   * Parse namespace name and call 

Re: [Qemu-block] [RFC PATCH 06/10] block: Allow changing the backing file on reopen

2018-06-20 Thread Kevin Wolf
Am 19.06.2018 um 16:20 hat Alberto Garcia geschrieben:
> >> Wait, I think the description I gave is inaccurate:
> >> 
> >> commit_complete() calls bdrv_drop_intermediate(), and that updates the
> >> backing image name (c->role->update_filename()). If we're doing this in
> >> an intermediate node then it needs to be reopened in read-write mode,
> >> while keeping the rest of the options.
> >> 
> >> But then bdrv_reopen_commit() realizes that the backing file (from
> >> reopen_state->options) is not the current one (because there's a
> >> bdrv_mirror_top implicit filter) and attempts to remove it. And that's
> >> the root cause of the crash.
> >
> > How did the other (the real?) backing file get into
> > reopen_state->options? I don't think bdrv_drop_intermediate() wants to
> > change anything except the read-only flag, so we should surely have
> > the node name of bdrv_mirror_top there?
> 
> No, it doesn't (try to) change anything else. It cannot do it:
> bdrv_reopen() only takes flags, but keeps the current options. And the
> current options have 'backing' set to a thing different from the
> bdrv_mirror_top node.

Okay, so in theory this is expected to just work.

Actually, do we ever use bdrv_reopen() for flags other than read-only?
Maybe we should get rid of that flags nonsense and simply make it a
bdrv_reopen_set_readonly() taking a boolean.

> > > So my first impression is that we should not try to change backing
> > > files if a reopen was not requested by the user (blockdev-reopen)
> > > and perhaps we should forbid it when there are implicit nodes
> > > involved.
> > Changing the behaviour depending on who the caller is sounds like a
> > hack that relies on coincidence and may break sooner or later.
> 
> The main difference between the user calling blockdev-reopen and the
> code doing bdrv_reopen() internally is that in the former case all
> existing options are ignored (keep_old_opts = false) and in the latter
> they are kept.
> 
> This latter case can have unintended consequences, and I think they're
> all related to the fact that bs->explicit_options also keeps the options
> of its children. So if you have
> 
>C <- B <- A
> 
> and A contains 'backing.driver=qcow2, backing.node-name=foo, ...', and
> you reopen A (keeping its options) then everything goes fine. However if
> you remove B from the chain (using e.g. block-stream) then you can't
> reopen A anymore because backing.* no longer matches the current backing
> file.
> 
> I suppose that we would need to remove the children options from
> bs->explicit_options in that case? If this all happens because of the
> user calling blockdev-reopen then we don't need to worry about it becase
> bs->explicit_options are ignored.

So the problem is that bs->explicit_options (and probably bs->options)
aren't consistent with the actual state of the graph. The fix for that
is likely not in bdrv_reopen().

I think we should already remove nested options of children from the
dicts in bdrv_open_inherit(). I'm less sure about node-name references.
Maybe instead of keeing the dicts up-to-date each time we modify the
graph, we should just get rid of those in the dicts as well, and instead
add a function that reconstructs the full dict from bs->options and the
currently attached children. This requires that the child name and the
option name match, but I think that's true. (Mostly at least - what
about quorum? But the child node handling of quorum is broken anyway.)

I'm almost sure Max has an opinion about this, too. :-)

> >> Ah, ok, I think that's related to the crash that I mentioned earlier
> >> with block-commit. Yes, it makes sense that we forbid that. I suppose
> >> that we can do this already with BLK_PERM_GRAPH_MOD ?
> >
> > Possibly. The thing with BLK_PERM_GRAPH_MOD is that nobody knows what
> > its meaning has to be so that it's actually useful, so we're not using
> > it in any meaningful way yet.
> >
> > I suspect BLK_PERM_GRAPH_MOD is the wrong tool because what we
> > actually want to protect against modification is not a BDS, but a
> > BdrvChild. But I may be wrong there.
> 
> I'll take a closer look and see what I come up with.

Okay. Maybe don't implement anything yet, but just try to come up with a
concept for discussion.

> At the moment there's 
> 
> Berto

And it's very good to have a Berto at the moment, but I think you wanted
to add something else there? ;-)

Kevin



Re: [Qemu-block] [PATCH 3/6] migration: add support for a "tls-authz" migration parameter

2018-06-20 Thread Juan Quintela
Daniel P. Berrangé  wrote:
> On Wed, Jun 20, 2018 at 12:03:45PM +0200, Juan Quintela wrote:
>> Daniel P. Berrangé  wrote:
>> > From: "Daniel P. Berrange" 
>> 
>> .
>> 
>> 
>> It is not just the fault of this patch, but as you are the one doing the
>> tls bits on migration...
>> 
>> 
>> > @@ -1106,6 +1108,12 @@ static void 
>> > migrate_params_apply(MigrateSetParameters *params, Error **errp)
>> >  s->parameters.tls_hostname = g_strdup(params->tls_hostname->u.s);
>> >  }
>> >  
>> > +if (params->has_tls_authz) {
>> > +g_free(s->parameters.tls_authz);
>> > +assert(params->tls_authz->type == QTYPE_QSTRING);
>> 
>> We really try  hard not to use assert() on migration code (yes, it is an
>> ongoing effort).  The code around this is something like:
>> 
>> static void migrate_params_test_apply(MigrateSetParameters *params,
>>   MigrationParameters *dest)
>> {
>> [...]
>> 
>> if (params->has_compress_level) {
>> dest->compress_level = params->compress_level;
>> }
>> 
>> [...]
>> 
>> if (params->has_tls_creds) {
>> assert(params->tls_creds->type == QTYPE_QSTRING);
>> dest->tls_creds = g_strdup(params->tls_creds->u.s);
>> }
>> [...]
>> }
>> 
>> Ok, tls code is the one with strings, but still.
>
> My understanding is that because we declared this parameter to
> have "str" type in the QAPI schema, the QAPI code should already
> guarantee that  "params->tls_creds->type == QTYPE_QSTRING" is
> true.
>
> IOW, the assert should never fail, and if it does, that would
> be a good thing as if QAPI wasn't validating input correctly
> something very bad has gone wrong.
>
>
>> static bool migrate_params_check(MigrationParameters *params, Error **errp)
>> {
>> if (params->has_compress_level &&
>> (params->compress_level > 9)) {
>> error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "compress_level",
>>"is invalid, it should be in the range of 0 to 9");
>> return false;
>
> This is different because QAPI schema merely declares it to be an
> int, so nothing in the QAPI input validation will do range checking.
> So this codepath is definitely reachable by users feeding bad input.

ok then.  Thanks for the explanation.

Later, Juan.



Re: [Qemu-block] [PATCH 3/6] migration: add support for a "tls-authz" migration parameter

2018-06-20 Thread Daniel P . Berrangé
On Wed, Jun 20, 2018 at 12:03:45PM +0200, Juan Quintela wrote:
> Daniel P. Berrangé  wrote:
> > From: "Daniel P. Berrange" 
> 
> .
> 
> 
> It is not just the fault of this patch, but as you are the one doing the
> tls bits on migration...
> 
> 
> > @@ -1106,6 +1108,12 @@ static void 
> > migrate_params_apply(MigrateSetParameters *params, Error **errp)
> >  s->parameters.tls_hostname = g_strdup(params->tls_hostname->u.s);
> >  }
> >  
> > +if (params->has_tls_authz) {
> > +g_free(s->parameters.tls_authz);
> > +assert(params->tls_authz->type == QTYPE_QSTRING);
> 
> We really try  hard not to use assert() on migration code (yes, it is an
> ongoing effort).  The code around this is something like:
> 
> static void migrate_params_test_apply(MigrateSetParameters *params,
>   MigrationParameters *dest)
> {
> [...]
> 
> if (params->has_compress_level) {
> dest->compress_level = params->compress_level;
> }
> 
> [...]
> 
> if (params->has_tls_creds) {
> assert(params->tls_creds->type == QTYPE_QSTRING);
> dest->tls_creds = g_strdup(params->tls_creds->u.s);
> }
> [...]
> }
> 
> Ok, tls code is the one with strings, but still.

My understanding is that because we declared this parameter to
have "str" type in the QAPI schema, the QAPI code should already
guarantee that  "params->tls_creds->type == QTYPE_QSTRING" is
true.

IOW, the assert should never fail, and if it does, that would
be a good thing as if QAPI wasn't validating input correctly
something very bad has gone wrong.


> static bool migrate_params_check(MigrationParameters *params, Error **errp)
> {
> if (params->has_compress_level &&
> (params->compress_level > 9)) {
> error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "compress_level",
>"is invalid, it should be in the range of 0 to 9");
> return false;

This is different because QAPI schema merely declares it to be an
int, so nothing in the QAPI input validation will do range checking.
So this codepath is definitely reachable by users feeding bad input.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-block] [PATCH 3/6] migration: add support for a "tls-authz" migration parameter

2018-06-20 Thread Juan Quintela
Daniel P. Berrangé  wrote:
> From: "Daniel P. Berrange" 

.


It is not just the fault of this patch, but as you are the one doing the
tls bits on migration...


> @@ -1106,6 +1108,12 @@ static void migrate_params_apply(MigrateSetParameters 
> *params, Error **errp)
>  s->parameters.tls_hostname = g_strdup(params->tls_hostname->u.s);
>  }
>  
> +if (params->has_tls_authz) {
> +g_free(s->parameters.tls_authz);
> +assert(params->tls_authz->type == QTYPE_QSTRING);

We really try  hard not to use assert() on migration code (yes, it is an
ongoing effort).  The code around this is something like:

static void migrate_params_test_apply(MigrateSetParameters *params,
  MigrationParameters *dest)
{
[...]

if (params->has_compress_level) {
dest->compress_level = params->compress_level;
}

[...]

if (params->has_tls_creds) {
assert(params->tls_creds->type == QTYPE_QSTRING);
dest->tls_creds = g_strdup(params->tls_creds->u.s);
}
[...]
}

Ok, tls code is the one with strings, but still.

static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
{
[...]
if (params->has_compress_level) {
s->parameters.compress_level = params->compress_level;
}
[...]

if (params->has_tls_creds) {
g_free(s->parameters.tls_creds);
assert(params->tls_creds->type == QTYPE_QSTRING);
s->parameters.tls_creds = g_strdup(params->tls_creds->u.s);
}
}

And this other function:

static bool migrate_params_check(MigrationParameters *params, Error **errp)
{
if (params->has_compress_level &&
(params->compress_level > 9)) {
error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "compress_level",
   "is invalid, it should be in the range of 0 to 9");
return false;
}
[...]
}

Where we don't check anything for tls.

Perhaps we can move the asserts here?

We can also try to merge migrate_params_check and
migrate_params_test_apply() into a single function, but it is not
completely trivial at the moment.

Wondering if we can do it better.

Later, Juan.



Re: [Qemu-block] [PATCH v2 02/10] qcow2/bitmap: avoid adjusting bm->flags for RO bitmaps

2018-06-20 Thread Vladimir Sementsov-Ogievskiy

13.06.2018 05:06, John Snow wrote:

Instead of always setting IN_USE, do this conditionally based on
whether or not the bitmap is read-only. Eliminate the two-pass
processing and move the second loop to the failure case.

This will allow us to show the flags exactly as they appear
on-disk for bitmaps in `qemu-img info`.

Signed-off-by: John Snow 
---
  block/qcow2-bitmap.c | 51 +--
  1 file changed, 21 insertions(+), 30 deletions(-)

diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
index 0670e5eb41..85c1b5afe3 100644
--- a/block/qcow2-bitmap.c
+++ b/block/qcow2-bitmap.c
@@ -919,13 +919,6 @@ fail:
  return ret;
  }
  
-/* for g_slist_foreach for GSList of BdrvDirtyBitmap* elements */

-static void release_dirty_bitmap_helper(gpointer bitmap,
-gpointer bs)
-{
-bdrv_release_dirty_bitmap(bs, bitmap);
-}
-
  /* for g_slist_foreach for GSList of BdrvDirtyBitmap* elements */
  static void set_readonly_helper(gpointer bitmap, gpointer value)
  {
@@ -944,8 +937,7 @@ bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, Error 
**errp)
  BDRVQcow2State *s = bs->opaque;
  Qcow2BitmapList *bm_list;
  Qcow2Bitmap *bm;
-GSList *created_dirty_bitmaps = NULL;
-bool header_updated = false;
+bool needs_update = false;
  
  if (s->nb_bitmaps == 0) {

  /* No bitmaps - nothing to do */
@@ -968,35 +960,34 @@ bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, Error 
**errp)
  bdrv_disable_dirty_bitmap(bitmap);
  }
  bdrv_dirty_bitmap_set_persistance(bitmap, true);
-bm->flags |= BME_FLAG_IN_USE;
-created_dirty_bitmaps =
-g_slist_append(created_dirty_bitmaps, bitmap);
-}
-}
-
-if (created_dirty_bitmaps != NULL) {
-if (can_write(bs)) {
-/* in_use flags must be updated */
-int ret = update_ext_header_and_dir_in_place(bs, bm_list);
-if (ret < 0) {
-error_setg_errno(errp, -ret, "Can't update bitmap directory");
-goto fail;
+if (can_write(bs)) {
+bm->flags |= BME_FLAG_IN_USE;
+needs_update = true;
+} else {
+bdrv_dirty_bitmap_set_readonly(bitmap, true);
  }


+    bm->dirty_bitmap = bitmap; [1]


-header_updated = true;
-} else {
-g_slist_foreach(created_dirty_bitmaps, set_readonly_helper,
-(gpointer)true);
  }
  }
  
-g_slist_free(created_dirty_bitmaps);

+/* in_use flags must be updated */
+if (needs_update) {
+int ret = update_ext_header_and_dir_in_place(bs, bm_list);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Can't update bitmap directory");
+goto fail;
+}
+}
+
  bitmap_list_free(bm_list);
  
-return header_updated;

+return needs_update;
  
  fail:

-g_slist_foreach(created_dirty_bitmaps, release_dirty_bitmap_helper, bs);
-g_slist_free(created_dirty_bitmaps);
+QSIMPLEQ_FOREACH(bm, bm_list, entry) {
+if (bm->dirty_bitmap) {


hm stop, at this point, you didn't yet set it. may be in later patches. 
I remember that you did it in previous version of patch series, so [1].



+bdrv_release_dirty_bitmap(bs, bm->dirty_bitmap);
+}
+}
  bitmap_list_free(bm_list);
  
  return false;


with [1]:

Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir




Re: [Qemu-block] [Qemu-devel] [PATCH] [RFC v2] aio: properly bubble up errors from initialization

2018-06-20 Thread Kevin Wolf
Am 20.06.2018 um 00:54 hat Nishanth Aravamudan geschrieben:
> On 19.06.2018 [15:35:57 -0700], Nishanth Aravamudan wrote:
> > On 19.06.2018 [13:14:51 -0700], Nishanth Aravamudan wrote:
> > > On 19.06.2018 [14:35:33 -0500], Eric Blake wrote:
> > > > On 06/15/2018 12:47 PM, Nishanth Aravamudan via Qemu-devel wrote:
> > 
> > 
> > 
> > > > >   } else if (s->use_linux_aio) {
> > > > > +int rc;
> > > > > +rc = aio_setup_linux_aio(bdrv_get_aio_context(bs));
> > > > > +if (rc != 0) {
> > > > > +error_report("Unable to use native AIO, falling back 
> > > > > to "
> > > > > + "thread pool.");
> > > > 
> > > > In general, error_report() should not output a trailing '.'.
> > > 
> > > Will fix.
> > > 
> > > > > +s->use_linux_aio = 0;
> > > > > +return rc;
> > > > 
> > > > Wait - the message claims we are falling back, but the non-zero return 
> > > > code
> > > > sounds like we are returning an error instead of falling back.  (My
> > > > preference - if the user requested something and we can't do it, it's 
> > > > better
> > > > to error than to fall back to something that does not match the user's
> > > > request).
> > > 
> > > I think that makes sense, I hadn't tested this specific case (in my
> > > reading of the code, it wasn't clear to me if raw_co_prw() could be
> > > called before raw_aio_plug() had been called, but I think returning the
> > > error code up should be handled correctly. What about the cases where
> > > there is no error handling (the other two changes in the patch)?
> > 
> > While looking at doing these changes, I realized that I'm not quite sure
> > what the right approach is here. My original rationale for returning
> > non-zero was that AIO was requested but could not be completed. I
> > haven't fully tracked back the calling paths, but I assumed it would get
> > retried at the top level, and since we indicated to not use AIO on
> > subsequent calls, it will succeed and use threads then (note, that I do
> > now realize this means a mismatch between the qemu command-line and the
> > in-use AIO model).
> > 
> > In practice, with my v2 patch, where I do return a non-zero error-code
> > from this function, qemu does not exit (nor is any logging other than
> > that I added emitted on the monitor). If I do not fallback, I imagine we
> > would just continuously see this error message and IO might not actually
> > every occur? Reworking all of the callpath to fail on non-zero returns
> > from raw_co_prw() seems like a fair bit of work, but if that is what is
> > being requested, I can try that (it will just take a while).
> > Alternatively, I can produce a v3 quickly that does not bubble the
> > actual errno all the way up (since it does seem like it is ignored
> > anyways?).
> 
> Sorry for the noise, but I had one more thought. Would it be appropriate
> to push the _setup() call up to when we parse the arguments about
> aio=native? E.g., we already check there if cache=directsync is
> specified and error out if not.

We already do this:

 /* Currently Linux does AIO only for files opened with O_DIRECT */
if (s->use_linux_aio && !(s->open_flags & O_DIRECT)) {
error_setg(errp, "aio=native was specified, but it requires "
 "cache.direct=on, which was not specified.");
ret = -EINVAL;
goto fail;
}

laio_init() is about other types of errors. But anyway, yes, calling
laio_init() already in .bdrv_open() is possible. Returning errors from
.bdrv_open() is nice and easy and we should do it.

However, we may also need to call laio_init() again when switching to a
different I/O thread after the image is already opened. This is what I
meant when I commented on v1 that you should do this in the
.bdrv_attach_aio_context callback. The problem here is that we can't
return an error there and the guest is already using the image. In this
case, logging an error and falling back to the thread pool seems to be
the best option we have.

Kevin



Re: [Qemu-block] [PATCH v2 01/10] qcow2/bitmap: remove redundant arguments from bitmap_list_load

2018-06-20 Thread Vladimir Sementsov-Ogievskiy

13.06.2018 05:06, John Snow wrote:

We always call it with the same fields of the struct we always pass.
We can split this out later if we really wind up needing to.

Signed-off-by: John Snow 



Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH v5 3/6] nbd/server: add nbd_meta_empty_or_pattern helper

2018-06-20 Thread Vladimir Sementsov-Ogievskiy

19.06.2018 23:24, Eric Blake wrote:

On 06/09/2018 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:

Add nbd_meta_pattern() and nbd_meta_empty_or_pattern() helpers for
metadata query parsing. nbd_meta_pattern() will be reused for "qemu"


s/for/for the/


namespace in following patches.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  nbd/server.c | 86 
+---

  1 file changed, 59 insertions(+), 27 deletions(-)


Feels like growth, even though the goal of refactoring is reuse; but 
the reuse comes later so I'm okay with it.




diff --git a/nbd/server.c b/nbd/server.c
index 567561a77e..2d762d7289 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -733,52 +733,83 @@ static int 
nbd_negotiate_send_meta_context(NBDClient *client,
  return qio_channel_writev_all(client->ioc, iov, 2, errp) < 0 ? 
-EIO : 0;

  }
  -/* nbd_meta_base_query
- *
- * Handle query to 'base' namespace. For now, only base:allocation 
context is


[1]...

- * available in it.  'len' is the amount of text remaining to be 
read from

- * the current name, after the 'base:' portion has been stripped.
+/* Read strlen(@pattern) bytes, and set @match to true if they match 
@pattern.

+ * @match is never set to false.
   *
   * Return -errno on I/O error, 0 if option was completely handled by
   * sending a reply about inconsistent lengths, or 1 on success.
   *
- * Note: return code = 1 doesn't mean that we've parsed 
"base:allocation"

- * namespace. It only means that there are no errors.*/
-static int nbd_meta_base_query(NBDClient *client, 
NBDExportMetaContexts *meta,

-   uint32_t len, Error **errp)
+ * Note: return code = 1 doesn't mean that we've read exactly @pattern
+ * It only means that there are no errors. */


Comment tail on its own line (now that we've got a patch pending for 
HACKING to document that, I'll start abiding by it...)


+static int nbd_meta_pattern(NBDClient *client, const char *pattern, 
bool *match,

+    Error **errp)
  {
  int ret;
-    char query[sizeof("allocation") - 1];
-    size_t alen = strlen("allocation");
+    char *query;
+    int len = strlen(pattern);


size_t is better than len for strlen() results.


  -    if (len == 0) {
-    if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
-    meta->base_allocation = true;
-    }
-    trace_nbd_negotiate_meta_query_parse("base:");
-    return 1;
-    }
-
-    if (len != alen) {
-    trace_nbd_negotiate_meta_query_skip("not base:allocation");
-    return nbd_opt_skip(client, len, errp);
-    }
+    assert(len);
  +    query = g_malloc(len);


At first, I wondered if we could just use a pre-allocated stack buffer 
larger than any string we ever anticipate.  But thinking about it, 
your dirty bitmap exports expose a name under user control, which 
means a user could (spitefully) pick a name longer than our buffer 
(well, up to the 4k name limit imposed by the NBD protocol).  So I can 
live with the malloc.



  ret = nbd_opt_read(client, query, len, errp);
  if (ret <= 0) {
+    g_free(query);
  return ret;
  }
  -    if (strncmp(query, "allocation", alen) == 0) {
- trace_nbd_negotiate_meta_query_parse("base:allocation");
-    meta->base_allocation = true;
+    if (strncmp(query, pattern, len) == 0) {
+    trace_nbd_negotiate_meta_query_parse(pattern);
+    *match = true;
  } else {
-    trace_nbd_negotiate_meta_query_skip("not base:allocation");
+    trace_nbd_negotiate_meta_query_skip(pattern);


Would this one read better as "not %s", pattern?


may be, but how? introduce malloced string variable to sprintf?




  }
+    g_free(query);
    return 1;
  }
  +/* Read @len bytes, and set @match to true if they match @pattern, 
or if @len
+ * is 0 and the client is performing _LIST_. @match is never set to 
false.

+ *
+ * Return -errno on I/O error, 0 if option was completely handled by
+ * sending a reply about inconsistent lengths, or 1 on success.
+ *
+ * Note: return code = 1 doesn't mean that we've read exactly @pattern
+ * It only means that there are no errors. */


More comment formatting.


hm, we need something in checkpatch for it



+static int nbd_meta_empty_or_pattern(NBDClient *client, const char 
*pattern,
+ uint32_t len, bool *match, 
Error **errp)

+{
+    if (len == 0) {
+    if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
+    *match = true;
+    }
+    trace_nbd_negotiate_meta_query_parse("empty");
+    return 1;
+    }
+
+    if (len != strlen(pattern)) {
+    trace_nbd_negotiate_meta_query_skip("different lengths");
+    return nbd_opt_skip(client, len, errp);
+    }
+
+    return nbd_meta_pattern(client, pattern, match, errp);
+}
+
+/* nbd_meta_base_query
+ *
+ * Handle query to 'base' namespace. For now, only base:allocation 
context is


Pre-existing (see [1]), but reads 

Re: [Qemu-block] [Qemu-devel] [PATCH 3/7] block/qcow2-refcount: check_refcounts_l2: refactor compressed case

2018-06-20 Thread Vladimir Sementsov-Ogievskiy

19.06.2018 21:50, Eric Blake wrote:

On 06/19/2018 01:34 PM, Vladimir Sementsov-Ogievskiy wrote:

Separate offset and size of compressed cluster.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/qcow2-refcount.c | 15 ++-
  1 file changed, 10 insertions(+), 5 deletions(-)


Hmm, I wonder if this duplicates my pending patch:

https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg04542.html



hm which one? don't see.

--
Best regards,
Vladimir




Re: [Qemu-block] [Qemu-devel] [PATCH 6/7] block/qcow2-refcount: fix out-of-file L1 entries to be zero

2018-06-20 Thread Vladimir Sementsov-Ogievskiy

19.06.2018 21:54, Eric Blake wrote:

On 06/19/2018 01:34 PM, Vladimir Sementsov-Ogievskiy wrote:

Zero out corrupted L1 table entry, which reference L2 table out of
underlying file.
Zero L1 table entry means that "the L2 table and all clusters described
by this L2 table are unallocated."

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/qcow2-refcount.c | 37 +
  1 file changed, 37 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index d993252fb6..3c9e2da39e 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1641,6 +1641,29 @@ static int 
fix_l2_entry_to_zero(BlockDriverState *bs, BdrvCheckResult *res,

  return ret;
  }
  +/* Zero out L1 entry
+ *
+ * Returns: -errno if overlap check failed
+ *  0 if write failed


If the write failed, wouldn't there be an errno value worth returning?


it's done to mimic existing behavior in check_refcounts_l2, when on 
rewriting error, overlap error is fatal and write error is not.





+ *  1 on success
+ */
+static int fix_l1_entry_to_zero(BlockDriverState *bs, 
BdrvCheckResult *res,

+    BdrvCheckMode fix, int64_t l1_offset,
+    int l1_index, bool active,
+    const char *fmt, ...)
+{
+    int ret;
+    int ign = active ? QCOW2_OL_ACTIVE_L2 : QCOW2_OL_INACTIVE_L2;
+    va_list args;
+
+    va_start(args, fmt);
+    ret = fix_table_entry(bs, res, fix, "L1", l1_offset, l1_index, 
0, ign,

+  fmt, args);
+    va_end(args);
+
+    return ret;
+}
+
  /*
   * Increases the refcount in the given refcount table for the all 
clusters
   * referenced in the L2 table. While doing so, performs some checks 
on L2
@@ -1837,6 +1860,20 @@ static int check_refcounts_l1(BlockDriverState 
*bs,

  if (l2_offset) {
  /* Mark L2 table as used */
  l2_offset &= L1E_OFFSET_MASK;
+    if (l2_offset >= bdrv_getlength(bs->file->bs)) {


Again, bdrv_getlength() can fail; you want to make sure that you check 
for failures before using it in comparisons.



+    ret = fix_l1_entry_to_zero(
+    bs, res, fix, l1_table_offset, i, active,
+    "l2 table offset out of file: offset 0x%" 
PRIx64,

+    l2_offset);
+    if (ret < 0) {
+    /* Something is seriously wrong, so abort checking
+ * this L1 table */
+    goto fail;
+    }
+
+    continue;
+    }
+
  ret = qcow2_inc_refcounts_imrt(bs, res,
 refcount_table, 
refcount_table_size,
 l2_offset, 
s->cluster_size);







--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH 1/6] qemu-nbd: add support for authorization of TLS clients

2018-06-20 Thread Daniel P . Berrangé
On Tue, Jun 19, 2018 at 03:06:06PM -0500, Eric Blake wrote:
> On 06/15/2018 10:50 AM, Daniel P. Berrangé wrote:
> > From: "Daniel P. Berrange" 
> > 
> > Currently any client which can complete the TLS handshake is able to use
> > the NBD server. The server admin can turn on the 'verify-peer' option
> > for the x509 creds to require the client to provide a x509 certificate.
> > This means the client will have to acquire a certificate from the CA
> > before they are permitted to use the NBD server. This is still a fairly
> > low bar to cross.
> > 
> > This adds a '--tls-authz OBJECT-ID' option to the qemu-nbd command which
> > takes the ID of a previously added 'QAuthZ' object instance. This will
> > be used to validate the client's x509 distinguished name. Clients
> > failing the authorization check will not be permitted to use the NBD
> > server.
> > 
> > For example to setup authorization that only allows connection from a client
> > whose x509 certificate distinguished name contains 'CN=fred', you would
> > use:
> > 
> >qemu-nbd -object tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\
> > endpoint=server,verify-peer=yes \
> > -object authz-simple,id=authz0,policy=deny,\
> >rules.0.match=*CN=fred,rules.0.policy=allow \
> 
> s/-object/--object/g
> 
> > -tls-creds tls0 \
> > -tls-authz authz0
> 
> s/-tls/--tls/g
> 
> (qemu-nbd requires double-dash long-opts, -o means --offset except that
> 'bject' is not an offset; similarly for -t meaning --persistent)

Sigh, yes, just another reason for us to standardize on -- everywhere

> > +++ b/qemu-nbd.c
> 
> > @@ -533,6 +535,7 @@ int main(int argc, char **argv)
> >   { "image-opts", no_argument, NULL, QEMU_NBD_OPT_IMAGE_OPTS },
> >   { "trace", required_argument, NULL, 'T' },
> >   { "fork", no_argument, NULL, QEMU_NBD_OPT_FORK },
> > +{ "tls-authz", no_argument, NULL, QEMU_NBD_OPT_TLSAUTHZ },
> 
> Not your fault, but worth sorting these alphabetically?
> 
> Bummer that pre-patch, you could use '--tls' as an unambiguous abbreviation
> for --tls-creds; now it is an ambiguous prefix (you have to type --tls-c or
> --tls-a to get to the point of no ambiguity).  If we really cared, we could
> add:
> 
> { "t", required_argument, NULL, QEMU_NBD_OPT_TLSCREDS },
> { "tl", required_argument, NULL, QEMU_NBD_OPT_TLSCREDS },
> { "tls", required_argument, NULL, QEMU_NBD_OPT_TLSCREDS },
> { "tls-", required_argument, NULL, QEMU_NBD_OPT_TLSCREDS },
> 
> since getopt_long() no longer reports ambiguity if there is an exact match
> to what is otherwise the common prefix of two ambiguous options. But I don't
> think backwards-compatibility on this front is worth worrying about
> (generally, scripts don't rely on getopt_long()'s unambiguous prefix
> handling).

Personally I think this is not worth worrying about. We've never documented
ability to abbreviate nor ever promised they are stable. Abbreviations are
inherantly unstable as you illustrate, so if anything we should just document
that you should never abbreviate args.

> 
> > +++ b/qemu-nbd.texi
> > @@ -91,6 +91,10 @@ of the TLS credentials object previously created with 
> > the --object
> >   option.
> >   @item --fork
> >   Fork off the server process and exit the parent once the server is 
> > running.
> > +@item --tls-authz=ID
> > +Specify the ID of a qauthz object previously created with the
> 
> s/qauthz/authz-simple/ ?

No, qauthz is a QOM interface, which is implemented by many subclasses,
of which authz-simple is just one example.

> > +--object option. This will be used to authorize users who
> > +connect against their x509 distinguished name.
> 
> Sounds like someone is "connecting against their name", rather than
> "authorizing against their name".  Better might be:
> 
> This will be used to authorize connecting users against their x509
> distinguished name.

Ok

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|