Re: [Qemu-block] [PATCH 2/4] nbd/server: add nbd_meta_single_query helper

2018-04-13 Thread Vladimir Sementsov-Ogievskiy

21.03.2018 18:05, Eric Blake wrote:

On 03/21/2018 07:19 AM, Vladimir Sementsov-Ogievskiy wrote:

The helper will be reused for bitmaps namespace.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  nbd/server.c | 41 -
  1 file changed, 28 insertions(+), 13 deletions(-)




+/* Read len bytes and check matching to the pattern.
+ * @match is set to true on empty query for _LIST_ and for query matching the
+ * @pattern. @match is never set to false.


How about:

Read @len bytes, and set @match to true if they match @pattern, or if 
@len is 0 and the client is performing _LIST_.  @match is never set to 
false.


will add, as always, thank you for natural rewording) Hm, I have a 
question: why do you often use double white-space "  " between 
sentences? Is it something meaningful?




At any rate, the refactoring is sane; and comment touchups are 
trivial, so


Reviewed-by: Eric Blake 




--
Best regards,
Vladimir




[Qemu-block] [PATCH v2 3/3] qapi: new qmp command nbd-server-add-bitmap

2018-04-13 Thread Vladimir Sementsov-Ogievskiy
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 qapi/block.json | 23 +++
 blockdev-nbd.c  | 23 +++
 2 files changed, 46 insertions(+)

diff --git a/qapi/block.json b/qapi/block.json
index c694524002..cc0e607b5b 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -269,6 +269,29 @@
   'data': {'name': 'str', '*mode': 'NbdServerRemoveMode'} }
 
 ##
+# @nbd-server-add-bitmap:
+#
+# Expose a dirty bitmap associated with the selected export. The bitmap search
+# starts at the device attached to the export, and includes all backing files.
+# The exported bitmap is then locked until the NBD export is removed.
+#
+# @name: Export name.
+#
+# @bitmap: Bitmap name to search for.
+#
+# @bitmap-export-name: How the bitmap will be seen by nbd clients
+#  (default @bitmap)
+#
+# Note: the client must use NBD_OPT_SET_META_CONTEXT with a query of
+# "qemu:dirty-bitmap:NAME" (where NAME matches @bitmap-export-name) to access
+# the exposed bitmap.
+#
+# Since: 2.13
+##
+  { 'command': 'nbd-server-add-bitmap',
+'data': {'name': 'str', 'bitmap': 'str', '*bitmap-export-name': 'str'} }
+
+##
 # @nbd-server-stop:
 #
 # Stop QEMU's embedded NBD server, and unregister all devices previously
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index 65a84739ed..6b0c50732c 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -220,3 +220,26 @@ void qmp_nbd_server_stop(Error **errp)
 nbd_server_free(nbd_server);
 nbd_server = NULL;
 }
+
+void qmp_nbd_server_add_bitmap(const char *name, const char *bitmap,
+   bool has_bitmap_export_name,
+   const char *bitmap_export_name,
+   Error **errp)
+{
+NBDExport *exp;
+
+if (!nbd_server) {
+error_setg(errp, "NBD server not running");
+return;
+}
+
+exp = nbd_export_find(name);
+if (exp == NULL) {
+error_setg(errp, "Export '%s' is not found", name);
+return;
+}
+
+nbd_export_bitmap(exp, bitmap,
+  has_bitmap_export_name ? bitmap_export_name : bitmap,
+  errp);
+}
-- 
2.11.1
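For anyone who wants to try this, below is a minimal sketch of driving the
proposed command over a QMP monitor socket (plain Python, standard library
only). The socket path, export name and bitmap name are made-up values for
illustration; they are not taken from the patch.

    # Assumes QEMU was started with an NBD server running and a QMP monitor
    # at /tmp/qmp.sock; 'export0' and 'bitmap0' are illustrative names.
    import json
    import socket

    sock = socket.socket(socket.AF_UNIX)
    sock.connect('/tmp/qmp.sock')       # assumed QMP monitor socket
    f = sock.makefile('rw')
    f.readline()                        # consume the QMP greeting banner

    def qmp(cmd, **args):
        f.write(json.dumps({'execute': cmd, 'arguments': args}) + '\n')
        f.flush()
        return json.loads(f.readline())

    qmp('qmp_capabilities')
    # Expose dirty bitmap 'bitmap0' of export 'export0'; an NBD client can
    # then negotiate the "qemu:dirty-bitmap:bitmap0" meta context.
    print(qmp('nbd-server-add-bitmap', name='export0', bitmap='bitmap0'))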




[Qemu-block] [PATCH v2 0/3] NBD export bitmaps

2018-04-13 Thread Vladimir Sementsov-Ogievskiy
Hi all.

This is a proposal and implementation of a new NBD meta context:
qemu. (I hope to send the corresponding proposal to the NBD protocol soon.)

New possible queries will look like:
qemu:dirty-bitmap:

The mapping from export-bitmap-name to BdrvDirtyBitmap is done through the
QMP command nbd-server-add-bitmap. For now, only one bitmap export is
allowed per NBD export; this can easily be improved later if needed
(we have no such use case at the moment).

Client and testing:
I wrote client code for Virtuozzo, but it turned out to be unused; in fact
it is used only for tests. We have no case where we need to import a dirty
bitmap through the QEMU NBD client; all of this is done to export dirty
bitmaps to a third-party tool. So I think it is not worth refactoring,
rebasing and merging the client part upstream if there are no real use cases.

v2:
Patch 01 from v1 is dropped: we don't need generic namespace parsing for now
(especially after moving to the qemu: namespace, which has the same length as
base:); let's postpone it.

01: Improve comment wording (Eric), add Eric's r-b
02: improve commit message
move NBD_STATE_DIRTY to header
add comment on NBD_MAX_BITMAP_EXTENTS
remove MAX_EXTENT_LENGTH and instead update add_extents() which
  uses it
use export_bitmap_context instead of export_bitmap_name to reduce
  operations on it
move from qemu-dirty-bitmap to qemu:dirty-bitmap
other way to parse namespace name
handle FLAG_DF
03: Improve specification of new qmp command (Eric)

Vladimir Sementsov-Ogievskiy (3):
  nbd/server: add nbd_meta_single_query helper
  nbd/server: implement dirty bitmap export
  qapi: new qmp command nbd-server-add-bitmap

 qapi/block.json |  23 +
 include/block/nbd.h |   6 ++
 blockdev-nbd.c  |  23 +
 nbd/server.c| 274 ++--
 4 files changed, 297 insertions(+), 29 deletions(-)

-- 
2.11.1




Re: [Qemu-block] [PATCH 2/4] nbd/server: add nbd_meta_single_query helper

2018-04-13 Thread John Snow


On 04/13/2018 01:44 PM, Vladimir Sementsov-Ogievskiy wrote:
> 
> will add, as always, thank you for natural rewording) Hm, I have a
> question: why do you often use double white-space "  " between
> sentences? Is it something meaningful?

There is some GREAT DEBATE in the English-speaking world over whether or
not this is correct typography. At one point, it was thought that there
should be two spaces after every full stop (".") to improve readability.
Allegedly, this was most important for physical typesetting on typewriters.

Since digital typography has taken off, some people argue that this is a
relic and that semantically we ought to be using only one literal space
after the full stop, and using various kerning and display parameters to
extend the physical buffer between two sentences if desired.

Famously, in my mind, PEP 8 mandates the two-spaces-after-a-period style.

MLA says "One, unless your professor prefers two."
https://style.mla.org/number-of-spaces-after-period/

Chicago Manual of Style mandates one space after the full stop.
http://www.chicagomanualofstyle.org/qanda/data/faq/topics/OneSpaceorTwo.html

Strunk & White uses one space after the period:
https://www.legalwatercoolerblog.com/tag/strunk-and-white/

APA style (American Psychological Association) actually requests two
spaces after a period for *manuscripts*:
https://owl.english.purdue.edu/owl/resource/560/24/

I think it used to be the pedagogical norm to instruct students to type
two spaces after a period. Most institutions (Python documentation
excluded) do not recommend the practice currently.



*cough*



So it's not just the programming world that argues about things like
"tabs vs spaces." Literary nerds do it too.

I'm sure this email will be entirely without controversy. :)


Happy Friday,
--js



[Qemu-block] [PATCH 3/3] qemu-iotests: Test new qemu-nbd --nolist option

2018-04-13 Thread Nir Soffer
Add a new test module for testing the --nolist option.

Signed-off-by: Nir Soffer 
---
 tests/qemu-iotests/214 | 46 ++
 tests/qemu-iotests/214.out |  2 ++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 49 insertions(+)
 create mode 100755 tests/qemu-iotests/214
 create mode 100644 tests/qemu-iotests/214.out

diff --git a/tests/qemu-iotests/214 b/tests/qemu-iotests/214
new file mode 100755
index 00..779e382070
--- /dev/null
+++ b/tests/qemu-iotests/214
@@ -0,0 +1,46 @@
+#!/usr/bin/env python
+#
+# Test qemu-nbd compatibility with other tools.
+#
+# Copyright (C) 2018 Nir Soffer 
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import iotests
+
+iotests.verify_image_format(supported_fmts=['raw'])
+
+iotests.log('Check that listing exports is allowed by default')
+disk, nbd_sock = iotests.file_path('disk1', 'nbd-sock1')
+iotests.qemu_img_create('-f', iotests.imgfmt, disk, '1m')
+iotests.qemu_nbd('-k', nbd_sock, '-f', iotests.imgfmt, '-x', 'export', disk)
+out = iotests.run('nbd-client', '-l', '--unix', nbd_sock)
+
+assert 'export' in out.splitlines(), 'Export not in %r' % out
+
+iotests.log('Check that listing exports is forbidden with --nolist')
+disk, nbd_sock = iotests.file_path('disk2', 'nbd-sock2')
+iotests.qemu_img_create('-f', iotests.imgfmt, disk, '1m')
+iotests.qemu_nbd('-k', nbd_sock, '-f', iotests.imgfmt, '-x', 'secret',
+ '--nolist', disk)
+
+# nbd-client fails when listing is not allowed, but let's not depend on
+# third-party tool behavior here.
+try:
+out = iotests.run('nbd-client', '-l', '--unix', nbd_sock)
+assert 'secret' not in out, 'Export in %r' % out
+except iotests.CommandFailed as e:
+# This text comes from qemu-nbd.
+assert 'Listing exports is forbidden' in e.err, 'Unexpected error: %s' % e
diff --git a/tests/qemu-iotests/214.out b/tests/qemu-iotests/214.out
new file mode 100644
index 00..dae61b5a57
--- /dev/null
+++ b/tests/qemu-iotests/214.out
@@ -0,0 +1,2 @@
+Check that listing exports is allowed by default
+Check that listing exports is forbidden with --nolist
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 52a80f3f9e..a820dcb91f 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -212,3 +212,4 @@
 211 rw auto quick
 212 rw auto quick
 213 rw auto quick
+214 rw auto quick
-- 
2.14.3




[Qemu-block] [PATCH 0/3] qemu-nbd: Disallow listing exports

2018-04-13 Thread Nir Soffer
oVirt uses random URLs to expose images temporarily via HTTPS. We would
like to integrate qemu-nbd into the same system, providing users an easy
and uniform way to access an image - either using HTTPS:

https://server:54322/images/dc72d3cc-b933-45e8-89a2-e028e1c2ef3d

Or using NBD over TLS:

nbd://server:10809/dc72d3cc-b933-45e8-89a2-e028e1c2ef3d

Unfortunately, qemu-nbd allows listing exports by default, allowing
anyone to find the secret export using the easy-to-guess port number.

These patches:
- add a --nolist option to qemu-nbd, disabling the NBD_OPT_LIST command.
- add some infrastructure to iotests.py
- and use the new infrastructure to test the new option using
  nbd-client.

Adding a dependency on nbd-client may be problematic, but I think
qemu-nbd should have tests ensuring compatibility with other tools.

Nir Soffer (3):
  nbd: Add option to disallow listing exports
  iotests.py: Add helper for running commands
  qemu-iotests: Test new qemu-nbd --nolist option

 blockdev-nbd.c|  2 +-
 include/block/nbd.h   |  1 +
 nbd/server.c  |  7 +++
 qemu-nbd.c|  9 -
 qemu-nbd.texi |  2 ++
 tests/qemu-iotests/214| 46 +++
 tests/qemu-iotests/214.out|  2 ++
 tests/qemu-iotests/group  |  1 +
 tests/qemu-iotests/iotests.py | 18 +
 9 files changed, 86 insertions(+), 2 deletions(-)
 create mode 100755 tests/qemu-iotests/214
 create mode 100644 tests/qemu-iotests/214.out

-- 
2.14.3




Re: [Qemu-block] [PATCH 1/3] nbd: Add option to disallow listing exports

2018-04-13 Thread Richard W.M. Jones
On Fri, Apr 13, 2018 at 10:26:03PM +0300, Nir Soffer wrote:
> When a management application exposes images using qemu-nbd, it needs a
> secure way to allow temporary access to the disk. Using a random export
> name can solve this problem:
> 
> nbd://server:10809/22965f19-9ab5-4d18-94e1-cbeb321fa433
> 
> This assumes that the URL is passed to the user in a secure way, and
> that the user is using TLS to access the image.
> 
> However, since qemu-nbd implements NBD_OPT_LIST, anyone can easily find
> the secret export:
> 
> $ nbd-client -l server 10809
> Negotiation: ..
> 22965f19-9ab5-4d18-94e1-cbeb321fa433
> 
> Add a new --nolist option, disabling listing, similar to the "allowlist"
> nbd-server configuration option.
> 
> When used, listing exports will fail like this:
> 
> $ nbd-client -l localhost 10809
> Negotiation: ..
> 
> E: listing not allowed by server.
> Server said: Listing exports is forbidden
> 
> Signed-off-by: Nir Soffer 

Tested-by: Richard W.M. Jones 

The code looks good to me too, so ACK.

Rich.

>  blockdev-nbd.c  | 2 +-
>  include/block/nbd.h | 1 +
>  nbd/server.c| 7 +++
>  qemu-nbd.c  | 9 -
>  qemu-nbd.texi   | 2 ++
>  5 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/blockdev-nbd.c b/blockdev-nbd.c
> index 65a84739ed..b9a885dc4b 100644
> --- a/blockdev-nbd.c
> +++ b/blockdev-nbd.c
> @@ -37,7 +37,7 @@ static void nbd_accept(QIONetListener *listener, 
> QIOChannelSocket *cioc,
>  {
>  qio_channel_set_name(QIO_CHANNEL(cioc), "nbd-server");
>  nbd_client_new(NULL, cioc,
> -   nbd_server->tlscreds, NULL,
> +   nbd_server->tlscreds, NULL, true,
> nbd_blockdev_client_closed);
>  }
>  
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index fcdcd54502..5c6b6272a0 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -308,6 +308,7 @@ void nbd_client_new(NBDExport *exp,
>  QIOChannelSocket *sioc,
>  QCryptoTLSCreds *tlscreds,
>  const char *tlsaclname,
> +bool allow_list,
>  void (*close_fn)(NBDClient *, bool));
>  void nbd_client_get(NBDClient *client);
>  void nbd_client_put(NBDClient *client);
> diff --git a/nbd/server.c b/nbd/server.c
> index 9e1f227178..7b91922d1d 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -115,6 +115,7 @@ struct NBDClient {
>  
>  bool structured_reply;
>  NBDExportMetaContexts export_meta;
> +bool allow_list;
>  
>  uint32_t opt; /* Current option being negotiated */
>  uint32_t optlen; /* remaining length of data in ioc for the option being
> @@ -1032,6 +1033,10 @@ static int nbd_negotiate_options(NBDClient *client, 
> uint16_t myflags,
>  case NBD_OPT_LIST:
>  if (length) {
>  ret = nbd_reject_length(client, false, errp);
> +} else if (!client->allow_list) {
> +ret = nbd_negotiate_send_rep_err(client,
> + NBD_REP_ERR_POLICY, 
> errp,
> + "Listing exports is 
> forbidden");
>  } else {
>  ret = nbd_negotiate_handle_list(client, errp);
>  }
> @@ -2141,6 +2146,7 @@ void nbd_client_new(NBDExport *exp,
>  QIOChannelSocket *sioc,
>  QCryptoTLSCreds *tlscreds,
>  const char *tlsaclname,
> +bool allow_list,
>  void (*close_fn)(NBDClient *, bool))
>  {
>  NBDClient *client;
> @@ -2158,6 +2164,7 @@ void nbd_client_new(NBDExport *exp,
>  object_ref(OBJECT(client->sioc));
>  client->ioc = QIO_CHANNEL(sioc);
>  object_ref(OBJECT(client->ioc));
> +client->allow_list = allow_list;
>  client->close_fn = close_fn;
>  
>  co = qemu_coroutine_create(nbd_co_client_start, client);
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index 0af0560ad1..b63d4d9e8b 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -52,6 +52,7 @@
>  #define QEMU_NBD_OPT_TLSCREDS  261
>  #define QEMU_NBD_OPT_IMAGE_OPTS262
>  #define QEMU_NBD_OPT_FORK  263
> +#define QEMU_NBD_OPT_NOLIST264
>  
>  #define MBR_SIZE 512
>  
> @@ -66,6 +67,7 @@ static int shared = 1;
>  static int nb_fds;
>  static QIONetListener *server;
>  static QCryptoTLSCreds *tlscreds;
> +static bool allow_list = true;
>  
>  static void usage(const char *name)
>  {
> @@ -86,6 +88,7 @@ static void usage(const char *name)
>  "  -v, --verbose display extra debugging information\n"
>  "  -x, --export-name=NAMEexpose export by name\n"
>  "  -D, --description=TEXTwith -x, also export a human-readable 
> description\n"
> +"  --nolist  do not list export\n"
>  "\n"
>  "Exposing part of the image:\n"
>  "  -o, --offset=OFFSET  

[Qemu-block] [PATCH 2/3] iotests.py: Add helper for running commands

2018-04-13 Thread Nir Soffer
Add a few helpers for running external commands:

- CommandFailed: an exception keeping all the info related to a failed
  command and providing a useful error message (unfortunately,
  subprocess.CalledProcessError does not).

- run(): run a command collecting output from the underlying process
  stdout and stderr, returning the command output or raising
  CommandFailed.

These helpers will be used by new qemu-nbd tests, and can later be used
to clean up the helpers for running qemu-* tools in iotests.py.

Signed-off-by: Nir Soffer 
---
 tests/qemu-iotests/iotests.py | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index b25d48a91b..0f8abf99cb 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -64,6 +64,24 @@ luks_default_secret_object = 'secret,id=keysec0,data=' + \
  os.environ['IMGKEYSECRET']
 luks_default_key_secret_opt = 'key-secret=keysec0'
 
+class CommandFailed(Exception):
+
+def __init__(self, cmd, rc, out, err):
+self.cmd = cmd
+self.rc = rc
+self.out = out
+self.err = err
+
+def __str__(self):
+return ("Command {self.cmd} failed: rc={self.rc}, out={self.out!r}, "
+"err={self.err!r}").format(self=self)
+
+def run(*args):
+p = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+out, err = p.communicate()
+if p.returncode != 0:
+raise CommandFailed(args, p.returncode, out, err)
+return out
 
 def qemu_img(*args):
 '''Run qemu-img and return the exit code'''
-- 
2.14.3
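As a usage illustration (not part of the patch), this is roughly how a test
could call an external tool through the new helper; the qemu-img invocation
is just an arbitrary example command:

    # Hypothetical caller of the run() helper added above; assumes it runs
    # inside the qemu-iotests environment, where iotests.py is importable.
    import iotests

    try:
        # Any external command works; 'qemu-img --version' is only an example.
        out = iotests.run('qemu-img', '--version')
        print(out.splitlines()[0])
    except iotests.CommandFailed as e:
        # CommandFailed carries cmd, rc, out and err for a useful message.
        print('failed: %s' % e)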




Re: [Qemu-block] [PATCH v4 2/5] qcow2: Document some maximum size constraints

2018-04-13 Thread Max Reitz
On 2018-02-28 15:20, Alberto Garcia wrote:
> On Wed 28 Feb 2018 03:01:33 PM CET, Eric Blake wrote:
> 
 The refcount table has implications on the maximum host file size; a
 larger cluster size is required for the refcount table to cover
 larger offsets.
>>>
>>> Why is this? Because of the refcount_table_clusters field ?
>>>
>>> I think the maximum offset allowed by that is ridiculously high,
>>> exceeding any other limit imposed by the L1/L2 tables.
>[...]
>>> With 512 byte clusters and 64 bit refcount entries I still get 8 PB,
>>> way over what's limited by the L1/L2 tables (128 GB).
>>
>> Do I need to make any modifications to the sentence, then?
> 
> I guess what surprised me the first time that I read it was that it
> suggests that this has to be taken into account when calculating the
> physical limits of an image, while in practice it can be ignored.
> 
> You could say something like 
> 
>   Although the larger the cluster size, the larger the offsets that can
>   be covered by the refcount table, in practice these limits cannot be
>   reached because they are larger than the ones imposed by other data
>   structures.

Are there any updates here?  I guess I personally would just drop the
whole paragraph, because I think it really doesn't matter...

Also note that the maximum file size of ext4 is 16 PB (for 4 kB blocks).
 OK, it's bigger for XFS, but that still gives some perspective.

Also, long before anyone is going to complain about the specification
failing to mention that limit, they are going to complain that qemu
refuses to open their image (because of its limit on the reftable size).

Max

> although I'm sure that you can come up with a better wording than mine :)
> 
> Berto
> 
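As a back-of-the-envelope check of the 128 GB figure quoted above, here is a
small sketch (assuming QEMU's 32 MB cap on the L1 table size,
QCOW_MAX_L1_SIZE; the numbers are for 512-byte clusters):

    # Rough check of the L1/L2 limit for 512-byte clusters.
    # Assumption: the L1 table is capped at QCOW_MAX_L1_SIZE = 32 MB.
    cluster_size = 512                       # smallest allowed cluster size
    l1_max_bytes = 32 * 1024 * 1024          # QCOW_MAX_L1_SIZE

    l2_entries = cluster_size // 8           # 8-byte entries per L2 table
    data_per_l2 = l2_entries * cluster_size  # guest bytes mapped by one L2 table
    l1_entries = l1_max_bytes // 8           # 8-byte entries in the L1 table

    max_guest = l1_entries * data_per_l2
    print(max_guest / 1024.0 ** 3, 'GiB')    # -> 128.0 GiB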




[Qemu-block] [PATCH v2 1/3] nbd/server: add nbd_meta_single_query helper

2018-04-13 Thread Vladimir Sementsov-Ogievskiy
The helper will be reused for bitmaps namespace.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
---
 nbd/server.c | 40 +++-
 1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index cea158913b..c169c11b3e 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -733,44 +733,58 @@ static int nbd_negotiate_send_meta_context(NBDClient 
*client,
 return qio_channel_writev_all(client->ioc, iov, 2, errp) < 0 ? -EIO : 0;
 }
 
-/* nbd_meta_base_query
- *
- * Handle query to 'base' namespace. For now, only base:allocation context is
- * available in it.  'len' is the amount of text remaining to be read from
- * the current name, after the 'base:' portion has been stripped.
+/* Read @len bytes, and set @match to true if they match @pattern, or if @len
+ * is 0 and the client is performing _LIST_. @match is never set to false.
  *
  * Return -errno on I/O error, 0 if option was completely handled by
  * sending a reply about inconsistent lengths, or 1 on success. */
-static int nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
-   uint32_t len, Error **errp)
+static int nbd_meta_single_query(NBDClient *client, const char *pattern,
+ uint32_t len, bool *match, Error **errp)
 {
 int ret;
-char query[sizeof("allocation") - 1];
-size_t alen = strlen("allocation");
+char *query;
 
 if (len == 0) {
 if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
-meta->base_allocation = true;
+*match = true;
 }
 return 1;
 }
 
-if (len != alen) {
+if (len != strlen(pattern)) {
 return nbd_opt_skip(client, len, errp);
 }
 
+query = g_malloc(len);
 ret = nbd_opt_read(client, query, len, errp);
 if (ret <= 0) {
+g_free(query);
 return ret;
 }
 
-if (strncmp(query, "allocation", alen) == 0) {
-meta->base_allocation = true;
+if (strncmp(query, pattern, len) == 0) {
+*match = true;
 }
+g_free(query);
 
 return 1;
 }
 
+/* nbd_meta_base_query
+ *
+ * Handle query to 'base' namespace. For now, only base:allocation context is
+ * available in it.  'len' is the amount of text remaining to be read from
+ * the current name, after the 'base:' portion has been stripped.
+ *
+ * Return -errno on I/O error, 0 if option was completely handled by
+ * sending a reply about inconsistent lengths, or 1 on success. */
+static int nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
+   uint32_t len, Error **errp)
+{
+return nbd_meta_single_query(client, "allocation", len,
+ &meta->base_allocation, errp);
+}
+
 /* nbd_negotiate_meta_query
  *
  * Parse namespace name and call corresponding function to parse body of the
-- 
2.11.1




[Qemu-block] [PATCH v2 2/3] nbd/server: implement dirty bitmap export

2018-04-13 Thread Vladimir Sementsov-Ogievskiy
Handle new NBD meta namespace: "qemu", and corresponding queries:
"qemu:dirty-bitmap:".

With the new metadata context negotiated, a BLOCK_STATUS query will reply
with dirty-bitmap data, converted to extents. The new public function
nbd_export_bitmap selects the bitmap to export. For now, only one bitmap
may be exported.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/nbd.h |   6 ++
 nbd/server.c| 234 
 2 files changed, 224 insertions(+), 16 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index fcdcd54502..a653d0fba7 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -234,6 +234,10 @@ enum {
 #define NBD_STATE_HOLE (1 << 0)
 #define NBD_STATE_ZERO (1 << 1)
 
+/* Flags for extents (NBDExtent.flags) of NBD_REPLY_TYPE_BLOCK_STATUS,
+ * for qemu:dirty-bitmap:* meta contexts */
+#define NBD_STATE_DIRTY (1 << 0)
+
 static inline bool nbd_reply_type_is_error(int type)
 {
 return type & (1 << 15);
@@ -315,6 +319,8 @@ void nbd_client_put(NBDClient *client);
 void nbd_server_start(SocketAddress *addr, const char *tls_creds,
   Error **errp);
 
+void nbd_export_bitmap(NBDExport *exp, const char *bitmap,
+   const char *bitmap_export_name, Error **errp);
 
 /* nbd_read
  * Reads @size bytes from @ioc. Returns 0 on success.
diff --git a/nbd/server.c b/nbd/server.c
index c169c11b3e..4682b99354 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -23,6 +23,12 @@
 #include "nbd-internal.h"
 
 #define NBD_META_ID_BASE_ALLOCATION 0
+#define NBD_META_ID_DIRTY_BITMAP 1
+
+/* NBD_MAX_BITMAP_EXTENTS: 1 mb of extents data. An empirical constant. If need
+ * to increase, note that NBD protocol defines 32 mb as a limit, after which the
+ * reply may be considered as a denial of service attack. */
+#define NBD_MAX_BITMAP_EXTENTS (0x100000 / 8)
 
 static int system_errno_to_nbd_errno(int err)
 {
@@ -80,6 +86,9 @@ struct NBDExport {
 
 BlockBackend *eject_notifier_blk;
 Notifier eject_notifier;
+
+BdrvDirtyBitmap *export_bitmap;
+char *export_bitmap_context;
 };
 
 static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);
@@ -92,6 +101,7 @@ typedef struct NBDExportMetaContexts {
 bool valid; /* means that negotiation of the option finished without
errors */
 bool base_allocation; /* export base:allocation context (block status) */
+bool dirty_bitmap; /* export qemu:dirty-bitmap: */
 } NBDExportMetaContexts;
 
 struct NBDClient {
@@ -785,6 +795,26 @@ static int nbd_meta_base_query(NBDClient *client, 
NBDExportMetaContexts *meta,
  &meta->base_allocation, errp);
 }
 
+/* nbd_meta_bitmap_query
+ *
+ * Handle query to 'qemu-dirty-bitmap' namespace.
+ * 'len' is the amount of text remaining to be read from the current name, 
after
+ * the 'qemu-dirty-bitmap:' portion has been stripped.
+ *
+ * Return -errno on I/O error, 0 if option was completely handled by
+ * sending a reply about inconsistent lengths, or 1 on success. */
+static int nbd_meta_bitmap_query(NBDClient *client, NBDExportMetaContexts 
*meta,
+ uint32_t len, Error **errp)
+{
+if (!client->exp->export_bitmap) {
+return nbd_opt_skip(client, len, errp);
+}
+
+return nbd_meta_single_query(
+client, client->exp->export_bitmap_context + strlen("qemu:"), len,
+&meta->dirty_bitmap, errp);
+}
+
 /* nbd_negotiate_meta_query
  *
  * Parse namespace name and call corresponding function to parse body of the
@@ -801,8 +831,10 @@ static int nbd_negotiate_meta_query(NBDClient *client,
 NBDExportMetaContexts *meta, Error **errp)
 {
 int ret;
-char query[sizeof("base:") - 1];
-size_t baselen = strlen("base:");
+size_t ns_len = 5;
+char ns[5]; /* Both 'qemu' and 'base' namespaces have length = 5 including a
+   colon. If it's needed to introduce a namespace of the other
+   length, this should be certainly refactored. */
 uint32_t len;
 
 ret = nbd_opt_read(client, &len, sizeof(len), errp);
@@ -811,22 +843,23 @@ static int nbd_negotiate_meta_query(NBDClient *client,
 }
 cpu_to_be32s(&len);
 
-/* The only supported namespace for now is 'base'. So query should start
- * with 'base:'. Otherwise, we can ignore it and skip the remainder. */
-if (len < baselen) {
+if (len < ns_len) {
 return nbd_opt_skip(client, len, errp);
 }
 
-len -= baselen;
-ret = nbd_opt_read(client, query, baselen, errp);
+len -= ns_len;
+ret = nbd_opt_read(client, ns, ns_len, errp);
 if (ret <= 0) {
 return ret;
 }
-if (strncmp(query, "base:", baselen) != 0) {
-return nbd_opt_skip(client, len, errp);
+
+if (!strncmp(ns, "base:", ns_len)) {
+return nbd_meta_base_query(client, meta, len, errp);
+} else if (!strncmp(ns, "qemu:", 

Re: [Qemu-block] [PATCH 07/19] block: Really pause block jobs on drain

2018-04-13 Thread Kevin Wolf
Am 12.04.2018 um 22:44 hat Paolo Bonzini geschrieben:
> On 12/04/2018 16:25, Kevin Wolf wrote:
> > This is already the order we have there. What is probably different from
> > what you envision is that after the parents have concluded, we still
> > check that they are still quiescent in every iteration.
> 
> Yes, and that's the quadratic part.
> 
> > What we could do easily is introducing a bool bs->quiesce_concluded or
> > something that bdrv_drain_poll() sets to true the first time that it
> > returns false. It also gets a shortcut so that it returns false
> > immediately if bs->quiesce_concluded is true. The field is reset in
> > bdrv_do_drained_end() when bs->quiesce_counter reaches 0.
> 
> Or bs->quiescent, for the sake of bikeshedding.

Yes, that sounds better.

The only problem with the proposal as I made it is that it's wrong. We
can't keep bs->quiescent until bdrv_do_drained_end() because the caller
can issue new requests and then have a nested drained section that needs
to wait for all requests again instead of deciding that everything is
already quiescent.

Maybe where we should really reset it is in the initial recursion of
bdrv_do_drained_begin(), specifically in bdrv_do_drained_begin_quiesce()
which is called by both the parent and the child recursion.

There don't seem to be completely obviously correct solutions (can't an
I/O thread be draining a specific node while the main loop runs
drain_all?), but this would probably be the most obvious one.

Hm... Or actually, reset bs->quiescent in bdrv_inc_in_flight()? Would
this be enough?

Kevin



Re: [Qemu-block] [PATCH 07/19] block: Really pause block jobs on drain

2018-04-13 Thread Paolo Bonzini
On 13/04/2018 10:01, Kevin Wolf wrote:
>> Or bs->quiescent, for the sake of bikeshedding.
> Yes, that sounds better.
> 
> The only problem with the proposal as I made it is that it's wrong. We
> can't keep bs->quiescent until bdrv_do_drained_end() because the caller
> can issue new requests and then have a nested drained section that needs
> to wait for all requests again instead of deciding that everything is
> already quiescent.
> 
> Maybe where we should really reset it is in the initial recursion of
> bdrv_do_drained_begin(), specifically in bdrv_do_drained_begin_quiesce()
> which is called by both the parent and the child recursion.
> 
> There don't seem to be completely obviously correct solutions (can't an
> I/O thread be draining a specific node while the main loop runs
> drain_all?), but this would probably be the most obvious one.

Or use a hash table?

Paolo



Re: [Qemu-block] [PATCH] scripts/dump-qcow2.pl: Script to dump qcow2 metadata

2018-04-13 Thread Alberto Garcia
On Fri 13 Apr 2018 04:45:15 PM CEST, Max Reitz wrote:

> OK, now that that's out of the way...  I'm wondering why you want to
> add this to the qemu tree?  If you'd written an iotest that would make
> use of it, sure.  But if it's just for debugging, then I'd personally
> think it would be better to just add it to a private repository.

Sure, that works for me, I just find it useful for debugging and I
decided to share it in case you think it fits in the scripts/ directory.

> That would give you more freedom to extend it, too.

Indeed.

Berto



Re: [Qemu-block] [PATCH 07/19] block: Really pause block jobs on drain

2018-04-13 Thread Kevin Wolf
Am 13.04.2018 um 13:05 hat Paolo Bonzini geschrieben:
> On 13/04/2018 10:01, Kevin Wolf wrote:
> >> Or bs->quiescent, for the sake of bikeshedding.
> > Yes, that sounds better.
> > 
> > The only problem with the proposal as I made it is that it's wrong. We
> > can't keep bs->quiescent until bdrv_do_drained_end() because the caller
> > can issue new requests and then have a nested drained section that needs
> > to wait for all requests again instead of deciding that everything is
> > already quiescent.
> > 
> > Maybe where we should really reset it is in the initial recursion of
> > bdrv_do_drained_begin(), specifically in bdrv_do_drained_begin_quiesce()
> > which is called by both the parent and the child recursion.
> > 
> > There don't seem to be completely obviously correct solutions (can't an
> > I/O thread be draining a specific node while the main loop runs
> > drain_all?), but this would probably be the most obvious one.
> 
> Or use a hash table?

I don't think it would be any more obvious, but it would bring in quite
a bit of additional complexity (structural, not computational), with
multiple places that create a hash table and then it would have to be
passed down to all functions involved in the recursion etc.

The fundamental question would stay the same as with bool quiescent:
When do you have to enter a node in the hash table, and when do you have
to remove it again?

The first question is easy: you mark it quiescent when bdrv_drain_poll()
returns false. The second is a bit harder, but resetting the quiescent
state in bdrv_do_drained_begin_quiesce() feels like the best place. It's
much simpler than recursively resetting it in all places that start new
activity (which would include BlockBackend users, not only nodes).

Kevin
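As a purely illustrative toy model (plain Python, not actual block layer
code), the idea of memoizing quiescence per node and resetting it when a new
drained section begins could look roughly like this; all names are made up:

    # Toy model: cache 'quiescent' once polling finds nothing in flight,
    # and reset the cache whenever a new drained section begins on the node.
    class Node:
        def __init__(self, name):
            self.name = name
            self.in_flight = 0
            self.quiescent = False      # memoized poll result

        def drain_poll(self):
            # Return True while we still have to wait for this node.
            if self.quiescent:
                return False            # shortcut: already known to be idle
            if self.in_flight == 0:
                self.quiescent = True   # remember the result
                return False
            return True

        def drained_begin_quiesce(self):
            # New drained section: previous knowledge may be stale.
            self.quiescent = False

    nodes = [Node('a'), Node('b')]
    nodes[1].in_flight = 1              # one outstanding request on 'b'
    for n in nodes:
        n.drained_begin_quiesce()
    nodes[1].in_flight = 0              # the request completes
    while any(n.drain_poll() for n in nodes):
        pass                            # each node flips to quiescent once
                                        # and later polls return immediately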



[Qemu-block] [PATCH] nbd/server: introduce NBD_CMD_CACHE

2018-04-13 Thread Vladimir Sementsov-Ogievskiy
Handle the NBD CACHE command: just do the read, without sending the read
data back. The actual caching should be done by the exported node's driver
chain.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/nbd.h |  3 ++-
 nbd/server.c| 10 ++
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index fcdcd54502..b4793d0a29 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -135,6 +135,7 @@ typedef struct NBDExtent {
 #define NBD_FLAG_SEND_TRIM (1 << 5) /* Send TRIM (discard) */
 #define NBD_FLAG_SEND_WRITE_ZEROES (1 << 6) /* Send WRITE_ZEROES */
 #define NBD_FLAG_SEND_DF   (1 << 7) /* Send DF (Do not Fragment) */
+#define NBD_FLAG_SEND_CACHE(1 << 8) /* Send CACHE (prefetch) */
 
 /* New-style handshake (global) flags, sent from server to client, and
control what will happen during handshake phase. */
@@ -195,7 +196,7 @@ enum {
 NBD_CMD_DISC = 2,
 NBD_CMD_FLUSH = 3,
 NBD_CMD_TRIM = 4,
-/* 5 reserved for failed experiment NBD_CMD_CACHE */
+NBD_CMD_CACHE = 5,
 NBD_CMD_WRITE_ZEROES = 6,
 NBD_CMD_BLOCK_STATUS = 7,
 };
diff --git a/nbd/server.c b/nbd/server.c
index 9e1f227178..30d7d3f444 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1134,7 +1134,7 @@ static coroutine_fn int nbd_negotiate(NBDClient *client, 
Error **errp)
 int ret;
 const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
   NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA |
-  NBD_FLAG_SEND_WRITE_ZEROES);
+  NBD_FLAG_SEND_WRITE_ZEROES | 
NBD_FLAG_SEND_CACHE);
 bool oldStyle;
 
 /* Old style negotiation header, no room for options
@@ -1826,7 +1826,9 @@ static int nbd_co_receive_request(NBDRequestData *req, 
NBDRequest *request,
 return -EIO;
 }
 
-if (request->type == NBD_CMD_READ || request->type == NBD_CMD_WRITE) {
+if (request->type == NBD_CMD_READ || request->type == NBD_CMD_WRITE ||
+request->type == NBD_CMD_CACHE)
+{
 if (request->len > NBD_MAX_BUFFER_SIZE) {
 error_setg(errp, "len (%" PRIu32" ) is larger than max len (%u)",
request->len, NBD_MAX_BUFFER_SIZE);
@@ -1911,7 +1913,7 @@ static coroutine_fn int nbd_do_cmd_read(NBDClient 
*client, NBDRequest *request,
 int ret;
 NBDExport *exp = client->exp;
 
-assert(request->type == NBD_CMD_READ);
+assert(request->type == NBD_CMD_READ || request->type == NBD_CMD_CACHE);
 
 /* XXX: NBD Protocol only documents use of FUA with WRITE */
 if (request->flags & NBD_CMD_FLAG_FUA) {
@@ -1930,7 +1932,7 @@ static coroutine_fn int nbd_do_cmd_read(NBDClient 
*client, NBDRequest *request,
 
 ret = blk_pread(exp->blk, request->from + exp->dev_offset, data,
 request->len);
-if (ret < 0) {
+if (ret < 0 || request->type == NBD_CMD_CACHE) {
 return nbd_send_generic_reply(client, request->handle, ret,
   "reading from file failed", errp);
 }
-- 
2.11.1




Re: [Qemu-block] [PATCH 0/2] Minor fixes about compressed clusters with OFLAG_COPIED

2018-04-13 Thread Max Reitz
On 2018-04-10 18:05, Alberto Garcia wrote:
> Hi,
> 
> while reviewing one previous patch about data corruption and
> compressed clusters we discussed that the documentation doesn't
> clarify that L2 entries for compressed clusters are not supposed to
> have the OFLAG_COPIED bit set.
> 
> Here's a patch to update the documentation and another one to fix the
> error message that "qemu-img check" produces.
> 
> Regards,
> 
> Berto
> 
> Alberto Garcia (2):
>   Fix error message about compressed clusters with OFLAG_COPIED
>   specs/qcow2: Clarify that compressed clusters have the COPIED bit
> reset
> 
>  block/qcow2-refcount.c | 4 ++--
>  docs/interop/qcow2.txt | 8 
>  2 files changed, 6 insertions(+), 6 deletions(-)

Thanks, applied to my block-next tree:

https://github.com/XanClic/qemu/commits/block-next

Max



Re: [Qemu-block] [PATCH] nbd/server: introduce NBD_CMD_CACHE

2018-04-13 Thread Vladimir Sementsov-Ogievskiy

13.04.2018 17:31, Vladimir Sementsov-Ogievskiy wrote:

Handle the NBD CACHE command: just do the read, without sending the read
data back. The actual caching should be done by the exported node's driver
chain.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/nbd.h |  3 ++-
  nbd/server.c| 10 ++
  2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index fcdcd54502..b4793d0a29 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -135,6 +135,7 @@ typedef struct NBDExtent {
  #define NBD_FLAG_SEND_TRIM (1 << 5) /* Send TRIM (discard) */
  #define NBD_FLAG_SEND_WRITE_ZEROES (1 << 6) /* Send WRITE_ZEROES */
  #define NBD_FLAG_SEND_DF   (1 << 7) /* Send DF (Do not Fragment) */
+#define NBD_FLAG_SEND_CACHE(1 << 8) /* Send CACHE (prefetch) */
  
  /* New-style handshake (global) flags, sent from server to client, and

 control what will happen during handshake phase. */
@@ -195,7 +196,7 @@ enum {
  NBD_CMD_DISC = 2,
  NBD_CMD_FLUSH = 3,
  NBD_CMD_TRIM = 4,
-/* 5 reserved for failed experiment NBD_CMD_CACHE */
+NBD_CMD_CACHE = 5,
  NBD_CMD_WRITE_ZEROES = 6,
  NBD_CMD_BLOCK_STATUS = 7,
  };
diff --git a/nbd/server.c b/nbd/server.c
index 9e1f227178..30d7d3f444 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1134,7 +1134,7 @@ static coroutine_fn int nbd_negotiate(NBDClient *client, 
Error **errp)
  int ret;
  const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA |
-  NBD_FLAG_SEND_WRITE_ZEROES);
+  NBD_FLAG_SEND_WRITE_ZEROES | 
NBD_FLAG_SEND_CACHE);
  bool oldStyle;
  
  /* Old style negotiation header, no room for options

@@ -1826,7 +1826,9 @@ static int nbd_co_receive_request(NBDRequestData *req, 
NBDRequest *request,
  return -EIO;
  }
  
-if (request->type == NBD_CMD_READ || request->type == NBD_CMD_WRITE) {

+if (request->type == NBD_CMD_READ || request->type == NBD_CMD_WRITE ||
+request->type == NBD_CMD_CACHE)
+{
  if (request->len > NBD_MAX_BUFFER_SIZE) {
  error_setg(errp, "len (%" PRIu32" ) is larger than max len (%u)",
 request->len, NBD_MAX_BUFFER_SIZE);
@@ -1911,7 +1913,7 @@ static coroutine_fn int nbd_do_cmd_read(NBDClient 
*client, NBDRequest *request,
  int ret;
  NBDExport *exp = client->exp;
  
-assert(request->type == NBD_CMD_READ);

+assert(request->type == NBD_CMD_READ || request->type == NBD_CMD_CACHE);
  
  /* XXX: NBD Protocol only documents use of FUA with WRITE */

  if (request->flags & NBD_CMD_FLAG_FUA) {
@@ -1930,7 +1932,7 @@ static coroutine_fn int nbd_do_cmd_read(NBDClient 
*client, NBDRequest *request,
  
  ret = blk_pread(exp->blk, request->from + exp->dev_offset, data,

  request->len);
-if (ret < 0) {
+if (ret < 0 || request->type == NBD_CMD_CACHE) {
  return nbd_send_generic_reply(client, request->handle, ret,
"reading from file failed", errp);
  }


ohh, I forgot the main thing:

diff --git a/nbd/server.c b/nbd/server.c
index 30d7d3f444..d6a161c8d5 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1965,6 +1965,7 @@ static coroutine_fn int 
nbd_handle_request(NBDClient *client,


 switch (request->type) {
 case NBD_CMD_READ:
+    case NBD_CMD_CACHE:
 return nbd_do_cmd_read(client, request, data, errp);

 case NBD_CMD_WRITE:



--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH] scripts/dump-qcow2.pl: Script to dump qcow2 metadata

2018-04-13 Thread Max Reitz
On 2018-03-28 15:38, Alberto Garcia wrote:
> This script takes a qcow2 image and dumps its metadata: header,
> snapshot table and some extensions (although not all qcow2 features
> are supported yet).
> 
> It can also display a list of all host clusters and the guest -> host
> address mappings, so it's useful to debug allocations.
> 
> The image is assumed not to be corrupted, and this script does not do
> consistency checks (yet).
> 
> Signed-off-by: Alberto Garcia 
> ---
>  scripts/dump-qcow2.pl | 425 
> ++
>  1 file changed, 425 insertions(+)
>  create mode 100755 scripts/dump-qcow2.pl

First of all: The main reason I haven't reviewed this so far is because
I don't like Perl.

OK, now that that's out of the way...  I'm wondering why you want to add
this to the qemu tree?  If you'd written an iotest that would make use
of it, sure.  But if it's just for debugging, then I'd personally think
it would be better to just add it to a private repository.  That would
give you more freedom to extend it, too.

(I know John has an own script for debugging qcow2 images.)

Of course the nice thing about putting it into the qemu repository would
be that people would be able to find it.  But then again, we could
probably achieve the same with a Wiki page.

Max



Re: [Qemu-block] [PATCH v2 1/2] qcow2: Give the refcount cache the minimum possible size by default

2018-04-13 Thread Max Reitz
On 2018-03-14 09:29, Alberto Garcia wrote:
> The L2 and refcount caches have default sizes that can be overridden
> using the l2-cache-size and refcount-cache-size (an additional
> parameter named cache-size sets the combined size of both caches).
> 
> Unless forced by one of the aforementioned parameters, QEMU will set
> the unspecified sizes so that the L2 cache is 4 times larger than the
> refcount cache.
> 
> This is based on the premise that the refcount metadata needs to be
> only a fourth of the L2 metadata to cover the same amount of disk
> space. This is incorrect for two reasons:
> 
>  a) The amount of disk covered by an L2 table depends solely on the
> cluster size, but in the case of a refcount block it depends on
> the cluster size *and* the width of each refcount entry.
> The 4/1 ratio is only valid with 16-bit entries (the default).
> 
>  b) When we talk about disk space and L2 tables we are talking about
> guest space (L2 tables map guest clusters to host clusters),
> whereas refcount blocks are used for host clusters (including
> L1/L2 tables and the refcount blocks themselves). On a fully
> populated (and uncompressed) qcow2 file, image size > virtual size
> so there are more refcount entries than L2 entries.
> 
> Problem (a) could be fixed by adjusting the algorithm to take into
> account the refcount entry width. Problem (b) could be fixed by
> increasing a bit the refcount cache size to account for the clusters
> used for qcow2 metadata.

Yeah, well, but I wouldn't give too much thought to (b).  In case of
internal snapshots, you won't write to them, so those clusters don't
matter.  So, yes, you get a bit of overhead for the metadata, but come
on, that's not really much.

And (a) would be trivial, as you yourself say.

> However this patch takes a completely different approach and instead
> of keeping a ratio between both cache sizes it assigns as much as
> possible to the L2 cache and the remainder to the refcount cache.

Seems reasonable in a sense, but I'm wary still.  I especially don't
like that it just changes the default when nothing has been specified.

I can see that we want to limit the refcount cache size, so if you want
to give really much RAM to the L2 tables, then you don't want a 25 %
overhead for the refcount cache, but do we really need to limit it to
the minimum size?

OTOH, four clusters doesn't seem extremely limited...  Did you do your
benchmark on an HDD?

Max

> The reason is that L2 tables are used for every single I/O request
> from the guest and the effect of increasing the cache is significant
> and clearly measurable. Refcount blocks are however only used for
> cluster allocation and internal snapshots and in practice are accessed
> sequentially in most cases, so the effect of increasing the cache is
> negligible (even when doing random writes from the guest).
> 
> So, make the refcount cache as small as possible unless the user
> explicitly asks for a larger one.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2.c  | 31 +++
>  block/qcow2.h  |  4 
>  tests/qemu-iotests/137.out |  2 +-
>  3 files changed, 20 insertions(+), 17 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 7472af6931..8342b0186f 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -802,23 +802,30 @@ static void read_cache_sizes(BlockDriverState *bs, 
> QemuOpts *opts,
>  } else if (refcount_cache_size_set) {
>  *l2_cache_size = combined_cache_size - *refcount_cache_size;
>  } else {
> -*refcount_cache_size = combined_cache_size
> - / (DEFAULT_L2_REFCOUNT_SIZE_RATIO + 1);
> -*l2_cache_size = combined_cache_size - *refcount_cache_size;
> +uint64_t virtual_disk_size = bs->total_sectors * 
> BDRV_SECTOR_SIZE;
> +uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 
> 8);
> +uint64_t min_refcount_cache =
> +(uint64_t) MIN_REFCOUNT_CACHE_SIZE * s->cluster_size;
> +
> +/* Assign as much memory as possible to the L2 cache, and
> + * use the remainder for the refcount cache */
> +if (combined_cache_size >= max_l2_cache + min_refcount_cache) {
> +*l2_cache_size = max_l2_cache;
> +*refcount_cache_size = combined_cache_size - *l2_cache_size;
> +} else {
> +*refcount_cache_size =
> +MIN(combined_cache_size, min_refcount_cache);
> +*l2_cache_size = combined_cache_size - *refcount_cache_size;
> +}
>  }
>  } else {
> -if (!l2_cache_size_set && !refcount_cache_size_set) {
> +if (!l2_cache_size_set) {
>  *l2_cache_size = MAX(DEFAULT_L2_CACHE_BYTE_SIZE,
>   

Re: [Qemu-block] [Qemu-devel] [PATCH] iotests: fix 169

2018-04-13 Thread Max Reitz
On 2018-04-12 11:09, Vladimir Sementsov-Ogievskiy wrote:
> 12.04.2018 11:34, Vladimir Sementsov-Ogievskiy wrote:
>> 11.04.2018 19:11, Max Reitz wrote:
>>> On 2018-04-11 15:05, Vladimir Sementsov-Ogievskiy wrote:
>>>
>>> [...]
>>>
 Hmm, the first type? I'm now not sure whether I really saw a sha256
 mismatch, or something like this (it should be an error, but the bitmap was found):

 --- /work/src/qemu/up-169/tests/qemu-iotests/169.out 2018-04-11
 15:35:10.055027392 +0300
 +++ /work/src/qemu/up-169/tests/qemu-iotests/169.out.bad 2018-04-11
 15:58:09.300450045 +0300
 @@ -1,5 +1,20 @@
 -
 +F...
 +==
 +FAIL: test__not_persistent__migbitmap__offline
 (__main__.TestDirtyBitmapMigration)
 +methodcaller(name, ...) --> methodcaller object
 +--
 +Traceback (most recent call last):
 +  File "169", line 136, in do_test_migration
 +    self.check_bitmap(self.vm_b, sha256 if persistent else False)
 +  File "169", line 77, in check_bitmap
 +    "Dirty bitmap 'bitmap0' not found");
 +  File "/work/src/qemu/up-169/tests/qemu-iotests/iotests.py", line
 389,
 in assert_qmp
 +    result = self.dictpath(d, path)
 +  File "/work/src/qemu/up-169/tests/qemu-iotests/iotests.py", line
 348,
 in dictpath
 +    self.fail('failed path traversal for "%s" in "%s"' % (path,
 str(d)))
 +AssertionError: failed path traversal for "error/desc" in "{u'return':
 {u'sha256':
 u'01d2ebedcb8f549a2547dbf8e231c410e3e747a9479e98909fc936e0035cf8b1'}}"


 Max, did you really see a sha256 mismatch or only something like this?
>>> I'm pretty sure I did see mismatches.
>>
>> hm, maybe it is possible too: because we do not wait for RESUME in this
>> case, we can request the sha256 in an intermediate state of bitmap loading
> 
> no. If you see a mismatch on the first check after migration, then it is after
> the RESUME event, which means a broken bitmap was migrated; that's a bug..
> 
> If you see a mismatch on the second check - after the vm_b stop/start - this
> means it is the persistent case, so the incoming migration should fail and the
> bitmap is loaded from the file; but how can it mismatch? A persistence bug?

Well, I'll tell you if I ever see it again.  If not... Then I guess
we're fine.

Max



Re: [Qemu-block] [RFC] Intermediate block mirroring

2018-04-13 Thread Max Reitz
On 2018-04-12 19:07, Alberto Garcia wrote:
> Hello,
> 
> I mentioned this some time ago, but I'd like to retake it now: I'm
> checking how to copy arbitrary nodes on a backing chain, so if I have
> e.g.
> 
>[A] <- [B] <- [C] <- [D]
> 
> I'd like to end up with
> 
>[A] <- [E] <- [C] <- [D]
> 
> where [E] is a copy of [B]. The most obvious use case is to move [B]
> to a different storage backend.
> 
> At the moment we can already copy [B] into [E] (outside QEMU) and then
> do
> 
>change-backing-file device=[D] image-node-name=[C] backing-file=[E]
> 
> However this only changes the image on disk and the bs->backing_file
> string in memory. QEMU keeps using the [B] BlockDriverState, and we
> need to make it switch to [E].
> 
> One possible way to do this would be to modify blockdev-mirror so the
> source image can be located anywhere on a chain. Currently there's
> bs->backing_blocker preventing that, plus qmp_blockdev_mirror()
> refuses to take any non-root source node. Other than permission
> changes I don't think the algorithm itself needs any additional
> modification, although I suppose we'd like to change the backing file
> as part of the same job, and that would require API changes.

Another way would be to write blockdev-copy. :-)

I don't think there should be a limitation on which nodes can be used as
the source of blockdev-mirror, and I think the hardest part about
lifting that limitation is about thinking what might break...  Maybe we
should just drop the limitation and deal with the fallout later?  I
can't imagine a single node configuration where we'd want to disallow
copying off a node, so I suppose if that breaks in some edge case that
wouldn't be something we'd have to disallow again but we'd just have to
fix it.

> One other way is to have a more generic replace-node command which
> would call bdrv_replace_node(), but I don't know if we want to expose
> that and I don't have any other use case for it at the moment.

I think we do want to expose that.  As far as I'm informed, for graph
manipulation we want a command that can replace nodes (including
replacing nothing by a new node or replacing an existing node by
nothing, if the parent supports that).

Also, well, blockdev-mirror already has a @replaces parameter,
right?  That was supposed to work with quorum, but it seems obvious to
use it here, too.

Max