Re: [Qemu-block] [PATCH for-2.12] nbd/client: Correctly handle bad server REP_META_CONTEXT

2018-03-30 Thread Vladimir Sementsov-Ogievskiy

30.03.2018 02:18, Eric Blake wrote:

It's never a good idea to blindly read for size bytes as
returned by the server without first validating that the size
is within bounds; a malicious or buggy server could cause us
to hang or get out of sync from reading further messages.

It may be smarter to try and teach the client to cope with
unexpected context ids by silently ignoring them instead of
hanging up on the server, but for now, if the server doesn't
reply with exactly the one context we expect, it's easier to
just give up - however, if we give up for any reason other
than an I/O failure, we might as well try to politely tell
the server we are quitting rather than continuing.

Fix some typos in the process.

Signed-off-by: Eric Blake 
---
  nbd/client.c | 19 +++++++++++++++++--
  1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index 9b9b7f0ea29..4ee1d9a4a2c 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -599,8 +599,8 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
   * Set one meta context. Simple means that reply must contain zero (not
   * negotiated) or one (negotiated) contexts. More contexts would be considered
   * as a protocol error. It's also implied that meta-data query equals queried
- * context name, so, if server replies with something different then @context,
- * it considered as error too.
+ * context name, so, if server replies with something different than @context,
+ * it is considered an error too.
   * return 1 for successful negotiation, context_id is set
   *0 if operation is unsupported,
   *-1 with errp set for any other error
@@ -651,6 +651,14 @@ static int nbd_negotiate_simple_meta_context(QIOChannel *ioc,
  char *name;
  size_t len;

+if (reply.length != sizeof(received_id) + context_len) {
+error_setg(errp, "Failed to negotiate meta context '%s', server "
+   "answered with unexpected length %u", context,


It's uint32_t; is it worth PRIu32? Or is %u absolutely portable in this case?
(A standalone illustration follows below, after the quoted hunk.)


+   reply.length);
+nbd_send_opt_abort(ioc);
+return -1;
+}
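For reference, a tiny standalone illustration of the two spellings (not part
of the patch, just a sketch):

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint32_t length = 4294967295u;   /* reply.length is a uint32_t */

    /* Strictly portable: PRIu32 always matches uint32_t. */
    printf("length %" PRIu32 "\n", length);

    /* %u is correct wherever uint32_t is unsigned int (true on all hosts
     * QEMU supports), but PRIu32 avoids relying on that. */
    printf("length %u\n", length);
    return 0;
}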


Hmm, after this check the len variable is not actually needed; we can use
context_len instead.



+
  if (nbd_read(ioc, &received_id, sizeof(received_id), errp) < 0) {
  return -1;
  }
@@ -668,6 +676,7 @@ static int nbd_negotiate_simple_meta_context(QIOChannel *ioc,
 "answered with different context '%s'", context,
 name);
  g_free(name);
+nbd_send_opt_abort(ioc);
  return -1;
  }
  g_free(name);
@@ -690,6 +699,12 @@ static int nbd_negotiate_simple_meta_context(QIOChannel *ioc,
  if (reply.type != NBD_REP_ACK) {
  error_setg(errp, "Unexpected reply type %" PRIx32 " expected %x",
 reply.type, NBD_REP_ACK);
+nbd_send_opt_abort(ioc);
+return -1;
+}
+if (reply.length) {


This check is very common for REP_ACK; it may be better to move it into
nbd_handle_reply_err... (and rename that function? And combine it somehow
with _option_request() and _option_reply()?)



+error_setg(errp, "Unexpected length to ACK response");
+nbd_send_opt_abort(ioc);


Hmm, it looks like we want nbd_send_opt_abort() before most of the return -1
statements. This lacks some generalization; maybe we want to send it at some
common point...



  return -1;
  }
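To make the two refactoring ideas above concrete, here is a self-contained toy
sketch (illustrative names only, not the actual QEMU API): a single helper
validates an ACK-style reply, including the zero-length check, and performs
the opt-abort at one common point:

#include <inttypes.h>
#include <stdio.h>

typedef struct Reply {
    uint32_t type;
    uint32_t length;
} Reply;

enum { REP_ACK = 1 };

static void send_opt_abort(void)
{
    printf("-> politely send NBD_OPT_ABORT to the server\n");
}

/* One place for the "right type, zero payload" check and the abort path,
 * so callers stop repeating both. */
static int check_ack(const Reply *reply)
{
    if (reply->type != REP_ACK || reply->length != 0) {
        fprintf(stderr, "bad ACK: type %" PRIu32 " length %" PRIu32 "\n",
                reply->type, reply->length);
        send_opt_abort();
        return -1;
    }
    return 0;
}

int main(void)
{
    Reply good = { REP_ACK, 0 };
    Reply bad = { REP_ACK, 4 };   /* nonzero length must be rejected */

    printf("good: %d\n", check_ack(&good));
    printf("bad:  %d\n", check_ack(&bad));
    return 0;
}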



Mostly just ideas for future refactoring, so:
Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH for-2.12] nbd: trace meta context negotiation

2018-03-30 Thread Vladimir Sementsov-Ogievskiy

30.03.2018 16:09, Eric Blake wrote:

Having a more detailed log of the interaction between client and
server is invaluable in debugging how meta context negotiation
actually works.

Signed-off-by: Eric Blake 
---
  nbd/client.c | 2 ++
  nbd/server.c | 8 ++++++++
  nbd/trace-events | 6 ++++++
  3 files changed, 16 insertions(+)

diff --git a/nbd/client.c b/nbd/client.c
index 4ee1d9a4a2c..478b6085df7 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -623,6 +623,7 @@ static int nbd_negotiate_simple_meta_context(QIOChannel *ioc,
  char *data = g_malloc(data_len);
  char *p = data;

+trace_nbd_opt_meta_request(context, export);
  stl_be_p(p, export_len);
  memcpy(p += sizeof(export_len), export, export_len);
  stl_be_p(p += export_len, 1);
@@ -681,6 +682,7 @@ static int nbd_negotiate_simple_meta_context(QIOChannel *ioc,
  }
  g_free(name);

+trace_nbd_opt_meta_reply(context, received_id);
  received = true;

  /* receive NBD_REP_ACK */
diff --git a/nbd/server.c b/nbd/server.c
index cea158913ba..9e1f2271784 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -726,6 +726,7 @@ static int nbd_negotiate_send_meta_context(NBDClient *client,
  context_id = 0;
  }

+trace_nbd_negotiate_meta_query_reply(context, context_id);
  set_be_option_rep(&opt.h, client->opt, NBD_REP_META_CONTEXT,
sizeof(opt) - sizeof(opt.h) + iov[1].iov_len);
  stl_be_p(&opt.context_id, context_id);
@@ -752,10 +753,12 @@ static int nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
  if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
  meta->base_allocation = true;
  }
+trace_nbd_negotiate_meta_query_parse("base:");
  return 1;
  }

  if (len != alen) {
+trace_nbd_negotiate_meta_query_skip("not base:allocation");
  return nbd_opt_skip(client, len, errp);
  }

@@ -768,6 +771,7 @@ static int nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
  meta->base_allocation = true;
  }

+trace_nbd_negotiate_meta_query_parse("base:allocation");
  return 1;
  }

@@ -800,6 +804,7 @@ static int nbd_negotiate_meta_query(NBDClient *client,
  /* The only supported namespace for now is 'base'. So query should start
   * with 'base:'. Otherwise, we can ignore it and skip the remainder. */
  if (len < baselen) {
+trace_nbd_negotiate_meta_query_skip("length too short");
  return nbd_opt_skip(client, len, errp);
  }

@@ -809,6 +814,7 @@ static int nbd_negotiate_meta_query(NBDClient *client,
  return ret;
  }
  if (strncmp(query, "base:", baselen) != 0) {
+trace_nbd_negotiate_meta_query_skip("not for base: namespace");


Hmm, a reasonable example of using the same trace point in several places in
the code. Is this a precedent?



  return nbd_opt_skip(client, len, errp);
  }

@@ -858,6 +864,8 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
  return ret;
  }
  cpu_to_be32s(&nb_queries);
+trace_nbd_negotiate_meta_context(nbd_opt_lookup(client->opt),
+ meta->export_name, nb_queries);

  if (client->opt == NBD_OPT_LIST_META_CONTEXT && !nb_queries) {
  /* enable all known contexts */
diff --git a/nbd/trace-events b/nbd/trace-events
index 0d03edc967d..cfdbe04b447 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -10,6 +10,8 @@ nbd_receive_query_exports_start(const char *wantname) "Querying export list for
  nbd_receive_query_exports_success(const char *wantname) "Found desired export name '%s'"
  nbd_receive_starttls_new_client(void) "Setting up TLS"
  nbd_receive_starttls_tls_handshake(void) "Starting TLS handshake"
+nbd_opt_meta_request(const char *context, const char *export) "Requesting to set meta context %s for export %s"
+nbd_opt_meta_reply(const char *context, int id) "Received mapping of context %s to id %d"


The actual type is uint32_t; I'd prefer to reflect that here.
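For example, something like this (a sketch, not necessarily the committed line):

nbd_opt_meta_reply(const char *context, uint32_t id) "Received mapping of context %s to id %" PRIu32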


  nbd_receive_negotiate(void *tlscreds, const char *hostname) "Receiving negotiation tlscreds=%p hostname=%s"
  nbd_receive_negotiate_magic(uint64_t magic) "Magic is 0x%" PRIx64
  nbd_receive_negotiate_server_flags(uint32_t globalflags) "Global flags are 0x%" PRIx32
@@ -44,6 +46,10 @@ nbd_negotiate_handle_info_request(int request, const char *name) "Client request
  nbd_negotiate_handle_info_block_size(uint32_t minimum, uint32_t preferred, uint32_t maximum) "advertising minimum 0x%" PRIx32 ", preferred 0x%" PRIx32 ", maximum 0x%" PRIx32
  nbd_negotiate_handle_starttls(void) "Setting up TLS"
  nbd_negotiate_handle_starttls_handshake(void) "Starting TLS handshake"
+nbd_negotiate_meta_context(const char *optname, const char *export, int queries) "Client requested %s for export %s, with %d queries"


And the same uint32_t remark applies here.


+nbd_negotiate_meta_query_skip(const char *reason) "Skipping meta query: %s"

[Qemu-block] [PATCH] iotests: fix 169

2018-03-30 Thread Vladimir Sementsov-Ogievskiy
Use MIGRATION events instead of RESUME. Also, resolve a TODO: enable the
dirty-bitmaps capability for the offline case.

This (likely) fixes racy failures of at least the following types:

- timeout on waiting for the RESUME event
- sha256 mismatch on 136 (138 after this patch)

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

This patch is a real change for the test anyway. But I don't understand
why (and whether it really) fixes things. And I'm not sure whether we
really have a bug in bitmap migration or persistence. So it's up to you
whether to take it into 2.12...

It was already discussed that the "STOP" event is bad for tests. What
about "RESUME"? How can we miss it? And the sha256 mismatch is really
strange.

Max, please check whether it fixes 169 for you.

 tests/qemu-iotests/169 | 44 +++++++++++++++++++++++---------------------
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/tests/qemu-iotests/169 b/tests/qemu-iotests/169
index 153b10b6e7..5e525ab9d5 100755
--- a/tests/qemu-iotests/169
+++ b/tests/qemu-iotests/169
@@ -31,6 +31,8 @@ disk_a = os.path.join(iotests.test_dir, 'disk_a')
 disk_b = os.path.join(iotests.test_dir, 'disk_b')
 size = '1M'
 mig_file = os.path.join(iotests.test_dir, 'mig_file')
+mig_cmd = 'exec: cat > ' + mig_file
+incoming_cmd = 'exec: cat ' + mig_file
 
 
 class TestDirtyBitmapMigration(iotests.QMPTestCase):
@@ -49,7 +51,6 @@ class TestDirtyBitmapMigration(iotests.QMPTestCase):
 self.vm_a.launch()
 
 self.vm_b = iotests.VM(path_suffix='b')
-self.vm_b.add_incoming("exec: cat '" + mig_file + "'")
 
 def add_bitmap(self, vm, granularity, persistent):
 params = {'node': 'drive0',
@@ -86,36 +87,30 @@ class TestDirtyBitmapMigration(iotests.QMPTestCase):
(0xa0201, 0x1000))
 
 should_migrate = migrate_bitmaps or persistent and shared_storage
+mig_caps = [{'capability': 'events', 'state': True}]
+if migrate_bitmaps:
+mig_caps.append({'capability': 'dirty-bitmaps', 'state': True})
 
+result = self.vm_a.qmp('migrate-set-capabilities',
+   capabilities=mig_caps)
+self.assert_qmp(result, 'return', {})
+
+self.vm_b.add_incoming(incoming_cmd if online else "defer")
 self.vm_b.add_drive(disk_a if shared_storage else disk_b)
 
 if online:
 os.mkfifo(mig_file)
 self.vm_b.launch()
+result = self.vm_b.qmp('migrate-set-capabilities',
+   capabilities=mig_caps)
+self.assert_qmp(result, 'return', {})
 
 self.add_bitmap(self.vm_a, granularity, persistent)
 for r in regions:
 self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % r)
 sha256 = self.get_bitmap_hash(self.vm_a)
 
-if migrate_bitmaps:
-capabilities = [{'capability': 'dirty-bitmaps', 'state': True}]
-
-result = self.vm_a.qmp('migrate-set-capabilities',
-   capabilities=capabilities)
-self.assert_qmp(result, 'return', {})
-
-if online:
-result = self.vm_b.qmp('migrate-set-capabilities',
-   capabilities=capabilities)
-self.assert_qmp(result, 'return', {})
-
-result = self.vm_a.qmp('migrate-set-capabilities',
-   capabilities=[{'capability': 'events',
-  'state': True}])
-self.assert_qmp(result, 'return', {})
-
-result = self.vm_a.qmp('migrate', uri='exec:cat>' + mig_file)
+result = self.vm_a.qmp('migrate', uri=mig_cmd)
 while True:
 event = self.vm_a.event_wait('MIGRATION')
 if event['data']['status'] == 'completed':
@@ -124,9 +119,16 @@ class TestDirtyBitmapMigration(iotests.QMPTestCase):
 if not online:
 self.vm_a.shutdown()
 self.vm_b.launch()
-# TODO enable bitmap capability for vm_b in this case
+result = self.vm_b.qmp('migrate-set-capabilities',
+   capabilities=mig_caps)
+self.assert_qmp(result, 'return', {})
+result = self.vm_b.qmp('migrate-incoming', uri=incoming_cmd)
+self.assert_qmp(result, 'return', {})
 
-self.vm_b.event_wait("RESUME", timeout=10.0)
+while True:
+event = self.vm_b.event_wait('MIGRATION')
+if event['data']['status'] == 'completed':
+break
 
 self.check_bitmap(self.vm_b, sha256 if should_migrate else False)
 
-- 
2.11.1




Re: [Qemu-block] [PATCH for 2.12 0/3] iotests: blacklist bochs and cloop for 205 and 208

2018-03-30 Thread Vladimir Sementsov-Ogievskiy

for 2.12

30.03.2018 18:16, Vladimir Sementsov-Ogievskiy wrote:

Blacklist these formats, as they don't support image creation, as they
say:
 > ./qemu-img create -f bochs x 1m
 qemu-img: x: Format driver 'bochs' does not support image creation

 > ./qemu-img create -f cloop x 1m
 qemu-img: x: Format driver 'cloop' does not support image creation

Vladimir Sementsov-Ogievskiy (3):
   iotests.py: improve verify_image_format helper
   iotests.py: support unsupported_fmts in main()
   iotests: blacklist bochs and cloop for 205 and 208

  tests/qemu-iotests/205|  2 +-
  tests/qemu-iotests/208|  2 ++
  tests/qemu-iotests/iotests.py | 11 ++-
  3 files changed, 9 insertions(+), 6 deletions(-)




--
Best regards,
Vladimir




[Qemu-block] [PATCH 1/3] iotests.py: improve verify_image_format helper

2018-03-30 Thread Vladimir Sementsov-Ogievskiy
Add an assert (we don't want both arguments set) and remove the
duplication.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/iotests.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index b5d7945..83c454d 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -532,9 +532,9 @@ def notrun(reason):
 sys.exit(0)
 
 def verify_image_format(supported_fmts=[], unsupported_fmts=[]):
-if supported_fmts and (imgfmt not in supported_fmts):
-notrun('not suitable for this image format: %s' % imgfmt)
-if unsupported_fmts and (imgfmt in unsupported_fmts):
+assert not (supported_fmts and unsupported_fmts)
+not_sup = supported_fmts and (imgfmt not in supported_fmts)
+if not_sup or (imgfmt in unsupported_fmts):
 notrun('not suitable for this image format: %s' % imgfmt)
 
 def verify_platform(supported_oses=['linux']):
-- 
2.7.4




[Qemu-block] [PATCH 2/3] iotests.py: support unsupported_fmts in main()

2018-03-30 Thread Vladimir Sementsov-Ogievskiy
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/iotests.py | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 83c454d..89fe446 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -553,7 +553,8 @@ def verify_quorum():
 if not supports_quorum():
 notrun('quorum support missing')
 
-def main(supported_fmts=[], supported_oses=['linux'], supported_cache_modes=[]):
+def main(supported_fmts=[], supported_oses=['linux'], supported_cache_modes=[],
+ unsupported_fmts=[]):
 '''Run tests'''
 
 global debug
@@ -568,7 +569,7 @@ def main(supported_fmts=[], supported_oses=['linux'], supported_cache_modes=[]):
 
 debug = '-d' in sys.argv
 verbosity = 1
-verify_image_format(supported_fmts)
+verify_image_format(supported_fmts, unsupported_fmts)
 verify_platform(supported_oses)
 verify_cache_mode(supported_cache_modes)
 
-- 
2.7.4




[Qemu-block] [PATCH 0/3] iotests: blacklist bochs and cloop for 205 and 208

2018-03-30 Thread Vladimir Sementsov-Ogievskiy
Blacklist these formats, as they don't support image creation, as they
say:
> ./qemu-img create -f bochs x 1m
qemu-img: x: Format driver 'bochs' does not support image creation

> ./qemu-img create -f cloop x 1m
qemu-img: x: Format driver 'cloop' does not support image creation

Vladimir Sementsov-Ogievskiy (3):
  iotests.py: improve verify_image_format helper
  iotests.py: support unsupported_fmts in main()
  iotests: blacklist bochs and cloop for 205 and 208

 tests/qemu-iotests/205|  2 +-
 tests/qemu-iotests/208|  2 ++
 tests/qemu-iotests/iotests.py | 11 ++-
 3 files changed, 9 insertions(+), 6 deletions(-)

-- 
2.7.4




Re: [Qemu-block] QEMU v2.12.0-rc1 iotest failures

2018-03-30 Thread Vladimir Sementsov-Ogievskiy

30.03.2018 15:22, Eric Blake wrote:

On 03/29/2018 07:18 PM, John Snow wrote:

Collecting the remaining failures that I know about for sake of easy
tracking on the list. I think Max has been having troubles with iotest
169 as well, but I haven't reproduced the failure myself. I do
occasionally see a locking failure on 153, but it appears to be
relatively rare.




051, 186: Related to culling SCSI options. Do we remove the tests?


A patch for this one is pending:
https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg06086.html


185: Being handled by QingFeng Hao
205: Can't create images for bochs/cloop.
208: Can't create images for bochs/cloop, and seems to be mismanaging LUKS?







# luks

Failures: 208

commit 44a8174e0a2919693dc28f19c3ce8eef6d4ac2c3
Author: Stefan Hajnoczi 
Date:   Mon Mar 12 16:04:39 2018 -0500

 iotests: add 208 nbd-server + blockdev-snapshot-sync test case


qemu-system-x86_64: -drive if=none,id=drive0,file=/home/bos/jhuston/src/qemu/bin/git/tests/qemu-iotests/scratch/9125-disk.img,format=luks,cache=writeback,node-name=drive0-node,key-secret=keysec0: Volume is not in LUKS format


Not sure if this test can easily be updated to support luks, or if we
should just blacklist it.




let me try :)
http://lists.gnu.org/archive/html/qemu-devel/2018-03/msg07439.html

--
Best regards,
Vladimir




[Qemu-block] [PATCH for 2.12] iotests: fix 208 for luks format

2018-03-30 Thread Vladimir Sementsov-Ogievskiy
Support luks image creation like in 205.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/208 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/208 b/tests/qemu-iotests/208
index 4e82b96..18f59ad 100755
--- a/tests/qemu-iotests/208
+++ b/tests/qemu-iotests/208
@@ -28,7 +28,7 @@ with iotests.FilePath('disk.img') as disk_img_path, \
  iotests.VM() as vm:
 
 img_size = '10M'
-iotests.qemu_img_pipe('create', '-f', iotests.imgfmt, disk_img_path, img_size)
+iotests.qemu_img_create('-f', iotests.imgfmt, disk_img_path, img_size)
 
 iotests.log('Launching VM...')
 (vm.add_drive(disk_img_path, 'node-name=drive0-node', interface='none')
-- 
2.7.4




Re: [Qemu-block] [PATCH v4 for 2.12 0/3] fix bitmaps migration through shared storage

2018-03-30 Thread Vladimir Sementsov-Ogievskiy

29.03.2018 18:09, Vladimir Sementsov-Ogievskiy wrote:

29.03.2018 17:03, Max Reitz wrote:

On 2018-03-29 10:08, Vladimir Sementsov-Ogievskiy wrote:

28.03.2018 17:53, Max Reitz wrote:

On 2018-03-27 12:11, Vladimir Sementsov-Ogievskiy wrote:

[...]


isn't it because of a lot of cat processes? I will check; updating the loop to
i=0; while check -qcow2 169; do ((i++)); echo $i OK; killall -9 cat; done

Hmm...  I know I tried to kill all of the cats, but for some reason that
didn't really help yesterday.  Seems to help now, for 2.12.0-rc0 at
least (that is, before this series).

reproduced with killing... (without this series, just on master)


After the whole series, I still get a lot of failures in 169
(mismatching bitmap hash, mostly).

And interestingly, if I add an abort():

diff --git a/block/qcow2.c b/block/qcow2.c
index 486f3e83b7..9204c1c0ac 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1481,6 +1481,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
     }

     if (bdrv_dirty_bitmap_next(bs, NULL)) {
+        abort();
         /* It's some kind of reopen with already existing dirty bitmaps. There
          * are no known cases where we need loading bitmaps in such situation,
          * so it's safer don't load them.

Then this fires for a couple of test cases of 169 even without the third
patch of this series.

I guess bdrv_dirty_bitmap_next() reacts to some bitmaps that migration
adds or something?  Then this would be the wrong condition, because I
guess we still want to load the bitmaps that are in the qcow2 file.

I'm not sure whether bdrv_has_readonly_bitmaps() is the correct
condition then, either, though.  Maybe let's take a step back: We want
to load all the bitmaps from the file exactly once, and that is when it
is opened the first time.  Or that's what I would have thought...  Is
that even correct?

Why do we load the bitmaps when the device is inactive anyway?
Shouldn't we load them only once the device is activated?

Hmm, not sure. Maybe we don't need to. But we need to load them anyway
when opening read-only, and we should reopen correspondingly in this case.

Yeah, well, yeah, but the current state seems just wrong. Apparently
there are cases where a qcow2 node may have bitmaps before we try to
load them from the file, so the current condition doesn't work.

Furthermore, it seems like the current "state machine" is too complex, so
we no longer know which cases are possible and what to do when.

So the first thing we could do is add a field to the BDRVQCow2State that
tells us whether the bitmaps have been loaded already or not. If not,
we invoke qcow2_load_dirty_bitmaps() and set the value to true. If the
value was true already and the BDS is active and R/W now, we call
qcow2_reopen_bitmaps_rw_hint().  That should solve one problem.
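A self-contained toy model of that proposed state machine (illustrative names,
not actual QEMU code; the real patch would put the flag in BDRVQcow2State):

#include <stdbool.h>
#include <stdio.h>

typedef struct ToyQcow2State {
    bool dirty_bitmaps_loaded;   /* the proposed new field */
} ToyQcow2State;

static void load_dirty_bitmaps(void)     { puts("load bitmaps from the file"); }
static void reopen_bitmaps_rw_hint(void) { puts("flip loaded bitmaps to RW"); }

/* Called from every (re)open / invalidate_cache in the proposal. */
static void do_open(ToyQcow2State *s, bool active, bool read_only)
{
    if (!s->dirty_bitmaps_loaded) {
        load_dirty_bitmaps();
        s->dirty_bitmaps_loaded = true;
    } else if (active && !read_only) {
        /* a reopen/invalidate with bitmaps already present */
        reopen_bitmaps_rw_hint();
    }
}

int main(void)
{
    ToyQcow2State s = { false };

    do_open(&s, false, false);   /* first open (still inactive): load once */
    do_open(&s, true, false);    /* invalidate_cache on activation: RW hint */
    return 0;
}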


good idea, will do.



The other problem of course is the question whether we should call
qcow2_load_dirty_bitmaps() at all while the drive is still inactive.
You know the migration model better than me, so I'm asking this question
to you.  We can phrase it differently: Do we need to load the bitmaps
before the drive is activated?


Now I think that we don't. At least, we don't have such cases in
Virtuozzo (I hope :).


Why did I have doubts?

1. We have cases when we want to start a VM as inactive, to be able to
export its drive as an NBD export, push some data to it, and then start
the VM (which involves activating).
2. We have cases when we want to start a VM stopped and operate on
dirty bitmaps.

If we just open all images in inactive mode until VM start, it looks like
we need bitmaps in inactive mode (for 2.). But that looks like the wrong
approach anyway.
Firstly, I tried to solve (1.) by simply calling inactivate_all() when
starting a VM in paused mode, but that breaks at least (2.), so finally I
solved (1.) with an approach similar to "-incoming defer". So, we have
inactive mode in two cases:

 - incoming migration
 - pushing data to a VM before start

and in these cases we don't need to load dirty bitmaps.

Also, an inconsistency: we currently remove persistent bitmaps on
inactivate, so it is inconsistent to load them in inactive mode.


Ok, I'll try to respin.


Finally, what cases do we actually have for qcow2_do_open?

1. INACTIVE -> ACTIVE (through invalidate_cache; we obviously should
load bitmaps, if we decided that we have no persistent bitmaps in
INACTIVE mode)
2. creating a new bdrv state (first open of the image) in INACTIVE mode
(will not load bitmaps)
3. creating a new bdrv state (first open of the image) in ACTIVE mode
(will load bitmaps, maybe read-only if the disk is RO)

If it is only these three cases, it would be enough to just load bitmaps
if !INACTIVE and do nothing otherwise.

Or do we have some of the following cases too?

1?. ACTIVE -> ACTIVE (through invalidate_cache; some kind of no-op, we
should not reload bitmaps)
2?. RO -> RW (we should reopen_bitmaps_rw) (or is that possible only
through bdrv_reopen, which will

[Qemu-block] [PATCH for-2.12] nbd: trace meta context negotiation

2018-03-30 Thread Eric Blake
Having a more detailed log of the interaction between client and
server is invaluable in debugging how meta context negotiation
actually works.

Signed-off-by: Eric Blake 
---
 nbd/client.c | 2 ++
 nbd/server.c | 8 ++++++++
 nbd/trace-events | 6 ++++++
 3 files changed, 16 insertions(+)

diff --git a/nbd/client.c b/nbd/client.c
index 4ee1d9a4a2c..478b6085df7 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -623,6 +623,7 @@ static int nbd_negotiate_simple_meta_context(QIOChannel *ioc,
 char *data = g_malloc(data_len);
 char *p = data;

+trace_nbd_opt_meta_request(context, export);
 stl_be_p(p, export_len);
 memcpy(p += sizeof(export_len), export, export_len);
 stl_be_p(p += export_len, 1);
@@ -681,6 +682,7 @@ static int nbd_negotiate_simple_meta_context(QIOChannel *ioc,
 }
 g_free(name);

+trace_nbd_opt_meta_reply(context, received_id);
 received = true;

 /* receive NBD_REP_ACK */
diff --git a/nbd/server.c b/nbd/server.c
index cea158913ba..9e1f2271784 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -726,6 +726,7 @@ static int nbd_negotiate_send_meta_context(NBDClient *client,
 context_id = 0;
 }

+trace_nbd_negotiate_meta_query_reply(context, context_id);
 set_be_option_rep(&opt.h, client->opt, NBD_REP_META_CONTEXT,
   sizeof(opt) - sizeof(opt.h) + iov[1].iov_len);
 stl_be_p(&opt.context_id, context_id);
@@ -752,10 +753,12 @@ static int nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
 if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
 meta->base_allocation = true;
 }
+trace_nbd_negotiate_meta_query_parse("base:");
 return 1;
 }

 if (len != alen) {
+trace_nbd_negotiate_meta_query_skip("not base:allocation");
 return nbd_opt_skip(client, len, errp);
 }

@@ -768,6 +771,7 @@ static int nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
 meta->base_allocation = true;
 }

+trace_nbd_negotiate_meta_query_parse("base:allocation");
 return 1;
 }

@@ -800,6 +804,7 @@ static int nbd_negotiate_meta_query(NBDClient *client,
 /* The only supported namespace for now is 'base'. So query should start
  * with 'base:'. Otherwise, we can ignore it and skip the remainder. */
 if (len < baselen) {
+trace_nbd_negotiate_meta_query_skip("length too short");
 return nbd_opt_skip(client, len, errp);
 }

@@ -809,6 +814,7 @@ static int nbd_negotiate_meta_query(NBDClient *client,
 return ret;
 }
 if (strncmp(query, "base:", baselen) != 0) {
+trace_nbd_negotiate_meta_query_skip("not for base: namespace");
 return nbd_opt_skip(client, len, errp);
 }

@@ -858,6 +864,8 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
 return ret;
 }
 cpu_to_be32s(&nb_queries);
+trace_nbd_negotiate_meta_context(nbd_opt_lookup(client->opt),
+ meta->export_name, nb_queries);

 if (client->opt == NBD_OPT_LIST_META_CONTEXT && !nb_queries) {
 /* enable all known contexts */
diff --git a/nbd/trace-events b/nbd/trace-events
index 0d03edc967d..cfdbe04b447 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -10,6 +10,8 @@ nbd_receive_query_exports_start(const char *wantname) "Querying export list for
 nbd_receive_query_exports_success(const char *wantname) "Found desired export name '%s'"
 nbd_receive_starttls_new_client(void) "Setting up TLS"
 nbd_receive_starttls_tls_handshake(void) "Starting TLS handshake"
+nbd_opt_meta_request(const char *context, const char *export) "Requesting to set meta context %s for export %s"
+nbd_opt_meta_reply(const char *context, int id) "Received mapping of context %s to id %d"
 nbd_receive_negotiate(void *tlscreds, const char *hostname) "Receiving negotiation tlscreds=%p hostname=%s"
 nbd_receive_negotiate_magic(uint64_t magic) "Magic is 0x%" PRIx64
 nbd_receive_negotiate_server_flags(uint32_t globalflags) "Global flags are 0x%" PRIx32
@@ -44,6 +46,10 @@ nbd_negotiate_handle_info_request(int request, const char *name) "Client request
 nbd_negotiate_handle_info_block_size(uint32_t minimum, uint32_t preferred, uint32_t maximum) "advertising minimum 0x%" PRIx32 ", preferred 0x%" PRIx32 ", maximum 0x%" PRIx32
 nbd_negotiate_handle_starttls(void) "Setting up TLS"
 nbd_negotiate_handle_starttls_handshake(void) "Starting TLS handshake"
+nbd_negotiate_meta_context(const char *optname, const char *export, int queries) "Client requested %s for export %s, with %d queries"
+nbd_negotiate_meta_query_skip(const char *reason) "Skipping meta query: %s"
+nbd_negotiate_meta_query_parse(const char *query) "Parsed meta query '%s'"
+nbd_negotiate_meta_query_reply(const char *context, unsigned int id) "Replying with meta context '%s' id %d"
 nbd_negotiate_options_flags(uint32_t flags) "Received client flags 0x%" PRIx32
 

Re: [Qemu-block] QEMU v2.12.0-rc1 iotest failures

2018-03-30 Thread Eric Blake

On 03/29/2018 07:18 PM, John Snow wrote:

Collecting the remaining failures that I know about for sake of easy
tracking on the list. I think Max has been having troubles with iotest
169 as well, but I haven't reproduced the failure myself. I do
occasionally see a locking failure on 153, but it appears to be
relatively rare.




051, 186: Related to culling SCSI options. Do we remove the tests?


A patch for this one is pending:
https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg06086.html


185: Being handled by QingFeng Hao
205: Can't create images for bochs/cloop.
208: Can't create images for bochs/cloop, and seems to be mismanaging LUKS?






# luks

Failures: 208

commit 44a8174e0a2919693dc28f19c3ce8eef6d4ac2c3
Author: Stefan Hajnoczi 
Date:   Mon Mar 12 16:04:39 2018 -0500

 iotests: add 208 nbd-server + blockdev-snapshot-sync test case


qemu-system-x86_64: -drive
if=none,id=drive0,file=/home/bos/jhuston/src/qemu/bin/git/tests/qemu-iotests/scratch/9125-disk.img,format=luks,cache=writeback,node-name=drive0-node,key-secret=keysec0:
Volume is not in LUKS format


Not sure if this test can easily be updated to support luks, or if we
should just blacklist it.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH 1/1] block/file-posix.c: fix not reopened lock file descriptor

2018-03-30 Thread Fam Zheng
On Thu, 03/22 19:08, Dion Bosschieter wrote:
> Yeah, I have a use case: before a last sync on a storage migration we suspend
> the VM -> send the last diffs -> mount the new storage server, and after that
> we change a symlink -> call reopen -> check that all file descriptors have
> changed before resuming the VM.

What is the point of changing the symlink and checking if FDs are changed?

Fam



Re: [Qemu-block] [PATCH 1/1] block/file-posix.c: fix not reopened lock file descriptor

2018-03-30 Thread Fam Zheng
On Thu, 03/22 18:39, Kevin Wolf wrote:
> [ Cc: qemu-block ]
> 
> Am 22.03.2018 um 18:20 hat Dion Bosschieter geschrieben:
> > In commit 244a5668106297378391b768e7288eb157616f64 another
> > file descriptor to BDRVRawState is added. When we try to issue the
> > reopen command only s->fd is reopened; lock_fd could still hold an old
> > file descriptor "possibly" pointing to another file.
> > 
> > - change raw_reopen_prepare so it checks use_lock from BDRVRawState and
> > tries to reopen lock_fd accordingly
> > - change raw_reopen_commit so it closes the old lock_fd on use_lock
> > 
> > Signed-off-by: Dion Bosschieter 
> 
> bdrv_reopen() is not meant for opening a different file, it is meant to
> change the flags and options of the same file. Do you have a use case
> where you would actually need to switch to a different file?
> 
> As far as I know, lock_fd was specifically introduced _because_ it stays
> the same across reopen, so we don't need a racy release/reacquire pair.
> Fam (CCed) should know more.

I think (I remember we have discussed this a bit before) technically we can do
reopen just fine without a separate s->lock_fd. After all, all the locks we
acquire are shared locks, so it is possible to hand the lock back and forth
between the old fd and the new one freely.
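A minimal standalone illustration of that property (Linux OFD locks, the same
family of locks QEMU's image locking is built on; this is a sketch, not QEMU
code). Because both locks are shared, the new fd can take its lock before the
old fd drops its own, so the file is never left unlocked:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Two independent open file descriptions of the same file. */
    int old_fd = open("demo.img", O_RDWR | O_CREAT, 0600);
    int new_fd = open("demo.img", O_RDWR);
    struct flock fl = {
        .l_type = F_RDLCK,      /* shared lock, as in the reopen case */
        .l_whence = SEEK_SET,   /* l_start = l_len = 0: whole file */
    };

    if (fcntl(old_fd, F_OFD_SETLK, &fl) == -1) {
        perror("lock old fd");
    }
    /* Succeeds even while old_fd holds its lock: shared locks coexist. */
    if (fcntl(new_fd, F_OFD_SETLK, &fl) == -1) {
        perror("lock new fd");
    }

    close(old_fd);   /* releases only the old lock; new_fd stays locked */
    close(new_fd);
    unlink("demo.img");
    return 0;
}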

Fam