Re: [Qemu-block] [PATCH 9/9] iotests: new test 206 for NBD BLOCK_STATUS

2018-02-16 Thread Eric Blake

On 02/15/2018 07:51 AM, Vladimir Sementsov-Ogievskiy wrote:

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  tests/qemu-iotests/206 | 34 ++
  tests/qemu-iotests/206.out |  2 ++
  tests/qemu-iotests/group   |  1 +
  3 files changed, 37 insertions(+)
  create mode 100644 tests/qemu-iotests/206
  create mode 100644 tests/qemu-iotests/206.out


It's a race! Kevin already has test 206 and 207 claimed in his pending 
series:

https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg02363.html
Whoever loses gets to rebase and renumber (or the maintainer on their 
behalf) ;)




diff --git a/tests/qemu-iotests/206 b/tests/qemu-iotests/206
new file mode 100644


Oops, you forgot to mark this executable.  And oops, I forgot to flag 
that on test 205.  Other inconsistent tests: 096, 124, 129, 132, 136, 
139, 148, 152, 163. Separate cleanup patch coming up soon.
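The audit described above is easy to mechanize. The following is a small sketch (not part of the patch under review) of how one might list iotest scripts that are missing the executable bit; the directory layout is the usual tests/qemu-iotests convention where tests are the all-digit filenames.

```python
import os
import stat
import tempfile

def non_executable_tests(iotests_dir):
    """Return the numeric test scripts missing the user-executable bit."""
    missing = []
    for name in sorted(os.listdir(iotests_dir)):
        path = os.path.join(iotests_dir, name)
        # Tests are the all-digit filenames; .out files and helpers are skipped.
        if name.isdigit() and os.path.isfile(path) and \
                not os.stat(path).st_mode & stat.S_IXUSR:
            missing.append(name)
    return missing

# Demo on a scratch directory standing in for tests/qemu-iotests/.
demo_dir = tempfile.mkdtemp()
for name, mode in [('205', 0o644), ('206', 0o644), ('204', 0o755),
                   ('common.rc', 0o644)]:
    with open(os.path.join(demo_dir, name), 'w'):
        pass
    os.chmod(os.path.join(demo_dir, name), mode)
```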



index 00..259e991ec6
--- /dev/null
+++ b/tests/qemu-iotests/206
@@ -0,0 +1,34 @@
+#!/usr/bin/env python
+#
+# Tests for NBD BLOCK_STATUS extension
+#



+import iotests
+from iotests import qemu_img_create, qemu_io, qemu_img_verbose, qemu_nbd, \
+    file_path
+
+iotests.verify_image_format(supported_fmts=['qcow2'])


Don't we also need to blacklist v2 files (and only operate on v3 or 
newer), since the behavior of zero clusters is much worse on v2 images?



+
+disk, nbd_sock = file_path('disk', 'nbd-sock')
+nbd_uri = 'nbd+unix:///exp?socket=' + nbd_sock
+
+qemu_img_create('-f', iotests.imgfmt, disk, '1M')
+qemu_io('-f', iotests.imgfmt, '-c', 'write 0 512K', disk)


Hopefully large enough for all the major filesystems with sparse support 
to show half-data, half-hole, and not run into any weird issues where 
granularity differences between filesystems change the outcome.
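The half-data/half-hole expectation can be checked outside qemu with SEEK_HOLE, which is what makes the result filesystem-dependent in the first place. A minimal sketch (Linux, Python 3.3+; on a filesystem without sparse-file support the first hole is simply reported at end-of-file):

```python
import os
import tempfile

def first_hole(path):
    """Offset of the first hole in the file; equals the file size when the
    filesystem reports no hole at all."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, 0, os.SEEK_HOLE)
    finally:
        os.close(fd)

# A 1M image with only the first 512K written, like the test's disk.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b'\xaa' * (512 * 1024))
    f.truncate(1024 * 1024)
    path = f.name
```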



+
+qemu_nbd('-k', nbd_sock, '-x', 'exp', '-f', iotests.imgfmt, disk)
+qemu_img_verbose('map', '-f', 'raw', '--output=json', nbd_uri)
diff --git a/tests/qemu-iotests/206.out b/tests/qemu-iotests/206.out
new file mode 100644
index 00..0d29724e84
--- /dev/null
+++ b/tests/qemu-iotests/206.out
@@ -0,0 +1,2 @@
+[{ "start": 0, "length": 524288, "depth": 0, "zero": false, "data": true},
+{ "start": 524288, "length": 524288, "depth": 0, "zero": true, "data": false}]


But definite evidence of getting block status over NBD! Progress!

And kudos for producing an .out file that I can actually read and 
reproduce without the aid of python (in contrast to the more 
"unit-test"-y ones like 205, where we had a big long side-discussion 
about that. No need to repeat it here).
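Because the .out file is plain JSON, the expected map can also be sanity-checked mechanically: the segments must be contiguous, non-overlapping, and cover the whole 1M image. A sketch of such a check (not part of the test itself):

```python
import json

EXPECTED = '''[{ "start": 0, "length": 524288, "depth": 0, "zero": false, "data": true},
{ "start": 524288, "length": 524288, "depth": 0, "zero": true, "data": false}]'''

def check_map(text, image_size):
    """Check that a qemu-img map --output=json listing is gap-free,
    non-overlapping, and covers the whole image."""
    segments = json.loads(text)
    offset = 0
    for seg in segments:
        assert seg['start'] == offset, 'gap or overlap at %d' % seg['start']
        offset += seg['length']
    assert offset == image_size, 'map does not cover the image'
    return segments
```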



diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index a2dfe79d86..2c3925566a 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -202,3 +202,4 @@
  203 rw auto
  204 rw auto quick
  205 rw auto quick
+206 rw auto quick



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [Qemu-devel] [PATCH 5/6] ahci-test: fix opts leak of skip tests

2018-02-16 Thread John Snow


On 02/15/2018 04:25 PM, Marc-André Lureau wrote:
> Fixes the following ASAN report:
> 
> Direct leak of 128 byte(s) in 8 object(s) allocated from:
> #0 0x7fefce311850 in malloc (/lib64/libasan.so.4+0xde850)
> #1 0x7fefcdd5ef0c in g_malloc ../glib/gmem.c:94
> #2 0x559b976faff0 in create_ahci_io_test 
> /home/elmarco/src/qemu/tests/ahci-test.c:1810
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  tests/ahci-test.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tests/ahci-test.c b/tests/ahci-test.c
> index 7aa5af428c..1bd3cc7ca8 100644
> --- a/tests/ahci-test.c
> +++ b/tests/ahci-test.c
> @@ -1822,6 +1822,7 @@ static void create_ahci_io_test(enum IOMode type, enum
> AddrMode addr,
>      if ((addr == ADDR_MODE_LBA48) && (offset == OFFSET_HIGH) &&
>          (mb_to_sectors(test_image_size_mb) <= 0xFFF)) {
>          g_test_message("%s: skipped; test image too small", name);
> +        g_free(opts);
>          g_free(name);
>          return;
>      }
> 

Whupps.

Thanks.

Reviewed-by: John Snow 

And, feel free to stage this in whomever's branch, it won't conflict
with anything:

Acked-by: John Snow 



Re: [Qemu-block] [PATCH 8/9] iotests: add file_path helper

2018-02-16 Thread Eric Blake

On 02/15/2018 07:51 AM, Vladimir Sementsov-Ogievskiy wrote:

Simple way to have auto generated filenames with auto clenup. Like


s/clenup/cleanup/


FilePath but without using 'with' statement and without additional
indentation of the whole test.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  tests/qemu-iotests/iotests.py | 32 
  1 file changed, 32 insertions(+)


Jeff, where do we stand on your iotests cleanups?  Is this something you 
want to use?
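For readers following along, the pattern the patch implements can be sketched standalone: a function attribute collects generated paths and a single atexit hook removes them in reverse order. This is a simplified model (the `test_dir` default here is an assumption; the real helper uses the iotests scratch directory):

```python
import atexit
import os
import tempfile

def file_path_remover():
    # Best-effort cleanup, newest files first.
    for path in reversed(file_path_remover.paths):
        try:
            os.remove(path)
        except OSError:
            pass

def file_path(*names, test_dir=tempfile.gettempdir()):
    """Return auto-generated, PID-qualified paths removed at process exit."""
    if not hasattr(file_path_remover, 'paths'):
        file_path_remover.paths = []
        atexit.register(file_path_remover)
    paths = [os.path.join(test_dir, '{0}-{1}'.format(os.getpid(), name))
             for name in names]
    file_path_remover.paths.extend(paths)
    return paths[0] if len(paths) == 1 else paths
```

Note the single-name case returns a bare string rather than a one-element list, which is what makes `sock = file_path('socket')` read naturally.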




diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index c1302a2f9b..f2d05ca3fd 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -27,6 +27,7 @@ import struct
  import json
  import signal
  import logging
+import atexit
  
  sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 'scripts'))

  import qtest
@@ -250,6 +251,37 @@ class FilePath(object):
  return False
  
  
+def file_path_remover():
+    for path in reversed(file_path_remover.paths):
+        try:
+            os.remove(path)
+        except OSError:
+            pass
+
+
+def file_path(*names):
+    ''' Another way to get auto-generated filename that cleans itself up.
+
+    Use it as simple as:
+
+    img_a, img_b = file_path('a.img', 'b.img')
+    sock = file_path('socket')
+    '''
+
+    if not hasattr(file_path_remover, 'paths'):
+        file_path_remover.paths = []
+        atexit.register(file_path_remover)
+
+    paths = []
+    for name in names:
+        filename = '{0}-{1}'.format(os.getpid(), name)
+        path = os.path.join(test_dir, filename)
+        file_path_remover.paths.append(path)
+        paths.append(path)
+
+    return paths[0] if len(paths) == 1 else paths
+
+
  class VM(qtest.QEMUQtestMachine):
  '''A QEMU VM'''
  



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH 7/9] iotests.py: tiny refactor: move system imports up

2018-02-16 Thread Eric Blake

On 02/15/2018 07:51 AM, Vladimir Sementsov-Ogievskiy wrote:

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  tests/qemu-iotests/iotests.py | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)


What breaks if they aren't moved?  But stylistically, this looks 
reasonable; and can be merged independently of NBD stuff if someone else 
wants.


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH 6/9] nbd: BLOCK_STATUS for standard get_block_status function: client part

2018-02-16 Thread Eric Blake

On 02/15/2018 07:51 AM, Vladimir Sementsov-Ogievskiy wrote:

Minimal realization: only one extent in server answer is supported.
Flag NBD_CMD_FLAG_REQ_ONE is used to force this behavior.

Tests 140, 147 and 205 are fixed, because the server now fails when searching
for the export during NBD_OPT_SET_META_CONTEXT option negotiation.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---



+++ b/block/nbd-client.c
@@ -228,6 +228,45 @@ static int 
nbd_parse_offset_hole_payload(NBDStructuredReplyChunk *chunk,
  return 0;
  }
  
+/* nbd_parse_blockstatus_payload
+ * support only one extent in reply and only for
+ * base:allocation context


Reasonable, since the server should not be sending us contexts we did 
not ask for during negotiation.



+ */
+static int nbd_parse_blockstatus_payload(NBDClientSession *client,
+                                         NBDStructuredReplyChunk *chunk,
+                                         uint8_t *payload, uint64_t orig_length,
+                                         NBDExtent *extent, Error **errp)
+{
+    uint32_t context_id;
+
+    if (chunk->length != sizeof(context_id) + sizeof(extent)) {


Okay as long as we use REQ_ONE to ensure exactly one extent per chunk.


+        error_setg(errp, "Protocol error: invalid payload for "
+                   "NBD_REPLY_TYPE_BLOCK_STATUS");
+        return -EINVAL;
+    }
+
+    context_id = payload_advance32(&payload);
+    if (client->info.meta_base_allocation_id != context_id) {
+        error_setg(errp, "Protocol error: unexpected context id: %d for "
+                   "NBD_REPLY_TYPE_BLOCK_STATUS, when negotiated context "
+                   "id is %d", context_id,
+                   client->info.meta_base_allocation_id);
+        return -EINVAL;
+    }
+
+    memcpy(extent, payload, sizeof(*extent));
+    be32_to_cpus(&extent->length);
+    be32_to_cpus(&extent->flags);


Instead of doing a memcpy() and then in-place bit-swizzling, you could 
do the swapping as part of assignment, for one less function call (and 
make the code a bit easier to extend, if we later drop our REQ_ONE 
limitation on only having one extent, because you'll advance payload as 
needed):


extent->length = payload_advance32(&payload);
extent->flags = payload_advance32(&payload);

We should probably validate that the length field is a multiple of 
min_block (if a server tells us that all operations must be 512-byte 
aligned, then reports an extent that is smaller than 512 bytes, we have 
no way to ask for the status of the second half of the sector). 
Probably also something that needs to be explicitly stated in the NBD 
spec. [1]



+
+    if (extent->length > orig_length) {
+        error_setg(errp, "Protocol error: server sent chunk exceeding requested"
+                   " region");
+        return -EINVAL;


That matches the current spec wording, but I'm not sure I agree with it 
- what's wrong with a server providing a final extent that extends 
beyond the request, if the information was already available for free 
(the classic example: if the server never replies with HOLE or ZERO, 
then the entire file has the same status, so all requests could 
trivially be replied to by taking the starting offset to the end of the 
file as the returned length, rather than just clamping at the requested 
length).
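The checks discussed here (fixed 12-byte payload under REQ_ONE, context-id match, min_block alignment, and clamping rather than rejecting an over-long extent) can be sketched in Python as a toy parser. The function name and error handling are hypothetical, not the qemu code:

```python
import struct

def parse_blockstatus_payload(payload, context_id, orig_length, min_block):
    """Parse a single-extent NBD_REPLY_TYPE_BLOCK_STATUS payload (big-endian).

    Layout: uint32 context id, then one extent (uint32 length, uint32 flags).
    """
    if len(payload) != 12:
        raise ValueError('invalid payload length for BLOCK_STATUS')
    got_id, length, flags = struct.unpack('>III', payload)
    if got_id != context_id:
        raise ValueError('unexpected context id %d, negotiated %d'
                         % (got_id, context_id))
    if length % min_block:
        raise ValueError('extent length not aligned to minimum block size')
    # Be liberal, per the review comment: clamp an extent that extends past
    # the request instead of treating it as a protocol error.
    return min(length, orig_length), flags
```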



+    }
+
+    return 0;
+}
+
  /* nbd_parse_error_payload
   * on success @errp contains message describing nbd error reply
   */
@@ -642,6 +681,61 @@ static int nbd_co_receive_cmdread_reply(NBDClientSession *s, uint64_t handle,
  return iter.ret;
  }
  
+static int nbd_co_receive_blockstatus_reply(NBDClientSession *s,
+                                            uint64_t handle, uint64_t length,
+                                            NBDExtent *extent, Error **errp)
+{
+    NBDReplyChunkIter iter;
+    NBDReply reply;
+    void *payload = NULL;
+    Error *local_err = NULL;
+    bool received = false;
+
+    NBD_FOREACH_REPLY_CHUNK(s, iter, handle, s->info.structured_reply,
+                            NULL, &reply, &payload)
+    {
+        int ret;
+        NBDStructuredReplyChunk *chunk = &reply.structured;
+
+        assert(nbd_reply_is_structured(&reply));
+
+        switch (chunk->type) {
+        case NBD_REPLY_TYPE_BLOCK_STATUS:


Depending on the outcome of the discussion on the NBD list, here's where 
you could make a client easily listen to both your initial server (that 
sent 5) and a server compliant to the current spec wording (where this 
constant is 3); although it remains to be seen if that's the only 
difference between your initial implementation and the NBD spec wording 
that gets promoted to stable.



+            if (received) {
+                s->quit = true;
+                error_setg(&local_err, "Several BLOCK_STATUS chunks in reply");
+                nbd_iter_error(&iter, true, -EINVAL, &local_err);
+            }


We may change this in the future to ask for more than one context at 
once; but for now, this matches that we negotiated for 

Re: [Qemu-block] Block Migration and CPU throttling

2018-02-16 Thread Dr. David Alan Gilbert
* Peter Lieven (p...@kamp.de) wrote:
> 
> > Am 07.02.2018 um 19:29 schrieb Dr. David Alan Gilbert :
> > 
> > * Peter Lieven (p...@kamp.de) wrote:
> >> Am 12.12.2017 um 18:05 schrieb Dr. David Alan Gilbert:
> >>> * Peter Lieven (p...@kamp.de) wrote:
>  Am 21.09.2017 um 14:36 schrieb Dr. David Alan Gilbert:
> > * Peter Lieven (p...@kamp.de) wrote:
> >> Am 19.09.2017 um 16:41 schrieb Dr. David Alan Gilbert:
> >>> * Peter Lieven (p...@kamp.de) wrote:
>  Am 19.09.2017 um 16:38 schrieb Dr. David Alan Gilbert:
> > * Peter Lieven (p...@kamp.de) wrote:
> >> Hi,
> >> 
> >> I just noticed that CPU throttling and Block Migration don't work 
> >> together very well.
> >> During block migration the throttling heuristic detects that we 
> >> obviously make no progress
> >> in ram transfer. But the reason is the running block migration and 
> >> not a too high dirty pages rate.
> >> 
> >> The result is that any VM is throttled by 99% during block 
> >> migration.
> > Hmm that's unfortunate; do you have a bandwidth set lower than your
> > actual network connection? I'm just wondering if it's actually going
> > between the block and RAM iterative sections or getting stuck in one.
>  It happens also if source and dest are on the same machine and speed 
>  is set to 100G.
> >>> But does it happen if they're not and the speed is set low?
> >> Yes, it does. I noticed it in our test environment between different 
> >> nodes with a 10G
> >> link in between. But its totally clear why it happens. During block 
> >> migration we transfer
> >> all dirty memory pages in each round (if there is moderate memory 
> >> load), but all dirty
> >> pages are obviously more than 50% of the transferred ram in that round.
> >> It is more exactly 100%. But the current logic triggers on this 
> >> condition.
> >> 
> >> I think I will go forward and send a patch which disables auto 
> >> converge during
> >> block migration bulk stage.
> > Yes, that's fair;  it probably would also make sense to throttle the RAM
> > migration during the block migration bulk stage, since the chances are
> > it's not going to get far.  (I think in the nbd setup, the main
> > migration process isn't started until the end of bulk).
>  Catching up with the idea of delaying ram migration until block bulk has 
>  completed.
>  What do you think is the easiest way to achieve this?
> >>> 
> >>> 
> >>> I think the answer depends whether we think this is a 'special' or we
> >>> need a new general purpose mechanism.
> >>> 
> >>> If it was really general then we'd probably want to split the iterative
> >>> stage in two somehow, and only do RAM in the second half.
> >>> 
> >>> But I'm not sure it's worth it; I suspect the easiest way is:
> >>> 
> >>>a) Add a counter in migration/ram.c or in the RAM state somewhere
> >>>b) Make ram_save_inhibit increment the counter
> >>>c) Check the counter at the head of ram_save_iterate and just exit
> >>>  if it's none 0
> >>>d) Call ram_save_inhibit from block_save_setup
> >>>e) Then release it when you've finished the bulk stage
> >>> 
> >>> Make sure you still count the RAM in the pending totals, otherwise
> >>> migration might think it's finished a bit early.
> >> 
> >> Is there any culprit I don't see or is it as easy as this?
> > 
> > Hmm, looks promising doesn't it;  might need an include or two tidied
> > up, but looks worth a try.   Just be careful that there are no cases
> > where block migration can't transfer data in that state, otherwise we'll
> > keep coming back to here and spewing empty sections.
> 
> I already tested it and it actually works.

OK.

> What would you expect to be cleaned up before it would be a proper patch?

It's simple enough so not much; maybe add a trace for when it does the
exit just to make it easier to watch; I hadn't realised ram.c already
included migration/block.h for the !bulk case.
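The inhibit-counter idea from steps (a)-(e) earlier in the thread can be modeled as a toy state machine; all names here are illustrative, not the actual migration/ram.c interfaces:

```python
class RamMigration:
    """Toy model of the inhibit-counter idea: block migration suppresses
    RAM iteration during its bulk stage, while pending totals still count."""

    def __init__(self):
        self.inhibit = 0          # (a) counter in the RAM state
        self.pending = 1000       # pages still to send

    def save_inhibit(self):       # (b)/(d) taken by block_save_setup
        self.inhibit += 1

    def save_release(self):       # (e) released after the bulk stage
        self.inhibit -= 1

    def save_iterate(self):
        if self.inhibit:          # (c) bail out early while inhibited
            return 0
        sent = min(self.pending, 100)
        self.pending -= sent
        return sent

    def save_pending(self):
        # Still report RAM in the totals, so migration does not think it
        # has finished before the RAM was actually sent.
        return self.pending
```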

> Are there any implications with RDMA

Hmm; don't think so; it's mainly concerned with how the RAM is
transferred; and the ram_control_* hooks are called after your test,
and on the load side only once it's read the flags and got a block.

> and/or post copy migration?

Again I don't think so;  I think block migration always shows the
outstanding amount of block storage, and so it won't flip into postcopy
mode until the bulk of block migration is done.
Also the 'ram_save_complete' does its own scan of blocks and doesn't
share the iterate code, so it won't be affected by your change.

> Is block migration possible at all with those?

Yes I think so in both cases; with RDMA I'm pretty sure it is, and I
think people have had it running with postcopy as well.   The postcopy
case isn't great (because the actual block storage isn't 

Re: [Qemu-block] [PATCH 5/9] nbd/client: fix error messages in nbd_handle_reply_err

2018-02-16 Thread Eric Blake

On 02/15/2018 07:51 AM, Vladimir Sementsov-Ogievskiy wrote:

1. NBD_REP_ERR_INVALID is not only about length, so, make message more
general

2. hex format is not very good: it's hard to read something like
"option a (set meta context)", so switch to dec.


It would be okay as option 0xa; I also want to be sure we match the 
output in server traces with the output in client traces; for example, I 
found:


nbd/server.c-            error_setg(errp, "Unsupported option 0x%" PRIx32 " (%s)",
nbd/server.c:                       option, nbd_opt_lookup(option));

So we want consistency through all the call sites, before this patch is 
ready to go, although I agree we need it.
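The consistency argument is easiest to satisfy with one shared formatter used by every call site, so client and server traces are greppable against each other. A sketch of the "0xa (set meta context)" form Eric suggests (the option numbers match the NBD spec; the helper name is hypothetical):

```python
NBD_OPT_NAMES = {
    0x8: 'structured reply',
    0x9: 'list meta context',
    0xa: 'set meta context',
}

def nbd_opt_str(opt):
    """Format an option identically everywhere: hex number plus name."""
    return '0x%x (%s)' % (opt, NBD_OPT_NAMES.get(opt, 'unknown'))
```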


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH 4/9] block/nbd-client: save first fatal error in nbd_iter_error

2018-02-16 Thread Eric Blake

On 02/15/2018 07:51 AM, Vladimir Sementsov-Ogievskiy wrote:

It is okay that a fatal error hides a previous non-fatal one, but hiding
the first fatal error is a bad feature.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/nbd-client.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)



Can this trigger in 2.11 clients with structured reads to the point 
where we'd want to add qemu-stable in CC?  Or did you only hit it once 
block_status was added to the mix?


Note that comparing 2.11 server with 2.11 client doesn't help - because 
2.11 as server only sends a single chunk per structured read (as the 
only structured reply); it wasn't until commit 418638d3 that we have a 
qemu server that can send multiple chunks, and it looks like you need 
multiple chunks before multiple errors becomes a possibility.
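The semantics being reviewed, keep the first error, except that a fatal error may displace a non-fatal one, but never an earlier fatal one, can be sketched in a few lines (names are illustrative, not the nbd-client code):

```python
def iter_error(state, fatal, ret, msg):
    """Record an error for a reply iteration.

    A fatal error replaces a previously recorded non-fatal error, but the
    first fatal error is never overwritten by anything that follows.
    """
    if state['ret'] == 0 or (fatal and not state['fatal']):
        state['ret'] = ret
        state['msg'] = msg
        state['fatal'] = fatal
```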


At any rate,

Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH 3/9] nbd: BLOCK_STATUS for standard get_block_status function: server part

2018-02-16 Thread Eric Blake

On 02/16/2018 08:43 AM, Vladimir Sementsov-Ogievskiy wrote:

16.02.2018 16:21, Eric Blake wrote:

On 02/15/2018 07:51 AM, Vladimir Sementsov-Ogievskiy wrote:

Minimal realization: only one extent in server answer is supported.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---



Again, function comments are useful.


+    if (query[0] == '\0' || strcmp(query, "allocation") == 0) {
+        /* Note: empty query should select all contexts within base
+         * namespace. */
+        meta->base_allocation = true;


From the client perspective, this handling of the empty leaf-name 
works well for NBD_OPT_LIST_META_CONTEXT (I want to see what leaf 
nodes the server supports), but not so well for 
NBD_OPT_SET_META_CONTEXT (asking the server to send ALL base 
allocations, even when I don't necessarily know how to interpret 
anything other than base:allocation, is a waste). So this function 
needs a bool parameter that says whether it is being invoked from 
_LIST (empty string okay, to advertise ALL base leaf names back to 
client, which for now is just base:allocation) or from _SET (empty 
string is ignored as invalid; client has to specifically ask for 
base:allocation by name).
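The asymmetry Eric proposes between the two options boils down to one extra flag in the query matcher. A sketch of that rule (a reviewer suggestion, not settled spec behavior at this point in the thread):

```python
def meta_base_query(query, is_list):
    """Decide whether a 'base:' namespace leaf query selects base:allocation.

    Per the review suggestion: an empty leaf name matches everything for
    NBD_OPT_LIST_META_CONTEXT, but is ignored for NBD_OPT_SET_META_CONTEXT,
    where the client must name base:allocation explicitly.
    """
    if query == '':
        return is_list
    return query == 'allocation'
```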


"empty string is ignored as invalid", hm, do we have this in spec? I 
think SET and LIST must select exactly same sets of contexts.


No, it is specifically NOT intended that SET and LIST have to produce 
the same set of contexts; although I do see your point that as currently 
written, it does not appear to require SET to ignore "base:" or to treat 
it as an error.  At any rate, the spec gives the example of:



In either case, however, for any given namespace the server MAY, instead of 
exhaustively listing every matching context available to select (or every 
context available to select where no query is given), send sufficient context 
records back to allow a client with knowledge of the namespace to select any 
context. This may be helpful where a client can construct algorithmic queries. 
For instance, a client might reply simply with the namespace with no leaf-name 
(e.g. 'X-FooBar:') or with a range of values (e.g. 
'X-ModifiedDate:20160310-20161214'). The semantics of such a reply are a matter 
for the definition of the namespace. However each namespace returned MUST begin 
with the relevant namespace, followed by a colon, and then other UTF-8 
characters, with the entire string following the restrictions for strings set 
out earlier in this document.


with the intent being that for some namespaces, it may be easy to 
perform an algorithmic query via _LIST to see what ranges are supported, 
but that you cannot select ALL elements in the range simultaneously. 
The empty query for _LIST exists to enumerate what is supported, but 
does not have to equate to an empty query for _SET selecting everything 
possible.  I could even see it being possible to have some round-trips, 
depending on the namespace (of course, any namespace other than "base:" 
will be tightly coordinated between both client and server, so they 
understand each other - the point was that the NBD spec didn't want to 
constrain what a client and server could do as long as they stay within 
the generic framework):


C> LIST ""
S> REPLY "base:allocation" id 0
S> REPLY "X-FooBar:" id 0
S> ACK
C> LIST "X-FooBar:"
S> REPLY "X-FooBar:A_Required", id 0
S> REPLY "X-FooBar:A_Min=100", id 0
S> REPLY "X-FooBar:A_Max=200", id 0
S> REPLY "X-FooBar:B_Default=300", id 0
S> REPLY "X-FooBar:B_Min=300", id 0
S> REPLY "X-FooBar:B_Max=400", id 0
S> ACK
C> SET "X-FooBar:A=150" "base:allocation"
S> REPLY "X-FooBar:A=150,B=300", id 1
S> REPLY "base:allocation", id 2
S> ACK

where the global query of all available contexts merely lists that 
X-FooBar: is understood, but that a specific query is needed for more 
details (to avoid the client having to parse those specifics if it 
doesn't care about X-FooBar:), and the specific query sets up the 
algorithmic description (parameter A is required, between 100 and 200; 
parameter B is optional, between 300 and 400, defaulting to 300), and 
the final SET gives the actual request (A given a value, B left to its 
default; but the reply names the entire context rather than repeating 
the shorter request).  So the spec is written to permit something like 
that for third-party namespaces, while also trying to be very specific 
about the "base:" context as that is the one that needs the most 
interoperability.


It is 
strange behavior for a client to set "base:", but it is its decision. And I 
don't think that it is invalid.


For LIST, querying "base:" makes total sense (for the sake of example, 
we may add "base:frob" down the road that does something new.  Being 
able to LIST whether "base:" turns into just "base:allocation" or into 
"base:allocation"+"base:frob" may be useful to a client that understands 
both formats and wants to probe if the server is new; and even for a 
client right now, the client can gracefully 

Re: [Qemu-block] [PATCH v10 00/12] Dirty bitmaps postcopy migration

2018-02-16 Thread Vladimir Sementsov-Ogievskiy

07.02.2018 18:58, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

There is a new version of dirty bitmap postcopy migration series.

Now it is based on Max's block tree: 
https://github.com/XanClic/qemu/commits/block,
where it needs only one patch: "block: maintain persistent disabled bitmaps",
but I hope it is near to be merged.


Yahoo, it's merged, thank you!

Patches don't apply (by git am) as is to master, but branch may be rebased
onto master with automatic conflicts resolution, so I think it's not needed
to post v11 without actual changes.



v10

clone: tag postcopy-v10 from https://src.openvz.org/scm/~vsementsov/qemu.git
online: 
https://src.openvz.org/users/vsementsov/repos/qemu/browse?at=postcopy-v10

01,02: r-b Fam
03: adjust comments about locking
04: fixed 124 iotest (was broken because of small mistake in 
block/dirty-bitmap.c)
05: rebased on master, staff from migration_thread is moved to 
migration_iteration_run, so
 drop r-b by John and Juan
06: 2.11->2.12, r-b Fam
07,08,09,: r-b Fam

10: move to device names instead of node names; looks like libvirt doesn't care
 about same node-names.
 flag AUTOLOAD is ignored for now
 use QEMU_ALIGN_UP and DIV_ROUND_UP
 skip automatically inserted nodes, when search for dirty bitmaps
 allow migration of no bitmaps (see in dirty_bitmap_load_header new logic
with nothing variable, which avoids extra 
errors)
 handle return code of dirty_bitmap_load_header
 avoid iteration if there are no bitmaps (see new .no_bitmaps field of
  dirty_bitmap_mig_state)
 call dirty_bitmap_mig_before_vm_start from process_incoming_migration_bh 
too,
 to enable bitmaps in case of postcopy not actually started.
11: not add r-b Fam
 tiny reorganisation of do_test_migration parameters: remove useless default
 values and make shared_storage to be the last
 disable shared storage test for now, until it will be fixed (it will be 
separate
 series, more related to qcow2 than to migration)
12: r-b Fam

also, "iotests: add default node-name" is dropped, as not more needed.


v9

clone: tag postcopy-v9 from https://src.openvz.org/scm/~vsementsov/qemu.git
online: https://src.openvz.org/users/vsementsov/repos/qemu/browse?at=postcopy-v9

01: r-b John
02: was incomplete, now add here bdrv_reclaim_dirty_bitmap fix
03: new
04: new
05: r-b John
07: fix type in commit message, r-b John
09: add comment about is_active_iterate, r-b Snow and keep Juan's r-b, hope 
comment is ok
10: change copyright to Virtuozzo
 reword comment at the top of the file
 rewrite init_dirty_bitmap_migration, to not do same things twice (John)
   and skip _only_ unnamed bitmaps, error out for unnamed nodes (John)
 use new "locked" state of bitmaps instead of frozen on source vm
 do not support migrating bitmap to existent one with the same name,
   keep only create-new-bitmap way
 break loop in dirty_bitmap_load_complete when bitmap is found
 use bitmap locking instead of context acquire
12: rewrite, to add more cases. (note, that 169 iotest is also in my
 "[PATCH v2 0/3] fix bitmaps migration through shared storage", which 
probably should
 go to qemu-stable. So this patch should rewrite it, but here I make it 
like new patch,
 to simplify review. When "[PATCH v2..." merged I'll rebase this on it), 
drop r-b
13: move to separate test, drop r-b


v8.1

clone: tag postcopy-v8.1 from https://src.openvz.org/scm/~vsementsov/qemu.git
online: 
https://src.openvz.org/users/vsementsov/repos/qemu/browse?at=postcopy-v8.1

05: fix compilation, add new version for cmma_save_pending too.


v8

clone: tag postcopy-v8 from https://src.openvz.org/scm/~vsementsov/qemu.git
online: https://src.openvz.org/users/vsementsov/repos/qemu/browse?at=postcopy-v8

- rebased on master
- patches 01-03 from v7 are already merged to master
- patch order is changed to make it possible to merge block/dirty-bitmap patches
   in separate if is needed
01: new patch
03: fixed to use _locked version of bdrv_release_dirty_bitmap
06: qapi-schema.json -> qapi/migration.json
 2.9 -> 2.11
10: protocol changed a bit:
   instead of 1 byte "bitmap enabled flag" this byte becomes just "flags"
   and have "enabled", "persistent" and "autoloading" flags inside.
   also, make all migrated bitmaps to be not persistent (to prevent their
   storing on source vm)
14: new patch
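The protocol change in item 10, replacing the single "bitmap enabled" byte with a flags byte carrying "enabled", "persistent" and "autoloading" bits, can be sketched as follows. The bit positions here are an illustrative assumption, not taken from the series:

```python
FLAG_ENABLED    = 1 << 0  # assumed bit layout, for illustration only
FLAG_PERSISTENT = 1 << 1
FLAG_AUTOLOAD   = 1 << 2

def pack_bitmap_flags(enabled, persistent, autoload):
    """Pack the per-bitmap header byte described for the v8 protocol."""
    return ((FLAG_ENABLED if enabled else 0)
            | (FLAG_PERSISTENT if persistent else 0)
            | (FLAG_AUTOLOAD if autoload else 0))

def unpack_bitmap_flags(byte):
    return (bool(byte & FLAG_ENABLED), bool(byte & FLAG_PERSISTENT),
            bool(byte & FLAG_AUTOLOAD))
```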


patches status:
01-04 - are only about block/dirty-bitmap and have no r-b. Fam, John, Paolo 
(about bitmap lock),
 please look at. These patches are ok to be merged in separate (but before 
05-14)
other patches are about migration
05-09 has Juan's r-b (and some of them has John's and Eric's r-bs)
10 - the main patch (dirty bitmaps migration), has no r-b.
11 - preparation for tests, not related to migration directly, has Max's r-b, 
ok to be merged
 separately (but before 12-14)
12-14 - tests, 12 and 13 have 

Re: [Qemu-block] [PATCH 3/9] nbd: BLOCK_STATUS for standard get_block_status function: server part

2018-02-16 Thread Vladimir Sementsov-Ogievskiy

16.02.2018 16:21, Eric Blake wrote:

On 02/15/2018 07:51 AM, Vladimir Sementsov-Ogievskiy wrote:

Minimal realization: only one extent in server answer is supported.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---


continuing where I left off,



+++ b/nbd/common.c
@@ -75,6 +75,10 @@ const char *nbd_opt_lookup(uint32_t opt)
  return "go";
  case NBD_OPT_STRUCTURED_REPLY:
  return "structured reply";
+    case NBD_OPT_LIST_META_CONTEXT:
+        return "list meta context";
+    case NBD_OPT_SET_META_CONTEXT:
+        return "set meta context";
  default:


Should the changes to the header for new macros and to common.c for 
mapping bits to names be split into a separate patch, so that someone 
could backport just the new constants and then the client-side 
implementation, rather than being forced to backport the rest of the 
server implementation at the same time?


Not a problem, I can split. I've thought about it, but decided for the first 
roll to send it as one patch.





+
+/* nbd_read_size_string
+ *
+ * Read string in format
+ *   uint32_t len
+ *   len bytes string (not 0-terminated)
+ *
+ * @buf should be enough to store @max_len+1
+ *
+ * Return -errno on I/O error, 0 if option was completely handled by
+ * sending a reply about inconsistent lengths, or 1 on success. */
+static int nbd_read_size_string(NBDClient *client, char *buf,
+                                uint32_t max_len, Error **errp)


Would existing code benefit from using this helper?  If so, splitting 
it into a separate patch, plus converting initial clients to use it, 
would be worthwhile.
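The wire format the helper reads, a uint32 length followed by that many bytes of non-NUL-terminated string, is simple enough to sketch independently; this toy version returns None where the server code would instead reply with an inconsistent-length error and skip the option:

```python
import io
import struct

def read_size_string(stream, max_len):
    """Read a uint32 length-prefixed, non-NUL-terminated UTF-8 string."""
    raw = stream.read(4)
    if len(raw) < 4:
        raise IOError('short read on string length')
    (length,) = struct.unpack('>I', raw)
    if length > max_len:
        return None  # caller replies with an error and skips the option
    data = stream.read(length)
    if len(data) < length:
        raise IOError('short read on string payload')
    return data.decode('utf-8')
```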




+
+static int nbd_negotiate_send_meta_context(NBDClient *client,
+                                           const char *context,
+                                           uint32_t context_id,
+                                           Error **errp)


No comment documenting this function?


+{
+    NBDOptionReplyMetaContext opt;
+    struct iovec iov[] = {
+        {.iov_base = &opt, .iov_len = sizeof(opt)},
+        {.iov_base = (void *)context, .iov_len = strlen(context)}


Casting to void* looks suspicious, but I see that it is casting away 
const.  Okay.



+    };
+
+    set_be_option_rep(&opt.h, client->opt, NBD_REP_META_CONTEXT,
+                      sizeof(opt) - sizeof(opt.h) + iov[1].iov_len);
+    stl_be_p(&opt.context_id, context_id);
+
+    return qio_channel_writev_all(client->ioc, iov, 2, errp) < 0 ? -EIO : 0;
+}
+
+static void nbd_meta_base_query(NBDExportMetaContexts *meta, const char *query)
+{


Again, function comments are useful.


+    if (query[0] == '\0' || strcmp(query, "allocation") == 0) {
+        /* Note: empty query should select all contexts within base
+         * namespace. */
+        meta->base_allocation = true;


From the client perspective, this handling of the empty leaf-name 
works well for NBD_OPT_LIST_META_CONTEXT (I want to see what leaf 
nodes the server supports), but not so well for 
NBD_OPT_SET_META_CONTEXT (asking the server to send ALL base 
allocations, even when I don't necessarily know how to interpret 
anything other than base:allocation, is a waste). So this function 
needs a bool parameter that says whether it is being invoked from 
_LIST (empty string okay, to advertise ALL base leaf names back to 
client, which for now is just base:allocation) or from _SET (empty 
string is ignored as invalid; client has to specifically ask for 
base:allocation by name).


"empty string is ignored as invalid", hm, do we have this in spec? I 
think SET and LIST must select exactly same sets of contexts. It is 
strange behavior of client to set "base:", but it is its decision. And I 
don't thing that it is invalid.
Formally we may answer with NBD_REP_ERR_TOO_BIG, but it will look weird, 
as client see that both base: and base:allocation returns _one_  
context, but in one case it is too big. But if we will have several 
base: contextes, server may fairly answer with NBD_REP_ERR_TOO_BIG.


So, I think for now the code is ok.

Also, I don't see NBD_REP_ERR_TOO_BIG possible reply in 
NBD_OPT_LIST_META_CONTEXT description. Should it be here?
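For reference, the NBD_REP_META_CONTEXT reply that nbd_negotiate_send_meta_context builds is a fixed option-reply header followed by a uint32 context id and the context name with no terminator. A sketch of the framing (constants per the NBD spec as I read it; the helper name is made up):

```python
import struct

NBD_REP_MAGIC = 0x3e889045565a9   # option-reply magic from the NBD spec
NBD_REP_META_CONTEXT = 4

def meta_context_reply(option, context_id, name):
    """Frame an NBD_REP_META_CONTEXT option reply."""
    payload = struct.pack('>I', context_id) + name.encode('utf-8')
    header = struct.pack('>QIII', NBD_REP_MAGIC, option,
                         NBD_REP_META_CONTEXT, len(payload))
    return header + payload
```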





+    }
+}
+
+/* nbd_negotiate_meta_query
+ * Return -errno on I/O error, 0 if option was completely handled by
+ * sending a reply about inconsistent lengths, or 1 on success. */
+static int nbd_negotiate_meta_query(NBDClient *client,
+                                    NBDExportMetaContexts *meta, Error **errp)

+{
+    int ret;
+    char *query, *colon, *namespace, *subquery;


Is it worth stack-allocating query here, so you don't have to g_free() 
it later?  NBD limits the maximum string to 4k, which is a little bit 
big for stack allocation (on an operating system with 4k pages, 
allocating more than 4k on the stack in one function risks missing the 
guard page on stack overflow), but we also have the benefit that we 
KNOW that the set of meta-context namespaces that we support have a 
much smaller maximum limit of what we even care about.

Re: [Qemu-block] [PATCH 3/9] nbd: BLOCK_STATUS for standard get_block_status function: server part

2018-02-16 Thread Eric Blake

On 02/15/2018 07:51 AM, Vladimir Sementsov-Ogievskiy wrote:

Minimal realization: only one extent in server answer is supported.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---


continuing where I left off,



+++ b/nbd/common.c
@@ -75,6 +75,10 @@ const char *nbd_opt_lookup(uint32_t opt)
  return "go";
  case NBD_OPT_STRUCTURED_REPLY:
  return "structured reply";
+case NBD_OPT_LIST_META_CONTEXT:
+return "list meta context";
+case NBD_OPT_SET_META_CONTEXT:
+return "set meta context";
  default:


Should the changes to the header for new macros and to common.c for 
mapping bits to names be split into a separate patch, so that someone 
could backport just the new constants and then the client-side 
implementation, rather than being forced to backport the rest of the 
server implementation at the same time?



+
+/* nbd_read_size_string
+ *
+ * Read string in format
+ *   uint32_t len
+ *   len bytes string (not 0-terminated)
+ *
+ * @buf should be enough to store @max_len+1
+ *
+ * Return -errno on I/O error, 0 if option was completely handled by
+ * sending a reply about inconsistent lengths, or 1 on success. */
+static int nbd_read_size_string(NBDClient *client, char *buf,
+uint32_t max_len, Error **errp)


Would existing code benefit from using this helper?  If so, splitting it 
into a separate patch, plus converting initial clients to use it, would 
be worthwhile.




+
+static int nbd_negotiate_send_meta_context(NBDClient *client,
+   const char *context,
+   uint32_t context_id,
+   Error **errp)


No comment documenting this function?


+{
+    NBDOptionReplyMetaContext opt;
+    struct iovec iov[] = {
+        {.iov_base = &opt, .iov_len = sizeof(opt)},
+        {.iov_base = (void *)context, .iov_len = strlen(context)}


Casting to void* looks suspicious, but I see that it is casting away 
const.  Okay.



+    };
+
+    set_be_option_rep(&opt, client->opt, NBD_REP_META_CONTEXT,
+                      sizeof(opt) - sizeof(opt.h) + iov[1].iov_len);
+    stl_be_p(&opt.context_id, context_id);
+
+    return qio_channel_writev_all(client->ioc, iov, 2, errp) < 0 ? -EIO : 0;
+}
+
+static void nbd_meta_base_query(NBDExportMetaContexts *meta, const char *query)
+{


Again, function comments are useful.


+    if (query[0] == '\0' || strcmp(query, "allocation") == 0) {
+        /* Note: empty query should select all contexts within base
+         * namespace. */
+        meta->base_allocation = true;


From the client perspective, this handling of the empty leaf-name works 
well for NBD_OPT_LIST_META_CONTEXT (I want to see what leaf nodes the 
server supports), but not so well for NBD_OPT_SET_META_CONTEXT (asking 
the server to send ALL base allocations, even when I don't necessarily 
know how to interpret anything other than base:allocation, is a waste). 
So this function needs a bool parameter that says whether it is being 
invoked from _LIST (empty string okay, to advertise ALL base leaf names 
back to client, which for now is just base:allocation) or from _SET 
(empty string is ignored as invalid; client has to specifically ask for 
base:allocation by name).
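A minimal standalone sketch of the suggested split (the helper name and signature are hypothetical, not the actual qemu code — it only models the _LIST vs _SET matching rule):

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper illustrating the suggested bool parameter:
 * under _LIST an empty leaf-name matches every base:* leaf the server
 * supports, while under _SET an empty leaf-name matches nothing and
 * the client must name base:allocation explicitly. */
static bool meta_base_query_matches(const char *query, bool is_list)
{
    if (query[0] == '\0') {
        return is_list;  /* _LIST: advertise all leaves; _SET: ignore */
    }
    return strcmp(query, "allocation") == 0;
}
```

The same predicate then serves both option handlers, with only the flag differing at the two call sites.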



+}
+}
+
+/* nbd_negotiate_meta_query
+ * Return -errno on I/O error, 0 if option was completely handled by
+ * sending a reply about inconsistent lengths, or 1 on success. */
+static int nbd_negotiate_meta_query(NBDClient *client,
+                                    NBDExportMetaContexts *meta, Error **errp)
+{
+    int ret;
+    char *query, *colon, *namespace, *subquery;


Is it worth stack-allocating query here, so you don't have to g_free() 
it later?  NBD limits the maximum string to 4k, which is a little bit 
big for stack allocation (on an operating system with 4k pages, 
allocating more than 4k on the stack in one function risks missing the 
guard page on stack overflow), but we also have the benefit that we KNOW 
that the set of meta-context namespaces that we support have a much 
smaller maximum limit of what we even care about.
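One way to exploit that bound, sketched with a fixed stack buffer (illustrative only; the buffer size, helper name, and plain-memcpy "wire" are assumptions, not the actual patch):

```c
#include <stdbool.h>
#include <string.h>

/* The only namespace this server cares about is "base:", so a small
 * fixed-size stack buffer is enough to classify a query's namespace
 * prefix without any heap allocation (and thus nothing to g_free()).
 * Wire I/O is modeled here as a plain in-memory buffer. */
#define NS_PREFIX     "base:"
#define NS_PREFIX_LEN (sizeof(NS_PREFIX) - 1)

static bool query_in_base_namespace(const char *wire, size_t wire_len)
{
    char buf[NS_PREFIX_LEN];      /* 5 bytes on the stack */

    if (wire_len < NS_PREFIX_LEN) {
        return false;             /* too short to start with "base:" */
    }
    memcpy(buf, wire, NS_PREFIX_LEN);  /* read only the prefix */
    return memcmp(buf, NS_PREFIX, NS_PREFIX_LEN) == 0;
}
```

Anything past the prefix would still have to be drained from the socket, but never stored.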



+
+    ret = nbd_alloc_read_size_string(client, &query, errp);
+    if (ret <= 0) {
+        return ret;
+    }
+
+    colon = strchr(query, ':');
+    if (colon == NULL) {
+        ret = nbd_opt_invalid(client, errp, "no colon in query");
+        goto out;


Hmm, that puts a slight wrinkle into my proposal, or else maybe it is 
something I should bring up on the NBD list.  If we only read 5 
characters (because the max namespace WE support is "base:"), but a 
client asks for namespace "X-longname:", we should gracefully ignore the 
client's request; while we still want to reply with an error to a client 
that asks for "garbage" with no colon at all.  The question for the NBD 
spec, then, is whether detecting bad client requests that didn't use 
colon is mandatory for the server (meaning we MUST read the 

Re: [Qemu-block] [PATCH] block: implement the bdrv_reopen_prepare helper for LUKS driver

2018-02-16 Thread Daniel P . Berrangé
Ping, can this be queued in the block tree, since it appears the no-op impl
is ok ?

On Thu, Jan 18, 2018 at 10:31:43AM +, Daniel P. Berrange wrote:
> If the bdrv_reopen_prepare helper isn't provided, the qemu-img commit
> command fails to re-open the base layer after committing changes into
> it. Provide a no-op implementation for the LUKS driver, since there
> is not any custom work that needs doing to re-open it.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  block/crypto.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/block/crypto.c b/block/crypto.c
> index 60ddf8623e..bb9a8f5376 100644
> --- a/block/crypto.c
> +++ b/block/crypto.c
> @@ -382,6 +382,12 @@ static void block_crypto_close(BlockDriverState *bs)
>  qcrypto_block_free(crypto->block);
>  }
>  
> +static int block_crypto_reopen_prepare(BDRVReopenState *state,
> +   BlockReopenQueue *queue, Error **errp)
> +{
> +/* nothing needs checking */
> +return 0;
> +}
>  
>  /*
>   * 1 MB bounce buffer gives good performance / memory tradeoff
> @@ -620,6 +626,7 @@ BlockDriver bdrv_crypto_luks = {
>  .bdrv_truncate  = block_crypto_truncate,
>  .create_opts        = &block_crypto_create_opts_luks,
>  
> +.bdrv_reopen_prepare = block_crypto_reopen_prepare,
>  .bdrv_refresh_limits = block_crypto_refresh_limits,
>  .bdrv_co_preadv = block_crypto_co_preadv,
>  .bdrv_co_pwritev= block_crypto_co_pwritev,
> -- 
> 2.14.3
> 

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



Re: [Qemu-block] [PATCH] nbd: Honor server's advertised minimum block size

2018-02-16 Thread Vladimir Sementsov-Ogievskiy

15.02.2018 06:29, Eric Blake wrote:

Commit 79ba8c98 (v2.7) changed the setting of request_alignment
to occur only during bdrv_refresh_limits(), rather than at at
bdrv_open() time; but at the time, NBD was unaffected, because
it still used sector-based callbacks, so the block layer
defaulted NBD to use 512 request_alignment.

Later, commit 70c4fb26 (also v2.7) changed NBD to use byte-based
callbacks, without setting request_alignment.  This resulted in
NBD using request_alignment of 1, which works great when the
server supports it (as is the case for qemu-nbd), but falls apart
miserably if the server requires alignment (but only if qemu
actually sends a sub-sector request; qemu-io can do it, but
most qemu operations still perform on sectors or larger).

Even later, the NBD protocol was updated to document that clients
should learn the server's minimum alignment during NBD_OPT_GO;
and recommended that clients should assume a minimum size of 512
unless the server understands NBD_OPT_GO and replied with a smaller
size.  Commit 081dd1fe (v2.10) attempted to do that, by assigning
request_alignment to whatever was learned from the server; but
it has two flaws: the assignment is done during bdrv_open() so
it gets unconditionally wiped out back to 1 during any later
bdrv_refresh_limits(); and the code is not using a default of 512
when the server did not report a minimum size.

Fix these issues by moving the assignment to request_alignment
to the right function, and by using a sane default when the
server does not advertise a minimum size.
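The core of the fix can be modeled in isolation (a sketch of the fallback rule only, not the qemu block layer itself):

```c
#include <stdint.h>

#define BDRV_SECTOR_SIZE 512u

/* Model of the corrected rule: the alignment is (re)computed inside
 * refresh_limits, falling back to 512 whenever the server did not
 * advertise a minimum block size, so a later refresh can never wipe
 * it back to 1. */
static uint32_t refresh_alignment(uint32_t server_min_block)
{
    return server_min_block ? server_min_block : BDRV_SECTOR_SIZE;
}
```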

CC: qemu-sta...@nongnu.org
Signed-off-by: Eric Blake 
---
  block/nbd-client.c | 3 ---
  block/nbd.c| 2 ++
  2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 9206652e45c..7b68499b76a 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -846,9 +846,6 @@ int nbd_client_init(BlockDriverState *bs,
     if (client->info.flags & NBD_FLAG_SEND_WRITE_ZEROES) {
         bs->supported_zero_flags |= BDRV_REQ_MAY_UNMAP;
     }
-    if (client->info.min_block > bs->bl.request_alignment) {
-        bs->bl.request_alignment = client->info.min_block;
-    }
 
     qemu_co_mutex_init(&client->send_mutex);
     qemu_co_queue_init(&client->free_sema);
diff --git a/block/nbd.c b/block/nbd.c
index ef81a9f53ba..69b5fd5e8fa 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -474,8 +474,10 @@ static int nbd_co_flush(BlockDriverState *bs)
 static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     NBDClientSession *s = nbd_get_client_session(bs);
+    uint32_t min = s->info.min_block;
     uint32_t max = MIN_NON_ZERO(NBD_MAX_BUFFER_SIZE, s->info.max_block);
 
+    bs->bl.request_alignment = min ? min : BDRV_SECTOR_SIZE;
     bs->bl.max_pdiscard = max;
     bs->bl.max_pwrite_zeroes = max;
     bs->bl.max_transfer = max;


Reviewed-by: Vladimir Sementsov-Ogievskiy



--
Best regards,
Vladimir



Re: [Qemu-block] [PATCH 3/9] nbd: BLOCK_STATUS for standard get_block_status function: server part

2018-02-16 Thread Eric Blake

On 02/16/2018 05:09 AM, Vladimir Sementsov-Ogievskiy wrote:

16.02.2018 02:02, Eric Blake wrote:

On 02/15/2018 07:51 AM, Vladimir Sementsov-Ogievskiy wrote:

Minimal realization: only one extent in server answer is supported.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/nbd.h |  33 ++
  nbd/common.c        |  10 ++
  nbd/server.c        | 310 +++-

  3 files changed, 352 insertions(+), 1 deletion(-)




@@ -200,9 +227,15 @@ enum {
  #define NBD_REPLY_TYPE_NONE  0
  #define NBD_REPLY_TYPE_OFFSET_DATA   1
  #define NBD_REPLY_TYPE_OFFSET_HOLE   2
+#define NBD_REPLY_TYPE_BLOCK_STATUS  5


Stale; see nbd.git commit 56c77720 which changed this to 3.


Very unpleasant surprise. I understand that this is still an experimental 
extension, but we have actually used it as 5 in production for about a year. 
Can we revert it to 5?


For that, you'll have to ask on the upstream NBD list.  There may be 
other things that turn out to need tweaking as I get through the rest of 
your initial implementation, vs. what works well for interoperability, 
so having a distinct number for experimental vs. actual may be 
worthwhile for other reasons as well, even if it requires some glue code 
on your side to handle an older server sending 5 instead of 3 to a newer 
client.
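Such glue could be as small as normalizing the reply type on receipt. A sketch, under the assumption that the client wants to accept both the experimental value 5 and the standardized 3 (the `_EXP` macro name is invented here):

```c
#include <stdint.h>

#define NBD_REPLY_TYPE_BLOCK_STATUS      3  /* standardized value (nbd.git 56c77720) */
#define NBD_REPLY_TYPE_BLOCK_STATUS_EXP  5  /* older experimental servers */

/* Map an incoming structured-reply type to its canonical value, so the
 * rest of the client code only ever has to match on 3. */
static uint16_t normalize_reply_type(uint16_t type)
{
    if (type == NBD_REPLY_TYPE_BLOCK_STATUS_EXP) {
        return NBD_REPLY_TYPE_BLOCK_STATUS;
    }
    return type;
}
```

Keeping the remap in one helper confines the compatibility hack to a single, easily deletable spot.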


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH v2 7/8] file-posix: account discard operations

2018-02-16 Thread Alberto Garcia
On Mon 12 Feb 2018 05:19:57 PM CET, Anton Nefedov wrote:
>>> @@ -158,6 +158,11 @@ typedef struct BDRVRawState {
>>>   bool page_cache_inconsistent:1;
>>>   bool has_fallocate;
>>>   bool needs_alignment;
>>> +struct {
>>> +int64_t discard_nb_ok;
>>> +int64_t discard_nb_failed;
>>> +int64_t discard_bytes_ok;
>>> +} stats;
>> 
>> Shouldn't this new structure be defined in a header file so other
>> drivers can use it? Or did you define it here because you don't see that
>> happening soon?
>> 
>
> I guess there's no reason to burden the common header files as long as
> it's not really used anywhere else.

Fair enough,

Reviewed-by: Alberto Garcia 

Berto



Re: [Qemu-block] [PATCH 3/9] nbd: BLOCK_STATUS for standard get_block_status function: server part

2018-02-16 Thread Vladimir Sementsov-Ogievskiy

16.02.2018 02:02, Eric Blake wrote:

On 02/15/2018 07:51 AM, Vladimir Sementsov-Ogievskiy wrote:

Minimal realization: only one extent in server answer is supported.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/nbd.h |  33 ++
  nbd/common.c        |  10 ++
  nbd/server.c        | 310 +++-

  3 files changed, 352 insertions(+), 1 deletion(-)




@@ -200,9 +227,15 @@ enum {
  #define NBD_REPLY_TYPE_NONE  0
  #define NBD_REPLY_TYPE_OFFSET_DATA   1
  #define NBD_REPLY_TYPE_OFFSET_HOLE   2
+#define NBD_REPLY_TYPE_BLOCK_STATUS  5


Stale; see nbd.git commit 56c77720 which changed this to 3.


Very unpleasant surprise. I understand that this is still an experimental 
extension, but we have actually used it as 5 in production for about a year. 
Can we revert it to 5?





+++ b/nbd/server.c
@@ -82,6 +82,15 @@ struct NBDExport {
    static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);

  +/* NBDExportMetaContexts represents list of selected by
+ * NBD_OPT_SET_META_CONTEXT contexts to be exported. */


represents a list of contexts to be exported, as selected by 
NBD_OPT_SET_META_CONTEXT.



+typedef struct NBDExportMetaContexts {
+    char export_name[NBD_MAX_NAME_SIZE + 1];


Would this work as const char * pointing at some other storage, 
instead of having to copy into this storage?


I'll think about it.



+    bool valid; /* means that negotiation of the option finished without
+                   errors */
+    bool base_allocation; /* export base:allocation context (block status) */
+} NBDExportMetaContexts;
+
  struct NBDClient {


@@ -636,6 +646,201 @@ static QIOChannel *nbd_negotiate_handle_starttls(NBDClient *client,

     return QIO_CHANNEL(tioc);
 }
  +/* nbd_alloc_read_size_string
+ *
+ * Read string in format
+ *   uint32_t len
+ *   len bytes string (not 0-terminated)
+ * String is allocated and pointer returned as @buf
+ *
+ * Return -errno on I/O error, 0 if option was completely handled by
+ * sending a reply about inconsistent lengths, or 1 on success. */
+static int nbd_alloc_read_size_string(NBDClient *client, char **buf,
+  Error **errp)
+{
+    int ret;
+    uint32_t len;
+
+    ret = nbd_opt_read(client, &len, sizeof(len), errp);
+    if (ret <= 0) {
+        return ret;
+    }
+    cpu_to_be32s(&len);
+
+    *buf = g_try_malloc(len + 1);


I'd rather check that len is sane prior to trying to malloc. 
Otherwise, a malicious client can convince us to waste time/space 
doing a large malloc before we finally realize that we can't read that 
many bytes after all.  And in fact, if you check len in advance, you 
can probably just use g_malloc() instead of g_try_malloc() 
(g_try_malloc() makes sense on a 1M allocation, where we can still 
allocate smaller stuff in reporting the error; but since NBD limits 
strings to 4k, if we fail at allocating 4k, we are probably already so 
hosed that our attempts to report the failure will also run out of 
memory and abort, so why not just abort now).


reasonable, will do
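The validate-then-allocate pattern being agreed on might look like this (a standalone sketch; the 4k bound comes from the NBD string limit discussed above, while the helper name and use of plain malloc are assumptions):

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define NBD_MAX_STRING_LEN 4096  /* NBD bounds strings at 4k */

/* Validate the client-supplied length before allocating: an oversized
 * (possibly malicious) length is rejected up front, so the allocation
 * that follows is always small and bounded. */
static int alloc_string_buf(uint32_t len, char **buf)
{
    if (len > NBD_MAX_STRING_LEN) {
        *buf = NULL;
        return -EINVAL;          /* rejected before any allocation */
    }
    *buf = malloc(len + 1);      /* +1 for the terminating NUL */
    if (*buf == NULL) {
        return -ENOMEM;
    }
    (*buf)[len] = '\0';
    return 0;
}
```

With the bound checked first, an allocation failure really does mean the process is out of memory, which is Eric's argument for aborting instead of soldiering on.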




+    if (*buf == NULL) {
+        error_setg(errp, "No memory");
+        return -ENOMEM;
+    }
+    (*buf)[len] = '\0';
+
+    ret = nbd_opt_read(client, *buf, len, errp);
+    if (ret <= 0) {
+        g_free(*buf);
+        *buf = NULL;
+    }
+
+    return ret;
+}
+
+/* nbd_read_size_string


Will resume review here...




--
Best regards,
Vladimir



Re: [Qemu-block] [PATCH v2] block: unify blocksize types

2018-02-16 Thread Alberto Garcia
On Fri 09 Feb 2018 10:53:12 AM CET, Piotr Sarna wrote:
> BlockSizes structure used in block size probing has uint32_t types
> for logical and physical sizes. These fields are wrongfully assigned
> to uint16_t in BlockConf, which results, among other errors,
> in assigning 0 instead of 65536 (which will be the case in at least
> future LizardFS block device driver among other things).
>
> This commit makes BlockConf's physical_block_size and logical_block_size
> fields uint32_t to avoid inconsistencies.
>
> Signed-off-by: Piotr Sarna 
> ---
>  hw/core/qdev-properties.c| 10 +-
>  include/hw/block/block.h |  4 ++--
>  include/hw/qdev-properties.h |  2 +-
>  3 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
> index 5bbc2d9..1400ba5 100644
> --- a/hw/core/qdev-properties.c
> +++ b/hw/core/qdev-properties.c
> @@ -731,14 +731,14 @@ static void set_blocksize(Object *obj, Visitor *v, const char *name,
>  uint16_t value, *ptr = qdev_get_prop_ptr(dev, prop);
>  Error *local_err = NULL;
>  const int64_t min = 512;
> -const int64_t max = 32768;
> +const int64_t max = 2147483648;
>  
>  if (dev->realized) {
>  qdev_prop_set_after_realize(dev, name, errp);
>  return;
>  }
>  
> -visit_type_uint16(v, name, &value, &local_err);
> +visit_type_uint32(v, name, &value, &local_err);

I'm not familiar with this code so apologies if there's something that I
have overlooked, but it catches my attention that

   a) you're using 2147483648 as an upper limit, that's INT32_MAX + 1.
  Where does that value come from? It's certainly not the highest
  value that a uint32_t can hold, and it doesn't fit in an int32_t
  either.

   b) you're using visit_type_uint32() but the 'value' you're passing is
  a uint16_t (also *ptr later in the same function).
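The truncation that motivates the patch (a 64K block size wrapping to 0 in a uint16_t field) is easy to demonstrate in isolation, independent of qemu's property code:

```c
#include <stdint.h>

/* A 64K value survives a uint32_t field but wraps to 0 in a uint16_t
 * field, which is exactly the "assigning 0 instead of 65536" failure
 * the commit message describes. */
static uint32_t store_u32(uint32_t v) { uint32_t field = v; return field; }
static uint16_t store_u16(uint32_t v) { uint16_t field = (uint16_t)v; return field; }
```

This is also why Berto's point (b) matters: reading into a uint16_t `value` with `visit_type_uint32()` would reintroduce the same mismatch.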

Berto