Re: [Qemu-block] [PATCH v4 00/21] nbd: add qemu-nbd --list

2019-01-18 Thread Eric Blake
On 1/18/19 7:47 AM, Vladimir Sementsov-Ogievskiy wrote:

> 
>  > Only expose MBR partition @var{num}.  Understands physical partitions
>  > 1-4 and logical partitions 5-8.
> 
> I'm afraid, I'm too lazy to sort out these things I don't know, so just 
> believe. It at least
> corresponds to limits in code that it should be 1 <= partition <= 8 ;)

It matches the code, but I just learned the code is buggy for anything
larger than 5.  According to
http://tldp.org/HOWTO/Large-Disk-HOWTO-13.html, MBR Extended/Logical
partitions form a linked-list, something like:

MBR:  EBR1  EBR2
+--+  +--+  +--+
+ Part 1   +   |->+ Part 5   +   |->+ Part 6   +
+ Part 2   +   |  + Next |  + 0+
+ Part 3   +   |  + 0+  + 0+
+ Part 4 ---  + 0+  + 0+
+--+  +--+  +--+

In reality, ANY of part 1-4 can point to the first EBR, as long as there
is at most one primary partition pointing to the extended partition -
even crazier is that there are different magic numbers in historical
use, such as type 5 for Part 3 and type 85 for Part 4, where MS-DOS
would treat Partition 3 as the extended partition and ignore Part 4, but
Linux would follow both chains (where the second extended partition thus
allowed Linux access to logical partitions residing in space beyond what
DOS could access).  But once you start the extended partition chain, all
logical partitions within the chain each have their own EBR table with
one entry for the partition and the next entry pointing to the next EBR,
meaning you can have more than 8 logical partitions (provided your disk
is big enough).

But our find_partition() code stupidly assumes that if there is any
extended partition type in the MBR (recognizing only type 5 or type F,
but not Linux' type 85), then all four entries in that EBR are logical
partitions 5-8 (and no more, rather than a linked list chain, as in:

MBR:  EBR
+--+  +--+
+ Part 1   +   |->+ Part 5   +
+ Part 2   +   |  + Part 6   +
+ Part 3   +   |  + Part 7   +
+ Part 4 ---  + Part 8   +
+--+  +--+

It's highly unlikely that there are any BIOS implementations that would
actually recognize such a partition beyond 5 as being bootable (there
might be OSs which are a bit more tolerant, since MBR doesn't seem to
have any one hard canonical specification).  I'd have to compare what
the Linux kernel MBR code does, to see if we even stand a chance of
being interoperable in any manner.  It doesn't help that nbdkit does not
yet support Logical MBR partitions.

Oh well, another project for another day; this documentation change is
going in as-is because it at least matches the code, even if the code is
buggy.  (I'm tempted to fix nbdkit to fully support MBR logical
partitions, then rip out the partitioning code in qemu as redundant as
you could get it via nbdkit if you really need it - qemu-nbd's -P 8 has
unchanged, and thus buggy, since at least commit 7a5ca8648 in May 2008,
with no user complaining of a bug for 11 years)

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 8/9] qcow2: bdrv_co_pwritev: move encryption code out of the lock

2019-01-18 Thread Vladimir Sementsov-Ogievskiy
18.01.2019 17:39, Alberto Garcia wrote:
> On Fri 18 Jan 2019 12:37:42 PM CET, Vladimir Sementsov-Ogievskiy wrote:
>> 18.01.2019 12:51, Alberto Garcia wrote:
>>> On Tue 08 Jan 2019 06:06:54 PM CET, Vladimir Sementsov-Ogievskiy wrote:
 Encryption will be done in threads, to take benefit of it, we should
 move it out of the lock first.

 Signed-off-by: Vladimir Sementsov-Ogievskiy 
 ---
block/qcow2.c | 35 +--
1 file changed, 21 insertions(+), 14 deletions(-)

 diff --git a/block/qcow2.c b/block/qcow2.c
 index d6ef606d89..76d3715350 100644
 --- a/block/qcow2.c
 +++ b/block/qcow2.c
 @@ -2077,11 +2077,20 @@ static coroutine_fn int 
 qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
ret = qcow2_alloc_cluster_offset(bs, offset, _bytes,
 _offset, );
if (ret < 0) {
 -goto fail;
 +goto out_locked;
}

assert((cluster_offset & 511) == 0);

 +ret = qcow2_pre_write_overlap_check(bs, 0,
 +cluster_offset + 
 offset_in_cluster,
 +cur_bytes);
 +if (ret < 0) {
 +goto out_locked;
 +}
 +
 +qemu_co_mutex_unlock(>lock);
>>>
>>> The usage of lock() and unlock() functions inside and outside of the
>>> loop plus the two 'locked' and 'unlocked' exit paths is starting to make
>>> things a bit more messy.
>>>
>>> Having a look at the code it seems that there's only these three parts
>>> in the whole function that need to have the lock held:
>>>
>>> one:
>>>  ret = qcow2_alloc_cluster_offset(bs, offset, _bytes,
>>>   _offset, );
>>>  /* ... */
>>>  ret = qcow2_pre_write_overlap_check(bs, 0,
>>>  cluster_offset +
>>>  offset_in_cluster,
>>>  cur_bytes);
>>>
>>> two:
>>>
>>>  ret = qcow2_handle_l2meta(bs, , true);
>>>
>>>
>>> three:
>>>  qcow2_handle_l2meta(bs, , false);
>>>
>>>
>>> So I wonder if it's perhaps simpler to add lock/unlock calls around
>>> those blocks?
>>
>> this means, that we'll unlock after qcow2_handle_l2meta and then
>> immediately lock on next iteration before qcow2_alloc_cluster_offset,
>> so current code avoids this extra unlock/lock..
> 
> I don't have a very strong opinion, but maybe it's worth having if it
> makes the code easier to read and maintain.
> 

Intuitively I think, that locks optimization worth this difficulty,
so I'd prefer to leave this logic as is, at least in these series.


-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH 0/3] block: fix last address-of-packed-member warnings

2019-01-18 Thread Kevin Wolf
Am 10.12.2018 um 12:26 hat Peter Maydell geschrieben:
> This patchset fixes the remaining clang warnings in the block/ code
> about taking the address of a packed struct member, which are all
> in block/vpc and block/vdi code handling UUIDs. Mostly I fix
> these by copying the unaligned field to/from a local variable.
> In the case of qemu_uuid_bswap() I opted to change the API to
> take and return the QemuUUID rather than taking a pointer to it,
> which makes almost all the callsites simpler. This does mean
> a struct copy but the struct is only 16 bytes and I didn't
> judge any of the callsites performance-sensitive enough to care
> about a struct copy of that size.
> 
> As usual, tested with "make check" only.

Thanks, applied to the block branch.

Kevin



Re: [Qemu-block] [PATCH 2/3] file: Expose some protocol-specific information

2019-01-18 Thread Eric Blake
On 1/18/19 8:08 AM, Vladimir Sementsov-Ogievskiy wrote:

>>
>> +##
>> +# @ImageInfoSpecificFile:
>> +#
>> +# Details about a file protocol BDS.
>> +#
>> +# @align: Alignment in use
>> +#
>> +# @discard: True if discard operations can be attempted
>> +#
>> +# @write-zero: True if efficient write zero operations can be attempted
>> +#
>> +# @discard-zero: True if discarding is known to read back as zero
>> +#
>> +# Since: 4.0
>> +##
>> +{ 'struct': 'ImageInfoSpecificFile',
>> +  'data': { 'align': 'int', 'discard': 'bool', 'write-zero': 'bool',
>> +'*discard-zero': 'bool' } }
> 
> Not sure if this works, as I remember raw-posix will adjust these fields
> on first failed write, and without this first write, we will not actually
> know are they supported.

Indeed. We could make them a tri-state enum instead of a bool (unknown,
supported, unsupported) - but it would require code tweaks to use the
tri-state in the code too (basically, treat unknown and supported as a
request to try on the first write that needs it; and change unknown to
supported or unsupported after that first write).  Then again, I worded
it as "can be attempted", not "known to work", so while it is indeed
non-helpful for 'qemu-img info' (because the attempt wasn't resolved
into something known for sure), it DOES help a long-running qemu process
when querying the same information over QMP (where such writes have
probably been attempted in the past).

Again, this series is more of an RFC on what do we really want to expose
to the user, and when; and not necessarily a hard commitment that this
is the best struct.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH v4 00/21] nbd: add qemu-nbd --list

2019-01-18 Thread Eric Blake
On 1/17/19 1:36 PM, Eric Blake wrote:
> I got tired of debugging whether a server was advertising the
> correct things during negotiation by inspecting the trace
> logs of qemu-io as client - not to mention that without SOME
> sort of client tracing particular commands, we can't easily
> regression test the server for correct behavior.  The final
> straw was at KVM Forum, when Nir asked me to make sure there
> was a way to easily determine if an NBD server is exposing what
> we really want (and fixing x-dirty-bitmap to behave saner fell
> out as a result of answering that question).
> 
> I note that upstream NBD has 'nbd-client -l $host' for querying
> just export names (with no quoting, so you have to know that
> a blank line means the default export), but it wasn't powerful
> enough, so I implemented 'qemu-nbd -L' to document everything.
> Upstream NBD has separate 'nbd-client' and 'nbd-server' binaries,
> while we only have 'qemu-nbd' (which is normally just a server,
> but 'qemu-nbd -c' also operates a second thread as a client).
> Our other uses of qemu as NBD client are for consuming a block
> device (as in qemu-io, qemu-img, or a drive to qemu) - but those
> binaries are less suited to something so specific to the NBD
> protocol.
> 
> Bonus: As a result of my work on this series, nbdkit now supports
> NBD_OPT_INFO (my interoperability testing between server
> implementations has been paying off, both at fixing server bugs,
> and at making this code more reliable across difference in valid
> servers).
> 
> Also available at:
> https://repo.or.cz/qemu/ericb.git qemu-nbd-list-v4

Vladimir spotted a few more tweaks to make, but they all look
sufficiently minor that I'll go ahead and queue this on my NBD tree for
a pull request on Monday, rather than posting a v5, if there aren't any
other major comments in the meantime.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v4 00/21] nbd: add qemu-nbd --list

2019-01-18 Thread Eric Blake
On 1/18/19 7:47 AM, Vladimir Sementsov-Ogievskiy wrote:

>>
>> Interesting, I don't get it again. Searched in outlook online, I found only 
>> v2 of it,
>> checked spam folder and filters. Magic.

Weird indeed.  Maybe there's enough @ signs due to .texi content in
there that a spam filter somewhere along the line ate it silently?

>>
>> anyway, I can look at 
>> https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg04328.html
>>
>>
> 
> just look at it in your qemu-nbd-list-v4 tag:

Thanks for that extra effort.

> 
>  > Only expose MBR partition @var{num}.  Understands physical partitions
>  > 1-4 and logical partitions 5-8.
> 
> I'm afraid, I'm too lazy to sort out these things I don't know, so just 
> believe. It at least
> corresponds to limits in code that it should be 1 <= partition <= 8 ;)
> 
>  > @item -c, --connect=@var{dev}
>  > Connect @var{filename} to NBD device @var{dev} (Linux only).
>  > @item -d, --disconnect
>  > Disconnect the device @var{dev} (Linux only).
> 
> Yes, and we now have clean code which establish this limitation
> 
>  > @item -e, --shared=@var{num}
>  > Allow up to @var{num} clients to share the device (default
>  > @samp{1}). Safe for readers, but for now, consistency is not
>  > guaranteed between multiple writers.
> 
> Hmm, don't understand, why you decided to move @samp{1}). on the following 
> line, it looks
> better as it was. And with a period it's only 69 symbols, the file have a lot 
> longer lines.

Emacs auto-reflowed the line as I edited the paragraph.  End-of-lines
doesn't really matter in .texi files (it gets reflowed again in the
creation of output documentation from the .texi input), so if my editor
tries to stick to certain column lengths, I don't really fight it.


>  > @example
>  > qemu-nbd \
>  >   --object tls-creds-x509,id=tls0,endpoint=server,dir=/path/to/qemutls \
>  >   --tls-creds tls0 -t -x subset -p 10810 \
>  >   --image-opts 
> driver=raw,offset=1M,length=1M,file.driver=file,file.filename=file.raw
>  > @end example
> 
> I don't know tls-related, the other options looks fine.

I looked through past git logs to see the command lines that Dan used
when initially implementing things; as well as comparison to iotest 233.

> 
> upd (decided to run the command, with dropped tls):
> looks fine but not work, as s/length/size.

Oh, good catch!  Will fix.

> 
>  >
>  > Serve a read-only copy of just the first MBR partition of a guest
>  > image over a Unix socket with as many as 5 simultaneous readers, with
>  > a persistent process forked as a daemon:
>  >
>  > @example
>  > qemu-nbd --fork -t -e 5 -s /path/to/sock -p 1 -r -f qcow2 file.qcow2
>  > @end example
> 
> Oops. s/-s/-k/, s/-p/-P/,

Ouch. Yes, thanks for catching that.

 and idea: may be, use self-descriptive long option names in
> examples?

Hmm. The nice part about short options is that they are less typing and
shorter lines; but verbose options are also useful.  Maybe listing the
same example twice, in both the short and long form?

> 
>  >
>  > Expose the guest-visible contents of a qcow2 file via a block device
>  > /dev/nbd0 (and possibly creating /dev/nbd0p1 and friends for
>  > partitions found within), then disconnect the device when done.
>  > @emph{CAUTION}: Do not use this method to mount filesystems from an
>  > untrusted guest image - a malicious guest may have prepared the image
>  > to attempt to trigger kernel bugs in partition probing or file system
>  > mounting.
>  >
>  > @example
> 
> should we note, that nbd kernel-module should be loaded for this to work?

Yes, that's worth mentioning (some distros include that module to be
running or autoload already, but it's not universal, so such a note is
indeed warranted).

> 
>  > qemu-nbd -c /dev/nbd0 -f qcow2 file.qcow2
>  > qemu-nbd -d /dev/nbd0
>  > @end example
> 
> With fixed wrong option names and s/length/size:
> for 03/21:
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> 
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v4 01/21] iotests: Make 233 output more reliable

2019-01-18 Thread Eric Blake
On 1/18/19 4:02 AM, Daniel P. Berrangé wrote:
> On Thu, Jan 17, 2019 at 01:36:38PM -0600, Eric Blake wrote:
>> We have a race between the nbd server and the client both trying
>> to report errors at once which can make the test sometimes fail
>> if the output lines swap order under load.  Break the race by
>> collecting server messages into a file and then replaying that
>> at the end of the test.
>>
>> Signed-off-by: Eric Blake 
>> CC: Daniel P. Berrangé 
>>
>> ---
>> An alternative solution might be to silence the message from the
>> server by default, and output it only when -v was passed
> 
> I wouldn't consider this an either/or situation. It is probably
> good practice to qemu-nbd to be completely silent wrt client
> problems so a malicious client can't spam the qemu-nbd log (if
> any). None the less it is also useful to have the iotests validate
> that this log message is printed.

Thus, the idea for future patches is to:
- teach qemu-nbd to be silent on client disconnects by default to avoid
a malicious client performing DoS by excessive logging,
- teach iotests to run qemu-nbd with -v to double-check what server
logs, as verbose server logs are quite handy when debugging why a
particular client can't connect

Now that the issue is public, is this something I should report to
secalert, or is it not at the level of a CVE?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v4] qemu-io: Add generic function for reinitializing optind.

2019-01-18 Thread Eric Blake
On 1/18/19 4:11 AM, Richard W.M. Jones wrote:
> On FreeBSD 11.2:
> 
>   $ nbdkit memory size=1M --run './qemu-io -f raw -c "aio_write 0 512" $nbd'
>   Parsing error: non-numeric argument, or extraneous/unrecognized suffix -- 
> aio_write
> 
> After main option parsing, we reinitialize optind so we can parse each
> command.  However reinitializing optind to 0 does not work on FreeBSD.
> What happens when you do this is optind remains 0 after the option

> This commit introduces an OS-portability function called
> qemu_reset_optind which provides a way of resetting optind that works
> on FreeBSD and platforms that use optreset, while keeping it the same
> as now on other platforms.
> 
> Note that the qemu codebase sets optind in many other places, but in
> those other places it's setting a local variable and not using getopt.
> This change is only needed in places where we are using getopt and the
> associated global variable optind.
> 
> Signed-off-by: Richard W.M. Jones 
> ---
>  configure| 14 ++
>  include/qemu/osdep.h | 16 
>  qemu-img.c   |  2 +-
>  qemu-io-cmds.c   |  2 +-
>  4 files changed, 32 insertions(+), 2 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] qemu-img info ran at 100% CPU for 45 minutes without writing a byte (stopped it)

2019-01-18 Thread Alberto Garcia
On Wed 16 Jan 2019 02:15:24 AM CET, james harvey wrote:

> I ran:
>
> # qemu-img convert /var/lib/libvirt/images/win7.qcow2 -O raw
> /mnt/tmpqcow/win7.raw
>
> 45 minutes later, qemu-img had been running with 100% CPU every time I
> checked, and it had allocated the raw file, but still hadn't actually
> written a single byte: (note the dd in the VM completed the 90GB in
> about 8 minutes)

[...]

> After running this long, I ran strace for 15 seconds, here:
> https://termbin.com/gg9k -- It's repeatedly running lseek with
> SEEK_DATA and SEEK_HOLE.  The SEEK_HOLE always results in 96251936768,
> and SEEK_DATA is different results.

It seems like the problem addressed by this patch:

   https://lists.gnu.org/archive/html/qemu-block/2019-01/msg00246.html

Berto



Re: [Qemu-block] [PATCH v5 11/11] block/backup: use backup-top instead of write notifiers

2019-01-18 Thread Max Reitz
On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
> Drop write notifiers and use filter node instead. Changes:
> 
> 1. copy-before-writes now handled by filter node, so, drop all
>is_write_notifier arguments.
> 
> 2. we don't have intersecting requests, so their handling is dropped.
> Instead, synchronization works as follows:
> when backup or backup-top starts copying of some area it firstly
> clears copy-bitmap bits, and nobody touches areas, not marked with
> dirty bits in copy-bitmap, so there is no intersection. Also, backup
> job copy operations are surrounded by bdrv region lock, which is
> actually serializing request, to not interfer with guest writes and
> not read changed data from source (before reading we clear
> corresponding bit in copy-bitmap, so, this area is not more handled by
> backup-top).
> 
> 3. To sync with in-flight requests we now just drain hook node, we
> don't need rw-lock.
> 
> 4. After the whole backup loop (top, full, incremental modes), we need
> to check for not copied clusters, which were under backup-top operation
> and we skipped them, but backup-top operation failed, error returned to
> the guest and dirty bits set back.
> 
> 5. Don't create additional blk, use backup-top children for copy
> operations.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/backup.c | 285 ++---
>  1 file changed, 149 insertions(+), 136 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index 88c0242b4e..e332909fb7 100644
> --- a/block/backup.c
> +++ b/block/backup.c

[...]

> @@ -300,21 +231,23 @@ static void backup_abort(Job *job)
>  static void backup_clean(Job *job)
>  {
>  BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
> -assert(s->target);
> -blk_unref(s->target);
> +
> +/* We must clean it to not crash in backup_drain. */
>  s->target = NULL;

Why not set s->source to NULL along with it?  It makes sense if you're
going to drop the backup-top node because both of these are its children.

>  
>  if (s->copy_bitmap) {
>  hbitmap_free(s->copy_bitmap);
>  s->copy_bitmap = NULL;
>  }
> +
> +bdrv_backup_top_drop(s->backup_top);
>  }

[...]

> @@ -386,21 +319,45 @@ static int coroutine_fn 
> backup_run_incremental(BackupBlockJob *job)
>  bool error_is_read;
>  int64_t offset;
>  HBitmapIter hbi;
> +void *lock = NULL;
>  
>  hbitmap_iter_init(, job->copy_bitmap, 0);
> -while ((offset = hbitmap_iter_next()) != -1) {
> +while (hbitmap_count(job->copy_bitmap)) {
> +offset = hbitmap_iter_next();
> +if (offset == -1) {
> +/*
> + * we may have skipped some clusters, which were handled by
> + * backup-top, but failed and finished by returning error to
> + * the guest and set dirty bit back.
> + */
> +hbitmap_iter_init(, job->copy_bitmap, 0);
> +offset = hbitmap_iter_next();
> +assert(offset);

I think you want to assert "offset >= 0".

> +}
> +
> +lock = bdrv_co_try_lock(job->source, offset, job->cluster_size);
> +/*
> + * Dirty bit is set, which means that there are no in-flight
> + * write requests on this area. We must succeed.
> + */
> +assert(lock);

I'm not sure that is true right now, but more on that below in backup_run().

> +
>  do {
>  if (yield_and_check(job)) {
> +bdrv_co_unlock(lock);
>  return 0;
>  }
> -ret = backup_do_cow(job, offset,
> -job->cluster_size, _is_read, false);
> +ret = backup_do_cow(job, offset, job->cluster_size, 
> _is_read);
>  if (ret < 0 && backup_error_action(job, error_is_read, -ret) ==
> BLOCK_ERROR_ACTION_REPORT)
>  {
> +bdrv_co_unlock(lock);
>  return ret;
>  }
>  } while (ret < 0);
> +
> +bdrv_co_unlock(lock);
> +lock = NULL;

This statement seems unnecessary here.

>  }
>  
>  return 0;

[...]

> @@ -447,26 +402,39 @@ static int coroutine_fn backup_run(Job *job, Error 
> **errp)
>  hbitmap_set(s->copy_bitmap, 0, s->len);
>  }
>  
> -s->before_write.notify = backup_before_write_notify;
> -bdrv_add_before_write_notifier(bs, >before_write);
> -
>  if (s->sync_mode == MIRROR_SYNC_MODE_NONE) {
>  /* All bits are set in copy_bitmap to allow any cluster to be copied.
>   * This does not actually require them to be copied. */
>  while (!job_is_cancelled(job)) {
> -/* Yield until the job is cancelled.  We just let our 
> before_write
> - * notify callback service CoW requests. */
> +/*
> + * Yield until the job is cancelled.  We just let our backup-top
> + * fileter driver 

Re: [Qemu-block] [PATCH v3 8/9] qcow2: bdrv_co_pwritev: move encryption code out of the lock

2019-01-18 Thread Alberto Garcia
On Fri 18 Jan 2019 12:37:42 PM CET, Vladimir Sementsov-Ogievskiy wrote:
> 18.01.2019 12:51, Alberto Garcia wrote:
>> On Tue 08 Jan 2019 06:06:54 PM CET, Vladimir Sementsov-Ogievskiy wrote:
>>> Encryption will be done in threads, to take benefit of it, we should
>>> move it out of the lock first.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>   block/qcow2.c | 35 +--
>>>   1 file changed, 21 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/block/qcow2.c b/block/qcow2.c
>>> index d6ef606d89..76d3715350 100644
>>> --- a/block/qcow2.c
>>> +++ b/block/qcow2.c
>>> @@ -2077,11 +2077,20 @@ static coroutine_fn int 
>>> qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
>>>   ret = qcow2_alloc_cluster_offset(bs, offset, _bytes,
>>>_offset, );
>>>   if (ret < 0) {
>>> -goto fail;
>>> +goto out_locked;
>>>   }
>>>   
>>>   assert((cluster_offset & 511) == 0);
>>>   
>>> +ret = qcow2_pre_write_overlap_check(bs, 0,
>>> +cluster_offset + 
>>> offset_in_cluster,
>>> +cur_bytes);
>>> +if (ret < 0) {
>>> +goto out_locked;
>>> +}
>>> +
>>> +qemu_co_mutex_unlock(>lock);
>> 
>> The usage of lock() and unlock() functions inside and outside of the
>> loop plus the two 'locked' and 'unlocked' exit paths is starting to make
>> things a bit more messy.
>> 
>> Having a look at the code it seems that there's only these three parts
>> in the whole function that need to have the lock held:
>> 
>> one:
>> ret = qcow2_alloc_cluster_offset(bs, offset, _bytes,
>>  _offset, );
>> /* ... */
>> ret = qcow2_pre_write_overlap_check(bs, 0,
>> cluster_offset +
>> offset_in_cluster,
>> cur_bytes);
>> 
>> two:
>> 
>> ret = qcow2_handle_l2meta(bs, , true);
>> 
>> 
>> three:
>> qcow2_handle_l2meta(bs, , false);
>> 
>> 
>> So I wonder if it's perhaps simpler to add lock/unlock calls around
>> those blocks?
>
> this means, that we'll unlock after qcow2_handle_l2meta and then
> immediately lock on next iteration before qcow2_alloc_cluster_offset,
> so current code avoids this extra unlock/lock..

I don't have a very strong opinion, but maybe it's worth having if it
makes the code easier to read and maintain.

Berto



Re: [Qemu-block] [PATCH 1/3] block: Expose protocol-specific data to 'qemu-img info'

2019-01-18 Thread Vladimir Sementsov-Ogievskiy
17.01.2019 18:33, Eric Blake wrote:
> 'qemu-img info' is useful for showing additional information
> about an image - but sometimes the interesting information is
> specific to the protocol rather than to the format layer.  Set
> the groundwork for showing this information; further patches
> will then enable specific pieces of information.
> 
> Signed-off-by: Eric Blake 
> ---
>   qapi/block-core.json | 6 +-
>   block/qapi.c | 7 +++
>   2 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 91685be6c29..f28d5f5fc76 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -152,6 +152,9 @@
>   # @format-specific: structure supplying additional format-specific
>   # information (since 1.7)
>   #
> +# @protocol-specific: structure supplying additional protocol-specific
> +# information (since 4.0)
> +#
>   # Since: 1.3
>   #
>   ##
> @@ -162,7 +165,8 @@
>  '*backing-filename': 'str', '*full-backing-filename': 'str',
>  '*backing-filename-format': 'str', '*snapshots': 
> ['SnapshotInfo'],
>  '*backing-image': 'ImageInfo',
> -   '*format-specific': 'ImageInfoSpecific' } }
> +   '*format-specific': 'ImageInfoSpecific',
> +   '*protocol-specific': 'ImageInfoSpecific' } }
> 
>   ##
>   # @ImageCheck:
> diff --git a/block/qapi.c b/block/qapi.c
> index c66f949db83..a8d104ba8ce 100644
> --- a/block/qapi.c
> +++ b/block/qapi.c
> @@ -284,6 +284,9 @@ void bdrv_query_image_info(BlockDriverState *bs,
>   }
>   info->format_specific = bdrv_get_specific_info(bs);
>   info->has_format_specific = info->format_specific != NULL;
> +info->protocol_specific =
> +bs->file ? bdrv_get_specific_info(bs->file->bs) : NULL;
> +info->has_protocol_specific = info->protocol_specific != NULL;
> 
>   backing_filename = bs->backing_file;
>   if (backing_filename[0] != '\0') {
> @@ -870,4 +873,8 @@ void bdrv_image_info_dump(fprintf_function func_fprintf, 
> void *f,
>   func_fprintf(f, "Format specific information:\n");
>   bdrv_image_info_specific_dump(func_fprintf, f, 
> info->format_specific);
>   }
> +if (info->has_protocol_specific) {
> +func_fprintf(f, "Protocol specific information:\n");
> +bdrv_image_info_specific_dump(func_fprintf, f, 
> info->protocol_specific);
> +}
>   }
> 


A bit related question:

Why do we always have nbd node as a protocol node? Can't we use nbd node 
directly,
without raw layer?


-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH 2/3] file: Expose some protocol-specific information

2019-01-18 Thread Vladimir Sementsov-Ogievskiy
17.01.2019 18:33, Eric Blake wrote:
> When analyzing performance, it might be nice to let 'qemu-img info'
> expose details that it learned from the underlying file or block
> device.  Start the process by picking a few useful items determined
> during raw_open_common().
> 
> Signed-off-by: Eric Blake 
> ---
>   qapi/block-core.json | 24 +++-
>   block/file-posix.c   | 21 +
>   2 files changed, 44 insertions(+), 1 deletion(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index f28d5f5fc76..296c22a1003 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -101,6 +101,25 @@
> 'extents': ['ImageInfo']
> } }
> 
> +##
> +# @ImageInfoSpecificFile:
> +#
> +# Details about a file protocol BDS.
> +#
> +# @align: Alignment in use
> +#
> +# @discard: True if discard operations can be attempted
> +#
> +# @write-zero: True if efficient write zero operations can be attempted
> +#
> +# @discard-zero: True if discarding is known to read back as zero
> +#
> +# Since: 4.0
> +##
> +{ 'struct': 'ImageInfoSpecificFile',
> +  'data': { 'align': 'int', 'discard': 'bool', 'write-zero': 'bool',
> +'*discard-zero': 'bool' } }

Not sure if this works, as I remember raw-posix will adjust these fields
on first failed write, and without this first write, we will not actually
know are they supported.

> +
>   ##
>   # @ImageInfoSpecific:
>   #
> @@ -110,12 +129,15 @@
>   ##
>   { 'union': 'ImageInfoSpecific',
> 'data': {
> +  # Format drivers:
> 'qcow2': 'ImageInfoSpecificQCow2',
> 'vmdk': 'ImageInfoSpecificVmdk',
> # If we need to add block driver specific parameters for
> # LUKS in future, then we'll subclass QCryptoBlockInfoLUKS
> # to define a ImageInfoSpecificLUKS
> -  'luks': 'QCryptoBlockInfoLUKS'
> +  'luks': 'QCryptoBlockInfoLUKS',
> +  # Protocol drivers:
> +  'file': 'ImageInfoSpecificFile'
> } }
> 
>   ##
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 8aee7a3fb8b..d4ce0e9116c 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1079,6 +1079,23 @@ static void raw_refresh_limits(BlockDriverState *bs, 
> Error **errp)
>   bs->bl.opt_mem_alignment = MAX(s->buf_align, getpagesize());
>   }
> 
> +static ImageInfoSpecific *raw_get_specific_info(BlockDriverState *bs)
> +{
> +BDRVRawState *s = bs->opaque;
> +ImageInfoSpecific *info = g_new0(ImageInfoSpecific, 1);
> +ImageInfoSpecificFile *infofile = g_new0(ImageInfoSpecificFile, 1);
> +
> +info->type = IMAGE_INFO_SPECIFIC_KIND_FILE;
> +info->u.file.data = infofile;
> +infofile->align = s->buf_align;
> +infofile->discard = s->has_discard;
> +infofile->write_zero = s->has_write_zeroes;
> +infofile->has_discard_zero = s->has_discard;
> +infofile->discard_zero = s->discard_zeroes;
> +
> +return info;
> +}
> +
>   static int check_for_dasd(int fd)
>   {
>   #ifdef BIODASDINFO2
> @@ -2765,6 +2782,7 @@ BlockDriver bdrv_file = {
>   .bdrv_co_copy_range_from = raw_co_copy_range_from,
>   .bdrv_co_copy_range_to  = raw_co_copy_range_to,
>   .bdrv_refresh_limits = raw_refresh_limits,
> +.bdrv_get_specific_info = raw_get_specific_info,
>   .bdrv_io_plug = raw_aio_plug,
>   .bdrv_io_unplug = raw_aio_unplug,
>   .bdrv_attach_aio_context = raw_aio_attach_aio_context,
> @@ -3240,6 +3258,7 @@ static BlockDriver bdrv_host_device = {
>   .bdrv_co_copy_range_from = raw_co_copy_range_from,
>   .bdrv_co_copy_range_to  = raw_co_copy_range_to,
>   .bdrv_refresh_limits = raw_refresh_limits,
> +.bdrv_get_specific_info = raw_get_specific_info,
>   .bdrv_io_plug = raw_aio_plug,
>   .bdrv_io_unplug = raw_aio_unplug,
>   .bdrv_attach_aio_context = raw_aio_attach_aio_context,
> @@ -3363,6 +3382,7 @@ static BlockDriver bdrv_host_cdrom = {
>   .bdrv_co_pwritev= raw_co_pwritev,
>   .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
>   .bdrv_refresh_limits = raw_refresh_limits,
> +.bdrv_get_specific_info = raw_get_specific_info,
>   .bdrv_io_plug = raw_aio_plug,
>   .bdrv_io_unplug = raw_aio_unplug,
>   .bdrv_attach_aio_context = raw_aio_attach_aio_context,
> @@ -3494,6 +3514,7 @@ static BlockDriver bdrv_host_cdrom = {
>   .bdrv_co_pwritev= raw_co_pwritev,
>   .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
>   .bdrv_refresh_limits = raw_refresh_limits,
> +.bdrv_get_specific_info = raw_get_specific_info,
>   .bdrv_io_plug = raw_aio_plug,
>   .bdrv_io_unplug = raw_aio_unplug,
>   .bdrv_attach_aio_context = raw_aio_attach_aio_context,
> 


-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH v4 00/21] nbd: add qemu-nbd --list

2019-01-18 Thread Vladimir Sementsov-Ogievskiy
18.01.2019 11:15, Vladimir Sementsov-Ogievskiy wrote:
> 17.01.2019 22:36, Eric Blake wrote:
>> I got tired of debugging whether a server was advertising the
>> correct things during negotiation by inspecting the trace
>> logs of qemu-io as client - not to mention that without SOME
>> sort of client tracing particular commands, we can't easily
>> regression test the server for correct behavior.  The final
>> straw was at KVM Forum, when Nir asked me to make sure there
>> was a way to easily determine if an NBD server is exposing what
>> we really want (and fixing x-dirty-bitmap to behave saner fell
>> out as a result of answering that question).
>>
>> I note that upstream NBD has 'nbd-client -l $host' for querying
>> just export names (with no quoting, so you have to know that
>> a blank line means the default export), but it wasn't powerful
>> enough, so I implemented 'qemu-nbd -L' to document everything.
>> Upstream NBD has separate 'nbd-client' and 'nbd-server' binaries,
>> while we only have 'qemu-nbd' (which is normally just a server,
>> but 'qemu-nbd -c' also operates a second thread as a client).
>> Our other uses of qemu as NBD client are for consuming a block
>> device (as in qemu-io, qemu-img, or a drive to qemu) - but those
>> binaries are less suited to something so specific to the NBD
>> protocol.
>>
>> Bonus: As a result of my work on this series, nbdkit now supports
>> NBD_OPT_INFO (my interoperability testing between server
>> implementations has been paying off, both at fixing server bugs,
>> and at making this code more reliable across difference in valid
>> servers).
>>
>> Also available at:
>> https://repo.or.cz/qemu/ericb.git qemu-nbd-list-v4
>>
>> Currently based on master.
>>
>> Since v3:
>> - 1 new patch (1/21)
>> - split old patch 15/19 into two (16,17/21)
>> - retitle two patches (git backport-diff doesn't do too well at showing
>> the diff on a retitled patch; 5,11/21)
>> - fix review comments from Rich, Vladimir
>>
>> 001/21:[down] 'iotests: Make 233 output more reliable'
>> 002/21:[] [--] 'maint: Allow for EXAMPLES in texi2pod'
>> 003/21:[] [--] 'qemu-nbd: Enhance man page'
> 
> Interesting, I don't get it again. Searched in outlook online, I found only 
> v2 of it,
> checked spam folder and filters. Magic.
> 
> anyway, I can look at 
> https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg04328.html
> 
> 

just look at it in your qemu-nbd-list-v4 tag:

 > Only expose MBR partition @var{num}.  Understands physical partitions
 > 1-4 and logical partitions 5-8.

I'm afraid, I'm too lazy to sort out these things I don't know, so just 
believe. It at least
corresponds to limits in code that it should be 1 <= partition <= 8 ;)

 > @item -c, --connect=@var{dev}
 > Connect @var{filename} to NBD device @var{dev} (Linux only).
 > @item -d, --disconnect
 > Disconnect the device @var{dev} (Linux only).

Yes, and we now have clean code which establish this limitation

 > @item -e, --shared=@var{num}
 > Allow up to @var{num} clients to share the device (default
 > @samp{1}). Safe for readers, but for now, consistency is not
 > guaranteed between multiple writers.

Hmm, don't understand, why you decided to move @samp{1}). on the following 
line, it looks
better as it was. And with a period it's only 69 symbols, the file have a lot 
longer lines.

 > Start a server listening on port 10809 that exposes only the
 > guest-visible contents of a qcow2 file, with no TLS encryption, and
 > with the default export name (an empty string). The command is
 > one-shot, and will block until the first successful client
 > disconnects:
 >
 > @example
 > qemu-nbd -f qcow2 file.qcow2
 > @end example
 >
 > Start a long-running server listening with encryption on port 10810,
 > and require clients to have a correct X.509 certificate to connect to
 > a 1 megabyte subset of a raw file, using the export name 'subset':
 >
 > @example
 > qemu-nbd \
 >   --object tls-creds-x509,id=tls0,endpoint=server,dir=/path/to/qemutls \
 >   --tls-creds tls0 -t -x subset -p 10810 \
 >   --image-opts 
 > driver=raw,offset=1M,length=1M,file.driver=file,file.filename=file.raw
 > @end example

I don't know tls-related, the other options looks fine.

upd (decided to run the command, with dropped tls):
looks fine but not work, as s/length/size.

 >
 > Serve a read-only copy of just the first MBR partition of a guest
 > image over a Unix socket with as many as 5 simultaneous readers, with
 > a persistent process forked as a daemon:
 >
 > @example
 > qemu-nbd --fork -t -e 5 -s /path/to/sock -p 1 -r -f qcow2 file.qcow2
 > @end example

Oops. s/-s/-k/, s/-p/-P/, and idea: may be, use self-descriptive long option 
names in
examples?

 >
 > Expose the guest-visible contents of a qcow2 file via a block device
 > /dev/nbd0 (and possibly creating /dev/nbd0p1 and friends for
 > partitions found within), then disconnect the device when done.
 > @emph{CAUTION}: Do not use this method to mount filesystems from an
 > untrusted guest image 

Re: [Qemu-block] [Qemu-devel] [PATCH 0/3] block: fix last address-of-packed-member warnings

2019-01-18 Thread Peter Maydell
Ping?

thanks
-- PMM

On Mon, 10 Dec 2018 at 11:28, Peter Maydell  wrote:
>
> This patchset fixes the remaining clang warnings in the block/ code
> about taking the address of a packed struct member, which are all
> in block/vpc and block/vdi code handling UUIDs. Mostly I fix
> these by copying the unaligned field to/from a local variable.
> In the case of qemu_uuid_bswap() I opted to change the API to
> take and return the QemuUUID rather than taking a pointer to it,
> which makes almost all the callsites simpler. This does mean
> a struct copy but the struct is only 16 bytes and I didn't
> judge any of the callsites performance-sensitive enough to care
> about a struct copy of that size.
>
> As usual, tested with "make check" only.
>
> thanks
> -- PMM
>
>
> Peter Maydell (3):
>   block/vpc: Don't take address of fields in packed structs
>   block/vdi: Don't take address of fields in packed structs
>   uuid: Make qemu_uuid_bswap() take and return a QemuUUID
>
>  include/qemu/uuid.h  |  2 +-
>  block/vdi.c  | 54 +++-
>  block/vpc.c  |  4 +++-
>  hw/acpi/vmgenid.c|  6 ++---
>  tests/vmgenid-test.c |  2 +-
>  util/uuid.c  | 10 
>  6 files changed, 45 insertions(+), 33 deletions(-)



Re: [Qemu-block] [PATCH v5 10/11] block/backup: tiny refactor backup_job_create

2019-01-18 Thread Max Reitz
On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
> Move copy-bitmap find/create code. It's needed for the following
> commit, as we'll need copy_bitmap before actual block job creation. Do
> it in a separate commit to simplify review.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/backup.c | 69 +-
>  1 file changed, 40 insertions(+), 29 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v4 02/21] maint: Allow for EXAMPLES in texi2pod

2019-01-18 Thread Vladimir Sementsov-Ogievskiy
17.01.2019 22:36, Eric Blake wrote:
> The next commit will add an EXAMPLES section to qemu-nbd.8;
> for that to work, we need to recognize EXAMPLES in texi2pod.
> We also need to add a dependency from all man pages against
> the generator script, since a change to the generator may
> cause the resulting man page to differ.
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: Richard W.M. Jones 
> 
> ---
> v3: add generic dependency for all man pages in $(DOCS) instead of
> per-line editing [Vladimir]
> ---
>   Makefile| 2 ++
>   scripts/texi2pod.pl | 2 +-
>   2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/Makefile b/Makefile
> index dccba1dca27..2639437cdaf 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -858,6 +858,8 @@ docs/interop/qemu-qmp-ref.dvi 
> docs/interop/qemu-qmp-ref.html \
>   docs/interop/qemu-qmp-ref.txt docs/interop/qemu-qmp-ref.7: \
>   docs/interop/qemu-qmp-ref.texi docs/interop/qemu-qmp-qapi.texi
> 
> +$(filter %.1 %.7 %.8,$(DOCS)): scripts/texi2pod.pl
> +
>   # Reports/Analysis
> 
>   %/coverage-report.html:
> diff --git a/scripts/texi2pod.pl b/scripts/texi2pod.pl
> index 39ce584a322..839b7917cf7 100755
> --- a/scripts/texi2pod.pl
> +++ b/scripts/texi2pod.pl
> @@ -398,7 +398,7 @@ $sects{NAME} = "$fn \- $tl\n";
>   $sects{FOOTNOTES} .= "=back\n" if exists $sects{FOOTNOTES};
> 
>   for $sect (qw(NAME SYNOPSIS DESCRIPTION OPTIONS ENVIRONMENT FILES
> -   BUGS NOTES FOOTNOTES SEEALSO AUTHOR COPYRIGHT)) {
> +   BUGS NOTES FOOTNOTES EXAMPLES SEEALSO AUTHOR COPYRIGHT)) {
>   if(exists $sects{$sect}) {
>   $head = $sect;
>   $head =~ s/SEEALSO/SEE ALSO/;
> 

I've no idea about how texi2pod.pl works, but Makefile change looks a lot
clearer now, so I have nor more suggestions. Also, I've checked that after
this patch change to scripts/texi2pod.pl really results in remaking selected
targets :)

Reviewed-by: Vladimir Sementsov-Ogievskiy 

-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH v3 9/9] qcow2: do encryption in threads

2019-01-18 Thread Alberto Garcia
On Tue 08 Jan 2019 06:06:55 PM CET, Vladimir Sementsov-Ogievskiy wrote:
> Do encryption/decryption in threads, like it is already done for
> compression. This improves asynchronous encrypted io.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 

Reviewed-by: Alberto Garcia 

Berto



Re: [Qemu-block] [PATCH v4 19/21] qemu-nbd: Add --list option

2019-01-18 Thread Vladimir Sementsov-Ogievskiy
17.01.2019 22:36, Eric Blake wrote:
> We want to be able to detect whether a given qemu NBD server is
> exposing the right export(s) and dirty bitmaps, at least for
> regression testing.  We could use 'nbd-client -l' from the upstream
> NBD project to list exports, but it's annoying to rely on
> out-of-tree binaries; furthermore, nbd-client doesn't necessarily
> know about all of the qemu NBD extensions.  Thus, it is time to add
> a new mode to qemu-nbd that merely sniffs all possible information
> from the server during handshake phase, then disconnects and dumps
> the information.

[..] (hm, first time I do it for commit message)

> Not done here, but maybe worth future experiments: capture
> the meat of NBDExportInfo into a QAPI struct, and use the
> generated QAPI pretty-printers instead of hand-rolling our
> output loop.  It would also permit us to add a JSON output
> mode for machine parsing.
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: Richard W.M. Jones 

Reviewed-by: Vladimir Sementsov-Ogievskiy 


> @@ -816,6 +940,11 @@ int main(int argc, char **argv)
>   }
>   }
> 
> +if (list) {
> +saddr = nbd_build_socket_address(sockpath, bindto, port);
> +return qemu_nbd_client_list(saddr, tlscreds, bindto);

note, that this main() prefers to use exit() instead of return, even for last 
exit(EXIT_SUCCESS).
(but it has already one return, so this is not the first, and for me return 
looks better)


-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH v4 17/21] nbd/client: Add nbd_receive_export_list()

2019-01-18 Thread Vladimir Sementsov-Ogievskiy
17.01.2019 22:36, Eric Blake wrote:
> We want to be able to detect whether a given qemu NBD server is
> exposing the right export(s) and dirty bitmaps, at least for
> regression testing.  We could use 'nbd-client -l' from the upstream
> NBD project to list exports, but it's annoying to rely on
> out-of-tree binaries; furthermore, nbd-client doesn't necessarily
> know about all of the qemu NBD extensions.  Thus, we plan on adding
> a new mode to qemu-nbd that merely sniffs all possible information
> from the server during handshake phase, then disconnects and dumps
> the information.
> 
> This patch adds the low-level client code for grabbing the list
> of exports.  It benefits from the recent refactoring patches, in
> order to share as much code as possible when it comes to doing
> validation of server replies.  The resulting information is stored
> in an array of NBDExportInfo which has been expanded to any
> description string, along with a convenience function for freeing
> the list.
> 
> Note: a malicious server could exhaust memory of a client by feeding
> an unending loop of exports; perhaps we should place a limit on how
> many we are willing to receive. But note that a server could
> reasonably be serving an export for every file in a large directory,
> where an arbitrary limit in the client means we can't list anything
> from such a server; the same happens if we just run until the client
> fails to malloc() and thus dies by an abort(), where the limit is
> no longer arbitrary but determined by available memory.  Since the
> client is already planning on being short-lived, it's hard to call
> this a denial of service attack that would starve off other uses,
> so it does not appear to be a security issue.
> 
> Signed-off-by: Eric Blake
> Reviewed-by: Richard W.M. Jones

Reviewed-by: Vladimir Sementsov-Ogievskiy 

-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH v5 07/11] block: introduce backup-top filter driver

2019-01-18 Thread Max Reitz
On 17.01.19 13:13, Vladimir Sementsov-Ogievskiy wrote:
> 16.01.2019 19:02, Max Reitz wrote:
>> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>>> Backup-top filter does copy-before-write operation. It should be
>>> inserted above active disk and has a target node for CBW, like the
>>> following:
>>>
>>>  +---+
>>>  | Guest |
>>>  +---+---+
>>>  |r,w
>>>  v
>>>  +---+---+  target   +---+
>>>  | backup_top|-->| target(qcow2) |
>>>  +---+---+   CBW +---+---+
>>>  |
>>> backing |r,w
>>>  v
>>>  +---+-+
>>>  | Active disk |
>>>  +-+
>>>
>>> The driver will be used in backup instead of write-notifiers.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>   block/backup-top.h  |  43 +++
>>>   block/backup-top.c  | 306 
>>>   block/Makefile.objs |   2 +
>>>   3 files changed, 351 insertions(+)
>>>   create mode 100644 block/backup-top.h
>>>   create mode 100644 block/backup-top.c
>>
>> Looks good to me overall, comments below:
>>
> 
> [..]
> 
>  >> +static coroutine_fn int backup_top_cbw(BlockDriverState *bs, uint64_t 
> offset,
>  >> +  uint64_t bytes)
> 
> [..]
> 
>>> +
>>> +/*
>>> + * Features to be implemented:
>>> + * F3. parallelize copying loop
>>> + * F4. detect zeros
>>
>> Isn't there detect-zeroes for that?
> 
> Hmm interesting note. I've placed F4 here, as we do have detecting zeros by 
> hand in
> current backup code. Should we drop it from backup, or switch to just enabling
> detect-zeros on target?

Off the top of my head I don't know a reason why we wouldn't just set
detect-zeroes.  Maybe because there may be other writers to target which
should not use that setting?  But that scenario just seems wrong to me.

>>> + * F5. use block_status ?
>>> + * F6. don't copy clusters which are already cached by COR [see F1]
>>> + * F7. if target is ram-cache and it is full, there should be a 
>>> possibility
>>> + * to drop not necessary data (cached by COR [see F1]) to handle 
>>> CBW
>>> + * fast.
>>
>> Another idea: Read all dirty data from bs->backing (in parallel, more or
>> less), and then perform all writes (to bs->backing and s->target) in
>> parallel as well.
>>
>> (So the writes do not have to wait on one another)
> 
> Yes, we can continue guest write after read part of CBW, but we can't do it 
> always,
> as we want to limit RAM usage to store all these in-flight requests.

But this code here already allocates a buffer to cover all of the area.

> And I think,
> in scheme with ram-cache, ram-cache itself should be actually a kind of 
> storage
> for in-flight requests. And in this terminology, if ram-cache is full, we 
> can't
> proceed with guest write. It's the same as we reached limit on in-flight 
> requests
> count.
> 
> [..]
> 
>>> +
>>> +static int coroutine_fn backup_top_co_pdiscard(BlockDriverState *bs,
>>> +  int64_t offset, int 
>>> bytes)
>>
>> (Indentation)
> 
> looks like it was after renaming
> fleecing_hook
> to
> backup_top
> 
> will fix all

Ah :-)

> [..]
> 
>>> +
>>> +static int coroutine_fn backup_top_co_flush(BlockDriverState *bs)
>>> +{
>>> +if (!bs->backing) {
>>> +return 0;
>>> +}
>>> +
>>> +return bdrv_co_flush(bs->backing->bs);
>>
>> Why not the target as well?
> 
> Should we? It may be guest flush, why to flush backup target on it?

Hm...  Well, the current backup code didn't flush the target, and the
mirror filter doesn't either.  OK then.

Max

> [..]
> 
>>> +static void backup_top_child_perm(BlockDriverState *bs, BdrvChild *c,
>>> +   const BdrvChildRole *role,
>>> +   BlockReopenQueue *reopen_queue,
>>> +   uint64_t perm, uint64_t shared,
>>> +   uint64_t *nperm, uint64_t *nshared)
>>
>> (Indentation)
>>
>>> +{
>>> +bdrv_filter_default_perms(bs, c, role, reopen_queue, perm, shared, 
>>> nperm,
>>> +  nshared);
>>> +
>>> +if (role == _file) {
>>> +/*
>>> + * share write to target, to not interfere guest writes to it's 
>>> disk
>>
>> *interfere with guest writes to its disk
>>
>>> + * which will be in target backing chain
>>> + */
>>> +*nshared = *nshared | BLK_PERM_WRITE;
>>> +*nperm = *nperm | BLK_PERM_WRITE;
>>
>> Why do you need to take the write permission?  This filter does not
>> perform any writes unless it receives writes, right?  (So
>> bdrv_filter_default_perms() should take care of this.)
> 
> Your commit shed a bit of light on permission system for me) Ok, I'll check, 
> if
> that work without PERM_WRITE here.
> 
>>
>>> +} else {
>>> +*nperm = *nperm | 

Re: [Qemu-block] [PATCH v4 16/21] nbd/client: Refactor nbd_opt_go() to support NBD_OPT_INFO

2019-01-18 Thread Vladimir Sementsov-Ogievskiy
17.01.2019 22:36, Eric Blake wrote:
> Rename the function to nbd_opt_info_or_go() with an added parameter
> and slight changes to comments and trace messages, in order to
> reuse the function for NBD_OPT_INFO.
> 
> Signed-off-by: Eric Blake 

Reviewed-by: Vladimir Sementsov-Ogievskiy 

-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH v3 8/9] qcow2: bdrv_co_pwritev: move encryption code out of the lock

2019-01-18 Thread Vladimir Sementsov-Ogievskiy
18.01.2019 12:51, Alberto Garcia wrote:
> On Tue 08 Jan 2019 06:06:54 PM CET, Vladimir Sementsov-Ogievskiy wrote:
>> Encryption will be done in threads, to take benefit of it, we should
>> move it out of the lock first.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>   block/qcow2.c | 35 +--
>>   1 file changed, 21 insertions(+), 14 deletions(-)
>>
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index d6ef606d89..76d3715350 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -2077,11 +2077,20 @@ static coroutine_fn int 
>> qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
>>   ret = qcow2_alloc_cluster_offset(bs, offset, _bytes,
>>_offset, );
>>   if (ret < 0) {
>> -goto fail;
>> +goto out_locked;
>>   }
>>   
>>   assert((cluster_offset & 511) == 0);
>>   
>> +ret = qcow2_pre_write_overlap_check(bs, 0,
>> +cluster_offset + 
>> offset_in_cluster,
>> +cur_bytes);
>> +if (ret < 0) {
>> +goto out_locked;
>> +}
>> +
>> +qemu_co_mutex_unlock(>lock);
> 
> The usage of lock() and unlock() functions inside and outside of the
> loop plus the two 'locked' and 'unlocked' exit paths is starting to make
> things a bit more messy.
> 
> Having a look at the code it seems that there's only these three parts
> in the whole function that need to have the lock held:
> 
> one:
> ret = qcow2_alloc_cluster_offset(bs, offset, _bytes,
>  _offset, );
> /* ... */
> ret = qcow2_pre_write_overlap_check(bs, 0,
> cluster_offset +
> offset_in_cluster,
> cur_bytes);
> 
> two:
> 
> ret = qcow2_handle_l2meta(bs, , true);
> 
> 
> three:
> qcow2_handle_l2meta(bs, , false);
> 
> 
> So I wonder if it's perhaps simpler to add lock/unlock calls around
> those blocks?

this means, that we'll unlock after qcow2_handle_l2meta and then immediately
lock on next iteration before qcow2_alloc_cluster_offset, so current code
avoids this extra unlock/lock..


> 
> Berto
> 


-- 
Best regards,
Vladimir


Re: [Qemu-block] [Qemu-devel] block: Update flags in bdrv_set_read_only()

2019-01-18 Thread Kevin Wolf
[ Cc: qemu-block - noticed only now that it was missing ]

Am 14.01.2019 um 12:01 hat Kevin Wolf geschrieben:
> Am 12.01.2019 um 18:08 hat Michael Tokarev geschrieben:
> > commit eeae6a596b0efc092f5101c67683053e245e6250
> > Author: Kevin Wolf 
> > Date:   Tue Oct 9 16:57:12 2018 +0200
> > 
> > block: Update flags in bdrv_set_read_only()
> > 
> > To fully change the read-only state of a node, we must not only 
> > change
> > bs->read_only, but also update bs->open_flags.
> > 
> > sort of broke vfat support:
> > 
> >  $ qemu-system-x86_64 -hda fat:foo/
> >  WARNING: Image format was not specified for 'json:{"fat-type": 0, "dir": 
> > "foo/", "driver": "vvfat", "floppy": false, "rw": false}' and probing 
> > guessed raw.
> >   Automatically detecting the format is dangerous for raw images, 
> > write operations on block 0 will be restricted.
> >   Specify the 'raw' format explicitly to remove the restrictions.
> >  qemu-system-x86_64: Initialization of device ide-hd failed: Block node is 
> > read-only
> >  $ _
> > 
> > The warning is annoying but harmless, but the read-only error is fatal.
> > 
> > "Sort-of" is because there's a somewhat strange workaround:
> > 
> >   -hda fat:rw:foo/
> > 
> > but it is a bit more dangerous as well.
> > 
> > It looks like vfat should be handled differently somewhere, to
> > eliminate both the warning and the error?
> 
> Hm... This is not nice, but obviously that patch is still correct.
> 
> Essentially what you're saying is either:
> 
> 1. We want to be able to attach read-only backends to read-write guest
>devices sometimes. If you actually do a write request then, you'll
>get an I/O error,
> 
>or
> 
> 2. vvfat shouldn't expose a read-only backend, but a read-write one that
>always fails when you write.
> 
> I think 2. is easier to implement, but it's special casing vvfat. Does
> this make sense or is it a problem that needs to be solved more
> generically? If it's okay for a read-only FAT backend to be attached to
> an IDE disk that really needs a read-write backend, why wouldn't it be
> okay to attach e.g. a read-only HTTP backend? Or even a read-only image
> file on the local filesystem?
> 
> On the other hand, usually users wouldn't want to silently get a guest
> started up that produces I/O errors on the first write request when they
> just configured things wrong or have the wrong file permissions.
> 
> We can't do both at the same time, though. So what is the behaviour that
> we actually want regarding read-only backends and read-write guest
> devices?
> 
> Kevin



Re: [Qemu-block] [PATCH v2 0/6] Acquire the AioContext during _realize()

2019-01-18 Thread Kevin Wolf
Am 14.01.2019 um 15:23 hat Alberto Garcia geschrieben:
> This series acquires the AioContext in the _realize() functions of
> several devices before making use of their block backends. This fixes
> at least a couple of crashes (in virtio-blk and scsi). The other
> devices don't currently support iothreads so there's no crashes.

>  tests/qemu-iotests/236 | 121 
> +
>  tests/qemu-iotests/236.out |  46 +

236 is already taken by two other series. :-)

I'll use 239 and 239.out for the conflict resolution, so I think the
next free one for you is 240.

Kevin



Re: [Qemu-block] [PATCH v4] qemu-io: Add generic function for reinitializing optind.

2019-01-18 Thread Daniel P . Berrangé
On Fri, Jan 18, 2019 at 10:11:14AM +, Richard W.M. Jones wrote:
> On FreeBSD 11.2:
> 
>   $ nbdkit memory size=1M --run './qemu-io -f raw -c "aio_write 0 512" $nbd'
>   Parsing error: non-numeric argument, or extraneous/unrecognized suffix -- 
> aio_write
> 
> After main option parsing, we reinitialize optind so we can parse each
> command.  However reinitializing optind to 0 does not work on FreeBSD.
> What happens when you do this is optind remains 0 after the option
> parsing loop, and the result is we try to parse argv[optind] ==
> argv[0] == "aio_write" as if it was the first parameter.
> 
> The FreeBSD manual page says:
> 
>   In order to use getopt() to evaluate multiple sets of arguments, or to
>   evaluate a single set of arguments multiple times, the variable optreset
>   must be set to 1 before the second and each additional set of calls to
>   getopt(), and the variable optind must be reinitialized.
> 
> (From the rest of the man page it is clear that optind must be
> reinitialized to 1).
> 
> The glibc man page says:
> 
>   A program that scans multiple argument vectors,  or  rescans  the  same
>   vector  more than once, and wants to make use of GNU extensions such as
>   '+' and '-' at  the  start  of  optstring,  or  changes  the  value  of
>   POSIXLY_CORRECT  between scans, must reinitialize getopt() by resetting
>   optind to 0, rather than the traditional value of 1.  (Resetting  to  0
>   forces  the  invocation  of  an  internal  initialization  routine that
>   rechecks POSIXLY_CORRECT and checks for GNU extensions in optstring.)
> 
> This commit introduces an OS-portability function called
> qemu_reset_optind which provides a way of resetting optind that works
> on FreeBSD and platforms that use optreset, while keeping it the same
> as now on other platforms.
> 
> Note that the qemu codebase sets optind in many other places, but in
> those other places it's setting a local variable and not using getopt.
> This change is only needed in places where we are using getopt and the
> associated global variable optind.
> 
> Signed-off-by: Richard W.M. Jones 
> ---
>  configure| 14 ++
>  include/qemu/osdep.h | 16 
>  qemu-img.c   |  2 +-
>  qemu-io-cmds.c   |  2 +-
>  4 files changed, 32 insertions(+), 2 deletions(-)

Reviewed-by: Daniel P. Berrangé 



Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-block] [Qemu-devel] [PATCH v2 0/6] Acquire the AioContext during _realize()

2019-01-18 Thread Kevin Wolf
Am 18.01.2019 um 10:56 hat Stefan Hajnoczi geschrieben:
> On Thu, Jan 17, 2019 at 02:24:10PM +0100, Alberto Garcia wrote:
> > On Wed 16 Jan 2019 02:54:44 PM CET, Stefan Hajnoczi wrote:
> > > This patch series makes virtio-blk and virtio-scsi more robust,
> > > although I haven't checked what happens if the drive is attached to a
> > > different IOThread than the device - will the switchover work?
> > 
> > You mean if you use x-blockdev-set-iothread with a drive and then attach
> > it to a device using a different thread? That seems to work (with my
> > patch).
> 
> Does it work with a running VM that will do I/O?  If yes, then that's
> great and we should merge the virtio-blk/scsi patches to make them
> robust for this case.  I'm a little surprised it works though :-)
> because the code wasn't designed or tested for this.

There are two ways to trigger the crash even without
x-blockdev-set-iothread:

* device_del, then device_add for a device with iothread (virtio-scsi;
  may or may not exist with virtio-blk)
  https://bugzilla.redhat.com/show_bug.cgi?id=1656276

* Simply attach two devices with iothread to the the same node
  https://bugzilla.redhat.com/show_bug.cgi?id=1662508

Maybe this series is actually papering over the real problems, though.
Unplug should move the BDS back to the main AioContext, and attaching
two users with different AioContexts must fail. If we don't take care
there, the locking will be wrong.

Note that even though there are more problems, we still need this series
so we can allow attaching two devices to the same node while using the
same iothread.

Kevin


signature.asc
Description: PGP signature


[Qemu-block] [PATCH v4] qemu-io: Add generic function for reinitializing optind.

2019-01-18 Thread Richard W.M. Jones
v3 was posted here:
https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg01536.html

In v4:

 - Detect optreset at ./configure time and use ifdef HAVE_OPTRESET.

 - Adjusted comment style.

 - Checkpatch is now clean.

 - Retested on Linux, FreeBSD 11.2 and OpenBSD 6.4.

Rich.





[Qemu-block] [PATCH v4] qemu-io: Add generic function for reinitializing optind.

2019-01-18 Thread Richard W.M. Jones
On FreeBSD 11.2:

  $ nbdkit memory size=1M --run './qemu-io -f raw -c "aio_write 0 512" $nbd'
  Parsing error: non-numeric argument, or extraneous/unrecognized suffix -- 
aio_write

After main option parsing, we reinitialize optind so we can parse each
command.  However reinitializing optind to 0 does not work on FreeBSD.
What happens when you do this is optind remains 0 after the option
parsing loop, and the result is we try to parse argv[optind] ==
argv[0] == "aio_write" as if it was the first parameter.

The FreeBSD manual page says:

  In order to use getopt() to evaluate multiple sets of arguments, or to
  evaluate a single set of arguments multiple times, the variable optreset
  must be set to 1 before the second and each additional set of calls to
  getopt(), and the variable optind must be reinitialized.

(From the rest of the man page it is clear that optind must be
reinitialized to 1).

The glibc man page says:

  A program that scans multiple argument vectors,  or  rescans  the  same
  vector  more than once, and wants to make use of GNU extensions such as
  '+' and '-' at  the  start  of  optstring,  or  changes  the  value  of
  POSIXLY_CORRECT  between scans, must reinitialize getopt() by resetting
  optind to 0, rather than the traditional value of 1.  (Resetting  to  0
  forces  the  invocation  of  an  internal  initialization  routine that
  rechecks POSIXLY_CORRECT and checks for GNU extensions in optstring.)

This commit introduces an OS-portability function called
qemu_reset_optind which provides a way of resetting optind that works
on FreeBSD and platforms that use optreset, while keeping it the same
as now on other platforms.

Note that the qemu codebase sets optind in many other places, but in
those other places it's setting a local variable and not using getopt.
This change is only needed in places where we are using getopt and the
associated global variable optind.

Signed-off-by: Richard W.M. Jones 
---
 configure| 14 ++
 include/qemu/osdep.h | 16 
 qemu-img.c   |  2 +-
 qemu-io-cmds.c   |  2 +-
 4 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 3eee3fcf70..3d46df1517 100755
--- a/configure
+++ b/configure
@@ -4215,6 +4215,17 @@ if compile_prog "" "" ; then
   signalfd=yes
 fi
 
+# check if optreset global is declared by 
+optreset="no"
+cat > $TMPC << EOF
+#include 
+int main(void) { return optreset; }
+EOF
+
+if compile_prog "" "" ; then
+  optreset=yes
+fi
+
 # check if eventfd is supported
 eventfd=no
 cat > $TMPC << EOF
@@ -6577,6 +6588,9 @@ fi
 if test "$signalfd" = "yes" ; then
   echo "CONFIG_SIGNALFD=y" >> $config_host_mak
 fi
+if test "$optreset" = "yes" ; then
+  echo "HAVE_OPTRESET=y" >> $config_host_mak
+fi
 if test "$tcg" = "yes"; then
   echo "CONFIG_TCG=y" >> $config_host_mak
   if test "$tcg_interpreter" = "yes" ; then
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 80df7253db..840af09cb0 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -109,6 +109,7 @@ extern int daemon(int, int);
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -604,4 +605,19 @@ extern int qemu_icache_linesize_log;
 extern int qemu_dcache_linesize;
 extern int qemu_dcache_linesize_log;
 
+/*
+ * After using getopt or getopt_long, if you need to parse another set
+ * of options, then you must reset optind.  Unfortunately the way to
+ * do this varies between implementations of getopt.
+ */
+static inline void qemu_reset_optind(void)
+{
+#ifdef HAVE_OPTRESET
+optind = 1;
+optreset = 1;
+#else
+optind = 0;
+#endif
+}
+
 #endif
diff --git a/qemu-img.c b/qemu-img.c
index ad04f59565..25288c4d18 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -4962,7 +4962,7 @@ int main(int argc, char **argv)
 return 0;
 }
 argv += optind;
-optind = 0;
+qemu_reset_optind();
 
 if (!trace_init_backends()) {
 exit(1);
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 2c39124036..ee8f56e46a 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -114,7 +114,7 @@ static int command(BlockBackend *blk, const cmdinfo_t *ct, 
int argc,
 }
 }
 
-optind = 0;
+qemu_reset_optind();
 return ct->cfunc(blk, argc, argv);
 }
 
-- 
2.20.1




Re: [Qemu-block] [PATCH v4 01/21] iotests: Make 233 output more reliable

2019-01-18 Thread Daniel P . Berrangé
On Thu, Jan 17, 2019 at 01:36:38PM -0600, Eric Blake wrote:
> We have a race between the nbd server and the client both trying
> to report errors at once which can make the test sometimes fail
> if the output lines swap order under load.  Break the race by
> collecting server messages into a file and then replaying that
> at the end of the test.
> 
> Signed-off-by: Eric Blake 
> CC: Daniel P. Berrangé 
> 
> ---
> An alternative solution might be to silence the message from the
> server by default, and output it only when -v was passed

I wouldn't consider this an either/or situation. It is probably
good practice to qemu-nbd to be completely silent wrt client
problems so a malicious client can't spam the qemu-nbd log (if
any). None the less it is also useful to have the iotests validate
that this log message is printed.

Reviewed-by: Daniel P. Berrangé 


> ---
>  tests/qemu-iotests/233 | 11 ---
>  tests/qemu-iotests/233.out |  4 +++-
>  2 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/tests/qemu-iotests/233 b/tests/qemu-iotests/233
> index 1814efe..ab15c777370 100755
> --- a/tests/qemu-iotests/233
> +++ b/tests/qemu-iotests/233
> @@ -2,7 +2,7 @@
>  #
>  # Test NBD TLS certificate / authorization integration
>  #
> -# Copyright (C) 2018 Red Hat, Inc.
> +# Copyright (C) 2018-2019 Red Hat, Inc.
>  #
>  # This program is free software; you can redistribute it and/or modify
>  # it under the terms of the GNU General Public License as published by
> @@ -30,6 +30,7 @@ _cleanup()
>  {
>  nbd_server_stop
>  _cleanup_test_img
> +rm -f "$TEST_DIR/server.log"
>  tls_x509_cleanup
>  }
>  trap "_cleanup; exit \$status" 0 1 2 3 15
> @@ -66,7 +67,7 @@ $QEMU_IO -c 'w -P 0x11 1m 1m' "$TEST_IMG" | _filter_qemu_io
> 
>  echo
>  echo "== check TLS client to plain server fails =="
> -nbd_server_start_tcp_socket -f $IMGFMT "$TEST_IMG"
> +nbd_server_start_tcp_socket -f $IMGFMT "$TEST_IMG" 2> "$TEST_DIR/server.log"
> 
>  $QEMU_IMG info --image-opts \
>  --object tls-creds-x509,dir=${tls_dir}/client1,endpoint=client,id=tls0 \
> @@ -81,7 +82,7 @@ echo "== check plain client to TLS server fails =="
>  nbd_server_start_tcp_socket \
>  --object 
> tls-creds-x509,dir=${tls_dir}/server1,endpoint=server,id=tls0,verify-peer=yes 
> \
>  --tls-creds tls0 \
> --f $IMGFMT "$TEST_IMG"
> +-f $IMGFMT "$TEST_IMG" 2>> "$TEST_DIR/server.log"
> 
>  $QEMU_IMG info nbd://localhost:$nbd_tcp_port 2>&1 | sed 
> "s/$nbd_tcp_port/PORT/g"
> 
> @@ -109,6 +110,10 @@ $QEMU_IO -c 'r -P 0x11 1m 1m' -c 'w -P 0x22 1m 1m' 
> --image-opts \
> 
>  $QEMU_IO -f $IMGFMT -r -U -c 'r -P 0x22 1m 1m' "$TEST_IMG" | _filter_qemu_io
> 
> +echo
> +echo "== final server log =="
> +cat "$TEST_DIR/server.log"
> +
>  # success, all done
>  echo "*** done"
>  rm -f $seq.full
> diff --git a/tests/qemu-iotests/233.out b/tests/qemu-iotests/233.out
> index 5f416721b03..2199d8a8b32 100644
> --- a/tests/qemu-iotests/233.out
> +++ b/tests/qemu-iotests/233.out
> @@ -27,7 +27,6 @@ virtual size: 64M (67108864 bytes)
>  disk size: unavailable
> 
>  == check TLS with different CA fails ==
> -qemu-nbd: option negotiation failed: Verify failed: No certificate was found.
>  qemu-img: Could not open 
> 'driver=nbd,host=127.0.0.1,port=PORT,tls-creds=tls0': The certificate hasn't 
> got a known issuer
> 
>  == perform I/O over TLS ==
> @@ -37,4 +36,7 @@ wrote 1048576/1048576 bytes at offset 1048576
>  1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>  read 1048576/1048576 bytes at offset 1048576
>  1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +
> +== final server log ==
> +qemu-nbd: option negotiation failed: Verify failed: No certificate was found.
>  *** done
> -- 
> 2.20.1
> 

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-block] [PATCH v4 01/21] iotests: Make 233 output more reliable

2019-01-18 Thread Daniel P . Berrangé
On Fri, Jan 18, 2019 at 08:28:21AM +, Vladimir Sementsov-Ogievskiy wrote:
> 17.01.2019 22:36, Eric Blake wrote:
> > We have a race between the nbd server and the client both trying
> > to report errors at once which can make the test sometimes fail
> > if the output lines swap order under load.  Break the race by
> > collecting server messages into a file and then replaying that
> > at the end of the test.
> > 
> > Signed-off-by: Eric Blake 
> > CC: Daniel P. Berrangé 
> > 
> > ---
> > An alternative solution might be to silence the message from the
> > server by default, and output it only when -v was passed
> > ---
> >   tests/qemu-iotests/233 | 11 ---
> >   tests/qemu-iotests/233.out |  4 +++-
> >   2 files changed, 11 insertions(+), 4 deletions(-)
> > 
> > diff --git a/tests/qemu-iotests/233 b/tests/qemu-iotests/233
> > index 1814efe..ab15c777370 100755
> > --- a/tests/qemu-iotests/233
> > +++ b/tests/qemu-iotests/233
> > @@ -2,7 +2,7 @@
> >   #
> >   # Test NBD TLS certificate / authorization integration
> >   #
> > -# Copyright (C) 2018 Red Hat, Inc.
> > +# Copyright (C) 2018-2019 Red Hat, Inc.
> >   #
> >   # This program is free software; you can redistribute it and/or modify
> >   # it under the terms of the GNU General Public License as published by
> > @@ -30,6 +30,7 @@ _cleanup()
> >   {
> >   nbd_server_stop
> >   _cleanup_test_img
> > +rm -f "$TEST_DIR/server.log"
> >   tls_x509_cleanup
> >   }
> >   trap "_cleanup; exit \$status" 0 1 2 3 15
> > @@ -66,7 +67,7 @@ $QEMU_IO -c 'w -P 0x11 1m 1m' "$TEST_IMG" | 
> > _filter_qemu_io
> > 
> >   echo
> >   echo "== check TLS client to plain server fails =="
> > -nbd_server_start_tcp_socket -f $IMGFMT "$TEST_IMG"
> > +nbd_server_start_tcp_socket -f $IMGFMT "$TEST_IMG" 2> 
> > "$TEST_DIR/server.log"
> > 
> >   $QEMU_IMG info --image-opts \
> >   --object 
> > tls-creds-x509,dir=${tls_dir}/client1,endpoint=client,id=tls0 \
> > @@ -81,7 +82,7 @@ echo "== check plain client to TLS server fails =="
> >   nbd_server_start_tcp_socket \
> >   --object 
> > tls-creds-x509,dir=${tls_dir}/server1,endpoint=server,id=tls0,verify-peer=yes
> >  \
> >   --tls-creds tls0 \
> > --f $IMGFMT "$TEST_IMG"
> > +-f $IMGFMT "$TEST_IMG" 2>> "$TEST_DIR/server.log"
> > 
> >   $QEMU_IMG info nbd://localhost:$nbd_tcp_port 2>&1 | sed 
> > "s/$nbd_tcp_port/PORT/g"
> > 
> > @@ -109,6 +110,10 @@ $QEMU_IO -c 'r -P 0x11 1m 1m' -c 'w -P 0x22 1m 1m' 
> > --image-opts \
> > 
> >   $QEMU_IO -f $IMGFMT -r -U -c 'r -P 0x22 1m 1m' "$TEST_IMG" | 
> > _filter_qemu_io
> > 
> > +echo
> > +echo "== final server log =="
> > +cat "$TEST_DIR/server.log"
> > +
> >   # success, all done
> >   echo "*** done"
> >   rm -f $seq.full
> > diff --git a/tests/qemu-iotests/233.out b/tests/qemu-iotests/233.out
> > index 5f416721b03..2199d8a8b32 100644
> > --- a/tests/qemu-iotests/233.out
> > +++ b/tests/qemu-iotests/233.out
> > @@ -27,7 +27,6 @@ virtual size: 64M (67108864 bytes)
> >   disk size: unavailable
> > 
> >   == check TLS with different CA fails ==
> > -qemu-nbd: option negotiation failed: Verify failed: No certificate was 
> > found.
> >   qemu-img: Could not open 
> > 'driver=nbd,host=127.0.0.1,port=PORT,tls-creds=tls0': The certificate 
> > hasn't got a known issuer
> > 
> >   == perform I/O over TLS ==
> > @@ -37,4 +36,7 @@ wrote 1048576/1048576 bytes at offset 1048576
> >   1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> >   read 1048576/1048576 bytes at offset 1048576
> >   1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> > +
> > +== final server log ==
> > +qemu-nbd: option negotiation failed: Verify failed: No certificate was 
> > found.
> >   *** done
> > 
> 
> The drawback of this way is that we lose captions of test cases, and will not 
> know, from which testcase is error output in "final server log".

That is true, but I don't think it is worth worrying about. It wouldn't
be very hard to identify which test was related to the server log messages,.

> 
> What about something like
> 
> prt() {
>echo "$@" | tee "$SERVER_LOG"
> }
> 
> and then for captions:
> 
> - echo
> - echo "== check TLS works =="
> + prt
> + prt "== check TLS works =="

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-block] [PATCH v4 15/21] nbd/client: Pull out oldstyle size determination

2019-01-18 Thread Vladimir Sementsov-Ogievskiy
17.01.2019 22:36, Eric Blake wrote:
> Another refactoring creating nbd_negotiate_finish_oldstyle()
> for further reuse during 'qemu-nbd --list'.
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: Richard W.M. Jones 

you forget my r-b:
Reviewed-by: Vladimir Sementsov-Ogievskiy 

-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH v4 11/21] nbd/client: Split out nbd_send_meta_query()

2019-01-18 Thread Vladimir Sementsov-Ogievskiy
17.01.2019 22:36, Eric Blake wrote:
> Refactor nbd_negotiate_simple_meta_context() to pull out the
> code that can be reused to send a LIST request for 0 or 1 query.
> No semantic change.  The old comment about 'sizeof(uint32_t)'
> being equivalent to '/* number of queries */' is no longer
> needed, now that we are computing 'sizeof(queries)' instead.
> 
> Signed-off-by: Eric Blake 
> Message-Id: <20181215135324.152629-14-ebl...@redhat.com>
> Reviewed-by: Richard W.M. Jones 

Reviewed-by: Vladimir Sementsov-Ogievskiy 

-- 
Best regards,
Vladimir


Re: [Qemu-block] [Qemu-devel] [PATCH v2 0/6] Acquire the AioContext during _realize()

2019-01-18 Thread Stefan Hajnoczi
On Thu, Jan 17, 2019 at 02:24:10PM +0100, Alberto Garcia wrote:
> On Wed 16 Jan 2019 02:54:44 PM CET, Stefan Hajnoczi wrote:
> > This patch series makes virtio-blk and virtio-scsi more robust,
> > although I haven't checked what happens if the drive is attached to a
> > different IOThread than the device - will the switchover work?
> 
> You mean if you use x-blockdev-set-iothread with a drive and then attach
> it to a device using a different thread? That seems to work (with my
> patch).

Does it work with a running VM that will do I/O?  If yes, then that's
great and we should merge the virtio-blk/scsi patches to make them
robust for this case.  I'm a little surprised it works though :-)
because the code wasn't designed or tested for this.

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH v3 8/9] qcow2: bdrv_co_pwritev: move encryption code out of the lock

2019-01-18 Thread Alberto Garcia
On Tue 08 Jan 2019 06:06:54 PM CET, Vladimir Sementsov-Ogievskiy wrote:
> Encryption will be done in threads, to take benefit of it, we should
> move it out of the lock first.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/qcow2.c | 35 +--
>  1 file changed, 21 insertions(+), 14 deletions(-)
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index d6ef606d89..76d3715350 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2077,11 +2077,20 @@ static coroutine_fn int 
> qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
>  ret = qcow2_alloc_cluster_offset(bs, offset, _bytes,
>   _offset, );
>  if (ret < 0) {
> -goto fail;
> +goto out_locked;
>  }
>  
>  assert((cluster_offset & 511) == 0);
>  
> +ret = qcow2_pre_write_overlap_check(bs, 0,
> +cluster_offset + 
> offset_in_cluster,
> +cur_bytes);
> +if (ret < 0) {
> +goto out_locked;
> +}
> +
> +qemu_co_mutex_unlock(>lock);

The usage of lock() and unlock() functions inside and outside of the
loop plus the two 'locked' and 'unlocked' exit paths is starting to make
things a bit more messy.

Having a look at the code it seems that there's only these three parts
in the whole function that need to have the lock held:

one:
   ret = qcow2_alloc_cluster_offset(bs, offset, _bytes,
_offset, );
   /* ... */
   ret = qcow2_pre_write_overlap_check(bs, 0,
   cluster_offset +
   offset_in_cluster,
   cur_bytes);

two:

   ret = qcow2_handle_l2meta(bs, , true);


three:
   qcow2_handle_l2meta(bs, , false);


So I wonder if it's perhaps simpler to add lock/unlock calls around
those blocks?

Berto



Re: [Qemu-block] [PATCH v4 05/21] nbd/server: Hoist length check to qmp_nbd_server_add

2019-01-18 Thread Vladimir Sementsov-Ogievskiy
17.01.2019 22:36, Eric Blake wrote:
> We only had two callers to nbd_export_new; qemu-nbd.c always
> passed a valid offset/length pair (because it already checked
> the file length, to ensure that offset was in bounds), while
> blockdev-nbd.c always passed 0/-1.  Then nbd_export_new reduces
> the size to a multiple of BDRV_SECTOR_SIZE (can only happen
> when offset is not sector-aligned, since bdrv_getlength()
> currently rounds up), which can result in offset being greater
> than the enforced length, but that's not fatal (the server
> rejects client requests that exceed the advertised length).
> 
> However, I'm finding it easier to work with the code if we are
> consistent on having both callers pass in a valid length, and
> just assert that things are sane in nbd_export_new, meaning
> that no negative values were passed, and that offset+size does
> not exceed 63 bits (as that really is a fundamental limit to
> later operations, whether we use off_t or uint64_t).
> 
> Signed-off-by: Eric Blake 

Reviewed-by: Vladimir Sementsov-Ogievskiy 

-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH v4 01/21] iotests: Make 233 output more reliable

2019-01-18 Thread Vladimir Sementsov-Ogievskiy
17.01.2019 22:36, Eric Blake wrote:
> We have a race between the nbd server and the client both trying
> to report errors at once which can make the test sometimes fail
> if the output lines swap order under load.  Break the race by
> collecting server messages into a file and then replaying that
> at the end of the test.
> 
> Signed-off-by: Eric Blake 
> CC: Daniel P. Berrangé 
> 
> ---
> An alternative solution might be to silence the message from the
> server by default, and output it only when -v was passed
> ---
>   tests/qemu-iotests/233 | 11 ---
>   tests/qemu-iotests/233.out |  4 +++-
>   2 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/tests/qemu-iotests/233 b/tests/qemu-iotests/233
> index 1814efe..ab15c777370 100755
> --- a/tests/qemu-iotests/233
> +++ b/tests/qemu-iotests/233
> @@ -2,7 +2,7 @@
>   #
>   # Test NBD TLS certificate / authorization integration
>   #
> -# Copyright (C) 2018 Red Hat, Inc.
> +# Copyright (C) 2018-2019 Red Hat, Inc.
>   #
>   # This program is free software; you can redistribute it and/or modify
>   # it under the terms of the GNU General Public License as published by
> @@ -30,6 +30,7 @@ _cleanup()
>   {
>   nbd_server_stop
>   _cleanup_test_img
> +rm -f "$TEST_DIR/server.log"
>   tls_x509_cleanup
>   }
>   trap "_cleanup; exit \$status" 0 1 2 3 15
> @@ -66,7 +67,7 @@ $QEMU_IO -c 'w -P 0x11 1m 1m' "$TEST_IMG" | _filter_qemu_io
> 
>   echo
>   echo "== check TLS client to plain server fails =="
> -nbd_server_start_tcp_socket -f $IMGFMT "$TEST_IMG"
> +nbd_server_start_tcp_socket -f $IMGFMT "$TEST_IMG" 2> "$TEST_DIR/server.log"
> 
>   $QEMU_IMG info --image-opts \
>   --object tls-creds-x509,dir=${tls_dir}/client1,endpoint=client,id=tls0 \
> @@ -81,7 +82,7 @@ echo "== check plain client to TLS server fails =="
>   nbd_server_start_tcp_socket \
>   --object 
> tls-creds-x509,dir=${tls_dir}/server1,endpoint=server,id=tls0,verify-peer=yes 
> \
>   --tls-creds tls0 \
> --f $IMGFMT "$TEST_IMG"
> +-f $IMGFMT "$TEST_IMG" 2>> "$TEST_DIR/server.log"
> 
>   $QEMU_IMG info nbd://localhost:$nbd_tcp_port 2>&1 | sed 
> "s/$nbd_tcp_port/PORT/g"
> 
> @@ -109,6 +110,10 @@ $QEMU_IO -c 'r -P 0x11 1m 1m' -c 'w -P 0x22 1m 1m' 
> --image-opts \
> 
>   $QEMU_IO -f $IMGFMT -r -U -c 'r -P 0x22 1m 1m' "$TEST_IMG" | _filter_qemu_io
> 
> +echo
> +echo "== final server log =="
> +cat "$TEST_DIR/server.log"
> +
>   # success, all done
>   echo "*** done"
>   rm -f $seq.full
> diff --git a/tests/qemu-iotests/233.out b/tests/qemu-iotests/233.out
> index 5f416721b03..2199d8a8b32 100644
> --- a/tests/qemu-iotests/233.out
> +++ b/tests/qemu-iotests/233.out
> @@ -27,7 +27,6 @@ virtual size: 64M (67108864 bytes)
>   disk size: unavailable
> 
>   == check TLS with different CA fails ==
> -qemu-nbd: option negotiation failed: Verify failed: No certificate was found.
>   qemu-img: Could not open 
> 'driver=nbd,host=127.0.0.1,port=PORT,tls-creds=tls0': The certificate hasn't 
> got a known issuer
> 
>   == perform I/O over TLS ==
> @@ -37,4 +36,7 @@ wrote 1048576/1048576 bytes at offset 1048576
>   1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>   read 1048576/1048576 bytes at offset 1048576
>   1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +
> +== final server log ==
> +qemu-nbd: option negotiation failed: Verify failed: No certificate was found.
>   *** done
> 

The drawback of this way is that we lose captions of test cases, and will not 
know, from which testcase is error output in "final server log".

What about something like

prt() {
   echo "$@" | tee "$SERVER_LOG"
}

and then for captions:

- echo
- echo "== check TLS works =="
+ prt
+ prt "== check TLS works =="



-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH v4 00/21] nbd: add qemu-nbd --list

2019-01-18 Thread Vladimir Sementsov-Ogievskiy
17.01.2019 22:36, Eric Blake wrote:
> I got tired of debugging whether a server was advertising the
> correct things during negotiation by inspecting the trace
> logs of qemu-io as client - not to mention that without SOME
> sort of client tracing particular commands, we can't easily
> regression test the server for correct behavior.  The final
> straw was at KVM Forum, when Nir asked me to make sure there
> was a way to easily determine if an NBD server is exposing what
> we really want (and fixing x-dirty-bitmap to behave saner fell
> out as a result of answering that question).
> 
> I note that upstream NBD has 'nbd-client -l $host' for querying
> just export names (with no quoting, so you have to know that
> a blank line means the default export), but it wasn't powerful
> enough, so I implemented 'qemu-nbd -L' to document everything.
> Upstream NBD has separate 'nbd-client' and 'nbd-server' binaries,
> while we only have 'qemu-nbd' (which is normally just a server,
> but 'qemu-nbd -c' also operates a second thread as a client).
> Our other uses of qemu as NBD client are for consuming a block
> device (as in qemu-io, qemu-img, or a drive to qemu) - but those
> binaries are less suited to something so specific to the NBD
> protocol.
> 
> Bonus: As a result of my work on this series, nbdkit now supports
> NBD_OPT_INFO (my interoperability testing between server
> implementations has been paying off, both at fixing server bugs,
> and at making this code more reliable across difference in valid
> servers).
> 
> Also available at:
> https://repo.or.cz/qemu/ericb.git qemu-nbd-list-v4
> 
> Currently based on master.
> 
> Since v3:
> - 1 new patch (1/21)
> - split old patch 15/19 into two (16,17/21)
> - retitle two patches (git backport-diff doesn't do too well at showing
> the diff on a retitled patch; 5,11/21)
> - fix review comments from Rich, Vladimir
> 
> 001/21:[down] 'iotests: Make 233 output more reliable'
> 002/21:[] [--] 'maint: Allow for EXAMPLES in texi2pod'
> 003/21:[] [--] 'qemu-nbd: Enhance man page'

Interesting, I don't get it again. Searched in outlook online, I found only v2 
of it,
checked spam folder and filters. Magic.

anyway, I can look at 
https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg04328.html


-- 
Best regards,
Vladimir