date:20190112

Re: [Qemu-devel] qemu-user-linux: how could I measure performance for aarch64 and arm?

2019-01-12 Thread Matwey V. Kornilov

пт, 11 янв. 2019 г. в 22:24, Matwey V. Kornilov :
>
> пт, 11 янв. 2019 г. в 12:52, Peter Maydell :
> >
> > On Thu, 10 Jan 2019 at 19:33, Matwey V. Kornilov
> >  wrote:
> > > I am running the same application compiled for aarch64 and armv7l on
> > > x86_64 platform using qemu-user-linux tools.
> > >
> > > I see dramatic performance difference (30 times) between emulated
> > > architectures: aarch64 runs for ~4 minutes, armv7l runs for ~2 hours.
> > > I do understand that CPU architecture emulation is inherently slow
> > > thing, but my question is about the difference.
> > >
> > > How could I debug to understand what is the reason for such a big
> > > difference? I've already tried to run stress-ng compiled for this two
> > > architectures, but it leads to the same performance per second.
> > >
> > > I am running qemu 2.11, should I try other version?
> >
> > Yes, do try 3.1 -- we have done some overall TCG performance
> > improvements.
>
> Indeed, qemu-arm from master runs for 4 minutes where 2.11 runs for 2
> hours for me. It is impressive improvement.

I've managed to bisected the first good (fast) commit:

commit 2a53535af471f4bee9d6cb5b363746b8d5ed21dd
Author: Luke Shumaker 
Date:   Thu Dec 28 13:08:13 2017 -0500

linux-user: init_guest_space: Try to make ARM space+commpage continuous

Though I am not sure, how does it help.

>
> >
> > For a big difference between target architectures like that,
> > I would try starting by using some host performance tools on
> > the two runs to see where all the time is being taken in
> > the armv7l guest run -- is it all in translated guest code,
> > or is there more time (proportionally) spent in particular
> > parts of the QEMU C code? Does the armv7l version do
> > many more or different syscalls (check with the QEMU -strace
> > option) ?
> >
> > Also you should check performance on h/w 32 bit vs
> > 64-bit Arm if you can, to confirm that it's not just
> > that the guest application runs much slower there.
> > (If you don't have the arm hardware you could at least
> > check x86 32-bit vs 64-bit.)
> >
> > thanks
> > -- PMM
>
>
>
> --
> With best regards,
> Matwey V. Kornilov



-- 
With best regards,
Matwey V. Kornilov

Re: [Qemu-devel] [PATCH 1/2 v2] usb-mtp: Reallocate buffer in multiples of MTP_WRITE_BUF_SZ

2019-01-12 Thread Bandan Das

Eric Blake  writes:

> On 1/11/19 2:20 AM, Bandan Das wrote:
>> This is a "pre-patch" to breaking up the write buffer for
>> MTP writes. Instead of allocating a mtp buffer equal to size
>> sent by the initiator, we start with a small size and reallocate
>> multiples (of that small size) as needed.
>> 
>> Signed-off-by: Bandan Das 
>> ---
>>  hw/usb/dev-mtp.c | 26 --
>>  1 file changed, 12 insertions(+), 14 deletions(-)
>> 
>> diff --git a/hw/usb/dev-mtp.c b/hw/usb/dev-mtp.c
>> index b19b576278..a0d98c93ee 100644
>> --- a/hw/usb/dev-mtp.c
>> +++ b/hw/usb/dev-mtp.c
>> @@ -152,7 +152,6 @@ struct MTPData {
>>  bool first;
>>  /* Used for >4G file sizes */
>>  bool pending;
>> -uint64_t cached_length;
>>  int  fd;
>>  };
>>  
>> @@ -244,6 +243,7 @@ typedef struct {
>>  
>>  #define MTP_MANUFACTURER  "QEMU"
>>  #define MTP_PRODUCT   "QEMU filesharing"
>> +#define MTP_WRITE_BUF_SZ  512000
>
> Why not a power of two?  Perhaps use units.h and spell it (512 * KiB) ?

Sure, I will change it in a later patch.

Bandan

Re: [Qemu-devel] [PATCH 2/4] linuxboot_dma: move common functions in a new header

2019-01-12 Thread Eric Blake

On 1/12/19 12:25 PM, Stefano Garzarella wrote:

>>> Better to just pull in qemu/osdep.h
>>
>> Except that qemu/osdep.h should already have been pulled in by whatever
>> .c file is including this header. We specifically document that .h files
>> shouldn't need to include osdep.h (and in turn, anything that osdep.h
>> already pulls in, like ).
> 
> Since I can't include qemu/osdep.h because this header file is used by
> the option roms (bare metal), do you think is better to include
> stdint.h in optrom.h or in the .c files that use it?
> As Stefan pointed out, I'm including qemu_fw_cfg.h in optrom.h and it
> depends on stdint.h. In this case, what is the best approach?

If this is one of the exception files that is used to build files
outside of qemu, then its should probably be mentioned as an exception
in scripts/clean-includes (okay, I see that pc-bios/ is already
exempted) - and thus using osdep.h is not an option, but using
 is, because it is the exception to the rule, in relation to
files used only for qemu.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 0/5] kvm "virtio pmem" device

2019-01-12 Thread Pankaj Gupta



> >
> >
> >
> > >
> > > On Thu 10-01-19 12:26:17, Dave Chinner wrote:
> > > > On Wed, Jan 09, 2019 at 08:17:31PM +0530, Pankaj Gupta wrote:
> > > > >  This patch series has implementation for "virtio pmem".
> > > > >  "virtio pmem" is fake persistent memory(nvdimm) in guest
> > > > >  which allows to bypass the guest page cache. This also
> > > > >  implements a VIRTIO based asynchronous flush mechanism.
> > > >
> > > > H. Sharing the host page cache direct into the guest VM. Sounds
> > > > like a good idea, but.
> > > >
> > > > This means the guest VM can now run timing attacks to observe host
> > > > side page cache residency, and depending on the implementation I'm
> > > > guessing that the guest will be able to control host side page
> > > > cache eviction, too (e.g. via discard or hole punch operations).
> > > >
> > > > Which means this functionality looks to me like a new vector for
> > > > information leakage into and out of the guest VM via guest
> > > > controlled host page cache manipulation.
> > > >
> > > > https://arxiv.org/pdf/1901.01161
> > > >
> > > > I might be wrong, but if I'm not we're going to have to be very
> > > > careful about how guest VMs can access and manipulate host side
> > > > resources like the page cache.
> > >
> > > Right. Thinking about this I would be more concerned about the fact that
> > > guest can effectively pin amount of host's page cache upto size of the
> > > device/file passed to guest as PMEM, can't it Pankaj? Or is there some
> > > QEMU
> > > magic that avoids this?
> >
> > Yes, guest will pin these host page cache pages using 'get_user_pages' by
> > elevating the page reference count. But these pages can be reclaimed by
> > host
> > at any time when there is memory pressure.
> 
> Wait, how can the guest pin the host pages? I would expect this to
> happen only when using vfio and device assignment. Otherwise, no the
> host can't reclaim a pinned page, that's the whole point of a pin to
> prevent the mm from reclaiming ownership.

yes. You are right I just used the pin word but it does not actually pin pages 
permanently. I had gone through the discussion on existing problems with 
get_user_pages and DMA e.g [1] to understand Jan's POV. It does mention GUP 
pin pages so I also used the word 'pin'. But guest does not permanently pin 
these pages and these pages can be reclaimed by host.

> 
> > KVM does not permanently pin pages. vfio does that but we are not using
> > it here.
> 
> Right, so I'm confused by your pin assertion above.

Sorry! for the confusion. 

[1] https://lwn.net/Articles/753027/

Thanks,
Pankaj

Re: [Qemu-devel] [PATCH v3 0/5] kvm "virtio pmem" device

2019-01-12 Thread Dan Williams

On Sat, Jan 12, 2019 at 5:38 PM Pankaj Gupta  wrote:
>
>
>
> >
> > On Thu 10-01-19 12:26:17, Dave Chinner wrote:
> > > On Wed, Jan 09, 2019 at 08:17:31PM +0530, Pankaj Gupta wrote:
> > > >  This patch series has implementation for "virtio pmem".
> > > >  "virtio pmem" is fake persistent memory(nvdimm) in guest
> > > >  which allows to bypass the guest page cache. This also
> > > >  implements a VIRTIO based asynchronous flush mechanism.
> > >
> > > H. Sharing the host page cache direct into the guest VM. Sounds
> > > like a good idea, but.
> > >
> > > This means the guest VM can now run timing attacks to observe host
> > > side page cache residency, and depending on the implementation I'm
> > > guessing that the guest will be able to control host side page
> > > cache eviction, too (e.g. via discard or hole punch operations).
> > >
> > > Which means this functionality looks to me like a new vector for
> > > information leakage into and out of the guest VM via guest
> > > controlled host page cache manipulation.
> > >
> > > https://arxiv.org/pdf/1901.01161
> > >
> > > I might be wrong, but if I'm not we're going to have to be very
> > > careful about how guest VMs can access and manipulate host side
> > > resources like the page cache.
> >
> > Right. Thinking about this I would be more concerned about the fact that
> > guest can effectively pin amount of host's page cache upto size of the
> > device/file passed to guest as PMEM, can't it Pankaj? Or is there some QEMU
> > magic that avoids this?
>
> Yes, guest will pin these host page cache pages using 'get_user_pages' by
> elevating the page reference count. But these pages can be reclaimed by host
> at any time when there is memory pressure.

Wait, how can the guest pin the host pages? I would expect this to
happen only when using vfio and device assignment. Otherwise, no the
host can't reclaim a pinned page, that's the whole point of a pin to
prevent the mm from reclaiming ownership.

> KVM does not permanently pin pages. vfio does that but we are not using
> it here.

Right, so I'm confused by your pin assertion above.

Re: [Qemu-devel] [PATCH v3 0/5] kvm "virtio pmem" device

2019-01-12 Thread Pankaj Gupta




> 
> On Thu 10-01-19 12:26:17, Dave Chinner wrote:
> > On Wed, Jan 09, 2019 at 08:17:31PM +0530, Pankaj Gupta wrote:
> > >  This patch series has implementation for "virtio pmem".
> > >  "virtio pmem" is fake persistent memory(nvdimm) in guest
> > >  which allows to bypass the guest page cache. This also
> > >  implements a VIRTIO based asynchronous flush mechanism.
> > 
> > H. Sharing the host page cache direct into the guest VM. Sounds
> > like a good idea, but.
> > 
> > This means the guest VM can now run timing attacks to observe host
> > side page cache residency, and depending on the implementation I'm
> > guessing that the guest will be able to control host side page
> > cache eviction, too (e.g. via discard or hole punch operations).
> > 
> > Which means this functionality looks to me like a new vector for
> > information leakage into and out of the guest VM via guest
> > controlled host page cache manipulation.
> > 
> > https://arxiv.org/pdf/1901.01161
> > 
> > I might be wrong, but if I'm not we're going to have to be very
> > careful about how guest VMs can access and manipulate host side
> > resources like the page cache.
> 
> Right. Thinking about this I would be more concerned about the fact that
> guest can effectively pin amount of host's page cache upto size of the
> device/file passed to guest as PMEM, can't it Pankaj? Or is there some QEMU
> magic that avoids this?

Yes, guest will pin these host page cache pages using 'get_user_pages' by
elevating the page reference count. But these pages can be reclaimed by host
at any time when there is memory pressure.

KVM does not permanently pin pages. vfio does that but we are not using
it here.

Could you please elaborate what you are thinking?

Thanks,
Pankaj

[Qemu-devel] [Bug 1811543] [NEW] virtio-scsi gives improper discard sysfs entries

2019-01-12 Thread James Harvey

Public bug reported:

Apologies if this is just an inherent part of paravirtualization that
should be expected.

In my host, I have an LVM thin pool with chunk_size 128MB.  Within it, I
have a thin volume "tmp".  In the host:

# fdisk -l /dev/lvm/tmp
Disk /dev/lvm/tmp: 256 MiB, 268435456 bytes, 524288 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 262144 bytes / 134217728 bytes
Disklabel type: gpt
Disk identifier: BAE3154E-6E85-F642-8129-BAD7B58B2775

DeviceStartEnd Sectors  Size Type
/dev/lvm/tmp1  2048 524254  522207  255M Linux filesystem

$ lsblk
...
  └─lvm-tmp  254:13   0   256M  0 lvm
└─lvm-tmp1   254:14   0   255M  0 part

$ cat /sys/dev/block/254:13/discard_alignment
0
$ cat /sys/dev/block/254:13/queue/discard_granularity
134217728
$ cat /sys/dev/block/254:13/queue/discard_max_bytes
17179869184
$ cat /sys/dev/block/254:13/queue/discard_max_hw_bytes
0
$ cat /sys/dev/block/254:13/queue/discard_zeroes_data
0

$ cat /sys/dev/block/254:14/discard_alignment
133169152
$ cat /sys/dev/block/254:14/queue/discard_granularity
134217728
$ cat /sys/dev/block/254:14/queue/discard_max_bytes
17179869184
$ cat /sys/dev/block/254:14/queue/discard_max_hw_bytes
0
$ cat /sys/dev/block/254:14/queue/discard_zeroes_data
0

If this is given to QEMU using virtio-scsi:

   -device virtio-scsi-pci,id=scsi1 \
   -drive 
driver=raw,node-name=hdb,file=/dev/lvm/tmp,if=none,discard=unmap,id=hd2 \
   -device scsi-hd,drive=hd2,bootindex=1 \

Then incorrect values are given:

$ lsblk
...
sdb 8:16   0   256M  0 disk
└─sdb1  8:17   0   255M  0 part /mnt

$ cat /sys/dev/block/8:16/discard_alignment
0
$ cat /sys/dev/block/8:16/queue/discard_granularity
4096
$ cat /sys/dev/block/8:16/queue/discard_max_bytes
1073741824
$ cat /sys/dev/block/8:16/queue/discard_max_hw_bytes
1073741824
$ cat /sys/dev/block/8:16/queue/discard_zeroes_data
0

$ cat /sys/dev/block/8:17/discard_alignment
133169152

And, there isn't even a /sys/dev/block/8:17/queue direcotry.

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1811543

Title:
  virtio-scsi gives improper discard sysfs entries

Status in QEMU:
  New

Bug description:
  Apologies if this is just an inherent part of paravirtualization that
  should be expected.

  In my host, I have an LVM thin pool with chunk_size 128MB.  Within it,
  I have a thin volume "tmp".  In the host:

  # fdisk -l /dev/lvm/tmp
  Disk /dev/lvm/tmp: 256 MiB, 268435456 bytes, 524288 sectors
  Units: sectors of 1 * 512 = 512 bytes
  Sector size (logical/physical): 512 bytes / 4096 bytes
  I/O size (minimum/optimal): 262144 bytes / 134217728 bytes
  Disklabel type: gpt
  Disk identifier: BAE3154E-6E85-F642-8129-BAD7B58B2775

  DeviceStartEnd Sectors  Size Type
  /dev/lvm/tmp1  2048 524254  522207  255M Linux filesystem

  $ lsblk
  ...
└─lvm-tmp  254:13   0   256M  0 lvm
  └─lvm-tmp1   254:14   0   255M  0 part

  $ cat /sys/dev/block/254:13/discard_alignment
  0
  $ cat /sys/dev/block/254:13/queue/discard_granularity
  134217728
  $ cat /sys/dev/block/254:13/queue/discard_max_bytes
  17179869184
  $ cat /sys/dev/block/254:13/queue/discard_max_hw_bytes
  0
  $ cat /sys/dev/block/254:13/queue/discard_zeroes_data
  0

  $ cat /sys/dev/block/254:14/discard_alignment
  133169152
  $ cat /sys/dev/block/254:14/queue/discard_granularity
  134217728
  $ cat /sys/dev/block/254:14/queue/discard_max_bytes
  17179869184
  $ cat /sys/dev/block/254:14/queue/discard_max_hw_bytes
  0
  $ cat /sys/dev/block/254:14/queue/discard_zeroes_data
  0

  If this is given to QEMU using virtio-scsi:

 -device virtio-scsi-pci,id=scsi1 \
 -drive 
driver=raw,node-name=hdb,file=/dev/lvm/tmp,if=none,discard=unmap,id=hd2 \
 -device scsi-hd,drive=hd2,bootindex=1 \

  Then incorrect values are given:

  $ lsblk
  ...
  sdb 8:16   0   256M  0 disk
  └─sdb1  8:17   0   255M  0 part /mnt

  $ cat /sys/dev/block/8:16/discard_alignment
  0
  $ cat /sys/dev/block/8:16/queue/discard_granularity
  4096
  $ cat /sys/dev/block/8:16/queue/discard_max_bytes
  1073741824
  $ cat /sys/dev/block/8:16/queue/discard_max_hw_bytes
  1073741824
  $ cat /sys/dev/block/8:16/queue/discard_zeroes_data
  0

  $ cat /sys/dev/block/8:17/discard_alignment
  133169152

  And, there isn't even a /sys/dev/block/8:17/queue direcotry.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1811543/+subscriptions

[Qemu-devel] Various GTK window display problems - sorting bugs from features

2019-01-12 Thread halfdog

Hello List,

After filing a Debian bug, the package maintainer gave a hint,
that this might be more relevant here:


There seems to exist a (minor) usability issue with GTK frontend
and detaching a window, that seems to lead to full loss of keyboard
input to the guest, in the end.



With the first issue (usability), it is not clear if this is intended
behaviour. Maybe someone can give a short opinion to finish this
one as "little unexpected feature". See also [0].


When running a Qemu machine with "-display gtk" then selecting

Menu-bar -> View -> Detach tab

and 

Menu-bar -> View -> Fullscreen

will then display the menu bar into fullscreen mode, not the virtual
machine window.



As it is impossible to bring the detached window to fullscreen
mode, it has to be reattached to the main windows again, This
can be done by closing the windows. Everything looks nice afterwards
but keybord events are not forwarded to the guest any more
(mouse is still working).

There is no difference in behaviour when GTK window is scaling
the guest screen, guest display is in full screen mode or not,
input is grabbed.

Using the qemu monitor (with "-monitor stdio") still transports
the keystrokes, so the guest seems to be working still.

(qemu) sendkey x 
(qemu) sendkey ret 

Can anyone reproduce this behaviour? See also [1].



Background: the initial reason for exploring all window/fullscreen
variants was, that the mouse input stopped working with the
sdl -> gtk switch. The Debian package maintainer seems very
responsive and commited, so we are still sorting this one out
in [2].


Thanks for any feedback, opinions,
hd


[0] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=919056
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=919116
[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=919057

[Qemu-devel] [Bug 1811533] [NEW] Unstable Win10 guest with qemu 3.1 + huge pages + hv_stimer

2019-01-12 Thread Žilvinas Žaltiena

Public bug reported:

Host:
Gentoo linux x86_64, kernel 4.20.1
Qemu 3.1.0 
CPU: Intel i7 6850K
Chipset: X99

Guest:
Windows 10 Pro 64bit (1809)
Machine type: pc-q35_3.1
Hyper-V enlightenments: 
hv_stimer,hv_reenlightenment,hv_frequencies,hv_vapic,hv_reset,hv_synic,hv_runtime,hv_vpindex,hv_time,hv_relaxed,hv_spinlocks=0x1fff
Memory: 16GB backed by 2MB huge pages

Issue:
Once guest is started, log gets flooded with:

qemu-system-x86_64: vhost_region_add_section: Overlapping but not
coherent sections at 103000

or

qemu-system-x86_64: vhost_region_add_section:Section rounded to 0 prior
to previous 1f000

(line endings change)

and as time goes guest loses network access (virtio-net-pci) and general
performance diminishes to extent of freezing applications.

Observations:
1) problem disappears when hv_stimer is removed
2) problem disappears when memory backing with huge pages is disabled
3) problem disappears when machine type is downgraded to pc-q35_3.0

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1811533

Title:
  Unstable Win10 guest with qemu 3.1 + huge pages + hv_stimer

Status in QEMU:
  New

Bug description:
  Host:
  Gentoo linux x86_64, kernel 4.20.1
  Qemu 3.1.0 
  CPU: Intel i7 6850K
  Chipset: X99

  Guest:
  Windows 10 Pro 64bit (1809)
  Machine type: pc-q35_3.1
  Hyper-V enlightenments: 
hv_stimer,hv_reenlightenment,hv_frequencies,hv_vapic,hv_reset,hv_synic,hv_runtime,hv_vpindex,hv_time,hv_relaxed,hv_spinlocks=0x1fff
  Memory: 16GB backed by 2MB huge pages

  Issue:
  Once guest is started, log gets flooded with:

  qemu-system-x86_64: vhost_region_add_section: Overlapping but not
  coherent sections at 103000

  or

  qemu-system-x86_64: vhost_region_add_section:Section rounded to 0
  prior to previous 1f000

  (line endings change)

  and as time goes guest loses network access (virtio-net-pci) and
  general performance diminishes to extent of freezing applications.

  Observations:
  1) problem disappears when hv_stimer is removed
  2) problem disappears when memory backing with huge pages is disabled
  3) problem disappears when machine type is downgraded to pc-q35_3.0

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1811533/+subscriptions

Re: [Qemu-devel] [PATCH 2/4] linuxboot_dma: move common functions in a new header

2019-01-12 Thread Stefano Garzarella

On Fri, Jan 11, 2019 at 6:55 PM Eric Blake  wrote:
>
> On 1/11/19 11:48 AM, Michael S. Tsirkin wrote:
>
> >>>
>  diff --git a/pc-bios/optionrom/optrom.h b/pc-bios/optionrom/optrom.h
>  new file mode 100644
>  index 00..36f43b43fd
>  --- /dev/null
>  +++ b/pc-bios/optionrom/optrom.h
>
>  +#include "../../include/standard-headers/linux/qemu_fw_cfg.h"
> >>>
> >>> This depends on , please include it first.
> >>
> >> Sure.
> >>
> >>
> >> Thanks,
> >> Stefano
> >
> > Better to just pull in qemu/osdep.h
>
> Except that qemu/osdep.h should already have been pulled in by whatever
> .c file is including this header. We specifically document that .h files
> shouldn't need to include osdep.h (and in turn, anything that osdep.h
> already pulls in, like ).

Since I can't include qemu/osdep.h because this header file is used by
the option roms (bare metal), do you think is better to include
stdint.h in optrom.h or in the .c files that use it?
As Stefan pointed out, I'm including qemu_fw_cfg.h in optrom.h and it
depends on stdint.h. In this case, what is the best approach?

Thanks,
Stefano

Re: [Qemu-devel] [PATCH 2/4] linuxboot_dma: move common functions in a new header

2019-01-12 Thread Stefano Garzarella

On Fri, Jan 11, 2019 at 6:48 PM Michael S. Tsirkin  wrote:
> > > > +#include "../../include/standard-headers/linux/qemu_fw_cfg.h"
> > >
> > > This depends on , please include it first.
>
> Better to just pull in qemu/osdep.h

This header is used in the option roms that are bare-metal, so IMHO I
can't include qemu/osdep.h here or in the option roms.

Thanks,
Stefano

[Qemu-devel] [PATCH v3 00/19] nbd: add qemu-nbd --list

2019-01-12 Thread Eric Blake

I got tired of debugging whether a server was advertising the
correct things during negotiation by inspecting the trace
logs of qemu-io as client - not to mention that without SOME
sort of client tracing particular commands, we can't easily
regression test the server for correct behavior.  The final
straw was at KVM Forum, when Nir asked me to make sure there
was a way to easily determine if an NBD server is exposing what
we really want (and fixing x-dirty-bitmap to behave saner fell
out as a result of answering that question).

I note that upstream NBD has 'nbd-client -l $host' for querying
just export names (with no quoting, so you have to know that
a blank line means the default export), but it wasn't powerful
enough, so I implemented 'qemu-nbd -L' to document everything.
Upstream NBD has separate 'nbd-client' and 'nbd-server' binaries,
while we only have 'qemu-nbd' (which is normally just a server,
but 'qemu-nbd -c' also operates a second thread as a client).
Our other uses of qemu as NBD client are for consuming a block
device (as in qemu-io, qemu-img, or a drive to qemu) - but those
binaries are less suited to something so specific to the NBD
protocol.

Bonus: As a result of my work on this series, nbdkit now supports
NBD_OPT_INFO (my interoperability testing between server
implementations has been paying off, both at fixing server bugs,
and at making this code more reliable across difference in valid
servers).

Also available at:
https://repo.or.cz/qemu/ericb.git qemu-nbd-list-v2

Based-on: <20181221093529.23855-1-js...@redhat.com>
[jsnow: 0/11 bitmaps: remove x- prefix from QMP api]
Based-on: <2019063519.11457-1-phi...@redhat.com>
[philmd: qemu-nbd: Rename 'exp' variable clashing with math::exp() symbol]
Based-on: <2019094720.15671-1-ebl...@redhat.com>
[eblake: 0/8 Promote x-nbd-server-add-bitmap to stable]

Since v2:
- Several patches merged already
- 3 new patches based on audit of off_t vs. strtol
- rebase patches on top of other changes, such as qemu-nbd --bitmap
- address various review comments [Vladimir, Rich]
- drop patch 12/22

001/19:[0020] [FC] 'maint: Allow for EXAMPLES in texi2pod'
002/19:[0030] [FC] 'qemu-nbd: Enhance man page'
003/19:[down] 'qemu-nbd: Sanity check partition bounds'
004/19:[down] 'nbd/server: Hoist length check to qemp_nbd_server_add'
005/19:[down] 'nbd/server: Favor [u]int64_t over off_t'
006/19:[0007] [FC] 'qemu-nbd: Avoid strtol open-coding'
007/19:[0016] [FC] 'nbd/client: Refactor nbd_receive_list()'
008/19:[] [--] 'nbd/client: Move export name into NBDExportInfo'
009/19:[] [--] 'nbd/client: Change signature of 
nbd_negotiate_simple_meta_context()'
010/19:[0007] [FC] 'nbd/client: Split out nbd_send_one_meta_context()'
011/19:[0048] [FC] 'nbd/client: Split out nbd_receive_one_meta_context()'
012/19:[] [--] 'nbd/client: Refactor return of nbd_receive_negotiate()'
013/19:[] [-C] 'nbd/client: Split handshake into two functions'
014/19:[] [--] 'nbd/client: Pull out oldstyle size determination'
015/19:[0008] [FC] 'nbd/client: Add nbd_receive_export_list()'
016/19:[] [-C] 'nbd/client: Add meta contexts to nbd_receive_export_list()'
017/19:[0015] [FC] 'qemu-nbd: Add --list option'
018/19:[] [--] 'nbd/client: Work around 3.0 bug for listing meta contexts'
019/19:[0002] [FC] 'iotests: Enhance 223, 233 to cover 'qemu-nbd --list''

Eric Blake (19):
  maint: Allow for EXAMPLES in texi2pod
  qemu-nbd: Enhance man page
  qemu-nbd: Sanity check partition bounds
  nbd/server: Hoist length check to qemp_nbd_server_add
  nbd/server: Favor [u]int64_t over off_t
  qemu-nbd: Avoid strtol open-coding
  nbd/client: Refactor nbd_receive_list()
  nbd/client: Move export name into NBDExportInfo
  nbd/client: Change signature of nbd_negotiate_simple_meta_context()
  nbd/client: Split out nbd_send_one_meta_context()
  nbd/client: Split out nbd_receive_one_meta_context()
  nbd/client: Refactor return of nbd_receive_negotiate()
  nbd/client: Split handshake into two functions
  nbd/client: Pull out oldstyle size determination
  nbd/client: Add nbd_receive_export_list()
  nbd/client: Add meta contexts to nbd_receive_export_list()
  qemu-nbd: Add --list option
  nbd/client: Work around 3.0 bug for listing meta contexts
  iotests: Enhance 223, 233 to cover 'qemu-nbd --list'

 qemu-nbd.texi  | 114 --
 Makefile   |   2 +
 include/block/nbd.h|  31 +-
 block/nbd-client.c |   9 +-
 blockdev-nbd.c |  10 +-
 nbd/client.c   | 756 ++---
 nbd/server.c   |  23 +-
 qemu-nbd.c | 221 ---
 nbd/trace-events   |  11 +-
 scripts/texi2pod.pl|   2 +-
 tests/qemu-iotests/223 |   2 +
 tests/qemu-iotests/223.out |  20 +
 tests/qemu-iotests/233 |  19 +-
 tests/qemu-iotests/233.out |  15 +
 14 files changed, 910 insertions(+), 325 deletions(-)

-- 
2.20.1

[Qemu-devel] [PATCH v3 01/19] maint: Allow for EXAMPLES in texi2pod

2019-01-12 Thread Eric Blake

The next commit will add an EXAMPLES section to qemu-nbd.8;
for that to work, we need to recognize EXAMPLES in texi2pod.
We also need to add a dependency from all man pages against
the generator script, since a change to the generator may
cause the resulting man page to differ.

Signed-off-by: Eric Blake 
---
v3: add generic dependency for all man pages in $(DOCS) instead of
per-line editing [Vladimir]
---
 Makefile| 2 ++
 scripts/texi2pod.pl | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index a9ac16d94e8..e2d3ace190a 100644
--- a/Makefile
+++ b/Makefile
@@ -857,6 +857,8 @@ docs/interop/qemu-qmp-ref.dvi 
docs/interop/qemu-qmp-ref.html \
 docs/interop/qemu-qmp-ref.txt docs/interop/qemu-qmp-ref.7: \
docs/interop/qemu-qmp-ref.texi docs/interop/qemu-qmp-qapi.texi

+$(filter %.1 %.7 %.8,$(DOCS)): scripts/texi2pod.pl
+
 # Reports/Analysis

 %/coverage-report.html:
diff --git a/scripts/texi2pod.pl b/scripts/texi2pod.pl
index 39ce584a322..839b7917cf7 100755
--- a/scripts/texi2pod.pl
+++ b/scripts/texi2pod.pl
@@ -398,7 +398,7 @@ $sects{NAME} = "$fn \- $tl\n";
 $sects{FOOTNOTES} .= "=back\n" if exists $sects{FOOTNOTES};

 for $sect (qw(NAME SYNOPSIS DESCRIPTION OPTIONS ENVIRONMENT FILES
- BUGS NOTES FOOTNOTES SEEALSO AUTHOR COPYRIGHT)) {
+ BUGS NOTES FOOTNOTES EXAMPLES SEEALSO AUTHOR COPYRIGHT)) {
 if(exists $sects{$sect}) {
$head = $sect;
$head =~ s/SEEALSO/SEE ALSO/;
-- 
2.20.1

[Qemu-devel] [PATCH v3 11/19] nbd/client: Split out nbd_receive_one_meta_context()

2019-01-12 Thread Eric Blake

Extract portions of nbd_negotiate_simple_meta_context() to
a new function nbd_receive_one_meta_context() that copies the
pattern of nbd_receive_list() for performing the argument
validation of one reply.  The error message when the server
replies with more than one context changes slightly, but
that shouldn't happen in the common case.

Signed-off-by: Eric Blake 
Message-Id: <20181215135324.152629-15-ebl...@redhat.com>

---
v3: rebase, without changing into a loop
---
 nbd/client.c | 148 +--
 nbd/trace-events |   2 +-
 2 files changed, 92 insertions(+), 58 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index 3c716be2719..22505199d3b 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -672,7 +672,86 @@ static int nbd_send_one_meta_context(QIOChannel *ioc,
 return ret;
 }

-/* nbd_negotiate_simple_meta_context:
+/*
+ * nbd_receive_one_meta_context:
+ * Called in a loop to receive and trace one set/list meta context reply.
+ * Pass non-NULL @name or @id to collect results back to the caller, which
+ * must eventually call g_free().
+ * return 1 if name is set and iteration must continue,
+ *0 if iteration is complete (including if option is unsupported),
+ *-1 with errp set for any error
+ */
+static int nbd_receive_one_meta_context(QIOChannel *ioc,
+uint32_t opt,
+char **name,
+uint32_t *id,
+Error **errp)
+{
+int ret;
+NBDOptionReply reply;
+char *local_name = NULL;
+uint32_t local_id;
+
+if (nbd_receive_option_reply(ioc, opt, , errp) < 0) {
+return -1;
+}
+
+ret = nbd_handle_reply_err(ioc, , errp);
+if (ret <= 0) {
+return ret;
+}
+
+if (reply.type == NBD_REP_ACK) {
+if (reply.length != 0) {
+error_setg(errp, "Unexpected length to ACK response");
+nbd_send_opt_abort(ioc);
+return -1;
+}
+return 0;
+} else if (reply.type != NBD_REP_META_CONTEXT) {
+error_setg(errp, "Unexpected reply type %u (%s), expected %u (%s)",
+   reply.type, nbd_rep_lookup(reply.type),
+   NBD_REP_META_CONTEXT, nbd_rep_lookup(NBD_REP_META_CONTEXT));
+nbd_send_opt_abort(ioc);
+return -1;
+}
+
+if (reply.length <= sizeof(local_id) ||
+reply.length > NBD_MAX_BUFFER_SIZE) {
+error_setg(errp, "Failed to negotiate meta context, server "
+   "answered with unexpected length %" PRIu32,
+   reply.length);
+nbd_send_opt_abort(ioc);
+return -1;
+}
+
+if (nbd_read(ioc, _id, sizeof(local_id), errp) < 0) {
+return -1;
+}
+local_id = be32_to_cpu(local_id);
+
+reply.length -= sizeof(local_id);
+local_name = g_malloc(reply.length + 1);
+if (nbd_read(ioc, local_name, reply.length, errp) < 0) {
+g_free(local_name);
+return -1;
+}
+local_name[reply.length] = '\0';
+trace_nbd_opt_meta_reply(nbd_opt_lookup(opt), local_name, local_id);
+
+if (name) {
+*name = local_name;
+} else {
+g_free(local_name);
+}
+if (id) {
+*id = local_id;
+}
+return 1;
+}
+
+/*
+ * nbd_negotiate_simple_meta_context:
  * Request the server to set the meta context for export @info->name
  * using @info->x_dirty_bitmap with a fallback to "base:allocation",
  * setting @info->context_id to the resulting id. Fail if the server
@@ -693,50 +772,21 @@ static int nbd_negotiate_simple_meta_context(QIOChannel 
*ioc,
  * function should lose the term _simple.
  */
 int ret;
-NBDOptionReply reply;
 const char *context = info->x_dirty_bitmap ?: "base:allocation";
 bool received = false;
+char *name = NULL;

 if (nbd_send_one_meta_context(ioc, NBD_OPT_SET_META_CONTEXT,
   info->name, context, errp) < 0) {
 return -1;
 }

-if (nbd_receive_option_reply(ioc, NBD_OPT_SET_META_CONTEXT, ,
- errp) < 0)
-{
+ret = nbd_receive_one_meta_context(ioc, NBD_OPT_SET_META_CONTEXT,
+   , >context_id, errp);
+if (ret < 0) {
 return -1;
 }
-
-ret = nbd_handle_reply_err(ioc, , errp);
-if (ret <= 0) {
-return ret;
-}
-
-if (reply.type == NBD_REP_META_CONTEXT) {
-char *name;
-
-if (reply.length != sizeof(info->context_id) + strlen(context)) {
-error_setg(errp, "Failed to negotiate meta context '%s', server "
-   "answered with unexpected length %" PRIu32, context,
-   reply.length);
-nbd_send_opt_abort(ioc);
-return -1;
-}
-
-if (nbd_read(ioc, >context_id, sizeof(info->context_id),
- errp) < 0) {
-return

[Qemu-devel] [PATCH v3 04/19] nbd/server: Hoist length check to qemp_nbd_server_add

2019-01-12 Thread Eric Blake

We only had two callers to nbd_export_new; qemu-nbd.c always
passed a valid offset/length pair (because it already checked
the file length, to ensure that offset was in bounds), while
blockdev-nbd always passed 0/-1.  Then nbd_export_new reduces
the size to a multiple of BDRV_SECTOR_SIZE (can only happen
when offset is not sector-aligned, since bdrv_getlength()
currently rounds up), which can result in offset being greater
than the enforced length, but that's not fatal (the server
rejects client requests that exceed the advertised length).

However, I'm finding it easier to work with the code if we are
consistent on having both callers pass in a valid length, and
just assert that things are sane in nbd_export_new.

Signed-off-by: Eric Blake 

---
v3: new patch
---
 blockdev-nbd.c | 10 +-
 nbd/server.c   |  9 ++---
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index c76d5416b90..d73ac1b026a 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -146,6 +146,7 @@ void qmp_nbd_server_add(const char *device, bool has_name, 
const char *name,
 BlockDriverState *bs = NULL;
 BlockBackend *on_eject_blk;
 NBDExport *exp;
+int64_t len;

 if (!nbd_server) {
 error_setg(errp, "NBD server not running");
@@ -168,6 +169,13 @@ void qmp_nbd_server_add(const char *device, bool has_name, 
const char *name,
 return;
 }

+len = bdrv_getlength(bs);
+if (len < 0) {
+error_setg_errno(errp, -len,
+ "Failed to determine the NBD export's length");
+return;
+}
+
 if (!has_writable) {
 writable = false;
 }
@@ -175,7 +183,7 @@ void qmp_nbd_server_add(const char *device, bool has_name, 
const char *name,
 writable = false;
 }

-exp = nbd_export_new(bs, 0, -1, name, NULL, bitmap,
+exp = nbd_export_new(bs, 0, len, name, NULL, bitmap,
  writable ? 0 : NBD_FLAG_READ_ONLY,
  NULL, false, on_eject_blk, errp);
 if (!exp) {
diff --git a/nbd/server.c b/nbd/server.c
index e8c56607eff..c9937ccdc2a 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1499,13 +1499,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t 
dev_offset, off_t size,
 exp->name = g_strdup(name);
 exp->description = g_strdup(description);
 exp->nbdflags = nbdflags;
-exp->size = size < 0 ? blk_getlength(blk) : size;
-if (exp->size < 0) {
-error_setg_errno(errp, -exp->size,
- "Failed to determine the NBD export's length");
-goto fail;
-}
-exp->size -= exp->size % BDRV_SECTOR_SIZE;
+assert(dev_offset <= size);
+exp->size = QEMU_ALIGN_DOWN(size, BDRV_SECTOR_SIZE);

 if (bitmap) {
 BdrvDirtyBitmap *bm = NULL;
-- 
2.20.1

[Qemu-devel] [PATCH v3 03/19] qemu-nbd: Sanity check partition bounds

2019-01-12 Thread Eric Blake

When the user requests a partition, we were using data read
from the disk as disk offsets without a bounds check. We got
lucky that even when computed offsets are out-of-bounds,
blk_pread() will gracefully catch the error later (so I don't
think a malicious image can crash or exploit qemu-nbd, and am
not treating this as a security flaw), but it's better to
flag the problem up front than to risk permanent EIO death of
the block device down the road.  Also, note that the
partition code blindly overwrites any offset passed in by the
user; so make the -o/-P combo an error for less confusion.

This can be tested with nbdkit:
$ echo hi > file
$ nbdkit -fv --filter=truncate partitioning file truncate=64k

Pre-patch:
$ qemu-nbd -p 10810 -P 1 -f raw nbd://localhost:10809 &
$ qemu-io -f raw nbd://localhost:10810
qemu-io> r -v 0 1
Disconnect client, due to: Failed to send reply: reading from file failed: 
Input/output error
Connection closed
read failed: Input/output error
qemu-io> q
[1]+  Doneqemu-nbd -p 10810 -P 1 -f raw 
nbd://localhost:10809

Post-patch:
$ qemu-nbd -p 10810 -P 1 -f raw nbd://localhost:10809
qemu-nbd: Discovered partition 1 at offset 1048576 size 512, but size exceeds 
file length 65536

Signed-off-by: Eric Blake 
---
v3: new patch
---
 qemu-nbd.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index 51b55f2e066..ff4adb9b3eb 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -1013,12 +1013,28 @@ int main(int argc, char **argv)
 fd_size -= dev_offset;

 if (partition != -1) {
-ret = find_partition(blk, partition, _offset, _size);
+off_t limit;
+
+if (dev_offset) {
+error_report("Cannot request partition and offset together");
+exit(EXIT_FAILURE);
+}
+ret = find_partition(blk, partition, _offset, );
 if (ret < 0) {
 error_report("Could not find partition %d: %s", partition,
  strerror(-ret));
 exit(EXIT_FAILURE);
 }
+/* partition limits are (32-bit << 9); can't overflow 64 bits */
+assert(dev_offset >= 0 && dev_offset + limit >= dev_offset);
+if (dev_offset + limit > fd_size) {
+error_report("Discovered partition %d at offset %lld size %lld, "
+ "but size exceeds file length %lld", partition,
+ (long long int) dev_offset, (long long int) limit,
+ (long long int) fd_size);
+exit(EXIT_FAILURE);
+}
+fd_size = limit;
 }

 export = nbd_export_new(bs, dev_offset, fd_size, export_name,
-- 
2.20.1

[Qemu-devel] [PATCH v3 08/19] nbd/client: Move export name into NBDExportInfo

2019-01-12 Thread Eric Blake

Refactor the 'name' parameter of nbd_receive_negotiate() from
being a separate parameter into being part of the in-out 'info'.
This also spills over to a simplification of nbd_opt_go().

The main driver for this refactoring is that an upcoming patch
would like to add support to qemu-nbd to list information about
all exports available on a server, where the name(s) will be
provided by the server instead of the client.  But another benefit
is that we can now allow the client to explicitly specify the
empty export name "" even when connecting to an oldstyle server
(even if qemu is no longer such a server after commit 7f7dfe2a).

Signed-off-by: Eric Blake 
Reviewed-by: Richard W.M. Jones 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20181215135324.152629-11-ebl...@redhat.com>
---
 include/block/nbd.h |  8 
 block/nbd-client.c  |  5 +++--
 nbd/client.c| 39 ++-
 qemu-nbd.c  |  6 --
 nbd/trace-events|  2 +-
 5 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 0f252829376..6d76d3dc221 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -262,6 +262,7 @@ struct NBDExportInfo {
 /* Set by client before nbd_receive_negotiate() */
 bool request_sizes;
 char *x_dirty_bitmap;
+char *name; /* must be non-NULL */

 /* In-out fields, set by client before nbd_receive_negotiate() and
  * updated by server results during nbd_receive_negotiate() */
@@ -279,10 +280,9 @@ struct NBDExportInfo {
 };
 typedef struct NBDExportInfo NBDExportInfo;

-int nbd_receive_negotiate(QIOChannel *ioc, const char *name,
-  QCryptoTLSCreds *tlscreds, const char *hostname,
-  QIOChannel **outioc, NBDExportInfo *info,
-  Error **errp);
+int nbd_receive_negotiate(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
+  const char *hostname, QIOChannel **outioc,
+  NBDExportInfo *info, Error **errp);
 int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info,
  Error **errp);
 int nbd_send_request(QIOChannel *ioc, NBDRequest *request);
diff --git a/block/nbd-client.c b/block/nbd-client.c
index ef320759716..3309376bc16 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -999,10 +999,11 @@ int nbd_client_init(BlockDriverState *bs,
 client->info.structured_reply = true;
 client->info.base_allocation = true;
 client->info.x_dirty_bitmap = g_strdup(x_dirty_bitmap);
-ret = nbd_receive_negotiate(QIO_CHANNEL(sioc), export,
-tlscreds, hostname,
+client->info.name = g_strdup(export ?: "");
+ret = nbd_receive_negotiate(QIO_CHANNEL(sioc), tlscreds, hostname,
 >ioc, >info, errp);
 g_free(client->info.x_dirty_bitmap);
+g_free(client->info.name);
 if (ret < 0) {
 logout("Failed to negotiate with the NBD server\n");
 return ret;
diff --git a/nbd/client.c b/nbd/client.c
index fd4ba8dec37..8227e69478a 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -330,15 +330,14 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, 
char **description,
 }


-/* Returns -1 if NBD_OPT_GO proves the export @wantname cannot be
+/* Returns -1 if NBD_OPT_GO proves the export @info->name cannot be
  * used, 0 if NBD_OPT_GO is unsupported (fall back to NBD_OPT_LIST and
  * NBD_OPT_EXPORT_NAME in that case), and > 0 if the export is good to
- * go (with @info populated). */
-static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
-  NBDExportInfo *info, Error **errp)
+ * go (with the rest of @info populated). */
+static int nbd_opt_go(QIOChannel *ioc, NBDExportInfo *info, Error **errp)
 {
 NBDOptionReply reply;
-uint32_t len = strlen(wantname);
+uint32_t len = strlen(info->name);
 uint16_t type;
 int error;
 char *buf;
@@ -348,10 +347,10 @@ static int nbd_opt_go(QIOChannel *ioc, const char 
*wantname,
  * flags still 0 is a witness of a broken server. */
 info->flags = 0;

-trace_nbd_opt_go_start(wantname);
+trace_nbd_opt_go_start(info->name);
 buf = g_malloc(4 + len + 2 + 2 * info->request_sizes + 1);
 stl_be_p(buf, len);
-memcpy(buf + 4, wantname, len);
+memcpy(buf + 4, info->name, len);
 /* At most one request, everything else up to server */
 stw_be_p(buf + 4 + len, info->request_sizes);
 if (info->request_sizes) {
@@ -753,10 +752,9 @@ static int nbd_negotiate_simple_meta_context(QIOChannel 
*ioc,
 return 0;
 }

-int nbd_receive_negotiate(QIOChannel *ioc, const char *name,
-  QCryptoTLSCreds *tlscreds, const char *hostname,
-  QIOChannel **outioc, NBDExportInfo *info,
-  Error **errp)
+int nbd_receive_negotiate(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
+  const char

[Qemu-devel] [PATCH v3 14/19] nbd/client: Pull out oldstyle size determination

2019-01-12 Thread Eric Blake

Another refactoring creating nbd_negotiate_finish_oldstyle()
for further reuse during 'qemu-nbd --list'.

Signed-off-by: Eric Blake 
Message-Id: <20181215135324.152629-18-ebl...@redhat.com>
Reviewed-by: Richard W.M. Jones 
---
 nbd/client.c | 49 -
 1 file changed, 32 insertions(+), 17 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index 5053433ea5e..620fbb5ef01 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -818,7 +818,7 @@ static int nbd_negotiate_simple_meta_context(QIOChannel 
*ioc,
  * Start the handshake to the server.  After a positive return, the server
  * is ready to accept additional NBD_OPT requests.
  * Returns: negative errno: failure talking to server
- *  0: server is oldstyle, client must still parse export size
+ *  0: server is oldstyle, must call nbd_negotiate_finish_oldstyle
  *  1: server is newstyle, but can only accept EXPORT_NAME
  *  2: server is newstyle, but lacks structured replies
  *  3: server is newstyle and set up for structured replies
@@ -923,6 +923,36 @@ static int nbd_start_negotiate(QIOChannel *ioc, 
QCryptoTLSCreds *tlscreds,
 }
 }

+/*
+ * nbd_negotiate_finish_oldstyle:
+ * Populate @info with the size and export flags from an oldstyle server,
+ * but does not consume 124 bytes of reserved zero padding.
+ * Returns 0 on success, -1 with @errp set on failure
+ */
+static int nbd_negotiate_finish_oldstyle(QIOChannel *ioc, NBDExportInfo *info,
+ Error **errp)
+{
+uint32_t oldflags;
+
+if (nbd_read(ioc, >size, sizeof(info->size), errp) < 0) {
+error_prepend(errp, "Failed to read export length: ");
+return -EINVAL;
+}
+info->size = be64_to_cpu(info->size);
+
+if (nbd_read(ioc, , sizeof(oldflags), errp) < 0) {
+error_prepend(errp, "Failed to read export flags: ");
+return -EINVAL;
+}
+oldflags = be32_to_cpu(oldflags);
+if (oldflags & ~0x) {
+error_setg(errp, "Unexpected export flags %0x" PRIx32, oldflags);
+return -EINVAL;
+}
+info->flags = oldflags;
+return 0;
+}
+
 /*
  * nbd_receive_negotiate:
  * Connect to server, complete negotiation, and move into transmission phase.
@@ -936,7 +966,6 @@ int nbd_receive_negotiate(QIOChannel *ioc, QCryptoTLSCreds 
*tlscreds,
 int result;
 bool zeroes = true;
 bool base_allocation = info->base_allocation;
-uint32_t oldflags;

 assert(info->name);
 trace_nbd_receive_negotiate_name(info->name);
@@ -1009,23 +1038,9 @@ int nbd_receive_negotiate(QIOChannel *ioc, 
QCryptoTLSCreds *tlscreds,
 error_setg(errp, "Server does not support non-empty export names");
 return -EINVAL;
 }
-
-if (nbd_read(ioc, >size, sizeof(info->size), errp) < 0) {
-error_prepend(errp, "Failed to read export length: ");
+if (nbd_negotiate_finish_oldstyle(ioc, info, errp) < 0) {
 return -EINVAL;
 }
-info->size = be64_to_cpu(info->size);
-
-if (nbd_read(ioc, , sizeof(oldflags), errp) < 0) {
-error_prepend(errp, "Failed to read export flags: ");
-return -EINVAL;
-}
-oldflags = be32_to_cpu(oldflags);
-if (oldflags & ~0x) {
-error_setg(errp, "Unexpected export flags %0x" PRIx32, oldflags);
-return -EINVAL;
-}
-info->flags = oldflags;
 break;
 default:
 return result;
-- 
2.20.1

[Qemu-devel] [PATCH v3 02/19] qemu-nbd: Enhance man page

2019-01-12 Thread Eric Blake

Document some useful qemu-nbd command lines. Mention some restrictions
on particular options, like -p being only for MBR images, or -c/-d
being Linux-only.  Update some text given the recent change to no
longer serve oldstyle protocol (missed in commit 7f7dfe2a).  Also,
consistently use trailing '.' in describing options.

Signed-off-by: Eric Blake 

---
v3: wording improvements, use -t in more examples [Rich]
---
 qemu-nbd.texi | 91 ---
 1 file changed, 72 insertions(+), 19 deletions(-)

diff --git a/qemu-nbd.texi b/qemu-nbd.texi
index 96b1546006a..3f22559beb4 100644
--- a/qemu-nbd.texi
+++ b/qemu-nbd.texi
@@ -10,11 +10,17 @@

 Export a QEMU disk image using the NBD protocol.

+Other uses:
+@itemize
+@item
+Bind a /dev/nbdX block device to a QEMU server (on Linux).
+@end itemize
+
 @c man end

 @c man begin OPTIONS
 @var{filename} is a disk image filename, or a set of block
-driver options if @var{--image-opts} is specified.
+driver options if @option{--image-opts} is specified.

 @var{dev} is an NBD device.

@@ -27,24 +33,25 @@ supported. The common object types that it makes sense to 
define are the
 keys, and the @code{tls-creds} object, which is used to supply TLS
 credentials for the qemu-nbd server.
 @item -p, --port=@var{port}
-The TCP port to listen on (default @samp{10809})
+The TCP port to listen on (default @samp{10809}).
 @item -o, --offset=@var{offset}
-The offset into the image
+The offset into the image.
 @item -b, --bind=@var{iface}
-The interface to bind to (default @samp{0.0.0.0})
+The interface to bind to (default @samp{0.0.0.0}).
 @item -k, --socket=@var{path}
-Use a unix socket with path @var{path}
+Use a unix socket with path @var{path}.
 @item --image-opts
 Treat @var{filename} as a set of image options, instead of a plain
 filename. If this flag is specified, the @var{-f} flag should
 not be used, instead the '@code{format=}' option should be set.
 @item -f, --format=@var{fmt}
 Force the use of the block driver for format @var{fmt} instead of
-auto-detecting
+auto-detecting.
 @item -r, --read-only
-Export the disk as read-only
+Export the disk as read-only.
 @item -P, --partition=@var{num}
-Only expose partition @var{num}
+Only expose MBR partition @var{num}.  Understands physical partitions
+1-4 and logical partitions 5-8.
 @item -B, --bitmap=@var{name}
 If @var{filename} has a qcow2 persistent bitmap @var{name}, expose
 that bitmap via the ``qemu:dirty-bitmap:@var{name}'' context
@@ -52,7 +59,7 @@ accessible through NBD_OPT_SET_META_CONTEXT.
 @item -s, --snapshot
 Use @var{filename} as an external snapshot, create a temporary
 file with backing_file=@var{filename}, redirect the write to
-the temporary one
+the temporary one.
 @item -l, --load-snapshot=@var{snapshot_param}
 Load an internal snapshot inside @var{filename} and export it
 as an read-only device, @var{snapshot_param} format is
@@ -76,19 +83,20 @@ driver-specific optimized zero write commands.  
@var{detect-zeroes} is one of
 converts a zero write to an unmap operation and can only be used if
 @var{discard} is set to @samp{unmap}.  The default is @samp{off}.
 @item -c, --connect=@var{dev}
-Connect @var{filename} to NBD device @var{dev}
+Connect @var{filename} to NBD device @var{dev} (Linux only).
 @item -d, --disconnect
-Disconnect the device @var{dev}
+Disconnect the device @var{dev} (Linux only).
 @item -e, --shared=@var{num}
-Allow up to @var{num} clients to share the device (default @samp{1})
+Allow up to @var{num} clients to share the device (default
+@samp{1}). Safe for readers, but for now, consistency is not
+guaranteed between multiple writers.
 @item -t, --persistent
-Don't exit on the last connection
+Don't exit on the last connection.
 @item -x, --export-name=@var{name}
-Set the NBD volume export name. This switches the server to use
-the new style NBD protocol negotiation
+Set the NBD volume export name (default of a zero-length string).
 @item -D, --description=@var{description}
 Set the NBD volume export description, as a human-readable
-string. Requires the use of @option{-x}
+string.
 @item --tls-creds=ID
 Enable mandatory TLS encryption for the server by setting the ID
 of the TLS credentials object previously created with the --object
@@ -96,11 +104,11 @@ option.
 @item --fork
 Fork off the server process and exit the parent once the server is running.
 @item -v, --verbose
-Display extra debugging information
+Display extra debugging information.
 @item -h, --help
-Display this help and exit
+Display this help and exit.
 @item -V, --version
-Display version information and exit
+Display version information and exit.
 @item -T, --trace 
[[enable=]@var{pattern}][,events=@var{file}][,file=@var{file}]
 @findex --trace
 @include qemu-option-trace.texi
@@ -108,6 +116,51 @@ Display version information and exit

 @c man end

+@c man begin EXAMPLES
+Start a server listening on port 10809 that exposes only the
+guest-visible contents of a qcow2 file, with no TLS

[Qemu-devel] [PATCH v3 13/19] nbd/client: Split handshake into two functions

2019-01-12 Thread Eric Blake

An upcoming patch will add the ability for qemu-nbd to list
the services provided by an NBD server.  Share the common
code of the TLS handshake by splitting the initial exchange
into a separate function, leaving only the export handling
in the original function.  Functionally, there should be no
change in behavior in this patch, although some of the code
motion may be difficult to follow due to indentation changes
(view with 'git diff -w' for a smaller changeset).

I considered an enum for the return code coordinating state
between the two functions, but in the end just settled with
ample comments.

Signed-off-by: Eric Blake 
Reviewed-by: Richard W.M. Jones 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20181215135324.152629-17-ebl...@redhat.com>
---
 nbd/client.c | 144 +++
 nbd/trace-events |   2 +-
 2 files changed, 95 insertions(+), 51 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index fa931fd8e5d..5053433ea5e 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -813,21 +813,24 @@ static int nbd_negotiate_simple_meta_context(QIOChannel 
*ioc,
 return received;
 }

-int nbd_receive_negotiate(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
-  const char *hostname, QIOChannel **outioc,
-  NBDExportInfo *info, Error **errp)
+/*
+ * nbd_start_negotiate:
+ * Start the handshake to the server.  After a positive return, the server
+ * is ready to accept additional NBD_OPT requests.
+ * Returns: negative errno: failure talking to server
+ *  0: server is oldstyle, client must still parse export size
+ *  1: server is newstyle, but can only accept EXPORT_NAME
+ *  2: server is newstyle, but lacks structured replies
+ *  3: server is newstyle and set up for structured replies
+ */
+static int nbd_start_negotiate(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
+   const char *hostname, QIOChannel **outioc,
+   bool structured_reply, bool *zeroes,
+   Error **errp)
 {
 uint64_t magic;
-bool zeroes = true;
-bool structured_reply = info->structured_reply;
-bool base_allocation = info->base_allocation;

-trace_nbd_receive_negotiate(tlscreds, hostname ? hostname : "");
-
-assert(info->name);
-trace_nbd_receive_negotiate_name(info->name);
-info->structured_reply = false;
-info->base_allocation = false;
+trace_nbd_start_negotiate(tlscreds, hostname ? hostname : "");

 if (outioc) {
 *outioc = NULL;
@@ -872,7 +875,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, QCryptoTLSCreds 
*tlscreds,
 clientflags |= NBD_FLAG_C_FIXED_NEWSTYLE;
 }
 if (globalflags & NBD_FLAG_NO_ZEROES) {
-zeroes = false;
+*zeroes = false;
 clientflags |= NBD_FLAG_C_NO_ZEROES;
 }
 /* client requested flags */
@@ -894,7 +897,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, QCryptoTLSCreds 
*tlscreds,
 }
 }
 if (fixedNewStyle) {
-int result;
+int result = 0;

 if (structured_reply) {
 result = nbd_request_simple_option(ioc,
@@ -903,39 +906,85 @@ int nbd_receive_negotiate(QIOChannel *ioc, 
QCryptoTLSCreds *tlscreds,
 if (result < 0) {
 return -EINVAL;
 }
-info->structured_reply = result == 1;
 }
+return 2 + result;
+} else {
+return 1;
+}
+} else if (magic == NBD_CLIENT_MAGIC) {
+if (tlscreds) {
+error_setg(errp, "Server does not support STARTTLS");
+return -EINVAL;
+}
+return 0;
+} else {
+error_setg(errp, "Bad server magic received: 0x%" PRIx64, magic);
+return -EINVAL;
+}
+}

-if (info->structured_reply && base_allocation) {
-result = nbd_negotiate_simple_meta_context(ioc, info, errp);
-if (result < 0) {
-return -EINVAL;
-}
-info->base_allocation = result == 1;
-}
+/*
+ * nbd_receive_negotiate:
+ * Connect to server, complete negotiation, and move into transmission phase.
+ * Returns: negative errno: failure talking to server
+ *  0: server is connected
+ */
+int nbd_receive_negotiate(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
+  const char *hostname, QIOChannel **outioc,
+  NBDExportInfo *info, Error **errp)
+{
+int result;
+bool zeroes = true;
+bool base_allocation = info->base_allocation;
+uint32_t oldflags;
+
+assert(info->name);
+trace_nbd_receive_negotiate_name(info->name);
+
+result = nbd_start_negotiate(ioc, tlscreds, hostname, outioc,
+ info->structured_reply, , errp);
+
+info->structured_reply =

[Qemu-devel] [PATCH v3 15/19] nbd/client: Add nbd_receive_export_list()

2019-01-12 Thread Eric Blake

We want to be able to detect whether a given qemu NBD server is
exposing the right export(s) and dirty bitmaps, at least for
regression testing.  We could use 'nbd-client -l' from the upstream
NBD project to list exports, but it's annoying to rely on
out-of-tree binaries; furthermore, nbd-client doesn't necessarily
know about all of the qemu NBD extensions.  Thus, we plan on adding
a new mode to qemu-nbd that merely sniffs all possible information
from the server during handshake phase, then disconnects and dumps
the information.

This patch adds the low-level client code for grabbing the list
of exports.  It benefits from the recent refactoring patches, as
well as a minor tweak of changing nbd_opt_go() to nbd_opt_info_or_go(),
in order to share as much code as possible when it comes to doing
validation of server replies.  The resulting information is stored
in an array of NBDExportInfo which has been expanded to any
description string, along with a convenience function for freeing
the list.

Note: a malicious server could exhaust memory of a client by feeding
an unending loop of exports; perhaps we should place a limit on how
many we are willing to receive. But note that a server could
reasonably be serving an export for every file in a large directory,
where an arbitrary limit in the client means we can't list anything
from such a server; the same happens if we just run until the client
fails to malloc() and thus dies by an abort(), where the limit is
no longer arbitrary but determined by available memory.  Since the
client is already planning on being short-lived, it's hard to call
this a denial of service attack that would starve off other uses,
so it does not appear to be a security issue.

Signed-off-by: Eric Blake 
Message-Id: <20181215135324.152629-19-ebl...@redhat.com>
Reviewed-by: Richard W.M. Jones 

---
v3: mention security (non-)issue in commit message [Rich], formatting
tweaks
---
 include/block/nbd.h |  15 -
 nbd/client.c| 144 +---
 nbd/trace-events|   2 +-
 3 files changed, 150 insertions(+), 11 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index bdc7ec7195a..e9a442ce7e9 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -1,5 +1,5 @@
 /*
- *  Copyright (C) 2016-2017 Red Hat, Inc.
+ *  Copyright (C) 2016-2019 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori 
  *
  *  Network Block Device
@@ -262,6 +262,9 @@ struct NBDExportInfo {
 /* Set by client before nbd_receive_negotiate() */
 bool request_sizes;
 char *x_dirty_bitmap;
+
+/* Set by client before nbd_receive_negotiate(), or by server results
+ * during nbd_receive_export_list() */
 char *name; /* must be non-NULL */

 /* In-out fields, set by client before nbd_receive_negotiate() and
@@ -269,7 +272,8 @@ struct NBDExportInfo {
 bool structured_reply;
 bool base_allocation; /* base:allocation context for NBD_CMD_BLOCK_STATUS 
*/

-/* Set by server results during nbd_receive_negotiate() */
+/* Set by server results during nbd_receive_negotiate() and
+ * nbd_receive_export_list() */
 uint64_t size;
 uint16_t flags;
 uint32_t min_block;
@@ -277,12 +281,19 @@ struct NBDExportInfo {
 uint32_t max_block;

 uint32_t context_id;
+
+/* Set by server results during nbd_receive_export_list() */
+char *description;
 };
 typedef struct NBDExportInfo NBDExportInfo;

 int nbd_receive_negotiate(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
   const char *hostname, QIOChannel **outioc,
   NBDExportInfo *info, Error **errp);
+void nbd_free_export_list(NBDExportInfo *info, int count);
+int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
+const char *hostname, NBDExportInfo **info,
+Error **errp);
 int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info,
  Error **errp);
 int nbd_send_request(QIOChannel *ioc, NBDRequest *request);
diff --git a/nbd/client.c b/nbd/client.c
index 620fbb5ef01..a5a705a67f7 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -330,11 +330,14 @@ static int nbd_receive_list(QIOChannel *ioc, char **name, 
char **description,
 }


-/* Returns -1 if NBD_OPT_GO proves the export @info->name cannot be
+/*
+ * Returns -1 if NBD_OPT_GO proves the export @info->name cannot be
  * used, 0 if NBD_OPT_GO is unsupported (fall back to NBD_OPT_LIST and
  * NBD_OPT_EXPORT_NAME in that case), and > 0 if the export is good to
- * go (with the rest of @info populated). */
-static int nbd_opt_go(QIOChannel *ioc, NBDExportInfo *info, Error **errp)
+ * go (with the rest of @info populated).
+ */
+static int nbd_opt_info_or_go(QIOChannel *ioc, uint32_t opt,
+  NBDExportInfo *info, Error **errp)
 {
 NBDOptionReply reply;
 uint32_t len = strlen(info->name);
@@ -347,7 +350,8 @@ static int nbd_opt_go(QIOChannel *ioc,

[Qemu-devel] [PATCH v3 10/19] nbd/client: Split out nbd_send_one_meta_context()

2019-01-12 Thread Eric Blake

Refactor nbd_negotiate_simple_meta_context() to pull out the
code that can be reused to send a LIST request for 0 or 1 query.
No semantic change.  The old comment about 'sizeof(uint32_t)'
being equivalent to '/* number of queries */' is no longer
needed, now that we are computing 'sizeof(queries)' instead.

Signed-off-by: Eric Blake 
Message-Id: <20181215135324.152629-14-ebl...@redhat.com>
Reviewed-by: Richard W.M. Jones 

---
v3: Improve commit message [Rich], formatting tweak [checkpatch],
rebase to dropped patch
---
 nbd/client.c | 67 +---
 nbd/trace-events |  2 +-
 2 files changed, 48 insertions(+), 21 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index 77993890f04..3c716be2719 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -629,6 +629,49 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
 return QIO_CHANNEL(tioc);
 }

+/*
+ * nbd_send_one_meta_context:
+ * Send 0 or 1 set/list meta context queries.
+ * Return 0 on success, -1 with errp set for any error
+ */
+static int nbd_send_one_meta_context(QIOChannel *ioc,
+ uint32_t opt,
+ const char *export,
+ const char *query,
+ Error **errp)
+{
+int ret;
+uint32_t export_len = strlen(export);
+uint32_t queries = !!query;
+uint32_t context_len = 0;
+uint32_t data_len;
+char *data;
+char *p;
+
+data_len = sizeof(export_len) + export_len + sizeof(queries);
+if (query) {
+context_len = strlen(query);
+data_len += sizeof(context_len) + context_len;
+} else {
+assert(opt == NBD_OPT_LIST_META_CONTEXT);
+}
+data = g_malloc(data_len);
+p = data;
+
+trace_nbd_opt_meta_request(nbd_opt_lookup(opt), query ?: "(all)", export);
+stl_be_p(p, export_len);
+memcpy(p += sizeof(export_len), export, export_len);
+stl_be_p(p += export_len, queries);
+if (query) {
+stl_be_p(p += sizeof(uint32_t), context_len);
+memcpy(p += sizeof(context_len), query, context_len);
+}
+
+ret = nbd_send_option_request(ioc, opt, data_len, data, errp);
+g_free(data);
+return ret;
+}
+
 /* nbd_negotiate_simple_meta_context:
  * Request the server to set the meta context for export @info->name
  * using @info->x_dirty_bitmap with a fallback to "base:allocation",
@@ -653,26 +696,10 @@ static int nbd_negotiate_simple_meta_context(QIOChannel 
*ioc,
 NBDOptionReply reply;
 const char *context = info->x_dirty_bitmap ?: "base:allocation";
 bool received = false;
-uint32_t export_len = strlen(info->name);
-uint32_t context_len = strlen(context);
-uint32_t data_len = sizeof(export_len) + export_len +
-sizeof(uint32_t) + /* number of queries */
-sizeof(context_len) + context_len;
-char *data = g_malloc(data_len);
-char *p = data;

-trace_nbd_opt_meta_request(context, info->name);
-stl_be_p(p, export_len);
-memcpy(p += sizeof(export_len), info->name, export_len);
-stl_be_p(p += export_len, 1);
-stl_be_p(p += sizeof(uint32_t), context_len);
-memcpy(p += sizeof(context_len), context, context_len);
-
-ret = nbd_send_option_request(ioc, NBD_OPT_SET_META_CONTEXT, data_len, 
data,
-  errp);
-g_free(data);
-if (ret < 0) {
-return ret;
+if (nbd_send_one_meta_context(ioc, NBD_OPT_SET_META_CONTEXT,
+  info->name, context, errp) < 0) {
+return -1;
 }

 if (nbd_receive_option_reply(ioc, NBD_OPT_SET_META_CONTEXT, ,
@@ -689,7 +716,7 @@ static int nbd_negotiate_simple_meta_context(QIOChannel 
*ioc,
 if (reply.type == NBD_REP_META_CONTEXT) {
 char *name;

-if (reply.length != sizeof(info->context_id) + context_len) {
+if (reply.length != sizeof(info->context_id) + strlen(context)) {
 error_setg(errp, "Failed to negotiate meta context '%s', server "
"answered with unexpected length %" PRIu32, context,
reply.length);
diff --git a/nbd/trace-events b/nbd/trace-events
index c3966d2b653..59521e47a3d 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -12,7 +12,7 @@ nbd_receive_query_exports_start(const char *wantname) 
"Querying export list for
 nbd_receive_query_exports_success(const char *wantname) "Found desired export 
name '%s'"
 nbd_receive_starttls_new_client(void) "Setting up TLS"
 nbd_receive_starttls_tls_handshake(void) "Starting TLS handshake"
-nbd_opt_meta_request(const char *context, const char *export) "Requesting to 
set meta context %s for export %s"
+nbd_opt_meta_request(const char *optname, const char *context, const char 
*export) "Requesting %s %s for export %s"
 nbd_opt_meta_reply(const char *context, uint32_t id) "Received mapping of 
context %s to id %" PRIu32

[Qemu-devel] [PATCH v3 12/19] nbd/client: Refactor return of nbd_receive_negotiate()

2019-01-12 Thread Eric Blake

The function could only ever return 0 or -EINVAL; make this
clearer by dropping a useless 'fail:' label.

Signed-off-by: Eric Blake 
Reviewed-by: Richard W.M. Jones 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20181215135324.152629-16-ebl...@redhat.com>
---
 nbd/client.c | 51 +++
 1 file changed, 23 insertions(+), 28 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index 22505199d3b..fa931fd8e5d 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -818,7 +818,6 @@ int nbd_receive_negotiate(QIOChannel *ioc, QCryptoTLSCreds 
*tlscreds,
   NBDExportInfo *info, Error **errp)
 {
 uint64_t magic;
-int rc;
 bool zeroes = true;
 bool structured_reply = info->structured_reply;
 bool base_allocation = info->base_allocation;
@@ -829,31 +828,30 @@ int nbd_receive_negotiate(QIOChannel *ioc, 
QCryptoTLSCreds *tlscreds,
 trace_nbd_receive_negotiate_name(info->name);
 info->structured_reply = false;
 info->base_allocation = false;
-rc = -EINVAL;

 if (outioc) {
 *outioc = NULL;
 }
 if (tlscreds && !outioc) {
 error_setg(errp, "Output I/O channel required for TLS");
-goto fail;
+return -EINVAL;
 }

 if (nbd_read(ioc, , sizeof(magic), errp) < 0) {
 error_prepend(errp, "Failed to read initial magic: ");
-goto fail;
+return -EINVAL;
 }
 magic = be64_to_cpu(magic);
 trace_nbd_receive_negotiate_magic(magic);

 if (magic != NBD_INIT_MAGIC) {
 error_setg(errp, "Bad initial magic received: 0x%" PRIx64, magic);
-goto fail;
+return -EINVAL;
 }

 if (nbd_read(ioc, , sizeof(magic), errp) < 0) {
 error_prepend(errp, "Failed to read server magic: ");
-goto fail;
+return -EINVAL;
 }
 magic = be64_to_cpu(magic);
 trace_nbd_receive_negotiate_magic(magic);
@@ -865,7 +863,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, QCryptoTLSCreds 
*tlscreds,

 if (nbd_read(ioc, , sizeof(globalflags), errp) < 0) {
 error_prepend(errp, "Failed to read server flags: ");
-goto fail;
+return -EINVAL;
 }
 globalflags = be16_to_cpu(globalflags);
 trace_nbd_receive_negotiate_server_flags(globalflags);
@@ -881,18 +879,18 @@ int nbd_receive_negotiate(QIOChannel *ioc, 
QCryptoTLSCreds *tlscreds,
 clientflags = cpu_to_be32(clientflags);
 if (nbd_write(ioc, , sizeof(clientflags), errp) < 0) {
 error_prepend(errp, "Failed to send clientflags field: ");
-goto fail;
+return -EINVAL;
 }
 if (tlscreds) {
 if (fixedNewStyle) {
 *outioc = nbd_receive_starttls(ioc, tlscreds, hostname, errp);
 if (!*outioc) {
-goto fail;
+return -EINVAL;
 }
 ioc = *outioc;
 } else {
 error_setg(errp, "Server does not support STARTTLS");
-goto fail;
+return -EINVAL;
 }
 }
 if (fixedNewStyle) {
@@ -903,7 +901,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, QCryptoTLSCreds 
*tlscreds,
NBD_OPT_STRUCTURED_REPLY,
errp);
 if (result < 0) {
-goto fail;
+return -EINVAL;
 }
 info->structured_reply = result == 1;
 }
@@ -911,7 +909,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, QCryptoTLSCreds 
*tlscreds,
 if (info->structured_reply && base_allocation) {
 result = nbd_negotiate_simple_meta_context(ioc, info, errp);
 if (result < 0) {
-goto fail;
+return -EINVAL;
 }
 info->base_allocation = result == 1;
 }
@@ -923,7 +921,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, QCryptoTLSCreds 
*tlscreds,
  * export, then use NBD_OPT_EXPORT_NAME.  */
 result = nbd_opt_go(ioc, info, errp);
 if (result < 0) {
-goto fail;
+return -EINVAL;
 }
 if (result > 0) {
 return 0;
@@ -935,25 +933,25 @@ int nbd_receive_negotiate(QIOChannel *ioc, 
QCryptoTLSCreds *tlscreds,
  * export name is not available.
  */
 if (nbd_receive_query_exports(ioc, info->name, errp) < 0) {
-goto fail;
+return -EINVAL;
 }
 }
 /* write the export name request */
 if (nbd_send_option_request(ioc, NBD_OPT_EXPORT_NAME, -1, info->name,
 errp) < 0) {
-goto fail;
+return -EINVAL;
 }

 /* Read the response */
 if (nbd_read(ioc,

[Qemu-devel] [PATCH v3 05/19] nbd/server: Favor [u]int64_t over off_t

2019-01-12 Thread Eric Blake

Although our compile-time environment is set up so that we always
support long files with 64-bit off_t, we have no guarantee whether
off_t is the same type as int64_t.  This requires casts when
printing values, and prevents us from directly using qemu_strtoi64().
Let's just flip to [u]int64_t (signed for length, because we have to
detect failure of blk_getlength() and because off_t was signed;
unsigned for offset because it lets us simplify some math without
having to worry about signed overflow).

Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Eric Blake 

---
v3: new patch
---
 include/block/nbd.h |  4 ++--
 nbd/server.c| 14 +++---
 qemu-nbd.c  | 26 ++
 3 files changed, 19 insertions(+), 25 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 1971b557896..0f252829376 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -294,8 +294,8 @@ int nbd_errno_to_system_errno(int err);
 typedef struct NBDExport NBDExport;
 typedef struct NBDClient NBDClient;

-NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset, off_t size,
-  const char *name, const char *description,
+NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
+  int64_t size, const char *name, const char *desc,
   const char *bitmap, uint16_t nbdflags,
   void (*close)(NBDExport *), bool writethrough,
   BlockBackend *on_eject_blk, Error **errp);
diff --git a/nbd/server.c b/nbd/server.c
index c9937ccdc2a..15357d40fd7 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -77,8 +77,8 @@ struct NBDExport {
 BlockBackend *blk;
 char *name;
 char *description;
-off_t dev_offset;
-off_t size;
+uint64_t dev_offset;
+int64_t size;
 uint16_t nbdflags;
 QTAILQ_HEAD(, NBDClient) clients;
 QTAILQ_ENTRY(NBDExport) next;
@@ -1455,8 +1455,8 @@ static void nbd_eject_notifier(Notifier *n, void *data)
 nbd_export_close(exp);
 }

-NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset, off_t size,
-  const char *name, const char *description,
+NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
+  int64_t size, const char *name, const char *desc,
   const char *bitmap, uint16_t nbdflags,
   void (*close)(NBDExport *), bool writethrough,
   BlockBackend *on_eject_blk, Error **errp)
@@ -1497,7 +1497,7 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t 
dev_offset, off_t size,
 exp->blk = blk;
 exp->dev_offset = dev_offset;
 exp->name = g_strdup(name);
-exp->description = g_strdup(description);
+exp->description = g_strdup(desc);
 exp->nbdflags = nbdflags;
 assert(dev_offset <= size);
 exp->size = QEMU_ALIGN_DOWN(size, BDRV_SECTOR_SIZE);
@@ -2131,8 +2131,8 @@ static int nbd_co_receive_request(NBDRequestData *req, 
NBDRequest *request,
 if (request->from > client->exp->size ||
 request->from + request->len > client->exp->size) {
 error_setg(errp, "operation past EOF; From: %" PRIu64 ", Len: %" PRIu32
-   ", Size: %" PRIu64, request->from, request->len,
-   (uint64_t)client->exp->size);
+   ", Size: %" PRId64, request->from, request->len,
+   client->exp->size);
 return (request->type == NBD_CMD_WRITE ||
 request->type == NBD_CMD_WRITE_ZEROES) ? -ENOSPC : -EINVAL;
 }
diff --git a/qemu-nbd.c b/qemu-nbd.c
index ff4adb9b3eb..96c0829970c 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -176,7 +176,7 @@ static void read_partition(uint8_t *p, struct 
partition_record *r)
 }

 static int find_partition(BlockBackend *blk, int partition,
-  off_t *offset, off_t *size)
+  uint64_t *offset, int64_t *size)
 {
 struct partition_record mbr[4];
 uint8_t data[MBR_SIZE];
@@ -500,14 +500,14 @@ int main(int argc, char **argv)
 {
 BlockBackend *blk;
 BlockDriverState *bs;
-off_t dev_offset = 0;
+uint64_t dev_offset = 0;
 uint16_t nbdflags = 0;
 bool disconnect = false;
 const char *bindto = NULL;
 const char *port = NULL;
 char *sockpath = NULL;
 char *device = NULL;
-off_t fd_size;
+int64_t fd_size;
 QemuOpts *sn_opts = NULL;
 const char *sn_id_or_name = NULL;
 const char *sopt = "hVb:o:p:rsnP:c:dvk:e:f:tl:x:T:D:B:";
@@ -665,10 +665,6 @@ int main(int argc, char **argv)
 error_report("Invalid offset `%s'", optarg);
 exit(EXIT_FAILURE);
 }
-if (dev_offset < 0) {
-error_report("Offset must be positive `%s'", optarg);
-exit(EXIT_FAILURE);
-}
 break;
 case 'l':
 if (strstart(optarg,

[Qemu-devel] [PATCH v3 06/19] qemu-nbd: Avoid strtol open-coding

2019-01-12 Thread Eric Blake

Our copy-and-pasted open-coding of strtol handling forgot to
handle overflow conditions.  Use qemu_strto*() instead.

In the case of --partition, since we insist on a user-supplied
partition to be non-zero, we can use 0 rather than -1 for our
initial value to distinguish when a partition is not being
served, for slightly more optimal code.

The error messages for out-of-bounds values are less specific,
but should not be a terrible loss in quality.

Signed-off-by: Eric Blake 
Message-Id: <20181215135324.152629-8-ebl...@redhat.com>

---
v3: rebase to use int64_t rather than off_t [Vladimir]
---
 qemu-nbd.c | 28 +---
 1 file changed, 9 insertions(+), 19 deletions(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index 96c0829970c..4670b659167 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -546,9 +546,8 @@ int main(int argc, char **argv)
 };
 int ch;
 int opt_ind = 0;
-char *end;
 int flags = BDRV_O_RDWR;
-int partition = -1;
+int partition = 0;
 int ret = 0;
 bool seen_cache = false;
 bool seen_discard = false;
@@ -660,9 +659,8 @@ int main(int argc, char **argv)
 port = optarg;
 break;
 case 'o':
-dev_offset = strtoll (optarg, , 0);
-if (*end) {
-error_report("Invalid offset `%s'", optarg);
+if (qemu_strtou64(optarg, NULL, 0, _offset) < 0) {
+error_report("Invalid offset '%s'", optarg);
 exit(EXIT_FAILURE);
 }
 break;
@@ -684,13 +682,9 @@ int main(int argc, char **argv)
 flags &= ~BDRV_O_RDWR;
 break;
 case 'P':
-partition = strtol(optarg, , 0);
-if (*end) {
-error_report("Invalid partition `%s'", optarg);
-exit(EXIT_FAILURE);
-}
-if (partition < 1 || partition > 8) {
-error_report("Invalid partition %d", partition);
+if (qemu_strtoi(optarg, NULL, 0, ) < 0 ||
+partition < 1 || partition > 8) {
+error_report("Invalid partition '%s'", optarg);
 exit(EXIT_FAILURE);
 }
 break;
@@ -711,15 +705,11 @@ int main(int argc, char **argv)
 device = optarg;
 break;
 case 'e':
-shared = strtol(optarg, , 0);
-if (*end) {
+if (qemu_strtoi(optarg, NULL, 0, ) < 0 ||
+shared < 1) {
 error_report("Invalid shared device number '%s'", optarg);
 exit(EXIT_FAILURE);
 }
-if (shared < 1) {
-error_report("Shared device number must be greater than 0");
-exit(EXIT_FAILURE);
-}
 break;
 case 'f':
 fmt = optarg;
@@ -1007,7 +997,7 @@ int main(int argc, char **argv)
 }
 fd_size -= dev_offset;

-if (partition != -1) {
+if (partition) {
 int64_t limit;

 if (dev_offset) {
-- 
2.20.1

[Qemu-devel] [PATCH v3 19/19] iotests: Enhance 223, 233 to cover 'qemu-nbd --list'

2019-01-12 Thread Eric Blake

Any good new feature deserves some regression testing :)
Coverage includes:
- 223: what happens when there are 0 or more than 1 export,
proof that we can see multiple contexts including qemu:dirty-bitmap
- 233: proof that we can list over TLS, and that mix-and-match of
plain/TLS listings will behave sanely

Signed-off-by: Eric Blake 
Reviewed-by: Richard W.M. Jones 
Tested-by: Richard W.M. Jones 
Reviewed-by: Vladimir Sementsov-Ogievskiy 

---
v3: Rebase to earlier changes
---
 tests/qemu-iotests/223 |  2 ++
 tests/qemu-iotests/223.out | 20 
 tests/qemu-iotests/233 | 19 +--
 tests/qemu-iotests/233.out | 15 +++
 4 files changed, 50 insertions(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/223 b/tests/qemu-iotests/223
index 773892dbe60..f120a016460 100755
--- a/tests/qemu-iotests/223
+++ b/tests/qemu-iotests/223
@@ -127,6 +127,7 @@ _send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-start",
 _send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-start",
   "arguments":{"addr":{"type":"unix",
 "data":{"path":"'"$TEST_DIR/nbd"1'"' "error" # Attempt second server
+$QEMU_NBD_PROG -L -k "$TEST_DIR/nbd"
 _send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-add",
   "arguments":{"device":"n", "bitmap":"b"}}' "return"
 _send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-add",
@@ -142,6 +143,7 @@ _send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-add",
 _send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-add",
   "arguments":{"device":"n", "name":"n2", "writable":true,
   "bitmap":"b2"}}' "return"
+$QEMU_NBD_PROG -L -k "$TEST_DIR/nbd"

 echo
 echo "=== Contrast normal status to large granularity dirty-bitmap ==="
diff --git a/tests/qemu-iotests/223.out b/tests/qemu-iotests/223.out
index 0de5240a75e..6476b77ba20 100644
--- a/tests/qemu-iotests/223.out
+++ b/tests/qemu-iotests/223.out
@@ -30,12 +30,32 @@ wrote 2097152/2097152 bytes at offset 2097152
 {"error": {"class": "GenericError", "desc": "NBD server not running"}}
 {"return": {}}
 {"error": {"class": "GenericError", "desc": "NBD server already running"}}
+exports available: 0
 {"return": {}}
 {"error": {"class": "GenericError", "desc": "Cannot find device=nosuch nor 
node_name=nosuch"}}
 {"error": {"class": "GenericError", "desc": "NBD server already has export 
named 'n'"}}
 {"error": {"class": "GenericError", "desc": "Enabled bitmap 'b2' incompatible 
with readonly export"}}
 {"error": {"class": "GenericError", "desc": "Bitmap 'b3' is not found"}}
 {"return": {}}
+exports available: 2
+ export: 'n'
+  size:  4194304
+  flags: 0x4ef ( readonly flush fua trim zeroes df cache )
+  min block: 512
+  opt block: 4096
+  max block: 33554432
+  available meta contexts: 2
+   base:allocation
+   qemu:dirty-bitmap:b
+ export: 'n2'
+  size:  4194304
+  flags: 0x4ed ( flush fua trim zeroes df cache )
+  min block: 512
+  opt block: 4096
+  max block: 33554432
+  available meta contexts: 2
+   base:allocation
+   qemu:dirty-bitmap:b2

 === Contrast normal status to large granularity dirty-bitmap ===

diff --git a/tests/qemu-iotests/233 b/tests/qemu-iotests/233
index 1814efe..a6ef7b20fb4 100755
--- a/tests/qemu-iotests/233
+++ b/tests/qemu-iotests/233
@@ -68,10 +68,12 @@ echo
 echo "== check TLS client to plain server fails =="
 nbd_server_start_tcp_socket -f $IMGFMT "$TEST_IMG"

-$QEMU_IMG info --image-opts \
---object tls-creds-x509,dir=${tls_dir}/client1,endpoint=client,id=tls0 \
+obj=tls-creds-x509,dir=${tls_dir}/client1,endpoint=client,id=tls0
+$QEMU_IMG info --image-opts --object $obj \
 driver=nbd,host=$nbd_tcp_addr,port=$nbd_tcp_port,tls-creds=tls0 \
 2>&1 | sed "s/$nbd_tcp_port/PORT/g"
+$QEMU_NBD_PROG -L -b $nbd_tcp_addr -p $nbd_tcp_port --object $obj \
+--tls-creds=tls0

 nbd_server_stop

@@ -84,20 +86,25 @@ nbd_server_start_tcp_socket \
 -f $IMGFMT "$TEST_IMG"

 $QEMU_IMG info nbd://localhost:$nbd_tcp_port 2>&1 | sed 
"s/$nbd_tcp_port/PORT/g"
+$QEMU_NBD_PROG -L -b $nbd_tcp_addr -p $nbd_tcp_port

 echo
 echo "== check TLS works =="
-$QEMU_IMG info --image-opts \
---object tls-creds-x509,dir=${tls_dir}/client1,endpoint=client,id=tls0 \
+obj=tls-creds-x509,dir=${tls_dir}/client1,endpoint=client,id=tls0
+$QEMU_IMG info --image-opts --object $obj \
 driver=nbd,host=$nbd_tcp_addr,port=$nbd_tcp_port,tls-creds=tls0 \
 2>&1 | sed "s/$nbd_tcp_port/PORT/g"
+$QEMU_NBD_PROG -L -b $nbd_tcp_addr -p $nbd_tcp_port --object $obj \
+--tls-creds=tls0

 echo
 echo "== check TLS with different CA fails =="
-$QEMU_IMG info --image-opts \
---object tls-creds-x509,dir=${tls_dir}/client2,endpoint=client,id=tls0 \
+obj=tls-creds-x509,dir=${tls_dir}/client2,endpoint=client,id=tls0
+$QEMU_IMG info --image-opts --object $obj \
 driver=nbd,host=$nbd_tcp_addr,port=$nbd_tcp_port,tls-creds=tls0 \
 2>&1 | sed "s/$nbd_tcp_port/PORT/g"
+$QEMU_NBD_PROG -L -b $nbd_tcp_addr -p $nbd_tcp_port --object $obj \
+--tls-creds=tls0

 echo
 echo "== perform I/O over TLS =="
diff --git

[Qemu-devel] [PATCH v3 09/19] nbd/client: Change signature of nbd_negotiate_simple_meta_context()

2019-01-12 Thread Eric Blake

Pass 'info' instead of three separate parameters related to info,
when requesting the server to set the meta context.  Update the
NBDExportInfo struct to rename the received id field to match the
fact that we are currently overloading the field to match whatever
context the user supplied through the x-dirty-bitmap hack, as well
as adding a TODO comment to remind future patches about a desire
to request two contexts at once.

Signed-off-by: Eric Blake 
Message-Id: <20181215135324.152629-12-ebl...@redhat.com>
Reviewed-by: Richard W.M. Jones 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/nbd.h |  2 +-
 block/nbd-client.c  |  4 ++--
 nbd/client.c| 53 +
 3 files changed, 28 insertions(+), 31 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 6d76d3dc221..bdc7ec7195a 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -276,7 +276,7 @@ struct NBDExportInfo {
 uint32_t opt_block;
 uint32_t max_block;

-uint32_t meta_base_allocation_id;
+uint32_t context_id;
 };
 typedef struct NBDExportInfo NBDExportInfo;

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 3309376bc16..813539676d2 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -249,11 +249,11 @@ static int nbd_parse_blockstatus_payload(NBDClientSession 
*client,
 }

 context_id = payload_advance32();
-if (client->info.meta_base_allocation_id != context_id) {
+if (client->info.context_id != context_id) {
 error_setg(errp, "Protocol error: unexpected context id %d for "
  "NBD_REPLY_TYPE_BLOCK_STATUS, when negotiated context 
"
  "id is %d", context_id,
- client->info.meta_base_allocation_id);
+ client->info.context_id);
 return -EINVAL;
 }

diff --git a/nbd/client.c b/nbd/client.c
index 8227e69478a..77993890f04 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -630,26 +630,30 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
 }

 /* nbd_negotiate_simple_meta_context:
- * Set one meta context. Simple means that reply must contain zero (not
- * negotiated) or one (negotiated) contexts. More contexts would be considered
- * as a protocol error. It's also implied that meta-data query equals queried
- * context name, so, if server replies with something different than @context,
- * it is considered an error too.
- * return 1 for successful negotiation, context_id is set
+ * Request the server to set the meta context for export @info->name
+ * using @info->x_dirty_bitmap with a fallback to "base:allocation",
+ * setting @info->context_id to the resulting id. Fail if the server
+ * responds with more than one context or with a context different
+ * than the query.
+ * return 1 for successful negotiation,
  *0 if operation is unsupported,
  *-1 with errp set for any other error
  */
 static int nbd_negotiate_simple_meta_context(QIOChannel *ioc,
- const char *export,
- const char *context,
- uint32_t *context_id,
+ NBDExportInfo *info,
  Error **errp)
 {
+/*
+ * TODO: Removing the x_dirty_bitmap hack will mean refactoring
+ * this function to request and store ids for multiple contexts
+ * (both base:allocation and a dirty bitmap), at which point this
+ * function should lose the term _simple.
+ */
 int ret;
 NBDOptionReply reply;
-uint32_t received_id = 0;
+const char *context = info->x_dirty_bitmap ?: "base:allocation";
 bool received = false;
-uint32_t export_len = strlen(export);
+uint32_t export_len = strlen(info->name);
 uint32_t context_len = strlen(context);
 uint32_t data_len = sizeof(export_len) + export_len +
 sizeof(uint32_t) + /* number of queries */
@@ -657,9 +661,9 @@ static int nbd_negotiate_simple_meta_context(QIOChannel 
*ioc,
 char *data = g_malloc(data_len);
 char *p = data;

-trace_nbd_opt_meta_request(context, export);
+trace_nbd_opt_meta_request(context, info->name);
 stl_be_p(p, export_len);
-memcpy(p += sizeof(export_len), export, export_len);
+memcpy(p += sizeof(export_len), info->name, export_len);
 stl_be_p(p += export_len, 1);
 stl_be_p(p += sizeof(uint32_t), context_len);
 memcpy(p += sizeof(context_len), context, context_len);
@@ -685,7 +689,7 @@ static int nbd_negotiate_simple_meta_context(QIOChannel 
*ioc,
 if (reply.type == NBD_REP_META_CONTEXT) {
 char *name;

-if (reply.length != sizeof(received_id) + context_len) {
+if (reply.length != sizeof(info->context_id) + context_len) {
 error_setg(errp, "Failed to negotiate meta context '%s', server "

[Qemu-devel] [PATCH v3 16/19] nbd/client: Add meta contexts to nbd_receive_export_list()

2019-01-12 Thread Eric Blake

We want to be able to detect whether a given qemu NBD server is
exposing the right export(s) and dirty bitmaps, at least for
regression testing.  We could use 'nbd-client -l' from the upstream
NBD project to list exports, but it's annoying to rely on
out-of-tree binaries; furthermore, nbd-client doesn't necessarily
know about all of the qemu NBD extensions.  Thus, we plan on adding
a new mode to qemu-nbd that merely sniffs all possible information
from the server during handshake phase, then disconnects and dumps
the information.

This patch continues the work of the previous patch, by adding the
ability to track the list of available meta contexts into
NBDExportInfo.  It benefits from the recent refactoring patches
with a new nbd_list_meta_contexts() that reuses much of the same
framework as setting a meta context.

Note: a malicious server could exhaust memory of a client by feeding
an unending loop of contexts; perhaps we could place a limit on how
many we are willing to receive. But this is no different from our
earlier analysis on a server sending an unending list of exports,
and the death of a client due to memory exhaustion when the client
was going to exit soon anyways is not really a denial of service
attack.

Signed-off-by: Eric Blake 
Message-Id: <20181215135324.152629-20-ebl...@redhat.com>
Reviewed-by: Richard W.M. Jones 

---
v3: mention security (non-)issue in commit message [Rich]
---
 include/block/nbd.h |  2 ++
 nbd/client.c| 41 +++--
 2 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index e9a442ce7e9..80ee3cc997e 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -284,6 +284,8 @@ struct NBDExportInfo {

 /* Set by server results during nbd_receive_export_list() */
 char *description;
+int n_contexts;
+char **contexts;
 };
 typedef struct NBDExportInfo NBDExportInfo;

diff --git a/nbd/client.c b/nbd/client.c
index a5a705a67f7..2001e6e8160 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -817,6 +817,36 @@ static int nbd_negotiate_simple_meta_context(QIOChannel 
*ioc,
 return received;
 }

+/*
+ * nbd_list_meta_contexts:
+ * Request the server to list all meta contexts for export @info->name.
+ * return 0 if list is complete (even if empty),
+ *-1 with errp set for any other error
+ */
+static int nbd_list_meta_contexts(QIOChannel *ioc,
+  NBDExportInfo *info,
+  Error **errp)
+{
+int ret;
+
+if (nbd_send_one_meta_context(ioc, NBD_OPT_LIST_META_CONTEXT,
+  info->name, NULL, errp) < 0) {
+return -1;
+}
+
+while (1) {
+char *context;
+
+ret = nbd_receive_one_meta_context(ioc, NBD_OPT_LIST_META_CONTEXT,
+   , NULL, errp);
+if (ret <= 0) {
+return ret;
+}
+info->contexts = g_renew(char *, info->contexts, ++info->n_contexts);
+info->contexts[info->n_contexts - 1] = context;
+}
+}
+
 /*
  * nbd_start_negotiate:
  * Start the handshake to the server.  After a positive return, the server
@@ -1063,7 +1093,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, 
QCryptoTLSCreds *tlscreds,
 /* Clean up result of nbd_receive_export_list */
 void nbd_free_export_list(NBDExportInfo *info, int count)
 {
-int i;
+int i, j;

 if (!info) {
 return;
@@ -1072,6 +1102,10 @@ void nbd_free_export_list(NBDExportInfo *info, int count)
 for (i = 0; i < count; i++) {
 g_free(info[i].name);
 g_free(info[i].description);
+for (j = 0; j < info[i].n_contexts; j++) {
+g_free(info[i].contexts[j]);
+}
+g_free(info[i].contexts);
 }
 g_free(info);
 }
@@ -1139,7 +1173,10 @@ int nbd_receive_export_list(QIOChannel *ioc, 
QCryptoTLSCreds *tlscreds,
 break;
 }

-/* TODO: Grab meta contexts */
+if (result == 3 &&
+nbd_list_meta_contexts(ioc, [i], errp) < 0) {
+goto out;
+}
 }

 /* Send NBD_OPT_ABORT as a courtesy before hanging up */
-- 
2.20.1

[Qemu-devel] [PATCH v3 07/19] nbd/client: Refactor nbd_receive_list()

2019-01-12 Thread Eric Blake

Right now, nbd_receive_list() is only called by
nbd_receive_query_exports(), which in turn is only called if the
server lacks NBD_OPT_GO but has working option negotiation, and is
merely used as a quality-of-implementation trick since servers
can't give decent errors for NBD_OPT_EXPORT_NAME.  However, servers
that lack NBD_OPT_GO are becoming increasingly rare (nbdkit was a
latecomer, in Aug 2018, but qemu has been such a server since commit
f37708f6 in July 2017 and released in 2.10), so it no longer makes
sense to micro-optimize that function for performance.

Furthermore, when debugging a server's implementation, tracing the
full reply (both names and descriptions) is useful, not to mention
that upcoming patches adding 'qemu-nbd --list' will want to collect
that data.  And when you consider that a server can send an export
name up to the NBD protocol length limit of 4k; but our current
NBD_MAX_NAME_SIZE is only 256, we can't trace all valid server
names without more storage, but 4k is large enough that the heap
is better than the stack for long names.

Thus, I'm changing the division of labor, with nbd_receive_list()
now always malloc'ing a result on success (the malloc is bounded
by the fact that we reject servers with a reply length larger
than 32M), and moving the comparison to 'wantname' to the caller.

There is a minor change in behavior where a server with 0 exports
(an immediate NBD_REP_ACK reply) is now no longer distinguished
from a server without LIST support (NBD_REP_ERR_UNSUP); this
information could be preserved with a complication to the calling
contract to provide a bit more information, but I didn't see the
point.  After all, the worst that can happen if our guess at a
match is wrong is that the caller will get a cryptic disconnect
when NBD_OPT_EXPORT_NAME fails (which is no different from what
would happen if we had not tried LIST), while treating an empty
list as immediate failure would prevent connecting to really old
servers that really did lack LIST.  Besides, NBD servers with 0
exports are rare (qemu can do it when using QMP nbd-server-start
without nbd-server-add - but qemu understands NBD_OPT_GO and
thus won't tickle this change in behavior).

Fix the spelling of foundExport to match coding standards while
in the area.

Signed-off-by: Eric Blake 
Message-Id: <20181215135324.152629-10-ebl...@redhat.com>
Reviewed-by: Richard W.M. Jones 
Reviewed-by: Vladimir Sementsov-Ogievskiy 

---
v3: comment tweak, s/listEmpty/list_empty/,
s/foundExport/found_export/ [Vladimir]
---
 nbd/client.c | 91 ++--
 nbd/trace-events |  1 +
 2 files changed, 59 insertions(+), 33 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index f625c207c54..fd4ba8dec37 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -234,18 +234,24 @@ static int nbd_handle_reply_err(QIOChannel *ioc, 
NBDOptionReply *reply,
 return result;
 }

-/* Process another portion of the NBD_OPT_LIST reply.  Set *@match if
- * the current reply matches @want or if the server does not support
- * NBD_OPT_LIST, otherwise leave @match alone.  Return 0 if iteration
- * is complete, positive if more replies are expected, or negative
- * with @errp set if an unrecoverable error occurred. */
-static int nbd_receive_list(QIOChannel *ioc, const char *want, bool *match,
+/* nbd_receive_list:
+ * Process another portion of the NBD_OPT_LIST reply, populating any
+ * name received into *@name. If @description is non-NULL, and the
+ * server provided a description, that is also populated. The caller
+ * must eventually call g_free() on success.
+ * Returns 1 if name and description were set and iteration must continue,
+ * 0 if iteration is complete (including if OPT_LIST unsupported),
+ * -1 with @errp set if an unrecoverable error occurred.
+ */
+static int nbd_receive_list(QIOChannel *ioc, char **name, char **description,
 Error **errp)
 {
+int ret = -1;
 NBDOptionReply reply;
 uint32_t len;
 uint32_t namelen;
-char name[NBD_MAX_NAME_SIZE + 1];
+char *local_name = NULL;
+char *local_desc = NULL;
 int error;

 if (nbd_receive_option_reply(ioc, NBD_OPT_LIST, , errp) < 0) {
@@ -253,9 +259,6 @@ static int nbd_receive_list(QIOChannel *ioc, const char 
*want, bool *match,
 }
 error = nbd_handle_reply_err(ioc, , errp);
 if (error <= 0) {
-/* The server did not support NBD_OPT_LIST, so set *match on
- * the assumption that any name will be accepted.  */
-*match = true;
 return error;
 }
 len = reply.length;
@@ -292,33 +295,38 @@ static int nbd_receive_list(QIOChannel *ioc, const char 
*want, bool *match,
 nbd_send_opt_abort(ioc);
 return -1;
 }
-if (namelen != strlen(want)) {
-if (nbd_drop(ioc, len, errp) < 0) {
-error_prepend(errp,
-  "failed to skip export name with wrong length: ");
-

[Qemu-devel] [PATCH v3 17/19] qemu-nbd: Add --list option

2019-01-12 Thread Eric Blake

We want to be able to detect whether a given qemu NBD server is
exposing the right export(s) and dirty bitmaps, at least for
regression testing.  We could use 'nbd-client -l' from the upstream
NBD project to list exports, but it's annoying to rely on
out-of-tree binaries; furthermore, nbd-client doesn't necessarily
know about all of the qemu NBD extensions.  Thus, it is time to add
a new mode to qemu-nbd that merely sniffs all possible information
from the server during handshake phase, then disconnects and dumps
the information.

This patch actually implements --list/-L, while reusing other
options such as --tls-creds for now designating how to connect
as the client (rather than their non-list usage of how to operate
as the server).

I debated about adding this functionality to something akin to
'qemu-img info' - but that tool does not readily lend itself
to connecting to an arbitrary NBD server without also tying to
a specific export (I may, however, still add ImageInfoSpecificNBD
for reporting the bitmaps available when connecting to a single
export).  And, while it may feel a bit odd that normally
qemu-nbd is a server but 'qemu-nbd -L' is a client, we are not
really making the qemu-nbd binary that much larger, because
'qemu-nbd -c' has to operate as both server and client
simultaneously across two threads when feeding the kernel module
for /dev/nbdN access.

Sample output:
$ qemu-nbd -L
exports available: 1
 export: ''
  size:  65536
  flags: 0x4ed ( flush fua trim zeroes df cache )
  min block: 512
  opt block: 4096
  max block: 33554432
  available meta contexts: 1
   base:allocation

Note that the output only lists sizes if the server sent
NBD_FLAG_HAS_FLAGS, because a newstyle server does not give
the size otherwise.  It has the side effect that for really
old servers that did not send any flags, the size is not
output even though it was available.  However, I'm not too
concerned about that - oldstyle servers are (rightfully)
getting less common to encounter (qemu 3.0 was the last
version where we even serve it), and most existing servers
that still even offer oldstyle negotiation (such as nbdkit)
still send flags (since that was added to the NBD protocol
in 2007 to permit read-only connections).

Not done here, but maybe worth future experiments: capture
the meat of NBDExportInfo into a QAPI struct, and use the
generated QAPI pretty-printers instead of hand-rolling our
output loop.  It would also permit us to add a JSON output
mode for machine parsing.

Signed-off-by: Eric Blake 
Message-Id: <20181215135324.152629-21-ebl...@redhat.com>
Reviewed-by: Richard W.M. Jones 

---
v3: comment tweak [Rich], rebase to earlier changes
---
 qemu-nbd.texi |  27 +++--
 qemu-nbd.c| 155 +-
 2 files changed, 165 insertions(+), 17 deletions(-)

diff --git a/qemu-nbd.texi b/qemu-nbd.texi
index 3f22559beb4..65caeb7874a 100644
--- a/qemu-nbd.texi
+++ b/qemu-nbd.texi
@@ -2,6 +2,8 @@
 @c man begin SYNOPSIS
 @command{qemu-nbd} [OPTION]... @var{filename}

+@command{qemu-nbd} @option{-L} [OPTION]...
+
 @command{qemu-nbd} @option{-d} @var{dev}
 @c man end
 @end example
@@ -14,6 +16,8 @@ Other uses:
 @itemize
 @item
 Bind a /dev/nbdX block device to a QEMU server (on Linux).
+@item
+As a client to query exports of a remote NBD server.
 @end itemize

 @c man end
@@ -31,13 +35,15 @@ See the @code{qemu(1)} manual page for full details of the 
properties
 supported. The common object types that it makes sense to define are the
 @code{secret} object, which is used to supply passwords and/or encryption
 keys, and the @code{tls-creds} object, which is used to supply TLS
-credentials for the qemu-nbd server.
+credentials for the qemu-nbd server or client.
 @item -p, --port=@var{port}
-The TCP port to listen on (default @samp{10809}).
+The TCP port to listen on as a server, or connect to as a client
+(default @samp{10809}).
 @item -o, --offset=@var{offset}
 The offset into the image.
 @item -b, --bind=@var{iface}
-The interface to bind to (default @samp{0.0.0.0}).
+The interface to bind to as a server, or connect to as a client
+(default @samp{0.0.0.0}).
 @item -k, --socket=@var{path}
 Use a unix socket with path @var{path}.
 @item --image-opts
@@ -97,10 +103,14 @@ Set the NBD volume export name (default of a zero-length 
string).
 @item -D, --description=@var{description}
 Set the NBD volume export description, as a human-readable
 string.
+@item -L, --list
+Connect as a client and list all details about the exports exposed by
+a remote NBD server.
 @item --tls-creds=ID
 Enable mandatory TLS encryption for the server by setting the ID
 of the TLS credentials object previously created with the --object
-option.
+option; or provide the credentials needed for connecting as a client
+in list mode.
 @item --fork
 Fork off the server process and exit the parent once the server is running.
 @item -v, --verbose
@@ -159,6 +169,15 @@ qemu-nbd -c /dev/nbd0 -f qcow2 file.qcow2
 qemu-nbd

[Qemu-devel] [PATCH v3 18/19] nbd/client: Work around 3.0 bug for listing meta contexts

2019-01-12 Thread Eric Blake

Commit 3d068aff forgot to advertise available qemu: contexts
when the client requests a list with 0 queries. Furthermore,
3.0 shipped with a qemu-img hack of x-dirty-bitmap (commit
216ee365) that _silently_ acts as though the entire image is
clean if a requested bitmap is not present.  Both bugs have
been recently fixed, so that a modern qemu server gives full
context output right away, and the client refuses a
connection if a requested x-dirty-bitmap was not found.

Still, it is likely that there will be users that have to
work with a mix of old and new qemu versions, depending on
which features get backported where, at which point being
able to rely on 'qemu-img --list' output to know for sure
whether a given NBD export has the desired dirty bitmap is
much nicer than blindly connecting and risking that the
entire image may appear clean.  We can make our --list code
smart enough to work around buggy servers by tracking
whether we've seen any qemu: replies in the original 0-query
list; if not, repeat with a single query on "qemu:" (which
may still have no replies, but then we know for sure we
didn't trip up on the server bug).

Signed-off-by: Eric Blake 
Message-Id: <20181215135324.152629-22-ebl...@redhat.com>
---
 nbd/client.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/nbd/client.c b/nbd/client.c
index 2001e6e8160..64f3e45edd4 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -21,6 +21,7 @@
 #include "qapi/error.h"
 #include "trace.h"
 #include "nbd-internal.h"
+#include "qemu/cutils.h"

 /* Definitions for opaque data types */

@@ -828,6 +829,8 @@ static int nbd_list_meta_contexts(QIOChannel *ioc,
   Error **errp)
 {
 int ret;
+int seen_any = false;
+int seen_qemu = false;

 if (nbd_send_one_meta_context(ioc, NBD_OPT_LIST_META_CONTEXT,
   info->name, NULL, errp) < 0) {
@@ -839,9 +842,25 @@ static int nbd_list_meta_contexts(QIOChannel *ioc,

 ret = nbd_receive_one_meta_context(ioc, NBD_OPT_LIST_META_CONTEXT,
, NULL, errp);
+if (ret == 0 && seen_any && !seen_qemu) {
+/*
+ * Work around qemu 3.0 bug: the server forgot to send
+ * "qemu:" replies to 0 queries. If we saw at least one
+ * reply (probably base:allocation), but none of them were
+ * qemu:, then run a more specific query to make sure.
+ */
+seen_qemu = true;
+if (nbd_send_one_meta_context(ioc, NBD_OPT_LIST_META_CONTEXT,
+  info->name, "qemu:", errp) < 0) {
+return -1;
+}
+continue;
+}
 if (ret <= 0) {
 return ret;
 }
+seen_any = true;
+seen_qemu |= strstart(context, "qemu:", NULL);
 info->contexts = g_renew(char *, info->contexts, ++info->n_contexts);
 info->contexts[info->n_contexts - 1] = context;
 }
-- 
2.20.1

Re: [Qemu-devel] block: Update flags in bdrv_set_read_only()

2019-01-12 Thread Michael Tokarev


commit eeae6a596b0efc092f5101c67683053e245e6250
Author: Kevin Wolf 
Date:   Tue Oct 9 16:57:12 2018 +0200

block: Update flags in bdrv_set_read_only()

To fully change the read-only state of a node, we must not only change
bs->read_only, but also update bs->open_flags.

sort of broke vfat support:

 $ qemu-system-x86_64 -hda fat:foo/
 WARNING: Image format was not specified for 'json:{"fat-type": 0, "dir": "foo/", "driver": 
"vvfat", "floppy": false, "rw": false}' and probing guessed raw.
  Automatically detecting the format is dangerous for raw images, write 
operations on block 0 will be restricted.
  Specify the 'raw' format explicitly to remove the restrictions.
 qemu-system-x86_64: Initialization of device ide-hd failed: Block node is 
read-only
 $ _

The warning is annoying but harmless, but the read-only error is fatal.

"Sort-of" is because there's a somewhat strange workaround:

  -hda fat:rw:foo/

but it is a bit more dangerous as well.

It looks like vfat should be handled differently somewhere, to
eliminate both the warning and the error?

Thanks,

/mjt

[Qemu-devel] 3.1: second invocation of migrate crashes qemu

2019-01-12 Thread Michael Tokarev


$ qemu-system-x86_64 -monitor stdio -hda foo.img
QEMU 3.1.0 monitor - type 'help' for more information
(qemu) stop
(qemu) migrate "exec:cat >/dev/null"
(qemu) migrate "exec:cat >/dev/null"
qemu-system-x86_64: /build/qemu/qemu-3.1/block.c:4647: bdrv_inactivate_recurse: 
Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
Aborted

(it is irrelevant what's in foo.img, it only needs to be initialized).

If it is worth to bisect I'll do that tomorrow.

Thanks,

/mjt

Re: [Qemu-devel] [PATCH] hw/pvrdma: Post CQE when receive invalid gid index

2019-01-12 Thread Marcel Apfelbaum





On 1/9/19 10:15 PM, Yuval Shaia wrote:

This error should propagate back to guest.

Signed-off-by: Yuval Shaia 
---
  hw/rdma/rdma_backend.h  | 1 +
  hw/rdma/vmw/pvrdma_qp_ops.c | 6 --
  2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/rdma/rdma_backend.h b/hw/rdma/rdma_backend.h
index a9ba40ae48..5114c90e67 100644
--- a/hw/rdma/rdma_backend.h
+++ b/hw/rdma/rdma_backend.h
@@ -32,6 +32,7 @@
  #define VENDOR_ERR_INVLKEY  0x207
  #define VENDOR_ERR_MR_SMALL 0x208
  #define VENDOR_ERR_INV_MAD_BUFF 0x209
+#define VENDOR_ERR_INV_GID_IDX  0x210
  
  /* Add definition for QP0 and QP1 as there is no userspace enums for them */

  enum ibv_special_qp_type {
diff --git a/hw/rdma/vmw/pvrdma_qp_ops.c b/hw/rdma/vmw/pvrdma_qp_ops.c
index 465bee8641..0565eba981 100644
--- a/hw/rdma/vmw/pvrdma_qp_ops.c
+++ b/hw/rdma/vmw/pvrdma_qp_ops.c
@@ -178,7 +178,8 @@ int pvrdma_qp_send(PVRDMADev *dev, uint32_t qp_handle)
  sgid = rdma_rm_get_gid(>rdma_dev_res, 
wqe->hdr.wr.ud.av.gid_index);
  if (!sgid) {
  pr_dbg("Fail to get gid for idx %d\n", 
wqe->hdr.wr.ud.av.gid_index);
-return -EIO;
+complete_with_error(VENDOR_ERR_INV_GID_IDX, comp_ctx);


Informing the guest is OK, but are you sure it makes sense
to continue sending the other work requests?
Is maybe safer to return with error as before?


+continue;
  }
  pr_dbg("sgid_id=%d, sgid=0x%llx\n", wqe->hdr.wr.ud.av.gid_index,
 sgid->global.interface_id);
@@ -189,7 +190,8 @@ int pvrdma_qp_send(PVRDMADev *dev, uint32_t qp_handle)
  if (sgid_idx <= 0) {
  pr_dbg("Fail to get bk sgid_idx for sgid_idx %d\n",
 wqe->hdr.wr.ud.av.gid_index);
-return -EIO;
+complete_with_error(VENDOR_ERR_INV_GID_IDX, comp_ctx);
+continue;


Same question here.

Thanks,
Marcel

  }
  
  if (wqe->hdr.num_sge > dev->dev_attr.max_sge) {

Re: [Qemu-devel] [PATCH] hw/rdma: Delete unused struct member

2019-01-12 Thread Marcel Apfelbaum





On 1/9/19 10:19 PM, Yuval Shaia wrote:

This member is used only in init_device_caps function, make it local.

Signed-off-by: Yuval Shaia 
---
  hw/rdma/rdma_backend.c  | 26 ++
  hw/rdma/rdma_backend_defs.h |  1 -
  2 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index 16dca69ee9..b49edaacaf 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -917,23 +917,25 @@ void rdma_backend_destroy_qp(RdmaBackendQP *qp)
  static int init_device_caps(RdmaBackendDev *backend_dev,
  struct ibv_device_attr *dev_attr)
  {
-if (ibv_query_device(backend_dev->context, _dev->dev_attr)) {
+struct ibv_device_attr bk_dev_attr;
+
+if (ibv_query_device(backend_dev->context, _dev_attr)) {
  return -EIO;
  }
  
  dev_attr->max_sge = MAX_SGE;
  
-CHK_ATTR(dev_attr, backend_dev->dev_attr, max_mr_size, "%" PRId64);

-CHK_ATTR(dev_attr, backend_dev->dev_attr, max_qp, "%d");
-CHK_ATTR(dev_attr, backend_dev->dev_attr, max_sge, "%d");
-CHK_ATTR(dev_attr, backend_dev->dev_attr, max_qp_wr, "%d");
-CHK_ATTR(dev_attr, backend_dev->dev_attr, max_cq, "%d");
-CHK_ATTR(dev_attr, backend_dev->dev_attr, max_cqe, "%d");
-CHK_ATTR(dev_attr, backend_dev->dev_attr, max_mr, "%d");
-CHK_ATTR(dev_attr, backend_dev->dev_attr, max_pd, "%d");
-CHK_ATTR(dev_attr, backend_dev->dev_attr, max_qp_rd_atom, "%d");
-CHK_ATTR(dev_attr, backend_dev->dev_attr, max_qp_init_rd_atom, "%d");
-CHK_ATTR(dev_attr, backend_dev->dev_attr, max_ah, "%d");
+CHK_ATTR(dev_attr, bk_dev_attr, max_mr_size, "%" PRId64);
+CHK_ATTR(dev_attr, bk_dev_attr, max_qp, "%d");
+CHK_ATTR(dev_attr, bk_dev_attr, max_sge, "%d");
+CHK_ATTR(dev_attr, bk_dev_attr, max_qp_wr, "%d");
+CHK_ATTR(dev_attr, bk_dev_attr, max_cq, "%d");
+CHK_ATTR(dev_attr, bk_dev_attr, max_cqe, "%d");
+CHK_ATTR(dev_attr, bk_dev_attr, max_mr, "%d");
+CHK_ATTR(dev_attr, bk_dev_attr, max_pd, "%d");
+CHK_ATTR(dev_attr, bk_dev_attr, max_qp_rd_atom, "%d");
+CHK_ATTR(dev_attr, bk_dev_attr, max_qp_init_rd_atom, "%d");
+CHK_ATTR(dev_attr, bk_dev_attr, max_ah, "%d");
  
  return 0;

  }
diff --git a/hw/rdma/rdma_backend_defs.h b/hw/rdma/rdma_backend_defs.h
index 1e5c3dd3bf..15ae8b970e 100644
--- a/hw/rdma/rdma_backend_defs.h
+++ b/hw/rdma/rdma_backend_defs.h
@@ -41,7 +41,6 @@ typedef struct RdmaCmMux {
  } RdmaCmMux;
  
  typedef struct RdmaBackendDev {

-struct ibv_device_attr dev_attr;
  RdmaBackendThread comp_thread;
  PCIDevice *dev;
  RdmaDeviceResources *rdma_dev_res;


Reviewed-by: Marcel Apfelbaum

Thanks,
Marcel

Re: [Qemu-devel] [PATCH] hw/pvrdma: Remove max-sge command-line param

2019-01-12 Thread Marcel Apfelbaum





On 1/9/19 9:41 PM, Yuval Shaia wrote:

This parameter has no effect, fix it.

The function init_dev_caps sets the front-end's max-sge to MAX_SGE. Then
it checks backend's max-sge and adjust it accordingly (we can't send
more than what the device supports).

On send and recv we need to make sure the num_sge in the WQE does not
exceeds the backend device capability.
This check is done in pvrdma level so check on rdma level is deleted.


I think it makes sense to match max-sge with the underlying device
capability.

Reviewed-by: Marcel Apfelbaum
Thanks,
Marcel




Signed-off-by: Yuval Shaia 
---
  docs/pvrdma.txt |  1 -
  hw/rdma/rdma_backend.c  | 23 ++-
  hw/rdma/rdma_backend.h  | 11 +++
  hw/rdma/vmw/pvrdma_main.c   | 10 +-
  hw/rdma/vmw/pvrdma_qp_ops.c | 24 
  5 files changed, 42 insertions(+), 27 deletions(-)

diff --git a/docs/pvrdma.txt b/docs/pvrdma.txt
index 5175251b47..5973c0c68b 100644
--- a/docs/pvrdma.txt
+++ b/docs/pvrdma.txt
@@ -153,7 +153,6 @@ Ethernet function can be used for other Ethernet purposes 
such as IP.
specify the port to use. If not set 1 will be used.
  - dev-caps-max-mr-size: The maximum size of MR.
  - dev-caps-max-qp:  Maximum number of QPs.
-- dev-caps-max-sge: Maximum number of SGE elements in WR.
  - dev-caps-max-cq:  Maximum number of CQs.
  - dev-caps-max-mr:  Maximum number of MRs.
  - dev-caps-max-pd:  Maximum number of PDs.
diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index c28bfbd44d..16dca69ee9 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -32,17 +32,6 @@
  #include "rdma_rm.h"
  #include "rdma_backend.h"
  
-/* Vendor Errors */

-#define VENDOR_ERR_FAIL_BACKEND 0x201
-#define VENDOR_ERR_TOO_MANY_SGES0x202
-#define VENDOR_ERR_NOMEM0x203
-#define VENDOR_ERR_QP0  0x204
-#define VENDOR_ERR_INV_NUM_SGE  0x205
-#define VENDOR_ERR_MAD_SEND 0x206
-#define VENDOR_ERR_INVLKEY  0x207
-#define VENDOR_ERR_MR_SMALL 0x208
-#define VENDOR_ERR_INV_MAD_BUFF 0x209
-
  #define THR_NAME_LEN 16
  #define THR_POLL_TO  5000
  
@@ -475,11 +464,6 @@ void rdma_backend_post_send(RdmaBackendDev *backend_dev,

  }
  
  pr_dbg("num_sge=%d\n", num_sge);

-if (!num_sge || num_sge > MAX_SGE) {
-pr_dbg("invalid num_sge=%d\n", num_sge);
-complete_work(IBV_WC_GENERAL_ERR, VENDOR_ERR_INV_NUM_SGE, ctx);
-return;
-}
  
  bctx = g_malloc0(sizeof(*bctx));

  bctx->up_ctx = ctx;
@@ -602,11 +586,6 @@ void rdma_backend_post_recv(RdmaBackendDev *backend_dev,
  }
  
  pr_dbg("num_sge=%d\n", num_sge);

-if (!num_sge || num_sge > MAX_SGE) {
-pr_dbg("invalid num_sge=%d\n", num_sge);
-complete_work(IBV_WC_GENERAL_ERR, VENDOR_ERR_INV_NUM_SGE, ctx);
-return;
-}
  
  bctx = g_malloc0(sizeof(*bctx));

  bctx->up_ctx = ctx;
@@ -942,6 +921,8 @@ static int init_device_caps(RdmaBackendDev *backend_dev,
  return -EIO;
  }
  
+dev_attr->max_sge = MAX_SGE;

+
  CHK_ATTR(dev_attr, backend_dev->dev_attr, max_mr_size, "%" PRId64);
  CHK_ATTR(dev_attr, backend_dev->dev_attr, max_qp, "%d");
  CHK_ATTR(dev_attr, backend_dev->dev_attr, max_sge, "%d");
diff --git a/hw/rdma/rdma_backend.h b/hw/rdma/rdma_backend.h
index 8cae40f827..a9ba40ae48 100644
--- a/hw/rdma/rdma_backend.h
+++ b/hw/rdma/rdma_backend.h
@@ -22,6 +22,17 @@
  #include "rdma_rm_defs.h"
  #include "rdma_backend_defs.h"
  
+/* Vendor Errors */

+#define VENDOR_ERR_FAIL_BACKEND 0x201
+#define VENDOR_ERR_TOO_MANY_SGES0x202
+#define VENDOR_ERR_NOMEM0x203
+#define VENDOR_ERR_QP0  0x204
+#define VENDOR_ERR_INV_NUM_SGE  0x205
+#define VENDOR_ERR_MAD_SEND 0x206
+#define VENDOR_ERR_INVLKEY  0x207
+#define VENDOR_ERR_MR_SMALL 0x208
+#define VENDOR_ERR_INV_MAD_BUFF 0x209
+
  /* Add definition for QP0 and QP1 as there is no userspace enums for them */
  enum ibv_special_qp_type {
  IBV_QPT_SMI = 0,
diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
index 838ad8a949..d2bdb5ba8c 100644
--- a/hw/rdma/vmw/pvrdma_main.c
+++ b/hw/rdma/vmw/pvrdma_main.c
@@ -43,7 +43,6 @@ static Property pvrdma_dev_properties[] = {
  DEFINE_PROP_UINT64("dev-caps-max-mr-size", PVRDMADev, 
dev_attr.max_mr_size,
 MAX_MR_SIZE),
  DEFINE_PROP_INT32("dev-caps-max-qp", PVRDMADev, dev_attr.max_qp, MAX_QP),
-DEFINE_PROP_INT32("dev-caps-max-sge", PVRDMADev, dev_attr.max_sge, 
MAX_SGE),
  DEFINE_PROP_INT32("dev-caps-max-cq", PVRDMADev, dev_attr.max_cq, MAX_CQ),
  DEFINE_PROP_INT32("dev-caps-max-mr", PVRDMADev, dev_attr.max_mr, MAX_MR),
  DEFINE_PROP_INT32("dev-caps-max-pd", PVRDMADev, dev_attr.max_pd, MAX_PD),
@@ -549,8 +548,9 @@ static void init_dev_caps(PVRDMADev *dev)
 sizeof(struct pvrdma_rq_wqe_hdr));
  
  dev->dev_attr.max_qp_wr

Re: [Qemu-devel] [PATCH] hw/pvrdma: Make function pvrdma_qp_send/recv return void.

2019-01-12 Thread Marcel Apfelbaum





On 1/9/19 10:21 PM, Yuval Shaia wrote:

The functions handles errors internaly, callers have nothing to do with
the return value.

Signed-off-by: Yuval Shaia 
---
  hw/rdma/vmw/pvrdma_qp_ops.c | 14 ++
  hw/rdma/vmw/pvrdma_qp_ops.h |  4 ++--
  2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/hw/rdma/vmw/pvrdma_qp_ops.c b/hw/rdma/vmw/pvrdma_qp_ops.c
index 0565eba981..ce5a60e184 100644
--- a/hw/rdma/vmw/pvrdma_qp_ops.c
+++ b/hw/rdma/vmw/pvrdma_qp_ops.c
@@ -143,7 +143,7 @@ int pvrdma_qp_ops_init(void)
  return 0;
  }
  
-int pvrdma_qp_send(PVRDMADev *dev, uint32_t qp_handle)

+void pvrdma_qp_send(PVRDMADev *dev, uint32_t qp_handle)
  {
  RdmaRmQP *qp;
  PvrdmaSqWqe *wqe;
@@ -155,7 +155,8 @@ int pvrdma_qp_send(PVRDMADev *dev, uint32_t qp_handle)
  
  qp = rdma_rm_get_qp(>rdma_dev_res, qp_handle);

  if (unlikely(!qp)) {
-return -EINVAL;
+pr_dbg("Invalid qpn\n");
+return;
  }
  
  ring = (PvrdmaRing *)qp->opaque;

@@ -212,11 +213,9 @@ int pvrdma_qp_send(PVRDMADev *dev, uint32_t qp_handle)
  
  wqe = pvrdma_ring_next_elem_read(ring);

  }
-
-return 0;
  }
  
-int pvrdma_qp_recv(PVRDMADev *dev, uint32_t qp_handle)

+void pvrdma_qp_recv(PVRDMADev *dev, uint32_t qp_handle)
  {
  RdmaRmQP *qp;
  PvrdmaRqWqe *wqe;
@@ -226,7 +225,8 @@ int pvrdma_qp_recv(PVRDMADev *dev, uint32_t qp_handle)
  
  qp = rdma_rm_get_qp(>rdma_dev_res, qp_handle);

  if (unlikely(!qp)) {
-return -EINVAL;
+pr_dbg("Invalid qpn\n");
+return;
  }
  
  ring = &((PvrdmaRing *)qp->opaque)[1];

@@ -262,8 +262,6 @@ int pvrdma_qp_recv(PVRDMADev *dev, uint32_t qp_handle)
  
  wqe = pvrdma_ring_next_elem_read(ring);

  }
-
-return 0;
  }
  
  void pvrdma_cq_poll(RdmaDeviceResources *dev_res, uint32_t cq_handle)

diff --git a/hw/rdma/vmw/pvrdma_qp_ops.h b/hw/rdma/vmw/pvrdma_qp_ops.h
index ac46bf7fdf..31cb48ba29 100644
--- a/hw/rdma/vmw/pvrdma_qp_ops.h
+++ b/hw/rdma/vmw/pvrdma_qp_ops.h
@@ -20,8 +20,8 @@
  
  int pvrdma_qp_ops_init(void);

  void pvrdma_qp_ops_fini(void);
-int pvrdma_qp_send(PVRDMADev *dev, uint32_t qp_handle);
-int pvrdma_qp_recv(PVRDMADev *dev, uint32_t qp_handle);
+void pvrdma_qp_send(PVRDMADev *dev, uint32_t qp_handle);
+void pvrdma_qp_recv(PVRDMADev *dev, uint32_t qp_handle);
  void pvrdma_cq_poll(RdmaDeviceResources *dev_res, uint32_t cq_handle);
  
  #endif


Reviewed-by: Marcel Apfelbaum

Thanks,
Marcel

[Qemu-devel] [PATCH 2/3] hw/rdma: modify struct initialization

2019-01-12 Thread Marcel Apfelbaum

Do not initialize structs with {0} since some
CLANG versions do not support it.

Use memset instead.

Signed-off-by: Marcel Apfelbaum 
---
 contrib/rdmacm-mux/main.c | 12 +---
 hw/rdma/rdma_backend.c| 16 
 2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/contrib/rdmacm-mux/main.c b/contrib/rdmacm-mux/main.c
index 64676030c5..d01dc76927 100644
--- a/contrib/rdmacm-mux/main.c
+++ b/contrib/rdmacm-mux/main.c
@@ -350,9 +350,11 @@ static int get_fd(const char *mad, int *fd, __be64 
*gid_ifid)
 static void *umad_recv_thread_func(void *args)
 {
 int rc;
-RdmaCmMuxMsg msg = {0};
+RdmaCmMuxMsg msg;
 int fd = -2;
 
+memset(, 0, sizeof(msg));
+
 msg.hdr.msg_type = RDMACM_MUX_MSG_TYPE_REQ;
 msg.hdr.op_code = RDMACM_MUX_OP_CODE_MAD;
 
@@ -387,11 +389,13 @@ static void *umad_recv_thread_func(void *args)
 static int read_and_process(int fd)
 {
 int rc;
-RdmaCmMuxMsg msg = {0};
+RdmaCmMuxMsg msg;
 struct umad_hdr *hdr;
 uint32_t *comm_id = 0;
 uint16_t attr_id;
 
+memset(, 0, sizeof(msg));
+
 rc = recv(fd, , sizeof(msg), 0);
 syslog(LOG_DEBUG, "Socket %d, recv %d\n", fd, rc);
 
@@ -744,7 +748,9 @@ static void signal_handler(int sig, siginfo_t *siginfo, 
void *context)
 static int init(void)
 {
 int rc;
-struct sigaction sig = {0};
+struct sigaction sig;
+
+memset(, 0, sizeof(sig));
 
 rc = init_listener();
 if (rc) {
diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index c28bfbd44d..92e95aa640 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -190,9 +190,11 @@ static inline int 
rdmacm_mux_can_process_async(RdmaBackendDev *backend_dev)
 
 static int check_mux_op_status(CharBackend *mad_chr_be)
 {
-RdmaCmMuxMsg msg = {0};
+RdmaCmMuxMsg msg;
 int ret;
 
+memset(, 0, sizeof(msg));
+
 pr_dbg("Reading response\n");
 ret = qemu_chr_fe_read_all(mad_chr_be, (uint8_t *), sizeof(msg));
 if (ret != sizeof(msg)) {
@@ -387,10 +389,12 @@ static int build_host_sge_array(RdmaDeviceResources 
*rdma_dev_res,
 static int mad_send(RdmaBackendDev *backend_dev, uint8_t sgid_idx,
 union ibv_gid *sgid, struct ibv_sge *sge, uint32_t num_sge)
 {
-RdmaCmMuxMsg msg = {0};
+RdmaCmMuxMsg msg;
 char *hdr, *data;
 int ret;
 
+memset(, 0, sizeof(msg));
+
 pr_dbg("num_sge=%d\n", num_sge);
 
 if (num_sge != 2) {
@@ -1112,9 +1116,11 @@ int rdma_backend_get_gid_index(RdmaBackendDev 
*backend_dev,
 int rdma_backend_add_gid(RdmaBackendDev *backend_dev, const char *ifname,
  union ibv_gid *gid)
 {
-RdmaCmMuxMsg msg = {0};
+RdmaCmMuxMsg msg;
 int ret;
 
+memset(, 0, sizeof(msg));
+
 pr_dbg("0x%llx, 0x%llx\n",
(long long unsigned int)be64_to_cpu(gid->global.subnet_prefix),
(long long unsigned int)be64_to_cpu(gid->global.interface_id));
@@ -1138,9 +1144,11 @@ int rdma_backend_add_gid(RdmaBackendDev *backend_dev, 
const char *ifname,
 int rdma_backend_del_gid(RdmaBackendDev *backend_dev, const char *ifname,
  union ibv_gid *gid)
 {
-RdmaCmMuxMsg msg = {0};
+RdmaCmMuxMsg msg;
 int ret;
 
+memset(, 0, sizeof(msg));
+
 pr_dbg("0x%llx, 0x%llx\n",
(long long unsigned int)be64_to_cpu(gid->global.subnet_prefix),
(long long unsigned int)be64_to_cpu(gid->global.interface_id));
-- 
2.17.1

[Qemu-devel] [PATCH 1/3] contrib/rdmacm-mux: remove Wno-format-truncation flag

2019-01-12 Thread Marcel Apfelbaum

The flag is not recognized by some CLANG versions.
Add proper constraints in code instead.

Signed-off-by: Marcel Apfelbaum 
---
 contrib/rdmacm-mux/Makefile.objs | 2 +-
 contrib/rdmacm-mux/main.c| 6 --
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/contrib/rdmacm-mux/Makefile.objs b/contrib/rdmacm-mux/Makefile.objs
index be3eacb6f7..e1ff4fe569 100644
--- a/contrib/rdmacm-mux/Makefile.objs
+++ b/contrib/rdmacm-mux/Makefile.objs
@@ -1,4 +1,4 @@
 ifdef CONFIG_PVRDMA
-CFLAGS += -libumad -Wno-format-truncation
+CFLAGS += -libumad
 rdmacm-mux-obj-y = main.o
 endif
diff --git a/contrib/rdmacm-mux/main.c b/contrib/rdmacm-mux/main.c
index 835a7f9214..64676030c5 100644
--- a/contrib/rdmacm-mux/main.c
+++ b/contrib/rdmacm-mux/main.c
@@ -42,6 +42,8 @@
 
 /* The below can be override by command line parameter */
 #define UNIX_SOCKET_PATH "/var/run/rdmacm-mux"
+/* Has format %s-%s-%d" -- */
+#define SOCKET_PATH_MAX (PATH_MAX - NAME_MAX - sizeof(int) - 2)
 #define RDMA_PORT_NUM 1
 
 typedef struct RdmaCmServerArgs {
@@ -95,7 +97,7 @@ static void help(const char *progname)
 static void parse_args(int argc, char *argv[])
 {
 int c;
-char unix_socket_path[PATH_MAX];
+char unix_socket_path[SOCKET_PATH_MAX];
 
 strcpy(server.args.rdma_dev_name, "");
 strcpy(unix_socket_path, UNIX_SOCKET_PATH);
@@ -113,7 +115,7 @@ static void parse_args(int argc, char *argv[])
 
 case 's':
 /* This is temporary, final name will build below */
-strncpy(unix_socket_path, optarg, PATH_MAX);
+strncpy(unix_socket_path, optarg, SOCKET_PATH_MAX);
 break;
 
 case 'p':
-- 
2.17.1

[Qemu-devel] [PATCH 3/3] contrib/rdmacm-mux: fix clang compilation

2019-01-12 Thread Marcel Apfelbaum

Fix Commit a5d2f6f877 (contrib/rdmacm-mux: Add implementation
   of RDMA User MAD multiplexer).

The above commit introduces a new contrib target, adding a global dependency
to libumad library in case pvrdma configuration option is enabled.
Clang forbids it:
clang-6.0: error: -libumad: 'linker' input unused
  [-Werror,-Wunused-command-line-argument]

Fix by limiting the scope to the rdmacm-mux target itself.

Reported-by: Cornelia Huck 
Signed-off-by: Marcel Apfelbaum 
---
 Makefile | 2 ++
 contrib/rdmacm-mux/Makefile.objs | 1 -
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index a9ac16d94e..31e87e0c2d 100644
--- a/Makefile
+++ b/Makefile
@@ -580,6 +580,8 @@ vhost-user-scsi$(EXESUF): $(vhost-user-scsi-obj-y) 
libvhost-user.a
$(call LINK, $^)
 vhost-user-blk$(EXESUF): $(vhost-user-blk-obj-y) libvhost-user.a
$(call LINK, $^)
+
+rdmacm-mux$(EXESUF): LIBS += "-libumad"
 rdmacm-mux$(EXESUF): $(rdmacm-mux-obj-y) $(COMMON_LDADDS)
$(call LINK, $^)
 
diff --git a/contrib/rdmacm-mux/Makefile.objs b/contrib/rdmacm-mux/Makefile.objs
index e1ff4fe569..3df744af89 100644
--- a/contrib/rdmacm-mux/Makefile.objs
+++ b/contrib/rdmacm-mux/Makefile.objs
@@ -1,4 +1,3 @@
 ifdef CONFIG_PVRDMA
-CFLAGS += -libumad
 rdmacm-mux-obj-y = main.o
 endif
-- 
2.17.1

[Qemu-devel] [PATCH 0/3] contrib/rdmacm-mux: fix clang compilation

2019-01-12 Thread Marcel Apfelbaum

Fix Commit a5d2f6f877 (contrib/rdmacm-mux: Add implementation
   of RDMA User MAD multiplexer).

The above commit introduces a new contrib target, adding a global dependency
to libumad library in case pvrdma configuration option is enabled.
Clang forbids it:
clang-6.0: error: -libumad: 'linker' input unused
  [-Werror,-Wunused-command-line-argument]

Fix by limiting the scope to the rdmacm-mux target itself.
Fix related clang errors while at it.

Marcel Apfelbaum (3):
  contrib/rdmacm-mux: remove Wno-format-truncation flag
  hw/rdma: modify struct initialization
  contrib/rdmacm-mux: fix clang compilation

 Makefile |  2 ++
 contrib/rdmacm-mux/Makefile.objs |  1 -
 contrib/rdmacm-mux/main.c| 18 +-
 hw/rdma/rdma_backend.c   | 16 
 4 files changed, 27 insertions(+), 10 deletions(-)

-- 
2.17.1

[Qemu-devel] QEMU 3.1 - Getting Instruction Traces

2019-01-12 Thread Dhruv Agarwal

Dear Developers,

I am an undergraduate student trying to generate *a trace of all
instructions executed by the QEMU ARM emulator when it runs a given binary*.

I have been using the* following command* (from an out-of-tree build from
source code):
arm-linux-user/qemu-arm -d in_asm ~/Desktop/hello
(hello is the name of my binary cross-compiled for ARM
using gcc-arm-none-eabi on an x86 host machine).

By doing this, I do get a list of translation blocks that are executed by
the emulator, however I was told by a colleague that this trace/log is
incomplete because of the following reason: the translation blocks already
in the translation cache are not translated again, and hence are not logged
to the screen by QEMU.

Hence, I am now stuck and r*equire a way to generate a complete instruction
trace of all instructions that are executed by the ARM emulator for a given
binary*. Has anyone done the same earlier and can guide me on how to get
this trace?

Thank you and best regards,
Dhruv

Re: [Qemu-devel] [PATCH PULL 02/31] contrib/rdmacm-mux: Add implementation of RDMA User MAD multiplexer

2019-01-12 Thread Marcel Apfelbaum





On 1/8/19 11:18 AM, Cornelia Huck wrote:

On Tue, 8 Jan 2019 10:41:38 +0200
Marcel Apfelbaum  wrote:


On 1/7/19 7:54 PM, Cornelia Huck wrote:

On Mon, 7 Jan 2019 19:28:10 +0200
Marcel Apfelbaum  wrote:
  

On 1/3/19 12:34 PM, Cornelia Huck wrote:

clang-6.0: error: -libumad: 'linker' input unused 
[-Werror,-Wunused-command-line-argument]

Is really strange, the rdma-mux is part of the contrib directory
and is not even compiled by default.
  

Explicitly passing --disable-pvrdma to configure disables the offending
code.

Let me know if you need more information.

Can you please send the exact steps you are using to configure and
compile QEMU,
are you compiling on a x86 machine?

Yes, this is on my laptop (up-to-date F28). I can reproduce on current
master, configured with

../configure --target-list="s390x-softmmu s390x-linux-user i386-softmmu 
x86_64-softmmu cris-softmmu arm-softmmu ppc64-softmmu" --enable-linux-aio 
--enable-virtfs --enable-trace-backends=simple --cc=clang --enable-rdma --enable-pvrdma

It compiles if I pass --disable-pvrdma instead. It also compiles if I
drop --cc=clang.

I'm building in a build subdirectory; I have already tried purging it
before rebuilding, without luck.

clang --version says

clang version 6.0.1 (tags/RELEASE_601/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

Thanks for the information, I will look into it.
Marcel

Thanks.

Another data point: I see the same on a Fedora 29 on s390x, so this
does not seem to be a broken local setup (unless I managed to mess up
both systems in the same way...) clang --version says

clang version 7.0.0 (Fedora 7.0.0-2.fc29)
Target: s390x-ibm-linux
Thread model: posix
InstalledDir: /usr/bin


Hi Cornelia,
I reproduced the issue and found the root cause:
   contrib/rdmacm-mux/Makefile.objs:
       ifdef CONFIG_PVRDMA
       CFLAGS += -libumad -Wno-format-truncation
   rdmacm-mux-obj-y = main.o
   endif

The above file adds libumad dependency for all project targets even if 
not needed.

Gcc allows that while clang forbids it. (is strange it passed patchew tests)

I will come up with a fix soon.

Thanks,
Marcel

[Qemu-devel] [Bug 1811499] [NEW] qemu/net/colo-compare.c:288: possible pointless code duplication ?

2019-01-12 Thread dcb

Public bug reported:

qemu/net/colo-compare.c:288] -> [qemu/net/colo-compare.c:296]: (style)
The if condition is the same as the previous if condition

Source code is

if (ppkt->tcp_seq == spkt->tcp_seq && ppkt->seq_end == spkt->seq_end) {
if (colo_compare_packet_payload(ppkt, spkt,
ppkt->header_size, spkt->header_size,
ppkt->payload_size)) {
*mark = COLO_COMPARE_FREE_SECONDARY | COLO_COMPARE_FREE_PRIMARY;
return true;
}
}
if (ppkt->tcp_seq == spkt->tcp_seq && ppkt->seq_end == spkt->seq_end) {
if (colo_compare_packet_payload(ppkt, spkt,
ppkt->header_size, spkt->header_size,
ppkt->payload_size)) {
*mark = COLO_COMPARE_FREE_SECONDARY | COLO_COMPARE_FREE_PRIMARY;
return true;
}
}

Maybe the second block was supposed to be different ?

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1811499

Title:
  qemu/net/colo-compare.c:288: possible pointless code duplication ?

Status in QEMU:
  New

Bug description:
  qemu/net/colo-compare.c:288] -> [qemu/net/colo-compare.c:296]: (style)
  The if condition is the same as the previous if condition

  Source code is

  if (ppkt->tcp_seq == spkt->tcp_seq && ppkt->seq_end == spkt->seq_end) {
  if (colo_compare_packet_payload(ppkt, spkt,
  ppkt->header_size, spkt->header_size,
  ppkt->payload_size)) {
  *mark = COLO_COMPARE_FREE_SECONDARY | COLO_COMPARE_FREE_PRIMARY;
  return true;
  }
  }
  if (ppkt->tcp_seq == spkt->tcp_seq && ppkt->seq_end == spkt->seq_end) {
  if (colo_compare_packet_payload(ppkt, spkt,
  ppkt->header_size, spkt->header_size,
  ppkt->payload_size)) {
  *mark = COLO_COMPARE_FREE_SECONDARY | COLO_COMPARE_FREE_PRIMARY;
  return true;
  }
  }

  Maybe the second block was supposed to be different ?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1811499/+subscriptions

[Qemu-devel] [PATCH] KVM: MMU: fast cleanup D bit based on fast write protect

2019-01-12 Thread Zhuangyanying

From: Zhuang Yanying 

Recently I tested live-migration with large-memory guests, find vcpu may hang 
for a long time while starting migration, such as 9s for 
2048G(linux-4.20.1+qemu-3.1.0).
The reason is memory_global_dirty_log_start() taking too long, and the vcpu is 
waiting for BQL. The page-by-page D bit clearup is the main time consumption.
I think that the idea of "KVM: MMU: fast write protect" by xiaoguangrong, 
especially the function kvm_mmu_write_protect_all_pages(), is very helpful.
After a little modifcation, on his patch, can solve this problem, 9s to 0.5s.

At the begining of live migration, write protection is only applied to the 
top-level SPTE. Then the write from vm trigger the EPT violation, with 
for_each_shadow_entry write protection is performed at dirct_map.
Finally the Dirty bit of the target page(at level 1 page table) is cleared, and 
the dirty page tracking is started. Of coure, the page where GPA is located is 
marked dirty when mmu_set_spte.
A similar implementation on xen, just emt instead of write protection.

What do you think about this solution?
---
 mmu.c | 5 -
 vmx.c | 3 +--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/mmu.c b/mmu.c
index b079d74..f49d316 100755
--- a/mmu.c
+++ b/mmu.c
@@ -3210,7 +3210,10 @@ static bool mmu_load_shadow_page(struct kvm *kvm, struct 
kvm_mmu_page *sp)
break;
 
 if (is_last_spte(spte, sp->role.level)) {
-   flush |= spte_write_protect(sptep, false);
+   if (sp->role.level == PT_PAGE_TABLE_LEVEL)
+   flush |= spte_clear_dirty(sptep);
+   else
+   flush |= spte_write_protect(sptep, false);
continue;
 }
 
diff --git a/vmx.c b/vmx.c
index 95784bc..7ec717f 100755
--- a/vmx.c
+++ b/vmx.c
@@ -14421,8 +14421,7 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
 static void vmx_slot_enable_log_dirty(struct kvm *kvm,
 struct kvm_memory_slot *slot)
 {
-   kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
-   kvm_mmu_slot_largepage_remove_write_access(kvm, slot);
+   kvm_mmu_write_protect_all_pages(kvm, true);
 }
 
 static void vmx_slot_disable_log_dirty(struct kvm *kvm,
-- 
1.8.3.1

Re: [Qemu-devel] [QEMU-devel][PATCH v4 0/2] Fix concurrent aio_poll/set_fd_handler.

2019-01-12 Thread Stefan Hajnoczi

On Thu, Dec 20, 2018 at 04:20:28PM +0100, remy.n...@blade-group.com wrote:
> From: Remy Noel 
> 
> It is possible for an io_poll/read/write callback to be concurrently executed 
> along
> with an aio_set_fd_handlers. This can cause all sorts of problems, like
> a NULL callback or a bad opaque pointer.
> 
> V2:
> * Do not use RCU anymore as it inccurs a performance loss
> V3:
> * Don't drop revents when a handler is modified [Stefan]
> V4:
> * Unregister fd from ctx epoll when removing fd_handler [Paolo]
> 
> Remy Noel (2):
>   aio-posix: Unregister fd from ctx epoll when removing fd_handler.
>   aio-posix: Fix concurrent aio_poll/set_fd_handler.
> 
>  util/aio-posix.c | 90 +---
>  util/aio-win32.c | 67 ---
>  2 files changed, 84 insertions(+), 73 deletions(-)
> 
> -- 
> 2.19.2
> 

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan


signature.asc
Description: PGP signature

46 matches

Mail list logo