date:20190109

Re: [Qemu-devel] [PATCH 3/4] nvme: check msix_init_exclusive_bar return value

2019-01-09 Thread Li Qiang

Max Reitz  于2019年1月9日周三 下午10:52写道：

> On 30.10.18 06:18, Li Qiang wrote:
> > As this function can fail.
> >
> > Signed-off-by: Li Qiang 
> > ---
> >  hw/block/nvme.c | 5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index 72c9644..a406c23 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -1250,7 +1250,10 @@ static void nvme_realize(PCIDevice *pci_dev,
> Error **errp)
> >  pci_register_bar(>parent_obj, 0,
> >  PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
> >  >iomem);
> > -msix_init_exclusive_bar(>parent_obj, n->num_queues, 4, NULL);
> > +if (msix_init_exclusive_bar(>parent_obj, n->num_queues, 4,
> NULL)) {
> > +error_setg(errp, "msix_init_exclusive_bar failed");
> > +return;
> > +}
>
> Commit ee640c625e1 which last touched this function stated that creating
> this device won't fail if this function fails, so I suppose it's
> intential that the error isn't checked.
>

OK, got. I will drop this patch later.

Thanks,
Li Qiang


>
>
> But if you think the error should be checked and device creation should
> be aborted if this function failed, then I suppose there'd be some
> cleaning up to instead of just returning.
>
> Also, we should just pass errp to that function (as its last parameter,
> instead of NULL).
>
> Max
>
> >
> >  id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
> >  id->ssvid = cpu_to_le16(pci_get_word(pci_conf +
> PCI_SUBSYSTEM_VENDOR_ID));
> >
>
>
>

Re: [Qemu-devel] [PATCH] ppc/xive: fix remaining XiveFabric names

2019-01-09 Thread David Gibson

On Wed, Jan 09, 2019 at 04:15:32PM +0100, Cédric Le Goater wrote:
> Signed-off-by: Cédric Le Goater 

Applied, thanks.

> ---
>  include/hw/ppc/xive.h | 2 +-
>  hw/intc/xive.c| 6 +++---
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index 4cd2e54c4958..0f935a146c66 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -146,7 +146,7 @@
>  #include "hw/ppc/xive_regs.h"
>  
>  /*
> - * XIVE Fabric (Interface between Source and Router)
> + * XIVE Notifier (Interface between Source and Router)
>   */
>  
>  typedef struct XiveNotifier {
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index 469c90e50c6a..39aac9f89d2a 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -1659,9 +1659,9 @@ static const TypeInfo xive_end_source_info = {
>  };
>  
>  /*
> - * XIVE Fabric
> + * XIVE Notifier
>   */
> -static const TypeInfo xive_fabric_info = {
> +static const TypeInfo xive_notifier_info = {
>  .name = TYPE_XIVE_NOTIFIER,
>  .parent = TYPE_INTERFACE,
>  .class_size = sizeof(XiveNotifierClass),
> @@ -1670,7 +1670,7 @@ static const TypeInfo xive_fabric_info = {
>  static void xive_register_types(void)
>  {
>  type_register_static(_source_info);
> -type_register_static(_fabric_info);
> +type_register_static(_notifier_info);
>  type_register_static(_router_info);
>  type_register_static(_end_source_info);
>  type_register_static(_tctx_info);

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v1 0/3] gitdm updates

2019-01-09 Thread Aleksandar Markovic

On Tuesday, January 8, 2019, Aleksandar Markovic <
aleksandar.m.m...@gmail.com> wrote:

>
>
> On Monday, January 7, 2019, Alex Bennée  wrote:
>
>>
>> Hi,
>>
>> Added a few more updates mostly of IBMers with non corporate emails.
>
>
> Hi, Alex, could you please add
>
> leon.al...@imgtec.com
>
> to Wave Computing group?
>
> (my email client automatically inserts and displays "ae" as one letter,
> the last name should be ending with a e, a-l-r-a-e.)
>
> I have some patches from Leon that will be upstreamed soon, with his email
> as above.
>
> I am on days-off and it is difficult for me to create and send patches, so
> I ask you toake this minor correction.
>
> Thanks, Aleksandar
>
>
>
>> The year-end stats as per:
>>
>>  git log --numstat --after="1/1/2018 00:00" \
>>--before="31/12/2018 23:59" | gitdm -n -l 10
>>
>> Are:
>>
>>   Top changeset contributors by employer
>>   Red Hat   3091 (43.3%)
>>   Linaro1201 (16.8%)
>>   (None) 484 (6.8%)
>>   IBM426 (6.0%)
>>   Academics (various)186 (2.6%)
>>   Virtuozzo  172 (2.4%)
>>   Wave Computing 118 (1.7%)
>>   Igalia 109 (1.5%)
>>   Xilinx 102 (1.4%)
>>   Cadence Design Systems  80 (1.1%)
>>
>>   Top lines changed by employer
>>   Red Hat   140523 (30.3%)
>>   Cadence Design Systems81010 (17.5%)
>>   Linaro78098 (16.8%)
>>   Wave Computing33134 (7.1%)
>>   IBM   18918 (4.1%)
>>   SiFive14436 (3.1%)
>>   Academics (various)   11995 (2.6%)
>>   (None)11458 (2.5%)
>>   Virtuozzo 10770 (2.3%)
>>   Oracle6698 (1.4%)
>>
>> Alex Bennée (2):
>>   contrib/gitdm: add Nokia and Proxmox to the domain-map
>>   contrib/gitdm: add two more IBM'ers to group-map-ibm
>>
>> Joel Stanley (1):
>>   contrib/gitdm: Add other IBMers
>>
>>  contrib/gitdm/domain-map| 2 ++
>>  contrib/gitdm/group-map-ibm | 7 +++
>>  2 files changed, 9 insertions(+)
>>
>> --
>> 2.17.1
>>
>>
>>
If you create a patch for Wave Computing as I described it also has:

 Reviewed-by: Aleksandar Markovic

Re: [Qemu-devel] [PULL 00/29] ppc-for-4.0 queue 20190109

2019-01-09 Thread Peter Maydell

On Tue, 8 Jan 2019 at 22:46, David Gibson  wrote:
>
> The following changes since commit 147923b1a901a0370f83a0f4c58ec1baffef22f0:
>
>   Merge remote-tracking branch 
> 'remotes/kraxel/tags/usb-20190108-pull-request' into staging (2019-01-08 
> 16:07:32 +)
>
> are available in the Git repository at:
>
>   git://github.com/dgibson/qemu.git tags/ppc-for-4.0-20190109
>
> for you to fetch changes up to 3a8eb78e6c135422017888380db091793039b6dd:
>
>   spapr: enable XIVE MMIOs at reset (2019-01-09 09:28:14 +1100)
>
> 
> ppc patch queue 2019-01-09
>
> Second main pull request for qemu-4.0.  Highlights are:
>  * Final parts of XIVE support for pseries (without KVM)
>  * Preliminary work for PHB hotplug
>  * Starting to use TCG vector operations
>
> This includes some changes in the PCI core, which Michael Tsirkin
> requested come through this tree, since they're primarily of interest
> for ppc.
>

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/4.0
for any user-visible changes.

-- PMM

Re: [Qemu-devel] [PATCH v3 0/5] kvm "virtio pmem" device

2019-01-09 Thread Dave Chinner

On Wed, Jan 09, 2019 at 08:17:31PM +0530, Pankaj Gupta wrote:
>  This patch series has implementation for "virtio pmem". 
>  "virtio pmem" is fake persistent memory(nvdimm) in guest 
>  which allows to bypass the guest page cache. This also
>  implements a VIRTIO based asynchronous flush mechanism.  

H. Sharing the host page cache direct into the guest VM. Sounds
like a good idea, but.

This means the guest VM can now run timing attacks to observe host
side page cache residency, and depending on the implementation I'm
guessing that the guest will be able to control host side page
cache eviction, too (e.g. via discard or hole punch operations).

Which means this functionality looks to me like a new vector for
information leakage into and out of the guest VM via guest
controlled host page cache manipulation.

https://arxiv.org/pdf/1901.01161

I might be wrong, but if I'm not we're going to have to be very
careful about how guest VMs can access and manipulate host side
resources like the page cache.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com

Re: [Qemu-devel] [PATCH v2 43/52] audio: add mixeng option (documentation)

2019-01-09 Thread Eric Blake

On 12/23/18 2:52 PM, Kővágó, Zoltán wrote:
> This will allow us to disable mixeng when we use a decent backend.
> 
> Disabling mixeng have a few advantages:
> * we no longer convert the audio output from one format to another, when
>   the underlying audio system would just convert it to a third format.
>   We no longer convert, only the underlying system, when needed.
> * the underlying system probably has better resampling and sample format
>   converting methods anyway...
> * we may support formats that the mixeng currently does not support (S24
>   or float samples, more than two channels)
> * when using an audio server (like pulseaudio) different sound card
>   outputs will show up as separate streams, even if we use only one
>   backend
> 
> Disadvantages:
> * audio capturing no longer works (wavcapture, and vnc audio extension)
> * some backends only support a single playback stream or very picky
>   about the audio format.  In this case we can't disable mixeng.
> 
> However mixeng is not removed, only made optional, so this shouldn't be
> a big concern.
> 
> Signed-off-by: Kővágó, Zoltán 
> ---
>  qapi/audio.json | 5 +
>  qemu-options.hx | 6 ++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/qapi/audio.json b/qapi/audio.json
> index 56d8ce439f..180bf207a8 100644
> --- a/qapi/audio.json
> +++ b/qapi/audio.json
> @@ -184,6 +184,10 @@
>  #
>  # General audio backend options that are used for both playback and 
> recording.
>  #
> +# @mixeng: #optional use QEMU's mixing engine to mix all streams inside QEMU.

The #optional tag is no longer necessary (the docs generator figures it
out from the '*mixeng' name below).

> +#  When set to off, fixed-settings must be also off.  Not every 
> backend
> +#  compatible with the off setting (default on)

Missing a '(since 4.0)' tag.

> +#
>  # @fixed-settings: #optional use fixed settings for host input/output.  When
>  #  off, frequency, channels and format must not be specified
>  #  (default on)
> @@ -207,6 +211,7 @@
>  ##
>  { 'struct': 'AudiodevPerDirectionOptions',
>'data': {
> +'*mixeng': 'bool',
>  '*fixed-settings': 'bool',
>  '*frequency':  'int',
>  '*channels':   'int',


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH] vfio/common: Work around kernel overflow bug in DMA unmap

2019-01-09 Thread Alex Williamson

A kernel bug was introduced in v4.15 via commit 71a7d3d78e3c which
adds a test for address space wrap-around in the vfio DMA unmap path.
Unfortunately due to overflow, the kernel detects an unmap of the last
page in the 64-bit address space as a wrap-around.  In QEMU, a Q35
guest with VT-d emulation and guest IOMMU enabled will attempt to make
such an unmap request during VM system reset, triggering an error:

  qemu-kvm: VFIO_UNMAP_DMA: -22
  qemu-kvm: vfio_dma_unmap(0x561f059948f0, 0xfef0, 0x0110) = 
-22 (Invalid argument)

Here the IOVA start address (0xfef0) and the size parameter
(0x0110) add to exactly 2^64, triggering the bug.  A
kernel fix is queued for the Linux v5.0 release to address this.

This patch implements a workaround to retry the unmap, excluding the
final page of the range when we detect an unmap failing which matches
the requirements for this issue.  This is expected to be a safe and
complete workaround as the VT-d address space does not extend to the
full 64-bit space and therefore the last page should never be mapped.

This workaround can be removed once all kernels with this bug are
sufficiently deprecated.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=1662291
Reported-by: Pei Zhang 
Debugged-by: Peter Xu 
Signed-off-by: Alex Williamson 
---
 hw/vfio/common.c |   20 +++-
 hw/vfio/trace-events |1 +
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7c185e5a2e79..820b839057c6 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -220,7 +220,25 @@ static int vfio_dma_unmap(VFIOContainer *container,
 .size = size,
 };
 
-if (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, )) {
+while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, )) {
+/*
+ * The type1 backend has an off-by-one bug in the kernel (71a7d3d78e3c
+ * v4.15) where an overflow in its wrap-around check prevents us from
+ * unmapping the last page of the address space.  Test for the error
+ * condition and re-try the unmap excluding the last page.  The
+ * expectation is that we've never mapped the last page anyway and this
+ * unmap request comes via vIOMMU support which also makes it unlikely
+ * that this page is used.  This bug was introduced well after type1 v2
+ * support was introduced, so we shouldn't need to test for v1.  A fix
+ * is queued for kernel v5.0 so this workaround can be removed once
+ * affected kernels are sufficiently deprecated.
+ */
+if (errno == EINVAL && unmap.size && !(unmap.iova + unmap.size) &&
+container->iommu_type == VFIO_TYPE1v2_IOMMU) {
+trace_vfio_dma_unmap_overflow_workaround();
+unmap.size -= 1ULL << ctz64(container->pgsizes);
+continue;
+}
 error_report("VFIO_UNMAP_DMA: %d", -errno);
 return -errno;
 }
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index a85e8662eadb..a002c6af2dda 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -110,6 +110,7 @@ vfio_region_mmaps_set_enabled(const char *name, bool 
enabled) "Region %s mmaps e
 vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) 
"Device %s region %d: %d sparse mmap entries"
 vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) 
"sparse entry %d [0x%lx - 0x%lx]"
 vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t 
subtype) "%s index %d, %08x/%0x8"
+vfio_dma_unmap_overflow_workaround(void) ""
 
 # hw/vfio/platform.c
 vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group 
#%d"

[Qemu-devel] [PATCH v2 2/3] globals: Allow global properties to be optional

2019-01-09 Thread Eduardo Habkost

Making some global properties optional will let us simplify
compat code when a given property works on most (but not all)
subclasses of a given type.

Device types will be able to opt out from optional compat
properties by simply not registering those properties, or by
making the property setter report an error.

Signed-off-by: Eduardo Habkost 
---
Note: the "An error is fatal for non-hotplugged devices [...]"
comment that appears in the diff scope is inaccurate, but I will
fix that in a separate patch because I don't want this to get
into the way of fixing the crash.
---
 include/hw/qdev-core.h | 3 +++
 qom/object.c   | 4 +++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index bc014c1c9f..463e1ddb1e 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -250,6 +250,8 @@ struct PropertyInfo {
 /**
  * GlobalProperty:
  * @used: Set to true if property was used when initializing a device.
+ * @optional: If set to true, errors when applying the property won't be
+ *reported by object_apply_global_props().
  *
  * An error is fatal for non-hotplugged devices, when the global is applied.
  */
@@ -258,6 +260,7 @@ typedef struct GlobalProperty {
 const char *property;
 const char *value;
 bool used;
+bool optional;
 } GlobalProperty;
 
 static inline void
diff --git a/qom/object.c b/qom/object.c
index 4e5226ca12..226ddf0189 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -387,7 +387,9 @@ void object_apply_global_props(Object *obj, const GPtrArray 
*props, Error **errp
 }
 p->used = true;
 object_property_parse(obj, p->value, p->property, );
-if (err != NULL) {
+if (p->optional) {
+error_free(err);
+} else if (err != NULL) {
 error_prepend(, "can't apply global %s.%s=%s: ",
   p->driver, p->property, p->value);
 /*
-- 
2.18.0.rc1.1.g3f1ff2140

Re: [Qemu-devel] [PATCH v2 03/52] qapi: qapi for audio backends

2019-01-09 Thread Eric Blake

On 12/23/18 2:51 PM, Kővágó, Zoltán wrote:
> This patch adds structures into qapi to replace the existing
> configuration structures used by audio backends currently. This qapi
> will be the base of the -audiodev command line parameter (that replaces
> the old environment variables based config).
> 
> This is not a 1:1 translation of the old options, I've tried to make
> them much more consistent (e.g. almost every backend had an option to
> specify buffer size, but the name was different for every backend, and
> some backends required usecs, while some other required frames, samples
> or bytes). Also tried to reduce the number of abbreviations used by the
> config keys.
> 
> Some of the more important changes:
> * use `in` and `out` instead of `ADC` and `DAC`, as the former is more
>   user friendly imho
> * moved buffer settings into the global setting area (so it's the same
>   for all backends that support it. Backends that can't change buffer
>   size will simply ignore them). Also using usecs, as it's probably more
>   user friendly than samples or bytes.
> * try-poll is now an alsa backend specific option (as all other backends
>   currently ignore it)
> 
> Signed-off-by: Kővágó, Zoltán 
> ---
>  Makefile.objs |   6 +-
>  qapi/audio.json   | 253 ++
>  qapi/qapi-schema.json |   1 +
>  3 files changed, 257 insertions(+), 3 deletions(-)
>  create mode 100644 qapi/audio.json
> 
> diff --git a/Makefile.objs b/Makefile.objs
> index bc5b8a8442..3f833a70c0 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -1,6 +1,6 @@
> -QAPI_MODULES = block-core block char common crypto introspect job migration
> -QAPI_MODULES += misc net rdma rocker run-state sockets tpm trace transaction
> -QAPI_MODULES += ui
> +QAPI_MODULES = audio block-core block char common crypto introspect job
> +QAPI_MODULES += migration misc net rdma rocker run-state sockets tpm trace
> +QAPI_MODULES += transaction ui
>  
>  ###
>  # Common libraries for tools and emulators
> diff --git a/qapi/audio.json b/qapi/audio.json
> new file mode 100644
> index 00..56d8ce439f
> --- /dev/null
> +++ b/qapi/audio.json
> @@ -0,0 +1,253 @@
> +# -*- mode: python -*-
> +#
> +# Copyright (C) 2015 Zoltán Kővágó 

Do you want to claim 2015-2019 now?  But ultimately what you put is your
call, so don't treat my suggestion as a legal mandate.

> +#
> +# This work is licensed under the terms of the GNU GPL, version 2 or later.
> +# See the COPYING file in the top-level directory.
> +
> +##
> +# @AudiodevNoOptions:
> +#
> +# The none, coreaudio, sdl and spice audio backend have no options.
> +#
> +# Since: 3.2

4.0, now.

> +##
> +{ 'struct': 'AudiodevNoOptions',
> +  'data': { } }

Also, we now have support for empty branches in a flat union, so do you
really need this type, or...

> +
> +##
> +# @AudiodevAlsaPerDirectionOptions:
> +#
> +# Options of the alsa backend that are used for both playback and recording.
> +#
> +# @dev: #optional the name of the alsa device to use (default 'default')

No need to use the '#optional' tag any more; the doc generator now takes
care of that.

> +#
> +# @try-poll: #optional attempt to use poll mode, falling back to non polling
> +#access on failure (default on)
> +#
> +# Since: 3.2

More 4.0 (I'll quit pointing it out)


> +##
> +{ 'union': 'Audiodev',
> +  'base': {
> +'id':'str',
> +'driver':'AudiodevDriver',
> +'in':'AudiodevPerDirectionOptions',
> +'out':   'AudiodevPerDirectionOptions',
> +'*timer-period': 'int' },
> +  'discriminator': 'driver',
> +  'data': {
> +'none':  'AudiodevNoOptions',

...you could just omit the lines that don't add anything.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH] vfio/common: Work around kernel overflow bug in DMA unmap

2019-01-09 Thread Peter Xu

On Wed, Jan 09, 2019 at 04:10:51PM -0700, Alex Williamson wrote:
> A kernel bug was introduced in v4.15 via commit 71a7d3d78e3c which
> adds a test for address space wrap-around in the vfio DMA unmap path.
> Unfortunately due to overflow, the kernel detects an unmap of the last
> page in the 64-bit address space as a wrap-around.  In QEMU, a Q35
> guest with VT-d emulation and guest IOMMU enabled will attempt to make
> such an unmap request during VM system reset, triggering an error:
> 
>   qemu-kvm: VFIO_UNMAP_DMA: -22
>   qemu-kvm: vfio_dma_unmap(0x561f059948f0, 0xfef0, 0x0110) = 
> -22 (Invalid argument)
> 
> Here the IOVA start address (0xfef0) and the size parameter
> (0x0110) add to exactly 2^64, triggering the bug.  A
> kernel fix is queued for the Linux v5.0 release to address this.
> 
> This patch implements a workaround to retry the unmap, excluding the
> final page of the range when we detect an unmap failing which matches
> the requirements for this issue.  This is expected to be a safe and
> complete workaround as the VT-d address space does not extend to the
> full 64-bit space and therefore the last page should never be mapped.
> 
> This workaround can be removed once all kernels with this bug are
> sufficiently deprecated.
> 
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=1662291
> Reported-by: Pei Zhang 
> Debugged-by: Peter Xu 
> Signed-off-by: Alex Williamson 

Reviewed-by: Peter Xu 

-- 
Peter Xu

Re: [Qemu-devel] [PATCH] target/arm: Allow to switch from MON->HYP on AArch32

2019-01-09 Thread Peter Maydell

On Wed, 9 Jan 2019 at 17:14, Alexander Graf  wrote:
>
> On 01/09/2019 05:59 PM, Peter Maydell wrote:
> > On Wed, 9 Jan 2019 at 16:52, Peter Maydell  wrote:
> >> On Wed, 9 Jan 2019 at 15:26, Alexander Graf  wrote:
> >>> In U-boot, we switch from S-SVC -> MON -> HYP when we want to enter
> >>> HYP mode. This dance seems to work ok (hence it's there in the code
> >>> base), but breaks with current QEMU.
> > PS: it would be helpful if the commit message said how u-boot
> > is trying to go from Mon to Hyp -- some ways to try to do
> > this are OK, and some are not, so whether it's supposed to
> > work or not depends on what u-boot is actually doing...
>
> I don't fully understand all of it to be honest :). But the code is here:
>
> http://git.denx.de/?p=u-boot.git;a=blob;f=arch/arm/cpu/armv7/nonsec_virt.S
>
> What I managed to understand so far is that it goes to MON using the smc
> #0 call and then changes SPSR so that on return (movs pc) the mode will
> be different.

Thanks -- yes, that's an exception return so it's the
expected way to go from Mon to Hyp.

-- PMM

Re: [Qemu-devel] [PATCH 2/5] nbd: Allow bitmap export during QMP nbd-server-add

2019-01-09 Thread Eric Blake

On 1/8/19 10:14 PM, Eric Blake wrote:
> With the experimental x-nbd-server-add-bitmap command, there was
> a window of time where a client could see the export but not the
> associated dirty bitmap, which can cause a client that planned
> on using the dirty bitmap to be forced to treat the entire image
> as dirty as a safety fallback.  Furthermore, if the QMP client
> successfully exports a disk but then fails to add the bitmap, it
> has to take on the burden of removing the export.  Since we don't
> allow changing a dirty bitmap once it is exported (whether to a
> different bitmap, or removing advertisement of the bitmap), it is
> nicer to make the bitmap tied to the export at the time the export
> is created, and automatically failing to export if the bitmap is
> not available.
> 
> Since there is working libvirt demo code that uses both the bitmap
> export and the ability to specify an alternative name (rather than
> exposing the private-use bitmap that libvirt created to merge in
> several persistent bitmaps when doing a differential backup), the
> two new parameters do not need to be marked experimental.  See
> https://www.redhat.com/archives/libvir-list/2018-October/msg01254.html,
> https://kvmforum2018.sched.com/event/FzuB/facilitating-incremental-backup-eric-blake-red-hat

...

> +#
> +# @bitmap-export-name: How the bitmap will be seen by nbd clients
> +#  (default @bitmap). The NBD client must use
> +#  NBD_OPT_SET_META_CONTEXT with "qemu:dirty-bitmap:NAME"
> +#  to access the exposed bitmap. (since 4.0)
>  #

Actually, on re-reading my libvirt code, it turns out that I had
differential backups working without the use of bitmap-export-name,
because I always created a temporary bitmap (with the name I planned to
expose, namely "vda" if the bitmap belonged to the device that libvirt
calls "vda") and merged in the persistent bitmaps into the temporary
bitmap.  Of course, right now we only support one backup NBD job at a
time (because we don't support parallel NBD servers), but thinking ahead
to when we may support that, it becomes obvious that libvirt can't pick
the same name for two separate temporary bitmaps being used by two
parallel backup jobs.  On the other hand, Nir requested that libvirt
backup job XMLs support some sort of name or UUID, and it is just as
easy to ensure that the temporary bitmap is given the name of the backup
job (which will be unique) rather than the name of the device being
backed up.

And my proposed patches for 'qemu-nbd --list' make it easy to
definitively query what names are actually being exported.

So at this point, I'm leaning towards NOT exposing a way to remap the
exported bitmap name.  We can always add it later if it proves useful,
but I'd rather not be burdened with being stuck with it since libvirt
hasn't used it so far.  I'll spin v2 of this series along those lines.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH] ppc440: Avoid reporting error when reading non-existent RAM slot

2019-01-09 Thread BALATON Zoltan

When reading base register of RAM slot with no RAM we should not try
to calculate register value because that will result printing an error
due to invalid RAM size. Just return 0 without the error in this case.

Signed-off-by: BALATON Zoltan 
---
 hw/ppc/ppc440_uc.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/ppc440_uc.c b/hw/ppc/ppc440_uc.c
index c489368905..9130eb314c 100644
--- a/hw/ppc/ppc440_uc.c
+++ b/hw/ppc/ppc440_uc.c
@@ -613,8 +613,10 @@ static uint32_t dcr_read_sdram(void *opaque, int dcrn)
 case SDRAM_R1BAS:
 case SDRAM_R2BAS:
 case SDRAM_R3BAS:
-ret = sdram_bcr(sdram->ram_bases[dcrn - SDRAM_R0BAS],
-sdram->ram_sizes[dcrn - SDRAM_R0BAS]);
+if (sdram->ram_sizes[dcrn - SDRAM_R0BAS]) {
+ret = sdram_bcr(sdram->ram_bases[dcrn - SDRAM_R0BAS],
+sdram->ram_sizes[dcrn - SDRAM_R0BAS]);
+}
 break;
 case SDRAM_CONF1HB:
 case SDRAM_CONF1LL:
-- 
2.13.7

[Qemu-devel] [PATCH v1 1/1] default-configs: Enable USB support for RISC-V machines

2019-01-09 Thread Alistair Francis

Signed-off-by: Alistair Francis 
---
 default-configs/riscv32-softmmu.mak | 1 +
 default-configs/riscv64-softmmu.mak | 1 +
 2 files changed, 2 insertions(+)

diff --git a/default-configs/riscv32-softmmu.mak 
b/default-configs/riscv32-softmmu.mak
index dbc9398284..c9c5971409 100644
--- a/default-configs/riscv32-softmmu.mak
+++ b/default-configs/riscv32-softmmu.mak
@@ -1,6 +1,7 @@
 # Default configuration for riscv-softmmu
 
 include pci.mak
+include usb.mak
 
 CONFIG_SERIAL=y
 CONFIG_VIRTIO_MMIO=y
diff --git a/default-configs/riscv64-softmmu.mak 
b/default-configs/riscv64-softmmu.mak
index dbc9398284..c9c5971409 100644
--- a/default-configs/riscv64-softmmu.mak
+++ b/default-configs/riscv64-softmmu.mak
@@ -1,6 +1,7 @@
 # Default configuration for riscv-softmmu
 
 include pci.mak
+include usb.mak
 
 CONFIG_SERIAL=y
 CONFIG_VIRTIO_MMIO=y
-- 
2.19.1

Re: [Qemu-devel] [PATCH 3/3] machine: Use shorter format for GlobalProperty arrays

2019-01-09 Thread Eduardo Habkost

On Tue, Jan 08, 2019 at 11:20:12AM +0100, Cornelia Huck wrote:
> On Tue, 8 Jan 2019 07:45:43 +0100
> Gerd Hoffmann  wrote:
> 
> >   Hi,
> > 
> > > +{ "migration", "decompress-error-check", "off" },
> > > +{ "hda-audio", "use-timer", "false" },
> > > +{ "cirrus-vga", "global-vmstate", "true" },
> > > +{ "VGA", "global-vmstate", "true" },
> > > +{ "vmware-svga", "global-vmstate", "true" },
> > > +{ "qxl-vga", "global-vmstate", "true" },  
> > 
> > I'd like to have the fields aligned.  Especially in cases like this one
> > where multiple devices get the same value assigned it makes things more
> > readable:
> > 
> > { "migration",   "decompress-error-check", "off"   },
> > { "hda-audio",   "use-timer",  "false" },
> > { "cirrus-vga",  "global-vmstate", "true"  },
> > { "VGA", "global-vmstate", "true"  },
> > { "vmware-svga", "global-vmstate", "true"  },
> > { "qxl-vga", "global-vmstate", "true"  },
> > 
> > thanks,
> >   Gerd
> > 
> 
> I'm a bit on the fence here. It does make things more readable (at
> least in your example), but I find editing aligned tables a bit
> annoying. OTOH, that won't happen often, anyway.

I'm unsure, too.  Also, not merging this series is increasing the
likelihood of conflicts with other patches.  I'm queueing this
version, and we can discuss alignment alternatives later.

(I'm less worried about conflicts caused by future alignment
patches because alignment conflicts are easier to sort out than
redoing the .driver/.property/.value conversion).

-- 
Eduardo

Re: [Qemu-devel] [PATCH] vfio: assign idstr for VFIO's mmaped regions for migration

2019-01-09 Thread Zhao Yan

On Tue, Jan 08, 2019 at 10:09:11AM -0700, Alex Williamson wrote:
> On Tue,  8 Jan 2019 01:03:48 -0500
> Zhao Yan  wrote:
> 
> > if multiple regions in vfio are mmaped, their corresponding ramblocks
> > are like below, i.e. their idstrs are "".
> > 
> > (qemu) info ramblock
> > Block Name  PSize   Offset   UsedTotal
> > pc.ram  4 KiB  0x 0x2000 0x2000
> > 4 KiB  0x2110 0x2000 0x2000
> > 4 KiB  0x2090 0x0080 0x0080
> > 4 KiB  0x2024 0x00687000 0x00687000
> > 4 KiB  0x200c 0x00178000 0x00178000
> > pc.bios 4 KiB  0x2000 0x0004 0x0004
> > pc.rom  4 KiB  0x2004 0x0002 0x0002
> > 
> > This is because ramblocks' idstr are assigned by calling
> > vmstate_register_ram(), but memory region of type ram device ptr does not
> > call vmstate_register_ram().
> > vfio_region_mmap
> > |->memory_region_init_ram_device_ptr
> >|-> memory_region_init_ram_ptr
> > 
> > Without empty idstrs will cause problem to snapshot copying during
> > migration, because it uses ramblocks' idstr to identify ramblocks.
> > ram_save_setup {
> >   …
> >   RAMBLOCK_FOREACH(block) {
> >   qemu_put_byte(f, strlen(block->idstr));
> >   qemu_put_buffer(f, (uint8_t *)block->idstr,strlen(block->idstr));
> >   qemu_put_be64(f, block->used_length);
> >   }
> >   …
> > }
> > ram_load() {
> > block = qemu_ram_block_by_name(id);
> > if (block) {
> > if (length != block->used_length) {
> > qemu_ram_resize(block, length, _err);
> > }
> >  ….
> >}
> > }
> > 
> > Therefore, in this patch,
> > vmstate_register_ram() is called for memory region of type ram ptr,
> > also a unique vfioid is assigned to vfio devices across source
> > and target vms.
> > e.g. in source vm, use qemu parameter
> > -device
> > vfio-pci,sysfsdev=/sys/bus/pci/devices/:00:02.0/
> > 882cc4da-dede-11e7-9180-078a62063ab1,vfioid=igd
> > 
> > and in target vm, use qemu paramter
> > -device
> > vfio-pci,sysfsdev=/sys/bus/pci/devices/:00:02.0/
> > 5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd,vfioid=igd
> 
> Why wouldn't we just use the id= (DeviceState.id) value instead of
> adding yet another one?  I can't imagine anyone, especially libvirt,
> wants to deal with a vfio specific id for a device.
>
hi Alex
You are right! DeviceState.id can be used here. Thanks for your suggestion.


> > Signed-off-by: Zhao Yan 
> > ---
> >  hw/vfio/pci.c | 8 +++-
> >  include/hw/vfio/vfio-common.h | 1 +
> >  memory.c  | 4 
> >  3 files changed, 12 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index c0cb1ec289..7bc2ed0752 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -2533,7 +2533,12 @@ static void vfio_populate_device(VFIOPCIDevice 
> > *vdev, Error **errp)
> >  }
> >  
> >  for (i = VFIO_PCI_BAR0_REGION_INDEX; i < VFIO_PCI_ROM_REGION_INDEX; 
> > i++) {
> > -char *name = g_strdup_printf("%s BAR %d", vbasedev->name, i);
> > +char *name;
> > +if (vbasedev->vfioid) {
> > +name = g_strdup_printf("%s BAR %d", vbasedev->vfioid, i);
> > +} else {
> > +name = g_strdup_printf("%s BAR %d", vbasedev->name, i);
> > +}
> >  
> >  ret = vfio_region_setup(OBJECT(vdev), vbasedev,
> >  >bars[i].region, i, name);
> > @@ -3180,6 +3185,7 @@ static void vfio_instance_init(Object *obj)
> >  static Property vfio_pci_dev_properties[] = {
> >  DEFINE_PROP_PCI_HOST_DEVADDR("host", VFIOPCIDevice, host),
> >  DEFINE_PROP_STRING("sysfsdev", VFIOPCIDevice, vbasedev.sysfsdev),
> > +DEFINE_PROP_STRING("vfioid", VFIOPCIDevice, vbasedev.vfioid),
> >  DEFINE_PROP_ON_OFF_AUTO("display", VFIOPCIDevice,
> >  display, ON_OFF_AUTO_OFF),
> >  DEFINE_PROP_UINT32("x-intx-mmap-timeout-ms", VFIOPCIDevice,
> > diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> > index 1b434d02f6..84bab94f52 100644
> > --- a/include/hw/vfio/vfio-common.h
> > +++ b/include/hw/vfio/vfio-common.h
> > @@ -108,6 +108,7 @@ typedef struct VFIODevice {
> >  struct VFIOGroup *group;
> >  char *sysfsdev;
> >  char *name;
> > +char *vfioid;
> >  DeviceState *dev;
> >  int fd;
> >  int type;
> > diff --git a/memory.c b/memory.c
> > index d14c6dec1d..dbb29fa989 100644
> > --- a/memory.c
> > +++ b/memory.c
> > @@ -1588,6 +1588,7 @@ void memory_region_init_ram_ptr(MemoryRegion *mr,
> >  uint64_t size,
> >  void *ptr)
> >  {
> > +DeviceState *owner_dev;
> >  memory_region_init(mr, owner, name, size);
> >

Re: [Qemu-devel] [PATCH v1 3/3] contrib/gitdm: add two more IBM'ers to group-map-ibm

2019-01-09 Thread Aleksandar Markovic

On Monday, January 7, 2019, Alex Bennée  wrote:

> Signed-off-by: Alex Bennée 
> ---
>  contrib/gitdm/group-map-ibm | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/contrib/gitdm/group-map-ibm b/contrib/gitdm/group-map-ibm
> index 6c0570107d..22727319b3 100644
> --- a/contrib/gitdm/group-map-ibm
> +++ b/contrib/gitdm/group-map-ibm
> @@ -6,6 +6,8 @@ a...@ozlabs.ru
>  and...@aj.id.au
>  b...@kernel.crashing.org
>  c...@kaod.org
> +danielhb...@gmail.com
>  gr...@kaod.org
> +jcfara...@gmail.com
>  j...@jms.id.au
>  sjitindarsi...@gmail.com
> --
> 2.17.1
>
>
>
 Reviewed-by: Aleksandar Markovic

Re: [Qemu-devel] [PATCH QEMU v5] hw/arm/sysbus-fdt: Add support for instantiating generic devices

2019-01-09 Thread Peter Maydell

So I should start out upfront by saying that I'm aware that
the reality is that people do want to do passthrough with
this kind of hardware, and that's not an unreasonable
thing to do. I just don't really like the way that pushes
the software into having to do ugly things...

Overall I'll let Eric call the shots about how well this
feature fits into our sysbus-fdt support; he knows this
code much better than I do. (I would have preferred us
not to have sysbus-fdt passthrough at all, in an
ideal world :-))

On Wed, 9 Jan 2019 at 16:30, Geert Uytterhoeven  wrote:
> On Wed, Jan 9, 2019 at 5:03 PM Peter Maydell  wrote:
> > whitelists for the devices we want to allow passthrough of and
> > that we've tested to work.
>
> In the end, this will become a loong list (SoC x devices)...

Well, yes, but it deters people from trying to do passthrough
with hardware that's not designed in a way that makes
passthrough reasonably supportable at a software level.

> Reality is that on embedded, on-SoC devices are usually not on a PCI bus.
> But IOMMUs are present, and virtualization is wanted.

I don't insist on PCI. Probeable, and consistent in
terms of what they provide and how you interact with
them, is what we want. Embedded on-SoC devices are
generally neither. (The host kernel could in theory
provide an API that exposed only devices that were safely
pass-through-able in a consistent way, I suppose, but it
doesn't, or we wouldn't be having to fish through the
device tree nodes making guesses about what's safe to
allow and what isn't and having heuristics about not
providing clocks info being ok if we think the clock
might only be used for power management...)

thanks
-- PMM

Re: [Qemu-devel] [PATCH v2 06/52] audio: -audiodev command line option basic implementation

2019-01-09 Thread Zoltán Kővágó

On 2019-01-08 04:42, Markus Armbruster wrote:
> "Zoltán Kővágó"  writes:
> 
>> On 2019-01-07 14:13, Markus Armbruster wrote:
>>> "Kővágó, Zoltán"  writes:
>>>
 Audio drivers now get an Audiodev * as config paramters, instead of the
 global audio_option structs.  There is some code in audio/audio_legacy.c
 that converts the old environment variables to audiodev options (this
 way backends do not have to worry about legacy options).  It also
 contains a replacement of -audio-help, which prints out the equivalent
 -audiodev based config of the currently specified environment variables.

 Note that backends are not updated and still rely on environment
 variables.

 Also note that (due to moving try-poll from global to backend specific
 option) currently ALSA and OSS will always try poll mode, regardless of
 environment variables or -audiodev options.

 Signed-off-by: Kővágó, Zoltán 
 ---
>>> [...]
 diff --git a/audio/audio.c b/audio/audio.c
 index 96cbd57c37..e7f25ea84b 100644
 --- a/audio/audio.c
 +++ b/audio/audio.c
>>> [...]
 @@ -2127,3 +1841,158 @@ void AUD_set_volume_in (SWVoiceIn *sw, int mute, 
 uint8_t lvol, uint8_t rvol)
  }
  }
  }
 +
 +QemuOptsList qemu_audiodev_opts = {
 +.name = "audiodev",
 +.head = QTAILQ_HEAD_INITIALIZER(qemu_audiodev_opts.head),
 +.implied_opt_name = "driver",
 +.desc = {
 +/*
 + * no elements => accept any params
 + * sanity checking will happen later
 + */
 +{ /* end of list */ }
 +},
 +};
 +
 +static void validate_per_direction_opts(AudiodevPerDirectionOptions *pdo,
 +Error **errp)
 +{
 +if (!pdo->has_fixed_settings) {
 +pdo->has_fixed_settings = true;
 +pdo->fixed_settings = true;
 +}
 +if (!pdo->fixed_settings &&
 +(pdo->has_frequency || pdo->has_channels || pdo->has_format)) {
 +error_setg(errp,
 +   "You can't use frequency, channels or format with 
 fixed-settings=off");
 +return;
 +}
 +
 +if (!pdo->has_frequency) {
 +pdo->has_frequency = true;
 +pdo->frequency = 44100;
 +}
 +if (!pdo->has_channels) {
 +pdo->has_channels = true;
 +pdo->channels = 2;
 +}
 +if (!pdo->has_voices) {
 +pdo->has_voices = true;
 +pdo->voices = 1;
 +}
 +if (!pdo->has_format) {
 +pdo->has_format = true;
 +pdo->format = AUDIO_FORMAT_S16;
 +}
 +}
 +
 +static Audiodev *parse_option(QemuOpts *opts, Error **errp)
 +{
 +Error *local_err = NULL;
 +Visitor *v = opts_visitor_new(opts, true);
 +Audiodev *dev = NULL;
 +visit_type_Audiodev(v, NULL, , _err);
 +visit_free(v);
 +
 +if (local_err) {
 +goto err2;
 +}
 +
 +validate_per_direction_opts(dev->in, _err);
 +if (local_err) {
 +goto err;
 +}
 +validate_per_direction_opts(dev->out, _err);
 +if (local_err) {
 +goto err;
 +}
 +
 +if (!dev->has_timer_period) {
 +dev->has_timer_period = true;
 +dev->timer_period = 1; /* 100Hz -> 10ms */
 +}
 +
 +return dev;
 +
 +err:
 +qapi_free_Audiodev(dev);
 +err2:
 +error_propagate(errp, local_err);
 +return NULL;
 +}
 +
 +static int each_option(void *opaque, QemuOpts *opts, Error **errp)
 +{
 +Audiodev *dev = parse_option(opts, errp);
 +if (!dev) {
 +return -1;
 +}
 +return audio_init(dev);
 +}
 +
 +void audio_set_options(void)
 +{
 +if (qemu_opts_foreach(qemu_find_opts("audiodev"), each_option, NULL,
 +  _abort)) {
 +exit(1);
 +}
 +}
>>> [...]
 diff --git a/vl.c b/vl.c
 index 8353d3c718..b5364ffe46 100644
 --- a/vl.c
 +++ b/vl.c
 @@ -3074,6 +3074,7 @@ int main(int argc, char **argv, char **envp)
  qemu_add_opts(_option_rom_opts);
  qemu_add_opts(_machine_opts);
  qemu_add_opts(_accel_opts);
 +qemu_add_opts(_audiodev_opts);
  qemu_add_opts(_mem_opts);
  qemu_add_opts(_smp_opts);
  qemu_add_opts(_boot_opts);
 @@ -3307,9 +3308,15 @@ int main(int argc, char **argv, char **envp)
  add_device_config(DEV_BT, optarg);
  break;
  case QEMU_OPTION_audio_help:
 -AUD_help ();
 +audio_legacy_help();
  exit (0);
  break;
 +case QEMU_OPTION_audiodev:
 +

Re: [Qemu-devel] [PATCH 2/4] nvme: ensure the num_queues is not zero

2019-01-09 Thread Li Qiang

Max Reitz  于2019年1月9日周三 下午10:38写道：

> On 30.10.18 06:18, Li Qiang wrote:
> > When it is zero, it causes segv. Backtrack:
> > Thread 5 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 0x7fffc6c17700 (LWP 51808)]
> > 0x55acbd90 in nvme_start_ctrl (n=0x624c8100) at
> hw/block/nvme.c:820
> > warning: Source file is more recent than executable.
> > 820   if (unlikely(n->cq[0])) {
> > (gdb) bt
> > 0  0x55acbd90 in nvme_start_ctrl (n=0x624c8100) at
> hw/block/nvme.c:820
> > 1  0x55accdbc in nvme_write_bar (n=0x624c8100, offset=20,
> data=4587521, size=4) at hw/block/nvme.c:964
> > 2  0x55acdd2b in nvme_mmio_write (opaque=0x624c8100,
> addr=20, data=4587521, size=4) at hw/block/nvme.c:1158
> > 3  0x558973ed in memory_region_write_accessor
> (mr=0x624c89e0, addr=20, value=0x7fffc6c14428, size=4, shift=0,
> mask=4294967295, attrs=...) at
> /home/liqiang02/qemu-upstream/qemu/memory.c:500
> > 4  0x55897600 in access_with_adjusted_size (addr=20,
> value=0x7fffc6c14428, size=4, access_size_min=2, access_size_max=8,
> access_fn=0x55897304 , mr=0x624c89e0,
> attrs=...) at /home/liqiang02/qemu-upstream/qemu/memory.c:566
> > 5  0x5589a200 in memory_region_dispatch_write
> (mr=0x624c89e0, addr=20, data=4587521, size=4, attrs=...) at
> /home/liqiang02/qemu-upstream/qemu/memory.c:1442
> > 6  0x55835151 in flatview_write_continue (fv=0x606e6fc0,
> addr=4273930260, attrs=..., buf=0x7fffc8a18028 "\001", len=4, addr1=20,
> l=4, mr=0x624c89e0) at /home/liqiang02/qemu-upstream/qemu/exec.c:3233
> > 7  0x5583529b in flatview_write (fv=0x606e6fc0,
> addr=4273930260, attrs=..., buf=0x7fffc8a18028 "\001", len=4) at
> /home/liqiang02/qemu-upstream/qemu/exec.c:3272
> > 8  0x558355a1 in address_space_write (as=0x5683ade0
> , addr=4273930260, attrs=..., buf=0x7fffc8a18028
> "\001", len=4) at /home/liqiang02/qemu-upstream/qemu/exec.c:3362
> > 9  0x558355f2 in address_space_rw (as=0x5683ade0
> , addr=4273930260, attrs=..., buf=0x7fffc8a18028
> "\001", len=4, is_write=true) at
> /home/liqiang02/qemu-upstream/qemu/exec.c:3373
> > 10 0x558b66ac in kvm_cpu_exec (cpu=0x63114800) at
> /home/liqiang02/qemu-upstream/qemu/accel/kvm/kvm-all.c:2031
> > 11 0x5587c3ac in qemu_kvm_cpu_thread_fn (arg=0x63114800) at
> /home/liqiang02/qemu-upstream/qemu/cpus.c:1277
> > 12 0x55e54ae6 in qemu_thread_start (args=0x6032c170) at
> util/qemu-thread-posix.c:504
> > 13 0x7fffdadbd494 in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> > 14 0x7fffdaaffacf in clone () from /lib/x86_64-linux-gnu/libc.so.6
> > (gdb) q
> >
> > Signed-off-by: Li Qiang 
> > ---
> >  hw/block/nvme.c | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index 676cc48..72c9644 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -1221,6 +1221,10 @@ static void nvme_realize(PCIDevice *pci_dev,
> Error **errp)
> >  error_setg(errp, "serial property not set");
> >  return;
> >  }
> > +
> > +if (!n->num_queues) {
> > +error_setg(errp, "num_queues can't be zero");
>
> I think we should return here.
>
>
Ok, forget this. will make a revised version later.

Thanks,
Li Qiang


> (Sorry for the late review...)
>
> Max
>
> > +}
> >  blkconf_blocksizes(>conf);
> >  if (!blkconf_apply_backend_options(>conf,
> blk_is_read_only(n->conf.blk),
> > false, errp)) {
> >
>
>
>

Re: [Qemu-devel] [PATCH v1 2/3] contrib/gitdm: Add other IBMers

2019-01-09 Thread Aleksandar Markovic

On Monday, January 7, 2019, Alex Bennée  wrote:

> From: Joel Stanley 
>
> Here are some IBMers who use their personal addresses when submitting
> patches.
>
> Signed-off-by: Joel Stanley 
> Acked-by: Andrew Jeffery 
> Signed-off-by: Alex Bennée 
> ---
>  contrib/gitdm/group-map-ibm | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/contrib/gitdm/group-map-ibm b/contrib/gitdm/group-map-ibm
> index b66db5f4a8..6c0570107d 100644
> --- a/contrib/gitdm/group-map-ibm
> +++ b/contrib/gitdm/group-map-ibm
> @@ -2,5 +2,10 @@
>  # Some IBM contributors submit via another domain
>  #
>
> +a...@ozlabs.ru
> +and...@aj.id.au
> +b...@kernel.crashing.org
>  c...@kaod.org
>  gr...@kaod.org
> +j...@jms.id.au
> +sjitindarsi...@gmail.com
> --
> 2.17.1
>
>
 Reviewed-by: Aleksandar Markovic

Re: [Qemu-devel] [SPDK] Qemu migration with vhost-user-blk on top of local storage

2019-01-09 Thread wuzhouhui

> -Original Messages-
> From: "Wodkowski, PawelX" 
> Sent Time: 2019-01-10 00:19:48 (Thursday)
> To: "stefa...@gmail.com" , "s...@lists.01.org" 
> 
> Cc: "libvir-l...@redhat.com" , "xieyon...@baidu.com" 
> , "qemu-devel@nongnu.org" , 
> "lili...@baidu.com" 
> Subject: Re: [SPDK] [Qemu-devel] Qemu migration with vhost-user-blk on top of 
> local storage
> 
> On Wed, 2019-01-09 at 21:23 +0800, wuzhouhui wrote:
> > > -Original Messages-
> > > From: "Stefan Hajnoczi" 
> > > Sent Time: 2019-01-09 20:42:58 (Wednesday)
> > > To: wuzhouhui 
> > > Cc: qemu-devel@nongnu.org, xieyon...@baidu.com, lili...@baidu.com, 
> > > libvir-l...@redhat.com, s...@lists.01.org
> > > Subject: Re: [Qemu-devel] Qemu migration with vhost-user-blk on top
> > > of local storage
> > > 
> > > On Wed, Jan 09, 2019 at 06:23:42PM +0800, wuzhouhui wrote:
> > > > Hi everyone,
> > > > 
> > > > I'm working qemu with vhost target (e.g. spdk), and I attempt to
> > > > migrate VM with
> > > > 2 local storages. One local storage is a regular file, e.g.
> > > > /tmp/c74.qcow2, and
> > > > the other is a malloc bdev that spdk created. This malloc bdev
> > > > will exported to
> > > > VM via vhost-user-blk. When I execute following command:
> > > > 
> > > >   virsh migrate --live --persistent --unsafe --undefinesource --
> > > > copy-storage-all \
> > > >  --p2p --auto-converge --verbose --desturi
> > > > qemu+tcp:///system vm0
> > > > 
> > > > The libvirt reports:
> > > > 
> > > >   qemu-2.12.1: error: internal error: unable to execute QEMU
> > > > command \
> > > > 'nbd-server-add': Cannot find device=drive-virtio-disk1 nor \
> > > > node_name=drive-virtio-disk1
> > > 
> > > Please post your libvirt domain XML.
> > 
> > My libvirt is based on libvirt-1.1.1-29.el7, and add many patches to
> > satisfy our
> > own needs, e.g. add support for vhost-user-blk. Post domain xml may
> > not useful.
> > Anyway, following is full contents of XML:
> > 
> >   
> > wzh
> > a84e96e6-2c53-408d-986b-c709bc6a0e51
> > 4096
> > 
> >   
> > 
> > 4096
> > 2
> > 
> >   hvm
> >   
> > 
> > 
> >   
> > 
> > 
> >   
> > 
> > destroy
> > restart
> > destroy
> > 
> >   /data/wzh/x86_64-softmmu/qemu-system-
> > x86_64
> >   
> > 
> > 
> > 
> > 
> >   
> > 
> >   
> > 
> > 
> > 
> > 
> >   
> > 
> >   
> > 
> >   
> >   
> > 
> >   
> >   
> > 
> > 
> >   
> >   
> > 
> > 
> >   
> >   
> > 
> >   
> >   
> >> keymap='en-us'>
> > 
> >   
> >   
> > 
> > 
> >   
> > 
> > 
> >   
> > 
> > > 
> > > > Does it means that qemu with spdk on top of local storage don't
> > > > support migration?
> > > > 
> > > > QEMU: 2.12.1
> > > > SPDK: 18.10
> > > 
> > > vhost-user-blk bypasses the QEMU block layer, so NBD storage
> > > migration
> > > at the QEMU level will not work for the vhost-user-blk disk.
> > > 
> > > Stefan
> 
> 
> Don't know if this is the case that wuzhouhui is using but generally

My case is different from yours. All storages (including qemu image and malloc
bdev) are local, and only accessible from one of hosts. Anyway, thanks for
everyone's response.

wuzhouhui

> migration should work if all storages are accesible from both machines.
> The QEMU image and Malloc bdev should be exposed using some kind of
> network sharing. SPDK on migration target should be configured the same
> way as on source machine.
> 
> We are testing this issuing QEMU monitor commands so I can't help with
> libvirt here.
> 
> Example setup/test case we are using in one of our tests:
> 
> Machine1:
>   Expose VM image over SSHF
>   Expose some block device using SPDK nvmf target
>   SPDK nvmf initiator:
>  - connects to Machine1 SPDK nvmf target to access block device
>   SPDK vhost-scsi:
> - presents block device to QEMU from SPDK nvmf initiator:
>   QEMU VM uses
> - shared VM image
> - block device from SPDK nvmf  initiator over vhost-user-scsi
> 
> Machine2:  
>   SPDK nvmf initiator:
>  - connects to Machine1 SPDK nvmf target to access block device
>   SPDK vhost-scsi:
> - presents block device to QEMU from nvmf
> initiator:
>   QEMU instance - waiting for incoming migration
> - shared VM image over SSHFS
> - block device from SPDK nvmf  initiator over vhost-user-scsi
> 
> Some traffic is generated using FIO and then we just migrate VM from
> Machine1 to Machine2.
> 
> Pawel
> 
> > 
> > ___
> > SPDK mailing list
> > s...@lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk
> ___
> SPDK mailing list
> s...@lists.01.org
> https://lists.01.org/mailman/listinfo/spdk

[Qemu-devel] [PATCH v2 3/3] virtio: Make disable-legacy/disable-modern compat properties optional

2019-01-09 Thread Eduardo Habkost

The disable-legacy and disable-modern properties apply only to
some virtio-pci devices.  Make those properties optional.

This fixes the crash introduced by commit f6e501a28ef9 ("virtio: Provide
version-specific variants of virtio PCI devices"):

  $ qemu-system-x86_64 -machine pc-i440fx-2.6 \
-device virtio-net-pci-non-transitional
  Unexpected error in object_property_find() at qom/object.c:1092:
  qemu-system-x86_64: -device virtio-net-pci-non-transitional: can't apply \
  global virtio-pci.disable-modern=on: Property '.disable-modern' not found
  Aborted (core dumped)

Reported-by: Thomas Huth 
Fixes: f6e501a28ef9 ("virtio: Provide version-specific variants of virtio PCI 
devices")
Signed-off-by: Eduardo Habkost 
---
 hw/core/machine.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 5530b71981..a19143aa44 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -91,8 +91,9 @@ const size_t hw_compat_2_7_len = G_N_ELEMENTS(hw_compat_2_7);
 
 GlobalProperty hw_compat_2_6[] = {
 { "virtio-mmio", "format_transport_address", "off" },
-{ "virtio-pci", "disable-modern", "on" },
-{ "virtio-pci", "disable-legacy", "off" },
+/* Optional because not all virtio-pci devices support legacy mode */
+{ "virtio-pci", "disable-modern", "on",  .optional = true },
+{ "virtio-pci", "disable-legacy", "off", .optional = true },
 };
 const size_t hw_compat_2_6_len = G_N_ELEMENTS(hw_compat_2_6);
 
-- 
2.18.0.rc1.1.g3f1ff2140

[Qemu-devel] [PATCH v2 1/3] qom: Don't keep error value between object_property_parse() calls

2019-01-09 Thread Eduardo Habkost

When handling errp==NULL at object_apply_global_props(), we are
leaving the old error value in `err` after printing a warning.
This makes QEMU crash if two global properties generate warnings:

  $ echo device_add rtl8139 | qemu-system-x86_64 -monitor stdio -global 
rtl8139.xxx=yyy -global rtl8139.xxx=zzz
  warning: can't apply global rtl8139.xxx=yyy: Property '.xxx' not found
  qemu-system-x86_64: util/error.c:57: error_setv: Assertion `*errp == NULL' 
failed.
  Aborted (core dumped)

Fix that by making `err` go out of scope immediately after the
warn_report_err() call.

Fixes: 50545b2cc029 "qdev-props: call object_apply_global_props()"
Signed-off-by: Eduardo Habkost 
---
 qom/object.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qom/object.c b/qom/object.c
index aa6f3a2a71..4e5226ca12 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -372,7 +372,6 @@ static void object_post_init_with_type(Object *obj, 
TypeImpl *ti)
 
 void object_apply_global_props(Object *obj, const GPtrArray *props, Error 
**errp)
 {
-Error *err = NULL;
 int i;
 
 if (!props) {
@@ -381,6 +380,7 @@ void object_apply_global_props(Object *obj, const GPtrArray 
*props, Error **errp
 
 for (i = 0; i < props->len; i++) {
 GlobalProperty *p = g_ptr_array_index(props, i);
+Error *err = NULL;
 
 if (object_dynamic_cast(obj, p->driver) == NULL) {
 continue;
-- 
2.18.0.rc1.1.g3f1ff2140

[Qemu-devel] [PATCH v2 0/3] Fix virtio-*(-non)-transitional crash on 2.6 machine-types

2019-01-09 Thread Eduardo Habkost

This is a second attempt to fix the crash reported by Thomas[1].

This keeps the compat property array simple, different from my
first attempt[2].

This also avoids extra complexity on the device code: we don't
need interface name, inheritance tricks, or devices overriding a
compat property after the fact.  The simple absence of the QOM
properties on some device types is enough to make the compat code
skip them.

While working on the fix, I stumbled upon another minor bug in
object_apply_global_props(), which I fixed in patch 1/3.

This series is based on my machine-next branch, because of
conflicts with compat property array cleanups that are already
queued.

[1] a28d196a-e2fe-a013-a6e2-99ac260f6279@redhat.com">http://mid.mail-archive.com/a28d196a-e2fe-a013-a6e2-99ac260f6279@redhat.com
[2] 20190104032226.21428-1-ehabkost@redhat.com">http://mid.mail-archive.com/20190104032226.21428-1-ehabkost@redhat.com

Eduardo Habkost (3):
  qom: Don't keep error value between object_property_parse() calls
  globals: Allow global properties to be optional
  virtio: Make disable-legacy/disable-modern compat properties optional

 include/hw/qdev-core.h | 3 +++
 hw/core/machine.c  | 5 +++--
 qom/object.c   | 6 --
 3 files changed, 10 insertions(+), 4 deletions(-)

-- 
2.18.0.rc1.1.g3f1ff2140

Re: [Qemu-devel] [PATCH v1 1/3] contrib/gitdm: add Nokia and Proxmox to the domain-map

2019-01-09 Thread Aleksandar Markovic

On Monday, January 7, 2019, Alex Bennée  wrote:

> Signed-off-by: Alex Bennée 
> ---
>  contrib/gitdm/domain-map | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/contrib/gitdm/domain-map b/contrib/gitdm/domain-map
> index 8cbbcfe93d..0ab41ee27a 100644
> --- a/contrib/gitdm/domain-map
> +++ b/contrib/gitdm/domain-map
> @@ -9,7 +9,9 @@ greensocs.com   GreenSocs
>  ibm.com IBM
>  igalia.com  Igalia
>  linaro.org  Linaro
> +nokia.com   Nokia
>  oracle.com  Oracle
> +proxmox.com Proxmox
>  redhat.com  Red Hat
>  siemens.com Siemens
>  sifive.com  SiFive
> --
> 2.17.1
>
>
>
 Reviewed-by: Aleksandar Markovic

Re: [Qemu-devel] Emulation of TCG OPAL self-encrypting drive

2019-01-09 Thread David Kozub


On Mon, 7 Jan 2019, Stefan Hajnoczi wrote:


QEMU supports LUKS encrypted disk images so no new code is needed for
the actual encryption.


Thanks for the feedback, Stefan. I know very little about qemu internals 
(I looked around a bit). One issue is: OPAL needs some persistent data 
outside of the actual user-visible data. How does that fit in with storage 
in QEMU? Perhaps the implementation could just occupy a fixed size of the 
associated storage for the OPAL state.



Or, just a pass-through to a block device in the host - but a pass-through
that would allow OPAL commands.


You can pass through a storage controller using PCI passthrough or you
can pass through a SCSI LUN, but there is no ATA passthrough.


I currently don't have a usable box for PCI passthrough. I'm thinking 
that ATA passthrough could be generally usable for any fiddling and 
perhaps not too difficult to implement.


If I understand QEMU sources correctly, this needs to touch hw/ide/core.c 
(ide_exec_cmd), either adding a layer for OPAL, or just forwarding ATA 
commands for pass-through. Right?


Best regards,
David

Re: [Qemu-devel] [PATCH v2 1/3] qom: Don't keep error value between object_property_parse() calls

2019-01-09 Thread Eric Blake

On 1/9/19 8:02 PM, Eduardo Habkost wrote:
> When handling errp==NULL at object_apply_global_props(), we are
> leaving the old error value in `err` after printing a warning.
> This makes QEMU crash if two global properties generate warnings:
> 
>   $ echo device_add rtl8139 | qemu-system-x86_64 -monitor stdio -global 
> rtl8139.xxx=yyy -global rtl8139.xxx=zzz
>   warning: can't apply global rtl8139.xxx=yyy: Property '.xxx' not found
>   qemu-system-x86_64: util/error.c:57: error_setv: Assertion `*errp == NULL' 
> failed.
>   Aborted (core dumped)
> 
> Fix that by making `err` go out of scope immediately after the
> warn_report_err() call.
> 
> Fixes: 50545b2cc029 "qdev-props: call object_apply_global_props()"
> Signed-off-by: Eduardo Habkost 
> ---
>  qom/object.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/qom/object.c b/qom/object.c
> index aa6f3a2a71..4e5226ca12 100644
> --- a/qom/object.c
> +++ b/qom/object.c
> @@ -372,7 +372,6 @@ static void object_post_init_with_type(Object *obj, 
> TypeImpl *ti)
>  
>  void object_apply_global_props(Object *obj, const GPtrArray *props, Error 
> **errp)
>  {
> -Error *err = NULL;

Could also have been fixed by leaving this line at this scope,...

>  int i;
>  
>  if (!props) {
> @@ -381,6 +380,7 @@ void object_apply_global_props(Object *obj, const 
> GPtrArray *props, Error **errp
>  
>  for (i = 0; i < props->len; i++) {
>  GlobalProperty *p = g_ptr_array_index(props, i);
> +Error *err = NULL;
>  
>  if (object_dynamic_cast(obj, p->driver) == NULL) {
>  continue;
> 

...and doing 'err = NULL;' after warn_report_err().  That is, it's not
the going out of scope that fixes it per se, but the fact that you
changed to resetting it to NULL on each loop invocation rather than
leaving it pointing at freed memory.  Whether you set to NULL by a
tighter scope initializer or by an assignment doesn't matter, so no need
to respin since your way works.

Reviewed-by: Eric Blake 


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 2/2] configure: Force the C standard to gnu99

2019-01-09 Thread Thomas Huth

On 2019-01-09 22:26, Paolo Bonzini wrote:
> On 09/01/19 19:10, Philippe Mathieu-Daudé wrote:
>> Using '-std=gnu++98' for g++ v4.8 looks like a good compromise to the
>> issues Daniel mentioned (still 'experimental').
>>
>> Reviewed-by: Philippe Mathieu-Daudé 
> 
> C++11 has many new features that almost make it an entirely different
> language (the main being rvalue references, and type inference with auto
> and decltype).  If it works for GCC 4.8, I would prefer using that
> instead of g++98.

I think we can revisit that setting when we really want to introduce
code that needs it. Currently it does not seem to be necessary, so that
does not justify to use an "experimental" feature of GCC 4.8, I guess.

 Thomas

Re: [Qemu-devel] [Qemu-ppc] [PATCH 1/2] include/hw/ppc: Fix compilation with clang -std=gnu99

2019-01-09 Thread Thomas Huth

On 2019-01-09 18:47, Greg Kurz wrote:
> On Wed, 9 Jan 2019 18:40:48 +0100
> Cédric Le Goater  wrote:
> 
>> On 1/9/19 6:28 PM, Daniel P. Berrangé wrote:
>>> On Wed, Jan 09, 2019 at 06:13:01PM +0100, Cédric Le Goater wrote:  
 On 1/9/19 5:39 PM, Thomas Huth wrote:  
> When compiling with Clang and -std=gnu99, I get the following errors:
>
>   CC  ppc64-softmmu/hw/intc/xics_spapr.o
> In file included from hw/intc/xics_spapr.c:34:
> include/hw/ppc/xics.h:46:25: error: redefinition of typedef 'ICSState' is 
> a C11 feature
>   [-Werror,-Wtypedef-redefinition]
> typedef struct ICSState ICSState;
> ^
> include/hw/ppc/spapr.h:18:25: note: previous definition is here
> typedef struct ICSState ICSState;
> ^
> In file included from hw/intc/xics_spapr.c:34:
> include/hw/ppc/xics.h:203:34: error: redefinition of typedef 
> 'sPAPRMachineState' is a C11 feature
>   [-Werror,-Wtypedef-redefinition]
> typedef struct sPAPRMachineState sPAPRMachineState;
>  ^
> include/hw/ppc/spapr_irq.h:25:34: note: previous definition is here
> typedef struct sPAPRMachineState sPAPRMachineState;
>  ^
>   CC  ppc64-softmmu/hw/intc/spapr_xive.o
> In file included from hw/intc/spapr_xive.c:19:
> include/hw/ppc/spapr_xive.h:38:3: error: redefinition of typedef 
> 'sPAPRXive' is a C11 feature
>   [-Werror,-Wtypedef-redefinition]
> } sPAPRXive;
>   ^
> include/hw/ppc/spapr.h:20:26: note: previous definition is here
> typedef struct sPAPRXive sPAPRXive;
>  ^
> In file included from hw/intc/spapr_xive.c:19:
> include/hw/ppc/spapr_xive.h:45:34: error: redefinition of typedef 
> 'sPAPRMachineState' is a C11 feature
>   [-Werror,-Wtypedef-redefinition]
> typedef struct sPAPRMachineState sPAPRMachineState;
>  ^
> include/hw/ppc/spapr_irq.h:25:34: note: previous definition is here
> typedef struct sPAPRMachineState sPAPRMachineState;
>  ^
>   CC  ppc64-softmmu/hw/char/spapr_vty.o
> In file included from hw/char/spapr_vty.c:8:
> In file included from include/hw/ppc/spapr.h:12:
> include/hw/ppc/spapr_xive.h:45:34: error: redefinition of typedef 
> 'sPAPRMachineState' is a C11 feature
>   [-Werror,-Wtypedef-redefinition]
> typedef struct sPAPRMachineState sPAPRMachineState;
>  ^
> include/hw/ppc/spapr_irq.h:25:34: note: previous definition is here
> typedef struct sPAPRMachineState sPAPRMachineState;
>  ^
>
> Since we're going to make -std=gnu99 mandatory, fix these issues
> by including the right header files indead of typedeffing stuff twice.
>
> Signed-off-by: Thomas Huth 
> ---
>  include/hw/ppc/spapr.h  | 4 ++--
>  include/hw/ppc/spapr_xive.h | 2 +-
>  include/hw/ppc/xics.h   | 4 ++--
>  3 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 2c77a8b..6a5ae4f 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -8,6 +8,8 @@
>  #include "hw/mem/pc-dimm.h"
>  #include "hw/ppc/spapr_ovec.h"
>  #include "hw/ppc/spapr_irq.h"
> +#include "hw/ppc/xics.h"
> +#include "hw/ppc/spapr_xive.h"
>  
>  struct VIOsPAPRBus;
>  struct sPAPRPHBState;
> @@ -15,8 +17,6 @@ struct sPAPRNVRAM;
>  typedef struct sPAPREventLogEntry sPAPREventLogEntry;
>  typedef struct sPAPREventSource sPAPREventSource;
>  typedef struct sPAPRPendingHPT sPAPRPendingHPT;
> -typedef struct ICSState ICSState;
> -typedef struct sPAPRXive sPAPRXive;
>  
>  #define HPTE64_V_HPTE_DIRTY 0x0040ULL
>  #define SPAPR_ENTRY_POINT   0x100
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index 728735d..aff4366 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -42,7 +42,7 @@ bool spapr_xive_irq_free(sPAPRXive *xive, uint32_t 
> lisn);
>  void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
>  qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn);
>  
> -typedef struct sPAPRMachineState sPAPRMachineState;
> +#include "hw/ppc/spapr.h"  /* for sPAPRMachineState */  

 so both files include each other, how nice ...  
>>>
>>> If the header files are mutually dependent it makes me wonder what the
>>> point of having them split up is ?  
>>
>> The XICS interrupt controller models are used in two different machines,
>> sPAPR and PowerNV.
>>
>>> Feels like either they need to be merged, or they need to be split up
>>> and refactored even more to remove the

Re: [Qemu-devel] [Qemu-ppc] [PATCH 1/2] include/hw/ppc: Fix compilation with clang -std=gnu99

2019-01-09 Thread Thomas Huth

On 2019-01-09 22:20, Paolo Bonzini wrote:
> On 09/01/19 18:28, Daniel P. Berrangé wrote:
>>> so both files include each other, how nice ...
>> If the header files are mutually dependent it makes me wonder what the
>> point of having them split up is ?
>>
>> Feels like either they need to be merged, or they need to be split up
>> and refactored even more to remove the mutual dependancy.
> 
> If they include each other only for the typedefs, then prehaps the
> solution is to change the coding style and allow using struct in
> function prototypes.  I'm pretty sure there are several examples of this
> already.

I'm all in favor for this!

 Thomas

Re: [Qemu-devel] [PATCH v5 4/6] nbd/server: implement dirty bitmap export

2019-01-09 Thread Eric Blake

On 1/9/19 1:21 PM, Eric Blake wrote:
> Revisiting an older thread:
> 
> On 6/9/18 10:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Handle new NBD meta namespace: "qemu", and corresponding queries:
>> "qemu:dirty-bitmap:".
>>
>> With new metadata context negotiated, BLOCK_STATUS query will reply
>> with dirty-bitmap data, converted to extents. New public function
>> nbd_export_bitmap selects bitmap to export. For now, only one bitmap
>> may be exported.
>>

>> +if (bdrv_dirty_bitmap_enabled(bm)) {
>> +error_setg(errp, "Bitmap '%s' is enabled", bitmap);
>> +return;
>> +}
> 
> Why are we restricting things to only export disabled bitmaps?
> 
> I can understand the argument that if the image being exported is
> read-only, then an enabled bitmap _that can be changed_ is probably a
> bad idea (it goes against the notion of the export being read only).
> But if we were to allow a writable access to an image, wouldn't we
> expect that writes be reflected into the bitmap, which means permitting
> an enabled bitmap?

I've now addressed this in my v2 Promote x-nbd-server-add-bitmap to
stable series.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH v2 4/6] nbd: Remove x-nbd-server-add-bitmap

2019-01-09 Thread Eric Blake

Now that nbd-server-add can do the same functionality, we no
longer need the experimental separate command.

Signed-off-by: Eric Blake 
---
 qapi/block.json | 23 ---
 blockdev-nbd.c  | 23 ---
 2 files changed, 46 deletions(-)

diff --git a/qapi/block.json b/qapi/block.json
index 3d70420f763..5a79d639e8c 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -301,29 +301,6 @@
 { 'command': 'nbd-server-remove',
   'data': {'name': 'str', '*mode': 'NbdServerRemoveMode'} }

-##
-# @x-nbd-server-add-bitmap:
-#
-# Expose a dirty bitmap associated with the selected export. The bitmap search
-# starts at the device attached to the export, and includes all backing files.
-# The exported bitmap is then locked until the NBD export is removed.
-#
-# @name: Export name.
-#
-# @bitmap: Bitmap name to search for.
-#
-# @bitmap-export-name: How the bitmap will be seen by nbd clients
-#  (default @bitmap)
-#
-# Note: the client must use NBD_OPT_SET_META_CONTEXT with a query of
-# "qemu:dirty-bitmap:NAME" (where NAME matches @bitmap-export-name) to access
-# the exposed bitmap.
-#
-# Since: 3.0
-##
-  { 'command': 'x-nbd-server-add-bitmap',
-'data': {'name': 'str', 'bitmap': 'str', '*bitmap-export-name': 'str'} }
-
 ##
 # @nbd-server-stop:
 #
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index ac7e993c35f..003ba7d7180 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -228,26 +228,3 @@ void qmp_nbd_server_stop(Error **errp)
 nbd_server_free(nbd_server);
 nbd_server = NULL;
 }
-
-void qmp_x_nbd_server_add_bitmap(const char *name, const char *bitmap,
- bool has_bitmap_export_name,
- const char *bitmap_export_name,
- Error **errp)
-{
-NBDExport *exp;
-
-if (!nbd_server) {
-error_setg(errp, "NBD server not running");
-return;
-}
-
-exp = nbd_export_find(name);
-if (exp == NULL) {
-error_setg(errp, "Export '%s' is not found", name);
-return;
-}
-
-nbd_export_bitmap(exp, bitmap,
-  has_bitmap_export_name ? bitmap_export_name : bitmap,
-  errp);
-}
-- 
2.20.1

[Qemu-devel] [PATCH v2 2/6] nbd: Merge nbd_export_set_name into nbd_export_new

2019-01-09 Thread Eric Blake

The existing NBD code had a weird split where nbd_export_new()
created an export but did not add it to the list of exported
names until a later nbd_export_set_name() came along and grabbed
a second reference on the object; later, nbd_export_close()
drops the second reference.  But since we never change the
name of an NBD export while it is exposed, it is easier to just
inline the process of setting the name as part of creating the
export.

Inline the contents of nbd_export_set_name() and
nbd_export_set_description() into the two points in an export
lifecycle where they matter, then adjust both callers to pass
the name up front.  Note that all callers pass a non-NULL name,
(passing NULL at creation was for old style servers, but we
removed support for that in commit 7f7dfe2a).

Signed-off-by: Eric Blake 
---
 include/block/nbd.h |  3 +--
 blockdev-nbd.c  |  5 ++---
 nbd/server.c| 45 -
 qemu-nbd.c  |  5 ++---
 4 files changed, 21 insertions(+), 37 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 65402d33964..2f9a2aeb73c 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -295,6 +295,7 @@ typedef struct NBDExport NBDExport;
 typedef struct NBDClient NBDClient;

 NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset, off_t size,
+  const char *name, const char *description,
   uint16_t nbdflags, void (*close)(NBDExport *),
   bool writethrough, BlockBackend *on_eject_blk,
   Error **errp);
@@ -306,8 +307,6 @@ void nbd_export_put(NBDExport *exp);
 BlockBackend *nbd_export_get_blockdev(NBDExport *exp);

 NBDExport *nbd_export_find(const char *name);
-void nbd_export_set_name(NBDExport *exp, const char *name);
-void nbd_export_set_description(NBDExport *exp, const char *description);
 void nbd_export_close_all(void);

 void nbd_client_new(QIOChannelSocket *sioc,
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index 1d170c80b82..f5edbc27d88 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -174,14 +174,13 @@ void qmp_nbd_server_add(const char *device, bool 
has_name, const char *name,
 writable = false;
 }

-exp = nbd_export_new(bs, 0, -1, writable ? 0 : NBD_FLAG_READ_ONLY,
+exp = nbd_export_new(bs, 0, -1, name, NULL,
+ writable ? 0 : NBD_FLAG_READ_ONLY,
  NULL, false, on_eject_blk, errp);
 if (!exp) {
 return;
 }

-nbd_export_set_name(exp, name);
-
 /* The list of named exports has a strong reference to this export now and
  * our only way of accessing it is through nbd_export_find(), so we can 
drop
  * the strong reference that is @exp. */
diff --git a/nbd/server.c b/nbd/server.c
index 98327088cb4..676fb4886d0 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1456,6 +1456,7 @@ static void nbd_eject_notifier(Notifier *n, void *data)
 }

 NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset, off_t size,
+  const char *name, const char *description,
   uint16_t nbdflags, void (*close)(NBDExport *),
   bool writethrough, BlockBackend *on_eject_blk,
   Error **errp)
@@ -1471,6 +1472,7 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t 
dev_offset, off_t size,
  * that BDRV_O_INACTIVE is cleared and the image is ready for write
  * access since the export could be available before migration handover.
  */
+assert(name);
 ctx = bdrv_get_aio_context(bs);
 aio_context_acquire(ctx);
 bdrv_invalidate_cache(bs, NULL);
@@ -1494,6 +1496,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t 
dev_offset, off_t size,
 QTAILQ_INIT(>clients);
 exp->blk = blk;
 exp->dev_offset = dev_offset;
+exp->name = g_strdup(name);
+exp->description = g_strdup(description);
 exp->nbdflags = nbdflags;
 exp->size = size < 0 ? blk_getlength(blk) : size;
 if (exp->size < 0) {
@@ -1513,10 +1517,14 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t 
dev_offset, off_t size,
 exp->eject_notifier.notify = nbd_eject_notifier;
 blk_add_remove_bs_notifier(on_eject_blk, >eject_notifier);
 }
+QTAILQ_INSERT_TAIL(, exp, next);
+nbd_export_get(exp);
 return exp;

 fail:
 blk_unref(blk);
+g_free(exp->name);
+g_free(exp->description);
 g_free(exp);
 return NULL;
 }
@@ -1533,33 +1541,6 @@ NBDExport *nbd_export_find(const char *name)
 return NULL;
 }

-void nbd_export_set_name(NBDExport *exp, const char *name)
-{
-if (exp->name == name) {
-return;
-}
-
-nbd_export_get(exp);
-if (exp->name != NULL) {
-g_free(exp->name);
-exp->name = NULL;
-QTAILQ_REMOVE(, exp, next);
-nbd_export_put(exp);
-}
-if (name != NULL) {
-nbd_export_get(exp);
-exp->name =

[Qemu-devel] [PATCH v2 1/6] nbd: Only require disabled bitmap for read-only exports

2019-01-09 Thread Eric Blake

Our initial implementation of x-nbd-server-add-bitmap put
in a restriction because of incremental backups: in that
usage, we are exporting one qcow2 file (the temporary overlay
target of a blockdev-backup sync:none job) and a dirty bitmap
owned by a second qcow2 file (the source of the
blockdev-backup, which is the backing file of the temporary).
While both qcow2 files are still writable (the target capture
copy-on-write of old contents, the source to track live guest
writes in the meantime), the NBD client expects to see
constant data, including the dirty bitmap.  An enabled bitmap
in the source would be modified by guest writes, which is at
odds with the NBD export being a read-only constant view,
hence the initial code choice of enforcing a disabled bitmap
(the intent is that the exposed bitmap was disabled in the
same transaction that started the blockdev-backup job,
although we don't want to actually track enough state to
actually enforce that).

However, in the case of image migration, where we WANT a
writable export, it makes total sense that the NBD client
could expect writes to change the status visible through
the exposed dirty bitmap, and thus permitting an enabled
bitmap for an export that is not read-only makes sense;
although the current restriction prevents that usage.

Alternatively, consider the case when the active layer does
not contain the bitmap but the backing layer does.  If the
backing layer is read-only (which is different from the
backing layer of a blockdev-backup job being writable),
we know the backing layer cannot be modified, and thus the
bitmap will not change contents even if it is still marked
enabled (for that matter, since the backing file is
read-only, we couldn't change the bitmap to be disabled
in the first place).  Again, the current restriction
prevents this usage.

Solve both issues by gating the restriction against a
disabled bitmap to only happen when the caller has
requested a read-only export, and where the BDS that owns
the bitmap (which might be a backing file of the BDS
handed to nbd_export_new) is still writable.

Update iotest 223 to add some tests of the error paths.

Signed-off-by: Eric Blake 

---
v2: new patch
---
 nbd/server.c   |  7 +--
 tests/qemu-iotests/223 | 14 --
 tests/qemu-iotests/223.out |  4 
 3 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 7af0ddffb20..98327088cb4 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2456,8 +2456,11 @@ void nbd_export_bitmap(NBDExport *exp, const char 
*bitmap,
 return;
 }

-if (bdrv_dirty_bitmap_enabled(bm)) {
-error_setg(errp, "Bitmap '%s' is enabled", bitmap);
+if ((exp->nbdflags & NBD_FLAG_READ_ONLY) && bdrv_is_writable(bs) &&
+bdrv_dirty_bitmap_enabled(bm)) {
+error_setg(errp,
+   "Enabled bitmap '%s' incompatible with readonly export",
+   bitmap);
 return;
 }

diff --git a/tests/qemu-iotests/223 b/tests/qemu-iotests/223
index 5513dc62159..f1fbb9bc1c6 100755
--- a/tests/qemu-iotests/223
+++ b/tests/qemu-iotests/223
@@ -61,6 +61,8 @@ echo "=== Create partially sparse image, then add dirty 
bitmaps ==="
 echo

 # Two bitmaps, to contrast granularity issues
+# Also note that b will be disabled, while b2 is left enabled, to
+# check for read-only interactions
 _make_test_img -o cluster_size=4k 4M
 $QEMU_IO -c 'w -P 0x11 1M 2M' "$TEST_IMG" | _filter_qemu_io
 run_qemu <

Re: [Qemu-devel] [PATCH v3 0/5] kvm "virtio pmem" device

2019-01-09 Thread Rik van Riel

On Thu, 2019-01-10 at 12:26 +1100, Dave Chinner wrote:
> On Wed, Jan 09, 2019 at 08:17:31PM +0530, Pankaj Gupta wrote:
> >  This patch series has implementation for "virtio pmem". 
> >  "virtio pmem" is fake persistent memory(nvdimm) in guest 
> >  which allows to bypass the guest page cache. This also
> >  implements a VIRTIO based asynchronous flush mechanism.  
> 
> H. Sharing the host page cache direct into the guest VM. Sounds
> like a good idea, but.
> 
> This means the guest VM can now run timing attacks to observe host
> side page cache residency, and depending on the implementation I'm
> guessing that the guest will be able to control host side page
> cache eviction, too (e.g. via discard or hole punch operations).

Even if the guest can only access its own disk image,
using its own page cache pages, Pankaj's patches could
still be a win because they allow the host kernel to
easily evict page cache pages without doing IO.

Evicting page cache pages that live inside a guest
involves something like swapping, ballooning, or some
sort of callback into the guest to figure out whether
a page is clean.

Evicting page cache pages that live in the host, OTOH...

-- 
All Rights Reversed.

signature.asc
Description: This is a digitally signed message part

[Qemu-devel] [PATCH v2 3/6] nbd: Allow bitmap export during QMP nbd-server-add

2019-01-09 Thread Eric Blake

With the experimental x-nbd-server-add-bitmap command, there was
a window of time where an NBD client could see the export but not
the associated dirty bitmap, which can cause a client that planned
on using the dirty bitmap to be forced to treat the entire image
as dirty as a safety fallback.  Furthermore, if the QMP client
successfully exports a disk but then fails to add the bitmap, it
has to take on the burden of removing the export.  Since we don't
allow changing the exposed dirty bitmap (whether to a different
bitmap, or removing advertisement of the bitmap), it is nicer to
make the bitmap tied to the export at the time the export is
created, with automatic failure to export if the bitmap is not
available.

The experimental command included an optional 'bitmap-export-name'
field for remapping the name exposed over NBD to be different from
the bitmap name stored on disk.  However, my libvirt demo code
for implementing differential backups on top of persistent bitmaps
did not need to take advantage of that feature (it is instead
possible to create a new temporary bitmap with the desired name,
use block-dirty-bitmap-merge to merge one or more persistent
bitmaps into the temporary, then associate the temporary with the
NBD export, if control is needed over the exported bitmap name).
Hence, I'm not copying that part of the experiment over to the
stable addition. For more details on the libvirt demo, see
https://www.redhat.com/archives/libvir-list/2018-October/msg01254.html,
https://kvmforum2018.sched.com/event/FzuB/facilitating-incremental-backup-eric-blake-red-hat

This patch focuses on the user interface, and reduces (but does
not completely eliminate) the window where an NBD client can see
the export but not the dirty bitmap.  Later patches will add
further cleanups now that this interface is declared stable via a
single QMP command, including removing the race window.

Update test 223 to use the new interface, including a demonstration
that it is now easier to handle failures.

Signed-off-by: Eric Blake 
---
 qapi/block.json|  7 ++-
 blockdev-nbd.c | 12 +++-
 hmp.c  |  5 +++--
 tests/qemu-iotests/223 | 17 ++---
 tests/qemu-iotests/223.out |  5 +
 5 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/qapi/block.json b/qapi/block.json
index 11f01f28efe..3d70420f763 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -246,6 +246,10 @@
 #
 # @writable: Whether clients should be able to write to the device via the
 # NBD connection (default false).
+
+# @bitmap: Also export the dirty bitmap reachable from @device, so the
+#  NBD client can use NBD_OPT_SET_META_CONTEXT with
+#  "qemu:dirty-bitmap:NAME" to inspect the bitmap. (since 4.0)
 #
 # Returns: error if the server is not running, or export with the same name
 #  already exists.
@@ -253,7 +257,8 @@
 # Since: 1.3.0
 ##
 { 'command': 'nbd-server-add',
-  'data': {'device': 'str', '*name': 'str', '*writable': 'bool'} }
+  'data': {'device': 'str', '*name': 'str', '*writable': 'bool',
+   '*bitmap': 'str' } }

 ##
 # @NbdServerRemoveMode:
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index f5edbc27d88..ac7e993c35f 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -140,7 +140,8 @@ void qmp_nbd_server_start(SocketAddressLegacy *addr,
 }

 void qmp_nbd_server_add(const char *device, bool has_name, const char *name,
-bool has_writable, bool writable, Error **errp)
+bool has_writable, bool writable,
+bool has_bitmap, const char *bitmap, Error **errp)
 {
 BlockDriverState *bs = NULL;
 BlockBackend *on_eject_blk;
@@ -185,6 +186,15 @@ void qmp_nbd_server_add(const char *device, bool has_name, 
const char *name,
  * our only way of accessing it is through nbd_export_find(), so we can 
drop
  * the strong reference that is @exp. */
 nbd_export_put(exp);
+
+if (has_bitmap) {
+Error *err = NULL;
+nbd_export_bitmap(exp, bitmap, bitmap, );
+if (err) {
+error_propagate(errp, err);
+nbd_export_remove(exp, NBD_SERVER_REMOVE_MODE_HARD, NULL);
+}
+}
 }

 void qmp_nbd_server_remove(const char *name,
diff --git a/hmp.c b/hmp.c
index 80aa5ab504b..8da5fd8760a 100644
--- a/hmp.c
+++ b/hmp.c
@@ -2326,7 +2326,7 @@ void hmp_nbd_server_start(Monitor *mon, const QDict 
*qdict)
 }

 qmp_nbd_server_add(info->value->device, false, NULL,
-   true, writable, _err);
+   true, writable, false, NULL, _err);

 if (local_err != NULL) {
 qmp_nbd_server_stop(NULL);
@@ -2347,7 +2347,8 @@ void hmp_nbd_server_add(Monitor *mon, const QDict *qdict)
 bool writable = qdict_get_try_bool(qdict, "writable", false);
 Error *local_err = NULL;

-qmp_nbd_server_add(device, !!name, name, true, writable, _err);
+

[Qemu-devel] [PATCH v2 0/6] Promote x-nbd-server-add-bitmap to stable

2019-01-09 Thread Eric Blake

Or rather, move its functionality into nbd-server-add.  And as
a side effect, teach qemu-nbd how to export a persistent bitmap
without having to go through a qemu process and several QMP
commands.

Based-on: <20181221093529.23855-1-js...@redhat.com>
[0/11 bitmaps: remove x- prefix from QMP api]

Available at: https://repo.or.cz/qemu/ericb.git nbd-bitmap-add-v2

Since v1:
- add new patch 1 allowing bitmaps with writable exports
- add coverage in iotest 223
- fix logic bug that rendered qemu-nbd -B useless
- drop support for bitmap-export-name remapping

001/6:[down] 'nbd: Only require disabled bitmap for read-only exports'
002/6:[] [--] 'nbd: Merge nbd_export_set_name into nbd_export_new'
003/6:[0056] [FC] 'nbd: Allow bitmap export during QMP nbd-server-add'
004/6:[] [--] 'nbd: Remove x-nbd-server-add-bitmap'
005/6:[0042] [FC] 'nbd: Merge nbd_export_bitmap into nbd_export_new'
006/6:[0041] [FC] 'qemu-nbd: Add --bitmap=NAME option'

Eric Blake (6):
  nbd: Only require disabled bitmap for read-only exports
  nbd: Merge nbd_export_set_name into nbd_export_new
  nbd: Allow bitmap export during QMP nbd-server-add
  nbd: Remove x-nbd-server-add-bitmap
  nbd: Merge nbd_export_bitmap into nbd_export_new
  qemu-nbd: Add --bitmap=NAME option

 qemu-nbd.texi  |   4 ++
 qapi/block.json|  30 ++---
 include/block/nbd.h|  12 ++--
 blockdev-nbd.c |  31 ++---
 hmp.c  |   5 +-
 nbd/server.c   | 129 -
 qemu-nbd.c |  15 +++--
 tests/qemu-iotests/223 |  39 ---
 tests/qemu-iotests/223.out |  19 --
 9 files changed, 132 insertions(+), 152 deletions(-)

-- 
2.20.1

[Qemu-devel] [PATCH v2 6/6] qemu-nbd: Add --bitmap=NAME option

2019-01-09 Thread Eric Blake

Having to fire up qemu, then use QMP commands for nbd-server-start
and nbd-server-add, just to expose a persistent dirty bitmap, is
rather tedious.  Make it possible to expose a dirty bitmap using
just qemu-nbd (of course, for now this only works when qemu-nbd is
visiting a BDS formatted as qcow2).

Of course, any good feature also needs unit testing, so expand
iotest 223 to cover it.

Signed-off-by: Eric Blake 
---
 qemu-nbd.texi  |  4 
 qemu-nbd.c | 10 --
 tests/qemu-iotests/223 | 18 +-
 tests/qemu-iotests/223.out | 12 +++-
 4 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/qemu-nbd.texi b/qemu-nbd.texi
index 9a84e81eed9..96b1546006a 100644
--- a/qemu-nbd.texi
+++ b/qemu-nbd.texi
@@ -45,6 +45,10 @@ auto-detecting
 Export the disk as read-only
 @item -P, --partition=@var{num}
 Only expose partition @var{num}
+@item -B, --bitmap=@var{name}
+If @var{filename} has a qcow2 persistent bitmap @var{name}, expose
+that bitmap via the ``qemu:dirty-bitmap:@var{name}'' context
+accessible through NBD_OPT_SET_META_CONTEXT.
 @item -s, --snapshot
 Use @var{filename} as an external snapshot, create a temporary
 file with backing_file=@var{filename}, redirect the write to
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 03813187846..26f5c08821d 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -95,6 +95,7 @@ static void usage(const char *name)
 "Exposing part of the image:\n"
 "  -o, --offset=OFFSET   offset into the image\n"
 "  -P, --partition=NUM   only expose partition NUM\n"
+"  -B, --bitmap=NAME expose a persistent dirty bitmap\n"
 "\n"
 "General purpose options:\n"
 "  --object type,id=ID,...   define an object such as 'secret' for providing\n"
@@ -509,7 +510,7 @@ int main(int argc, char **argv)
 off_t fd_size;
 QemuOpts *sn_opts = NULL;
 const char *sn_id_or_name = NULL;
-const char *sopt = "hVb:o:p:rsnP:c:dvk:e:f:tl:x:T:D:";
+const char *sopt = "hVb:o:p:rsnP:c:dvk:e:f:tl:x:T:D:B:";
 struct option lopt[] = {
 { "help", no_argument, NULL, 'h' },
 { "version", no_argument, NULL, 'V' },
@@ -519,6 +520,7 @@ int main(int argc, char **argv)
 { "offset", required_argument, NULL, 'o' },
 { "read-only", no_argument, NULL, 'r' },
 { "partition", required_argument, NULL, 'P' },
+{ "bitmap", required_argument, NULL, 'B' },
 { "connect", required_argument, NULL, 'c' },
 { "disconnect", no_argument, NULL, 'd' },
 { "snapshot", no_argument, NULL, 's' },
@@ -558,6 +560,7 @@ int main(int argc, char **argv)
 QDict *options = NULL;
 const char *export_name = ""; /* Default export name */
 const char *export_description = NULL;
+const char *bitmap = NULL;
 const char *tlscredsid = NULL;
 bool imageOpts = false;
 bool writethrough = true;
@@ -695,6 +698,9 @@ int main(int argc, char **argv)
 exit(EXIT_FAILURE);
 }
 break;
+case 'B':
+bitmap = optarg;
+break;
 case 'k':
 sockpath = optarg;
 if (sockpath[0] != '/') {
@@ -1016,7 +1022,7 @@ int main(int argc, char **argv)
 }

 exp = nbd_export_new(bs, dev_offset, fd_size, export_name,
- export_description, NULL, nbdflags,
+ export_description, bitmap, nbdflags,
  nbd_export_closed, writethrough, NULL, _fatal);

 if (device) {
diff --git a/tests/qemu-iotests/223 b/tests/qemu-iotests/223
index 1f6822f9489..d7c4a31a181 100755
--- a/tests/qemu-iotests/223
+++ b/tests/qemu-iotests/223
@@ -25,6 +25,7 @@ status=1 # failure is the default!

 _cleanup()
 {
+nbd_server_stop
 _cleanup_test_img
 _cleanup_qemu
 rm -f "$TEST_DIR/nbd"
@@ -35,6 +36,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.rc
 . ./common.filter
 . ./common.qemu
+. ./common.nbd

 _supported_fmt qcow2
 _supported_proto file # uses NBD as well
@@ -153,7 +155,7 @@ $QEMU_IMG map --output=json --image-opts \
   "$IMG,x-dirty-bitmap=qemu:dirty-bitmap:b2" | _filter_qemu_img_map

 echo
-echo "=== End NBD server ==="
+echo "=== End qemu NBD server ==="
 echo

 _send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-remove",
@@ -165,6 +167,20 @@ _send_qemu_cmd $QEMU_HANDLE 
'{"execute":"nbd-server-remove",
 _send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-stop"}' "return"
 _send_qemu_cmd $QEMU_HANDLE '{"execute":"quit"}' "return"

+echo
+echo "=== Use qemu-nbd as server ==="
+echo
+
+nbd_server_start_unix_socket -r -f $IMGFMT -B b "$TEST_IMG"
+IMG="driver=nbd,server.type=unix,server.path=$nbd_unix_socket"
+$QEMU_IMG map --output=json --image-opts \
+  "$IMG,x-dirty-bitmap=qemu:dirty-bitmap:b" | _filter_qemu_img_map
+
+nbd_server_start_unix_socket -f $IMGFMT -B b2 "$TEST_IMG"
+IMG="driver=nbd,server.type=unix,server.path=$nbd_unix_socket"
+$QEMU_IMG map --output=json --image-opts \
+  "$IMG,x-dirty-bitmap=qemu:dirty-bitmap:b2"

Re: [Qemu-devel] [PATCH v2 06/52] audio: -audiodev command line option basic implementation

2019-01-09 Thread Gerd Hoffmann

  Hi,

> I was thinking about creating an Audiodev (the qapi type) directly would
> be better, then somehow print it with reflection.  While this is not a
> typical use of qapi, at least qmp_qom_list creates qapi objects
> directly, so I assume it's ok.

Yes, it's perfectly fine.

> The second problem: with my patched options visitor, nested structs were
> required in qapi, the options visitor unconditionally filled them.  With
> qobject_input_visitor, I have to mark them optional, otherwise the user
> have to specify at least one option from the nested structs.

Can we just drop the nesting?

Have a look at DisplayOptions (qapi struct used to store -display
options).  It's a union, has some common base fields, and the
discriminator field (type in that case) decides which of the data
branches is used.  And on the command line I can do stuff like this:

  -display gtk,full-screen=on,zoom-to-fit=on
  ^^^ in struct DisplayGTK
   ^^^common base field

Audiodev should be able to do the same to support backend-specific
options without nesting.

cheers,
  Gerd

Re: [Qemu-devel] [PATCH] target/arm: Allow to switch from MON->HYP on AArch32

2019-01-09 Thread Alexander Graf




On 10.01.19 00:08, Peter Maydell wrote:
> On Wed, 9 Jan 2019 at 17:14, Alexander Graf  wrote:
>>
>> On 01/09/2019 05:59 PM, Peter Maydell wrote:
>>> On Wed, 9 Jan 2019 at 16:52, Peter Maydell  wrote:
 On Wed, 9 Jan 2019 at 15:26, Alexander Graf  wrote:
> In U-boot, we switch from S-SVC -> MON -> HYP when we want to enter
> HYP mode. This dance seems to work ok (hence it's there in the code
> base), but breaks with current QEMU.
>>> PS: it would be helpful if the commit message said how u-boot
>>> is trying to go from Mon to Hyp -- some ways to try to do
>>> this are OK, and some are not, so whether it's supposed to
>>> work or not depends on what u-boot is actually doing...
>>
>> I don't fully understand all of it to be honest :). But the code is here:
>>
>> http://git.denx.de/?p=u-boot.git;a=blob;f=arch/arm/cpu/armv7/nonsec_virt.S
>>
>> What I managed to understand so far is that it goes to MON using the smc
>> #0 call and then changes SPSR so that on return (movs pc) the mode will
>> be different.
> 
> Thanks -- yes, that's an exception return so it's the
> expected way to go from Mon to Hyp.

That was my understanding, yes. Do you still want me to change the
commit message to mention that or will you just do it when applying?

Thanks,

Alex

Re: [Qemu-devel] [PATCH] target-i386: hvf: remove MPX support

2019-01-09 Thread Roman Bolshakov

On Thu, Dec 20, 2018 at 01:11:12PM +0100, Paolo Bonzini wrote:
> MPX support is being phased out by Intel and actually I am not sure that
> OS X has ever enabled it in XCR0.  Drop it from the Hypervisor.framework
> acceleration.
> 

I also doubt if OS X enabled it, but I can't confirm that as I have only
Ivy Bridge and Haswell-based laptops.

Reviewed-by: Roman Bolshakov 

Thanks,
Roman

Re: [Qemu-devel] [PATCH] spice: Remove unused include

2019-01-09 Thread Gerd Hoffmann

On Mon, Jan 07, 2019 at 06:44:04PM +, Frediano Ziglio wrote:
> The definitions in the header are not  used.
> Also this fixes porting SPICE to Windows where the header is not
> available.

Added to ui queue.

thanks,
  Gerd

[Qemu-devel] [PATCH v2 5/6] nbd: Merge nbd_export_bitmap into nbd_export_new

2019-01-09 Thread Eric Blake

We only have one caller that wants to export a bitmap name,
which it does right after creation of the export. But there is
still a brief window of time where an NBD client could see the
export but not the dirty bitmap, which a robust client would
have to interpret as meaning the entire image should be treated
as dirty.  Better is to eliminate the window entirely, by
inlining nbd_export_bitmap() into nbd_export_new(), and refusing
to create the bitmap in the first place if the requested bitmap
can't be located.

We also no longer need logic for setting a different bitmap
name compared to the bitmap being exported.

Signed-off-by: Eric Blake 
---
 include/block/nbd.h |  9 ++---
 blockdev-nbd.c  | 11 +-
 nbd/server.c| 87 +
 qemu-nbd.c  |  4 +--
 4 files changed, 46 insertions(+), 65 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 2f9a2aeb73c..1971b557896 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -296,9 +296,9 @@ typedef struct NBDClient NBDClient;

 NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset, off_t size,
   const char *name, const char *description,
-  uint16_t nbdflags, void (*close)(NBDExport *),
-  bool writethrough, BlockBackend *on_eject_blk,
-  Error **errp);
+  const char *bitmap, uint16_t nbdflags,
+  void (*close)(NBDExport *), bool writethrough,
+  BlockBackend *on_eject_blk, Error **errp);
 void nbd_export_close(NBDExport *exp);
 void nbd_export_remove(NBDExport *exp, NbdServerRemoveMode mode, Error **errp);
 void nbd_export_get(NBDExport *exp);
@@ -319,9 +319,6 @@ void nbd_client_put(NBDClient *client);
 void nbd_server_start(SocketAddress *addr, const char *tls_creds,
   Error **errp);

-void nbd_export_bitmap(NBDExport *exp, const char *bitmap,
-   const char *bitmap_export_name, Error **errp);
-
 /* nbd_read
  * Reads @size bytes from @ioc. Returns 0 on success.
  */
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index 003ba7d7180..0df6307be2d 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -175,7 +175,7 @@ void qmp_nbd_server_add(const char *device, bool has_name, 
const char *name,
 writable = false;
 }

-exp = nbd_export_new(bs, 0, -1, name, NULL,
+exp = nbd_export_new(bs, 0, -1, name, NULL, bitmap,
  writable ? 0 : NBD_FLAG_READ_ONLY,
  NULL, false, on_eject_blk, errp);
 if (!exp) {
@@ -186,15 +186,6 @@ void qmp_nbd_server_add(const char *device, bool has_name, 
const char *name,
  * our only way of accessing it is through nbd_export_find(), so we can 
drop
  * the strong reference that is @exp. */
 nbd_export_put(exp);
-
-if (has_bitmap) {
-Error *err = NULL;
-nbd_export_bitmap(exp, bitmap, bitmap, );
-if (err) {
-error_propagate(errp, err);
-nbd_export_remove(exp, NBD_SERVER_REMOVE_MODE_HARD, NULL);
-}
-}
 }

 void qmp_nbd_server_remove(const char *name,
diff --git a/nbd/server.c b/nbd/server.c
index 676fb4886d0..e613594fde0 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1457,9 +1457,9 @@ static void nbd_eject_notifier(Notifier *n, void *data)

 NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset, off_t size,
   const char *name, const char *description,
-  uint16_t nbdflags, void (*close)(NBDExport *),
-  bool writethrough, BlockBackend *on_eject_blk,
-  Error **errp)
+  const char *bitmap, uint16_t nbdflags,
+  void (*close)(NBDExport *), bool writethrough,
+  BlockBackend *on_eject_blk, Error **errp)
 {
 AioContext *ctx;
 BlockBackend *blk;
@@ -1507,6 +1507,43 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t 
dev_offset, off_t size,
 }
 exp->size -= exp->size % BDRV_SECTOR_SIZE;

+if (bitmap) {
+BdrvDirtyBitmap *bm = NULL;
+BlockDriverState *bs = blk_bs(blk);
+
+while (true) {
+bm = bdrv_find_dirty_bitmap(bs, bitmap);
+if (bm != NULL || bs->backing == NULL) {
+break;
+}
+
+bs = bs->backing->bs;
+}
+
+if (bm == NULL) {
+error_setg(errp, "Bitmap '%s' is not found", bitmap);
+goto fail;
+}
+
+if ((nbdflags & NBD_FLAG_READ_ONLY) && bdrv_is_writable(bs) &&
+bdrv_dirty_bitmap_enabled(bm)) {
+error_setg(errp,
+   "Enabled bitmap '%s' incompatible with readonly export",
+   bitmap);
+goto fail;
+}
+
+if (bdrv_dirty_bitmap_user_locked(bm)) {
+

Re: [Qemu-devel] [PATCH] linux-user: make pwrite64/pread64(fd, NULL, 0, offset) return 0

2019-01-09 Thread Laurent Vivier

Le 08/01/2019 à 19:49, Peter Maydell a écrit :
> Linux returns success if pwrite64() or pread64() are called with a
> zero length NULL buffer, but QEMU was returning -TARGET_EFAULT.
> 
> This is the same bug that we fixed in commit 58cfa6c2e6eb51b23cc9
> for the write syscall, and long before that in 38d840e6790c29f59
> for the read syscall.
> 
> Fixes: https://bugs.launchpad.net/qemu/+bug/1810433
> 
> Signed-off-by: Peter Maydell 
> ---
> I chose to fix this by setting p to NULL and falling through
> to the normal-case codepath rather than having a call to
> pread/pwrite in the special-case if like 58cfa6c2e6eb5,
> because here the normal-case is a bit more complicated as
> it has the target_offset64() call in it.
> 38d840e6790c29f59 has "just return 0" for the NULL buffer
> case, but we can't do that here as that would not get the
> "negative offset should return -EINVAL" case write.
> ---
>  linux-user/syscall.c | 22 ++
>  1 file changed, 18 insertions(+), 4 deletions(-)
> 

Reviewed-by: Laurent Vivier

[Qemu-devel] Qemu migration with vhost-user-blk on top of local storage

2019-01-09 Thread wuzhouhui

Hi everyone,

I'm working qemu with vhost target (e.g. spdk), and I attempt to migrate VM with
2 local storages. One local storage is a regular file, e.g. /tmp/c74.qcow2, and
the other is a malloc bdev that spdk created. This malloc bdev will exported to
VM via vhost-user-blk. When I execute following command:

  virsh migrate --live --persistent --unsafe --undefinesource 
--copy-storage-all \
 --p2p --auto-converge --verbose --desturi qemu+tcp:///system vm0

The libvirt reports:

  qemu-2.12.1: error: internal error: unable to execute QEMU command \
'nbd-server-add': Cannot find device=drive-virtio-disk1 nor \
node_name=drive-virtio-disk1

Does it means that qemu with spdk on top of local storage don't support 
migration?

QEMU: 2.12.1
SPDK: 18.10

Thanks.

Re: [Qemu-devel] [PATCH] linux-user: make pwrite64/pread64(fd, NULL, 0, offset) return 0

2019-01-09 Thread Laurent Vivier

On 08/01/2019 19:49, Peter Maydell wrote:
> Linux returns success if pwrite64() or pread64() are called with a
> zero length NULL buffer, but QEMU was returning -TARGET_EFAULT.
> 
> This is the same bug that we fixed in commit 58cfa6c2e6eb51b23cc9
> for the write syscall, and long before that in 38d840e6790c29f59
> for the read syscall.
> 
> Fixes: https://bugs.launchpad.net/qemu/+bug/1810433
> 
> Signed-off-by: Peter Maydell 
> ---
> I chose to fix this by setting p to NULL and falling through
> to the normal-case codepath rather than having a call to
> pread/pwrite in the special-case if like 58cfa6c2e6eb5,
> because here the normal-case is a bit more complicated as
> it has the target_offset64() call in it.
> 38d840e6790c29f59 has "just return 0" for the NULL buffer
> case, but we can't do that here as that would not get the
> "negative offset should return -EINVAL" case write.
> ---
>  linux-user/syscall.c | 22 ++
>  1 file changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 280137da8c2..b13a170e52e 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -9677,8 +9677,15 @@ static abi_long do_syscall1(void *cpu_env, int num, 
> abi_long arg1,
>  arg4 = arg5;
>  arg5 = arg6;
>  }
> -if (!(p = lock_user(VERIFY_WRITE, arg2, arg3, 0)))
> -return -TARGET_EFAULT;
> +if (arg2 == 0 && arg3 == 0) {
> +/* Special-case NULL buffer and zero length, which should 
> succeed */
> +p = 0;
> +} else {
> +p = lock_user(VERIFY_WRITE, arg2, arg3, 0);
> +if (!p) {
> +return -TARGET_EFAULT;
> +}
> +}
>  ret = get_errno(pread64(arg1, p, arg3, target_offset64(arg4, arg5)));
>  unlock_user(p, arg2, ret);
>  return ret;
> @@ -9687,8 +9694,15 @@ static abi_long do_syscall1(void *cpu_env, int num, 
> abi_long arg1,
>  arg4 = arg5;
>  arg5 = arg6;
>  }
> -if (!(p = lock_user(VERIFY_READ, arg2, arg3, 1)))
> -return -TARGET_EFAULT;
> +if (arg2 == 0 && arg3 == 0) {
> +/* Special-case NULL buffer and zero length, which should 
> succeed */
> +p = 0;
> +} else {
> +p = lock_user(VERIFY_READ, arg2, arg3, 1);
> +if (!p) {
> +return -TARGET_EFAULT;
> +}
> +}
>  ret = get_errno(pwrite64(arg1, p, arg3, target_offset64(arg4, 
> arg5)));
>  unlock_user(p, arg2, 0);
>  return ret;
> 

Applied to my linux-user branch.

Thanks,
Laurent

Re: [Qemu-devel] [Qemu-trivial] [PATCH] ioapic: use TYPE_FOO MACRO than constant string

2019-01-09 Thread Laurent Vivier

On 05/01/2019 03:38, Li Qiang wrote:
> Make them more QOMConventional.
> Cc:qemu-triv...@nongnu.org
> 
> Signed-off-by: Li Qiang 
> ---
>  hw/i386/kvm/ioapic.c | 2 +-
>  hw/i386/pc.c | 4 ++--
>  hw/intc/ioapic.c | 2 +-
>  include/hw/i386/ioapic.h | 3 +++
>  4 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/i386/kvm/ioapic.c b/hw/i386/kvm/ioapic.c
> index 5b40d75439..e453692199 100644
> --- a/hw/i386/kvm/ioapic.c
> +++ b/hw/i386/kvm/ioapic.c
> @@ -163,7 +163,7 @@ static void kvm_ioapic_class_init(ObjectClass *klass, 
> void *data)
>  }
>  
>  static const TypeInfo kvm_ioapic_info = {
> -.name  = "kvm-ioapic",
> +.name  = TYPE_KVM_IOAPIC,
>  .parent = TYPE_IOAPIC_COMMON,
>  .instance_size = sizeof(KVMIOAPICState),
>  .class_init = kvm_ioapic_class_init,
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index f248662e97..af68a61615 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1665,9 +1665,9 @@ void ioapic_init_gsi(GSIState *gsi_state, const char 
> *parent_name)
>  unsigned int i;
>  
>  if (kvm_ioapic_in_kernel()) {
> -dev = qdev_create(NULL, "kvm-ioapic");
> +dev = qdev_create(NULL, TYPE_KVM_IOAPIC);
>  } else {
> -dev = qdev_create(NULL, "ioapic");
> +dev = qdev_create(NULL, TYPE_IOAPIC);
>  }
>  if (parent_name) {
>  object_property_add_child(object_resolve_path(parent_name, NULL),
> diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
> index 4e529729b4..9d75f84d3b 100644
> --- a/hw/intc/ioapic.c
> +++ b/hw/intc/ioapic.c
> @@ -429,7 +429,7 @@ static void ioapic_class_init(ObjectClass *klass, void 
> *data)
>  }
>  
>  static const TypeInfo ioapic_info = {
> -.name  = "ioapic",
> +.name  = TYPE_IOAPIC,
>  .parent= TYPE_IOAPIC_COMMON,
>  .instance_size = sizeof(IOAPICCommonState),
>  .class_init= ioapic_class_init,
> diff --git a/include/hw/i386/ioapic.h b/include/hw/i386/ioapic.h
> index 9c8816f11f..59fcb158a7 100644
> --- a/include/hw/i386/ioapic.h
> +++ b/include/hw/i386/ioapic.h
> @@ -23,6 +23,9 @@
>  #define IOAPIC_NUM_PINS 24
>  #define IO_APIC_DEFAULT_ADDRESS 0xfec0
>  
> +#define TYPE_KVM_IOAPIC "kvm-ioapic"
> +#define TYPE_IOAPIC "ioapic"
> +
>  void ioapic_eoi_broadcast(int vector);
>  
>  #endif /* HW_IOAPIC_H */
> 

Applied to my trivial-patches branch.

Thanks,
Laurent

Re: [Qemu-devel] [PATCH v3] dump: Set correct vaddr for ELF dump

2019-01-09 Thread Laszlo Ersek

On 01/09/19 09:25, Marc-André Lureau wrote:
> On Wed, Jan 9, 2019 at 12:22 PM Jon Doron  wrote:
>>
>> vaddr needs to be equal to the paddr since the dump file represents the
>> physical memory image.
>>
>> Without setting vaddr correctly, GDB would load all the different memory
>> regions on top of each other to vaddr 0, thus making GDB showing the wrong
>> memory data for a given address.
>>
>> Signed-off-by: Jon Doron 
> 
> Tested-by: Marc-André Lureau 
> Reviewed-by: Marc-André Lureau 

Apparently the only change in v3, relative to v2, is cleaning up the
whitespace problem that was caught by Patchew.

Acked-by: Laszlo Ersek 

Thanks
Laszlo

>>  dump.c   | 5 +++--
>>  scripts/dump-guest-memory.py | 1 +
>>  2 files changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/dump.c b/dump.c
>> index 4ec94c5e25..de7f70f099 100644
>> --- a/dump.c
>> +++ b/dump.c
>> @@ -192,7 +192,7 @@ static void write_elf64_load(DumpState *s, MemoryMapping 
>> *memory_mapping,
>>  phdr.p_paddr = cpu_to_dump64(s, memory_mapping->phys_addr);
>>  phdr.p_filesz = cpu_to_dump64(s, filesz);
>>  phdr.p_memsz = cpu_to_dump64(s, memory_mapping->length);
>> -phdr.p_vaddr = cpu_to_dump64(s, memory_mapping->virt_addr);
>> +phdr.p_vaddr = cpu_to_dump64(s, memory_mapping->virt_addr) ? : 
>> phdr.p_paddr;
>>
>>  assert(memory_mapping->length >= filesz);
>>
>> @@ -216,7 +216,8 @@ static void write_elf32_load(DumpState *s, MemoryMapping 
>> *memory_mapping,
>>  phdr.p_paddr = cpu_to_dump32(s, memory_mapping->phys_addr);
>>  phdr.p_filesz = cpu_to_dump32(s, filesz);
>>  phdr.p_memsz = cpu_to_dump32(s, memory_mapping->length);
>> -phdr.p_vaddr = cpu_to_dump32(s, memory_mapping->virt_addr);
>> +phdr.p_vaddr =
>> +cpu_to_dump32(s, memory_mapping->virt_addr) ? : phdr.p_paddr;
>>
>>  assert(memory_mapping->length >= filesz);
>>
>> diff --git a/scripts/dump-guest-memory.py b/scripts/dump-guest-memory.py
>> index 198cd0fe40..2c587cbefc 100644
>> --- a/scripts/dump-guest-memory.py
>> +++ b/scripts/dump-guest-memory.py
>> @@ -163,6 +163,7 @@ class ELF(object):
>>  phdr = get_arch_phdr(self.endianness, self.elfclass)
>>  phdr.p_type = p_type
>>  phdr.p_paddr = p_paddr
>> +phdr.p_vaddr = p_paddr
>>  phdr.p_filesz = p_size
>>  phdr.p_memsz = p_size
>>  self.segments.append(phdr)
>> --
>> 2.19.2
>>
>>
> 
> 
I pre

Re: [Qemu-devel] [PATCH v4] tests: vm: auto_install OpenBSD

2019-01-09 Thread Daniel P . Berrangé

On Wed, Nov 14, 2018 at 05:58:07PM +0800, Fam Zheng wrote:
> On Sun, 11/11 18:20, Brad Smith wrote:
> > ping.
> 
> Queued. Will send a pull request soon.

ping, this seems to have got lost, and is blocking us deleting SDL 1.2
support from QEMU.

> > On 10/30/2018 10:57 PM, Fam Zheng wrote:
> > > Upgrade OpenBSD to 6.4 using auto_install. Especially, drop SDL1,
> > > include SDL2.
> > > 
> > > Also do the build in $HOME since both /var/tmp and /tmp are tmpfs with
> > > limited capacities.
> > > 
> > > Signed-off-by: Fam Zheng 
> > > 
> > > ---
> > > 
> > > v4: Use 6.4. [Brad]
> > > ---
> > >   tests/vm/basevm.py |  6 ++--
> > >   tests/vm/openbsd   | 85 +++---
> > >   2 files changed, 76 insertions(+), 15 deletions(-)
> > > 
> > > diff --git a/tests/vm/basevm.py b/tests/vm/basevm.py
> > > index 5caf77d6b8..6fb446d4c5 100755
> > > --- a/tests/vm/basevm.py
> > > +++ b/tests/vm/basevm.py
> > > @@ -68,8 +68,6 @@ class BaseVM(object):
> > >   self._args = [ \
> > >   "-nodefaults", "-m", "4G",
> > >   "-cpu", "max",
> > > -"-netdev", "user,id=vnet,hostfwd=:127.0.0.1:0-:22",
> > > -"-device", "virtio-net-pci,netdev=vnet",
> > >   "-vnc", "127.0.0.1:0,to=20",
> > >   "-serial", "file:%s" % os.path.join(self._tmpdir, 
> > > "serial.out")]
> > >   if vcpus and vcpus > 1:
> > > @@ -146,8 +144,10 @@ class BaseVM(object):
> > >   "-device",
> > >   "virtio-blk,drive=%s,serial=%s,bootindex=1" 
> > > % (name, name)]
> > > -def boot(self, img, extra_args=[]):
> > > +def boot(self, img, extra_args=[], extra_usernet_args=""):
> > >   args = self._args + [
> > > +"-netdev", "user,id=vnet,hostfwd=:127.0.0.1:0-:22" + 
> > > extra_usernet_args,
> > > +"-device", "virtio-net-pci,netdev=vnet",
> > >   "-device", "VGA",
> > >   "-drive", "file=%s,if=none,id=drive0,cache=writeback" % img,
> > >   "-device", "virtio-blk,drive=drive0,bootindex=0"]
> > > diff --git a/tests/vm/openbsd b/tests/vm/openbsd
> > > index cfe0572c59..99a7e98d80 100755
> > > --- a/tests/vm/openbsd
> > > +++ b/tests/vm/openbsd
> > > @@ -14,6 +14,9 @@
> > >   import os
> > >   import sys
> > >   import subprocess
> > > +import time
> > > +import atexit
> > > +import tempfile
> > >   import basevm
> > >   class OpenBSDVM(basevm.BaseVM):
> > > @@ -21,25 +24,83 @@ class OpenBSDVM(basevm.BaseVM):
> > >   arch = "x86_64"
> > >   BUILD_SCRIPT = """
> > >   set -e;
> > > -rm -rf /var/tmp/qemu-test.*
> > > -cd $(mktemp -d /var/tmp/qemu-test.XX);
> > > +rm -rf $HOME/qemu-test.*
> > > +cd $(mktemp -d $HOME/qemu-test.XX);
> > >   tar -xf /dev/rsd1c;
> > > -./configure --cc=x86_64-unknown-openbsd6.1-gcc-4.9.4 
> > > --python=python2.7 {configure_opts};
> > > +./configure {configure_opts};
> > >   gmake --output-sync -j{jobs} {verbose};
> > >   # XXX: "gmake check" seems to always hang or fail
> > >   #gmake --output-sync -j{jobs} check {verbose};
> > >   """
> > > +def _install_os(self, img):
> > > +tmpdir = tempfile.mkdtemp()
> > > +pxeboot = 
> > > self._download_with_cache("https://fastly.cdn.openbsd.org/pub/OpenBSD/6.4/amd64/pxeboot;,
> > > +
> > > sha256sum="d87ab39d941ff926d693943a927585945456ccedb76ea504a251b4b93cd4c266")
> > > +bsd_rd = 
> > > self._download_with_cache("https://fastly.cdn.openbsd.org/pub/OpenBSD/6.4/amd64/bsd.rd;,
> > > +
> > > sha256sum="89505c683cbcd75582fe475e847ed53d89e2b8180c3e3d61f4eb4b76b5e11f5c")
> > > +install = 
> > > self._download_with_cache("https://fastly.cdn.openbsd.org/pub/OpenBSD/6.4/amd64/install64.iso;,
> > > +
> > > sha256sum='81833b79e23dc0f961ac5fb34484bca66386deb3181ddb8236870fa4f488cdd2')
> > > +subprocess.check_call(["qemu-img", "create", img, "32G"])
> > > +subprocess.check_call(["cp", pxeboot, os.path.join(tmpdir, 
> > > "auto_install")])
> > > +subprocess.check_call(["cp", bsd_rd, os.path.join(tmpdir, 
> > > "bsd")])
> > > +
> > > +self._gen_install_conf(tmpdir)
> > > +# BOOTP filename being auto_install makes sure OpenBSD installer
> > > +# not prompt for "auto install mode"
> > > +usernet_args = ",tftp=%s,bootfile=/auto_install" % tmpdir
> > > +usernet_args += ",tftp-server-name=10.0.2.4"
> > > +usernet_args += ",guestfwd=tcp:10.0.2.4:80-cmd:cat %s" % \
> > > +os.path.join(tmpdir, "install.conf")
> > > +self.boot(img,
> > > +  extra_args=["-boot", "once=n", "-no-reboot",
> > > +  "-cdrom", install],
> > > +  extra_usernet_args=usernet_args)
> > > +self.wait()
> > > +
> > > +def _gen_install_conf(self, tmpdir):
> >

Re: [Qemu-devel] [PATCH v3] qemu-io: Add generic function for reinitializing optind.

2019-01-09 Thread Richard W.M. Jones

On Wed, Jan 09, 2019 at 01:44:48PM +, Daniel P. Berrangé wrote:
> On Wed, Jan 09, 2019 at 01:23:01PM +, Richard W.M. Jones wrote:
> > How about this one?  Add a generic osdep function for reinitializing
> > optind, which does optreset on FreeBSD (but is identical on all other
> > OSes).  Use it from qemu-io and qemu-img.
> > 
> > I have tested this on Linux, FreeBSD and OpenBSD.
> > 
> > checkpatch complains:
> > 
> > WARNING: Block comments use a leading /* on a separate line
> > #69: FILE: include/qemu/osdep.h:591:
> > +/**
> 
> I think it just doesn't like your '/**' and wants '/*' instead.

The existing comments in the same file are a mix of three styles.
I chose the style used closest to the new comment I was adding :-)

> > WARNING: architecture specific defines should be avoided
> > #78: FILE: include/qemu/osdep.h:600:
> > +#ifdef __FreeBSD__
> 
> Normally we'd suggest doing a configure test to for the platform
> feature and then using a feature based ifdef test. In this case
> though that would be difficult and/or overly complex.
> 
> This does make me wonder about the other *BSDs, OS-X and Mingw

OpenBSD is known fine with optind = 0.  I can't test OS-X.  I can test
mingw (on Linux) later.

Rich.

> though ?  Should they all be using the #else codepath, or should
> the other BSDs / OS-X use the __FreeBSD__ codepath.
> 
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v

Re: [Qemu-devel] [PATCH] configure: Force the C standard to gnu11

2019-01-09 Thread Thomas Huth

On 2019-01-09 15:20, Markus Armbruster wrote:
> Thomas Huth  writes:
> 
>> On 2019-01-09 14:10, Markus Armbruster wrote:
>>> Thomas Huth  writes:
>>>
 On 2019-01-09 12:44, Daniel P. Berrangé wrote:
> On Wed, Jan 09, 2019 at 12:25:43PM +0100, Thomas Huth wrote:
>> On 2019-01-09 11:58, Daniel P. Berrangé wrote:
>>> On Mon, Jan 07, 2019 at 11:45:26AM +0100, Thomas Huth wrote:
 Different versions of GCC and Clang use different versions of the C 
 standard.
 This repeatedly caused problems already, e.g. with duplicated typedefs:

  https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg05829.html

 or with for-loop variable initializers:

  https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg00237.html

 To avoid these problems, we should enforce the C language version to 
 the
 same level for all compilers. Since our minimum compiler versions are
 GCC v4.8 and Clang v3.4 now, and both basically support "gnu11" 
 already,
 this seems to be a good choice.
>>>
>>> In 4.x   gnu11 is marked as experimental. I'm not really comfortable
>>> using experimental features - even if its warning free there's a risk
>>> it would silently mis-compile something.
>>>
>>> gnu99 is ok with 4.x - it is merely "incomplete".
>>
>> gnu11 has the big advantage that it also fixes the problem with
>> duplicated typedefs that are reported by older versions of Clang.
>>
>> Are you sure about the experimental character in 4.x? I just looked at
>> https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Standards.html and it says:
>>
>> "A fourth version of the C standard, known as C11, was published in 2011
>> as ISO/IEC 9899:2011. GCC has limited incomplete support for parts of
>> this standard, enabled with -std=c11 or -std=iso9899:2011."
>>
>> It does not say anything about "experimental" there. The word
>> "experimental" is only used for the C++ support, but we hardly have C++
>> code in QEMU -- if you worry about that, I could simply drop the
>> "-std=gnu++11" part from my patch?
>
> I was looking at the "info gcc" docs on RHEL7, 
> gcc-4.8.5-16.el7_4.1.x86_64:
>
>   "3.4 Options Controlling C Dialect
>
>snip...
>
>  'gnu11'
>  'gnu1x'
>   GNU dialect of ISO C11.  Support is incomplete and
>   experimental.  The name 'gnu1x' is deprecated."

 Ok. Looks like the "Support is incomplete and experimental" sentence has
 been removed with GCC 4.9.0 here. So GCC 4.8 is likely pretty close
 already. IMHO we could give it a try and enable gnu11 for QEMU with GCC
 v4.8, too. If we later find problems, we could still switch back to
 gnu99 instead. Other opinions?
>>>
>>> Switchinh back could be somewhat painful if we already started using C11
>>> features.  And if we don't plan to, then what exactly will -std=gnu11
>>> buy us?
>>
>> With C11, we get safety for the "duplicated typedef" problem that we run
>> into regularly again and again, see e.g.:
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg05829.html
> 
> That's a compilation failure.  "Support is experimental" makes me afraid
> of run time failures.
> 
> If we truly want C11, shouldn't we bump minimum required GCC to 4.9?

That's not possible, since we claim to support RHEL7 / CentOS7 that is
still using GCC v4.8.

 Thomas

Re: [Qemu-devel] [PATCH v2 0/3] HMP/snapshot changes - do not use ID anymore

2019-01-09 Thread Max Reitz

On 09.01.19 15:21, Kevin Wolf wrote:
> Am 09.01.2019 um 15:10 hat Max Reitz geschrieben:
>> On 06.09.18 13:11, Daniel Henrique Barboza wrote:
>>> changes in v2:
>>> - removed the "RFC" marker;
>>> - added a new patch (patch 2) that removes
>>> bdrv_snapshot_delete_by_id_or_name from the code;
>>> - made changes in patch 1 as suggested by Murilo;
>>> - previous patch set link:
>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
>>>
>>>
>>> It is not uncommon to see bugs being opened by testers that attempt to
>>> create VM snapshots using HMP. It turns out that "0" and "1" are quite
>>> common snapshot names and they trigger a lot of bugs. I gave an example
>>> in the commit message of patch 1, but to sum up here: QEMU treats the
>>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
>>> is documented as such, but this can lead to strange situations.
>>>
>>> Given that it is strange for an API to consider a parameter to be 2 fields
>>> at the same time, and inadvently treating them as one or the other, and
>>> that removing the ID field is too drastic, my idea here is to keep the
>>> ID field for internal control, but do not let the user set it.
>>>
>>> I guess there's room for discussion about considering this change an API
>>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt,
>>> but simplifying the meaning of the parameters of savevm/loadvm/delvm.
>>
>> (Yes, very late reply, I'm sorry...)
>>
>> I do think it affects users of HMP, because right now you can delete
>> snapshots with their ID, and after this series you cannot.
> 
> Can there be snapshots that can't be identified by a snapshot name, but
> only by their ID?

I don't know, but what I meant is that if you have scripts to do all
this, you might have to adjust them with this change.

>> I think we had a short discussion about just disallowing numeric
>> snapshot names.  How bad would that be?
> 
> It would be incompatible with existing images and result in a more
> complex snapshot identifier resolution. Why would it be any better?

It wouldn't be incompatible with existing images if we'd just disallow
creating such snapshots.  I don't see how the identifier resolution
would be more complex.

I don't know if it'd be better.  I'd just find it simpler (for us, that
is -- for users, I'm not sure).

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH] iotests: Make 234 stable

2019-01-09 Thread Kevin Wolf

Am 21.12.2018 um 19:42 hat Max Reitz geschrieben:
> This test waits for a MIGRATION event with status=completed on the
> source VM before querying the migration status on both source and
> destination.  However, just because the source says migration has
> completed does not mean the destination thinks the same.  Therefore, in
> some cases, the destination VM may still report "active" instead of
> "completed" when asked for its migration status.
> 
> Fix this by enabling migration events on both VMs and waiting until both
> source and destination emit a status=completed MIGRATION event.
> 
> Signed-off-by: Max Reitz 

Thanks, applied to the block branch.

Kevin

Re: [Qemu-devel] [PATCH v3 4/5] ext4: disable map_sync for virtio pmem

2019-01-09 Thread Jan Kara

On Wed 09-01-19 19:26:05, Pankaj Gupta wrote:
> Virtio pmem provides asynchronous host page cache flush
> mechanism. We don't support 'MAP_SYNC' with virtio pmem 
> and ext4. 
> 
> Signed-off-by: Pankaj Gupta 
...
> @@ -371,6 +373,13 @@ static int ext4_file_mmap(struct file *file, struct 
> vm_area_struct *vma)
>   if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
>   return -EOPNOTSUPP;
>  
> + /* We don't support synchronous mappings with guest direct access
> +  * and virtio based host page cache flush mechanism.
> +  */
> + if (IS_DAX(file_inode(file)) && virtio_pmem_host_cache_enabled(dax_dev)
> + && (vma->vm_flags & VM_SYNC))
> + return -EOPNOTSUPP;
> +

Shouldn't there rather be some generic way of doing this? Having
virtio_pmem_host_cache_enabled() check in filesystem code just looks like
filesystem sniffing into details is should not care about... Maybe just
naming this (or having a wrapper) dax_dev_map_sync_supported()?

Honza
-- 
Jan Kara 
SUSE Labs, CR

Re: [Qemu-devel] [PATCH v15 5/6] acpi: add ACPI memory clear interface

2019-01-09 Thread Michael S. Tsirkin

On Wed, Jan 09, 2019 at 12:29:01PM +0400, Marc-André Lureau wrote:
> The interface is described in the "TCG Platform Reset Attack
> Mitigation Specification", chapter 6 "ACPI _DSM Function". According
> to Laszlo, it's not so easy to implement in OVMF, he suggested to do
> it in qemu instead.
> 
> See specification documentation for more details, and next commit for
> memory clear on reset handling.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  hw/acpi/tpm.c  | 48 ++
>  docs/specs/tpm.txt |  2 ++
>  2 files changed, 50 insertions(+)
> 
> diff --git a/hw/acpi/tpm.c b/hw/acpi/tpm.c
> index fee9490306..61da1de97b 100644
> --- a/hw/acpi/tpm.c
> +++ b/hw/acpi/tpm.c
> @@ -53,6 +53,16 @@ void tpm_build_ppi_acpi(TPMIf *tpm, Aml *dev)
>  pprq = aml_name("PPRQ");
>  pprm = aml_name("PPRM");
>  
> +aml_append(dev,
> +   aml_operation_region(
> +   "TPP3", AML_SYSTEM_MEMORY,
> +   aml_int(TPM_PPI_ADDR_BASE +
> +   0x15a /* movv, docs/specs/tpm.txt */),
> +   0x1
> ));
> +field = aml_field("TPP3", AML_BYTE_ACC, AML_NOLOCK, AML_PRESERVE);
> +aml_append(field, aml_named_field("MOVV", 8));
> +aml_append(dev, field);
> +
>  /*
>   * DerefOf in Windows is broken with SYSTEM_MEMORY.  Use a dynamic
>   * operation region inside of a method for getting FUNC[op].
> @@ -395,6 +405,44 @@ void tpm_build_ppi_acpi(TPMIf *tpm, Aml *dev)
>  aml_append(ifctx, aml_return(aml_buffer(1, zerobyte)));
>  }
>  aml_append(method, ifctx);
> +
> +ifctx = aml_if(
> +aml_equal(uuid,
> +  aml_touuid("376054ED-CC13-4675-901C-4756D7F2D45D")));
> +{
> +/* standard DSM query function */

refer to spec chapter here?

> +ifctx2 = aml_if(aml_equal(function, zero));
> +{
> +uint8_t byte_list[1] = { 0x03 };

and explain 3 above?

> +aml_append(ifctx2, aml_return(aml_buffer(1

1 -> sizeof(byte_list)?

> + , byte_list)));
> +}
> +aml_append(ifctx, ifctx2);
> +
> +/*
> + * TCG Platform Reset Attack Mitigation Specification 1.0 Ch.6
> + *
> + * Arg 2 (Integer): Function Index = 1
> + * Arg 3 (Package): Arguments = Package: Type: Integer
> + *  Operation Value of the Request
> + * Returns: Type: Integer
> + *  0: Success
> + *  1: General Failure
> + */
> +ifctx2 = aml_if(aml_equal(function, one));
> +{
> +aml_append(ifctx2,
> +   aml_store(aml_derefof(aml_index(arguments, zero)),
> + op));
> +{
> +aml_append(ifctx2, aml_store(op, aml_name("MOVV")));
> +
> +/* 0: success */
> +aml_append(ifctx2, aml_return(zero));
> +}
> +}
> +aml_append(ifctx, ifctx2);
> +}
> +aml_append(method, ifctx);
>  }
>  aml_append(dev, method);
>  }
> diff --git a/docs/specs/tpm.txt b/docs/specs/tpm.txt
> index 424d1511fc..5d8c26b1ad 100644
> --- a/docs/specs/tpm.txt
> +++ b/docs/specs/tpm.txt
> @@ -135,6 +135,8 @@ layout:
>   +--+++---+
>   | next_step|   0x1  |  0x159 | Operation to execute after reboot by  |
>   |  ||| firmware. Used by firmware.   |
> + +--+++---+
> + | movv |   0x1  |  0x15a | Memory overwrite variable |
>   +--+++---+
>  
> The following values are supported for the 'func' field. They correspond
> -- 
> 2.20.1.2.gb21ebb671b

[Qemu-devel] [PATCH v3 4/5] ext4: disable map_sync for virtio pmem

2019-01-09 Thread Pankaj Gupta

Virtio pmem provides asynchronous host page cache flush
mechanism. We don't support 'MAP_SYNC' with virtio pmem 
and ext4. 

Signed-off-by: Pankaj Gupta 
---
 fs/ext4/file.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 69d65d4..e54f48b 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -360,8 +360,10 @@ static const struct vm_operations_struct ext4_file_vm_ops 
= {
 static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
struct inode *inode = file->f_mapping->host;
+   struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+   struct dax_device *dax_dev = sbi->s_daxdev;
 
-   if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb
+   if (unlikely(ext4_forced_shutdown(sbi)))
return -EIO;
 
/*
@@ -371,6 +373,13 @@ static int ext4_file_mmap(struct file *file, struct 
vm_area_struct *vma)
if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
return -EOPNOTSUPP;
 
+   /* We don't support synchronous mappings with guest direct access
+* and virtio based host page cache flush mechanism.
+*/
+   if (IS_DAX(file_inode(file)) && virtio_pmem_host_cache_enabled(dax_dev)
+   && (vma->vm_flags & VM_SYNC))
+   return -EOPNOTSUPP;
+
file_accessed(file);
if (IS_DAX(file_inode(file))) {
vma->vm_ops = _dax_vm_ops;
-- 
2.9.3

Re: [Qemu-devel] [PATCH] json: Fix % handling when not interpolating

2019-01-09 Thread Daniel P . Berrangé

On Wed, Jan 09, 2019 at 03:32:47PM +0100, Markus Armbruster wrote:
> Max Reitz  writes:
> 
> > On 08.01.19 11:36, Markus Armbruster wrote:
> >> Copying block maintainers for help with assessing the bug's (non-)impact
> >> on security.
> >> 
> >> Christophe Fergeau  writes:
> >> 
> >>> On Mon, Jan 07, 2019 at 04:47:44PM +0100, Markus Armbruster wrote:
>  Eric Blake  writes:
> 
> > On 1/2/19 12:01 PM, Christophe Fergeau wrote:
> >> Adding Markus to cc: list, I forgot to do it when sending the patch.
> >
> > Also worth backporting via qemu-stable, now in cc.
> >
> >>
> >> Christophe
> >>
> >> On Wed, Jan 02, 2019 at 03:05:35PM +0100, Christophe Fergeau wrote:
> >>> commit 8bca4613 added support for %% in json strings when 
> >>> interpolating,
> >>> but in doing so, this broke handling of % when not interpolating as 
> >>> the
> >>> '%' is skipped in both cases.
> >>> This commit ensures we only try to handle %% when interpolating.
> 
>  Impact?
> 
>  If you're unable to assess, could you give us at least a reproducer?
> >>>
> >>> This all came from 
> >>> https://lists.freedesktop.org/archives/spice-devel/2018-December/046644.html
> >>> Setting up a VM with libvirt with  >>> passwd='password%'/>
> >>> fails to start with:
> >>>   qemu-system-x86_64: qobject/json-parser.c:146: parse_string: Assertion 
> >>> `*ptr' failed.
> >>>
> >>> If you use 'password%%' as the password instead, when trying to connect
> >>> to the VM, you type 'password%' as the password instead of 'password%%'
> >>> as configured in the domain XML.
> >> 
> >> Thanks.
> >> 
> >> As the commit message says, the bug bites when we parse a string
> >> containing '%s' with !ctxt->ap.  The parser then swallows a character.
> >> If it swallows the terminating '"', it fails the assertion.
> >> 
> >> We parse with !ctxt->ap in the following cases:
> >> 
> >> * Tests (tests/check-qjson.c, tests/test-qobject-input-visitor.c,
> >>   tests/test-visitor-serialization.c)
> >> 
> >>   Plenty of tests, but we still failed to cover the buggy case :(
> >> 
> >> * QMP input (monitor.c)
> >> 
> >> * QGA input (qga/main.c)
> >> 
> >> * qobject_from_json()
> >> 
> >>   - JSON pseudo-filenames (block.c)
> >> 
> >> These are pseudo-filenames starting with "json:".
> >> 
> >>   - JSON key pairs (block/rbd.c)
> >> 
> >> As far as I can tell, these can come only from pseudo-filenames
> >> starting with "rbd:".
> >> 
> >>   - JSON command line option arguments of -display and -blockdev
> >> (qobject-input-visitor.c)
> >> 
> >> Reproducer: -blockdev '{"%"}'
> >> 
> >> Command line, QMP and QGA input are trusted.
> >> 
> >> Filenames are trusted when they come from command line, QMP or HMP.
> >> They are untrusted when they come from from image file headers.
> >> Example: QCOW2 backing file name.  Note that this is *not* the security
> >> boundary between host and guest.  It's the boundary between host and an
> >> image file from an untrusted source.
> >> 
> >> I can't see how the bug could be exploited.  Neither failing an
> >> assertion nor skipping a character in a filename of your choice is
> >> interesting.  We don't support compiling with NDEBUG.
> >> 
> >> Kevin, Max, do you agree?
> >
> > I wouldn't call it "not interesting" if adding an image to your VM at
> > runtime can crash the whole thing.
> >
> > (qemu-img create -f qcow2 -u -b 'json:{"%"}' foo.qcow2 64M)
> 
> "Not interesting" strictly from the point of view of exploiting the bug
> to penetrate trust boundaries.
> 
> > Whether this is a security issue...  I don't know, but it is a DoS.
> 
> I'm not sure whether feeding untrusted images to QEMU is a good idea in
> general --- there's so much that could go wrong.  How hardened against
> abuse are out block drivers?
>
> I figure what distinguishes this case is how utterly trivial creating a
> "bad" image is.

Consider that you can already create a qcow2 image with a backing file
of /etc/shadow. Or create a qcow2 image many EB in size that causes QEMU 
to allocate massive amounts of RAM and/or burn CPU time, and so on. 

IOW, mgmt apps should never pass untrusted images to QEMU.  Crashing is
just one of many bad things, and probably not the worst that can happen.

They need to do validation upfront in some manner if receiving an 
untrustworthy image. Openstack does this by running qemu-img, with 
limits set on virutal memory size, CPU time, and then rejecting any 
image with a backing file from being used at all.

> Anyway, you are the block layer maintainers, so you get to decide
> whether to give this the full security bug treatment.  I'm merely the
> clown who broke it %-/

Accepting an image with any backing file at all from an untrusted user
would be a flaw in the layered management app itself, not QEMU.

So I think it would only be considered a security bug in QEMU if there was 
a way for an unprivileged user to trick QEMU into writing malformed

Re: [Qemu-devel] [PATCH v2] configure: Force the C standard to gnu99

2019-01-09 Thread Daniel P . Berrangé

On Wed, Jan 09, 2019 at 03:51:33PM +0100, Thomas Huth wrote:
> On 2019-01-09 15:31, Daniel P. Berrangé wrote:
> > On Wed, Jan 09, 2019 at 03:24:29PM +0100, Thomas Huth wrote:
> >> Different versions of GCC and Clang use different versions of the C 
> >> standard.
> >> This repeatedly caused problems already, e.g. with duplicated typedefs:
> >>
> >>  https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg05829.html
> >>
> >> or with for-loop variable initializers:
> >>
> >>  https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg00237.html
> >>
> >> To avoid these problems, we should enforce the C language version to the
> >> same level for all compilers. Since our minimum compiler versions is
> >> GCC v4.8, our best option is "gnu99" right now ("gnu17" is not available
> >> there yet, and "gnu11" is marked as "experimental").
> >>
> >> Signed-off-by: Thomas Huth 
> >> ---
> >>  v2: Use gnu99 instead of gnu11
> >>
> >>  configure | 5 +++--
> >>  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > Reviewed-by: Daniel P. Berrangé 
> > 
> >> diff --git a/configure b/configure
> >> index b9f34af..721ade7 100755
> >> --- a/configure
> >> +++ b/configure
> >> @@ -105,7 +105,8 @@ update_cxxflags() {
> >>  for arg in $QEMU_CFLAGS; do
> >>  case $arg in
> >>  -Wstrict-prototypes|-Wmissing-prototypes|-Wnested-externs|\
> >> -
> >> -Wold-style-declaration|-Wold-style-definition|-Wredundant-decls)
> >> +
> >> -Wold-style-declaration|-Wold-style-definition|-Wredundant-decls|\
> >> +-std=gnu99)
> > 
> > IIUC this is to  drop  -std=gnu99 from CXXFLAGS, so C++ code (only the
> > guest-agent on Win32 IIUC) will use the compiler default still.
> 
> We also use it for the capstone disassembler and disas/libvixl.
> 
> > We could consider also setting a suitable -std for CXXFLAGS too in future
> > though...
> 
> Shall I send a v3 with "-std=gnu++98" (which seems to be the only usable
> option right now), or shall we wait until we really hit a problem with
> the C++ code?

I'd have a slight preference for adding a C++ suitable -std now
just to give us a more predictable build setup. I don't consider
it a blocker though, hence my R-b on this v2 patch as it is now.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH] json: Fix % handling when not interpolating

2019-01-09 Thread Max Reitz

On 09.01.19 15:49, Daniel P. Berrangé wrote:
> On Wed, Jan 09, 2019 at 03:32:47PM +0100, Markus Armbruster wrote:
>> Max Reitz  writes:
>>
>>> On 08.01.19 11:36, Markus Armbruster wrote:
 Copying block maintainers for help with assessing the bug's (non-)impact
 on security.

 Christophe Fergeau  writes:

> On Mon, Jan 07, 2019 at 04:47:44PM +0100, Markus Armbruster wrote:
>> Eric Blake  writes:
>>
>>> On 1/2/19 12:01 PM, Christophe Fergeau wrote:
 Adding Markus to cc: list, I forgot to do it when sending the patch.
>>>
>>> Also worth backporting via qemu-stable, now in cc.
>>>

 Christophe

 On Wed, Jan 02, 2019 at 03:05:35PM +0100, Christophe Fergeau wrote:
> commit 8bca4613 added support for %% in json strings when 
> interpolating,
> but in doing so, this broke handling of % when not interpolating as 
> the
> '%' is skipped in both cases.
> This commit ensures we only try to handle %% when interpolating.
>>
>> Impact?
>>
>> If you're unable to assess, could you give us at least a reproducer?
>
> This all came from 
> https://lists.freedesktop.org/archives/spice-devel/2018-December/046644.html
> Setting up a VM with libvirt with  passwd='password%'/>
> fails to start with:
>   qemu-system-x86_64: qobject/json-parser.c:146: parse_string: Assertion 
> `*ptr' failed.
>
> If you use 'password%%' as the password instead, when trying to connect
> to the VM, you type 'password%' as the password instead of 'password%%'
> as configured in the domain XML.

 Thanks.

 As the commit message says, the bug bites when we parse a string
 containing '%s' with !ctxt->ap.  The parser then swallows a character.
 If it swallows the terminating '"', it fails the assertion.

 We parse with !ctxt->ap in the following cases:

 * Tests (tests/check-qjson.c, tests/test-qobject-input-visitor.c,
   tests/test-visitor-serialization.c)

   Plenty of tests, but we still failed to cover the buggy case :(

 * QMP input (monitor.c)

 * QGA input (qga/main.c)

 * qobject_from_json()

   - JSON pseudo-filenames (block.c)

 These are pseudo-filenames starting with "json:".

   - JSON key pairs (block/rbd.c)

 As far as I can tell, these can come only from pseudo-filenames
 starting with "rbd:".

   - JSON command line option arguments of -display and -blockdev
 (qobject-input-visitor.c)

 Reproducer: -blockdev '{"%"}'

 Command line, QMP and QGA input are trusted.

 Filenames are trusted when they come from command line, QMP or HMP.
 They are untrusted when they come from from image file headers.
 Example: QCOW2 backing file name.  Note that this is *not* the security
 boundary between host and guest.  It's the boundary between host and an
 image file from an untrusted source.

 I can't see how the bug could be exploited.  Neither failing an
 assertion nor skipping a character in a filename of your choice is
 interesting.  We don't support compiling with NDEBUG.

 Kevin, Max, do you agree?
>>>
>>> I wouldn't call it "not interesting" if adding an image to your VM at
>>> runtime can crash the whole thing.
>>>
>>> (qemu-img create -f qcow2 -u -b 'json:{"%"}' foo.qcow2 64M)
>>
>> "Not interesting" strictly from the point of view of exploiting the bug
>> to penetrate trust boundaries.
>>
>>> Whether this is a security issue...  I don't know, but it is a DoS.
>>
>> I'm not sure whether feeding untrusted images to QEMU is a good idea in
>> general --- there's so much that could go wrong.  How hardened against
>> abuse are out block drivers?
>>
>> I figure what distinguishes this case is how utterly trivial creating a
>> "bad" image is.
> 
> Consider that you can already create a qcow2 image with a backing file
> of /etc/shadow.

If you cannot access this file, then it should just be an error and not
crash qemu.

If you can access this file, that's your own fault for bad permissions.

> Or create a qcow2 image many EB in size that causes QEMU 
> to allocate massive amounts of RAM and/or burn CPU time, and so on.

That would be the qcow2 driver's fault.  We do try to open only images
which are sane.

And memory allocation failures should be handled gracefully, so the VM
shouldn't crash.  Well, at least qcow2 does its best, what Linux makes
of it, who knows.  (e.g. it may assign the memory to qemu and then the
OOM killer may crash it later)

> IOW, mgmt apps should never pass untrusted images to QEMU.  Crashing is
> just one of many bad things, and probably not the worst that can happen.
> 
> They need to do validation upfront in some manner if receiving an 
> untrustworthy image. Openstack does this by running qemu-img, with 
> limits set on

Re: [Qemu-devel] [PATCH v2] configure: Force the C standard to gnu99

2019-01-09 Thread Thomas Huth

On 2019-01-09 15:58, Daniel P. Berrangé wrote:
> On Wed, Jan 09, 2019 at 03:51:33PM +0100, Thomas Huth wrote:
>> On 2019-01-09 15:31, Daniel P. Berrangé wrote:
[...]
>>> We could consider also setting a suitable -std for CXXFLAGS too in future
>>> though...
>>
>> Shall I send a v3 with "-std=gnu++98" (which seems to be the only usable
>> option right now), or shall we wait until we really hit a problem with
>> the C++ code?
> 
> I'd have a slight preference for adding a C++ suitable -std now
> just to give us a more predictable build setup.
I think that's my preference, too. I'll send a v3...

 Thomas

Re: [Qemu-devel] [PATCH] block: Acquire the AioContext in guess_disk_lchs()

2019-01-09 Thread Kevin Wolf

Am 09.01.2019 um 15:28 hat Alberto Garcia geschrieben:
> This fixes the following crash:
> 
> { "execute": "blockdev-add",
>   "arguments": {"driver": "null-co", "node-name": "hd0"}}
> { "execute": "object-add",
>   "arguments": {"qom-type": "iothread", "id": "iothread0"}}
> { "execute": "x-blockdev-set-iothread",
>   "arguments": {"node-name": "hd0", "iothread": "iothread0"}}
> { "execute": "device_add",
>   "arguments": {"id": "virtio0", "driver": "virtio-blk-pci",
> "drive": "hd0"}}
> qemu: qemu_mutex_unlock_impl: Operation not permitted
> Aborted
> 
> Signed-off-by: Alberto Garcia 

This looks like the same thing that I talked about with Markus
yesterday. He asked me where to put the acquire/release pair. My answer
was that there is more than one way to do it, but I suspect that the
realize functions of the devices may call the block layer more than once
and putting the acquire/release there might be nicer. We would then
document blkconf_geometry() to assume that the AioContext is already
locked.

There are many calls for node management stuff where we never defined
the locking rules, but where we might reasonably assume that they only
affect things that are only ever accessed from the main thread. But
there are also things like blk_ioctl() in scsi_block_realize(), which
should most certainly run with the lock held.

What do you think?

(I also wondered whether virtio-blk is affected as well or whether it
moves the BlockBackend to the iothread AioContext only later. Your
reproducer answers this question. :-))

Kevin

Re: [Qemu-devel] [PATCH v2 0/3] HMP/snapshot changes - do not use ID anymore

2019-01-09 Thread Kevin Wolf

Am 09.01.2019 um 15:54 hat Max Reitz geschrieben:
> On 09.01.19 15:48, Kevin Wolf wrote:
> > Am 09.01.2019 um 15:27 hat Max Reitz geschrieben:
> >> On 09.01.19 15:21, Kevin Wolf wrote:
> >>> Am 09.01.2019 um 15:10 hat Max Reitz geschrieben:
>  On 06.09.18 13:11, Daniel Henrique Barboza wrote:
> > changes in v2:
> > - removed the "RFC" marker;
> > - added a new patch (patch 2) that removes
> > bdrv_snapshot_delete_by_id_or_name from the code;
> > - made changes in patch 1 as suggested by Murilo;
> > - previous patch set link:
> > https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
> >
> >
> > It is not uncommon to see bugs being opened by testers that attempt to
> > create VM snapshots using HMP. It turns out that "0" and "1" are quite
> > common snapshot names and they trigger a lot of bugs. I gave an example
> > in the commit message of patch 1, but to sum up here: QEMU treats the
> > input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
> > is documented as such, but this can lead to strange situations.
> >
> > Given that it is strange for an API to consider a parameter to be 2 
> > fields
> > at the same time, and inadvently treating them as one or the other, and
> > that removing the ID field is too drastic, my idea here is to keep the
> > ID field for internal control, but do not let the user set it.
> >
> > I guess there's room for discussion about considering this change an API
> > change or not. It doesn't affect users of HMP and it doesn't affect 
> > Libvirt,
> > but simplifying the meaning of the parameters of savevm/loadvm/delvm.
> 
>  (Yes, very late reply, I'm sorry...)
> 
>  I do think it affects users of HMP, because right now you can delete
>  snapshots with their ID, and after this series you cannot.
> >>>
> >>> Can there be snapshots that can't be identified by a snapshot name, but
> >>> only by their ID?
> >>
> >> I don't know, but what I meant is that if you have scripts to do all
> >> this, you might have to adjust them with this change.
> > 
> > That's what the H in HMP means.
> > 
>  I think we had a short discussion about just disallowing numeric
>  snapshot names.  How bad would that be?
> >>>
> >>> It would be incompatible with existing images and result in a more
> >>> complex snapshot identifier resolution. Why would it be any better?
> >>
> >> It wouldn't be incompatible with existing images if we'd just disallow
> >> creating such snapshots.  I don't see how the identifier resolution
> >> would be more complex.
> >>
> >> I don't know if it'd be better.  I'd just find it simpler (for us, that
> >> is -- for users, I'm not sure).
> > 
> > Identifier resolution A:
> > - Find a snapshot that has the given identifier as a name
> > - If no such snapshot exists, it is an error
> > 
> > Identifier resolution B:
> > - If the identifier is a number, find a snapshot that has the given
> >   identifier as its ID
> > - If the identifier is not a number, find a snapshot that has the given
> >   identifier as a name
> > - If no such snapshot exists, it is an error
> 
> No, my idea was to keep the resolution the same as it is; just to forbid
> creating new snapshots with numeric names.  This would prevent users
> from getting into the whole situation.

That's the version with an even more complex resolution method C. :-)

I actually think the current behaviour is more confusing than helpful.
Without looking into the code or trying it out, I couldn't even tell
whether ID or name takes precedence if there is a matching snapshot for
both. Considering your proposal, it's probably the ID, but how should a
user know that? (If against all expectations documentation exists, it
doesn't count because nobody reads that.)

Kevin


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH for-4.0 v9 13/16] qemu_thread: supplement error handling for migration

2019-01-09 Thread Markus Armbruster

Fei Li  writes:

> Update qemu_thread_create()'s callers by
> - setting an error on qemu_thread_create() failure for callers that
>   set an error on failure;
> - reporting the error and returning failure for callers that return
>   an error code on failure;
> - reporting the error and setting some state for callers that just
>   report errors and choose not to continue on.
>
> Cc: Markus Armbruster 
> Cc: Dr. David Alan Gilbert 
> Cc: Peter Xu 
> Signed-off-by: Fei Li 
[...]
> diff --git a/migration/ram.c b/migration/ram.c
> index eed1daf302..1e24a78eaa 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
[...]
> @@ -3625,6 +3637,7 @@ static void compress_threads_load_cleanup(void)
>  static int compress_threads_load_setup(QEMUFile *f)
>  {
>  int i, thread_count;
> +Error *local_err = NULL;
>  
>  if (!migrate_use_compression()) {
>  return 0;
> @@ -3646,10 +3659,13 @@ static int compress_threads_load_setup(QEMUFile *f)
>  qemu_cond_init(_param[i].cond);
>  decomp_param[i].done = true;
>  decomp_param[i].quit = false;
> -/* TODO: let the further caller handle the error instead of abort() 
> */
> -qemu_thread_create(decompress_threads + i, "decompress",
> -   do_data_decompress, decomp_param + i,
> -   QEMU_THREAD_JOINABLE, _abort);
> +if (!qemu_thread_create(decompress_threads + i, "decompress",
> +do_data_decompress, decomp_param + i,
> +QEMU_THREAD_JOINABLE, _err)) {
> +error_reportf_err(local_err,
> +  "failed to create do_data_decompress: ");
> +goto exit;

Broken error handling, see my review of PATCH 16.

> +}
>  }
>  return 0;
>  exit:
[...]

Re: [Qemu-devel] [PATCH for-4.0 v9 16/16] qemu_thread_join: fix segmentation fault

2019-01-09 Thread Markus Armbruster

Fei Li  writes:

> 在 2019/1/9 上午1:29, Markus Armbruster 写道:
>> fei  writes:
>>
 在 2019年1月8日，01:55，Markus Armbruster  写道：

 Fei Li  writes:

> To avoid the segmentation fault in qemu_thread_join(), just directly
> return when the QemuThread *thread failed to be created in either
> qemu-thread-posix.c or qemu-thread-win32.c.
>
> Cc: Stefan Weil 
> Signed-off-by: Fei Li 
> Reviewed-by: Fam Zheng 
> ---
> util/qemu-thread-posix.c | 3 +++
> util/qemu-thread-win32.c | 2 +-
> 2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
> index 39834b0551..3548935dac 100644
> --- a/util/qemu-thread-posix.c
> +++ b/util/qemu-thread-posix.c
> @@ -571,6 +571,9 @@ void *qemu_thread_join(QemuThread *thread)
>  int err;
>  void *ret;
>
> +if (!thread->thread) {
> +return NULL;
> +}
 How can this happen?
>>> I think I have answered this earlier, please check the following link to 
>>> see whether it helps:
>>> http://lists.nongnu.org/archive/html/qemu-devel/2018-11/msg06554.html
>> Thanks for the pointer.  Unfortunately, I don't understand your
>> explanation.  You also wrote there "I will remove this patch in next
>> version"; looks like you've since changed your mind.
> Emm, issues left over from history.. The background is I was hurry to
> make those five
> Reviewed-by patches be merged, including this v9 16/16 patch but not
> the real
> qemu_thread_create() modification. But actually this patch is to fix
> the segmentation
> fault after we modified qemu_thread_create() related functions
> although it has got a
> Reviewed-by earlier. :) Thus to not make troube, I wrote the
> "remove..." sentence
> to separate it from those 5 Reviewed-by patches, and were plan to send
> only four patches.
> But later I got a message that these five patches are not that urgent
> to catch qemu v3.1,
> thus I joined the earlier 5 R-b patches into the later v8 & v9 to have
> a better review.
>
> Sorry for the trouble, I need to explain it without involving too much
> background..
>
> Back at the farm: in our current qemu code, some cleanups use a loop
> to join()
> the total number of threads if caller fails. This is not a problem
> until applying the
> qemu_thread_create() modification. E.g. when compress_threads_save_setup()
> fails while trying to create the last do_data_compress thread,
> segmentation fault
> will occur when join() is called (sadly there's not enough condition
> to filter this
> unsuccessful created thread) as this thread is actually not be created.
>
> Hope the above makes it clear. :)

Alright, let's have a look at compress_threads_save_setup():

static int compress_threads_save_setup(void)
{
int i, thread_count;

if (!migrate_use_compression()) {
return 0;
}
thread_count = migrate_compress_threads();
compress_threads = g_new0(QemuThread, thread_count);
comp_param = g_new0(CompressParam, thread_count);
qemu_cond_init(_done_cond);
qemu_mutex_init(_done_lock);
for (i = 0; i < thread_count; i++) {
comp_param[i].originbuf = g_try_malloc(TARGET_PAGE_SIZE);
if (!comp_param[i].originbuf) {
goto exit;
}

if (deflateInit(_param[i].stream,
migrate_compress_level()) != Z_OK) {
g_free(comp_param[i].originbuf);
goto exit;
}

/* comp_param[i].file is just used as a dummy buffer to save data,
 * set its ops to empty.
 */
comp_param[i].file = qemu_fopen_ops(NULL, _ops);
comp_param[i].done = true;
comp_param[i].quit = false;
qemu_mutex_init(_param[i].mutex);
qemu_cond_init(_param[i].cond);
qemu_thread_create(compress_threads + i, "compress",
   do_data_compress, comp_param + i,
   QEMU_THREAD_JOINABLE);
}
return 0;

exit:
compress_threads_save_cleanup();
return -1;
}

At label exit, we have @i threads, all fully initialized.  That's an
invariant.

compress_threads_save_cleanup() finds the threads to clean up by
checking comp_param[i].file:

static void compress_threads_save_cleanup(void)
{
int i, thread_count;

if (!migrate_use_compression() || !comp_param) {
return;
}

thread_count = migrate_compress_threads();
for (i = 0; i < thread_count; i++) {
/*
 * we use it as a indicator which shows if the thread is
 * properly init'd or not
 */
--->if (!comp_param[i].file) {
--->break;
--->}

qemu_mutex_lock(_param[i].mutex);
comp_param[i].quit = true;

Re: [Qemu-devel] [PATCH] throttle-groups: fix restart coroutine iothread race

2019-01-09 Thread Alberto Garcia

On Wed 09 Jan 2019 12:01:44 PM CET, Stefan Hajnoczi wrote:
> The following QMP command leads to a crash when iothreads are used:
>
>   { 'execute': 'device_del', 'arguments': {'id': 'data'} }

How did you reproduce this? Do you have a test case?

Berto

Re: [Qemu-devel] [PATCH v3] qdev/core: fix qbus_is_full()

2019-01-09 Thread Tony Krowiak


On 1/9/19 5:14 AM, Cornelia Huck wrote:

On Tue, 8 Jan 2019 15:34:37 -0500
Tony Krowiak  wrote:


On 1/8/19 12:06 PM, Cornelia Huck wrote:

On Tue, 8 Jan 2019 17:50:21 +0100
Halil Pasic  wrote:
   

On Tue, 8 Jan 2019 17:31:13 +0100
Cornelia Huck  wrote:
  

On Tue, 8 Jan 2019 11:08:56 -0500
Tony Krowiak  wrote:
  

On 12/17/18 10:57 AM, Tony Krowiak wrote:

The qbus_is_full(BusState *bus) function (qdev_monitor.c) compares the max_index
value of the BusState structure with the max_dev value of the BusClass structure
to determine whether the maximum number of children has been reached for the
bus. The problem is, the max_index field of the BusState structure does not
necessarily reflect the number of devices that have been plugged into
the bus.

Whenever a child device is plugged into the bus, the bus's max_index value is
assigned to the child device and then incremented. If the child is subsequently
unplugged, the value of the max_index does not change and no longer reflects the
number of children.

When the bus's max_index value reaches the maximum number of devices
allowed for the bus (i.e., the max_dev field in the BusClass structure),
attempts to plug another device will be rejected claiming that the bus is
full -- even if the bus is actually empty.

To resolve the problem, a new 'num_children' field is being added to the
BusState structure to keep track of the number of children plugged into the
bus. It will be incremented when a child is plugged, and decremented when a
child is unplugged.

Signed-off-by: Tony Krowiak 
Reviewed-by: Pierre Morel
Reviewed-by: Halil Pasic 
---
hw/core/qdev.c | 3 +++
include/hw/qdev-core.h | 1 +
qdev-monitor.c | 2 +-
3 files changed, 5 insertions(+), 1 deletion(-)


Ping ...
I could not determine who the maintainer is for the three files
listed above. I checked the MAINTAINERS file and the prologue of each
individual file. Can someone please tell me who is responsible
for merging these changes? Any additional review comments?
  


diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 6b3cc55b27c2..956923f33520 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -59,6 +59,8 @@ static void bus_remove_child(BusState *bus, DeviceState 
*child)
snprintf(name, sizeof(name), "child[%d]", kid->index);
QTAILQ_REMOVE(>children, kid, sibling);

+bus->num_children--;

+
/* This gives back ownership of kid->child back to us.  */
object_property_del(OBJECT(bus), name, NULL);
object_unref(OBJECT(kid->child));
@@ -73,6 +75,7 @@ static void bus_add_child(BusState *bus, DeviceState *child)
char name[32];
BusChild *kid = g_malloc0(sizeof(*kid));

+bus->num_children++;

kid->index = bus->max_index++;


Hm... I'm wondering what happens for insane numbers of hotplugging
operations here?

(Preexisting problem for busses without limit, but busses with a limit
could now run into that as well.)
  


How does this patch change things? I mean bus->num_children gets
decremented on unplug.


We don't stop anymore if max_index >= max_dev, which means that we can
now trigger that even if max_dev != 0.


I guess I am missing your point. If max_dev == 0, then there is nothing
stopping an insane number of hot plug operations; either before this
patch, or with this patch. With the patch, once the number of children
hot plugged reaches max_dev, the qbus_is_full function will return false
and no more children can be plugged. If a child device is unplugged,
the num_children - which counts the number of children attached to the
bus - will be decremented, so it always reflects the number of children
added to the bus. Besides, checking max_index against max_dev
is erroneous, because max_index is incremented every time a child device
is plugged and is never decremented. It really operates as more of a
uinique identifier than a counter and is also used to create a unique
property name when the child device is linked to the bus as a property
(see bus_add_child function in hw/core/qdev.c).


Checking num_children against max_dev is the right thing to do, no
argument here.

However, max_index continues to be incremented even if devices have
been unplugged again. That means it can overflow, as it is never bound
by the max_dev constraint.

This has been a problem before for busses with an unrestricted number of
devices before, but with your patch, the problem is now triggerable for
all busses.

Not a problem with your patch, but we might want to look into making
max_index overflow/wraparound save.


I see your point. It does beg the question, what are the chances that
max_index reaches INT_MAX? In the interest of making this code more
bullet proof, I suppose it is something that should be dealt with.

A search reveals that max_index is used in only two places: It is used
to set the index for a child of the bus (i.e., bus_add_child function in

[Qemu-devel] [PATCH v3 5/5] xfs: disable map_sync for virtio pmem

2019-01-09 Thread Pankaj Gupta

Virtio pmem provides asynchronous host page cache flush
mechanism. we don't support 'MAP_SYNC' with virtio pmem 
and xfs.

Signed-off-by: Pankaj Gupta 
---
 fs/xfs/xfs_file.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e474250..eae4aa4 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1190,6 +1190,14 @@ xfs_file_mmap(
if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
return -EOPNOTSUPP;
 
+   /* We don't support synchronous mappings with guest direct access
+* and virtio based host page cache mechanism.
+*/
+   if (IS_DAX(file_inode(filp)) && virtio_pmem_host_cache_enabled(
+   xfs_find_daxdev_for_inode(file_inode(filp))) &&
+   (vma->vm_flags & VM_SYNC))
+   return -EOPNOTSUPP;
+
file_accessed(filp);
vma->vm_ops = _file_vm_ops;
if (IS_DAX(file_inode(filp)))
-- 
2.9.3

[Qemu-devel] [PATCH v3 1/5] libnvdimm: nd_region flush callback support

2019-01-09 Thread Pankaj Gupta

This patch adds functionality to perform flush from guest
to host over VIRTIO. We are registering a callback based
on 'nd_region' type. virtio_pmem driver requires this special
flush function. For rest of the region types we are registering
existing flush function. Report error returned by host fsync
failure to userspace.

This also handles asynchronous flush requests from the block layer
by creating a child bio and chaining it with parent bio.

Signed-off-by: Pankaj Gupta 
---
 drivers/acpi/nfit/core.c |  4 ++--
 drivers/nvdimm/claim.c   |  6 --
 drivers/nvdimm/nd.h  |  1 +
 drivers/nvdimm/pmem.c| 12 
 drivers/nvdimm/region_devs.c | 38 --
 include/linux/libnvdimm.h|  5 -
 6 files changed, 55 insertions(+), 11 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index b072cfc..f154852 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2234,7 +2234,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, 
unsigned int bw,
offset = to_interleave_offset(offset, mmio);
 
writeq(cmd, mmio->addr.base + offset);
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL, false);
 
if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
readq(mmio->addr.base + offset);
@@ -2283,7 +2283,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk 
*nfit_blk,
}
 
if (rw)
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL, false);
 
rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
return rc;
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index fb667bf..a1dfa06 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -263,7 +263,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
struct nd_namespace_io *nsio = to_nd_namespace_io(>dev);
unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 512);
sector_t sector = offset >> 9;
-   int rc = 0;
+   int rc = 0, ret = 0;
 
if (unlikely(!size))
return 0;
@@ -301,7 +301,9 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
}
 
memcpy_flushcache(nsio->addr + offset, buf, size);
-   nvdimm_flush(to_nd_region(ndns->dev.parent));
+   ret = nvdimm_flush(to_nd_region(ndns->dev.parent), NULL, false);
+   if (ret)
+   rc = ret;
 
return rc;
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 98317e7..d53a2d1 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -160,6 +160,7 @@ struct nd_region {
struct nd_interleave_set *nd_set;
struct nd_percpu_lane __percpu *lane;
struct nd_mapping mapping[0];
+   int (*flush)(struct nd_region *nd_region);
 };
 
 struct nd_blk_region {
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 6071e29..5d6a4a1 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -192,6 +192,7 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, 
struct page *page,
 
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
+   int ret = 0;
blk_status_t rc = 0;
bool do_acct;
unsigned long start;
@@ -201,7 +202,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
struct nd_region *nd_region = to_region(pmem);
 
if (bio->bi_opf & REQ_PREFLUSH)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio, true);
 
do_acct = nd_iostat_start(bio, );
bio_for_each_segment(bvec, bio, iter) {
@@ -216,7 +217,10 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
nd_iostat_end(bio, start);
 
if (bio->bi_opf & REQ_FUA)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio, true);
+
+   if (ret)
+   bio->bi_status = errno_to_blk_status(ret);
 
bio_endio(bio);
return BLK_QC_T_NONE;
@@ -528,14 +532,14 @@ static int nd_pmem_remove(struct device *dev)
sysfs_put(pmem->bb_state);
pmem->bb_state = NULL;
}
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL, false);
 
return 0;
 }
 
 static void nd_pmem_shutdown(struct device *dev)
 {
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL, false);
 }
 
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index fa37afc..5508727 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -290,7 +290,9 @@ static ssize_t deep_flush_store(struct device *dev, struct 
device_attribute *att
return rc;
if

[Qemu-devel] [PATCH] hw/ppc/spapr: Encode the SCSI channel (bus) in the SRP LUNs

2019-01-09 Thread Thomas Huth

In hw/scsi/spapr_vio.c we declare that the controller supports multiple
buses by specifying "max_channel = 7" there. So in the code that fixes
up the device tree nodes, we must encode the channel number (a.k.a. bus
number in the "Logical unit addressing format" table of SAM5) into the
64-bit LUN, too.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1663160
Signed-off-by: Thomas Huth 
---
 hw/ppc/spapr.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 5671608..1f49489 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2905,10 +2905,11 @@ static char *spapr_get_fw_dev_path(FWPathProvider *p, 
BusState *bus,
 if (spapr) {
 /*
  * Replace "channel@0/disk@0,0" with "disk@8000":
- * We use SRP luns of the form 8000 | (bus << 8) | (id << 5) | lun
- * in the top 16 bits of the 64-bit LUN
+ * In the top 16 bits of the 64-bit LUN, we use SRP luns of the 
form
+ * 0x8000 | (target << 8) | (bus << 5) | lun
+ * (see the "Logical unit addressing format" table in SAM5)
  */
-unsigned id = 0x8000 | (d->id << 8) | d->lun;
+unsigned id = 0x8000 | (d->id << 8) | (d->channel << 5) | d->lun;
 return g_strdup_printf("%s@%"PRIX64, qdev_fw_name(dev),
(uint64_t)id << 48);
 } else if (virtio) {
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH v2 0/3] HMP/snapshot changes - do not use ID anymore

2019-01-09 Thread Max Reitz

On 06.09.18 13:11, Daniel Henrique Barboza wrote:
> changes in v2:
> - removed the "RFC" marker;
> - added a new patch (patch 2) that removes
> bdrv_snapshot_delete_by_id_or_name from the code;
> - made changes in patch 1 as suggested by Murilo;
> - previous patch set link:
> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
> 
> 
> It is not uncommon to see bugs being opened by testers that attempt to
> create VM snapshots using HMP. It turns out that "0" and "1" are quite
> common snapshot names and they trigger a lot of bugs. I gave an example
> in the commit message of patch 1, but to sum up here: QEMU treats the
> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
> is documented as such, but this can lead to strange situations.
> 
> Given that it is strange for an API to consider a parameter to be 2 fields
> at the same time, and inadvently treating them as one or the other, and
> that removing the ID field is too drastic, my idea here is to keep the
> ID field for internal control, but do not let the user set it.
> 
> I guess there's room for discussion about considering this change an API
> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt,
> but simplifying the meaning of the parameters of savevm/loadvm/delvm.

(Yes, very late reply, I'm sorry...)

I do think it affects users of HMP, because right now you can delete
snapshots with their ID, and after this series you cannot.

I think we had a short discussion about just disallowing numeric
snapshot names.  How bad would that be?

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3] qemu-io: Add generic function for reinitializing optind.

2019-01-09 Thread Daniel P . Berrangé

On Wed, Jan 09, 2019 at 02:14:46PM +, Richard W.M. Jones wrote:
> On Wed, Jan 09, 2019 at 01:44:48PM +, Daniel P. Berrangé wrote:
> > On Wed, Jan 09, 2019 at 01:23:01PM +, Richard W.M. Jones wrote:
> > > How about this one?  Add a generic osdep function for reinitializing
> > > optind, which does optreset on FreeBSD (but is identical on all other
> > > OSes).  Use it from qemu-io and qemu-img.
> > > 
> > > I have tested this on Linux, FreeBSD and OpenBSD.
> > > 
> > > checkpatch complains:
> > > 
> > > WARNING: Block comments use a leading /* on a separate line
> > > #69: FILE: include/qemu/osdep.h:591:
> > > +/**
> > 
> > I think it just doesn't like your '/**' and wants '/*' instead.
> 
> The existing comments in the same file are a mix of three styles.
> I chose the style used closest to the new comment I was adding :-)

Yeah, pre-existing code is a mess often not passing current style
rules. I wish we'd clean up existing code, but failing that, it can
be valid to ignore style warnings to be more consistent with existing
code.

> 
> > > WARNING: architecture specific defines should be avoided
> > > #78: FILE: include/qemu/osdep.h:600:
> > > +#ifdef __FreeBSD__
> > 
> > Normally we'd suggest doing a configure test to for the platform
> > feature and then using a feature based ifdef test. In this case
> > though that would be difficult and/or overly complex.
> > 
> > This does make me wonder about the other *BSDs, OS-X and Mingw
> 
> OpenBSD is known fine with optind = 0.  I can't test OS-X.  I can test
> mingw (on Linux) later.
> 
> 
> > though ?  Should they all be using the #else codepath, or should
> > the other BSDs / OS-X use the __FreeBSD__ codepath.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH] configure: Force the C standard to gnu11

2019-01-09 Thread Markus Armbruster

Thomas Huth  writes:

> On 2019-01-09 14:10, Markus Armbruster wrote:
>> Thomas Huth  writes:
>> 
>>> On 2019-01-09 12:44, Daniel P. Berrangé wrote:
 On Wed, Jan 09, 2019 at 12:25:43PM +0100, Thomas Huth wrote:
> On 2019-01-09 11:58, Daniel P. Berrangé wrote:
>> On Mon, Jan 07, 2019 at 11:45:26AM +0100, Thomas Huth wrote:
>>> Different versions of GCC and Clang use different versions of the C 
>>> standard.
>>> This repeatedly caused problems already, e.g. with duplicated typedefs:
>>>
>>>  https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg05829.html
>>>
>>> or with for-loop variable initializers:
>>>
>>>  https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg00237.html
>>>
>>> To avoid these problems, we should enforce the C language version to the
>>> same level for all compilers. Since our minimum compiler versions are
>>> GCC v4.8 and Clang v3.4 now, and both basically support "gnu11" already,
>>> this seems to be a good choice.
>>
>> In 4.x   gnu11 is marked as experimental. I'm not really comfortable
>> using experimental features - even if its warning free there's a risk
>> it would silently mis-compile something.
>>
>> gnu99 is ok with 4.x - it is merely "incomplete".
>
> gnu11 has the big advantage that it also fixes the problem with
> duplicated typedefs that are reported by older versions of Clang.
>
> Are you sure about the experimental character in 4.x? I just looked at
> https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Standards.html and it says:
>
> "A fourth version of the C standard, known as C11, was published in 2011
> as ISO/IEC 9899:2011. GCC has limited incomplete support for parts of
> this standard, enabled with -std=c11 or -std=iso9899:2011."
>
> It does not say anything about "experimental" there. The word
> "experimental" is only used for the C++ support, but we hardly have C++
> code in QEMU -- if you worry about that, I could simply drop the
> "-std=gnu++11" part from my patch?

 I was looking at the "info gcc" docs on RHEL7, gcc-4.8.5-16.el7_4.1.x86_64:

   "3.4 Options Controlling C Dialect

snip...

  'gnu11'
  'gnu1x'
   GNU dialect of ISO C11.  Support is incomplete and
   experimental.  The name 'gnu1x' is deprecated."
>>>
>>> Ok. Looks like the "Support is incomplete and experimental" sentence has
>>> been removed with GCC 4.9.0 here. So GCC 4.8 is likely pretty close
>>> already. IMHO we could give it a try and enable gnu11 for QEMU with GCC
>>> v4.8, too. If we later find problems, we could still switch back to
>>> gnu99 instead. Other opinions?
>> 
>> Switchinh back could be somewhat painful if we already started using C11
>> features.  And if we don't plan to, then what exactly will -std=gnu11
>> buy us?
>
> With C11, we get safety for the "duplicated typedef" problem that we run
> into regularly again and again, see e.g.:
>
> https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg05829.html

That's a compilation failure.  "Support is experimental" makes me afraid
of run time failures.

If we truly want C11, shouldn't we bump minimum required GCC to 4.9?

Re: [Qemu-devel] [libvirt] Qemu migration with vhost-user-blk on top of local storage

2019-01-09 Thread Peter Krempa

On Wed, Jan 09, 2019 at 21:23:25 +0800, wuzhouhui wrote:
> > -Original Messages-
> > From: "Stefan Hajnoczi" 
> > Sent Time: 2019-01-09 20:42:58 (Wednesday)
> > To: wuzhouhui 
> > Cc: qemu-devel@nongnu.org, xieyon...@baidu.com, lili...@baidu.com, 
> > libvir-l...@redhat.com, s...@lists.01.org
> > Subject: Re: [Qemu-devel] Qemu migration with vhost-user-blk on top of 
> > local storage
> > 
> > On Wed, Jan 09, 2019 at 06:23:42PM +0800, wuzhouhui wrote:
> > > Hi everyone,
> > > 
> > > I'm working qemu with vhost target (e.g. spdk), and I attempt to migrate 
> > > VM with
> > > 2 local storages. One local storage is a regular file, e.g. 
> > > /tmp/c74.qcow2, and
> > > the other is a malloc bdev that spdk created. This malloc bdev will 
> > > exported to
> > > VM via vhost-user-blk. When I execute following command:
> > > 
> > >   virsh migrate --live --persistent --unsafe --undefinesource 
> > > --copy-storage-all \
> > >  --p2p --auto-converge --verbose --desturi qemu+tcp:///system vm0
> > > 
> > > The libvirt reports:
> > > 
> > >   qemu-2.12.1: error: internal error: unable to execute QEMU command \
> > > 'nbd-server-add': Cannot find device=drive-virtio-disk1 nor \
> > > node_name=drive-virtio-disk1
> > 
> > Please post your libvirt domain XML.
> 
> My libvirt is based on libvirt-1.1.1-29.el7, and add many patches to satisfy 
> our
> own needs, e.g. add support for vhost-user-blk. Post domain xml may not 
> useful.

So you added support for vhost-user-blk, but did not fix the
NBD migration code. You are on your own, sorry. We can't support
arbitrary downstream changes.

Please either upstream your patches or make sure to skip it for
vhost-user.


signature.asc
Description: PGP signature

[Qemu-devel] [PATCH v2] configure: Force the C standard to gnu99

2019-01-09 Thread Thomas Huth

Different versions of GCC and Clang use different versions of the C standard.
This repeatedly caused problems already, e.g. with duplicated typedefs:

 https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg05829.html

or with for-loop variable initializers:

 https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg00237.html

To avoid these problems, we should enforce the C language version to the
same level for all compilers. Since our minimum compiler versions is
GCC v4.8, our best option is "gnu99" right now ("gnu17" is not available
there yet, and "gnu11" is marked as "experimental").

Signed-off-by: Thomas Huth 
---
 v2: Use gnu99 instead of gnu11

 configure | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index b9f34af..721ade7 100755
--- a/configure
+++ b/configure
@@ -105,7 +105,8 @@ update_cxxflags() {
 for arg in $QEMU_CFLAGS; do
 case $arg in
 -Wstrict-prototypes|-Wmissing-prototypes|-Wnested-externs|\
--Wold-style-declaration|-Wold-style-definition|-Wredundant-decls)
+-Wold-style-declaration|-Wold-style-definition|-Wredundant-decls|\
+-std=gnu99)
 ;;
 *)
 QEMU_CXXFLAGS=${QEMU_CXXFLAGS:+$QEMU_CXXFLAGS }$arg
@@ -585,7 +586,7 @@ ARFLAGS="${ARFLAGS-rv}"
 # left shift of signed integers is well defined and has the expected
 # 2s-complement style results. (Both clang and gcc agree that it
 # provides these semantics.)
-QEMU_CFLAGS="-fno-strict-aliasing -fno-common -fwrapv $QEMU_CFLAGS"
+QEMU_CFLAGS="-fno-strict-aliasing -fno-common -fwrapv -std=gnu99 $QEMU_CFLAGS"
 QEMU_CFLAGS="-Wall -Wundef -Wwrite-strings -Wmissing-prototypes $QEMU_CFLAGS"
 QEMU_CFLAGS="-Wstrict-prototypes -Wredundant-decls $QEMU_CFLAGS"
 QEMU_CFLAGS="-D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE 
$QEMU_CFLAGS"
-- 
1.8.3.1

[Qemu-devel] [PATCH v2] spapr/vio: remove the "irq" property"

2019-01-09 Thread Cédric Le Goater

commit efe2add7cb7f ("spapr/vio: deprecate the "irq" property") was
merged in QEMU version 3.0. The "irq" property" can be removed for
QEMU version 4.0.

Signed-off-by: Cédric Le Goater 
Reviewed-by: Thomas Huth 
Reviewed-by: Greg Kurz 
---
 qemu-deprecated.texi |  6 --
 hw/ppc/spapr_vio.c   | 47 ++--
 2 files changed, 6 insertions(+), 47 deletions(-)

diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi
index c3735b698ee4..eb5cd2a72d41 100644
--- a/qemu-deprecated.texi
+++ b/qemu-deprecated.texi
@@ -168,9 +168,3 @@ Example of legacy encoding:
 The above, converted to the current supported format:
 
 @code{json:@{"file.driver":"rbd", "file.pool":"rbd", "file.image":"name"@}}
-
-@subsection vio-spapr-device device options
-
-@subsubsection "irq": "" (since 3.0.0)
-
-The ``irq'' property is obsoleted.
diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
index 7e8a9ad09337..414673d31341 100644
--- a/hw/ppc/spapr_vio.c
+++ b/hw/ppc/spapr_vio.c
@@ -44,38 +44,6 @@
 
 #define SPAPR_VIO_REG_BASE 0x7100
 
-static void spapr_vio_get_irq(Object *obj, Visitor *v, const char *name,
-  void *opaque, Error **errp)
-{
-Property *prop = opaque;
-uint32_t *ptr = qdev_get_prop_ptr(DEVICE(obj), prop);
-
-visit_type_uint32(v, name, ptr, errp);
-}
-
-static void spapr_vio_set_irq(Object *obj, Visitor *v, const char *name,
-  void *opaque, Error **errp)
-{
-Property *prop = opaque;
-uint32_t *ptr = qdev_get_prop_ptr(DEVICE(obj), prop);
-
-if (!qtest_enabled()) {
-warn_report(TYPE_VIO_SPAPR_DEVICE " '%s' property is deprecated", 
name);
-}
-visit_type_uint32(v, name, ptr, errp);
-}
-
-static const PropertyInfo spapr_vio_irq_propinfo = {
-.name = "irq",
-.get = spapr_vio_get_irq,
-.set = spapr_vio_set_irq,
-};
-
-static Property spapr_vio_props[] = {
-DEFINE_PROP("irq", VIOsPAPRDevice, irq, spapr_vio_irq_propinfo, uint32_t),
-DEFINE_PROP_END_OF_LIST(),
-};
-
 static char *spapr_vio_get_dev_name(DeviceState *qdev)
 {
 VIOsPAPRDevice *dev = VIO_SPAPR_DEVICE(qdev);
@@ -534,15 +502,13 @@ static void spapr_vio_busdev_realize(DeviceState *qdev, 
Error **errp)
 dev->qdev.id = id;
 }
 
-if (!dev->irq) {
-dev->irq = spapr_vio_reg_to_irq(dev->reg);
+dev->irq = spapr_vio_reg_to_irq(dev->reg);
 
-if (SPAPR_MACHINE_GET_CLASS(spapr)->legacy_irq_allocation) {
-dev->irq = spapr_irq_findone(spapr, _err);
-if (local_err) {
-error_propagate(errp, local_err);
-return;
-}
+if (SPAPR_MACHINE_GET_CLASS(spapr)->legacy_irq_allocation) {
+dev->irq = spapr_irq_findone(spapr, _err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
 }
 }
 
@@ -668,7 +634,6 @@ static void vio_spapr_device_class_init(ObjectClass *klass, 
void *data)
 k->realize = spapr_vio_busdev_realize;
 k->reset = spapr_vio_busdev_reset;
 k->bus_type = TYPE_SPAPR_VIO_BUS;
-k->props = spapr_vio_props;
 }
 
 static const TypeInfo spapr_vio_type_info = {
-- 
2.20.1

Re: [Qemu-devel] [PATCH v2] configure: Force the C standard to gnu99

2019-01-09 Thread Daniel P . Berrangé

On Wed, Jan 09, 2019 at 03:24:29PM +0100, Thomas Huth wrote:
> Different versions of GCC and Clang use different versions of the C standard.
> This repeatedly caused problems already, e.g. with duplicated typedefs:
> 
>  https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg05829.html
> 
> or with for-loop variable initializers:
> 
>  https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg00237.html
> 
> To avoid these problems, we should enforce the C language version to the
> same level for all compilers. Since our minimum compiler versions is
> GCC v4.8, our best option is "gnu99" right now ("gnu17" is not available
> there yet, and "gnu11" is marked as "experimental").
> 
> Signed-off-by: Thomas Huth 
> ---
>  v2: Use gnu99 instead of gnu11
> 
>  configure | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

Reviewed-by: Daniel P. Berrangé 


> 
> diff --git a/configure b/configure
> index b9f34af..721ade7 100755
> --- a/configure
> +++ b/configure
> @@ -105,7 +105,8 @@ update_cxxflags() {
>  for arg in $QEMU_CFLAGS; do
>  case $arg in
>  -Wstrict-prototypes|-Wmissing-prototypes|-Wnested-externs|\
> --Wold-style-declaration|-Wold-style-definition|-Wredundant-decls)
> +
> -Wold-style-declaration|-Wold-style-definition|-Wredundant-decls|\
> +-std=gnu99)

IIUC this is to  drop  -std=gnu99 from CXXFLAGS, so C++ code (only the
guest-agent on Win32 IIUC) will use the compiler default still. No worse
than what we have today.

We could consider also setting a suitable -std for CXXFLAGS too in future
though...

>  ;;
>  *)
>  QEMU_CXXFLAGS=${QEMU_CXXFLAGS:+$QEMU_CXXFLAGS }$arg
> @@ -585,7 +586,7 @@ ARFLAGS="${ARFLAGS-rv}"
>  # left shift of signed integers is well defined and has the expected
>  # 2s-complement style results. (Both clang and gcc agree that it
>  # provides these semantics.)
> -QEMU_CFLAGS="-fno-strict-aliasing -fno-common -fwrapv $QEMU_CFLAGS"
> +QEMU_CFLAGS="-fno-strict-aliasing -fno-common -fwrapv -std=gnu99 
> $QEMU_CFLAGS"
>  QEMU_CFLAGS="-Wall -Wundef -Wwrite-strings -Wmissing-prototypes $QEMU_CFLAGS"
>  QEMU_CFLAGS="-Wstrict-prototypes -Wredundant-decls $QEMU_CFLAGS"
>  QEMU_CFLAGS="-D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE 
> $QEMU_CFLAGS"

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH] json: Fix % handling when not interpolating

2019-01-09 Thread Markus Armbruster

Max Reitz  writes:

> On 08.01.19 11:36, Markus Armbruster wrote:
>> Copying block maintainers for help with assessing the bug's (non-)impact
>> on security.
>> 
>> Christophe Fergeau  writes:
>> 
>>> On Mon, Jan 07, 2019 at 04:47:44PM +0100, Markus Armbruster wrote:
 Eric Blake  writes:

> On 1/2/19 12:01 PM, Christophe Fergeau wrote:
>> Adding Markus to cc: list, I forgot to do it when sending the patch.
>
> Also worth backporting via qemu-stable, now in cc.
>
>>
>> Christophe
>>
>> On Wed, Jan 02, 2019 at 03:05:35PM +0100, Christophe Fergeau wrote:
>>> commit 8bca4613 added support for %% in json strings when interpolating,
>>> but in doing so, this broke handling of % when not interpolating as the
>>> '%' is skipped in both cases.
>>> This commit ensures we only try to handle %% when interpolating.

 Impact?

 If you're unable to assess, could you give us at least a reproducer?
>>>
>>> This all came from 
>>> https://lists.freedesktop.org/archives/spice-devel/2018-December/046644.html
>>> Setting up a VM with libvirt with >> passwd='password%'/>
>>> fails to start with:
>>>   qemu-system-x86_64: qobject/json-parser.c:146: parse_string: Assertion 
>>> `*ptr' failed.
>>>
>>> If you use 'password%%' as the password instead, when trying to connect
>>> to the VM, you type 'password%' as the password instead of 'password%%'
>>> as configured in the domain XML.
>> 
>> Thanks.
>> 
>> As the commit message says, the bug bites when we parse a string
>> containing '%s' with !ctxt->ap.  The parser then swallows a character.
>> If it swallows the terminating '"', it fails the assertion.
>> 
>> We parse with !ctxt->ap in the following cases:
>> 
>> * Tests (tests/check-qjson.c, tests/test-qobject-input-visitor.c,
>>   tests/test-visitor-serialization.c)
>> 
>>   Plenty of tests, but we still failed to cover the buggy case :(
>> 
>> * QMP input (monitor.c)
>> 
>> * QGA input (qga/main.c)
>> 
>> * qobject_from_json()
>> 
>>   - JSON pseudo-filenames (block.c)
>> 
>> These are pseudo-filenames starting with "json:".
>> 
>>   - JSON key pairs (block/rbd.c)
>> 
>> As far as I can tell, these can come only from pseudo-filenames
>> starting with "rbd:".
>> 
>>   - JSON command line option arguments of -display and -blockdev
>> (qobject-input-visitor.c)
>> 
>> Reproducer: -blockdev '{"%"}'
>> 
>> Command line, QMP and QGA input are trusted.
>> 
>> Filenames are trusted when they come from command line, QMP or HMP.
>> They are untrusted when they come from from image file headers.
>> Example: QCOW2 backing file name.  Note that this is *not* the security
>> boundary between host and guest.  It's the boundary between host and an
>> image file from an untrusted source.
>> 
>> I can't see how the bug could be exploited.  Neither failing an
>> assertion nor skipping a character in a filename of your choice is
>> interesting.  We don't support compiling with NDEBUG.
>> 
>> Kevin, Max, do you agree?
>
> I wouldn't call it "not interesting" if adding an image to your VM at
> runtime can crash the whole thing.
>
> (qemu-img create -f qcow2 -u -b 'json:{"%"}' foo.qcow2 64M)

"Not interesting" strictly from the point of view of exploiting the bug
to penetrate trust boundaries.

> Whether this is a security issue...  I don't know, but it is a DoS.

I'm not sure whether feeding untrusted images to QEMU is a good idea in
general --- there's so much that could go wrong.  How hardened against
abuse are out block drivers?

I figure what distinguishes this case is how utterly trivial creating a
"bad" image is.

Anyway, you are the block layer maintainers, so you get to decide
whether to give this the full security bug treatment.  I'm merely the
clown who broke it %-/

Re: [Qemu-devel] [PATCH 2/4] nvme: ensure the num_queues is not zero

2019-01-09 Thread Max Reitz

On 30.10.18 06:18, Li Qiang wrote:
> When it is zero, it causes segv. Backtrack:
> Thread 5 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffc6c17700 (LWP 51808)]
> 0x55acbd90 in nvme_start_ctrl (n=0x624c8100) at 
> hw/block/nvme.c:820
> warning: Source file is more recent than executable.
> 820   if (unlikely(n->cq[0])) {
> (gdb) bt
> 0  0x55acbd90 in nvme_start_ctrl (n=0x624c8100) at 
> hw/block/nvme.c:820
> 1  0x55accdbc in nvme_write_bar (n=0x624c8100, offset=20, 
> data=4587521, size=4) at hw/block/nvme.c:964
> 2  0x55acdd2b in nvme_mmio_write (opaque=0x624c8100, addr=20, 
> data=4587521, size=4) at hw/block/nvme.c:1158
> 3  0x558973ed in memory_region_write_accessor (mr=0x624c89e0, 
> addr=20, value=0x7fffc6c14428, size=4, shift=0, mask=4294967295, attrs=...) 
> at /home/liqiang02/qemu-upstream/qemu/memory.c:500
> 4  0x55897600 in access_with_adjusted_size (addr=20, 
> value=0x7fffc6c14428, size=4, access_size_min=2, access_size_max=8, 
> access_fn=0x55897304 , mr=0x624c89e0, 
> attrs=...) at /home/liqiang02/qemu-upstream/qemu/memory.c:566
> 5  0x5589a200 in memory_region_dispatch_write (mr=0x624c89e0, 
> addr=20, data=4587521, size=4, attrs=...) at 
> /home/liqiang02/qemu-upstream/qemu/memory.c:1442
> 6  0x55835151 in flatview_write_continue (fv=0x606e6fc0, 
> addr=4273930260, attrs=..., buf=0x7fffc8a18028 "\001", len=4, addr1=20, l=4, 
> mr=0x624c89e0) at /home/liqiang02/qemu-upstream/qemu/exec.c:3233
> 7  0x5583529b in flatview_write (fv=0x606e6fc0, addr=4273930260, 
> attrs=..., buf=0x7fffc8a18028 "\001", len=4) at 
> /home/liqiang02/qemu-upstream/qemu/exec.c:3272
> 8  0x558355a1 in address_space_write (as=0x5683ade0 
> , addr=4273930260, attrs=..., buf=0x7fffc8a18028 
> "\001", len=4) at /home/liqiang02/qemu-upstream/qemu/exec.c:3362
> 9  0x558355f2 in address_space_rw (as=0x5683ade0 
> , addr=4273930260, attrs=..., buf=0x7fffc8a18028 
> "\001", len=4, is_write=true) at 
> /home/liqiang02/qemu-upstream/qemu/exec.c:3373
> 10 0x558b66ac in kvm_cpu_exec (cpu=0x63114800) at 
> /home/liqiang02/qemu-upstream/qemu/accel/kvm/kvm-all.c:2031
> 11 0x5587c3ac in qemu_kvm_cpu_thread_fn (arg=0x63114800) at 
> /home/liqiang02/qemu-upstream/qemu/cpus.c:1277
> 12 0x55e54ae6 in qemu_thread_start (args=0x6032c170) at 
> util/qemu-thread-posix.c:504
> 13 0x7fffdadbd494 in start_thread () from 
> /lib/x86_64-linux-gnu/libpthread.so.0
> 14 0x7fffdaaffacf in clone () from /lib/x86_64-linux-gnu/libc.so.6
> (gdb) q
> 
> Signed-off-by: Li Qiang 
> ---
>  hw/block/nvme.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 676cc48..72c9644 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -1221,6 +1221,10 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
> **errp)
>  error_setg(errp, "serial property not set");
>  return;
>  }
> +
> +if (!n->num_queues) {
> +error_setg(errp, "num_queues can't be zero");

I think we should return here.

(Sorry for the late review...)

Max

> +}
>  blkconf_blocksizes(>conf);
>  if (!blkconf_apply_backend_options(>conf, 
> blk_is_read_only(n->conf.blk),
> false, errp)) {
> 




signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 1/4] nvme: use TYPE_NVME instead of constant string

2019-01-09 Thread Max Reitz

On 30.10.18 06:18, Li Qiang wrote:
> Signed-off-by: Li Qiang 
> ---
>  hw/block/nvme.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 0/5] kvm "virtio pmem" device

2019-01-09 Thread Pankaj Gupta




Please ignore this series as my network went down while 
sending this. I will send this series again.

Thanks,
Pankaj  

> 
> This patch series has implementation for "virtio pmem".
>  "virtio pmem" is fake persistent memory(nvdimm) in guest
>  which allows to bypass the guest page cache. This also
>  implements a VIRTIO based asynchronous flush mechanism.
>  
>  Sharing guest kernel driver in this patchset with the
>  changes suggested in v2. Tested with Qemu side device
>  emulation for virtio-pmem [6].
>  
>  Details of project idea for 'virtio pmem' flushing interface
>  is shared [3] & [4].
> 
>  Implementation is divided into two parts:
>  New virtio pmem guest driver and qemu code changes for new
>  virtio pmem paravirtualized device.
> 
> 1. Guest virtio-pmem kernel driver
> -
>- Reads persistent memory range from paravirt device and
>  registers with 'nvdimm_bus'.
>- 'nvdimm/pmem' driver uses this information to allocate
>  persistent memory region and setup filesystem operations
>  to the allocated memory.
>- virtio pmem driver implements asynchronous flushing
>  interface to flush from guest to host.
> 
> 2. Qemu virtio-pmem device
> -
>- Creates virtio pmem device and exposes a memory range to
>  KVM guest.
>- At host side this is file backed memory which acts as
>  persistent memory.
>- Qemu side flush uses aio thread pool API's and virtio
>  for asynchronous guest multi request handling.
> 
>David Hildenbrand CCed also posted a modified version[6] of
>qemu virtio-pmem code based on updated Qemu memory device API.
> 
>  Virtio-pmem errors handling:
>  
>   Checked behaviour of virtio-pmem for below types of errors
>   Need suggestions on expected behaviour for handling these errors?
> 
>   - Hardware Errors: Uncorrectable recoverable Errors:
>   a] virtio-pmem:
> - As per current logic if error page belongs to Qemu process,
>   host MCE handler isolates(hwpoison) that page and send SIGBUS.
>   Qemu SIGBUS handler injects exception to KVM guest.
> - KVM guest then isolates the page and send SIGBUS to guest
>   userspace process which has mapped the page.
>   
>   b] Existing implementation for ACPI pmem driver:
> - Handles such errors with MCE notifier and creates a list
>   of bad blocks. Read/direct access DAX operation return EIO
>   if accessed memory page fall in bad block list.
> - It also starts backgound scrubbing.
> - Similar functionality can be reused in virtio-pmem with MCE
>   notifier but without scrubbing(no ACPI/ARS)? Need inputs to
>   confirm if this behaviour is ok or needs any change?
> 
> Changes from PATCH v2: [1]
> - Disable MAP_SYNC for ext4 & XFS filesystems - [Dan]
> - Use name 'virtio pmem' in place of 'fake dax'
> 
> Changes from PATCH v1: [2]
> - 0-day build test for build dependency on libnvdimm
> 
>  Changes suggested by - [Dan Williams]
> - Split the driver into two parts virtio & pmem
> - Move queuing of async block request to block layer
> - Add "sync" parameter in nvdimm_flush function
> - Use indirect call for nvdimm_flush
> - Don’t move declarations to common global header e.g nd.h
> - nvdimm_flush() return 0 or -EIO if it fails
> - Teach nsio_rw_bytes() that the flush can fail
> - Rename nvdimm_flush() to generic_nvdimm_flush()
> - Use 'nd_region->provider_data' for long dereferencing
> - Remove virtio_pmem_freeze/restore functions
> - Remove BSD license text with SPDX license text
> 
> - Add might_sleep() in virtio_pmem_flush - [Luiz]
> - Make spin_lock_irqsave() narrow
> 
> Changes from RFC v3
> - Rebase to latest upstream - Luiz
> - Call ndregion->flush in place of nvdimm_flush- Luiz
> - kmalloc return check - Luiz
> - virtqueue full handling - Stefan
> - Don't map entire virtio_pmem_req to device - Stefan
> - request leak, correct sizeof req- Stefan
> - Move declaration to virtio_pmem.c
> 
> Changes from RFC v2:
> - Add flush function in the nd_region in place of switching
>   on a flag - Dan & Stefan
> - Add flush completion function with proper locking and wait
>   for host side flush completion - Stefan & Dan
> - Keep userspace API in uapi header file - Stefan, MST
> - Use LE fields & New device id - MST
> - Indentation & spacing suggestions - MST & Eric
> - Remove extra header files & add licensing - Stefan
> 
> Changes from RFC v1:
> - Reuse existing 'pmem' code for registering persistent
>   memory and other operations instead of creating an entirely
>   new block driver.
> - Use VIRTIO driver to register memory information with
>   nvdimm_bus and create region_type accordingly.
> - Call VIRTIO flush from existing pmem driver.
> 
> Pankaj Gupta (5):
>libnvdimm: nd_region flush callback support
>virtio-pmem: Add virtio-pmem guest driver
>libnvdimm: add nd_region buffered dax_dev flag
>ext4: disable map_sync for virtio

[Qemu-devel] [PATCH v3 3/5] libnvdimm: add nd_region buffered dax_dev flag

2019-01-09 Thread Pankaj Gupta

This patch adds 'DAXDEV_BUFFERED' flag which is set
for virtio pmem corresponding nd_region. This later
is used to disable MAP_SYNC functionality for ext4
& xfs filesystem.

Signed-off-by: Pankaj Gupta 
---
 drivers/dax/super.c  | 17 +
 drivers/nvdimm/pmem.c|  3 +++
 drivers/nvdimm/region_devs.c |  7 +++
 drivers/virtio/pmem.c|  1 +
 include/linux/dax.h  |  9 +
 include/linux/libnvdimm.h|  6 ++
 6 files changed, 43 insertions(+)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 6e928f3..9128740 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -167,6 +167,8 @@ enum dax_device_flags {
DAXDEV_ALIVE,
/* gate whether dax_flush() calls the low level flush routine */
DAXDEV_WRITE_CACHE,
+   /* flag to disable MAP_SYNC for virtio based host page cache flush */
+   DAXDEV_BUFFERED,
 };
 
 /**
@@ -335,6 +337,21 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
 
+void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc)
+{
+   if (wc)
+   set_bit(DAXDEV_BUFFERED, _dev->flags);
+   else
+   clear_bit(DAXDEV_BUFFERED, _dev->flags);
+}
+EXPORT_SYMBOL_GPL(virtio_pmem_host_cache);
+
+bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev)
+{
+   return test_bit(DAXDEV_BUFFERED, _dev->flags);
+}
+EXPORT_SYMBOL_GPL(virtio_pmem_host_cache_enabled);
+
 bool dax_alive(struct dax_device *dax_dev)
 {
lockdep_assert_held(_srcu);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index fe1217b..8d190a3 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -472,6 +472,9 @@ static int pmem_attach_disk(struct device *dev,
return -ENOMEM;
}
dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
+
+   /* Set buffered bit in 'dax_dev' for virtio pmem */
+   virtio_pmem_host_cache(dax_dev, nvdimm_is_buffered(nd_region));
pmem->dax_dev = dax_dev;
 
gendev = disk_to_dev(disk);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index f8218b4..1f8b2be 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1264,6 +1264,13 @@ int nd_region_conflict(struct nd_region *nd_region, 
resource_size_t start,
return device_for_each_child(_bus->dev, , region_conflict);
 }
 
+int nvdimm_is_buffered(struct nd_region *nd_region)
+{
+   return is_nd_pmem(_region->dev) &&
+   test_bit(ND_REGION_BUFFERED, _region->flags);
+}
+EXPORT_SYMBOL_GPL(nvdimm_is_buffered);
+
 void __exit nd_region_devs_exit(void)
 {
ida_destroy(_ida);
diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
index 51f5349..901767b 100644
--- a/drivers/virtio/pmem.c
+++ b/drivers/virtio/pmem.c
@@ -81,6 +81,7 @@ static int virtio_pmem_probe(struct virtio_device *vdev)
ndr_desc.numa_node = nid;
ndr_desc.flush = virtio_pmem_flush;
set_bit(ND_REGION_PAGEMAP, _desc.flags);
+   set_bit(ND_REGION_BUFFERED, _desc.flags);
nd_region = nvdimm_pmem_region_create(nvdimm_bus, _desc);
nd_region->provider_data =  dev_to_virtio
(nd_region->dev.parent->parent);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 0dd316a..d16e03e 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -37,6 +37,8 @@ void put_dax(struct dax_device *dax_dev);
 void kill_dax(struct dax_device *dax_dev);
 void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
+void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc);
+bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev);
 #else
 static inline struct dax_device *dax_get_by_host(const char *host)
 {
@@ -64,6 +66,13 @@ static inline bool dax_write_cache_enabled(struct dax_device 
*dax_dev)
 {
return false;
 }
+static inline void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc)
+{
+}
+static inline bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev)
+{
+   return false;
+}
 #endif
 
 struct writeback_control;
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index ca8bc07..94616f1 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -64,6 +64,11 @@ enum {
 */
ND_REGION_PERSIST_MEMCTRL = 2,
 
+   /* provides virtio based asynchronous flush mechanism for buffered
+* host page cache.
+*/
+   ND_REGION_BUFFERED = 3,
+
/* mark newly adjusted resources as requiring a label update */
DPA_RESOURCE_ADJUSTED = 1 << 0,
 };
@@ -265,6 +270,7 @@ int generic_nvdimm_flush(struct nd_region *nd_region);
 int nvdimm_has_flush(struct nd_region *nd_region);
 int nvdimm_has_cache(struct nd_region *nd_region);
 int nvdimm_in_overwrite(struct nvdimm *nvdimm);
+int nvdimm_is_buffered(struct nd_region

Re: [Qemu-devel] [PATCH v2] configure: Force the C standard to gnu99

2019-01-09 Thread Thomas Huth

On 2019-01-09 15:31, Daniel P. Berrangé wrote:
> On Wed, Jan 09, 2019 at 03:24:29PM +0100, Thomas Huth wrote:
>> Different versions of GCC and Clang use different versions of the C standard.
>> This repeatedly caused problems already, e.g. with duplicated typedefs:
>>
>>  https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg05829.html
>>
>> or with for-loop variable initializers:
>>
>>  https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg00237.html
>>
>> To avoid these problems, we should enforce the C language version to the
>> same level for all compilers. Since our minimum compiler versions is
>> GCC v4.8, our best option is "gnu99" right now ("gnu17" is not available
>> there yet, and "gnu11" is marked as "experimental").
>>
>> Signed-off-by: Thomas Huth 
>> ---
>>  v2: Use gnu99 instead of gnu11
>>
>>  configure | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> Reviewed-by: Daniel P. Berrangé 
> 
>> diff --git a/configure b/configure
>> index b9f34af..721ade7 100755
>> --- a/configure
>> +++ b/configure
>> @@ -105,7 +105,8 @@ update_cxxflags() {
>>  for arg in $QEMU_CFLAGS; do
>>  case $arg in
>>  -Wstrict-prototypes|-Wmissing-prototypes|-Wnested-externs|\
>> -
>> -Wold-style-declaration|-Wold-style-definition|-Wredundant-decls)
>> +
>> -Wold-style-declaration|-Wold-style-definition|-Wredundant-decls|\
>> +-std=gnu99)
> 
> IIUC this is to  drop  -std=gnu99 from CXXFLAGS, so C++ code (only the
> guest-agent on Win32 IIUC) will use the compiler default still.

We also use it for the capstone disassembler and disas/libvixl.

> We could consider also setting a suitable -std for CXXFLAGS too in future
> though...

Shall I send a v3 with "-std=gnu++98" (which seems to be the only usable
option right now), or shall we wait until we really hit a problem with
the C++ code?

 Thomas

Re: [Qemu-devel] [PATCH v3] qemu-io: Add generic function for reinitializing optind.

2019-01-09 Thread Eric Blake

On 1/9/19 7:44 AM, Daniel P. Berrangé wrote:
> On Wed, Jan 09, 2019 at 01:23:01PM +, Richard W.M. Jones wrote:
>> How about this one?  Add a generic osdep function for reinitializing
>> optind, which does optreset on FreeBSD (but is identical on all other
>> OSes).  Use it from qemu-io and qemu-img.
>>
>> I have tested this on Linux, FreeBSD and OpenBSD.
>>

> 
>> WARNING: architecture specific defines should be avoided
>> #78: FILE: include/qemu/osdep.h:600:
>> +#ifdef __FreeBSD__
> 
> Normally we'd suggest doing a configure test to for the platform
> feature and then using a feature based ifdef test. In this case
> though that would be difficult and/or overly complex.
> 
> This does make me wonder about the other *BSDs, OS-X and Mingw
> though ?  Should they all be using the #else codepath, or should
> the other BSDs / OS-X use the __FreeBSD__ codepath.

Indeed, and I already suggested a configure-time probe  on the v2
review.  My preference, if we want to go with this helper function for a
hard reset, would be:

#if HAVE_OPTRESET
  optind = 1;
  optreset = 1;
#else
  optind = 0;
#endif

where HAVE_OPTRESET is based on a configure-time probe.

Basically, any platform that HAS optreset will probably honor it; any
platform that lacks optreset is hopefully okay with the
POSIX-unspecified behavior of optind = 0.

But I do like the idea presented in this version of trying to isolate
the reset into a common helper function, so that if we have to make
later changes, we only have to touch the one function rather than all
callsites that reset getopt parsing.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH] block: Acquire the AioContext in guess_disk_lchs()

2019-01-09 Thread Stefan Hajnoczi

On Wed, Jan 09, 2019 at 04:28:50PM +0200, Alberto Garcia wrote:
> This fixes the following crash:
> 
> { "execute": "blockdev-add",
>   "arguments": {"driver": "null-co", "node-name": "hd0"}}
> { "execute": "object-add",
>   "arguments": {"qom-type": "iothread", "id": "iothread0"}}
> { "execute": "x-blockdev-set-iothread",
>   "arguments": {"node-name": "hd0", "iothread": "iothread0"}}
> { "execute": "device_add",
>   "arguments": {"id": "virtio0", "driver": "virtio-blk-pci",
> "drive": "hd0"}}
> qemu: qemu_mutex_unlock_impl: Operation not permitted
> Aborted
> 
> Signed-off-by: Alberto Garcia 
> ---
>  hw/block/hd-geometry.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH 3/4] nvme: check msix_init_exclusive_bar return value

2019-01-09 Thread Max Reitz

On 30.10.18 06:18, Li Qiang wrote:
> As this function can fail.
> 
> Signed-off-by: Li Qiang 
> ---
>  hw/block/nvme.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 72c9644..a406c23 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -1250,7 +1250,10 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
> **errp)
>  pci_register_bar(>parent_obj, 0,
>  PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
>  >iomem);
> -msix_init_exclusive_bar(>parent_obj, n->num_queues, 4, NULL);
> +if (msix_init_exclusive_bar(>parent_obj, n->num_queues, 4, NULL)) {
> +error_setg(errp, "msix_init_exclusive_bar failed");
> +return;
> +}

Commit ee640c625e1 which last touched this function stated that creating
this device won't fail if this function fails, so I suppose it's
intential that the error isn't checked.

But if you think the error should be checked and device creation should
be aborted if this function failed, then I suppose there'd be some
cleaning up to instead of just returning.

Also, we should just pass errp to that function (as its last parameter,
instead of NULL).

Max

>  
>  id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
>  id->ssvid = cpu_to_le16(pci_get_word(pci_conf + 
> PCI_SUBSYSTEM_VENDOR_ID));
> 

signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH v3 5/5] xfs: disable map_sync for virtio pmem

2019-01-09 Thread Pankaj Gupta

Virtio pmem provides asynchronous host page cache flush
mechanism. we don't support 'MAP_SYNC' with virtio pmem 
and xfs.

Signed-off-by: Pankaj Gupta 
---
 fs/xfs/xfs_file.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e474250..eae4aa4 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1190,6 +1190,14 @@ xfs_file_mmap(
if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
return -EOPNOTSUPP;
 
+   /* We don't support synchronous mappings with guest direct access
+* and virtio based host page cache mechanism.
+*/
+   if (IS_DAX(file_inode(filp)) && virtio_pmem_host_cache_enabled(
+   xfs_find_daxdev_for_inode(file_inode(filp))) &&
+   (vma->vm_flags & VM_SYNC))
+   return -EOPNOTSUPP;
+
file_accessed(filp);
vma->vm_ops = _file_vm_ops;
if (IS_DAX(file_inode(filp)))
-- 
2.9.3

Re: [Qemu-devel] [PATCH v2 0/3] HMP/snapshot changes - do not use ID anymore

2019-01-09 Thread Max Reitz

On 09.01.19 15:48, Kevin Wolf wrote:
> Am 09.01.2019 um 15:27 hat Max Reitz geschrieben:
>> On 09.01.19 15:21, Kevin Wolf wrote:
>>> Am 09.01.2019 um 15:10 hat Max Reitz geschrieben:
 On 06.09.18 13:11, Daniel Henrique Barboza wrote:
> changes in v2:
> - removed the "RFC" marker;
> - added a new patch (patch 2) that removes
> bdrv_snapshot_delete_by_id_or_name from the code;
> - made changes in patch 1 as suggested by Murilo;
> - previous patch set link:
> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
>
>
> It is not uncommon to see bugs being opened by testers that attempt to
> create VM snapshots using HMP. It turns out that "0" and "1" are quite
> common snapshot names and they trigger a lot of bugs. I gave an example
> in the commit message of patch 1, but to sum up here: QEMU treats the
> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
> is documented as such, but this can lead to strange situations.
>
> Given that it is strange for an API to consider a parameter to be 2 fields
> at the same time, and inadvently treating them as one or the other, and
> that removing the ID field is too drastic, my idea here is to keep the
> ID field for internal control, but do not let the user set it.
>
> I guess there's room for discussion about considering this change an API
> change or not. It doesn't affect users of HMP and it doesn't affect 
> Libvirt,
> but simplifying the meaning of the parameters of savevm/loadvm/delvm.

 (Yes, very late reply, I'm sorry...)

 I do think it affects users of HMP, because right now you can delete
 snapshots with their ID, and after this series you cannot.
>>>
>>> Can there be snapshots that can't be identified by a snapshot name, but
>>> only by their ID?
>>
>> I don't know, but what I meant is that if you have scripts to do all
>> this, you might have to adjust them with this change.
> 
> That's what the H in HMP means.
> 
 I think we had a short discussion about just disallowing numeric
 snapshot names.  How bad would that be?
>>>
>>> It would be incompatible with existing images and result in a more
>>> complex snapshot identifier resolution. Why would it be any better?
>>
>> It wouldn't be incompatible with existing images if we'd just disallow
>> creating such snapshots.  I don't see how the identifier resolution
>> would be more complex.
>>
>> I don't know if it'd be better.  I'd just find it simpler (for us, that
>> is -- for users, I'm not sure).
> 
> Identifier resolution A:
> - Find a snapshot that has the given identifier as a name
> - If no such snapshot exists, it is an error
> 
> Identifier resolution B:
> - If the identifier is a number, find a snapshot that has the given
>   identifier as its ID
> - If the identifier is not a number, find a snapshot that has the given
>   identifier as a name
> - If no such snapshot exists, it is an error

No, my idea was to keep the resolution the same as it is; just to forbid
creating new snapshots with numeric names.  This would prevent users
from getting into the whole situation.

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v15 0/6] Add support for TPM Physical Presence interface

2019-01-09 Thread Marc-André Lureau

Hi

On Wed, Jan 9, 2019 at 6:51 PM Michael S. Tsirkin  wrote:
>
> On Wed, Jan 09, 2019 at 12:28:56PM +0400, Marc-André Lureau wrote:
> > Hi,
> >
> > The following patches implement the TPM Physical Presence Interface
> > that allows a user to set a command via ACPI (sysfs entry in Linux)
> > that, upon the next reboot, the firmware looks for and acts upon by
> > sending sequences of commands to the TPM.
> >
> > A dedicated memory region is added to the TPM CRB & TIS devices, at
> > address/size 0xFED45000/0x400. A new "etc/tpm/config" fw_cfg entry
> > holds the location for that PPI region and some version details, to
> > allow for future flexibility.
> >
> > With the associated edk2/ovmf firmware, the Windows HLK "PPI 1.3" test
> > now runs successfully.
> >
> > It is based on previous work from Stefan Berger ("[PATCH v2 0/4]
> > Implement Physical Presence interface for TPM 1.2 and 2")
> >
> > The edk2 support is merged upstream.
>
> So it looks good to me overall.
> Minor style comments.
>
> With or without:
>
> Reviewed-by: Michael S. Tsirkin 

Thanks Michael for the review!

Would you take it in your next pull request? Or may I send a pull req?

For the minor style comments, I can either let you do that on commit,
resend, or together with a pullreq.

thanks

>
>
> > v15:
> > - fix crash on reset when PPI is disabled
> >
> > v14:
> > - rebased, fixing conflicts after compat-props refactoring
> > - fix build regression from v13 with --disable-tpm
> >
> > v13:
> > - removed needless error handling in tpm_ppi_init()
> > - splitted "add ACPI memory clear interface"
> > - moved acpi build function in dedicated hw/acpi/tpm.c
> > - added some function documentation in headers
> > - various code cleanups suggested by Philippe
> > - rebased
> >
> > Marc-André Lureau (3):
> >   tpm: add a "ppi" boolean property
> >   acpi: add ACPI memory clear interface
> >   tpm: clear RAM when "memory overwrite" requested
> >
> > Stefan Berger (3):
> >   tpm: allocate/map buffer for TPM Physical Presence interface
> >   acpi: expose TPM/PPI configuration parameters to firmware via fw_cfg
> >   acpi: build TPM Physical Presence interface
> >
> >  hw/tpm/tpm_ppi.h  |  46 +
> >  include/hw/acpi/tpm.h |  21 ++
> >  hw/acpi/tpm.c | 448 ++
> >  hw/core/machine.c |   8 +
> >  hw/i386/acpi-build.c  |  29 ++-
> >  hw/tpm/tpm_crb.c  |  13 ++
> >  hw/tpm/tpm_ppi.c  |  53 +
> >  hw/tpm/tpm_tis.c  |  13 ++
> >  stubs/tpm.c   |   5 +
> >  docs/specs/tpm.txt| 104 ++
> >  hw/acpi/Makefile.objs |   1 +
> >  hw/tpm/Makefile.objs  |   1 +
> >  hw/tpm/trace-events   |   3 +
> >  13 files changed, 743 insertions(+), 2 deletions(-)
> >  create mode 100644 hw/tpm/tpm_ppi.h
> >  create mode 100644 hw/acpi/tpm.c
> >  create mode 100644 hw/tpm/tpm_ppi.c
> >
> > --
> > 2.20.1.2.gb21ebb671b
>


-- 
Marc-André Lureau

Re: [Qemu-devel] [PATCH] throttle-groups: fix restart coroutine iothread race

2019-01-09 Thread Alberto Garcia

On Wed 09 Jan 2019 12:01:44 PM CET, Stefan Hajnoczi wrote:
> The following QMP command leads to a crash when iothreads are used:
>
>   { 'execute': 'device_del', 'arguments': {'id': 'data'} }

I was trying to reproduce this and I found this crashing in master:

$ qemu-system-x86_64 -enable-kvm -qmp stdio -display none
{ "execute": "qmp_capabilities" }
{ "execute": "blockdev-add", "arguments": {"driver": "qcow2", "file": 
{"driver": "file", "filename": "hd.qcow2"}, "node-name": "hd0"}}
{ "execute": "object-add", "arguments": {"qom-type": "iothread", "id": 
"iothread0"}}
{ "execute": "x-blockdev-set-iothread", "arguments": {"node-name": "hd0", 
"iothread": "iothread0"}}
{ "execute": "device_add", "arguments": {"id": "virtio0", "driver": 
"virtio-blk-pci", "drive": "hd0"}}

Thread 1 "qemu-system-x86" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  [...] in __GI_abort () at abort.c:89
#2  [...] in error_exit () at util/qemu-thread-posix.c:36
#3  [...] in qemu_mutex_unlock_impl () at util/qemu-thread-posix.c:96
#4  [...] in aio_context_release () at util/async.c:516
#5  [...] in blk_prw () at block/block-backend.c:1262
#6  [...] in blk_pread () at block/block-backend.c:1424
#7  [...] in blk_pread_unthrottled () at block/block-backend.c:1279
#8  [...] in guess_disk_lchs () at hw/block/hd-geometry.c:71
#9  [...] in hd_geometry_guess () at hw/block/hd-geometry.c:136
#10 [...] in blkconf_geometry () at hw/block/block.c:99
#11 [...] in virtio_blk_device_realize () at hw/block/virtio-blk.c:944
#12 [...] in virtio_device_realize () at hw/virtio/virtio.c:2538
[...]

Berto

Re: [Qemu-devel] [PATCH] configure: Force the C standard to gnu11

2019-01-09 Thread Thomas Huth

On 2019-01-09 14:27, Daniel P. Berrangé wrote:
> On Wed, Jan 09, 2019 at 02:20:02PM +0100, Thomas Huth wrote:
>> On 2019-01-09 13:58, Daniel P. Berrangé wrote:
>>> On Wed, Jan 09, 2019 at 01:52:02PM +0100, Thomas Huth wrote:
 On 2019-01-09 12:44, Daniel P. Berrangé wrote:
> On Wed, Jan 09, 2019 at 12:25:43PM +0100, Thomas Huth wrote:
>> On 2019-01-09 11:58, Daniel P. Berrangé wrote:
>>> On Mon, Jan 07, 2019 at 11:45:26AM +0100, Thomas Huth wrote:
 Different versions of GCC and Clang use different versions of the C 
 standard.
 This repeatedly caused problems already, e.g. with duplicated typedefs:

  https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg05829.html

 or with for-loop variable initializers:

  https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg00237.html

 To avoid these problems, we should enforce the C language version to 
 the
 same level for all compilers. Since our minimum compiler versions are
 GCC v4.8 and Clang v3.4 now, and both basically support "gnu11" 
 already,
 this seems to be a good choice.
>>>
>>> In 4.x   gnu11 is marked as experimental. I'm not really comfortable
>>> using experimental features - even if its warning free there's a risk
>>> it would silently mis-compile something.
>>>
>>> gnu99 is ok with 4.x - it is merely "incomplete".
>>
>> gnu11 has the big advantage that it also fixes the problem with
>> duplicated typedefs that are reported by older versions of Clang.
>>
>> Are you sure about the experimental character in 4.x? I just looked at
>> https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Standards.html and it says:
>>
>> "A fourth version of the C standard, known as C11, was published in 2011
>> as ISO/IEC 9899:2011. GCC has limited incomplete support for parts of
>> this standard, enabled with -std=c11 or -std=iso9899:2011."
>>
>> It does not say anything about "experimental" there. The word
>> "experimental" is only used for the C++ support, but we hardly have C++
>> code in QEMU -- if you worry about that, I could simply drop the
>> "-std=gnu++11" part from my patch?
>
> I was looking at the "info gcc" docs on RHEL7, 
> gcc-4.8.5-16.el7_4.1.x86_64:
>
>   "3.4 Options Controlling C Dialect
>
>snip...
>
>  'gnu11'
>  'gnu1x'
>   GNU dialect of ISO C11.  Support is incomplete and
>   experimental.  The name 'gnu1x' is deprecated."

 Ok. Looks like the "Support is incomplete and experimental" sentence has
 been removed with GCC 4.9.0 here. So GCC 4.8 is likely pretty close
 already. IMHO we could give it a try and enable gnu11 for QEMU with GCC
 v4.8, too. If we later find problems, we could still switch back to
 gnu99 instead. Other opinions?
>>>
>>> Our code is already cleanly compiling with gnu99 standard - the problem
>>> is merely that we sometimes introduce regressions due to not enforcing
>>> that standard level. I don't think the features in gnu11 are compelling
>>> enough to justify using something that's declared experimental.
>> What about the duplicated typedef problem? See:
>>
>>  https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg05829.html
>>
>> That one occured with Clang, but I think we've had plenty of these in
>> the past with GCC, too...
> 
> IIUC, That's only a problem because we don't pass any  -std  flag, and
> so get the compilers default, which may be gnu11.  If we explicitly
> set -std=gnu99, that problem will be reported by patchew, travis and
> maintainers own build tests, and thus won't get anywhere near git master.

Ah, right, I just tried it with a newer version of Clang, and indeed it
then prints out a warning if some code with a duplicated typedef is
compiled with "-std=gnu99" (while it remains silent without that
option). So in that case I'm fine with gnu99, too. I'll send a v2 of the
patch...

 Thomas

[Qemu-devel] [PATCH v3 3/5] libnvdimm: add nd_region buffered dax_dev flag

2019-01-09 Thread Pankaj Gupta

This patch adds 'DAXDEV_BUFFERED' flag which is set
for virtio pmem corresponding nd_region. This later
is used to disable MAP_SYNC functionality for ext4
& xfs filesystem.

Signed-off-by: Pankaj Gupta 
---
 drivers/dax/super.c  | 17 +
 drivers/nvdimm/pmem.c|  3 +++
 drivers/nvdimm/region_devs.c |  7 +++
 drivers/virtio/pmem.c|  1 +
 include/linux/dax.h  |  9 +
 include/linux/libnvdimm.h|  6 ++
 6 files changed, 43 insertions(+)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 6e928f3..9128740 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -167,6 +167,8 @@ enum dax_device_flags {
DAXDEV_ALIVE,
/* gate whether dax_flush() calls the low level flush routine */
DAXDEV_WRITE_CACHE,
+   /* flag to disable MAP_SYNC for virtio based host page cache flush */
+   DAXDEV_BUFFERED,
 };
 
 /**
@@ -335,6 +337,21 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
 
+void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc)
+{
+   if (wc)
+   set_bit(DAXDEV_BUFFERED, _dev->flags);
+   else
+   clear_bit(DAXDEV_BUFFERED, _dev->flags);
+}
+EXPORT_SYMBOL_GPL(virtio_pmem_host_cache);
+
+bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev)
+{
+   return test_bit(DAXDEV_BUFFERED, _dev->flags);
+}
+EXPORT_SYMBOL_GPL(virtio_pmem_host_cache_enabled);
+
 bool dax_alive(struct dax_device *dax_dev)
 {
lockdep_assert_held(_srcu);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index fe1217b..8d190a3 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -472,6 +472,9 @@ static int pmem_attach_disk(struct device *dev,
return -ENOMEM;
}
dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
+
+   /* Set buffered bit in 'dax_dev' for virtio pmem */
+   virtio_pmem_host_cache(dax_dev, nvdimm_is_buffered(nd_region));
pmem->dax_dev = dax_dev;
 
gendev = disk_to_dev(disk);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index f8218b4..1f8b2be 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1264,6 +1264,13 @@ int nd_region_conflict(struct nd_region *nd_region, 
resource_size_t start,
return device_for_each_child(_bus->dev, , region_conflict);
 }
 
+int nvdimm_is_buffered(struct nd_region *nd_region)
+{
+   return is_nd_pmem(_region->dev) &&
+   test_bit(ND_REGION_BUFFERED, _region->flags);
+}
+EXPORT_SYMBOL_GPL(nvdimm_is_buffered);
+
 void __exit nd_region_devs_exit(void)
 {
ida_destroy(_ida);
diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
index 51f5349..901767b 100644
--- a/drivers/virtio/pmem.c
+++ b/drivers/virtio/pmem.c
@@ -81,6 +81,7 @@ static int virtio_pmem_probe(struct virtio_device *vdev)
ndr_desc.numa_node = nid;
ndr_desc.flush = virtio_pmem_flush;
set_bit(ND_REGION_PAGEMAP, _desc.flags);
+   set_bit(ND_REGION_BUFFERED, _desc.flags);
nd_region = nvdimm_pmem_region_create(nvdimm_bus, _desc);
nd_region->provider_data =  dev_to_virtio
(nd_region->dev.parent->parent);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 0dd316a..d16e03e 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -37,6 +37,8 @@ void put_dax(struct dax_device *dax_dev);
 void kill_dax(struct dax_device *dax_dev);
 void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
+void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc);
+bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev);
 #else
 static inline struct dax_device *dax_get_by_host(const char *host)
 {
@@ -64,6 +66,13 @@ static inline bool dax_write_cache_enabled(struct dax_device 
*dax_dev)
 {
return false;
 }
+static inline void virtio_pmem_host_cache(struct dax_device *dax_dev, bool wc)
+{
+}
+static inline bool virtio_pmem_host_cache_enabled(struct dax_device *dax_dev)
+{
+   return false;
+}
 #endif
 
 struct writeback_control;
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index ca8bc07..94616f1 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -64,6 +64,11 @@ enum {
 */
ND_REGION_PERSIST_MEMCTRL = 2,
 
+   /* provides virtio based asynchronous flush mechanism for buffered
+* host page cache.
+*/
+   ND_REGION_BUFFERED = 3,
+
/* mark newly adjusted resources as requiring a label update */
DPA_RESOURCE_ADJUSTED = 1 << 0,
 };
@@ -265,6 +270,7 @@ int generic_nvdimm_flush(struct nd_region *nd_region);
 int nvdimm_has_flush(struct nd_region *nd_region);
 int nvdimm_has_cache(struct nd_region *nd_region);
 int nvdimm_in_overwrite(struct nvdimm *nvdimm);
+int nvdimm_is_buffered(struct nd_region

[Qemu-devel] [PATCH v3 0/5] kvm "virtio pmem" device

2019-01-09 Thread Pankaj Gupta

 This patch series has implementation for "virtio pmem". 
 "virtio pmem" is fake persistent memory(nvdimm) in guest 
 which allows to bypass the guest page cache. This also
 implements a VIRTIO based asynchronous flush mechanism.  
 
 Sharing guest kernel driver in this patchset with the 
 changes suggested in v2. Tested with Qemu side device 
 emulation for virtio-pmem [6]. 
 
 Details of project idea for 'virtio pmem' flushing interface 
 is shared [3] & [4].

 Implementation is divided into two parts:
 New virtio pmem guest driver and qemu code changes for new 
 virtio pmem paravirtualized device.

1. Guest virtio-pmem kernel driver
-
   - Reads persistent memory range from paravirt device and 
 registers with 'nvdimm_bus'.  
   - 'nvdimm/pmem' driver uses this information to allocate 
 persistent memory region and setup filesystem operations 
 to the allocated memory. 
   - virtio pmem driver implements asynchronous flushing 
 interface to flush from guest to host.

2. Qemu virtio-pmem device
-
   - Creates virtio pmem device and exposes a memory range to 
 KVM guest. 
   - At host side this is file backed memory which acts as 
 persistent memory. 
   - Qemu side flush uses aio thread pool API's and virtio 
 for asynchronous guest multi request handling. 

   David Hildenbrand CCed also posted a modified version[6] of 
   qemu virtio-pmem code based on updated Qemu memory device API. 

 Virtio-pmem errors handling:
 
  Checked behaviour of virtio-pmem for below types of errors
  Need suggestions on expected behaviour for handling these errors?

  - Hardware Errors: Uncorrectable recoverable Errors: 
  a] virtio-pmem: 
- As per current logic if error page belongs to Qemu process, 
  host MCE handler isolates(hwpoison) that page and send SIGBUS. 
  Qemu SIGBUS handler injects exception to KVM guest. 
- KVM guest then isolates the page and send SIGBUS to guest 
  userspace process which has mapped the page. 
  
  b] Existing implementation for ACPI pmem driver: 
- Handles such errors with MCE notifier and creates a list 
  of bad blocks. Read/direct access DAX operation return EIO 
  if accessed memory page fall in bad block list.
- It also starts backgound scrubbing.  
- Similar functionality can be reused in virtio-pmem with MCE 
  notifier but without scrubbing(no ACPI/ARS)? Need inputs to 
  confirm if this behaviour is ok or needs any change?

Changes from PATCH v2: [1]
- Disable MAP_SYNC for ext4 & XFS filesystems - [Dan] 
- Use name 'virtio pmem' in place of 'fake dax' 

Changes from PATCH v1: [2]
- 0-day build test for build dependency on libnvdimm 

 Changes suggested by - [Dan Williams]
- Split the driver into two parts virtio & pmem  
- Move queuing of async block request to block layer
- Add "sync" parameter in nvdimm_flush function
- Use indirect call for nvdimm_flush
- Don’t move declarations to common global header e.g nd.h
- nvdimm_flush() return 0 or -EIO if it fails
- Teach nsio_rw_bytes() that the flush can fail
- Rename nvdimm_flush() to generic_nvdimm_flush()
- Use 'nd_region->provider_data' for long dereferencing
- Remove virtio_pmem_freeze/restore functions
- Remove BSD license text with SPDX license text

- Add might_sleep() in virtio_pmem_flush - [Luiz]
- Make spin_lock_irqsave() narrow

Changes from RFC v3
- Rebase to latest upstream - Luiz
- Call ndregion->flush in place of nvdimm_flush- Luiz
- kmalloc return check - Luiz
- virtqueue full handling - Stefan
- Don't map entire virtio_pmem_req to device - Stefan
- request leak, correct sizeof req- Stefan
- Move declaration to virtio_pmem.c

Changes from RFC v2:
- Add flush function in the nd_region in place of switching
  on a flag - Dan & Stefan
- Add flush completion function with proper locking and wait
  for host side flush completion - Stefan & Dan
- Keep userspace API in uapi header file - Stefan, MST
- Use LE fields & New device id - MST
- Indentation & spacing suggestions - MST & Eric
- Remove extra header files & add licensing - Stefan

Changes from RFC v1:
- Reuse existing 'pmem' code for registering persistent 
  memory and other operations instead of creating an entirely 
  new block driver.
- Use VIRTIO driver to register memory information with 
  nvdimm_bus and create region_type accordingly. 
- Call VIRTIO flush from existing pmem driver.

Pankaj Gupta (5):
   libnvdimm: nd_region flush callback support
   virtio-pmem: Add virtio-pmem guest driver
   libnvdimm: add nd_region buffered dax_dev flag
   ext4: disable map_sync for virtio pmem
   xfs: disable map_sync for virtio pmem

[2] https://lkml.org/lkml/2018/8/31/407
[3] https://www.spinics.net/lists/kvm/msg149761.html
[4] https://www.spinics.net/lists/kvm/msg153095.html  
[5] https://lkml.org/lkml/2018/8/31/413
[6] https://marc.info/?l=qemu-devel=153555721901824=2

Re: [Qemu-devel] [PATCH v4] tests: vm: auto_install OpenBSD

2019-01-09 Thread Daniel P . Berrangé

On Wed, Jan 09, 2019 at 01:55:04PM +, Daniel P. Berrangé wrote:
> On Wed, Nov 14, 2018 at 05:58:07PM +0800, Fam Zheng wrote:
> > On Sun, 11/11 18:20, Brad Smith wrote:
> > > ping.
> > 
> > Queued. Will send a pull request soon.
> 
> ping, this seems to have got lost, and is blocking us deleting SDL 1.2
> support from QEMU.

Sigh, never mind, I found the PR is here:

https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg00732.html

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

[Qemu-devel] [PATCH v3 2/5] virtio-pmem: Add virtio pmem driver

2019-01-09 Thread Pankaj Gupta

This patch adds virtio-pmem driver for KVM guest.

Guest reads the persistent memory range information from
Qemu over VIRTIO and registers it on nvdimm_bus. It also
creates a nd_region object with the persistent memory
range information so that existing 'nvdimm/pmem' driver
can reserve this into system memory map. This way
'virtio-pmem' driver uses existing functionality of pmem
driver to register persistent memory compatible for DAX
capable filesystems.

This also provides function to perform guest flush over
VIRTIO from 'pmem' driver when userspace performs flush
on DAX memory range.

Signed-off-by: Pankaj Gupta 
---
 drivers/nvdimm/virtio_pmem.c |  84 ++
 drivers/virtio/Kconfig   |  10 
 drivers/virtio/Makefile  |   1 +
 drivers/virtio/pmem.c| 124 +++
 include/linux/virtio_pmem.h  |  60 +++
 include/uapi/linux/virtio_ids.h  |   1 +
 include/uapi/linux/virtio_pmem.h |  10 
 7 files changed, 290 insertions(+)
 create mode 100644 drivers/nvdimm/virtio_pmem.c
 create mode 100644 drivers/virtio/pmem.c
 create mode 100644 include/linux/virtio_pmem.h
 create mode 100644 include/uapi/linux/virtio_pmem.h

diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
new file mode 100644
index 000..2a1b1ba
--- /dev/null
+++ b/drivers/nvdimm/virtio_pmem.c
@@ -0,0 +1,84 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio_pmem.c: Virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and provides a virtio based flushing
+ * interface.
+ */
+#include 
+#include "nd.h"
+
+ /* The interrupt handler */
+void host_ack(struct virtqueue *vq)
+{
+   unsigned int len;
+   unsigned long flags;
+   struct virtio_pmem_request *req, *req_buf;
+   struct virtio_pmem *vpmem = vq->vdev->priv;
+
+   spin_lock_irqsave(>pmem_lock, flags);
+   while ((req = virtqueue_get_buf(vq, )) != NULL) {
+   req->done = true;
+   wake_up(>host_acked);
+
+   if (!list_empty(>req_list)) {
+   req_buf = list_first_entry(>req_list,
+   struct virtio_pmem_request, list);
+   list_del(>req_list);
+   req_buf->wq_buf_avail = true;
+   wake_up(_buf->wq_buf);
+   }
+   }
+   spin_unlock_irqrestore(>pmem_lock, flags);
+}
+EXPORT_SYMBOL_GPL(host_ack);
+
+ /* The request submission function */
+int virtio_pmem_flush(struct nd_region *nd_region)
+{
+   int err;
+   unsigned long flags;
+   struct scatterlist *sgs[2], sg, ret;
+   struct virtio_device *vdev = nd_region->provider_data;
+   struct virtio_pmem *vpmem = vdev->priv;
+   struct virtio_pmem_request *req;
+
+   might_sleep();
+   req = kmalloc(sizeof(*req), GFP_KERNEL);
+   if (!req)
+   return -ENOMEM;
+
+   req->done = req->wq_buf_avail = false;
+   strcpy(req->name, "FLUSH");
+   init_waitqueue_head(>host_acked);
+   init_waitqueue_head(>wq_buf);
+   sg_init_one(, req->name, strlen(req->name));
+   sgs[0] = 
+   sg_init_one(, >ret, sizeof(req->ret));
+   sgs[1] = 
+
+   spin_lock_irqsave(>pmem_lock, flags);
+   err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
+   if (err) {
+   dev_err(>dev, "failed to send command to virtio pmem 
device\n");
+
+   list_add_tail(>req_list, >list);
+   spin_unlock_irqrestore(>pmem_lock, flags);
+
+   /* When host has read buffer, this completes via host_ack */
+   wait_event(req->wq_buf, req->wq_buf_avail);
+   spin_lock_irqsave(>pmem_lock, flags);
+   }
+   virtqueue_kick(vpmem->req_vq);
+   spin_unlock_irqrestore(>pmem_lock, flags);
+
+   /* When host has read buffer, this completes via host_ack */
+   wait_event(req->host_acked, req->done);
+   err = req->ret;
+   kfree(req);
+
+   return err;
+};
+EXPORT_SYMBOL_GPL(virtio_pmem_flush);
+MODULE_LICENSE("GPL");
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 3589764..9f634a2 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
 
  If unsure, say Y.
 
+config VIRTIO_PMEM
+   tristate "Support for virtio pmem driver"
+   depends on VIRTIO
+   depends on LIBNVDIMM
+   help
+   This driver provides support for virtio based flushing interface
+   for persistent memory range.
+
+   If unsure, say M.
+
 config VIRTIO_BALLOON
tristate "Virtio balloon driver"
depends on VIRTIO
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 3a2b5c5..143ce91 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o

Re: [Qemu-devel] [PATCH for-4.0 v9 16/16] qemu_thread_join: fix segmentation fault

2019-01-09 Thread Fei Li




在 2019/1/9 上午1:29, Markus Armbruster 写道:

fei  writes:


在 2019年1月8日，01:55，Markus Armbruster  写道：

Fei Li  writes:


To avoid the segmentation fault in qemu_thread_join(), just directly
return when the QemuThread *thread failed to be created in either
qemu-thread-posix.c or qemu-thread-win32.c.

Cc: Stefan Weil 
Signed-off-by: Fei Li 
Reviewed-by: Fam Zheng 
---
util/qemu-thread-posix.c | 3 +++
util/qemu-thread-win32.c | 2 +-
2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index 39834b0551..3548935dac 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -571,6 +571,9 @@ void *qemu_thread_join(QemuThread *thread)
 int err;
 void *ret;

+if (!thread->thread) {
+return NULL;
+}

How can this happen?

I think I have answered this earlier, please check the following link to see 
whether it helps:
http://lists.nongnu.org/archive/html/qemu-devel/2018-11/msg06554.html

Thanks for the pointer.  Unfortunately, I don't understand your
explanation.  You also wrote there "I will remove this patch in next
version"; looks like you've since changed your mind.
Emm, issues left over from history.. The background is I was hurry to 
make those five
Reviewed-by patches be merged, including this v9 16/16 patch but not the 
real
qemu_thread_create() modification. But actually this patch is to fix the 
segmentation
fault after we modified qemu_thread_create() related functions although 
it has got a
Reviewed-by earlier. :) Thus to not make troube, I wrote the "remove..." 
sentence
to separate it from those 5 Reviewed-by patches, and were plan to send 
only four patches.
But later I got a message that these five patches are not that urgent to 
catch qemu v3.1,
thus I joined the earlier 5 R-b patches into the later v8 & v9 to have a 
better review.


Sorry for the trouble, I need to explain it without involving too much 
background..


Back at the farm: in our current qemu code, some cleanups use a loop to 
join()
the total number of threads if caller fails. This is not a problem until 
applying the

qemu_thread_create() modification. E.g. when compress_threads_save_setup()
fails while trying to create the last do_data_compress thread, 
segmentation fault
will occur when join() is called (sadly there's not enough condition to 
filter this

unsuccessful created thread) as this thread is actually not be created.

Hope the above makes it clear. :)

Have a nice day
Fei


What exactly breaks if we omit this patch?  Assuming something does
break: imagine we did omit this patch, then forgot we ever saw it, and
now you've discovered the breakage.  Write us the bug report, complete
with reproducer.

[...]

Re: [Qemu-devel] [PATCH] spapr/vio: remove the "irq" property"

2019-01-09 Thread Greg Kurz

On Wed,  9 Jan 2019 11:30:28 +0100
Cédric Le Goater  wrote:

> commit efe2add7cb7f ("spapr/vio: deprecate the "irq" property") was
> merged in QEMU version 3.0. The "irq" property" can be removed for
> QEMU version 4.0, 2 version later.
> 
> Signed-off-by: Cédric Le Goater 
> ---
> 

The change in the code looks good but...

>  The machine uselessly allocates an IRQ number for the sPAPR NVRAM.
>  This is harmless but still a little ugly.
> 
>  hw/ppc/spapr_vio.c | 47 ++

... but qemu-deprecated.texi still contains this subsection:

@subsection vio-spapr-device device options

@subsubsection "irq": "" (since 3.0.0)

The ``irq'' property is obsoleted.

It should be removed in this patch as well.

Another file that usually needs patching is qemu-options.hx, but we don't
need it as it doesn't contain any sPAPR related documentation.

With the subsection removed from qemu-deprecated.texi,

Reviewed-by: Greg Kurz 

>  1 file changed, 6 insertions(+), 41 deletions(-)
> 
> diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
> index 7e8a9ad09337..414673d31341 100644
> --- a/hw/ppc/spapr_vio.c
> +++ b/hw/ppc/spapr_vio.c
> @@ -44,38 +44,6 @@
>  
>  #define SPAPR_VIO_REG_BASE 0x7100
>  
> -static void spapr_vio_get_irq(Object *obj, Visitor *v, const char *name,
> -  void *opaque, Error **errp)
> -{
> -Property *prop = opaque;
> -uint32_t *ptr = qdev_get_prop_ptr(DEVICE(obj), prop);
> -
> -visit_type_uint32(v, name, ptr, errp);
> -}
> -
> -static void spapr_vio_set_irq(Object *obj, Visitor *v, const char *name,
> -  void *opaque, Error **errp)
> -{
> -Property *prop = opaque;
> -uint32_t *ptr = qdev_get_prop_ptr(DEVICE(obj), prop);
> -
> -if (!qtest_enabled()) {
> -warn_report(TYPE_VIO_SPAPR_DEVICE " '%s' property is deprecated", 
> name);
> -}
> -visit_type_uint32(v, name, ptr, errp);
> -}
> -
> -static const PropertyInfo spapr_vio_irq_propinfo = {
> -.name = "irq",
> -.get = spapr_vio_get_irq,
> -.set = spapr_vio_set_irq,
> -};
> -
> -static Property spapr_vio_props[] = {
> -DEFINE_PROP("irq", VIOsPAPRDevice, irq, spapr_vio_irq_propinfo, 
> uint32_t),
> -DEFINE_PROP_END_OF_LIST(),
> -};
> -
>  static char *spapr_vio_get_dev_name(DeviceState *qdev)
>  {
>  VIOsPAPRDevice *dev = VIO_SPAPR_DEVICE(qdev);
> @@ -534,15 +502,13 @@ static void spapr_vio_busdev_realize(DeviceState *qdev, 
> Error **errp)
>  dev->qdev.id = id;
>  }
>  
> -if (!dev->irq) {
> -dev->irq = spapr_vio_reg_to_irq(dev->reg);
> +dev->irq = spapr_vio_reg_to_irq(dev->reg);
>  
> -if (SPAPR_MACHINE_GET_CLASS(spapr)->legacy_irq_allocation) {
> -dev->irq = spapr_irq_findone(spapr, _err);
> -if (local_err) {
> -error_propagate(errp, local_err);
> -return;
> -}
> +if (SPAPR_MACHINE_GET_CLASS(spapr)->legacy_irq_allocation) {
> +dev->irq = spapr_irq_findone(spapr, _err);
> +if (local_err) {
> +error_propagate(errp, local_err);
> +return;
>  }
>  }
>  
> @@ -668,7 +634,6 @@ static void vio_spapr_device_class_init(ObjectClass 
> *klass, void *data)
>  k->realize = spapr_vio_busdev_realize;
>  k->reset = spapr_vio_busdev_reset;
>  k->bus_type = TYPE_SPAPR_VIO_BUS;
> -k->props = spapr_vio_props;
>  }
>  
>  static const TypeInfo spapr_vio_type_info = {

Re: [Qemu-devel] [PATCH v2 0/3] HMP/snapshot changes - do not use ID anymore

2019-01-09 Thread Kevin Wolf

Am 09.01.2019 um 15:10 hat Max Reitz geschrieben:
> On 06.09.18 13:11, Daniel Henrique Barboza wrote:
> > changes in v2:
> > - removed the "RFC" marker;
> > - added a new patch (patch 2) that removes
> > bdrv_snapshot_delete_by_id_or_name from the code;
> > - made changes in patch 1 as suggested by Murilo;
> > - previous patch set link:
> > https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
> > 
> > 
> > It is not uncommon to see bugs being opened by testers that attempt to
> > create VM snapshots using HMP. It turns out that "0" and "1" are quite
> > common snapshot names and they trigger a lot of bugs. I gave an example
> > in the commit message of patch 1, but to sum up here: QEMU treats the
> > input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
> > is documented as such, but this can lead to strange situations.
> > 
> > Given that it is strange for an API to consider a parameter to be 2 fields
> > at the same time, and inadvently treating them as one or the other, and
> > that removing the ID field is too drastic, my idea here is to keep the
> > ID field for internal control, but do not let the user set it.
> > 
> > I guess there's room for discussion about considering this change an API
> > change or not. It doesn't affect users of HMP and it doesn't affect Libvirt,
> > but simplifying the meaning of the parameters of savevm/loadvm/delvm.
> 
> (Yes, very late reply, I'm sorry...)
> 
> I do think it affects users of HMP, because right now you can delete
> snapshots with their ID, and after this series you cannot.

Can there be snapshots that can't be identified by a snapshot name, but
only by their ID?

> I think we had a short discussion about just disallowing numeric
> snapshot names.  How bad would that be?

It would be incompatible with existing images and result in a more
complex snapshot identifier resolution. Why would it be any better?

Kevin


signature.asc
Description: PGP signature

[Qemu-devel] [PATCH v3 2/5] virtio-pmem: Add virtio pmem driver

2019-01-09 Thread Pankaj Gupta

This patch adds virtio-pmem driver for KVM guest.

Guest reads the persistent memory range information from
Qemu over VIRTIO and registers it on nvdimm_bus. It also
creates a nd_region object with the persistent memory
range information so that existing 'nvdimm/pmem' driver
can reserve this into system memory map. This way
'virtio-pmem' driver uses existing functionality of pmem
driver to register persistent memory compatible for DAX
capable filesystems.

This also provides function to perform guest flush over
VIRTIO from 'pmem' driver when userspace performs flush
on DAX memory range.

Signed-off-by: Pankaj Gupta 
---
 drivers/nvdimm/virtio_pmem.c |  84 ++
 drivers/virtio/Kconfig   |  10 
 drivers/virtio/Makefile  |   1 +
 drivers/virtio/pmem.c| 124 +++
 include/linux/virtio_pmem.h  |  60 +++
 include/uapi/linux/virtio_ids.h  |   1 +
 include/uapi/linux/virtio_pmem.h |  10 
 7 files changed, 290 insertions(+)
 create mode 100644 drivers/nvdimm/virtio_pmem.c
 create mode 100644 drivers/virtio/pmem.c
 create mode 100644 include/linux/virtio_pmem.h
 create mode 100644 include/uapi/linux/virtio_pmem.h

diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
new file mode 100644
index 000..2a1b1ba
--- /dev/null
+++ b/drivers/nvdimm/virtio_pmem.c
@@ -0,0 +1,84 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio_pmem.c: Virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and provides a virtio based flushing
+ * interface.
+ */
+#include 
+#include "nd.h"
+
+ /* The interrupt handler */
+void host_ack(struct virtqueue *vq)
+{
+   unsigned int len;
+   unsigned long flags;
+   struct virtio_pmem_request *req, *req_buf;
+   struct virtio_pmem *vpmem = vq->vdev->priv;
+
+   spin_lock_irqsave(>pmem_lock, flags);
+   while ((req = virtqueue_get_buf(vq, )) != NULL) {
+   req->done = true;
+   wake_up(>host_acked);
+
+   if (!list_empty(>req_list)) {
+   req_buf = list_first_entry(>req_list,
+   struct virtio_pmem_request, list);
+   list_del(>req_list);
+   req_buf->wq_buf_avail = true;
+   wake_up(_buf->wq_buf);
+   }
+   }
+   spin_unlock_irqrestore(>pmem_lock, flags);
+}
+EXPORT_SYMBOL_GPL(host_ack);
+
+ /* The request submission function */
+int virtio_pmem_flush(struct nd_region *nd_region)
+{
+   int err;
+   unsigned long flags;
+   struct scatterlist *sgs[2], sg, ret;
+   struct virtio_device *vdev = nd_region->provider_data;
+   struct virtio_pmem *vpmem = vdev->priv;
+   struct virtio_pmem_request *req;
+
+   might_sleep();
+   req = kmalloc(sizeof(*req), GFP_KERNEL);
+   if (!req)
+   return -ENOMEM;
+
+   req->done = req->wq_buf_avail = false;
+   strcpy(req->name, "FLUSH");
+   init_waitqueue_head(>host_acked);
+   init_waitqueue_head(>wq_buf);
+   sg_init_one(, req->name, strlen(req->name));
+   sgs[0] = 
+   sg_init_one(, >ret, sizeof(req->ret));
+   sgs[1] = 
+
+   spin_lock_irqsave(>pmem_lock, flags);
+   err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
+   if (err) {
+   dev_err(>dev, "failed to send command to virtio pmem 
device\n");
+
+   list_add_tail(>req_list, >list);
+   spin_unlock_irqrestore(>pmem_lock, flags);
+
+   /* When host has read buffer, this completes via host_ack */
+   wait_event(req->wq_buf, req->wq_buf_avail);
+   spin_lock_irqsave(>pmem_lock, flags);
+   }
+   virtqueue_kick(vpmem->req_vq);
+   spin_unlock_irqrestore(>pmem_lock, flags);
+
+   /* When host has read buffer, this completes via host_ack */
+   wait_event(req->host_acked, req->done);
+   err = req->ret;
+   kfree(req);
+
+   return err;
+};
+EXPORT_SYMBOL_GPL(virtio_pmem_flush);
+MODULE_LICENSE("GPL");
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 3589764..9f634a2 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
 
  If unsure, say Y.
 
+config VIRTIO_PMEM
+   tristate "Support for virtio pmem driver"
+   depends on VIRTIO
+   depends on LIBNVDIMM
+   help
+   This driver provides support for virtio based flushing interface
+   for persistent memory range.
+
+   If unsure, say M.
+
 config VIRTIO_BALLOON
tristate "Virtio balloon driver"
depends on VIRTIO
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 3a2b5c5..143ce91 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o

[Qemu-devel] [PATCH v3 1/5] libnvdimm: nd_region flush callback support

2019-01-09 Thread Pankaj Gupta

This patch adds functionality to perform flush from guest
to host over VIRTIO. We are registering a callback based
on 'nd_region' type. virtio_pmem driver requires this special
flush function. For rest of the region types we are registering
existing flush function. Report error returned by host fsync
failure to userspace.

This also handles asynchronous flush requests from the block layer
by creating a child bio and chaining it with parent bio.

Signed-off-by: Pankaj Gupta 
---
 drivers/acpi/nfit/core.c |  4 ++--
 drivers/nvdimm/claim.c   |  6 --
 drivers/nvdimm/nd.h  |  1 +
 drivers/nvdimm/pmem.c| 12 
 drivers/nvdimm/region_devs.c | 38 --
 include/linux/libnvdimm.h|  5 -
 6 files changed, 55 insertions(+), 11 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index b072cfc..f154852 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2234,7 +2234,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, 
unsigned int bw,
offset = to_interleave_offset(offset, mmio);
 
writeq(cmd, mmio->addr.base + offset);
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL, false);
 
if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
readq(mmio->addr.base + offset);
@@ -2283,7 +2283,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk 
*nfit_blk,
}
 
if (rw)
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL, false);
 
rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
return rc;
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index fb667bf..a1dfa06 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -263,7 +263,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
struct nd_namespace_io *nsio = to_nd_namespace_io(>dev);
unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 512);
sector_t sector = offset >> 9;
-   int rc = 0;
+   int rc = 0, ret = 0;
 
if (unlikely(!size))
return 0;
@@ -301,7 +301,9 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
}
 
memcpy_flushcache(nsio->addr + offset, buf, size);
-   nvdimm_flush(to_nd_region(ndns->dev.parent));
+   ret = nvdimm_flush(to_nd_region(ndns->dev.parent), NULL, false);
+   if (ret)
+   rc = ret;
 
return rc;
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 98317e7..d53a2d1 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -160,6 +160,7 @@ struct nd_region {
struct nd_interleave_set *nd_set;
struct nd_percpu_lane __percpu *lane;
struct nd_mapping mapping[0];
+   int (*flush)(struct nd_region *nd_region);
 };
 
 struct nd_blk_region {
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 6071e29..5d6a4a1 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -192,6 +192,7 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, 
struct page *page,
 
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
+   int ret = 0;
blk_status_t rc = 0;
bool do_acct;
unsigned long start;
@@ -201,7 +202,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
struct nd_region *nd_region = to_region(pmem);
 
if (bio->bi_opf & REQ_PREFLUSH)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio, true);
 
do_acct = nd_iostat_start(bio, );
bio_for_each_segment(bvec, bio, iter) {
@@ -216,7 +217,10 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
nd_iostat_end(bio, start);
 
if (bio->bi_opf & REQ_FUA)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio, true);
+
+   if (ret)
+   bio->bi_status = errno_to_blk_status(ret);
 
bio_endio(bio);
return BLK_QC_T_NONE;
@@ -528,14 +532,14 @@ static int nd_pmem_remove(struct device *dev)
sysfs_put(pmem->bb_state);
pmem->bb_state = NULL;
}
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL, false);
 
return 0;
 }
 
 static void nd_pmem_shutdown(struct device *dev)
 {
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL, false);
 }
 
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index fa37afc..5508727 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -290,7 +290,9 @@ static ssize_t deep_flush_store(struct device *dev, struct 
device_attribute *att
return rc;
if

1 2 3 4 >

1 - 100 of 369 matches

Mail list logo