date:20171206

Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication

2017-12-06 Thread Avi Cohen (A)

There is already a virtio mechanism in which two VMs, each assigned a virtio
device, communicate via a veth pair in the host.
KVM just passes a pointer to the writer VM's page to the reader VM,
resulting in excellent performance (no vSwitch in the middle).
**Question**: What is the advantage of vhost-pci compared to this?
Best Regards
Avi

> -----Original Message-----
> From: Stefan Hajnoczi [mailto:stefa...@gmail.com]
> Sent: Thursday, 07 December, 2017 8:31 AM
> To: Wei Wang
> Cc: Stefan Hajnoczi; virtio-...@lists.oasis-open.org; m...@redhat.com; Yang,
> Zhiyong; jan.kis...@siemens.com; jasow...@redhat.com; Avi Cohen (A);
> qemu-devel@nongnu.org; marcandre.lur...@redhat.com;
> pbonz...@redhat.com
> Subject: Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM
> communication
> 
> On Thu, Dec 7, 2017 at 3:57 AM, Wei Wang  wrote:
> > On 12/07/2017 12:27 AM, Stefan Hajnoczi wrote:
> >>
> >> On Wed, Dec 6, 2017 at 4:09 PM, Wang, Wei W wrote:
> >>>
> >>> On Wednesday, December 6, 2017 9:50 PM, Stefan Hajnoczi wrote:
> 
>  On Tue, Dec 05, 2017 at 11:33:09AM +0800, Wei Wang wrote:
> >
> > Vhost-pci is a point-to-point based inter-VM communication solution.
> > This patch series implements the vhost-pci-net device setup and
> > emulation. The device is implemented as a virtio device, and it is
> > set up via the vhost-user protocol to get the necessary info (e.g.
> > the memory info of the remote VM, vring info).
> >
> > Currently, only the fundamental functions are implemented. More
> > features, such as MQ and live migration, will be updated in the future.
> >
> > The DPDK PMD of vhost-pci has been posted to the dpdk mailinglist here:
> > http://dpdk.org/ml/archives/dev/2017-November/082615.html
> 
>  I have asked questions about the scope of this feature.  In
>  particular, I think it's best to support all device types rather
>  than just virtio-net.  Here is a design document that shows how
>  this can be achieved.
> 
>  What I'm proposing is different from the current approach:
>  1. It's a PCI adapter (see below for justification)
>  2. The vhost-user protocol is exposed by the device (not handled 100%
>  in QEMU).  Ultimately I think your approach would also need to do this.
> 
>  I'm not implementing this and not asking you to implement it.
>  Let's just use this for discussion so we can figure out what the
>  final vhost-pci will look like.
> 
>  Please let me know what you think, Wei, Michael, and others.
> 
> >>> Thanks for sharing the thoughts. If I understand it correctly, the
> >>> key difference is that this approach tries to relay every vhost-user
> >>> msg to the guest. I'm not sure about the benefits of doing this.
> >>> To make data plane (i.e. driver to send/receive packets) work, I
> >>> think, mostly, the memory info and vring info are enough. Other
> >>> things like callfd, kickfd don't need to be sent to the guest, they
> >>> are needed by QEMU only for the eventfd and irqfd setup.
> >>
> >> Handling the vhost-user protocol inside QEMU and exposing a different
> >> interface to the guest makes the interface device-specific.  This
> >> will cause extra work to support new devices (vhost-user-scsi,
> >> vhost-user-blk).  It also makes development harder because you might
> >> have to learn 3 separate specifications to debug the system (virtio,
> >> vhost-user, vhost-pci-net).
> >>
> >> If vhost-user is mapped to a PCI device then these issues are solved.
> >
> >
> > I tend to have a different opinion about this:
> >
> > 1) Even when relaying the msgs to the guest, QEMU still needs to handle
> > the msg first. For example, it needs to decode the msg to see whether it
> > is one of those (e.g. SET_MEM_TABLE, SET_VRING_KICK, SET_VRING_CALL) that
> > should be used for the device setup (e.g. mmap the memory given via
> > SET_MEM_TABLE). In this case, we are likely to have 2 slave
> > handlers: one in the guest, another in the QEMU device.
> 
> In theory the vhost-pci PCI adapter could decide not to relay certain
> messages.  As explained in the document, I think it's better to relay
> everything because some messages that only carry an fd still have a
> meaning.  They are a signal that the master has entered a new state.
> 
> The approach in this patch series doesn't really solve the 2 handler
> problem, it still needs to notify the guest when certain vhost-user
> messages are received from the master.  The difference is just that
> it's non-trivial in this patch series because each message is handled
> on a case-by-case basis and has a custom interface (does not simply
> relay a vhost-user protocol message).
> 
> A 1:1 model is simple and consistent.  I think it will avoid bugs and design
> mistakes.
> 
> > 2) If people already understand the vhost-user protocol, it would be
> > natural for them to

[Qemu-devel] .qcow file recovery

2017-12-06 Thread RR via Qemu-devel


Hi,

A .qcow file was deleted by mistake. No recovery or backup is available.
The hard disk was unplugged from the NAS after half an hour to prevent
Synology OS operations from writing over the deallocated storage. The file
system on the virtual disk was NTFS. The virtualisation OS is Proxmox.


EaseUS Data Recovery didn't help much. We need to get the virtual disk
file back and up.


Do you know somebody who knows somebody who can deal with this issue?

Please contact me at rrazmkhah at ltpsn.org

I look forward to receiving some cost estimates.

Best regards,

Remi Razmkhah

--
+33 6 81 96 65 45
Service Informatique
Lycée Technique Privé Saint-Nicolas
Paris 06

Re: [Qemu-devel] [PATCH v4 0/2] check VirtiQueue Vring objects

2017-12-06 Thread P J P

+-- On Thu, 30 Nov 2017, P J P wrote --+
| +-- On Thu, 30 Nov 2017, Stefan Hajnoczi wrote --+
| | Michael is the virtio maintainer.  I have added him to this email
| | thread so the patch series can be merged.

  -> https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg05473.html

@mst: this qtest is not pulled in it seems.

Thank you.
--
Prasad J Pandit / Red Hat Product Security Team
47AF CE69 3A90 54AA 9045 1053 DD13 3D32 FE5B 041F

[Qemu-devel] [PATCH] hw/input/hid: Add support for several keys.

2017-12-06 Thread Tao Wu via Qemu-devel

Add support for these keys: audiomute volumedown volumeup power.
Tested with the "sendkey" command in the monitor and verified the behavior
in the guest OS.

Signed-off-by: Tao Wu 
---
 hw/input/hid.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/input/hid.c b/hw/input/hid.c
index 0d049ff61c..aa4fb826fd 100644
--- a/hw/input/hid.c
+++ b/hw/input/hid.c
@@ -57,14 +57,14 @@ static const uint8_t hid_usage_keys[0x100] = {
 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
 0x00, 0x00, 0x00, 0x00, 0x58, 0xe4, 0x00, 0x00,
-0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-0x00, 0x00, 0x00, 0x00, 0x00, 0x54, 0x00, 0x46,
+0x7f, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x81, 0x00,
+0x80, 0x00, 0x00, 0x00, 0x00, 0x54, 0x00, 0x46,
 0xe6, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
 0x00, 0x00, 0x00, 0x00, 0x00, 0x48, 0x48, 0x4a,
 0x52, 0x4b, 0x00, 0x50, 0x00, 0x4f, 0x00, 0x4d,
 0x51, 0x4e, 0x49, 0x4c, 0x00, 0x00, 0x00, 0x00,
-0x00, 0x00, 0x00, 0xe3, 0xe7, 0x65, 0x00, 0x00,
+0x00, 0x00, 0x00, 0xe3, 0xe7, 0x65, 0x66, 0x00,
 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-- 
2.15.1.424.g9478a66081-goog
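
For readers following the table layout, here is a minimal sketch of where the four new entries land. It assumes the index into hid_usage_keys[] is the PS/2 set-1 make code with bit 7 set for E0-prefixed keys; under that assumption the entries sit at 0xa0, 0xae, 0xb0 and 0xde, which matches the rows touched by the diff. The table and helper names below are hypothetical and are not QEMU code.

```c
#include <stdint.h>

/* Hypothetical, sparse stand-in for hid_usage_keys[]: only the four
 * entries added by the patch are filled in, everything else stays 0. */
static const uint8_t demo_usage_keys[0x100] = {
    [0xa0] = 0x7f, /* e0 20: audiomute  -> HID Keyboard Mute        */
    [0xae] = 0x81, /* e0 2e: volumedown -> HID Keyboard Volume Down */
    [0xb0] = 0x80, /* e0 30: volumeup   -> HID Keyboard Volume Up   */
    [0xde] = 0x66, /* e0 5e: power      -> HID Keyboard Power       */
};

/* Map a set-1 make code to a HID usage. An E0 prefix selects the upper
 * half of the table (assumed indexing scheme, see the lead-in). */
static uint8_t demo_hid_usage(uint8_t scancode, int e0_prefix)
{
    uint8_t index = (uint8_t)((scancode & 0x7f) | (e0_prefix ? 0x80 : 0x00));
    return demo_usage_keys[index];
}
```

A usage of 0 means "no mapping" in this sketch, so the same make code without the E0 prefix yields nothing.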

Re: [Qemu-devel] [PATCH] vhost: fix crash on virtio_error while device stop

2017-12-06 Thread Ilya Maximets

On 06.12.2017 19:45, Michael S. Tsirkin wrote:
> On Wed, Dec 06, 2017 at 04:06:18PM +0300, Ilya Maximets wrote:
>> In case a virtio error occurs after vhost_dev_close(), qemu will crash
>> in nested cleanup while checking the IOMMU flag because dev->vdev is
>> already set to zero and resources are already freed.
>>
>> Example:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> vhost_virtqueue_stop at hw/virtio/vhost.c:1155
>>
>> #0  vhost_virtqueue_stop at hw/virtio/vhost.c:1155
>> #1  vhost_dev_stop at hw/virtio/vhost.c:1594
>> #2  vhost_net_stop_one at hw/net/vhost_net.c:289
>> #3  vhost_net_stop at hw/net/vhost_net.c:368
>>
>> Nested call to vhost_net_stop(). First time was at #14.
>>
>> #4  virtio_net_vhost_status at hw/net/virtio-net.c:180
>> #5  virtio_net_set_status (status=79) at hw/net/virtio-net.c:254
>> #6  virtio_set_status at hw/virtio/virtio.c:1146
>> #7  virtio_error at hw/virtio/virtio.c:2455
>>
>> virtqueue_get_head() failed here.
>>
>> #8  virtqueue_get_head at hw/virtio/virtio.c:543
>> #9  virtqueue_drop_all at hw/virtio/virtio.c:984
>> #10 virtio_net_drop_tx_queue_data at hw/net/virtio-net.c:240
>> #11 virtio_bus_set_host_notifier at hw/virtio/virtio-bus.c:297
>> #12 vhost_dev_disable_notifiers at hw/virtio/vhost.c:1431
>>
>> vhost_dev_stop() was executed here. dev->vdev == NULL now.
>>
>> #13 vhost_net_stop_one at hw/net/vhost_net.c:290
>> #14 vhost_net_stop at hw/net/vhost_net.c:368
>> #15 virtio_net_vhost_status at hw/net/virtio-net.c:180
>> #16 virtio_net_set_status (status=15) at hw/net/virtio-net.c:254
>> #17 qmp_set_link ("hostnet0", up=false) at net/net.c:1430
>> #18 chr_closed_bh at net/vhost-user.c:214
>> #19 aio_bh_call at util/async.c:90
>> #20 aio_bh_poll at util/async.c:118
>> #21 aio_dispatch at util/aio-posix.c:429
>> #22 aio_ctx_dispatch at util/async.c:261
>> #23 g_main_context_dispatch
>> #24 glib_pollfds_poll at util/main-loop.c:213
>> #25 os_host_main_loop_wait at util/main-loop.c:261
>> #26 main_loop_wait at util/main-loop.c:515
>> #27 main_loop () at vl.c:1917
>> #28 main at vl.c:4795
>>
>> Above backtrace captured from qemu crash on vhost disconnect while
>> virtio driver in guest was in failed state.
>>
>> We can just add checking for 'vdev' in 'vhost_dev_has_iommu()' but
>> it will assert further trying to free already freed ioeventfds. The
>> real problem is that we're allowing nested calls to 'vhost_net_stop'.
>>
>> This patch aims to forbid nested calls to 'vhost_net_stop', to avoid
>> any possible double frees and segmentation faults due to use of
>> already freed resources, by setting the 'vhost_started' flag to zero
>> prior to the 'vhost_net_stop' call.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>>
>> This issue was already addressed more than a year ago by the following
>> patch: https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06732.html
>> but it was dropped without review because re-connection of vhost-user
>> was not yet implemented. The re-connection implementation later fixed
>> most of the nested calls, but a few of them still exist. For example, the
>> above backtrace was captured after a 'virtqueue_get_head()' failure on
>> vhost-user disconnection.
>>
>>  hw/net/virtio-net.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index 38674b0..4d95a18 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -177,8 +177,8 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>>              n->vhost_started = 0;
>>          }
>>      } else {
>> -        vhost_net_stop(vdev, n->nic->ncs, queues);
>>          n->vhost_started = 0;
>> +        vhost_net_stop(vdev, n->nic->ncs, queues);
>>      }
>>  }
> 
> Well the wider context is
> 
> 
>         n->vhost_started = 1;
>         r = vhost_net_start(vdev, n->nic->ncs, queues);
>         if (r < 0) {
>             error_report("unable to start vhost net: %d: "
>                          "falling back on userspace virtio", -r);
>             n->vhost_started = 0;
>         }
>     } else {
>         vhost_net_stop(vdev, n->nic->ncs, queues);
>         n->vhost_started = 0;
> 
> So we set it to 1 before start, we should clear after stop.

OK. I agree that clearing after is a bit safer. But in this case we need
a separate flag or some other way to detect that we're already inside
'vhost_net_stop()'.

What do you think about that old patch:
https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06732.html  ?

It implements the same thing but introduces an additional flag. It could
even still be applicable.
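
To make the "separate flag" idea concrete, here is a minimal sketch. It is not the 2016 patch and not QEMU code: DemoNet, vhost_stopping and the demo_* functions are hypothetical stand-ins. The stop helper deliberately re-enters the status handler to mimic the virtio_error() path from the backtrace.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant VirtIONet state. */
typedef struct DemoNet {
    bool vhost_started;   /* set before start, cleared after stop */
    bool vhost_stopping;  /* assumed extra flag: true while stop runs */
    int stop_calls;       /* counts real stop operations (illustration) */
} DemoNet;

static void demo_vhost_status(DemoNet *n, bool want_started);

/* Stand-in for vhost_net_stop(): it calls back into the status handler,
 * as the nested virtio_error() -> set_status path does in the report. */
static void demo_vhost_net_stop(DemoNet *n)
{
    n->stop_calls++;
    demo_vhost_status(n, false); /* nested request, must be ignored */
}

static void demo_vhost_status(DemoNet *n, bool want_started)
{
    if (n->vhost_stopping) {
        return; /* already inside the stop path: ignore nested calls */
    }
    if (want_started == n->vhost_started) {
        return; /* nothing to do */
    }
    if (want_started) {
        n->vhost_started = true;
    } else {
        n->vhost_stopping = true;
        demo_vhost_net_stop(n);
        n->vhost_started = false; /* cleared only after stop completes */
        n->vhost_stopping = false;
    }
}
```

With the guard in place, 'vhost_started' can still be cleared after the stop call, as Michael prefers, while the nested request returns early instead of stopping twice.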

Re: [Qemu-devel] [qemu-s390x] [RFC PATCH v2 0/3] tests for CCW IDA

2017-12-06 Thread Thomas Huth

On 08.11.2017 17:54, Halil Pasic wrote:
> I've kept the title although the scope shifted a bit: it's
> more about introducing ccw-testdev than about IDA. The goal
> is to facilitate testing the virtual channel subsystem
> implementation, and the ccw interpretation.
> 
> The first patch is the interesting one. See its cover letter
> for details. The RFC is about discussing some technical issues
> with this patch.
> 
> The other two patches are an out-of-source kernel module which
> is basically only there so you can try out the first patch. The
> tests there should probably be ported to something else. I don't
> know what: maybe kvm-unit-tests, maybe qtest+libqos, or maybe some
> bios based test image. We still have to figure that out.

I think both kvm-unit-tests and qtest+libqos would be good candidates.
Please don't invent a new bios based test image, since kvm-unit-tests
should be very similar already and we really don't need to duplicate
work here.

Anyway, you'd need to add some CSS infrastructure there first (in both
the kvm-unit-tests and the qtest environments), so it's likely a similar
amount of work. qtest has the advantage that it gets checked
automatically during "make check" each time, so I'd have a weak
preference for that one.

 Thomas

Re: [Qemu-devel] [RFC PATCH v2 1/3] s390x/ccs: add ccw-testdev emulated device

2017-12-06 Thread Thomas Huth

 Hi Halil,

just a high-level review since I'm not a CSS expert...

On 08.11.2017 17:54, Halil Pasic wrote:
[...]
> I'm not really happy with the side effects of moving it to hw/misc, which
> ain't s390x specific.

Sorry, I'm missing the context - why can't this go into hw/s390x/ ?

> I've pretty much bounced off the build system, so
> I would really appreciate some help here. Currently you have to say you
> want it when you do make or it won't get compiled into your qemu. IMHO
> this device only makes sense for testing and should not be routinely
> shipped in production builds. That is why I did not touch
> default-configs/s390x-softmmu.mak.

You could at least add a CONFIG_CCW_TESTDEV=n there, but I think the
normal QEMU policy is to enable everything by default to keep code from
bit-rotting, so I think "=y" is also OK there (distros can then still
disable it downstream if they want).

> I think, I have the most problematic places marked with a  TODO
> comment in the code.
> 
> Happy reviewing and looking forward to your comments.
> ---
>  hw/misc/Makefile.objs |   1 +
>  hw/misc/ccw-testdev.c | 284 
> ++
>  hw/misc/ccw-testdev.h |  18 
>  3 files changed, 303 insertions(+)
>  create mode 100644 hw/misc/ccw-testdev.c
>  create mode 100644 hw/misc/ccw-testdev.h
> 
> diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
> index 19202d90cf..b41314d096 100644
> --- a/hw/misc/Makefile.objs
> +++ b/hw/misc/Makefile.objs
> @@ -61,3 +61,4 @@ obj-$(CONFIG_AUX) += auxbus.o
>  obj-$(CONFIG_ASPEED_SOC) += aspeed_scu.o aspeed_sdmc.o
>  obj-y += mmio_interface.o
>  obj-$(CONFIG_MSF2) += msf2-sysreg.o
> +obj-$(CONFIG_CCW_TESTDEV) += ccw-testdev.o
> diff --git a/hw/misc/ccw-testdev.c b/hw/misc/ccw-testdev.c
> new file mode 100644
> index 00..39cf46e90d
> --- /dev/null
> +++ b/hw/misc/ccw-testdev.c
> @@ -0,0 +1,284 @@

Please add a short description of the device in a comment here.

And don't you also want to add some license statement and/or author
information here?

> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qemu/module.h"
> +#include "ccw-testdev.h"
> +#include "hw/s390x/css.h"
> +#include "hw/s390x/css-bridge.h"
> +#include "hw/s390x/3270-ccw.h"
> +#include "exec/address-spaces.h"
> +#include "hw/s390x/s390-virtio-hcall.h"
> +#include 
> +
> +typedef struct CcwTestDevDevice {
> +CcwDevice parent_obj;
> +uint16_t cu_type;
> +uint8_t chpid_type;
> +uint32_t op_mode;
> +bool op_mode_locked;
> +struct {
> +uint32_t ring[4];
> +unsigned int next;
> +} fib;
> +} CcwTestDevDevice;
> +
> +typedef struct CcwTestDevClass {
> +CCWDeviceClass parent_class;
> +DeviceRealize parent_realize;
> +} CcwTestDevClass;
> +
> +#define TYPE_CCW_TESTDEV "ccw-testdev"
> +
> +#define CCW_TESTDEV(obj) \
> + OBJECT_CHECK(CcwTestDevDevice, (obj), TYPE_CCW_TESTDEV)
> +#define CCW_TESTDEV_CLASS(klass) \
> + OBJECT_CLASS_CHECK(CcwTestDevClass, (klass), TYPE_CCW_TESTDEV)
> +#define CCW_TESTDEV_GET_CLASS(obj) \
> + OBJECT_GET_CLASS(CcwTestDevClass, (obj), TYPE_CCW_TESTDEV)
> +
> +typedef int (*ccw_cb_t)(SubchDev *, CCW1);
> +static ccw_cb_t get_ccw_cb(CcwTestDevOpMode op_mode);
> +
> +/* TODO This is the in-band set mode. We may want to get rid of it. */
> +static int set_mode_ccw(SubchDev *sch)
> +{
> +CcwTestDevDevice *d = sch->driver_data;
> +const char pattern[] = CCW_TST_SET_MODE_INCANTATION;
> +char buf[sizeof(pattern)];
> +int ret;
> +uint32_t tmp;
> +
> +if (d->op_mode_locked) {
> +return -EINVAL;
> +}
> +
> +ret = ccw_dstream_read(&sch->cds, buf);
> +if (ret) {
> +return ret;
> +}
> +ret = strncmp(buf, pattern, sizeof(pattern));
> +if (ret) {
> +return 0; /* ignore malformed request -- maybe fuzzing */
> +}
> +ret = ccw_dstream_read(&sch->cds, tmp);
> +if (ret) {
> +return ret;
> +}
> +be32_to_cpus(&tmp);
> +if (tmp >= OP_MODE_MAX) {
> +return -EINVAL;
> +}
> +d->op_mode = tmp;
> +sch->ccw_cb = get_ccw_cb(d->op_mode);
> +return ret;
> +}
> +
> +

Please remove one empty line above.

> +static unsigned int abs_to_ring(unsigned int i)

IMHO a short comment above that function would be nice. If I just read
"abs_to_ring(unsigned int i)" I have a hard time figuring out what this
means.

> +{
> +return i & 0x3;
> +}
> +
> +static int  ccw_testdev_write_fib(SubchDev *sch)
> +{
> +CcwTestDevDevice *d = sch->driver_data;
> +bool is_fib = true;
> +uint32_t tmp;
> +int ret = 0;
> +
> +d->fib.next = 0;
> +while (ccw_dstream_avail(&sch->cds) > 0) {
> +ret = ccw_dstream_read(&sch->cds, tmp);
> +if (ret) {
> +error(0, -ret, "fib");

Where does this error() function come from? I failed to spot its location...

> +break;
> +}
> +d->fib.ring[abs_to_ring(d->fib.next)] = cpu_to_be32(tmp);
> +if (d->fib.next >

Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication

2017-12-06 Thread Stefan Hajnoczi

On Thu, Dec 7, 2017 at 3:57 AM, Wei Wang  wrote:
> On 12/07/2017 12:27 AM, Stefan Hajnoczi wrote:
>>
>> On Wed, Dec 6, 2017 at 4:09 PM, Wang, Wei W  wrote:
>>>
>>> On Wednesday, December 6, 2017 9:50 PM, Stefan Hajnoczi wrote:

 On Tue, Dec 05, 2017 at 11:33:09AM +0800, Wei Wang wrote:
>
> Vhost-pci is a point-to-point based inter-VM communication solution.
> This patch series implements the vhost-pci-net device setup and
> emulation. The device is implemented as a virtio device, and it is set
> up via the vhost-user protocol to get the necessary info (e.g. the
> memory info of the remote VM, vring info).
>
> Currently, only the fundamental functions are implemented. More
> features, such as MQ and live migration, will be updated in the future.
>
> The DPDK PMD of vhost-pci has been posted to the dpdk mailinglist here:
> http://dpdk.org/ml/archives/dev/2017-November/082615.html

 I have asked questions about the scope of this feature.  In particular,
 I think it's best to support all device types rather than just virtio-net.
 Here is a design document that shows how this can be achieved.

 What I'm proposing is different from the current approach:
 1. It's a PCI adapter (see below for justification)
 2. The vhost-user protocol is exposed by the device (not handled 100% in
 QEMU).  Ultimately I think your approach would also need to do this.

 I'm not implementing this and not asking you to implement it.  Let's
 just use this for discussion so we can figure out what the final
 vhost-pci will look like.

 Please let me know what you think, Wei, Michael, and others.

>>> Thanks for sharing the thoughts. If I understand it correctly, the key
>>> difference is that this approach tries to relay every vhost-user msg to the
>>> guest. I'm not sure about the benefits of doing this.
>>> To make data plane (i.e. driver to send/receive packets) work, I think,
>>> mostly, the memory info and vring info are enough. Other things like callfd,
>>> kickfd don't need to be sent to the guest, they are needed by QEMU only for
>>> the eventfd and irqfd setup.
>>
>> Handling the vhost-user protocol inside QEMU and exposing a different
>> interface to the guest makes the interface device-specific.  This will
>> cause extra work to support new devices (vhost-user-scsi,
>> vhost-user-blk).  It also makes development harder because you might
>> have to learn 3 separate specifications to debug the system (virtio,
>> vhost-user, vhost-pci-net).
>>
>> If vhost-user is mapped to a PCI device then these issues are solved.
>
>
> I tend to have a different opinion about this:
>
> 1) Even when relaying the msgs to the guest, QEMU still needs to handle the
> msg first. For example, it needs to decode the msg to see whether it is one
> of those (e.g. SET_MEM_TABLE, SET_VRING_KICK, SET_VRING_CALL) that should be
> used for the device setup (e.g. mmap the memory given via SET_MEM_TABLE). In
> this case, we are likely to have 2 slave handlers: one in the guest, another
> in the QEMU device.

In theory the vhost-pci PCI adapter could decide not to relay certain
messages.  As explained in the document, I think it's better to relay
everything because some messages that only carry an fd still have a
meaning.  They are a signal that the master has entered a new state.

The approach in this patch series doesn't really solve the 2 handler
problem, it still needs to notify the guest when certain vhost-user
messages are received from the master.  The difference is just that
it's non-trivial in this patch series because each message is handled
on a case-by-case basis and has a custom interface (does not simply
relay a vhost-user protocol message).

A 1:1 model is simple and consistent.  I think it will avoid bugs and
design mistakes.

> 2) If people already understand the vhost-user protocol, it would be natural
> for them to understand the vhost-pci metadata - just the obtained memory and
> vring info are put to the metadata area (no new things).

This is debatable.  It's like saying if you understand QEMU
command-line options you will understand libvirt domain XML.  They map
to each other but how obvious that mapping is depends on the details.
I'm saying a 1:1 mapping (reusing the vhost-user protocol message
layout) is the cleanest option.
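
As a rough illustration of the 1:1 model argued for above, the sketch below relays every message to the guest and additionally marks the few that the device model has to act on itself. The request numbers and the three-field header follow the vhost-user protocol; the DemoAction type, the demo_* names and the split into "handle" and "relay" are assumptions made for this example only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Request numbers as defined by the vhost-user protocol. */
enum {
    DEMO_VHOST_USER_SET_MEM_TABLE  = 5,
    DEMO_VHOST_USER_SET_VRING_KICK = 12,
    DEMO_VHOST_USER_SET_VRING_CALL = 13,
};

/* vhost-user message header: request, flags, payload size. */
typedef struct DemoVhostUserHdr {
    uint32_t request;
    uint32_t flags;
    uint32_t size;
} DemoVhostUserHdr;

/* Hypothetical per-message decision. */
typedef struct DemoAction {
    bool handle_in_device; /* device model acts on it (mmap, eventfd setup) */
    bool relay_to_guest;   /* guest driver sees the unmodified message */
} DemoAction;

/* 1:1 model: every message is relayed; some are also handled locally. */
static DemoAction demo_relay_all(const DemoVhostUserHdr *hdr)
{
    DemoAction action = { false, true };

    switch (hdr->request) {
    case DEMO_VHOST_USER_SET_MEM_TABLE:
    case DEMO_VHOST_USER_SET_VRING_KICK:
    case DEMO_VHOST_USER_SET_VRING_CALL:
        action.handle_in_device = true;
        break;
    default:
        break;
    }
    return action;
}
```

The point of the sketch is that the relay decision does not depend on the message type, so a new device type needs no new guest-visible interface.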

> Inspired by your sharing, how about the following:
> we can actually factor out a common vhost-pci layer, which handles all the
> features that are common to all the vhost-pci series of devices
> (vhost-pci-net, vhost-pci-blk,...)
> Coming to the implementation, we can have a VhostpciDeviceClass (similar to
> VirtioDeviceClass), the device realize sequence will be
> virtio_device_realize()-->vhost_pci_device_realize()-->vhost_pci_net_device_realize()

Why have individual device types (vhost-pci-net, vhost-pci-blk, etc)
instead of just a

Re: [Qemu-devel] [PATCH v2] hw/ide: Remove duplicated definitions from ahci_internal.h

2017-12-06 Thread Thomas Huth

On 06.12.2017 23:16, John Snow wrote:
> I tweaked this again, sorry:
> 
> The names need to stay public, but the wrappers to manipulate the
> objects can stay internal. Minor difference.
> 
> If that's okay, I'll just merge this in.
> OK?

Sure. Feel also free to replace my "Signed-off-by" with "Reported-by" in
that case if you like.

 Thomas

Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication

2017-12-06 Thread Wei Wang


On 12/07/2017 01:11 PM, Michael S. Tsirkin wrote:

On Thu, Dec 07, 2017 at 11:57:33AM +0800, Wei Wang wrote:

On 12/07/2017 12:27 AM, Stefan Hajnoczi wrote:

On Wed, Dec 6, 2017 at 4:09 PM, Wang, Wei W  wrote:

On Wednesday, December 6, 2017 9:50 PM, Stefan Hajnoczi wrote:

On Tue, Dec 05, 2017 at 11:33:09AM +0800, Wei Wang wrote:

Vhost-pci is a point-to-point based inter-VM communication solution.
This patch series implements the vhost-pci-net device setup and
emulation. The device is implemented as a virtio device, and it is set
up via the vhost-user protocol to get the necessary info (e.g. the
memory info of the remote VM, vring info).

Currently, only the fundamental functions are implemented. More
features, such as MQ and live migration, will be updated in the future.

The DPDK PMD of vhost-pci has been posted to the dpdk mailinglist here:
http://dpdk.org/ml/archives/dev/2017-November/082615.html

I have asked questions about the scope of this feature.  In particular, I think
it's best to support all device types rather than just virtio-net.  Here is a
design document that shows how this can be achieved.

What I'm proposing is different from the current approach:
1. It's a PCI adapter (see below for justification)
2. The vhost-user protocol is exposed by the device (not handled 100% in
QEMU).  Ultimately I think your approach would also need to do this.

I'm not implementing this and not asking you to implement it.  Let's just use
this for discussion so we can figure out what the final vhost-pci will look
like.

Please let me know what you think, Wei, Michael, and others.


Thanks for sharing the thoughts. If I understand it correctly, the key 
difference is that this approach tries to relay every vhost-user msg to the 
guest. I'm not sure about the benefits of doing this.
To make data plane (i.e. driver to send/receive packets) work, I think, mostly, 
the memory info and vring info are enough. Other things like callfd, kickfd 
don't need to be sent to the guest, they are needed by QEMU only for the 
eventfd and irqfd setup.

Handling the vhost-user protocol inside QEMU and exposing a different
interface to the guest makes the interface device-specific.  This will
cause extra work to support new devices (vhost-user-scsi,
vhost-user-blk).  It also makes development harder because you might
have to learn 3 separate specifications to debug the system (virtio,
vhost-user, vhost-pci-net).

If vhost-user is mapped to a PCI device then these issues are solved.

I tend to have a different opinion about this:

1) Even when relaying the msgs to the guest, QEMU still needs to handle the
msg first. For example, it needs to decode the msg to see whether it is one
of those (e.g. SET_MEM_TABLE, SET_VRING_KICK, SET_VRING_CALL) that should be
used for the device setup (e.g. mmap the memory given via SET_MEM_TABLE). In
this case, we are likely to have 2 slave handlers: one in the guest, another
in the QEMU device.

2) If people already understand the vhost-user protocol, it would be natural
for them to understand the vhost-pci metadata - just the obtained memory and
vring info are put to the metadata area (no new things).

I see a bigger problem with passthrough. If qemu can't fully decode all
messages, it cannot operate in a disconnected mode - guest will have to
stop on disconnect until we re-connect a backend.



What do you mean by "passthrough" in this case? Why can't QEMU fully
decode all the messages? (Probably I haven't got the point.)


Best,
Wei

Re: [Qemu-devel] [PATCHv2 5/5] sunhme: switch sunhme_receive() over to use net_crc32_le()

2017-12-06 Thread Mark Cave-Ayland


On 06/12/17 03:34, Philippe Mathieu-Daudé wrote:


Hi Mark,

On 12/05/2017 05:17 AM, Mark Cave-Ayland wrote:

Signed-off-by: Mark Cave-Ayland 
---
  hw/net/sunhme.c | 25 +
  1 file changed, 1 insertion(+), 24 deletions(-)

diff --git a/hw/net/sunhme.c b/hw/net/sunhme.c
index b1efa1b88d..df66e2630c 100644
--- a/hw/net/sunhme.c
+++ b/hw/net/sunhme.c
@@ -698,29 +698,6 @@ static inline void sunhme_set_rx_ring_nr(SunHMEState *s, int i)
  s->erxregs[HME_ERXI_RING >> 2] = ring;
  }
  
-#define POLYNOMIAL_LE 0xedb88320

-static uint32_t sunhme_crc32_le(const uint8_t *p, int len)
-{
-uint32_t crc;
-int carry, i, j;
-uint8_t b;
-
-crc = 0xffffffff;
-for (i = 0; i < len; i++) {
-b = *p++;
-for (j = 0; j < 8; j++) {
-carry = (crc & 0x1) ^ (b & 0x01);
-crc >>= 1;
-b >>= 1;
-if (carry) {
-crc = crc ^ POLYNOMIAL_LE;
-}
-}
-}
-
-return crc;
-}
-
  #define MIN_BUF_SIZE 60
  
  static ssize_t sunhme_receive(NetClientState *nc, const uint8_t *buf,

@@ -761,7 +738,7 @@ static ssize_t sunhme_receive(NetClientState *nc, const uint8_t *buf,
  trace_sunhme_rx_filter_bcast_match();
  } else if (s->macregs[HME_MACI_RXCFG >> 2] & HME_MAC_RXCFG_HENABLE) {
  /* Didn't match local address, check hash filter */
-int mcast_idx = sunhme_crc32_le(buf, 6) >> 26;


This could be:

int mcast_idx = compute_mcast_idx_le(buf);

With:

unsigned compute_mcast_idx_le(const uint8_t *ep)
{
 return net_crc32(ep, ETH_ALEN) >> 26;
}


It looks like Stefan is suggesting in his comments that the inline 
approach is better, so I'll try to keep that style for the moment.


I do like the use of ETH_ALEN though so I'll add that in to the next 
version.



Anyway:
Reviewed-by: Philippe Mathieu-Daudé 


+int mcast_idx = net_crc32_le(buf, 6) >> 26;
  if (!(s->macregs[(HME_MACI_HASHTAB0 >> 2) - (mcast_idx >> 4)] &
  (1 << (mcast_idx & 0xf)))) {
  /* Didn't match hash filter */



ATB,

Mark.
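
For reference, here is a self-contained sketch of the hash-filter index computation discussed in this message. It re-implements the bitwise little-endian CRC-32 from the removed sunhme_crc32_le() (initial value all ones, no final XOR) and uses a named constant in place of the literal 6. The demo_* names and DEMO_ETH_ALEN are local to this example and only stand in for net_crc32_le() and ETH_ALEN.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_POLYNOMIAL_LE 0xedb88320u
#define DEMO_ETH_ALEN      6

/* Bitwise reflected CRC-32: initial value all ones, no final XOR. */
static uint32_t demo_crc32_le(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xffffffffu;

    for (size_t i = 0; i < len; i++) {
        uint8_t b = p[i];
        for (int j = 0; j < 8; j++) {
            uint32_t carry = (crc & 1u) ^ (b & 1u);
            crc >>= 1;
            b >>= 1;
            if (carry) {
                crc ^= DEMO_POLYNOMIAL_LE;
            }
        }
    }
    return crc;
}

/* The top 6 bits of the CRC select one of 64 hash filter bits. */
static unsigned demo_mcast_idx_le(const uint8_t *mac)
{
    return demo_crc32_le(mac, DEMO_ETH_ALEN) >> 26;
}
```

Because the final XOR is omitted, the result for the usual "123456789" check string is the bitwise complement of the standard CRC-32 check value.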

Re: [Qemu-devel] [PATCHv2 4/5] eepro100: switch e100_compute_mcast_idx() over to use net_crc32()

2017-12-06 Thread Mark Cave-Ayland


On 05/12/17 15:13, Stefan Weil wrote:


Am 05.12.2017 um 09:17 schrieb Mark Cave-Ayland:

Signed-off-by: Mark Cave-Ayland 
---
  hw/net/eepro100.c | 19 +--
  1 file changed, 1 insertion(+), 18 deletions(-)

diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c
index 1c0def555b..4fe94b7471 100644
--- a/hw/net/eepro100.c
+++ b/hw/net/eepro100.c
@@ -327,26 +327,9 @@ static const uint16_t eepro100_mdi_mask[] = {
  
  static E100PCIDeviceInfo *eepro100_get_class(EEPRO100State *s);
  
-/* From FreeBSD (locally modified). */

  static unsigned e100_compute_mcast_idx(const uint8_t *ep)
  {
-uint32_t crc;
-int carry, i, j;
-uint8_t b;
-
-crc = 0xffffffff;
-for (i = 0; i < 6; i++) {
-b = *ep++;
-for (j = 0; j < 8; j++) {
-carry = ((crc & 0x80000000L) ? 1 : 0) ^ (b & 0x01);
-crc <<= 1;
-b >>= 1;
-if (carry) {
-crc = ((crc ^ POLYNOMIAL) | carry);
-}
-}
-}
-return (crc & BITS(7, 2)) >> 2;
+return (net_crc32(ep, 6) & BITS(7, 2)) >> 2;
  }
  
  /* Read a 16 bit control/status (CSR) register. */



What about eliminating the intermediate function e100_compute_mcast_idx (and
function lnc_mchash, too)?
You did that for lnc_mchash, and I think that is cleaner and saves some lines
of code.


Yes, I can do that if you like. The reason I've left these as they are for
the moment is that I don't have something readily available to test 
multicast for eepro100 post-conversion (my SPARC/PPC images cover 
pcnet/sunhme) but if you are happy the functionality is the same during 
review then I can go ahead and do it.


I don't really mind exactly how we do the conversion as long as we aim 
for consistency.



With or without that minor change:

Reviewed-by: Stefan Weil 

Regards
Stefan


ATB,

Mark.
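
The big-endian variant used by eepro100 can be sketched the same way. This is an illustration under two assumptions that the quoted hunk does not show: the polynomial is taken to be the standard CRC-32 value 0x04c11db7, and BITS(7, 2) is taken to be a mask for bits 7..2 (0xfc). The demo_* names are hypothetical and only stand in for net_crc32() and e100_compute_mcast_idx().

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_POLYNOMIAL_BE 0x04c11db7u /* assumed standard polynomial */
#define DEMO_BITS_7_2      0xfcu       /* assumed meaning of BITS(7, 2) */

/* Bitwise CRC-32 with an MSB-first register and LSB-first data bits:
 * initial value all ones, no final XOR. */
static uint32_t demo_crc32_be(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xffffffffu;

    for (size_t i = 0; i < len; i++) {
        uint8_t b = p[i];
        for (int j = 0; j < 8; j++) {
            uint32_t carry = (crc >> 31) ^ (b & 1u);
            crc <<= 1;
            b >>= 1;
            if (carry) {
                crc ^= DEMO_POLYNOMIAL_BE;
            }
        }
    }
    return crc;
}

/* Bits 7..2 of the CRC give a 6-bit multicast hash index. */
static unsigned demo_e100_mcast_idx(const uint8_t *mac)
{
    return (demo_crc32_be(mac, 6) & DEMO_BITS_7_2) >> 2;
}
```

Since both variants consume data bits LSB-first and the two polynomials are bit reversals of each other, this register is the bit-reversed image of the little-endian one at every step.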

Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication

2017-12-06 Thread Michael S. Tsirkin

On Thu, Dec 07, 2017 at 11:57:33AM +0800, Wei Wang wrote:
> On 12/07/2017 12:27 AM, Stefan Hajnoczi wrote:
> > On Wed, Dec 6, 2017 at 4:09 PM, Wang, Wei W  wrote:
> > > On Wednesday, December 6, 2017 9:50 PM, Stefan Hajnoczi wrote:
> > > > On Tue, Dec 05, 2017 at 11:33:09AM +0800, Wei Wang wrote:
> > > > > Vhost-pci is a point-to-point based inter-VM communication solution.
> > > > > This patch series implements the vhost-pci-net device setup and
> > > > > emulation. The device is implemented as a virtio device, and it is set
> > > > > up via the vhost-user protocol to get the necessary info (e.g. the
> > > > > memory info of the remote VM, vring info).
> > > > > 
> > > > > Currently, only the fundamental functions are implemented. More
> > > > > features, such as MQ and live migration, will be updated in the future.
> > > > > 
> > > > > The DPDK PMD of vhost-pci has been posted to the dpdk mailinglist here:
> > > > > http://dpdk.org/ml/archives/dev/2017-November/082615.html
> > > > I have asked questions about the scope of this feature.  In particular,
> > > > I think it's best to support all device types rather than just
> > > > virtio-net.  Here is a design document that shows how this can be
> > > > achieved.
> > > > 
> > > > What I'm proposing is different from the current approach:
> > > > 1. It's a PCI adapter (see below for justification)
> > > > 2. The vhost-user protocol is exposed by the device (not handled 100% in
> > > >    QEMU).  Ultimately I think your approach would also need to do this.
> > > > 
> > > > I'm not implementing this and not asking you to implement it.  Let's
> > > > just use this for discussion so we can figure out what the final
> > > > vhost-pci will look like.
> > > > 
> > > > Please let me know what you think, Wei, Michael, and others.
> > > > 
> > > Thanks for sharing the thoughts. If I understand it correctly, the key 
> > > difference is that this approach tries to relay every vhost-user msg to 
> > > the guest. I'm not sure about the benefits of doing this.
> > > To make data plane (i.e. driver to send/receive packets) work, I think, 
> > > mostly, the memory info and vring info are enough. Other things like 
> > > callfd, kickfd don't need to be sent to the guest, they are needed by 
> > > QEMU only for the eventfd and irqfd setup.
> > Handling the vhost-user protocol inside QEMU and exposing a different
> > interface to the guest makes the interface device-specific.  This will
> > cause extra work to support new devices (vhost-user-scsi,
> > vhost-user-blk).  It also makes development harder because you might
> > have to learn 3 separate specifications to debug the system (virtio,
> > vhost-user, vhost-pci-net).
> > 
> > If vhost-user is mapped to a PCI device then these issues are solved.
> 
> I tend to have a different opinion about this:
> 
> 1) Even when relaying the msgs to the guest, QEMU still needs to handle the
> msg first. For example, it needs to decode the msg to see if it is one of
> those (e.g. SET_MEM_TABLE, SET_VRING_KICK, SET_VRING_CALL) that should be
> used for the device setup (e.g. mmap the memory given via SET_MEM_TABLE). In
> this case, we will likely have 2 slave handlers - one in the guest, another
> in the QEMU device.
> 
> 2) If people already understand the vhost-user protocol, it would be natural
> for them to understand the vhost-pci metadata - just the obtained memory and
> vring info are put to the metadata area (no new things).

I see a bigger problem with passthrough. If QEMU can't fully decode all
messages, it cannot operate in a disconnected mode - the guest will have to
stop on disconnect until we re-connect a backend.

> 
> Inspired from your sharing, how about the following:
> we can actually factor out a common vhost-pci layer, which handles all the
> features that are common to all the vhost-pci series of devices
> (vhost-pci-net, vhost-pci-blk,...)
> Coming to the implementation, we can have a VhostpciDeviceClass (similar to
> VirtioDeviceClass), the device realize sequence will be 
> virtio_device_realize()-->vhost_pci_device_realize()-->vhost_pci_net_device_realize()
> 
> 
> 
> > 
> > > > vhost-pci is a PCI adapter instead of a virtio device to allow
> > > > doorbells and interrupts to be connected to the virtio device in the
> > > > master VM in the most efficient way possible.  This means the Vring
> > > > call doorbell can be an ioeventfd that signals an irqfd inside the host
> > > > kernel without host userspace involvement.  The Vring kick interrupt
> > > > can be an irqfd that is signalled by the master VM's virtqueue
> > > > ioeventfd.
> > > > 
> > > 
> > > This looks the same as the implementation of inter-VM notification in v2:
> > > https://www.mail-archive.com/qemu-devel@nongnu.org/msg450005.html
> > > which is fig. 4 here: 
> > > https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost-pci-rfc2.0.pdf
> > > 
>

Re: [Qemu-devel] [PATCHv2 2/5] net: introduce net_crc32_le() function

2017-12-06 Thread Mark Cave-Ayland


On 05/12/17 14:31, Eric Blake wrote:


On 12/05/2017 02:17 AM, Mark Cave-Ayland wrote:

This provides a standard ethernet CRC32 little-endian implementation.

Signed-off-by: Mark Cave-Ayland 
---
  include/net/net.h |  2 ++
  net/net.c | 22 ++
  2 files changed, 24 insertions(+)


Reviewed-by: Eric Blake 


+if (carry) {
+crc = crc ^ POLYNOMIAL_LE;
+}


Worth writing as 'crc ^= POLYNOMIAL_LE;'?


Yes, works for me.
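
For reference, a minimal sketch of a bitwise little-endian CRC-32 loop using the compound assignment suggested above. The function name crc32_le_sketch and the constant POLY_LE_SKETCH (0xedb88320, the reflected Ethernet polynomial) are assumptions for illustration; the body of the posted net_crc32_le() is only partly visible in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed constant: reflected form of the Ethernet CRC-32 polynomial. */
#define POLY_LE_SKETCH 0xedb88320u

/* Bitwise little-endian CRC-32: LSB-first shift register, seed of all
 * ones, no final inversion (the caller inverts if it needs the
 * conventional CRC-32 value). */
static uint32_t crc32_le_sketch(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xffffffffu;

    for (size_t i = 0; i < len; i++) {
        uint8_t b = p[i];
        for (int j = 0; j < 8; j++) {
            uint32_t carry = (crc & 0x1u) ^ (b & 0x01u);
            crc >>= 1;
            b >>= 1;
            if (carry) {
                crc ^= POLY_LE_SKETCH;
            }
        }
    }
    return crc;
}
```

Inverting the result for the ASCII string "123456789" should give the well-known CRC-32 check value 0xcbf43926, which is a quick way to confirm that a refactored loop still computes the same function.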


ATB,

Mark.

Re: [Qemu-devel] [PATCHv2 4/5] eepro100: switch e100_compute_mcast_idx() over to use net_crc32()

2017-12-06 Thread Mark Cave-Ayland


On 05/12/17 14:28, Eric Blake wrote:


On 12/05/2017 02:17 AM, Mark Cave-Ayland wrote:

Signed-off-by: Mark Cave-Ayland 
---
  hw/net/eepro100.c | 19 +--
  1 file changed, 1 insertion(+), 18 deletions(-)




-if (carry) {
-crc = ((crc ^ POLYNOMIAL) | carry);


How does this compile after 1/5 renames POLYNOMIAL to POLYNOMIAL_BE in
net.h?

/me looks

Oh, you have a redundant definition in the .c file, which is now a dead
define.  Patch 1 should be updated to remove the duplicate definitions,
and fix code to uniformly use POLYNOMIAL_BE.


Ah yes, I can fix that up on a v3.


But overall, I like what the series is doing.


Great stuff, in that case I'll fix it up based upon all the comments and 
continue. It has been lying around in a local branch for months now...


BTW one thing I did notice is that sungem.c calls zlib's crc32 function 
directly which doesn't seem right, so I'll probably add that into the 
next version too. Once this has been done, switching the new 
net_crc32()/net_crc32_le() functions over to use a LUT or zlib or 
something else as the underlying implementation should be trivial.



ATB,

Mark.

Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication

2017-12-06 Thread Wei Wang


On 12/07/2017 12:27 AM, Stefan Hajnoczi wrote:

On Wed, Dec 6, 2017 at 4:09 PM, Wang, Wei W  wrote:

On Wednesday, December 6, 2017 9:50 PM, Stefan Hajnoczi wrote:

On Tue, Dec 05, 2017 at 11:33:09AM +0800, Wei Wang wrote:

Vhost-pci is a point-to-point based inter-VM communication solution.
This patch series implements the vhost-pci-net device setup and
emulation. The device is implemented as a virtio device, and it is set
up via the vhost-user protocol to get the necessary info (e.g. the
memory info of the remote VM, vring info).

Currently, only the fundamental functions are implemented. More
features, such as MQ and live migration, will be updated in the future.

The DPDK PMD of vhost-pci has been posted to the dpdk mailinglist here:
http://dpdk.org/ml/archives/dev/2017-November/082615.html

I have asked questions about the scope of this feature.  In particular, I think
it's best to support all device types rather than just virtio-net.  Here is a
design document that shows how this can be achieved.

What I'm proposing is different from the current approach:
1. It's a PCI adapter (see below for justification)
2. The vhost-user protocol is exposed by the device (not handled 100% in
   QEMU).  Ultimately I think your approach would also need to do this.

I'm not implementing this and not asking you to implement it.  Let's just use
this for discussion so we can figure out what the final vhost-pci will look
like.

Please let me know what you think, Wei, Michael, and others.


Thanks for sharing the thoughts. If I understand it correctly, the key 
difference is that this approach tries to relay every vhost-user msg to the 
guest. I'm not sure about the benefits of doing this.
To make data plane (i.e. driver to send/receive packets) work, I think, mostly, 
the memory info and vring info are enough. Other things like callfd, kickfd 
don't need to be sent to the guest, they are needed by QEMU only for the 
eventfd and irqfd setup.

Handling the vhost-user protocol inside QEMU and exposing a different
interface to the guest makes the interface device-specific.  This will
cause extra work to support new devices (vhost-user-scsi,
vhost-user-blk).  It also makes development harder because you might
have to learn 3 separate specifications to debug the system (virtio,
vhost-user, vhost-pci-net).

If vhost-user is mapped to a PCI device then these issues are solved.


I tend to have a different opinion about this:

1) Even when relaying the msgs to the guest, QEMU still needs to handle the
msg first. For example, it needs to decode the msg to see if it is one of
those (e.g. SET_MEM_TABLE, SET_VRING_KICK, SET_VRING_CALL) that should be
used for the device setup (e.g. mmap the memory given via
SET_MEM_TABLE). In this case, we will likely have 2 slave handlers
- one in the guest, another in the QEMU device.


2) If people already understand the vhost-user protocol, it would be 
natural for them to understand the vhost-pci metadata - just the 
obtained memory and vring info are put to the metadata area (no new things).



Inspired from your sharing, how about the following:
we can actually factor out a common vhost-pci layer, which handles all 
the features that are common to all the vhost-pci series of devices 
(vhost-pci-net, vhost-pci-blk,...)
Coming to the implementation, we can have a VhostpciDeviceClass (similar 
to VirtioDeviceClass), the device realize sequence will be 
virtio_device_realize()-->vhost_pci_device_realize()-->vhost_pci_net_device_realize()
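
The layered realize sequence proposed above can be sketched outside of QEMU's object model as a chain of calls in which each layer does its own setup and then invokes the next. All names ending in _sketch are hypothetical; in particular VhostPciDeviceClassSketch only stands in for the proposed VhostpciDeviceClass and is not an existing QEMU type.

```c
#include <stddef.h>

#define TRACE_MAX_SKETCH 8

/* Records the order in which the realize layers ran. */
static const char *realize_trace[TRACE_MAX_SKETCH];
static size_t realize_trace_len;

static void trace_step(const char *name)
{
    if (realize_trace_len < TRACE_MAX_SKETCH) {
        realize_trace[realize_trace_len++] = name;
    }
}

/* Hypothetical class: the common vhost-pci layer calls a per-device hook. */
typedef struct VhostPciDeviceClassSketch {
    void (*realize)(void);
} VhostPciDeviceClassSketch;

static void vhost_pci_net_device_realize_sketch(void)
{
    trace_step("vhost-pci-net");   /* net-specific setup would go here */
}

static void vhost_pci_device_realize_sketch(const VhostPciDeviceClassSketch *k)
{
    trace_step("vhost-pci");       /* setup common to all vhost-pci devices */
    if (k->realize) {
        k->realize();
    }
}

static void virtio_device_realize_sketch(const VhostPciDeviceClassSketch *k)
{
    trace_step("virtio");          /* generic virtio setup */
    vhost_pci_device_realize_sketch(k);
}
```

A vhost-pci-blk device would, under the same assumption, only supply a different realize hook and reuse the two outer layers unchanged.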







vhost-pci is a PCI adapter instead of a virtio device to allow doorbells and
interrupts to be connected to the virtio device in the master VM in the most
efficient way possible.  This means the Vring call doorbell can be an
ioeventfd that signals an irqfd inside the host kernel without host userspace
involvement.  The Vring kick interrupt can be an irqfd that is signalled by the
master VM's virtqueue ioeventfd.



This looks the same as the implementation of inter-VM notification in v2:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg450005.html
which is fig. 4 here: 
https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost-pci-rfc2.0.pdf

When the vhost-pci driver kicks its tx, the host signals the irqfd of 
virtio-net's rx. I think this has already bypassed the host userspace (thanks 
to the fast mmio implementation)

Yes, I think the irqfd <-> ioeventfd mapping is good.  Perhaps it even
makes sense to implement a special fused_irq_ioevent_fd in the host
kernel to bypass the need for a kernel thread to read the eventfd so
that an interrupt can be injected (i.e. to make the operation
synchronous).
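
As background for the irqfd <-> ioeventfd discussion, here is a small Linux-only sketch of the eventfd counter semantics that both mechanisms build on. It shows only the userspace view: a kick adds to a 64-bit counter and a read drains it. The actual KVM wiring (registering the fd through the KVM_IOEVENTFD and KVM_IRQFD ioctls) is not shown, and the helper names are made up for illustration.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Ring the doorbell: adds 1 to the eventfd counter. Returns 0 on success. */
static int doorbell_kick(int efd)
{
    uint64_t one = 1;

    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Consume pending kicks: returns the accumulated count and resets the
 * counter, or 0 if nothing is pending (non-blocking fd) or on error. */
static uint64_t doorbell_drain(int efd)
{
    uint64_t count = 0;

    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        return 0;
    }
    return count;
}
```

Several kicks that arrive before the reader runs collapse into one wakeup with a count greater than one, which is why some entity still has to read the eventfd before an interrupt can be injected; the fused fd idea above aims to remove that intermediate step.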

Is the tx virtqueue in your inter-VM notification v2 series a real
virtqueue that gets used?  Or is it just a dummy virtqueue that you're
using for the ioeventfd doorbell?  It looks like vpnet_handle_vq() is
empty so it's really just a dummy.  The actual virtqueue is in the
vhost-user master guest memory.



Yes, that tx

Re: [Qemu-devel] [PATCH] input: free InputEvent when it can't be inserted into a full kdb queue

2017-12-06 Thread Tian Dianchen

Hi, Marc-André Lureau
Thank you for your comments.

Hi, Gerd Hoffmann,
If there are no other comments, please add this note when merging this patch.


2017-12-06 17:46 GMT+08:00 Marc-André Lureau :

> Hi
>
> On Wed, Dec 6, 2017 at 3:29 AM, 田殿臣  wrote:
> > From e8c03f405c2112428e79bb82064c7b7743d0cc86 Mon Sep 17 00:00:00 2001
> > From: Tian Dianchen 
> > Date: Tue, 5 Dec 2017 14:03:53 +0800
> > Subject: [PATCH] input: free InputEvent when it can't be inserted into a
> > full
> >  kdb queue
> >
> > When the kdb queue is full, the evt can't be placed in it, so it should
> > be released to free the memory.
> >
> > Impact: Without this limit, VNC clients can exhaust host memory by
> > continuing to send keyboard events when the kdb queue is full.
>
> You may add "Leak introduced in commit
> fa18f36a461984eae50ab957e47ec78dae3c14fc"
>
> >
> > Reviewed-by: Zhang Chao 
> > Reviewed-by: Quan Xu 
> > Signed-off-by: Tian Dianchen 
>
> Reviewed-by: Marc-André Lureau 
>
>
> > ---
> >  ui/input.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/ui/input.c b/ui/input.c
> > index 3e2d324..e5b78aa 100644
> > --- a/ui/input.c
> > +++ b/ui/input.c
> > @@ -421,6 +421,8 @@ void qemu_input_event_send_key(QemuConsole *src, KeyValue *key, bool down)
> >  } else if (queue_count < queue_limit) {
> >  qemu_input_queue_event(&kbd_queue, src, evt);
> >  qemu_input_queue_sync(&kbd_queue);
> > +} else {
> > +qapi_free_InputEvent(evt);
> >  }
> >  }
> >
> > --
> > 1.8.3.1
>
>
>
> --
> Marc-André Lureau
>
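
The ownership rule behind this fix can be sketched with a bounded queue whose send function takes ownership of the event in every case: either the queue keeps it, or it is freed on the spot. The types and names below (EventSketch, QueueSketch, QUEUE_LIMIT_SKETCH) are hypothetical and far simpler than the real input layer.

```c
#include <stdbool.h>
#include <stdlib.h>

#define QUEUE_LIMIT_SKETCH 4

typedef struct EventSketch {
    int keycode;
    bool down;
} EventSketch;

typedef struct QueueSketch {
    EventSketch *slots[QUEUE_LIMIT_SKETCH];
    int count;
    int dropped;   /* events freed because the queue was full */
} QueueSketch;

/* Takes ownership of evt: it is either stored in the queue or freed here.
 * Returns true if the event was queued. */
static bool queue_send_sketch(QueueSketch *q, EventSketch *evt)
{
    if (q->count < QUEUE_LIMIT_SKETCH) {
        q->slots[q->count++] = evt;
        return true;
    }
    free(evt);     /* without this branch, every overflow leaks one event */
    q->dropped++;
    return false;
}

/* Releases everything still held by the queue. */
static void queue_flush_sketch(QueueSketch *q)
{
    while (q->count > 0) {
        free(q->slots[--q->count]);
    }
}
```

With the else branch missing, a sender that never lets the queue drain makes the process grow by one event per call, which matches the impact described in the commit message.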

[Qemu-devel] [PATCH] ui: Add enabled field to egl_fb struct

2017-12-06 Thread Tina Zhang

Add a switch to enable/disable an egl_fb to make sure an egl_fb can only
be flushed when it's enabled.

For example, the cursor plane might be disabled by guest Apps on purpose.
With the "enabled" field, a cursor plane can be ignored when it's disabled by
guest Apps.

Against branch: work/intel-vgpu

Signed-off-by: Tina Zhang 
Cc: Gerd Hoffmann 
---
 hw/vfio/display.c|  5 +
 include/ui/egl-helpers.h |  1 +
 ui/egl-headless.c| 12 
 ui/gtk-egl.c | 11 ---
 4 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/hw/vfio/display.c b/hw/vfio/display.c
index 0366c02..bf1062f 100644
--- a/hw/vfio/display.c
+++ b/hw/vfio/display.c
@@ -182,6 +182,11 @@ static void vfio_display_dmabuf_update(void *opaque)
cursor->hot_y,
cursor->pos_x,
cursor->pos_y);
+} else {
+/* Cursor plane is disabled */
+dpy_gl_cursor_position(vdev->display_con,
+   false, false,
+   0, 0, 0, 0);
 }
 
 dpy_gl_update(vdev->display_con, 0, 0,
diff --git a/include/ui/egl-helpers.h b/include/ui/egl-helpers.h
index 071bedc..1328489 100644
--- a/include/ui/egl-helpers.h
+++ b/include/ui/egl-helpers.h
@@ -14,6 +14,7 @@ typedef struct egl_fb {
 GLuint texture;
 GLuint framebuffer;
 bool delete_texture;
+bool enabled;
 } egl_fb;
 
 void egl_log_error(const char *func, const char *call);
diff --git a/ui/egl-headless.c b/ui/egl-headless.c
index 299af01..2bd6e9f 100644
--- a/ui/egl-headless.c
+++ b/ui/egl-headless.c
@@ -103,9 +103,13 @@ static void egl_cursor_position(DisplayChangeListener *dcl,
 uint32_t pos_x, uint32_t pos_y)
 {
 egl_dpy *edpy = container_of(dcl, egl_dpy, dcl);
-
-edpy->pos_x = pos_x;
-edpy->pos_y = pos_y;
+if (!have_pos) {
+edpy->cursor_fb.enabled = false;
+} else {
+edpy->cursor_fb.enabled = true ;
+edpy->pos_x = pos_x;
+edpy->pos_y = pos_y;
+}
 }
 
 static void egl_release_dmabuf(DisplayChangeListener *dcl,
@@ -127,7 +131,7 @@ static void egl_scanout_flush(DisplayChangeListener *dcl,
 assert(surface_height(edpy->ds) == edpy->guest_fb.height);
 assert(surface_format(edpy->ds) == PIXMAN_x8r8g8b8);
 
-if (edpy->cursor_fb.texture) {
+if (edpy->cursor_fb.texture && edpy->cursor_fb.enabled) {
 /* have cursor -> render using textures */
 egl_texture_blit(edpy->gls, &edpy->blit_fb, &edpy->guest_fb,
  !edpy->y_0_top);
diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
index cafd95d..bddd733 100644
--- a/ui/gtk-egl.c
+++ b/ui/gtk-egl.c
@@ -249,8 +249,13 @@ void gd_egl_cursor_position(DisplayChangeListener *dcl,
 {
 VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
 
-vc->gfx.cursor_x = pos_x;
-vc->gfx.cursor_y = pos_y;
+if (!have_pos) {
+vc->gfx.cursor_fb.enabled = false;
+} else {
+vc->gfx.cursor_fb.enabled = true;
+vc->gfx.cursor_x = pos_x;
+vc->gfx.cursor_y = pos_y;
+}
 }
 
 void gd_egl_release_dmabuf(DisplayChangeListener *dcl,
@@ -287,7 +292,7 @@ void gd_egl_scanout_flush(DisplayChangeListener *dcl,
 egl_fb_setup_default(&vc->gfx.win_fb, ww, wh);
 egl_texture_blit(vc->gfx.gls, &vc->gfx.win_fb, &vc->gfx.guest_fb,
  vc->gfx.y0_top);
-if (vc->gfx.cursor_fb.texture) {
+if (vc->gfx.cursor_fb.texture && vc->gfx.cursor_fb.enabled) {
 egl_texture_blend(vc->gfx.gls, &vc->gfx.win_fb, &vc->gfx.cursor_fb,
   vc->gfx.y0_top,
   vc->gfx.cursor_x, vc->gfx.cursor_y);
-- 
2.7.4
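
The behaviour the patch aims for can be condensed into a standalone sketch: the position callback records whether the cursor plane is enabled, and the flush path blends the cursor only when a texture exists and the plane is enabled. FbSketch and DisplaySketch are simplified stand-ins, not the real egl_fb or display structures.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct FbSketch {
    unsigned texture;   /* 0 means no texture has been created */
    bool enabled;
} FbSketch;

typedef struct DisplaySketch {
    FbSketch cursor_fb;
    uint32_t pos_x;
    uint32_t pos_y;
} DisplaySketch;

/* have_pos == false means the guest has disabled the cursor plane; the
 * last known position is kept so that re-enabling needs no extra state. */
static void cursor_position_sketch(DisplaySketch *d, bool have_pos,
                                   uint32_t x, uint32_t y)
{
    d->cursor_fb.enabled = have_pos;
    if (have_pos) {
        d->pos_x = x;
        d->pos_y = y;
    }
}

/* The flush path draws the cursor only if both conditions hold. */
static bool should_blend_cursor_sketch(const DisplaySketch *d)
{
    return d->cursor_fb.texture != 0 && d->cursor_fb.enabled;
}
```

Keeping the flag inside the framebuffer struct means every backend that flushes a cursor framebuffer can apply the same check.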

Re: [Qemu-devel] About the light VM solution!

2017-12-06 Thread Gonglei (Arei)

> -----Original Message-----
> From: Stefan Hajnoczi [mailto:stefa...@redhat.com]
> Sent: Wednesday, December 06, 2017 11:10 PM
> To: Gonglei (Arei)
> Cc: Paolo Bonzini; Yang Zhong; Stefan Hajnoczi; qemu-devel
> Subject: Re: [Qemu-devel] About the light VM solution!
> 
> On Wed, Dec 06, 2017 at 09:21:55AM +0000, Gonglei (Arei) wrote:
> >
> > > -----Original Message-----
> > > From: Qemu-devel
> > > [mailto:qemu-devel-bounces+arei.gonglei=huawei@nongnu.org] On
> > > Behalf Of Stefan Hajnoczi
> > > Sent: Wednesday, December 06, 2017 12:31 AM
> > > To: Paolo Bonzini
> > > Cc: Yang Zhong; Stefan Hajnoczi; qemu-devel
> > > Subject: Re: [Qemu-devel] About the light VM solution!
> > >
> > > On Tue, Dec 05, 2017 at 03:00:10PM +0100, Paolo Bonzini wrote:
> > > > On 05/12/2017 14:47, Stefan Hajnoczi wrote:
> > > > > On Tue, Dec 5, 2017 at 1:35 PM, Paolo Bonzini wrote:
> > > > >> On 05/12/2017 13:06, Stefan Hajnoczi wrote:
> > > > >>> On Tue, Dec 05, 2017 at 02:33:13PM +0800, Yang Zhong wrote:
> > > > >>>> As you know, AWS has decided to switch to KVM in their clouds. This
> > > > >>>> news has made almost all Chinese CSPs (cloud service providers) pay
> > > > >>>> more attention to KVM/QEMU, especially light VM solutions.
> > > > >>>>
> > > > >>>> Below is Intel's solution for light VMs, qemu-lite.
> > > > >>>>
> > > > >>>> http://events.linuxfoundation.org/sites/events/files/slides/Light%20weight%20virtualization%20with%20QEMU%26KVM_0.pdf
> > > > >>>>
> > > > >>>> My question is whether the community has any plan to implement a
> > > > >>>> light VM or alternative solutions. If not, is our qemu-lite
> > > > >>>> solution suitable for upstream again? Many thanks!
> > > > >>>
> > > > >>> What caused a lot of discussion and held back progress was the
> > > > >>> approach that was taken.  The basic philosophy seems to be bypassing
> > > > >>> or special-casing components in order to avoid slow operations.  This
> > > > >>> requires special QEMU, firmware, and/or guest kernel binaries and
> > > > >>> causes extra work for the management stack, distributions, and
> > > > >>> testers.
> > > > >>
> > > > >> I think having a special firmware (be it qboot or a special-purpose
> > > > >> SeaBIOS) is acceptable.
> > > > >
> > > > > The work Marc Mari Barcelo did in 2015 showed that SeaBIOS can boot
> > > > > guests quickly.  The guest kernel was entered in <35 milliseconds
> > > > > IIRC.  Why is special firmware necessary?
> > > >
> > > > I thought that wasn't the "conventional" SeaBIOS, but rather one with
> > > > reduced configuration options, but I may be remembering wrong.
> > >
> > > Marc didn't spend much time on optimizing SeaBIOS, he used the build
> > > options that were suggested.  An extra flag can be added in
> > > qemu_preinit() to skip slow init that's unnecessary on optimized
> > > machines.  That would allow a single SeaBIOS binary to run both full and
> > > lite systems.
> > >
> > > Which options do you remember, Stefan? Or do you have any links to that
> > > thread? I'm interested in this topic.
> 
> Here is what I found:
> 
> Marc Mari's fastest SeaBIOS build took 8 ms from the first guest CPU
> instruction to entering the guest kernel.  CBFS was used instead of a
> normal boot device (e.g. virtio-blk).  Most hardware support was
> disabled.
> 
> https://mail.coreboot.org/pipermail/seabios/2015-July/009554.html
> 
> The SeaBIOS configuration file is here:
> 
> https://mail.coreboot.org/pipermail/seabios/2015-July/009548.html
> 
Thanks for your information. :)
 
Thanks,
-Gonglei

Re: [Qemu-devel] [PATCH for-2.12 0/4] qmp dirty bitmap API

2017-12-06 Thread John Snow



On 11/30/2017 07:10 AM, Vladimir Sementsov-Ogievskiy wrote:
> 18.11.2017 00:35, John Snow wrote:
>>
>> On 11/17/2017 03:22 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> 17.11.2017 06:10, John Snow wrote:
>>>> On 11/16/2017 03:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 16.11.2017 00:20, John Snow wrote:
>>>>>> On 11/13/2017 11:20 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> Hi all.
>>>>>>>
>>>>>>> There are three qmp commands, needed to implement external backup API.
>>>>>>>
>>>>>>> Using these three commands, client may do all needed bitmap
>>>>>>> management by hand:
>>>>>>>
>>>>>>> on backup start we need to do a transaction:
>>>>>>>    {disable old bitmap, create new bitmap}
>>>>>>>
>>>>>>> on backup success:
>>>>>>>    drop old bitmap
>>>>>>>
>>>>>>> on backup fail:
>>>>>>>    enable old bitmap
>>>>>>>    merge new bitmap to old bitmap
>>>>>>>    drop new bitmap
>>>>>>>
>>>>>> Can you give me an example of how you expect these commands to be
>>>>>> used, and why they're required?
>>>>>>
>>>>>> I'm a little wary about how error-prone these commands might be
>>>>>> and the potential for incorrect usage seems... high. Why do we
>>>>>> require them?
>>>>> It is needed for incremental backup. It looks like bad idea to export
>>>>> abdicate/reclaim functionality, it is simpler
>>>>> and clearer to allow user to merge/enable/disable bitmaps by hand.
>>>>>
>>>>> usage is like this:
>>>>>
>>>>> 1. we have dirty bitmap bitmap0 for incremental backup.
>>>>>
>>>>> 2. prepare image fleecing (create temporary image with
>>>>> backing=our_disk)
>>>>> 3. in qmp transaction:
>>>>>  - disable bitmap0
>>>>>  - create bitmap1
>>>>>  - start image fleecing (backup sync=none our_disk -> temp_disk)
>>>> This could probably just be its own command, though:
>>>>
>>>> block-job-fleece node=foobar bitmap=bitmap0 etc=etera etc=etera
>>>>
>>>> Could handle forking the bitmap. I'm not sure what the arguments would
>>>> look like, but we could name the NBD export here, too. (Assuming the
>>>> server is already started and we just need to create the share.)
>>>>
>>>> Then, we can basically do what mirror does:
>>>>
>>>> (1) Cancel
>>>> (2) Complete
>>>>
>>>> Cancel would instruct QEMU to keep the bitmap changes (i.e. roll back),
>>>> and Complete would instruct QEMU to discard the changes.
>>>>
>>>> This way we don't need to expose commands like split or merge that will
>>>> almost always be dangerous to use over QMP.
>>>>
>>>> In fact, a fleecing job would be really convenient even without a
>>>> bitmap, because it'd still be nice to have a convenience command for
>>>> it. Using an existing infrastructure and understood paradigm is just a
>>>> bonus.
>>> 1. If I understand correctly, Kevin and Max said in their report in
>>> Prague about new block-job approach,
>>>    using filter nodes, so I'm not sure that this is a good Idea to
>>> introduce now new old-style block-job, where we can
>>>    do without it.
>>>
>> We could do without it, but it might be a lot better to have everything
>> wrapped up in a command that's easy to digest instead of releasing 10
>> smaller commands that have to be executed in a very specific way in
>> order to work correctly.
>>
>> I'm thinking about the complexity of error checking here with all the
>> smaller commands, versus error checking on a larger workflow we
>> understand and can quality test better.
>>
>> I'm not sure that filter nodes becoming the new normal for block jobs
>> precludes our ability to use the job-management API as a handle for
>> managing the lifetime of a long-running task like fleecing, but I'll
>> check with Max and Kevin about this.
>>
>>> 2. there is the following scenario: customers needs a possibility to
>>> create a backup of data changed since some
>>> point in time. So, maintaining several static and one (or several) active
>>> bitmaps with a possibility to merge some of them
>>> and create a backup using this merged bitmap may be convenient.
>>>
>> I think the ability to copy bitmaps and issue differential backups would
>> be sufficient in all cases I could think of...
> 
> so, instead of keeping several static bitmaps with ability to merge them,
> you propose to keep several active bitmaps and copy them to make a backup?
> 
> so, instead of new qmp command for merge, add new qmp command for copy?
> 
> in case of static bitmaps, we can implement saving/loading them to the
> image to free RAM space,
> so it is better.
> 
> or what do you propose for  [2] ?
> 
> 
> 
>

I'm sorry, I don't think I understand.

"customers needs a possibility to create a backup of data changed since
some point in time."

Is that not the existing case for a simple incremental backup? Granted,
the point in time was decided when we created the bitmap or when we made
the last backup, but it is "since some point in time."
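
To make the quoted three-step workflow concrete (backup start: disable the old bitmap and create a new one; on failure: enable the old bitmap, merge the new one into it, drop the new one), here is a toy model in which a bitmap is a single 64-bit word with one bit per cluster. BitmapSketch and the function names are hypothetical and do not correspond to QMP commands or QEMU's dirty-bitmap code.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct BitmapSketch {
    uint64_t bits;   /* one bit per cluster, 64 clusters at most */
    bool enabled;    /* a disabled bitmap ignores guest writes */
} BitmapSketch;

/* Guest write: only enabled bitmaps record it. */
static void bitmap_mark_dirty(BitmapSketch *b, unsigned cluster)
{
    if (b->enabled && cluster < 64) {
        b->bits |= UINT64_C(1) << cluster;
    }
}

/* Backup start: {disable old bitmap, create new bitmap} as one step. */
static BitmapSketch backup_start(BitmapSketch *old_bm)
{
    BitmapSketch new_bm = { 0, true };

    old_bm->enabled = false;
    return new_bm;
}

/* Backup failure: enable old, merge new into old, drop new. */
static void backup_fail(BitmapSketch *old_bm, BitmapSketch *new_bm)
{
    old_bm->enabled = true;
    old_bm->bits |= new_bm->bits;
    new_bm->bits = 0;
    new_bm->enabled = false;
}
```

The sketch also shows where the error-proneness comes from: if a client enables the old bitmap but forgets the merge, writes made during the failed backup are silently lost from the next incremental.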

If you mean to say an arbitrary point in time after-the-fact, I don't
see how the API presented here helps enable that

Re: [Qemu-devel] [PATCH for-2.11] vfio: Fix vfio-kvm group registration

2017-12-06 Thread Alexey Kardashevskiy

On 06/12/17 12:30, Alex Williamson wrote:
> On Wed, 6 Dec 2017 12:02:01 +1100
> Alexey Kardashevskiy  wrote:
> 
>> On 06/12/17 08:09, Alex Williamson wrote:
>>> Commit 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container
>>> attaching") moved registration of groups with the vfio-kvm device from
>>> vfio_get_group() to vfio_connect_container(), but it missed the case
>>> where a group is attached to an existing container and takes an early
>>> exit.  Perhaps this is a less common case on ppc64/spapr, but on x86
>>> (without viommu) all groups are connected to the same container and
>>> thus only the first group gets registered with the vfio-kvm device.
>>> This becomes a problem if we then hot-unplug the devices associated
>>> with that first group and we end up with KVM being misinformed about
>>> any vfio connections that might remain.  Fix by including the call to
>>> vfio_kvm_device_add_group() in this early exit path.
>>>
>>> Fixes: 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching")
>>> Cc: qemu-sta...@nongnu.org # qemu-2.10+
>>> Signed-off-by: Alex Williamson 
>>> ---
>>>
>>> This bug also existed in QEMU 2.10, but I think the fix is sufficiently
>>> obvious (famous last words) to propose for 2.11 at this late date.  If
>>> the first group is hot unplugged then KVM may revert to code emulation
>>> that assumes no non-coherent DMA is present on some systems.  Also for
>>> KVMGT, if the vGPU is not the first device registered, then the
>>> notifier to enable linkages to KVM would not be called.  Please review.  
>>
>> For what it is worth
>>
>> Reviewed-by: Alexey Kardashevskiy 
> 
> Thanks!
> 
>> Sorry for the breakage...
>>
>> One question - how was this discovered? I'd love to set up a test
>> environment on my old thinkpad x230 if possible.
> 
> Assign two devices from separate iommu groups, hot unplug the first
> device, followed by the second device.  The second unplug will trigger:
> 
> qemu-kvm: Failed to remove group ## from KVM VFIO device: No such file or directory
> 
> Laptops don't have many devices and we're not good about keeping up
> with ACS quirks on laptop chipsets, so it might be difficult to find
> the prerequisite setup there.  Thanks,


This is actually easy to reproduce on the spapr platform as reusing the
same container is what we do these days, at least till we get multiple PHB
support in libvirt :-/
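
The bug pattern described in this thread, an early return that skips a registration step, can be reproduced in a few lines. The model below is a deliberate simplification with hypothetical names: a container is created on first use, later groups take the early-exit path, and every group must still be registered with the (here simulated) KVM device.

```c
#include <stdbool.h>

#define MAX_GROUPS_SKETCH 8

/* Simulated set of groups known to the KVM VFIO device. */
static int kvm_groups[MAX_GROUPS_SKETCH];
static int kvm_group_count;

static void kvm_device_add_group_sketch(int groupid)
{
    if (kvm_group_count < MAX_GROUPS_SKETCH) {
        kvm_groups[kvm_group_count++] = groupid;
    }
}

/* Returns false if the group was never registered, which models the
 * "No such file or directory" failure on hot unplug. */
static bool kvm_device_del_group_sketch(int groupid)
{
    for (int i = 0; i < kvm_group_count; i++) {
        if (kvm_groups[i] == groupid) {
            kvm_groups[i] = kvm_groups[--kvm_group_count];
            return true;
        }
    }
    return false;
}

typedef struct ContainerSketch {
    bool created;
    int ngroups;
} ContainerSketch;

static void connect_container_sketch(ContainerSketch *c, int groupid)
{
    if (c->created) {
        /* Early exit for an existing container. The fix is to register
         * the group here too, instead of returning straight away. */
        c->ngroups++;
        kvm_device_add_group_sketch(groupid);
        return;
    }
    c->created = true;
    c->ngroups = 1;
    kvm_device_add_group_sketch(groupid);
}
```

Removing the call in the early-exit branch makes the second group's removal fail, which mirrors the reproduction steps given above: two devices from separate groups, unplug the first, then the second.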



-- 
Alexey

Re: [Qemu-devel] [PATCH-2.12 v2 2/3] xilinx_spips: Set all of the reset values

2017-12-06 Thread francisco iglesias

Hi Alistair,

On 6 December 2017 at 23:22, Alistair Francis wrote:

> Following the ZynqMP register spec let's ensure that all reset values
> are set.
>
> Signed-off-by: Alistair Francis 
> ---
> V2:
>  - Don't bother double setting registers
>
>  hw/ssi/xilinx_spips.c | 35 ++-
>  include/hw/ssi/xilinx_spips.h |  2 +-
>  2 files changed, 31 insertions(+), 6 deletions(-)
>
> diff --git a/hw/ssi/xilinx_spips.c b/hw/ssi/xilinx_spips.c
> index 899db814ee..b8182cfd74 100644
> --- a/hw/ssi/xilinx_spips.c
> +++ b/hw/ssi/xilinx_spips.c
> @@ -66,6 +66,7 @@
>
>  /* interrupt mechanism */
>  #define R_INTR_STATUS   (0x04 / 4)
> +#define R_INTR_STATUS_RESET (0x104)
>  #define R_INTR_EN   (0x08 / 4)
>  #define R_INTR_DIS  (0x0C / 4)
>  #define R_INTR_MASK (0x10 / 4)
> @@ -102,6 +103,9 @@
>  #define R_SLAVE_IDLE_COUNT  (0x24 / 4)
>  #define R_TX_THRES  (0x28 / 4)
>  #define R_RX_THRES  (0x2C / 4)
> +#define R_GPIO  (0x30 / 4)
> +#define R_LPBK_DLY_ADJ  (0x38 / 4)
> +#define R_LPBK_DLY_ADJ_RESET (0x33)
>  #define R_TXD1  (0x80 / 4)
>  #define R_TXD2  (0x84 / 4)
>  #define R_TXD3  (0x88 / 4)
> @@ -140,8 +144,12 @@
>  #define R_GQSPI_IER (0x108 / 4)
>  #define R_GQSPI_IDR (0x10c / 4)
>  #define R_GQSPI_IMR (0x110 / 4)
> +#define R_GQSPI_IMR_RESET   (0xfbe)
>  #define R_GQSPI_TX_THRESH   (0x128 / 4)
>  #define R_GQSPI_RX_THRESH   (0x12c / 4)
> +#define R_GQSPI_GPIO_THRESH (0x130 / 4)
>

According to doc (mentioned in patch 0/3) the address above, 0x130, is
"GQSPI GPIO for Write Protect". Should we rename the define to
R_GQSPI_GPIO? (Based on doc and that the other WP is named R_GPIO).

Best regards,
Francisco Iglesias


> +#define R_GQSPI_LPBK_DLY_ADJ (0x138 / 4)
> +#define R_GQSPI_LPBK_DLY_ADJ_RESET (0x33)
>  #define R_GQSPI_CNFG(0x100 / 4)
>  FIELD(GQSPI_CNFG, MODE_EN, 30, 2)
>  FIELD(GQSPI_CNFG, GEN_FIFO_START_MODE, 29, 1)
> @@ -177,8 +185,16 @@
>  FIELD(GQSPI_GF_SNAPSHOT, EXPONENT, 9, 1)
>  FIELD(GQSPI_GF_SNAPSHOT, DATA_XFER, 8, 1)
>  FIELD(GQSPI_GF_SNAPSHOT, IMMEDIATE_DATA, 0, 8)
> -#define R_GQSPI_MOD_ID(0x168 / 4)
> -#define R_GQSPI_MOD_ID_VALUE  0x010A
> +#define R_GQSPI_MOD_ID(0x1fc / 4)
> +#define R_GQSPI_MOD_ID_RESET  (0x10a)
> +
> +#define R_QSPIDMA_DST_CTRL (0x80c / 4)
> +#define R_QSPIDMA_DST_CTRL_RESET   (0x803ffa00)
> +#define R_QSPIDMA_DST_I_MASK   (0x820 / 4)
> +#define R_QSPIDMA_DST_I_MASK_RESET (0xfe)
> +#define R_QSPIDMA_DST_CTRL2(0x824 / 4)
> +#define R_QSPIDMA_DST_CTRL2_RESET  (0x081bfff8)
> +
>  /* size of TXRX FIFOs */
>  #define RXFF_A  (128)
>  #define TXFF_A  (128)
> @@ -351,11 +367,20 @@ static void xlnx_zynqmp_qspips_reset(DeviceState *d)
>  fifo8_reset(&s->rx_fifo_g);
>  fifo8_reset(&s->rx_fifo_g);
>  fifo32_reset(&s->fifo_g);
> +s->regs[R_INTR_STATUS] = R_INTR_STATUS_RESET;
> +s->regs[R_GPIO] = 1;
> +s->regs[R_LPBK_DLY_ADJ] = R_LPBK_DLY_ADJ_RESET;
> +s->regs[R_GQSPI_GFIFO_THRESH] = 0x10;
> +s->regs[R_MOD_ID] = 0x01090101;
> +s->regs[R_GQSPI_IMR] = R_GQSPI_IMR_RESET;
>  s->regs[R_GQSPI_TX_THRESH] = 1;
>  s->regs[R_GQSPI_RX_THRESH] = 1;
> -s->regs[R_GQSPI_GFIFO_THRESH] = 1;
> -s->regs[R_GQSPI_IMR] = GQSPI_IXR_MASK;
> -s->regs[R_MOD_ID] = 0x01090101;
> +s->regs[R_GQSPI_GPIO_THRESH] = 1;
> +s->regs[R_GQSPI_LPBK_DLY_ADJ] = R_GQSPI_LPBK_DLY_ADJ_RESET;
> +s->regs[R_GQSPI_MOD_ID] = R_GQSPI_MOD_ID_RESET;
> +s->regs[R_QSPIDMA_DST_CTRL] = R_QSPIDMA_DST_CTRL_RESET;
> +s->regs[R_QSPIDMA_DST_I_MASK] = R_QSPIDMA_DST_I_MASK_RESET;
> +s->regs[R_QSPIDMA_DST_CTRL2] = R_QSPIDMA_DST_CTRL2_RESET;
>  s->man_start_com_g = false;
>  s->gqspi_irqline = 0;
>  xlnx_zynqmp_qspips_update_ixr(s);
> diff --git a/include/hw/ssi/xilinx_spips.h b/include/hw/ssi/xilinx_spips.h
> index 75fc94ce5d..d398a4e81c 100644
> --- a/include/hw/ssi/xilinx_spips.h
> +++ b/include/hw/ssi/xilinx_spips.h
> @@ -32,7 +32,7 @@
>  typedef struct XilinxSPIPS XilinxSPIPS;
>
>  #define XLNX_SPIPS_R_MAX(0x100 / 4)
> -#define XLNX_ZYNQMP_SPIPS_R_MAX (0x200 / 4)
> +#define XLNX_ZYNQMP_SPIPS_R_MAX (0x830 / 4)
>
>  /* Bite off 4k chunks at a time */
>  #define LQSPI_CACHE_SIZE 1024
> --
> 2.14.1
>
>
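
A short sketch of why the patch also has to raise XLNX_ZYNQMP_SPIPS_R_MAX: register macros are word indexes (byte offset divided by 4) into the regs[] array, so the highest register written in reset must fit below the array bound. The offsets are taken from the quoted diff; the macro and function names with a _SKETCH suffix are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Registers are 32 bits wide, so a byte offset maps to index offset / 4. */
#define REG_INDEX_SKETCH(byte_offset) ((byte_offset) / 4)

#define R_QSPIDMA_DST_CTRL2_SKETCH REG_INDEX_SKETCH(0x824)
#define R_MAX_BEFORE_SKETCH        REG_INDEX_SKETCH(0x200)
#define R_MAX_AFTER_SKETCH         REG_INDEX_SKETCH(0x830)

/* True if regs[index] is a valid element of an array of r_max words. */
static bool reg_index_fits(unsigned index, unsigned r_max)
{
    return index < r_max;
}
```

With the old bound of 0x200 / 4, writing the reset value of the DMA control register at offset 0x824 would index past the end of the array; the new bound of 0x830 / 4 covers it.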

Re: [Qemu-devel] [PATCH 2/2] virtio-blk: reject configs with logical block size > physical block size

2017-12-06 Thread Martin K. Petersen


Mark,

> virtio-blk logical block size should never be larger than physical block
> size because it doesn't make sense to have such configurations. QEMU doesn't
> have a way to effectively express this condition; the best it can do is
> report the physical block exponent as 0 - indicating the logical block size
> equals the physical block size.
>
> This is identical to commit 3da023b5827543ee4c022986ea2ad9d1274410b2
> but applied to virtio-blk (instead of virtio-scsi).

Reviewed-by: Martin K. Petersen 
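
The constraint discussed here can be illustrated with a small helper that derives the physical block exponent (log2 of the number of logical blocks per physical block) and rejects configurations where the logical block is larger than the physical one. This is a sketch under the assumption that both sizes are powers of two; the function name is hypothetical and this is not QEMU's implementation.

```c
#include <stdint.h>

/* Returns log2(physical / logical), or -1 if the configuration should be
 * rejected (zero sizes, or logical block larger than physical block).
 * Assumes both sizes are powers of two. */
static int physical_block_exp_sketch(uint32_t logical, uint32_t physical)
{
    int exp = 0;

    if (logical == 0 || physical == 0 || logical > physical) {
        return -1;
    }
    while (((uint64_t)logical << exp) < physical) {
        exp++;
    }
    return exp;
}
```

An exponent cannot be negative, so a device with 4096-byte logical and 512-byte physical blocks has no faithful encoding; reporting 0 would claim the sizes are equal, which is why rejecting the configuration is the cleaner choice.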

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [Qemu-devel] [Qemu-block] [PATCH 1/2] qcow2: add overlap check for bitmap directory

2017-12-06 Thread John Snow



On 11/30/2017 11:47 AM, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/qcow2.h  |  7 +--
>  block/qcow2-refcount.c | 12 
>  block/qcow2.c  |  6 ++
>  3 files changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 6f0ff15dd0..8f226a3609 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -98,6 +98,7 @@
>  #define QCOW2_OPT_OVERLAP_SNAPSHOT_TABLE "overlap-check.snapshot-table"
>  #define QCOW2_OPT_OVERLAP_INACTIVE_L1 "overlap-check.inactive-l1"
>  #define QCOW2_OPT_OVERLAP_INACTIVE_L2 "overlap-check.inactive-l2"
> +#define QCOW2_OPT_OVERLAP_BITMAP_DIRECTORY "overlap-check.bitmap-directory"
>  #define QCOW2_OPT_CACHE_SIZE "cache-size"
>  #define QCOW2_OPT_L2_CACHE_SIZE "l2-cache-size"
>  #define QCOW2_OPT_REFCOUNT_CACHE_SIZE "refcount-cache-size"
> @@ -406,8 +407,9 @@ typedef enum QCow2MetadataOverlap {
>  QCOW2_OL_SNAPSHOT_TABLE_BITNR = 5,
>  QCOW2_OL_INACTIVE_L1_BITNR= 6,
>  QCOW2_OL_INACTIVE_L2_BITNR= 7,
> +QCOW2_OL_BITMAP_DIRECTORY_BITNR = 8,
>  
> -QCOW2_OL_MAX_BITNR= 8,
> +QCOW2_OL_MAX_BITNR  = 9,
>  
>  QCOW2_OL_NONE   = 0,
>  QCOW2_OL_MAIN_HEADER= (1 << QCOW2_OL_MAIN_HEADER_BITNR),
> @@ -420,12 +422,13 @@ typedef enum QCow2MetadataOverlap {
>  /* NOTE: Checking overlaps with inactive L2 tables will result in bdrv
>   * reads. */
>  QCOW2_OL_INACTIVE_L2= (1 << QCOW2_OL_INACTIVE_L2_BITNR),
> +QCOW2_OL_BITMAP_DIRECTORY = (1 << QCOW2_OL_BITMAP_DIRECTORY_BITNR),
>  } QCow2MetadataOverlap;
>  
>  /* Perform all overlap checks which can be done in constant time */
>  #define QCOW2_OL_CONSTANT \
>  (QCOW2_OL_MAIN_HEADER | QCOW2_OL_ACTIVE_L1 | QCOW2_OL_REFCOUNT_TABLE | \
> - QCOW2_OL_SNAPSHOT_TABLE)
> + QCOW2_OL_SNAPSHOT_TABLE | QCOW2_OL_BITMAP_DIRECTORY)
>  
>  /* Perform all overlap checks which don't require disk access */
>  #define QCOW2_OL_CACHED \
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 3de1ab51ba..a7a2703f26 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -2585,6 +2585,18 @@ int qcow2_check_metadata_overlap(BlockDriverState *bs, 
> int ign, int64_t offset,
>  }
>  }
>  
> +if ((chk & QCOW2_OL_BITMAP_DIRECTORY) &&
> +(s->autoclear_features & QCOW2_AUTOCLEAR_BITMAPS))
> +{
> +/* update_ext_header_and_dir_in_place firstly drop autoclear flag,
> + * so it will not fail */
> +if (overlaps_with(s->bitmap_directory_offset,
> +  s->bitmap_directory_size))
> +{
> +return QCOW2_OL_BITMAP_DIRECTORY;
> +}
> +}
> +

Isn't the purpose of this function to test if a given offset conflicts
with known regions of the file? I don't see you actually utilize the
'offset' parameter here, but maybe I don't understand what you're trying
to accomplish.
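
For context, an overlap check of this kind boils down to comparing the
range being written, [offset, offset + size), against each known metadata
region. Below is a minimal standalone sketch of that comparison. The name
ranges_overlap_sketch and its parameters are assumptions made for
illustration, not QEMU's helpers, and the quoted hunk does not show how
overlaps_with() obtains the write range.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not QEMU's API): returns true when the half-open
 * ranges [a_off, a_off + a_len) and [b_off, b_off + b_len) share at least
 * one byte. Empty ranges never overlap anything.
 */
static bool ranges_overlap_sketch(uint64_t a_off, uint64_t a_len,
                                  uint64_t b_off, uint64_t b_len)
{
    if (a_len == 0 || b_len == 0) {
        return false;
    }
    /* Work with the last byte of each range, so a range that ends at
     * UINT64_MAX is still handled. */
    uint64_t a_last = a_off + a_len - 1;
    uint64_t b_last = b_off + b_len - 1;
    return a_off <= b_last && b_off <= a_last;
}
```

With a helper like this, a bitmap-directory check would pass the write
range as the first pair and the directory offset and size as the second.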

>  return 0;
>  }
>  
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 1914a940e5..8278c0e124 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -655,6 +655,11 @@ static QemuOptsList qcow2_runtime_opts = {
>  .help = "Check for unintended writes into an inactive L2 table",
>  },
>  {
> +.name = QCOW2_OPT_OVERLAP_BITMAP_DIRECTORY,
> +.type = QEMU_OPT_BOOL,
> +.help = "Check for unintended writes into the bitmap directory",
> +},
> +{
>  .name = QCOW2_OPT_CACHE_SIZE,
>  .type = QEMU_OPT_SIZE,
>  .help = "Maximum combined metadata (L2 tables and refcount 
> blocks) "
> @@ -690,6 +695,7 @@ static const char 
> *overlap_bool_option_names[QCOW2_OL_MAX_BITNR] = {
>  [QCOW2_OL_SNAPSHOT_TABLE_BITNR] = QCOW2_OPT_OVERLAP_SNAPSHOT_TABLE,
>  [QCOW2_OL_INACTIVE_L1_BITNR]= QCOW2_OPT_OVERLAP_INACTIVE_L1,
>  [QCOW2_OL_INACTIVE_L2_BITNR]= QCOW2_OPT_OVERLAP_INACTIVE_L2,
> +[QCOW2_OL_BITMAP_DIRECTORY_BITNR] = QCOW2_OPT_OVERLAP_BITMAP_DIRECTORY,
>  };
>  
>  static void cache_clean_timer_cb(void *opaque)
>

Re: [Qemu-devel] [RFC PATCH] target/sh4/translate.c: fix TCG leak during gusa sequence

2017-12-06 Thread Aurelien Jarno

On 2017-12-06 09:30, Alex Bennée wrote:
> This fixes bug #1735384 while running java under qemu-sh4. When debug
> was enabled it showed a problem with TCG temps. Once fixed I was able
> to run java -version normally.
> 
> Reported-by: John Paul Adrian Glaubitz 
> Suggested-by: Richard Henderson 
> Signed-off-by: Alex Bennée 
> ---
>  target/sh4/translate.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/sh4/translate.c b/target/sh4/translate.c
> index 703020fe87..b4b5c822d0 100644
> --- a/target/sh4/translate.c
> +++ b/target/sh4/translate.c
> @@ -2189,7 +2189,7 @@ static int decode_gusa(DisasContext *ctx, CPUSH4State 
> *env, int *pmax_insns)
>  }
>  
>  /* If op_src is not a valid register, then op_arg was a constant.  */
> -if (op_src < 0) {
> +if (op_src < 0 && !TCGV_IS_UNUSED(op_arg)) {
>  tcg_temp_free_i32(op_arg);
>  }

I guess this happens when trying to match the exchange pattern, so this
looks correct to me.

Reviewed-by: Aurelien Jarno 

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

[Qemu-devel] [PATCH-2.12 v2 3/3] xilinx_spips: Use memset instead of a for loop to zero registers

2017-12-06 Thread Alistair Francis

Use memset() instead of a for loop to zero all of the registers.

Signed-off-by: Alistair Francis 
Reviewed-by: KONRAD Frederic 
Reviewed-by: Francisco Iglesias 
---

 hw/ssi/xilinx_spips.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/hw/ssi/xilinx_spips.c b/hw/ssi/xilinx_spips.c
index b8182cfd74..59d42bfce7 100644
--- a/hw/ssi/xilinx_spips.c
+++ b/hw/ssi/xilinx_spips.c
@@ -329,10 +329,7 @@ static void xilinx_spips_reset(DeviceState *d)
 {
 XilinxSPIPS *s = XILINX_SPIPS(d);
 
-int i;
-for (i = 0; i < XLNX_SPIPS_R_MAX; i++) {
-s->regs[i] = 0;
-}
+memset(s->regs, 0, sizeof(s->regs));
 
 fifo8_reset(&s->rx_fifo);
 fifo8_reset(&s->rx_fifo);
@@ -357,13 +354,11 @@ static void xilinx_spips_reset(DeviceState *d)
 static void xlnx_zynqmp_qspips_reset(DeviceState *d)
 {
 XlnxZynqMPQSPIPS *s = XLNX_ZYNQMP_QSPIPS(d);
-int i;
 
 xilinx_spips_reset(d);
 
-for (i = 0; i < XLNX_ZYNQMP_SPIPS_R_MAX; i++) {
-s->regs[i] = 0;
-}
+memset(s->regs, 0, sizeof(s->regs));
+
 fifo8_reset(&s->rx_fifo_g);
 fifo8_reset(&s->rx_fifo_g);
 fifo32_reset(&s->fifo_g);
-- 
2.14.1

[Qemu-devel] [PATCH-2.12 v2 1/3] xilinx_spips: Update the QSPI Mod ID reset value

2017-12-06 Thread Alistair Francis

Update the reset value to match the latest ZynqMP register spec.

Signed-off-by: Alistair Francis 
Reviewed-by: KONRAD Frederic 
Reviewed-by: Francisco Iglesias 
---

 hw/ssi/xilinx_spips.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/ssi/xilinx_spips.c b/hw/ssi/xilinx_spips.c
index ad1b2ba79f..899db814ee 100644
--- a/hw/ssi/xilinx_spips.c
+++ b/hw/ssi/xilinx_spips.c
@@ -355,6 +355,7 @@ static void xlnx_zynqmp_qspips_reset(DeviceState *d)
 s->regs[R_GQSPI_RX_THRESH] = 1;
 s->regs[R_GQSPI_GFIFO_THRESH] = 1;
 s->regs[R_GQSPI_IMR] = GQSPI_IXR_MASK;
+s->regs[R_MOD_ID] = 0x01090101;
 s->man_start_com_g = false;
 s->gqspi_irqline = 0;
 xlnx_zynqmp_qspips_update_ixr(s);
-- 
2.14.1

[Qemu-devel] [PATCH-2.12 v2 0/3] Update the reset values of the Xilinx ZynqMP QSPI

2017-12-06 Thread Alistair Francis

Update the reset values of the Xilinx ZynqMP QSPI device to match the
register spec here:
https://www.xilinx.com/html_docs/registers/ug1087/ug1087-zynq-ultrascale-registers.html

V2:
 - Don't double set registers

Based-on: 20171126231634.9531-14-frasse.igles...@gmail.com

Alistair Francis (3):
  xilinx_spips: Update the QSPI Mod ID reset value
  xilinx_spips: Set all of the reset values
  xilinx_spips: Use memset instead of a for loop to zero registers

 hw/ssi/xilinx_spips.c | 45 +++
 include/hw/ssi/xilinx_spips.h |  2 +-
 2 files changed, 34 insertions(+), 13 deletions(-)

-- 
2.14.1

[Qemu-devel] [PATCH-2.12 v2 2/3] xilinx_spips: Set all of the reset values

2017-12-06 Thread Alistair Francis

Following the ZynqMP register spec let's ensure that all reset values
are set.

Signed-off-by: Alistair Francis 
---
V2:
 - Don't bother double setting registers

 hw/ssi/xilinx_spips.c | 35 ++-
 include/hw/ssi/xilinx_spips.h |  2 +-
 2 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/hw/ssi/xilinx_spips.c b/hw/ssi/xilinx_spips.c
index 899db814ee..b8182cfd74 100644
--- a/hw/ssi/xilinx_spips.c
+++ b/hw/ssi/xilinx_spips.c
@@ -66,6 +66,7 @@
 
 /* interrupt mechanism */
 #define R_INTR_STATUS   (0x04 / 4)
+#define R_INTR_STATUS_RESET (0x104)
 #define R_INTR_EN   (0x08 / 4)
 #define R_INTR_DIS  (0x0C / 4)
 #define R_INTR_MASK (0x10 / 4)
@@ -102,6 +103,9 @@
 #define R_SLAVE_IDLE_COUNT  (0x24 / 4)
 #define R_TX_THRES  (0x28 / 4)
 #define R_RX_THRES  (0x2C / 4)
+#define R_GPIO  (0x30 / 4)
+#define R_LPBK_DLY_ADJ  (0x38 / 4)
+#define R_LPBK_DLY_ADJ_RESET (0x33)
 #define R_TXD1  (0x80 / 4)
 #define R_TXD2  (0x84 / 4)
 #define R_TXD3  (0x88 / 4)
@@ -140,8 +144,12 @@
 #define R_GQSPI_IER (0x108 / 4)
 #define R_GQSPI_IDR (0x10c / 4)
 #define R_GQSPI_IMR (0x110 / 4)
+#define R_GQSPI_IMR_RESET   (0xfbe)
 #define R_GQSPI_TX_THRESH   (0x128 / 4)
 #define R_GQSPI_RX_THRESH   (0x12c / 4)
+#define R_GQSPI_GPIO_THRESH (0x130 / 4)
+#define R_GQSPI_LPBK_DLY_ADJ (0x138 / 4)
+#define R_GQSPI_LPBK_DLY_ADJ_RESET (0x33)
 #define R_GQSPI_CNFG (0x100 / 4)
 FIELD(GQSPI_CNFG, MODE_EN, 30, 2)
 FIELD(GQSPI_CNFG, GEN_FIFO_START_MODE, 29, 1)
@@ -177,8 +185,16 @@
 FIELD(GQSPI_GF_SNAPSHOT, EXPONENT, 9, 1)
 FIELD(GQSPI_GF_SNAPSHOT, DATA_XFER, 8, 1)
 FIELD(GQSPI_GF_SNAPSHOT, IMMEDIATE_DATA, 0, 8)
-#define R_GQSPI_MOD_ID (0x168 / 4)
-#define R_GQSPI_MOD_ID_VALUE  0x010A
+#define R_GQSPI_MOD_ID (0x1fc / 4)
+#define R_GQSPI_MOD_ID_RESET  (0x10a)
+
+#define R_QSPIDMA_DST_CTRL (0x80c / 4)
+#define R_QSPIDMA_DST_CTRL_RESET   (0x803ffa00)
+#define R_QSPIDMA_DST_I_MASK   (0x820 / 4)
+#define R_QSPIDMA_DST_I_MASK_RESET (0xfe)
+#define R_QSPIDMA_DST_CTRL2(0x824 / 4)
+#define R_QSPIDMA_DST_CTRL2_RESET  (0x081bfff8)
+
 /* size of TXRX FIFOs */
 #define RXFF_A  (128)
 #define TXFF_A  (128)
@@ -351,11 +367,20 @@ static void xlnx_zynqmp_qspips_reset(DeviceState *d)
 fifo8_reset(&s->rx_fifo_g);
 fifo8_reset(&s->rx_fifo_g);
 fifo32_reset(&s->fifo_g);
+s->regs[R_INTR_STATUS] = R_INTR_STATUS_RESET;
+s->regs[R_GPIO] = 1;
+s->regs[R_LPBK_DLY_ADJ] = R_LPBK_DLY_ADJ_RESET;
+s->regs[R_GQSPI_GFIFO_THRESH] = 0x10;
+s->regs[R_MOD_ID] = 0x01090101;
+s->regs[R_GQSPI_IMR] = R_GQSPI_IMR_RESET;
 s->regs[R_GQSPI_TX_THRESH] = 1;
 s->regs[R_GQSPI_RX_THRESH] = 1;
-s->regs[R_GQSPI_GFIFO_THRESH] = 1;
-s->regs[R_GQSPI_IMR] = GQSPI_IXR_MASK;
-s->regs[R_MOD_ID] = 0x01090101;
+s->regs[R_GQSPI_GPIO_THRESH] = 1;
+s->regs[R_GQSPI_LPBK_DLY_ADJ] = R_GQSPI_LPBK_DLY_ADJ_RESET;
+s->regs[R_GQSPI_MOD_ID] = R_GQSPI_MOD_ID_RESET;
+s->regs[R_QSPIDMA_DST_CTRL] = R_QSPIDMA_DST_CTRL_RESET;
+s->regs[R_QSPIDMA_DST_I_MASK] = R_QSPIDMA_DST_I_MASK_RESET;
+s->regs[R_QSPIDMA_DST_CTRL2] = R_QSPIDMA_DST_CTRL2_RESET;
 s->man_start_com_g = false;
 s->gqspi_irqline = 0;
 xlnx_zynqmp_qspips_update_ixr(s);
diff --git a/include/hw/ssi/xilinx_spips.h b/include/hw/ssi/xilinx_spips.h
index 75fc94ce5d..d398a4e81c 100644
--- a/include/hw/ssi/xilinx_spips.h
+++ b/include/hw/ssi/xilinx_spips.h
@@ -32,7 +32,7 @@
 typedef struct XilinxSPIPS XilinxSPIPS;
 
 #define XLNX_SPIPS_R_MAX (0x100 / 4)
-#define XLNX_ZYNQMP_SPIPS_R_MAX (0x200 / 4)
+#define XLNX_ZYNQMP_SPIPS_R_MAX (0x830 / 4)
 
 /* Bite off 4k chunks at a time */
 #define LQSPI_CACHE_SIZE 1024
-- 
2.14.1

Re: [Qemu-devel] [PATCH v2] hw/ide: Remove duplicated definitions from ahci_internal.h

2017-12-06 Thread John Snow

I tweaked this again, sorry:

The names need to stay public, but the wrappers to manipulate the
objects can stay internal. Minor difference.

If that's okay, I'll just merge this in.
OK?

--js

diff --git a/hw/ide/ahci_internal.h b/hw/ide/ahci_internal.h
index ce2e818c8c..8c755d4ca1 100644
--- a/hw/ide/ahci_internal.h
+++ b/hw/ide/ahci_internal.h
@@ -311,8 +311,6 @@ struct AHCIPCIState {
 AHCIState ahci;
 };

-#define TYPE_ICH9_AHCI "ich9-ahci"
-
 #define ICH_AHCI(obj) \
 OBJECT_CHECK(AHCIPCIState, (obj), TYPE_ICH9_AHCI)

@@ -375,10 +373,8 @@ void ahci_uninit(AHCIState *s);

 void ahci_reset(AHCIState *s);

-#define TYPE_SYSBUS_AHCI "sysbus-ahci"
 #define SYSBUS_AHCI(obj) OBJECT_CHECK(SysbusAHCIState, (obj), TYPE_SYSBUS_AHCI)

-#define TYPE_ALLWINNER_AHCI "allwinner-ahci"
 #define ALLWINNER_AHCI(obj) OBJECT_CHECK(AllwinnerAHCIState, (obj), \
TYPE_ALLWINNER_AHCI)

diff --git a/include/hw/ide/ahci.h b/include/hw/ide/ahci.h
index 5a06537e6b..b7bb2b02d6 100644
--- a/include/hw/ide/ahci.h
+++ b/include/hw/ide/ahci.h
@@ -54,14 +54,10 @@ typedef struct AHCIPCIState AHCIPCIState;

 #define TYPE_ICH9_AHCI "ich9-ahci"

-#define ICH_AHCI(obj) \
-OBJECT_CHECK(AHCIPCIState, (obj), TYPE_ICH9_AHCI)
-
 int32_t ahci_get_num_ports(PCIDevice *dev);
 void ahci_ide_create_devs(PCIDevice *dev, DriveInfo **hd);

 #define TYPE_SYSBUS_AHCI "sysbus-ahci"
-#define SYSBUS_AHCI(obj) OBJECT_CHECK(SysbusAHCIState, (obj), TYPE_SYSBUS_AHCI)

 typedef struct SysbusAHCIState {
 /*< private >*/
@@ -73,8 +69,6 @@ typedef struct SysbusAHCIState {
 } SysbusAHCIState;

 #define TYPE_ALLWINNER_AHCI "allwinner-ahci"
-#define ALLWINNER_AHCI(obj) OBJECT_CHECK(AllwinnerAHCIState, (obj), \
-   TYPE_ALLWINNER_AHCI)

 #define ALLWINNER_AHCI_MMIO_OFF  0x80
 #define ALLWINNER_AHCI_MMIO_SIZE 0x80


On 12/05/2017 02:10 AM, Thomas Huth wrote:
> The same definitions can also be found in include/hw/ide/ahci.h
> so let's remove these #defines from ahci_internal.h.
> 
> Signed-off-by: Thomas Huth 
> ---
>  v2: Also remove TYPE_ICH9_AHCI as suggested by John
> 
>  hw/ide/ahci_internal.h | 12 
>  1 file changed, 12 deletions(-)
> 
> diff --git a/hw/ide/ahci_internal.h b/hw/ide/ahci_internal.h
> index ce2e818..e3e3ed2 100644
> --- a/hw/ide/ahci_internal.h
> +++ b/hw/ide/ahci_internal.h
> @@ -311,11 +311,6 @@ struct AHCIPCIState {
>  AHCIState ahci;
>  };
>  
> -#define TYPE_ICH9_AHCI "ich9-ahci"
> -
> -#define ICH_AHCI(obj) \
> -OBJECT_CHECK(AHCIPCIState, (obj), TYPE_ICH9_AHCI)
> -
>  extern const VMStateDescription vmstate_ahci;
>  
>  #define VMSTATE_AHCI(_field, _state) {   \
> @@ -375,11 +370,4 @@ void ahci_uninit(AHCIState *s);
>  
>  void ahci_reset(AHCIState *s);
>  
> -#define TYPE_SYSBUS_AHCI "sysbus-ahci"
> -#define SYSBUS_AHCI(obj) OBJECT_CHECK(SysbusAHCIState, (obj), 
> TYPE_SYSBUS_AHCI)
> -
> -#define TYPE_ALLWINNER_AHCI "allwinner-ahci"
> -#define ALLWINNER_AHCI(obj) OBJECT_CHECK(AllwinnerAHCIState, (obj), \
> -   TYPE_ALLWINNER_AHCI)
> -
>  #endif /* HW_IDE_AHCI_H */
>

Re: [Qemu-devel] [Qemu-block] [PATCH 2/7] ide: account UNMAP (TRIM) operations

2017-12-06 Thread John Snow

On 12/05/2017 12:14 PM, Anton Nefedov wrote:
> 
> 
> On 5/12/2017 6:21 PM, Alberto Garcia wrote:
>> On Mon 20 Nov 2017 05:50:59 PM CET, Anton Nefedov wrote:
>>> Signed-off-by: Anton Nefedov 
>>> Reviewed-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>   hw/ide/core.c | 12 
>>>   1 file changed, 12 insertions(+)
>>>
>>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>>> index 471d0c9..2e4dea7 100644
>>> --- a/hw/ide/core.c
>>> +++ b/hw/ide/core.c
>>> @@ -389,6 +389,7 @@ typedef struct TrimAIOCB {
>>>   QEMUIOVector *qiov;
>>>   BlockAIOCB *aiocb;
>>>   int i, j;
>>> +    BlockAcctCookie acct;
>>>   } TrimAIOCB;
>>>     static void trim_aio_cancel(BlockAIOCB *acb)
>>> @@ -426,6 +427,14 @@ static void ide_trim_bh_cb(void *opaque)
>>>   static void ide_issue_trim_cb(void *opaque, int ret)
>>>   {
>>>   TrimAIOCB *iocb = opaque;
>>> +    if (iocb->i >= 0) {
>>> +        if (ret >= 0) {
>>> +            block_acct_done(blk_get_stats(iocb->blk), &iocb->acct);
>>> +        } else {
>>> +            block_acct_failed(blk_get_stats(iocb->blk), &iocb->acct);
>>> +        }
>>> +    }
>>
>> This part looks fine, but don't you also need to account for invalid
>> requests (in ide_dma_cb() or somewhere else) ?
>>

not ide_dma_cb this time, because the command does not use ATA registers
as input, see below :(

>> Berto
>>
> 
> Good point; in fact, the TRIM sector range is never checked.
> (well it should be, down at the block layer, and then counted as error).
> 
> The motivation was:
> 
>     commit d66168ed687325aa6d338ce3a3cff18ce3098ed6
>     Author: Michael Tokarev 
>     Date:   Wed Aug 13 11:23:31 2014 +0400
> 
>     ide: only constrain read/write requests to drive size, not other types
> 
>     Commit 58ac321135a introduced a check to ide dma processing which
>     constrains all requests to drive size.  However, apparently, some
>     valid requests (like TRIM) does not fit in this constraint, and
>     fails in 2.1.  So check the range only for reads and writes.
> 

I wound up at the same commit. The problem here is that the TRIM command
does not issue contiguous LBA+count requests in the same way using the
ATA registers like DMA R/W functions do, but instead works a bit more
like DMA commands in that it transmits a list of regions separately.

Kevin pointed this out in 2014:

https://lists.gnu.org/archive/html/qemu-devel/2014-08/msg02012.html

"I can't give you a clear answer, but it all depends on the value of
sector_num. This is the contents of the LBA registers and unused for
TRIM (spec says it's reserved, so we shouldn't be looking at it)."

He's right! This means the LBA/Count registers are undefined when we're
using TRIM in ide_dma_cb.

So, you have two things you can do:

(1) Modify ide_sector_start_dma to start accounting for TRIM early, and
then continue to mark it failed/complete where you do, but you'll need
to add in a new error case in ide_issue_trim_cb to check the range and
abort the job. We do not need to preprocess the entire "list" of TRIM
regions upfront as we are within our rights to process some of them
before aborting.

(2) Continue to start accounting where you do, per-region, which keeps
trim requests "per chunk", but add the error range checking almost
immediately after, if this is useful for statistical purposes. It will
look a little funny to start accounting and then immediately and
synchronously mark it invalid, but that's probably the most accurate thing.

I'm basing this off of the ATA8-ACS3 spec, which defines the command
"DATA SET MANAGEMENT - 06h, DMA":

> If the Trim bit is set to one and:
> a) the device detects an invalid LBA Range Entry; or
> b) count is greater than IDENTIFY DEVICE data word 105 (see 7.16.7.55),
> then the device shall return command aborted.
> A device may trim one or more LBA Range Entries before it returns
> command aborted. See table 209.

ATA8-ACS3 section  seems pretty clear to me that we *may* abort the
command if we detect an "invalid LBA Range Entry" which is defined in
3.1.40 as:

"3.1.40 Invalid LBA Range: A range of LBAs that contains one or more
invalid LBAs."

which references 3.1.39:

"3.1.39 Invalid LBA: An LBA that is greater than or equal to the largest
value reported in IDENTIFY DEVICE data words 60..61 (see 7.16.7.22),
IDENTIFY DEVICE data words 100..103 (see 7.16.7.53), or IDENTIFY DEVICE
data words 230..233 (see 7.16.7.88)."

Of course, we only know if a range could possibly be invalid by the time
we actually process it in ide_issue_trim_cb.
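
The per-entry check discussed above can be sketched roughly as follows.
This assumes the DATA SET MANAGEMENT payload consists of 8-byte entries
with the LBA in bits 47:0 and the range length in bits 63:48, and that the
entry has already been converted to host byte order. The function and
parameter names are hypothetical; this is not the code in hw/ide/core.c.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch: decode one LBA Range Entry and report whether it
 * lies entirely within a device of nb_sectors sectors. An entry with a
 * zero count is treated as padding and accepted (an assumption of this
 * sketch).
 */
static bool trim_entry_valid_sketch(uint64_t entry, uint64_t nb_sectors,
                                    uint64_t *lba, uint16_t *count)
{
    *lba = entry & 0x0000ffffffffffffULL;   /* bits 47:0  */
    *count = (uint16_t)(entry >> 48);       /* bits 63:48 */

    if (*count == 0) {
        return true;
    }
    /* lba < 2^48 and count < 2^16, so the sum cannot overflow. */
    return *lba + *count <= nb_sectors;
}
```

A caller walking the entry list could stop at the first entry for which
this returns false and mark the request as failed or invalid there, which
fits the allowance to trim some entries before aborting.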

> 
> It seems like the removed check was at the wrong place (trim request has
> to be parsed first to get offset and nbytes).
> Probably it should be put to ide_issue_trim_cb() instead.
> 
> cc John (should have done it earlier)
> 

Hi!

> /Anton
>

Re: [Qemu-devel] [PATCH v5 01/23] memattrs: add debug attribute

2017-12-06 Thread Peter Maydell

On 6 December 2017 at 20:03, Brijesh Singh  wrote:
> The debug attribute will be set when qemu attempts to access the guest
> memory for debug (e.g memory access from gdbstub, memory dump commands
> etc).
>
> When guest memory is encrypted, the debug access will need to go through
> the memory encryption APIs.
>
> Cc: Alistair Francis 
> Cc: Peter Maydell 
> Cc: "Edgar E. Iglesias" 
> Cc: Richard Henderson 
> Cc: Paolo Bonzini 
> Signed-off-by: Brijesh Singh 
> ---
>  include/exec/memattrs.h | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/include/exec/memattrs.h b/include/exec/memattrs.h
> index d4a16420984b..721362e06292 100644
> --- a/include/exec/memattrs.h
> +++ b/include/exec/memattrs.h
> @@ -37,6 +37,8 @@ typedef struct MemTxAttrs {
>  unsigned int user:1;
>  /* Requester ID (for MSI for example) */
>  unsigned int requester_id:16;
> +/* Debug memory access for encrypted guest */
> +unsigned int debug:1;
>  } MemTxAttrs;

Can we have some more detailed semantics for this please?
For instance, if a device gets a debug=1 transaction
should it refuse to do things like read-clears-bits
semantics or other side-effects you wouldn't expect
of debugger accesses?
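
To make the question concrete, here is one possible semantics, sketched
for a made-up device with a read-to-clear status register. Everything in
it is an assumption for illustration: SketchTxAttrs is a cut-down stand-in
for MemTxAttrs, SketchDevice and sketch_status_read are hypothetical, and
the patch itself does not define this behaviour.

```c
#include <stdint.h>

/* Hypothetical, cut-down stand-in for the attribute struct in the patch. */
typedef struct {
    unsigned int debug:1;
} SketchTxAttrs;

/* Hypothetical device with a read-to-clear interrupt status register. */
typedef struct {
    uint32_t irq_status;
} SketchDevice;

/*
 * Read the status register. A normal read clears the pending bits; a
 * debug read returns the same value but leaves device state untouched.
 */
static uint32_t sketch_status_read(SketchDevice *dev, SketchTxAttrs attrs)
{
    uint32_t val = dev->irq_status;

    if (!attrs.debug) {
        dev->irq_status = 0;
    }
    return val;
}
```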

thanks
-- PMM

Re: [Qemu-devel] [PATCH 1/2] target/sh4: add missing tcg_temp_free() in gen_conditional_jump()

2017-12-06 Thread Aurelien Jarno

On 2017-12-05 14:00, Philippe Mathieu-Daudé wrote:
> missed in c55497ecb8c.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  target/sh4/translate.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/target/sh4/translate.c b/target/sh4/translate.c
> index 703020fe87..5aeaabdd8d 100644
> --- a/target/sh4/translate.c
> +++ b/target/sh4/translate.c
> @@ -322,13 +322,16 @@ static void gen_delayed_conditional_jump(DisasContext * 
> ctx)
>  gen_jump(ctx);
>  
>  gen_set_label(l1);
> -return;
> +goto done;
>  }
>  
>  tcg_gen_brcondi_i32(TCG_COND_NE, ds, 0, l1);
>  gen_goto_tb(ctx, 1, ctx->pc + 2);
>  gen_set_label(l1);
>  gen_jump(ctx);
> +
> +done:
> +tcg_temp_free(ds);
>  }
>  
>  static inline void gen_load_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)

AFAIR, temps are not preserved across a branch (contrary to local
temps), so I am not sure they need to be freed.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH 2/2] target/sh4: add missing tcg_temp_free() in _decode_opc()

2017-12-06 Thread Aurelien Jarno

On 2017-12-05 14:00, Philippe Mathieu-Daudé wrote:
> missed in c55497ecb8c and 852d481faf7.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  target/sh4/translate.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/target/sh4/translate.c b/target/sh4/translate.c
> index 5aeaabdd8d..62d01227fc 100644
> --- a/target/sh4/translate.c
> +++ b/target/sh4/translate.c
> @@ -604,6 +604,7 @@ static void _decode_opc(DisasContext * ctx)
>   tcg_gen_subi_i32(addr, REG(B11_8), 4);
>  tcg_gen_qemu_st_i32(REG(B7_4), addr, ctx->memidx, MO_TEUL);
>   tcg_gen_mov_i32(REG(B11_8), addr);
> +tcg_temp_free(addr);
>   }
>   return;
>  case 0x6004: /* mov.b @Rm+,Rn */
> @@ -1527,6 +1528,7 @@ static void _decode_opc(DisasContext * ctx)
>  tcg_gen_qemu_ld_i32(val, REG(B11_8), ctx->memidx, MO_TEUL);
>  gen_helper_movcal(cpu_env, REG(B11_8), val);
>  tcg_gen_qemu_st_i32(REG(0), REG(B11_8), ctx->memidx, MO_TEUL);
> +tcg_temp_free(val);
>  }
>  ctx->has_movcal = 1;
>   return;

Good catch!

Reviewed-by: Aurelien Jarno 

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [Qemu-block] [PATCH 4/4] iotests: add dirty bitmap migration test

2017-12-06 Thread John Snow



On 12/06/2017 04:51 AM, Vladimir Sementsov-Ogievskiy wrote:
> 28.11.2017 10:14, Vladimir Sementsov-Ogievskiy wrote:
>> The test creates two VMs (vm_a, vm_b), creates a dirty bitmap in
>> the first one, does several writes to the corresponding device, and
>> then migrates vm_a to vm_b with dirty bitmaps.
>>
>> For now, only migration through shared storage for persistent
>> bitmaps is available, so it is tested here. Only offline variant
>> is tested for now (a kind of suspend-resume), as it is needed
>> to test that this case is successfully fixed by recent patch.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>   tests/qemu-iotests/169    | 82
>> +++
>>   tests/qemu-iotests/169.out    |  5 +++
>>   tests/qemu-iotests/group  |  1 +
>>   tests/qemu-iotests/iotests.py |  7 +++-
>>   4 files changed, 94 insertions(+), 1 deletion(-)
>>   create mode 100755 tests/qemu-iotests/169
>>   create mode 100644 tests/qemu-iotests/169.out
>>
>> diff --git a/tests/qemu-iotests/169 b/tests/qemu-iotests/169
>> new file mode 100755
>> index 00..a0f213b274
>> --- /dev/null
>> +++ b/tests/qemu-iotests/169
>> @@ -0,0 +1,82 @@
>> +#!/usr/bin/env python
>> +#
>> +# Tests for dirty bitmaps migration.
>> +#
>> +# Copyright (c) 2016-2017 Parallels International GmbH
>> +#
>> +# This program is free software; you can redistribute it and/or modify
>> +# it under the terms of the GNU General Public License as published by
>> +# the Free Software Foundation; either version 2 of the License, or
>> +# (at your option) any later version.
>> +#
>> +# This program is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> +#
>> +
>> +import os
>> +import iotests
>> +import time
>> +from iotests import qemu_img
>> +
>> +disk = os.path.join(iotests.test_dir, 'disk')
>> +migfile = os.path.join(iotests.test_dir, 'migfile')
>> +
>> +class TestPersistentDirtyBitmapSuspendResume(iotests.QMPTestCase):
>> +
>> +    def tearDown(self):
>> +    self.vm_a.shutdown()
>> +    self.vm_b.shutdown()
>> +    os.remove(disk)
>> +    os.remove(migfile)
>> +
>> +    def setUp(self):
>> +    qemu_img('create', '-f', iotests.imgfmt, disk, '1M')
>> +
>> +    self.vm_a = iotests.VM(path_suffix='a').add_drive(disk)
>> +    self.vm_a.launch()
>> +
>> +    self.vm_b = iotests.VM(path_suffix='b').add_drive(disk)
>> +    self.vm_b.add_incoming("exec: cat '" + migfile + "'")
>> +
>> +    def test_migration_persistent_shared_offline(self):
>> +    """ A kind of suspend-resume """
>> +    granularity = 512
>> +    regions = [
>> +    { 'start': 0,   'count': 0x1 },
>> +    { 'start': 0xf, 'count': 0x1 },
>> +    { 'start': 0xa0201, 'count': 0x1000  }
>> +    ]
>> +
>> +    result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
>> +   name='bitmap0', granularity=granularity,
>> +   persistent=True, autoload=True)
>> +    self.assert_qmp(result, 'return', {});
>> +
>> +    for r in regions:
>> +    self.vm_a.hmp_qemu_io('drive0',
>> +  'write %d %d' % (r['start'],
>> r['count']))
>> +
>> +    result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
>> +   node='drive0', name='bitmap0')
>> +    sha256 = result['return']['sha256']
>> +
>> +    result = self.vm_a.qmp('migrate', uri='exec:cat>' + migfile)
>> +    self.assert_qmp(result, 'return', {});
>> +    self.assertNotEqual(self.vm_a.event_wait("STOP"), None)
>> +    self.vm_a.shutdown()
>> +
>> +    self.vm_b.launch()
>> +    self.vm_b.event_wait("RESUME", timeout=10)
> 
> with previous patch dropped, please fix it to be 10.0

Oh, I see, it gets confused over integral values? We should probably fix
that but it can be separate for now.

Everything looks good to me in that case, thanks

Re: [Qemu-devel] [RFC 4/7] vhost: update_mem_cb implementation

2017-12-06 Thread Dr. David Alan Gilbert

* Igor Mammedov (imamm...@redhat.com) wrote:
> On Wed, 29 Nov 2017 18:50:23 +
> "Dr. David Alan Gilbert (git)"  wrote:
> 
> > From: "Dr. David Alan Gilbert" 
> > 
> > Add the meat of update_mem_cb;  this is called for each region,
> > to add a region to our temporary list.
> > Our temporary list is in order, so we look to see whether this
> > region can be merged with the current head of the list.
> > 
> > Signed-off-by: Dr. David Alan Gilbert 
> > ---
> >  hw/virtio/trace-events |  2 ++
> >  hw/virtio/vhost.c  | 55 
> > +-
> >  2 files changed, 56 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > index 4a493bcd46..92fadec192 100644
> > --- a/hw/virtio/trace-events
> > +++ b/hw/virtio/trace-events
> > @@ -2,6 +2,8 @@
> >  
> >  # hw/virtio/vhost.c
> >  vhost_section(const char *name, int r) "%s:%d"
> > +vhost_update_mem_cb(const char *name, uint64_t gpa, uint64_t size, 
> > uint64_t host) "%s: 0x%"PRIx64"+0x%"PRIx64" @ 0x%"PRIx64
> > +vhost_update_mem_cb_abut(const char *name, uint64_t new_size) "%s: 
> > 0x%"PRIx64
> >  
> >  # hw/virtio/virtio.c
> >  virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned 
> > out_num) "elem %p size %zd in_num %u out_num %u"
> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > index c959a59fb3..7e3c6ae032 100644
> > --- a/hw/virtio/vhost.c
> > +++ b/hw/virtio/vhost.c
> > @@ -638,11 +638,64 @@ struct vhost_update_mem_tmp {
> >  /* Called for each MRS from vhost_update_mem */
> >  static int vhost_update_mem_cb(MemoryRegionSection *mrs, void *opaque)
> >  {
> > +struct vhost_update_mem_tmp *vtmp = opaque;
> > +struct vhost_memory_region *cur_vmr;
> > +bool need_add = true;
> > +uint64_t mrs_size;
> > +uint64_t mrs_gpa;
> > +uintptr_t mrs_host;
> > +
> >  if (!vhost_section(mrs)) {
> >  return 0;
> >  }
> > +mrs_size = int128_get64(mrs->size);
> > +mrs_gpa  = mrs->offset_within_address_space;
> > +mrs_host = (uintptr_t)memory_region_get_ram_ptr(mrs->mr) +
> > + mrs->offset_within_region;
> > +
> > +trace_vhost_update_mem_cb(mrs->mr->name, mrs_gpa, mrs_size,
> > +  (uint64_t)mrs_host);
> > +
> > +if (vtmp->nregions) {
> What forces you to maintain helper vhost_memory_region array
> instead of MemoryRegionSection array?

I looked at this - neither is nice.
I think I need to keep the real dev->mem->regions to keep vhost
happy.  If I've got to keep that then I've got to produce something
in that format; so producing it here and comparing it at the end
(possibly with your simple memcmp) works nicely.

The other downside of keeping working with the MemoryRegionSections is
that they have the size as an Int128 which is a pain to work with.
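
The "build it here, compare at the end" approach could look roughly like
the sketch below. SketchRegion is a hypothetical stand-in with the same
four 64-bit fields that appear in the patch (guest_phys_addr, memory_size,
userspace_addr, flags_padding), and regions_changed_sketch is an
illustrative name, not a function from the series.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct vhost_memory_region. */
typedef struct {
    uint64_t guest_phys_addr;
    uint64_t memory_size;
    uint64_t userspace_addr;
    uint64_t flags_padding;
} SketchRegion;

/*
 * Returns true when the freshly built list differs from the one currently
 * in use, i.e. when the backend would need to be given a new table.
 * All fields are uint64_t, so the struct has no padding bytes and a plain
 * memcmp over the array is a valid equality test.
 */
static bool regions_changed_sketch(const SketchRegion *old_r, size_t old_n,
                                   const SketchRegion *new_r, size_t new_n)
{
    if (old_n != new_n) {
        return true;
    }
    if (old_n == 0) {
        return false;
    }
    return memcmp(old_r, new_r, old_n * sizeof(*old_r)) != 0;
}
```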

Dave

> > +/* Since we already have at least one region, lets see if
> > + * this extends it; since we're scanning in order, we only
> > + * have to look at the last one, and the FlatView that calls
> > + * us shouldn't have overlaps.
> > + */
> > +struct vhost_memory_region *prev_vmr = vtmp->regions +
> > +   (vtmp->nregions - 1);
> > +uint64_t prev_gpa_start = prev_vmr->guest_phys_addr;
> > +uint64_t prev_gpa_end   = range_get_last(prev_gpa_start,
> > + prev_vmr->memory_size);
> > +uint64_t prev_host_start = prev_vmr->userspace_addr;
> > +uint64_t prev_host_end   = range_get_last(prev_host_start,
> > +  prev_vmr->memory_size);
> > +
> > +if (prev_gpa_end + 1 == mrs_gpa &&
> > +prev_host_end + 1 == mrs_host &&
> > +(!vtmp->dev->vhost_ops->vhost_backend_can_merge ||
> > +vtmp->dev->vhost_ops->vhost_backend_can_merge(vtmp->dev,
> > +mrs_host, mrs_size,
> > +prev_host_start, prev_vmr->memory_size))) {
> > +/* The two regions abut */
> > +need_add = false;
> > +mrs_size = mrs_size + prev_vmr->memory_size;
> > +prev_vmr->memory_size = mrs_size;
> > +trace_vhost_update_mem_cb_abut(mrs->mr->name, mrs_size);
> > +}
> > +}
> > +
> > +if (need_add) {
> > +vtmp->nregions++;
> > +vtmp->regions = g_realloc_n(vtmp->regions, vtmp->nregions,
> > +sizeof(vtmp->regions[0]));
> > +cur_vmr = &vtmp->regions[vtmp->nregions - 1];
> > +cur_vmr->guest_phys_addr = mrs_gpa;
> > +cur_vmr->memory_size = mrs_size;
> > +cur_vmr->userspace_addr  = mrs_host;
> > +cur_vmr->flags_padding = 0;
> > +}
> >  
> > -/* TODO */
> >  return 0;
> >  }
> >  
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester,

[Qemu-devel] [PATCH v5 20/23] hw: i386: set ram_debug_ops when memory encryption is enabled

2017-12-06 Thread Brijesh Singh

When memory encryption is enabled, the guest RAM and boot flash ROM will
contain encrypted data. Setting the debug ops allows us to invoke the
encryption APIs when accessing this memory for debug purposes.

Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: "Michael S. Tsirkin" 
Signed-off-by: Brijesh Singh 
---
 hw/i386/pc.c   | 9 +
 hw/i386/pc_sysfw.c | 6 ++
 2 files changed, 15 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 186545d2a4e5..937cf75d5545 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1355,6 +1355,15 @@ void pc_memory_init(PCMachineState *pcms,
 e820_add_entry(0x1ULL, pcms->above_4g_mem_size, E820_RAM);
 }
 
+/*
+ * When memory encryption is enabled, the guest RAM will be encrypted with
+ * a guest unique key. Set the debug ops so that any debug access to the
+ * guest RAM will go through the memory encryption APIs.
+ */
+if (kvm_memcrypt_enabled()) {
+kvm_memcrypt_set_debug_ops(ram);
+}
+
 if (!pcmc->has_reserved_memory &&
 (machine->ram_slots ||
  (machine->maxram_size > machine->ram_size))) {
diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
index 8ddbbf74d330..3d149b1c9f3c 100644
--- a/hw/i386/pc_sysfw.c
+++ b/hw/i386/pc_sysfw.c
@@ -180,6 +180,12 @@ static void pc_system_flash_init(MemoryRegion *rom_memory)
 error_report("failed to encrypt pflash rom");
 exit(1);
 }
+
+/*
+ * The pflash ROM is encrypted, set the debug ops so that any
+ * debug accesses will use memory encryption APIs.
+ */
+kvm_memcrypt_set_debug_ops(flash_mem);
 }
 }
 }
-- 
2.9.5

[Qemu-devel] [PATCH v5 19/23] sev: Finalize the SEV guest launch flow

2017-12-06 Thread Brijesh Singh

The SEV launch flow requires us to issue the LAUNCH_FINISH command before
the guest is ready to run.
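
The finalize step can be pictured as a small state machine. The sketch
below is a toy model only: the toy_* names are hypothetical, the state
names merely mirror the SEV_STATE_* enum from this series, and the real
LAUNCH_FINISH is an ioctl rather than a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the launch states used by the series. */
enum toy_sev_state {
    TOY_SEV_STATE_INVALID = 0,
    TOY_SEV_STATE_LUPDATE,   /* set after LAUNCH_START */
    TOY_SEV_STATE_SECRET,    /* state from which LAUNCH_FINISH is issued */
    TOY_SEV_STATE_RUNNING,   /* launch finalized, guest may run */
};

struct toy_sev {
    enum toy_sev_state cur_state;
    int finish_calls;        /* counts how often the finish step ran */
};

/* Stand-in for issuing LAUNCH_FINISH (assumption: no real ioctl here). */
static void toy_launch_finish(struct toy_sev *s)
{
    assert(s->cur_state == TOY_SEV_STATE_SECRET);
    s->finish_calls++;
    s->cur_state = TOY_SEV_STATE_RUNNING;
}

/* Called on every run-state change; finalizes the launch at most once. */
static void toy_vm_state_change(struct toy_sev *s, bool running)
{
    if (running && s->cur_state == TOY_SEV_STATE_SECRET) {
        toy_launch_finish(s);
    }
}
```

Because the handler checks the current state, later stop/resume cycles
do not issue the finish step a second time.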

Cc: Paolo Bonzini 
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 accel/kvm/sev.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/accel/kvm/sev.c b/accel/kvm/sev.c
index c0eea371fa06..fbbd99becc0a 100644
--- a/accel/kvm/sev.c
+++ b/accel/kvm/sev.c
@@ -454,6 +454,35 @@ static Notifier sev_machine_done_notify = {
 .notify = sev_launch_get_measure,
 };
 
+static void
+sev_launch_finish(SEVState *s)
+{
+int ret, error;
+
+ret = sev_ioctl(KVM_SEV_LAUNCH_FINISH, 0, &error);
+if (ret) {
+error_report("%s: LAUNCH_FINISH ret=%d fw_error=%d '%s'",
+ __func__, ret, error, fw_error_to_str(error));
+exit(1);
+}
+
+s->cur_state = SEV_STATE_RUNNING;
+DPRINTF("SEV: LAUNCH_FINISH\n");
+}
+
+static void
+sev_vm_state_change(void *opaque, int running, RunState state)
+{
+SEVState *s = opaque;
+
+if (running) {
+/* we are about to resume the guest, finalize the launch flow */
+if (s->cur_state == SEV_STATE_SECRET) {
+sev_launch_finish(s);
+}
+}
+}
+
 void *
 sev_guest_init(const char *id)
 {
@@ -497,6 +526,7 @@ sev_guest_init(const char *id)
 
 ram_block_notifier_add(&sev_ram_notifier);
 qemu_add_machine_init_done_notifier(&sev_machine_done_notify);
+qemu_add_vm_change_state_handler(sev_vm_state_change, s);
 
 sev_state = s;
 
-- 
2.9.5

[Qemu-devel] [PATCH v5 13/23] hmp: display memory encryption support in 'info kvm'

2017-12-06 Thread Brijesh Singh

Update 'info kvm' to display the memory encryption support.

(qemu) info kvm
kvm support: enabled
memory encryption: disabled
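
For illustration, the two-line output above could be produced by a
helper like the following. This is a sketch under assumptions: the
toy_* names are hypothetical and the helper writes into a caller
buffer instead of using monitor_printf().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the KvmInfo fields used by 'info kvm'. */
struct toy_kvm_info {
    bool present;
    bool enabled;
    bool mem_encryption;
};

/* Format the report into buf; returns what snprintf() returns. */
static int toy_format_info_kvm(const struct toy_kvm_info *info,
                               char *buf, size_t len)
{
    assert(info != NULL && buf != NULL);
    if (!info->present) {
        return snprintf(buf, len, "kvm support: not compiled\n");
    }
    return snprintf(buf, len, "kvm support: %s\nmemory encryption: %s\n",
                    info->enabled ? "enabled" : "disabled",
                    info->mem_encryption ? "enabled" : "disabled");
}
```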

Cc: "Dr. David Alan Gilbert" 
Cc: Eric Blake 
Cc: Markus Armbruster 
Cc: Paolo Bonzini 
Signed-off-by: Brijesh Singh 
---
 hmp.c| 2 ++
 qapi-schema.json | 5 -
 qmp.c| 1 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hmp.c b/hmp.c
index 35a704182494..3184ed5d1550 100644
--- a/hmp.c
+++ b/hmp.c
@@ -88,6 +88,8 @@ void hmp_info_kvm(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "kvm support: ");
 if (info->present) {
 monitor_printf(mon, "%s\n", info->enabled ? "enabled" : "disabled");
+monitor_printf(mon, "memory encryption: %s\n",
+   info->mem_encryption ? "enabled" : "disabled");
 } else {
 monitor_printf(mon, "not compiled\n");
 }
diff --git a/qapi-schema.json b/qapi-schema.json
index 18457954a841..7eec403cd34a 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -314,9 +314,12 @@
 #
 # @present: true if KVM acceleration is built into this executable
 #
+# @mem-encryption: true if Memory Encryption is active (since 2.11)
+#
 # Since: 0.14.0
 ##
-{ 'struct': 'KvmInfo', 'data': {'enabled': 'bool', 'present': 'bool'} }
+{ 'struct': 'KvmInfo', 'data': {'enabled': 'bool', 'present': 'bool',
+'mem-encryption' : 'bool'} }
 
 ##
 # @query-kvm:
diff --git a/qmp.c b/qmp.c
index e8c303116af2..baf367af55c0 100644
--- a/qmp.c
+++ b/qmp.c
@@ -69,6 +69,7 @@ KvmInfo *qmp_query_kvm(Error **errp)
 
 info->enabled = kvm_enabled();
 info->present = kvm_available();
+info->mem_encryption = kvm_memcrypt_enabled();
 
 return info;
 }
-- 
2.9.5

[Qemu-devel] [PATCH v5 16/23] target/i386: encrypt bios rom

2017-12-06 Thread Brijesh Singh

SEV requires that the guest BIOS be encrypted before booting the guest.

Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Signed-off-by: Brijesh Singh 
---
 hw/i386/pc_sysfw.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
index 6b183747fcea..8ddbbf74d330 100644
--- a/hw/i386/pc_sysfw.c
+++ b/hw/i386/pc_sysfw.c
@@ -112,6 +112,8 @@ static void pc_system_flash_init(MemoryRegion *rom_memory)
 pflash_t *system_flash;
 MemoryRegion *flash_mem;
 char name[64];
+void *flash_ptr;
+int ret, flash_size;
 
 sector_bits = 12;
 sector_size = 1 << sector_bits;
@@ -168,6 +170,17 @@ static void pc_system_flash_init(MemoryRegion *rom_memory)
 if (unit == 0) {
 flash_mem = pflash_cfi01_get_memory(system_flash);
 pc_isa_bios_init(rom_memory, flash_mem, size);
+
+/* Encrypt the pflash boot ROM */
+if (kvm_memcrypt_enabled()) {
+flash_ptr = memory_region_get_ram_ptr(flash_mem);
+flash_size = memory_region_size(flash_mem);
+ret = kvm_memcrypt_encrypt_data(flash_ptr, flash_size);
+if (ret) {
+error_report("failed to encrypt pflash rom");
+exit(1);
+}
+}
 }
 }
 }
-- 
2.9.5

[Qemu-devel] [PATCH v5 12/23] kvm: introduce memory encryption APIs

2017-12-06 Thread Brijesh Singh

In order to integrate the Secure Encrypted Virtualization (SEV) support,
add a few high-level memory encryption APIs which can be used for encrypting
the guest memory region.
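
The shape of these APIs is a thin dispatch layer: a backend registers
an opaque handle plus hooks, and the generic call fails when nothing is
registered. A minimal sketch of that pattern, with hypothetical toy_*
names and an XOR backend that only stands in for real encryption:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accelerator state: opaque backend handle plus one hook. */
struct toy_accel {
    void *memcrypt_handle;
    int (*encrypt_data)(void *handle, uint8_t *ptr, uint64_t len);
};

/* Encryption counts as enabled once a backend handle is registered. */
static int toy_memcrypt_enabled(const struct toy_accel *a)
{
    return a != NULL && a->memcrypt_handle != NULL;
}

/* Forward to the backend hook; return 1 (failure) when none is set. */
static int toy_memcrypt_encrypt_data(struct toy_accel *a,
                                     uint8_t *ptr, uint64_t len)
{
    if (toy_memcrypt_enabled(a) && a->encrypt_data) {
        return a->encrypt_data(a->memcrypt_handle, ptr, len);
    }
    return 1;
}

/* Toy backend: XOR with a one-byte key. This is NOT what SEV does. */
static int toy_xor_backend(void *handle, uint8_t *ptr, uint64_t len)
{
    uint8_t key = *(uint8_t *)handle;

    assert(ptr != NULL);
    for (uint64_t i = 0; i < len; i++) {
        ptr[i] ^= key;
    }
    return 0;
}
```

Keeping the hooks behind a handle is what would let a non-SEV backend
be plugged in without changing the callers.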

Cc: Paolo Bonzini 
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 accel/kvm/kvm-all.c| 30 ++
 accel/stubs/kvm-stub.c | 14 ++
 include/sysemu/kvm.h   | 25 +
 3 files changed, 69 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index a9b16846675e..54a0fd6097fb 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -107,6 +107,8 @@ struct KVMState
 
 /* memory encryption */
 void *memcrypt_handle;
+int (*memcrypt_encrypt_data)(void *handle, uint8_t *ptr, uint64_t len);
+void (*memcrypt_debug_ops)(void *handle, MemoryRegion *mr);
 };
 
 KVMState *kvm_state;
@@ -142,6 +144,34 @@ int kvm_get_max_memslots(void)
 return s->nr_slots;
 }
 
+bool kvm_memcrypt_enabled(void)
+{
+if (kvm_state && kvm_state->memcrypt_handle) {
+return true;
+}
+
+return false;
+}
+
+int kvm_memcrypt_encrypt_data(uint8_t *ptr, uint64_t len)
+{
+if (kvm_state->memcrypt_handle &&
+kvm_state->memcrypt_encrypt_data) {
+return kvm_state->memcrypt_encrypt_data(kvm_state->memcrypt_handle,
+  ptr, len);
+}
+
+return 1;
+}
+
+void kvm_memcrypt_set_debug_ops(MemoryRegion *mr)
+{
+if (kvm_state->memcrypt_handle &&
+kvm_state->memcrypt_debug_ops) {
+kvm_state->memcrypt_debug_ops(kvm_state->memcrypt_handle, mr);
+}
+}
+
 static KVMSlot *kvm_get_free_slot(KVMMemoryListener *kml)
 {
 KVMState *s = kvm_state;
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index c964af3e1c97..5739712a67e3 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -105,6 +105,20 @@ int kvm_on_sigbus(int code, void *addr)
 return 1;
 }
 
+bool kvm_memcrypt_enabled(void)
+{
+return false;
+}
+
+int kvm_memcrypt_encrypt_data(uint8_t *ptr, uint64_t len)
+{
+  return 1;
+}
+
+void kvm_memcrypt_set_debug_ops(MemoryRegion *mr)
+{
+}
+
 #ifndef CONFIG_USER_ONLY
 int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
 {
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index bbf12a172339..4a5db5dde390 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -231,6 +231,31 @@ int kvm_destroy_vcpu(CPUState *cpu);
  */
 bool kvm_arm_supports_user_irq(void);
 
+/**
+ * kvm_memcrypt_enabled - return boolean indicating whether memory encryption
+ *is enabled
+ * Returns: 1 memory encryption is enabled
+ *  0 memory encryption is disabled
+ */
+bool kvm_memcrypt_enabled(void);
+
+/**
+ * kvm_memcrypt_encrypt_data: encrypt the memory range
+ *
+ * Return: 1 failed to encrypt the range
+ * 0 successfully encrypted memory region
+ */
+int kvm_memcrypt_encrypt_data(uint8_t *ptr, uint64_t len);
+
+/**
+ * kvm_memcrypt_set_debug_ops: set debug_ram_ops callback
+ *
+ * When debug_ram_ops is set, debug access to this memory region will use
+ * memory encryption APIs.
+ */
+void kvm_memcrypt_set_debug_ops(MemoryRegion *mr);
+
+
 #ifdef NEED_CPU_H
 #include "cpu.h"
 
-- 
2.9.5

[Qemu-devel] [PATCH v5 14/23] sev: add command to create launch memory encryption context

2017-12-06 Thread Brijesh Singh

The KVM_SEV_LAUNCH_START command creates a new VM encryption key (VEK).
The encryption key created with the command will be used for encrypting
the bootstrap images (such as guest bios).

Cc: Paolo Bonzini 
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 accel/kvm/sev.c  | 86 
 include/sysemu/sev.h | 11 +++
 2 files changed, 97 insertions(+)

diff --git a/accel/kvm/sev.c b/accel/kvm/sev.c
index 7b5318993969..74eb67526bd0 100644
--- a/accel/kvm/sev.c
+++ b/accel/kvm/sev.c
@@ -22,6 +22,15 @@
 #define DEFAULT_GUEST_POLICY0x1 /* disable debug */
 #define DEFAULT_SEV_DEVICE  "/dev/sev"
 
+#define DEBUG_SEV
+#ifdef DEBUG_SEV
+#define DPRINTF(fmt, ...) \
+do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+do { } while (0)
+#endif
+
 static int sev_fd;
 
 #define SEV_FW_MAX_ERROR  0x17
@@ -288,6 +297,77 @@ lookup_sev_guest_info(const char *id)
 return info;
 }
 
+static int
+sev_read_file_base64(const char *filename, guchar **data, gsize *len)
+{
+gsize sz;
+gchar *base64;
+GError *error = NULL;
+
+if (!g_file_get_contents(filename, &base64, &sz, &error)) {
+error_report("failed to read '%s' (%s)", filename, error->message);
+return -1;
+}
+
+*data = g_base64_decode(base64, len);
+return 0;
+}
+
+static int
+sev_launch_start(SEVState *s)
+{
+gsize sz;
+int ret = 1;
+int fw_error;
+QSevGuestInfo *sev = s->sev_info;
+struct kvm_sev_launch_start *start;
+guchar *session = NULL, *dh_cert = NULL;
+
+start = g_malloc0(sizeof(*start));
+if (!start) {
+return 1;
+}
+
+start->handle = object_property_get_int(OBJECT(sev), "handle",
+&error_abort);
+start->policy = object_property_get_int(OBJECT(sev), "policy",
+&error_abort);
+if (sev->session_file) {
+if (sev_read_file_base64(sev->session_file, &session, &sz) < 0) {
+return 1;
+}
+start->session_uaddr = (unsigned long)session;
+start->session_len = sz;
+}
+
+if (sev->dh_cert_file) {
+if (sev_read_file_base64(sev->dh_cert_file, &dh_cert, &sz) < 0) {
+return 1;
+}
+start->dh_uaddr = (unsigned long)dh_cert;
+start->dh_len = sz;
+}
+
+ret = sev_ioctl(KVM_SEV_LAUNCH_START, start, &fw_error);
+if (ret < 0) {
+error_report("%s: LAUNCH_START ret=%d fw_error=%d '%s'",
+__func__, ret, fw_error, fw_error_to_str(fw_error));
+return 1;
+}
+
+DPRINTF("SEV: LAUNCH_START\n");
+
+object_property_set_int(OBJECT(sev), start->handle, "handle",
+&error_abort);
+s->cur_state = SEV_STATE_LUPDATE;
+
+g_free(start);
+g_free(session);
+g_free(dh_cert);
+
+return 0;
+}
+
 void *
 sev_guest_init(const char *id)
 {
@@ -323,6 +403,12 @@ sev_guest_init(const char *id)
 goto err;
 }
 
+ret = sev_launch_start(s);
+if (ret) {
+error_report("%s: failed to create encryption context", __func__);
+goto err;
+}
+
 ram_block_notifier_add(&sev_ram_notifier);
 
 return s;
diff --git a/include/sysemu/sev.h b/include/sysemu/sev.h
index f85517c0b5b5..45b464cc96f5 100644
--- a/include/sysemu/sev.h
+++ b/include/sysemu/sev.h
@@ -51,8 +51,19 @@ struct QSevGuestInfoClass {
 ObjectClass parent_class;
 };
 
+enum {
+SEV_STATE_INVALID = 0,
+SEV_STATE_LUPDATE,
+SEV_STATE_SECRET,
+SEV_STATE_RUNNING,
+SEV_STATE_SENDING,
+SEV_STATE_RECEIVING,
+SEV_STATE_MAX
+};
+
 struct SEVState {
 QSevGuestInfo *sev_info;
+int cur_state;
 };
 
 typedef struct SEVState SEVState;
-- 
2.9.5

[Qemu-devel] [PATCH v5 04/23] monitor/i386: use debug APIs when accessing guest memory

2017-12-06 Thread Brijesh Singh

Update the HMP commands to use the debug versions of the APIs when
accessing guest memory.

Cc: Paolo Bonzini 
Cc: Peter Crosthwaite 
Cc: Richard Henderson 
Cc: "Dr. David Alan Gilbert" 
Cc: Markus Armbruster 
Cc: Eduardo Habkost 
Signed-off-by: Brijesh Singh 
---
 cpus.c|  2 +-
 disas.c   |  2 +-
 monitor.c |  2 +-
 target/i386/helper.c  | 14 ++--
 target/i386/monitor.c | 59 +++
 5 files changed, 41 insertions(+), 38 deletions(-)

diff --git a/cpus.c b/cpus.c
index 114c29b6a0d3..d1e7e28993e8 100644
--- a/cpus.c
+++ b/cpus.c
@@ -2026,7 +2026,7 @@ void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
 l = sizeof(buf);
 if (l > size)
 l = size;
-cpu_physical_memory_read(addr, buf, l);
+cpu_physical_memory_read_debug(addr, buf, l);
 if (fwrite(buf, 1, l, f) != l) {
 error_setg(errp, QERR_IO_ERROR);
 goto exit;
diff --git a/disas.c b/disas.c
index d4ad1089efb3..fcedbf263302 100644
--- a/disas.c
+++ b/disas.c
@@ -586,7 +586,7 @@ static int
 physical_read_memory(bfd_vma memaddr, bfd_byte *myaddr, int length,
  struct disassemble_info *info)
 {
-cpu_physical_memory_read(memaddr, myaddr, length);
+cpu_physical_memory_read_debug(memaddr, myaddr, length);
 return 0;
 }
 
diff --git a/monitor.c b/monitor.c
index e36fb5308d34..d8f05b9f88fa 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1359,7 +1359,7 @@ static void memory_dump(Monitor *mon, int count, int format, int wsize,
 if (l > line_size)
 l = line_size;
 if (is_physical) {
-cpu_physical_memory_read(addr, buf, l);
+cpu_physical_memory_read_debug(addr, buf, l);
 } else {
 if (cpu_memory_rw_debug(cs, addr, buf, l, 0) < 0) {
 monitor_printf(mon, " Cannot access memory\n");
diff --git a/target/i386/helper.c b/target/i386/helper.c
index f63eb3d3f4fb..5dc9e8839bc8 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -757,7 +757,7 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
 if (la57) {
 pml5e_addr = ((env->cr[3] & ~0xfff) +
 (((addr >> 48) & 0x1ff) << 3)) & a20_mask;
-pml5e = x86_ldq_phys(cs, pml5e_addr);
+pml5e = ldq_phys_debug(cs, pml5e_addr);
 if (!(pml5e & PG_PRESENT_MASK)) {
 return -1;
 }
@@ -767,7 +767,7 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
 
 pml4e_addr = ((pml5e & PG_ADDRESS_MASK) +
 (((addr >> 39) & 0x1ff) << 3)) & a20_mask;
-pml4e = x86_ldq_phys(cs, pml4e_addr);
+pml4e = ldq_phys_debug(cs, pml4e_addr);
 if (!(pml4e & PG_PRESENT_MASK)) {
 return -1;
 }
@@ -788,14 +788,14 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
 {
 pdpe_addr = ((env->cr[3] & ~0x1f) + ((addr >> 27) & 0x18)) &
 a20_mask;
-pdpe = x86_ldq_phys(cs, pdpe_addr);
+pdpe = ldq_phys_debug(cs, pdpe_addr);
 if (!(pdpe & PG_PRESENT_MASK))
 return -1;
 }
 
 pde_addr = ((pdpe & PG_ADDRESS_MASK) +
 (((addr >> 21) & 0x1ff) << 3)) & a20_mask;
-pde = x86_ldq_phys(cs, pde_addr);
+pde = ldq_phys_debug(cs, pde_addr);
 if (!(pde & PG_PRESENT_MASK)) {
 return -1;
 }
@@ -808,7 +808,7 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
 pte_addr = ((pde & PG_ADDRESS_MASK) +
 (((addr >> 12) & 0x1ff) << 3)) & a20_mask;
 page_size = 4096;
-pte = x86_ldq_phys(cs, pte_addr);
+pte = ldq_phys_debug(cs, pte_addr);
 }
 if (!(pte & PG_PRESENT_MASK)) {
 return -1;
@@ -818,7 +818,7 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
 
 /* page directory entry */
 pde_addr = ((env->cr[3] & ~0xfff) + ((addr >> 20) & 0xffc)) & a20_mask;
-pde = x86_ldl_phys(cs, pde_addr);
+pde = ldl_phys_debug(cs, pde_addr);
 if (!(pde & PG_PRESENT_MASK))
 return -1;
 if ((pde & PG_PSE_MASK) && (env->cr[4] & CR4_PSE_MASK)) {
@@ -827,7 +827,7 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
 } else {
 /* page directory entry */
 pte_addr = ((pde & ~0xfff) + ((addr >> 10) & 0xffc)) & a20_mask;
-pte = x86_ldl_phys(cs, pte_addr);
+pte = ldl_phys_debug(cs, pte_addr);
 if (!(pte & PG_PRESENT_MASK)) {
 return -1;
 }
diff --git a/target/i386/monitor.c

[Qemu-devel] [PATCH v5 10/23] sev: add command to initialize the memory encryption context

2017-12-06 Thread Brijesh Singh

When memory encryption is enabled, the KVM_SEV_INIT command is used to
initialize the platform. The command loads the SEV-related persistent
data from non-volatile storage and initializes the platform context.
This command must be issued before invoking any other guest commands
provided by the SEV firmware.
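
When a command fails, the firmware error code is reported next to the
ioctl return value and turned into text through a lookup table. A
minimal sketch of a bounds-checked lookup follows; the table is
deliberately shortened, and the toy_* names and the fallback string
are assumptions, not part of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shortened, hypothetical table; the index is the firmware error code. */
static const char *const toy_fw_errlist[] = {
    "",
    "Platform state is invalid",
    "Guest state is invalid",
    "Platform configuration is invalid",
};

#define TOY_FW_ERR_COUNT (sizeof(toy_fw_errlist) / sizeof(toy_fw_errlist[0]))

/* Map a code to text; unknown codes get a fixed string, never NULL. */
static const char *toy_fw_error_to_str(int code)
{
    if (code < 0 || (size_t)code >= TOY_FW_ERR_COUNT) {
        return "unknown firmware error";
    }
    return toy_fw_errlist[code];
}
```

Returning a fixed string for out-of-range codes keeps the result safe
to pass straight to a "%s" format.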

Cc: Paolo Bonzini 
Signed-off-by: Brijesh Singh 
---
 accel/kvm/kvm-all.c  |  15 +++
 accel/kvm/sev.c  | 122 +++
 include/sysemu/sev.h |  10 +
 3 files changed, 147 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index f290f487a573..a9b16846675e 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -38,6 +38,7 @@
 #include "qemu/event_notifier.h"
 #include "trace.h"
 #include "hw/irq.h"
+#include "sysemu/sev.h"
 
 #include "hw/boards.h"
 
@@ -103,6 +104,9 @@ struct KVMState
 #endif
 KVMMemoryListener memory_listener;
 QLIST_HEAD(, KVMParkedVcpu) kvm_parked_vcpus;
+
+/* memory encryption */
+void *memcrypt_handle;
 };
 
 KVMState *kvm_state;
@@ -1632,6 +1636,17 @@ static int kvm_init(MachineState *ms)
 
 kvm_state = s;
 
+/*
+ * if memory encryption object is specified then initialize the memory
+ * encryption context.
+ * */
+if (ms->memory_encryption) {
+kvm_state->memcrypt_handle = sev_guest_init(ms->memory_encryption);
+if (!kvm_state->memcrypt_handle) {
+goto err;
+}
+}
+
 ret = kvm_arch_init(ms, s);
 if (ret < 0) {
 goto err;
diff --git a/accel/kvm/sev.c b/accel/kvm/sev.c
index a9b9a63c2da0..37020751bd14 100644
--- a/accel/kvm/sev.c
+++ b/accel/kvm/sev.c
@@ -22,6 +22,67 @@
 #define DEFAULT_GUEST_POLICY0x1 /* disable debug */
 #define DEFAULT_SEV_DEVICE  "/dev/sev"
 
+static int sev_fd;
+
+#define SEV_FW_MAX_ERROR  0x17
+
+static char sev_fw_errlist[SEV_FW_MAX_ERROR][100] = {
+"",
+"Platform state is invalid",
+"Guest state is invalid",
+"Platform configuration is invalid",
+"Buffer too small",
+"Platform is already owned",
+"Certificate is invalid",
+"Policy is not allowed",
+"Guest is not active",
+"Invalid address",
+"Bad signature",
+"Bad measurement",
+"Asid is already owned",
+"Invalid ASID",
+"WBINVD is required",
+"DF_FLUSH is required",
+"Guest handle is invalid",
+"Invalid command",
+"Guest is active",
+"Hardware error",
+"Hardware unsafe",
+"Feature not supported",
+"Invalid parameter"
+};
+
+static int
+sev_ioctl(int cmd, void *data, int *error)
+{
+int r;
+struct kvm_sev_cmd input;
+
+memset(&input, 0x0, sizeof(input));
+
+input.id = cmd;
+input.sev_fd = sev_fd;
+input.data = (__u64)data;
+
+r = kvm_vm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_OP, &input);
+
+if (error) {
+*error = input.error;
+}
+
+return r;
+}
+
+static char *
+fw_error_to_str(int code)
+{
+if (code < 0 || code >= SEV_FW_MAX_ERROR) {
+return NULL;
+}
+
+return sev_fw_errlist[code];
+}
+
 static void
 qsev_guest_finalize(Object *obj)
 {
@@ -170,6 +231,67 @@ static const TypeInfo qsev_guest_info = {
 }
 };
 
+static QSevGuestInfo *
+lookup_sev_guest_info(const char *id)
+{
+Object *obj;
+QSevGuestInfo *info;
+
+obj = object_resolve_path_component(object_get_objects_root(), id);
+if (!obj) {
+return NULL;
+}
+
+info = (QSevGuestInfo *)
+object_dynamic_cast(obj, TYPE_QSEV_GUEST_INFO);
+if (!info) {
+return NULL;
+}
+
+return info;
+}
+
+void *
+sev_guest_init(const char *id)
+{
+SEVState *s;
+char *devname;
+int ret, fw_error;
+
+s = g_malloc0(sizeof(SEVState));
+if (!s) {
+return NULL;
+}
+
+s->sev_info = lookup_sev_guest_info(id);
+if (!s->sev_info) {
+error_report("%s: '%s' is not a valid '%s' object",
+ __func__, id, TYPE_QSEV_GUEST_INFO);
+goto err;
+}
+
+devname = object_property_get_str(OBJECT(s->sev_info), "sev-device", NULL);
+sev_fd = open(devname, O_RDWR);
+if (sev_fd < 0) {
+error_report("%s: Failed to open %s '%s'", __func__,
+ devname, strerror(errno));
+goto err;
+}
+g_free(devname);
+
+ret = sev_ioctl(KVM_SEV_INIT, NULL, &fw_error);
+if (ret) {
+error_report("%s: failed to initialize ret=%d fw_error=%d '%s'",
+ __func__, ret, fw_error, fw_error_to_str(fw_error));
+goto err;
+}
+
+return s;
+err:
+g_free(s);
+return NULL;
+}
+
 static void
 sev_register_types(void)
 {
diff --git a/include/sysemu/sev.h b/include/sysemu/sev.h
index e00794ec1805..f85517c0b5b5 100644
--- a/include/sysemu/sev.h
+++ b/include/sysemu/sev.h
@@ -14,6 +14,8 @@
 #ifndef QEMU_SEV_H
 #define QEMU_SEV_H
 
+#include 
+
 #include "qom/object.h"
 #include "qapi/error.h"
 #include "sysemu/kvm.h"
@@ -49,5 +51,13 @@

Re: [Qemu-devel] [PATCH] Remove MemoryRegionSection check code from sparc_cpu_get_phys_page_debug()

2017-12-06 Thread Jean-Christophe DUBOIS


On 04/12/2017 at 21:45, Mark Cave-Ayland wrote:

On 27/11/17 20:19, Jean-Christophe DUBOIS wrote:


Hello Mark,

Did you get any second opinion on this?

Also do you need me to resend the patch with the SPARC keyword in the 
patch subject line?


Hi Jean-Christophe,

Apologies for the delay as I've been fairly busy with my day job. I 
believe Artyom is away at the moment which is why I haven't written a 
reply, but AFAICT there are 2 options:


1) Remove the MemoryRegion check (as per your patch)

2) Change dump_mmu() to call cpu_sparc_get_phys_page() directly

I'm mildly leaning towards 1) since there doesn't seem to be 
equivalent code in other architectures, however the tree is currently 
in freeze for the upcoming 2.11 release so that's where most people's 
free time is currently being spent.


Once I can confirm the correct approach, I'm keen to get this into the 
2.12 tree early so there is plenty of time to spot any regressions 
during the next development cycle.



OK, thanks for the feedback.

JC



ATB,

Mark.

[Qemu-devel] [PATCH v5 02/23] exec: add ram_debug_ops support

2017-12-06 Thread Brijesh Singh

Currently, guest memory access for debug purposes is performed using
memcpy(). Let's extend 'struct MemoryRegion' to include ram_debug_ops
callbacks. The ram_debug_ops can be used to override memcpy() with
something else.

The feature can be used by an encrypted guest, which can register
callbacks to override memcpy() with memory encryption/decryption APIs.

A typical usage:

mem_read(uint8_t *dst, uint8_t *src, uint32_t len, MemTxAttrs *attrs);
mem_write(uint8_t *dst, uint8_t *src, uint32_t len, MemTxAttrs *attrs);

MemoryRegionRAMReadWriteOps ops;
ops.read = mem_read;
ops.write = mem_write;

memory_region_init_ram(mem, NULL, "memory", size, NULL);
memory_region_set_ram_debug_ops(mem, ops);

Cc: Paolo Bonzini 
Cc: Peter Crosthwaite 
Cc: Richard Henderson 
Signed-off-by: Brijesh Singh 
---
 exec.c| 65 ++-
 include/exec/memory.h | 27 +
 2 files changed, 76 insertions(+), 16 deletions(-)

diff --git a/exec.c b/exec.c
index 03238a3449d9..9b0ab1648945 100644
--- a/exec.c
+++ b/exec.c
@@ -2981,7 +2981,11 @@ static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr,
 } else {
 /* RAM case */
 ptr = qemu_ram_ptr_length(mr->ram_block, addr1, &l, false);
-memcpy(ptr, buf, l);
+if (attrs.debug && mr->ram_debug_ops) {
+mr->ram_debug_ops->write(ptr, buf, l, attrs);
+} else {
+memcpy(ptr, buf, l);
+}
 invalidate_and_set_dirty(mr, addr1, l);
 }
 
@@ -3079,7 +3083,10 @@ MemTxResult flatview_read_continue(FlatView *fv, hwaddr addr,
 } else {
 /* RAM case */
 ptr = qemu_ram_ptr_length(mr->ram_block, addr1, &l, false);
-memcpy(buf, ptr, l);
+if (attrs.debug && mr->ram_debug_ops)
+mr->ram_debug_ops->read(buf, ptr, l, attrs);
+else
+memcpy(buf, ptr, l);
 }
 
 if (release_lock) {
@@ -3149,11 +3156,13 @@ void cpu_physical_memory_rw(hwaddr addr, uint8_t *buf,
 
 enum write_rom_type {
 WRITE_DATA,
+READ_DATA,
 FLUSH_CACHE,
 };
 
-static inline void cpu_physical_memory_write_rom_internal(AddressSpace *as,
-hwaddr addr, const uint8_t *buf, int len, enum write_rom_type type)
+static inline void cpu_physical_memory_rw_debug_internal(AddressSpace *as,
+hwaddr addr, uint8_t *buf, int len, MemTxAttrs attrs,
+enum write_rom_type type)
 {
 hwaddr l;
 uint8_t *ptr;
@@ -3168,12 +3177,33 @@ static inline void cpu_physical_memory_write_rom_internal(AddressSpace *as,
 if (!(memory_region_is_ram(mr) ||
   memory_region_is_romd(mr))) {
 l = memory_access_size(mr, l, addr1);
+/* Pass MMIO down to address_space_rw */
+switch (type) {
+case READ_DATA:
+case WRITE_DATA:
+address_space_rw(as, addr1, attrs, buf, l,
+ type == WRITE_DATA);
+break;
+case FLUSH_CACHE:
+break;
+}
 } else {
 /* ROM/RAM case */
 ptr = qemu_map_ram_ptr(mr->ram_block, addr1);
 switch (type) {
+case READ_DATA:
+if (mr->ram_debug_ops) {
+mr->ram_debug_ops->read(buf, ptr, l, attrs);
+} else {
+memcpy(buf, ptr, l);
+}
+break;
 case WRITE_DATA:
-memcpy(ptr, buf, l);
+if (mr->ram_debug_ops) {
+mr->ram_debug_ops->write(ptr, buf, l, attrs);
+} else {
+memcpy(ptr, buf, l);
+}
 invalidate_and_set_dirty(mr, addr1, l);
 break;
 case FLUSH_CACHE:
@@ -3192,7 +3222,8 @@ static inline void cpu_physical_memory_write_rom_internal(AddressSpace *as,
 void cpu_physical_memory_write_rom(AddressSpace *as, hwaddr addr,
const uint8_t *buf, int len)
 {
-cpu_physical_memory_write_rom_internal(as, addr, buf, len, WRITE_DATA);
+cpu_physical_memory_rw_debug_internal(as, addr, (uint8_t *)buf, len,
+MEMTXATTRS_UNSPECIFIED, WRITE_DATA);
 }
 
 void cpu_flush_icache_range(hwaddr start, int len)
@@ -3207,8 +3238,10 @@ void cpu_flush_icache_range(hwaddr start, int len)
 return;
 }
 
-cpu_physical_memory_write_rom_internal(&address_space_memory,
-   start, NULL, len, FLUSH_CACHE);
+cpu_physical_memory_rw_debug_internal(&address_space_memory,
+   start, NULL, len,
+   MEMTXATTRS_UNSPECIFIED,
+   FLUSH_CACHE);
 }
 
 typedef

[Qemu-devel] [PATCH v5 09/23] accel: add Secure Encrypted Virtualization (SEV) object

2017-12-06 Thread Brijesh Singh

Add a new memory encryption object 'sev-guest'. The object will be used
to create encrypted VMs on AMD EPYC CPUs. The object provides the properties
to pass the guest owner's public Diffie-Hellman key, guest policy and session
information required to create the memory encryption context within the
SEV firmware.

e.g. to launch an SEV guest
 # $QEMU \
-object sev-guest,id=sev0 \
-machine ,memory-encryption=sev0

Cc: Paolo Bonzini 
Signed-off-by: Brijesh Singh 
---
 accel/kvm/Makefile.objs|   2 +-
 accel/kvm/sev.c| 179 +
 docs/amd-memory-encryption.txt |  17 
 include/sysemu/sev.h   |  53 
 qemu-options.hx|  34 
 5 files changed, 284 insertions(+), 1 deletion(-)
 create mode 100644 accel/kvm/sev.c
 create mode 100644 include/sysemu/sev.h

diff --git a/accel/kvm/Makefile.objs b/accel/kvm/Makefile.objs
index 85351e7de7e8..666ceef3dae3 100644
--- a/accel/kvm/Makefile.objs
+++ b/accel/kvm/Makefile.objs
@@ -1 +1 @@
-obj-$(CONFIG_KVM) += kvm-all.o
+obj-$(CONFIG_KVM) += kvm-all.o sev.o
diff --git a/accel/kvm/sev.c b/accel/kvm/sev.c
new file mode 100644
index ..a9b9a63c2da0
--- /dev/null
+++ b/accel/kvm/sev.c
@@ -0,0 +1,179 @@
+/*
+ * QEMU SEV support
+ *
+ * Copyright Advanced Micro Devices 2016-2017
+ *
+ * Author:
+ *  Brijesh Singh 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qom/object_interfaces.h"
+#include "qemu/base64.h"
+#include "sysemu/kvm.h"
+#include "sysemu/sev.h"
+#include "sysemu/sysemu.h"
+
+#define DEFAULT_GUEST_POLICY0x1 /* disable debug */
+#define DEFAULT_SEV_DEVICE  "/dev/sev"
+
+static void
+qsev_guest_finalize(Object *obj)
+{
+}
+
+static char *
+qsev_guest_get_session_file(Object *obj, Error **errp)
+{
+QSevGuestInfo *s = QSEV_GUEST_INFO(obj);
+
+return s->session_file ? g_strdup(s->session_file) : NULL;
+}
+
+static void
+qsev_guest_set_session_file(Object *obj, const char *value, Error **errp)
+{
+QSevGuestInfo *s = QSEV_GUEST_INFO(obj);
+
+s->session_file = g_strdup(value);
+}
+
+static char *
+qsev_guest_get_dh_cert_file(Object *obj, Error **errp)
+{
+QSevGuestInfo *s = QSEV_GUEST_INFO(obj);
+
+return g_strdup(s->dh_cert_file);
+}
+
+static void
+qsev_guest_set_dh_cert_file(Object *obj, const char *value, Error **errp)
+{
+QSevGuestInfo *s = QSEV_GUEST_INFO(obj);
+
+s->dh_cert_file = g_strdup(value);
+}
+
+static char *
+qsev_guest_get_sev_device(Object *obj, Error **errp)
+{
+QSevGuestInfo *sev = QSEV_GUEST_INFO(obj);
+
+return g_strdup(sev->sev_device);
+}
+
+static void
+qsev_guest_set_sev_device(Object *obj, const char *value, Error **errp)
+{
+QSevGuestInfo *sev = QSEV_GUEST_INFO(obj);
+
+sev->sev_device = g_strdup(value);
+}
+
+static void
+qsev_guest_class_init(ObjectClass *oc, void *data)
+{
+object_class_property_add_str(oc, "sev-device",
+  qsev_guest_get_sev_device,
+  qsev_guest_set_sev_device,
+  NULL);
+object_class_property_set_description(oc, "sev-device",
+"SEV device to use", NULL);
+object_class_property_add_str(oc, "dh-cert-file",
+  qsev_guest_get_dh_cert_file,
+  qsev_guest_set_dh_cert_file,
+  NULL);
+object_class_property_set_description(oc, "dh-cert-file",
+"guest owners DH certificate (encoded with base64)", NULL);
+object_class_property_add_str(oc, "session-file",
+  qsev_guest_get_session_file,
+  qsev_guest_set_session_file,
+  NULL);
+object_class_property_set_description(oc, "session-file",
+"guest owners session parameters (encoded with base64)", NULL);
+}
+
+static void
+qsev_guest_set_handle(Object *obj, Visitor *v, const char *name,
+  void *opaque, Error **errp)
+{
+QSevGuestInfo *sev = QSEV_GUEST_INFO(obj);
+uint32_t value;
+
+visit_type_uint32(v, name, &value, errp);
+sev->handle = value;
+}
+
+static void
+qsev_guest_set_policy(Object *obj, Visitor *v, const char *name,
+  void *opaque, Error **errp)
+{
+QSevGuestInfo *sev = QSEV_GUEST_INFO(obj);
+uint32_t value;
+
+visit_type_uint32(v, name, &value, errp);
+sev->policy = value;
+}
+
+static void
+qsev_guest_get_policy(Object *obj, Visitor *v, const char *name,
+  void *opaque, Error **errp)
+{
+uint32_t value;
+QSevGuestInfo *sev = QSEV_GUEST_INFO(obj);
+
+value = sev->policy;
+visit_type_uint32(v, name, &value, errp);
+}
+
+static void

[Qemu-devel] [PATCH v5 00/23] x86: Secure Encrypted Virtualization (AMD)

2017-12-06 Thread Brijesh Singh

This patch series provides support for AMD's new Secure Encrypted 
Virtualization (SEV) feature.

SEV is an extension to the AMD-V architecture which supports running
multiple VMs under the control of a hypervisor. The SEV feature allows
the memory contents of a virtual machine (VM) to be transparently encrypted
with a key unique to the guest VM. The memory controller contains a
high performance encryption engine which can be programmed with multiple
keys for use by different VMs in the system. The programming and
management of these keys is handled by the AMD Secure Processor firmware,
which exposes commands for these tasks.

The KVM SEV patch series [1] introduced a new ioctl (KVM_MEMORY_ENCRYPT_OP)
which is used by qemu to issue the SEV commands that assist in performing
common hypervisor activities such as launching, running, snapshotting,
migrating and debugging guests.

The following links provide additional details:

AMD Memory Encryption whitepaper:
 
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf

AMD64 Architecture Programmer's Manual:
http://support.amd.com/TechDocs/24593.pdf
SME is section 7.10
SEV is section 15.34

Secure Encrypted Virtualization Key Management:
http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf

KVM Forum slides:
http://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf

KVM RFC link:

[1] https://marc.info/?l=linux-kernel&m=151243663119420&w=2

Video of the KVM Forum Talk:
https://www.youtube.com/watch?v=RcvQ1xN55Ew

---

Using these patches we have successfully booted and tested a guest both with and
without SEV enabled.

TODO:

* Add SEV guest migration support
* Add SEV guest snapshot and restore support

Changes since v4:
 - extend sev-guest object to add new properties 'dh-cert-file', 'session-file' etc.
 - emit SEV_MEASUREMENT event when measurement is available
 - add migration blocker
 - add memory encryption cpuid support
 - rebase the series with recent qemu tree

Changes since v3:
 - update to newer SEV spec (0.12 -> 0.14)
 - update to newer KVM RFC and use KVM_MEMORY_ENCRYPT_OP ioctl instead
   of KVM_ISSUE_SEV.
 - add support to encrypt pflash

Changes since v2:
- rename ram_ops to ram_debug_ops
- use '-' rather than '_' when adding new member in KvmInfo struct
- update sev object to use link properties when referencing other objects
- use ldq_phys_debug in tlb_info_64 and mem_info_64.
- remove sev-guest-policy object, we will revisit it after basic SEV
  guest support is merged.
- remove kernel API from doc and add SEV guest LAUNCH model. The doc will
  be updated as we integrate the remaining SEV APIs.

Changes since v1:
- Added Documentation
- Added security-policy object.
- Drop sev config parsing support and create new objects to get/set SEV
  specific parameters
- Added sev-guest-info object.
- Added sev-launch-info object.
- Added kvm_memory_encrytion_* APIs. The idea behind this was to allow adding
  a non-SEV memory encryption object without modifying interfaces.
- Drop patch to load OS image at fixed location.
- updated LAUNCH_FINISH command structure. Now the structure contains
  just 'measurement' field. Other fields are not used and will also be removed
  from newer SEV firmware API spec.


Brijesh Singh (23):
  memattrs: add debug attribute
  exec: add ram_debug_ops support
  exec: add debug version of physical memory read and write API
  monitor/i386: use debug APIs when accessing guest memory
  target/i386: define memory encryption cpuid support
  machine: add -memory-encryption property
  kvm: update kvm.h to include memory encryption ioctls
  docs: add AMD Secure Encrypted Virtualization (SEV)
  accel: add Secure Encrypted Virtualization (SEV) object
  sev: add command to initialize the memory encryption context
  sev: register the guest memory range which may contain encrypted data
  kvm: introduce memory encryption APIs
  hmp: display memory encryption support in 'info kvm'
  sev: add command to create launch memory encryption context
  sev: add command to encrypt guest memory region
  target/i386: encrypt bios rom
  qapi: add SEV_MEASUREMENT event
  sev: emit the SEV_MEASUREMENT event
  sev: Finalize the SEV guest launch flow
  hw: i386: set ram_debug_ops when memory encryption is enabled
  sev: add debug encrypt and decrypt commands
  target/i386: clear C-bit when walking SEV guest page table
  sev: add migration blocker

 accel/kvm/Makefile.objs|   2 +-
 accel/kvm/kvm-all.c|  48 +++
 accel/kvm/sev.c| 641 +
 accel/stubs/kvm-stub.c |  14 +
 cpus.c |   2 +-
 disas.c|   2 +-
 docs/amd-memory-encryption.txt | 109 +++
 exec.c |  96 +-
 hmp.c  |   2 +
 hw/core/machine.c  |  22 ++
 hw/i386/pc.c

[Qemu-devel] [PATCH v5 06/23] machine: add -memory-encryption property

2017-12-06 Thread Brijesh Singh

When the CPU supports the memory encryption feature, this property can be used
to specify the encryption object to use when launching an encrypted guest.

Cc: Paolo Bonzini 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Stefan Hajnoczi 
Signed-off-by: Brijesh Singh 
---
 hw/core/machine.c   | 22 ++
 include/hw/boards.h |  1 +
 qemu-options.hx |  2 ++
 3 files changed, 25 insertions(+)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 36c2fb069c01..132c57bc5124 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -335,6 +335,22 @@ static bool machine_get_enforce_config_section(Object *obj, Error **errp)
 return ms->enforce_config_section;
 }
 
+static char *machine_get_memory_encryption(Object *obj, Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+
+return g_strdup(ms->memory_encryption);
+}
+
+static void machine_set_memory_encryption(Object *obj, const char *value,
+Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+
+g_free(ms->memory_encryption);
+ms->memory_encryption = g_strdup(value);
+}
+
 static void error_on_sysbus_device(SysBusDevice *sbdev, void *opaque)
 {
 error_report("Option '-device %s' cannot be handled by this machine",
@@ -598,6 +614,12 @@ static void machine_class_init(ObjectClass *oc, void *data)
 &error_abort);
 object_class_property_set_description(oc, "enforce-config-section",
 "Set on to enforce configuration section migration", &error_abort);
+
+object_class_property_add_str(oc, "memory-encryption",
+machine_get_memory_encryption, machine_set_memory_encryption,
+&error_abort);
+object_class_property_set_description(oc, "memory-encryption",
+"Set memory encryption object to use", &error_abort);
 }
 
 static void machine_class_base_init(ObjectClass *oc, void *data)
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 156b16f7a6b5..41fa5779557c 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -238,6 +238,7 @@ struct MachineState {
 bool suppress_vmdesc;
 bool enforce_config_section;
 bool enable_graphics;
+char *memory_encryption;
 
 ram_addr_t ram_size;
 ram_addr_t maxram_size;
diff --git a/qemu-options.hx b/qemu-options.hx
index f11c4ac960ff..5385832707e0 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -104,6 +104,8 @@ code to send configuration section even if the machine-type sets the
 @option{migration.send-configuration} property to @var{off}.
 NOTE: this parameter is deprecated. Please use @option{-global}
 @option{migration.send-configuration}=@var{on|off} instead.
+@item memory-encryption=@var{}
+Memory encryption object to use. The default is none.
 @end table
 ETEXI
 
-- 
2.9.5

[Qemu-devel] [PATCH 2/2] virtio-blk: reject configs with logical block size > physical block size

2017-12-06 Thread Mark Kanda

virtio-blk logical block size should never be larger than physical block
size because it doesn't make sense to have such configurations. QEMU doesn't
have a way to effectively express this condition; the best it can do is
report the physical block exponent as 0 - indicating the logical block size
equals the physical block size.

This is identical to commit 3da023b5827543ee4c022986ea2ad9d1274410b2
but applied to virtio-blk (instead of virtio-scsi).
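
The relationship between the two block sizes can be sketched as follows. This
is a minimal illustration, not QEMU code: block_size_exponent() is a
hypothetical helper, and it assumes both sizes are non-zero powers of two. It
shows why a logical block size larger than the physical one has no valid
exponent and has to be rejected instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of QEMU): compute the physical block
 * exponent, i.e. log2(physical / logical). Returns false when the logical
 * block size is larger than the physical block size, because that case
 * cannot be expressed as a non-negative exponent.
 */
static bool block_size_exponent(uint32_t logical, uint32_t physical,
                                uint8_t *exp)
{
    uint8_t e = 0;

    /* Assumption: callers pass non-zero power-of-two sizes. */
    assert(logical != 0 && physical != 0);

    if (logical > physical) {
        return false;
    }
    while ((logical << e) < physical) {
        e++;
    }
    *exp = e;
    return true;
}
```

With a 512-byte logical and a 4096-byte physical block size the exponent is 3;
with equal sizes it is 0, which is the only value QEMU could otherwise report
for the unsupported case.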

Signed-off-by: Mark Kanda 
Reviewed-by: Konrad Rzeszutek Wilk 
Reviewed-by: Ameya More 
---
 hw/block/virtio-blk.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 002c56f..acfca78 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -949,6 +949,13 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
 }
 blkconf_blocksizes(&conf->conf);
 
+if (conf->conf.logical_block_size >
+conf->conf.physical_block_size) {
+error_setg(errp,
+   "logical_block_size > physical_block_size not supported");
+return;
+}
+
 virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK,
 sizeof(struct virtio_blk_config));
 
-- 
1.8.3.1

[Qemu-devel] [PATCH v5 05/23] target/i386: add memory encryption feature cpuid support

2017-12-06 Thread Brijesh Singh

AMD EPYC processors support a memory encryption feature. The feature
is reported through CPUID 8000_001F[EAX].

Fn8000_001F [EAX]:
 Bit 0   Secure Memory Encryption (SME) supported
 Bit 1   Secure Encrypted Virtualization (SEV) supported
 Bit 2   Page flush MSR supported
 Bit 3   Encrypted State (SEV-ES) supported

When the memory encryption feature is reported, CPUID 8000_001F[EBX] should
provide additional information regarding the feature (such as which page
table bit is used to mark pages as encrypted). The information in EBX
and ECX may vary from one family to another, hence we use the host cpuid
to populate the EBX information.

The details of the memory encryption CPUID are available in the AMD APM
(http://support.amd.com/TechDocs/24593.pdf), Section 15.34.1.
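
As an illustration of the bit layout listed above, the EAX value can be decoded
as in the sketch below. The MEM_ENCRYPT_* names and mem_encrypt_has() are
hypothetical and exist only for this example; the patch itself defines its own
CPUID_8000_001F_EAX_* constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in CPUID Fn8000_001F[EAX], as listed in the commit message. */
#define MEM_ENCRYPT_SME            (1U << 0)
#define MEM_ENCRYPT_SEV            (1U << 1)
#define MEM_ENCRYPT_PAGE_FLUSH_MSR (1U << 2)
#define MEM_ENCRYPT_SEV_ES         (1U << 3)

/* Hypothetical helper: test a single feature bit in a raw EAX value. */
static bool mem_encrypt_has(uint32_t eax, uint32_t feature)
{
    /* The caller is expected to pass exactly one feature bit. */
    assert(feature != 0 && (feature & (feature - 1)) == 0);

    return (eax & feature) != 0;
}
```

For example, an EAX value of 0x3 reports SME and SEV but not SEV-ES, which
matches the feature set this patch assigns to the EPYC CPU model.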

Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Signed-off-by: Brijesh Singh 
---
 target/i386/cpu.c | 36 
 target/i386/cpu.h |  6 ++
 2 files changed, 42 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 045d66191f28..0cc7bb88ce2d 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -233,6 +233,7 @@ static void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
 #define TCG_EXT4_FEATURES 0
 #define TCG_SVM_FEATURES 0
 #define TCG_KVM_FEATURES 0
+#define TCG_MEM_ENCRYPT_FEATURES 0
 #define TCG_7_0_EBX_FEATURES (CPUID_7_0_EBX_SMEP | CPUID_7_0_EBX_SMAP | \
   CPUID_7_0_EBX_BMI1 | CPUID_7_0_EBX_BMI2 | CPUID_7_0_EBX_ADX | \
   CPUID_7_0_EBX_PCOMMIT | CPUID_7_0_EBX_CLFLUSHOPT |\
@@ -528,6 +529,20 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 .cpuid_reg = R_EDX,
 .tcg_features = ~0U,
 },
+[FEAT_MEM_ENCRYPT] = {
+.feat_names = {
+"sme", "sev", "page-flush-msr", "sev-es",
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+},
+.cpuid_eax = 0x8000001F, .cpuid_reg = R_EAX,
+.tcg_features = TCG_MEM_ENCRYPT_FEATURES,
+}
 };
 
 typedef struct X86RegisterInfo32 {
@@ -1562,6 +1577,9 @@ static X86CPUDefinition builtin_x86_defs[] = {
 CPUID_XSAVE_XGETBV1,
 .features[FEAT_6_EAX] =
 CPUID_6_EAX_ARAT,
+/* Missing: SEV_ES */
+.features[FEAT_MEM_ENCRYPT] =
+CPUID_8000_001F_EAX_SME | CPUID_8000_001F_EAX_SEV,
 .xlevel = 0x8000000A,
 .model_id = "AMD EPYC Processor",
 },
@@ -3110,6 +3128,19 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 *edx = 0;
 }
 break;
+case 0x8000001F:
+if (env->features[FEAT_MEM_ENCRYPT] & CPUID_8000_001F_EAX_SEV) {
+*eax = env->features[FEAT_MEM_ENCRYPT];
+host_cpuid(0x8000001F, 0, NULL, ebx, NULL, NULL);
+*ecx = 0;
+*edx = 0;
+} else {
+*eax = 0;
+*ebx = 0;
+*ecx = 0;
+*edx = 0;
+}
+break;
 case 0xC0000000:
 *eax = env->cpuid_xlevel2;
 *ebx = 0;
@@ -3549,10 +3580,15 @@ static void x86_cpu_expand_features(X86CPU *cpu, Error **errp)
 x86_cpu_adjust_feat_level(cpu, FEAT_C000_0001_EDX);
 x86_cpu_adjust_feat_level(cpu, FEAT_SVM);
 x86_cpu_adjust_feat_level(cpu, FEAT_XSAVE);
+x86_cpu_adjust_feat_level(cpu, FEAT_MEM_ENCRYPT);
 /* SVM requires CPUID[0x8000000A] */
 if (env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_SVM) {
 x86_cpu_adjust_level(cpu, &env->cpuid_min_xlevel, 0x8000000A);
 }
+/* SEV requires CPUID[0x8000001F] */
+if ((env->features[FEAT_MEM_ENCRYPT] & CPUID_8000_001F_EAX_SEV)) {
+x86_cpu_adjust_level(cpu, &env->cpuid_min_xlevel, 0x8000001F);
+}
 }
 
 /* Set cpuid_*level* based on cpuid_min_*level, if not explicitly set */
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index b086b1528b89..a99e89c368ba 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -463,6 +463,7 @@ typedef enum FeatureWord {
 FEAT_6_EAX, /* CPUID[6].EAX */
 FEAT_XSAVE_COMP_LO, /* CPUID[EAX=0xd,ECX=0].EAX */
 FEAT_XSAVE_COMP_HI, /* CPUID[EAX=0xd,ECX=0].EDX */
+FEAT_MEM_ENCRYPT,   /* CPUID[8000_001F].EAX */
 FEATURE_WORDS,
 } FeatureWord;
 
@@ -649,6 +650,11 @@ typedef uint32_t FeatureWordArray[FEATURE_WORDS];
 
 #define CPUID_6_EAX_ARAT   (1U << 2)
 
+#define CPUID_8000_001F_EAX_SME (1U << 0) /* SME */
+#define CPUID_8000_001F_EAX_SEV (1U << 1) /* SEV */
+#define CPUID_8000_001F_EAX_PAGE_FLUSH_MSR  (1U << 2) /* Page flush MSR */
+#define CPUID_8000_001F_EAX_SEV_ES  (1U << 3) /* SEV-ES */
+
 /* CPUID[0x80000007].EDX flags: */
 #define

[Qemu-devel] [PATCH v5 23/23] sev: add migration blocker

2017-12-06 Thread Brijesh Singh

SEV guest migration is not implemented yet.

Signed-off-by: Brijesh Singh 
---
 accel/kvm/sev.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/accel/kvm/sev.c b/accel/kvm/sev.c
index 3edfb5b08416..10647645eacd 100644
--- a/accel/kvm/sev.c
+++ b/accel/kvm/sev.c
@@ -19,6 +19,7 @@
 #include "sysemu/sev.h"
 #include "sysemu/sysemu.h"
 #include "qapi-event.h"
+#include "migration/blocker.h"
 
 #define DEFAULT_GUEST_POLICY    0x1 /* disable debug */
 #define DEFAULT_SEV_DEVICE  "/dev/sev"
@@ -36,6 +37,7 @@
 static int sev_fd;
 static SEVState *sev_state;
 static MemoryRegionRAMReadWriteOps  sev_ops;
+static Error *sev_mig_blocker;
 
 #define SEV_FW_MAX_ERROR  0x17
 
@@ -460,6 +462,7 @@ static void
 sev_launch_finish(SEVState *s)
 {
 int ret, error;
+Error *local_err = NULL;
 
 ret = sev_ioctl(KVM_SEV_LAUNCH_FINISH, 0, &error);
 if (ret) {
@@ -470,6 +473,16 @@ sev_launch_finish(SEVState *s)
 
 s->cur_state = SEV_STATE_RUNNING;
 DPRINTF("SEV: LAUNCH_FINISH\n");
+
+/* add migration blocker */
+error_setg(&sev_mig_blocker,
+   "SEV: Migration is not implemented");
+ret = migrate_add_blocker(sev_mig_blocker, &local_err);
+if (local_err) {
+error_report_err(local_err);
+error_free(sev_mig_blocker);
+exit(1);
+}
 }
 
 static void
-- 
2.9.5

[Qemu-devel] reminder: please include justification/explanation for any pull request after rc2

2017-12-06 Thread Peter Maydell

Hi; I just wanted to send out a general email to ask submaintainers
who send me pull requests for patches to go in after about rc2 to
make sure they include a justification and explanation of why the
bugs being fixed mean the changes need to go into the release.
I realized that I wasn't sure if I'd ever actually explicitly
asked people to do that, hence this mail.

The way our process works is that at the start of the rc cycle
we put in pretty much any bug fix, and as we move forwards the
bar gets steadily higher (so we prefer bug fixes that are simple,
that clearly don't affect other subsystems, that fix regressions
since the previous release, and so on), until by the last rc we
are very reluctant indeed to put in any more code.

I've had a few pull requests this cycle where the maintainer hasn't
put a clear explanation in the cover letter of why the patches
need to go into the release. I don't have a good grasp of every
subsystem and usually haven't been following the mailing list
threads, so I won't know whether the bugs being fixed are important
or trivial, risky or very safe. So I have to send an email
asking for clarification and get a reply, which can waste a day
if there are timezone differences or other delays.

So if people could make sure they write good cover letters that
give me the context I need for managing the release process,
that will make my job easier and the whole thing smoother.

thanks
-- PMM

[Qemu-devel] [PATCH 0/2] virtio-blk: miscellaneous changes

2017-12-06 Thread Mark Kanda

This series is for two minor virtio-blk changes. The first patch
makes the virtio-blk queue size user configurable. The second patch
rejects logical block size > physical block configurations (similar
to a recent change in virtio-scsi).

Mark Kanda (2):
  virtio-blk: make queue size configurable
  virtio-blk: reject configs with logical block size > physical block
    size

 hw/block/virtio-blk.c  | 14 +-
 include/hw/virtio/virtio-blk.h |  1 +
 2 files changed, 14 insertions(+), 1 deletion(-)

-- 
1.8.3.1

[Qemu-devel] [PATCH v5 03/23] exec: add debug version of physical memory read and write API

2017-12-06 Thread Brijesh Singh

Add the following new APIs:
- cpu_physical_memory_read_debug
- cpu_physical_memory_write_debug
- cpu_physical_memory_rw_debug
- ldl_phys_debug
- ldq_phys_debug

Cc: Paolo Bonzini 
Cc: Peter Crosthwaite 
Cc: Richard Henderson 
Signed-off-by: Brijesh Singh 
Reviewed-by: Paolo Bonzini 
---
 exec.c| 31 +++
 include/exec/cpu-common.h | 15 +++
 2 files changed, 46 insertions(+)

diff --git a/exec.c b/exec.c
index 9b0ab1648945..e1837cad61f9 100644
--- a/exec.c
+++ b/exec.c
@@ -3540,6 +3540,37 @@ void address_space_cache_destroy(MemoryRegionCache *cache)
 #define RCU_READ_UNLOCK()    rcu_read_unlock()
 #include "memory_ldst.inc.c"
 
+uint32_t ldl_phys_debug(CPUState *cpu, hwaddr addr)
+{
+MemTxAttrs attrs = MEMTXATTRS_DEBUG;
+int asidx = cpu_asidx_from_attrs(cpu, attrs);
+uint32_t val;
+
+cpu_physical_memory_rw_debug_internal(cpu->cpu_ases[asidx].as,
+  addr, (void *) &val,
+  4, attrs, READ_DATA);
+return tswap32(val);
+}
+
+uint64_t ldq_phys_debug(CPUState *cpu, hwaddr addr)
+{
+MemTxAttrs attrs = MEMTXATTRS_DEBUG;
+int asidx = cpu_asidx_from_attrs(cpu, attrs);
+uint64_t val;
+
+cpu_physical_memory_rw_debug_internal(cpu->cpu_ases[asidx].as,
+  addr, (void *) &val,
+  8, attrs, READ_DATA);
+return val;
+}
+
+void cpu_physical_memory_rw_debug(hwaddr addr, uint8_t *buf,
+  int len, int is_write)
+{
+address_space_rw(&address_space_memory, addr, MEMTXATTRS_DEBUG, buf,
+ len, is_write);
+}
+
 /* virtual memory access for debug (includes writing to ROM) */
 int cpu_memory_rw_debug(CPUState *cpu, target_ulong addr,
 uint8_t *buf, int len, int is_write)
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 74341b19d26a..fa01385d4f1b 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -77,11 +77,26 @@ size_t qemu_ram_pagesize_largest(void);
 
 void cpu_physical_memory_rw(hwaddr addr, uint8_t *buf,
 int len, int is_write);
+void cpu_physical_memory_rw_debug(hwaddr addr, uint8_t *buf,
+  int len, int is_write);
 static inline void cpu_physical_memory_read(hwaddr addr,
 void *buf, int len)
 {
 cpu_physical_memory_rw(addr, buf, len, 0);
 }
+static inline void cpu_physical_memory_read_debug(hwaddr addr,
+  void *buf, int len)
+{
+cpu_physical_memory_rw_debug(addr, buf, len, 0);
+}
+static inline void cpu_physical_memory_write_debug(hwaddr addr,
+   const void *buf, int len)
+{
+cpu_physical_memory_rw_debug(addr, (void *)buf, len, 1);
+}
+uint32_t ldl_phys_debug(CPUState *cpu, hwaddr addr);
+uint64_t ldq_phys_debug(CPUState *cpu, hwaddr addr);
+
 static inline void cpu_physical_memory_write(hwaddr addr,
  const void *buf, int len)
 {
-- 
2.9.5

[Qemu-devel] [PATCH v5 21/23] sev: add debug encrypt and decrypt commands

2017-12-06 Thread Brijesh Singh

The KVM_SEV_DBG_DECRYPT and KVM_SEV_DBG_ENCRYPT commands are used for
decrypting and encrypting a guest memory region. The commands work only if
the guest policy allows debugging.

Cc: Paolo Bonzini 
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 accel/kvm/kvm-all.c  |  1 +
 accel/kvm/sev.c  | 70 
 include/sysemu/sev.h |  1 +
 3 files changed, 72 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index d35eebb97901..b069261de32a 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1677,6 +1677,7 @@ static int kvm_init(MachineState *ms)
 }
 
 kvm_state->memcrypt_encrypt_data = sev_encrypt_data;
+kvm_state->memcrypt_debug_ops = sev_set_debug_ops;
 }
 
 ret = kvm_arch_init(ms, s);
diff --git a/accel/kvm/sev.c b/accel/kvm/sev.c
index fbbd99becc0a..3edfb5b08416 100644
--- a/accel/kvm/sev.c
+++ b/accel/kvm/sev.c
@@ -22,6 +22,7 @@
 
 #define DEFAULT_GUEST_POLICY    0x1 /* disable debug */
 #define DEFAULT_SEV_DEVICE  "/dev/sev"
+#define GUEST_POLICY_DBG_BIT    0x1
 
 #define DEBUG_SEV
 #ifdef DEBUG_SEV
@@ -34,6 +35,7 @@
 
 static int sev_fd;
 static SEVState *sev_state;
+static MemoryRegionRAMReadWriteOps  sev_ops;
 
 #define SEV_FW_MAX_ERROR  0x17
 
@@ -483,6 +485,49 @@ sev_vm_state_change(void *opaque, int running, RunState state)
 }
 }
 
+static int
+sev_dbg_enc_dec(uint8_t *dst, const uint8_t *src, uint32_t len, bool write)
+{
+int ret, error;
+struct kvm_sev_dbg *dbg;
+dbg = g_malloc0(sizeof(*dbg));
+if (!dbg) {
+return 1;
+}
+
+dbg->src_uaddr = (unsigned long)src;
+dbg->dst_uaddr = (unsigned long)dst;
+dbg->len = len;
+
+ret = sev_ioctl(write ? KVM_SEV_DBG_ENCRYPT : KVM_SEV_DBG_DECRYPT,
+dbg, &error);
+if (ret) {
+error_report("%s (%s) %#llx->%#llx+%#x ret=%d fw_error=%d '%s'",
+ __func__, write ? "write" : "read", dbg->src_uaddr,
+ dbg->dst_uaddr, dbg->len, ret, error,
+ fw_error_to_str(error));
+}
+
+g_free(dbg);
+return ret;
+}
+
+static int
+sev_mem_read(uint8_t *dst, const uint8_t *src, uint32_t len, MemTxAttrs attrs)
+{
+assert(attrs.debug);
+
+return sev_dbg_enc_dec(dst, src, len, false);
+}
+
+static int
+sev_mem_write(uint8_t *dst, const uint8_t *src, uint32_t len, MemTxAttrs attrs)
+{
+assert(attrs.debug);
+
+return sev_dbg_enc_dec(dst, src, len, true);
+}
+
 void *
 sev_guest_init(const char *id)
 {
@@ -549,6 +594,31 @@ sev_encrypt_data(void *handle, uint8_t *ptr, uint64_t len)
 return 0;
 }
 
+void
+sev_set_debug_ops(void *handle, MemoryRegion *mr)
+{
+int policy;
+SEVState *s = (SEVState *)handle;
+
+policy = object_property_get_int(OBJECT(s->sev_info),
+ "policy", &error_abort);
+
+/*
+ * Check if guest policy supports debugging
+ * Bit 0 :
+ *   0 - debug allowed
+ *   1 - debug is not allowed
+ */
+if (policy & GUEST_POLICY_DBG_BIT) {
+return;
+}
+
+sev_ops.read = sev_mem_read;
+sev_ops.write = sev_mem_write;
+
+memory_region_set_ram_debug_ops(mr, &sev_ops);
+}
+
 static void
 sev_register_types(void)
 {
diff --git a/include/sysemu/sev.h b/include/sysemu/sev.h
index 3af945935b60..7c50d33af4a9 100644
--- a/include/sysemu/sev.h
+++ b/include/sysemu/sev.h
@@ -71,6 +71,7 @@ typedef struct SEVState SEVState;
 
 void *sev_guest_init(const char *id);
 int sev_encrypt_data(void *handle, uint8_t *ptr, uint64_t len);
+void sev_set_debug_ops(void *handle, MemoryRegion *mr);
 
 #endif
 
-- 
2.9.5

[Qemu-devel] [PATCH v5 22/23] target/i386: clear C-bit when walking SEV guest page table

2017-12-06 Thread Brijesh Singh

In an SEV-enabled guest the page table entries have the C-bit set, so we
need to clear the C-bit when walking the page table. The C-bit position is
available in CPUID Fn8000_001F[EBX].
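
The masking step can be sketched in isolation as follows. This is a minimal
example rather than the code from the patch: c_bit_clear_mask() and
strip_c_bit() are hypothetical names, and the EBX value is passed in as a plain
integer instead of being queried from KVM. It assumes, as the patch does, that
bits 5:0 of EBX hold the C-bit position.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: build a mask that clears the C-bit, given the raw
 * value of CPUID Fn8000_001F[EBX]. Bits 5:0 of EBX hold the bit position.
 */
static uint64_t c_bit_clear_mask(uint32_t ebx)
{
    uint32_t pos = ebx & 0x3f;

    /* pos is at most 63, so the 64-bit shift below is well defined. */
    assert(pos < 64);

    return ~(UINT64_C(1) << pos);
}

/* Hypothetical helper: drop the C-bit from a page table entry. */
static uint64_t strip_c_bit(uint64_t entry, uint32_t ebx)
{
    return entry & c_bit_clear_mask(ebx);
}
```

The shift uses a 64-bit constant so that positions above bit 31 also work on
hosts where unsigned long is 32 bits wide.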

Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Signed-off-by: Brijesh Singh 
---
 target/i386/helper.c  | 46 +++
 target/i386/monitor.c | 86 ---
 2 files changed, 94 insertions(+), 38 deletions(-)

diff --git a/target/i386/helper.c b/target/i386/helper.c
index 5dc9e8839bc8..7dbbb9812950 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -723,6 +723,22 @@ void cpu_x86_update_cr4(CPUX86State *env, uint32_t new_cr4)
 }
 
 #if !defined(CONFIG_USER_ONLY)
+static uint64_t get_me_mask(void)
+{
+uint64_t me_mask = 0;
+
+/*
+ * When SEV is active, Fn8000_001F[EBX] bits 5:0 contain the C-bit position
+ */
+if (kvm_memcrypt_enabled()) {
+uint32_t pos;
+pos = kvm_arch_get_supported_cpuid(kvm_state, 0x8000001f, 0, R_EBX);
+me_mask = (1ULL << (pos & 0x3f));
+}
+
+return ~me_mask;
+}
+
 hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
 {
 X86CPU *cpu = X86_CPU(cs);
@@ -732,6 +748,9 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
 int32_t a20_mask;
 uint32_t page_offset;
 int page_size;
+uint64_t me_mask;
+
+me_mask = get_me_mask();
 
 a20_mask = x86_get_a20_mask(env);
 if (!(env->cr[0] & CR0_PG_MASK)) {
@@ -755,25 +774,25 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
 }
 
 if (la57) {
-pml5e_addr = ((env->cr[3] & ~0xfff) +
+pml5e_addr = ((env->cr[3] & ~0xfff & me_mask) +
 (((addr >> 48) & 0x1ff) << 3)) & a20_mask;
-pml5e = ldq_phys_debug(cs, pml5e_addr);
+pml5e = ldq_phys_debug(cs, pml5e_addr) & me_mask;
 if (!(pml5e & PG_PRESENT_MASK)) {
 return -1;
 }
 } else {
-pml5e = env->cr[3];
+pml5e = env->cr[3] & me_mask;
 }
 
 pml4e_addr = ((pml5e & PG_ADDRESS_MASK) +
 (((addr >> 39) & 0x1ff) << 3)) & a20_mask;
-pml4e = ldq_phys_debug(cs, pml4e_addr);
+pml4e = ldq_phys_debug(cs, pml4e_addr) & me_mask;
 if (!(pml4e & PG_PRESENT_MASK)) {
 return -1;
 }
 pdpe_addr = ((pml4e & PG_ADDRESS_MASK) +
  (((addr >> 30) & 0x1ff) << 3)) & a20_mask;
-pdpe = x86_ldq_phys(cs, pdpe_addr);
+pdpe = ldq_phys_debug(cs, pdpe_addr) & me_mask;
 if (!(pdpe & PG_PRESENT_MASK)) {
 return -1;
 }
@@ -786,16 +805,16 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
 } else
 #endif
 {
-pdpe_addr = ((env->cr[3] & ~0x1f) + ((addr >> 27) & 0x18)) &
-a20_mask;
-pdpe = ldq_phys_debug(cs, pdpe_addr);
+pdpe_addr = ((env->cr[3] & ~0x1f & me_mask) + ((addr >> 27) & 0x18))
+  & a20_mask;
+pdpe = ldq_phys_debug(cs, pdpe_addr) & me_mask;
 if (!(pdpe & PG_PRESENT_MASK))
 return -1;
 }
 
 pde_addr = ((pdpe & PG_ADDRESS_MASK) +
 (((addr >> 21) & 0x1ff) << 3)) & a20_mask;
-pde = ldq_phys_debug(cs, pde_addr);
+pde = ldq_phys_debug(cs, pde_addr) & me_mask;
 if (!(pde & PG_PRESENT_MASK)) {
 return -1;
 }
@@ -808,7 +827,7 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
 pte_addr = ((pde & PG_ADDRESS_MASK) +
 (((addr >> 12) & 0x1ff) << 3)) & a20_mask;
 page_size = 4096;
-pte = ldq_phys_debug(cs, pte_addr);
+pte = ldq_phys_debug(cs, pte_addr) & me_mask;
 }
 if (!(pte & PG_PRESENT_MASK)) {
 return -1;
@@ -817,8 +836,9 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
 uint32_t pde;
 
 /* page directory entry */
-pde_addr = ((env->cr[3] & ~0xfff) + ((addr >> 20) & 0xffc)) & a20_mask;
-pde = ldl_phys_debug(cs, pde_addr);
+pde_addr = ((env->cr[3] & ~0xfff & me_mask) + ((addr >> 20) & 0xffc))
+ & a20_mask;
+pde = ldl_phys_debug(cs, pde_addr) & me_mask;
 if (!(pde & PG_PRESENT_MASK))
 return -1;
 if ((pde & PG_PSE_MASK) && (env->cr[4] & CR4_PSE_MASK)) {
@@ -827,7 +847,7 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
 } else {
 /* page directory entry */
 pte_addr = ((pde & ~0xfff) + ((addr >> 10) & 0xffc)) & a20_mask;
-pte = ldl_phys_debug(cs, pte_addr);
+pte =

[Qemu-devel] [PATCH v5 01/23] memattrs: add debug attribute

2017-12-06 Thread Brijesh Singh

The debug attribute will be set when qemu attempts to access the guest
memory for debug (e.g. memory access from gdbstub, memory dump commands,
etc.).

When guest memory is encrypted, the debug access will need to go through
the memory encryption APIs.

Cc: Alistair Francis 
Cc: Peter Maydell 
Cc: Edgar E. Iglesias
Cc: Richard Henderson 
Cc: Paolo Bonzini 
Signed-off-by: Brijesh Singh 
---
 include/exec/memattrs.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/exec/memattrs.h b/include/exec/memattrs.h
index d4a16420984b..721362e06292 100644
--- a/include/exec/memattrs.h
+++ b/include/exec/memattrs.h
@@ -37,6 +37,8 @@ typedef struct MemTxAttrs {
 unsigned int user:1;
 /* Requester ID (for MSI for example) */
 unsigned int requester_id:16;
+/* Debug memory access for encrypted guest */
+unsigned int debug:1;
 } MemTxAttrs;
 
 /* Bus masters which don't specify any attributes will get this,
@@ -56,4 +58,6 @@ typedef struct MemTxAttrs {
 #define MEMTX_DECODE_ERROR  (1U << 1) /* nothing at that address */
 typedef uint32_t MemTxResult;
 
+/* Access the guest memory for debug purposes */
+#define MEMTXATTRS_DEBUG ((MemTxAttrs) { .debug = 1 })
 #endif
-- 
2.9.5

[Qemu-devel] [PATCH v5 15/23] sev: add command to encrypt guest memory region

2017-12-06 Thread Brijesh Singh

The KVM_SEV_LAUNCH_UPDATE_DATA command is used to encrypt a guest memory
region using the VM Encryption Key created using LAUNCH_START.

Signed-off-by: Brijesh Singh 
---
 accel/kvm/kvm-all.c  |  2 ++
 accel/kvm/sev.c  | 44 
 include/sysemu/sev.h |  1 +
 3 files changed, 47 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 54a0fd6097fb..d35eebb97901 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1675,6 +1675,8 @@ static int kvm_init(MachineState *ms)
 if (!kvm_state->memcrypt_handle) {
 goto err;
 }
+
+kvm_state->memcrypt_encrypt_data = sev_encrypt_data;
 }
 
 ret = kvm_arch_init(ms, s);
diff --git a/accel/kvm/sev.c b/accel/kvm/sev.c
index 74eb67526bd0..83fc950bd3ac 100644
--- a/accel/kvm/sev.c
+++ b/accel/kvm/sev.c
@@ -368,6 +368,37 @@ sev_launch_start(SEVState *s)
 return 0;
 }
 
+static int
+sev_launch_update_data(uint8_t *addr, uint64_t len)
+{
+int ret, fw_error;
+struct kvm_sev_launch_update_data *update;
+
+if (addr == NULL || len <= 0) {
+return 1;
+}
+
+update = g_malloc0(sizeof(*update));
+if (!update) {
+return 1;
+}
+
+update->uaddr = (__u64)addr;
+update->len = len;
+ret = sev_ioctl(KVM_SEV_LAUNCH_UPDATE_DATA, update, &fw_error);
+if (ret) {
+error_report("%s: LAUNCH_UPDATE ret=%d fw_error=%d '%s'",
+__func__, ret, fw_error, fw_error_to_str(fw_error));
+goto err;
+}
+
+DPRINTF("SEV: LAUNCH_UPDATE_DATA %#lx+%#lx\n", (unsigned long)addr, len);
+
+err:
+g_free(update);
+return ret;
+}
+
 void *
 sev_guest_init(const char *id)
 {
@@ -417,6 +448,19 @@ err:
 return NULL;
 }
 
+int
+sev_encrypt_data(void *handle, uint8_t *ptr, uint64_t len)
+{
+SEVState *s = (SEVState *)handle;
+
+/* if SEV is in update state then encrypt the data else do nothing */
+if (s->cur_state == SEV_STATE_LUPDATE) {
+return sev_launch_update_data(ptr, len);
+}
+
+return 0;
+}
+
 static void
 sev_register_types(void)
 {
diff --git a/include/sysemu/sev.h b/include/sysemu/sev.h
index 45b464cc96f5..b1ea3f805290 100644
--- a/include/sysemu/sev.h
+++ b/include/sysemu/sev.h
@@ -69,6 +69,7 @@ struct SEVState {
 typedef struct SEVState SEVState;
 
 void *sev_guest_init(const char *id);
+int sev_encrypt_data(void *handle, uint8_t *ptr, uint64_t len);
 
 #endif
 
-- 
2.9.5

[Qemu-devel] [PATCH v5 18/23] sev: emit the SEV_MEASUREMENT event

2017-12-06 Thread Brijesh Singh

During machine creation we encrypt the guest bios image. The
LAUNCH_MEASURE command can then be used to retrieve the measurement of
the encrypted memory region. Emit the SEV_MEASUREMENT event so that
libvirt can grab the measurement value as soon as we are done with
creating the encrypted machine.

Cc: Daniel P. Berrange 
Cc: Paolo Bonzini 
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 accel/kvm/sev.c  | 58 
 include/sysemu/sev.h |  1 +
 2 files changed, 59 insertions(+)

diff --git a/accel/kvm/sev.c b/accel/kvm/sev.c
index 83fc950bd3ac..c0eea371fa06 100644
--- a/accel/kvm/sev.c
+++ b/accel/kvm/sev.c
@@ -18,6 +18,7 @@
 #include "sysemu/kvm.h"
 #include "sysemu/sev.h"
 #include "sysemu/sysemu.h"
+#include "qapi-event.h"
 
 #define DEFAULT_GUEST_POLICY    0x1 /* disable debug */
 #define DEFAULT_SEV_DEVICE  "/dev/sev"
@@ -32,6 +33,7 @@
 #endif
 
 static int sev_fd;
+static SEVState *sev_state;
 
 #define SEV_FW_MAX_ERROR  0x17
 
@@ -399,6 +401,59 @@ err:
 return ret;
 }
 
+static void
+sev_launch_get_measure(Notifier *notifier, void *unused)
+{
+int ret, error;
+guchar *data;
+SEVState *s = sev_state;
+struct kvm_sev_launch_measure *measurement;
+
+measurement = g_malloc0(sizeof(*measurement));
+if (!measurement) {
+return;
+}
+
+/* query the measurement blob length */
+ret = sev_ioctl(KVM_SEV_LAUNCH_MEASURE, measurement, &error);
+if (!measurement->len) {
+error_report("%s: LAUNCH_MEASURE ret=%d fw_error=%d '%s'",
+ __func__, ret, error, fw_error_to_str(error));
+goto free_measurement;
+}
+
+s->cur_state = SEV_STATE_SECRET;
+
+data = g_malloc(measurement->len);
+if (s->measurement) {
+goto free_data;
+}
+
+measurement->uaddr = (unsigned long)data;
+
+/* get the measurement blob */
+ret = sev_ioctl(KVM_SEV_LAUNCH_MEASURE, measurement, &error);
+if (ret) {
+error_report("%s: LAUNCH_MEASURE ret=%d fw_error=%d '%s'",
+ __func__, ret, error, fw_error_to_str(error));
+goto free_data;
+}
+
+s->measurement = g_base64_encode(data, measurement->len);
+
+DPRINTF("SEV: MEASUREMENT: %s\n", s->measurement);
+qapi_event_send_sev_measurement(s->measurement, &error_abort);
+
+free_data:
+g_free(data);
+free_measurement:
+g_free(measurement);
+}
+
+static Notifier sev_machine_done_notify = {
+.notify = sev_launch_get_measure,
+};
+
 void *
 sev_guest_init(const char *id)
 {
@@ -441,6 +496,9 @@ sev_guest_init(const char *id)
 }
 
 ram_block_notifier_add(&sev_ram_notifier);
+qemu_add_machine_init_done_notifier(&sev_machine_done_notify);
+
+sev_state = s;
 
 return s;
 err:
diff --git a/include/sysemu/sev.h b/include/sysemu/sev.h
index b1ea3f805290..3af945935b60 100644
--- a/include/sysemu/sev.h
+++ b/include/sysemu/sev.h
@@ -64,6 +64,7 @@ enum {
 struct SEVState {
 QSevGuestInfo *sev_info;
 int cur_state;
+gchar *measurement;
 };
 
 typedef struct SEVState SEVState;
-- 
2.9.5

[Qemu-devel] [PATCH 1/2] virtio-blk: make queue size configurable

2017-12-06 Thread Mark Kanda

Depending on the configuration, it can be beneficial to adjust the virtio-blk
queue size to something other than the current default of 128. Add a new
property to make the queue size configurable.
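
The patch below only accepts queue sizes that are a power of 2. The check can
be sketched on its own as follows; queue_size_is_valid() is a hypothetical
name used for illustration, and unlike the is_power_of_2() call in the patch
this sketch spells out the zero case explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: a queue size is accepted when it is non-zero and has
 * exactly one bit set, i.e. it is a power of 2.
 */
static bool queue_size_is_valid(uint16_t queue_size)
{
    return queue_size != 0 && (queue_size & (queue_size - 1)) == 0;
}
```

The default of 128 passes this check, as does 256, while a value such as 100
would be rejected at realize time.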

Signed-off-by: Mark Kanda 
Reviewed-by: Karl Heubaum 
Reviewed-by: Martin K. Petersen 
Reviewed-by: Ameya More 
---
 hw/block/virtio-blk.c  | 7 ++-
 include/hw/virtio/virtio-blk.h | 1 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 05d1440..002c56f 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -928,6 +928,10 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
 error_setg(errp, "num-queues property must be larger than 0");
 return;
 }
+if (!is_power_of_2(conf->queue_size)) {
+error_setg(errp, "queue-size property must be a power of 2");
+return;
+}
 
 blkconf_serial(&conf->conf, &conf->serial);
 blkconf_apply_backend_options(&conf->conf,
@@ -953,7 +957,7 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
 s->sector_mask = (s->conf.conf.logical_block_size / BDRV_SECTOR_SIZE) - 1;
 
 for (i = 0; i < conf->num_queues; i++) {
-virtio_add_queue(vdev, 128, virtio_blk_handle_output);
+virtio_add_queue(vdev, conf->queue_size, virtio_blk_handle_output);
 }
 virtio_blk_data_plane_create(vdev, conf, &s->dataplane, &err);
 if (err != NULL) {
@@ -1012,6 +1016,7 @@ static Property virtio_blk_properties[] = {
 DEFINE_PROP_BIT("request-merging", VirtIOBlock, conf.request_merging, 0,
 true),
 DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues, 1),
+DEFINE_PROP_UINT16("queue-size", VirtIOBlock, conf.queue_size, 128),
 DEFINE_PROP_LINK("iothread", VirtIOBlock, conf.iothread, TYPE_IOTHREAD,
  IOThread *),
 DEFINE_PROP_END_OF_LIST(),
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index d3c8a6f..5117431 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -39,6 +39,7 @@ struct VirtIOBlkConf
 uint32_t config_wce;
 uint32_t request_merging;
 uint16_t num_queues;
+uint16_t queue_size;
 };
 
 struct VirtIOBlockDataPlane;
-- 
1.8.3.1
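
For readers following the virtio-blk patch above, the power-of-2 validation it adds can be sketched in isolation. This is a minimal standalone illustration, not QEMU code: `queue_size_is_power_of_2()` and `check_queue_size()` are hypothetical names, standing in for QEMU's `is_power_of_2()` and the `error_setg()` path in `virtio_blk_device_realize()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's is_power_of_2(): true only when
 * exactly one bit is set, so 0 is rejected. */
static bool queue_size_is_power_of_2(uint16_t size)
{
    return size != 0 && (size & (size - 1)) == 0;
}

/* Hypothetical validation helper: returns NULL when the size is
 * acceptable, otherwise the error text used by the patch. */
static const char *check_queue_size(uint16_t size)
{
    if (!queue_size_is_power_of_2(size)) {
        return "queue-size property must be a power of 2";
    }
    return NULL;
}
```

With this sketch, the default of 128 passes, while values such as 0 or 100 are rejected before any queue is created.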

[Qemu-devel] [PATCH v5 17/23] qapi: add SEV_MEASUREMENT event

2017-12-06 Thread Brijesh Singh

Add SEV_MEASUREMENT event which can be used by libvirt to get the
measurement of the memory regions encrypted through the SEV launch
flow. The measurement value is base64 encoded.

Cc: Daniel P. Berrange 
Cc: Eric Blake 
Cc: Markus Armbruster 
Signed-off-by: Brijesh Singh 
---
 qapi-schema.json | 13 +
 1 file changed, 13 insertions(+)

diff --git a/qapi-schema.json b/qapi-schema.json
index 7eec403cd34a..f63659eda45b 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3203,3 +3203,16 @@
 # Since: 2.11
 ##
 { 'command': 'watchdog-set-action', 'data' : {'action': 'WatchdogAction'} }
+
+##
+# @SEV_MEASUREMENT:
+#
+# Emitted when measurement is available for the SEV guest.
+#
+# @value: measurement value encoded in base64
+#
+# Since: 2.11
+#
+##
+{ 'event' : 'SEV_MEASUREMENT',
+  'data' : { 'value' : 'str' } }
-- 
2.9.5

[Qemu-devel] [PATCH v5 11/23] sev: register the guest memory range which may contain encrypted data

2017-12-06 Thread Brijesh Singh

When SEV is enabled, the hardware encryption engine uses a tweak such
that two identical plaintexts at different locations will have
different ciphertexts. So swapping or moving the ciphertexts of two guest
pages will not result in the plaintexts being swapped. Hence relocating
the physical backing pages of an SEV guest requires some additional
steps in the KVM driver. The KVM_MEMORY_ENCRYPT_{UN,}REG_REGION ioctl can be
used to register/unregister a guest memory region which may contain
encrypted data. The KVM driver will internally handle relocating the
physical backing pages of registered memory regions.

Cc: Paolo Bonzini 
Signed-off-by: Brijesh Singh 
---
 accel/kvm/sev.c | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/accel/kvm/sev.c b/accel/kvm/sev.c
index 37020751bd14..7b5318993969 100644
--- a/accel/kvm/sev.c
+++ b/accel/kvm/sev.c
@@ -84,6 +84,43 @@ fw_error_to_str(int code)
 }
 
 static void
+sev_ram_block_added(RAMBlockNotifier *n, void *host, size_t size)
+{
+int r;
+struct kvm_enc_region range;
+
+range.addr = (__u64)host;
+range.size = size;
+
+r = kvm_vm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_REG_REGION, &range);
+if (r) {
+error_report("%s: failed to register region (%#llx+%#llx)",
+ __func__, range.addr, range.size);
+}
+}
+
+static void
+sev_ram_block_removed(RAMBlockNotifier *n, void *host, size_t size)
+{
+int r;
+struct kvm_enc_region range;
+
+range.addr = (__u64)host;
+range.size = size;
+
+r = kvm_vm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_UNREG_REGION, &range);
+if (r) {
+error_report("%s: failed to unregister region (%#llx+%#llx)",
+ __func__, range.addr, range.size);
+}
+}
+
+static struct RAMBlockNotifier sev_ram_notifier = {
+.ram_block_added = sev_ram_block_added,
+.ram_block_removed = sev_ram_block_removed,
+};
+
+static void
 qsev_guest_finalize(Object *obj)
 {
 }
@@ -286,6 +323,8 @@ sev_guest_init(const char *id)
 goto err;
 }
 
+ram_block_notifier_add(&sev_ram_notifier);
+
 return s;
 err:
 g_free(s);
-- 
2.9.5
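
The commit message above says that an address-dependent tweak makes identical plaintexts encrypt differently at different locations, which is why pages cannot simply be moved. The following toy sketch illustrates that property only. It is an assumption-laden illustration: the XOR-based `toy_encrypt()`/`toy_decrypt()` and the mixing in `toy_tweak()` are invented here and have nothing to do with the actual cipher used by the SEV hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Toy tweak derived from key and address. NOT the SEV algorithm; it is
 * only an injective mix so that different addresses give different tweaks. */
static uint64_t toy_tweak(uint64_t key, uint64_t addr)
{
    uint64_t x = key ^ (addr * 0x9E3779B97F4A7C15ULL);
    return x ^ (x >> 29);
}

/* "Encrypt" one 64-bit word stored at addr. */
static uint64_t toy_encrypt(uint64_t key, uint64_t addr, uint64_t plain)
{
    return plain ^ toy_tweak(key, addr);
}

/* "Decrypt" one 64-bit word as if it were stored at addr. */
static uint64_t toy_decrypt(uint64_t key, uint64_t addr, uint64_t cipher)
{
    return cipher ^ toy_tweak(key, addr);
}
```

In this toy model, ciphertext produced for one address and decrypted at another yields garbage, which mirrors why the patch registers memory regions so that the KVM driver can handle relocation of the backing pages itself.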

[Qemu-devel] [PATCH v5 08/23] docs: add AMD Secure Encrypted Virtualization (SEV)

2017-12-06 Thread Brijesh Singh

Create a documentation entry to describe the AMD Secure Encrypted
Virtualization (SEV) feature.

Cc: Paolo Bonzini 
Signed-off-by: Brijesh Singh 
---
 docs/amd-memory-encryption.txt | 92 ++
 1 file changed, 92 insertions(+)
 create mode 100644 docs/amd-memory-encryption.txt

diff --git a/docs/amd-memory-encryption.txt b/docs/amd-memory-encryption.txt
new file mode 100644
index 000000000000..72a92b6c6353
--- /dev/null
+++ b/docs/amd-memory-encryption.txt
@@ -0,0 +1,92 @@
+Secure Encrypted Virtualization (SEV) is a feature found on AMD processors.
+
+SEV is an extension to the AMD-V architecture which supports running encrypted
+virtual machines (VMs) under the control of KVM. Encrypted VMs have their pages
+(code and data) secured such that only the guest itself has access to the
+unencrypted version. Each encrypted VM is associated with a unique encryption
+key; if its data is accessed by a different entity using a different key, the
+encrypted guest's data will be incorrectly decrypted, leading to unintelligible
+data.
+
+The key management of this feature is handled by a separate processor known as
+the AMD secure processor (AMD-SP), which is present in AMD SoCs. Firmware
+running inside the AMD-SP provides commands to support the common VM lifecycle.
+This includes commands for launching, snapshotting, migrating and debugging the
+encrypted guest. These SEV commands can be issued via KVM_MEMORY_ENCRYPT_OP
+ioctls.
+
+Launching
+---------
+Boot images (such as the BIOS) must be encrypted before a guest can be booted.
+The MEMORY_ENCRYPT_OP ioctl provides commands to encrypt the images:
+LAUNCH_START, LAUNCH_UPDATE_DATA, LAUNCH_MEASURE and LAUNCH_FINISH. These four
+commands together generate a fresh memory encryption key for the VM, encrypt
+the boot images and provide a measurement that can be used as an attestation
+of the successful launch.
+
+LAUNCH_START is called first to create a cryptographic launch context within
+the firmware. To create this context, the guest owner must provide a guest
+policy, its public Diffie-Hellman key (PDH) and session parameters. These inputs
+should be treated as binary blobs and must be passed as-is to the SEV firmware.
+
+The guest policy is passed as plaintext. The hypervisor may be able to read it
+but should not modify it (any modification of the policy bits will result
+in a bad measurement). The guest policy is a 4-byte data structure containing
+several flags that restrict what can be done on a running SEV guest.
+See KM Spec section 3 and 6.2 for more details.
+
+The guest owner's DH certificate and session parameters will be used to
+establish a cryptographic session with the guest owner to negotiate keys used
+for the attestation.
+
+LAUNCH_UPDATE_DATA encrypts the memory region using the cryptographic context
+created via the LAUNCH_START command. If required, this command can be called
+multiple times to encrypt different memory regions. The command also calculates
+the measurement of the memory contents as it encrypts.
+
+The LAUNCH_MEASURE command can be used to retrieve the measurement of encrypted
+memory. This measurement is a signature of the memory contents that can be
+sent to the guest owner as an attestation that the memory was encrypted
+correctly by the firmware. The guest owner may wait to provide the guest
+confidential information until it can verify the attestation measurement.
+Since the guest owner knows the initial contents of the guest at boot, the
+attestation measurement can be verified by comparing it to what the guest owner
+expects.
+
+The LAUNCH_FINISH command finalizes the guest launch and destroys the
+cryptographic context.
+
+See SEV KM API Spec [1] 'Launching a guest' usage flow (Appendix A) for the
+complete flow chart.
+
+Debugging
+---------
+Since the memory contents of an SEV guest are encrypted, hypervisor access to
+the guest memory will return ciphertext. If the guest policy allows debugging,
+the hypervisor can use the DEBUG_DECRYPT and DEBUG_ENCRYPT commands to access
+the guest memory region for debug purposes.
+
+Snapshot/Restore
+----------------
+TODO
+
+Live Migration
+--------------
+TODO
+
+References
+----------
+
+AMD Memory Encryption whitepaper:
+http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf
+
+Secure Encrypted Virtualization Key Management:
+[1] http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf
+
+KVM Forum slides:
+http://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf
+
+AMD64 Architecture Programmer's Manual:
+   http://support.amd.com/TechDocs/24593.pdf
+   SME is section 7.10
+   SEV is section 15.34
-- 
2.9.5
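
The launch flow described in the documentation patch above (LAUNCH_START, then one or more LAUNCH_UPDATE_DATA calls, then LAUNCH_MEASURE, then LAUNCH_FINISH) can be sketched as a small ordering check. This is only an illustration of the order given in the text: the `launch_state` and `launch_cmd` enums and `launch_step()` are hypothetical, and the strict ordering enforced here is an assumption of the sketch, not a statement about what the SEV firmware accepts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical launch states, for illustration only. */
enum launch_state {
    LAUNCH_IDLE,      /* no launch context yet */
    LAUNCH_STARTED,   /* LAUNCH_START done, context created */
    LAUNCH_UPDATED,   /* at least one region encrypted */
    LAUNCH_MEASURED,  /* measurement retrieved */
    LAUNCH_FINISHED,  /* context destroyed */
};

/* Hypothetical command identifiers mirroring the names in the text. */
enum launch_cmd {
    CMD_LAUNCH_START,
    CMD_LAUNCH_UPDATE_DATA,
    CMD_LAUNCH_MEASURE,
    CMD_LAUNCH_FINISH,
};

/* Advance *st if cmd is allowed in the current state; return false
 * and leave *st unchanged otherwise. */
static bool launch_step(enum launch_state *st, enum launch_cmd cmd)
{
    assert(st != NULL);
    switch (cmd) {
    case CMD_LAUNCH_START:
        if (*st != LAUNCH_IDLE) {
            return false;
        }
        *st = LAUNCH_STARTED;
        return true;
    case CMD_LAUNCH_UPDATE_DATA:
        /* May be repeated to encrypt several memory regions. */
        if (*st != LAUNCH_STARTED && *st != LAUNCH_UPDATED) {
            return false;
        }
        *st = LAUNCH_UPDATED;
        return true;
    case CMD_LAUNCH_MEASURE:
        if (*st != LAUNCH_UPDATED) {
            return false;
        }
        *st = LAUNCH_MEASURED;
        return true;
    case CMD_LAUNCH_FINISH:
        if (*st != LAUNCH_MEASURED) {
            return false;
        }
        *st = LAUNCH_FINISHED;
        return true;
    }
    return false;
}
```

A caller would start from `LAUNCH_IDLE` and feed commands in order; any out-of-order command is rejected without changing the state.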

[Qemu-devel] [PATCH 08/55] memory: Move FlatView allocation to a helper

2017-12-06 Thread Michael Roth

From: Alexey Kardashevskiy 

This moves FlatView allocation and initialization to a helper.
While we are here, replace g_new with g_new0 so that we do not have to
worry about initializing new fields added in the future.

This should cause no behavioural change.

Signed-off-by: Alexey Kardashevskiy 
Message-Id: <20170921085110.25598-4-...@ozlabs.ru>
Signed-off-by: Paolo Bonzini 
(cherry picked from commit cc94cd6d36602d976a5e7bc29134d1eaefb4102e)
Signed-off-by: Michael Roth 
---
 memory.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/memory.c b/memory.c
index 138e21d35d..c0aa296223 100644
--- a/memory.c
+++ b/memory.c
@@ -258,12 +258,14 @@ static bool flatrange_equal(FlatRange *a, FlatRange *b)
 && a->readonly == b->readonly;
 }
 
-static void flatview_init(FlatView *view)
+static FlatView *flatview_new(void)
 {
+FlatView *view;
+
+view = g_new0(FlatView, 1);
 view->ref = 1;
-view->ranges = NULL;
-view->nr = 0;
-view->nr_allocated = 0;
+
+return view;
 }
 
 /* Insert a range into a given position.  Caller is responsible for maintaining
@@ -706,8 +708,7 @@ static FlatView *generate_memory_topology(MemoryRegion *mr)
 {
 FlatView *view;
 
-view = g_new(FlatView, 1);
-flatview_init(view);
+view = flatview_new();
 
 if (mr) {
 render_memory_region(view, mr, int128_zero(),
@@ -2624,8 +2625,7 @@ void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name)
 as->ref_count = 1;
 as->root = root;
 as->malloced = false;
-as->current_map = g_new(FlatView, 1);
-flatview_init(as->current_map);
+as->current_map = flatview_new();
 as->ioeventfd_nb = 0;
 as->ioeventfds = NULL;
 QTAILQ_INIT(&as->listeners);
-- 
2.11.0
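
The allocate-and-zero helper pattern from the FlatView patch above can be sketched outside of QEMU as follows. This is a minimal illustration under stated assumptions: `struct mini_view` and `mini_view_new()` are hypothetical names, and standard `calloc()` stands in for GLib's `g_new0()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature of a view object; field names follow the patch. */
struct mini_view {
    unsigned ref;
    void *ranges;
    unsigned nr;
    unsigned nr_allocated;
};

/* Allocate a zeroed view and take the initial reference.  calloc()
 * plays the role of g_new0() here, so fields added later start out
 * as zero without any extra initialization code. */
static struct mini_view *mini_view_new(void)
{
    struct mini_view *view = calloc(1, sizeof(*view));

    if (!view) {
        abort(); /* g_new0() also aborts on allocation failure */
    }
    assert(view->nr == 0 && view->nr_allocated == 0);
    view->ref = 1;
    return view;
}
```

The design point is that only non-zero fields (`ref` in this sketch) need an explicit assignment in the helper.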

Re: [Qemu-devel] [PATCH 19/25] spapr: add hcalls support for the XIVE interrupt mode

2017-12-06 Thread Benjamin Herrenschmidt

On Wed, 2017-12-06 at 20:20 +1100, David Gibson wrote:
> On Tue, Dec 05, 2017 at 08:50:26AM -0600, Benjamin Herrenschmidt wrote:
> > On Tue, 2017-12-05 at 18:00 +1100, David Gibson wrote:
> > > > The CPU revision. But we won't introduce XIVE exploitation mode on 
> > > > anything else than DD2.0 which has full XIVE support. Even STORE_EOI 
> > > > that we should be adding.
> > > 
> > > Hrm.  Host CPU?  That's a problem - if guest visible properties like
> > > this vary with the host CPU, migration breaks.
> > 
> > I don't think this is going to be a problem in practice. The
> > availability of trigger comes from OPAL but in practice, all virtual
> > interrupts are going to support it always,
> 
> Ok.  It still makes me nervous to derive guest visible features from
> the host.  I'd prefer to just hardwire the XIVE model to always/never
> advertise it and simply fail if that isn't workable for the host kernel.

We could fail loudly if we see a migratable interrupt that doesn't
have the flag.

Cheers,
Ben.

[Qemu-devel] [PATCH v5 07/23] kvm: update kvm.h to include memory encryption ioctls

2017-12-06 Thread Brijesh Singh

Update kvm.h to include memory encryption ioctls and SEV commands.

Cc: Christian Borntraeger 
Cc: Cornelia Huck 
Cc: Paolo Bonzini 
Signed-off-by: Brijesh Singh 
---
 linux-headers/linux/kvm.h | 90 +++
 1 file changed, 90 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index dd8a91801e82..04b5801d0354 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1356,6 +1356,96 @@ struct kvm_s390_ucas_mapping {
 /* Available with KVM_CAP_S390_CMMA_MIGRATION */
 #define KVM_S390_GET_CMMA_BITS  _IOWR(KVMIO, 0xb8, struct kvm_s390_cmma_log)
 #define KVM_S390_SET_CMMA_BITS  _IOW(KVMIO, 0xb9, struct kvm_s390_cmma_log)
+/* Memory Encryption Commands */
+#define KVM_MEMORY_ENCRYPT_OP  _IOWR(KVMIO, 0xba, unsigned long)
+
+struct kvm_enc_region {
+   __u64 addr;
+   __u64 size;
+};
+
+#define KVM_MEMORY_ENCRYPT_REG_REGION    _IOR(KVMIO, 0xbb, struct kvm_enc_region)
+#define KVM_MEMORY_ENCRYPT_UNREG_REGION  _IOR(KVMIO, 0xbc, struct kvm_enc_region)
+
+/* Secure Encrypted Virtualization command */
+enum sev_cmd_id {
+   /* Guest initialization commands */
+   KVM_SEV_INIT = 0,
+   KVM_SEV_ES_INIT,
+   /* Guest launch commands */
+   KVM_SEV_LAUNCH_START,
+   KVM_SEV_LAUNCH_UPDATE_DATA,
+   KVM_SEV_LAUNCH_UPDATE_VMSA,
+   KVM_SEV_LAUNCH_SECRET,
+   KVM_SEV_LAUNCH_MEASURE,
+   KVM_SEV_LAUNCH_FINISH,
+   /* Guest migration commands (outgoing) */
+   KVM_SEV_SEND_START,
+   KVM_SEV_SEND_UPDATE_DATA,
+   KVM_SEV_SEND_UPDATE_VMSA,
+   KVM_SEV_SEND_FINISH,
+   /* Guest migration commands (incoming) */
+   KVM_SEV_RECEIVE_START,
+   KVM_SEV_RECEIVE_UPDATE_DATA,
+   KVM_SEV_RECEIVE_UPDATE_VMSA,
+   KVM_SEV_RECEIVE_FINISH,
+   /* Guest status and debug commands */
+   KVM_SEV_GUEST_STATUS,
+   KVM_SEV_DBG_DECRYPT,
+   KVM_SEV_DBG_ENCRYPT,
+   /* Guest certificates commands */
+   KVM_SEV_CERT_EXPORT,
+
+   KVM_SEV_NR_MAX,
+};
+
+struct kvm_sev_cmd {
+   __u32 id;
+   __u64 data;
+   __u32 error;
+   __u32 sev_fd;
+};
+
+struct kvm_sev_launch_start {
+   __u32 handle;
+   __u32 policy;
+   __u64 dh_uaddr;
+   __u32 dh_len;
+   __u64 session_uaddr;
+   __u32 session_len;
+};
+
+struct kvm_sev_launch_update_data {
+   __u64 uaddr;
+   __u32 len;
+};
+
+
+struct kvm_sev_launch_secret {
+   __u64 hdr_uaddr;
+   __u32 hdr_len;
+   __u64 guest_uaddr;
+   __u32 guest_len;
+   __u64 trans_uaddr;
+   __u32 trans_len;
+};
+
+struct kvm_sev_launch_measure {
+   __u64 uaddr;
+   __u32 len;
+};
+
+struct kvm_sev_guest_status {
+   __u32 handle;
+   __u32 policy;
+   __u32 state;
+};
+
+struct kvm_sev_dbg {
+   __u64 src_uaddr;
+   __u64 dst_uaddr;
+   __u32 len;
+};
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
-- 
2.9.5

[Qemu-devel] [PATCH 04/55] kvmclock: use the updated system_timer_msr

2017-12-06 Thread Michael Roth

From: Jim Somerville 

Fixes e2b6c17 (kvmclock: update system_time_msr address forcibly),
which makes a call to get the latest value of the address
stored in system_time_msr, but then uses the old address anyway.

Signed-off-by: Jim Somerville 
Message-Id: 
<59b67db0bd15a46ab47c3aa657c81a4c11f168ea.1506702472.git.jim.somervi...@windriver.com>
Cc: qemu-sta...@nongnu.org
Signed-off-by: Paolo Bonzini 
(cherry picked from commit 346b1215b1e9f7cc6d8fe9fb6f3c2778b890afb6)
Signed-off-by: Michael Roth 
---
 hw/i386/kvm/clock.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index 363d1b5743..a31d8ff240 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -62,7 +62,7 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
 {
 CPUState *cpu = first_cpu;
 CPUX86State *env = cpu->env_ptr;
-hwaddr kvmclock_struct_pa = env->system_time_msr & ~1ULL;
+hwaddr kvmclock_struct_pa;
 uint64_t migration_tsc = env->tsc;
 struct pvclock_vcpu_time_info time;
 uint64_t delta;
@@ -77,6 +77,7 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
 return 0;
 }
 
+kvmclock_struct_pa = env->system_time_msr & ~1ULL;
 cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
 
 assert(time.tsc_timestamp <= migration_tsc);
-- 
2.11.0

[Qemu-devel] [PATCH 07/55] memory: Open code FlatView rendering

2017-12-06 Thread Michael Roth

From: Alexey Kardashevskiy 

We are going to share FlatViews between AddressSpaces, and per-AS
memory listeners won't suit the purpose anymore, so open code
the dispatch tree rendering.

Since there is a good chance that dispatch_listener was the only
listener, this avoids address_space_update_topology_pass() if there are
no registered listeners; this should improve startup time.

This should cause no behavioural change.

Signed-off-by: Alexey Kardashevskiy 
Message-Id: <20170921085110.25598-3-...@ozlabs.ru>
Signed-off-by: Paolo Bonzini 
(cherry picked from commit 9a62e24f45bc97f8eaf198caf58906b47c50a8d5)
Signed-off-by: Michael Roth 
---
 exec.c | 27 +++
 include/exec/memory-internal.h |  6 --
 include/exec/memory.h  |  1 -
 memory.c   | 19 ++-
 4 files changed, 21 insertions(+), 32 deletions(-)

diff --git a/exec.c b/exec.c
index bd94248390..3ed3718dea 100644
--- a/exec.c
+++ b/exec.c
@@ -1348,9 +1348,8 @@ static void register_multipage(AddressSpaceDispatch *d,
 phys_page_set(d, start_addr >> TARGET_PAGE_BITS, num_pages, section_index);
 }
 
-static void mem_add(MemoryListener *listener, MemoryRegionSection *section)
+void mem_add(AddressSpace *as, MemoryRegionSection *section)
 {
-AddressSpace *as = container_of(listener, AddressSpace, dispatch_listener);
 AddressSpaceDispatch *d = as->next_dispatch;
 MemoryRegionSection now = *section, remain = *section;
 Int128 page_size = int128_make64(TARGET_PAGE_SIZE);
@@ -2674,9 +2673,8 @@ static void io_mem_init(void)
   NULL, UINT64_MAX);
 }
 
-static void mem_begin(MemoryListener *listener)
+void mem_begin(AddressSpace *as)
 {
-AddressSpace *as = container_of(listener, AddressSpace, dispatch_listener);
 AddressSpaceDispatch *d = g_new0(AddressSpaceDispatch, 1);
 uint16_t n;
 
@@ -2700,9 +2698,8 @@ static void address_space_dispatch_free(AddressSpaceDispatch *d)
 g_free(d);
 }
 
-static void mem_commit(MemoryListener *listener)
+void mem_commit(AddressSpace *as)
 {
-AddressSpace *as = container_of(listener, AddressSpace, dispatch_listener);
 AddressSpaceDispatch *cur = as->dispatch;
 AddressSpaceDispatch *next = as->next_dispatch;
 
@@ -2732,24 +2729,6 @@ static void tcg_commit(MemoryListener *listener)
 tlb_flush(cpuas->cpu);
 }
 
-void address_space_init_dispatch(AddressSpace *as)
-{
-as->dispatch = NULL;
-as->dispatch_listener = (MemoryListener) {
-.begin = mem_begin,
-.commit = mem_commit,
-.region_add = mem_add,
-.region_nop = mem_add,
-.priority = 0,
-};
-memory_listener_register(&as->dispatch_listener, as);
-}
-
-void address_space_unregister(AddressSpace *as)
-{
-memory_listener_unregister(&as->dispatch_listener);
-}
-
 void address_space_destroy_dispatch(AddressSpace *as)
 {
 AddressSpaceDispatch *d = as->dispatch;
diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h
index fb467acdba..9abde2f11c 100644
--- a/include/exec/memory-internal.h
+++ b/include/exec/memory-internal.h
@@ -22,8 +22,6 @@
 #ifndef CONFIG_USER_ONLY
 typedef struct AddressSpaceDispatch AddressSpaceDispatch;
 
-void address_space_init_dispatch(AddressSpace *as);
-void address_space_unregister(AddressSpace *as);
 void address_space_destroy_dispatch(AddressSpace *as);
 
 extern const MemoryRegionOps unassigned_mem_ops;
@@ -31,5 +29,9 @@ extern const MemoryRegionOps unassigned_mem_ops;
 bool memory_region_access_valid(MemoryRegion *mr, hwaddr addr,
 unsigned size, bool is_write);
 
+void mem_add(AddressSpace *as, MemoryRegionSection *section);
+void mem_begin(AddressSpace *as);
+void mem_commit(AddressSpace *as);
+
 #endif
 #endif
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 400dd4491b..9ee0f2e846 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -328,7 +328,6 @@ struct AddressSpace {
 struct MemoryRegionIoeventfd *ioeventfds;
 struct AddressSpaceDispatch *dispatch;
 struct AddressSpaceDispatch *next_dispatch;
-MemoryListener dispatch_listener;
 QTAILQ_HEAD(memory_listeners_as, MemoryListener) listeners;
 QTAILQ_ENTRY(AddressSpace) address_spaces_link;
 };
diff --git a/memory.c b/memory.c
index c0adc35410..138e21d35d 100644
--- a/memory.c
+++ b/memory.c
@@ -879,14 +879,24 @@ static void address_space_update_topology_pass(AddressSpace *as,
 }
 }
 
-
 static void address_space_update_topology(AddressSpace *as)
 {
 FlatView *old_view = address_space_get_flatview(as);
 FlatView *new_view = generate_memory_topology(as->root);
+int i;
 
-address_space_update_topology_pass(as, old_view, new_view, false);
-address_space_update_topology_pass(as, old_view, new_view, true);
+mem_begin(as);
+for (i = 0; i < new_view->nr; i++) {
+

[Qemu-devel] [PATCH 51/55] nbd/server: fix nbd_negotiate_handle_info

2017-12-06 Thread Michael Roth

From: Vladimir Sementsov-Ogievskiy 

namelen should be used here; length is unrelated, and always 0 at this
point.  Broken since its introduction in commit f37708f6, but mostly
harmless (replying with '' as the name does not violate protocol,
and does not confuse qemu as the nbd client since our implementation
does not ask for the name; but might confuse some other client that
does ask for the name especially if the default export is different
than the export name being queried).

Adding an assert makes it obvious that we are not skipping any bytes
in the client's message, as well as making it obvious that we were
using the wrong variable.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
CC: qemu-sta...@nongnu.org
Message-Id: <20171101154204.27146-1-vsement...@virtuozzo.com>
[eblake: improve commit message, squash in assert addition]
Signed-off-by: Eric Blake 

(cherry picked from commit 46321d6b5f8c880932a6b3d07bd0ff6f892e665c)
Signed-off-by: Michael Roth 
---
 nbd/server.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/nbd/server.c b/nbd/server.c
index 56aed3a735..5042cc4786 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -434,6 +434,7 @@ static int nbd_negotiate_handle_info(NBDClient *client, uint32_t length,
 break;
 }
 }
+assert(length == 0);
 
 exp = nbd_export_find(name);
 if (!exp) {
@@ -444,7 +445,7 @@ static int nbd_negotiate_handle_info(NBDClient *client, uint32_t length,
 
 /* Don't bother sending NBD_INFO_NAME unless client requested it */
 if (sendname) {
-rc = nbd_negotiate_send_info(client, opt, NBD_INFO_NAME, length, name,
+rc = nbd_negotiate_send_info(client, opt, NBD_INFO_NAME, namelen, name,
  errp);
 if (rc < 0) {
 return rc;
-- 
2.11.0

[Qemu-devel] [PATCH 53/55] nbd/client: Don't hard-disconnect on ESHUTDOWN from server

2017-12-06 Thread Michael Roth

From: Eric Blake 

The NBD spec says that a server may fail any transmission request
with ESHUTDOWN when it is apparent that no further request from
the client can be successfully honored.  The client is supposed
to then initiate a soft shutdown (wait for all remaining in-flight
requests to be answered, then send NBD_CMD_DISC).  However, since
qemu's server never uses ESHUTDOWN errors, this code was mostly
untested since its introduction in commit b6f5d3b5.

More recently, I learned that nbdkit as the NBD server is able to
send ESHUTDOWN errors, so I finally tested this code, and noticed
that our client was special-casing ESHUTDOWN to cause a hard
shutdown (immediate disconnect, with no NBD_CMD_DISC), but only
if the server sends this error as a simple reply.  Further
investigation found that commit d2febedb introduced a regression
where structured replies behave differently than simple replies -
but that the structured reply behavior is more in line with the
spec (even if we still lack code in nbd-client.c to properly quit
sending further requests).  So this patch reverts the portion of
b6f5d3b5 that introduced an improper hard-disconnect special-case
at the lower level, and leaves the future enhancement of a nicer
soft-disconnect at the higher level for another day.

CC: qemu-sta...@nongnu.org
Signed-off-by: Eric Blake 
Message-Id: <20171113194857.13933-1-ebl...@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy 
(cherry picked from commit 01b05c66a3616d5a4adc39fc90962e9efaf791d1)
 Conflicts:
nbd/client.c
*drop dep on d2febedb
Signed-off-by: Michael Roth 
---
 nbd/client.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index 4caff77119..f04e95542f 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -945,12 +945,6 @@ ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
 reply->handle = ldq_be_p(buf + 8);
 
 reply->error = nbd_errno_to_system_errno(reply->error);
-
-if (reply->error == ESHUTDOWN) {
-/* This works even on mingw which lacks a native ESHUTDOWN */
-error_setg(errp, "server shutting down");
-return -EINVAL;
-}
 trace_nbd_receive_reply(magic, reply->error, reply->handle);
 
 if (magic != NBD_REPLY_MAGIC) {
-- 
2.11.0

[Qemu-devel] [PATCH 55/55] vga: handle cirrus vbe mode wraparounds.

2017-12-06 Thread Michael Roth

From: Gerd Hoffmann 

Commit "3d90c62548 vga: stop passing pointers to vga_draw_line*
functions" is incomplete.  It doesn't handle the case where the vga
rendering code tries to create a shared surface, i.e. a pixman image
backed by vga video memory.  That cannot work if the guest display
wraps from the end of video memory to the start.  So force shadowing in
that case.  Also adjust the snapshot region calculation.

This can be triggered with cirrus only.  When programming vbe modes using
the bochs api (stdvga, also qxl and virtio-vga in vga compat mode),
wraparounds can't happen.

Fixes: CVE-2017-13672
Fixes: 3d90c6254863693a6b13d918d2b8682e08bbc681
Cc: P J P 
Reported-by: David Buchanan 
Signed-off-by: Gerd Hoffmann 
Message-id: 20171010141323.14049-3-kra...@redhat.com
(cherry picked from commit 28f77de26a4f9995458ddeb9d34bb06c0193bdc9)
Signed-off-by: Michael Roth 
---
 hw/display/vga.c | 28 +---
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/hw/display/vga.c b/hw/display/vga.c
index 895e95c3f4..06ca3daa4c 100644
--- a/hw/display/vga.c
+++ b/hw/display/vga.c
@@ -1465,13 +1465,13 @@ static void vga_draw_graphic(VGACommonState *s, int full_update)
 DisplaySurface *surface = qemu_console_surface(s->con);
 int y1, y, update, linesize, y_start, double_scan, mask, depth;
 int width, height, shift_control, bwidth, bits;
-ram_addr_t page0, page1;
+ram_addr_t page0, page1, region_start, region_end;
 DirtyBitmapSnapshot *snap = NULL;
 int disp_width, multi_scan, multi_run;
 uint8_t *d;
 uint32_t v, addr1, addr;
 vga_draw_line_func *vga_draw_line = NULL;
-bool share_surface;
+bool share_surface, force_shadow = false;
 pixman_format_code_t format;
 #ifdef HOST_WORDS_BIGENDIAN
 bool byteswap = !s->big_endian_fb;
@@ -1484,6 +1484,15 @@ static void vga_draw_graphic(VGACommonState *s, int full_update)
 s->get_resolution(s, &width, &height);
 disp_width = width;
 
+region_start = (s->start_addr * 4);
+region_end = region_start + s->line_offset * height;
+if (region_end > s->vbe_size) {
+/* wraps around (can happen with cirrus vbe modes) */
+region_start = 0;
+region_end = s->vbe_size;
+force_shadow = true;
+}
+
 shift_control = (s->gr[VGA_GFX_MODE] >> 5) & 3;
 double_scan = (s->cr[VGA_CRTC_MAX_SCAN] >> 7);
 if (shift_control != 1) {
@@ -1523,7 +1532,7 @@ static void vga_draw_graphic(VGACommonState *s, int full_update)
 format = qemu_default_pixman_format(depth, !byteswap);
 if (format) {
 share_surface = dpy_gfx_check_format(s->con, format)
-&& !s->force_shadow;
+&& !s->force_shadow && !force_shadow;
 } else {
 share_surface = false;
 }
@@ -1627,8 +1636,6 @@ static void vga_draw_graphic(VGACommonState *s, int full_update)
 y1 = 0;
 
 if (!full_update) {
-ram_addr_t region_start = addr1;
-ram_addr_t region_end = addr1 + s->line_offset * height;
 vga_sync_dirty_bitmap(s);
 if (s->line_compare < height) {
 /* split screen mode */
@@ -1651,10 +1658,17 @@ static void vga_draw_graphic(VGACommonState *s, int full_update)
 addr = (addr & ~0x8000) | ((y1 & 2) << 14);
 }
 update = full_update;
-page0 = addr;
-page1 = addr + bwidth - 1;
+page0 = addr & s->vbe_size_mask;
+page1 = (addr + bwidth - 1) & s->vbe_size_mask;
 if (full_update) {
 update = 1;
+} else if (page1 < page0) {
+/* scanline wraps from end of video memory to the start */
+assert(force_shadow);
+update = memory_region_snapshot_get_dirty(&s->vram, snap,
+  page0, 0);
+update |= memory_region_snapshot_get_dirty(&s->vram, snap,
+   page1, 0);
 } else {
 update = memory_region_snapshot_get_dirty(&s->vram, snap,
   page0, page1 - page0);
-- 
2.11.0

[Qemu-devel] [PATCH 49/55] nbd/server: CVE-2017-15118 Stack smash on large export name

2017-12-06 Thread Michael Roth

From: Eric Blake 

Introduced in commit f37708f6b8 (2.10).  The NBD spec says a client
can request export names up to 4096 bytes in length, even though
they should not expect success on names longer than 256.  However,
qemu hard-codes the limit of 256, and fails to filter out a client
that probes for a longer name; the result is a stack smash that can
potentially give an attacker arbitrary control over the qemu
process.

The smash can be easily demonstrated with this client:
$ qemu-io -f raw nbd://localhost:10809/$(printf %3000d 1 | tr ' ' a)

If the qemu NBD server binary (whether the standalone qemu-nbd, or
the builtin server of QMP nbd-server-start) was compiled with
-fstack-protector-strong, the ability to exploit the stack smash
into arbitrary execution is a lot more difficult (but still
theoretically possible to a determined attacker, perhaps in
combination with other CVEs).  Still, crashing a running qemu (and
losing the VM) is bad enough, even if the attacker did not obtain
full execution control.

CC: qemu-sta...@nongnu.org
Signed-off-by: Eric Blake 
(cherry picked from commit 51ae4f8455c9e32c54770c4ebc25bf86a8128183)
Signed-off-by: Michael Roth 
---
 nbd/server.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/nbd/server.c b/nbd/server.c
index b93cb88911..56aed3a735 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -393,6 +393,10 @@ static int nbd_negotiate_handle_info(NBDClient *client, uint32_t length,
 msg = "name length is incorrect";
 goto invalid;
 }
+if (namelen >= sizeof(name)) {
+msg = "name too long for qemu";
+goto invalid;
+}
 if (nbd_read(client->ioc, name, namelen, errp) < 0) {
 return -EIO;
 }
-- 
2.11.0
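
The class of bug fixed by the patch above, reading a client-controlled number of bytes into a fixed-size stack buffer, and the shape of the fix can be shown in a standalone sketch. The function `read_export_name()` and its parameters are hypothetical; a `memcpy()` from an in-memory buffer stands in for QEMU's `nbd_read()` from the socket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a client-supplied name of namelen bytes into a fixed buffer of
 * name_size bytes.  The length is checked before the copy, leaving
 * room for the NUL terminator.  Returns 0 on success, -1 if the name
 * does not fit. */
static int read_export_name(char *name, size_t name_size,
                            const uint8_t *wire, uint32_t namelen)
{
    assert(name != NULL && name_size > 0);
    if (namelen >= name_size) {
        return -1;  /* reject before touching the buffer */
    }
    memcpy(name, wire, namelen);
    name[namelen] = '\0';
    return 0;
}
```

With a 257-byte buffer, a 256-byte name is accepted and anything longer, such as the 3000-byte name from the reproducer, is refused instead of overrunning the stack.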

[Qemu-devel] [PATCH 54/55] vga: drop line_offset variable

2017-12-06 Thread Michael Roth

From: Gerd Hoffmann 

Signed-off-by: Gerd Hoffmann 
(cherry picked from commit 362f811793ff6cb4d209ab61d76cc4f841bb5e46)
Signed-off-by: Michael Roth 
---
 hw/display/vga.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/hw/display/vga.c b/hw/display/vga.c
index 497c8236d0..895e95c3f4 100644
--- a/hw/display/vga.c
+++ b/hw/display/vga.c
@@ -1464,7 +1464,7 @@ static void vga_draw_graphic(VGACommonState *s, int full_update)
 {
 DisplaySurface *surface = qemu_console_surface(s->con);
 int y1, y, update, linesize, y_start, double_scan, mask, depth;
-int width, height, shift_control, line_offset, bwidth, bits;
+int width, height, shift_control, bwidth, bits;
 ram_addr_t page0, page1;
 DirtyBitmapSnapshot *snap = NULL;
 int disp_width, multi_scan, multi_run;
@@ -1614,7 +1614,6 @@ static void vga_draw_graphic(VGACommonState *s, int full_update)
 s->cursor_invalidate(s);
 }
 
-line_offset = s->line_offset;
 #if 0
 printf("w=%d h=%d v=%d line_offset=%d cr[0x09]=0x%02x cr[0x17]=0x%02x linecmp=%d sr[0x01]=0x%02x\n",
width, height, v, line_offset, s->cr[9], s->cr[VGA_CRTC_MODE],
@@ -1629,7 +1628,7 @@ static void vga_draw_graphic(VGACommonState *s, int full_update)
 
 if (!full_update) {
 ram_addr_t region_start = addr1;
-ram_addr_t region_end = addr1 + line_offset * height;
+ram_addr_t region_end = addr1 + s->line_offset * height;
 vga_sync_dirty_bitmap(s);
 if (s->line_compare < height) {
 /* split screen mode */
@@ -1681,7 +1680,7 @@ static void vga_draw_graphic(VGACommonState *s, int full_update)
 if (!multi_run) {
 mask = (s->cr[VGA_CRTC_MODE] & 3) ^ 3;
 if ((y1 & mask) == mask)
-addr1 += line_offset;
+addr1 += s->line_offset;
 y1++;
 multi_run = multi_scan;
 } else {
-- 
2.11.0

[Qemu-devel] [PATCH 06/55] exec: Explicitly export target AS from address_space_translate_internal

2017-12-06 Thread Michael Roth

From: Alexey Kardashevskiy 

This adds an AS** parameter to address_space_do_translate()
to make it easier for the next patch to share FlatViews.

This should cause no behavioural change.

Signed-off-by: Alexey Kardashevskiy 
Message-Id: <20170921085110.25598-2-...@ozlabs.ru>
Signed-off-by: Paolo Bonzini 
(cherry picked from commit e76bb18f7e430e0c50fb38d051feacf268bd78f4)
Signed-off-by: Michael Roth 
---
 exec.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/exec.c b/exec.c
index d20c34ca83..bd94248390 100644
--- a/exec.c
+++ b/exec.c
@@ -477,7 +477,8 @@ static MemoryRegionSection 
address_space_do_translate(AddressSpace *as,
   hwaddr *xlat,
   hwaddr *plen,
   bool is_write,
-  bool is_mmio)
+  bool is_mmio,
+  AddressSpace **target_as)
 {
 IOMMUTLBEntry iotlb;
 MemoryRegionSection *section;
@@ -504,6 +505,7 @@ static MemoryRegionSection 
address_space_do_translate(AddressSpace *as,
 }
 
 as = iotlb.target_as;
+*target_as = iotlb.target_as;
 }
 
 *xlat = addr;
@@ -526,7 +528,7 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace 
*as, hwaddr addr,
 
 /* This can never be MMIO. */
 section = address_space_do_translate(as, addr, &xlat, &plen,
- is_write, false);
+ is_write, false, &as);
 
 /* Illegal translation */
 if (section.mr == &io_mem_unassigned) {
@@ -549,7 +551,7 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace 
*as, hwaddr addr,
 plen -= 1;
 
 return (IOMMUTLBEntry) {
-.target_as = section.address_space,
+.target_as = as,
 .iova = addr & ~plen,
 .translated_addr = xlat & ~plen,
 .addr_mask = plen,
@@ -570,7 +572,8 @@ MemoryRegion *address_space_translate(AddressSpace *as, 
hwaddr addr,
 MemoryRegionSection section;
 
 /* This can be MMIO, so setup MMIO bit. */
-section = address_space_do_translate(as, addr, xlat, plen, is_write, true);
+section = address_space_do_translate(as, addr, xlat, plen, is_write, true,
+ &as);
 mr = section.mr;
 
 if (xen_enabled() && memory_access_is_direct(mr, is_write)) {
-- 
2.11.0

[Qemu-devel] [PATCH 46/55] block/nfs: fix nfs_client_open for filesize greater than 1TB

2017-12-06 Thread Michael Roth

From: Peter Lieven 

DIV_ROUND_UP(st.st_size, BDRV_SECTOR_SIZE) was overflowing ret (int) if
st.st_size is greater than 1TB.
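To see where the 1TB threshold comes from, here is a minimal standalone sketch (not the QEMU code). It assumes a 32-bit int and 512-byte sectors; SECTOR_SIZE, sectors_for_size() and fits_in_int() are hypothetical names standing in for BDRV_SECTOR_SIZE and the DIV_ROUND_UP expression.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for BDRV_SECTOR_SIZE (assumed to be 512 bytes). */
#define SECTOR_SIZE 512

/* Round a byte count up to whole sectors, keeping the result 64-bit. */
static int64_t sectors_for_size(int64_t size_bytes)
{
    return (size_bytes + SECTOR_SIZE - 1) / SECTOR_SIZE;
}

/* Returns 1 if the sector count can be stored in a plain int, else 0. */
static int fits_in_int(int64_t sectors)
{
    return sectors >= INT_MIN && sectors <= INT_MAX;
}
```

With these assumptions, a 1 TiB file yields 2^31 sectors, one more than a 32-bit int can hold, which is why the return variable has to be widened to int64_t.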

Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Lieven 
Message-id: 1511798407-31129-1-git-send-email...@kamp.de
Signed-off-by: Max Reitz 
(cherry picked from commit f1a7ff770f7d71ee7833ff019aac9d6cc3d13f71)
Signed-off-by: Michael Roth 
---
 block/nfs.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/block/nfs.c b/block/nfs.c
index bec16b72a6..addea26d56 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -1,7 +1,7 @@
 /*
  * QEMU Block driver for native access to files on NFS shares
  *
- * Copyright (c) 2014-2016 Peter Lieven 
+ * Copyright (c) 2014-2017 Peter Lieven 
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to 
deal
@@ -496,7 +496,7 @@ out:
 static int64_t nfs_client_open(NFSClient *client, QDict *options,
int flags, int open_flags, Error **errp)
 {
-int ret = -EINVAL;
+int64_t ret = -EINVAL;
 QemuOpts *opts = NULL;
 Error *local_err = NULL;
 struct stat st;
@@ -686,8 +686,7 @@ static QemuOptsList nfs_create_opts = {
 
 static int nfs_file_create(const char *url, QemuOpts *opts, Error **errp)
 {
-int ret = 0;
-int64_t total_size = 0;
+int64_t ret, total_size;
 NFSClient *client = g_new0(NFSClient, 1);
 QDict *options = NULL;
 
-- 
2.11.0

[Qemu-devel] [PATCH 52/55] nbd-client: Refuse read-only client with BDRV_O_RDWR

2017-12-06 Thread Michael Roth

From: Eric Blake 

The NBD spec says that clients should not try to write/trim to
an export advertised as read-only by the server.  But we failed
to check that, and would allow the block layer to use NBD with
BDRV_O_RDWR even when the server is read-only, which meant we
were depending on the server sending a proper EPERM failure for
various commands, and also exposes a leaky abstraction: using
qemu-io in read-write mode would succeed on 'w -z 0 0' because
of local short-circuiting logic, but 'w 0 0' would send a
request over the wire (where it then depends on the server, and
fails at least for qemu-nbd but might pass for other NBD
implementations).

With this patch, a client MUST request read-only mode to access
a server that is doing a read-only export, or else it will get
a message like:

can't open device nbd://localhost:10809/foo: request for write access conflicts 
with read-only export

It is no longer possible to even attempt writes over the wire
(including the corner case of 0-length writes), because the block
layer enforces the explicit read-only request; this matches the
behavior of qcow2 when backed by a read-only POSIX file.
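The open-time rule described above can be sketched in isolation as follows. This is only an illustration under stated assumptions: EXPORT_READ_ONLY and check_open_mode() are hypothetical names, and the flag bit value is chosen for the example rather than taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit meaning "the server exports this read-only". */
#define EXPORT_READ_ONLY (1u << 1)

/*
 * Decide at open time whether the requested mode is compatible with the
 * export. Returns 0 if the open may proceed, -EACCES if a read-write
 * client is paired with a read-only export.
 */
static int check_open_mode(uint32_t export_flags, bool client_read_only)
{
    if ((export_flags & EXPORT_READ_ONLY) && !client_read_only) {
        return -EACCES;
    }
    return 0;
}
```

Rejecting the mismatch once at open time is what allows the write paths to assert that the export is writable instead of relying on the server to fail each request.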

Fix several iotests to comply with the new behavior (since
qemu-nbd of an internal snapshot, as well as nbd-server-add over QMP,
default to a read-only export, we must tell blockdev-add/qemu-io to
set up a read-only client).

CC: qemu-sta...@nongnu.org
Signed-off-by: Eric Blake 
Message-Id: <20171108215703.9295-3-ebl...@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy 
(cherry picked from commit 1104d83c726d2b20f9cec7b99ab3570a2fdbd46d)
Signed-off-by: Michael Roth 
---
 block/nbd-client.c | 9 +
 tests/qemu-iotests/058 | 8 
 tests/qemu-iotests/140 | 4 ++--
 tests/qemu-iotests/147 | 1 +
 4 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index ea728fffc8..db9d41eb04 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -256,6 +256,7 @@ int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t 
offset,
 NBDReply reply;
 ssize_t ret;
 
+assert(!(client->info.flags & NBD_FLAG_READ_ONLY));
 if (flags & BDRV_REQ_FUA) {
 assert(client->info.flags & NBD_FLAG_SEND_FUA);
 request.flags |= NBD_CMD_FLAG_FUA;
@@ -284,6 +285,7 @@ int nbd_client_co_pwrite_zeroes(BlockDriverState *bs, 
int64_t offset,
 };
 NBDReply reply;
 
+assert(!(client->info.flags & NBD_FLAG_READ_ONLY));
 if (!(client->info.flags & NBD_FLAG_SEND_WRITE_ZEROES)) {
 return -ENOTSUP;
 }
@@ -339,6 +341,7 @@ int nbd_client_co_pdiscard(BlockDriverState *bs, int64_t 
offset, int bytes)
 NBDReply reply;
 ssize_t ret;
 
+assert(!(client->info.flags & NBD_FLAG_READ_ONLY));
 if (!(client->info.flags & NBD_FLAG_SEND_TRIM)) {
 return 0;
 }
@@ -403,6 +406,12 @@ int nbd_client_init(BlockDriverState *bs,
 logout("Failed to negotiate with the NBD server\n");
 return ret;
 }
+if (client->info.flags & NBD_FLAG_READ_ONLY &&
+!bdrv_is_read_only(bs)) {
+error_setg(errp,
+   "request for write access conflicts with read-only export");
+return -EACCES;
+}
 if (client->info.flags & NBD_FLAG_SEND_FUA) {
 bs->supported_write_flags = BDRV_REQ_FUA;
 bs->supported_zero_flags |= BDRV_REQ_FUA;
diff --git a/tests/qemu-iotests/058 b/tests/qemu-iotests/058
index 2253c6a6d1..5eb8784669 100755
--- a/tests/qemu-iotests/058
+++ b/tests/qemu-iotests/058
@@ -117,15 +117,15 @@ _export_nbd_snapshot sn1
 
 echo
 echo "== verifying the exported snapshot with patterns, method 1 =="
-$QEMU_IO_NBD -c 'read -P 0xa 0x1000 0x1000' "$nbd_snapshot_img" | 
_filter_qemu_io
-$QEMU_IO_NBD -c 'read -P 0xb 0x2000 0x1000' "$nbd_snapshot_img" | 
_filter_qemu_io
+$QEMU_IO_NBD -r -c 'read -P 0xa 0x1000 0x1000' "$nbd_snapshot_img" | 
_filter_qemu_io
+$QEMU_IO_NBD -r -c 'read -P 0xb 0x2000 0x1000' "$nbd_snapshot_img" | 
_filter_qemu_io
 
 _export_nbd_snapshot1 sn1
 
 echo
 echo "== verifying the exported snapshot with patterns, method 2 =="
-$QEMU_IO_NBD -c 'read -P 0xa 0x1000 0x1000' "$nbd_snapshot_img" | 
_filter_qemu_io
-$QEMU_IO_NBD -c 'read -P 0xb 0x2000 0x1000' "$nbd_snapshot_img" | 
_filter_qemu_io
+$QEMU_IO_NBD -r -c 'read -P 0xa 0x1000 0x1000' "$nbd_snapshot_img" | 
_filter_qemu_io
+$QEMU_IO_NBD -r -c 'read -P 0xb 0x2000 0x1000' "$nbd_snapshot_img" | 
_filter_qemu_io
 
 $QEMU_IMG convert "$TEST_IMG" -l sn1 -O qcow2 "$converted_image"
 
diff --git a/tests/qemu-iotests/140 b/tests/qemu-iotests/140
index f89d0d6789..a8fc95145c 100755
--- a/tests/qemu-iotests/140
+++ b/tests/qemu-iotests/140
@@ -78,7 +78,7 @@ _send_qemu_cmd $QEMU_HANDLE \
'arguments': { 'device': 'drv' }}" \
 'return'
 
-$QEMU_IO_PROG -f raw -c 'read -P 42 0 64k' \
+$QEMU_IO_PROG -f raw -r -c 'read -P 42 0 64k' \

[Qemu-devel] [PATCH 05/55] block: Perform copy-on-read in loop

2017-12-06 Thread Michael Roth

From: Eric Blake 

Improve our braindead copy-on-read implementation.  Pre-patch,
we have multiple issues:
- we create a bounce buffer and perform a write for the entire
request, even if the active image already has 99% of the
clusters occupied, and really only needs to copy-on-read the
remaining 1% of the clusters
- our bounce buffer was as large as the read request, and can
needlessly exhaust our memory by using double the memory of
the request size (the original request plus our bounce buffer),
rather than a capped maximum overhead beyond the original
- if a driver has a max_transfer limit, we are bypassing the
normal code in bdrv_aligned_preadv() that fragments to that
limit, and instead attempt to read the entire buffer from the
driver in one go, which some drivers may assert on
- a client can submit a request of nearly 2G such that
rounding the request out to cluster boundaries results in a
byte count larger than 2G.  While this cannot exceed 32 bits,
it DOES have some follow-on problems:
-- the call to bdrv_driver_pread() can assert for exceeding
BDRV_REQUEST_MAX_BYTES, if the driver is old and lacks
.bdrv_co_preadv
-- if the buffer is all zeroes, the subsequent call to
bdrv_co_do_pwrite_zeroes is a no-op due to a negative size,
which means we did not actually copy on read

Fix all of these issues by breaking up the action into a loop,
where each iteration is capped to sane limits.  Also, querying
the allocation status allows us to optimize: when data is
already present in the active layer, we don't need to bounce.
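The shape of that loop can be sketched outside of QEMU as follows. This is a simplified model, not the code from block/io.c: BOUNCE_CAP, alloc_query_fn, copy_on_read_loop() and below_128k_allocated() are hypothetical names, and the sketch only counts the bytes that would need a bounce buffer instead of performing I/O.

```c
#include <stdint.h>

/* Hypothetical per-iteration cap, standing in for the bounce buffer limit. */
#define BOUNCE_CAP (64 * 1024)

/*
 * Hypothetical allocation query: returns 1 if the data at offset is already
 * present, 0 otherwise; *pnum receives the length of the run (<= bytes)
 * that shares this status.
 */
typedef int (*alloc_query_fn)(int64_t offset, int64_t bytes, int64_t *pnum);

/* Walk [offset, offset + bytes) in capped steps; return bytes to bounce. */
static int64_t copy_on_read_loop(int64_t offset, int64_t bytes,
                                 alloc_query_fn is_allocated)
{
    int64_t bounced = 0;

    while (bytes > 0) {
        int64_t chunk = bytes < BOUNCE_CAP ? bytes : BOUNCE_CAP;
        int64_t pnum = 0;
        int allocated = is_allocated(offset, chunk, &pnum);

        if (pnum <= 0 || pnum > chunk) {
            pnum = chunk; /* defensive: always make forward progress */
        }
        if (!allocated) {
            /* Real code would read into the bounce buffer and write back. */
            bounced += pnum;
        }
        offset += pnum;
        bytes -= pnum;
    }
    return bounced;
}

/* Example query: everything below 128 KiB is already allocated. */
static int below_128k_allocated(int64_t offset, int64_t bytes, int64_t *pnum)
{
    const int64_t boundary = 128 * 1024;

    if (offset < boundary) {
        int64_t run = boundary - offset;
        *pnum = run < bytes ? run : bytes;
        return 1;
    }
    *pnum = bytes;
    return 0;
}
```

Each iteration handles at most BOUNCE_CAP bytes, so memory overhead stays bounded regardless of the request size, and runs that are already allocated are skipped without bouncing.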

Note that the code has a telling comment that copy-on-read
should probably be a filter driver rather than a bolt-on hack
in io.c; but that remains a task for another day.

CC: qemu-sta...@nongnu.org
Signed-off-by: Eric Blake 
Reviewed-by: Kevin Wolf 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Kevin Wolf 
(cherry picked from commit cb2e28780c7080af489e72227683fe374f05022d)
 Conflicts:
block/io.c
* remove context dep on d855ebcd3
Signed-off-by: Michael Roth 
---
 block/io.c | 118 ++---
 1 file changed, 81 insertions(+), 37 deletions(-)

diff --git a/block/io.c b/block/io.c
index 26003814eb..fce856ea8a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -34,6 +34,9 @@
 
 #define NOT_DONE 0x7fff /* used while emulated sync operation in progress 
*/
 
+/* Maximum bounce buffer for copy-on-read and write zeroes, in bytes */
+#define MAX_BOUNCE_BUFFER (32768 << BDRV_SECTOR_BITS)
+
 static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
 int64_t offset, int bytes, BdrvRequestFlags flags);
 
@@ -945,11 +948,14 @@ static int coroutine_fn 
bdrv_co_do_copy_on_readv(BdrvChild *child,
 
 BlockDriver *drv = bs->drv;
 struct iovec iov;
-QEMUIOVector bounce_qiov;
+QEMUIOVector local_qiov;
 int64_t cluster_offset;
 unsigned int cluster_bytes;
 size_t skip_bytes;
 int ret;
+int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
+BDRV_REQUEST_MAX_BYTES);
+unsigned int progress = 0;
 
 /* FIXME We cannot require callers to have write permissions when all they
  * are doing is a read request. If we did things right, write permissions
@@ -961,52 +967,94 @@ static int coroutine_fn 
bdrv_co_do_copy_on_readv(BdrvChild *child,
 // assert(child->perm & (BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE));
 
 /* Cover entire cluster so no additional backing file I/O is required when
- * allocating cluster in the image file.
+ * allocating cluster in the image file.  Note that this value may exceed
+ * BDRV_REQUEST_MAX_BYTES (even when the original read did not), which
+ * is one reason we loop rather than doing it all at once.
  */
 bdrv_round_to_clusters(bs, offset, bytes, &cluster_offset, &cluster_bytes);
+skip_bytes = offset - cluster_offset;
 
 trace_bdrv_co_do_copy_on_readv(bs, offset, bytes,
cluster_offset, cluster_bytes);
 
-iov.iov_len = cluster_bytes;
-iov.iov_base = bounce_buffer = qemu_try_blockalign(bs, iov.iov_len);
+bounce_buffer = qemu_try_blockalign(bs,
+MIN(MIN(max_transfer, cluster_bytes),
+MAX_BOUNCE_BUFFER));
 if (bounce_buffer == NULL) {
 ret = -ENOMEM;
 goto err;
 }
 
-qemu_iovec_init_external(&bounce_qiov, &iov, 1);
+while (cluster_bytes) {
+int64_t pnum;
 
-ret = bdrv_driver_preadv(bs, cluster_offset, cluster_bytes,
- &bounce_qiov, 0);
-if (ret < 0) {
-goto err;
-}
+ret = bdrv_is_allocated(bs, cluster_offset,
+MIN(cluster_bytes, max_transfer), &pnum);
+if (ret < 0) {
+/* Safe to treat errors in querying allocation as if
+ *

[Qemu-devel] [PATCH 42/55] vhost: restore avail index from vring used index on disconnection

2017-12-06 Thread Michael Roth

From: Maxime Coquelin 

vhost_virtqueue_stop() gets the avail index value from the backend,
except if the backend is not responding.

This happens when the backend crashes. In that case, the internal
state of the virtio queue is left inconsistent, which causes packets
to corrupt the vring state.

With a Linux guest, it results in the following error messages on
backend reconnection:

[   22.444905] virtio_net virtio0: output.0:id 0 is not a head!
[   22.446746] net enp0s3: Unexpected TXQ (0) queue failure: -5
[   22.476360] net enp0s3: Unexpected TXQ (0) queue failure: -5

Fixes: 283e2c2adcb8 ("net: virtio-net discards TX data after link down")
Cc: qemu-sta...@nongnu.org
Signed-off-by: Maxime Coquelin 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
(cherry picked from commit 2ae39a113af311cb56a0c35b7f212dafcef15303)
Signed-off-by: Michael Roth 
---
 hw/virtio/vhost.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index b737ca915b..76f6e1fcaa 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1137,6 +1137,10 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
 r = dev->vhost_ops->vhost_get_vring_base(dev, &state);
 if (r < 0) {
 VHOST_OPS_DEBUG("vhost VQ %d ring restore failed: %d", idx, r);
+/* Connection to the backend is broken, so let's sync internal
+ * last avail idx to the device used idx.
+ */
+virtio_queue_restore_last_avail_idx(vdev, idx);
 } else {
 virtio_queue_set_last_avail_idx(vdev, idx, state.num);
 }
-- 
2.11.0

[Qemu-devel] [PATCH 50/55] vhost: fix error check in vhost_verify_ring_mappings()

2017-12-06 Thread Michael Roth

From: Greg Kurz 

Since commit f1f9e6c5 "vhost: adapt vhost_verify_ring_mappings() to
virtio 1 ring layout", we check the mapping of each part (descriptor
table, available ring and used ring) of each virtqueue separately.

The checking of a part is done by the vhost_verify_ring_part_mapping()
function: it returns either 0 on success or a negative errno if the
part cannot be mapped at the same place.

Unfortunately, the vhost_verify_ring_mappings() function checks its
return value the other way round. It means that we either:
- only verify the descriptor table of the first virtqueue, and if it
  is valid we ignore all the other mappings
- or ignore all broken mappings until we reach a valid one

i.e., we only raise an error if all mappings are broken, and otherwise
consider all mappings valid (false success), which is obviously
wrong.

This patch ensures that vhost_verify_ring_mappings() only returns
success if ALL mappings are okay.
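The effect of the inverted check can be reproduced with a small standalone model. This is a sketch only: verify_all_buggy() and verify_all_fixed() are hypothetical names, and each array element plays the role of one vhost_verify_ring_part_mapping() result (0 on success, negative on failure).

```c
/* Inverted check: stops at the first SUCCESS and reports it. */
static int verify_all_buggy(const int *results, int n)
{
    int r = 0;

    for (int i = 0; i < n; i++) {
        r = results[i];
        if (!r) {
            break;
        }
    }
    return r;
}

/* Corrected check: stops at the first FAILURE and reports it. */
static int verify_all_fixed(const int *results, int n)
{
    int r = 0;

    for (int i = 0; i < n; i++) {
        r = results[i];
        if (r) {
            break;
        }
    }
    return r;
}
```

With the inverted check, a single valid part is enough to hide every broken mapping; with the corrected check, success is returned only when every part verified.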

Reported-by: Dr. David Alan Gilbert 
Signed-off-by: Greg Kurz 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
(cherry picked from commit 2fe45ec3bffbd3a26f2ed39f60bab0ca5217d8f6)
Signed-off-by: Michael Roth 
---
 hw/virtio/vhost.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 76f6e1fcaa..fd6f4a878b 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -492,21 +492,21 @@ static int vhost_verify_ring_mappings(struct vhost_dev 
*dev,
 j = 0;
 r = vhost_verify_ring_part_mapping(dev, vq->desc, vq->desc_phys,
vq->desc_size, start_addr, size);
-if (!r) {
+if (r) {
 break;
 }
 
 j++;
 r = vhost_verify_ring_part_mapping(dev, vq->avail, vq->avail_phys,
vq->avail_size, start_addr, size);
-if (!r) {
+if (r) {
 break;
 }
 
 j++;
 r = vhost_verify_ring_part_mapping(dev, vq->used, vq->used_phys,
vq->used_size, start_addr, size);
-if (!r) {
+if (r) {
 break;
 }
 }
-- 
2.11.0

[Qemu-devel] [PATCH 41/55] virtio: Add queue interface to restore avail index from vring used index

2017-12-06 Thread Michael Roth

From: Maxime Coquelin 

If the backend crashes, the internal avail index cannot be restored
from the backend value, because the vhost_get_vring_base callback
fails.

This patch provides a new interface to restore the internal avail index
from the vring used index, as done by some vhost-user backends on
reconnection.
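The fallback can be modelled in a few lines. This is a sketch under assumptions, not the virtio code: struct vq_sketch and sync_avail_on_stop() are hypothetical, and used_idx stands for the used index read from the vring in guest memory.

```c
#include <stdint.h>

/* Hypothetical, reduced view of a virtqueue's index state. */
struct vq_sketch {
    uint16_t last_avail_idx;   /* next avail entry the device will process */
    uint16_t shadow_avail_idx; /* cached copy of the avail index */
    uint16_t used_idx;         /* used index as stored in the vring */
};

/*
 * On queue stop: take the index reported by the backend if the query
 * succeeded (backend_ret >= 0), otherwise fall back to the used index.
 */
static void sync_avail_on_stop(struct vq_sketch *q, int backend_ret,
                               uint16_t backend_idx)
{
    if (backend_ret < 0) {
        q->last_avail_idx = q->used_idx;
    } else {
        q->last_avail_idx = backend_idx;
    }
    q->shadow_avail_idx = q->last_avail_idx;
}
```

Falling back to the used index treats every buffer the backend had not yet returned as not yet processed, which keeps the queue state consistent for the next connection.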

Signed-off-by: Maxime Coquelin 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
(cherry picked from commit 2d4ba6cc741df15df6fbb4feaa706a02e103083a)
Signed-off-by: Michael Roth 
---
 hw/virtio/virtio.c | 10 ++
 include/hw/virtio/virtio.h |  1 +
 2 files changed, 11 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 464947f76d..15cf6021a0 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2311,6 +2311,16 @@ void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, 
int n, uint16_t idx)
 vdev->vq[n].shadow_avail_idx = idx;
 }
 
+void virtio_queue_restore_last_avail_idx(VirtIODevice *vdev, int n)
+{
+rcu_read_lock();
+if (vdev->vq[n].vring.desc) {
+vdev->vq[n].last_avail_idx = vring_used_idx(&vdev->vq[n]);
+vdev->vq[n].shadow_avail_idx = vdev->vq[n].last_avail_idx;
+}
+rcu_read_unlock();
+}
+
 void virtio_queue_update_used_idx(VirtIODevice *vdev, int n)
 {
 rcu_read_lock();
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 80c45c321e..3d5c84e829 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -272,6 +272,7 @@ hwaddr virtio_queue_get_avail_size(VirtIODevice *vdev, int 
n);
 hwaddr virtio_queue_get_used_size(VirtIODevice *vdev, int n);
 uint16_t virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n);
 void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n, uint16_t idx);
+void virtio_queue_restore_last_avail_idx(VirtIODevice *vdev, int n);
 void virtio_queue_invalidate_signalled_used(VirtIODevice *vdev, int n);
 void virtio_queue_update_used_idx(VirtIODevice *vdev, int n);
 VirtQueue *virtio_get_queue(VirtIODevice *vdev, int n);
-- 
2.11.0

[Qemu-devel] [PATCH 31/55] iotests: Add cluster_size=64k to 125

2017-12-06 Thread Michael Roth

From: Max Reitz 

Apparently it would be a good idea to test that, too.

Signed-off-by: Max Reitz 
Message-id: 20171009215533.12530-4-mre...@redhat.com
Reviewed-by: Eric Blake 
Reviewed-by: Jeff Cody 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Max Reitz 
(cherry picked from commit 4c112a397c2f61038914fa315a7896ce6d645d18)
Signed-off-by: Michael Roth 
---
 tests/qemu-iotests/125 |   7 +-
 tests/qemu-iotests/125.out | 480 -
 2 files changed, 437 insertions(+), 50 deletions(-)

diff --git a/tests/qemu-iotests/125 b/tests/qemu-iotests/125
index 9424313e82..c20c71570c 100755
--- a/tests/qemu-iotests/125
+++ b/tests/qemu-iotests/125
@@ -69,13 +69,15 @@ fi
 # in B
 CREATION_SIZE=$((2 * 1024 * 1024 - 48 * 1024))
 
+# 512 is the actual test -- but it's good to test 64k as well, just to be sure.
+for cluster_size in 512 64k; do
 # in kB
 for GROWTH_SIZE in 16 48 80; do
 for create_mode in off metadata falloc full; do
 for growth_mode in off metadata falloc full; do
-echo "--- growth_size=$GROWTH_SIZE create_mode=$create_mode 
growth_mode=$growth_mode ---"
+echo "--- cluster_size=$cluster_size growth_size=$GROWTH_SIZE 
create_mode=$create_mode growth_mode=$growth_mode ---"
 
-IMGOPTS="preallocation=$create_mode,cluster_size=512" 
_make_test_img ${CREATION_SIZE}
+IMGOPTS="preallocation=$create_mode,cluster_size=$cluster_size" 
_make_test_img ${CREATION_SIZE}
 $QEMU_IMG resize -f "$IMGFMT" --preallocation=$growth_mode 
"$TEST_IMG" +${GROWTH_SIZE}K
 
 host_size_0=$(get_image_size_on_host)
@@ -123,6 +125,7 @@ for GROWTH_SIZE in 16 48 80; do
 done
 done
 done
+done
 
 # success, all done
 echo '*** done'
diff --git a/tests/qemu-iotests/125.out b/tests/qemu-iotests/125.out
index 3f4d6e31a6..596905f533 100644
--- a/tests/qemu-iotests/125.out
+++ b/tests/qemu-iotests/125.out
@@ -1,5 +1,5 @@
 QA output created by 125
---- growth_size=16 create_mode=off growth_mode=off ---
+--- cluster_size=512 growth_size=16 create_mode=off growth_mode=off ---
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2048000 preallocation=off
 Image resized.
 wrote 2048000/2048000 bytes at offset 0
@@ -7,7 +7,7 @@ wrote 2048000/2048000 bytes at offset 0
 wrote 16384/16384 bytes at offset 2048000
 16 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
---- growth_size=16 create_mode=off growth_mode=metadata ---
+--- cluster_size=512 growth_size=16 create_mode=off growth_mode=metadata ---
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2048000 preallocation=off
 Image resized.
 wrote 2048000/2048000 bytes at offset 0
@@ -15,7 +15,7 @@ wrote 2048000/2048000 bytes at offset 0
 wrote 16384/16384 bytes at offset 2048000
 16 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
---- growth_size=16 create_mode=off growth_mode=falloc ---
+--- cluster_size=512 growth_size=16 create_mode=off growth_mode=falloc ---
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2048000 preallocation=off
 Image resized.
 wrote 2048000/2048000 bytes at offset 0
@@ -23,7 +23,7 @@ wrote 2048000/2048000 bytes at offset 0
 wrote 16384/16384 bytes at offset 2048000
 16 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
---- growth_size=16 create_mode=off growth_mode=full ---
+--- cluster_size=512 growth_size=16 create_mode=off growth_mode=full ---
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2048000 preallocation=off
 Image resized.
 wrote 2048000/2048000 bytes at offset 0
@@ -31,7 +31,7 @@ wrote 2048000/2048000 bytes at offset 0
 wrote 16384/16384 bytes at offset 2048000
 16 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
---- growth_size=16 create_mode=metadata growth_mode=off ---
+--- cluster_size=512 growth_size=16 create_mode=metadata growth_mode=off ---
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2048000 preallocation=metadata
 Image resized.
 wrote 2048000/2048000 bytes at offset 0
@@ -39,7 +39,7 @@ wrote 2048000/2048000 bytes at offset 0
 wrote 16384/16384 bytes at offset 2048000
 16 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
---- growth_size=16 create_mode=metadata growth_mode=metadata ---
+--- cluster_size=512 growth_size=16 create_mode=metadata growth_mode=metadata 
---
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2048000 preallocation=metadata
 Image resized.
 wrote 2048000/2048000 bytes at offset 0
@@ -47,7 +47,7 @@ wrote 2048000/2048000 bytes at offset 0
 wrote 16384/16384 bytes at offset 2048000
 16 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
---- growth_size=16 create_mode=metadata growth_mode=falloc ---
+--- cluster_size=512 growth_size=16 create_mode=metadata growth_mode=falloc ---
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2048000 preallocation=metadata
 Image resized.
 wrote 2048000/2048000 bytes at offset 0
@@ -55,7 +55,7

[Qemu-devel] [PATCH 48/55] nbd/server: CVE-2017-15119 Reject options larger than 32M

2017-12-06 Thread Michael Roth

From: Eric Blake 

The NBD spec gives us permission to abruptly disconnect on clients
that send outrageously large option requests, rather than having
to spend the time reading to the end of the option.  No real
option request requires that much data anyways; and meanwhile, we
already have the practice of abruptly dropping the connection on
any client that sends NBD_CMD_WRITE with a payload larger than 32M.

For comparison, nbdkit drops the connection on any request with
more than 4096 bytes; however, that limit is probably too low
(as the NBD spec states an export name can theoretically be up
to 4096 bytes, which means a valid NBD_OPT_INFO could be even
longer) - even if qemu doesn't permit exports longer than 256
bytes.

It could be argued that a malicious client trying to get us to
read nearly 4G of data on a bad request is a form of denial of
service.  In particular, if the server requires TLS, but a client
that does not know the TLS credentials sends any option (other
than NBD_OPT_STARTTLS or NBD_OPT_EXPORT_NAME) with a stated
payload of nearly 4G, then the server was keeping the connection
alive trying to read all the payload, tying up resources that it
would rather be spending on a client that can get past the TLS
handshake.  Hence, this warranted a CVE.
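The guard itself is small. A minimal sketch, assuming a 32M cap as described above: MAX_OPTION_LEN, read_be32() and check_option_length() are hypothetical names for this example, not the identifiers used in nbd/server.c.

```c
#include <stdint.h>

/* Hypothetical cap mirroring the 32M limit discussed above. */
#define MAX_OPTION_LEN (32u * 1024 * 1024)

/* Decode a 32-bit big-endian length field as received from the wire. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Returns 0 if the stated payload length is acceptable, -1 if the
 * connection should be dropped without reading the payload.
 */
static int check_option_length(uint32_t length)
{
    return length > MAX_OPTION_LEN ? -1 : 0;
}
```

The check runs on the stated length before any payload byte is read, so an oversized request costs the server only the fixed-size option header.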

Present since at least 2.5 when handling known options, and made
worse in 2.6 when fixing support for NBD_FLAG_C_FIXED_NEWSTYLE
to handle unknown options.

CC: qemu-sta...@nongnu.org
Signed-off-by: Eric Blake 
(cherry picked from commit fdad35ef6c5839d50dfc14073364ac893afebc30)
Signed-off-by: Michael Roth 
---
 nbd/server.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/nbd/server.c b/nbd/server.c
index 993ade30bb..b93cb88911 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -661,6 +661,12 @@ static int nbd_negotiate_options(NBDClient *client, 
uint16_t myflags,
 }
 length = be32_to_cpu(length);
 
+if (length > NBD_MAX_BUFFER_SIZE) {
+error_setg(errp, "len (%" PRIu32" ) is larger than max len (%u)",
+   length, NBD_MAX_BUFFER_SIZE);
+return -EINVAL;
+}
+
 trace_nbd_negotiate_options_check_option(option,
  nbd_opt_lookup(option));
 if (client->tlscreds &&
-- 
2.11.0

[Qemu-devel] [PATCH 38/55] net: fix check for number of parameters to -netdev socket

2017-12-06 Thread Michael Roth

From: Jens Freimann 

Since commit 0f8c289ad "net: fix -netdev socket,fd= for UDP sockets",
we allow more than one parameter for -netdev socket. But now
we hit an assertion when no parameter at all is specified:

> qemu-system-x86_64 -netdev socket
socket.c:729: net_init_socket: Assertion `sock->has_udp' failed.

Fix this by reverting the change of the if condition done in 0f8c289ad.
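The difference between the two conditions can be shown with a standalone model. This is a sketch only: struct sock_opts_sketch and both helper functions are hypothetical, with one boolean per mode option.

```c
#include <stdbool.h>

/* Hypothetical, reduced view of the parsed -netdev socket options. */
struct sock_opts_sketch {
    bool has_fd, has_listen, has_connect, has_mcast, has_udp;
};

/* Condition before the fix: only rejects MORE than one mode (fd ignored). */
static bool accepted_at_most_one(const struct sock_opts_sketch *o)
{
    return !(o->has_listen + o->has_connect + o->has_mcast + o->has_udp > 1);
}

/* Condition after the fix: requires EXACTLY one mode to be given. */
static bool accepted_exactly_one(const struct sock_opts_sketch *o)
{
    return o->has_fd + o->has_listen + o->has_connect + o->has_mcast +
           o->has_udp == 1;
}
```

An empty option set passes the "at most one" test and later trips the assertion, while the "exactly one" test rejects it up front with an error message.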

Cc: Jason Wang 
Cc: qemu-sta...@nongnu.org
Fixes: 0f8c289ad539feb5135c545bea947b310a893f4b
Reported-by: Mao Zhongyi 
Signed-off-by: Jens Freimann 
Signed-off-by: Jason Wang 
(cherry picked from commit ff86d5762552787f1fcb7da695ec4f8c1be754b4)
 Conflicts:
net/socket.c
* drop context dep on 0522a959
Signed-off-by: Michael Roth 
---
 net/socket.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index 6664a75aa4..95060e5ca2 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -674,8 +674,8 @@ int net_init_socket(const Netdev *netdev, const char *name,
 assert(netdev->type == NET_CLIENT_DRIVER_SOCKET);
 sock = &netdev->u.socket;
 
-if (sock->has_listen + sock->has_connect + sock->has_mcast +
-sock->has_udp > 1) {
+if (sock->has_fd + sock->has_listen + sock->has_connect + sock->has_mcast +
+sock->has_udp != 1) {
 error_report("exactly one of listen=, connect=, mcast= or udp="
  " is required");
 return -1;
-- 
2.11.0

[Qemu-devel] [PATCH 34/55] ppc: fix setting of compat mode

2017-12-06 Thread Michael Roth

From: Greg Kurz 

While trying to make KVM PR usable again, commit 5dfaa532ae introduced a
regression: the current compat_pvr value is passed to KVM instead of the
new one. This means that we always pass 0 instead of the max-cpu-compat
PVR during the initial machine reset. And at CAS time, we either pass
the PVR from the command line or do not call kvmppc_set_compat() at
all, i.e., the PCR is not set as expected.

For example if we start a big endian fedora26 guest in power7 compat
mode on a POWER8 host, we get this in the guest:

$ cat /proc/cpuinfo
processor   : 0
cpu : POWER7 (architected), altivec supported
clock   : 4024.00MHz
revision: 2.0 (pvr 004d 0200)

timebase: 51200
platform: pSeries
model   : IBM pSeries (emulated by qemu)
machine : CHRP IBM pSeries (emulated by qemu)
MMU : Hash

but the guest can still execute POWER8 instructions, and the following
program succeeds:

int main()
{
asm("vncipher 0,0,0"); // ISA 2.07 instruction
}

Let's pass the new compat_pvr to kvmppc_set_compat() and the program fails
with SIGILL as expected.
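The regression boils down to passing the old value instead of the new one. A minimal model, with hypothetical names (struct cpu_sketch, set_compat_buggy(), set_compat_fixed()) and kvm_pvr standing for whatever value the hypervisor was last told:

```c
#include <stdint.h>

/* Hypothetical, reduced CPU state for this illustration. */
struct cpu_sketch {
    uint32_t compat_pvr; /* value tracked by the emulator */
    uint32_t kvm_pvr;    /* value last handed to the hypervisor */
};

/* Regressed behaviour: forwards the CURRENT value, then stores the new one. */
static void set_compat_buggy(struct cpu_sketch *cpu, uint32_t compat_pvr)
{
    if (cpu->compat_pvr != compat_pvr) {
        cpu->kvm_pvr = cpu->compat_pvr;
    }
    cpu->compat_pvr = compat_pvr;
}

/* Fixed behaviour: forwards the NEW value, then stores it. */
static void set_compat_fixed(struct cpu_sketch *cpu, uint32_t compat_pvr)
{
    if (cpu->compat_pvr != compat_pvr) {
        cpu->kvm_pvr = compat_pvr;
    }
    cpu->compat_pvr = compat_pvr;
}
```

In the regressed variant the hypervisor always lags one call behind, so starting from 0 the first requested compat mode never reaches it.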

Reported-by: Nageswara R Sastry 
Signed-off-by: Greg Kurz 
Signed-off-by: David Gibson 
(cherry picked from commit e4f0c6bb1a9f72ad9e32c3171d36bae17ea1cd67)
Signed-off-by: Michael Roth 
---
 target/ppc/compat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/ppc/compat.c b/target/ppc/compat.c
index f8729fe46d..ad8f93c064 100644
--- a/target/ppc/compat.c
+++ b/target/ppc/compat.c
@@ -141,7 +141,7 @@ void ppc_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr, 
Error **errp)
 cpu_synchronize_state(CPU(cpu));
 
 if (kvm_enabled() && cpu->compat_pvr != compat_pvr) {
-int ret = kvmppc_set_compat(cpu, cpu->compat_pvr);
+int ret = kvmppc_set_compat(cpu, compat_pvr);
 if (ret < 0) {
 error_setg_errno(errp, -ret,
  "Unable to set CPU compatibility mode in KVM");
-- 
2.11.0

[Qemu-devel] [PATCH 45/55] scripts/make-release: ship u-boot source as a tarball

2017-12-06 Thread Michael Roth

The u-boot sources we ship currently cause problems with unpacking on
a case-insensitive filesystem due to path conflicts. This has been
fixed in upstream u-boot via commit 610eec7f, but since it is not
yet included in an official release we implement this approach as a
temporary workaround.

Once we move to a u-boot containing commit 610eec7f we should revert
this patch.

Cc: qemu-sta...@nongnu.org
Cc: Alexander Graf 
Cc: Richard Henderson 
Cc: Thomas Huth 
Cc: Peter Maydell 
Suggested-by: Richard Henderson 
Signed-off-by: Michael Roth 
Reviewed-by: Thomas Huth 
Message-id: 20171107205201.10207-1-mdr...@linux.vnet.ibm.com
Signed-off-by: Peter Maydell 
(cherry picked from commit d0dead3b6df7f6cd970ed02e8369ab8730aac9d3)
Signed-off-by: Michael Roth 
---
 scripts/make-release | 4 
 1 file changed, 4 insertions(+)

diff --git a/scripts/make-release b/scripts/make-release
index fa6323fda8..3917df7142 100755
--- a/scripts/make-release
+++ b/scripts/make-release
@@ -20,6 +20,10 @@ git checkout "v${version}"
 git submodule update --init
 (cd roms/seabios && git describe --tags --long --dirty > .version)
 rm -rf .git roms/*/.git dtc/.git pixman/.git
+# FIXME: The following line is a workaround for avoiding filename collisions
+# when unpacking u-boot sources on case-insensitive filesystems. Once we
+# update to something with u-boot commit 610eec7f0 we can drop this line.
+tar cfj roms/u-boot.tar.bz2 -C roms u-boot && rm -rf roms/u-boot
 popd
 tar cfj ${destination}.tar.bz2 ${destination}
 rm -rf ${destination}
-- 
2.11.0

[Qemu-devel] [PATCH 03/55] block/mirror: check backing in bdrv_mirror_top_flush

2017-12-06 Thread Michael Roth

From: Vladimir Sementsov-Ogievskiy 

bs->backing may be NULL after a failed bdrv_append() in mirror_start_job(),
which leads to a SIGSEGV.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-id: 20170929152255.5431-1-vsement...@virtuozzo.com
Signed-off-by: Max Reitz 
(cherry picked from commit ce960aa9062a407d0ca15aee3dcd3bd84a4e24f9)
Signed-off-by: Michael Roth 
---
 block/mirror.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/block/mirror.c b/block/mirror.c
index 429751b9fe..03fc6d63b7 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1056,6 +1056,10 @@ static int coroutine_fn 
bdrv_mirror_top_pwritev(BlockDriverState *bs,
 
 static int coroutine_fn bdrv_mirror_top_flush(BlockDriverState *bs)
 {
+if (bs->backing == NULL) {
+/* we can be here after failed bdrv_append in mirror_start_job */
+return 0;
+}
 return bdrv_co_flush(bs->backing->bs);
 }
 
-- 
2.11.0

[Qemu-devel] [PATCH 47/55] virtio-net: don't touch virtqueue if vm is stopped

2017-12-06 Thread Michael Roth

From: Jason Wang 

Guest state should not be touched if the VM is stopped. Unfortunately, we
did not check the running state and tried to drain the tx queue unconditionally
in virtio_net_set_status(). A crash was then noticed on a migration
destination when the user typed quit after the virtqueue state was loaded but
before the region cache was initialized. In this case,
virtio_net_drop_tx_queue_data() tries to access the uninitialized
region cache.

Fix this by only dropping tx queue data when vm is running.

Fixes: 283e2c2adcb80 ("net: virtio-net discards TX data after link down")
Cc: Yuri Benditovich 
Cc: Paolo Bonzini 
Cc: Stefan Hajnoczi 
Cc: Michael S. Tsirkin 
Cc: qemu-sta...@nongnu.org
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Jason Wang 
(cherry picked from commit 70e53e6e4da3db4b2c31981191753a7e974936d0)
Signed-off-by: Michael Roth 
---
 hw/net/virtio-net.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 148071a396..fbc5e1bd73 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -288,7 +288,8 @@ static void virtio_net_set_status(struct VirtIODevice 
*vdev, uint8_t status)
 qemu_bh_cancel(q->tx_bh);
 }
 if ((n->status & VIRTIO_NET_S_LINK_UP) == 0 &&
-(queue_status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+(queue_status & VIRTIO_CONFIG_S_DRIVER_OK) &&
+vdev->vm_running) {
 /* if tx is waiting we are likely have some packets in tx queue
  * and disabled notification */
 q->tx_waiting = 0;
-- 
2.11.0

[Qemu-devel] [PATCH 28/55] hw/sd: fix out-of-bounds check for multi block reads

2017-12-06 Thread Michael Roth

From: Michael Olbrich 

The current code checks if the next block exceeds the size of the card.
This generates an error while reading the last block of the card.
Do the out-of-bounds check when starting to read a new block to fix this.

This issue became visible with increased error checking in Linux 4.13.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Michael Olbrich 
Reviewed-by: Alistair Francis 
Message-id: 20170916091611.10241-1-m.olbr...@pengutronix.de
Signed-off-by: Peter Maydell 
(cherry picked from commit 8573378e62d19e25a2434e23462ec99ef4d065ac)
Signed-off-by: Michael Roth 
---
 hw/sd/sd.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index ba47bff4db..35347a5bbc 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -1797,8 +1797,13 @@ uint8_t sd_read_data(SDState *sd)
 break;
 
 case 18:   /* CMD18:  READ_MULTIPLE_BLOCK */
-if (sd->data_offset == 0)
+if (sd->data_offset == 0) {
+if (sd->data_start + io_len > sd->size) {
+sd->card_status |= ADDRESS_ERROR;
+return 0x00;
+}
 BLK_READ_BLOCK(sd->data_start, io_len);
+}
 ret = sd->data[sd->data_offset ++];
 
 if (sd->data_offset >= io_len) {
@@ -1812,11 +1817,6 @@ uint8_t sd_read_data(SDState *sd)
 break;
 }
 }
-
-if (sd->data_start + io_len > sd->size) {
-sd->card_status |= ADDRESS_ERROR;
-break;
-}
 }
 break;
 
-- 
2.11.0
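
The ordering change in the hw/sd patch above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: struct card, read_byte() and read_blocks() are hypothetical names, not QEMU's SDState or sd_read_data(), and the model tracks addresses only, without any real block I/O. It shows that checking the bounds when a new block starts lets the last block of the card be read without an error, while the following block is still rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified card model; not QEMU's SDState. */
struct card {
    uint64_t size;        /* card capacity in bytes */
    uint64_t data_start;  /* address of the block being read */
    uint32_t data_offset; /* byte position inside that block */
    bool address_error;   /* stands in for the ADDRESS_ERROR status bit */
};

/* Deliver one byte of a multi-block read. Returns false if the block
 * about to be started lies beyond the end of the card. The bounds
 * check runs only at the start of a block, so finishing the last
 * valid block does not raise an error. */
static bool read_byte(struct card *c, uint32_t io_len)
{
    if (c->data_offset == 0 && c->data_start + io_len > c->size) {
        c->address_error = true;
        return false;
    }
    if (++c->data_offset >= io_len) {
        c->data_start += io_len;   /* advance to the next block */
        c->data_offset = 0;
    }
    return true;
}

/* Try to read nblocks blocks; returns the number of bytes delivered. */
static uint64_t read_blocks(struct card *c, uint32_t io_len, unsigned nblocks)
{
    uint64_t delivered = 0;
    uint64_t wanted = (uint64_t)nblocks * io_len;

    while (delivered < wanted && read_byte(c, io_len)) {
        delivered++;
    }
    return delivered;
}
```

With a 1024-byte card and data_start at 512, one 512-byte block is delivered in full with no error flag; only the attempt to start another block sets the flag.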

[Qemu-devel] [PATCH 43/55] hw/ppc: clear pending_events on machine reset

2017-12-06 Thread Michael Roth

From: Daniel Henrique Barboza 

The sPAPR machine isn't clearing up the pending events QTAILQ on
machine reboot. This allows unprocessed hotplug/epow events
to persist in the queue after reset and, when the IRQs are reasserted in
check_exception later on, these will be processed by the OS.

This patch implements a new function called 'spapr_clear_pending_events'
that clears up the pending_events QTAILQ. This helper is then called
inside ppc_spapr_reset to clear up the events queue, preventing
old/deprecated events from persisting after a reset.

Signed-off-by: Daniel Henrique Barboza 
Signed-off-by: David Gibson 
(cherry picked from commit 56258174238eb25df629a53a96e1ac16a32dc7d4)
Signed-off-by: Michael Roth 
---
 hw/ppc/spapr.c |  1 +
 hw/ppc/spapr_events.c  | 11 +++
 include/hw/ppc/spapr.h |  1 +
 3 files changed, 13 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index cc3901a790..954fd1a747 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1416,6 +1416,7 @@ static void ppc_spapr_reset(void)
 }
 
 qemu_devices_reset();
+spapr_clear_pending_events(spapr);
 
 /*
  * We place the device tree and RTAS just below either the top of the RMA,
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index f952b78237..66b8164f30 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -700,6 +700,17 @@ static void event_scan(PowerPCCPU *cpu, sPAPRMachineState 
*spapr,
 rtas_st(rets, 0, RTAS_OUT_NO_ERRORS_FOUND);
 }
 
+void spapr_clear_pending_events(sPAPRMachineState *spapr)
+{
+sPAPREventLogEntry *entry = NULL;
+
+QTAILQ_FOREACH(entry, &spapr->pending_events, next) {
+QTAILQ_REMOVE(&spapr->pending_events, entry, next);
+g_free(entry->extended_log);
+g_free(entry);
+}
+}
+
 void spapr_events_init(sPAPRMachineState *spapr)
 {
 QTAILQ_INIT(&spapr->pending_events);
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 2a303a705c..5d161ec580 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -662,6 +662,7 @@ void spapr_cpu_parse_features(sPAPRMachineState *spapr);
 int spapr_hpt_shift_for_ramsize(uint64_t ramsize);
 void spapr_reallocate_hpt(sPAPRMachineState *spapr, int shift,
   Error **errp);
+void spapr_clear_pending_events(sPAPRMachineState *spapr);
 
 /* CPU and LMB DRC release callbacks. */
 void spapr_core_release(DeviceState *dev);
-- 
2.11.0
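
Draining a queue of heap-allocated events, as the hw/ppc patch above does on reset, can be sketched in plain C. The sketch below assumes a hand-rolled singly linked list instead of QEMU's QTAILQ macros; struct event, struct machine, queue_event() and clear_pending_events() are all hypothetical names. It saves the successor pointer before freeing the current entry, which is the usual precaution when nodes are released during traversal.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an event log entry. */
struct event {
    void *extended_log;   /* separately allocated payload */
    struct event *next;
};

/* Hypothetical stand-in for the machine state holding the queue. */
struct machine {
    struct event *pending;
};

/* Push one event with a payload of log_size bytes; 0 on success. */
static int queue_event(struct machine *m, size_t log_size)
{
    struct event *e = calloc(1, sizeof(*e));

    if (!e) {
        return -1;
    }
    e->extended_log = malloc(log_size);
    if (!e->extended_log) {
        free(e);
        return -1;
    }
    e->next = m->pending;
    m->pending = e;
    return 0;
}

/* Free every pending event and return how many were released. */
static unsigned clear_pending_events(struct machine *m)
{
    unsigned freed = 0;
    struct event *e = m->pending;

    while (e) {
        struct event *next = e->next;  /* read before e is freed */

        free(e->extended_log);
        free(e);
        e = next;
        freed++;
    }
    m->pending = NULL;
    return freed;
}
```

Calling clear_pending_events() a second time is harmless because the head pointer is reset to NULL after the first pass.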

[Qemu-devel] [PATCH 40/55] util/stats64: Fix min/max comparisons

2017-12-06 Thread Michael Roth

From: Max Reitz 

stat64_min_slow() and stat64_max_slow() compare the wrong way.  This
makes iotest 136 fail with clang and -m32.

Signed-off-by: Max Reitz 
Message-Id: <20171114232223.25207-1-mre...@redhat.com>
Signed-off-by: Paolo Bonzini 
(cherry picked from commit 26a5db322be1e424a815d070ddd04442a5e5df50)
Signed-off-by: Michael Roth 
---
 util/stats64.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/util/stats64.c b/util/stats64.c
index 9968fcceac..389c365a9e 100644
--- a/util/stats64.c
+++ b/util/stats64.c
@@ -91,7 +91,7 @@ bool stat64_min_slow(Stat64 *s, uint64_t value)
 low = atomic_read(&s->low);
 
 orig = ((uint64_t)high << 32) | low;
-if (orig < value) {
+if (value < orig) {
 /* We have to set low before high, just like stat64_min reads
  * high before low.  The value may become higher temporarily, but
  * stat64_get does not notice (it takes the lock) and the only ill
@@ -120,7 +120,7 @@ bool stat64_max_slow(Stat64 *s, uint64_t value)
 low = atomic_read(&s->low);
 
 orig = ((uint64_t)high << 32) | low;
-if (orig > value) {
+if (value > orig) {
 /* We have to set low before high, just like stat64_max reads
  * high before low.  The value may become lower temporarily, but
  * stat64_get does not notice (it takes the lock) and the only ill
-- 
2.11.0
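
The direction of the comparison fixed in the util/stats64 patch above is easy to check in isolation. The following is a single-threaded sketch with no atomics and no lock, so it is not the Stat64 implementation; struct split64 and its helpers are hypothetical names. It keeps a 64-bit value as two 32-bit halves and updates it only when the new value is smaller (for min) or larger (for max) than the stored one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit counter stored as two 32-bit halves. */
struct split64 {
    uint32_t high;
    uint32_t low;
};

static uint64_t split64_get(const struct split64 *s)
{
    return ((uint64_t)s->high << 32) | s->low;
}

static void split64_set(struct split64 *s, uint64_t v)
{
    s->high = (uint32_t)(v >> 32);
    s->low = (uint32_t)v;
}

/* Keep the smaller of the stored value and the new value. */
static void split64_min(struct split64 *s, uint64_t value)
{
    uint64_t orig = split64_get(s);

    if (value < orig) {
        split64_set(s, value);
    }
}

/* Keep the larger of the stored value and the new value. */
static void split64_max(struct split64 *s, uint64_t value)
{
    uint64_t orig = split64_get(s);

    if (value > orig) {
        split64_set(s, value);
    }
}
```

With the comparison reversed, split64_min() would overwrite a small stored value with any larger sample, which is the behaviour the patch removes.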

[Qemu-devel] [PATCH 24/55] memory: Share special empty FlatView

2017-12-06 Thread Michael Roth

From: Alexey Kardashevskiy 

This shares a cached empty FlatView among address spaces. The empty
FV is used every time a root MR renders into a FV without memory
sections, which happens when the MR or its children are not enabled or
are zero-sized. The empty_view is not NULL to keep the rest of the memory
API intact; it also has a dispatch tree for the same reason.

On POWER8 with 255 CPUs, 255 virtio-net, 40 PCI bridges guest this halves
the amount of FlatView's in use (557 -> 260) and dispatch tables
(~80 -> ~37).  In an unrelated experiment with 112 non-virtio
devices on x86 ("-M pc"), only 4 FlatViews are alive, and ~2000
are created at startup.

Signed-off-by: Alexey Kardashevskiy 
Message-Id: <20170921085110.25598-16-...@ozlabs.ru>
Signed-off-by: Paolo Bonzini 
(cherry picked from commit 092aa2fc65b7a35121616aad8f39d47b8f921618)
Signed-off-by: Michael Roth 
---
 memory.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/memory.c b/memory.c
index 231bb78fa7..d90853855b 100644
--- a/memory.c
+++ b/memory.c
@@ -317,6 +317,7 @@ static void flatview_unref(FlatView *view)
 {
 if (atomic_fetch_dec(&view->ref) == 1) {
 trace_flatview_destroy_rcu(view, view->root);
+assert(view->root);
 call_rcu(view, flatview_destroy, rcu);
 }
 }
@@ -760,16 +761,19 @@ static MemoryRegion 
*memory_region_get_flatview_root(MemoryRegion *mr)
 }
 }
 }
+if (found == 0) {
+return NULL;
+}
 if (next) {
 mr = next;
 continue;
 }
 }
 
-break;
+return mr;
 }
 
-return mr;
+return NULL;
 }
 
 /* Render a memory topology into a list of disjoint absolute ranges. */
@@ -965,12 +969,22 @@ static void 
address_space_update_topology_pass(AddressSpace *as,
 
 static void flatviews_init(void)
 {
+static FlatView *empty_view;
+
 if (flat_views) {
 return;
 }
 
 flat_views = g_hash_table_new_full(g_direct_hash, g_direct_equal, NULL,
(GDestroyNotify) flatview_unref);
+if (!empty_view) {
+empty_view = generate_memory_topology(NULL);
+/* We keep it alive forever in the global variable.  */
+flatview_ref(empty_view);
+} else {
+g_hash_table_replace(flat_views, NULL, empty_view);
+flatview_ref(empty_view);
+}
 }
 
 static void flatviews_reset(void)
-- 
2.11.0

[Qemu-devel] [PATCH 44/55] spapr: reset DRCs after devices

2017-12-06 Thread Michael Roth

From: Greg Kurz 

A DRC with a pending unplug request releases its associated device at
machine reset time.

In the case of LMB, when all DRCs for a DIMM device have been reset,
the DIMM gets unplugged, causing guest memory to disappear. This may
be very confusing for anything still using this memory.

This is exactly what happens with vhost backends, and QEMU aborts
with:

qemu-system-ppc64: used ring relocated for ring 2
qemu-system-ppc64: qemu/hw/virtio/vhost.c:649: vhost_commit: Assertion
 `r >= 0' failed.

The issue is that each DRC registers a QEMU reset handler, and we
don't control the order in which these handlers are called (ie,
a LMB DRC will unplug a DIMM before the virtio device using the
memory on this DIMM could stop its vhost backend).

To avoid such situations, let's reset DRCs after all devices
have been reset.

Reported-by: Mallesh N. Koti 
Signed-off-by: Greg Kurz 
Reviewed-by: Daniel Henrique Barboza 
Reviewed-by: Michael Roth 
Signed-off-by: David Gibson 
(cherry picked from commit 82512483940c756e2db1bd67ea91b02bc29c5e01)
Signed-off-by: Michael Roth 
---
 hw/ppc/spapr.c | 21 +
 hw/ppc/spapr_drc.c |  7 ---
 2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 954fd1a747..8630281d0e 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1393,6 +1393,19 @@ static void find_unknown_sysbus_device(SysBusDevice 
*sbdev, void *opaque)
 }
 }
 
+static int spapr_reset_drcs(Object *child, void *opaque)
+{
+sPAPRDRConnector *drc =
+(sPAPRDRConnector *) object_dynamic_cast(child,
+ TYPE_SPAPR_DR_CONNECTOR);
+
+if (drc) {
+spapr_drc_reset(drc);
+}
+
+return 0;
+}
+
 static void ppc_spapr_reset(void)
 {
 MachineState *machine = MACHINE(qdev_get_machine());
@@ -1416,6 +1429,14 @@ static void ppc_spapr_reset(void)
 }
 
 qemu_devices_reset();
+
+/* DRC reset may cause a device to be unplugged. This will cause troubles
+ * if this device is used by another device (eg, a running vhost backend
+ * will crash QEMU if the DIMM holding the vring goes away). To avoid such
+ * situations, we reset DRCs after all devices have been reset.
+ */
+object_child_foreach_recursive(object_get_root(), spapr_reset_drcs, NULL);
+
 spapr_clear_pending_events(spapr);
 
 /*
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 50df361187..85f4e7d324 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -455,11 +455,6 @@ void spapr_drc_reset(sPAPRDRConnector *drc)
 }
 }
 
-static void drc_reset(void *opaque)
-{
-spapr_drc_reset(SPAPR_DR_CONNECTOR(opaque));
-}
-
 bool spapr_drc_needed(void *opaque)
 {
 sPAPRDRConnector *drc = (sPAPRDRConnector *)opaque;
@@ -518,7 +513,6 @@ static void realize(DeviceState *d, Error **errp)
 }
 vmstate_register(DEVICE(drc), spapr_drc_index(drc), _spapr_drc,
  drc);
-qemu_register_reset(drc_reset, drc);
 trace_spapr_drc_realize_complete(spapr_drc_index(drc));
 }
 
@@ -529,7 +523,6 @@ static void unrealize(DeviceState *d, Error **errp)
 char name[256];
 
 trace_spapr_drc_unrealize(spapr_drc_index(drc));
-qemu_unregister_reset(drc_reset, drc);
 vmstate_unregister(DEVICE(drc), _spapr_drc, drc);
 root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
 snprintf(name, sizeof(name), "%x", spapr_drc_index(drc));
-- 
2.11.0

[Qemu-devel] [PATCH 39/55] nbd/client: Use error_prepend() correctly

2017-12-06 Thread Michael Roth

From: Eric Blake 

When using error_prepend(), it is necessary to end with a space
in the format string; otherwise, messages come out incorrectly,
such as when connecting to a socket that hangs up immediately:

can't open device nbd://localhost:10809/: Failed to read dataUnexpected 
end-of-file before all bytes were read

Originally botched in commit e44ed99d, then several more instances
added in the meantime.

Pre-existing and not fixed here: we are inconsistent on capitalization;
some of our messages start with lower case, and others start with upper,
although the use of error_prepend() is much nicer to read when all
fragments consistently start with lower.

CC: qemu-sta...@nongnu.org
Signed-off-by: Eric Blake 
Message-Id: <20171113152424.25381-1-ebl...@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Markus Armbruster 
(cherry picked from commit cb6b1a3fc30c52ffd94ed0b69ca5991b19651724)
Signed-off-by: Michael Roth 
---
 nbd/client.c | 50 ++
 1 file changed, 26 insertions(+), 24 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index 0a17de80b5..4caff77119 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -111,12 +111,12 @@ static int nbd_send_option_request(QIOChannel *ioc, 
uint32_t opt,
 stl_be_p(, len);
 
 if (nbd_write(ioc, , sizeof(req), errp) < 0) {
-error_prepend(errp, "Failed to send option request header");
+error_prepend(errp, "Failed to send option request header: ");
 return -1;
 }
 
 if (len && nbd_write(ioc, (char *) data, len, errp) < 0) {
-error_prepend(errp, "Failed to send option request data");
+error_prepend(errp, "Failed to send option request data: ");
 return -1;
 }
 
@@ -145,7 +145,7 @@ static int nbd_receive_option_reply(QIOChannel *ioc, 
uint32_t opt,
 {
 QEMU_BUILD_BUG_ON(sizeof(*reply) != 20);
 if (nbd_read(ioc, reply, sizeof(*reply), errp) < 0) {
-error_prepend(errp, "failed to read option reply");
+error_prepend(errp, "failed to read option reply: ");
 nbd_send_opt_abort(ioc);
 return -1;
 }
@@ -198,7 +198,7 @@ static int nbd_handle_reply_err(QIOChannel *ioc, 
nbd_opt_reply *reply,
 msg = g_malloc(reply->length + 1);
 if (nbd_read(ioc, msg, reply->length, errp) < 0) {
 error_prepend(errp, "failed to read option error 0x%" PRIx32
-  " (%s) message",
+  " (%s) message: ",
   reply->type, nbd_rep_lookup(reply->type));
 goto cleanup;
 }
@@ -309,7 +309,7 @@ static int nbd_receive_list(QIOChannel *ioc, const char 
*want, bool *match,
 return -1;
 }
 if (nbd_read(ioc, , sizeof(namelen), errp) < 0) {
-error_prepend(errp, "failed to read option name length");
+error_prepend(errp, "failed to read option name length: ");
 nbd_send_opt_abort(ioc);
 return -1;
 }
@@ -322,7 +322,8 @@ static int nbd_receive_list(QIOChannel *ioc, const char 
*want, bool *match,
 }
 if (namelen != strlen(want)) {
 if (nbd_drop(ioc, len, errp) < 0) {
-error_prepend(errp, "failed to skip export name with wrong 
length");
+error_prepend(errp,
+  "failed to skip export name with wrong length: ");
 nbd_send_opt_abort(ioc);
 return -1;
 }
@@ -331,14 +332,14 @@ static int nbd_receive_list(QIOChannel *ioc, const char 
*want, bool *match,
 
 assert(namelen < sizeof(name));
 if (nbd_read(ioc, name, namelen, errp) < 0) {
-error_prepend(errp, "failed to read export name");
+error_prepend(errp, "failed to read export name: ");
 nbd_send_opt_abort(ioc);
 return -1;
 }
 name[namelen] = '\0';
 len -= namelen;
 if (nbd_drop(ioc, len, errp) < 0) {
-error_prepend(errp, "failed to read export description");
+error_prepend(errp, "failed to read export description: ");
 nbd_send_opt_abort(ioc);
 return -1;
 }
@@ -424,7 +425,7 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
 return -1;
 }
 if (nbd_read(ioc, , sizeof(type), errp) < 0) {
-error_prepend(errp, "failed to read info type");
+error_prepend(errp, "failed to read info type: ");
 nbd_send_opt_abort(ioc);
 return -1;
 }
@@ -439,13 +440,13 @@ static int nbd_opt_go(QIOChannel *ioc, const char 
*wantname,
 return -1;
 }
 if (nbd_read(ioc, >size, sizeof(info->size), errp) < 0) {
-error_prepend(errp, "failed to read info size");
+error_prepend(errp, "failed to read info size: ");
 nbd_send_opt_abort(ioc);
 return -1;
 }
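
The effect of the missing separator fixed by the nbd/client patch above can be reproduced with ordinary string handling. This is a sketch only: prepend() is a hypothetical helper built on snprintf(), not QEMU's error_prepend(), and the 256-byte scratch buffer is an arbitrary choice that silently truncates longer messages.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrite msg as prefix followed by the old msg.
 * size is the capacity of msg; results longer than the scratch buffer
 * or than msg are truncated. */
static void prepend(char *msg, size_t size, const char *prefix)
{
    char tmp[256];

    snprintf(tmp, sizeof(tmp), "%s%s", prefix, msg);
    snprintf(msg, size, "%s", tmp);
}
```

A prefix ending in ": " yields "Failed to read data: Unexpected end-of-file", while a prefix without it runs the two fragments together as "Failed to read dataUnexpected end-of-file", matching the symptom quoted in the commit message.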

[Qemu-devel] [PATCH 33/55] io: monitor encoutput buffer size from websocket GSource

2017-12-06 Thread Michael Roth

From: "Daniel P. Berrange" 

The websocket GSource is monitoring the size of the rawoutput
buffer to determine if the channel can accept more writes.
The rawoutput buffer, however, is merely a temporary staging
buffer before data is copied into the encoutput buffer. Thus
its size will always be zero when the GSource runs.

This flaw causes the encoutput buffer to grow without bound
if the other end of the underlying data channel doesn't
read data being sent. This can be seen with VNC if a client
is on a slow WAN link and the guest OS is sending many screen
updates. A malicious VNC client can act like it is on a slow
link by playing a video in the guest and then reading data
very slowly, causing QEMU host memory to expand arbitrarily.

This issue is assigned CVE-2017-15268, publicly reported in

  https://bugs.launchpad.net/qemu/+bug/1718964

(cherry picked from commit a7b20a8efa28e5f22c26c06cd06c2f12bc863493)

Reviewed-by: Eric Blake 

[Dan: Added extra checks to deal with code refactored in master but
 not stable 2.10]

Signed-off-by: Daniel P. Berrange 
Signed-off-by: Michael Roth 
---
 io/channel-websock.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/io/channel-websock.c b/io/channel-websock.c
index 5a3badbec2..19116dc148 100644
--- a/io/channel-websock.c
+++ b/io/channel-websock.c
@@ -26,7 +26,7 @@
 #include "trace.h"
 
 
-/* Max amount to allow in rawinput/rawoutput buffers */
+/* Max amount to allow in rawinput/encoutput buffers */
 #define QIO_CHANNEL_WEBSOCK_MAX_BUFFER 8192
 
 #define QIO_CHANNEL_WEBSOCK_CLIENT_KEY_LEN 24
@@ -1006,7 +1006,7 @@ qio_channel_websock_source_prepare(GSource *source,
 if (wsource->wioc->rawinput.offset) {
 cond |= G_IO_IN;
 }
-if (wsource->wioc->rawoutput.offset < QIO_CHANNEL_WEBSOCK_MAX_BUFFER) {
+if (wsource->wioc->encoutput.offset < QIO_CHANNEL_WEBSOCK_MAX_BUFFER) {
 cond |= G_IO_OUT;
 }
 
@@ -1022,7 +1022,7 @@ qio_channel_websock_source_check(GSource *source)
 if (wsource->wioc->rawinput.offset) {
 cond |= G_IO_IN;
 }
-if (wsource->wioc->rawoutput.offset < QIO_CHANNEL_WEBSOCK_MAX_BUFFER) {
+if (wsource->wioc->encoutput.offset < QIO_CHANNEL_WEBSOCK_MAX_BUFFER) {
 cond |= G_IO_OUT;
 }
 
@@ -1041,7 +1041,7 @@ qio_channel_websock_source_dispatch(GSource *source,
 if (wsource->wioc->rawinput.offset) {
 cond |= G_IO_IN;
 }
-if (wsource->wioc->rawoutput.offset < QIO_CHANNEL_WEBSOCK_MAX_BUFFER) {
+if (wsource->wioc->encoutput.offset < QIO_CHANNEL_WEBSOCK_MAX_BUFFER) {
 cond |= G_IO_OUT;
 }
 
-- 
2.11.0
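
Why checking the staging buffer never throttles the writer, as described in the io patch above, can be shown with a toy model. The sketch assumes that every write moves the staged bytes straight into the encoded buffer and that the peer never reads; struct chan, chan_write(), writable_old(), writable_new() and writes_accepted() are hypothetical names, and no websocket framing overhead is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BUFFER 8192

/* Hypothetical channel with a staging buffer and an encoded buffer. */
struct chan {
    size_t rawoutput;  /* staged bytes, emptied on every write */
    size_t encoutput;  /* encoded bytes waiting for the peer to read */
};

/* Stage len bytes and immediately move them to the encoded buffer. */
static void chan_write(struct chan *c, size_t len)
{
    c->rawoutput += len;
    c->encoutput += c->rawoutput;
    c->rawoutput = 0;
}

/* Check on the staging buffer: always true between writes. */
static bool writable_old(const struct chan *c)
{
    return c->rawoutput < MAX_BUFFER;
}

/* Check on the encoded buffer: false once the backlog is too large. */
static bool writable_new(const struct chan *c)
{
    return c->encoutput < MAX_BUFFER;
}

/* Count how many 4096-byte writes are accepted, up to limit, when the
 * peer never drains the encoded buffer. */
static unsigned writes_accepted(bool (*writable)(const struct chan *),
                                unsigned limit)
{
    struct chan c = { 0, 0 };
    unsigned n = 0;

    while (n < limit && writable(&c)) {
        chan_write(&c, 4096);
        n++;
    }
    return n;
}
```

In this model the encoded-buffer check stops the writer after two writes, whereas the staging-buffer check accepts writes until the caller's limit is reached.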

[Qemu-devel] [PATCH 26/55] exec: simplify address_space_get_iotlb_entry

2017-12-06 Thread Michael Roth

From: Peter Xu 

This patch lets address_space_get_iotlb_entry() use the newly
introduced page_mask parameter in flatview_do_translate(). We can
then be sure the IOTLB entry is aligned to the page mask, and huge pages
should now be supported properly, which was not the case with a764040.

Fixes: a764040 ("exec: abstract address_space_do_translate()")
Signed-off-by: Peter Xu 
Signed-off-by: Maxime Coquelin 
Acked-by: Michael S. Tsirkin 
Message-Id: <20171010094247.10173-3-maxime.coque...@redhat.com>
Signed-off-by: Paolo Bonzini 
(cherry picked from commit 076a93d7972c9c1e3839d2f65edc32568a2cce93)
Signed-off-by: Michael Roth 
---
 exec.c | 31 ++-
 1 file changed, 10 insertions(+), 21 deletions(-)

diff --git a/exec.c b/exec.c
index 2fd65dc3f2..9a7600eb17 100644
--- a/exec.c
+++ b/exec.c
@@ -557,14 +557,14 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace 
*as, hwaddr addr,
 bool is_write)
 {
 MemoryRegionSection section;
-hwaddr xlat, plen;
+hwaddr xlat, page_mask;
 
-/* Try to get maximum page mask during translation. */
-plen = (hwaddr)-1;
-
-/* This can never be MMIO. */
-section = flatview_do_translate(address_space_to_flatview(as), addr,
-, , NULL, is_write, false, );
+/*
+ * This can never be MMIO, and we don't really care about plen,
+ * but page mask.
+ */
+section = flatview_do_translate(address_space_to_flatview(as), addr, ,
+NULL, _mask, is_write, false, );
 
 /* Illegal translation */
 if (section.mr == _mem_unassigned) {
@@ -575,22 +575,11 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace 
*as, hwaddr addr,
 xlat += section.offset_within_address_space -
 section.offset_within_region;
 
-if (plen == (hwaddr)-1) {
-/*
- * We use default page size here. Logically it only happens
- * for identity mappings.
- */
-plen = TARGET_PAGE_SIZE;
-}
-
-/* Convert to address mask */
-plen -= 1;
-
 return (IOMMUTLBEntry) {
 .target_as = as,
-.iova = addr & ~plen,
-.translated_addr = xlat & ~plen,
-.addr_mask = plen,
+.iova = addr & ~page_mask,
+.translated_addr = xlat & ~page_mask,
+.addr_mask = page_mask,
 /* IOTLBs are for DMAs, and DMA only allows on RAMs. */
 .perm = IOMMU_RW,
 };
-- 
2.11.0
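
The masking arithmetic used for the returned entry in the exec.c patch above can be worked through with concrete numbers. This sketch assumes, as the `.addr_mask = page_mask` assignment suggests, that page_mask has the low offset bits set (0xfff for 4 KiB pages); struct tlb_entry and make_entry() are hypothetical names, not QEMU's IOMMUTLBEntry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced form of an IOTLB entry. */
struct tlb_entry {
    uint64_t iova;
    uint64_t translated_addr;
    uint64_t addr_mask;
};

/* Align both addresses down to the page described by page_mask.
 * page_mask must be one less than a power of two, e.g. 0xfff. */
static struct tlb_entry make_entry(uint64_t addr, uint64_t xlat,
                                   uint64_t page_mask)
{
    struct tlb_entry e;

    assert(((page_mask + 1) & page_mask) == 0);
    e.iova = addr & ~page_mask;
    e.translated_addr = xlat & ~page_mask;
    e.addr_mask = page_mask;
    return e;
}
```

For addr 0x12345678 a 4 KiB mask gives iova 0x12345000, while a 2 MiB mask (0x1fffff) gives 0x12200000, so a single entry can cover a whole huge page.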

[Qemu-devel] [PATCH 35/55] translate.c: Fix usermode big-endian AArch32 LDREXD and STREXD

2017-12-06 Thread Michael Roth

From: Peter Maydell 

For AArch32 LDREXD and STREXD, architecturally the 32-bit word at the
lowest address is always Rt and the one at addr+4 is Rt2, even if the
CPU is big-endian. Our implementation does these with a single
64-bit store, so if we're big-endian then we need to put the two
32-bit halves together in the opposite order to little-endian,
so that they end up in the right places. We were trying to do
this with the gen_aa32_frob64() function, but that is not correct
for the usermode emulator, because in that case there is a distinction
between "load a 64 bit value" (which does a BE 64-bit access
and doesn't need swapping) and "load two 32 bit values as one
64 bit access" (where we still need to do the swapping, like
system mode BE32).

Fixes: https://bugs.launchpad.net/qemu/+bug/1725267
Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-id: 1509622400-13351-1-git-send-email-peter.mayd...@linaro.org
(cherry picked from commit 3448d47b3172015006b79197eb5a69826c6a7b6d)
Signed-off-by: Michael Roth 
---
 target/arm/translate.c | 39 ++-
 1 file changed, 34 insertions(+), 5 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index d1a5f56998..ad758d333d 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7858,9 +7858,27 @@ static void gen_load_exclusive(DisasContext *s, int rt, 
int rt2,
 TCGv_i32 tmp2 = tcg_temp_new_i32();
 TCGv_i64 t64 = tcg_temp_new_i64();
 
-gen_aa32_ld_i64(s, t64, addr, get_mem_index(s), opc);
+/* For AArch32, architecturally the 32-bit word at the lowest
+ * address is always Rt and the one at addr+4 is Rt2, even if
+ * the CPU is big-endian. That means we don't want to do a
+ * gen_aa32_ld_i64(), which invokes gen_aa32_frob64() as if
+ * for an architecturally 64-bit access, but instead do a
+ * 64-bit access using MO_BE if appropriate and then split
+ * the two halves.
+ * This only makes a difference for BE32 user-mode, where
+ * frob64() must not flip the two halves of the 64-bit data
+ * but this code must treat BE32 user-mode like BE32 system.
+ */
+TCGv taddr = gen_aa32_addr(s, addr, opc);
+
+tcg_gen_qemu_ld_i64(t64, taddr, get_mem_index(s), opc);
+tcg_temp_free(taddr);
 tcg_gen_mov_i64(cpu_exclusive_val, t64);
-tcg_gen_extr_i64_i32(tmp, tmp2, t64);
+if (s->be_data == MO_BE) {
+tcg_gen_extr_i64_i32(tmp2, tmp, t64);
+} else {
+tcg_gen_extr_i64_i32(tmp, tmp2, t64);
+}
 tcg_temp_free_i64(t64);
 
 store_reg(s, rt2, tmp2);
@@ -7909,15 +7927,26 @@ static void gen_store_exclusive(DisasContext *s, int 
rd, int rt, int rt2,
 TCGv_i64 n64 = tcg_temp_new_i64();
 
 t2 = load_reg(s, rt2);
-tcg_gen_concat_i32_i64(n64, t1, t2);
+/* For AArch32, architecturally the 32-bit word at the lowest
+ * address is always Rt and the one at addr+4 is Rt2, even if
+ * the CPU is big-endian. Since we're going to treat this as a
+ * single 64-bit BE store, we need to put the two halves in the
+ * opposite order for BE to LE, so that they end up in the right
+ * places.
+ * We don't want gen_aa32_frob64() because that does the wrong
+ * thing for BE32 usermode.
+ */
+if (s->be_data == MO_BE) {
+tcg_gen_concat_i32_i64(n64, t2, t1);
+} else {
+tcg_gen_concat_i32_i64(n64, t1, t2);
+}
 tcg_temp_free_i32(t2);
-gen_aa32_frob64(s, n64);
 
 tcg_gen_atomic_cmpxchg_i64(o64, taddr, cpu_exclusive_val, n64,
get_mem_index(s), opc);
 tcg_temp_free_i64(n64);
 
-gen_aa32_frob64(s, o64);
 tcg_gen_setcond_i64(TCG_COND_NE, o64, o64, cpu_exclusive_val);
 tcg_gen_extrl_i64_i32(t0, o64);
 
-- 
2.11.0
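
The register assignment rule from the translate.c patch above, where the word at the lowest address always goes to Rt, can be checked on plain byte arrays. This is a sketch under stated assumptions: load64() and load_pair() are hypothetical helpers that model one 64-bit access followed by a split, and they say nothing about how TCG generates the code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assemble 8 bytes into a 64-bit value in the given byte order. */
static uint64_t load64(const uint8_t *p, bool big_endian)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        int shift = big_endian ? 8 * (7 - i) : 8 * i;

        v |= (uint64_t)p[i] << shift;
    }
    return v;
}

/* Do one 64-bit access, then split it so that rt receives the word
 * stored at the lowest address and rt2 the word at address + 4. For a
 * big-endian access that word is the high half of the 64-bit value;
 * for a little-endian access it is the low half. */
static void load_pair(const uint8_t *p, bool big_endian,
                      uint32_t *rt, uint32_t *rt2)
{
    uint64_t v = load64(p, big_endian);
    uint32_t lo = (uint32_t)v;
    uint32_t hi = (uint32_t)(v >> 32);

    if (big_endian) {
        *rt = hi;
        *rt2 = lo;
    } else {
        *rt = lo;
        *rt2 = hi;
    }
}
```

In both byte orders rt ends up holding the first word in memory, which mirrors the swap of the two output operands in the big-endian branch of the patch.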

[Qemu-devel] [PATCH 36/55] hw/intc/arm_gicv3_its: Don't abort on table save failure

2017-12-06 Thread Michael Roth

From: Eric Auger 

The ITS is not fully properly reset at the moment. Caches are
not emptied.

After a reset, in case we attempt to save the state before
the bound devices have registered their MSIs and after the
1st level table has been allocated by the ITS driver
(device BASER is valid), the first level entries are still
invalid. If the device cache is not empty (devices registered
before the reset), vgic_its_save_device_tables fails with -EINVAL.
This causes a QEMU abort().

Cc: qemu-sta...@nongnu.org
Signed-off-by: Eric Auger 
Reported-by: wanghaibin 
Reviewed-by: Peter Maydell 
Signed-off-by: Peter Maydell 
(cherry picked from commit 8a7348b5d62d7ea16807e6bea54b448a0184bb0f)
Signed-off-by: Michael Roth 
---
 hw/intc/arm_gicv3_its_kvm.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/hw/intc/arm_gicv3_its_kvm.c b/hw/intc/arm_gicv3_its_kvm.c
index 1f8991b8a6..1cc58c2da3 100644
--- a/hw/intc/arm_gicv3_its_kvm.c
+++ b/hw/intc/arm_gicv3_its_kvm.c
@@ -64,20 +64,16 @@ static void vm_change_state_handler(void *opaque, int 
running,
 {
 GICv3ITSState *s = (GICv3ITSState *)opaque;
 Error *err = NULL;
-int ret;
 
 if (running) {
 return;
 }
 
-ret = kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
-KVM_DEV_ARM_ITS_SAVE_TABLES, NULL, true, &err);
+kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
+  KVM_DEV_ARM_ITS_SAVE_TABLES, NULL, true, &err);
 if (err) {
 error_report_err(err);
 }
-if (ret < 0 && ret != -EFAULT) {
-abort();
-}
 }
 
 static void kvm_arm_its_realize(DeviceState *dev, Error **errp)
-- 
2.11.0

[Qemu-devel] [PATCH 30/55] qcow2: Always execute preallocate() in a coroutine

2017-12-06 Thread Michael Roth

From: Max Reitz 

Some qcow2 functions (at least perform_cow()) expect s->lock to be
taken.  Therefore, if we want to make use of them, we should execute
preallocate() (as "preallocate_co") in a coroutine so that we can use
the qemu_co_mutex_* functions.

Signed-off-by: Max Reitz 
Message-id: 20171009215533.12530-3-mre...@redhat.com
Cc: qemu-sta...@nongnu.org
Reviewed-by: Eric Blake 
Reviewed-by: Jeff Cody 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Max Reitz 
(cherry picked from commit 572b07bea1d1a0f7726fd95c2613c76002a379bc)
Signed-off-by: Michael Roth 
---
 block/qcow2.c | 41 ++---
 1 file changed, 34 insertions(+), 7 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 10e38074ad..668665ea8d 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2476,6 +2476,14 @@ static int qcow2_set_up_encryption(BlockDriverState *bs, 
const char *encryptfmt,
 }
 
 
+typedef struct PreallocCo {
+BlockDriverState *bs;
+uint64_t offset;
+uint64_t new_length;
+
+int ret;
+} PreallocCo;
+
 /**
  * Preallocates metadata structures for data clusters between @offset (in the
  * guest disk) and @new_length (which is thus generally the new guest disk
@@ -2483,9 +2491,12 @@ static int qcow2_set_up_encryption(BlockDriverState *bs, 
const char *encryptfmt,
  *
  * Returns: 0 on success, -errno on failure.
  */
-static int preallocate(BlockDriverState *bs,
-   uint64_t offset, uint64_t new_length)
+static void coroutine_fn preallocate_co(void *opaque)
 {
+PreallocCo *params = opaque;
+BlockDriverState *bs = params->bs;
+uint64_t offset = params->offset;
+uint64_t new_length = params->new_length;
 BDRVQcow2State *s = bs->opaque;
 uint64_t bytes;
 uint64_t host_offset = 0;
@@ -2493,9 +2504,7 @@ static int preallocate(BlockDriverState *bs,
 int ret;
 QCowL2Meta *meta;
 
-if (qemu_in_coroutine()) {
-qemu_co_mutex_lock(&s->lock);
-}
+qemu_co_mutex_lock(&s->lock);
 
 assert(offset <= new_length);
 bytes = new_length - offset;
@@ -2549,10 +2558,28 @@ static int preallocate(BlockDriverState *bs,
 ret = 0;
 
 done:
+qemu_co_mutex_unlock(&s->lock);
+params->ret = ret;
+}
+
+static int preallocate(BlockDriverState *bs,
+   uint64_t offset, uint64_t new_length)
+{
+PreallocCo params = {
+.bs = bs,
+.offset = offset,
+.new_length = new_length,
+.ret= -EINPROGRESS,
+};
+
 if (qemu_in_coroutine()) {
-qemu_co_mutex_unlock(&s->lock);
+preallocate_co(&params);
+} else {
+Coroutine *co = qemu_coroutine_create(preallocate_co, &params);
+bdrv_coroutine_enter(bs, co);
+BDRV_POLL_WHILE(bs, params.ret == -EINPROGRESS);
 }
-return ret;
+return params.ret;
 }
 
 /* qcow2_refcount_metadata_size:
-- 
2.11.0

[Qemu-devel] [PATCH 18/55] memory: Share FlatView's and dispatch trees between address spaces

2017-12-06 Thread Michael Roth

From: Alexey Kardashevskiy 

This allows sharing flat views between address spaces (AS) when
the same root memory region is used when creating a new address space.
This is done by walking through all ASes and caching one FlatView per
physical root MR (i.e. not aliased).

This removes the search for duplicates from address_space_init_shareable() as
FlatViews are shared elsewhere and keeping as::ref_count correct seems
an unnecessary and useless complication.

This should cause no change in memory use or boot time yet.

Signed-off-by: Alexey Kardashevskiy 
Message-Id: <20170921085110.25598-13-...@ozlabs.ru>
Signed-off-by: Paolo Bonzini 
(cherry picked from commit 967dc9b1194a9281124b2e1ce67b6c3359a2138f)
Signed-off-by: Michael Roth 
---
 memory.c | 56 +---
 1 file changed, 45 insertions(+), 11 deletions(-)

diff --git a/memory.c b/memory.c
index 1f58d29755..f0c864206a 100644
--- a/memory.c
+++ b/memory.c
@@ -47,6 +47,8 @@ static QTAILQ_HEAD(memory_listeners, MemoryListener) 
memory_listeners
 static QTAILQ_HEAD(, AddressSpace) address_spaces
 = QTAILQ_HEAD_INITIALIZER(address_spaces);
 
+static GHashTable *flat_views;
+
 typedef struct AddrRange AddrRange;
 
 /*
@@ -760,6 +762,7 @@ static FlatView *generate_memory_topology(MemoryRegion *mr)
 flatview_add_to_dispatch(view, );
 }
 address_space_dispatch_compact(view->dispatch);
+g_hash_table_replace(flat_views, mr, view);
 
 return view;
 }
@@ -929,11 +932,47 @@ static void 
address_space_update_topology_pass(AddressSpace *as,
 }
 }
 
-static void address_space_update_topology(AddressSpace *as)
+static void flatviews_init(void)
+{
+if (flat_views) {
+return;
+}
+
+flat_views = g_hash_table_new_full(g_direct_hash, g_direct_equal, NULL,
+   (GDestroyNotify) flatview_unref);
+}
+
+static void flatviews_reset(void)
+{
+AddressSpace *as;
+
+if (flat_views) {
+g_hash_table_unref(flat_views);
+flat_views = NULL;
+}
+flatviews_init();
+
+/* Render unique FVs */
+QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
+MemoryRegion *physmr = memory_region_get_flatview_root(as->root);
+
+if (g_hash_table_lookup(flat_views, physmr)) {
+continue;
+}
+
+generate_memory_topology(physmr);
+}
+}
+
+static void address_space_set_flatview(AddressSpace *as)
 {
 FlatView *old_view = address_space_get_flatview(as);
-MemoryRegion *physmr = memory_region_get_flatview_root(old_view->root);
-FlatView *new_view = generate_memory_topology(physmr);
+MemoryRegion *physmr = memory_region_get_flatview_root(as->root);
+FlatView *new_view = g_hash_table_lookup(flat_views, physmr);
+
+assert(new_view);
+
+flatview_ref(new_view);
 
 if (!QTAILQ_EMPTY(&as->listeners)) {
 address_space_update_topology_pass(as, old_view, new_view, false);
@@ -969,10 +1008,12 @@ void memory_region_transaction_commit(void)
 --memory_region_transaction_depth;
 if (!memory_region_transaction_depth) {
 if (memory_region_update_pending) {
+flatviews_reset();
+
 MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);
 
 QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
-address_space_update_topology(as);
+address_space_set_flatview(as);
 address_space_update_ioeventfds(as);
 }
 memory_region_update_pending = false;
@@ -2695,13 +2736,6 @@ AddressSpace *address_space_init_shareable(MemoryRegion 
*root, const char *name)
 {
 AddressSpace *as;
 
-QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
-if (root == as->root && as->malloced) {
-as->ref_count++;
-return as;
-}
-}
-
 as = g_malloc0(sizeof *as);
 address_space_init(as, root, name);
 as->malloced = true;
-- 
2.11.0
