Re: [Qemu-devel] [PATCH] virtio: Notice when the system doesn't support MSIx at all

2015-10-26 Thread Michael S. Tsirkin
On Mon, Oct 26, 2015 at 09:44:47AM +, Mark Cave-Ayland wrote:
> On 27/09/15 15:40, Peter Maydell wrote:
> 
> > Ping^2 -- this is still confusing users, cf bug LP:1500175.
> > 
> > thanks
> > -- PMM
> > 
> > On 6 July 2015 at 11:38, Peter Maydell  wrote:
> >> On 3 July 2015 at 09:22, Mark Cave-Ayland  
> >> wrote:
> >>> On 19/05/15 21:29, Richard Henderson wrote:
> >>>
>  And do not issue an error_report in that case.
> 
>  Signed-off-by: Richard Henderson 
>  ---
>   hw/virtio/virtio-pci.c | 15 ++-
>   1 file changed, 10 insertions(+), 5 deletions(-)
>  --
> 
>  This seems to be about the only sane way to test the value of
>  msi_supported from here.  Thoughts?
> >>
> >>> Ping? Any chance of getting this applied for 2.4 since it's something
> >>> that I do get emails about for SPARC64?
> >>
> >> Michael would like a respin with the commit message correction
> >> he suggested. RTH -- could you send out a fixed up version, please?

I don't think there was a reply to this, and by now the patch doesn't apply.

> >>
> >> thanks
> >> -- PMM
> 
> Another ping on this patch - please can we have it for 2.5? It has been
> around since before 2.4 and I'd hate for it to miss another release :(
> 
> 
> ATB,
> 
> Mark.

Pls address the issues if you want this applied.

-- 
MST



Re: [Qemu-devel] [PATCH v3 4/5] Enable fw_cfg DMA interface for ARM

2015-10-26 Thread Stefan Hajnoczi
On Thu, Oct 22, 2015 at 05:22:16PM -0400, Gabriel L. Somlo wrote:
> I was re-reading the documentation for fw_cfg_add_file_callback(),
> and noticed that non-dma read operations check for the presence
> of a callback (and call it if present) for *every* *single* *byte*,
> even on 64-bit MMIO reads. That's also what the documentation says
> (in docs/specs/fw_cfg.txt, being moved into fw_cfg.h as per
>  http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05315.html).
> 
> During DMA reads, however, the callback is only checked once before
> each chunk, effectively once per DMA read operation.
> 
> Now, typical callbacks I found throughout the qemu source tend to return
> immediately except for the first time they're invoked, but I wonder if
> skipping over all those extra "do I have a callback, if so call it,
> mostly so it can return without doing anything" per-byte operations
> account in some significant part for the dramatically faster transfers?
> 
> Not sure how I'd test for that -- besides my not having anything
> resembling a viable ARM setup, I'm not sure if limiting the callbacks
> to only be invoked if (s->cur_offset == 0) would make sense, just as a
> test ?

I think Marc came to the conclusion that it's safe and therefore made
that optimization for DMA.

The same can be done for PIO.

Stefan



Re: [Qemu-devel] [PATCH] virtio: Notice when the system doesn't support MSIx at all

2015-10-26 Thread Mark Cave-Ayland
On 26/10/15 10:14, Peter Maydell wrote:

> On 26 October 2015 at 10:09, Mark Cave-Ayland
>  wrote:
>> On 26/10/15 10:03, Michael S. Tsirkin wrote:
>>> Pls address the issues if you want this applied.
>>
>> The patch came from Richard rather than myself - my only involvement has
>> been to champion the patch since both myself and Peter have had reports
>> of this message confusing users.
> 
> Isn't this patch already in master as commit 0d583647a7fc73f6 ?

Oh I think you're right! I had that thread tagged and hadn't seen a
reply come through, but I guess Richard must have attached it to one of
his recent pull requests.

In that case apologies to all for the noise.


ATB,

Mark.




Re: [Qemu-devel] [Bug 1505652] [NEW] An IO error happen when qemu snapshot-create

2015-10-26 Thread Stefan Hajnoczi
On Tue, Oct 13, 2015 at 11:52:26AM -, jiangchi68 wrote:
> My qemu version is 1.7.1,but when I try to make live snapshot by
> libvirt,the libvirt sometimes report an error
> :qemuMonitorJSONCheckError:377 : internal error: unable to execute QEMU
> command 'transaction': An IO error has occurred.
> 
> Here is the command being snpshot create by virsh:
> virsh  snapshot-create snapshot-test.vm   snapshot.xml  --no-metadata  
> --disk-only --reuse-external
> the snapshot.xml:
> 
>   test
>   
> 
>   
> 
>   
> 

Please send your question to libvirt-l...@redhat.com.

It could be related to /home/disk file permissions.



Re: [Qemu-devel] [PATCH v2 1/2] xen_platform: switch to realize

2015-10-26 Thread Stefano Stabellini
On Wed, 21 Oct 2015, Eduardo Habkost wrote:
> From: Stefano Stabellini 
> 
> Use realize to initialize the xen_platform device
> 
> Signed-off-by: Stefano Stabellini 
> Signed-off-by: Eduardo Habkost 

Applied to my tree


> Changes v1 -> v2:
> * Remove useless return
>   * Suggested-by: Paolo Bonzini 
> * Rename xen_platform_initfn() to xen_platform_realize()
> ---
>  hw/i386/xen/xen_platform.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
> index 8682c42..3dc68cb 100644
> --- a/hw/i386/xen/xen_platform.c
> +++ b/hw/i386/xen/xen_platform.c
> @@ -382,7 +382,7 @@ static const VMStateDescription vmstate_xen_platform = {
>  }
>  };
>  
> -static int xen_platform_initfn(PCIDevice *dev)
> +static void xen_platform_realize(PCIDevice *dev, Error **errp)
>  {
>  PCIXenPlatformState *d = XEN_PLATFORM(dev);
>  uint8_t *pci_conf;
> @@ -407,8 +407,6 @@ static int xen_platform_initfn(PCIDevice *dev)
>   >mmio_bar);
>  
>  platform_fixed_ioport_init(d);
> -
> -return 0;
>  }
>  
>  static void platform_reset(DeviceState *dev)
> @@ -423,7 +421,7 @@ static void xen_platform_class_init(ObjectClass *klass, 
> void *data)
>  DeviceClass *dc = DEVICE_CLASS(klass);
>  PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>  
> -k->init = xen_platform_initfn;
> +k->realize = xen_platform_realize;
>  k->vendor_id = PCI_VENDOR_ID_XEN;
>  k->device_id = PCI_DEVICE_ID_XEN_PLATFORM;
>  k->class_id = PCI_CLASS_OTHERS << 8 | 0x80;
> -- 
> 2.1.0
> 



[Qemu-devel] [PULL 1/3] Qemu/Xen: Fix early freeing MSIX MMIO memory region

2015-10-26 Thread Stefano Stabellini
From: Lan Tianyu 

msix->mmio is added to XenPCIPassthroughState's object as property.
object_finalize_child_property is called for XenPCIPassthroughState's
object, which calls object_property_del_all, which is going to try to
delete msix->mmio. object_finalize_child_property() will access
msix->mmio's obj. But the whole msix struct has already been freed
by xen_pt_msix_delete. This will cause segment fault when msix->mmio
has been overwritten.

This patch is to fix the issue.

Signed-off-by: Lan Tianyu 
Reviewed-by: Stefano Stabellini 
Signed-off-by: Stefano Stabellini 
---
 hw/xen/xen_pt.c |8 
 hw/xen/xen_pt.h |1 +
 hw/xen/xen_pt_config_init.c |2 +-
 hw/xen/xen_pt_msi.c |   13 -
 4 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index 2b54f52..aa96288 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -938,10 +938,18 @@ static void xen_pci_passthrough_class_init(ObjectClass 
*klass, void *data)
 dc->props = xen_pci_passthrough_properties;
 };
 
+static void xen_pci_passthrough_finalize(Object *obj)
+{
+XenPCIPassthroughState *s = XEN_PT_DEVICE(obj);
+
+xen_pt_msix_delete(s);
+}
+
 static const TypeInfo xen_pci_passthrough_info = {
 .name = TYPE_XEN_PT_DEVICE,
 .parent = TYPE_PCI_DEVICE,
 .instance_size = sizeof(XenPCIPassthroughState),
+.instance_finalize = xen_pci_passthrough_finalize,
 .class_init = xen_pci_passthrough_class_init,
 };
 
diff --git a/hw/xen/xen_pt.h b/hw/xen/xen_pt.h
index 3bc22eb..c545280 100644
--- a/hw/xen/xen_pt.h
+++ b/hw/xen/xen_pt.h
@@ -305,6 +305,7 @@ void xen_pt_msi_disable(XenPCIPassthroughState *s);
 
 int xen_pt_msix_init(XenPCIPassthroughState *s, uint32_t base);
 void xen_pt_msix_delete(XenPCIPassthroughState *s);
+void xen_pt_msix_unmap(XenPCIPassthroughState *s);
 int xen_pt_msix_update(XenPCIPassthroughState *s);
 int xen_pt_msix_update_remap(XenPCIPassthroughState *s, int bar_index);
 void xen_pt_msix_disable(XenPCIPassthroughState *s);
diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index 4a5bc11..0efee11 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -2079,7 +2079,7 @@ void xen_pt_config_delete(XenPCIPassthroughState *s)
 
 /* free MSI/MSI-X info table */
 if (s->msix) {
-xen_pt_msix_delete(s);
+xen_pt_msix_unmap(s);
 }
 g_free(s->msi);
 
diff --git a/hw/xen/xen_pt_msi.c b/hw/xen/xen_pt_msi.c
index e3d7194..82de2bc 100644
--- a/hw/xen/xen_pt_msi.c
+++ b/hw/xen/xen_pt_msi.c
@@ -610,7 +610,7 @@ error_out:
 return rc;
 }
 
-void xen_pt_msix_delete(XenPCIPassthroughState *s)
+void xen_pt_msix_unmap(XenPCIPassthroughState *s)
 {
 XenPTMSIX *msix = s->msix;
 
@@ -627,6 +627,17 @@ void xen_pt_msix_delete(XenPCIPassthroughState *s)
 }
 
 memory_region_del_subregion(>bar[msix->bar_index], >mmio);
+}
+
+void xen_pt_msix_delete(XenPCIPassthroughState *s)
+{
+XenPTMSIX *msix = s->msix;
+
+if (!msix) {
+return;
+}
+
+object_unparent(OBJECT(>mmio));
 
 g_free(s->msix);
 s->msix = NULL;
-- 
1.7.10.4




Re: [Qemu-devel] [PATCH] tests: re-enable vhost-user-test

2015-10-26 Thread Michael S. Tsirkin
On Mon, Oct 26, 2015 at 12:25:38PM +0100, Marc-André Lureau wrote:
> Hi
> 
> On Sat, Oct 24, 2015 at 9:44 PM, Michael S. Tsirkin  wrote:
> > On Thu, Oct 15, 2015 at 04:39:25PM +0200, marcandre.lur...@redhat.com wrote:
> >> From: Marc-André Lureau 
> >>
> >> Commit 7fe34ca9c2e actually disabled vhost-user-test altogether,
> >> since CONFIG_VHOST_NET is a per-target config variable.
> >>
> >> tests/vhost-user-test is already x86/64 softmmu specific test, in order
> >> to enable it correctly, kvm & vhost-net are also conditions. To check
> >> that, set CONFIG_VHOST_NET_TEST when kvm is also enabled.
> >>
> >> Signed-off-by: Marc-André Lureau 
> >
> > I had to drop this, this still seems to break on some configs.
> > Pls work to fix this up.
> 
> I am not sure I understand completely the issue that Peter is having
> on arm. I suppose my arm VM doesn't have kvm, and fails to reproduce
> it. What probably happens is that CONFIG_VHOST_NET_TEST is enabled
> because  "$target_name" = "$cpu" for $target_name = aarch64, then the
> test is trying to run with the qemu-system-i386 binary, but that one
> doesn't have vhost-net. We would probably need something like that
> (pseudo-code, I failed to express this with Makefile):
> 
> @@ -5652,6 +5654,7 @@ case "$target_name" in
>echo "CONFIG_KVM=y" >> $config_target_mak
>if test "$vhost_net" = "yes" ; then
>  echo "CONFIG_VHOST_NET=y" >> $config_target_mak
> +echo "CONFIG_VHOST_NET_TEST+=$cpu" >> $config_host_mak
>fi
>  fi
>  esac
> diff --git a/tests/Makefile b/tests/Makefile
> index 9341498..40fd02a 100644
> --- a/tests/Makefile
> +++ b/tests/Makefile
> @@ -192,9 +192,8 @@ gcov-files-i386-y += hw/usb/hcd-xhci.c
>  check-qtest-i386-y += tests/pc-cpu-test$(EXESUF)
>  check-qtest-i386-y += tests/q35-test$(EXESUF)
>  gcov-files-i386-y += hw/pci-host/q35.c
> -ifeq ($(CONFIG_VHOST_NET),y)
> -check-qtest-i386-$(CONFIG_LINUX) += tests/vhost-user-test$(EXESUF)
> -endif
> +# foreach CPU in CONFIG_VHOST_NET_TEST
> +# check-qtest-$(CPU)-y += tests/vhost-user-test$(EXESUF)

Like this then?

if test target_name == "i386" -o target_name == "x86_64"
then
echo "CONFIG_VHOST_NET_TEST_$target_name=y" >> $config_host_mak
fi


ifeq ($(CONFIG_VHOST_NET_i386),y)
check-qtest-i386-y += tests/vhost-user-test$(EXESUF)
endif
ifeq ($(CONFIG_VHOST_NET_x86_64),y)
check-qtest-x86_64-y += tests/vhost-user-test$(EXESUF)
endif

> 
> I don't feel very confortable with that sort of per-host/per-target
> complex configure-time conditions. I would rather simply use a simple
> runtime test check such as:

Problem with runtime checks is it makes people not notice
there's a problem.

-- 
MST



[Qemu-devel] [PULL 3/3] xen-platform: Replace assert() with appropriate error reporting

2015-10-26 Thread Stefano Stabellini
From: Eduardo Habkost 

Commit dbb7405d8caad0814ceddd568cb49f163a847561 made it possible to
trigger an assert using "-device xen-platform". Replace it with
appropriate error reporting.

Before:

  $ qemu-system-x86_64 -device xen-platform
  qemu-system-x86_64: hw/i386/xen/xen_platform.c:391: xen_platform_initfn: 
Assertion `xen_enabled()' failed.
  Aborted (core dumped)
  $

After:

  $ qemu-system-x86_64 -device xen-platform
  qemu-system-x86_64: -device xen-platform: xen-platform device requires the 
Xen accelerator
  $

Signed-off-by: Eduardo Habkost 
Reviewed-by: Stefano Stabellini 
Signed-off-by: Stefano Stabellini 
---
 hw/i386/xen/xen_platform.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
index 3dc68cb..de83f4e 100644
--- a/hw/i386/xen/xen_platform.c
+++ b/hw/i386/xen/xen_platform.c
@@ -33,6 +33,7 @@
 #include "trace.h"
 #include "exec/address-spaces.h"
 #include "sysemu/block-backend.h"
+#include "qemu/error-report.h"
 
 #include 
 
@@ -388,7 +389,10 @@ static void xen_platform_realize(PCIDevice *dev, Error 
**errp)
 uint8_t *pci_conf;
 
 /* Device will crash on reset if xen is not initialized */
-assert(xen_enabled());
+if (!xen_enabled()) {
+error_setg(errp, "xen-platform device requires the Xen accelerator");
+return;
+}
 
 pci_conf = dev->config;
 
-- 
1.7.10.4




Re: [Qemu-devel] [PATCH v8 19/54] Return path: Control commands

2015-10-26 Thread Dr. David Alan Gilbert
* Juan Quintela (quint...@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)"  wrote:
> > From: "Dr. David Alan Gilbert" 
> >
> > Add two src->dest commands:
> >* OPEN_RETURN_PATH - To request that the destination open the return path
> >* PING - Request an acknowledge from the destination
> >
> > Signed-off-by: Dr. David Alan Gilbert 
> > Reviewed-by: Amit Shah 
> 
> Reviewed-by: Juan Quintela 
> 
> 
> > +void qemu_savevm_send_open_return_path(QEMUFile *f)
> > +{
> > +qemu_savevm_command_send(f, MIG_CMD_OPEN_RETURN_PATH, 0, NULL);
> 
> 
> For consistency, I would have put a
> 
>trace_savevm_send_open_return_path() here
> 
> The send in the loadvm path

Done.

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



[Qemu-devel] [PATCH 1/1] virtio: sync the dataplane vring state to the virtqueue before virtio_save

2015-10-26 Thread Denis V. Lunev
From: Pavel Butsykin 

When creating snapshot with the dataplane enabled, the snapshot file gets
not the actual state of virtqueue, because the current state is stored in
VirtIOBlockDataPlane. Therefore, before saving snapshot need to sync
the dataplane vring state to the virtqueue. The dataplane will resume its
work at the next notify virtqueue.

When snapshot loads with loadvm we get a message:
VQ 0 size 0x80 Guest index 0x15f5 inconsistent with Host index 0x0:
delta 0x15f5
error while loading state for instance 0x0 of device
':00:08.0/virtio-blk'
Error -1 while loading VM state

to reproduce the error I used the following hmp commands:
savevm snap1
loadvm snap1

qemu parameters:
--enable-kvm -smp 4 -m 1024 -drive 
file=/var/lib/libvirt/images/centos6.4.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,aio=native
 -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,id=virtio-disk0
 -set device.virtio-disk0.x-data-plane=on

Signed-off-by: Pavel Butsykin 
Signed-off-by: Denis V. Lunev 
CC: Stefan Hajnoczi 
CC: "Michael S. Tsirkin" 
CC: Kevin Wolf 
CC: Paolo Bonzini 
---
 hw/block/virtio-blk.c | 5 +
 hw/scsi/virtio-scsi.c | 5 +
 2 files changed, 10 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 8beb26b..89ab72a 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -798,6 +798,11 @@ static void virtio_blk_set_status(VirtIODevice *vdev, 
uint8_t status)
 static void virtio_blk_save(QEMUFile *f, void *opaque)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
+VirtIOBlock *s = VIRTIO_BLK(vdev);
+
+if (s->dataplane) {
+virtio_blk_data_plane_stop(s->dataplane);
+}
 
 virtio_save(vdev, f);
 }
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 20885fb..8ad3257 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -665,6 +665,11 @@ static void virtio_scsi_reset(VirtIODevice *vdev)
 static void virtio_scsi_save(QEMUFile *f, void *opaque)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
+VirtIOSCSI *s = VIRTIO_SCSI(vdev);
+
+if (s->dataplane_started) {
+virtio_scsi_dataplane_stop(s);
+}
 virtio_save(vdev, f);
 }
 
-- 
2.1.4




Re: [Qemu-devel] [PATCH v4 03/13] target-arm: Add support for AArch32 S2 negative t0sz

2015-10-26 Thread Edgar E. Iglesias
On Fri, Oct 23, 2015 at 04:29:35PM +0100, Peter Maydell wrote:
> On 14 October 2015 at 23:55, Edgar E. Iglesias  
> wrote:
> > From: "Edgar E. Iglesias" 
> >
> > Add support for AArch32 S2 negative t0sz. In preparation for
> > using 40bit IPAs on AArch32.
> >
> > Signed-off-by: Edgar E. Iglesias 
> > ---
> >  target-arm/helper.c | 11 +++
> >  1 file changed, 11 insertions(+)
> >
> > diff --git a/target-arm/helper.c b/target-arm/helper.c
> > index 4e19838..a8a46db 100644
> > --- a/target-arm/helper.c
> > +++ b/target-arm/helper.c
> > @@ -6475,6 +6475,17 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
> > target_ulong address,
> >  if (va_size == 64) {
> >  t0sz = MIN(t0sz, 39);
> >  t0sz = MAX(t0sz, 16);
> > +} else {
> > +bool sext = extract32(t0sz, 4, 1);
> > +bool sign = extract32(t0sz, 3, 1);
> > +t0sz = sextract32(t0sz, 0, 4);
> > +
> > +/* If the sign-extend bit is not the same as t0sz[3], the result
> > + * is unpredictable. Flag this as a guest error.  */
> > +if (sign != sext) {
> > +qemu_log_mask(LOG_GUEST_ERROR,
> > +  "AArch32: VTCR.S / VTCR.T0SZ[3] missmatch\n");
> > +}
> 
> Shouldn't this be guarded by a check on whether this is an s2
> translation, since the 4-bit signed T0SZ and the S bit are only for
> the VTCR, not for the normal TTBCRs ?

Yes, sounds good. I've changed the patch to the following:

@@ -6521,8 +6521,24 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
target_ulong address,
  */
 int32_t t0sz = extract32(tcr->raw_tcr, 0, 6);
 if (va_size == 64) {
+/* AArch64 translation.  */
 t0sz = MIN(t0sz, 39);
 t0sz = MAX(t0sz, 16);
+} else if (mmu_idx != ARMMMUIdx_S2NS) {
+/* AArch32 stage 1 translation.  */
+t0sz = extract32(t0sz, 0, 3);
+} else {
+/* AArch32 stage 2 translation.  */
+bool sext = extract32(t0sz, 4, 1);
+bool sign = extract32(t0sz, 3, 1);
+t0sz = sextract32(t0sz, 0, 4);
+
+/* If the sign-extend bit is not the same as t0sz[3], the result
+ * is unpredictable. Flag this as a guest error.  */
+if (sign != sext) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "AArch32: VTCR.S / VTCR.T0SZ[3] missmatch\n");
+}
 }


We can also remove the error log and add more complete checks in future
patches if you prefer...

Cheers,
Edgar




> 
> That is, we have 3 cases here for determining t0sz:
>  * AArch64 6-bit unsigned field
>  * AArch32 stage 1 3-bit unsigned field
>  * AArch32 stage 2 4-bit signed field
> so we need more than just a single if/else.
> 
> It's true that bits 3 and 4 are RES0 for TTBCR, but if we're
> going to actually start logging guest errors here maybe we
> should actually report the real problem (RES0 bits being set)
> for that case.
> 
> thanks
> -- PMM



Re: [Qemu-devel] [PULL 00/37] Block layer patches

2015-10-26 Thread Peter Maydell
On 23 October 2015 at 18:00, Kevin Wolf  wrote:
> The following changes since commit 1e700f4c6cddaf29ce1d205f0f8e8b9255481930:
>
>   Merge remote-tracking branch 'remotes/mdroth/tags/qga-pull-2015-10-23-tag' 
> into staging (2015-10-23 15:55:50 +0100)
>
> are available in the git repository at:
>
>
>   git://repo.or.cz/qemu/kevin.git tags/for-upstream
>
> for you to fetch changes up to c07bc2c1658fffeee08eb46402b2f66d55b07586:
>
>   tests: Add test case for aio_disable_external (2015-10-23 18:18:24 +0200)
>
> 
> Block layer patches
>
Applied, thanks.

-- PMM



Re: [Qemu-devel] [PATCH v4 09/13] target-arm: Add ARMMMUFaultInfo

2015-10-26 Thread Edgar E. Iglesias
On Fri, Oct 23, 2015 at 05:53:09PM +0100, Peter Maydell wrote:
> On 14 October 2015 at 23:55, Edgar E. Iglesias  
> wrote:
> > From: "Edgar E. Iglesias" 
> >
> > Introduce ARMMMUFaultInfo to propagate MMU Fault information
> > across the MMU translation code path. This is in preparation for
> > adding Stage-2 translation.
> >
> > No functional changes.
> >
> > Signed-off-by: Edgar E. Iglesias 
> > @@ -1774,9 +1775,10 @@ static uint64_t do_ats_write(CPUARMState *env, 
> > uint64_t value,
> >  bool ret;
> >  uint64_t par64;
> >  MemTxAttrs attrs = {};
> > +ARMMMUFaultInfo fi = {};
> 
> Why are most of these initialized with "{}" ...
> 
> > @@ -83,8 +83,9 @@ void tlb_fill(CPUState *cs, target_ulong addr, int 
> > is_write, int mmu_idx,
> >  {
> >  bool ret;
> >  uint32_t fsr = 0;
> > +struct ARMMMUFaultInfo fi = {0};
> 
> ...but this one uses "{0}" ?

No reason.. I've removed the struct and zero so it's now:

ARMMMUFaultInfo fi = {};

everywhere.

Thanks,
Edgar


> 
> Otherwise
> Reviewed-by: Peter Maydell 
> 
> thanks
> -- PMM



[Qemu-devel] [PATCH 2/7] target-i386/kvm: Hyper-V SynIC MSR's support

2015-10-26 Thread Andrey Smetanin
Hyper-V SynIC MSR's support and MSR's migration.

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Vitaly Kuznetsov 
CC: "K. Y. Srinivasan" 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: k...@vger.kernel.org
CC: virtualizat...@lists.linux-foundation.org

---
 target-i386/cpu-qom.h |  1 +
 target-i386/cpu.c |  1 +
 target-i386/cpu.h |  5 +
 target-i386/kvm.c | 62 ++-
 target-i386/machine.c | 39 
 5 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/target-i386/cpu-qom.h b/target-i386/cpu-qom.h
index e3bfe9d..7ea5b34 100644
--- a/target-i386/cpu-qom.h
+++ b/target-i386/cpu-qom.h
@@ -94,6 +94,7 @@ typedef struct X86CPU {
 bool hyperv_reset;
 bool hyperv_vpindex;
 bool hyperv_runtime;
+bool hyperv_synic;
 bool check_cpuid;
 bool enforce_cpuid;
 bool expose_kvm;
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index c1a9e09..ff1c0a9 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -3141,6 +3141,7 @@ static Property x86_cpu_properties[] = {
 DEFINE_PROP_BOOL("hv-reset", X86CPU, hyperv_reset, false),
 DEFINE_PROP_BOOL("hv-vpindex", X86CPU, hyperv_vpindex, false),
 DEFINE_PROP_BOOL("hv-runtime", X86CPU, hyperv_runtime, false),
+DEFINE_PROP_BOOL("hv-synic", X86CPU, hyperv_synic, false),
 DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, false),
 DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
 DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true),
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 62f7879..8611015 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -915,6 +915,11 @@ typedef struct CPUX86State {
 uint64_t msr_hv_tsc;
 uint64_t msr_hv_crash_params[HV_X64_MSR_CRASH_PARAMS];
 uint64_t msr_hv_runtime;
+uint64_t msr_hv_synic_control;
+uint64_t msr_hv_synic_version;
+uint64_t msr_hv_synic_evt_page;
+uint64_t msr_hv_synic_msg_page;
+uint64_t msr_hv_synic_sint[HV_SYNIC_SINT_COUNT];
 
 /* exception/interrupt handling */
 int error_code;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 64046cb..325dfec 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -86,6 +86,7 @@ static bool has_msr_hv_crash;
 static bool has_msr_hv_reset;
 static bool has_msr_hv_vpindex;
 static bool has_msr_hv_runtime;
+static bool has_msr_hv_synic;
 static bool has_msr_mtrr;
 static bool has_msr_xss;
 
@@ -476,7 +477,8 @@ static bool hyperv_enabled(X86CPU *cpu)
 cpu->hyperv_crash ||
 cpu->hyperv_reset ||
 cpu->hyperv_vpindex ||
-cpu->hyperv_runtime);
+cpu->hyperv_runtime ||
+cpu->hyperv_synic);
 }
 
 static Error *invtsc_mig_blocker;
@@ -565,6 +567,9 @@ int kvm_arch_init_vcpu(CPUState *cs)
 if (cpu->hyperv_runtime && has_msr_hv_runtime) {
 c->eax |= HV_X64_MSR_VP_RUNTIME_AVAILABLE;
 }
+if (cpu->hyperv_synic && has_msr_hv_synic) {
+c->eax |= HV_X64_MSR_SYNIC_AVAILABLE;
+}
 c = _data.entries[cpuid_i++];
 c->function = HYPERV_CPUID_ENLIGHTMENT_INFO;
 if (cpu->hyperv_relaxed_timing) {
@@ -905,6 +910,10 @@ static int kvm_get_supported_msrs(KVMState *s)
 has_msr_hv_runtime = true;
 continue;
 }
+if (kvm_msr_list->indices[i] == HV_X64_MSR_SCONTROL) {
+has_msr_hv_synic = true;
+continue;
+}
 }
 }
 
@@ -1466,6 +1475,31 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 kvm_msr_entry_set([n++], HV_X64_MSR_VP_RUNTIME,
   env->msr_hv_runtime);
 }
+if (has_msr_hv_synic) {
+int j;
+
+if (!env->msr_hv_synic_version) {
+/* First time initialization */
+env->msr_hv_synic_version = HV_SYNIC_VERSION_1;
+for (j = 0; j < ARRAY_SIZE(env->msr_hv_synic_sint); j++) {
+env->msr_hv_synic_sint[j] = HV_SYNIC_SINT_MASKED;
+}
+}
+
+kvm_msr_entry_set([n++], HV_X64_MSR_SCONTROL,
+  env->msr_hv_synic_control);
+kvm_msr_entry_set([n++], HV_X64_MSR_SVERSION,
+  env->msr_hv_synic_version);
+kvm_msr_entry_set([n++], HV_X64_MSR_SIEFP,
+  env->msr_hv_synic_evt_page);
+kvm_msr_entry_set([n++], HV_X64_MSR_SIMP,
+  env->msr_hv_synic_msg_page);
+
+for (j = 0; j < ARRAY_SIZE(env->msr_hv_synic_sint); j++) {
+kvm_msr_entry_set([n++], 

[Qemu-devel] [PATCH 1/7] standard-headers/x86: add Hyper-V SynIC constants

2015-10-26 Thread Andrey Smetanin
Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Vitaly Kuznetsov 
CC: "K. Y. Srinivasan" 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: k...@vger.kernel.org
CC: virtualizat...@lists.linux-foundation.org

---
 include/standard-headers/asm-x86/hyperv.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/include/standard-headers/asm-x86/hyperv.h 
b/include/standard-headers/asm-x86/hyperv.h
index c37c14e..f9780f1 100644
--- a/include/standard-headers/asm-x86/hyperv.h
+++ b/include/standard-headers/asm-x86/hyperv.h
@@ -257,4 +257,16 @@ typedef struct _HV_REFERENCE_TSC_PAGE {
int64_t tsc_offset;
 } HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
 
+/* Define the number of synthetic interrupt sources. */
+#define HV_SYNIC_SINT_COUNT(16)
+/* Define the expected SynIC version. */
+#define HV_SYNIC_VERSION_1 (0x1)
+
+#define HV_SYNIC_CONTROL_ENABLE(1ULL << 0)
+#define HV_SYNIC_SIMP_ENABLE   (1ULL << 0)
+#define HV_SYNIC_SIEFP_ENABLE  (1ULL << 0)
+#define HV_SYNIC_SINT_MASKED   (1ULL << 16)
+#define HV_SYNIC_SINT_AUTO_EOI (1ULL << 17)
+#define HV_SYNIC_SINT_VECTOR_MASK  (0xFF)
+
 #endif
-- 
2.4.3




Re: [Qemu-devel] Qemu.git Build Failed

2015-10-26 Thread Peter Maydell
On 26 October 2015 at 09:43, Zhang, Xianda  wrote:
>
> I'd like to know whether you have fixed the problem and  the commit id. Thank 
> you.

Yes, the fix is now in master as commit 7f4a930e64b9e69cd.

thanks
-- PMM



Re: [Qemu-devel] [PULL v4 00/52] Ivshmem patches

2015-10-26 Thread Peter Maydell
On 26 October 2015 at 09:28, Marc-André Lureau
 wrote:
> Hi Peter,
>
> On Mon, Oct 19, 2015 at 5:57 PM, Peter Maydell  
> wrote:
>>
>> What is happening here is that we are looping infinitely in
>> mktempshmem() because shm_open() returns -1 with errno ENOSYS,
>> and there's no code in the loop that stops the loop on anything
>> except shm_open succeeding, or even prints anything out about
>> shm_open failing.
>>
>> I think this is failing for me because my system's chroot doesn't have
>> /dev/shm mounted. It would be nice if we could at a minimum handle
>> this reasonably gracefully...
>
> The following diff works for me:
>
> diff --git a/tests/ivshmem-test.c b/tests/ivshmem-test.c
> index efaa6e3..c8f0cf0 100644
> --- a/tests/ivshmem-test.c
> +++ b/tests/ivshmem-test.c
> @@ -441,13 +441,18 @@ static gchar *mktempshm(int size, int *fd)
>  }
>
>  g_free(name);
> +
> +if (errno != EEXIST) {
> +perror("shm_open");
> +return NULL;
> +}
>  }
>  }
>
>  int main(int argc, char **argv)
>  {
>  int ret, fd;
>  gchar dir[] = "/tmp/ivshmem-test.XX";
>
>  #if !GLIB_CHECK_VERSION(2, 31, 0)
>  if (!g_thread_supported()) {
> @@ -460,6 +465,9 @@ int main(int argc, char **argv)
>  qtest_add_abrt_handler(abrt_handler, NULL);
>  /* shm */
>  tmpshm = mktempshm(TMPSHMSIZE, );
> +if (!tmpshm) {
> +return 0;
> +}
>  tmpshmem = mmap(0, TMPSHMSIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
>  g_assert(tmpshmem != MAP_FAILED);
>  /* server */

This will print a cryptic error message and then not fail the test,
which is not great. Maybe that's ok for the moment in the interests
of not keeping this huge patchset out of tree for too long[*], but
we should look at what glib's test framework provides in the way
of being able to report "skipped this test" outcomes.

[*] Incidentally this whole saga demonstrates why my general
recommendation is to keep pull requests at much less than
50 patches...

> I rebased and updated the tag.

If you mean by this "please retry the pull" you should send a fresh
coverletter email so my scripts will pick it up...

thanks
-- PMM



Re: [Qemu-devel] Reminder: we're now in softfreeze

2015-10-26 Thread Peter Maydell
On 26 October 2015 at 09:56, Mark Cave-Ayland
 wrote:
> Maybe we also need a [PING for-2.5] or similar? I've just sent another
> couple of pings for patches (one of which fixes the broken
> analyze-migration.py script) which have fallen through the cracks over
> the past couple of months. It would be nice to have an easy way to flag
> outstanding patches like this, especially during freeze when things get
> really busy.

That's what I'm hoping the wiki page will work for. Unlike the
mailing list, it's very short and easy to scan through :-)

-- PMM



Re: [Qemu-devel] [PATCH] virtio: Notice when the system doesn't support MSIx at all

2015-10-26 Thread Mark Cave-Ayland
On 26/10/15 10:03, Michael S. Tsirkin wrote:

> On Mon, Oct 26, 2015 at 09:44:47AM +, Mark Cave-Ayland wrote:
>> On 27/09/15 15:40, Peter Maydell wrote:
>>
>>> Ping^2 -- this is still confusing users, cf bug LP:1500175.
>>>
>>> thanks
>>> -- PMM
>>>
>>> On 6 July 2015 at 11:38, Peter Maydell  wrote:
 On 3 July 2015 at 09:22, Mark Cave-Ayland  
 wrote:
> On 19/05/15 21:29, Richard Henderson wrote:
>
>> And do not issue an error_report in that case.
>>
>> Signed-off-by: Richard Henderson 
>> ---
>>  hw/virtio/virtio-pci.c | 15 ++-
>>  1 file changed, 10 insertions(+), 5 deletions(-)
>> --
>>
>> This seems to be about the only sane way to test the value of
>> msi_supported from here.  Thoughts?

> Ping? Any chance of getting this applied for 2.4 since it's something
> that I do get emails about for SPARC64?

 Michael would like a respin with the commit message correction
 he suggested. RTH -- could you send out a fixed up version, please?
> 
> I don't think there was a reply to this, and by now the patch doesn't apply.

Hmmm that's a shame. I'm not overly familiar with the PCI parts of QEMU,
so is it much work to rebase?


 thanks
 -- PMM
>>
>> Another ping on this patch - please can we have it for 2.5? It has been
>> around since before 2.4 and I'd hate for it to miss another release :(
>>
>>
>> ATB,
>>
>> Mark.
> 
> Pls address the issues if you want this applied.

The patch came from Richard rather than myself - my only involvement has
been to champion the patch since both myself and Peter have had reports
of this message confusing users.


ATB,

Mark.




Re: [Qemu-devel] [Qemu-block] [PATCH] block/nfs: add support for setting debug level

2015-10-26 Thread Peter Lieven

Am 26.10.2015 um 11:45 schrieb Stefan Hajnoczi:

On Thu, Oct 22, 2015 at 08:37:19AM +0200, Peter Lieven wrote:

Am 22.09.2015 um 08:13 schrieb Peter Lieven:

Am 25.06.2015 um 15:18 schrieb Stefan Hajnoczi:

On Tue, Jun 23, 2015 at 10:12:15AM +0200, Peter Lieven wrote:

upcoming libnfs versions will support logging debug messages. Add
support for it in qemu through an URL parameter.

Signed-off-by: Peter Lieven 
---
  block/nfs.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/block/nfs.c b/block/nfs.c
index ca9e24e..f7388a3 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -329,6 +329,10 @@ static int64_t nfs_client_open(NFSClient *client, const 
char *filename,
  } else if (!strcmp(qp->p[i].name, "readahead")) {
  nfs_set_readahead(client->context, val);
  #endif
+#ifdef LIBNFS_FEATURE_DEBUG
+} else if (!strcmp(qp->p[i].name, "debug")) {
+nfs_set_debug(client->context, val);
+#endif
  } else {
  error_setg(errp, "Unknown NFS parameter name: %s",
 qp->p[i].name);

Untrusted users may be able to set these options since they are encoded
in the URI.  I'm imagining a hosting or cloud scenario like OpenStack.

A verbose debug level spams stderr and could consume a lot of disk
space.

(The uid and gid options are probably okay since the NFS server cannot
trust the uid/gid coming from QEMU anyway.)

I think we can merge this patch for QEMU 2.4 but I'd like to have a
discussion about the security risk of encoding libnfs options in the
URI.

CCed Eric Blake in case libvirt is affected.

Has anyone thought about this and what are the rules?

As I hadn't time to work further on the best way to add options for NFS (and 
other
protocols), would it be feasible to allow passing debug as an URL parameter, but
limit the maximum debug level to limit a possible security impact (flooding 
logs)?

If a higher debug level is needed it can be set via device specific options as 
soon
there is a common scheme for them.

Any objections?

If you are sure that ERROR and WARN levels (or similar) don't flood the
logs, then it sounds like a solution.


Thats not the case. I use debug level 2 for quite some time. Mainly to see NFS 
connection interruptions.

So I would be happy if we could allow for debug <= 2 from the cmdline.

Peter



Re: [Qemu-devel] [PATCH 15/19] pc: acpi: bump DSDT revision compliance to v2

2015-10-26 Thread Igor Mammedov
On Mon, 26 Oct 2015 12:07:50 +0200
"Michael S. Tsirkin"  wrote:

> On Mon, Oct 26, 2015 at 11:03:01AM +0100, Igor Mammedov wrote:
> > On Sat, 24 Oct 2015 22:40:59 +0300
> > "Michael S. Tsirkin"  wrote:
> > 
> > > On Fri, Oct 23, 2015 at 04:57:18PM +0200, Igor Mammedov wrote:
> > > > it turns on 64-bit integer handling in OSPM, which we could use
> > > > for writing simpler/smaller AML code.
> > > > Tested with Windows XP and Windows Server 2008, Linux:
> > > >  * XP doesn't care about revision and continues to use 32 integers
> > > >and boots just fine with this change.
> > > >  * WS 2008 and Linux - support rev2 and use 64-bit integers
> > > > 
> > > > Signed-off-by: Igor Mammedov 
> > > 
> > > This is still planned, right? IIUC you didn't post any code
> > > that needs the 64 bit math.
> > nope, the next patch 16/19 uses 64-bit math,
> > 
> > it greatly simplifies _CRS as we don't have to do
> > 64-bit math manually using 32-bit integers.
> > 
> > And even if we put new MHPD.MCRS() that uses 64-bit math in SSDT,
> > it won't crash XP unless someone would try to do memory hotplug
> > 
> > and even it could be 'fixed' if we check _REV on every
> > hotplug event, it's a bit ugly but should work.
> 
> Aha. That's exactly what I said. All that patch commit log
> says is
> pc: acpi: memhp: move MHPD.MCRS method into MHPT table
> when in fact you also rework it all to use 64 bit math.
yep, it's my fault. I'll fix it.

> 
> So pls don't do this. Pls start by rewriting ASL in C
> while keeping AML code identical. Cleanups - next.
That's what I'm trying to avoid in this case as 
reworked code is much simpler than the original ASL.
Rewriting original complex MHPD.MCRS() ASL into C and
immediately replacing it with simplified version is
rather counterproductive especially taking in account that
'make check' tests will fail anyway since code is moved
between tables and C-produced AML doesn't exactly match
blobs produced by a particular IASL version anyway.

But I can certainly do/split it other way around,
simplify ASL first and then rewrite result in C.
That should make reviewing easier even if tests
are broken.

> 
> > > 
> > > > ---
> > > >  hw/i386/acpi-build.c  | 2 +-
> > > >  hw/i386/acpi-dsdt.dsl | 2 +-
> > > >  hw/i386/q35-acpi-dsdt.dsl | 2 +-
> > > >  3 files changed, 3 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > > index 8add4d9..c929540 100644
> > > > --- a/hw/i386/acpi-build.c
> > > > +++ b/hw/i386/acpi-build.c
> > > > @@ -1484,7 +1484,7 @@ build_dsdt(GArray *table_data, GArray *linker, 
> > > > AcpiMiscInfo *misc)
> > > >  
> > > >  memset(dsdt, 0, sizeof *dsdt);
> > > >  build_header(linker, table_data, dsdt, "DSDT",
> > > > - misc->dsdt_size, 1);
> > > > + misc->dsdt_size, 2);
> > > >  }
> > > >  
> > > >  static GArray *
> > > > diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl
> > > > index 8dba096..6d46b36 100644
> > > > --- a/hw/i386/acpi-dsdt.dsl
> > > > +++ b/hw/i386/acpi-dsdt.dsl
> > > > @@ -22,7 +22,7 @@ ACPI_EXTRACT_ALL_CODE AcpiDsdtAmlCode
> > > >  DefinitionBlock (
> > > >  "acpi-dsdt.aml",// Output Filename
> > > >  "DSDT", // Signature
> > > > -0x01,   // DSDT Compliance Revision
> > > > +0x02,   // DSDT Compliance Revision
> > > >  "BXPC", // OEMID
> > > >  "BXDSDT",   // TABLE ID
> > > >  0x1 // OEM Revision
> > > > diff --git a/hw/i386/q35-acpi-dsdt.dsl b/hw/i386/q35-acpi-dsdt.dsl
> > > > index 7be7b37..ecefdec 100644
> > > > --- a/hw/i386/q35-acpi-dsdt.dsl
> > > > +++ b/hw/i386/q35-acpi-dsdt.dsl
> > > > @@ -28,7 +28,7 @@ ACPI_EXTRACT_ALL_CODE Q35AcpiDsdtAmlCode
> > > >  DefinitionBlock (
> > > >  "q35-acpi-dsdt.aml",// Output Filename
> > > >  "DSDT", // Signature
> > > > -0x01,   // DSDT Compliance Revision
> > > > +0x02,   // DSDT Compliance Revision
> > > >  "BXPC", // OEMID
> > > >  "BXDSDT",   // TABLE ID
> > > >  0x2 // OEM Revision
> > > > -- 
> > > > 1.8.3.1




[Qemu-devel] [Bug 1505652] Re: An IO error happen when qemu snapshot-create

2015-10-26 Thread Stefan Hajnoczi
Please ask the libvirt community for help.  This issue is probably
related to your libvirt storage setup.

** Changed in: qemu
   Status: New => Invalid

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1505652

Title:
  An IO error happen when qemu  snapshot-create

Status in QEMU:
  Invalid

Bug description:
  My qemu version is 1.7.1,but when I try to make live snapshot by
  libvirt,the libvirt sometimes report an error
  :qemuMonitorJSONCheckError:377 : internal error: unable to execute
  QEMU command 'transaction': An IO error has occurred.

  Here is the command being snpshot create by virsh:
  virsh  snapshot-create snapshot-test.vm   snapshot.xml  --no-metadata  
--disk-only --reuse-external
  the snapshot.xml:
  
test

  

  

  

  
  I have read the qemu code about the snapshot create, and I find the qemu when 
call the function handle_aiocb_rw_linear():
  static ssize_t handle_aiocb_rw_linear(RawPosixAIOData *aiocb, char *buf)
  {
  ssize_t offset = 0;
  ssize_t len;

  while (offset < aiocb->aio_nbytes) {
  if (aiocb->aio_type & QEMU_AIO_WRITE) {
  len = pwrite(aiocb->aio_fildes,
   (const char *)buf + offset,
   aiocb->aio_nbytes - offset,
   aiocb->aio_offset + offset);
  } else {
  len = pread(aiocb->aio_fildes,
  buf + offset,
  aiocb->aio_nbytes - offset,
  aiocb->aio_offset + offset);
  }
  if (len == -1 && errno == EINTR) {
  continue;
  } else if (len == -1) {
  offset = -errno;
  break;
  } else if (len == 0) {
  break;
  }
  offset += len;
  }

  return offset;
  }

  The function pwrite happen error,the errono is 1,and the  error 
describe:"pwrite  failed, Operation not permitted (1, EPERM) because the 
process does not have the appropriate privileges to use the pwrite system call".
  The qemu call stack about is:
  
external_snapshot_prepare()->bdrv_flush()->...->paio_submit->...->handle_aiocb_rw_linear.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1505652/+subscriptions



[Qemu-devel] [PATCH 7/9] block: Drop BlockDriver.bdrv_ioctl

2015-10-26 Thread Fam Zheng
Now the callback is not used any more, drop the field along with all
implementations in block drivers, which are iscsi and raw.

Signed-off-by: Fam Zheng 
---
 block/iscsi.c | 33 -
 block/raw-posix.c |  9 -
 block/raw_bsd.c   |  6 --
 include/block/block_int.h |  1 -
 4 files changed, 49 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 94cbdf2..8134cc8 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -844,38 +844,6 @@ static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
 return >common;
 }
 
-static void ioctl_cb(void *opaque, int status)
-{
-int *p_status = opaque;
-*p_status = status;
-}
-
-static int iscsi_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
-{
-IscsiLun *iscsilun = bs->opaque;
-int status;
-
-switch (req) {
-case SG_GET_VERSION_NUM:
-*(int *)buf = 3;
-break;
-case SG_GET_SCSI_ID:
-((struct sg_scsi_id *)buf)->scsi_type = iscsilun->type;
-break;
-case SG_IO:
-status = -EINPROGRESS;
-iscsi_aio_ioctl(bs, req, buf, ioctl_cb, );
-
-while (status == -EINPROGRESS) {
-aio_poll(iscsilun->aio_context, true);
-}
-
-return 0;
-default:
-return -1;
-}
-return 0;
-}
 #endif
 
 static int64_t
@@ -1807,7 +1775,6 @@ static BlockDriver bdrv_iscsi = {
 .bdrv_co_flush_to_disk = iscsi_co_flush,
 
 #ifdef __linux__
-.bdrv_ioctl   = iscsi_ioctl,
 .bdrv_aio_ioctl   = iscsi_aio_ioctl,
 #endif
 
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 3a527f0..70ff7e9 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -2228,13 +2228,6 @@ static int fd_open(BlockDriverState *bs)
 return 0;
 }
 
-static int hdev_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
-{
-BDRVRawState *s = bs->opaque;
-
-return ioctl(s->fd, req, buf);
-}
-
 static BlockAIOCB *hdev_aio_ioctl(BlockDriverState *bs,
 unsigned long int req, void *buf,
 BlockCompletionFunc *cb, void *opaque)
@@ -2400,7 +2393,6 @@ static BlockDriver bdrv_host_device = {
 
 /* generic scsi device */
 #ifdef __linux__
-.bdrv_ioctl = hdev_ioctl,
 .bdrv_aio_ioctl = hdev_aio_ioctl,
 #endif
 };
@@ -2684,7 +2676,6 @@ static BlockDriver bdrv_host_cdrom = {
 .bdrv_lock_medium   = cdrom_lock_medium,
 
 /* generic scsi device */
-.bdrv_ioctl = hdev_ioctl,
 .bdrv_aio_ioctl = hdev_aio_ioctl,
 };
 #endif /* __linux__ */
diff --git a/block/raw_bsd.c b/block/raw_bsd.c
index 63ee911..4361c68 100644
--- a/block/raw_bsd.c
+++ b/block/raw_bsd.c
@@ -174,11 +174,6 @@ static void raw_lock_medium(BlockDriverState *bs, bool 
locked)
 bdrv_lock_medium(bs->file->bs, locked);
 }
 
-static int raw_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
-{
-return bdrv_ioctl(bs->file->bs, req, buf);
-}
-
 static BlockAIOCB *raw_aio_ioctl(BlockDriverState *bs,
  unsigned long int req, void *buf,
  BlockCompletionFunc *cb,
@@ -268,7 +263,6 @@ BlockDriver bdrv_raw = {
 .bdrv_media_changed   = _media_changed,
 .bdrv_eject   = _eject,
 .bdrv_lock_medium = _lock_medium,
-.bdrv_ioctl   = _ioctl,
 .bdrv_aio_ioctl   = _aio_ioctl,
 .create_opts  = _create_opts,
 .bdrv_has_zero_init   = _has_zero_init
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 936a3fc..d3b2411 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -226,7 +226,6 @@ struct BlockDriver {
 void (*bdrv_lock_medium)(BlockDriverState *bs, bool locked);
 
 /* to control generic scsi devices */
-int (*bdrv_ioctl)(BlockDriverState *bs, unsigned long int req, void *buf);
 BlockAIOCB *(*bdrv_aio_ioctl)(BlockDriverState *bs,
 unsigned long int req, void *buf,
 BlockCompletionFunc *cb, void *opaque);
-- 
2.4.3




[Qemu-devel] [PATCH 3/9] block: Track discard requests

2015-10-26 Thread Fam Zheng
Both bdrv_discard and bdrv_aio_discard will call into bdrv_co_discard,
so add tracked_request_begin/end calls around the loop.

Signed-off-by: Fam Zheng 
---
 block/io.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/block/io.c b/block/io.c
index 223c4e9..abb3aaa 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2415,7 +2415,8 @@ static void coroutine_fn bdrv_discard_co_entry(void 
*opaque)
 int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
  int nb_sectors)
 {
-int max_discard, ret;
+BdrvTrackedRequest req;
+int max_discard, ret = 0;
 
 if (!bs->drv) {
 return -ENOMEDIUM;
@@ -2437,6 +2438,8 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
int64_t sector_num,
 return 0;
 }
 
+tracked_request_begin(, bs, sector_num, nb_sectors,
+  BDRV_TRACKED_DISCARD);
 bdrv_set_dirty(bs, sector_num, nb_sectors);
 
 max_discard = MIN_NON_ZERO(bs->bl.max_discard, BDRV_REQUEST_MAX_SECTORS);
@@ -2470,20 +2473,23 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
int64_t sector_num,
 acb = bs->drv->bdrv_aio_discard(bs, sector_num, nb_sectors,
 bdrv_co_io_em_complete, );
 if (acb == NULL) {
-return -EIO;
+ret = -EIO;
+goto out;
 } else {
 qemu_coroutine_yield();
 ret = co.ret;
 }
 }
 if (ret && ret != -ENOTSUP) {
-return ret;
+goto out;
 }
 
 sector_num += num;
 nb_sectors -= num;
 }
-return 0;
+out:
+tracked_request_end();
+return ret;
 }
 
 int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
-- 
2.4.3




[Qemu-devel] [PATCH 4/9] iscsi: Emulate commands in iscsi_aio_ioctl as iscsi_ioctl

2015-10-26 Thread Fam Zheng
iscsi_ioctl emulates SG_GET_VERSION_NUM and SG_GET_SCSI_ID. Now that
bdrv_ioctl() will be emulated with .bdrv_aio_ioctl, replicate the logic
into iscsi_aio_ioctl to make them consistent.

Signed-off-by: Fam Zheng 
---
 block/iscsi.c | 39 +--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 93f1ee4..94cbdf2 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -96,6 +96,7 @@ typedef struct IscsiAIOCB {
 int status;
 int64_t sector_num;
 int nb_sectors;
+int ret;
 #ifdef __linux__
 sg_io_hdr_t *ioh;
 #endif
@@ -726,6 +727,37 @@ iscsi_aio_ioctl_cb(struct iscsi_context *iscsi, int status,
 iscsi_schedule_bh(acb);
 }
 
+static void iscsi_ioctl_bh_completion(void *opaque)
+{
+IscsiAIOCB *acb = opaque;
+
+qemu_bh_delete(acb->bh);
+acb->common.cb(acb->common.opaque, acb->ret);
+}
+
+static void iscsi_ioctl_handle_emulated(IscsiAIOCB *acb, int req, void *buf)
+{
+BlockDriverState *bs = acb->common.bs;
+IscsiLun *iscsilun = bs->opaque;
+int ret = 0;
+
+switch (req) {
+case SG_GET_VERSION_NUM:
+*(int *)buf = 3;
+break;
+case SG_GET_SCSI_ID:
+((struct sg_scsi_id *)buf)->scsi_type = iscsilun->type;
+break;
+default:
+ret = -EINVAL;
+}
+assert(!acb->bh);
+acb->bh = aio_bh_new(bdrv_get_aio_context(bs),
+ iscsi_ioctl_bh_completion, acb);
+acb->ret = ret;
+qemu_bh_schedule(acb->bh);
+}
+
 static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
 unsigned long int req, void *buf,
 BlockCompletionFunc *cb, void *opaque)
@@ -735,8 +767,6 @@ static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
 struct iscsi_data data;
 IscsiAIOCB *acb;
 
-assert(req == SG_IO);
-
 acb = qemu_aio_get(_aiocb_info, bs, cb, opaque);
 
 acb->iscsilun = iscsilun;
@@ -745,6 +775,11 @@ static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
 acb->buf = NULL;
 acb->ioh = buf;
 
+if (req != SG_IO) {
+iscsi_ioctl_handle_emulated(acb, req, buf);
+return >common;
+}
+
 acb->task = malloc(sizeof(struct scsi_task));
 if (acb->task == NULL) {
 error_report("iSCSI: Failed to allocate task for scsi command. %s",
-- 
2.4.3




[Qemu-devel] [PATCH 6/9] block: Emulate bdrv_ioctl with bdrv_aio_ioctl and track both

2015-10-26 Thread Fam Zheng
Currently all drivers that support .bdrv_aio_ioctl also implement
.bdrv_ioctl redundantly.  To track ioctl requests in block layer it is
easier if we unify the two paths, because we'll need to run it in a
coroutine, as required by tracked_request_begin. While we're at it, use
.bdrv_aio_ioctl plus aio_poll() to emulate bdrv_ioctl().

Signed-off-by: Fam Zheng 
---
 block/io.c | 104 +++--
 1 file changed, 95 insertions(+), 9 deletions(-)

diff --git a/block/io.c b/block/io.c
index abb3aaa..358c3c4 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2518,26 +2518,112 @@ int bdrv_discard(BlockDriverState *bs, int64_t 
sector_num, int nb_sectors)
 return rwco.ret;
 }
 
+typedef struct {
+CoroutineIOCompletion *co;
+QEMUBH *bh;
+} BdrvIoctlCompletionData;
+
+static void bdrv_ioctl_bh_cb(void *opaque)
+{
+BdrvIoctlCompletionData *data = opaque;
+
+bdrv_co_io_em_complete(data->co, -ENOTSUP);
+qemu_bh_delete(data->bh);
+}
+
+static int bdrv_co_do_ioctl(BlockDriverState *bs, int req, void *buf)
+{
+BlockDriver *drv = bs->drv;
+BdrvTrackedRequest tracked_req;
+CoroutineIOCompletion co = {
+.coroutine = qemu_coroutine_self(),
+};
+BlockAIOCB *acb;
+
+tracked_request_begin(_req, bs, 0, 0, BDRV_TRACKED_IOCTL);
+if (!drv || !drv->bdrv_aio_ioctl) {
+co.ret = -ENOTSUP;
+goto out;
+}
+
+acb = drv->bdrv_aio_ioctl(bs, req, buf, bdrv_co_io_em_complete, );
+if (!acb) {
+BdrvIoctlCompletionData *data = g_new(BdrvIoctlCompletionData, 1);
+data->bh = aio_bh_new(bdrv_get_aio_context(bs),
+bdrv_ioctl_bh_cb, data);
+data->co = 
+qemu_bh_schedule(data->bh);
+}
+qemu_coroutine_yield();
+out:
+tracked_request_end(_req);
+return co.ret;
+}
+
+typedef struct {
+BlockDriverState *bs;
+int req;
+void *buf;
+int ret;
+} BdrvIoctlCoData;
+
+static void coroutine_fn bdrv_co_ioctl_entry(void *opaque)
+{
+BdrvIoctlCoData *data = opaque;
+data->ret = bdrv_co_do_ioctl(data->bs, data->req, data->buf);
+}
+
 /* needed for generic scsi interface */
-
 int bdrv_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
 {
-BlockDriver *drv = bs->drv;
+BdrvIoctlCoData data = {
+.bs = bs,
+.req = req,
+.buf = buf,
+.ret = -EINPROGRESS,
+};
 
-if (drv && drv->bdrv_ioctl)
-return drv->bdrv_ioctl(bs, req, buf);
-return -ENOTSUP;
+if (qemu_in_coroutine()) {
+/* Fast-path if already in coroutine context */
+bdrv_co_ioctl_entry();
+} else {
+Coroutine *co = qemu_coroutine_create(bdrv_co_ioctl_entry);
+qemu_coroutine_enter(co, );
+}
+while (data.ret == -EINPROGRESS) {
+aio_poll(bdrv_get_aio_context(bs), true);
+}
+return data.ret;
+}
+
+static void coroutine_fn bdrv_co_aio_ioctl_entry(void *opaque)
+{
+BlockAIOCBCoroutine *acb = opaque;
+acb->req.error = bdrv_co_do_ioctl(acb->common.bs,
+  acb->req.req, acb->req.buf);
+bdrv_co_complete(acb);
 }
 
 BlockAIOCB *bdrv_aio_ioctl(BlockDriverState *bs,
 unsigned long int req, void *buf,
 BlockCompletionFunc *cb, void *opaque)
 {
-BlockDriver *drv = bs->drv;
+BlockAIOCBCoroutine *acb = qemu_aio_get(_em_co_aiocb_info,
+bs, cb, opaque);
+acb->need_bh = true;
+acb->req.error = -EINPROGRESS;
+acb->req.req = req;
+acb->req.buf = buf;
+if (qemu_in_coroutine()) {
+/* Fast-path if already in coroutine context */
+bdrv_co_aio_ioctl_entry(acb);
+} else {
+Coroutine *co = qemu_coroutine_create(bdrv_co_aio_ioctl_entry);
+qemu_coroutine_enter(co, acb);
+}
 
-if (drv && drv->bdrv_aio_ioctl)
-return drv->bdrv_aio_ioctl(bs, req, buf, cb, opaque);
-return NULL;
+bdrv_co_maybe_schedule_bh(acb);
+return >common;
 }
 
 void *qemu_blockalign(BlockDriverState *bs, size_t size)
-- 
2.4.3




Re: [Qemu-devel] Reminder: we're now in softfreeze

2015-10-26 Thread Peter Crosthwaite
On Thu, Oct 22, 2015 at 3:30 AM, Peter Maydell  wrote:
> [Apologies for the huge cc list, which is basically "everybody
> I have accepted a pullreq from this release cycle.]
>
> Just a reminder that we're now in softfreeze (ie "no new big
> features, start the process of cutting down towards only bugfixes
> going into master"). Hardfreeze will be on the 12th November,
> in three weeks' time. If submaintainers can aim to get pull
> requests in early rather than all on the 12th that would be
> nice, otherwise we get a big logjam on rc0 day and your
> chances of not getting new-feature code in go up sharply.
>
> For this release I'd like to try tracking known-issues on
> the wiki page: http://wiki.qemu.org/Planning/2.5
> with a list of patch series or bugs which need to be fixed
> into the release. This will hopefully avoid the problem we
> had last time around where I missed patches that should have
> gone into an rc.

Can patches fix this with the right form of query? Maybe another piece
of metadata for patches that are promoted for rc merge.

Regards,
Peter

The idea is that anybody (patch series author,
> bug submitter, submaintainer) can add to the list. Then we
> can annotate it with status (rejected as an rc bug, in a
> submaintainer tree, applied to master, etc).
>
> thanks
> -- PMM
>



[Qemu-devel] [PATCH v3 1/2] aio: Introduce aio_context_setup

2015-10-26 Thread Fam Zheng
This is the place to initialize platform specific bits of AioContext.

Signed-off-by: Fam Zheng 
---
 aio-posix.c |  4 
 aio-win32.c |  4 
 async.c | 13 +++--
 include/block/aio.h |  8 
 4 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index d477033..597cb35 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -297,3 +297,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
 return progress;
 }
+
+void aio_context_setup(AioContext *ctx, Error **errp)
+{
+}
diff --git a/aio-win32.c b/aio-win32.c
index 50a6867..4742af3 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -363,3 +363,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
 aio_context_release(ctx);
 return progress;
 }
+
+void aio_context_setup(AioContext *ctx, Error **errp)
+{
+}
diff --git a/async.c b/async.c
index efce14b..f4b1410 100644
--- a/async.c
+++ b/async.c
@@ -320,12 +320,18 @@ AioContext *aio_context_new(Error **errp)
 {
 int ret;
 AioContext *ctx;
+Error *local_err = NULL;
+
 ctx = (AioContext *) g_source_new(_source_funcs, sizeof(AioContext));
+aio_context_setup(ctx, _err);
+if (local_err) {
+error_propagate(errp, local_err);
+goto fail;
+}
 ret = event_notifier_init(>notifier, false);
 if (ret < 0) {
-g_source_destroy(>source);
 error_setg_errno(errp, -ret, "Failed to initialize event notifier");
-return NULL;
+goto fail;
 }
 g_source_set_can_recurse(>source, true);
 aio_set_event_notifier(ctx, >notifier,
@@ -339,6 +345,9 @@ AioContext *aio_context_new(Error **errp)
 ctx->notify_dummy_bh = aio_bh_new(ctx, notify_dummy_bh, NULL);
 
 return ctx;
+fail:
+g_source_destroy(>source);
+return NULL;
 }
 
 void aio_context_ref(AioContext *ctx)
diff --git a/include/block/aio.h b/include/block/aio.h
index 400b1b0..a6fd5ad 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -373,4 +373,12 @@ static inline void aio_timer_init(AioContext *ctx,
  */
 int64_t aio_compute_timeout(AioContext *ctx);
 
+/**
+ * aio_context_setup:
+ * @ctx: the aio context
+ *
+ * Initialize the aio context.
+ */
+void aio_context_setup(AioContext *ctx, Error **errp);
+
 #endif
-- 
2.4.3




[Qemu-devel] [PATCH v3 2/2] aio: Introduce aio-epoll.c

2015-10-26 Thread Fam Zheng
To minimize code duplication, epoll is hooked into aio-posix's
aio_poll() instead of rolling its own. This approach also has both
compile-time and run-time switchability.

1) When QEMU starts with a small number of fds in the event loop, ppoll
is used.

2) When QEMU starts with a big number of fds, or when more devices are
hot plugged, epoll kicks in when the number of fds hits the threshold.

3) Some fds may not support epoll, such as tty based stdio. In this
case, it falls back to ppoll.

A rough benchmark with scsi-disk on virtio-scsi dataplane (epoll gets
enabled from 64 onward). Numbers are in MB/s.

===
 | master | epoll
 ||
scsi disks # | readrandrw | readrandrw
-||
1| 74  36 | 75  37
8| 74  36 | 73  37
32   | 65  32 | 63  32
64   | 52  25 | 66  32
128  | 42  21 | 54  27
256  | 26  15 | 38  19
===

Signed-off-by: Fam Zheng 
---
 aio-posix.c | 170 +++-
 include/block/aio.h |   6 ++
 2 files changed, 174 insertions(+), 2 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index 597cb35..f259ef9 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -17,6 +17,9 @@
 #include "block/block.h"
 #include "qemu/queue.h"
 #include "qemu/sockets.h"
+#ifdef CONFIG_EPOLL
+#include 
+#endif
 
 struct AioHandler
 {
@@ -28,6 +31,147 @@ struct AioHandler
 QLIST_ENTRY(AioHandler) node;
 };
 
+#ifdef CONFIG_EPOLL
+
+/* The fd number threashold to switch to epoll */
+#define EPOLL_ENABLE_THRESHOLD 64
+
+static void aio_epoll_disable(AioContext *ctx)
+{
+ctx->epoll_available = false;
+if (!ctx->epoll_enabled) {
+return;
+}
+ctx->epoll_enabled = false;
+close(ctx->epollfd);
+}
+
+static inline int epoll_events_from_pfd(int pfd_events)
+{
+return (pfd_events & G_IO_IN ? EPOLLIN : 0) |
+   (pfd_events & G_IO_OUT ? EPOLLOUT : 0) |
+   (pfd_events & G_IO_HUP ? EPOLLHUP : 0) |
+   (pfd_events & G_IO_ERR ? EPOLLERR : 0);
+}
+
+static bool aio_epoll_try_enable(AioContext *ctx)
+{
+AioHandler *node;
+struct epoll_event event;
+
+QLIST_FOREACH(node, >aio_handlers, node) {
+int r;
+if (node->deleted || !node->pfd.events) {
+continue;
+}
+event.events = epoll_events_from_pfd(node->pfd.events);
+event.data.ptr = node;
+r = epoll_ctl(ctx->epollfd, EPOLL_CTL_ADD, node->pfd.fd, );
+if (r) {
+return false;
+}
+}
+ctx->epoll_enabled = true;
+return true;
+}
+
+static void aio_epoll_update(AioContext *ctx, AioHandler *node, bool is_new)
+{
+struct epoll_event event;
+int r;
+
+if (!ctx->epoll_enabled) {
+return;
+}
+if (!node->pfd.events) {
+r = epoll_ctl(ctx->epollfd, EPOLL_CTL_DEL, node->pfd.fd, );
+assert(!r);
+} else {
+event.data.ptr = node;
+event.events = epoll_events_from_pfd(node->pfd.events);
+if (is_new) {
+r = epoll_ctl(ctx->epollfd, EPOLL_CTL_ADD, node->pfd.fd, );
+if (r) {
+aio_epoll_disable(ctx);
+}
+} else {
+r = epoll_ctl(ctx->epollfd, EPOLL_CTL_MOD, node->pfd.fd, );
+assert(!r);
+}
+}
+}
+
+static int aio_epoll(AioContext *ctx, GPollFD *pfds,
+ unsigned npfd, int64_t timeout)
+{
+AioHandler *node;
+int i, ret = 0;
+struct epoll_event events[128];
+
+assert(npfd == 1);
+assert(pfds[0].fd == ctx->epollfd);
+if (timeout > 0) {
+ret = qemu_poll_ns(pfds, npfd, timeout);
+}
+if (timeout <= 0 || ret > 0) {
+ret = epoll_wait(ctx->epollfd, events,
+ sizeof(events) / sizeof(events[0]),
+ timeout);
+if (ret <= 0) {
+goto out;
+}
+for (i = 0; i < ret; i++) {
+int ev = events[i].events;
+node = events[i].data.ptr;
+node->pfd.revents = (ev & EPOLLIN ? G_IO_IN : 0) |
+(ev & EPOLLOUT ? G_IO_OUT : 0) |
+(ev & EPOLLHUP ? G_IO_HUP : 0) |
+(ev & EPOLLERR ? G_IO_ERR : 0);
+}
+}
+out:
+return ret;
+}
+
+static bool aio_epoll_check_poll(AioContext *ctx, GPollFD *pfds,
+ unsigned npfd, int64_t timeout)
+{
+if (!ctx->epoll_available) {
+return false;
+}
+if (ctx->epoll_enabled) {
+return true;
+}
+if (npfd >= EPOLL_ENABLE_THRESHOLD) {
+if (aio_epoll_try_enable(ctx)) {
+return true;
+} else {
+aio_epoll_disable(ctx);
+}
+}
+return false;
+}
+
+#else
+
+static void 

[Qemu-devel] [PATCH v3 0/2] aio: Use epoll in aio_poll()

2015-10-26 Thread Fam Zheng
v3: Remove the redundant check in aio_epoll_try_enable. [Stefan]

v2: Merge aio-epoll.c into aio-posix.c. [Paolo]
Capture some benchmark data in commit log.

This series adds the ability to use epoll in aio_poll() on Linux. It's switched
on in a dynamic way rather than static for two reasons: 1) when the number of
fds is not high enough, using epoll has little advantage; 2) when an epoll
incompatible fd needs to be handled, we need to fall back.  The epoll is
enabled when a fd number threshold is met.



Fam Zheng (2):
  aio: Introduce aio_context_setup
  aio: Introduce aio-epoll.c

 aio-posix.c | 174 +++-
 aio-win32.c |   4 ++
 async.c |  13 +++-
 include/block/aio.h |  14 +
 4 files changed, 201 insertions(+), 4 deletions(-)

-- 
2.4.3




[Qemu-devel] [PATCH v2 0/2] Enforce gaps between DIMMs

2015-10-26 Thread Bharata B Rao
The suggested way to work around the virtio bug reported here

http://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg00522.html

is to introduce gaps between DIMMs. Igor's patchset changes the pc-dimm
auto-address assignment to introduce gaps and ues the same from pc memhp.
This patchset does the same for sPAPR PowerPC.

Before introducing the gap, ensure that memory hotplug region has enough
room for alignment adjustment. We accommodate a max alignment of 256MB for
each slot since sPAPR memory hotplug enforces an alignment requirement of
256MB on RAM size, maxmem and NUMA node mem sizes.

This applies on David's spapr-next branch.

Changes in v2
-
- Minor rewording of patch description and code comment in 1/2.

v1: http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg02414.html
v0: https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg00749.html

Bharata B Rao (2):
  spapr: Accommadate alignment gaps in hotplug memory region
  spapr: Force gaps between DIMM's GPA

 hw/ppc/spapr.c | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

-- 
2.1.0




[Qemu-devel] [PATCH v2 2/2] spapr: Force gaps between DIMM's GPA

2015-10-26 Thread Bharata B Rao
Mapping DIMMs non contiguously allows to workaround virtio bug reported
earlier:
http://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg00522.html
In this case guest kernel doesn't allocate buffers that can cross DIMM
boundary keeping each buffer local to a DIMM.

Suggested-by: Michael S. Tsirkin 
Signed-off-by: Bharata B Rao 
Reviewed-by: Igor Mammedov 
---
 hw/ppc/spapr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index f29bb10..1a8df95 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2170,7 +2170,7 @@ static void spapr_memory_plug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
 goto out;
 }
 
-pc_dimm_memory_plug(dev, >hotplug_memory, mr, align, false, 
_err);
+pc_dimm_memory_plug(dev, >hotplug_memory, mr, align, true, _err);
 if (local_err) {
 goto out;
 }
-- 
2.1.0




Re: [Qemu-devel] [PATCH] pc: memhp: enforce minimal 128Mb alignment for pc-dimm

2015-10-26 Thread Igor Mammedov
On Mon, 26 Oct 2015 11:02:10 +0200
"Michael S. Tsirkin"  wrote:

> On Mon, Oct 26, 2015 at 09:42:05AM +0100, Igor Mammedov wrote:
> > commit aa8580cd "pc: memhp: force gaps between DIMM's GPA"
> > regressed memory hot-unplug for linux guests triggering
> > following BUGON
> >  =
> >  kernel BUG at mm/memory_hotplug.c:703!
> 
> This is in portable code. Does this imply anyone implementing
> inter dimm gaps will need the same value?
> Shouldn't this go into portable code then?
yep, but PAGE_SECTION_MASK => secstion size is not portable
(i.e. it's per target define)


> 
> >  ...
> >  [] acpi_memory_device_remove+0x79/0xa5
> >  [] acpi_bus_trim+0x5a/0x8d
> >  [] acpi_device_hotplug+0x1b7/0x418
> >  ===
> > BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
> >  ===
> > 
> > reson for it is that x86-64 linux guest supports memory
> > hotplug in chunks of 128Mb and memory section also should
> > be 128Mb aligned.
> > However gaps forced between 128Mb DIMMs with backend's
> > natural alignment of 2Mb make the 2nd and following
> > DIMMs not being aligned on 128Mb boundary as it was
> > originally. To fix regression enforce minimal 128Mb
> > alignment like it was done for PPC.
> > 
> > Signed-off-by: Igor Mammedov 
> 
> 
> Thanks for the fix. Pls see comments below.
> 
> > ---
> >  hw/i386/pc.c | 5 +
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > index 3d958ba..cd68169 100644
> > --- a/hw/i386/pc.c
> > +++ b/hw/i386/pc.c
> > @@ -1610,6 +1610,8 @@ void ioapic_init_gsi(GSIState *gsi_state, const char 
> > *parent_name)
> >  }
> >  }
> >  
> > +#define MIN_DIMM_ALIGNMENT (1ULL << 27) /* 128Mb */
> > +
> 
> Pls prefix with PC_ and pls add a comment explaining where does this
> value come from.
sure

> 
> >  static void pc_dimm_plug(HotplugHandler *hotplug_dev,
> >   DeviceState *dev, Error **errp)
> >  {
> > @@ -1624,6 +1626,9 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
> >  
> >  if (memory_region_get_alignment(mr) && pcms->enforce_aligned_dimm) {
> >  align = memory_region_get_alignment(mr);
> > +if (pcmc->inter_dimm_gap && (align < MIN_DIMM_ALIGNMENT)) {
> 
> () not required around math.
> 
> > +align = MIN_DIMM_ALIGNMENT;
> > +}
> 
> This seems wrong. Why is alignment only required when inter_dimm_gap
> is set? Does this have to do with compatibility somehow? Pls add a comment.
indeed, it's keyed on inter_dimm_gap for compatibility reasons.
and since inter_dimm_gap introduced layout change it should be ok
to make fix also depend on inter_dimm_gap and not to touch previous machine 
types.

I'll respin v2.

> 
> >  }
> >  
> >  if (!pcms->acpi_dev) {
> > -- 
> > 1.8.3.1




Re: [Qemu-devel] [PATCH] migration: fix analyze-migration.py script

2015-10-26 Thread Mark Cave-Ayland
On 06/09/15 12:54, Mark Cave-Ayland wrote:

> On 06/09/15 09:36, Alexander Graf wrote:
> 
>> On 05.09.15 21:51, Mark Cave-Ayland wrote:
>>> Commit 61964 "Add configuration section" broke the analyze-migration.py 
>>> script
>>> which terminates due to the unrecognised section. Fix the script by parsing
>>> the contents of the configuration section directly into a new
>>> ConfigurationSection object (although nothing is done with it yet).
>>>
>>> Signed-off-by: Mark Cave-Ayland 
>>> ---
>>>  scripts/analyze-migration.py |   13 +
>>>  1 file changed, 13 insertions(+)
>>>
>>> diff --git a/scripts/analyze-migration.py b/scripts/analyze-migration.py
>>> index f6894be..1455387 100755
>>> --- a/scripts/analyze-migration.py
>>> +++ b/scripts/analyze-migration.py
>>> @@ -252,6 +252,15 @@ class HTABSection(object):
>>>  def getDict(self):
>>>  return ""
>>>  
>>> +
>>> +class ConfigurationSection(object):
>>> +def __init__(self, file):
>>> +self.file = file
>>> +
>>> +def read(self):
>>> +name_len = self.file.read32()
>>> +name = self.file.readstr(len = name_len)
>>> +
>>>  class VMSDFieldGeneric(object):
>>>  def __init__(self, desc, file):
>>>  self.file = file
>>> @@ -474,6 +483,7 @@ class MigrationDump(object):
>>>  QEMU_VM_SECTION_FULL  = 0x04
>>>  QEMU_VM_SUBSECTION= 0x05
>>>  QEMU_VM_VMDESCRIPTION = 0x06
>>> +QEMU_VM_CONFIGURATION = 0x07
>>>  QEMU_VM_SECTION_FOOTER= 0x7e
>>>  
>>>  def __init__(self, filename):
>>> @@ -514,6 +524,9 @@ class MigrationDump(object):
>>>  section_type = file.read8()
>>>  if section_type == self.QEMU_VM_EOF:
>>>  break
>>> +elif section_type == self.QEMU_VM_CONFIGURATION:
>>> +section = ConfigurationSection(file)
>>> +section.read()
>>
>> So since we don't have a normal section header, there is no version
>> field either. That in turn means that the format is determined by the
>> machine version only - bleks.
> 
> Yes :(  I double-checked the output of a migration file with hexdump and
> confirmed that just the raw fields are included without any additional
> metadata, even though the state is held in a VMStateDescription.
> 
>> So if there ever has to be more in the configuration section than the
>> machine name, please move to a more detectable scheme. Ideally something
>> that contains
>>
>>   * version
>>   * length of dynamically sized fields
>>   * lenght of full blob
>>
>> would be ideal, so that we have a chance to at least put code into the
>> analyze script to examine it.
>>
>> For now, I think the hard coded solution in the analyze script is
>> reasonable.
>>
>> However, I think we should print out the name if we find it. It should
>> be as simple as adding a special case for the configuration section in
>> MigrationDump.getDict().
> 
> I did play with adding a separate JSON object for configuration but was
> torn between whether configuration should have its own JSON object
> (better if we include extra fields and metadata as above) or to just
> embed it as a simple "machine" property similar to "page_size". I'll
> wait until we hear back from David/Juan and submit a v2 accordingly.

Ping again from Juan/David? The analyze-migration.py script is currently
broken without this patch (or another equivalent) applied.


ATB,

Mark.




[Qemu-devel] [RFC PATCH v1] spapr: Memory hot-unplug support

2015-10-26 Thread Bharata B Rao
Add support to hot remove pc-dimm memory devices.

TODO: In response to memory hot removal operation on a DIMM device,
guest kernel might refuse to offline a few LMBs that are part of that device.
In such cases, we will have a DIMM device that has some LMBs online and some
LMBs offline. To avoid this situation, drmgr could be enhanced to support
a command line option that results in removal of all the LMBs or none.

Signed-off-by: Bharata B Rao 
---
Changes in v1:
- Got rid of the patch that introduced a field in PCDIMMDevice to track
  DIMM marked for removal since we can track that using within DRC
  object.
- Removed the patch that added return value to rtas_set_indicator()
  since the required changes are already pushed by Michael Roth.

v0:

 hw/ppc/spapr.c | 90 +-
 hw/ppc/spapr_drc.c | 18 +++
 2 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index e1202ce..f5b1ac2 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2174,6 +2174,85 @@ out:
 error_propagate(errp, local_err);
 }
 
+typedef struct sPAPRDIMMState {
+uint32_t nr_lmbs;
+} sPAPRDIMMState;
+
+static void spapr_lmb_release(DeviceState *dev, void *opaque)
+{
+sPAPRDIMMState *ds = (sPAPRDIMMState *)opaque;
+HotplugHandler *hotplug_ctrl = NULL;
+Error *local_err = NULL;
+
+if (--ds->nr_lmbs) {
+return;
+}
+
+g_free(ds);
+
+/*
+ * Now that all the LMBs have been removed by the guest, call the
+ * pc-dimm unplug handler to cleanup up the pc-dimm device.
+ */
+hotplug_ctrl = qdev_get_hotplug_handler(dev);
+hotplug_handler_unplug(hotplug_ctrl, dev, _err);
+}
+
+static void spapr_del_lmbs(DeviceState *dev, uint64_t addr, uint64_t size,
+   Error **errp)
+{
+sPAPRDRConnector *drc;
+sPAPRDRConnectorClass *drck;
+uint32_t nr_lmbs = size/SPAPR_MEMORY_BLOCK_SIZE;
+Error *local_err = NULL;
+int i;
+sPAPRDIMMState *ds = g_malloc0(sizeof(sPAPRDIMMState));
+
+ds->nr_lmbs = nr_lmbs;
+for (i = 0; i < nr_lmbs; i++) {
+drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
+addr/SPAPR_MEMORY_BLOCK_SIZE);
+g_assert(drc);
+
+drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+drck->detach(drc, dev, spapr_lmb_release, ds, _err);
+addr += SPAPR_MEMORY_BLOCK_SIZE;
+}
+spapr_hotplug_req_remove_by_count(SPAPR_DR_CONNECTOR_TYPE_LMB, nr_lmbs);
+}
+
+static void spapr_memory_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
+Error **errp)
+{
+sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
+PCDIMMDevice *dimm = PC_DIMM(dev);
+PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
+MemoryRegion *mr = ddc->get_memory_region(dimm);
+
+pc_dimm_memory_unplug(dev, >hotplug_memory, mr);
+object_unparent(OBJECT(dev));
+}
+
+static void spapr_memory_unplug_request(HotplugHandler *hotplug_dev,
+DeviceState *dev, Error **errp)
+{
+Error *local_err = NULL;
+PCDIMMDevice *dimm = PC_DIMM(dev);
+PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
+MemoryRegion *mr = ddc->get_memory_region(dimm);
+uint64_t size = memory_region_size(mr);
+uint64_t addr;
+
+addr = object_property_get_int(OBJECT(dimm), PC_DIMM_ADDR_PROP, 
_err);
+if (local_err) {
+goto out;
+}
+
+spapr_del_lmbs(dev, addr, size, _err);
+out:
+error_propagate(errp, local_err);
+}
+
 static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
   DeviceState *dev, Error **errp)
 {
@@ -2221,7 +2300,15 @@ static void spapr_machine_device_unplug(HotplugHandler 
*hotplug_dev,
   DeviceState *dev, Error **errp)
 {
 if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
-error_setg(errp, "Memory hot unplug not supported by sPAPR");
+spapr_memory_unplug(hotplug_dev, dev, errp);
+}
+}
+
+static void spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
+DeviceState *dev, Error **errp)
+{
+if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+spapr_memory_unplug_request(hotplug_dev, dev, errp);
 }
 }
 
@@ -2263,6 +2350,7 @@ static void spapr_machine_class_init(ObjectClass *oc, 
void *data)
 hc->plug = spapr_machine_device_plug;
 hc->unplug = spapr_machine_device_unplug;
 mc->cpu_index_to_socket_id = spapr_cpu_index_to_socket_id;
+hc->unplug_request = spapr_machine_device_unplug_request;
 
 smc->dr_lmb_enabled = false;
 fwc->get_dev_path = spapr_get_fw_dev_path;
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 5d6ea7c..59b6ea9 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -11,6 +11,7 @@
  */
 
 #include "hw/ppc/spapr_drc.h"
+#include "hw/ppc/spapr.h"
 

[Qemu-devel] [kvm-unit-tests PATCH] x86: hyperv_synic: Hyper-V SynIC test

2015-10-26 Thread Andrey Smetanin
Hyper-V SynIC is a Hyper-V synthetic interrupt controller.

The test runs on every vCPU and performs the following steps:
* read from all Hyper-V SynIC MSR's
* setup Hyper-V SynIC evt/msg pages
* setup SINT's routing
* inject SINT's into destination vCPU by 'hyperv-synic-test-device'
* wait for SINT's isr's completion
* clear Hyper-V SynIC evt/msg pages and destroy SINT's routing

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Vitaly Kuznetsov 
CC: "K. Y. Srinivasan" 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: qemu-devel@nongnu.org
CC: virtualizat...@lists.linux-foundation.org
---
 config/config-x86-common.mak |   5 +-
 lib/x86/msr.h|  23 +
 x86/hyperv_synic.c   | 229 +++
 x86/run  |  10 +-
 x86/unittests.cfg|   5 +
 5 files changed, 270 insertions(+), 2 deletions(-)
 create mode 100644 x86/hyperv_synic.c

diff --git a/config/config-x86-common.mak b/config/config-x86-common.mak
index c2f9908..dc80eaa 100644
--- a/config/config-x86-common.mak
+++ b/config/config-x86-common.mak
@@ -36,7 +36,8 @@ tests-common = $(TEST_DIR)/vmexit.flat $(TEST_DIR)/tsc.flat \
$(TEST_DIR)/kvmclock_test.flat  $(TEST_DIR)/eventinj.flat \
$(TEST_DIR)/s3.flat $(TEST_DIR)/pmu.flat \
$(TEST_DIR)/tsc_adjust.flat $(TEST_DIR)/asyncpf.flat \
-   $(TEST_DIR)/init.flat $(TEST_DIR)/smap.flat
+   $(TEST_DIR)/init.flat $(TEST_DIR)/smap.flat \
+   $(TEST_DIR)/hyperv_synic.flat
 
 ifdef API
 tests-common += api/api-sample
@@ -108,6 +109,8 @@ $(TEST_DIR)/vmx.elf: $(cstart.o) $(TEST_DIR)/vmx.o 
$(TEST_DIR)/vmx_tests.o
 
 $(TEST_DIR)/debug.elf: $(cstart.o) $(TEST_DIR)/debug.o
 
+$(TEST_DIR)/hyperv_synic.elf: $(cstart.o) $(TEST_DIR)/hyperv_synic.o
+
 arch_clean:
$(RM) $(TEST_DIR)/*.o $(TEST_DIR)/*.flat $(TEST_DIR)/*.elf \
$(TEST_DIR)/.*.d lib/x86/.*.d
diff --git a/lib/x86/msr.h b/lib/x86/msr.h
index 281255a..54da420 100644
--- a/lib/x86/msr.h
+++ b/lib/x86/msr.h
@@ -408,4 +408,27 @@
 #define MSR_VM_IGNNE0xc0010115
 #define MSR_VM_HSAVE_PA 0xc0010117
 
+/* Define synthetic interrupt controller model specific registers. */
+#define HV_X64_MSR_SCONTROL 0x4080
+#define HV_X64_MSR_SVERSION 0x4081
+#define HV_X64_MSR_SIEFP0x4082
+#define HV_X64_MSR_SIMP 0x4083
+#define HV_X64_MSR_EOM  0x4084
+#define HV_X64_MSR_SINT00x4090
+#define HV_X64_MSR_SINT10x4091
+#define HV_X64_MSR_SINT20x4092
+#define HV_X64_MSR_SINT30x4093
+#define HV_X64_MSR_SINT40x4094
+#define HV_X64_MSR_SINT50x4095
+#define HV_X64_MSR_SINT60x4096
+#define HV_X64_MSR_SINT70x4097
+#define HV_X64_MSR_SINT80x4098
+#define HV_X64_MSR_SINT90x4099
+#define HV_X64_MSR_SINT10   0x409A
+#define HV_X64_MSR_SINT11   0x409B
+#define HV_X64_MSR_SINT12   0x409C
+#define HV_X64_MSR_SINT13   0x409D
+#define HV_X64_MSR_SINT14   0x409E
+#define HV_X64_MSR_SINT15   0x409F
+
 #endif /* _ASM_X86_MSR_INDEX_H */
diff --git a/x86/hyperv_synic.c b/x86/hyperv_synic.c
new file mode 100644
index 000..5c5a43a
--- /dev/null
+++ b/x86/hyperv_synic.c
@@ -0,0 +1,229 @@
+#include "libcflat.h"
+#include "processor.h"
+#include "msr.h"
+#include "isr.h"
+#include "vm.h"
+#include "apic.h"
+#include "desc.h"
+#include "io.h"
+#include "smp.h"
+#include "atomic.h"
+
+#define MAX_CPUS 4
+#define HYPERV_CPUID_FEATURES   0x4003
+#define HV_X64_MSR_SYNIC_AVAILABLE  (1 << 2)
+#define HV_SYNIC_CONTROL_ENABLE (1ULL << 0)
+#define HV_SYNIC_SIMP_ENABLE(1ULL << 0)
+#define HV_SYNIC_SIEFP_ENABLE   (1ULL << 0)
+#define HV_SYNIC_SINT_MASKED(1ULL << 16)
+#define HV_SYNIC_SINT_AUTO_EOI  (1ULL << 17)
+#define HV_SYNIC_SINT_VECTOR_MASK   (0xFF)
+#define HV_SYNIC_SINT_COUNT 16
+
+enum {
+HV_TEST_DEV_SINT_ROUTE_CREATE = 1,
+HV_TEST_DEV_SINT_ROUTE_DESTROY,
+HV_TEST_DEV_SINT_ROUTE_SET_SINT
+};
+
+static atomic_t isr_enter_count[MAX_CPUS];
+static atomic_t cpus_comp_count;
+
+static bool synic_supported(void)
+{
+   return 

Re: [Qemu-devel] [PULL v4 00/52] Ivshmem patches

2015-10-26 Thread Marc-André Lureau
Hi

On Mon, Oct 26, 2015 at 10:58 AM, Peter Maydell
 wrote:
> This will print a cryptic error message and then not fail the test,
> which is not great. Maybe that's ok for the moment in the interests
> of not keeping this huge patchset out of tree for too long[*], but
> we should look at what glib's test framework provides in the way
> of being able to report "skipped this test" outcomes.

g_test_skip() is since 2.38 (and can't be added in compat, because it
uses internal variable etc)

Furthermore, the shm error is a precondition for all the tests, it
doesn't fit well with g_test_skip() which is inside the individual
unit tests.

> [*] Incidentally this whole saga demonstrates why my general
> recommendation is to keep pull requests at much less than
> 50 patches...
>
>> I rebased and updated the tag.
>
> If you mean by this "please retry the pull" you should send a fresh
> coverletter email so my scripts will pick it up...

ok




-- 
Marc-André Lureau



Re: [Qemu-devel] [PATCH v3 9/9] kvm/x86: Hyper-V kvm exit

2015-10-26 Thread Denis V. Lunev

On 10/22/2015 07:34 PM, Paolo Bonzini wrote:


On 22/10/2015 18:10, Andrey Smetanin wrote:

A new vcpu exit is introduced to notify the userspace of the
changes in Hyper-V SynIC configuration triggered by guest writing to the
corresponding MSRs.

Changes v3:
* added KVM_EXIT_HYPERV types and structs notes into docs

Thanks.  The changes look good.  I look forward to the unit tests so I
can merge it.

Paolo


sent.

Den



Re: [Qemu-devel] [PATCH] virtio: Notice when the system doesn't support MSIx at all

2015-10-26 Thread Peter Maydell
On 26 October 2015 at 10:09, Mark Cave-Ayland
 wrote:
> On 26/10/15 10:03, Michael S. Tsirkin wrote:
>> Pls address the issues if you want this applied.
>
> The patch came from Richard rather than myself - my only involvement has
> been to champion the patch since both myself and Peter have had reports
> of this message confusing users.

Isn't this patch already in master as commit 0d583647a7fc73f6 ?

thanks
-- PMM



Re: [Qemu-devel] [PATCH] ui/egl: Reduce required libraries to build with EGL support

2015-10-26 Thread OGAWA Hirofumi
Gerd Hoffmann  writes:

> On Sa, 2015-10-24 at 20:51 +0900, OGAWA Hirofumi wrote:
>> To support EGL (sdl2-gl, gtk-egl, egl-helpers, etc.), we don't need to
>> install "gl" or "glesv" packages. (Those are used only for milkymist-tmu2).
>
> They are needed and used.  Maybe it works if you don't explicitly link
> them because epoxy brings them in indirectly then.

Hm, I meant, we don't need to explicitly install those development
packages. I.e. epoxy handling those dynamically.

With epoxy, it removes dependency of development stuff (e.g. headers,
(no versioned) *.so), so no need check those by pkg-config? IOW, need
runtime stuff only.

Thanks.
-- 
OGAWA Hirofumi 



Re: [Qemu-devel] [Qemu-block] [PATCH] block/nfs: add support for setting debug level

2015-10-26 Thread Stefan Hajnoczi
On Thu, Oct 22, 2015 at 08:37:19AM +0200, Peter Lieven wrote:
> Am 22.09.2015 um 08:13 schrieb Peter Lieven:
> >Am 25.06.2015 um 15:18 schrieb Stefan Hajnoczi:
> >>On Tue, Jun 23, 2015 at 10:12:15AM +0200, Peter Lieven wrote:
> >>>upcoming libnfs versions will support logging debug messages. Add
> >>>support for it in qemu through an URL parameter.
> >>>
> >>>Signed-off-by: Peter Lieven 
> >>>---
> >>>  block/nfs.c | 4 
> >>>  1 file changed, 4 insertions(+)
> >>>
> >>>diff --git a/block/nfs.c b/block/nfs.c
> >>>index ca9e24e..f7388a3 100644
> >>>--- a/block/nfs.c
> >>>+++ b/block/nfs.c
> >>>@@ -329,6 +329,10 @@ static int64_t nfs_client_open(NFSClient *client, 
> >>>const char *filename,
> >>>  } else if (!strcmp(qp->p[i].name, "readahead")) {
> >>>  nfs_set_readahead(client->context, val);
> >>>  #endif
> >>>+#ifdef LIBNFS_FEATURE_DEBUG
> >>>+} else if (!strcmp(qp->p[i].name, "debug")) {
> >>>+nfs_set_debug(client->context, val);
> >>>+#endif
> >>>  } else {
> >>>  error_setg(errp, "Unknown NFS parameter name: %s",
> >>> qp->p[i].name);
> >>Untrusted users may be able to set these options since they are encoded
> >>in the URI.  I'm imagining a hosting or cloud scenario like OpenStack.
> >>
> >>A verbose debug level spams stderr and could consume a lot of disk
> >>space.
> >>
> >>(The uid and gid options are probably okay since the NFS server cannot
> >>trust the uid/gid coming from QEMU anyway.)
> >>
> >>I think we can merge this patch for QEMU 2.4 but I'd like to have a
> >>discussion about the security risk of encoding libnfs options in the
> >>URI.
> >>
> >>CCed Eric Blake in case libvirt is affected.
> >>
> >>Has anyone thought about this and what are the rules?
> >
> >As I hadn't time to work further on the best way to add options for NFS (and 
> >other
> >protocols), would it be feasible to allow passing debug as an URL parameter, 
> >but
> >limit the maximum debug level to limit a possible security impact (flooding 
> >logs)?
> >
> >If a higher debug level is needed it can be set via device specific options 
> >as soon
> >there is a common scheme for them.
> 
> Any objections?

If you are sure that ERROR and WARN levels (or similar) don't flood the
logs, then it sounds like a solution.

Stefan



Re: [Qemu-devel] [PATCH v2 2/2] xen-platform: Replace assert() with appropriate error reporting

2015-10-26 Thread Stefano Stabellini
On Wed, 21 Oct 2015, Eduardo Habkost wrote:
> Commit dbb7405d8caad0814ceddd568cb49f163a847561 made it possible to
> trigger an assert using "-device xen-platform". Replace it with
> appropriate error reporting.
> 
> Before:
> 
>   $ qemu-system-x86_64 -device xen-platform
>   qemu-system-x86_64: hw/i386/xen/xen_platform.c:391: xen_platform_initfn: 
> Assertion `xen_enabled()' failed.
>   Aborted (core dumped)
>   $
> 
> After:
> 
>   $ qemu-system-x86_64 -device xen-platform
>   qemu-system-x86_64: -device xen-platform: xen-platform device requires the 
> Xen accelerator
>   $
> 
> Signed-off-by: Eduardo Habkost 

Applied to my tree


> Changes v1 -> v2:*
> * Use error_setg() instead of error_report()
>   * Suggested-by: Paolo Bonzini 
> ---
>  hw/i386/xen/xen_platform.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
> index 3dc68cb..de83f4e 100644
> --- a/hw/i386/xen/xen_platform.c
> +++ b/hw/i386/xen/xen_platform.c
> @@ -33,6 +33,7 @@
>  #include "trace.h"
>  #include "exec/address-spaces.h"
>  #include "sysemu/block-backend.h"
> +#include "qemu/error-report.h"
>  
>  #include 
>  
> @@ -388,7 +389,10 @@ static void xen_platform_realize(PCIDevice *dev, Error 
> **errp)
>  uint8_t *pci_conf;
>  
>  /* Device will crash on reset if xen is not initialized */
> -assert(xen_enabled());
> +if (!xen_enabled()) {
> +error_setg(errp, "xen-platform device requires the Xen accelerator");
> +return;
> +}
>  
>  pci_conf = dev->config;
>  
> -- 
> 2.1.0
> 



Re: [Qemu-devel] [Qemu-block] [PATCH 0/2] blkverify: Fix BDS leak in .bdrv_open error path

2015-10-26 Thread Stefan Hajnoczi
On Tue, Oct 13, 2015 at 02:17:38PM +0200, Kevin Wolf wrote:
> Kevin Wolf (2):
>   block: Allow bdrv_unref_child(bs, NULL)
>   blkverify: Fix BDS leak in .bdrv_open error path
> 
>  block.c   | 7 ++-
>  block/blkverify.c | 3 +++
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 
> -- 
> 1.8.3.1
> 
> 

Reviewed-by: Stefan Hajnoczi 



Re: [Qemu-devel] [PATCH] tests: re-enable vhost-user-test

2015-10-26 Thread Marc-André Lureau
Hi

On Sat, Oct 24, 2015 at 9:44 PM, Michael S. Tsirkin  wrote:
> On Thu, Oct 15, 2015 at 04:39:25PM +0200, marcandre.lur...@redhat.com wrote:
>> From: Marc-André Lureau 
>>
>> Commit 7fe34ca9c2e actually disabled vhost-user-test altogether,
>> since CONFIG_VHOST_NET is a per-target config variable.
>>
>> tests/vhost-user-test is already x86/64 softmmu specific test, in order
>> to enable it correctly, kvm & vhost-net are also conditions. To check
>> that, set CONFIG_VHOST_NET_TEST when kvm is also enabled.
>>
>> Signed-off-by: Marc-André Lureau 
>
> I had to drop this, this still seems to break on some configs.
> Pls work to fix this up.

I am not sure I understand completely the issue that Peter is having
on arm. I suppose my arm VM doesn't have kvm, and fails to reproduce
it. What probably happens is that CONFIG_VHOST_NET_TEST is enabled
because  "$target_name" = "$cpu" for $target_name = aarch64, then the
test is trying to run with the qemu-system-i386 binary, but that one
doesn't have vhost-net. We would probably need something like that
(pseudo-code, I failed to express this with Makefile):

@@ -5652,6 +5654,7 @@ case "$target_name" in
   echo "CONFIG_KVM=y" >> $config_target_mak
   if test "$vhost_net" = "yes" ; then
 echo "CONFIG_VHOST_NET=y" >> $config_target_mak
+echo "CONFIG_VHOST_NET_TEST+=$cpu" >> $config_host_mak
   fi
 fi
 esac
diff --git a/tests/Makefile b/tests/Makefile
index 9341498..40fd02a 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -192,9 +192,8 @@ gcov-files-i386-y += hw/usb/hcd-xhci.c
 check-qtest-i386-y += tests/pc-cpu-test$(EXESUF)
 check-qtest-i386-y += tests/q35-test$(EXESUF)
 gcov-files-i386-y += hw/pci-host/q35.c
-ifeq ($(CONFIG_VHOST_NET),y)
-check-qtest-i386-$(CONFIG_LINUX) += tests/vhost-user-test$(EXESUF)
-endif
+# foreach CPU in CONFIG_VHOST_NET_TEST
+# check-qtest-$(CPU)-y += tests/vhost-user-test$(EXESUF)

I don't feel very confortable with that sort of per-host/per-target
complex configure-time conditions. I would rather simply use a simple
runtime test check such as:

+static bool check_qemu_support(const char *qemu_cmd)
+{
+GError *err = NULL;
+gchar *cmd = g_strdup_printf("%s %s -display none -monitor stdio",
+ getenv("QTEST_QEMU_BINARY"), qemu_cmd);
+gchar **argv;
+GPid pid;
+int output, outerr, hmp, status = -1;
+
+g_shell_parse_argv(cmd, NULL, , );
+g_assert_no_error(err);
+
+g_spawn_async_with_pipes(NULL, argv, NULL, G_SPAWN_DO_NOT_REAP_CHILD,
+ NULL, NULL, , , , , );
+g_assert_no_error(err);
+
+write(hmp, "q\n", 2);
+waitpid(pid, , 0);
+
+g_strfreev(argv);
+close(hmp);
+close(output);
+close(outerr);
+g_spawn_close_pid(pid);
+
+return status == 0;
+}
+
@@ -602,6 +630,9 @@ int main(int argc, char **argv)
 g_thread_new(NULL, thread_function, NULL);

 qemu_cmd = GET_QEMU_CMD(server);
+if (!check_qemu_support(qemu_cmd)) {
+goto cleanup;
+}

any help appreciated

-- 
Marc-André Lureau



Re: [Qemu-devel] [PATCH v2] target-*: Advance pc after recognizing a breakpoint

2015-10-26 Thread Peter Maydell
On 23 October 2015 at 23:00, Richard Henderson  wrote:
> Some targets already had this within their logic, but make sure
> it's present for all targets.
>
> Signed-off-by: Richard Henderson 
> ---
> Version 2 updates the language as discussed in the followup in v1.
>
> Peter, in that followup you mentioned that we ought to just use +1
> for all targets.  I was about to do that when I noticed the comment
> in the arm32 port about it being better to use the correct insn size
> in order to avoid warnings from the disassembler.  So I left my good
> estimates as-is from v1.

Reviewed-by: Peter Maydell 

though really we should shut up the disassembler some other way
somehow..."pick the insn length" doesn't work for variable-insn-length
CPUs.

thanks
-- PMM



Re: [Qemu-devel] Qemu.git Build Failed

2015-10-26 Thread Zhang, Xianda

I'd like to know whether you have fixed the problem and  the commit id. Thank 
you.

-Original Message-
From: Peter Maydell [mailto:peter.mayd...@linaro.org] 
Sent: Friday, October 23, 2015 5:23 PM
To: Zhang, Xianda 
Cc: qemu-devel@nongnu.org; Hu, Robert 
Subject: Re: [Qemu-devel] Qemu.git Build Failed

On 23 October 2015 at 06:39, Zhang, Xianda  wrote:
> Steps:
>
> ./configure --target-list=x86_64-softmmu
>
> make -j20
>
>
>
> Build log is as follows:
>
> /home/build/gitrepo/qemu/hw/virtio/vhost-user.c: In function
> ‘vhost_set_log_base’:
>
> /home/build/gitrepo/qemu/hw/virtio/vhost-user.c:374: error: unknown 
> field ‘u64’ specified in initializer
>
> make[1]: *** [hw/virtio/vhost-user.o] Error 1

Thanks for the report -- there's a patch on the list to fix this which I expect 
should hit master later today.

-- PMM


Re: [Qemu-devel] [PATCH v4 06/13] target-arm: Add computation of starting level for S2 PTW

2015-10-26 Thread Edgar E. Iglesias
On Fri, Oct 23, 2015 at 05:26:52PM +0100, Peter Maydell wrote:
> On 14 October 2015 at 23:55, Edgar E. Iglesias  
> wrote:
> > From: "Edgar E. Iglesias" 
> >
> > The starting level for S2 pagetable walks is computed
> > differently from the S1 starting level. Implement the S2
> > variant.
> >
> > Signed-off-by: Edgar E. Iglesias 
> > ---
> >  target-arm/helper.c| 117 
> > +++--
> >  target-arm/internals.h |  25 +++
> >  2 files changed, 129 insertions(+), 13 deletions(-)
> >
> > diff --git a/target-arm/helper.c b/target-arm/helper.c
> > index 79b4c03..8530f7e 100644
> > --- a/target-arm/helper.c
> > +++ b/target-arm/helper.c
> > @@ -6406,12 +6406,72 @@ typedef enum {
> >  permission_fault = 3,
> >  } MMUFaultType;
> >
> > +/*
> > + * check_s2_startlevel
> > + * @cpu:ARMCPU
> > + * @is_aa64:True if the translation regime is in AArch64 state
> > + * @startlevel: Suggested starting level
> > + * @inputsize:  Bitsize of IPAs
> > + * @stride: Page-table stride (See the ARM ARM)
> > + *
> > + * Returns true if the suggested starting level is OK and false otherwise.
> > + */
> > +static bool check_s2_startlevel(ARMCPU *cpu, bool is_aa64, int startlevel,
> > +int inputsize, int stride)
> > +{
> > +/* Negative levels are never allowed.  */
> > +if (startlevel < 0) {
> > +return false;
> > +}
> > +
> > +if (is_aa64) {
> > +unsigned int pamax = arm_pamax(cpu);
> > +
> > +switch (stride) {
> > +case 13: /* 64KB Pages.  */
> > +if (startlevel < 1 || (startlevel == 0 && pamax <= 42)) {
> > +return false;
> > +}
> 
> I'm having trouble matching these up with the ARM ARM pseudocode,
> which says for this case for instance
>if (level == 0 || (level == 1 && PAMax() <= 42)) then basefound = FALSE;
> 
> (as an aside, the pseudocode uses 'startlevel' for the raw SL0
> field value and 'level' for the 3 - startlevel (or 2 - startlevel)
> value, so it's a bit confusing to use startlevel for both here.)
> 
> > +break;
> > +case 11: /* 16KB Pages.  */
> > +if (startlevel < 1 || (startlevel == 0 && pamax <= 40)) {
> > +return false;
> > +}
> > +break;
> > +case 9: /* 4KB Pages.  */
> > +if (startlevel == 0 && pamax <= 42) {
> > +return false;
> > +}
> > +break;
> > +default:
> > +g_assert_not_reached();
> > +}
> > +} else {
> > +const int grainsize = stride + 3;
> > +int startsizecheck;
> > +
> > +/* AArch32 only supports 4KB pages. Assert on that.  */
> > +assert(stride == 9);
> > +
> > +if (startlevel == 0) {
> > +return false;
> > +}
> > +
> > +startsizecheck = inputsize - ((3 - startlevel) * stride + 
> > grainsize);
> > +if (startsizecheck < 1 || startsizecheck > stride + 4) {
> > +return false;
> > +}
> > +}
> > +return true;
> > +}
> > +
> >  static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
> > int access_type, ARMMMUIdx mmu_idx,
> > hwaddr *phys_ptr, MemTxAttrs *txattrs, int 
> > *prot,
> > target_ulong *page_size_ptr, uint32_t *fsr)
> >  {
> > -CPUState *cs = CPU(arm_env_get_cpu(env));
> > +ARMCPU *cpu = arm_env_get_cpu(env);
> > +CPUState *cs = CPU(cpu);
> >  /* Read an LPAE long-descriptor translation table. */
> >  MMUFaultType fault_type = translation_fault;
> >  uint32_t level = 1;
> > @@ -6560,18 +6620,49 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
> > target_ulong address,
> >  goto do_fault;
> >  }
> >
> > -/* The starting level depends on the virtual address size (which can be
> > - * up to 48 bits) and the translation granule size. It indicates the 
> > number
> > - * of strides (stride bits at a time) needed to consume the bits
> > - * of the input address. In the pseudocode this is:
> > - *  level = 4 - RoundUp((inputsize - grainsize) / stride)
> > - * where their 'inputsize' is our 'inputsize', 'grainsize' is
> > - * our 'stride + 3' and 'stride' is our 'stride'.
> > - * Applying the usual "rounded up m/n is (m+n-1)/n" and simplifying:
> > - * = 4 - (inputsize - stride - 3 + stride - 1) / stride
> > - * = 4 - (inputsize - 4) / stride;
> > - */
> > -level = 4 - (inputsize - 4) / stride;
> > +if (mmu_idx != ARMMMUIdx_S2NS) {
> > +/* The starting level depends on the virtual address size (which 
> > can
> > + * be up to 48 bits) and the translation granule size. It indicates
> > + * the number of strides (stride bits at a time) 

[Qemu-devel] [PATCH 6/7] target-i386/hyperv: Hyper-V SynIC SINT routing and vCPU exit

2015-10-26 Thread Andrey Smetanin
Hyper-V SynIC(synthetic interrupt controller) API for
irq routing setup, irq injection, irq ack notifications and
event/message pages changes tracking for future use.

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Vitaly Kuznetsov 
CC: "K. Y. Srinivasan" 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: k...@vger.kernel.org
CC: virtualizat...@lists.linux-foundation.org

---
 target-i386/Makefile.objs |   2 +-
 target-i386/hyperv.c  | 127 ++
 target-i386/hyperv.h  |  42 +++
 target-i386/kvm.c |   4 ++
 4 files changed, 174 insertions(+), 1 deletion(-)
 create mode 100644 target-i386/hyperv.c
 create mode 100644 target-i386/hyperv.h

diff --git a/target-i386/Makefile.objs b/target-i386/Makefile.objs
index 437d997..2255f46 100644
--- a/target-i386/Makefile.objs
+++ b/target-i386/Makefile.objs
@@ -3,5 +3,5 @@ obj-y += excp_helper.o fpu_helper.o cc_helper.o int_helper.o 
svm_helper.o
 obj-y += smm_helper.o misc_helper.o mem_helper.o seg_helper.o
 obj-y += gdbstub.o
 obj-$(CONFIG_SOFTMMU) += machine.o arch_memory_mapping.o arch_dump.o monitor.o
-obj-$(CONFIG_KVM) += kvm.o
+obj-$(CONFIG_KVM) += kvm.o hyperv.o
 obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o
diff --git a/target-i386/hyperv.c b/target-i386/hyperv.c
new file mode 100644
index 000..e79b173
--- /dev/null
+++ b/target-i386/hyperv.c
@@ -0,0 +1,127 @@
+/*
+ * QEMU KVM Hyper-V support
+ *
+ * Copyright (C) 2015 Andrey Smetanin 
+ *
+ * Authors:
+ *  Andrey Smetanin 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "hyperv.h"
+#include "standard-headers/asm-x86/hyperv.h"
+
+int kvm_hv_handle_exit(X86CPU *cpu, struct kvm_hyperv_exit *exit)
+{
+CPUX86State *env = >env;
+
+switch (exit->type) {
+case KVM_EXIT_HYPERV_SYNIC:
+if (!cpu->hyperv_synic) {
+return -1;
+}
+
+/*
+ * For now just track changes in SynIC control and msg/evt pages msr's.
+ * When SynIC messaging/events processing will be added in future
+ * here we will do messages queues flushing and pages remapping.
+ */
+switch (exit->u.synic.msr) {
+case HV_X64_MSR_SCONTROL:
+env->msr_hv_synic_control = exit->u.synic.control;
+break;
+case HV_X64_MSR_SIMP:
+env->msr_hv_synic_msg_page = exit->u.synic.msg_page;
+break;
+case HV_X64_MSR_SIEFP:
+env->msr_hv_synic_evt_page = exit->u.synic.evt_page;
+break;
+default:
+return -1;
+}
+return 0;
+default:
+return -1;
+}
+}
+
+static void kvm_hv_sint_ack_handler(EventNotifier *notifier)
+{
+HvSintRoute *sint_route = container_of(notifier, HvSintRoute,
+   sint_ack_notifier);
+event_notifier_test_and_clear(notifier);
+if (sint_route->sint_ack_clb) {
+sint_route->sint_ack_clb(sint_route);
+}
+}
+
+HvSintRoute *kvm_hv_sint_route_create(uint32_t vcpu_id, uint32_t sint,
+  HvSintAckClb sint_ack_clb)
+{
+HvSintRoute *sint_route;
+int r, gsi;
+
+sint_route = g_malloc0(sizeof(*sint_route));
+r = event_notifier_init(_route->sint_set_notifier, false);
+if (r) {
+goto err;
+}
+
+r = event_notifier_init(_route->sint_ack_notifier, false);
+if (r) {
+goto err_sint_set_notifier;
+}
+
+event_notifier_set_handler(_route->sint_ack_notifier,
+   kvm_hv_sint_ack_handler);
+
+gsi = kvm_irqchip_add_hv_sint_route(kvm_state, vcpu_id, sint);
+if (gsi < 0) {
+goto err_gsi;
+}
+
+r = kvm_irqchip_add_irqfd_notifier_gsi(kvm_state,
+   _route->sint_set_notifier,
+   _route->sint_ack_notifier, 
gsi);
+if (r) {
+goto err_irqfd;
+}
+sint_route->gsi = gsi;
+sint_route->sint_ack_clb = sint_ack_clb;
+sint_route->vcpu_id = vcpu_id;
+sint_route->sint = sint;
+
+return sint_route;
+
+err_irqfd:
+kvm_irqchip_release_virq(kvm_state, gsi);
+err_gsi:
+event_notifier_set_handler(_route->sint_ack_notifier, NULL);
+event_notifier_cleanup(_route->sint_ack_notifier);
+err_sint_set_notifier:
+event_notifier_cleanup(_route->sint_set_notifier);
+err:
+g_free(sint_route);
+
+return NULL;
+}
+
+void kvm_hv_sint_route_destroy(HvSintRoute *sint_route)
+{
+kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state,
+  

[Qemu-devel] [PATCH 7/7] hw/misc: Hyper-V test device 'hyperv-testdev'

2015-10-26 Thread Andrey Smetanin
'hyperv-testdev' will be used by kvm-unit-tests
to setup Hyper-V SynIC SINT's routing and to inject
Hyper-V SynIC SINT's.

Hyper-V test device is ISA type device that create 0x3000
IO memory region and catches write access into it. Every
write operation data decoded into ctl code and parameters
for Hyper-V test device.

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Vitaly Kuznetsov 
CC: "K. Y. Srinivasan" 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: k...@vger.kernel.org
CC: virtualizat...@lists.linux-foundation.org

---
 default-configs/i386-softmmu.mak   |   1 +
 default-configs/x86_64-softmmu.mak |   1 +
 hw/misc/Makefile.objs  |   1 +
 hw/misc/hyperv_testdev.c   | 164 +
 4 files changed, 167 insertions(+)
 create mode 100644 hw/misc/hyperv_testdev.c

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 43c96d1..7f3c850 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -50,3 +50,4 @@ CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
 CONFIG_SMBIOS=y
+CONFIG_HYPERV_TESTDEV=y
diff --git a/default-configs/x86_64-softmmu.mak 
b/default-configs/x86_64-softmmu.mak
index dfb8095..e494d79 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -50,3 +50,4 @@ CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
 CONFIG_SMBIOS=y
+CONFIG_HYPERV_TESTDEV=y
diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index 4aa76ff..fafc80a 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -40,3 +40,4 @@ obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
 
 obj-$(CONFIG_PVPANIC) += pvpanic.o
 obj-$(CONFIG_EDU) += edu.o
+obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
diff --git a/hw/misc/hyperv_testdev.c b/hw/misc/hyperv_testdev.c
new file mode 100644
index 000..f0e4e35
--- /dev/null
+++ b/hw/misc/hyperv_testdev.c
@@ -0,0 +1,164 @@
+/*
+ * QEMU KVM Hyper-V test device to support Hyper-V kvm-unit-tests
+ *
+ * Copyright (C) 2015 Andrey Smetanin 
+ *
+ * Authors:
+ *  Andrey Smetanin 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "hw/hw.h"
+#include "hw/qdev.h"
+#include "hw/isa/isa.h"
+#include "target-i386/hyperv.h"
+
+#define HV_TEST_DEV_MAX_SINT_ROUTES 64
+
+struct HypervTestDev {
+ISADevice parent_obj;
+MemoryRegion sint_control;
+HvSintRoute *sint_route[HV_TEST_DEV_MAX_SINT_ROUTES];
+};
+typedef struct HypervTestDev HypervTestDev;
+
+#define TYPE_HYPERV_TEST_DEV "hyperv-testdev"
+#define HYPERV_TEST_DEV(obj) \
+OBJECT_CHECK(HypervTestDev, (obj), TYPE_HYPERV_TEST_DEV)
+
+enum {
+HV_TEST_DEV_SINT_ROUTE_CREATE = 1,
+HV_TEST_DEV_SINT_ROUTE_DESTROY,
+HV_TEST_DEV_SINT_ROUTE_SET_SINT
+};
+
+static int alloc_sint_route_index(HypervTestDev *dev)
+{
+int i;
+
+for (i = 0; i < ARRAY_SIZE(dev->sint_route); i++) {
+if (dev->sint_route[i] == NULL) {
+return i;
+}
+}
+return -1;
+}
+
+static void free_sint_route_index(HypervTestDev *dev, int i)
+{
+assert(i >= 0 && i < ARRAY_SIZE(dev->sint_route));
+dev->sint_route[i] = NULL;
+}
+
+static int find_sint_route_index(HypervTestDev *dev, uint32_t vcpu_id,
+ uint32_t sint)
+{
+HvSintRoute *sint_route;
+int i;
+
+for (i = 0; i < ARRAY_SIZE(dev->sint_route); i++) {
+sint_route = dev->sint_route[i];
+if (sint_route && sint_route->vcpu_id == vcpu_id &&
+sint_route->sint == sint) {
+return i;
+}
+}
+return -1;
+}
+
+static void hv_synic_test_dev_control(HypervTestDev *dev, uint32_t ctl,
+  uint32_t vcpu_id, uint32_t sint)
+{
+int i;
+HvSintRoute *sint_route;
+
+switch (ctl) {
+case HV_TEST_DEV_SINT_ROUTE_CREATE:
+i = alloc_sint_route_index(dev);
+assert(i >= 0);
+sint_route = kvm_hv_sint_route_create(vcpu_id, sint, NULL);
+assert(sint_route);
+dev->sint_route[i] = sint_route;
+break;
+case HV_TEST_DEV_SINT_ROUTE_DESTROY:
+i = find_sint_route_index(dev, vcpu_id, sint);
+assert(i >= 0);
+sint_route = dev->sint_route[i];
+kvm_hv_sint_route_destroy(sint_route);
+free_sint_route_index(dev, i);
+break;
+case HV_TEST_DEV_SINT_ROUTE_SET_SINT:
+i = find_sint_route_index(dev, vcpu_id, sint);
+assert(i >= 0);
+sint_route = dev->sint_route[i];
+kvm_hv_sint_route_set_sint(sint_route);
+break;
+default:
+break;
+   

Re: [Qemu-devel] [PATCH 3/7] linux-headers/kvm: add Hyper-V SynIC irq routing type and struct

2015-10-26 Thread Peter Maydell
On 26 October 2015 at 09:50, Andrey Smetanin  wrote:
> Signed-off-by: Andrey Smetanin 
> Reviewed-by: Roman Kagan 
> Signed-off-by: Denis V. Lunev 
> CC: Vitaly Kuznetsov 
> CC: "K. Y. Srinivasan" 
> CC: Gleb Natapov 
> CC: Paolo Bonzini 
> CC: Roman Kagan 
> CC: Denis V. Lunev 
> CC: k...@vger.kernel.org
> CC: virtualizat...@lists.linux-foundation.org
>
> ---
>  linux-headers/linux/kvm.h | 8 
>  1 file changed, 8 insertions(+)

Hi. Changes to linux-headers/ should only be made as part of
an automated update from a mainline Linux kernel tree using
the scripts/update-linux-headers.sh script. This patch looks
like maybe it was a manual edit ?

thanks
-- PMM



Re: [Qemu-devel] [PATCH 15/19] pc: acpi: bump DSDT revision compliance to v2

2015-10-26 Thread Igor Mammedov
On Sat, 24 Oct 2015 22:40:59 +0300
"Michael S. Tsirkin"  wrote:

> On Fri, Oct 23, 2015 at 04:57:18PM +0200, Igor Mammedov wrote:
> > it turns on 64-bit integer handling in OSPM, which we could use
> > for writing simpler/smaller AML code.
> > Tested with Windows XP and Windows Server 2008, Linux:
> >  * XP doesn't care about revision and continues to use 32 integers
> >and boots just fine with this change.
> >  * WS 2008 and Linux - support rev2 and use 64-bit integers
> > 
> > Signed-off-by: Igor Mammedov 
> 
> This is still planned, right? IIUC you didn't post any code
> that needs the 64 bit math.
nope, the next patch 16/19 uses 64-bit math,

it greatly simplifies _CRS as we don't have to do
64-bit math manually using 32-bit integers.

And even if we put new MHPD.MCRS() that uses 64-bit math in SSDT,
it won't crash XP unless someone would try to do memory hotplug

and even it could be 'fixed' if we check _REV on every
hotplug event, it's a bit ugly but should work.

> 
> > ---
> >  hw/i386/acpi-build.c  | 2 +-
> >  hw/i386/acpi-dsdt.dsl | 2 +-
> >  hw/i386/q35-acpi-dsdt.dsl | 2 +-
> >  3 files changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > index 8add4d9..c929540 100644
> > --- a/hw/i386/acpi-build.c
> > +++ b/hw/i386/acpi-build.c
> > @@ -1484,7 +1484,7 @@ build_dsdt(GArray *table_data, GArray *linker, 
> > AcpiMiscInfo *misc)
> >  
> >  memset(dsdt, 0, sizeof *dsdt);
> >  build_header(linker, table_data, dsdt, "DSDT",
> > - misc->dsdt_size, 1);
> > + misc->dsdt_size, 2);
> >  }
> >  
> >  static GArray *
> > diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl
> > index 8dba096..6d46b36 100644
> > --- a/hw/i386/acpi-dsdt.dsl
> > +++ b/hw/i386/acpi-dsdt.dsl
> > @@ -22,7 +22,7 @@ ACPI_EXTRACT_ALL_CODE AcpiDsdtAmlCode
> >  DefinitionBlock (
> >  "acpi-dsdt.aml",// Output Filename
> >  "DSDT", // Signature
> > -0x01,   // DSDT Compliance Revision
> > +0x02,   // DSDT Compliance Revision
> >  "BXPC", // OEMID
> >  "BXDSDT",   // TABLE ID
> >  0x1 // OEM Revision
> > diff --git a/hw/i386/q35-acpi-dsdt.dsl b/hw/i386/q35-acpi-dsdt.dsl
> > index 7be7b37..ecefdec 100644
> > --- a/hw/i386/q35-acpi-dsdt.dsl
> > +++ b/hw/i386/q35-acpi-dsdt.dsl
> > @@ -28,7 +28,7 @@ ACPI_EXTRACT_ALL_CODE Q35AcpiDsdtAmlCode
> >  DefinitionBlock (
> >  "q35-acpi-dsdt.aml",// Output Filename
> >  "DSDT", // Signature
> > -0x01,   // DSDT Compliance Revision
> > +0x02,   // DSDT Compliance Revision
> >  "BXPC", // OEMID
> >  "BXDSDT",   // TABLE ID
> >  0x2 // OEM Revision
> > -- 
> > 1.8.3.1




Re: [Qemu-devel] [PATCH 3/7] linux-headers/kvm: add Hyper-V SynIC irq routing type and struct

2015-10-26 Thread Denis V. Lunev

On 10/26/2015 01:03 PM, Peter Maydell wrote:

On 26 October 2015 at 09:50, Andrey Smetanin  wrote:

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Vitaly Kuznetsov 
CC: "K. Y. Srinivasan" 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: k...@vger.kernel.org
CC: virtualizat...@lists.linux-foundation.org

---
  linux-headers/linux/kvm.h | 8 
  1 file changed, 8 insertions(+)

Hi. Changes to linux-headers/ should only be made as part of
an automated update from a mainline Linux kernel tree using
the scripts/update-linux-headers.sh script. This patch looks
like maybe it was a manual edit ?

thanks
-- PMM


yep. We know and have discussed this with Paolo already.
Kernel stuff is in progress at the moment. The patch
is presented to interested people to allow to compile and
run.

Actual merge will be performed with proper sync
when kernel will be in rc3 or 4 stage and the patch will be
dropped.

The same applies for patch 5.

Den



Re: [Qemu-devel] [PATCH 3/7] linux-headers/kvm: add Hyper-V SynIC irq routing type and struct

2015-10-26 Thread Peter Maydell
On 26 October 2015 at 10:12, Denis V. Lunev  wrote:
> On 10/26/2015 01:03 PM, Peter Maydell wrote:
>> Hi. Changes to linux-headers/ should only be made as part of
>> an automated update from a mainline Linux kernel tree using
>> the scripts/update-linux-headers.sh script. This patch looks
>> like maybe it was a manual edit ?

> yep. We know and have discussed this with Paolo already.
> Kernel stuff is in progress at the moment. The patch
> is presented to interested people to allow to compile and
> run.
>
> Actual merge will be performed with proper sync
> when kernel will be in rc3 or 4 stage and the patch will be
> dropped.

OK, cool. It's best to tag the series as 'RFC' rather than
'PATCH' in that case, as a reminder that it can't be applied
just yet.

thanks
-- PMM



Re: [Qemu-devel] [PATCH] ui/egl: Reduce required libraries to build with EGL support

2015-10-26 Thread Gerd Hoffmann
On Sa, 2015-10-24 at 20:51 +0900, OGAWA Hirofumi wrote:
> To support EGL (sdl2-gl, gtk-egl, egl-helpers, etc.), we don't need to
> install "gl" or "glesv" packages. (Those are used only for milkymist-tmu2).

They are needed and used.  Maybe it works if you don't explicitly link
them because epoxy brings them in indirectly then.

cheers,
  Gerd





Re: [Qemu-devel] [PATCH] virtio: Notice when the system doesn't support MSIx at all

2015-10-26 Thread Michael S. Tsirkin
On Mon, Oct 26, 2015 at 10:14:20AM +, Peter Maydell wrote:
> On 26 October 2015 at 10:09, Mark Cave-Ayland
>  wrote:
> > On 26/10/15 10:03, Michael S. Tsirkin wrote:
> >> Pls address the issues if you want this applied.
> >
> > The patch came from Richard rather than myself - my only involvement has
> > been to champion the patch since both myself and Peter have had reports
> > of this message confusing users.
> 
> Isn't this patch already in master as commit 0d583647a7fc73f6 ?
> 
> thanks
> -- PMM

So it is.
That explains why it doesn't apply.




Re: [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure

2015-10-26 Thread Stefan Hajnoczi
On Mon, Oct 12, 2015 at 02:27:21PM +0200, Peter Lieven wrote:
> This series aims at avoiding a hanging main-loop if a vserver has a
> CDROM image mounted from a NFS share and that NFS share goes down.
> Typical situation is that users mount an CDROM ISO to install something
> and then forget to eject that CDROM afterwards.
> As a consequence this mounted CD is able to bring down the
> whole vserver if the backend NFS share is unreachable. This is bad
> especially if the CDROM itself is not needed anymore at this point.
> 
> This series aims at fixing 2 blocking I/O operations that would
> hang if the NFS server is unavailable:
>  - ATAPI PIO read requests used sync calls to blk_read, convert
>them to an async variant where possible.
>  - If a busmaster DMA request is cancelled all requests are drained.
>Convert the drain to an async request canceling.
> 
> v1->v2: - fix offset for 2352 byte sector size [Kevin]
> - use a sync request if we continue an elementary transfer.
>   As John pointed out we enter a race condition between next
>   IDE command and async transfer otherwise. This is sill not
>   optimal, but it fixes the NFS down problems for all cases where
>   the NFS server goes down while there is no PIO CD activity.
>   Of course, it could still happen during a PIO transfer, but I
>   expect this to be the unlikelier case.
>   I spent some effort trying to read more sectors at once and
>   avoiding continuation of elementary transfers, but with
>   whatever I came up it was destroying migration between different
>   Qemu versions. I have a quite hackish patch that works and
>   should survive migration, but I am not happy with it. So I
>   would like to start with this version as it is a big improvement
>   already.
> - Dropped Patch 5 because it is upstream meanwhile.
> 
> Peter Lieven (4):
>   ide/atapi: make PIO read requests async
>   ide/atapi: blk_aio_readv may return NULL
>   ide: add support for cancelable read requests
>   ide/atapi: enable cancelable requests
> 
>  hw/ide/atapi.c| 99 
> +--
>  hw/ide/core.c | 55 +++
>  hw/ide/internal.h | 16 +
>  hw/ide/pci.c  | 42 +++
>  4 files changed, 188 insertions(+), 24 deletions(-)

Any reason why write and discard requests aren't covered in this series?

If this is a good idea for CD-ROM it should be a good idea for all PCI
IDE devices.

Having a specialized code path is often a sign that it hasn't been
tested enough.  Can we get confident enough to enable this everywhere?



Re: [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure

2015-10-26 Thread Peter Lieven

Am 26.10.2015 um 11:42 schrieb Stefan Hajnoczi:

On Mon, Oct 12, 2015 at 02:27:21PM +0200, Peter Lieven wrote:

This series aims at avoiding a hanging main-loop if a vserver has a
CDROM image mounted from a NFS share and that NFS share goes down.
Typical situation is that users mount an CDROM ISO to install something
and then forget to eject that CDROM afterwards.
As a consequence this mounted CD is able to bring down the
whole vserver if the backend NFS share is unreachable. This is bad
especially if the CDROM itself is not needed anymore at this point.

This series aims at fixing 2 blocking I/O operations that would
hang if the NFS server is unavailable:
  - ATAPI PIO read requests used sync calls to blk_read, convert
them to an async variant where possible.
  - If a busmaster DMA request is cancelled all requests are drained.
Convert the drain to an async request canceling.

v1->v2: - fix offset for 2352 byte sector size [Kevin]
 - use a sync request if we continue an elementary transfer.
   As John pointed out we enter a race condition between next
   IDE command and async transfer otherwise. This is sill not
   optimal, but it fixes the NFS down problems for all cases where
   the NFS server goes down while there is no PIO CD activity.
   Of course, it could still happen during a PIO transfer, but I
   expect this to be the unlikelier case.
   I spent some effort trying to read more sectors at once and
   avoiding continuation of elementary transfers, but with
   whatever I came up it was destroying migration between different
   Qemu versions. I have a quite hackish patch that works and
   should survive migration, but I am not happy with it. So I
   would like to start with this version as it is a big improvement
   already.
 - Dropped Patch 5 because it is upstream meanwhile.

Peter Lieven (4):
   ide/atapi: make PIO read requests async
   ide/atapi: blk_aio_readv may return NULL
   ide: add support for cancelable read requests
   ide/atapi: enable cancelable requests

  hw/ide/atapi.c| 99 +--
  hw/ide/core.c | 55 +++
  hw/ide/internal.h | 16 +
  hw/ide/pci.c  | 42 +++
  4 files changed, 188 insertions(+), 24 deletions(-)

Any reason why write and discard requests aren't covered in this series?

If this is a good idea for CD-ROM it should be a good idea for all PCI
IDE devices.

Having a specialized code path is often a sign that it hasn't been
tested enough.  Can we get confident enough to enable this everywhere?


The reason is that the buffered request trick does only work for
read-only devices (like a CDROM). A write request that is completed
on the backend storage at a later point (after the OS thinks the request
is canceled) can cause damage to the filesystem.

Peter




Re: [Qemu-devel] [PATCH v4 03/13] target-arm: Add support for AArch32 S2 negative t0sz

2015-10-26 Thread Edgar E. Iglesias
On Mon, Oct 26, 2015 at 09:52:12AM +, Peter Maydell wrote:
> On 26 October 2015 at 09:20, Edgar E. Iglesias
>  wrote:
> > Yes, sounds good. I've changed the patch to the following:
> >
> > @@ -6521,8 +6521,24 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
> > target_ulong address,
> >   */
> >  int32_t t0sz = extract32(tcr->raw_tcr, 0, 6);
> >  if (va_size == 64) {
> > +/* AArch64 translation.  */
> >  t0sz = MIN(t0sz, 39);
> >  t0sz = MAX(t0sz, 16);
> > +} else if (mmu_idx != ARMMMUIdx_S2NS) {
> > +/* AArch32 stage 1 translation.  */
> > +t0sz = extract32(t0sz, 0, 3);
> > +} else {
> > +/* AArch32 stage 2 translation.  */
> > +bool sext = extract32(t0sz, 4, 1);
> > +bool sign = extract32(t0sz, 3, 1);
> > +t0sz = sextract32(t0sz, 0, 4);
> > +
> > +/* If the sign-extend bit is not the same as t0sz[3], the result
> > + * is unpredictable. Flag this as a guest error.  */
> > +if (sign != sext) {
> > +qemu_log_mask(LOG_GUEST_ERROR,
> > +  "AArch32: VTCR.S / VTCR.T0SZ[3] missmatch\n");
> > +}
> >  }
> >
> 
> Looks good, but maybe we should just do all the extracts
> on tcr->raw_tcr, rather than extracting 6 bits of it and
> then re-extracting some subset of bits from that extract
> (for the 32-bit stage 1 case in particular it would be
> simpler).

OK, I've rearranged the code a bit to use raw_tcr.

Thanks,
Edgar



[Qemu-devel] [PULL 2/3] xen_platform: switch to realize

2015-10-26 Thread Stefano Stabellini
Use realize to initialize the xen_platform device

Signed-off-by: Stefano Stabellini 
Signed-off-by: Eduardo Habkost 
---
 hw/i386/xen/xen_platform.c |6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
index 8682c42..3dc68cb 100644
--- a/hw/i386/xen/xen_platform.c
+++ b/hw/i386/xen/xen_platform.c
@@ -382,7 +382,7 @@ static const VMStateDescription vmstate_xen_platform = {
 }
 };
 
-static int xen_platform_initfn(PCIDevice *dev)
+static void xen_platform_realize(PCIDevice *dev, Error **errp)
 {
 PCIXenPlatformState *d = XEN_PLATFORM(dev);
 uint8_t *pci_conf;
@@ -407,8 +407,6 @@ static int xen_platform_initfn(PCIDevice *dev)
  >mmio_bar);
 
 platform_fixed_ioport_init(d);
-
-return 0;
 }
 
 static void platform_reset(DeviceState *dev)
@@ -423,7 +421,7 @@ static void xen_platform_class_init(ObjectClass *klass, 
void *data)
 DeviceClass *dc = DEVICE_CLASS(klass);
 PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
-k->init = xen_platform_initfn;
+k->realize = xen_platform_realize;
 k->vendor_id = PCI_VENDOR_ID_XEN;
 k->device_id = PCI_DEVICE_ID_XEN_PLATFORM;
 k->class_id = PCI_CLASS_OTHERS << 8 | 0x80;
-- 
1.7.10.4




[Qemu-devel] [PULL 0/3] xen-2015-10-26

2015-10-26 Thread Stefano Stabellini
The following changes since commit af25e7277d3e95a3ea31023f31d8097ab5e2ac84:

  Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging 
(2015-10-23 18:14:42 +0100)

are available in the git repository at:


  git://xenbits.xen.org/people/sstabellini/qemu-dm.git tags/xen-2015-10-26

for you to fetch changes up to b1ecd51bdbb0fc0a7026662b03e7e7df9d129ca0:

  xen-platform: Replace assert() with appropriate error reporting (2015-10-26 
11:32:24 +)


Xen 2015-10-26


Eduardo Habkost (1):
  xen-platform: Replace assert() with appropriate error reporting

Lan Tianyu (1):
  Qemu/Xen: Fix early freeing MSIX MMIO memory region

Stefano Stabellini (1):
  xen_platform: switch to realize

 hw/i386/xen/xen_platform.c  |   12 +++-
 hw/xen/xen_pt.c |8 
 hw/xen/xen_pt.h |1 +
 hw/xen/xen_pt_config_init.c |2 +-
 hw/xen/xen_pt_msi.c |   13 -
 5 files changed, 29 insertions(+), 7 deletions(-)



Re: [Qemu-devel] [PATCH v4 06/13] target-arm: Add computation of starting level for S2 PTW

2015-10-26 Thread Edgar E. Iglesias
On Fri, Oct 23, 2015 at 05:26:52PM +0100, Peter Maydell wrote:
> On 14 October 2015 at 23:55, Edgar E. Iglesias  
> wrote:
> > From: "Edgar E. Iglesias" 
> >
> > The starting level for S2 pagetable walks is computed
> > differently from the S1 starting level. Implement the S2
> > variant.
> >
> > Signed-off-by: Edgar E. Iglesias 
> > ---
> >  target-arm/helper.c| 117 
> > +++--
> >  target-arm/internals.h |  25 +++
> >  2 files changed, 129 insertions(+), 13 deletions(-)
> >
> > diff --git a/target-arm/helper.c b/target-arm/helper.c
> > index 79b4c03..8530f7e 100644
> > --- a/target-arm/helper.c
> > +++ b/target-arm/helper.c
> > @@ -6406,12 +6406,72 @@ typedef enum {
> >  permission_fault = 3,
> >  } MMUFaultType;
> >
> > +/*
> > + * check_s2_startlevel
> > + * @cpu:ARMCPU
> > + * @is_aa64:True if the translation regime is in AArch64 state
> > + * @startlevel: Suggested starting level
> > + * @inputsize:  Bitsize of IPAs
> > + * @stride: Page-table stride (See the ARM ARM)
> > + *
> > + * Returns true if the suggested starting level is OK and false otherwise.
> > + */
> > +static bool check_s2_startlevel(ARMCPU *cpu, bool is_aa64, int startlevel,
> > +int inputsize, int stride)
> > +{
> > +/* Negative levels are never allowed.  */
> > +if (startlevel < 0) {
> > +return false;
> > +}
> > +
> > +if (is_aa64) {
> > +unsigned int pamax = arm_pamax(cpu);
> > +
> > +switch (stride) {
> > +case 13: /* 64KB Pages.  */
> > +if (startlevel < 1 || (startlevel == 0 && pamax <= 42)) {
> > +return false;
> > +}
> 
> I'm having trouble matching these up with the ARM ARM pseudocode,
> which says for this case for instance
>if (level == 0 || (level == 1 && PAMax() <= 42)) then basefound = FALSE;

Not sure what I was thinking... I've rewritten all of it to match the
specs (I hope).

> 
> (as an aside, the pseudocode uses 'startlevel' for the raw SL0
> field value and 'level' for the 3 - startlevel (or 2 - startlevel)
> value, so it's a bit confusing to use startlevel for both here.)


Yes, I've changed the naming of to better match specs.

Thanks!
Edgar


> 
> > +break;
> > +case 11: /* 16KB Pages.  */
> > +if (startlevel < 1 || (startlevel == 0 && pamax <= 40)) {
> > +return false;
> > +}
> > +break;
> > +case 9: /* 4KB Pages.  */
> > +if (startlevel == 0 && pamax <= 42) {
> > +return false;
> > +}
> > +break;
> > +default:
> > +g_assert_not_reached();
> > +}
> > +} else {
> > +const int grainsize = stride + 3;
> > +int startsizecheck;
> > +
> > +/* AArch32 only supports 4KB pages. Assert on that.  */
> > +assert(stride == 9);
> > +
> > +if (startlevel == 0) {
> > +return false;
> > +}
> > +
> > +startsizecheck = inputsize - ((3 - startlevel) * stride + 
> > grainsize);
> > +if (startsizecheck < 1 || startsizecheck > stride + 4) {
> > +return false;
> > +}
> > +}
> > +return true;
> > +}
> > +
> >  static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
> > int access_type, ARMMMUIdx mmu_idx,
> > hwaddr *phys_ptr, MemTxAttrs *txattrs, int 
> > *prot,
> > target_ulong *page_size_ptr, uint32_t *fsr)
> >  {
> > -CPUState *cs = CPU(arm_env_get_cpu(env));
> > +ARMCPU *cpu = arm_env_get_cpu(env);
> > +CPUState *cs = CPU(cpu);
> >  /* Read an LPAE long-descriptor translation table. */
> >  MMUFaultType fault_type = translation_fault;
> >  uint32_t level = 1;
> > @@ -6560,18 +6620,49 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
> > target_ulong address,
> >  goto do_fault;
> >  }
> >
> > -/* The starting level depends on the virtual address size (which can be
> > - * up to 48 bits) and the translation granule size. It indicates the 
> > number
> > - * of strides (stride bits at a time) needed to consume the bits
> > - * of the input address. In the pseudocode this is:
> > - *  level = 4 - RoundUp((inputsize - grainsize) / stride)
> > - * where their 'inputsize' is our 'inputsize', 'grainsize' is
> > - * our 'stride + 3' and 'stride' is our 'stride'.
> > - * Applying the usual "rounded up m/n is (m+n-1)/n" and simplifying:
> > - * = 4 - (inputsize - stride - 3 + stride - 1) / stride
> > - * = 4 - (inputsize - 4) / stride;
> > - */
> > -level = 4 - (inputsize - 4) / stride;
> > +if (mmu_idx != ARMMMUIdx_S2NS) {
> > +/* The starting level depends on the virtual address size 

[Qemu-devel] [PATCH 4/7] kvm: Hyper-V SynIC irq routing support

2015-10-26 Thread Andrey Smetanin
Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Vitaly Kuznetsov 
CC: "K. Y. Srinivasan" 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: k...@vger.kernel.org
CC: virtualizat...@lists.linux-foundation.org

---
 include/sysemu/kvm.h |  1 +
 kvm-all.c| 33 +
 2 files changed, 34 insertions(+)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 461ef65..5af10fb 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -455,6 +455,7 @@ int kvm_irqchip_update_msi_route(KVMState *s, int virq, 
MSIMessage msg,
 void kvm_irqchip_release_virq(KVMState *s, int virq);
 
 int kvm_irqchip_add_adapter_route(KVMState *s, AdapterInfo *adapter);
+int kvm_irqchip_add_hv_sint_route(KVMState *s, uint32_t vcpu, uint32_t sint);
 
 int kvm_irqchip_add_irqfd_notifier_gsi(KVMState *s, EventNotifier *n,
EventNotifier *rn, int virq);
diff --git a/kvm-all.c b/kvm-all.c
index c442838..4d36a6c 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1297,6 +1297,34 @@ int kvm_irqchip_add_adapter_route(KVMState *s, 
AdapterInfo *adapter)
 return virq;
 }
 
+int kvm_irqchip_add_hv_sint_route(KVMState *s, uint32_t vcpu, uint32_t sint)
+{
+struct kvm_irq_routing_entry kroute = {};
+int virq;
+
+if (!kvm_gsi_routing_enabled()) {
+return -ENOSYS;
+}
+if (!kvm_check_extension(s, KVM_CAP_HYPERV_SYNIC)) {
+return -ENOSYS;
+}
+virq = kvm_irqchip_get_virq(s);
+if (virq < 0) {
+return virq;
+}
+
+kroute.gsi = virq;
+kroute.type = KVM_IRQ_ROUTING_HV_SINT;
+kroute.flags = 0;
+kroute.u.hv_sint.vcpu = vcpu;
+kroute.u.hv_sint.sint = sint;
+
+kvm_add_routing_entry(s, );
+kvm_irqchip_commit_routes(s);
+
+return virq;
+}
+
 #else /* !KVM_CAP_IRQ_ROUTING */
 
 void kvm_init_irq_routing(KVMState *s)
@@ -1322,6 +1350,11 @@ int kvm_irqchip_add_adapter_route(KVMState *s, 
AdapterInfo *adapter)
 return -ENOSYS;
 }
 
+int kvm_irqchip_add_hv_sint_route(KVMState *s, uint32_t vcpu, uint32_t sint)
+{
+return -ENOSYS;
+}
+
 static int kvm_irqchip_assign_irqfd(KVMState *s, int fd, int virq, bool assign)
 {
 abort();
-- 
2.4.3




[Qemu-devel] [PATCH 0/7] Hyper-V Synthetic interrupt controller

2015-10-26 Thread Andrey Smetanin
Hyper-V SynIC (synthetic interrupt controller) device
implementation.

The implementation contains:
* msr's support
* irq routing setup
* irq injection
* irq ack callback registration
* event/message pages changes tracking at Hyper-V exit
* Hyper-V test device to test SynIC by kvm-unit-tests

Andrey Smetanin (7):
  standard-headers/x86: add Hyper-V SynIC constants
  target-i386/kvm: Hyper-V SynIC MSR's support
  linux-headers/kvm: add Hyper-V SynIC irq routing type and struct
  kvm: Hyper-V SynIC irq routing support
  linux-headers/kvm: KVM_EXIT_HYPERV type and struct
  target-i386/hyperv: Hyper-V SynIC SINT routing and vCPU exit
  hw/misc: Hyper-V test device 'hyperv-testdev'

 default-configs/i386-softmmu.mak  |   1 +
 default-configs/x86_64-softmmu.mak|   1 +
 hw/misc/Makefile.objs |   1 +
 hw/misc/hyperv_testdev.c  | 164 ++
 include/standard-headers/asm-x86/hyperv.h |  12 +++
 include/sysemu/kvm.h  |   1 +
 kvm-all.c |  33 ++
 linux-headers/linux/kvm.h |  25 +
 target-i386/Makefile.objs |   2 +-
 target-i386/cpu-qom.h |   1 +
 target-i386/cpu.c |   1 +
 target-i386/cpu.h |   5 +
 target-i386/hyperv.c  | 127 +++
 target-i386/hyperv.h  |  42 
 target-i386/kvm.c |  66 +++-
 target-i386/machine.c |  39 +++
 16 files changed, 519 insertions(+), 2 deletions(-)
 create mode 100644 hw/misc/hyperv_testdev.c
 create mode 100644 target-i386/hyperv.c
 create mode 100644 target-i386/hyperv.h

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Vitaly Kuznetsov 
CC: "K. Y. Srinivasan" 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: k...@vger.kernel.org
CC: virtualizat...@lists.linux-foundation.org

-- 
2.4.3




[Qemu-devel] [PATCH 5/7] linux-headers/kvm: KVM_EXIT_HYPERV type and struct

2015-10-26 Thread Andrey Smetanin
Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Vitaly Kuznetsov 
CC: "K. Y. Srinivasan" 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: k...@vger.kernel.org
CC: virtualizat...@lists.linux-foundation.org

---
 linux-headers/linux/kvm.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 0bff588..4e20262 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -154,6 +154,20 @@ struct kvm_s390_skeys {
__u32 flags;
__u32 reserved[9];
 };
+
+struct kvm_hyperv_exit {
+#define KVM_EXIT_HYPERV_SYNIC  1
+   __u32 type;
+   union {
+   struct {
+   __u32 msr;
+   __u64 control;
+   __u64 evt_page;
+   __u64 msg_page;
+   } synic;
+   } u;
+};
+
 #define KVM_S390_GET_SKEYS_NONE   1
 #define KVM_S390_SKEYS_MAX1048576
 
@@ -184,6 +198,7 @@ struct kvm_s390_skeys {
 #define KVM_EXIT_SYSTEM_EVENT 24
 #define KVM_EXIT_S390_STSI25
 #define KVM_EXIT_IOAPIC_EOI   26
+#define KVM_EXIT_HYPERV   27
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -338,6 +353,8 @@ struct kvm_run {
struct {
__u8 vector;
} eoi;
+   /* KVM_EXIT_HYPERV */
+   struct kvm_hyperv_exit hyperv;
/* Fix the size of the union. */
char padding[256];
};
-- 
2.4.3




Re: [Qemu-devel] Reminder: we're now in softfreeze

2015-10-26 Thread Mark Cave-Ayland
On 26/10/15 09:27, Peter Maydell wrote:

> On 26 October 2015 at 06:25, Peter Crosthwaite
>  wrote:
>> On Thu, Oct 22, 2015 at 3:30 AM, Peter Maydell  
>> wrote:
>>> For this release I'd like to try tracking known-issues on
>>> the wiki page: http://wiki.qemu.org/Planning/2.5
>>> with a list of patch series or bugs which need to be fixed
>>> into the release. This will hopefully avoid the problem we
>>> had last time around where I missed patches that should have
>>> gone into an rc.
>>
>> Can patches fix this with the right form of query? Maybe another piece
>> of metadata for patches that are promoted for rc merge.
> 
> ...only if the submitter remembers to tag it when they send
> it, and it doesn't help for issues that don't have a patch yet,
> or for tracking whether a submaintainer has taken the patch.
> 
> Tagging your patch as "[PATCH for-2.5]" is a good idea anyway
> once we're in freeze, though.

Maybe we also need a [PING for-2.5] or similar? I've just sent another
couple of pings for patches (one of which fixes the broken
analyze-migration.py script) which have fallen through the cracks over
the past couple of months. It would be nice to have an easy way to flag
outstanding patches like this, especially during freeze when things get
really busy.


ATB,

Mark.




Re: [Qemu-devel] [PATCH 15/19] pc: acpi: bump DSDT revision compliance to v2

2015-10-26 Thread Michael S. Tsirkin
On Mon, Oct 26, 2015 at 11:03:01AM +0100, Igor Mammedov wrote:
> On Sat, 24 Oct 2015 22:40:59 +0300
> "Michael S. Tsirkin"  wrote:
> 
> > On Fri, Oct 23, 2015 at 04:57:18PM +0200, Igor Mammedov wrote:
> > > it turns on 64-bit integer handling in OSPM, which we could use
> > > for writing simpler/smaller AML code.
> > > Tested with Windows XP and Windows Server 2008, Linux:
> > >  * XP doesn't care about revision and continues to use 32 integers
> > >and boots just fine with this change.
> > >  * WS 2008 and Linux - support rev2 and use 64-bit integers
> > > 
> > > Signed-off-by: Igor Mammedov 
> > 
> > This is still planned, right? IIUC you didn't post any code
> > that needs the 64 bit math.
> nope, the next patch 16/19 uses 64-bit math,
> 
> it greatly simplifies _CRS as we don't have to do
> 64-bit math manually using 32-bit integers.
> 
> And even if we put new MHPD.MCRS() that uses 64-bit math in SSDT,
> it won't crash XP unless someone would try to do memory hotplug
> 
> and even it could be 'fixed' if we check _REV on every
> hotplug event, it's a bit ugly but should work.

Aha. That's exactly what I said. All that patch commit log
says is
pc: acpi: memhp: move MHPD.MCRS method into MHPT table
when in fact you also rework it all to use 64 bit math.

So pls don't do this. Pls start by rewriting ASL in C
while keeping AML code identical. Cleanups - next.

> > 
> > > ---
> > >  hw/i386/acpi-build.c  | 2 +-
> > >  hw/i386/acpi-dsdt.dsl | 2 +-
> > >  hw/i386/q35-acpi-dsdt.dsl | 2 +-
> > >  3 files changed, 3 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > index 8add4d9..c929540 100644
> > > --- a/hw/i386/acpi-build.c
> > > +++ b/hw/i386/acpi-build.c
> > > @@ -1484,7 +1484,7 @@ build_dsdt(GArray *table_data, GArray *linker, 
> > > AcpiMiscInfo *misc)
> > >  
> > >  memset(dsdt, 0, sizeof *dsdt);
> > >  build_header(linker, table_data, dsdt, "DSDT",
> > > - misc->dsdt_size, 1);
> > > + misc->dsdt_size, 2);
> > >  }
> > >  
> > >  static GArray *
> > > diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl
> > > index 8dba096..6d46b36 100644
> > > --- a/hw/i386/acpi-dsdt.dsl
> > > +++ b/hw/i386/acpi-dsdt.dsl
> > > @@ -22,7 +22,7 @@ ACPI_EXTRACT_ALL_CODE AcpiDsdtAmlCode
> > >  DefinitionBlock (
> > >  "acpi-dsdt.aml",// Output Filename
> > >  "DSDT", // Signature
> > > -0x01,   // DSDT Compliance Revision
> > > +0x02,   // DSDT Compliance Revision
> > >  "BXPC", // OEMID
> > >  "BXDSDT",   // TABLE ID
> > >  0x1 // OEM Revision
> > > diff --git a/hw/i386/q35-acpi-dsdt.dsl b/hw/i386/q35-acpi-dsdt.dsl
> > > index 7be7b37..ecefdec 100644
> > > --- a/hw/i386/q35-acpi-dsdt.dsl
> > > +++ b/hw/i386/q35-acpi-dsdt.dsl
> > > @@ -28,7 +28,7 @@ ACPI_EXTRACT_ALL_CODE Q35AcpiDsdtAmlCode
> > >  DefinitionBlock (
> > >  "q35-acpi-dsdt.aml",// Output Filename
> > >  "DSDT", // Signature
> > > -0x01,   // DSDT Compliance Revision
> > > +0x02,   // DSDT Compliance Revision
> > >  "BXPC", // OEMID
> > >  "BXDSDT",   // TABLE ID
> > >  0x2 // OEM Revision
> > > -- 
> > > 1.8.3.1



Re: [Qemu-devel] [PATCH] vhdx: Return true for bdrv_has_zero_init

2015-10-26 Thread Stefan Hajnoczi
On Fri, Oct 23, 2015 at 06:00:52PM +0200, Kevin Wolf wrote:
> Am 23.10.2015 um 17:32 hat Stefan Hajnoczi geschrieben:
> > On Fri, Dec 05, 2014 at 11:26:49AM +0100, Kevin Wolf wrote:
> > > Like for most other image formats, vhdx images read as all zero in qemu
> > > after their creation (we're taking advantage from the fact that qemu has
> > > just created the image, because PAYLOAD_BLOCK_NOT_PRESENT actually means
> > > undefined rather than zeroed according to the spec).
> > > 
> > > This fixes that 'qemu-img convert' to vhdx fully populates the image.
> > > 
> > > Signed-off-by: Kevin Wolf 
> > > ---
> > >  block/vhdx.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > 
> > Reviewed-by: Stefan Hajnoczi 
> 
> Hasn't this been implemented in commit 85b712c9 almost a year ago?

Yes :).

I was cleaning out my inbox and didn't notice the date until after
replying.



Re: [Qemu-devel] [PATCH v5 2/2] enable multi-function hot-add

2015-10-26 Thread Cao jin

Hi,
warning, there is a long story below O:-)

On 10/26/2015 04:29 PM, Michael S. Tsirkin wrote:

On Mon, Oct 26, 2015 at 11:29:18AM +0800, Cao jin wrote:

Enable pcie device multifunction hot, just ensure the function 0
added last, then driver will got the notification to scan all the
function in the slot.

Signed-off-by: Cao jin 
---
  hw/pci/pci.c | 31 ++-
  hw/pci/pci_host.c| 13 +++--
  hw/pci/pcie.c| 18 +-
  include/hw/pci/pci.h |  1 +
  4 files changed, 51 insertions(+), 12 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index b0bf540..6f43b12 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -847,6 +847,9 @@ static PCIDevice *do_pci_register_device(PCIDevice 
*pci_dev, PCIBus *bus,
  PCIConfigWriteFunc *config_write = pc->config_write;
  Error *local_err = NULL;
  AddressSpace *dma_as;
+DeviceState *dev = DEVICE(pci_dev);
+
+pci_dev->bus = bus;

  if (devfn < 0) {
  for(devfn = bus->devfn_min ; devfn < ARRAY_SIZE(bus->devices);
@@ -864,9 +867,17 @@ static PCIDevice *do_pci_register_device(PCIDevice 
*pci_dev, PCIBus *bus,
 PCI_SLOT(devfn), PCI_FUNC(devfn), name,
 bus->devices[devfn]->name);
  return NULL;
+} else if (dev->hotplugged &&
+   pci_get_function_0(pci_dev)) {
+error_setg(errp, "PCI: slot %d function 0 already ocuppied by %s,"
+   " new func %s cannot be exposed to guest.",
+   PCI_SLOT(devfn),
+   bus->devices[PCI_DEVFN(PCI_SLOT(devfn), 0)]->name,
+   name);
+
+   return NULL;
  }

-pci_dev->bus = bus;
  pci_dev->devfn = devfn;
  dma_as = pci_device_iommu_address_space(pci_dev);

@@ -2454,6 +2465,24 @@ void pci_bus_get_w64_range(PCIBus *bus, Range *range)
  pci_for_each_device_under_bus(bus, pci_dev_get_w64, range);
  }

+/* ARI device function number range is 0-255, means has only 1 function0;
+ * while non-ARI device has 1 function0 in each slot. non-ARI device could
+ * be PCI or PCIe, and there is up to 32 slots for PCI */
+PCIDevice *pci_get_function_0(PCIDevice *pci_dev)
+{
+PCIDevice *parent_dev;
+
+parent_dev = pci_bridge_get_device(pci_dev->bus);
+if (pcie_cap_is_arifwd_enabled(parent_dev) &&
+pci_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI)) {
+/* ARI enabled */
+return pci_dev->bus->devices[0];


That's wrong I think since software might enable ARI after hotplug.


According to the spec, ARI feature is enabled only based on the 
following 2 conditions:
1. For an ARI Downstream Port, the capability is communicated through 
the Device Capabilities 2 register.
2. For an ARI Device, the capability is communicated through the ARI 
Capability structure.


also according to the driver code pci_configure_ari(), I think my 
implementation does follows spec?


And as you know, only ARI feature is enabled, we return 
pci_dev->bus->devices[0]



And I'm not sure all functions must have the ARI capability.



do you means there maybe the following condition: ARI forwarding bit is 
enabled in downstream port, but the functions below, some have ARI 
Capability structure while the others don`t. Shouldn`t we forbid the 
condition? Because If this condition happens, I am not sure whether the 
device could work normally.


In the IMPLEMENTATION NOTE: ARI Forwarding Enable Being Set 
Inappropriately, It seems the spec don`t want that complicated 
condition? While actually, qemu calls pcie_cap_arifwd_init() in root 
port/downstream port initialization first.



I don't see why don't you just go by spec:



I do read the spec, and also referred the pcie driver code(see 
pci_configure_ari()). I think it is my inaccurate understanding about 
"upstream port" results in my implementation(I also consult PCISIG 
support team for this question, see attachment). The concept of 
"upstream port" in ARI device definition confuse me a lot.


Talking about the definition of ARI device, I always thought the 
"upstream port" should on the ARI device itself(like the figure I drew 
before), or else why the definition add the words "with an upstream 
port"? It seems to me that the emphasis is on "with an upstream port", 
and implies to me that the non-ARI device doesn`t have an upstream port. 
Seeing their replies in attachment, says that I am wrong at this point, 
both of them have an upstream port.(their saying actually make me more 
confused at that time, but now I think I am clear about this concept 
after reading your implementation below)


After reading your implementation, and the PCISIG support team replies 
again, finally I figure out that, "upstream port" in ARI device 
definition is just a port whose position is closer to root complex, and 
the point is, the "upstream port" doesn`t need to exist on the ARI 
device itself, which also means, take Figure 6-13 in PCIe spec 3.1, 

[Qemu-devel] 4k seq read splitting for virtio-blk - possible workarounds?

2015-10-26 Thread Andrey Korolyov
Hi,

during the test against generic storage backend with NBD frontend we
found that the virtio block device is always splitting a single read
range request to 4k ones, bringing the overall performance of the
sequential reads far below virtio-scsi. Random reads are going
relatively well on small blocks due to small overhead comparing to
sequential ones and writes are ok in all cases. Multiread slightly
improves the situation, but it would be nice to see complete
pass-through of range read requests down to backend without an
intermediate splitting.

Samples measured on an NBD backend during 128k sequential reads for
both virtio-blk and virtio-scsi are attached. Please let me know if it
looks like that I missed something or this behavior is plainly wrong.

Thanks!
125550: *NBD_CMD_READ from 513298432 (1002536) len 4096, exp->buf, +(READ from 
fd 5 offset 513298432 len 4096), buf->net, +OK!
125551: *NBD_CMD_READ from 513302528 (1002544) len 4096, exp->buf, +(READ from 
fd 5 offset 513302528 len 4096), buf->net, +OK!
125552: *NBD_CMD_READ from 513306624 (1002552) len 4096, exp->buf, +(READ from 
fd 5 offset 513306624 len 4096), buf->net, +OK!
125553: *NBD_CMD_READ from 513310720 (1002560) len 4096, exp->buf, +(READ from 
fd 5 offset 513310720 len 4096), buf->net, +OK!
125554: *NBD_CMD_READ from 513314816 (1002568) len 4096, exp->buf, +(READ from 
fd 5 offset 513314816 len 4096), buf->net, +OK!
12: *NBD_CMD_READ from 513318912 (1002576) len 4096, exp->buf, +(READ from 
fd 5 offset 513318912 len 4096), buf->net, +OK!
125556: *NBD_CMD_READ from 513323008 (1002584) len 4096, exp->buf, +(READ from 
fd 5 offset 513323008 len 4096), buf->net, +OK!
125557: *NBD_CMD_READ from 513327104 (1002592) len 4096, exp->buf, +(READ from 
fd 5 offset 513327104 len 4096), buf->net, +OK!
125558: *NBD_CMD_READ from 513331200 (1002600) len 4096, exp->buf, +(READ from 
fd 5 offset 513331200 len 4096), buf->net, +OK!
125559: *NBD_CMD_READ from 513335296 (1002608) len 4096, exp->buf, +(READ from 
fd 5 offset 513335296 len 4096), buf->net, +OK!
125560: *NBD_CMD_READ from 513339392 (1002616) len 4096, exp->buf, +(READ from 
fd 5 offset 513339392 len 4096), buf->net, +OK!
125561: *NBD_CMD_READ from 513343488 (1002624) len 4096, exp->buf, +(READ from 
fd 5 offset 513343488 len 4096), buf->net, +OK!
125562: *NBD_CMD_READ from 513347584 (1002632) len 4096, exp->buf, +(READ from 
fd 5 offset 513347584 len 4096), buf->net, +OK!
8294: *NBD_CMD_READ from 1071120384 (2092032) len 131072, exp->buf, +(READ from 
fd 5 offset 1071120384 len 131072), buf->net, +OK!
8295: *NBD_CMD_READ from 1071251456 (2092288) len 131072, exp->buf, +(READ from 
fd 5 offset 1071251456 len 131072), buf->net, +OK!
8296: *NBD_CMD_READ from 1071382528 (2092544) len 131072, exp->buf, +(READ from 
fd 5 offset 1071382528 len 131072), buf->net, +OK!
8297: *NBD_CMD_READ from 1071513600 (2092800) len 131072, exp->buf, +(READ from 
fd 5 offset 1071513600 len 131072), buf->net, +OK!
8298: *NBD_CMD_READ from 1071644672 (2093056) len 131072, exp->buf, +(READ from 
fd 5 offset 1071644672 len 131072), buf->net, +OK!
8299: *NBD_CMD_READ from 1071775744 (2093312) len 131072, exp->buf, +(READ from 
fd 5 offset 1071775744 len 131072), buf->net, +OK!
8300: *NBD_CMD_READ from 1071906816 (2093568) len 131072, exp->buf, +(READ from 
fd 5 offset 1071906816 len 131072), buf->net, +OK!
8301: *NBD_CMD_READ from 1072037888 (2093824) len 131072, exp->buf, +(READ from 
fd 5 offset 1072037888 len 131072), buf->net, +OK!
8302: *NBD_CMD_READ from 1072168960 (2094080) len 131072, exp->buf, +(READ from 
fd 5 offset 1072168960 len 131072), buf->net, +OK!
8303: *NBD_CMD_READ from 1072300032 (2094336) len 131072, exp->buf, +(READ from 
fd 5 offset 1072300032 len 131072), buf->net, +OK!
8304: *NBD_CMD_READ from 1072431104 (2094592) len 131072, exp->buf, +(READ from 
fd 5 offset 1072431104 len 131072), buf->net, +OK!
8305: *NBD_CMD_READ from 1072562176 (2094848) len 131072, exp->buf, +(READ from 
fd 5 offset 1072562176 len 131072), buf->net, +OK!
8306: *NBD_CMD_READ from 1072693248 (2095104) len 131072, exp->buf, +(READ from 
fd 5 offset 1072693248 len 131072), buf->net, +OK!
8307: *NBD_CMD_READ from 1072824320 (2095360) len 131072, exp->buf, +(READ from 
fd 5 offset 1072824320 len 131072), buf->net, +OK!


Re: [Qemu-devel] [PULL v4 00/52] Ivshmem patches

2015-10-26 Thread Marc-André Lureau
Hi Peter,

On Mon, Oct 19, 2015 at 5:57 PM, Peter Maydell  wrote:
>
> What is happening here is that we are looping infinitely in
> mktempshmem() because shm_open() returns -1 with errno ENOSYS,
> and there's no code in the loop that stops the loop on anything
> except shm_open succeeding, or even prints anything out about
> shm_open failing.
>
> I think this is failing for me because my system's chroot doesn't have
> /dev/shm mounted. It would be nice if we could at a minimum handle
> this reasonably gracefully...

The following diff works for me:

diff --git a/tests/ivshmem-test.c b/tests/ivshmem-test.c
index efaa6e3..c8f0cf0 100644
--- a/tests/ivshmem-test.c
+++ b/tests/ivshmem-test.c
@@ -441,13 +441,18 @@ static gchar *mktempshm(int size, int *fd)
 }

 g_free(name);
+
+if (errno != EEXIST) {
+perror("shm_open");
+return NULL;
+}
 }
 }

 int main(int argc, char **argv)
 {
 int ret, fd;
 gchar dir[] = "/tmp/ivshmem-test.XX";

 #if !GLIB_CHECK_VERSION(2, 31, 0)
 if (!g_thread_supported()) {
@@ -460,6 +465,9 @@ int main(int argc, char **argv)
 qtest_add_abrt_handler(abrt_handler, NULL);
 /* shm */
 tmpshm = mktempshm(TMPSHMSIZE, );
+if (!tmpshm) {
+return 0;
+}
 tmpshmem = mmap(0, TMPSHMSIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
 g_assert(tmpshmem != MAP_FAILED);
 /* server */

I rebased and updated the tag.


-- 
Marc-André Lureau



Re: [Qemu-devel] Reminder: we're now in softfreeze

2015-10-26 Thread Peter Maydell
On 26 October 2015 at 06:25, Peter Crosthwaite
 wrote:
> On Thu, Oct 22, 2015 at 3:30 AM, Peter Maydell  
> wrote:
>> For this release I'd like to try tracking known-issues on
>> the wiki page: http://wiki.qemu.org/Planning/2.5
>> with a list of patch series or bugs which need to be fixed
>> into the release. This will hopefully avoid the problem we
>> had last time around where I missed patches that should have
>> gone into an rc.
>
> Can patches fix this with the right form of query? Maybe another piece
> of metadata for patches that are promoted for rc merge.

...only if the submitter remembers to tag it when they send
it, and it doesn't help for issues that don't have a patch yet,
or for tracking whether a submaintainer has taken the patch.

Tagging your patch as "[PATCH for-2.5]" is a good idea anyway
once we're in freeze, though.

thanks
-- PMM



[Qemu-devel] [PATCH v2] pc: memhp: enforce minimal 128Mb alignment for pc-dimm

2015-10-26 Thread Igor Mammedov
commit aa8580cd "pc: memhp: force gaps between DIMM's GPA"
regressed memory hot-unplug for linux guests triggering
following BUGON
 =
 kernel BUG at mm/memory_hotplug.c:703!
 ...
 [] acpi_memory_device_remove+0x79/0xa5
 [] acpi_bus_trim+0x5a/0x8d
 [] acpi_device_hotplug+0x1b7/0x418
 ===
BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
 ===

reson for it is that x86-64 linux guest supports memory
hotplug in chunks of 128Mb and memory section also should
be 128Mb aligned.
However gaps forced between 128Mb DIMMs with backend's
natural alignment of 2Mb make the 2nd and following
DIMMs not being aligned on 128Mb boundary as it was
originally. To fix regression enforce minimal 128Mb
alignment like it was done for PPC.

Signed-off-by: Igor Mammedov 
---
PS:
  PAGE_SECTION_MASK is derived from SECTION_SIZE_BITS which
  is arch dependent so this is fix for x86-64 target only.
  If anyone cares obout 32bit guests, it should also be fine
  for x86-32 which has 64Mb memory sections/alignment.
---
 hw/i386/pc.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 3d958ba..0f7cf7c 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1610,6 +1610,8 @@ void ioapic_init_gsi(GSIState *gsi_state, const char 
*parent_name)
 }
 }
 
+#define PC_MIN_DIMM_ALIGNMENT (1ULL << 27) /* 128Mb */
+
 static void pc_dimm_plug(HotplugHandler *hotplug_dev,
  DeviceState *dev, Error **errp)
 {
@@ -1624,6 +1626,16 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
 
 if (memory_region_get_alignment(mr) && pcms->enforce_aligned_dimm) {
 align = memory_region_get_alignment(mr);
+/*
+ * Linux x64 guests expect 128Mb aligned DIMM,
+ * but this change causes memory layout change so
+ * for compatibility apply 128Mb alignment only
+ * when forced gaps are enabled since it is the cause
+ * of misalignment.
+ */
+if (pcmc->inter_dimm_gap && align < PC_MIN_DIMM_ALIGNMENT) {
+align = PC_MIN_DIMM_ALIGNMENT;
+}
 }
 
 if (!pcms->acpi_dev) {
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH] virtio: Notice when the system doesn't support MSIx at all

2015-10-26 Thread Mark Cave-Ayland
On 27/09/15 15:40, Peter Maydell wrote:

> Ping^2 -- this is still confusing users, cf bug LP:1500175.
> 
> thanks
> -- PMM
> 
> On 6 July 2015 at 11:38, Peter Maydell  wrote:
>> On 3 July 2015 at 09:22, Mark Cave-Ayland  
>> wrote:
>>> On 19/05/15 21:29, Richard Henderson wrote:
>>>
 And do not issue an error_report in that case.

 Signed-off-by: Richard Henderson 
 ---
  hw/virtio/virtio-pci.c | 15 ++-
  1 file changed, 10 insertions(+), 5 deletions(-)
 --

 This seems to be about the only sane way to test the value of
 msi_supported from here.  Thoughts?
>>
>>> Ping? Any chance of getting this applied for 2.4 since it's something
>>> that I do get emails about for SPARC64?
>>
>> Michael would like a respin with the commit message correction
>> he suggested. RTH -- could you send out a fixed up version, please?
>>
>> thanks
>> -- PMM

Another ping on this patch - please can we have it for 2.5? It has been
around since before 2.4 and I'd hate for it to miss another release :(


ATB,

Mark.




[Qemu-devel] [PATCH 3/7] linux-headers/kvm: add Hyper-V SynIC irq routing type and struct

2015-10-26 Thread Andrey Smetanin
Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Vitaly Kuznetsov 
CC: "K. Y. Srinivasan" 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: k...@vger.kernel.org
CC: virtualizat...@lists.linux-foundation.org

---
 linux-headers/linux/kvm.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index dcc410e..0bff588 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -831,6 +831,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_GUEST_DEBUG_HW_WPS 120
 #define KVM_CAP_SPLIT_IRQCHIP 121
 #define KVM_CAP_IOEVENTFD_ANY_LENGTH 122
+#define KVM_CAP_HYPERV_SYNIC 123
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -854,10 +855,16 @@ struct kvm_irq_routing_s390_adapter {
__u32 adapter_id;
 };
 
+struct kvm_irq_routing_hv_sint {
+   __u32 vcpu;
+   __u32 sint;
+};
+
 /* gsi routing entry types */
 #define KVM_IRQ_ROUTING_IRQCHIP 1
 #define KVM_IRQ_ROUTING_MSI 2
 #define KVM_IRQ_ROUTING_S390_ADAPTER 3
+#define KVM_IRQ_ROUTING_HV_SINT 4
 
 struct kvm_irq_routing_entry {
__u32 gsi;
@@ -868,6 +875,7 @@ struct kvm_irq_routing_entry {
struct kvm_irq_routing_irqchip irqchip;
struct kvm_irq_routing_msi msi;
struct kvm_irq_routing_s390_adapter adapter;
+   struct kvm_irq_routing_hv_sint hv_sint;
__u32 pad[8];
} u;
 };
-- 
2.4.3




Re: [Qemu-devel] [PATCH v4 03/13] target-arm: Add support for AArch32 S2 negative t0sz

2015-10-26 Thread Peter Maydell
On 26 October 2015 at 09:20, Edgar E. Iglesias
 wrote:
> Yes, sounds good. I've changed the patch to the following:
>
> @@ -6521,8 +6521,24 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
> target_ulong address,
>   */
>  int32_t t0sz = extract32(tcr->raw_tcr, 0, 6);
>  if (va_size == 64) {
> +/* AArch64 translation.  */
>  t0sz = MIN(t0sz, 39);
>  t0sz = MAX(t0sz, 16);
> +} else if (mmu_idx != ARMMMUIdx_S2NS) {
> +/* AArch32 stage 1 translation.  */
> +t0sz = extract32(t0sz, 0, 3);
> +} else {
> +/* AArch32 stage 2 translation.  */
> +bool sext = extract32(t0sz, 4, 1);
> +bool sign = extract32(t0sz, 3, 1);
> +t0sz = sextract32(t0sz, 0, 4);
> +
> +/* If the sign-extend bit is not the same as t0sz[3], the result
> + * is unpredictable. Flag this as a guest error.  */
> +if (sign != sext) {
> +qemu_log_mask(LOG_GUEST_ERROR,
> +  "AArch32: VTCR.S / VTCR.T0SZ[3] missmatch\n");
> +}
>  }
>

Looks good, but maybe we should just do all the extracts
on tcr->raw_tcr, rather than extracting 6 bits of it and
then re-extracting some subset of bits from that extract
(for the 32-bit stage 1 case in particular it would be
simpler).

-- PMM



[Qemu-devel] [PULL v5 00/51] Ivshmem patches

2015-10-26 Thread Marc-André Lureau
The following changes since commit bc79082e4cd12c1241fa03b0abceacf45f537740:

  Merge remote-tracking branch
'remotes/ehabkost/tags/x86-pull-request' into staging (2015-10-23
16:35:43 +0100)

are available in the git repository at:

  https://github.com/elmarco/qemu tags/ivshmem-pull-request

for you to fetch changes up to 7d4f4bdaf785dfe9fc41b06f85cc9aaf1b1474ee:

  doc: document ivshmem & hugepages (2015-10-26 10:19:53 +0100)


v5:
- check errno in test mktempshm()
- rebased


Andreas Färber (1):
  tests: Add ivshmem qtest

David Marchand (3):
  contrib: add ivshmem client and server
  docs: update ivshmem device spec
  ivshmem: add check on protocol version in QEMU

Marc-André Lureau (47):
  config: enable ivshmem on POSIX
  char: add qemu_chr_free()
  msix: add VMSTATE_MSIX_TEST
  ivhsmem: read do not accept more than sizeof(long)
  ivshmem: fix number of bytes to push to fifo
  ivshmem: factor out the incoming fifo handling
  ivshmem: remove unnecessary dup()
  ivshmem: remove superflous ivshmem_attr field
  ivshmem: remove useless doorbell field
  ivshmem: more qdev conversion
  ivshmem: remove last exit(1)
  ivshmem: limit maximum number of peers to G_MAXUINT16
  ivshmem: simplify around increase_dynamic_storage()
  ivshmem: allocate eventfds in resize_peers()
  ivshmem: remove useless ivshmem_update_irq() val argument
  ivshmem: initialize max_peer to -1
  ivshmem: remove max_peer field
  ivshmem: improve debug messages
  ivshmem: improve error handling
  ivshmem: print error on invalid peer id
  ivshmem: simplify a bit the code
  ivshmem: use common return
  ivshmem: use common is_power_of_2()
  ivshmem: migrate with VMStateDescription
  ivshmem: shmfd can be 0
  ivshmem: check shm isn't already initialized
  ivshmem: add device description
  ivshmem: fix pci_ivshmem_exit()
  ivshmem: replace 'guest' for 'peer' appropriately
  ivshmem: error on too many eventfd received
  ivshmem: reset mask on device reset
  util: const event_notifier_get_fd() argument
  ivshmem-client: check the number of vectors
  ivshmem-server: use a uint16 for client ID
  ivshmem-server: fix hugetlbfs support
  contrib: remove unnecessary strdup()
  msix: implement pba write (but read-only)
  qtest: add qtest_add_abrt_handler()
  tests: add ivshmem qtest
  ivshmem: do not keep shm_fd open
  ivshmem: use qemu_strtosz()
  ivshmem: add hostmem backend
  ivshmem: remove EventfdEntry.vector
  ivshmem: rename MSI eventfd_table
  ivshmem: use kvm irqfd for msi notifications
  ivshmem: use little-endian int64_t for the protocol
  doc: document ivshmem & hugepages

 Makefile|   7 +
 Makefile.objs   |   5 +
 configure   |   1 +
 contrib/ivshmem-client/Makefile.objs|   1 +
 contrib/ivshmem-client/ivshmem-client.c | 446
++
 contrib/ivshmem-client/ivshmem-client.h | 213 +
 contrib/ivshmem-client/main.c   | 240 +++
 contrib/ivshmem-server/Makefile.objs|   1 +
 contrib/ivshmem-server/ivshmem-server.c | 491
+
 contrib/ivshmem-server/ivshmem-server.h | 167 +
 contrib/ivshmem-server/main.c   | 263 
 default-configs/pci.mak |   2 +-
 docs/specs/ivshmem_device_spec.txt  | 127 +++---
 hw/misc/ivshmem.c   | 837

 hw/pci/msix.c   |   6 +
 include/hw/misc/ivshmem.h   |  25 ++
 include/hw/pci/msix.h   |  16 +-
 include/qemu/event_notifier.h   |   2 +-
 include/sysemu/char.h   |  10 +-
 qemu-char.c |   9 +-
 qemu-doc.texi   |  23 +-
 tests/Makefile  |   3 +
 tests/ivshmem-test.c| 491
+
 tests/libqtest.c|  37 ++-
 tests/libqtest.h|   2 +
 util/event_notifier-posix.c |   2 +-
 26 files changed, 3106 insertions(+), 321 deletions(-)
 create mode 100644 contrib/ivshmem-client/Makefile.objs
 create mode 100644 contrib/ivshmem-client/ivshmem-client.c
 create mode 100644 contrib/ivshmem-client/ivshmem-client.h
 create mode 100644 contrib/ivshmem-client/main.c
 create mode 100644 contrib/ivshmem-server/Makefile.objs
 create mode 100644 contrib/ivshmem-server/ivshmem-server.c
 create mode 100644 contrib/ivshmem-server/ivshmem-server.h
 create mode 100644 contrib/ivshmem-server/main.c
 create mode 100644 include/hw/misc/ivshmem.h
 

Re: [Qemu-devel] Reminder: we're now in softfreeze

2015-10-26 Thread Mark Cave-Ayland
On 26/10/15 10:01, Peter Maydell wrote:

> On 26 October 2015 at 09:56, Mark Cave-Ayland
>  wrote:
>> Maybe we also need a [PING for-2.5] or similar? I've just sent another
>> couple of pings for patches (one of which fixes the broken
>> analyze-migration.py script) which have fallen through the cracks over
>> the past couple of months. It would be nice to have an easy way to flag
>> outstanding patches like this, especially during freeze when things get
>> really busy.
> 
> That's what I'm hoping the wiki page will work for. Unlike the
> mailing list, it's very short and easy to scan through :-)

Got it. I'll go add some entries later.


ATB,

Mark.




Re: [Qemu-devel] [PATCH v2 0/2] Enforce gaps between DIMMs

2015-10-26 Thread Michael S. Tsirkin
On Mon, Oct 26, 2015 at 01:30:36PM +0530, Bharata B Rao wrote:
> The suggested way to work around the virtio bug reported here
> 
> http://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg00522.html
> 
> is to introduce gaps between DIMMs. Igor's patchset changes the pc-dimm
> auto-address assignment to introduce gaps and ues the same from pc memhp.
> This patchset does the same for sPAPR PowerPC.
> 
> Before introducing the gap, ensure that memory hotplug region has enough
> room for alignment adjustment. We accommodate a max alignment of 256MB for
> each slot since sPAPR memory hotplug enforces an alignment requirement of
> 256MB on RAM size, maxmem and NUMA node mem sizes.
> 
> This applies on David's spapr-next branch.

This is already creating problems on x86.


Instead of propagating this all over the place,
let's fix things properly to either

- handle requests that cross the DIMMs
or
- make sure guest physically contigious implies host virtually
  contigious

> Changes in v2
> -
> - Minor rewording of patch description and code comment in 1/2.
> 
> v1: http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg02414.html
> v0: https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg00749.html
> 
> Bharata B Rao (2):
>   spapr: Accommadate alignment gaps in hotplug memory region
>   spapr: Force gaps between DIMM's GPA
> 
>  hw/ppc/spapr.c | 23 ++-
>  1 file changed, 18 insertions(+), 5 deletions(-)
> 
> -- 
> 2.1.0



Re: [Qemu-devel] [PATCH v2] pc: memhp: enforce minimal 128Mb alignment for pc-dimm

2015-10-26 Thread Michael S. Tsirkin
On Mon, Oct 26, 2015 at 10:46:55AM +0100, Igor Mammedov wrote:
> commit aa8580cd "pc: memhp: force gaps between DIMM's GPA"
> regressed memory hot-unplug for linux guests triggering
> following BUGON
>  =
>  kernel BUG at mm/memory_hotplug.c:703!
>  ...
>  [] acpi_memory_device_remove+0x79/0xa5
>  [] acpi_bus_trim+0x5a/0x8d
>  [] acpi_device_hotplug+0x1b7/0x418
>  ===
> BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
>  ===
> 
> reson for it is that x86-64 linux guest supports memory
> hotplug in chunks of 128Mb and memory section also should
> be 128Mb aligned.
> However gaps forced between 128Mb DIMMs with backend's
> natural alignment of 2Mb make the 2nd and following
> DIMMs not being aligned on 128Mb boundary as it was
> originally. To fix regression enforce minimal 128Mb
> alignment like it was done for PPC.
> 
> Signed-off-by: Igor Mammedov 

So our temporary work around is creating more trouble.  I'm inclined to just
revert aa8580cd and df0acded19 with it.

> ---
> PS:
>   PAGE_SECTION_MASK is derived from SECTION_SIZE_BITS which
>   is arch dependent so this is fix for x86-64 target only.
>   If anyone cares obout 32bit guests, it should also be fine
>   for x86-32 which has 64Mb memory sections/alignment.

Like 32 bit guests are unheard of?  This does not inspire confidence at all.


So I dug in linux guest code:

#ifdef CONFIG_X86_32
# ifdef CONFIG_X86_PAE
#  define SECTION_SIZE_BITS 29
#  define MAX_PHYSADDR_BITS 36
#  define MAX_PHYSMEM_BITS  36
# else
#  define SECTION_SIZE_BITS 26
#  define MAX_PHYSADDR_BITS 32
#  define MAX_PHYSMEM_BITS  32
# endif
#else /* CONFIG_X86_32 */
# define SECTION_SIZE_BITS  27 /* matt - 128 is convenient right now */
# define MAX_PHYSADDR_BITS  44
# define MAX_PHYSMEM_BITS   46
#endif

Looks like PAE needs more alignment.
And it looks like 128 is arbitrary here.

So we are tying ourselves to specific guest quirks.
All this just looks wrong to me.


> ---
>  hw/i386/pc.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 3d958ba..0f7cf7c 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1610,6 +1610,8 @@ void ioapic_init_gsi(GSIState *gsi_state, const char 
> *parent_name)
>  }
>  }
>  
> +#define PC_MIN_DIMM_ALIGNMENT (1ULL << 27) /* 128Mb */
> +

This kind of comment doesn't really help.

>  static void pc_dimm_plug(HotplugHandler *hotplug_dev,
>   DeviceState *dev, Error **errp)
>  {
> @@ -1624,6 +1626,16 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
>  
>  if (memory_region_get_alignment(mr) && pcms->enforce_aligned_dimm) {
>  align = memory_region_get_alignment(mr);
> +/*
> + * Linux x64 guests expect 128Mb aligned DIMM,

this implies no other guest cares. which isn't true.

> + * but this change

which change?

> causes memory layout change

change compared to what?

> so
> + * for compatibility

compatibility with what?

> apply 128Mb alignment only
> + * when forced gaps are enabled since it is the cause
> + * of misalignment.

Which makes no sense, sorry.

Can it be misaligned for some other reason?

If not, why limit to this case?

> + */
> +if (pcmc->inter_dimm_gap && align < PC_MIN_DIMM_ALIGNMENT) {
> +align = PC_MIN_DIMM_ALIGNMENT;
> +}
>  }
>  
>  if (!pcms->acpi_dev) {

All this sounds pretty fragile. How about we revert the inter dimm gap
thing for 2.4? It's just a work around, this is piling work arounds on
top of work arounds.

> -- 
> 1.8.3.1



Re: [Qemu-devel] [PATCH 3/4] ide: add support for cancelable read requests

2015-10-26 Thread Stefan Hajnoczi
On Mon, Oct 12, 2015 at 02:27:24PM +0200, Peter Lieven wrote:
> this patch adds a new aio readv compatible function which copies
> all data through a bounce buffer. The benefit is that these requests
> can be flagged as canceled to avoid guest memory corruption when
> a canceled request is completed by the backend at a later stage.
> 
> If an IDE protocol wants to use this function it has to pipe
> all read requests through ide_readv_cancelable and it may then
> enable requests_cancelable in the IDEState.
> 
> If this state is enable we can avoid the blocking blk_drain_all
> in case of a BMDMA reset.
> 
> Currently only read operations are cancelable thus we can only
> use this logic for read-only devices.

Naming is confusing here.  Requests are already "cancelable" using
bdv_aio_cancel().

Please use a different name, for example "orphan" requests.  These are
requests that QEMU still knows about but the guest believes are
complete.  Or maybe "IDEBufferedRequest" since data is transferred
through a bounce buffer.

> Signed-off-by: Peter Lieven 
> ---
>  hw/ide/core.c | 54 ++
>  hw/ide/internal.h | 16 
>  hw/ide/pci.c  | 42 --
>  3 files changed, 98 insertions(+), 14 deletions(-)
> 
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 317406d..24547ce 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -561,6 +561,59 @@ static bool ide_sect_range_ok(IDEState *s,
>  return true;
>  }
>  
> +static void ide_readv_cancelable_cb(void *opaque, int ret)
> +{
> +IDECancelableRequest *req = opaque;
> +if (!req->canceled) {
> +if (!ret) {
> +qemu_iovec_from_buf(req->org_qiov, 0, req->buf, 
> req->org_qiov->size);
> +}
> +req->org_cb(req->org_opaque, ret);
> +}
> +QLIST_REMOVE(req, list);
> +qemu_vfree(req->buf);
> +qemu_iovec_destroy(>qiov);
> +g_free(req);
> +}
> +
> +#define MAX_CANCELABLE_REQS 16
> +
> +BlockAIOCB *ide_readv_cancelable(IDEState *s, int64_t sector_num,
> + QEMUIOVector *iov, int nb_sectors,
> + BlockCompletionFunc *cb, void *opaque)
> +{
> +BlockAIOCB *aioreq;
> +IDECancelableRequest *req;
> +int c = 0;
> +
> +QLIST_FOREACH(req, >cancelable_requests, list) {
> +c++;
> +}
> +if (c > MAX_CANCELABLE_REQS) {
> +return NULL;
> +}

A BH is probably needed here to schedule an cb(-EIO) call since this
function isn't supposed to return NULL if it's a direct replacement for
blk_aio_readv().

> +
> +req = g_new0(IDECancelableRequest, 1);
> +qemu_iovec_init(>qiov, 1);

It saves a g_new() call if you add a struct iovec field to
IDECancelableRequest and use qemu_iovec_init_external() instead of
qemu_iovec_init().

The qemu_iovec_destroy() calls must be dropped when an external struct
iovec is used.

The qemu_iovec_init_external() call must be moved after the
qemu_blockalign() and struct iovec setup below.

> +req->buf = qemu_blockalign(blk_bs(s->blk), iov->size);
> +qemu_iovec_add(>qiov, req->buf, iov->size);
> +req->org_qiov = iov;
> +req->org_cb = cb;
> +req->org_opaque = opaque;
> +
> +aioreq = blk_aio_readv(s->blk, sector_num, >qiov, nb_sectors,
> +   ide_readv_cancelable_cb, req);
> +if (aioreq == NULL) {
> +qemu_vfree(req->buf);
> +qemu_iovec_destroy(>qiov);
> +g_free(req);
> +} else {
> +QLIST_INSERT_HEAD(>cancelable_requests, req, list);
> +}
> +
> +return aioreq;
> +}
> +
>  static void ide_sector_read(IDEState *s);
>  
>  static void ide_sector_read_cb(void *opaque, int ret)
> @@ -805,6 +858,7 @@ void ide_start_dma(IDEState *s, BlockCompletionFunc *cb)
>  s->bus->retry_unit = s->unit;
>  s->bus->retry_sector_num = ide_get_sector(s);
>  s->bus->retry_nsector = s->nsector;
> +s->bus->s = s;

How is 's' different from 'unit' and 'retry_unit'?

The logic for switching between units is already a little tricky since
the guest can write to the hardware registers while requests are
in-flight.

Please don't duplicate "active unit" state, that increases the risk of
inconsistencies.

Can you use idebus_active_if() to get an equivalent IDEState pointer
without storing s?

>  if (s->bus->dma->ops->start_dma) {
>  s->bus->dma->ops->start_dma(s->bus->dma, s, cb);
>  }
> diff --git a/hw/ide/internal.h b/hw/ide/internal.h
> index 05e93ff..ad188c2 100644
> --- a/hw/ide/internal.h
> +++ b/hw/ide/internal.h
> @@ -343,6 +343,16 @@ enum ide_dma_cmd {
>  #define ide_cmd_is_read(s) \
>   ((s)->dma_cmd == IDE_DMA_READ)
>  
> +typedef struct IDECancelableRequest {
> +QLIST_ENTRY(IDECancelableRequest) list;
> +QEMUIOVector qiov;
> +uint8_t *buf;
> +QEMUIOVector *org_qiov;
> +BlockCompletionFunc *org_cb;
> +void *org_opaque;

Please don't shorten names, 

Re: [Qemu-devel] [RFC PATCH v3 1/4] hw/intc/arm_gicv3_common: Add state information

2015-10-26 Thread Peter Maydell
On 26 October 2015 at 07:43, Pavel Fedin  wrote:
>  Hello!
>
>  I skipped many of comments because they are straightforward to address, 
> decided to discuss only important ones.
>
>> So we're going to (potentially) emulate:
>>  * non-system-register config
>>  * non-affinity-routing config
>>
>> ? OK, but are we really sure we want to do that? Legacy config
>> is deprecated and it's really going to complicate the model...
>
>  I remember that you told me that we are going to emulate full-blown
> GICv3, with HYP support and will all legacy stuff. You told me about
> this while merging the initial GICv3 series, so we reserved MMIO space
> for legacy CPU interface. So, i was pretty confident that we are going
> to do this over time, aren't we?

Yeah, you're right -- we should at least leave ourselves the room
to do that.

>> (Are we starting out with the non-legacy config that forces
>> system-regs and affinity-routing to always-on?)
>
>  Yes, we are, but i also remember that you told me that migration data
> format, once implemented, is set in stone, so we should think very well.

You'll see from a comment I made in a later patch that we can have
optional migration subsections, so we can have the data which is
only needed in legacy configs be in its own subsection which won't
break migration for non-legacy setups. It's the non-legacy used by
KVM format we really need to get right from the start.

> (*) So far, i added also the following legacy stuff:
>
> 1. GICC_CTLR
>
>  This is currently used to store GRPEN bits. I thought that having
> dedicated bool's for them is too much, they are just single bits,
> and they have to be mirrored in GICC_CTLR, once implemented. So i
> just squashed them in there from the beginning. Additionally, some
> of its bits, like FIQBypDisGrpX and IRQBypDisGrpX, do not have
> analogs in system registers. So, if we even implement them, we'll
> have to store them in dedicated place, and now we already have this
> place.

The only reason not to want to do this is that it means we have
legacy state in the not-legacy parts of the migration struct...

> 2. GIC_SPENDSGIR (**)
>
>  Actually this is not used by in-kernel vGIC, but this is necessary
> for SW emulation. Because, as far as i can understand Shlomo's code,
> for software emulation we need to track down which source CPUs are
> sending SGIs, even if we don't emulate legacy interface. Because,
> for example, if two CPUs send an SGI to another CPU at the same time,
> the destination CPU should actually get two interrupts. So, we have
> to track down whose interrupts are already delivered and whose are
> not yet. And Shlomo's code uses a bitmask for that, which i put in
> GICv3SGISource.

I haven't looked into the details but this suggests to me that the
emulation code is doing something wrong. The GICv3 registers should
expose (one way or another) all the state in hardware, and a sw
emulation version should not need any extra state in order to behave
correctly.

>> > +struct GICv3SGISource {
>> > +/* For each SGI on the target CPU, we store bit mask
>> > + * indicating which source CPUs have made this SGI
>> > + * pending on the target CPU. These correspond to
>> > + * the bytes in the GIC_SPENDSGIR* registers as
>> > + * read by the target CPU.
>> > + */
>> > +unsigned long *pending;
>> > +int32_t size; /* Bitmap size for migration */
>> > +};
>>
>> This doesn't look right. There is one GICD_SPENDSGIR* register set
>> for each CPU, but each CPU has only 8 pending bits per SGI.
>> (That's because this is only relevant for legacy operation
>> with affinity-routing disabled, and the limitations on
>> legacy operation apply.) So you don't need a huge bitmap here.
>> I would default to modelling this as uint32_t spendsgir[4]
>> unless there's a good reason to do something else.
>
>  This is about (**). Or do you want to tell that GICv3 with affinity
> routing enabled simply doesn't care about source CPUs, and if several
> CPUs trigger the same SGI for the same destination, the destination
> gets only one SGI?

There is no hidden state in the GIC. If a non-legacy GICv3 has
no register state to store information about source CPUs, then
it cannot care about source CPUs. (For instance, note that the
OS is not passed info about SGI source CPUs in the INTID bits
[12:10] on reads of GICC_IAR if affinity routing is enabled.)
One of the GICv2-to-GICv3+affinity changes is that in GICv3
SGIs were banked by both source and destination processor;
in GICv3+affinity-routing, they are banked only by destination
processor.

thanks
-- PMM



Re: [Qemu-devel] [PATCH] hw/usb/dev-audio.c: make USB audio card sound perfect

2015-10-26 Thread Stefan Hajnoczi
On Fri, Oct 16, 2015 at 09:54:12AM -0400, Programmingkid wrote:
> 
> On Oct 16, 2015, at 8:15 AM, Stefan Hajnoczi wrote:
> 
> > On Tue, Sep 22, 2015 at 07:32:01PM -0400, Programmingkid wrote:
> >> The USB audio card would not play audio well because its buffer was too 
> >> small. 
> >> Increasing it made it play perfectly. All the crackling and dropouts are 
> >> gone.  
> >> 
> >> Signed-off-by: John Arbuckle 
> >> 
> >> ---
> >> This patch was tested on qemu-system-ppc running Linux and qemu-system-i386
> >> running Windows XP. Windows XP sound output thru the USB audio card sounded
> >> perfect. Linux did improve in sound quality, but it still wasn't perfect. I
> >> think there are problems with the hcd-ohci.c file. The Mac OS 10.2 guest 
> >> in 
> >> qemu-system-ppc did not play sound due to USB issues unrelated to this 
> >> patch. 
> >> 
> >> hw/usb/dev-audio.c |3 ++-
> >> 1 files changed, 2 insertions(+), 1 deletions(-)
> >> 
> >> diff --git a/hw/usb/dev-audio.c b/hw/usb/dev-audio.c
> >> index f092bb8..e4e4989 100644
> >> --- a/hw/usb/dev-audio.c
> >> +++ b/hw/usb/dev-audio.c
> >> @@ -88,6 +88,7 @@ static const USBDescStrings usb_audio_stringtable = {
> >> #define USBAUDIO_PACKET_SIZE 192
> >> #define USBAUDIO_SAMPLE_RATE 48000
> >> #define USBAUDIO_PACKET_INTERVAL 1
> >> +#define BUFFER_MULTIPLIER32
> >> 
> >> static const USBDescIface desc_iface[] = {
> >> {
> >> @@ -664,7 +665,7 @@ static const VMStateDescription vmstate_usb_audio = {
> >> static Property usb_audio_properties[] = {
> >> DEFINE_PROP_UINT32("debug", USBAudioState, debug, 0),
> >> DEFINE_PROP_UINT32("buffer", USBAudioState, buffer,
> >> -   8 * USBAUDIO_PACKET_SIZE),
> >> +   BUFFER_MULTIPLIER * USBAUDIO_PACKET_SIZE),
> > 
> > I'm not familiar with the code but I guess this also increases audio
> > latency by a factor of 4 (8 -> 32).
> 
> Didn't hear any problems. When I tried it out. 

8 buffers * 192 bytes / 3 byte (24-bit) mono samples = 512 samples
32 buffers * 192 bytes / 3 byte (24-bit) mono samples = 2048 samples

At 48 kHz sample rate that is 10.6 milliseconds vs 42.6 milliseconds
latency.

I have never tried real-time audio processing under QEMU but when I use
Linux for guitar effects it becomes noticable when latency is above
around 12 milliseconds.  ~5 milliseconds latency with USB audio is
achievable on bare metal.

So this change would make real-time audio feel laggy.  Unless Gerd has a
strong feeling that it's an improvement for QEMU, I wouldn't merge this
change.

Stefan



Re: [Qemu-devel] [PATCH] virtio: Notice when the system doesn't support MSIx at all

2015-10-26 Thread Michael S. Tsirkin
On Mon, Oct 26, 2015 at 10:47:48AM +, Mark Cave-Ayland wrote:
> On 26/10/15 10:14, Peter Maydell wrote:
> 
> > On 26 October 2015 at 10:09, Mark Cave-Ayland
> >  wrote:
> >> On 26/10/15 10:03, Michael S. Tsirkin wrote:
> >>> Pls address the issues if you want this applied.
> >>
> >> The patch came from Richard rather than myself - my only involvement has
> >> been to champion the patch since both myself and Peter have had reports
> >> of this message confusing users.
> > 
> > Isn't this patch already in master as commit 0d583647a7fc73f6 ?
> 
> Oh I think you're right! I had that thread tagged and hadn't seen a
> reply come through, but I guess Richard must have attached it to one of
> his recent pull requests.

I did. I don't do replies, I merge patches then change my mind and drop
them too many times to make "applied thanks" work.  As I send pulls
about once a week, and Cc contributors, you only need to watch these to
know whether stuff is merged.


> In that case apologies to all for the noise.
> 
> 
> ATB,
> 
> Mark.



Re: [Qemu-devel] [RFC PATCH v3 3/4] hw/intc/arm_gicv3_kvm: Implement get/put functions

2015-10-26 Thread Peter Maydell
On 26 October 2015 at 07:59, Pavel Fedin  wrote:
>  Hello!
>
>> > +reg = c->pendbaser & (GICR_PENDBASER_OUTER_CACHEABILITY_MASK |
>> > +  GICR_PENDBASER_ADDR_MASK |
>> > +  GICR_PENDBASER_SHAREABILITY_MASK |
>> > +  GICR_PENDBASER_CACHEABILITY_MASK);
>> > +if (!c->redist_ctlr & GICR_CTLR_ENABLE_LPIS) {
>> > +reg |= GICR_PENDBASER_PTZ;
>> > +}
>>
>> Why does the state of the pendbaser register depend on state in the
>> redist_ctlr ?
>
>  PTZ bit is write-only, we cannot read it back. And spec says that setting 
> PTZ is adviced while LPIs are not enabled, because it shortens down the time 
> of GIC initialization. So, i had to implement this small heuristics here. Is 
> this approach OK?

OK, with a comment to say that's what we're doing. (I assume that
when we support LPIs we'll then set PTZ appropriately, so this
code will change later.)

>> Worth a comment, whatever the answer is.
>
>  I will.
>
>> > +kvm_gicr_access(s, GICR_PENDBASER, ncpu, , false);
>> > +c->pendbaser = reg & (GICR_PENDBASER_OUTER_CACHEABILITY_MASK |
>> > +  GICR_PENDBASER_ADDR_MASK |
>> > +  GICR_PENDBASER_SHAREABILITY_MASK |
>> > +  GICR_PENDBASER_CACHEABILITY_MASK);
>>
>> Why do we need to mask these values?
>
>  I decided to do this at least for the case of KVM->TCG migration (as far as 
> i understand, such things are possible). In this case i think we should not 
> pollute our state with read-only bits, which get added by the emulation code 
> itself.

We don't do this for other system registers which might contain
RO bits, so I think for consistency we shouldn't mask bits out
here either.

(Transferring RO bits in migration state gives the destination
an opportunity in theory to reject a migration which is for
a config it can't handle. And reserved bits may end up having a
use in a future GIC version, so it's nice not to have to do a
QEMU update just to remove them from the mask.)

thanks
-- PMM



Re: [Qemu-devel] [PATCH 15/19] pc: acpi: bump DSDT revision compliance to v2

2015-10-26 Thread Michael S. Tsirkin
On Mon, Oct 26, 2015 at 12:06:51PM +0100, Igor Mammedov wrote:
> On Mon, 26 Oct 2015 12:07:50 +0200
> "Michael S. Tsirkin"  wrote:
> 
> > On Mon, Oct 26, 2015 at 11:03:01AM +0100, Igor Mammedov wrote:
> > > On Sat, 24 Oct 2015 22:40:59 +0300
> > > "Michael S. Tsirkin"  wrote:
> > > 
> > > > On Fri, Oct 23, 2015 at 04:57:18PM +0200, Igor Mammedov wrote:
> > > > > it turns on 64-bit integer handling in OSPM, which we could use
> > > > > for writing simpler/smaller AML code.
> > > > > Tested with Windows XP and Windows Server 2008, Linux:
> > > > >  * XP doesn't care about revision and continues to use 32 integers
> > > > >and boots just fine with this change.
> > > > >  * WS 2008 and Linux - support rev2 and use 64-bit integers
> > > > > 
> > > > > Signed-off-by: Igor Mammedov 
> > > > 
> > > > This is still planned, right? IIUC you didn't post any code
> > > > that needs the 64 bit math.
> > > nope, the next patch 16/19 uses 64-bit math,
> > > 
> > > it greatly simplifies _CRS as we don't have to do
> > > 64-bit math manually using 32-bit integers.
> > > 
> > > And even if we put new MHPD.MCRS() that uses 64-bit math in SSDT,
> > > it won't crash XP unless someone would try to do memory hotplug
> > > 
> > > and even it could be 'fixed' if we check _REV on every
> > > hotplug event, it's a bit ugly but should work.
> > 
> > Aha. That's exactly what I said. All that patch commit log
> > says is
> > pc: acpi: memhp: move MHPD.MCRS method into MHPT table
> > when in fact you also rework it all to use 64 bit math.
> yep, it's my fault. I'll fix it.

Don't fix the comment. Just do it the right way.

> > 
> > So pls don't do this. Pls start by rewriting ASL in C
> > while keeping AML code identical. Cleanups - next.
> That's what I'm trying to avoid in this case as 
> reworked code is much simpler than the original ASL.
> Rewriting original complex MHPD.MCRS() ASL into C and
> immediately replacing it with simplified version is
> rather counterproductive especially taking in account that
> 'make check' tests will fail anyway since code is moved
> between tables and C-produced AML doesn't exactly match
> blobs produced by a particular IASL version anyway.

So make it match.
It seems pretty robust ATM, since we disassemble and
only compare the dis-assembled output.
Let's keep that invariant.

> But I can certainly do/split it other way around,
> simplify ASL first and then rewrite result in C.
> That should make reviewing easier even if tests
> are broken.

So your untested (in the field, I trust you to test your code) C matches
untested ASL. This doesn't help too much.

Sorry, IMO that's the wrong way to prioritize things. I value ease of
testing and reviewing patches much higher than ease of writing them,
and stability higher than any individual feature.

The original migration into QEMU was completely seemless because of full
bit for bit compatibility with seabios.

I bent the rules when the stuff was first rewritten in C and this
introduced some regressions, but there was just too much demand for the
stuff, people had real trouble hacking in 3 languages (python/C/ASL),
and that was holding up multiple features so some instability was worth
it.

Nothing like this here, this is just general cleanup.
Let's do it the slow, safe way please.

> > 
> > > > 
> > > > > ---
> > > > >  hw/i386/acpi-build.c  | 2 +-
> > > > >  hw/i386/acpi-dsdt.dsl | 2 +-
> > > > >  hw/i386/q35-acpi-dsdt.dsl | 2 +-
> > > > >  3 files changed, 3 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > > > index 8add4d9..c929540 100644
> > > > > --- a/hw/i386/acpi-build.c
> > > > > +++ b/hw/i386/acpi-build.c
> > > > > @@ -1484,7 +1484,7 @@ build_dsdt(GArray *table_data, GArray *linker, 
> > > > > AcpiMiscInfo *misc)
> > > > >  
> > > > >  memset(dsdt, 0, sizeof *dsdt);
> > > > >  build_header(linker, table_data, dsdt, "DSDT",
> > > > > - misc->dsdt_size, 1);
> > > > > + misc->dsdt_size, 2);
> > > > >  }
> > > > >  
> > > > >  static GArray *
> > > > > diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl
> > > > > index 8dba096..6d46b36 100644
> > > > > --- a/hw/i386/acpi-dsdt.dsl
> > > > > +++ b/hw/i386/acpi-dsdt.dsl
> > > > > @@ -22,7 +22,7 @@ ACPI_EXTRACT_ALL_CODE AcpiDsdtAmlCode
> > > > >  DefinitionBlock (
> > > > >  "acpi-dsdt.aml",// Output Filename
> > > > >  "DSDT", // Signature
> > > > > -0x01,   // DSDT Compliance Revision
> > > > > +0x02,   // DSDT Compliance Revision
> > > > >  "BXPC", // OEMID
> > > > >  "BXDSDT",   // TABLE ID
> > > > >  0x1 // OEM Revision
> > > > > diff --git a/hw/i386/q35-acpi-dsdt.dsl b/hw/i386/q35-acpi-dsdt.dsl
> > > > > index 7be7b37..ecefdec 100644
> > > > > --- a/hw/i386/q35-acpi-dsdt.dsl
> > > > > +++ b/hw/i386/q35-acpi-dsdt.dsl
> 

Re: [Qemu-devel] Coding style for errors

2015-10-26 Thread Stefan Hajnoczi
On Fri, Oct 23, 2015 at 07:34:50PM +0200, Lluís Vilanova wrote:
> Stefan Hajnoczi writes:
> 
> > On Thu, Oct 22, 2015 at 03:30:34PM +0200, Lluís Vilanova wrote:
> >> Markus Armbruster writes:
> >> 
> >> > Lluís Vilanova  writes:
> >> [...]
> >> >> So, is there any agreement on what should be used? If so, could that 
> >> >> please be
> >> >> added to CODING_STYLE?
> >> 
> >> > I think HACKING would be a better fit.
> >> 
> >> What about this? (at the end of HACKING) Feel free to add references to 
> >> other
> >> functions you think are important. I'll send a patch once we agree on the 
> >> text.
> >> 
> >> Cheers,
> >> Lluis
> >> 
> >> 
> >> 7. Error reporting
> 
> > Guest-triggerable errors should not terminate QEMU.  There are plently
> > of examples where this is violated today but there are good reasons to
> > stop doing it.
> 
> > Denial of service cases:
> 
> > 1. If a guest userspace application is somehow able to trigger a QEMU
> >abort, then an unprivileged guest application is able to bring down
> >the whole VM.
> 
> > 2. If nested virtualization is used, it's possible that a nested guest
> >can kill its parent, and thereby also kill its sibling VMs.
> 
> > 3. abort(3) is heavyweight if crash reporting/coredumps are enabled.  A
> >broken/malicious guest that keeps triggering abort(3) can be a big
> >nuisance that consumes memory, disk, and CPU resources.
> 
> > Emulated hardware should behave the same way that physical hardware
> > behaves.  This may mean that the device becomes non-operational (ignores
> > or fails new requests) until the next hard or soft reset.
> 
> I'm not sure how this should be translated into the form of error-reporting
> guidelines.

It needs to be close to the text you are adding since your text
mentioned immediate exits and abort(3).  People should be aware that
using those approaches can introduce security problems.

Stefan



Re: [Qemu-devel] Reminder: we're now in softfreeze

2015-10-26 Thread Amit Shah
Hey Peter,

On (Thu) 22 Oct 2015 [11:30:57], Peter Maydell wrote:
> [Apologies for the huge cc list, which is basically "everybody
> I have accepted a pullreq from this release cycle.]
> 
> Just a reminder that we're now in softfreeze (ie "no new big
> features, start the process of cutting down towards only bugfixes
> going into master"). Hardfreeze will be on the 12th November,
> in three weeks' time. If submaintainers can aim to get pull
> requests in early rather than all on the 12th that would be
> nice, otherwise we get a big logjam on rc0 day and your
> chances of not getting new-feature code in go up sharply.

Juan and I hope to get the postcopy series in 2.5; it's been on the
list for a long while and it's mostly reviewed-by.  There are a few
patches that need reviews from the latest series (minor, but still
need reviewed-by).

Hope this meets the freeze rules.

Thanks,

Amit



[Qemu-devel] [PATCH 0/9] block: Fixes for bdrv_drain

2015-10-26 Thread Fam Zheng
Previously bdrv_drain and bdrv_drain_all don't handle ioctl, flush and discard
requests (which are fundamentally the same as read and write requests that
change disk state).  Forgetting such requests leaves us in risk of violating
the invariant that bdrv_drain() callers rely on - all asynchronous requests
must have completed after bdrv_drain returns.

Enrich the tracked request types, and add tracked_request_begin/end pairs to
all three code paths. As a prerequisite, ioctl code is moved into coroutine
too.

The last two patches take care of QED's "need check" timer, so that after
bdrv_drain returns, the driver is in a consistent state.

Fam


Fam Zheng (9):
  block: Add more types for tracked request
  block: Track flush requests
  block: Track discard requests
  iscsi: Emulate commands in iscsi_aio_ioctl as iscsi_ioctl
  block: Add ioctl parameter fields to BlockRequest
  block: Emulate bdrv_ioctl with bdrv_aio_ioctl and track both
  block: Drop BlockDriver.bdrv_ioctl
  block: Introduce BlockDriver.bdrv_drain callback
  qed: Implement .bdrv_drain

 block/io.c| 144 +++---
 block/iscsi.c |  72 ---
 block/qed.c   |  13 +
 block/raw-posix.c |   9 ---
 block/raw_bsd.c   |   6 --
 include/block/block.h |  16 --
 include/block/block_int.h |  17 +-
 7 files changed, 200 insertions(+), 77 deletions(-)

-- 
2.4.3




[Qemu-devel] [PATCH 1/9] block: Add more types for tracked request

2015-10-26 Thread Fam Zheng
We'll track more request types besides read and write, change the
boolean field to an enum.

Signed-off-by: Fam Zheng 
---
 block/io.c|  9 +
 include/block/block_int.h | 10 +-
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/block/io.c b/block/io.c
index 5311473..ca0859b 100644
--- a/block/io.c
+++ b/block/io.c
@@ -344,13 +344,14 @@ static void tracked_request_end(BdrvTrackedRequest *req)
 static void tracked_request_begin(BdrvTrackedRequest *req,
   BlockDriverState *bs,
   int64_t offset,
-  unsigned int bytes, bool is_write)
+  unsigned int bytes,
+  enum BdrvTrackedRequestType type)
 {
 *req = (BdrvTrackedRequest){
 .bs = bs,
 .offset = offset,
 .bytes  = bytes,
-.is_write   = is_write,
+.type   = type,
 .co = qemu_coroutine_self(),
 .serialising= false,
 .overlap_offset = offset,
@@ -967,7 +968,7 @@ static int coroutine_fn bdrv_co_do_preadv(BlockDriverState 
*bs,
 bytes = ROUND_UP(bytes, align);
 }
 
-tracked_request_begin(, bs, offset, bytes, false);
+tracked_request_begin(, bs, offset, bytes, BDRV_TRACKED_READ);
 ret = bdrv_aligned_preadv(bs, , offset, bytes, align,
   use_local_qiov ? _qiov : qiov,
   flags);
@@ -1286,7 +1287,7 @@ static int coroutine_fn 
bdrv_co_do_pwritev(BlockDriverState *bs,
  * Pad qiov with the read parts and be sure to have a tracked request not
  * only for bdrv_aligned_pwritev, but also for the reads of the RMW cycle.
  */
-tracked_request_begin(, bs, offset, bytes, true);
+tracked_request_begin(, bs, offset, bytes, BDRV_TRACKED_WRITE);
 
 if (!qiov) {
 ret = bdrv_co_do_zero_pwritev(bs, offset, bytes, flags, );
diff --git a/include/block/block_int.h b/include/block/block_int.h
index a480f94..936a3fc 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -59,11 +59,19 @@
 
 #define BLOCK_PROBE_BUF_SIZE512
 
+enum BdrvTrackedRequestType {
+BDRV_TRACKED_READ,
+BDRV_TRACKED_WRITE,
+BDRV_TRACKED_FLUSH,
+BDRV_TRACKED_IOCTL,
+BDRV_TRACKED_DISCARD,
+};
+
 typedef struct BdrvTrackedRequest {
 BlockDriverState *bs;
 int64_t offset;
 unsigned int bytes;
-bool is_write;
+enum BdrvTrackedRequestType type;
 
 bool serialising;
 int64_t overlap_offset;
-- 
2.4.3




[Qemu-devel] [PATCH 2/9] block: Track flush requests

2015-10-26 Thread Fam Zheng
Both bdrv_flush and bdrv_aio_flush eventually call bdrv_co_flush, add
tracked_request_begin and tracked_request_end pair in that function so
that all flush requests are now tracked.

Signed-off-by: Fam Zheng 
---
 block/io.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/block/io.c b/block/io.c
index ca0859b..223c4e9 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2309,18 +2309,20 @@ static void coroutine_fn bdrv_flush_co_entry(void 
*opaque)
 int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
 {
 int ret;
+BdrvTrackedRequest req;
 
 if (!bs || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs) ||
 bdrv_is_sg(bs)) {
 return 0;
 }
 
+tracked_request_begin(, bs, 0, 0, BDRV_TRACKED_FLUSH);
 /* Write back cached data to the OS even with cache=unsafe */
 BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_OS);
 if (bs->drv->bdrv_co_flush_to_os) {
 ret = bs->drv->bdrv_co_flush_to_os(bs);
 if (ret < 0) {
-return ret;
+goto out;
 }
 }
 
@@ -2360,14 +2362,17 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
 ret = 0;
 }
 if (ret < 0) {
-return ret;
+goto out;
 }
 
 /* Now flush the underlying protocol.  It will also have BDRV_O_NO_FLUSH
  * in the case of cache=unsafe, so there are no useless flushes.
  */
 flush_parent:
-return bs->file ? bdrv_co_flush(bs->file->bs) : 0;
+ret = bs->file ? bdrv_co_flush(bs->file->bs) : 0;
+out:
+tracked_request_end();
+return ret;
 }
 
 int bdrv_flush(BlockDriverState *bs)
-- 
2.4.3




[Qemu-devel] [PATCH 5/9] block: Add ioctl parameter fields to BlockRequest

2015-10-26 Thread Fam Zheng
The two fields that will be used by ioctl handling code later are added
as union, because it's used exclusively by ioctl code which dosn't need
the four fields in the other struct of the union.

Signed-off-by: Fam Zheng 
---
 include/block/block.h | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 84f05ad..b3d55aa 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -340,10 +340,18 @@ void bdrv_aio_cancel_async(BlockAIOCB *acb);
 
 typedef struct BlockRequest {
 /* Fields to be filled by multiwrite caller */
-int64_t sector;
-int nb_sectors;
-int flags;
-QEMUIOVector *qiov;
+union {
+struct {
+int64_t sector;
+int nb_sectors;
+int flags;
+QEMUIOVector *qiov;
+};
+struct {
+int req;
+void *buf;
+};
+};
 BlockCompletionFunc *cb;
 void *opaque;
 
-- 
2.4.3




Re: [Qemu-devel] [Qemu-block] Dynamic reconfiguration

2015-10-26 Thread Markus Armbruster
Wen Congyang  writes:

> On 10/21/2015 04:27 PM, Markus Armbruster wrote:
[...]
>> Can we phrase the operation differently?  Instead of "insert between A
>> and B (silently replacing everything that is now between A and B)",
>> say one of
>> 
>> 1a. Replace node A by A <- blkdebug
>> 
>> 1b. Replace node B by blkdebug <- B
>> 
>> 2a. Replace edge A <- B by <- blkdebug <-
>> Impossible in the current state, because there is no such edge.
>
> What does 'edge' mean?

It's graph terminology: the BB and BDS serve as the graph's nodes
(a.k.a. vertices), and the pointers connecting them serve as the graph's
edges.

[...]



[Qemu-devel] [PATCH v2] target-arm: Extract some external ARM CPU API

2015-10-26 Thread Pavel Fedin
Includes, which reside in target-arm, are very problematic to use from code
which is considered target-independent by the build system (for example,
hw/intc/something.c). It happens because they depend on config-target.h,
which is unreachable in this case. This creates problems for example for
GICv3 implementation, which needs to call define_arm_cp_regs_with_opaque()
in order to add system registers to the processor model, as well as play
with affinity IDs.

This patch solves the problem by extracting some self-sufficient
definitions into public area (include/hw/cpu).

Signed-off-by: Pavel Fedin 
---
v1 => v2:
- mp-affinity property addition left out, now a pure code move
- Move some more useful definitions (REGINFO_SENTINEL) and NOP accessors.
---
 include/hw/cpu/arm.h | 308 +++
 target-arm/cpu-qom.h |  40 +--
 target-arm/cpu.h | 239 +--
 3 files changed, 311 insertions(+), 276 deletions(-)
 create mode 100644 include/hw/cpu/arm.h

diff --git a/include/hw/cpu/arm.h b/include/hw/cpu/arm.h
new file mode 100644
index 000..de7bec7
--- /dev/null
+++ b/include/hw/cpu/arm.h
@@ -0,0 +1,308 @@
+/*
+ * QEMU ARM CPU
+ *
+ * Copyright (c) 2003 Fabrice Bellard
+ * Copyright (c) 2012 SUSE LINUX Products GmbH
+ * Copyright (c) 2015 Samsung Electronics Co. Ltd.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see
+ * 
+ */
+
+#ifndef HW_CPU_ARM_H
+#define HW_CPU_ARM_H
+
+#include "qom/cpu.h"
+
+#define TYPE_ARM_CPU "arm-cpu"
+
+#define ARM_CPU_CLASS(klass) \
+OBJECT_CLASS_CHECK(ARMCPUClass, (klass), TYPE_ARM_CPU)
+#define ARM_CPU(obj) \
+OBJECT_CHECK(ARMCPU, (obj), TYPE_ARM_CPU)
+#define ARM_CPU_GET_CLASS(obj) \
+OBJECT_GET_CLASS(ARMCPUClass, (obj), TYPE_ARM_CPU)
+
+/**
+ * ARMCPUClass:
+ * @parent_realize: The parent class' realize handler.
+ * @parent_reset: The parent class' reset handler.
+ *
+ * An ARM CPU model.
+ */
+typedef struct ARMCPUClass {
+/*< private >*/
+CPUClass parent_class;
+/*< public >*/
+
+DeviceRealize parent_realize;
+void (*parent_reset)(CPUState *cpu);
+} ARMCPUClass;
+
+/* These two are black boxes for us */
+typedef struct ARMCPU ARMCPU;
+typedef struct CPUARMState CPUARMState;
+
+/* ARMCPRegInfo type field bits. If the SPECIAL bit is set this is a
+ * special-behaviour cp reg and bits [15..8] indicate what behaviour
+ * it has. Otherwise it is a simple cp reg, where CONST indicates that
+ * TCG can assume the value to be constant (ie load at translate time)
+ * and 64BIT indicates a 64 bit wide coprocessor register. SUPPRESS_TB_END
+ * indicates that the TB should not be ended after a write to this register
+ * (the default is that the TB ends after cp writes). OVERRIDE permits
+ * a register definition to override a previous definition for the
+ * same (cp, is64, crn, crm, opc1, opc2) tuple: either the new or the
+ * old must have the OVERRIDE bit set.
+ * ALIAS indicates that this register is an alias view of some underlying
+ * state which is also visible via another register, and that the other
+ * register is handling migration and reset; registers marked ALIAS will not be
+ * migrated but may have their state set by syncing of register state from KVM.
+ * NO_RAW indicates that this register has no underlying state and does not
+ * support raw access for state saving/loading; it will not be used for either
+ * migration or KVM state synchronization. (Typically this is for "registers"
+ * which are actually used as instructions for cache maintenance and so on.)
+ * IO indicates that this register does I/O and therefore its accesses
+ * need to be surrounded by gen_io_start()/gen_io_end(). In particular,
+ * registers which implement clocks or timers require this.
+ */
+#define ARM_CP_SPECIAL 1
+#define ARM_CP_CONST 2
+#define ARM_CP_64BIT 4
+#define ARM_CP_SUPPRESS_TB_END 8
+#define ARM_CP_OVERRIDE 16
+#define ARM_CP_ALIAS 32
+#define ARM_CP_IO 64
+#define ARM_CP_NO_RAW 128
+#define ARM_CP_NOP (ARM_CP_SPECIAL | (1 << 8))
+#define ARM_CP_WFI (ARM_CP_SPECIAL | (2 << 8))
+#define ARM_CP_NZCV (ARM_CP_SPECIAL | (3 << 8))
+#define ARM_CP_CURRENTEL (ARM_CP_SPECIAL | (4 << 8))
+#define ARM_CP_DC_ZVA (ARM_CP_SPECIAL | (5 << 8))
+#define ARM_LAST_SPECIAL ARM_CP_DC_ZVA
+/* Used only as a terminator for ARMCPRegInfo lists */
+#define 

Re: [Qemu-devel] [v2 0/4] Fix long vm downtime during live migration

2015-10-26 Thread Amit Shah
On (Wed) 21 Oct 2015 [09:00:31], Li, Liang Z wrote:
> > > Some cleanup operations take long time during the pause and copy 
> > > stage, especially with the KVM patch 3ea3b7fa9af067, do these 
> > > operations after the completion of live migration can help to reduce 
> > > VM
> > downtime.
> > >
> > > Ony the first patch changes the behavior, the rest 3 patches are for 
> > > code cleanup.
> > >
> > > Changes:
> > >   * Remove qemu_savevm_sate_cancel() in migrate_fd_cleanup()
> > >   * Add 2 more patches for code clean up
> > 
> > Reviewed-by: Paolo Bonzini 
> 
> Resend this mail.

Hi Liang,

I'm looking at this patchset; I did take a look at it in the past and
I couldn't map everything; I need more time to look at everything
here.

Definitely a more detailed commit log or comments would've helped.

But there's already a reviewed-by from Paolo, so I fully expect this
patch to be merged in 2.5.

Amit



[Qemu-devel] [PATCH] pc: memhp: enforce minimal 128Mb alignment for pc-dimm

2015-10-26 Thread Igor Mammedov
commit aa8580cd "pc: memhp: force gaps between DIMM's GPA"
regressed memory hot-unplug for linux guests triggering
following BUGON
 =
 kernel BUG at mm/memory_hotplug.c:703!
 ...
 [] acpi_memory_device_remove+0x79/0xa5
 [] acpi_bus_trim+0x5a/0x8d
 [] acpi_device_hotplug+0x1b7/0x418
 ===
BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
 ===

reson for it is that x86-64 linux guest supports memory
hotplug in chunks of 128Mb and memory section also should
be 128Mb aligned.
However gaps forced between 128Mb DIMMs with backend's
natural alignment of 2Mb make the 2nd and following
DIMMs not being aligned on 128Mb boundary as it was
originally. To fix regression enforce minimal 128Mb
alignment like it was done for PPC.

Signed-off-by: Igor Mammedov 
---
 hw/i386/pc.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 3d958ba..cd68169 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1610,6 +1610,8 @@ void ioapic_init_gsi(GSIState *gsi_state, const char 
*parent_name)
 }
 }
 
+#define MIN_DIMM_ALIGNMENT (1ULL << 27) /* 128Mb */
+
 static void pc_dimm_plug(HotplugHandler *hotplug_dev,
  DeviceState *dev, Error **errp)
 {
@@ -1624,6 +1626,9 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
 
 if (memory_region_get_alignment(mr) && pcms->enforce_aligned_dimm) {
 align = memory_region_get_alignment(mr);
+if (pcmc->inter_dimm_gap && (align < MIN_DIMM_ALIGNMENT)) {
+align = MIN_DIMM_ALIGNMENT;
+}
 }
 
 if (!pcms->acpi_dev) {
-- 
1.8.3.1




Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi

2015-10-26 Thread Markus Armbruster
Valerio Aimale  writes:

> On 10/23/15 12:55 PM, Eduardo Habkost wrote:
>> On Thu, Oct 22, 2015 at 03:51:28PM -0600, Valerio Aimale wrote:
>>> On 10/22/15 3:47 PM, Eduardo Habkost wrote:
 On Thu, Oct 22, 2015 at 01:57:13PM -0600, Valerio Aimale wrote:
> On 10/22/15 1:12 PM, Eduardo Habkost wrote:
>> On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
>>> Valerio Aimale  writes:
>> [...]
 There's also a similar patch, floating around the internet, the uses
 shared memory, instead of sockets, as inter-process communication
 between libvmi and QEMU. I've never used that.
>>> By the time you built a working IPC mechanism on top of shared memory,
>>> you're often no better off than with AF_LOCAL sockets.
>>>
>>> Crazy idea: can we allocate guest memory in a way that support sharing
>>> it with another process?  Eduardo, can -mem-path do such wild things?
>> It can't today, but just because it creates a temporary file inside
>> mem-path and unlinks it immediately after opening a file descriptor. We
>> could make memory-backend-file also accept a full filename as argument,
>> or add a mechanism to let QEMU send the open file descriptor to a QMP
>> client.
>>
> Eduardo, would my "artisanal" idea of creating an mmap'ed image
> of the guest
> memory footprint work, augmented by Eric's suggestion of having the qmp
> client pass the filename?
 The code below doesn't make sense to me.
>>> Ok. What I am trying to do is to create a mmapped() memory area of the guest
>>> physical memory that can be shared between QEMU and an external process,
>>> such that the external process can read arbitrary locations of the qemu
>>> guest physical memory.
>>> In short, I'm using mmap MAP_SHARED to share the guest memory area with a
>>> process that is external to QEMU
>>>
>>> does it make better sense now?
>> I think you are confused about what mmap() does. It will create a new
>> mapping into the process address space, containing the data from an
>> existing file, not the other way around.
>>
> Eduardo, I think it would be a common rule of politeness not to pass
> any judgement on a person that you don't know, but for some texts in a
> mailing list. I think I understand how mmap() works, and very well.
>
> Participating is this discussion has been a struggle for me. For the
> good of the libvmi users, I have been trying to ignore the judgements,
> the comments and so on. But, alas, I throw my hands up in the air, and
> I surrender.

I'm sorry we exceeded your tolerance for frustration.  This mailing list
can be tough.  We try to be welcoming (believe it or not), but we too
often fail (okay, that part is easily believable).

To be honest, I had difficulties understanding your explanation, and
ended up guessing.  I figure Eduardo did the same, and guessed
incorrectly.  There but for the grace of God go I.

> I think libvmi can live, as it has for the past years, by patching the
> QEMU source tree on as needed basis, and keeping the patch in the
> libvmi source tree, without disturbing any further the QEMU community.

I'm sure libvmi can continue to require a patched QEMU, but I'm equally
sure getting its needs satisfied out of the box would be better for all.
To get that done, we need to understand the problem, and map the
solution space.

So let me try to summarize the thread, and what I've learned from it so
far.  Valerio, if you could correct misunderstandings, I'd be much
obliged.

LibVMI is a C library with Python bindings that makes it easy to monitor
the low-level details of a running virtual machine by viewing its
memory, trapping on hardware events, and accessing the vCPU registers.
This is called virtual machine introspection.
[Direct quote from http://libvmi.com/]

For that purpose, LibVMI needs (among other things) sufficiently fast
means to read and write guest memory.

Its existing solution is a non-upstream patch that adds a new, simple
protocol for reading and writing physical guest memory over TCP, and
monitor commands to start this service.

This thread is in reply to Valerio's attempt to upstream this patch.
Good move.

The usual questions for feature requests apply:

1. Is this a use case we want to serve?

   Unreserved yes.  Supporting virtual machine introspection with LibVMI
   makes sense.

2. Can it be served by existing guest introspection interfaces?

   Not quite clear, yet.

   LibVMI interacts with QEMU/KVM virtual machines via libvirt.  We want
   to be able to start and stop introspecting running virtual machines
   managed by libvirt.  Rules out solutions that require QEMU to be
   started with special command line options.  We really want QMP
   commands.  HMP commands would work, but HMP is not a stable
   interface.  It's fine for prototyping, of course.

   Interfaces discussed include the following monitor commands:

   * x, xp

 

[Qemu-devel] [PATCH 11/11] log: add "-d trace:PATTERN"

2015-10-26 Thread Denis V. Lunev
From: Paolo Bonzini 

This is a bit easier to use than "-trace" if you are also enabling
other kinds of logging.  It is also more discoverable for experienced
QEMU users, and accessible from user-mode emulators.

Signed-off-by: Paolo Bonzini 
Signed-off-by: Denis V. Lunev 
Reviewed-by: Christian Borntraeger 
---
 util/log.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/util/log.c b/util/log.c
index 5c641a0..2bcef95 100644
--- a/util/log.c
+++ b/util/log.c
@@ -19,6 +19,7 @@
 
 #include "qemu-common.h"
 #include "qemu/log.h"
+#include "trace/control.h"
 
 static char *logfilename;
 FILE *qemu_logfile;
@@ -154,6 +155,11 @@ int qemu_str_to_log_mask(const char *str)
 for (item = qemu_log_items; item->mask != 0; item++) {
 mask |= item->mask;
 }
+#ifdef CONFIG_TRACE_LOG
+} else if (strncmp(p, "trace:", 6) == 0 && p + 6 != p1) {
+trace_enable_events(p + 6);
+mask |= LOG_TRACE;
+#endif
 } else {
 for (item = qemu_log_items; item->mask != 0; item++) {
 if (cmp1(p, p1 - p, item->name)) {
@@ -161,9 +167,9 @@ int qemu_str_to_log_mask(const char *str)
 }
 }
 return 0;
+found:
+mask |= item->mask;
 }
-found:
-mask |= item->mask;
 if (*p1 != ',') {
 break;
 }
@@ -177,6 +183,10 @@ void qemu_print_log_usage(FILE *f)
 const QEMULogItem *item;
 fprintf(f, "Log items (comma separated):\n");
 for (item = qemu_log_items; item->mask != 0; item++) {
-fprintf(f, "%-10s %s\n", item->name, item->help);
+fprintf(f, "%-15s %s\n", item->name, item->help);
 }
+#ifdef CONFIG_TRACE_LOG
+fprintf(f, "trace:PATTERN   enable trace events\n");
+fprintf(f, "\nUse \"-d trace:help\" to get a list of trace events.\n\n");
+#endif
 }
-- 
2.1.4




[Qemu-devel] [PATCH 07/11] log: do not unnecessarily include qom/cpu.h

2015-10-26 Thread Denis V. Lunev
From: Paolo Bonzini 

Split the bits that require it to exec/log.h.

Signed-off-by: Paolo Bonzini 
Signed-off-by: Denis V. Lunev 
Reviewed-by: Christian Borntraeger 
---
 bsd-user/main.c   |  1 +
 cpu-exec.c|  1 +
 exec.c|  1 +
 hw/acpi/cpu_hotplug.c |  1 +
 hw/timer/a9gtimer.c   |  1 +
 include/exec/log.h| 60 +++
 include/qemu/log.h| 59 --
 linux-user/main.c |  1 +
 qom/cpu.c |  1 +
 target-alpha/translate.c  |  1 +
 target-arm/translate.c|  1 +
 target-cris/translate.c   |  1 +
 target-i386/seg_helper.c  |  1 +
 target-i386/smm_helper.c  |  1 +
 target-i386/translate.c   |  1 +
 target-lm32/helper.c  |  1 +
 target-lm32/translate.c   |  1 +
 target-m68k/translate.c   |  1 +
 target-microblaze/helper.c|  1 +
 target-microblaze/translate.c |  1 +
 target-mips/helper.c  |  1 +
 target-mips/translate.c   |  1 +
 target-moxie/translate.c  |  1 +
 target-openrisc/translate.c   |  1 +
 target-ppc/mmu-hash32.c   |  1 +
 target-ppc/mmu-hash64.c   |  1 +
 target-ppc/mmu_helper.c   |  1 +
 target-ppc/translate.c|  1 +
 target-s390x/translate.c  |  1 +
 target-sh4/helper.c   |  1 +
 target-sh4/translate.c|  1 +
 target-sparc/int32_helper.c   |  1 +
 target-sparc/int64_helper.c   |  1 +
 target-sparc/translate.c  |  1 +
 target-tilegx/translate.c |  1 +
 target-tricore/translate.c|  1 +
 target-unicore32/translate.c  |  1 +
 target-xtensa/translate.c |  1 +
 tcg/tcg.c |  1 +
 translate-all.c   |  1 +
 40 files changed, 98 insertions(+), 59 deletions(-)
 create mode 100644 include/exec/log.h

diff --git a/bsd-user/main.c b/bsd-user/main.c
index adf2de0..520ce99 100644
--- a/bsd-user/main.c
+++ b/bsd-user/main.c
@@ -33,6 +33,7 @@
 #include "tcg.h"
 #include "qemu/timer.h"
 #include "qemu/envlist.h"
+#include "exec/log.h"
 
 int singlestep;
 unsigned long mmap_min_addr;
diff --git a/cpu-exec.c b/cpu-exec.c
index 7eef083..564a21d 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -27,6 +27,7 @@
 #include "exec/address-spaces.h"
 #include "qemu/rcu.h"
 #include "exec/tb-hash.h"
+#include "exec/log.h"
 #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
 #include "hw/i386/apic.h"
 #endif
diff --git a/exec.c b/exec.c
index 8af2570..87636a6 100644
--- a/exec.c
+++ b/exec.c
@@ -53,6 +53,7 @@
 
 #include "exec/memory-internal.h"
 #include "exec/ram_addr.h"
+#include "exec/log.h"
 
 #include "qemu/range.h"
 #ifndef _WIN32
diff --git a/hw/acpi/cpu_hotplug.c b/hw/acpi/cpu_hotplug.c
index f5b9972..16bacfc 100644
--- a/hw/acpi/cpu_hotplug.c
+++ b/hw/acpi/cpu_hotplug.c
@@ -11,6 +11,7 @@
  */
 #include "hw/hw.h"
 #include "hw/acpi/cpu_hotplug.h"
+#include "qom/cpu.h"
 
 static uint64_t cpu_status_read(void *opaque, hwaddr addr, unsigned int size)
 {
diff --git a/hw/timer/a9gtimer.c b/hw/timer/a9gtimer.c
index dd4aae8..b38c76a 100644
--- a/hw/timer/a9gtimer.c
+++ b/hw/timer/a9gtimer.c
@@ -24,6 +24,7 @@
 #include "qemu/timer.h"
 #include "qemu/bitops.h"
 #include "qemu/log.h"
+#include "qom/cpu.h"
 
 #ifndef A9_GTIMER_ERR_DEBUG
 #define A9_GTIMER_ERR_DEBUG 0
diff --git a/include/exec/log.h b/include/exec/log.h
new file mode 100644
index 000..ba1c9b5
--- /dev/null
+++ b/include/exec/log.h
@@ -0,0 +1,60 @@
+#ifndef QEMU_EXEC_LOG_H
+#define QEMU_EXEC_LOG_H
+
+#include "qemu/log.h"
+#include "qom/cpu.h"
+#include "disas/disas.h"
+
+/* cpu_dump_state() logging functions: */
+/**
+ * log_cpu_state:
+ * @cpu: The CPU whose state is to be logged.
+ * @flags: Flags what to log.
+ *
+ * Logs the output of cpu_dump_state().
+ */
+static inline void log_cpu_state(CPUState *cpu, int flags)
+{
+if (qemu_log_enabled()) {
+cpu_dump_state(cpu, qemu_logfile, fprintf, flags);
+}
+}
+
+/**
+ * log_cpu_state_mask:
+ * @mask: Mask when to log.
+ * @cpu: The CPU whose state is to be logged.
+ * @flags: Flags what to log.
+ *
+ * Logs the output of cpu_dump_state() if loglevel includes @mask.
+ */
+static inline void log_cpu_state_mask(int mask, CPUState *cpu, int flags)
+{
+if (qemu_loglevel & mask) {
+log_cpu_state(cpu, flags);
+}
+}
+
+#ifdef NEED_CPU_H
+/* disas() and target_disas() to qemu_logfile: */
+static inline void log_target_disas(CPUState *cpu, target_ulong start,
+target_ulong len, int flags)
+{
+target_disas(qemu_logfile, cpu, start, len, flags);
+}
+
+static inline void log_disas(void *code, unsigned long size)
+{
+disas(qemu_logfile, code, size);
+}
+
+#if defined(CONFIG_USER_ONLY)
+/* page_dump() output to the log file: */
+static inline void log_page_dump(void)
+{
+page_dump(qemu_logfile);
+}
+#endif
+#endif
+
+#endif
diff 

Re: [Qemu-devel] [PATCH] pc: memhp: enforce minimal 128Mb alignment for pc-dimm

2015-10-26 Thread Michael S. Tsirkin
On Mon, Oct 26, 2015 at 09:42:05AM +0100, Igor Mammedov wrote:
> commit aa8580cd "pc: memhp: force gaps between DIMM's GPA"
> regressed memory hot-unplug for linux guests triggering
> following BUGON
>  =
>  kernel BUG at mm/memory_hotplug.c:703!

This is in portable code. Does this imply anyone implementing
inter dimm gaps will need the same value?
Shouldn't this go into portable code then?

>  ...
>  [] acpi_memory_device_remove+0x79/0xa5
>  [] acpi_bus_trim+0x5a/0x8d
>  [] acpi_device_hotplug+0x1b7/0x418
>  ===
> BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
>  ===
> 
> reson for it is that x86-64 linux guest supports memory
> hotplug in chunks of 128Mb and memory section also should
> be 128Mb aligned.
> However gaps forced between 128Mb DIMMs with backend's
> natural alignment of 2Mb make the 2nd and following
> DIMMs not being aligned on 128Mb boundary as it was
> originally. To fix regression enforce minimal 128Mb
> alignment like it was done for PPC.
> 
> Signed-off-by: Igor Mammedov 


Thanks for the fix. Pls see comments below.

> ---
>  hw/i386/pc.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 3d958ba..cd68169 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1610,6 +1610,8 @@ void ioapic_init_gsi(GSIState *gsi_state, const char 
> *parent_name)
>  }
>  }
>  
> +#define MIN_DIMM_ALIGNMENT (1ULL << 27) /* 128Mb */
> +

Pls prefix with PC_ and pls add a comment explaining where does this
value come from.

>  static void pc_dimm_plug(HotplugHandler *hotplug_dev,
>   DeviceState *dev, Error **errp)
>  {
> @@ -1624,6 +1626,9 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
>  
>  if (memory_region_get_alignment(mr) && pcms->enforce_aligned_dimm) {
>  align = memory_region_get_alignment(mr);
> +if (pcmc->inter_dimm_gap && (align < MIN_DIMM_ALIGNMENT)) {

() not required around math.

> +align = MIN_DIMM_ALIGNMENT;
> +}

This seems wrong. Why is alignment only required when inter_dimm_gap
is set? Does this have to do with compatibility somehow? Pls add a comment.

>  }
>  
>  if (!pcms->acpi_dev) {
> -- 
> 1.8.3.1



[Qemu-devel] [PATCH 03/11] trace: split trace_init_file out of trace_init_backends

2015-10-26 Thread Denis V. Lunev
From: Paolo Bonzini 

This is cleaner, and improves error reporting with -daemonize.

Signed-off-by: Paolo Bonzini 
Signed-off-by: Denis V. Lunev 
Reviewed-by: Christian Borntraeger 
---
 qemu-io.c   |  2 +-
 trace/control.c | 17 -
 trace/control.h | 13 -
 trace/simple.c  |  6 ++
 trace/simple.h  |  4 ++--
 vl.c| 13 +
 6 files changed, 38 insertions(+), 17 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index 269f17c..fbddf82 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -440,7 +440,7 @@ int main(int argc, char **argv)
 }
 break;
 case 'T':
-if (!trace_init_backends(optarg, NULL)) {
+if (!trace_init_backends()) {
 exit(1); /* error message will have been printed */
 }
 break;
diff --git a/trace/control.c b/trace/control.c
index ee5fbca..3e33d03 100644
--- a/trace/control.c
+++ b/trace/control.c
@@ -142,17 +142,24 @@ void trace_init_events(const char *fname)
 loc_pop();
 }
 
-bool trace_init_backends(const char *file)
+void trace_init_file(const char *file)
 {
 #ifdef CONFIG_TRACE_SIMPLE
-if (!st_init(file)) {
-fprintf(stderr, "failed to initialize simple tracing backend.\n");
-return false;
-}
+st_set_trace_file(file);
 #else
 if (file) {
 fprintf(stderr, "error: -trace file=...: "
 "option not supported by the selected tracing backends\n");
+exit(1);
+}
+#endif
+}
+
+bool trace_init_backends(void)
+{
+#ifdef CONFIG_TRACE_SIMPLE
+if (!st_init()) {
+fprintf(stderr, "failed to initialize simple tracing backend.\n");
 return false;
 }
 #endif
diff --git a/trace/control.h b/trace/control.h
index bfbe560..d2506d4 100644
--- a/trace/control.h
+++ b/trace/control.h
@@ -157,7 +157,7 @@ static void trace_event_set_state_dynamic(TraceEvent *ev, 
bool state);
  *
  * Returns: Whether the backends could be successfully initialized.
  */
-bool trace_init_backends(const char *file);
+bool trace_init_backends(void);
 
 /**
  * trace_init_events:
@@ -170,6 +170,17 @@ bool trace_init_backends(const char *file);
  */
 void trace_init_events(const char *file);
 
+/**
+ * trace_init_file:
+ * @file:   Name of trace output file; may be NULL.
+ *  Corresponds to commandline option "-trace file=...".
+ *
+ * Record the name of the output file for the tracing backend.
+ * Exits if no selected backend does not support specifying the
+ * output file, and a non-NULL file was passed.
+ */
+void trace_init_file(const char *file);
+
 
 #include "trace/control-internal.h"
 
diff --git a/trace/simple.c b/trace/simple.c
index 11ad030..a4bc705 100644
--- a/trace/simple.c
+++ b/trace/simple.c
@@ -322,7 +322,7 @@ void st_set_trace_file_enabled(bool enable)
  * @fileThe trace file name or NULL for the default name- set at
  *  config time
  */
-bool st_set_trace_file(const char *file)
+void st_set_trace_file(const char *file)
 {
 st_set_trace_file_enabled(false);
 
@@ -335,7 +335,6 @@ bool st_set_trace_file(const char *file)
 }
 
 st_set_trace_file_enabled(true);
-return true;
 }
 
 void st_print_trace_file_status(FILE *stream, int (*stream_printf)(FILE 
*stream, const char *fmt, ...))
@@ -373,7 +372,7 @@ static GThread *trace_thread_create(GThreadFunc fn)
 return thread;
 }
 
-bool st_init(const char *file)
+bool st_init(void)
 {
 GThread *thread;
 
@@ -386,6 +385,5 @@ bool st_init(const char *file)
 }
 
 atexit(st_flush_trace_buffer);
-st_set_trace_file(file);
 return true;
 }
diff --git a/trace/simple.h b/trace/simple.h
index 6997996..8d1a32e 100644
--- a/trace/simple.h
+++ b/trace/simple.h
@@ -20,8 +20,8 @@
 
 void st_print_trace_file_status(FILE *stream, fprintf_function stream_printf);
 void st_set_trace_file_enabled(bool enable);
-bool st_set_trace_file(const char *file);
-bool st_init(const char *file);
+void st_set_trace_file(const char *file);
+bool st_init(void);
 void st_flush_trace_buffer(void);
 
 typedef struct {
diff --git a/vl.c b/vl.c
index ba29fef..b1ddd3f 100644
--- a/vl.c
+++ b/vl.c
@@ -2956,7 +2956,7 @@ int main(int argc, char **argv, char **envp)
 bool userconfig = true;
 const char *log_mask = NULL;
 const char *log_file = NULL;
-const char *trace_file = NULL;
+char *trace_file = NULL;
 ram_addr_t maxram_size;
 uint64_t ram_slots = 0;
 FILE *vmstate_dump_file = NULL;
@@ -3881,7 +3881,10 @@ int main(int argc, char **argv, char **envp)
 exit(1);
 }
 trace_init_events(qemu_opt_get(opts, "events"));
-trace_file = qemu_opt_get(opts, "file");
+if (trace_file) {
+g_free(trace_file);
+}
+trace_file = g_strdup(qemu_opt_get(opts, "file"));
 

  1   2   3   4   >