date:20180417

Re: [Qemu-devel] getdents patch for 64-bit app on 32-bit host

2018-04-17 Thread Thomas Huth

On 17.04.2018 23:53, Henry Wertz wrote:
> Please find submitted a patch for getdents (this system call stands for
> "get directory entries", it is passed a file descriptor pointing to a
> directory and returns a struct with info on the entries in that
> directory.)  This patch is against qemu-2.10 series but continues to apply
> cleanly on current as of April 15 2018.

 Hi,

thanks for the patch, but when you send patches, please make sure:

1) To send "unified" patches, i.e. "diff -u", or even better, use "git
format-patch" to create it and "git send-email" to send it to the list.

2) Patches should be inline, not in an attachment

3) Patch description should be ready for the changelog, i.e. don't write
something like "Please find submitted..." in the patch description. You
can write additional text below the "---" separator if necessary.
Patches will be applied with "git am", and text below the "---"
separator will then be discarded.

4) Make sure that you put the right maintainers on CC:, or otherwise
your patch might be lost in the high traffic of the qemu-devel mailing
list. Use the MAINTAINERS file or the scripts/get_maintainer.pl script to
find the right maintainers.

Please also see https://wiki.qemu.org/Contribute/SubmitAPatch for some
more details.

 Thomas

Re: [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap set

2018-04-17 Thread Liu, Yi L

> Sent: Wednesday, April 18, 2018 12:51 PM
> Subject: [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap
> set
> 
> During IOVA page table walk, there is a special case when:
> 
> - notify_unmap is set, meanwhile
> - entry is invalid

This is a very brief description; would you mind elaborating a little?
 
> In the past, we skip the entry always.  This is not correct.  We should send
> UNMAP notification to registered notifiers in this case.  Otherwise some stale
> pages will still be mapped in the host even if L1 guest unmapped them already.
>
> Without this patch, nested device assignment to L2 guests might dump some
> errors like:

Does the device have to be a physical device assigned from the L0 host, or
could emulated devices also trigger this problem?

> qemu-system-x86_64: VFIO_MAP_DMA: -17
> qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
> 0x7f89a920d000) = -17 (File exists)
> 
> To fix this, we need to apply this patch to L1 QEMU (L2 QEMU is not affected
> by this problem).

Does this fix also apply to L0 QEMU?
 
> Signed-off-by: Peter Xu 
> ---
> 
> To test nested assignment, one also needs to apply below patchset:
> https://lkml.org/lkml/2018/4/18/5
> ---
>  hw/i386/intel_iommu.c | 42 ++
>  1 file changed, 30 insertions(+), 12 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index fb31de9416..b359efd6f9 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -722,6 +722,15 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,
> 
>  typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
> 
> +static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
> + vtd_page_walk_hook hook_fn, void *private)
> +{
> +assert(hook_fn);
> +trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
> +entry->addr_mask, entry->perm);
> +return hook_fn(entry, private);
> +}
> +
>  /**
>   * vtd_page_walk_level - walk over specific level for IOVA range
>   *
> @@ -781,28 +790,37 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
>   */
>  entry_valid = read_cur | write_cur;
> 
> +entry.target_as = &address_space_memory;
> +entry.iova = iova & subpage_mask;
> +entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
> +entry.addr_mask = ~subpage_mask;
> +
>  if (vtd_is_last_slpte(slpte, level)) {
> -entry.target_as = &address_space_memory;
> -entry.iova = iova & subpage_mask;
>  /* NOTE: this is only meaningful if entry_valid == true */
>  entry.translated_addr = vtd_get_slpte_addr(slpte, aw);
> -entry.addr_mask = ~subpage_mask;
> -entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
>  if (!entry_valid && !notify_unmap) {
>  trace_vtd_page_walk_skip_perm(iova, iova_next);
>  goto next;
>  }
> -trace_vtd_page_walk_one(level, entry.iova, entry.translated_addr,
> -entry.addr_mask, entry.perm);
> -if (hook_fn) {
> -ret = hook_fn(&entry, private);
> -if (ret < 0) {
> -return ret;
> -}
> +ret = vtd_page_walk_one(&entry, level, hook_fn, private);
> +if (ret < 0) {
> +return ret;
>  }
>  } else {
>  if (!entry_valid) {
> -trace_vtd_page_walk_skip_perm(iova, iova_next);
> +if (notify_unmap) {
> +/*
> + * The whole entry is invalid; unmap it all.
> + * Translated address is meaningless, zero it.
> + */
> +entry.translated_addr = 0x0;
> +ret = vtd_page_walk_one(&entry, level, hook_fn, private);
> +if (ret < 0) {
> +return ret;
> +}
> +} else {
> +trace_vtd_page_walk_skip_perm(iova, iova_next);
> +}
>  goto next;
>  }
>  ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,
> --
> 2.14.3
> 

Thanks,
Yi Liu

Re: [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap set

2018-04-17 Thread Peter Xu

On Wed, Apr 18, 2018 at 12:51:21PM +0800, Peter Xu wrote:
> During IOVA page table walk, there is a special case when:
> 
> - notify_unmap is set, meanwhile
> - entry is invalid
> 
> In the past, we skip the entry always.  This is not correct.  We should
> send UNMAP notification to registered notifiers in this case.  Otherwise
> some stale pages will still be mapped in the host even if L1 guest
> unmapped them already.
> 
> Without this patch, nested device assignment to L2 guests might dump
> some errors like:
> 
> qemu-system-x86_64: VFIO_MAP_DMA: -17
> qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
> 0x7f89a920d000) = -17 (File exists)
> 
> To fix this, we need to apply this patch to L1 QEMU (L2 QEMU is not
> affected by this problem).
> 
> Signed-off-by: Peter Xu 

This should really be 2.12 material since it fixes a real bug, but I am not
sure whether it is already too late.  Michael, what do you think?

Thanks,

-- 
Peter Xu

Re: [Qemu-devel] [PATCH v3] monitor: let cur_mon be per-thread

2018-04-17 Thread Stefan Hajnoczi

On Tue, Apr 17, 2018 at 11:05:47AM +0200, Markus Armbruster wrote:
> Stefan Hajnoczi  writes:
> 
> > On Mon, Apr 16, 2018 at 05:17:32PM +0800, Peter Xu wrote:
> >> On Mon, Apr 16, 2018 at 04:37:48PM +0800, Stefan Hajnoczi wrote:
> >> > On Thu, Apr 12, 2018 at 02:11:08PM +0800, Peter Xu wrote:
> >> > > In the future the monitor iothread may be accessing the cur_mon as
> >> > > well (via monitor_qmp_dispatch_one()).  Before we introduce a real
> >> > > Out-Of-Band command, let's convert the cur_mon variable to be a
> >> > > per-thread variable to make sure there won't be a race between threads.
> >> > >
> >> > > Note that thread variables are not initialized to a valid value when a
> >> > > new thread is created.  However for our case we don't need to set it up,
> >> > > since the cur_mon variable is only used in such a pattern:
> >> > > 
> >> > >   old_mon = cur_mon;
> >> > >   cur_mon = xxx;
> >> > >   (do something, read cur_mon if necessary in the stack)
> >> > >   cur_mon = old_mon;
> >> > > 
> >> > > It plays a role as stack variable, so no need to be initialized at all.
> >> > > We only need to make sure the variable won't be changed unexpectedly by
> >> > > other threads.
> >> > > 
> >> > > Signed-off-by: Peter Xu 
> >> > > ---
> >> > > v3:
> >> > > - fix code style warning from patchew
> >> > > v2:
> >> > > - drop qemu-thread changes
> >> > > ---
> >> > >  include/monitor/monitor.h | 2 +-
> >> > >  monitor.c | 2 +-
> >> > >  stubs/monitor.c   | 2 +-
> >> > >  tests/test-util-sockets.c | 2 +-
> >> > >  4 files changed, 4 insertions(+), 4 deletions(-)
> >> > 
> >> > The Monitor object is not fully thread-safe, so although the correct
> >> > cur_mon is now accessible, code may still be unsafe.  For example,
> >> > monitor_get_fd(cur_mon, ...) is not thread-safe and must not be used by
> >> > OOB commands.
> >> 
> >> IMHO things like monitor_get_fd() should only be called in QMP
> >> context, so there should always be a monitor_qmp_dispatch_one() in the
> >> stack already (no matter whether it is in main thread or the monitor
> >> iothread), which means that cur_mon should have been setup.  So IMHO
> >> it's a programming error if monitor_get_fd() is called without correct
> >> cur_mon setup after this patch.
> >
> > The pointer value of cur_mon is not the issue, you have made that work
> > correctly.  The problem is that some monitor.h APIs do not access the
> > Monitor object in a thread-safe fashion.
> >
> > Two QMP commands executing simultaneously in the main loop thread and
> > the monitor IOThread can hit race conditions.  The example I gave was
> > the monitor_get_fd() API, which iterates and modifies the mon->fds
> > QLIST without a lock.
> >
> > Please audit monitor.h and either make things thread-safe or document
> > the thread-safety rules (e.g. "This function cannot be called from
> > out-of-band QMP context").  This wasn't necessary before but now that
> > you are adding multi-threading it is.
> 
> Code working with the current thread's monitor via thread-local cur_mon
> is easier to analyze in some ways than code working with a Monitor *
> parameter: the latter can interfere with some other thread's monitor,
> and you may have to argue what values the parameter can take.
> 
> You might want to replace parameters by cur_mon in certain cases.
> 
> Funnily, the plan used to be the opposite.  Commit 376253ece48: "On the
> mid or long term, those use case will be obsoleted so that [cur_mon] can
> be removed again."

Either way, the issue I described can still happen since two QMP
commands for a single Monitor object can execute simultaneously in the
main loop thread and the monitor IOThread.

I'm basically warning that QMP multi-threading isn't a solved problem
yet.  It needs to be solved by a combination of making things
thread-safe, documentation, and assertions so code fails loudly and
early when called from an unsupported context.

Stefan
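
The save/restore pattern from the commit message, together with the
"fail loudly and early" assertion suggested above, can be sketched as
follows. This is only an illustration under stated assumptions: Monitor
is a hypothetical stand-in for the real structure, dispatch_one() and
cmd_whoami() are made-up names, and C11 _Thread_local stands in for
whatever thread-local mechanism the actual patch uses (its diff is not
quoted in this thread).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for QEMU's Monitor; not the real structure. */
typedef struct Monitor {
    const char *name;
} Monitor;

/* One instance per thread, so another thread cannot change it under us. */
static _Thread_local Monitor *cur_mon;

typedef const char *(*CommandFn)(void);

/* A "command" that relies on cur_mon; it fails loudly if nobody set it. */
static const char *cmd_whoami(void)
{
    assert(cur_mon != NULL);
    return cur_mon->name;
}

/* Save/restore pattern from the commit message: cur_mon acts like a
 * stack variable, so nested dispatches unwind correctly. */
static const char *dispatch_one(Monitor *mon, CommandFn cmd)
{
    Monitor *old_mon = cur_mon;
    const char *ret;

    cur_mon = mon;
    ret = cmd();
    cur_mon = old_mon;
    return ret;
}
```

Note that this only protects the pointer. It does nothing for the
object behind it, which is the point raised above: two threads
dispatching on the same Monitor would still need locking or documented
rules for the shared state.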



[Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap set

2018-04-17 Thread Peter Xu

During IOVA page table walk, there is a special case when:

- notify_unmap is set, meanwhile
- entry is invalid

In the past, we skip the entry always.  This is not correct.  We should
send UNMAP notification to registered notifiers in this case.  Otherwise
some stale pages will still be mapped in the host even if L1 guest
unmapped them already.

Without this patch, nested device assignment to L2 guests might dump
some errors like:

qemu-system-x86_64: VFIO_MAP_DMA: -17
qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
0x7f89a920d000) = -17 (File exists)

To fix this, we need to apply this patch to L1 QEMU (L2 QEMU is not
affected by this problem).

Signed-off-by: Peter Xu 
---

To test nested assignment, one also needs to apply below patchset:
https://lkml.org/lkml/2018/4/18/5
---
 hw/i386/intel_iommu.c | 42 ++
 1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index fb31de9416..b359efd6f9 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -722,6 +722,15 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,
 
 typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
 
+static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
+ vtd_page_walk_hook hook_fn, void *private)
+{
+assert(hook_fn);
+trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
+entry->addr_mask, entry->perm);
+return hook_fn(entry, private);
+}
+
 /**
  * vtd_page_walk_level - walk over specific level for IOVA range
  *
@@ -781,28 +790,37 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
  */
 entry_valid = read_cur | write_cur;
 
+entry.target_as = &address_space_memory;
+entry.iova = iova & subpage_mask;
+entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
+entry.addr_mask = ~subpage_mask;
+
 if (vtd_is_last_slpte(slpte, level)) {
-entry.target_as = &address_space_memory;
-entry.iova = iova & subpage_mask;
 /* NOTE: this is only meaningful if entry_valid == true */
 entry.translated_addr = vtd_get_slpte_addr(slpte, aw);
-entry.addr_mask = ~subpage_mask;
-entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
 if (!entry_valid && !notify_unmap) {
 trace_vtd_page_walk_skip_perm(iova, iova_next);
 goto next;
 }
-trace_vtd_page_walk_one(level, entry.iova, entry.translated_addr,
-entry.addr_mask, entry.perm);
-if (hook_fn) {
-ret = hook_fn(&entry, private);
-if (ret < 0) {
-return ret;
-}
+ret = vtd_page_walk_one(&entry, level, hook_fn, private);
+if (ret < 0) {
+return ret;
 }
 } else {
 if (!entry_valid) {
-trace_vtd_page_walk_skip_perm(iova, iova_next);
+if (notify_unmap) {
+/*
+ * The whole entry is invalid; unmap it all.
+ * Translated address is meaningless, zero it.
+ */
+entry.translated_addr = 0x0;
+ret = vtd_page_walk_one(&entry, level, hook_fn, private);
+if (ret < 0) {
+return ret;
+}
+} else {
+trace_vtd_page_walk_skip_perm(iova, iova_next);
+}
 goto next;
 }
 ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,
-- 
2.14.3
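
The behaviour change in the patch above can be illustrated with a toy
two-level walk. This is a sketch, not the intel_iommu.c code: ToyTable,
ToySlot, Notice, toy_walk() and count_hook() are hypothetical names, and
the table has only four entries per level. The branch of interest is the
invalid entry: with notify_unmap set, the hook is called for the whole
range the entry covers, at any level, instead of the entry being skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ENTRIES 4           /* entries per table (toy value) */
#define TOY_PAGE    0x1000ULL   /* size covered by one leaf entry */

typedef struct ToyTable ToyTable;
typedef struct ToySlot {
    bool valid;
    ToyTable *next;             /* next-level table, used at level 2 only */
} ToySlot;
struct ToyTable {
    ToySlot slot[TOY_ENTRIES];
};

typedef struct Notice {
    uint64_t iova, size;
    bool map;                   /* true: MAP, false: UNMAP */
} Notice;
typedef int (*walk_hook)(const Notice *n, void *opaque);

/* Level 1 holds leaf entries; level 2 entries point at level-1 tables. */
static int toy_walk(const ToyTable *t, int level, uint64_t base,
                    bool notify_unmap, walk_hook hook, void *opaque)
{
    uint64_t span = (level == 1) ? TOY_PAGE : TOY_PAGE * TOY_ENTRIES;

    assert(hook);
    for (int i = 0; i < TOY_ENTRIES; i++) {
        const ToySlot *s = &t->slot[i];
        Notice n = { base + i * span, span, s->valid };
        int ret = 0;

        if (!s->valid) {
            /* Invalid entry: unmap everything it covers, if asked to. */
            if (notify_unmap) {
                ret = hook(&n, opaque);
            }
        } else if (level == 1) {
            ret = hook(&n, opaque);
        } else {
            ret = toy_walk(s->next, level - 1, n.iova,
                           notify_unmap, hook, opaque);
        }
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}

typedef struct Counts {
    int maps, unmaps;
    uint64_t unmapped_bytes;
} Counts;

/* Example hook that only counts what it is told. */
static int count_hook(const Notice *n, void *opaque)
{
    Counts *c = opaque;

    if (n->map) {
        c->maps++;
    } else {
        c->unmaps++;
        c->unmapped_bytes += n->size;
    }
    return 0;
}
```

With one valid leaf under one valid level-2 entry, a walk with
notify_unmap set reports three page-sized unmaps from the leaf table
plus three larger unmaps for the invalid level-2 entries. Those larger
ones are the notifications that were missing before the fix.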

Re: [Qemu-devel] [PATCH v3] spapr: Support ibm, dynamic-memory-v2 property

2018-04-17 Thread David Gibson

On Tue, Apr 17, 2018 at 02:39:09PM +0530, Bharata B Rao wrote:
> On Tue, Apr 17, 2018 at 11:14:27AM +1000, David Gibson wrote:
> > >  static void spapr_machine_2_12_class_options(MachineClass *mc)
> > > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > > index d60b7c6d7a..5e044c44af 100644
> > > --- a/include/hw/ppc/spapr.h
> > > +++ b/include/hw/ppc/spapr.h
> > > @@ -149,6 +149,7 @@ struct sPAPRMachineState {
> > >  sPAPROptionVector *ov5; /* QEMU-supported option vectors */
> > >  sPAPROptionVector *ov5_cas; /* negotiated (via CAS) option vectors */
> > >  uint32_t max_compat_pvr;
> > > +bool use_ibm_dynamic_memory_v2;
> > 
> > TBH, I'm not really sure we even need to adjust this by machine type.
> 
> There are other similar features controlled by ov5 bits that
> are also determined by machine type version:
> 
> Memory hotplug support -- sPAPRMachineClass.dr_lmb_enabled
> Dedicated HP event support -- sPAPRMachineState.use_hotplug_event_source

As for user settability the issue isn't that it's set by ov5, but what
the effect of the feature is.  Those other features alter runtime
hypervisor behaviour and that behaviour has to remain the same across
a migration.  Therefore we have to keep the behaviour consistent for
old machine types.

This feature affects only boot time behaviour.  It has a similar
effect to what a firmware update might, on real hardware.  Furthermore
the way CAS and the device tree work, this is vanishingly unlikely to
break existing guests.

> Are you saying that presence of ibm,dynamic-memory-v2 probably shouldn't
> be dependent on machine type ?

Yes, I am.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson



[Qemu-devel] [RFC PATCH v2 5/7] iscsi: Implement copy offloading

2018-04-17 Thread Fam Zheng

Issue EXTENDED COPY (LID1) command to implement the copy_range API.

The parameter data construction code is ported from libiscsi's
iscsi-dd.c.

Signed-off-by: Fam Zheng 
---
 block/iscsi.c| 266 +++
 include/scsi/constants.h |   3 +
 2 files changed, 269 insertions(+)

diff --git a/block/iscsi.c b/block/iscsi.c
index f5aecfc883..7d17e03ad3 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -68,6 +68,7 @@ typedef struct IscsiLun {
 QemuMutex mutex;
 struct scsi_inquiry_logical_block_provisioning lbp;
 struct scsi_inquiry_block_limits bl;
+struct scsi_inquiry_device_designator *dd;
 unsigned char *zeroblock;
 /* The allocmap tracks which clusters (pages) on the iSCSI target are
  * allocated and which are not. In case a target returns zeros for
@@ -1740,6 +1741,29 @@ static QemuOptsList runtime_opts = {
 },
 };
 
+static void iscsi_save_designator(IscsiLun *lun,
+  struct scsi_inquiry_device_identification *inq_di)
+{
+struct scsi_inquiry_device_designator *desig, *copy = NULL;
+
+for (desig = inq_di->designators; desig; desig = desig->next) {
+if (desig->association ||
+desig->designator_type > SCSI_DESIGNATOR_TYPE_NAA) {
+continue;
+}
+/* NAA works better than T10 vendor ID based designator. */
+if (!copy || copy->designator_type < desig->designator_type) {
+copy = desig;
+}
+}
+if (copy) {
+lun->dd = g_new(struct scsi_inquiry_device_designator, 1);
+*lun->dd = *copy;
+lun->dd->designator = g_malloc(copy->designator_length);
+memcpy(lun->dd->designator, copy->designator, copy->designator_length);
+}
+}
+
 static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
   Error **errp)
 {
@@ -1922,6 +1946,7 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
 struct scsi_task *inq_task;
 struct scsi_inquiry_logical_block_provisioning *inq_lbp;
 struct scsi_inquiry_block_limits *inq_bl;
+struct scsi_inquiry_device_identification *inq_di;
 switch (inq_vpd->pages[i]) {
 case SCSI_INQUIRY_PAGECODE_LOGICAL_BLOCK_PROVISIONING:
 inq_task = iscsi_do_inquiry(iscsilun->iscsi, iscsilun->lun, 1,
@@ -1947,6 +1972,17 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
sizeof(struct scsi_inquiry_block_limits));
 scsi_free_scsi_task(inq_task);
 break;
+case SCSI_INQUIRY_PAGECODE_DEVICE_IDENTIFICATION:
+inq_task = iscsi_do_inquiry(iscsilun->iscsi, iscsilun->lun, 1,
+SCSI_INQUIRY_PAGECODE_DEVICE_IDENTIFICATION,
+(void **) &inq_di, errp);
+if (inq_task == NULL) {
+ret = -EINVAL;
+goto out;
+}
+iscsi_save_designator(iscsilun, inq_di);
+scsi_free_scsi_task(inq_task);
+break;
 default:
 break;
 }
@@ -2003,6 +2039,8 @@ static void iscsi_close(BlockDriverState *bs)
 iscsi_logout_sync(iscsi);
 }
 iscsi_destroy_context(iscsi);
+g_free(iscsilun->dd->designator);
+g_free(iscsilun->dd);
 g_free(iscsilun->zeroblock);
 iscsi_allocmap_free(iscsilun);
 qemu_mutex_destroy(&iscsilun->mutex);
@@ -2184,6 +,230 @@ static void coroutine_fn iscsi_co_invalidate_cache(BlockDriverState *bs,
 iscsi_allocmap_invalidate(iscsilun);
 }
 
+static int coroutine_fn iscsi_co_copy_range_from(BlockDriverState *bs,
+ BdrvChild *src,
+ uint64_t src_offset,
+ BdrvChild *dst,
+ uint64_t dst_offset,
+ uint64_t bytes,
+ BdrvRequestFlags flags)
+{
+return bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes, flags);
+}
+
+static struct scsi_task *iscsi_xcopy_task(int param_len)
+{
+struct scsi_task *task;
+
+task = g_new0(struct scsi_task, 1);
+
+task->cdb[0] = EXTENDED_COPY;
+task->cdb[10]= (param_len >> 24) & 0xFF;
+task->cdb[11]= (param_len >> 16) & 0xFF;
+task->cdb[12]= (param_len >> 8) & 0xFF;
+task->cdb[13]= param_len & 0xFF;
+task->cdb_size   = 16;
+task->xfer_dir   = SCSI_XFER_WRITE;
+task->expxferlen = param_len;
+
+return task;
+}
+
+static int iscsi_populate_target_desc(unsigned char *desc, IscsiLun *lun)
+{
+struct scsi_inquiry_device_designator *dd = lun->dd;
+
+memset(desc, 0, 32);
+desc[0] = IDENT_DESCR_TGT_DESCR;
+desc[4] = dd->code_set;
+desc[5] = (dd->designator_type & 0xF)
+|
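
The CDB set-up in iscsi_xcopy_task() above can be tried in isolation
with the following sketch. It is not libiscsi or QEMU code:
build_xcopy_cdb() and xcopy_param_len() are hypothetical helpers, the
length is taken as uint32_t rather than the int used in the patch, and
0x83 is the EXTENDED COPY operation code. As in the patch, the parameter
list length is stored in bytes 10..13, most significant byte first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XCOPY_OPCODE   0x83     /* EXTENDED COPY operation code */
#define XCOPY_CDB_SIZE 16

/* Fill a 16-byte CDB; PARAMETER LIST LENGTH goes in bytes 10..13,
 * most significant byte first. */
static void build_xcopy_cdb(uint8_t cdb[XCOPY_CDB_SIZE], uint32_t param_len)
{
    assert(cdb != NULL);
    memset(cdb, 0, XCOPY_CDB_SIZE);
    cdb[0]  = XCOPY_OPCODE;
    cdb[10] = (param_len >> 24) & 0xFF;
    cdb[11] = (param_len >> 16) & 0xFF;
    cdb[12] = (param_len >> 8) & 0xFF;
    cdb[13] = param_len & 0xFF;
}

/* Inverse helper, handy for checking what was encoded. */
static uint32_t xcopy_param_len(const uint8_t cdb[XCOPY_CDB_SIZE])
{
    return ((uint32_t)cdb[10] << 24) | ((uint32_t)cdb[11] << 16) |
           ((uint32_t)cdb[12] << 8) | cdb[13];
}
```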

[Qemu-devel] [RFC PATCH v2 2/7] raw: Implement copy offloading

2018-04-17 Thread Fam Zheng

Just pass down to ->file.

Signed-off-by: Fam Zheng 
---
 block/raw-format.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/block/raw-format.c b/block/raw-format.c
index a378547c99..febddf00c0 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -482,6 +482,24 @@ static int raw_probe_geometry(BlockDriverState *bs, HDGeometry *geo)
 return bdrv_probe_geometry(bs->file->bs, geo);
 }
 
+static int coroutine_fn raw_co_copy_range_from(BlockDriverState *bs,
+   BdrvChild *src, uint64_t src_offset,
+   BdrvChild *dst, uint64_t dst_offset,
+   uint64_t bytes, BdrvRequestFlags flags)
+{
+return bdrv_co_copy_range_from(bs->file, src_offset, dst, dst_offset,
+   bytes, flags);
+}
+
+static int coroutine_fn raw_co_copy_range_to(BlockDriverState *bs,
+ BdrvChild *src, uint64_t src_offset,
+ BdrvChild *dst, uint64_t dst_offset,
+ uint64_t bytes, BdrvRequestFlags flags)
+{
+return bdrv_co_copy_range_to(src, src_offset, bs->file, dst_offset, bytes,
+ flags);
+}
+
 BlockDriver bdrv_raw = {
 .format_name  = "raw",
 .instance_size= sizeof(BDRVRawState),
@@ -498,6 +516,8 @@ BlockDriver bdrv_raw = {
 .bdrv_co_pwrite_zeroes = &raw_co_pwrite_zeroes,
 .bdrv_co_pdiscard = &raw_co_pdiscard,
 .bdrv_co_block_status = &raw_co_block_status,
+.bdrv_co_copy_range_from = &raw_co_copy_range_from,
+.bdrv_co_copy_range_to  = &raw_co_copy_range_to,
 .bdrv_truncate= &raw_truncate,
 .bdrv_getlength   = &raw_getlength,
 .has_variable_length  = true,
-- 
2.14.3
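
How the "from" and "to" callbacks in this series cooperate may be easier
to see in a toy model. The sketch below is not the block layer: Node,
node_copy_from(), node_copy_to() and the pass_*/leaf_* callbacks are
hypothetical names, and a leaf is just a memory buffer. A pass-through
node (like raw) swaps itself for its child; a source leaf switches the
call over to the destination side; a destination leaf finally performs
the copy.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

typedef struct Node Node;
typedef int (*CopyFn)(Node *self, Node *src, uint64_t src_off,
                      Node *dst, uint64_t dst_off, uint64_t bytes);
struct Node {
    CopyFn copy_from;   /* called while the source side is resolved */
    CopyFn copy_to;     /* called while the destination side is resolved */
    Node *file;         /* child, for pass-through nodes */
    uint8_t *data;      /* backing buffer, for leaf nodes */
    uint64_t size;
};

static int node_copy_from(Node *src, uint64_t src_off,
                          Node *dst, uint64_t dst_off, uint64_t bytes)
{
    if (!src || !dst || !src->copy_from || !dst->copy_to) {
        return -ENOTSUP;
    }
    return src->copy_from(src, src, src_off, dst, dst_off, bytes);
}

static int node_copy_to(Node *src, uint64_t src_off,
                        Node *dst, uint64_t dst_off, uint64_t bytes)
{
    if (!src || !dst || !dst->copy_to) {
        return -ENOTSUP;
    }
    return dst->copy_to(dst, src, src_off, dst, dst_off, bytes);
}

/* Pass-through node: replace ourselves with our child and carry on. */
static int pass_copy_from(Node *self, Node *src, uint64_t src_off,
                          Node *dst, uint64_t dst_off, uint64_t bytes)
{
    (void)src;
    return node_copy_from(self->file, src_off, dst, dst_off, bytes);
}

static int pass_copy_to(Node *self, Node *src, uint64_t src_off,
                        Node *dst, uint64_t dst_off, uint64_t bytes)
{
    (void)dst;
    return node_copy_to(src, src_off, self->file, dst_off, bytes);
}

/* Source leaf: the source is resolved, now resolve the destination. */
static int leaf_copy_from(Node *self, Node *src, uint64_t src_off,
                          Node *dst, uint64_t dst_off, uint64_t bytes)
{
    (void)self;
    return node_copy_to(src, src_off, dst, dst_off, bytes);
}

/* Destination leaf: both ends are leaves, do the actual copy. */
static int leaf_copy_to(Node *self, Node *src, uint64_t src_off,
                        Node *dst, uint64_t dst_off, uint64_t bytes)
{
    (void)dst;
    if (!src->data || bytes > src->size || src_off > src->size - bytes ||
        bytes > self->size || dst_off > self->size - bytes) {
        return -EINVAL;
    }
    memmove(self->data + dst_off, src->data + src_off, bytes);
    return 0;
}
```

Resolving the source first and the destination second means that only
the final callee has to know about both ends, which is why a leaf's
copy_from does nothing but switch sides.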

[Qemu-devel] [RFC PATCH v2 7/7] qemu-img: Convert with copy offloading

2018-04-17 Thread Fam Zheng

The new blk_co_copy_range interface offers a more efficient way in the
case of network based storage. Make use of it to allow faster convert
operation.

Since copy offloading cannot do zero detection ('-S') and compression
(-c), only try it when these options are not used.

Signed-off-by: Fam Zheng 
---
 qemu-img.c | 50 --
 1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 855fa52514..268b749592 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1544,6 +1544,7 @@ typedef struct ImgConvertState {
 bool compressed;
 bool target_has_backing;
 bool wr_in_order;
+bool copy_range;
 int min_sparse;
 size_t cluster_sectors;
 size_t buf_sectors;
@@ -1737,6 +1738,37 @@ static int coroutine_fn convert_co_write(ImgConvertState *s, int64_t sector_num,
 return 0;
 }
 
+static int coroutine_fn convert_co_copy_range(ImgConvertState *s, int64_t sector_num,
+  int nb_sectors)
+{
+int n, ret;
+
+while (nb_sectors > 0) {
+BlockBackend *blk;
+int src_cur;
+int64_t bs_sectors, src_cur_offset;
+int64_t offset;
+
+convert_select_part(s, sector_num, &src_cur, &src_cur_offset);
+offset = (sector_num - src_cur_offset) << BDRV_SECTOR_BITS;
+blk = s->src[src_cur];
+bs_sectors = s->src_sectors[src_cur];
+
+n = MIN(nb_sectors, bs_sectors - (sector_num - src_cur_offset));
+
+ret = blk_co_copy_range(blk, offset, s->target,
+sector_num << BDRV_SECTOR_BITS,
+n << BDRV_SECTOR_BITS, 0);
+if (ret < 0) {
+return ret;
+}
+
+sector_num += n;
+nb_sectors -= n;
+}
+return 0;
+}
+
 static void coroutine_fn convert_co_do_copy(void *opaque)
 {
 ImgConvertState *s = opaque;
@@ -1759,6 +1791,7 @@ static void coroutine_fn convert_co_do_copy(void *opaque)
 int n;
 int64_t sector_num;
 enum ImgConvertBlockStatus status;
+bool copy_range;
 
 qemu_co_mutex_lock(&s->lock);
 if (s->ret != -EINPROGRESS || s->sector_num >= s->total_sectors) {
@@ -1788,7 +1821,9 @@ static void coroutine_fn convert_co_do_copy(void *opaque)
 s->allocated_sectors, 0);
 }
 
-if (status == BLK_DATA) {
+retry:
+copy_range = s->copy_range && s->status == BLK_DATA;
+if (status == BLK_DATA && !copy_range) {
 ret = convert_co_read(s, sector_num, n, buf);
 if (ret < 0) {
 error_report("error while reading sector %" PRId64
@@ -1810,7 +1845,15 @@ static void coroutine_fn convert_co_do_copy(void *opaque)
 }
 
 if (s->ret == -EINPROGRESS) {
-ret = convert_co_write(s, sector_num, n, buf, status);
+if (copy_range) {
+ret = convert_co_copy_range(s, sector_num, n);
+if (ret) {
+s->copy_range = false;
+goto retry;
+}
+} else {
+ret = convert_co_write(s, sector_num, n, buf, status);
+}
 if (ret < 0) {
 error_report("error while writing sector %" PRId64
  ": %s", sector_num, strerror(-ret));
@@ -1933,6 +1976,7 @@ static int img_convert(int argc, char **argv)
 ImgConvertState s = (ImgConvertState) {
 /* Need at least 4k of zeros for sparse detection */
 .min_sparse = 8,
+.copy_range = true,
 .buf_sectors= IO_BUF_SIZE / BDRV_SECTOR_SIZE,
 .wr_in_order= true,
 .num_coroutines = 8,
@@ -1973,6 +2017,7 @@ static int img_convert(int argc, char **argv)
 break;
 case 'c':
 s.compressed = true;
+s.copy_range = false;
 break;
 case 'o':
 if (!is_valid_option_list(optarg)) {
@@ -2014,6 +2059,7 @@ static int img_convert(int argc, char **argv)
 }
 
 s.min_sparse = sval / BDRV_SECTOR_SIZE;
+s.copy_range = false;
 break;
 }
 case 'p':
-- 
2.14.3
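
The policy described in the commit message (try the offload only when
neither compression nor sparse detection was requested, and fall back
to buffered I/O once the offload fails) can be sketched on its own. All
names here are hypothetical (Converter, converter_init(), copy_chunk(),
offload_unsupported(), offload_ok()), and memcpy() stands in for the
real read and write paths.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef int (*OffloadFn)(const char *src, char *dst, size_t len);

typedef struct Converter {
    bool copy_range;    /* is the offloaded path still worth trying? */
    OffloadFn offload;  /* hypothetical backend hook */
    int offloaded;      /* chunks copied by the offloaded path */
    int buffered;       /* chunks copied by read + write */
} Converter;

/* Offloading is ruled out up front by compression or sparse detection. */
static void converter_init(Converter *c, OffloadFn offload,
                           bool compressed, bool sparse_requested)
{
    memset(c, 0, sizeof(*c));
    c->offload = offload;
    c->copy_range = offload != NULL && !compressed && !sparse_requested;
}

static int copy_chunk(Converter *c, const char *src, char *dst, size_t len)
{
    if (c->copy_range) {
        if (c->offload(src, dst, len) == 0) {
            c->offloaded++;
            return 0;
        }
        /* Offload failed: stop trying and redo this chunk buffered. */
        c->copy_range = false;
    }
    memcpy(dst, src, len);      /* stands in for read + write */
    c->buffered++;
    return 0;
}

/* Backend that never supports offloading. */
static int offload_unsupported(const char *src, char *dst, size_t len)
{
    (void)src;
    (void)dst;
    (void)len;
    return -ENOTSUP;
}

/* Backend whose "offload" always succeeds. */
static int offload_ok(const char *src, char *dst, size_t len)
{
    memcpy(dst, src, len);
    return 0;
}
```

Clearing the flag on the first failure keeps later chunks from paying
for an attempt that is expected to fail again.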

[Qemu-devel] [RFC PATCH v2 6/7] block-backend: Add blk_co_copy_range

2018-04-17 Thread Fam Zheng

It's a BlockBackend wrapper of the BDS interface.

Signed-off-by: Fam Zheng 
---
 block/block-backend.c  | 9 +
 include/sysemu/block-backend.h | 4 
 2 files changed, 13 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 681b240b12..2a984b1864 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2217,3 +2217,12 @@ void blk_unregister_buf(BlockBackend *blk, void *host)
 {
 bdrv_unregister_buf(blk_bs(blk), host);
 }
+
+int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
+   BlockBackend *blk_out, int64_t off_out,
+   int bytes, BdrvRequestFlags flags)
+{
+return bdrv_co_copy_range(blk_in->root, off_in,
+  blk_out->root, off_out,
+  bytes, flags);
+}
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 92ab624fac..6f043b6b51 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -232,4 +232,8 @@ void blk_set_force_allow_inactivate(BlockBackend *blk);
 void blk_register_buf(BlockBackend *blk, void *host, size_t size);
 void blk_unregister_buf(BlockBackend *blk, void *host);
 
+int blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
+  BlockBackend *blk_out, int64_t off_out,
+  int bytes, BdrvRequestFlags flags);
+
 #endif
-- 
2.14.3

[Qemu-devel] [RFC PATCH v2 1/7] block: Introduce API for copy offloading

2018-04-17 Thread Fam Zheng

Signed-off-by: Fam Zheng 
---
 block/io.c| 91 +++
 include/block/block.h |  4 +++
 include/block/block_int.h | 30 
 3 files changed, 125 insertions(+)

diff --git a/block/io.c b/block/io.c
index bd9a19a9c4..d274e9525f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2826,3 +2826,94 @@ void bdrv_unregister_buf(BlockDriverState *bs, void *host)
 bdrv_unregister_buf(child->bs, host);
 }
 }
+
+static int bdrv_co_copy_range_internal(BdrvChild *src,
+   uint64_t src_offset,
+   BdrvChild *dst,
+   uint64_t dst_offset,
+   uint64_t bytes, BdrvRequestFlags flags,
+   bool recurse_src)
+{
+int ret;
+
+if (!src || !dst || !src->bs || !dst->bs) {
+return -ENOMEDIUM;
+}
+ret = bdrv_check_byte_request(src->bs, src_offset, bytes);
+if (ret) {
+return ret;
+}
+
+ret = bdrv_check_byte_request(dst->bs, dst_offset, bytes);
+if (ret) {
+return ret;
+}
+if (flags & BDRV_REQ_ZERO_WRITE) {
+return bdrv_co_pwrite_zeroes(dst, dst_offset, bytes, flags);
+}
+
+if (!src->bs->drv->bdrv_co_copy_range_from
+|| !dst->bs->drv->bdrv_co_copy_range_to
+|| src->bs->encrypted || dst->bs->encrypted) {
+return -ENOTSUP;
+}
+if (recurse_src) {
+return src->bs->drv->bdrv_co_copy_range_from(src->bs,
+ src, src_offset,
+ dst, dst_offset,
+ bytes, flags);
+} else {
+return dst->bs->drv->bdrv_co_copy_range_to(dst->bs,
+   src, src_offset,
+   dst, dst_offset,
+   bytes, flags);
+}
+}
+
+/* Copy range from @bs to @dst. */
+int bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset,
+BdrvChild *dst, uint64_t dst_offset,
+uint64_t bytes, BdrvRequestFlags flags)
+{
+return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset,
+   bytes, flags, true);
+}
+
+/* Copy range from @src to @bs. Should only be called by block drivers when @bs
+ * is the leaf. */
+int bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset,
+  BdrvChild *dst, uint64_t dst_offset,
+  uint64_t bytes, BdrvRequestFlags flags)
+{
+return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset,
+   bytes, flags, false);
+}
+
+int bdrv_co_copy_range(BdrvChild *src, uint64_t src_offset,
+   BdrvChild *dst, uint64_t dst_offset,
+   uint64_t bytes, BdrvRequestFlags flags)
+{
+BdrvTrackedRequest src_req, dst_req;
+BlockDriverState *src_bs = src->bs;
+BlockDriverState *dst_bs = dst->bs;
+int ret;
+
+bdrv_inc_in_flight(src_bs);
+bdrv_inc_in_flight(dst_bs);
+tracked_request_begin(&src_req, src_bs, src_offset,
+  bytes, BDRV_TRACKED_READ);
+tracked_request_begin(&dst_req, dst_bs, dst_offset,
+  bytes, BDRV_TRACKED_WRITE);
+
+wait_serialising_requests(&src_req);
+wait_serialising_requests(&dst_req);
+ret = bdrv_co_copy_range_from(src, src_offset,
+  dst, dst_offset,
+  bytes, flags);
+
+tracked_request_end(&src_req);
+tracked_request_end(&dst_req);
+bdrv_dec_in_flight(src_bs);
+bdrv_dec_in_flight(dst_bs);
+return ret;
+}
diff --git a/include/block/block.h b/include/block/block.h
index cdec3639a3..72ac011b2b 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -604,4 +604,8 @@ bool bdrv_can_store_new_dirty_bitmap(BlockDriverState *bs, const char *name,
  */
 void bdrv_register_buf(BlockDriverState *bs, void *host, size_t size);
 void bdrv_unregister_buf(BlockDriverState *bs, void *host);
+
+int bdrv_co_copy_range(BdrvChild *bs, uint64_t offset,
+   BdrvChild *src, uint64_t src_offset,
+   uint64_t bytes, BdrvRequestFlags flags);
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index c4dd1d4bb8..305c29b75e 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -206,6 +206,29 @@ struct BlockDriver {
 int coroutine_fn (*bdrv_co_pdiscard)(BlockDriverState *bs,
 int64_t offset, int bytes);
 
+/* Map [offset, offset + nbytes) range onto a child of @bs to copy from,
+ * and invoke bdrv_co_copy_range_from(child, ...), or invoke
+ * bdrv_co_copy_range_to() if @bs is the leaf child to

[Qemu-devel] [RFC PATCH v2 4/7] file-posix: Implement bdrv_co_copy_range

2018-04-17 Thread Fam Zheng

With copy_file_range(2), we can implement the bdrv_co_copy_range
semantics.

Signed-off-by: Fam Zheng 
---
 block/file-posix.c  | 99 +++--
 include/block/raw-aio.h | 10 -
 2 files changed, 104 insertions(+), 5 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 3794c0007a..45ad543481 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -100,6 +100,7 @@
 #ifdef CONFIG_XFS
 #include 
 #endif
+#include 
 
 //#define DEBUG_BLOCK
 
@@ -185,6 +186,8 @@ typedef struct RawPosixAIOData {
 #define aio_ioctl_cmd   aio_nbytes /* for QEMU_AIO_IOCTL */
 off_t aio_offset;
 int aio_type;
+int fd2;
+off_t offset2;
 } RawPosixAIOData;
 
 #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
@@ -1421,6 +1424,48 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
 return -ENOTSUP;
 }
 
+#ifdef __NR_copy_file_range
+#define HAS_COPY_FILE_RANGE
+#endif
+
+#ifdef HAS_COPY_FILE_RANGE
+static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
+ off_t *out_off, size_t len, unsigned int flags)
+{
+return syscall(__NR_copy_file_range, in_fd, in_off, out_fd,
+   out_off, len, flags);
+}
+#endif
+
+static ssize_t handle_aiocb_copy_range(RawPosixAIOData *aiocb)
+{
+#ifndef HAS_COPY_FILE_RANGE
+return -ENOTSUP;
+#else
+uint64_t bytes = aiocb->aio_nbytes;
+off_t in_off = aiocb->aio_offset;
+off_t out_off = aiocb->offset2;
+
+while (bytes) {
+ssize_t ret = copy_file_range(aiocb->aio_fildes, &in_off,
+  aiocb->fd2, &out_off,
+  bytes, 0);
+if (ret == -EINTR) {
+continue;
+}
+if (ret < 0) {
+return -errno;
+}
+if (!ret) {
+/* No progress (e.g. when beyond EOF), fall back to buffer I/O. */
+return -ENOTSUP;
+}
+bytes -= ret;
+}
+return 0;
+#endif
+}
+
 static ssize_t handle_aiocb_discard(RawPosixAIOData *aiocb)
 {
 int ret = -EOPNOTSUPP;
@@ -1501,6 +1546,9 @@ static int aio_worker(void *arg)
 case QEMU_AIO_WRITE_ZEROES:
 ret = handle_aiocb_write_zeroes(aiocb);
 break;
+case QEMU_AIO_COPY_RANGE:
+ret = handle_aiocb_copy_range(aiocb);
+break;
 default:
 fprintf(stderr, "invalid aio request (0x%x)\n", aiocb->aio_type);
 ret = -EINVAL;
@@ -1511,9 +1559,10 @@ static int aio_worker(void *arg)
 return ret;
 }
 
-static int paio_submit_co(BlockDriverState *bs, int fd,
-  int64_t offset, QEMUIOVector *qiov,
-  int bytes, int type)
+static int paio_submit_co_full(BlockDriverState *bs, int fd,
+   int64_t offset, int fd2, int64_t offset2,
+   QEMUIOVector *qiov,
+   int bytes, int type)
 {
 RawPosixAIOData *acb = g_new(RawPosixAIOData, 1);
 ThreadPool *pool;
@@ -1521,6 +1570,8 @@ static int paio_submit_co(BlockDriverState *bs, int fd,
 acb->bs = bs;
 acb->aio_type = type;
 acb->aio_fildes = fd;
+acb->fd2 = fd2;
+acb->offset2 = offset2;
 
 acb->aio_nbytes = bytes;
 acb->aio_offset = offset;
@@ -1536,6 +1587,13 @@ static int paio_submit_co(BlockDriverState *bs, int fd,
 return thread_pool_submit_co(pool, aio_worker, acb);
 }
 
+static inline int paio_submit_co(BlockDriverState *bs, int fd,
+ int64_t offset, QEMUIOVector *qiov,
+ int bytes, int type)
+{
+return paio_submit_co_full(bs, fd, offset, -1, 0, qiov, bytes, type);
+}
+
 static BlockAIOCB *paio_submit(BlockDriverState *bs, int fd,
 int64_t offset, QEMUIOVector *qiov, int bytes,
 BlockCompletionFunc *cb, void *opaque, int type)
@@ -2312,6 +2370,35 @@ static void raw_abort_perm_update(BlockDriverState *bs)
 raw_handle_perm_lock(bs, RAW_PL_ABORT, 0, 0, NULL);
 }
 
+static int coroutine_fn raw_co_copy_range_from(BlockDriverState *bs,
+  BdrvChild *src, uint64_t src_offset,
+  BdrvChild *dst, uint64_t dst_offset,
+  uint64_t bytes, BdrvRequestFlags flags)
+{
+return bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes, flags);
+}
+
+static int coroutine_fn raw_co_copy_range_to(BlockDriverState *bs,
+ BdrvChild *src, uint64_t src_offset,
+ BdrvChild *dst, uint64_t dst_offset,
+ uint64_t bytes, BdrvRequestFlags flags)
+{
+BDRVRawState *s = bs->opaque;
+BDRVRawState *src_s;
+
+assert(dst->bs == bs);
+if (src->bs->drv->bdrv_co_copy_range_to != raw_co_copy_range_to) {
+return -ENOTSUP;
+}
+
+src_s = src->opaque;
+if

[Qemu-devel] [RFC PATCH v2 3/7] qcow2: Implement copy offloading

2018-04-17 Thread Fam Zheng

The two callbacks are implemented quite similarly to the read/write
functions: bdrv_co_copy_range_from maps for read and calls into bs->file
or bs->backing depending on the allocation status; bdrv_co_copy_range_to
maps for write and calls into bs->file.

Signed-off-by: Fam Zheng 
---
 block/qcow2.c | 224 ++
 1 file changed, 194 insertions(+), 30 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 486f3e83b7..9a0046220d 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1755,6 +1755,31 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
 return status;
 }
 
+static int qcow2_handle_l2meta(BlockDriverState *bs, QCowL2Meta *l2meta)
+{
+int ret = 0;
+
+while (l2meta != NULL) {
+QCowL2Meta *next;
+
+if (!ret) {
+ret = qcow2_alloc_cluster_link_l2(bs, l2meta);
+}
+
+/* Take the request off the list of running requests */
+if (l2meta->nb_clusters != 0) {
+QLIST_REMOVE(l2meta, next_in_flight);
+}
+
+qemu_co_queue_restart_all(&l2meta->dependent_requests);
+
+next = l2meta->next;
+g_free(l2meta);
+l2meta = next;
+}
+return ret;
+}
+
 static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset,
 uint64_t bytes, QEMUIOVector *qiov,
 int flags)
@@ -2041,24 +2066,10 @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
 }
 }
 
-while (l2meta != NULL) {
-QCowL2Meta *next;
-
-ret = qcow2_alloc_cluster_link_l2(bs, l2meta);
-if (ret < 0) {
-goto fail;
-}
-
-/* Take the request off the list of running requests */
-if (l2meta->nb_clusters != 0) {
-QLIST_REMOVE(l2meta, next_in_flight);
-}
-
-qemu_co_queue_restart_all(&l2meta->dependent_requests);
-
-next = l2meta->next;
-g_free(l2meta);
-l2meta = next;
+ret = qcow2_handle_l2meta(bs, l2meta);
+l2meta = NULL;
+if (ret) {
+goto fail;
 }
 
 bytes -= cur_bytes;
@@ -2069,18 +2080,7 @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
 ret = 0;
 
 fail:
-while (l2meta != NULL) {
-QCowL2Meta *next;
-
-if (l2meta->nb_clusters != 0) {
-QLIST_REMOVE(l2meta, next_in_flight);
-}
-qemu_co_queue_restart_all(&l2meta->dependent_requests);
-
-next = l2meta->next;
-g_free(l2meta);
-l2meta = next;
-}
+qcow2_handle_l2meta(bs, l2meta);
 
 qemu_co_mutex_unlock(&s->lock);
 
@@ -3267,6 +3267,168 @@ static coroutine_fn int qcow2_co_pdiscard(BlockDriverState *bs,
 return ret;
 }
 
+static int qcow2_co_copy_range_from(BlockDriverState *bs,
+BdrvChild *src, uint64_t src_offset,
+BdrvChild *dst, uint64_t dst_offset,
+uint64_t bytes, BdrvRequestFlags flags)
+{
+BDRVQcow2State *s = bs->opaque;
+int offset_in_cluster;
+int ret;
+unsigned int cur_bytes; /* number of bytes in current iteration */
+uint64_t cluster_offset = 0;
+BdrvChild *child = NULL;
+
+assert(!bs->encrypted);
+qemu_co_mutex_lock(&s->lock);
+
+while (bytes != 0) {
+
+/* prepare next request */
+cur_bytes = MIN(bytes, INT_MAX);
+
+ret = qcow2_get_cluster_offset(bs, src_offset, &cur_bytes, &cluster_offset);
+if (ret < 0) {
+goto out;
+}
+
+offset_in_cluster = offset_into_cluster(s, src_offset);
+
+switch (ret) {
+case QCOW2_CLUSTER_UNALLOCATED:
+if (bs->backing) {
+child = bs->backing;
+} else {
+flags |= BDRV_REQ_ZERO_WRITE;
+}
+break;
+
+case QCOW2_CLUSTER_ZERO_PLAIN:
+case QCOW2_CLUSTER_ZERO_ALLOC:
+flags |= BDRV_REQ_ZERO_WRITE;
+break;
+
+case QCOW2_CLUSTER_COMPRESSED:
+ret = -ENOTSUP;
+goto out;
+break;
+
+case QCOW2_CLUSTER_NORMAL:
+child = bs->file;
+if ((cluster_offset & 511) != 0) {
+ret = -EIO;
+goto out;
+}
+break;
+
+default:
+abort();
+}
+qemu_co_mutex_unlock(&s->lock);
+ret = bdrv_co_copy_range_from(child,
+  cluster_offset + offset_in_cluster,
+  dst, dst_offset,
+  cur_bytes, flags);
+qemu_co_mutex_lock(&s->lock);
+if (ret < 0) {
+goto out;
+}
+
+bytes -= cur_bytes;
+src_offset += cur_bytes;
+}
+

[Qemu-devel] [RFC PATCH v2 0/7] qemu-img convert with copy offloading

2018-04-17 Thread Fam Zheng

v2: - Add iscsi EXTENDED COPY.
- Design change: bdrv_co_copy_range_{from,to}. [Stefan]
- Retry upon EINTR. [Stefan]
- Drop the bounce buffer fallback. It is inefficient to attempt the
  offloaded copy over and over again if the error is returned from the host
  rather than checking our internal state. E.g.  trying copy_file_range
  between two filesystems results in EXDEV, in which case qemu-img.c should
  switch back to the copy path for the remaining sectors.

This series introduces a block layer API for copy offloading and makes use of
it in qemu-img convert.

For now the operation is implemented in the local file protocol with
copy_file_range(2).  Beyond that, similar implementations are possible for
iscsi, nfs and potentially more.

As far as its usage goes, in addition to qemu-img convert, we can emulate
offloading in scsi-disk (handle EXTENDED COPY command), and use the API in
block jobs too.
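
The "try the offload, then switch back to the copy path for the rest" idea
described in the v2 notes can be sketched roughly as below. This is only an
illustration under stated assumptions, not the code from the series: the helper
names copy_bounce() and copy_range() and the CHUNK_MAX limit are hypothetical,
and the raw syscall is used the same way the file-posix patch wraps it. Note
that the raw syscall reports failure as -1 with errno set, so the sketch checks
errno for EINTR.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_MAX ((size_t)1 << 30) /* hypothetical per-call limit */

/* Slow path: bounce the data through a userspace buffer. */
static int copy_bounce(int in_fd, off_t in_off, int out_fd, off_t out_off,
                       uint64_t bytes)
{
    char buf[65536];

    while (bytes) {
        size_t chunk = bytes < sizeof(buf) ? (size_t)bytes : sizeof(buf);
        ssize_t n = pread(in_fd, buf, chunk, in_off);
        ssize_t done = 0;

        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -errno;
        }
        if (n == 0) {
            return -EIO; /* source is shorter than requested */
        }
        while (done < n) {
            ssize_t w = pwrite(out_fd, buf + done, (size_t)(n - done),
                               out_off + done);
            if (w < 0) {
                if (errno == EINTR) {
                    continue;
                }
                return -errno;
            }
            done += w;
        }
        in_off += n;
        out_off += n;
        bytes -= (uint64_t)n;
    }
    return 0;
}

/* Fast path first; whatever is left over goes through the slow path. */
static int copy_range(int in_fd, off_t in_off, int out_fd, off_t out_off,
                      uint64_t bytes)
{
    long long in = in_off, out = out_off; /* the kernel expects 64-bit offsets */

#ifdef __NR_copy_file_range
    while (bytes) {
        size_t len = bytes > CHUNK_MAX ? CHUNK_MAX : (size_t)bytes;
        long ret = syscall(__NR_copy_file_range, in_fd, &in, out_fd, &out,
                           len, 0u);

        if (ret < 0 && errno == EINTR) {
            continue; /* interrupted: retry the same range */
        }
        if (ret <= 0) {
            break; /* e.g. EXDEV, ENOSYS or no progress: stop offloading */
        }
        bytes -= (uint64_t)ret; /* the kernel already advanced in and out */
    }
#endif
    return bytes ? copy_bounce(in_fd, (off_t)in, out_fd, (off_t)out, bytes) : 0;
}
```

The design choice here is that a failed offload is not retried: once the host
refuses, the remaining range is handled by the plain copy loop.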

Fam Zheng (7):
  block: Introduce API for copy offloading
  raw: Implement copy offloading
  qcow2: Implement copy offloading
  file-posix: Implement bdrv_co_copy_range
  iscsi: Implement copy offloading
  block-backend: Add blk_co_copy_range
  qemu-img: Convert with copy offloading

 block/block-backend.c  |   9 ++
 block/file-posix.c |  99 ++-
 block/io.c |  91 ++
 block/iscsi.c  | 266 +
 block/qcow2.c  | 224 +-
 block/raw-format.c |  20 
 include/block/block.h  |   4 +
 include/block/block_int.h  |  30 +
 include/block/raw-aio.h|  10 +-
 include/scsi/constants.h   |   3 +
 include/sysemu/block-backend.h |   4 +
 qemu-img.c |  50 +++-
 12 files changed, 773 insertions(+), 37 deletions(-)

-- 
2.14.3

Re: [Qemu-devel] [Qemu-ppc] [PATCH for 2.13] spapr: Correct max associativity domains value for non-NUMA configs

2018-04-17 Thread David Gibson

On Tue, Apr 17, 2018 at 07:17:51PM +0200, Greg Kurz wrote:
> Cc'ing David who should always be in the recipient list when posting ppc
> related patches :)
> 
> On Tue, 17 Apr 2018 12:21:35 -0400
> Serhii Popovych  wrote:
> 
> > In non-NUMA configurations nb_numa_nodes is zero and we set the 5th cell
> > in ibm,max-associativity-domains to -1. That causes Linux guests to
> > stall during boot after the following line:
> > 
> > [0.00] NUMA associativity depth for CPU/Memory: 4
> > 
> > Make the last possible NUMA node in the property zero to correctly
> > support non-NUMA guests.
> > 
> 
> Alternatively, as suggested by David in some other mail, you could drop the
> property in this case. I've checked it fixes the hang too, and it probably
> makes more sense than exposing only zeroes.

Actually, I think this is the better solution. qemu treats "not NUMA"
and "exactly one NUMA node" differently for historical reasons, but I
don't think that actually makes a whole lot of sense.  I think
advertising them identically so we don't generate more special cases
on the guest side is a better idea.
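
For reference, the arithmetic behind the value of the last cell can be
sketched as below. This is an illustration only: the helper names are
hypothetical, nb_numa_nodes is assumed to be a plain int, and the
cpu_to_be32() byte swap from the quoted diff is left out.

```c
#include <stdint.h>

/* Old expression: with zero NUMA nodes this wraps to 0xffffffff (-1). */
static uint32_t max_domain_old(int nb_numa_nodes)
{
    return (uint32_t)(nb_numa_nodes - 1);
}

/* Expression from the quoted patch: "no NUMA" and "one node" both give 0. */
static uint32_t max_domain_fixed(int nb_numa_nodes)
{
    return (uint32_t)(nb_numa_nodes ? nb_numa_nodes - 1 : 0);
}
```

With the patched expression, a non-NUMA guest and a guest with exactly one
NUMA node end up with the same cell value.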

> > Fixes: c1df49a670ef ("spapr: Add ibm,max-associativity-domains property")
> 
> Since c1df49a670ef hasn't hit master yet, I guess we should squash this
> patch (or any alternative) there to preserve bisect, ie, either David
> does it for you or you post a v4 of your previous series.

Right, I've folded these together in my tree so we don't get an
interval of broken commits.

> 
> > Signed-off-by: Serhii Popovych 
> > ---
> >  hw/ppc/spapr.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 7b2bc4e..bff2125 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -914,7 +914,7 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void *fdt)
> >  cpu_to_be32(0),
> >  cpu_to_be32(0),
> >  cpu_to_be32(0),
> > -cpu_to_be32(nb_numa_nodes - 1),
> > +cpu_to_be32(nb_numa_nodes ? nb_numa_nodes - 1 : 0),
> >  };
> >  
> >  _FDT(rtas = fdt_add_subnode(fdt, 0, "rtas"));
> 
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH v2 1/1] migration: calculate expected_downtime with ram_bytes_remaining()

2018-04-17 Thread David Gibson

On Wed, Apr 18, 2018 at 10:55:50AM +1000, David Gibson wrote:
> On Tue, Apr 17, 2018 at 06:53:17PM +0530, Balamuruhan S wrote:
> > expected_downtime value is not accurate with dirty_pages_rate * page_size,
> > using ram_bytes_remaining would yield the correct value.
> 
> This commit message hasn't been changed since v1, but the patch is
> doing something completely different.  I think most of the info from
> your cover letter needs to be in here.
> 
> > 
> > Signed-off-by: Balamuruhan S 
> > ---
> >  migration/migration.c | 6 +++---
> >  migration/migration.h | 1 +
> >  2 files changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 52a5092add..4d866bb920 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -614,7 +614,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
> >  }
> >  
> >  if (s->state != MIGRATION_STATUS_COMPLETED) {
> > -info->ram->remaining = ram_bytes_remaining();
> > +info->ram->remaining = s->ram_bytes_remaining;
> >  info->ram->dirty_pages_rate = ram_counters.dirty_pages_rate;
> >  }
> >  }
> > @@ -2227,6 +2227,7 @@ static void migration_update_counters(MigrationState *s,
> >  transferred = qemu_ftell(s->to_dst_file) - s->iteration_initial_bytes;
> >  time_spent = current_time - s->iteration_start_time;
> >  bandwidth = (double)transferred / time_spent;
> > +s->ram_bytes_remaining = ram_bytes_remaining();
> >  s->threshold_size = bandwidth * s->parameters.downtime_limit;
> >  
> >  s->mbps = (((double) transferred * 8.0) /
> > @@ -2237,8 +2238,7 @@ static void migration_update_counters(MigrationState *s,
> >   * recalculate. 1 is a small enough number for our purposes
> >   */
> >  if (ram_counters.dirty_pages_rate && transferred > 1) {
> > -s->expected_downtime = ram_counters.dirty_pages_rate *
> > -qemu_target_page_size() / bandwidth;
> > +s->expected_downtime = s->ram_bytes_remaining / bandwidth;
> >  }

..but more importantly, I still think this change is bogus.  expected
downtime is not the same thing as remaining ram / bandwidth.
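
To make the difference concrete, the two estimates can be put side by side as
in the sketch below. The helper names are hypothetical, and the units are an
assumption (dirty_pages_rate in pages per second, bandwidth in bytes per
millisecond); the point is only that the two formulas measure different
quantities and can differ by orders of magnitude.

```c
#include <stdint.h>

/* Pre-patch estimate: bytes dirtied per second divided by the bandwidth. */
static double downtime_from_dirty_rate(uint64_t dirty_pages_rate,
                                       uint64_t page_size, double bandwidth)
{
    return (double)dirty_pages_rate * (double)page_size / bandwidth;
}

/* Estimate from the quoted patch: all remaining RAM over the bandwidth. */
static double downtime_from_remaining(uint64_t ram_bytes_remaining,
                                      double bandwidth)
{
    return (double)ram_bytes_remaining / bandwidth;
}
```

For example, with 1000 dirty pages per second, 4096-byte pages and a bandwidth
of 100000 bytes/ms, the first estimate is about 41, while 4 GiB of remaining
RAM gives roughly 42950 for the second.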

> >  
> >  qemu_file_reset_rate_limit(s->to_dst_file);
> > diff --git a/migration/migration.h b/migration/migration.h
> > index 8d2f320c48..8584f8e22e 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -128,6 +128,7 @@ struct MigrationState
> >  int64_t downtime_start;
> >  int64_t downtime;
> >  int64_t expected_downtime;
> > +int64_t ram_bytes_remaining;
> >  bool enabled_capabilities[MIGRATION_CAPABILITY__MAX];
> >  int64_t setup_time;
> >  /*
> 



-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH v2 1/1] migration: calculate expected_downtime with ram_bytes_remaining()

2018-04-17 Thread David Gibson

On Tue, Apr 17, 2018 at 06:53:17PM +0530, Balamuruhan S wrote:
> expected_downtime value is not accurate with dirty_pages_rate * page_size,
> using ram_bytes_remaining would yield the correct value.

This commit message hasn't been changed since v1, but the patch is
doing something completely different.  I think most of the info from
your cover letter needs to be in here.

> 
> Signed-off-by: Balamuruhan S 
> ---
>  migration/migration.c | 6 +++---
>  migration/migration.h | 1 +
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 52a5092add..4d866bb920 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -614,7 +614,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
>  }
>  
>  if (s->state != MIGRATION_STATUS_COMPLETED) {
> -info->ram->remaining = ram_bytes_remaining();
> +info->ram->remaining = s->ram_bytes_remaining;
>  info->ram->dirty_pages_rate = ram_counters.dirty_pages_rate;
>  }
>  }
> @@ -2227,6 +2227,7 @@ static void migration_update_counters(MigrationState *s,
>  transferred = qemu_ftell(s->to_dst_file) - s->iteration_initial_bytes;
>  time_spent = current_time - s->iteration_start_time;
>  bandwidth = (double)transferred / time_spent;
> +s->ram_bytes_remaining = ram_bytes_remaining();
>  s->threshold_size = bandwidth * s->parameters.downtime_limit;
>  
>  s->mbps = (((double) transferred * 8.0) /
> @@ -2237,8 +2238,7 @@ static void migration_update_counters(MigrationState *s,
>   * recalculate. 1 is a small enough number for our purposes
>   */
>  if (ram_counters.dirty_pages_rate && transferred > 1) {
> -s->expected_downtime = ram_counters.dirty_pages_rate *
> -qemu_target_page_size() / bandwidth;
> +s->expected_downtime = s->ram_bytes_remaining / bandwidth;
>  }
>  
>  qemu_file_reset_rate_limit(s->to_dst_file);
> diff --git a/migration/migration.h b/migration/migration.h
> index 8d2f320c48..8584f8e22e 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -128,6 +128,7 @@ struct MigrationState
>  int64_t downtime_start;
>  int64_t downtime;
>  int64_t expected_downtime;
> +int64_t ram_bytes_remaining;
>  bool enabled_capabilities[MIGRATION_CAPABILITY__MAX];
>  int64_t setup_time;
>  /*

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH RESEND v2] i386/kvm: add support for KVM_CAP_X86_DISABLE_EXITS

2018-04-17 Thread Wanpeng Li

2018-04-18 4:59 GMT+08:00 Eduardo Habkost :
> On Tue, Apr 17, 2018 at 01:24:15AM -0700, Wanpeng Li wrote:
[.../...]
>>
>> +if (env->features[FEAT_KVM_HINTS] & KVM_HINTS_DEDICATED) {
>> +int disable_exits = kvm_check_extension(cs->kvm_state, KVM_CAP_X86_DISABLE_EXITS);
>> +
>> +if (disable_exits) {
>> +disable_exits &= (KVM_X86_DISABLE_EXITS_MWAIT |
>> +  KVM_X86_DISABLE_EXITS_HLT |
>> +  KVM_X86_DISABLE_EXITS_PAUSE);
>> +if (env->user_features[FEAT_KVM] & KVM_PV_UNHALT) {
>> +disable_exits &= ~KVM_X86_DISABLE_EXITS_HLT;
>> +}
>
> In the future, if we decide to enable kvm-pv-unhalt by default,
> should "-cpu ...,kvm-hint-dedicated=on" disable kvm-pv-unhalt
> automatically, or should we require an explicit
> "kvm-hint-dedicated=on,kvm-pv-unhalt=off" option?
>
> For today's defaults, this patch solves the problem, only one
> thing is missing before I give my R-b: we need to clearly
> document what exactly are the consequences and requirements of
> setting kvm-hint-dedicated=on (I'm not sure if the best place for
> this is qemu-options.hx, x86_cpu_list(), or somewhere else).

What's your opinion, Paolo?

Regards,
Wanpeng Li

Re: [Qemu-devel] [PATCH RESEND v2] i386/kvm: add support for KVM_CAP_X86_DISABLE_EXITS

2018-04-17 Thread Wanpeng Li

2018-04-18 2:08 GMT+08:00 Michael S. Tsirkin :
> On Tue, Apr 17, 2018 at 01:24:15AM -0700, Wanpeng Li wrote:
>> From: Wanpeng Li 
>>
>> This patch adds support for KVM_CAP_X86_DISABLE_EXITS. It provides userspace
>> with a per-VM capability (KVM_CAP_X86_DISABLE_EXITS) to not intercept
>> MWAIT/HLT/PAUSE in order to improve latency in some workloads.
>>
>> Cc: Paolo Bonzini 
>> Cc: Radim Krčmář 
>> Cc: Eduardo Habkost 
>> Signed-off-by: Wanpeng Li 
>> ---
>>
>>  linux-headers/linux/kvm.h |  6 +-
>>  target/i386/cpu.h |  2 ++
>>  target/i386/kvm.c | 16 
>>  3 files changed, 23 insertions(+), 1 deletion(-)
>>
>> diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
>> index a167be8..857df15 100644
>> --- a/linux-headers/linux/kvm.h
>> +++ b/linux-headers/linux/kvm.h
>> @@ -925,7 +925,7 @@ struct kvm_ppc_resize_hpt {
>>  #define KVM_CAP_S390_GS 140
>>  #define KVM_CAP_S390_AIS 141
>>  #define KVM_CAP_SPAPR_TCE_VFIO 142
>> -#define KVM_CAP_X86_GUEST_MWAIT 143
>> +#define KVM_CAP_X86_DISABLE_EXITS 143
>>  #define KVM_CAP_ARM_USER_IRQ 144
>>  #define KVM_CAP_S390_CMMA_MIGRATION 145
>>  #define KVM_CAP_PPC_FWNMI 146
>> @@ -1508,6 +1508,10 @@ struct kvm_assigned_msix_entry {
>>  #define KVM_X2APIC_API_USE_32BIT_IDS            (1ULL << 0)
>>  #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK  (1ULL << 1)
>>
>> +#define KVM_X86_DISABLE_EXITS_MWAIT  (1 << 0)
>> +#define KVM_X86_DISABLE_EXITS_HLT    (1 << 1)
>> +#define KVM_X86_DISABLE_EXITS_PAUSE  (1 << 2)
>> +
>>  /* Available with KVM_CAP_ARM_USER_IRQ */
>>
>>  /* Bits for run->s.regs.device_irq_level */
>> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
>> index 1b219fa..965de1b 100644
>> --- a/target/i386/cpu.h
>> +++ b/target/i386/cpu.h
>> @@ -685,6 +685,8 @@ typedef uint32_t FeatureWordArray[FEATURE_WORDS];
>>  #define CPUID_7_0_EDX_AVX512_4FMAPS (1U << 3) /* AVX512 Multiply Accumulation Single Precision */
>>  #define CPUID_7_0_EDX_SPEC_CTRL (1U << 26) /* Speculation Control */
>>
>> +#define KVM_PV_UNHALT (1U << 7)
>> +
>
> Why don't we use KVM_FEATURE_PV_UNHALT from kvm_para.h?
>
>>  #define KVM_HINTS_DEDICATED (1U << 0)
>>
>
> BTW I wonder whether we should switch to a value from
> kvm_para.h? I'll send a version to do it, pls take a look.

Yeah, your patchset looks good.

Regards,
Wanpeng Li

>
>
>>  #define CPUID_8000_0008_EBX_IBPB (1U << 12) /* Indirect Branch Prediction Barrier */
>> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
>> index 6c49954..3e99830 100644
>> --- a/target/i386/kvm.c
>> +++ b/target/i386/kvm.c
>> @@ -1029,6 +1029,22 @@ int kvm_arch_init_vcpu(CPUState *cs)
>>  }
>>  }
>>
>> +if (env->features[FEAT_KVM_HINTS] & KVM_HINTS_DEDICATED) {
>> +int disable_exits = kvm_check_extension(cs->kvm_state, KVM_CAP_X86_DISABLE_EXITS);
>> +
>> +if (disable_exits) {
>> +disable_exits &= (KVM_X86_DISABLE_EXITS_MWAIT |
>> +  KVM_X86_DISABLE_EXITS_HLT |
>> +  KVM_X86_DISABLE_EXITS_PAUSE);
>> +if (env->user_features[FEAT_KVM] & KVM_PV_UNHALT) {
>> +disable_exits &= ~KVM_X86_DISABLE_EXITS_HLT;
>> +}
>> +}
>> +if (kvm_vm_enable_cap(cs->kvm_state, KVM_CAP_X86_DISABLE_EXITS, 0, disable_exits)) {
>> +error_report("kvm: DISABLE EXITS not supported");
>> +}
>> +}
>> +
>>  qemu_add_vm_change_state_handler(cpu_update_state, env);
>>
>>  c = cpuid_find_entry(&cpuid_data.cpuid, 1, 0);
>> --
>> 2.7.4

Re: [Qemu-devel] [PATCH] fpu/softfloat: check for Inf / x or 0 / x before /0

2018-04-17 Thread Peter Maydell

On 18 April 2018 at 00:01, Peter Maydell  wrote:
> I don't have the original IEEE754 spec to hand though;
> that may have left this unspecified.

Having located a copy of 754-1985 I think that also is
clear enough that float-float conversion is an operation
that must quieten SNaN and raise Invalid.

thanks
-- PMM

Re: [Qemu-devel] [PATCH] fpu/softfloat: check for Inf / x or 0 / x before /0

2018-04-17 Thread Peter Maydell

On 17 April 2018 at 23:49, Richard Henderson
 wrote:
> On 04/17/2018 12:38 PM, Emilio G. Cota wrote:
>> On Tue, Apr 17, 2018 at 22:45:51 +0100, Peter Maydell wrote:
>>> On 17 April 2018 at 22:27, Emilio G. Cota  wrote:
>>>> (...)
>>>> +cff 0xffb0, expected: 0x7ff8, returned: 0x7ff4, \
>>>>   expected exceptions: i, returned: none
>>>> +error: flags mismatch for input @ ibm/Basic-Types-Inputs.fptest:26170:
>>>> +b32b64cff =0 S -> Q i
>>>
>>> SNaN conversion from 32 bit to 64 bit. Here I agree
>>> with the test -- we should quieten the NaN and raise
>>> Invalid -- which implies that the hardware is wrong ?!?
>>
>> This passes on an Intel host, and fails on both Power7 and 8 hosts I have
>> access to. I don't have the Power ISA spec in front of me, but I hope
>> there's something about this specified in it.
>
> IIRC this is unspecified and does vary by implementation.

I think 754-2008 does specify it: s6.2 says that you get
'set Invalid and return a QNaN if an input is an SNaN' for
"every general-computational and signaling-computational
operation except for the conversions described in 5.12".
So the only exceptions are:
 1) the s5.12 conversions, which are to/from strings-of-characters
 2) quiet-computational operations, which are just
copy, abs, negate, copySign, and some re-encoding
operations involving decimal formats

float-to-float conversions are general-computational.
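
As a rough illustration of what that rule means for a binary32 to binary64
widening, here is a bit-level sketch. It is not softfloat's implementation:
the function name and the FLAG_INVALID constant are hypothetical, the input is
assumed to already be a NaN, and the quiet bit is assumed to be the most
significant fraction bit. It also keeps the payload rather than returning a
default NaN, which is another assumption.

```c
#include <stdint.h>

#define FLAG_INVALID 1 /* hypothetical exception flag bit */

/* Widen a binary32 NaN bit pattern to binary64, quietening a signaling NaN. */
static uint64_t widen_nan32(uint32_t a, int *flags)
{
    const uint64_t quiet_bit = 1ULL << 51;              /* fraction MSB */
    uint64_t sign = (uint64_t)(a >> 31) << 63;
    uint64_t frac = (uint64_t)(a & 0x007fffffu) << 29;  /* 23 -> 52 bits */

    if (!(frac & quiet_bit)) {
        *flags |= FLAG_INVALID; /* the input was signaling */
    }
    return sign | 0x7ff0000000000000ULL | frac | quiet_bit;
}
```

With the example SNaN pattern 0xffb00000 this returns 0xfffe000000000000 and
sets the invalid flag; a quiet input such as 0x7fc00000 passes through as
0x7ff8000000000000 with no flag raised.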

I don't have the original IEEE754 spec to hand though;
that may have left this unspecified.

thanks
-- PMM

Re: [Qemu-devel] [PATCH v6 2/7] hw/misc: add vmcoreinfo device

2018-04-17 Thread Eduardo Habkost

On Tue, Apr 17, 2018 at 06:31:57PM -0400, Cole Robinson wrote:
> On 04/17/2018 05:11 PM, Eduardo Habkost wrote:
> > On Tue, Apr 17, 2018 at 03:12:03PM -0400, Cole Robinson wrote:
> > [...]
> >> Reviving this... did any follow up changes happen?
> >>
> >> Marc-André patched virt-manager a few months back to enable -device
> >> vmcoreinfo for new VMs:
> >>
> >> https://www.redhat.com/archives/virt-tools-list/2018-February/msg00020.html
> >>
> >> And I see there's at least a bug tracking adding this to openstack for
> >> new VMs:
> >>
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1555276
> >>
> >> If this feature doesn't really have any downsides, it would be nice to
> >> get this tied to new machine types. Saves a lot of churn for higher
> >> levels of the stack
> > 
> > I understand this would be nice to have considering the existing
> > stacks, but at the same time I would like the rest of the
> > stack(s) to really try to not depend on QEMU machine-types to
> > define policy/defaults.
> > 
> > Every feature that is hidden behind an opaque machine-type name
> > and not visible in the domain XML and QEMU command-line increases
> > the risk of migration and compatibility bugs.
> > 
> 
> What exactly is the migration compatibility issue with turning on the
> equivalent of -device vmcoreinfo for -M *-2.13+ ? Possibly prevents
> backwards migration to older qemu but is that even a goal?

I mean the extra migration compatibility code that needs to be
maintained on older machine-types.  It's extra maintenance burden
on both upstream and downstream QEMU trees.


> 
> > This was being discussed in a mail thread at:
> > https://www.mail-archive.com/ovirt-devel@redhat.com/msg01196.html
> > 
> > Quoting Daniel, on that thread:
> > 
> > ] Another case is the pvpanic device - while in theory that could
> > ] have been enabled by default for all guests, by QEMU or a config
> > ] generator library, doing so is not useful on its own. The hard
> > ] bit of the work is adding code to the mgmt app to choose the
> > ] action for when pvpanic triggers, and code to handle the results
> > ] of that action.
> > 
> > From that comment, I understand that simply making QEMU create a
> > pvpanic device by default on pc-2.13+ won't be useful at all?
> > 
> 
> This qemu-devel thread was about -device vmcoreinfo though, not pvpanic.
> vmcoreinfo doesn't need anything else to work AFAICT and shouldn't need
> any explicit config, heck it doesn't even have any -device properties.
> 
> Like Dan says pvpanic isn't a 'just works' thing, and I know for windows
> VMs it shows up in device manager which has considerations for things
> like SVVP. I think vmcoreinfo doesn't have the same impact
> 

Oops, nevermind.  I confused both.


> There are some guest visible things that we have turned on for new
> machine types in the past, pveoi and x2apic comes to mind.

Yes, we have tons of guest-visible things that we tie to the
machine-type.  What I'm looking for is a solution to make this
less frequent in the future.

-- 
Eduardo

Re: [Qemu-devel] [PATCH] fpu/softfloat: check for Inf / x or 0 / x before /0

2018-04-17 Thread Richard Henderson

On 04/17/2018 12:38 PM, Emilio G. Cota wrote:
> On Tue, Apr 17, 2018 at 22:45:51 +0100, Peter Maydell wrote:
>> On 17 April 2018 at 22:27, Emilio G. Cota  wrote:
>>> BTW I just checked with -t host on an IBM Power8, and we get
>>> the same 1049 flag errors we get with -t soft plus two additional ones:
>>>
>>> +A 0xffb0, expected: 0x7fa0, returned: 0x7fa0, \
>>>   expected exceptions: i, returned: none
>>> +error: flags mismatch for input @ ibm/Basic-Types-Inputs.fptest:382:
>>> +b32A =0 S -> S i
>>
>> That's Abs of an SNaN; the test expects Invalid, which is wrong,
>> because IEEE754 says absolute-value is a "quiet-computational
>> operation" that never signals an exception.
>>
>> What's odd is that we don't report that error for the softfloat
>> implementation! I also don't understand why the expected value
>> isn't just the input value with the sign bit flipped.
> 
> With -t soft we don't handle "abs" and we don't get the error -- we get
> a "not handled" instead.
> Is there a function that we could use for abs? The only ones I've seen
> are floatX_abs() which mask out the sign bit and do nothing else.

Both abs and neg are pure bit operations.  So, yes, floatX_abs (and floatX_chs)
are the softfloat functions for these.

And if fp-test thinks Invalid should be raised, it's wrong.
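
A minimal sketch of what "pure bit operations" means for binary32, working on
raw bit patterns. The helper names are hypothetical stand-ins for the
floatX_abs and floatX_chs functions mentioned above, not their actual
definitions.

```c
#include <stdint.h>

/* Absolute value: clear the sign bit, touch nothing else. */
static uint32_t f32_abs_bits(uint32_t a)
{
    return a & 0x7fffffffu;
}

/* Negate (change sign): flip the sign bit, touch nothing else. */
static uint32_t f32_chs_bits(uint32_t a)
{
    return a ^ 0x80000000u;
}
```

Neither helper looks at the exponent or fraction and neither takes a status
argument, so a signaling NaN passes through with its payload intact and there
is no place where Invalid could be raised.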

>>> (...)
>>> +cff 0xffb0, expected: 0x7ff8, returned: 0x7ff4, \
>>>   expected exceptions: i, returned: none
>>> +error: flags mismatch for input @ ibm/Basic-Types-Inputs.fptest:26170:
>>> +b32b64cff =0 S -> Q i
>>
>> SNaN conversion from 32 bit to 64 bit. Here I agree
>> with the test -- we should quieten the NaN and raise
>> Invalid -- which implies that the hardware is wrong ?!?
> 
> This passes on an Intel host, and fails on both Power7 and 8 hosts I have
> access to. I don't have the Power ISA spec in front of me, but I hope
> there's something about this specified in it.

IIRC this is unspecified and does vary by implementation.


r~

Re: [Qemu-devel] [PATCH v1 1/1] xilinx_spips: send dummy only if cmd requires it

2018-04-17 Thread francisco iglesias

Hi Sai,

[PATCH v1] xilinx_spips: send dummy only if cmd requires it

s/dummy/dummy cycles/

On 17 April 2018 at 16:18, Sai Pavan Boddu wrote:

> For all the commands, which do not have an entry in
> xilinx_spips_num_dummies, present logic sends dummy byte when ever we
>

s/dummy byte/dummy cycles/

> are in SNOOP_NONE state, fix it to send only if cmd requires them.
>

s/fix it to send only if cmd/fix it to only send dummy cycles if the command/

> Only transmit max of 1 dummy byte(i.e 8 cycles) is a single snoop cycle.
> And also convert dummy bytes to cycles (required by m25p80).
>

Maybe it is better to drop these last two lines (this was already done before,
so it could be misleading when reading the git history).

>
> Signed-off-by: Sai Pavan Boddu 
> ---
>  hw/ssi/xilinx_spips.c | 13 ++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/hw/ssi/xilinx_spips.c b/hw/ssi/xilinx_spips.c
> index 426f971..8278930 100644
> --- a/hw/ssi/xilinx_spips.c
> +++ b/hw/ssi/xilinx_spips.c
> @@ -627,10 +627,17 @@ static void xilinx_spips_flush_txfifo(XilinxSPIPS *s)
>  tx_rx[i] = tx;
>  }
>  } else {
> -/* Extract a dummy byte and generate dummy cycles according to the
> - * link state */
>  tx = fifo8_pop(&s->tx_fifo);
> -dummy_cycles = 8 / s->link_state;
> +if (s->cmd_dummies > 0) {
> +/* Extract a dummy byte and generate dummy cycles according to
> + * the link state */
> + dummy_cycles = (s->cmd_dummies ? 1 : 0) * 8 / s->link_state;
> + s->cmd_dummies--;
> +} else {
> +for (i = 0; i < num_effective_busses(s); ++i) {
> +tx_rx[i] = tx;
> +}
> +}
>  }
>
>
Could we replace the above with the change below in the same if ladder, so we
don't complicate the code more than necessary? (It should give the same result
when num_effective_busses == 1.)

-} else if (s->snoop_state == SNOOP_STRIPING) {
+} else if (s->snoop_state == SNOOP_STRIPING ||
+   s->snoop_state == SNOOP_NONE) {


Thank you!

Best regards,
Francisco Iglesias




>  for (i = 0; i < num_effective_busses(s); ++i) {
> --
> 2.7.4
>
>

[Qemu-devel] [qemu RFC v2] qapi: add "firmware.json"

2018-04-17 Thread Laszlo Ersek

Add a schema that describes the different uses and properties of virtual
machine firmware.

Each firmware executable installed on a host system should come with at
least one JSON file that conforms to this schema. Each file informs the
management applications about the firmware's properties and one possible
use case / feature set.

In addition, a configuration directory with symlinks to the JSON files
should exist, with the symlinks carefully named to reflect a priority
order. Management applications can then search this directory in priority
order for the first firmware description that satisfies their search
criteria. The found JSON file provides the management layer with domain
configuration bits that are required to run the firmware binary for a
certain use case or feature set.

Cc: "Daniel P. Berrange" 
Cc: Alexander Graf 
Cc: Ard Biesheuvel 
Cc: David Gibson 
Cc: Eric Blake 
Cc: Gary Ching-Pang Lin 
Cc: Gerd Hoffmann 
Cc: Kashyap Chamarthy 
Cc: Markus Armbruster 
Cc: Michael Roth 
Cc: Michal Privoznik 
Cc: Paolo Bonzini 
Cc: Peter Krempa 
Cc: Peter Maydell 
Cc: Thomas Huth 
Signed-off-by: Laszlo Ersek 
---

Notes:
RFCv2:
- previous version (RFCv1) was posted at
  <http://mid.mail-archive.com/20180407000117.25640-1-lersek@redhat.com>

- v2 is basically a rewrite from scratch, starting out with Dan's
  definitions from
  <http://mid.mail-archive.com/20180410102033.GL5155@redhat.com> and
  <http://mid.mail-archive.com/20180410110357.GP5155@redhat.com>,
  hopefully addressing others' feedback as well

RFCv1:
- Folks on the CC list, please try to see if the suggested schema is
  flexible enough to describe the virtual firmware(s) that you are
  familiar with. Thanks!

 Makefile  |   9 +
 Makefile.objs |   4 +
 qapi/firmware.json| 503 ++
 qapi/qapi-schema.json |   1 +
 qmp.c |   5 +
 .gitignore|   4 +
 6 files changed, 526 insertions(+)
 create mode 100644 qapi/firmware.json

diff --git a/Makefile b/Makefile
index d71dd5bea416..32034abe1583 100644
--- a/Makefile
+++ b/Makefile
@@ -97,6 +97,7 @@ GENERATED_FILES += qapi/qapi-types-block.h qapi/qapi-types-block.c
 GENERATED_FILES += qapi/qapi-types-char.h qapi/qapi-types-char.c
 GENERATED_FILES += qapi/qapi-types-common.h qapi/qapi-types-common.c
 GENERATED_FILES += qapi/qapi-types-crypto.h qapi/qapi-types-crypto.c
+GENERATED_FILES += qapi/qapi-types-firmware.h qapi/qapi-types-firmware.c
 GENERATED_FILES += qapi/qapi-types-introspect.h qapi/qapi-types-introspect.c
 GENERATED_FILES += qapi/qapi-types-migration.h qapi/qapi-types-migration.c
 GENERATED_FILES += qapi/qapi-types-misc.h qapi/qapi-types-misc.c
@@ -115,6 +116,7 @@ GENERATED_FILES += qapi/qapi-visit-block.h qapi/qapi-visit-block.c
 GENERATED_FILES += qapi/qapi-visit-char.h qapi/qapi-visit-char.c
 GENERATED_FILES += qapi/qapi-visit-common.h qapi/qapi-visit-common.c
 GENERATED_FILES += qapi/qapi-visit-crypto.h qapi/qapi-visit-crypto.c
+GENERATED_FILES += qapi/qapi-visit-firmware.h qapi/qapi-visit-firmware.c
 GENERATED_FILES += qapi/qapi-visit-introspect.h qapi/qapi-visit-introspect.c
 GENERATED_FILES += qapi/qapi-visit-migration.h qapi/qapi-visit-migration.c
 GENERATED_FILES += qapi/qapi-visit-misc.h qapi/qapi-visit-misc.c
@@ -132,6 +134,7 @@ GENERATED_FILES += qapi/qapi-commands-block.h qapi/qapi-commands-block.c
 GENERATED_FILES += qapi/qapi-commands-char.h qapi/qapi-commands-char.c
 GENERATED_FILES += qapi/qapi-commands-common.h qapi/qapi-commands-common.c
 GENERATED_FILES += qapi/qapi-commands-crypto.h qapi/qapi-commands-crypto.c
+GENERATED_FILES += qapi/qapi-commands-firmware.h qapi/qapi-commands-firmware.c
 GENERATED_FILES += qapi/qapi-commands-introspect.h qapi/qapi-commands-introspect.c
 GENERATED_FILES += qapi/qapi-commands-migration.h qapi/qapi-commands-migration.c
 GENERATED_FILES += qapi/qapi-commands-misc.h qapi/qapi-commands-misc.c
@@ -149,6 +152,7 @@ GENERATED_FILES += qapi/qapi-events-block.h qapi/qapi-events-block.c
 GENERATED_FILES += qapi/qapi-events-char.h qapi/qapi-events-char.c
 GENERATED_FILES += qapi/qapi-events-common.h qapi/qapi-events-common.c
 GENERATED_FILES += qapi/qapi-events-crypto.h qapi/qapi-events-crypto.c
+GENERATED_FILES += qapi/qapi-events-firmware.h qapi/qapi-events-firmware.c
 GENERATED_FILES += qapi/qapi-events-introspect.h qapi/qapi-events-introspect.c
 GENERATED_FILES += qapi/qapi-events-migration.h qapi/qapi-events-migration.c
 GENERATED_FILES += qapi/qapi-events-misc.h

Re: [Qemu-devel] [PATCH] fpu/softfloat: check for Inf / x or 0 / x before /0

2018-04-17 Thread Emilio G. Cota

On Tue, Apr 17, 2018 at 22:45:51 +0100, Peter Maydell wrote:
> On 17 April 2018 at 22:27, Emilio G. Cota  wrote:
> > BTW I just checked with -t host on an IBM Power8, and we get
> > the same 1049 flag errors we get with -t soft plus two additional ones:
> >
> > +A 0xffb0, expected: 0x7fa0, returned: 0x7fa0, \
> >   expected exceptions: i, returned: none
> > +error: flags mismatch for input @ ibm/Basic-Types-Inputs.fptest:382:
> > +b32A =0 S -> S i
> 
> That's Abs of an SNaN; the test expects Invalid, which is wrong,
> because IEEE754 says absolute-value is a "quiet-computational
> operation" that never signals an exception.
> 
> What's odd is that we don't report that error for the softfloat
> implementation! I also don't understand why the expected value
> isn't just the input value with the sign bit flipped.

With -t soft we don't handle "abs" and we don't get the error -- we get
a "not handled" instead.
Is there a function that we could use for abs? The only ones I've seen
are floatX_abs() which mask out the sign bit and do nothing else.
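
For reference, a minimal sketch of what "mask out the sign bit and do
nothing else" means on a raw binary32 pattern. The function names and the
sample value 0xffa00000 are made up for illustration and are not taken from
softfloat or from the test file; the SNaN check assumes the common
convention where a clear top fraction bit marks a signalling NaN.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: clear the sign bit of a raw IEEE 754 binary32 pattern.
 * No exception flags are touched, so even an SNaN input raises nothing. */
static inline uint32_t sketch_f32_abs_bits(uint32_t bits)
{
    return bits & UINT32_C(0x7fffffff);
}

/* Assumed convention: exponent all ones, fraction non-zero, and the top
 * fraction bit (bit 22) clear means "signalling NaN". */
static inline int sketch_f32_is_snan_bits(uint32_t bits)
{
    uint32_t exp = (bits >> 23) & 0xff;
    uint32_t frac = bits & UINT32_C(0x007fffff);

    return exp == 0xff && frac != 0 && (frac & UINT32_C(0x00400000)) == 0;
}

/* Small self-check with an illustrative negative SNaN pattern. */
static inline void sketch_f32_abs_selfcheck(void)
{
    uint32_t in = UINT32_C(0xffa00000);

    assert(sketch_f32_is_snan_bits(in));
    assert(sketch_f32_abs_bits(in) == UINT32_C(0x7fa00000));
    /* The result is still a signalling NaN: nothing was quietened. */
    assert(sketch_f32_is_snan_bits(sketch_f32_abs_bits(in)));
}
```

With such a helper the result is simply the input with the sign bit
cleared, and no flag is ever raised.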

> > (...)
> > +cff 0xffb0, expected: 0x7ff8, returned: 0x7ff4, \
> >   expected exceptions: i, returned: none
> > +error: flags mismatch for input @ ibm/Basic-Types-Inputs.fptest:26170:
> > +b32b64cff =0 S -> Q i
> 
> SNaN conversion from 32 bit to 64 bit. Here I agree
> with the test -- we should quieten the NaN and raise
> Invalid -- which implies that the hardware is wrong ?!?

This passes on an Intel host, and fails on both Power7 and 8 hosts I have
access to. I don't have the Power ISA spec in front of me, but I hope
there's something about this specified in it.

E.

Re: [Qemu-devel] [PATCH v6 2/7] hw/misc: add vmcoreinfo device

2018-04-17 Thread Cole Robinson

On 04/17/2018 05:11 PM, Eduardo Habkost wrote:
> On Tue, Apr 17, 2018 at 03:12:03PM -0400, Cole Robinson wrote:
> [...]
>> Reviving this... did any follow up changes happen?
>>
>> Marc-André patched virt-manager a few months back to enable -device
>> vmcoreinfo for new VMs:
>>
>> https://www.redhat.com/archives/virt-tools-list/2018-February/msg00020.html
>>
>> And I see there's at least a bug tracking adding this to openstack for
>> new VMs:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1555276
>>
>> If this feature doesn't really have any downsides, it would be nice to
>> get this tied to new machine types. Saves a lot of churn for higher
>> levels of the stack
> 
> I understand this would be nice to have considering the existing
> stacks, but at the same time I would like the rest of the
> stack(s) to really try to not depend on QEMU machine-types to
> define policy/defaults.
> 
> Every feature that is hidden behind an opaque machine-type name
> and not visible in the domain XML and QEMU command-line increases
> the risk of migration and compatibility bugs.
> 

What exactly is the migration compatibility issue with turning on the
equivalent of -device vmcoreinfo for -M *-2.13+ ? Possibly prevents
backwards migration to older qemu but is that even a goal?

> This was being discussed in a mail thread at:
> https://www.mail-archive.com/ovirt-devel@redhat.com/msg01196.html
> 
> Quoting Daniel, on that thread:
> 
> ] Another case is the pvpanic device - while in theory that could
> ] have been enabled by default for all guests, by QEMU or a config
> ] generator library, doing so is not useful on its own. The hard
> ] bit of the work is adding code to the mgmt app to choose the
> ] action for when pvpanic triggers, and code to handle the results
> ] of that action.
> 
> From that comment, I understand that simply making QEMU create a
> pvpanic device by default on pc-2.13+ won't be useful at all?
> 

This qemu-devel thread was about -device vmcoreinfo though, not pvpanic.
vmcoreinfo doesn't need anything else to work AFAICT and shouldn't need
any explicit config, heck it doesn't even have any -device properties.

Like Dan says pvpanic isn't a 'just works' thing, and I know for windows
VMs it shows up in device manager which has considerations for things
like SVVP. I think vmcoreinfo doesn't have the same impact

There are some guest-visible things that we have turned on for new
machine types in the past; pveoi and x2apic come to mind.

Thanks,
Cole

Re: [Qemu-devel] [PATCH for-2.13] tcg: Improve TCGv_ptr support

2018-04-17 Thread no-reply

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20180417222635.17007-1-richard.hender...@linaro.org
Subject: [Qemu-devel] [PATCH for-2.13] tcg: Improve TCGv_ptr support

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]   patchew/20180417222635.17007-1-richard.hender...@linaro.org -> patchew/20180417222635.17007-1-richard.hender...@linaro.org
Switched to a new branch 'test'
3d7fa06584 tcg: Improve TCGv_ptr support

=== OUTPUT BEGIN ===
Checking PATCH 1/1: tcg: Improve TCGv_ptr support...
ERROR: space required after that ',' (ctx:VxV)
#101: FILE: tcg/tcg-op.h:1149:
+glue(tcg_gen_ld_,PTR)((NAT)r, a, o);
 ^

ERROR: space required after that ',' (ctx:VxV)
#106: FILE: tcg/tcg-op.h:1154:
+glue(tcg_gen_discard_,PTR)((NAT)a);
  ^

ERROR: space required after that ',' (ctx:VxV)
#111: FILE: tcg/tcg-op.h:1159:
+glue(tcg_gen_add_,PTR)((NAT)r, (NAT)a, (NAT)b);
  ^

ERROR: space required after that ',' (ctx:VxV)
#116: FILE: tcg/tcg-op.h:1164:
+glue(tcg_gen_addi_,PTR)((NAT)r, (NAT)a, b);
   ^

ERROR: space required after that ',' (ctx:VxV)
#122: FILE: tcg/tcg-op.h:1170:
+glue(tcg_gen_brcondi_,PTR)(cond, (NAT)a, b, label);
  ^

total: 5 errors, 0 warnings, 305 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1
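
A note on the five errors above: the missing space after the comma is
purely a style matter, since the preprocessor drops whitespace around macro
arguments. The sketch below uses locally defined stand-ins
(sketch_glue, sketch_xglue, SKETCH_PTR and the two add helpers are all
hypothetical names for this illustration) and assumes a two-level
token-pasting macro, so that the selector macro is expanded before pasting.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for a two-level token-pasting macro pair. The extra level
 * lets SKETCH_PTR expand to i64 before the ## operator is applied. */
#define sketch_xglue(x, y) x ## y
#define sketch_glue(x, y)  sketch_xglue(x, y)

/* Would be i32 on a 32-bit host; fixed to i64 here for the illustration. */
#define SKETCH_PTR i64

static inline int64_t sketch_add_i32(int64_t a, int64_t b)
{
    return (int64_t)(int32_t)(a + b);   /* keeps only the low 32 bits */
}

static inline int64_t sketch_add_i64(int64_t a, int64_t b)
{
    return a + b;
}

static inline int64_t sketch_add_ptr(int64_t a, int64_t b)
{
    /* Written with the space checkpatch asks for; it pastes to the same
     * identifier (sketch_add_i64) as the version without the space. */
    return sketch_glue(sketch_add_, SKETCH_PTR)(a, b);
}

static inline void sketch_glue_selfcheck(void)
{
    /* Would come back as 1 if the i32 helper had been selected. */
    assert(sketch_add_ptr(INT64_C(0x100000000), 1) == INT64_C(0x100000001));
}
```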


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[Qemu-devel] [PATCH for-2.13] tcg: Improve TCGv_ptr support

2018-04-17 Thread Richard Henderson

Drop TCGV_PTR_TO_NAT and TCGV_NAT_TO_PTR internal macros.

Add tcg_temp_local_new_ptr, tcg_gen_brcondi_ptr, tcg_gen_ext_i32_ptr,
tcg_gen_trunc_i64_ptr, tcg_gen_extu_ptr_i64, tcg_gen_trunc_ptr_i32.

Use inlines instead of macros where possible.

Signed-off-by: Richard Henderson 


These additions will be used by target/arm/translate-sve.c.


r~
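
As a reading aid for the new conversion helpers, here is a plain-integer
model of what they are expected to compute, judging from the ops the patch
selects for each host width. This is only a sketch: uintptr_t stands in for
a TCGv_ptr value, the model_* names are invented for the illustration, and
nothing here touches the TCG API.

```c
#include <assert.h>
#include <stdint.h>

/* i32 -> ptr: plain move on 32-bit hosts, sign extension on 64-bit hosts. */
static inline uintptr_t model_ext_i32_ptr(int32_t a)
{
    return (uintptr_t)(intptr_t)a;
}

/* i64 -> ptr: low half on 32-bit hosts, plain move on 64-bit hosts. */
static inline uintptr_t model_trunc_i64_ptr(uint64_t a)
{
    return (uintptr_t)a;
}

/* ptr -> i64: zero extension on 32-bit hosts, plain move on 64-bit hosts. */
static inline uint64_t model_extu_ptr_i64(uintptr_t a)
{
    return (uint64_t)a;
}

/* ptr -> i32: plain move on 32-bit hosts, low half on 64-bit hosts. */
static inline uint32_t model_trunc_ptr_i32(uintptr_t a)
{
    return (uint32_t)a;
}

static inline void model_ptr_selfcheck(void)
{
    /* These hold regardless of whether the host is 32-bit or 64-bit. */
    assert(model_ext_i32_ptr(-1) == UINTPTR_MAX);
    assert(model_extu_ptr_i64(model_trunc_i64_ptr(UINT64_C(0x1234))) == 0x1234);
    assert(model_trunc_ptr_i32((uintptr_t)0xdeadbeefu) == 0xdeadbeefu);
}
```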


 tcg/tcg-op.h| 91 +
 tcg/tcg.h   | 86 ++
 target/hppa/translate.c | 16 ++---
 tcg/tcg.c   | 31 ++---
 4 files changed, 130 insertions(+), 94 deletions(-)

diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 75bb55aeac..5d2c91a1b6 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -1137,25 +1137,74 @@ void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t);
 #endif
 
 #if UINTPTR_MAX == UINT32_MAX
-# define tcg_gen_ld_ptr(R, A, O) \
-tcg_gen_ld_i32(TCGV_PTR_TO_NAT(R), (A), (O))
-# define tcg_gen_discard_ptr(A) \
-tcg_gen_discard_i32(TCGV_PTR_TO_NAT(A))
-# define tcg_gen_add_ptr(R, A, B) \
-tcg_gen_add_i32(TCGV_PTR_TO_NAT(R), TCGV_PTR_TO_NAT(A), TCGV_PTR_TO_NAT(B))
-# define tcg_gen_addi_ptr(R, A, B) \
-tcg_gen_addi_i32(TCGV_PTR_TO_NAT(R), TCGV_PTR_TO_NAT(A), (B))
-# define tcg_gen_ext_i32_ptr(R, A) \
-tcg_gen_mov_i32(TCGV_PTR_TO_NAT(R), (A))
+# define PTR  i32
+# define NAT  TCGv_i32
 #else
-# define tcg_gen_ld_ptr(R, A, O) \
-tcg_gen_ld_i64(TCGV_PTR_TO_NAT(R), (A), (O))
-# define tcg_gen_discard_ptr(A) \
-tcg_gen_discard_i64(TCGV_PTR_TO_NAT(A))
-# define tcg_gen_add_ptr(R, A, B) \
-tcg_gen_add_i64(TCGV_PTR_TO_NAT(R), TCGV_PTR_TO_NAT(A), TCGV_PTR_TO_NAT(B))
-# define tcg_gen_addi_ptr(R, A, B) \
-tcg_gen_addi_i64(TCGV_PTR_TO_NAT(R), TCGV_PTR_TO_NAT(A), (B))
-# define tcg_gen_ext_i32_ptr(R, A) \
-tcg_gen_ext_i32_i64(TCGV_PTR_TO_NAT(R), (A))
-#endif /* UINTPTR_MAX == UINT32_MAX */
+# define PTR  i64
+# define NAT  TCGv_i64
+#endif
+
+static inline void tcg_gen_ld_ptr(TCGv_ptr r, TCGv_ptr a, intptr_t o)
+{
+glue(tcg_gen_ld_,PTR)((NAT)r, a, o);
+}
+
+static inline void tcg_gen_discard_ptr(TCGv_ptr a)
+{
+glue(tcg_gen_discard_,PTR)((NAT)a);
+}
+
+static inline void tcg_gen_add_ptr(TCGv_ptr r, TCGv_ptr a, TCGv_ptr b)
+{
+glue(tcg_gen_add_,PTR)((NAT)r, (NAT)a, (NAT)b);
+}
+
+static inline void tcg_gen_addi_ptr(TCGv_ptr r, TCGv_ptr a, intptr_t b)
+{
+glue(tcg_gen_addi_,PTR)((NAT)r, (NAT)a, b);
+}
+
+static inline void tcg_gen_brcondi_ptr(TCGCond cond, TCGv_ptr a,
+   intptr_t b, TCGLabel *label)
+{
+glue(tcg_gen_brcondi_,PTR)(cond, (NAT)a, b, label);
+}
+
+static inline void tcg_gen_ext_i32_ptr(TCGv_ptr r, TCGv_i32 a)
+{
+#if UINTPTR_MAX == UINT32_MAX
+tcg_gen_mov_i32((NAT)r, a);
+#else
+tcg_gen_ext_i32_i64((NAT)r, a);
+#endif
+}
+
+static inline void tcg_gen_trunc_i64_ptr(TCGv_ptr r, TCGv_i64 a)
+{
+#if UINTPTR_MAX == UINT32_MAX
+tcg_gen_extrl_i64_i32((NAT)r, a);
+#else
+tcg_gen_mov_i64((NAT)r, a);
+#endif
+}
+
+static inline void tcg_gen_extu_ptr_i64(TCGv_i64 r, TCGv_ptr a)
+{
+#if UINTPTR_MAX == UINT32_MAX
+tcg_gen_extu_i32_i64(r, (NAT)a);
+#else
+tcg_gen_mov_i64(r, (NAT)a);
+#endif
+}
+
+static inline void tcg_gen_trunc_ptr_i32(TCGv_i32 r, TCGv_ptr a)
+{
+#if UINTPTR_MAX == UINT32_MAX
+tcg_gen_mov_i32(r, (NAT)a);
+#else
+tcg_gen_extrl_i64_i32(r, (NAT)a);
+#endif
+}
+
+#undef PTR
+#undef NAT
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 30896ca304..eb0d4f6ca7 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -890,15 +890,30 @@ void tcg_set_frame(TCGContext *s, TCGReg reg, intptr_t start, intptr_t size);
 
 TCGTemp *tcg_global_mem_new_internal(TCGType, TCGv_ptr,
  intptr_t, const char *);
-
-TCGv_i32 tcg_temp_new_internal_i32(int temp_local);
-TCGv_i64 tcg_temp_new_internal_i64(int temp_local);
+TCGTemp *tcg_temp_new_internal(TCGType, bool);
+void tcg_temp_free_internal(TCGTemp *);
 TCGv_vec tcg_temp_new_vec(TCGType type);
 TCGv_vec tcg_temp_new_vec_matching(TCGv_vec match);
 
-void tcg_temp_free_i32(TCGv_i32 arg);
-void tcg_temp_free_i64(TCGv_i64 arg);
-void tcg_temp_free_vec(TCGv_vec arg);
+static inline void tcg_temp_free_i32(TCGv_i32 arg)
+{
+tcg_temp_free_internal(tcgv_i32_temp(arg));
+}
+
+static inline void tcg_temp_free_i64(TCGv_i64 arg)
+{
+tcg_temp_free_internal(tcgv_i64_temp(arg));
+}
+
+static inline void tcg_temp_free_ptr(TCGv_ptr arg)
+{
+tcg_temp_free_internal(tcgv_ptr_temp(arg));
+}
+
+static inline void tcg_temp_free_vec(TCGv_vec arg)
+{
+tcg_temp_free_internal(tcgv_vec_temp(arg));
+}
 
 static inline TCGv_i32 tcg_global_mem_new_i32(TCGv_ptr reg, intptr_t offset,
   const char *name)
@@ -909,12 +924,14 @@ static inline TCGv_i32 tcg_global_mem_new_i32(TCGv_ptr reg, intptr_t offset,
 
 static inline TCGv_i32 tcg_temp_new_i32(void)
 {
-return

[Qemu-devel] [PATCH for-2.13] tcg: Allow wider vectors for cmp and mul

2018-04-17 Thread Richard Henderson

In db432672, we allow wide inputs for operations such as add.
However, in 212be173 and 3774030a we didn't do the same for
compare and multiply.

Signed-off-by: Richard Henderson 
---
 tcg/tcg-op-vec.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 70ec889bc1..2ca219734d 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -355,8 +355,8 @@ void tcg_gen_cmp_vec(TCGCond cond, unsigned vece,
 TCGType type = rt->base_type;
 int can;
 
-tcg_debug_assert(at->base_type == type);
-tcg_debug_assert(bt->base_type == type);
+tcg_debug_assert(at->base_type >= type);
+tcg_debug_assert(bt->base_type >= type);
 can = tcg_can_emit_vec_op(INDEX_op_cmp_vec, type, vece);
 if (can > 0) {
 vec_gen_4(INDEX_op_cmp_vec, type, vece, ri, ai, bi, cond);
@@ -377,8 +377,8 @@ void tcg_gen_mul_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 TCGType type = rt->base_type;
 int can;
 
-tcg_debug_assert(at->base_type == type);
-tcg_debug_assert(bt->base_type == type);
+tcg_debug_assert(at->base_type >= type);
+tcg_debug_assert(bt->base_type >= type);
 can = tcg_can_emit_vec_op(INDEX_op_mul_vec, type, vece);
 if (can > 0) {
 vec_gen_3(INDEX_op_mul_vec, type, vece, ri, ai, bi);
-- 
2.14.3
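
To spell out the effect of relaxing the assertions from == to >=, here is a
small sketch. It assumes that the vector type tags are ordered by width,
which is what the >= comparison in the patch implies; the enum and function
names below are hypothetical stand-ins, not the real TCGType values.

```c
#include <assert.h>

/* Hypothetical vector type tags, ordered from narrowest to widest. */
typedef enum {
    SKETCH_V64,
    SKETCH_V128,
    SKETCH_V256
} SketchVecType;

/* Old rule: an input must have exactly the type of the operation. */
static inline int sketch_input_ok_strict(SketchVecType input, SketchVecType op)
{
    return input == op;
}

/* New rule: an input may be wider than the operation, never narrower. */
static inline int sketch_input_ok_relaxed(SketchVecType input, SketchVecType op)
{
    return input >= op;
}

static inline void sketch_vec_selfcheck(void)
{
    /* A 256-bit temp used in a 128-bit op: rejected before, accepted now. */
    assert(!sketch_input_ok_strict(SKETCH_V256, SKETCH_V128));
    assert(sketch_input_ok_relaxed(SKETCH_V256, SKETCH_V128));
    /* A 64-bit temp in a 128-bit op is still rejected. */
    assert(!sketch_input_ok_relaxed(SKETCH_V64, SKETCH_V128));
}
```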

Re: [Qemu-devel] [PATCH 4/4] linux-headers: drop kvm_para.h

2018-04-17 Thread Eduardo Habkost

On Tue, Apr 17, 2018 at 09:58:37PM +0300, Michael S. Tsirkin wrote:
> Unused now and can be removed.
> 
> Signed-off-by: Michael S. Tsirkin 

Reviewed-by: Eduardo Habkost 

-- 
Eduardo

Re: [Qemu-devel] [Qemu-arm] ARM memory barrier patch

2018-04-17 Thread Richard Henderson

On 04/17/2018 11:51 AM, Peter Maydell wrote:
> On 17 April 2018 at 22:32, Henry Wertz  wrote:
>> Please find submitted a patch for ARM memory barriers.  This patch is
>> against qemu-2.12-rc2 but I do believe it should apply for anything from
>> 2.11.x to current. (the code being patched was added in for 2.11 series.)
>>
>>
>> I found with qemu 2.11.x or newer that I would get an illegal instruction
>> error running some Intel binaries on my ARM chromebook.  On investigation,
>> I found it was quitting on memory barriers.
>> qemu instruction:
>> mb $0x31
>> was translating as:
>> 0x604050cc:  5bf07ff5  blpl #0x600250a8
>>
>> After patch it gives:
>> 0x604050cc:  f57ff05b  dmb  ish
>>
>> In short, I found INSN_DMB_ISH (memory barrier for ARMv7) appeared to be
>> correct based on online docs, but due to some endian-related shenanigans it
>> had to be byte-swapped to suit qemu; it appears INSN_DMB_MCR (memory barrier
>> for ARMv6) also should be byte swapped  (and this patch does so).
>> I have not checked for correctness of aarch64's barrier instruction.
>>
>> Signed-off-by: Henry Wertz 
> 
> Reviewed-by: Peter Maydell 
> 
> Richard, did you want to take this via the tcg tree?

Yes, I can do that.


r~

[Qemu-devel] getdents patch for 64-bit app on 32-bit host

2018-04-17 Thread Henry Wertz

Please find submitted a patch for getdents (this system call stands for
"get directory entries", it is passed a file descriptor pointing to a
directory and returns a struct with info on the entries in that
directory.)  This patch is against qemu-2.10 series but continues to apply
cleanly on current as of April 15 2018.  If you are running a 32-bit binary
on a 64-bit host, current qemu will convert the getdents struct, but when
running a 64-bit binary on a 32-bit host it passes the struct straight
through, causing incorrect behavior (file type is in the middle of the
64-bit struct and at the end of the 32-bit one).

My use case for this has been running aapt (from Android SDK) and whatever
other misc x86-64 bins android studio runs when building on a 32-bit ARM (I
previously had run these x86-64 bins on  32-bit Intel).  After an android
build tools update, aapt began erroring out until I applied this patch.

Peter Maydell has raised a concern about possible buffer overflows in this
code (which was meant to handle 32-bit app on 64-bit system, not 64-bit on
32-bit).  I must admit I haven't gone through the dirent-copying code with
a fine-toothed comb... it appeared to work for my use case.  That said, the
code seems to be careful about using offsetof() rather than making any
assumptions.  In addition, the dirent-copying code appears to have an
assert that would crash qemu if it was going to write past the end of the
dirent buffer -- always nice to have plenty of sanity checks!

--Thanks!
Henry Wertz

Signed-off-by: Henry Wertz 
*** linux-user/syscall.c~	2017-03-04 10:31:14.0 -0600
--- linux-user/syscall.c	2017-03-07 17:08:24.615399116 -0600
***************
*** 9913,9921 ****
  #endif
  #ifdef TARGET_NR_getdents
  case TARGET_NR_getdents:
  #ifdef __NR_getdents
! #if TARGET_ABI_BITS == 32 && HOST_LONG_BITS == 64
  {
  struct target_dirent *target_dirp;
  struct linux_dirent *dirp;
  abi_long count = arg3;
--- 9913,9921 ----
  #endif
  #ifdef TARGET_NR_getdents
  case TARGET_NR_getdents:
  #ifdef __NR_getdents
! #if TARGET_ABI_BITS != HOST_LONG_BITS
  {
  struct target_dirent *target_dirp;
  struct linux_dirent *dirp;
  abi_long count = arg3;
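
To illustrate why the record cannot be passed straight through when the two
word sizes differ, here is a small layout sketch. It models the legacy
getdents record (d_ino and d_off as unsigned long, then d_reclen, then the
name) with fixed-width integers. The struct and function names are
hypothetical and are not the ones used in linux-user/syscall.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Record layout as seen by a side where "long" is 32 bits wide. */
struct sketch_dirent_long32 {
    uint32_t d_ino;
    uint32_t d_off;
    uint16_t d_reclen;
    char     d_name[];
};

/* Record layout as seen by a side where "long" is 64 bits wide. */
struct sketch_dirent_long64 {
    uint64_t d_ino;
    uint64_t d_off;
    uint16_t d_reclen;
    char     d_name[];
};

static inline size_t sketch_name_offset_long32(void)
{
    return offsetof(struct sketch_dirent_long32, d_name);
}

static inline size_t sketch_name_offset_long64(void)
{
    return offsetof(struct sketch_dirent_long64, d_name);
}

static inline void sketch_dirent_selfcheck(void)
{
    /* Every field after d_ino sits at a different offset, so each record
     * has to be rewritten field by field whenever the widths differ. */
    assert(offsetof(struct sketch_dirent_long32, d_reclen) == 8);
    assert(offsetof(struct sketch_dirent_long64, d_reclen) == 16);
    assert(sketch_name_offset_long32() == 10);
    assert(sketch_name_offset_long64() == 18);
}
```

Since the offsets disagree in both directions, the conversion is needed
whenever the two widths differ, not only for a 32-bit guest on a 64-bit
host.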

Re: [Qemu-devel] [Qemu-arm] ARM memory barrier patch

2018-04-17 Thread Peter Maydell

On 17 April 2018 at 22:32, Henry Wertz  wrote:
> Please find submitted a patch for ARM memory barriers.  This patch is
> against qemu-2.12-rc2 but I do believe it should apply for anything from
> 2.11.x to current. (the code being patched was added in for 2.11 series.)
>
>
> I found with qemu 2.11.x or newer that I would get an illegal instruction
> error running some Intel binaries on my ARM chromebook.  On investigation,
> I found it was quitting on memory barriers.
> qemu instruction:
> mb $0x31
> was translating as:
> 0x604050cc:  5bf07ff5  blpl #0x600250a8
>
> After patch it gives:
> 0x604050cc:  f57ff05b  dmb  ish
>
> In short, I found INSN_DMB_ISH (memory barrier for ARMv7) appeared to be
> correct based on online docs, but due to some endian-related shenanigans it
> had to be byte-swapped to suit qemu; it appears INSN_DMB_MCR (memory barrier
> for ARMv6) also should be byte swapped  (and this patch does so).
> I have not checked for correctness of aarch64's barrier instruction.
>
> Signed-off-by: Henry Wertz 

Reviewed-by: Peter Maydell 

Richard, did you want to take this via the tcg tree?

thanks
-- PMM

Re: [Qemu-devel] [PATCH] fpu/softfloat: check for Inf / x or 0 / x before /0

2018-04-17 Thread Peter Maydell

On 17 April 2018 at 22:27, Emilio G. Cota  wrote:
> BTW I just checked with -t host on an IBM Power8, and we get
> the same 1049 flag errors we get with -t soft plus two additional ones:
>
> +A 0xffb0, expected: 0x7fa0, returned: 0x7fa0, \
>   expected exceptions: i, returned: none
> +error: flags mismatch for input @ ibm/Basic-Types-Inputs.fptest:382:
> +b32A =0 S -> S i

That's Abs of an SNaN; the test expects Invalid, which is wrong,
because IEEE754 says absolute-value is a "quiet-computational
operation" that never signals an exception.

What's odd is that we don't report that error for the softfloat
implementation! I also don't understand why the expected value
isn't just the input value with the sign bit flipped.

> (...)
> +cff 0xffb0, expected: 0x7ff8, returned: 0x7ff4, \
>   expected exceptions: i, returned: none
> +error: flags mismatch for input @ ibm/Basic-Types-Inputs.fptest:26170:
> +b32b64cff =0 S -> Q i

SNaN conversion from 32 bit to 64 bit. Here I agree
with the test -- we should quieten the NaN and raise
Invalid -- which implies that the hardware is wrong ?!?
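
For concreteness, a rough sketch of the "quieten and raise Invalid"
behaviour on raw bit patterns. The function name, the flag constant and the
sample inputs are invented for this illustration and are not softfloat's.
The sketch keeps the NaN payload; whether an implementation keeps the
payload or substitutes a default NaN varies, so treat that part as an
assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bit for this sketch only. */
#define SKETCH_FLAG_INVALID 1u

/* Widen a binary32 NaN pattern to binary64. Precondition: bits is a NaN. */
static inline uint64_t sketch_f32_nan_to_f64_bits(uint32_t bits,
                                                  unsigned *flags)
{
    uint64_t sign = (uint64_t)(bits >> 31) << 63;
    uint64_t frac = (uint64_t)(bits & UINT32_C(0x007fffff)) << 29;

    assert(((bits >> 23) & 0xff) == 0xff && (bits & UINT32_C(0x007fffff)));

    /* A clear top fraction bit (bit 22) marks a signalling input. */
    if ((bits & UINT32_C(0x00400000)) == 0) {
        *flags |= SKETCH_FLAG_INVALID;
    }

    /* All-ones exponent, payload kept, quiet bit (bit 51) forced on. */
    return sign | UINT64_C(0x7ff0000000000000) | frac
                | UINT64_C(0x0008000000000000);
}

static inline void sketch_f32_nan_selfcheck(void)
{
    unsigned flags = 0;

    /* Illustrative SNaN input: the result is quiet and Invalid is set. */
    assert(sketch_f32_nan_to_f64_bits(UINT32_C(0x7fa00000), &flags)
           == UINT64_C(0x7ffc000000000000));
    assert(flags == SKETCH_FLAG_INVALID);
}
```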

thanks
-- PMM

Re: [Qemu-devel] [RFC PATCH V4 2/4] vfio: Add vm status change callback to stop/restart the mdev device

2018-04-17 Thread Alex Williamson

On Wed, 18 Apr 2018 02:41:44 +0530
Kirti Wankhede  wrote:

> On 4/18/2018 1:39 AM, Alex Williamson wrote:
> > On Wed, 18 Apr 2018 00:44:35 +0530
> > Kirti Wankhede  wrote:
> >   
> >> On 4/17/2018 8:13 PM, Alex Williamson wrote:  
> >>> On Tue, 17 Apr 2018 13:40:32 +
> >>> "Zhang, Yulei"  wrote:
> >>> 
> > -Original Message-
> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Tuesday, April 17, 2018 4:23 AM
> > To: Kirti Wankhede 
> > Cc: Zhang, Yulei ; qemu-devel@nongnu.org; Tian,
> > Kevin ; joonas.lahti...@linux.intel.com;
> > zhen...@linux.intel.com; Wang, Zhi A ;
> > dgilb...@redhat.com; quint...@redhat.com
> > Subject: Re: [RFC PATCH V4 2/4] vfio: Add vm status change callback to
> > stop/restart the mdev device
> >
> > On Mon, 16 Apr 2018 20:14:27 +0530
> > Kirti Wankhede  wrote:
> >   
> >> On 4/10/2018 11:32 AM, Yulei Zhang wrote:  
> >>> VM status change handler is added to change the vfio pci device
> >>> status during the migration, write the demanded device status
> >>> to the DEVICE STATUS subregion to stop the device on the source side
> > >>> before fetch its status and start the device on the target side
> >>> after restore its status.
> >>>
> >>> Signed-off-by: Yulei Zhang 
> >>> ---
> >>>  hw/vfio/pci.c | 20 
> >>>  include/hw/vfio/vfio-common.h |  1 +
> >>>  linux-headers/linux/vfio.h|  6 ++
> >>>  roms/seabios  |  2 +-
> >>>  4 files changed, 28 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> >>> index f98a9dd..13d8c73 100644
> >>> --- a/hw/vfio/pci.c
> >>> +++ b/hw/vfio/pci.c
> >>> @@ -38,6 +38,7 @@
> >>>
> >>>  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
> >>>  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> >>> +static void vfio_vm_change_state_handler(void *pv, int running,  
> > RunState state);  
> >>>
> >>>  /*
> >>>   * Disabling BAR mmaping can be slow, but toggling it around INTx can
> >>> @@ -2896,6 +2897,7 @@ static void vfio_realize(PCIDevice *pdev, Error 
> >>>  
> > **errp)  
> >>>  vfio_register_err_notifier(vdev);
> >>>  vfio_register_req_notifier(vdev);
> >>>  vfio_setup_resetfn_quirk(vdev);
> >>> +  
> > qemu_add_vm_change_state_handler(vfio_vm_change_state_handler,
> > vdev);  
> >>>
> >>>  return;
> >>>
> >>> @@ -2982,6 +2984,24 @@ post_reset:
> >>>  vfio_pci_post_reset(vdev);
> >>>  }
> >>>
> >>> +static void vfio_vm_change_state_handler(void *pv, int running,  
> > RunState state)  
> >>> +{
> >>> +VFIOPCIDevice *vdev = pv;
> > >>> +VFIODevice *vbasedev = &vdev->vbasedev;
> >>> +uint8_t dev_state;
> >>> +uint8_t sz = 1;
> >>> +
> >>> +dev_state = running ? VFIO_DEVICE_START : VFIO_DEVICE_STOP;
> >>> +
> > >>> +if (pwrite(vdev->vbasedev.fd, &dev_state,
> >>> +   sz, vdev->device_state.offset) != sz) {
> >>> +error_report("vfio: Failed to %s device", running ? "start" 
> >>> : "stop");
> >>> +return;
> >>> +}
> >>> +
> >>> +vbasedev->device_state = dev_state;
> >>> +}
> >>> +  
> >>
> >> Is it expected to trap device_state region by vendor driver?
> >> Can this information be communicated to vendor driver through an 
> >> ioctl?  
> >
> > Either the mdev vendor driver or vfio bus driver (ie. vfio-pci) would
> > be providing REGION_INFO for this region, so the vendor driver is
> > already in full control here using existing ioctls.  I don't see that
> > we need new ioctls, we just need to fully define the API of the
> > proposed regions here.
> >   
>  If the device state region is mmaped, we may not be able to use
>  region device state offset to convey the running state. It may need
>  a new ioctl to set the device state.
> >>>
> >>> The vendor driver defines the mmap'ability of the region, the vendor
> >>> driver is still in control.  The API of the region and the
> >>> implementation by the vendor driver should account for handling
> >>> mmap'able sections within the region.  Thanks,
> >>>
> >>> Alex
> >>>
> >>>  
> >>
> >> If this same region should be used for communicating state or other
> >> parameters instead of ioctl, may be first page of this region need to be
> >> reserved. Mmappable region's start address should be page aligned. Is
> >> this API going to utilize 4K of the reserved part of this region?
>

[Qemu-devel] ARM memory barrier patch

2018-04-17 Thread Henry Wertz

Please find submitted a patch for ARM memory barriers.  This patch is
against qemu-2.12-rc2 but I do believe it should apply for anything from
2.11.x to current. (the code being patched was added in for 2.11 series.)


I found with qemu 2.11.x or newer that I would get an illegal instruction
error running some Intel binaries on my ARM chromebook.  On investigation,
I found it was quitting on memory barriers.
qemu instruction:
mb $0x31
was translating as:
0x604050cc:  5bf07ff5  blpl #0x600250a8

After patch it gives:
0x604050cc:  f57ff05b  dmb  ish

In short, I found INSN_DMB_ISH (memory barrier for ARMv7) appeared to be
correct based on online docs, but due to some endian-related shenanigans it
had to be byte-swapped to suit qemu; it appears INSN_DMB_MCR (memory
barrier for ARMv6) also should be byte swapped  (and this patch does so).
I have not checked for correctness of aarch64's barrier instruction.

Signed-off-by: Henry Wertz 
*** tcg/arm/tcg-target.inc.c.orig	2018-04-04 15:28:50.0 -0500
--- tcg/arm/tcg-target.inc.c	2018-04-16 12:55:04.917518898 -0500
***************
*** 158,167 ****
  INSN_LDRD_REG  = 0x00d0,
  INSN_STRD_IMM  = 0x004000f0,
  INSN_STRD_REG  = 0x00f0,
  
! INSN_DMB_ISH   = 0x5bf07ff5,
! INSN_DMB_MCR   = 0xba0f07ee,
  
  /* Architected nop introduced in v6k.  */
  /* ??? This is an MSR (imm) 0,0,0 insn.  Anyone know if this
 also Just So Happened to do nothing on pre-v6k so that we
--- 158,167 ----
  INSN_LDRD_REG  = 0x00d0,
  INSN_STRD_IMM  = 0x004000f0,
  INSN_STRD_REG  = 0x00f0,
  
! INSN_DMB_ISH   = 0xf57ff05b,
! INSN_DMB_MCR   = 0xee070fba,
  
  /* Architected nop introduced in v6k.  */
  /* ??? This is an MSR (imm) 0,0,0 insn.  Anyone know if this
 also Just So Happened to do nothing on pre-v6k so that we
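
As a quick check of the constants in the patch above, the new values are
exactly the 32-bit byte swaps of the old ones. The helper below is a
self-contained sketch with a made-up name, not a QEMU function.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the four bytes of a 32-bit value. */
static inline uint32_t sketch_bswap32(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & UINT32_C(0x0000ff00))
         | ((x << 8) & UINT32_C(0x00ff0000))
         | (x << 24);
}

static inline void sketch_bswap_selfcheck(void)
{
    /* Old INSN_DMB_ISH -> new INSN_DMB_ISH */
    assert(sketch_bswap32(UINT32_C(0x5bf07ff5)) == UINT32_C(0xf57ff05b));
    /* Old INSN_DMB_MCR -> new INSN_DMB_MCR */
    assert(sketch_bswap32(UINT32_C(0xba0f07ee)) == UINT32_C(0xee070fba));
}
```

This matches the disassembly quoted earlier: the word that decoded as
"blpl" before the patch is the byte-reversed form of the one that decodes
as "dmb ish" after it.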

Re: [Qemu-devel] [PATCH] mux: fix ctrl-a b again

2018-04-17 Thread Marc-André Lureau

Hi

On Tue, Apr 17, 2018 at 11:19 PM, Peter Maydell
 wrote:
> On 17 April 2018 at 19:36, Philippe Mathieu-Daudé  wrote:
>> Since this commit, the console on the Malta board stays black...
>>
>> Before:
>> $ qemu-system-mips -M malta -m 512 \
>>   -kernel vmlinux-3.2.0-4-4kc-malta -append 'root=/dev/sda1' \
>>   -nographic
>> [0.00] Initializing cgroup subsys cpuset
>> [0.00] Initializing cgroup subsys cpu
>> [0.00] Linux version 3.2.0-4-4kc-malta
>> (debian-ker...@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-14) )
>> #1 Debian 3.2.51-1
>> [0.00] Config serial console: console=ttyS0,38400n8r
>> [0.00] bootconsole [early0] enabled
>> ...
>>
>> After:
>> $ qemu-system-mips -M malta -m 512 \
>>   -kernel vmlinux-3.2.0-4-4kc-malta -append 'root=/dev/sda1' \
>>   -nographic
>> QEMU 2.11.92 monitor - type 'help' for more information
>> (qemu) QEMU 2.11.92 monitor - type 'help' for more information
>> (qemu) q
>
> I reproed this, and as you can see from the transcripts you
> give the problem is that the mux is confused about what the
> active console should be. Before the patch, we correctly
> start with the serial as the active console and C-a c
> switches, but afterwards we start out with the monitor
> active and C-a c doesn't switch. Revert pushed to master.

Thanks Peter for taking care of this "mess" so simply and diligently.

I'll look into it again asap

-- 
Marc-André Lureau

[Qemu-devel] [Bug 1654137] Re: Ctrl-A b not working in 2.8.0

2018-04-17 Thread Peter Maydell

Commit 1b2503fcf7b5932c reverted by commit 6f660996f1623034. We'll
release 2.12 without a fix for this bug, and look at it for 2.13 and
2.12.1.

https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg02505.html and
followups describe the regression that 1b2503fcf7b5932c caused.


** Changed in: qemu
   Status: Fix Committed => In Progress

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1654137

Title:
  Ctrl-A b not working in 2.8.0

Status in QEMU:
  In Progress

Bug description:
  With a recent update from 2.7.0 to 2.8.0 I have discovered that I can
  no longer send a "break" to the VM.  Ctrl-A b is simply ignored.
  Other Ctrl-A sequences seem to work correctly.

  This is on a NetBSD amd64 system, version 7.99.53, and qemu was
  installed on this system from source.

  Reverting to the previous install restores "break" capability.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1654137/+subscriptions

[Qemu-devel] [Bug 1763536] Re: go build fails under qemu-ppc64le-static (qemu-user)

2018-04-17 Thread Peter Maydell

I care more about the arm64 case, so if you're going to do one then that
would be my preference.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1763536

Title:
  go build fails under qemu-ppc64le-static (qemu-user)

Status in QEMU:
  New

Bug description:
  I am using qemu-user (built static) in a docker container environment.
  When running multi-threaded go commands in the container (go build for
  example) the process may hang, report segfaults or other errors.  I
  built qemu-ppc64le from the upstream git (master).

  I see the problem running on a multi core system with Intel i7 processors.
  # cat /proc/cpuinfo | grep "model name"
  model name: Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
  model name: Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
  model name: Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
  model name: Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
  model name: Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
  model name: Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
  model name: Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
  model name: Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz

  Steps to reproduce:
  1) Build qemu-ppc64le as static and copy it into the docker build directory, naming it qemu-ppc64le-static.

  2) Add hello.go to docker build dir.

  package main
  import "fmt"
  func main() {
fmt.Println("hello world")
  }

  3) Create the Dockerfile from below:

  FROM ppc64le/golang:1.10.1-alpine3.
  COPY qemu-ppc64le-static /usr/bin/
  COPY hello.go /go

  4) Build container
  $ docker build -t qemutest -f Dockerfile ./go 

  5) Run test
  $ docker run -it qemutest

  /go # /usr/bin/qemu-ppc64le-static --version
  qemu-ppc64le version 2.11.93 (v2.12.0-rc3-dirty)
  Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers

  /go # go version
  go version go1.10.1 linux/ppc64le

  /go # go build hello.go
  fatal error: fatal error: stopm holding locksunexpected signal during runtime 
execution

  panic during panic
  [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1003528c]

  runtime stack:
  runtime: unexpected return pc for syscall.Syscall6 called from 0xc42007f500
  stack: frame={sp:0xc4203be840, fp:0xc4203be860} 
stack=[0x4000b7ecf0,0x4000b928f0)

  syscall.Syscall6(0x100744e8, 0x3d, 0xc42050c140, 0x20, 0x18, 0x10422b80, 
0xc4203be968[signal , 0x10012d88SIGSEGV: segmentation violation, 0xc420594000 
code=, 0x00x1 addr=0x0 pc=0x1003528c)
  ]

  runtime stack:

/usr/local/go/src/syscall/asm_linux_ppc64x.s:61runtime.throw(0x10472d19, 0x13)
   +/usr/local/go/src/runtime/panic.go:0x6c616 +0x68

  
  runtime.stopm()
/usr/local/go/src/runtime/proc.go:1939goroutine  +10x158
   [runtime.exitsyscall0semacquire(0xc42007f500)
/usr/local/go/src/runtime/proc.go:3129 +]:
  0x130
  runtime.mcall(0xc42007f500)
/usr/local/go/src/runtime/asm_ppc64x.s:183 +0x58sync.runtime_Semacquire
  (0xc4201fab1c)
/usr/local/go/src/runtime/sema.go:56 +0x38

  
  Note the results may differ between attempts; hangs and other faults
  sometimes happen.
  
  If I run "go" single threaded I don't see the problem, for example:

  /go # GOMAXPROCS=1 go build -p 1 hello.go 
  /go # ./hello
  hello world

  I see the same issue with arm64.  I don't think this is a go issue,
  but I don't have any real evidence to prove that.  This problem looks
  similar to other problems I have seen reported against qemu running
  multi-threaded applications.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1763536/+subscriptions

Re: [Qemu-devel] Patch for ARM memory barriers and getdent

2018-04-17 Thread Henry Wertz

> Hi Henry; thanks for these patches. Please could you provide
> a Signed-off-by: line for them? This says you're happy for us
> to apply them to QEMU under our license, and we can't do anything
> with them without one. The top part of
> https://wiki.qemu.org/Contribute/SubmitAPatch
> has more detail.
>
 Yup, I'll resubmit both in proper format.

> You'll probably have better luck if you submit patches as separate
> emails rather than all in one -- the set of people who care about
>
 Can do.

> In general, 64-bit binary on a 32-bit host is not supported by linux-user.
> Various things are likely to go wrong for anything but the simplest
> of guest binaries. You need a 64-bit host to reliably use linux-user
> for a 64-bit guest binary.
>
 I didn't test it very hard; I was only using it to run the couple of
Android SDK binaries (aapt and friends) that Android Studio runs. These in
general open a file or two, process them, and write out a result, so they
probably are pretty simple guest binaries.  Nevertheless, with this patch
these few x86-64 binaries run on 32-bit ARM and (last I checked) 32-bit
Intel.


>
> I think your change isn't sufficient to handle the "target 32 bits
> and host 64 bits" case, because the 64-on-32 code path that you're
> using for it uses the guest's buffer size as the size of the
> host buffer it allocates to pass to the host syscall.
>


> Maybe I'm missing a clever trick?
>
 As far as I can tell, it makes a "linux_dirent" and "target_dirent"
and is pretty careful to calculate rather than assume offsets, and so on as
it copies data from one struct to the other.  I didn't look before, but
there is in fact a line "assert(count1 + treclen <= count);" that would
trigger if the buffer is going to be overflowed.  I think the code may not
have even been needed, except that d_type (file type) was moved to the
"middle" of the dirent64 struct instead of at the end.

Thanks!
-- Henry
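
For readers following along, the copy loop Henry describes can be sketched
roughly as below. This is only an illustration under stated assumptions, not
the linux-user code: the two struct layouts, their field widths, and the name
convert_dirents() are made up for the example. It shows the technique itself:
the name offset is computed with offsetof() for each layout rather than
assumed, the type byte moves from just before the name to the last byte of the
record, and the destination bound is checked before every write (returning -1
where the code quoted above uses an assert).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical source layout: d_type sits right before the name. */
struct src_dirent {
    uint64_t d_ino;
    int64_t  d_off;
    uint16_t d_reclen;
    uint8_t  d_type;
    char     d_name[];
};

/* Hypothetical destination layout: the type is kept in the last byte of
 * the record, after the NUL-terminated name. */
struct dst_dirent {
    uint64_t d_ino;
    int64_t  d_off;
    uint16_t d_reclen;
    char     d_name[];
};

/* Convert every record in src into dst.
 * Returns the number of bytes written, or -1 if a source record is
 * malformed or the destination buffer would overflow. */
long convert_dirents(const unsigned char *src, size_t src_len,
                     unsigned char *dst, size_t dst_cap)
{
    const size_t s_name_off = offsetof(struct src_dirent, d_name);
    const size_t d_name_off = offsetof(struct dst_dirent, d_name);
    size_t in = 0, out = 0;

    while (in < src_len) {
        struct src_dirent s;
        struct dst_dirent d;

        if (src_len - in < s_name_off) {
            return -1;                      /* truncated header */
        }
        memcpy(&s, src + in, s_name_off);   /* copy the fixed-size fields */
        if (s.d_reclen <= s_name_off || s.d_reclen > src_len - in) {
            return -1;                      /* bogus record length */
        }

        const char *name = (const char *)(src + in + s_name_off);
        const char *nul = memchr(name, 0, s.d_reclen - s_name_off);
        if (nul == NULL) {
            return -1;                      /* name is not terminated */
        }
        size_t name_len = (size_t)(nul - name);

        /* header + name + NUL + type byte, rounded up to 8 bytes */
        size_t d_reclen = (d_name_off + name_len + 2 + 7) & ~(size_t)7;
        if (d_reclen > UINT16_MAX || d_reclen > dst_cap - out) {
            return -1;                      /* would overflow the target */
        }

        memset(&d, 0, sizeof(d));
        d.d_ino = s.d_ino;
        d.d_off = s.d_off;
        d.d_reclen = (uint16_t)d_reclen;

        memset(dst + out, 0, d_reclen);
        memcpy(dst + out, &d, d_name_off);
        memcpy(dst + out + d_name_off, name, name_len);
        dst[out + d_reclen - 1] = s.d_type; /* type goes in the last byte */

        in += s.d_reclen;
        out += d_reclen;
    }
    return (long)out;
}
```

Because the destination record length is recomputed per entry, a destination
buffer sized from the guest's request can be too small even when the source
buffer was not, which is why the bound check matters.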

Re: [Qemu-devel] [PATCH] fpu/softfloat: check for Inf / x or 0 / x before /0

2018-04-17 Thread Emilio G. Cota

On Tue, Apr 17, 2018 at 21:54:03 +0100, Peter Maydell wrote:
> On 17 April 2018 at 20:04, Emilio G. Cota  wrote:
> > Note that in fp-test I am not checking for flags that are raised
> > when none are expected, because doing so gives quite a few errors.
> > Just noticed that enabling this check yields 1049 of these errors for
> > v2.11, and before this patch that number was 1087. After this
> > patch, it is again brought down to 1049. IOW, the test cases in
> > fp-test raise exactly the same flags as v2.11, which is good to know.
> >
> > The 1049 errors are probably false positives -- at least a big
> > chunk of them should be, given that "-t host" gives even more errors.
> > I am tempted to keep the flag check and whitelist these errors
> > though, which would catch regressions such as the one we're fixing here.
> 
> I strongly suspect we do have a few cases where we get the answers
> wrong and/or don't report the flags right, so ideally we'd have
> a look at them in more detail...
> 
> > Here is the report file with the 1049 failing test cases:
> >   http://www.cs.columbia.edu/~cota/qemu/fp-test-after-inf-patch.txt
> 
> Syntax for interpreting the report:
> https://www.research.ibm.com/haifa/projects/verification/fpgen/syntax.txt
> 
> Here's the first one, am I reading it right?
> 
> + 0xffc0 0xffb0, expected: 0xffc0, returned: 0xffc0,
> expected exceptions: none, returned: i
> error: flags mismatch for input @ ibm/Basic-Types-Inputs.fptest:1346:
> b32+ =0 Q S -> Q
> 
> That's a float32 addition where the first input is a QNaN
> and the second an SNaN (presumably the test is configured
> for what in QEMU is snan_bit_is_one == 0), and it expects
> the result to be the QNaN, with no exceptions set. But
> we raise Invalid.
> =0 is the rounding mode, not relevant here.

Yes, you're reading it right.

> IEEE754-2008 s6.2 seems pretty clear that if there's
> an SNaN as an operand then operations like addition should
> signal Invalid. So this looks like a bug in the test case
> input. (Which is weird, because IBM must have tested this,
> so it's odd to see an obvious error in it.)

Yes, sometimes the input files don't make much sense -- that's why
I ended up whitelisting some of them.

BTW I just checked with -t host on an IBM Power8, and we get
the same 1049 flag errors we get with -t soft plus two additional ones:

+A 0xffb0, expected: 0x7fa0, returned: 0x7fa0, \
  expected exceptions: i, returned: none
+error: flags mismatch for input @ ibm/Basic-Types-Inputs.fptest:382:
+b32A =0 S -> S i
(...)
+cff 0xffb0, expected: 0x7ff8, returned: 0x7ff4, \
  expected exceptions: i, returned: none
+error: flags mismatch for input @ ibm/Basic-Types-Inputs.fptest:26170:
+b32b64cff =0 S -> Q i

On x86 with -t host we again get a strict superset of what we get
with -t soft.

So yeah, I don't know what these test cases are about.

> Most of the "expected none, returned i" lines look
> like the same thing. We should look at the others, though.

Given the above, whitelisting the 1049 cases and forcing the flag
checks for all tests (as below) seems reasonable to me.

--- a/tests/fp/fp-test.c
+++ b/tests/fp/fp-test.c
@@ -247,7 +247,7 @@ static enum error tester_check(const struct test_op *t, 
uint64_t res64,
 goto out;
 }
 }
-if (t->exceptions && flags != (t->exceptions | default_exceptions)) {
+if (flags != (t->exceptions | default_exceptions)) {
 err = ERROR_EXCEPTIONS;
 goto out;
 }
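
To make the effect of that one-line change concrete, here is a small
standalone sketch. It is not the fp-test code: the wl_entry struct and the
function names are hypothetical, and the whitelist is assumed to be keyed on
the input file and line number that the error report prints. The old check
skips the comparison whenever the test expects no exceptions; the new one
always compares, and a whitelist keeps the known-bad cases quiet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical whitelist entry: one known-bad test case, identified by
 * the input file and line number shown in the error report. */
struct wl_entry {
    const char *file;
    unsigned line;
};

/* Return true if (file, line) appears in the whitelist. */
bool is_whitelisted(const struct wl_entry *wl, size_t n,
                    const char *file, unsigned line)
{
    for (size_t i = 0; i < n; i++) {
        if (wl[i].line == line && strcmp(wl[i].file, file) == 0) {
            return true;
        }
    }
    return false;
}

/* Old check: only complains when the test expects some exception, so a
 * spurious flag on an "expects none" test goes unnoticed. */
bool flags_error_old(uint8_t expected, uint8_t defaults, uint8_t flags)
{
    return expected && flags != (expected | defaults);
}

/* New check: always compares the flags, unless the case is whitelisted. */
bool flags_error_new(uint8_t expected, uint8_t defaults, uint8_t flags,
                     bool whitelisted)
{
    return !whitelisted && flags != (expected | defaults);
}
```

With expected == 0 and one flag raised, flags_error_old() reports nothing
while flags_error_new() reports a mismatch, which is the class of regression
the forced check is meant to catch.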

Thanks,

E.

Re: [Qemu-devel] [PATCH] mux: fix ctrl-a b again

2018-04-17 Thread Peter Maydell

On 17 April 2018 at 19:36, Philippe Mathieu-Daudé  wrote:
> Since this commit, the console on the Malta board stays black...
>
> Before:
> $ qemu-system-mips -M malta -m 512 \
>   -kernel vmlinux-3.2.0-4-4kc-malta -append 'root=/dev/sda1' \
>   -nographic
> [0.00] Initializing cgroup subsys cpuset
> [0.00] Initializing cgroup subsys cpu
> [0.00] Linux version 3.2.0-4-4kc-malta
> (debian-ker...@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-14) )
> #1 Debian 3.2.51-1
> [0.00] Config serial console: console=ttyS0,38400n8r
> [0.00] bootconsole [early0] enabled
> ...
>
> After:
> $ qemu-system-mips -M malta -m 512 \
>   -kernel vmlinux-3.2.0-4-4kc-malta -append 'root=/dev/sda1' \
>   -nographic
> QEMU 2.11.92 monitor - type 'help' for more information
> (qemu) QEMU 2.11.92 monitor - type 'help' for more information
> (qemu) q

I reproduced this, and as you can see from the transcripts you
gave, the problem is that the mux is confused about what the
active console should be. Before the patch, we correctly
start with the serial as the active console and C-a c
switches, but afterwards we start out with the monitor
active and C-a c doesn't switch. Revert pushed to master.

thanks
-- PMM

Re: [Qemu-devel] [PATCH v2 5/5] qmp: add pmemload command

2018-04-17 Thread Eric Blake

On 04/12/2018 07:50 AM, Simon Ruderich wrote:
> Adapted patch from Baojun Wang [1] with the following commit message:
> 
> I found this could be useful to have qemu-softmmu as a cross
> debugger (launch with -s -S command line option), then if we can
> have a command to load guest physical memory, we can use cross gdb
> to do some target debug which gdb cannot do directly.
> 
> pmemload is necessary to directly write physical memory which is not
> possible with gdb alone as it uses only logical addresses.
> 
> The QAPI for pmemload uses "val" as parameter name for the physical
> address. This name is not very descriptive but is consistent with the
> existing pmemsave. Changing the parameter name of pmemsave is not
> possible without breaking the existing API.
> 
> [1]: https://lists.gnu.org/archive/html/qemu-trivial/2014-04/msg00074.html
> 
> Based-on-patch-by: Baojun Wang 
> Signed-off-by: Simon Ruderich 
> ---

Focusing on just the interface:

> +++ b/qapi/misc.json
> @@ -1185,6 +1185,26 @@
>  { 'command': 'pmemsave',
>'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>  
> +##
> +# @pmemload:
> +#
> +# Load a portion of guest physical memory from a file.
> +#
> +# @val: the physical address of the guest to start from
> +#
> +# @size: the size of memory region to load

Should size be an optional parameter (default read to the end of the file)?

> +#
> +# @offset: the offset in the file to start from

Should offset be an optional parameter (default start reading from
offset 0 of the file)?

> +#
> +# @filename: the file to load the memory from as binary data
> +#
> +# Returns: Nothing on success
> +#
> +# Since: 2.13
> +##
> +{ 'command': 'pmemload',
> +  'data': {'val': 'int', 'size': 'int', 'offset': 'int', 'filename': 'str'} }
> +
>  ##
>  # @cont:
>  #
> 
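
If both members were made optional, the handler would need to pick the
defaults asked about above. In QAPI an optional member is written with a
leading '*' (for example '*size'), and the C handler then receives a has_size
flag next to size. The sketch below only illustrates the defaulting logic
under that assumption; resolve_load_range() and its parameters are
hypothetical and not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Work out which part of the file to load when offset and size are
 * optional: offset defaults to 0, size defaults to "until end of file".
 * Returns 0 on success, -1 if the requested range is outside the file. */
int resolve_load_range(uint64_t file_len,
                       bool has_offset, uint64_t offset,
                       bool has_size, uint64_t size,
                       uint64_t *out_offset, uint64_t *out_size)
{
    uint64_t off = has_offset ? offset : 0;
    uint64_t len;

    if (off > file_len) {
        return -1;                  /* start is past the end of the file */
    }
    len = has_size ? size : file_len - off;
    if (len > file_len - off) {
        return -1;                  /* range runs past the end of the file */
    }
    *out_offset = off;
    *out_size = len;
    return 0;
}
```

For a 100-byte file, omitting both members yields the range (0, 100), and
giving only offset 40 yields (40, 60).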

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [Qemu-devel] [PATCH v6 2/7] hw/misc: add vmcoreinfo device

2018-04-17 Thread Eduardo Habkost

On Tue, Apr 17, 2018 at 03:12:03PM -0400, Cole Robinson wrote:
[...]
> Reviving this... did any follow up changes happen?
> 
> Marc-André patched virt-manager a few months back to enable -device
> vmcoreinfo for new VMs:
> 
> https://www.redhat.com/archives/virt-tools-list/2018-February/msg00020.html
> 
> And I see there's at least a bug tracking adding this to openstack for
> new VMs:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1555276
> 
> If this feature doesn't really have any downsides, it would be nice to
> get this tied to new machine types. Saves a lot of churn for higher
> levels of the stack

I understand this would be nice to have considering the existing
stacks, but at the same time I would like the rest of the
stack(s) to really try to not depend on QEMU machine-types to
define policy/defaults.

Every feature that is hidden behind an opaque machine-type name
and not visible in the domain XML and QEMU command-line increases
the risk of migration and compatibility bugs.

This was being discussed in a mail thread at:
https://www.mail-archive.com/ovirt-devel@redhat.com/msg01196.html

Quoting Daniel, on that thread:

] Another case is the pvpanic device - while in theory that could
] have been enabled by default for all guests, by QEMU or a config
] generator library, doing so is not useful on its own. The hard
] bit of the work is adding code to the mgmt app to choose the
] action for when pvpanic triggers, and code to handle the results
] of that action.

From that comment, I understand that simply making QEMU create a
pvpanic device by default on pc-2.13+ won't be useful at all?

-- 
Eduardo

Re: [Qemu-devel] [PATCH 2/4] include/standard-headers: add asm-x86/kvm_para.h

2018-04-17 Thread Eduardo Habkost

On Tue, Apr 17, 2018 at 09:58:25PM +0300, Michael S. Tsirkin wrote:
> Import asm-x86/kvm_para.h from linux where it can
> be easily used on Linux and non-Linux platforms.
> 
> Signed-off-by: Michael S. Tsirkin 

Acked-by: Eduardo Habkost 

-- 
Eduardo

Re: [Qemu-devel] [PATCH 1/4] update-linux-headers.sh: drop kvm_para.h hacks

2018-04-17 Thread Eduardo Habkost

On Tue, Apr 17, 2018 at 09:58:21PM +0300, Michael S. Tsirkin wrote:
> It turns out (as will be clear from follow-up patches)
> we do not really need any kvm para macros host side
> for now, except on x86, and there we need it
> unconditionally whether we run on kvm or we don't.
> 
> Import the x86 asm/kvm_para.h into standard-headers,
> follow-up patches remove a bunch of code using this.
> 
> Signed-off-by: Michael S. Tsirkin 

I didn't review the code yet but I agree it's a good change, so:

Acked-by: Eduardo Habkost 

-- 
Eduardo

Re: [Qemu-devel] [RFC PATCH V4 2/4] vfio: Add vm status change callback to stop/restart the mdev device

2018-04-17 Thread Kirti Wankhede



On 4/18/2018 1:39 AM, Alex Williamson wrote:
> On Wed, 18 Apr 2018 00:44:35 +0530
> Kirti Wankhede  wrote:
> 
>> On 4/17/2018 8:13 PM, Alex Williamson wrote:
>>> On Tue, 17 Apr 2018 13:40:32 +
>>> "Zhang, Yulei"  wrote:
>>>   
> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Tuesday, April 17, 2018 4:23 AM
> To: Kirti Wankhede 
> Cc: Zhang, Yulei ; qemu-devel@nongnu.org; Tian,
> Kevin ; joonas.lahti...@linux.intel.com;
> zhen...@linux.intel.com; Wang, Zhi A ;
> dgilb...@redhat.com; quint...@redhat.com
> Subject: Re: [RFC PATCH V4 2/4] vfio: Add vm status change callback to
> stop/restart the mdev device
>
> On Mon, 16 Apr 2018 20:14:27 +0530
> Kirti Wankhede  wrote:
> 
>> On 4/10/2018 11:32 AM, Yulei Zhang wrote:
>>> VM status change handler is added to change the vfio pci device
>>> status during the migration, write the demanded device status
>>> to the DEVICE STATUS subregion to stop the device on the source side
>>> before fetch its status and start the device on the target side
>>> after restore its status.
>>>
>>> Signed-off-by: Yulei Zhang 
>>> ---
>>>  hw/vfio/pci.c | 20 
>>>  include/hw/vfio/vfio-common.h |  1 +
>>>  linux-headers/linux/vfio.h|  6 ++
>>>  roms/seabios  |  2 +-
>>>  4 files changed, 28 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>> index f98a9dd..13d8c73 100644
>>> --- a/hw/vfio/pci.c
>>> +++ b/hw/vfio/pci.c
>>> @@ -38,6 +38,7 @@
>>>
>>>  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
>>>  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
>>> +static void vfio_vm_change_state_handler(void *pv, int running,
> RunState state);
>>>
>>>  /*
>>>   * Disabling BAR mmaping can be slow, but toggling it around INTx can
>>> @@ -2896,6 +2897,7 @@ static void vfio_realize(PCIDevice *pdev, Error   
>>>  
> **errp)
>>>  vfio_register_err_notifier(vdev);
>>>  vfio_register_req_notifier(vdev);
>>>  vfio_setup_resetfn_quirk(vdev);
>>> +
> qemu_add_vm_change_state_handler(vfio_vm_change_state_handler,
> vdev);
>>>
>>>  return;
>>>
>>> @@ -2982,6 +2984,24 @@ post_reset:
>>>  vfio_pci_post_reset(vdev);
>>>  }
>>>
>>> +static void vfio_vm_change_state_handler(void *pv, int running,
> RunState state)
>>> +{
>>> +VFIOPCIDevice *vdev = pv;
>>> +VFIODevice *vbasedev = &vdev->vbasedev;
>>> +uint8_t dev_state;
>>> +uint8_t sz = 1;
>>> +
>>> +dev_state = running ? VFIO_DEVICE_START : VFIO_DEVICE_STOP;
>>> +
>>> +if (pwrite(vdev->vbasedev.fd, &dev_state,
>>> +   sz, vdev->device_state.offset) != sz) {
>>> +error_report("vfio: Failed to %s device", running ? "start" : 
>>> "stop");
>>> +return;
>>> +}
>>> +
>>> +vbasedev->device_state = dev_state;
>>> +}
>>> +
>>
>> Is it expected to trap device_state region by vendor driver?
>> Can this information be communicated to vendor driver through an ioctl?  
>>   
>
> Either the mdev vendor driver or vfio bus driver (ie. vfio-pci) would
> be providing REGION_INFO for this region, so the vendor driver is
> already in full control here using existing ioctls.  I don't see that
> we need new ioctls, we just need to fully define the API of the
> proposed regions here.
> 
>>>> If the device state region is mmaped, we may not be able to use
>>>> region device state offset to convey the running state. It may need
>>>> a new ioctl to set the device state.
>>>
>>> The vendor driver defines the mmap'ability of the region, the vendor
>>> driver is still in control.  The API of the region and the
>>> implementation by the vendor driver should account for handling
>>> mmap'able sections within the region.  Thanks,
>>>
>>> Alex
>>>
>>>
>>
>> If this same region should be used for communicating state or other
>> parameters instead of ioctl, maybe the first page of this region needs to be
>> reserved. Mmappable region's start address should be page aligned. Is
>> this API going to utilize 4K of the reserved part of this region?
>> Instead of carving out part of section from the region, are there any
>> disadvantages of adding an ioctl?
>> Maybe defining a single ioctl and using different flags (GET_*/SET_*)
>> would work?
> 
> Yes, ioctls are something that should be feared and reviewed with great
> scrutiny and we should feel bad if we do a poor job defining them

[Qemu-devel] [Bug 1763536] Re: go build fails under qemu-ppc64le-static (qemu-user)

2018-04-17 Thread David Wilder

I will attempt to find a way to re-create without docker.  The key is
we need a way to create a ppc64le (or arm64) fakeroot with go that we
can chroot into.  That is easy to do with docker.  BTW: the use case
using docker and qemu-user-static is becoming a fairly common way to cross
build container images.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1763536


Re: [Qemu-devel] [PATCH] fpu/softfloat: check for Inf / x or 0 / x before /0

2018-04-17 Thread Peter Maydell

On 17 April 2018 at 20:04, Emilio G. Cota  wrote:
> Note that in fp-test I am not checking for flags that are raised
> when none are expected, because doing so gives quite a few errors.
> Just noticed that enabling this check yields 1049 of these errors for
> v2.11, and before this patch that number was 1087. After this
> patch, it is again brought down to 1049. IOW, the test cases in
> fp-test raise exactly the same flags as v2.11, which is good to know.
>
> The 1049 errors are probably false positives -- at least a big
> chunk of them should be, given that "-t host" gives even more errors.
> I am tempted to keep the flag check and whitelist these errors
> though, which would catch regressions such as the one we're fixing here.

I strongly suspect we do have a few cases where we get the answers
wrong and/or don't report the flags right, so ideally we'd have
a look at them in more detail...

> Here is the report file with the 1049 failing test cases:
>   http://www.cs.columbia.edu/~cota/qemu/fp-test-after-inf-patch.txt

Syntax for interpreting the report:
https://www.research.ibm.com/haifa/projects/verification/fpgen/syntax.txt

Here's the first one, am I reading it right?

+ 0xffc0 0xffb0, expected: 0xffc0, returned: 0xffc0,
expected exceptions: none, returned: i
error: flags mismatch for input @ ibm/Basic-Types-Inputs.fptest:1346:
b32+ =0 Q S -> Q

That's a float32 addition where the first input is a QNaN
and the second an SNaN (presumably the test is configured
for what in QEMU is snan_bit_is_one == 0), and it expects
the result to be the QNaN, with no exceptions set. But
we raise Invalid.
=0 is the rounding mode, not relevant here.

IEEE754-2008 s6.2 seems pretty clear that if there's
an SNaN as an operand then operations like addition should
signal Invalid. So this looks like a bug in the test case
input. (Which is weird, because IBM must have tested this,
so it's odd to see an obvious error in it.)
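
The NaN reasoning above can be checked mechanically with a few lines of C.
This is a standalone sketch, not softfloat code: the helper names and the
FLAG_INVALID value are hypothetical, the quiet bit is taken to be the top
fraction bit (the snan_bit_is_one == 0 convention), and the operand patterns
are assumed to be 0xffc00000 and 0xffb00000, since the report above shows them
shortened.

```c
#include <stdbool.h>
#include <stdint.h>

#define F32_EXP_MASK   0x7f800000u  /* exponent field, all ones for NaN/Inf */
#define F32_FRAC_MASK  0x007fffffu  /* fraction field */
#define F32_QUIET_BIT  0x00400000u  /* top fraction bit: set means quiet */
#define FLAG_INVALID   0x01u        /* hypothetical flag encoding */

/* NaN: exponent all ones and a non-zero fraction. */
bool f32_is_nan(uint32_t a)
{
    return (a & F32_EXP_MASK) == F32_EXP_MASK && (a & F32_FRAC_MASK) != 0;
}

/* Signaling NaN: a NaN whose quiet bit is clear. */
bool f32_is_snan(uint32_t a)
{
    return f32_is_nan(a) && (a & F32_QUIET_BIT) == 0;
}

/* Quiet NaN: a NaN whose quiet bit is set. */
bool f32_is_qnan(uint32_t a)
{
    return f32_is_nan(a) && (a & F32_QUIET_BIT) != 0;
}

/* Flags an addition must raise purely because of its NaN operands:
 * any signaling NaN operand signals Invalid. */
unsigned add_nan_flags(uint32_t a, uint32_t b)
{
    return (f32_is_snan(a) || f32_is_snan(b)) ? FLAG_INVALID : 0;
}
```

Under those assumptions the first operand classifies as a QNaN, the second as
an SNaN, and add_nan_flags() returns FLAG_INVALID, which matches the "returned:
i" in the report rather than the "expected exceptions: none".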

Most of the "expected none, returned i" lines look
like the same thing. We should look at the others, though.

thanks
-- PMM

Re: [Qemu-devel] [PATCH v2 4/5] hmp: don't truncate size in hmp_memsave/hmp_pmemsave

2018-04-17 Thread Eric Blake

On 04/12/2018 07:50 AM, Simon Ruderich wrote:
> The called function takes an uint64_t as size parameter and
> qdict_get_int() returns an uint64_t. Don't truncate it needlessly to an
> uint32_t.
> 
> Signed-off-by: Simon Ruderich 
> ---
>  hmp.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Eric Blake 
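
For anyone wondering what the needless truncation costs in practice, a tiny
standalone illustration follows. It is not the hmp.c code; narrow_size() and
size_survives_narrowing() are hypothetical helpers that mimic storing a 64-bit
size in a 32-bit local before passing it on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mimic keeping a 64-bit size in a 32-bit variable: the upper 32 bits
 * are silently dropped. */
uint32_t narrow_size(uint64_t size)
{
    return (uint32_t)size;
}

/* True if the size comes out of the 32-bit variable unchanged. */
bool size_survives_narrowing(uint64_t size)
{
    return (uint64_t)narrow_size(size) == size;
}
```

A request for 5 GiB (5 << 30) comes out as 1 GiB after narrowing, so a dump of
a large memory range would be cut short without any error.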

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




[Qemu-devel] [PATCH v4 21/21] target/arm: Send interrupts on PMU counter overflow

2018-04-17 Thread Aaron Lindsay

Set up a QEMUTimer to get a callback when we expect counters to next
overflow and trigger an interrupt at that time.

Signed-off-by: Aaron Lindsay 
---
 target/arm/cpu.c|  11 +
 target/arm/cpu.h|   7 +++
 target/arm/helper.c | 128 
 3 files changed, 137 insertions(+), 9 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 22063ca..592b7fc 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -741,6 +741,12 @@ static void arm_cpu_finalizefn(Object *obj)
 QLIST_REMOVE(hook, node);
 g_free(hook);
 }
+#ifndef CONFIG_USER_ONLY
+if (arm_feature(&cpu->env, ARM_FEATURE_PMU)) {
+timer_deinit(cpu->pmu_timer);
+timer_free(cpu->pmu_timer);
+}
+#endif
 }
 
 static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
@@ -907,6 +913,11 @@ static void arm_cpu_realizefn(DeviceState *dev, Error 
**errp)
 arm_register_pre_el_change_hook(cpu, &pmu_pre_el_change, 0);
 arm_register_el_change_hook(cpu, &pmu_post_el_change, 0);
 }
+
+#ifndef CONFIG_USER_ONLY
+cpu->pmu_timer = timer_new(QEMU_CLOCK_VIRTUAL, 1, arm_pmu_timer_cb,
+cpu);
+#endif
 } else {
 cpu->pmceid0 = 0x;
 cpu->pmceid1 = 0x;
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 2f07196..e9b6dab 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -702,6 +702,8 @@ struct ARMCPU {
 
 /* Timers used by the generic (architected) timer */
 QEMUTimer *gt_timer[NUM_GTIMERS];
+/* Timer used by the PMU */
+QEMUTimer *pmu_timer;
 /* GPIO outputs for generic timer */
 qemu_irq gt_timer_outputs[NUM_GTIMERS];
 /* GPIO output for GICv3 maintenance interrupt signal */
@@ -933,6 +935,11 @@ void pmu_op_start(CPUARMState *env);
 void pmu_op_finish(CPUARMState *env);
 
 /**
+ * Called when a PMU counter is due to overflow
+ */
+void arm_pmu_timer_cb(void *opaque);
+
+/**
  * Functions to register as EL change hooks for PMU mode filtering
  */
 void pmu_pre_el_change(ARMCPU *cpu, void *ignored);
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 046e37c..2efdc63 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -905,6 +905,7 @@ static const ARMCPRegInfo v6_cp_reginfo[] = {
 /* Definitions for the PMU registers */
 #define PMCRN_MASK  0xf800
 #define PMCRN_SHIFT 11
+#define PMCRLC  0x40
 #define PMCRDP  0x10
 #define PMCRD   0x8
 #define PMCRC   0x4
@@ -920,6 +921,8 @@ static const ARMCPRegInfo v6_cp_reginfo[] = {
 #define PMXEVTYPER_MT 0x0200
 #define PMXEVTYPER_EVTCOUNT   0x03ff
 
+#define PMEVCNTR_OVERFLOW_MASK ((uint64_t)1 << 31)
+
 #define PMCCFILTR 0xf800
 #define PMCCFILTR_M   PMXEVTYPER_M
 #define PMCCFILTR_EL0 (PMCCFILTR | PMCCFILTR_M)
@@ -942,6 +945,11 @@ typedef struct pm_event {
 /* Retrieve the current count of the underlying event. The programmed
  * counters hold a difference from the return value from this function */
 uint64_t (*get_count)(CPUARMState *);
+/* Return how many nanoseconds it will take (at a minimum) for count events
+ * to occur. A negative value indicates the counter will never overflow, or
+ * that the counter has otherwise arranged for the overflow bit to be set
+ * and the PMU interrupt to be raised on overflow. */
+int64_t (*ns_per_count)(uint64_t);
 } pm_event;
 
 static bool event_always_supported(CPUARMState *env)
@@ -958,6 +966,11 @@ static uint64_t swinc_get_count(CPUARMState *env)
 return 0;
 }
 
+static int64_t swinc_ns_per(uint64_t ignored)
+{
+return -1;
+}
+
 /*
  * Return the underlying cycle count for the PMU cycle counters. If we're in
  * usermode, simply return 0.
@@ -973,6 +986,11 @@ static uint64_t cycles_get_count(CPUARMState *env)
 }
 
 #ifndef CONFIG_USER_ONLY
+static int64_t cycles_ns_per(uint64_t cycles)
+{
+return ARM_CPU_FREQ / NANOSECONDS_PER_SECOND;
+}
+
 static bool instructions_supported(CPUARMState *env)
 {
 return use_icount == 1 /* Precise instruction counting */;
@@ -982,22 +1000,30 @@ static uint64_t instructions_get_count(CPUARMState *env)
 {
 return (uint64_t)cpu_get_icount_raw();
 }
+
+static int64_t instructions_ns_per(uint64_t icount)
+{
+return cpu_icount_to_ns((int64_t)icount);
+}
 #endif
 
 #define SUPPORTED_EVENT_SENTINEL UINT16_MAX
 static const pm_event pm_events[] = {
 { .number = 0x000, /* SW_INCR */
   .supported = event_always_supported,
-  .get_count = swinc_get_count
+  .get_count = swinc_get_count,
+  .ns_per_count = swinc_ns_per
 },
 #ifndef CONFIG_USER_ONLY
 { .number = 0x008, /* INST_RETIRED, Instruction architecturally executed */
   .supported = instructions_supported,
-  .get_count = instructions_get_count
+  .get_count = instructions_get_count,
+  .ns_per_count = instructions_ns_per
 },
 { .number = 0x011, /* CPU_CYCLES, Cycle */
   .supported =

Re: [Qemu-devel] [PATCH v2 3/5] cpus: use size_t in qmp_memsave/qmp_pmemsave

2018-04-17 Thread Eric Blake

On 04/12/2018 07:50 AM, Simon Ruderich wrote:
> It's the natural type for object sizes and matches the return value of
> sizeof(buf).
> 
> Signed-off-by: Simon Ruderich 
> ---
>  cpus.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Eric Blake 
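
The point about size_t matching sizeof(buf) shows up in the usual chunked copy
loop. The sketch below is a generic illustration, not the cpus.c code:
copy_in_chunks() and the 1024-byte buffer are assumptions made for the
example. The per-chunk length is a size_t because it is bounded by
sizeof(buf), while the total remaining size stays 64-bit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `size` bytes from src to dst through a fixed bounce buffer.
 * Returns the number of chunks that were needed. */
size_t copy_in_chunks(unsigned char *dst, const unsigned char *src,
                      uint64_t size)
{
    unsigned char buf[1024];
    size_t chunks = 0;

    while (size != 0) {
        size_t l = sizeof(buf);     /* same type as sizeof, no cast needed */
        if (l > size) {
            l = (size_t)size;       /* last, partial chunk */
        }
        memcpy(buf, src, l);
        memcpy(dst, buf, l);
        src += l;
        dst += l;
        size -= l;
        chunks++;
    }
    return chunks;
}
```

Copying 2500 bytes this way takes three chunks (1024, 1024 and 452 bytes).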

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




[Qemu-devel] [PATCH v4 16/21] target/arm: Finish implementation of PM[X]EVCNTR and PM[X]EVTYPER

2018-04-17 Thread Aaron Lindsay

Add arrays to hold the registers, the definitions themselves, access
functions, and logic to reset counters when PMCR.P is set. Update
filtering code to support counters other than PMCCNTR.

Signed-off-by: Aaron Lindsay 
---
 target/arm/cpu.h|   3 +
 target/arm/helper.c | 238 ++--
 2 files changed, 216 insertions(+), 25 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index f058f5c..2f07196 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -468,6 +468,9 @@ typedef struct CPUARMState {
  * pmccntr_op_finish.
  */
 uint64_t c15_ccnt_delta;
+uint64_t c14_pmevcntr[31];
+uint64_t c14_pmevcntr_delta[31];
+uint64_t c14_pmevtyper[31];
 uint64_t pmccfiltr_el0; /* Performance Monitor Filter Register */
 uint64_t vpidr_el2; /* Virtualization Processor ID Register */
 uint64_t vmpidr_el2; /* Virtualization Multiprocessor ID Register */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 7a715a6..b36630f 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -907,6 +907,7 @@ static const ARMCPRegInfo v6_cp_reginfo[] = {
 #define PMCRDP  0x10
 #define PMCRD   0x8
 #define PMCRC   0x4
+#define PMCRP   0x2
 #define PMCRE   0x1
 
 #define PMXEVTYPER_P  0x8000
@@ -1089,9 +1090,11 @@ static inline bool pmu_counter_enabled(CPUARMState *env, 
uint8_t counter)
 prohibited = env->cp15.c9_pmcr & PMCRDP;
 }
 
-/* TODO Remove assert, set filter to correct PMEVTYPER */
-assert(counter == 31);
-filter = env->cp15.pmccfiltr_el0;
+if (counter == 31) {
+filter = env->cp15.pmccfiltr_el0;
+} else {
+filter = env->cp15.c14_pmevtyper[counter];
+}
 
 p   = filter & PMXEVTYPER_P;
 u   = filter & PMXEVTYPER_U;
@@ -1111,6 +1114,21 @@ static inline bool pmu_counter_enabled(CPUARMState *env, 
uint8_t counter)
 filtered = m != p;
 }
 
+if (counter != 31) {
+/* If not checking PMCCNTR, ensure the counter is setup to an event we
+ * support */
+uint16_t event = filter & PMXEVTYPER_EVTCOUNT;
+if (event > 0x3f) {
+return false; /* We only support common architectural and
+ microarchitectural events */
+}
+
+uint16_t event_idx = supported_event_map[event];
+if (event_idx == SUPPORTED_EVENT_SENTINEL) {
+return false;
+}
+}
+
 return enabled && !prohibited && !filtered;
 }
 
@@ -1157,14 +1175,44 @@ void pmccntr_op_finish(CPUARMState *env)
 }
 }
 
+static void pmevcntr_op_start(CPUARMState *env, uint8_t counter)
+{
+
+uint16_t event = env->cp15.c14_pmevtyper[counter] & PMXEVTYPER_EVTCOUNT;
+uint16_t event_idx = supported_event_map[event];
+uint64_t count = pm_events[event_idx].get_count(env);
+
+if (pmu_counter_enabled(env, counter)) {
+env->cp15.c14_pmevcntr[counter] =
+count - env->cp15.c14_pmevcntr_delta[counter];
+}
+env->cp15.c14_pmevcntr_delta[counter] = count;
+}
+
+static void pmevcntr_op_finish(CPUARMState *env, uint8_t counter)
+{
+if (pmu_counter_enabled(env, counter)) {
+env->cp15.c14_pmevcntr_delta[counter] -=
+env->cp15.c14_pmevcntr[counter];
+}
+}
+
 void pmu_op_start(CPUARMState *env)
 {
+unsigned int i;
 pmccntr_op_start(env);
+for (i = 0; i < pmu_num_counters(env); i++) {
+pmevcntr_op_start(env, i);
+}
 }
 
 void pmu_op_finish(CPUARMState *env)
 {
+unsigned int i;
 pmccntr_op_finish(env);
+for (i = 0; i < pmu_num_counters(env); i++) {
+pmevcntr_op_finish(env, i);
+}
 }
 
 void pmu_pre_el_change(ARMCPU *cpu, void *ignored)
@@ -1187,6 +1235,13 @@ static void pmcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
 env->cp15.c15_ccnt = 0;
 }
 
+if (value & PMCRP) {
+unsigned int i;
+for (i = 0; i < pmu_num_counters(env); i++) {
+env->cp15.c14_pmevcntr[i] = 0;
+}
+}
+
 /* only the DP, X, D and E bits are writable */
 env->cp15.c9_pmcr &= ~0x39;
 env->cp15.c9_pmcr |= (value & 0x39);
@@ -1240,6 +1295,14 @@ void pmccntr_op_finish(CPUARMState *env)
 {
 }
 
+void pmevcntr_op_start(CPUARMState *env, uint8_t i)
+{
+}
+
+void pmevcntr_op_finish(CPUARMState *env, uint8_t i)
+{
+}
+
 void pmu_op_start(CPUARMState *env)
 {
 }
@@ -1269,11 +1332,11 @@ static void pmccfiltr_write(CPUARMState *env, const ARMCPRegInfo *ri,
 static void pmccfiltr_write_a32(CPUARMState *env, const ARMCPRegInfo *ri,
 uint64_t value)
 {
-uint64_t saved_cycles = pmccntr_op_start(env);
+pmccntr_op_start(env);
 /* M is not accessible from AArch32 */
 env->cp15.pmccfiltr_el0 = (env->cp15.pmccfiltr_el0 & PMCCFILTR_M) |
 (value & PMCCFILTR);
-pmccntr_op_finish(env, saved_cycles);
+pmccntr_op_finish(env);
 }
 
 static uint64_t
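
The delta bookkeeping done by pmevcntr_op_start() and pmevcntr_op_finish() in the patch above may be easier to follow in isolation. What follows is a minimal standalone sketch, not QEMU code: `struct counter`, `op_start` and `op_finish` are hypothetical names, and the underlying event count is passed in as a plain argument instead of being read through pm_events[].get_count().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the state of one event counter. */
struct counter {
    uint64_t value;   /* guest-visible value, valid between start and finish */
    uint64_t delta;   /* offset against the underlying count at other times */
    bool enabled;
};

/* Switch to guest-visible form, given the current underlying count. */
static void op_start(struct counter *c, uint64_t underlying)
{
    assert(c != NULL);
    if (c->enabled) {
        c->value = underlying - c->delta;
    }
    c->delta = underlying;
}

/* Switch back to delta form once the guest-visible value may have changed. */
static void op_finish(struct counter *c)
{
    assert(c != NULL);
    if (c->enabled) {
        c->delta -= c->value;
    }
}
```

In this model a guest write is simply an assignment to `value` between the two calls. A disabled counter keeps its `value` across the pair, because only `delta` is refreshed.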

[Qemu-devel] [PATCH v4 20/21] target/arm: Mark PMINTENSET accesses as possibly doing IO

2018-04-17 Thread Aaron Lindsay

This makes it match its AArch64 equivalent, PMINTENSET_EL1

Signed-off-by: Aaron Lindsay 
---
 target/arm/helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 3902719..046e37c 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -1732,7 +1732,7 @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
   .writefn = pmuserenr_write, .raw_writefn = raw_write },
 { .name = "PMINTENSET", .cp = 15, .crn = 9, .crm = 14, .opc1 = 0, .opc2 = 1,
   .access = PL1_RW, .accessfn = access_tpm,
-  .type = ARM_CP_ALIAS,
+  .type = ARM_CP_ALIAS | ARM_CP_IO,
   .fieldoffset = offsetoflow32(CPUARMState, cp15.c9_pminten),
   .resetvalue = 0,
   .writefn = pmintenset_write, .raw_writefn = raw_write },
-- 
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

Re: [Qemu-devel] [PATCH v2 2/5] cpus: convert qmp_memsave/qmp_pmemsave to use qemu_open

2018-04-17 Thread Eric Blake

On 04/12/2018 07:50 AM, Simon Ruderich wrote:
> qemu_open() allow passing file descriptors to qemu which is used in

s/allow/allows/

> restricted environments like libvirt where open() is prohibited.
> 
> Suggested-by: Eric Blake 
> Signed-off-by: Simon Ruderich 
> ---
>  cpus.c | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [Qemu-devel] [PATCH RESEND v2] i386/kvm: add support for KVM_CAP_X86_DISABLE_EXITS

2018-04-17 Thread Eduardo Habkost

On Tue, Apr 17, 2018 at 01:24:15AM -0700, Wanpeng Li wrote:
> From: Wanpeng Li 
> 
> This patch adds support for KVM_CAP_X86_DISABLE_EXITS. It provides userspace
> with a per-VM capability (KVM_CAP_X86_DISABLE_EXITS) to not intercept
> MWAIT/HLT/PAUSE, in order to improve latency in some workloads.
> 
> Cc: Paolo Bonzini 
> Cc: Radim Krčmář  
> Cc: Eduardo Habkost 
> Signed-off-by: Wanpeng Li 
> ---
> 
>  linux-headers/linux/kvm.h |  6 +-
>  target/i386/cpu.h |  2 ++
>  target/i386/kvm.c | 16 
>  3 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
> index a167be8..857df15 100644
> --- a/linux-headers/linux/kvm.h
> +++ b/linux-headers/linux/kvm.h
> @@ -925,7 +925,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_S390_GS 140
>  #define KVM_CAP_S390_AIS 141
>  #define KVM_CAP_SPAPR_TCE_VFIO 142
> -#define KVM_CAP_X86_GUEST_MWAIT 143
> +#define KVM_CAP_X86_DISABLE_EXITS 143
>  #define KVM_CAP_ARM_USER_IRQ 144
>  #define KVM_CAP_S390_CMMA_MIGRATION 145
>  #define KVM_CAP_PPC_FWNMI 146
> @@ -1508,6 +1508,10 @@ struct kvm_assigned_msix_entry {
>  #define KVM_X2APIC_API_USE_32BIT_IDS            (1ULL << 0)
>  #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK  (1ULL << 1)
>  
> +#define KVM_X86_DISABLE_EXITS_MWAIT  (1 << 0)
> +#define KVM_X86_DISABLE_EXITS_HLT    (1 << 1)
> +#define KVM_X86_DISABLE_EXITS_PAUSE  (1 << 2)
> +
>  /* Available with KVM_CAP_ARM_USER_IRQ */
>  
>  /* Bits for run->s.regs.device_irq_level */
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 1b219fa..965de1b 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -685,6 +685,8 @@ typedef uint32_t FeatureWordArray[FEATURE_WORDS];
>  #define CPUID_7_0_EDX_AVX512_4FMAPS (1U << 3) /* AVX512 Multiply Accumulation Single Precision */
>  #define CPUID_7_0_EDX_SPEC_CTRL (1U << 26) /* Speculation Control */
>  
> +#define KVM_PV_UNHALT (1U << 7)
> +
>  #define KVM_HINTS_DEDICATED (1U << 0)
>  
>  #define CPUID_8000_0008_EBX_IBPB (1U << 12) /* Indirect Branch Prediction Barrier */
> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> index 6c49954..3e99830 100644
> --- a/target/i386/kvm.c
> +++ b/target/i386/kvm.c
> @@ -1029,6 +1029,22 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  }
>  }
>  
> +if (env->features[FEAT_KVM_HINTS] & KVM_HINTS_DEDICATED) {
> +int disable_exits = kvm_check_extension(cs->kvm_state, KVM_CAP_X86_DISABLE_EXITS);
> +
> +if (disable_exits) {
> +disable_exits &= (KVM_X86_DISABLE_EXITS_MWAIT |
> +  KVM_X86_DISABLE_EXITS_HLT |
> +  KVM_X86_DISABLE_EXITS_PAUSE);
> +if (env->user_features[FEAT_KVM] & KVM_PV_UNHALT) {
> +disable_exits &= ~KVM_X86_DISABLE_EXITS_HLT;
> +}

In the future, if we decide to enable kvm-pv-unhalt by default,
should "-cpu ...,kvm-hint-dedicated=on" disable kvm-pv-unhalt
automatically, or should we require an explicit
"kvm-hint-dedicated=on,kvm-pv-unhalt=off" option?

For today's defaults, this patch solves the problem. Only one
thing is missing before I give my R-b: we need to clearly
document exactly what the consequences and requirements of
setting kvm-hint-dedicated=on are (I'm not sure whether the best
place for this is qemu-options.hx, x86_cpu_list(), or somewhere else).

> +}
> +if (kvm_vm_enable_cap(cs->kvm_state, KVM_CAP_X86_DISABLE_EXITS, 0, disable_exits)) {
> +error_report("kvm: DISABLE EXITS not supported");
> +}
> +}
> +
>  qemu_add_vm_change_state_handler(cpu_update_state, env);
>  
>  c = cpuid_find_entry(&cpuid_data.cpuid, 1, 0);
> -- 
> 2.7.4
> 

-- 
Eduardo
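
To make the behaviour under discussion concrete, here is a minimal sketch of the mask selection as it appears in the quoted hunk. It is an illustration only: `select_disable_exits` is a hypothetical helper, `supported` stands in for the value returned by the capability check, and `pv_unhalt` stands in for the user having requested kvm-pv-unhalt. The three bit values are copied from the quoted linux-headers change.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values copied from the quoted linux-headers hunk. */
#define KVM_X86_DISABLE_EXITS_MWAIT  (1 << 0)
#define KVM_X86_DISABLE_EXITS_HLT    (1 << 1)
#define KVM_X86_DISABLE_EXITS_PAUSE  (1 << 2)

/*
 * Hypothetical helper: choose which exits to disable.
 * 'supported' models the bitmask reported for the capability,
 * 'pv_unhalt' models an explicit kvm-pv-unhalt request.
 */
static int select_disable_exits(int supported, bool pv_unhalt)
{
    int mask;

    assert(supported >= 0);
    /* Only ask for the exits this sketch knows about. */
    mask = supported & (KVM_X86_DISABLE_EXITS_MWAIT |
                        KVM_X86_DISABLE_EXITS_HLT |
                        KVM_X86_DISABLE_EXITS_PAUSE);
    /* Keep intercepting HLT when the guest relies on PV unhalt. */
    if (pv_unhalt) {
        mask &= ~KVM_X86_DISABLE_EXITS_HLT;
    }
    return mask;
}
```

With all three bits supported, the helper returns 0x7 without pv-unhalt and 0x5 with it, which is the interaction the question about future defaults is about.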

[Qemu-devel] [PATCH v4 13/21] target/arm: Add ARM_FEATURE_V7VE for v7 Virtualization Extensions

2018-04-17 Thread Aaron Lindsay

Signed-off-by: Aaron Lindsay 
Reviewed-by: Philippe Mathieu-Daudé 
---
 target/arm/cpu.c | 3 +++
 target/arm/cpu.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 2228e4c..9d27ffc 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -774,6 +774,7 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
 /* Some features automatically imply others: */
 if (arm_feature(env, ARM_FEATURE_V8)) {
 set_feature(env, ARM_FEATURE_V7);
+set_feature(env, ARM_FEATURE_V7VE);
 set_feature(env, ARM_FEATURE_ARM_DIV);
 set_feature(env, ARM_FEATURE_LPAE);
 }
@@ -1490,6 +1491,7 @@ static void cortex_a7_initfn(Object *obj)
 
 cpu->dtb_compatible = "arm,cortex-a7";
 set_feature(&cpu->env, ARM_FEATURE_V7);
+set_feature(&cpu->env, ARM_FEATURE_V7VE);
 set_feature(&cpu->env, ARM_FEATURE_VFP4);
 set_feature(&cpu->env, ARM_FEATURE_NEON);
 set_feature(&cpu->env, ARM_FEATURE_THUMB2EE);
@@ -1535,6 +1537,7 @@ static void cortex_a15_initfn(Object *obj)
 
 cpu->dtb_compatible = "arm,cortex-a15";
 set_feature(&cpu->env, ARM_FEATURE_V7);
+set_feature(&cpu->env, ARM_FEATURE_V7VE);
 set_feature(&cpu->env, ARM_FEATURE_VFP4);
 set_feature(&cpu->env, ARM_FEATURE_NEON);
 set_feature(&cpu->env, ARM_FEATURE_THUMB2EE);
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 9f769ae..132e08d 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1445,6 +1445,7 @@ enum arm_features {
 ARM_FEATURE_OMAPCP, /* OMAP specific CP15 ops handling.  */
 ARM_FEATURE_THUMB2EE,
 ARM_FEATURE_V7MP,/* v7 Multiprocessing Extensions */
+ARM_FEATURE_V7VE,/* v7 with Virtualization Extensions */
 ARM_FEATURE_V4T,
 ARM_FEATURE_V5,
 ARM_FEATURE_STRONGARM,
-- 
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

[Qemu-devel] [PATCH v4 18/21] target/arm: PMU: Set PMCR.N to 4

2018-04-17 Thread Aaron Lindsay

This both advertises that we support four counters and adds them to the
implementation, because the pmu_num_counters() function reads this value
from the PMCR.

Signed-off-by: Aaron Lindsay 
---
 target/arm/helper.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index b91f022..7970129 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -1587,7 +1587,7 @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
   .access = PL1_W, .type = ARM_CP_NOP },
 /* Performance monitors are implementation defined in v7,
  * but with an ARM recommended set of registers, which we
- * follow (although we don't actually implement any counters)
+ * follow.
  *
  * Performance registers fall into three categories:
  *  (a) always UNDEF in PL0, RW in PL1 (PMINTENSET, PMINTENCLR)
@@ -5205,7 +5205,8 @@ void register_cp_regs_for_features(ARMCPU *cpu)
 .access = PL0_RW, .accessfn = pmreg_access,
 .type = ARM_CP_IO,
 .fieldoffset = offsetof(CPUARMState, cp15.c9_pmcr),
-.resetvalue = cpu->midr & 0xff000000,
+/* 4 counters enabled */
+.resetvalue = (cpu->midr & 0xff000000) | (0x4 << PMCRN_SHIFT),
 .writefn = pmcr_write, .raw_writefn = raw_write,
 };
 define_one_arm_cp_reg(cpu, &pmcr);
-- 
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
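
As an illustration of how the reset value and the counter count relate, the sketch below composes a PMCR value and reads N back out. It assumes the architectural layout in which the implementer code occupies PMCR[31:24] and N occupies PMCR[15:11]; `pmcr_reset_value` and `pmcr_num_counters` are hypothetical names, not functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Field position of PMCR.N, as defined in target/arm/helper.c. */
#define PMCRN_MASK  0xf800
#define PMCRN_SHIFT 11

/* Hypothetical helper: PMCR reset value from MIDR and a counter count. */
static uint32_t pmcr_reset_value(uint32_t midr, uint32_t num_counters)
{
    /* N is a 5-bit field; 31 is the largest architectural value. */
    assert(num_counters <= 31);
    return (midr & 0xff000000u) | (num_counters << PMCRN_SHIFT);
}

/* Hypothetical helper: extract N the same way pmu_num_counters() does. */
static uint32_t pmcr_num_counters(uint32_t pmcr)
{
    return (pmcr & PMCRN_MASK) >> PMCRN_SHIFT;
}
```

For an example MIDR of 0x410fd034 and four counters, the composed value is 0x41002000, and extracting N from it gives 4 again.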

Re: [Qemu-devel] [PATCH v2 1/5] cpus: correct coding style in qmp_memsave/qmp_pmemsave

2018-04-17 Thread Eric Blake

On 04/12/2018 07:50 AM, Simon Ruderich wrote:
> Signed-off-by: Simon Ruderich 
> ---
>  cpus.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)

Reviewed-by: Eric Blake 

However, as a meta-comment, this message was sent with:

> Message-Id: 
> <68c390f22ae2afc6539cd7b127063e3d9534d50b.1523537181.git.si...@ruderich.org>
> In-Reply-To: <20180412124834.ga2...@ruderich.org>
> References: <20180412124834.ga2...@ruderich.org>

Looking on patchew, I see:
http://patchew.org/QEMU/20180412124834.ga2...@ruderich.org/
The requested URL /QEMU/20180412124834.ga2...@ruderich.org/ was not
found on this server.

For most messages, looking up the In-Reply-To: message-id gives the
cover letter (for example,
http://patchew.org/QEMU/20180412115838.10208-1-alex.ben...@linaro.org/)

Similarly, looking on the list archives, I don't see the 0/5 cover
letter, but rather that your messages are treated as threaded to your v1
thread:

https://lists.gnu.org/archive/html/qemu-devel/2018-04/threads.html#01388

For the sake of tooling, it's best to send a v2 series with a new cover
letter as a new top-level thread.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [Qemu-devel] [PATCH 0/4] move kvm_para.h to standard-headers

2018-04-17 Thread no-reply

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 1523991487-241006-1-git-send-email-...@redhat.com
Subject: [Qemu-devel] [PATCH 0/4] move kvm_para.h to standard-headers

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]         patchew/1523991487-241006-1-git-send-email-...@redhat.com -> patchew/1523991487-241006-1-git-send-email-...@redhat.com
 t [tag update]      patchew/20180417133602.23832-1-marcandre.lur...@redhat.com -> patchew/20180417133602.23832-1-marcandre.lur...@redhat.com
 t [tag update]      patchew/cover.1523968389.git.be...@igalia.com -> patchew/cover.1523968389.git.be...@igalia.com
Switched to a new branch 'test'
6c8133c47c linux-headers: drop kvm_para.h
ce5d4ce607 x86/cpu: use standard-headers/asm-x86.kvm_para.h
124799b55a include/standard-headers: add asm-x86/kvm_para.h
cd93f5f702 update-linux-headers.sh: drop kvm_para.h hacks

=== OUTPUT BEGIN ===
Checking PATCH 1/4: update-linux-headers.sh: drop kvm_para.h hacks...
ERROR: line over 90 characters
#48: FILE: scripts/update-linux-headers.sh:124:
+cp_portable "$tmpdir/include/asm/kvm_para.h" "$output/include/standard-headers/asm-$arch"

total: 1 errors, 0 warnings, 44 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 2/4: include/standard-headers: add asm-x86/kvm_para.h...
Checking PATCH 3/4: x86/cpu: use standard-headers/asm-x86.kvm_para.h...
Checking PATCH 4/4: linux-headers: drop kvm_para.h...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[Qemu-devel] [PATCH v4 10/21] target/arm: Filter cycle counter based on PMCCFILTR_EL0

2018-04-17 Thread Aaron Lindsay

The pmu_counter_enabled and pmu_op_start/finish functions are generic
(as opposed to PMCCNTR-specific) to allow for the implementation of
other events.

Signed-off-by: Aaron Lindsay 
---
 target/arm/cpu.c|   3 ++
 target/arm/cpu.h|  22 +-
 target/arm/helper.c | 114 +++-
 3 files changed, 129 insertions(+), 10 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index d175c5e..2228e4c 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -896,6 +896,9 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
 if (!cpu->has_pmu) {
 unset_feature(env, ARM_FEATURE_PMU);
 cpu->id_aa64dfr0 &= ~0xf00;
+} else if (!kvm_enabled()) {
+arm_register_pre_el_change_hook(cpu, &pmu_pre_el_change, 0);
+arm_register_el_change_hook(cpu, &pmu_post_el_change, 0);
 }
 
 if (!arm_feature(env, ARM_FEATURE_EL2)) {
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 4f0d914..a56e9a0 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -917,6 +917,24 @@ int cpu_arm_signal_handler(int host_signum, void *pinfo,
 void pmccntr_op_start(CPUARMState *env);
 void pmccntr_op_finish(CPUARMState *env);
 
+/**
+ * pmu_op_start/finish
+ * @env: CPUARMState
+ *
+ * Convert all PMU counters between their delta form (the typical mode when
+ * they are enabled) and the guest-visible values. These two calls must
+ * surround any action which might affect the counters. Neither call returns
+ * a value; the delta state is kept in CPUARMState.
+ */
+void pmu_op_start(CPUARMState *env);
+void pmu_op_finish(CPUARMState *env);
+
+/**
+ * Functions to register as EL change hooks for PMU mode filtering
+ */
+void pmu_pre_el_change(ARMCPU *cpu, void *ignored);
+void pmu_post_el_change(ARMCPU *cpu, void *ignored);
+
 /* SCTLR bit meanings. Several bits have been reused in newer
  * versions of the architecture; in that case we define constants
  * for both old and new bit meanings. Code which tests against those
@@ -978,7 +996,8 @@ void pmccntr_op_finish(CPUARMState *env);
 
 #define MDCR_EPMAD (1U << 21)
 #define MDCR_EDAD (1U << 20)
-#define MDCR_SPME (1U << 17)
+#define MDCR_SPME (1U << 17)  /* MDCR_EL3 */
+#define MDCR_HPMD (1U << 17)  /* MDCR_EL2 */
 #define MDCR_SDD  (1U << 16)
 #define MDCR_SPD  (3U << 14)
 #define MDCR_TDRA (1U << 11)
@@ -988,6 +1007,7 @@ void pmccntr_op_finish(CPUARMState *env);
 #define MDCR_HPME (1U << 7)
 #define MDCR_TPM  (1U << 6)
 #define MDCR_TPMCR (1U << 5)
+#define MDCR_HPMN (0x1fU)
 
 /* Not all of the MDCR_EL3 bits are present in the 32-bit SDCR */
 #define SDCR_VALID_MASK (MDCR_EPMAD | MDCR_EDAD | MDCR_SPME | MDCR_SPD)
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 8158d33..5953980 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -904,10 +904,20 @@ static const ARMCPRegInfo v6_cp_reginfo[] = {
 /* Definitions for the PMU registers */
 #define PMCRN_MASK  0xf800
 #define PMCRN_SHIFT 11
+#define PMCRDP  0x20
 #define PMCRD   0x8
 #define PMCRC   0x4
 #define PMCRE   0x1
 
+#define PMXEVTYPER_P          0x80000000
+#define PMXEVTYPER_U          0x40000000
+#define PMXEVTYPER_NSK        0x20000000
+#define PMXEVTYPER_NSU        0x10000000
+#define PMXEVTYPER_NSH        0x08000000
+#define PMXEVTYPER_M          0x04000000
+#define PMXEVTYPER_MT         0x02000000
+#define PMXEVTYPER_EVTCOUNT   0x000003ff
+
 static inline uint32_t pmu_num_counters(CPUARMState *env)
 {
   return (env->cp15.c9_pmcr & PMCRN_MASK) >> PMCRN_SHIFT;
@@ -1003,16 +1013,66 @@ static CPAccessResult pmreg_access_ccntr(CPUARMState *env,
 return pmreg_access(env, ri, isread);
 }
 
-static inline bool arm_ccnt_enabled(CPUARMState *env)
+/* Returns true if the counter (pass 31 for PMCCNTR) should count events using
+ * the current EL, security state, and register configuration.
+ */
+static inline bool pmu_counter_enabled(CPUARMState *env, uint8_t counter)
 {
-/* This does not support checking PMCCFILTR_EL0 register */
+uint64_t filter;
+bool e, p, u, nsk, nsu, nsh, m;
+bool enabled, prohibited, filtered;
+bool secure = arm_is_secure(env);
+int el = arm_current_el(env);
+uint8_t hpmn = env->cp15.mdcr_el2 & MDCR_HPMN;
 
-if (!(env->cp15.c9_pmcr & PMCRE) || !(env->cp15.c9_pmcnten & (1 << 31))) {
-return false;
+if (!arm_feature(env, ARM_FEATURE_EL2) ||
+(counter < hpmn || counter == 31)) {
+e = env->cp15.c9_pmcr & PMCRE;
+} else {
+e = env->cp15.mdcr_el2 & MDCR_HPME;
+}
+enabled = e && (env->cp15.c9_pmcnten & (1 << counter));
+
+if (!secure) {
+if (el == 2 && (counter < hpmn || counter == 31)) {
+prohibited = env->cp15.mdcr_el2 & MDCR_HPMD;
+} else {
+prohibited = false;
+}
+} else {
+prohibited = arm_feature(env, ARM_FEATURE_EL3) &&
+

[Qemu-devel] [PATCH v4 14/21] target/arm: Implement PMOVSSET

2018-04-17 Thread Aaron Lindsay

Adding an array for v7VE+ CP registers was necessary so that PMOVSSET
wasn't defined for all v7 processors.

Signed-off-by: Aaron Lindsay 
---
 target/arm/helper.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 20b42b4..572709e 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -1262,9 +1262,17 @@ static void pmcntenclr_write(CPUARMState *env, const ARMCPRegInfo *ri,
 static void pmovsr_write(CPUARMState *env, const ARMCPRegInfo *ri,
  uint64_t value)
 {
+value &= PMU_COUNTER_MASK(env);
 env->cp15.c9_pmovsr &= ~value;
 }
 
+static void pmovsset_write(CPUARMState *env, const ARMCPRegInfo *ri,
+ uint64_t value)
+{
+value &= PMU_COUNTER_MASK(env);
+env->cp15.c9_pmovsr |= value;
+}
+
 static void pmxevtyper_write(CPUARMState *env, const ARMCPRegInfo *ri,
  uint64_t value)
 {
@@ -1614,6 +1622,25 @@ static const ARMCPRegInfo v7mp_cp_reginfo[] = {
 REGINFO_SENTINEL
 };
 
+static const ARMCPRegInfo v7ve_cp_reginfo[] = {
+/* Performance monitor registers which are not implemented in v7 before
+ * v7ve:
+ */
+{ .name = "PMOVSSET", .cp = 15, .opc1 = 0, .crn = 9, .crm = 14, .opc2 = 3,
+  .access = PL0_RW, .accessfn = pmreg_access,
+  .fieldoffset = offsetoflow32(CPUARMState, cp15.c9_pmovsr),
+  .writefn = pmovsset_write,
+  .raw_writefn = raw_write },
+{ .name = "PMOVSSET_EL0", .state = ARM_CP_STATE_AA64,
+  .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 14, .opc2 = 3,
+  .access = PL0_RW, .accessfn = pmreg_access,
+  .type = ARM_CP_ALIAS,
+  .fieldoffset = offsetof(CPUARMState, cp15.c9_pmovsr),
+  .writefn = pmovsset_write,
+  .raw_writefn = raw_write },
+REGINFO_SENTINEL
+};
+
 static void teecr_write(CPUARMState *env, const ARMCPRegInfo *ri,
 uint64_t value)
 {
@@ -4965,6 +4992,9 @@ void register_cp_regs_for_features(ARMCPU *cpu)
 !arm_feature(env, ARM_FEATURE_PMSA)) {
 define_arm_cp_regs(cpu, v7mp_cp_reginfo);
 }
+if (arm_feature(env, ARM_FEATURE_V7VE)) {
+define_arm_cp_regs(cpu, v7ve_cp_reginfo);
+}
 if (arm_feature(env, ARM_FEATURE_V7)) {
 /* v7 performance monitor control register: same implementor
  * field as main ID register, and we implement only the cycle
-- 
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
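
The set and clear semantics of the two overflow-status write functions can be shown with a small standalone model. This is a sketch under assumptions: PMU_COUNTER_MASK() is not visible in this message, so `counter_mask` below is a hypothetical stand-in that keeps bit 31 (the cycle counter) plus one low bit per implemented event counter. `ovs_set` and `ovs_clear` are likewise illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical mask of implemented counters: bit 31 for the cycle
 * counter, bits [n-1:0] for the event counters.
 */
static uint32_t counter_mask(unsigned int num_counters)
{
    assert(num_counters <= 31);
    return (1u << 31) | ((1u << num_counters) - 1);
}

/* PMOVSSET-style write: set the overflow bits named in 'value'. */
static uint32_t ovs_set(uint32_t pmovsr, uint32_t value, unsigned int n)
{
    return pmovsr | (value & counter_mask(n));
}

/* PMOVSR-style write: clear the overflow bits named in 'value'. */
static uint32_t ovs_clear(uint32_t pmovsr, uint32_t value, unsigned int n)
{
    return pmovsr & ~(value & counter_mask(n));
}
```

Masking the written value first means that bits for counters which are not implemented can never become set, whichever of the two registers is written.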

[Qemu-devel] [PATCH v4 11/21] target/arm: Allow AArch32 access for PMCCFILTR

2018-04-17 Thread Aaron Lindsay

Signed-off-by: Aaron Lindsay 
---
 target/arm/helper.c | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 5953980..62cace7 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -918,6 +918,10 @@ static const ARMCPRegInfo v6_cp_reginfo[] = {
 #define PMXEVTYPER_MT         0x02000000
 #define PMXEVTYPER_EVTCOUNT   0x000003ff
 
+#define PMCCFILTR             0xf8000000
+#define PMCCFILTR_M   PMXEVTYPER_M
+#define PMCCFILTR_EL0 (PMCCFILTR | PMCCFILTR_M)
+
 static inline uint32_t pmu_num_counters(CPUARMState *env)
 {
   return (env->cp15.c9_pmcr & PMCRN_MASK) >> PMCRN_SHIFT;
@@ -1221,10 +1225,26 @@ static void pmccfiltr_write(CPUARMState *env, const ARMCPRegInfo *ri,
 uint64_t value)
 {
 pmccntr_op_start(env);
-env->cp15.pmccfiltr_el0 = value & 0xfc000000;
+env->cp15.pmccfiltr_el0 = value & PMCCFILTR_EL0;
 pmccntr_op_finish(env);
 }
 
+static void pmccfiltr_write_a32(CPUARMState *env, const ARMCPRegInfo *ri,
+uint64_t value)
+{
+uint64_t saved_cycles = pmccntr_op_start(env);
+/* M is not accessible from AArch32 */
+env->cp15.pmccfiltr_el0 = (env->cp15.pmccfiltr_el0 & PMCCFILTR_M) |
+(value & PMCCFILTR);
+pmccntr_op_finish(env, saved_cycles);
+}
+
+static uint64_t pmccfiltr_read_a32(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+/* M is not visible in AArch32 */
+return env->cp15.pmccfiltr_el0 & PMCCFILTR;
+}
+
 static void pmcntenset_write(CPUARMState *env, const ARMCPRegInfo *ri,
 uint64_t value)
 {
@@ -1442,6 +1462,11 @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
   .type = ARM_CP_IO,
   .readfn = pmccntr_read, .writefn = pmccntr_write, },
 #endif
+{ .name = "PMCCFILTR", .cp = 15, .opc1 = 0, .crn = 14, .crm = 15, .opc2 = 7,
+  .writefn = pmccfiltr_write_a32, .readfn = pmccfiltr_read_a32,
+  .access = PL0_RW, .accessfn = pmreg_access,
+  .type = ARM_CP_ALIAS | ARM_CP_IO,
+  .resetvalue = 0, },
 { .name = "PMCCFILTR_EL0", .state = ARM_CP_STATE_AA64,
   .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 15, .opc2 = 7,
   .writefn = pmccfiltr_write,
-- 
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
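
The way the AArch32 accessors hide the M bit can be modelled without any QEMU state. The sketch below assumes the architectural bit positions (filter bits in [31:27], M in bit 26); `filtr_write_a32` and `filtr_read_a32` are hypothetical pure-function versions of the accessors in the patch, taking the stored register value as an argument.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed architectural layout: P/U/NSK/NSU/NSH in [31:27], M in bit 26. */
#define PMCCFILTR    0xf8000000u
#define PMCCFILTR_M  0x04000000u

/* AArch32 write: update the filter bits, preserve the stored M bit. */
static uint64_t filtr_write_a32(uint64_t current, uint64_t value)
{
    /* An AArch32 register write carries at most 32 bits. */
    assert((value >> 32) == 0);
    return (current & PMCCFILTR_M) | (value & PMCCFILTR);
}

/* AArch32 read: M is not visible. */
static uint64_t filtr_read_a32(uint64_t current)
{
    return current & PMCCFILTR;
}
```

A value with M set through the AArch64 register therefore survives an AArch32 write of the other filter bits, and an AArch32 write cannot set M by itself.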

[Qemu-devel] [PATCH v4 19/21] target/arm: Implement PMSWINC

2018-04-17 Thread Aaron Lindsay

Signed-off-by: Aaron Lindsay 
---
 target/arm/helper.c | 39 +--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 7970129..3902719 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -949,6 +949,15 @@ static bool event_always_supported(CPUARMState *env)
 return true;
 }
 
+static uint64_t swinc_get_count(CPUARMState *env)
+{
+/*
+ * SW_INCR events are written directly to the pmevcntr's by writes to
+ * PMSWINC, so there is no underlying count maintained by the PMU itself
+ */
+return 0;
+}
+
 /*
  * Return the underlying cycle count for the PMU cycle counters. If we're in
  * usermode, simply return 0.
@@ -977,6 +986,10 @@ static uint64_t instructions_get_count(CPUARMState *env)
 
 #define SUPPORTED_EVENT_SENTINEL UINT16_MAX
 static const pm_event pm_events[] = {
+{ .number = 0x000, /* SW_INCR */
+  .supported = event_always_supported,
+  .get_count = swinc_get_count
+},
 #ifndef CONFIG_USER_ONLY
 { .number = 0x008, /* INST_RETIRED, Instruction architecturally executed */
   .supported = instructions_supported,
@@ -1287,6 +1300,24 @@ static void pmcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
 pmu_op_finish(env);
 }
 
+static void pmswinc_write(CPUARMState *env, const ARMCPRegInfo *ri,
+  uint64_t value)
+{
+unsigned int i;
+for (i = 0; i < pmu_num_counters(env); i++) {
+/* Increment a counter's count iff: */
+if ((value & (1 << i)) && /* counter's bit is set */
+/* counter is enabled and not filtered */
+pmu_counter_enabled(env, i) &&
+/* counter is SW_INCR */
+(env->cp15.c14_pmevtyper[i] & PMXEVTYPER_EVTCOUNT) == 0x0) {
+pmevcntr_op_start(env, i);
+env->cp15.c14_pmevcntr[i]++;
+pmevcntr_op_finish(env, i);
+}
+}
+}
+
 static uint64_t pmccntr_read(CPUARMState *env, const ARMCPRegInfo *ri)
 {
 uint64_t ret;
@@ -1632,9 +1663,13 @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
   .fieldoffset = offsetof(CPUARMState, cp15.c9_pmovsr),
   .writefn = pmovsr_write,
   .raw_writefn = raw_write },
-/* Unimplemented so WI. */
 { .name = "PMSWINC", .cp = 15, .crn = 9, .crm = 12, .opc1 = 0, .opc2 = 4,
-  .access = PL0_W, .accessfn = pmreg_access_swinc, .type = ARM_CP_NOP },
+  .access = PL0_W, .accessfn = pmreg_access_swinc, .type = ARM_CP_NO_RAW,
+  .writefn = pmswinc_write },
+{ .name = "PMSWINC_EL0", .state = ARM_CP_STATE_AA64,
+  .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 12, .opc2 = 4,
+  .access = PL0_W, .accessfn = pmreg_access_swinc, .type = ARM_CP_NO_RAW,
+  .writefn = pmswinc_write },
 { .name = "PMSELR", .cp = 15, .crn = 9, .crm = 12, .opc1 = 0, .opc2 = 5,
   .access = PL0_RW, .type = ARM_CP_ALIAS,
   .fieldoffset = offsetoflow32(CPUARMState, cp15.c9_pmselr),
-- 
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
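
The three conditions checked in pmswinc_write() can be exercised in a simplified model. This is only a sketch: `struct pmu_model` and `swinc_write` are hypothetical, the counter count is fixed at four, and the enable and filter check is reduced to a single flag per counter instead of calling pmu_counter_enabled().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_COUNTERS 4
#define EVT_SW_INCR  0x000   /* event number used for SW_INCR in the patch */

/* Hypothetical, much-reduced PMU state. */
struct pmu_model {
    uint16_t event[NUM_COUNTERS];    /* programmed event number */
    bool enabled[NUM_COUNTERS];      /* enabled and not filtered */
    uint64_t count[NUM_COUNTERS];    /* guest-visible counter value */
};

/* Increment each selected counter that is enabled and counts SW_INCR. */
static void swinc_write(struct pmu_model *pmu, uint32_t value)
{
    unsigned int i;

    assert(pmu != NULL);
    for (i = 0; i < NUM_COUNTERS; i++) {
        if ((value & (1u << i)) &&
            pmu->enabled[i] &&
            pmu->event[i] == EVT_SW_INCR) {
            pmu->count[i]++;
        }
    }
}
```

Writing 0x7 to a model where counter 1 is programmed with another event and counter 2 is disabled increments only counter 0; counter 3 is untouched because its bit was not written.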

[Qemu-devel] [PATCH v4 08/21] target/arm: Allow EL change hooks to do IO

2018-04-17 Thread Aaron Lindsay

During code generation, surround CPSR writes and exception returns which
call the EL change hooks with gen_io_start/end. The immediate need is
for the PMU to access the clock and icount during EL change to support
mode filtering.

Signed-off-by: Aaron Lindsay 
---
 target/arm/translate-a64.c |  6 ++
 target/arm/translate.c | 12 
 2 files changed, 18 insertions(+)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index c913292..bff4e13 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -1930,7 +1930,13 @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
 unallocated_encoding(s);
 return;
 }
+if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
+gen_io_start();
+}
 gen_helper_exception_return(cpu_env);
+if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
+gen_io_end();
+}
 /* Must exit loop to check un-masked IRQs */
 s->base.is_jmp = DISAS_EXIT;
 return;
diff --git a/target/arm/translate.c b/target/arm/translate.c
index db1ce65..9bc2ce1 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4548,7 +4548,13 @@ static void gen_rfe(DisasContext *s, TCGv_i32 pc, TCGv_i32 cpsr)
  * appropriately depending on the new Thumb bit, so it must
  * be called after storing the new PC.
  */
+if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
+gen_io_start();
+}
 gen_helper_cpsr_write_eret(cpu_env, cpsr);
+if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
+gen_io_end();
+}
 tcg_temp_free_i32(cpsr);
 /* Must exit loop to check un-masked IRQs */
 s->base.is_jmp = DISAS_EXIT;
@@ -9843,7 +9849,13 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
 if (exc_return) {
 /* Restore CPSR from SPSR.  */
 tmp = load_cpu_field(spsr);
+if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
+gen_io_start();
+}
 gen_helper_cpsr_write_eret(cpu_env, tmp);
+if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
+gen_io_end();
+}
 tcg_temp_free_i32(tmp);
 /* Must exit loop to check un-masked IRQs */
 s->base.is_jmp = DISAS_EXIT;
-- 
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

[Qemu-devel] [PATCH v4 07/21] target/arm: Add pre-EL change hooks

2018-04-17 Thread Aaron Lindsay

Because the design of the PMU requires that the counter values be
converted between their delta and guest-visible forms for mode
filtering, an additional hook which occurs before the EL is changed is
necessary.

Signed-off-by: Aaron Lindsay 
---
 target/arm/cpu.c   | 16 
 target/arm/cpu.h   | 22 +++---
 target/arm/helper.c| 14 --
 target/arm/internals.h |  7 +++
 target/arm/op_helper.c |  8 
 5 files changed, 58 insertions(+), 9 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 1f689f6..d175c5e 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -55,6 +55,17 @@ static bool arm_cpu_has_work(CPUState *cs)
  | CPU_INTERRUPT_EXITTB);
 }
 
+void arm_register_pre_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
+ void *opaque)
+{
+ARMELChangeHook *entry = g_new0(ARMELChangeHook, 1);
+
+entry->hook = hook;
+entry->opaque = opaque;
+
+QLIST_INSERT_HEAD(&cpu->pre_el_change_hooks, entry, node);
+}
+
 void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
  void *opaque)
 {
@@ -554,6 +565,7 @@ static void arm_cpu_initfn(Object *obj)
 cpu->cp_regs = g_hash_table_new_full(g_int_hash, g_int_equal,
  g_free, g_free);
 
+QLIST_INIT(&cpu->pre_el_change_hooks);
 QLIST_INIT(&cpu->el_change_hooks);
 
 #ifndef CONFIG_USER_ONLY
@@ -721,6 +733,10 @@ static void arm_cpu_finalizefn(Object *obj)
 
 g_hash_table_destroy(cpu->cp_regs);
 
+QLIST_FOREACH_SAFE(hook, &cpu->pre_el_change_hooks, node, next) {
+QLIST_REMOVE(hook, node);
+g_free(hook);
+}
 QLIST_FOREACH_SAFE(hook, &cpu->el_change_hooks, node, next) {
 QLIST_REMOVE(hook, node);
 g_free(hook);
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 50d129b..4f0d914 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -841,6 +841,7 @@ struct ARMCPU {
  */
 bool cfgend;
 
+QLIST_HEAD(, ARMELChangeHook) pre_el_change_hooks;
 QLIST_HEAD(, ARMELChangeHook) el_change_hooks;
 
 int32_t node_id; /* NUMA node this CPU belongs to */
@@ -2905,14 +2906,29 @@ static inline AddressSpace *arm_addressspace(CPUState *cs, MemTxAttrs attrs)
 #endif
 
 /**
- * arm_register_el_change_hook:
- * Register a hook function which will be called back whenever this
+ * arm_register_pre_el_change_hook:
+ * Register a hook function which will be called immediately before this
  * CPU changes exception level or mode. The hook function will be
  * passed a pointer to the ARMCPU and the opaque data pointer passed
  * to this function when the hook was registered.
+ *
+ * Note that if a pre-change hook is called, any registered post-change hooks
+ * are guaranteed to subsequently be called.
  */
-void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
+void arm_register_pre_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
  void *opaque);
+/**
+ * arm_register_el_change_hook:
+ * Register a hook function which will be called immediately after this
+ * CPU changes exception level or mode. The hook function will be
+ * passed a pointer to the ARMCPU and the opaque data pointer passed
+ * to this function when the hook was registered.
+ *
+ * Note that any hooks registered here are guaranteed to be called
+ * if pre-change hooks have been.
+ */
+void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook, void
+*opaque);
 
 /**
  * aa32_vfp_dreg:
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 8bec07e..de3be11 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -8254,6 +8254,14 @@ void arm_cpu_do_interrupt(CPUState *cs)
 return;
 }
 
+/* Hooks may change global state so BQL should be held, also the
+ * BQL needs to be held for any modification of
+ * cs->interrupt_request.
+ */
+g_assert(qemu_mutex_iothread_locked());
+
+arm_call_pre_el_change_hook(cpu);
+
 assert(!excp_is_internal(cs->exception_index));
 if (arm_el_is_aa64(env, new_el)) {
 arm_cpu_do_interrupt_aarch64(cs);
@@ -8261,12 +8269,6 @@ void arm_cpu_do_interrupt(CPUState *cs)
 arm_cpu_do_interrupt_aarch32(cs);
 }
 
-/* Hooks may change global state so BQL should be held, also the
- * BQL needs to be held for any modification of
- * cs->interrupt_request.
- */
-g_assert(qemu_mutex_iothread_locked());
-
 arm_call_el_change_hook(cpu);
 
 if (!kvm_enabled()) {
diff --git a/target/arm/internals.h b/target/arm/internals.h
index 6358c2a..dc93577 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -728,6 +728,13 @@ void arm_cpu_do_transaction_failed(CPUState *cs, hwaddr physaddr,
MemTxResult response, uintptr_t retaddr);
 
 /* Call any registered EL change hooks */
+static inline void
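
A minimal standalone sketch of the pre/post pairing that the doc comments in this patch describe: pre-change hooks observe the old exception level, post-change hooks observe the new one. All names here (DemoCPU, demo_register_pre, demo_change_el, ...) are hypothetical and not QEMU APIs, and fixed-size arrays stand in for the QLISTs used by the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not QEMU types. */
typedef struct DemoCPU DemoCPU;
typedef void DemoHookFn(DemoCPU *cpu, void *opaque);

#define DEMO_MAX_HOOKS 4

struct DemoCPU {
    int el;                             /* current exception level */
    DemoHookFn *pre[DEMO_MAX_HOOKS];    /* called before el changes */
    void *pre_opaque[DEMO_MAX_HOOKS];
    size_t n_pre;
    DemoHookFn *post[DEMO_MAX_HOOKS];   /* called after el changes */
    void *post_opaque[DEMO_MAX_HOOKS];
    size_t n_post;
};

static void demo_register_pre(DemoCPU *cpu, DemoHookFn *fn, void *opaque)
{
    assert(cpu->n_pre < DEMO_MAX_HOOKS);
    cpu->pre[cpu->n_pre] = fn;
    cpu->pre_opaque[cpu->n_pre] = opaque;
    cpu->n_pre++;
}

static void demo_register_post(DemoCPU *cpu, DemoHookFn *fn, void *opaque)
{
    assert(cpu->n_post < DEMO_MAX_HOOKS);
    cpu->post[cpu->n_post] = fn;
    cpu->post_opaque[cpu->n_post] = opaque;
    cpu->n_post++;
}

/* Every pre hook runs, then the state changes, then every post hook runs. */
static void demo_change_el(DemoCPU *cpu, int new_el)
{
    size_t i;

    for (i = 0; i < cpu->n_pre; i++) {
        cpu->pre[i](cpu, cpu->pre_opaque[i]);
    }
    cpu->el = new_el;
    for (i = 0; i < cpu->n_post; i++) {
        cpu->post[i](cpu, cpu->post_opaque[i]);
    }
}

/* Example hook: record the exception level visible at call time. */
static void demo_record_el(DemoCPU *cpu, void *opaque)
{
    *(int *)opaque = cpu->el;
}
```

Registering demo_record_el on both lists and calling demo_change_el(&cpu, 3) on a CPU at level 1 would leave 1 in the pre hook's slot and 3 in the post hook's slot.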

[Qemu-devel] [PATCH v4 15/21] target/arm: Add array for supported PMU events, generate PMCEID[01]

2018-04-17 Thread Aaron Lindsay

This commit doesn't add any supported events, but provides the framework
for adding them. We store the pm_event structs in a simple array, and
provide the mapping from the event numbers to array indexes in the
supported_event_map array. Because the value of PMCEID[01] depends upon
which events are supported at runtime, generate it dynamically.

Signed-off-by: Aaron Lindsay 
---
 target/arm/cpu.c| 20 +---
 target/arm/cpu.h| 10 ++
 target/arm/cpu64.c  |  2 --
 target/arm/helper.c | 37 +
 4 files changed, 60 insertions(+), 9 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 9d27ffc..22063ca 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -897,9 +897,19 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
 if (!cpu->has_pmu) {
 unset_feature(env, ARM_FEATURE_PMU);
 cpu->id_aa64dfr0 &= ~0xf00;
-} else if (!kvm_enabled()) {
-arm_register_pre_el_change_hook(cpu, &pmu_pre_el_change, 0);
-arm_register_el_change_hook(cpu, &pmu_post_el_change, 0);
+}
+if (arm_feature(env, ARM_FEATURE_PMU)) {
+uint64_t pmceid = get_pmceid(&cpu->env);
+cpu->pmceid0 = pmceid & 0x;
+cpu->pmceid1 = (pmceid >> 32) & 0x;
+
+if (!kvm_enabled()) {
+arm_register_pre_el_change_hook(cpu, &pmu_pre_el_change, 0);
+arm_register_el_change_hook(cpu, &pmu_post_el_change, 0);
+}
+} else {
+cpu->pmceid0 = 0x;
+cpu->pmceid1 = 0x;
 }
 
 if (!arm_feature(env, ARM_FEATURE_EL2)) {
@@ -1511,8 +1521,6 @@ static void cortex_a7_initfn(Object *obj)
 cpu->id_pfr0 = 0x1131;
 cpu->id_pfr1 = 0x00011011;
 cpu->id_dfr0 = 0x02010555;
-cpu->pmceid0 = 0x;
-cpu->pmceid1 = 0x;
 cpu->id_afr0 = 0x;
 cpu->id_mmfr0 = 0x10101105;
 cpu->id_mmfr1 = 0x4000;
@@ -1557,8 +1565,6 @@ static void cortex_a15_initfn(Object *obj)
 cpu->id_pfr0 = 0x1131;
 cpu->id_pfr1 = 0x00011011;
 cpu->id_dfr0 = 0x02010555;
-cpu->pmceid0 = 0x000;
-cpu->pmceid1 = 0x;
 cpu->id_afr0 = 0x;
 cpu->id_mmfr0 = 0x10201105;
 cpu->id_mmfr1 = 0x2000;
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 132e08d..f058f5c 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -935,6 +935,16 @@ void pmu_op_finish(CPUARMState *env);
 void pmu_pre_el_change(ARMCPU *cpu, void *ignored);
 void pmu_post_el_change(ARMCPU *cpu, void *ignored);
 
+/*
+ * get_pmceid
+ * @env: CPUARMState
+ *
+ * Return the PMCEID[01] register values corresponding to the counters which
+ * are supported given the current configuration (0 is low 32, 1 is high 32
+ * bits)
+ */
+uint64_t get_pmceid(CPUARMState *env);
+
 /* SCTLR bit meanings. Several bits have been reused in newer
  * versions of the architecture; in that case we define constants
  * for both old and new bit meanings. Code which tests against those
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 991d764..7da0ea4 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -141,8 +141,6 @@ static void aarch64_a57_initfn(Object *obj)
 cpu->id_isar5 = 0x00011121;
 cpu->id_aa64pfr0 = 0x;
 cpu->id_aa64dfr0 = 0x10305106;
-cpu->pmceid0 = 0x;
-cpu->pmceid1 = 0x;
 cpu->id_aa64isar0 = 0x00011120;
 cpu->id_aa64mmfr0 = 0x1124;
 cpu->dbgdidr = 0x3516d000;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 572709e..7a715a6 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -933,6 +933,43 @@ static inline uint64_t pmu_counter_mask(CPUARMState *env)
   return (1 << 31) | ((1 << pmu_num_counters(env)) - 1);
 }
 
+typedef struct pm_event {
+uint16_t number; /* PMEVTYPER.evtCount is 10 bits wide */
+/* If the event is supported on this CPU (used to generate PMCEID[01]) */
+bool (*supported)(CPUARMState *);
+/* Retrieve the current count of the underlying event. The programmed
+ * counters hold a difference from the return value from this function */
+uint64_t (*get_count)(CPUARMState *);
+} pm_event;
+
+#define SUPPORTED_EVENT_SENTINEL UINT16_MAX
+static const pm_event pm_events[] = {
+{ .number = SUPPORTED_EVENT_SENTINEL }
+};
+static uint16_t supported_event_map[0x3f];
+
+/*
+ * Called upon initialization to build PMCEID0 (low 32 bits) and PMCEID1 (high
+ * 32). We also use it to build a map of ARM event numbers to indices in
+ * our pm_events array.
+ */
+uint64_t get_pmceid(CPUARMState *env)
+{
+uint64_t pmceid = 0;
+unsigned int i = 0;
+while (pm_events[i].number != SUPPORTED_EVENT_SENTINEL) {
+const pm_event *cnt = &pm_events[i];
+if (cnt->number < 0x3f && cnt->supported(env)) {
+pmceid |= (1 << cnt->number);
+supported_event_map[cnt->number] = i;
+} else {
+supported_event_map[cnt->number] =
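
A minimal standalone sketch of the approach the commit message describes: walk a sentinel-terminated event table, set one bit per supported event in a 64-bit ID value (low half for PMCEID0, high half for PMCEID1), and record each event's table index in a lookup map. The names (DemoEvent, demo_events, demo_build_id_bitmap) and the sample events are hypothetical. Two choices below are assumptions of this sketch rather than the patch: the map is sized to cover event numbers 0x00 through 0x3f inclusive, and a 64-bit constant is shifted so that event numbers of 32 and above land in the high half.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_EVENT_SENTINEL UINT16_MAX
#define DEMO_MAX_EVENT      0x3f        /* highest event number with an ID bit */
#define DEMO_UNSUPPORTED    UINT16_MAX  /* map entry for "no such event" */

typedef struct {
    uint16_t number;            /* architectural event number */
    bool (*supported)(void);    /* is the event available at runtime? */
} DemoEvent;

static bool demo_yes(void) { return true; }
static bool demo_no(void) { return false; }

/* Hypothetical sample table; the event numbers are only illustrative. */
static const DemoEvent demo_events[] = {
    { .number = 0x08, .supported = demo_yes },
    { .number = 0x10, .supported = demo_no },
    { .number = 0x11, .supported = demo_yes },
    { .number = 0x23, .supported = demo_yes },
    { .number = DEMO_EVENT_SENTINEL, .supported = NULL },
};

/* Maps an event number to its index in demo_events. */
static uint16_t demo_event_map[DEMO_MAX_EVENT + 1];

static uint64_t demo_build_id_bitmap(void)
{
    uint64_t bitmap = 0;
    size_t i;

    for (i = 0; i <= DEMO_MAX_EVENT; i++) {
        demo_event_map[i] = DEMO_UNSUPPORTED;
    }
    for (i = 0; demo_events[i].number != DEMO_EVENT_SENTINEL; i++) {
        const DemoEvent *ev = &demo_events[i];

        if (ev->number <= DEMO_MAX_EVENT && ev->supported()) {
            /* 64-bit shift: numbers >= 32 end up in the high word */
            bitmap |= UINT64_C(1) << ev->number;
            demo_event_map[ev->number] = (uint16_t)i;
        }
    }
    return bitmap;
}
```

The two 32-bit ID registers would then be the low and high halves of the returned value.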

[Qemu-devel] [PATCH v4 06/21] target/arm: Support multiple EL change hooks

2018-04-17 Thread Aaron Lindsay

Signed-off-by: Aaron Lindsay 
---
 target/arm/cpu.c   | 21 -
 target/arm/cpu.h   | 20 ++--
 target/arm/internals.h |  7 ---
 3 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 022d8c5..1f689f6 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -55,13 +55,15 @@ static bool arm_cpu_has_work(CPUState *cs)
  | CPU_INTERRUPT_EXITTB);
 }
 
-void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHook *hook,
+void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
  void *opaque)
 {
-/* We currently only support registering a single hook function */
-assert(!cpu->el_change_hook);
-cpu->el_change_hook = hook;
-cpu->el_change_hook_opaque = opaque;
+ARMELChangeHook *entry = g_new0(ARMELChangeHook, 1);
+
+entry->hook = hook;
+entry->opaque = opaque;
+
+QLIST_INSERT_HEAD(&cpu->el_change_hooks, entry, node);
 }
 
 static void cp_reg_reset(gpointer key, gpointer value, gpointer opaque)
@@ -552,6 +554,8 @@ static void arm_cpu_initfn(Object *obj)
 cpu->cp_regs = g_hash_table_new_full(g_int_hash, g_int_equal,
  g_free, g_free);
 
+QLIST_INIT(&cpu->el_change_hooks);
+
 #ifndef CONFIG_USER_ONLY
 /* Our inbound IRQ and FIQ lines */
 if (kvm_enabled()) {
@@ -713,7 +717,14 @@ static void arm_cpu_post_init(Object *obj)
 static void arm_cpu_finalizefn(Object *obj)
 {
 ARMCPU *cpu = ARM_CPU(obj);
+ARMELChangeHook *hook, *next;
+
 g_hash_table_destroy(cpu->cp_regs);
+
+QLIST_FOREACH_SAFE(hook, &cpu->el_change_hooks, node, next) {
+QLIST_REMOVE(hook, node);
+g_free(hook);
+}
 }
 
 static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index ff349f5..50d129b 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -642,12 +642,17 @@ typedef struct CPUARMState {
 } CPUARMState;
 
 /**
- * ARMELChangeHook:
+ * ARMELChangeHookFn:
  * type of a function which can be registered via arm_register_el_change_hook()
  * to get callbacks when the CPU changes its exception level or mode.
  */
-typedef void ARMELChangeHook(ARMCPU *cpu, void *opaque);
-
+typedef void ARMELChangeHookFn(ARMCPU *cpu, void *opaque);
+typedef struct ARMELChangeHook ARMELChangeHook;
+struct ARMELChangeHook {
+ARMELChangeHookFn *hook;
+void *opaque;
+QLIST_ENTRY(ARMELChangeHook) node;
+};
 
 /* These values map onto the return values for
  * QEMU_PSCI_0_2_FN_AFFINITY_INFO */
@@ -836,8 +841,7 @@ struct ARMCPU {
  */
 bool cfgend;
 
-ARMELChangeHook *el_change_hook;
-void *el_change_hook_opaque;
+QLIST_HEAD(, ARMELChangeHook) el_change_hooks;
 
 int32_t node_id; /* NUMA node this CPU belongs to */
 
@@ -2906,12 +2910,8 @@ static inline AddressSpace *arm_addressspace(CPUState *cs, MemTxAttrs attrs)
  * CPU changes exception level or mode. The hook function will be
  * passed a pointer to the ARMCPU and the opaque data pointer passed
  * to this function when the hook was registered.
- *
- * Note that we currently only support registering a single hook function,
- * and will assert if this function is called twice.
- * This facility is intended for the use of the GICv3 emulation.
  */
-void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHook *hook,
+void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
  void *opaque);
 
 /**
diff --git a/target/arm/internals.h b/target/arm/internals.h
index 8ce944b..6358c2a 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -727,11 +727,12 @@ void arm_cpu_do_transaction_failed(CPUState *cs, hwaddr physaddr,
int mmu_idx, MemTxAttrs attrs,
MemTxResult response, uintptr_t retaddr);
 
-/* Call the EL change hook if one has been registered */
+/* Call any registered EL change hooks */
 static inline void arm_call_el_change_hook(ARMCPU *cpu)
 {
-if (cpu->el_change_hook) {
-cpu->el_change_hook(cpu, cpu->el_change_hook_opaque);
+ARMELChangeHook *hook, *next;
+QLIST_FOREACH_SAFE(hook, &cpu->el_change_hooks, node, next) {
+hook->hook(cpu, hook->opaque);
 }
 }
 
-- 
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
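
A minimal standalone sketch of the change above, which replaces a single hook slot with a list of (function, opaque) entries that are all called on each change and freed at teardown. A hand-rolled singly linked list stands in for QEMU's QLIST macros, and every name (HookList, hooklist_register, ...) is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef void HookFn(void *opaque);

typedef struct HookEntry {
    HookFn *fn;
    void *opaque;
    struct HookEntry *next;
} HookEntry;

typedef struct {
    HookEntry *head;
} HookList;

/* Add a hook at the head of the list; returns 0 on success, -1 on OOM. */
static int hooklist_register(HookList *list, HookFn *fn, void *opaque)
{
    HookEntry *entry = calloc(1, sizeof(*entry));

    if (!entry) {
        return -1;
    }
    entry->fn = fn;
    entry->opaque = opaque;
    entry->next = list->head;
    list->head = entry;
    return 0;
}

/* Call every registered hook with its own opaque pointer. */
static void hooklist_call_all(HookList *list)
{
    HookEntry *entry, *next;

    for (entry = list->head; entry; entry = next) {
        next = entry->next;     /* read the link before invoking the hook */
        entry->fn(entry->opaque);
    }
}

/* Free all entries; "safe" iteration reads the link before freeing. */
static void hooklist_free(HookList *list)
{
    HookEntry *entry, *next;

    for (entry = list->head; entry; entry = next) {
        next = entry->next;
        free(entry);
    }
    list->head = NULL;
}

/* Example hook: bump an int counter passed via opaque. */
static void hook_increment(void *opaque)
{
    ++*(int *)opaque;
}
```

Because each entry carries its own opaque pointer, two independent users can register without knowing about each other, which is the property the single-slot version lacked.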

Re: [Qemu-devel] [PATCH v4 0/9] enable numa configuration before machine_init() from QMP

2018-04-17 Thread Eduardo Habkost

On Tue, Apr 17, 2018 at 05:41:10PM +0200, Igor Mammedov wrote:
> On Tue, 17 Apr 2018 11:27:39 -0300
> Eduardo Habkost  wrote:
> 
> > On Tue, Apr 17, 2018 at 04:13:34PM +0200, Markus Armbruster wrote:
> > > Igor Mammedov  writes:
> > > 
> > > [...]  
> > > > The series allows configuring NUMA mapping at runtime using the QMP
> > > > interface. To do that, it introduces a new '-preconfig' CLI option
> > > > which pauses QEMU before machine_init() is run, and adds a new
> > > > set-numa-node QMP command which, in conjunction with
> > > > query-hotpluggable-cpus, allows configuring NUMA mapping for CPUs.
> > > >
> > > > Later we can modify other commands to run early, for example device_add.
> > > > I recall SPAPR had a problem when libvirt started QEMU with -S and, while
> > > > it was paused, added CPUs with device_add. The intent was to coldplug CPUs
> > > > (but at that stage it's already considered hotplug), so SPAPR had to work
> > > > around the issue.
> > > 
> > > That instance is just stupidity / laziness, I think: we consider any
> > > plug after machine creation a hot plug.  Real machines remain cold until
> > > you press the power button.  Our virtual machines should remain cold
> > > until they start running, i.e. with -S until the first "cont".
> It would probably be too risky to change the semantics of -S from hotplug to
> coldplug. But even if it were easy, it wouldn't matter if dynamic
> configuration is done properly. More on that below.
> 
> > > I vaguely remember me asking this before, but your answer didn't make it
> > > into this cover letter, which gives me a pretext to ask again instead of
> > > looking it up in the archives: what exactly prevents us from keeping the
> > > machine cold enough for numa configuration until the first "cont"?  
> > 
> > I also think this would be better, but it seems to be difficult
> > in practice, see:
> > http://mid.mail-archive.com/20180323210532.GD28161@localhost.localdomain
> 
> In addition to Eduardo's reply, here is what I answered when you asked
> the question the first time (v2, late reconfig at the -S pause point):
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg504140.html
> 
> In short:
> I think it's wrong in general to do fixups after the machine is built
> instead of getting the configuration right before building the machine.
> That's going to be complex and fragile, and might be hard to do at
> all depending on what we are fixing up.

What "building the machine" should mean, exactly, for external
users?

The main question I'd like to see answered is: why exactly must we
"build" the machine before the first "cont" is issued when
using -S?  Why can't we delay everything to "cont" when using -S?

Is it just because it's a long and complex task?  Does that mean
we might still do that eventually, and eliminate the
prelaunch/preconfig distinction in the distant future?

Even if we follow your approach, we need to answer these
questions.  I'm sure we will try to reorder initialization steps
between the preconfig/prelaunch states in the future, and we
shouldn't break any expectations from external users when doing
that.

> 
> BTW this is an outdated version of the series and there is a newer one, v5:
> https://patchwork.ozlabs.org/cover/895315/
> so please review it.
> 
> Short diff vs 1:
>  - only a limited (minimum) set of commands is available at the preconfig
>    stage for now
>  - use the QAPI schema to mark commands as preconfig-enabled,
>    so mgmt can see when it can use commands.
>  - added a preconfig runstate state machine instead of adding more global
>    variables, to cleanly keep track of where QEMU is paused and what it's
>    allowed to do

-- 
Eduardo

[Qemu-devel] [PATCH v4 09/21] target/arm: Fix bitmask for PMCCFILTR writes

2018-04-17 Thread Aaron Lindsay

The write mask was shifted left by one bit too few.

Signed-off-by: Aaron Lindsay 
Reviewed-by: Peter Maydell 
---
 target/arm/helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index de3be11..8158d33 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -1125,7 +1125,7 @@ static void pmccfiltr_write(CPUARMState *env, const ARMCPRegInfo *ri,
 uint64_t value)
 {
 pmccntr_op_start(env);
-env->cp15.pmccfiltr_el0 = value & 0x7E00;
+env->cp15.pmccfiltr_el0 = value & 0xfc00;
 pmccntr_op_finish(env);
 }
 
-- 
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
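
A small standalone sketch of the off-by-one-bit mask error this patch describes. The two constants appear truncated in this archive copy, so the sketch does not reproduce them; instead it assumes, for illustration, that the writable filter bits occupy bits [31:26] of a 32-bit value and that the old mask covered [30:25]. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of the writable filter bits (illustrative). */
#define DEMO_FILTER_TOP_BIT    31
#define DEMO_FILTER_BOTTOM_BIT 26

/* Build a mask with bits hi..lo (inclusive) set. */
static uint32_t demo_bit_range_mask(unsigned hi, unsigned lo)
{
    assert(hi < 32 && lo <= hi);
    return (uint32_t)((UINT64_C(1) << (hi - lo + 1)) - 1) << lo;
}

/* Keep only the writable filter bits of a guest write. */
static uint32_t demo_filter_write(uint32_t value)
{
    return value & demo_bit_range_mask(DEMO_FILTER_TOP_BIT,
                                       DEMO_FILTER_BOTTOM_BIT);
}
```

Under these assumptions, a mask covering [30:25] drops the top filter bit and lets through bit 25, which lies outside the assumed field; shifting that mask left by one gives the [31:26] mask.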

[Qemu-devel] [PATCH v4 03/21] target/arm: Reorganize PMCCNTR accesses

2018-04-17 Thread Aaron Lindsay

pmccntr_read and pmccntr_write contained duplicate code that was already
being handled by pmccntr_sync. Consolidate the duplicated code into two
functions: pmccntr_op_start and pmccntr_op_finish. Add a companion to
c15_ccnt in CPUARMState so that we can simultaneously save both the
architectural register value and the last underlying cycle count - this
ensures time isn't lost and will also allow us to access the 'old'
architectural register value in order to detect overflows in later
patches.

Signed-off-by: Aaron Lindsay 
---
 target/arm/cpu.h|  28 ++-
 target/arm/helper.c | 100 
 2 files changed, 73 insertions(+), 55 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 19a0c03..04041db 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -454,10 +454,20 @@ typedef struct CPUARMState {
 uint64_t oslsr_el1; /* OS Lock Status */
 uint64_t mdcr_el2;
 uint64_t mdcr_el3;
-/* If the counter is enabled, this stores the last time the counter
- * was reset. Otherwise it stores the counter value
+/* Stores the architectural value of the counter *the last time it was
+ * updated* by pmccntr_op_start. Accesses should always be surrounded
+ * by pmccntr_op_start/pmccntr_op_finish to guarantee the latest
+ * architecturally-correct value is being read/set.
  */
 uint64_t c15_ccnt;
+/* Stores the delta between the architectural value and the underlying
+ * cycle count during normal operation. It is used to update c15_ccnt
+ * to be the correct architectural value before accesses. During
+ * accesses, c15_ccnt_delta contains the underlying count being used
+ * for the access, after which it reverts to the delta value in
+ * pmccntr_op_finish.
+ */
+uint64_t c15_ccnt_delta;
 uint64_t pmccfiltr_el0; /* Performance Monitor Filter Register */
 uint64_t vpidr_el2; /* Virtualization Processor ID Register */
 uint64_t vmpidr_el2; /* Virtualization Multiprocessor ID Register */
@@ -890,15 +900,17 @@ int cpu_arm_signal_handler(int host_signum, void *pinfo,
void *puc);
 
 /**
- * pmccntr_sync
+ * pmccntr_op_start/finish
  * @env: CPUARMState
  *
- * Synchronises the counter in the PMCCNTR. This must always be called twice,
- * once before any action that might affect the timer and again afterwards.
- * The function is used to swap the state of the register if required.
- * This only happens when not in user mode (!CONFIG_USER_ONLY)
+ * Convert the counter in the PMCCNTR between its delta form (the typical mode
+ * when it's enabled) and the guest-visible value. These two calls must always
+ * surround any action which might affect the counter, and the return value
+ * from pmccntr_op_start must be supplied as the second argument to
+ * pmccntr_op_finish.
  */
-void pmccntr_sync(CPUARMState *env);
+void pmccntr_op_start(CPUARMState *env);
+void pmccntr_op_finish(CPUARMState *env);
 
 /* SCTLR bit meanings. Several bits have been reused in newer
  * versions of the architecture; in that case we define constants
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 83ea8f4..f6269a2 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -1000,28 +1000,53 @@ static inline bool arm_ccnt_enabled(CPUARMState *env)
 
 return true;
 }
-
-void pmccntr_sync(CPUARMState *env)
+/*
+ * Ensure c15_ccnt is the guest-visible count so that operations such as
+ * enabling/disabling the counter or filtering, modifying the count itself,
+ * etc. can be done logically. This is essentially a no-op if the counter is
+ * not enabled at the time of the call.
+ */
+void pmccntr_op_start(CPUARMState *env)
 {
-uint64_t temp_ticks;
-
-temp_ticks = muldiv64(qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
+uint64_t cycles = 0;
+cycles = muldiv64(qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
   ARM_CPU_FREQ, NANOSECONDS_PER_SECOND);
 
-if (env->cp15.c9_pmcr & PMCRD) {
-/* Increment once every 64 processor clock cycles */
-temp_ticks /= 64;
-}
-
 if (arm_ccnt_enabled(env)) {
-env->cp15.c15_ccnt = temp_ticks - env->cp15.c15_ccnt;
+uint64_t eff_cycles = cycles;
+if (env->cp15.c9_pmcr & PMCRD) {
+/* Increment once every 64 processor clock cycles */
+eff_cycles /= 64;
+}
+
+env->cp15.c15_ccnt = eff_cycles - env->cp15.c15_ccnt_delta;
+}
+env->cp15.c15_ccnt_delta = cycles;
+}
+
+/*
+ * If PMCCNTR is enabled, recalculate the delta between the clock and the
+ * guest-visible count. A call to pmccntr_op_finish should follow every call to
+ * pmccntr_op_start.
+ */
+void pmccntr_op_finish(CPUARMState *env)
+{
+if (arm_ccnt_enabled(env)) {
+uint64_t prev_cycles = env->cp15.c15_ccnt_delta;
+
+if
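
A minimal standalone sketch of the start/finish bracketing described in this patch: while the counter runs, only the difference between a free-running clock and the guest-visible count is stored; op_start converts that to the guest-visible value and op_finish converts it back. The names (DemoCounter, demo_op_start, ...) are hypothetical, the clock is passed in explicitly, and the divide-by-64 mode is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool enabled;
    uint64_t value;   /* guest-visible count, valid between start and finish */
    uint64_t delta;   /* clock minus count while running; the raw clock
                         reading between start and finish */
} DemoCounter;

/* Make c->value the guest-visible count as of time `now`. */
static void demo_op_start(DemoCounter *c, uint64_t now)
{
    if (c->enabled) {
        c->value = now - c->delta;
    }
    c->delta = now;             /* remember the reading for op_finish */
}

/* Convert back to delta form so the counter keeps ticking with the clock. */
static void demo_op_finish(DemoCounter *c)
{
    if (c->enabled) {
        c->delta = c->delta - c->value;   /* unsigned wraparound is fine */
    }
}

static uint64_t demo_read(DemoCounter *c, uint64_t now)
{
    uint64_t v;

    demo_op_start(c, now);
    v = c->value;
    demo_op_finish(c);
    return v;
}

static void demo_write(DemoCounter *c, uint64_t now, uint64_t v)
{
    demo_op_start(c, now);
    c->value = v;
    demo_op_finish(c);
}

static void demo_set_enabled(DemoCounter *c, uint64_t now, bool enabled)
{
    demo_op_start(c, now);
    c->enabled = enabled;       /* takes effect from `now` onwards */
    demo_op_finish(c);
}
```

Any state change that affects counting (enable, disable, write) goes between the two calls, so the count is exact at the moment of the change and no clock time is lost or double-counted.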

[Qemu-devel] [PATCH v4 17/21] target/arm: PMU: Add instruction and cycle events

2018-04-17 Thread Aaron Lindsay

The instruction event is only enabled when icount is used; cycles are
always supported. Always defining cycles_get_count (but altering its
behavior depending on CONFIG_USER_ONLY) allows us to remove some
CONFIG_USER_ONLY #ifdefs throughout the rest of the code.

Signed-off-by: Aaron Lindsay 
---
 target/arm/helper.c | 88 ++---
 1 file changed, 43 insertions(+), 45 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index b36630f..b91f022 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -15,6 +15,7 @@
 #include "arm_ldst.h"
 #include <zlib.h> /* For crc32 */
 #include "exec/semihost.h"
+#include "sysemu/cpus.h"
 #include "sysemu/kvm.h"
 #include "fpu/softfloat.h"
 
@@ -943,8 +944,49 @@ typedef struct pm_event {
 uint64_t (*get_count)(CPUARMState *);
 } pm_event;
 
+static bool event_always_supported(CPUARMState *env)
+{
+return true;
+}
+
+/*
+ * Return the underlying cycle count for the PMU cycle counters. If we're in
+ * usermode, simply return 0.
+ */
+static uint64_t cycles_get_count(CPUARMState *env)
+{
+#ifndef CONFIG_USER_ONLY
+return muldiv64(qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
+   ARM_CPU_FREQ, NANOSECONDS_PER_SECOND);
+#else
+return 0;
+#endif
+}
+
+#ifndef CONFIG_USER_ONLY
+static bool instructions_supported(CPUARMState *env)
+{
+return use_icount == 1 /* Precise instruction counting */;
+}
+
+static uint64_t instructions_get_count(CPUARMState *env)
+{
+return (uint64_t)cpu_get_icount_raw();
+}
+#endif
+
 #define SUPPORTED_EVENT_SENTINEL UINT16_MAX
 static const pm_event pm_events[] = {
+#ifndef CONFIG_USER_ONLY
+{ .number = 0x008, /* INST_RETIRED, Instruction architecturally executed */
+  .supported = instructions_supported,
+  .get_count = instructions_get_count
+},
+{ .number = 0x011, /* CPU_CYCLES, Cycle */
+  .supported = event_always_supported,
+  .get_count = cycles_get_count
+},
+#endif
 { .number = SUPPORTED_EVENT_SENTINEL }
 };
 static uint16_t supported_event_map[0x3f];
@@ -1024,8 +1066,6 @@ static CPAccessResult pmreg_access_swinc(CPUARMState *env,
 return pmreg_access(env, ri, isread);
 }
 
-#ifndef CONFIG_USER_ONLY
-
 static CPAccessResult pmreg_access_selr(CPUARMState *env,
 const ARMCPRegInfo *ri,
 bool isread)
@@ -1140,9 +1180,7 @@ static inline bool pmu_counter_enabled(CPUARMState *env, uint8_t counter)
  */
 void pmccntr_op_start(CPUARMState *env)
 {
-uint64_t cycles = 0;
-cycles = muldiv64(qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
-  ARM_CPU_FREQ, NANOSECONDS_PER_SECOND);
+uint64_t cycles = cycles_get_count(env);
 
 if (pmu_counter_enabled(env, 31)) {
 uint64_t eff_cycles = cycles;
@@ -1285,42 +1323,6 @@ static void pmccntr_write32(CPUARMState *env, const ARMCPRegInfo *ri,
 pmccntr_write(env, ri, deposit64(cur_val, 0, 32, value));
 }
 
-#else /* CONFIG_USER_ONLY */
-
-void pmccntr_op_start(CPUARMState *env)
-{
-}
-
-void pmccntr_op_finish(CPUARMState *env)
-{
-}
-
-void pmevcntr_op_start(CPUARMState *env, uint8_t i)
-{
-}
-
-void pmevcntr_op_finish(CPUARMState *env, uint8_t i)
-{
-}
-
-void pmu_op_start(CPUARMState *env)
-{
-}
-
-void pmu_op_finish(CPUARMState *env)
-{
-}
-
-void pmu_pre_el_change(ARMCPU *cpu, void *ignored)
-{
-}
-
-void pmu_post_el_change(ARMCPU *cpu, void *ignored)
-{
-}
-
-#endif
-
 static void pmccfiltr_write(CPUARMState *env, const ARMCPRegInfo *ri,
 uint64_t value)
 {
@@ -1633,7 +1635,6 @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
 /* Unimplemented so WI. */
 { .name = "PMSWINC", .cp = 15, .crn = 9, .crm = 12, .opc1 = 0, .opc2 = 4,
   .access = PL0_W, .accessfn = pmreg_access_swinc, .type = ARM_CP_NOP },
-#ifndef CONFIG_USER_ONLY
 { .name = "PMSELR", .cp = 15, .crn = 9, .crm = 12, .opc1 = 0, .opc2 = 5,
   .access = PL0_RW, .type = ARM_CP_ALIAS,
   .fieldoffset = offsetoflow32(CPUARMState, cp15.c9_pmselr),
@@ -1653,7 +1654,6 @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
   .access = PL0_RW, .accessfn = pmreg_access_ccntr,
   .type = ARM_CP_IO,
   .readfn = pmccntr_read, .writefn = pmccntr_write, },
-#endif
 { .name = "PMCCFILTR", .cp = 15, .opc1 = 0, .crn = 14, .crm = 15, .opc2 = 7,
   .writefn = pmccfiltr_write_a32, .readfn = pmccfiltr_read_a32,
   .access = PL0_RW, .accessfn = pmreg_access,
@@ -5191,7 +5191,6 @@ void register_cp_regs_for_features(ARMCPU *cpu)
  * field as main ID register, and we implement only the cycle
  * count register.
  */
-#ifndef CONFIG_USER_ONLY
 ARMCPRegInfo pmcr = {
 .name = "PMCR", .cp = 15, .crn = 9, .crm = 12, .opc1 = 0, .opc2 = 0,
 .access = PL0_RW,
@@ -5245,7 +5244,6 @@ void register_cp_regs_for_features(ARMCPU *cpu)
 g_free(pmevtyper_name);
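
A minimal standalone sketch of the pattern from this patch's commit message: the cycle-count helper is always defined, and the build-time switch lives inside it (returning 0 in the user-only build) rather than around every caller. The conversion from nanoseconds to cycles is written out by hand here instead of calling QEMU's muldiv64; DEMO_USER_ONLY, DEMO_CPU_FREQ_HZ (assumed 1 GHz) and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NS_PER_SEC  UINT64_C(1000000000)
#define DEMO_CPU_FREQ_HZ UINT64_C(1000000000)   /* assumed 1 GHz */

/*
 * floor(ns * freq_hz / 1e9) without overflowing the intermediate product:
 * whole seconds and the sub-second remainder are scaled separately.
 */
static uint64_t demo_ns_to_cycles(uint64_t ns, uint64_t freq_hz)
{
    uint64_t secs = ns / DEMO_NS_PER_SEC;
    uint64_t rem = ns % DEMO_NS_PER_SEC;

    /* rem < 1e9, so rem * freq_hz fits in 64 bits for freq_hz <= 10 GHz */
    assert(freq_hz <= UINT64_C(10000000000));
    return secs * freq_hz + rem * freq_hz / DEMO_NS_PER_SEC;
}

/* Always defined; only the body depends on the build configuration. */
static uint64_t demo_cycles_get_count(uint64_t now_ns)
{
#ifndef DEMO_USER_ONLY
    return demo_ns_to_cycles(now_ns, DEMO_CPU_FREQ_HZ);
#else
    (void)now_ns;
    return 0;
#endif
}
```

Callers can then use demo_cycles_get_count() unconditionally, which is what lets the surrounding #ifdef blocks go away.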

[Qemu-devel] [Bug 1654137] Re: Ctrl-A b not working in 2.8.0

2018-04-17 Thread Philippe Mathieu-Daudé

Hi Andreas, beware... while 1b2503fcf7b5 fixes this bug, it introduces another 
regression.
I suggest waiting for the release tag before cherry-picking it.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1654137

Title:
  Ctrl-A b not working in 2.8.0

Status in QEMU:
  Fix Committed

Bug description:
  With a recent update from 2.7.0 to 2.8.0 I have discovered that I can
  no longer send a "break" to the VM.  Ctrl-A b is simply ignored.
  Other Ctrl-A sequences seem to work correctly.

  This is on a NetBSD amd64 system, version 7.99.53, and qemu was
  installed on this system from source.

  Reverting to the previous install restores "break" capability.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1654137/+subscriptions

[Qemu-devel] [PATCH v4 04/21] target/arm: Mask PMU register writes based on PMCR_EL0.N

2018-04-17 Thread Aaron Lindsay

This is in preparation for enabling counters other than PMCCNTR.

Signed-off-by: Aaron Lindsay 
Reviewed-by: Peter Maydell 
---
 target/arm/helper.c | 31 ++-
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index f6269a2..8bec07e 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -52,11 +52,6 @@ typedef struct V8M_SAttributes {
 static void v8m_security_lookup(CPUARMState *env, uint32_t address,
 MMUAccessType access_type, ARMMMUIdx mmu_idx,
 V8M_SAttributes *sattrs);
-
-/* Definitions for the PMCCNTR and PMCR registers */
-#define PMCRD   0x8
-#define PMCRC   0x4
-#define PMCRE   0x1
 #endif
 
 static int vfp_gdb_get_reg(CPUARMState *env, uint8_t *buf, int reg)
@@ -906,6 +901,24 @@ static const ARMCPRegInfo v6_cp_reginfo[] = {
 REGINFO_SENTINEL
 };
 
+/* Definitions for the PMU registers */
+#define PMCRN_MASK  0xf800
+#define PMCRN_SHIFT 11
+#define PMCRD   0x8
+#define PMCRC   0x4
+#define PMCRE   0x1
+
+static inline uint32_t pmu_num_counters(CPUARMState *env)
+{
+  return (env->cp15.c9_pmcr & PMCRN_MASK) >> PMCRN_SHIFT;
+}
+
+/* Bits allowed to be set/cleared for PMCNTEN* and PMINTEN* */
+static inline uint64_t pmu_counter_mask(CPUARMState *env)
+{
+  return (1 << 31) | ((1 << pmu_num_counters(env)) - 1);
+}
+
 static CPAccessResult pmreg_access(CPUARMState *env, const ARMCPRegInfo *ri,
bool isread)
 {
@@ -1119,14 +1132,14 @@ static void pmccfiltr_write(CPUARMState *env, const ARMCPRegInfo *ri,
 static void pmcntenset_write(CPUARMState *env, const ARMCPRegInfo *ri,
 uint64_t value)
 {
-value &= (1 << 31);
+value &= pmu_counter_mask(env);
 env->cp15.c9_pmcnten |= value;
 }
 
 static void pmcntenclr_write(CPUARMState *env, const ARMCPRegInfo *ri,
  uint64_t value)
 {
-value &= (1 << 31);
+value &= pmu_counter_mask(env);
 env->cp15.c9_pmcnten &= ~value;
 }
 
@@ -1174,14 +1187,14 @@ static void pmintenset_write(CPUARMState *env, const ARMCPRegInfo *ri,
  uint64_t value)
 {
 /* We have no event counters so only the C bit can be changed */
-value &= (1 << 31);
+value &= pmu_counter_mask(env);
 env->cp15.c9_pminten |= value;
 }
 
 static void pmintenclr_write(CPUARMState *env, const ARMCPRegInfo *ri,
  uint64_t value)
 {
-value &= (1 << 31);
+value &= pmu_counter_mask(env);
 env->cp15.c9_pminten &= ~value;
 }
 
-- 
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
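
A minimal standalone sketch of the masking introduced above: the number of event counters is read from the N field (bits [15:11], matching PMCRN_MASK and PMCRN_SHIFT in the patch), and the writable bits are bit 31 for the cycle counter plus one low bit per event counter. The demo_ names are hypothetical. Unlike the patch, the sketch shifts a 64-bit unsigned constant, because 1 << 31 does not fit in a 32-bit signed int; that is a choice of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PMCR_N_MASK  UINT64_C(0xf800)
#define DEMO_PMCR_N_SHIFT 11

/* Number of event counters advertised in the control register's N field. */
static uint32_t demo_num_counters(uint64_t pmcr)
{
    return (uint32_t)((pmcr & DEMO_PMCR_N_MASK) >> DEMO_PMCR_N_SHIFT);
}

/* Bit 31 (cycle counter) plus one low bit per implemented event counter. */
static uint64_t demo_counter_mask(uint64_t pmcr)
{
    return (UINT64_C(1) << 31) |
           ((UINT64_C(1) << demo_num_counters(pmcr)) - 1);
}

/* Set-style write: only implemented counters can be enabled. */
static uint64_t demo_enable_set(uint64_t pmcr, uint64_t cur, uint64_t value)
{
    return cur | (value & demo_counter_mask(pmcr));
}

/* Clear-style write: only implemented counters can be disabled. */
static uint64_t demo_enable_clear(uint64_t pmcr, uint64_t cur, uint64_t value)
{
    return cur & ~(value & demo_counter_mask(pmcr));
}
```

With N = 4 the mask is 0x8000000f, so a guest write of all ones to the set register enables exactly the cycle counter and event counters 0 to 3.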

[Qemu-devel] [PATCH v4 01/21] target/arm: Check PMCNTEN for whether PMCCNTR is enabled

2018-04-17 Thread Aaron Lindsay

Signed-off-by: Aaron Lindsay 
Reviewed-by: Peter Maydell 
---
 target/arm/helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index b14fdab..485004e 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -994,7 +994,7 @@ static inline bool arm_ccnt_enabled(CPUARMState *env)
 {
 /* This does not support checking PMCCFILTR_EL0 register */
 
-if (!(env->cp15.c9_pmcr & PMCRE)) {
+if (!(env->cp15.c9_pmcr & PMCRE) || !(env->cp15.c9_pmcnten & (1 << 31))) {
 return false;
 }
 
-- 
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
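
A minimal standalone sketch of the condition this one-line patch adds: the cycle counter counts only when both the global enable bit in the control register and the cycle counter's own bit (bit 31) in the counter-enable register are set. The names are hypothetical; the global enable is taken to be bit 0, matching PMCRE = 0x1 elsewhere in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PMCR_E    (UINT64_C(1) << 0)    /* global enable */
#define DEMO_PMCNTEN_C (UINT64_C(1) << 31)   /* cycle counter enable */

static bool demo_cycle_counter_enabled(uint64_t pmcr, uint64_t pmcnten)
{
    return (pmcr & DEMO_PMCR_E) != 0 && (pmcnten & DEMO_PMCNTEN_C) != 0;
}
```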

[Qemu-devel] [PATCH v4 12/21] target/arm: Make PMOVSCLR and PMUSERENR 64 bits wide

2018-04-17 Thread Aaron Lindsay

This is a bug fix to ensure 64-bit reads of these registers don't read
adjacent data.

Signed-off-by: Aaron Lindsay 
---
 target/arm/cpu.h| 4 ++--
 target/arm/helper.c | 5 +++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index a56e9a0..9f769ae 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -367,8 +367,8 @@ typedef struct CPUARMState {
 uint32_t c9_data;
 uint64_t c9_pmcr; /* performance monitor control register */
 uint64_t c9_pmcnten; /* perf monitor counter enables */
-uint32_t c9_pmovsr; /* perf monitor overflow status */
-uint32_t c9_pmuserenr; /* perf monitor user enable */
+uint64_t c9_pmovsr; /* perf monitor overflow status */
+uint64_t c9_pmuserenr; /* perf monitor user enable */
 uint64_t c9_pmselr; /* perf monitor counter selection register */
 uint64_t c9_pminten; /* perf monitor interrupt enables */
 union { /* Memory attribute redirection */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 62cace7..20b42b4 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -1427,7 +1427,8 @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
   .fieldoffset = offsetof(CPUARMState, cp15.c9_pmcnten),
   .writefn = pmcntenclr_write },
 { .name = "PMOVSR", .cp = 15, .crn = 9, .crm = 12, .opc1 = 0, .opc2 = 3,
-  .access = PL0_RW, .fieldoffset = offsetof(CPUARMState, cp15.c9_pmovsr),
+  .access = PL0_RW,
+  .fieldoffset = offsetoflow32(CPUARMState, cp15.c9_pmovsr),
   .accessfn = pmreg_access,
   .writefn = pmovsr_write,
   .raw_writefn = raw_write },
@@ -1487,7 +1488,7 @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
   .accessfn = pmreg_access_xevcntr },
 { .name = "PMUSERENR", .cp = 15, .crn = 9, .crm = 14, .opc1 = 0, .opc2 = 0,
   .access = PL0_R | PL1_RW, .accessfn = access_tpm,
-  .fieldoffset = offsetof(CPUARMState, cp15.c9_pmuserenr),
+  .fieldoffset = offsetoflow32(CPUARMState, cp15.c9_pmuserenr),
   .resetvalue = 0,
   .writefn = pmuserenr_write, .raw_writefn = raw_write },
 { .name = "PMUSERENR_EL0", .state = ARM_CP_STATE_AA64,
-- 
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
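
A minimal standalone sketch of the idea behind this fix: once a field is 64 bits wide, a 64-bit access at its offset stays inside the field, while a 32-bit access has to be pointed at whichever half holds the low 32 bits on the host. The helper below plays the role that offsetoflow32 appears to play in the patch, judging by its name; it is a hypothetical analogue that detects endianness at runtime and is not QEMU's macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint64_t status;      /* widened from 32 to 64 bits */
    uint64_t neighbour;   /* adjacent data that must not leak into reads */
} DemoState;

/* Offset of the low 32 bits of a 64-bit field located at field_offset. */
static size_t demo_offset_of_low32(size_t field_offset)
{
    const uint32_t probe = 1;
    uint8_t first_byte;

    memcpy(&first_byte, &probe, 1);
    /* little-endian hosts keep the low half first, big-endian hosts last */
    return field_offset + (first_byte == 1 ? 0 : 4);
}

static uint32_t demo_read32(const void *base, size_t offset)
{
    uint32_t v;

    memcpy(&v, (const uint8_t *)base + offset, sizeof(v));
    return v;
}

static uint64_t demo_read64(const void *base, size_t offset)
{
    uint64_t v;

    memcpy(&v, (const uint8_t *)base + offset, sizeof(v));
    return v;
}
```

Had status stayed 32 bits wide, demo_read64 at its offset would also have picked up four bytes belonging to whatever follows it in the struct, which is the adjacent-data read the commit message refers to.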

[Qemu-devel] [PATCH v4 02/21] target/arm: Treat PMCCNTR as alias of PMCCNTR_EL0

2018-04-17 Thread Aaron Lindsay

They share the same underlying state.

Signed-off-by: Aaron Lindsay 
Reviewed-by: Peter Maydell 
---
 target/arm/helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 485004e..83ea8f4 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -1318,7 +1318,7 @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
   .fieldoffset = offsetof(CPUARMState, cp15.c9_pmselr),
   .writefn = pmselr_write, .raw_writefn = raw_write, },
 { .name = "PMCCNTR", .cp = 15, .crn = 9, .crm = 13, .opc1 = 0, .opc2 = 0,
-  .access = PL0_RW, .resetvalue = 0, .type = ARM_CP_IO,
+  .access = PL0_RW, .resetvalue = 0, .type = ARM_CP_ALIAS | ARM_CP_IO,
   .readfn = pmccntr_read, .writefn = pmccntr_write32,
   .accessfn = pmreg_access_ccntr },
 { .name = "PMCCNTR_EL0", .state = ARM_CP_STATE_AA64,

[Qemu-devel] [PATCH v4 05/21] target/arm: Fetch GICv3 state directly from CPUARMState

2018-04-17 Thread Aaron Lindsay

This eliminates the need for fetching it from el_change_hook_opaque, and
allows for supporting multiple el_change_hooks without having to hack
something together to find the registered opaque belonging to GICv3.

Signed-off-by: Aaron Lindsay 
Reviewed-by: Peter Maydell 
---
 hw/intc/arm_gicv3_cpuif.c | 10 ++--------
 target/arm/cpu.h          | 10 ----------
 2 files changed, 2 insertions(+), 18 deletions(-)

diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index 26f5eed..cb9a3a5 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -29,11 +29,7 @@ void gicv3_set_gicv3state(CPUState *cpu, GICv3CPUState *s)
 
 static GICv3CPUState *icc_cs_from_env(CPUARMState *env)
 {
-    /* Given the CPU, find the right GICv3CPUState struct.
-     * Since we registered the CPU interface with the EL change hook as
-     * the opaque pointer, we can just directly get from the CPU to it.
-     */
-    return arm_get_el_change_hook_opaque(arm_env_get_cpu(env));
+    return env->gicv3state;
 }
 
 static bool gicv3_use_ns_bank(CPUARMState *env)
@@ -2615,9 +2611,7 @@ void gicv3_init_cpuif(GICv3State *s)
  * it might be with code translated by CPU 0 but run by CPU 1, in
  * which case we'd get the wrong value.
  * So instead we define the regs with no ri->opaque info, and
- * get back to the GICv3CPUState from the ARMCPU by reading back
- * the opaque pointer from the el_change_hook, which we're going
- * to need to register anyway.
+ * get back to the GICv3CPUState from the CPUARMState.
  */
 define_arm_cp_regs(cpu, gicv3_cpuif_reginfo);
 if (arm_feature(&cpu->env, ARM_FEATURE_EL2)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 04041db..ff349f5 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -2915,16 +2915,6 @@ void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHook *hook,
  void *opaque);
 
 /**
- * arm_get_el_change_hook_opaque:
- * Return the opaque data that will be used by the el_change_hook
- * for this CPU.
- */
-static inline void *arm_get_el_change_hook_opaque(ARMCPU *cpu)
-{
-    return cpu->el_change_hook_opaque;
-}
-
-/**
  * aa32_vfp_dreg:
  * Return a pointer to the Dn register within env in 32-bit mode.
  */

[Qemu-devel] [PATCH v4 00/21] More fully implement ARM PMUv3

2018-04-17 Thread Aaron Lindsay

The ARM PMU implementation currently contains a basic cycle counter, but it is
often useful to gather counts of other events and filter them based on
execution mode. These patches flesh out the implementations of various PMU
registers including PM[X]EVCNTR and PM[X]EVTYPER, add a struct definition to
represent arbitrary counter types, implement mode filtering, send interrupts on
counter overflow, and add instruction, cycle, and software increment events.

Notable changes since v3:

* Detect counter overflow and send interrupts accordingly (adds a 'shadow' copy
  of both PMCCNTR and general-purpose counters, possibly/probably Doing It
  Wrong)
* Update counter filtering code to more closely resemble the ARM documentation
  in form and functionality 
* Don't mix EL change hooks and KVM
* Don't call gen_io_start/end if not actually using icount
* Reorganized a few of the patches to more logically group changes
* Clarify and otherwise improve a few comments
* There are also a number of less significant changes scattered around

Thanks,
Aaron

Aaron Lindsay (21):
  target/arm: Check PMCNTEN for whether PMCCNTR is enabled
  target/arm: Treat PMCCNTR as alias of PMCCNTR_EL0
  target/arm: Reorganize PMCCNTR accesses
  target/arm: Mask PMU register writes based on PMCR_EL0.N
  target/arm: Fetch GICv3 state directly from CPUARMState
  target/arm: Support multiple EL change hooks
  target/arm: Add pre-EL change hooks
  target/arm: Allow EL change hooks to do IO
  target/arm: Fix bitmask for PMCCFILTR writes
  target/arm: Filter cycle counter based on PMCCFILTR_EL0
  target/arm: Allow AArch32 access for PMCCFILTR
  target/arm: Make PMOVSCLR and PMUSERENR 64 bits wide
  target/arm: Add ARM_FEATURE_V7VE for v7 Virtualization Extensions
  target/arm: Implement PMOVSSET
  target/arm: Add array for supported PMU events, generate PMCEID[01]
  target/arm: Finish implementation of PM[X]EVCNTR and PM[X]EVTYPER
  target/arm: PMU: Add instruction and cycle events
  target/arm: PMU: Set PMCR.N to 4
  target/arm: Implement PMSWINC
  target/arm: Mark PMINTENSET accesses as possibly doing IO
  target/arm: Send interrupts on PMU counter overflow

 hw/intc/arm_gicv3_cpuif.c  |  10 +-
 target/arm/cpu.c   |  68 +++-
 target/arm/cpu.h   | 119 +--
 target/arm/cpu64.c |   2 -
 target/arm/helper.c| 752 ++---
 target/arm/internals.h |  14 +-
 target/arm/op_helper.c |   8 +
 target/arm/translate-a64.c |   6 +
 target/arm/translate.c |  12 +
 9 files changed, 834 insertions(+), 157 deletions(-)


[Qemu-devel] [Bug 1654137] Re: Ctrl-A b not working in 2.8.0

2018-04-17 Thread Andreas Gustafsson

Fixed on qemu mainline in 1b2503fcf7b5932c5a3779ca2ceb92bd403c4ee7 -
thanks.  I have backported the fix to pkgsrc as qemu-2.11.1nb3.


** Changed in: qemu
   Status: Confirmed => Fix Committed

https://bugs.launchpad.net/bugs/1654137

Title:
  Ctrl-A b not working in 2.8.0

Status in QEMU:
  Fix Committed

Bug description:
  With a recent update from 2.7.0 to 2.8.0 I have discovered that I can
  no longer send a "break" to the VM.  Ctrl-A b is simply ignored.
  Other Ctrl-A sequences seem to work correctly.

  This is on a NetBSD amd64 system, version 7.99.53, and qemu was
  installed on this system from source.

  Reverting to the previous install restores "break" capability.


Re: [Qemu-devel] [RFC PATCH V4 2/4] vfio: Add vm status change callback to stop/restart the mdev device

2018-04-17 Thread Alex Williamson

On Wed, 18 Apr 2018 00:44:35 +0530
Kirti Wankhede  wrote:

> On 4/17/2018 8:13 PM, Alex Williamson wrote:
> > On Tue, 17 Apr 2018 13:40:32 +
> > "Zhang, Yulei"  wrote:
> >   
> >>> -Original Message-
> >>> From: Alex Williamson [mailto:alex.william...@redhat.com]
> >>> Sent: Tuesday, April 17, 2018 4:23 AM
> >>> To: Kirti Wankhede 
> >>> Cc: Zhang, Yulei ; qemu-devel@nongnu.org; Tian,
> >>> Kevin ; joonas.lahti...@linux.intel.com;
> >>> zhen...@linux.intel.com; Wang, Zhi A ;
> >>> dgilb...@redhat.com; quint...@redhat.com
> >>> Subject: Re: [RFC PATCH V4 2/4] vfio: Add vm status change callback to
> >>> stop/restart the mdev device
> >>>
> >>> On Mon, 16 Apr 2018 20:14:27 +0530
> >>> Kirti Wankhede  wrote:
> >>> 
>  On 4/10/2018 11:32 AM, Yulei Zhang wrote:
> > > VM status change handler is added to change the vfio pci device
> > > status during the migration, write the demanded device status
> > > to the DEVICE STATUS subregion to stop the device on the source side
> > > before fetching its status and start the device on the target side
> > > after restoring its status.
> >
> > Signed-off-by: Yulei Zhang 
> > ---
> > >  hw/vfio/pci.c                 | 20 ++++++++++++++++++++
> > >  include/hw/vfio/vfio-common.h |  1 +
> > >  linux-headers/linux/vfio.h    |  6 ++++++
> > >  roms/seabios                  |  2 +-
> >  4 files changed, 28 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index f98a9dd..13d8c73 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -38,6 +38,7 @@
> >
> >  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
> >  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> > +static void vfio_vm_change_state_handler(void *pv, int running,
> >>> RunState state);
> >
> >  /*
> >   * Disabling BAR mmaping can be slow, but toggling it around INTx can
> > @@ -2896,6 +2897,7 @@ static void vfio_realize(PCIDevice *pdev, Error   
> >  
> >>> **errp)
> >  vfio_register_err_notifier(vdev);
> >  vfio_register_req_notifier(vdev);
> >  vfio_setup_resetfn_quirk(vdev);
> > +
> >>> qemu_add_vm_change_state_handler(vfio_vm_change_state_handler,
> >>> vdev);
> >
> >  return;
> >
> > @@ -2982,6 +2984,24 @@ post_reset:
> >  vfio_pci_post_reset(vdev);
> >  }
> >
> > +static void vfio_vm_change_state_handler(void *pv, int running,
> >>> RunState state)
> > +{
> > +VFIOPCIDevice *vdev = pv;
> > > +VFIODevice *vbasedev = &vdev->vbasedev;
> > +uint8_t dev_state;
> > +uint8_t sz = 1;
> > +
> > +dev_state = running ? VFIO_DEVICE_START : VFIO_DEVICE_STOP;
> > +
> > > +if (pwrite(vdev->vbasedev.fd, &dev_state,
> > +   sz, vdev->device_state.offset) != sz) {
> > +error_report("vfio: Failed to %s device", running ? "start" : 
> > "stop");
> > +return;
> > +}
> > +
> > +vbasedev->device_state = dev_state;
> > +}
> > +
> 
>  Is it expected to trap device_state region by vendor driver?
>  Can this information be communicated to vendor driver through an ioctl?  
>    
> >>>
> >>> Either the mdev vendor driver or vfio bus driver (ie. vfio-pci) would
> >>> be providing REGION_INFO for this region, so the vendor driver is
> >>> already in full control here using existing ioctls.  I don't see that
> >>> we need new ioctls, we just need to fully define the API of the
> >>> proposed regions here.
> >>> 
> >> If the device state region is mmaped, we may not be able to use
> >> region device state offset to convey the running state. It may need
> >> a new ioctl to set the device state.  
> > 
> > The vendor driver defines the mmap'ability of the region, the vendor
> > driver is still in control.  The API of the region and the
> > implementation by the vendor driver should account for handling
> > mmap'able sections within the region.  Thanks,
> > 
> > Alex
> > 
> >
> 
> If this same region should be used for communicating state or other
> parameters instead of an ioctl, maybe the first page of this region needs
> to be reserved. An mmappable region's start address should be page
> aligned. Is this API going to use 4K of the reserved part of this region?
> Instead of carving out a section of the region, are there any
> disadvantages to adding an ioctl?
> Maybe defining a single ioctl and using different flags (GET_*/SET_*)
> would work?

Yes, ioctls are something that should be feared and reviewed with great
scrutiny and we should feel bad if we do a poor job defining them and
burn ioctl numbers whereas we have 32bits worth of region

Re: [Qemu-devel] [PATCH] mux: fix ctrl-a b again

2018-04-17 Thread Peter Maydell

On 17 April 2018 at 19:36, Philippe Mathieu-Daudé  wrote:
> Hi,
>
>>>> Opinions welcome on whether this is a regression fix worth
>>>> putting into rc4.
>>>
>>> It is a regression, but a long standing one - we've been broken for quite
>>> a while since 2.9.0 or even before.
>>>
>>> If we're doing an rc4 anyway I'd suggest including it, but not the end of
>>> the world if it has to go in via -stable given how long we've been broken
>>> for.
>>>
>>
>> Thanks for the clarification; I've applied this to master.
>
> Since this commit, the console on the Malta board stays black...

Thanks for catching that. Since as Dan says the send-break
feature wasn't a regression from 2.11, we're best off reverting
this patch for now, and then we can look at what's happening
for 2.13.

-- PMM

Re: [Qemu-devel] [PATCH 3/4] x86/cpu: use standard-headers/asm-x86.kvm_para.h

2018-04-17 Thread Eduardo Habkost

On Tue, Apr 17, 2018 at 09:58:25PM +0300, Michael S. Tsirkin wrote:
> Switch to the header we imported from Linux,
> this allows us to drop a hack in kvm_i386.h.
> More code will be dropped in the next patch.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  include/sysemu/kvm.h   | 1 -
>  target/i386/cpu.h      | 2 --
>  target/i386/kvm_i386.h | 6 ------
>  hw/i386/kvm/clock.c    | 2 +-
>  target/i386/cpu.c      | 4 +---
>  target/i386/kvm.c      | 4 ++--
>  6 files changed, 4 insertions(+), 15 deletions(-)
> 
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index 23669c4..0b64b8e 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -22,7 +22,6 @@
>  #ifdef NEED_CPU_H
>  # ifdef CONFIG_KVM
>  #  include 
> -#  include 
>  #  define CONFIG_KVM_IS_POSSIBLE
>  # endif
>  #else
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 1b219fa..9aaab70 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -685,8 +685,6 @@ typedef uint32_t FeatureWordArray[FEATURE_WORDS];
>  #define CPUID_7_0_EDX_AVX512_4FMAPS (1U << 3) /* AVX512 Multiply 
> Accumulation Single Precision */
>  #define CPUID_7_0_EDX_SPEC_CTRL (1U << 26) /* Speculation Control */
>  
> -#define KVM_HINTS_DEDICATED (1U << 0)

include/standard-headers/asm-x86/kvm_para.h defines it as 0, but
the only user is fixed below, so this is OK.

> -
>  #define CPUID_8000_0008_EBX_IBPB(1U << 12) /* Indirect Branch Prediction 
> Barrier */
>  
>  #define CPUID_XSAVE_XSAVEOPT   (1U << 0)
> diff --git a/target/i386/kvm_i386.h b/target/i386/kvm_i386.h
> index 1de9876..e5df24c 100644
> --- a/target/i386/kvm_i386.h
> +++ b/target/i386/kvm_i386.h
> @@ -30,12 +30,6 @@
>  #define kvm_pic_in_kernel()  0
>  #define kvm_ioapic_in_kernel()   0
>  
> -/* These constants must never be used at runtime if kvm_enabled() is false.
> - * They exist so we don't need #ifdefs around KVM-specific code that already
> - * checks kvm_enabled() properly.
> - */
> -#define KVM_CPUID_FEATURES   0

The only usage of this macro without CONFIG_KVM is in
feature_word_info, so this change isn't just harmless: it's
necessary to make feature_word_info accurate.


> -
>  #endif  /* CONFIG_KVM */
>  
>  bool kvm_allows_irq0_override(void);
> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
> index 7dac319..0bf1c60 100644
> --- a/hw/i386/kvm/clock.c
> +++ b/hw/i386/kvm/clock.c
> @@ -26,7 +26,7 @@
>  #include "qapi/error.h"
>  
>  #include 
> -#include 
> +#include "standard-headers/asm-x86/kvm_para.h"
>  
>  #define TYPE_KVM_CLOCK "kvmclock"
>  #define KVM_CLOCK(obj) OBJECT_CHECK(KVMClockState, (obj), TYPE_KVM_CLOCK)
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 1a6b082..efdca33 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -40,9 +40,7 @@
>  #include "qom/qom-qobject.h"
>  #include "sysemu/arch_init.h"
>  
> -#if defined(CONFIG_KVM)
> -#include 
> -#endif
> +#include "standard-headers/asm-x86/kvm_para.h"
>  
>  #include "sysemu/sysemu.h"
>  #include "hw/qdev-properties.h"
> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> index 6c49954..44f8584 100644
> --- a/target/i386/kvm.c
> +++ b/target/i386/kvm.c
> @@ -18,7 +18,7 @@
>  #include 
>  
>  #include 
> -#include 
> +#include "standard-headers/asm-x86/kvm_para.h"
>  
>  #include "qemu-common.h"
>  #include "cpu.h"
> @@ -385,7 +385,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
> uint32_t function,
>  ret &= ~(1U << KVM_FEATURE_PV_UNHALT);
>  }
>  } else if (function == KVM_CPUID_FEATURES && reg == R_EDX) {
> -ret |= KVM_HINTS_DEDICATED;
> +ret |= 1U << KVM_HINTS_DEDICATED;
>  found = 1;

Reviewed-by: Eduardo Habkost 
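
For readers following the last hunk, here is a minimal sketch of why the caller has to change once KVM_HINTS_DEDICATED stops being a mask and becomes a bit number. The macro and function names are local stand-ins invented for illustration; they are not the real headers, and the values simply follow what the review above states (old value `1U << 0`, imported value `0`).

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for illustration only; not the real header definitions. */
#define OLD_HINTS_DEDICATED_MASK (1U << 0) /* removed QEMU-private value: a mask */
#define NEW_HINTS_DEDICATED_BIT  0         /* imported header value: a bit number */

/* Old style: OR the mask in directly. */
static uint32_t set_hint_old(uint32_t edx)
{
    return edx | OLD_HINTS_DEDICATED_MASK;
}

/* New style: turn the bit number into a mask before OR-ing it in. */
static uint32_t set_hint_new(uint32_t edx)
{
    assert(NEW_HINTS_DEDICATED_BIT < 32); /* shift must stay in range */
    return edx | (1U << NEW_HINTS_DEDICATED_BIT);
}

/* What an unchanged caller would do with the new definition. */
static uint32_t set_hint_unfixed(uint32_t edx)
{
    return edx | NEW_HINTS_DEDICATED_BIT; /* OR with 0: no bit gets set */
}
```

With the caller left as it was, the hint would silently never be advertised, which is why the definition swap and the caller fix belong in the same patch.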

-- 
Eduardo

Re: [Qemu-devel] [PATCH v3 12/22] target/arm: Filter cycle counter based on PMCCFILTR_EL0

2018-04-17 Thread Aaron Lindsay

On Apr 17 16:37, Peter Maydell wrote:
> On 17 April 2018 at 16:21, Aaron Lindsay  wrote:
> > On Apr 12 13:36, Aaron Lindsay wrote:
> >> On Apr 12 18:15, Peter Maydell wrote:
> >> > On 16 March 2018 at 20:31, Aaron Lindsay  wrote:
> >> > > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> >> > > index b0ef727..9c3b5ef 100644
> >> > > --- a/target/arm/cpu.h
> >> > > +++ b/target/arm/cpu.h
> >> > > @@ -458,6 +458,11 @@ typedef struct CPUARMState {
> >> > >   * was reset. Otherwise it stores the counter value
> >> > >   */
> >> > >  uint64_t c15_ccnt;
> >> > > +/* ccnt_cached_cycles is used to hold the last cycle count 
> >> > > when
> >> > > + * c15_ccnt holds the guest-visible count instead of the 
> >> > > delta during
> >> > > + * PMU operations which require this.
> >> > > + */
> >> > > +uint64_t ccnt_cached_cycles;
> >> >
> >> > Can this ever hold valid state at a point when we need to do VM
> >> > migration, or is it purely temporary ?
> >>
> >> I believe that as of this version of the patch it is temporary and will
> >> not need to be migrated. However, I believe it's going to be necessary
> >> to have two variables to represent the state of each counter in order to
> >> implement interrupt on overflow.
> >
> > Coming back around to this, I don't see a way around using two variables
> > to hold PMCCNTR's full state to make interrupt on overflow work. I
> > haven't been able to find other examples or documentation covering state
> > needing to be updated in more than one location for a given CP register
> > - do you know of any I've missed or have recommendations about how to
> > approach this?
> 
> Can you explain the problem in more detail? In general it's a bit of
> a red flag if you think you need more state storage space than the
> hardware has, and I don't think there's any "hidden" state in the h/w here.

The critical difference between hardware and QEMU's PMU implementation
is that hardware detects overflow when the overflow actually happens,
which would be inefficient to do in software. Because QEMU stores a
delta from the 'real' count (i.e. the clock/icount) and only updates the
architectural counter values when necessary (when they're read/written),
checking for overflow is less straightforward than checking if
incrementing an individual counter by one flips the high-order bit from
1 to 0 as it happens. If the only information you have is the current
counter value, and don't know how many events have occurred since you
last checked or what the counter value was at that time, you can't tell
whether or not it overflowed.
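
The ambiguity described above can be sketched as follows. This is a minimal illustration assuming a 32-bit counter that is brought up to date lazily; the helper names are hypothetical and this is not QEMU's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: bring a lazily updated 32-bit counter up to date
 * by adding the number of events that elapsed since the last update. */
static uint32_t counter_advance(uint32_t prev, uint64_t elapsed)
{
    return (uint32_t)(prev + elapsed); /* wraps modulo 2^32 */
}

/* With both the previous value and the elapsed event count available,
 * overflow can be decided; the current value alone is not enough. */
static bool counter_overflowed(uint32_t prev, uint64_t elapsed)
{
    /* Keep the 64-bit sum itself from wrapping in this sketch. */
    assert(elapsed <= UINT64_MAX - UINT32_MAX);
    return (uint64_t)prev + elapsed > UINT32_MAX;
}
```

Two different histories, (prev = 0x10, elapsed = 0) and (prev = 0xFFFFFFF0, elapsed = 0x20), both leave the counter at 0x10, yet only the second one overflowed. Distinguishing them needs the extra piece of state being discussed.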

I haven't come up with a way to correctly and reliably detect overflow
without storing additional information. I'll go ahead and post v4 with
my first-pass implementation of the overflow code and see if you see
something I'm missing or can think of a trick we can play to keep this
inside of one register value.

-Aaron


Re: [Qemu-devel] [PATCH 3/3] qemu-iotests: Test new qemu-nbd --nolist option

2018-04-17 Thread Eric Blake

On 04/13/2018 02:26 PM, Nir Soffer wrote:
> Add new test module for testing the --nolist option.
> 
> Signed-off-by: Nir Soffer 
> ---

> +iotests.log('Check that listing exports is allowed by default')
> +disk, nbd_sock = iotests.file_path('disk1', 'nbd-sock1')
> +iotests.qemu_img_create('-f', iotests.imgfmt, disk, '1m')
> +iotests.qemu_nbd('-k', nbd_sock, '-f', iotests.imgfmt, '-x', 'export', disk)
> +out = iotests.run('nbd-client', '-l', '--unix', nbd_sock)

Should we really be relying on the third-party nbd-client to be
installed?  Would it not be better to teach our own NBD client
how to do interesting things over NBD?  Your use case of listing the
output of NBD_OPT_LIST is one, but another that readily comes to mind is
listing the possibilities of NBD_OPT_LIST_META_CONTEXT that just went
into 2.12.  Maybe making 'qemu-img info' give more details when
connecting to an NBD server, compared to what is normally needed just
for connecting to an export on that server?

Additionally, once we merge in Vladimir's work to expose persistent
dirty bitmaps via NBD_OPT_SET_META_CONTEXT/NBD_CMD_BLOCK_STATUS, it
would be nice to have an in-tree tool for reading out the context
information of an NBD export, perhaps extending what 'qemu-img map' can
already do (Vladimir already mentioned that he only implemented the
server side, and left the client side for an out-of-tree solution [1],
although I'm wondering if that is still the wisest course of action).

[1] https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg05701.html

> +
> +assert 'export' in out.splitlines(), 'Export not in %r' % out
> +
> +iotests.log('Check that listing exports is forbidden with --nolist')
> +disk, nbd_sock = iotests.file_path('disk2', 'nbd-sock2')
> +iotests.qemu_img_create('-f', iotests.imgfmt, disk, '1m')
> +iotests.qemu_nbd('-k', nbd_sock, '-f', iotests.imgfmt, '-x', 'secret',
> + '--nolist', disk)
> +
> +# nbd-client fails when listing is not allowed, but lets not depend on 3rd

s/lets/let's/

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


Re: [Qemu-devel] [PATCH 1/3] nbd: Add option to disallow listing exports

2018-04-17 Thread Eric Blake

On 04/16/2018 06:00 AM, Daniel P. Berrangé wrote:
> On Mon, Apr 16, 2018 at 11:53:41AM +0100, Richard W.M. Jones wrote:
>> On Mon, Apr 16, 2018 at 11:31:18AM +0100, Daniel P. Berrangé wrote:
>>> Essentially this is abusing the export name as a crude authentication
>>> token. There are NBD servers that expect NBD_OPT_LIST to always succeed
>>
>> I guess you mean "NBD clients" ...
> 
> Sigh, yes, of course.

qemu 2.10 and older tries to use NBD_OPT_LIST, but gracefully still
tries to connect even if the LIST fails (that is, its use of
NBD_OPT_LIST was for better error handling than what NBD_OPT_EXPORT_NAME
gives, and not because it actually needed the list).  The recent
introduction in qemu 2.11 for support of NBD_OPT_GO means that modern
qemu is no longer even attempting NBD_OPT_LIST when talking to a new
server.  But cross-implementation compatibility is still a concern, and
there may indeed be non-qemu clients that choke if LIST fails, even
though...

> 
>>> when they detect that the new style protocol is available. I really hate
>>> the idea of making it possible to break the NBD_OPT_LIST functionality
>>> via a command line arg like this.

...the NBD spec suggests that a client that requires LIST to work is not
fully compliant, since NBD_OPT_LIST is an optional feature.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


Re: [Qemu-devel] [PATCH 1/3] nbd: Add option to disallow listing exports

2018-04-17 Thread Eric Blake

On 04/13/2018 02:26 PM, Nir Soffer wrote:
> When a management application exposes images using qemu-nbd, it needs a
> secure way to allow temporary access to the disk. Using a random export
> name can solve this problem:
> 
> nbd://server:10809/22965f19-9ab5-4d18-94e1-cbeb321fa433

I share Dan's concerns that you are trying to protect information
without requiring TLS.  If you would just use TLS, then only clients
that can authenticate can list the export names; the fact that the name
leaks at all means you aren't using TLS, so you are just as vulnerable
to a man-in-the-middle attack as you are to the information leak.

> 
> Assuming that the url is passed to the user in a secure way, and the
> user is using TLS to access the image.
> 
> However, since qemu-nbd implements NBD_OPT_LIST, anyone can easily find
> the secret export:
> 
> $ nbd-client -l server 10809
> Negotiation: ..
> 22965f19-9ab5-4d18-94e1-cbeb321fa433

If the server requires TLS, then 'nbd-client -l' already cannot list
names without first negotiating TLS (all commands other than
NBD_OPT_STARTTLS are rejected with NBD_REP_ERR_TLS_REQD if the server
required TLS).  Your example is thus invalidating your above assumption
that the user is using TLS.
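
The gating described above can be sketched roughly as below. The numeric values are the option and reply codes as I understand them from the NBD protocol document and should be checked against it before reuse; the function name and the "return 0 to proceed" convention are assumptions made for this illustration, not qemu-nbd's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Codes as understood from the NBD protocol document; verify before reuse. */
#define NBD_OPT_LIST          3u
#define NBD_OPT_STARTTLS      5u
#define NBD_REP_ERR_TLS_REQD  ((1u << 31) + 5u)

/* Hypothetical gate run before handling any client option.
 * Returns 0 if the option may be processed, or the error reply to send. */
static uint32_t tls_gate(bool tls_required, bool tls_active, uint32_t opt)
{
    if (tls_required && !tls_active && opt != NBD_OPT_STARTTLS) {
        return NBD_REP_ERR_TLS_REQD;
    }
    return 0;
}

/* Small self-check doubling as a usage example. */
static void tls_gate_selfcheck(void)
{
    /* Listing before TLS is negotiated is refused on a TLS-only server. */
    assert(tls_gate(true, false, NBD_OPT_LIST) == NBD_REP_ERR_TLS_REQD);
    /* STARTTLS itself is always let through. */
    assert(tls_gate(true, false, NBD_OPT_STARTTLS) == 0);
}
```

Under this model an unauthenticated `nbd-client -l` never reaches the export list, so a name leak implies the server was not requiring TLS in the first place.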

> 
> Add a new --nolist option, disabling listing, similar to the "allowlist"
> nbd-server configuration option.

This may still make sense to implement, but not necessarily for the
reasons you are giving.

> @@ -86,6 +88,7 @@ static void usage(const char *name)
>  "  -v, --verbose display extra debugging information\n"
>  "  -x, --export-name=NAMEexpose export by name\n"
>  "  -D, --description=TEXTwith -x, also export a human-readable 
> description\n"
> +"  --nolist  do not list export\n"

s/export/exports/

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


Re: [Qemu-devel] [RFC PATCH V4 3/4] vfio: Add SaveVMHanlders for VFIO device to support live migration

2018-04-17 Thread Kirti Wankhede

On 4/17/2018 1:31 PM, Zhang, Yulei wrote:

>>> +static SaveVMHandlers savevm_vfio_handlers = {
>>> +.save_setup = vfio_save_setup,
>>> +.save_live_pending = vfio_save_live_pending,
>>> +.save_live_complete_precopy = vfio_save_complete,
>>> +.load_state = vfio_load,
>>> +};
>>> +
>>
>> Aren't .is_active, .save_live_iterate and .cleanup required?
>> What if the vendor driver has a large amount of data in the device's
>> memory that it is aware of, and would require multiple iterations to
>> send that data to QEMU to save the complete state of the device?
>>
> I suppose the vendor driver will copy the device's memory to the device
> region iteratively, and let qemu read from the device region and transfer
> the data to the target side in pre-copy stage, isn't it? 
> 

As Dave mentioned in another mail in this thread, not all data will be
copied in the pre-copy state. Some static data would be copied in the
pre-copy state, but there could be a significant amount of data in the
stop-and-copy state, where iterations would be required. .is_active and
.save_live_iterate would be required for those iterations.

.cleanup is required to provide an indication to the vendor driver that
migration is complete and that it can clean up all the extra
allocations made for migration.
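
A rough sketch of the iteration pattern argued for above, using a mock device rather than QEMU's SaveVMHandlers. Every type and function name here is hypothetical; the mock hooks only mirror the roles of .is_active, .save_live_iterate and .cleanup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock device state; only the shape of the idea, not a QEMU structure. */
typedef struct MockDevice {
    size_t remaining;       /* bytes the vendor driver still has to hand over */
    size_t chunk;           /* bytes that fit in the state region per pass */
    bool scratch_allocated; /* stands in for migration-only allocations */
} MockDevice;

/* Role of .is_active: keep iterating only while data is pending. */
static bool mock_is_active(const MockDevice *d)
{
    return d->remaining > 0;
}

/* Role of .save_live_iterate: move one chunk; returns 1 once nothing is left. */
static int mock_save_iterate(MockDevice *d)
{
    size_t n = d->remaining < d->chunk ? d->remaining : d->chunk;

    assert(d->chunk > 0); /* a zero-sized chunk would never make progress */
    d->remaining -= n;
    return d->remaining == 0;
}

/* Role of .cleanup: let the driver release migration-only resources. */
static void mock_cleanup(MockDevice *d)
{
    d->scratch_allocated = false;
}

/* Drive the hooks the way a migration loop might; returns the pass count. */
static int mock_migrate(MockDevice *d)
{
    int passes = 0;

    while (mock_is_active(d)) {
        mock_save_iterate(d);
        passes++;
    }
    mock_cleanup(d);
    return passes;
}
```

Without an iterate hook the whole device state would have to fit in a single pass, and without a cleanup hook the driver would have no signal that its migration-only allocations can be released.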

Thanks,
Kirti

Re: [Qemu-devel] [RFC PATCH V4 1/4] vfio: introduce a new VFIO subregion for mdev device migration support

2018-04-17 Thread Kirti Wankhede



On 4/17/2018 1:45 AM, Alex Williamson wrote:
> On Mon, 16 Apr 2018 20:14:03 +0530
> Kirti Wankhede  wrote:
> 
>> On 4/10/2018 11:32 AM, Yulei Zhang wrote:
>>> New VFIO sub region VFIO_REGION_SUBTYPE_DEVICE_STATE is added
>>> to fetch and restore the status of mdev device vGPU during the
>>> live migration.
>>>
>>> Signed-off-by: Yulei Zhang 
>>> ---
>>>  hw/vfio/pci.c              | 25 ++++++++++++++++++++++++-
>>>  hw/vfio/pci.h              |  2 ++
>>>  linux-headers/linux/vfio.h |  9 ++++++---
>>>  3 files changed, 32 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>> index c977ee3..f98a9dd 100644
>>> --- a/hw/vfio/pci.c
>>> +++ b/hw/vfio/pci.c
>>> @@ -32,6 +32,7 @@
>>>  #include "pci.h"
>>>  #include "trace.h"
>>>  #include "qapi/error.h"
>>> +#include "migration/blocker.h"
>>>  
>>>  #define MSIX_CAP_LENGTH 12
>>>  
>>> @@ -2821,6 +2822,25 @@ static void vfio_realize(PCIDevice *pdev, Error 
>>> **errp)
>>>  vfio_vga_quirk_setup(vdev);
>>>  }
>>>  
>>> +struct vfio_region_info *device_state;
>>> +/* device state region setup */
>>> +if (!vfio_get_dev_region_info(&vdev->vbasedev,
>>> +VFIO_REGION_TYPE_PCI_VENDOR_TYPE,
> 
> This is not how VENDOR_TYPE is meant to be used.  We have a 32-bit type
> and 32-bit sub-type.  When bit 31 of type is set (ie. VENDOR_TYPE),
> then the low 16-bits (VENDOR_MASK) defines a vendor specific set of
> sub-types.  Using it as above, you're stomping on vendor 0x's
> vendor sub-types.  I would expect non-vendor specific types to leave
> bit 31 of the type field clear.  We could then define a universal type
> for device state with a sub-type specifying the API such that we could
> revise the API by providing an updated sub-type.  Or perhaps there are
> devices that want separate save vs restore regions, we could do that
> with the sub-type.
> 
>>> +VFIO_REGION_SUBTYPE_DEVICE_STATE, &device_state)) {
>>> +memcpy(&vdev->device_state, device_state,
>>> +   sizeof(struct vfio_region_info));
>>> +g_free(device_state);
>>> +} else {
>>> +error_setg(&vdev->migration_blocker,
>>> +"Migration disabled: cannot support device state region");
>>> +migrate_add_blocker(vdev->migration_blocker, &err);
> 
> This appears as if it's going to be rather verbose and generate errors
> for anything not supporting migration, which is currently everything.
> Maybe there should be an OnOffAuto vfio-pci device option for the user
> to specify migration such that if migration=on is specified the device
> will fail if it's not available.  Otherwise the default would be auto.
> 
>>> +if (err) {
>>> +error_propagate(errp, err);
>>> +error_free(vdev->migration_blocker);
>>> +goto error;
>>> +}
>>> +}
>>> +  
>>
>> I think there should be a _PROBE ioctl before trying to setup region for
>> migration. If vfio device driver or vendor driver for mdev device
>> supports migration capability, _PROBE ioctl should return success along
>> with region's information which can be used to setup the
>> region.
> 
> How is _PROBE different from _REGION_INFO which already exists to tell
> us information about the region?
> 

Fine with _REGION_INFO if SUBTYPE is non-vendor specific as you
mentioned above.
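
The type encoding Alex describes above can be sketched like this. The two macros are local mirrors of the region-type constants as I understand them from linux/vfio.h (bit 31 as the vendor flag, low 16 bits as the vendor ID) and should be checked against the header; the helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local mirrors of the linux/vfio.h constants as understood here; verify. */
#define REGION_TYPE_PCI_VENDOR_TYPE (1u << 31)
#define REGION_TYPE_PCI_VENDOR_MASK 0xffffu

/* True when the type selects a vendor-specific set of sub-types. */
static bool region_type_is_vendor(uint32_t type)
{
    return (type & REGION_TYPE_PCI_VENDOR_TYPE) != 0;
}

/* Extract the vendor ID whose sub-type namespace the type refers to. */
static uint16_t region_type_vendor_id(uint32_t type)
{
    assert(region_type_is_vendor(type));
    return (uint16_t)(type & REGION_TYPE_PCI_VENDOR_MASK);
}

/* Build the type value for one vendor's private sub-types. */
static uint32_t region_type_for_vendor(uint16_t vendor_id)
{
    return REGION_TYPE_PCI_VENDOR_TYPE | vendor_id;
}
```

Passing the bare vendor flag as the type, as the patch does, decodes to vendor ID 0 under this scheme, which is the collision being pointed out. A non-vendor type would leave bit 31 clear.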

Thanks,
Kirti

>>>  for (i = 0; i < PCI_ROM_SLOT; i++) {
>>>  vfio_bar_quirk_setup(vdev, i);
>>>  }
>>> @@ -2884,6 +2904,10 @@ out_teardown:
>>>  vfio_teardown_msi(vdev);
>>>  vfio_bars_exit(vdev);
>>>  error:
>>> +if (vdev->migration_blocker) {
>>> +migrate_del_blocker(vdev->migration_blocker);
>>> +error_free(vdev->migration_blocker);
>>> +}
>>>  error_prepend(errp, ERR_PREFIX, vdev->vbasedev.name);
>>>  }
>>>  
>>> @@ -3009,7 +3033,6 @@ static Property vfio_pci_dev_properties[] = {
>>>  
>>>  static const VMStateDescription vfio_pci_vmstate = {
>>>  .name = "vfio-pci",
>>> -.unmigratable = 1,  
>>
>> Based on the result of above _PROBE ioctl, 'unmigratable' should be set
>> if vendor driver doesn't support migration capability.
> 
> .unmigratable isn't really dynamically settable aiui, thus the
> existence of the migration blocker.
> 
>>>  };
>>>  
>>>  static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
>>> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
>>> index 502a575..0ee1724 100644
>>> --- a/hw/vfio/pci.h
>>> +++ b/hw/vfio/pci.h
>>> @@ -116,6 +116,8 @@ typedef struct VFIOPCIDevice {
>>>  VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */
>>>  VFIOVGA *vga; /* 0xa, 0x3b0, 0x3c0 */
>>>  void *igd_opregion;
>>> +struct vfio_region_info device_state;
>>> +Error *migration_blocker;
>>>  PCIHostDeviceAddress host;
>>>  EventNotifier err_notifier;
>>>  EventNotifier req_notifier;
>>> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
>>> index

Re: [Qemu-devel] [RFC PATCH V4 3/4] vfio: Add SaveVMHanlders for VFIO device to support live migration

2018-04-17 Thread Eric Blake

On 04/16/2018 09:44 AM, Kirti Wankhede wrote:
> 
> 
> On 4/10/2018 11:33 AM, Yulei Zhang wrote:
>> Instead of using vm state description, add SaveVMHandlers for VFIO
>> device to support live migration.

In the subject line: s/Hanlders/Handlers/

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


Re: [Qemu-devel] [PATCH] Show values and description when using "qom-list"

2018-04-17 Thread Eric Blake

On 04/13/2018 03:05 AM, Perez Blanco, Ricardo (Nokia - BE/Antwerp) wrote:
> Dear all,
> 
> Here you can find my first contribution to qemu. Please do not hesitate to
> make any kind of remark.

Welcome to the community.  Looking forward to your v2 patch submission
(see my reply to your followup, for more things to fix before you send v2).


> Subject: [PATCH] Show values and description when using "qom-list"
> 
> For debugging purposes it is very useful to:
>  - See the description of the field. This information is already filled
>in but not shown in "qom-list" command.
>  - Display value of the field. So far, only well known types are
>implemented (string, str, int, uint, bool).
> 
> Signed-off-by: Ricardo Perez Blanco 
> ---
>  hmp.c  | 13 +++--
>  qapi/misc.json |  4 +++-
>  qmp.c  | 26 ++
>  3 files changed, 40 insertions(+), 3 deletions(-)
> 
> diff --git a/hmp.c b/hmp.c
> index a25c7bd..967e0b2 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -2490,8 +2490,17 @@ void hmp_qom_list(Monitor *mon, const QDict *qdict)
>  while (list != NULL) {
>  ObjectPropertyInfo *value = list->value;
> 
> -monitor_printf(mon, "%s (%s)\n",
> -   value->name, value->type);
> +monitor_printf(mon, "%s", value->name);
> +if (value->value) {
> +monitor_printf(mon, "=%s", value->value);
> +}

Technically, you should be checking 'if (value->has_value)'.  It happens
that our current code sets value->value to NULL if value->has_value is
false, but that's less reliable.  Someday, we may improve our QAPI code
generator to quit generating has_FOO members when FOO is an optional
pointer and NULL is an obvious indication that FOO was not provided (at
which point, your code as written is correct), but we're not there yet.


> +++ b/qapi/misc.json
> @@ -1328,10 +1328,12 @@
>  #
>  # @description: if specified, the description of the property.
>  #
> +# @value: if specified, the value of the property.

Missing a mention that this field is '(since 2.13)'.

> +#
>  # Since: 1.2
>  ##
>  { 'struct': 'ObjectPropertyInfo',
> -  'data': { 'name': 'str', 'type': 'str', '*description': 'str' } }
> +  'data': { 'name': 'str', 'type': 'str', '*description':'str', 
> '*value':'str' } }
> 
>  ##
>  # @qom-list:
> diff --git a/qmp.c b/qmp.c
> index f722616..750b5d0 100644
> --- a/qmp.c
> +++ b/qmp.c
> @@ -237,6 +237,32 @@ ObjectPropertyInfoList *qmp_qom_list(const char *path, 
> Error **errp)
> 
>  entry->value->name = g_strdup(prop->name);
>  entry->value->type = g_strdup(prop->type);
> +if (prop->description) {
> +entry->value->description = g_strdup(prop->description);
> +}
> +if ((g_ascii_strncasecmp(entry->value->type, "string", 6) == 0) ||
> +(g_ascii_strncasecmp(entry->value->type, "str", 3) == 0)) {

This will accept a type named "strange"; is that intentional?

> +Error **errp = NULL;
> +entry->value->value = g_strdup_printf("\"%s\"",
> +object_property_get_str(obj, entry->value->name, errp));
> +}
> +if (g_ascii_strncasecmp(entry->value->type, "int", 3) == 0) {

Likewise, this will accept a type named "internal".  I'm not sure if
your manual checking for types by names is the best approach; and the
fact that we are stringizing the result instead of using the natural
JSON type (a string for "str", a number for "int") is a bit worrisome.
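
To make the prefix problem concrete, here is a minimal sketch in plain C. The helper names type_has_prefix() and type_is() are made up for illustration, and it uses strncmp()/strcmp() rather than the GLib case-insensitive call in the patch; the prefix behaviour is the same either way.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Prefix comparison, as in the patch: only the first strlen(prefix)
 * bytes are compared, so any longer name with that prefix matches. */
static bool type_has_prefix(const char *type, const char *prefix)
{
    assert(type != NULL && prefix != NULL);
    return strncmp(type, prefix, strlen(prefix)) == 0;
}

/* Exact comparison: the whole type name has to match. */
static bool type_is(const char *type, const char *name)
{
    assert(type != NULL && name != NULL);
    return strcmp(type, name) == 0;
}
```

With these, type_has_prefix("strange", "str") and type_has_prefix("internal", "int") both succeed, while type_is() rejects both names.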

> +Error **errp = NULL;
> +entry->value->value = g_strdup_printf("%ld",
> +object_property_get_int(obj, entry->value->name, errp));
> +}
> +if (g_ascii_strncasecmp(entry->value->type, "uint", 4) == 0) {
> +Error **errp = NULL;
> +entry->value->value = g_strdup_printf("%lu",
> +object_property_get_uint(obj, entry->value->name, errp));
> +}
> +if (g_ascii_strncasecmp(entry->value->type, "bool", 4) == 0) {
> +Error **errp = NULL;
> +entry->value->value = g_strdup_printf("%s",
> +   (object_property_get_bool(obj, entry->value->name, errp) == 
> true)
> +? "true" : "false");
> +}

Since the ->value field is optional, you have to set
entry->value->has_value = true anywhere that you are setting
entry->value->value to a string.
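
Both halves of the has_value convention can be sketched as follows. PropInfo, prop_set_value() and prop_format() are hypothetical stand-ins, not the code that the QAPI generator actually emits; the only point is that the producer sets the flag and the consumer tests the flag rather than the pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a generated struct with an optional member. */
typedef struct PropInfo {
    const char *name;
    bool has_value;   /* true only when 'value' was provided */
    char *value;      /* meaningful only when has_value is true */
} PropInfo;

/* Producer side: whenever 'value' is filled in, set the flag too. */
static void prop_set_value(PropInfo *p, const char *v)
{
    size_t n = strlen(v) + 1;

    p->value = malloc(n);
    assert(p->value != NULL);
    memcpy(p->value, v, n);
    p->has_value = true;
}

/* Consumer side: key the output on has_value, not on the pointer. */
static int prop_format(char *buf, size_t len, const PropInfo *p)
{
    if (p->has_value) {
        return snprintf(buf, len, "%s=%s", p->name, p->value);
    }
    return snprintf(buf, len, "%s", p->name);
}
```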

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [Qemu-devel] [RFC PATCH V4 3/4] vfio: Add SaveVMHanlders for VFIO device to support live migration

2018-04-17 Thread Dr. David Alan Gilbert

* Yulei Zhang (yulei.zh...@intel.com) wrote:
> Instead of using vm state description, add SaveVMHandlers for VFIO
> device to support live migration.
> 
> Introduce new Ioctl VFIO_DEVICE_GET_DIRTY_BITMAP to fetch the memory
> bitmap that dirtied by vfio device during the iterative precopy stage
> to shorten the system downtime afterward.
> 
> For vfio pci device status migrate, during the system downtime, it will
> save the following states
> 1. pci configuration space addr0~addr5
> 2. pci configuration space msi_addr msi_data
> 3. pci device status fetch from device driver
> 
> And on the target side the vfio_load will restore the same states
> 1. re-setup the pci bar configuration
> 2. re-setup the pci device msi configuration
> 3. restore the pci device status
> 
> Signed-off-by: Yulei Zhang 
> ---
>  hw/vfio/pci.c  | 195 
> +++--
>  linux-headers/linux/vfio.h |  14 
>  2 files changed, 204 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 13d8c73..ac6a9c7 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -33,9 +33,14 @@
>  #include "trace.h"
>  #include "qapi/error.h"
>  #include "migration/blocker.h"
> +#include "migration/register.h"
> +#include "exec/ram_addr.h"
>  
>  #define MSIX_CAP_LENGTH 12
>  
> +#define VFIO_SAVE_FLAG_SETUP 0
> +#define VFIO_SAVE_FLAG_DEV_STATE 1
> +
>  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
>  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
>  static void vfio_vm_change_state_handler(void *pv, int running, RunState 
> state);
> @@ -2639,6 +2644,190 @@ static void 
> vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
>  vdev->req_enabled = false;
>  }
>  
> +static uint64_t vfio_dirty_log_sync(VFIOPCIDevice *vdev)
> +{
> +RAMBlock *block;
> +struct vfio_device_get_dirty_bitmap *d;
> +uint64_t page = 0;
> +ram_addr_t size;
> +unsigned long nr, bitmap;
> +
> +RAMBLOCK_FOREACH(block) {
> +size = block->used_length;
> +nr = size >> TARGET_PAGE_BITS;
> +bitmap = (BITS_TO_LONGS(nr) + 1) * sizeof(unsigned long);
> +d = g_malloc0(sizeof(*d) +  bitmap);
> +d->start_addr = block->offset;
> +d->page_nr = nr;
> +if (ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_DIRTY_BITMAP, d)) {
> +error_report("vfio: Failed to get device dirty bitmap");
> +g_free(d);
> +goto exit;
> +}
> +
> +if (d->page_nr) {
> +cpu_physical_memory_set_dirty_lebitmap(
> + (unsigned long *)&d->dirty_bitmap,
> + d->start_addr, d->page_nr);
> +page += d->page_nr;
> +}
> +g_free(d);
> +}
> +
> +exit:
> +return page;
> +}
> +
> +static void vfio_save_live_pending(QEMUFile *f, void *opaque, uint64_t 
> max_size,
> +   uint64_t *non_postcopiable_pending,
> +   uint64_t *postcopiable_pending)
> +{
> +VFIOPCIDevice *vdev = opaque;
> +uint64_t pending;
> +
> +qemu_mutex_lock_iothread();
> +rcu_read_lock();
> +pending = vfio_dirty_log_sync(vdev);
> +rcu_read_unlock();
> +qemu_mutex_unlock_iothread();
> +*non_postcopiable_pending += pending;
> +}
> +
> +static int vfio_load(QEMUFile *f, void *opaque, int version_id)
> +{
> +VFIOPCIDevice *vdev = opaque;
> +PCIDevice *pdev = &vdev->pdev;
> +int sz = vdev->device_state.size - VFIO_DEVICE_STATE_OFFSET;
> +uint8_t *buf = NULL;
> +uint32_t ctl, msi_lo, msi_hi, msi_data, bar_cfg, i;
> +bool msi_64bit;
> +
> +if (qemu_get_byte(f) == VFIO_SAVE_FLAG_SETUP) {
> +goto exit;
> +}

If you're building something complex like this, you might want to add
some version flags at the start and a canary at the end to detect
corruption.

Also note that the migration could fail at any point, so calling
qemu_file_get_error is good practice before acting on data you've
read with qemu_get_be*, since that data could be bogus if the stream
has already failed.
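
A rough sketch of that shape, kept self-contained: Stream, get_be32(), load_state() and the DEMO_* constants are all invented for illustration and are not QEMU's QEMUFile API. The stream carries a version word first and a canary word last, and the sticky error is checked before any value read from it is trusted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_VERSION 1u
#define DEMO_CANARY  0xC0FFEE42u

/* Hypothetical in-memory stream with a sticky error indicator. */
typedef struct Stream {
    const uint8_t *data;
    size_t len;
    size_t pos;
    int error;
} Stream;

/* Read a big-endian 32-bit value; on a short read, set the sticky
 * error and return 0, which the caller must not treat as real data. */
static uint32_t get_be32(Stream *s)
{
    uint32_t v = 0;

    assert(s != NULL);
    if (s->error || s->len - s->pos < 4) {
        s->error = -1;
        return 0;
    }
    for (int i = 0; i < 4; i++) {
        v = (v << 8) | s->data[s->pos++];
    }
    return v;
}

/* Layout: version, payload, canary. Returns 0 on success, -1 on a
 * stream error, -2 on a version mismatch, -3 on a bad canary. */
static int load_state(Stream *s, uint32_t *payload)
{
    uint32_t version = get_be32(s);

    if (s->error) {
        return -1;
    }
    if (version != DEMO_VERSION) {
        return -2;
    }

    uint32_t value = get_be32(s);
    uint32_t canary = get_be32(s);

    if (s->error) {
        return -1;      /* check before acting on what was read */
    }
    if (canary != DEMO_CANARY) {
        return -3;
    }
    *payload = value;   /* commit only after every check has passed */
    return 0;
}
```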

> +/* retore pci bar configuration */
> +ctl = pci_default_read_config(pdev, PCI_COMMAND, 2);
> +vfio_pci_write_config(pdev, PCI_COMMAND,
> +  ctl & (!(PCI_COMMAND_IO | PCI_COMMAND_MEMORY)), 2);
> +for (i = 0; i < PCI_ROM_SLOT; i++) {
> +bar_cfg = qemu_get_be32(f);
> +vfio_pci_write_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, bar_cfg, 4);
> +}
> +vfio_pci_write_config(pdev, PCI_COMMAND,
> +  ctl | PCI_COMMAND_IO | PCI_COMMAND_MEMORY, 2);
> +
> +/* restore msi configuration */
> +ctl = pci_default_read_config(pdev, pdev->msi_cap + PCI_MSI_FLAGS, 2);
> +msi_64bit = !!(ctl & PCI_MSI_FLAGS_64BIT);
> +
> +vfio_pci_write_config(&vdev->pdev,
> +  pdev->msi_cap + PCI_MSI_FLAGS,
> +  ctl & (!PCI_MSI_FLAGS_ENABLE), 2);
> +
> +msi_lo =

Re: [Qemu-devel] [RFC PATCH V4 2/4] vfio: Add vm status change callback to stop/restart the mdev device

2018-04-17 Thread Kirti Wankhede



On 4/17/2018 8:13 PM, Alex Williamson wrote:
> On Tue, 17 Apr 2018 13:40:32 +
> "Zhang, Yulei"  wrote:
> 
>>> -Original Message-
>>> From: Alex Williamson [mailto:alex.william...@redhat.com]
>>> Sent: Tuesday, April 17, 2018 4:23 AM
>>> To: Kirti Wankhede 
>>> Cc: Zhang, Yulei ; qemu-devel@nongnu.org; Tian,
>>> Kevin ; joonas.lahti...@linux.intel.com;
>>> zhen...@linux.intel.com; Wang, Zhi A ;
>>> dgilb...@redhat.com; quint...@redhat.com
>>> Subject: Re: [RFC PATCH V4 2/4] vfio: Add vm status change callback to
>>> stop/restart the mdev device
>>>
>>> On Mon, 16 Apr 2018 20:14:27 +0530
>>> Kirti Wankhede  wrote:
>>>   
 On 4/10/2018 11:32 AM, Yulei Zhang wrote:  
> VM status change handler is added to change the vfio pci device
> status during the migration, write the demanded device status
> to the DEVICE STATUS subregion to stop the device on the source side
> before fetch its status and start the deivce on the target side
> after restore its status.
>
> Signed-off-by: Yulei Zhang 
> ---
>  hw/vfio/pci.c | 20 
>  include/hw/vfio/vfio-common.h |  1 +
>  linux-headers/linux/vfio.h|  6 ++
>  roms/seabios  |  2 +-
>  4 files changed, 28 insertions(+), 1 deletion(-)
>
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index f98a9dd..13d8c73 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -38,6 +38,7 @@
>
>  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
>  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> +static void vfio_vm_change_state_handler(void *pv, int running,  
>>> RunState state);  
>
>  /*
>   * Disabling BAR mmaping can be slow, but toggling it around INTx can
> @@ -2896,6 +2897,7 @@ static void vfio_realize(PCIDevice *pdev, Error  
>>> **errp)  
>  vfio_register_err_notifier(vdev);
>  vfio_register_req_notifier(vdev);
>  vfio_setup_resetfn_quirk(vdev);
> +  
>>> qemu_add_vm_change_state_handler(vfio_vm_change_state_handler,
>>> vdev);  
>
>  return;
>
> @@ -2982,6 +2984,24 @@ post_reset:
>  vfio_pci_post_reset(vdev);
>  }
>
> +static void vfio_vm_change_state_handler(void *pv, int running,  
>>> RunState state)  
> +{
> +VFIOPCIDevice *vdev = pv;
> +VFIODevice *vbasedev = &vdev->vbasedev;
> +uint8_t dev_state;
> +uint8_t sz = 1;
> +
> +dev_state = running ? VFIO_DEVICE_START : VFIO_DEVICE_STOP;
> +
> +if (pwrite(vdev->vbasedev.fd, &dev_state,
> +   sz, vdev->device_state.offset) != sz) {
> +error_report("vfio: Failed to %s device", running ? "start" : 
> "stop");
> +return;
> +}
> +
> +vbasedev->device_state = dev_state;
> +}
> +  

 Is it expected to trap device_state region by vendor driver?
 Can this information be communicated to vendor driver through an ioctl?  
>>>
>>> Either the mdev vendor driver or vfio bus driver (ie. vfio-pci) would
>>> be providing REGION_INFO for this region, so the vendor driver is
>>> already in full control here using existing ioctls.  I don't see that
>>> we need new ioctls, we just need to fully define the API of the
>>> proposed regions here.
>>>   
>> If the device state region is mmaped, we may not be able to use
>> region device state offset to convey the running state. It may need
>> a new ioctl to set the device state.
> 
> The vendor driver defines the mmap'ability of the region, the vendor
> driver is still in control.  The API of the region and the
> implementation by the vendor driver should account for handling
> mmap'able sections within the region.  Thanks,
> 
> Alex
> 
>  

If this same region is to be used for communicating state or other
parameters instead of an ioctl, the first page of the region may need to
be reserved, since an mmappable region's start address must be page
aligned. Is this API going to use the 4K of the reserved part of this
region? Instead of carving a section out of the region, are there any
disadvantages to adding an ioctl?
Maybe defining a single ioctl with different flags (GET_*/SET_*)
would work?

 Here only device state is conveyed to vendor driver but knowing
 'RunState' in vendor driver is very useful and vendor driver can take
 necessary action accordingly like RUN_STATE_PAUSED indicating that VM  
>>> is  
 in paused state, similarly RUN_STATE_SUSPENDED,
 RUN_STATE_FINISH_MIGRATE, RUN_STATE_IO_ERROR. If these states are
 handled properly, all the cases can be supported with same interface
 like VM suspend-resume, VM pause-restore.  
>>>
>>> I agree, but let's remember that we're talking about device state, not

Re: [Qemu-devel] [PATCH v6 2/7] hw/misc: add vmcoreinfo device

2018-04-17 Thread Cole Robinson

On 10/20/2017 02:48 PM, Eduardo Habkost wrote:
> On Sun, Oct 15, 2017 at 04:56:28AM +0300, Michael S. Tsirkin wrote:
>> On Tue, Oct 10, 2017 at 03:01:10PM -0300, Eduardo Habkost wrote:
>>> On Tue, Oct 10, 2017 at 04:06:28PM +0100, Daniel P. Berrange wrote:
 On Tue, Oct 10, 2017 at 05:00:18PM +0200, Marc-André Lureau wrote:
> Hi
>
> On Tue, Oct 10, 2017 at 10:31 AM, Daniel P. Berrange
>  wrote:
>> On Tue, Oct 10, 2017 at 12:44:26AM +0300, Michael S. Tsirkin wrote:
>>> On Mon, Oct 09, 2017 at 02:02:18PM +0100, Daniel P. Berrange wrote:
 On Mon, Oct 09, 2017 at 02:43:44PM +0200, Igor Mammedov wrote:
> On Mon, 9 Oct 2017 12:03:36 +0100
> "Daniel P. Berrange"  wrote:
>
>> On Mon, Sep 11, 2017 at 06:59:24PM +0200, Marc-André Lureau wrote:
>>> See docs/specs/vmcoreinfo.txt for details.
>>>
>>> "etc/vmcoreinfo" fw_cfg entry is added when using "-device 
>>> vmcoreinfo".
>>
>> I'm wondering if you considered just adding the entry to fw_cfg by
>> default, without requiring any -device arg ? Unless I'm 
>> misunderstanding,
>> this doesn't feel like a device to me - its just a well known bucket
>> in fw_cfg IIUC ?  Obviously its existance would need to be tied to
>> the latest machine type for ABI reasons though. The benefit of this
>> is that it would "just work" without us having to plumb it through to
>> all the downstream applications that use QEMU for mgmt guest 
>> (OpenStack,
>> oVirt, GNOME Boxes, virt-manager, and countless other mgmt apps).
> it follows model set by pvpanic device, it's easier to manage from 
> migration
> POV, one could use it even for old machine types with new qemu (just 
> by adding
> device, it makes instance not backwards migratable to old qemu but 
> should work
> for forward migration) and if user doesn't need it, device could be 
> just omitted
> from CLI.

 Sure but it means that in effect no one will have this functionality 
 enabled
 for several years. pvpanic has been around a long time and I rarely 
 see it
 present in configured guests :-(


 Regards,
 Daniel
>>>
>>> libvirt runs with -nodefaults, right? I'd argue pretty strongly 
>>> -nodefaults
>>> shouldn't add optional devices anyway.
>>
>> This isn't really adding a device though is it - it is just a well known
>> location in fw_cfg to receive data.
>
> Enabling the device on some configurations by default can be done as a
> follow-up patch. Can we get this series reviewed & merged?

 The problem with the -device approach + turning it on by default is that 
 there
 is no way to turn it off again if you don't want it. eg there's way to undo
 an implicit '-device foo' except via -nodefaults, but since libvirt uses 
 that
 already it would negate the effect of enabling it by default 
 unconditionally.
>>>
>>> It's still possible to add a -machine option that can
>>> enable/disable automatic creation of the device.
>>>
>>> But I also don't see why it needs to be implemented using -device
>>> if it's not really a device.  A boolean machine or fw_cfg
>>> property is good enough for that.
>>
>> It certainly feels like a device. It has state
>> (that needs to be migrated), it has a host/guest interface.
> 
> (Sorry for the late reply)
> 
> That's convincing enough to me.  :)
> 
> 

 Your previous approach of "-global fw_cfg.vmcoreinfo=on" is nicer in this
 respect, as you can trivially turn it on/off, overriding the default state
 in both directions.
>>>
>>> Both "-global fw_cfg.vmcoreinfo=on|off" and
>>> "-machine vmcoreinfo=on|off" sound good enough to me.
>>
>>
>> Certainly not a fw cfg flag. Can be a machine flag I guess
>> but then we'd have to open-code each such device.
>> And don't forget auto - this is what Daniel asks for.
> 
> I'm not sure Daniel is really asking for "auto": he is just
> asking for a way to disable the new default.  If "vmcoreinfo=off"
> and "vmcoreinfo=off" works, there's no need for a user-visible
> "auto" value.
> 
> (Actually, "auto" values makes compatibility code even messier,
> because we would need one additional compat property/field to
> tell QEMU what "auto" means on each machine)
> 

Reviving this... did any follow up changes happen?

Marc-André patched virt-manager a few months back to enable -device
vmcoreinfo for new VMs:

https://www.redhat.com/archives/virt-tools-list/2018-February/msg00020.html

And I see there's at least a bug tracking adding this to openstack for
new VMs:

https://bugzilla.redhat.com/show_bug.cgi?id=1555276

If this feature doesn't really have any downsides, it would be nice to
get this tied to new

Re: [Qemu-devel] [PATCH] Show values and description when using "qom-list"

2018-04-17 Thread Eric Blake

On 04/16/2018 07:00 AM, Perez Blanco, Ricardo (Nokia - BE/Antwerp) wrote:
> Hi,
> 
> A new patch (to be rebase on top of my previous one). 

A patch-to-a-patch doesn't work well.  Instead, run:

git rebase -i origin

then mark the second patch as 'squash' before closing the editor, and
git will merge the two patches into one.  Then resend things with 'v2'
in the subject line, by using 'git send-email -v2 ...'.

More patch submission hints at https://wiki.qemu.org/Contribute/SubmitAPatch

> 
> From 77f7217c07d5e3892f26082f220954678eb375b3 Mon Sep 17 00:00:00 2001
> From: Ricardo Perez Blanco 
> Date: Mon, 16 Apr 2018 13:51:42 +0200
> Subject: [PATCH] [PATCHv2] Show values and description when using "qom-list"
> 
> For debugging purposes it is very useful to:

This is not in 'git send-email' format, which makes it harder for our
automated tooling to evaluate your patch.

>  - See the description of the field. This information is already
>filled
>in but not shown in "qom-list" command.
>  - Display value of the field. So far, only well known types are
>implemented (string, str, int, uint, bool).
> 
> Signed-off-by: Ricardo Perez Blanco 
> ---
>  qmp.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/qmp.c b/qmp.c
> index 750b5d0..5be9d8d 100644
> --- a/qmp.c
> +++ b/qmp.c
> @@ -249,12 +249,14 @@ ObjectPropertyInfoList *qmp_qom_list(const char *path, 
> Error **errp)
>  if (g_ascii_strncasecmp(entry->value->type, "int", 3) == 0) {
>  Error **errp = NULL;
>  entry->value->value = g_strdup_printf("%ld",
> -object_property_get_int(obj, entry->value->name, errp));
> +(long int) object_property_get_int(
> +obj, entry->value->name, errp));

This is wrong.  Casting 'int64_t' to 'long int' on a 32-bit platform
silently truncates the value.  You don't want the cast; instead, you
should be using "%"PRId64 in place of "%ld", so that your printf format
always matches the correct spelling corresponding to the 64-bit value
you will be printing.
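
For reference, a small standalone sketch of the format macros from <inttypes.h>. The wrappers format_i64() and format_u64() are illustrative names only; in the patch the same format strings would go straight into g_strdup_printf().

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Format a signed 64-bit value without any cast: PRId64 expands to
 * the right conversion for int64_t on both 32-bit and 64-bit hosts. */
static int format_i64(char *buf, size_t len, int64_t v)
{
    assert(buf != NULL);
    return snprintf(buf, len, "%" PRId64, v);
}

/* Unsigned counterpart, using PRIu64 in place of "%lu". */
static int format_u64(char *buf, size_t len, uint64_t v)
{
    assert(buf != NULL);
    return snprintf(buf, len, "%" PRIu64, v);
}
```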

> -Original Message-
> From: no-re...@patchew.org [mailto:no-re...@patchew.org] 
> Sent: Friday, April 13, 2018 2:54 PM
> To: Perez Blanco, Ricardo (Nokia - BE/Antwerp) 
> 
> Cc: f...@redhat.com; qemu-devel@nongnu.org; dgilb...@redhat.com; 
> arm...@redhat.com
> Subject: Re: [Qemu-devel] [PATCH] Show values and description when using 
> "qom-list"
> 
> Hi,
> 

Also, you are top-posting, which makes it hard to follow your
conversation.  On technical lists, it is better to reply inline instead
of top-posting, and to trim content that is not essential to your reply.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


Re: [Qemu-devel] [PATCH] fpu/softfloat: check for Inf / x or 0 / x before /0

2018-04-17 Thread Emilio G. Cota

On Mon, Apr 16, 2018 at 14:54:42 +0100, Alex Bennée wrote:
> The re-factoring of div_floats changed the order of checking meaning
> an operation like -inf/0 erroneously raises the divbyzero flag.
> IEEE-754 (2008) specifies this should only occur for operations on
> finite operands.
> 
> We fix this by moving the check on the dividend being Inf/0 to before
> the divisor is zero check.
> 
> Signed-off-by: Alex Bennée 
> Cc: Bastian Koppelmann 

I can confirm this fixes the issue -- just checked with a modified
version of fp-test, see below.

Note that in fp-test I am not checking for flags that are raised
when none are expected, because doing so gives quite a few errors.
Just noticed that enabling this check yields 1049 of these errors for
v2.11, and before this patch that number was 1087. After this
patch, it is again brought down to 1049. IOW, the test cases in
fp-test raise exactly the same flags as v2.11, which is good to know.

The 1049 errors are probably false positives -- at least a big
chunk of them should be, given that "-t host" gives even more errors.
I am tempted to keep the flag check and whitelist these errors
though, which would catch regressions such as the one we're fixing here.
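
For anyone following along, the ordering that the patch restores can be sketched with a toy classifier. This is not the softfloat code: demo_div_flags() and the DEMO_* names are invented, and NaN handling is simplified to "quiet NaN, no flags". It only shows why the Inf/x and 0/x check has to come before the divide-by-zero check.

```c
#include <assert.h>

/* Toy operand classes and exception flags, for illustration only. */
enum demo_class { DEMO_ZERO, DEMO_NORMAL, DEMO_INF, DEMO_NAN };
enum { DEMO_FLAG_INVALID = 1, DEMO_FLAG_DIVBYZERO = 2 };

/* Return the exception flags that a / b should raise. */
static int demo_div_flags(enum demo_class a, enum demo_class b)
{
    assert(a <= DEMO_NAN && b <= DEMO_NAN);

    /* Simplification: treat every NaN as quiet, raising nothing. */
    if (a == DEMO_NAN || b == DEMO_NAN) {
        return 0;
    }
    /* Inf / Inf and 0 / 0 are invalid operations. */
    if (a == b && (a == DEMO_INF || a == DEMO_ZERO)) {
        return DEMO_FLAG_INVALID;
    }
    /* Inf / x or 0 / x: checked before the zero divisor, so that
     * Inf / 0 yields Inf without raising divbyzero. */
    if (a == DEMO_INF || a == DEMO_ZERO) {
        return 0;
    }
    /* Only a finite, non-zero dividend reaches this check. */
    if (b == DEMO_ZERO) {
        return DEMO_FLAG_DIVBYZERO;
    }
    return 0;
}
```

Swapping the last two checks reproduces the regression: the Inf / 0 case would then report divbyzero.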

Here is the report file with the 1049 failing test cases:
  http://www.cs.columbia.edu/~cota/qemu/fp-test-after-inf-patch.txt

Thanks,

Emilio

Re: [Qemu-devel] [RFC PATCH V4 1/4] vfio: introduce a new VFIO subregion for mdev device migration support

2018-04-17 Thread Dr. David Alan Gilbert

* Alex Williamson (alex.william...@redhat.com) wrote:
> On Tue, 17 Apr 2018 10:36:53 +
> "Zhang, Yulei"  wrote:
> 
> > > -Original Message-
> > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > Sent: Tuesday, April 17, 2018 4:15 AM
> > > To: Kirti Wankhede 
> > > Cc: Zhang, Yulei ; qemu-devel@nongnu.org; Tian,
> > > Kevin ; joonas.lahti...@linux.intel.com;
> > > zhen...@linux.intel.com; Wang, Zhi A ;
> > > dgilb...@redhat.com; quint...@redhat.com
> > > Subject: Re: [RFC PATCH V4 1/4] vfio: introduce a new VFIO subregion for
> > > mdev device migration support
> > > 
> > > On Mon, 16 Apr 2018 20:14:03 +0530
> > > Kirti Wankhede  wrote:
> > >   
> > > > On 4/10/2018 11:32 AM, Yulei Zhang wrote:  
> > > > > New VFIO sub region VFIO_REGION_SUBTYPE_DEVICE_STATE is added
> > > > > to fetch and restore the status of mdev device vGPU during the
> > > > > live migration.
> > > > >
> > > > > Signed-off-by: Yulei Zhang 
> > > > > ---
> > > > >  hw/vfio/pci.c  | 25 -
> > > > >  hw/vfio/pci.h  |  2 ++
> > > > >  linux-headers/linux/vfio.h |  9 ++---
> > > > >  3 files changed, 32 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > > > index c977ee3..f98a9dd 100644
> > > > > --- a/hw/vfio/pci.c
> > > > > +++ b/hw/vfio/pci.c
> > > > > @@ -32,6 +32,7 @@
> > > > >  #include "pci.h"
> > > > >  #include "trace.h"
> > > > >  #include "qapi/error.h"
> > > > > +#include "migration/blocker.h"
> > > > >
> > > > >  #define MSIX_CAP_LENGTH 12
> > > > >
> > > > > @@ -2821,6 +2822,25 @@ static void vfio_realize(PCIDevice *pdev, 
> > > > > Error  
> > > **errp)  
> > > > >  vfio_vga_quirk_setup(vdev);
> > > > >  }
> > > > >
> > > > > +struct vfio_region_info *device_state;
> > > > > +/* device state region setup */
> > > > > +if (!vfio_get_dev_region_info(&vdev->vbasedev,
> > > > > +VFIO_REGION_TYPE_PCI_VENDOR_TYPE,  
> > > 
> > > This is not how VENDOR_TYPE is meant to be used.  We have a 32-bit type
> > > and 32-bit sub-type.  When bit 31 of type is set (ie. VENDOR_TYPE),
> > > then the low 16-bits (VENDOR_MASK) defines a vendor specific set of
> > > sub-types.  Using it as above, you're stomping on vendor 0x's
> > > vendor sub-types.  I would expect non-vendor specific types to leave
> > > bit 31 of the type field clear.  We could then define a universal type
> > > for device state with a sub-type specifying the API such that we could
> > > revise the API by providing an updated sub-type.  Or perhaps there are
> > > devices that want separate save vs restore regions, we could do that
> > > with the sub-type.  
> > 
> > Got it. For the subtype, can we use it to specify how to use the region,
> > to indicate it can be mmaped or not. 
> 
> No, REGION_INFO tells us about mmap'ability and device state regions
> can certainly support the sparse mmap capability, we already have full
> ability to describe the mmap features of a region without dedicating a
> sub-type to it.  A sub-type must have a defined API, but use the
> existing mechanisms for mmap.
> 
> > > > > +VFIO_REGION_SUBTYPE_DEVICE_STATE, &device_state)) {
> > > > > +memcpy(&vdev->device_state, device_state,
> > > > > +   sizeof(struct vfio_region_info));
> > > > > +g_free(device_state);
> > > > > +} else {
> > > > > +error_setg(&vdev->migration_blocker,
> > > > > +"Migration disabled: cannot support device state 
> > > > > region");
> > > > > +migrate_add_blocker(vdev->migration_blocker, &err);  
> > > 
> > > This appears as if it's going to be rather verbose and generate errors
> > > for anything not supporting migration, which is currently everything.
> > > Maybe there should be an OnOffAuto vfio-pci device option for the user
> > > to specify migration such that if migration=on is specified the device
> > > will fail if it's not available.  Otherwise the default would be auto.
> > >   
> > We only get this error message when we try to migrate the guest with
> > vfio-pci device which doesn't contain the device_state region, just like 
> > currently we set unmigratable=1.
> 
> Ok, I still wonder though if management tools might want the ability to
> fail the device if it doesn't support migration via an OnOffAuto
> option.  I can imagine a customer that doesn't want to compromise the
> migrate'ability of their VM and doesn't want to wait until a migration
> is attempted to find out it's blocked.  Thanks,

The fix here is to check the return value from migrate_add_blocker
and bail if it's not happy.
If QEMU is started with --only-migratable it'll fail with the error
   'disallowing migration blocker (--only migratable) for:' and then
your message.
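
Roughly, the control flow I mean looks like the sketch below. Everything in it is a stand-in: demo_add_blocker() and demo_realize() are invented names, and a plain string and an int replace the Error objects that the real migrate_add_blocker() deals with.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Models QEMU having been started with --only-migratable. */
static bool only_migratable;

/* Stand-in for adding a migration blocker: refuse when the user
 * asked for a migratable-only configuration. */
static int demo_add_blocker(const char *reason)
{
    assert(reason != NULL);
    if (only_migratable) {
        return -1;
    }
    return 0;
}

/* Stand-in for the realize path: if the device state region is
 * missing, add a blocker, and bail out if that was refused. */
static int demo_realize(bool has_device_state_region)
{
    if (has_device_state_region) {
        return 0;
    }
    if (demo_add_blocker("Migration disabled: cannot support device "
                         "state region") < 0) {
        return -1;   /* fail the device instead of carrying on */
    }
    return 0;
}
```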

Dave

> Alex
>  
> > > > > +if (err) {
> > > > >

Re: [Qemu-devel] qapi: [PATCH v2] Implement query-usbhost QMP command

2018-04-17 Thread Eric Blake

On 04/14/2018 04:29 AM, Alexander Kappner wrote:
> Implement a QMP command similar to the HMP's "info usbhost" command.
> This allows a QMP client to query which USB devices may be available
> for redirection. Because the availability of the command needs to
> depend on the target's (not the build host's) USB configuration,
> a stub function in host-stub.c is provided for targets without USB support.
> 
> v2 of this patch resolves build failure under some configurations
> without libusb.
> 
> Signed-off-by: Alexander Kappner 
> ---
>  hw/usb/host-libusb.c | 64 
> 
>  hw/usb/host-stub.c   |  9 
>  qapi/misc.json   | 62 ++
>  3 files changed, 135 insertions(+)
> 

> +++ b/qapi/misc.json
> @@ -270,6 +270,46 @@
>  { 'command': 'query-kvm', 'returns': 'KvmInfo' }
>  
>  ##
> +# @query-usbhost:
> +#
> +# Returns information about USB devices available on the host
> +#
> +# Returns: a [UsbDeviceInfo]. Returns an error if compiled without
> +# CONFIG_USB_LIBUSB

s/a [UsbDeviceInfo]/a list of UsbDeviceInfo

> +#
> +# Since: TODO (maintainer insert version number if mainlined)

2.13 (it's easier to guess the next version, on the likely chance that
it does not need tweaking, than it is to force the maintainer to have to
tweak the patch on acceptance)


> +##
> +{ 'command': 'query-usbhost', 'returns': ['UsbDeviceInfo'] }
> +
> +##
>  # @UuidInfo:
>  #
>  # Guest UUID information (Universally Unique Identifier).
> @@ -876,6 +916,28 @@
> 'regions': ['PciMemoryRegion']} }
>  
>  ##
> +# @UsbDeviceInfo:
> +#
> +# @speed: the speed
> +#
> +# @id_vendor: idVendor field from device descriptor

Please, new QMP interfaces should favor '-' over '_' in naming; so this
should be id-vendor, and so on.

> +#
> +# @id_product: idProduct field from device descriptor
> +#
> +# @str_product: string descriptor referenced by iProduct index, if any

What's the difference between idProduct and iProduct in the text here?

> +#
> +# @str_manufacturer: string descriptor referenced by iManufacturer index, if 
> any

Is the 'str_' (or my preferred switch to 'str-') prefix necessary; or
can this field just be 'manufacturer'?

> +#
> +# @dev_addr: address on bus that device is connected to
> +#
> +# @bus_num: bus number device is connected to
> +##
> +{ 'struct': 'UsbDeviceInfo',
> +  'data':
> +  {'speed': 'int', 'id_vendor': 'int', 'id_product' : 'int', 'str_product': 
> 'str',
> +   'b_device_class': 'int', 'str_manufacturer' : 'str', 'dev_addr' : 'int', 
> 'bus_num' : 'int'} }

Long line; please fit things within 80 columns.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




[Qemu-devel] [PATCH 4/4] linux-headers: drop kvm_para.h

2018-04-17 Thread Michael S. Tsirkin

Unused now and can be removed.

Signed-off-by: Michael S. Tsirkin 
---
 linux-headers/asm-arm/kvm_para.h |   2 -
 linux-headers/asm-arm64/kvm_para.h   |   1 -
 linux-headers/asm-generic/kvm_para.h |   4 --
 linux-headers/asm-mips/kvm_para.h|   5 --
 linux-headers/asm-powerpc/epapr_hcalls.h |  99 --
 linux-headers/asm-powerpc/kvm_para.h |  98 -
 linux-headers/asm-s390/kvm_para.h|   8 ---
 linux-headers/asm-x86/kvm_para.h | 118 ---
 linux-headers/linux/kvm_para.h   |  35 -
 9 files changed, 370 deletions(-)
 delete mode 100644 linux-headers/asm-arm/kvm_para.h
 delete mode 100644 linux-headers/asm-arm64/kvm_para.h
 delete mode 100644 linux-headers/asm-generic/kvm_para.h
 delete mode 100644 linux-headers/asm-mips/kvm_para.h
 delete mode 100644 linux-headers/asm-powerpc/epapr_hcalls.h
 delete mode 100644 linux-headers/asm-powerpc/kvm_para.h
 delete mode 100644 linux-headers/asm-s390/kvm_para.h
 delete mode 100644 linux-headers/asm-x86/kvm_para.h
 delete mode 100644 linux-headers/linux/kvm_para.h

diff --git a/linux-headers/asm-arm/kvm_para.h b/linux-headers/asm-arm/kvm_para.h
deleted file mode 100644
index baacc49..000
--- a/linux-headers/asm-arm/kvm_para.h
+++ /dev/null
@@ -1,2 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
-#include 
diff --git a/linux-headers/asm-arm64/kvm_para.h 
b/linux-headers/asm-arm64/kvm_para.h
deleted file mode 100644
index 14fab8f..000
--- a/linux-headers/asm-arm64/kvm_para.h
+++ /dev/null
@@ -1 +0,0 @@
-#include 
diff --git a/linux-headers/asm-generic/kvm_para.h 
b/linux-headers/asm-generic/kvm_para.h
deleted file mode 100644
index 486f0af..000
--- a/linux-headers/asm-generic/kvm_para.h
+++ /dev/null
@@ -1,4 +0,0 @@
-/*
- * There isn't anything here, but the file must not be empty or patch
- * will delete it.
- */
diff --git a/linux-headers/asm-mips/kvm_para.h 
b/linux-headers/asm-mips/kvm_para.h
deleted file mode 100644
index dbb2464..000
--- a/linux-headers/asm-mips/kvm_para.h
+++ /dev/null
@@ -1,5 +0,0 @@
-#ifndef _ASM_MIPS_KVM_PARA_H
-#define _ASM_MIPS_KVM_PARA_H
-
-
-#endif /* _ASM_MIPS_KVM_PARA_H */
diff --git a/linux-headers/asm-powerpc/epapr_hcalls.h 
b/linux-headers/asm-powerpc/epapr_hcalls.h
deleted file mode 100644
index 6cca559..000
--- a/linux-headers/asm-powerpc/epapr_hcalls.h
+++ /dev/null
@@ -1,99 +0,0 @@
-/* SPDX-License-Identifier: ((GPL-2.0+ WITH Linux-syscall-note) OR 
BSD-3-Clause) */
-/*
- * ePAPR hcall interface
- *
- * Copyright 2008-2011 Freescale Semiconductor, Inc.
- *
- * Author: Timur Tabi 
- *
- * This file is provided under a dual BSD/GPL license.  When using or
- * redistributing this file, you may do so under either license.
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions are met:
- * * Redistributions of source code must retain the above copyright
- *   notice, this list of conditions and the following disclaimer.
- * * Redistributions in binary form must reproduce the above copyright
- *   notice, this list of conditions and the following disclaimer in the
- *   documentation and/or other materials provided with the distribution.
- * * Neither the name of Freescale Semiconductor nor the
- *   names of its contributors may be used to endorse or promote products
- *   derived from this software without specific prior written permission.
- *
- *
- * ALTERNATIVELY, this software may be distributed under the terms of the
- * GNU General Public License ("GPL") as published by the Free Software
- * Foundation, either version 2 of that License or (at your option) any
- * later version.
- *
- * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
- * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
- * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
- * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
- * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
- * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
- * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
- * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
- * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _ASM_POWERPC_EPAPR_HCALLS_H
-#define _ASM_POWERPC_EPAPR_HCALLS_H
-
-#define EV_BYTE_CHANNEL_SEND   1
-#define EV_BYTE_CHANNEL_RECEIVE 2
-#define EV_BYTE_CHANNEL_POLL   3
-#define EV_INT_SET_CONFIG  4
-#define EV_INT_GET_CONFIG  5
-#define EV_INT_SET_MASK 6
-#define
