date:20161017

Re: [Qemu-devel] [RFC v5 0/8] KVM PCI/MSI passthrough with mach-virt

2016-10-17 Thread Fam Zheng

On Mon, 10/17 22:56, Auger Eric wrote:
> Hi,
> On 17/10/2016 22:16, no-re...@ec2-52-6-146-230.compute-1.amazonaws.com wrote:
> > Hi,
> > 
> > Your series failed automatic build test. Please find the testing commands and
> > their output below. If you have docker installed, you can probably reproduce it
> > locally.
> > 
> > Subject: [Qemu-devel] [RFC v5 0/8] KVM PCI/MSI passthrough with mach-virt
> > Type: series
> > Message-id: 1476733110-14293-1-git-send-email-eric.au...@redhat.com
> > 
> > === TEST SCRIPT BEGIN ===
> > #!/bin/bash
> > set -e
> > git submodule update --init dtc
> > # Let docker tests dump environment info
> > export SHOW_ENV=1
> > export J=16
> > make docker-test-quick@centos6
> > make docker-test-mingw@fedora
> > === TEST SCRIPT END ===
> > 
> > Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
> > From https://github.com/patchew-project/qemu
> >  * [new tag] patchew/1476733110-14293-1-git-send-email-eric.au...@redhat.com -> patchew/1476733110-14293-1-git-send-email-eric.au...@redhat.com
> > Switched to a new branch 'test'
> > 465a02b hw: vfio: common: Adapt vfio_listeners for reserved_iova region
> > 25541d6 hw: vfio: common: vfio_prepare_msi_mapping
> > 9b97aec hw: platform-bus: Add platform bus stub
> > 94589db hw: platform-bus: Enable to map any memory region onto the platform-bus
> > 425e1ac memory: memory_region_find_by_name
> > 4a2c82c memory: Add reserved_iova region type
> > fdf9cd8 hw: vfio: common: vfio_get_iommu_type1_info
> > eb2e918 linux-headers: Partial update for MSI IOVA handling
> > 
> > === OUTPUT BEGIN ===
> > Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
> > Cloning into 'dtc'...
> > Submodule path 'dtc': checked out '65cc4d2748a2c2e6f27f1cf39e07a5dbabd80ebf'
> >   BUILD   centos6
> > make[1]: Entering directory '/var/tmp/patchew-tester-tmp-zydd_mdj/src'
> >   ARCHIVE qemu.tgz
> >   ARCHIVE dtc.tgz
> >   COPYRUNNER
> > RUN test-quick in qemu:centos6 
> > Packages installed:
> > SDL-devel-1.2.14-7.el6_7.1.x86_64
> > ccache-3.1.6-2.el6.x86_64
> > epel-release-6-8.noarch
> > gcc-4.4.7-17.el6.x86_64
> > git-1.7.1-4.el6_7.1.x86_64
> > glib2-devel-2.28.8-5.el6.x86_64
> > libfdt-devel-1.4.0-1.el6.x86_64
> > make-3.81-23.el6.x86_64
> > package g++ is not installed
> > pixman-devel-0.32.8-1.el6.x86_64
> > tar-1.23-15.el6_8.x86_64
> > zlib-devel-1.2.3-29.el6.x86_64
> > 
> > Environment variables:
> > PACKAGES=libfdt-devel ccache tar git make gcc g++ zlib-devel glib2-devel SDL-devel pixman-devel epel-release
> > HOSTNAME=a077d39b13de
> > TERM=xterm
> > MAKEFLAGS= -j16
> > HISTSIZE=1000
> > J=16
> > USER=root
> > CCACHE_DIR=/var/tmp/ccache
> > EXTRA_CONFIGURE_OPTS=
> > V=
> > SHOW_ENV=1
> > MAIL=/var/spool/mail/root
> > PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
> > PWD=/
> > LANG=en_US.UTF-8
> > TARGET_LIST=
> > HISTCONTROL=ignoredups
> > SHLVL=1
> > HOME=/root
> > TEST_DIR=/tmp/qemu-test
> > LOGNAME=root
> > LESSOPEN=||/usr/bin/lesspipe.sh %s
> > FEATURES= dtc
> > DEBUG=
> > G_BROKEN_FILENAMES=1
> > CCACHE_HASHDIR=
> > _=/usr/bin/env
> > 
> > Configure options:
> > --enable-werror --target-list=x86_64-softmmu,aarch64-softmmu --prefix=/var/tmp/qemu-build/install
> > No C++ compiler available; disabling C++ specific optional code
> > Install prefix  /var/tmp/qemu-build/install
> > BIOS directory  /var/tmp/qemu-build/install/share/qemu
> > binary directory  /var/tmp/qemu-build/install/bin
> > library directory /var/tmp/qemu-build/install/lib
> > module directory  /var/tmp/qemu-build/install/lib/qemu
> > libexec directory /var/tmp/qemu-build/install/libexec
> > include directory /var/tmp/qemu-build/install/include
> > config directory  /var/tmp/qemu-build/install/etc
> > local state directory   /var/tmp/qemu-build/install/var
> > Manual directory  /var/tmp/qemu-build/install/share/man
> > ELF interp prefix /usr/gnemul/qemu-%M
> > Source path   /tmp/qemu-test/src
> > C compiler  cc
> > Host C compiler   cc
> > C++ compiler  
> > Objective-C compiler cc
> > ARFLAGS   rv
> > CFLAGS  -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g 
> > QEMU_CFLAGS   -I/usr/include/pixman-1 -pthread 
> > -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include   -fPIE -DPIE -m64 
> > -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE 
> > -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings 
> > -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv  
> > -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs 
> > -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers 
> > -Wold-style-declaration -Wold-style-definition -Wtype-limits 
> > -fstack-protector-all
> > LDFLAGS   -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g 
> > make  make
> > install   install
> > python  python -B
> > smbd

[Qemu-devel] [PATCH] docs/rcu.txt: Fix minor typo

2016-10-17 Thread Pranith Kumar

s/presented/prevented/

Signed-off-by: Pranith Kumar 
---
 docs/rcu.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/rcu.txt b/docs/rcu.txt
index a70b72c..c84e7f4 100644
--- a/docs/rcu.txt
+++ b/docs/rcu.txt
@@ -145,7 +145,7 @@ The core RCU API is small:
 and then read from there.
 
 RCU read-side critical sections must use atomic_rcu_read() to
-read data, unless concurrent writes are presented by another
+read data, unless concurrent writes are prevented by another
 synchronization mechanism.
 
 Furthermore, RCU read-side critical sections should traverse the
-- 
2.10.1

Re: [Qemu-devel] [PATCH v4 RESEND 0/3] IOMMU: intel_iommu support map and unmap notifications

2016-10-17 Thread Alex Williamson

On Tue, 18 Oct 2016 15:06:55 +1100
David Gibson  wrote:

> On Mon, Oct 17, 2016 at 10:07:36AM -0600, Alex Williamson wrote:
> > On Mon, 17 Oct 2016 18:44:21 +0300
> > "Aviv B.D"  wrote:
> >   
> > > From: "Aviv Ben-David" 
> > > 
> > > * Advertise Cache Mode capability in iommu cap register.
> > >   This capability is controlled by the "cache-mode" property of the
> > >   intel-iommu device. To enable this option, call QEMU with
> > >   "-device intel-iommu,cache-mode=true".
> > > 
> > > * On page cache invalidation in intel vIOMMU, check if the domain belongs
> > >   to a registered notifier, and notify accordingly.
> > > 
> > > Currently this patch still doesn't enable VFIO device support with a
> > > vIOMMU present. Current problems:
> > > * vfio_iommu_map_notify is not aware of the memory range belonging to a
> > >   specific VFIOGuestIOMMU.
> > 
> > Could you elaborate on why this is an issue?
> >   
> > > * memory_region_iommu_replay hangs QEMU on startup while it iterates over
> > >   the 64-bit address space. Commenting out the call to this function
> > >   gives a working VFIO device while a vIOMMU is present.
> > 
> > This has been discussed previously, it would be incorrect for vfio not
> > to call the replay function.  The solution is to add an iommu driver
> > callback to efficiently walk the mappings within a MemoryRegion.  
> 
> Right, replay is a bit of a hack.  There are a couple of other
> approaches that might be adequate without a new callback:
>   - Make the VFIOGuestIOMMU aware of the guest address range mapped
>     by the vIOMMU.  Intel currently advertises that as a full 64-bit
>     address space, but I bet that's not actually true in practice.
>   - Have the IOMMU MR advertise a (minimum) page size for vIOMMU
>     mappings.  That may let you step through the range with greater
>     strides.

Hmm, VT-d supports at least a 39-bit address width and always supports
a minimum 4k page size, so yes that does reduce us from 2^52 steps down
to 2^27, but it's still absurd to walk through the raw address space.
It does however seem correct to create the MemoryRegion with a width
that actually matches the IOMMU capability, but I don't think that's a
sufficient fix by itself.  Thanks,

Alex

Re: [Qemu-devel] [PATCH v4 RESEND 2/3] IOMMU: change iommu_op->translate's is_write to flags, add support to NO_FAIL flag mode

2016-10-17 Thread David Gibson

On Mon, Oct 17, 2016 at 06:44:23PM +0300, Aviv B.D wrote:
> From: "Aviv Ben-David" 
> 
> Support translation attempts that do not report an error to the guest
> on translation failure.
> 
> Signed-off-by: Aviv Ben-David 
> ---
>  exec.c|  3 ++-
>  hw/i386/amd_iommu.c   |  4 +--
>  hw/i386/intel_iommu.c | 70 
> +--
>  include/exec/memory.h |  6 +++--
>  memory.c  |  3 ++-
>  5 files changed, 55 insertions(+), 31 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 374c364..266fa01 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -432,7 +432,8 @@ MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr,
>  break;
>  }
>  
> -iotlb = mr->iommu_ops->translate(mr, addr, is_write);
> +iotlb = mr->iommu_ops->translate(mr, addr,
> + is_write ? IOMMU_WO : IOMMU_RO);
>  addr = ((iotlb.translated_addr & ~iotlb.addr_mask)
>  | (addr & iotlb.addr_mask));
>  *plen = MIN(*plen, (addr | iotlb.addr_mask) - addr + 1);
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index 47b79d9..1f0d76b 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -988,7 +988,7 @@ static inline bool amdvi_is_interrupt_addr(hwaddr addr)
>  }
>  
>  static IOMMUTLBEntry amdvi_translate(MemoryRegion *iommu, hwaddr addr,
> - bool is_write)
> + IOMMUAccessFlags flags)

You've also updated the intel viommu implementation for the new
notifier flags semantics, but none of the others (AMD and sPAPR).
That needs to be done as well.

>  {
>  AMDVIAddressSpace *as = container_of(iommu, AMDVIAddressSpace, iommu);
>  AMDVIState *s = as->iommu_state;
> @@ -1017,7 +1017,7 @@ static IOMMUTLBEntry amdvi_translate(MemoryRegion *iommu, hwaddr addr,
>  return ret;
>  }
>  
> -amdvi_do_translate(as, addr, is_write, &ret);
> +amdvi_do_translate(as, addr, flags & IOMMU_WO, &ret);
>  trace_amdvi_translation_result(as->bus_num, PCI_SLOT(as->devfn),
>  PCI_FUNC(as->devfn), addr, ret.translated_addr);
>  return ret;
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 69730cb..dcf45f0 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -364,7 +364,7 @@ static void vtd_set_frcd_and_update_ppf(IntelIOMMUState *s, uint16_t index)
>  /* Must not update F field now, should be done later */
>  static void vtd_record_frcd(IntelIOMMUState *s, uint16_t index,
>  uint16_t source_id, hwaddr addr,
> -VTDFaultReason fault, bool is_write)
> +VTDFaultReason fault, IOMMUAccessFlags flags)
>  {
>  uint64_t hi = 0, lo;
>  hwaddr frcd_reg_addr = DMAR_FRCD_REG_OFFSET + (((uint64_t)index) << 4);
> @@ -373,7 +373,7 @@ static void vtd_record_frcd(IntelIOMMUState *s, uint16_t index,
>  
>  lo = VTD_FRCD_FI(addr);
>  hi = VTD_FRCD_SID(source_id) | VTD_FRCD_FR(fault);
> -if (!is_write) {
> +if (!(flags == IOMMU_WO || flags == IOMMU_RW)) {
>  hi |= VTD_FRCD_T;
>  }
>  vtd_set_quad_raw(s, frcd_reg_addr, lo);
> @@ -404,7 +404,7 @@ static bool vtd_try_collapse_fault(IntelIOMMUState *s, uint16_t source_id)
>  /* Log and report an DMAR (address translation) fault to software */
>  static void vtd_report_dmar_fault(IntelIOMMUState *s, uint16_t source_id,
>hwaddr addr, VTDFaultReason fault,
> -  bool is_write)
> +  IOMMUAccessFlags flags)
>  {
>  uint32_t fsts_reg = vtd_get_long_raw(s, DMAR_FSTS_REG);
>  
> @@ -415,7 +415,7 @@ static void vtd_report_dmar_fault(IntelIOMMUState *s, uint16_t source_id,
>  return;
>  }
>  VTD_DPRINTF(FLOG, "sid 0x%"PRIx16 ", fault %d, addr 0x%"PRIx64
> -", is_write %d", source_id, fault, addr, is_write);
> +", flags %d", source_id, fault, addr, flags);
>  if (fsts_reg & VTD_FSTS_PFO) {
>  VTD_DPRINTF(FLOG, "new fault is not recorded due to "
>  "Primary Fault Overflow");
> @@ -433,7 +433,7 @@ static void vtd_report_dmar_fault(IntelIOMMUState *s, uint16_t source_id,
>  return;
>  }
>  
> -vtd_record_frcd(s, s->next_frcd_reg, source_id, addr, fault, is_write);
> +vtd_record_frcd(s, s->next_frcd_reg, source_id, addr, fault, flags);
>  
>  if (fsts_reg & VTD_FSTS_PPF) {
>  VTD_DPRINTF(FLOG, "there are pending faults already, "
> @@ -629,7 +629,8 @@ static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level)
>  /* Given the @gpa, get relevant @slptep. @slpte_level will be the last level
>   * of the translation, can be used for deciding the size of large page.
>   */
> -static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa,

Re: [Qemu-devel] [PATCH v4 RESEND 3/3] IOMMU: enable intel_iommu map and unmap notifiers

2016-10-17 Thread David Gibson

On Mon, Oct 17, 2016 at 06:44:24PM +0300, Aviv B.D wrote:
> From: "Aviv Ben-David" 
> 
> Adds a list of registered vtd_as's to the intel iommu state to avoid
> iterating over each PCI device in search of the corresponding domain.
> 
> Signed-off-by: Aviv Ben-David 
> ---
>  hw/i386/intel_iommu.c  | 102 
> ++---
>  hw/i386/intel_iommu_internal.h |   2 +
>  include/hw/i386/intel_iommu.h  |   9 
>  3 files changed, 106 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index dcf45f0..34fc1e8 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -51,6 +51,9 @@ static int vtd_dbgflags = VTD_DBGBIT(GENERAL) | VTD_DBGBIT(CSR);
>  #define VTD_DPRINTF(what, fmt, ...) do {} while (0)
>  #endif
>  
> +static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
> +uint8_t devfn, VTDContextEntry *ce);
> +
>  static void vtd_define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
>  uint64_t wmask, uint64_t w1cmask)
>  {
> @@ -142,6 +145,23 @@ static uint64_t vtd_set_clear_mask_quad(IntelIOMMUState *s, hwaddr addr,
>  return new_val;
>  }
>  
> +static int vtd_get_did_dev(IntelIOMMUState *s, uint8_t bus_num, uint8_t devfn,
> +   uint16_t *domain_id)
> +{
> +VTDContextEntry ce;
> +int ret_fr;
> +
> +assert(domain_id);
> +
> +ret_fr = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
> +if (ret_fr) {
> +return -1;
> +}
> +
> +*domain_id =  VTD_CONTEXT_ENTRY_DID(ce.hi);
> +return 0;
> +}
> +
>  /* GHashTable functions */
>  static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
>  {
> @@ -683,9 +703,6 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa,
>  *reads = (*reads) && (slpte & VTD_SL_R);
>  *writes = (*writes) && (slpte & VTD_SL_W);
>  if (!(slpte & access_right_check)) {
> -VTD_DPRINTF(GENERAL, "error: lack of %s permission for "
> -"gpa 0x%"PRIx64 " slpte 0x%"PRIx64,
> -(flags == IOMMU_WO ? "write" : "read"), gpa, slpte);
>  return (flags == IOMMU_RW || flags == IOMMU_WO) ?
> -VTD_FR_WRITE : -VTD_FR_READ;
>  }
> @@ -734,9 +751,6 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
>  }
>  
>  if (!vtd_context_entry_present(ce)) {
> -VTD_DPRINTF(GENERAL,
> -"error: context-entry #%"PRIu8 "(bus #%"PRIu8 ") "
> -"is not present", devfn, bus_num);
>  return -VTD_FR_CONTEXT_ENTRY_P;
>  } else if ((ce->hi & VTD_CONTEXT_ENTRY_RSVD_HI) ||
> (ce->lo & VTD_CONTEXT_ENTRY_RSVD_LO)) {
> @@ -1065,6 +1079,55 @@ static void vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
>  &domain_id);
>  }
>  
> +static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
> +   uint16_t domain_id, hwaddr addr,
> +   uint8_t am)
> +{
> +IntelIOMMUNotifierNode *node;
> +
> +QLIST_FOREACH(node, &(s->notifiers_list), next) {

It's not really obvious to me why you need this additional list of
IntelIOMMUNotifierNode structures, rather than just the notifier list
already built into each MemoryRegion.

> +VTDAddressSpace *vtd_as = node->vtd_as;
> +uint16_t vfio_domain_id;
> +int ret = vtd_get_did_dev(s, pci_bus_num(vtd_as->bus), vtd_as->devfn,
> +  &vfio_domain_id);
> +if (!ret && domain_id == vfio_domain_id) {
> +IOMMUTLBEntry entry;
> +
> +/* notify unmap */
> +if (node->notifier_flag & IOMMU_NOTIFIER_UNMAP) {
> +VTD_DPRINTF(GENERAL, "Remove addr 0x%"PRIx64 " mask %d",
> +addr, am);
> +entry.target_as = &address_space_memory;
> +entry.iova = addr & VTD_PAGE_MASK_4K;
> +entry.translated_addr = 0;
> +entry.addr_mask = ~VTD_PAGE_MASK(VTD_PAGE_SHIFT_4K + am);
> +entry.perm = IOMMU_NONE;
> +memory_region_notify_iommu(&node->vtd_as->iommu, entry);
> +}
> +
> +/* notify map */
> +if (node->notifier_flag & IOMMU_NOTIFIER_MAP) {
> +hwaddr original_addr = addr;
> +VTD_DPRINTF(GENERAL, "add addr 0x%"PRIx64 " mask %d", addr, am);
> +while (addr < original_addr + (1 << am) * VTD_PAGE_SIZE) {
> +/* call to vtd_iommu_translate */
> +IOMMUTLBEntry entry = s->iommu_ops.translate(
> +                          &node->vtd_as->iommu,
> + addr,
> +

Re: [Qemu-devel] [PATCH v4 RESEND 0/3] IOMMU: intel_iommu support map and unmap notifications

2016-10-17 Thread David Gibson

On Mon, Oct 17, 2016 at 10:07:36AM -0600, Alex Williamson wrote:
> On Mon, 17 Oct 2016 18:44:21 +0300
> "Aviv B.D"  wrote:
> 
> > From: "Aviv Ben-David" 
> > 
> > * Advertise Cache Mode capability in iommu cap register.
> >   This capability is controlled by the "cache-mode" property of the
> >   intel-iommu device. To enable this option, call QEMU with
> >   "-device intel-iommu,cache-mode=true".
> > 
> > * On page cache invalidation in intel vIOMMU, check if the domain belongs
> >   to a registered notifier, and notify accordingly.
> > 
> > Currently this patch still doesn't enable VFIO device support with a
> > vIOMMU present. Current problems:
> > * vfio_iommu_map_notify is not aware of the memory range belonging to a
> >   specific VFIOGuestIOMMU.
> 
> Could you elaborate on why this is an issue?
> 
> > * memory_region_iommu_replay hangs QEMU on startup while it iterates over
> >   the 64-bit address space. Commenting out the call to this function
> >   gives a working VFIO device while a vIOMMU is present.
> 
> This has been discussed previously, it would be incorrect for vfio not
> to call the replay function.  The solution is to add an iommu driver
> callback to efficiently walk the mappings within a MemoryRegion.

Right, replay is a bit of a hack.  There are a couple of other
approaches that might be adequate without a new callback:
   - Make the VFIOGuestIOMMU aware of the guest address range mapped
     by the vIOMMU.  Intel currently advertises that as a full 64-bit
     address space, but I bet that's not actually true in practice.
   - Have the IOMMU MR advertise a (minimum) page size for vIOMMU
     mappings.  That may let you step through the range with greater
     strides.

> Thanks,
> 
> Alex
> 
> > Changes from v1 to v2:
> > * remove assumption that the cache does not clear
> > * fix lockup on high load.
> > 
> > Changes from v2 to v3:
> > * remove debug leftovers
> > * split into separate commits
> > * change is_write to flags in vtd_do_iommu_translate, add IOMMU_NO_FAIL
> >   to suppress error propagation to the guest.
> > 
> > Changes from v3 to v4:
> > * Add property to intel_iommu device to control the CM capability, 
> >   default to False.
> > * Use s->iommu_ops.notify_flag_changed to register notifiers.
> > 
> > Changes from v4 to v4 RESEND:
> > * Fix coding style issues pointed out by the checkpatch.pl script.
> > 
> > Aviv Ben-David (3):
> >   IOMMU: add option to enable VTD_CAP_CM to vIOMMU capability exposed to
> >     guest
> >   IOMMU: change iommu_op->translate's is_write to flags, add support to
> > NO_FAIL flag mode
> >   IOMMU: enable intel_iommu map and unmap notifiers
> > 
> >  exec.c |   3 +-
> >  hw/i386/amd_iommu.c|   4 +-
> >  hw/i386/intel_iommu.c  | 175 
> > +
> >  hw/i386/intel_iommu_internal.h |   3 +
> >  include/exec/memory.h  |   6 +-
> >  include/hw/i386/intel_iommu.h  |  11 +++
> >  memory.c   |   3 +-
> >  7 files changed, 168 insertions(+), 37 deletions(-)
> > 
> 
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [V2,1/7] nios2: Add disas entries

2016-10-17 Thread Marek Vasut

On 10/15/2016 03:15 PM, Romain Naour wrote:
> Hi Marek,

Hi!

> On 28/09/2016 at 01:30, Marek Vasut wrote:
>> Add nios2 disassembler support. This patch is composed from binutils files
>> from commit "Opcodes and assembler support for Nios II R2". The files from
>> binutils used in this patch are:
>>
>> include/opcode/nios2.h
>> include/opcode/nios2r1.h
>> include/opcode/nios2r2.h
>> opcodes/nios2-opc.c
>> opcodes/nios2-dis.c
> 
> Waldemar Brodkorb and I tested this series using the 10m50 kernel
> defconfig with a Buildroot-generated system. To ease testing, we added
> the device tree and an initramfs to the kernel image.
> 
> Here is the result:
> 
> Welcome to Buildroot
> # cat /proc/cpuinfo
> CPU:  Nios II/fast
> MMU:  present
> FPU:  none
> Clocking: 75.00 MHz
> BogoMips: 150.00
> Calibration:  7500 loops
> HW:
>  MUL: yes
>  MULX:no
>  DIV: yes
> Icache:   32kB, line length: 32
> Dcache:   32kB, line length: 32
> TLB:  16 ways, 256 entries, 8 PID bits
> # uname -a
> Linux buildroot 4.8.1 #2 Fri Oct 14 19:10:18 CEST 2016 nios2 GNU/Linux
> 
> Once this series is accepted into QEMU, I'll add a demo defconfig to
> Buildroot to ease runtime testing.
> 
> Tested-by: Romain Naour 

Great, thanks! I'm glad to see it is becoming usable for other
people too :)

I handled the feedback and pushed updated patches to:
http://git.bfuser.eu/?p=marex/qemu.git;a=shortlog;h=refs/heads/nios2/master

-- 
Best regards,
Marek Vasut

Re: [Qemu-devel] [PATCH 2/7] nios2: Add architecture emulation support

2016-10-17 Thread Marek Vasut

On 09/29/2016 03:05 AM, Richard Henderson wrote:
>>  hw/nios2/cpu_pic.c |   70 +++
> 
> Why is this in this patch?

Ah, good catch, moved to 10m50 support patch.

>>  target-nios2/instruction.c | 1427
>> 
>>  target-nios2/instruction.h |  279 +
>>  target-nios2/translate.c   |  242 
> 
> Why are these files separate?

Remnant of the old code, will merge them.

>> +if (n < 32)/* GP regs */
>> +return gdb_get_reg32(mem_buf, env->regs[n]);
>> +else if (n == 32)/* PC */
>> +return gdb_get_reg32(mem_buf, env->regs[R_PC]);
>> +else if (n < 49)/* Status regs */
>> +return gdb_get_reg32(mem_buf, env->regs[n - 1]);
> 
> Use checkpatch.pl; the formatting is wrong.

Fixed across the entire series, thanks.

> There's no particular reason why R_PC needs to be 64; if you change it
> to 32, you can simplify this.

I believe this is in fact needed, see [1] page 18 (section 2, Register
file), quote:

"
The Nios II architecture supports a flat register file, consisting of
thirty-two 32-bit general-purpose integer registers, and up to
thirty-two 32-bit control registers. The architecture supports
supervisor and user modes that allow system code to protect the
control registers from errant applications.
"

So the CPU supports 32 general-purpose registers (R_ZERO..R_RA), then up
to 32 control registers (CR_STATUS..CR_MPUACC), and then the PC.


[1]
https://www.altera.com/content/dam/altera-www/global/en_US/pdfs/literature/hb/nios2/n2cpu_nii5v1.pdf

>> +struct CPUNios2State {
>> +uint32_t regs[NUM_CORE_REGS];
>> +
>> +/* Addresses that are hard-coded in the FPGA build settings */
>> +uint32_t reset_addr;
>> +uint32_t exception_addr;
>> +uint32_t fast_tlb_miss_addr;
> 
> These three should go after ...
> 
>> +
>> +#if !defined(CONFIG_USER_ONLY)
>> +Nios2MMU mmu;
>> +
>> +uint32_t irq_pending;
>> +#endif
>> +
>> +CPU_COMMON
> 
> ... here, or even into ...
> 
>> +};
>> +
>> +/**
>> + * Nios2CPU:
>> + * @env: #CPUNios2State
>> + *
>> + * A Nios2 CPU.
>> + */
>> +typedef struct Nios2CPU {
>> +/*< private >*/
>> +CPUState parent_obj;
>> +/*< public >*/
>> +
>> +CPUNios2State env;
>> +bool mmu_present;
> 
> ... here.

Moved here, it looks more sensible as it's not something one can change
at runtime.

>> +static inline void t_gen_helper_raise_exception(DisasContext *dc,
>> +uint32_t index)
> 
> Remove all of the inline markers and let the compiler decide.

Done globally, thanks.

>> +static void gen_check_supervisor(DisasContext *dc, TCGLabel *label)
>> +{
>> +TCGLabel *l1 = gen_new_label();
>> +
>> +TCGv_i32 tmp = tcg_temp_new();
>> +tcg_gen_andi_tl(tmp, dc->cpu_R[CR_STATUS], CR_STATUS_U);
>> +tcg_gen_brcond_tl(TCG_COND_EQ, dc->cpu_R[R_ZERO], tmp, l1);
>> +t_gen_helper_raise_exception(dc, EXCP_SUPERI);
>> +tcg_gen_br(label);
> 
> The supervisor check should be done at translate time by checking
> dc->tb->flags & CR_STATUS_U.

Fixed.

>> +#ifdef CALL_TRACING
>> +TCGv_i32 tmp = tcg_const_i32(dc->pc);
>> +TCGv_i32 tmp2 = tcg_const_i32((dc->pc & 0xF000) |
>> (instr->imm26 * 4));
>> +gen_helper_call_status(tmp, tmp2);
>> +tcg_temp_free_i32(tmp);
>> +tcg_temp_free_i32(tmp2);
>> +#endif
> 
> What's the point of this?

It looks like a stale debug helper; removed, thanks.

>> +/*
>> + * I-Type instructions
>> + */
>> +
>> +/* rB <- 0x00 : Mem8[rA + @(IMM16)] */
>> +static void ldbu(DisasContext *dc, uint32_t code)
>> +{
>> +I_TYPE(instr, code);
>> +
>> +TCGv addr = tcg_temp_new();
>> +tcg_gen_addi_tl(addr, dc->cpu_R[instr->a],
>> +(int32_t)((int16_t)instr->imm16));
>> +
>> +tcg_gen_qemu_ld8u(dc->cpu_R[instr->b], addr, dc->mem_idx);
>> +
>> +tcg_temp_free(addr);
>> +}
> 
> You should have helper functions so that you don't have to replicate 12-line
> functions.  Use the tcg_gen_qemu_ld_tl function, and the TCGMemOp
> flags to help merge all load instructions, and all store instructions.

Oh nice, thanks.

>> +/* rB <- rA + IMM16 */
>> +static void addi(DisasContext *dc, uint32_t code)
>> +{
>> +I_TYPE(instr, code);
>> +
>> +TCGv imm = tcg_temp_new();
>> +tcg_gen_addi_tl(dc->cpu_R[instr->b], dc->cpu_R[instr->a],
>> +(int32_t)((int16_t)instr->imm16));
> 
> The double cast is pointless, as are the extra parentheses.

The int16_t cast is needed as imm16 is a signed value. The (int32_t)
cast is indeed pointless, so I removed it. Thanks.

> You're not doing anything to make sure that r0 reads as 0, and ignores
> writes. You need to look at target-alpha, target-mips, or target-sparc
> to see various ways in which a zero register may be handled.

Any hints on this one?

> You should handle the special case of movi, documented as addi rb, r0,
> imm, so that the common method of loading a small value does not require
> extra

Re: [Qemu-devel] [PATCH v9 12/12] docs: Sample driver to demonstrate how to use Mediated device framework.

2016-10-17 Thread Dong Jia Shi

* Kirti Wankhede  [2016-10-18 02:52:12 +0530]:

...snip...

> +static ssize_t mdev_access(struct mdev_device *mdev, char *buf,
> + size_t count, loff_t pos, bool is_write)
> +{
> + struct mdev_state *mdev_state;
> + unsigned int index;
> + loff_t offset;
> + int ret = 0;
> +
> + if (!mdev || !buf)
> + return -EINVAL;
> +
> + mdev_state = mdev_get_drvdata(mdev);
> + if (!mdev_state) {
> + pr_err("%s mdev_state not found\n", __func__);
> + return -EINVAL;
> + }
> +
> + mutex_lock(&mdev_state->ops_lock);
> +
> + index = MTTY_VFIO_PCI_OFFSET_TO_INDEX(pos);
> + offset = pos & MTTY_VFIO_PCI_OFFSET_MASK;
> + switch (index) {
> + case VFIO_PCI_CONFIG_REGION_INDEX:
> +
> +#if defined(DEBUG)
> + pr_info("%s: PCI config space %s at offset 0x%llx\n",
> +  __func__, is_write ? "write" : "read", offset);
> +#endif
> + if (is_write) {
> + dump_buffer(buf, count);
> + handle_pci_cfg_write(mdev_state, offset, buf, count);
> + } else {
> + memcpy(buf, (mdev_state->vconfig + offset), count);
> + dump_buffer(buf, count);
Dear Kirti:

Shouldn't we use copy_to_user() here (and copy_from_user() on the write
path) instead of memcpy on @buf? I'm also wondering whether dump_buffer
could really work, since it tries to dereference a *__user* marked
pointer.

Otherwise, this is a good example driver. Thanks!

> + }
> +
> + break;
> +
> + case VFIO_PCI_BAR0_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX:
> + if (!mdev_state->region_info[index].start)
> + mdev_read_base(mdev_state);
> +
> + if (is_write) {
> + dump_buffer(buf, count);
> +
> +#if defined(DEBUG_REGS)
> + pr_info("%s: BAR%d  WR @0x%llx %s val:0x%02x dlab:%d\n",
> + __func__, index, offset, wr_reg[offset],
> + (u8)*buf, mdev_state->s[index].dlab);
> +#endif
> + handle_bar_write(index, mdev_state, offset, buf, count);
> + } else {
> + handle_bar_read(index, mdev_state, offset, buf, count);
> + dump_buffer(buf, count);
> +
> +#if defined(DEBUG_REGS)
> + pr_info("%s: BAR%d  RD @0x%llx %s val:0x%02x dlab:%d\n",
> + __func__, index, offset, rd_reg[offset],
> + (u8)*buf, mdev_state->s[index].dlab);
> +#endif
> + }
> + break;
> +
> + default:
> + ret = -1;
> + goto accessfailed;
> + }
> +
> + ret = count;
> +
> +
> +accessfailed:
> + mutex_unlock(&mdev_state->ops_lock);
> +
> + return ret;
> +}
> +
...snip...

> +ssize_t mtty_read(struct mdev_device *mdev, char __user *buf,
> + size_t count, loff_t *ppos)
> +{
> + return mdev_access(mdev, buf, count, *ppos, false);
> +}
> +
> +ssize_t mtty_write(struct mdev_device *mdev, const char __user *buf,
> +  size_t count, loff_t *ppos)
> +{
> + return mdev_access(mdev, (char *)buf, count, *ppos, true);
> +}
> +
...snip...

-- 
Dong Jia

Re: [Qemu-devel] [PATCH] vfio: fix duplicate function call

2016-10-17 Thread Cao jin




On 10/17/2016 11:01 PM, Alex Williamson wrote:

On Mon, 17 Oct 2016 16:57:08 +0800
Cao jin  wrote:


Hi,

On 10/14/2016 11:50 PM, Alex Williamson wrote:

On Fri, 14 Oct 2016 19:16:59 +0800
Cao jin  wrote:


When a vfio device is reset (on FLR or bus reset) and a bus reset is
needed (vfio_pci_hot_reset_one is called), vfio_pci_pre_reset and
vfio_pci_post_reset are called twice.

Signed-off-by: Cao jin 
---
I also have a small question about vfio_pci_reset. It is called on a bus
reset or an FLR. The priority of reset methods in this function is currently:

  1. If there is a device-specific reset function, use it.
  2. If FLR is supported, use it.
  3. If a bus reset is possible (only 1 affected device), do it.
  4. If PM reset is supported, use it.

My question is: why does PM reset have lower priority than bus reset (when
a bus reset is possible)? Why isn't bus reset the last choice? In the
kernel's PCI driver, see __pci_dev_reset: if a device supports PM reset,
no bus reset is done.


The PCI spec doesn't really define what sort of reset is done with a PM
reset.  My thinking was that if a device advertises an FLR capability
then the hardware has made a concerted effort to have a per function
reset mechanism available.  NoSoftRst- is not terribly common and it's
not entirely clear to me that the hardware has made a conscious effort
to provide this for the purposes of per function reset mechanism.
Therefore I've opted to prioritize a bus reset over a PM reset.



I still have a question about vfio_pci_reset. I checked the commit message
of f16f39c3; if I understand correctly, couldn't we put

  /* See if we can do our own bus reset */
  if (!vfio_pci_hot_reset_one(vdev)) {
  goto post_reset;
  }

at the first priority? If there is only 1 affected device, it will do a
bus reset, which is the best reset we can do; if more than 1 device is
affected, after this patch vfio_pci_hot_reset_one will do nothing, and the
other reset methods will be tried.


It's possible, yes, but that disregards that the hardware has gone to
the trouble to implement a proper function level reset.  As I
explained, I de-prioritize PM reset, specifically because I'm not sure
if hardware designers are necessarily intending it for the purpose of a
device reset.  For FLR this is the entire purpose of the interface.  We
also have a fair bit of experience with the current priority scheme and
I would not take it lightly to change without some compelling evidence
to prove that a new priority scheme is better than the existing.  There
do also exist devices which do not behave properly with a secondary bus
reset, see drivers/pci/quirks.c:quirk_no_bus_reset() in the kernel
tree.  It's possible more devices like this exist, but we don't see
them because they implement FLR.  A bus reset may result in a more
complete device reset, but it's also more disruptive to the system.
Thanks,

Alex



I see. Thanks Alex, this is valuable information for me, although I may
need more time to understand it fully.


--
Yours Sincerely,

Cao jin

Re: [Qemu-devel] [PATCH v8 09/11] virtio-crypto: add data queue processing handler

2016-10-17 Thread Gonglei (Arei)

> +static CryptoDevBackendSymOpInfo *
> +virtio_crypto_sym_op_helper(VirtIODevice *vdev,
> +           struct virtio_crypto_cipher_para *cipher_para,
> +           struct virtio_crypto_alg_chain_data_para *alg_chain_para,
> +           struct iovec *iov, unsigned int out_num)
> +{
> +    CryptoDevBackendSymOpInfo *op_info;
> +    uint32_t src_len = 0, dst_len = 0;
> +    uint32_t iv_len = 0;
> +    uint32_t aad_len = 0, hash_result_len = 0;
> +    uint32_t hash_start_src_offset = 0, len_to_hash = 0;
> +    uint32_t cipher_start_src_offset = 0, len_to_cipher = 0;
> +
> +    size_t max_len, curr_size = 0;
> +    size_t s;
> +
> +    /* Plain cipher */
> +    if (cipher_para) {
> +        iv_len = virtio_ldl_p(vdev, &cipher_para->iv_len);
> +        src_len = virtio_ldl_p(vdev, &cipher_para->src_data_len);
> +        dst_len = virtio_ldl_p(vdev, &cipher_para->dst_data_len);
> +    } else if (alg_chain_para) { /* Algorithm chain */
> +        iv_len = virtio_ldl_p(vdev, &alg_chain_para->iv_len);
> +        src_len = virtio_ldl_p(vdev, &alg_chain_para->src_data_len);
> +        dst_len = virtio_ldl_p(vdev, &alg_chain_para->dst_data_len);
> +
> +        aad_len = virtio_ldl_p(vdev, &alg_chain_para->aad_len);
> +        hash_result_len = virtio_ldl_p(vdev,
> +                              &alg_chain_para->hash_result_len);
> +        hash_start_src_offset = virtio_ldl_p(vdev,
> +                         &alg_chain_para->hash_start_src_offset);
> +        cipher_start_src_offset = virtio_ldl_p(vdev,
> +                         &alg_chain_para->cipher_start_src_offset);
> +        len_to_cipher = virtio_ldl_p(vdev, &alg_chain_para->len_to_cipher);
> +        len_to_hash = virtio_ldl_p(vdev, &alg_chain_para->len_to_hash);
> +    } else {
> +        return NULL;
> +    }
> +
> +    max_len = iv_len + aad_len + src_len + dst_len + hash_result_len;
> +    if (max_len == LONG_MAX - sizeof(CryptoDevBackendSymOpInfo)) {
> +        virtio_error(vdev, "virtio-crypto too big length");
> +        return NULL;
> +    }
> +
The check should be:

if (unlikely(max_len > LONG_MAX - sizeof(CryptoDevBackendSymOpInfo))) {
    virtio_error(vdev, "virtio-crypto too big length");
    return NULL;
}
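The difference between the posted `==` check and the suggested `>` check is easiest to see with the two predicates in isolation. This is a standalone sketch: `HDR_SIZE` is an arbitrary stand-in for `sizeof(CryptoDevBackendSymOpInfo)`, whose real value is not given here.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Arbitrary stand-in for sizeof(CryptoDevBackendSymOpInfo). */
#define HDR_SIZE ((size_t)64)

/* Check as posted: rejects exactly one length value and lets every
 * larger one through. */
static bool too_big_eq(size_t max_len)
{
    return max_len == (size_t)LONG_MAX - HDR_SIZE;
}

/* Suggested check: rejects every length for which HDR_SIZE + max_len
 * would exceed LONG_MAX. */
static bool too_big_gt(size_t max_len)
{
    return max_len > (size_t)LONG_MAX - HDR_SIZE;
}
```

With `max_len == LONG_MAX`, the posted check accepts the length while the suggested one rejects it.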

Regards,
-Gonglei

Re: [Qemu-devel] [PATCH qemu] sysemu: support up to 1024 vCPUs

2016-10-17 Thread David Gibson

On Tue, Oct 18, 2016 at 12:07:20PM +1100, Alexey Kardashevskiy wrote:
> Ping, anyone?
> 
> I rather expected floods of mails on such a controversial topic :)

I think it's the reverse of the bike-shed problem.  Hardly anyone
feels qualified to comment on it.

It looks ok to me, but I just don't know if there are other subtle
dependencies on the size of the cpumask buried in the code.

Oh well, let's say:

Reviewed-by: David Gibson 

and find any problems as they arise.

Not sure who we need to convince to take this into their tree though.

> 
> 
> On 11/10/16 09:19, Alexey Kardashevskiy wrote:
> > Ping, anyone?
> > 
> > 
> > On 04/10/16 11:33, Alexey Kardashevskiy wrote:
> >> From: Greg Kurz 
> >>
> >> Some systems can already provide more than 255 hardware threads.
> >>
> >> Bumping the QEMU limit to 1024 seems reasonable:
> >> - it has no visible overhead in top;
> >> - the limit itself has no effect on hot paths.
> >>
> >> Signed-off-by: Greg Kurz 
> >> Signed-off-by: Alexey Kardashevskiy 
> >> ---
> >>  include/sysemu/sysemu.h | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> >> index ef2c50b..2ec0bd8 100644
> >> --- a/include/sysemu/sysemu.h
> >> +++ b/include/sysemu/sysemu.h
> >> @@ -173,7 +173,7 @@ extern int mem_prealloc;
> >>   *
> >>   * Note that cpu->get_arch_id() may be larger than MAX_CPUMASK_BITS.
> >>   */
> >> -#define MAX_CPUMASK_BITS 255
> >> +#define MAX_CPUMASK_BITS 1024
> >>  
> >>  #define MAX_OPTION_ROMS 16
> >>  typedef struct QEMUOptionRom {
> >>
> > 
> > 
> 
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH qemu] sysemu: support up to 1024 vCPUs

2016-10-17 Thread Alexey Kardashevskiy

Ping, anyone?

I rather expected floods of mails on such a controversial topic :)


On 11/10/16 09:19, Alexey Kardashevskiy wrote:
> Ping, anyone?
> 
> 
> On 04/10/16 11:33, Alexey Kardashevskiy wrote:
>> From: Greg Kurz 
>>
>> Some systems can already provide more than 255 hardware threads.
>>
>> Bumping the QEMU limit to 1024 seems reasonable:
>> - it has no visible overhead in top;
>> - the limit itself has no effect on hot paths.
>>
>> Signed-off-by: Greg Kurz 
>> Signed-off-by: Alexey Kardashevskiy 
>> ---
>>  include/sysemu/sysemu.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
>> index ef2c50b..2ec0bd8 100644
>> --- a/include/sysemu/sysemu.h
>> +++ b/include/sysemu/sysemu.h
>> @@ -173,7 +173,7 @@ extern int mem_prealloc;
>>   *
>>   * Note that cpu->get_arch_id() may be larger than MAX_CPUMASK_BITS.
>>   */
>> -#define MAX_CPUMASK_BITS 255
>> +#define MAX_CPUMASK_BITS 1024
>>  
>>  #define MAX_OPTION_ROMS 16
>>  typedef struct QEMUOptionRom {
>>
> 
> 
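As a rough illustration of what the limit costs, assume (for this sketch only) that each CPU mask sized by MAX_CPUMASK_BITS is a fixed-size bitmap. The macro and function names below are local stand-ins, not QEMU's bitmap helpers.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CPUMASK_BITS 1024  /* value proposed by the patch */

/* Local stand-ins for bitmap helpers. */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITS_TO_LONGS(n) \
    (((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

typedef struct {
    unsigned long bits[SKETCH_BITS_TO_LONGS(MAX_CPUMASK_BITS)];
} CpuMaskSketch;

/* Returns false for CPU indices the mask cannot represent. */
static bool cpumask_sketch_set(CpuMaskSketch *m, unsigned int cpu)
{
    if (cpu >= MAX_CPUMASK_BITS) {
        return false;
    }
    m->bits[cpu / SKETCH_BITS_PER_LONG] |= 1UL << (cpu % SKETCH_BITS_PER_LONG);
    return true;
}

static bool cpumask_sketch_test(const CpuMaskSketch *m, unsigned int cpu)
{
    return cpu < MAX_CPUMASK_BITS &&
           ((m->bits[cpu / SKETCH_BITS_PER_LONG] >>
             (cpu % SKETCH_BITS_PER_LONG)) & 1UL);
}
```

With 1024 bits each mask is 128 bytes; with the old value of 255 the same struct would be 32 bytes.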


-- 
Alexey

Re: [Qemu-devel] [Qemu-ppc] [PATCH v5 0/6] tests: enable virtio tests on SPAPR

2016-10-17 Thread David Gibson

On Mon, Oct 17, 2016 at 12:30:18PM +0200, Laurent Vivier wrote:
> This series enables virtio tests on SPAPR by starting
> machines using qtest_pc_boot() or qtest_spapr_boot() to
> use the right libqos PCI framework (pc or spapr).
> 
> It also adds some byte-swapping in virtio-pci.c, as
> PCI is always little-endian and the endianness of
> the virtio device depends on the endianness of the
> guest.
> 
> This series does not enable virtio PCI MSI-X tests on
> SPAPR as this needs more work and will be the aim
> of another series.
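The byte-swapping described in the cover letter can be sketched as a conditional swap applied to values read through a little-endian PCI accessor. This is a sketch under stated assumptions: the helper names are hypothetical, `device_big_endian` stands in for the result of qvirtio_is_big_endian(), and the input is assumed to already be the host-order result of the PCI read. The swap applies to legacy virtio devices, whose fields follow guest endianness.

```c
#include <stdbool.h>
#include <stdint.h>

static uint16_t swap16_sketch(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

static uint32_t swap32_sketch(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
           ((v >> 8) & 0x0000ff00u) | (v >> 24);
}

/* pci_val: value returned by a little-endian PCI read.
 * device_big_endian: stand-in for qvirtio_is_big_endian(). */
static uint16_t virtio_cfg_fix16(uint16_t pci_val, bool device_big_endian)
{
    return device_big_endian ? swap16_sketch(pci_val) : pci_val;
}

static uint32_t virtio_cfg_fix32(uint32_t pci_val, bool device_big_endian)
{
    return device_big_endian ? swap32_sketch(pci_val) : pci_val;
}
```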

Looking good to me.  I've tentatively merged this to ppc-for-2.8,
unless someone thinks it doesn't belong there.

> 
> v5:
> - minor updates:
>   - remove declaration within the code
>   - update comment about qtest_vboot()
> (terminates on failure)
> v4:
> - rebase on papr/ppc-for-2.8
> - add a patch to rename target_big_endian() to qvirtio_is_big_endian()
> - Add a patch to remove the QVirtioBus arguments from all virtio functions
> 
> v3:
> - remove "ppc" from allowed archs for virtio-blk-test
> - remove g_assert_nonnull() after g_malloc()
> - add a patch to fix "vs" memory leak in virtio-scsi-test
> - add a patch to remove g_assert() after qtest_spapr_boot()
>   and update comment qtest_vboot() to explain it never
>   returns NULL.
> 
> v2:
> - update comments (virtio-1.0, log)
> - move g_assert_nonnull() to qtest_boot()
> - exit gracefully if the architecture is not supported
> - replace qtest_pc_shutdown() by qtest_shutdown()
> - move qvirtio_scsi_stop() to qvirtio_scsi_pci_free()
> 
> Laurent Vivier (6):
>   tests: fix memory leak in virtio-scsi-test
>   tests: don't check if qtest_spapr_boot() returns NULL
>   tests: move QVirtioBus pointer into QVirtioDevice
>   tests: rename target_big_endian() as qvirtio_is_big_endian()
>   tests: use qtest_pc_boot()/qtest_shutdown() in virtio tests
>   tests: enable virtio tests on SPAPR
> 
>  tests/Makefile.include     |   3 +-
>  tests/libqos/libqos.c      |   2 +
>  tests/libqos/virtio-mmio.c |   1 +
>  tests/libqos/virtio-pci.c  |  28 +++-
>  tests/libqos/virtio.c      |  77 +--
>  tests/libqos/virtio.h      |  57 
>  tests/libqtest.h           |  10 --
>  tests/rtas-test.c          |   1 -
>  tests/vhost-user-test.c    |  33 +++--
>  tests/virtio-9p-test.c     |  69 +-
>  tests/virtio-blk-test.c    | 322 +
>  tests/virtio-net-test.c    | 106 +++
>  tests/virtio-rng-test.c    |   7 +-
>  tests/virtio-scsi-test.c   |  91 ++---
>  14 files changed, 400 insertions(+), 407 deletions(-)
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH] ppc/xics: Add xics to the monitor "info pic" command

2016-10-17 Thread David Gibson

On Mon, Oct 17, 2016 at 10:33:14PM +0200, Cédric Le Goater wrote:
> From: Benjamin Herrenschmidt 
> 
> Useful to debug interrupt problems.
> 
> Signed-off-by: Benjamin Herrenschmidt 
> [clg: - updated for qemu-2.7
>       - added a test on ->irqs as it is not necessarily allocated
>         (PHB3_MSI)
>       - removed static variable g_xics and replaced it with a loop on all
>         children to find the xics objects.
>       - rebased on InterruptStatsProvider interface ]
> Signed-off-by: Cédric Le Goater 

Applied to ppc-for-2.8.

> ---
>  hw/intc/xics.c | 49 +
>  1 file changed, 49 insertions(+)
> 
> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> index f40b3a45..7fac964fbd27 100644
> --- a/hw/intc/xics.c
> +++ b/hw/intc/xics.c
> @@ -35,6 +35,8 @@
>  #include "hw/ppc/xics.h"
>  #include "qemu/error-report.h"
>  #include "qapi/visitor.h"
> +#include "monitor/monitor.h"
> +#include "hw/intc/intc.h"
>  
>  int xics_get_cpu_index_by_dt_id(int cpu_dt_id)
>  {
> @@ -90,6 +92,47 @@ void xics_cpu_setup(XICSState *xics, PowerPCCPU *cpu)
>          }
>  }
>  
> +static void xics_common_pic_print_info(InterruptStatsProvider *obj,
> +                                       Monitor *mon)
> +{
> +    XICSState *xics = XICS_COMMON(obj);
> +    ICSState *ics;
> +    uint32_t i;
> +
> +    for (i = 0; i < xics->nr_servers; i++) {
> +        ICPState *icp = &xics->ss[i];
> +
> +        if (!icp->output) {
> +            continue;
> +        }
> +        monitor_printf(mon, "CPU %d XIRR=%08x (%p) PP=%02x MFRR=%02x\n",
> +                       i, icp->xirr, icp->xirr_owner,
> +                       icp->pending_priority, icp->mfrr);
> +    }
> +
> +    QLIST_FOREACH(ics, &xics->ics, list) {
> +        monitor_printf(mon, "ICS %4x..%4x %p\n",
> +                       ics->offset, ics->offset + ics->nr_irqs - 1, ics);
> +
> +        if (!ics->irqs) {
> +            continue;
> +        }
> +
> +        for (i = 0; i < ics->nr_irqs; i++) {
> +            ICSIRQState *irq = ics->irqs + i;
> +
> +            if (!(irq->flags & XICS_FLAGS_IRQ_MASK)) {
> +                continue;
> +            }
> +            monitor_printf(mon, "  %4x %s %02x %02x\n",
> +                           ics->offset + i,
> +                           (irq->flags & XICS_FLAGS_IRQ_LSI) ?
> +                           "LSI" : "MSI",
> +                           irq->priority, irq->status);
> +        }
> +    }
> +}
> +
>  /*
>   * XICS Common class - parent for emulated XICS and KVM-XICS
>   */
> @@ -190,8 +233,10 @@ static void xics_common_initfn(Object *obj)
>  static void xics_common_class_init(ObjectClass *oc, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(oc);
> +    InterruptStatsProviderClass *ic = INTERRUPT_STATS_PROVIDER_CLASS(oc);
>  
>      dc->reset = xics_common_reset;
> +    ic->print_info = xics_common_pic_print_info;
>  }
>  
>  static const TypeInfo xics_common_info = {
> @@ -201,6 +246,10 @@ static const TypeInfo xics_common_info = {
>      .class_size    = sizeof(XICSStateClass),
>      .instance_init = xics_common_initfn,
>      .class_init    = xics_common_class_init,
> +    .interfaces = (InterfaceInfo[]) {
> +        { TYPE_INTERRUPT_STATS_PROVIDER },
> +        { }
> +    },
>  };
>  
>  /*

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] invtsc + migration + TSC scaling

2016-10-17 Thread Marcelo Tosatti

On Mon, Oct 17, 2016 at 07:11:01PM -0200, Eduardo Habkost wrote:
> On Mon, Oct 17, 2016 at 06:24:38PM +0200, Paolo Bonzini wrote:
> > On 17/10/2016 16:50, Radim Krčmář wrote:
> > > 2016-10-17 07:47-0200, Marcelo Tosatti:
> [...]
> > >> since Linux guests use kvmclock and Windows guests use Hyper-V
> > >> enlightenment, it should be fine to disable 2).
> > 
> > ... and 1 too.
> > 
> > We should also blacklist the TSC deadline timer when invtsc is not
> > available.

Actually, a nicer fix would be to check the different 
frequencies and scale the deadline relative to the difference. 

This would take care of both patched and non-patched guests.
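The suggestion above can be sketched as rescaling the remaining time until the deadline by the ratio of the two TSC frequencies. This is an illustration only: `scale_ticks()`, `rescale_deadline()` and their parameters are hypothetical and are not taken from KVM or QEMU.

```c
#include <stdint.h>

/* ticks * to_khz / from_khz, split so the intermediate products cannot
 * overflow: the remainder is < from_khz, so (rem * to_khz) < 2^64. */
static uint64_t scale_ticks(uint64_t ticks, uint32_t to_khz, uint32_t from_khz)
{
    return (ticks / from_khz) * to_khz +
           (ticks % from_khz) * to_khz / from_khz;
}

/* deadline and now are TSC values in the same time base; guest_khz is the
 * frequency the guest calibrated against, host_khz is the rate the TSC
 * actually runs at after migration. */
static uint64_t rescale_deadline(uint64_t deadline, uint64_t now,
                                 uint32_t guest_khz, uint32_t host_khz)
{
    if (deadline <= now) {
        return deadline;  /* already expired: leave it so it fires at once */
    }
    return now + scale_ticks(deadline - now, host_khz, guest_khz);
}
```

For example, a deadline 2,000,000 ticks away on a 2 GHz source becomes 3,000,000 ticks away on a 3 GHz destination, so it still fires after about 1 ms.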

On a related note, what was the goal of Radim's paravirtual deadline
TSC timer?

> You mean on the guest-side? On the host side, it would make
> existing VMs refuse to run.
> 
> -- 
> Eduardo

Re: [Qemu-devel] [PATCH v3 0/3] Split cpu_exec_init() into an init and a realize part

2016-10-17 Thread David Gibson

On Mon, Oct 17, 2016 at 04:47:34PM -0200, Eduardo Habkost wrote:
> On Mon, Oct 17, 2016 at 02:44:04PM +1100, David Gibson wrote:
> > On Sat, Oct 15, 2016 at 12:52:46AM +0200, Laurent Vivier wrote:
> > > Since commit 42ecaba ("target-i386: Call cpu_exec_init() on realize"),
> > > commit 6dd0f83 ("target-ppc: Move cpu_exec_init() call to realize
> > > function"), and commit c6644fc ("s390x/cpu: Get rid of side effects
> > > when creating a vcpu"), cpu_exec_init() has been moved to the realize
> > > function for some architectures to implement CPU hotplug. This allows
> > > any failures from cpu_exec_init() to be handled appropriately.
> > > 
> > > This series tries to do the same work for all the other CPUs.
> > > 
> > > But as the ARM Virtual Machine ("virt") needs the "memory" property of
> > > the CPU in the machine init function (the "memory" property is created
> > > in cpu_exec_init(), which we want to move to the realize part), split
> > > cpu_exec_init() in two parts: a realize part (cpu_exec_realizefn(),
> > > adding the CPU in the environment) and an init part (cpu_exec_initfn(),
> > > initializing the CPU, like adding the "memory" property). To mirror the
> > > realize part, add an unrealize part, and remove the cpu_exec_exit()
> > > call from the finalize part.
> > > 
> > > This also allows removing all the
> > > "cannot_destroy_with_object_finalize_yet" properties from the CPU
> > > device class.
> > 
> > This is looking good to me - the v3 re-org has made it quite a bit
> > easier to follow.
> > 
> > Whose tree should this go via?
> 
> I can merge it through the machine tree, if others agree.

Fine by me, fwiw.


-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



[Qemu-devel] [PATCH v2] raw_bsd: add offset and size options

2016-10-17 Thread Tomáš Golembiovský

Added two new options 'offset' and 'size'. This makes it possible to use
only part of the file as a device. This can be used e.g. to limit
access to a single partition in a disk image or to use a disk inside a
tar archive (like OVA).

When 'size' is specified we do our best to honour it.
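The offset/size semantics amount to a window onto the containing file. The struct and functions below are hypothetical stand-ins that mirror the checks in raw_read_options() and raw_co_pwritev() in the patch, with error reporting reduced to booleans.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t offset;  /* where the exposed window starts in the file */
    uint64_t size;    /* length of the window */
    bool has_size;    /* whether 'size' was given explicitly */
} RawWindowSketch;

/* Open-time check: offset + size must not exceed the file length. */
static bool raw_window_valid(const RawWindowSketch *w, uint64_t file_len)
{
    return file_len >= w->offset && file_len - w->offset >= w->size;
}

/* Request check: [offset, offset + bytes) must stay inside the window.
 * Written without computing offset + bytes, so it cannot wrap. */
static bool raw_window_fits(const RawWindowSketch *w,
                            uint64_t offset, uint64_t bytes)
{
    return !w->has_size || (offset <= w->size && bytes <= w->size - offset);
}

/* Guest offset -> offset in the containing file. */
static uint64_t raw_window_translate(const RawWindowSketch *w, uint64_t offset)
{
    return w->offset + offset;
}
```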

Signed-off-by: Tomáš Golembiovský 
---
 block/raw_bsd.c  | 169 ++-
 qapi/block-core.json |  16 -
 2 files changed, 181 insertions(+), 4 deletions(-)

diff --git a/block/raw_bsd.c b/block/raw_bsd.c
index 588d408..3fb3f13 100644
--- a/block/raw_bsd.c
+++ b/block/raw_bsd.c
@@ -31,6 +31,30 @@
 #include "qapi/error.h"
 #include "qemu/option.h"
 
+typedef struct BDRVRawState {
+    uint64_t offset;
+    uint64_t size;
+    bool has_size;
+} BDRVRawState;
+
+static QemuOptsList raw_runtime_opts = {
+    .name = "raw",
+    .head = QTAILQ_HEAD_INITIALIZER(raw_runtime_opts.head),
+    .desc = {
+        {
+            .name = "offset",
+            .type = QEMU_OPT_SIZE,
+            .help = "offset in the disk where the image starts",
+        },
+        {
+            .name = "size",
+            .type = QEMU_OPT_SIZE,
+            .help = "virtual disk size",
+        },
+        { /* end of list */ }
+    },
+};
+
 static QemuOptsList raw_create_opts = {
     .name = "raw-create-opts",
     .head = QTAILQ_HEAD_INITIALIZER(raw_create_opts.head),
@@ -44,17 +68,107 @@ static QemuOptsList raw_create_opts = {
     }
 };
 
+static int raw_read_options(QDict *options, BlockDriverState *bs,
+                            BDRVRawState *s, Error **errp)
+{
+    Error *local_err = NULL;
+    QemuOpts *opts = NULL;
+    int64_t real_size = 0;
+    int ret;
+
+    real_size = bdrv_getlength(bs->file->bs);
+    if (real_size < 0) {
+        error_setg_errno(errp, -real_size, "Could not get image size");
+        return real_size;
+    }
+
+    opts = qemu_opts_create(&raw_runtime_opts, NULL, 0, &error_abort);
+    qemu_opts_absorb_qdict(opts, options, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        ret = -EINVAL;
+        goto fail;
+    }
+
+    s->offset = qemu_opt_get_size(opts, "offset", 0);
+    if (qemu_opt_find(opts, "size") != NULL) {
+        s->size = qemu_opt_get_size(opts, "size", 0);
+        s->has_size = true;
+    } else {
+        s->has_size = false;
+        s->size = real_size;
+    }
+
+    /* Check size and offset */
+    if (real_size < s->offset || (real_size - s->offset) < s->size) {
+        error_setg(errp, "The sum of offset (%"PRIu64") and size "
+            "(%"PRIu64") has to be smaller or equal to the "
+            " actual size of the containing file (%"PRId64").",
+            s->offset, s->size, real_size);
+        ret = -EINVAL;
+        goto fail;
+    }
+
+    /* Make sure size is multiple of BDRV_SECTOR_SIZE to prevent rounding
+     * up and leaking out of the specified area. */
+    if (s->size != QEMU_ALIGN_DOWN(s->size, BDRV_SECTOR_SIZE)) {
+        s->size = QEMU_ALIGN_DOWN(s->size, BDRV_SECTOR_SIZE);
+        fprintf(stderr,
+            "WARNING: Specified size is not multiple of %llu! "
+            "Rounding down to %"PRIu64 ". (End of the image will be "
+            "ignored.)\n",
+            BDRV_SECTOR_SIZE, s->size);
+    }
+
+    ret = 0;
+
+fail:
+
+    qemu_opts_del(opts);
+
+    return ret;
+}
+
 static int raw_reopen_prepare(BDRVReopenState *reopen_state,
                               BlockReopenQueue *queue, Error **errp)
 {
-    return 0;
+    assert(reopen_state != NULL);
+    assert(reopen_state->bs != NULL);
+
+    reopen_state->opaque = g_new0(BDRVRawState, 1);
+
+    return raw_read_options(
+        reopen_state->options,
+        reopen_state->bs,
+        reopen_state->opaque,
+        errp);
+}
+
+static void raw_reopen_commit(BDRVReopenState *state)
+{
+    BDRVRawState *new_s = state->opaque;
+    BDRVRawState *s = state->bs->opaque;
+
+    memcpy(s, new_s, sizeof(BDRVRawState));
+
+    g_free(state->opaque);
+    state->opaque = NULL;
+}
+
+static void raw_reopen_abort(BDRVReopenState *state)
+{
+    g_free(state->opaque);
+    state->opaque = NULL;
 }
 
 static int coroutine_fn raw_co_preadv(BlockDriverState *bs, uint64_t offset,
                                       uint64_t bytes, QEMUIOVector *qiov,
                                       int flags)
 {
+    BDRVRawState *s = bs->opaque;
+
     BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
+    offset += s->offset;
     return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
 }
 
@@ -62,11 +176,18 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, uint64_t offset,
                                        uint64_t bytes, QEMUIOVector *qiov,
                                        int flags)
 {
+    BDRVRawState *s = bs->opaque;
     void *buf = NULL;
     BlockDriver *drv;
     QEMUIOVector local_qiov;
     int ret;
 
+    if (s->has_size && (offset > s->size || bytes > (s->size - offset))) {
+        /* There's not

[Qemu-devel] [PATCH v2] Add 'offset' and 'size' options

2016-10-17 Thread Tomáš Golembiovský

This is a follow-up to the patch:
[PATCH] raw-posix: add 'offset' and 'size' options

The main changes are:
 -  options were moved from 'file' driver into 'raw' driver as suggested
 -  added support for writing, reopen and truncate when possible

If I forgot to address somebody's comments feel free to raise them again,
please.

Some general notes to the code:

1)  The size is rounded *down* to the 512 byte boundary. It's not that
the raw driver really cares about this, but if we don't do it then 
bdrv_getlength() will do that instead of us. The problem is that
bdrv_getlength() does round *up* and this can lead to reads/writes
outside the specified 'size'.

2)  We don't provide a '.bdrv_get_allocated_file_size' function. As a
result, the reported allocated disk size is that of the whole file,
which is, rather confusingly, larger than the provided 'size'. But I
don't think this matters much. Note that we don't have any easy way to
get the correct information here apart from checking all the blocks
with bdrv_co_get_block_status() (as suggested by Kevin Wolf).

3)  No options for raw_create(). The 'size' and 'offset' options were
added only to open/reopen. In my opinion there is no real reason for
them there. AFAIK you cannot create an embedded QCOW2/VMDK/etc. image
that way anyway.
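Point 1 can be illustrated with plain integer arithmetic. `align_down()`, `align_up()` and `overrun_if_rounded_up()` are local helpers written for this sketch (the patch itself uses QEMU_ALIGN_DOWN with BDRV_SECTOR_SIZE).

```c
#include <stdint.h>

#define SECTOR_SIZE_SKETCH 512u  /* BDRV_SECTOR_SIZE in the patch */

static uint64_t align_down(uint64_t n, uint64_t a)
{
    return n / a * a;
}

static uint64_t align_up(uint64_t n, uint64_t a)
{
    return (n + a - 1) / a * a;
}

/* Bytes a sector-granular length would expose beyond the requested size
 * if it were rounded up instead of down. */
static uint64_t overrun_if_rounded_up(uint64_t size)
{
    return align_up(size, SECTOR_SIZE_SKETCH) - size;
}
```

With size=1000, rounding up would expose 24 bytes past the requested window, while rounding down to 512 keeps every access inside it.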


Tomáš Golembiovský (1):
  raw_bsd: add offset and size options

 block/raw_bsd.c  | 169 ++-
 qapi/block-core.json |  16 -
 2 files changed, 181 insertions(+), 4 deletions(-)

-- 
2.10.0

Re: [Qemu-devel] [PATCH 0/6] qdev class properties + abstract class support on device-list-properties

2016-10-17 Thread Andreas Färber

Am 17.10.2016 um 23:04 schrieb Eduardo Habkost:
> On Tue, Oct 11, 2016 at 05:41:13PM -0300, Eduardo Habkost wrote:
>> Eduardo Habkost (6):
>>   qdev: qdev_class_set_props() function

s/qdev_/device_/?

Regards,
Andreas

>>   qdev: Extract property-default code to qdev_property_set_to_default()
>>   qdev: Register static properties as class properties
>>   qom: object_class_property_iter_init() function
>>   qmp: Support abstract classes on device-list-properties
>>   qdev: Warning about using object_class_property_add() in new code

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

Re: [Qemu-devel] [PATCH 0/2] tests: A few check-qom-proplist fixes

2016-10-17 Thread Andreas Färber

Am 17.10.2016 um 23:02 schrieb Eduardo Habkost:
> Ping?
> 
> Markus, do you want to merge this through your tree?
> 
> On Tue, Oct 11, 2016 at 09:37:45AM -0300, Eduardo Habkost wrote:
>> A few fixes on check-qom-proplist that will ensure we test both
>> class properties and object properties, and catch errors when
>> registering properties in test code.
>>
>> Eduardo Habkost (2):
>>   tests: check-qom-proplist: Remove "bv" class property from class

This part I didn't quite understand from looking at the patch, but...

>>   tests: check-qom-proplist: Use &error_abort to catch errors

For this one:

Reviewed-by: Andreas Färber 

Regards,
Andreas

>>  tests/check-qom-proplist.c | 10 +++---
>>  1 file changed, 3 insertions(+), 7 deletions(-)

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

Re: [Qemu-devel] [PATCH v3 09/13] pc: kvm_apic: pass APIC ID depending on xAPIC/x2APIC mode

2016-10-17 Thread Eduardo Habkost

On Thu, Oct 13, 2016 at 11:52:43AM +0200, Igor Mammedov wrote:
> Signed-off-by: Igor Mammedov 
> ---
> v4:
>  - drop kvm_has_x2apic_api() and reuse kvm_enable_x2apic() instead
> ---
>  hw/i386/kvm/apic.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
> index be55102..9a7dd03 100644
> --- a/hw/i386/kvm/apic.c
> +++ b/hw/i386/kvm/apic.c
> @@ -34,7 +34,11 @@ static void kvm_put_apic_state(APICCommonState *s, struct kvm_lapic_state *kapic
>      int i;
>  
>      memset(kapic, 0, sizeof(*kapic));
> -    kvm_apic_set_reg(kapic, 0x2, s->id << 24);
> +    if (kvm_enable_x2apic() && s->apicbase & MSR_IA32_APICBASE_EXTD) {

This is going to enable x2apic unconditionally (not just check if
x2apic was enabled). Is this really what you want to do?


> +        kvm_apic_set_reg(kapic, 0x2, s->initial_apic_id);
> +    } else {
> +        kvm_apic_set_reg(kapic, 0x2, s->id << 24);
> +    }
>      kvm_apic_set_reg(kapic, 0x8, s->tpr);
>      kvm_apic_set_reg(kapic, 0xd, s->log_dest << 24);
>      kvm_apic_set_reg(kapic, 0xe, s->dest_mode << 28 | 0x0fff);
> @@ -59,7 +63,11 @@ void kvm_get_apic_state(DeviceState *dev, struct kvm_lapic_state *kapic)
>      APICCommonState *s = APIC_COMMON(dev);
>      int i, v;
>  
> -    s->id = kvm_apic_get_reg(kapic, 0x2) >> 24;
> +    if (kvm_enable_x2apic() && s->apicbase & MSR_IA32_APICBASE_EXTD) {
> +        assert(kvm_apic_get_reg(kapic, 0x2) == s->initial_apic_id);
> +    } else {
> +        s->id = kvm_apic_get_reg(kapic, 0x2) >> 24;
> +    }
>      s->tpr = kvm_apic_get_reg(kapic, 0x8);
>      s->arb_id = kvm_apic_get_reg(kapic, 0x9);
>      s->log_dest = kvm_apic_get_reg(kapic, 0xd) >> 24;
> -- 
> 2.7.4
> 
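The register layouts behind the two branches can be sketched as pure functions. This is an illustrative sketch, not QEMU code: in xAPIC mode the 8-bit APIC ID sits in bits 31:24 of the APIC ID register (hence `s->id << 24`), while in x2APIC mode the register holds the full 32-bit ID, and x2APIC mode is indicated by the EXTD bit (bit 10) of IA32_APIC_BASE. Deriving the mode only from the `apicbase` value keeps the check free of side effects.

```c
#include <stdbool.h>
#include <stdint.h>

#define APICBASE_EXTD_SKETCH (1ULL << 10)  /* IA32_APIC_BASE.EXTD */

/* Pure query: reads the mode from the APIC base value, changes nothing. */
static bool apic_in_x2apic_mode(uint64_t apicbase)
{
    return (apicbase & APICBASE_EXTD_SKETCH) != 0;
}

/* APIC ID -> value of the APIC ID register (offset 0x2 in the patch). */
static uint32_t apic_id_reg_encode(uint32_t id, bool x2apic)
{
    return x2apic ? id : (id & 0xffu) << 24;
}

/* Value of the APIC ID register -> APIC ID. */
static uint32_t apic_id_reg_decode(uint32_t reg, bool x2apic)
{
    return x2apic ? reg : reg >> 24;
}
```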

-- 
Eduardo

Re: [Qemu-devel] [Qemu-stable] [Qemu-ppc] [PULL 0/4] ppc patches for qemu-2.7 stable branch

2016-10-17 Thread Peter Maydell

On 17 October 2016 at 22:24, Michael Roth  wrote:
> Quoting Peter Maydell (2016-10-17 13:45:21)
>> On 17 October 2016 at 19:13, Michael Roth  wrote:
>> > We could do both though: use some ad-hoc way to tag for a particular
>> > sub-maintainer tree/stable branch, as well as an explicit "not for
>> > master" in the cover letter to ensure it doesn't go into master. It's a
>> > bit more redundant, but flexible in that people can use whatever tagging
>> > format they want for a particular tree.
>>
>> Yes, that would be my preference. Gmail's filtering is not
>> very good, and it doesn't seem to be able to support
>> multiple or complex matches on the subject line, but
>> it can deal with "doesn't include foo in body".
>> People who actively want to look for stuff not to go
>> into master can filter it however they like.
>
> Sounds good to me. For my part I think "for-2.7.1" etc. would be
> preferable. No need to resend this patchset though.
>
> I suppose MAINTAINERS would be the best place to document something
> like this?

We have http://wiki.qemu.org/Contribute/SubmitAPullRequest
and I've added a note to it.

thanks
-- PMM

Re: [Qemu-devel] [PATCH v3 05/13] pc: leave max apic_id_limit only in legacy cpu hotplug code

2016-10-17 Thread Eduardo Habkost

On Thu, Oct 13, 2016 at 11:52:39AM +0200, Igor Mammedov wrote:
[...]
> @@ -236,7 +237,11 @@ void build_legacy_cpu_hotplug_aml(Aml *ctx, MachineState *machine,
>      /* The current AML generator can cover the APIC ID range [0..255],
>       * inclusive, for VCPU hotplug. */
>      QEMU_BUILD_BUG_ON(ACPI_CPU_HOTPLUG_ID_LIMIT > 256);
> -    g_assert(pcms->apic_id_limit <= ACPI_CPU_HOTPLUG_ID_LIMIT);
> +    if (pcms->apic_id_limit > ACPI_CPU_HOTPLUG_ID_LIMIT) {
> +        error_report("max_cpus is too large. APIC ID of last CPU is %u",
> +                     pcms->apic_id_limit - 1);
> +        exit(1);
> +    }

Moving the check here seems to make sense, but:

>  
>  /* create PCI0.PRES device and its _CRS to reserve CPU hotplug MMIO */
>  dev = aml_device("PCI0." stringify(CPU_HOTPLUG_RESOURCE_DEVICE));
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 93ff49c..f1c1013 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -778,7 +778,6 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)

[Added more context below to show the code around the change]

>      numa_fw_cfg = g_new0(uint64_t, 1 + pcms->apic_id_limit + nb_numa_nodes);
>      numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
>      for (i = 0; i < max_cpus; i++) {
>          unsigned int apic_id = x86_cpu_apic_id_from_index(i);
> -        assert(apic_id < pcms->apic_id_limit);

If you really needed to remove this assert, that means you can
write beyond the end of numa_fw_cfg[] below. Are you sure you need
to remove it?

>          j = numa_get_node_for_cpu(i);
>          if (j < nb_numa_nodes) {
>              numa_fw_cfg[apic_id + 1] = cpu_to_le64(j);

               ^^^ here

>          }
>      }
> 
> @@ -1190,12 +1189,6 @@ void pc_cpus_init(PCMachineState *pcms)
>       * This is used for FW_CFG_MAX_CPUS. See comments on bochs_bios_init().
>       */
>      pcms->apic_id_limit = x86_cpu_apic_id_from_index(max_cpus - 1) + 1;
> -    if (pcms->apic_id_limit > ACPI_CPU_HOTPLUG_ID_LIMIT) {
> -        error_report("max_cpus is too large. APIC ID of last CPU is %u",
> -                     pcms->apic_id_limit - 1);
> -        exit(1);
> -    }
> -
>      pcms->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
>                                      sizeof(CPUArchId) * max_cpus);
>      for (i = 0; i < max_cpus; i++) {
> -- 
> 2.7.4
> 
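The invariant the removed assert protected can be sketched with a bounds-checked setter. The helper is hypothetical; it only mirrors the indexing in the quoted bochs_bios_init() hunk, where slot 0 holds the node count and the entry for a CPU lives at index apic_id + 1 of an array allocated with 1 + apic_id_limit + nb_numa_nodes elements (the cpu_to_le64() conversion is omitted).

```c
#include <stdbool.h>
#include <stdint.h>

/* Stores the node for one CPU; refuses APIC IDs outside [0, apic_id_limit)
 * instead of writing past the per-CPU part of the table. */
static bool numa_fw_cfg_set_node(uint64_t *tbl, unsigned int apic_id_limit,
                                 unsigned int apic_id, uint64_t node)
{
    if (apic_id >= apic_id_limit) {
        return false;
    }
    tbl[apic_id + 1] = node;  /* slot 0 is the node count */
    return true;
}
```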

-- 
Eduardo

Re: [Qemu-devel] [PATCH v9 00/12] Add Mediated device support

2016-10-17 Thread Alex Williamson

On Tue, 18 Oct 2016 02:52:00 +0530
Kirti Wankhede  wrote:

> This series adds Mediated device support to the Linux host kernel. The
> purpose of this series is to provide a common interface for mediated
> device management that can be used by different devices. This series
> introduces the mdev core module, which creates and manages mediated
> devices; a VFIO-based driver for mediated devices created by the mdev
> core module; and updates to the VFIO type1 IOMMU module to support
> pinning and unpinning for mediated devices.
> 
> What changed in v9?
> mdev-core:
> - added class named 'mdev_bus' that contains links to devices that are
>   registered with the mdev core driver.
> - The [] name is created by adding the device driver string as a
>   prefix to the string provided by the vendor driver.
> - 'device_api' attribute should be provided by vendor driver and should show
>   which device API is being created, for example, "vfio-pci" for a PCI device.
> - Renamed link to its type in mdev device directory to 'mdev_type'
> 
> vfio:
> - Split commits into multiple individual commits
> - Added function to get device_api string based on vfio_device_info.flags.
> 
> vfio_iommu_type1:
> - Handled the case where all devices attached to the normal IOMMU API domain
>   go away and an mdev device still exists in the domain. Updated page
>   accounting for local domain.
> - Similarly, if a device is attached to the normal IOMMU API domain, mappings
>   are established and page accounting is updated accordingly.
> - Tested hot-plug and hot-unplug of vGPU and GPU pass through device with
>   Linux VM.

Hi,

I also commented that there must be an invalidation mechanism for pages
pinned by the vendor driver.  This is where pfn pinning was adjusting
accounting after a DMA_MAP, where the pfn should have been invalidated
on user unmap.  Userspace is in control of page mappings, the vendor
driver cannot maintain references to pages unmapped by the user.  I
would suggest that minimally some sort of callback needs to be
registered for every set of pinned pages to be called when the user
unmaps those IOVAs.  Thanks,

Alex
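One way to picture the suggested invalidation mechanism is a callback recorded with every pinned IOVA range and invoked when the user unmaps an overlapping range. Everything below (`PinTracker`, `pin_track()`, `user_unmap()`) is hypothetical, not the mdev or vfio_iommu_type1 API; a kernel version would need locking and a dynamic structure, and the fixed array only keeps the sketch short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback a vendor driver registers when it pins pages. */
typedef void (*unmap_cb_t)(void *opaque, uint64_t iova, uint64_t size);

typedef struct {
    uint64_t iova;
    uint64_t size;
    unmap_cb_t cb;
    void *opaque;
} PinnedRange;

#define MAX_PINNED_RANGES 16

typedef struct {
    PinnedRange ranges[MAX_PINNED_RANGES];
    size_t n;
} PinTracker;

/* Record a pinned IOVA range together with its invalidation callback. */
static int pin_track(PinTracker *t, uint64_t iova, uint64_t size,
                     unmap_cb_t cb, void *opaque)
{
    if (t->n == MAX_PINNED_RANGES) {
        return -1;
    }
    t->ranges[t->n++] = (PinnedRange){ iova, size, cb, opaque };
    return 0;
}

/* On a user unmap of [iova, iova + size), notify every overlapping pinned
 * range and forget it, so no reference outlives the user's mapping. */
static void user_unmap(PinTracker *t, uint64_t iova, uint64_t size)
{
    size_t i = 0;

    while (i < t->n) {
        PinnedRange *r = &t->ranges[i];

        if (r->iova < iova + size && iova < r->iova + r->size) {
            r->cb(r->opaque, r->iova, r->size);
            t->ranges[i] = t->ranges[--t->n];  /* swap-remove */
        } else {
            i++;
        }
    }
}

/* Example callback: counts invalidations. */
static void count_unmaps(void *opaque, uint64_t iova, uint64_t size)
{
    (void)iova;
    (void)size;
    ++*(int *)opaque;
}
```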

> 
> Documentation:
> - Updated Documentation and sample driver, mtty.c, accordingly.
> 
> Kirti Wankhede (12):
>   vfio: Mediated device Core driver
>   vfio: VFIO based driver for Mediated devices
>   vfio: Rearrange functions to get vfio_group from dev
>   vfio iommu: Add support for mediated devices
>   vfio: Introduce common function to add capabilities
>   vfio_pci: Update vfio_pci to use vfio_info_add_capability()
>   vfio: Introduce vfio_set_irqs_validate_and_prepare()
>   vfio_pci: Updated to use vfio_set_irqs_validate_and_prepare()
>   vfio_platform: Updated to use vfio_set_irqs_validate_and_prepare()
>   vfio: Add function to get device_api string from
> vfio_device_info.flags
>   docs: Add Documentation for Mediated devices
>   docs: Sample driver to demonstrate how to use Mediated device
> framework.
> 
>  Documentation/vfio-mdev/Makefile |   13 +
>  Documentation/vfio-mdev/mtty.c   | 1429 
> ++
>  Documentation/vfio-mdev/vfio-mediated-device.txt |  389 ++
>  drivers/vfio/Kconfig |1 +
>  drivers/vfio/Makefile|1 +
>  drivers/vfio/mdev/Kconfig|   18 +
>  drivers/vfio/mdev/Makefile   |5 +
>  drivers/vfio/mdev/mdev_core.c|  372 ++
>  drivers/vfio/mdev/mdev_driver.c  |  128 ++
>  drivers/vfio/mdev/mdev_private.h |   41 +
>  drivers/vfio/mdev/mdev_sysfs.c   |  296 +
>  drivers/vfio/mdev/vfio_mdev.c|  148 +++
>  drivers/vfio/pci/vfio_pci.c  |  101 +-
>  drivers/vfio/platform/vfio_platform_common.c |   31 +-
>  drivers/vfio/vfio.c  |  287 -
>  drivers/vfio/vfio_iommu_type1.c  |  692 +--
>  include/linux/mdev.h |  177 +++
>  include/linux/vfio.h |   23 +-
>  18 files changed, 3948 insertions(+), 204 deletions(-)
>  create mode 100644 Documentation/vfio-mdev/Makefile
>  create mode 100644 Documentation/vfio-mdev/mtty.c
>  create mode 100644 Documentation/vfio-mdev/vfio-mediated-device.txt
>  create mode 100644 drivers/vfio/mdev/Kconfig
>  create mode 100644 drivers/vfio/mdev/Makefile
>  create mode 100644 drivers/vfio/mdev/mdev_core.c
>  create mode 100644 drivers/vfio/mdev/mdev_driver.c
>  create mode 100644 drivers/vfio/mdev/mdev_private.h
>  create mode 100644 drivers/vfio/mdev/mdev_sysfs.c
>  create mode 100644 drivers/vfio/mdev/vfio_mdev.c
>  create mode 100644 include/linux/mdev.h
>

Re: [Qemu-devel] [PATCH 2/3] migrate: Share common MigrationParameters struct

2016-10-17 Thread Eric Blake

On 09/08/2016 10:14 PM, Eric Blake wrote:
> It is rather verbose, and slightly error-prone, to repeat
> the same set of parameters for input (migrate-set-parameters)
> as for output (query-migrate-parameters), where the only
> difference is whether the members are optional.  We can just
> document that the optional members will always be present
> on output, and then share a common struct between both
> commands.  The next patch can then reduce the amount of
> code needed on input.
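On the C side, QAPI represents an optional member as a value plus a `has_` flag, so sharing one struct means the input path checks the flags while the output path sets all of them. The hand-written struct below is a hypothetical, trimmed-down stand-in for MigrationParameters (three members only), not generated code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed stand-in for a QAPI struct: every optional member
 * is a value plus a has_ flag. */
typedef struct {
    bool has_compress_level;
    int64_t compress_level;
    bool has_compress_threads;
    int64_t compress_threads;
    bool has_cpu_throttle_initial;
    int64_t cpu_throttle_initial;
} MigrationParametersSketch;

/* Input (migrate-set-parameters): only members the caller supplied change. */
static void params_apply(MigrationParametersSketch *cur,
                         const MigrationParametersSketch *in)
{
    if (in->has_compress_level) {
        cur->compress_level = in->compress_level;
    }
    if (in->has_compress_threads) {
        cur->compress_threads = in->compress_threads;
    }
    if (in->has_cpu_throttle_initial) {
        cur->cpu_throttle_initial = in->cpu_throttle_initial;
    }
}

/* Output (query-migrate-parameters): every optional member is documented
 * as present, so every has_ flag is set. */
static MigrationParametersSketch params_query(const MigrationParametersSketch *cur)
{
    MigrationParametersSketch out = *cur;

    out.has_compress_level = true;
    out.has_compress_threads = true;
    out.has_cpu_throttle_initial = true;
    return out;
}
```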

>  '*tls-hostname': 'str'} }
> -
> -#
> -# @MigrationParameters
> -#
...
> -##
> -{ 'struct': 'MigrationParameters',
> -  'data': { 'compress-level': 'int',
> -'compress-threads': 'int',
> -'decompress-threads': 'int',
> -'cpu-throttle-initial': 'int',
> -'cpu-throttle-increment': 'int',
> -'tls-creds': 'str',
> -'tls-hostname': 'str'} }
>  ##
>  # @query-migrate-parameters

Pre-existing - there was no blank line before the docs for
query-migrate-parameters.  Commit a43edcf fixed the blank line, then the
merge conflict resolution undid things; so I've submitted a followup.
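A rough sketch of the shared-struct idea in the quoted patch: QAPI-generated C pairs each optional ('*'-prefixed) member with a has_* flag, so one struct can serve an input command where members may be absent and an output command where they are always present. The struct, global and functions below are hypothetical, hand-written simplifications, not QEMU's generated code for MigrationParameters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for one shared parameters struct. Each optional
 * member is paired with a has_* flag, as QAPI does for optional members.
 */
typedef struct Params {
    bool has_compress_level;
    int64_t compress_level;
    bool has_compress_threads;
    int64_t compress_threads;
} Params;

/* Current settings; the initial values are arbitrary for this sketch. */
static Params current_params = { true, 1, true, 8 };

/* Input side (set): apply only the members the caller supplied. */
static void set_params(const Params *in)
{
    if (in->has_compress_level) {
        current_params.compress_level = in->compress_level;
    }
    if (in->has_compress_threads) {
        current_params.compress_threads = in->compress_threads;
    }
}

/* Output side (query): every member is filled in and marked present. */
static Params query_params(void)
{
    Params out = current_params;

    out.has_compress_level = true;
    out.has_compress_threads = true;
    return out;
}
```

Because the query side always sets every has_* flag, documenting the members as "always present on output" is what lets a single struct describe both directions.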

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




[Qemu-devel] [PATCH] trivial: Restore blank line in qapi-schema

2016-10-17 Thread Eric Blake

Commit de63ab6 accidentally undid part of commit a43edcf,
because the two patches were written in parallel, and the
blank line was not noticed as a casualty of merge conflicts.

Signed-off-by: Eric Blake 
---
 qapi-schema.json | 1 +
 1 file changed, 1 insertion(+)

diff --git a/qapi-schema.json b/qapi-schema.json
index ded1179..6da520c 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -737,6 +737,7 @@
 '*tls-hostname': 'str',
 '*max-bandwidth': 'int',
 '*downtime-limit': 'int'} }
+
 ##
 # @query-migrate-parameters
 #
-- 
2.7.4

Re: [Qemu-devel] [Qemu-stable] [Qemu-ppc] [PULL 0/4] ppc patches for qemu-2.7 stable branch

2016-10-17 Thread Michael Roth

Quoting Peter Maydell (2016-10-17 13:45:21)
> On 17 October 2016 at 19:13, Michael Roth  wrote:
> > We could do both though: use some ad-hoc way to tag for a particular
> > sub-maintainer tree/stable branch, as well as an explicit "not for
> > master" in the cover letter ensure it doesn't go into master. It's a bit
> > more redundant, but flexible in that people can use whatever tagging
> > format they want for a particular tree.
> 
> Yes, that would be my preference. Gmail's filtering is not
> very good, and it doesn't seem to be able to support
> multiple or complex matches on the subject line, but
> it can deal with "doesn't include foo in body".
> People who actively want to look for stuff not to go
> into master can filter it however they like.

Sounds good to me. For my part I think "for-2.7.1" etc. would be
preferable. No need to resend this patchset though.

I suppose MAINTAINERS would be the best place to document something
like this?

> 
> thanks
> -- PMM
>

[Qemu-devel] [PATCH v9 12/12] docs: Sample driver to demonstrate how to use Mediated device framework.

2016-10-17 Thread Kirti Wankhede

The sample driver creates an mdev device that simulates a serial port over a
PCI card.

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: I857f8f12f8b275f2498dfe8c628a5cdc7193b1b2
---
 Documentation/vfio-mdev/Makefile |   13 +
 Documentation/vfio-mdev/mtty.c   | 1429 ++
 Documentation/vfio-mdev/vfio-mediated-device.txt |  104 +-
 3 files changed, 1544 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/vfio-mdev/Makefile
 create mode 100644 Documentation/vfio-mdev/mtty.c

diff --git a/Documentation/vfio-mdev/Makefile b/Documentation/vfio-mdev/Makefile
new file mode 100644
index ..a932edbe38eb
--- /dev/null
+++ b/Documentation/vfio-mdev/Makefile
@@ -0,0 +1,13 @@
+#
+# Makefile for mtty.c file
+#
+KERNEL_DIR:=/lib/modules/$(shell uname -r)/build
+
+obj-m:=mtty.o
+
+modules clean modules_install:
+   $(MAKE) -C $(KERNEL_DIR) SUBDIRS=$(PWD) $@
+
+default: modules
+
+module: modules
diff --git a/Documentation/vfio-mdev/mtty.c b/Documentation/vfio-mdev/mtty.c
new file mode 100644
index ..8ac321c4c8f1
--- /dev/null
+++ b/Documentation/vfio-mdev/mtty.c
@@ -0,0 +1,1429 @@
+/*
+ * Mediated virtual PCI serial host device driver
+ *
+ * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
+ * Author: Neo Jia 
+ * Kirti Wankhede 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Sample driver that creates an mdev device that simulates a serial port over
+ * a PCI card.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+/*
+ * #defines
+ */
+
+#define VERSION_STRING  "0.1"
+#define DRIVER_AUTHOR   "NVIDIA Corporation"
+
+#define MTTY_CLASS_NAME "mtty"
+
+#define MTTY_NAME   "mtty"
+
+#define MTTY_STRING_LEN16
+
+#define MTTY_CONFIG_SPACE_SIZE  0xff
+#define MTTY_IO_BAR_SIZE0x8
+#define MTTY_MMIO_BAR_SIZE  0x10
+
+#define STORE_LE16(addr, val)   (*(u16 *)addr = val)
+#define STORE_LE32(addr, val)   (*(u32 *)addr = val)
+
+#define MAX_FIFO_SIZE   16
+
+#define CIRCULAR_BUF_INC_IDX(idx)(idx = (idx + 1) & (MAX_FIFO_SIZE - 1))
+
+#define MTTY_VFIO_PCI_OFFSET_SHIFT   40
+
+#define MTTY_VFIO_PCI_OFFSET_TO_INDEX(off)   (off >> MTTY_VFIO_PCI_OFFSET_SHIFT)
+#define MTTY_VFIO_PCI_INDEX_TO_OFFSET(index) \
+   ((u64)(index) << MTTY_VFIO_PCI_OFFSET_SHIFT)
+#define MTTY_VFIO_PCI_OFFSET_MASK\
+   (((u64)(1) << MTTY_VFIO_PCI_OFFSET_SHIFT) - 1)
+#define MAX_MTTYS  24
+
+/*
+ * Global Structures
+ */
+
+struct mtty_dev {
+   dev_t   vd_devt;
+   struct class*vd_class;
+   struct cdev vd_cdev;
+   struct idr  vd_idr;
+   struct device   dev;
+} mtty_dev;
+
+struct mdev_region_info {
+   u64 start;
+   u64 phys_start;
+   u32 size;
+   u64 vfio_offset;
+};
+
+#if defined(DEBUG_REGS)
+const char *wr_reg[] = {
+   "TX",
+   "IER",
+   "FCR",
+   "LCR",
+   "MCR",
+   "LSR",
+   "MSR",
+   "SCR"
+};
+
+const char *rd_reg[] = {
+   "RX",
+   "IER",
+   "IIR",
+   "LCR",
+   "MCR",
+   "LSR",
+   "MSR",
+   "SCR"
+};
+#endif
+
+/* loop back buffer */
+struct rxtx {
+   u8 fifo[MAX_FIFO_SIZE];
+   u8 head, tail;
+   u8 count;
+};
+
+struct serial_port {
+   u8 uart_reg[8]; /* 8 registers */
+   struct rxtx rxtx;   /* loop back buffer */
+   bool dlab;
+   bool overrun;
+   u16 divisor;
+   u8 fcr; /* FIFO control register */
+   u8 max_fifo_size;
+   u8 intr_trigger_level;  /* interrupt trigger level */
+};
+
+/* State of each mdev device */
+struct mdev_state {
+   int irq_fd;
+   struct file *intx_file;
+   struct file *msi_file;
+   int irq_index;
+   u8 *vconfig;
+   struct mutex ops_lock;
+   struct mdev_device *mdev;
+   struct mdev_region_info region_info[VFIO_PCI_NUM_REGIONS];
+   u32 bar_mask[VFIO_PCI_NUM_REGIONS];
+   struct list_head next;
+   struct serial_port s[2];
+   struct mutex rxtx_lock;
+   struct vfio_device_info dev_info;
+   int nr_ports;
+};
+
+struct mutex mdev_list_lock;
+struct list_head mdev_devices_list;
+
+static const struct file_operations vd_fops = {
+   .owner  = THIS_MODULE,
+};
+
+/* function prototypes */
+
+static int mtty_trigger_interrupt(uuid_le uuid);
+
+/* Helper functions */
+static struct mdev_state *find_mdev_state_by_uuid(uuid_le uuid)
+{
+   struct mdev_state *mds;
+
+   list_for_each_entry(mds, &mdev_devices_list,
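The MTTY_VFIO_PCI_* macros above pack a VFIO region index into the bits from bit 40 upward of the file offset used for read()/write(), leaving the low 40 bits for the offset within that region. The user-space sketch below shows the round trip; the macros are local copies (with the argument parenthesized, a defensive choice of this sketch), and region_pos() and split_pos() are hypothetical helpers, not part of mtty.c.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the mtty.c offset encoding. */
#define OFFSET_SHIFT           40
#define OFFSET_TO_INDEX(off)   ((uint64_t)(off) >> OFFSET_SHIFT)
#define INDEX_TO_OFFSET(index) ((uint64_t)(index) << OFFSET_SHIFT)
#define OFFSET_MASK            (((uint64_t)1 << OFFSET_SHIFT) - 1)

/* Hypothetical helper: file position of byte `off` inside region `index`. */
static uint64_t region_pos(unsigned int index, uint64_t off)
{
    assert(off <= OFFSET_MASK);   /* in-region offset must fit in 40 bits */
    return INDEX_TO_OFFSET(index) | off;
}

/* Hypothetical helper: split a file position back into index and offset. */
static void split_pos(uint64_t pos, unsigned int *index, uint64_t *off)
{
    *index = (unsigned int)OFFSET_TO_INDEX(pos);
    *off = pos & OFFSET_MASK;
}
```

For example, region_pos(7, 0x10) addresses byte 0x10 of region index 7, which is VFIO_PCI_CONFIG_REGION_INDEX in the VFIO uapi, i.e. the device's PCI config space.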

[Qemu-devel] [PATCH v9 06/12] vfio_pci: Update vfio_pci to use vfio_info_add_capability()

2016-10-17 Thread Kirti Wankhede

Update msix_sparse_mmap_cap() to use vfio_info_add_capability().
Update the region type capability to use vfio_info_add_capability().
This commit can't be split into separate MSI-X and region_type capability
changes, since common code needs to be updated for both cases.

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: I52bb28c7875a6da5a79ddad1843e6088aff58a45
---
 drivers/vfio/pci/vfio_pci.c | 72 +
 1 file changed, 27 insertions(+), 45 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index d624a52f..1ec0565b48ea 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -556,12 +556,12 @@ static int vfio_pci_for_each_slot_or_bus(struct pci_dev *pdev,
 }
 
 static int msix_sparse_mmap_cap(struct vfio_pci_device *vdev,
+   struct vfio_region_info *info,
struct vfio_info_cap *caps)
 {
-   struct vfio_info_cap_header *header;
struct vfio_region_info_cap_sparse_mmap *sparse;
size_t end, size;
-   int nr_areas = 2, i = 0;
+   int nr_areas = 2, i = 0, ret;
 
end = pci_resource_len(vdev->pdev, vdev->msix_bar);
 
@@ -572,13 +572,10 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device *vdev,
 
size = sizeof(*sparse) + (nr_areas * sizeof(*sparse->areas));
 
-   header = vfio_info_cap_add(caps, size,
-  VFIO_REGION_INFO_CAP_SPARSE_MMAP, 1);
-   if (IS_ERR(header))
-   return PTR_ERR(header);
+   sparse = kzalloc(size, GFP_KERNEL);
+   if (!sparse)
+   return -ENOMEM;
 
-   sparse = container_of(header,
- struct vfio_region_info_cap_sparse_mmap, header);
sparse->nr_areas = nr_areas;
 
if (vdev->msix_offset & PAGE_MASK) {
@@ -594,26 +591,11 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device *vdev,
i++;
}
 
-   return 0;
-}
-
-static int region_type_cap(struct vfio_pci_device *vdev,
-  struct vfio_info_cap *caps,
-  unsigned int type, unsigned int subtype)
-{
-   struct vfio_info_cap_header *header;
-   struct vfio_region_info_cap_type *cap;
-
-   header = vfio_info_cap_add(caps, sizeof(*cap),
-  VFIO_REGION_INFO_CAP_TYPE, 1);
-   if (IS_ERR(header))
-   return PTR_ERR(header);
+   ret = vfio_info_add_capability(info, caps,
+ VFIO_REGION_INFO_CAP_SPARSE_MMAP, sparse);
+   kfree(sparse);
 
-   cap = container_of(header, struct vfio_region_info_cap_type, header);
-   cap->type = type;
-   cap->subtype = subtype;
-
-   return 0;
+   return ret;
 }
 
 int vfio_pci_register_dev_region(struct vfio_pci_device *vdev,
@@ -704,7 +686,8 @@ static long vfio_pci_ioctl(void *device_data,
if (vdev->bar_mmap_supported[info.index]) {
info.flags |= VFIO_REGION_INFO_FLAG_MMAP;
if (info.index == vdev->msix_bar) {
-   ret = msix_sparse_mmap_cap(vdev, &caps);
+   ret = msix_sparse_mmap_cap(vdev, &info,
+  &caps);
if (ret)
return ret;
}
@@ -752,6 +735,9 @@ static long vfio_pci_ioctl(void *device_data,
 
break;
default:
+   {
+   struct vfio_region_info_cap_type cap_type;
+
if (info.index >=
VFIO_PCI_NUM_REGIONS + vdev->num_regions)
return -EINVAL;
@@ -762,27 +748,23 @@ static long vfio_pci_ioctl(void *device_data,
info.size = vdev->region[i].size;
info.flags = vdev->region[i].flags;
 
-   ret = region_type_cap(vdev, &caps,
- vdev->region[i].type,
- vdev->region[i].subtype);
+   cap_type.type = vdev->region[i].type;
+   cap_type.subtype = vdev->region[i].subtype;
+
+   ret = vfio_info_add_capability(&info, &caps,
+ VFIO_REGION_INFO_CAP_TYPE,
+ &cap_type);
if (ret)
return ret;
+
+   }
}
 
-   if (caps.size) {
-   info.flags |= VFIO_REGION_INFO_FLAG_CAPS;
-   if (info.argsz < sizeof(info) + caps.size) {
-   info.argsz = sizeof(info) +

[Qemu-devel] [PATCH v9 07/12] vfio: Introduce vfio_set_irqs_validate_and_prepare()

2016-10-17 Thread Kirti Wankhede

Vendor drivers using the mediated device framework would use the same
mechanism to validate and prepare IRQs. Introduce this function to reduce
code replication across multiple drivers.

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: Ie201f269dda0713ca18a07dc4852500bd8b48309
---
 drivers/vfio/vfio.c  | 39 +++
 include/linux/vfio.h |  4 
 2 files changed, 43 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index e96cb3f7a23c..10ef1c5fa762 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1877,6 +1877,45 @@ int vfio_info_add_capability(struct vfio_region_info *info,
 }
 EXPORT_SYMBOL(vfio_info_add_capability);
 
+int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr, int num_irqs,
+  int max_irq_type, size_t *data_size)
+{
+   unsigned long minsz;
+
+   minsz = offsetofend(struct vfio_irq_set, count);
+
+   if ((hdr->argsz < minsz) || (hdr->index >= max_irq_type) ||
+   (hdr->flags & ~(VFIO_IRQ_SET_DATA_TYPE_MASK |
+   VFIO_IRQ_SET_ACTION_TYPE_MASK)))
+   return -EINVAL;
+
+   if (data_size)
+   *data_size = 0;
+
+   if (!(hdr->flags & VFIO_IRQ_SET_DATA_NONE)) {
+   size_t size;
+
+   if (hdr->flags & VFIO_IRQ_SET_DATA_BOOL)
+   size = sizeof(uint8_t);
+   else if (hdr->flags & VFIO_IRQ_SET_DATA_EVENTFD)
+   size = sizeof(int32_t);
+   else
+   return -EINVAL;
+
+   if ((hdr->argsz - minsz < hdr->count * size) ||
+   (hdr->start >= num_irqs) ||
+   (hdr->start + hdr->count > num_irqs))
+   return -EINVAL;
+
+   if (!data_size)
+   return -EINVAL;
+
+   *data_size = hdr->count * size;
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL(vfio_set_irqs_validate_and_prepare);
 
 /*
  * Pin a set of guest PFNs and return their associated host PFNs for local
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 854a4b40be02..31d059f1649b 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -112,6 +112,10 @@ extern int vfio_info_add_capability(struct vfio_region_info *info,
struct vfio_info_cap *caps,
int cap_type_id, void *cap_type);
 
+extern int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr,
+ int num_irqs, int max_irq_type,
+ size_t *data_size);
+
 struct pci_dev;
 #ifdef CONFIG_EEH
 extern void vfio_spapr_pci_eeh_open(struct pci_dev *pdev);
-- 
2.7.0
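For reference, here is a user-space mirror of the checks that vfio_set_irqs_validate_and_prepare() performs. The struct layout and flag values are local copies assumed to match struct vfio_irq_set and the VFIO_IRQ_SET_* definitions in the uapi header linux/vfio.h; irqs_validate() is a hypothetical name. One deliberate deviation: start + count is compared in 64 bits so the sketch's range check cannot wrap.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct vfio_irq_set (assumed layout). */
struct irq_set {
    uint32_t argsz, flags, index, start, count;
    uint8_t data[];
};

#define SET_DATA_NONE        (1u << 0)
#define SET_DATA_BOOL        (1u << 1)
#define SET_DATA_EVENTFD     (1u << 2)
#define SET_ACTION_MASK      (1u << 3)
#define SET_ACTION_UNMASK    (1u << 4)
#define SET_ACTION_TRIGGER   (1u << 5)
#define SET_DATA_TYPE_MASK   (SET_DATA_NONE | SET_DATA_BOOL | SET_DATA_EVENTFD)
#define SET_ACTION_TYPE_MASK (SET_ACTION_MASK | SET_ACTION_UNMASK | SET_ACTION_TRIGGER)

/* Returns 0 and stores the payload length in *data_size, or -EINVAL. */
static int irqs_validate(const struct irq_set *hdr, int num_irqs,
                         int max_irq_type, size_t *data_size)
{
    size_t minsz = offsetof(struct irq_set, count) + sizeof(hdr->count);

    if (hdr->argsz < minsz || hdr->index >= (uint32_t)max_irq_type ||
        (hdr->flags & ~(SET_DATA_TYPE_MASK | SET_ACTION_TYPE_MASK)))
        return -EINVAL;

    if (data_size)
        *data_size = 0;

    if (!(hdr->flags & SET_DATA_NONE)) {
        size_t size;

        if (hdr->flags & SET_DATA_BOOL)
            size = sizeof(uint8_t);
        else if (hdr->flags & SET_DATA_EVENTFD)
            size = sizeof(int32_t);
        else
            return -EINVAL;

        /* payload must hold count elements and stay inside num_irqs */
        if (hdr->argsz - minsz < hdr->count * size ||
            hdr->start >= (uint32_t)num_irqs ||
            (uint64_t)hdr->start + hdr->count > (uint64_t)num_irqs)
            return -EINVAL;

        if (!data_size)
            return -EINVAL;

        *data_size = hdr->count * size;
    }
    return 0;
}
```

With max_irq_type = 5 and num_irqs = 4, a request to trigger two eventfds starting at vector 0 needs an argsz of at least 20 + 2 * 4 = 28 bytes and yields a data_size of 8.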

[Qemu-devel] [PATCH v9 11/12] docs: Add Documentation for Mediated devices

2016-10-17 Thread Kirti Wankhede

Add the file Documentation/vfio-mdev/vfio-mediated-device.txt, which includes
details of the mediated device framework.

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: I137dd646442936090d92008b115908b7b2c7bc5d
---
 Documentation/vfio-mdev/vfio-mediated-device.txt | 289 +++
 1 file changed, 289 insertions(+)
 create mode 100644 Documentation/vfio-mdev/vfio-mediated-device.txt

diff --git a/Documentation/vfio-mdev/vfio-mediated-device.txt b/Documentation/vfio-mdev/vfio-mediated-device.txt
new file mode 100644
index ..8746e88dca4d
--- /dev/null
+++ b/Documentation/vfio-mdev/vfio-mediated-device.txt
@@ -0,0 +1,289 @@
+/*
+ * VFIO Mediated devices
+ *
+ * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
+ * Author: Neo Jia 
+ * Kirti Wankhede 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+Virtual Function I/O (VFIO) Mediated devices[1]
+===
+
+The number of use cases for virtualizing DMA devices that do not have built-in
+SR-IOV capability is increasing. Previously, to virtualize such devices,
+developers had to create their own management interfaces and APIs, and then
+integrate them with user space software. To simplify integration with user
+space software, we have identified common requirements and a unified management
+interface for such devices.
+
+The VFIO driver framework provides unified APIs for direct device access. It is
+an IOMMU/device-agnostic framework for exposing direct device access to user
+space in a secure, IOMMU-protected environment. This framework is used for
+multiple devices, such as GPUs, network adapters, and compute accelerators.
+With direct device access, virtual machines or user space applications have
+direct access to the physical device. This framework is reused for mediated
+devices.
+
+The mediated core driver provides a common interface for mediated device
+management that can be used by drivers of different devices. This module
+provides a generic interface to perform these operations:
+
+* Create and destroy a mediated device
+* Add a mediated device to and remove it from a mediated bus driver
+* Add a mediated device to and remove it from an IOMMU group
+
+The mediated core driver also provides an interface to register a bus driver.
+For example, the mediated VFIO mdev driver is designed for mediated devices and
+supports VFIO APIs. The mediated bus driver adds a mediated device to and
+removes it from a VFIO group.
+
+The following high-level block diagram shows the main components and interfaces
+in the VFIO mediated driver framework. The diagram shows NVIDIA, Intel, and IBM
+devices as examples, as these devices are the first devices to use this module.
+
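+ +---------------+
+ |               |
+ | +-----------+ |  mdev_register_driver() +--------------+
+ | |           | +<------------------------+              |
+ | |  mdev     | |                         |              |
+ | |  bus      | +------------------------>+ vfio_mdev.ko |<-> VFIO user
+ | |  driver   | |     probe()/remove()    |              |    APIs
+ | |           | |                         +--------------+
+ | +-----------+ |
+ |               |
+ |  MDEV CORE    |
+ |   MODULE      |
+ |   mdev.ko     |
+ | +-----------+ |  mdev_register_device() +--------------+
+ | |           | +<------------------------+              |
+ | |           | |                         |  nvidia.ko   |<-> physical
+ | |           | +------------------------>+              |    device
+ | |           | |        callbacks        +--------------+
+ | | Physical  | |
+ | |  device   | |  mdev_register_device() +--------------+
+ | | interface | |<------------------------+              |
+ | |           | |                         |  i915.ko     |<-> physical
+ | |           | +------------------------>+              |    device
+ | |           | |        callbacks        +--------------+
+ | |           | |
+ | |           | |  mdev_register_device() +--------------+
+ | |           | +<------------------------+              |
+ | |           | |                         | ccw_device.ko|<-> physical
+ | |           | +------------------------>+              |    device
+ | |           | |        callbacks        +--------------+
+ | +-----------+ |
+ +---------------+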
+
+
+Registration Interfaces
+===
+
+The mediated core driver provides the following types of registration
+interfaces:
+
+* Registration interface for a mediated bus driver
+* Physical device driver interface
+
+Registration Interface for a Mediated Bus Driver

[Qemu-devel] [PATCH v9 08/12] vfio_pci: Updated to use vfio_set_irqs_validate_and_prepare()

2016-10-17 Thread Kirti Wankhede

Updated vfio_pci.c file to use vfio_set_irqs_validate_and_prepare()

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: I9f3daba89d8dba5cb5b01a8cff420412f30686c7
---
 drivers/vfio/pci/vfio_pci.c | 29 +
 1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 1ec0565b48ea..23e7f32a4a07 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -812,35 +812,24 @@ static long vfio_pci_ioctl(void *device_data,
} else if (cmd == VFIO_DEVICE_SET_IRQS) {
struct vfio_irq_set hdr;
u8 *data = NULL;
-   int ret = 0;
+   int max, ret = 0;
+   size_t data_size = 0;
 
minsz = offsetofend(struct vfio_irq_set, count);
 
if (copy_from_user(&hdr, (void __user *)arg, minsz))
return -EFAULT;
 
-   if (hdr.argsz < minsz || hdr.index >= VFIO_PCI_NUM_IRQS ||
-   hdr.flags & ~(VFIO_IRQ_SET_DATA_TYPE_MASK |
- VFIO_IRQ_SET_ACTION_TYPE_MASK))
-   return -EINVAL;
+   max = vfio_pci_get_irq_count(vdev, hdr.index);
 
-   if (!(hdr.flags & VFIO_IRQ_SET_DATA_NONE)) {
-   size_t size;
-   int max = vfio_pci_get_irq_count(vdev, hdr.index);
-
-   if (hdr.flags & VFIO_IRQ_SET_DATA_BOOL)
-   size = sizeof(uint8_t);
-   else if (hdr.flags & VFIO_IRQ_SET_DATA_EVENTFD)
-   size = sizeof(int32_t);
-   else
-   return -EINVAL;
-
-   if (hdr.argsz - minsz < hdr.count * size ||
-   hdr.start >= max || hdr.start + hdr.count > max)
-   return -EINVAL;
+   ret = vfio_set_irqs_validate_and_prepare(&hdr, max,
+VFIO_PCI_NUM_IRQS, &data_size);
+   if (ret)
+   return ret;
 
+   if (data_size) {
data = memdup_user((void __user *)(arg + minsz),
-  hdr.count * size);
+   data_size);
if (IS_ERR(data))
return PTR_ERR(data);
}
-- 
2.7.0
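Both converted callers first copy only the fixed header (minsz bytes) from user space and then duplicate data_size bytes starting at arg + minsz. The kernel's offsetofend() gives the offset just past a struct member; the sketch below rebuilds it from standard offsetof() to show that minsz is 20 bytes for an assumed five-u32 header. OFFSETOFEND, struct irq_set_hdr and dup_payload() are local, hypothetical stand-ins, and malloc()/memcpy() take the place of memdup_user().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same idea as the kernel's offsetofend(): the offset just past MEMBER. */
#define OFFSETOFEND(TYPE, MEMBER) \
    (offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Assumed header layout, mirroring struct vfio_irq_set. */
struct irq_set_hdr {
    uint32_t argsz, flags, index, start, count;
    uint8_t data[];
};

/*
 * Hypothetical helper: copy the data_size payload bytes that follow the
 * fixed header in a caller-supplied buffer. Returns NULL on failure.
 */
static uint8_t *dup_payload(const void *arg, size_t data_size)
{
    size_t minsz = OFFSETOFEND(struct irq_set_hdr, count);
    uint8_t *data = malloc(data_size);

    if (data)
        memcpy(data, (const uint8_t *)arg + minsz, data_size);
    return data;
}
```

Sizing the copy from data_size, rather than recomputing count * size in each driver, is what lets both callers share the validation helper.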

[Qemu-devel] [PATCH v9 09/12] vfio_platform: Updated to use vfio_set_irqs_validate_and_prepare()

2016-10-17 Thread Kirti Wankhede

Updated vfio_platform_common.c file to use
vfio_set_irqs_validate_and_prepare()

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: Id87cd6b78ae901610b39bf957974baa6f40cd7b0
---
 drivers/vfio/platform/vfio_platform_common.c | 31 +++-
 1 file changed, 8 insertions(+), 23 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c
index d78142830754..4c27f4be3c3d 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -364,36 +364,21 @@ static long vfio_platform_ioctl(void *device_data,
struct vfio_irq_set hdr;
u8 *data = NULL;
int ret = 0;
+   size_t data_size = 0;
 
minsz = offsetofend(struct vfio_irq_set, count);
 
if (copy_from_user(&hdr, (void __user *)arg, minsz))
return -EFAULT;
 
-   if (hdr.argsz < minsz)
-   return -EINVAL;
-
-   if (hdr.index >= vdev->num_irqs)
-   return -EINVAL;
-
-   if (hdr.flags & ~(VFIO_IRQ_SET_DATA_TYPE_MASK |
- VFIO_IRQ_SET_ACTION_TYPE_MASK))
-   return -EINVAL;
-
-   if (!(hdr.flags & VFIO_IRQ_SET_DATA_NONE)) {
-   size_t size;
-
-   if (hdr.flags & VFIO_IRQ_SET_DATA_BOOL)
-   size = sizeof(uint8_t);
-   else if (hdr.flags & VFIO_IRQ_SET_DATA_EVENTFD)
-   size = sizeof(int32_t);
-   else
-   return -EINVAL;
-
-   if (hdr.argsz - minsz < size)
-   return -EINVAL;
+   ret = vfio_set_irqs_validate_and_prepare(&hdr, vdev->num_irqs,
+vdev->num_irqs, &data_size);
+   if (ret)
+   return ret;
 
-   data = memdup_user((void __user *)(arg + minsz), size);
+   if (data_size) {
+   data = memdup_user((void __user *)(arg + minsz),
+   data_size);
if (IS_ERR(data))
return PTR_ERR(data);
}
-- 
2.7.0

[Qemu-devel] [PATCH v9 05/12] vfio: Introduce common function to add capabilities

2016-10-17 Thread Kirti Wankhede

Vendor drivers using the mediated device framework should use
vfio_info_add_capability() to add capabilities. This function is introduced
to reduce code duplication in vendor drivers.

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: I6fca329fa2291f37a2c859d0bc97574d9e2ce1a6
---
 drivers/vfio/vfio.c  | 78 
 include/linux/vfio.h |  4 +++
 2 files changed, 82 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index a5a210005b65..e96cb3f7a23c 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1799,6 +1799,84 @@ void vfio_info_cap_shift(struct vfio_info_cap *caps, size_t offset)
 }
 EXPORT_SYMBOL_GPL(vfio_info_cap_shift);
 
+static int sparse_mmap_cap(struct vfio_info_cap *caps, void *cap_type)
+{
+   struct vfio_info_cap_header *header;
+   struct vfio_region_info_cap_sparse_mmap *sparse_cap, *sparse = cap_type;
+   size_t size;
+
+   size = sizeof(*sparse) + sparse->nr_areas *  sizeof(*sparse->areas);
+   header = vfio_info_cap_add(caps, size,
+  VFIO_REGION_INFO_CAP_SPARSE_MMAP, 1);
+   if (IS_ERR(header))
+   return PTR_ERR(header);
+
+   sparse_cap = container_of(header,
+   struct vfio_region_info_cap_sparse_mmap, header);
+   sparse_cap->nr_areas = sparse->nr_areas;
+   memcpy(sparse_cap->areas, sparse->areas,
+  sparse->nr_areas * sizeof(*sparse->areas));
+   return 0;
+}
+
+static int region_type_cap(struct vfio_info_cap *caps, void *cap_type)
+{
+   struct vfio_info_cap_header *header;
+   struct vfio_region_info_cap_type *type_cap, *cap = cap_type;
+
+   header = vfio_info_cap_add(caps, sizeof(*cap),
+  VFIO_REGION_INFO_CAP_TYPE, 1);
+   if (IS_ERR(header))
+   return PTR_ERR(header);
+
+   type_cap = container_of(header, struct vfio_region_info_cap_type,
+   header);
+   type_cap->type = cap->type;
+   type_cap->subtype = cap->subtype;
+   return 0;
+}
+
+int vfio_info_add_capability(struct vfio_region_info *info,
+struct vfio_info_cap *caps,
+int cap_type_id,
+void *cap_type)
+{
+   int ret;
+
+   if (!cap_type)
+   return 0;
+
+   switch (cap_type_id) {
+   case VFIO_REGION_INFO_CAP_SPARSE_MMAP:
+   ret = sparse_mmap_cap(caps, cap_type);
+   if (ret)
+   return ret;
+   break;
+
+   case VFIO_REGION_INFO_CAP_TYPE:
+   ret = region_type_cap(caps, cap_type);
+   if (ret)
+   return ret;
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   info->flags |= VFIO_REGION_INFO_FLAG_CAPS;
+
+   if (caps->size) {
+   if (info->argsz < sizeof(*info) + caps->size) {
+   info->argsz = sizeof(*info) + caps->size;
+   info->cap_offset = 0;
+   } else {
+   vfio_info_cap_shift(caps, sizeof(*info));
+   info->cap_offset = sizeof(*info);
+   }
+   }
+   return 0;
+}
+EXPORT_SYMBOL(vfio_info_add_capability);
+
 
 /*
  * Pin a set of guest PFNs and return their associated host PFNs for local
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 0bd25ba6223d..854a4b40be02 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -108,6 +108,10 @@ extern struct vfio_info_cap_header *vfio_info_cap_add(
struct vfio_info_cap *caps, size_t size, u16 id, u16 version);
 extern void vfio_info_cap_shift(struct vfio_info_cap *caps, size_t offset);
 
+extern int vfio_info_add_capability(struct vfio_region_info *info,
+   struct vfio_info_cap *caps,
+   int cap_type_id, void *cap_type);
+
 struct pci_dev;
 #ifdef CONFIG_EEH
 extern void vfio_spapr_pci_eeh_open(struct pci_dev *pdev);
-- 
2.7.0
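The tail of vfio_info_add_capability() performs the argsz handshake: if the caller's buffer is too small for the info struct plus the capability chain, the required size is written back to argsz and cap_offset stays 0 so user space can retry with a larger buffer; otherwise the chain is placed directly after the struct. The sketch below models only that decision in user space; struct region_info and place_caps() are hypothetical, the flag value is assumed to match VFIO_REGION_INFO_FLAG_CAPS, and vfio_info_cap_add()/vfio_info_cap_shift() are not reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in holding only the fields this decision touches. */
struct region_info {
    uint32_t argsz;
    uint32_t flags;
    uint32_t cap_offset;
};

#define INFO_FLAG_CAPS (1u << 3)  /* assumed value of VFIO_REGION_INFO_FLAG_CAPS */

/* Decide where a capability chain of caps_size bytes goes. */
static void place_caps(struct region_info *info, size_t caps_size)
{
    info->flags |= INFO_FLAG_CAPS;
    if (!caps_size)
        return;
    if (info->argsz < sizeof(*info) + caps_size) {
        info->argsz = sizeof(*info) + caps_size;  /* size the caller must provide */
        info->cap_offset = 0;                     /* chain not copied this time */
    } else {
        info->cap_offset = sizeof(*info);         /* chain follows the struct */
    }
}
```

In this model a caller would issue the ioctl, notice that argsz grew, reallocate to the new size and issue it again.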

[Qemu-devel] [PATCH v9 10/12] vfio: Add function to get device_api string from vfio_device_info.flags

2016-10-17 Thread Kirti Wankhede

The function vfio_device_api_string() returns a string based on the flag set
in vfio_device_info's flags field. Vendor drivers should use it to get the
string for the device_api attribute.

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: I42d29f475f02a7132ce13297fbf2b48f1da10995
---
 drivers/vfio/vfio.c  | 15 +++
 include/linux/vfio.h |  2 ++
 2 files changed, 17 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 10ef1c5fa762..aec470454a13 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1917,6 +1917,21 @@ int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr, int num_irqs,
 }
 EXPORT_SYMBOL(vfio_set_irqs_validate_and_prepare);
 
+const char *vfio_device_api_string(u32 flags)
+{
+   if (flags & VFIO_DEVICE_FLAGS_PCI)
+   return "vfio-pci";
+
+   if (flags & VFIO_DEVICE_FLAGS_PLATFORM)
+   return "vfio-platform";
+
+   if (flags & VFIO_DEVICE_FLAGS_AMBA)
+   return "vfio-amba";
+
+   return "";
+}
+EXPORT_SYMBOL(vfio_device_api_string);
+
 /*
  * Pin a set of guest PFNs and return their associated host PFNs for local
  * domain only.
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 31d059f1649b..fca2bf23c4f1 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -116,6 +116,8 @@ extern int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr,
  int num_irqs, int max_irq_type,
  size_t *data_size);
 
+extern const char *vfio_device_api_string(u32 flags);
+
 struct pci_dev;
 #ifdef CONFIG_EEH
 extern void vfio_spapr_pci_eeh_open(struct pci_dev *pdev);
-- 
2.7.0
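vfio_device_api_string() tests the flags in a fixed order, so if several type bits were set the first match (PCI, then platform, then AMBA) would win, and a device with none of them yields an empty string. A user-space mirror follows; the bit values are assumed to match the VFIO_DEVICE_FLAGS_* definitions in linux/vfio.h, and device_api_string() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEV_FLAGS_RESET    (1u << 0)  /* assumed: VFIO_DEVICE_FLAGS_RESET */
#define DEV_FLAGS_PCI      (1u << 1)  /* assumed: VFIO_DEVICE_FLAGS_PCI */
#define DEV_FLAGS_PLATFORM (1u << 2)  /* assumed: VFIO_DEVICE_FLAGS_PLATFORM */
#define DEV_FLAGS_AMBA     (1u << 3)  /* assumed: VFIO_DEVICE_FLAGS_AMBA */

/* First matching device type wins; non-type bits such as RESET are ignored. */
static const char *device_api_string(uint32_t flags)
{
    if (flags & DEV_FLAGS_PCI)
        return "vfio-pci";
    if (flags & DEV_FLAGS_PLATFORM)
        return "vfio-platform";
    if (flags & DEV_FLAGS_AMBA)
        return "vfio-amba";
    return "";  /* unknown device type */
}
```

A vendor driver's device_api attribute would then report, for example, "vfio-pci" for a device that sets the PCI flag alongside RESET.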

[Qemu-devel] [PATCH v9 01/12] vfio: Mediated device Core driver

2016-10-17 Thread Kirti Wankhede

Design for Mediated Device Driver:
The main purpose of this driver is to provide a common interface for mediated
device management that can be used by drivers of different devices.

This module provides a generic interface to create the device, add it to the
mediated bus, add the device to an IOMMU group and then add it to a VFIO group.

Below is the high-level block diagram, with NVIDIA, Intel and IBM devices
as examples, since these are the devices that are going to actively use
this module for now.

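 +---------------+
 |               |
 | +-----------+ |  mdev_register_driver() +--------------+
 | |           | +<------------------------+ __init()     |
 | |  mdev     | |                         |              |
 | |  bus      | +------------------------>+              |<-> VFIO user
 | |  driver   | |     probe()/remove()    | vfio_mdev.ko |    APIs
 | |           | |                         |              |
 | +-----------+ |                         +--------------+
 |               |
 |  MDEV CORE    |
 |   MODULE      |
 |   mdev.ko     |
 | +-----------+ |  mdev_register_device() +--------------+
 | |           | +<------------------------+              |
 | |           | |                         |  nvidia.ko   |<-> physical
 | |           | +------------------------>+              |    device
 | |           | |        callback         +--------------+
 | | Physical  | |
 | |  device   | |  mdev_register_device() +--------------+
 | | interface | |<------------------------+              |
 | |           | |                         |  i915.ko     |<-> physical
 | |           | +------------------------>+              |    device
 | |           | |        callback         +--------------+
 | |           | |
 | |           | |  mdev_register_device() +--------------+
 | |           | +<------------------------+              |
 | |           | |                         | ccw_device.ko|<-> physical
 | |           | +------------------------>+              |    device
 | |           | |        callback         +--------------+
 | +-----------+ |
 +---------------+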

The core driver provides two types of registration interfaces:
1. Registration interface for the mediated bus driver:

/**
 * struct mdev_driver - Mediated device's driver
 * @name: driver name
 * @probe: called when new device created
 * @remove: called when device removed
 * @driver: device driver structure
 *
 **/
struct mdev_driver {
	const char *name;
	int  (*probe)  (struct device *dev);
	void (*remove) (struct device *dev);
	struct device_driver driver;
};

The mediated bus driver for mdev devices should use this interface to register
with and unregister from the core driver, respectively:

int  mdev_register_driver(struct mdev_driver *drv, struct module *owner);
void mdev_unregister_driver(struct mdev_driver *drv);

The mediated bus driver is responsible for adding/deleting mediated devices
to/from the VFIO group when devices are bound to and unbound from the driver.

2. Physical device driver interface
This interface provides the vendor driver with a set of APIs to manage
physical-device-related work in its driver. The APIs are:

* dev_attr_groups: attributes of the parent device.
* mdev_attr_groups: attributes of the mediated device.
* supported_type_groups: attributes to define supported type. This is
 mandatory field.
* create: to allocate basic resources in driver for a mediated device.
* remove: to free resources in driver when mediated device is destroyed.
* open: open callback of mediated device
* release: release callback of mediated device
* read: read emulation callback.
* write: write emulation callback.
* mmap: mmap emulation callback.
* ioctl: ioctl callback.

Drivers should use these interfaces to register a device with, and unregister
it from, the mdev core driver, respectively:

extern int  mdev_register_device(struct device *dev,
                                 const struct parent_ops *ops);
extern void mdev_unregister_device(struct device *dev);

There are no locks to serialize the above callbacks in the mdev and vfio_mdev
drivers. If required, the vendor driver can use its own locks to serialize
these APIs.
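The physical device driver interface described above amounts to a callback table that the vendor driver hands to the mdev core at registration time. The user-space mock below only illustrates that shape and the rule that supported_type_groups is mandatory; every type and function in it is hypothetical and far smaller than struct parent_ops in include/linux/mdev.h. It deliberately contains no locking, since any serialization of these callbacks is up to the vendor driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical mediated device: just an id and a "created" marker. */
struct mock_mdev {
    int id;
    int created;
};

/* Hypothetical, trimmed-down callback table. */
struct mock_parent_ops {
    const char *const *supported_type_groups;  /* mandatory */
    int (*create)(struct mock_mdev *mdev);
    int (*remove)(struct mock_mdev *mdev);
};

static const struct mock_parent_ops *registered_ops;

/* Mock registration: reject a table without supported types. */
static int mock_register_device(const struct mock_parent_ops *ops)
{
    if (!ops || !ops->supported_type_groups)
        return -EINVAL;
    registered_ops = ops;
    return 0;
}

static void mock_unregister_device(void)
{
    registered_ops = NULL;
}

/* Vendor-side callbacks for a hypothetical device. */
static int vendor_create(struct mock_mdev *mdev) { mdev->created = 1; return 0; }
static int vendor_remove(struct mock_mdev *mdev) { mdev->created = 0; return 0; }

static const char *const vendor_types[] = { "vendor-type-1", NULL };

static const struct mock_parent_ops vendor_ops = {
    .supported_type_groups = vendor_types,
    .create = vendor_create,
    .remove = vendor_remove,
};
```

The core in this mock only ever calls back through the registered table, which is the decoupling the two-level registration scheme is meant to provide.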

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: I73a5084574270b14541c529461ea2f03c292d510
---
 drivers/vfio/Kconfig |   1 +
 drivers/vfio/Makefile|   1 +
 drivers/vfio/mdev/Kconfig|  11 ++
 drivers/vfio/mdev/Makefile   |   4 +
 drivers/vfio/mdev/mdev_core.c| 372 +++
 drivers/vfio/mdev/mdev_driver.c  | 128 ++
 drivers/vfio/mdev/mdev_private.h |  41 +
 drivers/vfio/mdev/mdev_sysfs.c   | 296 +++
 include/linux/mdev.h | 177 +++
 9 files changed, 1031 insertions(+)
 create mode 100644 drivers/vfio/mdev/Kconfig
 create mode 100644 drivers/vfio/mdev/Makefile
 create mode 100644 drivers/vfio/mdev/mdev_core.c
 create mode

[Qemu-devel] [PATCH v9 04/12] vfio iommu: Add support for mediated devices

2016-10-17 Thread Kirti Wankhede

VFIO IOMMU drivers are designed for devices that are IOMMU capable. A
mediated device only uses IOMMU APIs; the underlying hardware can be managed
by an IOMMU domain.

The aims of this change are:
- To use most of the code of the TYPE1 IOMMU driver for mediated devices
- To support directly assigned devices and mediated devices in a single module

Added two new callback functions to struct vfio_iommu_driver_ops. A backend
IOMMU module that supports pinning and unpinning pages for mdev devices
should provide these functions.
Added APIs for pinning and unpinning pages to the VFIO module. These call
back into the backend IOMMU module to actually pin and unpin pages.

This change adds pin and unpin support for mediated devices to the TYPE1
IOMMU backend module. More details:
- When the iommu_group of mediated devices is attached, the task structure
  is cached; it is used later to pin pages and for page accounting.
- It keeps track of pinned pages for the mediated domain. This data is used
  to verify unpinning requests and to unpin any remaining pages while
  detaching.
- The existing mechanism is used for page accounting. If an IOMMU-capable
  domain exists in the container, then all pages are already pinned and
  accounted. Accounting for mdev devices is only done if there is no
  IOMMU-capable domain in the container.
- Page accounting is updated on hot plug and unplug of mdev devices and
  pass-through devices.
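The mdev page bookkeeping described above (remember what was pinned for the mediated domain, reject unpin requests for pages that were never pinned, release leftovers on detach) can be pictured as a reference-counted table. This is a user-space sketch with hypothetical names and a fixed-size array; the TYPE1 backend changes themselves are not part of this excerpt, and page accounting is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_TRACKED 64  /* arbitrary capacity for this sketch */

/* One pinned page frame and its number of outstanding pins. */
struct pin_entry {
    unsigned long pfn;
    unsigned int ref;
};

struct pin_tracker {
    struct pin_entry e[MAX_TRACKED];
    size_t n;                        /* number of live entries */
};

static struct pin_entry *find_entry(struct pin_tracker *t, unsigned long pfn)
{
    for (size_t i = 0; i < t->n; i++)
        if (t->e[i].pfn == pfn)
            return &t->e[i];
    return NULL;
}

/* Record one pin; pinning an already tracked pfn bumps its refcount. */
static int track_pin(struct pin_tracker *t, unsigned long pfn)
{
    struct pin_entry *p = find_entry(t, pfn);

    if (p) {
        p->ref++;
        return 0;
    }
    if (t->n == MAX_TRACKED)
        return -ENOMEM;
    t->e[t->n].pfn = pfn;
    t->e[t->n].ref = 1;
    t->n++;
    return 0;
}

/* Drop one pin; a pfn that was never pinned is rejected. */
static int track_unpin(struct pin_tracker *t, unsigned long pfn)
{
    struct pin_entry *p = find_entry(t, pfn);

    if (!p)
        return -EINVAL;
    if (--p->ref == 0)
        *p = t->e[--t->n];           /* move the last entry into the hole */
    return 0;
}

/* On detach, forget everything still pinned; returns how many pfns. */
static size_t track_release_all(struct pin_tracker *t)
{
    size_t released = t->n;

    t->n = 0;
    return released;
}
```

A linear array keeps the sketch short; a real implementation tracking many pages would more likely use a tree or hash keyed by pfn.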

Tested by assigning the following combinations of devices to a single VM:
- GPU pass-through only
- vGPU device only
- One GPU pass-through and one vGPU device
- Linux VM hot plug and unplug of a vGPU device while a GPU pass-through
  device exists
- Linux VM hot plug and unplug of a GPU pass-through device while a vGPU
  device exists

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: I295d6f0f2e0579b8d9882bfd8fd5a4194b97bd9a
---
 drivers/vfio/vfio.c |  98 ++
 drivers/vfio/vfio_iommu_type1.c | 692 ++--
 include/linux/vfio.h|  13 +-
 3 files changed, 707 insertions(+), 96 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 2e83bdf007fe..a5a210005b65 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1799,6 +1799,104 @@ void vfio_info_cap_shift(struct vfio_info_cap *caps, size_t offset)
 }
 EXPORT_SYMBOL_GPL(vfio_info_cap_shift);
 
+
+/*
+ * Pin a set of guest PFNs and return their associated host PFNs for local
+ * domain only.
+ * @dev [in] : device
+ * @user_pfn [in]: array of user/guest PFNs
+ * @npage [in]: count of array elements
+ * @prot [in] : protection flags
+ * @phys_pfn[out] : array of host PFNs
+ */
+long vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
+   long npage, int prot, unsigned long *phys_pfn)
+{
+   struct vfio_container *container;
+   struct vfio_group *group;
+   struct vfio_iommu_driver *driver;
+   ssize_t ret = -EINVAL;
+
+   if (!dev || !user_pfn || !phys_pfn)
+   return -EINVAL;
+
+   group = vfio_group_get_from_dev(dev);
+   if (IS_ERR(group))
+   return PTR_ERR(group);
+
+   ret = vfio_group_add_container_user(group);
+   if (ret)
+   goto err_pin_pages;
+
+   container = group->container;
+   if (IS_ERR(container)) {
+   ret = PTR_ERR(container);
+   goto err_pin_pages;
+   }
+
+   down_read(&container->group_lock);
+
+   driver = container->iommu_driver;
+   if (likely(driver && driver->ops->pin_pages))
+   ret = driver->ops->pin_pages(container->iommu_data, user_pfn,
+npage, prot, phys_pfn);
+
+   up_read(&container->group_lock);
+   vfio_group_try_dissolve_container(group);
+
+err_pin_pages:
+   vfio_group_put(group);
+   return ret;
+
+}
+EXPORT_SYMBOL(vfio_pin_pages);
+
+/*
+ * Unpin set of host PFNs for local domain only.
+ * @dev [in] : device
+ * @pfn [in] : array of host PFNs to be unpinned.
+ * @npage [in] :count of elements in array, that is number of pages.
+ */
+long vfio_unpin_pages(struct device *dev, unsigned long *pfn, long npage)
+{
+   struct vfio_container *container;
+   struct vfio_group *group;
+   struct vfio_iommu_driver *driver;
+   ssize_t ret = -EINVAL;
+
+   if (!dev || !pfn)
+   return -EINVAL;
+
+   group = vfio_group_get_from_dev(dev);
+   if (IS_ERR(group))
+   return PTR_ERR(group);
+
+   ret = vfio_group_add_container_user(group);
+   if (ret)
+   goto err_unpin_pages;
+
+   container = group->container;
+   if (IS_ERR(container)) {
+   ret = PTR_ERR(container);
+   goto err_unpin_pages;
+   }
+
+   down_read(&container->group_lock);
+
+   driver = container->iommu_driver;
+   if (likely(driver && driver->ops->unpin_pages))
+   ret = driver->ops->unpin_pages(container->iommu_data, pfn,
+

[Qemu-devel] [PATCH v9 03/12] vfio: Rearrange functions to get vfio_group from dev

2016-10-17 Thread Kirti Wankhede

Rearrange functions so that there is a common function to increment container_users.
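
The common helper factored out here relies on `atomic_inc_not_zero()`: a container user is added only if the container already has at least one user, and every later failure check must undo that increment. A userspace sketch of the same take-then-roll-back pattern using C11 `<stdatomic.h>`; `struct group_model` and its fields are hypothetical stand-ins for `struct vfio_group`.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct group_model {
	atomic_int container_users;
	bool noiommu;
	bool has_iommu_driver;
	bool viable;
};

/* Increment *v unless it is zero; returns true if it was incremented. */
static bool inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

static int add_container_user(struct group_model *g)
{
	if (!inc_not_zero(&g->container_users))
		return -EINVAL;		/* no container attached */

	if (g->noiommu) {
		atomic_fetch_sub(&g->container_users, 1);
		return -EPERM;
	}
	if (!g->has_iommu_driver || !g->viable) {
		atomic_fetch_sub(&g->container_users, 1);
		return -EINVAL;
	}
	return 0;
}
```

Each error path drops exactly the reference it took, which is what allows both vfio_group_get_external_user() and the new pin/unpin entry points to share one helper.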

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: I1f93262bdbab75094bc24b087b29da35ba70c4c6
---
 drivers/vfio/vfio.c | 57 ++---
 1 file changed, 37 insertions(+), 20 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index d1d70e0b011b..2e83bdf007fe 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -480,6 +480,21 @@ static struct vfio_group *vfio_group_get_from_minor(int minor)
return group;
 }
 
+static struct vfio_group *vfio_group_get_from_dev(struct device *dev)
+{
+   struct iommu_group *iommu_group;
+   struct vfio_group *group;
+
+   iommu_group = iommu_group_get(dev);
+   if (!iommu_group)
+   return NULL;
+
+   group = vfio_group_get_from_iommu(iommu_group);
+   iommu_group_put(iommu_group);
+
+   return group;
+}
+
 /**
  * Device objects - create, release, get, put, search
  */
@@ -811,16 +826,10 @@ EXPORT_SYMBOL_GPL(vfio_add_group_dev);
  */
 struct vfio_device *vfio_device_get_from_dev(struct device *dev)
 {
-   struct iommu_group *iommu_group;
struct vfio_group *group;
struct vfio_device *device;
 
-   iommu_group = iommu_group_get(dev);
-   if (!iommu_group)
-   return NULL;
-
-   group = vfio_group_get_from_iommu(iommu_group);
-   iommu_group_put(iommu_group);
+   group = vfio_group_get_from_dev(dev);
if (!group)
return NULL;
 
@@ -1376,6 +1385,23 @@ static bool vfio_group_viable(struct vfio_group *group)
 group, vfio_dev_viable) == 0);
 }
 
+static int vfio_group_add_container_user(struct vfio_group *group)
+{
+   if (!atomic_inc_not_zero(&group->container_users))
+   return -EINVAL;
+
+   if (group->noiommu) {
+   atomic_dec(&group->container_users);
+   return -EPERM;
+   }
+   if (!group->container->iommu_driver || !vfio_group_viable(group)) {
+   atomic_dec(&group->container_users);
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 static const struct file_operations vfio_device_fops;
 
 static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
@@ -1685,23 +1711,14 @@ static const struct file_operations vfio_device_fops = {
 struct vfio_group *vfio_group_get_external_user(struct file *filep)
 {
struct vfio_group *group = filep->private_data;
+   int ret;
 
if (filep->f_op != &vfio_group_fops)
return ERR_PTR(-EINVAL);
 
-   if (!atomic_inc_not_zero(&group->container_users))
-   return ERR_PTR(-EINVAL);
-
-   if (group->noiommu) {
-   atomic_dec(&group->container_users);
-   return ERR_PTR(-EPERM);
-   }
-
-   if (!group->container->iommu_driver ||
-   !vfio_group_viable(group)) {
-   atomic_dec(&group->container_users);
-   return ERR_PTR(-EINVAL);
-   }
+   ret = vfio_group_add_container_user(group);
+   if (ret)
+   return ERR_PTR(ret);
 
vfio_group_get(group);
 
-- 
2.7.0

[Qemu-devel] [PATCH v9 00/12] Add Mediated device support

2016-10-17 Thread Kirti Wankhede

This series adds Mediated device support to the Linux host kernel. The purpose
of this series is to provide a common interface for mediated device
management that can be used by different devices. This series introduces
an Mdev core module that creates and manages mediated devices, a VFIO-based
driver for mediated devices that are created by the mdev core module, and an
update to the VFIO type1 IOMMU module to support pinning and unpinning for
mediated devices.

What changed in v9?
mdev-core:
- Added a class named 'mdev_bus' that contains links to devices that are
  registered with the mdev core driver.
- The [] name is created by adding the device driver string as a
  prefix to the string provided by the vendor driver.
- The 'device_api' attribute should be provided by the vendor driver and should
  show which device API is being created, for example, "vfio-pci" for a PCI device.
- Renamed the link to its type in the mdev device directory to 'mdev_type'
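
As a concrete illustration of the naming rule above, the type name can be formed by prefixing the parent device's driver string to the vendor-supplied name. The `-` separator, the `mtty` driver name and the `1` suffix used in the example are assumptions, not values taken from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Build "<driver>-<vendor name>" into buf; returns -1 if it would not fit. */
static int make_type_name(char *buf, size_t len, const char *driver,
			  const char *vendor_name)
{
	int n;

	assert(driver && vendor_name);
	n = snprintf(buf, len, "%s-%s", driver, vendor_name);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

With a driver string of "mtty" and a vendor string of "1", this yields "mtty-1"; a buffer too small for the full name is reported as an error instead of silently producing a truncated name.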

vfio:
- Split the commit into multiple individual commits
- Added function to get device_api string based on vfio_device_info.flags.
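
A minimal sketch of mapping `vfio_device_info.flags` to a device_api string. Only "vfio-pci" is named in this cover letter; "vfio-platform" and "vfio-amba" and the bit positions below are assumptions meant to mirror the `VFIO_DEVICE_FLAGS_*` definitions in `<linux/vfio.h>`, redefined locally under hypothetical names so the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Flag bits assumed to mirror VFIO_DEVICE_FLAGS_* in <linux/vfio.h>. */
#define DEV_FLAGS_RESET    (1u << 0)
#define DEV_FLAGS_PCI      (1u << 1)
#define DEV_FLAGS_PLATFORM (1u << 2)
#define DEV_FLAGS_AMBA     (1u << 3)

/* Map vfio_device_info.flags to a device_api string, or NULL if unknown. */
static const char *device_api_string(unsigned int flags)
{
	assert(DEV_FLAGS_RESET == 1u);	/* RESET carries no device API */
	if (flags & DEV_FLAGS_PCI)
		return "vfio-pci";
	if (flags & DEV_FLAGS_PLATFORM)
		return "vfio-platform";
	if (flags & DEV_FLAGS_AMBA)
		return "vfio-amba";
	return NULL;
}
```

If several bus bits were ever set at once, the first match wins; a real implementation might prefer to reject such a combination.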

vfio_iommu_type1:
- Handled the case where all devices attached to the normal IOMMU API domain
  go away while an mdev device still exists in the domain. Updated page
  accounting for the local domain.
- Similarly, if a device is attached to the normal IOMMU API domain, mappings
  are established and page accounting is updated accordingly.
- Tested hot-plug and hot-unplug of vGPU and GPU pass through devices with a
  Linux VM.

Documentation:
- Updated the documentation and the sample driver, mtty.c, accordingly.

Kirti Wankhede (12):
  vfio: Mediated device Core driver
  vfio: VFIO based driver for Mediated devices
  vfio: Rearrange functions to get vfio_group from dev
  vfio iommu: Add support for mediated devices
  vfio: Introduce common function to add capabilities
  vfio_pci: Update vfio_pci to use vfio_info_add_capability()
  vfio: Introduce vfio_set_irqs_validate_and_prepare()
  vfio_pci: Updated to use vfio_set_irqs_validate_and_prepare()
  vfio_platform: Updated to use vfio_set_irqs_validate_and_prepare()
  vfio: Add function to get device_api string from
vfio_device_info.flags
  docs: Add Documentation for Mediated devices
  docs: Sample driver to demonstrate how to use Mediated device
framework.

 Documentation/vfio-mdev/Makefile |   13 +
 Documentation/vfio-mdev/mtty.c   | 1429 ++
 Documentation/vfio-mdev/vfio-mediated-device.txt |  389 ++
 drivers/vfio/Kconfig |1 +
 drivers/vfio/Makefile|1 +
 drivers/vfio/mdev/Kconfig|   18 +
 drivers/vfio/mdev/Makefile   |5 +
 drivers/vfio/mdev/mdev_core.c|  372 ++
 drivers/vfio/mdev/mdev_driver.c  |  128 ++
 drivers/vfio/mdev/mdev_private.h |   41 +
 drivers/vfio/mdev/mdev_sysfs.c   |  296 +
 drivers/vfio/mdev/vfio_mdev.c|  148 +++
 drivers/vfio/pci/vfio_pci.c  |  101 +-
 drivers/vfio/platform/vfio_platform_common.c |   31 +-
 drivers/vfio/vfio.c  |  287 -
 drivers/vfio/vfio_iommu_type1.c  |  692 +--
 include/linux/mdev.h |  177 +++
 include/linux/vfio.h |   23 +-
 18 files changed, 3948 insertions(+), 204 deletions(-)
 create mode 100644 Documentation/vfio-mdev/Makefile
 create mode 100644 Documentation/vfio-mdev/mtty.c
 create mode 100644 Documentation/vfio-mdev/vfio-mediated-device.txt
 create mode 100644 drivers/vfio/mdev/Kconfig
 create mode 100644 drivers/vfio/mdev/Makefile
 create mode 100644 drivers/vfio/mdev/mdev_core.c
 create mode 100644 drivers/vfio/mdev/mdev_driver.c
 create mode 100644 drivers/vfio/mdev/mdev_private.h
 create mode 100644 drivers/vfio/mdev/mdev_sysfs.c
 create mode 100644 drivers/vfio/mdev/vfio_mdev.c
 create mode 100644 include/linux/mdev.h

-- 
2.7.0

[Qemu-devel] [PATCH v9 02/12] vfio: VFIO based driver for Mediated devices

2016-10-17 Thread Kirti Wankhede

The vfio_mdev driver registers with the mdev core driver.
The MDEV core driver creates a mediated device and calls the probe routine of
the vfio_mdev driver for each device.
The probe routine of the vfio_mdev driver adds the mediated device to the VFIO
core module.

This driver forms a shim layer that passes VFIO device operations through
to the vendor driver for mediated devices.
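
Besides forwarding each call, the shim's open path has to keep the module reference balanced: a reference is taken before calling the vendor's open and dropped again if that open fails, while release always drops it. A userspace sketch of that pattern; `get_module()`/`put_module()` and the counter are hypothetical stand-ins for try_module_get()/module_put(), and the vendor callbacks are toys.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static int module_refs;		/* stands in for THIS_MODULE's refcount */

struct vendor_ops {
	int (*open)(void *mdev);
	void (*release)(void *mdev);
};

static bool get_module(void) { module_refs++; return true; }
static void put_module(void) { module_refs--; }

/* Shim open: take a module ref, call the vendor, drop the ref on failure. */
static int shim_open(const struct vendor_ops *ops, void *mdev)
{
	int ret;

	if (!ops->open)
		return -EINVAL;
	if (!get_module())
		return -ENODEV;
	ret = ops->open(mdev);
	if (ret)
		put_module();
	return ret;
}

static void shim_release(const struct vendor_ops *ops, void *mdev)
{
	if (ops->release)
		ops->release(mdev);
	put_module();	/* pairs with the ref taken by a successful open */
}

static int ok_open(void *mdev) { (void)mdev; return 0; }
static int busy_open(void *mdev) { (void)mdev; return -EBUSY; }
```

The -EINVAL check happens before any reference is taken, so a vendor driver without an open callback never leaks a module reference.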

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: I583f4734752971d3d112324d69e2508c88f359ec
---
 drivers/vfio/mdev/Kconfig |   7 ++
 drivers/vfio/mdev/Makefile|   1 +
 drivers/vfio/mdev/vfio_mdev.c | 148 ++
 3 files changed, 156 insertions(+)
 create mode 100644 drivers/vfio/mdev/vfio_mdev.c

diff --git a/drivers/vfio/mdev/Kconfig b/drivers/vfio/mdev/Kconfig
index 93addace9a67..6cef0c4d2ceb 100644
--- a/drivers/vfio/mdev/Kconfig
+++ b/drivers/vfio/mdev/Kconfig
@@ -9,3 +9,10 @@ config VFIO_MDEV
See Documentation/vfio-mdev/vfio-mediated-device.txt for more details.
 
 If you don't know what do here, say N.
+
+config VFIO_MDEV_DEVICE
+tristate "VFIO support for Mediated devices"
+depends on VFIO && VFIO_MDEV
+default n
+help
+VFIO based driver for mediated devices.
diff --git a/drivers/vfio/mdev/Makefile b/drivers/vfio/mdev/Makefile
index 31bc04801d94..fa2d5ea466ee 100644
--- a/drivers/vfio/mdev/Makefile
+++ b/drivers/vfio/mdev/Makefile
@@ -2,3 +2,4 @@
 mdev-y := mdev_core.o mdev_sysfs.o mdev_driver.o
 
 obj-$(CONFIG_VFIO_MDEV) += mdev.o
+obj-$(CONFIG_VFIO_MDEV_DEVICE) += vfio_mdev.o
diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
new file mode 100644
index ..b7b47604ce7a
--- /dev/null
+++ b/drivers/vfio/mdev/vfio_mdev.c
@@ -0,0 +1,148 @@
+/*
+ * VFIO based driver for Mediated device
+ *
+ * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
+ * Author: Neo Jia 
+ *Kirti Wankhede 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "mdev_private.h"
+
+#define DRIVER_VERSION  "0.1"
+#define DRIVER_AUTHOR   "NVIDIA Corporation"
+#define DRIVER_DESC "VFIO based driver for Mediated device"
+
+static int vfio_mdev_open(void *device_data)
+{
+   struct mdev_device *mdev = device_data;
+   struct parent_device *parent = mdev->parent;
+   int ret;
+
+   if (unlikely(!parent->ops->open))
+   return -EINVAL;
+
+   if (!try_module_get(THIS_MODULE))
+   return -ENODEV;
+
+   ret = parent->ops->open(mdev);
+   if (ret)
+   module_put(THIS_MODULE);
+
+   return ret;
+}
+
+static void vfio_mdev_release(void *device_data)
+{
+   struct mdev_device *mdev = device_data;
+   struct parent_device *parent = mdev->parent;
+
+   if (parent->ops->release)
+   parent->ops->release(mdev);
+
+   module_put(THIS_MODULE);
+}
+
+static long vfio_mdev_unlocked_ioctl(void *device_data,
+unsigned int cmd, unsigned long arg)
+{
+   struct mdev_device *mdev = device_data;
+   struct parent_device *parent = mdev->parent;
+
+   if (unlikely(!parent->ops->ioctl))
+   return -EINVAL;
+
+   return parent->ops->ioctl(mdev, cmd, arg);
+}
+
+static ssize_t vfio_mdev_read(void *device_data, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+   struct mdev_device *mdev = device_data;
+   struct parent_device *parent = mdev->parent;
+
+   if (unlikely(!parent->ops->read))
+   return -EINVAL;
+
+   return parent->ops->read(mdev, buf, count, ppos);
+}
+
+static ssize_t vfio_mdev_write(void *device_data, const char __user *buf,
+  size_t count, loff_t *ppos)
+{
+   struct mdev_device *mdev = device_data;
+   struct parent_device *parent = mdev->parent;
+
+   if (unlikely(!parent->ops->write))
+   return -EINVAL;
+
+   return parent->ops->write(mdev, buf, count, ppos);
+}
+
+static int vfio_mdev_mmap(void *device_data, struct vm_area_struct *vma)
+{
+   struct mdev_device *mdev = device_data;
+   struct parent_device *parent = mdev->parent;
+
+   if (unlikely(!parent->ops->mmap))
+   return -EINVAL;
+
+   return parent->ops->mmap(mdev, vma);
+}
+
+static const struct vfio_device_ops vfio_mdev_dev_ops = {
+   .name   = "vfio-mdev",
+   .open   = vfio_mdev_open,
+   .release = vfio_mdev_release,
+   .ioctl  = vfio_mdev_unlocked_ioctl,
+   .read   = vfio_mdev_read,
+   .write  = vfio_mdev_write,
+   .mmap   = vfio_mdev_mmap,
+};
+
+int vfio_mdev_probe(struct device

Re: [Qemu-devel] invtsc + migration + TSC scaling

2016-10-17 Thread Eduardo Habkost

On Mon, Oct 17, 2016 at 06:24:38PM +0200, Paolo Bonzini wrote:
> On 17/10/2016 16:50, Radim Krčmář wrote:
> > 2016-10-17 07:47-0200, Marcelo Tosatti:
[...]
> >> since Linux guests use kvmclock and Windows guests use Hyper-V
> >> enlightenment, it should be fine to disable 2).
> 
> ... and 1 too.
> 
> We should also blacklist the TSC deadline timer when invtsc is not
> available.

You mean on the guest-side? On the host side, it would make
existing VMs refuse to run.

-- 
Eduardo

Re: [Qemu-devel] [PATCH 0/6] qdev class properties + abstract class support on device-list-properties

2016-10-17 Thread Eduardo Habkost

Ping?

On Tue, Oct 11, 2016 at 05:41:13PM -0300, Eduardo Habkost wrote:
> This series allows abstract classes to be used on
> device-list-properties, which will return all class properties
> registered for the class.
> 
> Patches 1-3 change qdev to register all static properties as
> class properties instead of instance properties.
> 
> Patches 4-5 change device-list-properties so it can return the
> list of properties for abstract classes.
> 
> Patch 6 just adds a warning to people to not use
> qdev_property_add_static() in new code.
> 
> The series is based on the "tests: A few check-qom-proplist
> fixes" series I have submitted earlier. A git branch containing
> this series can be found at:
>   https://github.com/ehabkost/qemu-hacks.git work/device-list-abstract-properties
> 
> Eduardo Habkost (6):
>   qdev: qdev_class_set_props() function
>   qdev: Extract property-default code to qdev_property_set_to_default()
>   qdev: Register static properties as class properties
>   qom: object_class_property_iter_init() function
>   qmp: Support abstract classes on device-list-properties
>   qdev: Warning about using object_class_property_add() in new code
> 
>  hw/9pfs/virtio-9p-device.c  |   2 +-
>  hw/acpi/piix4.c |   2 +-
>  hw/arm/armv7m.c |   2 +-
>  hw/arm/bcm2836.c|   2 +-
>  hw/arm/integratorcp.c   |   2 +-
>  hw/arm/musicpal.c   |   2 +-
>  hw/arm/pxa2xx.c |   4 +-
>  hw/arm/pxa2xx_gpio.c|   2 +-
>  hw/arm/spitz.c  |   2 +-
>  hw/arm/stm32f205_soc.c  |   2 +-
>  hw/arm/strongarm.c  |   2 +-
>  hw/arm/xlnx-zynqmp.c|   2 +-
>  hw/audio/ac97.c |   2 +-
>  hw/audio/adlib.c|   2 +-
>  hw/audio/cs4231.c   |   2 +-
>  hw/audio/cs4231a.c  |   2 +-
>  hw/audio/gus.c  |   2 +-
>  hw/audio/hda-codec.c|   2 +-
>  hw/audio/intel-hda.c|   4 +-
>  hw/audio/marvell_88w8618.c  |   2 +-
>  hw/audio/pcspk.c|   2 +-
>  hw/audio/pl041.c|   2 +-
>  hw/audio/sb16.c |   2 +-
>  hw/block/fdc.c  |   6 +-
>  hw/block/m25p80.c   |   2 +-
>  hw/block/nand.c |   2 +-
>  hw/block/nvme.c |   2 +-
>  hw/block/onenand.c  |   2 +-
>  hw/block/pflash_cfi01.c |   2 +-
>  hw/block/pflash_cfi02.c |   2 +-
>  hw/block/virtio-blk.c   |   2 +-
>  hw/char/bcm2835_aux.c   |   2 +-
>  hw/char/cadence_uart.c  |   2 +-
>  hw/char/debugcon.c  |   2 +-
>  hw/char/digic-uart.c|   2 +-
>  hw/char/escc.c  |   2 +-
>  hw/char/etraxfs_ser.c   |   2 +-
>  hw/char/exynos4210_uart.c   |   2 +-
>  hw/char/grlib_apbuart.c |   2 +-
>  hw/char/imx_serial.c|   2 +-
>  hw/char/ipoctal232.c|   2 +-
>  hw/char/lm32_juart.c|   2 +-
>  hw/char/lm32_uart.c |   2 +-
>  hw/char/milkymist-uart.c|   2 +-
>  hw/char/parallel.c  |   2 +-
>  hw/char/pl011.c |   2 +-
>  hw/char/sclpconsole-lm.c|   2 +-
>  hw/char/sclpconsole.c   |   2 +-
>  hw/char/serial-isa.c|   2 +-
>  hw/char/serial-pci.c|   6 +-
>  hw/char/spapr_vty.c |   2 +-
>  hw/char/stm32f2xx_usart.c   |   2 +-
>  hw/char/virtio-console.c|   2 +-
>  hw/char/virtio-serial-bus.c |   4 +-
>  hw/char/xilinx_uartlite.c   |   2 +-
>  hw/core/generic-loader.c|   2 +-
>  hw/core/or-irq.c|   2 +-
>  hw/core/platform-bus.c  |   2 +-
>  hw/core/qdev.c  | 112 ++--
>  hw/cpu/a15mpcore.c  |   2 +-
>  hw/cpu/a9mpcore.c   |   2 +-
>  hw/cpu/arm11mpcore.c|   2 +-
>  hw/cpu/realview_mpcore.c|   2 +-
>  hw/display/bcm2835_fb.c |   2 +-
>  hw/display/cg3.c|   2 +-
>  hw/display/cirrus_vga.c |   4 +-
>  hw/display/g364fb.c |   2 +-
>  hw/display/milkymist-vgafb.c|   2 +-
>  hw/display/qxl.c|   2 +-
>  hw/display/tcx.c|   2 +-
>  hw/display/vga-isa.c|   2 +-
>  hw/display/vga-pci.c|   4 +-
>  hw/display/virtio-gpu-pci.c |   2 +-
>  hw/display/virtio-gpu.c |   2 +-
>  hw/display/virtio-vga.c |   2 +-
>  hw/display/vmware_vga.c |   2 +-
>  hw/dma/i82374.c |   2 +-
>  hw/dma/i8257.c  |   2 +-
>  hw/dma/pl330.c  |
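
To illustrate why registering static properties on the class (patches 1-3) lets device-list-properties answer for abstract classes: an iterator that walks a class and then each of its parents can enumerate properties without instantiating anything. The structs and functions below are a hypothetical userspace model, not QOM's ObjectClass or the object_class_property_iter_init() API from patch 4.

```c
#include <assert.h>
#include <stddef.h>

struct prop {
    const char *name;
};

struct klass {
    const char *name;
    const struct klass *parent;   /* NULL for the root class */
    const struct prop *props;     /* array terminated by { NULL } */
    int abstract;                 /* cannot be instantiated */
};

struct prop_iter {
    const struct klass *k;
    const struct prop *p;
};

static void prop_iter_init(struct prop_iter *it, const struct klass *k)
{
    it->k = k;
    it->p = k ? k->props : NULL;
}

/* Return the next property, moving to the parent class when one runs out. */
static const struct prop *prop_iter_next(struct prop_iter *it)
{
    while (it->k) {
        if (it->p && it->p->name) {
            return it->p++;
        }
        it->k = it->k->parent;
        it->p = it->k ? it->k->props : NULL;
    }
    return NULL;
}
```

Because properties live on the class rather than on an instance, the walk works the same way whether or not the class is abstract.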

Re: [Qemu-devel] qemu master tests/vmstate prints "Failed to load simple/primitive:b_1" etc

2016-10-17 Thread Paolo Bonzini



On 17/10/2016 21:15, Dr. David Alan Gilbert wrote:
> * Peter Maydell (peter.mayd...@linaro.org) wrote:
>> On 17 October 2016 at 19:51, Dr. David Alan Gilbert  
>> wrote:
>>> * Peter Maydell (peter.mayd...@linaro.org) wrote:
>>>> I've just noticed that qemu master running 'make check' prints
>>>>   GTESTER tests/test-vmstate
>>>> Failed to load simple/primitive:b_1
>>>> Failed to load simple/primitive:i64_2
>>>> Failed to load simple/primitive:i32_1
>>>> Failed to load simple/primitive:i32_1
>>>>
>>>> but the test doesn't fail.
>>>>
>>>> Can we either (a) silence this output if it's spurious or (b) have
>>>> it cause the test to fail if it's real (and fix the cause of the
>>>> failure ;-)), please?
>>>
>>> The test (has always) tried loading truncated versions of the migration
>>> stream and made sure that it receives an error from vmstate_load_state.
>>>
>>> However I just added an error so we can see which field fails to load
>>> in a migration where we just used to get a 'migration has failed with -22'
>>>
>>> Is there a way to silence error_report's that's already in use in tests?
>>
>> We have some nasty hacks (like check for 'qtest_enabled()' before
>> calling error_report()) but we don't have anything in the
>> tree today that's a more coherent approach to the "test
>> deliberately provoked this error" problem.
> 
> Errors go to either the current monitor (if it's non-qmp) or
> stderr; so could we create a dummy monitor to eat the errors
> and make it current around that part?

I guess you could reimplement the functions of stubs/mon-printf.c and
stubs/mon-is-qmp.c.

Paolo
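
The test strategy described in the thread (feed the loader every truncated prefix of a valid stream and require an error each time) can be sketched as below, together with the kind of switch Dave asks about for swallowing error messages that a test provokes on purpose. `load_stream()`, `report_error()`, the `quiet_errors` flag and the toy field sizes are hypothetical; QEMU's real path goes through vmstate_load_state() and error_report(), and Paolo's suggestion is to reimplement the monitor stubs instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

static bool quiet_errors;   /* set by a test that provokes errors on purpose */
static int errors_seen;

static void report_error(const char *field)
{
    errors_seen++;
    if (!quiet_errors) {
        fprintf(stderr, "Failed to load %s\n", field);
    }
}

/* Toy format: three fields of 1, 8 and 4 bytes. Returns 0 or -22 (EINVAL). */
static int load_stream(const uint8_t *buf, size_t len)
{
    static const struct { const char *name; size_t size; } fields[] = {
        { "simple/primitive:b_1", 1 },
        { "simple/primitive:i64_2", 8 },
        { "simple/primitive:i32_1", 4 },
    };
    size_t off = 0;

    (void)buf;
    for (size_t i = 0; i < sizeof(fields) / sizeof(fields[0]); i++) {
        if (len - off < fields[i].size) {   /* off never exceeds len */
            report_error(fields[i].name);
            return -22;
        }
        off += fields[i].size;
    }
    return 0;
}

/* Every strict prefix of a full stream must fail to load. */
static int count_truncation_failures(const uint8_t *buf, size_t full_len)
{
    int failures = 0;

    quiet_errors = true;
    for (size_t len = 0; len < full_len; len++) {
        if (load_stream(buf, len) != 0) {
            failures++;
        }
    }
    quiet_errors = false;
    return failures;
}
```

The errors are still counted while silenced, so the test can assert that each truncation was rejected without printing anything.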

Re: [Qemu-devel] [PATCH] nvic: allow to set pending status for not active interrupts

2016-10-17 Thread Krzeminski, Marcin (Nokia - PL/Wroclaw)

Hi Peter

> -Original Message-
> From: Peter Maydell [mailto:peter.mayd...@linaro.org]
> Sent: Thursday, October 13, 2016 6:59 PM
> To: Krzeminski, Marcin (Nokia - PL/Wroclaw)
> 
> Cc: QEMU Developers ; qemu-arm  a...@nongnu.org>; rfsw-patc...@mlist.nokia.com
> Subject: Re: [PATCH] nvic: allow to set pending status for not active
> interrupts
> 
> On 7 October 2016 at 10:42,   wrote:
> > From: Marcin Krzeminski 
> >
> > According to ARM DUI 0552A 4.2.10. NVIC set pending status also for
> > disabled interrupts. This patch adds possibility to emulate this in
> > Qemu.
> >
> > Signed-off-by: Marcin Krzeminski 
> > ---
> >  hw/intc/arm_gic.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/intc/arm_gic.c b/hw/intc/arm_gic.c
> > index b30cc91..85be6e4 100644
> > --- a/hw/intc/arm_gic.c
> > +++ b/hw/intc/arm_gic.c
> > @@ -147,7 +147,8 @@ static void gic_set_irq_11mpcore(GICState *s, int irq, int level,
> >  {
> >  if (level) {
> >  GIC_SET_LEVEL(irq, cm);
> > -if (GIC_TEST_EDGE_TRIGGER(irq) || GIC_TEST_ENABLED(irq, cm)) {
> > +if (GIC_TEST_EDGE_TRIGGER(irq) || GIC_TEST_ENABLED(irq, cm)
> > +|| (!GIC_TEST_ACTIVE(irq, cm) && s->revision == REV_NVIC)) {
> >  DPRINTF("Set %d pending mask %x\n", irq, target);
> >  GIC_SET_PENDING(irq, target);
> >  }
> > --
> 
> Thanks for this patch. I agree that the current behaviour isn't correct.
> 
> I think it would be cleaner to define a new gic_set_irq_nvic() which has the
> NVIC specific behaviour, rather than sticking an "if this is really an NVIC"
> check into this function.
>
Sure.

> You probably also want to check whether the logic for re-pending level
> triggered interrupts in gic_complete_irq() also needs a similar change:
> 
> if (s->revision == REV_11MPCORE || s->revision == REV_NVIC) {
> /* Mark level triggered interrupts as pending if they are still
>raised.  */
> if (!GIC_TEST_EDGE_TRIGGER(irq) && GIC_TEST_ENABLED(irq, cm)
> && GIC_TEST_LEVEL(irq, cm) && (GIC_TARGET(irq) & cm) != 0) {
> DPRINTF("Set %d pending mask %x\n", irq, cm);
> GIC_SET_PENDING(irq, cm);
> }
> }
It seems that level-triggered interrupts could be removed from the if for the NVIC.
I do not have a guest to model that, but I will at least check that this
does not break anything.

Thanks,
Marcin 
> 
> thanks
> -- PMM
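
Peter's suggestion of a separate NVIC handler, instead of an extra `s->revision == REV_NVIC` test inside gic_set_irq_11mpcore(), could look roughly like the model below. It only captures the pending decision from the patch (edge-triggered, enabled, or not currently active) using plain 32-bit masks, so it assumes irq < 32; `struct nvic_model` and `nvic_set_irq()` are hypothetical and do not match QEMU's GICState or the GIC_* macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct nvic_model {
    uint32_t level;     /* bit n: input line n is high */
    uint32_t edge;      /* bit n: interrupt n is edge-triggered */
    uint32_t enabled;
    uint32_t active;
    uint32_t pending;
};

/* NVIC variant: a raised line pends the IRQ, even if disabled, unless active. */
static void nvic_set_irq(struct nvic_model *s, int irq, bool level)
{
    uint32_t bit = 1u << irq;

    assert(irq >= 0 && irq < 32);
    if (level) {
        s->level |= bit;
        if ((s->edge & bit) || (s->enabled & bit) || !(s->active & bit)) {
            s->pending |= bit;
        }
    } else {
        s->level &= ~bit;
    }
}
```

Keeping this in its own function leaves the 11MPCore path unchanged, which is the point of Peter's review comment.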

Re: [Qemu-devel] [PATCH 0/2] tests: A few check-qom-proplist fixes

2016-10-17 Thread Eduardo Habkost

Ping?

Markus, do you want to merge this through your tree?

On Tue, Oct 11, 2016 at 09:37:45AM -0300, Eduardo Habkost wrote:
> A few fixes on check-qom-proplist that will ensure we test both
> class properties and object properties, and catch errors when
> registering properties in test code.
> 
> Eduardo Habkost (2):
>   tests: check-qom-proplist: Remove "bv" class property from class
>   tests: check-qom-proplist: Use &error_abort to catch errors
> 
>  tests/check-qom-proplist.c | 10 +++---
>  1 file changed, 3 insertions(+), 7 deletions(-)
> 
> -- 
> 2.7.4
> 

-- 
Eduardo

Re: [Qemu-devel] [RFC v5 0/8] KVM PCI/MSI passthrough with mach-virt

2016-10-17 Thread Auger Eric

Hi,
On 17/10/2016 22:16, no-re...@ec2-52-6-146-230.compute-1.amazonaws.com
wrote:
> Hi,
> 
> Your series failed automatic build test. Please find the testing commands and
> their output below. If you have docker installed, you can probably reproduce it
> locally.
> 
> Subject: [Qemu-devel] [RFC v5 0/8] KVM PCI/MSI passthrough with mach-virt
> Type: series
> Message-id: 1476733110-14293-1-git-send-email-eric.au...@redhat.com
> 
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> set -e
> git submodule update --init dtc
> # Let docker tests dump environment info
> export SHOW_ENV=1
> export J=16
> make docker-test-quick@centos6
> make docker-test-mingw@fedora
> === TEST SCRIPT END ===
> 
> Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
> From https://github.com/patchew-project/qemu
>  * [new tag] 
> patchew/1476733110-14293-1-git-send-email-eric.au...@redhat.com -> 
> patchew/1476733110-14293-1-git-send-email-eric.au...@redhat.com
> Switched to a new branch 'test'
> 465a02b hw: vfio: common: Adapt vfio_listeners for reserved_iova region
> 25541d6 hw: vfio: common: vfio_prepare_msi_mapping
> 9b97aec hw: platform-bus: Add platform bus stub
> 94589db hw: platform-bus: Enable to map any memory region onto the platform-bus
> 425e1ac memory: memory_region_find_by_name
> 4a2c82c memory: Add reserved_iova region type
> fdf9cd8 hw: vfio: common: vfio_get_iommu_type1_info
> eb2e918 linux-headers: Partial update for MSI IOVA handling
> 
> === OUTPUT BEGIN ===
> Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
> Cloning into 'dtc'...
> Submodule path 'dtc': checked out '65cc4d2748a2c2e6f27f1cf39e07a5dbabd80ebf'
>   BUILD   centos6
> make[1]: Entering directory '/var/tmp/patchew-tester-tmp-zydd_mdj/src'
>   ARCHIVE qemu.tgz
>   ARCHIVE dtc.tgz
>   COPYRUNNER
> RUN test-quick in qemu:centos6 
> Packages installed:
> SDL-devel-1.2.14-7.el6_7.1.x86_64
> ccache-3.1.6-2.el6.x86_64
> epel-release-6-8.noarch
> gcc-4.4.7-17.el6.x86_64
> git-1.7.1-4.el6_7.1.x86_64
> glib2-devel-2.28.8-5.el6.x86_64
> libfdt-devel-1.4.0-1.el6.x86_64
> make-3.81-23.el6.x86_64
> package g++ is not installed
> pixman-devel-0.32.8-1.el6.x86_64
> tar-1.23-15.el6_8.x86_64
> zlib-devel-1.2.3-29.el6.x86_64
> 
> Environment variables:
> PACKAGES=libfdt-devel ccache tar git make gcc g++ zlib-devel 
> glib2-devel SDL-devel pixman-devel epel-release
> HOSTNAME=a077d39b13de
> TERM=xterm
> MAKEFLAGS= -j16
> HISTSIZE=1000
> J=16
> USER=root
> CCACHE_DIR=/var/tmp/ccache
> EXTRA_CONFIGURE_OPTS=
> V=
> SHOW_ENV=1
> MAIL=/var/spool/mail/root
> PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
> PWD=/
> LANG=en_US.UTF-8
> TARGET_LIST=
> HISTCONTROL=ignoredups
> SHLVL=1
> HOME=/root
> TEST_DIR=/tmp/qemu-test
> LOGNAME=root
> LESSOPEN=||/usr/bin/lesspipe.sh %s
> FEATURES= dtc
> DEBUG=
> G_BROKEN_FILENAMES=1
> CCACHE_HASHDIR=
> _=/usr/bin/env
> 
> Configure options:
> --enable-werror --target-list=x86_64-softmmu,aarch64-softmmu 
> --prefix=/var/tmp/qemu-build/install
> No C++ compiler available; disabling C++ specific optional code
> Install prefix    /var/tmp/qemu-build/install
> BIOS directory    /var/tmp/qemu-build/install/share/qemu
> binary directory  /var/tmp/qemu-build/install/bin
> library directory /var/tmp/qemu-build/install/lib
> module directory  /var/tmp/qemu-build/install/lib/qemu
> libexec directory /var/tmp/qemu-build/install/libexec
> include directory /var/tmp/qemu-build/install/include
> config directory  /var/tmp/qemu-build/install/etc
> local state directory   /var/tmp/qemu-build/install/var
> Manual directory  /var/tmp/qemu-build/install/share/man
> ELF interp prefix /usr/gnemul/qemu-%M
> Source path       /tmp/qemu-test/src
> C compiler        cc
> Host C compiler   cc
> C++ compiler
> Objective-C compiler cc
> ARFLAGS           rv
> CFLAGS            -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g
> QEMU_CFLAGS   -I/usr/include/pixman-1-pthread -I/usr/include/glib-2.0 
> -I/usr/lib64/glib-2.0/include   -fPIE -DPIE -m64 -D_GNU_SOURCE 
> -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
> -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes 
> -fno-strict-aliasing -fno-common -fwrapv  -Wendif-labels 
> -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security 
> -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration 
> -Wold-style-definition -Wtype-limits -fstack-protector-all
> LDFLAGS   -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g 
> make              make
> install           install
> python            python -B
> smbd              /usr/sbin/smbd
> module support    no
> host CPU          x86_64
> host big endian   no
> target list       x86_64-softmmu aarch64-softmmu
> tcg debug enabled no
> gprof enabled     no
> sparse enabled    no
> strip binaries    yes
> profiler          no
> static build      no
> pixman            system
> SDL support

[Qemu-devel] [PATCH] ppc/xics: Add xics to the monitor "info pic" command

2016-10-17 Thread Cédric Le Goater

From: Benjamin Herrenschmidt 

Useful to debug interrupt problems.

Signed-off-by: Benjamin Herrenschmidt 
[clg: - updated for qemu-2.7
      - added a test on ->irqs as it is not necessarily allocated
        (PHB3_MSI)
      - removed the static variable g_xics and replaced it with a loop on
        all children to find the xics objects.
      - rebased on the InterruptStatsProvider interface ]
Signed-off-by: Cédric Le Goater 
---
 hw/intc/xics.c | 49 +
 1 file changed, 49 insertions(+)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index f40b3a45..7fac964fbd27 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -35,6 +35,8 @@
 #include "hw/ppc/xics.h"
 #include "qemu/error-report.h"
 #include "qapi/visitor.h"
+#include "monitor/monitor.h"
+#include "hw/intc/intc.h"
 
 int xics_get_cpu_index_by_dt_id(int cpu_dt_id)
 {
@@ -90,6 +92,47 @@ void xics_cpu_setup(XICSState *xics, PowerPCCPU *cpu)
 }
 }
 
+static void xics_common_pic_print_info(InterruptStatsProvider *obj,
+   Monitor *mon)
+{
+XICSState *xics = XICS_COMMON(obj);
+ICSState *ics;
+uint32_t i;
+
+for (i = 0; i < xics->nr_servers; i++) {
+ICPState *icp = &xics->ss[i];
+
+if (!icp->output) {
+continue;
+}
+monitor_printf(mon, "CPU %d XIRR=%08x (%p) PP=%02x MFRR=%02x\n",
+   i, icp->xirr, icp->xirr_owner,
+   icp->pending_priority, icp->mfrr);
+}
+
+QLIST_FOREACH(ics, &xics->ics, list) {
+monitor_printf(mon, "ICS %4x..%4x %p\n",
+   ics->offset, ics->offset + ics->nr_irqs - 1, ics);
+
+if (!ics->irqs) {
+continue;
+}
+
+for (i = 0; i < ics->nr_irqs; i++) {
+ICSIRQState *irq = ics->irqs + i;
+
+if (!(irq->flags & XICS_FLAGS_IRQ_MASK)) {
+continue;
+}
+monitor_printf(mon, "  %4x %s %02x %02x\n",
+   ics->offset + i,
+   (irq->flags & XICS_FLAGS_IRQ_LSI) ?
+   "LSI" : "MSI",
+   irq->priority, irq->status);
+}
+}
+}
+
 /*
  * XICS Common class - parent for emulated XICS and KVM-XICS
  */
@@ -190,8 +233,10 @@ static void xics_common_initfn(Object *obj)
 static void xics_common_class_init(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
+InterruptStatsProviderClass *ic = INTERRUPT_STATS_PROVIDER_CLASS(oc);
 
 dc->reset = xics_common_reset;
+ic->print_info = xics_common_pic_print_info;
 }
 
 static const TypeInfo xics_common_info = {
@@ -201,6 +246,10 @@ static const TypeInfo xics_common_info = {
 .class_size= sizeof(XICSStateClass),
 .instance_init = xics_common_initfn,
 .class_init= xics_common_class_init,
+.interfaces = (InterfaceInfo[]) {
+{ TYPE_INTERRUPT_STATS_PROVIDER },
+{ }
+},
 };
 
 /*
-- 
2.7.4

Re: [Qemu-devel] [RFC v5 0/8] KVM PCI/MSI passthrough with mach-virt

2016-10-17 Thread no-reply

Hi,

Your series failed automatic build test. Please find the testing commands and
their output below. If you have docker installed, you can probably reproduce it
locally.

Subject: [Qemu-devel] [RFC v5 0/8] KVM PCI/MSI passthrough with mach-virt
Type: series
Message-id: 1476733110-14293-1-git-send-email-eric.au...@redhat.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=16
make docker-test-quick@centos6
make docker-test-mingw@fedora
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] 
patchew/1476733110-14293-1-git-send-email-eric.au...@redhat.com -> 
patchew/1476733110-14293-1-git-send-email-eric.au...@redhat.com
Switched to a new branch 'test'
465a02b hw: vfio: common: Adapt vfio_listeners for reserved_iova region
25541d6 hw: vfio: common: vfio_prepare_msi_mapping
9b97aec hw: platform-bus: Add platform bus stub
94589db hw: platform-bus: Enable to map any memory region onto the platform-bus
425e1ac memory: memory_region_find_by_name
4a2c82c memory: Add reserved_iova region type
fdf9cd8 hw: vfio: common: vfio_get_iommu_type1_info
eb2e918 linux-headers: Partial update for MSI IOVA handling

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into 'dtc'...
Submodule path 'dtc': checked out '65cc4d2748a2c2e6f27f1cf39e07a5dbabd80ebf'
  BUILD   centos6
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-zydd_mdj/src'
  ARCHIVE qemu.tgz
  ARCHIVE dtc.tgz
  COPYRUNNER
RUN test-quick in qemu:centos6 
Packages installed:
SDL-devel-1.2.14-7.el6_7.1.x86_64
ccache-3.1.6-2.el6.x86_64
epel-release-6-8.noarch
gcc-4.4.7-17.el6.x86_64
git-1.7.1-4.el6_7.1.x86_64
glib2-devel-2.28.8-5.el6.x86_64
libfdt-devel-1.4.0-1.el6.x86_64
make-3.81-23.el6.x86_64
package g++ is not installed
pixman-devel-0.32.8-1.el6.x86_64
tar-1.23-15.el6_8.x86_64
zlib-devel-1.2.3-29.el6.x86_64

Environment variables:
PACKAGES=libfdt-devel ccache tar git make gcc g++ zlib-devel 
glib2-devel SDL-devel pixman-devel epel-release
HOSTNAME=a077d39b13de
TERM=xterm
MAKEFLAGS= -j16
HISTSIZE=1000
J=16
USER=root
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
MAIL=/var/spool/mail/root
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
LANG=en_US.UTF-8
TARGET_LIST=
HISTCONTROL=ignoredups
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
LOGNAME=root
LESSOPEN=||/usr/bin/lesspipe.sh %s
FEATURES= dtc
DEBUG=
G_BROKEN_FILENAMES=1
CCACHE_HASHDIR=
_=/usr/bin/env

Configure options:
--enable-werror --target-list=x86_64-softmmu,aarch64-softmmu 
--prefix=/var/tmp/qemu-build/install
No C++ compiler available; disabling C++ specific optional code
Install prefix    /var/tmp/qemu-build/install
BIOS directory    /var/tmp/qemu-build/install/share/qemu
binary directory  /var/tmp/qemu-build/install/bin
library directory /var/tmp/qemu-build/install/lib
module directory  /var/tmp/qemu-build/install/lib/qemu
libexec directory /var/tmp/qemu-build/install/libexec
include directory /var/tmp/qemu-build/install/include
config directory  /var/tmp/qemu-build/install/etc
local state directory   /var/tmp/qemu-build/install/var
Manual directory  /var/tmp/qemu-build/install/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path       /tmp/qemu-test/src
C compiler        cc
Host C compiler   cc
C++ compiler
Objective-C compiler cc
ARFLAGS           rv
CFLAGS            -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g
QEMU_CFLAGS       -I/usr/include/pixman-1 -pthread -I/usr/include/glib-2.0
-I/usr/lib64/glib-2.0/include   -fPIE -DPIE -m64 -D_GNU_SOURCE 
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing -fno-common -fwrapv  -Wendif-labels -Wmissing-include-dirs 
-Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self 
-Wignored-qualifiers -Wold-style-declaration -Wold-style-definition 
-Wtype-limits -fstack-protector-all
LDFLAGS           -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g
make              make
install           install
python            python -B
smbd              /usr/sbin/smbd
module support    no
host CPU          x86_64
host big endian   no
target list       x86_64-softmmu aarch64-softmmu
tcg debug enabled no
gprof enabled     no
sparse enabled    no
strip binaries    yes
profiler          no
static build      no
pixman            system
SDL support       yes (1.2.14)
GTK support       no
GTK GL support    no
VTE support       no
TLS priority      NORMAL
GNUTLS support    no
GNUTLS rnd        no
libgcrypt         no
libgcrypt kdf     no
nettle            no
nettle kdf        no
libtasn1          no
curses support    no
virgl support     no
curl support      no
mingw32 support   no
Audio drivers     oss
Block

[Qemu-devel] [PULL 19/19] vfio: fix duplicate function call

2016-10-17 Thread Alex Williamson

From: Cao jin 

When a vfio device is reset (on FLR or bus reset) and a bus reset is
needed (vfio_pci_hot_reset_one is called), vfio_pci_pre_reset and
vfio_pci_post_reset are each called twice.

Signed-off-by: Cao jin 
Signed-off-by: Alex Williamson 
---
 hw/vfio/pci.c |8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index fef436a..65d30fd 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1951,7 +1951,9 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
 
 trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
 
-vfio_pci_pre_reset(vdev);
+if (!single) {
+vfio_pci_pre_reset(vdev);
+}
 vdev->vbasedev.needs_reset = false;
 
 info = g_malloc0(sizeof(*info));
@@ -2109,7 +2111,9 @@ out:
 }
 }
 out_single:
-vfio_pci_post_reset(vdev);
+if (!single) {
+vfio_pci_post_reset(vdev);
+}
 g_free(info);
 
 return ret;
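
A minimal model of the double call fixed above, assuming (as the commit message implies) that the device reset path already runs vfio_pci_pre_reset()/vfio_pci_post_reset() around its call to vfio_pci_hot_reset_one(). FakeVFIODev and the fake_* functions are hypothetical stand-ins, not QEMU code; they only count how often each hook runs.

```c
#include <stdbool.h>

/* Hypothetical stand-in for VFIOPCIDevice: only counts hook invocations. */
typedef struct {
    int pre_resets;
    int post_resets;
} FakeVFIODev;

static void fake_pre_reset(FakeVFIODev *d)  { d->pre_resets++; }
static void fake_post_reset(FakeVFIODev *d) { d->post_resets++; }

/* Mirrors the patched vfio_pci_hot_reset(): in the single-device case the
 * caller has already run the pre/post hooks, so they are skipped here. */
static int fake_hot_reset(FakeVFIODev *d, bool single)
{
    if (!single) {
        fake_pre_reset(d);
    }
    /* ... the actual hot reset ioctl would be issued here ... */
    if (!single) {
        fake_post_reset(d);
    }
    return 0;
}

/* Assumed shape of the per-device reset path: it brackets the whole reset
 * with the hooks and may fall back to a single-device bus reset. */
static void fake_device_reset(FakeVFIODev *d, bool needs_bus_reset)
{
    fake_pre_reset(d);
    if (needs_bus_reset) {
        fake_hot_reset(d, true);   /* models vfio_pci_hot_reset_one() */
    }
    fake_post_reset(d);
}
```

Without the `if (!single)` guards, fake_device_reset(&d, true) would leave both counters at 2; with them, each hook runs exactly once, while the multi-device path still runs its own hooks.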

Re: [Qemu-devel] [kvm-unit-tests PATCH v3 10/10] arm/arm64: gic: don't just use zero

2016-10-17 Thread Andrew Jones

On Fri, Sep 02, 2016 at 11:43:33AM +0200, Auger Eric wrote:
> Hi Drew,
> 
> On 15/07/2016 15:00, Andrew Jones wrote:
> > Allow user to select who sends ipis and with which irq,
> > rather than just always sending irq=0 from cpu0.
> > 
> > Signed-off-by: Andrew Jones 
> > 
> > ---
> > v2: actually check that the irq received was the irq sent,
> > and (for gicv2) that the sender is the expected one.
> > ---
> >  arm/gic.c | 80 
> > ++-
> >  1 file changed, 64 insertions(+), 16 deletions(-)
> > 
> > diff --git a/arm/gic.c b/arm/gic.c
> > index fc7ef241de3e2..d3ab97d4ae470 100644
> > --- a/arm/gic.c
> > +++ b/arm/gic.c
> > @@ -11,6 +11,7 @@
> >   * This work is licensed under the terms of the GNU LGPL, version 2.
> >   */
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -33,6 +34,8 @@ static struct gic *gic;
> >  static int gic_version;
> >  static int acked[NR_CPUS], spurious[NR_CPUS];
> >  static cpumask_t ready;
> > +static int sender;
> > +static u32 irq;
> >  
> >  static void nr_cpu_check(int nr)
> >  {
> > @@ -85,7 +88,16 @@ static void check_acked(cpumask_t *mask)
> >  
> >  static u32 gicv2_read_iar(void)
> >  {
> > -   return readl(gicv2_cpu_base() + GIC_CPU_INTACK);
> > +   u32 iar = readl(gicv2_cpu_base() + GIC_CPU_INTACK);
> > +   int src = (iar >> 10) & 7;
> > +
> > +   if (src != sender) {
> > +   report("cpu%d received IPI from unexpected source cpu%d "
> > +  "(expected cpu%d)",
> > +  false, smp_processor_id(), src, sender);
> > +   }
> > +
> > +   return iar & 0x3ff;
> you can use GICC_IAR_INT_ID_MASK instead

OK

> >  }
> >  
> >  static void gicv2_write_eoi(u32 irq)
> > @@ -99,9 +111,15 @@ static void ipi_handler(struct pt_regs *regs __unused)
> >  
> > if (iar != GICC_INT_SPURIOUS) {
> > gic->write_eoi(iar);
> > -   smp_rmb(); /* pairs with wmb in ipi_test functions */
> > -   ++acked[smp_processor_id()];
> > -   smp_wmb(); /* pairs with rmb in check_acked */
> > +   if (iar == irq) {
> > +   smp_rmb(); /* pairs with wmb in ipi_test functions */
> > +   ++acked[smp_processor_id()];
> > +   smp_wmb(); /* pairs with rmb in check_acked */
> > +   } else {
> > +   report("cpu%d received unexpected irq %u "
> > +  "(expected %u)",
> > +  false, smp_processor_id(), iar, irq);
> > +   }
> > } else {
> > ++spurious[smp_processor_id()];
> > smp_wmb();
> > @@ -110,19 +128,19 @@ static void ipi_handler(struct pt_regs *regs __unused)
> >  
> >  static void gicv2_ipi_send_self(void)
> >  {
> > -   writel(2 << 24, gicv2_dist_base() + GIC_DIST_SOFTINT);
> > +   writel(2 << 24 | irq, gicv2_dist_base() + GIC_DIST_SOFTINT);
> >  }
> >  
> >  static void gicv2_ipi_send_tlist(cpumask_t *mask)
> >  {
> > u8 tlist = (u8)cpumask_bits(mask)[0];
> >  
> > -   writel(tlist << 16, gicv2_dist_base() + GIC_DIST_SOFTINT);
> > +   writel(tlist << 16 | irq, gicv2_dist_base() + GIC_DIST_SOFTINT);
> >  }
> >  
> >  static void gicv2_ipi_send_broadcast(void)
> >  {
> > -   writel(1 << 24, gicv2_dist_base() + GIC_DIST_SOFTINT);
> > +   writel(1 << 24 | irq, gicv2_dist_base() + GIC_DIST_SOFTINT);
> >  }
> >  
> >  #define ICC_SGI1R_AFFINITY_1_SHIFT 16
> > @@ -165,7 +183,7 @@ static void gicv3_ipi_send_tlist(cpumask_t *mask)
> >  
> > sgi1r = (MPIDR_TO_SGI_AFFINITY(cluster_id, 3)   |
> >  MPIDR_TO_SGI_AFFINITY(cluster_id, 2)   |
> > -/* irq << 24   | */
> > +irq << 24  |
> >  MPIDR_TO_SGI_AFFINITY(cluster_id, 1)   |
> >  tlist);
> >  
> > @@ -187,7 +205,7 @@ static void gicv3_ipi_send_self(void)
> >  
> >  static void gicv3_ipi_send_broadcast(void)
> >  {
> > -   gicv3_write_sgi1r(1ULL << 40);
> > +   gicv3_write_sgi1r(1ULL << 40 | irq << 24);
> > isb();
> >  }
> >  
> > @@ -199,7 +217,7 @@ static void ipi_test_self(void)
> > memset(acked, 0, sizeof(acked));
> > smp_wmb();
> > cpumask_clear(&mask);
> > -   cpumask_set_cpu(0, &mask);
> > +   cpumask_set_cpu(smp_processor_id(), &mask);
> > gic->ipi.send_self();
> > check_acked(&mask);
> > report_prefix_pop();
> > @@ -214,7 +232,7 @@ static void ipi_test_smp(void)
> > memset(acked, 0, sizeof(acked));
> > smp_wmb();
> > cpumask_copy(&mask, &cpu_present_mask);
> > -   for (i = 0; i < nr_cpus; i += 2)
> > +   for (i = smp_processor_id() & 1; i < nr_cpus; i += 2)
> > cpumask_clear_cpu(i, &mask);
> > gic->ipi.send_tlist(&mask);
> > check_acked(&mask);
> > @@ -224,7 +242,7 @@ static void ipi_test_smp(void)
> > memset(acked, 0, sizeof(acked));
> > smp_wmb();
> > cpumask_copy(&mask, &cpu_present_mask);
> > -   cpumask_clear_cpu(0, &mask);
> > +
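
The bit layouts the patch relies on can be summarized in a small sketch. The field positions follow the GICv2 (GICC_IAR, GICD_SGIR) and GICv3 (ICC_SGI1R_EL1) register definitions as the patch uses them; the helper names are hypothetical and not part of kvm-unit-tests. The 0x3ff mask is presumably the same value as the GICC_IAR_INT_ID_MASK suggested in the review.

```c
#include <stdint.h>

/* GICv2 GICC_IAR: bits [9:0] = interrupt ID, bits [12:10] = source CPU
 * (only meaningful for SGIs). */
#define IAR_INT_ID_MASK 0x3ffu

static uint32_t iar_irq(uint32_t iar) { return iar & IAR_INT_ID_MASK; }
static int      iar_src(uint32_t iar) { return (iar >> 10) & 7; }

/* GICv2 GICD_SGIR: [25:24] TargetListFilter, [23:16] CPUTargetList,
 * [3:0] SGIINTID. Filter 0 = use the target list, 1 = all CPUs but the
 * sender, 2 = only the sender. */
static uint32_t sgir_tlist(uint8_t tlist, uint32_t irq)
{
    return (uint32_t)tlist << 16 | irq;
}
static uint32_t sgir_self(uint32_t irq)      { return 2u << 24 | irq; }
static uint32_t sgir_broadcast(uint32_t irq) { return 1u << 24 | irq; }

/* GICv3 ICC_SGI1R_EL1: [15:0] TargetList, [23:16] Aff1, [27:24] INTID,
 * [39:32] Aff2, bit 40 IRM (1 = all PEs except self), [55:48] Aff3. */
static uint64_t sgi1r_broadcast(uint32_t irq)
{
    return 1ULL << 40 | (uint64_t)irq << 24;
}
```

For example, an IAR value of 0xc05 decodes to SGI 5 sent by cpu3, which is the pair of checks the patched gicv2_read_iar() and ipi_handler() perform.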

[Qemu-devel] [PULL 18/19] vfio/pci: Fix vfio_rtl8168_quirk_data_read address offset

2016-10-17 Thread Alex Williamson

From: Thorsten Kohfeldt 

The introductory comment for the rtl8168 VFIO MSI-X quirk states:
At BAR2 offset 0x70 there is a dword data register,
 offset 0x74 is a dword address register.
vfio: vfio_bar_read(0000:05:00.0:BAR2+0x70, 4) = 0xfee00398 // read data

Thus, the correct offset for the data read is 0x70,
but vfio_rtl8168_quirk_data_read() wrongly uses offset 0x74.

Signed-off-by: Thorsten Kohfeldt 
Signed-off-by: Alex Williamson 
---
 hw/vfio/pci-quirks.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index 2cbda08..811eecd 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -898,7 +898,7 @@ static uint64_t vfio_rtl8168_quirk_data_read(void *opaque,
 {
 VFIOrtl8168Quirk *rtl = opaque;
 VFIOPCIDevice *vdev = rtl->vdev;
-uint64_t data = vfio_region_read(&vdev->bars[2].region, addr + 0x74, size);
+uint64_t data = vfio_region_read(&vdev->bars[2].region, addr + 0x70, size);
 
 if (rtl->enabled && (vdev->pdev.cap_present & QEMU_PCI_CAP_MSIX)) {
 hwaddr offset = rtl->addr & 0xfff;
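
A sketch of why the offset matters, assuming a fake BAR2 with the dword data register at 0x70 and the dword address register at 0x74, as the quirk's comment describes. bar2, bar2_read32() and quirk_data_read() are hypothetical; the point is only that reads through the data window have to be based at 0x70.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the RTL8168 MSI-X quirk window in BAR2. */
#define RTL8168_DATA_OFF 0x70
#define RTL8168_ADDR_OFF 0x74

static uint32_t bar2[0x80 / 4];   /* fake BAR2 backing store */

static uint32_t bar2_read32(uint32_t off)
{
    uint32_t v;
    memcpy(&v, (const uint8_t *)bar2 + off, sizeof(v));
    return v;
}

/* 'addr' is the offset inside the quirk's data-register window (0 for a
 * dword access), so the BAR offset is addr + RTL8168_DATA_OFF. */
static uint32_t quirk_data_read(uint32_t addr)
{
    return bar2_read32(addr + RTL8168_DATA_OFF);
}
```

With the old 0x74 base, quirk_data_read(0) would have returned the address register instead of the data value.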

[Qemu-devel] [PULL 12/19] vfio/platform: fix a wrong returned value in vfio_populate_device

2016-10-17 Thread Alex Williamson

From: Eric Auger 

If vfio_init_intp() fails, vfio_populate_device() currently does not
return an error value. This patch fixes the bug. The returned value is
not a specific error code, but in practice the error object is what
reports the error to the end user, and the actual returned value is
not used.

Signed-off-by: Eric Auger 
Reviewed-by: Markus Armbruster 
Signed-off-by: Alex Williamson 
---
 hw/vfio/platform.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index 1a35da0..484e31f 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -508,6 +508,7 @@ static int vfio_populate_device(VFIODevice *vbasedev, Error **errp)
 irq.flags);
 intp = vfio_init_intp(vbasedev, irq, errp);
 if (!intp) {
+ret = -1;
 goto irq_err;
 }
 }
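
The bug pattern is easy to reproduce in isolation. ret starts at -1 in vfio_populate_device(), but it is overwritten by earlier successful calls (the IRQ info ioctl() has just returned 0 in the branch where vfio_init_intp() is called), so the goto to irq_err would return 0. fake_init_intp() and fake_populate() below are hypothetical stand-ins.

```c
#include <stddef.h>

/* Hypothetical stand-in for vfio_init_intp(): returns NULL on failure. */
static int *fake_init_intp(int fail)
{
    static int slot;
    return fail ? NULL : &slot;
}

/* Sketch of the vfio_populate_device() IRQ loop and its error path. */
static int fake_populate(int nr_irqs, int failing_irq)
{
    int ret = 0;            /* as left behind by a successful ioctl() */

    for (int i = 0; i < nr_irqs; i++) {
        if (!fake_init_intp(i == failing_irq)) {
            ret = -1;       /* the assignment added by the patch */
            goto irq_err;
        }
    }
    return 0;

irq_err:
    /* ... free the interrupts initialized so far ... */
    return ret;
}
```

Removing the `ret = -1;` line makes fake_populate(3, 1) return 0 even though initialization failed.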

[Qemu-devel] [PULL 13/19] vfio/platform: Pass an error object to vfio_base_device_init

2016-10-17 Thread Alex Williamson

From: Eric Auger 

This patch propagates errors encountered during vfio_base_device_init
up to the realize function.

If the host value is not set or is badly formed, we now report an
error.

Signed-off-by: Eric Auger 
Reviewed-by: Markus Armbruster 
Signed-off-by: Alex Williamson 
---
 hw/vfio/platform.c |   50 +++---
 1 file changed, 27 insertions(+), 23 deletions(-)

diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index 484e31f..a4663c9 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -541,13 +541,14 @@ static VFIODeviceOps vfio_platform_ops = {
 /**
  * vfio_base_device_init - perform preliminary VFIO setup
  * @vbasedev: the VFIO device handle
+ * @errp: error object
  *
  * Implement the VFIO command sequence that allows to discover
  * assigned device resources: group extraction, device
  * fd retrieval, resource query.
  * Precondition: the device name must be initialized
  */
-static int vfio_base_device_init(VFIODevice *vbasedev)
+static int vfio_base_device_init(VFIODevice *vbasedev, Error **errp)
 {
 VFIOGroup *group;
 VFIODevice *vbasedev_iter;
@@ -555,7 +556,6 @@ static int vfio_base_device_init(VFIODevice *vbasedev)
 ssize_t len;
 struct stat st;
 int groupid;
-Error *err = NULL;
 int ret;
 
 /* @sysfsdev takes precedence over @host */
@@ -564,6 +564,7 @@ static int vfio_base_device_init(VFIODevice *vbasedev)
 vbasedev->name = g_strdup(basename(vbasedev->sysfsdev));
 } else {
 if (!vbasedev->name || strchr(vbasedev->name, '/')) {
+error_setg(errp, "wrong host device name");
 return -EINVAL;
 }
 
@@ -572,8 +573,8 @@ static int vfio_base_device_init(VFIODevice *vbasedev)
 }
 
 if (stat(vbasedev->sysfsdev, &st) < 0) {
-error_report("vfio: error: no such host device: %s",
- vbasedev->sysfsdev);
+error_setg_errno(errp, errno,
+ "failed to get the sysfs host device file status");
 return -errno;
 }
 
@@ -582,49 +583,44 @@ static int vfio_base_device_init(VFIODevice *vbasedev)
 g_free(tmp);
 
 if (len < 0 || len >= sizeof(group_path)) {
-error_report("vfio: error no iommu_group for device");
-return len < 0 ? -errno : -ENAMETOOLONG;
+ret = len < 0 ? -errno : -ENAMETOOLONG;
+error_setg_errno(errp, -ret, "no iommu_group found");
+return ret;
 }
 
 group_path[len] = 0;
 
 group_name = basename(group_path);
 if (sscanf(group_name, "%d", &groupid) != 1) {
-error_report("vfio: error reading %s: %m", group_path);
+error_setg_errno(errp, errno, "failed to read %s", group_path);
 return -errno;
 }
 
 trace_vfio_platform_base_device_init(vbasedev->name, groupid);
 
-group = vfio_get_group(groupid, &address_space_memory, &err);
+group = vfio_get_group(groupid, &address_space_memory, errp);
 if (!group) {
-ret = -ENOENT;
-goto error;
+return -ENOENT;
 }
 
 QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
 if (strcmp(vbasedev_iter->name, vbasedev->name) == 0) {
-error_report("vfio: error: device %s is already attached",
- vbasedev->name);
+error_setg(errp, "device is already attached");
 vfio_put_group(group);
 return -EBUSY;
 }
 }
-ret = vfio_get_device(group, vbasedev->name, vbasedev, &err);
+ret = vfio_get_device(group, vbasedev->name, vbasedev, errp);
 if (ret) {
 vfio_put_group(group);
-goto error;
+return ret;
 }
 
-ret = vfio_populate_device(vbasedev, &err);
+ret = vfio_populate_device(vbasedev, errp);
 if (ret) {
 vfio_put_group(group);
 }
 
-error:
-if (err) {
-error_reportf_err(err, ERR_PREFIX, vbasedev->name);
-}
 return ret;
 }
 
@@ -650,11 +646,9 @@ static void vfio_platform_realize(DeviceState *dev, Error **errp)
 vbasedev->sysfsdev : vbasedev->name,
 vdev->compat);
 
-ret = vfio_base_device_init(vbasedev);
+ret = vfio_base_device_init(vbasedev, errp);
 if (ret) {
-error_setg(errp, "vfio: vfio_base_device_init failed for %s",
-   vbasedev->name);
-return;
+goto out;
 }
 
 for (i = 0; i < vbasedev->num_regions; i++) {
@@ -664,6 +658,16 @@ static void vfio_platform_realize(DeviceState *dev, Error **errp)
 }
 sysbus_init_mmio(sbdev, vdev->regions[i]->mem);
 }
+out:
+if (!ret) {
+return;
+}
+
+if (vdev->vbasedev.name) {
+error_prepend(errp, ERR_PREFIX, vdev->vbasedev.name);
+} else {
+error_prepend(errp, "vfio error: ");
+}
 }
 
 static const VMStateDescription vfio_platform_vmstate = {
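
The resulting convention (low-level helpers fill in only the description, and realize prepends the device context once at its out: label) can be sketched with a tiny stand-in for QEMU's Error object. MiniError, mini_error_setg() and mini_error_prepend() are hypothetical simplifications of error_setg() and error_prepend(), and the "vfio error: %s: " prefix is only illustrative of ERR_PREFIX.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for QEMU's Error object. */
typedef struct {
    char msg[256];
} MiniError;

static void mini_error_setg(MiniError *err, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(err->msg, sizeof(err->msg), fmt, ap);
    va_end(ap);
}

/* Like error_prepend(): put a formatted prefix in front of the message. */
static void mini_error_prepend(MiniError *err, const char *fmt, ...)
{
    char prefix[128], old[256];
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(prefix, sizeof(prefix), fmt, ap);
    va_end(ap);
    memcpy(old, err->msg, sizeof(old));
    snprintf(err->msg, sizeof(err->msg), "%s%s", prefix, old);
}

/* Low-level helper: only describes what went wrong. */
static int fake_base_device_init(const char *name, MiniError *err)
{
    if (!name || strchr(name, '/')) {
        mini_error_setg(err, "wrong host device name");
        return -EINVAL;
    }
    return 0;
}

/* Top level: adds the device context exactly once, as at "out:". */
static int fake_realize(const char *name, MiniError *err)
{
    int ret = fake_base_device_init(name, err);

    if (ret) {
        if (name) {
            mini_error_prepend(err, "vfio error: %s: ", name);
        } else {
            mini_error_prepend(err, "vfio error: ");
        }
    }
    return ret;
}
```

Keeping the device name out of the helpers' messages is what lets realize prefix every error uniformly without duplicating the name.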

[Qemu-devel] [PULL 17/19] vfio/pci: Handle host oversight

2016-10-17 Thread Alex Williamson

From: Eric Auger 

If the end user starts qemu with the -vfio-pci option without passing
either a sysfsdev or a host property value, the device is interpreted
as 0000:00:00.0. Let's emit a specific error message to guide the end user.

Signed-off-by: Eric Auger 
Reviewed-by: Markus Armbruster 
Signed-off-by: Alex Williamson 
---
 hw/vfio/pci.c |   11 +++
 1 file changed, 11 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 6d01324..fef436a 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2520,6 +2520,13 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 int i, ret;
 
 if (!vdev->vbasedev.sysfsdev) {
+if (!(~vdev->host.domain || ~vdev->host.bus ||
+  ~vdev->host.slot || ~vdev->host.function)) {
+error_setg(errp, "No provided host device");
+error_append_hint(errp, "Use -vfio-pci,host=DDDD:BB:DD.F "
+  "or -vfio-pci,sysfsdev=PATH_TO_DEVICE\n");
+return;
+}
 vdev->vbasedev.sysfsdev =
 g_strdup_printf("/sys/bus/pci/devices/%04x:%02x:%02x.%01x",
 vdev->host.domain, vdev->host.bus,
@@ -2828,6 +2835,10 @@ static void vfio_instance_init(Object *obj)
 device_add_bootindex_property(obj, &vdev->bootindex,
   "bootindex", NULL,
   &pci_dev->qdev, NULL);
+vdev->host.domain = ~0U;
+vdev->host.bus = ~0U;
+vdev->host.slot = ~0U;
+vdev->host.function = ~0U;
 }
 
 static Property vfio_pci_dev_properties[] = {
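
The sentinel trick can be checked in isolation: each host field starts at ~0U, and ~x is zero only when x is still ~0U, so the patch's condition is true only when none of the four fields was set. HostAddr and the helpers below are hypothetical mirrors of the vfio-pci host property, not QEMU code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the PCI host address property fields. */
typedef struct {
    uint32_t domain, bus, slot, function;
} HostAddr;

/* Mirrors vfio_instance_init(): ~0U marks "property never set". */
static void host_addr_init(HostAddr *h)
{
    h->domain = h->bus = h->slot = h->function = ~0U;
}

/* Same expression as in vfio_realize(): true only if all fields are ~0U. */
static bool host_addr_unset(const HostAddr *h)
{
    return !(~h->domain || ~h->bus || ~h->slot || ~h->function);
}
```

With this scheme an explicit host=0000:00:00.0 sets every field to 0 and is no longer mistaken for a missing property.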

[Qemu-devel] [PULL 11/19] vfio/platform: Pass an error object to vfio_populate_device

2016-10-17 Thread Alex Williamson

From: Eric Auger 

Propagate the vfio_populate_device errors up to vfio_base_device_init.
The error object is also passed to vfio_init_intp. At the moment we
only report the error. Subsequent patches will propagate the error
up to the realize function.

Signed-off-by: Eric Auger 
Reviewed-by: Markus Armbruster 
Signed-off-by: Alex Williamson 
---
 hw/vfio/platform.c |   25 +
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index 9014ea7..1a35da0 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -44,9 +44,10 @@ static inline bool vfio_irq_is_automasked(VFIOINTp *intp)
  * and add it into the list of IRQs
  * @vbasedev: the VFIO device handle
  * @info: irq info struct retrieved from VFIO driver
+ * @errp: error object
  */
 static VFIOINTp *vfio_init_intp(VFIODevice *vbasedev,
-struct vfio_irq_info info)
+struct vfio_irq_info info, Error **errp)
 {
 int ret;
 VFIOPlatformDevice *vdev =
@@ -69,7 +70,8 @@ static VFIOINTp *vfio_init_intp(VFIODevice *vbasedev,
 if (ret) {
 g_free(intp->interrupt);
 g_free(intp);
-error_report("vfio: Error: trigger event_notifier_init failed ");
+error_setg_errno(errp, -ret,
+ "failed to initialize trigger eventd notifier");
 return NULL;
 }
 if (vfio_irq_is_automasked(intp)) {
@@ -80,7 +82,8 @@ static VFIOINTp *vfio_init_intp(VFIODevice *vbasedev,
 g_free(intp->interrupt);
 g_free(intp->unmask);
 g_free(intp);
-error_report("vfio: Error: resamplefd event_notifier_init failed");
+error_setg_errno(errp, -ret,
+ "failed to initialize resample eventd notifier");
 return NULL;
 }
 }
@@ -456,9 +459,10 @@ static int vfio_platform_hot_reset_multi(VFIODevice *vbasedev)
  * vfio_populate_device - Allocate and populate MMIO region
  * and IRQ structs according to driver returned information
  * @vbasedev: the VFIO device handle
+ * @errp: error object
  *
  */
-static int vfio_populate_device(VFIODevice *vbasedev)
+static int vfio_populate_device(VFIODevice *vbasedev, Error **errp)
 {
 VFIOINTp *intp, *tmp;
 int i, ret = -1;
@@ -466,7 +470,7 @@ static int vfio_populate_device(VFIODevice *vbasedev)
 container_of(vbasedev, VFIOPlatformDevice, vbasedev);
 
 if (!(vbasedev->flags & VFIO_DEVICE_FLAGS_PLATFORM)) {
-error_report("vfio: Um, this isn't a platform device");
+error_setg(errp, "this isn't a platform device");
 return ret;
 }
 
@@ -480,7 +484,7 @@ static int vfio_populate_device(VFIODevice *vbasedev)
 vdev->regions[i], i, name);
 g_free(name);
 if (ret) {
-error_report("vfio: Error getting region %d info: %m", i);
+error_setg_errno(errp, -ret, "failed to get region %d info", i);
 goto reg_error;
 }
 }
@@ -496,16 +500,14 @@ static int vfio_populate_device(VFIODevice *vbasedev)
 irq.index = i;
 ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq);
 if (ret) {
-error_report("vfio: error getting device %s irq info",
- vbasedev->name);
+error_setg_errno(errp, -ret, "failed to get device irq info");
 goto irq_err;
 } else {
 trace_vfio_platform_populate_interrupts(irq.index,
 irq.count,
 irq.flags);
-intp = vfio_init_intp(vbasedev, irq);
+intp = vfio_init_intp(vbasedev, irq, errp);
 if (!intp) {
-error_report("vfio: Error installing IRQ %d up", i);
 goto irq_err;
 }
 }
@@ -613,9 +615,8 @@ static int vfio_base_device_init(VFIODevice *vbasedev)
 goto error;
 }
 
-ret = vfio_populate_device(vbasedev);
+ret = vfio_populate_device(vbasedev, &err);
 if (ret) {
-error_report("vfio: failed to populate device %s", vbasedev->name);
 vfio_put_group(group);
 }

[Qemu-devel] [PULL 14/19] vfio/pci: Conversion to realize

2016-10-17 Thread Alex Williamson

From: Eric Auger 

This patch converts VFIO PCI to a realize function.

The errors from the original initfn are now propagated using QEMU
Error objects. All errors are formatted with the same pattern:
"vfio: %s: the error description"

Signed-off-by: Eric Auger 
Reviewed-by: Markus Armbruster 
Signed-off-by: Alex Williamson 
---
 hw/vfio/pci.c|   64 --
 hw/vfio/trace-events |2 +-
 2 files changed, 27 insertions(+), 39 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 0ba0711..d9652c2 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2513,13 +2513,12 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
 vdev->req_enabled = false;
 }
 
-static int vfio_initfn(PCIDevice *pdev)
+static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
 VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
 VFIODevice *vbasedev_iter;
 VFIOGroup *group;
 char *tmp, group_path[PATH_MAX], *group_name;
-Error *err = NULL;
 ssize_t len;
 struct stat st;
 int groupid;
@@ -2533,9 +2532,9 @@ static int vfio_initfn(PCIDevice *pdev)
 }
 
 if (stat(vdev->vbasedev.sysfsdev, &st) < 0) {
-error_setg_errno(&err, errno, "no such host device");
-ret = -errno;
-goto error;
+error_setg_errno(errp, errno, "no such host device");
+error_prepend(errp, ERR_PREFIX, vdev->vbasedev.sysfsdev);
+return;
 }
 
 vdev->vbasedev.name = g_strdup(basename(vdev->vbasedev.sysfsdev));
@@ -2547,8 +2546,8 @@ static int vfio_initfn(PCIDevice *pdev)
 g_free(tmp);
 
 if (len <= 0 || len >= sizeof(group_path)) {
-ret = len < 0 ? -errno : -ENAMETOOLONG;
-error_setg_errno(&err, -ret, "no iommu_group found");
+error_setg_errno(errp, len < 0 ? errno : ENAMETOOLONG,
+ "no iommu_group found");
 goto error;
 }
 
@@ -2556,35 +2555,32 @@ static int vfio_initfn(PCIDevice *pdev)
 
 group_name = basename(group_path);
 if (sscanf(group_name, "%d", &groupid) != 1) {
-error_setg_errno(&err, errno, "failed to read %s", group_path);
-ret = -errno;
+error_setg_errno(errp, errno, "failed to read %s", group_path);
 goto error;
 }
 
-trace_vfio_initfn(vdev->vbasedev.name, groupid);
+trace_vfio_realize(vdev->vbasedev.name, groupid);
 
-group = vfio_get_group(groupid, pci_device_iommu_address_space(pdev), &err);
+group = vfio_get_group(groupid, pci_device_iommu_address_space(pdev), errp);
 if (!group) {
-ret = -ENOENT;
 goto error;
 }
 
 QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
 if (strcmp(vbasedev_iter->name, vdev->vbasedev.name) == 0) {
-error_setg(&err, "device is already attached");
+error_setg(errp, "device is already attached");
 vfio_put_group(group);
-ret = -EBUSY;
 goto error;
 }
 }
 
-ret = vfio_get_device(group, vdev->vbasedev.name, &vdev->vbasedev, &err);
+ret = vfio_get_device(group, vdev->vbasedev.name, &vdev->vbasedev, errp);
 if (ret) {
 vfio_put_group(group);
 goto error;
 }
 
-ret = vfio_populate_device(vdev, &err);
+ret = vfio_populate_device(vdev, errp);
 if (ret) {
 goto error;
 }
@@ -2595,7 +2591,7 @@ static int vfio_initfn(PCIDevice *pdev)
 vdev->config_offset);
 if (ret < (int)MIN(pci_config_size(&vdev->pdev), vdev->config_size)) {
 ret = ret < 0 ? -errno : -EFAULT;
-error_setg_errno(&err, -ret, "failed to read device config space");
+error_setg_errno(errp, -ret, "failed to read device config space");
 goto error;
 }
 
@@ -2612,8 +2608,7 @@ static int vfio_initfn(PCIDevice *pdev)
  */
 if (vdev->vendor_id != PCI_ANY_ID) {
 if (vdev->vendor_id >= 0xffff) {
-error_setg(&err, "invalid PCI vendor ID provided");
-ret = -EINVAL;
+error_setg(errp, "invalid PCI vendor ID provided");
 goto error;
 }
 vfio_add_emulated_word(vdev, PCI_VENDOR_ID, vdev->vendor_id, ~0);
@@ -2624,8 +2619,7 @@ static int vfio_initfn(PCIDevice *pdev)
 
 if (vdev->device_id != PCI_ANY_ID) {
 if (vdev->device_id > 0xffff) {
-error_setg(&err, "invalid PCI device ID provided");
-ret = -EINVAL;
+error_setg(errp, "invalid PCI device ID provided");
 goto error;
 }
 vfio_add_emulated_word(vdev, PCI_DEVICE_ID, vdev->device_id, ~0);
@@ -2636,8 +2630,7 @@ static int vfio_initfn(PCIDevice *pdev)
 
 if (vdev->sub_vendor_id != PCI_ANY_ID) {
 if (vdev->sub_vendor_id > 0xffff) {
-error_setg(&err, "invalid PCI subsystem vendor ID provided");
-ret = -EINVAL;
+error_setg(errp, "invalid PCI subsystem vendor ID provided");
 goto error;
 }
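
The ID overrides validated above can be modelled as follows, assuming the upper bound in these checks is 0xffff (the 16-bit width of PCI IDs) and that an all-ones value, like PCI_ANY_ID, means "no override". The helper names are hypothetical. Vendor ID 0xffff is presumably rejected with >= because it is the value a config read returns when no device is present.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: like PCI_ANY_ID, all ones means "keep the device's value". */
#define ANY_ID (~0U)

/* Vendor IDs: anything wider than 16 bits, and 0xffff itself, is invalid. */
static bool vendor_id_ok(uint32_t id)
{
    return id == ANY_ID || id < 0xffff;
}

/* Device and subsystem IDs only need to fit in 16 bits. */
static bool device_id_ok(uint32_t id)
{
    return id == ANY_ID || id <= 0xffff;
}
```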

[Qemu-devel] [PULL 10/19] vfio: Pass an error object to vfio_get_device

2016-10-17 Thread Alex Williamson

From: Eric Auger 

Pass an error object to prepare for migration to VFIO-PCI realize.

In the vfio platform code, vfio_base_device_init() currently just
reports the error. Subsequent patches will propagate the error up to
the realize function.

Signed-off-by: Eric Auger 
Reviewed-by: Markus Armbruster 
Signed-off-by: Alex Williamson 
---
 hw/vfio/common.c  |   13 +++--
 hw/vfio/pci.c |3 +--
 hw/vfio/platform.c|5 ++---
 include/hw/vfio/vfio-common.h |2 +-
 4 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 90b1ebb..9505fb3 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1211,23 +1211,24 @@ void vfio_put_group(VFIOGroup *group)
 }
 
 int vfio_get_device(VFIOGroup *group, const char *name,
-   VFIODevice *vbasedev)
+VFIODevice *vbasedev, Error **errp)
 {
 struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
 int ret, fd;
 
 fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
 if (fd < 0) {
-error_report("vfio: error getting device %s from group %d: %m",
- name, group->groupid);
-error_printf("Verify all devices in group %d are bound to vfio- "
- "or pci-stub and not already in use\n", group->groupid);
+error_setg_errno(errp, errno, "error getting device from group %d",
+ group->groupid);
+error_append_hint(errp,
+  "Verify all devices in group %d are bound to vfio- "
+  "or pci-stub and not already in use\n", group->groupid);
 return fd;
 }
 
 ret = ioctl(fd, VFIO_DEVICE_GET_INFO, &dev_info);
 if (ret) {
-error_report("vfio: error getting device info: %m");
+error_setg_errno(errp, errno, "error getting device info");
 close(fd);
 return ret;
 }
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index fdb0616..0ba0711 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2578,9 +2578,8 @@ static int vfio_initfn(PCIDevice *pdev)
 }
 }
 
-ret = vfio_get_device(group, vdev->vbasedev.name, &vdev->vbasedev);
+ret = vfio_get_device(group, vdev->vbasedev.name, &vdev->vbasedev, &err);
 if (ret) {
-error_setg_errno(&err, -ret, "failed to get device");
 vfio_put_group(group);
 goto error;
 }
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index 7bf525b..9014ea7 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -607,11 +607,10 @@ static int vfio_base_device_init(VFIODevice *vbasedev)
 return -EBUSY;
 }
 }
-ret = vfio_get_device(group, vbasedev->name, vbasedev);
+ret = vfio_get_device(group, vbasedev->name, vbasedev, &err);
 if (ret) {
-error_report("vfio: failed to get device %s", vbasedev->name);
 vfio_put_group(group);
-return ret;
+goto error;
 }
 
 ret = vfio_populate_device(vbasedev);
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 286fa31..c582de1 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -158,7 +158,7 @@ void vfio_reset_handler(void *opaque);
 VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp);
 void vfio_put_group(VFIOGroup *group);
 int vfio_get_device(VFIOGroup *group, const char *name,
-VFIODevice *vbasedev);
+VFIODevice *vbasedev, Error **errp);
 
 extern const MemoryRegionOps vfio_region_ops;
 extern QLIST_HEAD(vfio_group_head, VFIOGroup) vfio_group_list;

[Qemu-devel] [PULL 09/19] vfio: Pass an error object to vfio_get_group

2016-10-17 Thread Alex Williamson

From: Eric Auger 

Pass an error object to prepare for migration to VFIO-PCI realize.

For the time being, let's simply report the error in vfio platform's
vfio_base_device_init(). A subsequent patch will propagate the error
up to vfio_platform_realize.

Signed-off-by: Eric Auger 
Reviewed-by: Markus Armbruster 
Signed-off-by: Alex Williamson 
---
 hw/vfio/common.c  |   24 
 hw/vfio/pci.c |3 +--
 hw/vfio/platform.c|   11 ---
 include/hw/vfio/vfio-common.h |2 +-
 4 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 85a7759..90b1ebb 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1123,12 +1123,11 @@ static void vfio_disconnect_container(VFIOGroup *group)
 }
 }
 
-VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
+VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
 {
 VFIOGroup *group;
 char path[32];
 struct vfio_group_status status = { .argsz = sizeof(status) };
-Error *err = NULL;
 
 QLIST_FOREACH(group, &vfio_group_list, next) {
 if (group->groupid == groupid) {
@@ -1136,8 +1135,8 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
 if (group->container->space->as == as) {
 return group;
 } else {
-error_report("vfio: group %d used in multiple address spaces",
- group->groupid);
+error_setg(errp, "group %d used in multiple address spaces",
+   group->groupid);
 return NULL;
 }
 }
@@ -1148,28 +1147,29 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
 snprintf(path, sizeof(path), "/dev/vfio/%d", groupid);
 group->fd = qemu_open(path, O_RDWR);
 if (group->fd < 0) {
-error_report("vfio: error opening %s: %m", path);
+error_setg_errno(errp, errno, "failed to open %s", path);
 goto free_group_exit;
 }
 
 if (ioctl(group->fd, VFIO_GROUP_GET_STATUS, &status)) {
-error_report("vfio: error getting group status: %m");
+error_setg_errno(errp, errno, "failed to get group %d status", groupid);
 goto close_fd_exit;
 }
 
 if (!(status.flags & VFIO_GROUP_FLAGS_VIABLE)) {
-error_report("vfio: error, group %d is not viable, please ensure "
- "all devices within the iommu_group are bound to their "
- "vfio bus driver.", groupid);
+error_setg(errp, "group %d is not viable", groupid);
+error_append_hint(errp,
+  "Please ensure all devices within the iommu_group "
+  "are bound to their vfio bus driver.\n");
 goto close_fd_exit;
 }
 
 group->groupid = groupid;
 QLIST_INIT(&group->device_list);
 
-if (vfio_connect_container(group, as, &err)) {
-error_reportf_err(err, "vfio: failed to setup container for group %d",
-  groupid);
+if (vfio_connect_container(group, as, errp)) {
+error_prepend(errp, "failed to setup container for group %d: ",
+  groupid);
 goto close_fd_exit;
 }
 
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 3d84126..fdb0616 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2563,9 +2563,8 @@ static int vfio_initfn(PCIDevice *pdev)
 
 trace_vfio_initfn(vdev->vbasedev.name, groupid);
 
-group = vfio_get_group(groupid, pci_device_iommu_address_space(pdev));
+group = vfio_get_group(groupid, pci_device_iommu_address_space(pdev), &err);
 if (!group) {
-error_setg(&err, "failed to get group %d", groupid);
 ret = -ENOENT;
 goto error;
 }
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index a559e7b..7bf525b 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -552,6 +552,7 @@ static int vfio_base_device_init(VFIODevice *vbasedev)
 ssize_t len;
 struct stat st;
 int groupid;
+Error *err = NULL;
 int ret;
 
 /* @sysfsdev takes precedence over @host */
@@ -592,10 +593,10 @@ static int vfio_base_device_init(VFIODevice *vbasedev)
 
 trace_vfio_platform_base_device_init(vbasedev->name, groupid);
 
-group = vfio_get_group(groupid, &address_space_memory);
+group = vfio_get_group(groupid, &address_space_memory, &err);
 if (!group) {
-error_report("vfio: failed to get group %d", groupid);
-return -ENOENT;
+ret = -ENOENT;
+goto error;
 }
 
 QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
@@ -619,6 +620,10 @@ static int vfio_base_device_init(VFIODevice *vbasedev)
 vfio_put_group(group);
 }
 
+error:
+if (err) {
+error_reportf_err(err, ERR_PREFIX, vbasedev->name);
+}
 return ret;
 }
 
diff --git a/include/hw/vfio/vfio-common.h
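
With this patch vfio_get_group() either returns a cached group or fails with an error when the same group is requested from a different address space. A reduced model, with hypothetical FakeAS/FakeGroup types and a fixed-size table in place of vfio_group_list, looks like this:

```c
#include <stddef.h>

typedef struct { int id; } FakeAS;   /* hypothetical AddressSpace */

typedef struct {
    int groupid;
    const FakeAS *as;   /* address space of the group's container */
} FakeGroup;

static FakeGroup groups[8];
static size_t nr_groups;

/* Returns the group, or NULL with *err set on failure. */
static FakeGroup *fake_get_group(int groupid, const FakeAS *as,
                                 const char **err)
{
    for (size_t i = 0; i < nr_groups; i++) {
        if (groups[i].groupid == groupid) {
            if (groups[i].as == as) {
                return &groups[i];          /* reuse the cached group */
            }
            *err = "group used in multiple address spaces";
            return NULL;
        }
    }
    if (nr_groups == sizeof(groups) / sizeof(groups[0])) {
        *err = "too many groups";
        return NULL;
    }
    groups[nr_groups].groupid = groupid;
    groups[nr_groups].as = as;
    return &groups[nr_groups++];
}
```

Reporting through an out-parameter rather than printing leaves the decision about how to present the failure to the caller, which is the motivation for passing Error **errp down.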

Re: [Qemu-devel] [PATCH V1 03/10] qemu-clk: allow to bind two clocks together

2016-10-17 Thread KONRAD Frederic




On 17/10/2016 at 20:19, Peter Maydell wrote:

On 5 October 2016 at 23:10,   wrote:

From: KONRAD Frederic 

This introduces the clock binding and the update part.
When the qemu_clk_rate_update(qemu_clk, int) function is called:
  * The clock callback is called on the qemu_clk so it can change the rate.
  * The qemu_clk_rate_update function is called on all the driven clocks.

Signed-off-by: KONRAD Frederic 
---
 include/qemu/qemu-clock.h | 66 +++
 qemu-clock.c  | 56 
 2 files changed, 122 insertions(+)

diff --git a/include/qemu/qemu-clock.h b/include/qemu/qemu-clock.h
index 1d56a2e..d575566 100644
--- a/include/qemu/qemu-clock.h
+++ b/include/qemu/qemu-clock.h
@@ -27,15 +27,29 @@
 #include "qemu/osdep.h"
 #include "qom/object.h"

+typedef uint64_t (*qemu_clk_on_rate_update_cb)(void *opaque, uint64_t rate);


I think it's more readable to have the typedef define
the function type, not the pointer-to-function type. (See for
instance CPReadFn, CPWriteFn in target-arm/cpu.h.) That way
your function pointers in structs and so on have type
"MyFnType *fn;" and are more obviously pointers.

(Also QEMU style says camelcase for types. QEMUClkRateUpdateCallback?)


Ok I'll fix that!
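Peter's suggestion can be sketched in standalone C as follows. The typedef names the function type itself, so struct members are spelled with an explicit `*`. The same typedef can also declare a callback, which lets the compiler check its signature. Only the typedef name comes from the review; `DemoClk`, `divide_rate()` and `demo_clk_set_rate()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The typedef names the function type, not a pointer to it. */
typedef uint64_t QEMUClkRateUpdateCallback(void *opaque, uint64_t rate);

/* Hypothetical, trimmed-down clock: the '*' shows cb is a pointer. */
typedef struct DemoClk {
    uint64_t out_rate;                /* Hz */
    QEMUClkRateUpdateCallback *cb;
    void *opaque;
} DemoClk;

/* Declaring the callback through the typedef makes the compiler check
 * that the definition below has exactly the expected signature. */
static QEMUClkRateUpdateCallback divide_rate;

static uint64_t divide_rate(void *opaque, uint64_t rate)
{
    const uint64_t *divisor = opaque;

    return rate / *divisor;
}

/* Apply the callback if one is set, otherwise pass the rate through. */
static void demo_clk_set_rate(DemoClk *clk, uint64_t rate)
{
    clk->out_rate = clk->cb ? clk->cb(clk->opaque, rate) : rate;
}
```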




+
 #define TYPE_CLOCK "qemu-clk"
 #define QEMU_CLOCK(obj) OBJECT_CHECK(struct qemu_clk, (obj), TYPE_CLOCK)

+typedef struct ClkList ClkList;
+
 typedef struct qemu_clk {
 /*< private >*/
 Object parent_obj;
 char *name;/* name of this clock in the device. */
+uint64_t in_rate;  /* rate of the clock which drive this pin. */
+uint64_t out_rate; /* rate of this clock pin. */
+void *opaque;
+qemu_clk_on_rate_update_cb cb;
+QLIST_HEAD(, ClkList) bound;
 } *qemu_clk;

+struct ClkList {
+qemu_clk clk;
+QLIST_ENTRY(ClkList) node;
+};
+
 /**
  * qemu_clk_attach_to_device:
  * @dev: the device on which the clock need to be attached.
@@ -59,4 +73,56 @@ void qemu_clk_attach_to_device(DeviceState *dev, qemu_clk clk,
  */
 qemu_clk qemu_clk_get_pin(DeviceState *dev, const char *name);

+/**
+ * qemu_clk_bind_clock:
+ * @out: the clock output.
+ * @in: the clock input.
+ *
+ * Connect the clock together. This is unidirectional so a
+ * qemu_clk_update_rate will go from @out to @in.
+ *
+ */
+void qemu_clk_bind_clock(qemu_clk out, qemu_clk in);


Hang on, I thought that passing a clock to
qemu_clk_attach_to_device() was going to be the thing
that connected the clock up...



qemu_clk_attach_to_device() adds the clock to the device so that it can be
found later, e.g. to let qtree show the clock tree.

qemu_clk_bind_clock() does the actual binding between the clocks.


+
+/**
+ * qemu_clk_unbound:
+ * @out: the clock output.
+ * @in: the clock input.
+ *
+ * Disconnect the clocks if they were bound together.
+ *
+ */
+void qemu_clk_unbind(qemu_clk out, qemu_clk in);


Function prototype and comment don't match...


+
+/**
+ * qemu_clk_update_rate:
+ * @clk: the clock to update.
+ * @rate: the new rate.
+ *
+ * Update the @clk to the new @rate.
+ *
+ */
+void qemu_clk_update_rate(qemu_clk clk, uint64_t rate);


What units is this 'rate' in ?


It's Hz, will add that to the comment.



+
+/**
+ * qemu_clk_refresh:
+ * @clk: the clock to be refreshed.
+ *
+ * If a model alters the topology of a clock tree, it must call this function
+ * to refresh the clock tree.
+ *
+ */
+void qemu_clk_refresh(qemu_clk clk);


...for which clock in the tree does it have to call the function?
All of them? Any one of them at random?

OK, I will clarify that.

Thanks,
Fred




+
+/**
+ * qemu_clk_set_callback:
+ * @clk: the clock where to set the callback.
+ * @cb: the callback to associate to the callback.
+ * @opaque: the opaque data passed to the calback.
+ *
+ */
+void qemu_clk_set_callback(qemu_clk clk,
+   qemu_clk_on_rate_update_cb cb,
+   void *opaque);
+
 #endif /* QEMU_CLOCK_H */
diff --git a/qemu-clock.c b/qemu-clock.c
index 0ba6caf..541f615 100644
--- a/qemu-clock.c
+++ b/qemu-clock.c
@@ -37,6 +37,62 @@
 }\
 } while (0);

+void qemu_clk_refresh(qemu_clk clk)
+{
+qemu_clk_update_rate(clk, clk->in_rate);
+}
+
+void qemu_clk_update_rate(qemu_clk clk, uint64_t rate)
+{
+ClkList *child;
+
+clk->in_rate = rate;
+clk->out_rate = rate;
+
+if (clk->cb) {
+clk->out_rate = clk->cb(clk->opaque, rate);
+}
+
+DPRINTF("%s output rate updated to %" PRIu64 "\n",
+object_get_canonical_path(OBJECT(clk)),
+clk->out_rate);
+
+QLIST_FOREACH(child, &clk->bound, node) {
+qemu_clk_update_rate(child->clk, clk->out_rate);
+}
+}
+
+void qemu_clk_bind_clock(qemu_clk out, qemu_clk in)
+{
+ClkList *child;
+
+child = g_malloc(sizeof(child));
+assert(child);
+

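The propagation described in the commit message and implemented in qemu_clk_update_rate() can be modelled in standalone C. The model works like this:

- Each clock stores its input rate.
- An optional callback derives the output rate.
- Every bound clock is then updated with that output rate.

In this model, refreshing a clock re-propagates only to the clocks downstream of it. Calling it on the clock whose configuration changed, or on any clock above it, therefore updates everything that depends on that change.

Some details of the sketch:

- All names are hypothetical.
- Rates are in Hz, as stated in the reply above.
- The bindings are assumed to form a tree with no cycles.
- Each list node is allocated with `sizeof(*child)`. The patch's `g_malloc(sizeof(child))` requests only the size of a pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t DemoClkRateCb(void *opaque, uint64_t rate);

typedef struct DemoClk DemoClk;

/* One node per clock driven by a given output. */
typedef struct DemoClkList {
    DemoClk *clk;
    struct DemoClkList *next;
} DemoClkList;

struct DemoClk {
    uint64_t in_rate;     /* Hz: rate of the clock driving this pin */
    uint64_t out_rate;    /* Hz: rate produced by this pin */
    DemoClkRateCb *cb;    /* optional: derives out_rate from in_rate */
    void *opaque;
    DemoClkList *bound;   /* clocks driven by this one */
};

/* One-way binding: rate updates flow from @out to @in. */
static int demo_clk_bind(DemoClk *out, DemoClk *in)
{
    /* sizeof(*child) is the node size; sizeof(child) would be a pointer. */
    DemoClkList *child = calloc(1, sizeof(*child));

    if (!child) {
        return -1;
    }
    child->clk = in;
    child->next = out->bound;
    out->bound = child;
    return 0;
}

/* Set the input rate, let the callback derive the output, then recurse. */
static void demo_clk_update_rate(DemoClk *clk, uint64_t rate)
{
    DemoClkList *child;

    clk->in_rate = rate;
    clk->out_rate = clk->cb ? clk->cb(clk->opaque, rate) : rate;
    for (child = clk->bound; child; child = child->next) {
        demo_clk_update_rate(child->clk, clk->out_rate);
    }
}

/* Re-run propagation from @clk downwards with its current input rate. */
static void demo_clk_refresh(DemoClk *clk)
{
    demo_clk_update_rate(clk, clk->in_rate);
}

/* Example callback: a fixed divide-by-two. */
static uint64_t demo_half(void *opaque, uint64_t rate)
{
    (void)opaque;
    return rate / 2;
}
```

For example, take an oscillator bound to a divide-by-two PLL, which is in turn bound to a UART. An update of 50 MHz at the oscillator leaves the UART at 25 MHz. If the PLL's callback is later removed, refreshing the PLL brings the UART back to 50 MHz.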
[Qemu-devel] [PULL 07/19] vfio/pci: Pass an error object to vfio_pci_igd_opregion_init

2016-10-17 Thread Alex Williamson

From: Eric Auger 

Pass an error object to prepare for migration to VFIO-PCI realize.

In vfio_probe_igd_bar4_quirk, simply report the error.

Signed-off-by: Eric Auger 
Reviewed-by: Markus Armbruster 
Signed-off-by: Alex Williamson 
---
 hw/vfio/pci-quirks.c |   10 +-
 hw/vfio/pci.c|3 +--
 hw/vfio/pci.h|3 ++-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index 806ea5d..2cbda08 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -1056,7 +1056,7 @@ typedef struct VFIOIGDQuirk {
  * of the IGD device.
  */
 int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
-   struct vfio_region_info *info)
+   struct vfio_region_info *info, Error **errp)
 {
 int ret;
 
@@ -1064,7 +1064,7 @@ int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
 ret = pread(vdev->vbasedev.fd, vdev->igd_opregion,
 info->size, info->offset);
 if (ret != info->size) {
-error_report("vfio: Error reading IGD OpRegion");
+error_setg(errp, "failed to read IGD OpRegion");
 g_free(vdev->igd_opregion);
 vdev->igd_opregion = NULL;
 return -EINVAL;
@@ -1489,10 +1489,10 @@ static void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
 }
 
 /* Setup OpRegion access */
-ret = vfio_pci_igd_opregion_init(vdev, opregion);
+ret = vfio_pci_igd_opregion_init(vdev, opregion, &err);
 if (ret) {
-error_report("IGD device %s failed to setup OpRegion, "
- "legacy mode disabled", vdev->vbasedev.name);
+error_append_hint(&err, "IGD legacy mode disabled\n");
+error_reportf_err(err, ERR_PREFIX, vdev->vbasedev.name);
 goto out;
 }
 
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index e2cf6ac..3d84126 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2721,10 +2721,9 @@ static int vfio_initfn(PCIDevice *pdev)
 goto out_teardown;
 }
 
-ret = vfio_pci_igd_opregion_init(vdev, opregion);
+ret = vfio_pci_igd_opregion_init(vdev, opregion, &err);
 g_free(opregion);
 if (ret) {
-error_setg_errno(&err, -ret, "IGD OpRegion initialization failed");
 goto out_teardown;
 }
 }
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 87a62f9..a8366bb 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -164,6 +164,7 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev);
 int vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp);
 
 int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
-   struct vfio_region_info *info);
+   struct vfio_region_info *info,
+   Error **errp);
 
 #endif /* HW_VFIO_VFIO_PCI_H */

[Qemu-devel] [PULL 08/19] vfio: Pass an Error object to vfio_connect_container

2016-10-17 Thread Alex Williamson

From: Eric Auger 

The error is currently simply reported in vfio_get_group. Don't
bother too much with the prefix, which will be handled at a higher
level later on.

Also return an error value in case container->error is not 0 and
the container is torn down.

On vfio_spapr_remove_window failure, we also report an error whereas
it was silent before.

Signed-off-by: Eric Auger 
Reviewed-by: Markus Armbruster 
Signed-off-by: Alex Williamson 
---
 hw/vfio/common.c |   40 +---
 1 file changed, 25 insertions(+), 15 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 29188a1..85a7759 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -34,6 +34,7 @@
 #include "qemu/range.h"
 #include "sysemu/kvm.h"
 #include "trace.h"
+#include "qapi/error.h"
 
 struct vfio_group_head vfio_group_list =
 QLIST_HEAD_INITIALIZER(vfio_group_list);
@@ -900,7 +901,8 @@ static void vfio_put_address_space(VFIOAddressSpace *space)
 }
 }
 
-static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
+static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
+  Error **errp)
 {
 VFIOContainer *container;
 int ret, fd;
@@ -918,15 +920,15 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
 
 fd = qemu_open("/dev/vfio/vfio", O_RDWR);
 if (fd < 0) {
-error_report("vfio: failed to open /dev/vfio/vfio: %m");
+error_setg_errno(errp, errno, "failed to open /dev/vfio/vfio");
 ret = -errno;
 goto put_space_exit;
 }
 
 ret = ioctl(fd, VFIO_GET_API_VERSION);
 if (ret != VFIO_API_VERSION) {
-error_report("vfio: supported vfio version: %d, "
- "reported version: %d", VFIO_API_VERSION, ret);
+error_setg(errp, "supported vfio version: %d, "
+   "reported version: %d", VFIO_API_VERSION, ret);
 ret = -EINVAL;
 goto close_fd_exit;
 }
@@ -941,7 +943,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
 
 ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
 if (ret) {
-error_report("vfio: failed to set group container: %m");
+error_setg_errno(errp, errno, "failed to set group container");
 ret = -errno;
 goto free_container_exit;
 }
@@ -949,7 +951,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
 container->iommu_type = v2 ? VFIO_TYPE1v2_IOMMU : VFIO_TYPE1_IOMMU;
 ret = ioctl(fd, VFIO_SET_IOMMU, container->iommu_type);
 if (ret) {
-error_report("vfio: failed to set iommu for container: %m");
+error_setg_errno(errp, errno, "failed to set iommu for container");
 ret = -errno;
 goto free_container_exit;
 }
@@ -976,7 +978,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
 
 ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
 if (ret) {
-error_report("vfio: failed to set group container: %m");
+error_setg_errno(errp, errno, "failed to set group container");
 ret = -errno;
 goto free_container_exit;
 }
@@ -984,7 +986,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
 v2 ? VFIO_SPAPR_TCE_v2_IOMMU : VFIO_SPAPR_TCE_IOMMU;
 ret = ioctl(fd, VFIO_SET_IOMMU, container->iommu_type);
 if (ret) {
-error_report("vfio: failed to set iommu for container: %m");
+error_setg_errno(errp, errno, "failed to set iommu for container");
 ret = -errno;
 goto free_container_exit;
 }
@@ -997,7 +999,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
 if (!v2) {
 ret = ioctl(fd, VFIO_IOMMU_ENABLE);
 if (ret) {
-error_report("vfio: failed to enable container: %m");
+error_setg_errno(errp, errno, "failed to enable container");
 ret = -errno;
 goto free_container_exit;
 }
@@ -1008,7 +1010,9 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
  &address_space_memory);
 if (container->error) {
 memory_listener_unregister(&container->prereg_listener);
-error_report("vfio: RAM memory listener initialization failed for container");
+ret = container->error;
+error_setg(errp,
+"RAM memory listener initialization failed for container");
 goto free_container_exit;
 }
 }
@@ -1016,7 +1020,8 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
 info.argsz = sizeof(info);
 ret = ioctl(fd,

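In the converted vfio_connect_container(), each failure path calls `error_setg_errno(errp, errno, ...)` and only then evaluates `ret = -errno`. That ordering is correct only as long as the error helper leaves errno untouched. A defensive ordering that does not depend on this is to save errno before doing anything else. The standalone sketch below does that. `demo_open_node()` is hypothetical, and it writes the message into a caller-supplied buffer rather than an Error object.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns an fd, or -errno with a message in @msg. */
static int demo_open_node(const char *path, char *msg, size_t len)
{
    int fd = open(path, O_RDWR);

    if (fd < 0) {
        /* Save errno before any other call can overwrite it. */
        int ret = -errno;

        snprintf(msg, len, "failed to open %s: %s", path, strerror(-ret));
        return ret;
    }
    return fd;
}
```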
[Qemu-devel] [PULL 01/19] vfio/pci: Use local error object in vfio_initfn

2016-10-17 Thread Alex Williamson

From: Eric Auger 

To prepare for migration to realize, let's use a local error
object in vfio_initfn. Also let's use the same error prefix for all
error messages.

On top of the 1-1 conversion, we start using a common error prefix for
all error messages. We also introduce a similar warning prefix which will
be used later on.

Signed-off-by: Eric Auger 
Reviewed-by: Markus Armbruster 
Signed-off-by: Alex Williamson 
---
 hw/vfio/pci.c |   72 -
 include/hw/vfio/vfio-common.h |3 ++
 2 files changed, 45 insertions(+), 30 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index a5a620a..417bf7f 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2493,6 +2493,7 @@ static int vfio_initfn(PCIDevice *pdev)
 VFIODevice *vbasedev_iter;
 VFIOGroup *group;
 char *tmp, group_path[PATH_MAX], *group_name;
+Error *err = NULL;
 ssize_t len;
 struct stat st;
 int groupid;
@@ -2506,9 +2507,9 @@ static int vfio_initfn(PCIDevice *pdev)
 }
 
 if (stat(vdev->vbasedev.sysfsdev, &st) < 0) {
-error_report("vfio: error: no such host device: %s",
- vdev->vbasedev.sysfsdev);
-return -errno;
+error_setg_errno(&err, errno, "no such host device");
+ret = -errno;
+goto error;
 }
 
 vdev->vbasedev.name = g_strdup(basename(vdev->vbasedev.sysfsdev));
@@ -2520,40 +2521,43 @@ static int vfio_initfn(PCIDevice *pdev)
 g_free(tmp);
 
 if (len <= 0 || len >= sizeof(group_path)) {
-error_report("vfio: error no iommu_group for device");
-return len < 0 ? -errno : -ENAMETOOLONG;
+ret = len < 0 ? -errno : -ENAMETOOLONG;
+error_setg_errno(&err, -ret, "no iommu_group found");
+goto error;
 }
 
 group_path[len] = 0;
 
 group_name = basename(group_path);
 if (sscanf(group_name, "%d", &groupid) != 1) {
-error_report("vfio: error reading %s: %m", group_path);
-return -errno;
+error_setg_errno(&err, errno, "failed to read %s", group_path);
+ret = -errno;
+goto error;
 }
 
 trace_vfio_initfn(vdev->vbasedev.name, groupid);
 
 group = vfio_get_group(groupid, pci_device_iommu_address_space(pdev));
 if (!group) {
-error_report("vfio: failed to get group %d", groupid);
-return -ENOENT;
+error_setg(&err, "failed to get group %d", groupid);
+ret = -ENOENT;
+goto error;
 }
 
 QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
 if (strcmp(vbasedev_iter->name, vdev->vbasedev.name) == 0) {
-error_report("vfio: error: device %s is already attached",
- vdev->vbasedev.name);
+error_setg(&err, "device is already attached");
 vfio_put_group(group);
-return -EBUSY;
+ret = -EBUSY;
+goto error;
 }
 }
 
 ret = vfio_get_device(group, vdev->vbasedev.name, &vdev->vbasedev);
 if (ret) {
-error_report("vfio: failed to get device %s", vdev->vbasedev.name);
+error_setg_errno(&err, -ret, "failed to get device");
 vfio_put_group(group);
-return ret;
+goto error;
 }
 
 ret = vfio_populate_device(vdev);
@@ -2567,8 +2571,8 @@ static int vfio_initfn(PCIDevice *pdev)
 vdev->config_offset);
 if (ret < (int)MIN(pci_config_size(&vdev->pdev), vdev->config_size)) {
 ret = ret < 0 ? -errno : -EFAULT;
-error_report("vfio: Failed to read device config space");
-return ret;
+error_setg_errno(&err, -ret, "failed to read device config space");
+goto error;
 }
 
 /* vfio emulates a lot for us, but some bits need extra love */
@@ -2584,8 +2588,9 @@ static int vfio_initfn(PCIDevice *pdev)
  */
 if (vdev->vendor_id != PCI_ANY_ID) {
 if (vdev->vendor_id >= 0xffff) {
-error_report("vfio: Invalid PCI vendor ID provided");
-return -EINVAL;
+error_setg(&err, "invalid PCI vendor ID provided");
+ret = -EINVAL;
+goto error;
 }
 vfio_add_emulated_word(vdev, PCI_VENDOR_ID, vdev->vendor_id, ~0);
 trace_vfio_pci_emulated_vendor_id(vdev->vbasedev.name, vdev->vendor_id);
@@ -2595,8 +2600,9 @@ static int vfio_initfn(PCIDevice *pdev)
 
 if (vdev->device_id != PCI_ANY_ID) {
 if (vdev->device_id > 0xffff) {
-error_report("vfio: Invalid PCI device ID provided");
-return -EINVAL;
+error_setg(&err, "invalid PCI device ID provided");
+ret = -EINVAL;
+goto error;
 }
 vfio_add_emulated_word(vdev, PCI_DEVICE_ID, vdev->device_id, ~0);
 trace_vfio_pci_emulated_device_id(vdev->vbasedev.name, vdev->device_id);
@@ -2606,8 +2612,9 @@ static int vfio_initfn(PCIDevice *pdev)
 
 if (vdev->sub_vendor_id != PCI_ANY_ID) {
 if

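The conversion in vfio_initfn() follows a common C shape with three parts:

- There is one local error object.
- Every failure path records its message and jumps to a single `error:` label.
- The prefix carrying the device name is applied once, when the error is reported.

The standalone sketch below shows that shape. `DemoErr`, `demo_initfn()` and `DEMO_ERR_PREFIX` are hypothetical, and the prefix format is an assumption, since ERR_PREFIX's definition is not shown here. The report goes into a buffer instead of stderr so it can be checked.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical prefix; the real ERR_PREFIX is defined in vfio-common.h. */
#define DEMO_ERR_PREFIX "vfio %s: "

/* Stands in for stderr so the final report can be inspected. */
static char demo_last_report[256];

/* Hypothetical local error object: an empty message means "no error". */
typedef struct DemoErr {
    char msg[128];
} DemoErr;

static void demo_err_set(DemoErr *err, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(err->msg, sizeof(err->msg), fmt, ap);
    va_end(ap);
}

/* Every failure sets the local error and jumps to one exit label. */
static int demo_initfn(const char *name, unsigned vendor_id, int attached)
{
    DemoErr err = { "" };
    int ret;

    if (vendor_id >= 0xffff) {
        demo_err_set(&err, "invalid PCI vendor ID provided");
        ret = -EINVAL;
        goto error;
    }
    if (attached) {
        demo_err_set(&err, "device is already attached");
        ret = -EBUSY;
        goto error;
    }
    return 0;

error:
    /* The device name is added once, here, instead of in every message. */
    snprintf(demo_last_report, sizeof(demo_last_report),
             DEMO_ERR_PREFIX "%s", name, err.msg);
    return ret;
}
```

Keeping the individual messages free of the device name is what allows them to be reused unchanged once the function becomes a realize callback that hands the error to its caller.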
[Qemu-devel] [PULL 04/19] vfio/pci: Pass an error object to vfio_msix_early_setup

2016-10-17 Thread Alex Williamson

From: Eric Auger 

Pass an error object to prepare for migration to VFIO-PCI realize.
The returned value will be removed later on.

We now format an error in case of reading failure for
- the MSIX flags,
- the MSIX table,
- the MSIX PBA.

Signed-off-by: Eric Auger 
Reviewed-by: Markus Armbruster 
Signed-off-by: Alex Williamson 
---
 hw/vfio/pci.c |   13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 46e3cb8..02e92b0 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1277,7 +1277,7 @@ static void vfio_pci_fixup_msix_region(VFIOPCIDevice *vdev)
  * need to first look for where the MSI-X table lives.  So we
  * unfortunately split MSI-X setup across two functions.
  */
-static int vfio_msix_early_setup(VFIOPCIDevice *vdev)
+static int vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
 {
 uint8_t pos;
 uint16_t ctrl;
@@ -1292,16 +1292,19 @@ static int vfio_msix_early_setup(VFIOPCIDevice *vdev)
 
 if (pread(fd, &ctrl, sizeof(ctrl),
   vdev->config_offset + pos + PCI_MSIX_FLAGS) != sizeof(ctrl)) {
+error_setg_errno(errp, errno, "failed to read PCI MSIX FLAGS");
 return -errno;
 }
 
 if (pread(fd, &table, sizeof(table),
   vdev->config_offset + pos + PCI_MSIX_TABLE) != sizeof(table)) {
+error_setg_errno(errp, errno, "failed to read PCI MSIX TABLE");
 return -errno;
 }
 
 if (pread(fd, &pba, sizeof(pba),
   vdev->config_offset + pos + PCI_MSIX_PBA) != sizeof(pba)) {
+error_setg_errno(errp, errno, "failed to read PCI MSIX PBA");
 return -errno;
 }
 
@@ -1332,8 +1335,8 @@ static int vfio_msix_early_setup(VFIOPCIDevice *vdev)
 (vdev->device_id & 0xff00) == 0x5800) {
 msix->pba_offset = 0x1000;
 } else {
-error_report("vfio: Hardware reports invalid configuration, "
- "MSIX PBA outside of specified BAR");
+error_setg(errp, "hardware reports invalid configuration, "
+   "MSIX PBA outside of specified BAR");
 g_free(msix);
 return -EINVAL;
 }
@@ -2657,9 +2660,9 @@ static int vfio_initfn(PCIDevice *pdev)
 
 vfio_pci_size_rom(vdev);
 
-ret = vfio_msix_early_setup(vdev);
+ret = vfio_msix_early_setup(vdev, &err);
 if (ret) {
-return ret;
+goto error;
 }
 
 vfio_bars_setup(vdev);

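vfio_msix_early_setup() treats any pread() result other than the full field size as a failure and reports errno. A short read, meaning a non-negative count smaller than requested, is a successful call and does not set errno, so errno may be stale on that path. vfio_initfn() in PULL 01 already separates the two cases for the config space read (`ret < 0 ? -errno : -EFAULT`). A standalone sketch of the same split for a 16-bit field follows; `demo_read_u16()` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read exactly sizeof(*val) bytes at @off.
 * Returns 0, -errno if pread() failed, or -EFAULT on a short read
 * (pread() returned fewer bytes but did not set errno).
 */
static int demo_read_u16(int fd, off_t off, uint16_t *val)
{
    ssize_t n = pread(fd, val, sizeof(*val), off);

    if (n < 0) {
        return -errno;
    }
    if ((size_t)n != sizeof(*val)) {
        return -EFAULT;
    }
    return 0;
}
```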
[Qemu-devel] [PULL 02/19] vfio/pci: Pass an error object to vfio_populate_vga

2016-10-17 Thread Alex Williamson

From: Eric Auger 

Pass an error object to prepare for the same operation in
vfio_populate_device. Eventually this contributes to the migration
to VFIO-PCI realize.

We now report an error on vfio_get_region_info failure.

vfio_probe_igd_bar4_quirk is not involved in the migration to realize
and simply calls error_reportf_err.

Signed-off-by: Eric Auger 
Reviewed-by: Markus Armbruster 
Signed-off-by: Alex Williamson 
---
 hw/vfio/pci-quirks.c |4 +++-
 hw/vfio/pci.c|   19 ---
 hw/vfio/pci.h|2 +-
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index bec694c..806ea5d 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -1363,6 +1363,7 @@ static void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
 uint64_t *bdsm_size;
 uint32_t gmch;
 uint16_t cmd_orig, cmd;
+Error *err = NULL;
 
 /*
  * This must be an Intel VGA device at address 00:02.0 for us to even
@@ -1464,7 +1465,8 @@ static void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
  * try to enable it.  Probably shouldn't be using legacy mode without VGA,
  * but also no point in us enabling VGA if disabled in hardware.
  */
-if (!(gmch & 0x2) && !vdev->vga && vfio_populate_vga(vdev)) {
+if (!(gmch & 0x2) && !vdev->vga && vfio_populate_vga(vdev, &err)) {
+error_reportf_err(err, ERR_PREFIX, vdev->vbasedev.name);
 error_report("IGD device %s failed to enable VGA access, "
  "legacy mode disabled", vdev->vbasedev.name);
 goto out;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 417bf7f..9645a77 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2134,7 +2134,7 @@ static VFIODeviceOps vfio_pci_ops = {
 .vfio_eoi = vfio_intx_eoi,
 };
 
-int vfio_populate_vga(VFIOPCIDevice *vdev)
+int vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp)
 {
 VFIODevice *vbasedev = &vdev->vbasedev;
 struct vfio_region_info *reg_info;
@@ -2142,15 +2142,18 @@ int vfio_populate_vga(VFIOPCIDevice *vdev)
 
 ret = vfio_get_region_info(vbasedev, VFIO_PCI_VGA_REGION_INDEX, &reg_info);
 if (ret) {
+error_setg_errno(errp, -ret,
+ "failed getting region info for VGA region index %d",
+ VFIO_PCI_VGA_REGION_INDEX);
 return ret;
 }
 
 if (!(reg_info->flags & VFIO_REGION_INFO_FLAG_READ) ||
 !(reg_info->flags & VFIO_REGION_INFO_FLAG_WRITE) ||
 reg_info->size < 0xbffff + 1) {
-error_report("vfio: Unexpected VGA info, flags 0x%lx, size 0x%lx",
- (unsigned long)reg_info->flags,
- (unsigned long)reg_info->size);
+error_setg(errp, "unexpected VGA info, flags 0x%lx, size 0x%lx",
+   (unsigned long)reg_info->flags,
+   (unsigned long)reg_info->size);
 g_free(reg_info);
 return -EINVAL;
 }
@@ -2205,6 +2208,7 @@ static int vfio_populate_device(VFIOPCIDevice *vdev)
 struct vfio_region_info *reg_info;
 struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
 int i, ret = -1;
+Error *err = NULL;
 
 /* Sanity check device */
 if (!(vbasedev->flags & VFIO_DEVICE_FLAGS_PCI)) {
@@ -2259,10 +2263,11 @@ static int vfio_populate_device(VFIOPCIDevice *vdev)
 g_free(reg_info);
 
 if (vdev->features & VFIO_FEATURE_ENABLE_VGA) {
-ret = vfio_populate_vga(vdev);
+ret = vfio_populate_vga(vdev, &err);
 if (ret) {
-error_report(
-"vfio: Device does not support requested feature x-vga");
+error_append_hint(&err, "device does not support "
+  "requested feature x-vga\n");
+error_reportf_err(err, ERR_PREFIX, vdev->vbasedev.name);
 goto error;
 }
 }
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 7d482d9..87a62f9 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -161,7 +161,7 @@ void vfio_bar_quirk_exit(VFIOPCIDevice *vdev, int nr);
 void vfio_bar_quirk_finalize(VFIOPCIDevice *vdev, int nr);
 void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev);
 
-int vfio_populate_vga(VFIOPCIDevice *vdev);
+int vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp);
 
 int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
struct vfio_region_info *info);

[Qemu-devel] [PULL 16/19] vfio/pci: Remove vfio_populate_device returned value

2016-10-17 Thread Alex Williamson

From: Eric Auger 

The returned value (either -errno or -1) is not used anymore by the caller,
vfio_realize, since the error is now stored in the error object. So let's
remove it.

Signed-off-by: Eric Auger 
Reviewed-by: Markus Armbruster 
Signed-off-by: Alex Williamson 
---
 hw/vfio/pci.c |   23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index f063c65..6d01324 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2223,7 +2223,7 @@ int vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp)
 return 0;
 }
 
-static int vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
+static void vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
 {
 VFIODevice *vbasedev = &vdev->vbasedev;
 struct vfio_region_info *reg_info;
@@ -2233,18 +2233,18 @@ static int vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
 /* Sanity check device */
 if (!(vbasedev->flags & VFIO_DEVICE_FLAGS_PCI)) {
 error_setg(errp, "this isn't a PCI device");
-goto error;
+return;
 }
 
 if (vbasedev->num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
 error_setg(errp, "unexpected number of io regions %u",
vbasedev->num_regions);
-goto error;
+return;
 }
 
 if (vbasedev->num_irqs < VFIO_PCI_MSIX_IRQ_INDEX + 1) {
 error_setg(errp, "unexpected number of irqs %u", vbasedev->num_irqs);
-goto error;
+return;
 }
 
 for (i = VFIO_PCI_BAR0_REGION_INDEX; i < VFIO_PCI_ROM_REGION_INDEX; i++) {
@@ -2256,7 +2256,7 @@ static int vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
 
 if (ret) {
 error_setg_errno(errp, -ret, "failed to get region %d info", i);
-goto error;
+return;
 }
 
 QLIST_INIT(&vdev->bars[i].quirks);
@@ -2266,7 +2266,7 @@ static int vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
VFIO_PCI_CONFIG_REGION_INDEX, &reg_info);
 if (ret) {
 error_setg_errno(errp, -ret, "failed to get config info");
-goto error;
+return;
 }
 
 trace_vfio_populate_device_config(vdev->vbasedev.name,
@@ -2287,7 +2287,7 @@ static int vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
 if (ret) {
 error_append_hint(errp, "device does not support "
   "requested feature x-vga\n");
-goto error;
+return;
 }
 }
 
@@ -2297,7 +2297,6 @@ static int vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
 if (ret) {
 /* This can fail for an old kernel or legacy PCI dev */
 trace_vfio_populate_device_get_irq_info_failure();
-ret = 0;
 } else if (irq_info.count == 1) {
 vdev->pci_aer = true;
 } else {
@@ -2305,9 +2304,6 @@ static int vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
  "Could not enable error recovery for the device",
  vbasedev->name);
 }
-
-error:
-return ret;
 }
 
 static void vfio_put_device(VFIOPCIDevice *vdev)
@@ -2579,8 +2575,9 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 goto error;
 }
 
-ret = vfio_populate_device(vdev, errp);
-if (ret) {
+vfio_populate_device(vdev, &err);
+if (err) {
+error_propagate(errp, err);
 goto error;
 }

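Once vfio_populate_device() returns void, its caller can learn about a failure only from the Error object. Because an errp argument may in general be NULL, the caller does not test `*errp` directly. Instead it passes a local error, checks that, and then propagates it. The standalone sketch below shows the pattern. The `demo_*` names and `DEMO_MIN_IRQS` are hypothetical, and the helpers only approximate error_setg() and error_propagate().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MIN_IRQS 3   /* hypothetical minimum for this sketch */

typedef struct DemoErr {
    char msg[128];
} DemoErr;

/* Roughly like error_setg(): does nothing if the caller passed NULL. */
static void demo_err_set(DemoErr **errp, const char *msg)
{
    if (!errp) {
        return;
    }
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort();          /* like g_malloc(), give up on allocation failure */
    }
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

/* Roughly like error_propagate(): hand @local to the caller or free it. */
static void demo_err_propagate(DemoErr **dst_errp, DemoErr *local)
{
    if (!local) {
        return;
    }
    if (dst_errp && !*dst_errp) {
        *dst_errp = local;
    } else {
        free(local);
    }
}

/* No return value: failure is signalled only through @errp. */
static void demo_populate(unsigned num_irqs, DemoErr **errp)
{
    if (num_irqs < DEMO_MIN_IRQS) {
        demo_err_set(errp, "unexpected number of irqs");
    }
}

/* @errp may be NULL, so the caller checks its own local error instead. */
static int demo_realize(unsigned num_irqs, DemoErr **errp)
{
    DemoErr *err = NULL;

    demo_populate(num_irqs, &err);
    if (err) {
        demo_err_propagate(errp, err);
        return -1;
    }
    return 0;
}
```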
[Qemu-devel] [RFC v5 6/8] hw: platform-bus: Add platform bus stub

2016-10-17 Thread Eric Auger

platform_bus_map_region will be called from VFIO common code.

Let's compile this stub when CONFIG_SOFTMMU is set (VFIO requirement)
and CONFIG_PLATFORM_BUS is not set.

Signed-off-by: Eric Auger 
---
 hw/core/Makefile.objs   |  2 +-
 hw/core/platform-bus-stub.c | 27 +++
 2 files changed, 28 insertions(+), 1 deletion(-)
 create mode 100644 hw/core/platform-bus-stub.c

diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
index a4c94e5..f6ececc 100644
--- a/hw/core/Makefile.objs
+++ b/hw/core/Makefile.objs
@@ -18,5 +18,5 @@ common-obj-$(CONFIG_SOFTMMU) += qdev-properties-system.o
 common-obj-$(CONFIG_SOFTMMU) += register.o
 common-obj-$(CONFIG_SOFTMMU) += or-irq.o
 common-obj-$(CONFIG_PLATFORM_BUS) += platform-bus.o
-
+common-obj-$(call land,$(CONFIG_SOFTMMU),$(call lnot,$(CONFIG_PLATFORM_BUS))) += platform-bus-stub.o
 obj-$(CONFIG_SOFTMMU) += generic-loader.o
diff --git a/hw/core/platform-bus-stub.c b/hw/core/platform-bus-stub.c
new file mode 100644
index 0000000..41d4ab6
--- /dev/null
+++ b/hw/core/platform-bus-stub.c
@@ -0,0 +1,27 @@
+/*
+ * Platform Bus device stub
+ *
+ * platform_bus_map_region is used in VFIO common code
+ *
+ * Copyright Red Hat, Inc. 2016
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "hw/platform-bus.h"
+
+void platform_bus_map_region(PlatformBusDevice *pbus, MemoryRegion *mr)
+{
+}
-- 
1.9.1

[Qemu-devel] [PULL 03/19] vfio/pci: Pass an error object to vfio_populate_device

2016-10-17 Thread Alex Williamson

From: Eric Auger 

Pass an error object to prepare for migration to VFIO-PCI realize.
The returned value will be removed later on.

The case where error recovery cannot be enabled is not converted into
an error object but directly reported through error_report, as before.
Populating an error instead would cause the future realize function to
fail, which is not wanted.

Signed-off-by: Eric Auger 
Reviewed-by: Markus Armbruster 
Signed-off-by: Alex Williamson 
---
 hw/vfio/pci.c |   26 --
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 9645a77..46e3cb8 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2202,28 +2202,27 @@ int vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp)
 return 0;
 }
 
-static int vfio_populate_device(VFIOPCIDevice *vdev)
+static int vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
 {
 VFIODevice *vbasedev = &vdev->vbasedev;
 struct vfio_region_info *reg_info;
 struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
 int i, ret = -1;
-Error *err = NULL;
 
 /* Sanity check device */
 if (!(vbasedev->flags & VFIO_DEVICE_FLAGS_PCI)) {
-error_report("vfio: Um, this isn't a PCI device");
+error_setg(errp, "this isn't a PCI device");
 goto error;
 }
 
 if (vbasedev->num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
-error_report("vfio: unexpected number of io regions %u",
- vbasedev->num_regions);
+error_setg(errp, "unexpected number of io regions %u",
+   vbasedev->num_regions);
 goto error;
 }
 
 if (vbasedev->num_irqs < VFIO_PCI_MSIX_IRQ_INDEX + 1) {
-error_report("vfio: unexpected number of irqs %u", vbasedev->num_irqs);
+error_setg(errp, "unexpected number of irqs %u", vbasedev->num_irqs);
 goto error;
 }
 
@@ -2235,7 +2234,7 @@ static int vfio_populate_device(VFIOPCIDevice *vdev)
 g_free(name);
 
 if (ret) {
-error_report("vfio: Error getting region %d info: %m", i);
+error_setg_errno(errp, -ret, "failed to get region %d info", i);
 goto error;
 }
 
@@ -2245,7 +2244,7 @@ static int vfio_populate_device(VFIOPCIDevice *vdev)
 ret = vfio_get_region_info(vbasedev,
VFIO_PCI_CONFIG_REGION_INDEX, &reg_info);
 if (ret) {
-error_report("vfio: Error getting config info: %m");
+error_setg_errno(errp, -ret, "failed to get config info");
 goto error;
 }
 
@@ -2263,11 +2262,10 @@ static int vfio_populate_device(VFIOPCIDevice *vdev)
 g_free(reg_info);
 
 if (vdev->features & VFIO_FEATURE_ENABLE_VGA) {
-ret = vfio_populate_vga(vdev, &err);
+ret = vfio_populate_vga(vdev, errp);
 if (ret) {
-error_append_hint(&err, "device does not support "
+error_append_hint(errp, "device does not support "
   "requested feature x-vga\n");
-error_reportf_err(err, ERR_PREFIX, vdev->vbasedev.name);
 goto error;
 }
 }
@@ -2282,7 +2280,7 @@ static int vfio_populate_device(VFIOPCIDevice *vdev)
 } else if (irq_info.count == 1) {
 vdev->pci_aer = true;
 } else {
-error_report("vfio: %s "
+error_report(WARN_PREFIX
  "Could not enable error recovery for the device",
  vbasedev->name);
 }
@@ -2565,9 +2563,9 @@ static int vfio_initfn(PCIDevice *pdev)
 goto error;
 }
 
-ret = vfio_populate_device(vdev);
+ret = vfio_populate_device(vdev, &err);
 if (ret) {
-return ret;
+goto error;
 }
 
 /* Get a copy of config space */

[Qemu-devel] [PATCH v7] timer: a9gtimer: remove loop to auto-increment comparator

2016-10-17 Thread P J P

From: Prasad J Pandit 

The ARM A9MP processor has a peripheral timer with an auto-increment
register, which holds an increment step value. A user could set
this value to zero. When the auto-increment control bit is enabled,
this leads to an infinite loop in 'a9_gtimer_update' while
updating the comparator value. Remove this loop and advance the
comparator value in a single step instead.

Reported-by: Li Qiang 
Signed-off-by: Prasad J Pandit 
---
 hw/timer/a9gtimer.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

Use QEMU_ALIGN_UP instead of QEMU_ALIGN_DOWN
  -> https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg03788.html

diff --git a/hw/timer/a9gtimer.c b/hw/timer/a9gtimer.c
index 772f85f..ce1dc63 100644
--- a/hw/timer/a9gtimer.c
+++ b/hw/timer/a9gtimer.c
@@ -82,15 +82,15 @@ static void a9_gtimer_update(A9GTimerState *s, bool sync)
 if ((s->control & R_CONTROL_TIMER_ENABLE) &&
 (gtb->control & R_CONTROL_COMP_ENABLE)) {
 /* R2p0+, where the compare function is >= */
-while (gtb->compare < update.new) {
+if (gtb->compare < update.new) {
 DB_PRINT("Compare event happened for CPU %d\n", i);
 gtb->status = 1;
-if (gtb->control & R_CONTROL_AUTO_INCREMENT) {
-DB_PRINT("Auto incrementing timer compare by %" PRId32 "\n",
- gtb->inc);
-gtb->compare += gtb->inc;
-} else {
-break;
+if (gtb->control & R_CONTROL_AUTO_INCREMENT && gtb->inc) {
+uint64_t inc =
+QEMU_ALIGN_UP(update.new - gtb->compare, gtb->inc);
+DB_PRINT("Auto incrementing timer compare by %"
+PRId64 "\n", inc);
+gtb->compare += inc;
 }
 }
 cdiff = (int64_t)gtb->compare - (int64_t)update.new + 1;
-- 
2.7.4

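The v7 change replaces the repeated `compare += inc` with a single jump of `QEMU_ALIGN_UP(update.new - compare, inc)`. That jump is the smallest multiple of `inc` that brings `compare` to at least `update.new`, which is the same value the old loop stopped at whenever `inc` is non-zero. The standalone check below compares the two. The function names are hypothetical, `DEMO_ALIGN_UP` is assumed to behave like QEMU_ALIGN_UP(), and the inputs stay far from UINT64_MAX so the addition cannot wrap.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed equivalent of QEMU_ALIGN_UP(): round n up to a multiple of m. */
#define DEMO_ALIGN_UP(n, m) ((((n) + (m) - 1) / (m)) * (m))

/* Old update: step until compare >= now; never ends if inc == 0. */
static uint64_t demo_advance_loop(uint64_t compare, uint64_t now, uint32_t inc)
{
    while (compare < now) {
        compare += inc;
    }
    return compare;
}

/* New update: one jump by the smallest multiple of inc that reaches now. */
static uint64_t demo_advance_aligned(uint64_t compare, uint64_t now,
                                     uint32_t inc)
{
    if (compare < now && inc) {
        compare += DEMO_ALIGN_UP(now - compare, (uint64_t)inc);
    }
    return compare;
}
```

For compare = 10, now = 20 and inc = 3, the loop steps through 13, 16, 19 and 22. The aligned version adds `DEMO_ALIGN_UP(10, 3) = 12` in one step, so both end at 22.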
[Qemu-devel] [RFC v5 5/8] hw: platform-bus: Enable to map any memory region onto the platform-bus

2016-10-17 Thread Eric Auger

The platform bus is currently used to map dynamically instantiable
platform device MMIO regions. The platform bus also can be seen as a
pool of free guest physical addresses. We would like to use that pool
to allocate a contiguous reserved IOVA region usable for MSI message
address IOMMU mapping.

This patch introduces platform_bus_map_region(), which makes it possible
to map any memory region onto the platform bus.

Signed-off-by: Eric Auger 
Reviewed-by: Peter Maydell 

---

v2 -> v3:
include qapi/error.h
---
 hw/core/platform-bus.c| 27 +--
 include/hw/platform-bus.h |  7 +++
 2 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/hw/core/platform-bus.c b/hw/core/platform-bus.c
index 329ac67..3fb6f6f 100644
--- a/hw/core/platform-bus.c
+++ b/hw/core/platform-bus.c
@@ -24,6 +24,7 @@
 #include "exec/address-spaces.h"
 #include "qemu/error-report.h"
 #include "sysemu/sysemu.h"
+#include "qapi/error.h"
 
 
 /*
@@ -127,16 +128,14 @@ static void platform_bus_map_irq(PlatformBusDevice *pbus, SysBusDevice *sbdev,
 sysbus_connect_irq(sbdev, n, pbus->irqs[irqn]);
 }
 
-static void platform_bus_map_mmio(PlatformBusDevice *pbus, SysBusDevice *sbdev,
-  int n)
+void platform_bus_map_region(PlatformBusDevice *pbus, MemoryRegion *mr)
 {
-MemoryRegion *sbdev_mr = sysbus_mmio_get_region(sbdev, n);
-uint64_t size = memory_region_size(sbdev_mr);
+uint64_t size = memory_region_size(mr);
 uint64_t alignment = (1ULL << (63 - clz64(size + size - 1)));
 uint64_t off;
 bool found_region = false;
 
-if (memory_region_is_mapped(sbdev_mr)) {
+if (memory_region_is_mapped(mr)) {
 /* Region is already mapped, nothing to do */
 return;
 }
@@ -153,13 +152,21 @@ static void platform_bus_map_mmio(PlatformBusDevice *pbus, SysBusDevice *sbdev,
 }
 
 if (!found_region) {
-error_report("Platform Bus: Can not fit MMIO region of size %"PRIx64,
- size);
-exit(1);
+error_setg(&error_fatal,
+   "Platform Bus: Can not fit region %s of size %"PRIx64,
+   mr->name, size);
 }
 
-/* Map the device's region into our Platform Bus MMIO space */
-memory_region_add_subregion(&pbus->mmio, off, sbdev_mr);
+/* Map the region into our Platform Bus MMIO space */
+memory_region_add_subregion(&pbus->mmio, off, mr);
+}
+
+static void platform_bus_map_mmio(PlatformBusDevice *pbus, SysBusDevice *sbdev,
+  int n)
+{
+MemoryRegion *sbdev_mr = sysbus_mmio_get_region(sbdev, n);
+
+platform_bus_map_region(pbus, sbdev_mr);
 }
 
 /*
diff --git a/include/hw/platform-bus.h b/include/hw/platform-bus.h
index a00775c..6d3a664 100644
--- a/include/hw/platform-bus.h
+++ b/include/hw/platform-bus.h
@@ -54,4 +54,11 @@ int platform_bus_get_irqn(PlatformBusDevice *platform_bus, SysBusDevice *sbdev,
 hwaddr platform_bus_get_mmio_addr(PlatformBusDevice *pbus, SysBusDevice *sbdev,
   int n);
 
+/**
+ * platform_bus_map_region: map a MemoryRegion into the platform bus
+ * @pbus: platform bus handle
+ * @mr: memory region handle
+ */
+void platform_bus_map_region(PlatformBusDevice *pbus, MemoryRegion *mr);
+
 #endif /* HW_PLATFORM_BUS_H */
-- 
1.9.1
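The alignment expression kept by this patch, `1ULL << (63 - clz64(size + size - 1))`, rounds the region size up to the next power of two, so every region lands on a naturally aligned offset. The sketch below models that rule together with an assumed first-fit search. BusModel, Range, overlaps() and bus_map() are hypothetical; the real search loop sits in lines the hunk does not show. `__builtin_clzll` (a GCC/Clang builtin) stands in for QEMU's clz64().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest power of two >= size (size > 0): the same value as
 * 1ULL << (63 - clz64(size + size - 1)) in the patch. */
static uint64_t natural_alignment(uint64_t size)
{
    return 1ULL << (63 - __builtin_clzll(size + size - 1));
}

typedef struct {
    uint64_t base;
    uint64_t size;
} Range;

/* Hypothetical bus model: a window of window_size bytes plus the
 * ranges already mapped into it. */
typedef struct {
    uint64_t window_size;
    Range used[16];
    size_t nr_used;
} BusModel;

static bool overlaps(const BusModel *bus, uint64_t off, uint64_t size)
{
    for (size_t i = 0; i < bus->nr_used; i++) {
        const Range *r = &bus->used[i];
        if (off < r->base + r->size && r->base < off + size) {
            return true;
        }
    }
    return false;
}

/* First fit at naturally aligned offsets. On success, records the
 * range, stores its offset in *out_off and returns true. */
static bool bus_map(BusModel *bus, uint64_t size, uint64_t *out_off)
{
    uint64_t align = natural_alignment(size);

    if (bus->nr_used == sizeof(bus->used) / sizeof(bus->used[0])) {
        return false;
    }
    for (uint64_t off = 0; off + size <= bus->window_size; off += align) {
        if (!overlaps(bus, off, size)) {
            bus->used[bus->nr_used++] = (Range){ off, size };
            *out_off = off;
            return true;
        }
    }
    return false;
}
```

In a 64 KiB window, a 4 KiB region goes to offset 0. A following 12 KiB region is aligned to 16 KiB and lands at 0x4000. A later 4 KiB region can still fill the gap at 0x1000.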

[Qemu-devel] [PULL 15/19] vfio/pci: Remove vfio_msix_early_setup returned value

2016-10-17 Thread Alex Williamson

From: Eric Auger 

The returned value is no longer used by the caller, vfio_realize(),
since the error is now stored in the error object. So let's remove it.

Signed-off-by: Eric Auger 
Reviewed-by: Markus Armbruster 
Signed-off-by: Alex Williamson 
---
 hw/vfio/pci.c |   20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index d9652c2..f063c65 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1290,7 +1290,7 @@ static void vfio_pci_fixup_msix_region(VFIOPCIDevice *vdev)
  * need to first look for where the MSI-X table lives.  So we
  * unfortunately split MSI-X setup across two functions.
  */
-static int vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
+static void vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
 {
 uint8_t pos;
 uint16_t ctrl;
@@ -1300,25 +1300,25 @@ static int vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
 
 pos = pci_find_capability(&vdev->pdev, PCI_CAP_ID_MSIX);
 if (!pos) {
-return 0;
+return;
 }
 
 if (pread(fd, &ctrl, sizeof(ctrl),
   vdev->config_offset + pos + PCI_MSIX_FLAGS) != sizeof(ctrl)) {
 error_setg_errno(errp, errno, "failed to read PCI MSIX FLAGS");
-return -errno;
+return;
 }
 
 if (pread(fd, &table, sizeof(table),
   vdev->config_offset + pos + PCI_MSIX_TABLE) != sizeof(table)) {
 error_setg_errno(errp, errno, "failed to read PCI MSIX TABLE");
-return -errno;
+return;
 }
 
 if (pread(fd, &pba, sizeof(pba),
   vdev->config_offset + pos + PCI_MSIX_PBA) != sizeof(pba)) {
 error_setg_errno(errp, errno, "failed to read PCI MSIX PBA");
-return -errno;
+return;
 }
 
 ctrl = le16_to_cpu(ctrl);
@@ -1351,7 +1351,7 @@ static int vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
 error_setg(errp, "hardware reports invalid configuration, "
"MSIX PBA outside of specified BAR");
 g_free(msix);
-return -EINVAL;
+return;
 }
 }
 
@@ -1360,8 +1360,6 @@ static int vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
 vdev->msix = msix;
 
 vfio_pci_fixup_msix_region(vdev);
-
-return 0;
 }
 
 static int vfio_msix_setup(VFIOPCIDevice *vdev, int pos, Error **errp)
@@ -2519,6 +2517,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 VFIODevice *vbasedev_iter;
 VFIOGroup *group;
 char *tmp, group_path[PATH_MAX], *group_name;
+Error *err = NULL;
 ssize_t len;
 struct stat st;
 int groupid;
@@ -2670,8 +2669,9 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 
 vfio_pci_size_rom(vdev);
 
-ret = vfio_msix_early_setup(vdev, errp);
-if (ret) {
+vfio_msix_early_setup(vdev, &err);
+if (err) {
+error_propagate(errp, err);
 goto error;
 }
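With the return value gone, the caller learns about failure only from the error object. Below is a minimal sketch of that convention. It uses a hypothetical MiniError type loosely modeled on QEMU's Error API. mini_error_setg(), mini_error_propagate(), early_setup() and realize() are illustrative names, not QEMU functions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object: just a message. */
typedef struct MiniError {
    char msg[128];
} MiniError;

/* Store a formatted error in *errp. Does nothing if errp is NULL.
 * This sketch also keeps an already-stored error (QEMU's own
 * error_setg() is stricter about that case). */
static void mini_error_setg(MiniError **errp, const char *fmt, ...)
{
    va_list ap;
    MiniError *err;

    if (!errp || *errp) {
        return;
    }
    err = calloc(1, sizeof(*err));
    if (!err) {
        return;
    }
    va_start(ap, fmt);
    vsnprintf(err->msg, sizeof(err->msg), fmt, ap);
    va_end(ap);
    *errp = err;
}

/* Hand a local error to the caller's errp, or free it if the caller
 * passed NULL or already holds an error. */
static void mini_error_propagate(MiniError **dst_errp, MiniError *local_err)
{
    if (!local_err) {
        return;
    }
    if (dst_errp && !*dst_errp) {
        *dst_errp = local_err;
    } else {
        free(local_err);
    }
}

/* Callee shaped like the patched vfio_msix_early_setup(): void return,
 * failure reported only through errp. */
static void early_setup(int fail, MiniError **errp)
{
    if (fail) {
        mini_error_setg(errp, "failed to read PCI MSIX FLAGS");
        return;
    }
    /* normal setup would continue here */
}

/* Caller shaped like vfio_realize(): test a local error object rather
 * than a return code, then pass it up. Returns 0 or -1. */
static int realize(int fail, MiniError **errp)
{
    MiniError *err = NULL;

    early_setup(fail, &err);
    if (err) {
        mini_error_propagate(errp, err);
        return -1;
    }
    return 0;
}
```

The local `err` lets the caller test for failure even when its own caller passed `errp == NULL`. This is why vfio_realize() checks `err` instead of dereferencing `errp`.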

[Qemu-devel] [PULL 00/19] VFIO updates 2016-10-17

2016-10-17 Thread Alex Williamson

The following changes since commit 0975b8b823a888d474fa33821dfe84e6904db197:

  Merge remote-tracking branch 'remotes/gkurz/tags/for-upstream' into staging (2016-10-17 16:17:51 +0100)

are available in the git repository at:


  git://github.com/awilliam/qemu-vfio.git tags/vfio-updates-20161017.0

for you to fetch changes up to 893bfc3cc893ed36cedc364e99cf483e9b08c294:

  vfio: fix duplicate function call (2016-10-17 10:58:03 -0600)


VFIO updates 2016-10-17

 - Convert to realize & improve error reporting (Eric Auger)
 - RTL quirk bug fix (Thorsten Kohfeldt)
 - Skip duplicate pre/post reset (Cao jin)


Cao jin (1):
  vfio: fix duplicate function call

Eric Auger (17):
  vfio/pci: Use local error object in vfio_initfn
  vfio/pci: Pass an error object to vfio_populate_vga
  vfio/pci: Pass an error object to vfio_populate_device
  vfio/pci: Pass an error object to vfio_msix_early_setup
  vfio/pci: Pass an error object to vfio_intx_enable
  vfio/pci: Pass an error object to vfio_add_capabilities
  vfio/pci: Pass an error object to vfio_pci_igd_opregion_init
  vfio: Pass an Error object to vfio_connect_container
  vfio: Pass an error object to vfio_get_group
  vfio: Pass an error object to vfio_get_device
  vfio/platform: Pass an error object to vfio_populate_device
  vfio/platform: fix a wrong returned value in vfio_populate_device
  vfio/platform: Pass an error object to vfio_base_device_init
  vfio/pci: Conversion to realize
  vfio/pci: Remove vfio_msix_early_setup returned value
  vfio/pci: Remove vfio_populate_device returned value
  vfio/pci: Handle host oversight

Thorsten Kohfeldt (1):
  vfio/pci: Fix vfio_rtl8168_quirk_data_read address offset

 hw/vfio/common.c  |  69 ++-
 hw/vfio/pci-quirks.c  |  16 +--
 hw/vfio/pci.c | 279 --
 hw/vfio/pci.h |   5 +-
 hw/vfio/platform.c|  66 +-
 hw/vfio/trace-events  |   2 +-
 include/hw/vfio/vfio-common.h |   7 +-
 7 files changed, 253 insertions(+), 191 deletions(-)

[Qemu-devel] [RFC v5 8/8] hw: vfio: common: Adapt vfio_listeners for reserved_iova region

2016-10-17 Thread Eric Auger

For a reserved IOVA region, declare the region to the kernel so that
it can use it for IOVA/HPA bindings.

This is achieved through the new vfio_register_msi_iova helper.

Signed-off-by: Eric Auger 

---

v4 -> v5:
- squashed "hw: vfio: common: Introduce vfio_register_msi_iova" into
  this patch since checkpatch complained about extern
---
 hw/vfio/common.c | 68 +++-
 1 file changed, 53 insertions(+), 15 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 1e2282c..96be2e8 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -214,6 +214,32 @@ static int vfio_dma_unmap(VFIOContainer *container,
 return 0;
 }
 
+/**
+ * vfio_register_msi_iova: registers the MSI iova region
+ *
+ * @container: container handle
+ * @iova: base IOVA of the MSI region
+ * @size: size of the MSI IOVA region
+ */
+static int vfio_register_msi_iova(VFIOContainer *container, hwaddr iova,
+  ram_addr_t size)
+{
+int ret;
+struct vfio_iommu_type1_dma_map map = {
+.argsz = sizeof(map),
+.flags = VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA,
+.iova = iova,
+.size = size,
+};
+
+ret = ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map);
+
+if (ret) {
+error_report("VFIO_MAP_DMA/RESERVED_MSI_IOVA: %m");
+}
+return ret;
+}
+
 static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
 ram_addr_t size, void *vaddr, bool readonly)
 {
@@ -285,6 +311,7 @@ static int vfio_host_win_del(VFIOContainer *container, hwaddr min_iova,
 static bool vfio_listener_skipped_section(MemoryRegionSection *section)
 {
 return (!memory_region_is_ram(section->mr) &&
+!memory_region_is_reserved_iova(section->mr) &&
 !memory_region_is_iommu(section->mr)) ||
/*
 * Sizing an enabled 64-bit BAR can cause spurious mappings to
@@ -368,7 +395,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
 hwaddr iova, end;
 Int128 llend, llsize;
 void *vaddr;
-int ret;
+int ret = -1;
 VFIOHostDMAWindow *hostwin;
 bool hostwin_found;
 
@@ -464,27 +491,38 @@ static void vfio_listener_region_add(MemoryListener *listener,
 return;
 }
 
-/* Here we assume that memory_region_is_ram(section->mr)==true */
+/* Here we assume that the memory region is ram or reserved iova */
 
-vaddr = memory_region_get_ram_ptr(section->mr) +
-section->offset_within_region +
-(iova - section->offset_within_address_space);
+if (memory_region_is_ram(section->mr)) {
+vaddr = memory_region_get_ram_ptr(section->mr) +
+section->offset_within_region +
+(iova - section->offset_within_address_space);
 
-trace_vfio_listener_region_add_ram(iova, end, vaddr);
+trace_vfio_listener_region_add_ram(iova, end, vaddr);
 
-llsize = int128_sub(llend, int128_make64(iova));
+llsize = int128_sub(llend, int128_make64(iova));
 
-ret = vfio_dma_map(container, iova, int128_get64(llsize),
+ret = vfio_dma_map(container, iova, int128_get64(llsize),
vaddr, section->readonly);
-if (ret) {
-error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
- "0x%"HWADDR_PRIx", %p) = %d (%m)",
- container, iova, int128_get64(llsize), vaddr, ret);
-goto fail;
+if (ret) {
+error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+ "0x%"HWADDR_PRIx", %p) = %d (%m)",
+ container, iova, int128_get64(llsize), vaddr, ret);
+goto fail;
+}
+return;
+} else if (memory_region_is_reserved_iova(section->mr)) {
+llsize = int128_sub(llend, int128_make64(iova));
+ret = vfio_register_msi_iova(container, iova, int128_get64(llsize));
+if (ret) {
+error_report("vfio_register_msi_iova(%p, 0x%"HWADDR_PRIx", "
+ "0x%"HWADDR_PRIx") = %d (%m)",
+ container, iova, int128_get64(llsize), ret);
+goto fail;
+}
+return;
 }
 
-return;
-
 fail:
 /*
  * On the initfn path, store the first error in the container so we
-- 
1.9.1
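After this patch, vfio_listener_region_add() effectively sorts each section into one of a few cases. The following is a simplified, hypothetical classification. SectionKind, Action, skipped() and classify() are illustrative names. The 64-bit BAR sizing special case from vfio_listener_skipped_section() is left out, and the IOMMU branch is assumed to be handled before the RAM/reserved split, as the "ram or reserved iova" comment implies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of a MemoryRegionSection's type bits. */
typedef struct {
    bool ram;
    bool reserved_iova;
    bool iommu;
} SectionKind;

typedef enum {
    ACTION_SKIP,         /* listener ignores the section */
    ACTION_IOMMU,        /* IOMMU notifier path (not modeled here) */
    ACTION_DMA_MAP,      /* MAP_DMA backed by a host virtual address */
    ACTION_REGISTER_MSI, /* MAP_DMA with the reserved-MSI-IOVA flag, no vaddr */
} Action;

/* Mirrors the patched skip test: neither RAM, reserved IOVA nor IOMMU. */
static bool skipped(SectionKind k)
{
    return !k.ram && !k.reserved_iova && !k.iommu;
}

static Action classify(SectionKind k)
{
    if (skipped(k)) {
        return ACTION_SKIP;
    }
    if (k.iommu) {
        return ACTION_IOMMU;
    }
    if (k.ram) {
        return ACTION_DMA_MAP;
    }
    return ACTION_REGISTER_MSI;
}
```

Both mapping branches in the patch now end in their own `return`, and the old unconditional `return` before `fail:` is gone. Initializing `ret` to -1 therefore appears to act only as a guard, in case some other section type ever reaches the `fail:` path.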

[Qemu-devel] [RFC v5 0/8] KVM PCI/MSI passthrough with mach-virt

2016-10-17 Thread Eric Auger

On ARM, MSI transactions emitted by passed-through devices are translated
by the IOMMU, so the host must allocate IOVAs and map them to the host
MSI frame physical addresses. Those IOVAs must be allocated within safe
GPA slots, unused by the guest.

The QEMU VFIO device retrieves the size of the IOVA window needed by the
host using a new VFIO IOMMU type capability chain API. This window is
allocated in the guest address space within the platform bus memory container,
which acts as a pool of usable GPAs and comes with its own GPA allocator.
The memory region is tagged as "reserved_iova". The vfio_listener_region_add
callback is in charge of passing the window characteristics to the kernel
through an extended VFIO_IOMMU_MAP_DMA ioctl.

Best Regards

Eric

Dependencies:
The series depends on the not yet upstream kernel series:
[PATCH v14 00/16] KVM PCIe/MSI passthrough on ARM/ARM64,
https://lkml.org/lkml/2016/10/12/347

Git:
https://github.com/eauger/qemu/tree/v2.7.0-passthrough-rfc-v5

History:

RFC v4 -> RFC v5:
- update linux header according to last user API changes introduced in
  [PATCH v14 00/16] KVM PCIe/MSI passthrough on ARM/ARM64
- fix compilation issue on platforms not setting CONFIG_PLATFORM_BUS
- squash [RFC v4 3/8] hw: vfio: common: Introduce vfio_register_msi_iova
  into the last patch to avoid compilation warning

RFCv3 -> RFC v4:
- initialize err to NULL in vfio_connect_container and fix hint

RFCv2 -> RFC v3:
- IOVA aperture size is not arbitrary anymore. It is retrieved from the host
  using the VFIO IOMMU type capability chain API
- GPEX related patches removed since the warning is not seen anymore

RFC v1 -> RFC v2:
- now uses platform bus MMIO for mapping reserved IOVA region; hence the
  new patch file:
  "hw: platform-bus: enable to map any memory region onto the platform-bus"

Eric Auger (8):
  linux-headers: Partial update for MSI IOVA handling
  hw: vfio: common: vfio_get_iommu_type1_info
  memory: Add reserved_iova region type
  memory: memory_region_find_by_name
  hw: platform-bus: Enable to map any memory region onto the
platform-bus
  hw: platform-bus: Add platform bus stub
  hw: vfio: common: vfio_prepare_msi_mapping
  hw: vfio: common: Adapt vfio_listeners for reserved_iova region

 hw/core/Makefile.objs   |   2 +-
 hw/core/platform-bus-stub.c |  27 +++
 hw/core/platform-bus.c  |  27 ---
 hw/vfio/common.c| 169 ++--
 include/exec/memory.h   |  40 +++
 include/hw/platform-bus.h   |   7 ++
 linux-headers/linux/vfio.h  |  30 +++-
 memory.c|  27 +++
 8 files changed, 295 insertions(+), 34 deletions(-)
 create mode 100644 hw/core/platform-bus-stub.c

-- 
1.9.1

[Qemu-devel] [PULL 06/19] vfio/pci: Pass an error object to vfio_add_capabilities

2016-10-17 Thread Alex Williamson

From: Eric Auger 

Pass an error object to prepare for migration to VFIO-PCI realize.
The error is cascaded down to vfio_add_std_cap() and from there to
vfio_msi_setup(), vfio_msix_setup() and vfio_setup_pcie_cap().

vfio_add_ext_cap() never returns anything other than 0, so turn it
into a void function.

Also use pci_add_capability2 which takes an error object.

Signed-off-by: Eric Auger 
Reviewed-by: Markus Armbruster 
Signed-off-by: Alex Williamson 
---
 hw/vfio/pci.c |   59 ++---
 1 file changed, 31 insertions(+), 28 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 42161c8..e2cf6ac 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1180,7 +1180,7 @@ static void vfio_disable_interrupts(VFIOPCIDevice *vdev)
 }
 }
 
-static int vfio_msi_setup(VFIOPCIDevice *vdev, int pos)
+static int vfio_msi_setup(VFIOPCIDevice *vdev, int pos, Error **errp)
 {
 uint16_t ctrl;
 bool msi_64bit, msi_maskbit;
@@ -1189,6 +1189,7 @@ static int vfio_msi_setup(VFIOPCIDevice *vdev, int pos)
 
 if (pread(vdev->vbasedev.fd, &ctrl, sizeof(ctrl),
   vdev->config_offset + pos + PCI_CAP_FLAGS) != sizeof(ctrl)) {
+error_setg_errno(errp, errno, "failed reading MSI PCI_CAP_FLAGS");
 return -errno;
 }
 ctrl = le16_to_cpu(ctrl);
@@ -1204,8 +1205,8 @@ static int vfio_msi_setup(VFIOPCIDevice *vdev, int pos)
 if (ret == -ENOTSUP) {
 return 0;
 }
-error_prepend(&err, "vfio: msi_init failed: ");
-error_report_err(err);
+error_prepend(&err, "msi_init failed: ");
+error_propagate(errp, err);
 return ret;
 }
 vdev->msi_cap_size = 0xa + (msi_maskbit ? 0xa : 0) + (msi_64bit ? 0x4 : 0);
@@ -1363,7 +1364,7 @@ static int vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
 return 0;
 }
 
-static int vfio_msix_setup(VFIOPCIDevice *vdev, int pos)
+static int vfio_msix_setup(VFIOPCIDevice *vdev, int pos, Error **errp)
 {
 int ret;
 
@@ -1378,7 +1379,7 @@ static int vfio_msix_setup(VFIOPCIDevice *vdev, int pos)
 if (ret == -ENOTSUP) {
 return 0;
 }
-error_report("vfio: msix_init failed");
+error_setg(errp, "msix_init failed");
 return ret;
 }
 
@@ -1563,7 +1564,8 @@ static void vfio_add_emulated_long(VFIOPCIDevice *vdev, int pos,
 vfio_set_long_bits(vdev->emulated_config_bits + pos, mask, mask);
 }
 
-static int vfio_setup_pcie_cap(VFIOPCIDevice *vdev, int pos, uint8_t size)
+static int vfio_setup_pcie_cap(VFIOPCIDevice *vdev, int pos, uint8_t size,
+   Error **errp)
 {
 uint16_t flags;
 uint8_t type;
@@ -1575,8 +1577,8 @@ static int vfio_setup_pcie_cap(VFIOPCIDevice *vdev, int pos, uint8_t size)
 type != PCI_EXP_TYPE_LEG_END &&
 type != PCI_EXP_TYPE_RC_END) {
 
-error_report("vfio: Assignment of PCIe type 0x%x "
- "devices is not currently supported", type);
+error_setg(errp, "assignment of PCIe type 0x%x "
+   "devices is not currently supported", type);
 return -EINVAL;
 }
 
@@ -1710,7 +1712,7 @@ static void vfio_check_af_flr(VFIOPCIDevice *vdev, uint8_t pos)
 }
 }
 
-static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
+static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos, Error **errp)
 {
 PCIDevice *pdev = &vdev->pdev;
 uint8_t cap_id, next, size;
@@ -1735,9 +1737,9 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
  * will be changed as we unwind the stack.
  */
 if (next) {
-ret = vfio_add_std_cap(vdev, next);
+ret = vfio_add_std_cap(vdev, next, errp);
 if (ret) {
-return ret;
+goto out;
 }
 } else {
 /* Begin the rebuild, use QEMU emulated list bits */
@@ -1751,40 +1753,40 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
 
 switch (cap_id) {
 case PCI_CAP_ID_MSI:
-ret = vfio_msi_setup(vdev, pos);
+ret = vfio_msi_setup(vdev, pos, errp);
 break;
 case PCI_CAP_ID_EXP:
 vfio_check_pcie_flr(vdev, pos);
-ret = vfio_setup_pcie_cap(vdev, pos, size);
+ret = vfio_setup_pcie_cap(vdev, pos, size, errp);
 break;
 case PCI_CAP_ID_MSIX:
-ret = vfio_msix_setup(vdev, pos);
+ret = vfio_msix_setup(vdev, pos, errp);
 break;
 case PCI_CAP_ID_PM:
 vfio_check_pm_reset(vdev, pos);
 vdev->pm_cap = pos;
-ret = pci_add_capability(pdev, cap_id, pos, size);
+ret = pci_add_capability2(pdev, cap_id, pos, size, errp);
 break;
 case PCI_CAP_ID_AF:
 vfio_check_af_flr(vdev, pos);
-ret = pci_add_capability(pdev, cap_id, pos, size);
+ret = pci_add_capability2(pdev, cap_id, pos, size, errp);
 break;
 default:
-ret =

[Qemu-devel] [RFC v5 2/8] hw: vfio: common: vfio_get_iommu_type1_info

2016-10-17 Thread Eric Auger

Introduce the vfio_get_iommu_type1_info() helper, which handles the
variable-size allocation of vfio_iommu_type1_info required for
capability chain support.

Also fix a checkpatch warning on the vfio_host_win_add() call.

Signed-off-by: Eric Auger 
---
 hw/vfio/common.c | 33 +++--
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 29188a1..4f4014e 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -900,6 +900,27 @@ static void vfio_put_address_space(VFIOAddressSpace *space)
 }
 }
 
+static int vfio_get_iommu_type1_info(int fd,
+ struct vfio_iommu_type1_info **pinfo)
+{
+size_t argsz = sizeof(struct vfio_iommu_type1_info);
+
+*pinfo = g_malloc0(argsz);
+retry:
+(*pinfo)->argsz =  argsz;
+
+if (ioctl(fd, VFIO_IOMMU_GET_INFO, *pinfo)) {
+return -errno;
+}
+if ((*pinfo)->argsz > argsz) {
+argsz = (*pinfo)->argsz;
+*pinfo = g_realloc(*pinfo, argsz);
+goto retry;
+}
+return 0;
+}
+
+
 static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
 {
 VFIOContainer *container;
@@ -937,7 +958,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
 if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) ||
 ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU)) {
 bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU);
-struct vfio_iommu_type1_info info;
+struct vfio_iommu_type1_info *pinfo;
 
 ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
 if (ret) {
@@ -961,14 +982,14 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
  * existing Type1 IOMMUs generally support any IOVA we're
  * going to actually try in practice.
  */
-info.argsz = sizeof(info);
-ret = ioctl(fd, VFIO_IOMMU_GET_INFO, &info);
+vfio_get_iommu_type1_info(fd, &pinfo);
 /* Ignore errors */
-if (ret || !(info.flags & VFIO_IOMMU_INFO_PGSIZES)) {
+if (ret || !(pinfo->flags & VFIO_IOMMU_INFO_PGSIZES)) {
 /* Assume 4k IOVA page size */
-info.iova_pgsizes = 4096;
+pinfo->iova_pgsizes = 4096;
 }
-vfio_host_win_add(container, 0, (hwaddr)-1, info.iova_pgsizes);
+vfio_host_win_add(container, 0, (hwaddr)(-1), pinfo->iova_pgsizes);
+g_free(pinfo);
 } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU) ||
ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU)) {
 struct vfio_iommu_spapr_tce_info info;
-- 
1.9.1
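vfio_get_iommu_type1_info() follows a grow-and-retry protocol. It passes the current buffer size in argsz. If the kernel reports a larger argsz, it reallocates and asks again. The sketch below runs the same loop against a fake provider. struct info_hdr, fake_get_info(), get_info() and FULL_REPLY_SIZE are assumptions made for illustration, not the VFIO UAPI. Unlike the patch, this sketch frees the buffer on failure instead of leaving it for the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed header, standing in for struct vfio_iommu_type1_info. */
struct info_hdr {
    uint32_t argsz;   /* in: buffer size; out: size the full reply needs */
    uint32_t flags;
};

#define FULL_REPLY_SIZE 64u  /* pretend size of header + capability chain */

/* Fake provider: always reports the size it needs in argsz, and only
 * writes the trailing payload when the caller's buffer is big enough. */
static int fake_get_info(struct info_hdr *info)
{
    uint32_t given = info->argsz;

    if (given < sizeof(*info)) {
        return -1;
    }
    info->flags = 1;
    info->argsz = FULL_REPLY_SIZE;
    if (given >= FULL_REPLY_SIZE) {
        memset((char *)info + sizeof(*info), 0xab,
               FULL_REPLY_SIZE - sizeof(*info));
    }
    return 0;
}

/* Same shape as vfio_get_iommu_type1_info(): start with the fixed
 * struct, grow to the reported argsz, and query again. */
static int get_info(struct info_hdr **pinfo)
{
    size_t argsz = sizeof(struct info_hdr);
    struct info_hdr *info = calloc(1, argsz);

    if (!info) {
        return -1;
    }
    for (;;) {
        struct info_hdr *bigger;

        info->argsz = (uint32_t)argsz;
        if (fake_get_info(info)) {
            free(info);
            return -1;
        }
        if (info->argsz <= argsz) {
            break;                  /* the whole reply fitted */
        }
        argsz = info->argsz;        /* provider needs more room: grow, retry */
        bigger = realloc(info, argsz);
        if (!bigger) {
            free(info);
            return -1;
        }
        info = bigger;
    }
    *pinfo = info;
    return 0;
}
```

The first call only learns the required size. The second call, made with a 64-byte buffer, receives the full reply including the trailing payload.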

[Qemu-devel] [RFC v5 3/8] memory: Add reserved_iova region type

2016-10-17 Thread Eric Auger

Introduce a new reserved_iova region type. This type of IOVA region
is meant to be used by the kernel to map some host physical addresses
(typically MSI frames).

A new initializer, memory_region_init_reserved_iova is introduced, as
well as a test function, memory_region_is_reserved_iova.

Signed-off-by: Eric Auger 
---
 include/exec/memory.h | 29 +
 memory.c  | 11 +++
 2 files changed, 40 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 10d7eac..f97b1f4 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -191,6 +191,7 @@ struct MemoryRegion {
 /* The following fields should fit in a cache line */
 bool romd_mode;
 bool ram;
+bool reserved_iova;
 bool subpage;
 bool readonly; /* For RAM regions */
 bool rom_device;
@@ -385,6 +386,21 @@ void memory_region_init_ram(MemoryRegion *mr,
 Error **errp);
 
 /**
+ * memory_region_init_reserved_iova:  Initialize reserved iova memory region
+ *
+ * @mr: the #MemoryRegion to be initialized.
+ * @owner: the object that tracks the region's reference count
+ * @name: the name of the region.
+ * @size: size of the region.
+ * @errp: pointer to Error*, to store an error if it happens.
+ */
+void memory_region_init_reserved_iova(MemoryRegion *mr,
+  struct Object *owner,
+  const char *name,
+  uint64_t size,
+  Error **errp);
+
+/**
  * memory_region_init_resizeable_ram:  Initialize memory region with resizeable
  * RAM.  Accesses into the region will
  * modify memory directly.  Only an initial
@@ -573,6 +589,19 @@ static inline bool memory_region_is_ram(MemoryRegion *mr)
 }
 
 /**
+ * memory_region_is_reserved_iova: check whether a memory region corresponds to
+   reserved iova
+ *
+ * Returns %true if a memory region is reserved iova
+ *
+ * @mr: the memory region being queried
+ */
+static inline bool memory_region_is_reserved_iova(MemoryRegion *mr)
+{
+return mr->reserved_iova;
+}
+
+/**
  * memory_region_is_skip_dump: check whether a memory region should not be
  * dumped
  *
diff --git a/memory.c b/memory.c
index 58f9269..00a0ebe 100644
--- a/memory.c
+++ b/memory.c
@@ -1309,6 +1309,17 @@ void memory_region_init_ram(MemoryRegion *mr,
 mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
 }
 
+void memory_region_init_reserved_iova(MemoryRegion *mr,
+  Object *owner,
+  const char *name,
+  uint64_t size,
+  Error **errp)
+{
+memory_region_init(mr, owner, name, size);
+mr->reserved_iova = true;
+mr->terminates = true;
+}
+
 void memory_region_init_resizeable_ram(MemoryRegion *mr,
Object *owner,
const char *name,
-- 
1.9.1

[Qemu-devel] [RFC v5 7/8] hw: vfio: common: vfio_prepare_msi_mapping

2016-10-17 Thread Eric Auger

Introduce a helper function to retrieve the IOMMU type1 capability
chain info.

The first capability ready to be exploited is the msi resv
capability. vfio_prepare_msi_mapping allocates a MemoryRegion
dedicated to host MSI IOVA mapping. Its size matches the host needs.
This region is mapped on guest side on the platform bus memory container.

Signed-off-by: Eric Auger 

---
v4 -> v5:
- use msi_resv struct

v3 -> v4:
- initialize err to NULL

v3: creation
---
 hw/vfio/common.c | 68 
 1 file changed, 68 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 4f4014e..1e2282c 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -34,6 +34,8 @@
 #include "qemu/range.h"
 #include "sysemu/kvm.h"
 #include "trace.h"
+#include "hw/platform-bus.h"
+#include "qapi/error.h"
 
 struct vfio_group_head vfio_group_list =
 QLIST_HEAD_INITIALIZER(vfio_group_list);
@@ -920,12 +922,70 @@ retry:
 return 0;
 }
 
+static struct vfio_info_cap_header *
+vfio_get_iommu_type1_info_cap(struct vfio_iommu_type1_info *info, uint16_t id)
+{
+struct vfio_info_cap_header *hdr;
+void *ptr = info;
+
+if (!(info->flags & VFIO_IOMMU_INFO_CAPS)) {
+return NULL;
+}
+
+for (hdr = ptr + info->cap_offset; hdr != ptr; hdr = ptr + hdr->next) {
+if (hdr->id == id) {
+return hdr;
+}
+}
+return NULL;
+}
+
+static void vfio_prepare_msi_mapping(struct vfio_iommu_type1_info *info,
+ AddressSpace *as, Error **errp)
+{
+struct vfio_iommu_type1_info_cap_msi_resv *msi_resv;
+MemoryRegion *pbus_region, *reserved_reg;
+struct vfio_info_cap_header *hdr;
+PlatformBusDevice *pbus;
+
+hdr = vfio_get_iommu_type1_info_cap(info,
+VFIO_IOMMU_TYPE1_INFO_CAP_MSI_RESV);
+if (!hdr) {
+return;
+}
+
+msi_resv = container_of(hdr, struct vfio_iommu_type1_info_cap_msi_resv,
+header);
+/*
+ * MSI must be iommu mapped: allocate a GPA region located on the
+ * platform bus that the host will be able to use for MSI IOVA allocation
+ */
+reserved_reg = memory_region_find_by_name(as->root, "reserved-iova");
+if (reserved_reg) {
+memory_region_unref(reserved_reg);
+return;
+}
+
+pbus_region = memory_region_find_by_name(as->root, "platform bus");
+if (!pbus_region) {
+error_setg(errp, "no platform bus memory container found");
+return;
+}
+pbus = container_of(pbus_region, PlatformBusDevice, mmio);
+reserved_reg = g_new0(MemoryRegion, 1);
+memory_region_init_reserved_iova(reserved_reg, OBJECT(pbus),
+ "reserved-iova",
+ msi_resv->size, &error_fatal);
+platform_bus_map_region(pbus, reserved_reg);
+memory_region_unref(pbus_region);
+}
 
 static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
 {
 VFIOContainer *container;
 int ret, fd;
 VFIOAddressSpace *space;
+Error *err = NULL;
 
 space = vfio_get_address_space(as);
 
@@ -983,6 +1043,14 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
  * going to actually try in practice.
  */
 vfio_get_iommu_type1_info(fd, &pinfo);
+vfio_prepare_msi_mapping(pinfo, as, &err);
+if (err) {
+error_append_hint(&err,
+"Make sure your machine instantiates a platform bus\n");
+error_report_err(err);
+goto free_container_exit;
+}
+
 /* Ignore errors */
 if (ret || !(pinfo->flags & VFIO_IOMMU_INFO_PGSIZES)) {
 /* Assume 4k IOVA page size */
-- 
1.9.1
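vfio_get_iommu_type1_info_cap() walks the capability chain by byte offsets relative to the start of the info structure, and an offset of 0 ends the chain. Below is a self-contained sketch of the same walk. struct cap_hdr copies the layout of the kernel's struct vfio_info_cap_header. struct info, INFO_CAPS and find_cap() are simplified, hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Same layout as the kernel's struct vfio_info_cap_header. */
struct cap_hdr {
    uint16_t id;
    uint16_t version;
    uint32_t next;   /* offset of next header from buffer start; 0 ends */
};

/* Hypothetical info struct holding only the fields the walk needs. */
struct info {
    uint32_t argsz;
    uint32_t flags;
    uint32_t cap_offset;  /* offset of the first header */
    uint32_t pad;
};

#define INFO_CAPS 1u  /* hypothetical "capability chain present" flag */

/* Return the first header with the given id, or NULL. The loop stops
 * when an offset of 0 brings hdr back to the start of the buffer. */
static struct cap_hdr *find_cap(struct info *info, uint16_t id)
{
    char *base = (char *)info;
    struct cap_hdr *hdr;

    if (!(info->flags & INFO_CAPS)) {
        return NULL;
    }
    for (hdr = (struct cap_hdr *)(base + info->cap_offset);
         hdr != (struct cap_hdr *)base;
         hdr = (struct cap_hdr *)(base + hdr->next)) {
        if (hdr->id == id) {
            return hdr;
        }
    }
    return NULL;
}
```

Like the patch, the walk trusts the offsets it reads. A malformed chain whose `next` points back to an earlier header would loop forever.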

[Qemu-devel] [PATCH v7] timer: a9gtimer: remove loop to auto-increment comparator

2016-10-17 Thread P J P



===

From 632498fa33248bb990b08f246c98f3f318aa631c Mon Sep 17 00:00:00 2001

From: Prasad J Pandit 
Date: Mon, 17 Oct 2016 23:56:01 +0530
Subject: [PATCH v7] timer: a9gtimer: remove loop to auto-increment comparator

ARM A9MP processor has a peripheral timer with an auto-increment
register, which holds an increment step value. A user could set
this value to zero. When auto-increment control bit is enabled,
it leads to an infinite loop in 'a9_gtimer_update' while
updating comparator value. Remove this loop incrementing the
comparator value.

Reported-by: Li Qiang 
Signed-off-by: Prasad J Pandit 
---
 hw/timer/a9gtimer.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

Use QEMU_ALIGN_UP instead of QEMU_ALIGN_DOWN
  -> https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg03788.html

diff --git a/hw/timer/a9gtimer.c b/hw/timer/a9gtimer.c
index 772f85f..ce1dc63 100644
--- a/hw/timer/a9gtimer.c
+++ b/hw/timer/a9gtimer.c
@@ -82,15 +82,15 @@ static void a9_gtimer_update(A9GTimerState *s, bool sync)
 if ((s->control & R_CONTROL_TIMER_ENABLE) &&
 (gtb->control & R_CONTROL_COMP_ENABLE)) {
 /* R2p0+, where the compare function is >= */
-while (gtb->compare < update.new) {
+if (gtb->compare < update.new) {
 DB_PRINT("Compare event happened for CPU %d\n", i);
 gtb->status = 1;
-if (gtb->control & R_CONTROL_AUTO_INCREMENT) {
-DB_PRINT("Auto incrementing timer compare by %" PRId32 "\n",
- gtb->inc);
-gtb->compare += gtb->inc;
-} else {
-break;
+if (gtb->control & R_CONTROL_AUTO_INCREMENT && gtb->inc) {
+uint64_t inc =
+QEMU_ALIGN_UP(update.new - gtb->compare, gtb->inc);
+DB_PRINT("Auto incrementing timer compare by %"
+PRId64 "\n", inc);
+gtb->compare += inc;
 }
 }
 cdiff = (int64_t)gtb->compare - (int64_t)update.new + 1;
--
2.7.4
===


Sorry about the inline text here; git send-email is showing an error about the CA path:

  CA path "'/etc/mail/'" does not exist at /usr/libexec/git-core/git-send-email line 1220

I am trying to fix it.

Thank you.
--
Prasad J Pandit / Red Hat Product Security Team
47AF CE69 3A90 54AA 9045 1053 DD13 3D32 FE5B 041F

[Qemu-devel] [PULL 05/19] vfio/pci: Pass an error object to vfio_intx_enable

2016-10-17 Thread Alex Williamson

From: Eric Auger 

Pass an error object to prepare for migration to VFIO-PCI realize.

The error object is propagated down to vfio_intx_enable_kvm().

Callers that do not propagate the error simply report it with
error_reportf_err(): vfio_intx_update() and vfio_intx_enable() use
WARN_PREFIX for vfio_intx_enable_kvm() failures, while
vfio_msi_disable_common() and vfio_pci_post_reset() use ERR_PREFIX
for vfio_intx_enable() failures.

Signed-off-by: Eric Auger 
Reviewed-by: Markus Armbruster 
Signed-off-by: Alex Williamson 
---
 hw/vfio/pci.c |   41 +
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 02e92b0..42161c8 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -100,7 +100,7 @@ static void vfio_intx_eoi(VFIODevice *vbasedev)
 vfio_unmask_single_irqindex(vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
 }
 
-static void vfio_intx_enable_kvm(VFIOPCIDevice *vdev)
+static void vfio_intx_enable_kvm(VFIOPCIDevice *vdev, Error **errp)
 {
 #ifdef CONFIG_KVM
 struct kvm_irqfd irqfd = {
@@ -126,7 +126,7 @@ static void vfio_intx_enable_kvm(VFIOPCIDevice *vdev)
 
 /* Get an eventfd for resample/unmask */
 if (event_notifier_init(&vdev->intx.unmask, 0)) {
-error_report("vfio: Error: event_notifier_init failed eoi");
+error_setg(errp, "event_notifier_init failed eoi");
 goto fail;
 }
 
@@ -134,7 +134,7 @@ static void vfio_intx_enable_kvm(VFIOPCIDevice *vdev)
 irqfd.resamplefd = event_notifier_get_fd(&vdev->intx.unmask);
 
 if (kvm_vm_ioctl(kvm_state, KVM_IRQFD, &irqfd)) {
-error_report("vfio: Error: Failed to setup resample irqfd: %m");
+error_setg_errno(errp, errno, "failed to setup resample irqfd");
 goto fail_irqfd;
 }
 
@@ -153,7 +153,7 @@ static void vfio_intx_enable_kvm(VFIOPCIDevice *vdev)
 ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
 g_free(irq_set);
 if (ret) {
-error_report("vfio: Error: Failed to setup INTx unmask fd: %m");
+error_setg_errno(errp, -ret, "failed to setup INTx unmask fd");
 goto fail_vfio;
 }
 
@@ -222,6 +222,7 @@ static void vfio_intx_update(PCIDevice *pdev)
 {
 VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
 PCIINTxRoute route;
+Error *err = NULL;
 
 if (vdev->interrupt != VFIO_INT_INTx) {
 return;
@@ -244,18 +245,22 @@ static void vfio_intx_update(PCIDevice *pdev)
 return;
 }
 
-vfio_intx_enable_kvm(vdev);
+vfio_intx_enable_kvm(vdev, &err);
+if (err) {
+error_reportf_err(err, WARN_PREFIX, vdev->vbasedev.name);
+}
 
 /* Re-enable the interrupt in cased we missed an EOI */
 vfio_intx_eoi(&vdev->vbasedev);
 }
 
-static int vfio_intx_enable(VFIOPCIDevice *vdev)
+static int vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
 {
 uint8_t pin = vfio_pci_read_config(&vdev->pdev, PCI_INTERRUPT_PIN, 1);
 int ret, argsz;
 struct vfio_irq_set *irq_set;
 int32_t *pfd;
+Error *err = NULL;
 
 if (!pin) {
 return 0;
@@ -279,7 +284,7 @@ static int vfio_intx_enable(VFIOPCIDevice *vdev)
 
 ret = event_notifier_init(&vdev->intx.interrupt, 0);
 if (ret) {
-error_report("vfio: Error: event_notifier_init failed");
+error_setg_errno(errp, -ret, "event_notifier_init failed");
 return ret;
 }
 
@@ -299,13 +304,16 @@ static int vfio_intx_enable(VFIOPCIDevice *vdev)
 ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
 g_free(irq_set);
 if (ret) {
-error_report("vfio: Error: Failed to setup INTx fd: %m");
+error_setg_errno(errp, -ret, "failed to setup INTx fd");
 qemu_set_fd_handler(*pfd, NULL, NULL, vdev);
 event_notifier_cleanup(&vdev->intx.interrupt);
 return -errno;
 }
 
-vfio_intx_enable_kvm(vdev);
+vfio_intx_enable_kvm(vdev, &err);
+if (err) {
+error_reportf_err(err, WARN_PREFIX, vdev->vbasedev.name);
+}
 
 vdev->interrupt = VFIO_INT_INTx;
 
@@ -707,6 +715,7 @@ retry:
 
 static void vfio_msi_disable_common(VFIOPCIDevice *vdev)
 {
+Error *err = NULL;
 int i;
 
 for (i = 0; i < vdev->nr_vectors; i++) {
@@ -726,7 +735,10 @@ static void vfio_msi_disable_common(VFIOPCIDevice *vdev)
 vdev->nr_vectors = 0;
 vdev->interrupt = VFIO_INT_NONE;
 
-vfio_intx_enable(vdev);
+vfio_intx_enable(vdev, &err);
+if (err) {
+error_reportf_err(err, ERR_PREFIX, vdev->vbasedev.name);
+}
 }
 
 static void vfio_msix_disable(VFIOPCIDevice *vdev)
@@ -1908,7 +1920,12 @@ static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
 
 static void vfio_pci_post_reset(VFIOPCIDevice *vdev)
 {
-vfio_intx_enable(vdev);
+Error *err = NULL;
+
+vfio_intx_enable(vdev, &err);
+if (err) {
+error_reportf_err(err, ERR_PREFIX, vdev->vbasedev.name);
+}
 }
 
 static bool vfio_pci_host_match(PCIHostDeviceAddress *addr, const char *name)
@@ -2724,7 +2741,7 @@ static int vfio_initfn(PCIDevice *pdev)
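
A minimal, self-contained sketch of the Error ** convention this patch moves vfio_intx_enable_kvm() and vfio_intx_enable() to: the callee fills in an error object instead of printing, and the caller decides whether it is fatal or only worth a warning (as vfio_intx_update() does with WARN_PREFIX). ToyError, toy_error_setg_errno(), fake_ioctl() and the other toy_* names are hypothetical stand-ins, not QEMU's Error API. One detail the sketch makes explicit: ioctl() signals failure by returning -1 and setting errno, so the error number is taken from errno rather than derived from the return value.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's opaque Error object. */
typedef struct ToyError {
    int errnum;          /* 0 if no errno is attached */
    char msg[256];
} ToyError;

/* Loosely modelled on error_setg_errno(): fill *errp only if the caller asked. */
static void toy_error_setg_errno(ToyError **errp, int errnum, const char *fmt, ...)
{
    va_list ap;
    size_t n;

    if (!errp) {
        return;                          /* caller ignores error details */
    }
    assert(*errp == NULL);               /* never overwrite a pending error */
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort();
    }
    (*errp)->errnum = errnum;
    va_start(ap, fmt);
    vsnprintf((*errp)->msg, sizeof((*errp)->msg), fmt, ap);
    va_end(ap);
    n = strlen((*errp)->msg);
    if (errnum) {
        snprintf((*errp)->msg + n, sizeof((*errp)->msg) - n, ": %s",
                 strerror(errnum));
    }
}

/* Fake ioctl(): like the real one, it returns -1 and sets errno on failure. */
static int fake_ioctl(int fail)
{
    if (fail) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}

/* Callee: reports failure through errp instead of printing it. */
static int toy_enable_unmask_fd(int fail, ToyError **errp)
{
    if (fake_ioctl(fail)) {
        /* use errno: negating the -1 return value would report errno 1 */
        toy_error_setg_errno(errp, errno, "failed to setup INTx unmask fd");
        return -1;
    }
    return 0;
}

/* Caller: treats the error as a warning, formats it, and carries on. */
static int toy_intx_update(int fail, char *out, size_t len, int *errnum)
{
    ToyError *err = NULL;

    toy_enable_unmask_fd(fail, &err);
    if (!err) {
        out[0] = '\0';
        *errnum = 0;
        return 0;
    }
    snprintf(out, len, "warning: %s: %s", "dev0", err->msg);
    *errnum = err->errnum;
    free(err);
    return 1;
}
```

In the sketch, toy_intx_update(1, buf, sizeof(buf), &e) leaves e == EINVAL and a warning string in buf, and execution continues rather than aborting.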

Re: [Qemu-devel] [very-WIP 1/7] migration: Add VMSTATE_WITH_TMP

2016-10-17 Thread Jianjun Duan



On 10/17/2016 12:16 PM, Dr. David Alan Gilbert wrote:
> * Jianjun Duan (du...@linux.vnet.ibm.com) wrote:
>>
>>
>> On 10/17/2016 11:52 AM, Dr. David Alan Gilbert wrote:
>>> * Jianjun Duan (du...@linux.vnet.ibm.com) wrote:


 On 10/16/2016 08:31 PM, David Gibson wrote:
> On Tue, Oct 11, 2016 at 06:18:30PM +0100, Dr. David Alan Gilbert (git) wrote:
>> From: "Dr. David Alan Gilbert" 
>>
>> VMSTATE_WITH_TMP is for handling structures where some calculation
>> or rearrangement of the data needs to be performed before the data
>> hits the wire.
>> For example,  where the value on the wire is an offset from a
>> non-migrated base, but the data in the structure is the actual pointer.
>>
>> To use it, a temporary type is created and a vmsd used on that type.
>> The first element of the type must be 'parent' a pointer back to the
>> type of the main structure.  VMSTATE_WITH_TMP takes care of allocating
>> and freeing the temporary before running the child vmsd.
>>
>> The post_load/pre_save on the child vmsd can copy things from the parent
>> to the temporary using the parent pointer and do any other calculations
>> needed; it can then use normal VMSD entries to do the actual data
>> storage without having to fiddle around with qemu_get_*/qemu_put_*
>>
 If customized put/get can do transformation and dumping/loading data
 to/from the parent structure, you don't have to go through
 pre_save/post_load, and may get rid of parent pointer.
>>>
>>> Yes but I'd rather try and get rid of the customized put/get from
>>> every device, because then people start using qemu_put/qemu_get in them all.
>>>
>> Then customized handling need to happen in pre_save/post_load. I think
>> you need a way to pass TMP pointer around?
> 
> But then why is that better than having the parent pointer?
> 
IIUC, I don't see how put_tmp fills tmp with data; I suppose pre_save
is meant to fill it. So the tmp pointer has to get from inside pre_save
to put_tmp somehow. How does that happen?

Thanks,
Jianjun


> Dave
> 
>>
>> Thanks,
>> Jianjun
>>> Dave
>>>

 Thanks,
 Jianjun

>> Signed-off-by: Dr. David Alan Gilbert 
>
> The requirement for the parent pointer is a little clunky, but I don't
> quickly see a better way, and it is compile-time verified.  As noted
> elsewhere I think this is a really useful approach which could allow a
> bunch of internal state cleanups while preserving migration.
>
> Reviewed-by: David Gibson 
>
>> ---
>>  include/migration/vmstate.h | 20 
>>  migration/vmstate.c | 38 ++
>>  2 files changed, 58 insertions(+)
>>
>> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
>> index 9500da1..efb0e90 100644
>> --- a/include/migration/vmstate.h
>> +++ b/include/migration/vmstate.h
>> @@ -259,6 +259,7 @@ extern const VMStateInfo vmstate_info_cpudouble;
>>  extern const VMStateInfo vmstate_info_timer;
>>  extern const VMStateInfo vmstate_info_buffer;
>>  extern const VMStateInfo vmstate_info_unused_buffer;
>> +extern const VMStateInfo vmstate_info_tmp;
>>  extern const VMStateInfo vmstate_info_bitmap;
>>  extern const VMStateInfo vmstate_info_qtailq;
>>  
>> @@ -651,6 +652,25 @@ extern const VMStateInfo vmstate_info_qtailq;
>>  .offset = offsetof(_state, _field),  \
>>  }
>>  
>> +/* Allocate a temporary of type 'tmp_type', set tmp->parent to _state
>> + * and execute the vmsd on the temporary.  Note that we're working with
>> + * the whole of _state here, not a field within it.
>> + * We compile time check that:
>> + *That _tmp_type contains a 'parent' member that's a pointer to the
>> + *'_state' type
>> + *That the pointer is right at the start of _tmp_type.
>> + */
>> +#define VMSTATE_WITH_TMP(_state, _tmp_type, _vmsd) { \
>> +.name = "tmp",   \
>> +.size = sizeof(_tmp_type) +  \
>> +QEMU_BUILD_BUG_EXPR(offsetof(_tmp_type, parent) != 0) + \
>> +type_check_pointer(_state,   \
>> +typeof_field(_tmp_type, parent)),\
>> +.vmsd = &(_vmsd),\
>> +.info = &vmstate_info_tmp,   \
>> +.flags= VMS_LINKED,  \
>> +}
>> +
>>  #define VMSTATE_UNUSED_BUFFER(_test, _version, _size) {  \
>>  .name = "unused",\
>>
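
A plain-C model of the VMSTATE_WITH_TMP idea, written only to illustrate the data flow discussed in this thread; it is not QEMU's vmstate code, and DevState, DevTmp, table and the toy_*/tmp_* functions are hypothetical. The save side allocates the temporary, links it to the parent, lets the pre_save analogue compute the wire value, and frees it afterwards, so the same temporary pointer is what both the hook and the field writer see. The _Static_assert plays the role of the QEMU_BUILD_BUG_EXPR check on the position of 'parent'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical device state: holds a real pointer into a non-migrated table. */
typedef struct DevState {
    uint32_t *cur;            /* pointer, meaningless on the destination */
} DevState;

static uint32_t table[16];    /* non-migrated base, same layout on both sides */

/* Temporary type: 'parent' must come first, mirroring VMSTATE_WITH_TMP. */
typedef struct DevTmp {
    DevState *parent;
    uint32_t cur_offset;      /* what actually goes on the wire */
} DevTmp;

_Static_assert(offsetof(DevTmp, parent) == 0, "parent must be the first member");

/* pre_save analogue: derive the wire value from the parent. */
static void tmp_pre_save(DevTmp *tmp)
{
    tmp->cur_offset = (uint32_t)(tmp->parent->cur - table);
}

/* post_load analogue: rebuild the pointer from the wire value. */
static int tmp_post_load(DevTmp *tmp)
{
    if (tmp->cur_offset >= sizeof(table) / sizeof(table[0])) {
        return -1;            /* reject out-of-range offsets */
    }
    tmp->parent->cur = table + tmp->cur_offset;
    return 0;
}

/* Save side: allocate temp, link parent, compute fields, "send", free. */
static uint32_t toy_save(DevState *s)
{
    DevTmp *tmp = calloc(1, sizeof(*tmp));
    uint32_t wire;

    if (!tmp) {
        abort();
    }
    tmp->parent = s;
    tmp_pre_save(tmp);
    wire = tmp->cur_offset;   /* stands in for the child vmsd's fields */
    free(tmp);
    return wire;
}

/* Load side: allocate temp, link parent, "receive" fields, post_load, free. */
static int toy_load(DevState *s, uint32_t wire)
{
    DevTmp *tmp = calloc(1, sizeof(*tmp));
    int ret;

    if (!tmp) {
        abort();
    }
    tmp->parent = s;
    tmp->cur_offset = wire;
    ret = tmp_post_load(tmp);
    free(tmp);
    return ret;
}
```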

[Qemu-devel] [RFC v5 4/8] memory: memory_region_find_by_name

2016-10-17 Thread Eric Auger

This new helper makes it possible to search for a MemoryRegion matching
a given name within a root MemoryRegion.

Signed-off-by: Eric Auger 
---
 include/exec/memory.h | 11 +++
 memory.c  | 16 
 2 files changed, 27 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index f97b1f4..f62e5b5 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1217,6 +1217,17 @@ MemoryRegionSection memory_region_find(MemoryRegion *mr,
hwaddr addr, uint64_t size);
 
 /**
+ * memory_region_find_by_name: Locates the first #MemoryRegion within @mr
+ * whose name matches @name
+ *
+ * @mr: the root MemoryRegion
+ * @name: name of the target MemoryRegion
+ *
+ * Returns the matched memory region or NULL
+ */
+MemoryRegion *memory_region_find_by_name(MemoryRegion *mr, const char *name);
+
+/**
  * memory_global_dirty_log_sync: synchronize the dirty log for all memory
  *
  * Synchronizes the dirty page log for all address spaces.
diff --git a/memory.c b/memory.c
index 00a0ebe..3701b4f 100644
--- a/memory.c
+++ b/memory.c
@@ -2166,6 +2166,22 @@ MemoryRegionSection memory_region_find(MemoryRegion *mr,
 return ret;
 }
 
+MemoryRegion *memory_region_find_by_name(MemoryRegion *root,
+ const char *name)
+{
+MemoryRegion *other;
+
+QTAILQ_FOREACH(other, &root->subregions, subregions_link) {
+if (!strcmp(other->name, name)) {
+memory_region_ref(other);
+return other;
+} else {
+memory_region_find_by_name(other, name);
+}
+}
+return NULL;
+}
+
 bool memory_region_present(MemoryRegion *container, hwaddr addr)
 {
 MemoryRegion *mr;
-- 
1.9.1
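
As posted, the recursive call in memory_region_find_by_name() discards its return value, so a match below the first level of subregions is never returned to the caller. A minimal sketch of the same pre-order depth-first search with the result propagated, on a simplified, hypothetical Region type (no QTAILQ and no memory_region_ref() reference counting):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified region node: children in a singly linked list. */
typedef struct Region {
    const char *name;
    struct Region *first_child;
    struct Region *next_sibling;
} Region;

/* Depth-first search below 'root' (root itself is not compared, like the patch). */
static Region *region_find_by_name(Region *root, const char *name)
{
    Region *child;

    for (child = root->first_child; child; child = child->next_sibling) {
        Region *found;

        if (strcmp(child->name, name) == 0) {
            return child;
        }
        found = region_find_by_name(child, name);
        if (found) {
            return found;     /* propagate a match from a deeper level */
        }
    }
    return NULL;
}
```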

Re: [Qemu-devel] [very-WIP 1/7] migration: Add VMSTATE_WITH_TMP

2016-10-17 Thread Dr. David Alan Gilbert

* Jianjun Duan (du...@linux.vnet.ibm.com) wrote:
> 
> 
> On 10/17/2016 11:52 AM, Dr. David Alan Gilbert wrote:
> > * Jianjun Duan (du...@linux.vnet.ibm.com) wrote:
> >>
> >>
> >> On 10/16/2016 08:31 PM, David Gibson wrote:
> >>> On Tue, Oct 11, 2016 at 06:18:30PM +0100, Dr. David Alan Gilbert (git) wrote:
>  From: "Dr. David Alan Gilbert" 
> 
>  VMSTATE_WITH_TMP is for handling structures where some calculation
>  or rearrangement of the data needs to be performed before the data
>  hits the wire.
>  For example,  where the value on the wire is an offset from a
>  non-migrated base, but the data in the structure is the actual pointer.
> 
>  To use it, a temporary type is created and a vmsd used on that type.
>  The first element of the type must be 'parent' a pointer back to the
>  type of the main structure.  VMSTATE_WITH_TMP takes care of allocating
>  and freeing the temporary before running the child vmsd.
> 
>  The post_load/pre_save on the child vmsd can copy things from the parent
>  to the temporary using the parent pointer and do any other calculations
>  needed; it can then use normal VMSD entries to do the actual data
>  storage without having to fiddle around with qemu_get_*/qemu_put_*
> 
> >> If customized put/get can do transformation and dumping/loading data
> >> to/from the parent structure, you don't have to go through
> >> pre_save/post_load, and may get rid of parent pointer.
> > 
> > Yes but I'd rather try and get rid of the customized put/get from
> > every device, because then people start using qemu_put/qemu_get in them all.
> > 
> Then customized handling need to happen in pre_save/post_load. I think
> you need a way to pass TMP pointer around?

But then why is that better than having the parent pointer?

Dave

> 
> Thanks,
> Jianjun
> > Dave
> > 
> >>
> >> Thanks,
> >> Jianjun
> >>
>  Signed-off-by: Dr. David Alan Gilbert 
> >>>
> >>> The requirement for the parent pointer is a little clunky, but I don't
> >>> quickly see a better way, and it is compile-time verified.  As noted
> >>> elsewhere I think this is a really useful approach which could allow a
> >>> bunch of internal state cleanups while preserving migration.
> >>>
> >>> Reviewed-by: David Gibson 
> >>>
>  ---
>   include/migration/vmstate.h | 20 
>   migration/vmstate.c | 38 ++
>   2 files changed, 58 insertions(+)
> 
>  diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
>  index 9500da1..efb0e90 100644
>  --- a/include/migration/vmstate.h
>  +++ b/include/migration/vmstate.h
>  @@ -259,6 +259,7 @@ extern const VMStateInfo vmstate_info_cpudouble;
>   extern const VMStateInfo vmstate_info_timer;
>   extern const VMStateInfo vmstate_info_buffer;
>   extern const VMStateInfo vmstate_info_unused_buffer;
>  +extern const VMStateInfo vmstate_info_tmp;
>   extern const VMStateInfo vmstate_info_bitmap;
>   extern const VMStateInfo vmstate_info_qtailq;
>   
>  @@ -651,6 +652,25 @@ extern const VMStateInfo vmstate_info_qtailq;
>   .offset = offsetof(_state, _field),  \
>   }
>   
>  +/* Allocate a temporary of type 'tmp_type', set tmp->parent to _state
>  + * and execute the vmsd on the temporary.  Note that we're working with
>  + * the whole of _state here, not a field within it.
>  + * We compile time check that:
>  + *That _tmp_type contains a 'parent' member that's a pointer to the
>  + *'_state' type
>  + *That the pointer is right at the start of _tmp_type.
>  + */
>  +#define VMSTATE_WITH_TMP(_state, _tmp_type, _vmsd) { \
>  +.name = "tmp",   \
>  +.size = sizeof(_tmp_type) +  \
>  +QEMU_BUILD_BUG_EXPR(offsetof(_tmp_type, parent) != 0) + \
>  +type_check_pointer(_state,   \
>  +typeof_field(_tmp_type, parent)),\
>  +.vmsd = &(_vmsd),\
>  +.info = &vmstate_info_tmp,   \
>  +.flags= VMS_LINKED,  \
>  +}
>  +
>   #define VMSTATE_UNUSED_BUFFER(_test, _version, _size) {  \
>   .name = "unused",\
>   .field_exists = (_test), \
>  diff --git a/migration/vmstate.c b/migration/vmstate.c
>  index 2157997..f2563c5 100644
>  --- a/migration/vmstate.c
>  +++ b/migration/vmstate.c
>  @@ -925,6 +925,44 @@ const VMStateInfo

Re: [Qemu-devel] [Qemu-block] block/nfs: Fine grained runtime options in nfs

2016-10-17 Thread Eric Blake

On 10/17/2016 01:00 PM, Ashijeet Acharya wrote:

> One more relatively easy question though, will we include @port as an
> option in runtime_opts while converting NFS to use several
> runtime_opts? The reason I ask this because the uri syntax for NFS in
> QEMU looks like this:
> 
>nfs:[?param=value[=value2[&...]]]

It's actually nfs://[:port]/...

so the URI syntax already supports port.
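
A small sketch of pulling the optional port out of a URI of the form nfs://host[:port]/path, assuming that form as described above. NfsUri and nfs_uri_split() are hypothetical and are not how QEMU's block/nfs.c parses URIs; IPv6 literals are not handled, and any ?param=value query is left attached to the path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Parsed pieces of an nfs:// URI (hypothetical helper, hostname/IPv4 only). */
typedef struct NfsUri {
    char host[256];
    int port;            /* 0 when the URI carries no explicit port */
    char path[1024];
} NfsUri;

static int nfs_uri_split(const char *uri, NfsUri *out)
{
    const char *p, *slash, *colon;
    size_t hostlen;

    if (strncmp(uri, "nfs://", 6) != 0) {
        return -1;
    }
    p = uri + 6;
    slash = strchr(p, '/');
    if (!slash) {
        return -1;                       /* an export path is required */
    }
    colon = memchr(p, ':', (size_t)(slash - p));
    hostlen = (size_t)((colon ? colon : slash) - p);
    if (hostlen == 0 || hostlen >= sizeof(out->host)) {
        return -1;
    }
    memcpy(out->host, p, hostlen);
    out->host[hostlen] = '\0';

    out->port = 0;
    if (colon) {
        char *end;
        long port = strtol(colon + 1, &end, 10);

        if (end != slash || port <= 0 || port > 65535) {
            return -1;                   /* empty, non-numeric or out of range */
        }
        out->port = (int)port;
    }
    if (strlen(slash) >= sizeof(out->path)) {
        return -1;
    }
    strcpy(out->path, slash);
    return 0;
}
```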

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [very-WIP 1/7] migration: Add VMSTATE_WITH_TMP

2016-10-17 Thread Dr. David Alan Gilbert

* Jianjun Duan (du...@linux.vnet.ibm.com) wrote:
> 
> 
> On 10/16/2016 08:31 PM, David Gibson wrote:
> > On Tue, Oct 11, 2016 at 06:18:30PM +0100, Dr. David Alan Gilbert (git) wrote:
> >> From: "Dr. David Alan Gilbert" 
> >>
> >> VMSTATE_WITH_TMP is for handling structures where some calculation
> >> or rearrangement of the data needs to be performed before the data
> >> hits the wire.
> >> For example,  where the value on the wire is an offset from a
> >> non-migrated base, but the data in the structure is the actual pointer.
> >>
> >> To use it, a temporary type is created and a vmsd used on that type.
> >> The first element of the type must be 'parent' a pointer back to the
> >> type of the main structure.  VMSTATE_WITH_TMP takes care of allocating
> >> and freeing the temporary before running the child vmsd.
> >>
> >> The post_load/pre_save on the child vmsd can copy things from the parent
> >> to the temporary using the parent pointer and do any other calculations
> >> needed; it can then use normal VMSD entries to do the actual data
> >> storage without having to fiddle around with qemu_get_*/qemu_put_*
> >>
> If customized put/get can do transformation and dumping/loading data
> to/from the parent structure, you don't have to go through
> pre_save/post_load, and may get rid of parent pointer.

Yes but I'd rather try and get rid of the customized put/get from
every device, because then people start using qemu_put/qemu_get in them all.

Dave

> 
> Thanks,
> Jianjun
> 
> >> Signed-off-by: Dr. David Alan Gilbert 
> > 
> > The requirement for the parent pointer is a little clunky, but I don't
> > quickly see a better way, and it is compile-time verified.  As noted
> > elsewhere I think this is a really useful approach which could allow a
> > bunch of internal state cleanups while preserving migration.
> > 
> > Reviewed-by: David Gibson 
> > 
> >> ---
> >>  include/migration/vmstate.h | 20 
> >>  migration/vmstate.c | 38 ++
> >>  2 files changed, 58 insertions(+)
> >>
> >> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> >> index 9500da1..efb0e90 100644
> >> --- a/include/migration/vmstate.h
> >> +++ b/include/migration/vmstate.h
> >> @@ -259,6 +259,7 @@ extern const VMStateInfo vmstate_info_cpudouble;
> >>  extern const VMStateInfo vmstate_info_timer;
> >>  extern const VMStateInfo vmstate_info_buffer;
> >>  extern const VMStateInfo vmstate_info_unused_buffer;
> >> +extern const VMStateInfo vmstate_info_tmp;
> >>  extern const VMStateInfo vmstate_info_bitmap;
> >>  extern const VMStateInfo vmstate_info_qtailq;
> >>  
> >> @@ -651,6 +652,25 @@ extern const VMStateInfo vmstate_info_qtailq;
> >>  .offset = offsetof(_state, _field),  \
> >>  }
> >>  
> >> +/* Allocate a temporary of type 'tmp_type', set tmp->parent to _state
> >> + * and execute the vmsd on the temporary.  Note that we're working with
> >> + * the whole of _state here, not a field within it.
> >> + * We compile time check that:
> >> + *That _tmp_type contains a 'parent' member that's a pointer to the
> >> + *'_state' type
> >> + *That the pointer is right at the start of _tmp_type.
> >> + */
> >> +#define VMSTATE_WITH_TMP(_state, _tmp_type, _vmsd) { \
> >> +.name = "tmp",   \
> >> +.size = sizeof(_tmp_type) +  \
> >> +QEMU_BUILD_BUG_EXPR(offsetof(_tmp_type, parent) != 0) + \
> >> +type_check_pointer(_state,   \
> >> +typeof_field(_tmp_type, parent)),\
> >> +.vmsd = &(_vmsd),\
> >> +.info = &vmstate_info_tmp,   \
> >> +.flags= VMS_LINKED,  \
> >> +}
> >> +
> >>  #define VMSTATE_UNUSED_BUFFER(_test, _version, _size) {  \
> >>  .name = "unused",\
> >>  .field_exists = (_test), \
> >> diff --git a/migration/vmstate.c b/migration/vmstate.c
> >> index 2157997..f2563c5 100644
> >> --- a/migration/vmstate.c
> >> +++ b/migration/vmstate.c
> >> @@ -925,6 +925,44 @@ const VMStateInfo vmstate_info_unused_buffer = {
> >>  .put  = put_unused_buffer,
> >>  };
> >>  
> >> +/* vmstate_info_tmp, see VMSTATE_WITH_TMP, the idea is that we allocate
> >> + * a temporary buffer and the pre_load/pre_save methods in the child vmsd
> >> + * copy stuff from the parent into the child and do calculations to fill
> >> + * in fields that don't really exist in the parent but need to be in the
> >> + * stream.
> >> + */
> >> +static int get_tmp(QEMUFile *f, void *pv, size_t size, VMStateField *field)
> >> +{
> >>

Re: [Qemu-devel] [PATCH v3 2/3] exec: rename cpu_exec_init() as cpu_exec_realizefn()

2016-10-17 Thread Eduardo Habkost

On Sat, Oct 15, 2016 at 12:52:48AM +0200, Laurent Vivier wrote:
> Modify all CPUs to call it from XXX_cpu_realizefn() function.
> 
> Remove all the cannot_destroy_with_object_finalize_yet as
> unsafe references have been moved to cpu_exec_realizefn().
> (tested with QOM command provided by commit 4c315c27)
> 
> for arm:
> 
> Setting of cpu->mp_affinity is moved from arm_cpu_initfn()
> to arm_cpu_realizefn() as setting of cpu_index is now done
> in cpu_exec_realizefn().
> 
> Signed-off-by: Laurent Vivier 
[...]
> diff --git a/target-arm/cpu.c b/target-arm/cpu.c
> index 1b9540e..364a45d 100644
> --- a/target-arm/cpu.c
> +++ b/target-arm/cpu.c
> @@ -441,22 +441,11 @@ static void arm_cpu_initfn(Object *obj)
>  CPUState *cs = CPU(obj);
>  ARMCPU *cpu = ARM_CPU(obj);
>  static bool inited;
> -uint32_t Aff1, Aff0;
>  
>  cs->env_ptr = &cpu->env;
> -cpu_exec_init(cs, &error_abort);
>  cpu->cp_regs = g_hash_table_new_full(g_int_hash, g_int_equal,
>   g_free, g_free);
>  
> -/* This cpu-id-to-MPIDR affinity is used only for TCG; KVM will override it.
> - * We don't support setting cluster ID ([16..23]) (known as Aff2
> - * in later ARM ARM versions), or any of the higher affinity level fields,
> - * so these bits always RAZ.
> - */
> -Aff1 = cs->cpu_index / ARM_CPUS_PER_CLUSTER;
> -Aff0 = cs->cpu_index % ARM_CPUS_PER_CLUSTER;
> -cpu->mp_affinity = (Aff1 << ARM_AFF1_SHIFT) | Aff0;
> -
>  #ifndef CONFIG_USER_ONLY
>  /* Our inbound IRQ and FIQ lines */
>  if (kvm_enabled()) {
[...]
> @@ -631,6 +628,15 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
>  set_feature(env, ARM_FEATURE_THUMB_DSP);
>  }
>  
> +/* This cpu-id-to-MPIDR affinity is used only for TCG; KVM will override it.
> + * We don't support setting cluster ID ([16..23]) (known as Aff2
> + * in later ARM ARM versions), or any of the higher affinity level fields,
> + * so these bits always RAZ.
> + */
> +Aff1 = cs->cpu_index / ARM_CPUS_PER_CLUSTER;
> +Aff0 = cs->cpu_index % ARM_CPUS_PER_CLUSTER;
> +cpu->mp_affinity = (Aff1 << ARM_AFF1_SHIFT) | Aff0;
> +

This will override any value set in the "mp-affinity" property.
The mp-affinity property can be set by the user on the
command line, and it is also set by machvirt_init() in
hw/arm/virt.c.

Considering that each CPU is supposed to have a different value,
I doubt there are existing use cases for mp-affinity being set
directly by the user.

I suggest having a "cluster-size" property, instead of
"mp-affinity". This way the mp_affinity field can be calculated
on realize, based on the configured cluster-size.
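
A sketch of what computing mp_affinity on realize from a configurable cluster size could look like; the cluster-size property is only a proposal here, and mp_affinity_for() and AFF1_SHIFT are hypothetical names. It follows the quoted hunk: Aff1 (MPIDR bits [15:8]) is the cluster number and Aff0 (bits [7:0]) is the CPU within the cluster.

```c
#include <assert.h>
#include <stdint.h>

#define AFF1_SHIFT 8   /* Aff1 occupies MPIDR bits [15:8]; Aff0 is bits [7:0] */

/* Hypothetical helper: derive the affinity from the CPU index and a
 * configurable cluster size, as realize could do once cpu_index is known. */
static uint64_t mp_affinity_for(uint32_t cpu_index, uint32_t cluster_size)
{
    uint32_t aff1, aff0;

    assert(cluster_size > 0 && cluster_size <= 256);   /* Aff0 is 8 bits */
    aff1 = cpu_index / cluster_size;                    /* cluster number */
    aff0 = cpu_index % cluster_size;                    /* CPU in cluster */
    assert(aff1 <= 0xff);                               /* Aff1 is 8 bits */

    return ((uint64_t)aff1 << AFF1_SHIFT) | aff0;
}
```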

>  if (cpu->reset_hivecs) {
>  cpu->reset_sctlr |= (1 << 13);
>  }
[...]

-- 
Eduardo

Re: [Qemu-devel] qemu master tests/vmstate prints "Failed to load simple/primitive:b_1" etc

2016-10-17 Thread Dr. David Alan Gilbert

* Peter Maydell (peter.mayd...@linaro.org) wrote:
> On 17 October 2016 at 19:51, Dr. David Alan Gilbert wrote:
> > * Peter Maydell (peter.mayd...@linaro.org) wrote:
> >> I've just noticed that qemu master running 'make check' prints
> >>   GTESTER tests/test-vmstate
> >> Failed to load simple/primitive:b_1
> >> Failed to load simple/primitive:i64_2
> >> Failed to load simple/primitive:i32_1
> >> Failed to load simple/primitive:i32_1
> >>
> >> but the test doesn't fail.
> >>
> >> Can we either (a) silence this output if it's spurious or (b) have
> >> it cause the test to fail if it's real (and fix the cause of the
> >> failure ;-)), please?
> >
> > The test (has always) tried loading truncated versions of the migration
> > stream and made sure that it receives an error from vmstate_load_state.
> >
> > However I just added an error so we can see which field fails to load
> > in a migration where we just used to get a 'migration has failed with -22'
> >
> > Is there a way to silence error_report's that's already in use in tests?
> 
> We have some nasty hacks (like check for 'qtest_enabled()' before
> calling error_report()) but we don't have anything in the
> tree today that's a more coherent approach to the "test
> deliberately provoked this error" problem.

Errors go to either the current monitor (if it's non-qmp) or
stderr; so could we create a dummy monitor to eat the errors
and make it current around that part?
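
A plain-C model of that idea: make a throwaway error destination current only around the deliberately failing load, then restore the previous one. The sink functions, toy_error_report() and toy_load_field() are hypothetical stand-ins, not QEMU's monitor or error_report() API; counting the swallowed messages lets the test still assert that the error was reported.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical pluggable error destination (QEMU's real choice is monitor vs stderr). */
typedef void (*ErrorSink)(const char *msg, void *opaque);

static void sink_stderr(const char *msg, void *opaque)
{
    (void)opaque;
    fprintf(stderr, "%s\n", msg);
}

static ErrorSink cur_sink = sink_stderr;
static void *cur_opaque;

static void toy_error_report(const char *fmt, ...)
{
    char buf[256];
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);
    cur_sink(buf, cur_opaque);
}

/* Test-side sink: swallow messages but count them. */
static void sink_count(const char *msg, void *opaque)
{
    (void)msg;
    ++*(int *)opaque;
}

/* Code under test: a load that fails on a truncated stream and says why. */
static int toy_load_field(const char *field, int truncated)
{
    if (truncated) {
        toy_error_report("Failed to load %s", field);
        return -EINVAL;
    }
    return 0;
}

/* Run one expected-failure case with errors diverted to the counting sink. */
static int expect_silent_failure(const char *field, int *reported)
{
    ErrorSink saved = cur_sink;
    void *saved_opaque = cur_opaque;
    int ret;

    cur_sink = sink_count;
    cur_opaque = reported;
    ret = toy_load_field(field, 1);
    cur_sink = saved;                 /* restore for the rest of the run */
    cur_opaque = saved_opaque;
    return ret;
}
```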

Dave

> 
> thanks
> -- PMM
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

[Qemu-devel] [RFC v5 1/8] linux-headers: Partial update for MSI IOVA handling

2016-10-17 Thread Eric Auger

This is a partial update aiming at enhancing the VFIO user API
with IOMMU info capability chain, msi_resv reporting and
MSI IOVA window registration.

The kernel code is not yet upstreamed. It is available at
https://github.com/eauger/linux/tree/generic-v7-passthrough-v14
[PATCH v14 00/16] KVM PCIe/MSI passthrough on ARM/ARM64,
https://lkml.org/lkml/2016/10/12/347

Signed-off-by: Eric Auger 

---

v4 -> v5:
- update according to kernel v14 series

v2 -> v3:
- features VFIO_IOMMU_TYPE1_INFO_CAP_MSI_GEOMETRY
---
 linux-headers/linux/vfio.h | 30 --
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index 759b850..74c8f02 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -488,7 +488,23 @@ struct vfio_iommu_type1_info {
__u32   argsz;
__u32   flags;
 #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)   /* supported page sizes info */
-   __u64   iova_pgsizes;   /* Bitmap of supported page sizes */
+#define VFIO_IOMMU_INFO_CAPS   (1 << 1)/* Info supports caps */
+   __u64   iova_pgsizes;   /* Bitmap of supported page sizes */
+   __u32   cap_offset; /* Offset within info struct of first cap */
+   __u32   __resv;
+};
+
+/*
+ * The MSI_RESV capability allows to report the MSI reserved IOVA requirements:
+ * In case this capability is supported, the userspace must provide an IOVA
+ * window characterized by @size and @alignment using VFIO_IOMMU_MAP_DMA with
+ * RESERVED_MSI_IOVA flag.
+ */
+#define VFIO_IOMMU_TYPE1_INFO_CAP_MSI_RESV  1
+struct vfio_iommu_type1_info_cap_msi_resv {
+   struct vfio_info_cap_header header;
+   __u64 size; /* requested IOVA aperture size in bytes */
+   __u64 alignment;/* requested byte alignment of the window */
 };
 
 #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
@@ -498,12 +514,21 @@ struct vfio_iommu_type1_info {
  *
  * Map process virtual addresses to IO virtual addresses using the
  * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
+ *
+ * In case RESERVED_MSI_IOVA flag is set, the API only aims at registering an
+ * IOVA region that will be used on some platforms to map the host MSI frames.
+ * In that specific case, vaddr is ignored. Once registered, an MSI reserved
+ * IOVA region stays until the container is closed.
+ * The requirement for provisioning such reserved IOVA range can be checked by
+ * checking the VFIO_IOMMU_TYPE1_INFO_CAP_MSI_RESV capability.
  */
 struct vfio_iommu_type1_dma_map {
__u32   argsz;
__u32   flags;
 #define VFIO_DMA_MAP_FLAG_READ (1 << 0)/* readable from device */
 #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)   /* writable from device */
+/* reserved iova for MSI vectors*/
+#define VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA (1 << 2)
__u64   vaddr;  /* Process virtual address */
__u64   iova;   /* IO virtual address */
__u64   size;   /* Size of mapping (bytes) */
@@ -519,7 +544,8 @@ struct vfio_iommu_type1_dma_map {
  * Caller sets argsz.  The actual unmapped size is returned in the size
  * field.  No guarantee is made to the user that arbitrary unmaps of iova
  * or size different from those used in the original mapping call will
- * succeed.
+ * succeed. Once registered, an MSI region cannot be unmapped and stays
+ * until the container is closed.
  */
 struct vfio_iommu_type1_dma_unmap {
__u32   argsz;
-- 
1.9.1
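
A sketch of how userspace might choose the IOVA window that the MSI_RESV capability asks for, given the @size and @alignment it reports and a free IOVA range the caller already knows about (both the range and pick_msi_window() are hypothetical). The resulting base would then go in the iova field of a vfio_iommu_type1_dma_map with VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA set, per this proposed, not yet upstream, header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: place a window of 'size' bytes aligned to 'align'
 * inside the free range [start, end). Returns 0 and sets *base on success. */
static int pick_msi_window(uint64_t start, uint64_t end,
                           uint64_t size, uint64_t align, uint64_t *base)
{
    uint64_t b;

    if (size == 0 || align == 0 || start >= end) {
        return -1;
    }
    if (start > UINT64_MAX - (align - 1)) {
        return -1;                          /* rounding up would overflow */
    }
    b = (start + align - 1) / align * align; /* round start up to alignment */
    if (b >= end || end - b < size) {
        return -1;                          /* the window does not fit */
    }
    *base = b;
    return 0;
}
```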

Re: [Qemu-devel] [PATCH v3 0/5] Allow blockdev-add for SSH

2016-10-17 Thread no-reply

Hi,

Your series failed automatic build test. Please find the testing commands and
their output below. If you have docker installed, you can probably reproduce it
locally.

Type: series
Message-id: 1476725535-3350-1-git-send-email-ashijeetacha...@gmail.com
Subject: [Qemu-devel] [PATCH v3 0/5] Allow blockdev-add for SSH

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=16
make docker-test-quick@centos6
make docker-test-mingw@fedora
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]  
patchew/1476416631-2870-1-git-send-email-ppan...@redhat.com -> 
patchew/1476416631-2870-1-git-send-email-ppan...@redhat.com
 * [new tag] 
patchew/1476725535-3350-1-git-send-email-ashijeetacha...@gmail.com -> 
patchew/1476725535-3350-1-git-send-email-ashijeetacha...@gmail.com
 * [new tag] patchew/20161017180939.27912-1-mre...@redhat.com -> 
patchew/20161017180939.27912-1-mre...@redhat.com
Switched to a new branch 'test'
32a50b2 qapi: allow blockdev-add for ssh
e1f6a0a block/ssh: Use InetSocketAddress options
02358bf block/ssh: Add InetSocketAddress and accept it
d115bdf util/qemu-sockets: Make inet_connect_saddr() public
585c083 block/ssh: Add ssh_has_filename_options_conflict()

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into 'dtc'...
Submodule path 'dtc': checked out '65cc4d2748a2c2e6f27f1cf39e07a5dbabd80ebf'
  BUILD   centos6
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-_3aeudre/src'
  ARCHIVE qemu.tgz
  ARCHIVE dtc.tgz
  COPYRUNNER
RUN test-quick in qemu:centos6 
Packages installed:
SDL-devel-1.2.14-7.el6_7.1.x86_64
ccache-3.1.6-2.el6.x86_64
epel-release-6-8.noarch
gcc-4.4.7-17.el6.x86_64
git-1.7.1-4.el6_7.1.x86_64
glib2-devel-2.28.8-5.el6.x86_64
libfdt-devel-1.4.0-1.el6.x86_64
make-3.81-23.el6.x86_64
package g++ is not installed
pixman-devel-0.32.8-1.el6.x86_64
tar-1.23-15.el6_8.x86_64
zlib-devel-1.2.3-29.el6.x86_64

Environment variables:
PACKAGES=libfdt-devel ccache tar git make gcc g++ zlib-devel 
glib2-devel SDL-devel pixman-devel epel-release
HOSTNAME=171aa48e5c62
TERM=xterm
MAKEFLAGS= -j16
HISTSIZE=1000
J=16
USER=root
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
MAIL=/var/spool/mail/root
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
LANG=en_US.UTF-8
TARGET_LIST=
HISTCONTROL=ignoredups
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
LOGNAME=root
LESSOPEN=||/usr/bin/lesspipe.sh %s
FEATURES= dtc
DEBUG=
G_BROKEN_FILENAMES=1
CCACHE_HASHDIR=
_=/usr/bin/env

Configure options:
--enable-werror --target-list=x86_64-softmmu,aarch64-softmmu 
--prefix=/var/tmp/qemu-build/install
No C++ compiler available; disabling C++ specific optional code
Install prefix/var/tmp/qemu-build/install
BIOS directory/var/tmp/qemu-build/install/share/qemu
binary directory  /var/tmp/qemu-build/install/bin
library directory /var/tmp/qemu-build/install/lib
module directory  /var/tmp/qemu-build/install/lib/qemu
libexec directory /var/tmp/qemu-build/install/libexec
include directory /var/tmp/qemu-build/install/include
config directory  /var/tmp/qemu-build/install/etc
local state directory   /var/tmp/qemu-build/install/var
Manual directory  /var/tmp/qemu-build/install/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path   /tmp/qemu-test/src
C compilercc
Host C compiler   cc
C++ compiler  
Objective-C compiler cc
ARFLAGS   rv
CFLAGS-O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g 
QEMU_CFLAGS   -I/usr/include/pixman-1-pthread -I/usr/include/glib-2.0 
-I/usr/lib64/glib-2.0/include   -fPIE -DPIE -m64 -D_GNU_SOURCE 
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing -fno-common -fwrapv  -Wendif-labels -Wmissing-include-dirs 
-Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self 
-Wignored-qualifiers -Wold-style-declaration -Wold-style-definition 
-Wtype-limits -fstack-protector-all
LDFLAGS   -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g 
make  make
install   install
pythonpython -B
smbd  /usr/sbin/smbd
module supportno
host CPU  x86_64
host big endian   no
target list   x86_64-softmmu aarch64-softmmu
tcg debug enabled no
gprof enabled no
sparse enabledno
strip binariesyes
profiler  no
static build  no
pixmansystem
SDL support   yes (1.2.14)
GTK support   no 
GTK GL supportno
VTE support   no 
TLS priority  NORMAL
GNUTLS supportno
GNUTLS rndno
libgcrypt no
libgcrypt kdf no
nettleno 
nettle kdfno
libtasn1  no
curses supportno
virgl

Re: [Qemu-devel] [kvm-unit-tests PATCH v3 08/10] arm/arm64: gicv2: add an IPI test

2016-10-17 Thread Andrew Jones

On Thu, Sep 01, 2016 at 06:42:50PM +0200, Auger Eric wrote:
> Hi Drew,
> 
> On 15/07/2016 15:00, Andrew Jones wrote:
> > Signed-off-by: Andrew Jones 
> > ---
> > v2: add more details in the output if a test fails,
> > report spurious interrupts if we get them
> > ---
> >  arm/Makefile.common |   6 +-
> >  arm/gic.c   | 194 
> > 
> >  arm/unittests.cfg   |   7 ++
> >  3 files changed, 204 insertions(+), 3 deletions(-)
> >  create mode 100644 arm/gic.c
> > 
> > diff --git a/arm/Makefile.common b/arm/Makefile.common
> > index 41239c37e0920..bc38183ab86e0 100644
> > --- a/arm/Makefile.common
> > +++ b/arm/Makefile.common
> > @@ -9,9 +9,9 @@ ifeq ($(LOADADDR),)
> > LOADADDR = 0x4000
> >  endif
> >  
> > -tests-common = \
> > -   $(TEST_DIR)/selftest.flat \
> > -   $(TEST_DIR)/spinlock-test.flat
> > +tests-common  = $(TEST_DIR)/selftest.flat
> > +tests-common += $(TEST_DIR)/spinlock-test.flat
> > +tests-common += $(TEST_DIR)/gic.flat
> >  
> >  all: test_cases
> >  
> > diff --git a/arm/gic.c b/arm/gic.c
> > new file mode 100644
> > index 0..cf7ec1c90413c
> > --- /dev/null
> > +++ b/arm/gic.c
> > @@ -0,0 +1,194 @@
> > +/*
> > + * GIC tests
> > + *
> > + * GICv2
> > + *   . test sending/receiving IPIs
> > + *
> > + * Copyright (C) 2016, Red Hat Inc, Andrew Jones 
> > + *
> > + * This work is licensed under the terms of the GNU LGPL, version 2.
> > + */
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +static int gic_version;
> > +static int acked[NR_CPUS], spurious[NR_CPUS];
> > +static cpumask_t ready;
> > +
> > +static void nr_cpu_check(int nr)
> > +{
> > +   if (nr_cpus < nr)
> > +   report_abort("At least %d cpus required", nr);
> > +}
> > +
> > +static void wait_on_ready(void)
> > +{
> > +   cpumask_set_cpu(smp_processor_id(), &ready);
> > +   while (!cpumask_full(&ready))
> > +   cpu_relax();
> > +}
> > +
> > +static void check_acked(cpumask_t *mask)
> > +{
> > +   int missing = 0, extra = 0, unexpected = 0;
> > +   int nr_pass, cpu, i;
> > +
> > +   /* Wait up to 5s for all interrupts to be delivered */
> > +   for (i = 0; i < 50; ++i) {
> > +   mdelay(100);
> > +   nr_pass = 0;
> > +   for_each_present_cpu(cpu) {
> Couldn't we use for_each_cpu(cpu, mask)?

If we did, we wouldn't be able to detect interrupts delivered to the
wrong cpus. Note that we don't run the second loop if everything looks
good here; it only exists to report detected problems in more detail.
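
A single-threaded model of the classification check_acked() performs, showing why every present CPU is scanned rather than only those in the mask: 'unexpected' acks can only be counted on CPUs outside the mask. TOY_NR_CPUS, AckStats and classify_acks() are hypothetical; the real test also polls with mdelay() and uses barriers, which are omitted here.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 8

/* Hypothetical per-round counters. */
typedef struct AckStats {
    int missing, extra, unexpected, pass;
} AckStats;

/* Bit i of 'mask' marks CPU i as an expected IPI target. */
static AckStats classify_acks(const int acked[TOY_NR_CPUS], uint8_t mask)
{
    AckStats s = { 0, 0, 0, 0 };
    int cpu;

    for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
        int expected = (mask >> cpu) & 1;

        /* same pass rule as the test: targets ack once, others never */
        if (expected ? acked[cpu] == 1 : acked[cpu] == 0) {
            s.pass++;
        }
        if (expected) {
            if (!acked[cpu]) {
                s.missing++;
            } else if (acked[cpu] > 1) {
                s.extra++;
            }
        } else if (acked[cpu]) {
            s.unexpected++;   /* only visible because non-targets are scanned */
        }
    }
    return s;
}
```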

> > +   smp_rmb();
> > +   nr_pass += cpumask_test_cpu(cpu, mask) ?
> > +   acked[cpu] == 1 : acked[cpu] == 0;
> > +   }
> > +   if (nr_pass == nr_cpus) {
> > +   report("Completed in %d ms", true, ++i * 100);
> > +   return;
> > +   }
> > +   }
> > +
> > +   for_each_present_cpu(cpu) {
> > +   if (cpumask_test_cpu(cpu, mask)) {
> Here we can't, since we count unexpected ones.
> 
> > +   if (!acked[cpu])
> > +   ++missing;
> > +   else if (acked[cpu] > 1)
> > +   ++extra;
> > +   } else {
> > +   if (acked[cpu])
> > +   ++unexpected;
> > +   }
> > +   }
> > +
> > +   report("Timed-out (5s). ACKS: missing=%d extra=%d unexpected=%d",
> > +  false, missing, extra, unexpected);
> > +}
> > +
> > +static void ipi_handler(struct pt_regs *regs __unused)
> > +{
> > +   u32 iar = readl(gicv2_cpu_base() + GIC_CPU_INTACK);
> > +
> > +   if (iar != GICC_INT_SPURIOUS) {
> > +   writel(iar, gicv2_cpu_base() + GIC_CPU_EOI);
> If EOIMode is set, we may need to write DIR as well.

OK, I'll look into that
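The reviewer's EOIMode remark refers to the split between priority drop and deactivation in the GICv2 CPU interface: with EOImode == 0 a write to GICC_EOIR does both, while with EOImode == 1 the EOIR write only drops priority and a separate GICC_DIR write deactivates the interrupt. The sketch below shows that control flow off-target. Assumptions: the register offsets and the EOImodeNS bit position (bit 9 of the non-secure view of GICC_CTLR) are taken from the GICv2 architecture specification, and `struct fake_gicc`, `gicc_read`, `gicc_write` and `gicc_handle_one` are hypothetical stand-ins for the MMIO accessors used in arm/gic.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* GICv2 CPU-interface register offsets. */
#define GICC_CTLR  0x0000
#define GICC_IAR   0x000c
#define GICC_EOIR  0x0010
#define GICC_DIR   0x1000
#define GICC_CTLR_EOIMODENS (1u << 9)   /* EOImodeNS, non-secure GICC_CTLR */
#define GICC_INTID_MASK     0x3ffu
#define GICC_INT_SPURIOUS   1023u

/* Hypothetical fake CPU interface that records writes instead of doing MMIO. */
struct fake_gicc {
    uint32_t ctlr, iar;
    int eoir_writes, dir_writes;
};

static uint32_t gicc_read(struct fake_gicc *g, unsigned off)
{
    assert(off == GICC_CTLR || off == GICC_IAR);
    return off == GICC_CTLR ? g->ctlr : g->iar;
}

static void gicc_write(struct fake_gicc *g, unsigned off, uint32_t val)
{
    (void)val;
    if (off == GICC_EOIR)
        g->eoir_writes++;
    else if (off == GICC_DIR)
        g->dir_writes++;
}

/* Returns true if a real (non-spurious) interrupt was handled. */
static bool gicc_handle_one(struct fake_gicc *g)
{
    uint32_t iar = gicc_read(g, GICC_IAR);

    if ((iar & GICC_INTID_MASK) == GICC_INT_SPURIOUS)
        return false;
    /* Priority drop; with EOImode == 0 this also deactivates. */
    gicc_write(g, GICC_EOIR, iar);
    /* With EOImode == 1, deactivation needs an explicit GICC_DIR write. */
    if (gicc_read(g, GICC_CTLR) & GICC_CTLR_EOIMODENS)
        gicc_write(g, GICC_DIR, iar);
    return true;
}
```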

> > +   smp_rmb(); /* pairs with wmb in ipi_test functions */
> > +   ++acked[smp_processor_id()];
> > +   smp_wmb(); /* pairs with rmb in check_acked */
> > +   } else {
> > +   ++spurious[smp_processor_id()];
> > +   smp_wmb();
> > +   }
> > +}
> > +
> > +static void ipi_test_self(void)
> > +{
> > +   cpumask_t mask;
> > +
> > +   report_prefix_push("self");
> > +   memset(acked, 0, sizeof(acked));
> > +   smp_wmb();
> > +   cpumask_clear();
> > +   cpumask_set_cpu(0, );
> > +   writel(2 << 24, gicv2_dist_base() + GIC_DIST_SOFTINT);
> > +   check_acked();
> > +   report_prefix_pop();
> > +}
> > +
> > +static void ipi_test_smp(void)
> > +{
> > +   cpumask_t mask;
> > +   unsigned long tlist;
> > +
> > +   report_prefix_push("target-list");
> > +   memset(acked, 0, sizeof(acked));
> > +   smp_wmb();
> > +   tlist = cpumask_bits(_present_mask)[0] & 0xaa;
> > +   cpumask_bits()[0] = tlist;
> > +   writel((u8)tlist << 16, gicv2_dist_base() + GIC_DIST_SOFTINT);
> > +   check_acked();
> > +   report_prefix_pop();
> > +
> > +   report_prefix_push("broadcast");
> >
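The SGI writes in ipi_test_self() and ipi_test_smp() above encode a GICv2 GICD_SGIR value: TargetListFilter in bits [25:24] (0 = use CPUTargetList, 1 = all but self, 2 = self only), CPUTargetList in bits [23:16] and the SGI number in bits [3:0]. So `2 << 24` sends SGI 0 to the requesting cpu, and `(u8)tlist << 16` with tlist masked by 0xaa targets cpus 1, 3, 5 and 7. A small encoder makes the layout explicit; `gicd_sgir` and `enum sgi_filter` are hypothetical names, not part of the kvm-unit-tests headers.

```c
#include <assert.h>
#include <stdint.h>

/* GICD_SGIR TargetListFilter values (GICv2). */
enum sgi_filter {
    SGI_FILTER_LIST = 0,          /* deliver to cpus in CPUTargetList */
    SGI_FILTER_ALL_BUT_SELF = 1,  /* deliver to every cpu except the sender */
    SGI_FILTER_SELF = 2,          /* deliver only to the sender */
};

/* Build a GICD_SGIR value; targets is the 8-bit CPUTargetList. */
static uint32_t gicd_sgir(enum sgi_filter filter, uint8_t targets, unsigned sgi)
{
    assert(sgi < 16);
    return ((uint32_t)filter << 24) | ((uint32_t)targets << 16) | sgi;
}
```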

Re: [Qemu-devel] [very-WIP 3/4] slirp: VMStatify sbuf

2016-10-17 Thread Dr. David Alan Gilbert

* Halil Pasic (pa...@linux.vnet.ibm.com) wrote:
> 
> 
> On 10/17/2016 05:36 AM, David Gibson wrote:
> > On Tue, Oct 11, 2016 at 06:18:32PM +0100, Dr. David Alan Gilbert (git) wrote:
> >> From: "Dr. David Alan Gilbert" 
> >>
> >> Convert the sbuf structure to a VMStateDescription.
> >> Note this uses the VMSTATE_WITH_TMP mechanism to calculate
> >> and reload the offsets based on the pointers.
> >>
> >> Signed-off-by: Dr. David Alan Gilbert 
> 
> Hi Dave!
> 
> I had a brief look, which means I intend to have a deeper
> one too, but for now you will have to live with this.

Thanks.

> > 
> > Reviewed-by: David Gibson 
> > 
> >> ---
> >>  slirp/sbuf.h  |   4 +-
> >>  slirp/slirp.c | 116 ++
> >>  2 files changed, 78 insertions(+), 42 deletions(-)
> >>
> >> diff --git a/slirp/sbuf.h b/slirp/sbuf.h
> >> index efcec39..a722ecb 100644
> >> --- a/slirp/sbuf.h
> >> +++ b/slirp/sbuf.h
> >> @@ -12,8 +12,8 @@
> >>  #define sbspace(sb) ((sb)->sb_datalen - (sb)->sb_cc)
> >>  
> >>  struct sbuf {
> >> -  u_int   sb_cc;  /* actual chars in buffer */
> >> -  u_int   sb_datalen; /* Length of data  */
> >> +  uint32_t sb_cc; /* actual chars in buffer */
> >> +  uint32_t sb_datalen;/* Length of data  */
> >>char*sb_wptr;   /* write pointer. points to where the next
> >> * bytes should be written in the sbuf */
> >>char*sb_rptr;   /* read pointer. points to where the next
> >> diff --git a/slirp/slirp.c b/slirp/slirp.c
> >> index 6276315..2f7802e 100644
> >> --- a/slirp/slirp.c
> >> +++ b/slirp/slirp.c
> >> @@ -1185,19 +1185,72 @@ static const VMStateDescription vmstate_slirp_tcp = {
> >>  }
> >>  };
> >>  
> >> -static void slirp_sbuf_save(QEMUFile *f, struct sbuf *sbuf)
> >> +/* The sbuf has a pair of pointers that are migrated as offsets;
> >> + * we calculate the offsets and restore the pointers using
> >> + * pre_save/post_load on a tmp structure.
> >> + */
> >> +struct sbuf_tmp {
> >> +struct sbuf *parent;
> >> +uint32_t roff, woff;
> >> +};
> >> +
> >> +static void sbuf_tmp_pre_save(void *opaque)
> >> +{
> >> +struct sbuf_tmp *tmp = opaque;
> >> +tmp->woff = tmp->parent->sb_wptr - tmp->parent->sb_data;
> >> +tmp->roff = tmp->parent->sb_rptr - tmp->parent->sb_data;
> >> +}
> >> +
> >> +static int sbuf_tmp_post_load(void *opaque, int version)
> >>  {
> 
> What makes me wonder about the properties of this approach
> is that each time we use a parent pointer to read, we have
> a data dependency. This seems to me much more complicated
> than the current massaging-function approach, where we say
> "OK, now everything below me is there; now let us transform".
> Of course, the proposed approach is more powerful.

Yes it is, but we have to apply a transform to the data
so that means we somehow need to get to both a temporary
piece of storage and the parent data.
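The data flow Dave describes (the temporary holds the wire offsets, the parent holds the real pointers) can be sketched outside QEMU. This is a simplified model, not the slirp code: `struct buf`, `struct buf_tmp` and the two hooks are hypothetical, and calloc() stands in for sbreserve(). As Halil notes, pre_save reads from the parent and post_load writes back into it after the parent's own fields (here datalen) are already loaded.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct sbuf. */
struct buf {
    uint32_t datalen;
    char *data, *rptr, *wptr;
};

/* Hypothetical stand-in for struct sbuf_tmp; parent must come first. */
struct buf_tmp {
    struct buf *parent;
    uint32_t roff, woff;
};

/* Save side: turn pointers into offsets from the non-migrated base. */
static void buf_tmp_pre_save(struct buf_tmp *tmp)
{
    assert(tmp->parent && tmp->parent->data);
    tmp->woff = (uint32_t)(tmp->parent->wptr - tmp->parent->data);
    tmp->roff = (uint32_t)(tmp->parent->rptr - tmp->parent->data);
}

/* Load side: validate offsets, allocate the buffer, rebuild the pointers. */
static int buf_tmp_post_load(struct buf_tmp *tmp)
{
    struct buf *b = tmp->parent;

    if (tmp->woff >= b->datalen || tmp->roff >= b->datalen)
        return -EINVAL;
    b->data = calloc(b->datalen, 1);   /* sbreserve() plays this role in slirp */
    if (!b->data)
        return -ENOMEM;
    b->wptr = b->data + tmp->woff;
    b->rptr = b->data + tmp->roff;
    return 0;
}
```

Unlike the patch, this sketch checks the offsets before allocating, so a bad stream leaves the parent untouched.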

> >> -uint32_t off;
> >> -
> >> -qemu_put_be32(f, sbuf->sb_cc);
> >> -qemu_put_be32(f, sbuf->sb_datalen);
> >> -off = (uint32_t)(sbuf->sb_wptr - sbuf->sb_data);
> >> -qemu_put_sbe32(f, off);
> >> -off = (uint32_t)(sbuf->sb_rptr - sbuf->sb_data);
> >> -qemu_put_sbe32(f, off);
> >> -qemu_put_buffer(f, (unsigned char*)sbuf->sb_data, sbuf->sb_datalen);
> >> +struct sbuf_tmp *tmp = opaque;
> >> +uint32_t requested_len = tmp->parent->sb_datalen;
> 
> Ok, data parent->sb_datalen was previously loaded at #1
> 
> >> +
> >> +/* Allocate the buffer space used by the field after the tmp */
> >> +sbreserve(tmp->parent, tmp->parent->sb_datalen);
> #2 
> >> +
> >> +if (tmp->parent->sb_datalen != requested_len) {
> >> +return -ENOMEM;
> >> +}
> >> +if (tmp->woff >= requested_len ||
> >> +tmp->roff >= requested_len) {
> >> +error_report("invalid sbuf offsets r/w=%u/%u len=%u",
> >> + tmp->roff, tmp->woff, requested_len);
> >> +return -EINVAL;
> >> +}
> >> +
> >> +tmp->parent->sb_wptr = tmp->parent->sb_data + tmp->woff;
> >> +tmp->parent->sb_rptr = tmp->parent->sb_data + tmp->roff;
> 
> Ok, parent->sb_data is assigned and the backing memory allocated
> at #2
> 
> >> +
> >> +return 0;
> >>  }
> >>  
> >> +
> >> +static const VMStateDescription vmstate_slirp_sbuf_tmp = {
> >> +.name = "slirp-sbuf-tmp",
> >> +.post_load = sbuf_tmp_post_load,
> >> +.pre_save  = sbuf_tmp_pre_save,
> >> +.version_id = 0,
> >> +.fields = (VMStateField[]) {
> >> +VMSTATE_UINT32(woff, struct sbuf_tmp),
> >> +VMSTATE_UINT32(roff, struct sbuf_tmp),
> >> +VMSTATE_END_OF_LIST()
> >> +}
> >> +};
> >> +
> >> +static const VMStateDescription vmstate_slirp_sbuf = {
> >> +.name = "slirp-sbuf",
> >> +.version_id = 0,
> >> +.fields = (VMStateField[]) {
> >> +VMSTATE_UINT32(sb_cc, struct sbuf),
>

Re: [Qemu-devel] [Qemu-block] block/nfs: Fine grained runtime options in nfs

2016-10-17 Thread Ashijeet Acharya

On Tue, Oct 18, 2016 at 12:59 AM, Eric Blake  wrote:
> On 10/17/2016 01:00 PM, Ashijeet Acharya wrote:
>
>> One more relatively easy question though, will we include @port as an
>> option in runtime_opts while converting NFS to use several
>> runtime_opts? The reason I ask this because the uri syntax for NFS in
>> QEMU looks like this:
>>
>>nfs:[?param=value[=value2[&...]]]
>
> It's actually nfs://[:port]/...
>
> so the URI syntax already supports port.

But the commit message which added support for NFS had the uri which I
mentioned above and the code for NFS does not make use of 'port'
anywhere either, which is why I am a bit confused.
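Eric's point is that the port, when present, is part of the URI's authority component (`nfs://host[:port]/...`), so it can be split out before any per-option conversion. The helper below is a hypothetical sketch of that split, not how block/nfs.c parses its URI; it assumes a plain hostname (no bracketed IPv6 literal) and reports a port of 0 when none is given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split "nfs://host[:port]/path" into its parts.
 * Returns false on malformed input; *port is 0 when no port is given. */
static bool nfs_split_uri(const char *uri, char *host, size_t hostlen,
                          unsigned *port, const char **path)
{
    static const char prefix[] = "nfs://";
    size_t plen = sizeof(prefix) - 1;

    assert(uri && host && port && path);
    if (strncmp(uri, prefix, plen) != 0)
        return false;

    const char *auth = uri + plen;
    const char *slash = strchr(auth, '/');       /* path must be present */
    if (!slash)
        return false;

    const char *colon = memchr(auth, ':', (size_t)(slash - auth));
    const char *host_end = colon ? colon : slash;
    size_t n = (size_t)(host_end - auth);
    if (n == 0 || n >= hostlen)
        return false;
    memcpy(host, auth, n);
    host[n] = '\0';

    *port = 0;
    if (colon) {
        char *end;
        unsigned long p = strtoul(colon + 1, &end, 10);
        if (end == colon + 1 || end != slash || p > 65535)
            return false;
        *port = (unsigned)p;
    }
    *path = slash;
    return true;
}
```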

Ashijeet
>
> --
> Eric Blake   eblake redhat com+1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>

Re: [Qemu-devel] [very-WIP 1/7] migration: Add VMSTATE_WITH_TMP

2016-10-17 Thread Jianjun Duan



On 10/17/2016 11:52 AM, Dr. David Alan Gilbert wrote:
> * Jianjun Duan (du...@linux.vnet.ibm.com) wrote:
>>
>>
>> On 10/16/2016 08:31 PM, David Gibson wrote:
>>> On Tue, Oct 11, 2016 at 06:18:30PM +0100, Dr. David Alan Gilbert (git) wrote:
 From: "Dr. David Alan Gilbert" 

 VMSTATE_WITH_TMP is for handling structures where some calculation
 or rearrangement of the data needs to be performed before the data
 hits the wire.
 For example,  where the value on the wire is an offset from a
 non-migrated base, but the data in the structure is the actual pointer.

 To use it, a temporary type is created and a vmsd used on that type.
 The first element of the type must be 'parent', a pointer back to the
 type of the main structure.  VMSTATE_WITH_TMP takes care of allocating
 and freeing the temporary before running the child vmsd.

 The post_load/pre_save on the child vmsd can copy things from the parent
 to the temporary using the parent pointer and do any other calculations
 needed; it can then use normal VMSD entries to do the actual data
 storage without having to fiddle around with qemu_get_*/qemu_put_*
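The mechanism in the commit message can be modelled in a few lines of plain C: the generic code knows only the size of the temporary and that its first member is the parent pointer, so it can allocate it, fill in the parent, run the child and free it. This is a sketch, not QEMU code: `load_with_tmp`, `struct dev`, `struct dev_tmp` and `dev_tmp_load` are hypothetical, the "stream" is a single uint32_t argument, and a C11 `_Static_assert` stands in for the `QEMU_BUILD_BUG_EXPR` check in the macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Child "vmsd": loads its fields from the stream, then runs post_load. */
typedef int (*child_load_fn)(void *tmp, uint32_t wire_value);

/* Model of the load path: allocate the temporary, set its parent, run the child. */
static int load_with_tmp(void *parent, size_t tmp_size, child_load_fn child,
                         uint32_t wire_value)
{
    assert(tmp_size >= sizeof(void *));
    void *tmp = calloc(1, tmp_size);
    if (!tmp)
        return -1;
    *(void **)tmp = parent;          /* same trick as get_tmp(): parent at offset 0 */
    int ret = child(tmp, wire_value);
    free(tmp);
    return ret;
}

/* Example user: a pointer that is migrated as an offset from a base array. */
struct dev { char base[8]; char *cur; };
struct dev_tmp { struct dev *parent; uint32_t off; };

_Static_assert(offsetof(struct dev_tmp, parent) == 0, "parent must be first");

static int dev_tmp_load(void *opaque, uint32_t wire_value)
{
    struct dev_tmp *tmp = opaque;

    tmp->off = wire_value;                       /* the VMSTATE_UINT32 field */
    if (tmp->off >= sizeof(tmp->parent->base))   /* post_load: validate... */
        return -1;
    tmp->parent->cur = tmp->parent->base + tmp->off;  /* ...and rebuild */
    return 0;
}
```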

>> If customized put/get can do transformation and dumping/loading data
>> to/from the parent structure, you don't have to go through
>> pre_save/post_load, and may get rid of parent pointer.
> 
> Yes but I'd rather try and get rid of the customized put/get from
> every device, because then people start using qemu_put/qemu_get in them all.
> 
Then the customized handling needs to happen in pre_save/post_load. I think
you need a way to pass the TMP pointer around?

Thanks,
Jianjun
> Dave
> 
>>
>> Thanks,
>> Jianjun
>>
 Signed-off-by: Dr. David Alan Gilbert 
>>>
>>> The requirement for the parent pointer is a little clunky, but I don't
>>> quickly see a better way, and it is compile-time verified.  As noted
>>> elsewhere I think this is a really useful approach which could allow a
>>> bunch of internal state cleanups while preserving migration.
>>>
>>> Reviewed-by: David Gibson 
>>>
 ---
  include/migration/vmstate.h | 20 
  migration/vmstate.c | 38 ++
  2 files changed, 58 insertions(+)

 diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
 index 9500da1..efb0e90 100644
 --- a/include/migration/vmstate.h
 +++ b/include/migration/vmstate.h
 @@ -259,6 +259,7 @@ extern const VMStateInfo vmstate_info_cpudouble;
  extern const VMStateInfo vmstate_info_timer;
  extern const VMStateInfo vmstate_info_buffer;
  extern const VMStateInfo vmstate_info_unused_buffer;
 +extern const VMStateInfo vmstate_info_tmp;
  extern const VMStateInfo vmstate_info_bitmap;
  extern const VMStateInfo vmstate_info_qtailq;
  
 @@ -651,6 +652,25 @@ extern const VMStateInfo vmstate_info_qtailq;
  .offset = offsetof(_state, _field),  \
  }
  
 +/* Allocate a temporary of type 'tmp_type', set tmp->parent to _state
 + * and execute the vmsd on the temporary.  Note that we're working with
 + * the whole of _state here, not a field within it.
 + * We compile time check that:
 + *That _tmp_type contains a 'parent' member that's a pointer to the
 + *'_state' type
 + *That the pointer is right at the start of _tmp_type.
 + */
 +#define VMSTATE_WITH_TMP(_state, _tmp_type, _vmsd) { \
 +.name = "tmp",   \
 +.size = sizeof(_tmp_type) +  \
 +QEMU_BUILD_BUG_EXPR(offsetof(_tmp_type, parent) != 0) + \
 +type_check_pointer(_state,   \
 +typeof_field(_tmp_type, parent)),\
 +.vmsd = &(_vmsd),\
 +.info = _info_tmp,   \
 +.flags= VMS_LINKED,  \
 +}
 +
  #define VMSTATE_UNUSED_BUFFER(_test, _version, _size) {  \
  .name = "unused",\
  .field_exists = (_test), \
 diff --git a/migration/vmstate.c b/migration/vmstate.c
 index 2157997..f2563c5 100644
 --- a/migration/vmstate.c
 +++ b/migration/vmstate.c
 @@ -925,6 +925,44 @@ const VMStateInfo vmstate_info_unused_buffer = {
  .put  = put_unused_buffer,
  };
  
 +/* vmstate_info_tmp, see VMSTATE_WITH_TMP, the idea is that we allocate
 + * a temporary buffer and the pre_load/pre_save methods in the child vmsd
 + * copy stuff from the parent into the child and do calculations to fill

Re: [Qemu-devel] [very-WIP 1/7] migration: Add VMSTATE_WITH_TMP

2016-10-17 Thread Jianjun Duan



On 10/16/2016 08:31 PM, David Gibson wrote:
> On Tue, Oct 11, 2016 at 06:18:30PM +0100, Dr. David Alan Gilbert (git) wrote:
>> From: "Dr. David Alan Gilbert" 
>>
>> VMSTATE_WITH_TMP is for handling structures where some calculation
>> or rearrangement of the data needs to be performed before the data
>> hits the wire.
>> For example,  where the value on the wire is an offset from a
>> non-migrated base, but the data in the structure is the actual pointer.
>>
>> To use it, a temporary type is created and a vmsd used on that type.
>> The first element of the type must be 'parent', a pointer back to the
>> type of the main structure.  VMSTATE_WITH_TMP takes care of allocating
>> and freeing the temporary before running the child vmsd.
>>
>> The post_load/pre_save on the child vmsd can copy things from the parent
>> to the temporary using the parent pointer and do any other calculations
>> needed; it can then use normal VMSD entries to do the actual data
>> storage without having to fiddle around with qemu_get_*/qemu_put_*
>>
If customized put/get can do transformation and dumping/loading data
to/from the parent structure, you don't have to go through
pre_save/post_load, and may get rid of parent pointer.
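Jianjun's alternative can be sketched as a custom put/get pair that converts between pointers and offsets while it reads or writes the stream, so no temporary or parent pointer is needed. The trade-off Dave raises in his reply is that every device then carries its own stream code again. Everything here is hypothetical: `put_be32`/`get_be32` stand in for qemu_put_be32()/qemu_get_be32(), and `struct ring` is a simplified stand-in for struct sbuf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical big-endian helpers standing in for qemu_put/get_be32(). */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

struct ring { char data[16]; char *rptr, *wptr; };

/* Custom put: write offsets straight from the pointers. */
static size_t ring_put(const struct ring *r, uint8_t *out)
{
    assert(r && out);
    put_be32(out, (uint32_t)(r->wptr - r->data));
    put_be32(out + 4, (uint32_t)(r->rptr - r->data));
    return 8;
}

/* Custom get: read offsets and rebuild the pointers in place. */
static int ring_get(struct ring *r, const uint8_t *in)
{
    uint32_t woff = get_be32(in), roff = get_be32(in + 4);

    if (woff >= sizeof(r->data) || roff >= sizeof(r->data))
        return -1;
    r->wptr = r->data + woff;
    r->rptr = r->data + roff;
    return 0;
}
```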

Thanks,
Jianjun

>> Signed-off-by: Dr. David Alan Gilbert 
> 
> The requirement for the parent pointer is a little clunky, but I don't
> quickly see a better way, and it is compile-time verified.  As noted
> elsewhere I think this is a really useful approach which could allow a
> bunch of internal state cleanups while preserving migration.
> 
> Reviewed-by: David Gibson 
> 
>> ---
>>  include/migration/vmstate.h | 20 
>>  migration/vmstate.c | 38 ++
>>  2 files changed, 58 insertions(+)
>>
>> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
>> index 9500da1..efb0e90 100644
>> --- a/include/migration/vmstate.h
>> +++ b/include/migration/vmstate.h
>> @@ -259,6 +259,7 @@ extern const VMStateInfo vmstate_info_cpudouble;
>>  extern const VMStateInfo vmstate_info_timer;
>>  extern const VMStateInfo vmstate_info_buffer;
>>  extern const VMStateInfo vmstate_info_unused_buffer;
>> +extern const VMStateInfo vmstate_info_tmp;
>>  extern const VMStateInfo vmstate_info_bitmap;
>>  extern const VMStateInfo vmstate_info_qtailq;
>>  
>> @@ -651,6 +652,25 @@ extern const VMStateInfo vmstate_info_qtailq;
>>  .offset = offsetof(_state, _field),  \
>>  }
>>  
>> +/* Allocate a temporary of type 'tmp_type', set tmp->parent to _state
>> + * and execute the vmsd on the temporary.  Note that we're working with
>> + * the whole of _state here, not a field within it.
>> + * We compile time check that:
>> + *That _tmp_type contains a 'parent' member that's a pointer to the
>> + *'_state' type
>> + *That the pointer is right at the start of _tmp_type.
>> + */
>> +#define VMSTATE_WITH_TMP(_state, _tmp_type, _vmsd) { \
>> +.name = "tmp",   \
>> +.size = sizeof(_tmp_type) +  \
>> +QEMU_BUILD_BUG_EXPR(offsetof(_tmp_type, parent) != 0) + \
>> +type_check_pointer(_state,   \
>> +typeof_field(_tmp_type, parent)),\
>> +.vmsd = &(_vmsd),\
>> +.info = _info_tmp,   \
>> +.flags= VMS_LINKED,  \
>> +}
>> +
>>  #define VMSTATE_UNUSED_BUFFER(_test, _version, _size) {  \
>>  .name = "unused",\
>>  .field_exists = (_test), \
>> diff --git a/migration/vmstate.c b/migration/vmstate.c
>> index 2157997..f2563c5 100644
>> --- a/migration/vmstate.c
>> +++ b/migration/vmstate.c
>> @@ -925,6 +925,44 @@ const VMStateInfo vmstate_info_unused_buffer = {
>>  .put  = put_unused_buffer,
>>  };
>>  
>> +/* vmstate_info_tmp, see VMSTATE_WITH_TMP, the idea is that we allocate
>> + * a temporary buffer and the pre_load/pre_save methods in the child vmsd
>> + * copy stuff from the parent into the child and do calculations to fill
>> + * in fields that don't really exist in the parent but need to be in the
>> + * stream.
>> + */
>> +static int get_tmp(QEMUFile *f, void *pv, size_t size, VMStateField *field)
>> +{
>> +int ret;
>> +const VMStateDescription *vmsd = field->vmsd;
>> +int version_id = field->version_id;
>> +void *tmp = g_malloc(size);
>> +
>> +/* Writes the parent field which is at the start of the tmp */
>> +*(void **)tmp = pv;
>> +ret = vmstate_load_state(f, vmsd, tmp, version_id);
>> +g_free(tmp);
>> +return ret;
>> +}
>> +
>> +static void put_tmp(QEMUFile *f, void

Re: [Qemu-devel] [PATCH v10 00/10] Dirty bitmap changes for migration/persistence work

2016-10-17 Thread Max Reitz

On 13.10.2016 23:58, John Snow wrote:
> Key:
> [] : patches are identical
> [] : number of functional differences between upstream/downstream patch
> [down] : patch is downstream-only
> The flags [FC] indicate (F)unctional and (C)ontextual differences, 
> respectively
> 
> 001/10:[] [--] 'block: Hide HBitmap in block dirty bitmap interface'
> 002/10:[] [--] 'HBitmap: Introduce "meta" bitmap to track bit changes'
> 003/10:[] [--] 'tests: Add test code for meta bitmap'
> 004/10:[] [--] 'block: Support meta dirty bitmap'
> 005/10:[] [--] 'block: Add two dirty bitmap getters'
> 006/10:[] [--] 'block: Assert that bdrv_release_dirty_bitmap succeeded'
> 007/10:[] [--] 'hbitmap: serialization'
> 008/10:[] [--] 'block: BdrvDirtyBitmap serialization interface'
> 009/10:[0005] [FC] 'tests: Add test code for hbitmap serialization'
> 010/10:[] [--] 'block: More operations for meta dirty bitmap'
> 
> ===
> v10: Now with less bits

Thanks, I've applied the series to my block branch:

https://github.com/XanClic/qemu/commits/block

Max




Re: [Qemu-devel] [PATCH v3 1/3] exec: split cpu_exec_init()

2016-10-17 Thread Eduardo Habkost

On Sat, Oct 15, 2016 at 12:52:47AM +0200, Laurent Vivier wrote:
> Put in cpu_exec_initfn() what initializes the CPU,
> and leave in cpu_exec_init() what adds it to the environment.
> 
> As cpu_exec_initfn() is called by all XX_cpu_initfn(), call it
> directly in cpu_common_initfn().
> cpu_exec_init() is now a realize function, it will be renamed
> to cpu_exec_realizefn() and moved to the XX_cpu_realizefn()
> function in a following patch.
> 
> Signed-off-by: Laurent Vivier 

Confirmed that:

* cpu->num_ases and cpu->as are never changed by architecture
  code before calling cpu_exec_init()
* cpu_exec_exit() is called on cpu_common_finalize()
* The cpu->memory reference will be dropped automatically
  because the property is registered using
  OBJ_PROP_LINK_UNREF_ON_RELEASE

BTW, the cpu->as and cpu->num_ases lines are redundant, because
QOM objects are guaranteed to be zeroed when allocated.

Reviewed-by: Eduardo Habkost 

-- 
Eduardo

Re: [Qemu-devel] qemu master tests/vmstate prints "Failed to load simple/primitive:b_1" etc

2016-10-17 Thread Dr. David Alan Gilbert

* Peter Maydell (peter.mayd...@linaro.org) wrote:
> I've just noticed that qemu master running 'make check' prints
>   GTESTER tests/test-vmstate
> Failed to load simple/primitive:b_1
> Failed to load simple/primitive:i64_2
> Failed to load simple/primitive:i32_1
> Failed to load simple/primitive:i32_1
> 
> but the test doesn't fail.
> 
> Can we either (a) silence this output if it's spurious or (b) have
> it cause the test to fail if it's real (and fix the cause of the
> failure ;-)), please?

The test has always loaded truncated versions of the migration
stream and checked that it receives an error from vmstate_load_state.

However, I just added an error report so that we can see which field fails
to load in a migration, where previously we just got 'migration has failed with -22'.

Is there a way to silence error_report() output that's already in use in tests?

Dave

> thanks
> -- PMM
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

[Qemu-devel] [PULL 23/25] target-arm: Add trace events for the generic timers

2016-10-17 Thread Peter Maydell

Add some useful trace events for the ARM generic timers (notably
the various register writes and the resulting IRQ line state).
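The IRQ line state these trace points log follows from the timer control bits used in gt_recalc_timer() and gt_ctl_write(): bit 0 is ENABLE, bit 1 is IMASK and bit 2 is ISTATUS, and a TVAL write sets CVAL to the (offset-adjusted) count plus the sign-extended 32-bit TVAL. The helpers below restate that arithmetic as a sketch; the function and macro names are hypothetical and not part of target-arm/helper.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GT_CTL_ENABLE  (1u << 0)
#define GT_CTL_IMASK   (1u << 1)
#define GT_CTL_ISTATUS (1u << 2)

/* Unsigned 64-bit comparison, as in gt_recalc_timer(). */
static bool gt_istatus(uint64_t count, uint64_t offset, uint64_t cval)
{
    return count - offset >= cval;
}

/* Output line: low when disabled, otherwise ISTATUS gated by IMASK. */
static int gt_irqstate(uint32_t ctl, bool istatus)
{
    if (!(ctl & GT_CTL_ENABLE))
        return 0;
    return istatus && !(ctl & GT_CTL_IMASK);
}

/* TVAL write: cval = count - offset + sign-extended 32-bit tval. */
static uint64_t gt_tval_to_cval(uint64_t count, uint64_t offset, uint32_t tval)
{
    return count - offset + (uint64_t)(int64_t)(int32_t)tval;
}
```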

Signed-off-by: Peter Maydell 
Reviewed-by: Edgar E. Iglesias 
Message-id: 1476294876-12340-3-git-send-email-peter.mayd...@linaro.org
---
 Makefile.objs   |  1 +
 target-arm/helper.c | 20 
 target-arm/trace-events | 10 ++
 3 files changed, 27 insertions(+), 4 deletions(-)
 create mode 100644 target-arm/trace-events

diff --git a/Makefile.objs b/Makefile.objs
index 02fb8e7..69fdd48 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -155,6 +155,7 @@ trace-events-y += hw/alpha/trace-events
 trace-events-y += ui/trace-events
 trace-events-y += audio/trace-events
 trace-events-y += net/trace-events
+trace-events-y += target-arm/trace-events
 trace-events-y += target-i386/trace-events
 trace-events-y += target-sparc/trace-events
 trace-events-y += target-s390x/trace-events
diff --git a/target-arm/helper.c b/target-arm/helper.c
index a65f4f2..cb83ee2 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -1,4 +1,5 @@
 #include "qemu/osdep.h"
+#include "trace.h"
 #include "cpu.h"
 #include "internals.h"
 #include "exec/gdbstub.h"
@@ -1560,10 +1561,13 @@ static void gt_recalc_timer(ARMCPU *cpu, int timeridx)
 /* Note that this must be unsigned 64 bit arithmetic: */
 int istatus = count - offset >= gt->cval;
 uint64_t nexttick;
+int irqstate;
 
 gt->ctl = deposit32(gt->ctl, 2, 1, istatus);
-qemu_set_irq(cpu->gt_timer_outputs[timeridx],
- (istatus && !(gt->ctl & 2)));
+
+irqstate = (istatus && !(gt->ctl & 2));
+qemu_set_irq(cpu->gt_timer_outputs[timeridx], irqstate);
+
 if (istatus) {
 /* Next transition is when count rolls back over to zero */
 nexttick = UINT64_MAX;
@@ -1580,11 +1584,13 @@ static void gt_recalc_timer(ARMCPU *cpu, int timeridx)
 nexttick = INT64_MAX / GTIMER_SCALE;
 }
 timer_mod(cpu->gt_timer[timeridx], nexttick);
+trace_arm_gt_recalc(timeridx, irqstate, nexttick);
 } else {
 /* Timer disabled: ISTATUS and timer output always clear */
 gt->ctl &= ~4;
 qemu_set_irq(cpu->gt_timer_outputs[timeridx], 0);
 timer_del(cpu->gt_timer[timeridx]);
+trace_arm_gt_recalc_disabled(timeridx);
 }
 }
 
@@ -1610,6 +1616,7 @@ static void gt_cval_write(CPUARMState *env, const ARMCPRegInfo *ri,
   int timeridx,
   uint64_t value)
 {
+trace_arm_gt_cval_write(timeridx, value);
 env->cp15.c14_timer[timeridx].cval = value;
 gt_recalc_timer(arm_env_get_cpu(env), timeridx);
 }
@@ -1629,6 +1636,7 @@ static void gt_tval_write(CPUARMState *env, const ARMCPRegInfo *ri,
 {
 uint64_t offset = timeridx == GTIMER_VIRT ? env->cp15.cntvoff_el2 : 0;
 
+trace_arm_gt_tval_write(timeridx, value);
 env->cp15.c14_timer[timeridx].cval = gt_get_countervalue(env) - offset +
  sextract64(value, 0, 32);
 gt_recalc_timer(arm_env_get_cpu(env), timeridx);
@@ -1641,6 +1649,7 @@ static void gt_ctl_write(CPUARMState *env, const ARMCPRegInfo *ri,
 ARMCPU *cpu = arm_env_get_cpu(env);
 uint32_t oldval = env->cp15.c14_timer[timeridx].ctl;
 
+trace_arm_gt_ctl_write(timeridx, value);
 env->cp15.c14_timer[timeridx].ctl = deposit64(oldval, 0, 2, value);
 if ((oldval ^ value) & 1) {
 /* Enable toggled */
@@ -1649,8 +1658,10 @@ static void gt_ctl_write(CPUARMState *env, const ARMCPRegInfo *ri,
 /* IMASK toggled: don't need to recalculate,
  * just set the interrupt line based on ISTATUS
  */
-qemu_set_irq(cpu->gt_timer_outputs[timeridx],
- (oldval & 4) && !(value & 2));
+int irqstate = (oldval & 4) && !(value & 2);
+
+trace_arm_gt_imask_toggle(timeridx, irqstate);
+qemu_set_irq(cpu->gt_timer_outputs[timeridx], irqstate);
 }
 }
 
@@ -1715,6 +1726,7 @@ static void gt_cntvoff_write(CPUARMState *env, const ARMCPRegInfo *ri,
 {
 ARMCPU *cpu = arm_env_get_cpu(env);
 
+trace_arm_gt_cntvoff_write(value);
 raw_write(env, ri, value);
 gt_recalc_timer(cpu, GTIMER_VIRT);
 }
diff --git a/target-arm/trace-events b/target-arm/trace-events
new file mode 100644
index 000..9f726bd
--- /dev/null
+++ b/target-arm/trace-events
@@ -0,0 +1,10 @@
+# See docs/tracing.txt for syntax documentation.
+
+# target-arm/helper.c
+arm_gt_recalc(int timer, int irqstate, uint64_t nexttick) "gt recalc: timer %d irqstate %d next tick %" PRIx64
+arm_gt_recalc_disabled(int timer) "gt recalc: timer %d irqstate 0 timer disabled"
+arm_gt_cval_write(int timer, uint64_t value) "gt_cval_write: timer %d value %" PRIx64
+arm_gt_tval_write(int timer, uint64_t value) "gt_tval_write: timer %d value %" PRIx64
+arm_gt_ctl_write(int timer, uint64_t

[Qemu-devel] [PULL 18/25] target-arm: Infrastucture changes to enable handling of tagged address loading into PC

2016-10-17 Thread Peter Maydell

From: Thomas Hanson 

When capturing the current CPU state for the TB, extract the TBI0 and TBI1
values from the correct TCR for the current EL and then add them to the TB
flags field.

Then, at the start of code generation for the block, copy the TBI fields
into the DisasContext structure.
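The capture-then-copy scheme above amounts to packing two single-bit values into the TB flags and unpacking them again when translation starts. The sketch reuses the shift, mask and accessor macros from the patch; `pack_tbi_flags`, `unpack_tbi_flags` and `struct tbi_ctx` are hypothetical, and the TBI values are passed in directly rather than read from a TCR via arm_regime_tbi0()/arm_regime_tbi1(). Bits 0 and 1 only carry this meaning when the AArch64 state flag is set.

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout taken from the patch (AArch64 state only). */
#define ARM_TBFLAG_TBI0_SHIFT 0
#define ARM_TBFLAG_TBI0_MASK  (0x1ull << ARM_TBFLAG_TBI0_SHIFT)
#define ARM_TBFLAG_TBI1_SHIFT 1
#define ARM_TBFLAG_TBI1_MASK  (0x1ull << ARM_TBFLAG_TBI1_SHIFT)
#define ARM_TBFLAG_TBI0(F) (((F) & ARM_TBFLAG_TBI0_MASK) >> ARM_TBFLAG_TBI0_SHIFT)
#define ARM_TBFLAG_TBI1(F) (((F) & ARM_TBFLAG_TBI1_MASK) >> ARM_TBFLAG_TBI1_SHIFT)

/* Hypothetical mirror of the two fields added to DisasContext. */
struct tbi_ctx { unsigned tbi0, tbi1; };

/* Capture side: OR the TBI bits into the TB flags. */
static uint32_t pack_tbi_flags(uint32_t flags, uint32_t tbi0, uint32_t tbi1)
{
    assert(tbi0 <= 1 && tbi1 <= 1);
    flags |= tbi0 << ARM_TBFLAG_TBI0_SHIFT;
    flags |= tbi1 << ARM_TBFLAG_TBI1_SHIFT;
    return flags;
}

/* Translation side: copy the TBI bits out of the flags. */
static struct tbi_ctx unpack_tbi_flags(uint32_t flags)
{
    struct tbi_ctx c = { (unsigned)ARM_TBFLAG_TBI0(flags),
                         (unsigned)ARM_TBFLAG_TBI1(flags) };
    return c;
}
```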

Signed-off-by: Thomas Hanson 
Message-id: 1476301853-15774-2-git-send-email-thomas.han...@linaro.org
[PMM: drop useless 'extern' keyword on function prototypes;
 provide CONFIG_USER_ONLY trivial versions of arm_regime_tbi[01]()]
Reviewed-by: Peter Maydell 
Signed-off-by: Peter Maydell 
---
 target-arm/cpu.h   | 52 --
 target-arm/helper.c| 46 
 target-arm/translate-a64.c |  2 ++
 target-arm/translate.h |  2 ++
 4 files changed, 100 insertions(+), 2 deletions(-)

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 76d824d..2218c00 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -2191,7 +2191,11 @@ static inline bool arm_cpu_data_is_big_endian(CPUARMState *env)
 #define ARM_TBFLAG_BE_DATA_SHIFT20
 #define ARM_TBFLAG_BE_DATA_MASK (1 << ARM_TBFLAG_BE_DATA_SHIFT)
 
-/* Bit usage when in AArch64 state: currently we have no A64 specific bits */
+/* Bit usage when in AArch64 state */
+#define ARM_TBFLAG_TBI0_SHIFT 0/* TBI0 for EL0/1 or TBI for EL2/3 */
+#define ARM_TBFLAG_TBI0_MASK (0x1ull << ARM_TBFLAG_TBI0_SHIFT)
+#define ARM_TBFLAG_TBI1_SHIFT 1/* TBI1 for EL0/1  */
+#define ARM_TBFLAG_TBI1_MASK (0x1ull << ARM_TBFLAG_TBI1_SHIFT)
 
 /* some convenience accessor macros */
 #define ARM_TBFLAG_AARCH64_STATE(F) \
@@ -,6 +2226,10 @@ static inline bool arm_cpu_data_is_big_endian(CPUARMState *env)
 (((F) & ARM_TBFLAG_NS_MASK) >> ARM_TBFLAG_NS_SHIFT)
 #define ARM_TBFLAG_BE_DATA(F) \
 (((F) & ARM_TBFLAG_BE_DATA_MASK) >> ARM_TBFLAG_BE_DATA_SHIFT)
+#define ARM_TBFLAG_TBI0(F) \
+(((F) & ARM_TBFLAG_TBI0_MASK) >> ARM_TBFLAG_TBI0_SHIFT)
+#define ARM_TBFLAG_TBI1(F) \
+(((F) & ARM_TBFLAG_TBI1_MASK) >> ARM_TBFLAG_TBI1_SHIFT)
 
 static inline bool bswap_code(bool sctlr_b)
 {
@@ -2319,12 +2327,51 @@ static inline bool arm_cpu_bswap_data(CPUARMState *env)
 }
 #endif
 
+#ifndef CONFIG_USER_ONLY
+/**
+ * arm_regime_tbi0:
+ * @env: CPUARMState
+ * @mmu_idx: MMU index indicating required translation regime
+ *
+ * Extracts the TBI0 value from the appropriate TCR for the current EL
+ *
+ * Returns: the TBI0 value.
+ */
+uint32_t arm_regime_tbi0(CPUARMState *env, ARMMMUIdx mmu_idx);
+
+/**
+ * arm_regime_tbi1:
+ * @env: CPUARMState
+ * @mmu_idx: MMU index indicating required translation regime
+ *
+ * Extracts the TBI1 value from the appropriate TCR for the current EL
+ *
+ * Returns: the TBI1 value.
+ */
+uint32_t arm_regime_tbi1(CPUARMState *env, ARMMMUIdx mmu_idx);
+#else
+/* We can't handle tagged addresses properly in user-only mode */
+static inline uint32_t arm_regime_tbi0(CPUARMState *env, ARMMMUIdx mmu_idx)
+{
+return 0;
+}
+
+static inline uint32_t arm_regime_tbi1(CPUARMState *env, ARMMMUIdx mmu_idx)
+{
+return 0;
+}
+#endif
+
 static inline void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
 target_ulong *cs_base, uint32_t *flags)
 {
+ARMMMUIdx mmu_idx = cpu_mmu_index(env, false);
 if (is_a64(env)) {
 *pc = env->pc;
 *flags = ARM_TBFLAG_AARCH64_STATE_MASK;
+/* Get control bits for tagged addresses */
+*flags |= (arm_regime_tbi0(env, mmu_idx) << ARM_TBFLAG_TBI0_SHIFT);
+*flags |= (arm_regime_tbi1(env, mmu_idx) << ARM_TBFLAG_TBI1_SHIFT);
 } else {
 *pc = env->regs[15];
 *flags = (env->thumb << ARM_TBFLAG_THUMB_SHIFT)
@@ -2343,7 +2390,8 @@ static inline void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
<< ARM_TBFLAG_XSCALE_CPAR_SHIFT);
 }
 
-*flags |= (cpu_mmu_index(env, false) << ARM_TBFLAG_MMUIDX_SHIFT);
+*flags |= (mmu_idx << ARM_TBFLAG_MMUIDX_SHIFT);
+
 /* The SS_ACTIVE and PSTATE_SS bits correspond to the state machine
  * states defined in the ARM ARM for software singlestep:
  *  SS_ACTIVE   PSTATE.SS   State
diff --git a/target-arm/helper.c b/target-arm/helper.c
index 25f612d..70e2742 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -6720,6 +6720,52 @@ static inline TCR *regime_tcr(CPUARMState *env, ARMMMUIdx mmu_idx)
 return >cp15.tcr_el[regime_el(env, mmu_idx)];
 }
 
+/* Returns TBI0 value for current regime el */
+uint32_t arm_regime_tbi0(CPUARMState *env, ARMMMUIdx mmu_idx)
+{
+TCR *tcr;
+uint32_t el;
+
+/* For EL0 and EL1, TBI is controlled by stage 1's TCR, so convert
+   * a stage 1+2 mmu index into the appropriate stage 1 mmu index.
+   */
+if (mmu_idx == ARMMMUIdx_S12NSE0 || mmu_idx == ARMMMUIdx_S12NSE1) {
+mmu_idx += ARMMMUIdx_S1NSE0;
+

Re: [Qemu-devel] [PATCH v3 0/3] Split cpu_exec_init() into an init and a realize part

2016-10-17 Thread Eduardo Habkost

On Mon, Oct 17, 2016 at 02:44:04PM +1100, David Gibson wrote:
> On Sat, Oct 15, 2016 at 12:52:46AM +0200, Laurent Vivier wrote:
> > Since commit 42ecaba ("target-i386: Call cpu_exec_init() on realize"),
> > commit 6dd0f83 ("target-ppc: Move cpu_exec_init() call to realize function"),
> > and commit c6644fc ("s390x/cpu: Get rid of side effects when creating a vcpu"),
> > cpu_exec_init() has been moved to the realize function for some architectures
> > to implement CPU hotplug. This allows any failures from cpu_exec_init() to be
> > handled appropriately.
> > 
> > This series tries to do the same work for all the other CPUs.
> > 
> > But as the ARM Virtual Machine ("virt") needs the "memory" property of the CPU
> > in the machine init function (the "memory" property is created in
> > cpu_exec_init(), which we want to move to the realize part), split
> > cpu_exec_init() in two parts: a realize part (cpu_exec_realizefn(), adding the
> > CPU in the environment) and an init part (cpu_exec_initfn(), initializing the
> > CPU, like adding the "memory" property). To mirror the realize part, add an
> > unrealize part, and remove the cpu_exec_exit() call from the finalize part.
> > 
> > This also allows us to remove all the "cannot_destroy_with_object_finalize_yet"
> > properties from the CPU device class.
> 
> This is looking good to me - the v3 re-org has made it quite a bit
> easier to follow.
> 
> Whose tree should this go via?

I can merge it through the machine tree, if others agree.

-- 
Eduardo
