date:20161116

Re: [Qemu-devel] [Qemu-trivial] [PATCH v2] qapi-schema: clarify 'colo' state for MigrationStatus

2016-11-16 Thread Laurent Vivier

On 17/11/2016 at 08:08, Hailiang Zhang wrote:
> Hi Laurent,
> 
> On 2016/11/15 16:44, Laurent Vivier wrote:
>> On 14/11/2016 at 14:54, Stefan Hajnoczi wrote:
>>> On Mon, Nov 14, 2016 at 10:36:45AM +0800, Hailiang Zhang wrote:
>>>> ping ?
>>>>
>>>> Anyone pick this up?
>>>
>>> The original patch that added these lines went through Amit Shah and
>>> David Gilbert.  I have CCed them.
>>
>> If it is needed, I can also send a pull request through my
>> trivial-patches branch.
>>
> 
> I think this patch can go through your trivial branch if Amit or Dave
> didn't pick it up.

Stefan,

can I send a pull request for this while we are in the hard freeze phase, or
should we wait for the end of the release?

Thanks,
Laurent

Re: [Qemu-devel] [PATCH 1/4] arm: Uniquely name imx25 I2C buses.

2016-11-16 Thread Cédric Le Goater

Hello,

On 11/17/2016 05:36 AM, Alastair D'Silva wrote:
> From: Alastair D'Silva 
> 
> The imx25 chip provides 3 i2c buses, but they have all been named
> "i2c", which makes it difficult to predict which bus a device will
> be connected to when specified on the command line.
> 
> This patch addresses the issue by naming the buses uniquely:
>   i2c.0 i2c.1 i2c.2
> 
> Signed-off-by: Alastair D'Silva 
> ---
>  hw/arm/imx25_pdk.c | 2 +-
>  hw/i2c/imx_i2c.c   | 6 +-
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/arm/imx25_pdk.c b/hw/arm/imx25_pdk.c
> index 025b608..1f7512c 100644
> --- a/hw/arm/imx25_pdk.c
> +++ b/hw/arm/imx25_pdk.c
> @@ -139,7 +139,7 @@ static void imx25_pdk_init(MachineState *machine)
>   * of simple qtest. See "make check" for details.
>   */
>  i2c_create_slave((I2CBus *)qdev_get_child_bus(DEVICE(&s->soc.i2c[0]),
> -  "i2c"),
> +  "i2c.0"),
>   "ds1338", 0x68);

or just :

i2c_create_slave(s->soc.i2c[0].bus, "ds1338", 0x68);

?

C.

>  }
>  }
> diff --git a/hw/i2c/imx_i2c.c b/hw/i2c/imx_i2c.c
> index 37e5a62..7be10fb 100644
> --- a/hw/i2c/imx_i2c.c
> +++ b/hw/i2c/imx_i2c.c
> @@ -305,12 +305,16 @@ static const VMStateDescription imx_i2c_vmstate = {
>  static void imx_i2c_realize(DeviceState *dev, Error **errp)
>  {
>  IMXI2CState *s = IMX_I2C(dev);
> +static int bus_count;
> +char name[16];
> +
> +snprintf(name, sizeof(name), "i2c.%d", bus_count++);
>  
>  memory_region_init_io(&s->iomem, OBJECT(s), &imx_i2c_ops, s, TYPE_IMX_I2C,
>                        IMX_I2C_MEM_SIZE);
>  sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->iomem);
>  sysbus_init_irq(SYS_BUS_DEVICE(dev), &s->irq);
> -s->bus = i2c_init_bus(DEVICE(dev), "i2c");
> +s->bus = i2c_init_bus(DEVICE(dev), name);
>  }
>  
>  static void imx_i2c_class_init(ObjectClass *klass, void *data)
>
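
The naming scheme in the patch above can be tried outside QEMU. The
following is only a sketch: next_bus_name() is a hypothetical helper, not a
QEMU function. It mirrors the static counter plus snprintf() approach from
the imx_i2c_realize() hunk.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of QEMU: writes "i2c.<n>" into buf,
 * where <n> is the number of names handed out so far. */
static const char *next_bus_name(char *buf, size_t len)
{
    static int bus_count;   /* shared by every caller, as in the patch */

    assert(buf != NULL && len > 0);
    snprintf(buf, len, "i2c.%d", bus_count++);
    return buf;
}
```

Because the counter is a single static variable, the number a bus receives
depends on the order in which the devices are realized.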

Re: [Qemu-devel] [Qemu-ppc] [PATCH] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV

2016-11-16 Thread Laurent Vivier



On 17/11/2016 04:18, David Gibson wrote:
> On Wed, Nov 16, 2016 at 09:39:31AM +0100, Thomas Huth wrote:
>> The ppc64 postcopy test does not work with KVM-PR, and it is also
>> causing annoying warning messages when run on an x86 host. So let's
>> use KVM here only if we know that we're running with KVM-HV (which
>> automatically also means that we're running on a ppc64 host), and
>> fall back to TCG otherwise.
>>
>> Signed-off-by: Thomas Huth 
> 
> Applied to ppc-for-2.8.
> 
> Longer term, I think we should default to tcg for all these tests - on
> x86 as well - then run KVM *as well* when available.  But in the short
> term we should fix make check for the 2.8 release.

I agree with that.

Laurent
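
The fallback described in the quoted commit message can be sketched as a
small decision helper. Everything here is an assumption for illustration:
pick_accel() and the probe path argument are hypothetical, and the sketch
only assumes that a KVM-HV host exposes some file system marker that can be
tested with access(). The "kvm:tcg" string is the usual QEMU spelling for
"try KVM, then fall back to TCG".

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns the accel= value a test could pass to QEMU.
 * probe_path is an assumed marker that exists only on a KVM-HV host. */
static const char *pick_accel(const char *probe_path)
{
    assert(probe_path != NULL);
    /* access() returns 0 when the path exists */
    return access(probe_path, F_OK) == 0 ? "kvm:tcg" : "tcg";
}
```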

Re: [Qemu-devel] [Qemu-trivial] [PATCH v2] qapi-schema: clarify 'colo' state for MigrationStatus

2016-11-16 Thread Hailiang Zhang


Hi Laurent,

On 2016/11/15 16:44, Laurent Vivier wrote:
> On 14/11/2016 at 14:54, Stefan Hajnoczi wrote:
>> On Mon, Nov 14, 2016 at 10:36:45AM +0800, Hailiang Zhang wrote:
>>> ping ?
>>>
>>> Anyone pick this up?
>>
>> The original patch that added these lines went through Amit Shah and
>> David Gilbert.  I have CCed them.
>
> If it is needed, I can also send a pull request through my
> trivial-patches branch.

I think this patch can go through your trivial branch if Amit or Dave
didn't pick it up.

Thanks,
Hailiang

> Laurent

Re: [Qemu-devel] [PATCH v14 09/22] vfio iommu type1: Add task structure to vfio_dma

2016-11-16 Thread Alexey Kardashevskiy

On 17/11/16 17:12, Alex Williamson wrote:
> On Thu, 17 Nov 2016 16:41:14 +1100
> Alexey Kardashevskiy  wrote:
> 
>> On 17/11/16 07:46, Kirti Wankhede wrote:
>>> Add task structure to vfio_dma structure. Task structure is used for:
>>> - During DMA_UNMAP, same task who mapped it or other task who shares same
>>> address space is allowed to unmap, otherwise unmap fails.
>>> QEMU maps few iova ranges initially, then fork threads and from the child
>>> thread calls DMA_UNMAP on previously mapped iova. Since child shares same
>>> address space, DMA_UNMAP is successful.
>>> - Avoid accessing struct mm while process is exiting by acquiring
>>> reference of task's mm during page accounting.
>>> - It is also used to get task mlock capability and rlimit for mlock.
>>>
>>> Signed-off-by: Kirti Wankhede 
>>> Signed-off-by: Neo Jia 
>>> Reviewed-by: Dong Jia Shi   
>>
>>
>> I keep whinging that @mm should be referenced, not @current but you keep
>> referencing @current even if you only need @mm and you are not telling why
>> - and I am wondering what I am missing here? Something else will be used
>> from @task later, besides just @mm?
> 
> Yes, we reference @current from vfio_dma_do_map() and this is stored
> on the struct vfio_dma.  A reference to current is held because the
> external page pinning in vfio_pin_page_external() needs to test the
> capabilities of the task for CAP_IPC_LOCK to know whether locked memory

Ah, that's it - capable(CAP_IPC_LOCK) is checking @current, missed that.


> limits are in effect for the task even when it's not @current (ie. an
> asynchronous call from the vendor driver regardless of what task is
> currently running).  There are also various get_task_mm() taken
> temporarily when we're working with the mm of that task.  Do you spot
> any issues with this behavior? Thanks,

No, now I am fine, thanks!


-- 
Alexey

Re: [Qemu-devel] [PATCH RFC 0/2] numa: allocate CPUs masks dynamically

2016-11-16 Thread Alexey Kardashevskiy

On 17/11/16 03:02, Igor Mammedov wrote:
> This series removes global MAX_CPUMASK_BITS constant
> so that it won't indirectly influence maximum CPUs count
> supported by different targets.
> 
> It replaces statically allocated bitmasks with dynamically
> allocated ones using '-smp maxcpus' value for setting
> bitmasks size.
> That would allocate just enough memory to handle all
> CPUs indexes that a QEMU instance would ever have.
> 
> CC: Alexey Kardashevskiy 
> CC: Greg Kurz 
> CC: David Gibson 
> CC: Eduardo Habkost 
> CC: Paolo Bonzini 
> 
> 
> Igor Mammedov (2):
>   add bitmap_free() wrapper
>   numa: make -numa parser dynamically allocate CPUs masks

Nice, with "ulimit -n 3072", guest kernel with CONFIG_NR_CPUS=2048,
"mc->max_cpus = 2048;" in  hw/ppc/spapr.c, and "-smp 2048,threads=8" in
QEMU cmdline, I get all 2048 CPUs in the guest.


Tested-by: Alexey Kardashevskiy 


> 
>  include/qemu/bitmap.h   |  5 +
>  include/sysemu/numa.h   |  2 +-
>  include/sysemu/sysemu.h |  7 ---
>  numa.c  | 19 ---
>  vl.c|  5 -
>  5 files changed, 18 insertions(+), 20 deletions(-)
> 


-- 
Alexey
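
The idea in the cover letter, sizing the CPU masks from the maxcpus value
instead of a compile-time constant, can be sketched in plain C. The helpers
below are hypothetical stand-ins and not QEMU's bitmap API.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in: allocate a zeroed mask with room for nbits bits. */
static unsigned long *cpumask_new(size_t nbits)
{
    size_t nwords = (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;

    return calloc(nwords ? nwords : 1, sizeof(unsigned long));
}

static void cpumask_set(unsigned long *map, size_t bit)
{
    assert(map != NULL);
    map[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
}

static int cpumask_test(const unsigned long *map, size_t bit)
{
    assert(map != NULL);
    return (int)((map[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1UL);
}
```

The caller owns the mask and releases it with free(); the bit index is not
range checked in this sketch.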

Re: [Qemu-devel] [PATCH v14 09/22] vfio iommu type1: Add task structure to vfio_dma

2016-11-16 Thread Kirti Wankhede



On 11/17/2016 11:11 AM, Alexey Kardashevskiy wrote:
> On 17/11/16 07:46, Kirti Wankhede wrote:
>> Add task structure to vfio_dma structure. Task structure is used for:
>> - During DMA_UNMAP, same task who mapped it or other task who shares same
>> address space is allowed to unmap, otherwise unmap fails.
>> QEMU maps few iova ranges initially, then fork threads and from the child
>> thread calls DMA_UNMAP on previously mapped iova. Since child shares same
>> address space, DMA_UNMAP is successful.
>> - Avoid accessing struct mm while process is exiting by acquiring
>> reference of task's mm during page accounting.
>> - It is also used to get task mlock capability and rlimit for mlock.
>>
>> Signed-off-by: Kirti Wankhede 
>> Signed-off-by: Neo Jia 
>> Reviewed-by: Dong Jia Shi 
> 
> 
> I keep whinging that @mm should be referenced, not @current but you keep
> referencing @current even if you only need @mm and you are not telling why
> - and I am wondering what I am missing here? Something else will be used
> from @task later, besides just @mm?
> 
> 

Hey Alexey,

I updated briefly in commit description. Let me try to explain it again
in detail.

It's true we need mm, but we also need task structure for 2 reasons:
- Avoid accessing struct mm while process is exiting by acquiring
 reference of task's mm during page accounting.
If you see vfio_lock_acct(), where reference to mm is taken from task
structure, get_task_mm(task), to make sure that mm of this task is still
valid and task not in exiting process. If process is exiting, mm would
be NULL and we don't have to do page accounting.
This patch is to re-org and prepare the code for next patch, 10/22.
vfio_pin_pages()/ vfio_unpin_pages() for mediated devices would get
called from vendor driver. Those could be initiated by other process,
but for pin/unpin, these APIs should use the mm of the task who mapped
it. So in these calls we should check that we get the valid reference of
mm, that we would get from task structure.

- It is also used to get task mlock capability and rlimit for mlock.
These are again used for page accounting and page accounting should be
done with reference to the task who mapped the iova range. We get these
from task structure.

Thanks,
Kirti
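
The first point above (take a reference on the task's mm, and skip
accounting when the task is exiting) can be modelled in user space. This is
a toy, single-threaded model with hypothetical toy_* names; it is not the
kernel's get_task_mm()/mmput() and has no locking.

```c
#include <assert.h>
#include <stddef.h>

struct toy_mm {
    int users;          /* reference count */
    long locked_vm;     /* pages accounted as locked */
};

struct toy_task {
    struct toy_mm *mm;  /* NULL once the task is exiting */
};

/* Take a reference on the task's mm, or return NULL if it is gone. */
static struct toy_mm *toy_get_task_mm(struct toy_task *t)
{
    if (!t->mm)
        return NULL;
    t->mm->users++;
    return t->mm;
}

static void toy_mmput(struct toy_mm *mm)
{
    mm->users--;
}

/* Returns 1 if pages were accounted, 0 if the task was exiting. */
static int toy_lock_acct(struct toy_task *t, long npage)
{
    struct toy_mm *mm = toy_get_task_mm(t);

    if (!mm)
        return 0;       /* process exiting: nothing to account */
    mm->locked_vm += npage;
    toy_mmput(mm);
    return 1;
}
```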

Re: [Qemu-devel] [PATCH v2 1/4] aio: add AioPollFn and io_poll() interface

2016-11-16 Thread Fam Zheng

On Wed, 11/16 17:46, Stefan Hajnoczi wrote:
> The new AioPollFn io_poll() argument to aio_set_fd_handler() and
> aio_set_event_handler() is used in the next patch.
> 
> Keep this code change separate due to the number of files it touches.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  aio-posix.c |  8 +---

As pointed out by patchew, aio-win32.c needs to the change too.

Fam

>  async.c |  5 +++--
>  block/curl.c|  8 
>  block/iscsi.c   |  3 ++-
>  block/linux-aio.c   |  4 ++--
>  block/nbd-client.c  |  8 
>  block/nfs.c |  7 ---
>  block/sheepdog.c| 26 +-
>  block/ssh.c |  4 ++--
>  block/win32-aio.c   |  4 ++--
>  hw/virtio/virtio.c  |  4 ++--
>  include/block/aio.h |  5 -
>  iohandler.c |  2 +-
>  nbd/server.c|  9 -
>  stubs/set-fd-handler.c  |  1 +
>  tests/test-aio.c|  4 ++--
>  util/event_notifier-posix.c |  2 +-
>  17 files changed, 56 insertions(+), 48 deletions(-)

Re: [Qemu-devel] [PATCH v14 09/22] vfio iommu type1: Add task structure to vfio_dma

2016-11-16 Thread Alex Williamson

On Thu, 17 Nov 2016 16:41:14 +1100
Alexey Kardashevskiy  wrote:

> On 17/11/16 07:46, Kirti Wankhede wrote:
> > Add task structure to vfio_dma structure. Task structure is used for:
> > - During DMA_UNMAP, same task who mapped it or other task who shares same
> > address space is allowed to unmap, otherwise unmap fails.
> > QEMU maps few iova ranges initially, then fork threads and from the child
> > thread calls DMA_UNMAP on previously mapped iova. Since child shares same
> > address space, DMA_UNMAP is successful.
> > - Avoid accessing struct mm while process is exiting by acquiring
> > reference of task's mm during page accounting.
> > - It is also used to get task mlock capability and rlimit for mlock.
> > 
> > Signed-off-by: Kirti Wankhede 
> > Signed-off-by: Neo Jia 
> > Reviewed-by: Dong Jia Shi   
> 
> 
> I keep whinging that @mm should be referenced, not @current but you keep
> referencing @current even if you only need @mm and you are not telling why
> - and I am wondering what I am missing here? Something else will be used
> from @task later, besides just @mm?

Yes, we reference @current from vfio_dma_do_map() and this is stored
on the struct vfio_dma.  A reference to current is held because the
external page pinning in vfio_pin_page_external() needs to test the
capabilities of the task for CAP_IPC_LOCK to know whether locked memory
limits are in effect for the task even when it's not @current (ie. an
asynchronous call from the vendor driver regardless of what task is
currently running).  There are also various get_task_mm() taken
temporarily when we're working with the mm of that task.  Do you spot
any issues with this behavior? Thanks,

Alex
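
The locked-memory test that motivates holding the task reference can be
written as a pure function. This is only a sketch of the condition used in
the patch's vfio_pin_pages_remote() hunk; pin_would_exceed() is a
hypothetical name, and all quantities are in pages.

```c
#include <assert.h>
#include <stdbool.h>

/* rsvd:      the pfn is reserved/invalid and is never accounted
 * lock_cap:  the mapping task has CAP_IPC_LOCK, so no limit applies
 * locked_vm: pages already accounted against the mapping task's mm
 * npage:     pages about to be pinned
 * limit:     the mapping task's RLIMIT_MEMLOCK, in pages */
static bool pin_would_exceed(bool rsvd, bool lock_cap,
                             unsigned long locked_vm,
                             unsigned long npage,
                             unsigned long limit)
{
    return !rsvd && !lock_cap && locked_vm + npage > limit;
}
```

In the patch, lock_cap and limit are derived from the task stored in the
vfio_dma, so the result does not depend on which task happens to be running
when a vendor driver asks for a pin.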

Re: [Qemu-devel] [PATCH v14 09/22] vfio iommu type1: Add task structure to vfio_dma

2016-11-16 Thread Alexey Kardashevskiy

On 17/11/16 07:46, Kirti Wankhede wrote:
> Add task structure to vfio_dma structure. Task structure is used for:
> - During DMA_UNMAP, same task who mapped it or other task who shares same
> address space is allowed to unmap, otherwise unmap fails.
> QEMU maps few iova ranges initially, then fork threads and from the child
> thread calls DMA_UNMAP on previously mapped iova. Since child shares same
> address space, DMA_UNMAP is successful.
> - Avoid accessing struct mm while process is exiting by acquiring
> reference of task's mm during page accounting.
> - It is also used to get task mlock capability and rlimit for mlock.
> 
> Signed-off-by: Kirti Wankhede 
> Signed-off-by: Neo Jia 
> Reviewed-by: Dong Jia Shi 


I keep whinging that @mm should be referenced, not @current but you keep
referencing @current even if you only need @mm and you are not telling why
- and I am wondering what I am missing here? Something else will be used
from @task later, besides just @mm?


> 
> Change-Id: I7600f1bea6b384fd589fa72421ccf031bcfd9ac5
> ---
>  drivers/vfio/vfio_iommu_type1.c | 137 +---
>  1 file changed, 86 insertions(+), 51 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index ffe2026f1341..a0a7484cec64 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -36,6 +36,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define DRIVER_VERSION  "0.2"
>  #define DRIVER_AUTHOR   "Alex Williamson "
> @@ -75,6 +76,7 @@ struct vfio_dma {
>   unsigned long   vaddr;  /* Process virtual addr */
>   size_t  size;   /* Map size (bytes) */
>   int prot;   /* IOMMU_READ/WRITE */
> + struct task_struct  *task;
>  };
>  
>  struct vfio_group {
> @@ -277,41 +279,47 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
>   * the iommu can only map chunks of consecutive pfns anyway, so get the
>   * first page and all consecutive pages with the same locking.
>   */
> -static long vfio_pin_pages_remote(unsigned long vaddr, long npage,
> -   int prot, unsigned long *pfn_base)
> +static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
> +   long npage, int prot, unsigned long *pfn_base)
>  {
> - unsigned long limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> - bool lock_cap = capable(CAP_IPC_LOCK);
> + unsigned long limit;
> + bool lock_cap = ns_capable(task_active_pid_ns(dma->task)->user_ns,
> +CAP_IPC_LOCK);
> + struct mm_struct *mm;
>   long ret, i;
>   bool rsvd;
>  
> - if (!current->mm)
> + mm = get_task_mm(dma->task);
> + if (!mm)
>   return -ENODEV;
>  
> - ret = vaddr_get_pfn(current->mm, vaddr, prot, pfn_base);
> + ret = vaddr_get_pfn(mm, vaddr, prot, pfn_base);
>   if (ret)
> - return ret;
> + goto pin_pg_remote_exit;
>  
>   rsvd = is_invalid_reserved_pfn(*pfn_base);
> + limit = task_rlimit(dma->task, RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>  
> - if (!rsvd && !lock_cap && current->mm->locked_vm + 1 > limit) {
> + if (!rsvd && !lock_cap && mm->locked_vm + 1 > limit) {
>   put_pfn(*pfn_base, prot);
>   pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n", __func__,
>   limit << PAGE_SHIFT);
> - return -ENOMEM;
> + ret = -ENOMEM;
> + goto pin_pg_remote_exit;
>   }
>  
>   if (unlikely(disable_hugepages)) {
>   if (!rsvd)
> - vfio_lock_acct(current, 1);
> - return 1;
> + vfio_lock_acct(dma->task, 1);
> + ret = 1;
> + goto pin_pg_remote_exit;
>   }
>  
>   /* Lock all the consecutive pages from pfn_base */
>   for (i = 1, vaddr += PAGE_SIZE; i < npage; i++, vaddr += PAGE_SIZE) {
>   unsigned long pfn = 0;
>  
> - ret = vaddr_get_pfn(current->mm, vaddr, prot, &pfn);
> + ret = vaddr_get_pfn(mm, vaddr, prot, &pfn);
>   if (ret)
>   break;
>  
> @@ -321,8 +329,7 @@ static long vfio_pin_pages_remote(unsigned long vaddr, long npage,
>   break;
>   }
>  
> - if (!rsvd && !lock_cap &&
> - current->mm->locked_vm + i + 1 > limit) {
> + if (!rsvd && !lock_cap && mm->locked_vm + i + 1 > limit) {
>   put_pfn(pfn, prot);
>   pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n",
>   __func__, limit << PAGE_SHIFT);
> @@ -331,13 +338,16 @@ static long vfio_pin_pages_remote(unsigned long vaddr, long npage,
>   }
>  
>   if (!rsvd)

Re: [Qemu-devel] [PATCH 1/3] virtio: introduce grab/release_ioeventfd to fix vhost

2016-11-16 Thread Alexey Kardashevskiy

On 17/11/16 05:05, Paolo Bonzini wrote:
> Following the recent refactoring of virtio notifiers [1], more specifically
> the patch ed08a2a0b ("virtio: use virtio_bus_set_host_notifier to
> start/stop ioeventfd") that uses virtio_bus_set_host_notifier [2]
> by default, core virtio code requires 'ioeventfd_started' to be set
> to true/false when the host notifiers are configured.
> 
> When vhost is stopped and started, however, there is a stop followed by
> another start. Since ioeventfd_started was never set to true, the 'stop'
> operation triggered by virtio_bus_set_host_notifier() will not result
> in a call to virtio_pci_ioeventfd_assign(assign=false). This leaves
> the memory regions with stale notifiers and results on the next start
> triggering the following assertion:
> 
>   kvm_mem_ioeventfd_add: error adding ioeventfd: File exists
>   Aborted
> 
> This patch reintroduces (hopefully in a cleaner way) the concept
> that was present with ioeventfd_disabled before the refactoring.
> When ioeventfd_grabbed>0, ioeventfd_started tracks whether ioeventfd
> should be enabled or not, but ioeventfd is actually not started at
> all until vhost releases the host notifiers.
> 
> [1] http://lists.nongnu.org/archive/html/qemu-devel/2016-10/msg07748.html
> [2] http://lists.nongnu.org/archive/html/qemu-devel/2016-10/msg07760.html
> 
> Reported-by: Felipe Franciosi 
> Reported-by: Christian Borntraeger 
> Reported-by: Alex Williamson 
> Fixes: ed08a2a0b ("virtio: use virtio_bus_set_host_notifier to start/stop ioeventfd")
> Signed-off-by: Paolo Bonzini 
> Message-Id: <2016192855.26350-1-pbonz...@redhat.com>
> Signed-off-by: Paolo Bonzini 


As mentioned in another thread, this fixed vhost on ppc64/pseries.


Tested-by: Alexey Kardashevskiy 


> ---
> v1->v2: more comments [Cornelia]
> 
>  hw/virtio/vhost.c  | 14 +-
>  hw/virtio/virtio-bus.c | 58 ++
>  hw/virtio/virtio.c | 16 
>  include/hw/virtio/virtio-bus.h | 14 ++
>  include/hw/virtio/virtio.h |  2 ++
>  5 files changed, 86 insertions(+), 18 deletions(-)
> 
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 30aee88..f7f7023 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -1214,17 +1214,17 @@ void vhost_dev_cleanup(struct vhost_dev *hdev)
>  int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
>  {
>  BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> -VirtioBusState *vbus = VIRTIO_BUS(qbus);
> -VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
>  int i, r, e;
>  
> -if (!k->ioeventfd_assign) {
> +/* We will pass the notifiers to the kernel, make sure that QEMU
> + * doesn't interfere.
> + */
> +r = virtio_device_grab_ioeventfd(vdev);
> +if (r < 0) {
>  error_report("binding does not support host notifiers");
> -r = -ENOSYS;
>  goto fail;
>  }
>  
> -virtio_device_stop_ioeventfd(vdev);
>  for (i = 0; i < hdev->nvqs; ++i) {
>  r = virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), hdev->vq_index + i,
>   true);
> @@ -1244,7 +1244,7 @@ fail_vq:
>  }
>  assert (e >= 0);
>  }
> -virtio_device_start_ioeventfd(vdev);
> +virtio_device_release_ioeventfd(vdev);
>  fail:
>  return r;
>  }
> @@ -1267,7 +1267,7 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
>  }
>  assert (r >= 0);
>  }
> -virtio_device_start_ioeventfd(vdev);
> +virtio_device_release_ioeventfd(vdev);
>  }
>  
>  /* Test and clear event pending status.
> diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
> index bf61f66..d6c0c72 100644
> --- a/hw/virtio/virtio-bus.c
> +++ b/hw/virtio/virtio-bus.c
> @@ -147,6 +147,39 @@ void virtio_bus_set_vdev_config(VirtioBusState *bus, uint8_t *config)
>  }
>  }
>  
> +/* On success, ioeventfd ownership belongs to the caller.  */
> +int virtio_bus_grab_ioeventfd(VirtioBusState *bus)
> +{
> +VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(bus);
> +
> +/* vhost can be used even if ioeventfd=off in the proxy device,
> + * so do not check k->ioeventfd_enabled.
> + */
> +if (!k->ioeventfd_assign) {
> +return -ENOSYS;
> +}
> +
> +if (bus->ioeventfd_grabbed == 0 && bus->ioeventfd_started) {
> +virtio_bus_stop_ioeventfd(bus);
> +/* Remember that we need to restart ioeventfd
> + * when ioeventfd_grabbed becomes zero.
> + */
> +bus->ioeventfd_started = true;
> +}
> +bus->ioeventfd_grabbed++;
> +return 0;
> +}
> +
> +void virtio_bus_release_ioeventfd(VirtioBusState *bus)
> +{
> +assert(bus->ioeventfd_grabbed != 0);
> +if (--bus->ioeventfd_grabbed == 0 &&

Re: [Qemu-devel] [PATCH v2] vhost: Update 'ioeventfd_started' with host notifiers

2016-11-16 Thread Alexey Kardashevskiy

On 16/11/16 19:38, Felipe Franciosi wrote:
> 
>> On 16 Nov 2016, at 04:05, Alexey Kardashevskiy  wrote:
>>
>> On 11/11/16 01:45, Christian Borntraeger wrote:
>>> On 11/09/2016 01:44 PM, Felipe Franciosi wrote:
>>>> Following the recent refactor of virtio notifiers [1], more specifically
>>>> the patch that uses virtio_bus_set_host_notifier [2] by default, core
>>>> virtio code requires 'ioeventfd_started' to be set to true/false when
>>>> the host notifiers are configured. Because not all vhost devices were
>>>> updated (eg. vhost-scsi) to use the new interface, this value is always
>>>> set to false.
>>>>
>>>> When booting a guest with a vhost-scsi backend controller, SeaBIOS will
>>>> initially configure the device which sets all notifiers. The guest will
>>>> continue to boot fine until the kernel virtio-scsi driver reinitialises
>>>> the device causing a stop followed by another start. Since
>>>> ioeventfd_started was never set to true, the 'stop' operation triggered
>>>> by virtio_bus_set_host_notifier() will not result in a call to
>>>> virtio_pci_ioeventfd_assign(assign=false). This leaves the memory
>>>> regions with stale notifiers and results on the next start triggering
>>>> the following assertion:
>>>>
>>>>   kvm_mem_ioeventfd_add: error adding ioeventfd: File exists
>>>>   Aborted
>>>>
>>>> This patch updates ioeventfd_started whenever the notifiers are set or
>>>> cleared, fixing this issue.
>>>>
>>>> Signed-off-by: Felipe Franciosi 
>>>
>>> This also fixes vhost-net after reboot on s390/kvm for me
>>
>>
>> It does not fix it (the original breakage from e616c2f "virtio: remove
>> ioeventfd_disabled altogether") for me:
> 
> Can you try Paolo's latest patches for this issue?
> http://lists.nongnu.org/archive/html/qemu-devel/2016-11/msg02834.html
> 
> Specifically this:
> http://lists.nongnu.org/archive/html/qemu-devel/2016-11/msg02837.html

This one does work, thanks!

-- 
Alexey
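
The grab/release scheme from Paolo's patch, which this thread ends up
relying on, can be modelled as a small counter. This is a toy model with
hypothetical toy_* names; in particular the restart in toy_release() is an
assumption, since it is only what the patch description implies.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_bus {
    int grabbed;      /* outstanding grabs, e.g. by vhost */
    bool started;     /* should ioeventfd run once nobody holds a grab? */
    bool hw_active;   /* is ioeventfd actually wired up right now? */
};

static void toy_grab(struct toy_bus *b)
{
    if (b->grabbed == 0 && b->started) {
        /* Stop the notifiers, but remember they must be restarted. */
        b->hw_active = false;
    }
    b->grabbed++;
}

static void toy_release(struct toy_bus *b)
{
    assert(b->grabbed != 0);
    if (--b->grabbed == 0 && b->started) {
        /* Assumed behaviour: the last release restarts ioeventfd. */
        b->hw_active = true;
    }
}
```

Nested grabs are balanced by the counter, so only the outermost pair stops
and restarts the notifiers.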

Re: [Qemu-devel] [PATCH v14 12/22] vfio: Add notifier callback to parent's ops structure of mdev

2016-11-16 Thread Kirti Wankhede



On 11/17/2016 7:45 AM, Dong Jia Shi wrote:
> * Kirti Wankhede  [2016-11-17 02:16:24 +0530]:
> 
> Hi Kirti,
> 
>> diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
> [...]
> 
>> @@ -51,6 +78,11 @@ static void vfio_mdev_release(void *device_data)
>>  if (likely(parent->ops->release))
>>  parent->ops->release(mdev);
>>
>> +if (likely(parent->ops->notifier)) {
>> +if (vfio_unregister_notifier(&mdev->dev, &mdev->nb))
>> +pr_err("Failed to unregister notifier for mdev\n");
> For the -ENOTTY case, we should not fail here either.
> 

Removing the error print and ignoring return from this unregister call.
Updating this patch on this thread.

>> +}
>> +
>>  module_put(THIS_MODULE);
>>  }
>>
> [...]
>

Re: [Qemu-devel] [PATCH v14 12/22] vfio: Add notifier callback to parent's ops structure of mdev

2016-11-16 Thread Kirti Wankhede

Add a notifier callback to parent's ops structure of mdev device so that per
device notifier for vfio module is registered through vfio_mdev module.

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: Iafa6f1721aecdd6e50eb93b153b5621e6d29b637
---
 drivers/vfio/mdev/vfio_mdev.c | 32 +++-
 include/linux/mdev.h  |  9 +
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
index ffc36758cb84..7ffdc317319d 100644
--- a/drivers/vfio/mdev/vfio_mdev.c
+++ b/drivers/vfio/mdev/vfio_mdev.c
@@ -24,6 +24,15 @@
 #define DRIVER_AUTHOR   "NVIDIA Corporation"
 #define DRIVER_DESC "VFIO based driver for Mediated device"
 
+static int vfio_mdev_notifier(struct notifier_block *nb, unsigned long action,
+ void *data)
+{
+   struct mdev_device *mdev = container_of(nb, struct mdev_device, nb);
+   struct parent_device *parent = mdev->parent;
+
+   return parent->ops->notifier(mdev, action, data);
+}
+
 static int vfio_mdev_open(void *device_data)
 {
struct mdev_device *mdev = device_data;
@@ -36,9 +45,27 @@ static int vfio_mdev_open(void *device_data)
if (!try_module_get(THIS_MODULE))
return -ENODEV;
 
+   if (likely(parent->ops->notifier)) {
+   mdev->nb.notifier_call = vfio_mdev_notifier;
+   ret = vfio_register_notifier(&mdev->dev, &mdev->nb);
+
+   /*
+* This should not fail if backend iommu module doesn't support
+* register_notifier.
+*/
+   if (ret && (ret != -ENOTTY)) {
+   pr_err("Failed to register notifier for mdev\n");
+   module_put(THIS_MODULE);
+   return ret;
+   }
+   }
+
ret = parent->ops->open(mdev);
-   if (ret)
+   if (ret) {
+   if (likely(parent->ops->notifier))
+   vfio_unregister_notifier(&mdev->dev, &mdev->nb);
module_put(THIS_MODULE);
+   }
 
return ret;
 }
@@ -51,6 +78,9 @@ static void vfio_mdev_release(void *device_data)
if (likely(parent->ops->release))
parent->ops->release(mdev);
 
+   if (likely(parent->ops->notifier))
+   vfio_unregister_notifier(&mdev->dev, &mdev->nb);
+
module_put(THIS_MODULE);
 }
 
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index ec819e9a115a..94c43034c297 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -37,6 +37,7 @@ struct mdev_device {
struct kref ref;
struct list_headnext;
struct kobject  *type_kobj;
+   struct notifier_block   nb;
 };
 
 /**
@@ -85,6 +86,12 @@ struct mdev_device {
  * @mmap:  mmap callback
  * @mdev: mediated device structure
  * @vma: vma structure
+ * @notifier:  Notifier callback, currently only for
+ * VFIO_IOMMU_NOTIFY_DMA_UNMAP action notified during
+ * DMA_UNMAP call on mapped iova range.
+ * @mdev: mediated device structure
+ * @action: Action for which notifier is called
+ * @data: Data associated with the notifier
  * Parent device that support mediated device should be registered with mdev
  * module with parent_ops structure.
  **/
@@ -106,6 +113,8 @@ struct parent_ops {
ssize_t (*ioctl)(struct mdev_device *mdev, unsigned int cmd,
 unsigned long arg);
int (*mmap)(struct mdev_device *mdev, struct vm_area_struct *vma);
+   int (*notifier)(struct mdev_device *mdev, unsigned long action,
+   void *data);
 };
 
 /* interface for exporting mdev supported type attributes */
-- 
2.7.0
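
The dispatch in vfio_mdev_notifier() above relies on the notifier block
being embedded in the device structure, so that the device can be recovered
from the block pointer. A user-space sketch of that pattern follows; the
toy_* types and the macro are hypothetical stand-ins, not the kernel
definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Recover the enclosing structure from a pointer to one of its members. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_nb {
    int (*call)(struct toy_nb *nb, unsigned long action, void *data);
};

struct toy_mdev {
    int id;
    struct toy_nb nb;           /* embedded, as in the patch */
    unsigned long last_action;
};

/* Callback registered in nb.call: finds the owning device and records
 * the action; returns the device id so the caller can check the lookup. */
static int toy_mdev_notifier(struct toy_nb *nb, unsigned long action,
                             void *data)
{
    struct toy_mdev *mdev = toy_container_of(nb, struct toy_mdev, nb);

    (void)data;
    mdev->last_action = action;
    return mdev->id;
}
```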

Re: [Qemu-devel] [PATCH v14 11/22] vfio iommu: Add blocking notifier to notify DMA_UNMAP

2016-11-16 Thread Kirti Wankhede

Added blocking notifier to IOMMU TYPE1 driver to notify vendor drivers
about DMA_UNMAP.
Exported two APIs vfio_register_notifier() and vfio_unregister_notifier().
Notifier should be registered, if external user wants to use
vfio_pin_pages()/vfio_unpin_pages() APIs to pin/unpin pages.
Vendor driver should use VFIO_IOMMU_NOTIFY_DMA_UNMAP action to invalidate
mappings.

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: I5910d0024d6be87f3e8d3e0ca0eaeaaa0b17f271
---
 drivers/vfio/vfio.c | 73 
 drivers/vfio/vfio_iommu_type1.c | 74 +
 include/linux/vfio.h| 12 +++
 3 files changed, 146 insertions(+), 13 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index bd36c16b0ef2..301eed07a0ab 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1901,6 +1901,79 @@ err_unpin_pages:
 }
 EXPORT_SYMBOL(vfio_unpin_pages);
 
+int vfio_register_notifier(struct device *dev, struct notifier_block *nb)
+{
+   struct vfio_container *container;
+   struct vfio_group *group;
+   struct vfio_iommu_driver *driver;
+   int ret;
+
+   if (!dev || !nb)
+   return -EINVAL;
+
+   group = vfio_group_get_from_dev(dev);
+   if (IS_ERR(group))
+   return PTR_ERR(group);
+
+   ret = vfio_group_add_container_user(group);
+   if (ret)
+   goto err_register_nb;
+
+   container = group->container;
+   down_read(&container->group_lock);
+
+   driver = container->iommu_driver;
+   if (likely(driver && driver->ops->register_notifier))
+   ret = driver->ops->register_notifier(container->iommu_data, nb);
+   else
+   ret = -ENOTTY;
+
+   up_read(&container->group_lock);
+   vfio_group_try_dissolve_container(group);
+
+err_register_nb:
+   vfio_group_put(group);
+   return ret;
+}
+EXPORT_SYMBOL(vfio_register_notifier);
+
+int vfio_unregister_notifier(struct device *dev, struct notifier_block *nb)
+{
+   struct vfio_container *container;
+   struct vfio_group *group;
+   struct vfio_iommu_driver *driver;
+   int ret;
+
+   if (!dev || !nb)
+   return -EINVAL;
+
+   group = vfio_group_get_from_dev(dev);
+   if (IS_ERR(group))
+   return PTR_ERR(group);
+
+   ret = vfio_group_add_container_user(group);
+   if (ret)
+   goto err_unregister_nb;
+
+   container = group->container;
+   down_read(&container->group_lock);
+
+   driver = container->iommu_driver;
+   if (likely(driver && driver->ops->unregister_notifier))
+   ret = driver->ops->unregister_notifier(container->iommu_data,
+  nb);
+   else
+   ret = -ENOTTY;
+
+   up_read(&container->group_lock);
+   vfio_group_try_dissolve_container(group);
+
+err_unregister_nb:
+   vfio_group_put(group);
+   return ret;
+}
+EXPORT_SYMBOL(vfio_unregister_notifier);
+
 /**
  * Module/class support
  */
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index db0901f8a149..511c517bae15 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define DRIVER_VERSION  "0.2"
 #define DRIVER_AUTHOR   "Alex Williamson "
@@ -60,6 +61,7 @@ struct vfio_iommu {
struct vfio_domain  *external_domain; /* domain for external user */
struct mutexlock;
struct rb_root  dma_list;
+   struct blocking_notifier_head notifier;
boolv2;
boolnesting;
 };
@@ -561,7 +563,8 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 
    mutex_lock(&iommu->lock);
 
-   if (!iommu->external_domain) {
+   /* Fail if notifier list is empty */
+   if ((!iommu->external_domain) || (!iommu->notifier.head)) {
ret = -EINVAL;
goto pin_done;
}
@@ -776,9 +779,9 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 struct vfio_iommu_type1_dma_unmap *unmap)
 {
uint64_t mask;
-   struct vfio_dma *dma;
+   struct vfio_dma *dma, *dma_last = NULL;
size_t unmapped = 0;
-   int ret = 0;
+   int ret = 0, retries;
 
mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
 
@@ -788,7 +791,7 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
return -EINVAL;
 
WARN_ON(mask & PAGE_MASK);
-
+again:
    mutex_lock(&iommu->lock);
 
/*
@@ -844,6 +847,32 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 */
if (dma->task->mm != current->mm)
break;
+
+   if (!RB_EMPTY_ROOT(&dma->pfn_list)) {
+   struct

Re: [Qemu-devel] [PATCH v14 10/22] vfio iommu type1: Add support for mediated devices

2016-11-16 Thread Kirti Wankhede

VFIO IOMMU drivers are designed for devices which are IOMMU capable.
A mediated device only uses the IOMMU APIs; the underlying hardware can be
managed by an IOMMU domain.

The aims of this change are:
- To use most of the code of TYPE1 IOMMU driver for mediated devices
- To support direct assigned device and mediated device in single module

This change adds pin and unpin support for mediated device to TYPE1 IOMMU
backend module. More details:
- Domain for external user is tracked separately in vfio_iommu structure.
  It is allocated when group for first mdev device is attached.
- Pages pinned for external domain are tracked in each vfio_dma structure
  for that iova range.
- Page tracking rb-tree in vfio_dma keeps <iova, pfn, ref_count>. Key of
  rb-tree is iova, but it actually aims to track pfns.
- On an external pin request for an iova, the page is pinned once; if the
  iova is already pinned and tracked, ref_count is incremented.
- An external unpin request unpins the page only when ref_count drops to 0.
- Pinned pages list is used to find pfn from iova and then unpin it.
  WARN_ON is added if there are entries in pfn_list while detaching the
  group and releasing the domain.
- Page accounting is updated to account in its address space where the
  pages are pinned/unpinned, i.e. dma->task
- Accounting for mdev device is only done if there is no iommu capable
  domain in the container. When there is a direct device assigned to the
  container and that domain is iommu capable, all pages are already pinned
  during DMA_MAP.
- Page accounting is updated on hot plug and unplug of mdev device and pass
  through device.
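
The pin/unpin bookkeeping described in the list above can be sketched in
user-space C. This is only an illustration under stated assumptions, not the
kernel code: it uses a small fixed array where the patch uses an rb-tree keyed
by iova, and struct pin_entry, find_pin(), track_pin() and track_unpin() are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PINS 16   /* illustrative capacity; the patch uses an rb-tree */

struct pin_entry {
    uint64_t iova;       /* device address, used as the lookup key */
    unsigned long pfn;   /* host pfn recorded at the first pin */
    int ref_count;       /* 0 means the slot is free */
};

static struct pin_entry *find_pin(struct pin_entry *tbl, uint64_t iova)
{
    for (size_t i = 0; i < MAX_PINS; i++) {
        if (tbl[i].ref_count > 0 && tbl[i].iova == iova)
            return &tbl[i];
    }
    return NULL;
}

/* Returns 1 if the page must be pinned now, 0 if only the count grew,
 * -1 if the table is full. */
static int track_pin(struct pin_entry *tbl, uint64_t iova, unsigned long pfn)
{
    struct pin_entry *e = find_pin(tbl, iova);

    if (e) {
        e->ref_count++;
        return 0;
    }
    for (size_t i = 0; i < MAX_PINS; i++) {
        if (tbl[i].ref_count == 0) {
            tbl[i] = (struct pin_entry){ .iova = iova, .pfn = pfn,
                                         .ref_count = 1 };
            return 1;
        }
    }
    return -1;
}

/* Returns 1 if the last reference was dropped (the page should be
 * unpinned), 0 if it is still referenced, -1 if the iova is not tracked. */
static int track_unpin(struct pin_entry *tbl, uint64_t iova)
{
    struct pin_entry *e = find_pin(tbl, iova);

    if (!e)
        return -1;
    assert(e->ref_count > 0);
    return --e->ref_count == 0;
}
```

Pinning the same iova twice and unpinning it twice reports a real pin only on
the first call and a real unpin only on the last one.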

Tested by assigning below combinations of devices to a single VM:
- GPU pass through only
- vGPU device only
- One GPU pass through and one vGPU device
- Linux VM hot plug and unplug vGPU device while GPU pass through device
  exists
- Linux VM hot plug and unplug GPU pass through device while vGPU device
  exists

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: I295d6f0f2e0579b8d9882bfd8fd5a4194b97bd9a
---
 drivers/vfio/vfio_iommu_type1.c | 621 ++--
 1 file changed, 537 insertions(+), 84 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index a0a7484cec64..db0901f8a149 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define DRIVER_VERSION  "0.2"
 #define DRIVER_AUTHOR   "Alex Williamson "
@@ -56,6 +57,7 @@ MODULE_PARM_DESC(disable_hugepages,
 
 struct vfio_iommu {
struct list_head    domain_list;
+   struct vfio_domain  *external_domain; /* domain for external user */
struct mutex    lock;
struct rb_root  dma_list;
bool    v2;
@@ -76,7 +78,9 @@ struct vfio_dma {
unsigned long   vaddr;  /* Process virtual addr */
size_t  size;   /* Map size (bytes) */
int prot;   /* IOMMU_READ/WRITE */
+   bool    iommu_mapped;
struct task_struct  *task;
+   struct rb_root  pfn_list;   /* Ex-user pinned pfn list */
 };
 
 struct vfio_group {
@@ -85,6 +89,21 @@ struct vfio_group {
 };
 
 /*
+ * Guest RAM pinning working set or DMA target
+ */
+struct vfio_pfn {
+   struct rb_node  node;
+   dma_addr_t  iova;   /* Device address */
+   unsigned long   pfn;/* Host pfn */
+   atomic_t    ref_count;
+};
+
+#define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)\
+   (!list_empty(&iommu->domain_list))
+
+static int put_pfn(unsigned long pfn, int prot);
+
+/*
  * This code handles mapping and unmapping of user data buffers
  * into DMA'ble space using the IOMMU
  */
@@ -132,6 +151,97 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, 
struct vfio_dma *old)
rb_erase(&old->node, &iommu->dma_list);
 }
 
+/*
+ * Helper Functions for host iova-pfn list
+ */
+static struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
+{
+   struct vfio_pfn *vpfn;
+   struct rb_node *node = dma->pfn_list.rb_node;
+
+   while (node) {
+   vpfn = rb_entry(node, struct vfio_pfn, node);
+
+   if (iova < vpfn->iova)
+   node = node->rb_left;
+   else if (iova > vpfn->iova)
+   node = node->rb_right;
+   else
+   return vpfn;
+   }
+   return NULL;
+}
+
+static void vfio_link_pfn(struct vfio_dma *dma,
+ struct vfio_pfn *new)
+{
+   struct rb_node **link, *parent = NULL;
+   struct vfio_pfn *vpfn;
+
+   link = &dma->pfn_list.rb_node;
+   while (*link) {
+   parent = *link;
+   vpfn = rb_entry(parent,

Re: [Qemu-devel] [PATCH v14 10/22] vfio iommu type1: Add support for mediated devices

2016-11-16 Thread Kirti Wankhede



On 11/17/2016 5:27 AM, Alex Williamson wrote:
> On Thu, 17 Nov 2016 02:16:22 +0530
> Kirti Wankhede  wrote:
>> @@ -931,6 +1344,24 @@ static void vfio_iommu_type1_detach_group(void 
>> *iommu_data,
>>  
>>  mutex_lock(&iommu->lock);
>>  
>> +if (iommu->external_domain) {
>> +group = find_iommu_group(iommu->external_domain, iommu_group);
>> +if (group) {
>> +list_del(&group->next);
>> +kfree(group);
>> +
>> +if (list_empty(&iommu->external_domain->group_list)) {
>> +vfio_sanity_check_pfn_list(iommu);
>> +
>> +if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu))
>> +vfio_iommu_unmap_unpin_all(iommu);
>> +
>> +kfree(iommu->external_domain);
> 
> I advised in one place that I didn't understand why  we were checking
> iommu->external_domain before walking the pfn_list, but we do have
> several checks still in place for if(iommu->external_domain), so I
> think we better be setting to NULL after we free it.
> 
> I haven't finished my review yet, but if this ends up being the only
> comment and you agree, I can add:
> 
> iommu->external_domain = NULL;
> 
> here on commit.  Thanks,
> 

Thanks Alex.

I'm updating this patch and 11/22 as per your comments here and on 11/22.

Thanks,
Kirti

[Qemu-devel] [PATCH 1/1] hw/net/spapr_llan: 6 byte mac address device tree entry

2016-11-16 Thread Sam Bobroff

The spapr-vlan device in QEMU has always presented its MAC address in
the device tree as an 8 byte value, even though PAPR requires it to be
6 bytes.  This is because, at the time, AIX required the value to be 8
bytes.  However, modern versions of AIX only support the (correct) 6
byte value so they are now failing to get this value correctly.

This patch removes the old workaround and presents the address as the
correct 6 byte value in the device tree.

However, the value is also consumed by the Linux ibmveth driver.

Since commit 13f85203e (3.10, May 2013) the driver has been able to
handle 6 or 8 byte addresses so versions after that should be
unaffected by this change.

Drivers from kernels before that can also handle either type of
address, but not always:
* If the first byte's lowest bits are 10, the address must be 6 bytes.
* Otherwise, the address must be 8 bytes.
(The two bits in question are significant in a MAC address: they
indicate a locally-administered unicast address.)

After this change they will see incorrect values for broadcast or
non-locally generated addresses. AFAIK these addresses would not
normally be used as the address of an emulated adapter, so any
breakage should be rare. The breakage would appear as the MAC address
losing the first two bytes and receiving two bytes of garbage at the
other end.
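
The breakage described above can be modelled with a short sketch. It assumes
the pre-3.10 driver behaves exactly as the two bullet points say (it ignores
the property length and looks only at the low two bits of the first byte);
old_driver_read_mac() is a hypothetical name, not the ibmveth code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical model of the old heuristic: the property length is never
 * consulted. Low bits 0b10 in the first byte (locally administered,
 * unicast) mean "6-byte property"; anything else is treated as an 8-byte
 * property with two bytes of padding in front. */
static void old_driver_read_mac(const uint8_t *prop, uint8_t out[MAC_LEN])
{
    size_t skip;

    assert(prop != NULL);
    skip = ((prop[0] & 0x3) == 0x2) ? 0 : 2;
    memcpy(out, prop + skip, MAC_LEN);
}
```

Given a 6-byte property whose first byte is 0x52, the model returns the
address unchanged. With a first byte such as 0x00 it skips two bytes, so the
result loses the first two address bytes and ends with whatever two bytes
follow the property in memory.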

Signed-off-by: Sam Bobroff 
---

 hw/net/spapr_llan.c | 14 +-
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/hw/net/spapr_llan.c b/hw/net/spapr_llan.c
index 01ecb02..eebf7cc 100644
--- a/hw/net/spapr_llan.c
+++ b/hw/net/spapr_llan.c
@@ -381,22 +381,10 @@ void spapr_vlan_create(VIOsPAPRBus *bus, NICInfo *nd)
 static int spapr_vlan_devnode(VIOsPAPRDevice *dev, void *fdt, int node_off)
 {
 VIOsPAPRVLANDevice *vdev = VIO_SPAPR_VLAN_DEVICE(dev);
-uint8_t padded_mac[8] = {0, 0};
 int ret;
 
-/* Some old phyp versions give the mac address in an 8-byte
- * property.  The kernel driver has an insane workaround for this;
- * rather than doing the obvious thing and checking the property
- * length, it checks whether the first byte has 0b10 in the low
- * bits.  If a correct 6-byte property has a different first byte
- * the kernel will get the wrong mac address, overrunning its
- * buffer in the process (read only, thank goodness).
- *
- * Here we workaround the kernel workaround by always supplying an
- * 8-byte property, with the mac address in the last six bytes */
-memcpy(&padded_mac[2], &vdev->nicconf.macaddr, ETH_ALEN);
 ret = fdt_setprop(fdt, node_off, "local-mac-address",
-  padded_mac, sizeof(padded_mac));
+  &vdev->nicconf.macaddr, ETH_ALEN);
 if (ret < 0) {
 return ret;
 }
-- 
2.10.0.297.gf6727b0

Re: [Qemu-devel] [PATCH 2/4] target-ppc: Implement bcdctsq. instruction

2016-11-16 Thread David Gibson

On Wed, Nov 16, 2016 at 06:07:28PM -0200, Jose Ricardo Ziviani wrote:
> bcdctsq.: Decimal convert to signed quadword. It is possible to
> convert packed decimal values to signed quadwords.
> 
> Signed-off-by: Jose Ricardo Ziviani 
> ---
>  target-ppc/helper.h |  1 +
>  target-ppc/int_helper.c | 39 
> +
>  target-ppc/translate/vmx-impl.inc.c |  7 +++
>  3 files changed, 47 insertions(+)
> 
> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
> index 87f533c..503f257 100644
> --- a/target-ppc/helper.h
> +++ b/target-ppc/helper.h
> @@ -383,6 +383,7 @@ DEF_HELPER_3(bcdctn, i32, avr, avr, i32)
>  DEF_HELPER_3(bcdcfz, i32, avr, avr, i32)
>  DEF_HELPER_3(bcdctz, i32, avr, avr, i32)
>  DEF_HELPER_3(bcdcfsq, i32, avr, avr, i32)
> +DEF_HELPER_3(bcdctsq, i32, avr, avr, i32)
>  
>  DEF_HELPER_2(xsadddp, void, env, i32)
>  DEF_HELPER_2(xssubdp, void, env, i32)
> diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
> index db65a51..1025438 100644
> --- a/target-ppc/int_helper.c
> +++ b/target-ppc/int_helper.c
> @@ -2922,6 +2922,45 @@ uint32_t helper_bcdcfsq(ppc_avr_t *r, ppc_avr_t *b, 
> uint32_t ps)
>  return cr;
>  }
>  
> +uint32_t helper_bcdctsq(ppc_avr_t *r, ppc_avr_t *b, uint32_t ps)
> +{
> +uint8_t i;
> +int cr = 0;
> +uint64_t hi = 0;
> +int sgnb = bcd_get_sgn(b);
> +int invalid = (sgnb == 0);
> +ppc_avr_t ret = { .u64 = { 0, 0 } };
> +
> +ret.u64[LO_IDX] = bcd_get_digit(b, 31, &invalid);
> +for (i = 30; i > 0; i--) {
> +mulu64(&ret.u64[LO_IDX], &hi,
> +ret.u64[LO_IDX], 10ULL);
> +
> +ret.u64[HI_IDX] = (ret.u64[HI_IDX]) ? ret.u64[HI_IDX] * 10 + hi : hi;
> +ret.u64[LO_IDX] += bcd_get_digit(b, i, &invalid);

Again, it might be simpler to use the int128 code we already have in qemu.

> +if (unlikely(invalid)) {
> +break;
> +}
> +}
> +
> +if (sgnb == -1) {
> +if (ret.s64[HI_IDX] > 0) {
> +ret.s64[HI_IDX] = -ret.s64[HI_IDX];
> +} else {
> +ret.s64[LO_IDX] = -ret.s64[LO_IDX];
> +}

As on the other direction, I don't think this looks like a correct
128-bit negate.

> +}
> +
> +cr = bcd_cmp_zero(b);
> +
> +if (unlikely(invalid)) {
> +cr = 1 << CRF_SO;
> +}
> +
> +return cr;
> +}
> +
>  void helper_vsbox(ppc_avr_t *r, ppc_avr_t *a)
>  {
>  int i;
> diff --git a/target-ppc/translate/vmx-impl.inc.c 
> b/target-ppc/translate/vmx-impl.inc.c
> index 36141e5..1579b58 100644
> --- a/target-ppc/translate/vmx-impl.inc.c
> +++ b/target-ppc/translate/vmx-impl.inc.c
> @@ -990,10 +990,14 @@ GEN_BCD2(bcdctn)
>  GEN_BCD2(bcdcfz)
>  GEN_BCD2(bcdctz)
>  GEN_BCD2(bcdcfsq)
> +GEN_BCD2(bcdctsq)
>  
>  static void gen_xpnd04_1(DisasContext *ctx)
>  {
>  switch (opc4(ctx->opcode)) {
> +case 0:
> +gen_bcdctsq(ctx);
> +break;
>  case 2:
>  gen_bcdcfsq(ctx);
>  break;
> @@ -1018,6 +1022,9 @@ static void gen_xpnd04_1(DisasContext *ctx)
>  static void gen_xpnd04_2(DisasContext *ctx)
>  {
>  switch (opc4(ctx->opcode)) {
> +case 0:
> +gen_bcdctsq(ctx);
> +break;
>  case 2:
>  gen_bcdcfsq(ctx);
>  break;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH 4/4] target-ppc: Implement bcdsetsgn. instruction

2016-11-16 Thread David Gibson

On Wed, Nov 16, 2016 at 06:07:30PM -0200, Jose Ricardo Ziviani wrote:
> bcdsetsgn.: Decimal set sign. This instruction copies the register
> value to the result register but adjusts the sign according to
> the preferred sign value.
> 
> Signed-off-by: Jose Ricardo Ziviani 
> ---
>  target-ppc/helper.h | 1 +
>  target-ppc/int_helper.c | 9 +
>  target-ppc/translate/vmx-impl.inc.c | 8 
>  3 files changed, 18 insertions(+)
> 
> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
> index dada48e..cddac8e 100644
> --- a/target-ppc/helper.h
> +++ b/target-ppc/helper.h
> @@ -385,6 +385,7 @@ DEF_HELPER_3(bcdctz, i32, avr, avr, i32)
>  DEF_HELPER_3(bcdcfsq, i32, avr, avr, i32)
>  DEF_HELPER_3(bcdctsq, i32, avr, avr, i32)
>  DEF_HELPER_4(bcdcpsgn, i32, avr, avr, avr, i32)
> +DEF_HELPER_3(bcdsetsgn, i32, avr, avr, i32)
>  
>  DEF_HELPER_2(xsadddp, void, env, i32)
>  DEF_HELPER_2(xssubdp, void, env, i32)
> diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
> index a215bfe..38af503 100644
> --- a/target-ppc/int_helper.c
> +++ b/target-ppc/int_helper.c
> @@ -2991,6 +2991,15 @@ uint32_t helper_bcdcpsgn(ppc_avr_t *r, ppc_avr_t *a, 
> ppc_avr_t *b, uint32_t ps)
>  return cr;
>  }
>  
> +uint32_t helper_bcdsetsgn(ppc_avr_t *r, ppc_avr_t *b, uint32_t ps)
> +{
> +int sgnb = bcd_get_sgn(b);
> +ppc_avr_t ret = { .u64 = { 0, 0 } };
> +
> +bcd_put_digit(&ret, bcd_preferred_sgn(sgnb, ps), 0);
> +return helper_bcdcpsgn(r, b, &ret, ps);

This is doing a lot of work just to canonicalize the sign indicator.

> +}
> +
>  void helper_vsbox(ppc_avr_t *r, ppc_avr_t *a)
>  {
>  int i;
> diff --git a/target-ppc/translate/vmx-impl.inc.c 
> b/target-ppc/translate/vmx-impl.inc.c
> index c14b666..b188e60 100644
> --- a/target-ppc/translate/vmx-impl.inc.c
> +++ b/target-ppc/translate/vmx-impl.inc.c
> @@ -991,6 +991,7 @@ GEN_BCD2(bcdcfz)
>  GEN_BCD2(bcdctz)
>  GEN_BCD2(bcdcfsq)
>  GEN_BCD2(bcdctsq)
> +GEN_BCD2(bcdsetsgn)
>  GEN_BCD(bcdcpsgn);
>  
>  static void gen_xpnd04_1(DisasContext *ctx)
> @@ -1014,6 +1015,9 @@ static void gen_xpnd04_1(DisasContext *ctx)
>  case 7:
>  gen_bcdcfn(ctx);
>  break;
> +case 31:
> +gen_bcdsetsgn(ctx);
> +break;
>  default:
>  gen_invalid(ctx);
>  break;
> @@ -1038,12 +1042,16 @@ static void gen_xpnd04_2(DisasContext *ctx)
>  case 7:
>  gen_bcdcfn(ctx);
>  break;
> +case 31:
> +gen_bcdsetsgn(ctx);
> +break;
>  default:
>  gen_invalid(ctx);
>  break;
>  }
>  }
>  
> +
>  GEN_VXFORM_DUAL(vsubcuw, PPC_ALTIVEC, PPC_NONE, \
>  xpnd04_1, PPC_NONE, PPC2_ISA300)
>  GEN_VXFORM_DUAL(vsubsws, PPC_ALTIVEC, PPC_NONE, \

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [Qemu-ppc] [PATCH] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV

2016-11-16 Thread David Gibson

On Wed, Nov 16, 2016 at 09:39:31AM +0100, Thomas Huth wrote:
> The ppc64 postcopy test does not work with KVM-PR, and it is also
> causing annoying warning messages when run on an x86 host. So let's
> use KVM here only if we know that we're running with KVM-HV (which
> automatically also means that we're running on a ppc64 host), and
> fall back to TCG otherwise.
> 
> Signed-off-by: Thomas Huth 

Applied to ppc-for-2.8.

Longer term, I think we should default to tcg for all these tests - on
x86 as well - then run KVM *as well* when available.  But in the short
term we should fix make check for the 2.8 release.

> ---
>  tests/postcopy-test.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/tests/postcopy-test.c b/tests/postcopy-test.c
> index d6613c5..dafe8be 100644
> --- a/tests/postcopy-test.c
> +++ b/tests/postcopy-test.c
> @@ -380,17 +380,21 @@ static void test_migrate(void)
>" -incoming %s",
>tmpfs, bootpath, uri);
>  } else if (strcmp(arch, "ppc64") == 0) {
> +const char *accel;
> +
> +/* On ppc64, the test only works with kvm-hv, but not with kvm-pr */
> +accel = access("/sys/module/kvm_hv", F_OK) ? "tcg" : "kvm:tcg";
>  init_bootfile_ppc(bootpath);
> -cmd_src = g_strdup_printf("-machine accel=kvm:tcg -m 256M"
> +cmd_src = g_strdup_printf("-machine accel=%s -m 256M"
>" -name pcsource,debug-threads=on"
>" -serial file:%s/src_serial"
>" -drive file=%s,if=pflash,format=raw",
> -  tmpfs, bootpath);
> -cmd_dst = g_strdup_printf("-machine accel=kvm:tcg -m 256M"
> +  accel, tmpfs, bootpath);
> +cmd_dst = g_strdup_printf("-machine accel=%s -m 256M"
>" -name pcdest,debug-threads=on"
>" -serial file:%s/dest_serial"
>" -incoming %s",
> -  tmpfs, uri);
> +  accel, tmpfs, uri);
>  } else {
>  g_assert_not_reached();
>  }

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH 3/4] target-ppc: Implement bcdcpsgn. instruction

2016-11-16 Thread David Gibson

On Wed, Nov 16, 2016 at 06:07:29PM -0200, Jose Ricardo Ziviani wrote:
> bcdcpsgn.: Decimal copy sign. Given two registers vra and vrb, it
> copies the vra value with vrb sign to the result register vrt.
> 
> Signed-off-by: Jose Ricardo Ziviani 
> ---
>  target-ppc/helper.h |  1 +
>  target-ppc/int_helper.c | 30 ++
>  target-ppc/translate/vmx-impl.inc.c |  3 +++
>  target-ppc/translate/vmx-ops.inc.c  |  2 +-
>  4 files changed, 35 insertions(+), 1 deletion(-)
> 
> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
> index 503f257..dada48e 100644
> --- a/target-ppc/helper.h
> +++ b/target-ppc/helper.h
> @@ -384,6 +384,7 @@ DEF_HELPER_3(bcdcfz, i32, avr, avr, i32)
>  DEF_HELPER_3(bcdctz, i32, avr, avr, i32)
>  DEF_HELPER_3(bcdcfsq, i32, avr, avr, i32)
>  DEF_HELPER_3(bcdctsq, i32, avr, avr, i32)
> +DEF_HELPER_4(bcdcpsgn, i32, avr, avr, avr, i32)
>  
>  DEF_HELPER_2(xsadddp, void, env, i32)
>  DEF_HELPER_2(xssubdp, void, env, i32)
> diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
> index 1025438..a215bfe 100644
> --- a/target-ppc/int_helper.c
> +++ b/target-ppc/int_helper.c
> @@ -2961,6 +2961,36 @@ uint32_t helper_bcdctsq(ppc_avr_t *r, ppc_avr_t *b, 
> uint32_t ps)
>  return cr;
>  }
>  
> +uint32_t helper_bcdcpsgn(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t 
> ps)
> +{
> +int i;
> +int cr = 0;
> +int sgna = bcd_get_sgn(a);
> +int sgnb = bcd_get_sgn(b);
> +int invalid = (sgna == 0) || (sgnb == 0);
> +ppc_avr_t ret = { .u64 = { 0, 0 } };
> +
> +for (i = 1; i < 32; i++) {
> +bcd_put_digit(&ret, bcd_get_digit(a, i, &invalid), i);
> +bcd_get_digit(b, i, &invalid);

This is a lot of bit fiddling to accomplish what you could pretty much
do just by copying the entire register.  Checking for invalid input
makes it a bit more complex than that, of course, but you still don't
need to assemble every digit separately in the target.

> +
> +if (unlikely(invalid)) {
> +break;
> +}
> +}
> +bcd_put_digit(&ret, bcd_get_digit(b, 0, &invalid), 0);

This won't work.  bcd_get_digit() will set invalid if digit 0 is an
invalid *digit*.  But some valid sign indicators are not valid digits,
so this will erroneously set invalid in those cases.
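
Both review comments above can be illustrated with a small sketch: copy the
whole source register, then splice in only the sign nibble, and validate the
sign separately from the digits. Everything here is an assumption for
illustration: the bcd128 layout (16 bytes, most significant first, sign in
the low nibble of byte 15) and the names sign_of(), digits_valid() and
copy_sign() are hypothetical, and the sign codes follow my understanding of
the packed-decimal encoding (0xA, 0xC, 0xE, 0xF positive; 0xB, 0xD negative).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: 31 digit nibbles followed by one sign nibble. */
typedef struct { uint8_t b[16]; } bcd128;

/* +1 or -1 for a valid sign code, 0 for anything else (a plain digit). */
static int sign_of(const bcd128 *v)
{
    switch (v->b[15] & 0xF) {
    case 0xA: case 0xC: case 0xE: case 0xF:
        return 1;
    case 0xB: case 0xD:
        return -1;
    default:
        return 0;
    }
}

/* All 31 digit nibbles must be 0-9. The sign nibble is deliberately
 * skipped: valid sign codes are not valid digits. */
static int digits_valid(const bcd128 *v)
{
    for (int i = 0; i < 16; i++) {
        if ((v->b[i] >> 4) > 9)
            return 0;
        if (i < 15 && (v->b[i] & 0xF) > 9)
            return 0;
    }
    return 1;
}

/* Copy a as a whole, then take the sign nibble from b.
 * Returns 1 on success, 0 if either operand is invalid. */
static int copy_sign(bcd128 *r, const bcd128 *a, const bcd128 *b)
{
    assert(r && a && b);
    if (!digits_valid(a) || !digits_valid(b) || !sign_of(a) || !sign_of(b))
        return 0;
    *r = *a;
    r->b[15] = (uint8_t)((a->b[15] & 0xF0) | (b->b[15] & 0x0F));
    return 1;
}
```

The digit check never looks at the sign nibble, so a sign code such as 0xD is
not misreported as an invalid digit.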

> +
> +cr = bcd_cmp_zero(a);
> +
> +if (unlikely(invalid)) {
> +cr = 1 << CRF_SO;
> +}
> +
> +*r = ret;
> +
> +return cr;
> +}
> +
>  void helper_vsbox(ppc_avr_t *r, ppc_avr_t *a)
>  {
>  int i;
> diff --git a/target-ppc/translate/vmx-impl.inc.c 
> b/target-ppc/translate/vmx-impl.inc.c
> index 1579b58..c14b666 100644
> --- a/target-ppc/translate/vmx-impl.inc.c
> +++ b/target-ppc/translate/vmx-impl.inc.c
> @@ -991,6 +991,7 @@ GEN_BCD2(bcdcfz)
>  GEN_BCD2(bcdctz)
>  GEN_BCD2(bcdcfsq)
>  GEN_BCD2(bcdctsq)
> +GEN_BCD(bcdcpsgn);
>  
>  static void gen_xpnd04_1(DisasContext *ctx)
>  {
> @@ -1056,6 +1057,8 @@ GEN_VXFORM_DUAL(vsubuhm, PPC_ALTIVEC, PPC_NONE, \
>  bcdsub, PPC_NONE, PPC2_ALTIVEC_207)
>  GEN_VXFORM_DUAL(vsubuhs, PPC_ALTIVEC, PPC_NONE, \
>  bcdsub, PPC_NONE, PPC2_ALTIVEC_207)
> +GEN_VXFORM_DUAL(vaddshs, PPC_ALTIVEC, PPC_NONE, \
> +bcdcpsgn, PPC_NONE, PPC2_ISA300)
>  
>  static void gen_vsbox(DisasContext *ctx)
>  {
> diff --git a/target-ppc/translate/vmx-ops.inc.c 
> b/target-ppc/translate/vmx-ops.inc.c
> index f02b3be..70d7d2b 100644
> --- a/target-ppc/translate/vmx-ops.inc.c
> +++ b/target-ppc/translate/vmx-ops.inc.c
> @@ -131,7 +131,7 @@ GEN_VXFORM_DUAL(vaddubs, vmul10uq, 0, 8, PPC_ALTIVEC, 
> PPC_NONE),
>  GEN_VXFORM_DUAL(vadduhs, vmul10euq, 0, 9, PPC_ALTIVEC, PPC_NONE),
>  GEN_VXFORM(vadduws, 0, 10),
>  GEN_VXFORM(vaddsbs, 0, 12),
> -GEN_VXFORM(vaddshs, 0, 13),
> +GEN_VXFORM_DUAL(vaddshs, bcdcpsgn, 0, 13, PPC_ALTIVEC, PPC_NONE),
>  GEN_VXFORM(vaddsws, 0, 14),
>  GEN_VXFORM_DUAL(vsububs, bcdadd, 0, 24, PPC_ALTIVEC, PPC_NONE),
>  GEN_VXFORM_DUAL(vsubuhs, bcdsub, 0, 25, PPC_ALTIVEC, PPC_NONE),

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH for-2.8] migration: Fix return code of ram_save_iterate()

2016-11-16 Thread David Gibson

On Mon, Nov 14, 2016 at 07:34:59PM +0100, Juan Quintela wrote:
> Thomas Huth  wrote:
> > qemu_savevm_state_iterate() expects the iterators to return 1
> > when they are done, and 0 if there is still something left to do.
> > However, ram_save_iterate() does not obey this rule and returns
> > the number of saved pages instead. This causes a fatal hang with
> > ppc64 guests when you run QEMU like this (also works with TCG):
> >
> >  qemu-img create -f qcow2  /tmp/test.qcow2 1M
> >  qemu-system-ppc64 -nographic -nodefaults -m 256 \
> >-hda /tmp/test.qcow2 -serial mon:stdio
> >
> > ... then switch to the monitor by pressing CTRL-a c and try to
> > save a snapshot with "savevm test1" for example.
> >
> > After the first iteration, ram_save_iterate() always returns 0 here,
> > so that qemu_savevm_state_iterate() hangs in an endless loop and you
> > can only "kill -9" the QEMU process.
> > Fix it by using proper return values in ram_save_iterate().
> >
> > Signed-off-by: Thomas Huth 
> 
> Reviewed-by: Juan Quintela 
> 
> Applied.
> 
> I don't know how we broke this so much.

Note that block save iterate has the same bug...
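
For readers following along, here is a toy model of the return-value
convention described in the quoted commit message (1 when done, 0 when there
is more to do). It is a sketch only: struct iter_state and the three
functions are hypothetical, and the caller loop is bounded so that the
"hang" shows up as a -1 result instead of an endless loop.

```c
#include <assert.h>

/* Hypothetical iterator state: pages still to be sent. */
struct iter_state { int pages_left; };

/* Buggy convention: returns the number of pages saved this round. */
static int iterate_returns_pages(struct iter_state *s)
{
    int sent = s->pages_left > 4 ? 4 : s->pages_left;

    s->pages_left -= sent;
    return sent;
}

/* Expected convention: returns 1 when finished, 0 otherwise. */
static int iterate_returns_done(struct iter_state *s)
{
    iterate_returns_pages(s);
    return s->pages_left == 0;
}

/* Caller loop: stops as soon as the iterator returns non-zero. Returns
 * the number of rounds used, or -1 if max_rounds was exhausted. */
static int run_until_done(int (*iter)(struct iter_state *),
                          struct iter_state *s, int max_rounds)
{
    assert(max_rounds > 0);
    for (int round = 1; round <= max_rounds; round++) {
        if (iter(s) != 0)
            return round;
    }
    return -1;
}
```

With nothing left to send, the page-counting iterator keeps returning 0 and
the caller never sees "done", which matches the hang described above.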

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV

2016-11-16 Thread David Gibson

On Wed, Nov 16, 2016 at 02:17:47PM +0100, Thomas Huth wrote:
> On 16.11.2016 13:37, Greg Kurz wrote:
> > On Wed, 16 Nov 2016 12:24:50 +
> > "Dr. David Alan Gilbert"  wrote:
> > 
> >> * Greg Kurz (gr...@kaod.org) wrote:
> >>> On Wed, 16 Nov 2016 09:39:31 +0100
> >>> Thomas Huth  wrote:
> >>>   
>  The ppc64 postcopy test does not work with KVM-PR, and it is also
>  causing annoying warning messages when run on an x86 host. So let's
>  use KVM here only if we know that we're running with KVM-HV (which
>  automatically also means that we're running on a ppc64 host), and
>  fall back to TCG otherwise.
>    
> >>>
> >>> This patch addresses two issues actually:
> >>> - the annoying warning when running a ppc64 guest on a non-ppc64 host
> >>> - the fact that KVM-PR seems to be currently broken
> >>>
> >>> I agree that the former makes sense, but what about the case of running
> >>> an x86 guest on a non-x86 host?
> 
> Of course you also get these '"kvm" accelerator not found' messages
> there. But so far, I think nobody complained about that yet (only for
> ppc64 running on x86). And at least the test succeeds there - unlike
> with KVM-PR, where the test fails completely.

Well, I guess I should complain about them then.  It is slightly
irritating when doing my pre-pull tests on a ppc64 host, although I'm
more or less used to it now.

> >>> I'm still feeling uncomfortable with the KVM-PR case... is this a 
> >>> workaround
> >>> we want to keep until we find out what's going on or are we starting to
> >>> partially deprecate KVM PR ? In any case, I guess we should document this
> >>> and probably print some meaningful error message.  
> >>
> >> This is certainly a work around for now, it doesn't suggest anything about
> >> deprecation.
> > 
> > Well it doesn't suggest anything actually, it just silently skips KVM PR...
> > I would at least expect a comment in the code mentioning this is a
> > workaround and maybe an explicit warning for the user. If the user really
> > wants to run this test with KVM on ppc64, then she should ensure it is
> > KVM HV.
> 
> Honestly, also considering the number of patches that Laurent already
> wrote here that have never been accepted, all this has become quite an
> ugly bike-shed painting discussion.
> 
> My opinion:
> 
> - If we want to properly test KVM (be it KVM-HV or KVM-PR), write
>   a proper kvm-unit-test instead. I.e. I personally don't care if this
>   test in QEMU is only run with TCG or with KVM.
> 
> - The current status of "make check" is broken, since it does not
>   work on KVM-PR. We've got to fix that before the release.
> 
> That means I currently really don't care if we spill out a warning
> message for KVM-PR here or not - sure, somebody just got to look at
> KVM-PR later, but that's IMHO off-topic for the test here in the QEMU
> context.
> 
> So if you think that the patch for fixing this issue here with the QEMU
> test should look differently, please propose a different patch instead.
> I'm fine with every other approach as long as we get this fixed in time
> for QEMU 2.8.

Hm, yeah, I concur.

> 
>  Thomas
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH] vfio: avoid adding same iommu mr for notify

2016-11-16 Thread David Gibson

On Mon, Nov 14, 2016 at 07:59:28PM -0500, Peter Xu wrote:
> When one IOMMU memory region is split into multiple memory sections,
> vfio will register multiple same notifiers to a vIOMMU for the same
> region. That's not sensible. What we need is to register one IOMMU
> notifier for each IOMMU region, not per section.
> 
> Solution is simple - we traverse the container->giommu_list, and skip
> the registration if memory region is already registered.
> 
> To make vfio's region_add() short, vfio_listener_region_add_iommu() is
> introduced.
> 
> Signed-off-by: Peter Xu 

This is wrong.  It will work on the register side, but when the first
section attached to the IOMMU is removed, the IOMMU will be removed
from the list (triggering an unmap of the whole vfio space), rather
than when the *last* section attached to the MR is removed.

You'll get away with it in the simple x86 case, because both sections
will be removed at basically the same time, but it's not correct in
general.

I really think a better approach is to add the section boundary
information to the IOMMUNotifier structure within VFIOGuestIOMMU, and
check that as well as the MR on remove.  That additionally means the
IOMMU notifier won't get called for portions of the MR outside the
mapped sections, which the notifier handler probably isn't going to
expect.
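
To make the suggestion concrete, here is a rough sketch of tracking one
registration per (MR, section range) and matching all three fields on
removal. It is not QEMU code: struct reg and the reg_*() helpers are
hypothetical stand-ins for VFIOGuestIOMMU and the giommu_list handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGS 8   /* illustrative capacity */

/* Hypothetical registration record: the memory region plus the section
 * boundaries it was registered for. */
struct reg {
    const void *mr;
    uint64_t start, end;
    int used;
};

static int reg_add(struct reg *t, const void *mr, uint64_t start, uint64_t end)
{
    assert(start <= end);
    for (size_t i = 0; i < MAX_REGS; i++) {
        if (!t[i].used) {
            t[i] = (struct reg){ .mr = mr, .start = start, .end = end,
                                 .used = 1 };
            return 0;
        }
    }
    return -1;
}

/* Remove only the entry whose MR and boundaries all match. */
static int reg_del(struct reg *t, const void *mr, uint64_t start, uint64_t end)
{
    for (size_t i = 0; i < MAX_REGS; i++) {
        if (t[i].used && t[i].mr == mr &&
            t[i].start == start && t[i].end == end) {
            t[i].used = 0;
            return 0;
        }
    }
    return -1;
}

/* Number of sections still registered for this MR. */
static size_t reg_count_for_mr(const struct reg *t, const void *mr)
{
    size_t n = 0;

    for (size_t i = 0; i < MAX_REGS; i++) {
        if (t[i].used && t[i].mr == mr)
            n++;
    }
    return n;
}
```

Removing the first of two sections leaves the second registration in place,
so the MR is only fully dropped once its last section goes away.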

> ---
>  hw/vfio/common.c | 56 
> +++-
>  1 file changed, 35 insertions(+), 21 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 801578b..5279fd1 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -360,6 +360,40 @@ out:
>  rcu_read_unlock();
>  }
>  
> +static void vfio_listener_region_add_iommu(VFIOContainer *container,
> +   MemoryRegionSection *section,
> +   hwaddr iova,
> +   hwaddr end)
> +{
> +VFIOGuestIOMMU *giommu;
> +
> +QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
> +if (giommu->iommu == section->mr) {
> +/* We have already registered with this MR, skip */
> +return;
> +}
> +}
> +
> +trace_vfio_listener_region_add_iommu(iova, end);
> +
> +/*
> + * FIXME: For VFIO iommu types which have KVM acceleration to
> + * avoid bouncing all map/unmaps through qemu this way, this
> + * would be the right place to wire that up (tell the KVM
> + * device emulation the VFIO iommu handles to use).
> + */
> +giommu = g_malloc0(sizeof(*giommu));
> +giommu->iommu = section->mr;
> +giommu->iommu_offset = section->offset_within_address_space -
> +section->offset_within_region;
> +giommu->container = container;
> +giommu->n.notify = vfio_iommu_map_notify;
> +giommu->n.notifier_flags = IOMMU_NOTIFIER_ALL;
> +QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
> +memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
> +memory_region_iommu_replay(giommu->iommu, &giommu->n, false);
> +}
> +
>  static void vfio_listener_region_add(MemoryListener *listener,
>   MemoryRegionSection *section)
>  {
> @@ -439,27 +473,7 @@ static void vfio_listener_region_add(MemoryListener 
> *listener,
>  memory_region_ref(section->mr);
>  
>  if (memory_region_is_iommu(section->mr)) {
> -VFIOGuestIOMMU *giommu;
> -
> -trace_vfio_listener_region_add_iommu(iova, end);
> -/*
> - * FIXME: For VFIO iommu types which have KVM acceleration to
> - * avoid bouncing all map/unmaps through qemu this way, this
> - * would be the right place to wire that up (tell the KVM
> - * device emulation the VFIO iommu handles to use).
> - */
> -giommu = g_malloc0(sizeof(*giommu));
> -giommu->iommu = section->mr;
> -giommu->iommu_offset = section->offset_within_address_space -
> -   section->offset_within_region;
> -giommu->container = container;
> -giommu->n.notify = vfio_iommu_map_notify;
> -giommu->n.notifier_flags = IOMMU_NOTIFIER_ALL;
> -QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
> -
> -memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
> -memory_region_iommu_replay(giommu->iommu, &giommu->n, false);
> -
> +vfio_listener_region_add_iommu(container, section, iova, end);
>  return;
>  }
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH 1/4] target-ppc: Implement bcdcfsq. instruction

2016-11-16 Thread David Gibson

On Wed, Nov 16, 2016 at 06:07:27PM -0200, Jose Ricardo Ziviani wrote:
> bcdcfsq.: Decimal convert from signed quadword. It is possible to

I think there should be a "not" in there.

> convert values less than 10^31-1 or greater than -10^31-1 to be
> represented in packed decimal format.
> 
> Signed-off-by: Jose Ricardo Ziviani 
> ---
>  target-ppc/helper.h |  1 +
>  target-ppc/int_helper.c | 48 
> +
>  target-ppc/translate/vmx-impl.inc.c |  7 ++
>  3 files changed, 56 insertions(+)
> 
> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
> index da00f0a..87f533c 100644
> --- a/target-ppc/helper.h
> +++ b/target-ppc/helper.h
> @@ -382,6 +382,7 @@ DEF_HELPER_3(bcdcfn, i32, avr, avr, i32)
>  DEF_HELPER_3(bcdctn, i32, avr, avr, i32)
>  DEF_HELPER_3(bcdcfz, i32, avr, avr, i32)
>  DEF_HELPER_3(bcdctz, i32, avr, avr, i32)
> +DEF_HELPER_3(bcdcfsq, i32, avr, avr, i32)
>  
>  DEF_HELPER_2(xsadddp, void, env, i32)
>  DEF_HELPER_2(xssubdp, void, env, i32)
> diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
> index 9ac204a..db65a51 100644
> --- a/target-ppc/int_helper.c
> +++ b/target-ppc/int_helper.c
> @@ -2874,6 +2874,54 @@ uint32_t helper_bcdctz(ppc_avr_t *r, ppc_avr_t *b, 
> uint32_t ps)
>  return cr;
>  }
>  
> +uint32_t helper_bcdcfsq(ppc_avr_t *r, ppc_avr_t *b, uint32_t ps)
> +{
> +int i;
> +int cr = 0;
> +int ox_flag = 0;
> +uint64_t digit = 0;
> +uint64_t carry = 0;
> +uint64_t lo_value = 0;
> +uint64_t hi_value = 0;

Most of the variables above don't need initializers.

> +uint64_t max = ULLONG_MAX;
> +ppc_avr_t ret = { .u64 = { 0, 0 } };
> +
> +if (b->s64[HI_IDX] < 0) {
> +hi_value = -b->s64[HI_IDX];
> +lo_value = b->s64[LO_IDX];

I'm pretty sure this is wrong.  Take for example 128-bit -1:
    ffff ffff ffff ffff ffff ffff ffff ffff
Upper word is negative (64-bit -1), so
    hi_value = 0000 0000 0000 0001
    lo_value = ffff ffff ffff ffff

0x1 ffff ffff ffff ffff != +1
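
For comparison, a two's-complement negate across two 64-bit halves could look
like the sketch below. neg128() is a hypothetical helper written for this
illustration, not something from the patch or from QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Negate a 128-bit two's-complement value held as two 64-bit halves. */
static void neg128(uint64_t *hi, uint64_t *lo)
{
    assert(hi && lo);
    *lo = ~*lo + 1;            /* invert the low half and add one */
    *hi = ~*hi + (*lo == 0);   /* carry only if the low half wrapped to 0 */
}
```

Applied to the example above (both halves all ones), this gives hi = 0 and
lo = 1, i.e. +1 as expected.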

> +bcd_put_digit(&ret, 0xD, 0);
> +} else if (b->s64[HI_IDX] == 0 && b->s64[LO_IDX] < 0) {
> +lo_value = -b->s64[LO_IDX];
> +bcd_put_digit(&ret, 0xD, 0);
> +} else {
> +hi_value = b->s64[HI_IDX];
> +lo_value = b->s64[LO_IDX];
> +bcd_put_digit(&ret, bcd_preferred_sgn(0, ps), 0);
> +}
> +
> +if (unlikely(hi_value > 0x7e37be2022)) {

This doesn't look right.  Unless by chance 10^31-1 is equal to (k*2^64
- 1) you need to look at the lo_value as well.

> +ox_flag = 1;

You might as well just return 1<< CRF_SO here - no point actually
computing a meaningless value.

> +}
> +
> +carry = hi_value;
> +for (i = 0; i < 32; i++, max /= 10, lo_value /= 10) {

Looks like this loop has one too many iterations - there are 32
iterations, but you only have 31 digits.

> +digit = ((max % 10) * hi_value) + (lo_value % 10) + carry;
> +carry = (digit > 9) ? digit / 10 : 0;
> +
> +bcd_put_digit(&ret, (carry) ? digit % 10 : digit, i + 1);

Ugh, this is hard to follow.  We're already using an Int128 library in
the memory region code; wonder if we should just use that here as well.

> +}
> +
> +cr = bcd_cmp_zero(&ret);
> +
> +if (unlikely(ox_flag)) {
> +cr |= 1 << CRF_SO;
> +}
> +
> +*r = ret;
> +
> +return cr;
> +}
> +
>  void helper_vsbox(ppc_avr_t *r, ppc_avr_t *a)
>  {
>  int i;
> diff --git a/target-ppc/translate/vmx-impl.inc.c 
> b/target-ppc/translate/vmx-impl.inc.c
> index 7143eb3..36141e5 100644
> --- a/target-ppc/translate/vmx-impl.inc.c
> +++ b/target-ppc/translate/vmx-impl.inc.c
> @@ -989,10 +989,14 @@ GEN_BCD2(bcdcfn)
>  GEN_BCD2(bcdctn)
>  GEN_BCD2(bcdcfz)
>  GEN_BCD2(bcdctz)
> +GEN_BCD2(bcdcfsq)
>  
>  static void gen_xpnd04_1(DisasContext *ctx)
>  {
>  switch (opc4(ctx->opcode)) {
> +case 2:
> +gen_bcdcfsq(ctx);
> +break;
>  case 4:
>  gen_bcdctz(ctx);
>  break;
> @@ -1014,6 +1018,9 @@ static void gen_xpnd04_1(DisasContext *ctx)
>  static void gen_xpnd04_2(DisasContext *ctx)
>  {
>  switch (opc4(ctx->opcode)) {
> +case 2:
> +gen_bcdcfsq(ctx);
> +break;
>  case 4:
>  gen_bcdctz(ctx);
>  break;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
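
For reference, a minimal sketch of the direction the review comments above
point in, assuming a compiler that provides `unsigned __int128` (a GCC/Clang
extension). The names sq_magnitude(), sq_to_bcd_digits() and BCD_MAX_DIGITS
are hypothetical and are not QEMU helpers. It negates the whole 128-bit value
rather than only the high word, compares against 10^31 using both halves, and
emits exactly 31 digits:

```c
#include <assert.h>
#include <stdint.h>

#define BCD_MAX_DIGITS 31   /* hypothetical name: 31 digits plus a sign nibble */

/* Magnitude of the signed two's-complement 128-bit value hi:lo. */
static unsigned __int128 sq_magnitude(uint64_t hi, uint64_t lo, int *negative)
{
    unsigned __int128 v = ((unsigned __int128)hi << 64) | lo;

    *negative = (int)(hi >> 63);
    if (*negative) {
        v = ~v + 1;   /* negate all 128 bits, not just the high word */
    }
    return v;
}

/*
 * Store the decimal digits of hi:lo, least significant digit first.
 * Returns 1 on overflow (magnitude > 10^31 - 1) and leaves digits[]
 * untouched in that case; returns 0 otherwise.
 */
static int sq_to_bcd_digits(uint64_t hi, uint64_t lo,
                            uint8_t digits[BCD_MAX_DIGITS], int *negative)
{
    unsigned __int128 limit = 1;
    unsigned __int128 v = sq_magnitude(hi, lo, negative);
    int i;

    for (i = 0; i < BCD_MAX_DIGITS; i++) {
        limit *= 10;              /* limit ends up as 10^31 */
    }
    if (v >= limit) {
        return 1;                 /* nothing meaningful to compute */
    }
    for (i = 0; i < BCD_MAX_DIGITS; i++) {
        digits[i] = (uint8_t)(v % 10);
        v /= 10;
    }
    return 0;
}
```

With this shape, 128-bit -1 yields a single digit 1 with the negative flag
set, and the overflow boundary sits exactly between 10^31 - 1 and 10^31.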



Re: [Qemu-devel] [PATCH 07/25] target-ppc: Use clz and ctz opcodes

2016-11-16 Thread David Gibson

On Wed, Nov 16, 2016 at 08:25:17PM +0100, Richard Henderson wrote:
> Cc: qemu-...@nongnu.org
> Cc: David Gibson 
> Signed-off-by: Richard Henderson 

Reviewed-by: David Gibson 

> ---
>  target-ppc/helper.h |  4 
>  target-ppc/int_helper.c | 20 
>  target-ppc/translate.c  | 20 
>  3 files changed, 16 insertions(+), 28 deletions(-)
> 
> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
> index da00f0a..1ed1d2c 100644
> --- a/target-ppc/helper.h
> +++ b/target-ppc/helper.h
> @@ -38,16 +38,12 @@ DEF_HELPER_4(divde, i64, env, i64, i64, i32)
>  DEF_HELPER_4(divweu, tl, env, tl, tl, i32)
>  DEF_HELPER_4(divwe, tl, env, tl, tl, i32)
>  
> -DEF_HELPER_FLAGS_1(cntlzw, TCG_CALL_NO_RWG_SE, tl, tl)
> -DEF_HELPER_FLAGS_1(cnttzw, TCG_CALL_NO_RWG_SE, tl, tl)
>  DEF_HELPER_FLAGS_1(popcntb, TCG_CALL_NO_RWG_SE, tl, tl)
>  DEF_HELPER_FLAGS_1(popcntw, TCG_CALL_NO_RWG_SE, tl, tl)
>  DEF_HELPER_FLAGS_2(cmpb, TCG_CALL_NO_RWG_SE, tl, tl, tl)
>  DEF_HELPER_3(sraw, tl, env, tl, tl)
>  #if defined(TARGET_PPC64)
>  DEF_HELPER_FLAGS_2(cmpeqb, TCG_CALL_NO_RWG_SE, i32, tl, tl)
> -DEF_HELPER_FLAGS_1(cntlzd, TCG_CALL_NO_RWG_SE, tl, tl)
> -DEF_HELPER_FLAGS_1(cnttzd, TCG_CALL_NO_RWG_SE, tl, tl)
>  DEF_HELPER_FLAGS_1(popcntd, TCG_CALL_NO_RWG_SE, tl, tl)
>  DEF_HELPER_FLAGS_2(bpermd, TCG_CALL_NO_RWG_SE, i64, i64, i64)
>  DEF_HELPER_3(srad, tl, env, tl, tl)
> diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
> index 9ac204a..a6486ce 100644
> --- a/target-ppc/int_helper.c
> +++ b/target-ppc/int_helper.c
> @@ -141,16 +141,6 @@ uint64_t helper_divde(CPUPPCState *env, uint64_t rau, 
> uint64_t rbu, uint32_t oe)
>  #endif
>  
>  
> -target_ulong helper_cntlzw(target_ulong t)
> -{
> -return clz32(t);
> -}
> -
> -target_ulong helper_cnttzw(target_ulong t)
> -{
> -return ctz32(t);
> -}
> -
>  #if defined(TARGET_PPC64)
>  /* if x = 0xab, returns 0xababababababababa */
>  #define pattern(x) (((x) & 0xff) * (~(target_ulong)0 / 0xff))
> @@ -174,16 +164,6 @@ uint32_t helper_cmpeqb(target_ulong ra, target_ulong rb)
>  #undef haszero
>  #undef hasvalue
>  
> -target_ulong helper_cntlzd(target_ulong t)
> -{
> -return clz64(t);
> -}
> -
> -target_ulong helper_cnttzd(target_ulong t)
> -{
> -return ctz64(t);
> -}
> -
>  /* Return invalid random number.
>   *
>   * FIXME: Add rng backend or other mechanism to get cryptographically 
> suitable
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index 435c6f0..1224f56 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -1641,7 +1641,13 @@ static void gen_andis_(DisasContext *ctx)
>  /* cntlzw */
>  static void gen_cntlzw(DisasContext *ctx)
>  {
> -gen_helper_cntlzw(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
> +TCGv_i32 t = tcg_temp_new_i32();
> +
> +tcg_gen_trunc_tl_i32(t, cpu_gpr[rS(ctx->opcode)]);
> +tcg_gen_clzi_i32(t, t, 32);
> +tcg_gen_extu_i32_tl(cpu_gpr[rA(ctx->opcode)], t);
> +tcg_temp_free_i32(t);
> +
>  if (unlikely(Rc(ctx->opcode) != 0))
>  gen_set_Rc0(ctx, cpu_gpr[rA(ctx->opcode)]);
>  }
> @@ -1649,7 +1655,13 @@ static void gen_cntlzw(DisasContext *ctx)
>  /* cnttzw */
>  static void gen_cnttzw(DisasContext *ctx)
>  {
> -gen_helper_cnttzw(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
> +TCGv_i32 t = tcg_temp_new_i32();
> +
> +tcg_gen_trunc_tl_i32(t, cpu_gpr[rS(ctx->opcode)]);
> +tcg_gen_ctzi_i32(t, t, 32);
> +tcg_gen_extu_i32_tl(cpu_gpr[rA(ctx->opcode)], t);
> +tcg_temp_free_i32(t);
> +
>  if (unlikely(Rc(ctx->opcode) != 0)) {
>  gen_set_Rc0(ctx, cpu_gpr[rA(ctx->opcode)]);
>  }
> @@ -1891,7 +1903,7 @@ GEN_LOGICAL1(extsw, tcg_gen_ext32s_tl, 0x1E, PPC_64B);
>  /* cntlzd */
>  static void gen_cntlzd(DisasContext *ctx)
>  {
> -gen_helper_cntlzd(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
> +tcg_gen_clzi_i64(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)], 64);
>  if (unlikely(Rc(ctx->opcode) != 0))
>  gen_set_Rc0(ctx, cpu_gpr[rA(ctx->opcode)]);
>  }
> @@ -1899,7 +1911,7 @@ static void gen_cntlzd(DisasContext *ctx)
>  /* cnttzd */
>  static void gen_cnttzd(DisasContext *ctx)
>  {
> -gen_helper_cnttzd(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
> +tcg_gen_ctzi_i64(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)], 64);
>  if (unlikely(Rc(ctx->opcode) != 0)) {
>  gen_set_Rc0(ctx, cpu_gpr[rA(ctx->opcode)]);
>  }

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
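
As an aside, a small sketch of the semantics I understand the new opcodes to
have, assuming the immediate passed to tcg_gen_clzi_i32()/tcg_gen_ctzi_i32()
is the result to use when the input is zero. The function names below are
hypothetical plain-C stand-ins, not TCG or QEMU API:

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zeros of x, or return zero_val when x == 0. */
static uint32_t clz32_or(uint32_t x, uint32_t zero_val)
{
    uint32_t n = 0;

    if (x == 0) {
        return zero_val;
    }
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Count trailing zeros of x, or return zero_val when x == 0. */
static uint32_t ctz32_or(uint32_t x, uint32_t zero_val)
{
    uint32_t n = 0;

    if (x == 0) {
        return zero_val;
    }
    while (!(x & 1u)) {
        x >>= 1;
        n++;
    }
    return n;
}

/*
 * Model of cntlzw/cnttzw on a 64-bit target: truncate the source to
 * 32 bits, count, then zero-extend the result back to register width.
 */
static uint64_t model_cntlzw(uint64_t rs)
{
    return clz32_or((uint32_t)rs, 32);
}

static uint64_t model_cnttzw(uint64_t rs)
{
    return ctz32_or((uint32_t)rs, 32);
}
```

The truncate/count/extend sequence mirrors the three TCG ops in the quoted
gen_cntlzw() and gen_cnttzw() hunks.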



[Qemu-devel] [PATCH v3 3/4] 9pfs: add cleanup operation for handle backend driver

2016-11-16 Thread Li Qiang

The init operation of the handle backend driver allocates a
handle_data struct and opens a mount file. We should free these
resources when the 9pfs device is unrealized. This is what this
patch does.

Signed-off-by: Li Qiang 
---
 hw/9pfs/9p-handle.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/9pfs/9p-handle.c b/hw/9pfs/9p-handle.c
index 3d77594..1687661 100644
--- a/hw/9pfs/9p-handle.c
+++ b/hw/9pfs/9p-handle.c
@@ -649,6 +649,14 @@ out:
 return ret;
 }
 
+static void handle_cleanup(FsContext *ctx)
+{
+struct handle_data *data = ctx->private;
+
+close(data->mountfd);
+g_free(data);
+}
+
 static int handle_parse_opts(QemuOpts *opts, struct FsDriverEntry *fse)
 {
 const char *sec_model = qemu_opt_get(opts, "security_model");
@@ -671,6 +679,7 @@ static int handle_parse_opts(QemuOpts *opts, struct 
FsDriverEntry *fse)
 FileOperations handle_ops = {
 .parse_opts   = handle_parse_opts,
 .init = handle_init,
+.cleanup  = handle_cleanup,
 .lstat= handle_lstat,
 .readlink = handle_readlink,
 .close= handle_close,
-- 
1.8.3.1

[Qemu-devel] [PATCH v3 2/4] 9pfs: add cleanup operation in FileOperations

2016-11-16 Thread Li Qiang

Currently, the backend of VirtFS doesn't have a cleanup
function. This will lead to resource leaks if the backend
driver allocates resources. This patch addresses this issue.

Signed-off-by: Li Qiang 
---

Changes since the v1:
-move the cleanup stuff above calls to g_free
-add cleanup call in the error path of realize if init was called

 fsdev/file-op-9p.h | 1 +
 hw/9pfs/9p.c   | 6 ++
 2 files changed, 7 insertions(+)

diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h
index 6db9fea..a56dc84 100644
--- a/fsdev/file-op-9p.h
+++ b/fsdev/file-op-9p.h
@@ -100,6 +100,7 @@ struct FileOperations
 {
 int (*parse_opts)(QemuOpts *, struct FsDriverEntry *);
 int (*init)(struct FsContext *);
+void (*cleanup)(struct FsContext *);
 int (*lstat)(FsContext *, V9fsPath *, struct stat *);
 ssize_t (*readlink)(FsContext *, V9fsPath *, char *, size_t);
 int (*chmod)(FsContext *, V9fsPath *, FsCred *);
diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 087b5c9..faebd91 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -3521,6 +3521,9 @@ int v9fs_device_realize_common(V9fsState *s, Error **errp)
 rc = 0;
 out:
 if (rc) {
+if (s->ops->cleanup && s->ctx.private) {
+s->ops->cleanup(&s->ctx);
+}
 g_free(s->tag);
 g_free(s->ctx.fs_root);
 v9fs_path_free(&path);
@@ -3530,6 +3533,9 @@ out:
 
 void v9fs_device_unrealize_common(V9fsState *s, Error **errp)
 {
+if (s->ops->cleanup) {
+s->ops->cleanup(&s->ctx);
+}
 g_free(s->tag);
 g_free(s->ctx.fs_root);
 }
-- 
1.8.3.1

[Qemu-devel] [PATCH v3 1/4] 9pfs: adjust the order of resource cleanup in device unrealize

2016-11-16 Thread Li Qiang

Unrealize should undo things that were set during realize in
reverse order. The error path in realize should do the same.

Signed-off-by: Li Qiang 
---

Changes since the v2:
-adjust the order in the error path in realize

 hw/9pfs/9p.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index aea7e9d..087b5c9 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -3521,8 +3521,8 @@ int v9fs_device_realize_common(V9fsState *s, Error **errp)
 rc = 0;
 out:
 if (rc) {
-g_free(s->ctx.fs_root);
 g_free(s->tag);
+g_free(s->ctx.fs_root);
 v9fs_path_free(&path);
 }
 return rc;
@@ -3530,8 +3530,8 @@ out:
 
 void v9fs_device_unrealize_common(V9fsState *s, Error **errp)
 {
-g_free(s->ctx.fs_root);
 g_free(s->tag);
+g_free(s->ctx.fs_root);
 }
 
 typedef struct VirtfsCoResetData {
-- 
1.8.3.1

[Qemu-devel] [PATCH v3 4/4] 9pfs: add cleanup operation for proxy backend driver

2016-11-16 Thread Li Qiang

The init operation of the proxy backend driver allocates a
V9fsProxy struct and some other resources. We should free these
resources when the 9pfs device is unrealized. This is what this
patch does.

Signed-off-by: Li Qiang 
---

Changes since the v2:
-only close proxy->sockfd if QEMU opened the fd
-do the cleanup work in reverse order

 hw/9pfs/9p-proxy.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/hw/9pfs/9p-proxy.c b/hw/9pfs/9p-proxy.c
index f2417b7..f4aa7a9 100644
--- a/hw/9pfs/9p-proxy.c
+++ b/hw/9pfs/9p-proxy.c
@@ -1168,9 +1168,22 @@ static int proxy_init(FsContext *ctx)
 return 0;
 }
 
+static void proxy_cleanup(FsContext *ctx)
+{
+V9fsProxy *proxy = ctx->private;
+
+g_free(proxy->out_iovec.iov_base);
+g_free(proxy->in_iovec.iov_base);
+if (ctx->export_flags & V9FS_PROXY_SOCK_NAME) {
+close(proxy->sockfd);
+}
+g_free(proxy);
+}
+
 FileOperations proxy_ops = {
 .parse_opts   = proxy_parse_opts,
 .init = proxy_init,
+.cleanup  = proxy_cleanup,
 .lstat= proxy_lstat,
 .readlink = proxy_readlink,
 .close= proxy_close,
-- 
1.8.3.1

[Qemu-devel] [PATCH v3 0/4] 9pfs: add cleanup operation in handle/proxy backend

2016-11-16 Thread Li Qiang

Currently, the backend of VirtFS doesn't have a cleanup
function. This leaks some resources in the handle and proxy
backend drivers. This patchset addresses this issue.

Li Qiang (4):
  9pfs: adjust the order of resource cleanup in device unrealize
  9pfs: add cleanup operation in FileOperations
  9pfs: add cleanup operation for handle backend driver
  9pfs: add cleanup operation for proxy backend driver

 fsdev/file-op-9p.h  |  1 +
 hw/9pfs/9p-handle.c |  9 +
 hw/9pfs/9p-proxy.c  | 13 +
 hw/9pfs/9p.c| 10 --
 4 files changed, 31 insertions(+), 2 deletions(-)

-- 
1.8.3.1
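
To illustrate the shape of the change outside of QEMU, here is a minimal
sketch of an optional cleanup callback that is invoked both on the realize
error path (only if init actually set up private state) and on unrealize.
All names (demo_ops, demo_ctx, demo_realize and so on) are hypothetical and
only loosely modelled on FileOperations and FsContext:

```c
#include <assert.h>
#include <stdlib.h>

struct demo_ctx {
    void *private_data;             /* set by init, released by cleanup */
};

struct demo_ops {
    int (*init)(struct demo_ctx *);
    void (*cleanup)(struct demo_ctx *);   /* optional, may be NULL */
};

static int live_allocs;             /* lets the caller observe leaks */

static int demo_init(struct demo_ctx *c)
{
    c->private_data = malloc(16);
    if (!c->private_data) {
        return -1;
    }
    live_allocs++;
    return 0;
}

static void demo_cleanup(struct demo_ctx *c)
{
    free(c->private_data);
    c->private_data = NULL;
    live_allocs--;
}

/* fail_late simulates an error that happens after init succeeded. */
static int demo_realize(const struct demo_ops *ops, struct demo_ctx *c,
                        int fail_late)
{
    int rc = ops->init(c);

    if (rc == 0 && fail_late) {
        rc = -1;
    }
    if (rc) {
        /* Only undo what init did, and only if the backend can. */
        if (ops->cleanup && c->private_data) {
            ops->cleanup(c);
        }
    }
    return rc;
}

static void demo_unrealize(const struct demo_ops *ops, struct demo_ctx *c)
{
    if (ops->cleanup) {
        ops->cleanup(c);
    }
}
```

Backends that allocate nothing simply leave the cleanup pointer NULL, which
is why both call sites check it first.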

Re: [Qemu-devel] [PATCH v14 11/22] vfio iommu: Add blocking notifier to notify DMA_UNMAP

2016-11-16 Thread Jike Song

On 11/17/2016 04:46 AM, Kirti Wankhede wrote:
> Added blocking notifier to IOMMU TYPE1 driver to notify vendor drivers
> about DMA_UNMAP.
> Exported two APIs vfio_register_notifier() and vfio_unregister_notifier().
> Notifier should be registered, if external user wants to use
> vfio_pin_pages()/vfio_unpin_pages() APIs to pin/unpin pages.
> Vendor driver should use VFIO_IOMMU_NOTIFY_DMA_UNMAP action to invalidate
> mappings.
> 
> Signed-off-by: Kirti Wankhede 
> Signed-off-by: Neo Jia 
> Change-Id: I5910d0024d6be87f3e8d3e0ca0eaeaaa0b17f271
> ---
>  drivers/vfio/vfio.c | 73 ++
>  drivers/vfio/vfio_iommu_type1.c | 77 
> +
>  include/linux/vfio.h| 12 +++
>  3 files changed, 147 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index bd36c16b0ef2..c850ba324be2 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1901,6 +1901,79 @@ err_unpin_pages:
>  }
>  EXPORT_SYMBOL(vfio_unpin_pages);
>  
> +int vfio_register_notifier(struct device *dev, struct notifier_block *nb)
> +{
> + struct vfio_container *container;
> + struct vfio_group *group;
> + struct vfio_iommu_driver *driver;
> + ssize_t ret;
> +

Any reason for using 'ssize_t' here (and in unregister)?

--
Thanks,
Jike
> + if (!dev || !nb)
> + return -EINVAL;
> +
> + group = vfio_group_get_from_dev(dev);
> + if (IS_ERR(group))
> + return PTR_ERR(group);
> +
> + ret = vfio_group_add_container_user(group);
> + if (ret)
> + goto err_register_nb;
> +
> + container = group->container;
> + down_read(&container->group_lock);
> +
> + driver = container->iommu_driver;
> + if (likely(driver && driver->ops->register_notifier))
> + ret = driver->ops->register_notifier(container->iommu_data, nb);
> + else
> + ret = -ENOTTY;
> +
> + up_read(&container->group_lock);
> + vfio_group_try_dissolve_container(group);
> +
> +err_register_nb:
> + vfio_group_put(group);
> + return ret;
> +}
> +EXPORT_SYMBOL(vfio_register_notifier);
> +
> +int vfio_unregister_notifier(struct device *dev, struct notifier_block *nb)
> +{
> + struct vfio_container *container;
> + struct vfio_group *group;
> + struct vfio_iommu_driver *driver;
> + ssize_t ret;
> +
> + if (!dev || !nb)
> + return -EINVAL;
> +
> + group = vfio_group_get_from_dev(dev);
> + if (IS_ERR(group))
> + return PTR_ERR(group);
> +
> + ret = vfio_group_add_container_user(group);
> + if (ret)
> + goto err_unregister_nb;
> +
> + container = group->container;
> + down_read(&container->group_lock);
> +
> + driver = container->iommu_driver;
> + if (likely(driver && driver->ops->unregister_notifier))
> + ret = driver->ops->unregister_notifier(container->iommu_data,
> +nb);
> + else
> + ret = -ENOTTY;
> +
> + up_read(&container->group_lock);
> + vfio_group_try_dissolve_container(group);
> +
> +err_unregister_nb:
> + vfio_group_put(group);
> + return ret;
> +}
> +EXPORT_SYMBOL(vfio_unregister_notifier);
> +
>  /**
>   * Module/class support
>   */
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 98191fc590f8..63fbc48a088f 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -38,6 +38,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define DRIVER_VERSION  "0.2"
>  #define DRIVER_AUTHOR   "Alex Williamson "
> @@ -60,6 +61,7 @@ struct vfio_iommu {
>   struct vfio_domain  *external_domain; /* domain for external user */
>   struct mutexlock;
>   struct rb_root  dma_list;
> + struct blocking_notifier_head notifier;
>   boolv2;
>   boolnesting;
>  };
> @@ -561,7 +563,8 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
>  
>   mutex_lock(>lock);
>  
> - if (!iommu->external_domain) {
> + /* Fail if notifier list is empty */
> + if ((!iommu->external_domain) || (!iommu->notifier.head)) {
>   ret = -EINVAL;
>   goto pin_done;
>   }
> @@ -776,9 +779,9 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>struct vfio_iommu_type1_dma_unmap *unmap)
>  {
>   uint64_t mask;
> - struct vfio_dma *dma;
> + struct vfio_dma *dma, *dma_last = NULL;
>   size_t unmapped = 0;
> - int ret = 0;
> + int ret = 0, retries;
>  
>   mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
>  
> @@ -788,7 +791,7 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>   return -EINVAL;
>  
>   WARN_ON(mask & PAGE_MASK);
> -
> +again:
>   mutex_lock(>lock);
>  
>

[Qemu-devel] Problem with QEMU PPC test image

2016-11-16 Thread Programmingkid

When I run this test disk image: 
http://wiki.qemu.org/download/ppc-virtexml507-linux-2_6_34.tgz

I see these error messages:
/selftest.sh: line 6: /usr/bin/sha1test: not found
/selftest.sh: line 7: /usr/bin/hmactest: not found


Maybe /usr/bin/sha1sum is what the first test should be. I don't see anything 
that looks like hmactest in the image file.

Re: [Qemu-devel] [PATCH v14 12/22] vfio: Add notifier callback to parent's ops structure of mdev

2016-11-16 Thread Dong Jia Shi

* Kirti Wankhede  [2016-11-17 02:16:24 +0530]:

Hi Kirti,

> diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
[...]

> @@ -51,6 +78,11 @@ static void vfio_mdev_release(void *device_data)
>   if (likely(parent->ops->release))
>   parent->ops->release(mdev);
> 
> + if (likely(parent->ops->notifier)) {
> + if (vfio_unregister_notifier(&mdev->dev, &mdev->nb))
> + pr_err("Failed to unregister notifier for mdev\n");
For the -ENOTTY case, we should not fail here either.

> + }
> +
>   module_put(THIS_MODULE);
>  }
> 
[...]

-- 
Dong Jia

Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-11-16 Thread Alexey Kardashevskiy

On 16/11/16 01:02, Andrea Bolognani wrote:
> On Tue, 2016-11-01 at 13:46 +1100, David Gibson wrote:
>> On Mon, Oct 31, 2016 at 03:10:23PM +1100, Alexey Kardashevskiy wrote:
>>>  
>>> On 31/10/16 13:53, David Gibson wrote:
  
 On Fri, Oct 28, 2016 at 12:07:12PM +0200, Greg Kurz wrote:
>  
> On Fri, 28 Oct 2016 18:56:40 +1100
> Alexey Kardashevskiy  wrote:
>  
>>  
>> At the moment sPAPR PHB creates a root bus of TYPE_PCI_BUS type.
>> This means that vfio-pci devices attached to it (and this is
>> a default behaviour) hide PCIe extended capabilities as
>> the bus does not pass a pci_bus_is_express(pdev->bus) check.
>>  
>> This change adds a default PCI bus type property to sPAPR PHB
>> and uses TYPE_PCIE_BUS if none passed; older machines get TYPE_PCI_BUS
>> for backward compatibility as a bus type is used in the bus name
>> so the root bus name becomes "pcie.0" instead of "pci.0".
>>  
>> Signed-off-by: Alexey Kardashevskiy 
>> ---
>>  
>> What can possibly go wrong with such a change of name?
>> From the devices' perspective, I cannot see anything.
>>  
>> libvirt might get upset as "pci.0" will not be available,
>> will it make sense to create pcie.0 as a root bus and always
>> add a PCIe->PCI bridge and name its bus "pci.0"?
>>  
>> Or create root bus from TYPE_PCIE_BUS and force name to "pci.0"?
>> pci_register_bus() can do this.
>>  
>>  
>> ---
>>   hw/ppc/spapr.c  | 5 +
>>   hw/ppc/spapr_pci.c  | 5 -
>>   include/hw/pci-host/spapr.h | 1 +
>>   3 files changed, 10 insertions(+), 1 deletion(-)
>>  
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 0b3820b..a268511 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -2541,6 +2541,11 @@ DEFINE_SPAPR_MACHINE(2_8, "2.8", true);
>>   .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE, \
>>   .property = "mem64_win_size",   \
>>   .value= "0",\
>> +},  \
>> +{   \
>> +.driver   = TYPE_SPAPR_PCI_HOST_BRIDGE, \
>> +.property = "root_bus_type",\
>> +.value= TYPE_PCI_BUS,   \
>>   },
>>   
>>   static void phb_placement_2_7(sPAPRMachineState *spapr, uint32_t index,
>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>> index 7cde30e..2fa1f22 100644
>> --- a/hw/ppc/spapr_pci.c
>> +++ b/hw/ppc/spapr_pci.c
>> @@ -1434,7 +1434,9 @@ static void spapr_phb_realize(DeviceState *dev, 
>> Error **errp)
>>   bus = pci_register_bus(dev, NULL,
>>  pci_spapr_set_irq, pci_spapr_map_irq, sphb,
>>  &sphb->memspace, &sphb->iospace,
>> -   PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS);
>> +   PCI_DEVFN(0, 0), PCI_NUM_PINS,
>> +   sphb->root_bus_type ? sphb->root_bus_type :
>> +   TYPE_PCIE_BUS);
>  
> Shouldn't we ensure that sphb->root_bus_type is either TYPE_PCIE_BUS or
> TYPE_PCI_BUS ?
  
 Yes, I think so.  In fact, I think it would be better to make the
 property a boolean that just selects PCI-E, rather than this which
 exposes qemu (semi-)internal type names on the command line.
>>>  
>>> Sure, a "pcie-root" boolean property should do.
>>>  
>>> However this is not my main concern, I rather wonder if we have to have
>>> pci.0 when we pick PCIe for the root.
>>  
>> Right.
>>  
>> I've added Andrea Bolognani to the CC list to get a libvirt perspective.
> 
> Thanks for doing so: changes such as this one can have quite
> an impact on the upper layers of the stack, so the earlier
> libvirt is involved in the discussion the better.
> 
> I'm going to go a step further and cross-post to libvir-list
> in order to give other libvirt contributors a chance to chime
> in too.
> 
>> Andrea,
>>  
>> To summarise the issue here:
>> * As I've said before the PAPR spec kinda-sorta abstracts the
>>   difference between vanilla PCI and PCI-E
>> * However, because within qemu we're declaring the bus as PCI that
>>   means some PCI-E devices aren't working right
>> * In particular it means that PCI-E extended config space isn't
>>   available
>>  
>> The proposal is to change (on newer machine types) the spapr PHB code
>> to declare a PCI-E bus instead.  AIUI this still won't make the root
>> complex guest visible (which it's not supposed to be under PAPR), and
>> the guest shouldn't see a difference in most cases - it will still see
>> the PAPR abstracted PCIish bus, but will now be able to get extended
>> config space.
>>  
>> The possible problem from

Re: [Qemu-devel] [PATCH] virtio-crypto: fix virtio_queue_set_notification() race

2016-11-16 Thread Gonglei (Arei)






> -----Original Message-----
> From: Stefan Hajnoczi [mailto:stefa...@redhat.com]
> Sent: Thursday, November 17, 2016 4:18 AM
> To: qemu-devel@nongnu.org
> Cc: Gonglei (Arei); Michael S. Tsirkin; Stefan Hajnoczi
> Subject: [PATCH] virtio-crypto: fix virtio_queue_set_notification() race
> 
> We must check for new virtqueue buffers after re-enabling notifications.
> This prevents the race condition where the guest added buffers just
> after we stopped popping the virtqueue but before we re-enabled
> notifications.
> 
> I think the virtio-crypto code was based on virtio-net but this crucial
> detail was missed.  virtio-net does not have the race condition because
> it processes the virtqueue one more time after re-enabling
> notifications.
> 
> Cc: Gonglei 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  hw/virtio/virtio-crypto.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
Reviewed-by: Gonglei 

Thanks,
-Gonglei

> diff --git a/hw/virtio/virtio-crypto.c b/hw/virtio/virtio-crypto.c
> index 3293843..847dc9d 100644
> --- a/hw/virtio/virtio-crypto.c
> +++ b/hw/virtio/virtio-crypto.c
> @@ -692,8 +692,17 @@ static void virtio_crypto_dataq_bh(void *opaque)
>  return;
>  }
> 
> -virtio_crypto_handle_dataq(vdev, q->dataq);
> -virtio_queue_set_notification(q->dataq, 1);
> +for (;;) {
> +virtio_crypto_handle_dataq(vdev, q->dataq);
> +virtio_queue_set_notification(q->dataq, 1);
> +
> +/* Are we done or did the guest add more buffers? */
> +if (virtio_queue_empty(q->dataq)) {
> +break;
> +}
> +
> +virtio_queue_set_notification(q->dataq, 0);
> +}
>  }
> 
>  static void
> --
> 2.7.4
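
A toy model of the race may help, assuming a queue where the guest only
kicks the device while notifications are enabled. Everything here
(demo_queue, demo_drain, demo_bh_racy, demo_bh_fixed) is hypothetical and
only imitates the structure of the patch, it is not virtio code:

```c
#include <assert.h>
#include <stdbool.h>

struct demo_queue {
    int pending;     /* buffers waiting to be popped */
    int late;        /* buffers the "guest" adds right after a drain */
    bool notify;     /* would the guest kick us for new buffers? */
    int processed;   /* buffers handled so far */
};

/* Pop everything that is queued, then let the simulated guest race us. */
static void demo_drain(struct demo_queue *q)
{
    q->processed += q->pending;
    q->pending = 0;
    /*
     * The race window: notifications are still off here, so these
     * buffers arrive without a kick.
     */
    if (q->late > 0) {
        q->pending = q->late;
        q->late = 0;
    }
}

/* Old behaviour: drain once, re-enable notifications, return. */
static void demo_bh_racy(struct demo_queue *q)
{
    demo_drain(q);
    q->notify = true;
}

/* New behaviour: re-check the queue after re-enabling notifications. */
static void demo_bh_fixed(struct demo_queue *q)
{
    for (;;) {
        demo_drain(q);
        q->notify = true;

        /* Are we done or did the guest add more buffers? */
        if (q->pending == 0) {
            break;
        }
        q->notify = false;
    }
}
```

In the racy variant the late buffers sit in the queue until some unrelated
kick arrives; the fixed variant picks them up before returning.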

Re: [Qemu-devel] [PATCH v13 12/22] vfio: Add notifier callback to parent's ops structure of mdev

2016-11-16 Thread Dong Jia Shi

* Kirti Wankhede  [2016-11-16 20:47:18 +0530]:

> 
> 
> On 11/16/2016 12:07 PM, Dong Jia Shi wrote:
> > * Kirti Wankhede  [2016-11-15 20:59:55 +0530]:
> > 
> > Hi Kirti,
> > 
> > [...]
> > 
> >> diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
> >> index ffc36758cb84..4fc63db38829 100644
> >> --- a/drivers/vfio/mdev/vfio_mdev.c
> >> +++ b/drivers/vfio/mdev/vfio_mdev.c
> >> @@ -24,6 +24,15 @@
> >>  #define DRIVER_AUTHOR   "NVIDIA Corporation"
> >>  #define DRIVER_DESC "VFIO based driver for Mediated device"
> >>
> >> +static int vfio_mdev_notifier(struct notifier_block *nb, unsigned long 
> >> action,
> >> +void *data)
> >> +{
> >> +  struct mdev_device *mdev = container_of(nb, struct mdev_device, nb);
> >> +  struct parent_device *parent = mdev->parent;
> >> +
> >> +  return parent->ops->notifier(mdev, action, data);
> >> +}
> >> +
> >>  static int vfio_mdev_open(void *device_data)
> >>  {
> >>struct mdev_device *mdev = device_data;
> >> @@ -36,9 +45,18 @@ static int vfio_mdev_open(void *device_data)
> >>if (!try_module_get(THIS_MODULE))
> >>return -ENODEV;
> >>
> >> +  if (likely(parent->ops->notifier)) {
> >> +  mdev->nb.notifier_call = vfio_mdev_notifier;
> >> +  if (vfio_register_notifier(&mdev->dev, &mdev->nb))
> >> +  pr_err("Failed to register notifier for mdev\n");
> > I think we should just return here if the error value is not -ENOTTY.
> > 
> 
> The iommu backend module might not support .register_notifier().
> In that case vfio_register_notifier() returns -ENOTTY, and that
> should not fail this open() call.
> Changing it to:
> 
> ret = vfio_register_notifier(&mdev->dev, &mdev->nb);
> if (ret && (ret != -ENOTTY)) {
> pr_err("Failed to register notifier for mdev\n");
> module_put(THIS_MODULE);
> return ret;
> }
Nod. And in this case we need not call vfio_unregister_notifier() once
an error occurs in open().

> 
> Thanks,
> Kirti
> 

-- 
Dong Jia
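
To spell out the error handling agreed on above, a minimal sketch, assuming
that -ENOTTY only means "the iommu backend has no notifier support". The
helpers demo_open() and demo_release_should_warn() are hypothetical and just
model the control flow; module_refs stands in for the module reference
count:

```c
#include <assert.h>
#include <errno.h>

/*
 * Model of open(): register_rc is what the notifier registration
 * returned. Only errors other than -ENOTTY fail the open.
 */
static int demo_open(int register_rc, int *module_refs)
{
    (*module_refs)++;                 /* stands in for try_module_get() */

    if (register_rc && register_rc != -ENOTTY) {
        (*module_refs)--;             /* stands in for module_put() */
        return register_rc;
    }
    return 0;
}

/* Model of release(): -ENOTTY from unregister is not worth an error. */
static int demo_release_should_warn(int unregister_rc)
{
    return unregister_rc && unregister_rc != -ENOTTY;
}
```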

[Qemu-devel] [PATCH] arm: Create /chosen and /memory devicetree nodes if necessary

2016-11-16 Thread Guenter Roeck

While customary, the /chosen and /memory devicetree nodes do not have to
exist. Create them if necessary. Also create the /memory/device_type
property if needed.

Signed-off-by: Guenter Roeck <li...@roeck-us.net>
---
The problem is seen with the latest version of the Linux kernel in
linux-next (next-20161116), where many of the /chosen and /memory nodes
in arm devicetree files have been removed. This results in a kernel hang
(sabrelite) or qemu abort (imx25-pdk).

 hw/arm/boot.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 942416d..ff621e4 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -9,6 +9,7 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
+#include <libfdt.h>
 #include "hw/hw.h"
 #include "hw/arm/arm.h"
 #include "hw/arm/linux-boot-if.h"
@@ -486,6 +487,17 @@ static int load_dtb(hwaddr addr, const struct 
arm_boot_info *binfo,
 g_free(nodename);
 }
 } else {
+Error *err = NULL;
+
+rc = fdt_path_offset(fdt, "/memory");
+if (rc < 0) {
+qemu_fdt_add_subnode(fdt, "/memory");
+}
+
+if (!qemu_fdt_getprop(fdt, "/memory", "device_type", NULL, &err)) {
+qemu_fdt_setprop_string(fdt, "/memory", "device_type", "memory");
+}
+
 rc = qemu_fdt_setprop_sized_cells(fdt, "/memory", "reg",
   acells, binfo->loader_start,
   scells, binfo->ram_size);
@@ -495,6 +507,11 @@ static int load_dtb(hwaddr addr, const struct 
arm_boot_info *binfo,
 }
 }
 
+rc = fdt_path_offset(fdt, "/chosen");
+if (rc < 0) {
+qemu_fdt_add_subnode(fdt, "/chosen");
+}
+
 if (binfo->kernel_cmdline && *binfo->kernel_cmdline) {
 rc = qemu_fdt_setprop_string(fdt, "/chosen", "bootargs",
  binfo->kernel_cmdline);
-- 
2.5.0

[Qemu-devel] [QEMU PATCH v13 4/4] migration: add error_report

2016-11-16 Thread Jianjun Duan

Add error_report() calls where version_ids do not match in vmstate_load_state().

Signed-off-by: Jianjun Duan 
---
 migration/vmstate.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/migration/vmstate.c b/migration/vmstate.c
index 2f9d4ba..0e6fce4 100644
--- a/migration/vmstate.c
+++ b/migration/vmstate.c
@@ -85,6 +85,7 @@ int vmstate_load_state(QEMUFile *f, const VMStateDescription 
*vmsd,
 
 trace_vmstate_load_state(vmsd->name, version_id);
 if (version_id > vmsd->version_id) {
+error_report("%s %s",  vmsd->name, "too new");
 trace_vmstate_load_state_end(vmsd->name, "too new", -EINVAL);
 return -EINVAL;
 }
@@ -95,6 +96,7 @@ int vmstate_load_state(QEMUFile *f, const VMStateDescription 
*vmsd,
 trace_vmstate_load_state_end(vmsd->name, "old path", ret);
 return ret;
 }
+error_report("%s %s",  vmsd->name, "too old");
 trace_vmstate_load_state_end(vmsd->name, "too old", -EINVAL);
 return -EINVAL;
 }
-- 
1.9.1

[Qemu-devel] [QEMU PATCH v13 3/4] tests/migration: Add test for QTAILQ migration

2016-11-16 Thread Jianjun Duan

Add a test for QTAILQ migration to tests/test-vmstate.c.

Signed-off-by: Jianjun Duan 
---
 tests/test-vmstate.c | 160 +++
 1 file changed, 160 insertions(+)

diff --git a/tests/test-vmstate.c b/tests/test-vmstate.c
index d2f529b..88aab8c 100644
--- a/tests/test-vmstate.c
+++ b/tests/test-vmstate.c
@@ -544,6 +544,163 @@ static void test_arr_ptr_str_no0_load(void)
 }
 }
 
+/* test QTAILQ migration */
+typedef struct TestQtailqElement TestQtailqElement;
+
+struct TestQtailqElement {
+bool b;
+uint8_t  u8;
+QTAILQ_ENTRY(TestQtailqElement) next;
+};
+
+typedef struct TestQtailq {
+int16_t  i16;
+QTAILQ_HEAD(TestQtailqHead, TestQtailqElement) q;
+int32_t  i32;
+} TestQtailq;
+
+static const VMStateDescription vmstate_q_element = {
+.name = "test/queue-element",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_BOOL(b, TestQtailqElement),
+VMSTATE_UINT8(u8, TestQtailqElement),
+VMSTATE_END_OF_LIST()
+},
+};
+
+static const VMStateDescription vmstate_q = {
+.name = "test/queue",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_INT16(i16, TestQtailq),
+VMSTATE_QTAILQ_V(q, TestQtailq, 1, vmstate_q_element, 
TestQtailqElement,
+ next),
+VMSTATE_INT32(i32, TestQtailq),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static void test_save_q(void)
+{
+TestQtailq obj_q = {
+.i16 = -512,
+.i32 = 70000,
+};
+
+TestQtailqElement obj_qe1 = {
+.b = true,
+.u8 = 130,
+};
+
+TestQtailqElement obj_qe2 = {
+.b = false,
+.u8 = 65,
+};
+
+uint8_t wire_q[] = {
+/* i16 */ 0xfe, 0x0,
+/* start of element 0 of q */ 0x01,
+/* .b  */ 0x01,
+/* .u8 */ 0x82,
+/* start of element 1 of q */ 0x01,
+/* b */   0x00,
+/* u8 */  0x41,
+/* end of q */0x00,
+/* i32 */ 0x00, 0x01, 0x11, 0x70,
+QEMU_VM_EOF, /* just to ensure we won't get EOF reported prematurely */
+};
+
+QTAILQ_INIT(&obj_q.q);
+QTAILQ_INSERT_TAIL(&obj_q.q, &obj_qe1, next);
+QTAILQ_INSERT_TAIL(&obj_q.q, &obj_qe2, next);
+
+save_vmstate(&vmstate_q, &obj_q);
+compare_vmstate(wire_q, sizeof(wire_q));
+}
+
+static void test_load_q(void)
+{
+TestQtailq obj_q = {
+.i16 = -512,
+.i32 = 70000,
+};
+
+TestQtailqElement obj_qe1 = {
+.b = true,
+.u8 = 130,
+};
+
+TestQtailqElement obj_qe2 = {
+.b = false,
+.u8 = 65,
+};
+
+uint8_t wire_q[] = {
+/* i16 */ 0xfe, 0x0,
+/* start of element 0 of q */ 0x01,
+/* .b  */ 0x01,
+/* .u8 */ 0x82,
+/* start of element 1 of q */ 0x01,
+/* b */   0x00,
+/* u8 */  0x41,
+/* end of q */0x00,
+/* i32 */ 0x00, 0x01, 0x11, 0x70,
+};
+
+QTAILQ_INIT(&obj_q.q);
+QTAILQ_INSERT_TAIL(&obj_q.q, &obj_qe1, next);
+QTAILQ_INSERT_TAIL(&obj_q.q, &obj_qe2, next);
+
+QEMUFile *fsave = open_test_file(true);
+
+qemu_put_buffer(fsave, wire_q, sizeof(wire_q));
+qemu_put_byte(fsave, QEMU_VM_EOF);
+g_assert(!qemu_file_get_error(fsave));
+qemu_fclose(fsave);
+
+QEMUFile *fload = open_test_file(false);
+TestQtailq tgt;
+
+QTAILQ_INIT(&tgt.q);
+vmstate_load_state(fload, &vmstate_q, &tgt, 1);
+char eof = qemu_get_byte(fload);
+g_assert(!qemu_file_get_error(fload));
+g_assert_cmpint(tgt.i16, ==, obj_q.i16);
+g_assert_cmpint(tgt.i32, ==, obj_q.i32);
+g_assert_cmpint(eof, ==, QEMU_VM_EOF);
+
+TestQtailqElement *qele_from = QTAILQ_FIRST(&obj_q.q);
+TestQtailqElement *qlast_from = QTAILQ_LAST(&obj_q.q, TestQtailqHead);
+TestQtailqElement *qele_to = QTAILQ_FIRST(&tgt.q);
+TestQtailqElement *qlast_to = QTAILQ_LAST(&tgt.q, TestQtailqHead);
+
+while (1) {
+g_assert_cmpint(qele_to->b, ==, qele_from->b);
+g_assert_cmpint(qele_to->u8, ==, qele_from->u8);
+if ((qele_from == qlast_from) || (qele_to == qlast_to)) {
+break;
+}
+qele_from = QTAILQ_NEXT(qele_from, next);
+qele_to = QTAILQ_NEXT(qele_to, next);
+}
+
+g_assert_cmpint((uint64_t) qele_from, ==, (uint64_t) qlast_from);
+g_assert_cmpint((uint64_t) qele_to, ==, (uint64_t) qlast_to);
+
+/* clean up */
+TestQtailqElement *qele;
+while (!QTAILQ_EMPTY(&tgt.q)) {
+qele = QTAILQ_LAST(&tgt.q, TestQtailqHead);
+QTAILQ_REMOVE(&tgt.q, qele, next);
+free(qele);
+qele = NULL;
+}
+qemu_fclose(fload);
+}
+
 int main(int argc, char **argv)
 {
 temp_fd = mkstemp(temp_file);
@@ -562,6
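
A note on the wire layout that test_save_q and test_load_q check. Judging
by the wire_q bytes, scalars are written big-endian (0xfe 0x00 is -512 and
0x00 0x01 0x11 0x70 is 70000), each queue element is preceded by a 0x01
marker, and a 0x00 byte ends the queue. A minimal standalone sketch of that
encoding follows. The names (DemoElem, demo_put_be16, demo_put_be32,
demo_put_q) are made up for illustration, and this is not the QEMU
implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element type mirroring the two fields used in the test. */
typedef struct DemoElem {
    uint8_t b;      /* boolean stored as a single byte */
    uint8_t u8;
} DemoElem;

/* Write a 16-bit value in big-endian order; returns the bytes written. */
static size_t demo_put_be16(uint8_t *out, uint16_t v)
{
    out[0] = (uint8_t)(v >> 8);
    out[1] = (uint8_t)(v & 0xff);
    return 2;
}

/* Write a 32-bit value in big-endian order; returns the bytes written. */
static size_t demo_put_be32(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)((v >> 16) & 0xff);
    out[2] = (uint8_t)((v >> 8) & 0xff);
    out[3] = (uint8_t)(v & 0xff);
    return 4;
}

/*
 * Encode i16, then a 0x01 marker plus fields for each element, then a
 * 0x00 end-of-queue marker, then i32.  The caller must supply a buffer
 * of at least 2 + 3 * n + 1 + 4 bytes.
 */
static size_t demo_put_q(uint8_t *out, int16_t i16,
                         const DemoElem *elems, size_t n, int32_t i32)
{
    size_t pos = 0;

    pos += demo_put_be16(out + pos, (uint16_t)i16);
    for (size_t i = 0; i < n; i++) {
        out[pos++] = 0x01;          /* "another element follows" */
        out[pos++] = elems[i].b;
        out[pos++] = elems[i].u8;
    }
    out[pos++] = 0x00;              /* end of queue */
    pos += demo_put_be32(out + pos, (uint32_t)i32);
    return pos;
}
```

Calling demo_put_q() with -512, the elements {1, 130} and {0, 65}, and
70000 should reproduce the 13 payload bytes of wire_q (without the trailing
QEMU_VM_EOF).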

[Qemu-devel] [QEMU PATCH v13 2/4] migration: migrate QTAILQ

2016-11-16 Thread Jianjun Duan

Currently we cannot directly transfer a QTAILQ instance because of a
limitation in the migration code. Here we introduce an approach to
transfer such structures: a new VMStateInfo, vmstate_info_qtailq, for
QTAILQ. A similar VMStateInfo can be created for other data structures,
such as lists.

When a QTAILQ is migrated from source to target, it is appended to the
corresponding QTAILQ structure, which is assumed to have been properly
initialized.

This approach will be used to transfer pending_events and ccs_list in spapr
state.

We also create some macros in qemu/queue.h to access a QTAILQ using pointer
arithmetic. This ensures that we do not depend on the implementation
details about QTAILQ in the migration code.

Signed-off-by: Jianjun Duan 
---
 include/migration/vmstate.h | 20 +
 include/qemu/queue.h| 60 +++
 migration/trace-events  |  4 +++
 migration/vmstate.c | 69 +
 4 files changed, 153 insertions(+)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index eafc8f2..e47ad6e 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -253,6 +253,7 @@ extern const VMStateInfo vmstate_info_timer;
 extern const VMStateInfo vmstate_info_buffer;
 extern const VMStateInfo vmstate_info_unused_buffer;
 extern const VMStateInfo vmstate_info_bitmap;
+extern const VMStateInfo vmstate_info_qtailq;
 
 #define type_check_2darray(t1,t2,n,m) ((t1(*)[n][m])0 - (t2*)0)
 #define type_check_array(t1,t2,n) ((t1(*)[n])0 - (t2*)0)
@@ -664,6 +665,25 @@ extern const VMStateInfo vmstate_info_bitmap;
 .offset   = offsetof(_state, _field),\
 }
 
+/* For migrating a QTAILQ.
+ * Target QTAILQ needs to be properly initialized.
+ * _type: type of QTAILQ element
+ * _next: name of QTAILQ entry field in QTAILQ element
+ * _vmsd: VMSD for QTAILQ element
+ * size: size of QTAILQ element
+ * start: offset of QTAILQ entry in QTAILQ element
+ */
+#define VMSTATE_QTAILQ_V(_field, _state, _version, _vmsd, _type, _next)  \
+{\
+.name = (stringify(_field)), \
+.version_id   = (_version),  \
+.vmsd = &(_vmsd),\
+.size = sizeof(_type),   \
+.info = &vmstate_info_qtailq,\
+.offset   = offsetof(_state, _field),\
+.start= offsetof(_type, _next),  \
+}
+
 /* _f : field name
_f_n : num of elements field_name
_n : num of elements
diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index 342073f..35292c3 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -438,4 +438,64 @@ struct { \
 #define QTAILQ_PREV(elm, headname, field) \
 (*(((struct headname *)((elm)->field.tqe_prev))->tqh_last))
 
+#define field_at_offset(base, offset, type) \
+((type) (((char *) (base)) + (offset)))
+
+typedef struct DUMMY_Q_ENTRY DUMMY_Q_ENTRY;
+typedef struct DUMMY_Q DUMMY_Q;
+
+struct DUMMY_Q_ENTRY {
+QTAILQ_ENTRY(DUMMY_Q_ENTRY) next;
+};
+
+struct DUMMY_Q {
+QTAILQ_HEAD(DUMMY_Q_HEAD, DUMMY_Q_ENTRY) head;
+};
+
+#define dummy_q ((DUMMY_Q *) 0)
+#define dummy_qe ((DUMMY_Q_ENTRY *) 0)
+
+/*
+ * Offsets of layout of a tail queue head.
+ */
+#define QTAILQ_FIRST_OFFSET (offsetof(typeof(dummy_q->head), tqh_first))
+#define QTAILQ_LAST_OFFSET  (offsetof(typeof(dummy_q->head), tqh_last))
+/*
+ * Raw access of elements of a tail queue
+ */
+#define QTAILQ_RAW_FIRST(head) \
+(*field_at_offset(head, QTAILQ_FIRST_OFFSET, void **))
+#define QTAILQ_RAW_TQH_LAST(head) \
+(*field_at_offset(head, QTAILQ_LAST_OFFSET, void ***))
+
+/*
+ * Offsets of layout of a tail queue element.
+ */
+#define QTAILQ_NEXT_OFFSET (offsetof(typeof(dummy_qe->next), tqe_next))
+#define QTAILQ_PREV_OFFSET (offsetof(typeof(dummy_qe->next), tqe_prev))
+
+/*
+ * Raw access of elements of a tail entry
+ */
+#define QTAILQ_RAW_NEXT(elm, entry) \
+(*field_at_offset(elm, entry + QTAILQ_NEXT_OFFSET, void **))
+#define QTAILQ_RAW_TQE_PREV(elm, entry) \
+(*field_at_offset(elm, entry + QTAILQ_PREV_OFFSET, void ***))
+/*
+ * Tail queue traversal using pointer arithmetic.
+ */
+#define QTAILQ_RAW_FOREACH(elm, head, entry) \
+for ((elm) = QTAILQ_RAW_FIRST(head); \
+ (elm);
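
The idea behind field_at_offset and the QTAILQ_RAW_* macros is that generic
code can walk a queue knowing only the byte offset of the link field inside
each element, not the element type. A minimal standalone sketch of that
technique, using a simplified singly linked queue. DemoLink, DemoNode,
demo_field_at_offset, demo_raw_next and demo_raw_count are hypothetical
names, and this is not the queue.h implementation.

```c
#include <stddef.h>

/* Simplified link field; a real QTAILQ entry also keeps a back pointer. */
typedef struct DemoLink {
    void *next;     /* points at the next element, not at its link field */
} DemoLink;

typedef struct DemoNode {
    int value;
    DemoLink link;
} DemoNode;

/* Same shape as field_at_offset in the patch: base plus byte offset. */
#define demo_field_at_offset(base, offset, type) \
    ((type) (((char *) (base)) + (offset)))

/* Read an element's "next" pointer given only the offset of its link. */
static void *demo_raw_next(void *elm, size_t link_offset)
{
    DemoLink *l = demo_field_at_offset(elm, link_offset, DemoLink *);
    return l->next;
}

/* Count the elements of a queue without knowing the element type. */
static size_t demo_raw_count(void *first, size_t link_offset)
{
    size_t n = 0;

    for (void *elm = first; elm; elm = demo_raw_next(elm, link_offset)) {
        n++;
    }
    return n;
}
```

A caller passes offsetof(DemoNode, link) as link_offset, which plays the
role of the .start value that VMSTATE_QTAILQ_V computes with
offsetof(_type, _next).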

[Qemu-devel] [QEMU PATCH v13 1/4] migration: extend VMStateInfo

2016-11-16 Thread Jianjun Duan

The current migration code cannot handle some data structures, such as
QTAILQ in qemu/queue.h. Here we extend the signatures of put/get in
VMStateInfo so that customized handling is supported: both now take the
VMStateField being processed, and put also takes the QJSON description.
put now returns int.

Signed-off-by: Jianjun Duan 
---
 hw/display/virtio-gpu.c |   8 +++-
 hw/intc/s390_flic_kvm.c |   8 +++-
 hw/net/vmxnet3.c|  24 +++---
 hw/nvram/eeprom93xx.c   |   8 +++-
 hw/nvram/fw_cfg.c   |   8 +++-
 hw/pci/msix.c   |   8 +++-
 hw/pci/pci.c|  16 +--
 hw/pci/shpc.c   |   7 ++-
 hw/scsi/scsi-bus.c  |   8 +++-
 hw/timer/twl92230.c |   8 +++-
 hw/usb/redirect.c   |  24 +++---
 hw/virtio/virtio-pci.c  |   8 +++-
 hw/virtio/virtio.c  |  15 --
 include/migration/vmstate.h |  19 ++--
 migration/savevm.c  |   7 ++-
 migration/vmstate.c | 113 +---
 target-alpha/machine.c  |   6 ++-
 target-arm/machine.c|  14 --
 target-i386/machine.c   |  26 +++---
 target-mips/machine.c   |  14 --
 target-ppc/machine.c|  12 +++--
 target-sparc/machine.c  |   6 ++-
 22 files changed, 262 insertions(+), 105 deletions(-)

diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 60bce94..c58fa1b 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -988,7 +988,8 @@ static const VMStateDescription vmstate_virtio_gpu_scanouts 
= {
 },
 };
 
-static void virtio_gpu_save(QEMUFile *f, void *opaque, size_t size)
+static int virtio_gpu_save(QEMUFile *f, void *opaque, size_t size,
+   VMStateField *field, QJSON *vmdesc)
 {
 VirtIOGPU *g = opaque;
 struct virtio_gpu_simple_resource *res;
@@ -1013,9 +1014,12 @@ static void virtio_gpu_save(QEMUFile *f, void *opaque, 
size_t size)
 qemu_put_be32(f, 0); /* end of list */
 
 vmstate_save_state(f, &vmstate_virtio_gpu_scanouts, g, NULL);
+
+return 0;
 }
 
-static int virtio_gpu_load(QEMUFile *f, void *opaque, size_t size)
+static int virtio_gpu_load(QEMUFile *f, void *opaque, size_t size,
+   VMStateField *field)
 {
 VirtIOGPU *g = opaque;
 struct virtio_gpu_simple_resource *res;
diff --git a/hw/intc/s390_flic_kvm.c b/hw/intc/s390_flic_kvm.c
index 21ac2e2..61f512f 100644
--- a/hw/intc/s390_flic_kvm.c
+++ b/hw/intc/s390_flic_kvm.c
@@ -286,7 +286,8 @@ static void kvm_s390_release_adapter_routes(S390FLICState 
*fs,
  * increase until buffer is sufficient or maxium size is
  * reached
  */
-static void kvm_flic_save(QEMUFile *f, void *opaque, size_t size)
+static int kvm_flic_save(QEMUFile *f, void *opaque, size_t size,
+ VMStateField *field, QJSON *vmdesc)
 {
 KVMS390FLICState *flic = opaque;
 int len = FLIC_SAVE_INITIAL_SIZE;
@@ -319,6 +320,8 @@ static void kvm_flic_save(QEMUFile *f, void *opaque, size_t 
size)
 count * sizeof(struct kvm_s390_irq));
 }
 g_free(buf);
+
+return 0;
 }
 
 /**
@@ -331,7 +334,8 @@ static void kvm_flic_save(QEMUFile *f, void *opaque, size_t 
size)
  * Note: Do nothing when no interrupts where stored
  * in QEMUFile
  */
-static int kvm_flic_load(QEMUFile *f, void *opaque, size_t size)
+static int kvm_flic_load(QEMUFile *f, void *opaque, size_t size,
+ VMStateField *field)
 {
 uint64_t len = 0;
 uint64_t count = 0;
diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 92f6af9..4163ca8 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -2451,7 +2451,8 @@ static void vmxnet3_put_tx_stats_to_file(QEMUFile *f,
 qemu_put_be64(f, tx_stat->pktsTxDiscard);
 }
 
-static int vmxnet3_get_txq_descr(QEMUFile *f, void *pv, size_t size)
+static int vmxnet3_get_txq_descr(QEMUFile *f, void *pv, size_t size,
+VMStateField *field)
 {
 Vmxnet3TxqDescr *r = pv;
 
@@ -2465,7 +2466,8 @@ static int vmxnet3_get_txq_descr(QEMUFile *f, void *pv, 
size_t size)
 return 0;
 }
 
-static void vmxnet3_put_txq_descr(QEMUFile *f, void *pv, size_t size)
+static int vmxnet3_put_txq_descr(QEMUFile *f, void *pv, size_t size,
+ VMStateField *field, QJSON *vmdesc)
 {
 Vmxnet3TxqDescr *r = pv;
 
@@ -2474,6 +2476,8 @@ static void vmxnet3_put_txq_descr(QEMUFile *f, void *pv, 
size_t size)
 qemu_put_byte(f, r->intr_idx);
 qemu_put_be64(f, r->tx_stats_pa);
 vmxnet3_put_tx_stats_to_file(f, &r->txq_stats);
+
+return 0;
 }
 
 static const VMStateInfo txq_descr_info = {
@@ -2512,7 +2516,8 @@ static void vmxnet3_put_rx_stats_to_file(QEMUFile *f,
 qemu_put_be64(f, rx_stat->pktsRxError);
 }
 
-static int vmxnet3_get_rxq_descr(QEMUFile *f, void *pv, size_t size)
+static int vmxnet3_get_rxq_descr(QEMUFile *f, void *pv, size_t size,
+VMStateField *field)
 {
 Vmxnet3RxqDescr *r = pv;
 int i;
@@ -2530,7 +2535,8 @@ static int vmxnet3_get_rxq_descr(QEMUFile *f, void *pv, 
size_t size)
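
What the extended signatures give the caller, in sketch form: once put
returns int and receives the field descriptor, a save loop can stop at the
first failing field and pass the error up. The types below (DemoField,
DemoInfo) and the helpers (demo_put_bytes, demo_put_fail, demo_save_fields)
are invented for this illustration and are much simpler than VMStateField
and VMStateInfo.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct DemoField DemoField;

/* Hypothetical counterpart of VMStateInfo: put reports success/failure. */
typedef struct DemoInfo {
    int (*put)(uint8_t *out, size_t *pos, const void *pv,
               const DemoField *field);
} DemoInfo;

struct DemoField {
    const char *name;
    size_t offset;          /* offset of the field inside the state */
    size_t size;            /* size of the field in bytes */
    const DemoInfo *info;
};

/* Copy the raw bytes of the field; uses field->size from the descriptor. */
static int demo_put_bytes(uint8_t *out, size_t *pos, const void *pv,
                          const DemoField *field)
{
    const uint8_t *src = pv;

    for (size_t i = 0; i < field->size; i++) {
        out[(*pos)++] = src[i];
    }
    return 0;
}

/* A handler that always fails, to show the error path. */
static int demo_put_fail(uint8_t *out, size_t *pos, const void *pv,
                         const DemoField *field)
{
    (void)out; (void)pos; (void)pv; (void)field;
    return -1;
}

/* Save every field in order; stop and return the first nonzero result. */
static int demo_save_fields(uint8_t *out, size_t *pos, const void *state,
                            const DemoField *fields, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const void *pv = (const char *)state + fields[i].offset;
        int ret = fields[i].info->put(out, pos, pv, &fields[i]);

        if (ret) {
            return ret;
        }
    }
    return 0;
}
```

With a void put, as before this patch, demo_save_fields() would have no
way to learn that a handler gave up.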

[Qemu-devel] [QEMU PATCH v13 0/4] migration: migrate QTAILQ

2016-11-16 Thread Jianjun Duan

Hi all,

I addressed some review comments. Comments are welcome. 

v13: - Changed some QTAILQ related macro names to match existing ones. 

Previous versions are:

v12: - Fixed type for put_qtailq which caused build break.
(link: http://lists.gnu.org/archive/html/qemu-devel/2016-11/msg01328.html)

v11: - Split error_report statements into a separate patch.
 - Changed the signature of put. It now returns int type.
 - Minor changes to QTAILQ macros. 
 
v10: - Fixed a typo.
(link: http://lists.nongnu.org/archive/html/qemu-ppc/2016-10/msg01206.html)

v9: - No more hard-coding of QTAILQ layout information.
(link: http://lists.nongnu.org/archive/html/qemu-ppc/2016-10/msg01042.html)

v8: - Fixed a style issue. 
(link: http://lists.nongnu.org/archive/html/qemu-ppc/2016-10/msg00874.html)

v7: - Fixed merge errors.
- Simplified macro definitions related to pointer arithmetic based QTAILQ
  access.
- Added test case for QTAILQ migration in tests/test-vmstate.c.
(link: http://lists.nongnu.org/archive/html/qemu-ppc/2016-10/msg00711.html)


v6: - Split from Power specific patches. 
- Dropped VMS_LINKED flag.
- Rebased to master.
- Added comments to clarify about put/get in VMStateInfo.  
(link: http://lists.nongnu.org/archive/html/qemu-ppc/2016-10/msg00336.html)

v5: - Rebased to David's ppc-for-2.8. 
(link: https://lists.nongnu.org/archive/html/qemu-devel/2016-10/msg00270.html)

v4: - Introduce a way to set customized instance_id in SaveStateEntry. Use it
  to set instance_id for DRC using its unique index to address David 
  Gibson's concern.
- Rename VMS_CSTM to VMS_LINKED based on Paolo Bonzini's suggestions.
- Clean up qjson stuff in put_qtailq. 
- Add trace for put_qtailq and get_qtailq based on David Gilbert's 
  suggestion.
- Based on David's ppc-for-2.7. 
(link: https://lists.nongnu.org/archive/html/qemu-devel/2016-06/msg07720.html)

v3: - Simplify overall design following discussion with Paolo. No longer need
  metadata to migrate QTAILQ.
- Extend VMStateInfo instead of adding similar fields to VMStateField.
- Clean up macros in qemu/queue.h.
(link: https://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg05695.html)

v2: - Introduce a general approach to migrate QTAILQ in qemu/queue.h.
- Migrate signalled field in the DRC state.
- Put the newly added migrating fields in subsections so that backward 
  migration is not broken.  
- Set detach_cb field right after migration so that a migrated hot-unplug
  event could finish its course.
(link: https://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg04188.html)

v1: - Initial version.
(link: https://lists.nongnu.org/archive/html/qemu-devel/2016-04/msg02601.html)


Jianjun Duan (4):
  migration: extend VMStateInfo
  migration: migrate QTAILQ
  tests/migration: Add test for QTAILQ migration
  migration: add error_report

 hw/display/virtio-gpu.c |   8 +-
 hw/intc/s390_flic_kvm.c |   8 +-
 hw/net/vmxnet3.c|  24 --
 hw/nvram/eeprom93xx.c   |   8 +-
 hw/nvram/fw_cfg.c   |   8 +-
 hw/pci/msix.c   |   8 +-
 hw/pci/pci.c|  16 +++-
 hw/pci/shpc.c   |   7 +-
 hw/scsi/scsi-bus.c  |   8 +-
 hw/timer/twl92230.c |   8 +-
 hw/usb/redirect.c   |  24 --
 hw/virtio/virtio-pci.c  |   8 +-
 hw/virtio/virtio.c  |  15 +++-
 include/migration/vmstate.h |  39 --
 include/qemu/queue.h|  60 +++
 migration/savevm.c  |   7 +-
 migration/trace-events  |   4 +
 migration/vmstate.c | 184 +++-
 target-alpha/machine.c  |   6 +-
 target-arm/machine.c|  14 +++-
 target-i386/machine.c   |  26 +--
 target-mips/machine.c   |  14 +++-
 target-ppc/machine.c|  12 ++-
 target-sparc/machine.c  |   6 +-
 tests/test-vmstate.c| 160 ++
 25 files changed, 577 insertions(+), 105 deletions(-)

-- 
1.9.1

Re: [Qemu-devel] [PATCH v14 11/22] vfio iommu: Add blocking notifier to notify DMA_UNMAP

2016-11-16 Thread Alex Williamson

On Thu, 17 Nov 2016 02:16:23 +0530
Kirti Wankhede  wrote:
> @@ -1321,12 +1350,11 @@ static void vfio_iommu_unmap_unpin_reaccount(struct 
> vfio_iommu *iommu)
>  
>  static void vfio_sanity_check_pfn_list(struct vfio_iommu *iommu)
>  {
> - struct rb_node *n, *p;
> + struct rb_node *n;
>  
>   n = rb_first(&iommu->dma_list);
>   for (; n; n = rb_next(n)) {
>   struct vfio_dma *dma;
> - int unlocked = 0;
>  
>   dma = rb_entry(n, struct vfio_dma, node);
>  

This chunk really should have been part of 10/22 as well.

Re: [Qemu-devel] [PATCH v14 10/22] vfio iommu type1: Add support for mediated devices

2016-11-16 Thread Alex Williamson

On Thu, 17 Nov 2016 02:16:22 +0530
Kirti Wankhede  wrote:
> @@ -931,6 +1344,24 @@ static void vfio_iommu_type1_detach_group(void 
> *iommu_data,
>  
>   mutex_lock(&iommu->lock);
>  
> + if (iommu->external_domain) {
> + group = find_iommu_group(iommu->external_domain, iommu_group);
> + if (group) {
> + list_del(&group->next);
> + kfree(group);
> +
> + if (list_empty(&iommu->external_domain->group_list)) {
> + vfio_sanity_check_pfn_list(iommu);
> +
> + if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu))
> + vfio_iommu_unmap_unpin_all(iommu);
> +
> + kfree(iommu->external_domain);

I commented elsewhere that I didn't understand why we were checking
iommu->external_domain before walking the pfn_list, but we still have
several if (iommu->external_domain) checks in place, so I think we had
better set it to NULL after we free it.

I haven't finished my review yet, but if this ends up being the only
comment and you agree, I can add:

iommu->external_domain = NULL;

here on commit.  Thanks,

Alex
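
The concern above is the usual dangling-pointer one: later
if (iommu->external_domain) checks are only meaningful if the pointer is
cleared when the domain is freed. A tiny standalone sketch of the pattern,
with hypothetical DemoIommu and DemoDomain types standing in for the kernel
structures and free() standing in for kfree():

```c
#include <stdlib.h>

typedef struct DemoDomain {
    int group_count;
} DemoDomain;

typedef struct DemoIommu {
    DemoDomain *external_domain;
} DemoIommu;

/* Returns 1 if a domain was released, 0 if there was none to release. */
static int demo_release_external_domain(DemoIommu *iommu)
{
    if (!iommu->external_domain) {
        return 0;
    }
    free(iommu->external_domain);
    iommu->external_domain = NULL;  /* keep later NULL checks meaningful */
    return 1;
}
```

Without the assignment to NULL, a second call would see a stale non-NULL
pointer and free it again.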

Re: [Qemu-devel] [PATCH 0/3] virtio: disable notifications in blk and scsi

2016-11-16 Thread Michael S. Tsirkin

On Wed, Nov 16, 2016 at 09:53:06PM +0000, Stefan Hajnoczi wrote:
> Disabling notifications during virtqueue processing reduces the number of
> exits.  The virtio-net device already uses virtio_queue_set_notification()
> but virtio-blk and virtio-scsi do not.
> 
> The following benchmark shows a 15% reduction in virtio-blk-pci MMIO exits:
> 
>   (host)$ qemu-system-x86_64 \
>   -enable-kvm -m 1024 -cpu host \
>   -drive if=virtio,id=drive0,file=f24.img,format=raw,\
>  cache=none,aio=native
>   (guest)$ fio # jobs=4, iodepth=8, direct=1, randread
>   (host)$ sudo perf record -a -e kvm:kvm_fast_mmio
> 
> Number of kvm_fast_mmio events:
> Unpatched: 685k
> Patched: 592k (-15%, lower is better)

Any chance to see a gain in actual benchmark numbers?
This is important to make sure we are not just
shifting overhead around.


> Note that a workload with iodepth=1 and a single thread will not benefit - this
> is a batching optimization.  The effect should be strongest with large iodepth
> and multiple threads submitting I/O.  The guest I/O scheduler also affects the
> optimization.
> 
> Stefan Hajnoczi (3):
>   virtio: add missing vdev->broken check
>   virtio-blk: suppress virtqueue kick during processing
>   virtio-scsi: suppress virtqueue kick during processing
> 
>  hw/block/virtio-blk.c | 18 --
>  hw/scsi/virtio-scsi.c | 36 +---
>  hw/virtio/virtio.c|  4 
>  3 files changed, 37 insertions(+), 21 deletions(-)
> 
> -- 
> 2.7.4

Re: [Qemu-devel] [PATCH v2 1/1] cadence_uart: Check baud rate generator and divider values on migration

2016-11-16 Thread Alistair Francis

On Mon, Nov 7, 2016 at 4:34 PM, Alistair Francis
 wrote:
> The Cadence UART device emulator calculates speed by dividing the
> baud rate by a 'baud rate generator' & 'baud rate divider' value.
> The device specification defines these register values to be
> non-zero and within certain limits. Checks were recently added when
> writing to these registers but not when restoring from migration.
>
> This patch adds checks when restoring from migration to avoid divide by
> zero errors.

Ping!

Thanks,

Alistair

>
> Reported-by: Huawei PSIRT 
> Signed-off-by: Alistair Francis 
> ---
> V2:
>  - Abort the migration if the data is invalid
>
>  hw/char/cadence_uart.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/hw/char/cadence_uart.c b/hw/char/cadence_uart.c
> index def34cd..9568ac6 100644
> --- a/hw/char/cadence_uart.c
> +++ b/hw/char/cadence_uart.c
> @@ -487,6 +487,13 @@ static int cadence_uart_post_load(void *opaque, int 
> version_id)
>  {
>  CadenceUARTState *s = opaque;
>
> +/* Ensure these two aren't invalid numbers */
> +if (s->r[R_BRGR] <= 1 || s->r[R_BRGR] & 0x ||
> +s->r[R_BDIV] <= 3 || s->r[R_BDIV] & 0xFF) {
> +/* Value is invalid, abort */
> +return 1;
> +}
> +
>  uart_parameters_setup(s);
>  uart_update_status(s);
>  return 0;
> --
> 2.7.4
>
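
Why the check matters, as a standalone sketch: if the baud rate is derived
by dividing an input clock by the generator value and by the divider plus
one, a zero generator value loaded from a migration stream divides by zero.
The formula, the 16-bit and 8-bit field widths, and every name below are
assumptions made for this example (the masks in the quoted patch are not
legible in this archive copy). Only the lower bounds, <= 1 and <= 3, come
from the patch.

```c
#include <stdint.h>

/* Nonzero means "reject the incoming state", as post_load does above. */
static int demo_uart_regs_invalid(uint32_t brgr, uint32_t bdiv)
{
    return brgr <= 1 || brgr > 0xFFFF || bdiv <= 3 || bdiv > 0xFF;
}

/* Returns 0 for rejected register values instead of dividing by zero. */
static uint32_t demo_uart_baud(uint32_t clock_hz, uint32_t brgr,
                               uint32_t bdiv)
{
    if (demo_uart_regs_invalid(brgr, bdiv)) {
        return 0;
    }
    /* Largest divisor is 0xFFFF * 0x100, which fits in 32 bits. */
    return clock_hz / (brgr * (bdiv + 1));
}
```

Validating in post_load, rather than at the point of use, means a bad
stream is refused before any code computes with the registers.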

Re: [Qemu-devel] [PATCH v1 1/1] generic-loader: file: Only set a PC if a CPU is specified

2016-11-16 Thread Alistair Francis

On Fri, Nov 11, 2016 at 8:06 PM, Edgar E. Iglesias
 wrote:
> On Fri, Nov 11, 2016 at 06:51:20PM -0800, Alistair Francis wrote:
>> This patch fixes the generic-loader file loading to only set the program
>> counter if a CPU is specified. This follows what is written in the
>> documentation and was always part of the original intention.
>
> Reviewed-by: Edgar E. Iglesias 

Peter, can this go through your queue?

I think it should make it into 2.8 as it matches the documentation and
this will be the first release with the device loader.

Thanks,

Alistair

>
>
>>
>> Signed-off-by: Alistair Francis 
>> ---
>>
>>  hw/core/generic-loader.c | 7 ++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/core/generic-loader.c b/hw/core/generic-loader.c
>> index 79ab6df..208f549 100644
>> --- a/hw/core/generic-loader.c
>> +++ b/hw/core/generic-loader.c
>> @@ -93,7 +93,12 @@ static void generic_loader_realize(DeviceState *dev, 
>> Error **errp)
>> "image");
>>  return;
>>  }
>> -s->set_pc = true;
>> +/* The user specified a file, only set the PC if they also specified
>> + * a CPU to use.
>> + */
>> +if (s->cpu_num != CPU_NONE) {
>> +s->set_pc = true;
>> +}
>>  } else if (s->addr) {
>>  /* User is setting the PC */
>>  if (s->data || s->data_len || s->data_be) {
>> --
>> 2.7.4
>>

[Qemu-devel] [PATCH 3/3] virtio-scsi: suppress virtqueue kick during processing

2016-11-16 Thread Stefan Hajnoczi

The guest does not need to kick the virtqueue while we are processing
it.  This reduces the number of vmexits during periods of heavy I/O.

Signed-off-by: Stefan Hajnoczi 
---
 hw/scsi/virtio-scsi.c | 36 +---
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 3e5ae6a..4d23a78 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -578,26 +578,32 @@ static void virtio_scsi_handle_cmd_req_submit(VirtIOSCSI 
*s, VirtIOSCSIReq *req)
 void virtio_scsi_handle_cmd_vq(VirtIOSCSI *s, VirtQueue *vq)
 {
 VirtIOSCSIReq *req, *next;
-int ret;
+int ret = 0;
 
 QTAILQ_HEAD(, VirtIOSCSIReq) reqs = QTAILQ_HEAD_INITIALIZER(reqs);
 
-while ((req = virtio_scsi_pop_req(s, vq))) {
-ret = virtio_scsi_handle_cmd_req_prepare(s, req);
-if (!ret) {
-QTAILQ_INSERT_TAIL(&reqs, req, next);
-} else if (ret == -EINVAL) {
-/* The device is broken and shouldn't process any request */
-while (!QTAILQ_EMPTY(&reqs)) {
-req = QTAILQ_FIRST(&reqs);
-QTAILQ_REMOVE(&reqs, req, next);
-blk_io_unplug(req->sreq->dev->conf.blk);
-scsi_req_unref(req->sreq);
-virtqueue_detach_element(req->vq, &req->elem, 0);
-virtio_scsi_free_req(req);
+do {
+virtio_queue_set_notification(vq, 0);
+
+while ((req = virtio_scsi_pop_req(s, vq))) {
+ret = virtio_scsi_handle_cmd_req_prepare(s, req);
+if (!ret) {
+QTAILQ_INSERT_TAIL(&reqs, req, next);
+} else if (ret == -EINVAL) {
+/* The device is broken and shouldn't process any request */
+while (!QTAILQ_EMPTY(&reqs)) {
+req = QTAILQ_FIRST(&reqs);
+QTAILQ_REMOVE(&reqs, req, next);
+blk_io_unplug(req->sreq->dev->conf.blk);
+scsi_req_unref(req->sreq);
+virtqueue_detach_element(req->vq, &req->elem, 0);
+virtio_scsi_free_req(req);
+}
 }
 }
-}
+
+virtio_queue_set_notification(vq, 1);
+} while (ret != -EINVAL && !virtio_queue_empty(vq));
 
 QTAILQ_FOREACH_SAFE(req, &reqs, next, next) {
 virtio_scsi_handle_cmd_req_submit(s, req);
-- 
2.7.4

[Qemu-devel] [PATCH 1/3] virtio: add missing vdev->broken check

2016-11-16 Thread Stefan Hajnoczi

virtio_queue_notify_vq() checks vdev->broken before invoking the
handler; virtio_queue_notify_aio_vq() should too.

Signed-off-by: Stefan Hajnoczi 
---
 hw/virtio/virtio.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 55a00cd..a4759bd 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1239,6 +1239,10 @@ static void virtio_queue_notify_aio_vq(VirtQueue *vq)
 if (vq->vring.desc && vq->handle_aio_output) {
 VirtIODevice *vdev = vq->vdev;
 
+if (unlikely(vq->vdev->broken)) {
+return;
+}
+
 trace_virtio_queue_notify(vdev, vq - vdev->vq, vq);
 vq->handle_aio_output(vdev, vq);
 }
-- 
2.7.4

[Qemu-devel] [PATCH 0/3] virtio: disable notifications in blk and scsi

2016-11-16 Thread Stefan Hajnoczi

Disabling notifications during virtqueue processing reduces the number of
exits.  The virtio-net device already uses virtio_queue_set_notification() but
virtio-blk and virtio-scsi do not.

The following benchmark shows a 15% reduction in virtio-blk-pci MMIO exits:

  (host)$ qemu-system-x86_64 \
  -enable-kvm -m 1024 -cpu host \
  -drive if=virtio,id=drive0,file=f24.img,format=raw,\
 cache=none,aio=native
  (guest)$ fio # jobs=4, iodepth=8, direct=1, randread
  (host)$ sudo perf record -a -e kvm:kvm_fast_mmio

Number of kvm_fast_mmio events:
Unpatched: 685k
Patched: 592k (-15%, lower is better)

Note that a workload with iodepth=1 and a single thread will not benefit - this
is a batching optimization.  The effect should be strongest with large iodepth
and multiple threads submitting I/O.  The guest I/O scheduler also affects the
optimization.

Stefan Hajnoczi (3):
  virtio: add missing vdev->broken check
  virtio-blk: suppress virtqueue kick during processing
  virtio-scsi: suppress virtqueue kick during processing

 hw/block/virtio-blk.c | 18 --
 hw/scsi/virtio-scsi.c | 36 +---
 hw/virtio/virtio.c|  4 
 3 files changed, 37 insertions(+), 21 deletions(-)

-- 
2.7.4
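
The pattern both patches apply can be shown with a toy model: turn
notifications off, drain the queue, turn them back on, and go around again
if the queue is no longer empty so that no request is stranded. Everything
below (DemoQueue, demo_submit, demo_handle) is a made-up single-threaded
simulation, not virtio code. The burst field injects guest submissions
while the handler is running.

```c
#include <stdbool.h>

typedef struct DemoQueue {
    int pending;    /* requests waiting in the queue */
    int kicks;      /* notifications (exits) the guest had to issue */
    bool notify;    /* device currently wants to be notified */
    int burst;      /* extra requests the guest submits mid-processing */
} DemoQueue;

/* Guest side: queue one request, kick only if the device asked for it. */
static void demo_submit(DemoQueue *q)
{
    q->pending++;
    if (q->notify) {
        q->kicks++;
    }
}

/* Device side: returns the number of requests processed. */
static int demo_handle(DemoQueue *q, bool suppress)
{
    int done = 0;

    do {
        if (suppress) {
            q->notify = false;
        }
        while (q->pending > 0) {
            q->pending--;
            done++;
            if (q->burst > 0) {     /* guest submits during processing */
                q->burst--;
                demo_submit(q);
            }
        }
        q->notify = true;
        /*
         * With a real guest a request can arrive between the inner loop
         * ending and notifications being re-enabled, hence the re-check.
         */
    } while (q->pending > 0);
    return done;
}
```

In this model one initial request plus a burst of three costs four kicks
without suppression and one kick with it. With burst set to zero, which
corresponds to the iodepth=1 case, both variants cost one kick.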

[Qemu-devel] [PATCH 2/3] virtio-blk: suppress virtqueue kick during processing

2016-11-16 Thread Stefan Hajnoczi

The guest does not need to kick the virtqueue while we are processing
it.  This reduces the number of vmexits during periods of heavy I/O.

Signed-off-by: Stefan Hajnoczi 
---
 hw/block/virtio-blk.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 0c5fd27..50bb0cb 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -588,13 +588,19 @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
 
 blk_io_plug(s->blk);
 
-while ((req = virtio_blk_get_request(s, vq))) {
-if (virtio_blk_handle_request(req, &mrb)) {
-virtqueue_detach_element(req->vq, &req->elem, 0);
-virtio_blk_free_request(req);
-break;
+do {
+virtio_queue_set_notification(vq, 0);
+
+while ((req = virtio_blk_get_request(s, vq))) {
+if (virtio_blk_handle_request(req, &mrb)) {
+virtqueue_detach_element(req->vq, &req->elem, 0);
+virtio_blk_free_request(req);
+break;
+}
 }
-}
+
+virtio_queue_set_notification(vq, 1);
+} while (!virtio_queue_empty(vq));
 
 if (mrb.num_reqs) {
 virtio_blk_submit_multireq(s->blk, &mrb);
-- 
2.7.4

Re: [Qemu-devel] [PATCH for-2.8 0/3] virtio fixes

2016-11-16 Thread Farhan Ali




On 11/16/2016 03:32 PM, Farhan Ali wrote:

On 11/16/2016 03:16 PM, Michael S. Tsirkin wrote:

On Wed, Nov 16, 2016 at 03:03:13PM -0500, Farhan Ali wrote:

Hi Paolo,

I was able to test your patches in our s390 environment. I don't see the
qemu crashes anymore which I noticed before.

Testing a guest running high stress I/O workload, without iothreads does
show a delay in guest response time.

Compared to which version?

Compared to 2.7.0

But running the same test with iothreads seems to solve the issue.

Tested-by : Farhan Ali 

Could you also test just patches 1 and 2 pls?

Okay will do

Testing with just patches 1 and 2, I did not see any change.

Thank you

Farhan


On 11/16/2016 02:50 PM, Christian Borntraeger wrote:

On 11/15/2016 02:46 PM, Paolo Bonzini wrote:

Patch 1 fixes vhost, patches 2-3 fix Windows hibernation.

Paolo

Paolo Bonzini (3):
virtio: introduce grab/release_ioeventfd to fix vhost
virtio: access ISR atomically
virtio: set ISR on dataplane notifications

   hw/block/dataplane/virtio-blk.c |  4 +--
   hw/scsi/virtio-scsi-dataplane.c |  7 --
   hw/scsi/virtio-scsi.c   |  2 +-
   hw/virtio/trace-events  |  2 +-
   hw/virtio/vhost.c   | 11 +++--
   hw/virtio/virtio-bus.c  | 54 
-

   hw/virtio/virtio-mmio.c |  6 ++---
   hw/virtio/virtio-pci.c  |  9 +++
   hw/virtio/virtio.c  | 46 
---

   include/hw/virtio/virtio-bus.h  | 14 +++
   include/hw/virtio/virtio-scsi.h |  1 -
   include/hw/virtio/virtio.h  |  4 ++-
   12 files changed, 110 insertions(+), 50 deletions(-)


Farhan,

it was this mail thread.

Re: [Qemu-devel] [PATCH 3/3] virtio: set ISR on dataplane notifications

2016-11-16 Thread Michael S. Tsirkin

On Wed, Nov 16, 2016 at 04:05:31PM -0500, Paolo Bonzini wrote:
> 
> 
> - Original Message -
> > From: "Michael S. Tsirkin" 
> > To: "Paolo Bonzini" 
> > Cc: qemu-devel@nongnu.org, "alex williamson" , 
> > borntrae...@de.ibm.com, fel...@nutanix.com
> > Sent: Wednesday, November 16, 2016 9:39:24 PM
> > Subject: Re: [PATCH 3/3] virtio: set ISR on dataplane notifications
> > 
> > On Wed, Nov 16, 2016 at 03:38:11PM -0500, Paolo Bonzini wrote:
> > > > > +void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
> > > > > +{
> > > > > +if (!virtio_should_notify(vdev, vq)) {
> > > > > +return;
> > > > > +}
> > > > > +
> > > > > +trace_virtio_notify_irqfd(vdev, vq);
> > > > > +virtio_set_isr(vq->vdev, 0x1);
> > > > 
> > > > So here, I think we need a comment with parts of
> > > > the commit log.
> > > > 
> > > > /*
> > > >  * virtio spec 1.0 says ISR bit 0 should be ignored with MSI, but
> > > >  * windows drivers included in virtio-win 1.8.0 (circa 2015)
> > > >  * for Windows 8.1 only are incorrectly polling this bit during shutdown
> > >  
> > > 
> > > Not sure it's only for Windows 8.1, in fact probably not.
> > 
> > 8.1 on shutdown and others on crashdump or hibernation?
> 
> Even 8.1 is really a hibernation hidden behind a "Shut down" menu item.
> 
> Paolo

what does "hang during shutdown" in your commit log refer to then?

> > > Looks good if you replace this line with
> > > 
> > > "are incorrectly polling this bit during crashdump or hibernation"
> > > 
> > > Paolo
> > > 
> > > >  * in MSI mode, causing a hang if this bit is never updated.
> > > >  * Next driver release from 2016 fixed this problem, so working around 
> > > > it
> > > >  * is not a must, but it's easy to do so let's do it here.
> > > >  *
> > > >  * Note: it's safe to update ISR from any thread as it was switched
> > > >  * to an atomic operation.
> > > >  */
> > > 
> > > 
> > > > 
> > > > 
> > > > 
> > > > > > +event_notifier_set(&vq->guest_notifier);
> > > > > +}
> > > > > +
> > > > >  void virtio_notify(VirtIODevice *vdev, VirtQueue *vq)
> > > > >  {
> > > > >  if (!virtio_should_notify(vdev, vq)) {
> > > > > @@ -1990,7 +1994,7 @@ static void
> > > > > virtio_queue_guest_notifier_read(EventNotifier *n)
> > > > >  {
> > > > >  VirtQueue *vq = container_of(n, VirtQueue, guest_notifier);
> > > > >  if (event_notifier_test_and_clear(n)) {
> > > > > -virtio_irq(vq);
> > > > > +virtio_notify_vector(vq->vdev, vq->vector);
> > > > >  }
> > > > >  }
> > > > >  
> > > > > diff --git a/include/hw/virtio/virtio-scsi.h
> > > > > b/include/hw/virtio/virtio-scsi.h
> > > > > index 9fbc7d7..7375196 100644
> > > > > --- a/include/hw/virtio/virtio-scsi.h
> > > > > +++ b/include/hw/virtio/virtio-scsi.h
> > > > > @@ -137,6 +137,5 @@ void virtio_scsi_push_event(VirtIOSCSI *s,
> > > > > SCSIDevice
> > > > > *dev,
> > > > >  void virtio_scsi_dataplane_setup(VirtIOSCSI *s, Error **errp);
> > > > >  int virtio_scsi_dataplane_start(VirtIODevice *s);
> > > > >  void virtio_scsi_dataplane_stop(VirtIODevice *s);
> > > > > -void virtio_scsi_dataplane_notify(VirtIODevice *vdev, VirtIOSCSIReq
> > > > > *req);
> > > > >  
> > > > >  #endif /* QEMU_VIRTIO_SCSI_H */
> > > > > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > > > > index 835b085..ab0e030 100644
> > > > > --- a/include/hw/virtio/virtio.h
> > > > > +++ b/include/hw/virtio/virtio.h
> > > > > @@ -181,6 +181,7 @@ void virtqueue_get_avail_bytes(VirtQueue *vq,
> > > > > unsigned
> > > > > int *in_bytes,
> > > > > unsigned max_in_bytes, unsigned
> > > > > max_out_bytes);
> > > > >  
> > > > >  bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq);
> > > > > +void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq);
> > > > >  void virtio_notify(VirtIODevice *vdev, VirtQueue *vq);
> > > > >  
> > > > >  void virtio_save(VirtIODevice *vdev, QEMUFile *f);
> > > > > @@ -280,7 +281,6 @@ void virtio_queue_host_notifier_read(EventNotifier
> > > > > *n);
> > > > >  void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq,
> > > > >  AioContext
> > > > >  *ctx,
> > > > >  void
> > > > >  (*fn)(VirtIODevice *,
> > > > > VirtQueue
> > > > > *));
> > > > > -void virtio_irq(VirtQueue *vq);
> > > > >  VirtQueue *virtio_vector_first_queue(VirtIODevice *vdev, uint16_t
> > > > >  vector);
> > > > >  VirtQueue *virtio_vector_next_queue(VirtQueue *vq);
> > > > >  
> > > > > --
> > > > > 2.9.3
> > > > 
> >

Re: [Qemu-devel] [patch v3 00/18] tcg field extract primitives

2016-11-16 Thread no-reply

Hi,

Your series seems to have some coding style problems. See output below for
more information:

Type: series
Subject: [Qemu-devel] [patch v3 00/18] tcg field extract primitives
Message-id: 1479326625-10682-1-git-send-email-...@twiddle.net

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/1479326625-10682-1-git-send-email-...@twiddle.net 
-> patchew/1479326625-10682-1-git-send-email-...@twiddle.net
 * [new tag] 
patchew/1479326850-8369-1-git-send-email-jos...@linux.vnet.ibm.com -> 
patchew/1479326850-8369-1-git-send-email-jos...@linux.vnet.ibm.com
 * [new tag] 
patchew/1479327452-16096-1-git-send-email-stefa...@redhat.com -> 
patchew/1479327452-16096-1-git-send-email-stefa...@redhat.com
Switched to a new branch 'test'
5f8de8d target-s390x: Use the new deposit and extract ops
5d887cd target-ppc: Use the new deposit and extract ops
823629a target-mips: Use the new extract op
b1df231 target-i386: Use new deposit and extract ops
4fcc1ad target-arm: Use new deposit and extract ops
38aa933 target-alpha: Use deposit and extract ops
935116b tcg/s390: Support deposit into zero
bc2ecb0 tcg/s390: Implement field extraction opcodes
34f18c5 tcg/s390: Expose host facilities to tcg-target.h
830033c tcg/ppc: Implement field extraction opcodes
7b20c32 tcg/mips: Implement field extraction opcodes
8e69031 tcg/i386: Implement field extraction opcodes
a10576c tcg/arm: Implement field extraction opcodes
ce7e87e tcg/arm: Move isa detection to tcg-target.h
ae30586 tcg/aarch64: Implement field extraction opcodes
fbb4fe9 tcg: Add deposit_z expander
4745d42 tcg: Minor adjustments to deposit expanders
e29bca5 tcg: Add field extraction primitives

=== OUTPUT BEGIN ===
Checking PATCH 1/18: tcg: Add field extraction primitives...
ERROR: spaces required around that ':' (ctx:VxE)
#139: FILE: tcg/optimize.c:881:
+CASE_OP_32_64(extract):
   ^

ERROR: spaces required around that ':' (ctx:VxE)
#145: FILE: tcg/optimize.c:887:
+CASE_OP_32_64(sextract):
^

ERROR: spaces required around that ':' (ctx:VxE)
#159: FILE: tcg/optimize.c:1064:
+CASE_OP_32_64(extract):
   ^

ERROR: spaces required around that ':' (ctx:VxE)
#167: FILE: tcg/optimize.c:1072:
+CASE_OP_32_64(sextract):
^

ERROR: space prohibited after that '&&' (ctx:ExW)
#271: FILE: tcg/tcg-op.c:582:
+&& TCG_TARGET_extract_i32_valid(ofs, len)) {
 ^

ERROR: space prohibited after that '&&' (ctx:ExW)
#334: FILE: tcg/tcg-op.c:645:
+&& TCG_TARGET_extract_i32_valid(ofs, len)) {
 ^

ERROR: space prohibited after that '&&' (ctx:ExW)
#420: FILE: tcg/tcg-op.c:1799:
+&& TCG_TARGET_extract_i64_valid(ofs, len)) {
 ^

ERROR: space prohibited after that '&&' (ctx:ExW)
#526: FILE: tcg/tcg-op.c:1905:
+&& TCG_TARGET_extract_i64_valid(ofs, len)) {
 ^

total: 8 errors, 0 warnings, 599 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 2/18: tcg: Minor adjustments to deposit expanders...
Checking PATCH 3/18: tcg: Add deposit_z expander...
ERROR: space prohibited after that '&&' (ctx:ExW)
#33: FILE: tcg/tcg-op.c:577:
+   && TCG_TARGET_deposit_i32_valid(ofs, len)) {
^

ERROR: space prohibited after that '&&' (ctx:ExW)
#98: FILE: tcg/tcg-op.c:1836:
+   && TCG_TARGET_deposit_i64_valid(ofs, len)) {
^

total: 2 errors, 0 warnings, 185 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 4/18: tcg/aarch64: Implement field extraction opcodes...
Checking PATCH 5/18: tcg/arm: Move isa detection to tcg-target.h...
WARNING: architecture specific defines should be avoided
#19: FILE: tcg/arm/tcg-target.h:30:
+#ifndef __ARM_ARCH

WARNING: architecture specific defines should be avoided
#20: FILE: tcg/arm/tcg-target.h:31:
+# if defined(__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) \

WARNING: architecture specific defines should be avoided
#39: FILE: tcg/arm/tcg-target.h:50:
+#if defined(__ARM_ARCH_5T__) \

total: 0 errors, 3 warnings, 107 lines checked

Your patch has style problems, please review.

Re: [Qemu-devel] [PATCH for-2.8 0/3] virtio fixes

2016-11-16 Thread Farhan Ali




On 11/16/2016 03:16 PM, Michael S. Tsirkin wrote:
> On Wed, Nov 16, 2016 at 03:03:13PM -0500, Farhan Ali wrote:
>> Hi Paolo,
>>
>> I was able to test your patches in our s390 environment. I don't see the
>> qemu crashes anymore which I noticed before.
>>
>> Testing a guest running high stress I/O workload, without iothreads does
>> show a delay in guest response time.
>
> Compared to which version?

Compared to 2.7.0

>> But running the same test with iothreads seems to solve the issue.
>>
>> Tested-by : Farhan Ali 
>
> Could you also test just patches 1 and 2 pls?

Okay will do



Thank you

Farhan


On 11/16/2016 02:50 PM, Christian Borntraeger wrote:

On 11/15/2016 02:46 PM, Paolo Bonzini wrote:

Patch 1 fixes vhost, patches 2-3 fix Windows hibernation.

Paolo

Paolo Bonzini (3):
virtio: introduce grab/release_ioeventfd to fix vhost
virtio: access ISR atomically
virtio: set ISR on dataplane notifications

   hw/block/dataplane/virtio-blk.c |  4 +--
   hw/scsi/virtio-scsi-dataplane.c |  7 --
   hw/scsi/virtio-scsi.c   |  2 +-
   hw/virtio/trace-events  |  2 +-
   hw/virtio/vhost.c   | 11 +++--
   hw/virtio/virtio-bus.c  | 54 -
   hw/virtio/virtio-mmio.c |  6 ++---
   hw/virtio/virtio-pci.c  |  9 +++
   hw/virtio/virtio.c  | 46 ---
   include/hw/virtio/virtio-bus.h  | 14 +++
   include/hw/virtio/virtio-scsi.h |  1 -
   include/hw/virtio/virtio.h  |  4 ++-
   12 files changed, 110 insertions(+), 50 deletions(-)


Farhan,

it was this mail thread.

[Qemu-devel] Once again with feeling: work-around for slow SEEK_HOLE on Linux tmpfs.

2016-11-16 Thread Christopher Oliver



-- 
Christopher Oliver 


qemu-patch
Description: Binary data

[Qemu-devel] [PATCH v14 22/22] MAINTAINERS: Add entry VFIO based Mediated device drivers

2016-11-16 Thread Kirti Wankhede

Adding myself as a maintainer of mediated device framework,
a sub module of VFIO.

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: I58f6717783e0d4008ca31f4a5c4494696bae8571
---
 MAINTAINERS | 9 +
 1 file changed, 9 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 411e3b87b8c2..0cff155c1315 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12754,6 +12754,15 @@ F: drivers/vfio/
 F: include/linux/vfio.h
 F: include/uapi/linux/vfio.h
 
+VFIO MEDIATED DEVICE DRIVERS
+M: Kirti Wankhede 
+L: k...@vger.kernel.org
+S: Maintained
+F: Documentation/vfio-mediated-device.txt
+F: drivers/vfio/mdev/
+F: include/linux/mdev.h
+F: samples/vfio-mdev/
+
 VFIO PLATFORM DRIVER
 M: Baptiste Reynal 
 L: k...@vger.kernel.org
-- 
2.7.0

[Qemu-devel] [PATCH v14 19/22] docs: Add Documentation for Mediated devices

2016-11-16 Thread Kirti Wankhede

Add file Documentation/vfio-mediated-device.txt that includes details of
the mediated device framework.

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: I137dd646442936090d92008b115908b7b2c7bc5d
---
 Documentation/vfio-mediated-device.txt | 298 +
 drivers/vfio/mdev/Kconfig  |   1 +
 2 files changed, 299 insertions(+)
 create mode 100644 Documentation/vfio-mediated-device.txt

diff --git a/Documentation/vfio-mediated-device.txt b/Documentation/vfio-mediated-device.txt
new file mode 100644
index ..fe8bd2e7b26a
--- /dev/null
+++ b/Documentation/vfio-mediated-device.txt
@@ -0,0 +1,298 @@
+/*
+ * VFIO Mediated devices
+ *
+ * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
+ * Author: Neo Jia 
+ * Kirti Wankhede 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+Virtual Function I/O (VFIO) Mediated devices[1]
+===
+
+The number of use cases for virtualizing DMA devices that do not have built-in
+SR-IOV capability is increasing. Previously, to virtualize such devices,
+developers had to create their own management interfaces and APIs, and then
+integrate them with user space software. To simplify integration with user space
+software, we have identified common requirements and a unified management
+interface for such devices.
+
+The VFIO driver framework provides unified APIs for direct device access. It is
+an IOMMU/device-agnostic framework for exposing direct device access to user
+space in a secure, IOMMU-protected environment. This framework is used for
+multiple devices, such as GPUs, network adapters, and compute accelerators. With
+direct device access, virtual machines or user space applications have direct
+access to the physical device. This framework is reused for mediated devices.
+
+The mediated core driver provides a common interface for mediated device
+management that can be used by drivers of different devices. This module
+provides a generic interface to perform these operations:
+
+* Create and destroy a mediated device
+* Add a mediated device to and remove it from a mediated bus driver
+* Add a mediated device to and remove it from an IOMMU group
+
+The mediated core driver also provides an interface to register a bus driver.
+For example, the mediated VFIO mdev driver is designed for mediated devices and
+supports VFIO APIs. The mediated bus driver adds a mediated device to and
+removes it from a VFIO group.
+
+The following high-level block diagram shows the main components and interfaces
+in the VFIO mediated driver framework. The diagram shows NVIDIA, Intel, and IBM
+devices as examples, as these devices are the first devices to use this module.
+
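+     +---------------+
+     |               |
+     | +-----------+ |  mdev_register_driver() +--------------+
+     | |           | +<------------------------+              |
+     | |  mdev     | |                         |              |
+     | |  bus      | +------------------------>+ vfio_mdev.ko |<-> VFIO user
+     | |  driver   | |     probe()/remove()    |              |    APIs
+     | |           | |                         +--------------+
+     | +-----------+ |
+     |               |
+     |  MDEV CORE    |
+     |   MODULE      |
+     |   mdev.ko     |
+     | +-----------+ |  mdev_register_device() +--------------+
+     | |           | +<------------------------+              |
+     | |           | |                         |  nvidia.ko   |<-> physical
+     | |           | +------------------------>+              |    device
+     | |           | |        callbacks        +--------------+
+     | | Physical  | |
+     | |  device   | |  mdev_register_device() +--------------+
+     | | interface | |<------------------------+              |
+     | |           | |                         |  i915.ko     |<-> physical
+     | |           | +------------------------>+              |    device
+     | |           | |        callbacks        +--------------+
+     | |           | |
+     | |           | |  mdev_register_device() +--------------+
+     | |           | +<------------------------+              |
+     | |           | |                         | ccw_device.ko|<-> physical
+     | |           | +------------------------>+              |    device
+     | |           | |        callbacks        +--------------+
+     | +-----------+ |
+     +---------------+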
+
+
+Registration Interfaces
+===
+
+The mediated core driver provides the following types of registration
+interfaces:
+
+* Registration interface for a mediated bus driver
+* Physical device driver interface
+
+Registration Interface for a Mediated Bus Driver

Re: [Qemu-devel] [Qemu-block] Once again with feeling: work-around for slow SEEK_HOLE on Linux tmpfs.

2016-11-16 Thread Eric Blake

On 11/16/2016 01:48 PM, Christopher Oliver wrote:
> 
> 

Attaching the patch (rather than including it inline) requires reviewers
to save the attachment off to a file in order to even see what you
wrote.  To save others some time, I'm pasting the text and replying inline:

> The following patch is a work-around for slow SEEK_HOLE on some filesystems.

The subject line is too long and doesn't match the usual pattern of
"category: short synopsis".

> Specifically, SEEK_HOLE on a dense file on Linux tmpfs is linear time in
> the length.  This slows qemu-img to a crawl as it runs SEEK_DATA/SEEK_HOLE
> pairs over the length of the image it's reading stepping by small deltas.
> 
> The key observation is that if the descriptor is read-only, and there are
> no writers anywhere else (that's undefined behavior anyhow, right?), then a
> hole seek in the interval from the previous start to the previously found
> hole will find the same hole.
> 
> Signed-off-by: Christopher Oliver

The S-o-b is incorrect; it is missing an email address.  Without that,
the patch cannot be accepted.  You'll want to try again, but this time,
I suggest getting 'git send-email' working to the point that you can
send yourself an inline patch (not an attachment), before sending v2 to
the list; you may also want to check out other patch submission
guidelines here:

http://wiki.qemu.org/Contribute/SubmitAPatch

> 
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 28b47d9..b45defe 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -136,6 +136,8 @@ typedef struct BDRVRawState {
>  int type;
>  int open_flags;
>  size_t buf_align;
> +off_t last_hole;
> +off_t hole_follows;
>  
>  #ifdef CONFIG_XFS
>  bool is_xfs:1;
> @@ -470,6 +472,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
>  
>  s->has_discard = true;
>  s->has_write_zeroes = true;
> +s->last_hole = -1;
>  bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP;
>  if ((bs->open_flags & BDRV_O_NOCACHE) != 0) {
>  s->needs_alignment = true;
> @@ -1710,7 +1713,24 @@ static int find_allocation(BlockDriverState *bs, off_t start,
>   * H4. offs < 0, errno != ENXIO: we learned nothing
>   * Pretend we know nothing at all, i.e. "forget" about D1.
>   */
> -offs = lseek(s->fd, start, SEEK_HOLE);
> +/* Addendum: Since HOLE seeks are expensive on some filesystems

Can anything be done to Linux tmpfs to make HOLE seeks less expensive?

> + * (e.g. tmpfs) and holes don't change when an image is read only,
> + * cache the range from a start to a hold and return that value

s/hold/hole/

> + * for requests in that interval.  Outside of that interval, seek
> + * and cache the new range.
> + */
> +if  ((s->open_flags & (O_RDWR|O_RDONLY)) == O_RDONLY) {
> +if (start <= s->last_hole && start >= s->hole_follows) {
> +offs = lseek(s->fd, s->last_hole, SEEK_SET);
> +} else {
> +offs = lseek(s->fd, start, SEEK_HOLE);
> +s->last_hole = offs;
> +s->hole_follows = start;
> +}
> +} else {
> +offs = lseek(s->fd, start, SEEK_HOLE);
> +}
> +
>  if (offs < 0) {
>  return -errno;  /* D1 and (H3 or H4) */
>  }

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
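
The caching idea under review can be illustrated in isolation. What follows is
a minimal user-space sketch, not the posted patch: HoleCache, hole_cache_hit(),
hole_cache_store() and cached_seek_hole() are hypothetical names, and it
assumes (as the patch description does) a read-only descriptor with no
concurrent writers. Unlike the posted hunk, it returns the cached offset
directly instead of calling lseek() with SEEK_SET, and it caches only
successful seeks.

```c
#define _GNU_SOURCE             /* exposes SEEK_HOLE on glibc */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_HOLE
#define SEEK_HOLE 4             /* Linux value; fallback if headers hide it */
#endif

/* Hypothetical cache: remembers one SEEK_HOLE query and its answer. */
typedef struct {
    off_t start;                /* offset the cached seek started from */
    off_t hole;                 /* hole offset it found, or -1 if empty */
} HoleCache;

/* A seek starting anywhere in [start, hole] must find the same hole. */
static bool hole_cache_hit(const HoleCache *c, off_t start)
{
    assert(c != NULL);
    return c->hole >= 0 && start >= c->start && start <= c->hole;
}

static void hole_cache_store(HoleCache *c, off_t start, off_t hole)
{
    assert(c != NULL);
    c->start = start;
    c->hole = hole;
}

/* Returns the next hole at or after 'start', or -1 with errno set. */
static off_t cached_seek_hole(int fd, HoleCache *c, off_t start)
{
    off_t offs;

    if (hole_cache_hit(c, start)) {
        return c->hole;         /* no system call needed */
    }
    offs = lseek(fd, start, SEEK_HOLE);
    if (offs >= 0) {
        hole_cache_store(c, start, offs);
    }
    return offs;
}
```

A caller would initialise the cache with hole = -1 when the image is opened
and keep using plain lseek() for descriptors that are not read-only.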

[Qemu-devel] [PATCH v14 15/22] vfio: Introduce vfio_set_irqs_validate_and_prepare()

2016-11-16 Thread Kirti Wankhede

Vendor drivers using the mediated device framework would use the same
mechanism to validate and prepare IRQs. Introduce this function to reduce
code duplication across drivers.

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: Ie201f269dda0713ca18a07dc4852500bd8b48309
---
 drivers/vfio/vfio.c  | 48 
 include/linux/vfio.h |  4 
 2 files changed, 52 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 82257cf30f52..2c044af09a2c 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1858,6 +1858,54 @@ int vfio_info_add_capability(struct vfio_info_cap *caps, int cap_type_id,
 }
 EXPORT_SYMBOL(vfio_info_add_capability);
 
+int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr, int num_irqs,
+  int max_irq_type, size_t *data_size)
+{
+   unsigned long minsz;
+   size_t size;
+
+   minsz = offsetofend(struct vfio_irq_set, count);
+
+   if ((hdr->argsz < minsz) || (hdr->index >= max_irq_type) ||
+   (hdr->count >= (U32_MAX - hdr->start)) ||
+   (hdr->flags & ~(VFIO_IRQ_SET_DATA_TYPE_MASK |
+   VFIO_IRQ_SET_ACTION_TYPE_MASK)))
+   return -EINVAL;
+
+   if (data_size)
+   *data_size = 0;
+
+   if (hdr->start >= num_irqs || hdr->start + hdr->count > num_irqs)
+   return -EINVAL;
+
+   switch (hdr->flags & VFIO_IRQ_SET_DATA_TYPE_MASK) {
+   case VFIO_IRQ_SET_DATA_NONE:
+   size = 0;
+   break;
+   case VFIO_IRQ_SET_DATA_BOOL:
+   size = sizeof(uint8_t);
+   break;
+   case VFIO_IRQ_SET_DATA_EVENTFD:
+   size = sizeof(int32_t);
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   if (size) {
+   if (hdr->argsz - minsz < hdr->count * size)
+   return -EINVAL;
+
+   if (!data_size)
+   return -EINVAL;
+
+   *data_size = hdr->count * size;
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL(vfio_set_irqs_validate_and_prepare);
+
 /*
  * Pin a set of guest PFNs and return their associated host PFNs for local
  * domain only.
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e26f7ccab564..15ff0421b423 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -129,6 +129,10 @@ extern void vfio_info_cap_shift(struct vfio_info_cap *caps, size_t offset);
 extern int vfio_info_add_capability(struct vfio_info_cap *caps,
int cap_type_id, void *cap_type);
 
+extern int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr,
+ int num_irqs, int max_irq_type,
+ size_t *data_size);
+
 struct pci_dev;
 #ifdef CONFIG_EEH
 extern void vfio_spapr_pci_eeh_open(struct pci_dev *pdev);
-- 
2.7.0
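
The checks performed by vfio_set_irqs_validate_and_prepare() can be exercised
outside the kernel. The following is a user-space model under stated
assumptions: struct model_irq_set, the MODEL_* flag values and
model_validate() are hypothetical stand-ins for struct vfio_irq_set, the
VFIO_IRQ_SET_* masks and the function in the patch, and the header size is
taken to be five 32-bit fields.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits, modelled on the VFIO_IRQ_SET_* layout. */
#define MODEL_DATA_NONE      (1u << 0)
#define MODEL_DATA_BOOL      (1u << 1)
#define MODEL_DATA_EVENTFD   (1u << 2)
#define MODEL_ACTION_TRIGGER (1u << 5)
#define MODEL_DATA_MASK      0x07u
#define MODEL_ACTION_MASK    0x38u

/* Stand-in for the fixed header of struct vfio_irq_set. */
struct model_irq_set {
    uint32_t argsz, flags, index, start, count;
};

static int model_validate(const struct model_irq_set *hdr, uint32_t num_irqs,
                          uint32_t max_irq_type, size_t *data_size)
{
    const size_t minsz = sizeof(*hdr);
    size_t elem;

    assert(hdr != NULL);

    /* Header sanity: size, IRQ type, overflow of start+count, stray flags. */
    if (hdr->argsz < minsz || hdr->index >= max_irq_type ||
        hdr->count >= UINT32_MAX - hdr->start ||
        (hdr->flags & ~(MODEL_DATA_MASK | MODEL_ACTION_MASK))) {
        return -EINVAL;
    }
    if (data_size) {
        *data_size = 0;
    }
    /* The requested range must lie inside the device's IRQs. */
    if (hdr->start >= num_irqs || hdr->start + hdr->count > num_irqs) {
        return -EINVAL;
    }
    /* Exactly one data type selects the per-IRQ element size. */
    switch (hdr->flags & MODEL_DATA_MASK) {
    case MODEL_DATA_NONE:    elem = 0;               break;
    case MODEL_DATA_BOOL:    elem = sizeof(uint8_t); break;
    case MODEL_DATA_EVENTFD: elem = sizeof(int32_t); break;
    default:                 return -EINVAL;
    }
    if (elem) {
        /* The payload after the header must hold count elements. */
        if (hdr->argsz - minsz < (size_t)hdr->count * elem || !data_size) {
            return -EINVAL;
        }
        *data_size = (size_t)hdr->count * elem;
    }
    return 0;
}
```

A vendor driver would typically call the real helper first and only then copy
data_size bytes of payload from user space.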

[Qemu-devel] [PATCH v14 12/22] vfio: Add notifier callback to parent's ops structure of mdev

2016-11-16 Thread Kirti Wankhede

Add a notifier callback to the parent's ops structure of the mdev device so
that a per-device notifier for the vfio module is registered through the
vfio_mdev module.

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: Iafa6f1721aecdd6e50eb93b153b5621e6d29b637
---
 drivers/vfio/mdev/vfio_mdev.c | 34 +-
 include/linux/mdev.h  |  9 +
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
index ffc36758cb84..2f8e06e5f95a 100644
--- a/drivers/vfio/mdev/vfio_mdev.c
+++ b/drivers/vfio/mdev/vfio_mdev.c
@@ -24,6 +24,15 @@
 #define DRIVER_AUTHOR   "NVIDIA Corporation"
 #define DRIVER_DESC "VFIO based driver for Mediated device"
 
+static int vfio_mdev_notifier(struct notifier_block *nb, unsigned long action,
+ void *data)
+{
+   struct mdev_device *mdev = container_of(nb, struct mdev_device, nb);
+   struct parent_device *parent = mdev->parent;
+
+   return parent->ops->notifier(mdev, action, data);
+}
+
 static int vfio_mdev_open(void *device_data)
 {
struct mdev_device *mdev = device_data;
@@ -36,9 +45,27 @@ static int vfio_mdev_open(void *device_data)
if (!try_module_get(THIS_MODULE))
return -ENODEV;
 
+   if (likely(parent->ops->notifier)) {
+   mdev->nb.notifier_call = vfio_mdev_notifier;
+   ret = vfio_register_notifier(&mdev->dev, &mdev->nb);
+
+   /*
+* This should not fail if backend iommu module doesn't support
+* register_notifier.
+*/
+   if (ret && (ret != -ENOTTY)) {
+   pr_err("Failed to register notifier for mdev\n");
+   module_put(THIS_MODULE);
+   return ret;
+   }
+   }
+
ret = parent->ops->open(mdev);
-   if (ret)
+   if (ret) {
+   if (likely(parent->ops->notifier))
+   vfio_unregister_notifier(&mdev->dev, &mdev->nb);
module_put(THIS_MODULE);
+   }
 
return ret;
 }
@@ -51,6 +78,11 @@ static void vfio_mdev_release(void *device_data)
if (likely(parent->ops->release))
parent->ops->release(mdev);
 
+   if (likely(parent->ops->notifier)) {
+   if (vfio_unregister_notifier(&mdev->dev, &mdev->nb))
+   pr_err("Failed to unregister notifier for mdev\n");
+   }
+
module_put(THIS_MODULE);
 }
 
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index ec819e9a115a..94c43034c297 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -37,6 +37,7 @@ struct mdev_device {
struct kref ref;
struct list_headnext;
struct kobject  *type_kobj;
+   struct notifier_block   nb;
 };
 
 /**
@@ -85,6 +86,12 @@ struct mdev_device {
  * @mmap:  mmap callback
  * @mdev: mediated device structure
  * @vma: vma structure
+ * @notifier:  Notifier callback, currently only for
+ * VFIO_IOMMU_NOTIFY_DMA_UNMAP action notified during
+ * DMA_UNMAP call on mapped iova range.
+ * @mdev: mediated device structure
+ * @action: Action for which notifier is called
+ * @data: Data associated with the notifier
  * Parent device that support mediated device should be registered with mdev
  * module with parent_ops structure.
  **/
@@ -106,6 +113,8 @@ struct parent_ops {
ssize_t (*ioctl)(struct mdev_device *mdev, unsigned int cmd,
 unsigned long arg);
int (*mmap)(struct mdev_device *mdev, struct vm_area_struct *vma);
+   int (*notifier)(struct mdev_device *mdev, unsigned long action,
+   void *data);
 };
 
 /* interface for exporting mdev supported type attributes */
-- 
2.7.0
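
vfio_mdev_notifier() above relies on the notifier block being embedded in
struct mdev_device, so the device can be recovered from the block pointer. A
minimal user-space sketch of that pattern follows; all model_* names are
hypothetical, and model_container_of() is a simplified version of the
kernel's container_of() without its type checking.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: step back from a member to its enclosing struct. */
#define model_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for struct notifier_block. */
struct model_notifier_block {
    int (*notifier_call)(struct model_notifier_block *nb,
                         unsigned long action, void *data);
};

/* Stand-in for struct mdev_device with the embedded notifier block. */
struct model_mdev {
    int id;
    unsigned long last_action;
    struct model_notifier_block nb;
};

/* Receives only the block pointer, yet reaches the owning device. */
static int model_mdev_notifier(struct model_notifier_block *nb,
                               unsigned long action, void *data)
{
    struct model_mdev *mdev;

    assert(nb != NULL);
    mdev = model_container_of(nb, struct model_mdev, nb);
    (void)data;
    mdev->last_action = action;   /* a real driver would forward to parent ops */
    return mdev->id;
}
```

Embedding the block avoids a separate lookup table from notifier to device,
at the cost of one notifier_block per mdev.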

[Qemu-devel] [PATCH v14 04/22] vfio: Common function to increment container_users

2016-11-16 Thread Kirti Wankhede

Rearrange the code so that a common function is used to increment
container_users.

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Reviewed-by: Jike Song 

Change-Id: I8bdeb352bc8439b107ffd519480fd4dc238677f2
---
 drivers/vfio/vfio.c | 34 +-
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 23bc86c1d05d..2e83bdf007fe 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1385,6 +1385,23 @@ static bool vfio_group_viable(struct vfio_group *group)
 group, vfio_dev_viable) == 0);
 }
 
+static int vfio_group_add_container_user(struct vfio_group *group)
+{
+   if (!atomic_inc_not_zero(&group->container_users))
+   return -EINVAL;
+
+   if (group->noiommu) {
+   atomic_dec(&group->container_users);
+   return -EPERM;
+   }
+   if (!group->container->iommu_driver || !vfio_group_viable(group)) {
+   atomic_dec(&group->container_users);
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 static const struct file_operations vfio_device_fops;
 
 static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
@@ -1694,23 +1711,14 @@ static const struct file_operations vfio_device_fops = {
 struct vfio_group *vfio_group_get_external_user(struct file *filep)
 {
struct vfio_group *group = filep->private_data;
+   int ret;
 
if (filep->f_op != &vfio_group_fops)
return ERR_PTR(-EINVAL);
 
-   if (!atomic_inc_not_zero(&group->container_users))
-   return ERR_PTR(-EINVAL);
-
-   if (group->noiommu) {
-   atomic_dec(&group->container_users);
-   return ERR_PTR(-EPERM);
-   }
-
-   if (!group->container->iommu_driver ||
-   !vfio_group_viable(group)) {
-   atomic_dec(&group->container_users);
-   return ERR_PTR(-EINVAL);
-   }
+   ret = vfio_group_add_container_user(group);
+   if (ret)
+   return ERR_PTR(ret);
 
vfio_group_get(group);
 
-- 
2.7.0
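
The helper factored out in this patch follows an "increment unless zero, then
validate, roll back on failure" pattern. The sketch below models it in user
space with C11 atomics; struct model_group, model_inc_not_zero() and
model_add_container_user() are hypothetical names, and the viable flag stands
in for the iommu_driver and vfio_group_viable() checks.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the fields of struct vfio_group used by the helper. */
struct model_group {
    atomic_int container_users;
    bool noiommu;
    bool viable;
};

/* Take a reference only if at least one user already exists. */
static bool model_inc_not_zero(atomic_int *v)
{
    int old = atomic_load(v);

    while (old != 0) {
        /* On failure 'old' is reloaded and the zero check is repeated. */
        if (atomic_compare_exchange_weak(v, &old, old + 1)) {
            return true;
        }
    }
    return false;
}

static int model_add_container_user(struct model_group *group)
{
    assert(group != NULL);

    if (!model_inc_not_zero(&group->container_users)) {
        return -EINVAL;
    }
    if (group->noiommu) {
        atomic_fetch_sub(&group->container_users, 1);   /* roll back */
        return -EPERM;
    }
    if (!group->viable) {
        atomic_fetch_sub(&group->container_users, 1);   /* roll back */
        return -EINVAL;
    }
    return 0;
}
```

Returning a plain errno value lets callers wrap it as they need, which is
what vfio_group_get_external_user() does with ERR_PTR() in the hunk above.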

Re: [Qemu-devel] [PATCH for-2.8 0/3] virtio fixes

2016-11-16 Thread Farhan Ali


Hi Paolo,

I was able to test your patches in our s390 environment. I don't see the
qemu crashes anymore which I noticed before.

Testing a guest running high stress I/O workload, without iothreads does
show a delay in guest response time. But running the same test with
iothreads seems to solve the issue.

Tested-by : Farhan Ali 


Thank you

Farhan


On 11/16/2016 02:50 PM, Christian Borntraeger wrote:

On 11/15/2016 02:46 PM, Paolo Bonzini wrote:

Patch 1 fixes vhost, patches 2-3 fix Windows hibernation.

Paolo

Paolo Bonzini (3):
   virtio: introduce grab/release_ioeventfd to fix vhost
   virtio: access ISR atomically
   virtio: set ISR on dataplane notifications

  hw/block/dataplane/virtio-blk.c |  4 +--
  hw/scsi/virtio-scsi-dataplane.c |  7 --
  hw/scsi/virtio-scsi.c   |  2 +-
  hw/virtio/trace-events  |  2 +-
  hw/virtio/vhost.c   | 11 +++--
  hw/virtio/virtio-bus.c  | 54 -
  hw/virtio/virtio-mmio.c |  6 ++---
  hw/virtio/virtio-pci.c  |  9 +++
  hw/virtio/virtio.c  | 46 ---
  include/hw/virtio/virtio-bus.h  | 14 +++
  include/hw/virtio/virtio-scsi.h |  1 -
  include/hw/virtio/virtio.h  |  4 ++-
  12 files changed, 110 insertions(+), 50 deletions(-)


Farhan,

it was this mail thread.

[Qemu-devel] [PATCH v14 08/22] vfio iommu type1: Add find_iommu_group() function

2016-11-16 Thread Kirti Wankhede

Add find_iommu_group()

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Reviewed-by: Jike Song 
Reviewed-by: Dong Jia Shi 

Change-Id: I9d372f1ebe9eb01a5a21374b8a2b03f7df73601f
---
 drivers/vfio/vfio_iommu_type1.c | 57 -
 1 file changed, 33 insertions(+), 24 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 52af5fc01d91..ffe2026f1341 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -752,11 +752,24 @@ static void vfio_test_domain_fgsp(struct vfio_domain *domain)
__free_pages(pages, order);
 }
 
+static struct vfio_group *find_iommu_group(struct vfio_domain *domain,
+  struct iommu_group *iommu_group)
+{
+   struct vfio_group *g;
+
+   list_for_each_entry(g, &domain->group_list, next) {
+   if (g->iommu_group == iommu_group)
+   return g;
+   }
+
+   return NULL;
+}
+
 static int vfio_iommu_type1_attach_group(void *iommu_data,
 struct iommu_group *iommu_group)
 {
struct vfio_iommu *iommu = iommu_data;
-   struct vfio_group *group, *g;
+   struct vfio_group *group;
struct vfio_domain *domain, *d;
struct bus_type *bus = NULL;
int ret;
@@ -764,10 +777,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
mutex_lock(&iommu->lock);
 
list_for_each_entry(d, &iommu->domain_list, next) {
-   list_for_each_entry(g, &d->group_list, next) {
-   if (g->iommu_group != iommu_group)
-   continue;
-
+   if (find_iommu_group(d, iommu_group)) {
mutex_unlock(&iommu->lock);
return -EINVAL;
}
@@ -887,27 +897,26 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
mutex_lock(&iommu->lock);
 
list_for_each_entry(domain, &iommu->domain_list, next) {
-   list_for_each_entry(group, &domain->group_list, next) {
-   if (group->iommu_group != iommu_group)
-   continue;
+   group = find_iommu_group(domain, iommu_group);
+   if (!group)
+   continue;
 
-   iommu_detach_group(domain->domain, iommu_group);
-   list_del(&group->next);
-   kfree(group);
-   /*
-* Group ownership provides privilege, if the group
-* list is empty, the domain goes away.  If it's the
-* last domain, then all the mappings go away too.
-*/
-   if (list_empty(&domain->group_list)) {
-   if (list_is_singular(&iommu->domain_list))
-   vfio_iommu_unmap_unpin_all(iommu);
-   iommu_domain_free(domain->domain);
-   list_del(&domain->next);
-   kfree(domain);
-   }
-   goto done;
+   iommu_detach_group(domain->domain, iommu_group);
+   list_del(&group->next);
+   kfree(group);
+   /*
+* Group ownership provides privilege, if the group
+* list is empty, the domain goes away.  If it's the
+* last domain, then all the mappings go away too.
+*/
+   if (list_empty(&domain->group_list)) {
+   if (list_is_singular(&iommu->domain_list))
+   vfio_iommu_unmap_unpin_all(iommu);
+   iommu_domain_free(domain->domain);
+   list_del(&domain->next);
+   kfree(domain);
}
+   goto done;
}
 
 done:
-- 
2.7.0

[Qemu-devel] (no subject)

2016-11-16 Thread Christopher Oliver

This patch (hack?) works around the slowness in SEEK_HOLE for large dense files
on Linux tmpfs.  It may improve life elsewhere as well, and the penalty of the
checks should be vanishingly small where it is not needed.

If I'm subtly (or not so subtly) wrong, please fire back.

Sincerely,

-- 
Christopher Oliver 


qemu-patch
Description: Binary data

[Qemu-devel] [PATCH v14 18/22] vfio: Define device_api strings

2016-11-16 Thread Kirti Wankhede

Define device API strings. A vendor driver using the mediated device
framework should use the corresponding string for its device_api attribute.

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: I42d29f475f02a7132ce13297fbf2b48f1da10995
---
 include/uapi/linux/vfio.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 255a2113f53c..519eff362c1c 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -203,6 +203,16 @@ struct vfio_device_info {
 };
 #define VFIO_DEVICE_GET_INFO   _IO(VFIO_TYPE, VFIO_BASE + 7)
 
+/*
+ * Vendor driver using Mediated device framework should provide device_api
+ * attribute in supported type attribute groups. Device API string should be one
+ * of the following corresponding to device flags in vfio_device_info structure.
+ */
+
+#define VFIO_DEVICE_API_PCI_STRING "vfio-pci"
+#define VFIO_DEVICE_API_PLATFORM_STRING"vfio-platform"
+#define VFIO_DEVICE_API_AMBA_STRING"vfio-amba"
+
 /**
  * VFIO_DEVICE_GET_REGION_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 8,
  *struct vfio_region_info)
-- 
2.7.0
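
As an illustration of how a vendor driver might report one of these strings,
here is a user-space sketch of a sysfs-style "show" routine. It is an
assumption-laden model, not code from the series: the MODEL_FLAGS_* bit
values are assumed to mirror the VFIO_DEVICE_FLAGS_* bits of struct
vfio_device_info, and model_device_api_show() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Strings as defined by the patch. */
#define VFIO_DEVICE_API_PCI_STRING      "vfio-pci"
#define VFIO_DEVICE_API_PLATFORM_STRING "vfio-platform"
#define VFIO_DEVICE_API_AMBA_STRING     "vfio-amba"

/* Assumed device-type flag bits (model of VFIO_DEVICE_FLAGS_*). */
#define MODEL_FLAGS_PCI      (1u << 1)
#define MODEL_FLAGS_PLATFORM (1u << 2)
#define MODEL_FLAGS_AMBA     (1u << 3)

/*
 * Write the device_api string for 'flags' into 'buf', newline-terminated as
 * sysfs attributes usually are. Returns the number of bytes written, or
 * -EINVAL when no known device type is set or the buffer is too small.
 */
static int model_device_api_show(uint32_t flags, char *buf, size_t len)
{
    const char *api;
    int n;

    assert(buf != NULL);

    if (flags & MODEL_FLAGS_PCI) {
        api = VFIO_DEVICE_API_PCI_STRING;
    } else if (flags & MODEL_FLAGS_PLATFORM) {
        api = VFIO_DEVICE_API_PLATFORM_STRING;
    } else if (flags & MODEL_FLAGS_AMBA) {
        api = VFIO_DEVICE_API_AMBA_STRING;
    } else {
        return -EINVAL;
    }
    n = snprintf(buf, len, "%s\n", api);
    return (n < 0 || (size_t)n >= len) ? -EINVAL : n;
}
```

User space can then read the attribute to decide which VFIO device API to
use for the mediated device.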

[Qemu-devel] [PATCH v14 05/22] vfio iommu: Added pin and unpin callback functions to vfio_iommu_driver_ops

2016-11-16 Thread Kirti Wankhede

Added APIs for pinning and unpinning a set of pages. These call back into
the backend iommu module to actually pin and unpin pages.
Added two new callback functions to struct vfio_iommu_driver_ops. A backend
IOMMU module that supports pinning and unpinning pages for mdev devices
should provide these functions.

Renamed static functions in vfio_iommu_type1.c to resolve conflicts

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Reviewed-by: Dong Jia Shi 

Change-Id: Ia7417723aaae86bec2959ad9ae6c2915ddd340e0
---
 drivers/vfio/vfio.c | 102 
 drivers/vfio/vfio_iommu_type1.c |  20 
 include/linux/vfio.h|  13 -
 3 files changed, 124 insertions(+), 11 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 2e83bdf007fe..bd36c16b0ef2 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1799,6 +1799,108 @@ void vfio_info_cap_shift(struct vfio_info_cap *caps, size_t offset)
 }
 EXPORT_SYMBOL_GPL(vfio_info_cap_shift);
 
+
+/*
+ * Pin a set of guest PFNs and return their associated host PFNs for local
+ * domain only.
+ * @dev [in] : device
+ * @user_pfn [in]: array of user/guest PFNs to be pinned.
+ * @npage [in]   : count of elements in user_pfn array.  This count should not
+ *be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
+ * @prot [in]: protection flags
+ * @phys_pfn[out]: array of host PFNs
+ * Return error or number of pages pinned.
+ */
+int vfio_pin_pages(struct device *dev, unsigned long *user_pfn, int npage,
+  int prot, unsigned long *phys_pfn)
+{
+   struct vfio_container *container;
+   struct vfio_group *group;
+   struct vfio_iommu_driver *driver;
+   int ret;
+
+   if (!dev || !user_pfn || !phys_pfn || !npage)
+   return -EINVAL;
+
+   if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
+   return -E2BIG;
+
+   group = vfio_group_get_from_dev(dev);
+   if (IS_ERR(group))
+   return PTR_ERR(group);
+
+   ret = vfio_group_add_container_user(group);
+   if (ret)
+   goto err_pin_pages;
+
+   container = group->container;
+   down_read(&container->group_lock);
+
+   driver = container->iommu_driver;
+   if (likely(driver && driver->ops->pin_pages))
+   ret = driver->ops->pin_pages(container->iommu_data, user_pfn,
+npage, prot, phys_pfn);
+   else
+   ret = -ENOTTY;
+
+   up_read(&container->group_lock);
+   vfio_group_try_dissolve_container(group);
+
+err_pin_pages:
+   vfio_group_put(group);
+   return ret;
+}
+EXPORT_SYMBOL(vfio_pin_pages);
+
+/*
+ * Unpin set of host PFNs for local domain only.
+ * @dev [in] : device
+ * @user_pfn [in]: array of user/guest PFNs to be unpinned. Number of user/guest
+ *PFNs should not be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
+ * @npage [in]   : count of elements in user_pfn array.  This count should not
+ * be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
+ * Return error or number of pages unpinned.
+ */
+int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn, int npage)
+{
+   struct vfio_container *container;
+   struct vfio_group *group;
+   struct vfio_iommu_driver *driver;
+   int ret;
+
+   if (!dev || !user_pfn || !npage)
+   return -EINVAL;
+
+   if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
+   return -E2BIG;
+
+   group = vfio_group_get_from_dev(dev);
+   if (IS_ERR(group))
+   return PTR_ERR(group);
+
+   ret = vfio_group_add_container_user(group);
+   if (ret)
+   goto err_unpin_pages;
+
+   container = group->container;
+   down_read(&container->group_lock);
+
+   driver = container->iommu_driver;
+   if (likely(driver && driver->ops->unpin_pages))
+   ret = driver->ops->unpin_pages(container->iommu_data, user_pfn,
+  npage);
+   else
+   ret = -ENOTTY;
+
+   up_read(&container->group_lock);
+   vfio_group_try_dissolve_container(group);
+
+err_unpin_pages:
+   vfio_group_put(group);
+   return ret;
+}
+EXPORT_SYMBOL(vfio_unpin_pages);
+
 /**
  * Module/class support
  */
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2ba19424e4a1..9f3d58d3dfaf 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -259,8 +259,8 @@ static int vaddr_get_pfn(unsigned long vaddr, int prot, unsigned long *pfn)
  * the iommu can only map chunks of consecutive pfns anyway, so get the
  * first page and all consecutive pages with the same locking.
  */
-static long vfio_pin_pages(unsigned long vaddr, long npage,
-  int prot, unsigned long *pfn_base)
+static long vfio_pin_pages_remote(unsigned long vaddr, long npage,
+
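
The dispatch performed by vfio_pin_pages() in this patch (validate arguments,
bound the request size, then call into the backend if it provides the
callback) can be sketched in user space as follows. Everything prefixed
model_ or MODEL_ is hypothetical: MODEL_PIN_MAX merely stands in for
VFIO_PIN_PAGES_MAX_ENTRIES, whose value is not shown in this excerpt, and
model_offset_pin() is a fake backend that maps each PFN to PFN + 0x1000.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_PIN_MAX 512   /* hypothetical limit, not the kernel's value */

/* Stand-in for the pin_pages member of struct vfio_iommu_driver_ops. */
struct model_iommu_ops {
    int (*pin_pages)(void *iommu_data, unsigned long *user_pfn, int npage,
                     int prot, unsigned long *phys_pfn);
};

/* Fake backend: "pins" every page and reports an offset host PFN. */
static int model_offset_pin(void *iommu_data, unsigned long *user_pfn,
                            int npage, int prot, unsigned long *phys_pfn)
{
    int i;

    (void)iommu_data;
    (void)prot;
    for (i = 0; i < npage; i++) {
        phys_pfn[i] = user_pfn[i] + 0x1000;
    }
    return npage;           /* number of pages pinned */
}

/* Front end: same argument checks and -ENOTTY fallback as in the patch. */
static int model_pin_pages(const struct model_iommu_ops *ops, void *iommu_data,
                           unsigned long *user_pfn, int npage, int prot,
                           unsigned long *phys_pfn)
{
    if (!user_pfn || !phys_pfn || !npage) {
        return -EINVAL;
    }
    if (npage > MODEL_PIN_MAX) {
        return -E2BIG;
    }
    if (ops && ops->pin_pages) {
        return ops->pin_pages(iommu_data, user_pfn, npage, prot, phys_pfn);
    }
    return -ENOTTY;         /* backend does not support pinning */
}
```

The kernel version additionally takes a container reference and the group
lock around the callback; that locking is omitted here.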

[Qemu-devel] [PATCH v14 21/22] docs: Sample driver to demonstrate how to use Mediated device framework.

2016-11-16 Thread Kirti Wankhede

The sample driver creates an mdev device that simulates a serial port over
a PCI card.

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: I857f8f12f8b275f2498dfe8c628a5cdc7193b1b2
---
 Documentation/vfio-mediated-device.txt |  103 ++-
 samples/vfio-mdev/Makefile |   13 +
 samples/vfio-mdev/mtty.c   | 1503 
 3 files changed, 1618 insertions(+), 1 deletion(-)
 create mode 100644 samples/vfio-mdev/Makefile
 create mode 100644 samples/vfio-mdev/mtty.c

diff --git a/Documentation/vfio-mediated-device.txt b/Documentation/vfio-mediated-device.txt
index fe8bd2e7b26a..0d2e402af7bb 100644
--- a/Documentation/vfio-mediated-device.txt
+++ b/Documentation/vfio-mediated-device.txt
@@ -289,8 +289,109 @@ these callbacks are supported in the TYPE1 IOMMU module. 
To enable them for
 other IOMMU backend modules, such as PPC64 sPAPR module, they need to provide
 these two callback functions.
 
+Using the Sample Code
+=====================
+
+mtty.c in samples/vfio-mdev/ directory is a sample driver program to
+demonstrate how to use the mediated device framework.
+
+The sample driver creates an mdev device that simulates a serial port over a 
PCI
+card.
+
+1. Build and load the mtty.ko module.
+
+   This step creates a dummy device, /sys/devices/virtual/mtty/mtty/
+
+   Files in this device directory in sysfs are similar to the following:
+
+   # tree /sys/devices/virtual/mtty/mtty/
+  /sys/devices/virtual/mtty/mtty/
+  |-- mdev_supported_types
+  |   |-- mtty-1
+  |   |   |-- available_instances
+  |   |   |-- create
+  |   |   |-- device_api
+  |   |   |-- devices
+  |   |   `-- name
+  |   `-- mtty-2
+  |   |-- available_instances
+  |   |-- create
+  |   |-- device_api
+  |   |-- devices
+  |   `-- name
+  |-- mtty_dev
+  |   `-- sample_mtty_dev
+  |-- power
+  |   |-- autosuspend_delay_ms
+  |   |-- control
+  |   |-- runtime_active_time
+  |   |-- runtime_status
+  |   `-- runtime_suspended_time
+  |-- subsystem -> ../../../../class/mtty
+  `-- uevent
+
+2. Create a mediated device by using the dummy device that you created in the
+   previous step.
+
+   # echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" > \
+  /sys/devices/virtual/mtty/mtty/mdev_supported_types/mtty-2/create
+
+3. Add parameters to qemu-kvm.
+
+   -device vfio-pci,\
+sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001
+
+4. Boot the VM.
+
+   In the Linux guest VM, with no hardware on the host, the device appears
+   as follows:
+
+   # lspci -s 00:05.0 -xxvv
+   00:05.0 Serial controller: Device 4348:3253 (rev 10) (prog-if 02 [16550])
+   Subsystem: Device 4348:3253
+   Physical Slot: 5
+   Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
+   Stepping- SERR- FastB2B- DisINTx-
+   Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
+   SERR-  Link[LNKA] -> GSI 10 (level, high) -> IRQ
+10
+   :00:05.0: ttyS1 at I/O 0xc150 (irq = 10) is a 16550A
+   :00:05.0: ttyS2 at I/O 0xc158 (irq = 10) is a 16550A
+
+
+5. In the Linux guest VM, check the serial ports.
+
+   # setserial -g /dev/ttyS*
+   /dev/ttyS0, UART: 16550A, Port: 0x03f8, IRQ: 4
+   /dev/ttyS1, UART: 16550A, Port: 0xc150, IRQ: 10
+   /dev/ttyS2, UART: 16550A, Port: 0xc158, IRQ: 10
+
+6. Using minicom or any other terminal emulation program, open port
+   /dev/ttyS1 or /dev/ttyS2 with hardware flow control disabled.
+
+7. Type data on the minicom terminal or send data to the terminal emulation
+   program and read the data.
+
+   Data is looped back from the host's mtty driver.
+
+8. Destroy the mediated device that you created.
+
+   # echo 1 > /sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001/remove
+
 References
-----------
+==========
 
 [1] See Documentation/vfio.txt for more information on VFIO.
 [2] struct mdev_driver in include/linux/mdev.h
diff --git a/samples/vfio-mdev/Makefile b/samples/vfio-mdev/Makefile
new file mode 100644
index ..a932edbe38eb
--- /dev/null
+++ b/samples/vfio-mdev/Makefile
@@ -0,0 +1,13 @@
+#
+# Makefile for mtty.c file
+#
+KERNEL_DIR:=/lib/modules/$(shell uname -r)/build
+
+obj-m:=mtty.o
+
+modules clean modules_install:
+   $(MAKE) -C $(KERNEL_DIR) SUBDIRS=$(PWD) $@
+
+default: modules
+
+module: modules
diff --git a/samples/vfio-mdev/mtty.c b/samples/vfio-mdev/mtty.c
new file mode 100644
index ..6b633a4ea333
--- /dev/null
+++ b/samples/vfio-mdev/mtty.c
@@ -0,0 +1,1503 @@
+/*
+ * Mediated virtual PCI serial host device driver
+ *
+ * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
+ * Author: Neo Jia 
+ * Kirti Wankhede 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public

Re: [Qemu-devel] [PATCH] translate-all: Enable locking debug in a debug build

2016-11-16 Thread Pranith Kumar

On Wed, Nov 16, 2016 at 10:57 AM, Alex Bennée  wrote:
>
> Pranith Kumar  writes:
>
>> Unconditionally enable locking checks in debug builds so that we get
>> wider testing. Using tcg_debug_assert() allows us to remove
>> DEBUG_LOCKING define.
>
> Interesting. The other option would be to add a debug build to
> .travis.yml that defines this (and others) with -DFOO_DEBUG.
>
>>
>> Signed-off-by: Pranith Kumar 
>> ---
>>  translate-all.c | 50 +-
>>  1 file changed, 17 insertions(+), 33 deletions(-)
>>
>> diff --git a/translate-all.c b/translate-all.c
>> index cf828aa..a03f323 100644
>> --- a/translate-all.c
>> +++ b/translate-all.c
>> @@ -60,7 +60,6 @@
>>
>>  /* #define DEBUG_TB_INVALIDATE */
>>  /* #define DEBUG_TB_FLUSH */
>> -/* #define DEBUG_LOCKING */
>>  /* make various TB consistency checks */
>>  /* #define DEBUG_TB_CHECK */
>
> So if we are enabling this for tcg_debug builds why not the other cases?

Ideally, we should enable all the debug checks in the debug build. I
didn't want to touch unrelated stuff in this patch. I can clean up all
these cases if you prefer.

>
>>
>> @@ -75,23 +74,13 @@
>>   * access to the memory related structures are protected with the
>>   * mmap_lock.
>>   */
>> -#ifdef DEBUG_LOCKING
>> -#define DEBUG_MEM_LOCKS 1
>> -#else
>> -#define DEBUG_MEM_LOCKS 0
>> -#endif
>> -
>
> In retrospect I should probably have had a comment in here about the role
> of tb_lock in CONFIG_SOFTMMU versus the mmap_lock.
>
>>  #ifdef CONFIG_SOFTMMU
>>  #define assert_memory_lock() do {   \
>> -if (DEBUG_MEM_LOCKS) {  \
>> -g_assert(have_tb_lock); \
>> -}   \
>> +tcg_debug_assert(have_tb_lock); \
>>  } while (0)
>>  #else
>>  #define assert_memory_lock() do {   \
>> -if (DEBUG_MEM_LOCKS) {  \
>> -g_assert(have_mmap_lock()); \
>> -}   \
>> +tcg_debug_assert(have_mmap_lock()); \
>>  } while (0)
>>  #endif
>>
>> @@ -172,16 +161,24 @@ static void page_table_config_init(void)
>>  assert(v_l2_levels >= 0);
>>  }
>>
>> +#define assert_tb_locked() do { \
>> +tcg_debug_assert(have_tb_lock); \
>> +} while (0)
>> +
>> +#define assert_tb_unlocked() do {   \
>> +tcg_debug_assert(!have_tb_lock);\
>> +} while (0)
>> +
>
> I'm not sure we need all this multi-line stuff for a simple
> substitution? Richard?

OK, I will update this to a single line macro...

>
>>  void tb_lock(void)
>>  {
>> -assert(!have_tb_lock);
>> +assert_tb_unlocked();
>
> Hmm why introduce a helper for exactly one use?

...or can entirely remove the macro. I don't favour one over the other. :)


-- 
Pranith
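
For reference, the single-line macro form discussed above could look roughly
like the following. This is a standalone sketch, not the patch itself:
demo_debug_assert() and DEMO_DEBUG_TCG are hypothetical stand-ins for
tcg_debug_assert() and the debug-build switch, have_tb_lock is modelled as a
plain bool, and demo_tb_lock()/demo_tb_unlock() are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for tcg_debug_assert(): checks the condition when
 * DEMO_DEBUG_TCG is defined and compiles to nothing otherwise.
 */
#ifdef DEMO_DEBUG_TCG
#define demo_debug_assert(x) assert(x)
#else
#define demo_debug_assert(x) ((void)0)
#endif

/* Modelled as a plain flag here; assumed to be per-thread in the real code. */
static bool have_tb_lock;

/* Single-line variants of the two helper macros. */
#define assert_tb_locked()   demo_debug_assert(have_tb_lock)
#define assert_tb_unlocked() demo_debug_assert(!have_tb_lock)

static void demo_tb_lock(void)
{
    assert_tb_unlocked();   /* taking the lock twice is a bug */
    have_tb_lock = true;
}

static void demo_tb_unlock(void)
{
    assert_tb_locked();     /* releasing an unheld lock is a bug */
    have_tb_lock = false;
}
```

Because the macros expand to a single expression, the do { } while (0) wrapper
is not needed to keep them safe inside if/else bodies.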

Re: [Qemu-devel] [PATCH 3/3] virtio: set ISR on dataplane notifications

2016-11-16 Thread Paolo Bonzini



- Original Message -
> From: "Michael S. Tsirkin" 
> To: "Paolo Bonzini" 
> Cc: qemu-devel@nongnu.org, "alex williamson" , 
> borntrae...@de.ibm.com, fel...@nutanix.com
> Sent: Wednesday, November 16, 2016 9:39:24 PM
> Subject: Re: [PATCH 3/3] virtio: set ISR on dataplane notifications
> 
> On Wed, Nov 16, 2016 at 03:38:11PM -0500, Paolo Bonzini wrote:
> > > > +void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
> > > > +{
> > > > +if (!virtio_should_notify(vdev, vq)) {
> > > > +return;
> > > > +}
> > > > +
> > > > +trace_virtio_notify_irqfd(vdev, vq);
> > > > +virtio_set_isr(vq->vdev, 0x1);
> > > 
> > > So here, I think we need a comment with parts of
> > > the commit log.
> > > 
> > > /*
> > >  * virtio spec 1.0 says ISR bit 0 should be ignored with MSI, but
> > >  * windows drivers included in virtio-win 1.8.0 (circa 2015)
> > >  * for Windows 8.1 only are incorrectly polling this bit during shutdown
> >  
> > 
> > Not sure it's only for Windows 8.1, in fact probably not.
> 
> 8.1 on shutdown and others on crashdump or hibernation?

Even 8.1 is really a hibernation hidden behind a "Shut down" menu item.

Paolo

> > Looks good if you replace this line with
> > 
> > "are incorrectly polling this bit during crashdump or hibernation"
> > 
> > Paolo
> > 
> > >  * in MSI mode, causing a hang if this bit is never updated.
> > >  * Next driver release from 2016 fixed this problem, so working around it
> > >  * is not a must, but it's easy to do so let's do it here.
> > >  *
> > >  * Note: it's safe to update ISR from any thread as it was switched
> > >  * to an atomic operation.
> > >  */
> > 
> > 
> > > 
> > > 
> > > 
> > > > +event_notifier_set(>guest_notifier);
> > > > +}
> > > > +
> > > >  void virtio_notify(VirtIODevice *vdev, VirtQueue *vq)
> > > >  {
> > > >  if (!virtio_should_notify(vdev, vq)) {
> > > > @@ -1990,7 +1994,7 @@ static void
> > > > virtio_queue_guest_notifier_read(EventNotifier *n)
> > > >  {
> > > >  VirtQueue *vq = container_of(n, VirtQueue, guest_notifier);
> > > >  if (event_notifier_test_and_clear(n)) {
> > > > -virtio_irq(vq);
> > > > +virtio_notify_vector(vq->vdev, vq->vector);
> > > >  }
> > > >  }
> > > >  
> > > > diff --git a/include/hw/virtio/virtio-scsi.h
> > > > b/include/hw/virtio/virtio-scsi.h
> > > > index 9fbc7d7..7375196 100644
> > > > --- a/include/hw/virtio/virtio-scsi.h
> > > > +++ b/include/hw/virtio/virtio-scsi.h
> > > > @@ -137,6 +137,5 @@ void virtio_scsi_push_event(VirtIOSCSI *s,
> > > > SCSIDevice
> > > > *dev,
> > > >  void virtio_scsi_dataplane_setup(VirtIOSCSI *s, Error **errp);
> > > >  int virtio_scsi_dataplane_start(VirtIODevice *s);
> > > >  void virtio_scsi_dataplane_stop(VirtIODevice *s);
> > > > -void virtio_scsi_dataplane_notify(VirtIODevice *vdev, VirtIOSCSIReq
> > > > *req);
> > > >  
> > > >  #endif /* QEMU_VIRTIO_SCSI_H */
> > > > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > > > index 835b085..ab0e030 100644
> > > > --- a/include/hw/virtio/virtio.h
> > > > +++ b/include/hw/virtio/virtio.h
> > > > @@ -181,6 +181,7 @@ void virtqueue_get_avail_bytes(VirtQueue *vq,
> > > > unsigned
> > > > int *in_bytes,
> > > > unsigned max_in_bytes, unsigned
> > > > max_out_bytes);
> > > >  
> > > >  bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq);
> > > > +void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq);
> > > >  void virtio_notify(VirtIODevice *vdev, VirtQueue *vq);
> > > >  
> > > >  void virtio_save(VirtIODevice *vdev, QEMUFile *f);
> > > > @@ -280,7 +281,6 @@ void virtio_queue_host_notifier_read(EventNotifier
> > > > *n);
> > > >  void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq,
> > > >  AioContext
> > > >  *ctx,
> > > >  void
> > > >  (*fn)(VirtIODevice *,
> > > > VirtQueue
> > > > *));
> > > > -void virtio_irq(VirtQueue *vq);
> > > >  VirtQueue *virtio_vector_first_queue(VirtIODevice *vdev, uint16_t
> > > >  vector);
> > > >  VirtQueue *virtio_vector_next_queue(VirtQueue *vq);
> > > >  
> > > > --
> > > > 2.9.3
> > > 
>
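
The remark that ISR can safely be updated from any thread because it is an
atomic operation can be illustrated with C11 atomics. This is a minimal
sketch under the assumption that the ISR is a single byte; demo_isr,
demo_set_isr() and demo_read_and_clear_isr() are hypothetical names and not
QEMU functions.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical one-byte interrupt status register shared between threads. */
static _Atomic uint8_t demo_isr;

/* Set bits in the ISR; safe to call concurrently from any thread. */
static void demo_set_isr(uint8_t value)
{
    uint8_t old = atomic_load(&demo_isr);

    /* Skip the read-modify-write when the bits are already set. */
    if ((old & value) != value)
        atomic_fetch_or(&demo_isr, value);
}

/* Guest-side read: return the current bits and clear them in one step. */
static uint8_t demo_read_and_clear_isr(void)
{
    return atomic_exchange(&demo_isr, 0);
}
```

With this shape, a notification path that only signals an event notifier can
also set bit 0, so a driver polling the ISR still sees it change.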

[Qemu-devel] [PATCH v14 03/22] vfio: Rearrange functions to get vfio_group from dev

2016-11-16 Thread Kirti Wankhede

Factor the lookup of a vfio_group from a struct device out of
vfio_device_get_from_dev() into a new helper, vfio_group_get_from_dev().

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Reviewed-by: Jike Song 

Change-Id: I1f93262bdbab75094bc24b087b29da35ba70c4c6
---
 drivers/vfio/vfio.c | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index d1d70e0b011b..23bc86c1d05d 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -480,6 +480,21 @@ static struct vfio_group *vfio_group_get_from_minor(int 
minor)
return group;
 }
 
+static struct vfio_group *vfio_group_get_from_dev(struct device *dev)
+{
+   struct iommu_group *iommu_group;
+   struct vfio_group *group;
+
+   iommu_group = iommu_group_get(dev);
+   if (!iommu_group)
+   return NULL;
+
+   group = vfio_group_get_from_iommu(iommu_group);
+   iommu_group_put(iommu_group);
+
+   return group;
+}
+
 /**
  * Device objects - create, release, get, put, search
  */
@@ -811,16 +826,10 @@ EXPORT_SYMBOL_GPL(vfio_add_group_dev);
  */
 struct vfio_device *vfio_device_get_from_dev(struct device *dev)
 {
-   struct iommu_group *iommu_group;
struct vfio_group *group;
struct vfio_device *device;
 
-   iommu_group = iommu_group_get(dev);
-   if (!iommu_group)
-   return NULL;
-
-   group = vfio_group_get_from_iommu(iommu_group);
-   iommu_group_put(iommu_group);
+   group = vfio_group_get_from_dev(dev);
if (!group)
return NULL;
 
-- 
2.7.0
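
The reference handling in the new helper can be modelled in userspace as
follows. This is only a sketch of the pattern, assuming simple integer
reference counts; every demo_* type and function is a hypothetical stand-in
for the kernel object of similar name.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel objects. */
struct demo_iommu_group { int refs; };
struct demo_vfio_group  { int refs; struct demo_iommu_group *iommu_group; };
struct demo_device      { struct demo_iommu_group *iommu_group; };

/* Take a reference on the device's iommu group, if it has one. */
static struct demo_iommu_group *demo_iommu_group_get(struct demo_device *dev)
{
    if (dev->iommu_group)
        dev->iommu_group->refs++;
    return dev->iommu_group;
}

static void demo_iommu_group_put(struct demo_iommu_group *g)
{
    g->refs--;
}

/* Find the vfio group wrapping an iommu group; takes a reference on a hit. */
static struct demo_vfio_group *
demo_vfio_group_get_from_iommu(struct demo_vfio_group *groups, size_t n,
                               struct demo_iommu_group *iommu_group)
{
    for (size_t i = 0; i < n; i++) {
        if (groups[i].iommu_group == iommu_group) {
            groups[i].refs++;
            return &groups[i];
        }
    }
    return NULL;
}

/* Same shape as the new helper: returns NULL on failure, not an error pointer. */
static struct demo_vfio_group *
demo_vfio_group_get_from_dev(struct demo_vfio_group *groups, size_t n,
                             struct demo_device *dev)
{
    struct demo_iommu_group *iommu_group = demo_iommu_group_get(dev);
    struct demo_vfio_group *group;

    if (!iommu_group)
        return NULL;

    group = demo_vfio_group_get_from_iommu(groups, n, iommu_group);
    demo_iommu_group_put(iommu_group);   /* the temporary reference is dropped */

    return group;
}
```

Since the helper reports failure with NULL, callers would be expected to test
the result with a plain NULL check.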

[Qemu-devel] [PATCH v14 20/22] docs: Sysfs ABI for mediated device framework

2016-11-16 Thread Kirti Wankhede

Added details of sysfs ABI for mediated device framework

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: Icb0fd4ed58a2fa793fbcb1c3d5009a4403c1f3ac
---
 Documentation/ABI/testing/sysfs-bus-vfio-mdev | 111 ++
 1 file changed, 111 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-vfio-mdev

diff --git a/Documentation/ABI/testing/sysfs-bus-vfio-mdev 
b/Documentation/ABI/testing/sysfs-bus-vfio-mdev
new file mode 100644
index ..452dbe39270e
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-vfio-mdev
@@ -0,0 +1,111 @@
+What:   /sys/...//mdev_supported_types/
+Date:   October 2016
+Contact:Kirti Wankhede 
+Description:
+This directory contains list of directories of currently
+   supported mediated device types and their details for
+   . Supported type attributes are defined by the
+   vendor driver who registers with Mediated device framework.
+   Each supported type is a directory whose name is created
+   by adding the device driver string as a prefix to the
+   string provided by the vendor driver.
+
+What:   /sys/...//mdev_supported_types//
+Date:   October 2016
+Contact:Kirti Wankhede 
+Description:
+This directory gives details of supported type, like name,
+   description, available_instances, device_api etc.
+   'device_api' and 'available_instances' are mandatory
+   attributes to be provided by vendor driver. 'name',
+   'description' and other vendor driver specific attributes
+   are optional.
+
+What:   /sys/.../mdev_supported_types//create
+Date:   October 2016
+Contact:Kirti Wankhede 
+Description:
+   Writing UUID to this file will create mediated device of
+   type  for parent device . This is a
+   write-only file.
+   For example:
+   # echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" > \
+  /sys/devices/foo/mdev_supported_types/foo-1/create
+
+What:   /sys/.../mdev_supported_types//devices/
+Date:   October 2016
+Contact:Kirti Wankhede 
+Description:
+   This directory contains symbolic links pointing to mdev
+   devices sysfs entries which are created of this .
+
+What:   /sys/.../mdev_supported_types//available_instances
+Date:   October 2016
+Contact:Kirti Wankhede 
+Description:
+   Reading this attribute will show the number of mediated
+   devices of type  that can be created. This is a
+   readonly file.
+Users:
+   Userspace applications interested in creating a mediated
+   device of that type. A userspace application should check
+   how many instances can still be created before creating
+   a mediated device of this type.
+
+What:   /sys/.../mdev_supported_types//device_api
+Date:   October 2016
+Contact:Kirti Wankhede 
+Description:
+   Reading this attribute will show VFIO device API supported
+   by this type. For example, "vfio-pci" for a PCI device,
+   "vfio-platform" for platform device.
+
+What:   /sys/.../mdev_supported_types//name
+Date:   October 2016
+Contact:Kirti Wankhede 
+Description:
+   Reading this attribute will show human readable name of the
+   mediated device that will get created of type .
+   This is optional attribute. For example: "Grid M60-0Q"
+Users:
+   Userspace applications interested in knowing the name of
+   a particular  that can help in understanding the
+   type of mediated device.
+
+What:   /sys/.../mdev_supported_types//description
+Date:   October 2016
+Contact:Kirti Wankhede 
+Description:
+   Reading this attribute will show description of the type of
+   mediated device that will get created of type .
+   This is optional attribute. For example:
+   "2 heads, 512M FB, 2560x1600 maximum resolution"
+Users:
+   Userspace applications interested in knowing the details of
+   a particular  that can help in understanding the
+   features provided by that type of mediated device.
+
+What:   /sys/...///
+Date:   October 2016
+Contact:Kirti Wankhede 
+Description:
+   This directory represents device directory of mediated
+   device. It contains all the

[Qemu-devel] [PATCH v14 11/22] vfio iommu: Add blocking notifier to notify DMA_UNMAP

2016-11-16 Thread Kirti Wankhede

Add a blocking notifier to the IOMMU TYPE1 driver to notify vendor drivers
about DMA_UNMAP.
Export two APIs, vfio_register_notifier() and vfio_unregister_notifier().
An external user must register a notifier before using the
vfio_pin_pages()/vfio_unpin_pages() APIs to pin/unpin pages.
Vendor drivers should use the VFIO_IOMMU_NOTIFY_DMA_UNMAP action to invalidate
mappings.

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: I5910d0024d6be87f3e8d3e0ca0eaeaaa0b17f271
---
 drivers/vfio/vfio.c | 73 ++
 drivers/vfio/vfio_iommu_type1.c | 77 +
 include/linux/vfio.h| 12 +++
 3 files changed, 147 insertions(+), 15 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index bd36c16b0ef2..c850ba324be2 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1901,6 +1901,79 @@ err_unpin_pages:
 }
 EXPORT_SYMBOL(vfio_unpin_pages);
 
+int vfio_register_notifier(struct device *dev, struct notifier_block *nb)
+{
+   struct vfio_container *container;
+   struct vfio_group *group;
+   struct vfio_iommu_driver *driver;
+   ssize_t ret;
+
+   if (!dev || !nb)
+   return -EINVAL;
+
+   group = vfio_group_get_from_dev(dev);
+   if (!group)
+   return -ENODEV;
+
+   ret = vfio_group_add_container_user(group);
+   if (ret)
+   goto err_register_nb;
+
+   container = group->container;
+   down_read(>group_lock);
+
+   driver = container->iommu_driver;
+   if (likely(driver && driver->ops->register_notifier))
+   ret = driver->ops->register_notifier(container->iommu_data, nb);
+   else
+   ret = -ENOTTY;
+
+   up_read(>group_lock);
+   vfio_group_try_dissolve_container(group);
+
+err_register_nb:
+   vfio_group_put(group);
+   return ret;
+}
+EXPORT_SYMBOL(vfio_register_notifier);
+
+int vfio_unregister_notifier(struct device *dev, struct notifier_block *nb)
+{
+   struct vfio_container *container;
+   struct vfio_group *group;
+   struct vfio_iommu_driver *driver;
+   ssize_t ret;
+
+   if (!dev || !nb)
+   return -EINVAL;
+
+   group = vfio_group_get_from_dev(dev);
+   if (!group)
+   return -ENODEV;
+
+   ret = vfio_group_add_container_user(group);
+   if (ret)
+   goto err_unregister_nb;
+
+   container = group->container;
+   down_read(>group_lock);
+
+   driver = container->iommu_driver;
+   if (likely(driver && driver->ops->unregister_notifier))
+   ret = driver->ops->unregister_notifier(container->iommu_data,
+  nb);
+   else
+   ret = -ENOTTY;
+
+   up_read(>group_lock);
+   vfio_group_try_dissolve_container(group);
+
+err_unregister_nb:
+   vfio_group_put(group);
+   return ret;
+}
+EXPORT_SYMBOL(vfio_unregister_notifier);
+
 /**
  * Module/class support
  */
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 98191fc590f8..63fbc48a088f 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define DRIVER_VERSION  "0.2"
 #define DRIVER_AUTHOR   "Alex Williamson "
@@ -60,6 +61,7 @@ struct vfio_iommu {
struct vfio_domain  *external_domain; /* domain for external user */
struct mutexlock;
struct rb_root  dma_list;
+   struct blocking_notifier_head notifier;
boolv2;
boolnesting;
 };
@@ -561,7 +563,8 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 
mutex_lock(>lock);
 
-   if (!iommu->external_domain) {
+   /* Fail if notifier list is empty */
+   if ((!iommu->external_domain) || (!iommu->notifier.head)) {
ret = -EINVAL;
goto pin_done;
}
@@ -776,9 +779,9 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 struct vfio_iommu_type1_dma_unmap *unmap)
 {
uint64_t mask;
-   struct vfio_dma *dma;
+   struct vfio_dma *dma, *dma_last = NULL;
size_t unmapped = 0;
-   int ret = 0;
+   int ret = 0, retries;
 
mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
 
@@ -788,7 +791,7 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
return -EINVAL;
 
WARN_ON(mask & PAGE_MASK);
-
+again:
mutex_lock(>lock);
 
/*
@@ -844,6 +847,32 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 */
if (dma->task->mm != current->mm)
break;
+
+   if (!RB_EMPTY_ROOT(>pfn_list)) {
+   struct

[Qemu-devel] [PATCH v14 13/22] vfio: Introduce common function to add capabilities

2016-11-16 Thread Kirti Wankhede

Vendor drivers that use the mediated device framework should call
vfio_info_add_capability() to add capabilities.
This function is introduced to reduce code duplication in vendor drivers.

vfio_info_cap_shift() manipulates a data buffer to add an offset to each
element in a chain. This data buffer is documented in a uapi header.
Export vfio_info_cap_shift with EXPORT_SYMBOL instead of EXPORT_SYMBOL_GPL so
that it is available to all drivers.

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: I6fca329fa2291f37a2c859d0bc97574d9e2ce1a6
---
 drivers/vfio/vfio.c  | 60 +++-
 include/linux/vfio.h |  3 +++
 2 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index c850ba324be2..82257cf30f52 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1797,8 +1797,66 @@ void vfio_info_cap_shift(struct vfio_info_cap *caps, 
size_t offset)
for (tmp = caps->buf; tmp->next; tmp = (void *)tmp + tmp->next - offset)
tmp->next += offset;
 }
-EXPORT_SYMBOL_GPL(vfio_info_cap_shift);
+EXPORT_SYMBOL(vfio_info_cap_shift);
 
+static int sparse_mmap_cap(struct vfio_info_cap *caps, void *cap_type)
+{
+   struct vfio_info_cap_header *header;
+   struct vfio_region_info_cap_sparse_mmap *sparse_cap, *sparse = cap_type;
+   size_t size;
+
+   size = sizeof(*sparse) + sparse->nr_areas *  sizeof(*sparse->areas);
+   header = vfio_info_cap_add(caps, size,
+  VFIO_REGION_INFO_CAP_SPARSE_MMAP, 1);
+   if (IS_ERR(header))
+   return PTR_ERR(header);
+
+   sparse_cap = container_of(header,
+   struct vfio_region_info_cap_sparse_mmap, header);
+   sparse_cap->nr_areas = sparse->nr_areas;
+   memcpy(sparse_cap->areas, sparse->areas,
+  sparse->nr_areas * sizeof(*sparse->areas));
+   return 0;
+}
+
+static int region_type_cap(struct vfio_info_cap *caps, void *cap_type)
+{
+   struct vfio_info_cap_header *header;
+   struct vfio_region_info_cap_type *type_cap, *cap = cap_type;
+
+   header = vfio_info_cap_add(caps, sizeof(*cap),
+  VFIO_REGION_INFO_CAP_TYPE, 1);
+   if (IS_ERR(header))
+   return PTR_ERR(header);
+
+   type_cap = container_of(header, struct vfio_region_info_cap_type,
+   header);
+   type_cap->type = cap->type;
+   type_cap->subtype = cap->subtype;
+   return 0;
+}
+
+int vfio_info_add_capability(struct vfio_info_cap *caps, int cap_type_id,
+void *cap_type)
+{
+   int ret = -EINVAL;
+
+   if (!cap_type)
+   return 0;
+
+   switch (cap_type_id) {
+   case VFIO_REGION_INFO_CAP_SPARSE_MMAP:
+   ret = sparse_mmap_cap(caps, cap_type);
+   break;
+
+   case VFIO_REGION_INFO_CAP_TYPE:
+   ret = region_type_cap(caps, cap_type);
+   break;
+   }
+
+   return ret;
+}
+EXPORT_SYMBOL(vfio_info_add_capability);
 
 /*
  * Pin a set of guest PFNs and return their associated host PFNs for local
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 6ab13f7e2920..e26f7ccab564 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -126,6 +126,9 @@ extern struct vfio_info_cap_header *vfio_info_cap_add(
struct vfio_info_cap *caps, size_t size, u16 id, u16 version);
 extern void vfio_info_cap_shift(struct vfio_info_cap *caps, size_t offset);
 
+extern int vfio_info_add_capability(struct vfio_info_cap *caps,
+   int cap_type_id, void *cap_type);
+
 struct pci_dev;
 #ifdef CONFIG_EEH
 extern void vfio_spapr_pci_eeh_open(struct pci_dev *pdev);
-- 
2.7.0
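
The capability chain that these helpers build can be modelled in userspace as
below. This is a sketch under stated assumptions, not the kernel code: the
demo_* types and functions are hypothetical, the header layout (id, version,
next offset) is a simplified stand-in, and capability sizes are assumed to be
multiples of the header alignment.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a capability header. */
struct demo_cap_header {
    uint16_t id;
    uint16_t version;
    uint32_t next;      /* offset of the next capability, 0 ends the chain */
};

/* Hypothetical stand-in for the growing capability buffer. */
struct demo_cap_chain {
    uint8_t *buf;
    size_t size;
};

/* Return the header stored at a buffer-relative offset. */
static struct demo_cap_header *demo_cap_at(struct demo_cap_chain *caps,
                                           size_t offset)
{
    return (struct demo_cap_header *)(caps->buf + offset);
}

/* Append a zeroed capability of `size` bytes and link it to the chain tail. */
static struct demo_cap_header *demo_cap_add(struct demo_cap_chain *caps,
                                            size_t size, uint16_t id,
                                            uint16_t version)
{
    struct demo_cap_header *header, *tmp;
    uint8_t *buf;

    if (size < sizeof(*header))
        return NULL;

    buf = realloc(caps->buf, caps->size + size);
    if (!buf)
        return NULL;
    caps->buf = buf;

    header = demo_cap_at(caps, caps->size);
    memset(header, 0, size);
    header->id = id;
    header->version = version;

    /* Walk to the current tail and point it at the new entry. */
    for (tmp = demo_cap_at(caps, 0); tmp->next; tmp = demo_cap_at(caps, tmp->next))
        ;
    tmp->next = (uint32_t)caps->size;

    caps->size += size;
    return header;
}

/* Add `offset` to every link so the chain can sit behind a fixed-size struct. */
static void demo_cap_shift(struct demo_cap_chain *caps, size_t offset)
{
    struct demo_cap_header *tmp = demo_cap_at(caps, 0);

    while (tmp->next) {
        uint32_t next = tmp->next;          /* still buffer-relative */

        tmp->next += (uint32_t)offset;
        tmp = demo_cap_at(caps, next);
    }
}
```

A caller would typically add all capabilities first, then shift the chain by
the size of the info structure before copying it out behind that structure.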

[Qemu-devel] [PATCH v14 02/22] vfio: VFIO based driver for Mediated devices

2016-11-16 Thread Kirti Wankhede

The vfio_mdev driver registers with the mdev core driver.
The mdev core driver creates a mediated device and calls the probe routine of
the vfio_mdev driver for each device.
The probe routine of the vfio_mdev driver adds the mediated device to the VFIO
core module.

This driver forms a shim layer that passes VFIO device operations through
to the vendor driver for mediated devices.

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Reviewed-by: Jike Song 
Reviewed-by: Dong Jia Shi 

Change-Id: I583f4734752971d3d112324d69e2508c88f359ec
---
 drivers/vfio/mdev/Kconfig |   7 ++
 drivers/vfio/mdev/Makefile|   1 +
 drivers/vfio/mdev/mdev_core.c |  13 +++-
 drivers/vfio/mdev/vfio_mdev.c | 148 ++
 4 files changed, 168 insertions(+), 1 deletion(-)
 create mode 100644 drivers/vfio/mdev/vfio_mdev.c

diff --git a/drivers/vfio/mdev/Kconfig b/drivers/vfio/mdev/Kconfig
index 258481d65ebd..424ddebee55d 100644
--- a/drivers/vfio/mdev/Kconfig
+++ b/drivers/vfio/mdev/Kconfig
@@ -7,3 +7,10 @@ config VFIO_MDEV
  Provides a framework to virtualize devices.
 
  If you don't know what do here, say N.
+
+config VFIO_MDEV_DEVICE
+   tristate "VFIO driver for Mediated devices"
+   depends on VFIO && VFIO_MDEV
+   default n
+   help
+ VFIO based driver for Mediated devices.
diff --git a/drivers/vfio/mdev/Makefile b/drivers/vfio/mdev/Makefile
index 31bc04801d94..fa2d5ea466ee 100644
--- a/drivers/vfio/mdev/Makefile
+++ b/drivers/vfio/mdev/Makefile
@@ -2,3 +2,4 @@
 mdev-y := mdev_core.o mdev_sysfs.o mdev_driver.o
 
 obj-$(CONFIG_VFIO_MDEV) += mdev.o
+obj-$(CONFIG_VFIO_MDEV_DEVICE) += vfio_mdev.o
diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 613e8a8a3b2a..be1ee89ee917 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -354,7 +354,18 @@ int mdev_device_remove(struct device *dev, bool 
force_remove)
 
 static int __init mdev_init(void)
 {
-   return mdev_bus_register();
+   int ret;
+
+   ret = mdev_bus_register();
+
+   /*
+* Attempt to load known vfio_mdev.  This gives us a working environment
+* without the user needing to explicitly load vfio_mdev driver.
+*/
+   if (!ret)
+   request_module_nowait("vfio_mdev");
+
+   return ret;
 }
 
 static void __exit mdev_exit(void)
diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
new file mode 100644
index ..ffc36758cb84
--- /dev/null
+++ b/drivers/vfio/mdev/vfio_mdev.c
@@ -0,0 +1,148 @@
+/*
+ * VFIO based driver for Mediated device
+ *
+ * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
+ * Author: Neo Jia 
+ * Kirti Wankhede 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "mdev_private.h"
+
+#define DRIVER_VERSION  "0.1"
+#define DRIVER_AUTHOR   "NVIDIA Corporation"
+#define DRIVER_DESC "VFIO based driver for Mediated device"
+
+static int vfio_mdev_open(void *device_data)
+{
+   struct mdev_device *mdev = device_data;
+   struct parent_device *parent = mdev->parent;
+   int ret;
+
+   if (unlikely(!parent->ops->open))
+   return -EINVAL;
+
+   if (!try_module_get(THIS_MODULE))
+   return -ENODEV;
+
+   ret = parent->ops->open(mdev);
+   if (ret)
+   module_put(THIS_MODULE);
+
+   return ret;
+}
+
+static void vfio_mdev_release(void *device_data)
+{
+   struct mdev_device *mdev = device_data;
+   struct parent_device *parent = mdev->parent;
+
+   if (likely(parent->ops->release))
+   parent->ops->release(mdev);
+
+   module_put(THIS_MODULE);
+}
+
+static long vfio_mdev_unlocked_ioctl(void *device_data,
+unsigned int cmd, unsigned long arg)
+{
+   struct mdev_device *mdev = device_data;
+   struct parent_device *parent = mdev->parent;
+
+   if (unlikely(!parent->ops->ioctl))
+   return -EINVAL;
+
+   return parent->ops->ioctl(mdev, cmd, arg);
+}
+
+static ssize_t vfio_mdev_read(void *device_data, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+   struct mdev_device *mdev = device_data;
+   struct parent_device *parent = mdev->parent;
+
+   if (unlikely(!parent->ops->read))
+   return -EINVAL;
+
+   return parent->ops->read(mdev, buf, count, ppos);
+}
+
+static ssize_t vfio_mdev_write(void *device_data, const char __user *buf,
+  size_t count, loff_t *ppos)
+{
+   struct mdev_device *mdev = device_data;
+   struct parent_device *parent =

[Qemu-devel] [PATCH v14 07/22] vfio iommu type1: Update argument of vaddr_get_pfn()

2016-11-16 Thread Kirti Wankhede

Update arguments of vaddr_get_pfn() to take struct mm_struct *mm as input
argument.

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: I885fd4cd4a9f66f4ee2c1caf58267464ec239f52
---
 drivers/vfio/vfio_iommu_type1.c | 32 
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 34d17e51dc97..52af5fc01d91 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -230,20 +230,36 @@ static int put_pfn(unsigned long pfn, int prot)
return 0;
 }
 
-static int vaddr_get_pfn(unsigned long vaddr, int prot, unsigned long *pfn)
+static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
+int prot, unsigned long *pfn)
 {
struct page *page[1];
struct vm_area_struct *vma;
-   int ret = -EFAULT;
+   int ret;
+
+   if (mm == current->mm) {
+   ret = get_user_pages_fast(vaddr, 1, !!(prot & IOMMU_WRITE),
+ page);
+   } else {
+   unsigned int flags = 0;
+
+   if (prot & IOMMU_WRITE)
+   flags |= FOLL_WRITE;
+
+   down_read(>mmap_sem);
+   ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
+   NULL);
+   up_read(>mmap_sem);
+   }
 
-   if (get_user_pages_fast(vaddr, 1, !!(prot & IOMMU_WRITE), page) == 1) {
+   if (ret == 1) {
*pfn = page_to_pfn(page[0]);
return 0;
}
 
-   down_read(>mm->mmap_sem);
+   down_read(>mmap_sem);
 
-   vma = find_vma_intersection(current->mm, vaddr, vaddr + 1);
+   vma = find_vma_intersection(mm, vaddr, vaddr + 1);
 
if (vma && vma->vm_flags & VM_PFNMAP) {
*pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
@@ -251,7 +267,7 @@ static int vaddr_get_pfn(unsigned long vaddr, int prot, 
unsigned long *pfn)
ret = 0;
}
 
-   up_read(>mm->mmap_sem);
+   up_read(>mmap_sem);
 
return ret;
 }
@@ -272,7 +288,7 @@ static long vfio_pin_pages_remote(unsigned long vaddr, long 
npage,
if (!current->mm)
return -ENODEV;
 
-   ret = vaddr_get_pfn(vaddr, prot, pfn_base);
+   ret = vaddr_get_pfn(current->mm, vaddr, prot, pfn_base);
if (ret)
return ret;
 
@@ -295,7 +311,7 @@ static long vfio_pin_pages_remote(unsigned long vaddr, long 
npage,
for (i = 1, vaddr += PAGE_SIZE; i < npage; i++, vaddr += PAGE_SIZE) {
unsigned long pfn = 0;
 
-   ret = vaddr_get_pfn(vaddr, prot, );
+   ret = vaddr_get_pfn(current->mm, vaddr, prot, );
if (ret)
break;
 
-- 
2.7.0

[Qemu-devel] [PATCH v14 17/22] vfio_platform: Updated to use vfio_set_irqs_validate_and_prepare()

2016-11-16 Thread Kirti Wankhede

Updated vfio_platform_common.c file to use
vfio_set_irqs_validate_and_prepare()

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: Id87cd6b78ae901610b39bf957974baa6f40cd7b0
---
 drivers/vfio/platform/vfio_platform_common.c | 31 +++-
 1 file changed, 8 insertions(+), 23 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform_common.c 
b/drivers/vfio/platform/vfio_platform_common.c
index d78142830754..4c27f4be3c3d 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -364,36 +364,21 @@ static long vfio_platform_ioctl(void *device_data,
struct vfio_irq_set hdr;
u8 *data = NULL;
int ret = 0;
+   size_t data_size = 0;
 
minsz = offsetofend(struct vfio_irq_set, count);
 
if (copy_from_user(, (void __user *)arg, minsz))
return -EFAULT;
 
-   if (hdr.argsz < minsz)
-   return -EINVAL;
-
-   if (hdr.index >= vdev->num_irqs)
-   return -EINVAL;
-
-   if (hdr.flags & ~(VFIO_IRQ_SET_DATA_TYPE_MASK |
- VFIO_IRQ_SET_ACTION_TYPE_MASK))
-   return -EINVAL;
-
-   if (!(hdr.flags & VFIO_IRQ_SET_DATA_NONE)) {
-   size_t size;
-
-   if (hdr.flags & VFIO_IRQ_SET_DATA_BOOL)
-   size = sizeof(uint8_t);
-   else if (hdr.flags & VFIO_IRQ_SET_DATA_EVENTFD)
-   size = sizeof(int32_t);
-   else
-   return -EINVAL;
-
-   if (hdr.argsz - minsz < size)
-   return -EINVAL;
+   ret = vfio_set_irqs_validate_and_prepare(&hdr, vdev->num_irqs,
+vdev->num_irqs, &data_size);
+   if (ret)
+   return ret;
 
-   data = memdup_user((void __user *)(arg + minsz), size);
+   if (data_size) {
+   data = memdup_user((void __user *)(arg + minsz),
+   data_size);
if (IS_ERR(data))
return PTR_ERR(data);
}
-- 
2.7.0

[Qemu-devel] [PATCH v14 06/22] vfio iommu type1: Update arguments of vfio_lock_acct

2016-11-16 Thread Kirti Wankhede

Added task structure as input argument to vfio_lock_acct() function.

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: I5d3673cc9d3786bb436b395d5f74537f1a36da80
---
 drivers/vfio/vfio_iommu_type1.c | 30 --
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 9f3d58d3dfaf..34d17e51dc97 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -150,17 +150,22 @@ static void vfio_lock_acct_bg(struct work_struct *work)
kfree(vwork);
 }
 
-static void vfio_lock_acct(long npage)
+static void vfio_lock_acct(struct task_struct *task, long npage)
 {
struct vwork *vwork;
struct mm_struct *mm;
 
-   if (!current->mm || !npage)
+   if (!npage)
+   return;
+
+   mm = get_task_mm(task);
+   if (!mm)
return; /* process exited or nothing to do */
 
-   if (down_write_trylock(&current->mm->mmap_sem)) {
-   current->mm->locked_vm += npage;
-   up_write(&current->mm->mmap_sem);
+   if (down_write_trylock(&mm->mmap_sem)) {
+   mm->locked_vm += npage;
+   up_write(&mm->mmap_sem);
+   mmput(mm);
return;
}
 
@@ -170,11 +175,8 @@ static void vfio_lock_acct(long npage)
 * wouldn't need this silliness
 */
vwork = kmalloc(sizeof(struct vwork), GFP_KERNEL);
-   if (!vwork)
-   return;
-   mm = get_task_mm(current);
-   if (!mm) {
-   kfree(vwork);
+   if (!vwork) {
+   mmput(mm);
return;
}
INIT_WORK(&vwork->work, vfio_lock_acct_bg);
@@ -285,7 +287,7 @@ static long vfio_pin_pages_remote(unsigned long vaddr, long 
npage,
 
if (unlikely(disable_hugepages)) {
if (!rsvd)
-   vfio_lock_acct(1);
+   vfio_lock_acct(current, 1);
return 1;
}
 
@@ -313,7 +315,7 @@ static long vfio_pin_pages_remote(unsigned long vaddr, long 
npage,
}
 
if (!rsvd)
-   vfio_lock_acct(i);
+   vfio_lock_acct(current, i);
 
return i;
 }
@@ -328,7 +330,7 @@ static long vfio_unpin_pages_remote(unsigned long pfn, long 
npage,
unlocked += put_pfn(pfn++, prot);
 
if (do_accounting)
-   vfio_lock_acct(-unlocked);
+   vfio_lock_acct(current, -unlocked);
 
return unlocked;
 }
@@ -390,7 +392,7 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, 
struct vfio_dma *dma)
cond_resched();
}
 
-   vfio_lock_acct(-unlocked);
+   vfio_lock_acct(current, -unlocked);
 }
 
 static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
-- 
2.7.0

[Qemu-devel] [PATCH v14 00/22] Add Mediated device support

2016-11-16 Thread Kirti Wankhede

This series adds mediated device support to the Linux host kernel. Its
purpose is to provide a common interface for mediated device management
that can be used by different devices. The series introduces the mdev
core module, which creates and manages mediated devices; a VFIO-based
driver for the mediated devices created by the mdev core module; and an
update to the VFIO type1 IOMMU module to support pinning and unpinning
for mediated devices.

What changed v13-> v14?
- Added retries to notify DMA_UNMAP, if pfn_list is not empty.
- Added BUG_ON if pages are not unpinned during DMA_UNMAP after 10 retries.
- Removed unpinning from detach_group() and release() and added WARN_ON
  if pfn_list is not empty.
- Updated page accounting logic.

Tested by assigning below combinations of devices to a single VM:
- GPU pass through only
- vGPU device only
- One GPU pass through and one vGPU device
- Linux VM hot plug and unplug vGPU device while GPU pass through device
  exists
- Linux VM hot plug and unplug GPU pass through device while vGPU device
  exists

Tested with Linux-next up to commit e76d21c40bd6.

Kirti Wankhede (22):
  vfio: Mediated device Core driver
  vfio: VFIO based driver for Mediated devices
  vfio: Rearrange functions to get vfio_group from dev
  vfio: Common function to increment container_users
  vfio iommu: Added pin and unpin callback functions to
vfio_iommu_driver_ops
  vfio iommu type1: Update arguments of vfio_lock_acct
  vfio iommu type1: Update argument of vaddr_get_pfn()
  vfio iommu type1: Add find_iommu_group() function
  vfio iommu type1: Add task structure to vfio_dma
  vfio iommu type1: Add support for mediated devices
  vfio iommu: Add blocking notifier to notify DMA_UNMAP
  vfio: Add notifier callback to parent's ops structure of mdev
  vfio: Introduce common function to add capabilities
  vfio_pci: Update vfio_pci to use vfio_info_add_capability()
  vfio: Introduce vfio_set_irqs_validate_and_prepare()
  vfio_pci: Updated to use vfio_set_irqs_validate_and_prepare()
  vfio_platform: Updated to use vfio_set_irqs_validate_and_prepare()
  vfio: Define device_api strings
  docs: Add Documentation for Mediated devices
  docs: Sysfs ABI for mediated device framework
  docs: Sample driver to demonstrate how to use Mediated device
framework.
  MAINTAINERS: Add entry VFIO based Mediated device drivers

 Documentation/ABI/testing/sysfs-bus-vfio-mdev |  111 ++
 Documentation/vfio-mediated-device.txt|  399 +++
 MAINTAINERS   |9 +
 drivers/vfio/Kconfig  |1 +
 drivers/vfio/Makefile |1 +
 drivers/vfio/mdev/Kconfig |   17 +
 drivers/vfio/mdev/Makefile|5 +
 drivers/vfio/mdev/mdev_core.c |  385 +++
 drivers/vfio/mdev/mdev_driver.c   |  119 ++
 drivers/vfio/mdev/mdev_private.h  |   41 +
 drivers/vfio/mdev/mdev_sysfs.c|  286 +
 drivers/vfio/mdev/vfio_mdev.c |  180 +++
 drivers/vfio/pci/vfio_pci.c   |   83 +-
 drivers/vfio/platform/vfio_platform_common.c  |   31 +-
 drivers/vfio/vfio.c   |  340 +-
 drivers/vfio/vfio_iommu_type1.c   |  872 +++---
 include/linux/mdev.h  |  177 +++
 include/linux/vfio.h  |   32 +-
 include/uapi/linux/vfio.h |   10 +
 samples/vfio-mdev/Makefile|   13 +
 samples/vfio-mdev/mtty.c  | 1503 +
 21 files changed, 4358 insertions(+), 257 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-vfio-mdev
 create mode 100644 Documentation/vfio-mediated-device.txt
 create mode 100644 drivers/vfio/mdev/Kconfig
 create mode 100644 drivers/vfio/mdev/Makefile
 create mode 100644 drivers/vfio/mdev/mdev_core.c
 create mode 100644 drivers/vfio/mdev/mdev_driver.c
 create mode 100644 drivers/vfio/mdev/mdev_private.h
 create mode 100644 drivers/vfio/mdev/mdev_sysfs.c
 create mode 100644 drivers/vfio/mdev/vfio_mdev.c
 create mode 100644 include/linux/mdev.h
 create mode 100644 samples/vfio-mdev/Makefile
 create mode 100644 samples/vfio-mdev/mtty.c

-- 
2.7.0

[Qemu-devel] [PATCH v14 10/22] vfio iommu type1: Add support for mediated devices

2016-11-16 Thread Kirti Wankhede

VFIO IOMMU drivers are designed for devices which are IOMMU capable.
A mediated device only uses the IOMMU APIs; the underlying hardware can be
managed by an IOMMU domain.

Aim of this change is:
- To use most of the code of TYPE1 IOMMU driver for mediated devices
- To support direct assigned device and mediated device in single module

This change adds pin and unpin support for mediated device to TYPE1 IOMMU
backend module. More details:
- Domain for external user is tracked separately in vfio_iommu structure.
  It is allocated when group for first mdev device is attached.
- Pages pinned for external domain are tracked in each vfio_dma structure
  for that iova range.
- Page tracking rb-tree in vfio_dma keeps <iova, pfn, ref_count>. Key of
  rb-tree is iova, but it actually aims to track pfns.
- On external pin request for an iova, page is pinned once, if iova is
  already pinned and tracked, ref_count is incremented.
- External unpin request unpins pages only when ref_count is 0.
- Pinned pages list is used to find pfn from iova and then unpin it.
  WARN_ON is added if there are entries in pfn_list while detaching the
  group and releasing the domain.
- Page accounting is updated to account in its address space where the
  pages are pinned/unpinned, i.e. dma->task
- Accounting for mdev device is only done if there is no iommu capable
  domain in the container. When there is a direct device assigned to the
  container and that domain is iommu capable, all pages are already pinned
  during DMA_MAP.
- Page accounting is updated on hot plug and unplug mdev device and pass
  through device.
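
The reference-counting rule in the list above (pin a page once, bump
ref_count on repeated pin requests for the same iova, really unpin only
when the count drops to zero) can be modeled in userspace. This is only a
minimal sketch under stated assumptions: the names pin_table, pin_iova and
unpin_iova are hypothetical, and a small fixed array stands in for the
rb-tree keyed by iova that the patch uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; the patch tracks entries in an rb-tree. */
#define MAX_TRACKED 16

struct pin_entry {
    uint64_t iova;      /* device address, used as the lookup key */
    uint64_t pfn;       /* host page frame number */
    int      ref_count; /* 0 means the slot is free */
};

struct pin_table {
    struct pin_entry slot[MAX_TRACKED];
    int pages_pinned;   /* number of pages that are really pinned */
};

static struct pin_entry *pin_find(struct pin_table *t, uint64_t iova)
{
    for (size_t i = 0; i < MAX_TRACKED; i++)
        if (t->slot[i].ref_count > 0 && t->slot[i].iova == iova)
            return &t->slot[i];
    return NULL;
}

/* Returns the resulting ref_count, or -1 if the table is full. */
static int pin_iova(struct pin_table *t, uint64_t iova, uint64_t pfn)
{
    struct pin_entry *e = pin_find(t, iova);

    if (e)
        return ++e->ref_count;  /* already pinned: only take a reference */

    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (t->slot[i].ref_count == 0) {
            t->slot[i].iova = iova;
            t->slot[i].pfn = pfn;
            t->slot[i].ref_count = 1;
            t->pages_pinned++;  /* first request: the page is pinned here */
            return 1;
        }
    }
    return -1;
}

/* Returns the remaining ref_count, or -1 if the iova is not tracked. */
static int unpin_iova(struct pin_table *t, uint64_t iova)
{
    struct pin_entry *e = pin_find(t, iova);

    if (!e)
        return -1;
    assert(e->ref_count > 0);
    if (--e->ref_count == 0)
        t->pages_pinned--;      /* last reference gone: really unpin */
    return e->ref_count;
}
```

In this model two pin requests for the same iova leave pages_pinned at 1,
and the page is released only by the second unpin.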

Tested by assigning below combinations of devices to a single VM:
- GPU pass through only
- vGPU device only
- One GPU pass through and one vGPU device
- Linux VM hot plug and unplug vGPU device while GPU pass through device
  exists
- Linux VM hot plug and unplug GPU pass through device while vGPU device
  exists

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: I295d6f0f2e0579b8d9882bfd8fd5a4194b97bd9a
---
 drivers/vfio/vfio_iommu_type1.c | 621 ++--
 1 file changed, 537 insertions(+), 84 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index a0a7484cec64..98191fc590f8 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define DRIVER_VERSION  "0.2"
 #define DRIVER_AUTHOR   "Alex Williamson "
@@ -56,6 +57,7 @@ MODULE_PARM_DESC(disable_hugepages,
 
 struct vfio_iommu {
struct list_head    domain_list;
+   struct vfio_domain  *external_domain; /* domain for external user */
struct mutex        lock;
struct rb_root      dma_list;
bool                v2;
@@ -76,7 +78,9 @@ struct vfio_dma {
unsigned long   vaddr;  /* Process virtual addr */
size_t  size;   /* Map size (bytes) */
int prot;   /* IOMMU_READ/WRITE */
+   bool                iommu_mapped;
struct task_struct  *task;
+   struct rb_root  pfn_list;   /* Ex-user pinned pfn list */
 };
 
 struct vfio_group {
@@ -85,6 +89,21 @@ struct vfio_group {
 };
 
 /*
+ * Guest RAM pinning working set or DMA target
+ */
+struct vfio_pfn {
+   struct rb_node  node;
+   dma_addr_t  iova;   /* Device address */
+   unsigned long   pfn;/* Host pfn */
+   atomic_t        ref_count;
+};
+
+#define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)\
+   (!list_empty(&iommu->domain_list))
+
+static int put_pfn(unsigned long pfn, int prot);
+
+/*
  * This code handles mapping and unmapping of user data buffers
  * into DMA'ble space using the IOMMU
  */
@@ -132,6 +151,97 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, 
struct vfio_dma *old)
rb_erase(&old->node, &iommu->dma_list);
 }
 
+/*
+ * Helper Functions for host iova-pfn list
+ */
+static struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
+{
+   struct vfio_pfn *vpfn;
+   struct rb_node *node = dma->pfn_list.rb_node;
+
+   while (node) {
+   vpfn = rb_entry(node, struct vfio_pfn, node);
+
+   if (iova < vpfn->iova)
+   node = node->rb_left;
+   else if (iova > vpfn->iova)
+   node = node->rb_right;
+   else
+   return vpfn;
+   }
+   return NULL;
+}
+
+static void vfio_link_pfn(struct vfio_dma *dma,
+ struct vfio_pfn *new)
+{
+   struct rb_node **link, *parent = NULL;
+   struct vfio_pfn *vpfn;
+
+   link = &dma->pfn_list.rb_node;
+   while (*link) {
+   parent = *link;
+   vpfn = rb_entry(parent,

[Qemu-devel] [PATCH v14 16/22] vfio_pci: Updated to use vfio_set_irqs_validate_and_prepare()

2016-11-16 Thread Kirti Wankhede

Updated vfio_pci.c file to use vfio_set_irqs_validate_and_prepare()

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: I9f3daba89d8dba5cb5b01a8cff420412f30686c7
---
 drivers/vfio/pci/vfio_pci.c | 34 +++---
 1 file changed, 7 insertions(+), 27 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 03b5434f4d5b..dcd7c2a99618 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -818,45 +818,25 @@ static long vfio_pci_ioctl(void *device_data,
 
} else if (cmd == VFIO_DEVICE_SET_IRQS) {
struct vfio_irq_set hdr;
-   size_t size;
u8 *data = NULL;
int max, ret = 0;
+   size_t data_size = 0;
 
minsz = offsetofend(struct vfio_irq_set, count);
 
if (copy_from_user(&hdr, (void __user *)arg, minsz))
return -EFAULT;
 
-   if (hdr.argsz < minsz || hdr.index >= VFIO_PCI_NUM_IRQS ||
-   hdr.count >= (U32_MAX - hdr.start) ||
-   hdr.flags & ~(VFIO_IRQ_SET_DATA_TYPE_MASK |
- VFIO_IRQ_SET_ACTION_TYPE_MASK))
-   return -EINVAL;
-
max = vfio_pci_get_irq_count(vdev, hdr.index);
-   if (hdr.start >= max || hdr.start + hdr.count > max)
-   return -EINVAL;
 
-   switch (hdr.flags & VFIO_IRQ_SET_DATA_TYPE_MASK) {
-   case VFIO_IRQ_SET_DATA_NONE:
-   size = 0;
-   break;
-   case VFIO_IRQ_SET_DATA_BOOL:
-   size = sizeof(uint8_t);
-   break;
-   case VFIO_IRQ_SET_DATA_EVENTFD:
-   size = sizeof(int32_t);
-   break;
-   default:
-   return -EINVAL;
-   }
-
-   if (size) {
-   if (hdr.argsz - minsz < hdr.count * size)
-   return -EINVAL;
+   ret = vfio_set_irqs_validate_and_prepare(&hdr, max,
+VFIO_PCI_NUM_IRQS, &data_size);
+   if (ret)
+   return ret;
 
+   if (data_size) {
data = memdup_user((void __user *)(arg + minsz),
-  hdr.count * size);
+   data_size);
if (IS_ERR(data))
return PTR_ERR(data);
}
-- 
2.7.0
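
For readers following along, the header checks that this patch removes
from vfio_pci_ioctl() can be modeled in userspace roughly as below. This is
a sketch under stated assumptions, not the body of
vfio_set_irqs_validate_and_prepare(): the struct, the IRQ_SET_* constants
and the function name irq_set_validate() are local stand-ins chosen for
illustration (real code would use the definitions from <linux/vfio.h>), and
minsz is taken to be the size of the five-field header.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the uapi flag bits (assumption for this sketch). */
#define IRQ_SET_DATA_NONE        (1u << 0)
#define IRQ_SET_DATA_BOOL        (1u << 1)
#define IRQ_SET_DATA_EVENTFD     (1u << 2)
#define IRQ_SET_DATA_TYPE_MASK   (IRQ_SET_DATA_NONE | IRQ_SET_DATA_BOOL | \
                                  IRQ_SET_DATA_EVENTFD)
#define IRQ_SET_ACTION_TYPE_MASK (7u << 3)

/* Hypothetical mirror of the fixed part of struct vfio_irq_set. */
struct irq_set_hdr {
    uint32_t argsz;
    uint32_t flags;
    uint32_t index;
    uint32_t start;
    uint32_t count;
};

/*
 * Validate the header and report how many payload bytes follow it.
 * Returns 0 on success or -EINVAL, as the removed inline checks did.
 */
static int irq_set_validate(const struct irq_set_hdr *hdr, uint32_t num_irqs,
                            uint32_t max_irq_type, size_t *data_size)
{
    const size_t minsz = sizeof(*hdr);
    size_t size;

    if (hdr->argsz < minsz || hdr->index >= max_irq_type ||
        hdr->count >= (UINT32_MAX - hdr->start) ||
        (hdr->flags & ~(IRQ_SET_DATA_TYPE_MASK | IRQ_SET_ACTION_TYPE_MASK)))
        return -EINVAL;

    /* The requested range must fit inside the IRQs of this index. */
    if (hdr->start >= num_irqs || hdr->start + hdr->count > num_irqs)
        return -EINVAL;

    switch (hdr->flags & IRQ_SET_DATA_TYPE_MASK) {
    case IRQ_SET_DATA_NONE:
        size = 0;
        break;
    case IRQ_SET_DATA_BOOL:
        size = sizeof(uint8_t);
        break;
    case IRQ_SET_DATA_EVENTFD:
        size = sizeof(int32_t);
        break;
    default:
        return -EINVAL;  /* zero or several data types requested */
    }

    /* The caller-supplied argsz must cover count elements of payload. */
    if (size && hdr->argsz - minsz < hdr->count * size)
        return -EINVAL;

    *data_size = hdr->count * size;
    return 0;
}
```

The caller then copies data_size bytes from just past the header only when
data_size is non-zero, which is the shape of the new code in the hunk above.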

[Qemu-devel] [PATCH v14 01/22] vfio: Mediated device Core driver

2016-11-16 Thread Kirti Wankhede

Design for Mediated Device Driver:
Main purpose of this driver is to provide a common interface for mediated
device management that can be used by different drivers of different
devices.

This module provides a generic interface to create the device, add it to
mediated bus, add device to IOMMU group and then add it to vfio group.

Below is the high Level block diagram, with Nvidia, Intel and IBM devices
as example, since these are the devices which are going to actively use
this module as of now.

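 +---------------+
 |               |
 | +-----------+ |  mdev_register_driver() +--------------+
 | |           | +<------------------------+ __init()     |
 | |  mdev     | |                         |              |
 | |  bus      | +------------------------>+              |<-> VFIO user
 | |  driver   | | probe()/remove()        | vfio_mdev.ko |    APIs
 | |           | |                         |              |
 | +-----------+ |                         +--------------+
 |               |
 |  MDEV CORE    |
 |   MODULE      |
 |   mdev.ko     |
 | +-----------+ |  mdev_register_device() +--------------+
 | |           | +<------------------------+              |
 | |           | |                         |  nvidia.ko   |<-> physical
 | |           | +------------------------>+              |    device
 | |           | |        callback         +--------------+
 | | Physical  | |
 | |  device   | |  mdev_register_device() +--------------+
 | | interface | |<------------------------+              |
 | |           | |                         |  i915.ko     |<-> physical
 | |           | +------------------------>+              |    device
 | |           | |        callback         +--------------+
 | |           | |
 | |           | |  mdev_register_device() +--------------+
 | |           | +<------------------------+              |
 | |           | |                         | ccw_device.ko|<-> physical
 | |           | +------------------------>+              |    device
 | |           | |        callback         +--------------+
 | +-----------+ |
 +---------------+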

Core driver provides two types of registration interfaces:
1. Registration interface for mediated bus driver:

/**
  * struct mdev_driver - Mediated device's driver
  * @name: driver name
  * @probe: called when new device created
  * @remove:called when device removed
  * @driver:device driver structure
  *
  **/
struct mdev_driver {
 const char *name;
 int  (*probe)  (struct device *dev);
 void (*remove) (struct device *dev);
 struct device_driver driver;
};

Mediated bus driver for mdev device should use this interface to register
and unregister with core driver respectively:

int  mdev_register_driver(struct mdev_driver *drv, struct module *owner);
void mdev_unregister_driver(struct mdev_driver *drv);

Mediated bus driver is responsible to add/delete mediated devices to/from
VFIO group when devices are bound and unbound to the driver.

2. Physical device driver interface
This interface provides the vendor driver with a set of APIs to manage
physical device related work in its driver. The APIs are:

* dev_attr_groups: attributes of the parent device.
* mdev_attr_groups: attributes of the mediated device.
* supported_type_groups: attributes to define supported type. This is
 mandatory field.
* create: to allocate basic resources in vendor driver for a mediated
 device. This is mandatory to be provided by vendor driver.
* remove: to free resources in vendor driver when mediated device is
 destroyed. This is mandatory to be provided by vendor driver.
* open: open callback of mediated device
* release: release callback of mediated device
* read : read emulation callback.
* write: write emulation callback.
* ioctl: ioctl callback.
* mmap: mmap emulation callback.

Drivers should use these interfaces to register and unregister a device
with the mdev core driver respectively:

extern int  mdev_register_device(struct device *dev,
 const struct parent_ops *ops);
extern void mdev_unregister_device(struct device *dev);

There are no locks to serialize above callbacks in mdev driver and
vfio_mdev driver. If required, vendor driver can have locks to serialize
above APIs in their driver.

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Reviewed-by: Jike Song 
Reviewed-by: Dong Jia Shi 

Change-Id: I73a5084574270b14541c529461ea2f03c292d510
---
 drivers/vfio/Kconfig |   1 +
 drivers/vfio/Makefile|   1 +
 drivers/vfio/mdev/Kconfig|   9 +
 drivers/vfio/mdev/Makefile   |   4 +
 drivers/vfio/mdev/mdev_core.c| 374 +++
 drivers/vfio/mdev/mdev_driver.c  | 119 +
 drivers/vfio/mdev/mdev_private.h |  41 +
 drivers/vfio/mdev/mdev_sysfs.c   | 286 ++
 include/linux/mdev.h

Re: [Qemu-devel] [PATCH 3/3] virtio: set ISR on dataplane notifications

2016-11-16 Thread Paolo Bonzini

> > +void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
> > +{
> > +if (!virtio_should_notify(vdev, vq)) {
> > +return;
> > +}
> > +
> > +trace_virtio_notify_irqfd(vdev, vq);
> > +virtio_set_isr(vq->vdev, 0x1);
> 
> So here, I think we need a comment with parts of
> the commit log.
> 
> /*
>  * virtio spec 1.0 says ISR bit 0 should be ignored with MSI, but
>  * windows drivers included in virtio-win 1.8.0 (circa 2015)
>  * for Windows 8.1 only are incorrectly polling this bit during shutdown
 

Not sure it's only for Windows 8.1, in fact probably not.
Looks good if you replace this line with

"are incorrectly polling this bit during crashdump or hibernation"

Paolo

>  * in MSI mode, causing a hang if this bit is never updated.
>  * Next driver release from 2016 fixed this problem, so working around it
>  * is not a must, but it's easy to do so let's do it here.
>  *
>  * Note: it's safe to update ISR from any thread as it was switched
>  * to an atomic operation.
>  */


> 
> 
> 
> > +event_notifier_set(&vq->guest_notifier);
> > +}
> > +
> >  void virtio_notify(VirtIODevice *vdev, VirtQueue *vq)
> >  {
> >  if (!virtio_should_notify(vdev, vq)) {
> > @@ -1990,7 +1994,7 @@ static void
> > virtio_queue_guest_notifier_read(EventNotifier *n)
> >  {
> >  VirtQueue *vq = container_of(n, VirtQueue, guest_notifier);
> >  if (event_notifier_test_and_clear(n)) {
> > -virtio_irq(vq);
> > +virtio_notify_vector(vq->vdev, vq->vector);
> >  }
> >  }
> >  
> > diff --git a/include/hw/virtio/virtio-scsi.h
> > b/include/hw/virtio/virtio-scsi.h
> > index 9fbc7d7..7375196 100644
> > --- a/include/hw/virtio/virtio-scsi.h
> > +++ b/include/hw/virtio/virtio-scsi.h
> > @@ -137,6 +137,5 @@ void virtio_scsi_push_event(VirtIOSCSI *s, SCSIDevice
> > *dev,
> >  void virtio_scsi_dataplane_setup(VirtIOSCSI *s, Error **errp);
> >  int virtio_scsi_dataplane_start(VirtIODevice *s);
> >  void virtio_scsi_dataplane_stop(VirtIODevice *s);
> > -void virtio_scsi_dataplane_notify(VirtIODevice *vdev, VirtIOSCSIReq *req);
> >  
> >  #endif /* QEMU_VIRTIO_SCSI_H */
> > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > index 835b085..ab0e030 100644
> > --- a/include/hw/virtio/virtio.h
> > +++ b/include/hw/virtio/virtio.h
> > @@ -181,6 +181,7 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned
> > int *in_bytes,
> > unsigned max_in_bytes, unsigned
> > max_out_bytes);
> >  
> >  bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq);
> > +void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq);
> >  void virtio_notify(VirtIODevice *vdev, VirtQueue *vq);
> >  
> >  void virtio_save(VirtIODevice *vdev, QEMUFile *f);
> > @@ -280,7 +281,6 @@ void virtio_queue_host_notifier_read(EventNotifier *n);
> >  void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq, AioContext
> >  *ctx,
> >  void (*fn)(VirtIODevice *,
> > VirtQueue *));
> > -void virtio_irq(VirtQueue *vq);
> >  VirtQueue *virtio_vector_first_queue(VirtIODevice *vdev, uint16_t vector);
> >  VirtQueue *virtio_vector_next_queue(VirtQueue *vq);
> >  
> > --
> > 2.9.3
>

[Qemu-devel] [PATCH v14 09/22] vfio iommu type1: Add task structure to vfio_dma

2016-11-16 Thread Kirti Wankhede

Add task structure to vfio_dma structure. Task structure is used for:
- During DMA_UNMAP, only the task that mapped the range, or another task
that shares the same address space, is allowed to unmap it; otherwise the
unmap fails.
QEMU maps a few iova ranges initially, then forks threads, and a child
thread calls DMA_UNMAP on a previously mapped iova. Since the child shares
the same address space, DMA_UNMAP succeeds.
- Avoid accessing struct mm while process is exiting by acquiring
reference of task's mm during page accounting.
- It is also used to get task mlock capability and rlimit for mlock.

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Reviewed-by: Dong Jia Shi 

Change-Id: I7600f1bea6b384fd589fa72421ccf031bcfd9ac5
---
 drivers/vfio/vfio_iommu_type1.c | 137 +---
 1 file changed, 86 insertions(+), 51 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index ffe2026f1341..a0a7484cec64 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define DRIVER_VERSION  "0.2"
 #define DRIVER_AUTHOR   "Alex Williamson "
@@ -75,6 +76,7 @@ struct vfio_dma {
unsigned long   vaddr;  /* Process virtual addr */
size_t  size;   /* Map size (bytes) */
int prot;   /* IOMMU_READ/WRITE */
+   struct task_struct  *task;
 };
 
 struct vfio_group {
@@ -277,41 +279,47 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned 
long vaddr,
  * the iommu can only map chunks of consecutive pfns anyway, so get the
  * first page and all consecutive pages with the same locking.
  */
-static long vfio_pin_pages_remote(unsigned long vaddr, long npage,
- int prot, unsigned long *pfn_base)
+static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
+ long npage, int prot, unsigned long *pfn_base)
 {
-   unsigned long limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-   bool lock_cap = capable(CAP_IPC_LOCK);
+   unsigned long limit;
+   bool lock_cap = ns_capable(task_active_pid_ns(dma->task)->user_ns,
+  CAP_IPC_LOCK);
+   struct mm_struct *mm;
long ret, i;
bool rsvd;
 
-   if (!current->mm)
+   mm = get_task_mm(dma->task);
+   if (!mm)
return -ENODEV;
 
-   ret = vaddr_get_pfn(current->mm, vaddr, prot, pfn_base);
+   ret = vaddr_get_pfn(mm, vaddr, prot, pfn_base);
if (ret)
-   return ret;
+   goto pin_pg_remote_exit;
 
rsvd = is_invalid_reserved_pfn(*pfn_base);
+   limit = task_rlimit(dma->task, RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 
-   if (!rsvd && !lock_cap && current->mm->locked_vm + 1 > limit) {
+   if (!rsvd && !lock_cap && mm->locked_vm + 1 > limit) {
put_pfn(*pfn_base, prot);
pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n", __func__,
limit << PAGE_SHIFT);
-   return -ENOMEM;
+   ret = -ENOMEM;
+   goto pin_pg_remote_exit;
}
 
if (unlikely(disable_hugepages)) {
if (!rsvd)
-   vfio_lock_acct(current, 1);
-   return 1;
+   vfio_lock_acct(dma->task, 1);
+   ret = 1;
+   goto pin_pg_remote_exit;
}
 
/* Lock all the consecutive pages from pfn_base */
for (i = 1, vaddr += PAGE_SIZE; i < npage; i++, vaddr += PAGE_SIZE) {
unsigned long pfn = 0;
 
-   ret = vaddr_get_pfn(current->mm, vaddr, prot, &pfn);
+   ret = vaddr_get_pfn(mm, vaddr, prot, &pfn);
if (ret)
break;
 
@@ -321,8 +329,7 @@ static long vfio_pin_pages_remote(unsigned long vaddr, long 
npage,
break;
}
 
-   if (!rsvd && !lock_cap &&
-   current->mm->locked_vm + i + 1 > limit) {
+   if (!rsvd && !lock_cap && mm->locked_vm + i + 1 > limit) {
put_pfn(pfn, prot);
pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n",
__func__, limit << PAGE_SHIFT);
@@ -331,13 +338,16 @@ static long vfio_pin_pages_remote(unsigned long vaddr, 
long npage,
}
 
if (!rsvd)
-   vfio_lock_acct(current, i);
+   vfio_lock_acct(dma->task, i);
+   ret = i;
 
-   return i;
+pin_pg_remote_exit:
+   mmput(mm);
+   return ret;
 }
 
-static long vfio_unpin_pages_remote(unsigned long pfn, long npage,
-   int prot, bool do_accounting)
+static long vfio_unpin_pages_remote(struct vfio_dma *dma, unsigned long pfn,
+

[Qemu-devel] [PATCH v14 14/22] vfio_pci: Update vfio_pci to use vfio_info_add_capability()

2016-11-16 Thread Kirti Wankhede

Update msix_sparse_mmap_cap() to use vfio_info_add_capability()
Update region type capability to use vfio_info_add_capability()

Signed-off-by: Kirti Wankhede 
Signed-off-by: Neo Jia 
Change-Id: I52bb28c7875a6da5a79ddad1843e6088aff58a45
---
 drivers/vfio/pci/vfio_pci.c | 49 ++---
 1 file changed, 19 insertions(+), 30 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 031bc08d000d..03b5434f4d5b 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -558,10 +558,9 @@ static int vfio_pci_for_each_slot_or_bus(struct pci_dev 
*pdev,
 static int msix_sparse_mmap_cap(struct vfio_pci_device *vdev,
struct vfio_info_cap *caps)
 {
-   struct vfio_info_cap_header *header;
struct vfio_region_info_cap_sparse_mmap *sparse;
size_t end, size;
-   int nr_areas = 2, i = 0;
+   int nr_areas = 2, i = 0, ret;
 
end = pci_resource_len(vdev->pdev, vdev->msix_bar);
 
@@ -572,13 +571,10 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device 
*vdev,
 
size = sizeof(*sparse) + (nr_areas * sizeof(*sparse->areas));
 
-   header = vfio_info_cap_add(caps, size,
-  VFIO_REGION_INFO_CAP_SPARSE_MMAP, 1);
-   if (IS_ERR(header))
-   return PTR_ERR(header);
+   sparse = kzalloc(size, GFP_KERNEL);
+   if (!sparse)
+   return -ENOMEM;
 
-   sparse = container_of(header,
- struct vfio_region_info_cap_sparse_mmap, header);
sparse->nr_areas = nr_areas;
 
if (vdev->msix_offset & PAGE_MASK) {
@@ -594,26 +590,11 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device 
*vdev,
i++;
}
 
-   return 0;
-}
-
-static int region_type_cap(struct vfio_pci_device *vdev,
-  struct vfio_info_cap *caps,
-  unsigned int type, unsigned int subtype)
-{
-   struct vfio_info_cap_header *header;
-   struct vfio_region_info_cap_type *cap;
-
-   header = vfio_info_cap_add(caps, sizeof(*cap),
-  VFIO_REGION_INFO_CAP_TYPE, 1);
-   if (IS_ERR(header))
-   return PTR_ERR(header);
-
-   cap = container_of(header, struct vfio_region_info_cap_type, header);
-   cap->type = type;
-   cap->subtype = subtype;
+   ret = vfio_info_add_capability(caps, VFIO_REGION_INFO_CAP_SPARSE_MMAP,
+  sparse);
+   kfree(sparse);
 
-   return 0;
+   return ret;
 }
 
 int vfio_pci_register_dev_region(struct vfio_pci_device *vdev,
@@ -752,6 +733,9 @@ static long vfio_pci_ioctl(void *device_data,
 
break;
default:
+   {
+   struct vfio_region_info_cap_type cap_type;
+
if (info.index >=
VFIO_PCI_NUM_REGIONS + vdev->num_regions)
return -EINVAL;
@@ -762,11 +746,16 @@ static long vfio_pci_ioctl(void *device_data,
info.size = vdev->region[i].size;
info.flags = vdev->region[i].flags;
 
-   ret = region_type_cap(vdev, &caps,
- vdev->region[i].type,
- vdev->region[i].subtype);
+   cap_type.type = vdev->region[i].type;
+   cap_type.subtype = vdev->region[i].subtype;
+
+   ret = vfio_info_add_capability(&caps,
+ VFIO_REGION_INFO_CAP_TYPE,
+ &cap_type);
if (ret)
return ret;
+
+   }
}
 
if (caps.size) {
-- 
2.7.0

Re: [Qemu-devel] [PATCH v2] hw/isa/lpc_ich9: inject SMI on all VCPUs if APM_STS == 'Q'

2016-11-16 Thread Michael S. Tsirkin

On Wed, Nov 16, 2016 at 07:03:27PM +0100, Laszlo Ersek wrote:
> On 11/16/16 15:05, Paolo Bonzini wrote:
> > 
> > 
> > On 16/11/2016 14:18, Michael S. Tsirkin wrote:
> >>> - we could have another magic 0xB2 value, which is implemented directly
> >>> in QEMU and sets 0xB3 to a magic value.  Then OVMF can invoke it
> >>> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)
> >>> to detect the new feature.  It can fail to start if using traditional
> >>> AP and the new feature is not there.
> >>
> >> If we keep collecting these magic values, should architect it
> >> and do a host/guest bitmap like virtio does?
> > 
> > The value written in 0xB3 can certainly be a feature bitmap.  For now we
> > would have for example
> > 
> > bit 0   if set, writing 0x10-0xFF to 0xB2 results in a broadcast SMI
> > bit 1-7 zero
> 
> Doable, but:
> - doesn't address how OVMF learns about the broadcast SMI availability,
> - the command value OVMF currently writes is 0.
> 
> How about this:
> - etc/smi/features is the LE uint64_t bitmap proposed earlier, bit#0
> stands for broadcast SMI availability
> - 0xB2 is the command value (independent of 0xB3)
> - 0XB3 is a guest feature bitmap (valid for the next request). SeaBIOS
> reserves bit#0 already (uses values 0 and 1), so we can use the
> remaining 7 bits for requesting features. Bit#1 (value 2) could be the
> broadcast SMI.
> 
> This does resemble a kind of feature negotiation, except the host cannot
> signal back an error (unsupported combination of features), like
> virtio-1.0 can. We can make QEMU abort in that case, or ignore the flags.
> 
> Thanks
> Laszlo

I think that if you are going to do it, do it like 1.0:
- same bitmap for host and guest. how about a writeable fw cfg file?
- use 0XB3 bit for FEATURES_OK

-- 
MST
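
The virtio-1.0-style handshake suggested above (one feature bitmap offered
by the host, the guest writes back the subset it wants, and a FEATURES_OK
style bit reports whether that subset was accepted) could look roughly like
the sketch below. Everything here is hypothetical and for illustration
only: the struct, the function names and the choice of bit 0 for broadcast
SMI follow the discussion in this thread, not any existing QEMU or OVMF
interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 stands for broadcast SMI, as proposed in the thread (assumption). */
#define SMI_FEAT_BROADCAST (UINT64_C(1) << 0)

struct smi_negotiation {
    uint64_t host_features;   /* what the host offers */
    uint64_t guest_features;  /* what the guest last requested */
    bool     features_ok;     /* set only if the request was acceptable */
};

/*
 * The guest writes its requested feature set. The request is accepted
 * only if it is a subset of what the host offers; otherwise features_ok
 * stays clear and the guest can detect the failure by reading it back.
 */
static bool smi_negotiate(struct smi_negotiation *n, uint64_t requested)
{
    n->guest_features = requested;
    n->features_ok = (requested & ~n->host_features) == 0;
    return n->features_ok;
}

/* A feature takes effect only after a successful negotiation. */
static bool smi_broadcast_enabled(const struct smi_negotiation *n)
{
    return n->features_ok && (n->guest_features & SMI_FEAT_BROADCAST);
}
```

Unlike the one-way 0xB3 flag scheme, this gives the guest a way to learn
that an unsupported combination was rejected.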

Re: [Qemu-devel] dpdk/vpp and cross-version migration for vhost

2016-11-16 Thread Maxime Coquelin


Hi Michael,

On 10/13/2016 07:50 PM, Michael S. Tsirkin wrote:

Hi!
So it looks like we face a problem with cross-version
migration when using vhost. It's not new but became more
acute with the advent of vhost user.

For users to be able to migrate between different versions
of the hypervisor the interface exposed to guests
by hypervisor must stay unchanged.

The problem is that a qemu device is connected
to a backend in another process, so the interface
exposed to guests depends on the capabilities of that
process.

Specifically, for vhost user interface based on virtio, this includes
the "host features" bitmap that defines the interface, as well as more
host values such as the max ring size.  Adding new features/changing
values to this interface is required to make progress, but on the other
hand we need ability to get the old host features to be compatible.

To solve this problem within qemu, qemu has a versioning system based on
a machine type concept which fundamentally is a version string, by
specifying that string one can get hardware compatible with a previous
qemu version. QEMU also reports the latest version and list of versions
supported so libvirt records the version at VM creation and then is
careful to use this machine version whenever it migrates a VM.

One might wonder how is this solved with a kernel vhost backend. The
answer is that it mostly isn't - instead an assumption is made, that
qemu versions are deployed together with the kernel - this is generally
true for downstreams.  Thus whenever qemu gains a new feature, it is
already supported by the kernel as well.  However, if one attempts
migration with a new qemu from a system with a new kernel to one with an
old kernel, one would get a failure.

In the world where we have multiple userspace backends, with some of
these supplied by ISVs, this seems non-realistic.

IMO we need to support vhost backend versioning, ideally
in a way that will also work for vhost kernel backends.

So I'd like to get some input from both backend and management
developers on what a good solution would look like.

If we want to emulate the qemu solution, this involves adding the
concept of interface versions to dpdk.  For example, dpdk could supply a
file (or utility printing?) with list of versions: latest and versions
supported. libvirt could read that and


So if I understand correctly, it would be generated at build time?
One problem I see is that the DPDK's vhost-user lib API provides a way
to disable features:
"
rte_vhost_feature_disable/rte_vhost_feature_enable(feature_mask)

This function disables/enables some features. For example, it can be 
used to disable mergeable buffers and TSO features, which both are 
enabled by default.

"

I think we should not have this capability on the host side; it should be
the guest's decision whether or not to use a feature, and if it has to be
done on the host, QEMU already provides a way to disable features (moreover
per-device, which is not the case with rte_vhost_feature_disable).
IMHO, we should consider deprecating this API in v17.02.

That said, the API is here, and it would break migration if the version
file advertises some features the vSwitch has disabled at runtime.


- store latest version at vm creation
- pass it around with the vm
- pass it to qemu
From here, qemu could pass this over the vhost-user channel,
thus making sure it's initialized with the correct
compatible interface.


Using vhost-user protocol features I guess?


As version here is an opaque string for libvirt and qemu,
anything can be used - but I suggest either a list
of values defining the interface, e.g.
any_layout=on,max_ring=256
or a version including the name and vendor of the backend,
e.g. "org.dpdk.v4.5.6".
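
The first form, a list of values, is straightforward for a backend to
consume. The sketch below looks up one key in such a string. It assumes a
plain comma-separated "key=value" syntax with no quoting or escaping, and
the function name is made up for illustration; no such helper exists in
QEMU, libvirt or DPDK as far as this thread says.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Look up 'key' in a comma-separated "key=value" list and copy its value
 * into 'out'. Returns false if the key is absent or the value does not fit.
 */
static bool iface_lookup(const char *spec, const char *key,
                         char *out, size_t outlen)
{
    assert(spec != NULL && key != NULL && out != NULL);
    size_t klen = strlen(key);
    const char *p = spec;

    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        /* Match "key=" exactly, so "max" does not match "max_ring". */
        if (len > klen && strncmp(p, key, klen) == 0 && p[klen] == '=') {
            size_t vlen = len - klen - 1;
            if (vlen >= outlen) {
                return false;
            }
            memcpy(out, p + klen + 1, vlen);
            out[vlen] = '\0';
            return true;
        }
        p += len;
        if (*p == ',') {
            p++;
        }
    }
    return false;
}
```

Because the string stays opaque to libvirt and qemu, only the backend needs
a parser like this; the management layer just stores and forwards it.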


I think the first option provides more flexibility.
For example, we could imagine migrating from a process using DPDK's
vhost-user lib, to another process using its own implementation (VPP
has its own implementation currently if I'm not mistaken).
Maybe this scenario does not make sense, but in this case, exposing
values directly would avoid the need for synchronization between
vhost-user implementations.



Note that typically the list of supported versions can only be
extended, not shrunk. Also, if the host/guest interface
does not change, don't change the current version as
this just creates work for everyone.

Thoughts? Would this work well for management? dpdk? vpp?


One thing I'm not clear on is how it will work for the MTU feature, if the
process it is migrated to exposes a larger MTU that the guest doesn't
support (if it has sized its receive buffers for the pre-migration MTU, for
example).

Thanks,
Maxime

Re: [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications

2016-11-16 Thread Michael S. Tsirkin

On Wed, Nov 16, 2016 at 01:03:14PM -0700, Alex Williamson wrote:
> On Wed, 16 Nov 2016 21:50:46 +0200
> "Aviv B.D."  wrote:
> 
> > On Wed, Nov 16, 2016 at 5:34 PM, Alex Williamson wrote:
> > 
> > > On Wed, 16 Nov 2016 15:54:56 +0200
> > > "Michael S. Tsirkin"  wrote:
> > >  
> > > > On Thu, Nov 10, 2016 at 12:44:47PM -0700, Alex Williamson wrote:  
> > > > > On Thu, 10 Nov 2016 21:20:36 +0200
> > > > > "Michael S. Tsirkin"  wrote:
> > > > >  
> > > > > > On Thu, Nov 10, 2016 at 09:04:13AM -0700, Alex Williamson wrote:  
> > > > > > > On Thu, 10 Nov 2016 17:54:35 +0200
> > > > > > > "Michael S. Tsirkin"  wrote:
> > > > > > >  
> > > > > > > > On Thu, Nov 10, 2016 at 08:30:21AM -0700, Alex Williamson 
> > > > > > > > wrote:  
> > > > > > > > > On Thu, 10 Nov 2016 17:14:24 +0200
> > > > > > > > > "Michael S. Tsirkin"  wrote:
> > > > > > > > >  
> > > > > > > > > > On Tue, Nov 08, 2016 at 01:04:21PM +0200, Aviv B.D wrote:  
> > > > > > > > > > > From: "Aviv Ben-David" 
> > > > > > > > > > >
> > > > > > > > > > > > * Advertize Cache Mode capability in iommu cap register.
> > > > > > > > > > > >   This capability is controlled by "cache-mode" property
> > > > > > > > > > > >   of intel-iommu device.
> > > > > > > > > > > >   To enable this option call QEMU with
> > > > > > > > > > > >   "-device intel-iommu,cache-mode=true".
> > > > > > > > > > > >
> > > > > > > > > > > > * On page cache invalidation in intel vIOMMU, check if the
> > > > > > > > > > > >   domain belong to registered notifier, and notify accordingly.
> > > > > > > > > >
> > > > > > > > > > This looks sane I think. Alex, care to comment?
> > > > > > > > > > Merging will have to wait until after the release.
> > > > > > > > > > Pls remember to re-test and re-ping then.  
> > > > > > > > >
> > > > > > > > > > I don't think it's suitable for upstream until there's a
> > > > > > > > > > reasonable replay mechanism
> > > > > > > >
> > > > > > > > Could you pls clarify what do you mean by replay?
> > > > > > > > Is this when you attach a device by hotplug to
> > > > > > > > a running system?
> > > > > > > >
> > > > > > > > > If yes this can maybe be addressed by disabling hotplug
> > > > > > > > > temporarily.
> > > > > > > >
> > > > > > > > No, hotplug is not required, moving a device between existing
> > > > > > > > domains requires replay, ie. actually using it for nested device
> > > > > > > > assignment.
> > > > > >
> > > > > > Good point, that one is a correctness thing. Aviv,
> > > > > > could you add this in TODO list in a cover letter pls?
> > > > > >  
> > > > > > > > > > and we straighten out whether it's expected to get
> > > > > > > > > > multiple notifies and the notif-ee is responsible for
> > > > > > > > > > filtering them or if the notif-er should do filtering.
> > > > > > > >
> > > > > > > > OK this is a documentation thing.  
> > > > > > >
> > > > > > > Well no, it needs to be decided and if necessary implemented.  
> > > > > >
> > > > > > Let's assume it's the notif-ee for now. Less is more and all that.  
> > > > >
> > > > > I think this is opposite of the approach dwg suggested.
> > > > >  
> > > > > > > > >  Without those, this is
> > > > > > > > > effectively just an RFC.  
> > > > > > > >
> > > > > > > > It's infrastructure without users so it doesn't break things,
> > > > > > > > I'm more interested in seeing whether it's broken in
> > > > > > > > some way than whether it's complete.  
> > > > > > >
> > > > > > > > If it allows use with vfio but doesn't fully implement the
> > > > > > > > complete set of interfaces, it does break things.  We currently
> > > > > > > > prevent viommu usage with vfio because it is incomplete.
> > > > > >
> > > > > > Right - that bit is still in as far as I can see.  
> > > > >
> > > > > > Nope, 3/3 changes vtd_iommu_notify_flag_changed() to allow use with
> > > > > > vfio even though it's still incomplete.  We would at least need
> > > > > > something like a replay callback for VT-d that triggers an abort if
> > > > > > you still want to accept it incomplete.  Thanks,
> > > > >
> > > > > Alex  
> > > >
> > > > IIUC practically things seems to work, right?  
> > >
> > > AFAIK, no.
> > >  
> > > > So how about disabling by default with a flag for people that want to
> > > > experiment with it?
> > > > E.g. x-vfio-allow-broken-translations ?  
> > >
> > > We've already been through one round of "intel-iommu is incomplete for
> > > use with device assignment, how can we prevent it from being used",
> > > which led to the notify_flag_changed callback on MemoryRegionIOMMUOps.
> > > This series now claims to fix that yet still doesn't provide a
> > > mechanism to do memory_region_iommu_replay() given that VT-d has a much
> > > larger address width.  Why is the onus on vfio to resolve this or
> > > provide some sort of

Re: [Qemu-devel] [PATCH for-2.8 0/3] virtio fixes

2016-11-16 Thread Michael S. Tsirkin

On Wed, Nov 16, 2016 at 03:03:13PM -0500, Farhan Ali wrote:
> Hi Paolo,
> 
> I was able to test your patches in our s390 environment. I don't see the
> qemu crashes anymore which I noticed before.
> 
> Testing a guest running high stress I/O workload, without iothreads does
> show a delay in guest response time.

Compared to which version?

> But running
> 
> the same test with iothreads seems to solve the issue.
> 
> Tested-by : Farhan Ali 

Could you also test just patches 1 and 2 pls?

> 
> Thank you
> 
> Farhan
> 
> 
> On 11/16/2016 02:50 PM, Christian Borntraeger wrote:
> > On 11/15/2016 02:46 PM, Paolo Bonzini wrote:
> > > Patch 1 fixes vhost, patches 2-3 fix Windows hibernation.
> > > 
> > > Paolo
> > > 
> > > Paolo Bonzini (3):
> > >virtio: introduce grab/release_ioeventfd to fix vhost
> > >virtio: access ISR atomically
> > >virtio: set ISR on dataplane notifications
> > > 
> > >   hw/block/dataplane/virtio-blk.c |  4 +--
> > >   hw/scsi/virtio-scsi-dataplane.c |  7 --
> > >   hw/scsi/virtio-scsi.c   |  2 +-
> > >   hw/virtio/trace-events  |  2 +-
> > >   hw/virtio/vhost.c   | 11 +++--
> > >   hw/virtio/virtio-bus.c  | 54 
> > > -
> > >   hw/virtio/virtio-mmio.c |  6 ++---
> > >   hw/virtio/virtio-pci.c  |  9 +++
> > >   hw/virtio/virtio.c  | 46 ---
> > >   include/hw/virtio/virtio-bus.h  | 14 +++
> > >   include/hw/virtio/virtio-scsi.h |  1 -
> > >   include/hw/virtio/virtio.h  |  4 ++-
> > >   12 files changed, 110 insertions(+), 50 deletions(-)
> > > 
> > Farhan,
> > 
> > it was this mail thread.

Re: [Qemu-devel] [PATCH v2] hw/isa/lpc_ich9: inject SMI on all VCPUs if APM_STS == 'Q'

2016-11-16 Thread Michael S. Tsirkin

On Wed, Nov 16, 2016 at 06:37:30PM +0100, Laszlo Ersek wrote:
> On 11/16/16 13:47, Paolo Bonzini wrote:
> > 
> >> If the consensus is that the patch is a QEMU bugfix (as opposed to a
> >> feature) and that it is eligible for the currently supported upstream
> >> stable branches, that's the best, no doubt.
> > 
> > > The only currently supported upstream stable branch is 2.7. :)
> > 
> > I'm okay with bending the rules and including it in 2.8, but it's
> > worrisome that you also needed to go back from relaxed to traditional
> > delivery, meaning that old QEMU + new OVMF will take ages to boot.
> > 
> > If this is the case, I still think this needs some kind of discovery
> > mechanism, unless OVMF can just say "things were too broken, stop
> > supporting SMM on QEMUs older than 2.8".
> > 
> > For example:
> > 
> > - OVMF should keep on using 0x00 (no broadcast) if the relaxed AP
> > setting is used for the PCD; this would be backwards compatibility mode.
> 
> Okay, but this still means that the PCD has to become dynamic, and we
> must set the PCD earlier (likely in PlatformPei) based on something.
> 
> I guess that's what the next paragraph is about:
> 
> > - we could have another magic 0xB2 value, which is implemented directly
> > in QEMU and sets 0xB3 to a magic value.  Then OVMF can invoke it
> > after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)
> > to detect the new feature.  It can fail to start if using traditional
> > AP and the new feature is not there.
> 
> Please explain in more detail. If I write to 0xB2 (by invoking the
> Trigger() method or somehow else), then on old QEMU's that will raise a
> sync / unicast SMI. The SMI handler in edk2 will run, but no request
> parameters will have been set up by OVMF, so the SMI handler will do...
> no clue what. I don't think this is a good idea.
> 
> My preference is fw_cfg ATM. It provides a proven, flexible and
> extensible interface (it's easy to add new files for future features).
> If we expect more knobs in the area, I can modify my proposal to use
> "etc/smi/broadcast", so we can add "etc/smi/" later.
> 
> Do you have any specific arguments against fw_cfg? As I suggested in my
> previous email, with fw_cfg I can implement the change in OVMF such that
> the default behavior wouldn't change -- the default delivery would
> remain relaxed, and the broadcast wouldn't be requested, unless the
> fw_cfg file told OVMF otherwise.

The only thing is, I think it's a good idea to be able to build OVMF
without legacy QEMU support in the future. E.g. there are
people who want to speed up boot.
Add some ifdefs in the code for that?
And add comments to document which versions need these hacks.



> > By the way, in case OVMF needs to use SmmSwDispatch in the future, I
> > would make QEMU use broadcast behavior for all values in the 0x10-0xff
> > range, or something like that.
> 
> Are we talking control/command (0xB2) or scratch/data (0xB3) register
> values? My patches currently use the scratch/data register to provide
> the hint to QEMU; that register is less likely to interfere with
> anything the SMM core in edk2 does. I seem to recall that SmmSwDispatch
> uses command/control values to distinguish the called functions. Should
> we keep the broadcast / unicast decision separate from the
> control/command value ?
> 
> Thanks
> Laszlo
> 
> > 
> > Paolo
> > 
> >> For reference, the OVMF documentation recommends QEMU 2.5+ for SMM. The
> >> SMM enablement in libvirt enforces QEMU 2.4+. (Libvirt is actually
> >> correct; when I was writing the OVMF docs, I must have misunderstood the
> >> requirements and needlessly required 2.5+; 2.4+ should have been fine.)
> >>
> >> Which means the fix should be backported as far as stable-2.4.
> >>
> >> Should we proceed with that? CC'ing Mike Roth and the stable list.
> >>
> >> Thanks!
> >> Laszlo
> >>
> >>>
> >>>
> >
> > Paolo
> >
> >> ---
> >>  hw/isa/lpc_ich9.c | 12 +++-
> >>  1 file changed, 11 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> >> index 10d1ee8b9310..f2fe644fdaa4 100644
> >> --- a/hw/isa/lpc_ich9.c
> >> +++ b/hw/isa/lpc_ich9.c
> >> @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool
> >> smm_enabled)
> >>  
> >>  /* APM */
> >>  
> >> +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q'
> >> +
> >>  static void ich9_apm_ctrl_changed(uint32_t val, void *arg)
> >>  {
> >>  ICH9LPCState *lpc = arg;
> >> @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val,
> >> void *arg)
> >>  
> >>  /* SMI_EN = PMBASE + 30. SMI control and enable register */
> >>  if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) {
> >> -cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);
> >> +if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) {
> >> +CPUState *cs;
> >> +
> >> +

Re: [Qemu-devel] [PATCH 3/3] virtio: set ISR on dataplane notifications

2016-11-16 Thread Michael S. Tsirkin

On Wed, Nov 16, 2016 at 07:05:51PM +0100, Paolo Bonzini wrote:
> Dataplane has been omitting forever the step of setting ISR when
> an interrupt is raised.  This caused little breakage, because the
> specification actually says that ISR may not be updated in MSI mode.
> 
> Some versions of the Windows drivers however didn't clear MSI mode
> correctly, and proceeded using polling mode (using ISR, not the used
> ring index!) for crashdump and hibernation.  If it were just crashdump
> and hibernation it would not be a big deal, but recent releases of
> Windows do not really shut down, but rather log out and hibernate to
> make the next startup faster.  Hence, this manifested as a more serious
> hang during shutdown with e.g. Windows 8.1 and virtio-win 1.8.0 RPMs.
> Newer versions fixed this, while older versions do not use MSI at all.
> 
> The failure has always been there for virtio dataplane, but it became
> visible after commits 9ffe337 ("virtio-blk: always use dataplane path
> if ioeventfd is active", 2016-10-30) and ad07cd6 ("virtio-scsi: always
> use dataplane path if ioeventfd is active", 2016-10-30) made virtio-blk
> and virtio-scsi always use the dataplane code under KVM.  The good news
> therefore is that it was not a bug in the patches---they were doing
> exactly what they were meant for, i.e. shake out remaining dataplane bugs.
> 
> The fix is not hard, so it's worth arranging for the broken drivers.
> The virtio_should_notify+event_notifier_set pair that is common to
> virtio-blk and virtio-scsi dataplane is replaced with a new public
> function virtio_notify_irqfd that also sets ISR.  The irqfd emulation
> code now need not set ISR anymore, so virtio_irq is removed.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  hw/block/dataplane/virtio-blk.c |  4 +---
>  hw/scsi/virtio-scsi-dataplane.c |  7 ---
>  hw/scsi/virtio-scsi.c   |  2 +-
>  hw/virtio/trace-events  |  2 +-
>  hw/virtio/virtio.c  | 20 
>  include/hw/virtio/virtio-scsi.h |  1 -
>  include/hw/virtio/virtio.h  |  2 +-
>  7 files changed, 16 insertions(+), 22 deletions(-)
> 
> diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
> index 90ef557..d1f9f63 100644
> --- a/hw/block/dataplane/virtio-blk.c
> +++ b/hw/block/dataplane/virtio-blk.c
> @@ -68,9 +68,7 @@ static void notify_guest_bh(void *opaque)
>  unsigned i = j + ctzl(bits);
>  VirtQueue *vq = virtio_get_queue(s->vdev, i);
>  
> -if (virtio_should_notify(s->vdev, vq)) {
> -event_notifier_set(virtio_queue_get_guest_notifier(vq));
> -}
> +virtio_notify_irqfd(s->vdev, vq);
>  
>  bits &= bits - 1; /* clear right-most bit */
>  }
> diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
> index f2ea29d..6b8d0f0 100644
> --- a/hw/scsi/virtio-scsi-dataplane.c
> +++ b/hw/scsi/virtio-scsi-dataplane.c
> @@ -95,13 +95,6 @@ static int virtio_scsi_vring_init(VirtIOSCSI *s, VirtQueue 
> *vq, int n,
>  return 0;
>  }
>  
> -void virtio_scsi_dataplane_notify(VirtIODevice *vdev, VirtIOSCSIReq *req)
> -{
> -if (virtio_should_notify(vdev, req->vq)) {
> -event_notifier_set(virtio_queue_get_guest_notifier(req->vq));
> -}
> -}
> -
>  /* assumes s->ctx held */
>  static void virtio_scsi_clear_aio(VirtIOSCSI *s)
>  {
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 3e5ae6a..10fd687 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -69,7 +69,7 @@ static void virtio_scsi_complete_req(VirtIOSCSIReq *req)
>  qemu_iovec_from_buf(&req->resp_iov, 0, &req->resp, req->resp_size);
>  virtqueue_push(vq, &req->elem, req->qsgl.size + req->resp_iov.size);
>  if (s->dataplane_started && !s->dataplane_fenced) {
> -virtio_scsi_dataplane_notify(vdev, req);
> +virtio_notify_irqfd(vdev, vq);
>  } else {
>  virtio_notify(vdev, vq);
>  }
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index 8756cef..7b6f55e 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -5,7 +5,7 @@ virtqueue_fill(void *vq, const void *elem, unsigned int len, 
> unsigned int idx) "
>  virtqueue_flush(void *vq, unsigned int count) "vq %p count %u"
>  virtqueue_pop(void *vq, void *elem, unsigned int in_num, unsigned int 
> out_num) "vq %p elem %p in_num %u out_num %u"
>  virtio_queue_notify(void *vdev, int n, void *vq) "vdev %p n %d vq %p"
> -virtio_irq(void *vq) "vq %p"
> +virtio_notify_irqfd(void *vdev, void *vq) "vdev %p vq %p"
>  virtio_notify(void *vdev, void *vq) "vdev %p vq %p"
>  virtio_set_status(void *vdev, uint8_t val) "vdev %p val %u"
>  
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index ecf13bd..860ebdb 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -1326,13 +1326,6 @@ static void virtio_set_isr(VirtIODevice *vdev, int 
> value)
>  }
>  }
>  
> -void

[Qemu-devel] [PATCH 4/4] target-ppc: Implement bcdsetsgn. instruction

2016-11-16 Thread Jose Ricardo Ziviani

bcdsetsgn.: Decimal set sign. This instruction copies the register
value to the result register but adjusts the sign according to
the preferred sign value.

Signed-off-by: Jose Ricardo Ziviani 
---
 target-ppc/helper.h | 1 +
 target-ppc/int_helper.c | 9 +
 target-ppc/translate/vmx-impl.inc.c | 8 
 3 files changed, 18 insertions(+)

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index dada48e..cddac8e 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -385,6 +385,7 @@ DEF_HELPER_3(bcdctz, i32, avr, avr, i32)
 DEF_HELPER_3(bcdcfsq, i32, avr, avr, i32)
 DEF_HELPER_3(bcdctsq, i32, avr, avr, i32)
 DEF_HELPER_4(bcdcpsgn, i32, avr, avr, avr, i32)
+DEF_HELPER_3(bcdsetsgn, i32, avr, avr, i32)
 
 DEF_HELPER_2(xsadddp, void, env, i32)
 DEF_HELPER_2(xssubdp, void, env, i32)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index a215bfe..38af503 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -2991,6 +2991,15 @@ uint32_t helper_bcdcpsgn(ppc_avr_t *r, ppc_avr_t *a, 
ppc_avr_t *b, uint32_t ps)
 return cr;
 }
 
+uint32_t helper_bcdsetsgn(ppc_avr_t *r, ppc_avr_t *b, uint32_t ps)
+{
+int sgnb = bcd_get_sgn(b);
+ppc_avr_t ret = { .u64 = { 0, 0 } };
+
+bcd_put_digit(&ret, bcd_preferred_sgn(sgnb, ps), 0);
+return helper_bcdcpsgn(r, b, &ret, ps);
+}
+
 void helper_vsbox(ppc_avr_t *r, ppc_avr_t *a)
 {
 int i;
diff --git a/target-ppc/translate/vmx-impl.inc.c 
b/target-ppc/translate/vmx-impl.inc.c
index c14b666..b188e60 100644
--- a/target-ppc/translate/vmx-impl.inc.c
+++ b/target-ppc/translate/vmx-impl.inc.c
@@ -991,6 +991,7 @@ GEN_BCD2(bcdcfz)
 GEN_BCD2(bcdctz)
 GEN_BCD2(bcdcfsq)
 GEN_BCD2(bcdctsq)
+GEN_BCD2(bcdsetsgn)
 GEN_BCD(bcdcpsgn);
 
 static void gen_xpnd04_1(DisasContext *ctx)
@@ -1014,6 +1015,9 @@ static void gen_xpnd04_1(DisasContext *ctx)
 case 7:
 gen_bcdcfn(ctx);
 break;
+case 31:
+gen_bcdsetsgn(ctx);
+break;
 default:
 gen_invalid(ctx);
 break;
@@ -1038,12 +1042,16 @@ static void gen_xpnd04_2(DisasContext *ctx)
 case 7:
 gen_bcdcfn(ctx);
 break;
+case 31:
+gen_bcdsetsgn(ctx);
+break;
 default:
 gen_invalid(ctx);
 break;
 }
 }
 
+
 GEN_VXFORM_DUAL(vsubcuw, PPC_ALTIVEC, PPC_NONE, \
 xpnd04_1, PPC_NONE, PPC2_ISA300)
 GEN_VXFORM_DUAL(vsubsws, PPC_ALTIVEC, PPC_NONE, \
-- 
2.7.4

[Qemu-devel] [PATCH] virtio-crypto: fix virtio_queue_set_notification() race

2016-11-16 Thread Stefan Hajnoczi

We must check for new virtqueue buffers after re-enabling notifications.
This prevents the race condition where the guest added buffers just
after we stopped popping the virtqueue but before we re-enabled
notifications.

I think the virtio-crypto code was based on virtio-net but this crucial
detail was missed.  virtio-net does not have the race condition because
it processes the virtqueue one more time after re-enabling
notifications.

Cc: Gonglei 
Signed-off-by: Stefan Hajnoczi 
---
 hw/virtio/virtio-crypto.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-crypto.c b/hw/virtio/virtio-crypto.c
index 3293843..847dc9d 100644
--- a/hw/virtio/virtio-crypto.c
+++ b/hw/virtio/virtio-crypto.c
@@ -692,8 +692,17 @@ static void virtio_crypto_dataq_bh(void *opaque)
 return;
 }
 
-virtio_crypto_handle_dataq(vdev, q->dataq);
-virtio_queue_set_notification(q->dataq, 1);
+for (;;) {
+virtio_crypto_handle_dataq(vdev, q->dataq);
+virtio_queue_set_notification(q->dataq, 1);
+
+/* Are we done or did the guest add more buffers? */
+if (virtio_queue_empty(q->dataq)) {
+break;
+}
+
+virtio_queue_set_notification(q->dataq, 0);
+}
 }
 
 static void
-- 
2.7.4
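
The re-check loop in the patch above can be modelled in isolation. The
sketch below uses a toy queue rather than the QEMU VirtQueue type, and all
names in it are made up for illustration. The "guest" is assumed to add a
buffer in exactly the window where notifications are still off, which is
the case the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a virtqueue; not the QEMU VirtQueue type. */
struct toy_queue {
    int pending;          /* buffers the guest has made available */
    bool notify_enabled;  /* would the guest kick us for new buffers? */
    int late_adds;        /* buffers the guest adds while notifications are off */
    int processed;        /* buffers handled so far */
};

/* Drain everything that is currently visible. */
static void toy_handle(struct toy_queue *q)
{
    assert(q != NULL && q->pending >= 0);
    q->processed += q->pending;
    q->pending = 0;
}

/*
 * Re-enable notifications. To model the race, the guest slips in one buffer
 * just before notifications come back on, so no kick is generated for it.
 */
static void toy_enable_notification(struct toy_queue *q)
{
    if (q->late_adds > 0) {
        q->late_adds--;
        q->pending++;
    }
    q->notify_enabled = true;
}

/* Same shape as the loop in the patch: drain, re-enable, re-check. */
static void toy_bh(struct toy_queue *q)
{
    for (;;) {
        toy_handle(q);
        toy_enable_notification(q);
        if (q->pending == 0) {
            break;  /* nothing was added in the window; safe to sleep */
        }
        q->notify_enabled = false;
    }
}
```

Without the re-check, the buffer added in the window would sit in the queue
until the guest happened to kick again.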

Re: [Qemu-devel] [PATCH 3/3] virtio: set ISR on dataplane notifications

2016-11-16 Thread Michael S. Tsirkin

On Wed, Nov 16, 2016 at 03:38:11PM -0500, Paolo Bonzini wrote:
> > > +void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
> > > +{
> > > +if (!virtio_should_notify(vdev, vq)) {
> > > +return;
> > > +}
> > > +
> > > +trace_virtio_notify_irqfd(vdev, vq);
> > > +virtio_set_isr(vq->vdev, 0x1);
> > 
> > So here, I think we need a comment with parts of
> > the commit log.
> > 
> > /*
> >  * virtio spec 1.0 says ISR bit 0 should be ignored with MSI, but
> >  * windows drivers included in virtio-win 1.8.0 (circa 2015)
> >  * for Windows 8.1 only are incorrectly polling this bit during shutdown
>  
> 
> Not sure it's only for Windows 8.1, in fact probably not.

8.1 on shutdown and others on crashdump or hibernation?

> Looks good if you replace this line with
> 
> "are incorrectly polling this bit during crashdump or hibernation"
> 
> Paolo
> 
> >  * in MSI mode, causing a hang if this bit is never updated.
> >  * Next driver release from 2016 fixed this problem, so working around it
> >  * is not a must, but it's easy to do so let's do it here.
> >  *
> >  * Note: it's safe to update ISR from any thread as it was switched
> >  * to an atomic operation.
> >  */
> 
> 
> > 
> > 
> > 
> > > > +event_notifier_set(&vq->guest_notifier);
> > > +}
> > > +
> > >  void virtio_notify(VirtIODevice *vdev, VirtQueue *vq)
> > >  {
> > >  if (!virtio_should_notify(vdev, vq)) {
> > > @@ -1990,7 +1994,7 @@ static void
> > > virtio_queue_guest_notifier_read(EventNotifier *n)
> > >  {
> > >  VirtQueue *vq = container_of(n, VirtQueue, guest_notifier);
> > >  if (event_notifier_test_and_clear(n)) {
> > > -virtio_irq(vq);
> > > +virtio_notify_vector(vq->vdev, vq->vector);
> > >  }
> > >  }
> > >  
> > > diff --git a/include/hw/virtio/virtio-scsi.h
> > > b/include/hw/virtio/virtio-scsi.h
> > > index 9fbc7d7..7375196 100644
> > > --- a/include/hw/virtio/virtio-scsi.h
> > > +++ b/include/hw/virtio/virtio-scsi.h
> > > @@ -137,6 +137,5 @@ void virtio_scsi_push_event(VirtIOSCSI *s, SCSIDevice
> > > *dev,
> > >  void virtio_scsi_dataplane_setup(VirtIOSCSI *s, Error **errp);
> > >  int virtio_scsi_dataplane_start(VirtIODevice *s);
> > >  void virtio_scsi_dataplane_stop(VirtIODevice *s);
> > > -void virtio_scsi_dataplane_notify(VirtIODevice *vdev, VirtIOSCSIReq 
> > > *req);
> > >  
> > >  #endif /* QEMU_VIRTIO_SCSI_H */
> > > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > > index 835b085..ab0e030 100644
> > > --- a/include/hw/virtio/virtio.h
> > > +++ b/include/hw/virtio/virtio.h
> > > @@ -181,6 +181,7 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned
> > > int *in_bytes,
> > > unsigned max_in_bytes, unsigned
> > > max_out_bytes);
> > >  
> > >  bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq);
> > > +void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq);
> > >  void virtio_notify(VirtIODevice *vdev, VirtQueue *vq);
> > >  
> > >  void virtio_save(VirtIODevice *vdev, QEMUFile *f);
> > > @@ -280,7 +281,6 @@ void virtio_queue_host_notifier_read(EventNotifier 
> > > *n);
> > >  void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq, AioContext
> > >  *ctx,
> > >  void (*fn)(VirtIODevice 
> > > *,
> > > VirtQueue *));
> > > -void virtio_irq(VirtQueue *vq);
> > >  VirtQueue *virtio_vector_first_queue(VirtIODevice *vdev, uint16_t 
> > > vector);
> > >  VirtQueue *virtio_vector_next_queue(VirtQueue *vq);
> > >  
> > > --
> > > 2.9.3
> >

[Qemu-devel] [patch v3 09/18] tcg/ppc: Implement field extraction opcodes

2016-11-16 Thread Richard Henderson

Reviewed-by: David Gibson 
Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.h |  4 ++--
 tcg/ppc/tcg-target.inc.c | 10 ++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index c765d3e..b42c57a 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -69,7 +69,7 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i32 1
 #define TCG_TARGET_HAS_nor_i32  1
 #define TCG_TARGET_HAS_deposit_i32  1
-#define TCG_TARGET_HAS_extract_i32  0
+#define TCG_TARGET_HAS_extract_i32  1
 #define TCG_TARGET_HAS_sextract_i32 0
 #define TCG_TARGET_HAS_movcond_i32  1
 #define TCG_TARGET_HAS_mulu2_i32    0
@@ -102,7 +102,7 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i64 1
 #define TCG_TARGET_HAS_nor_i64  1
 #define TCG_TARGET_HAS_deposit_i64  1
-#define TCG_TARGET_HAS_extract_i64  0
+#define TCG_TARGET_HAS_extract_i64  1
 #define TCG_TARGET_HAS_sextract_i64 0
 #define TCG_TARGET_HAS_movcond_i64  1
 #define TCG_TARGET_HAS_add2_i64 1
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index a3262cf..7ec54a2 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -2396,6 +2396,14 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, 
const TCGArg *args,
 }
 break;
 
+case INDEX_op_extract_i32:
+tcg_out_rlw(s, RLWINM, args[0], args[1],
+32 - args[2], 32 - args[3], 31);
+break;
+case INDEX_op_extract_i64:
+tcg_out_rld(s, RLDICL, args[0], args[1], 64 - args[2], 64 - args[3]);
+break;
+
 case INDEX_op_movcond_i32:
 tcg_out_movcond(s, TCG_TYPE_I32, args[5], args[0], args[1], args[2],
 args[3], args[4], const_args[2]);
@@ -2530,6 +2538,7 @@ static const TCGTargetOpDef ppc_op_defs[] = {
 { INDEX_op_movcond_i32, { "r", "r", "ri", "rZ", "rZ" } },
 
 { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
+{ INDEX_op_extract_i32, { "r", "r" } },
 
 { INDEX_op_muluh_i32, { "r", "r", "r" } },
 { INDEX_op_mulsh_i32, { "r", "r", "r" } },
@@ -2585,6 +2594,7 @@ static const TCGTargetOpDef ppc_op_defs[] = {
 { INDEX_op_movcond_i64, { "r", "r", "ri", "rZ", "rZ" } },
 
 { INDEX_op_deposit_i64, { "r", "0", "rZ" } },
+{ INDEX_op_extract_i64, { "r", "r" } },
 
 { INDEX_op_mulsh_i64, { "r", "r", "r" } },
 { INDEX_op_muluh_i64, { "r", "r", "r" } },
-- 
2.7.4
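
The mapping of a 32-bit field extract onto rlwinm used in the patch above
(SH = 32 - ofs, MB = 32 - len, ME = 31) can be checked in plain C. The
functions below are a model written for this note, not TCG code; only the
MB <= ME form of the mask is modelled, and the rotate count is reduced
modulo 32 on the assumption that the SH field is five bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Reference semantics of a 32-bit field extract: bits [ofs, ofs + len). */
static uint32_t extract32_ref(uint32_t x, unsigned ofs, unsigned len)
{
    assert(len >= 1 && len <= 32 && ofs <= 32 - len);
    uint32_t mask = (len == 32) ? UINT32_MAX : ((UINT32_C(1) << len) - 1);
    return (x >> ofs) & mask;
}

/* Rotate left, with the count reduced modulo 32. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/*
 * Model of rlwinm rA,rS,SH,MB,ME: rotate left by SH, then keep bits MB..ME
 * in IBM numbering, where bit 0 is the most significant bit.
 */
static uint32_t rlwinm_model(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    assert(mb <= me && me <= 31);
    uint32_t from_mb = UINT32_MAX >> mb;         /* IBM bits mb..31 */
    uint32_t up_to_me = UINT32_MAX << (31 - me); /* IBM bits 0..me  */
    return rotl32(rs, sh) & from_mb & up_to_me;
}

/* Extract expressed with the same operands the patch passes to RLWINM. */
static uint32_t extract32_rlwinm(uint32_t x, unsigned ofs, unsigned len)
{
    return rlwinm_model(x, 32 - ofs, 32 - len, 31);
}
```

Rotating left by 32 - ofs is the same as rotating right by ofs, which
brings the field down to bit 0; the mask then keeps the low len bits.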

[Qemu-devel] [PATCH 2/4] target-ppc: Implement bcdctsq. instruction

2016-11-16 Thread Jose Ricardo Ziviani

bcdctsq.: Decimal convert to signed quadword. It is possible to
convert packed decimal values to signed quadwords.

Signed-off-by: Jose Ricardo Ziviani 
---
 target-ppc/helper.h |  1 +
 target-ppc/int_helper.c | 39 +
 target-ppc/translate/vmx-impl.inc.c |  7 +++
 3 files changed, 47 insertions(+)

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 87f533c..503f257 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -383,6 +383,7 @@ DEF_HELPER_3(bcdctn, i32, avr, avr, i32)
 DEF_HELPER_3(bcdcfz, i32, avr, avr, i32)
 DEF_HELPER_3(bcdctz, i32, avr, avr, i32)
 DEF_HELPER_3(bcdcfsq, i32, avr, avr, i32)
+DEF_HELPER_3(bcdctsq, i32, avr, avr, i32)
 
 DEF_HELPER_2(xsadddp, void, env, i32)
 DEF_HELPER_2(xssubdp, void, env, i32)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index db65a51..1025438 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -2922,6 +2922,45 @@ uint32_t helper_bcdcfsq(ppc_avr_t *r, ppc_avr_t *b, 
uint32_t ps)
 return cr;
 }
 
+uint32_t helper_bcdctsq(ppc_avr_t *r, ppc_avr_t *b, uint32_t ps)
+{
+uint8_t i;
+int cr = 0;
+uint64_t hi = 0;
+int sgnb = bcd_get_sgn(b);
+int invalid = (sgnb == 0);
+ppc_avr_t ret = { .u64 = { 0, 0 } };
+
+ret.u64[LO_IDX] = bcd_get_digit(b, 31, &invalid);
+for (i = 30; i > 0; i--) {
+mulu64(&ret.u64[LO_IDX], &hi,
+ret.u64[LO_IDX], 10ULL);
+
+ret.u64[HI_IDX] = (ret.u64[HI_IDX]) ? ret.u64[HI_IDX] * 10 + hi : hi;
+ret.u64[LO_IDX] += bcd_get_digit(b, i, &invalid);
+
+if (unlikely(invalid)) {
+break;
+}
+}
+
+if (sgnb == -1) {
+if (ret.s64[HI_IDX] > 0) {
+ret.s64[HI_IDX] = -ret.s64[HI_IDX];
+} else {
+ret.s64[LO_IDX] = -ret.s64[LO_IDX];
+}
+}
+
+cr = bcd_cmp_zero(b);
+
+if (unlikely(invalid)) {
+cr = 1 << CRF_SO;
+}
+
+return cr;
+}
+
 void helper_vsbox(ppc_avr_t *r, ppc_avr_t *a)
 {
 int i;
diff --git a/target-ppc/translate/vmx-impl.inc.c 
b/target-ppc/translate/vmx-impl.inc.c
index 36141e5..1579b58 100644
--- a/target-ppc/translate/vmx-impl.inc.c
+++ b/target-ppc/translate/vmx-impl.inc.c
@@ -990,10 +990,14 @@ GEN_BCD2(bcdctn)
 GEN_BCD2(bcdcfz)
 GEN_BCD2(bcdctz)
 GEN_BCD2(bcdcfsq)
+GEN_BCD2(bcdctsq)
 
 static void gen_xpnd04_1(DisasContext *ctx)
 {
 switch (opc4(ctx->opcode)) {
+case 0:
+gen_bcdctsq(ctx);
+break;
 case 2:
 gen_bcdcfsq(ctx);
 break;
@@ -1018,6 +1022,9 @@ static void gen_xpnd04_1(DisasContext *ctx)
 static void gen_xpnd04_2(DisasContext *ctx)
 {
 switch (opc4(ctx->opcode)) {
+case 0:
+gen_bcdctsq(ctx);
+break;
 case 2:
 gen_bcdcfsq(ctx);
 break;
-- 
2.7.4
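
The digit-by-digit conversion in the helper above is Horner's rule applied
to packed decimal. A reduced sketch, assuming a 64-bit value with 15 digits
and the sign code in the lowest nibble, is shown below. It is not the
128-bit helper from the patch, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Convert a packed-decimal value held in 64 bits (15 digits, sign code in
 * the lowest nibble) to a signed integer. Returns false for an invalid
 * digit or sign code. Sign codes 0xA, 0xC, 0xE, 0xF mean plus; 0xB and
 * 0xD mean minus.
 */
static bool bcd64_to_int(uint64_t bcd, int64_t *out)
{
    assert(out != NULL);
    unsigned sign = bcd & 0xF;
    int64_t value = 0;

    if (sign < 0xA) {
        return false;
    }
    /* Horner's rule: walk from the most significant digit downwards. */
    for (int n = 15; n >= 1; n--) {
        unsigned digit = (bcd >> (4 * n)) & 0xF;
        if (digit > 9) {
            return false;
        }
        value = value * 10 + (int64_t)digit;
    }
    *out = (sign == 0xB || sign == 0xD) ? -value : value;
    return true;
}
```

Fifteen decimal digits always fit in an int64_t, so this reduced form needs
no carry into a high word; the 31-digit case in the patch is what requires
the mulu64() step.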

Re: [Qemu-devel] [PATCH v2] hw/isa/lpc_ich9: inject SMI on all VCPUs if APM_STS == 'Q'

2016-11-16 Thread Michael S. Tsirkin

On Wed, Nov 16, 2016 at 01:04:00PM -0500, Paolo Bonzini wrote:
> > I guess that's what the next paragraph is about:
> > 
> > > - we could have another magic 0xB2 value, which is implemented directly
> > > in QEMU and sets 0xB3 to a magic value.  Then OVMF can invoke it
> > > after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)
> > > to detect the new feature.  It can fail to start if using traditional
> > > AP and the new feature is not there.
> > 
> > Please explain in more detail. If I write to 0xB2 (by invoking the
> > Trigger() method or somehow else), then on old QEMU's that will raise a
> > sync / unicast SMI. The SMI handler in edk2 will run, but no request
> > parameters will have been set up by OVMF, so the SMI handler will do...
> > no clue what.
> 
> It should hopefully do nothing.  A spurious SMI (such as the one caused
> by the write to 0xB2) should not crash OVMF.
> 
> SMBASE relocation uses IPIs, so my hope was to use the
> SmmCpuFeaturesSmmRelocationComplete hook.
> 
> > My preference is fw_cfg ATM. It provides a proven, flexible and
> > extensible interface (it's easy to add new files for future features).
> > If we expect more knobs in the area, I can modify my proposal to use
> > "etc/smi/broadcast", so we can add "etc/smi/" later.
> 
> Did you know there are only 16 entries for fw_cfg files? :)  And we're
> already using 20 in the worst case:
> 
> genroms/linuxboot.bin
> genroms/kvmvapic.bin
> NVDIMM_DSM_MEM_FILE
> "etc/smbios/smbios-tables"
> "etc/smbios/smbios-anchor"
> "etc/acpi/tables"
> "etc/table-loader"
> ACPI_BUILD_TPMLOG_FILE
> ACPI_BUILD_RSDP_FILE
> "etc/e820"
> "etc/msr_feature_control"
> "etc/reserved-memory-end"
> "etc/pvpanic-port"
> "etc/boot-menu-wait"
> "bootsplash.jpg"
> "etc/boot-fail-wait"
> "etc/igd-opregion"
> "etc/igd-bdsm-size"
> "etc/extra-pci-roots"
> "bootorder"
> 
> Therefore, so close to the release I'm a bit worried about making
> changes to fw_cfg or adding more fw_cfg files.

Indeed. Is an unconditional thing so bad?
What would be the observed behaviour with new OVMF on old QEMU?
Note you need to migrate during boot to notice this.
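
On the slot-count worry quoted above: a minimal sketch of why a fixed-size file directory makes every extra file a risk. SLOT_LIMIT takes the 16-entry figure from Paolo's mail as given; struct file_dir and file_dir_add are hypothetical names, not the fw_cfg code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Limit quoted in the thread; treated here as an assumption. */
#define SLOT_LIMIT 16

/* Hypothetical fixed-capacity directory of file names. */
struct file_dir {
    const char *names[SLOT_LIMIT];
    size_t count;
};

/* Refuses the entry once the directory is full instead of overflowing. */
static bool file_dir_add(struct file_dir *dir, const char *name)
{
    if (dir->count >= SLOT_LIMIT) {
        return false;
    }
    dir->names[dir->count++] = name;
    return true;
}
```

In this model the 17th file is simply rejected, so a configuration that already needs 20 files has no room left for "etc/smi/broadcast" unless the limit itself changes.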

> Though we just got
> rid of one file for the number of CPUs, so I guess we might not care.
> 
> > Do you have any specific arguments against fw_cfg? As I suggested in my
> > previous email, with fw_cfg I can implement the change in OVMF such that
> > the default behavior wouldn't change -- the default delivery would
> > remain relaxed, and the broadcast wouldn't be requested, unless the
> > fw_cfg file told OVMF otherwise.
> > 
> > > By the way, in case OVMF needs to use SmmSwDispatch in the future, I
> > > would make QEMU use broadcast behavior for all values in the 0x10-0xff
> > > range, or something like that.
> > 
> > Are we talking control/command (0xB2) or scratch/data (0xB3) register
> > values? My patches currently use the scratch/data register to provide
> > the hint to QEMU; that register is less likely to interfere with
> > anything the SMM core in edk2 does.
> 
> Sorry I confused the two registers.  0xb3 is more or less unused as far
> as I can see indeed.
> 
> Paolo
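
For reference, the magic-value probe proposed earlier in the thread can be sketched as below. This is only a toy model: PROBE_CMD, PROBE_REPLY, struct apm_model and both functions are made-up names and values, and the assumption that a QEMU with the feature answers the probe without raising an SMI is mine, not something the thread settles.

```c
#include <stdbool.h>
#include <stdint.h>

/* All names and magic values here are hypothetical. */
#define PROBE_CMD   0xF0   /* made-up probe command written to 0xB2 */
#define PROBE_REPLY 0x5A   /* made-up reply left in 0xB3 */

struct apm_model {
    bool has_probe;      /* true models a QEMU that has the new feature */
    uint8_t sts;         /* scratch/data register (0xB3) */
    unsigned smi_count;  /* SMIs raised by writes to the control port */
};

/* Model of a guest write to the control/command port (0xB2). */
static void apm_write_cnt(struct apm_model *apm, uint8_t val)
{
    if (apm->has_probe && val == PROBE_CMD) {
        apm->sts = PROBE_REPLY;   /* answered by QEMU itself, no SMI */
        return;
    }
    apm->smi_count++;             /* old behaviour: the write raises an SMI */
}

/* Firmware side: clear 0xB3, write the probe, then look for the reply. */
static bool firmware_detects_feature(struct apm_model *apm)
{
    apm->sts = 0;
    apm_write_cnt(apm, PROBE_CMD);
    return apm->sts == PROBE_REPLY;
}
```

On the model of an old QEMU the probe costs one spurious SMI and reports the feature as absent, which is the case the firmware's SMI handler has to survive.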

Re: [Qemu-devel] [PATCH] target-m68k: free TCG variables that are not

2016-11-16 Thread Richard Henderson


On 11/15/2016 11:16 PM, Laurent Vivier wrote:

This is a cleanup patch. It adds calls to tcg_temp_free()
where they are missing.

Signed-off-by: Laurent Vivier 
---
 target-m68k/translate.c | 41 -
 1 file changed, 32 insertions(+), 9 deletions(-)


Reviewed-by: Richard Henderson 


r~
