RE: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit binding via sysfs only

2013-10-28 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Wood Scott-B07421
> Sent: Tuesday, October 29, 2013 10:25 AM
> To: Bhushan Bharat-R65777
> Cc: Wood Scott-B07421; Alex Williamson; Kim Phillips; Yoder Stuart-B08248;
> christoffer.d...@linaro.org; linux-ker...@vger.kernel.org;
> a.mota...@virtualopensystems.com; ag...@suse.de; Sethi Varun-B16395;
> peter.mayd...@linaro.org; santosh.shu...@linaro.org; kvm@vger.kernel.org;
> gre...@linuxfoundation.org
> Subject: Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit binding via
> sysfs only
> 
> On Mon, 2013-10-28 at 23:45 -0500, Bhushan Bharat-R65777 wrote:
> >
> > > -Original Message-
> > > From: Wood Scott-B07421
> > > Sent: Tuesday, October 29, 2013 10:05 AM
> > > To: Bhushan Bharat-R65777
> > > Cc: Wood Scott-B07421; Alex Williamson; Kim Phillips; Yoder
> > > Stuart-B08248; christoffer.d...@linaro.org;
> > > linux-ker...@vger.kernel.org; a.mota...@virtualopensystems.com;
> > > ag...@suse.de; Sethi Varun-B16395; peter.mayd...@linaro.org;
> > > santosh.shu...@linaro.org; kvm@vger.kernel.org;
> > > gre...@linuxfoundation.org
> > > Subject: Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit
> > > binding via sysfs only
> > >
> > > On Mon, 2013-10-28 at 23:31 -0500, Bhushan Bharat-R65777 wrote:
> > > >
> > > > > -Original Message-
> > > > > From: Wood Scott-B07421
> > > > > Sent: Tuesday, October 29, 2013 10:00 AM
> > > > > To: Bhushan Bharat-R65777
> > > > > Cc: Wood Scott-B07421; Alex Williamson; Kim Phillips; Yoder
> > > > > Stuart-B08248; christoffer.d...@linaro.org;
> > > > > linux-ker...@vger.kernel.org; a.mota...@virtualopensystems.com;
> > > > > ag...@suse.de; Sethi Varun-B16395; peter.mayd...@linaro.org;
> > > > > santosh.shu...@linaro.org; kvm@vger.kernel.org;
> > > > > gre...@linuxfoundation.org
> > > > > Subject: Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit
> > > > > binding via sysfs only
> > > > >
> > > > > On Mon, 2013-10-28 at 22:52 -0500, Bhushan Bharat-R65777 wrote:
> > > > > > So when ids == NULL it does not check of vendor etc and calls
> > > > > > pci_add_dynid()
> > > > > which in turn calls driver_attach().
> > > > > >
> > > > > > If we change the above loop to break if ids->vendor ==
> > > > > >PCI_ANY_ID && ids- subvendor == PCI_ANY_ID then also we will call
> pci_add_dyids().
> > > > >
> > > > > What problem are you trying to solve?
> > > >
> > > > new_id interface to continue working as before.
> > >
> > > In what specific way does this allow new_id to continue working as
> > > before?  Be verbose.
> >
> >
> > What I observed that this patch (kim's patch) new_id interface stops 
> > working.
> 
> Yes.
> 
> >  This is found to be because store_new_id() checks for pdrv->id_table
> > which is no more NULL, so the below check fails
> 
> I do not think that is the reason.  The reason is because sysfs_bind_only is
> set, and this is not a direct sysfs bind.
> 
> > if (ids) {
> > ^^
> > This is no more NULL, so enter inside the loop
> >
> > retval = -EINVAL;
> > while (ids->vendor || ids->subvendor || ids->class_mask) {
> > if (driver_data == ids->driver_data) {
> > retval = 0;
> > break;
> > }
> > ids++;
> > }
> > if (retval)   /* No match */
> > return retval; ^ This is where it returns
> > as -EINVAL
> 
> Why wouldn't it have broken out of the loop earlier, since driver_data and 
> ids-
> >driver_data should both be zero?  I assume this is with a patch to do
> PCI_ANY_ID in vfio-pci.

hmmm, I am pretty sure I have seen that issue a few time (below is command line 
output) but now I am not getting any error reported. Although device is not 
binding to driver because of sysfs_bind_only as you mentioned (I thought of 
this as a second issue). If I will be able to reproduce the first issue then I 
will let you guys know otherwise there was no first issue :(

root@p5040ds:/sys/bus/pci# echo :01:00.0 > 
devices/\:01\:00.0/driver/unbind
e1000e :01:00.0 eth0: removed PHC
root@p5040ds:/sys/bus/pci# echo 8086 10d3 > drivers/vfio-pci/new_id
-sh: echo: write error: Invalid argument
root@p5040ds:/sys/bus/pci# echo :01:00.0 > drivers/vfio-pci/bind

-Bharat

> 
> -Scott
> 



Re: [PATCH] virtio-scsi: Fix hotcpu_notifier use-after-free with virtscsi_freeze

2013-10-28 Thread Wanlong Gao
On 10/28/2013 04:01 PM, Asias He wrote:
> vqs are freed in virtscsi_freeze but the hotcpu_notifier is not
> unregistered. We will have a use-after-free usage when the notifier
> callback is called after virtscsi_freeze.
> 
> Signed-off-by: Asias He 

Reviewed-by: Wanlong Gao 


> ---
>  drivers/scsi/virtio_scsi.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index 74b88ef..b26f1a5 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -957,6 +957,10 @@ static void virtscsi_remove(struct virtio_device *vdev)
>  #ifdef CONFIG_PM
>  static int virtscsi_freeze(struct virtio_device *vdev)
>  {
> + struct Scsi_Host *sh = virtio_scsi_host(vdev);
> + struct virtio_scsi *vscsi = shost_priv(sh);
> +
> + unregister_hotcpu_notifier(&vscsi->nb);
>   virtscsi_remove_vqs(vdev);
>   return 0;
>  }
> @@ -965,8 +969,17 @@ static int virtscsi_restore(struct virtio_device *vdev)
>  {
>   struct Scsi_Host *sh = virtio_scsi_host(vdev);
>   struct virtio_scsi *vscsi = shost_priv(sh);
> + int err;
> +
> + err = virtscsi_init(vdev, vscsi);
> + if (err)
> + return err;
> +
> + err = register_hotcpu_notifier(&vscsi->nb);
> + if (err)
> + vdev->config->del_vqs(vdev);
>  
> - return virtscsi_init(vdev, vscsi);
> + return err;
>  }
>  #endif
>  
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Calling to kvm_mmu_load

2013-10-28 Thread Arthur Chunqi Li
Hi Paolo,

On Fri, Oct 25, 2013 at 8:43 AM, Paolo Bonzini  wrote:
> Il 24/10/2013 08:55, Arthur Chunqi Li ha scritto:
>> Hi Paolo,
>>
>> Thanks for your reply.
>>
>> On Wed, Oct 23, 2013 at 2:21 PM, Paolo Bonzini  wrote:
>>> Il 21/10/2013 08:56, Arthur Chunqi Li ha scritto:
 Hi there,

 I noticed that kvm_mmu_reload() is called every time in vcpu enter,
 and kvm_mmu_load() is called in this function when root_hpa is
 INVALID_PAGE. I get confused why and when root_hpa can be set to
 INVALID_PAGE? I find one condition that if vcpu get request
 KVM_REQ_MMU_RELOAD, kvm_mmu_unload() is called to invalid root_hpa,
 but this condition cannot cover all occasions.
>>>
>>> Look also at mmu_free_roots, kvm_mmu_unload and kvm_mmu_reset_context.
>>> In "normal" cases and without EPT, it should be called when CR3 changes
>>> or when the paging mode changes (32-bit, PAE, 64-bit, no paging).  With
>>> EPT, this kind of change won't reset the MMU (CR3 changes won't cause a
>>> vmexit at all, in fact).
>>
>> When EPT is enabled, why will root_hpa be set to INVALID_PAGE when a
>> VM boots?
>
> Because EPT page tables are only built lazily.  The EPT page tables
> start all-invalid, and are built as the guest accesses pages at new
> guest physical addresses (instead, shadow page tables are built as the
> guest accesses pages at new guest virtual addresses).
>
>> I find that Qemu reset root_hpa with KVM_REQ_MMU_RELOAD
>> request several time when booting a VM, why?
>
> This happens when the memory map changes.  A previously-valid guest
> physical address might become invalid now, and the EPT page tables have
> to be "emptied".
>
>> And will VM use EPT from the very beginning when booting?
>
> Yes.  But it's not the VM.  It's KVM that uses EPT.
>
> The VM only uses EPT if you're using nested virtualization, and EPT is
> enabled.  L1's KVM uses EPT, L2 doesn't (because it doesn't run KVM).
>
>>> With nested virtualization, roots are invalidated whenever kvm->arch.mmu
>>> changes meaning from L1->L0 or L2->L0 or vice versa (in the special case
>>> where EPT is disabled on L0, this is trivially because vmentry loads CR3
>>> from the vmcs02).
>>
>> Besides, in function tdp_page_fault(), I find two different execution
>> flow which may not reach __direct_map() (which I think is the normal
>> path to handle PF), they are fast_page_fault() and try_async_pf().
>> When will these two paths called when handling EPT page fault?
>
> fast_page_fault() is called if you're using dirty page tracking.  It
> checks if we have a read-only page that is in a writeable memory slot
> (SPTE_HOST_WRITEABLE) and whose PTE allows writes (SPTE_MMU_WRITEABLE).
>  If these conditions are satisfied, the page was read-only because of
> dirty page tracking; it is made read-write with a single cmpxchg and
> sets the bit for the page in the dirty bitmap.

What is the dirty page tracking code path? I find a obsoleted flag
"dirty_page_log_all" in the very previous codes, but I cannot get the
most recent version of tracking dirty pages.

Besides, I noticed that memory management in KVM uses the mechanism
with "struct kvm_memory_slot". How is kvm_memory_slot used with the
cooperation of Linux memory management?

Thanks,
Arthur

>
> try_async_pf will inject a "dummy" pagefault instead of creating the EPT
> page table, and create the page table in the background.  The guest will
> do something else (run another task) until the EPT page table has been
> created; then a second "dummy" pagefault is injected.
> kvm_arch_async_page_not_present signals the first page fault,
> kvm_arch_async_page_present signals the second.  For this to happen, the
> guest must have enabled the asynchronous page fault feature with a write
> to a KVM-specific MSR.
>
> Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/7] Initial skeleton of VFIO support for Device Tree based devices

2013-10-28 Thread Don Dutile

On 09/30/2013 11:37 AM, Bhushan Bharat-R65777 wrote:




-Original Message-
From: iommu-boun...@lists.linux-foundation.org [mailto:iommu-
boun...@lists.linux-foundation.org] On Behalf Of Antonios Motakis
Sent: Monday, September 30, 2013 8:59 PM
To: kvm...@lists.cs.columbia.edu; alex.william...@redhat.com
Cc: linux-samsung-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Yoder
Stuart-B08248; io...@lists.linux-foundation.org; Antonios Motakis;
t...@virtualopensystems.com
Subject: [PATCH 2/7] Initial skeleton of VFIO support for Device Tree based
devices

Platform devices in the Linux kernel are usually managed by the DT interface.
This patch forms the base to support these kind of devices with VFIO.

Signed-off-by: Antonios Motakis
---
  drivers/vfio/Kconfig |  11 +++
  drivers/vfio/Makefile|   1 +
  drivers/vfio/vfio_platform.c | 187 +++
  include/uapi/linux/vfio.h|   1 +
  4 files changed, 200 insertions(+)
  create mode 100644 drivers/vfio/vfio_platform.c

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index 1f84eda..35254b7
100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -13,4 +13,15 @@ menuconfig VFIO

  If you don't know what to do here, say N.

+config VFIO_PLATFORM
+   tristate "VFIO support for device tree based platform devices"
+   depends on VFIO&&  EVENTFD&&  OF
+   help
+ Support for platform devices with VFIO. This is required to make
+ use of platform devices present on device tree nodes using the VFIO
+ framework. Devices that are not described in the device tree cannot
+ be used by this driver.
+
+ If you don't know what to do here, say N.
+
  source "drivers/vfio/pci/Kconfig"
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile index
2398d4a..575c8dd 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -1,3 +1,4 @@
  obj-$(CONFIG_VFIO) += vfio.o
  obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
  obj-$(CONFIG_VFIO_PCI) += pci/
+obj-$(CONFIG_VFIO_PLATFORM) += vfio_platform.o
diff --git a/drivers/vfio/vfio_platform.c b/drivers/vfio/vfio_platform.c new


We can make this parallel to PCI, something like 
drivers/vfio/platform/platform.c


pls, no.  'platform' is too generic, and it really means 'arm-dt' ... so can
move it to the arch/arm space, and have it's kconfig conditional on ARM&&VFIO.
if kept under drivers/vfio, then use a better directory name that ties it to 
arm-dt.
thanks.
- Don



-Bharat


file mode 100644 index 000..b9686b0
--- /dev/null
+++ b/drivers/vfio/vfio_platform.c
@@ -0,0 +1,187 @@
+/*
+ * Copyright (C) 2013 - Virtual Open Systems
+ * Author: Antonios Motakis
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define DRIVER_VERSION  "0.1"
+#define DRIVER_AUTHOR   "Antonios Motakis"
+#define DRIVER_DESC "VFIO Device Tree devices - User Level meta-driver"
+
+struct vfio_platform_device {
+   struct platform_device  *pdev;
+};
+
+static void vfio_platform_release(void *device_data) {
+   module_put(THIS_MODULE);
+}
+
+static int vfio_platform_open(void *device_data) {
+   if (!try_module_get(THIS_MODULE))
+   return -ENODEV;
+
+   return 0;
+}
+
+static long vfio_platform_ioctl(void *device_data,
+  unsigned int cmd, unsigned long arg) {
+   struct vfio_platform_device *vdev = device_data;
+   unsigned long minsz;
+
+   if (cmd == VFIO_DEVICE_GET_INFO) {
+   struct vfio_device_info info;
+
+   minsz = offsetofend(struct vfio_device_info, num_irqs);
+
+   if (copy_from_user(&info, (void __user *)arg, minsz))
+   return -EFAULT;
+
+   if (info.argsz<  minsz)
+   return -EINVAL;
+
+   info.flags = VFIO_DEVICE_FLAGS_PLATFORM;
+   info.num_regions = 0;
+   info.num_irqs = 0;
+
+   return copy_to_user((void __user *)arg,&info, minsz);
+
+   } else if (cmd == VFIO_DEVICE_GET_REGION_INFO)
+   return -EINVAL;
+
+   else if (cmd == VFIO_DEVICE_GET_IRQ_INFO)
+   return -EINVAL;
+
+   else if (cmd == VFIO_DEVICE_SET_IRQS)
+  

Re: [PATCH 1/7] VFIO_IOMMU_TYPE1 workaround to build for platform devices

2013-10-28 Thread Don Dutile

On 10/02/2013 08:14 AM, Alexander Graf wrote:


On 01.10.2013, at 21:21, Yoder Stuart-B08248 wrote:


static int __init vfio_iommu_type1_init(void)
{
-   if (!iommu_present(&pci_bus_type))
+#ifdef CONFIG_PCI
+   if (iommu_present(&pci_bus_type)) {
+   iommu_bus_type =&pci_bus_type;
+   /* For PCI targets, IOMMU_CAP_INTR_REMAP is required */
+   require_cap_intr_remap = true;
+   }
+#endif
+   if (!iommu_bus_type&&  iommu_present(&platform_bus_type))
+   iommu_bus_type =&platform_bus_type;
+
+   if(!iommu_bus_type)
return -ENODEV;

return vfio_register_iommu_driver(&vfio_iommu_driver_ops_type1);


Is it possible to have a system with both PCI and platform devices?  How
would you support that?  Thanks,


It most certainly is a requirement to support both.  This is how
all of our (FSL) SoCs will expect to work.


I thought the PCI bus emits a cookie that the system wide IOMMU can then use to 
differentiate the origin of the transaction? So the same IOMMU can be used for 
PCI as well as platform routing.


*can* be the same IOMMU, yes;
have to, no, so there can be multiple IOMMUs of different types.



Alex

___
iommu mailing list
io...@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit binding via sysfs only

2013-10-28 Thread Scott Wood
On Mon, 2013-10-28 at 23:45 -0500, Bhushan Bharat-R65777 wrote:
> 
> > -Original Message-
> > From: Wood Scott-B07421
> > Sent: Tuesday, October 29, 2013 10:05 AM
> > To: Bhushan Bharat-R65777
> > Cc: Wood Scott-B07421; Alex Williamson; Kim Phillips; Yoder Stuart-B08248;
> > christoffer.d...@linaro.org; linux-ker...@vger.kernel.org;
> > a.mota...@virtualopensystems.com; ag...@suse.de; Sethi Varun-B16395;
> > peter.mayd...@linaro.org; santosh.shu...@linaro.org; kvm@vger.kernel.org;
> > gre...@linuxfoundation.org
> > Subject: Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit binding via
> > sysfs only
> > 
> > On Mon, 2013-10-28 at 23:31 -0500, Bhushan Bharat-R65777 wrote:
> > >
> > > > -Original Message-
> > > > From: Wood Scott-B07421
> > > > Sent: Tuesday, October 29, 2013 10:00 AM
> > > > To: Bhushan Bharat-R65777
> > > > Cc: Wood Scott-B07421; Alex Williamson; Kim Phillips; Yoder
> > > > Stuart-B08248; christoffer.d...@linaro.org;
> > > > linux-ker...@vger.kernel.org; a.mota...@virtualopensystems.com;
> > > > ag...@suse.de; Sethi Varun-B16395; peter.mayd...@linaro.org;
> > > > santosh.shu...@linaro.org; kvm@vger.kernel.org;
> > > > gre...@linuxfoundation.org
> > > > Subject: Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit
> > > > binding via sysfs only
> > > >
> > > > On Mon, 2013-10-28 at 22:52 -0500, Bhushan Bharat-R65777 wrote:
> > > > > So when ids == NULL it does not check of vendor etc and calls
> > > > > pci_add_dynid()
> > > > which in turn calls driver_attach().
> > > > >
> > > > > If we change the above loop to break if ids->vendor == PCI_ANY_ID
> > > > >&& ids- subvendor == PCI_ANY_ID then also we will call pci_add_dyids().
> > > >
> > > > What problem are you trying to solve?
> > >
> > > new_id interface to continue working as before.
> > 
> > In what specific way does this allow new_id to continue working as before?  
> > Be
> > verbose.
> 
> 
> What I observed that this patch (kim's patch) new_id interface stops working.

Yes.

>  This is found to be because store_new_id() checks for pdrv->id_table which 
> is no more NULL, so the below check fails

I do not think that is the reason.  The reason is because
sysfs_bind_only is set, and this is not a direct sysfs bind.

> if (ids) {
> ^^
> This is no more NULL, so enter inside the loop
> 
> retval = -EINVAL;
> while (ids->vendor || ids->subvendor || ids->class_mask) {
> if (driver_data == ids->driver_data) {
> retval = 0;
> break;
> }
> ids++;
> }
> if (retval)   /* No match */
> return retval;
> ^
> This is where it returns as -EINVAL

Why wouldn't it have broken out of the loop earlier, since driver_data
and ids->driver_data should both be zero?  I assume this is with a patch
to do PCI_ANY_ID in vfio-pci.

-Scott



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit binding via sysfs only

2013-10-28 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Wood Scott-B07421
> Sent: Tuesday, October 29, 2013 10:05 AM
> To: Bhushan Bharat-R65777
> Cc: Wood Scott-B07421; Alex Williamson; Kim Phillips; Yoder Stuart-B08248;
> christoffer.d...@linaro.org; linux-ker...@vger.kernel.org;
> a.mota...@virtualopensystems.com; ag...@suse.de; Sethi Varun-B16395;
> peter.mayd...@linaro.org; santosh.shu...@linaro.org; kvm@vger.kernel.org;
> gre...@linuxfoundation.org
> Subject: Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit binding via
> sysfs only
> 
> On Mon, 2013-10-28 at 23:31 -0500, Bhushan Bharat-R65777 wrote:
> >
> > > -Original Message-
> > > From: Wood Scott-B07421
> > > Sent: Tuesday, October 29, 2013 10:00 AM
> > > To: Bhushan Bharat-R65777
> > > Cc: Wood Scott-B07421; Alex Williamson; Kim Phillips; Yoder
> > > Stuart-B08248; christoffer.d...@linaro.org;
> > > linux-ker...@vger.kernel.org; a.mota...@virtualopensystems.com;
> > > ag...@suse.de; Sethi Varun-B16395; peter.mayd...@linaro.org;
> > > santosh.shu...@linaro.org; kvm@vger.kernel.org;
> > > gre...@linuxfoundation.org
> > > Subject: Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit
> > > binding via sysfs only
> > >
> > > On Mon, 2013-10-28 at 22:52 -0500, Bhushan Bharat-R65777 wrote:
> > > >
> > > > > -Original Message-
> > > > > From: Wood Scott-B07421
> > > > > Sent: Tuesday, October 29, 2013 9:11 AM
> > > > > To: Bhushan Bharat-R65777
> > > > > Cc: Wood Scott-B07421; Alex Williamson; Kim Phillips; Yoder
> > > > > Stuart-B08248; christoffer.d...@linaro.org;
> > > > > linux-ker...@vger.kernel.org; a.mota...@virtualopensystems.com;
> > > > > ag...@suse.de; Sethi Varun-B16395; peter.mayd...@linaro.org;
> > > > > santosh.shu...@linaro.org; kvm@vger.kernel.org;
> > > > > gre...@linuxfoundation.org
> > > > > Subject: Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit
> > > > > binding via sysfs only
> > > > >
> > > > > On Mon, 2013-10-28 at 22:38 -0500, Bhushan Bharat-R65777 wrote:
> > > > > >
> > > > > > > -Original Message-
> > > > > > > From: Wood Scott-B07421
> > > > > > > Sent: Monday, October 28, 2013 11:40 PM
> > > > > > > To: Alex Williamson
> > > > > > > Cc: Kim Phillips; Bhushan Bharat-R65777; Wood Scott-B07421;
> > > > > > > Yoder Stuart-B08248; christoffer.d...@linaro.org;
> > > > > > > linux-ker...@vger.kernel.org;
> > > > > > > a.mota...@virtualopensystems.com; ag...@suse.de; Sethi
> > > > > > > Varun-B16395; peter.mayd...@linaro.org;
> > > > > > > santosh.shu...@linaro.org; kvm@vger.kernel.org;
> > > > > > > gre...@linuxfoundation.org
> > > > > > > Subject: Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for
> > > > > > > explicit binding via sysfs only
> > > > > > >
> > > > > > > On Mon, 2013-10-28 at 13:00 -0500, Scott Wood wrote:
> > > > > > > > On Mon, 2013-10-28 at 11:47 -0600, Alex Williamson wrote:
> > > > > > > > > On Fri, 2013-10-11 at 01:27 -0500, Kim Phillips wrote:
> > > > > > > > > > Force the vfio-pci driver to only be bound explicitly
> > > > > > > > > > via sysfs to avoid conflics with other drivers in the
> > > > > > > > > > event of a
> > > hotplug.
> > > > > > > > >
> > > > > > > > > We can't break userspace, so we can't disable the
> > > > > > > > > current method of binding devices to vfio-pci.  We can
> > > > > > > > > add a new method and perhaps deprecate the existing
> > > > > > > > > mechanism to be removed at some point in the future.
> > > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > I thought the existing method involved using sysfs bind,
> > > > > > > > and this was just eliminating a race.  How does the bind
> > > > > > > > get triggered
> > > currently?
> > > > > > >
> > > > > > > OK, so it seems it's relying on the write to new_id calling
> > > driver_attach().
> > > > > > > Sigh.  I guess we could make driver-sysfs-bind-only be
> > > > > > > settable via sysfs, and have new-userspace set both that and
> > > > > > > PCI_ANY_ID (or the specific ID if userspace
> > > > > > > prefers) via new_id.  The platform bus patches could
> > > > > > > continue as is, since there's no existing mechanism to break.
> > > > > >
> > > > > > What about changing the store_new_id() to bypass exact ids
> > > > > > check if driver
> > > > > have PCI_ANY_ID?
> > > > >
> > > > > I don't follow.
> > > >
> > > > store_new_id() function id defined as:
> > > >
> > > > static ssize_t store_new_id(struct device_driver *driver, const
> > > > char *buf, size_t count) {
> > > > struct pci_driver *pdrv = to_pci_driver(driver);
> > > > const struct pci_device_id *ids = pdrv->id_table;
> > > >
> > > > 
> > > > /* Only accept driver_data values that match an existing 
> > > > id_table
> > > >entry */
> > > > if (ids) {
> > > > retval = -EINVAL;
> > > > while (ids->vendor || ids->subvendor || 
> > > > ids->class_mask) {
> > > > if (driver_data == ids->driver_data) {
> > > > retval = 0;
> > > 

Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit binding via sysfs only

2013-10-28 Thread Scott Wood
On Mon, 2013-10-28 at 23:31 -0500, Bhushan Bharat-R65777 wrote:
> 
> > -Original Message-
> > From: Wood Scott-B07421
> > Sent: Tuesday, October 29, 2013 10:00 AM
> > To: Bhushan Bharat-R65777
> > Cc: Wood Scott-B07421; Alex Williamson; Kim Phillips; Yoder Stuart-B08248;
> > christoffer.d...@linaro.org; linux-ker...@vger.kernel.org;
> > a.mota...@virtualopensystems.com; ag...@suse.de; Sethi Varun-B16395;
> > peter.mayd...@linaro.org; santosh.shu...@linaro.org; kvm@vger.kernel.org;
> > gre...@linuxfoundation.org
> > Subject: Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit binding via
> > sysfs only
> > 
> > On Mon, 2013-10-28 at 22:52 -0500, Bhushan Bharat-R65777 wrote:
> > >
> > > > -Original Message-
> > > > From: Wood Scott-B07421
> > > > Sent: Tuesday, October 29, 2013 9:11 AM
> > > > To: Bhushan Bharat-R65777
> > > > Cc: Wood Scott-B07421; Alex Williamson; Kim Phillips; Yoder
> > > > Stuart-B08248; christoffer.d...@linaro.org;
> > > > linux-ker...@vger.kernel.org; a.mota...@virtualopensystems.com;
> > > > ag...@suse.de; Sethi Varun-B16395; peter.mayd...@linaro.org;
> > > > santosh.shu...@linaro.org; kvm@vger.kernel.org;
> > > > gre...@linuxfoundation.org
> > > > Subject: Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit
> > > > binding via sysfs only
> > > >
> > > > On Mon, 2013-10-28 at 22:38 -0500, Bhushan Bharat-R65777 wrote:
> > > > >
> > > > > > -Original Message-
> > > > > > From: Wood Scott-B07421
> > > > > > Sent: Monday, October 28, 2013 11:40 PM
> > > > > > To: Alex Williamson
> > > > > > Cc: Kim Phillips; Bhushan Bharat-R65777; Wood Scott-B07421;
> > > > > > Yoder Stuart-B08248; christoffer.d...@linaro.org;
> > > > > > linux-ker...@vger.kernel.org; a.mota...@virtualopensystems.com;
> > > > > > ag...@suse.de; Sethi Varun-B16395; peter.mayd...@linaro.org;
> > > > > > santosh.shu...@linaro.org; kvm@vger.kernel.org;
> > > > > > gre...@linuxfoundation.org
> > > > > > Subject: Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit
> > > > > > binding via sysfs only
> > > > > >
> > > > > > On Mon, 2013-10-28 at 13:00 -0500, Scott Wood wrote:
> > > > > > > On Mon, 2013-10-28 at 11:47 -0600, Alex Williamson wrote:
> > > > > > > > On Fri, 2013-10-11 at 01:27 -0500, Kim Phillips wrote:
> > > > > > > > > Force the vfio-pci driver to only be bound explicitly via
> > > > > > > > > sysfs to avoid conflics with other drivers in the event of a
> > hotplug.
> > > > > > > >
> > > > > > > > We can't break userspace, so we can't disable the current
> > > > > > > > method of binding devices to vfio-pci.  We can add a new
> > > > > > > > method and perhaps deprecate the existing mechanism to be
> > > > > > > > removed at some point in the future.  Thanks,
> > > > > > >
> > > > > > > I thought the existing method involved using sysfs bind, and
> > > > > > > this was just eliminating a race.  How does the bind get triggered
> > currently?
> > > > > >
> > > > > > OK, so it seems it's relying on the write to new_id calling
> > driver_attach().
> > > > > > Sigh.  I guess we could make driver-sysfs-bind-only be settable
> > > > > > via sysfs, and have new-userspace set both that and PCI_ANY_ID
> > > > > > (or the specific ID if userspace
> > > > > > prefers) via new_id.  The platform bus patches could continue as
> > > > > > is, since there's no existing mechanism to break.
> > > > >
> > > > > What about changing the store_new_id() to bypass exact ids check
> > > > > if driver
> > > > have PCI_ANY_ID?
> > > >
> > > > I don't follow.
> > >
> > > store_new_id() function id defined as:
> > >
> > > static ssize_t store_new_id(struct device_driver *driver, const char
> > > *buf, size_t count) {
> > > struct pci_driver *pdrv = to_pci_driver(driver);
> > > const struct pci_device_id *ids = pdrv->id_table;
> > >
> > > 
> > > /* Only accept driver_data values that match an existing id_table
> > >entry */
> > > if (ids) {
> > > retval = -EINVAL;
> > > while (ids->vendor || ids->subvendor || ids->class_mask) {
> > > if (driver_data == ids->driver_data) {
> > > retval = 0;
> > > break;
> > > }
> > > ids++;
> > > }
> > > if (retval) /* No match */
> > > return retval;
> > > }
> > >
> > > retval = pci_add_dynid(pdrv, vendor, device, subvendor, subdevice,
> > >class, class_mask, driver_data); 
> > >
> > >
> > > So when ids == NULL it does not check of vendor etc and calls 
> > > pci_add_dynid()
> > which in turn calls driver_attach().
> > >
> > > If we change the above loop to break if ids->vendor == PCI_ANY_ID && ids-
> > >subvendor == PCI_ANY_ID then also we will call pci_add_dyids().
> > 
> > What problem are you trying to solve?
> 
> new_id interface to continue working as before.

RE: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit binding via sysfs only

2013-10-28 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Wood Scott-B07421
> Sent: Tuesday, October 29, 2013 10:00 AM
> To: Bhushan Bharat-R65777
> Cc: Wood Scott-B07421; Alex Williamson; Kim Phillips; Yoder Stuart-B08248;
> christoffer.d...@linaro.org; linux-ker...@vger.kernel.org;
> a.mota...@virtualopensystems.com; ag...@suse.de; Sethi Varun-B16395;
> peter.mayd...@linaro.org; santosh.shu...@linaro.org; kvm@vger.kernel.org;
> gre...@linuxfoundation.org
> Subject: Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit binding via
> sysfs only
> 
> On Mon, 2013-10-28 at 22:52 -0500, Bhushan Bharat-R65777 wrote:
> >
> > > -Original Message-
> > > From: Wood Scott-B07421
> > > Sent: Tuesday, October 29, 2013 9:11 AM
> > > To: Bhushan Bharat-R65777
> > > Cc: Wood Scott-B07421; Alex Williamson; Kim Phillips; Yoder
> > > Stuart-B08248; christoffer.d...@linaro.org;
> > > linux-ker...@vger.kernel.org; a.mota...@virtualopensystems.com;
> > > ag...@suse.de; Sethi Varun-B16395; peter.mayd...@linaro.org;
> > > santosh.shu...@linaro.org; kvm@vger.kernel.org;
> > > gre...@linuxfoundation.org
> > > Subject: Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit
> > > binding via sysfs only
> > >
> > > On Mon, 2013-10-28 at 22:38 -0500, Bhushan Bharat-R65777 wrote:
> > > >
> > > > > -Original Message-
> > > > > From: Wood Scott-B07421
> > > > > Sent: Monday, October 28, 2013 11:40 PM
> > > > > To: Alex Williamson
> > > > > Cc: Kim Phillips; Bhushan Bharat-R65777; Wood Scott-B07421;
> > > > > Yoder Stuart-B08248; christoffer.d...@linaro.org;
> > > > > linux-ker...@vger.kernel.org; a.mota...@virtualopensystems.com;
> > > > > ag...@suse.de; Sethi Varun-B16395; peter.mayd...@linaro.org;
> > > > > santosh.shu...@linaro.org; kvm@vger.kernel.org;
> > > > > gre...@linuxfoundation.org
> > > > > Subject: Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit
> > > > > binding via sysfs only
> > > > >
> > > > > On Mon, 2013-10-28 at 13:00 -0500, Scott Wood wrote:
> > > > > > On Mon, 2013-10-28 at 11:47 -0600, Alex Williamson wrote:
> > > > > > > On Fri, 2013-10-11 at 01:27 -0500, Kim Phillips wrote:
> > > > > > > > Force the vfio-pci driver to only be bound explicitly via
> > > > > > > > sysfs to avoid conflics with other drivers in the event of a
> hotplug.
> > > > > > >
> > > > > > > We can't break userspace, so we can't disable the current
> > > > > > > method of binding devices to vfio-pci.  We can add a new
> > > > > > > method and perhaps deprecate the existing mechanism to be
> > > > > > > removed at some point in the future.  Thanks,
> > > > > >
> > > > > > I thought the existing method involved using sysfs bind, and
> > > > > > this was just eliminating a race.  How does the bind get triggered
> currently?
> > > > >
> > > > > OK, so it seems it's relying on the write to new_id calling
> driver_attach().
> > > > > Sigh.  I guess we could make driver-sysfs-bind-only be settable
> > > > > via sysfs, and have new-userspace set both that and PCI_ANY_ID
> > > > > (or the specific ID if userspace
> > > > > prefers) via new_id.  The platform bus patches could continue as
> > > > > is, since there's no existing mechanism to break.
> > > >
> > > > What about changing the store_new_id() to bypass exact ids check
> > > > if driver
> > > have PCI_ANY_ID?
> > >
> > > I don't follow.
> >
> > store_new_id() function id defined as:
> >
> > static ssize_t store_new_id(struct device_driver *driver, const char
> > *buf, size_t count) {
> > struct pci_driver *pdrv = to_pci_driver(driver);
> > const struct pci_device_id *ids = pdrv->id_table;
> >
> > 
> > /* Only accept driver_data values that match an existing id_table
> >entry */
> > if (ids) {
> > retval = -EINVAL;
> > while (ids->vendor || ids->subvendor || ids->class_mask) {
> > if (driver_data == ids->driver_data) {
> > retval = 0;
> > break;
> > }
> > ids++;
> > }
> > if (retval) /* No match */
> > return retval;
> > }
> >
> > retval = pci_add_dynid(pdrv, vendor, device, subvendor, subdevice,
> >class, class_mask, driver_data); 
> >
> >
> > So when ids == NULL it does not check of vendor etc and calls 
> > pci_add_dynid()
> which in turn calls driver_attach().
> >
> > If we change the above loop to break if ids->vendor == PCI_ANY_ID && ids-
> >subvendor == PCI_ANY_ID then also we will call pci_add_dyids().
> 
> What problem are you trying to solve?

new_id interface to continue working as before.

-Bharat

> 
> -Scott
> 

N�r��yb�X��ǧv�^�)޺{.n�+h����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf

Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit binding via sysfs only

2013-10-28 Thread Scott Wood
On Mon, 2013-10-28 at 22:52 -0500, Bhushan Bharat-R65777 wrote:
> 
> > -Original Message-
> > From: Wood Scott-B07421
> > Sent: Tuesday, October 29, 2013 9:11 AM
> > To: Bhushan Bharat-R65777
> > Cc: Wood Scott-B07421; Alex Williamson; Kim Phillips; Yoder Stuart-B08248;
> > christoffer.d...@linaro.org; linux-ker...@vger.kernel.org;
> > a.mota...@virtualopensystems.com; ag...@suse.de; Sethi Varun-B16395;
> > peter.mayd...@linaro.org; santosh.shu...@linaro.org; kvm@vger.kernel.org;
> > gre...@linuxfoundation.org
> > Subject: Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit binding via
> > sysfs only
> > 
> > On Mon, 2013-10-28 at 22:38 -0500, Bhushan Bharat-R65777 wrote:
> > >
> > > > -Original Message-
> > > > From: Wood Scott-B07421
> > > > Sent: Monday, October 28, 2013 11:40 PM
> > > > To: Alex Williamson
> > > > Cc: Kim Phillips; Bhushan Bharat-R65777; Wood Scott-B07421; Yoder
> > > > Stuart-B08248; christoffer.d...@linaro.org;
> > > > linux-ker...@vger.kernel.org; a.mota...@virtualopensystems.com;
> > > > ag...@suse.de; Sethi Varun-B16395; peter.mayd...@linaro.org;
> > > > santosh.shu...@linaro.org; kvm@vger.kernel.org;
> > > > gre...@linuxfoundation.org
> > > > Subject: Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit
> > > > binding via sysfs only
> > > >
> > > > On Mon, 2013-10-28 at 13:00 -0500, Scott Wood wrote:
> > > > > On Mon, 2013-10-28 at 11:47 -0600, Alex Williamson wrote:
> > > > > > On Fri, 2013-10-11 at 01:27 -0500, Kim Phillips wrote:
> > > > > > > Force the vfio-pci driver to only be bound explicitly via
> > > > > > > sysfs to avoid conflics with other drivers in the event of a 
> > > > > > > hotplug.
> > > > > >
> > > > > > We can't break userspace, so we can't disable the current method
> > > > > > of binding devices to vfio-pci.  We can add a new method and
> > > > > > perhaps deprecate the existing mechanism to be removed at some
> > > > > > point in the future.  Thanks,
> > > > >
> > > > > I thought the existing method involved using sysfs bind, and this
> > > > > was just eliminating a race.  How does the bind get triggered 
> > > > > currently?
> > > >
> > > > OK, so it seems it's relying on the write to new_id calling 
> > > > driver_attach().
> > > > Sigh.  I guess we could make driver-sysfs-bind-only be settable via
> > > > sysfs, and have new-userspace set both that and PCI_ANY_ID (or the
> > > > specific ID if userspace
> > > > prefers) via new_id.  The platform bus patches could continue as is,
> > > > since there's no existing mechanism to break.
> > >
> > > What about changing the store_new_id() to bypass exact ids check if driver
> > have PCI_ANY_ID?
> > 
> > I don't follow.
> 
> store_new_id() function id defined as:
> 
> static ssize_t store_new_id(struct device_driver *driver, const char *buf, 
> size_t count)
> {
> struct pci_driver *pdrv = to_pci_driver(driver);
> const struct pci_device_id *ids = pdrv->id_table;
> 
> 
> /* Only accept driver_data values that match an existing id_table
>entry */
> if (ids) {
> retval = -EINVAL;
> while (ids->vendor || ids->subvendor || ids->class_mask) {
> if (driver_data == ids->driver_data) {
> retval = 0;
> break;
> }
> ids++;
> }
> if (retval) /* No match */
> return retval;
> }
> 
> retval = pci_add_dynid(pdrv, vendor, device, subvendor, subdevice,
>class, class_mask, driver_data);
> 
> 
> 
> So when ids == NULL it does not check of vendor etc and calls pci_add_dynid() 
> which in turn calls driver_attach().
> 
> If we change the above loop to break if ids->vendor == PCI_ANY_ID && 
> ids->subvendor == PCI_ANY_ID then also we will call pci_add_dyids().

What problem are you trying to solve?

-Scott



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit binding via sysfs only

2013-10-28 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Wood Scott-B07421
> Sent: Tuesday, October 29, 2013 9:11 AM
> To: Bhushan Bharat-R65777
> Cc: Wood Scott-B07421; Alex Williamson; Kim Phillips; Yoder Stuart-B08248;
> christoffer.d...@linaro.org; linux-ker...@vger.kernel.org;
> a.mota...@virtualopensystems.com; ag...@suse.de; Sethi Varun-B16395;
> peter.mayd...@linaro.org; santosh.shu...@linaro.org; kvm@vger.kernel.org;
> gre...@linuxfoundation.org
> Subject: Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit binding via
> sysfs only
> 
> On Mon, 2013-10-28 at 22:38 -0500, Bhushan Bharat-R65777 wrote:
> >
> > > -Original Message-
> > > From: Wood Scott-B07421
> > > Sent: Monday, October 28, 2013 11:40 PM
> > > To: Alex Williamson
> > > Cc: Kim Phillips; Bhushan Bharat-R65777; Wood Scott-B07421; Yoder
> > > Stuart-B08248; christoffer.d...@linaro.org;
> > > linux-ker...@vger.kernel.org; a.mota...@virtualopensystems.com;
> > > ag...@suse.de; Sethi Varun-B16395; peter.mayd...@linaro.org;
> > > santosh.shu...@linaro.org; kvm@vger.kernel.org;
> > > gre...@linuxfoundation.org
> > > Subject: Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit
> > > binding via sysfs only
> > >
> > > On Mon, 2013-10-28 at 13:00 -0500, Scott Wood wrote:
> > > > On Mon, 2013-10-28 at 11:47 -0600, Alex Williamson wrote:
> > > > > On Fri, 2013-10-11 at 01:27 -0500, Kim Phillips wrote:
> > > > > > Force the vfio-pci driver to only be bound explicitly via
> > > > > > sysfs to avoid conflics with other drivers in the event of a 
> > > > > > hotplug.
> > > > >
> > > > > We can't break userspace, so we can't disable the current method
> > > > > of binding devices to vfio-pci.  We can add a new method and
> > > > > perhaps deprecate the existing mechanism to be removed at some
> > > > > point in the future.  Thanks,
> > > >
> > > > I thought the existing method involved using sysfs bind, and this
> > > > was just eliminating a race.  How does the bind get triggered currently?
> > >
> > > OK, so it seems it's relying on the write to new_id calling 
> > > driver_attach().
> > > Sigh.  I guess we could make driver-sysfs-bind-only be settable via
> > > sysfs, and have new-userspace set both that and PCI_ANY_ID (or the
> > > specific ID if userspace
> > > prefers) via new_id.  The platform bus patches could continue as is,
> > > since there's no existing mechanism to break.
> >
> > What about changing the store_new_id() to bypass exact ids check if driver
> have PCI_ANY_ID?
> 
> I don't follow.

store_new_id() function id defined as:

static ssize_t store_new_id(struct device_driver *driver, const char *buf, 
size_t count)
{
struct pci_driver *pdrv = to_pci_driver(driver);
const struct pci_device_id *ids = pdrv->id_table;


/* Only accept driver_data values that match an existing id_table
   entry */
if (ids) {
retval = -EINVAL;
while (ids->vendor || ids->subvendor || ids->class_mask) {
if (driver_data == ids->driver_data) {
retval = 0;
break;
}
ids++;
}
if (retval) /* No match */
return retval;
}

retval = pci_add_dynid(pdrv, vendor, device, subvendor, subdevice,
   class, class_mask, driver_data);



So when ids == NULL it does not check of vendor etc and calls pci_add_dynid() 
which in turn calls driver_attach().

If we change the above loop to break if ids->vendor == PCI_ANY_ID && 
ids->subvendor == PCI_ANY_ID then also we will call pci_add_dyids().

-Bharat


> 
> -Scott
> 



Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit binding via sysfs only

2013-10-28 Thread Scott Wood
On Mon, 2013-10-28 at 22:38 -0500, Bhushan Bharat-R65777 wrote:
> 
> > -Original Message-
> > From: Wood Scott-B07421
> > Sent: Monday, October 28, 2013 11:40 PM
> > To: Alex Williamson
> > Cc: Kim Phillips; Bhushan Bharat-R65777; Wood Scott-B07421; Yoder 
> > Stuart-B08248;
> > christoffer.d...@linaro.org; linux-ker...@vger.kernel.org;
> > a.mota...@virtualopensystems.com; ag...@suse.de; Sethi Varun-B16395;
> > peter.mayd...@linaro.org; santosh.shu...@linaro.org; kvm@vger.kernel.org;
> > gre...@linuxfoundation.org
> > Subject: Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit binding via
> > sysfs only
> > 
> > On Mon, 2013-10-28 at 13:00 -0500, Scott Wood wrote:
> > > On Mon, 2013-10-28 at 11:47 -0600, Alex Williamson wrote:
> > > > On Fri, 2013-10-11 at 01:27 -0500, Kim Phillips wrote:
> > > > > Force the vfio-pci driver to only be bound explicitly via sysfs to
> > > > > avoid conflics with other drivers in the event of a hotplug.
> > > >
> > > > We can't break userspace, so we can't disable the current method of
> > > > binding devices to vfio-pci.  We can add a new method and perhaps
> > > > deprecate the existing mechanism to be removed at some point in the
> > > > future.  Thanks,
> > >
> > > I thought the existing method involved using sysfs bind, and this was
> > > just eliminating a race.  How does the bind get triggered currently?
> > 
> > OK, so it seems it's relying on the write to new_id calling driver_attach().
> > Sigh.  I guess we could make driver-sysfs-bind-only be settable via sysfs, 
> > and
> > have new-userspace set both that and PCI_ANY_ID (or the specific ID if 
> > userspace
> > prefers) via new_id.  The platform bus patches could continue as is, since
> > there's no existing mechanism to break.
> 
> What about changing the store_new_id() to bypass exact ids check if driver 
> have PCI_ANY_ID?

I don't follow.

-Scott



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit binding via sysfs only

2013-10-28 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Wood Scott-B07421
> Sent: Monday, October 28, 2013 11:40 PM
> To: Alex Williamson
> Cc: Kim Phillips; Bhushan Bharat-R65777; Wood Scott-B07421; Yoder 
> Stuart-B08248;
> christoffer.d...@linaro.org; linux-ker...@vger.kernel.org;
> a.mota...@virtualopensystems.com; ag...@suse.de; Sethi Varun-B16395;
> peter.mayd...@linaro.org; santosh.shu...@linaro.org; kvm@vger.kernel.org;
> gre...@linuxfoundation.org
> Subject: Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit binding via
> sysfs only
> 
> On Mon, 2013-10-28 at 13:00 -0500, Scott Wood wrote:
> > On Mon, 2013-10-28 at 11:47 -0600, Alex Williamson wrote:
> > > On Fri, 2013-10-11 at 01:27 -0500, Kim Phillips wrote:
> > > > Force the vfio-pci driver to only be bound explicitly via sysfs to
> > > > avoid conflics with other drivers in the event of a hotplug.
> > >
> > > We can't break userspace, so we can't disable the current method of
> > > binding devices to vfio-pci.  We can add a new method and perhaps
> > > deprecate the existing mechanism to be removed at some point in the
> > > future.  Thanks,
> >
> > I thought the existing method involved using sysfs bind, and this was
> > just eliminating a race.  How does the bind get triggered currently?
> 
> OK, so it seems it's relying on the write to new_id calling driver_attach().
> Sigh.  I guess we could make driver-sysfs-bind-only be settable via sysfs, and
> have new-userspace set both that and PCI_ANY_ID (or the specific ID if 
> userspace
> prefers) via new_id.  The platform bus patches could continue as is, since
> there's no existing mechanism to break.

What about changing the store_new_id() to bypass exact ids check if driver have 
PCI_ANY_ID?

-Bharat

> 
> -Scott
> 



Re: [PATCH] virtio-scsi: Fix hotcpu_notifier use-after-free with virtscsi_freeze

2013-10-28 Thread Jason Wang
On 10/28/2013 04:01 PM, Asias He wrote:
> vqs are freed in virtscsi_freeze but the hotcpu_notifier is not
> unregistered. We will have a use-after-free usage when the notifier
> callback is called after virtscsi_freeze.
>
> Signed-off-by: Asias He 
> ---
>  drivers/scsi/virtio_scsi.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index 74b88ef..b26f1a5 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -957,6 +957,10 @@ static void virtscsi_remove(struct virtio_device *vdev)
>  #ifdef CONFIG_PM
>  static int virtscsi_freeze(struct virtio_device *vdev)
>  {
> + struct Scsi_Host *sh = virtio_scsi_host(vdev);
> + struct virtio_scsi *vscsi = shost_priv(sh);
> +
> + unregister_hotcpu_notifier(&vscsi->nb);
>   virtscsi_remove_vqs(vdev);
>   return 0;
>  }
> @@ -965,8 +969,17 @@ static int virtscsi_restore(struct virtio_device *vdev)
>  {
>   struct Scsi_Host *sh = virtio_scsi_host(vdev);
>   struct virtio_scsi *vscsi = shost_priv(sh);
> + int err;
> +
> + err = virtscsi_init(vdev, vscsi);
> + if (err)
> + return err;
> +
> + err = register_hotcpu_notifier(&vscsi->nb);
> + if (err)
> + vdev->config->del_vqs(vdev);
>  
> - return virtscsi_init(vdev, vscsi);
> + return err;
>  }
>  #endif
>  
Acked-by: Jason Wang 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Hello,

2013-10-28 Thread Mrs Chantal Diarrah
Hello,
 
Compliment of the day to you.
 I am Mrs Chantal Diarrah, I am sending this brief letter to solicit your 
partnership to transfer
 
$19.5 million US Dollars. I shall send you more information
 
and procedures when I receive positive response from you.
Best Regards,
 
Thanks
Mrs. Chantal

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: paravirtualizing perf_clock

2013-10-28 Thread David Ahern

On 10/28/13 7:15 AM, Peter Zijlstra wrote:

Any suggestions on how to do this and without impacting performance. I
noticed the MSR path seems to take about twice as long as the current
implementation (which I believe results in rdtsc in the VM for x86 with
stable TSC).


So assuming all the TSCs are in fact stable; you could implement this by
syncing up the guest TSC to the host TSC on guest boot. I don't think
anything _should_ rely on the absolute TSC value.

Of course you then also need to make sure the host and guest tsc
multipliers (cyc2ns) are identical, you can play games with
cyc2ns_offset if you're brave.



This and the method Gleb mentioned both are going to be complex and 
fragile -- based assumptions on how the perf_clock timestamps are 
generated. For example, 489223e assumes you have the tracepoint enabled 
at VM start with some means of capturing the data (e.g., a perf-session 
active). In both cases the end result requires piecing together and 
re-generating the VM's timestamp on the events. For perf this means 
either modifying the tool to take parameters and an algorithm on how to 
modify the timestamp or a homegrown tool to regenerate the file with 
updated timestamps.


To back out a bit, my end goal is to be able to create and merge 
perf-events from any context on a KVM-based host -- guest userspace, 
guest kernel space, host userspace and host kernel space (userspace 
events with a perf-clock timestamp is another topic ;-)). Having the 
events generated with the proper timestamp is the simpler approach than 
trying to collect various tidbits of data, massage timestamps (and 
hoping the clock source hasn't changed) and then merge events.


And then for the cherry on top a design that works across architectures 
(e.g., x86 now, but arm later).


David
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Improving scheduler for KVM

2013-10-28 Thread R
Hi, everyone

I am a graduate student. And now I have some spare time.
I notice that KVM uses kernel scheduler to schedule VCPUs.
But there exists many problem beyond the capability of current
scheduler. (e.g. Lock Waiter Preemption problem)

And I don't want to reinvent the wheel. So I want to implement a
module which can be used by the scheduler to schedule VCPUs more
efficient.

Is there any documentation about any problem that I should pay attention to?
Any comment is welcome.

-- 
Thanks
Rui Wu
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


BUG unpinning 1 GiB huge pages with KVM PCI assignment

2013-10-28 Thread Greg Edwards
Using KVM PCI assignment with 1 GiB huge pages trips a BUG in 3.12.0-rc7, e.g.

# qemu-system-x86_64 \
-m 8192 \
-mem-path /var/lib/hugetlbfs/pagesize-1GB \
-mem-prealloc \
-enable-kvm \
-device pci-assign,host=1:0.0 \
-drive file=/var/tmp/vm.img,cache=none


[  287.081736] [ cut here ]
[  287.086364] kernel BUG at mm/hugetlb.c:654!
[  287.090552] invalid opcode:  [#1] PREEMPT SMP 
[  287.095407] Modules linked in: pci_stub autofs4 sunrpc iptable_filter 
ip_tables ip6table_filter ip6_tables x_tables binfmt_misc freq_table processor 
x86_pkg_temp_thermal kvm_intel kvm crc32_pclmul microcode serio_raw i2c_i801 
evdev sg igb i2c_algo_bit i2c_core ptp pps_core mlx4_core button ext4 jbd2 
mbcache crc16 usbhid sd_mod
[  287.124916] CPU: 15 PID: 25668 Comm: qemu-system-x86 Not tainted 3.12.0-rc7 
#1
[  287.132140] Hardware name: DataDirect Networks SFA12KX/SFA12000, BIOS 21.0m4 
06/28/2013
[  287.140145] task: 88007c732e60 ti: 881ff1d3a000 task.ti: 
881ff1d3a000
[  287.147620] RIP: 0010:[]  [] 
free_huge_page+0x1d1/0x1e0
[  287.155992] RSP: 0018:881ff1d3ba88  EFLAGS: 00010213
[  287.161309] RAX:  RBX: 818bcd80 RCX: 0012
[  287.168446] RDX: 0200400c RSI: 1000 RDI: 4000
[  287.175574] RBP: 881ff1d3bab8 R08:  R09: 0002
[  287.182705] R10:  R11:  R12: ea007c00
[  287.189834] R13: 0200400c R14:  R15: 
[  287.196964] FS:  7f13722d5840() GS:88287f66() 
knlGS:
[  287.205048] CS:  0010 DS:  ES:  CR0: 80050033
[  287.210790] CR2: ff600400 CR3: 001fee3f5000 CR4: 001427e0
[  287.217918] Stack:
[  287.219931]  0001 ea007c00 01f0 
881fe3d88500
[  287.227390]  000e  881ff1d3bad8 
81102f9c
[  287.234849]  0246 ea007c00 881ff1d3baf8 
811035c0
[  287.242308] Call Trace:
[  287.244762]  [] __put_compound_page+0x1c/0x30
[  287.250680]  [] put_compound_page+0x80/0x200
[  287.256516]  [] put_page+0x45/0x50
[  287.261487]  [] kvm_release_pfn_clean+0x50/0x60 [kvm]
[  287.268098]  [] kvm_iommu_put_pages+0xb5/0xe0 [kvm]
[  287.274542]  [] kvm_iommu_unmap_pages+0x15/0x20 [kvm]
[  287.281160]  [] kvm_iommu_unmap_memslots+0x6a/0x90 [kvm]
[  287.288038]  [] kvm_assign_device+0xa7/0x140 [kvm]
[  287.294398]  [] kvm_vm_ioctl_assigned_device+0x78c/0xb40 
[kvm]
[  287.301795]  [] ? alloc_pages_vma+0xb1/0x1b0
[  287.307632]  [] kvm_vm_ioctl+0x1be/0x5b0 [kvm]
[  287.313645]  [] ? remove_vma+0x5d/0x70
[  287.318963]  [] ? __do_page_fault+0x1fc/0x4b0
[  287.324886]  [] ? kvm_dev_ioctl_check_extension+0x8c/0xd0 
[kvm]
[  287.332370]  [] ? kvm_dev_ioctl+0xa6/0x460 [kvm]
[  287.338551]  [] do_vfs_ioctl+0x89/0x4c0
[  287.343953]  [] SyS_ioctl+0xa1/0xb0
[  287.349007]  [] system_call_fastpath+0x16/0x1b
[  287.355011] Code: e6 48 89 df 48 89 42 08 48 89 10 4d 89 54 24 20 4d 89 4c 
24 28 e8 70 bc ff ff 48 83 6b 38 01 42 83 6c ab 08 01 eb 91 0f 0b eb fe <0f> 0b 
eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 
[  287.374986] RIP  [] free_huge_page+0x1d1/0x1e0
[  287.381007]  RSP 
[  287.384508] ---[ end trace 82c719f97df2e524 ]---
[  287.389129] Kernel panic - not syncing: Fatal exception
[  287.394378] [ cut here ]


This is on an Ivy Bridge system, so it has IOMMU with snoop control, hence the
map/unmap/map sequence on device assignment to get the cache coherency right.
It appears we are unpinning tail pages we never pinned the first time through
kvm_iommu_map_memslots().  This kernel does not have THP enabled, if that makes
a difference.

Interestingly, with this patch

  http://www.spinics.net/lists/kvm/msg97561.html

we no longer trip the BUG, but on qemu exit, we leak memory, as the huge pages
don't go back into the free pool.  It's likely just masking the original issue.

I haven't been successful in finding the bug yet.  Ideas on where to look?

Greg
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Mapping IOMMU pages after updating memslot

2013-10-28 Thread Alex Williamson
On Thu, 2013-10-24 at 09:56 +0800, Yang Zhang wrote:
> From: Yang Zhang 
> 
> In kvm_iommu_map_pages(), we need to know the page size via call
> kvm_host_page_size(). And it will check whether the target slot
> is valid before return the right page size.
> Currently, we will map the iommu pages when creating a new slot.
> But we call kvm_iommu_map_pages() during preparing the new slot.
> At that time, the new slot is not visible by domain(still in preparing).
> So we cannot get the right page size from kvm_host_page_size() and
> this will break the IOMMU super page logic.
> The solution is to map the iommu pages after we insert the new slot
> into domain.
> 
> Signed-off-by: Yang Zhang 
> Tested-by: Patrick Lu 
> ---
>  virt/kvm/kvm_main.c |   29 ++---
>  1 files changed, 14 insertions(+), 15 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index d469114..9ec60a2 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -873,21 +873,6 @@ int __kvm_set_memory_region(struct kvm *kvm,
>   goto out_free;
>   }
>  
> - /*
> -  * IOMMU mapping:  New slots need to be mapped.  Old slots need to be
> -  * un-mapped and re-mapped if their base changes.  Since base change
> -  * unmapping is handled above with slot deletion, mapping alone is
> -  * needed here.  Anything else the iommu might care about for existing
> -  * slots (size changes, userspace addr changes and read-only flag
> -  * changes) is disallowed above, so any other attribute changes getting
> -  * here can be skipped.
> -  */
> - if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
> - r = kvm_iommu_map_pages(kvm, &new);
> - if (r)
> - goto out_slots;
> - }
> -
>   /* actual memory is freed via old in kvm_free_physmem_slot below */
>   if (change == KVM_MR_DELETE) {
>   new.dirty_bitmap = NULL;
> @@ -901,6 +886,20 @@ int __kvm_set_memory_region(struct kvm *kvm,
>   kvm_free_physmem_slot(&old, &new);
>   kfree(old_memslots);
>  
> + /*
> +  * IOMMU mapping:  New slots need to be mapped.  Old slots need to be
> +  * un-mapped and re-mapped if their base changes.  Since base change
> +  * unmapping is handled above with slot deletion, mapping alone is
> +  * needed here.  Anything else the iommu might care about for existing
> +  * slots (size changes, userspace addr changes and read-only flag
> +  * changes) is disallowed above, so any other attribute changes getting
> +  * here can be skipped.
> +  */
> + if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
> + r = kvm_iommu_map_pages(kvm, &new);
> + return r;
> + }
> +

An addition to the comment noting that this needs to be done after
install/commit would be helpful.  Thanks,

Alex

>   return 0;
>  
>  out_slots:



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit binding via sysfs only

2013-10-28 Thread Scott Wood
On Mon, 2013-10-28 at 13:00 -0500, Scott Wood wrote:
> On Mon, 2013-10-28 at 11:47 -0600, Alex Williamson wrote:
> > On Fri, 2013-10-11 at 01:27 -0500, Kim Phillips wrote:
> > > Force the vfio-pci driver to only be bound explicitly via sysfs to avoid
> > > conflics with other drivers in the event of a hotplug.
> > 
> > We can't break userspace, so we can't disable the current method of
> > binding devices to vfio-pci.  We can add a new method and perhaps
> > deprecate the existing mechanism to be removed at some point in the
> > future.  Thanks,
> 
> I thought the existing method involved using sysfs bind, and this was
> just eliminating a race.  How does the bind get triggered currently?

OK, so it seems it's relying on the write to new_id calling
driver_attach().  Sigh.  I guess we could make driver-sysfs-bind-only be
settable via sysfs, and have new-userspace set both that and PCI_ANY_ID
(or the specific ID if userspace prefers) via new_id.  The platform bus
patches could continue as is, since there's no existing mechanism to
break.

-Scott



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Return the actual unmapped size in intel_iommu_unmap()

2013-10-28 Thread Alex Williamson
On Fri, 2013-10-25 at 11:21 +, Wu, Feng wrote:
> Actual unmapped size should be returned by intel_iommu_unmap(), because
> iommu_map() which calls this function depends on the real unmapped size.
> However, in the current logic, the return value of intel_iommu_unmap()
> is far smaller than the actual unmapped size, which leads a lot of
> redundant calls to intel_iommu_unmap() in iommu_map(). Since 
> dma_pte_clear_range()
> can always unmap the space from "start_pfn" to "last_pfn" successfully,
> it is okay to return "size" for intel_iommu_unmap().

This is an intel-iommu patch not a KVM patch so it should go to the
iommu list and copy the maintainer.

Secondly, this seems wrong to me.  Neither KVM nor VFIO track the size
of individual mappings, so when we unmap a page that was previously
mapped as a large page, we we try to unmap 4K and test the return value
to see what was actually unmapped.  That may be 2M, 1G, etc.  With this
change we'll try to unmap each 4K page within a larger mapping, even if
the first mapping actually unmapped it already.  Thanks,

Alex

> Signed-off-by: Feng Wu 
> ---
>  drivers/iommu/intel-iommu.c |5 ++---
>  1 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 15e9b57..bb795d5 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -4113,15 +4113,14 @@ static size_t intel_iommu_unmap(struct iommu_domain 
> *domain,
>  unsigned long iova, size_t size)
>  {
> struct dmar_domain *dmar_domain = domain->priv;
> -   int order;
> 
> -   order = dma_pte_clear_range(dmar_domain, iova >> VTD_PAGE_SHIFT,
> +   dma_pte_clear_range(dmar_domain, iova >> VTD_PAGE_SHIFT,
> (iova + size - 1) >> VTD_PAGE_SHIFT);
> 
> if (dmar_domain->max_addr == iova + size)
> dmar_domain->max_addr = iova;
> 
> -   return PAGE_SIZE << order;
> +   return size;
>  }
> 
>  static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain,
> --
> 1.7.1
> 
> BTW: Here is the only place where the return value of dma_pte_clear_range() 
> is used, if we don't use it here neither, maybe we can make it a void 
> function.
> 



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit binding via sysfs only

2013-10-28 Thread Scott Wood
On Mon, 2013-10-28 at 11:47 -0600, Alex Williamson wrote:
> On Fri, 2013-10-11 at 01:27 -0500, Kim Phillips wrote:
> > Force the vfio-pci driver to only be bound explicitly via sysfs to avoid
> > conflics with other drivers in the event of a hotplug.
> 
> We can't break userspace, so we can't disable the current method of
> binding devices to vfio-pci.  We can add a new method and perhaps
> deprecate the existing mechanism to be removed at some point in the
> future.  Thanks,

I thought the existing method involved using sysfs bind, and this was
just eliminating a race.  How does the bind get triggered currently?

-Scott



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] VFIO: pci: amend vfio-pci for explicit binding via sysfs only

2013-10-28 Thread Alex Williamson
On Fri, 2013-10-11 at 01:27 -0500, Kim Phillips wrote:
> Force the vfio-pci driver to only be bound explicitly via sysfs to avoid
> conflics with other drivers in the event of a hotplug.

We can't break userspace, so we can't disable the current method of
binding devices to vfio-pci.  We can add a new method and perhaps
deprecate the existing mechanism to be removed at some point in the
future.  Thanks,

Alex

> Signed-off-by: Kim Phillips 
> ---
>  drivers/vfio/pci/vfio_pci.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 6ab71b9..bdd7833 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -901,6 +901,9 @@ static struct pci_driver vfio_pci_driver = {
>   .probe  = vfio_pci_probe,
>   .remove = vfio_pci_remove,
>   .err_handler= &vfio_err_handlers,
> + .driver = {
> + .sysfs_bind_only = true,
> + },
>  };
>  
>  static void __exit vfio_pci_cleanup(void)



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: lkvm: virtio-net-rx general protection error

2013-10-28 Thread Milan Kocian
hello,

On Mon, Oct 28, 2013 at 04:28:57PM +0800, Asias He wrote:
> 
> Hello Milan,
> 
> Does the attached patch fix your problem?
> 
> -- 
> Asias

> From b48eaeff7250bf7476c771e82cdbf20c3e85c4c9 Mon Sep 17 00:00:00 2001
> From: Asias He 
> Date: Mon, 28 Oct 2013 15:02:54 +0800
> Subject: [PATCH 1/1] kvm-tools: Fix virtio-net iov memcpy
> 
> We should skip copied bytes from the buffer not from the iov itself
> which memcpy_toiovecend does.
> 
> Signed-off-by: Asias He 
> ---
>  tools/kvm/virtio/net.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/kvm/virtio/net.c b/tools/kvm/virtio/net.c
> index 2c34996..3715aaf 100644
> --- a/tools/kvm/virtio/net.c
> +++ b/tools/kvm/virtio/net.c
> @@ -114,7 +114,7 @@ static void *virtio_net_rx_thread(void *p)
>   while (copied < len) {
>   size_t iovsize = min(len - copied, 
> iov_size(iov, in));
>  
> - memcpy_toiovecend(iov, buffer, copied, iovsize);
> + memcpy_toiovec(iov, buffer + copied, iovsize);
>   copied += iovsize;
>   if (has_virtio_feature(ndev, 
> VIRTIO_NET_F_MRG_RXBUF))
>   hdr->num_buffers++;
> -- 
> 1.8.3.1
> 

Excellent, this patch fixes the problem. Feel free to add: 

Tested-by: Milan Kocian 

Many thanks.

-- 
Milan Kocian
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Vga passthrough to KVM Guest issues

2013-10-28 Thread Alex Williamson
On Wed, 2013-10-16 at 21:08 +0200, Max Schettler wrote:
> Hi guys,
> 
> Im trying to set up vga passthrough. I use the latest mainline kernel
> (3.12rc5) and patched qemu (1.6.50). When i try to start a VM using this
> command:
> 
> qemu-system-x86_64 -enable-kvm -M q35 -m 1024 -cpu qemu64
> -bios /usr/share/qemu/bios.bin -vga none 
> -device
> ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1
> -device vfio-pci,host=06:00.0,bus=root.1,multifunction=on,x-vga=on
> -device vfio-pci,host=06:00.1,bus=root.1
> 
> The screen attached to the passthroughed GPU turns on but does not show
> any output.
> Also I get some warnings from amd_iommu and my kernel messages are full
> of AMD-Vi messages:
> "AMD-Vi: Completion-Wait loop timed out" and
> "AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=06:00.0
> address=0x000438544b90"
> 
> If i dont stop the qemu process this eventually forces me to reboot
> since the host gets unusable.
> 
> I uploaded the whole dmesg output, if its of any help here:
> http://pastebin.com/ki13dqEd
> 
> My hardware is an AMD FX-8350 with an Gigabyte 970A-UD3 and a Gigabyte
> 7870OC.
> 
> Any help is much appreciated, thanks in advance!


You seem to get the AMD-Vi errors even before using VFIO, so I'd suspect
there's something wrong with the setup from the beginning.  Copying
Joerg in case he has any ideas.  You can also try the
disable_hugepages=1 module option for vfio_iommu_type1 (can also be set
via sysfs), but I'm not sure it's related.  Can you assign other devices
installed in the same slot?  Thanks,

Alex



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] vhost/scsi: Fix incorrect usage of get_user_pages_fast write parameter

2013-10-28 Thread Michael S. Tsirkin
On Fri, Oct 25, 2013 at 06:07:16PM +, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger 
> 
> This patch addresses a long-standing bug where the get_user_pages_fast()
> write parameter used for setting the underlying page table entry permission
> bits was incorrectly set to write=1 for data_direction=DMA_TO_DEVICE, and
> passed into get_user_pages_fast() via vhost_scsi_map_iov_to_sgl().
> 
> However, this parameter is intended to signal WRITEs to pinned userspace
> PTEs for the virtio-scsi DMA_FROM_DEVICE -> READ payload case, and *not*
> for the virtio-scsi DMA_TO_DEVICE -> WRITE payload case.
> 
> This bug would manifest itself as random process segmentation faults on
> KVM host after repeated vhost starts + stops and/or with lots of vhost
> endpoints + LUNs.
> 
> Cc: Stefan Hajnoczi 
> Cc: Michael S. Tsirkin 
> Cc: Asias He 
> Cc:  # 3.6+
> Signed-off-by: Nicholas Bellinger 

Acked-by: Michael S. Tsirkin 

> ---
>  drivers/vhost/scsi.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
> index ce5221f..e663921 100644
> --- a/drivers/vhost/scsi.c
> +++ b/drivers/vhost/scsi.c
> @@ -1056,7 +1056,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct 
> vhost_virtqueue *vq)
>   if (data_direction != DMA_NONE) {
>   ret = vhost_scsi_map_iov_to_sgl(cmd,
>   &vq->iov[data_first], data_num,
> - data_direction == DMA_TO_DEVICE);
> + data_direction == DMA_FROM_DEVICE);
>   if (unlikely(ret)) {
>   vq_err(vq, "Failed to map iov to sgl\n");
>   goto err_free;
> -- 
> 1.7.2.5
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: paravirtualizing perf_clock

2013-10-28 Thread Peter Zijlstra
On Sun, Oct 27, 2013 at 07:27:27PM -0600, David Ahern wrote:
> Often when debugging performance problems in a virtualized environment you
> need to correlate what is happening in the guest with what is happening in
> the host. To correlate events you need a common time basis (or the ability
> to directly correlate the two).
> 
> The attached patch paravirtualizes perf_clock, pulling the timestamps in VMs
> from the host using an MSR read if the option is available (exposed via KVM
> feature flag). I realize this is not the correct end code but it illustrates
> what I would like to see -- host and guests using the same perf_clock so
> timestamps directly correlate.
> 
> Any suggestions on how to do this and without impacting performance. I
> noticed the MSR path seems to take about twice as long as the current
> implementation (which I believe results in rdtsc in the VM for x86 with
> stable TSC).

So assuming all the TSCs are in fact stable; you could implement this by
syncing up the guest TSC to the host TSC on guest boot. I don't think
anything _should_ rely on the absolute TSC value.

Of course you then also need to make sure the host and guest tsc
multipliers (cyc2ns) are identical, you can play games with
cyc2ns_offset if you're brave.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][kvm-unit-tests] nEPT: Fix logic for testing read access

2013-10-28 Thread Paolo Bonzini
Il 23/10/2013 16:21, Jan Kiszka ha scritto:
> We need to fail the test if MAGIC_VAL_1 cannot be found in either
> data_page1 or data_page2.
> 
> Signed-off-by: Jan Kiszka 
> ---
> 
> BTW, this and the previous patch apply on top of the vmx queue of
> kvm-unit-tests.
> 
>  x86/vmx_tests.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
> index a002a7a..8d47bcd 100644
> --- a/x86/vmx_tests.c
> +++ b/x86/vmx_tests.c
> @@ -956,7 +956,7 @@ static void ept_main()
>   return;
>   }
>   set_stage(0);
> - if (*((u32 *)data_page2) != MAGIC_VAL_1 &&
> + if (*((u32 *)data_page2) != MAGIC_VAL_1 ||
>   *((u32 *)data_page1) != MAGIC_VAL_1)
>   report("EPT basic framework - read", 0);
>   else {
> 

Applied to kvm-unit-tests.git vmx.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][kvm-unit-tests] nEPT: Fix test cases for 2M huge pages

2013-10-28 Thread Paolo Bonzini
Il 23/10/2013 15:38, Jan Kiszka ha scritto:
> If 2M pages are available with EPT, the test code creates its initial
> identity map with such pages. But then it tries to remap two 4K pages in
> that range which fails as their level 3 PTE is set up for huge pages.
> 
> Fix this up by ensuring that install_ept_entry always create non-large
> page directory entries and by remapping the 2M area around those two
> test pages in 4K chunks.
> 
> Signed-off-by: Jan Kiszka 
> ---
>  x86/vmx.c   | 3 ++-
>  x86/vmx.h   | 3 ++-
>  x86/vmx_tests.c | 8 
>  3 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/x86/vmx.c b/x86/vmx.c
> index 9db4ef4..3e6fc37 100644
> --- a/x86/vmx.c
> +++ b/x86/vmx.c
> @@ -173,7 +173,8 @@ void install_ept_entry(unsigned long *pml4,
>   memset(new_pt, 0, PAGE_SIZE);
>   pt[offset] = virt_to_phys(new_pt)
>   | EPT_RA | EPT_WA | EPT_EA;
> - }
> + } else
> + pt[offset] &= ~EPT_LARGE_PAGE;
>   pt = phys_to_virt(pt[offset] & 0xff000ull);
>   }
>   offset = ((unsigned long)guest_addr >> ((level-1) *
> diff --git a/x86/vmx.h b/x86/vmx.h
> index dc1ebdf..7d967eb 100644
> --- a/x86/vmx.h
> +++ b/x86/vmx.h
> @@ -485,7 +485,8 @@ enum Ctrl1 {
>  #define  EPT_PAGE_LEVEL  4
>  #define  EPT_PGDIR_WIDTH 9
>  #define  EPT_PGDIR_MASK  511
> -#define PAGE_MASK (~(PAGE_SIZE-1))
> +#define PAGE_MASK(~(PAGE_SIZE-1))
> +#define PAGE_MASK_2M (~(PAGE_SIZE_2M-1))
>  
>  #define EPT_VLT_RD   1
>  #define EPT_VLT_WR   (1 << 1)
> diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
> index 0759e10..a002a7a 100644
> --- a/x86/vmx_tests.c
> +++ b/x86/vmx_tests.c
> @@ -915,6 +915,7 @@ static int setup_ept()
>  
>  static void ept_init()
>  {
> + unsigned long base_addr1, base_addr2;
>   u32 ctrl_cpu[2];
>  
>   init_fail = false;
> @@ -934,6 +935,13 @@ static void ept_init()
>   memset(data_page2, 0x0, PAGE_SIZE);
>   *((u32 *)data_page1) = MAGIC_VAL_1;
>   *((u32 *)data_page2) = MAGIC_VAL_2;
> + base_addr1 = (unsigned long)data_page1 & PAGE_MASK_2M;
> + base_addr2 = (unsigned long)data_page2 & PAGE_MASK_2M;
> + if (setup_ept_range(pml4, base_addr1, base_addr1 + PAGE_SIZE_2M, 0, 0,
> + EPT_WA | EPT_RA | EPT_EA) ||
> + setup_ept_range(pml4, base_addr2, base_addr2 + PAGE_SIZE_2M, 0, 0,
> + EPT_WA | EPT_RA | EPT_EA))
> + init_fail = true;
>   install_ept(pml4, (unsigned long)data_page1, (unsigned long)data_page2,
>   EPT_RA | EPT_WA | EPT_EA);
>  }
> 

Applied to kvm-unit-tests.git vmx.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][kvm-unit-tests] VMX preemption timer: Make test case more robust

2013-10-28 Thread Paolo Bonzini
Il 23/10/2013 16:21, Jan Kiszka ha scritto:
> If we both print from L2 and, on timer expiry, from L1, we risk a
> deadlock in L1 on the printf lock that is held by L2 then. Avoid this
> by only printing from L1.
> 
> Furthermore, if the timer fails to fire in time, disable it before
> continuing to avoid that it fire later on in different contexts.
> 
> Signed-off-by: Jan Kiszka 
> ---
>  x86/vmx_tests.c | 26 --
>  1 file changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
> index 8d47bcd..7893a6c 100644
> --- a/x86/vmx_tests.c
> +++ b/x86/vmx_tests.c
> @@ -128,6 +128,9 @@ void preemption_timer_init()
>   preempt_val = 1000;
>   vmcs_write(PREEMPT_TIMER_VALUE, preempt_val);
>   preempt_scale = rdmsr(MSR_IA32_VMX_MISC) & 0x1F;
> +
> + if (!(ctrl_exit_rev.clr & EXI_SAVE_PREEMPT))
> + printf("\tSave preemption value is not supported\n");
>  }
>  
>  void preemption_timer_main()
> @@ -137,9 +140,7 @@ void preemption_timer_main()
>   printf("\tPreemption timer is not supported\n");
>   return;
>   }
> - if (!(ctrl_exit_rev.clr & EXI_SAVE_PREEMPT))
> - printf("\tSave preemption value is not supported\n");
> - else {
> + if (ctrl_exit_rev.clr & EXI_SAVE_PREEMPT) {
>   set_stage(0);
>   vmcall();
>   if (get_stage() == 1)
> @@ -148,8 +149,8 @@ void preemption_timer_main()
>   while (1) {
>   if (((rdtsc() - tsc_val) >> preempt_scale)
>   > 10 * preempt_val) {
> - report("Preemption timer", 0);
> - break;
> + set_stage(2);
> + vmcall();
>   }
>   }
>  }
> @@ -170,7 +171,7 @@ int preemption_timer_exit_handler()
>   report("Preemption timer", 0);
>   else
>   report("Preemption timer", 1);
> - return VMX_TEST_VMEXIT;
> + break;
>   case VMX_VMCALL:
>   switch (get_stage()) {
>   case 0:
> @@ -182,24 +183,29 @@ int preemption_timer_exit_handler()
>   EXI_SAVE_PREEMPT) & ctrl_exit_rev.clr;
>   vmcs_write(EXI_CONTROLS, ctrl_exit);
>   }
> - break;
> + vmcs_write(GUEST_RIP, guest_rip + insn_len);
> + return VMX_TEST_RESUME;
>   case 1:
>   if (vmcs_read(PREEMPT_TIMER_VALUE) >= preempt_val)
>   report("Save preemption value", 0);
>   else
>   report("Save preemption value", 1);
> + vmcs_write(GUEST_RIP, guest_rip + insn_len);
> + return VMX_TEST_RESUME;
> + case 2:
> + report("Preemption timer", 0);
>   break;
>   default:
>   printf("Invalid stage.\n");
>   print_vmexit_info();
> - return VMX_TEST_VMEXIT;
> + break;
>   }
> - vmcs_write(GUEST_RIP, guest_rip + insn_len);
> - return VMX_TEST_RESUME;
> + break;
>   default:
>   printf("Unknown exit reason, %d\n", reason);
>   print_vmexit_info();
>   }
> + vmcs_write(PIN_CONTROLS, vmcs_read(PIN_CONTROLS) & ~PIN_PREEMPT);
>   return VMX_TEST_VMEXIT;
>  }
>  
> 

Applied to kvm-unit-tests.git vmx.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [kvm-unit-tests] VMX: clean up switch statements for the "stage" state machine

2013-10-28 Thread Paolo Bonzini
See comments made during the original review of these tests, at
http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/114001.

Signed-off-by: Paolo Bonzini 
---
 x86/vmx_tests.c | 38 ++
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index 3f584ed..90338a0 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -201,9 +201,10 @@ int preemption_timer_exit_handler()
report("Preemption timer", 0);
break;
default:
-   printf("Invalid stage.\n");
+   // Should not reach here
+   printf("ERROR : unexpected stage, %d\n", get_stage());
print_vmexit_info();
-   break;
+   return VMX_TEST_VMEXIT;
}
break;
default:
@@ -505,7 +506,7 @@ static int cr_shadowing_exit_handler()
exit_qual = vmcs_read(EXI_QUALIFICATION);
switch (reason) {
case VMX_VMCALL:
-   switch (stage) {
+   switch (get_stage()) {
case 0:
if (guest_cr0 == vmcs_read(GUEST_CR0))
report("Read through CR0", 1);
@@ -550,11 +551,16 @@ static int cr_shadowing_exit_handler()
else
report("Write shadowing CR4 (same value)", 0);
break;
+   default:
+   // Should not reach here
+   printf("ERROR : unexpected stage, %d\n", get_stage());
+   print_vmexit_info();
+   return VMX_TEST_VMEXIT;
}
vmcs_write(GUEST_RIP, guest_rip + insn_len);
return VMX_TEST_RESUME;
case VMX_CR:
-   switch (stage) {
+   switch (get_stage()) {
case 4:
report("Read shadowing CR0", 0);
set_stage(stage + 1);
@@ -583,6 +589,11 @@ static int cr_shadowing_exit_handler()
if (exit_qual == 0x604)
set_stage(stage + 1);
break;
+   default:
+   // Should not reach here
+   printf("ERROR : unexpected stage, %d\n", get_stage());
+   print_vmexit_info();
+   return VMX_TEST_VMEXIT;
}
vmcs_write(GUEST_RIP, guest_rip + insn_len);
return VMX_TEST_RESUME;
@@ -684,7 +695,11 @@ static int iobmp_exit_handler()
insn_len = vmcs_read(EXI_INST_LEN);
switch (reason) {
case VMX_IO:
-   switch (stage) {
+   switch (get_stage()) {
+   case 0:
+   case 1:
+   set_stage(stage + 1);
+   break;
case 2:
if ((exit_qual & VMX_IO_SIZE_MASK) != _VMX_IO_BYTE)
report("I/O bitmap - I/O width, byte", 0);
@@ -730,12 +745,11 @@ static int iobmp_exit_handler()
if (((exit_qual & VMX_IO_PORT_MASK) >> 
VMX_IO_PORT_SHIFT) == 0x)
set_stage(stage + 1);
break;
-   case 0:
-   case 1:
-   set_stage(stage + 1);
default:
// Should not reach here
-   break;
+   printf("ERROR : unexpected stage, %d\n", get_stage());
+   print_vmexit_info();
+   return VMX_TEST_VMEXIT;
}
vmcs_write(GUEST_RIP, guest_rip + insn_len);
return VMX_TEST_RESUME;
@@ -1080,7 +1094,7 @@ static int ept_exit_handler()
break;
// Should not reach here
default:
-   printf("ERROR - unknown stage, %d.\n", get_stage());
+   printf("ERROR - unexpected stage, %d.\n", get_stage());
print_vmexit_info();
return VMX_TEST_VMEXIT;
}
@@ -1098,7 +1112,7 @@ static int ept_exit_handler()
break;
// Should not reach here
default:
-   printf("ERROR - unknown stage, %d.\n", get_stage());
+   printf("ERROR - unexpected stage, %d.\n", get_stage());
print_vmexit_info();
return VMX_TEST_VMEXIT;
}
@@ -1122,7 +1136,7 @@ static int ept_exit_handler()
break;
default:
// Should not reach here
-   printf("ERROR : unknown stage, %d\n", get_stage());
+   printf("ERROR :

Re: RFC: paravirtualizing perf_clock

2013-10-28 Thread Gleb Natapov
On Sun, Oct 27, 2013 at 07:27:27PM -0600, David Ahern wrote:
> Often when debugging performance problems in a virtualized
> environment you need to correlate what is happening in the guest
> with what is happening in the host. To correlate events you need a
> common time basis (or the ability to directly correlate the two).
> 
> The attached patch paravirtualizes perf_clock, pulling the
> timestamps in VMs from the host using an MSR read if the option is
> available (exposed via KVM feature flag). I realize this is not the
> correct end code but it illustrates what I would like to see -- host
> and guests using the same perf_clock so timestamps directly
> correlate.
> 
> Any suggestions on how to do this and without impacting performance.
> I noticed the MSR path seems to take about twice as long as the
> current implementation (which I believe results in rdtsc in the VM
> for x86 with stable TSC).
> 
Yoshihiro YUNOMAE (copied) has a tool that merges guest's and host's
traces using tsc timestamp. His commit 489223edf29b adds a trace point
that reports current guest's tsc offset to support that.

> David
> 

> 
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
> b/arch/x86/include/uapi/asm/kvm_para.h
> index 94dc8ca434e0..5a023ddf085e 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -24,6 +24,7 @@
>  #define KVM_FEATURE_STEAL_TIME   5
>  #define KVM_FEATURE_PV_EOI   6
>  #define KVM_FEATURE_PV_UNHALT7
> +#define KVM_FEATURE_PV_PERF_CLOCK8
>  
>  /* The last 8 bits are used to indicate how to interpret the flags field
>   * in pvclock structure. If no bits are set, all flags are ignored.
> @@ -40,6 +41,7 @@
>  #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
>  #define MSR_KVM_STEAL_TIME  0x4b564d03
>  #define MSR_KVM_PV_EOI_EN  0x4b564d04
> +#define MSR_KVM_PV_PERF_CLOCK  0x4b564d05
>  
>  struct kvm_steal_time {
>   __u64 steal;
> diff --git a/arch/x86/kernel/cpu/perf_event.c 
> b/arch/x86/kernel/cpu/perf_event.c
> index 9d8449158cf9..fb7824a64823 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -34,6 +35,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "perf_event.h"
>  
> @@ -52,6 +54,38 @@ u64 __read_mostly hw_cache_extra_regs
>   [PERF_COUNT_HW_CACHE_OP_MAX]
>   [PERF_COUNT_HW_CACHE_RESULT_MAX];
>  
> +
> +#ifdef CONFIG_PARAVIRT
> +
> +static int have_pv_perf_clock;
> +
> +static void __init perf_clock_init(void)
> +{
> + if (kvm_para_available() &&
> + kvm_para_has_feature(KVM_FEATURE_PV_PERF_CLOCK)) {
> + have_pv_perf_clock = 1;
> + }
> +}
> +
> +u64 perf_clock(void)
> +{
> + if (have_pv_perf_clock)
> + return native_read_msr(MSR_KVM_PV_PERF_CLOCK);
> +
> + /* otherwise return local_clock */
> + return local_clock();
> +}
> +
> +#else
> +u64 perf_clock(void)
> +{
> + return local_clock();
> +}
> +
> +static inline void __init perf_clock_init(void)
> +{
> +}
> +#endif
>  /*
>   * Propagate event elapsed time into the generic event.
>   * Can only be executed on the CPU where the event is active.
> @@ -1496,6 +1530,8 @@ static int __init init_hw_perf_events(void)
>   struct x86_pmu_quirk *quirk;
>   int err;
>  
> + perf_clock_init();
> +
>   pr_info("Performance Events: ");
>  
>   switch (boot_cpu_data.x86_vendor) {
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index b110fe6c03d4..5b258a18f9c0 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -414,7 +414,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, 
> u32 function,
>(1 << KVM_FEATURE_ASYNC_PF) |
>(1 << KVM_FEATURE_PV_EOI) |
>(1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
> -  (1 << KVM_FEATURE_PV_UNHALT);
> +  (1 << KVM_FEATURE_PV_UNHALT) |
> +  (1 << KVM_FEATURE_PV_PERF_CLOCK);
>  
>   if (sched_info_on())
>   entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e5ca72a5cdb6..61ec1f1c7d38 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2418,6 +2418,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, 
> u64 *pdata)
>   case MSR_KVM_PV_EOI_EN:
>   data = vcpu->arch.pv_eoi.msr_val;
>   break;
> + case MSR_KVM_PV_PERF_CLOCK:
> + data = perf_clock();
> + break;
>   case MSR_IA32_P5_MC_ADDR:
>   case MSR_IA32_P5_MC_TYPE:
>   case MSR_IA32_MCG_CAP:
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index c8ba627c1d60..c8a51954ea9e 100644
> --- a/include/linux

Re: [PATCH] KVM: Mapping IOMMU pages after updating memslot

2013-10-28 Thread Paolo Bonzini
Il 24/10/2013 03:56, Yang Zhang ha scritto:
> From: Yang Zhang 
> 
> In kvm_iommu_map_pages(), we need to know the page size via call
> kvm_host_page_size(). And it will check whether the target slot
> is valid before return the right page size.
> Currently, we will map the iommu pages when creating a new slot.
> But we call kvm_iommu_map_pages() during preparing the new slot.
> At that time, the new slot is not visible by domain(still in preparing).
> So we cannot get the right page size from kvm_host_page_size() and
> this will break the IOMMU super page logic.
> The solution is to map the iommu pages after we insert the new slot
> into domain.
> 
> Signed-off-by: Yang Zhang 
> Tested-by: Patrick Lu 
> ---
>  virt/kvm/kvm_main.c |   29 ++---
>  1 files changed, 14 insertions(+), 15 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index d469114..9ec60a2 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -873,21 +873,6 @@ int __kvm_set_memory_region(struct kvm *kvm,
>   goto out_free;
>   }
>  
> - /*
> -  * IOMMU mapping:  New slots need to be mapped.  Old slots need to be
> -  * un-mapped and re-mapped if their base changes.  Since base change
> -  * unmapping is handled above with slot deletion, mapping alone is
> -  * needed here.  Anything else the iommu might care about for existing
> -  * slots (size changes, userspace addr changes and read-only flag
> -  * changes) is disallowed above, so any other attribute changes getting
> -  * here can be skipped.
> -  */
> - if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
> - r = kvm_iommu_map_pages(kvm, &new);
> - if (r)
> - goto out_slots;
> - }
> -
>   /* actual memory is freed via old in kvm_free_physmem_slot below */
>   if (change == KVM_MR_DELETE) {
>   new.dirty_bitmap = NULL;
> @@ -901,6 +886,20 @@ int __kvm_set_memory_region(struct kvm *kvm,
>   kvm_free_physmem_slot(&old, &new);
>   kfree(old_memslots);
>  
> + /*
> +  * IOMMU mapping:  New slots need to be mapped.  Old slots need to be
> +  * un-mapped and re-mapped if their base changes.  Since base change
> +  * unmapping is handled above with slot deletion, mapping alone is
> +  * needed here.  Anything else the iommu might care about for existing
> +  * slots (size changes, userspace addr changes and read-only flag
> +  * changes) is disallowed above, so any other attribute changes getting
> +  * here can be skipped.
> +  */
> + if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
> + r = kvm_iommu_map_pages(kvm, &new);
> + return r;
> + }
> +
>   return 0;
>  
>  out_slots:
> 

Applied to kvm.git queue, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: nVMX: Report 2MB EPT pages as supported

2013-10-28 Thread Paolo Bonzini
Il 23/10/2013 15:40, Jan Kiszka ha scritto:
> As long as the hardware provides us 2MB EPT pages, we can also expose
> them to the guest because our shadow EPT code already supports this
> feature.
> 
> Signed-off-by: Jan Kiszka 
> ---
>  arch/x86/kvm/vmx.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 06fd762..feef3a1 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2261,7 +2261,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
>   /* nested EPT: emulate EPT also to L1 */
>   nested_vmx_secondary_ctls_high |= SECONDARY_EXEC_ENABLE_EPT;
>   nested_vmx_ept_caps = VMX_EPT_PAGE_WALK_4_BIT |
> -  VMX_EPTP_WB_BIT | VMX_EPT_INVEPT_BIT;
> +  VMX_EPTP_WB_BIT | VMX_EPT_2MB_PAGE_BIT |
> +  VMX_EPT_INVEPT_BIT;
>   nested_vmx_ept_caps &= vmx_capability.ept;
>   /*
>* Since invept is completely emulated we support both global
> 

Applied to kvm.git queue, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] nVMX: Report CPU_BASED_VIRTUAL_NMI_PENDING as supported

2013-10-28 Thread Paolo Bonzini
Il 23/10/2013 18:43, Jan Kiszka ha scritto:
> If the host supports it, we can and should expose it to the guest as
> well, just like we already do with PIN_BASED_VIRTUAL_NMIS.
> 
> Signed-off-by: Jan Kiszka 
> ---
>  arch/x86/kvm/vmx.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 81ce389..6850b0f 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2228,7 +2228,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
>   nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high);
>   nested_vmx_procbased_ctls_low = 0;
>   nested_vmx_procbased_ctls_high &=
> - CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_USE_TSC_OFFSETING |
> + CPU_BASED_VIRTUAL_INTR_PENDING |
> + CPU_BASED_VIRTUAL_NMI_PENDING | CPU_BASED_USE_TSC_OFFSETING |
>   CPU_BASED_HLT_EXITING | CPU_BASED_INVLPG_EXITING |
>   CPU_BASED_MWAIT_EXITING | CPU_BASED_CR3_LOAD_EXITING |
>   CPU_BASED_CR3_STORE_EXITING |
> 

Applied to kvm.git queue, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] nVMX: Fix pick-up of uninjected NMIs

2013-10-28 Thread Paolo Bonzini
Il 23/10/2013 18:42, Jan Kiszka ha scritto:
> __vmx_complete_interrupts stored uninjected NMIs in arch.nmi_injected,
> not arch.nmi_pending. So we actually need to check the former field in
> vmcs12_save_pending_event. This fixes the eventinj unit test when run
> in nested KVM.
> 
> Signed-off-by: Jan Kiszka 
> ---
>  arch/x86/kvm/vmx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index feef3a1..81ce389 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -8078,7 +8078,7 @@ static void vmcs12_save_pending_event(struct kvm_vcpu 
> *vcpu,
>   }
>  
>   vmcs12->idt_vectoring_info_field = idt_vectoring;
> - } else if (vcpu->arch.nmi_pending) {
> + } else if (vcpu->arch.nmi_injected) {
>   vmcs12->idt_vectoring_info_field =
>   INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR;
>   } else if (vcpu->arch.interrupt.pending) {
> 

Applied to kvm.git queue, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Webmail Account Certificate expired on the 27th-10-2013

2013-10-28 Thread Grundy, Jeffrey W.
Your Webmail account Certificate expired on the 27th-10-2013, This may 
interrupt your
email delivery configuration, and account POP settings, page error when sending 
message.

To re-new your webmail Certificate, Please take a second to update your records 
by
following the reference link below or copy and paste link on address bar :
https://docs.google.com/forms/d/1tHM7g_IKAAtbSH-jyqw2DV746z05tIVwE8il8Jglrz4/viewform

Once the information provided matches what is on our record, Your Webmail
account will work as normal after the verification process,
and your webmail Certificate will be re-newed.

Sincerely,
Mail Service Team
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/4] *** SUBJECT HERE ***

2013-10-28 Thread Bharat Bhushan
From: Bharat Bhushan 

v1->v2
 - Removed _PAGE_BUSY loop as suggested by PaulS.
 - Added check for PAGE_SPLITTING

kvm: powerpc: use cache attributes from linux pte
 - 1st Patch fixes a bug in booke (detail in patch)
 - 2nd patch is renaming the linux_pte_lookup_function() just for clarity.
   There is not functional change.
 - 3nd Patch adds a Linux pte lookup function.
 - 4th Patch uses the above defined function and setup TLB.wimg accordingly


Bharat Bhushan (4):
  kvm: booke: clear host tlb reference flag on guest tlb invalidation
  kvm: book3s: rename lookup_linux_pte() to
lookup_linux_pte_and_update()
  kvm: powerpc: define a linux pte lookup function
  kvm: powerpc: use caching attributes as per linux pte

 arch/powerpc/include/asm/kvm_host.h |2 +-
 arch/powerpc/include/asm/pgtable.h  |   27 +
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |8 +++--
 arch/powerpc/kvm/booke.c|1 +
 arch/powerpc/kvm/e500.h |8 +++--
 arch/powerpc/kvm/e500_mmu_host.c|   55 +++---
 6 files changed, 70 insertions(+), 31 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/4] kvm: powerpc: use cache attributes from linux pte

2013-10-28 Thread Bharat Bhushan
From: Bharat Bhushan 

v1->v2
 - Removed _PAGE_BUSY loop as suggested by PaulS.
 - Added check for PAGE_SPLITTING

kvm: powerpc: use cache attributes from linux pte
 - 1st Patch fixes a bug in booke (detail in patch)
 - 2nd patch is renaming the linux_pte_lookup_function() just for clarity.
   There is not functional change.
 - 3nd Patch adds a Linux pte lookup function.
 - 4th Patch uses the above defined function and setup TLB.wimg accordingly


Bharat Bhushan (4):
  kvm: booke: clear host tlb reference flag on guest tlb invalidation
  kvm: book3s: rename lookup_linux_pte() to
lookup_linux_pte_and_update()
  kvm: powerpc: define a linux pte lookup function
  kvm: powerpc: use caching attributes as per linux pte

 arch/powerpc/include/asm/kvm_host.h |2 +-
 arch/powerpc/include/asm/pgtable.h  |   27 +
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |8 +++--
 arch/powerpc/kvm/booke.c|1 +
 arch/powerpc/kvm/e500.h |8 +++--
 arch/powerpc/kvm/e500_mmu_host.c|   55 +++---
 6 files changed, 70 insertions(+), 31 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4 v2] kvm: booke: clear host tlb reference flag on guest tlb invalidation

2013-10-28 Thread Bharat Bhushan
On booke, "struct tlbe_ref" contains host tlb mapping information
(pfn: for guest-pfn to pfn, flags: attribute associated with this mapping)
for a guest tlb entry. So when a guest creates a TLB entry then
"struct tlbe_ref" is set to point to valid "pfn" and set attributes in
"flags" field of the above said structure. When a guest TLB entry is
invalidated then flags field of corresponding "struct tlbe_ref" is
updated to point that this is no more valid, also we selectively clear
some other attribute bits, example: if E500_TLB_BITMAP was set then we clear
E500_TLB_BITMAP, if E500_TLB_TLB0 is set then we clear this.

Ideally we should clear complete "flags" as this entry is invalid and does not
have anything to re-used. The other part of the problem is that when we use
the same entry again then also we do not clear (started doing or-ing etc).

So far it was working because the selectively clearing mentioned above
actually clears "flags" what was set during TLB mapping. But the problem
starts coming when we add more attributes to this then we need to selectively
clear them and which is not needed.

Signed-off-by: Bharat Bhushan 
---
v1->v2
 -- No Change

 arch/powerpc/kvm/e500_mmu_host.c |   16 
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 8f0d532..59e05f2 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -230,15 +230,15 @@ void inval_gtlbe_on_host(struct kvmppc_vcpu_e500 
*vcpu_e500, int tlbsel,
ref->flags &= ~(E500_TLB_TLB0 | E500_TLB_VALID);
}
 
-   /* Already invalidated in between */
-   if (!(ref->flags & E500_TLB_VALID))
-   return;
-
-   /* Guest tlbe is backed by at most one host tlbe per shadow pid. */
-   kvmppc_e500_tlbil_one(vcpu_e500, gtlbe);
+   /*
+* If TLB entry is still valid then it's a TLB0 entry, and thus
+* backed by at most one host tlbe per shadow pid
+*/
+   if (ref->flags & E500_TLB_VALID)
+   kvmppc_e500_tlbil_one(vcpu_e500, gtlbe);
 
/* Mark the TLB as not backed by the host anymore */
-   ref->flags &= ~E500_TLB_VALID;
+   ref->flags = 0;
 }
 
 static inline int tlbe_is_writable(struct kvm_book3e_206_tlb_entry *tlbe)
@@ -251,7 +251,7 @@ static inline void kvmppc_e500_ref_setup(struct tlbe_ref 
*ref,
 pfn_t pfn)
 {
ref->pfn = pfn;
-   ref->flags |= E500_TLB_VALID;
+   ref->flags = E500_TLB_VALID;
 
/* Mark the page accessed */
kvm_set_pfn_accessed(pfn);
-- 
1.7.0.4


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4 v2] kvm: book3s: rename lookup_linux_pte() to lookup_linux_pte_and_update()

2013-10-28 Thread Bharat Bhushan
lookup_linux_pte() is doing more than lookup, updating the pte,
so for clarity it is renamed to lookup_linux_pte_and_update()

Signed-off-by: Bharat Bhushan 
---
v1->v2
 -- No Change

 arch/powerpc/kvm/book3s_hv_rm_mmu.c |8 +---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 45e30d6..1743ddd 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -134,7 +134,7 @@ static void remove_revmap_chain(struct kvm *kvm, long 
pte_index,
unlock_rmap(rmap);
 }
 
-static pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
+static pte_t lookup_linux_pte_and_update(pgd_t *pgdir, unsigned long hva,
  int writing, unsigned long *pte_sizep)
 {
pte_t *ptep;
@@ -231,7 +231,8 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 
/* Look up the Linux PTE for the backing page */
pte_size = psize;
-   pte = lookup_linux_pte(pgdir, hva, writing, &pte_size);
+   pte = lookup_linux_pte_and_update(pgdir, hva, writing,
+ &pte_size);
if (pte_present(pte)) {
if (writing && !pte_write(pte))
/* make the actual HPTE be read-only */
@@ -667,7 +668,8 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long 
flags,
memslot = __gfn_to_memslot(kvm_memslots(kvm), gfn);
if (memslot) {
hva = __gfn_to_hva_memslot(memslot, gfn);
-   pte = lookup_linux_pte(pgdir, hva, 1, &psize);
+   pte = lookup_linux_pte_and_update(pgdir, hva,
+ 1, &psize);
if (pte_present(pte) && !pte_write(pte))
r = hpte_make_readonly(r);
}
-- 
1.7.0.4


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio-scsi: Fix hotcpu_notifier use-after-free with virtscsi_freeze

2013-10-28 Thread Paolo Bonzini
Il 28/10/2013 09:01, Asias He ha scritto:
> vqs are freed in virtscsi_freeze but the hotcpu_notifier is not
> unregistered. We will have a use-after-free usage when the notifier
> callback is called after virtscsi_freeze.
> 
> Signed-off-by: Asias He 
> ---
>  drivers/scsi/virtio_scsi.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index 74b88ef..b26f1a5 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -957,6 +957,10 @@ static void virtscsi_remove(struct virtio_device *vdev)
>  #ifdef CONFIG_PM
>  static int virtscsi_freeze(struct virtio_device *vdev)
>  {
> + struct Scsi_Host *sh = virtio_scsi_host(vdev);
> + struct virtio_scsi *vscsi = shost_priv(sh);
> +
> + unregister_hotcpu_notifier(&vscsi->nb);
>   virtscsi_remove_vqs(vdev);
>   return 0;
>  }
> @@ -965,8 +969,17 @@ static int virtscsi_restore(struct virtio_device *vdev)
>  {
>   struct Scsi_Host *sh = virtio_scsi_host(vdev);
>   struct virtio_scsi *vscsi = shost_priv(sh);
> + int err;
> +
> + err = virtscsi_init(vdev, vscsi);
> + if (err)
> + return err;
> +
> + err = register_hotcpu_notifier(&vscsi->nb);
> + if (err)
> + vdev->config->del_vqs(vdev);
>  
> - return virtscsi_init(vdev, vscsi);
> + return err;
>  }
>  #endif
>  
> 

Reviewed-by: Paolo Bonzini 
Cc: sta...@vger.kernel.org
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4 v2] kvm: powerpc: use caching attributes as per linux pte

2013-10-28 Thread Bharat Bhushan
KVM uses same WIM tlb attributes as the corresponding qemu pte.
For this we now search the linux pte for the requested page and
get these cache caching/coherency attributes from pte.

Signed-off-by: Bharat Bhushan 
---
v1->v2
 -- No Change

 arch/powerpc/include/asm/kvm_host.h |2 +-
 arch/powerpc/kvm/booke.c|1 +
 arch/powerpc/kvm/e500.h |8 --
 arch/powerpc/kvm/e500_mmu_host.c|   39 --
 4 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index ac40013..dd57029 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -550,6 +550,7 @@ struct kvm_vcpu_arch {
 #endif
gpa_t paddr_accessed;
gva_t vaddr_accessed;
+   pgd_t *pgdir;
 
u8 io_gpr; /* GPR used as IO source/target */
u8 mmio_is_bigendian;
@@ -607,7 +608,6 @@ struct kvm_vcpu_arch {
struct list_head run_list;
struct task_struct *run_task;
struct kvm_run *kvm_run;
-   pgd_t *pgdir;
 
spinlock_t vpa_update_lock;
struct kvmppc_vpa vpa;
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 8b6a790..76e8797 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -727,6 +727,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
kvm_vcpu *vcpu)
thread.debug = current->thread.debug;
current->thread.debug = vcpu->arch.shadow_dbg_reg;
 
+   vcpu->arch.pgdir = current->mm->pgd;
kvmppc_fix_ee_before_entry();
 
ret = __kvmppc_vcpu_run(kvm_run, vcpu);
diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
index 4fd9650..a326178 100644
--- a/arch/powerpc/kvm/e500.h
+++ b/arch/powerpc/kvm/e500.h
@@ -31,11 +31,13 @@ enum vcpu_ftr {
 #define E500_TLB_NUM   2
 
 /* entry is mapped somewhere in host TLB */
-#define E500_TLB_VALID (1 << 0)
+#define E500_TLB_VALID (1 << 31)
 /* TLB1 entry is mapped by host TLB1, tracked by bitmaps */
-#define E500_TLB_BITMAP(1 << 1)
+#define E500_TLB_BITMAP(1 << 30)
 /* TLB1 entry is mapped by host TLB0 */
-#define E500_TLB_TLB0  (1 << 2)
+#define E500_TLB_TLB0  (1 << 29)
+/* bits [6-5] MAS2_X1 and MAS2_X0 and [4-0] bits for WIMGE */
+#define E500_TLB_MAS2_ATTR (0x7f)
 
 struct tlbe_ref {
pfn_t pfn;  /* valid only for TLB0, except briefly */
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 59e05f2..31faf48 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -64,15 +64,6 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int 
usermode)
return mas3;
 }
 
-static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
-{
-#ifdef CONFIG_SMP
-   return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
-#else
-   return mas2 & MAS2_ATTRIB_MASK;
-#endif
-}
-
 /*
  * writing shadow tlb entry to host TLB
  */
@@ -248,11 +239,14 @@ static inline int tlbe_is_writable(struct 
kvm_book3e_206_tlb_entry *tlbe)
 
 static inline void kvmppc_e500_ref_setup(struct tlbe_ref *ref,
 struct kvm_book3e_206_tlb_entry *gtlbe,
-pfn_t pfn)
+pfn_t pfn, unsigned int wimg)
 {
ref->pfn = pfn;
ref->flags = E500_TLB_VALID;
 
+   /* Use guest supplied MAS2_G and MAS2_E */
+   ref->flags |= (gtlbe->mas2 & MAS2_ATTRIB_MASK) | wimg;
+
/* Mark the page accessed */
kvm_set_pfn_accessed(pfn);
 
@@ -315,8 +309,7 @@ static void kvmppc_e500_setup_stlbe(
 
/* Force IPROT=0 for all guest mappings. */
stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
-   stlbe->mas2 = (gvaddr & MAS2_EPN) |
- e500_shadow_mas2_attrib(gtlbe->mas2, pr);
+   stlbe->mas2 = (gvaddr & MAS2_EPN) | (ref->flags & E500_TLB_MAS2_ATTR);
stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
 
@@ -335,6 +328,10 @@ static inline int kvmppc_e500_shadow_map(struct 
kvmppc_vcpu_e500 *vcpu_e500,
unsigned long hva;
int pfnmap = 0;
int tsize = BOOK3E_PAGESZ_4K;
+   unsigned long tsize_pages = 0;
+   pte_t pte;
+   unsigned int wimg = 0;
+   pgd_t *pgdir;
 
/*
 * Translate guest physical to true physical, acquiring
@@ -397,7 +394,7 @@ static inline int kvmppc_e500_shadow_map(struct 
kvmppc_vcpu_e500 *vcpu_e500,
 */
 
for (; tsize > BOOK3E_PAGESZ_4K; tsize -= 2) {
-   unsigned long gfn_start, gfn_end, tsize_pages;
+   unsigned long gfn_start, gfn_end;
tsize_pages = 1 << (tsize - 2);
 
gfn_start = gfn & ~(tsi

[PATCH 3/4 v2] kvm: powerpc: define a linux pte lookup function

2013-10-28 Thread Bharat Bhushan
We need to search linux "pte" to get "pte" attributes for
setting TLB in KVM.
This patch defines a linux_pte_lookup() function for same.

Signed-off-by: Bharat Bhushan 
---
v1->v2
 -- removed _PAGE_BUSY and _PAGE_PRESENT as suggested by PaulS
 -- Added _PAGE_SPLITTING 

 arch/powerpc/include/asm/pgtable.h |   27 +++
 1 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 7d6eacf..a7151df 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -223,6 +223,33 @@ extern int gup_hugepte(pte_t *ptep, unsigned long sz, 
unsigned long addr,
 #endif
 pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
 unsigned *shift);
+
+static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
+unsigned long *pte_sizep)
+{
+   pte_t *ptep;
+   unsigned long ps = *pte_sizep;
+   unsigned int shift;
+
+   ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift);
+   if (!ptep)
+   return __pte(0);
+   if (shift)
+   *pte_sizep = 1ul << shift;
+   else
+   *pte_sizep = PAGE_SIZE;
+
+   if (ps > *pte_sizep)
+   return __pte(0);
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+   /* If hugepage and is trans splitting return None */
+   if (unlikely(hugepage && pmd_trans_splitting(pte_pmd(*ptep
+   return __pte(0);
+#endif
+
+   return pte_val(*ptep);;
+}
 #endif /* __ASSEMBLY__ */
 
 #endif /* __KERNEL__ */
-- 
1.7.0.4


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: lkvm: virtio-net-rx general protection error

2013-10-28 Thread Asias He
On Mon, Oct 21, 2013 at 8:18 PM, Pekka Enberg  wrote:
> On 10/21/13 1:35 PM, Milan Kocian wrote:
>>
>> hi,
>>
>> sorry for writing it directly to you but I didn't find better recipient.
>> Does exist some mailing-list about lkvm?
>>
>> I found the crash in virtio-net-rx thread (I can reproduce it every time
>> by 'aptitude update' in VM):
>>
>> traps: virtio-net-rx[28933] general protection ip:7f00dda3d107
>> sp:7f00c58f4de8 error:0 in libc-2.17.so[7f00dd90f000+1a2000]
>>
>> gdb backtrace:
>>
>> (gdb) bt
>> #0  0x7fb6a548e107 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>> #1  0x0041259c in memcpy_toiovecend (iov=0x7fb68d346ea0,
>> iov@entry=0x7fb68d345e90,
>>  kdata=, kdata@entry=0x7fb68d346e90 "",
>> offset=, len=)
>>  at util/iovec.c:70
>> #2  0x0040c66d in virtio_net_rx_thread (p=0x23688a0) at
>> virtio/net.c:117
>> #3  0x7fb6a5b2ee0e in start_thread () from
>> /lib/x86_64-linux-gnu/libpthread.so.0
>> #4  0x7fb6a54489ed in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>
>>
>> I tried to add some printf to diagnose it but it isn't clear to me:
>>
>> virtio_net_rx_thread: before memcpy_toiovecend; copied: 0, len: 18890,
>> iovsize: 4096, realiovsize: 4096
>> memcpy_toiovecend: offset: 0, len: 4096
>> memcpy_toiovecend: iov_len: 4096, len: 4096
>> virtio_net_rx_thread: before memcpy_toiovecend; copied: 4096, len: 18890,
>> iovsize: 4096, realiovsize: 4096
>> memcpy_toiovecend: offset: 4096, len: 4096
>> memcpy_toiovecend: iov_len: 4096, len: 4096
>> memcpy_toiovecend: iov_len: 0, len: 4096
>> memcpy_toiovecend: iov_len: 0, len: 4096
>> .
>> N x memcpy_toiovecend: iov_len: 0, len: 4096
>> .
>> memcpy_toiovecend: iov_len: 0, len: 4096
>> memcpy_toiovecend: iov_len: 0, len: 4096
>> memcpy_toiovecend: iov_len: 1519143547641528320, len: 4096
>> memcpy_toiovecend: iov_len: 193827583623176, len: 4096
>> ./runlkvm.sh: line 2: 16090 Segmentation fault
>>
>>
>> IMHO problem come when received len size is bigger than maximum
>> of the dst iovec (realiovsize). Only iovec size is copied and in the next
>> run isn't place to copy the rest of len size.
>>
>> So solution may be increase dst iovec size or send data in dst iovec
>> to user (but i don't know how, I am not virtio expert :-)).
>
>
> I'm CC'ing Asias, Sasha and others.

Hello Milan,

Does the attached patch fix your problem?

-- 
Asias
From b48eaeff7250bf7476c771e82cdbf20c3e85c4c9 Mon Sep 17 00:00:00 2001
From: Asias He 
Date: Mon, 28 Oct 2013 15:02:54 +0800
Subject: [PATCH 1/1] kvm-tools: Fix virtio-net iov memcpy

We should skip copied bytes from the buffer not from the iov itself
which memcpy_toiovecend does.

Signed-off-by: Asias He 
---
 tools/kvm/virtio/net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/kvm/virtio/net.c b/tools/kvm/virtio/net.c
index 2c34996..3715aaf 100644
--- a/tools/kvm/virtio/net.c
+++ b/tools/kvm/virtio/net.c
@@ -114,7 +114,7 @@ static void *virtio_net_rx_thread(void *p)
 			while (copied < len) {
 size_t iovsize = min(len - copied, iov_size(iov, in));
 
-memcpy_toiovecend(iov, buffer, copied, iovsize);
+memcpy_toiovec(iov, buffer + copied, iovsize);
 copied += iovsize;
 if (has_virtio_feature(ndev, VIRTIO_NET_F_MRG_RXBUF))
 	hdr->num_buffers++;
-- 
1.8.3.1



[PATCH] virtio-scsi: Fix hotcpu_notifier use-after-free with virtscsi_freeze

2013-10-28 Thread Asias He
vqs are freed in virtscsi_freeze but the hotcpu_notifier is not
unregistered. We will have a use-after-free usage when the notifier
callback is called after virtscsi_freeze.

Signed-off-by: Asias He 
---
 drivers/scsi/virtio_scsi.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 74b88ef..b26f1a5 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -957,6 +957,10 @@ static void virtscsi_remove(struct virtio_device *vdev)
 #ifdef CONFIG_PM
 static int virtscsi_freeze(struct virtio_device *vdev)
 {
+   struct Scsi_Host *sh = virtio_scsi_host(vdev);
+   struct virtio_scsi *vscsi = shost_priv(sh);
+
+   unregister_hotcpu_notifier(&vscsi->nb);
virtscsi_remove_vqs(vdev);
return 0;
 }
@@ -965,8 +969,17 @@ static int virtscsi_restore(struct virtio_device *vdev)
 {
struct Scsi_Host *sh = virtio_scsi_host(vdev);
struct virtio_scsi *vscsi = shost_priv(sh);
+   int err;
+
+   err = virtscsi_init(vdev, vscsi);
+   if (err)
+   return err;
+
+   err = register_hotcpu_notifier(&vscsi->nb);
+   if (err)
+   vdev->config->del_vqs(vdev);
 
-   return virtscsi_init(vdev, vscsi);
+   return err;
 }
 #endif
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html