Re: compiler bug gcc4.6/4.7 with ACCESS_ONCE and workarounds

2014-11-12 Thread Christian Borntraeger
Am 12.11.2014 um 01:36 schrieb Linus Torvalds:
 On Tue, Nov 11, 2014 at 4:33 PM, Linus Torvalds
 torva...@linux-foundation.org wrote:

 I guess as a workaround it is fine, as long as we don't lose sight of
 trying to eventually do a better job.
 
 Oh, and when it comes to the actual gcc bug - do you have any reason
 to believe that it's somehow triggered more easily by something
 particular in the arch/s390/kvm/gaccess.c code?

Yes there are reasons. First of all the bug if SRA removes the volatile tag, 
but that does not mean that this breaks things. As long as the operation is 
simple enough things will be mostly ok. If things are not simple like in 
gaccess things get more complicated. Lets look at the ipte lock. The lock 
itself consists of 3 parts: k (1 bit:locked), kh(31bit counter for the host) 
and kg(32 bit counter for the millicode when doing specific guest 
instructions). There are 3 valid states
1. k=0, kh=0; kg=0
2. k=1, kh!=0, kg=0
3. k=1, kh=0, kg!=0

So the host code must check if the guest counter is zero and can then set the k 
bit to one and increase the counter. (for unlock it has to decrement kh and if 
that becomes zero also zero the k bit)
That means that we have multiple accesses to subcomponents. As the host counter 
is bit 1-31 (ibm speak, linux speak bit 32-62) gcc wants to use the load thirty 
one bit instruction. 
So far so good. The ticket lock is also not a trivial set/clear bit.

Now: In gcc the memory costs for s390 are modeled to have the same costs as 
register accesses (TARGET_MEMORY_MOVE_COST==1, TARGET_REGISTER_MOVE_COST=1)
So for gcc a re-loading of a part of the lock from memory costs the same as 
loading it from a register. That probably triggered that bug.

Christian




 
 IOW, why does this problem not hit the x86 spinlocks that also use
 volatile pointers to aggregate types? Or does it?

I think we would have noticed if that hits. But there are certainly cases where 
this bug triggers also on x86, see
the initial bug report of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145 
This bug is certainly different, (instead of transforming one load into 
multiple loads , it combines multiple write into one) but it shows, that a 
volatile marker is removed.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Live migration locks up 3.2 guests in do_timer(ticks ~ 500000)

2014-11-12 Thread Paolo Bonzini


On 11/11/2014 20:57, Matt Mullins wrote:
 That seems to work great, yes.
 
 Looking through the commit history, I see:
   kvmclock: Ensure time in migration never goes backward
   kvmclock: Ensure proper env-tsc value for kvmclock_current_nsec 
 calculation
 
 Assuming those are the right fixes for this issue, are they suitable to 
 request
 backported to distros' qemu 2.0 branches?  The merge commit for them seemed to
 indicate they were problematic at first.
 
 On second thought - I found the qemu-devel threads about reverting them in the
 2.1 timeframe, so I'm going to do a little more research before I start trying
 to suggest how to fix for existing install-base.

The right commits are

de9d61e83d43be9069e6646fa9d57a3f47779d28
317b0a6d8ba44e9bf8f9c3dbd776c4536843d82c
9a48bcd1b82494671c09b0eefdb882581499

which are similar but not equivalent to the two commits you found.  They
should be in 2.1.3 if there will be one, and they are appropriate for
backporting to 2.0.

Thanks for confirming that they fix your problem!

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes

2014-11-12 Thread Paolo Bonzini


On 12/11/2014 04:42, Zhang, Yang Z wrote:
 Personally, I think this feature will be helpful to the legacy device
 assignment. Agree, vfio is the right solution for future feature
 enabling. But the old kvm without the good vfio supporting is still
 used largely today. The user really looking for this feature but they
 will not upgrade their kernel. It's easy for us to backport this
 feature to old kvm with the legacy device assignment, but it is
 impossible to backport the whole vfio.

You can certainly backport these patches to distros that do not have
VFIO.  But upstream we should work on VFIO first.  VFIO has feature
parity with legacy device assignment, and adding a new feature that is
not in VFIO would be a bad idea.

By the way, do you have benchmark results for it?  We have not been able
to see any performance improvement for APICv on e.g. netperf.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes

2014-11-12 Thread Wu, Feng


 -Original Message-
 From: Paolo Bonzini [mailto:pbonz...@redhat.com]
 Sent: Wednesday, November 12, 2014 5:14 PM
 To: Zhang, Yang Z; Wu, Feng; Alex Williamson
 Cc: g...@kernel.org; dw...@infradead.org; j...@8bytes.org;
 t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org;
 kvm@vger.kernel.org; io...@lists.linux-foundation.org;
 linux-ker...@vger.kernel.org
 Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt
 configuration changes
 
 
 
 On 12/11/2014 04:42, Zhang, Yang Z wrote:
  Personally, I think this feature will be helpful to the legacy device
  assignment. Agree, vfio is the right solution for future feature
  enabling. But the old kvm without the good vfio supporting is still
  used largely today. The user really looking for this feature but they
  will not upgrade their kernel. It's easy for us to backport this
  feature to old kvm with the legacy device assignment, but it is
  impossible to backport the whole vfio.
 
 You can certainly backport these patches to distros that do not have
 VFIO.  But upstream we should work on VFIO first.  VFIO has feature
 parity with legacy device assignment, and adding a new feature that is
 not in VFIO would be a bad idea.
 
 By the way, do you have benchmark results for it?  We have not been able
 to see any performance improvement for APICv on e.g. netperf.

Do you mean benchmark results for APICv itself or VT-d Posted-Interrtups?

Thanks,
Feng

 
 Paolo


Re: compiler bug gcc4.6/4.7 with ACCESS_ONCE and workarounds

2014-11-12 Thread Martin Schwidefsky
On Tue, 11 Nov 2014 16:36:06 -0800
Linus Torvalds torva...@linux-foundation.org wrote:

 On Tue, Nov 11, 2014 at 4:33 PM, Linus Torvalds
 torva...@linux-foundation.org wrote:
 
  I guess as a workaround it is fine, as long as we don't lose sight of
  trying to eventually do a better job.
 
 Oh, and when it comes to the actual gcc bug - do you have any reason
 to believe that it's somehow triggered more easily by something
 particular in the arch/s390/kvm/gaccess.c code?
 
 IOW, why does this problem not hit the x86 spinlocks that also use
 volatile pointers to aggregate types? Or does it?

This looks similiar to what we had on s390:

old.tickets = ACCESS_ONCE(lock-tickets)

In theory x86 should be affected as well. On s390 we have lots of
instruction that operate on memory and the cost model of gcc makes
the compiler more inclined to access memory multiple times. My
guess would be that once the value is cached in a register the
cost model for x86 will usually make sure that the value is not
read a second time. But this is no guarantee.

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM module to Kconfig

2014-11-12 Thread Hongbo Zhang
On 28 October 2014 02:07, Antonios Motakis
a.mota...@virtualopensystems.com wrote:

 Enable building the VFIO PLATFORM driver that allows to use Linux platform
 devices with VFIO.

 Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
 ---
  drivers/vfio/Kconfig   | 1 +
  drivers/vfio/Makefile  | 1 +
  drivers/vfio/platform/Kconfig  | 9 +
  drivers/vfio/platform/Makefile | 4 
  4 files changed, 15 insertions(+)
  create mode 100644 drivers/vfio/platform/Kconfig
  create mode 100644 drivers/vfio/platform/Makefile

 diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
 index a0abe04..962fb80 100644
 --- a/drivers/vfio/Kconfig
 +++ b/drivers/vfio/Kconfig
 @@ -27,3 +27,4 @@ menuconfig VFIO
   If you don't know what to do here, say N.

  source drivers/vfio/pci/Kconfig
 +source drivers/vfio/platform/Kconfig
 diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
 index 0b035b1..dadf0ca 100644
 --- a/drivers/vfio/Makefile
 +++ b/drivers/vfio/Makefile
 @@ -3,3 +3,4 @@ obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
  obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
  obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
  obj-$(CONFIG_VFIO_PCI) += pci/
 +obj-$(CONFIG_VFIO_PLATFORM) += platform/
 diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig
 new file mode 100644
 index 000..c51af17
 --- /dev/null
 +++ b/drivers/vfio/platform/Kconfig
 @@ -0,0 +1,9 @@
 +config VFIO_PLATFORM
 +   tristate VFIO support for platform devices
 +   depends on VFIO  EVENTFD  ARM

Hi Antonios,
Is this only for ARM? how about X86 and PowerPC?
On Freescale's PowerPC platform, the IOMMU is called PAMU (Peripheral
Access Management Unit), and I am trying to use this VFIO framework on
it.


 +   help
 + Support for platform devices with VFIO. This is required to make
 + use of platform devices present on the system using the VFIO
 + framework.
 +
 + If you don't know what to do here, say N.
 diff --git a/drivers/vfio/platform/Makefile b/drivers/vfio/platform/Makefile
 new file mode 100644
 index 000..279862b
 --- /dev/null
 +++ b/drivers/vfio/platform/Makefile
 @@ -0,0 +1,4 @@
 +
 +vfio-platform-y := vfio_platform.o vfio_platform_common.o
 +
 +obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o
 --
 2.1.1

 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes

2014-11-12 Thread Paolo Bonzini


On 12/11/2014 10:19, Wu, Feng wrote:
 You can certainly backport these patches to distros that do not have
 VFIO.  But upstream we should work on VFIO first.  VFIO has feature
 parity with legacy device assignment, and adding a new feature that is
 not in VFIO would be a bad idea.

 By the way, do you have benchmark results for it?  We have not been able
 to see any performance improvement for APICv on e.g. netperf.
 
 Do you mean benchmark results for APICv itself or VT-d Posted-Interrtups?

Especially for VT-d posted interrupts---but it'd be great to know which
workloads see the biggest speedup from APICv.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM module to Kconfig

2014-11-12 Thread Antonios Motakis
Hello Hongbo,

On Wed, Nov 12, 2014 at 10:52 AM, Hongbo Zhang hongbo.zh...@linaro.org wrote:
 On 28 October 2014 02:07, Antonios Motakis
 a.mota...@virtualopensystems.com wrote:

 Enable building the VFIO PLATFORM driver that allows to use Linux platform
 devices with VFIO.

 Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
 ---
  drivers/vfio/Kconfig   | 1 +
  drivers/vfio/Makefile  | 1 +
  drivers/vfio/platform/Kconfig  | 9 +
  drivers/vfio/platform/Makefile | 4 
  4 files changed, 15 insertions(+)
  create mode 100644 drivers/vfio/platform/Kconfig
  create mode 100644 drivers/vfio/platform/Makefile

 diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
 index a0abe04..962fb80 100644
 --- a/drivers/vfio/Kconfig
 +++ b/drivers/vfio/Kconfig
 @@ -27,3 +27,4 @@ menuconfig VFIO
   If you don't know what to do here, say N.

  source drivers/vfio/pci/Kconfig
 +source drivers/vfio/platform/Kconfig
 diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
 index 0b035b1..dadf0ca 100644
 --- a/drivers/vfio/Makefile
 +++ b/drivers/vfio/Makefile
 @@ -3,3 +3,4 @@ obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
  obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
  obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
  obj-$(CONFIG_VFIO_PCI) += pci/
 +obj-$(CONFIG_VFIO_PLATFORM) += platform/
 diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig
 new file mode 100644
 index 000..c51af17
 --- /dev/null
 +++ b/drivers/vfio/platform/Kconfig
 @@ -0,0 +1,9 @@
 +config VFIO_PLATFORM
 +   tristate VFIO support for platform devices
 +   depends on VFIO  EVENTFD  ARM

 Hi Antonios,
 Is this only for ARM? how about X86 and PowerPC?
 On Freescale's PowerPC platform, the IOMMU is called PAMU (Peripheral
 Access Management Unit), and I am trying to use this VFIO framework on
 it.


In principle it should be working on any platform with such devices;
as long as you have a VFIO IOMMU driver for the PAMU (on ARM we use
VFIO PLATFORM for the device, with VFIO IOMMU TYPE1 for the IOMMU).

So if you have a suitable IOMMU driver for your target, feel free to
test it, and let us know of the results.


 +   help
 + Support for platform devices with VFIO. This is required to make
 + use of platform devices present on the system using the VFIO
 + framework.
 +
 + If you don't know what to do here, say N.
 diff --git a/drivers/vfio/platform/Makefile b/drivers/vfio/platform/Makefile
 new file mode 100644
 index 000..279862b
 --- /dev/null
 +++ b/drivers/vfio/platform/Makefile
 @@ -0,0 +1,4 @@
 +
 +vfio-platform-y := vfio_platform.o vfio_platform_common.o
 +
 +obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o
 --
 2.1.1

 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM module to Kconfig

2014-11-12 Thread Hongbo Zhang
On 12 November 2014 18:05, bharat.bhus...@freescale.com
bharat.bhus...@freescale.com wrote:
 Hi,



 This is not yet supported on Freescale PowerPC. I am still in process of
 upstreaming the FSL PAMU specific patches for same.

 Initial plan is to test with PCIe devices and then with Platform devices.


I see there is already driver/iommu/fsl_pamu.c, doesn't it work?
Could you explain briefly what is wrong? I've heard that the vfio pci
works on powerpc platforms
Thanks.



 Thanks

 -Bharat



 From: kvmarm-boun...@lists.cs.columbia.edu
 [mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Hongbo Zhang
 Sent: Wednesday, November 12, 2014 3:08 PM
 To: Antonios Motakis
 Cc: open list:VFIO DRIVER; will.dea...@arm.com; alex.william...@redhat.com;
 open list; io...@lists.linux-foundation.org; t...@virtualopensystems.com;
 kvm...@lists.cs.columbia.edu
 Subject: Re: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM module
 to Kconfig







 On 28 October 2014 02:07, Antonios Motakis
 a.mota...@virtualopensystems.com wrote:

 Enable building the VFIO PLATFORM driver that allows to use Linux platform
 devices with VFIO.

 Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
 ---
  drivers/vfio/Kconfig   | 1 +
  drivers/vfio/Makefile  | 1 +
  drivers/vfio/platform/Kconfig  | 9 +
  drivers/vfio/platform/Makefile | 4 
  4 files changed, 15 insertions(+)
  create mode 100644 drivers/vfio/platform/Kconfig
  create mode 100644 drivers/vfio/platform/Makefile

 diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
 index a0abe04..962fb80 100644
 --- a/drivers/vfio/Kconfig
 +++ b/drivers/vfio/Kconfig
 @@ -27,3 +27,4 @@ menuconfig VFIO
   If you don't know what to do here, say N.

  source drivers/vfio/pci/Kconfig
 +source drivers/vfio/platform/Kconfig
 diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
 index 0b035b1..dadf0ca 100644
 --- a/drivers/vfio/Makefile
 +++ b/drivers/vfio/Makefile
 @@ -3,3 +3,4 @@ obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
  obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
  obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
  obj-$(CONFIG_VFIO_PCI) += pci/
 +obj-$(CONFIG_VFIO_PLATFORM) += platform/
 diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig
 new file mode 100644
 index 000..c51af17
 --- /dev/null
 +++ b/drivers/vfio/platform/Kconfig
 @@ -0,0 +1,9 @@
 +config VFIO_PLATFORM
 +   tristate VFIO support for platform devices
 +   depends on VFIO  EVENTFD  ARM



 Hi Antonios,

 Is this only for ARM? how about X86 and PowerPC?

 On Freescale's PowerPC platform, the IOMMU is called PAMU (Peripheral Access
 Management Unit), and I am trying to use this VFIO framework on it.



 +   help
 + Support for platform devices with VFIO. This is required to make
 + use of platform devices present on the system using the VFIO
 + framework.
 +
 + If you don't know what to do here, say N.
 diff --git a/drivers/vfio/platform/Makefile b/drivers/vfio/platform/Makefile
 new file mode 100644
 index 000..279862b
 --- /dev/null
 +++ b/drivers/vfio/platform/Makefile
 @@ -0,0 +1,4 @@
 +
 +vfio-platform-y := vfio_platform.o vfio_platform_common.o
 +
 +obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o
 --
 2.1.1

 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM module to Kconfig

2014-11-12 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: Hongbo Zhang [mailto:hongbo.zh...@linaro.org]
 Sent: Wednesday, November 12, 2014 4:09 PM
 To: Bhushan Bharat-R65777
 Cc: Antonios Motakis; open list:VFIO DRIVER; will.dea...@arm.com;
 alex.william...@redhat.com; open list; io...@lists.linux-foundation.org;
 t...@virtualopensystems.com; kvm...@lists.cs.columbia.edu
 Subject: Re: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM module to
 Kconfig
 
 On 12 November 2014 18:05, bharat.bhus...@freescale.com
 bharat.bhus...@freescale.com wrote:
  Hi,
 
 
 
  This is not yet supported on Freescale PowerPC. I am still in process
  of upstreaming the FSL PAMU specific patches for same.
 
  Initial plan is to test with PCIe devices and then with Platform devices.
 
 
 I see there is already driver/iommu/fsl_pamu.c, doesn't it work?

We need VFIO iommu driver for same.

 Could you explain briefly what is wrong? I've heard that the vfio pci works on
 powerpc platforms.

Yes, patches are in Freescale internal git repository. But those patches are 
yet to be upstreamed. I have started working on same.

Thanks
-Bharat

 
 
 
  Thanks
 
  -Bharat
 
 
 
  From: kvmarm-boun...@lists.cs.columbia.edu
  [mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Hongbo
  Zhang
  Sent: Wednesday, November 12, 2014 3:08 PM
  To: Antonios Motakis
  Cc: open list:VFIO DRIVER; will.dea...@arm.com;
  alex.william...@redhat.com; open list;
  io...@lists.linux-foundation.org; t...@virtualopensystems.com;
  kvm...@lists.cs.columbia.edu
  Subject: Re: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM
  module to Kconfig
 
 
 
 
 
 
 
  On 28 October 2014 02:07, Antonios Motakis
  a.mota...@virtualopensystems.com wrote:
 
  Enable building the VFIO PLATFORM driver that allows to use Linux
  platform devices with VFIO.
 
  Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
  ---
   drivers/vfio/Kconfig   | 1 +
   drivers/vfio/Makefile  | 1 +
   drivers/vfio/platform/Kconfig  | 9 +
  drivers/vfio/platform/Makefile | 4 
   4 files changed, 15 insertions(+)
   create mode 100644 drivers/vfio/platform/Kconfig  create mode 100644
  drivers/vfio/platform/Makefile
 
  diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index
  a0abe04..962fb80 100644
  --- a/drivers/vfio/Kconfig
  +++ b/drivers/vfio/Kconfig
  @@ -27,3 +27,4 @@ menuconfig VFIO
If you don't know what to do here, say N.
 
   source drivers/vfio/pci/Kconfig
  +source drivers/vfio/platform/Kconfig
  diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile index
  0b035b1..dadf0ca 100644
  --- a/drivers/vfio/Makefile
  +++ b/drivers/vfio/Makefile
  @@ -3,3 +3,4 @@ obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
   obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
   obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
   obj-$(CONFIG_VFIO_PCI) += pci/
  +obj-$(CONFIG_VFIO_PLATFORM) += platform/
  diff --git a/drivers/vfio/platform/Kconfig
  b/drivers/vfio/platform/Kconfig new file mode 100644 index
  000..c51af17
  --- /dev/null
  +++ b/drivers/vfio/platform/Kconfig
  @@ -0,0 +1,9 @@
  +config VFIO_PLATFORM
  +   tristate VFIO support for platform devices
  +   depends on VFIO  EVENTFD  ARM
 
 
 
  Hi Antonios,
 
  Is this only for ARM? how about X86 and PowerPC?
 
  On Freescale's PowerPC platform, the IOMMU is called PAMU (Peripheral
  Access Management Unit), and I am trying to use this VFIO framework on it.
 
 
 
  +   help
  + Support for platform devices with VFIO. This is required to make
  + use of platform devices present on the system using the VFIO
  + framework.
  +
  + If you don't know what to do here, say N.
  diff --git a/drivers/vfio/platform/Makefile
  b/drivers/vfio/platform/Makefile new file mode 100644 index
  000..279862b
  --- /dev/null
  +++ b/drivers/vfio/platform/Makefile
  @@ -0,0 +1,4 @@
  +
  +vfio-platform-y := vfio_platform.o vfio_platform_common.o
  +
  +obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o
  --
  2.1.1
 
  --
  To unsubscribe from this list: send the line unsubscribe
  linux-kernel in the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
  Please read the FAQ at  http://www.tux.org/lkml/
 
 


Re: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM module to Kconfig

2014-11-12 Thread Hongbo Zhang
On 12 November 2014 19:00, bharat.bhus...@freescale.com
bharat.bhus...@freescale.com wrote:


 -Original Message-
 From: Hongbo Zhang [mailto:hongbo.zh...@linaro.org]
 Sent: Wednesday, November 12, 2014 4:09 PM
 To: Bhushan Bharat-R65777
 Cc: Antonios Motakis; open list:VFIO DRIVER; will.dea...@arm.com;
 alex.william...@redhat.com; open list; io...@lists.linux-foundation.org;
 t...@virtualopensystems.com; kvm...@lists.cs.columbia.edu
 Subject: Re: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM module to
 Kconfig

 On 12 November 2014 18:05, bharat.bhus...@freescale.com
 bharat.bhus...@freescale.com wrote:
  Hi,
 
 
 
  This is not yet supported on Freescale PowerPC. I am still in process
  of upstreaming the FSL PAMU specific patches for same.
 
  Initial plan is to test with PCIe devices and then with Platform devices.
 

 I see there is already driver/iommu/fsl_pamu.c, doesn't it work?

 We need VFIO iommu driver for same.

 Could you explain briefly what is wrong? I've heard that the vfio pci works 
 on
 powerpc platforms.

 Yes, patches are in Freescale internal git repository. But those patches are 
 yet to be upstreamed. I have started working on same.

(I come from Freescale too, just currently assigned to Linaro)
Oh, yes I now see vfio iommu driver in the internal git repo, but not
in the community one.
It seems I have to select another platform within Linaro to work with.



 Thanks
 -Bharat


 
 
  Thanks
 
  -Bharat
 
 
 
  From: kvmarm-boun...@lists.cs.columbia.edu
  [mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Hongbo
  Zhang
  Sent: Wednesday, November 12, 2014 3:08 PM
  To: Antonios Motakis
  Cc: open list:VFIO DRIVER; will.dea...@arm.com;
  alex.william...@redhat.com; open list;
  io...@lists.linux-foundation.org; t...@virtualopensystems.com;
  kvm...@lists.cs.columbia.edu
  Subject: Re: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM
  module to Kconfig
 
 
 
 
 
 
 
  On 28 October 2014 02:07, Antonios Motakis
  a.mota...@virtualopensystems.com wrote:
 
  Enable building the VFIO PLATFORM driver that allows to use Linux
  platform devices with VFIO.
 
  Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
  ---
   drivers/vfio/Kconfig   | 1 +
   drivers/vfio/Makefile  | 1 +
   drivers/vfio/platform/Kconfig  | 9 +
  drivers/vfio/platform/Makefile | 4 
   4 files changed, 15 insertions(+)
   create mode 100644 drivers/vfio/platform/Kconfig  create mode 100644
  drivers/vfio/platform/Makefile
 
  diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index
  a0abe04..962fb80 100644
  --- a/drivers/vfio/Kconfig
  +++ b/drivers/vfio/Kconfig
  @@ -27,3 +27,4 @@ menuconfig VFIO
If you don't know what to do here, say N.
 
   source drivers/vfio/pci/Kconfig
  +source drivers/vfio/platform/Kconfig
  diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile index
  0b035b1..dadf0ca 100644
  --- a/drivers/vfio/Makefile
  +++ b/drivers/vfio/Makefile
  @@ -3,3 +3,4 @@ obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
   obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
   obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
   obj-$(CONFIG_VFIO_PCI) += pci/
  +obj-$(CONFIG_VFIO_PLATFORM) += platform/
  diff --git a/drivers/vfio/platform/Kconfig
  b/drivers/vfio/platform/Kconfig new file mode 100644 index
  000..c51af17
  --- /dev/null
  +++ b/drivers/vfio/platform/Kconfig
  @@ -0,0 +1,9 @@
  +config VFIO_PLATFORM
  +   tristate VFIO support for platform devices
  +   depends on VFIO  EVENTFD  ARM
 
 
 
  Hi Antonios,
 
  Is this only for ARM? how about X86 and PowerPC?
 
  On Freescale's PowerPC platform, the IOMMU is called PAMU (Peripheral
  Access Management Unit), and I am trying to use this VFIO framework on it.
 
 
 
  +   help
  + Support for platform devices with VFIO. This is required to make
  + use of platform devices present on the system using the VFIO
  + framework.
  +
  + If you don't know what to do here, say N.
  diff --git a/drivers/vfio/platform/Makefile
  b/drivers/vfio/platform/Makefile new file mode 100644 index
  000..279862b
  --- /dev/null
  +++ b/drivers/vfio/platform/Makefile
  @@ -0,0 +1,4 @@
  +
  +vfio-platform-y := vfio_platform.o vfio_platform_common.o
  +
  +obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o
  --
  2.1.1
 
  --
  To unsubscribe from this list: send the line unsubscribe
  linux-kernel in the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
  Please read the FAQ at  http://www.tux.org/lkml/
 
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Seeking a KVM benchmark

2014-11-12 Thread Paolo Bonzini


On 10/11/2014 18:38, Gleb Natapov wrote:
 On Mon, Nov 10, 2014 at 06:28:25PM +0100, Paolo Bonzini wrote:
 On 10/11/2014 15:23, Avi Kivity wrote:
 It's not surprising [1].  Since the meaning of some PTE bits change [2],
 the TLB has to be flushed.  In VMX we have VPIDs, so we only need to flush
 if EFER changed between two invocations of the same VPID, which isn't the
 case.

 [1] after the fact
 [2] although those bits were reserved with NXE=0, so they shouldn't have
 any TLB footprint

 You're right that this is not that surprising after the fact, and that
 both Sandy Bridge and Ivy Bridge have VPIDs (even the non-Xeon ones).
 This is also why I'm curious about the Nehalem.

 However note that even toggling the SCE bit is flushing the TLB.  The
 NXE bit is not being toggled here!  That's the more surprising part.

 Just a guess, but may be because writing EFER is not something that happens
 often in regular OSes it is not optimized to handle different bits 
 differently.

Yes, that's what Intel said too.

Nehalem results:

userspace exit, urn  17560 17726 17628 17572 17417
lightweight exit, urn 3316  3342  3342  3319  3328
userspace exit, LOAD_EFER, guest!=host   12200 11772 12130 12164 12327
lightweight exit, LOAD_EFER, guest!=host  3214  3220  3238  3218  3337
userspace exit, LOAD_EFER, guest=host11983 11780 11920 11919 12040
lightweight exit, LOAD_EFER, guest=host   3178  3193  3193  3187  3220

So the benchmark results also explain why skipping the LOAD_EFER does
not give a benefit for guest EFER=host EFER.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] x86, kvm, vmx: Always use LOAD_IA32_EFER if available

2014-11-12 Thread Paolo Bonzini


On 08/11/2014 03:25, Andy Lutomirski wrote:
 At least on Sandy Bridge, letting the CPU switch IA32_EFER is much
 faster than switching it manually.
 
 I benchmarked this using the vmexit kvm-unit-test (single run, but
 GOAL multiplied by 5 to do more iterations):
 
 Test  Before  AfterChange
 cpuid   2000   1932-3.40%
 vmcall  1914   1817-5.07%
 mov_from_cr8  13 13 0.00%
 mov_to_cr819 19 0.00%
 inl_from_pmtimer   19164  10619   -44.59%
 inl_from_qemu  15662  10302   -34.22%
 inl_from_kernel 3916   3802-2.91%
 outl_to_kernel  2230   2194-1.61%
 mov_dr   172176 2.33%
 ipi(skipped)  (skipped)
 ipi+halt   (skipped)  (skipped)
 ple-round-robin   13 13 0.00%
 wr_tsc_adjust_msr   1920   1845-3.91%
 rd_tsc_adjust_msr   1892   1814-4.12%
 mmio-no-eventfd:pci-mem16394  11165   -31.90%
 mmio-wildcard-eventfd:pci-mem   4607   4645 0.82%
 mmio-datamatch-eventfd:pci-mem  4601   4610 0.20%
 portio-no-eventfd:pci-io   11507   7942   -30.98%
 portio-wildcard-eventfd:pci-io  2239   2225-0.63%
 portio-datamatch-eventfd:pci-io 2250   2234-0.71%
 
 I haven't explicitly computed the significance of these numbers,
 but this isn't subtle.
 
 Signed-off-by: Andy Lutomirski l...@amacapital.net
 ---
  arch/x86/kvm/vmx.c | 10 --
  1 file changed, 8 insertions(+), 2 deletions(-)
 
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 3e556c68351b..e72b9660e51c 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -1659,8 +1659,14 @@ static bool update_transition_efer(struct vcpu_vmx 
 *vmx, int efer_offset)
   vmx-guest_msrs[efer_offset].mask = ~ignore_bits;
  
   clear_atomic_switch_msr(vmx, MSR_EFER);
 - /* On ept, can't emulate nx, and must switch nx atomically */
 - if (enable_ept  ((vmx-vcpu.arch.efer ^ host_efer)  EFER_NX)) {
 +
 + /*
 +  * On EPT, we can't emulate NX, so we must switch EFER atomically.
 +  * On CPUs that support load IA32_EFER, always switch EFER
 +  * atomically, since it's faster than switching it manually.
 +  */
 + if (cpu_has_load_ia32_efer ||
 + (enable_ept  ((vmx-vcpu.arch.efer ^ host_efer)  EFER_NX))) {
   guest_efer = vmx-vcpu.arch.efer;
   if (!(guest_efer  EFER_LMA))
   guest_efer = ~EFER_LME;
 

I am committing this patch, with an additional remark in the commit message:

 The results were reproducible on all of Nehalem, Sandy Bridge and
 Ivy Bridge.  The slowness of manual switching is because writing
 to EFER with WRMSR triggers a TLB flush, even if the only bit you're
 touching is SCE (so the page table format is not affected).  Doing
 the write as part of vmentry/vmexit, instead, does not flush the TLB,
 probably because all processors that have EPT also have VPID.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] virt: kvm: arm: vgic: Return failure code '-EBUSY' when mutex_trylock() fails

2014-11-12 Thread Chen Gang
When mutex_trylock() fails, kvm_vgic_create() will not create 'vgic', so
it need return failure code '-EBUSY' instead of '0' to let outside know
about it.

Also simplify the code within kvm_vgic_create():

 - kvm_for_each_vcpu() scanning once is enough for current case.

 - Remove redundant variable 'vcpu_lock_idx'.


Signed-off-by: Chen Gang gang.chen.5...@gmail.com
---
 virt/kvm/arm/vgic.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 3aaca49..5846725 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1933,7 +1933,7 @@ out:
 
 int kvm_vgic_create(struct kvm *kvm)
 {
-   int i, vcpu_lock_idx = -1, ret = 0;
+   int i, ret = 0;
struct kvm_vcpu *vcpu;
 
mutex_lock(kvm-lock);
@@ -1949,13 +1949,12 @@ int kvm_vgic_create(struct kvm *kvm)
 * that no other VCPUs are run while we create the vgic.
 */
kvm_for_each_vcpu(i, vcpu, kvm) {
-   if (!mutex_trylock(vcpu-mutex))
+   if (!mutex_trylock(vcpu-mutex)) {
+   ret = -EBUSY;
goto out_unlock;
-   vcpu_lock_idx = i;
-   }
-
-   kvm_for_each_vcpu(i, vcpu, kvm) {
+   }
if (vcpu-arch.has_run_once) {
+   mutex_unlock(vcpu-mutex);
ret = -EBUSY;
goto out_unlock;
}
@@ -1968,8 +1967,8 @@ int kvm_vgic_create(struct kvm *kvm)
kvm-arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
 
 out_unlock:
-   for (; vcpu_lock_idx = 0; vcpu_lock_idx--) {
-   vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
+   while (i--  0) {
+   vcpu = kvm_get_vcpu(kvm, i);
mutex_unlock(vcpu-mutex);
}
 
-- 
1.9.3
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Seeking a KVM benchmark

2014-11-12 Thread Gleb Natapov
On Wed, Nov 12, 2014 at 12:33:32PM +0100, Paolo Bonzini wrote:
 
 
 On 10/11/2014 18:38, Gleb Natapov wrote:
  On Mon, Nov 10, 2014 at 06:28:25PM +0100, Paolo Bonzini wrote:
  On 10/11/2014 15:23, Avi Kivity wrote:
  It's not surprising [1].  Since the meaning of some PTE bits change [2],
  the TLB has to be flushed.  In VMX we have VPIDs, so we only need to flush
  if EFER changed between two invocations of the same VPID, which isn't the
  case.
 
  [1] after the fact
  [2] although those bits were reserved with NXE=0, so they shouldn't have
  any TLB footprint
 
  You're right that this is not that surprising after the fact, and that
  both Sandy Bridge and Ivy Bridge have VPIDs (even the non-Xeon ones).
  This is also why I'm curious about the Nehalem.
 
  However note that even toggling the SCE bit is flushing the TLB.  The
  NXE bit is not being toggled here!  That's the more surprising part.
 
  Just a guess, but may be because writing EFER is not something that happens
  often in regular OSes it is not optimized to handle different bits 
  differently.
 
 Yes, that's what Intel said too.
 
 Nehalem results:
 
 userspace exit, urn  17560 17726 17628 17572 17417
 lightweight exit, urn 3316  3342  3342  3319  3328
 userspace exit, LOAD_EFER, guest!=host   12200 11772 12130 12164 12327
 lightweight exit, LOAD_EFER, guest!=host  3214  3220  3238  3218  3337
 userspace exit, LOAD_EFER, guest=host11983 11780 11920 11919 12040
 lightweight exit, LOAD_EFER, guest=host   3178  3193  3193  3187  3220
 
Is this with Andy's patch that skips LOAD_EFER when guest=host, or the one
that always switch LOAD_EFER?

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Seeking a KVM benchmark

2014-11-12 Thread Paolo Bonzini


On 12/11/2014 16:22, Gleb Natapov wrote:
  Nehalem results:
  
  userspace exit, urn  17560 17726 17628 17572 17417
  lightweight exit, urn 3316  3342  3342  3319  3328
  userspace exit, LOAD_EFER, guest!=host   12200 11772 12130 12164 12327
  lightweight exit, LOAD_EFER, guest!=host  3214  3220  3238  3218  3337
  userspace exit, LOAD_EFER, guest=host11983 11780 11920 11919 12040
  lightweight exit, LOAD_EFER, guest=host   3178  3193  3193  3187  3220
  
 Is this with Andy's patch that skips LOAD_EFER when guest=host, or the one
 that always switch LOAD_EFER?

Skip LOAD_EFER when guest=host.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Seeking a KVM benchmark

2014-11-12 Thread Gleb Natapov
On Wed, Nov 12, 2014 at 04:26:29PM +0100, Paolo Bonzini wrote:
 
 
 On 12/11/2014 16:22, Gleb Natapov wrote:
   Nehalem results:
   
   userspace exit, urn  17560 17726 17628 17572 17417
   lightweight exit, urn 3316  3342  3342  3319  3328
   userspace exit, LOAD_EFER, guest!=host   12200 11772 12130 12164 12327
   lightweight exit, LOAD_EFER, guest!=host  3214  3220  3238  3218  3337
   userspace exit, LOAD_EFER, guest=host11983 11780 11920 11919 12040
   lightweight exit, LOAD_EFER, guest=host   3178  3193  3193  3187  3220
   
  Is this with Andy's patch that skips LOAD_EFER when guest=host, or the one
  that always switch LOAD_EFER?
 
 Skip LOAD_EFER when guest=host.
 
So guest=host is a little bit better than guest!=host so looks like skipping 
LOAD_EFER helps, but
why lightweight exit, urn worse than guest=host though, it should be exactly 
the same as long as NX
bit is the same in urn test, no?

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Seeking a KVM benchmark

2014-11-12 Thread Paolo Bonzini


On 12/11/2014 16:32, Gleb Natapov wrote:
userspace exit, urn  17560 17726 17628 17572 17417
lightweight exit, urn 3316  3342  3342  3319  3328
userspace exit, LOAD_EFER, guest!=host   12200 11772 12130 12164 12327
lightweight exit, LOAD_EFER, guest!=host  3214  3220  3238  3218  3337
userspace exit, LOAD_EFER, guest=host11983 11780 11920 11919 12040
lightweight exit, LOAD_EFER, guest=host   3178  3193  3193  3187  3220
   
   Is this with Andy's patch that skips LOAD_EFER when guest=host, or the one
   that always switch LOAD_EFER?
  
  Skip LOAD_EFER when guest=host.
 
 So guest=host is a little bit better than guest!=host so looks like
 skipping LOAD_EFER helps, but why lightweight exit, urn worse than
 guest=host though, it should be exactly the same as long as NX bit is
 the same in urn test, no?

I don't know---it is very much reproducible though.  It is not my
machine so I cannot run perf on it, but I can try to find a similar one
in the next few days.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Seeking a KVM benchmark

2014-11-12 Thread Andy Lutomirski
On Wed, Nov 12, 2014 at 7:51 AM, Paolo Bonzini pbonz...@redhat.com wrote:


 On 12/11/2014 16:32, Gleb Natapov wrote:
userspace exit, urn  17560 17726 17628 17572 17417
lightweight exit, urn 3316  3342  3342  3319  3328
userspace exit, LOAD_EFER, guest!=host   12200 11772 12130 12164 12327
lightweight exit, LOAD_EFER, guest!=host  3214  3220  3238  3218  3337
userspace exit, LOAD_EFER, guest=host11983 11780 11920 11919 12040
lightweight exit, LOAD_EFER, guest=host   3178  3193  3193  3187  3220
  
   Is this with Andy's patch that skips LOAD_EFER when guest=host, or the 
   one
   that always switch LOAD_EFER?
 
  Skip LOAD_EFER when guest=host.

 So guest=host is a little bit better than guest!=host so looks like
 skipping LOAD_EFER helps, but why lightweight exit, urn worse than
 guest=host though, it should be exactly the same as long as NX bit is
 the same in urn test, no?

 I don't know---it is very much reproducible though.  It is not my
 machine so I cannot run perf on it, but I can try to find a similar one
 in the next few days.

Assuming you're running both of my patches (LOAD_EFER regardless of
nx, but skip LOAD_EFER of guest == host), then some of the speedup may
be just less code running.  I haven't figured out exactly when
vmx_save_host_state runs, but my patches avoid a call to
kvm_set_shared_msr, which is worth a few cycles.

--Andy


 Paolo



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v9 06/19] vfio/platform: return info for bound device

2014-11-12 Thread Alex Williamson
On Wed, 2014-11-12 at 11:32 +0100, Eric Auger wrote:
 On 10/27/2014 07:07 PM, Antonios Motakis wrote:
  A VFIO userspace driver will start by opening the VFIO device
  that corresponds to an IOMMU group, and will use the ioctl interface
  to get the basic device info, such as number of memory regions and
  interrupts, and their properties. This patch enables the
  VFIO_DEVICE_GET_INFO ioctl call.
  
  Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
  ---
   drivers/vfio/platform/vfio_platform_common.c | 23 ---
   1 file changed, 20 insertions(+), 3 deletions(-)
  
  diff --git a/drivers/vfio/platform/vfio_platform_common.c 
  b/drivers/vfio/platform/vfio_platform_common.c
  index e0fdbc8..cb20526 100644
  --- a/drivers/vfio/platform/vfio_platform_common.c
  +++ b/drivers/vfio/platform/vfio_platform_common.c
  @@ -43,10 +43,27 @@ static int vfio_platform_open(void *device_data)
   static long vfio_platform_ioctl(void *device_data,
 unsigned int cmd, unsigned long arg)
   {
  -   if (cmd == VFIO_DEVICE_GET_INFO)
  -   return -EINVAL;
  +   struct vfio_platform_device *vdev = device_data;
  +   unsigned long minsz;
  +
  +   if (cmd == VFIO_DEVICE_GET_INFO) {
  +   struct vfio_device_info info;
  +
  +   minsz = offsetofend(struct vfio_device_info, num_irqs);
  +
  +   if (copy_from_user(info, (void __user *)arg, minsz))
  +   return -EFAULT;
  +
  +   if (info.argsz  minsz)
  +   return -EINVAL;
  +
  +   info.flags = vdev-flags;
  +   info.num_regions = 0;
  +   info.num_irqs = 0;
 Seems a bit weird to me to enable the modality but returning zeroed
 values. Shouldn't we put that patch after VFIO_DEVICE_GET_REGION_INFO
 and VFIO_DEVICE_GET_IRQ_INFO ones?

I actually like how Antonios has started from a base framework, exposing
a device but none of the resources and then incrementally adds each
component.  It's also a good showcase of the VFIO ABI that we can do
things like this.  Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes

2014-11-12 Thread Alex Williamson
On Wed, 2014-11-12 at 10:14 +0100, Paolo Bonzini wrote:
 
 On 12/11/2014 04:42, Zhang, Yang Z wrote:
  Personally, I think this feature will be helpful to the legacy device
  assignment. Agree, vfio is the right solution for future feature
  enabling. But the old kvm without the good vfio supporting is still
  used largely today. The user really looking for this feature but they
  will not upgrade their kernel. It's easy for us to backport this
  feature to old kvm with the legacy device assignment, but it is
  impossible to backport the whole vfio.
 
 You can certainly backport these patches to distros that do not have
 VFIO.  But upstream we should work on VFIO first.  VFIO has feature
 parity with legacy device assignment, and adding a new feature that is
 not in VFIO would be a bad idea.

Thanks Paolo, I agree.  We should design the interfaces for VFIO since
we expect legacy KVM assignment to be deprecated and eventually removed.
I think that some of the platform device work for ARM's IRQ forwarding
should probably be leveraged for this interface.  IRQ forwarding
effectively allows level triggered interrupts to be handled as edge,
eliminating the mask/unmask overhead and EOI path entirely.  To do this
through VFIO they make use of the KVM-VFIO device to register the device
and set attributes for the forwarded IRQ.  This enables KVM to use the
VFIO external user interfaces to acquire a VFIO device reference and
access the struct device.  From there it can do some IRQ manipulation on
the device to reconfigure how the host handles the interrupt.  Ideally
we could use the same base KVM-VFIO device interface interface, perhaps
with different attributes, and obviously with different architecture
backing.  Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v9 01/19] vfio/platform: initial skeleton of VFIO support for platform devices

2014-11-12 Thread Alex Williamson
On Wed, 2014-11-12 at 11:05 +0100, Eric Auger wrote:
 Hi Antonios,
 
 On 10/27/2014 07:07 PM, Antonios Motakis wrote:
  This patch forms the common skeleton code for platform devices support
  with VFIO. This will include the core functionality of VFIO_PLATFORM,
  however binding to the device and discovering the device resources will
  be done with the help of a separate file where any Linux platform bus
  specific code will reside.
  
  This will allow us to implement support for also discovering AMBA devices
  and their resources, but still reuse a large part of the VFIO_PLATFORM
  implementation.
  
  Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
  ---
   drivers/vfio/platform/vfio_platform_common.c  | 126 
  ++
   drivers/vfio/platform/vfio_platform_private.h |  36 
   2 files changed, 162 insertions(+)
   create mode 100644 drivers/vfio/platform/vfio_platform_common.c
   create mode 100644 drivers/vfio/platform/vfio_platform_private.h
  
  diff --git a/drivers/vfio/platform/vfio_platform_common.c 
  b/drivers/vfio/platform/vfio_platform_common.c
  new file mode 100644
  index 000..e0fdbc8
  --- /dev/null
  +++ b/drivers/vfio/platform/vfio_platform_common.c
  @@ -0,0 +1,126 @@
  +/*
  + * Copyright (C) 2013 - Virtual Open Systems
  + * Author: Antonios Motakis a.mota...@virtualopensystems.com
  + *
  + * This program is free software; you can redistribute it and/or modify
  + * it under the terms of the GNU General Public License, version 2, as
  + * published by the Free Software Foundation.
  + *
  + * This program is distributed in the hope that it will be useful,
  + * but WITHOUT ANY WARRANTY; without even the implied warranty of
  + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  + * GNU General Public License for more details.
  + */
  +
  +#include linux/device.h
  +#include linux/interrupt.h
  +#include linux/iommu.h
  +#include linux/module.h
  +#include linux/mutex.h
  +#include linux/notifier.h
  +#include linux/pm_runtime.h
  +#include linux/slab.h
  +#include linux/types.h
  +#include linux/uaccess.h
  +#include linux/vfio.h
  +#include linux/io.h
 not sure at that state all the above includes are needed.
  +
  +#include vfio_platform_private.h
  +
  +static void vfio_platform_release(void *device_data)
  +{
  +   module_put(THIS_MODULE);
  +}
  +
  +static int vfio_platform_open(void *device_data)
  +{
  +   if (!try_module_get(THIS_MODULE))
  +   return -ENODEV;
  +
  +   return 0;
  +}
  +
  +static long vfio_platform_ioctl(void *device_data,
  +  unsigned int cmd, unsigned long arg)
 a minor style comment/question that applies on all the series. Shouldn't
 subsequent argument lines rather be aligned with the first char after
 (, as done in PCI code?

It's also my preferred style to indent to just after the open paren on
wrapped lines where possible, but I don't think there are hard rules in
CodingStyle or checkpatch that enforce this, so I often let it slide.
Thanks,

Alex

  +{
  +   if (cmd == VFIO_DEVICE_GET_INFO)
  +   return -EINVAL;
  +
  +   else if (cmd == VFIO_DEVICE_GET_REGION_INFO)
  +   return -EINVAL;
  +
  +   else if (cmd == VFIO_DEVICE_GET_IRQ_INFO)
  +   return -EINVAL;
  +
  +   else if (cmd == VFIO_DEVICE_SET_IRQS)
  +   return -EINVAL;
  +
  +   else if (cmd == VFIO_DEVICE_RESET)
  +   return -EINVAL;
  +
  +   return -ENOTTY;
  +}
  +
  +static ssize_t vfio_platform_read(void *device_data, char __user *buf,
  + size_t count, loff_t *ppos)
  +{
  +   return -EINVAL;
  +}
  +
  +static ssize_t vfio_platform_write(void *device_data, const char __user 
  *buf,
  +  size_t count, loff_t *ppos)
  +{
  +   return -EINVAL;
  +}
  +
  +static int vfio_platform_mmap(void *device_data, struct vm_area_struct 
  *vma)
  +{
  +   return -EINVAL;
  +}
  +
  +static const struct vfio_device_ops vfio_platform_ops = {
  +   .name   = vfio-platform,
  +   .open   = vfio_platform_open,
  +   .release= vfio_platform_release,
  +   .ioctl  = vfio_platform_ioctl,
  +   .read   = vfio_platform_read,
  +   .write  = vfio_platform_write,
  +   .mmap   = vfio_platform_mmap,
  +};
  +
  +int vfio_platform_probe_common(struct vfio_platform_device *vdev,
  +  struct device *dev)
  +{
  +   struct iommu_group *group;
  +   int ret;
  +
  +   if (!vdev)
  +   return -EINVAL;
  +
  +   group = iommu_group_get(dev);
  +   if (!group) {
  +   pr_err(VFIO: No IOMMU group for device %s\n, vdev-name);
  +   return -EINVAL;
  +   }
 I saw the above check also is done at beginning of vfio_add_group_dev.
 Added value however is pr_err which is useful and PCI code does the
 check too.
 
 Eric
  +
  +   ret = vfio_add_group_dev(dev, vfio_platform_ops, vdev);
  +   if (ret) {
  +   

Re: Seeking a KVM benchmark

2014-11-12 Thread Paolo Bonzini
 Assuming you're running both of my patches (LOAD_EFER regardless of
 nx, but skip LOAD_EFER of guest == host), then some of the speedup may
 be just less code running.  I haven't figured out exactly when
 vmx_save_host_state runs, but my patches avoid a call to
 kvm_set_shared_msr, which is worth a few cycles.

Yes, that's possible.  vmx_save_host_state is here:

preempt_disable();

kvm_x86_ops-prepare_guest_switch(vcpu);  // 
if (vcpu-fpu_active)
kvm_load_guest_fpu(vcpu);
kvm_load_guest_xcr0(vcpu);

vcpu-mode = IN_GUEST_MODE;

srcu_read_unlock(vcpu-kvm-srcu, vcpu-srcu_idx);

and it's a fairly hot function.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: x86: add trace event for pvclock updates

2014-11-12 Thread David Matlack
On 11/10 11:18 PM, Marcelo Tosatti wrote:
 On Wed, Nov 05, 2014 at 11:46:42AM -0800, David Matlack wrote:
  The new trace event records:
* the id of vcpu being updated
* the pvclock_vcpu_time_info struct being written to guest memory
  
  This is useful for debugging pvclock bugs, such as the bug fixed by
  [PATCH] kvm: x86: Fix kvm clock versioning..
  
  Signed-off-by: David Matlack dmatl...@google.com
 
 So you actually hit that bug in practice? Can you describe the
 scenario?

We noticed guests running stress workloads would occasionally get stuck 
on the far side of a save/restore. Inspecting the guest state revealed
arch/x86/kernel/pvclock.c:last_value was stuck at a value like
8020566108469899263, despite TSC and pvclock looking sane.

Since these guests ran without PVCLOCK_TSC_STABLE_BIT set in their
CPUID, they were stuck with this large time value until real time
caught up (in about 250 years :).

We've been unable to reproduce the bug with kvm: x86: Fix kvm clock
versioning. so we didn't invest in catching the overflow in the act,
but a likely explanation is the guest gets save/restore-ed while
computing the pvclock delta:

u64 delta = __native_read_tsc() - src-tsc_timestamp;

causing the subtraction to underflow and delta to be huge.

 
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virt: kvm: arm: vgic: Return failure code '-EBUSY' when mutex_trylock() fails

2014-11-12 Thread Christoffer Dall
On Wed, Nov 12, 2014 at 11:04:23PM +0800, Chen Gang wrote:
 When mutex_trylock() fails, kvm_vgic_create() will not create 'vgic', so
 it need return failure code '-EBUSY' instead of '0' to let outside know
 about it.

I already sent a patch for the -EBUSY:
https://lists.cs.columbia.edu/pipermail/kvmarm/2014-November/011936.html

 
 Also simplify the code within kvm_vgic_create():
 
  - kvm_for_each_vcpu() scanning once is enough for current case.
 
  - Remove redundant variable 'vcpu_lock_idx'.

I don't like using the iterator variable for this kind of thing because
it can really break in languages where i is out-of-scope after the loop
and it is too easy to reuse the iterator variable in later versions of
the code.

That being said, the scanning once is slightly prettier I guess,
but I'd rather not introduce the churn unless others (Marc) think this
is a big win.

-Christoffer

 
 
 Signed-off-by: Chen Gang gang.chen.5...@gmail.com
 ---
  virt/kvm/arm/vgic.c | 15 +++
  1 file changed, 7 insertions(+), 8 deletions(-)
 
 diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
 index 3aaca49..5846725 100644
 --- a/virt/kvm/arm/vgic.c
 +++ b/virt/kvm/arm/vgic.c
 @@ -1933,7 +1933,7 @@ out:
  
  int kvm_vgic_create(struct kvm *kvm)
  {
 - int i, vcpu_lock_idx = -1, ret = 0;
 + int i, ret = 0;
   struct kvm_vcpu *vcpu;
  
   mutex_lock(kvm-lock);
 @@ -1949,13 +1949,12 @@ int kvm_vgic_create(struct kvm *kvm)
* that no other VCPUs are run while we create the vgic.
*/
   kvm_for_each_vcpu(i, vcpu, kvm) {
 - if (!mutex_trylock(vcpu-mutex))
 + if (!mutex_trylock(vcpu-mutex)) {
 + ret = -EBUSY;
   goto out_unlock;
 - vcpu_lock_idx = i;
 - }
 -
 - kvm_for_each_vcpu(i, vcpu, kvm) {
 + }
   if (vcpu-arch.has_run_once) {
 + mutex_unlock(vcpu-mutex);
   ret = -EBUSY;
   goto out_unlock;
   }
 @@ -1968,8 +1967,8 @@ int kvm_vgic_create(struct kvm *kvm)
   kvm-arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
  
  out_unlock:
 - for (; vcpu_lock_idx = 0; vcpu_lock_idx--) {
 - vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
 + while (i--  0) {
 + vcpu = kvm_get_vcpu(kvm, i);
   mutex_unlock(vcpu-mutex);
   }
  
 -- 
 1.9.3
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net] Revert drivers/net: Disable UFO through virtio in macvtap and tun

2014-11-12 Thread Stefan Hajnoczi
On Tue, Nov 11, 2014 at 05:12:58PM +, Ben Hutchings wrote:
 This reverts commit 88e0e0e5aa722b193c8758c8b45d041de5316924 for
 the tap drivers, but leaves UFO disabled in virtio_net.
 
 libvirt at least assumes that tap features will never be dropped
 in new kernel versions, and doing so prevents migration of VMs to
 the never kernel version while they are running with virtio net
 devices.
 
 Fixes: 88e0e0e5aa7a (drivers/net: Disable UFO through virtio)
 Signed-off-by: Ben Hutchings b...@decadent.org.uk
 ---
 Compile-tested only.

Jelle, care to test this and reply with Tested-by: Jelle de Jong
jelledej...@powercraft.nl if it solves the live migration problem you
reported?

It requires applying the patch to the host kernel on your virt01 host.

Thanks!

 Ben.
 
  drivers/net/macvtap.c | 13 -
  drivers/net/tun.c | 19 ---
  2 files changed, 16 insertions(+), 16 deletions(-)
 
 diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
 index 6f226de..aeaeb6d 100644
 --- a/drivers/net/macvtap.c
 +++ b/drivers/net/macvtap.c
 @@ -66,7 +66,7 @@ static struct cdev macvtap_cdev;
  static const struct proto_ops macvtap_socket_ops;
  
  #define TUN_OFFLOADS (NETIF_F_HW_CSUM | NETIF_F_TSO_ECN | NETIF_F_TSO | \
 -   NETIF_F_TSO6)
 +   NETIF_F_TSO6 | NETIF_F_UFO)
  #define RX_OFFLOADS (NETIF_F_GRO | NETIF_F_LRO)
  #define TAP_FEATURES (NETIF_F_GSO | NETIF_F_SG)
  
 @@ -570,8 +570,6 @@ static int macvtap_skb_from_vnet_hdr(struct sk_buff *skb,
   gso_type = SKB_GSO_TCPV6;
   break;
   case VIRTIO_NET_HDR_GSO_UDP:
 - pr_warn_once(macvtap: %s: using disabled UFO feature; 
 please fix this program\n,
 -  current-comm);
   gso_type = SKB_GSO_UDP;
   if (skb-protocol == htons(ETH_P_IPV6))
   ipv6_proxy_select_ident(skb);
 @@ -619,6 +617,8 @@ static void macvtap_skb_to_vnet_hdr(const struct sk_buff 
 *skb,
   vnet_hdr-gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
   else if (sinfo-gso_type  SKB_GSO_TCPV6)
   vnet_hdr-gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
 + else if (sinfo-gso_type  SKB_GSO_UDP)
 + vnet_hdr-gso_type = VIRTIO_NET_HDR_GSO_UDP;
   else
   BUG();
   if (sinfo-gso_type  SKB_GSO_TCP_ECN)
 @@ -953,6 +953,9 @@ static int set_offload(struct macvtap_queue *q, unsigned 
 long arg)
   if (arg  TUN_F_TSO6)
   feature_mask |= NETIF_F_TSO6;
   }
 +
 + if (arg  TUN_F_UFO)
 + feature_mask |= NETIF_F_UFO;
   }
  
   /* tun/tap driver inverts the usage for TSO offloads, where
 @@ -963,7 +966,7 @@ static int set_offload(struct macvtap_queue *q, unsigned 
 long arg)
* When user space turns off TSO, we turn off GSO/LRO so that
* user-space will not receive TSO frames.
*/
 - if (feature_mask  (NETIF_F_TSO | NETIF_F_TSO6))
 + if (feature_mask  (NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_UFO))
   features |= RX_OFFLOADS;
   else
   features = ~RX_OFFLOADS;
 @@ -1064,7 +1067,7 @@ static long macvtap_ioctl(struct file *file, unsigned 
 int cmd,
   case TUNSETOFFLOAD:
   /* let the user check for future flags */
   if (arg  ~(TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
 - TUN_F_TSO_ECN))
 + TUN_F_TSO_ECN | TUN_F_UFO))
   return -EINVAL;
  
   rtnl_lock();
 diff --git a/drivers/net/tun.c b/drivers/net/tun.c
 index 7302398..a0987d1 100644
 --- a/drivers/net/tun.c
 +++ b/drivers/net/tun.c
 @@ -175,7 +175,7 @@ struct tun_struct {
   struct net_device   *dev;
   netdev_features_t   set_features;
  #define TUN_USER_FEATURES (NETIF_F_HW_CSUM|NETIF_F_TSO_ECN|NETIF_F_TSO| \
 -   NETIF_F_TSO6)
 +   NETIF_F_TSO6|NETIF_F_UFO)
  
   int vnet_hdr_sz;
   int sndbuf;
 @@ -1152,20 +1152,10 @@ static ssize_t tun_get_user(struct tun_struct *tun, 
 struct tun_file *tfile,
   skb_shinfo(skb)-gso_type = SKB_GSO_TCPV6;
   break;
   case VIRTIO_NET_HDR_GSO_UDP:
 - {
 - static bool warned;
 -
 - if (!warned) {
 - warned = true;
 - netdev_warn(tun-dev,
 - %s: using disabled UFO feature; 
 please fix this program\n,
 - current-comm);
 - }
   skb_shinfo(skb)-gso_type = SKB_GSO_UDP;
   if (skb-protocol == htons(ETH_P_IPV6))
   

RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes

2014-11-12 Thread Zhang, Yang Z
Wu, Feng wrote on 2014-11-13:
 
 
 kvm-ow...@vger.kernel.org wrote on 2014-11-12:
 kvm@vger.kernel.org; io...@lists.linux-foundation.org; 
 linux-ker...@vger.kernel.org
 Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest 
 interrupt configuration changes
 
 
 
 On 12/11/2014 10:19, Wu, Feng wrote:
 You can certainly backport these patches to distros that do not 
 have VFIO.  But upstream we should work on VFIO first.  VFIO has 
 feature parity with legacy device assignment, and adding a new 
 feature that is not in VFIO would be a bad idea.
 
 By the way, do you have benchmark results for it?  We have not been 
 able to see any performance improvement for APICv on e.g. netperf.
 
 Do you mean benchmark results for APICv itself or VT-d Posted-Interrtups?
 
 Especially for VT-d posted interrupts---but it'd be great to know 
 which workloads see the biggest speedup from APICv.
 
 We have some draft performance data internally, please see the 
 attached. For VT-d PI, I think we can get the biggest performance gain 
 if the VCPU is running in non-root mode for most of the time (not in 
 HLT state), since external interrupt from assigned devices will be delivered 
 by guest directly in this case.
 That means we can run some cpu intensive workload in the guests.

Have you check that the CPU side posted interrupt is taking effect in w/o VT-D 
PI case? Per my understanding, the performance gap should be so large if you 
use CPU side posted interrupt. This data more like the VT-d PI vs non PI(both 
VT-d and CPU).

 
 Thanks,
 Feng
 
 
 Paolo
 --
 To unsubscribe from this list: send the line unsubscribe kvm in the 
 body of a message to majord...@vger.kernel.org More majordomo info at 
 http://vger.kernel.org/majordomo-info.html


Best regards,
Yang




RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes

2014-11-12 Thread Wu, Feng


 -Original Message-
 From: Zhang, Yang Z
 Sent: Thursday, November 13, 2014 9:21 AM
 To: Wu, Feng; Paolo Bonzini; Alex Williamson
 Cc: g...@kernel.org; dw...@infradead.org; j...@8bytes.org;
 t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org;
 kvm@vger.kernel.org; io...@lists.linux-foundation.org;
 linux-ker...@vger.kernel.org
 Subject: RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt
 configuration changes
 
 Wu, Feng wrote on 2014-11-13:
 
 
  kvm-ow...@vger.kernel.org wrote on 2014-11-12:
  kvm@vger.kernel.org; io...@lists.linux-foundation.org;
  linux-ker...@vger.kernel.org
  Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest
  interrupt configuration changes
 
 
 
  On 12/11/2014 10:19, Wu, Feng wrote:
  You can certainly backport these patches to distros that do not
  have VFIO.  But upstream we should work on VFIO first.  VFIO has
  feature parity with legacy device assignment, and adding a new
  feature that is not in VFIO would be a bad idea.
 
  By the way, do you have benchmark results for it?  We have not been
  able to see any performance improvement for APICv on e.g. netperf.
 
  Do you mean benchmark results for APICv itself or VT-d Posted-Interrtups?
 
  Especially for VT-d posted interrupts---but it'd be great to know
  which workloads see the biggest speedup from APICv.
 
  We have some draft performance data internally, please see the
  attached. For VT-d PI, I think we can get the biggest performance gain
  if the VCPU is running in non-root mode for most of the time (not in
  HLT state), since external interrupt from assigned devices will be 
  delivered by
 guest directly in this case.
  That means we can run some cpu intensive workload in the guests.
 
 Have you check that the CPU side posted interrupt is taking effect in w/o VT-D
 PI case? Per my understanding, the performance gap should be so large if you
 use CPU side posted interrupt. This data more like the VT-d PI vs non PI(both
 VT-d and CPU).

Yes, this data is VT-d PI vs Non VT-d PI. The CPU side APICv mechanism 
(including CPU side Posted-Interrtups) is enabled.

Thanks,
Feng

 
 
  Thanks,
  Feng
 
 
  Paolo
  --
  To unsubscribe from this list: send the line unsubscribe kvm in the
  body of a message to majord...@vger.kernel.org More majordomo info at
  http://vger.kernel.org/majordomo-info.html
 
 
 Best regards,
 Yang
 



RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes

2014-11-12 Thread Zhang, Yang Z
Wu, Feng wrote on 2014-11-13:
 
 
 Zhang, Yang Z wrote on 2014-11-13:
 kvm@vger.kernel.org; io...@lists.linux-foundation.org;
 linux-ker...@vger.kernel.org
 Subject: RE: [PATCH 05/13] KVM: Update IRTE according to guest
 interrupt configuration changes
 
 Wu, Feng wrote on 2014-11-13:
 
 
 kvm-ow...@vger.kernel.org wrote on 2014-11-12:
 kvm@vger.kernel.org; io...@lists.linux-foundation.org;
 linux-ker...@vger.kernel.org
 Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest
 interrupt configuration changes
 
 
 
 On 12/11/2014 10:19, Wu, Feng wrote:
 You can certainly backport these patches to distros that do not
 have VFIO.  But upstream we should work on VFIO first.  VFIO
 has feature parity with legacy device assignment, and adding a
 new feature that is not in VFIO would be a bad idea.
 
 By the way, do you have benchmark results for it?  We have not
 been able to see any performance improvement for APICv on e.g.
 netperf.
 
 Do you mean benchmark results for APICv itself or VT-d
 Posted-Interrtups?
 
 Especially for VT-d posted interrupts---but it'd be great to know
 which workloads see the biggest speedup from APICv.
 
 We have some draft performance data internally, please see the
 attached. For VT-d PI, I think we can get the biggest performance gain
 if the VCPU is running in non-root mode for most of the time (not in
 HLT state), since external interrupt from assigned devices will be
 delivered by guest directly in this case. That means we can run some
 cpu intensive workload in the guests.
 
 Have you check that the CPU side posted interrupt is taking effect
 in w/o VT-D PI case? Per my understanding, the performance gap
 should be so large if you use CPU side posted interrupt. This data
 more like the VT-d PI vs non PI(both VT-d and CPU).
 
 Yes, this data is VT-d PI vs Non VT-d PI. The CPU side APICv mechanism
 (including CPU side Posted-Interrtups) is enabled.

From the CPU utilization data, it seems the environment of APICv is not 
reasonable to me. with current APICv, the interrupt should not deliver to the 
PCPU where vcpu is running. Otherwise, it will force the vcpu vmexit and the 
CPU side posted interrupt cannot take effect. Do you set the interrupt affinity 
manually?

 
 Thanks,
 Feng
 
 
 
 Thanks,
 Feng
 
 
 Paolo
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org More majordomo
 info at http://vger.kernel.org/majordomo-info.html
 
 
 Best regards,
 Yang



Best regards,
Yang




Re: [PATCH] virt: kvm: arm: vgic: Return failure code '-EBUSY' when mutex_trylock() fails

2014-11-12 Thread Chen Gang
On 11/13/14 3:41, Christoffer Dall wrote:
 On Wed, Nov 12, 2014 at 11:04:23PM +0800, Chen Gang wrote:
 When mutex_trylock() fails, kvm_vgic_create() will not create 'vgic', so
 it need return failure code '-EBUSY' instead of '0' to let outside know
 about it.
 
 I already sent a patch for the -EBUSY:
 https://lists.cs.columbia.edu/pipermail/kvmarm/2014-November/011936.html
 

Yeah, really it is.


 Also simplify the code within kvm_vgic_create():

  - kvm_for_each_vcpu() scanning once is enough for current case.

  - Remove redundant variable 'vcpu_lock_idx'.
 
 I don't like using the iterator variable for this kind of thing because
 it can really break in languages where i is out-of-scope after the loop
 and it is too easy to reuse the iterator variable in later versions of
 the code.
 

For me, what you said is OK, we can still keep it no touch.

 That being said, the scanning once is slightly prettier I guess,
 but I'd rather not introduce the churn unless others (Marc) think this
 is a big win.
 

If only merge the 2 scanning loops, it will not change much. And also
can let code simpler and clearer for readers (both are for processing
and checking '-EBUSY').

If possible, after your patch merges into linux next tree, I will send
the related improving patch for it.

Thanks.

 -Christoffer
 


 Signed-off-by: Chen Gang gang.chen.5...@gmail.com
 ---
  virt/kvm/arm/vgic.c | 15 +++
  1 file changed, 7 insertions(+), 8 deletions(-)

 diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
 index 3aaca49..5846725 100644
 --- a/virt/kvm/arm/vgic.c
 +++ b/virt/kvm/arm/vgic.c
 @@ -1933,7 +1933,7 @@ out:
  
  int kvm_vgic_create(struct kvm *kvm)
  {
 -int i, vcpu_lock_idx = -1, ret = 0;
 +int i, ret = 0;
  struct kvm_vcpu *vcpu;
  
  mutex_lock(kvm-lock);
 @@ -1949,13 +1949,12 @@ int kvm_vgic_create(struct kvm *kvm)
   * that no other VCPUs are run while we create the vgic.
   */
  kvm_for_each_vcpu(i, vcpu, kvm) {
 -if (!mutex_trylock(vcpu-mutex))
 +if (!mutex_trylock(vcpu-mutex)) {
 +ret = -EBUSY;
  goto out_unlock;
 -vcpu_lock_idx = i;
 -}
 -
 -kvm_for_each_vcpu(i, vcpu, kvm) {
 +}
  if (vcpu-arch.has_run_once) {
 +mutex_unlock(vcpu-mutex);
  ret = -EBUSY;
  goto out_unlock;
  }
 @@ -1968,8 +1967,8 @@ int kvm_vgic_create(struct kvm *kvm)
  kvm-arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
  
  out_unlock:
 -for (; vcpu_lock_idx = 0; vcpu_lock_idx--) {
 -vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
 +while (i--  0) {
 +vcpu = kvm_get_vcpu(kvm, i);
  mutex_unlock(vcpu-mutex);
  }
  
 -- 
 1.9.3

-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


KVM: x86: add module parameter to disable periodic kvmclock sync

2014-11-12 Thread Marcelo Tosatti

The periodic kvmclock sync can be an undesired source of latencies.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0033df3..be56fd3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -98,6 +98,9 @@ module_param(ignore_msrs, bool, S_IRUGO | S_IWUSR);
 unsigned int min_timer_period_us = 500;
 module_param(min_timer_period_us, uint, S_IRUGO | S_IWUSR);
 
+static bool kvmclock_periodic_sync = 1;
+module_param(kvmclock_periodic_sync, bool, S_IRUGO | S_IWUSR);
+
 bool kvm_has_tsc_control;
 EXPORT_SYMBOL_GPL(kvm_has_tsc_control);
 u32  kvm_max_guest_tsc_khz;
@@ -1718,7 +1721,8 @@ static void kvmclock_sync_fn(struct work_struct *work)
struct kvm *kvm = container_of(ka, struct kvm, arch);
 
schedule_delayed_work(kvm-arch.kvmclock_update_work, 0);
-   schedule_delayed_work(kvm-arch.kvmclock_sync_work,
+   if (kvmclock_periodic_sync)
+   schedule_delayed_work(kvm-arch.kvmclock_sync_work,
KVMCLOCK_SYNC_PERIOD);
 }
 
@@ -6971,7 +6975,8 @@ int kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
kvm_write_tsc(vcpu, msr);
vcpu_put(vcpu);
 
-   schedule_delayed_work(kvm-arch.kvmclock_sync_work,
+   if (kvmclock_periodic_sync)
+   schedule_delayed_work(kvm-arch.kvmclock_sync_work,
KVMCLOCK_SYNC_PERIOD);
 
return r;


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvm: svm: move WARN_ON in svm_adjust_tsc_offset

2014-11-12 Thread Chris J Arges
When running the tsc_adjust kvm-unit-test on an AMD processor with the
IA32_TSC_ADJUST feature enabled, the WARN_ON in svm_adjust_tsc_offset can be
triggered. This WARN_ON checks for a negative adjustment in case __scale_tsc
is called; however it may trigger unnecessary warnings.

This patch moves the WARN_ON to trigger only if __scale_tsc will actually be
called from svm_adjust_tsc_offset. In addition make adj in kvm_set_msr_common
s64 since this can have signed values.

Signed-off-by: Chris J Arges chris.j.ar...@canonical.com
---
 arch/x86/kvm/svm.c | 8 +---
 arch/x86/kvm/x86.c | 2 +-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d4f3aaa..6b411ad 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1056,9 +1056,11 @@ static void svm_adjust_tsc_offset(struct kvm_vcpu *vcpu, 
s64 adjustment, bool ho
 {
struct vcpu_svm *svm = to_svm(vcpu);
 
-   WARN_ON(adjustment  0);
-   if (host)
-   adjustment = svm_scale_tsc(vcpu, adjustment);
+   if (host) {
+   if (svm-tsc_ratio != TSC_RATIO_DEFAULT)
+   WARN_ON(adjustment  0);
+   adjustment = svm_scale_tsc(vcpu, (u64)adjustment);
+   }
 
svm-vmcb-control.tsc_offset += adjustment;
if (is_guest_mode(vcpu))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f85da5c..1cd1376 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2141,7 +2141,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
case MSR_IA32_TSC_ADJUST:
if (guest_cpuid_has_tsc_adjust(vcpu)) {
if (!msr_info-host_initiated) {
-   u64 adj = data - vcpu-arch.ia32_tsc_adjust_msr;
+   s64 adj = data - vcpu-arch.ia32_tsc_adjust_msr;
kvm_x86_ops-adjust_tsc_offset(vcpu, adj, true);
}
vcpu-arch.ia32_tsc_adjust_msr = data;
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 01/11] sched: introduce sys_cpumask in tsk to adapt asymmetric system

2014-11-12 Thread Srikar Dronamraju
* kernelf...@gmail.com kernelf...@gmail.com [2014-10-16 15:29:50]:

 Some system such as powerpc, some tsk (vcpu thread) can only run on
 the dedicated cpu. Since we adapt some asymmetric method to monitor the
 whole physical cpu. (powerKVM only allows the primary hwthread to
 set up runtime env for the secondary when entering guest).
 
 Nowadays, powerKVM run with all the secondary hwthread offline to ensure
 the vcpu threads only run on the primary thread. But we plan to keep all
 cpus online when running powerKVM to give more power when switching back
 to host, so introduce sys_allowed cpumask to reflect the cpuset which
 the vcpu thread can run on.
 
 Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
 ---
  include/linux/init_task.h |  1 +
  include/linux/sched.h |  6 ++
  kernel/sched/core.c   | 10 --
  3 files changed, 15 insertions(+), 2 deletions(-)
 
 diff --git a/include/linux/init_task.h b/include/linux/init_task.h
 index 2bb4c4f3..c56f69e 100644
 --- a/include/linux/init_task.h
 +++ b/include/linux/init_task.h
 @@ -172,6 +172,7 @@ extern struct task_group root_task_group;
   .normal_prio= MAX_PRIO-20,  \
   .policy = SCHED_NORMAL, \
   .cpus_allowed   = CPU_MASK_ALL, \
 + .sys_allowed = CPU_MASK_ALL,\

Do we really need another mask, cant we just use cpus_allowed itself.

   .nr_cpus_allowed= NR_CPUS,  \
   .mm = NULL, \
   .active_mm  = init_mm, \
 diff --git a/include/linux/sched.h b/include/linux/sched.h
 index 5c2c885..ce429f3 100644
 --- a/include/linux/sched.h
 +++ b/include/linux/sched.h
 @@ -1260,7 +1260,10 @@ struct task_struct {
  
   unsigned int policy;
   int nr_cpus_allowed;
 + /* Anded user and sys_allowed */
   cpumask_t cpus_allowed;
 + /* due to the feature of asymmetric, some tsk can only run on such cpu 
 */
 + cpumask_t sys_allowed;
  
  #ifdef CONFIG_PREEMPT_RCU
   int rcu_read_lock_nesting;
 @@ -2030,6 +2033,9 @@ static inline void tsk_restore_flags(struct task_struct 
 *task,
  }
  
  #ifdef CONFIG_SMP
 +extern void set_cpus_sys_allowed(struct task_struct *p,
 + const struct cpumask *new_mask);
 +
  extern void do_set_cpus_allowed(struct task_struct *p,
  const struct cpumask *new_mask);
  
 diff --git a/kernel/sched/core.c b/kernel/sched/core.c
 index ec1a286..2cd1ae3 100644
 --- a/kernel/sched/core.c
 +++ b/kernel/sched/core.c
 @@ -4596,13 +4596,19 @@ void init_idle(struct task_struct *idle, int cpu)
  }
  
  #ifdef CONFIG_SMP
 +void set_cpus_sys_allowed(struct task_struct *p,
 + const struct cpumask *new_mask)
 +{
 + cpumask_copy(p-sys_allowed, new_mask);
 +}
 +

This function doesnt seem to be used anywhere... Not sure why it is
introduced

  void do_set_cpus_allowed(struct task_struct *p, const struct cpumask 
 *new_mask)
  {
   if (p-sched_class  p-sched_class-set_cpus_allowed)
   p-sched_class-set_cpus_allowed(p, new_mask);
  
 - cpumask_copy(p-cpus_allowed, new_mask);
 - p-nr_cpus_allowed = cpumask_weight(new_mask);
 + cpumask_and(p-cpus_allowed, p-sys_allowed, new_mask);
 + p-nr_cpus_allowed = cpumask_weight(p-cpus_allowed);
  }
  
  /*
 -- 
 1.8.3.1
 
 

-- 
Thanks and Regards
Srikar Dronamraju

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 02/11] powerpc: kvm: ensure vcpu-thread run only on primary hwthread

2014-11-12 Thread Srikar Dronamraju
* kernelf...@gmail.com kernelf...@gmail.com [2014-10-16 15:29:51]:

 When vcpu thread runs at the first time, it will ensure to stick
 to the primary thread.
 
 Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
 ---
  arch/powerpc/include/asm/kvm_host.h |  3 +++
  arch/powerpc/kvm/book3s_hv.c| 17 +
  2 files changed, 20 insertions(+)
 
 diff --git a/arch/powerpc/include/asm/kvm_host.h 
 b/arch/powerpc/include/asm/kvm_host.h
 index 98d9dd5..9a3355e 100644
 --- a/arch/powerpc/include/asm/kvm_host.h
 +++ b/arch/powerpc/include/asm/kvm_host.h
 @@ -666,6 +666,9 @@ struct kvm_vcpu_arch {
   spinlock_t tbacct_lock;
   u64 busy_stolen;
   u64 busy_preempt;
 +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
 + bool cpu_selected;
 +#endif
  #endif
  };
  
 diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
 index 27cced9..ba258c8 100644
 --- a/arch/powerpc/kvm/book3s_hv.c
 +++ b/arch/powerpc/kvm/book3s_hv.c
 @@ -1909,6 +1909,23 @@ static int kvmppc_vcpu_run_hv(struct kvm_run *run, 
 struct kvm_vcpu *vcpu)
  {
   int r;
   int srcu_idx;
 +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
 + int cpu = smp_processor_id();
 + int target_cpu;
 + unsigned int cpu;

2 variables with same name... cpu

 + struct task_struct *p = current;
 +
 + if (unlikely(!vcpu-arch.cpu_selected)) {
 + vcpu-arch.cpu_selected = true;

Nit: something like cpumask_set seems to be better than cpu_selected

 + for (cpu = 0; cpu  NR_CPUS; cpu+=threads_per_core) {
 + cpumask_set_cpu(cpu, p-sys_allowed);
Dont we need to reset the cpumask first before we set
the cpumask here?

 + }
 + if (cpu%threads_per_core != 0) {

At this time, cpu should be NR_CPUS and most times it should be a
multiple of threads_per_core. Unfortunately there wont be a cpu with
cpu number NR_CPUS.

 + target_cpu = cpu/threads_per_core*threads_per_core;

Its probably better of to have parenthesis here. 

 + migrate_task_to(current, target_cpu);
We are probably migrating to a non-existant cpu.
Also dont you need to check if the target_cpu is part of the cpumask?

 + }
 + }
 +#endif
  
   if (!vcpu-arch.sane) {
   run-exit_reason = KVM_EXIT_INTERNAL_ERROR;
 -- 
 1.8.3.1
 
 

-- 
Thanks and Regards
Srikar Dronamraju

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 03/11] powerpc: kvm: add interface to control kvm function on a core

2014-11-12 Thread Srikar Dronamraju
* kernelf...@gmail.com kernelf...@gmail.com [2014-10-16 15:29:52]:

 When kvm is enabled on a core, we migrate all external irq to primary
 thread. Since currently, the kvmirq logic is handled by the primary
 hwthread.
 
 Todo: this patch lacks re-enable of irqbalance when kvm is disable on
 the core
 
 Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
 ---
  arch/powerpc/kernel/sysfs.c| 39 
 ++
  arch/powerpc/sysdev/xics/xics-common.c | 12 +++
  2 files changed, 51 insertions(+)
 
 diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
 index 67fd2fd..a2595dd 100644
 --- a/arch/powerpc/kernel/sysfs.c
 +++ b/arch/powerpc/kernel/sysfs.c
 @@ -552,6 +552,45 @@ static void sysfs_create_dscr_default(void)
   if (cpu_has_feature(CPU_FTR_DSCR))
   err = device_create_file(cpu_subsys.dev_root, 
 dev_attr_dscr_default);
  }
 +
 +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
 +#define NR_CORES (CONFIG_NR_CPUS/threads_per_core)
 +static DECLARE_BITMAP(kvm_on_core, NR_CORES) __read_mostly
 +
 +static ssize_t show_kvm_enable(struct device *dev,
 + struct device_attribute *attr, char *buf)
 +{
 +}
 +
 +static ssize_t __used store_kvm_enable(struct device *dev,
 + struct device_attribute *attr, const char *buf,
 + size_t count)
 +{
 + struct cpumask stop_cpus;
 + unsigned long core, thr;
 +
 + sscanf(buf, %lx, core);
 + if (core  NR_CORES)
 + return -1;
 + if (!test_bit(core, kvm_on_core))
 + for (thr = 1; thr threads_per_core; thr++)
 + if (cpu_online(thr * threads_per_core + thr))
 + cpumask_set_cpu(thr * threads_per_core + thr, 
 stop_cpus);

Shouldnt this be 
if (cpu_online(core * threads_per_core + thr))
cpumask_set_cpu(core * threads_per_core + thr, 
stop_cpus);
?


-- 
Thanks and Regards
Srikar Dronamraju

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html