Re: compiler bug gcc4.6/4.7 with ACCESS_ONCE and workarounds
Am 12.11.2014 um 01:36 schrieb Linus Torvalds: On Tue, Nov 11, 2014 at 4:33 PM, Linus Torvalds torva...@linux-foundation.org wrote: I guess as a workaround it is fine, as long as we don't lose sight of trying to eventually do a better job. Oh, and when it comes to the actual gcc bug - do you have any reason to believe that it's somehow triggered more easily by something particular in the arch/s390/kvm/gaccess.c code? Yes there are reasons. First of all the bug if SRA removes the volatile tag, but that does not mean that this breaks things. As long as the operation is simple enough things will be mostly ok. If things are not simple like in gaccess things get more complicated. Lets look at the ipte lock. The lock itself consists of 3 parts: k (1 bit:locked), kh(31bit counter for the host) and kg(32 bit counter for the millicode when doing specific guest instructions). There are 3 valid states 1. k=0, kh=0; kg=0 2. k=1, kh!=0, kg=0 3. k=1, kh=0, kg!=0 So the host code must check if the guest counter is zero and can then set the k bit to one and increase the counter. (for unlock it has to decrement kh and if that becomes zero also zero the k bit) That means that we have multiple accesses to subcomponents. As the host counter is bit 1-31 (ibm speak, linux speak bit 32-62) gcc wants to use the load thirty one bit instruction. So far so good. The ticket lock is also not a trivial set/clear bit. Now: In gcc the memory costs for s390 are modeled to have the same costs as register accesses (TARGET_MEMORY_MOVE_COST==1, TARGET_REGISTER_MOVE_COST=1) So for gcc a re-loading of a part of the lock from memory costs the same as loading it from a register. That probably triggered that bug. Christian IOW, why does this problem not hit the x86 spinlocks that also use volatile pointers to aggregate types? Or does it? I think we would have noticed if that hits. But there are certainly cases where this bug triggers also on x86, see the initial bug report of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145 This bug is certainly different, (instead of transforming one load into multiple loads , it combines multiple write into one) but it shows, that a volatile marker is removed. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Live migration locks up 3.2 guests in do_timer(ticks ~ 500000)
On 11/11/2014 20:57, Matt Mullins wrote: That seems to work great, yes. Looking through the commit history, I see: kvmclock: Ensure time in migration never goes backward kvmclock: Ensure proper env-tsc value for kvmclock_current_nsec calculation Assuming those are the right fixes for this issue, are they suitable to request backported to distros' qemu 2.0 branches? The merge commit for them seemed to indicate they were problematic at first. On second thought - I found the qemu-devel threads about reverting them in the 2.1 timeframe, so I'm going to do a little more research before I start trying to suggest how to fix for existing install-base. The right commits are de9d61e83d43be9069e6646fa9d57a3f47779d28 317b0a6d8ba44e9bf8f9c3dbd776c4536843d82c 9a48bcd1b82494671c09b0eefdb882581499 which are similar but not equivalent to the two commits you found. They should be in 2.1.3 if there will be one, and they are appropriate for backporting to 2.0. Thanks for confirming that they fix your problem! Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
On 12/11/2014 04:42, Zhang, Yang Z wrote: Personally, I think this feature will be helpful to the legacy device assignment. Agree, vfio is the right solution for future feature enabling. But the old kvm without the good vfio supporting is still used largely today. The user really looking for this feature but they will not upgrade their kernel. It's easy for us to backport this feature to old kvm with the legacy device assignment, but it is impossible to backport the whole vfio. You can certainly backport these patches to distros that do not have VFIO. But upstream we should work on VFIO first. VFIO has feature parity with legacy device assignment, and adding a new feature that is not in VFIO would be a bad idea. By the way, do you have benchmark results for it? We have not been able to see any performance improvement for APICv on e.g. netperf. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
-Original Message- From: Paolo Bonzini [mailto:pbonz...@redhat.com] Sent: Wednesday, November 12, 2014 5:14 PM To: Zhang, Yang Z; Wu, Feng; Alex Williamson Cc: g...@kernel.org; dw...@infradead.org; j...@8bytes.org; t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org; kvm@vger.kernel.org; io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes On 12/11/2014 04:42, Zhang, Yang Z wrote: Personally, I think this feature will be helpful to the legacy device assignment. Agree, vfio is the right solution for future feature enabling. But the old kvm without the good vfio supporting is still used largely today. The user really looking for this feature but they will not upgrade their kernel. It's easy for us to backport this feature to old kvm with the legacy device assignment, but it is impossible to backport the whole vfio. You can certainly backport these patches to distros that do not have VFIO. But upstream we should work on VFIO first. VFIO has feature parity with legacy device assignment, and adding a new feature that is not in VFIO would be a bad idea. By the way, do you have benchmark results for it? We have not been able to see any performance improvement for APICv on e.g. netperf. Do you mean benchmark results for APICv itself or VT-d Posted-Interrtups? Thanks, Feng Paolo
Re: compiler bug gcc4.6/4.7 with ACCESS_ONCE and workarounds
On Tue, 11 Nov 2014 16:36:06 -0800 Linus Torvalds torva...@linux-foundation.org wrote: On Tue, Nov 11, 2014 at 4:33 PM, Linus Torvalds torva...@linux-foundation.org wrote: I guess as a workaround it is fine, as long as we don't lose sight of trying to eventually do a better job. Oh, and when it comes to the actual gcc bug - do you have any reason to believe that it's somehow triggered more easily by something particular in the arch/s390/kvm/gaccess.c code? IOW, why does this problem not hit the x86 spinlocks that also use volatile pointers to aggregate types? Or does it? This looks similiar to what we had on s390: old.tickets = ACCESS_ONCE(lock-tickets) In theory x86 should be affected as well. On s390 we have lots of instruction that operate on memory and the cost model of gcc makes the compiler more inclined to access memory multiple times. My guess would be that once the value is cached in a register the cost model for x86 will usually make sure that the value is not read a second time. But this is no guarantee. -- blue skies, Martin. Reality continues to ruin my life. - Calvin. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM module to Kconfig
On 28 October 2014 02:07, Antonios Motakis a.mota...@virtualopensystems.com wrote: Enable building the VFIO PLATFORM driver that allows to use Linux platform devices with VFIO. Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com --- drivers/vfio/Kconfig | 1 + drivers/vfio/Makefile | 1 + drivers/vfio/platform/Kconfig | 9 + drivers/vfio/platform/Makefile | 4 4 files changed, 15 insertions(+) create mode 100644 drivers/vfio/platform/Kconfig create mode 100644 drivers/vfio/platform/Makefile diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index a0abe04..962fb80 100644 --- a/drivers/vfio/Kconfig +++ b/drivers/vfio/Kconfig @@ -27,3 +27,4 @@ menuconfig VFIO If you don't know what to do here, say N. source drivers/vfio/pci/Kconfig +source drivers/vfio/platform/Kconfig diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile index 0b035b1..dadf0ca 100644 --- a/drivers/vfio/Makefile +++ b/drivers/vfio/Makefile @@ -3,3 +3,4 @@ obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o obj-$(CONFIG_VFIO_PCI) += pci/ +obj-$(CONFIG_VFIO_PLATFORM) += platform/ diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig new file mode 100644 index 000..c51af17 --- /dev/null +++ b/drivers/vfio/platform/Kconfig @@ -0,0 +1,9 @@ +config VFIO_PLATFORM + tristate VFIO support for platform devices + depends on VFIO EVENTFD ARM Hi Antonios, Is this only for ARM? how about X86 and PowerPC? On Freescale's PowerPC platform, the IOMMU is called PAMU (Peripheral Access Management Unit), and I am trying to use this VFIO framework on it. + help + Support for platform devices with VFIO. This is required to make + use of platform devices present on the system using the VFIO + framework. + + If you don't know what to do here, say N. diff --git a/drivers/vfio/platform/Makefile b/drivers/vfio/platform/Makefile new file mode 100644 index 000..279862b --- /dev/null +++ b/drivers/vfio/platform/Makefile @@ -0,0 +1,4 @@ + +vfio-platform-y := vfio_platform.o vfio_platform_common.o + +obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o -- 2.1.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
On 12/11/2014 10:19, Wu, Feng wrote: You can certainly backport these patches to distros that do not have VFIO. But upstream we should work on VFIO first. VFIO has feature parity with legacy device assignment, and adding a new feature that is not in VFIO would be a bad idea. By the way, do you have benchmark results for it? We have not been able to see any performance improvement for APICv on e.g. netperf. Do you mean benchmark results for APICv itself or VT-d Posted-Interrtups? Especially for VT-d posted interrupts---but it'd be great to know which workloads see the biggest speedup from APICv. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM module to Kconfig
Hello Hongbo, On Wed, Nov 12, 2014 at 10:52 AM, Hongbo Zhang hongbo.zh...@linaro.org wrote: On 28 October 2014 02:07, Antonios Motakis a.mota...@virtualopensystems.com wrote: Enable building the VFIO PLATFORM driver that allows to use Linux platform devices with VFIO. Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com --- drivers/vfio/Kconfig | 1 + drivers/vfio/Makefile | 1 + drivers/vfio/platform/Kconfig | 9 + drivers/vfio/platform/Makefile | 4 4 files changed, 15 insertions(+) create mode 100644 drivers/vfio/platform/Kconfig create mode 100644 drivers/vfio/platform/Makefile diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index a0abe04..962fb80 100644 --- a/drivers/vfio/Kconfig +++ b/drivers/vfio/Kconfig @@ -27,3 +27,4 @@ menuconfig VFIO If you don't know what to do here, say N. source drivers/vfio/pci/Kconfig +source drivers/vfio/platform/Kconfig diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile index 0b035b1..dadf0ca 100644 --- a/drivers/vfio/Makefile +++ b/drivers/vfio/Makefile @@ -3,3 +3,4 @@ obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o obj-$(CONFIG_VFIO_PCI) += pci/ +obj-$(CONFIG_VFIO_PLATFORM) += platform/ diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig new file mode 100644 index 000..c51af17 --- /dev/null +++ b/drivers/vfio/platform/Kconfig @@ -0,0 +1,9 @@ +config VFIO_PLATFORM + tristate VFIO support for platform devices + depends on VFIO EVENTFD ARM Hi Antonios, Is this only for ARM? how about X86 and PowerPC? On Freescale's PowerPC platform, the IOMMU is called PAMU (Peripheral Access Management Unit), and I am trying to use this VFIO framework on it. In principle it should be working on any platform with such devices; as long as you have a VFIO IOMMU driver for the PAMU (on ARM we use VFIO PLATFORM for the device, with VFIO IOMMU TYPE1 for the IOMMU). So if you have a suitable IOMMU driver for your target, feel free to test it, and let us know of the results. + help + Support for platform devices with VFIO. This is required to make + use of platform devices present on the system using the VFIO + framework. + + If you don't know what to do here, say N. diff --git a/drivers/vfio/platform/Makefile b/drivers/vfio/platform/Makefile new file mode 100644 index 000..279862b --- /dev/null +++ b/drivers/vfio/platform/Makefile @@ -0,0 +1,4 @@ + +vfio-platform-y := vfio_platform.o vfio_platform_common.o + +obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o -- 2.1.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM module to Kconfig
On 12 November 2014 18:05, bharat.bhus...@freescale.com bharat.bhus...@freescale.com wrote: Hi, This is not yet supported on Freescale PowerPC. I am still in process of upstreaming the FSL PAMU specific patches for same. Initial plan is to test with PCIe devices and then with Platform devices. I see there is already driver/iommu/fsl_pamu.c, doesn't it work? Could you explain briefly what is wrong? I've heard that the vfio pci works on powerpc platforms Thanks. Thanks -Bharat From: kvmarm-boun...@lists.cs.columbia.edu [mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Hongbo Zhang Sent: Wednesday, November 12, 2014 3:08 PM To: Antonios Motakis Cc: open list:VFIO DRIVER; will.dea...@arm.com; alex.william...@redhat.com; open list; io...@lists.linux-foundation.org; t...@virtualopensystems.com; kvm...@lists.cs.columbia.edu Subject: Re: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM module to Kconfig On 28 October 2014 02:07, Antonios Motakis a.mota...@virtualopensystems.com wrote: Enable building the VFIO PLATFORM driver that allows to use Linux platform devices with VFIO. Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com --- drivers/vfio/Kconfig | 1 + drivers/vfio/Makefile | 1 + drivers/vfio/platform/Kconfig | 9 + drivers/vfio/platform/Makefile | 4 4 files changed, 15 insertions(+) create mode 100644 drivers/vfio/platform/Kconfig create mode 100644 drivers/vfio/platform/Makefile diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index a0abe04..962fb80 100644 --- a/drivers/vfio/Kconfig +++ b/drivers/vfio/Kconfig @@ -27,3 +27,4 @@ menuconfig VFIO If you don't know what to do here, say N. source drivers/vfio/pci/Kconfig +source drivers/vfio/platform/Kconfig diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile index 0b035b1..dadf0ca 100644 --- a/drivers/vfio/Makefile +++ b/drivers/vfio/Makefile @@ -3,3 +3,4 @@ obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o obj-$(CONFIG_VFIO_PCI) += pci/ +obj-$(CONFIG_VFIO_PLATFORM) += platform/ diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig new file mode 100644 index 000..c51af17 --- /dev/null +++ b/drivers/vfio/platform/Kconfig @@ -0,0 +1,9 @@ +config VFIO_PLATFORM + tristate VFIO support for platform devices + depends on VFIO EVENTFD ARM Hi Antonios, Is this only for ARM? how about X86 and PowerPC? On Freescale's PowerPC platform, the IOMMU is called PAMU (Peripheral Access Management Unit), and I am trying to use this VFIO framework on it. + help + Support for platform devices with VFIO. This is required to make + use of platform devices present on the system using the VFIO + framework. + + If you don't know what to do here, say N. diff --git a/drivers/vfio/platform/Makefile b/drivers/vfio/platform/Makefile new file mode 100644 index 000..279862b --- /dev/null +++ b/drivers/vfio/platform/Makefile @@ -0,0 +1,4 @@ + +vfio-platform-y := vfio_platform.o vfio_platform_common.o + +obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o -- 2.1.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM module to Kconfig
-Original Message- From: Hongbo Zhang [mailto:hongbo.zh...@linaro.org] Sent: Wednesday, November 12, 2014 4:09 PM To: Bhushan Bharat-R65777 Cc: Antonios Motakis; open list:VFIO DRIVER; will.dea...@arm.com; alex.william...@redhat.com; open list; io...@lists.linux-foundation.org; t...@virtualopensystems.com; kvm...@lists.cs.columbia.edu Subject: Re: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM module to Kconfig On 12 November 2014 18:05, bharat.bhus...@freescale.com bharat.bhus...@freescale.com wrote: Hi, This is not yet supported on Freescale PowerPC. I am still in process of upstreaming the FSL PAMU specific patches for same. Initial plan is to test with PCIe devices and then with Platform devices. I see there is already driver/iommu/fsl_pamu.c, doesn't it work? We need VFIO iommu driver for same. Could you explain briefly what is wrong? I've heard that the vfio pci works on powerpc platforms. Yes, patches are in Freescale internal git repository. But those patches are yet to be upstreamed. I have started working on same. Thanks -Bharat Thanks -Bharat From: kvmarm-boun...@lists.cs.columbia.edu [mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Hongbo Zhang Sent: Wednesday, November 12, 2014 3:08 PM To: Antonios Motakis Cc: open list:VFIO DRIVER; will.dea...@arm.com; alex.william...@redhat.com; open list; io...@lists.linux-foundation.org; t...@virtualopensystems.com; kvm...@lists.cs.columbia.edu Subject: Re: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM module to Kconfig On 28 October 2014 02:07, Antonios Motakis a.mota...@virtualopensystems.com wrote: Enable building the VFIO PLATFORM driver that allows to use Linux platform devices with VFIO. Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com --- drivers/vfio/Kconfig | 1 + drivers/vfio/Makefile | 1 + drivers/vfio/platform/Kconfig | 9 + drivers/vfio/platform/Makefile | 4 4 files changed, 15 insertions(+) create mode 100644 drivers/vfio/platform/Kconfig create mode 100644 drivers/vfio/platform/Makefile diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index a0abe04..962fb80 100644 --- a/drivers/vfio/Kconfig +++ b/drivers/vfio/Kconfig @@ -27,3 +27,4 @@ menuconfig VFIO If you don't know what to do here, say N. source drivers/vfio/pci/Kconfig +source drivers/vfio/platform/Kconfig diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile index 0b035b1..dadf0ca 100644 --- a/drivers/vfio/Makefile +++ b/drivers/vfio/Makefile @@ -3,3 +3,4 @@ obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o obj-$(CONFIG_VFIO_PCI) += pci/ +obj-$(CONFIG_VFIO_PLATFORM) += platform/ diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig new file mode 100644 index 000..c51af17 --- /dev/null +++ b/drivers/vfio/platform/Kconfig @@ -0,0 +1,9 @@ +config VFIO_PLATFORM + tristate VFIO support for platform devices + depends on VFIO EVENTFD ARM Hi Antonios, Is this only for ARM? how about X86 and PowerPC? On Freescale's PowerPC platform, the IOMMU is called PAMU (Peripheral Access Management Unit), and I am trying to use this VFIO framework on it. + help + Support for platform devices with VFIO. This is required to make + use of platform devices present on the system using the VFIO + framework. + + If you don't know what to do here, say N. diff --git a/drivers/vfio/platform/Makefile b/drivers/vfio/platform/Makefile new file mode 100644 index 000..279862b --- /dev/null +++ b/drivers/vfio/platform/Makefile @@ -0,0 +1,4 @@ + +vfio-platform-y := vfio_platform.o vfio_platform_common.o + +obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o -- 2.1.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM module to Kconfig
On 12 November 2014 19:00, bharat.bhus...@freescale.com bharat.bhus...@freescale.com wrote: -Original Message- From: Hongbo Zhang [mailto:hongbo.zh...@linaro.org] Sent: Wednesday, November 12, 2014 4:09 PM To: Bhushan Bharat-R65777 Cc: Antonios Motakis; open list:VFIO DRIVER; will.dea...@arm.com; alex.william...@redhat.com; open list; io...@lists.linux-foundation.org; t...@virtualopensystems.com; kvm...@lists.cs.columbia.edu Subject: Re: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM module to Kconfig On 12 November 2014 18:05, bharat.bhus...@freescale.com bharat.bhus...@freescale.com wrote: Hi, This is not yet supported on Freescale PowerPC. I am still in process of upstreaming the FSL PAMU specific patches for same. Initial plan is to test with PCIe devices and then with Platform devices. I see there is already driver/iommu/fsl_pamu.c, doesn't it work? We need VFIO iommu driver for same. Could you explain briefly what is wrong? I've heard that the vfio pci works on powerpc platforms. Yes, patches are in Freescale internal git repository. But those patches are yet to be upstreamed. I have started working on same. (I come from Freescale too, just currently assigned to Linaro) Oh, yes I now see vfio iommu driver in the internal git repo, but not in the community one. It seems I have to select another platform within Linaro to work with. Thanks -Bharat Thanks -Bharat From: kvmarm-boun...@lists.cs.columbia.edu [mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Hongbo Zhang Sent: Wednesday, November 12, 2014 3:08 PM To: Antonios Motakis Cc: open list:VFIO DRIVER; will.dea...@arm.com; alex.william...@redhat.com; open list; io...@lists.linux-foundation.org; t...@virtualopensystems.com; kvm...@lists.cs.columbia.edu Subject: Re: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM module to Kconfig On 28 October 2014 02:07, Antonios Motakis a.mota...@virtualopensystems.com wrote: Enable building the VFIO PLATFORM driver that allows to use Linux platform devices with VFIO. Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com --- drivers/vfio/Kconfig | 1 + drivers/vfio/Makefile | 1 + drivers/vfio/platform/Kconfig | 9 + drivers/vfio/platform/Makefile | 4 4 files changed, 15 insertions(+) create mode 100644 drivers/vfio/platform/Kconfig create mode 100644 drivers/vfio/platform/Makefile diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index a0abe04..962fb80 100644 --- a/drivers/vfio/Kconfig +++ b/drivers/vfio/Kconfig @@ -27,3 +27,4 @@ menuconfig VFIO If you don't know what to do here, say N. source drivers/vfio/pci/Kconfig +source drivers/vfio/platform/Kconfig diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile index 0b035b1..dadf0ca 100644 --- a/drivers/vfio/Makefile +++ b/drivers/vfio/Makefile @@ -3,3 +3,4 @@ obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o obj-$(CONFIG_VFIO_PCI) += pci/ +obj-$(CONFIG_VFIO_PLATFORM) += platform/ diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig new file mode 100644 index 000..c51af17 --- /dev/null +++ b/drivers/vfio/platform/Kconfig @@ -0,0 +1,9 @@ +config VFIO_PLATFORM + tristate VFIO support for platform devices + depends on VFIO EVENTFD ARM Hi Antonios, Is this only for ARM? how about X86 and PowerPC? On Freescale's PowerPC platform, the IOMMU is called PAMU (Peripheral Access Management Unit), and I am trying to use this VFIO framework on it. + help + Support for platform devices with VFIO. This is required to make + use of platform devices present on the system using the VFIO + framework. + + If you don't know what to do here, say N. diff --git a/drivers/vfio/platform/Makefile b/drivers/vfio/platform/Makefile new file mode 100644 index 000..279862b --- /dev/null +++ b/drivers/vfio/platform/Makefile @@ -0,0 +1,4 @@ + +vfio-platform-y := vfio_platform.o vfio_platform_common.o + +obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o -- 2.1.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Seeking a KVM benchmark
On 10/11/2014 18:38, Gleb Natapov wrote: On Mon, Nov 10, 2014 at 06:28:25PM +0100, Paolo Bonzini wrote: On 10/11/2014 15:23, Avi Kivity wrote: It's not surprising [1]. Since the meaning of some PTE bits change [2], the TLB has to be flushed. In VMX we have VPIDs, so we only need to flush if EFER changed between two invocations of the same VPID, which isn't the case. [1] after the fact [2] although those bits were reserved with NXE=0, so they shouldn't have any TLB footprint You're right that this is not that surprising after the fact, and that both Sandy Bridge and Ivy Bridge have VPIDs (even the non-Xeon ones). This is also why I'm curious about the Nehalem. However note that even toggling the SCE bit is flushing the TLB. The NXE bit is not being toggled here! That's the more surprising part. Just a guess, but may be because writing EFER is not something that happens often in regular OSes it is not optimized to handle different bits differently. Yes, that's what Intel said too. Nehalem results: userspace exit, urn 17560 17726 17628 17572 17417 lightweight exit, urn 3316 3342 3342 3319 3328 userspace exit, LOAD_EFER, guest!=host 12200 11772 12130 12164 12327 lightweight exit, LOAD_EFER, guest!=host 3214 3220 3238 3218 3337 userspace exit, LOAD_EFER, guest=host11983 11780 11920 11919 12040 lightweight exit, LOAD_EFER, guest=host 3178 3193 3193 3187 3220 So the benchmark results also explain why skipping the LOAD_EFER does not give a benefit for guest EFER=host EFER. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] x86, kvm, vmx: Always use LOAD_IA32_EFER if available
On 08/11/2014 03:25, Andy Lutomirski wrote: At least on Sandy Bridge, letting the CPU switch IA32_EFER is much faster than switching it manually. I benchmarked this using the vmexit kvm-unit-test (single run, but GOAL multiplied by 5 to do more iterations): Test Before AfterChange cpuid 2000 1932-3.40% vmcall 1914 1817-5.07% mov_from_cr8 13 13 0.00% mov_to_cr819 19 0.00% inl_from_pmtimer 19164 10619 -44.59% inl_from_qemu 15662 10302 -34.22% inl_from_kernel 3916 3802-2.91% outl_to_kernel 2230 2194-1.61% mov_dr 172176 2.33% ipi(skipped) (skipped) ipi+halt (skipped) (skipped) ple-round-robin 13 13 0.00% wr_tsc_adjust_msr 1920 1845-3.91% rd_tsc_adjust_msr 1892 1814-4.12% mmio-no-eventfd:pci-mem16394 11165 -31.90% mmio-wildcard-eventfd:pci-mem 4607 4645 0.82% mmio-datamatch-eventfd:pci-mem 4601 4610 0.20% portio-no-eventfd:pci-io 11507 7942 -30.98% portio-wildcard-eventfd:pci-io 2239 2225-0.63% portio-datamatch-eventfd:pci-io 2250 2234-0.71% I haven't explicitly computed the significance of these numbers, but this isn't subtle. Signed-off-by: Andy Lutomirski l...@amacapital.net --- arch/x86/kvm/vmx.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 3e556c68351b..e72b9660e51c 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -1659,8 +1659,14 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset) vmx-guest_msrs[efer_offset].mask = ~ignore_bits; clear_atomic_switch_msr(vmx, MSR_EFER); - /* On ept, can't emulate nx, and must switch nx atomically */ - if (enable_ept ((vmx-vcpu.arch.efer ^ host_efer) EFER_NX)) { + + /* + * On EPT, we can't emulate NX, so we must switch EFER atomically. + * On CPUs that support load IA32_EFER, always switch EFER + * atomically, since it's faster than switching it manually. + */ + if (cpu_has_load_ia32_efer || + (enable_ept ((vmx-vcpu.arch.efer ^ host_efer) EFER_NX))) { guest_efer = vmx-vcpu.arch.efer; if (!(guest_efer EFER_LMA)) guest_efer = ~EFER_LME; I am committing this patch, with an additional remark in the commit message: The results were reproducible on all of Nehalem, Sandy Bridge and Ivy Bridge. The slowness of manual switching is because writing to EFER with WRMSR triggers a TLB flush, even if the only bit you're touching is SCE (so the page table format is not affected). Doing the write as part of vmentry/vmexit, instead, does not flush the TLB, probably because all processors that have EPT also have VPID. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] virt: kvm: arm: vgic: Return failure code '-EBUSY' when mutex_trylock() fails
When mutex_trylock() fails, kvm_vgic_create() will not create 'vgic', so it need return failure code '-EBUSY' instead of '0' to let outside know about it. Also simplify the code within kvm_vgic_create(): - kvm_for_each_vcpu() scanning once is enough for current case. - Remove redundant variable 'vcpu_lock_idx'. Signed-off-by: Chen Gang gang.chen.5...@gmail.com --- virt/kvm/arm/vgic.c | 15 +++ 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 3aaca49..5846725 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -1933,7 +1933,7 @@ out: int kvm_vgic_create(struct kvm *kvm) { - int i, vcpu_lock_idx = -1, ret = 0; + int i, ret = 0; struct kvm_vcpu *vcpu; mutex_lock(kvm-lock); @@ -1949,13 +1949,12 @@ int kvm_vgic_create(struct kvm *kvm) * that no other VCPUs are run while we create the vgic. */ kvm_for_each_vcpu(i, vcpu, kvm) { - if (!mutex_trylock(vcpu-mutex)) + if (!mutex_trylock(vcpu-mutex)) { + ret = -EBUSY; goto out_unlock; - vcpu_lock_idx = i; - } - - kvm_for_each_vcpu(i, vcpu, kvm) { + } if (vcpu-arch.has_run_once) { + mutex_unlock(vcpu-mutex); ret = -EBUSY; goto out_unlock; } @@ -1968,8 +1967,8 @@ int kvm_vgic_create(struct kvm *kvm) kvm-arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF; out_unlock: - for (; vcpu_lock_idx = 0; vcpu_lock_idx--) { - vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx); + while (i-- 0) { + vcpu = kvm_get_vcpu(kvm, i); mutex_unlock(vcpu-mutex); } -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Seeking a KVM benchmark
On Wed, Nov 12, 2014 at 12:33:32PM +0100, Paolo Bonzini wrote: On 10/11/2014 18:38, Gleb Natapov wrote: On Mon, Nov 10, 2014 at 06:28:25PM +0100, Paolo Bonzini wrote: On 10/11/2014 15:23, Avi Kivity wrote: It's not surprising [1]. Since the meaning of some PTE bits change [2], the TLB has to be flushed. In VMX we have VPIDs, so we only need to flush if EFER changed between two invocations of the same VPID, which isn't the case. [1] after the fact [2] although those bits were reserved with NXE=0, so they shouldn't have any TLB footprint You're right that this is not that surprising after the fact, and that both Sandy Bridge and Ivy Bridge have VPIDs (even the non-Xeon ones). This is also why I'm curious about the Nehalem. However note that even toggling the SCE bit is flushing the TLB. The NXE bit is not being toggled here! That's the more surprising part. Just a guess, but may be because writing EFER is not something that happens often in regular OSes it is not optimized to handle different bits differently. Yes, that's what Intel said too. Nehalem results: userspace exit, urn 17560 17726 17628 17572 17417 lightweight exit, urn 3316 3342 3342 3319 3328 userspace exit, LOAD_EFER, guest!=host 12200 11772 12130 12164 12327 lightweight exit, LOAD_EFER, guest!=host 3214 3220 3238 3218 3337 userspace exit, LOAD_EFER, guest=host11983 11780 11920 11919 12040 lightweight exit, LOAD_EFER, guest=host 3178 3193 3193 3187 3220 Is this with Andy's patch that skips LOAD_EFER when guest=host, or the one that always switch LOAD_EFER? -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Seeking a KVM benchmark
On 12/11/2014 16:22, Gleb Natapov wrote: Nehalem results: userspace exit, urn 17560 17726 17628 17572 17417 lightweight exit, urn 3316 3342 3342 3319 3328 userspace exit, LOAD_EFER, guest!=host 12200 11772 12130 12164 12327 lightweight exit, LOAD_EFER, guest!=host 3214 3220 3238 3218 3337 userspace exit, LOAD_EFER, guest=host11983 11780 11920 11919 12040 lightweight exit, LOAD_EFER, guest=host 3178 3193 3193 3187 3220 Is this with Andy's patch that skips LOAD_EFER when guest=host, or the one that always switch LOAD_EFER? Skip LOAD_EFER when guest=host. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Seeking a KVM benchmark
On Wed, Nov 12, 2014 at 04:26:29PM +0100, Paolo Bonzini wrote: On 12/11/2014 16:22, Gleb Natapov wrote: Nehalem results: userspace exit, urn 17560 17726 17628 17572 17417 lightweight exit, urn 3316 3342 3342 3319 3328 userspace exit, LOAD_EFER, guest!=host 12200 11772 12130 12164 12327 lightweight exit, LOAD_EFER, guest!=host 3214 3220 3238 3218 3337 userspace exit, LOAD_EFER, guest=host11983 11780 11920 11919 12040 lightweight exit, LOAD_EFER, guest=host 3178 3193 3193 3187 3220 Is this with Andy's patch that skips LOAD_EFER when guest=host, or the one that always switch LOAD_EFER? Skip LOAD_EFER when guest=host. So guest=host is a little bit better than guest!=host so looks like skipping LOAD_EFER helps, but why lightweight exit, urn worse than guest=host though, it should be exactly the same as long as NX bit is the same in urn test, no? -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Seeking a KVM benchmark
On 12/11/2014 16:32, Gleb Natapov wrote: userspace exit, urn 17560 17726 17628 17572 17417 lightweight exit, urn 3316 3342 3342 3319 3328 userspace exit, LOAD_EFER, guest!=host 12200 11772 12130 12164 12327 lightweight exit, LOAD_EFER, guest!=host 3214 3220 3238 3218 3337 userspace exit, LOAD_EFER, guest=host11983 11780 11920 11919 12040 lightweight exit, LOAD_EFER, guest=host 3178 3193 3193 3187 3220 Is this with Andy's patch that skips LOAD_EFER when guest=host, or the one that always switch LOAD_EFER? Skip LOAD_EFER when guest=host. So guest=host is a little bit better than guest!=host so looks like skipping LOAD_EFER helps, but why lightweight exit, urn worse than guest=host though, it should be exactly the same as long as NX bit is the same in urn test, no? I don't know---it is very much reproducible though. It is not my machine so I cannot run perf on it, but I can try to find a similar one in the next few days. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Seeking a KVM benchmark
On Wed, Nov 12, 2014 at 7:51 AM, Paolo Bonzini pbonz...@redhat.com wrote: On 12/11/2014 16:32, Gleb Natapov wrote: userspace exit, urn 17560 17726 17628 17572 17417 lightweight exit, urn 3316 3342 3342 3319 3328 userspace exit, LOAD_EFER, guest!=host 12200 11772 12130 12164 12327 lightweight exit, LOAD_EFER, guest!=host 3214 3220 3238 3218 3337 userspace exit, LOAD_EFER, guest=host11983 11780 11920 11919 12040 lightweight exit, LOAD_EFER, guest=host 3178 3193 3193 3187 3220 Is this with Andy's patch that skips LOAD_EFER when guest=host, or the one that always switch LOAD_EFER? Skip LOAD_EFER when guest=host. So guest=host is a little bit better than guest!=host so looks like skipping LOAD_EFER helps, but why lightweight exit, urn worse than guest=host though, it should be exactly the same as long as NX bit is the same in urn test, no? I don't know---it is very much reproducible though. It is not my machine so I cannot run perf on it, but I can try to find a similar one in the next few days. Assuming you're running both of my patches (LOAD_EFER regardless of nx, but skip LOAD_EFER of guest == host), then some of the speedup may be just less code running. I haven't figured out exactly when vmx_save_host_state runs, but my patches avoid a call to kvm_set_shared_msr, which is worth a few cycles. --Andy Paolo -- Andy Lutomirski AMA Capital Management, LLC -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v9 06/19] vfio/platform: return info for bound device
On Wed, 2014-11-12 at 11:32 +0100, Eric Auger wrote: On 10/27/2014 07:07 PM, Antonios Motakis wrote: A VFIO userspace driver will start by opening the VFIO device that corresponds to an IOMMU group, and will use the ioctl interface to get the basic device info, such as number of memory regions and interrupts, and their properties. This patch enables the VFIO_DEVICE_GET_INFO ioctl call. Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com --- drivers/vfio/platform/vfio_platform_common.c | 23 --- 1 file changed, 20 insertions(+), 3 deletions(-) diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c index e0fdbc8..cb20526 100644 --- a/drivers/vfio/platform/vfio_platform_common.c +++ b/drivers/vfio/platform/vfio_platform_common.c @@ -43,10 +43,27 @@ static int vfio_platform_open(void *device_data) static long vfio_platform_ioctl(void *device_data, unsigned int cmd, unsigned long arg) { - if (cmd == VFIO_DEVICE_GET_INFO) - return -EINVAL; + struct vfio_platform_device *vdev = device_data; + unsigned long minsz; + + if (cmd == VFIO_DEVICE_GET_INFO) { + struct vfio_device_info info; + + minsz = offsetofend(struct vfio_device_info, num_irqs); + + if (copy_from_user(info, (void __user *)arg, minsz)) + return -EFAULT; + + if (info.argsz minsz) + return -EINVAL; + + info.flags = vdev-flags; + info.num_regions = 0; + info.num_irqs = 0; Seems a bit weird to me to enable the modality but returning zeroed values. Shouldn't we put that patch after VFIO_DEVICE_GET_REGION_INFO and VFIO_DEVICE_GET_IRQ_INFO ones? I actually like how Antonios has started from a base framework, exposing a device but none of the resources and then incrementally adds each component. It's also a good showcase of the VFIO ABI that we can do things like this. Thanks, Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
On Wed, 2014-11-12 at 10:14 +0100, Paolo Bonzini wrote: On 12/11/2014 04:42, Zhang, Yang Z wrote: Personally, I think this feature will be helpful to the legacy device assignment. Agree, vfio is the right solution for future feature enabling. But the old kvm without the good vfio supporting is still used largely today. The user really looking for this feature but they will not upgrade their kernel. It's easy for us to backport this feature to old kvm with the legacy device assignment, but it is impossible to backport the whole vfio. You can certainly backport these patches to distros that do not have VFIO. But upstream we should work on VFIO first. VFIO has feature parity with legacy device assignment, and adding a new feature that is not in VFIO would be a bad idea. Thanks Paolo, I agree. We should design the interfaces for VFIO since we expect legacy KVM assignment to be deprecated and eventually removed. I think that some of the platform device work for ARM's IRQ forwarding should probably be leveraged for this interface. IRQ forwarding effectively allows level triggered interrupts to be handled as edge, eliminating the mask/unmask overhead and EOI path entirely. To do this through VFIO they make use of the KVM-VFIO device to register the device and set attributes for the forwarded IRQ. This enables KVM to use the VFIO external user interfaces to acquire a VFIO device reference and access the struct device. From there it can do some IRQ manipulation on the device to reconfigure how the host handles the interrupt. Ideally we could use the same base KVM-VFIO device interface interface, perhaps with different attributes, and obviously with different architecture backing. Thanks, Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v9 01/19] vfio/platform: initial skeleton of VFIO support for platform devices
On Wed, 2014-11-12 at 11:05 +0100, Eric Auger wrote: Hi Antonios, On 10/27/2014 07:07 PM, Antonios Motakis wrote: This patch forms the common skeleton code for platform devices support with VFIO. This will include the core functionality of VFIO_PLATFORM, however binding to the device and discovering the device resources will be done with the help of a separate file where any Linux platform bus specific code will reside. This will allow us to implement support for also discovering AMBA devices and their resources, but still reuse a large part of the VFIO_PLATFORM implementation. Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com --- drivers/vfio/platform/vfio_platform_common.c | 126 ++ drivers/vfio/platform/vfio_platform_private.h | 36 2 files changed, 162 insertions(+) create mode 100644 drivers/vfio/platform/vfio_platform_common.c create mode 100644 drivers/vfio/platform/vfio_platform_private.h diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c new file mode 100644 index 000..e0fdbc8 --- /dev/null +++ b/drivers/vfio/platform/vfio_platform_common.c @@ -0,0 +1,126 @@ +/* + * Copyright (C) 2013 - Virtual Open Systems + * Author: Antonios Motakis a.mota...@virtualopensystems.com + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include linux/device.h +#include linux/interrupt.h +#include linux/iommu.h +#include linux/module.h +#include linux/mutex.h +#include linux/notifier.h +#include linux/pm_runtime.h +#include linux/slab.h +#include linux/types.h +#include linux/uaccess.h +#include linux/vfio.h +#include linux/io.h not sure at that state all the above includes are needed. + +#include vfio_platform_private.h + +static void vfio_platform_release(void *device_data) +{ + module_put(THIS_MODULE); +} + +static int vfio_platform_open(void *device_data) +{ + if (!try_module_get(THIS_MODULE)) + return -ENODEV; + + return 0; +} + +static long vfio_platform_ioctl(void *device_data, + unsigned int cmd, unsigned long arg) a minor style comment/question that applies on all the series. Shouldn't subsequent argument lines rather be aligned with the first char after (, as done in PCI code? It's also my preferred style to indent to just after the open paren on wrapped lines where possible, but I don't think there are hard rules in CodingStyle or checkpatch that enforce this, so I often let it slide. Thanks, Alex +{ + if (cmd == VFIO_DEVICE_GET_INFO) + return -EINVAL; + + else if (cmd == VFIO_DEVICE_GET_REGION_INFO) + return -EINVAL; + + else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) + return -EINVAL; + + else if (cmd == VFIO_DEVICE_SET_IRQS) + return -EINVAL; + + else if (cmd == VFIO_DEVICE_RESET) + return -EINVAL; + + return -ENOTTY; +} + +static ssize_t vfio_platform_read(void *device_data, char __user *buf, + size_t count, loff_t *ppos) +{ + return -EINVAL; +} + +static ssize_t vfio_platform_write(void *device_data, const char __user *buf, + size_t count, loff_t *ppos) +{ + return -EINVAL; +} + +static int vfio_platform_mmap(void *device_data, struct vm_area_struct *vma) +{ + return -EINVAL; +} + +static const struct vfio_device_ops vfio_platform_ops = { + .name = vfio-platform, + .open = vfio_platform_open, + .release= vfio_platform_release, + .ioctl = vfio_platform_ioctl, + .read = vfio_platform_read, + .write = vfio_platform_write, + .mmap = vfio_platform_mmap, +}; + +int vfio_platform_probe_common(struct vfio_platform_device *vdev, + struct device *dev) +{ + struct iommu_group *group; + int ret; + + if (!vdev) + return -EINVAL; + + group = iommu_group_get(dev); + if (!group) { + pr_err(VFIO: No IOMMU group for device %s\n, vdev-name); + return -EINVAL; + } I saw the above check also is done at beginning of vfio_add_group_dev. Added value however is pr_err which is useful and PCI code does the check too. Eric + + ret = vfio_add_group_dev(dev, vfio_platform_ops, vdev); + if (ret) { +
Re: Seeking a KVM benchmark
Assuming you're running both of my patches (LOAD_EFER regardless of nx, but skip LOAD_EFER of guest == host), then some of the speedup may be just less code running. I haven't figured out exactly when vmx_save_host_state runs, but my patches avoid a call to kvm_set_shared_msr, which is worth a few cycles. Yes, that's possible. vmx_save_host_state is here: preempt_disable(); kvm_x86_ops-prepare_guest_switch(vcpu); // if (vcpu-fpu_active) kvm_load_guest_fpu(vcpu); kvm_load_guest_xcr0(vcpu); vcpu-mode = IN_GUEST_MODE; srcu_read_unlock(vcpu-kvm-srcu, vcpu-srcu_idx); and it's a fairly hot function. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: x86: add trace event for pvclock updates
On 11/10 11:18 PM, Marcelo Tosatti wrote: On Wed, Nov 05, 2014 at 11:46:42AM -0800, David Matlack wrote: The new trace event records: * the id of vcpu being updated * the pvclock_vcpu_time_info struct being written to guest memory This is useful for debugging pvclock bugs, such as the bug fixed by [PATCH] kvm: x86: Fix kvm clock versioning.. Signed-off-by: David Matlack dmatl...@google.com So you actually hit that bug in practice? Can you describe the scenario? We noticed guests running stress workloads would occasionally get stuck on the far side of a save/restore. Inspecting the guest state revealed arch/x86/kernel/pvclock.c:last_value was stuck at a value like 8020566108469899263, despite TSC and pvclock looking sane. Since these guests ran without PVCLOCK_TSC_STABLE_BIT set in their CPUID, they were stuck with this large time value until real time caught up (in about 250 years :). We've been unable to reproduce the bug with kvm: x86: Fix kvm clock versioning. so we didn't invest in catching the overflow in the act, but a likely explanation is the guest gets save/restore-ed while computing the pvclock delta: u64 delta = __native_read_tsc() - src-tsc_timestamp; causing the subtraction to underflow and delta to be huge. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virt: kvm: arm: vgic: Return failure code '-EBUSY' when mutex_trylock() fails
On Wed, Nov 12, 2014 at 11:04:23PM +0800, Chen Gang wrote: When mutex_trylock() fails, kvm_vgic_create() will not create 'vgic', so it need return failure code '-EBUSY' instead of '0' to let outside know about it. I already sent a patch for the -EBUSY: https://lists.cs.columbia.edu/pipermail/kvmarm/2014-November/011936.html Also simplify the code within kvm_vgic_create(): - kvm_for_each_vcpu() scanning once is enough for current case. - Remove redundant variable 'vcpu_lock_idx'. I don't like using the iterator variable for this kind of thing because it can really break in languages where i is out-of-scope after the loop and it is too easy to reuse the iterator variable in later versions of the code. That being said, the scanning once is slightly prettier I guess, but I'd rather not introduce the churn unless others (Marc) think this is a big win. -Christoffer Signed-off-by: Chen Gang gang.chen.5...@gmail.com --- virt/kvm/arm/vgic.c | 15 +++ 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 3aaca49..5846725 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -1933,7 +1933,7 @@ out: int kvm_vgic_create(struct kvm *kvm) { - int i, vcpu_lock_idx = -1, ret = 0; + int i, ret = 0; struct kvm_vcpu *vcpu; mutex_lock(kvm-lock); @@ -1949,13 +1949,12 @@ int kvm_vgic_create(struct kvm *kvm) * that no other VCPUs are run while we create the vgic. */ kvm_for_each_vcpu(i, vcpu, kvm) { - if (!mutex_trylock(vcpu-mutex)) + if (!mutex_trylock(vcpu-mutex)) { + ret = -EBUSY; goto out_unlock; - vcpu_lock_idx = i; - } - - kvm_for_each_vcpu(i, vcpu, kvm) { + } if (vcpu-arch.has_run_once) { + mutex_unlock(vcpu-mutex); ret = -EBUSY; goto out_unlock; } @@ -1968,8 +1967,8 @@ int kvm_vgic_create(struct kvm *kvm) kvm-arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF; out_unlock: - for (; vcpu_lock_idx = 0; vcpu_lock_idx--) { - vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx); + while (i-- 0) { + vcpu = kvm_get_vcpu(kvm, i); mutex_unlock(vcpu-mutex); } -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net] Revert drivers/net: Disable UFO through virtio in macvtap and tun
On Tue, Nov 11, 2014 at 05:12:58PM +, Ben Hutchings wrote: This reverts commit 88e0e0e5aa722b193c8758c8b45d041de5316924 for the tap drivers, but leaves UFO disabled in virtio_net. libvirt at least assumes that tap features will never be dropped in new kernel versions, and doing so prevents migration of VMs to the never kernel version while they are running with virtio net devices. Fixes: 88e0e0e5aa7a (drivers/net: Disable UFO through virtio) Signed-off-by: Ben Hutchings b...@decadent.org.uk --- Compile-tested only. Jelle, care to test this and reply with Tested-by: Jelle de Jong jelledej...@powercraft.nl if it solves the live migration problem you reported? It requires applying the patch to the host kernel on your virt01 host. Thanks! Ben. drivers/net/macvtap.c | 13 - drivers/net/tun.c | 19 --- 2 files changed, 16 insertions(+), 16 deletions(-) diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c index 6f226de..aeaeb6d 100644 --- a/drivers/net/macvtap.c +++ b/drivers/net/macvtap.c @@ -66,7 +66,7 @@ static struct cdev macvtap_cdev; static const struct proto_ops macvtap_socket_ops; #define TUN_OFFLOADS (NETIF_F_HW_CSUM | NETIF_F_TSO_ECN | NETIF_F_TSO | \ - NETIF_F_TSO6) + NETIF_F_TSO6 | NETIF_F_UFO) #define RX_OFFLOADS (NETIF_F_GRO | NETIF_F_LRO) #define TAP_FEATURES (NETIF_F_GSO | NETIF_F_SG) @@ -570,8 +570,6 @@ static int macvtap_skb_from_vnet_hdr(struct sk_buff *skb, gso_type = SKB_GSO_TCPV6; break; case VIRTIO_NET_HDR_GSO_UDP: - pr_warn_once(macvtap: %s: using disabled UFO feature; please fix this program\n, - current-comm); gso_type = SKB_GSO_UDP; if (skb-protocol == htons(ETH_P_IPV6)) ipv6_proxy_select_ident(skb); @@ -619,6 +617,8 @@ static void macvtap_skb_to_vnet_hdr(const struct sk_buff *skb, vnet_hdr-gso_type = VIRTIO_NET_HDR_GSO_TCPV4; else if (sinfo-gso_type SKB_GSO_TCPV6) vnet_hdr-gso_type = VIRTIO_NET_HDR_GSO_TCPV6; + else if (sinfo-gso_type SKB_GSO_UDP) + vnet_hdr-gso_type = VIRTIO_NET_HDR_GSO_UDP; else BUG(); if (sinfo-gso_type SKB_GSO_TCP_ECN) @@ -953,6 +953,9 @@ static int set_offload(struct macvtap_queue *q, unsigned long arg) if (arg TUN_F_TSO6) feature_mask |= NETIF_F_TSO6; } + + if (arg TUN_F_UFO) + feature_mask |= NETIF_F_UFO; } /* tun/tap driver inverts the usage for TSO offloads, where @@ -963,7 +966,7 @@ static int set_offload(struct macvtap_queue *q, unsigned long arg) * When user space turns off TSO, we turn off GSO/LRO so that * user-space will not receive TSO frames. */ - if (feature_mask (NETIF_F_TSO | NETIF_F_TSO6)) + if (feature_mask (NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_UFO)) features |= RX_OFFLOADS; else features = ~RX_OFFLOADS; @@ -1064,7 +1067,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd, case TUNSETOFFLOAD: /* let the user check for future flags */ if (arg ~(TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 | - TUN_F_TSO_ECN)) + TUN_F_TSO_ECN | TUN_F_UFO)) return -EINVAL; rtnl_lock(); diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 7302398..a0987d1 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -175,7 +175,7 @@ struct tun_struct { struct net_device *dev; netdev_features_t set_features; #define TUN_USER_FEATURES (NETIF_F_HW_CSUM|NETIF_F_TSO_ECN|NETIF_F_TSO| \ - NETIF_F_TSO6) + NETIF_F_TSO6|NETIF_F_UFO) int vnet_hdr_sz; int sndbuf; @@ -1152,20 +1152,10 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, skb_shinfo(skb)-gso_type = SKB_GSO_TCPV6; break; case VIRTIO_NET_HDR_GSO_UDP: - { - static bool warned; - - if (!warned) { - warned = true; - netdev_warn(tun-dev, - %s: using disabled UFO feature; please fix this program\n, - current-comm); - } skb_shinfo(skb)-gso_type = SKB_GSO_UDP; if (skb-protocol == htons(ETH_P_IPV6))
RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
Wu, Feng wrote on 2014-11-13: kvm-ow...@vger.kernel.org wrote on 2014-11-12: kvm@vger.kernel.org; io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes On 12/11/2014 10:19, Wu, Feng wrote: You can certainly backport these patches to distros that do not have VFIO. But upstream we should work on VFIO first. VFIO has feature parity with legacy device assignment, and adding a new feature that is not in VFIO would be a bad idea. By the way, do you have benchmark results for it? We have not been able to see any performance improvement for APICv on e.g. netperf. Do you mean benchmark results for APICv itself or VT-d Posted-Interrtups? Especially for VT-d posted interrupts---but it'd be great to know which workloads see the biggest speedup from APICv. We have some draft performance data internally, please see the attached. For VT-d PI, I think we can get the biggest performance gain if the VCPU is running in non-root mode for most of the time (not in HLT state), since external interrupt from assigned devices will be delivered by guest directly in this case. That means we can run some cpu intensive workload in the guests. Have you check that the CPU side posted interrupt is taking effect in w/o VT-D PI case? Per my understanding, the performance gap should be so large if you use CPU side posted interrupt. This data more like the VT-d PI vs non PI(both VT-d and CPU). Thanks, Feng Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Best regards, Yang
RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
-Original Message- From: Zhang, Yang Z Sent: Thursday, November 13, 2014 9:21 AM To: Wu, Feng; Paolo Bonzini; Alex Williamson Cc: g...@kernel.org; dw...@infradead.org; j...@8bytes.org; t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org; kvm@vger.kernel.org; io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org Subject: RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes Wu, Feng wrote on 2014-11-13: kvm-ow...@vger.kernel.org wrote on 2014-11-12: kvm@vger.kernel.org; io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes On 12/11/2014 10:19, Wu, Feng wrote: You can certainly backport these patches to distros that do not have VFIO. But upstream we should work on VFIO first. VFIO has feature parity with legacy device assignment, and adding a new feature that is not in VFIO would be a bad idea. By the way, do you have benchmark results for it? We have not been able to see any performance improvement for APICv on e.g. netperf. Do you mean benchmark results for APICv itself or VT-d Posted-Interrtups? Especially for VT-d posted interrupts---but it'd be great to know which workloads see the biggest speedup from APICv. We have some draft performance data internally, please see the attached. For VT-d PI, I think we can get the biggest performance gain if the VCPU is running in non-root mode for most of the time (not in HLT state), since external interrupt from assigned devices will be delivered by guest directly in this case. That means we can run some cpu intensive workload in the guests. Have you check that the CPU side posted interrupt is taking effect in w/o VT-D PI case? Per my understanding, the performance gap should be so large if you use CPU side posted interrupt. This data more like the VT-d PI vs non PI(both VT-d and CPU). Yes, this data is VT-d PI vs Non VT-d PI. The CPU side APICv mechanism (including CPU side Posted-Interrtups) is enabled. Thanks, Feng Thanks, Feng Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Best regards, Yang
RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes
Wu, Feng wrote on 2014-11-13: Zhang, Yang Z wrote on 2014-11-13: kvm@vger.kernel.org; io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org Subject: RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes Wu, Feng wrote on 2014-11-13: kvm-ow...@vger.kernel.org wrote on 2014-11-12: kvm@vger.kernel.org; io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes On 12/11/2014 10:19, Wu, Feng wrote: You can certainly backport these patches to distros that do not have VFIO. But upstream we should work on VFIO first. VFIO has feature parity with legacy device assignment, and adding a new feature that is not in VFIO would be a bad idea. By the way, do you have benchmark results for it? We have not been able to see any performance improvement for APICv on e.g. netperf. Do you mean benchmark results for APICv itself or VT-d Posted-Interrtups? Especially for VT-d posted interrupts---but it'd be great to know which workloads see the biggest speedup from APICv. We have some draft performance data internally, please see the attached. For VT-d PI, I think we can get the biggest performance gain if the VCPU is running in non-root mode for most of the time (not in HLT state), since external interrupt from assigned devices will be delivered by guest directly in this case. That means we can run some cpu intensive workload in the guests. Have you check that the CPU side posted interrupt is taking effect in w/o VT-D PI case? Per my understanding, the performance gap should be so large if you use CPU side posted interrupt. This data more like the VT-d PI vs non PI(both VT-d and CPU). Yes, this data is VT-d PI vs Non VT-d PI. The CPU side APICv mechanism (including CPU side Posted-Interrtups) is enabled. From the CPU utilization data, it seems the environment of APICv is not reasonable to me. with current APICv, the interrupt should not deliver to the PCPU where vcpu is running. Otherwise, it will force the vcpu vmexit and the CPU side posted interrupt cannot take effect. Do you set the interrupt affinity manually? Thanks, Feng Thanks, Feng Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Best regards, Yang Best regards, Yang
Re: [PATCH] virt: kvm: arm: vgic: Return failure code '-EBUSY' when mutex_trylock() fails
On 11/13/14 3:41, Christoffer Dall wrote: On Wed, Nov 12, 2014 at 11:04:23PM +0800, Chen Gang wrote: When mutex_trylock() fails, kvm_vgic_create() will not create 'vgic', so it need return failure code '-EBUSY' instead of '0' to let outside know about it. I already sent a patch for the -EBUSY: https://lists.cs.columbia.edu/pipermail/kvmarm/2014-November/011936.html Yeah, really it is. Also simplify the code within kvm_vgic_create(): - kvm_for_each_vcpu() scanning once is enough for current case. - Remove redundant variable 'vcpu_lock_idx'. I don't like using the iterator variable for this kind of thing because it can really break in languages where i is out-of-scope after the loop and it is too easy to reuse the iterator variable in later versions of the code. For me, what you said is OK, we can still keep it no touch. That being said, the scanning once is slightly prettier I guess, but I'd rather not introduce the churn unless others (Marc) think this is a big win. If only merge the 2 scanning loops, it will not change much. And also can let code simpler and clearer for readers (both are for processing and checking '-EBUSY'). If possible, after your patch merges into linux next tree, I will send the related improving patch for it. Thanks. -Christoffer Signed-off-by: Chen Gang gang.chen.5...@gmail.com --- virt/kvm/arm/vgic.c | 15 +++ 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 3aaca49..5846725 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -1933,7 +1933,7 @@ out: int kvm_vgic_create(struct kvm *kvm) { -int i, vcpu_lock_idx = -1, ret = 0; +int i, ret = 0; struct kvm_vcpu *vcpu; mutex_lock(kvm-lock); @@ -1949,13 +1949,12 @@ int kvm_vgic_create(struct kvm *kvm) * that no other VCPUs are run while we create the vgic. */ kvm_for_each_vcpu(i, vcpu, kvm) { -if (!mutex_trylock(vcpu-mutex)) +if (!mutex_trylock(vcpu-mutex)) { +ret = -EBUSY; goto out_unlock; -vcpu_lock_idx = i; -} - -kvm_for_each_vcpu(i, vcpu, kvm) { +} if (vcpu-arch.has_run_once) { +mutex_unlock(vcpu-mutex); ret = -EBUSY; goto out_unlock; } @@ -1968,8 +1967,8 @@ int kvm_vgic_create(struct kvm *kvm) kvm-arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF; out_unlock: -for (; vcpu_lock_idx = 0; vcpu_lock_idx--) { -vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx); +while (i-- 0) { +vcpu = kvm_get_vcpu(kvm, i); mutex_unlock(vcpu-mutex); } -- 1.9.3 -- Chen Gang Open, share, and attitude like air, water, and life which God blessed -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM: x86: add module parameter to disable periodic kvmclock sync
The periodic kvmclock sync can be an undesired source of latencies. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 0033df3..be56fd3 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -98,6 +98,9 @@ module_param(ignore_msrs, bool, S_IRUGO | S_IWUSR); unsigned int min_timer_period_us = 500; module_param(min_timer_period_us, uint, S_IRUGO | S_IWUSR); +static bool kvmclock_periodic_sync = 1; +module_param(kvmclock_periodic_sync, bool, S_IRUGO | S_IWUSR); + bool kvm_has_tsc_control; EXPORT_SYMBOL_GPL(kvm_has_tsc_control); u32 kvm_max_guest_tsc_khz; @@ -1718,7 +1721,8 @@ static void kvmclock_sync_fn(struct work_struct *work) struct kvm *kvm = container_of(ka, struct kvm, arch); schedule_delayed_work(kvm-arch.kvmclock_update_work, 0); - schedule_delayed_work(kvm-arch.kvmclock_sync_work, + if (kvmclock_periodic_sync) + schedule_delayed_work(kvm-arch.kvmclock_sync_work, KVMCLOCK_SYNC_PERIOD); } @@ -6971,7 +6975,8 @@ int kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu) kvm_write_tsc(vcpu, msr); vcpu_put(vcpu); - schedule_delayed_work(kvm-arch.kvmclock_sync_work, + if (kvmclock_periodic_sync) + schedule_delayed_work(kvm-arch.kvmclock_sync_work, KVMCLOCK_SYNC_PERIOD); return r; -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvm: svm: move WARN_ON in svm_adjust_tsc_offset
When running the tsc_adjust kvm-unit-test on an AMD processor with the IA32_TSC_ADJUST feature enabled, the WARN_ON in svm_adjust_tsc_offset can be triggered. This WARN_ON checks for a negative adjustment in case __scale_tsc is called; however it may trigger unnecessary warnings. This patch moves the WARN_ON to trigger only if __scale_tsc will actually be called from svm_adjust_tsc_offset. In addition make adj in kvm_set_msr_common s64 since this can have signed values. Signed-off-by: Chris J Arges chris.j.ar...@canonical.com --- arch/x86/kvm/svm.c | 8 +--- arch/x86/kvm/x86.c | 2 +- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index d4f3aaa..6b411ad 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -1056,9 +1056,11 @@ static void svm_adjust_tsc_offset(struct kvm_vcpu *vcpu, s64 adjustment, bool ho { struct vcpu_svm *svm = to_svm(vcpu); - WARN_ON(adjustment 0); - if (host) - adjustment = svm_scale_tsc(vcpu, adjustment); + if (host) { + if (svm-tsc_ratio != TSC_RATIO_DEFAULT) + WARN_ON(adjustment 0); + adjustment = svm_scale_tsc(vcpu, (u64)adjustment); + } svm-vmcb-control.tsc_offset += adjustment; if (is_guest_mode(vcpu)) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index f85da5c..1cd1376 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2141,7 +2141,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_TSC_ADJUST: if (guest_cpuid_has_tsc_adjust(vcpu)) { if (!msr_info-host_initiated) { - u64 adj = data - vcpu-arch.ia32_tsc_adjust_msr; + s64 adj = data - vcpu-arch.ia32_tsc_adjust_msr; kvm_x86_ops-adjust_tsc_offset(vcpu, adj, true); } vcpu-arch.ia32_tsc_adjust_msr = data; -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 01/11] sched: introduce sys_cpumask in tsk to adapt asymmetric system
* kernelf...@gmail.com kernelf...@gmail.com [2014-10-16 15:29:50]: Some system such as powerpc, some tsk (vcpu thread) can only run on the dedicated cpu. Since we adapt some asymmetric method to monitor the whole physical cpu. (powerKVM only allows the primary hwthread to set up runtime env for the secondary when entering guest). Nowadays, powerKVM run with all the secondary hwthread offline to ensure the vcpu threads only run on the primary thread. But we plan to keep all cpus online when running powerKVM to give more power when switching back to host, so introduce sys_allowed cpumask to reflect the cpuset which the vcpu thread can run on. Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com --- include/linux/init_task.h | 1 + include/linux/sched.h | 6 ++ kernel/sched/core.c | 10 -- 3 files changed, 15 insertions(+), 2 deletions(-) diff --git a/include/linux/init_task.h b/include/linux/init_task.h index 2bb4c4f3..c56f69e 100644 --- a/include/linux/init_task.h +++ b/include/linux/init_task.h @@ -172,6 +172,7 @@ extern struct task_group root_task_group; .normal_prio= MAX_PRIO-20, \ .policy = SCHED_NORMAL, \ .cpus_allowed = CPU_MASK_ALL, \ + .sys_allowed = CPU_MASK_ALL,\ Do we really need another mask, cant we just use cpus_allowed itself. .nr_cpus_allowed= NR_CPUS, \ .mm = NULL, \ .active_mm = init_mm, \ diff --git a/include/linux/sched.h b/include/linux/sched.h index 5c2c885..ce429f3 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1260,7 +1260,10 @@ struct task_struct { unsigned int policy; int nr_cpus_allowed; + /* Anded user and sys_allowed */ cpumask_t cpus_allowed; + /* due to the feature of asymmetric, some tsk can only run on such cpu */ + cpumask_t sys_allowed; #ifdef CONFIG_PREEMPT_RCU int rcu_read_lock_nesting; @@ -2030,6 +2033,9 @@ static inline void tsk_restore_flags(struct task_struct *task, } #ifdef CONFIG_SMP +extern void set_cpus_sys_allowed(struct task_struct *p, + const struct cpumask *new_mask); + extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index ec1a286..2cd1ae3 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4596,13 +4596,19 @@ void init_idle(struct task_struct *idle, int cpu) } #ifdef CONFIG_SMP +void set_cpus_sys_allowed(struct task_struct *p, + const struct cpumask *new_mask) +{ + cpumask_copy(p-sys_allowed, new_mask); +} + This function doesnt seem to be used anywhere... Not sure why it is introduced void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask) { if (p-sched_class p-sched_class-set_cpus_allowed) p-sched_class-set_cpus_allowed(p, new_mask); - cpumask_copy(p-cpus_allowed, new_mask); - p-nr_cpus_allowed = cpumask_weight(new_mask); + cpumask_and(p-cpus_allowed, p-sys_allowed, new_mask); + p-nr_cpus_allowed = cpumask_weight(p-cpus_allowed); } /* -- 1.8.3.1 -- Thanks and Regards Srikar Dronamraju -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 02/11] powerpc: kvm: ensure vcpu-thread run only on primary hwthread
* kernelf...@gmail.com kernelf...@gmail.com [2014-10-16 15:29:51]: When vcpu thread runs at the first time, it will ensure to stick to the primary thread. Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com --- arch/powerpc/include/asm/kvm_host.h | 3 +++ arch/powerpc/kvm/book3s_hv.c| 17 + 2 files changed, 20 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 98d9dd5..9a3355e 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -666,6 +666,9 @@ struct kvm_vcpu_arch { spinlock_t tbacct_lock; u64 busy_stolen; u64 busy_preempt; +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY + bool cpu_selected; +#endif #endif }; diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 27cced9..ba258c8 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -1909,6 +1909,23 @@ static int kvmppc_vcpu_run_hv(struct kvm_run *run, struct kvm_vcpu *vcpu) { int r; int srcu_idx; +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY + int cpu = smp_processor_id(); + int target_cpu; + unsigned int cpu; 2 variables with same name... cpu + struct task_struct *p = current; + + if (unlikely(!vcpu-arch.cpu_selected)) { + vcpu-arch.cpu_selected = true; Nit: something like cpumask_set seems to be better than cpu_selected + for (cpu = 0; cpu NR_CPUS; cpu+=threads_per_core) { + cpumask_set_cpu(cpu, p-sys_allowed); Dont we need to reset the cpumask first before we set the cpumask here? + } + if (cpu%threads_per_core != 0) { At this time, cpu should be NR_CPUS and most times it should be a multiple of threads_per_core. Unfortunately there wont be a cpu with cpu number NR_CPUS. + target_cpu = cpu/threads_per_core*threads_per_core; Its probably better of to have parenthesis here. + migrate_task_to(current, target_cpu); We are probably migrating to a non-existant cpu. Also dont you need to check if the target_cpu is part of the cpumask? + } + } +#endif if (!vcpu-arch.sane) { run-exit_reason = KVM_EXIT_INTERNAL_ERROR; -- 1.8.3.1 -- Thanks and Regards Srikar Dronamraju -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 03/11] powerpc: kvm: add interface to control kvm function on a core
* kernelf...@gmail.com kernelf...@gmail.com [2014-10-16 15:29:52]: When kvm is enabled on a core, we migrate all external irq to primary thread. Since currently, the kvmirq logic is handled by the primary hwthread. Todo: this patch lacks re-enable of irqbalance when kvm is disable on the core Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com --- arch/powerpc/kernel/sysfs.c| 39 ++ arch/powerpc/sysdev/xics/xics-common.c | 12 +++ 2 files changed, 51 insertions(+) diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c index 67fd2fd..a2595dd 100644 --- a/arch/powerpc/kernel/sysfs.c +++ b/arch/powerpc/kernel/sysfs.c @@ -552,6 +552,45 @@ static void sysfs_create_dscr_default(void) if (cpu_has_feature(CPU_FTR_DSCR)) err = device_create_file(cpu_subsys.dev_root, dev_attr_dscr_default); } + +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY +#define NR_CORES (CONFIG_NR_CPUS/threads_per_core) +static DECLARE_BITMAP(kvm_on_core, NR_CORES) __read_mostly + +static ssize_t show_kvm_enable(struct device *dev, + struct device_attribute *attr, char *buf) +{ +} + +static ssize_t __used store_kvm_enable(struct device *dev, + struct device_attribute *attr, const char *buf, + size_t count) +{ + struct cpumask stop_cpus; + unsigned long core, thr; + + sscanf(buf, %lx, core); + if (core NR_CORES) + return -1; + if (!test_bit(core, kvm_on_core)) + for (thr = 1; thr threads_per_core; thr++) + if (cpu_online(thr * threads_per_core + thr)) + cpumask_set_cpu(thr * threads_per_core + thr, stop_cpus); Shouldnt this be if (cpu_online(core * threads_per_core + thr)) cpumask_set_cpu(core * threads_per_core + thr, stop_cpus); ? -- Thanks and Regards Srikar Dronamraju -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html