Re: [RFC Patch 00/12] IXGBE: Add live migration support for SRIOV NIC

2015-10-23 Thread Alexander Duyck

On 10/21/2015 09:37 AM, Lan Tianyu wrote:

This patchset proposes a new solution to add live migration support for the
82599 SRIOV network card.

In our solution, we prefer to put all device-specific operations into the VF and
PF drivers and keep the code in Qemu more general.


VF status migration
===================
VF status can be divided into 4 parts:
1) PCI config regs
2) MSI-X configuration
3) VF status in the PF driver
4) VF MMIO regs

The first three are all handled by Qemu.
The PCI config space regs and MSI-X configuration are already
stored in Qemu. To let Qemu save and restore "VF status in the PF driver"
during migration, a new sysfs node "state_in_pf" is added under the
VF sysfs directory.

For VF MMIO regs, we introduce a self-emulation layer in the VF
driver that records MMIO reg values on every MMIO read or write
and keeps this data in guest memory. It is then migrated to the
new machine along with the rest of guest memory.


VF function restoration
=======================
Restoring VF function operation is done in the VF and PF drivers.

To let the VF driver learn the migration status, Qemu fakes VF
PCI config regs to indicate it, and a new sysfs node "notify_vf"
triggers the VF mailbox irq to notify the VF of migration status
changes.

Transmit/Receive descriptor head regs are read-only, so they can't
be restored by writing back the recorded values, and they are set
to 0 during VF reset. To reuse the original tx/rx rings, the desc
ring is shifted so that the desc pointed to by the original head reg
becomes the first entry of the ring, and then the tx/rx rings are
enabled. The VF resumes receiving and transmitting from the original
head desc.
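
A minimal sketch of the ring shift described above, written as plain
userspace C rather than driver code. The descriptor layout and every
demo_* name are made up for illustration; the real ixgbevf descriptors
and the code in the patchset may look quite different. The only point is
that rotating the ring left by the saved head index puts the descriptor
the hardware would have processed next at entry 0, which is where the
head reg points after a VF reset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a hardware descriptor; not the ixgbevf layout. */
struct demo_desc {
	uint64_t buf_addr;
	uint32_t status;
};

/* Reverse ring[lo..hi) in place. */
static void demo_reverse(struct demo_desc *ring, size_t lo, size_t hi)
{
	while (lo + 1 < hi) {
		struct demo_desc tmp = ring[lo];

		ring[lo++] = ring[hi - 1];
		ring[--hi] = tmp;
	}
}

/* Rotate left by 'head' so that the old ring[head] becomes ring[0]. */
static void demo_ring_shift(struct demo_desc *ring, size_t n, size_t head)
{
	assert(n > 0 && head < n);
	demo_reverse(ring, 0, head);
	demo_reverse(ring, head, n);
	demo_reverse(ring, 0, n);
}

/* Where an old ring index (the tail, for instance) ends up after the shift. */
static size_t demo_shift_index(size_t idx, size_t n, size_t head)
{
	assert(idx < n && head < n);
	return (idx + n - head) % n;
}
```

Any other saved ring index would presumably need the same offset applied,
which is what demo_shift_index() is meant to show.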


Tracking DMA accessed memory
============================
Migration relies on dirty page tracking to decide which memory to
migrate. Hardware can't automatically mark a page as dirty after a
DMA memory access. VF descriptor rings and data buffers are modified
by hardware when receiving and transmitting data. To track such dirty
memory manually, the driver does dummy writes (read a byte and write
it back) when receiving and transmitting data.
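
For illustration only, a dummy write of this kind might look like the
sketch below. It is userspace C with hypothetical demo_* names, not the
code from the patchset; the page size is passed in as a parameter and is
assumed to be a power of two. The volatile access keeps the compiler from
dropping the read/write pair as a no-op. The sketch shows the mechanism
only and does nothing about ordering against DMA that is still in flight.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one byte and write the same value back, so the page gets dirtied. */
static void demo_dummy_write(volatile uint8_t *p)
{
	uint8_t v = *p;

	*p = v;
}

/*
 * Touch one byte in every page that buf[0..len) overlaps.
 * Returns the number of pages touched.
 */
static size_t demo_touch_pages(uint8_t *buf, size_t len, size_t page_size)
{
	uintptr_t addr = (uintptr_t)buf;
	uintptr_t end = addr + len;	/* one past the last byte */
	size_t touched = 0;

	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	while (addr < end) {
		demo_dummy_write((volatile uint8_t *)addr);
		touched++;
		/* Jump to the first byte of the next page. */
		addr = (addr & ~(uintptr_t)(page_size - 1)) + page_size;
	}
	return touched;
}
```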


I was thinking about it and I am pretty sure the dummy write approach is 
problematic at best.  Specifically the issue is that while you are 
performing a dummy write you risk pulling in descriptors for data that 
hasn't been dummy written to yet.  So when you resume and restore your 
descriptors you will have ones that may contain Rx descriptors 
indicating they contain data when after the migration they don't.


I really think the best approach to take would be to look at 
implementing an emulated IOMMU so that you could track DMA mapped pages 
and avoid migrating the ones marked as DMA_FROM_DEVICE until they are 
unmapped.  The advantage to this is that in the case of the ixgbevf 
driver it now reuses the same pages for Rx DMA.  As a result it will be 
rewriting the same pages often and if you are marking those pages as 
dirty and transitioning them it is possible for a flow of small packets 
to really make a mess of things since you would be rewriting the same 
pages in a loop while the device is processing packets.
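
To make that a bit more concrete, here is a rough sketch of the
bookkeeping a migration-aware DMA layer might keep. It is not the kernel
DMA API and not code from any posted patch: the fixed-size table, the
page frame numbers and every demo_* name are assumptions made for the
example. A page mapped for device writes is held back from migration
while it is mapped, and unmapping it reports that it now has to be
treated as dirty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum demo_dma_dir { DEMO_DMA_TO_DEVICE, DEMO_DMA_FROM_DEVICE };

#define DEMO_MAX_MAPPINGS 64

struct demo_dma_tracker {
	bool used[DEMO_MAX_MAPPINGS];
	uint64_t pfn[DEMO_MAX_MAPPINGS];
	enum demo_dma_dir dir[DEMO_MAX_MAPPINGS];
};

/* Record a mapping; returns false if the table is full. */
static bool demo_dma_map(struct demo_dma_tracker *t, uint64_t pfn,
			 enum demo_dma_dir dir)
{
	assert(t != NULL);
	for (size_t i = 0; i < DEMO_MAX_MAPPINGS; i++) {
		if (!t->used[i]) {
			t->used[i] = true;
			t->pfn[i] = pfn;
			t->dir[i] = dir;
			return true;
		}
	}
	return false;
}

/* Drop a mapping; returns true if the page must now be sent as dirty. */
static bool demo_dma_unmap(struct demo_dma_tracker *t, uint64_t pfn)
{
	assert(t != NULL);
	for (size_t i = 0; i < DEMO_MAX_MAPPINGS; i++) {
		if (t->used[i] && t->pfn[i] == pfn) {
			t->used[i] = false;
			return t->dir[i] == DEMO_DMA_FROM_DEVICE;
		}
	}
	return false;
}

/* A page is held back while the device may still write to it. */
static bool demo_can_migrate_now(const struct demo_dma_tracker *t, uint64_t pfn)
{
	assert(t != NULL);
	for (size_t i = 0; i < DEMO_MAX_MAPPINGS; i++) {
		if (t->used[i] && t->pfn[i] == pfn &&
		    t->dir[i] == DEMO_DMA_FROM_DEVICE)
			return false;
	}
	return true;
}
```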


Beyond that I would say you could suspend/resume the device in order to 
get it to stop and flush the descriptor rings and any outstanding 
packets.  The code for suspend would unmap the DMA memory which would 
then be the trigger to flush it across in the migration, and the resume 
code would take care of any state restoration needed beyond any values 
that can be configured with the ip link command.


If you wanted to do a proof of concept of this you could probably do so 
with very little overhead.  Basically you would need the "page_addr" 
portion of patch 12 to emulate a slightly migration aware DMA API, and 
then beyond that you would need something like patch 9 but instead of 
adding new functions and API you would be switching things on and off 
via the ixgbevf_suspend/resume calls.


- Alex








--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/4] KVM: X86: Migration is supported

2015-10-23 Thread Jian Zhou
Supported bits of MSR_IA32_DEBUGCTLMSR are DEBUGCTLMSR_LBR(bit 0),
DEBUGCTLMSR_BTF(bit 1) and DEBUGCTLMSR_FREEZE_LBRS_ON_PMI(bit 11).
Qemu can get/set contents of LBR MSRs and LBR status in order to
support migration.

Signed-off-by: Jian Zhou 
Signed-off-by: Stephen He 
---
 arch/x86/kvm/x86.c | 88 +++---
 1 file changed, 77 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9a9a198..a3c72db 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -136,6 +136,8 @@ struct kvm_shared_msrs {
 static struct kvm_shared_msrs_global __read_mostly shared_msrs_global;
 static struct kvm_shared_msrs __percpu *shared_msrs;

+#define MSR_LBR_STATUS 0xd6
+
 struct kvm_stats_debugfs_item debugfs_entries[] = {
{ "pf_fixed", VCPU_STAT(pf_fixed) },
{ "pf_guest", VCPU_STAT(pf_guest) },
@@ -1917,6 +1919,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
bool pr = false;
u32 msr = msr_info->index;
u64 data = msr_info->data;
+   u64 supported = 0;

switch (msr) {
case MSR_AMD64_NB_CFG:
@@ -1948,16 +1951,25 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
}
break;
case MSR_IA32_DEBUGCTLMSR:
-   if (!data) {
-   /* We support the non-activated case already */
-   break;
-   } else if (data & ~(DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF)) {
-   /* Values other than LBR and BTF are vendor-specific,
-  thus reserved and should throw a #GP */
+   supported = DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF |
+   DEBUGCTLMSR_FREEZE_LBRS_ON_PMI;
+
+   if (data & ~supported) {
+   /*
+* Values other than LBR/BTF/FREEZE_LBRS_ON_PMI
+* are not supported, thus reserved and should throw a #GP
+*/
+   vcpu_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
+   __func__, data);
return 1;
}
-   vcpu_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
-   __func__, data);
+   if (kvm_x86_ops->set_debugctlmsr) {
+   if (kvm_x86_ops->set_debugctlmsr(vcpu, data))
+   return 1;
+   }
+   else
+   return 1;
+
break;
case 0x200 ... 0x2ff:
return kvm_mtrr_set_msr(vcpu, msr, data);
@@ -2078,6 +2090,33 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
vcpu_unimpl(vcpu, "disabled perfctr wrmsr: "
"0x%x data 0x%llx\n", msr, data);
break;
+   case MSR_LBR_STATUS:
+   if (kvm_x86_ops->set_debugctlmsr) {
+   vcpu->arch.lbr_status = (data == 0) ? 0 : 1;
+   if (data)
+   kvm_x86_ops->set_debugctlmsr(vcpu,
+   DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI);
+   } else
+   vcpu_unimpl(vcpu, "lbr is disabled, ignored wrmsr: "
+   "0x%x data 0x%llx\n", msr, data);
+   break;
+   case MSR_LBR_SELECT:
+   case MSR_LBR_TOS:
+   case MSR_PENTIUM4_LER_FROM_LIP:
+   case MSR_PENTIUM4_LER_TO_LIP:
+   case MSR_PENTIUM4_LBR_TOS:
+   case MSR_IA32_LASTINTFROMIP:
+   case MSR_IA32_LASTINTTOIP:
+   case MSR_LBR_CORE2_FROM ... MSR_LBR_CORE2_FROM + 0x7:
+   case MSR_LBR_CORE2_TO ... MSR_LBR_CORE2_TO + 0x7:
+   case MSR_LBR_NHM_FROM ... MSR_LBR_NHM_FROM + 0x1f:
+   case MSR_LBR_NHM_TO ... MSR_LBR_NHM_TO + 0x1f:
+   if (kvm_x86_ops->set_lbr_msr)
+   kvm_x86_ops->set_lbr_msr(vcpu, msr, data);
+   else
+   vcpu_unimpl(vcpu, "lbr is disabled, ignored wrmsr: "
+   "0x%x data 0x%llx\n", msr, data);
+   break;
case MSR_K7_CLK_CTL:
/*
 * Ignore all writes to this no longer documented MSR.
@@ -2178,13 +2217,16 @@ static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
switch (msr_info->index) {
+   case MSR_IA32_DEBUGCTLMSR:
+   if (kvm_x86_ops->get_debugctlmsr)
+   msr_info->data = kvm_x86_ops->get_debugctlmsr();
+   else
+   msr_info->data = 0;
+   break;
case MSR_IA32_PLATFORM_ID:
case 
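
As a standalone illustration of the MSR_IA32_DEBUGCTLMSR check in the
hunk above: it comes down to a mask test against the three supported
bits. The sketch below is userspace C with hypothetical DEMO_* and
demo_* names; the bit positions are the ones given in the commit message
(LBR bit 0, BTF bit 1, FREEZE_LBRS_ON_PMI bit 11). It does not model the
kvm_x86_ops hooks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions as given in the commit message. */
#define DEMO_DEBUGCTL_LBR                (1ULL << 0)
#define DEMO_DEBUGCTL_BTF                (1ULL << 1)
#define DEMO_DEBUGCTL_FREEZE_LBRS_ON_PMI (1ULL << 11)

#define DEMO_DEBUGCTL_SUPPORTED \
	(DEMO_DEBUGCTL_LBR | DEMO_DEBUGCTL_BTF | DEMO_DEBUGCTL_FREEZE_LBRS_ON_PMI)

/*
 * Returns 0 and updates *shadow when only supported bits are set.
 * Returns 1 (where the caller would inject #GP) and leaves *shadow
 * untouched otherwise.
 */
static int demo_set_debugctl(uint64_t *shadow, uint64_t data)
{
	assert(shadow != NULL);
	if (data & ~DEMO_DEBUGCTL_SUPPORTED)
		return 1;
	*shadow = data;
	return 0;
}
```

One thing that may be worth a second look in the hunk as posted: a write
of 0 also returns 1 when the set_debugctlmsr hook is NULL, whereas the
code being removed accepted a write of 0 unconditionally.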

[PATCH v2 0/4] KVM: VMX: enable LBR virtualization

2015-10-23 Thread Jian Zhou
Changelog in v2:
  (1) move the implementation into vmx.c
  (2) migration is supported
  (3) add arrays in kvm_vcpu_arch struct to save/restore
      LBR MSRs at vm exit/entry time.
  (4) add a parameter of kvm_intel module to permanently
      disable LBRV
  (5) table of supported CPUs is reorganized, LBRV
      can be enabled or not according to the guest CPUID

Jian Zhou (4):
  KVM: X86: Add arrays to save/restore LBR MSRs
  KVM: X86: LBR MSRs of supported CPU types
  KVM: X86: Migration is supported
  KVM: VMX: details of LBR virtualization implementation

 arch/x86/include/asm/kvm_host.h  |  26 -
 arch/x86/include/asm/msr-index.h |  26 -
 arch/x86/kvm/vmx.c   | 245 +++
 arch/x86/kvm/x86.c   |  88 --
 4 files changed, 366 insertions(+), 19 deletions(-)

--
1.7.12.4




[PATCH v2 2/4] KVM: X86: LBR MSRs of supported CPU types

2015-10-23 Thread Jian Zhou
Add macros for the LBR MSRs of the supported CPU types.

Signed-off-by: Jian Zhou 
Signed-off-by: Stephen He 
---
 arch/x86/include/asm/msr-index.h | 26 --
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index b98b471..2afcacd 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -68,10 +68,32 @@

 #define MSR_LBR_SELECT 0x01c8
 #define MSR_LBR_TOS0x01c9
+#define MSR_LBR_CORE_FROM  0x0040
+#define MSR_LBR_CORE_TO0x0060
+#define MAX_NUM_LBR_MSRS   128
+/* Pentium4/Xeon(based on NetBurst) LBR */
+#define MSR_PENTIUM4_LER_FROM_LIP  0x01d7
+#define MSR_PENTIUM4_LER_TO_LIP0x01d8
+#define MSR_PENTIUM4_LBR_TOS   0x01da
+#define MSR_LBR_PENTIUM4_FROM  0x0680
+#define MSR_LBR_PENTIUM4_TO0x06c0
+#define SIZE_PENTIUM4_LBR_STACK16
+/* Core2 LBR */
+#define MSR_LBR_CORE2_FROM MSR_LBR_CORE_FROM
+#define MSR_LBR_CORE2_TO   MSR_LBR_CORE_TO
+#define SIZE_CORE2_LBR_STACK   4
+/* Atom LBR */
+#define MSR_LBR_ATOM_FROM  MSR_LBR_CORE_FROM
+#define MSR_LBR_ATOM_TOMSR_LBR_CORE_TO
+#define SIZE_ATOM_LBR_STACK8
+/* Nehalem LBR */
 #define MSR_LBR_NHM_FROM   0x0680
 #define MSR_LBR_NHM_TO 0x06c0
-#define MSR_LBR_CORE_FROM  0x0040
-#define MSR_LBR_CORE_TO0x0060
+#define SIZE_NHM_LBR_STACK 16
+/* Skylake LBR */
+#define MSR_LBR_SKYLAKE_FROM   MSR_LBR_NHM_FROM
+#define MSR_LBR_SKYLAKE_TO MSR_LBR_NHM_TO
+#define SIZE_SKYLAKE_LBR_STACK 32

 #define MSR_LBR_INFO_0 0x0dc0 /* ... 0xddf for _31 */
 #define LBR_INFO_MISPRED   BIT_ULL(63)
--
1.7.12.4




[PATCH v2 1/4] KVM: X86: Add arrays to save/restore LBR MSRs

2015-10-23 Thread Jian Zhou
Add arrays in kvm_vcpu_arch struct to save/restore
LBR MSRs at vm exit/entry time.
Add new hooks to set/get DEBUGCTLMSR and LBR MSRs.

Signed-off-by: Jian Zhou 
Signed-off-by: Stephen He 
---
 arch/x86/include/asm/kvm_host.h | 26 --
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3a36ee7..dc2c120 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -376,6 +376,12 @@ struct kvm_vcpu_hv {
u64 hv_vapic;
 };

+struct msr_data {
+   bool host_initiated;
+   u32 index;
+   u64 data;
+};
+
 struct kvm_vcpu_arch {
/*
 * rip and regs accesses must go through
@@ -516,6 +522,15 @@ struct kvm_vcpu_arch {
unsigned long eff_db[KVM_NR_DB_REGS];
unsigned long guest_debug_dr7;

+   int lbr_status;
+   int lbr_used;
+
+   struct lbr_msr {
+   unsigned nr;
+   struct msr_data guest[MAX_NUM_LBR_MSRS];
+   struct msr_data host[MAX_NUM_LBR_MSRS];
+   }lbr_msr;
+
u64 mcg_cap;
u64 mcg_status;
u64 mcg_ctl;
@@ -728,12 +743,6 @@ struct kvm_vcpu_stat {

 struct x86_instruction_info;

-struct msr_data {
-   bool host_initiated;
-   u32 index;
-   u64 data;
-};
-
 struct kvm_lapic_irq {
u32 vector;
u16 delivery_mode;
@@ -887,6 +896,11 @@ struct kvm_x86_ops {
   gfn_t offset, unsigned long mask);
/* pmu operations of sub-arch */
const struct kvm_pmu_ops *pmu_ops;
+
+   int (*set_debugctlmsr)(struct kvm_vcpu *vcpu, u64 value);
+   u64 (*get_debugctlmsr)(void);
+   void (*set_lbr_msr)(struct kvm_vcpu *vcpu, u32 msr, u64 data);
+   u64 (*get_lbr_msr)(struct kvm_vcpu *vcpu, u32 msr);
 };

 struct kvm_arch_async_pf {
--
1.7.12.4




[PATCH v2 4/4] KVM: VMX: details of LBR virtualization implementation

2015-10-23 Thread Jian Zhou
Use the MSR intercept bitmap and the arrays (save/restore LBR MSRs)
in the kvm_vcpu_arch struct to support LBR virtualization.
Add a parameter to the kvm_intel module to permanently disable
LBRV.
Reorganize the table of supported CPUs; LBRV can be enabled
or not according to the guest CPUID.

Signed-off-by: Jian Zhou 
Signed-off-by: Stephen He 
---
 arch/x86/kvm/vmx.c | 245 +
 1 file changed, 245 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6a8bc64..3ab890d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -90,6 +90,9 @@ module_param(fasteoi, bool, S_IRUGO);
 static bool __read_mostly enable_apicv = 1;
 module_param(enable_apicv, bool, S_IRUGO);

+static bool __read_mostly lbrv = 1;
+module_param(lbrv, bool, S_IRUGO);
+
 static bool __read_mostly enable_shadow_vmcs = 1;
 module_param_named(enable_shadow_vmcs, enable_shadow_vmcs, bool, S_IRUGO);
 /*
@@ -4323,6 +4326,21 @@ static void vmx_disable_intercept_msr_write_x2apic(u32 msr)
msr, MSR_TYPE_W);
 }

+static void vmx_disable_intercept_guest_msr(struct kvm_vcpu *vcpu, u32 msr)
+{
+   if (irqchip_in_kernel(vcpu->kvm) &&
+   apic_x2apic_mode(vcpu->arch.apic)) {
+   vmx_disable_intercept_msr_read_x2apic(msr);
+   vmx_disable_intercept_msr_write_x2apic(msr);
+   }
+   else {
+   if (is_long_mode(vcpu))
+   vmx_disable_intercept_for_msr(msr, true);
+   else
+   vmx_disable_intercept_for_msr(msr, false);
+   }
+}
+
 static int vmx_vm_has_apicv(struct kvm *kvm)
 {
return enable_apicv && irqchip_in_kernel(kvm);
@@ -6037,6 +6055,13 @@ static __init int hardware_setup(void)
kvm_x86_ops->sync_pir_to_irr = vmx_sync_pir_to_irr_dummy;
}

+   if (!lbrv) {
+   kvm_x86_ops->set_debugctlmsr = NULL;
+   kvm_x86_ops->get_debugctlmsr = NULL;
+   kvm_x86_ops->set_lbr_msr = NULL;
+   kvm_x86_ops->get_lbr_msr = NULL;
+   }
+
vmx_disable_intercept_for_msr(MSR_FS_BASE, false);
vmx_disable_intercept_for_msr(MSR_GS_BASE, false);
vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true);
@@ -8258,6 +8283,215 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
msrs[i].host);
 }

+struct lbr_info {
+   u32 base;
+   u8 count;
+} pentium4_lbr[] = {
+   { MSR_LBR_SELECT,  1 },
+   { MSR_PENTIUM4_LER_FROM_LIP,   1 },
+   { MSR_PENTIUM4_LER_TO_LIP, 1 },
+   { MSR_PENTIUM4_LBR_TOS,1 },
+   { MSR_LBR_PENTIUM4_FROM,   SIZE_PENTIUM4_LBR_STACK },
+   { MSR_LBR_PENTIUM4_TO, SIZE_PENTIUM4_LBR_STACK },
+   { 0, 0 }
+}, core2_lbr[] = {
+   { MSR_LBR_SELECT,   1 },
+   { MSR_IA32_LASTINTFROMIP,   1 },
+   { MSR_IA32_LASTINTTOIP, 1 },
+   { MSR_LBR_TOS,  1 },
+   { MSR_LBR_CORE2_FROM,   SIZE_CORE2_LBR_STACK },
+   { MSR_LBR_CORE2_TO, SIZE_CORE2_LBR_STACK },
+   { 0, 0 }
+}, atom_lbr[] = {
+   { MSR_LBR_SELECT,   1 },
+   { MSR_IA32_LASTINTFROMIP,   1 },
+   { MSR_IA32_LASTINTTOIP, 1 },
+   { MSR_LBR_TOS,  1 },
+   { MSR_LBR_ATOM_FROM,SIZE_ATOM_LBR_STACK },
+   { MSR_LBR_ATOM_TO,  SIZE_ATOM_LBR_STACK },
+   { 0, 0 }
+}, nehalem_lbr[] = {
+   { MSR_LBR_SELECT,   1 },
+   { MSR_IA32_LASTINTFROMIP,   1 },
+   { MSR_IA32_LASTINTTOIP, 1 },
+   { MSR_LBR_TOS,  1 },
+   { MSR_LBR_NHM_FROM, SIZE_NHM_LBR_STACK },
+   { MSR_LBR_NHM_TO,   SIZE_NHM_LBR_STACK },
+   { 0, 0 }
+}, skylake_lbr[] = {
+   { MSR_LBR_SELECT,   1 },
+   { MSR_IA32_LASTINTFROMIP,   1 },
+   { MSR_IA32_LASTINTTOIP, 1 },
+   { MSR_LBR_TOS,  1 },
+   { MSR_LBR_SKYLAKE_FROM, SIZE_SKYLAKE_LBR_STACK },
+   { MSR_LBR_SKYLAKE_TO,   SIZE_SKYLAKE_LBR_STACK },
+   { 0, 0}
+};
+
+static const struct lbr_info *last_branch_msr_get(struct kvm_vcpu *vcpu)
+{
+   struct kvm_cpuid_entry2 *best = kvm_find_cpuid_entry(vcpu, 1, 0);
+   u32 eax = best->eax;
+   u8 family = (eax >> 8) & 0xf;
+   u8 model = (eax >> 4) & 0xf;
+
+   if (family == 15)
+   family += (eax >> 20) & 0xff;
+   if (family >= 6)
+   model += ((eax >> 16) & 0xf) << 4;
+
+   if (family == 6)
+   {
+   switch (model)
+   {
+   case 15: /* 65nm Core2 "Merom" */
+   case 22: /* 65nm Core2 "Merom-L" */
+   case 23: /* 45nm Core2 "Penryn" */
+   case 29: /* 45nm Core2 "Dunnington (MP) */
+   
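
For readers following the family/model arithmetic in
last_branch_msr_get(), here is the same decoding as a standalone
userspace sketch. The demo_* names are hypothetical, and the Core2 check
only covers the four models visible in the part of the switch statement
shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decode CPUID.1:EAX into display family and model, mirroring the patch. */
static void demo_decode_sig(uint32_t eax, unsigned int *family,
			    unsigned int *model)
{
	assert(family != NULL && model != NULL);

	*family = (eax >> 8) & 0xf;
	*model = (eax >> 4) & 0xf;

	if (*family == 15)
		*family += (eax >> 20) & 0xff;		/* extended family */
	if (*family >= 6)
		*model += ((eax >> 16) & 0xf) << 4;	/* extended model */
}

/* Core2 models listed in the visible part of the switch statement. */
static bool demo_is_core2(unsigned int family, unsigned int model)
{
	return family == 6 &&
	       (model == 15 || model == 22 || model == 23 || model == 29);
}
```

For example, an EAX value of 0x00010676 decodes to family 6, model 23,
which the patch labels as 45nm Core2 "Penryn". Separately, the posted code
dereferences the result of kvm_find_cpuid_entry() without checking it; as
far as I know that function can return NULL when the guest CPUID has no
such leaf, so a check may be needed there.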

Re: [RFC Patch 00/12] IXGBE: Add live migration support for SRIOV NIC

2015-10-23 Thread Alex Williamson
On Fri, 2015-10-23 at 11:36 -0700, Alexander Duyck wrote:
> I was thinking about it and I am pretty sure the dummy write approach is 
> problematic at best.  Specifically the issue is that while you are 
> performing a dummy write you risk pulling in descriptors for data that 
> hasn't been dummy written to yet.  So when you resume and restore your 
> descriptors you will have ones that may contain Rx descriptors 
> indicating they contain data when after the migration they don't.
> 
> I really think the best approach to take would be to look at 
> implementing an emulated IOMMU so that you could track DMA mapped pages 
> and avoid migrating the ones marked as DMA_FROM_DEVICE until they are 
> unmapped.  The advantage to this is that in the case of the ixgbevf 
> driver it now reuses the same pages for Rx DMA.  As a result it will be 
> rewriting the same pages often and if you are marking those pages as 
> dirty and transitioning them it is possible for a flow of small packets 
> to really make a mess of things since you would be rewriting the same 
> pages in a loop while the device is processing packets.

I'd be concerned that an emulated IOMMU on the DMA path would reduce
throughput to the point where we shouldn't even bother with assigning
the device in the first place and should be using virtio-net instead.
POWER systems have a guest visible IOMMU and it's been challenging for
them to get to 10Gbps, requiring real-mode tricks.  virtio-net may add
some latency, but it's not that hard to get it to 10Gbps and it already
supports migration.  An emulated IOMMU in the guest is really only good
for relatively static mappings, the latency for anything else is likely
too high.  Maybe there are shadow page table tricks that could help, but
it's imposing overhead the whole time the guest is running, not only on
migration.  Thanks,

Alex



Re: [PATCH v2 00/12] KVM: x86: add support for VMX TSC scaling

2015-10-23 Thread Joerg Roedel
On Tue, Oct 20, 2015 at 03:39:00PM +0800, Haozhong Zhang wrote:
> VMX TSC scaling shares some common logics with SVM TSC ratio which
> is already supported by KVM. Patch 1 ~ 8 move those common logics from
> SVM code to the common code. Upon them, patch 9 ~ 12 add VMX-specific
> support for VMX TSC scaling.

Have you tested your changes on an AMD machine too?


Joerg



Re: [PATCH v5 0/7] KVM: arm64: Implement API for vGICv3 live migration

2015-10-23 Thread Peter Maydell
On 12 October 2015 at 09:29, Pavel Fedin  wrote:
> This patchset adds necessary userspace API in order to support vGICv3 live
> migration. GICv3 registers are accessed using device attribute ioctls,
> similar to GICv2.
>
> Whoever wants to test it, please note that this version is not
> binary-compatible with previous one, the API has been seriously changed.
> qemu patches will be posted in some time.
>
> v4 => v5:
> - Adapted to new API by Peter Maydell, Marc Zyngier and Christoffer Dall.
>   Acked-by's on the documentation were dropped, just in case, because i
>   slightly adjusted it. Additionally, i merged all doc updates into one
>   patch.

Could you tell us what you changed in the doc patch from the version
that got sent out with the acks, please?

thanks
-- PMM


[PATCH v4 2/7] vfio: platform: add capability to register a reset function

2015-10-23 Thread Eric Auger
In preparation for subsequent changes in reset function lookup,
let's introduce a dynamic list of reset combos (compat string,
reset module, reset function). The list can be populated/voided with
vfio_platform_register/unregister_reset. Those are not yet used in
this patch.

Signed-off-by: Eric Auger 

---

v3 -> v4:
- __vfio_platform_register_reset does not return any value anymore
- vfio_platform_unregister_reset also takes the reset function pointer
  as parameter

v2 -> v3:
- use goto out to have a single mutex_unlock
- implement vfio_platform_register_reset as a macro (suggested by Arnd)
- move reset_node struct declaration back to vfio_platform_private.h
- vfio_platform_unregister_reset does not return any value anymore

v1 -> v2:
- reset_list becomes static
- vfio_platform_register/unregister_reset take a const char * as compat
- fix node leak
- add reset_lock to protect the reset list manipulation
- move vfio_platform_reset_node declaration in vfio_platform_common.c
---
 drivers/vfio/platform/vfio_platform_common.c  | 27 +++
 drivers/vfio/platform/vfio_platform_private.h | 20 
 2 files changed, 47 insertions(+)

diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c
index 184e9d2..3b7e52c 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -27,6 +27,7 @@
 #define DRIVER_AUTHOR   "Antonios Motakis "
 #define DRIVER_DESC "VFIO platform base module"
 
+static LIST_HEAD(reset_list);
 static DEFINE_MUTEX(driver_lock);
 
 static const struct vfio_platform_reset_combo reset_lookup_table[] = {
@@ -578,6 +579,32 @@ struct vfio_platform_device *vfio_platform_remove_common(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(vfio_platform_remove_common);
 
+void __vfio_platform_register_reset(struct vfio_platform_reset_node *node)
+{
+   mutex_lock(&driver_lock);
+   list_add(&node->link, &reset_list);
+   mutex_unlock(&driver_lock);
+}
+EXPORT_SYMBOL_GPL(__vfio_platform_register_reset);
+
+void vfio_platform_unregister_reset(const char *compat,
+   vfio_platform_reset_fn_t fn)
+{
+   struct vfio_platform_reset_node *iter, *temp;
+
+   mutex_lock(&driver_lock);
+   list_for_each_entry_safe(iter, temp, &reset_list, link) {
+   if (!strcmp(iter->compat, compat) && (iter->reset == fn)) {
+   list_del(&iter->link);
+   break;
+   }
+   }
+
+   mutex_unlock(&driver_lock);
+
+}
+EXPORT_SYMBOL_GPL(vfio_platform_unregister_reset);
+
 MODULE_VERSION(DRIVER_VERSION);
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR(DRIVER_AUTHOR);
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index 7128690..c563940 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -71,6 +71,15 @@ struct vfio_platform_device {
int (*reset)(struct vfio_platform_device *vdev);
 };
 
+typedef int (*vfio_platform_reset_fn_t)(struct vfio_platform_device *vdev);
+
+struct vfio_platform_reset_node {
+   struct list_head link;
+   char *compat;
+   struct module *owner;
+   vfio_platform_reset_fn_t reset;
+};
+
 struct vfio_platform_reset_combo {
const char *compat;
const char *reset_function_name;
@@ -90,4 +99,15 @@ extern int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
unsigned start, unsigned count,
void *data);
 
+extern void __vfio_platform_register_reset(struct vfio_platform_reset_node *n);
+extern void vfio_platform_unregister_reset(const char *compat,
+  vfio_platform_reset_fn_t fn);
+#define vfio_platform_register_reset(__compat, __reset)\
+static struct vfio_platform_reset_node __reset ## _node = {\
+   .owner = THIS_MODULE,   \
+   .compat = __compat, \
+   .reset = __reset,   \
+}; \
+__vfio_platform_register_reset(&__reset ## _node)
+
 #endif /* VFIO_PLATFORM_PRIVATE_H */
-- 
1.9.1
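
For anyone who wants to play with the register/unregister semantics
outside the kernel, here is a small userspace sketch. It uses a plain
singly linked list instead of list_head and leaves out the locking that
the patch does under driver_lock. All demo_* names, the two sample reset
functions and the "vendor,dev-a" style compat strings are made up for
the example. Like the patch, unregistering only removes a node when both
the compat string and the function pointer match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*demo_reset_fn_t)(void *dev);

struct demo_reset_node {
	struct demo_reset_node *next;
	const char *compat;
	demo_reset_fn_t reset;
};

static struct demo_reset_node *demo_reset_list;

/* Hypothetical reset handlers used only to exercise the list. */
static int demo_reset_a(void *dev) { (void)dev; return 0; }
static int demo_reset_b(void *dev) { (void)dev; return 0; }

/* Add a caller-owned node at the front of the list. */
static void demo_register_reset(struct demo_reset_node *node)
{
	assert(node != NULL && node->compat != NULL && node->reset != NULL);
	node->next = demo_reset_list;
	demo_reset_list = node;
}

/* Remove the first node matching both compat and fn; otherwise do nothing. */
static void demo_unregister_reset(const char *compat, demo_reset_fn_t fn)
{
	struct demo_reset_node **pp = &demo_reset_list;

	while (*pp != NULL) {
		if (!strcmp((*pp)->compat, compat) && (*pp)->reset == fn) {
			*pp = (*pp)->next;
			return;
		}
		pp = &(*pp)->next;
	}
}

/* Return the reset function registered for compat, or NULL. */
static demo_reset_fn_t demo_lookup_reset(const char *compat)
{
	struct demo_reset_node *n;

	for (n = demo_reset_list; n != NULL; n = n->next) {
		if (!strcmp(n->compat, compat))
			return n->reset;
	}
	return NULL;
}
```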



[PATCH v4 4/7] vfio: platform: reset: calxedaxgmac: add reset function registration

2015-10-23 Thread Eric Auger
This patch adds the reset function registration/unregistration.
This is handled through the module_vfio_reset_handler macro. The
latter also defines a MODULE_ALIAS, which simplifies loading from
vfio-platform.

Signed-off-by: Eric Auger 
Reviewed-by: Arnd Bergmann 

---

v3 -> v4:
- I restored the EXPORT_SYMBOL which will be removed when switching the
  lookup method
- Add Arnd R-b.

v2 -> v3:
- do not include vfio_platform_reset_private.h anymore (removed)
- remove pr_info
- rework commit message

v1 -> v2:
- uses the module_vfio_reset_handler macro
- add pr_info on vfio reset
- do not export vfio_platform_calxedaxgmac_reset symbol anymore
---
 drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c b/drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c
index 619dc7d..80718f2 100644
--- a/drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c
+++ b/drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c
@@ -30,8 +30,6 @@
 #define DRIVER_AUTHOR   "Eric Auger "
 #define DRIVER_DESC "Reset support for Calxeda xgmac vfio platform device"
 
-#define CALXEDAXGMAC_COMPAT "calxeda,hb-xgmac"
-
 /* XGMAC Register definitions */
 #define XGMAC_CONTROL   0x  /* MAC Configuration */
 
@@ -80,6 +78,8 @@ int vfio_platform_calxedaxgmac_reset(struct vfio_platform_device *vdev)
 }
 EXPORT_SYMBOL_GPL(vfio_platform_calxedaxgmac_reset);
 
+module_vfio_reset_handler("calxeda,hb-xgmac", vfio_platform_calxedaxgmac_reset);
+
 MODULE_VERSION(DRIVER_VERSION);
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR(DRIVER_AUTHOR);
-- 
1.9.1



[PATCH v4 0/7] VFIO platform reset module rework

2015-10-23 Thread Eric Auger
This series fixes the current implementation by getting rid of the
usage of __symbol_get which caused a compilation issue with
CONFIG_MODULES disabled. On top of this, the usage of MODULE_ALIAS makes
it possible to add a new reset module without being obliged to update the
framework. The new implementation relies on the reset module registering
its reset function to the vfio-platform driver.

The series is available at

https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.3-rc6-rework-v3

Best Regards

Eric

v3 -> v4:
- Remove the EXPORT_SYMBOL_GPL(vfio_platform_calxedaxgmac_reset) later
  in [6/7], to keep the functionality working all along the series
- Add Arnd R-b (I dared to keep them despite the above change)
- vfio_platform_unregister_reset gets the reset function to do a double
  check on the compat and the function pointer too
- __vfio_platform_register_reset turned to 'void'

v2 -> v3:
- use driver_mutex instead of reset_mutex
- style fixes: single mutex_unlock
- use static nodes; vfio_platform_register_reset now is a macro
- vfio_platform_reset_private.h removed since reset_module_(un)register
  disappear. No use of symbol_get anymore.
- new patch introducing vfio-platform-base
- reset look-up moved back at vfio-platform probe time
- new patch featuring dev_info/dev_warn

v1 -> v2:
* in vfio_platform_common.c:
  - move reset lookup at load time and put reset at release: this is to
    prevent a race between the 2 module loads
  - reset_list becomes static
  - vfio_platform_register/unregister_reset take a const char * as compat
  - fix node link
  - remove old combo struct and cleanup proto of vfio_platform_get_reset
  - add mutex to protect the reset list
* in calxeda xgmac reset module
  - introduce vfio_platform_reset_private.h
  - use module_vfio_reset_handler macro
  - do not export vfio_platform_calxedaxgmac_reset symbol anymore
  - add a pr_info to show the device is reset by vfio reset module



Eric Auger (7):
  vfio: platform: introduce vfio-platform-base module
  vfio: platform: add capability to register a reset function
  vfio: platform: introduce module_vfio_reset_handler macro
  vfio: platform: reset: calxedaxgmac: add reset function registration
  vfio: platform: add compat in vfio_platform_device
  vfio: platform: use list of registered reset function
  vfio: platform: add dev_info on device reset

 drivers/vfio/platform/Makefile |   6 +-
 .../platform/reset/vfio_platform_calxedaxgmac.c|   5 +-
 drivers/vfio/platform/vfio_amba.c  |   1 +
 drivers/vfio/platform/vfio_platform.c  |   1 +
 drivers/vfio/platform/vfio_platform_common.c   | 119 +++--
 drivers/vfio/platform/vfio_platform_private.h  |  40 ++-
 6 files changed, 130 insertions(+), 42 deletions(-)

-- 
1.9.1



[PATCH v4 3/7] vfio: platform: introduce module_vfio_reset_handler macro

2015-10-23 Thread Eric Auger
The module_vfio_reset_handler macro:
- defines a module alias
- implements the module init/exit functions, which respectively register
  and unregister the reset function.

Signed-off-by: Eric Auger 

---
v3 -> v4:
- pass reset to vfio_platform_unregister_reset

v2 -> v3:
- use vfio_platform_register_reset macro

v1 -> v2:
- remove vfio_platform_reset_private.h and move back the macro to
  vfio_platform_private.h header: removed reset_module_register &
  unregister (symbol_get)
- defines the module_vfio_reset_handler macro as suggested by Arnd
  (formerly in vfio_platform_reset_private.h)
---
 drivers/vfio/platform/vfio_platform_private.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index c563940..fd262be 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -110,4 +110,18 @@ static struct vfio_platform_reset_node __reset ## _node = {\
 }; \
 __vfio_platform_register_reset(&__reset ## _node)
 
+#define module_vfio_reset_handler(compat, reset)   \
+MODULE_ALIAS("vfio-reset:" compat);\
+static int __init reset ## _module_init(void)  \
+{  \
+   vfio_platform_register_reset(compat, reset);\
+   return 0;   \
+}; \
+static void __exit reset ## _module_exit(void) \
+{  \
+   vfio_platform_unregister_reset(compat, reset);  \
+}; \
+module_init(reset ## _module_init);\
+module_exit(reset ## _module_exit)
+
 #endif /* VFIO_PLATFORM_PRIVATE_H */
-- 
1.9.1
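
The registration flow that the macro wires up can be sketched in user
space as follows. This is a minimal sketch, not the kernel code: struct
fake_vdev, register_reset, unregister_reset, lookup_reset and demo_reset
are hypothetical names, and the kernel version additionally takes a mutex
and records the owning module. Unregistration matches on both the compat
string and the function pointer, as the v3 -> v4 notes describe.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-in for struct vfio_platform_device. */
struct fake_vdev {
	int resets;
};

typedef int (*reset_fn_t)(struct fake_vdev *);

/* One registry entry: a compat string paired with its reset function. */
struct reset_node {
	const char *compat;
	reset_fn_t reset;
	struct reset_node *next;
};

static struct reset_node *reset_list;	/* head of the registry */

/* What the generated module init would do: link the node into the list. */
static void register_reset(struct reset_node *n)
{
	assert(n && n->compat && n->reset);
	n->next = reset_list;
	reset_list = n;
}

/*
 * What the generated module exit would do: unlink the entry that matches
 * both the compat string and the function pointer.
 */
static void unregister_reset(const char *compat, reset_fn_t fn)
{
	struct reset_node **pp;

	for (pp = &reset_list; *pp; pp = &(*pp)->next) {
		if (!strcmp((*pp)->compat, compat) && (*pp)->reset == fn) {
			*pp = (*pp)->next;
			return;
		}
	}
}

/* Find the reset function registered for a compat string, or NULL. */
static reset_fn_t lookup_reset(const char *compat)
{
	struct reset_node *n;

	for (n = reset_list; n; n = n->next)
		if (!strcmp(n->compat, compat))
			return n->reset;
	return NULL;
}

/* Example reset handler, named for illustration only. */
static int demo_reset(struct fake_vdev *vdev)
{
	vdev->resets++;
	return 0;
}
```

A caller would register a node for its compat string at init time, let
the framework look the function up by compat, and pass the same
(compat, function) pair to unregister_reset at exit time.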



[PATCH v3] VFIO: platform: reset: AMD xgbe reset module

2015-10-23 Thread Eric Auger
This patch introduces a module that registers and implements a low-level
reset function for the AMD XGBE device.

It performs the following actions:
- reset the PHY
- disable auto-negotiation
- disable & clear auto-negotiation IRQ
- soft-reset the MAC

Those tiny pieces of code are inherited from the native xgbe driver.

Signed-off-by: Eric Auger 

---

Applies on top of [PATCH v3 0/7] VFIO platform reset module rework

v2 -> v3:
- in Kconfig, add empty line between the 2 options
- remove DRIVER_VERSION, DRIVER_AUTHOR and DRIVER_DESC and put
  strings directly in MODULE macros

v1 -> v2:
- uses module_vfio_reset_handler macro
---
 drivers/vfio/platform/reset/Kconfig|   8 ++
 drivers/vfio/platform/reset/Makefile   |   2 +
 .../vfio/platform/reset/vfio_platform_amdxgbe.c| 127 +
 3 files changed, 137 insertions(+)
 create mode 100644 drivers/vfio/platform/reset/vfio_platform_amdxgbe.c

diff --git a/drivers/vfio/platform/reset/Kconfig b/drivers/vfio/platform/reset/Kconfig
index 746b96b..705 100644
--- a/drivers/vfio/platform/reset/Kconfig
+++ b/drivers/vfio/platform/reset/Kconfig
@@ -5,3 +5,11 @@ config VFIO_PLATFORM_CALXEDAXGMAC_RESET
  Enables the VFIO platform driver to handle reset for Calxeda xgmac
 
  If you don't know what to do here, say N.
+
+config VFIO_PLATFORM_AMDXGBE_RESET
+   tristate "VFIO support for AMD XGBE reset"
+   depends on VFIO_PLATFORM
+   help
+ Enables the VFIO platform driver to handle reset for AMD XGBE
+
+ If you don't know what to do here, say N.
diff --git a/drivers/vfio/platform/reset/Makefile b/drivers/vfio/platform/reset/Makefile
index 2a486af..93f4e23 100644
--- a/drivers/vfio/platform/reset/Makefile
+++ b/drivers/vfio/platform/reset/Makefile
@@ -1,5 +1,7 @@
 vfio-platform-calxedaxgmac-y := vfio_platform_calxedaxgmac.o
+vfio-platform-amdxgbe-y := vfio_platform_amdxgbe.o
 
 ccflags-y += -Idrivers/vfio/platform
 
 obj-$(CONFIG_VFIO_PLATFORM_CALXEDAXGMAC_RESET) += vfio-platform-calxedaxgmac.o
+obj-$(CONFIG_VFIO_PLATFORM_AMDXGBE_RESET) += vfio-platform-amdxgbe.o
diff --git a/drivers/vfio/platform/reset/vfio_platform_amdxgbe.c b/drivers/vfio/platform/reset/vfio_platform_amdxgbe.c
new file mode 100644
index 0000000..1636e22
--- /dev/null
+++ b/drivers/vfio/platform/reset/vfio_platform_amdxgbe.c
@@ -0,0 +1,127 @@
+/*
+ * VFIO platform driver specialized for AMD xgbe reset
+ * reset code is inherited from AMD xgbe native driver
+ *
+ * Copyright (c) 2015 Linaro Ltd.
+ *  www.linaro.org
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vfio_platform_private.h"
+
+#define DMA_MR                 0x3000
+#define MAC_VR                 0x0110
+#define DMA_ISR                0x3008
+#define MAC_ISR                0x00b0
+#define PCS_MMD_SELECT         0xff
+#define MDIO_AN_INT            0x8002
+#define MDIO_AN_INTMASK        0x8001
+
+static unsigned int xmdio_read(void *ioaddr, unsigned int mmd,
+  unsigned int reg)
+{
+   unsigned int mmd_address, value;
+
+   mmd_address = (mmd << 16) | ((reg) & 0xffff);
+   iowrite32(mmd_address >> 8, ioaddr + (PCS_MMD_SELECT << 2));
+   value = ioread32(ioaddr + ((mmd_address & 0xff) << 2));
+   return value;
+}
+
+static void xmdio_write(void *ioaddr, unsigned int mmd,
+   unsigned int reg, unsigned int value)
+{
+   unsigned int mmd_address;
+
+   mmd_address = (mmd << 16) | ((reg) & 0xffff);
+   iowrite32(mmd_address >> 8, ioaddr + (PCS_MMD_SELECT << 2));
+   iowrite32(value, ioaddr + ((mmd_address & 0xff) << 2));
+}
+
+int vfio_platform_amdxgbe_reset(struct vfio_platform_device *vdev)
+{
+   struct vfio_platform_region xgmac_regs = vdev->regions[0];
+   struct vfio_platform_region xpcs_regs = vdev->regions[1];
+   u32 dma_mr_value, pcs_value, value;
+   unsigned int count;
+
+   if (!xgmac_regs.ioaddr) {
+   xgmac_regs.ioaddr =
+   ioremap_nocache(xgmac_regs.addr, xgmac_regs.size);
+   if (!xgmac_regs.ioaddr)
+   return -ENOMEM;
+   }
+   if (!xpcs_regs.ioaddr) {
+   xpcs_regs.ioaddr =
+   ioremap_nocache(xpcs_regs.addr, xpcs_regs.size);
+
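
The indirect addressing used by xmdio_read/xmdio_write above can be
worked through in user space. This is a sketch under stated assumptions:
xmdio_split and struct xmdio_access are hypothetical helpers, and the
register mask is assumed to be 0xffff (a 16-bit register address), since
the literal is truncated in the archived text. No hardware is touched;
the function only computes the offsets and the select value that the
patch passes to iowrite32/ioread32.

```c
#include <assert.h>
#include <stdint.h>

#define PCS_MMD_SELECT 0xff	/* same value as in the patch */

/* Result of splitting an (mmd, reg) pair for the indirect access. */
struct xmdio_access {
	uint32_t select_offset;	/* byte offset of the select register */
	uint32_t select_value;	/* value written to the select register */
	uint32_t data_offset;	/* byte offset of the data window slot */
};

static struct xmdio_access xmdio_split(uint32_t mmd, uint32_t reg)
{
	struct xmdio_access a;
	uint32_t mmd_address;

	/* Clause 45 MDIO device addresses are 5 bits wide. */
	assert(mmd < 32);

	/* Assumption: the mask is 0xffff (16-bit register address). */
	mmd_address = (mmd << 16) | (reg & 0xffff);

	a.select_offset = PCS_MMD_SELECT << 2;
	a.select_value = mmd_address >> 8;	/* upper bits pick the window */
	a.data_offset = (mmd_address & 0xff) << 2; /* low byte, 4-byte stride */
	return a;
}
```

For example, with mmd = 7 and reg = MDIO_AN_INT (0x8002) the combined
address is 0x78002, so 0x780 is written at offset 0x3fc and the data
access happens at offset 0x8.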

[PATCH v4 1/7] vfio: platform: introduce vfio-platform-base module

2015-10-23 Thread Eric Auger
To prepare for the vfio platform reset rework, let's build
vfio_platform_common.c and vfio_platform_irq.c in a module separate
from vfio-platform and vfio-amba. This makes it possible
to have separate module inits and works around a race between
platform driver init and vfio reset module init: that way we
make sure symbols exported by the base module are available when the
vfio-platform driver gets probed.

Since open/release are implemented in the base module, the ref
count is applied to the parent module instead.

Signed-off-by: Eric Auger 
Suggested-by: Arnd Bergmann 
Reviewed-by: Arnd Bergmann 

---
v3 -> v4:
- add Arnd R-b

v3: creation
---
 drivers/vfio/platform/Makefile|  6 --
 drivers/vfio/platform/vfio_amba.c |  1 +
 drivers/vfio/platform/vfio_platform.c |  1 +
 drivers/vfio/platform/vfio_platform_common.c  | 13 +++--
 drivers/vfio/platform/vfio_platform_private.h |  1 +
 5 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/platform/Makefile b/drivers/vfio/platform/Makefile
index 9ce8afe..41a6224 100644
--- a/drivers/vfio/platform/Makefile
+++ b/drivers/vfio/platform/Makefile
@@ -1,10 +1,12 @@
-
-vfio-platform-y := vfio_platform.o vfio_platform_common.o vfio_platform_irq.o
+vfio-platform-base-y := vfio_platform_common.o vfio_platform_irq.o
+vfio-platform-y := vfio_platform.o
 
 obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o
+obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform-base.o
 obj-$(CONFIG_VFIO_PLATFORM) += reset/
 
 vfio-amba-y := vfio_amba.o
 
 obj-$(CONFIG_VFIO_AMBA) += vfio-amba.o
+obj-$(CONFIG_VFIO_AMBA) += vfio-platform-base.o
 obj-$(CONFIG_VFIO_AMBA) += reset/
diff --git a/drivers/vfio/platform/vfio_amba.c b/drivers/vfio/platform/vfio_amba.c
index ff0331f..a66479b 100644
--- a/drivers/vfio/platform/vfio_amba.c
+++ b/drivers/vfio/platform/vfio_amba.c
@@ -67,6 +67,7 @@ static int vfio_amba_probe(struct amba_device *adev, const struct amba_id *id)
vdev->flags = VFIO_DEVICE_FLAGS_AMBA;
vdev->get_resource = get_amba_resource;
vdev->get_irq = get_amba_irq;
+   vdev->parent_module = THIS_MODULE;
 
ret = vfio_platform_probe_common(vdev, &adev->dev);
if (ret) {
diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
index cef645c..f1625dc 100644
--- a/drivers/vfio/platform/vfio_platform.c
+++ b/drivers/vfio/platform/vfio_platform.c
@@ -65,6 +65,7 @@ static int vfio_platform_probe(struct platform_device *pdev)
vdev->flags = VFIO_DEVICE_FLAGS_PLATFORM;
vdev->get_resource = get_platform_resource;
vdev->get_irq = get_platform_irq;
+   vdev->parent_module = THIS_MODULE;
 
ret = vfio_platform_probe_common(vdev, &pdev->dev);
if (ret)
diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c
index e43efb5..184e9d2 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -23,6 +23,10 @@
 
 #include "vfio_platform_private.h"
 
+#define DRIVER_VERSION  "0.10"
+#define DRIVER_AUTHOR   "Antonios Motakis "
+#define DRIVER_DESC "VFIO platform base module"
+
 static DEFINE_MUTEX(driver_lock);
 
 static const struct vfio_platform_reset_combo reset_lookup_table[] = {
@@ -146,7 +150,7 @@ static void vfio_platform_release(void *device_data)
 
mutex_unlock(&driver_lock);
 
-   module_put(THIS_MODULE);
+   module_put(vdev->parent_module);
 }
 
 static int vfio_platform_open(void *device_data)
@@ -154,7 +158,7 @@ static int vfio_platform_open(void *device_data)
struct vfio_platform_device *vdev = device_data;
int ret;
 
-   if (!try_module_get(THIS_MODULE))
+   if (!try_module_get(vdev->parent_module))
return -ENODEV;
 
mutex_lock(&driver_lock);
@@ -573,3 +577,8 @@ struct vfio_platform_device *vfio_platform_remove_common(struct device *dev)
return vdev;
 }
 EXPORT_SYMBOL_GPL(vfio_platform_remove_common);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index 1c9b3d5..7128690 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -56,6 +56,7 @@ struct vfio_platform_device {
u32 num_irqs;
int refcnt;
struct mutex igate;
+   struct module   *parent_module;
 
/*
 * These fields should be filled by the bus specific binder
-- 
1.9.1
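
The effect of taking the reference on vdev->parent_module rather than on
THIS_MODULE can be sketched in user space. This is a minimal sketch:
struct fake_module, fake_try_get, fake_put, base_open and base_release
are hypothetical stand-ins for struct module, try_module_get(),
module_put() and the open/release callbacks of the base module. The
point is that the shared code pins whichever bus driver module probed
the device, not the base module that contains the code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct module: a use count and a state flag. */
struct fake_module {
	int refcnt;
	int unloading;
};

/* Mirrors the idea of try_module_get(): fail once unloading has begun. */
static int fake_try_get(struct fake_module *m)
{
	if (m->unloading)
		return 0;
	m->refcnt++;
	return 1;
}

/* Mirrors the idea of module_put(): drop one reference. */
static void fake_put(struct fake_module *m)
{
	assert(m->refcnt > 0);
	m->refcnt--;
}

/* Only the field that matters here; set by the bus-specific probe. */
struct fake_vdev {
	struct fake_module *parent_module;
};

/* Shared open path: pin the parent (bus driver) module. */
static int base_open(struct fake_vdev *vdev)
{
	if (!fake_try_get(vdev->parent_module))
		return -1;	/* the patch returns -ENODEV here */
	return 0;
}

/* Shared release path: unpin the same module. */
static void base_release(struct fake_vdev *vdev)
{
	fake_put(vdev->parent_module);
}
```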



Re: [PATCH v4 0/7] VFIO platform reset module rework

2015-10-23 Thread Eric Auger
On 10/23/2015 05:47 PM, Eric Auger wrote:
> This series fixes the current implementation by getting rid of the
> usage of __symbol_get which caused a compilation issue with
> CONFIG_MODULES disabled. On top of this, the usage of MODULE_ALIAS makes
> possible to add a new reset module without being obliged to update the
> framework. The new implementation relies on the reset module registering
> its reset function to the vfio-platform driver.
> 
> The series is available at
> 
> https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.3-rc6-rework-v3
argh,

https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.3-rc6-rework-v4

But scatterbrained as I am, I am pretty sure I will do a new respin.

Thanks for your patience :-(

Eric
> 
> Best Regards
> 
> Eric
> 
> v3 -> v4:
> - Remove the EXPORT_SYMBOL_GPL(vfio_platform_calxedaxgmac_reset) later
>   in [6/7], to keep the functionality working all along the series
> - Add Arnd R-b (I dared to keep them despite the above change)
> - vfio_platform_unregister_reset gets the reset function to do a double
>   check on the compat and the function pointer too
> - __vfio_platform_register_reset turned to 'void'
> 
> v2 -> v3:
> - use driver_mutex instead of reset_mutex
> - style fixes: single mutex_unlock
> - use static nodes; vfio_platform_register_reset now is a macro
> - vfio_platform_reset_private.h removed since reset_module_(un)register
>   disappear. No use of symbol_get anymore.
> - new patch introducing vfio-platform-base
> - reset look-up moved back at vfio-platform probe time
> - new patch featuring dev_info/dev_warn
> 
> v1 -> v2:
> * in vfio_platform_common.c:
>   - move reset lookup to load time and put reset at release: this is to
>     prevent a race between the two module loads
>   - reset_list becomes static
>   - vfio_platform_register/unregister_reset take a const char * as compat
>   - fix node link
>   - remove old combo struct and cleanup proto of vfio_platform_get_reset
>   - add mutex to protect the reset list
> * in calxeda xgmac reset module
>   - introduce vfio_platform_reset_private.h
>   - use module_vfio_reset_handler macro
>   - do not export vfio_platform_calxedaxgmac_reset symbol anymore
>   - add a pr_info to show the device is reset by vfio reset module
> 
> 
> 
> Eric Auger (7):
>   vfio: platform: introduce vfio-platform-base module
>   vfio: platform: add capability to register a reset function
>   vfio: platform: introduce module_vfio_reset_handler macro
>   vfio: platform: reset: calxedaxgmac: add reset function registration
>   vfio: platform: add compat in vfio_platform_device
>   vfio: platform: use list of registered reset function
>   vfio: platform: add dev_info on device reset
> 
>  drivers/vfio/platform/Makefile |   6 +-
>  .../platform/reset/vfio_platform_calxedaxgmac.c|   5 +-
>  drivers/vfio/platform/vfio_amba.c  |   1 +
>  drivers/vfio/platform/vfio_platform.c  |   1 +
>  drivers/vfio/platform/vfio_platform_common.c   | 119 +++--
>  drivers/vfio/platform/vfio_platform_private.h  |  40 ++-
>  6 files changed, 130 insertions(+), 42 deletions(-)
> 



Re: [PATCH v4 3/7] vfio: platform: introduce module_vfio_reset_handler macro

2015-10-23 Thread Arnd Bergmann
On Friday 23 October 2015 17:47:21 Eric Auger wrote:
> The module_vfio_reset_handler macro
> - define a module alias
> - implement module init/exit function which respectively registers
>   and unregisters the reset function.
> 
> Signed-off-by: Eric Auger 
> 

Reviewed-by: Arnd Bergmann 


Re: [PATCH v4 2/7] vfio: platform: add capability to register a reset function

2015-10-23 Thread Arnd Bergmann
On Friday 23 October 2015 17:47:20 Eric Auger wrote:
> In preparation for subsequent changes in reset function lookup,
> lets introduce a dynamic list of reset combos (compat string,
> reset module, reset function). The list can be populated/voided with
> vfio_platform_register/unregister_reset. Those are not yet used in
> this patch.
> 
> Signed-off-by: Eric Auger 
> 
> 

Reviewed-by: Arnd Bergmann 


[PATCH v4 5/7] vfio: platform: add compat in vfio_platform_device

2015-10-23 Thread Eric Auger
Let's retrieve the compatibility string on probe and store it
in the vfio_platform_device struct

Signed-off-by: Eric Auger 

---

v2 -> v3:
- populate compat after vdev check
---
 drivers/vfio/platform/vfio_platform_common.c  | 15 ---
 drivers/vfio/platform/vfio_platform_private.h |  1 +
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c
index 3b7e52c..f2d41a0 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -41,16 +41,11 @@ static const struct vfio_platform_reset_combo reset_lookup_table[] = {
 static void vfio_platform_get_reset(struct vfio_platform_device *vdev,
struct device *dev)
 {
-   const char *compat;
int (*reset)(struct vfio_platform_device *);
-   int ret, i;
-
-   ret = device_property_read_string(dev, "compatible", &compat);
-   if (ret)
-   return;
+   int i;
 
for (i = 0 ; i < ARRAY_SIZE(reset_lookup_table); i++) {
-   if (!strcmp(reset_lookup_table[i].compat, compat)) {
+   if (!strcmp(reset_lookup_table[i].compat, vdev->compat)) {
request_module(reset_lookup_table[i].module_name);
reset = __symbol_get(
reset_lookup_table[i].reset_function_name);
@@ -544,6 +539,12 @@ int vfio_platform_probe_common(struct vfio_platform_device *vdev,
if (!vdev)
return -EINVAL;
 
+   ret = device_property_read_string(dev, "compatible", &vdev->compat);
+   if (ret) {
+   pr_err("VFIO: cannot retrieve compat for %s\n", vdev->name);
+   return -EINVAL;
+   }
+
group = iommu_group_get(dev);
if (!group) {
pr_err("VFIO: No IOMMU group for device %s\n", vdev->name);
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index fd262be..415310f 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -57,6 +57,7 @@ struct vfio_platform_device {
int refcnt;
struct mutex igate;
struct module   *parent_module;
+   const char  *compat;
 
/*
 * These fields should be filled by the bus specific binder
-- 
1.9.1



[PATCH v4 6/7] vfio: platform: use list of registered reset function

2015-10-23 Thread Eric Auger
Remove the static lookup table and use the dynamic list of registered
reset functions instead. Also load the reset module through its alias.
The reset struct module pointer is stored in vfio_platform_device.

We also remove the useless struct device pointer parameter in
vfio_platform_get_reset.

This patch fixes the issue related to the usage of __symbol_get, which
besides being moot, prevented compilation with CONFIG_MODULES
disabled.

Also, the usage of MODULE_ALIAS makes it possible to add a new reset module
without needing to update the framework. This was suggested by Arnd.

Signed-off-by: Eric Auger 
Reported-by: Arnd Bergmann 
Reviewed-by: Arnd Bergmann 

---

v3 -> v4:
- add Arnd R-b.
- Remove the EXPORT_SYMBOL_GPL(vfio_platform_calxedaxgmac_reset) here

v2 -> v3:
- remove clear of vfio_platform_device reset_module and reset
  in vfio_platform_put_reset
- single unlock in vfio_platform_lookup_reset
- use driver_lock instead of reset_lock

v1 -> v2:
- use reset_lock in vfio_platform_lookup_reset
- remove vfio_platform_reset_combo declaration
- remove struct device *dev parameter in vfio_platform_get_reset
- set reset_module and reset to NULL in put function
---
 .../platform/reset/vfio_platform_calxedaxgmac.c|  1 -
 drivers/vfio/platform/vfio_platform_common.c   | 52 --
 drivers/vfio/platform/vfio_platform_private.h  |  7 +--
 3 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c b/drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c
index 80718f2..640f5d8 100644
--- a/drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c
+++ b/drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c
@@ -76,7 +76,6 @@ int vfio_platform_calxedaxgmac_reset(struct vfio_platform_device *vdev)
 
return 0;
 }
-EXPORT_SYMBOL_GPL(vfio_platform_calxedaxgmac_reset);
 
 module_vfio_reset_handler("calxeda,hb-xgmac", vfio_platform_calxedaxgmac_reset);
 
diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c
index f2d41a0..f74836a 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -30,37 +30,43 @@
 static LIST_HEAD(reset_list);
 static DEFINE_MUTEX(driver_lock);
 
-static const struct vfio_platform_reset_combo reset_lookup_table[] = {
-   {
-   .compat = "calxeda,hb-xgmac",
-   .reset_function_name = "vfio_platform_calxedaxgmac_reset",
-   .module_name = "vfio-platform-calxedaxgmac",
-   },
-};
-
-static void vfio_platform_get_reset(struct vfio_platform_device *vdev,
-   struct device *dev)
+static vfio_platform_reset_fn_t vfio_platform_lookup_reset(const char *compat,
+   struct module **module)
 {
-   int (*reset)(struct vfio_platform_device *);
-   int i;
+   struct vfio_platform_reset_node *iter;
+   vfio_platform_reset_fn_t reset_fn = NULL;
 
-   for (i = 0 ; i < ARRAY_SIZE(reset_lookup_table); i++) {
-   if (!strcmp(reset_lookup_table[i].compat, vdev->compat)) {
-   request_module(reset_lookup_table[i].module_name);
-   reset = __symbol_get(
-   reset_lookup_table[i].reset_function_name);
-   if (reset) {
-   vdev->reset = reset;
-   return;
-   }
+   mutex_lock(&driver_lock);
+   list_for_each_entry(iter, &reset_list, link) {
+   if (!strcmp(iter->compat, compat) &&
+   try_module_get(iter->owner)) {
+   *module = iter->owner;
+   reset_fn = iter->reset;
+   break;
}
}
+   mutex_unlock(&driver_lock);
+   return reset_fn;
+}
+
+static void vfio_platform_get_reset(struct vfio_platform_device *vdev)
+{
+   char modname[256];
+
+   vdev->reset = vfio_platform_lookup_reset(vdev->compat,
+   &vdev->reset_module);
+   if (!vdev->reset) {
+   snprintf(modname, 256, "vfio-reset:%s", vdev->compat);
+   request_module(modname);
+   vdev->reset = vfio_platform_lookup_reset(vdev->compat,
+&vdev->reset_module);
+   }
 }
 
 static void vfio_platform_put_reset(struct vfio_platform_device *vdev)
 {
if (vdev->reset)
-   symbol_put_addr(vdev->reset);
+   module_put(vdev->reset_module);
 }
 
 static int vfio_platform_regions_init(struct vfio_platform_device *vdev)
@@ -557,7 +563,7 @@ int vfio_platform_probe_common(struct vfio_platform_device *vdev,
return ret;
}
 
-   vfio_platform_get_reset(vdev, dev);
+   vfio_platform_get_reset(vdev);
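
The lookup, load-by-alias, lookup-again sequence in
vfio_platform_get_reset can be sketched in user space as follows. This is
a sketch only: fake_request_module is a hypothetical stub standing in for
request_module(), the single-slot registry stands in for reset_list, and
"demo,device" is an invented compat string. The stub "loads" a module by
registering its reset function, which is what the module's init function
would do.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef int (*reset_fn_t)(void);

/* Hypothetical single-slot registry standing in for reset_list. */
static const char *registered_compat;
static reset_fn_t registered_reset;

/* Counts how many times a module load was requested. */
static int load_requests;

/* Example reset handler, named for illustration only. */
static int demo_reset(void)
{
	return 0;
}

/*
 * Stub for request_module(): if the alias matches the one known module,
 * register its reset function as the module init would.
 */
static void fake_request_module(const char *alias)
{
	load_requests++;
	if (!strcmp(alias, "vfio-reset:demo,device")) {
		registered_compat = "demo,device";
		registered_reset = demo_reset;
	}
}

/* Return the reset function registered for compat, or NULL. */
static reset_fn_t lookup_reset(const char *compat)
{
	if (registered_compat && !strcmp(registered_compat, compat))
		return registered_reset;
	return NULL;
}

/* First try the registry; on a miss, load by alias and try once more. */
static reset_fn_t get_reset(const char *compat)
{
	char modname[256];
	reset_fn_t fn;

	assert(compat != NULL);
	fn = lookup_reset(compat);
	if (!fn) {
		snprintf(modname, sizeof(modname), "vfio-reset:%s", compat);
		fake_request_module(modname);
		fn = lookup_reset(compat);
	}
	return fn;
}
```

Because the alias is derived from the compat string, supporting a new
device only requires a module that declares the matching alias; the
lookup code itself does not change.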
 

[PATCH v4 7/7] vfio: platform: add dev_info on device reset

2015-10-23 Thread Eric Auger
It might be helpful for the end-user to check the device reset
function was found by the vfio platform reset framework.

Let's store a pointer to the struct device in vfio_platform_device
and trace when the reset function is called or not found.

Signed-off-by: Eric Auger 

---

v3: creation
---
 drivers/vfio/platform/vfio_platform_common.c  | 14 --
 drivers/vfio/platform/vfio_platform_private.h |  1 +
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c
index f74836a..376d289 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -144,8 +144,12 @@ static void vfio_platform_release(void *device_data)
mutex_lock(&driver_lock);
 
if (!(--vdev->refcnt)) {
-   if (vdev->reset)
+   if (vdev->reset) {
+   dev_info(vdev->device, "reset\n");
vdev->reset(vdev);
+   } else {
+   dev_warn(vdev->device, "no reset function found!\n");
+   }
vfio_platform_regions_cleanup(vdev);
vfio_platform_irq_cleanup(vdev);
}
@@ -174,8 +178,12 @@ static int vfio_platform_open(void *device_data)
if (ret)
goto err_irq;
 
-   if (vdev->reset)
+   if (vdev->reset) {
+   dev_info(vdev->device, "reset\n");
vdev->reset(vdev);
+   } else {
+   dev_warn(vdev->device, "no reset function found!\n");
+   }
}
 
vdev->refcnt++;
@@ -551,6 +559,8 @@ int vfio_platform_probe_common(struct vfio_platform_device *vdev,
return -EINVAL;
}
 
+   vdev->device = dev;
+
group = iommu_group_get(dev);
if (!group) {
pr_err("VFIO: No IOMMU group for device %s\n", vdev->name);
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index d1b0668..42816dd 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -59,6 +59,7 @@ struct vfio_platform_device {
struct module   *parent_module;
const char  *compat;
struct module   *reset_module;
+   struct device   *device;
 
/*
 * These fields should be filled by the bus specific binder
-- 
1.9.1



Re: [PATCH v3] VFIO: platform: reset: AMD xgbe reset module

2015-10-23 Thread Arnd Bergmann
On Friday 23 October 2015 17:58:33 Eric Auger wrote:
> This patch introduces a module that registers and implements a low-level
> reset function for the AMD XGBE device.
> 
> it performs the following actions:
> - reset the PHY
> - disable auto-negotiation
> - disable & clear auto-negotiation IRQ
> - soft-reset the MAC
> 
> Those tiny pieces of code are inherited from the native xgbe driver.
> 
> Signed-off-by: Eric Auger 
> 

Reviewed-by: Arnd Bergmann 


[PATCH v2 3/4] KVM: X86: Migration is supported

2015-10-23 Thread Jian Zhou
The supported bits of MSR_IA32_DEBUGCTLMSR are DEBUGCTLMSR_LBR (bit 0),
DEBUGCTLMSR_BTF (bit 1) and DEBUGCTLMSR_FREEZE_LBRS_ON_PMI (bit 11).
Qemu can get/set the contents of the LBR MSRs and the LBR status in
order to support migration.

Signed-off-by: Jian Zhou 
Signed-off-by: Stephen He 
---
 arch/x86/kvm/x86.c | 88 +++---
 1 file changed, 77 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9a9a198..a3c72db 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -136,6 +136,8 @@ struct kvm_shared_msrs {
 static struct kvm_shared_msrs_global __read_mostly shared_msrs_global;
 static struct kvm_shared_msrs __percpu *shared_msrs;

+#define MSR_LBR_STATUS 0xd6
+
 struct kvm_stats_debugfs_item debugfs_entries[] = {
{ "pf_fixed", VCPU_STAT(pf_fixed) },
{ "pf_guest", VCPU_STAT(pf_guest) },
@@ -1917,6 +1919,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
bool pr = false;
u32 msr = msr_info->index;
u64 data = msr_info->data;
+   u64 supported = 0;

switch (msr) {
case MSR_AMD64_NB_CFG:
@@ -1948,16 +1951,25 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
}
break;
case MSR_IA32_DEBUGCTLMSR:
-   if (!data) {
-   /* We support the non-activated case already */
-   break;
-   } else if (data & ~(DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF)) {
-   /* Values other than LBR and BTF are vendor-specific,
-  thus reserved and should throw a #GP */
+   supported = DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF |
+   DEBUGCTLMSR_FREEZE_LBRS_ON_PMI;
+
+   if (data & ~supported) {
+   /*
+* Values other than LBR/BTF/FREEZE_LBRS_ON_PMI
+* are not supported, thus reserved and should throw a #GP
+*/
+   vcpu_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
+   __func__, data);
return 1;
}
-   vcpu_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
-   __func__, data);
+   if (kvm_x86_ops->set_debugctlmsr) {
+   if (kvm_x86_ops->set_debugctlmsr(vcpu, data))
+   return 1;
+   }
+   else
+   return 1;
+
break;
case 0x200 ... 0x2ff:
return kvm_mtrr_set_msr(vcpu, msr, data);
@@ -2078,6 +2090,33 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
vcpu_unimpl(vcpu, "disabled perfctr wrmsr: "
"0x%x data 0x%llx\n", msr, data);
break;
+   case MSR_LBR_STATUS:
+   if (kvm_x86_ops->set_debugctlmsr) {
+   vcpu->arch.lbr_status = (data == 0) ? 0 : 1;
+   if (data)
+   kvm_x86_ops->set_debugctlmsr(vcpu,
+   DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI);
+   } else
+   vcpu_unimpl(vcpu, "lbr is disabled, ignored wrmsr: "
+   "0x%x data 0x%llx\n", msr, data);
+   break;
+   case MSR_LBR_SELECT:
+   case MSR_LBR_TOS:
+   case MSR_PENTIUM4_LER_FROM_LIP:
+   case MSR_PENTIUM4_LER_TO_LIP:
+   case MSR_PENTIUM4_LBR_TOS:
+   case MSR_IA32_LASTINTFROMIP:
+   case MSR_IA32_LASTINTTOIP:
+   case MSR_LBR_CORE2_FROM ... MSR_LBR_CORE2_FROM + 0x7:
+   case MSR_LBR_CORE2_TO ... MSR_LBR_CORE2_TO + 0x7:
+   case MSR_LBR_NHM_FROM ... MSR_LBR_NHM_FROM + 0x1f:
+   case MSR_LBR_NHM_TO ... MSR_LBR_NHM_TO + 0x1f:
+   if (kvm_x86_ops->set_lbr_msr)
+   kvm_x86_ops->set_lbr_msr(vcpu, msr, data);
+   else
+   vcpu_unimpl(vcpu, "lbr is disabled, ignored wrmsr: "
+   "0x%x data 0x%llx\n", msr, data);
+   break;
case MSR_K7_CLK_CTL:
/*
 * Ignore all writes to this no longer documented MSR.
@@ -2178,13 +2217,16 @@ static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
switch (msr_info->index) {
+   case MSR_IA32_DEBUGCTLMSR:
+   if (kvm_x86_ops->get_debugctlmsr)
+   msr_info->data = kvm_x86_ops->get_debugctlmsr();
+   else
+   msr_info->data = 0;
+   break;
case MSR_IA32_PLATFORM_ID:
case 
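
The supported-bits check on MSR_IA32_DEBUGCTLMSR writes can be sketched
in user space. This is a minimal sketch under stated assumptions: the bit
positions are the ones given in the patch description, while
debugctl_write and its shadow parameter are hypothetical and only model
the mask test. The patch additionally forwards accepted values to
kvm_x86_ops->set_debugctlmsr and fails the write when that hook is
missing, which is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions as given in the patch description. */
#define DEBUGCTLMSR_LBR			(1ULL << 0)
#define DEBUGCTLMSR_BTF			(1ULL << 1)
#define DEBUGCTLMSR_FREEZE_LBRS_ON_PMI	(1ULL << 11)

/*
 * Accept a write only if it touches supported bits. Returns 0 and stores
 * the value on success; returns 1 and leaves the stored value untouched
 * otherwise (the patch returns 1 in that case, leading to a #GP).
 */
static int debugctl_write(uint64_t *shadow, uint64_t data)
{
	const uint64_t supported = DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF |
				   DEBUGCTLMSR_FREEZE_LBRS_ON_PMI;

	assert(shadow != NULL);
	if (data & ~supported)
		return 1;
	*shadow = data;
	return 0;
}
```

With these definitions, a write of 0x803 (all three supported bits) is
accepted, while a write that sets bit 2 or bit 12 is rejected.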

Re: [PATCH v2 0/3] target-i386: save/restore vcpu's TSC rate during migration

2015-10-23 Thread Eduardo Habkost
On Fri, Oct 23, 2015 at 08:35:20AM -0200, Marcelo Tosatti wrote:
> On Thu, Oct 22, 2015 at 04:45:21PM -0200, Eduardo Habkost wrote:
> > On Tue, Oct 20, 2015 at 03:22:51PM +0800, Haozhong Zhang wrote:
> > > This patchset enables QEMU to save/restore vcpu's TSC rate during the
> > > migration. When cooperating with KVM which supports TSC scaling, guest
> > > programs can observe a consistent guest TSC rate even though they are
> > > migrated among machines with different host TSC rates.
> > > 
> > > A pair of cpu options 'save-tsc-freq' and 'load-tsc-freq' are added to
> > > control the migration of vcpu's TSC rate.
> > 
> > The requirements and goals aren't clear to me. I see two possible use
> > cases, here:
> > 
> > 1) Best effort to keep TSC frequency constant if possible (but not
> >aborting migration if not possible). This would be an interesting
> >default, but a bit unpredictable.
> > 2) Strictly ensuring TSC frequency stays constant on migration (and
> >aborting migration if not possible). This would be an useful feature,
> >but can't be enabled by default unless both hosts have the same TSC
> >frequency or support TSC scaling.
> 
> Only destination needs to support TSC scaling, to match the frequency
> of the incoming host.

True.

> 
> Has the KVM code for this feature been submitted or integrated?
> 
> > Which one(s) you are trying to implement?
> > 
> > In other words, what is the right behavior when KVM_SET_TSC_KHZ fails or
> > KVM_CAP_TSC_CONTROL is not available? We can't answer that question if
> > the requirements and goals are not clear.
> > 
> > Once we know what exactly is the goal, we could enable the new mode with
> > a single option, instead of raw options to control migration stream
> > loading/saving.
> 
> Windows and Linux guests have paravirt clocks and/or options to
> disable direct TSC usage for timekeeping purposes. So disabling
> migration seems overkill.

I assume that users who set TSC frequency explicitly in the VM config
care about it (otherwise they wouldn't be setting it explicitly) and
don't want it to change after migration.

-- 
Eduardo
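
The two use cases discussed above can be written down as a small
decision function. This is purely an illustrative sketch of the
discussion, not code from the patchset: enum tsc_policy, enum tsc_result
and tsc_on_incoming are hypothetical names, and the function only models
the outcome on the destination host given whether it can scale the TSC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Use case 1 (best effort) versus use case 2 (strict). */
enum tsc_policy { TSC_BEST_EFFORT, TSC_STRICT };

/* Outcome of an incoming migration with respect to the guest TSC rate. */
enum tsc_result { TSC_KEPT, TSC_CHANGED, TSC_ABORT };

/*
 * Decide what happens on the destination. The rate is kept when the host
 * rate already matches or when the destination supports TSC scaling;
 * otherwise best effort lets the rate change and strict aborts.
 */
static enum tsc_result tsc_on_incoming(enum tsc_policy policy,
				       uint32_t guest_khz, uint32_t host_khz,
				       bool host_can_scale)
{
	assert(guest_khz != 0 && host_khz != 0);
	if (guest_khz == host_khz || host_can_scale)
		return TSC_KEPT;
	return policy == TSC_STRICT ? TSC_ABORT : TSC_CHANGED;
}
```

Only the destination's scaling capability enters the decision, which
matches the point made in the reply that the source host does not need
TSC scaling.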


[PATCH v3 7/7] vfio: platform: add dev_info on device reset

2015-10-23 Thread Eric Auger
It might be helpful for the end-user to check the device reset
function was found by the vfio platform reset framework.

Let's store a pointer to the struct device in vfio_platform_device
and trace when the reset function is called or not found.

Signed-off-by: Eric Auger 

---

v3: creation
---
 drivers/vfio/platform/vfio_platform_common.c  | 14 --
 drivers/vfio/platform/vfio_platform_private.h |  1 +
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c
index de3fb33..207fab6 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -144,8 +144,12 @@ static void vfio_platform_release(void *device_data)
 	mutex_lock(&driver_lock);
 
 	if (!(--vdev->refcnt)) {
-		if (vdev->reset)
+		if (vdev->reset) {
+			dev_info(vdev->device, "reset\n");
 			vdev->reset(vdev);
+		} else {
+			dev_warn(vdev->device, "no reset function found!\n");
+		}
 		vfio_platform_regions_cleanup(vdev);
 		vfio_platform_irq_cleanup(vdev);
 	}
@@ -174,8 +178,12 @@ static int vfio_platform_open(void *device_data)
 		if (ret)
 			goto err_irq;
 
-		if (vdev->reset)
+		if (vdev->reset) {
+			dev_info(vdev->device, "reset\n");
 			vdev->reset(vdev);
+		} else {
+			dev_warn(vdev->device, "no reset function found!\n");
+		}
 	}
 
 	vdev->refcnt++;
@@ -551,6 +559,8 @@ int vfio_platform_probe_common(struct vfio_platform_device *vdev,
 		return -EINVAL;
 	}
 
+	vdev->device = dev;
+
 	group = iommu_group_get(dev);
 	if (!group) {
 		pr_err("VFIO: No IOMMU group for device %s\n", vdev->name);
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index e505c15..ccb99b4 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -59,6 +59,7 @@ struct vfio_platform_device {
 	struct module   *parent_module;
 	const char      *compat;
 	struct module   *reset_module;
+	struct device   *device;
 
 	/*
 	 * These fields should be filled by the bus specific binder
-- 
1.9.1



[PATCH v3 5/7] vfio: platform: add compat in vfio_platform_device

2015-10-23 Thread Eric Auger
Let's retrieve the compatibility string on probe and store it
in the vfio_platform_device struct

Signed-off-by: Eric Auger 

---

v2 -> v3:
- populate compat after vdev check
---
 drivers/vfio/platform/vfio_platform_common.c  | 15 ---
 drivers/vfio/platform/vfio_platform_private.h |  1 +
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c
index 8eccd30..50a388b 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -41,16 +41,11 @@ static const struct vfio_platform_reset_combo reset_lookup_table[] = {
 static void vfio_platform_get_reset(struct vfio_platform_device *vdev,
 				    struct device *dev)
 {
-	const char *compat;
 	int (*reset)(struct vfio_platform_device *);
-	int ret, i;
-
-	ret = device_property_read_string(dev, "compatible", &compat);
-	if (ret)
-		return;
+	int i;
 
 	for (i = 0 ; i < ARRAY_SIZE(reset_lookup_table); i++) {
-		if (!strcmp(reset_lookup_table[i].compat, compat)) {
+		if (!strcmp(reset_lookup_table[i].compat, vdev->compat)) {
 			request_module(reset_lookup_table[i].module_name);
 			reset = __symbol_get(
 				reset_lookup_table[i].reset_function_name);
@@ -544,6 +539,12 @@ int vfio_platform_probe_common(struct vfio_platform_device *vdev,
 	if (!vdev)
 		return -EINVAL;
 
+	ret = device_property_read_string(dev, "compatible", &vdev->compat);
+	if (ret) {
+		pr_err("VFIO: cannot retrieve compat for %s\n", vdev->name);
+		return -EINVAL;
+	}
+
 	group = iommu_group_get(dev);
 	if (!group) {
 		pr_err("VFIO: No IOMMU group for device %s\n", vdev->name);
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index 5a1e8e6..f8072d8 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -57,6 +57,7 @@ struct vfio_platform_device {
 	int             refcnt;
 	struct mutex    igate;
 	struct module   *parent_module;
+	const char      *compat;
 
 	/*
 	 * These fields should be filled by the bus specific binder
-- 
1.9.1



Re: [PATCH v2 00/12] KVM: x86: add support for VMX TSC scaling

2015-10-23 Thread Joerg Roedel
On Fri, Oct 23, 2015 at 08:32:28PM +0800, Haozhong Zhang wrote:
> No, since I don't have AMD machines at hand. The modifications to SVM
> code are mostly lifting common code with VMX TSC scaling code, so it
> should still work on AMD machines.

Well, I think it would be good if you can provide a Tested-by on AMD
machines from someone who has one. Or get one yourself when changing AMD
specific code, they are not that expensive :)
I can do some testing when I am back from my travels, but that will not
be before early November.

Joerg


Re: [PATCH v3 1/7] vfio: platform: introduce vfio-platform-base module

2015-10-23 Thread Arnd Bergmann
On Friday 23 October 2015 14:37:09 Eric Auger wrote:
> To prepare for vfio platform reset rework let's build
> vfio_platform_common.c and vfio_platform_irq.c in a separate
> module from vfio-platform and vfio-amba. This makes it possible
> to have separate module inits and works around a race between
> platform driver init and vfio reset module init: that way we
> make sure symbols exported by base are available when vfio-platform
> driver gets probed.
> 
> The open/release being implemented in the base module, the ref
> count is applied to the parent module instead.
> 
> Signed-off-by: Eric Auger 
> Suggested-by: Arnd Bergmann 
> 
> 

Reviewed-by: Arnd Bergmann 


Re: [PATCH v3 6/7] vfio: platform: use list of registered reset function

2015-10-23 Thread Arnd Bergmann
On Friday 23 October 2015 14:37:14 Eric Auger wrote:
> Remove the static lookup table and use the dynamic list of registered
> reset functions instead. Also load the reset module through its alias.
> The reset struct module pointer is stored in vfio_platform_device.
> 
> We also remove the useless struct device pointer parameter in
> vfio_platform_get_reset.
> 
> This patch fixes the issue related to the usage of __symbol_get, which,
> besides being moot, prevented compilation with CONFIG_MODULES
> disabled.
> 
> Also, using MODULE_ALIAS makes it possible to add a new reset module
> without needing to update the framework. This was suggested by Arnd.
> 
> Signed-off-by: Eric Auger 
> Reported-by: Arnd Bergmann 
> 
> 
Reviewed-by: Arnd Bergmann 

but doesn't this need to come before patch 4/7?

Arnd
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next RFC 2/2] vhost_net: basic polling support

2015-10-23 Thread Michael S. Tsirkin
On Fri, Oct 23, 2015 at 03:13:07PM +0800, Jason Wang wrote:
> 
> 
> On 10/22/2015 05:33 PM, Michael S. Tsirkin wrote:
> > On Thu, Oct 22, 2015 at 01:27:29AM -0400, Jason Wang wrote:
> >> This patch tries to poll for newly added tx buffers for a while at the
> >> end of tx processing. The maximum time spent on polling is limited
> >> through a module parameter. To avoid blocking rx, the loop will end if
> >> there is other work queued on vhost, so in fact the socket receive
> >> queue is also polled.
> >>
> >> busyloop_timeout = 50 gives us following improvement on TCP_RR test:
> >>
> >> size/session/+thu%/+normalize%
> >> 1/ 1/   +5%/  -20%
> >> 1/50/  +17%/   +3%
> > Is there a measurable increase in cpu utilization
> > with busyloop_timeout = 0?
> 
> Just ran TCP_RR; no increase. Will run a complete test on the next version.
> 
> >
> >> Signed-off-by: Jason Wang 
> > We might be able to shave off the minor regression
> > by careful use of likely/unlikely, or maybe
> > deferring 
> 
> Yes, but what did "deferring" mean here?

Don't call local_clock until we know we'll need it.
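
A minimal userspace sketch of that ordering, assuming the cheap checks can be
evaluated first so that the clock is read only when they all pass. struct
poll_state, read_clock() and can_busy_poll() are hypothetical stand-ins for the
kernel state and for local_clock(); wraparound handling (time_after) is left
out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel state so the sketch builds anywhere. */
struct poll_state {
    unsigned int timeout;      /* models busyloop_timeout; 0 disables polling */
    bool need_resched;         /* models need_resched() */
    bool has_work;             /* models vhost_has_work() */
    uint64_t fake_clock;       /* models local_clock() >> 10 */
    unsigned int clock_reads;  /* counts how often the clock was read */
};

static uint64_t read_clock(struct poll_state *s)
{
    assert(s != NULL);
    s->clock_reads++;
    return s->fake_clock;
}

static bool can_busy_poll(struct poll_state *s, uint64_t endtime)
{
    /* Cheap checks first; && short-circuits, so the clock is read last. */
    return s->timeout && !s->need_resched && !s->has_work &&
           read_clock(s) <= endtime;
}
```

When polling is disabled or other work is pending, the function returns without
touching the clock at all.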

> >
> >> ---
> >>  drivers/vhost/net.c | 19 +++
> >>  1 file changed, 19 insertions(+)
> >>
> >> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> >> index 9eda69e..bbb522a 100644
> >> --- a/drivers/vhost/net.c
> >> +++ b/drivers/vhost/net.c
> >> @@ -31,7 +31,9 @@
> >>  #include "vhost.h"
> >>  
> >>  static int experimental_zcopytx = 1;
> >> +static int busyloop_timeout = 50;
> >>  module_param(experimental_zcopytx, int, 0444);
> >> +module_param(busyloop_timeout, int, 0444);
> > Pls add a description, including the units and the special
> > value 0.
> 
> Ok.
> 
> >
> >>  MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
> >>   " 1 -Enable; 0 - Disable");
> >>  
> >> @@ -287,12 +289,23 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
> >>rcu_read_unlock_bh();
> >>  }
> >>  
> >> +static bool tx_can_busy_poll(struct vhost_dev *dev,
> >> +   unsigned long endtime)
> >> +{
> >> +  unsigned long now = local_clock() >> 10;
> > local_clock might go backwards if we jump between CPUs.
> > One way to fix would be to record the CPU id and break
> > out of loop if that changes.
> 
> Right, or maybe disable preemption in this case?
> 
> >
> > Also - defer this until we actually know we need it?
> 
> Right.
> 
> >
> >> +
> >> +  return busyloop_timeout && !need_resched() &&
> >> + !time_after(now, endtime) && !vhost_has_work(dev) &&
> >> + single_task_running();
> > signal pending as well?
> 
> Yes.
> 
> >> +}
> >> +
> >>  /* Expects to be always run from workqueue - which acts as
> >>   * read-size critical section for our kind of RCU. */
> >>  static void handle_tx(struct vhost_net *net)
> >>  {
> >>    struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
> >>    struct vhost_virtqueue *vq = &nvq->vq;
> >> +  unsigned long endtime;
> >>unsigned out, in;
> >>int head;
> >>struct msghdr msg = {
> >> @@ -331,6 +344,8 @@ static void handle_tx(struct vhost_net *net)
> >>  % UIO_MAXIOV == nvq->done_idx))
> >>break;
> >>  
> >> +  endtime  = (local_clock() >> 10) + busyloop_timeout;
> >> +again:
> >>head = vhost_get_vq_desc(vq, vq->iov,
> >> ARRAY_SIZE(vq->iov),
> >> &out, &in,
> >> @@ -340,6 +355,10 @@ static void handle_tx(struct vhost_net *net)
> >>break;
> >>/* Nothing new?  Wait for eventfd to tell us they refilled. */
> >>if (head == vq->num) {
> >> +  if (tx_can_busy_poll(vq->dev, endtime)) {
> >> +  cpu_relax();
> >> +  goto again;
> >> +  }
> >>    if (unlikely(vhost_enable_notify(&net->dev, vq))) {
> >>    vhost_disable_notify(&net->dev, vq);
> >>continue;
> >> -- 
> >> 1.8.3.1
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 00/12] KVM: x86: add support for VMX TSC scaling

2015-10-23 Thread Paolo Bonzini


On 23/10/2015 14:46, Joerg Roedel wrote:
>> > No, since I don't have AMD machines at hand. The modifications to SVM
>> > code are mostly lifting common code with VMX TSC scaling code, so it
>> > should still work on AMD machines.
> Well, I think it would be good if you can provide a Tested-by on AMD
> machines from someone who has one. Or get one yourself when changing AMD
> specific code, they are not that expensive :)
> I can do some testing when I am back from my travels, but that will not
> be before early November.

I have one now (mine, not just Red Hat's). :D

Paolo


Re: [PATCH v3 3/7] vfio: platform: introduce module_vfio_reset_handler macro

2015-10-23 Thread Arnd Bergmann
On Friday 23 October 2015 14:37:11 Eric Auger wrote:
> +static int __init reset ## _module_init(void)  \
> +{  \
> +   vfio_platform_register_reset(compat, reset);\
> +   return 0;   \
> +}; \
> 

I would make this 'return vfio_platform_register_reset(...)', so loading
the driver fails if the handler cannot get registered. Alternatively,
change the return type of vfio_platform_register_reset to 'void'
so it can never fail.
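
A minimal userspace sketch of the first option, assuming a registration function
that can fail. register_reset(), DEFINE_RESET_INIT(), the table and
demo_reset() are hypothetical stand-ins, not the functions from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*reset_fn_t)(void *vdev);

#define MAX_RESETS 4

/* Hypothetical fixed-size registry standing in for the reset list. */
static struct { const char *compat; reset_fn_t fn; } reset_table[MAX_RESETS];
static size_t reset_count;

/* Hypothetical registration that can fail and reports why. */
static int register_reset(const char *compat, reset_fn_t fn)
{
    assert(reset_count <= MAX_RESETS);
    if (!compat || !fn)
        return -EINVAL;
    if (reset_count == MAX_RESETS)
        return -ENOMEM;
    reset_table[reset_count].compat = compat;
    reset_table[reset_count].fn = fn;
    reset_count++;
    return 0;
}

/* The init function propagates the registration result to its caller. */
#define DEFINE_RESET_INIT(compat, reset)            \
static int reset ## _module_init(void)              \
{                                                   \
    return register_reset(compat, reset);           \
}

static int demo_reset(void *vdev) { (void)vdev; return 0; }
DEFINE_RESET_INIT("vendor,demo-device", demo_reset)
```

With this shape a failed registration makes the init function return a negative
errno, which in a real module would make loading fail instead of leaving a
module loaded without its handler.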

Arnd


[PATCH v2 0/3] KVM: arm/arm64: Clean up some obsolete code

2015-10-23 Thread Pavel Fedin
Current KVM code has lots of old redundancies, which can be cleaned up.
This patchset is actually a better alternative to
http://www.spinics.net/lists/arm-kernel/msg430726.html, one that allows
keeping piggy-backed LRs. The idea is based on the fact that our code also
maintains LR state in elrsr, and this information is enough to track LR
usage.

In case of problems this series can be applied partially; each patch is
a complete refactoring step on its own.

Thanks to Andre Przywara for pinpointing some 4.3+ specifics.

This version has been tested on SMDK5410 development board
(Exynos5410 SoC).

v1 => v2:
- Rebased to kvmarm/next of 23.10.2015.
- Do not use vgic_retire_lr() for initializing the ELRSR bitmask, because it
  now also handles pushback of PENDING state; use direct initialization
  instead (copied from Andre's patchset).
- Took more care with vgic_retire_lr(), which now gets its own patch.

Pavel Fedin (3):
  KVM: arm/arm64: Optimize away redundant LR tracking
  KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()
  KVM: arm/arm64: Refactor vgic_retire_lr()

 include/kvm/arm_vgic.h |   7 
 virt/kvm/arm/vgic-v2.c |   6 +--
 virt/kvm/arm/vgic-v3.c |   6 +--
 virt/kvm/arm/vgic.c| 106 ++---
 4 files changed, 31 insertions(+), 94 deletions(-)

-- 
2.4.4



Re: [PATCH v3 2/7] vfio: platform: add capability to register a reset function

2015-10-23 Thread Arnd Bergmann
On Friday 23 October 2015 14:37:10 Eric Auger wrote:
> +
> +void vfio_platform_unregister_reset(const char *compat)
> +{
> +   struct vfio_platform_reset_node *iter, *temp;
> +
> +   mutex_lock(_lock);
> +   list_for_each_entry_safe(iter, temp, _list, link) {
> +   if (!strcmp(iter->compat, compat)) {
> +   list_del(&iter->link);
> +   break;
> +   }
> +   }
> +
> +   mutex_unlock(_lock);
> +}
> +EXPORT_SYMBOL_GPL(vfio_platform_unregister_reset);
> 

This is slightly unsafe in case you ever get two drivers that register
with the same compat string. If we care about that, we could pass
and compare both the string and the function pointer, or the
vfio_platform_reset_node.
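
A minimal userspace sketch of matching on both the string and the function
pointer. The list type and the reset_register()/reset_unregister() helpers are
hypothetical, and locking is omitted; the real code would do this under its
mutex.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*reset_fn_t)(void *vdev);

/* Hypothetical singly linked node standing in for vfio_platform_reset_node. */
struct reset_node {
    const char *compat;
    reset_fn_t reset;
    struct reset_node *next;
};

static struct reset_node *reset_head;

static void reset_register(struct reset_node *node)
{
    assert(node != NULL && node->compat != NULL);
    node->next = reset_head;
    reset_head = node;
}

/* Remove only the entry whose compat AND handler both match; 1 if removed. */
static int reset_unregister(const char *compat, reset_fn_t fn)
{
    struct reset_node **link;

    for (link = &reset_head; *link; link = &(*link)->next) {
        struct reset_node *node = *link;

        if (!strcmp(node->compat, compat) && node->reset == fn) {
            *link = node->next;
            node->next = NULL;
            return 1;
        }
    }
    return 0;
}

/* Two demo handlers that share one compat string in the usage below. */
static int reset_a(void *vdev) { (void)vdev; return 0; }
static int reset_b(void *vdev) { (void)vdev; return 1; }
```

If two drivers register "vendor,dev" with reset_a and reset_b, unregistering
("vendor,dev", reset_a) leaves the reset_b entry in place, which a compare on
the string alone cannot guarantee.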

Arnd


Re: [PATCH v3 6/7] vfio: platform: use list of registered reset function

2015-10-23 Thread Eric Auger
Hi Arnd,
On 10/23/2015 03:12 PM, Arnd Bergmann wrote:
> On Friday 23 October 2015 14:37:14 Eric Auger wrote:
>> Remove the static lookup table and use the dynamic list of registered
>> reset functions instead. Also load the reset module through its alias.
>> The reset struct module pointer is stored in vfio_platform_device.
>>
>> We also remove the useless struct device pointer parameter in
>> vfio_platform_get_reset.
>>
>> This patch fixes the issue related to the usage of __symbol_get, which,
>> besides being moot, prevented compilation with CONFIG_MODULES
>> disabled.
>>
>> Also, using MODULE_ALIAS makes it possible to add a new reset module
>> without needing to update the framework. This was suggested by Arnd.
>>
>> Signed-off-by: Eric Auger 
>> Reported-by: Arnd Bergmann 
>>
>>
> Reviewed-by: Arnd Bergmann 
> 
> but doesn't this need to come before patch 4/7?
Well I don't think so. In [4] we introduce the dynamic registration
method but until this patch we still use the old lookup method in the
static table. I tested and the reset lookup still works in [4].
If we put this one before the registration, the functionality will be
lost here.

Eric

>   Arnd
> 



Re: [PATCH v3 3/7] vfio: platform: introduce module_vfio_reset_handler macro

2015-10-23 Thread Eric Auger
Hi Arnd,
On 10/23/2015 03:09 PM, Arnd Bergmann wrote:
> On Friday 23 October 2015 14:37:11 Eric Auger wrote:
>> +static int __init reset ## _module_init(void)  \
>> +{  \
>> +   vfio_platform_register_reset(compat, reset);\
>> +   return 0;   \
>> +}; \
>>
> 
> I would make this 'return vfio_platform_register_reset(...)', so loading
> the driver fails if the handler cannot get registered. Alternatively,
> change the return type of vfio_platform_register_reset to 'void'
> so it can never fail.
I will turn __vfio_platform_register_reset to 'void' then since it has
no reason to fail now.

Thanks

Eric

> 
>   Arnd
> 



Re: [PATCH v3 2/7] vfio: platform: add capability to register a reset function

2015-10-23 Thread Eric Auger
On 10/23/2015 03:07 PM, Arnd Bergmann wrote:
> On Friday 23 October 2015 14:37:10 Eric Auger wrote:
>> +
>> +void vfio_platform_unregister_reset(const char *compat)
>> +{
>> +   struct vfio_platform_reset_node *iter, *temp;
>> +
>> +   mutex_lock(_lock);
>> +   list_for_each_entry_safe(iter, temp, _list, link) {
>> +   if (!strcmp(iter->compat, compat)) {
>> +   list_del(&iter->link);
>> +   break;
>> +   }
>> +   }
>> +
>> +   mutex_unlock(_lock);
>> +}
>> +EXPORT_SYMBOL_GPL(vfio_platform_unregister_reset);
>>
> 
> This is slightly unsafe in case you ever get two drivers that register
> with the same compat string. If we care about that, we could pass
> and compare both the string and the function pointer, or the
> vfio_platform_reset_node.
OK

Thanks

Eric
> 
>   Arnd
> 



Re: [Qemu-devel] [PATCH v2 0/3] target-i386: save/restore vcpu's TSC rate during migration

2015-10-23 Thread Haozhong Zhang
On Fri, Oct 23, 2015 at 08:35:20AM -0200, Marcelo Tosatti wrote:
> On Thu, Oct 22, 2015 at 04:45:21PM -0200, Eduardo Habkost wrote:
> > On Tue, Oct 20, 2015 at 03:22:51PM +0800, Haozhong Zhang wrote:
> > > This patchset enables QEMU to save/restore vcpu's TSC rate during the
> > > migration. When cooperating with KVM which supports TSC scaling, guest
> > > programs can observe a consistent guest TSC rate even though they are
> > > migrated among machines with different host TSC rates.
> > > 
> > > A pair of cpu options 'save-tsc-freq' and 'load-tsc-freq' are added to
> > > control the migration of vcpu's TSC rate.
> > 
> > The requirements and goals aren't clear to me. I see two possible use
> > cases, here:
> > 
> > > 1) Best effort to keep TSC frequency constant if possible (but not
> > >    aborting migration if not possible). This would be an interesting
> > >    default, but a bit unpredictable.
> > > 2) Strictly ensuring TSC frequency stays constant on migration (and
> > >    aborting migration if not possible). This would be a useful feature,
> > >    but can't be enabled by default unless both hosts have the same TSC
> > >    frequency or support TSC scaling.
> 
> Only destination needs to support TSC scaling, to match the frequency
> of the incoming host.
>

Yes.

> Has the KVM code for this feature been submitted or integrated?

It has been submitted and can be found at http://www.spinics.net/lists/kvm/msg122431.html

> 
> > Which one(s) are you trying to implement?
> > 
> > In other words, what is the right behavior when KVM_SET_TSC_KHZ fails or
> > KVM_CAP_TSC_CONTROL is not available? We can't answer that question if
> > the requirements and goals are not clear.
> > 
> > Once we know what exactly is the goal, we could enable the new mode with
> > a single option, instead of raw options to control migration stream
> > loading/saving.
> 
> Windows and Linux guests have paravirt clocks and/or options to
> disable direct TSC usage for timekeeping purposes. So disabling
> migration seems overkill.
>

For KVM clock, guest users still need to know the host TSC (possibly
adjusted by scaling and offset) to know how much time has passed since
the time provided by the PV clock. The KVM patch has adjusted KVM clock
for VMX TSC scaling so that it can be safely used across migration.
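
As a rough illustration of why the guest-visible TSC matters here, the sketch
below models how a guest turns a TSC reading into nanoseconds with the pvclock
parameters. The field names follow struct pvclock_vcpu_time_info; struct
pv_time, scale_delta() and pv_clock_ns() are illustrative helpers, not code
from the KVM patch.

```c
#include <assert.h>
#include <stdint.h>

/* Subset of the pvclock fields; the struct itself is illustrative. */
struct pv_time {
    uint64_t tsc_timestamp;      /* guest TSC when system_time was sampled */
    uint64_t system_time;        /* nanoseconds at tsc_timestamp */
    uint32_t tsc_to_system_mul;  /* 32-bit binary fraction: ns per tick */
    int8_t   tsc_shift;          /* shift applied to the tick delta first */
};

/* Convert a tick delta to nanoseconds: ((delta << shift) * mul) >> 32. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
    uint64_t hi, lo;

    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;

    /* Multiply the 32-bit halves separately, then drop the low 32 bits. */
    hi = (delta >> 32) * mul;
    lo = (delta & 0xffffffffu) * mul;
    return hi + (lo >> 32);
}

static uint64_t pv_clock_ns(const struct pv_time *t, uint64_t guest_tsc_now)
{
    assert(guest_tsc_now >= t->tsc_timestamp);
    return t->system_time +
           scale_delta(guest_tsc_now - t->tsc_timestamp,
                       t->tsc_to_system_mul, t->tsc_shift);
}
```

The multiplier and shift describe the rate of the TSC the guest actually reads,
so if that rate changes across migration without the parameters being updated,
the computed time drifts.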

Haozhong

> > 
> > 
> > > >  * By default, the migration of vcpu's TSC rate is enabled only on
> > > >    pc-*-2.5 and newer machine types. If the cpu option 'save-tsc-freq'
> > > >    is present, the vcpu's TSC rate will be migrated from older machine
> > > >    types as well.
> > > >  * Another cpu option 'load-tsc-freq' controls whether the migrated
> > > >    vcpu's TSC rate is used. By default, QEMU will not use the migrated
> > > >    TSC rate if this option is not present. Otherwise, QEMU will use
> > > >    the migrated TSC rate and override the TSC rate given by the cpu
> > > >    option 'tsc-freq'.
> > > 
> > > Changes in v2:
> > > >  * Add a pair of cpu options 'save-tsc-freq' and 'load-tsc-freq' to
> > > >    control the migration of vcpu's TSC rate.
> > >  * Move all logic of setting TSC rate to target-i386.
> > >  * Remove the duplicated TSC setup in kvm_arch_init_vcpu().
> > > 
> > > Haozhong Zhang (3):
> > >   target-i386: add a subsection for migrating vcpu's TSC rate
> > >   target-i386: calculate vcpu's TSC rate to be migrated
> > >   target-i386: load the migrated vcpu's TSC rate
> > > 
> > >  include/hw/i386/pc.h  |  5 +
> > >  target-i386/cpu.c |  2 ++
> > >  target-i386/cpu.h |  3 +++
> > > >  target-i386/kvm.c | 61 +++
> > >  target-i386/machine.c | 19 
> > >  5 files changed, 81 insertions(+), 9 deletions(-)
> > > 
> > > -- 
> > > 2.4.8
> > > 
> > 
> > -- 
> > Eduardo
> 


[PATCH v3 1/7] vfio: platform: introduce vfio-platform-base module

2015-10-23 Thread Eric Auger
To prepare for vfio platform reset rework let's build
vfio_platform_common.c and vfio_platform_irq.c in a separate
module from vfio-platform and vfio-amba. This makes it possible
to have separate module inits and works around a race between
platform driver init and vfio reset module init: that way we
make sure symbols exported by base are available when vfio-platform
driver gets probed.

The open/release being implemented in the base module, the ref
count is applied to the parent module instead.

Signed-off-by: Eric Auger 
Suggested-by: Arnd Bergmann 

---

v3: creation
---
 drivers/vfio/platform/Makefile|  6 --
 drivers/vfio/platform/vfio_amba.c |  1 +
 drivers/vfio/platform/vfio_platform.c |  1 +
 drivers/vfio/platform/vfio_platform_common.c  | 13 +++--
 drivers/vfio/platform/vfio_platform_private.h |  1 +
 5 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/platform/Makefile b/drivers/vfio/platform/Makefile
index 9ce8afe..41a6224 100644
--- a/drivers/vfio/platform/Makefile
+++ b/drivers/vfio/platform/Makefile
@@ -1,10 +1,12 @@
-
-vfio-platform-y := vfio_platform.o vfio_platform_common.o vfio_platform_irq.o
+vfio-platform-base-y := vfio_platform_common.o vfio_platform_irq.o
+vfio-platform-y := vfio_platform.o
 
 obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o
+obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform-base.o
 obj-$(CONFIG_VFIO_PLATFORM) += reset/
 
 vfio-amba-y := vfio_amba.o
 
 obj-$(CONFIG_VFIO_AMBA) += vfio-amba.o
+obj-$(CONFIG_VFIO_AMBA) += vfio-platform-base.o
 obj-$(CONFIG_VFIO_AMBA) += reset/
diff --git a/drivers/vfio/platform/vfio_amba.c b/drivers/vfio/platform/vfio_amba.c
index ff0331f..a66479b 100644
--- a/drivers/vfio/platform/vfio_amba.c
+++ b/drivers/vfio/platform/vfio_amba.c
@@ -67,6 +67,7 @@ static int vfio_amba_probe(struct amba_device *adev, const struct amba_id *id)
 	vdev->flags = VFIO_DEVICE_FLAGS_AMBA;
 	vdev->get_resource = get_amba_resource;
 	vdev->get_irq = get_amba_irq;
+	vdev->parent_module = THIS_MODULE;
 
 	ret = vfio_platform_probe_common(vdev, &adev->dev);
 	if (ret) {
diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
index cef645c..f1625dc 100644
--- a/drivers/vfio/platform/vfio_platform.c
+++ b/drivers/vfio/platform/vfio_platform.c
@@ -65,6 +65,7 @@ static int vfio_platform_probe(struct platform_device *pdev)
 	vdev->flags = VFIO_DEVICE_FLAGS_PLATFORM;
 	vdev->get_resource = get_platform_resource;
 	vdev->get_irq = get_platform_irq;
+	vdev->parent_module = THIS_MODULE;
 
 	ret = vfio_platform_probe_common(vdev, &pdev->dev);
 	if (ret)
diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c
index e43efb5..184e9d2 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -23,6 +23,10 @@
 
 #include "vfio_platform_private.h"
 
+#define DRIVER_VERSION  "0.10"
+#define DRIVER_AUTHOR   "Antonios Motakis "
+#define DRIVER_DESC "VFIO platform base module"
+
 static DEFINE_MUTEX(driver_lock);
 
 static const struct vfio_platform_reset_combo reset_lookup_table[] = {
@@ -146,7 +150,7 @@ static void vfio_platform_release(void *device_data)
 
 	mutex_unlock(&driver_lock);
 
-	module_put(THIS_MODULE);
+	module_put(vdev->parent_module);
 }
 
 static int vfio_platform_open(void *device_data)
@@ -154,7 +158,7 @@ static int vfio_platform_open(void *device_data)
 	struct vfio_platform_device *vdev = device_data;
 	int ret;
 
-	if (!try_module_get(THIS_MODULE))
+	if (!try_module_get(vdev->parent_module))
 		return -ENODEV;
 
 	mutex_lock(&driver_lock);
@@ -573,3 +577,8 @@ struct vfio_platform_device *vfio_platform_remove_common(struct device *dev)
 	return vdev;
 }
 EXPORT_SYMBOL_GPL(vfio_platform_remove_common);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index 1c9b3d5..7128690 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -56,6 +56,7 @@ struct vfio_platform_device {
 	u32             num_irqs;
 	int             refcnt;
 	struct mutex    igate;
+	struct module   *parent_module;
 
 	/*
 	 * These fields should be filled by the bus specific binder
-- 
1.9.1



[PATCH v2 1/3] KVM: arm/arm64: Optimize away redundant LR tracking

2015-10-23 Thread Pavel Fedin
Currently we use vgic_irq_lr_map in order to track which LRs hold which
IRQs, and lr_used bitmap in order to track which LRs are used or free.

vgic_irq_lr_map is actually used only for piggy-back optimization, and
can be easily replaced by iteration over lr_used. This is good because in
the future, when LPI support is introduced, the number of IRQs will grow to
at least 16384, while numbers from 1024 to 8192 are never going to be used.
A per-IRQ map would then be a huge memory waste.

In turn, lr_used is also completely redundant since
ae705930fca6322600690df9dc1c7d0516145a93 ("arm/arm64: KVM: Keep elrsr/aisr
in sync with software model"), because together with lr_used we also update
elrsr. This allows lr_used to be easily replaced with elrsr, inverting all
conditions (because in elrsr '1' means 'free').
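
The inverted sense can be illustrated with a small standalone sketch. The two
helpers below are hypothetical and only model the bit convention (a set bit in
elrsr means the LR is free); they are not part of this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Count LRs in use: a clear bit n in elrsr means LR n holds an interrupt. */
static unsigned int count_used_lrs(uint64_t elrsr, unsigned int nr_lr)
{
    unsigned int lr, used = 0;

    assert(nr_lr <= 64);
    for (lr = 0; lr < nr_lr; lr++)
        if (!(elrsr & (UINT64_C(1) << lr)))
            used++;
    return used;
}

/* Return the lowest free LR (set bit in elrsr), or -1 if all are in use. */
static int first_free_lr(uint64_t elrsr, unsigned int nr_lr)
{
    unsigned int lr;

    assert(nr_lr <= 64);
    for (lr = 0; lr < nr_lr; lr++)
        if (elrsr & (UINT64_C(1) << lr))
            return (int)lr;
    return -1;
}
```

This is why every loop that walked set bits of lr_used becomes a loop over
clear bits of elrsr, and why elrsr starts out as all ones.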

Signed-off-by: Pavel Fedin 
---
 include/kvm/arm_vgic.h |  6 --
 virt/kvm/arm/vgic-v2.c |  1 +
 virt/kvm/arm/vgic-v3.c |  1 +
 virt/kvm/arm/vgic.c| 55 --
 4 files changed, 19 insertions(+), 44 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 8065801..3936bf8 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -295,9 +295,6 @@ struct vgic_v3_cpu_if {
 };
 
 struct vgic_cpu {
-	/* per IRQ to LR mapping */
-	u8		*vgic_irq_lr_map;
-
 	/* Pending/active/both interrupts on this VCPU */
 	DECLARE_BITMAP(pending_percpu, VGIC_NR_PRIVATE_IRQS);
 	DECLARE_BITMAP(active_percpu, VGIC_NR_PRIVATE_IRQS);
@@ -308,9 +305,6 @@ struct vgic_cpu {
 	unsigned long	*active_shared;
 	unsigned long	*pend_act_shared;
 
-	/* Bitmap of used/free list registers */
-	DECLARE_BITMAP(lr_used, VGIC_V2_MAX_LRS);
-
 	/* Number of list registers on this CPU */
 	int		nr_lr;
 
diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
index 8d7b04d..c0f5d7f 100644
--- a/virt/kvm/arm/vgic-v2.c
+++ b/virt/kvm/arm/vgic-v2.c
@@ -158,6 +158,7 @@ static void vgic_v2_enable(struct kvm_vcpu *vcpu)
 	 * anyway.
 	 */
 	vcpu->arch.vgic_cpu.vgic_v2.vgic_vmcr = 0;
+	vcpu->arch.vgic_cpu.vgic_v2.vgic_elrsr = ~0;
 
 	/* Get the show on the road... */
 	vcpu->arch.vgic_cpu.vgic_v2.vgic_hcr = GICH_HCR_EN;
diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
index 7dd5d62..92003cb 100644
--- a/virt/kvm/arm/vgic-v3.c
+++ b/virt/kvm/arm/vgic-v3.c
@@ -193,6 +193,7 @@ static void vgic_v3_enable(struct kvm_vcpu *vcpu)
 	 * anyway.
 	 */
 	vgic_v3->vgic_vmcr = 0;
+	vgic_v3->vgic_elrsr = ~0;
 
 	/*
 	 * If we are emulating a GICv3, we do it in an non-GICv2-compatible
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index d4669eb..60d270d 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -108,6 +108,7 @@ static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
 static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
 static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
 static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
+static u64 vgic_get_elrsr(struct kvm_vcpu *vcpu);
 static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
int virt_irq);
 static int compute_pending_for_cpu(struct kvm_vcpu *vcpu);
@@ -691,9 +692,11 @@ bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio *mmio,
 void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
 {
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+	u64 elrsr = vgic_get_elrsr(vcpu);
+	unsigned long *elrsr_ptr = u64_to_bitmask(&elrsr);
 	int i;
 
-	for_each_set_bit(i, vgic_cpu->lr_used, vgic_cpu->nr_lr) {
+	for_each_clear_bit(i, elrsr_ptr, vgic_cpu->nr_lr) {
 		struct vgic_lr lr = vgic_get_lr(vcpu, i);
 
 		/*
@@ -1098,7 +1101,6 @@ static inline void vgic_enable(struct kvm_vcpu *vcpu)
 
 static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu)
 {
-	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 	struct vgic_lr vlr = vgic_get_lr(vcpu, lr_nr);
 
 	/*
@@ -1112,8 +1114,6 @@ static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu)
 
 	vlr.state = 0;
 	vgic_set_lr(vcpu, lr_nr, vlr);
-	clear_bit(lr_nr, vgic_cpu->lr_used);
-	vgic_cpu->vgic_irq_lr_map[irq] = LR_EMPTY;
 	vgic_sync_lr_elrsr(vcpu, lr_nr, vlr);
 }
 
@@ -1128,10 +1128,11 @@ static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu)
  */
 static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu)
 {
-	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+	u64 elrsr = vgic_get_elrsr(vcpu);
+	unsigned long *elrsr_ptr = u64_to_bitmask(&elrsr);
 	int lr;
 
-	for_each_set_bit(lr, vgic_cpu->lr_used, vgic->nr_lr) {
+	for_each_clear_bit(lr, elrsr_ptr, vgic->nr_lr) {
struct vgic_lr vlr = 

[PATCH v2 3/3] KVM: arm/arm64: Refactor vgic_retire_lr()

2015-10-23 Thread Pavel Fedin
1. Remove the unnecessary 'irq' argument, because the IRQ number can be
   retrieved from the LR.
2. vgic_retire_lr() is always accompanied by vgic_irq_clear_queued(). Since
   it already does more than just clear the LR, move
   vgic_irq_clear_queued() inside it.

Signed-off-by: Pavel Fedin 
---
 virt/kvm/arm/vgic.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 218b094..d8f7c21 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -105,7 +105,7 @@
 #include "vgic.h"
 
 static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
-static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
+static void vgic_retire_lr(int lr_nr, struct kvm_vcpu *vcpu);
 static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
 static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
 static u64 vgic_get_elrsr(struct kvm_vcpu *vcpu);
@@ -724,8 +724,7 @@ void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
 * Reestablish the pending state on the distributor and the
 * CPU interface and mark the LR as free for other use.
 */
-   vgic_retire_lr(i, lr.irq, vcpu);
-   vgic_irq_clear_queued(vcpu, lr.irq);
+   vgic_retire_lr(i, vcpu);
 
/* Finally update the VGIC state. */
vgic_update_state(vcpu->kvm);
@@ -1078,16 +1077,18 @@ static inline void vgic_enable(struct kvm_vcpu *vcpu)
vgic_ops->enable(vcpu);
 }
 
-static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu)
+static void vgic_retire_lr(int lr_nr, struct kvm_vcpu *vcpu)
 {
struct vgic_lr vlr = vgic_get_lr(vcpu, lr_nr);
 
+   vgic_irq_clear_queued(vcpu, vlr.irq);
+
/*
 * We must transfer the pending state back to the distributor before
 * retiring the LR, otherwise we may loose edge-triggered interrupts.
 */
if (vlr.state & LR_STATE_PENDING) {
-   vgic_dist_irq_set_pending(vcpu, irq);
+   vgic_dist_irq_set_pending(vcpu, vlr.irq);
vlr.hwirq = 0;
}
 
@@ -1113,11 +1114,8 @@ static void vgic_retire_disabled_irqs(struct kvm_vcpu 
*vcpu)
for_each_clear_bit(lr, elrsr_ptr, vgic->nr_lr) {
struct vgic_lr vlr = vgic_get_lr(vcpu, lr);
 
-   if (!vgic_irq_is_enabled(vcpu, vlr.irq)) {
-   vgic_retire_lr(lr, vlr.irq, vcpu);
-   if (vgic_irq_is_queued(vcpu, vlr.irq))
-   vgic_irq_clear_queued(vcpu, vlr.irq);
-   }
+   if (!vgic_irq_is_enabled(vcpu, vlr.irq))
+   vgic_retire_lr(lr, vcpu);
}
 }
 
-- 
2.4.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/3] KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()

2015-10-23 Thread Pavel Fedin
vgic_set_lr() and vgic_sync_lr_elrsr() are now always used together.
Merge them into one function, which saves a second vgic_ops
dereference on every call.

Additionally, remove the unnecessary vgic_set_lr() call and
LR_STATE_PENDING check from vgic_unqueue_irqs(), because both are now
handled by the vgic_retire_lr() call that follows.

Signed-off-by: Pavel Fedin 
---
 include/kvm/arm_vgic.h |  1 -
 virt/kvm/arm/vgic-v2.c |  5 -
 virt/kvm/arm/vgic-v3.c |  5 -
 virt/kvm/arm/vgic.c| 33 -
 4 files changed, 4 insertions(+), 40 deletions(-)
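
As a rough userspace illustration of the merge described in the commit
message (not the kernel code itself): the sketch models a set_lr helper
that stores the LR and updates the empty-LR status bitmap in one step, so
callers cannot forget the second call. struct toy_vgic, toy_set_lr(),
toy_count_used(), NR_LR and the LR_STATE_MASK value are hypothetical names
and values; only the rule "no state bits set means the LR is empty" is
taken from the diff.

```c
#include <assert.h>
#include <stdint.h>

#define NR_LR          4      /* hypothetical number of list registers */
#define LR_STATE_MASK  0x3u   /* assumed: pending | active state bits */

struct toy_lr {
    unsigned irq;
    unsigned state;
};

struct toy_vgic {
    struct toy_lr lr[NR_LR];
    uint64_t elrsr;           /* bit n set => LR n is empty */
};

/* Store the LR and keep the empty-LR bitmap in sync in a single call. */
static void toy_set_lr(struct toy_vgic *v, int n, struct toy_lr val)
{
    assert(n >= 0 && n < NR_LR);
    v->lr[n] = val;
    if (!(val.state & LR_STATE_MASK))
        v->elrsr |= (1ULL << n);
    else
        v->elrsr &= ~(1ULL << n);
}

/* LRs in use are the clear bits of elrsr, which is what a
 * for_each_clear_bit() style walk would visit. */
static int toy_count_used(const struct toy_vgic *v)
{
    int n, used = 0;

    for (n = 0; n < NR_LR; n++)
        if (!(v->elrsr & (1ULL << n)))
            used++;
    return used;
}
```

Starting from elrsr = ~0 (all LRs empty), writing an LR with a pending
state clears its bit, and writing it back with state 0 sets the bit again.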

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 3936bf8..f62addc 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -112,7 +112,6 @@ struct vgic_vmcr {
 struct vgic_ops {
struct vgic_lr  (*get_lr)(const struct kvm_vcpu *, int);
void(*set_lr)(struct kvm_vcpu *, int, struct vgic_lr);
-   void(*sync_lr_elrsr)(struct kvm_vcpu *, int, struct vgic_lr);
u64 (*get_elrsr)(const struct kvm_vcpu *vcpu);
u64 (*get_eisr)(const struct kvm_vcpu *vcpu);
void(*clear_eisr)(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
index c0f5d7f..ff02f08 100644
--- a/virt/kvm/arm/vgic-v2.c
+++ b/virt/kvm/arm/vgic-v2.c
@@ -79,11 +79,7 @@ static void vgic_v2_set_lr(struct kvm_vcpu *vcpu, int lr,
lr_val |= (lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT);
 
vcpu->arch.vgic_cpu.vgic_v2.vgic_lr[lr] = lr_val;
-}
 
-static void vgic_v2_sync_lr_elrsr(struct kvm_vcpu *vcpu, int lr,
- struct vgic_lr lr_desc)
-{
if (!(lr_desc.state & LR_STATE_MASK))
vcpu->arch.vgic_cpu.vgic_v2.vgic_elrsr |= (1ULL << lr);
else
@@ -167,7 +163,6 @@ static void vgic_v2_enable(struct kvm_vcpu *vcpu)
 static const struct vgic_ops vgic_v2_ops = {
.get_lr = vgic_v2_get_lr,
.set_lr = vgic_v2_set_lr,
-   .sync_lr_elrsr  = vgic_v2_sync_lr_elrsr,
.get_elrsr  = vgic_v2_get_elrsr,
.get_eisr   = vgic_v2_get_eisr,
.clear_eisr = vgic_v2_clear_eisr,
diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
index 92003cb..487d635 100644
--- a/virt/kvm/arm/vgic-v3.c
+++ b/virt/kvm/arm/vgic-v3.c
@@ -112,11 +112,7 @@ static void vgic_v3_set_lr(struct kvm_vcpu *vcpu, int lr,
}
 
vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[LR_INDEX(lr)] = lr_val;
-}
 
-static void vgic_v3_sync_lr_elrsr(struct kvm_vcpu *vcpu, int lr,
- struct vgic_lr lr_desc)
-{
if (!(lr_desc.state & LR_STATE_MASK))
vcpu->arch.vgic_cpu.vgic_v3.vgic_elrsr |= (1U << lr);
else
@@ -212,7 +208,6 @@ static void vgic_v3_enable(struct kvm_vcpu *vcpu)
 static const struct vgic_ops vgic_v3_ops = {
.get_lr = vgic_v3_get_lr,
.set_lr = vgic_v3_set_lr,
-   .sync_lr_elrsr  = vgic_v3_sync_lr_elrsr,
.get_elrsr  = vgic_v3_get_elrsr,
.get_eisr   = vgic_v3_get_eisr,
.clear_eisr = vgic_v3_clear_eisr,
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 60d270d..218b094 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -717,28 +717,13 @@ void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
 * interrupt then move the active state to the
 * distributor tracking bit.
 */
-   if (lr.state & LR_STATE_ACTIVE) {
+   if (lr.state & LR_STATE_ACTIVE)
vgic_irq_set_active(vcpu, lr.irq);
-   lr.state &= ~LR_STATE_ACTIVE;
-   }
 
/*
 * Reestablish the pending state on the distributor and the
-* CPU interface.  It may have already been pending, but that
-* is fine, then we are only setting a few bits that were
-* already set.
-*/
-   if (lr.state & LR_STATE_PENDING) {
-   vgic_dist_irq_set_pending(vcpu, lr.irq);
-   lr.state &= ~LR_STATE_PENDING;
-   }
-
-   vgic_set_lr(vcpu, i, lr);
-
-   /*
-* Mark the LR as free for other use.
+* CPU interface and mark the LR as free for other use.
 */
-   BUG_ON(lr.state & LR_STATE_MASK);
vgic_retire_lr(i, lr.irq, vcpu);
vgic_irq_clear_queued(vcpu, lr.irq);
 
@@ -1048,12 +1033,6 @@ static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr,
vgic_ops->set_lr(vcpu, lr, vlr);
 }
 
-static void vgic_sync_lr_elrsr(struct kvm_vcpu *vcpu, int lr,
-  struct vgic_lr vlr)
-{
-   vgic_ops->sync_lr_elrsr(vcpu, lr, vlr);
-}
-
 static inline u64 vgic_get_elrsr(struct 

Re: [PATCH v3 4/7] vfio: platform: reset: calxedaxgmac: add reset function registration

2015-10-23 Thread Arnd Bergmann
On Friday 23 October 2015 14:37:12 Eric Auger wrote:
> This patch adds the reset function registration/unregistration.
> This is handled through the module_vfio_reset_handler macro. This
> latter also defines a MODULE_ALIAS which simplifies the load from
> vfio-platform.
> 
> Signed-off-by: Eric Auger 
> 
> 

Reviewed-by: Arnd Bergmann 


[PATCH v2] VFIO: platform: reset: AMD xgbe reset module

2015-10-23 Thread Eric Auger
This patch introduces a module that registers and implements a low-level
reset function for the AMD XGBE device.

It performs the following actions:
- reset the PHY
- disable auto-negotiation
- disable & clear auto-negotiation IRQ
- soft-reset the MAC

Those tiny pieces of code are inherited from the native xgbe driver.

Signed-off-by: Eric Auger 

---

Applies on top of [PATCH v3 0/7] VFIO platform reset module rework

v1 -> v2:
- uses module_vfio_reset_handler macro
---
 drivers/vfio/platform/reset/Kconfig|   7 ++
 drivers/vfio/platform/reset/Makefile   |   2 +
 .../vfio/platform/reset/vfio_platform_amdxgbe.c| 131 +
 3 files changed, 140 insertions(+)
 create mode 100644 drivers/vfio/platform/reset/vfio_platform_amdxgbe.c

diff --git a/drivers/vfio/platform/reset/Kconfig 
b/drivers/vfio/platform/reset/Kconfig
index 746b96b..ed9bb28 100644
--- a/drivers/vfio/platform/reset/Kconfig
+++ b/drivers/vfio/platform/reset/Kconfig
@@ -5,3 +5,10 @@ config VFIO_PLATFORM_CALXEDAXGMAC_RESET
  Enables the VFIO platform driver to handle reset for Calxeda xgmac
 
  If you don't know what to do here, say N.
+config VFIO_PLATFORM_AMDXGBE_RESET
+   tristate "VFIO support for AMD XGBE reset"
+   depends on VFIO_PLATFORM
+   help
+ Enables the VFIO platform driver to handle reset for AMD XGBE
+
+ If you don't know what to do here, say N.
diff --git a/drivers/vfio/platform/reset/Makefile 
b/drivers/vfio/platform/reset/Makefile
index 2a486af..93f4e23 100644
--- a/drivers/vfio/platform/reset/Makefile
+++ b/drivers/vfio/platform/reset/Makefile
@@ -1,5 +1,7 @@
 vfio-platform-calxedaxgmac-y := vfio_platform_calxedaxgmac.o
+vfio-platform-amdxgbe-y := vfio_platform_amdxgbe.o
 
 ccflags-y += -Idrivers/vfio/platform
 
 obj-$(CONFIG_VFIO_PLATFORM_CALXEDAXGMAC_RESET) += vfio-platform-calxedaxgmac.o
+obj-$(CONFIG_VFIO_PLATFORM_AMDXGBE_RESET) += vfio-platform-amdxgbe.o
diff --git a/drivers/vfio/platform/reset/vfio_platform_amdxgbe.c 
b/drivers/vfio/platform/reset/vfio_platform_amdxgbe.c
new file mode 100644
index 000..20b530a
--- /dev/null
+++ b/drivers/vfio/platform/reset/vfio_platform_amdxgbe.c
@@ -0,0 +1,131 @@
+/*
+ * VFIO platform driver specialized for AMD xgbe reset
+ * reset code is inherited from AMD xgbe native driver
+ *
+ * Copyright (c) 2015 Linaro Ltd.
+ *  www.linaro.org
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vfio_platform_private.h"
+
+#define DRIVER_VERSION  "0.1"
+#define DRIVER_AUTHOR   "Eric Auger "
+#define DRIVER_DESC "Reset support for AMD xgbe vfio platform device"
+
+#define DMA_MR 0x3000
+#define MAC_VR 0x0110
+#define DMA_ISR0x3008
+#define MAC_ISR0x00b0
+#define PCS_MMD_SELECT 0xff
+#define MDIO_AN_INT0x8002
+#define MDIO_AN_INTMASK0x8001
+
+static unsigned int xmdio_read(void *ioaddr, unsigned int mmd,
+  unsigned int reg)
+{
+   unsigned int mmd_address, value;
+
+   mmd_address = (mmd << 16) | ((reg) & 0x);
+   iowrite32(mmd_address >> 8, ioaddr + (PCS_MMD_SELECT << 2));
+   value = ioread32(ioaddr + ((mmd_address & 0xff) << 2));
+   return value;
+}
+
+static void xmdio_write(void *ioaddr, unsigned int mmd,
+   unsigned int reg, unsigned int value)
+{
+   unsigned int mmd_address;
+
+   mmd_address = (mmd << 16) | ((reg) & 0x);
+   iowrite32(mmd_address >> 8, ioaddr + (PCS_MMD_SELECT << 2));
+   iowrite32(value, ioaddr + ((mmd_address & 0xff) << 2));
+}
+
+int vfio_platform_amdxgbe_reset(struct vfio_platform_device *vdev)
+{
+   struct vfio_platform_region xgmac_regs = vdev->regions[0];
+   struct vfio_platform_region xpcs_regs = vdev->regions[1];
+   u32 dma_mr_value, pcs_value, value;
+   unsigned int count;
+
+   if (!xgmac_regs.ioaddr) {
+   xgmac_regs.ioaddr =
+   ioremap_nocache(xgmac_regs.addr, xgmac_regs.size);
+   if (!xgmac_regs.ioaddr)
+   return -ENOMEM;
+   }
+   if (!xpcs_regs.ioaddr) {
+   xpcs_regs.ioaddr =
+   ioremap_nocache(xpcs_regs.addr, xpcs_regs.size);
+ 

Re: [PATCH v2 00/12] KVM: x86: add support for VMX TSC scaling

2015-10-23 Thread Haozhong Zhang
On Fri, Oct 23, 2015 at 02:46:19PM +0200, Joerg Roedel wrote:
> On Fri, Oct 23, 2015 at 08:32:28PM +0800, Haozhong Zhang wrote:
> > No, since I don't have AMD machines at hand. The modifications to SVM
> > code are mostly lifting common code with VMX TSC scaling code, so it
> > should still work on AMD machines.
> 
> Well, I think it would be good if you can provide a Tested-by on AMD
> machines from someone who has one. Or get one yourself when changing AMD
> specific code, they are not that expensive :)
> I can do some testing when I am back from my travels, but that will not
> be before early November.
> 
>   Joerg

I'll try to get a test result. It would be much appreciated if you
could test as well.

Thanks!
Haozhong



Re: [PATCH v2] VFIO: platform: reset: AMD xgbe reset module

2015-10-23 Thread Arnd Bergmann
On Friday 23 October 2015 14:44:13 Eric Auger wrote:
> This patch introduces a module that registers and implements a low-level
> reset function for the AMD XGBE device.
> 
> it performs the following actions:
> - reset the PHY
> - disable auto-negotiation
> - disable & clear auto-negotiation IRQ
> - soft-reset the MAC
> 
> Those tiny pieces of code are inherited from the native xgbe driver.
> 
> Signed-off-by: Eric Auger 
> 
> ---

The code looks ok to me, just two small style issues.
>  
> If you don't know what to do here, say N.
> +config VFIO_PLATFORM_AMDXGBE_RESET
> + tristate "VFIO support for AMD XGBE reset"
> + depends on VFIO_PLATFORM
> + help
> +   Enables the VFIO platform driver to handle reset for AMD XGBE
> +
> +   If you don't know what to do here, say N.

Please add an empty line before the newly introduced option.

> +MODULE_VERSION(DRIVER_VERSION);
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR(DRIVER_AUTHOR);
> +MODULE_DESCRIPTION(DRIVER_DESC);

Best remove those macros and put the strings in here directly to
make it easier to grep for.

Arnd


Re: [PATCH v2 1/4] KVM: X86: Add arrays to save/restore LBR MSRs

2015-10-23 Thread kbuild test robot
Hi Jian,

[auto build test ERROR on v4.3-rc6 -- if it's inappropriate base, please 
suggest rules for selecting the more suitable base]

url:
https://github.com/0day-ci/linux/commits/Jian-Zhou/KVM-X86-Add-arrays-to-save-restore-LBR-MSRs/20151023-172601
config: x86_64-lkp (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

Note: the 
linux-review/Jian-Zhou/KVM-X86-Add-arrays-to-save-restore-LBR-MSRs/20151023-172601
 HEAD d402c03a709c1dff60e2800becbafaf3b2d86dcd builds fine.
  It only hurts bisectibility.

All errors (new ones prefixed by >>):

   In file included from include/linux/kvm_host.h:34:0,
from arch/x86/kvm/../../../virt/kvm/kvm_main.c:21:
>> arch/x86/include/asm/kvm_host.h:530:25: error: 'MAX_NUM_LBR_MSRS' undeclared 
>> here (not in a function)
  struct msr_data guest[MAX_NUM_LBR_MSRS];
^

vim +/MAX_NUM_LBR_MSRS +530 arch/x86/include/asm/kvm_host.h

   524  
   525  int lbr_status;
   526  int lbr_used;
   527  
   528  struct lbr_msr {
   529  unsigned nr;
 > 530  struct msr_data guest[MAX_NUM_LBR_MSRS];
   531  struct msr_data host[MAX_NUM_LBR_MSRS];
   532  }lbr_msr;
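
The error above boils down to an array bound that has no visible
definition at this point in the series. A minimal standalone sketch,
assuming a placeholder value of 16 for MAX_NUM_LBR_MSRS (the real value
is not shown in this patch) and using a hypothetical helper
lbr_guest_slots(), shows that the struct compiles once the constant is
defined ahead of its first use:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder value, assumed for illustration only. */
#define MAX_NUM_LBR_MSRS 16

static_assert(MAX_NUM_LBR_MSRS > 0, "array bound must be positive");

struct msr_data {
    bool host_initiated;
    uint32_t index;
    uint64_t data;
};

struct lbr_msr {
    unsigned nr;
    struct msr_data guest[MAX_NUM_LBR_MSRS];
    struct msr_data host[MAX_NUM_LBR_MSRS];
};

/* Number of guest slots, derived from the array type itself. */
static size_t lbr_guest_slots(void)
{
    return sizeof(((struct lbr_msr *)0)->guest) / sizeof(struct msr_data);
}
```

Keeping each patch buildable on its own would mean introducing the
definition no later than the patch that first uses it.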
   533  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




[PATCH v3 3/7] vfio: platform: introduce module_vfio_reset_handler macro

2015-10-23 Thread Eric Auger
The module_vfio_reset_handler macro
- defines a module alias
- implements the module init/exit functions, which respectively register
  and unregister the reset function.

Signed-off-by: Eric Auger 

---
v2 -> v3:
- use vfio_platform_register_reset macro

v1 -> v2:
- remove vfio_platform_reset_private.h and move back the macro to
  vfio_platform_private.h header: removed reset_module_register &
  unregister (symbol_get)
- defines the module_vfio_reset_handler macro as suggested by Arnd
  (formerly in vfio_platform_reset_private.h)
---
 drivers/vfio/platform/vfio_platform_private.h | 14 ++
 1 file changed, 14 insertions(+)
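
For readers unfamiliar with the pattern, here is a userspace sketch of a
macro that generates init/exit functions via token pasting, in the spirit
of the commit message. It is an illustration under stated assumptions,
not the kernel macro: toy_reset_handler, toy_register_reset(),
toy_unregister_reset() and the single-slot registry are hypothetical
stand-ins, and there is no MODULE_ALIAS, module_init or module_exit here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*reset_fn_t)(void *dev);

/* Single-slot registry standing in for the real list of handlers. */
static const char *registered_compat;
static reset_fn_t registered_reset;

static void toy_register_reset(const char *compat, reset_fn_t fn)
{
    assert(fn != NULL);
    registered_compat = compat;
    registered_reset = fn;
}

static void toy_unregister_reset(const char *compat)
{
    if (registered_compat && !strcmp(registered_compat, compat)) {
        registered_compat = NULL;
        registered_reset = NULL;
    }
}

/* Generates <reset>_module_init() and <reset>_module_exit(). */
#define toy_reset_handler(compat, reset)            \
static int reset ## _module_init(void)              \
{                                                   \
    toy_register_reset(compat, reset);              \
    return 0;                                       \
}                                                   \
static void reset ## _module_exit(void)             \
{                                                   \
    toy_unregister_reset(compat);                   \
}

/* Hypothetical reset function used only to exercise the macro. */
static int toy_xgmac_reset(void *dev)
{
    (void)dev;
    return 0;
}

toy_reset_handler("calxeda,hb-xgmac", toy_xgmac_reset)
```

A reset module then needs only one macro invocation, which is the
boilerplate reduction the patch is after.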

diff --git a/drivers/vfio/platform/vfio_platform_private.h 
b/drivers/vfio/platform/vfio_platform_private.h
index 277521a..5a1e8e6 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -110,4 +110,18 @@ static struct vfio_platform_reset_node __reset ## _node = 
{\
 }; \
 __vfio_platform_register_reset(&__reset ## _node)
 
+#define module_vfio_reset_handler(compat, reset)   \
+MODULE_ALIAS("vfio-reset:" compat);\
+static int __init reset ## _module_init(void)  \
+{  \
+   vfio_platform_register_reset(compat, reset);\
+   return 0;   \
+}; \
+static void __exit reset ## _module_exit(void) \
+{  \
+   vfio_platform_unregister_reset(compat); \
+}; \
+module_init(reset ## _module_init);\
+module_exit(reset ## _module_exit)
+
 #endif /* VFIO_PLATFORM_PRIVATE_H */
-- 
1.9.1



[PATCH v3 4/7] vfio: platform: reset: calxedaxgmac: add reset function registration

2015-10-23 Thread Eric Auger
This patch adds the reset function registration/unregistration.
This is handled through the module_vfio_reset_handler macro, which
also defines a MODULE_ALIAS that simplifies loading the module from
vfio-platform.

Signed-off-by: Eric Auger 

---
v2 -> v3:
- do not include vfio_platform_reset_private.h anymore (removed)
- remove pr_info
- rework commit message

v1 -> v2:
- uses the module_vfio_reset_handler macro
- add pr_info on vfio reset
- do not export vfio_platform_calxedaxgmac_reset symbol anymore
---
 drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c 
b/drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c
index 619dc7d..640f5d8 100644
--- a/drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c
+++ b/drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c
@@ -30,8 +30,6 @@
 #define DRIVER_AUTHOR   "Eric Auger "
 #define DRIVER_DESC "Reset support for Calxeda xgmac vfio platform device"
 
-#define CALXEDAXGMAC_COMPAT "calxeda,hb-xgmac"
-
 /* XGMAC Register definitions */
 #define XGMAC_CONTROL   0x  /* MAC Configuration */
 
@@ -78,7 +76,8 @@ int vfio_platform_calxedaxgmac_reset(struct 
vfio_platform_device *vdev)
 
return 0;
 }
-EXPORT_SYMBOL_GPL(vfio_platform_calxedaxgmac_reset);
+
+module_vfio_reset_handler("calxeda,hb-xgmac", 
vfio_platform_calxedaxgmac_reset);
 
 MODULE_VERSION(DRIVER_VERSION);
 MODULE_LICENSE("GPL v2");
-- 
1.9.1



[PATCH v2 1/4] KVM: X86: Add arrays to save/restore LBR MSRs

2015-10-23 Thread Jian Zhou
Add arrays in kvm_vcpu_arch struct to save/restore
LBR MSRs at vm exit/entry time.
Add new hooks to set/get DEBUGCTLMSR and LBR MSRs.

Signed-off-by: Jian Zhou 
Signed-off-by: Stephen He 
---
 arch/x86/include/asm/kvm_host.h | 26 --
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3a36ee7..dc2c120 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -376,6 +376,12 @@ struct kvm_vcpu_hv {
u64 hv_vapic;
 };

+struct msr_data {
+   bool host_initiated;
+   u32 index;
+   u64 data;
+};
+
 struct kvm_vcpu_arch {
/*
 * rip and regs accesses must go through
@@ -516,6 +522,15 @@ struct kvm_vcpu_arch {
unsigned long eff_db[KVM_NR_DB_REGS];
unsigned long guest_debug_dr7;

+   int lbr_status;
+   int lbr_used;
+
+   struct lbr_msr {
+   unsigned nr;
+   struct msr_data guest[MAX_NUM_LBR_MSRS];
+   struct msr_data host[MAX_NUM_LBR_MSRS];
+   }lbr_msr;
+
u64 mcg_cap;
u64 mcg_status;
u64 mcg_ctl;
@@ -728,12 +743,6 @@ struct kvm_vcpu_stat {

 struct x86_instruction_info;

-struct msr_data {
-   bool host_initiated;
-   u32 index;
-   u64 data;
-};
-
 struct kvm_lapic_irq {
u32 vector;
u16 delivery_mode;
@@ -887,6 +896,11 @@ struct kvm_x86_ops {
   gfn_t offset, unsigned long mask);
/* pmu operations of sub-arch */
const struct kvm_pmu_ops *pmu_ops;
+
+   int (*set_debugctlmsr)(struct kvm_vcpu *vcpu, u64 value);
+   u64 (*get_debugctlmsr)(void);
+   void (*set_lbr_msr)(struct kvm_vcpu *vcpu, u32 msr, u64 data);
+   u64 (*get_lbr_msr)(struct kvm_vcpu *vcpu, u32 msr);
 };

 struct kvm_arch_async_pf {
--
1.7.12.4




[PATCH v3 6/7] vfio: platform: use list of registered reset function

2015-10-23 Thread Eric Auger
Remove the static lookup table and use the dynamic list of registered
reset functions instead. Also load the reset module through its alias.
The reset struct module pointer is stored in vfio_platform_device.

We also remove the useless struct device pointer parameter in
vfio_platform_get_reset.

This patch fixes the issue related to the usage of __symbol_get, which,
besides being moot, prevented compilation with CONFIG_MODULES
disabled.

Using MODULE_ALIAS also makes it possible to add a new reset module
without needing to update the framework. This was suggested by Arnd.

Signed-off-by: Eric Auger 
Reported-by: Arnd Bergmann 

---

v2 -> v3:
- remove clear of vfio_platform_device reset_module and reset
  in vfio_platform_put_reset
- single unlock in vfio_platform_lookup_reset
- use driver_lock instead of reset_lock

v1 -> v2:
- use reset_lock in vfio_platform_lookup_reset
- remove vfio_platform_reset_combo declaration
- remove struct device *dev parameter in vfio_platform_get_reset
- set reset_module and reset to NULL in put function
---
 drivers/vfio/platform/vfio_platform_common.c  | 52 +++
 drivers/vfio/platform/vfio_platform_private.h |  7 +---
 2 files changed, 30 insertions(+), 29 deletions(-)
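
The lookup flow described in the commit message (search the registered
list, and on a miss load the module by alias and search again) can be
sketched in userspace as follows. This is a simplified model under stated
assumptions: toy_lookup(), toy_request_module() and toy_get_reset() are
hypothetical stand-ins, and locking and module reference counting are
deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef int (*reset_fn_t)(void *dev);

struct reset_node {
    const char *compat;
    reset_fn_t reset;
    struct reset_node *next;
};

static struct reset_node *reset_list;   /* head of registered handlers */

static reset_fn_t toy_lookup(const char *compat)
{
    struct reset_node *it;

    for (it = reset_list; it; it = it->next)
        if (!strcmp(it->compat, compat))
            return it->reset;
    return NULL;
}

static int toy_reset(void *dev)
{
    (void)dev;
    return 0;
}

static struct reset_node toy_node = { "calxeda,hb-xgmac", toy_reset, NULL };

/* Stand-in for request_module(): "loading" registers the handler once. */
static void toy_request_module(const char *alias)
{
    if (!strcmp(alias, "vfio-reset:calxeda,hb-xgmac") &&
        !toy_lookup(toy_node.compat)) {
        toy_node.next = reset_list;
        reset_list = &toy_node;
    }
}

/* Look up the handler; on a miss, load by alias and look up once more. */
static reset_fn_t toy_get_reset(const char *compat)
{
    char modname[256];
    reset_fn_t fn = toy_lookup(compat);

    assert(compat != NULL);
    if (!fn) {
        snprintf(modname, sizeof(modname), "vfio-reset:%s", compat);
        toy_request_module(modname);
        fn = toy_lookup(compat);
    }
    return fn;
}
```

Because the alias is derived from the compat string, adding a new reset
module does not require touching the lookup code.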

diff --git a/drivers/vfio/platform/vfio_platform_common.c 
b/drivers/vfio/platform/vfio_platform_common.c
index 50a388b..de3fb33 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -30,37 +30,43 @@
 static LIST_HEAD(reset_list);
 static DEFINE_MUTEX(driver_lock);
 
-static const struct vfio_platform_reset_combo reset_lookup_table[] = {
-   {
-   .compat = "calxeda,hb-xgmac",
-   .reset_function_name = "vfio_platform_calxedaxgmac_reset",
-   .module_name = "vfio-platform-calxedaxgmac",
-   },
-};
-
-static void vfio_platform_get_reset(struct vfio_platform_device *vdev,
-   struct device *dev)
+static vfio_platform_reset_fn_t vfio_platform_lookup_reset(const char *compat,
+   struct module **module)
 {
-   int (*reset)(struct vfio_platform_device *);
-   int i;
+   struct vfio_platform_reset_node *iter;
+   vfio_platform_reset_fn_t reset_fn = NULL;
 
-   for (i = 0 ; i < ARRAY_SIZE(reset_lookup_table); i++) {
-   if (!strcmp(reset_lookup_table[i].compat, vdev->compat)) {
-   request_module(reset_lookup_table[i].module_name);
-   reset = __symbol_get(
-   reset_lookup_table[i].reset_function_name);
-   if (reset) {
-   vdev->reset = reset;
-   return;
-   }
+   mutex_lock(&driver_lock);
+   list_for_each_entry(iter, &reset_list, link) {
+   if (!strcmp(iter->compat, compat) &&
+   try_module_get(iter->owner)) {
+   *module = iter->owner;
+   reset_fn = iter->reset;
+   break;
}
}
+   mutex_unlock(&driver_lock);
+   return reset_fn;
+}
+
+static void vfio_platform_get_reset(struct vfio_platform_device *vdev)
+{
+   char modname[256];
+
+   vdev->reset = vfio_platform_lookup_reset(vdev->compat,
+   &vdev->reset_module);
+   if (!vdev->reset) {
+   snprintf(modname, 256, "vfio-reset:%s", vdev->compat);
+   request_module(modname);
+   vdev->reset = vfio_platform_lookup_reset(vdev->compat,
+   &vdev->reset_module);
+   }
 }
 
 static void vfio_platform_put_reset(struct vfio_platform_device *vdev)
 {
if (vdev->reset)
-   symbol_put_addr(vdev->reset);
+   module_put(vdev->reset_module);
 }
 
 static int vfio_platform_regions_init(struct vfio_platform_device *vdev)
@@ -557,7 +563,7 @@ int vfio_platform_probe_common(struct vfio_platform_device 
*vdev,
return ret;
}
 
-   vfio_platform_get_reset(vdev, dev);
+   vfio_platform_get_reset(vdev);
 
mutex_init(&vdev->igate);
 
diff --git a/drivers/vfio/platform/vfio_platform_private.h 
b/drivers/vfio/platform/vfio_platform_private.h
index f8072d8..e505c15 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -58,6 +58,7 @@ struct vfio_platform_device {
struct mutexigate;
struct module   *parent_module;
const char  *compat;
+   struct module   *reset_module;
 
/*
 * These fields should be filled by the bus specific binder
@@ -81,12 +82,6 @@ struct vfio_platform_reset_node {
vfio_platform_reset_fn_t reset;
 };
 
-struct vfio_platform_reset_combo {
-   const 

Re: [PATCH v2 0/3] target-i386: save/restore vcpu's TSC rate during migration

2015-10-23 Thread Marcelo Tosatti
On Thu, Oct 22, 2015 at 04:45:21PM -0200, Eduardo Habkost wrote:
> On Tue, Oct 20, 2015 at 03:22:51PM +0800, Haozhong Zhang wrote:
> > This patchset enables QEMU to save/restore vcpu's TSC rate during the
> > migration. When cooperating with KVM which supports TSC scaling, guest
> > programs can observe a consistent guest TSC rate even though they are
> > migrated among machines with different host TSC rates.
> > 
> > A pair of cpu options 'save-tsc-freq' and 'load-tsc-freq' are added to
> > control the migration of vcpu's TSC rate.
> 
> The requirements and goals aren't clear to me. I see two possible use
> cases, here:
> 
> 1) Best effort to keep TSC frequency constant if possible (but not
>aborting migration if not possible). This would be an interesting
>default, but a bit unpredictable.
> 2) Strictly ensuring TSC frequency stays constant on migration (and
>aborting migration if not possible). This would be an useful feature,
>but can't be enabled by default unless both hosts have the same TSC
>frequency or support TSC scaling.

Only the destination needs to support TSC scaling, to match the
frequency of the incoming host.

Has the KVM code for this feature been submitted or integrated?

> Which one(s) you are trying to implement?
> 
> In other words, what is the right behavior when KVM_SET_TSC_KHZ fails or
> KVM_CAP_TSC_CONTROL is not available? We can't answer that question if
> the requirements and goals are not clear.
> 
> Once we know what exactly is the goal, we could enable the new mode with
> a single option, instead of raw options to control migration stream
> loading/saving.

Windows and Linux guests have paravirt clocks and/or options to
disable direct TSC usage for timekeeping purposes. So disabling
migration seems overkill.
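
To make the two candidate behaviours concrete, here is a small decision
sketch. It is only a model of the discussion, not QEMU code: the enum
names and toy_apply_tsc() are hypothetical, and "scaling available"
stands for whatever check (for example KVM_CAP_TSC_CONTROL plus a
successful KVM_SET_TSC_KHZ) the final design would use on the
destination.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum tsc_policy { TSC_BEST_EFFORT, TSC_STRICT };
enum tsc_result { TSC_RATE_KEPT, TSC_RATE_CHANGED, TSC_MIGRATION_ABORTED };

/*
 * Decide what happens on the destination. Only the destination's
 * ability to scale matters; the source needs no special support.
 */
static enum tsc_result toy_apply_tsc(enum tsc_policy policy,
                                     uint32_t migrated_khz,
                                     uint32_t dst_host_khz,
                                     bool dst_has_scaling)
{
    assert(migrated_khz != 0);

    /* Same rate, or the destination can scale to the migrated rate. */
    if (migrated_khz == dst_host_khz || dst_has_scaling)
        return TSC_RATE_KEPT;

    /* Rates differ and cannot be matched: the policy decides. */
    return policy == TSC_STRICT ? TSC_MIGRATION_ABORTED : TSC_RATE_CHANGED;
}
```

Under the best-effort policy a mismatch merely changes the observed rate,
which guests using paravirt clocks may tolerate; the strict policy turns
the same mismatch into a failed migration.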

> 
> 
> >  * By default, the migration of vcpu's TSC rate is enabled only on
> >pc-*-2.5 and newer machine types. If the cpu option 'save-tsc-freq'
> >is present, the vcpu's TSC rate will be migrated from older machine
> >types as well.
> >  * Another cpu option 'load-tsc-freq' controls whether the migrated
> >vcpu's TSC rate is used. By default, QEMU will not use the migrated
> >TSC rate if this option is not present. Otherwise, QEMU will use
> >the migrated TSC rate and override the TSC rate given by the cpu
> >option 'tsc-freq'.
> > 
> > Changes in v2:
> >  * Add a pair of cpu options 'save-tsc-freq' and 'load-tsc-freq' to
> >control the migration of vcpu's TSC rate.
> >  * Move all logic of setting TSC rate to target-i386.
> >  * Remove the duplicated TSC setup in kvm_arch_init_vcpu().
> > 
> > Haozhong Zhang (3):
> >   target-i386: add a subsection for migrating vcpu's TSC rate
> >   target-i386: calculate vcpu's TSC rate to be migrated
> >   target-i386: load the migrated vcpu's TSC rate
> > 
> >  include/hw/i386/pc.h  |  5 +
> >  target-i386/cpu.c |  2 ++
> >  target-i386/cpu.h |  3 +++
> >  target-i386/kvm.c | 61 
> > +++
> >  target-i386/machine.c | 19 
> >  5 files changed, 81 insertions(+), 9 deletions(-)
> > 
> > -- 
> > 2.4.8
> > 
> 
> -- 
> Eduardo


Re: [PATCH v2 00/12] KVM: x86: add support for VMX TSC scaling

2015-10-23 Thread Haozhong Zhang
On Fri, Oct 23, 2015 at 12:06:50PM +0200, Joerg Roedel wrote:
> On Tue, Oct 20, 2015 at 03:39:00PM +0800, Haozhong Zhang wrote:
> > VMX TSC scaling shares some common logics with SVM TSC ratio which
> > is already supported by KVM. Patch 1 ~ 8 move those common logics from
> > SVM code to the common code. Upon them, patch 9 ~ 12 add VMX-specific
> > support for VMX TSC scaling.
> 
> Have you tested your changes on an AMD machine too?
> 
> 
>   Joerg
> 

No, since I don't have AMD machines at hand. The modifications to SVM
code are mostly lifting common code with VMX TSC scaling code, so it
should still work on AMD machines.

Haozhong


[PATCH v2 0/4] KVM: VMX: enable LBR virtualization

2015-10-23 Thread Jian Zhou
Changelog in v2:
  (1) move the implementation into vmx.c
  (2) migration is supported
  (3) add arrays in kvm_vcpu_arch struct to save/restore
  LBR MSRs at vm exit/entry time.
  (4) add a parameter of kvm_intel module to permanently
  disable LBRV
  (5) table of supported CPUs is reorganized, LBRV
  can be enabled or not according to the guest CPUID

Jian Zhou (4):
  KVM: X86: Add arrays to save/restore LBR MSRs
  KVM: X86: LBR MSRs of supported CPU types
  KVM: X86: Migration is supported
  KVM: VMX: details of LBR virtualization implementation

 arch/x86/include/asm/kvm_host.h  |  26 -
 arch/x86/include/asm/msr-index.h |  26 -
 arch/x86/kvm/vmx.c   | 245 +++
 arch/x86/kvm/x86.c   |  88 --
 4 files changed, 366 insertions(+), 19 deletions(-)

--
1.7.12.4




[PATCH v2 4/4] KVM: VMX: details of LBR virtualization implementation

2015-10-23 Thread Jian Zhou
Use the MSR intercept bitmap and the arrays (for saving/restoring LBR
MSRs) in the kvm_vcpu_arch struct to support LBR virtualization.
Add a kvm_intel module parameter to permanently disable LBRV.
Reorganize the table of supported CPUs; LBRV can be enabled or not
according to the guest CPUID.

Signed-off-by: Jian Zhou 
Signed-off-by: Stephen He 
---
 arch/x86/kvm/vmx.c | 245 +
 1 file changed, 245 insertions(+)
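
The per-CPU tables in this patch pair a base MSR with a count and end
with a zero entry. One plausible way to consume such a table (an
assumption, since the consuming code is not quoted here) is to expand it
into a flat list of MSR indices. In this sketch the MSR numbers and
stack size are placeholders, and toy_lbr and toy_expand_lbr() are
hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct lbr_info {
    uint32_t base;
    uint8_t count;
};

/* Placeholder MSR numbers and stack size, for illustration only. */
static const struct lbr_info toy_lbr[] = {
    { 0x1000, 1 },
    { 0x2000, 4 },
    { 0x3000, 4 },
    { 0, 0 }        /* terminator entry */
};

/* Expand the table into out[]; returns the number of entries written. */
static size_t toy_expand_lbr(const struct lbr_info *tbl,
                             uint32_t *out, size_t max)
{
    size_t n = 0;

    assert(tbl != NULL && out != NULL);
    for (; tbl->count; tbl++)
        for (uint8_t i = 0; i < tbl->count && n < max; i++)
            out[n++] = tbl->base + i;
    return n;
}
```

Each resulting index could then have its intercept disabled or be added
to the save/restore arrays, depending on how the rest of the series uses
the table.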

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6a8bc64..3ab890d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -90,6 +90,9 @@ module_param(fasteoi, bool, S_IRUGO);
 static bool __read_mostly enable_apicv = 1;
 module_param(enable_apicv, bool, S_IRUGO);

+static bool __read_mostly lbrv = 1;
+module_param(lbrv, bool, S_IRUGO);
+
 static bool __read_mostly enable_shadow_vmcs = 1;
 module_param_named(enable_shadow_vmcs, enable_shadow_vmcs, bool, S_IRUGO);
 /*
@@ -4323,6 +4326,21 @@ static void vmx_disable_intercept_msr_write_x2apic(u32 msr)
msr, MSR_TYPE_W);
 }

+static void vmx_disable_intercept_guest_msr(struct kvm_vcpu *vcpu, u32 msr)
+{
+   if (irqchip_in_kernel(vcpu->kvm) &&
+   apic_x2apic_mode(vcpu->arch.apic)) {
+   vmx_disable_intercept_msr_read_x2apic(msr);
+   vmx_disable_intercept_msr_write_x2apic(msr);
+   }
+   else {
+   if (is_long_mode(vcpu))
+   vmx_disable_intercept_for_msr(msr, true);
+   else
+   vmx_disable_intercept_for_msr(msr, false);
+   }
+}
+
 static int vmx_vm_has_apicv(struct kvm *kvm)
 {
return enable_apicv && irqchip_in_kernel(kvm);
@@ -6037,6 +6055,13 @@ static __init int hardware_setup(void)
kvm_x86_ops->sync_pir_to_irr = vmx_sync_pir_to_irr_dummy;
}

+   if (!lbrv) {
+   kvm_x86_ops->set_debugctlmsr = NULL;
+   kvm_x86_ops->get_debugctlmsr = NULL;
+   kvm_x86_ops->set_lbr_msr = NULL;
+   kvm_x86_ops->get_lbr_msr = NULL;
+   }
+
vmx_disable_intercept_for_msr(MSR_FS_BASE, false);
vmx_disable_intercept_for_msr(MSR_GS_BASE, false);
vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true);
@@ -8258,6 +8283,215 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
msrs[i].host);
 }

+struct lbr_info {
+   u32 base;
+   u8 count;
+} pentium4_lbr[] = {
+   { MSR_LBR_SELECT,  1 },
+   { MSR_PENTIUM4_LER_FROM_LIP,   1 },
+   { MSR_PENTIUM4_LER_TO_LIP, 1 },
+   { MSR_PENTIUM4_LBR_TOS,1 },
+   { MSR_LBR_PENTIUM4_FROM,   SIZE_PENTIUM4_LBR_STACK },
+   { MSR_LBR_PENTIUM4_TO, SIZE_PENTIUM4_LBR_STACK },
+   { 0, 0 }
+}, core2_lbr[] = {
+   { MSR_LBR_SELECT,   1 },
+   { MSR_IA32_LASTINTFROMIP,   1 },
+   { MSR_IA32_LASTINTTOIP, 1 },
+   { MSR_LBR_TOS,  1 },
+   { MSR_LBR_CORE2_FROM,   SIZE_CORE2_LBR_STACK },
+   { MSR_LBR_CORE2_TO, SIZE_CORE2_LBR_STACK },
+   { 0, 0 }
+}, atom_lbr[] = {
+   { MSR_LBR_SELECT,   1 },
+   { MSR_IA32_LASTINTFROMIP,   1 },
+   { MSR_IA32_LASTINTTOIP, 1 },
+   { MSR_LBR_TOS,  1 },
+   { MSR_LBR_ATOM_FROM,SIZE_ATOM_LBR_STACK },
+   { MSR_LBR_ATOM_TO,  SIZE_ATOM_LBR_STACK },
+   { 0, 0 }
+}, nehalem_lbr[] = {
+   { MSR_LBR_SELECT,   1 },
+   { MSR_IA32_LASTINTFROMIP,   1 },
+   { MSR_IA32_LASTINTTOIP, 1 },
+   { MSR_LBR_TOS,  1 },
+   { MSR_LBR_NHM_FROM, SIZE_NHM_LBR_STACK },
+   { MSR_LBR_NHM_TO,   SIZE_NHM_LBR_STACK },
+   { 0, 0 }
+}, skylake_lbr[] = {
+   { MSR_LBR_SELECT,   1 },
+   { MSR_IA32_LASTINTFROMIP,   1 },
+   { MSR_IA32_LASTINTTOIP, 1 },
+   { MSR_LBR_TOS,  1 },
+   { MSR_LBR_SKYLAKE_FROM, SIZE_SKYLAKE_LBR_STACK },
+   { MSR_LBR_SKYLAKE_TO,   SIZE_SKYLAKE_LBR_STACK },
+   { 0, 0}
+};
+
+static const struct lbr_info *last_branch_msr_get(struct kvm_vcpu *vcpu)
+{
+   struct kvm_cpuid_entry2 *best = kvm_find_cpuid_entry(vcpu, 1, 0);
+   u32 eax = best->eax;
+   u8 family = (eax >> 8) & 0xf;
+   u8 model = (eax >> 4) & 0xf;
+
+   if (family == 15)
+   family += (eax >> 20) & 0xff;
+   if (family >= 6)
+   model += ((eax >> 16) & 0xf) << 4;
+
+   if (family == 6)
+   {
+   switch (model)
+   {
+   case 15: /* 65nm Core2 "Merom" */
+   case 22: /* 65nm Core2 "Merom-L" */
+   case 23: /* 45nm Core2 "Penryn" */
+   case 29: /* 45nm Core2 "Dunnington (MP) */
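
The family/model decoding at the top of last_branch_msr_get() can be tried in isolation. Below is a minimal userspace sketch, assuming the standard CPUID.01H:EAX layout; the names cpu_sig and decode_cpu_sig are made up for illustration and are not part of the patch. It uses unsigned int rather than u8 so that adding the extended family field cannot wrap.

```c
#include <stdint.h>

/* Hypothetical helper type: decoded CPUID.01H:EAX signature. */
struct cpu_sig {
    unsigned int family;
    unsigned int model;
};

/* Decode display family and model the same way the patch does. */
struct cpu_sig decode_cpu_sig(uint32_t eax)
{
    struct cpu_sig sig;

    sig.family = (eax >> 8) & 0xf;              /* base family, bits 11:8 */
    sig.model = (eax >> 4) & 0xf;               /* base model, bits 7:4 */

    if (sig.family == 15)
        sig.family += (eax >> 20) & 0xff;       /* extended family, bits 27:20 */
    if (sig.family >= 6)
        sig.model += ((eax >> 16) & 0xf) << 4;  /* extended model, bits 19:16 */

    return sig;
}
```

For instance, a signature of 0x000006fb decodes to family 6, model 15, which is the first Core2 case listed in the switch above.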
+   

[PATCH v2 2/4] KVM: X86: LBR MSRs of supported CPU types

2015-10-23 Thread Jian Zhou
Add macros for the LBR MSRs of the supported CPU types.

Signed-off-by: Jian Zhou 
Signed-off-by: Stephen He 
---
 arch/x86/include/asm/msr-index.h | 26 --
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index b98b471..2afcacd 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -68,10 +68,32 @@

 #define MSR_LBR_SELECT 0x01c8
 #define MSR_LBR_TOS0x01c9
+#define MSR_LBR_CORE_FROM  0x0040
+#define MSR_LBR_CORE_TO0x0060
+#define MAX_NUM_LBR_MSRS   128
+/* Pentium4/Xeon(based on NetBurst) LBR */
+#define MSR_PENTIUM4_LER_FROM_LIP  0x01d7
+#define MSR_PENTIUM4_LER_TO_LIP0x01d8
+#define MSR_PENTIUM4_LBR_TOS   0x01da
+#define MSR_LBR_PENTIUM4_FROM  0x0680
+#define MSR_LBR_PENTIUM4_TO0x06c0
+#define SIZE_PENTIUM4_LBR_STACK16
+/* Core2 LBR */
+#define MSR_LBR_CORE2_FROM MSR_LBR_CORE_FROM
+#define MSR_LBR_CORE2_TO   MSR_LBR_CORE_TO
+#define SIZE_CORE2_LBR_STACK   4
+/* Atom LBR */
+#define MSR_LBR_ATOM_FROM  MSR_LBR_CORE_FROM
+#define MSR_LBR_ATOM_TOMSR_LBR_CORE_TO
+#define SIZE_ATOM_LBR_STACK8
+/* Nehalem LBR */
 #define MSR_LBR_NHM_FROM   0x0680
 #define MSR_LBR_NHM_TO 0x06c0
-#define MSR_LBR_CORE_FROM  0x0040
-#define MSR_LBR_CORE_TO0x0060
+#define SIZE_NHM_LBR_STACK 16
+/* Skylake LBR */
+#define MSR_LBR_SKYLAKE_FROM   MSR_LBR_NHM_FROM
+#define MSR_LBR_SKYLAKE_TO MSR_LBR_NHM_TO
+#define SIZE_SKYLAKE_LBR_STACK 32

 #define MSR_LBR_INFO_0 0x0dc0 /* ... 0xddf for _31 */
 #define LBR_INFO_MISPRED   BIT_ULL(63)
--
1.7.12.4




[PATCH v3 2/7] vfio: platform: add capability to register a reset function

2015-10-23 Thread Eric Auger
In preparation for subsequent changes in reset function lookup,
let's introduce a dynamic list of reset combos (compat string,
reset module, reset function). The list can be populated/voided with
two new functions, vfio_platform_register/unregister_reset. Those are
not yet used in this patch.

Signed-off-by: Eric Auger 

---

v2 -> v3:
- use goto out to have a single mutex_unlock
- implement vfio_platform_register_reset as a macro (suggested by Arnd)
- move reset_node struct declaration back to vfio_platform_private.h
- vfio_platform_unregister_reset does not return any value anymore

v1 -> v2:
- reset_list becomes static
- vfio_platform_register/unregister_reset take a const char * as compat
- fix node leak
- add reset_lock to protect the reset list manipulation
- move vfio_platform_reset_node declaration in vfio_platform_common.c
---
 drivers/vfio/platform/vfio_platform_common.c  | 26 ++
 drivers/vfio/platform/vfio_platform_private.h | 20 
 2 files changed, 46 insertions(+)

diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c
index 184e9d2..8eccd30 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -27,6 +27,7 @@
 #define DRIVER_AUTHOR   "Antonios Motakis "
 #define DRIVER_DESC "VFIO platform base module"
 
+static LIST_HEAD(reset_list);
 static DEFINE_MUTEX(driver_lock);
 
 static const struct vfio_platform_reset_combo reset_lookup_table[] = {
@@ -578,6 +579,31 @@ struct vfio_platform_device *vfio_platform_remove_common(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(vfio_platform_remove_common);
 
+int __vfio_platform_register_reset(struct vfio_platform_reset_node *node)
+{
+   mutex_lock(&driver_lock);
+   list_add(&node->link, &reset_list);
+   mutex_unlock(&driver_lock);
+   return 0;
+}
+EXPORT_SYMBOL_GPL(__vfio_platform_register_reset);
+
+void vfio_platform_unregister_reset(const char *compat)
+{
+   struct vfio_platform_reset_node *iter, *temp;
+
+   mutex_lock(&driver_lock);
+   list_for_each_entry_safe(iter, temp, &reset_list, link) {
+   if (!strcmp(iter->compat, compat)) {
+   list_del(&iter->link);
+   break;
+   }
+   }
+
+   mutex_unlock(&driver_lock);
+}
+EXPORT_SYMBOL_GPL(vfio_platform_unregister_reset);
+
 MODULE_VERSION(DRIVER_VERSION);
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR(DRIVER_AUTHOR);
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index 7128690..277521a 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -71,6 +71,15 @@ struct vfio_platform_device {
int (*reset)(struct vfio_platform_device *vdev);
 };
 
+typedef int (*vfio_platform_reset_fn_t)(struct vfio_platform_device *vdev);
+
+struct vfio_platform_reset_node {
+   struct list_head link;
+   char *compat;
+   struct module *owner;
+   vfio_platform_reset_fn_t reset;
+};
+
 struct vfio_platform_reset_combo {
const char *compat;
const char *reset_function_name;
@@ -90,4 +99,15 @@ extern int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
unsigned start, unsigned count,
void *data);
 
+extern int __vfio_platform_register_reset(struct vfio_platform_reset_node *n);
+extern void vfio_platform_unregister_reset(const char *compat);
+
+#define vfio_platform_register_reset(__compat, __reset)\
+static struct vfio_platform_reset_node __reset ## _node = {\
+   .owner = THIS_MODULE,   \
+   .compat = __compat, \
+   .reset = __reset,   \
+}; \
+__vfio_platform_register_reset(&__reset ## _node)
+
 #endif /* VFIO_PLATFORM_PRIVATE_H */
-- 
1.9.1
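
The register/unregister pattern in this patch can be illustrated outside the kernel. What follows is only a userspace sketch under stated assumptions: a singly linked list stands in for struct list_head, the mutex the patch takes around list manipulation is left out, and every name (demo_device, reset_node, register_reset, unregister_reset, lookup_reset, demo_reset, demo_node, the compat string) is hypothetical rather than taken from the series.

```c
#include <stddef.h>
#include <string.h>

struct demo_device;     /* stand-in for struct vfio_platform_device */

typedef int (*reset_fn_t)(struct demo_device *dev);

struct reset_node {
    struct reset_node *next;    /* the patch embeds a struct list_head instead */
    const char *compat;
    reset_fn_t reset;
};

/* Head of the registry; the kernel code protects its list with a mutex. */
static struct reset_node *reset_list;

/* Add a statically allocated node at the head of the list. */
void register_reset(struct reset_node *node)
{
    node->next = reset_list;
    reset_list = node;
}

/* Unlink the first node whose compat string matches, if any. */
void unregister_reset(const char *compat)
{
    struct reset_node **pp;

    for (pp = &reset_list; *pp; pp = &(*pp)->next) {
        if (!strcmp((*pp)->compat, compat)) {
            *pp = (*pp)->next;
            break;
        }
    }
}

/* Return the reset callback registered for compat, or NULL. */
reset_fn_t lookup_reset(const char *compat)
{
    const struct reset_node *n;

    for (n = reset_list; n; n = n->next)
        if (!strcmp(n->compat, compat))
            return n->reset;
    return NULL;
}

/* Example callback and its static node, as a reset module might define them. */
int demo_reset(struct demo_device *dev)
{
    (void)dev;
    return 0;
}

struct reset_node demo_node = {
    .compat = "vendor,demo-device",     /* made-up compat string */
    .reset = demo_reset,
};
```

A reset module would call register_reset(&demo_node) at load time and unregister_reset("vendor,demo-device") at unload time; because the node is static, nothing has to be freed on removal.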



[PATCH v3 0/7] VFIO platform reset module rework

2015-10-23 Thread Eric Auger
This series fixes the current implementation by getting rid of the
usage of __symbol_get, which caused a compilation issue with
CONFIG_MODULES disabled. On top of this, the usage of MODULE_ALIAS makes
it possible to add a new reset module without being obliged to update the
framework. The new implementation relies on the reset module registering
its reset function with the vfio-platform driver.

The series is available at

https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.3-rc6-rework-v3

Best Regards

Eric

v2 -> v3:
- use driver_mutex instead of reset_mutex
- style fixes: single mutex_unlock
- use static nodes; vfio_platform_register_reset now is a macro
- vfio_platform_reset_private.h removed since reset_module_(un)register
  disappear. No use of symbol_get anymore
- new patch introducing vfio-platform-base
- reset look-up moved back at vfio-platform probe time
- new patch featuring dev_info/dev_warn

v1 -> v2:
* in vfio_platform_common.c:
  - move reset lookup at load time and put reset at release: this is to
prevent a race between the 2 load module loads
  - reset_list becomes static
  - vfio_platform_register/unregister_reset take a const char * as compat
  - fix node link
  - remove old combo struct and cleanup proto of vfio_platform_get_reset
  - add mutex to protect the reset list
* in calxeda xgmac reset module
  - introduce vfio_platform_reset_private.h
  - use module_vfio_reset_handler macro
  - do not export vfio_platform_calxedaxgmac_reset symbol anymore
  - add a pr_info to show the device is reset by vfio reset module


Eric Auger (7):
  vfio: platform: introduce vfio-platform-base module
  vfio: platform: add capability to register a reset function
  vfio: platform: introduce module_vfio_reset_handler macro
  vfio: platform: reset: calxedaxgmac: add reset function registration
  vfio: platform: add compat in vfio_platform_device
  vfio: platform: use list of registered reset function
  vfio: platform: add dev_info on device reset

 drivers/vfio/platform/Makefile |   6 +-
 .../platform/reset/vfio_platform_calxedaxgmac.c|   5 +-
 drivers/vfio/platform/vfio_amba.c  |   1 +
 drivers/vfio/platform/vfio_platform.c  |   1 +
 drivers/vfio/platform/vfio_platform_common.c   | 118 +++--
 drivers/vfio/platform/vfio_platform_private.h  |  40 ++-
 6 files changed, 129 insertions(+), 42 deletions(-)

-- 
1.9.1



Re: [PATCH v4 17/33] dimm: abstract dimm device from pc-dimm

2015-10-23 Thread Bharata B Rao
On Mon, Oct 19, 2015 at 6:24 AM, Xiao Guangrong
 wrote:
> A base device, dimm, is abstracted from pc-dimm, so that we can
> build nvdimm device based on dimm in the later patch
>
> Signed-off-by: Xiao Guangrong 
> ---
>  default-configs/i386-softmmu.mak   |  1 +
>  default-configs/x86_64-softmmu.mak |  1 +
>  hw/mem/Makefile.objs   |  3 ++-
>  hw/mem/dimm.c  | 11 ++---
>  hw/mem/pc-dimm.c   | 46 
> ++
>  include/hw/mem/dimm.h  |  4 ++--
>  include/hw/mem/pc-dimm.h   |  7 ++
>  7 files changed, 61 insertions(+), 12 deletions(-)
>  create mode 100644 hw/mem/pc-dimm.c
>  create mode 100644 include/hw/mem/pc-dimm.h
>
> diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
> index 43c96d1..3ece8bb 100644
> --- a/default-configs/i386-softmmu.mak
> +++ b/default-configs/i386-softmmu.mak
> @@ -18,6 +18,7 @@ CONFIG_FDC=y
>  CONFIG_ACPI=y
>  CONFIG_ACPI_X86=y
>  CONFIG_ACPI_X86_ICH=y
> +CONFIG_DIMM=y
>  CONFIG_ACPI_MEMORY_HOTPLUG=y
>  CONFIG_ACPI_CPU_HOTPLUG=y
>  CONFIG_APM=y
> diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
> index dfb8095..92ea7c1 100644
> --- a/default-configs/x86_64-softmmu.mak
> +++ b/default-configs/x86_64-softmmu.mak
> @@ -18,6 +18,7 @@ CONFIG_FDC=y
>  CONFIG_ACPI=y
>  CONFIG_ACPI_X86=y
>  CONFIG_ACPI_X86_ICH=y
> +CONFIG_DIMM=y

Same change needs to be done in default-configs/ppc64-softmmu.mak too.

Regards,
Bharata.


Re: [RFC Patch 00/12] IXGBE: Add live migration support for SRIOV NIC

2015-10-23 Thread Alexander Duyck

On 10/23/2015 12:05 PM, Alex Williamson wrote:

On Fri, 2015-10-23 at 11:36 -0700, Alexander Duyck wrote:

On 10/21/2015 09:37 AM, Lan Tianyu wrote:

This patchset is to propose a new solution to add live migration support for 
82599
SRIOV network card.

Im our solution, we prefer to put all device specific operation into VF and
PF driver and make code in the Qemu more general.


VF status migration
=
VF status can be divided into 4 parts
1) PCI configure regs
2) MSIX configure
3) VF status in the PF driver
4) VF MMIO regs

The first three status are all handled by Qemu.
The PCI configure space regs and MSIX configure are originally
stored in Qemu. To save and restore "VF status in the PF driver"
by Qemu during migration, adds new sysfs node "state_in_pf" under
VF sysfs directory.

For VF MMIO regs, we introduce self emulation layer in the VF
driver to record MMIO reg values during reading or writing MMIO
and put these data in the guest memory. It will be migrated with
guest memory to new machine.


VF function restoration

Restoring VF function operation are done in the VF and PF driver.

In order to let VF driver to know migration status, Qemu fakes VF
PCI configure regs to indicate migration status and add new sysfs
node "notify_vf" to trigger VF mailbox irq in order to notify VF
about migration status change.

Transmit/Receive descriptor head regs are read-only and can't
be restored via writing back recording reg value directly and they
are set to 0 during VF reset. To reuse original tx/rx rings, shift
desc ring in order to move the desc pointed by original head reg to
first entry of the ring and then enable tx/rx rings. VF restarts to
receive and transmit from original head desc.


Tracking DMA accessed memory
=
Migration relies on tracking dirty page to migrate memory.
Hardware can't automatically mark a page as dirty after DMA
memory access. VF descriptor rings and data buffers are modified
by hardware when receive and transmit data. To track such dirty memory
manually, do dummy writes(read a byte and write it back) when receive
and transmit data.


I was thinking about it and I am pretty sure the dummy write approach is
problematic at best.  Specifically the issue is that while you are
performing a dummy write you risk pulling in descriptors for data that
hasn't been dummy written to yet.  So when you resume and restore your
descriptor rings you may have Rx descriptors indicating they contain
data when, after the migration, they don't.

I really think the best approach to take would be to look at
implementing an emulated IOMMU so that you could track DMA mapped pages
and avoid migrating the ones marked as DMA_FROM_DEVICE until they are
unmapped.  The advantage to this is that in the case of the ixgbevf
driver it now reuses the same pages for Rx DMA.  As a result it will be
rewriting the same pages often and if you are marking those pages as
dirty and transitioning them it is possible for a flow of small packets
to really make a mess of things since you would be rewriting the same
pages in a loop while the device is processing packets.


I'd be concerned that an emulated IOMMU on the DMA path would reduce
throughput to the point where we shouldn't even bother with assigning
the device in the first place and should be using virtio-net instead.
POWER systems have a guest visible IOMMU and it's been challenging for
them to get to 10Gbps, requiring real-mode tricks.  virtio-net may add
some latency, but it's not that hard to get it to 10Gbps and it already
supports migration.  An emulated IOMMU in the guest is really only good
for relatively static mappings, the latency for anything else is likely
too high.  Maybe there are shadow page table tricks that could help, but
it's imposing overhead the whole time the guest is running, not only on
migration.  Thanks,



The big overhead I have seen with IOMMU implementations is the fact that
they almost always have some sort of locked table or tree that prevents
multiple CPUs from accessing resources in any kind of timely fashion.
As a result things like Tx are usually slowed down for network workloads
when multiple CPUs are enabled.


I admit doing a guest visible IOMMU would probably add some overhead, 
but this current patch set as implemented already has some of the hints 
of that as the descriptor rings are locked which means we cannot unmap 
in the Tx clean-up while we are mapping on another Tx queue for instance.


One approach for this would be to implement or extend a lightweight DMA 
API such as swiotlb or nommu.  The code would need to have a bit in 
there so it can take care of marking the pages as dirty on sync_for_cpu 
and unmap calls when set for BIDIRECTIONAL or FROM_DEVICE.  Then if we 
could somehow have some mechanism for the 

Re: Steal time accounting in KVM. Benchmark.

2015-10-23 Thread Alexey Makhalov
Here is what I figured out.
The problem happens at the intersection of 3 features:
*irq time accounting
*stolen time accounting
*linux guest with tickless idle only (not fully tickless)

It looks like a timer interrupt storm is happening during this benchmark
(with 2:1 cpu overcommit). irq time accounting goes crazy. Even 'top'
shows weird statistics: 50% hi, 50% st, ~0% user, and the spinning processes
use ~0% cpu, which is not correct.

Thanks.


On Tue, Oct 20, 2015 at 5:24 PM, Alexey Makhalov  wrote:
> Yes, VM1 results are as before.
>
> Alexey
>
> On Tue, Oct 20, 2015 at 4:04 PM, Wanpeng Li  wrote:
>> On 10/21/15 4:05 AM, Alexey Makhalov wrote:
>>>
>>> 'echo NO_NONTASK_CAPACITY > /sys/kernel/debug/sched_features'  in both
>>> guests.
>>> Results:
>>> VM1: STA is disabled -- no changes, still little bit bellow expected 90%
>>> VM2: STA is enabled -- result is changed, but still bad. Hard to say
>>> better or worse. It prefers to stuck at quarters (100% 75% 50% 25%)
>>> Output is attached.
>>
>>
>> If the output in attachment is for VM2 only?
>>
>> Regards,
>> Wanpeng Li


Re: [PATCH v2 00/12] KVM: x86: add support for VMX TSC scaling

2015-10-23 Thread Haozhong Zhang
On Fri, Oct 23, 2015 at 02:51:06PM +0200, Paolo Bonzini wrote:
> 
> 
> On 23/10/2015 14:46, Joerg Roedel wrote:
> >> > No, since I don't have AMD machines at hand. The modifications to SVM
> >> > code are mostly lifting common code with VMX TSC scaling code, so it
> >> > should still work on AMD machines.
> > Well, I think it would be good if you can provide a Tested-by on AMD
> > machines from someone who has one. Or get one yourself when changing AMD
> > specific code, they are not that expensive :)
> > I can do some testing when I am back from my travels, but that will not
> > be before early November.
> 
> I have one now (mine, not just Red Hat's). :D
> 
> Paolo

Hi Paolo,

I just posted the test instructions. It would be much appreciated if
you could help test this patchset on AMD machines (two are required).

Thanks,
Haozhong



Re: [PATCH v3 6/7] vfio: platform: use list of registered reset function

2015-10-23 Thread Eric Auger
On 10/23/2015 04:23 PM, Arnd Bergmann wrote:
> On Friday 23 October 2015 16:11:08 Eric Auger wrote:
>> Hi Arnd,
>> On 10/23/2015 03:12 PM, Arnd Bergmann wrote:
>>> On Friday 23 October 2015 14:37:14 Eric Auger wrote:
>>>> Remove the static lookup table and use the dynamic list of registered
>>>> reset functions instead. Also load the reset module through its alias.
>>>> The reset struct module pointer is stored in vfio_platform_device.
>>>>
>>>> We also remove the useless struct device pointer parameter in
>>>> vfio_platform_get_reset.
>>>>
>>>> This patch fixes the issue related to the usage of __symbol_get, which
>>>> besides from being moot, prevented compilation with CONFIG_MODULES
>>>> disabled.
>>>>
>>>> Also usage of MODULE_ALIAS makes possible to add a new reset module
>>>> without needing to update the framework. This was suggested by Arnd.
>>>>
>>>> Signed-off-by: Eric Auger 
>>>> Reported-by: Arnd Bergmann 
>>>>
>>>>
>>> Reviewed-by: Arnd Bergmann 
>>>
>>> but doesn't this need to come before patch 4/7?
>> Well I don't think so. In [4] we introduce the dynamic registration
>> method but until this patch we still use the old lookup method in the
>> static table. I tested and the reset lookup still works in [4].
>> If we put this one before the registration, the functionality will be
>> lost here.
>>
> 
> Ok, I see. I was getting confused by the removal of the EXPORT_SYMBOL
> statement there and thought it would break the __get_symbol call.

Hum no actually you're right. I checked the reset module was loaded but
effectively the __symbol_get call fails. So if I want to keep the functionality
all along the series I need to remove the EXPORT_SYMBOL when I swap the
lookup method.

Thanks

Eric
> 
>   Arnd


Re: [PATCH v2 00/12] KVM: x86: add support for VMX TSC scaling

2015-10-23 Thread Haozhong Zhang
Following is how I test this patchset. It should also apply to AMD
machines by replacing Intel with AMD and VMX TSC scaling with SVM TSC
ratio.

* Hardware Requirements
  1) Two machines with Intel CPUs, called M_A and M_B below.
  2) TSC frequency of CPUs on M_A is different from CPUs on M_B.
 Suppose TSC frequency on M_A is f_a KHz.
  3) At least CPUs on M_B support VMX TSC scaling.

* Software Requirements
  1) Apply this patchset to KVM on both machines.
  2) Apply QEMU patches[1] to QEMU commit 40fe17b on both machines

* Test Process
  1) Start a linux guest on M_A
     qemu-system-x86_64 -enable-kvm -smp 4 -cpu qemu64 -m 512 -hda linux.img
   
  2) In guest linux, check the TSC frequency detected by Linux kernel.
     e.g. search in dmesg for messages like
   "tsc: Detected XYZ.ABC MHz processor" or
   "tsc: Refined TSC clocksource calibration: XYZ.ABC MHz"
  
  3) Start QEMU waiting for migration on M_B:
     qemu-system-x86_64 -enable-kvm -smp 4 -cpu qemu64,load-tsc-freq -m 512 -hda linux.img -incoming tcp:0:1234
   
  4) Migrate above VM to M_B as normal in QEMU monitor:
   migrate tcp::1234
   
  5) After the migration, if VMX TSC scaling and this patchset work on
 M_B, no messages like
   "Clocksource tsc unstable (delta = x ns)"
 should appear in dmesg of guest linux

  6) Furthermore, users can also check whether guest TSC after the
 migration increases in the same rate as before by running the
 attached program test_tsc in VM:
   ./test_tsc N f_a
     It measures the number of TSC ticks passed in N seconds, and
     divides it by the expected TSC frequency f_a to get the output
     result. If this patchset works, the output should be very close
     to N.
  
[1] http://www.spinics.net/lists/kvm/msg122421.html

Thanks,
Haozhong
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Read the time stamp counter; lfence keeps rdtsc from executing early. */
static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    asm volatile("lfence; rdtsc" : "=a" (lo), "=d" (hi));
    return (uint64_t)hi << 32 | lo;
}

int main(int argc, char **argv)
{
    uint64_t tsc0, tsc1;
    int ns, tsc_khz;
    double delta;

    /* Both arguments are required, since argv[2] is read below. */
    if (argc < 3) {
        printf("Usage: %s <N> <tsc_khz>\n", argv[0]);
        return -1;
    }

    if ((ns = atoi(argv[1])) <= 0)
        return -1;
    if ((tsc_khz = atoi(argv[2])) <= 0)
        return -1;

    tsc0 = rdtsc();
    sleep(ns);
    tsc1 = rdtsc();

    /* Ticks divided by ticks-per-second gives the elapsed seconds. */
    delta = tsc1 - tsc0;
    printf("Passed %lf s\n", delta / (tsc_khz * 1000.0));

    return 0;
}


Re: [PATCH v2 0/3] target-i386: save/restore vcpu's TSC rate during migration

2015-10-23 Thread Eduardo Habkost
On Fri, Oct 23, 2015 at 10:27:27AM +0800, Haozhong Zhang wrote:
> On Thu, Oct 22, 2015 at 04:45:21PM -0200, Eduardo Habkost wrote:
> > On Tue, Oct 20, 2015 at 03:22:51PM +0800, Haozhong Zhang wrote:
> > > This patchset enables QEMU to save/restore vcpu's TSC rate during the
> > > migration. When cooperating with KVM which supports TSC scaling, guest
> > > programs can observe a consistent guest TSC rate even though they are
> > > migrated among machines with different host TSC rates.
> > > 
> > > A pair of cpu options 'save-tsc-freq' and 'load-tsc-freq' are added to
> > > control the migration of vcpu's TSC rate.
> > 
> > The requirements and goals aren't clear to me. I see two possible use
> > cases, here:
> > 
> > 1) Best effort to keep TSC frequency constant if possible (but not
> >aborting migration if not possible). This would be an interesting
> >default, but a bit unpredictable.
> > 2) Strictly ensuring TSC frequency stays constant on migration (and
> >aborting migration if not possible). This would be an useful feature,
> >but can't be enabled by default unless both hosts have the same TSC
> >frequency or support TSC scaling.
> > 
> > Which one(s) you are trying to implement?
> >
> 
> The former. I agree that it's unpredictable if setting vcpu's TSC
> frequency to the migrated value is enabled by default (but not in this
> patchset). The cpu option 'load-tsc-freq' is introduced to allow users
> to enable this behavior if they do know the underlying KVM and CPU
> support TSC scaling. In this way, I think the behavior is predictable
> as users do know what they are doing.

I'm confused. If load-tsc-freq doesn't abort when TSC scaling isn't
available (use case #1), why isn't it enabled by default? On the other
hand, if you expect the user to enable it only if the host supports TSC
scaling, why doesn't it abort if TSC scaling isn't available?

I mean, we can implement both use cases above this way:

1) If the user didn't ask for anything explicitly:
   * If the tsc-freq value is available in the migration stream, try to
     set it (but don't abort if it can't be set). (use case #1 above)
     * Rationale: it won't hurt to try to make the VM behave nicely if
       possible, without blocking migration if TSC scaling isn't
       available.
2) If the user asked for the TSC frequency to be enforced, set it and
   abort if it couldn't be set (use case #2 above). This could apply to
   both cases:
   2.1) If tsc-freq is explicitly set in the command-line.
        * Rationale: if the user asked for a specific frequency, we
          should do what was requested and not ignore errors silently.
   2.2) If tsc-freq is available in the migration stream, and the
        user asked explicitly for it to be enforced.
        * Rationale: the user is telling us that the incoming tsc-freq
          is important, so we shouldn't ignore it silently.
        * Open question: how should we name the new option?
          "load-tsc-freq" would be misleading because it won't be just about
          _loading_ tsc-freq (we would be loading it on use case #1, too),
          but about making sure it is enforced. "strict-tsc-freq"?
          "enforce-tsc-freq"?
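
The two behaviours in the list above reduce to a small decision function. This is only a sketch of the proposal as worded here, not QEMU code: the enum, the function name tsc_freq_policy and its parameters are hypothetical, and "rate_set" simply means the requested TSC rate could actually be applied on the destination.

```c
#include <stdbool.h>

enum tsc_outcome {
    TSC_OUTCOME_CONTINUE,   /* carry on, with or without the requested rate */
    TSC_OUTCOME_ABORT       /* refuse to start the VM or finish the migration */
};

/*
 * have_freq: a TSC frequency was given (command line or migration stream)
 * enforce:   the user explicitly asked for it to be enforced (case 2)
 * rate_set:  the frequency could actually be applied on this host
 */
enum tsc_outcome tsc_freq_policy(bool have_freq, bool enforce, bool rate_set)
{
    if (!have_freq)
        return TSC_OUTCOME_CONTINUE;    /* nothing was requested */
    if (rate_set)
        return TSC_OUTCOME_CONTINUE;    /* request satisfied */

    /* Best effort (case 1) tolerates the failure; enforcement (case 2) does not. */
    return enforce ? TSC_OUTCOME_ABORT : TSC_OUTCOME_CONTINUE;
}
```

With this shape the only input that distinguishes the two use cases is the enforce flag, which is why a single option would be enough to select between them.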

We don't need to implement both #1 and #2 at the same time. But if you
just want to implement #1 first, I don't see the need for the
"load-tsc-freq" option.

On the migration source, we need another option or internal machine flag
for #1. I am not sure it should be an user-visible option. If
user-visible, I don't know how to name it. "save-tsc-freq" describes it
correctly, but it doesn't make its purpose very clear. Any suggestions?
It can also be implemented first as an internal machine class flag (set
in pc >= 2.5 only), and possibly become a user-visible option later.

> 
> > In other words, what is the right behavior when KVM_SET_TSC_KHZ fails or
> > KVM_CAP_TSC_CONTROL is not available? We can't answer that question if
> > the requirements and goals are not clear.
> >
> 
> If KVM_CAP_TSC_CONTROL is unavailable, QEMU and KVM will use the host
> TSC frequency as vcpu's TSC frequency.
> 
> If KVM_CAP_TSC_CONTROL is available and KVM_SET_TSC_KHZ fails, the
> setting of TSC frequency will fail and abort either the VM creation
> (this is the case for cpu option 'tsc-freq') or the migration.

I don't see why the lack of KVM_CAP_TSC_CONTROL and failure of
KVM_SET_TSC_KHZ should be treated differently. In both cases it means
the TSC frequency can't be set.

I mean: if KVM_SET_TSC_KHZ is important enough for the user to make QEMU
abort, it should abort if KVM_CAP_TSC_CONTROL isn't even available. On
the other hand, if the user doesn't care about the lack of
KVM_CAP_TSC_CONTROL (meaning it isn't possible to call KVM_SET_TSC_KHZ
at all), I don't see why they would care if KVM_SET_TSC_KHZ failed.


> 
> > Once we know what exactly is the goal, we could enable the new mode with
> > a single option, instead of raw options to control migration stream
> > loading/saving.
> >
> 
> Saving vcpu's TSC 

Re: [PATCH v3 6/7] vfio: platform: use list of registered reset function

2015-10-23 Thread Arnd Bergmann
On Friday 23 October 2015 16:11:08 Eric Auger wrote:
> Hi Arnd,
> On 10/23/2015 03:12 PM, Arnd Bergmann wrote:
> > On Friday 23 October 2015 14:37:14 Eric Auger wrote:
> >> Remove the static lookup table and use the dynamic list of registered
> >> reset functions instead. Also load the reset module through its alias.
> >> The reset struct module pointer is stored in vfio_platform_device.
> >>
> >> We also remove the useless struct device pointer parameter in
> >> vfio_platform_get_reset.
> >>
> >> This patch fixes the issue related to the usage of __symbol_get, which
> >> besides from being moot, prevented compilation with CONFIG_MODULES
> >> disabled.
> >>
> >> Also usage of MODULE_ALIAS makes possible to add a new reset module
> >> without needing to update the framework. This was suggested by Arnd.
> >>
> >> Signed-off-by: Eric Auger 
> >> Reported-by: Arnd Bergmann 
> >>
> >>
> > Reviewed-by: Arnd Bergmann 
> > 
> > but doesn't this need to come before patch 4/7?
> Well I don't think so. In [4] we introduce the dynamic registration
> method but until this patch we still use the old lookup method in the
> static table. I tested and the reset lookup still works in [4].
> If we put this one before the registration, the functionality will be
> lost here.
> 

Ok, I see. I was getting confused by the removal of the EXPORT_SYMBOL
statement there and thought it would break the __get_symbol call.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/3] target-i386: load the migrated vcpu's TSC rate

2015-10-23 Thread Eduardo Habkost
On Fri, Oct 23, 2015 at 11:14:48AM +0800, Haozhong Zhang wrote:
> On Thu, Oct 22, 2015 at 04:11:37PM -0200, Eduardo Habkost wrote:
> > On Tue, Oct 20, 2015 at 03:22:54PM +0800, Haozhong Zhang wrote:
> > > Set vcpu's TSC rate to the migrated value (if any). If KVM supports TSC
> > > scaling, guest programs will observe TSC increasing at the migrated rate
> > > rather than the host TSC rate.
> > > 
> > > The loading is controlled by a new cpu option 'load-tsc-freq'. If it is
> > > present, then the loading will be enabled and the migrated vcpu's TSC
> > > rate will override the value specified by the cpu option
> > > 'tsc-freq'. Otherwise, the loading will be disabled.
> > 
> > Why do we need an option? Why can't we enable loading unconditionally?
> >
> 
> If TSC scaling is not supported by KVM and CPU, enabling this loading
> unconditionally will have no effect, which would differ from users'
> expectations. 'load-tsc-freq' is introduced to allow users to enable
> the loading of the migrated TSC frequency if they know that the
> underlying KVM and CPU have TSC scaling support.
> 

I don't get your argument about user expectations. We can't read the
user's mind, but let's enumerate all possible scenarios:

* Host has TSC scaling, user expects TSC frequency to be set:
  * We set it. The user is happy.
* Host has TSC scaling, user doesn't expect TSC frequency to be
  set:
  * We still set it. VM behaves better, guest doesn't see changing TSC
    frequency. User didn't expect it but won't be unhappy.
* No TSC scaling, user expects TSC frequency to be set:
  * We won't set it, user will be unhappy. But I believe we all agree
    we shouldn't make QEMU abort migration by default on all hosts that
    don't support TSC scaling.
* No TSC scaling, user doesn't expect TSC frequency to be set:
  * We don't set it. User is happy.
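
The four scenarios above can be written down as a tiny truth table. This is a sketch only: `load_migrated_tsc` and `struct outcome` are hypothetical names used for illustration, and "happy" simply encodes the judgment made in the list, not anything QEMU computes.

```c
#include <stdbool.h>

/* Result of loading the migrated TSC rate unconditionally (no option). */
struct outcome {
    bool tsc_set;     /* was the migrated rate applied? */
    bool user_happy;  /* per the scenario list above */
};

static struct outcome load_migrated_tsc(bool host_has_scaling,
                                        bool user_expects_set)
{
    struct outcome o;

    /* Unconditional policy: apply the rate whenever the host can. */
    o.tsc_set = host_has_scaling;
    /* The only unhappy case: the user wanted it but the host cannot do it. */
    o.user_happy = host_has_scaling || !user_expects_set;
    return o;
}
```

Only one of the four input combinations yields an unhappy user, and an extra option would not change that case, since the host still could not scale the TSC.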

Could you clarify exactly which of the items above you disagree with?

-- 
Eduardo


Re: [PATCH net-next RFC 1/2] vhost: introduce vhost_has_work()

2015-10-23 Thread Jason Wang


On 10/22/2015 04:38 PM, Michael S. Tsirkin wrote:
> On Thu, Oct 22, 2015 at 01:27:28AM -0400, Jason Wang wrote:
>> > This patch introduces a helper that gives a hint about whether any
>> > work is queued in the work list.
>> > 
>> > Signed-off-by: Jason Wang 
>> > ---
>> >  drivers/vhost/vhost.c | 6 ++
>> >  drivers/vhost/vhost.h | 1 +
>> >  2 files changed, 7 insertions(+)
>> > 
>> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> > index eec2f11..d42d11e 100644
>> > --- a/drivers/vhost/vhost.c
>> > +++ b/drivers/vhost/vhost.c
>> > @@ -245,6 +245,12 @@ void vhost_work_queue(struct vhost_dev *dev, struct 
>> > vhost_work *work)
>> >  }
>> >  EXPORT_SYMBOL_GPL(vhost_work_queue);
>> >  
>> > +bool vhost_has_work(struct vhost_dev *dev)
>> > +{
>> > +  return !list_empty(&dev->work_list);
>> > +}
>> > +EXPORT_SYMBOL_GPL(vhost_has_work);
>> > +
>> >  void vhost_poll_queue(struct vhost_poll *poll)
>> >  {
>> >vhost_work_queue(poll->dev, &poll->work);
> This doesn't take a lock so it's unreliable.
> I think it's ok in this case since it's just
> an optimization - but pls document this.
>

Ok, will do.
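
A minimal userspace sketch of what such a documented, lockless hint could look like. Everything here is an assumption made for illustration: `struct list_node`, `struct work_dev` and `dev_has_work` are hypothetical stand-ins for the kernel list and vhost types, and the model is single-threaded, so it shows the shape of the check and its comment rather than real concurrency.

```c
#include <stdbool.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_node {
    struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add_tail(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

struct work_dev {
    struct list_node work_list;
};

/*
 * Lockless hint: reports whether work appears to be queued.
 * No lock is taken, so the answer may already be stale when the caller
 * acts on it. Use it only as an optimization (for example to leave a
 * busy-poll loop early), never to decide correctness.
 */
static bool dev_has_work(const struct work_dev *dev)
{
    return dev->work_list.next != &dev->work_list;
}
```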


Re: [PATCH net-next RFC 2/2] vhost_net: basic polling support

2015-10-23 Thread Jason Wang


On 10/22/2015 05:33 PM, Michael S. Tsirkin wrote:
> On Thu, Oct 22, 2015 at 01:27:29AM -0400, Jason Wang wrote:
>> This patch tries to poll for newly added tx buffers for a while at the
>> end of tx processing. The maximum time spent on polling is limited
>> through a module parameter. To avoid blocking rx, the loop will end if
>> there is other work queued on vhost, so in fact the socket receive
>> queue is also polled.
>>
>> busyloop_timeout = 50 gives us the following improvement on TCP_RR test:
>>
>> size/session/+thu%/+normalize%
>> 1/ 1/   +5%/  -20%
>> 1/50/  +17%/   +3%
> Is there a measurable increase in cpu utilization
> with busyloop_timeout = 0?

I only ran TCP_RR and saw no increase. I will run a complete test for the next version.

>
>> Signed-off-by: Jason Wang 
> We might be able to shave off the minor regression
> by careful use of likely/unlikely, or maybe
> deferring 

Yes, but what does "deferring" mean here?
 
>
>> ---
>>  drivers/vhost/net.c | 19 +++
>>  1 file changed, 19 insertions(+)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index 9eda69e..bbb522a 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -31,7 +31,9 @@
>>  #include "vhost.h"
>>  
>>  static int experimental_zcopytx = 1;
>> +static int busyloop_timeout = 50;
>>  module_param(experimental_zcopytx, int, 0444);
>> +module_param(busyloop_timeout, int, 0444);
> Pls add a description, including the units and the special
> value 0.

Ok.

>
>>  MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
>> " 1 -Enable; 0 - Disable");
>>  
>> @@ -287,12 +289,23 @@ static void vhost_zerocopy_callback(struct ubuf_info 
>> *ubuf, bool success)
>>  rcu_read_unlock_bh();
>>  }
>>  
>> +static bool tx_can_busy_poll(struct vhost_dev *dev,
>> + unsigned long endtime)
>> +{
>> +unsigned long now = local_clock() >> 10;
> local_clock might go backwards if we jump between CPUs.
> One way to fix would be to record the CPU id and break
> out of loop if that changes.

Right, or maybe disable preemption in this case?

>
> Also - defer this until we actually know we need it?

Right.

>
>> +
>> +return busyloop_timeout && !need_resched() &&
>> +   !time_after(now, endtime) && !vhost_has_work(dev) &&
>> +   single_task_running();
> signal pending as well?

Yes.

>> +}
>> +
>>  /* Expects to be always run from workqueue - which acts as
>>   * read-size critical section for our kind of RCU. */
>>  static void handle_tx(struct vhost_net *net)
>>  {
>>  struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
>>  struct vhost_virtqueue *vq = &nvq->vq;
>> +unsigned long endtime;
>>  unsigned out, in;
>>  int head;
>>  struct msghdr msg = {
>> @@ -331,6 +344,8 @@ static void handle_tx(struct vhost_net *net)
>>% UIO_MAXIOV == nvq->done_idx))
>>  break;
>>  
>> +endtime  = (local_clock() >> 10) + busyloop_timeout;
>> +again:
>>  head = vhost_get_vq_desc(vq, vq->iov,
>>   ARRAY_SIZE(vq->iov),
>>   &out, &in,
>> @@ -340,6 +355,10 @@ static void handle_tx(struct vhost_net *net)
>>  break;
>>  /* Nothing new?  Wait for eventfd to tell us they refilled. */
>>  if (head == vq->num) {
>> +if (tx_can_busy_poll(vq->dev, endtime)) {
>> +cpu_relax();
>> +goto again;
>> +}
>>  if (unlikely(vhost_enable_notify(&net->dev, vq))) {
>>  vhost_disable_notify(&net->dev, vq);
>>  continue;
>> -- 
>> 1.8.3.1
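
The review comments in this message (stop polling if the CPU changes, check for pending signals, and read the clock only once the cheaper checks pass) can be combined into one predicate. The sketch below is a userspace model under stated assumptions: `struct poll_state`, `poll_start` and `can_busy_poll` are hypothetical names, and the clock value, CPU id, work flag and signal flag are passed in by the caller rather than read from local_clock(), the scheduler or vhost.

```c
#include <stdbool.h>

struct poll_state {
    unsigned long endtime;  /* deadline, in the same units as the clock */
    int cpu;                /* CPU on which the deadline was computed */
};

/* Record the deadline and the CPU whose clock it is based on. */
static void poll_start(struct poll_state *s, unsigned long now, int cpu,
                       unsigned long timeout)
{
    s->endtime = now + timeout;
    s->cpu = cpu;
}

/*
 * Returns true while busy polling may continue.
 * timeout == 0 is the special value that disables polling entirely.
 * A CPU change ends the loop because a per-CPU clock is not guaranteed
 * to be comparable (it could appear to go backwards) across CPUs.
 */
static bool can_busy_poll(const struct poll_state *s, unsigned long timeout,
                          unsigned long now, int cpu,
                          bool has_work, bool signal_pending)
{
    if (timeout == 0)
        return false;
    if (cpu != s->cpu)
        return false;
    if (has_work || signal_pending)
        return false;
    /* Wrap-safe "now is not after endtime", in the style of time_after(). */
    return (long)(s->endtime - now) >= 0;
}
```

In a real loop the caller would fetch the clock value last, only after the cheaper conditions have passed, which is the sense in which the clock read can be deferred.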


Re: [PATCH net-next RFC 2/2] vhost_net: basic polling support

2015-10-23 Thread Jason Wang


On 10/23/2015 12:16 AM, Michael S. Tsirkin wrote:
> On Thu, Oct 22, 2015 at 08:46:33AM -0700, Rick Jones wrote:
>> On 10/22/2015 02:33 AM, Michael S. Tsirkin wrote:
>>> On Thu, Oct 22, 2015 at 01:27:29AM -0400, Jason Wang wrote:
>>>> This patch tries to poll for newly added tx buffers for a while at the
>>>> end of tx processing. The maximum time spent on polling is limited
>>>> through a module parameter. To avoid blocking rx, the loop will end if
>>>> there is other work queued on vhost, so in fact the socket receive
>>>> queue is also polled.
>>>>
>>>> busyloop_timeout = 50 gives us the following improvement on TCP_RR test:
>>>>
>>>> size/session/+thu%/+normalize%
>>>> 1/ 1/   +5%/  -20%
>>>> 1/50/  +17%/   +3%
>>> Is there a measureable increase in cpu utilization
>>> with busyloop_timeout = 0?
>> And since a netperf TCP_RR test is involved, be careful about what netperf
>> reports for CPU util if that increase isn't in the context of the guest OS.

Right, the cpu utilization is measured on host.

>>
>> For completeness, looking at the effect on TCP_STREAM and TCP_MAERTS,
>> aggregate _RR and even aggregate _RR/packets per second for many VMs on the
>> same system would be in order.
>>
>> happy benchmarking,
>>
>> rick jones
> Absolutely, merging a new kernel API just for a specific
> benchmark doesn't make sense.
> I'm guessing this is just an early RFC, a fuller submission
> will probably include more numbers.
>

Yes, will run more complete tests.

Thanks



RE: [PATCH 1/2] KVM: arm/arm64: Optimize away redundant LR tracking

2015-10-23 Thread Pavel Fedin
 Hello!

> -----Original Message-----
> From: Christoffer Dall [mailto:christoffer.d...@linaro.org]
> Sent: Friday, October 23, 2015 12:43 AM
> To: Pavel Fedin
> Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org; Marc Zyngier; Andre 
> Przywara
> Subject: Re: [PATCH 1/2] KVM: arm/arm64: Optimize away redundant LR tracking
> 
> On Fri, Oct 02, 2015 at 05:44:28PM +0300, Pavel Fedin wrote:
> > Currently we use vgic_irq_lr_map in order to track which LRs hold which
> > IRQs, and lr_used bitmap in order to track which LRs are used or free.
> >
> > vgic_irq_lr_map is actually used only for piggy-back optimization, and
> > can be easily replaced by iteration over lr_used. This is good because in
> > the future, when LPI support is introduced, the number of IRQs will grow
> > to at least 16384, while numbers from 1024 to 8192 are never going to be
> > used. This would be a huge memory waste.
> >
> > In its turn, lr_used is also completely redundant since
> > ae705930fca6322600690df9dc1c7d0516145a93 ("arm/arm64: KVM: Keep elrsr/aisr
> > in sync with software model"), because together with lr_used we also update
> > elrsr. This makes it easy to replace lr_used with elrsr, inverting all
> > conditions (because in elrsr '1' means 'free').
> >
> > Signed-off-by: Pavel Fedin 
> > ---
> >  include/kvm/arm_vgic.h |  6 
> >  virt/kvm/arm/vgic.c| 74 
> > +++---
> >  2 files changed, 28 insertions(+), 52 deletions(-)
> >
> > diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> > index 4e14dac..d908028 100644
> > --- a/include/kvm/arm_vgic.h
> > +++ b/include/kvm/arm_vgic.h
> > @@ -296,9 +296,6 @@ struct vgic_v3_cpu_if {
> >  };
> >
> >  struct vgic_cpu {
> > -   /* per IRQ to LR mapping */
> > -   u8  *vgic_irq_lr_map;
> > -
> > /* Pending/active/both interrupts on this VCPU */
> > DECLARE_BITMAP( pending_percpu, VGIC_NR_PRIVATE_IRQS);
> > DECLARE_BITMAP( active_percpu, VGIC_NR_PRIVATE_IRQS);
> > @@ -309,9 +306,6 @@ struct vgic_cpu {
> > unsigned long   *active_shared;
> > unsigned long   *pend_act_shared;
> >
> > -   /* Bitmap of used/free list registers */
> > -   DECLARE_BITMAP( lr_used, VGIC_V2_MAX_LRS);
> > -
> > /* Number of list registers on this CPU */
> > int nr_lr;
> >
> > diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> > index 6bd1c9b..2f4d25a 100644
> > --- a/virt/kvm/arm/vgic.c
> > +++ b/virt/kvm/arm/vgic.c
> > @@ -102,9 +102,10 @@
> >  #include "vgic.h"
> >
> >  static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
> > -static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
> > +static void vgic_retire_lr(int lr_nr, struct kvm_vcpu *vcpu);
> >  static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
> >  static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr 
> > lr_desc);
> > +static u64 vgic_get_elrsr(struct kvm_vcpu *vcpu);
> >  static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
> > int virt_irq);
> >
> > @@ -683,9 +684,11 @@ bool vgic_handle_cfg_reg(u32 *reg, struct 
> > kvm_exit_mmio *mmio,
> >  void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
> >  {
> > struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> > +   u64 elrsr = vgic_get_elrsr(vcpu);
> > +   unsigned long *elrsr_ptr = u64_to_bitmask(&elrsr);
> > int i;
> >
> > -   for_each_set_bit(i, vgic_cpu->lr_used, vgic_cpu->nr_lr) {
> > +   for_each_clear_bit(i, elrsr_ptr, vgic_cpu->nr_lr) {
> > struct vgic_lr lr = vgic_get_lr(vcpu, i);
> >
> > /*
> > @@ -728,7 +731,7 @@ void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
> >  * Mark the LR as free for other use.
> >  */
> > BUG_ON(lr.state & LR_STATE_MASK);
> > -   vgic_retire_lr(i, lr.irq, vcpu);
> > +   vgic_retire_lr(i, vcpu);
> > vgic_irq_clear_queued(vcpu, lr.irq);
> >
> > /* Finally update the VGIC state. */
> > @@ -1087,15 +1090,12 @@ static inline void vgic_enable(struct kvm_vcpu 
> > *vcpu)
> > vgic_ops->enable(vcpu);
> >  }
> >
> > -static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu)
> > +static void vgic_retire_lr(int lr_nr, struct kvm_vcpu *vcpu)
> >  {
> > -   struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> > struct vgic_lr vlr = vgic_get_lr(vcpu, lr_nr);
> >
> > vlr.state = 0;
> > vgic_set_lr(vcpu, lr_nr, vlr);
> > -   clear_bit(lr_nr, vgic_cpu->lr_used);
> > -   vgic_cpu->vgic_irq_lr_map[irq] = LR_EMPTY;
> > vgic_sync_lr_elrsr(vcpu, lr_nr, vlr);
> >  }
> >
> > @@ -1110,14 +1110,15 @@ static void vgic_retire_lr(int lr_nr, int irq, 
> > struct kvm_vcpu *vcpu)
> >   */
> >  static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu)
> >  {
> > -   struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> > +   u64 elrsr = vgic_get_elrsr(vcpu);
> > +   unsigned long *elrsr_ptr = u64_to_bitmask(&elrsr);
> > int lr;
> >
> > -   
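
The idea in the quoted commit message, that the per-IRQ map and the lr_used bitmap can both be replaced by walking the clear bits of elrsr, can be sketched in plain C as follows. This is an illustration only: `used_lrs`, `find_lr_for_irq` and the `lr_irq` array are hypothetical and do not mirror the kernel's data structures; the only fact taken from the message is that a set bit in elrsr means the list register is free.

```c
#include <stdint.h>

/*
 * Collect the indices of list registers that are in use.
 * In elrsr a set bit means "free", so used LRs are the clear bits.
 * Returns the number of indices written to out[].
 */
static int used_lrs(uint64_t elrsr, int nr_lr, int *out)
{
    int n = 0;

    for (int lr = 0; lr < nr_lr && lr < 64; lr++) {
        if (!(elrsr & (UINT64_C(1) << lr)))
            out[n++] = lr;
    }
    return n;
}

/*
 * Find the in-use LR holding a given IRQ by iterating over used LRs
 * instead of consulting a per-IRQ lookup table.
 * lr_irq[i] is the (hypothetical) IRQ number currently held by LR i.
 * Returns the LR index, or -1 if no used LR holds that IRQ.
 */
static int find_lr_for_irq(uint64_t elrsr, int nr_lr, const int *lr_irq,
                           int irq)
{
    for (int lr = 0; lr < nr_lr && lr < 64; lr++) {
        if (elrsr & (UINT64_C(1) << lr))
            continue;  /* free LR, its contents are not meaningful */
        if (lr_irq[lr] == irq)
            return lr;
    }
    return -1;
}
```

The search is linear in the number of list registers, which is small, so giving up the table costs little while avoiding an array sized by the number of IRQs.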

[PATCH] KVM: x86: fix RSM into 64-bit protected mode, round 2

2015-10-23 Thread Laszlo Ersek
Commit b10d92a54dac ("KVM: x86: fix RSM into 64-bit protected mode")
reordered the rsm_load_seg_64() and rsm_enter_protected_mode() calls,
relative to each other. The argument that said commit made was correct,
however putting rsm_enter_protected_mode() first wholesale violated the
following (correct) invariant from em_rsm():

 * Get back to real mode, to prepare a safe state in which to load
 * CR0/CR3/CR4/EFER.  Also this will ensure that addresses passed
 * to read_std/write_std are not virtual.

Namely, rsm_enter_protected_mode() may re-enable paging, *after* which

  rsm_load_seg_64()
GET_SMSTATE()
  read_std()

will try to interpret the (smbase + offset) address as a virtual one. This
will result in unexpected page faults being injected to the guest in
response to the RSM instruction.

Split rsm_load_seg_64() in two parts:

- The first part, rsm_stash_seg_64(), shall call GET_SMSTATE() while in
  real mode, and save the relevant state off SMRAM into an array local to
  rsm_load_state_64().

- The second part, rsm_load_seg_64(), shall occur after entering protected
  mode, but the segment details shall come from the local array, not the
  guest's SMRAM.

Fixes: b10d92a54dac25a6152f1aa1ffc95c12908035ce
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Jordan Justen 
Cc: Michael Kinney 
Cc: sta...@vger.kernel.org
Signed-off-by: Laszlo Ersek 
---
 arch/x86/kvm/emulate.c | 37 ++---
 1 file changed, 30 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 9da95b9..25e16b6 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2311,7 +2311,16 @@ static int rsm_load_seg_32(struct x86_emulate_ctxt 
*ctxt, u64 smbase, int n)
return X86EMUL_CONTINUE;
 }
 
-static int rsm_load_seg_64(struct x86_emulate_ctxt *ctxt, u64 smbase, int n)
+struct rsm_stashed_seg_64 {
+   u16 selector;
+   struct desc_struct desc;
+   u32 base3;
+};
+
+static int rsm_stash_seg_64(struct x86_emulate_ctxt *ctxt,
+   struct rsm_stashed_seg_64 *stash,
+   u64 smbase,
+   int n)
 {
struct desc_struct desc;
int offset;
@@ -2326,10 +2335,20 @@ static int rsm_load_seg_64(struct x86_emulate_ctxt 
*ctxt, u64 smbase, int n)
set_desc_base(&desc,  GET_SMSTATE(u32, smbase, offset + 8));
base3 =   GET_SMSTATE(u32, smbase, offset + 12);
 
-   ctxt->ops->set_segment(ctxt, selector, &desc, base3, n);
+   stash[n].selector = selector;
+   stash[n].desc = desc;
+   stash[n].base3 = base3;
return X86EMUL_CONTINUE;
 }
 
+static inline void rsm_load_seg_64(struct x86_emulate_ctxt *ctxt,
+  struct rsm_stashed_seg_64 *stash,
+  int n)
+{
+   ctxt->ops->set_segment(ctxt, stash[n].selector, &stash[n].desc,
+  stash[n].base3, n);
+}
+
 static int rsm_enter_protected_mode(struct x86_emulate_ctxt *ctxt,
 u64 cr0, u64 cr4)
 {
@@ -2419,6 +2438,7 @@ static int rsm_load_state_64(struct x86_emulate_ctxt 
*ctxt, u64 smbase)
u32 base3;
u16 selector;
int i, r;
+   struct rsm_stashed_seg_64 stash[6];
 
for (i = 0; i < 16; i++)
*reg_write(ctxt, i) = GET_SMSTATE(u64, smbase, 0x7ff8 - i * 8);
@@ -2460,15 +2480,18 @@ static int rsm_load_state_64(struct x86_emulate_ctxt 
*ctxt, u64 smbase)
dt.address =GET_SMSTATE(u64, smbase, 0x7e68);
ctxt->ops->set_gdt(ctxt, &dt);
 
+   for (i = 0; i < ARRAY_SIZE(stash); i++) {
+   r = rsm_stash_seg_64(ctxt, stash, smbase, i);
+   if (r != X86EMUL_CONTINUE)
+   return r;
+   }
+
r = rsm_enter_protected_mode(ctxt, cr0, cr4);
if (r != X86EMUL_CONTINUE)
return r;
 
-   for (i = 0; i < 6; i++) {
-   r = rsm_load_seg_64(ctxt, smbase, i);
-   if (r != X86EMUL_CONTINUE)
-   return r;
-   }
+   for (i = 0; i < ARRAY_SIZE(stash); i++)
+   rsm_load_seg_64(ctxt, stash, i);
 
return X86EMUL_CONTINUE;
 }
-- 
1.8.3.1
