Re: [PATCH 09/10] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()

2015-11-25 Thread Paolo Bonzini


On 20/11/2015 09:57, Xiao Guangrong wrote:
> 
> 
> You can move this patch to the front of
> [PATCH 08/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of
> pte_list_walk()
> 
> By moving kvm_mmu_mark_parents_unsync() to the behind of mmu_spte_set()
> (then the parent
> spte is present now), you can directly clean up for_each_rmap_spte().

So basically squash together the two patches (8/10 and 9/10) except the
change to kvm_mmu_mark_parents_unsync; then in the second patch switch
from pte_list_walk to for_each_rmap_spte.

That makes sense indeed.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] target-i386: kvm: Print warning when clearing mcg_cap bits

2015-11-25 Thread Borislav Petkov
On Wed, Nov 25, 2015 at 01:49:49PM -0200, Eduardo Habkost wrote:
> Instead of silently clearing mcg_cap bits when the host doesn't
> support them, print a warning when doing that.

Why the host? Why would we want there to be any relation between the MCA
capabilities of the host and what qemu is emulating?

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.


Re: [PATCH] KVM: nVMX: remove incorrect vpid check in nested invvpid emulation

2015-11-25 Thread Bandan Das
Haozhong Zhang  writes:

> On 11/25/15 10:45, Bandan Das wrote:
>> Haozhong Zhang  writes:
>> 
>> > This patch removes the vpid check when emulating nested invvpid
>> > instruction of type all-contexts invalidation. The existing code is
>> > incorrect because:
>> >  (1) According to Intel SDM Vol 3, Section "INVVPID - Invalidate
>> >  Translations Based on VPID", invvpid instruction does not check
>> >  vpid in the invvpid descriptor when its type is all-contexts
>> >  invalidation.
>> 
>> But iirc isn't vpid=0 reserved for root mode ?
> Yes,
>
>> I think we don't want
>> L1 hypervisor to be able do a invvpid(0).
>
> but the invvpid emulated here is doing the all-contexts invalidation that
> does not use the given vpid and "invalidates all mappings tagged with all
> VPIDs except VPID H" (from Intel SDM).

Actually, vpid_sync_context will always try single invalidation first and
I was concerned that we will end up calling vpid_sync_context(0). But that
will not happen since nested.vpid02 is always allocated a vpid.
So... we are good :)
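
A rough model of the fallback described above, for readers who want to see
why a vpid of 0 is harmless on this path. This is a simplified user-space
sketch and not the kernel code: the example_* names, the enum, and the
boolean capability flags are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type: which kind of flush the model would issue. */
enum example_flush {
    EXAMPLE_FLUSH_NONE,
    EXAMPLE_FLUSH_SINGLE,
    EXAMPLE_FLUSH_ALL
};

/*
 * Simplified model: prefer single-context invalidation when the CPU
 * supports it, otherwise fall back to all-contexts invalidation.
 * In this model a vpid of 0 is never flushed via the single-context path.
 */
static enum example_flush example_vpid_sync_context(bool has_single,
                                                    bool has_global,
                                                    unsigned int vpid)
{
    if (has_single)
        return vpid == 0 ? EXAMPLE_FLUSH_NONE : EXAMPLE_FLUSH_SINGLE;
    return has_global ? EXAMPLE_FLUSH_ALL : EXAMPLE_FLUSH_NONE;
}
```

With a non-zero vpid (as is the case for nested.vpid02) and single-context
support, the model always reports a single-context flush.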

>> 
>> >  (2) According to the same document, invvpid of type all-contexts
>> >  invalidation does not require there is an active VMCS, so/and
>> >  get_vmcs12() in the existing code may result in a NULL-pointer
>> >  dereference. In practice, it can crash both KVM itself and L1
>> >  hypervisors that use invvpid (e.g. Xen).
>> 
>> If that is the case, then just check if it's null and return without
>> doing anything.
>
> (according to Intel SDM) invvpid of type all-contexts invalidation
> should not trigger a valid vmx fail if vpid in the current VMCS is 0.

No, I meant just skip instruction and return but I doubt if there's
any overhead of flushing mappings that don't exist in the first place.
Anyway, better to do as the spec says.

> However, this check and its following operation do change this semantics
> in nested VMX, so it should be completely removed.
>
>> 
>> > Signed-off-by: Haozhong Zhang 
>> > ---
>> >  arch/x86/kvm/vmx.c | 5 -
>> >  1 file changed, 5 deletions(-)
>> >
>> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> > index 87acc52..af823a3 100644
>> > --- a/arch/x86/kvm/vmx.c
>> > +++ b/arch/x86/kvm/vmx.c
>> > @@ -7394,11 +7394,6 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
>> >  
>> >switch (type) {
>> >case VMX_VPID_EXTENT_ALL_CONTEXT:
>> > -  if (get_vmcs12(vcpu)->virtual_processor_id == 0) {
>> > -  nested_vmx_failValid(vcpu,
>> > -  VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
>> > -  return 1;
>> > -  }
>> >__vmx_flush_tlb(vcpu, to_vmx(vcpu)->nested.vpid02);
>> >nested_vmx_succeed(vcpu);
>> >break;
>> 
>> I also noticed a BUG() here in the default. It might be a good idea to replace
>> it with a WARN.
>
> Or, in nested_vmx_setup_ctls_msrs():
>
> if (enable_vpid)
> - vmx->nested.nested_vmx_vpid_caps = VMX_VPID_INVVPID_BIT |
> - VMX_VPID_EXTENT_GLOBAL_CONTEXT_BIT;
> +   vmx->nested.nested_vmx_vpid_caps = VMX_VPID_EXTENT_GLOBAL_CONTEXT_BIT;
>
> because the current handle_invvpid() only handles all-contexts invalidation.
>
> Haozhong
>


Re: [PATCH 3/3] target-i386: kvm: Print warning when clearing mcg_cap bits

2015-11-25 Thread Eduardo Habkost
On Wed, Nov 25, 2015 at 05:45:20PM +0100, Paolo Bonzini wrote:
> On 25/11/2015 16:49, Eduardo Habkost wrote:
> > Instead of silently clearing mcg_cap bits when the host doesn't
> > support them, print a warning when doing that.
> > 
> > Signed-off-by: Eduardo Habkost 
> > ---
> >  target-i386/kvm.c | 8 +++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> > index d63a85b..446bdfc 100644
> > --- a/target-i386/kvm.c
> > +++ b/target-i386/kvm.c
> > @@ -774,7 +774,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
> >  && (env->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
> > (CPUID_MCE | CPUID_MCA)
> >  && kvm_check_extension(cs->kvm_state, KVM_CAP_MCE) > 0) {
> > -uint64_t mcg_cap;
> > +uint64_t mcg_cap, unsupported_caps;
> >  int banks;
> >  int ret;
> >  
> > @@ -790,6 +790,12 @@ int kvm_arch_init_vcpu(CPUState *cs)
> >  return -ENOTSUP;
> >  }
> >  
> > +unsupported_caps = env->mcg_cap & ~(mcg_cap | MCG_CAP_BANKS_MASK);
> > +if (unsupported_caps) {
> > +error_report("warning: Unsupported MCG_CAP bits: 0x%" PRIx64 
> > "\n",
> 
> \n should not be at end of error_report.
> 
> Fixed and applied.

MCG_CAP_BANKS_MASK is defined by patch 2/3. Have you applied the
whole series?
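
For reference, the mask arithmetic in the quoted hunk can be sketched as a
standalone snippet. The example_* names are hypothetical, and the 0xff value
for the bank-count mask is an assumption here (the real definition lives in
patch 2/3); plain fprintf() stands in for QEMU's error_report().

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed for illustration: the low 8 bits of MCG_CAP hold the bank count. */
#define EXAMPLE_MCG_CAP_BANKS_MASK 0xffULL

/*
 * Return the capability bits that are requested but missing from the
 * host-reported value. The bank-count field is masked out because it is
 * a number, not a set of feature flags.
 */
static uint64_t example_unsupported_mcg_caps(uint64_t wanted, uint64_t host)
{
    return wanted & ~(host | EXAMPLE_MCG_CAP_BANKS_MASK);
}

/*
 * Print a warning when bits would be cleared. Returns 1 if a warning was
 * printed, 0 otherwise. fprintf() needs the trailing newline, unlike
 * error_report().
 */
static int example_warn_unsupported(uint64_t wanted, uint64_t host)
{
    uint64_t unsupported = example_unsupported_mcg_caps(wanted, host);

    if (unsupported) {
        fprintf(stderr, "warning: Unsupported MCG_CAP bits: 0x%" PRIx64 "\n",
                unsupported);
        return 1;
    }
    return 0;
}
```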

-- 
Eduardo


Re: [RFC PATCH V2 0/3] IXGBE/VFIO: Add live migration support for SRIOV NIC

2015-11-25 Thread Alexander Duyck
On Wed, Nov 25, 2015 at 12:21 AM, Lan Tianyu  wrote:
> On 2015年11月25日 13:30, Alexander Duyck wrote:
>> No, what I am getting at is that you can't go around and modify the
>> configuration space for every possible device out there.  This
>> solution won't scale.
>
>
> PCI config space regs are emulation by Qemu and so We can find the free
> PCI config space regs for the faked PCI capability. Its position can be
> not permanent.

Yes, but do you really want to edit every driver on every OS that you
plan to support this on?  What about things like direct assignment of
regular Ethernet ports?  What you really need is a solution that will
work generically on any existing piece of hardware out there.

>>  If you instead moved the logic for notifying
>> the device into a separate mechanism such as making it a part of the
>> hot-plug logic then you only have to write the code once per OS in
>> order to get the hot-plug capability to pause/resume the device.  What
>> I am talking about is not full hot-plug, but rather to extend the
>> existing hot-plug in Qemu and the Linux kernel to support a
>> "pause/resume" functionality.  The PCI hot-plug specification calls
>> out the option of implementing something like this, but we don't
>> currently have support for it.
>>
>
> Could you elaborate the part of PCI hot-plug specification you mentioned?
>
> My concern is whether it needs to change PCI spec or not.


In the PCI Hot-Plug Specification 1.1, in section 4.1.2 it states:
 In addition to quiescing add-in card activity, an operating-system
vendor may optionally implement a less drastic “pause” capability, in
anticipation of the same or a similar add-in card being reinserted.

The idea I had was basically if we were to implement something like
that in Linux then we could pause/resume the device instead of
outright removing it.  The pause functionality could make use of the
suspend/resume functionality most drivers already have for PCI power
management.

- Alex


Re: [PATCH] KVM: nVMX: remove incorrect vpid check in nested invvpid emulation

2015-11-25 Thread Haozhong Zhang
On 11/25/15 10:45, Bandan Das wrote:
> Haozhong Zhang  writes:
> 
> > This patch removes the vpid check when emulating nested invvpid
> > instruction of type all-contexts invalidation. The existing code is
> > incorrect because:
> >  (1) According to Intel SDM Vol 3, Section "INVVPID - Invalidate
> >  Translations Based on VPID", invvpid instruction does not check
> >  vpid in the invvpid descriptor when its type is all-contexts
> >  invalidation.
> 
> But iirc isn't vpid=0 reserved for root mode ?
Yes,

> I think we don't want
> L1 hypervisor to be able do a invvpid(0).

but the invvpid emulated here is doing the all-contexts invalidation that
does not use the given vpid and "invalidates all mappings tagged with all
VPIDs except VPID H" (from Intel SDM).

> 
> >  (2) According to the same document, invvpid of type all-contexts
> >  invalidation does not require there is an active VMCS, so/and
> >  get_vmcs12() in the existing code may result in a NULL-pointer
> >  dereference. In practice, it can crash both KVM itself and L1
> >  hypervisors that use invvpid (e.g. Xen).
> 
> If that is the case, then just check if it's null and return without
> doing anything.

(according to Intel SDM) invvpid of type all-contexts invalidation
should not trigger a valid vmx fail if vpid in the current VMCS is 0.
However, this check and its following operation do change this semantics
in nested VMX, so it should be completely removed.

> 
> > Signed-off-by: Haozhong Zhang 
> > ---
> >  arch/x86/kvm/vmx.c | 5 -
> >  1 file changed, 5 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 87acc52..af823a3 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -7394,11 +7394,6 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
> >  
> > switch (type) {
> > case VMX_VPID_EXTENT_ALL_CONTEXT:
> > -   if (get_vmcs12(vcpu)->virtual_processor_id == 0) {
> > -   nested_vmx_failValid(vcpu,
> > -   VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
> > -   return 1;
> > -   }
> > __vmx_flush_tlb(vcpu, to_vmx(vcpu)->nested.vpid02);
> > nested_vmx_succeed(vcpu);
> > break;
> 
> I also noticed a BUG() here in the default. It might be a good idea to replace
> it with a WARN.

Or, in nested_vmx_setup_ctls_msrs():

if (enable_vpid)
-   vmx->nested.nested_vmx_vpid_caps = VMX_VPID_INVVPID_BIT |
-   VMX_VPID_EXTENT_GLOBAL_CONTEXT_BIT;
+   vmx->nested.nested_vmx_vpid_caps = VMX_VPID_EXTENT_GLOBAL_CONTEXT_BIT;

because the current handle_invvpid() only handles all-contexts invalidation.

Haozhong



[PATCH v1 7/7] kvm/x86: Hyper-V SynIC timers

2015-11-25 Thread Andrey Smetanin
Per Hyper-V specification (and as required by Hyper-V-aware guests),
SynIC provides 4 per-vCPU timers.  Each timer is programmed via a pair
of MSRs, and signals expiration by delivering a special format message
to the configured SynIC message slot and triggering the corresponding
synthetic interrupt.

Note: as implemented by this patch, all periodic timers are "lazy"
(i.e. if the vCPU wasn't scheduled for more than the timer period the
timer events are lost), regardless of the corresponding configuration
MSR.  If deemed necessary, the "catch up" mode (the timer period is
shortened until the timer catches up) will be implemented later.
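
To illustrate the "lazy" behaviour described in the note, here is a minimal
user-space sketch of one way to rearm a periodic timer so that missed
expirations are dropped instead of replayed. It is not taken from the patch:
the function name, the parameters, and the choice to stay on the original
period grid are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * "Lazy" rearm: given the expiration being handled (prev_exp), the period,
 * and the current time (all in the same units), return the next expiration
 * that lies strictly in the future while staying on the original period
 * grid. Expirations that fell in between are dropped; *missed reports how
 * many. Overflow is ignored in this sketch.
 */
static uint64_t example_lazy_next_expiry(uint64_t prev_exp, uint64_t period,
                                         uint64_t now, uint64_t *missed)
{
    uint64_t late, skipped;

    assert(period > 0);
    if (now < prev_exp) {       /* not expired yet: nothing to drop */
        *missed = 0;
        return prev_exp;
    }
    late = now - prev_exp;
    skipped = late / period;    /* whole periods elapsed since prev_exp */
    *missed = skipped;
    return prev_exp + (skipped + 1) * period;
}
```

A "catch up" mode would instead keep every grid point and shorten the
interval until the timer is back on schedule.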

Signed-off-by: Andrey Smetanin 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: "K. Y. Srinivasan" 
CC: Haiyang Zhang 
CC: Vitaly Kuznetsov 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: qemu-de...@nongnu.org
---
 arch/x86/include/asm/kvm_host.h|  13 ++
 arch/x86/include/uapi/asm/hyperv.h |   6 +
 arch/x86/kvm/hyperv.c  | 325 -
 arch/x86/kvm/hyperv.h  |  24 +++
 arch/x86/kvm/x86.c |   9 +
 include/linux/kvm_host.h   |   1 +
 6 files changed, 375 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f608e17..e35c5ca 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -375,6 +375,17 @@ struct kvm_mtrr {
struct list_head head;
 };
 
+/* Hyper-V SynIC timer */
+struct kvm_vcpu_hv_stimer {
+   struct hrtimer timer;
+   int index;
+   u64 config;
+   u64 count;
+   u64 exp_time;
+   struct hv_message msg;
+   bool msg_pending;
+};
+
 /* Hyper-V synthetic interrupt controller (SynIC)*/
 struct kvm_vcpu_hv_synic {
u64 version;
@@ -394,6 +405,8 @@ struct kvm_vcpu_hv {
s64 runtime_offset;
struct kvm_vcpu_hv_synic synic;
struct kvm_hyperv_exit exit;
+   struct kvm_vcpu_hv_stimer stimer[HV_SYNIC_STIMER_COUNT];
+   DECLARE_BITMAP(stimer_pending_bitmap, HV_SYNIC_STIMER_COUNT);
 };
 
 struct kvm_vcpu_arch {
diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
index e86d77e..f9d3349 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++ b/arch/x86/include/uapi/asm/hyperv.h
@@ -362,4 +362,10 @@ struct hv_message_page {
struct hv_message sint_message[HV_SYNIC_SINT_COUNT];
 };
 
+#define HV_STIMER_ENABLE   (1ULL << 0)
+#define HV_STIMER_PERIODIC (1ULL << 1)
+#define HV_STIMER_LAZY (1ULL << 2)
+#define HV_STIMER_AUTOENABLE   (1ULL << 3)
+#define HV_STIMER_SINT(config) (__u8)(((config) >> 16) & 0x0F)
+
 #endif
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 6412b6b..9f8eb82 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -147,15 +147,32 @@ static void kvm_hv_notify_acked_sint(struct kvm_vcpu *vcpu, u32 sint)
 {
struct kvm *kvm = vcpu->kvm;
struct kvm_vcpu_hv_synic *synic = vcpu_to_synic(vcpu);
-   int gsi, idx;
+   struct kvm_vcpu_hv *hv_vcpu = vcpu_to_hv_vcpu(vcpu);
+   struct kvm_vcpu_hv_stimer *stimer;
+   int gsi, idx, stimers_pending;
 
vcpu_debug(vcpu, "Hyper-V SynIC acked sint %d\n", sint);
 
if (synic->msg_page & HV_SYNIC_SIMP_ENABLE)
synic_clear_sint_msg_pending(synic, sint);
 
+   /* Try to deliver pending Hyper-V SynIC timers messages */
+   stimers_pending = 0;
+   for (idx = 0; idx < ARRAY_SIZE(hv_vcpu->stimer); idx++) {
+   stimer = &hv_vcpu->stimer[idx];
+   if (stimer->msg_pending &&
+   (stimer->config & HV_STIMER_ENABLE) &&
+   HV_STIMER_SINT(stimer->config) == sint) {
+   set_bit(stimer->index,
+   hv_vcpu->stimer_pending_bitmap);
+   stimers_pending++;
+   }
+   }
+   if (stimers_pending)
+   kvm_make_request(KVM_REQ_HV_STIMER, vcpu);
+
idx = srcu_read_lock(&kvm->irq_srcu);
-   gsi = atomic_read(&vcpu_to_synic(vcpu)->sint_to_gsi[sint]);
+   gsi = atomic_read(&synic->sint_to_gsi[sint]);
if (gsi != -1)
kvm_notify_acked_gsi(kvm, gsi);
srcu_read_unlock(&kvm->irq_srcu, idx);
@@ -371,9 +388,275 @@ static u64 get_time_ref_counter(struct kvm *kvm)
return div_u64(get_kernel_ns() + kvm->arch.kvmclock_offset, 100);
 }
 
+static void stimer_mark_expired(struct kvm_vcpu_hv_stimer *stimer,
+   bool vcpu_kick)
+{
+   struct kvm_vcpu *vcpu = stimer_to_vcpu(stimer);
+
+   set_bit(stimer->index,
+   vcpu_to_hv_vcpu(vcpu)->stimer_pending_bitmap);
+   kvm_make_request(KVM_REQ_HV_STIMER, vcpu);
+   if (vcpu_kick)
+   

[PATCH v1 0/7] KVM: Hyper-V SynIC timers

2015-11-25 Thread Andrey Smetanin
Per Hyper-V specification (and as required by Hyper-V-aware guests),
SynIC provides 4 per-vCPU timers.  Each timer is programmed via a pair
of MSRs, and signals expiration by delivering a special format message
to the configured SynIC message slot and triggering the corresponding
synthetic interrupt.
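
As a small illustration of how the configuration MSR of a timer might be
interpreted, the sketch below decodes a config value using the bit layout
from the hyperv.h hunk in patch 7/7. The struct and the decode function are
hypothetical helpers written for this example, and uint8_t replaces __u8 so
the snippet builds in user space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout as defined in the hyperv.h hunk of patch 7/7. */
#define HV_STIMER_ENABLE        (1ULL << 0)
#define HV_STIMER_PERIODIC      (1ULL << 1)
#define HV_STIMER_LAZY          (1ULL << 2)
#define HV_STIMER_AUTOENABLE    (1ULL << 3)
#define HV_STIMER_SINT(config)  ((uint8_t)(((config) >> 16) & 0x0F))

/* Hypothetical decoded view of one timer's config MSR. */
struct example_stimer_cfg {
    bool enabled;
    bool periodic;
    bool lazy;
    bool auto_enable;
    uint8_t sint;       /* synthetic interrupt source that gets signalled */
};

/* Split the raw 64-bit config value into its individual fields. */
static struct example_stimer_cfg example_decode_stimer_config(uint64_t config)
{
    struct example_stimer_cfg cfg;

    cfg.enabled = (config & HV_STIMER_ENABLE) != 0;
    cfg.periodic = (config & HV_STIMER_PERIODIC) != 0;
    cfg.lazy = (config & HV_STIMER_LAZY) != 0;
    cfg.auto_enable = (config & HV_STIMER_AUTOENABLE) != 0;
    cfg.sint = HV_STIMER_SINT(config);
    return cfg;
}
```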

Note: as implemented by this patch, all periodic timers are "lazy"
(i.e. if the vCPU wasn't scheduled for more than the timer period the
timer events are lost), regardless of the corresponding configuration
MSR.  If deemed necessary, the "catch up" mode (the timer period is
shortened until the timer catches up) will be implemented later.

Hyper-V SynIC timer support is required to load winhv.sys inside a
Windows guest; guest VMBus devices depend on it.

This series depends on the Hyper-V SynIC patches previously sent.

Signed-off-by: Andrey Smetanin 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: "K. Y. Srinivasan" 
CC: Haiyang Zhang 
CC: Vitaly Kuznetsov 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: qemu-de...@nongnu.org

Andrey Smetanin (7):
  drivers/hv: Move HV_SYNIC_STIMER_COUNT into Hyper-V UAPI x86 header
  drivers/hv: Move struct hv_message into UAPI Hyper-V x86 header
  kvm/x86: Rearrange func's declarations inside Hyper-V header
  kvm/x86: Added Hyper-V vcpu_to_hv_vcpu()/hv_vcpu_to_vcpu() helpers
  kvm/x86: Hyper-V internal helper to read MSR HV_X64_MSR_TIME_REF_COUNT
  kvm/x86: Hyper-V SynIC message slot pending clearing at SINT ack
  kvm/x86: Hyper-V SynIC timers

 arch/x86/include/asm/kvm_host.h|  13 ++
 arch/x86/include/uapi/asm/hyperv.h |  99 ++
 arch/x86/kvm/hyperv.c  | 367 -
 arch/x86/kvm/hyperv.h  |  54 --
 arch/x86/kvm/x86.c |   9 +
 drivers/hv/hyperv_vmbus.h  |  93 --
 include/linux/kvm_host.h   |   3 +
 7 files changed, 527 insertions(+), 111 deletions(-)

-- 
2.4.3



Re: [RFC PATCH V2 3/3] Ixgbevf: Add migration support for ixgbevf driver

2015-11-25 Thread Lan, Tianyu

On 11/25/2015 8:28 PM, Michael S. Tsirkin wrote:
> Frankly, I don't really see what this short term hack buys us,
> and if it goes in, we'll have to maintain it forever.

The framework for notifying the VF about migration status won't
change regardless of whether the VF is stopped before migration.
We hope to reach agreement on this first. Tracking dirty memory still
needs more discussion and we will continue working on it. Stopping the
VF may help to work around the issue and make tracking easier.

> Also, assuming you just want to do ifdown/ifup for some reason, it's
> easy enough to do using a guest agent, in a completely generic way.

Just ifdown/ifup is not enough for migration. Some PCI settings need
to be restored before doing ifup on the target machine.



[GIT PULL] KVM fixes for 4.4-rc3

2015-11-25 Thread Paolo Bonzini
Linus,

The following changes since commit 1ec218373b8ebda821aec00bb156a9c94fad9cd4:

  Linux 4.4-rc2 (2015-11-22 16:45:59 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

for you to fetch changes up to b2467e744f89fcb2e723143c2b78bcbaf391828a:

  KVM: nVMX: remove incorrect vpid check in nested invvpid emulation (2015-11-25 15:52:55 +0100)


Bug fixes for all architectures.  Nothing really stands out.


Ard Biesheuvel (1):
  ARM/arm64: KVM: test properly for a PTE's uncachedness

Christoffer Dall (3):
  KVM: arm/arm64: Fix preemptible timer active state crazyness
  KVM: arm/arm64: arch_timer: Preserve physical dist. active state on LR.active
  KVM: arm/arm64: vgic: Trust the LR state for HW IRQs

David Hildenbrand (4):
  KVM: s390: enable SIMD only when no VCPUs were created
  KVM: Provide function for VCPU lookup by id
  KVM: s390: avoid memory overwrites on emergency signal injection
  KVM: s390: fix wrong lookup of VCPUs by array index

Haozhong Zhang (1):
  KVM: nVMX: remove incorrect vpid check in nested invvpid emulation

Heiko Carstens (1):
  KVM: s390: fix pfmf intercept handler

James Hogan (3):
  MIPS: KVM: Fix ASID restoration logic
  MIPS: KVM: Fix CACHE immediate offset sign extension
  MIPS: KVM: Uninit VCPU in vcpu_create error path

Marc Zyngier (2):
  arm64: KVM: Fix AArch32 to AArch64 register mapping
  arm64: KVM: Add workaround for Cortex-A57 erratum 834220

Mark Rutland (2):
  arm64: kvm: avoid %p in __kvm_hyp_panic
  arm64: kvm: report original PAR_EL1 upon panic

Matt Gingell (4):
  KVM: x86: fix interrupt window handling in split IRQ chip case
  KVM: x86: split kvm_vcpu_ready_for_interrupt_injection out of dm_request_for_irq_injection
  KVM: x86: set KVM_REQ_EVENT on local interrupt request from user space
  KVM: x86: request interrupt window when IRQ chip is split

Paolo Bonzini (2):
  Merge tag 'kvm-s390-master-4.4-1' of git://git.kernel.org/.../kvms390/linux into kvm-master
  Merge tag 'kvm-arm-for-v4.4-rc3' of git://git.kernel.org/.../kvmarm/kvmarm into kvm-master

 arch/arm/kvm/arm.c   |  7 +
 arch/arm/kvm/mmu.c   | 15 +
 arch/arm64/Kconfig   | 21 +
 arch/arm64/include/asm/cpufeature.h  |  3 +-
 arch/arm64/include/asm/kvm_emulate.h |  8 +++--
 arch/arm64/kernel/cpu_errata.c   |  9 ++
 arch/arm64/kvm/hyp.S | 14 +++--
 arch/arm64/kvm/inject_fault.c|  2 +-
 arch/mips/kvm/emulate.c  |  2 +-
 arch/mips/kvm/locore.S   | 16 ++
 arch/mips/kvm/mips.c |  5 ++-
 arch/s390/kvm/interrupt.c|  7 +++--
 arch/s390/kvm/kvm-s390.c |  6 +++-
 arch/s390/kvm/priv.c |  2 +-
 arch/s390/kvm/sigp.c |  8 ++---
 arch/x86/kvm/vmx.c   |  5 ---
 arch/x86/kvm/x86.c   | 61 +++-
 include/kvm/arm_vgic.h   |  2 +-
 include/linux/kvm_host.h | 11 +++
 virt/kvm/arm/arch_timer.c| 28 ++---
 virt/kvm/arm/vgic.c  | 50 ++---
 21 files changed, 171 insertions(+), 111 deletions(-)


Re: [PATCH 1/3] target-i386: kvm: Abort if MCE bank count is not supported by host

2015-11-25 Thread Paolo Bonzini


On 25/11/2015 18:26, Eduardo Habkost wrote:
>> > Yoda conditions?
>> > 
>> > if (banks < MCE_BANKS_DEF) {
>> > error_report("kvm: Unsupported MCE bank count (QEMU = %d, KVM 
>> > = %d)",
>> >  MCE_BANKS_DEF, banks);
> This was on purpose, because MCE_BANKS_DEF is replaced by
> (env->mcg_caps & MCG_CAPS_COUNT_MASK) in the next patch.

Yeah, I noticed it later.

Paolo


Re: [PATCH] KVM: x86: Add lowest-priority support for vt-d posted-interrupts

2015-11-25 Thread Radim Krčmář
2015-11-25 15:38+0100, Paolo Bonzini:
> On 25/11/2015 15:12, Radim Krcmár wrote:
>> I think it's ok to pick any algorithm we like.  It's unlikely that
>> software would recognize and take advantage of the hardware algorithm
>> without adding a special treatment for KVM.
>> (I'd vote for the simple pick-first-APIC lowest priority algorithm ...
>>  I don't see much point in complicating lowest priority when it doesn't
>>  deliver to lowest priority CPU anyway.)
> 
> Vector hashing is an improvement for the common case where all vectors
> are set to all CPUs.  Sure you can get an unlucky assignment, but it's
> still better than pick-first-APIC.

Yeah, hashing has a valid use case, but a subtle weighting of drawbacks
led me to prefer pick-first-APIC ...

(I'd prefer to have simple code in KVM and depend on static IRQ balancing
 in a guest to handle the distribution.
 The guest could get the unlucky assignment anyway, so it should be
 prepared;  and hashing just made KVM worse in that case.  Guests might
 also configure physical x(2)APIC, where there is no lowest priority.
 And if the guest doesn't do anything with IRQs, then it might not even
 care about the impact that our choice has.)
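
To make the two candidate policies concrete, here is a minimal sketch that
compares them on a 32-bit destination mask. It is only an illustration of
the idea under discussion: the function names are hypothetical, and the
"vector modulo number of destinations" rule is an assumed form of vector
hashing, not something specified in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Pick-first: lowest-numbered CPU in the destination mask, or -1 if empty. */
static int example_pick_first(uint32_t dest_mask)
{
    for (int cpu = 0; cpu < 32; cpu++)
        if (dest_mask & (UINT32_C(1) << cpu))
            return cpu;
    return -1;
}

/*
 * Vector hashing (assumed form): with N destinations in the mask, choose
 * the (vector mod N)-th set bit, counting from the lowest. Returns -1 if
 * the mask is empty.
 */
static int example_vector_hash(uint32_t dest_mask, uint8_t vector)
{
    int count = 0;
    int target;

    for (int cpu = 0; cpu < 32; cpu++)
        count += (dest_mask >> cpu) & 1;
    if (count == 0)
        return -1;

    target = vector % count;
    for (int cpu = 0; cpu < 32; cpu++) {
        if (dest_mask & (UINT32_C(1) << cpu)) {
            if (target == 0)
                return cpu;
            target--;
        }
    }
    return -1;
}
```

With all vectors targeting the same set of CPUs, pick-first sends everything
to one CPU, while hashing spreads different vectors across the set; a guest
that happens to use vectors with the same remainder still lands on one CPU.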


Re: [RFC PATCH V2 09/10] Qemu/VFIO: Add SRIOV VF migration support

2015-11-25 Thread Michael S. Tsirkin
On Wed, Nov 25, 2015 at 11:32:23PM +0800, Lan, Tianyu wrote:
> 
> On 11/25/2015 5:03 AM, Michael S. Tsirkin wrote:
> >>>+void vfio_migration_cap_handle(PCIDevice *pdev, uint32_t addr,
> >>>+  uint32_t val, int len)
> >>>+{
> >>>+VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> >>>+
> >>>+if (addr == vdev->migration_cap + PCI_VF_MIGRATION_VF_STATUS
> >>>+&& val == PCI_VF_READY_FOR_MIGRATION) {
> >>>+qemu_event_set(_event);
> >This would wake migration so it can proceed -
> >except it needs QEMU lock to run, and that's
> >taken by the migration thread.
> 
> Sorry, I seem to miss something.
> Which lock may cause dead lock when calling vfio_migration_cap_handle()
> and run migration?

qemu_global_mutex.

> The function is called when VF accesses faked PCI capability.
> 
> >
> >It seems unlikely that this ever worked - how
> >did you test this?
> >


Re: [RFC PATCH V2 3/3] Ixgbevf: Add migration support for ixgbevf driver

2015-11-25 Thread Michael S. Tsirkin
On Thu, Nov 26, 2015 at 12:02:33AM +0800, Lan, Tianyu wrote:
> On 11/25/2015 8:28 PM, Michael S. Tsirkin wrote:
> >Frankly, I don't really see what this short term hack buys us,
> >and if it goes in, we'll have to maintain it forever.
> >
> 
> The framework of how to notify VF about migration status won't be
> changed regardless of stopping VF or not before doing migration.
> We hope to reach agreement on this first.

Well it's bi-directional, the framework won't work if it's
uni-directional.
Further, if you use this interface to stop the interface
at the moment, you won't be able to do anything else
with it, and will need a new one down the road.


> Tracking dirty memory still
> need to more discussions and we will continue working on it. Stop VF may
> help to work around the issue and make tracking easier.
> 
> 
> >Also, assuming you just want to do ifdown/ifup for some reason, it's
> >easy enough to do using a guest agent, in a completely generic way.
> >
> 
> Just ifdown/ifup is not enough for migration. It needs to restore some PCI
> settings before doing ifup on the target machine

I'd focus on just restoring then.

-- 
MST


Re: [RFC PATCH V2 3/3] Ixgbevf: Add migration support for ixgbevf driver

2015-11-25 Thread Michael S. Tsirkin
On Wed, Nov 25, 2015 at 08:24:38AM -0800, Alexander Duyck wrote:
> >> Also, assuming you just want to do ifdown/ifup for some reason, it's
> >> easy enough to do using a guest agent, in a completely generic way.
> >>
> >
> > Just ifdown/ifup is not enough for migration. It needs to restore some PCI
> > settings before doing ifup on the target machine
> 
> That is why I have been suggesting making use of suspend/resume logic
> that is already in place for PCI power management.  In the case of a
> suspend/resume we already have to deal with the fact that the device
> will go through a D0->D3->D0 reset so we have to restore all of the
> existing state.  It would take a significant load off of Qemu since
> the guest would be restoring its own state instead of making Qemu have
> to do all of the device migration work.

That can work, though again, the issue is you need guest
cooperation to migrate.

If you reset device on destination instead of restoring state,
then that issue goes away, but maybe the downtime
will be increased.

Will it really? I think it's worth it to start with the
simplest solution (reset on destination) and see
what the effect is, then add optimizations.


One thing that I've been thinking about for a while, is saving (some)
state speculatively.  For example, notify guest a bit before migration
is done, so it can save device state. If guest responds quickly, you
have state that can be restored.  If it doesn't, still migrate, and it
will have to reset on destination.


-- 
MST


Re: [PATCH 3/3] target-i386: kvm: Print warning when clearing mcg_cap bits

2015-11-25 Thread Paolo Bonzini


On 25/11/2015 18:29, Eduardo Habkost wrote:
>>> > >  
>>> > > +unsupported_caps = env->mcg_cap & ~(mcg_cap | 
>>> > > MCG_CAP_BANKS_MASK);
>>> > > +if (unsupported_caps) {
>>> > > +error_report("warning: Unsupported MCG_CAP bits: 0x%" 
>>> > > PRIx64 "\n",
>> > 
>> > \n should not be at end of error_report.
>> > 
>> > Fixed and applied.
> MCG_CAP_BANKS_MASK is defined by patch 2/3. Have you applied the
> whole series?

Yes, of course.

Paolo


[PATCH v1 2/2] target-i386/kvm: Hyper-V SynIC timers MSR's support

2015-11-25 Thread Andrey Smetanin
Hyper-V SynIC timers are host timers that the guest configures
through the corresponding MSRs (HV_X64_MSR_STIMER*).
The guest sets them up and uses the events fired by the host (a SynIC
interrupt and the appropriate timer expiration message) as guest
clock events.

The state of the Hyper-V SynIC timers is stored in the corresponding
MSRs. This patch series implements support for these MSRs and their
migration.

Signed-off-by: Andrey Smetanin 
CC: Paolo Bonzini 
CC: Richard Henderson 
CC: Eduardo Habkost 
CC: "Andreas Färber" 
CC: Marcelo Tosatti 
CC: Denis V. Lunev 
CC: Roman Kagan 
CC: kvm@vger.kernel.org

---
 target-i386/cpu-qom.h |  1 +
 target-i386/cpu.c |  1 +
 target-i386/cpu.h |  2 ++
 target-i386/kvm.c | 50 +-
 target-i386/machine.c | 29 +
 5 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/target-i386/cpu-qom.h b/target-i386/cpu-qom.h
index 7ea5b34..5f9d960 100644
--- a/target-i386/cpu-qom.h
+++ b/target-i386/cpu-qom.h
@@ -95,6 +95,7 @@ typedef struct X86CPU {
 bool hyperv_vpindex;
 bool hyperv_runtime;
 bool hyperv_synic;
+bool hyperv_stimer;
 bool check_cpuid;
 bool enforce_cpuid;
 bool expose_kvm;
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 1462e19..31407f1 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -3143,6 +3143,7 @@ static Property x86_cpu_properties[] = {
 DEFINE_PROP_BOOL("hv-vpindex", X86CPU, hyperv_vpindex, false),
 DEFINE_PROP_BOOL("hv-runtime", X86CPU, hyperv_runtime, false),
 DEFINE_PROP_BOOL("hv-synic", X86CPU, hyperv_synic, false),
+DEFINE_PROP_BOOL("hv-stimer", X86CPU, hyperv_stimer, false),
 DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true),
 DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
 DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true),
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 8cf33df..2376a55 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -923,6 +923,8 @@ typedef struct CPUX86State {
 uint64_t msr_hv_synic_evt_page;
 uint64_t msr_hv_synic_msg_page;
 uint64_t msr_hv_synic_sint[HV_SYNIC_SINT_COUNT];
+uint64_t msr_hv_stimer_config[HV_SYNIC_STIMER_COUNT];
+uint64_t msr_hv_stimer_count[HV_SYNIC_STIMER_COUNT];
 
 /* exception/interrupt handling */
 int error_code;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 7513ef6..cb419ea 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -90,6 +90,7 @@ static bool has_msr_hv_reset;
 static bool has_msr_hv_vpindex;
 static bool has_msr_hv_runtime;
 static bool has_msr_hv_synic;
+static bool has_msr_hv_stimer;
 static bool has_msr_mtrr;
 static bool has_msr_xss;
 
@@ -526,7 +527,8 @@ static bool hyperv_enabled(X86CPU *cpu)
 cpu->hyperv_reset ||
 cpu->hyperv_vpindex ||
 cpu->hyperv_runtime ||
-cpu->hyperv_synic);
+cpu->hyperv_synic ||
+cpu->hyperv_stimer);
 }
 
 static Error *invtsc_mig_blocker;
@@ -630,6 +632,13 @@ int kvm_arch_init_vcpu(CPUState *cs)
 env->msr_hv_synic_sint[sint] = HV_SYNIC_SINT_MASKED;
 }
 }
+if (cpu->hyperv_stimer) {
+if (!has_msr_hv_stimer) {
+fprintf(stderr, "Hyper-V timers aren't supported by kernel\n");
+return -ENOSYS;
+}
+c->eax |= HV_X64_MSR_SYNTIMER_AVAILABLE;
+}
 c = &cpuid_data.entries[cpuid_i++];
 c->function = HYPERV_CPUID_ENLIGHTMENT_INFO;
 if (cpu->hyperv_relaxed_timing) {
@@ -974,6 +983,10 @@ static int kvm_get_supported_msrs(KVMState *s)
 has_msr_hv_synic = true;
 continue;
 }
+if (kvm_msr_list->indices[i] == HV_X64_MSR_STIMER0_CONFIG) {
+has_msr_hv_stimer = true;
+continue;
+}
 }
 }
 
@@ -1552,6 +1565,19 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
   env->msr_hv_synic_sint[j]);
 }
 }
+if (has_msr_hv_stimer) {
+int j;
+
+for (j = 0; j < ARRAY_SIZE(env->msr_hv_stimer_config); j++) {
+kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_STIMER0_CONFIG + j*2,
+env->msr_hv_stimer_config[j]);
+}
+
+for (j = 0; j < ARRAY_SIZE(env->msr_hv_stimer_count); j++) {
+kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_STIMER0_COUNT + j*2,
+env->msr_hv_stimer_count[j]);
+}
+}
 if (has_msr_mtrr) {
 kvm_msr_entry_set(&msrs[n++], MSR_MTRRdefType, env->mtrr_deftype);
 kvm_msr_entry_set(&msrs[n++],
@@ -1931,6 +1957,14 @@ static int kvm_get_msrs(X86CPU *cpu)
 

[PATCH v1 6/7] kvm/x86: Hyper-V SynIC message slot pending clearing at SINT ack

2015-11-25 Thread Andrey Smetanin
The SynIC message protocol mandates that the message slot is claimed
by atomically setting message type to something other than HVMSG_NONE.
If another message is to be delivered while the slot is still busy,
message pending flag is asserted to indicate to the guest that the
hypervisor wants to be notified when the slot is released.

To make sure the protocol works regardless of where the message
sources are (kernel or userspace), clear the pending flag on SINT ACK
notification, and let the message sources compete for the slot again.

Signed-off-by: Andrey Smetanin 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: "K. Y. Srinivasan" 
CC: Haiyang Zhang 
CC: Vitaly Kuznetsov 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: qemu-de...@nongnu.org
---
 arch/x86/kvm/hyperv.c| 31 +++
 include/linux/kvm_host.h |  2 ++
 2 files changed, 33 insertions(+)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 9958926..6412b6b 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -27,6 +27,7 @@
 #include "hyperv.h"
 
 #include 
+#include 
 #include 
 #include 
 
@@ -116,13 +117,43 @@ static struct kvm_vcpu_hv_synic *synic_get(struct kvm *kvm, u32 vcpu_id)
return (synic->active) ? synic : NULL;
 }
 
+static void synic_clear_sint_msg_pending(struct kvm_vcpu_hv_synic *synic,
+   u32 sint)
+{
+   struct kvm_vcpu *vcpu = synic_to_vcpu(synic);
+   struct page *page;
+   gpa_t gpa;
+   struct hv_message *msg;
+   struct hv_message_page *msg_page;
+
+   gpa = synic->msg_page & PAGE_MASK;
+   page = kvm_vcpu_gfn_to_page(vcpu, gpa >> PAGE_SHIFT);
+   if (is_error_page(page)) {
+   vcpu_err(vcpu, "Hyper-V SynIC can't get msg page, gpa 0x%llx\n",
+gpa);
+   return;
+   }
+   msg_page = kmap_atomic(page);
+
+   msg = &msg_page->sint_message[sint];
+   msg->header.message_flags.msg_pending = 0;
+
+   kunmap_atomic(msg_page);
+   kvm_release_page_dirty(page);
+   kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT);
+}
+
 static void kvm_hv_notify_acked_sint(struct kvm_vcpu *vcpu, u32 sint)
 {
struct kvm *kvm = vcpu->kvm;
+   struct kvm_vcpu_hv_synic *synic = vcpu_to_synic(vcpu);
int gsi, idx;
 
vcpu_debug(vcpu, "Hyper-V SynIC acked sint %d\n", sint);
 
+   if (synic->msg_page & HV_SYNIC_SIMP_ENABLE)
+   synic_clear_sint_msg_pending(synic, sint);
+
    idx = srcu_read_lock(&kvm->irq_srcu);
    gsi = atomic_read(&vcpu_to_synic(vcpu)->sint_to_gsi[sint]);
if (gsi != -1)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2911919..9b64c8c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -450,6 +450,8 @@ struct kvm {
 
 #define vcpu_debug(vcpu, fmt, ...) \
kvm_debug("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
+#define vcpu_err(vcpu, fmt, ...)   \
+   kvm_err("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
 
 static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
 {
-- 
2.4.3
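
The slot protocol described in the commit message above can be sketched in user-space C. This is only an illustration under stated assumptions: `struct sketch_slot` and the `sketch_slot_*` helpers are hypothetical names, C11 atomics stand in for the kernel primitives, and the slot is not the real `struct hv_message` layout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define HVMSG_NONE          0x00000000u
#define HVMSG_TIMER_EXPIRED 0x80000010u

/* Simplified stand-in for one SynIC message slot (hypothetical layout). */
struct sketch_slot {
    _Atomic uint32_t message_type; /* HVMSG_NONE means the slot is free */
    atomic_bool msg_pending;       /* a source is waiting for the slot */
};

static void sketch_slot_init(struct sketch_slot *slot)
{
    atomic_init(&slot->message_type, HVMSG_NONE);
    atomic_init(&slot->msg_pending, false);
}

/* Message source: claim the slot atomically, or flag that a message waits. */
static bool sketch_slot_post(struct sketch_slot *slot, uint32_t type)
{
    uint32_t expected = HVMSG_NONE;

    assert(type != HVMSG_NONE);
    if (atomic_compare_exchange_strong(&slot->message_type, &expected, type))
        return true;
    atomic_store(&slot->msg_pending, true);
    return false;
}

/* Guest: free the slot; returns true if it must notify the hypervisor. */
static bool sketch_slot_release(struct sketch_slot *slot)
{
    atomic_store(&slot->message_type, HVMSG_NONE);
    return atomic_load(&slot->msg_pending);
}

/* SINT ack: clear the pending flag so all sources compete for the slot again. */
static void sketch_slot_ack(struct sketch_slot *slot)
{
    atomic_store(&slot->msg_pending, false);
}
```

A second `sketch_slot_post()` on a busy slot fails and sets the pending flag; after the guest releases the slot and the ack clears the flag, any source (kernel or userspace in the real design) may retry.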



[PATCH v1 4/7] kvm/x86: Added Hyper-V vcpu_to_hv_vcpu()/hv_vcpu_to_vcpu() helpers

2015-11-25 Thread Andrey Smetanin
Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: "K. Y. Srinivasan" 
CC: Haiyang Zhang 
CC: Vitaly Kuznetsov 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: qemu-de...@nongnu.org
---
 arch/x86/kvm/hyperv.h | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index 9483d49..d5d8217 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -24,21 +24,29 @@
 #ifndef __ARCH_X86_KVM_HYPERV_H__
 #define __ARCH_X86_KVM_HYPERV_H__
 
-static inline struct kvm_vcpu_hv_synic *vcpu_to_synic(struct kvm_vcpu *vcpu)
+static inline struct kvm_vcpu_hv *vcpu_to_hv_vcpu(struct kvm_vcpu *vcpu)
 {
-   return &vcpu->arch.hyperv.synic;
+   return &vcpu->arch.hyperv;
 }
 
-static inline struct kvm_vcpu *synic_to_vcpu(struct kvm_vcpu_hv_synic *synic)
+static inline struct kvm_vcpu *hv_vcpu_to_vcpu(struct kvm_vcpu_hv *hv_vcpu)
 {
-   struct kvm_vcpu_hv *hv;
struct kvm_vcpu_arch *arch;
 
-   hv = container_of(synic, struct kvm_vcpu_hv, synic);
-   arch = container_of(hv, struct kvm_vcpu_arch, hyperv);
+   arch = container_of(hv_vcpu, struct kvm_vcpu_arch, hyperv);
return container_of(arch, struct kvm_vcpu, arch);
 }
 
+static inline struct kvm_vcpu_hv_synic *vcpu_to_synic(struct kvm_vcpu *vcpu)
+{
+   return &vcpu->arch.hyperv.synic;
+}
+
+static inline struct kvm_vcpu *synic_to_vcpu(struct kvm_vcpu_hv_synic *synic)
+{
+   return hv_vcpu_to_vcpu(container_of(synic, struct kvm_vcpu_hv, synic));
+}
+
 int kvm_hv_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool host);
 int kvm_hv_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
 
-- 
2.4.3
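
The layering of these helpers can be tried outside the kernel. The following is a minimal sketch, assuming simplified stand-in structs (`sketch_vcpu`, `sketch_arch`, `sketch_hv`, `sketch_synic`) and a local `sketch_container_of` macro built on `offsetof`; none of these names come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* User-space equivalent of container_of (no type checking). */
#define sketch_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Nesting mirrors kvm_vcpu -> arch -> hyperv -> synic, heavily simplified. */
struct sketch_synic { int active; };
struct sketch_hv    { long pad; struct sketch_synic synic; };
struct sketch_arch  { int pad; struct sketch_hv hyperv; };
struct sketch_vcpu  { int vcpu_id; struct sketch_arch arch; };

static struct sketch_hv *sketch_vcpu_to_hv(struct sketch_vcpu *vcpu)
{
    return &vcpu->arch.hyperv;
}

/* Walk outwards: hyperv -> arch -> vcpu. */
static struct sketch_vcpu *sketch_hv_to_vcpu(struct sketch_hv *hv)
{
    struct sketch_arch *arch;

    assert(hv != NULL);
    arch = sketch_container_of(hv, struct sketch_arch, hyperv);
    return sketch_container_of(arch, struct sketch_vcpu, arch);
}

static struct sketch_synic *sketch_vcpu_to_synic(struct sketch_vcpu *vcpu)
{
    return &vcpu->arch.hyperv.synic;
}

/* The synic helper only adds one more hop on top of the generic one. */
static struct sketch_vcpu *sketch_synic_to_vcpu(struct sketch_synic *synic)
{
    return sketch_hv_to_vcpu(sketch_container_of(synic, struct sketch_hv, synic));
}
```

Splitting out the generic hv-level helpers means any later per-vcpu Hyper-V sub-structure needs only a single `container_of` hop of its own.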



[PATCH v1 0/2] QEMU: Hyper-V SynIC timers MSR's support

2015-11-25 Thread Andrey Smetanin
Hyper-V SynIC timers are host timers that are configurable
by guest through corresponding MSR's (HV_X64_MSR_STIMER*).
The guest sets up the timers and uses the host events they fire
(a SynIC interrupt and the corresponding timer expiration message)
as guest clock events.

The state of the Hyper-V SynIC timers is stored in the corresponding
MSRs. This patch series implements support for and migration of these MSRs.

Signed-off-by: Andrey Smetanin 
CC: Paolo Bonzini 
CC: Richard Henderson 
CC: Eduardo Habkost 
CC: "Andreas Färber" 
CC: Marcelo Tosatti 
CC: Denis V. Lunev 
CC: Roman Kagan 
CC: kvm@vger.kernel.org

Andrey Smetanin (2):
  include: update Hyper-V header to include SynIC timers defines
  target-i386/kvm: Hyper-V SynIC timers MSR's support

 include/standard-headers/asm-x86/hyperv.h | 99 +++
 target-i386/cpu-qom.h |  1 +
 target-i386/cpu.c |  1 +
 target-i386/cpu.h |  2 +
 target-i386/kvm.c | 50 +++-
 target-i386/machine.c | 29 +
 6 files changed, 181 insertions(+), 1 deletion(-)

-- 
2.4.3



[PATCH v1 1/7] drivers/hv: Move HV_SYNIC_STIMER_COUNT into Hyper-V UAPI x86 header

2015-11-25 Thread Andrey Smetanin
This constant is required for userspace (QEMU) support of the
Hyper-V SynIC timer MSRs.

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: "K. Y. Srinivasan" 
CC: Haiyang Zhang 
CC: Vitaly Kuznetsov 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: qemu-de...@nongnu.org
---
 arch/x86/include/uapi/asm/hyperv.h | 2 ++
 drivers/hv/hyperv_vmbus.h  | 2 --
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
index 040d408..07981f0 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++ b/arch/x86/include/uapi/asm/hyperv.h
@@ -269,4 +269,6 @@ typedef struct _HV_REFERENCE_TSC_PAGE {
 #define HV_SYNIC_SINT_AUTO_EOI (1ULL << 17)
 #define HV_SYNIC_SINT_VECTOR_MASK  (0xFF)
 
+#define HV_SYNIC_STIMER_COUNT  (4)
+
 #endif
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 3782636..46e23d1 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -102,8 +102,6 @@ enum hv_message_type {
HVMSG_X64_LEGACY_FP_ERROR   = 0x80010005
 };
 
-#define HV_SYNIC_STIMER_COUNT  (4)
-
 /* Define invalid partition identifier. */
 #define HV_PARTITION_ID_INVALID    ((u64)0x0)
 
-- 
2.4.3



[PATCH v1 1/2] include: update Hyper-V header to include SynIC timers defines

2015-11-25 Thread Andrey Smetanin
This patch brings in the necessary changes from the corresponding kernel
patchset.  It's included only for completeness; ideally these changes
should arrive via the standard kernel header pull.

Signed-off-by: Andrey Smetanin 
CC: Paolo Bonzini 
CC: Richard Henderson 
CC: Eduardo Habkost 
CC: "Andreas Färber" 
CC: Marcelo Tosatti 
CC: Denis V. Lunev 
CC: Roman Kagan 
CC: kvm@vger.kernel.org

---
 include/standard-headers/asm-x86/hyperv.h | 99 +++
 1 file changed, 99 insertions(+)

diff --git a/include/standard-headers/asm-x86/hyperv.h b/include/standard-headers/asm-x86/hyperv.h
index f9780f1..3684610 100644
--- a/include/standard-headers/asm-x86/hyperv.h
+++ b/include/standard-headers/asm-x86/hyperv.h
@@ -269,4 +269,103 @@ typedef struct _HV_REFERENCE_TSC_PAGE {
 #define HV_SYNIC_SINT_AUTO_EOI (1ULL << 17)
 #define HV_SYNIC_SINT_VECTOR_MASK  (0xFF)
 
+#define HV_SYNIC_STIMER_COUNT  (4)
+
+/* Define synthetic interrupt controller message constants. */
+#define HV_MESSAGE_SIZE                (256)
+#define HV_MESSAGE_PAYLOAD_BYTE_COUNT  (240)
+#define HV_MESSAGE_PAYLOAD_QWORD_COUNT (30)
+
+/* Define hypervisor message types. */
+enum hv_message_type {
+   HVMSG_NONE                      = 0x00000000,
+
+   /* Memory access messages. */
+   HVMSG_UNMAPPED_GPA              = 0x80000000,
+   HVMSG_GPA_INTERCEPT             = 0x80000001,
+
+   /* Timer notification messages. */
+   HVMSG_TIMER_EXPIRED             = 0x80000010,
+
+   /* Error messages. */
+   HVMSG_INVALID_VP_REGISTER_VALUE = 0x80000020,
+   HVMSG_UNRECOVERABLE_EXCEPTION   = 0x80000021,
+   HVMSG_UNSUPPORTED_FEATURE       = 0x80000022,
+
+   /* Trace buffer complete messages. */
+   HVMSG_EVENTLOG_BUFFERCOMPLETE   = 0x80000040,
+
+   /* Platform-specific processor intercept messages. */
+   HVMSG_X64_IOPORT_INTERCEPT      = 0x80010000,
+   HVMSG_X64_MSR_INTERCEPT         = 0x80010001,
+   HVMSG_X64_CPUID_INTERCEPT       = 0x80010002,
+   HVMSG_X64_EXCEPTION_INTERCEPT   = 0x80010003,
+   HVMSG_X64_APIC_EOI              = 0x80010004,
+   HVMSG_X64_LEGACY_FP_ERROR       = 0x80010005
+};
+
+/* Define synthetic interrupt controller message flags. */
+union hv_message_flags {
+   uint8_t asu8;
+   struct {
+   uint8_t msg_pending:1;
+   uint8_t reserved:7;
+   };
+};
+
+/* Define port identifier type. */
+union hv_port_id {
+   uint32_t asu32;
+   struct {
+   uint32_t id:24;
+   uint32_t reserved:8;
+   } u;
+};
+
+/* Define port type. */
+enum hv_port_type {
+   HVPORT_MSG  = 1,
+   HVPORT_EVENT= 2,
+   HVPORT_MONITOR  = 3
+};
+
+/* Define synthetic interrupt controller message header. */
+struct hv_message_header {
+   enum hv_message_type message_type;
+   uint8_t payload_size;
+   union hv_message_flags message_flags;
+   uint8_t reserved[2];
+   union {
+   uint64_t sender;
+   union hv_port_id port;
+   };
+};
+
+/* Define timer message payload structure. */
+struct hv_timer_message_payload {
+   uint32_t timer_index;
+   uint32_t reserved;
+   uint64_t expiration_time;   /* When the timer expired */
+   uint64_t delivery_time; /* When the message was delivered */
+};
+
+/* Define synthetic interrupt controller message format. */
+struct hv_message {
+   struct hv_message_header header;
+   union {
+   uint64_t payload[HV_MESSAGE_PAYLOAD_QWORD_COUNT];
+   } u;
+};
+
+/* Define the synthetic interrupt message page layout. */
+struct hv_message_page {
+   struct hv_message sint_message[HV_SYNIC_SINT_COUNT];
+};
+
+#define HV_STIMER_ENABLE   (1ULL << 0)
+#define HV_STIMER_PERIODIC (1ULL << 1)
+#define HV_STIMER_LAZY (1ULL << 2)
+#define HV_STIMER_AUTOENABLE   (1ULL << 3)
+#define HV_STIMER_SINT(config) (uint8_t)(((config) >> 16) & 0x0F)
+
 #endif
-- 
2.4.3
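
As an illustration of the HV_STIMER_* defines added above, here is a small sketch that packs and unpacks a timer config value. The five macros are copied from the patch; `struct sketch_stimer_config` and the two `sketch_stimer_config_*` helpers are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Copied from the header patch. */
#define HV_STIMER_ENABLE       (1ULL << 0)
#define HV_STIMER_PERIODIC     (1ULL << 1)
#define HV_STIMER_LAZY         (1ULL << 2)
#define HV_STIMER_AUTOENABLE   (1ULL << 3)
#define HV_STIMER_SINT(config) (uint8_t)(((config) >> 16) & 0x0F)

/* Hypothetical decoded view of a timer config MSR value. */
struct sketch_stimer_config {
    bool enable;
    bool periodic;
    bool lazy;
    bool auto_enable;
    uint8_t sint;
};

/* Hypothetical helper: combine flag bits with a SINT index (bits 16..19). */
static uint64_t sketch_stimer_config_make(uint64_t flags, uint8_t sint)
{
    assert(sint <= 0x0F);
    return flags | ((uint64_t)sint << 16);
}

/* Hypothetical helper: split a config value into its fields. */
static struct sketch_stimer_config sketch_stimer_config_decode(uint64_t config)
{
    struct sketch_stimer_config out = {
        .enable      = (config & HV_STIMER_ENABLE) != 0,
        .periodic    = (config & HV_STIMER_PERIODIC) != 0,
        .lazy        = (config & HV_STIMER_LAZY) != 0,
        .auto_enable = (config & HV_STIMER_AUTOENABLE) != 0,
        .sint        = HV_STIMER_SINT(config),
    };
    return out;
}
```

For instance, an enabled periodic timer routed to SINT 5 encodes as 0x50003 and decodes back to the same fields.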



Re: [RFC PATCH V2 3/3] Ixgbevf: Add migration support for ixgbevf driver

2015-11-25 Thread Alexander Duyck
On Wed, Nov 25, 2015 at 8:02 AM, Lan, Tianyu  wrote:
> On 11/25/2015 8:28 PM, Michael S. Tsirkin wrote:
>>
>> Frankly, I don't really see what this short term hack buys us,
>> and if it goes in, we'll have to maintain it forever.
>>
>
> The framework for notifying the VF about migration status won't
> change regardless of whether the VF is stopped before migration.
> We hope to reach agreement on this first. Tracking dirty memory still
> needs more discussion and we will continue working on it. Stopping the VF
> may help to work around the issue and make tracking easier.

The problem is you still have to stop the device at some point for the
same reason why you have to halt the VM.  You seem to think you can
get by without doing that but you can't.  All you do is open the
system up to multiple races if you leave the device running.  The goal
should be to avoid stopping the device until the last possible moment,
however it will still have to be stopped eventually.  It isn't as if
you can migrate memory and leave the device doing DMA and expect to
get a clean state.

I agree with Michael.  The focus needs to be on first addressing dirty
page tracking.  Once you have that you could use a variation on the
bonding solution where you postpone the hot-plug event until near the
end of the migration just before you halt the guest instead of having
to do it before you start the migration.  After that we could look at
optimizing things further by introducing a variation of hot-plug that
would pause the device, as I suggested, instead of removing it.  At
that point you should have almost all of the key issues addressed, so
you could drop the bond interface entirely.

>> Also, assuming you just want to do ifdown/ifup for some reason, it's
>> easy enough to do using a guest agent, in a completely generic way.
>>
>
> Just ifdown/ifup is not enough for migration. It needs to restore some PCI
> settings before doing ifup on the target machine

That is why I have been suggesting making use of suspend/resume logic
that is already in place for PCI power management.  In the case of a
suspend/resume we already have to deal with the fact that the device
will go through a D0->D3->D0 reset so we have to restore all of the
existing state.  It would take a significant load off of Qemu since
the guest would be restoring its own state instead of making Qemu have
to do all of the device migration work.


Re: [PATCH 3/3] target-i386: kvm: Print warning when clearing mcg_cap bits

2015-11-25 Thread Paolo Bonzini


On 25/11/2015 16:49, Eduardo Habkost wrote:
> Instead of silently clearing mcg_cap bits when the host doesn't
> support them, print a warning when doing that.
> 
> Signed-off-by: Eduardo Habkost 
> ---
>  target-i386/kvm.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index d63a85b..446bdfc 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -774,7 +774,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  && (env->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
> (CPUID_MCE | CPUID_MCA)
>  && kvm_check_extension(cs->kvm_state, KVM_CAP_MCE) > 0) {
> -uint64_t mcg_cap;
> +uint64_t mcg_cap, unsupported_caps;
>  int banks;
>  int ret;
>  
> @@ -790,6 +790,12 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  return -ENOTSUP;
>  }
>  
> +unsupported_caps = env->mcg_cap & ~(mcg_cap | MCG_CAP_BANKS_MASK);
> +if (unsupported_caps) {
> +error_report("warning: Unsupported MCG_CAP bits: 0x%" PRIx64 "\n",

\n should not be at end of error_report.

Fixed and applied.

Paolo

> + unsupported_caps);
> +}
> +
>  env->mcg_cap &= mcg_cap | MCG_CAP_BANKS_MASK;
>  ret = kvm_vcpu_ioctl(cs, KVM_X86_SETUP_MCE, &env->mcg_cap);
>  if (ret < 0) {
> 


Re: [PATCH v1 6/7] kvm/x86: Hyper-V SynIC message slot pending clearing at SINT ack

2015-11-25 Thread Paolo Bonzini


On 25/11/2015 16:20, Andrey Smetanin wrote:
> +static void synic_clear_sint_msg_pending(struct kvm_vcpu_hv_synic *synic,
> + u32 sint)
> +{
> + struct kvm_vcpu *vcpu = synic_to_vcpu(synic);
> + struct page *page;
> + gpa_t gpa;
> + struct hv_message *msg;
> + struct hv_message_page *msg_page;
> +
> + gpa = synic->msg_page & PAGE_MASK;
> + page = kvm_vcpu_gfn_to_page(vcpu, gpa >> PAGE_SHIFT);
> + if (is_error_page(page)) {
> + vcpu_err(vcpu, "Hyper-V SynIC can't get msg page, gpa 0x%llx\n",
> +  gpa);
> + return;
> + }
> + msg_page = kmap_atomic(page);

But the message page is not being pinned, is it?

Paolo

> + msg = &msg_page->sint_message[sint];
> + msg->header.message_flags.msg_pending = 0;
> +
> + kunmap_atomic(msg_page);
> + kvm_release_page_dirty(page);
> + kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT);
> +}
> +


Re: [PATCH] KVM: nVMX: remove incorrect vpid check in nested invvpid emulation

2015-11-25 Thread Bandan Das
Haozhong Zhang  writes:

> This patch removes the vpid check when emulating nested invvpid
> instruction of type all-contexts invalidation. The existing code is
> incorrect because:
>  (1) According to Intel SDM Vol 3, Section "INVVPID - Invalidate
>  Translations Based on VPID", invvpid instruction does not check
>  vpid in the invvpid descriptor when its type is all-contexts
>  invalidation.

But iirc, isn't vpid=0 reserved for root mode? I don't think we want
the L1 hypervisor to be able to do an invvpid(0).

>  (2) According to the same document, invvpid of type all-contexts
>  invalidation does not require there is an active VMCS, so/and
>  get_vmcs12() in the existing code may result in a NULL-pointer
>  dereference. In practice, it can crash both KVM itself and L1
>  hypervisors that use invvpid (e.g. Xen).

If that is the case, then just check if it's null and return without
doing anything.

> Signed-off-by: Haozhong Zhang 
> ---
>  arch/x86/kvm/vmx.c | 5 -
>  1 file changed, 5 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 87acc52..af823a3 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -7394,11 +7394,6 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
>  
>   switch (type) {
>   case VMX_VPID_EXTENT_ALL_CONTEXT:
> - if (get_vmcs12(vcpu)->virtual_processor_id == 0) {
> - nested_vmx_failValid(vcpu,
> - VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
> - return 1;
> - }
>   __vmx_flush_tlb(vcpu, to_vmx(vcpu)->nested.vpid02);
>   nested_vmx_succeed(vcpu);
>   break;

I also noticed a BUG() here in the default. It might be a good idea to replace
it with a WARN.


Re: [PATCH 08/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()

2015-11-25 Thread Paolo Bonzini


On 20/11/2015 09:47, Takuya Yoshikawa wrote:
> kvm_mmu_mark_parents_unsync() alone uses pte_list_walk(), which does
> nearly the same as the for_each_rmap_spte macro.  The only difference
> is that is_shadow_present_pte() checks cannot be placed there because
> kvm_mmu_mark_parents_unsync() can be called with a new parent pointer
> whose entry is not set yet.
> 
> By calling mark_unsync() separately for the parent and adding the parent
> pointer to the parent_ptes chain later in kvm_mmu_get_page(), the macro
> works with no problem.
> 
> Signed-off-by: Takuya Yoshikawa 
> ---
>  arch/x86/kvm/mmu.c | 36 +---
>  1 file changed, 13 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 7f46e3e..4e29d9a 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1007,26 +1007,6 @@ static void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
>   }
>  }
>  
> -typedef void (*pte_list_walk_fn) (u64 *spte);
> -static void pte_list_walk(struct kvm_rmap_head *rmap_head, pte_list_walk_fn fn)
> -{
> - struct pte_list_desc *desc;
> - int i;
> -
> - if (!rmap_head->val)
> - return;
> -
> - if (!(rmap_head->val & 1))
> - return fn((u64 *)rmap_head->val);
> -
> - desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
> - while (desc) {
> - for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
> - fn(desc->sptes[i]);
> - desc = desc->more;
> - }
> -}
> -
>  static struct kvm_rmap_head *__gfn_to_rmap(gfn_t gfn, int level,
>  struct kvm_memory_slot *slot)
>  {
> @@ -1749,7 +1729,12 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct
>  static void mark_unsync(u64 *spte);
>  static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
>  {
> - pte_list_walk(&sp->parent_ptes, mark_unsync);
> + u64 *sptep;
> + struct rmap_iterator iter;
> +
> + for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
> + mark_unsync(sptep);
> + }
>  }
>  
>  static void mark_unsync(u64 *spte)
> @@ -2119,12 +2104,17 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>   if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
>   break;
>  
> - mmu_page_add_parent_pte(vcpu, sp, parent_pte);
>   if (sp->unsync_children) {
>   kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
>   kvm_mmu_mark_parents_unsync(sp);
> - } else if (sp->unsync)
> + if (parent_pte)
> + mark_unsync(parent_pte);
> + } else if (sp->unsync) {
>   kvm_mmu_mark_parents_unsync(sp);
> + if (parent_pte)
> + mark_unsync(parent_pte);
> + }
> + mmu_page_add_parent_pte(vcpu, sp, parent_pte);

This patch is okay with Xiao's suggestion to remove the
kvm_mmu_mark_parents_unsync call.

Paolo

>   __clear_sp_write_flooding_count(sp);
>   trace_kvm_mmu_get_page(sp, false);
> 
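
For readers unfamiliar with the list encoding that the removed pte_list_walk() handles, the following user-space sketch models it: the head value is 0 for an empty list, a plain pointer for a single entry, or a pointer with bit 0 set that refers to a chain of descriptors. The `sketch_*` names, the value 3 for the per-descriptor capacity, and the marker bit are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PTE_LIST_EXT 3            /* assumed descriptor capacity */
#define SKETCH_UNSYNC_BIT   (1ULL << 0)  /* arbitrary marker bit for the demo */

struct sketch_desc {
    uint64_t *sptes[SKETCH_PTE_LIST_EXT];
    struct sketch_desc *more;
};

struct sketch_rmap_head {
    uintptr_t val; /* 0, a single spte pointer, or (desc pointer | 1) */
};

typedef void (*sketch_walk_fn)(uint64_t *spte);

/* Example callback: tag the entry so the test can see it was visited. */
static void sketch_mark(uint64_t *spte)
{
    *spte |= SKETCH_UNSYNC_BIT;
}

/* Visit every entry reachable from the head, mirroring the removed walker. */
static void sketch_list_walk(const struct sketch_rmap_head *head,
                             sketch_walk_fn fn)
{
    struct sketch_desc *desc;
    int i;

    assert(head != NULL && fn != NULL);
    if (!head->val)
        return;                          /* empty list */
    if (!(head->val & 1)) {
        fn((uint64_t *)head->val);       /* single entry stored inline */
        return;
    }
    desc = (struct sketch_desc *)(head->val & ~(uintptr_t)1);
    while (desc) {
        for (i = 0; i < SKETCH_PTE_LIST_EXT && desc->sptes[i]; ++i)
            fn(desc->sptes[i]);
        desc = desc->more;
    }
}
```

The sketch relies on descriptors being at least 2-byte aligned so that bit 0 of the pointer is free to act as the tag.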


Re: [PATCH v1 6/7] kvm/x86: Hyper-V SynIC message slot pending clearing at SINT ack

2015-11-25 Thread Andrey Smetanin



On 11/25/2015 07:52 PM, Paolo Bonzini wrote:
>
>
> On 25/11/2015 16:20, Andrey Smetanin wrote:
>> +static void synic_clear_sint_msg_pending(struct kvm_vcpu_hv_synic *synic,
>> +   u32 sint)
>> +{
>> +   struct kvm_vcpu *vcpu = synic_to_vcpu(synic);
>> +   struct page *page;
>> +   gpa_t gpa;
>> +   struct hv_message *msg;
>> +   struct hv_message_page *msg_page;
>> +
>> +   gpa = synic->msg_page & PAGE_MASK;
>> +   page = kvm_vcpu_gfn_to_page(vcpu, gpa >> PAGE_SHIFT);
>> +   if (is_error_page(page)) {
>> +   vcpu_err(vcpu, "Hyper-V SynIC can't get msg page, gpa 0x%llx\n",
>> +gpa);
>> +   return;
>> +   }
>> +   msg_page = kmap_atomic(page);
>
> But the message page is not being pinned, is it?
>

Actually I don't know anything about pinning.
Is it pinning against page swapping?
Could you please clarify and provide an API to use in this case?

> Paolo
>
>> +   msg = &msg_page->sint_message[sint];
>> +   msg->header.message_flags.msg_pending = 0;
>> +
>> +   kunmap_atomic(msg_page);
>> +   kvm_release_page_dirty(page);
>> +   kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT);
>> +}
>> +



Re: [PATCH 3/3] target-i386: kvm: Print warning when clearing mcg_cap bits

2015-11-25 Thread Paolo Bonzini


On 25/11/2015 18:21, Borislav Petkov wrote:
>> Instead of silently clearing mcg_cap bits when the host doesn't
>> support them, print a warning when doing that.
> Why the host? Why would we want there to be any relation between the MCA
> capabilities of the host and what qemu is emulating?

He means the hypervisor. :)

Paolo


[PATCH 3/3] target-i386: kvm: Print warning when clearing mcg_cap bits

2015-11-25 Thread Eduardo Habkost
Instead of silently clearing mcg_cap bits when the host doesn't
support them, print a warning when doing that.

Signed-off-by: Eduardo Habkost 
---
 target-i386/kvm.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index d63a85b..446bdfc 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -774,7 +774,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
 && (env->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
(CPUID_MCE | CPUID_MCA)
 && kvm_check_extension(cs->kvm_state, KVM_CAP_MCE) > 0) {
-uint64_t mcg_cap;
+uint64_t mcg_cap, unsupported_caps;
 int banks;
 int ret;
 
@@ -790,6 +790,12 @@ int kvm_arch_init_vcpu(CPUState *cs)
 return -ENOTSUP;
 }
 
+unsupported_caps = env->mcg_cap & ~(mcg_cap | MCG_CAP_BANKS_MASK);
+if (unsupported_caps) {
+error_report("warning: Unsupported MCG_CAP bits: 0x%" PRIx64 "\n",
+ unsupported_caps);
+}
+
 env->mcg_cap &= mcg_cap | MCG_CAP_BANKS_MASK;
 ret = kvm_vcpu_ioctl(cs, KVM_X86_SETUP_MCE, &env->mcg_cap);
 if (ret < 0) {
-- 
2.1.0



[PATCH 2/3] target-i386: kvm: Use env->mcg_cap when setting up MCE

2015-11-25 Thread Eduardo Habkost
When setting up MCE, instead of using the MCE_*_DEF macros
directly, just filter the existing env->mcg_cap value.

As env->mcg_cap is already initialized as
MCE_CAP_DEF|MCE_BANKS_DEF at target-i386/cpu.c:mce_init(), this
doesn't change any behavior. But it will allow us to change
mce_init() in the future, to implement different defaults
depending on CPU model, machine-type or command-line parameters.

Signed-off-by: Eduardo Habkost 
---
 target-i386/cpu.h |  2 ++
 target-i386/kvm.c | 11 ---
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index fc4a605..84edfd0 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -286,6 +286,8 @@
 #define MCE_CAP_DEF (MCG_CTL_P|MCG_SER_P)
 #define MCE_BANKS_DEF   10
 
+#define MCG_CAP_BANKS_MASK 0xff
+
 #define MCG_STATUS_RIPV (1ULL<<0)   /* restart ip valid */
 #define MCG_STATUS_EIPV (1ULL<<1)   /* ip points to correct instruction */
 #define MCG_STATUS_MCIP (1ULL<<2)   /* machine check in progress */
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index ee7bc69..d63a85b 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -784,21 +784,18 @@ int kvm_arch_init_vcpu(CPUState *cs)
 return ret;
 }
 
-if (MCE_BANKS_DEF > banks) {
+if ((env->mcg_cap & MCG_CAP_BANKS_MASK) > banks) {
 error_report("kvm: Unsupported MCE bank count: %d > %d\n",
- MCE_BANKS_DEF, banks);
+ (int)(env->mcg_cap & MCG_CAP_BANKS_MASK), banks);
 return -ENOTSUP;
 }
 
-mcg_cap &= MCE_CAP_DEF;
-mcg_cap |= MCE_BANKS_DEF;
-ret = kvm_vcpu_ioctl(cs, KVM_X86_SETUP_MCE, &mcg_cap);
+env->mcg_cap &= mcg_cap | MCG_CAP_BANKS_MASK;
+ret = kvm_vcpu_ioctl(cs, KVM_X86_SETUP_MCE, &env->mcg_cap);
 if (ret < 0) {
 fprintf(stderr, "KVM_X86_SETUP_MCE: %s", strerror(-ret));
 return ret;
 }
-
-env->mcg_cap = mcg_cap;
 }
 
 qemu_add_vm_change_state_handler(cpu_update_state, env);
-- 
2.1.0
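
The filtering introduced here, together with the unsupported-bits check from patch 3/3, can be sketched as a standalone function. This is a simplified model, not the QEMU code: `sketch_filter_mcg_cap` and its parameters are hypothetical, and the `supported` and `banks` arguments stand in for whatever the host reports.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MCG_CAP_BANKS_MASK 0xffULL      /* low byte holds the bank count */
#define MCG_CTL_P          (1ULL << 8)
#define MCG_SER_P          (1ULL << 24)

/*
 * Filter the requested capability word against what the host supports.
 * Returns -ENOTSUP when more banks are requested than the host offers,
 * otherwise 0 with *unsupported set to the capability bits that were cleared.
 */
static int sketch_filter_mcg_cap(uint64_t *requested, uint64_t supported,
                                 int banks, uint64_t *unsupported)
{
    assert(requested != NULL && unsupported != NULL);

    if ((int)(*requested & MCG_CAP_BANKS_MASK) > banks)
        return -ENOTSUP;

    /* Bits outside the bank-count field that the host cannot provide. */
    *unsupported = *requested & ~(supported | MCG_CAP_BANKS_MASK);

    /* Keep the bank count as requested; drop unsupported capability bits. */
    *requested &= supported | MCG_CAP_BANKS_MASK;
    return 0;
}
```

A caller would warn when `*unsupported` is non-zero and abort initialization on a negative return; the requested bank count itself is never silently changed.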



[PATCH 1/3] target-i386: kvm: Abort if MCE bank count is not supported by host

2015-11-25 Thread Eduardo Habkost
Instead of silently changing the number of banks in mcg_cap based
on kvm_get_mce_cap_supported(), abort initialization if the host
doesn't support MCE_BANKS_DEF banks.

Note that MCE_BANKS_DEF was always 10 since it was introduced in
QEMU, and Linux always returned 32 at KVM_CAP_MCE since
KVM_CAP_MCE was introduced, so no behavior is being changed and
the error can't be triggered by any Linux version. The point of
the new check is to ensure we won't silently change the bank
count if we change MCE_BANKS_DEF or make the bank count
configurable in the future.

Signed-off-by: Eduardo Habkost 
---
 target-i386/kvm.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 2a9953b..ee7bc69 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -784,11 +784,14 @@ int kvm_arch_init_vcpu(CPUState *cs)
 return ret;
 }
 
-if (banks > MCE_BANKS_DEF) {
-banks = MCE_BANKS_DEF;
+if (MCE_BANKS_DEF > banks) {
+error_report("kvm: Unsupported MCE bank count: %d > %d\n",
+ MCE_BANKS_DEF, banks);
+return -ENOTSUP;
 }
+
 mcg_cap &= MCE_CAP_DEF;
-mcg_cap |= banks;
+mcg_cap |= MCE_BANKS_DEF;
 ret = kvm_vcpu_ioctl(cs, KVM_X86_SETUP_MCE, &mcg_cap);
 if (ret < 0) {
 fprintf(stderr, "KVM_X86_SETUP_MCE: %s", strerror(-ret));
-- 
2.1.0



Re: [PATCH 1/3] target-i386: kvm: Abort if MCE bank count is not supported by host

2015-11-25 Thread Paolo Bonzini


On 25/11/2015 16:49, Eduardo Habkost wrote:
> Instead of silently changing the number of banks in mcg_cap based
> on kvm_get_mce_cap_supported(), abort initialization if the host
> doesn't support MCE_BANKS_DEF banks.
> 
> Note that MCE_BANKS_DEF was always 10 since it was introduced in
> QEMU, and Linux always returned 32 at KVM_CAP_MCE since
> KVM_CAP_MCE was introduced, so no behavior is being changed and
> the error can't be triggered by any Linux version. The point of
> the new check is to ensure we won't silently change the bank
> count if we change MCE_BANKS_DEF or make the bank count
> configurable in the future.
> 
> Signed-off-by: Eduardo Habkost 
> ---
>  target-i386/kvm.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 2a9953b..ee7bc69 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -784,11 +784,14 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  return ret;
>  }
>  
> -if (banks > MCE_BANKS_DEF) {
> -banks = MCE_BANKS_DEF;
> +if (MCE_BANKS_DEF > banks) {
> +error_report("kvm: Unsupported MCE bank count: %d > %d\n",
> + MCE_BANKS_DEF, banks);

Yoda conditions?

if (banks < MCE_BANKS_DEF) {
    error_report("kvm: Unsupported MCE bank count (QEMU = %d, KVM = %d)",
                 MCE_BANKS_DEF, banks);

Paolo

> +return -ENOTSUP;
>  }
> +
>  mcg_cap &= MCE_CAP_DEF;
> -mcg_cap |= banks;
> +mcg_cap |= MCE_BANKS_DEF;
>  ret = kvm_vcpu_ioctl(cs, KVM_X86_SETUP_MCE, &mcg_cap);
>  if (ret < 0) {
>  fprintf(stderr, "KVM_X86_SETUP_MCE: %s", strerror(-ret));
> 


Re: [PATCH v1 6/7] kvm/x86: Hyper-V SynIC message slot pending clearing at SINT ack

2015-11-25 Thread Paolo Bonzini


On 25/11/2015 17:55, Andrey Smetanin wrote:
>>
>> +gpa = synic->msg_page & PAGE_MASK;
>> +page = kvm_vcpu_gfn_to_page(vcpu, gpa >> PAGE_SHIFT);
>> +if (is_error_page(page)) {
>> +vcpu_err(vcpu, "Hyper-V SynIC can't get msg page, gpa 0x%llx\n",
>> + gpa);
>> +return;
>> +}
>> +msg_page = kmap_atomic(page);
>
> But the message page is not being pinned, is it?
>
> Actually I don't know anything about pinning.
> Is it pinning against page swapping ?

Yes.  Unless the page is pinned, kmap_atomic can fail.

However, I don't think that kvm_hv_notify_acked_sint is called from
atomic context.  It is only called from apic_set_eoi.  Could you just
use kvm_vcpu_write_guest_page?

By the way, do you need to do this also in kvm_get_apic_interrupt, for
auto EOI interrupts?

Thanks,

Paolo

> Could you please clarify and provide an API to use in this case ?


Re: [RFC PATCH V2 3/3] Ixgbevf: Add migration support for ixgbevf driver

2015-11-25 Thread Alexander Duyck
On Wed, Nov 25, 2015 at 8:39 AM, Michael S. Tsirkin  wrote:
> On Wed, Nov 25, 2015 at 08:24:38AM -0800, Alexander Duyck wrote:
>> >> Also, assuming you just want to do ifdown/ifup for some reason, it's
>> >> easy enough to do using a guest agent, in a completely generic way.
>> >>
>> >
>> > Just ifdown/ifup is not enough for migration. It needs to restore some PCI
>> > settings before doing ifup on the target machine
>>
>> That is why I have been suggesting making use of suspend/resume logic
>> that is already in place for PCI power management.  In the case of a
>> suspend/resume we already have to deal with the fact that the device
>> will go through a D0->D3->D0 reset so we have to restore all of the
>> existing state.  It would take a significant load off of Qemu since
>> the guest would be restoring its own state instead of making Qemu have
>> to do all of the device migration work.
>
> That can work, though again, the issue is you need guest
> cooperation to migrate.

Right now the problem is you need to have guest cooperation anyway as
you need to have some way of tracking the dirty pages.  If the IOMMU
on the host were to provide some sort of dirty page tracking then we
could exclude the guest from the equation, but until then we need the
guest to notify us of what pages it is letting the device dirty.  I'm
still of the opinion that the best way to go there is to just modify
the DMA API that is used in the guest so that it supports some sort of
page flag modification or something along those lines so we can track
all of the pages that might be written to by the device.

> If you reset device on destination instead of restoring state,
> then that issue goes away, but maybe the downtime
> will be increased.

Yes, the downtime will be increased, but it shouldn't be by much.
Depending on the setup a VF with a single queue can have about 3MB of
data outstanding when you move the driver over.  After that it is just
a matter of bringing the interface back up which should take only a
few hundred milliseconds assuming the PF is fairly responsive.

> Will it really? I think it's worth it to start with the
> simplest solution (reset on destination) and see
> what the effect is, then add optimizations.

Agreed.  My thought would be to start with something like
dma_mark_clean() that could be used to take care of marking the pages
for migration when they are unmapped or synced.

> One thing that I've been thinking about for a while, is saving (some)
> state speculatively.  For example, notify guest a bit before migration
> is done, so it can save device state. If guest responds quickly, you
> have state that can be restored.  If it doesn't, still migrate, and it
> will have to reset on destination.

I'm not sure how much more device state we really need to save.  The
driver in the guest has to have enough state to recover in the event
of a device failure resulting in a slot reset.  To top it off the
driver is able to reconfigure things probably as quick as we could if
we were restoring the state.
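
Editorial sketch, not from the thread: the idea of marking pages for migration whenever a DMA buffer is unmapped or synced can be modelled in userspace C as a per-page dirty bitmap. The struct and function names below are hypothetical, not a kernel or QEMU API, and the 4 KiB page size is an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4 KiB pages, assumed */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical dirty log: one bit per guest page frame. */
struct dirty_log {
    unsigned long *bitmap;
    size_t nr_pages;
};

/* Mark every page touched by a DMA buffer [addr, addr + len) as dirty. */
static void dirty_log_mark(struct dirty_log *log, uint64_t addr, size_t len)
{
    if (len == 0)
        return;

    uint64_t first = addr >> PAGE_SHIFT;
    uint64_t last = (addr + len - 1) >> PAGE_SHIFT;

    for (uint64_t pfn = first; pfn <= last && pfn < log->nr_pages; pfn++)
        log->bitmap[pfn / BITS_PER_WORD] |= 1UL << (pfn % BITS_PER_WORD);
}

/* Return 1 if the page frame has been marked dirty, 0 otherwise. */
static int dirty_log_test(const struct dirty_log *log, uint64_t pfn)
{
    assert(pfn < log->nr_pages);
    return (log->bitmap[pfn / BITS_PER_WORD] >> (pfn % BITS_PER_WORD)) & 1;
}
```

A two-byte buffer starting at address 4095 straddles a page boundary, so both page 0 and page 1 end up marked; the migration code would then only need to re-send the marked pages.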


[PATCH v1 5/7] kvm/x86: Hyper-V internal helper to read MSR HV_X64_MSR_TIME_REF_COUNT

2015-11-25 Thread Andrey Smetanin
This helper will also be used in the Hyper-V SynIC timers implementation.

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: "K. Y. Srinivasan" 
CC: Haiyang Zhang 
CC: Vitaly Kuznetsov 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: qemu-de...@nongnu.org
---
 arch/x86/kvm/hyperv.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 41869a9..9958926 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -335,6 +335,11 @@ static void synic_init(struct kvm_vcpu_hv_synic *synic)
}
 }
 
+static u64 get_time_ref_counter(struct kvm *kvm)
+{
+   return div_u64(get_kernel_ns() + kvm->arch.kvmclock_offset, 100);
+}
+
 void kvm_hv_vcpu_init(struct kvm_vcpu *vcpu)
 {
synic_init(vcpu_to_synic(vcpu));
@@ -576,11 +581,9 @@ static int kvm_hv_get_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
case HV_X64_MSR_HYPERCALL:
data = hv->hv_hypercall;
break;
-   case HV_X64_MSR_TIME_REF_COUNT: {
-   data =
-div_u64(get_kernel_ns() + kvm->arch.kvmclock_offset, 100);
+   case HV_X64_MSR_TIME_REF_COUNT:
+   data = get_time_ref_counter(kvm);
break;
-   }
case HV_X64_MSR_REFERENCE_TSC:
data = hv->hv_tsc_page;
break;
-- 
2.4.3
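
Editorial sketch, not part of the patch: the helper converts a nanosecond clock into the 100 ns units used by the Hyper-V reference counter. The userspace model below takes the two inputs as parameters; kernel_ns and clock_offset are stand-ins for get_kernel_ns() and kvm->arch.kvmclock_offset, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model of the helper: add the (possibly negative) clock
 * offset to the nanosecond clock, then divide by 100 to obtain the
 * count in 100 ns units.
 */
static uint64_t time_ref_counter(uint64_t kernel_ns, int64_t clock_offset)
{
    /* Unsigned wraparound makes the addition correct for negative offsets. */
    uint64_t guest_ns = kernel_ns + (uint64_t)clock_offset;

    return guest_ns / 100;
}
```

One second of guest time (1000000000 ns) therefore corresponds to 10000000 ticks of the reference counter.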



[PATCH v1 2/7] drivers/hv: Move struct hv_message into UAPI Hyper-V x86 header

2015-11-25 Thread Andrey Smetanin
This struct is required for the Hyper-V SynIC timers implementation inside KVM
and for upcoming Hyper-V VMBus support in userspace (QEMU), so place it into
the Hyper-V UAPI header.

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: "K. Y. Srinivasan" 
CC: Haiyang Zhang 
CC: Vitaly Kuznetsov 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: qemu-de...@nongnu.org
---
 arch/x86/include/uapi/asm/hyperv.h | 91 ++
 drivers/hv/hyperv_vmbus.h  | 91 --
 2 files changed, 91 insertions(+), 91 deletions(-)

diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
index 07981f0..e86d77e 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++ b/arch/x86/include/uapi/asm/hyperv.h
@@ -271,4 +271,95 @@ typedef struct _HV_REFERENCE_TSC_PAGE {
 
 #define HV_SYNIC_STIMER_COUNT  (4)
 
+/* Define synthetic interrupt controller message constants. */
+#define HV_MESSAGE_SIZE                (256)
+#define HV_MESSAGE_PAYLOAD_BYTE_COUNT  (240)
+#define HV_MESSAGE_PAYLOAD_QWORD_COUNT (30)
+
+/* Define hypervisor message types. */
+enum hv_message_type {
+   HVMSG_NONE  = 0x00000000,
+
+   /* Memory access messages. */
+   HVMSG_UNMAPPED_GPA  = 0x80000000,
+   HVMSG_GPA_INTERCEPT = 0x80000001,
+
+   /* Timer notification messages. */
+   HVMSG_TIMER_EXPIRED = 0x80000010,
+
+   /* Error messages. */
+   HVMSG_INVALID_VP_REGISTER_VALUE = 0x80000020,
+   HVMSG_UNRECOVERABLE_EXCEPTION   = 0x80000021,
+   HVMSG_UNSUPPORTED_FEATURE   = 0x80000022,
+
+   /* Trace buffer complete messages. */
+   HVMSG_EVENTLOG_BUFFERCOMPLETE   = 0x80000040,
+
+   /* Platform-specific processor intercept messages. */
+   HVMSG_X64_IOPORT_INTERCEPT  = 0x80010000,
+   HVMSG_X64_MSR_INTERCEPT = 0x80010001,
+   HVMSG_X64_CPUID_INTERCEPT   = 0x80010002,
+   HVMSG_X64_EXCEPTION_INTERCEPT   = 0x80010003,
+   HVMSG_X64_APIC_EOI  = 0x80010004,
+   HVMSG_X64_LEGACY_FP_ERROR   = 0x80010005
+};
+
+/* Define synthetic interrupt controller message flags. */
+union hv_message_flags {
+   __u8 asu8;
+   struct {
+   __u8 msg_pending:1;
+   __u8 reserved:7;
+   };
+};
+
+/* Define port identifier type. */
+union hv_port_id {
+   __u32 asu32;
+   struct {
+   __u32 id:24;
+   __u32 reserved:8;
+   } u;
+};
+
+/* Define port type. */
+enum hv_port_type {
+   HVPORT_MSG  = 1,
+   HVPORT_EVENT= 2,
+   HVPORT_MONITOR  = 3
+};
+
+/* Define synthetic interrupt controller message header. */
+struct hv_message_header {
+   enum hv_message_type message_type;
+   __u8 payload_size;
+   union hv_message_flags message_flags;
+   __u8 reserved[2];
+   union {
+   __u64 sender;
+   union hv_port_id port;
+   };
+};
+
+/* Define timer message payload structure. */
+struct hv_timer_message_payload {
+   __u32 timer_index;
+   __u32 reserved;
+   __u64 expiration_time;  /* When the timer expired */
+   __u64 delivery_time;/* When the message was delivered */
+};
+
+/* Define synthetic interrupt controller message format. */
+struct hv_message {
+   struct hv_message_header header;
+   union {
+   __u64 payload[HV_MESSAGE_PAYLOAD_QWORD_COUNT];
+   } u;
+};
+
+/* Define the synthetic interrupt message page layout. */
+struct hv_message_page {
+   struct hv_message sint_message[HV_SYNIC_SINT_COUNT];
+};
+
 #endif
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 46e23d1..d22230c 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -63,10 +63,6 @@ enum hv_cpuid_function {
 /* Define version of the synthetic interrupt controller. */
 #define HV_SYNIC_VERSION   (1)
 
-/* Define synthetic interrupt controller message constants. */
-#define HV_MESSAGE_SIZE                (256)
-#define HV_MESSAGE_PAYLOAD_BYTE_COUNT  (240)
-#define HV_MESSAGE_PAYLOAD_QWORD_COUNT (30)
 #define HV_ANY_VP  (0x)
 
 /* Define synthetic interrupt controller flag constants. */
@@ -74,53 +70,9 @@ enum hv_cpuid_function {
 #define HV_EVENT_FLAGS_BYTE_COUNT  (256)
 #define HV_EVENT_FLAGS_DWORD_COUNT (256 / sizeof(u32))
 
-/* Define hypervisor message types. */
-enum hv_message_type {
-   HVMSG_NONE  = 0x00000000,
-
-   /* Memory access messages. */
-   HVMSG_UNMAPPED_GPA  = 0x80000000,
-   HVMSG_GPA_INTERCEPT = 0x80000001,
-
-  
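
Editorial sketch, not part of the patch: the three size constants in the moved header are related (a 16-byte header plus a 240-byte payload fills one 256-byte message slot). The model below checks that relationship at compile time. The *_model struct names are hypothetical, and the enum and unions of the real header are replaced by fixed-width integers of the same size, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HV_MESSAGE_SIZE                256
#define HV_MESSAGE_PAYLOAD_BYTE_COUNT  240
#define HV_MESSAGE_PAYLOAD_QWORD_COUNT 30

/* Simplified header: 4 + 1 + 1 + 2 + 8 = 16 bytes. */
struct hv_message_header_model {
    uint32_t message_type;
    uint8_t  payload_size;
    uint8_t  message_flags;
    uint8_t  reserved[2];
    uint64_t sender;
};

struct hv_message_model {
    struct hv_message_header_model header;
    uint64_t payload[HV_MESSAGE_PAYLOAD_QWORD_COUNT];
};

/* Header plus payload must fill exactly one message slot. */
_Static_assert(sizeof(struct hv_message_header_model) ==
               HV_MESSAGE_SIZE - HV_MESSAGE_PAYLOAD_BYTE_COUNT,
               "header size");
_Static_assert(sizeof(struct hv_message_model) == HV_MESSAGE_SIZE,
               "message size");

/* Byte offset of the slot for a given SINT inside the message page. */
static size_t hv_sint_slot_offset(unsigned int sint)
{
    return (size_t)sint * sizeof(struct hv_message_model);
}
```

Because each slot is a fixed 256 bytes, the slot for SINT n starts at byte offset n * 256 of the message page.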

[PATCH v1 3/7] kvm/x86: Rearrange func's declarations inside Hyper-V header

2015-11-25 Thread Andrey Smetanin
This rearrangement groups function declarations
according to their functionality, so future additions
will be simpler.

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: "K. Y. Srinivasan" 
CC: Haiyang Zhang 
CC: Vitaly Kuznetsov 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: qemu-de...@nongnu.org
---
 arch/x86/kvm/hyperv.h | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index 315af4b..9483d49 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -24,14 +24,6 @@
 #ifndef __ARCH_X86_KVM_HYPERV_H__
 #define __ARCH_X86_KVM_HYPERV_H__
 
-int kvm_hv_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool host);
-int kvm_hv_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
-bool kvm_hv_hypercall_enabled(struct kvm *kvm);
-int kvm_hv_hypercall(struct kvm_vcpu *vcpu);
-
-int kvm_hv_synic_set_irq(struct kvm *kvm, u32 vcpu_id, u32 sint);
-void kvm_hv_synic_send_eoi(struct kvm_vcpu *vcpu, int vector);
-
 static inline struct kvm_vcpu_hv_synic *vcpu_to_synic(struct kvm_vcpu *vcpu)
 {
return &vcpu->arch.hyperv.synic;
@@ -46,10 +38,18 @@ static inline struct kvm_vcpu *synic_to_vcpu(struct kvm_vcpu_hv_synic *synic)
arch = container_of(hv, struct kvm_vcpu_arch, hyperv);
return container_of(arch, struct kvm_vcpu, arch);
 }
-void kvm_hv_irq_routing_update(struct kvm *kvm);
 
-void kvm_hv_vcpu_init(struct kvm_vcpu *vcpu);
+int kvm_hv_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool host);
+int kvm_hv_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
+
+bool kvm_hv_hypercall_enabled(struct kvm *kvm);
+int kvm_hv_hypercall(struct kvm_vcpu *vcpu);
 
+void kvm_hv_irq_routing_update(struct kvm *kvm);
+int kvm_hv_synic_set_irq(struct kvm *kvm, u32 vcpu_id, u32 sint);
+void kvm_hv_synic_send_eoi(struct kvm_vcpu *vcpu, int vector);
 int kvm_hv_activate_synic(struct kvm_vcpu *vcpu);
 
+void kvm_hv_vcpu_init(struct kvm_vcpu *vcpu);
+
 #endif
-- 
2.4.3



Re: [RFC PATCH V2 09/10] Qemu/VFIO: Add SRIOV VF migration support

2015-11-25 Thread Lan, Tianyu


On 11/25/2015 5:03 AM, Michael S. Tsirkin wrote:

>+void vfio_migration_cap_handle(PCIDevice *pdev, uint32_t addr,
>+  uint32_t val, int len)
>+{
>+VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>+
>+if (addr == vdev->migration_cap + PCI_VF_MIGRATION_VF_STATUS
>+&& val == PCI_VF_READY_FOR_MIGRATION) {
>+qemu_event_set(_event);

This would wake migration so it can proceed -
except it needs QEMU lock to run, and that's
taken by the migration thread.


Sorry, I seem to be missing something.
Which lock could cause a deadlock when vfio_migration_cap_handle() is called
while migration is running?
The function is called when the VF accesses the fake PCI capability.



It seems unlikely that this ever worked - how
did you test this?




[PATCH 0/3] target-i386: kvm: Use env->mcg_cap when setting up MCE

2015-11-25 Thread Eduardo Habkost
Instead of overwriting env->mcg_cap, make kvm_arch_init_vcpu()
use the value already set in the CPU object when initializing
MCE.

Except for the new "unsupported MCG_CAPS bits" warning, this
patch doesn't change any of the existing QEMU behavior. The
previous code set env->mcg_cap to:
  (MCE_CAP_DEF & ioctl(KVM_X86_GET_MCE_CAP_SUPPORTED)) | MCE_BANKS_DEF
and the new code still keeps it exactly the same, as env->mcg_cap
is already initialized as MCE_CAP_DEF|MCE_BANKS_DEF at
mce_init().

This will allow us to change mce_init() in the future, to
implement different defaults depending on CPU model, machine-type
or command-line parameters.

Eduardo Habkost (3):
  target-i386: kvm: Abort if MCE bank count is not supported by host
  target-i386: kvm: Use env->mcg_cap when setting up MCE
  target-i386: kvm: Print warning when clearing mcg_cap bits

 target-i386/cpu.h |  2 ++
 target-i386/kvm.c | 22 ++
 2 files changed, 16 insertions(+), 8 deletions(-)

-- 
2.1.0
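
Editorial sketch, not part of the series: the behaviour described in the cover letter (start from the value already in env->mcg_cap, clear feature bits the hypervisor does not support, and report what was cleared) can be modelled as below. The function name mce_filter_cap() and the warning text are hypothetical; treating the low 8 bits as the bank count follows the IA32_MCG_CAP layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: restrict the requested mcg_cap to what the
 * hypervisor supports and report the bits that had to be dropped.
 * The low 8 bits (bank count) are left alone; only feature bits are
 * masked.
 */
static uint64_t mce_filter_cap(uint64_t requested, uint64_t supported,
                               uint64_t *dropped)
{
    const uint64_t count_mask = 0xffULL;          /* bank count field */
    uint64_t features = requested & ~count_mask;
    uint64_t unsupported = features & ~supported;

    assert(dropped != NULL);
    *dropped = unsupported;
    if (unsupported) {
        fprintf(stderr, "warning: unsupported MCG_CAP bits: 0x%llx\n",
                (unsigned long long)unsupported);
    }
    return (features & supported) | (requested & count_mask);
}
```

If the request has bits 8 and 24 set plus a bank count of 10, and the hypervisor only supports bit 8, the result keeps bit 8 and the count, and bit 24 is reported as dropped.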



Re: [PATCH] KVM: nVMX: remove incorrect vpid check in nested invvpid emulation

2015-11-25 Thread Paolo Bonzini


On 25/11/2015 16:45, Bandan Das wrote:
> Haozhong Zhang  writes:
> 
>> This patch removes the vpid check when emulating nested invvpid
>> instruction of type all-contexts invalidation. The existing code is
>> incorrect because:
>>  (1) According to Intel SDM Vol 3, Section "INVVPID - Invalidate
>>  Translations Based on VPID", invvpid instruction does not check
>>  vpid in the invvpid descriptor when its type is all-contexts
>>  invalidation.
> 
> But IIRC, isn't vpid=0 reserved for root mode? I think we don't want the
> L1 hypervisor to be able to do an invvpid(0).

The instruction simply "invalidates all mappings tagged with all
non-zero VPIDs", which in our case is all L0 mappings tagged with vpid02.

Paolo

>>  (2) According to the same document, invvpid of type all-contexts
>>  invalidation does not require there is an active VMCS, so/and
>>  get_vmcs12() in the existing code may result in a NULL-pointer
>>  dereference. In practice, it can crash both KVM itself and L1
>>  hypervisors that use invvpid (e.g. Xen).
> 
> If that is the case, then just check if it's null and return without
> doing anything.
> 
>> Signed-off-by: Haozhong Zhang 
>> ---
>>  arch/x86/kvm/vmx.c | 5 -
>>  1 file changed, 5 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 87acc52..af823a3 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -7394,11 +7394,6 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
>>  
>>  switch (type) {
>>  case VMX_VPID_EXTENT_ALL_CONTEXT:
>> -if (get_vmcs12(vcpu)->virtual_processor_id == 0) {
>> -nested_vmx_failValid(vcpu,
>> -VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
>> -return 1;
>> -}
>>  __vmx_flush_tlb(vcpu, to_vmx(vcpu)->nested.vpid02);
>>  nested_vmx_succeed(vcpu);
>>  break;
> 
> I also noticed a BUG() here in the default. It might be a good idea to replace
> it with a WARN.
> 
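
Editorial sketch, not from the thread: the point made about all-contexts invalidation (the VPID in the descriptor is ignored and no current VMCS is required) means that path must not dereference vmcs12 at all. The model below uses hypothetical stand-in types and names rather than the KVM ones, and a counter in place of the real TLB flush.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the KVM types and constants. */
enum invvpid_type { EXTENT_SINGLE_CONTEXT = 1, EXTENT_ALL_CONTEXT = 2 };
enum invvpid_result { INVVPID_OK, INVVPID_FAIL_INVALID_OPERAND };
struct vmcs12_model { uint16_t virtual_processor_id; };

/*
 * All-contexts invalidation ignores the descriptor's VPID and needs no
 * current VMCS, so this path never touches vmcs12 (which may be NULL).
 * flushes counts how often the shadow VPID would be flushed.
 */
static enum invvpid_result
handle_invvpid_model(enum invvpid_type type,
                     const struct vmcs12_model *vmcs12,
                     unsigned int *flushes)
{
    (void)vmcs12;   /* deliberately unused on the all-contexts path */
    assert(flushes != NULL);

    switch (type) {
    case EXTENT_ALL_CONTEXT:
        (*flushes)++;
        return INVVPID_OK;
    default:
        /* Other types are out of scope for this sketch. */
        return INVVPID_FAIL_INVALID_OPERAND;
    }
}
```

Calling the model with a NULL vmcs12 and the all-contexts type succeeds and performs one flush, which is the case that the removed check turned into a NULL-pointer dereference.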


Re: [PATCH 1/3] target-i386: kvm: Abort if MCE bank count is not supported by host

2015-11-25 Thread Eduardo Habkost
On Wed, Nov 25, 2015 at 05:46:38PM +0100, Paolo Bonzini wrote:
> 
> 
> On 25/11/2015 16:49, Eduardo Habkost wrote:
> > Instead of silently changing the number of banks in mcg_cap based
> > on kvm_get_mce_cap_supported(), abort initialization if the host
> > doesn't support MCE_BANKS_DEF banks.
> > 
> > Note that MCE_BANKS_DEF was always 10 since it was introduced in
> > QEMU, and Linux always returned 32 at KVM_CAP_MCE since
> > KVM_CAP_MCE was introduced, so no behavior is being changed and
> > the error can't be triggered by any Linux version. The point of
> > the new check is to ensure we won't silently change the bank
> > count if we change MCE_BANKS_DEF or make the bank count
> > configurable in the future.
> > 
> > Signed-off-by: Eduardo Habkost 
> > ---
> >  target-i386/kvm.c | 9 ++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> > index 2a9953b..ee7bc69 100644
> > --- a/target-i386/kvm.c
> > +++ b/target-i386/kvm.c
> > @@ -784,11 +784,14 @@ int kvm_arch_init_vcpu(CPUState *cs)
> >  return ret;
> >  }
> >  
> > -if (banks > MCE_BANKS_DEF) {
> > -banks = MCE_BANKS_DEF;
> > +if (MCE_BANKS_DEF > banks) {
> > +error_report("kvm: Unsupported MCE bank count: %d > %d\n",
> > + MCE_BANKS_DEF, banks);
> 
> Yoda conditions?
> 
> if (banks < MCE_BANKS_DEF) {
>     error_report("kvm: Unsupported MCE bank count (QEMU = %d, KVM = %d)",
>                  MCE_BANKS_DEF, banks);

This was on purpose, because MCE_BANKS_DEF is replaced by
(env->mcg_caps & MCG_CAPS_COUNT_MASK) in the next patch.

-- 
Eduardo
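
Editorial sketch, not from the thread: once the constant is replaced by a value read from env->mcg_cap, the left-hand operand of the comparison becomes an expression, which is why the operand order was chosen up front. The macro name follows the reply above, but its definition here is an assumption based on the IA32_MCG_CAP layout (bank count in bits 7:0); the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 7:0 of IA32_MCG_CAP hold the bank count (definition assumed). */
#define MCG_CAPS_COUNT_MASK 0xffULL

/* Hypothetical helper: 1 if the host can provide the banks the CPU wants. */
static int mce_banks_supported(uint64_t env_mcg_cap, int host_banks)
{
    int wanted = (int)(env_mcg_cap & MCG_CAPS_COUNT_MASK);

    assert(host_banks >= 0);
    /* Same operand order as the patch: wanted count on the left. */
    return !(wanted > host_banks);
}
```

A CPU configured for 10 banks is accepted on a host offering 32, while a CPU configured for 40 banks is rejected.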


Re: [PATCH 3/3] target-i386: kvm: Print warning when clearing mcg_cap bits

2015-11-25 Thread Borislav Petkov
On Wed, Nov 25, 2015 at 06:29:25PM +0100, Paolo Bonzini wrote:
> On 25/11/2015 18:21, Borislav Petkov wrote:
> >> Instead of silently clearing mcg_cap bits when the host doesn't
> >> > support them, print a warning when doing that.
> > Why the host? Why would we want there to be any relation between the MCA
> > capabilities of the host and what qemu is emulating?
> 
> He means the hypervisor. :)

Ah, ok. :)

Then they look good to me, a step in the right direction.

Acked-by: Borislav Petkov 

Thanks!

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.


[PATCH v2 7/9] KVM: PPC: Book3S HV: Host side kick VCPU when poked by real-mode KVM

2015-11-25 Thread Suresh Warrier
This patch adds the support for the kick VCPU operation for
kvmppc_host_rm_ops. The kvmppc_xics_ipi_action() function
provides the function to be invoked for a host side operation
when poked by the real mode KVM. This is initiated by KVM by
sending an IPI to any free host core.

KVM real mode must set the rm_action to XICS_RM_KICK_VCPU and
rm_data to point to the VCPU to be woken up before sending the IPI.
Note that we have allocated one kvmppc_host_rm_core structure
per core. The above values need to be set in the structure
corresponding to the core to which the IPI will be sent.

Signed-off-by: Suresh Warrier 
---
 arch/powerpc/include/asm/kvm_ppc.h   |  1 +
 arch/powerpc/kvm/book3s_hv.c |  2 ++
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 36 
 3 files changed, 39 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 47cd441..1b93519 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -447,6 +447,7 @@ extern u64 kvmppc_xics_get_icp(struct kvm_vcpu *vcpu);
 extern int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 icpval);
 extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev,
struct kvm_vcpu *vcpu, u32 cpu);
+extern void kvmppc_xics_ipi_action(void);
 #else
 static inline void kvmppc_alloc_host_rm_ops(void) {};
 static inline void kvmppc_free_host_rm_ops(void) {};
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index da2cc56..d6280ed 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3085,6 +3085,8 @@ void kvmppc_alloc_host_rm_ops(void)
ops->rm_core[core].rm_state.in_host = 1;
}
 
+   ops->vcpu_kick = kvmppc_fast_vcpu_kick_hv;
+
/*
 * Make the contents of the kvmppc_host_rm_ops structure visible
 * to other CPUs before we assign it to the global variable.
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 24f5807..43ffbfe 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "book3s_xics.h"
@@ -623,3 +624,38 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
  bail:
return check_too_hard(xics, icp);
 }
+
+/*  --- Non-real mode XICS-related built-in routines ---  */
+
+/**
+ * Host Operations poked by RM KVM
+ */
+static void rm_host_ipi_action(int action, void *data)
+{
+   switch (action) {
+   case XICS_RM_KICK_VCPU:
+   kvmppc_host_rm_ops_hv->vcpu_kick(data);
+   break;
+   default:
+   WARN(1, "Unexpected rm_action=%d data=%p\n", action, data);
+   break;
+   }
+
+}
+
+void kvmppc_xics_ipi_action(void)
+{
+   int core;
+   unsigned int cpu = smp_processor_id();
+   struct kvmppc_host_rm_core *rm_corep;
+
+   core = cpu >> threads_shift;
+   rm_corep = &kvmppc_host_rm_ops_hv->rm_core[core];
+
+   if (rm_corep->rm_data) {
+   rm_host_ipi_action(rm_corep->rm_state.rm_action,
+   rm_corep->rm_data);
+   rm_corep->rm_data = NULL;
+   rm_corep->rm_state.rm_action = 0;
+   }
+}
-- 
1.8.3.4
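
Editorial sketch, not part of the patch: the host-side handler maps the current CPU to its core's mailbox, runs the requested action if data is present, and then clears the mailbox. The userspace model below uses hypothetical names, an assumed value for XICS_RM_KICK_VCPU, and a counter in place of the real VCPU kick.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XICS_RM_KICK_VCPU 1   /* value assumed for this sketch */

/* Hypothetical per-core mailbox, written by real-mode KVM. */
struct rm_core_model {
    uint32_t rm_action;
    void *rm_data;
};

static int kicked;   /* counts kicks so the model can be checked */

static void kick_vcpu_model(void *vcpu)
{
    (void)vcpu;
    kicked++;
}

/* Map a CPU (hardware thread) number to its core's mailbox and run it. */
static void ipi_action_model(struct rm_core_model *cores, unsigned int cpu,
                             unsigned int threads_shift)
{
    struct rm_core_model *c = &cores[cpu >> threads_shift];

    if (c->rm_data) {
        if (c->rm_action == XICS_RM_KICK_VCPU)
            kick_vcpu_model(c->rm_data);
        /* Clear the mailbox so the core can be targeted again. */
        c->rm_data = NULL;
        c->rm_action = 0;
    }
}
```

With eight threads per core (threads_shift of 3), CPU 9 belongs to core 1, so a request posted in cores[1] is picked up once and a second IPI on the same core finds the mailbox empty.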



[PATCH v2 2/9] powerpc/smp: Add smp_muxed_ipi_set_message

2015-11-25 Thread Suresh Warrier
smp_muxed_ipi_message_pass() invokes smp_ops->cause_ipi, which
updates the MFRR through an ioremapped address, to cause the
IPI. Because of this, real-mode callers cannot call
smp_muxed_ipi_message_pass() for IPI messaging.

This patch creates a separate function smp_muxed_ipi_set_message
just to set the IPI message without the cause_ipi routine.
After calling this function to set the IPI message, real
mode callers must cause the IPI directly.

As part of this, we also change smp_muxed_ipi_message_pass
to call smp_muxed_ipi_set_message to set the message instead
of doing it directly inside the routine.

Signed-off-by: Suresh Warrier 
---
 arch/powerpc/include/asm/smp.h | 1 +
 arch/powerpc/kernel/smp.c  | 9 -
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 9ef9c37..78083ed 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -124,6 +124,7 @@ extern const char *smp_ipi_name[];
 /* for irq controllers with only a single ipi */
 extern void smp_muxed_ipi_set_data(int cpu, unsigned long data);
 extern void smp_muxed_ipi_message_pass(int cpu, int msg);
+extern void smp_muxed_ipi_set_message(int cpu, int msg);
 extern irqreturn_t smp_ipi_demux(void);
 
 void smp_init_pSeries(void);
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index a53a130..e222efc 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -218,7 +218,7 @@ void smp_muxed_ipi_set_data(int cpu, unsigned long data)
info->data = data;
 }
 
-void smp_muxed_ipi_message_pass(int cpu, int msg)
+void smp_muxed_ipi_set_message(int cpu, int msg)
 {
struct cpu_messages *info = &per_cpu(ipi_message, cpu);
char *message = (char *)&info->messages;
@@ -228,6 +228,13 @@ void smp_muxed_ipi_message_pass(int cpu, int msg)
 */
smp_mb();
message[msg] = 1;
+}
+
+void smp_muxed_ipi_message_pass(int cpu, int msg)
+{
+   struct cpu_messages *info = &per_cpu(ipi_message, cpu);
+
+   smp_muxed_ipi_set_message(cpu, msg);
/*
 * cause_ipi functions are required to include a full barrier
 * before doing whatever causes the IPI.
-- 
1.8.3.4
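
Editorial sketch, not part of the patch: the split into "record the message" and "cause the IPI" can be modelled with C11 atomics. The kernel stores one byte per message inside a long; the one-bit-per-message layout and all names below are simplifying assumptions.

```c
#include <assert.h>
#include <stdatomic.h>

/* Simplified per-CPU message word: one bit per message type. */
struct cpu_messages_model {
    atomic_ulong messages;
};

/* Record the message only; the caller triggers the interrupt itself. */
static void ipi_set_message(struct cpu_messages_model *info, int msg)
{
    assert(msg >= 0 && msg < 8);
    /* Sequentially consistent, so earlier stores are visible first. */
    atomic_fetch_or(&info->messages, 1UL << msg);
}

/* Receiver side: fetch and clear all pending messages in one step. */
static unsigned long ipi_take_messages(struct cpu_messages_model *info)
{
    return atomic_exchange(&info->messages, 0UL);
}
```

A real-mode caller would use only ipi_set_message() and then raise the interrupt by its own means, while the normal path would call it and then invoke the cause_ipi hook.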



[PATCH v2 4/9] KVM: PPC: Book3S HV: Host-side RM data structures

2015-11-25 Thread Suresh Warrier
This patch defines the data structures to support the setting up
of host side operations while running in real mode in the guest,
and also the functions to allocate and free it.

The operations are for now limited to virtual XICS operations.
Currently, we have only defined one operation in the data
structure:
 - Wake up a VCPU sleeping in the host when it
   receives a virtual interrupt

The operations are assigned at the core level because PowerKVM
requires that the host run in SMT off mode. For each core,
we will need to manage its state atomically - where the state
is defined by:
1. Is the core running in the host?
2. Is there a Real Mode (RM) operation pending on the host?

Currently, core state is only managed at the whole-core level
even when the system is in split-core mode. This just limits
the number of free or "available" cores in the host to perform
any host-side operations.

The kvmppc_host_rm_core.rm_data allows any data to be passed by
KVM in real mode to the host core along with the operation to
be performed.

The kvmppc_host_rm_ops structure is allocated the very first time
a guest VM is started. Initial core state is also set - all online
cores are in the host. This structure is never deleted, not even
when there are no active guests. However, it needs to be freed
when the module is unloaded because the kvmppc_host_rm_ops_hv
can contain function pointers to kvm-hv.ko functions for the
different supported host operations.

Signed-off-by: Suresh Warrier 
---
 arch/powerpc/include/asm/kvm_ppc.h   | 31 
 arch/powerpc/kvm/book3s_hv.c | 70 
 arch/powerpc/kvm/book3s_hv_builtin.c |  3 ++
 3 files changed, 104 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index c6ef05b..47cd441 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -437,6 +437,8 @@ static inline int kvmppc_xics_enabled(struct kvm_vcpu *vcpu)
 {
return vcpu->arch.irq_type == KVMPPC_IRQ_XICS;
 }
+extern void kvmppc_alloc_host_rm_ops(void);
+extern void kvmppc_free_host_rm_ops(void);
 extern void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu);
 extern int kvmppc_xics_create_icp(struct kvm_vcpu *vcpu, unsigned long server);
 extern int kvm_vm_ioctl_xics_irq(struct kvm *kvm, struct kvm_irq_level *args);
@@ -446,6 +448,8 @@ extern int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 icpval);
 extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev,
struct kvm_vcpu *vcpu, u32 cpu);
 #else
+static inline void kvmppc_alloc_host_rm_ops(void) {};
+static inline void kvmppc_free_host_rm_ops(void) {};
 static inline int kvmppc_xics_enabled(struct kvm_vcpu *vcpu)
{ return 0; }
 static inline void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu) { }
@@ -459,6 +463,33 @@ static inline int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd)
{ return 0; }
 #endif
 
+/*
+ * Host-side operations we want to set up while running in real
+ * mode in the guest operating on the xics.
+ * Currently only VCPU wakeup is supported.
+ */
+
+union kvmppc_rm_state {
+   unsigned long raw;
+   struct {
+   u32 in_host;
+   u32 rm_action;
+   };
+};
+
+struct kvmppc_host_rm_core {
+   union kvmppc_rm_state rm_state;
+   void *rm_data;
+   char pad[112];
+};
+
+struct kvmppc_host_rm_ops {
+   struct kvmppc_host_rm_core  *rm_core;
+   void(*vcpu_kick)(struct kvm_vcpu *vcpu);
+};
+
+extern struct kvmppc_host_rm_ops *kvmppc_host_rm_ops_hv;
+
 static inline unsigned long kvmppc_get_epr(struct kvm_vcpu *vcpu)
 {
 #ifdef CONFIG_KVM_BOOKE_HV
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 54b45b7..4042623 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2967,6 +2967,73 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
goto out_srcu;
 }
 
+#ifdef CONFIG_KVM_XICS
+/*
+ * Allocate a per-core structure for managing state about which cores are
+ * running in the host versus the guest and for exchanging data between
+ * real mode KVM and CPU running in the host.
+ * This is only done for the first VM.
+ * The allocated structure stays even if all VMs have stopped.
+ * It is only freed when the kvm-hv module is unloaded.
+ * It's OK for this routine to fail, we just don't support host
+ * core operations like redirecting H_IPI wakeups.
+ */
+void kvmppc_alloc_host_rm_ops(void)
+{
+   struct kvmppc_host_rm_ops *ops;
+   unsigned long l_ops;
+   int cpu, core;
+   int size;
+
+   /* Not the first time here ? */
+   if (kvmppc_host_rm_ops_hv != NULL)
+   return;
+
+   ops = kzalloc(sizeof(struct kvmppc_host_rm_ops), GFP_KERNEL);
+   if (!ops)
+   return;
+
+   size = cpu_nr_cores() * sizeof(struct kvmppc_host_rm_core);
+   
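
Editorial sketch, not part of the patch: the union in the header overlays in_host and rm_action on a single word so that both can be examined and changed in one atomic operation. The model below shows one way such a claim could look with a C11 compare-and-swap. The function names, the choice of compare-and-swap, and which half of the word holds which field are all assumptions of this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack both fields into one 64-bit word so they change together. */
static uint64_t rm_state_pack(uint32_t in_host, uint32_t rm_action)
{
    return ((uint64_t)in_host << 32) | rm_action;
}

/*
 * Claim a core for a host-side action: succeeds only if the core is in
 * the host and has no action pending. Both conditions and the update
 * are covered by a single compare-and-swap.
 */
static bool rm_core_claim(_Atomic uint64_t *raw, uint32_t action)
{
    uint64_t expected = rm_state_pack(1, 0);

    assert(action != 0);
    return atomic_compare_exchange_strong(raw, &expected,
                                          rm_state_pack(1, action));
}
```

A second claim on the same core fails until the pending action has been cleared, and a core that is not in the host can never be claimed.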

[PATCH v2 8/9] KVM: PPC: Book3S HV: Send IPI to host core to wake VCPU

2015-11-25 Thread Suresh Warrier
This patch adds support to real-mode KVM to search for a core
running in the host partition and send it an IPI message with
VCPU to be woken. This avoids having to switch to the host
partition to complete an H_IPI hypercall when the VCPU which
is the target of the H_IPI is not loaded (is not running
in the guest).

The patch also includes the support in the IPI handler running
in the host to do the wakeup by calling kvmppc_xics_ipi_action
for the PPC_MSG_RM_HOST_ACTION message.

When a guest is being destroyed, we need to ensure that there
are no pending IPIs waiting to wake up a VCPU before we free
the VCPUs of the guest. This is accomplished by:
- Forcing a PPC_MSG_CALL_FUNCTION IPI to be completed by all CPUs
  before freeing any VCPUs in kvm_arch_destroy_vm()
- Ensuring that any PPC_MSG_RM_HOST_ACTION messages are executed
  before any other PPC_MSG_CALL_FUNCTION messages

Signed-off-by: Suresh Warrier 
---
 arch/powerpc/kernel/smp.c| 11 +
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 81 ++--
 arch/powerpc/kvm/powerpc.c   | 10 +
 3 files changed, 99 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index e222efc..cb8be5d 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -257,6 +257,17 @@ irqreturn_t smp_ipi_demux(void)
 
do {
all = xchg(&info->messages, 0);
+#if defined(CONFIG_KVM_XICS) && defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE)
+   /*
+* Must check for PPC_MSG_RM_HOST_ACTION messages
+* before PPC_MSG_CALL_FUNCTION messages because when
+* a VM is destroyed, we call kick_all_cpus_sync()
+* to ensure that any pending PPC_MSG_RM_HOST_ACTION
+* messages have completed before we free any VCPUs.
+*/
+   if (all & IPI_MESSAGE(PPC_MSG_RM_HOST_ACTION))
+   kvmppc_xics_ipi_action();
+#endif
if (all & IPI_MESSAGE(PPC_MSG_CALL_FUNCTION))
generic_smp_call_function_interrupt();
if (all & IPI_MESSAGE(PPC_MSG_RESCHEDULE))
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 43ffbfe..a8ca3ed 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -51,11 +51,70 @@ static void ics_rm_check_resend(struct kvmppc_xics *xics,
 
 /* -- ICP routines -- */
 
+/*
+ * We start the search from our current CPU Id in the core map
+ * and go in a circle until we get back to our ID looking for a
+ * core that is running in host context and that hasn't already
+ * been targeted for another rm_host_ops.
+ *
+ * In the future, could consider using a fairer algorithm (one
+ * that distributes the IPIs better)
+ *
+ * Returns -1, if no CPU could be found in the host
+ * Else, returns a CPU Id which has been reserved for use
+ */
+static inline int grab_next_hostcore(int start,
+   struct kvmppc_host_rm_core *rm_core, int max, int action)
+{
+   bool success;
+   int core;
+   union kvmppc_rm_state old, new;
+
+   for (core = start + 1; core < max; core++)  {
+   old = new = READ_ONCE(rm_core[core].rm_state);
+
+   if (!old.in_host || old.rm_action)
+   continue;
+
+   /* Try to grab this host core if not taken already. */
+   new.rm_action = action;
+
+   success = cmpxchg64(&rm_core[core].rm_state.raw,
+   old.raw, new.raw) == old.raw;
+   if (success) {
+   /*
+* Make sure that the store to the rm_action is made
+* visible before we return to caller (and the
+* subsequent store to rm_data) to synchronize with
+* the IPI handler.
+*/
+   smp_wmb();
+   return core;
+   }
+   }
+
+   return -1;
+}
+
+static inline int find_available_hostcore(int action)
+{
+   int core;
+   int my_core = smp_processor_id() >> threads_shift;
+   struct kvmppc_host_rm_core *rm_core = kvmppc_host_rm_ops_hv->rm_core;
+
+   core = grab_next_hostcore(my_core, rm_core, cpu_nr_cores(), action);
+   if (core == -1)
+   core = grab_next_hostcore(core, rm_core, my_core, action);
+
+   return core;
+}
+
 static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
struct kvm_vcpu *this_vcpu)
 {
struct kvmppc_icp *this_icp = this_vcpu->arch.icp;
int cpu;
+   int hcore, hcpu;
 
/* Mark the target VCPU as having an interrupt pending */
vcpu->stat.queue_intr++;
@@ -67,11 +126,25 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
return;
}
 
-   

[PATCH v2 6/9] KVM: PPC: Book3S HV: kvmppc_host_rm_ops - handle offlining CPUs

2015-11-25 Thread Suresh Warrier
The kvmppc_host_rm_ops structure keeps track of which cores
are in the host by maintaining a bitmask of active/runnable
online CPUs that have not entered the guest. This patch adds
support to manage the bitmask when a CPU is offlined or onlined
in the host.

Signed-off-by: Suresh Warrier 
---
 arch/powerpc/kvm/book3s_hv.c | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 95a2ed3..da2cc56 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3012,6 +3012,36 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
 }
 
 #ifdef CONFIG_KVM_XICS
+static int kvmppc_cpu_notify(struct notifier_block *self, unsigned long action,
+   void *hcpu)
+{
+   unsigned long cpu = (long)hcpu;
+
+   switch (action) {
+   case CPU_UP_PREPARE:
+   case CPU_UP_PREPARE_FROZEN:
+   kvmppc_set_host_core(cpu);
+   break;
+
+#ifdef CONFIG_HOTPLUG_CPU
+   case CPU_DEAD:
+   case CPU_DEAD_FROZEN:
+   case CPU_UP_CANCELED:
+   case CPU_UP_CANCELED_FROZEN:
+   kvmppc_clear_host_core(cpu);
+   break;
+#endif
+   default:
+   break;
+   }
+
+   return NOTIFY_OK;
+}
+
+static struct notifier_block kvmppc_cpu_notifier = {
+   .notifier_call = kvmppc_cpu_notify,
+};
+
 /*
  * Allocate a per-core structure for managing state about which cores are
  * running in the host versus the guest and for exchanging data between
@@ -3045,6 +3075,8 @@ void kvmppc_alloc_host_rm_ops(void)
return;
}
 
+   get_online_cpus();
+
for (cpu = 0; cpu < nr_cpu_ids; cpu += threads_per_core) {
if (!cpu_online(cpu))
continue;
@@ -3063,14 +3095,21 @@ void kvmppc_alloc_host_rm_ops(void)
l_ops = (unsigned long) ops;
 
if (cmpxchg64((unsigned long *)&kvmppc_host_rm_ops_hv, 0, l_ops)) {
+   put_online_cpus();
kfree(ops->rm_core);
kfree(ops);
+   return;
}
+
+   register_cpu_notifier(&kvmppc_cpu_notifier);
+
+   put_online_cpus();
 }
 
 void kvmppc_free_host_rm_ops(void)
 {
if (kvmppc_host_rm_ops_hv) {
+   unregister_cpu_notifier(&kvmppc_cpu_notifier);
kfree(kvmppc_host_rm_ops_hv->rm_core);
kfree(kvmppc_host_rm_ops_hv);
kvmppc_host_rm_ops_hv = NULL;
-- 
1.8.3.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/9] powerpc/smp: Support more IPI messages

2015-11-25 Thread Suresh Warrier
This patch increases the number of demuxed messages for a
controller with a single IPI to 8 on 64-bit systems.

This is required because we want to use the IPI mechanism
to send messages from a CPU running in KVM real mode in a
guest to a CPU in the host to take some action. Currently,
we only support 4 messages and all 4 are already taken.

Define a fifth message PPC_MSG_RM_HOST_ACTION for this
purpose.

Signed-off-by: Suresh Warrier 
---
 arch/powerpc/include/asm/smp.h | 3 +++
 arch/powerpc/kernel/smp.c  | 8 
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 825663c..9ef9c37 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -114,6 +114,9 @@ extern int cpu_to_core_id(int cpu);
 #define PPC_MSG_TICK_BROADCAST 2
 #define PPC_MSG_DEBUGGER_BREAK  3
 
+/* This is only used by the powernv kernel */
+#define PPC_MSG_RM_HOST_ACTION 4
+
 /* for irq controllers that have dedicated ipis per message (4) */
 extern int smp_request_message_ipi(int virq, int message);
 extern const char *smp_ipi_name[];
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ec9ec20..a53a130 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -206,7 +206,7 @@ int smp_request_message_ipi(int virq, int msg)
 
 #ifdef CONFIG_PPC_SMP_MUXED_IPI
 struct cpu_messages {
-   int messages;   /* current messages */
+   long messages;  /* current messages */
unsigned long data; /* data for cause ipi */
 };
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct cpu_messages, ipi_message);
@@ -236,15 +236,15 @@ void smp_muxed_ipi_message_pass(int cpu, int msg)
 }
 
 #ifdef __BIG_ENDIAN__
-#define IPI_MESSAGE(A) (1 << (24 - 8 * (A)))
+#define IPI_MESSAGE(A) (1uL << ((BITS_PER_LONG - 8) - 8 * (A)))
 #else
-#define IPI_MESSAGE(A) (1 << (8 * (A)))
+#define IPI_MESSAGE(A) (1uL << (8 * (A)))
 #endif
 
 irqreturn_t smp_ipi_demux(void)
 {
struct cpu_messages *info = this_cpu_ptr(&ipi_message);
-   unsigned int all;
+   unsigned long all;
 
mb();   /* order any irq clear */
 
-- 
1.8.3.4
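
The layout change above can be modelled in userspace. What follows is a minimal sketch, assuming a 64-bit word with one byte per message; the helper names (ipi_mask_le, ipi_mask_be, post_message, ipi_mask) are made up for illustration and are not kernel APIs. Message numbers 2 to 4 are taken from the patch; 0 and 1 are the existing PPC_MSG_CALL_FUNCTION and PPC_MSG_RESCHEDULE values.

```c
#include <stdint.h>
#include <string.h>

/* Message numbers: 0..3 already exist, 4 is the new one from this patch. */
enum {
    MSG_CALL_FUNCTION  = 0,
    MSG_RESCHEDULE     = 1,
    MSG_TICK_BROADCAST = 2,
    MSG_DEBUGGER_BREAK = 3,
    MSG_RM_HOST_ACTION = 4,
};

/* Bit that is set when byte 'msg' of a 64-bit word holds 1 (little endian). */
static inline uint64_t ipi_mask_le(int msg)
{
    return UINT64_C(1) << (8 * msg);
}

/* Same, for a big-endian word: byte 0 is the most significant byte. */
static inline uint64_t ipi_mask_be(int msg)
{
    return UINT64_C(1) << (56 - 8 * msg);
}

/* Hypothetical helper: detect the byte order of the machine running this. */
static inline int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Mask matching the byte order of the machine running this sketch. */
static inline uint64_t ipi_mask(int msg)
{
    return host_is_little_endian() ? ipi_mask_le(msg) : ipi_mask_be(msg);
}

/* Post a message by storing 1 into byte number 'msg' of the word. */
static inline void post_message(uint64_t *word, int msg)
{
    unsigned char bytes[sizeof(*word)];

    memcpy(bytes, word, sizeof(bytes));
    bytes[msg] = 1;
    memcpy(word, bytes, sizeof(bytes));
}
```

With a 32-bit word only bytes 0 to 3 exist, so message 4 has nowhere to go; widening the word to 64 bits gives room for 8 messages, which matches the limit stated in the changelog.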



[PATCH v2 5/9] KVM: PPC: Book3S HV: Manage core host state

2015-11-25 Thread Suresh Warrier
Update the core host state in kvmppc_host_rm_ops whenever
the primary thread of the core enters the guest or returns
back.

Signed-off-by: Suresh Warrier 
---
 arch/powerpc/kvm/book3s_hv.c | 44 
 1 file changed, 44 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 4042623..95a2ed3 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2261,6 +2261,46 @@ static void post_guest_process(struct kvmppc_vcore *vc, bool is_master)
 }
 
 /*
+ * Clear core from the list of active host cores as we are about to
+ * enter the guest. Only do this if it is the primary thread of the
+ * core (not if a subcore) that is entering the guest.
+ */
+static inline void kvmppc_clear_host_core(int cpu)
+{
+   int core;
+
+   if (!kvmppc_host_rm_ops_hv || cpu_thread_in_core(cpu))
+   return;
+   /*
+* Memory barrier can be omitted here as we will do a smp_wmb()
+* later in kvmppc_start_thread and we need ensure that state is
+* visible to other CPUs only after we enter guest.
+*/
+   core = cpu >> threads_shift;
+   kvmppc_host_rm_ops_hv->rm_core[core].rm_state.in_host = 0;
+}
+
+/*
+ * Advertise this core as an active host core since we exited the guest
+ * Only need to do this if it is the primary thread of the core that is
+ * exiting.
+ */
+static inline void kvmppc_set_host_core(int cpu)
+{
+   int core;
+
+   if (!kvmppc_host_rm_ops_hv || cpu_thread_in_core(cpu))
+   return;
+
+   /*
+* Memory barrier can be omitted here because we do a spin_unlock
+* immediately after this which provides the memory barrier.
+*/
+   core = cpu >> threads_shift;
+   kvmppc_host_rm_ops_hv->rm_core[core].rm_state.in_host = 1;
+}
+
+/*
  * Run a set of guest threads on a physical core.
  * Called with vc->lock held.
  */
@@ -2372,6 +2412,8 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
}
}
 
+   kvmppc_clear_host_core(pcpu);
+
/* Start all the threads */
active = 0;
for (sub = 0; sub < core_info.n_subcores; ++sub) {
@@ -2468,6 +2510,8 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
kvmppc_ipi_thread(pcpu + i);
}
 
+   kvmppc_set_host_core(pcpu);
+
spin_unlock(&vc->lock);
 
/* make sure updates to secondary vcpu structs are visible now */
-- 
1.8.3.4
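
The bookkeeping in this patch boils down to "only the primary thread of a core updates that core's in_host flag". A minimal userspace sketch of that rule, assuming 8 threads per core and a plain array in place of kvmppc_host_rm_ops_hv->rm_core; THREADS_SHIFT, NR_CORES, core_in_host and set_core_in_host are hypothetical names used only here:

```c
#include <stdbool.h>

#define THREADS_SHIFT 3   /* assumption: 8 hardware threads per core */
#define NR_CORES      4   /* assumption: small fixed core count */

/* Stand-in for the per-core rm_state.in_host field. */
static bool core_in_host[NR_CORES];

/* Thread index within its core; 0 means primary thread. */
static inline int thread_in_core(int cpu)
{
    return cpu & ((1 << THREADS_SHIFT) - 1);
}

/* Update the core state, but only when called for the primary thread. */
static inline void set_core_in_host(int cpu, bool in_host)
{
    if (thread_in_core(cpu) != 0)
        return;                       /* secondary threads are ignored */
    core_in_host[cpu >> THREADS_SHIFT] = in_host;
}
```

The sketch leaves out memory ordering entirely; the patch relies on the later smp_wmb() and the spin_unlock() for that.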



[PATCH v2 3/9] powerpc/powernv: Add icp_native_cause_ipi_rm

2015-11-25 Thread Suresh Warrier
Add icp_native_cause_ipi_rm(), a function to cause an IPI from
real mode. It requires kvm_hstate.xics_phys to be initialized
with the physical address of the XICS.

Signed-off-by: Suresh Warrier 
---
 arch/powerpc/include/asm/xics.h   |  1 +
 arch/powerpc/sysdev/xics/icp-native.c | 19 +++
 2 files changed, 20 insertions(+)

diff --git a/arch/powerpc/include/asm/xics.h b/arch/powerpc/include/asm/xics.h
index 0e25bdb..2546048 100644
--- a/arch/powerpc/include/asm/xics.h
+++ b/arch/powerpc/include/asm/xics.h
@@ -30,6 +30,7 @@
 #ifdef CONFIG_PPC_ICP_NATIVE
 extern int icp_native_init(void);
 extern void icp_native_flush_interrupt(void);
+extern void icp_native_cause_ipi_rm(int cpu);
 #else
 static inline int icp_native_init(void) { return -ENODEV; }
 #endif
diff --git a/arch/powerpc/sysdev/xics/icp-native.c b/arch/powerpc/sysdev/xics/icp-native.c
index eae3265..e39b18a 100644
--- a/arch/powerpc/sysdev/xics/icp-native.c
+++ b/arch/powerpc/sysdev/xics/icp-native.c
@@ -159,6 +159,25 @@ static void icp_native_cause_ipi(int cpu, unsigned long data)
icp_native_set_qirr(cpu, IPI_PRIORITY);
 }
 
+void icp_native_cause_ipi_rm(int cpu)
+{
+   /*
+* Currently not used to send IPIs to another CPU
+* on the same core. Only caller is KVM real mode.
+* Need the physical address of the XICS to be
+* previously saved in kvm_hstate in the paca.
+*/
+   unsigned long xics_phys;
+
+   /*
+* Just like the cause_ipi functions, it is required to
+* include a full barrier (out8 includes a sync) before
+* causing the IPI.
+*/
+   xics_phys = paca[cpu].kvm_hstate.xics_phys;
+   out_rm8((u8 *)(xics_phys + XICS_MFRR), IPI_PRIORITY);
+}
+
 /*
  * Called when an interrupt is received on an off-line CPU to
  * clear the interrupt, so that the CPU can go back to nap mode.
-- 
1.8.3.4



[PATCH v2 9/9] KVM: PPC: Book3S HV: Add tunable to control H_IPI redirection

2015-11-25 Thread Suresh Warrier
Redirecting the wakeup of a VCPU from the H_IPI hypercall to
a core running in the host is usually a good idea; most workloads
seemed to benefit. However, in one heavily interrupt-driven SMT1
workload, some regression was observed. This patch adds a kvm_hv
module parameter called h_ipi_redirect to control this feature.

The default value for this tunable is 1, that is, the feature is enabled.

Signed-off-by: Suresh Warrier 
---
 arch/powerpc/include/asm/kvm_ppc.h   |  1 +
 arch/powerpc/kvm/book3s_hv.c | 11 +++
 arch/powerpc/kvm/book3s_hv_rm_xics.c |  5 -
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 1b93519..29d1442 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -448,6 +448,7 @@ extern int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 icpval);
 extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev,
struct kvm_vcpu *vcpu, u32 cpu);
 extern void kvmppc_xics_ipi_action(void);
+extern int h_ipi_redirect;
 #else
 static inline void kvmppc_alloc_host_rm_ops(void) {};
 static inline void kvmppc_free_host_rm_ops(void) {};
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d6280ed..182ec84 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -81,6 +81,17 @@ static int target_smt_mode;
 module_param(target_smt_mode, int, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(target_smt_mode, "Target threads per core (0 = max)");
 
+#ifdef CONFIG_KVM_XICS
+static struct kernel_param_ops module_param_ops = {
+   .set = param_set_int,
+   .get = param_get_int,
+};
+
+module_param_cb(h_ipi_redirect, &module_param_ops, &h_ipi_redirect,
+   S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(h_ipi_redirect, "Redirect H_IPI wakeup to a free host core");
+#endif
+
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
 
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index a8ca3ed..4c062e7 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -24,6 +24,9 @@
 
 #define DEBUG_PASSUP
 
+int h_ipi_redirect = 1;
+EXPORT_SYMBOL(h_ipi_redirect);
+
 static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
u32 new_irq);
 
@@ -134,7 +137,7 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
cpu = vcpu->arch.thread_cpu;
if (cpu < 0 || cpu >= nr_cpu_ids) {
hcore = -1;
-   if (kvmppc_host_rm_ops_hv)
+   if (kvmppc_host_rm_ops_hv && h_ipi_redirect)
hcore = find_available_hostcore(XICS_RM_KICK_VCPU);
if (hcore != -1) {
hcpu = hcore << threads_shift;
-- 
1.8.3.4



[PATCH v2 0/9] KVM: PPC: Book3S HV: Optimize wakeup VCPU from H_IPI

2015-11-25 Thread Suresh Warrier
When the VCPU target of an H_IPI hypercall is not running
in the guest, we need to do a kick VCPU (wake the VCPU thread)
to make it runnable. The real-mode version of the H_IPI hypercall
cannot do this because it involves waking a sleeping thread.
Thus the hcall returns H_TOO_HARD which forces a switch back
to host so that the H_IPI call can be completed in virtual mode.
This has been found to cause a slowdown for many workloads like
YCSB MongoDB, small message networking, etc. 

One solution is to hand off this job of waking the VCPU to a CPU
that is running in the host by sending it a message through the 
IPI mechanism from the hypercall.

This patch set optimizes the wakeup of the target VCPU by posting
a free core already running in the host to do the wakeup, thus
avoiding the switch to host and back. It requires maintaining a
bitmask of all the available cores in the system to indicate if
they are in the host or running in some guest. It also requires
the H_IPI hypercall to search for a free host core and send it a
new IPI message PPC_MSG_RM_HOST_ACTION after stashing away some
parameters like the pointer to VCPU for the IPI handler. Locks
are avoided by using atomic operations to save core state, to
find and reserve a core in the host, etc.

Note that it is possible for a guest to be destroyed and its
VCPUs freed before the IPI handler gets to run. This case is
handled by ensuring that any pending PPC_MSG_RM_HOST_ACTION
IPIs are completed before proceeding with freeing the VCPUs.

Currently, powerpc only supports 4 IPI messages and all 4 are
already taken for other purposes. This patch set also increases
the number of supported IPI messages to 8. It also provides the
code to send an IPI from a hypercall running in real mode, since
the existing cause_ipi functions cannot be executed in real mode.

A tunable h_ipi_redirect is also included in the patch set to
disable the feature. 

v2:
* Complete patch set sent to both kvm and linuxppc mailing lists
  to avoid build-breaks.
* Broke up real mode IPI messaging function into two pieces - one
  to set the message and one to cause the IPI. New function
  icp_native_cause_ipi_rm added to arch/powerpc/sysdev/xics/icp-native.c

Suresh Warrier (9):
  powerpc/smp: Support more IPI messages
  powerpc/smp: Add smp_muxed_ipi_set_message
  powerpc/powernv: Add icp_native_cause_ipi_rm
  KVM: PPC: Book3S HV: Host-side RM data structures
  KVM: PPC: Book3S HV: Manage core host state
  KVM: PPC: Book3S HV: kvmppc_host_rm_ops - handle offlining CPUs
  KVM: PPC: Book3S HV: Host side kick VCPU when poked by real-mode KVM
  KVM: PPC: Book3S HV: Send IPI to host core to wake VCPU
  KVM: PPC: Book3S HV: Add tunable to control H_IPI redirection

 arch/powerpc/include/asm/kvm_ppc.h|  33 +++
 arch/powerpc/include/asm/smp.h|   4 +
 arch/powerpc/include/asm/xics.h   |   1 +
 arch/powerpc/kernel/smp.c |  28 +-
 arch/powerpc/kvm/book3s_hv.c  | 166 ++
 arch/powerpc/kvm/book3s_hv_builtin.c  |   3 +
 arch/powerpc/kvm/book3s_hv_rm_xics.c  | 120 +++-
 arch/powerpc/kvm/powerpc.c|  10 ++
 arch/powerpc/sysdev/xics/icp-native.c |  19 
 9 files changed, 376 insertions(+), 8 deletions(-)

-- 
1.8.3.4
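
The "find and reserve a core without locks" step described above can be sketched in userspace with C11 atomics. This is only an illustration under stated assumptions: the packed state layout (bit 0 for in_host, bits 8..15 for the pending action) and the names host_core, try_reserve_core and reserve_host_core are hypothetical and differ from the union kvmppc_rm_state used in the series; the kernel code uses cmpxchg64() rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical packed per-core state (not the layout used by the patches). */
#define STATE_IN_HOST      UINT64_C(0x1)
#define STATE_ACTION_SHIFT 8
#define STATE_ACTION_MASK  (UINT64_C(0xff) << STATE_ACTION_SHIFT)

struct host_core {
    _Atomic uint64_t state;
};

/*
 * Try to reserve one core for 'action' (nonzero, at most 0xff).
 * Returns 1 on success, 0 if the core is in a guest, is already
 * reserved, or changed state under us.
 */
static inline int try_reserve_core(struct host_core *c, unsigned action)
{
    uint64_t old = atomic_load(&c->state);
    uint64_t want;

    if (!(old & STATE_IN_HOST) || (old & STATE_ACTION_MASK))
        return 0;

    want = old | ((uint64_t)action << STATE_ACTION_SHIFT);
    /* Succeeds only if nobody modified the state since we read it. */
    return atomic_compare_exchange_strong(&c->state, &old, want);
}

/*
 * Search in a circle starting just after 'self' and return the index
 * of the first core that could be reserved, or -1 if there is none.
 */
static inline int reserve_host_core(struct host_core *cores, int ncores,
                                    int self, unsigned action)
{
    for (int i = 1; i < ncores; i++) {
        int core = (self + i) % ncores;

        if (try_reserve_core(&cores[core], action))
            return core;
    }
    return -1;
}
```

A failed compare-and-exchange is simply treated as "core not available" and the search moves on, which is what keeps the real-mode path free of locks and retry loops.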



Re: [RFC PATCH V2 0/3] IXGBE/VFIO: Add live migration support for SRIOV NIC

2015-11-25 Thread Alexander Duyck
On Wed, Nov 25, 2015 at 7:15 PM, Dong, Eddie  wrote:
>> On Wed, Nov 25, 2015 at 12:21 AM, Lan Tianyu  wrote:
>> > On 2015年11月25日 13:30, Alexander Duyck wrote:
>> >> No, what I am getting at is that you can't go around and modify the
>> >> configuration space for every possible device out there.  This
>> >> solution won't scale.
>> >
>> >
>> > PCI config space regs are emulation by Qemu and so We can find the
>> > free PCI config space regs for the faked PCI capability. Its position
>> > can be not permanent.
>>
>> Yes, but do you really want to edit every driver on every OS that you plan to
>> support this on.  What about things like direct assignment of regular 
>> Ethernet
>> ports?  What you really need is a solution that will work generically on any
>> existing piece of hardware out there.
>
> The fundamental assumption of this patch series is to modify the driver in 
> guest to self-emulate or track the device state, so that the migration may be 
> possible.
> I don't think we can modify OS, without modifying the drivers, even using the 
> PCIe hotplug mechanism.
> In the meantime, modifying Windows OS is a big challenge given that only 
> Microsoft can do. While, modifying driver is relatively simple and manageable 
> to device vendors, if the device vendor want to support state-clone based 
> migration.

The problem is that the code you are presenting, even as a proof of
concept, is seriously flawed.  It does a poor job of exposing how any
of this can be duplicated for any VF other than the one you are working
on.

I am not saying you cannot modify the drivers, however what you are
doing is far too invasive.  Do you seriously plan on modifying all of
the PCI device drivers out there in order to allow any device that
might be direct assigned to a port to support migration?  I certainly
hope not.  That is why I have said that this solution will not scale.

What I am counter-proposing seems like a very simple proposition.  It
can be implemented in two steps.

1.  Look at modifying dma_mark_clean().  It is a function called in
the sync and unmap paths of the lib/swiotlb.c.  If you could somehow
modify it to take care of marking the pages you unmap for Rx as being
dirty it will get you a good way towards your goal as it will allow
you to continue to do DMA while you are migrating the VM.

2.  Look at making use of the existing PCI suspend/resume calls that
are there to support PCI power management.  They have everything
needed to allow you to pause and resume DMA for the device before and
after the migration while retaining the driver state.  If you can
implement something that allows you to trigger these calls from the
PCI subsystem such as hot-plug then you would have a generic solution
that can be easily reproduced for multiple drivers beyond those
supported by ixgbevf.

Thanks.

- Alex
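
To make step 1 a bit more concrete: the bookkeeping it needs is "mark every page touched by an unmapped Rx buffer as dirty". A minimal userspace sketch of that bookkeeping, assuming 4 KiB pages and a flat bitmap indexed by page frame number; mark_range_dirty, page_is_dirty and SKETCH_PAGE_SHIFT are hypothetical names, and this is not what dma_mark_clean() does today:

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12   /* assumption: 4 KiB pages */
#define BITS_PER_WORD     (sizeof(unsigned long) * CHAR_BIT)

/*
 * Mark every page overlapped by [addr, addr + size) as dirty.
 * The caller must make sure the bitmap covers the address range.
 */
static inline void mark_range_dirty(unsigned long *bitmap,
                                    uint64_t addr, size_t size)
{
    uint64_t first, last;

    if (size == 0)
        return;

    first = addr >> SKETCH_PAGE_SHIFT;
    last = (addr + size - 1) >> SKETCH_PAGE_SHIFT;

    for (uint64_t pfn = first; pfn <= last; pfn++)
        bitmap[pfn / BITS_PER_WORD] |= 1UL << (pfn % BITS_PER_WORD);
}

/* Return nonzero if the page frame 'pfn' has been marked dirty. */
static inline int page_is_dirty(const unsigned long *bitmap, uint64_t pfn)
{
    return (int)((bitmap[pfn / BITS_PER_WORD] >> (pfn % BITS_PER_WORD)) & 1UL);
}
```

How such a bitmap would be shared with the hypervisor's migration code is left open here, as it is in the proposal itself.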


Re: [PATCH 09/10] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()

2015-11-25 Thread Takuya Yoshikawa

On 2015/11/26 1:32, Paolo Bonzini wrote:

> On 20/11/2015 09:57, Xiao Guangrong wrote:
>>
>> You can move this patch to the front of
>> [PATCH 08/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of
>> pte_list_walk()
>>
>> By moving kvm_mmu_mark_parents_unsync() to the behind of mmu_spte_set()
>> (then the parent
>> spte is present now), you can directly clean up for_each_rmap_spte().
>
> So basically squash together the two patches (8/10 and 9/10) except the
> change to kvm_mmu_mark_parents_unsync; then in the second patch switch
> from pte_list_walk to for_each_rmap_spte.
>
> That makes sense indeed.


Sorry for my being late to respond to Xiao's suggestions.  I could not
use my development machine for a while this week.

In short, this kvm_mmu_mark_parents_unsync() call in kvm_mmu_get_page()
should have been mark_unsync() for the new parent_pte only, because we
are constructing the mappings from/to it and other parents in the
sp->parent_ptes are not related to this fault?

As the code has been this way for some time, a bit scary to change it,
but I'll do some tests without that extra kvm_mmu_mark_parents_unsync()
with a guest (with ept=0) this afternoon.

  Takuya




RE: [RFC PATCH V2 0/3] IXGBE/VFIO: Add live migration support for SRIOV NIC

2015-11-25 Thread Dong, Eddie
> On Wed, Nov 25, 2015 at 12:21 AM, Lan Tianyu  wrote:
> > On 2015年11月25日 13:30, Alexander Duyck wrote:
> >> No, what I am getting at is that you can't go around and modify the
> >> configuration space for every possible device out there.  This
> >> solution won't scale.
> >
> >
> > PCI config space regs are emulation by Qemu and so We can find the
> > free PCI config space regs for the faked PCI capability. Its position
> > can be not permanent.
> 
> Yes, but do you really want to edit every driver on every OS that you plan to
> support this on.  What about things like direct assignment of regular Ethernet
> ports?  What you really need is a solution that will work generically on any
> existing piece of hardware out there.

The fundamental assumption of this patch series is that the driver in the
guest is modified to self-emulate or track the device state, so that
migration becomes possible.
I don't think we can modify the OS without also modifying the drivers, even
when using the PCIe hotplug mechanism.
Meanwhile, modifying the Windows OS is a big challenge, given that only
Microsoft can do it, whereas modifying a driver is relatively simple and
manageable for device vendors, if the device vendor wants to support
state-clone based migration.

Thx Eddie


Re: [PATCH v6 1/3] target-i386: fallback vcpu's TSC rate to value returned by KVM

2015-11-25 Thread Eduardo Habkost
On Tue, Nov 24, 2015 at 11:33:55AM +0800, Haozhong Zhang wrote:
> If no user-specified TSC rate is present, we will try to set
> env->tsc_khz to the value returned by KVM_GET_TSC_KHZ. This patch does
> not change the current functionality of QEMU and just prepares for later
> patches to enable migrating vcpu's TSC rate.
> 
> Signed-off-by: Haozhong Zhang 

Reviewed-by: Eduardo Habkost 

> ---
>  target-i386/kvm.c | 14 ++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 2a9953b..a0fe9d4 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -832,6 +832,20 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  }
>  }
>  
> +/* vcpu's TSC frequency is either specified by user, or following
> + * the value used by KVM if the former is not present. In the
> + * latter case, we query it from KVM and record in env->tsc_khz,
> + * so that vcpu's TSC frequency can be migrated later via this field.
> + */
> +if (!env->tsc_khz) {
> +r = kvm_check_extension(cs->kvm_state, KVM_CAP_GET_TSC_KHZ) ?
> +kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ) :
> +-ENOTSUP;
> +if (r > 0) {
> +env->tsc_khz = r;
> +}
> +}
> +
>  if (has_xsave) {
>  env->kvm_xsave_buf = qemu_memalign(4096, sizeof(struct kvm_xsave));
>  }
> -- 
> 2.4.8
> 

-- 
Eduardo
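
The precedence the patch sets up (user value first, KVM's value as fallback, otherwise leave the field at 0) can be written as a small pure function. This is a sketch only; effective_tsc_khz and its parameters are hypothetical names, with has_cap standing in for the KVM_CAP_GET_TSC_KHZ check and kvm_result for the return value of the KVM_GET_TSC_KHZ ioctl:

```c
#include <stdint.h>

/*
 * Decide which TSC frequency (in kHz) ends up in env->tsc_khz.
 *   user_tsc_khz: value given by the user, 0 if none
 *   has_cap:      nonzero if the host reports the capability
 *   kvm_result:   what the query returned (<= 0 means failure)
 */
static inline int64_t effective_tsc_khz(int64_t user_tsc_khz, int has_cap,
                                        int64_t kvm_result)
{
    if (user_tsc_khz)
        return user_tsc_khz;      /* user setting always wins */
    if (has_cap && kvm_result > 0)
        return kvm_result;        /* fall back to the value used by KVM */
    return 0;                     /* unknown: leave the field unset */
}
```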


RE: [PATCH] KVM: x86: Add lowest-priority support for vt-d posted-interrupts

2015-11-25 Thread Wu, Feng


> -Original Message-
> From: Radim Krčmář [mailto:rkrc...@redhat.com]
> Sent: Wednesday, November 25, 2015 11:43 PM
> To: Paolo Bonzini 
> Cc: Wu, Feng ; kvm@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Subject: Re: [PATCH] KVM: x86: Add lowest-priority support for vt-d posted-
> interrupts
> 
> 2015-11-25 15:38+0100, Paolo Bonzini:
> > On 25/11/2015 15:12, Radim Krcmár wrote:
> >> I think it's ok to pick any algorithm we like.  It's unlikely that
> >> software would recognize and take advantage of the hardware algorithm
> >> without adding a special treatment for KVM.
> >> (I'd vote for the simple pick-first-APIC lowest priority algorithm ...
> >>  I don't see much point in complicating lowest priority when it doesn't
> >>  deliver to lowest priority CPU anyway.)
> >
> > Vector hashing is an improvement for the common case where all vectors
> > are set to all CPUs.  Sure you can get an unlucky assignment, but it's
> > still better than pick-first-APIC.
> 
> Yeah, hashing has a valid use case, but a subtle weighting of drawbacks
> led me to prefer pick-first-APIC ...

Is it possible that the pick-first-APIC policy makes a certain vCPU's irq
workload too heavy?

> 
> (I'd prefer to have simple code in KVM and depend on static IRQ balancing
>  in a guest to handle the distribution.
>  The guest could get the unlucky assignment anyway, so it should be
>  prepared;  and hashing just made KVM worse in that case.  Guests might
>  also configure physical x(2)APIC, where is no lowest priority.
>  And if the guest doesn't do anything with IRQs, then it might not even
>  care about the impact that our choice has.)

Do you guys have an agreement on how to handle this? Or can we implement
vector hashing at the current stage, and then improve it as Radim mentioned
above if it is really needed?

Thanks,
Feng
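
For comparison, the two candidate policies can be sketched side by side. This is an illustration only: the destination set is modelled as a 32-bit mask of candidate vCPUs, and the hash (vector modulo the number of candidates) is an assumption made for the example; the thread does not spell out the exact hashing the hardware uses. pick_first and pick_by_vector_hash are hypothetical names.

```c
#include <stdint.h>

/* Pick-first: always deliver to the lowest-numbered candidate. */
static inline int pick_first(uint32_t dest_mask)
{
    for (int i = 0; i < 32; i++)
        if (dest_mask & (UINT32_C(1) << i))
            return i;
    return -1;                       /* no candidate at all */
}

/*
 * Vector hashing (assumed form): pick the n-th candidate, where
 * n = vector % number_of_candidates, so that different vectors
 * spread over different vCPUs.
 */
static inline int pick_by_vector_hash(uint32_t dest_mask, unsigned vector)
{
    unsigned count = 0;
    unsigned n;

    for (int i = 0; i < 32; i++)
        if (dest_mask & (UINT32_C(1) << i))
            count++;
    if (count == 0)
        return -1;

    n = vector % count;
    for (int i = 0; i < 32; i++) {
        if (dest_mask & (UINT32_C(1) << i)) {
            if (n == 0)
                return i;
            n--;
        }
    }
    return -1;                       /* not reached */
}
```

With candidates {0, 2, 3}, pick_first sends every vector to vCPU 0, which is the load concern raised above, while the hashed variant spreads vectors across all three candidates.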


Is KVM support single step execution

2015-11-25 Thread Wu, Feng
Hi Paolo,

Do you know whether KVM supports single step execution? If it does, could you
please give me some information about it? I would really appreciate it!

Thanks,
Feng


Re: [PATCH v1 0/7] KVM: Hyper-V SynIC timers

2015-11-25 Thread Wanpeng Li
2015-11-25 23:20 GMT+08:00 Andrey Smetanin :
> Per Hyper-V specification (and as required by Hyper-V-aware guests),
> SynIC provides 4 per-vCPU timers.  Each timer is programmed via a pair
> of MSRs, and signals expiration by delivering a special format message
> to the configured SynIC message slot and triggering the corresponding
> synthetic interrupt.

Could you post a link for this specification?

Regards,
Wanpeng Li


[PATCH] KVM: nVMX: remove incorrect vpid check in nested invvpid emulation

2015-11-25 Thread Haozhong Zhang
This patch removes the vpid check when emulating nested invvpid
instruction of type all-contexts invalidation. The existing code is
incorrect because:
 (1) According to Intel SDM Vol 3, Section "INVVPID - Invalidate
 Translations Based on VPID", invvpid instruction does not check
 vpid in the invvpid descriptor when its type is all-contexts
 invalidation.
 (2) According to the same document, invvpid of type all-contexts
 invalidation does not require an active VMCS, so
 get_vmcs12() in the existing code may result in a NULL-pointer
 dereference. In practice, it can crash both KVM itself and L1
 hypervisors that use invvpid (e.g. Xen).

Signed-off-by: Haozhong Zhang 
---
 arch/x86/kvm/vmx.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 87acc52..af823a3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7394,11 +7394,6 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
 
switch (type) {
case VMX_VPID_EXTENT_ALL_CONTEXT:
-   if (get_vmcs12(vcpu)->virtual_processor_id == 0) {
-   nested_vmx_failValid(vcpu,
-   VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
-   return 1;
-   }
__vmx_flush_tlb(vcpu, to_vmx(vcpu)->nested.vpid02);
nested_vmx_succeed(vcpu);
break;
-- 
2.4.8



Re: [RFC PATCH V2 0/3] IXGBE/VFIO: Add live migration support for SRIOV NIC

2015-11-25 Thread Lan Tianyu
On 2015年11月25日 13:30, Alexander Duyck wrote:
> No, what I am getting at is that you can't go around and modify the
> configuration space for every possible device out there.  This
> solution won't scale.


PCI config space registers are emulated by QEMU, so we can find free
PCI config space registers for the faked PCI capability. Its position
need not be fixed.


>  If you instead moved the logic for notifying
> the device into a separate mechanism such as making it a part of the
> hot-plug logic then you only have to write the code once per OS in
> order to get the hot-plug capability to pause/resume the device.  What
> I am talking about is not full hot-plug, but rather to extend the
> existing hot-plug in Qemu and the Linux kernel to support a
> "pause/resume" functionality.  The PCI hot-plug specification calls
> out the option of implementing something like this, but we don't
> currently have support for it.
>

Could you elaborate on the part of the PCI hot-plug specification you mentioned?

My concern is whether this would require changing the PCI spec.



> I just feel doing it through PCI hot-plug messages will scale much
> better as you could likely make use of the power management
> suspend/resume calls to take care of most of the needed implementation
> details.
> 
> - Alex
-- 
Best regards
Tianyu Lan


Re: [PATCH 05/21] arm64: KVM: Implement timer save/restore

2015-11-25 Thread Marc Zyngier
On Mon, 23 Nov 2015 10:47:10 +
Steve Capper  wrote:

Hi Steve,

> On Mon, Nov 16, 2015 at 01:11:43PM +, Marc Zyngier wrote:
> > Implement the timer save restore as a direct translation of
> > the assembly code version.  
> 
> Hi Marc, some comments below.
> Cheers,
> -- 
> Steve
> 
> > 
> > Signed-off-by: Marc Zyngier 
> > ---
> >  arch/arm64/kvm/hyp/Makefile   |  1 +
> >  arch/arm64/kvm/hyp/hyp.h  |  3 ++
> >  arch/arm64/kvm/hyp/timer-sr.c | 68 +++
> >  3 files changed, 72 insertions(+)
> >  create mode 100644 arch/arm64/kvm/hyp/timer-sr.c
> > 
> > diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> > index d1e38ce..455dc0a 100644
> > --- a/arch/arm64/kvm/hyp/Makefile
> > +++ b/arch/arm64/kvm/hyp/Makefile
> > @@ -4,3 +4,4 @@
> >  
> >  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
> >  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
> > +obj-$(CONFIG_KVM_ARM_HOST) += timer-sr.o
> > diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> > index a31cb6e..86aa5a2 100644
> > --- a/arch/arm64/kvm/hyp/hyp.h
> > +++ b/arch/arm64/kvm/hyp/hyp.h
> > @@ -33,5 +33,8 @@ void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
> >  void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
> >  void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
> >  
> > +void __timer_save_state(struct kvm_vcpu *vcpu);
> > +void __timer_restore_state(struct kvm_vcpu *vcpu);
> > +
> >  #endif /* __ARM64_KVM_HYP_H__ */
> >  
> > diff --git a/arch/arm64/kvm/hyp/timer-sr.c b/arch/arm64/kvm/hyp/timer-sr.c
> > new file mode 100644
> > index 000..1a1d2ac
> > --- /dev/null
> > +++ b/arch/arm64/kvm/hyp/timer-sr.c
> > @@ -0,0 +1,68 @@
> > +/*
> > + * Copyright (C) 2012-2015 - ARM Ltd
> > + * Author: Marc Zyngier 
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public
> > License
> > + * along with this program.  If not, see
> > .
> > + */
> > +
> > +#include 
> > +#include 
> > +
> > +#include 
> > +
> > +#include "hyp.h"
> > +
> > +/* vcpu is already in the HYP VA space */
> > +void __hyp_text __timer_save_state(struct kvm_vcpu *vcpu)
> > +{
> > +   struct kvm *kvm = kern_hyp_va(vcpu->kvm);
> > +   struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > +
> > +   if (kvm->arch.timer.enabled) {
> > +   timer->cntv_ctl = read_sysreg(cntv_ctl_el0);  
> 
> The old assembler version ands this value with 3 before storing it.
> Are we not worried about the ISTATUS bit?

Not really. It could even be useful to find out about it, to be
honest.

> 
> > +   isb();
> > +   timer->cntv_cval = read_sysreg(cntv_cval_el0);
> > +   }
> > +
> > +   /* Disable the virtual timer */
> > +   write_sysreg(0, cntv_ctl_el0);
> > +
> > +   /* Allow physical timer/counter access for the host */
> > +   write_sysreg(read_sysreg(cnthctl_el2) | 3, cnthctl_el2);  
> 
> nit: EL1PCTEN | EL1PCEN or something similar?

Sure.

> > +
> > +   /* Clear cntvoff for the host */
> > +   write_sysreg(0, cntvoff_el2);
> > +}
> > +
> > +void __hyp_text __timer_restore_state(struct kvm_vcpu *vcpu)
> > +{
> > +   struct kvm *kvm = kern_hyp_va(vcpu->kvm);
> > +   struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > +   u64 val;
> > +
> > +   /*
> > +* Disallow physical timer access for the guest
> > +* Physical counter access is allowed
> > +*/
> > +   val = read_sysreg(cnthctl_el2);
> > +   val &= ~(1 << 1);
> > +   val |= 1;  
> 
> Similar nit here about constants.
> 
> > +   write_sysreg(val, cnthctl_el2);
> > +
> > +   if (kvm->arch.timer.enabled) {
> > +   write_sysreg(kvm->arch.timer.cntvoff, cntvoff_el2);
> > +   write_sysreg(timer->cntv_cval, cntv_cval_el0);
> > +   isb();
> > +   write_sysreg(timer->cntv_ctl, cntv_ctl_el0);  
> 
> In the assembler version we and cntv_ctl with 3 before writing to
> cntv_ctl_el0.

ISTATUS is RO, so masking it was a bit useless.
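
To make the constants nit concrete, here is a rough user-space sketch of
what named bits could look like. The macro and helper names are only
illustrative, not necessarily what will end up in the tree; the bit
positions are the architectural ones for CNTHCTL_EL2 and CNTV_CTL_EL0.

```c
#include <assert.h>
#include <stdint.h>

/* CNTHCTL_EL2 bits (illustrative macro names). */
#define CNTHCTL_EL1PCTEN	(UINT64_C(1) << 0)	/* physical counter access */
#define CNTHCTL_EL1PCEN		(UINT64_C(1) << 1)	/* physical timer access */

/* CNTV_CTL_EL0 bits (illustrative macro names). */
#define CNTV_CTL_ENABLE		(UINT64_C(1) << 0)
#define CNTV_CTL_IMASK		(UINT64_C(1) << 1)
#define CNTV_CTL_ISTATUS	(UINT64_C(1) << 2)	/* read-only status bit */

/* Host: allow both physical counter and physical timer access. */
uint64_t cnthctl_for_host(uint64_t val)
{
	return val | CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
}

/* Guest: counter access stays allowed, timer access is trapped. */
uint64_t cnthctl_for_guest(uint64_t val)
{
	val &= ~CNTHCTL_EL1PCEN;
	val |= CNTHCTL_EL1PCTEN;
	return val;
}

/* What the old "and with 3" kept: ENABLE and IMASK, dropping ISTATUS. */
uint64_t cntv_ctl_without_istatus(uint64_t val)
{
	return val & (CNTV_CTL_ENABLE | CNTV_CTL_IMASK);
}
```

The two cnthctl helpers mirror the "| 3" in the save path and the
"&= ~(1 << 1); |= 1" sequence in the restore path.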

Thanks,

M.
-- 
Jazz is not dead. It just smells funny.


Re: [PATCH] KVM: x86: Add lowest-priority support for vt-d posted-interrupts

2015-11-25 Thread Paolo Bonzini


On 25/11/2015 02:58, Wu, Feng wrote:
> Okay, let me try to understand this clearly:
> - We will have a new KVM command line parameter to indicate whether
>   vector hashing is enabled.
> - If it is not enabled, for PI, we can only support single destination lowest
>   priority interrupts, for non-PI, we continue to use RR.
> - If it is enabled, for PI and non-PI we use vector hashing for both of them.
> 
> Is this the case you have in mind? Thanks a lot!

Yes, thanks!

Paolo


Re: [RFC PATCH V2 3/3] Ixgbevf: Add migration support for ixgbevf driver

2015-11-25 Thread Michael S. Tsirkin
On Wed, Nov 25, 2015 at 01:39:32PM +0800, Lan Tianyu wrote:
> On 2015-11-25 05:20, Michael S. Tsirkin wrote:
> > I have to say, I was much more interested in the idea
> > of tracking dirty memory. I have some thoughts about
> > that one - did you give up on it then?
> 
> No, our final goal is to keep the VF active while migrating,
> and tracking dirty memory is essential for that. But it does
> not seem easy to get that upstream in the short term. As a
> first step, we stop the VF before migration.

Frankly, I don't really see what this short term hack buys us,
and if it goes in, we'll have to maintain it forever.

Also, assuming you just want to do ifdown/ifup for some reason, it's
easy enough to do using a guest agent, in a completely generic way.


> After thinking it over, stopping the VF still requires tracking
> DMA-accessed dirty memory, to make sure that data buffers received
> before the VF was stopped get migrated. It's easier to do that with
> a dummy write to the data buffer when a packet is received.
> 
> 
> -- 
> Best regards
> Tianyu Lan


Re: [PATCH] KVM: x86: don't expose syscall/sysret to intel 32-bit guest

2015-11-25 Thread Wanpeng Li
2015-11-19 19:05 GMT+08:00 Paolo Bonzini :
>
> 1) Clear F(SYSCALL) in kvm_update_cpuid, like you are doing here but
> only if F(LM) is already clear (in addition to the vendor being Intel).

It seems that F(LM) is always set in the case of qemu-system-x86_64 with
a 32-bit guest. VMware also exposes the LM bit to 32-bit guests (although
it doesn't expose syscall/sysret), so maybe we can't depend on F(LM).
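
For reference, option (1) quoted above boils down to roughly the following
filter on CPUID.80000001H:EDX. This is a stand-alone sketch with made-up
names, not the actual kvm_update_cpuid() code; it also shows why the check
does nothing whenever userspace leaves the LM bit set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.80000001H:EDX feature bits. */
#define EXT_EDX_SYSCALL	(UINT32_C(1) << 11)
#define EXT_EDX_LM	(UINT32_C(1) << 29)

/*
 * Hypothetical helper: hide SYSCALL from an Intel guest only when
 * long mode is not exposed either.
 */
uint32_t filter_ext_edx(uint32_t edx, bool vendor_is_intel)
{
	if (vendor_is_intel && !(edx & EXT_EDX_LM))
		edx &= ~EXT_EDX_SYSCALL;
	return edx;
}
```

If LM is set, as described above for qemu-system-x86_64, the SYSCALL bit
passes through unchanged.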

Regards,
Wanpeng Li


[PATCH 5/7] vfio: fix a problematic usage of WARN()

2015-11-25 Thread Geliang Tang
WARN() takes a condition and a format string. The condition was
omitted. So I added it.

Signed-off-by: Geliang Tang 
---
 drivers/vfio/vfio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index de632da..9da0703 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -682,7 +682,7 @@ static int vfio_group_nb_add_dev(struct vfio_group *group, struct device *dev)
return 0;
 
/* TODO Prevent device auto probing */
-   WARN("Device %s added to live group %d!\n", dev_name(dev),
+   WARN(1, "Device %s added to live group %d!\n", dev_name(dev),
 iommu_group_id(group->iommu_group));
 
return 0;
-- 
2.5.0
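
To illustrate why the missing condition matters, here is a simplified
user-space mock. MOCK_WARN and mock_warn_emit are invented for this sketch
and only mimic the argument order of the kernel macro (condition first,
then the format and its arguments); the device name and group number are
made-up values.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Last message "printed" by the mock, kept so it can be inspected. */
char mock_warn_buf[128];

int mock_warn_emit(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(mock_warn_buf, sizeof(mock_warn_buf), fmt, ap);
	va_end(ap);
	return 1;
}

/* Same argument order as WARN(): condition, then format and arguments. */
#define MOCK_WARN(condition, ...) \
	((condition) ? mock_warn_emit(__VA_ARGS__) : 0)

void mock_warn_demo(void)
{
	/*
	 * Buggy form: the format string is taken as the condition (a
	 * non-NULL pointer, so always true), the device name becomes the
	 * format, and the group id is silently ignored.
	 */
	(void)MOCK_WARN("Device %s added to live group %d!\n", "0000:00:01.0", 7);
	assert(strcmp(mock_warn_buf, "0000:00:01.0") == 0);

	/* Fixed form: explicit condition first. */
	(void)MOCK_WARN(1, "Device %s added to live group %d!\n", "0000:00:01.0", 7);
	assert(strcmp(mock_warn_buf,
		      "Device 0000:00:01.0 added to live group 7!\n") == 0);
}
```

In the buggy form a device name containing a '%' would additionally be
interpreted as a format specifier.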




RE: [PATCH v3 15/16] KVM: arm64: implement MSI injection in ITS emulation

2015-11-25 Thread Pavel Fedin
 Hello!

 I have discovered one more issue, and it is a major one. It gets triggered by
VFIO. See inline.

 P.S. What is the overall current status? A long time has passed since the last
email...

> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Andre Przywara
> Sent: Wednesday, October 07, 2015 5:55 PM
> To: marc.zyng...@arm.com; christoffer.d...@linaro.org
> Cc: eric.au...@linaro.org; p.fe...@samsung.com; kvm...@lists.cs.columbia.edu; linux-arm-ker...@lists.infradead.org; kvm@vger.kernel.org
> Subject: [PATCH v3 15/16] KVM: arm64: implement MSI injection in ITS emulation
> 
> When userland wants to inject a MSI into the guest, we have to use
> our data structures to find the LPI number and the VCPU to receive
> the interrupt.
> Use the wrapper functions to iterate the linked lists and find the
> proper Interrupt Translation Table Entry. Then set the pending bit
> in this ITTE to be later picked up by the LR handling code. Kick
> the VCPU which is meant to handle this interrupt.
> We provide a VGIC emulation model specific routine for the actual
> MSI injection. The wrapper functions return an error for models not
> (yet) implementing MSIs (like the GICv2 emulation).
> We also provide the handler for the ITS "INT" command, which allows a
> guest to trigger an MSI via the ITS command queue.
> 
> Signed-off-by: Andre Przywara 
> ---
> Changelog v2..v3:
> - proper checking for unmapped collections
> 
>  include/kvm/arm_vgic.h  |  1 +
>  virt/kvm/arm/its-emul.c | 65 +
>  virt/kvm/arm/its-emul.h |  2 ++
>  virt/kvm/arm/vgic-v3-emul.c |  1 +
>  4 files changed, 69 insertions(+)
> 
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 4ea023c..7911059 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -149,6 +149,7 @@ struct vgic_vm_ops {
>   int (*map_resources)(struct kvm *, const struct vgic_params *);
>   bool(*queue_lpis)(struct kvm_vcpu *);
>   void(*unqueue_lpi)(struct kvm_vcpu *, int irq);
> + int (*inject_msi)(struct kvm *, struct kvm_msi *);
>  };
> 
>  struct vgic_io_device {
> diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
> index 642effb..cd8526a 100644
> --- a/virt/kvm/arm/its-emul.c
> +++ b/virt/kvm/arm/its-emul.c
> @@ -333,6 +333,55 @@ static bool handle_mmio_gits_idregs(struct kvm_vcpu *vcpu,
>  }
> 
>  /*
> + * Translates an incoming MSI request into the redistributor (=VCPU) and
> + * the associated LPI number. Sets the LPI pending bit and also marks the
> + * VCPU as having a pending interrupt.
> + */
> +int vits_inject_msi(struct kvm *kvm, struct kvm_msi *msi)
> +{
> + struct vgic_dist *dist = &kvm->arch.vgic;
> + struct vgic_its *its = &dist->its;
> + struct its_itte *itte;
> + int cpuid;
> + bool inject = false;
> + int ret = 0;
> +
> + if (!vgic_has_its(kvm))
> + return -ENODEV;
> +
> + if (!(msi->flags & KVM_MSI_VALID_DEVID))
> + return -EINVAL;
> +
> + spin_lock(&its->lock);
> +
> + if (!its->enabled || !dist->lpis_enabled) {
> + ret = -EAGAIN;
> + goto out_unlock;
> + }
> +
> + itte = find_itte(kvm, msi->devid, msi->data);
> + /* Triggering an unmapped IRQ gets silently dropped. */
> + if (!itte || !its_is_collection_mapped(itte->collection))
> + goto out_unlock;
> +
> + cpuid = itte->collection->target_addr;
> + __set_bit(cpuid, itte->pending);
> + inject = itte->enabled;
> +
> +out_unlock:
> + spin_unlock(&its->lock);
> +
> + if (inject) {
> + spin_lock(&dist->lock);

 At this point there can be a deadlock, because dist->lock is taken in many
places in KVM. If we are forwarding a VFIO IRQ using irqfds, then
irqfd_wakeup() will directly call kvm_set_msi(), which ends up here. But
interrupts from VFIO devices can arrive at any moment, including while
dist->lock is held by the KVM status update code.
 For now I have added a simple workaround that disables the MSI fast path for
KVM_ARM_HOST, but I don't believe it is a good solution. Can we do better?
 OTOH, I know direct IRQ forwarding is on the way.
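
 To make the scenario explicit, here is a toy single-CPU model of it. None
of this is kernel code: the toy_* names are invented, and the model simply
assumes the interrupt arrives on the CPU that already holds the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a non-recursive spinlock, seen from one CPU. */
struct toy_lock {
	bool held;
};

/* Returns false where a real spin_lock() would spin forever. */
bool toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return false;	/* this CPU already holds it: self-deadlock */
	l->held = true;
	return true;
}

void toy_lock_release(struct toy_lock *l)
{
	l->held = false;
}

/* Stand-in for the injection path reached from the irqfd wakeup. */
bool toy_inject_from_irq(struct toy_lock *dist_lock)
{
	if (!toy_lock_acquire(dist_lock))
		return false;
	/* ... the pending bit would be set here ... */
	toy_lock_release(dist_lock);
	return true;
}

/*
 * Stand-in for status update code holding the dist lock. If the device
 * interrupt fires while the lock is held, the nested acquire cannot
 * succeed. Returns whether the nested injection got through.
 */
bool toy_status_update(struct toy_lock *dist_lock, bool irq_arrives)
{
	bool irq_ok = true;
	bool got = toy_lock_acquire(dist_lock);

	assert(got);
	if (irq_arrives)
		irq_ok = toy_inject_from_irq(dist_lock);
	toy_lock_release(dist_lock);
	return irq_ok;
}
```

 In this model toy_status_update(&lock, true) returns false, which is the
point where the real code would hang.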

> + __set_bit(cpuid, dist->irq_pending_on_cpu);
> + spin_unlock(&dist->lock);
> + kvm_vcpu_kick(kvm_get_vcpu(kvm, cpuid));
> + }
> +
> + return ret;
> +}
> +
> +/*
>   * Find all enabled and pending LPIs and queue them into the list
>   * registers.
>   * The dist lock is held by the caller.
> @@ -812,6 +861,19 @@ static int vits_cmd_handle_movall(struct kvm *kvm, u64 *its_cmd)
>   return 0;
>  }
> 
> +/* The INT command injects the LPI associated with that DevID/EvID pair. */
> +static int vits_cmd_handle_int(struct kvm *kvm, u64 *its_cmd)
> +{
> + struct kvm_msi msi = {
> + .data = its_cmd_get_id(its_cmd),
> + .devid = 

Re: [PATCH] KVM: x86: don't expose syscall/sysret to intel 32-bit guest

2015-11-25 Thread Paolo Bonzini


On 25/11/2015 13:45, Wanpeng Li wrote:
> 2015-11-19 19:05 GMT+08:00 Paolo Bonzini :
>>
>> 1) Clear F(SYSCALL) in kvm_update_cpuid, like you are doing here but
>> only if F(LM) is already clear (in addition to the vendor being Intel).
> 
> It seems that F(LM) is always set in the case of qemu-system-x86_64 with
> a 32-bit guest. VMware also exposes the LM bit to 32-bit guests (although
> it doesn't expose syscall/sysret)

Does it expose the SYSCALL bit?

Paolo


Re: [PATCH 5/7] vfio: fix a problematic usage of WARN()

2015-11-25 Thread Alex Williamson
On Wed, 2015-11-25 at 21:12 +0800, Geliang Tang wrote:
> WARN() takes a condition and a format string. The condition was
> omitted. So I added it.
> 
> Signed-off-by: Geliang Tang 
> ---
>  drivers/vfio/vfio.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index de632da..9da0703 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -682,7 +682,7 @@ static int vfio_group_nb_add_dev(struct vfio_group *group, struct device *dev)
>   return 0;
>  
>   /* TODO Prevent device auto probing */
> - WARN("Device %s added to live group %d!\n", dev_name(dev),
> + WARN(1, "Device %s added to live group %d!\n", dev_name(dev),
>iommu_group_id(group->iommu_group));
>  
>   return 0;

This was already reported and I've got a patch queued to resolve it:

https://www.mail-archive.com/kvm@vger.kernel.org/msg123061.html

Thanks,

Alex



Re: [PATCH] KVM: x86: Add lowest-priority support for vt-d posted-interrupts

2015-11-25 Thread Radim Krcmár
2015-11-25 03:21+, Wu, Feng:
> From: Radim Krčmář [mailto:rkrc...@redhat.com]
>> The hash function just interprets a subset of vector's bits as a number
>> and uses that as a starting offset in a search for an enabled APIC
>> within the destination set?
>> 
>> For example:
>> The x2APIC destination is 0x0055 (= first four even APICs in cluster
>> 0), the vector is 0b1110, and bits 10:8 of IntControl are 000.
>> 
>> 000 means that bits 7:4 of vector are selected, thus the vector hash is
>> 0b1110 = 14, so the round-robin effectively does 14 % 4 (because we only
>> have 4 destinations) and delivers to the 3rd possible APIC (= ID 6)?
> 
> In my current implementation, I don't select a subset of the vector's bits
> as the number; instead, I use the whole vector number. From a software
> emulation point of view, do we really need to select a subset of the
> vector's bits as the base number? What is your opinion? Thanks a lot!

I think it's ok to pick any algorithm we like.  It's unlikely that
software would recognize and take advantage of the hardware algorithm
without adding a special treatment for KVM.
(I'd vote for the simple pick-first-APIC lowest priority algorithm ...
 I don't see much point in complicating lowest priority when it doesn't
 deliver to lowest priority CPU anyway.)

I mainly wanted to know what real hardware really does, because there are
a lot of alternatives that still fit into the Xeon documentation.
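
For the sake of discussion, the modulo-based selection from the quoted
example could be sketched like this. It is a stand-alone illustration with
an invented function name, assuming a 16-bit logical destination mask and a
zero-based index into its set bits; it is not a claim about what the
hardware does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick the (hash % number-of-destinations)-th set bit of dest_mask,
 * counting from bit 0. Returns the bit index, or -1 for an empty mask.
 */
int pick_dest_by_vector_hash(uint16_t dest_mask, unsigned int hash)
{
	unsigned int target;
	int count = 0;
	int bit;

	for (bit = 0; bit < 16; bit++)
		if (dest_mask & (1u << bit))
			count++;
	if (count == 0)
		return -1;

	target = hash % (unsigned int)count;
	for (bit = 0; bit < 16; bit++) {
		if (!(dest_mask & (1u << bit)))
			continue;
		if (target == 0)
			return bit;
		target--;
	}
	return -1;	/* not reached for a non-empty mask */
}
```

With mask 0x0055 and hash 14 this zero-based variant returns bit 4, whereas
the quoted example arrives at ID 6, so the counting convention there is
evidently different. The hash argument can be either a subset of the vector
bits or the whole vector number.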


Re: [PATCH] KVM: x86: Add lowest-priority support for vt-d posted-interrupts

2015-11-25 Thread Paolo Bonzini


On 25/11/2015 15:12, Radim Krcmár wrote:
> I think it's ok to pick any algorithm we like.  It's unlikely that
> software would recognize and take advantage of the hardware algorithm
> without adding a special treatment for KVM.
> (I'd vote for the simple pick-first-APIC lowest priority algorithm ...
>  I don't see much point in complicating lowest priority when it doesn't
>  deliver to lowest priority CPU anyway.)

Vector hashing is an improvement for the common case where all vectors
are set to all CPUs.  Sure you can get an unlucky assignment, but it's
still better than pick-first-APIC.

Paolo