Re: [PATCH v2 4/5] KVM: add KVM_USER_EXIT vcpu ioctl for userspace exit

2015-08-18 Thread Avi Kivity

On 08/18/2015 10:57 PM, Paolo Bonzini wrote:


On 18/08/2015 11:30, Avi Kivity wrote:

KVM_USER_EXIT in practice should be so rare (at least with in-kernel
LAPIC) that I don't think this matters.  KVM_USER_EXIT is relatively
uninteresting, it only exists to provide an alternative to signals that
doesn't require expensive atomics on each and every KVM_RUN. :(

Ah, so the idea is to remove the cost of changing the signal mask?

Yes, it's explained in the cover letter.


Yes, although it looks like a thread-local operation, it takes a
process-wide lock.

IIRC the lock was only task-wide and uncontended.  Problem is, it's on
the node that created the thread rather than the node that is running
it, and inter-node atomics are really, really slow.


Cached inter-node atomics are (relatively) fast, but I think it really 
is a process-wide lock:


sigprocmask calls:

void __set_current_blocked(const sigset_t *newset)
{
        struct task_struct *tsk = current;

        spin_lock_irq(&tsk->sighand->siglock);
        __set_task_blocked(tsk, newset);
        spin_unlock_irq(&tsk->sighand->siglock);
}

struct sighand_struct {
        atomic_t                count;
        struct k_sigaction      action[_NSIG];
        spinlock_t              siglock;
        wait_queue_head_t       signalfd_wqh;
};

Since sigaction is usually process-wide, I conclude that tsk->sighand
is too.






For guests spanning >1 host NUMA nodes it's not really practical to
ensure that the thread is created on the right node.  Even for guests
that fit into 1 host node, if you rely on AutoNUMA the VCPUs are created
too early for AutoNUMA to have any effect.  And newer machines have
frighteningly small nodes (two nodes per socket, so it's something like
7 pCPUs if you don't have hyper-threading enabled).  True, the NUMA
penalty within the same socket is not huge, but it still costs a few
thousand clock cycles on vmexit.flat and this feature sweeps it away
completely.


I expect most user wakeups are via irqfd, so indeed the performance of
KVM_USER_EXIT is uninteresting.

Yup, either irqfd or KVM_SET_SIGNAL_MSI.

Paolo


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/12] kvm/x86: added hyper-v crash data and ctl msr's get/set'ers

2015-08-18 Thread Wanpeng Li

On 7/3/15 8:01 PM, Denis V. Lunev wrote:

From: Andrey Smetanin 

Added Hyper-V crash MSR (HV_X64_MSR_CRASH*) data and control
getters and setters. Userspace should check that such MSRs are
available via the KVM_CAP_HYPERV_MSR_CRASH capability.


I didn't see the KVM_CAP_HYPERV_MSR_CRASH in this patchset. :(

Regards,
Wanpeng Li


Re: [PATCH 7/12] kvm/x86: added hyper-v crash data and ctl msr's get/set'ers

2015-08-18 Thread Denis V. Lunev

On 08/18/2015 05:41 PM, Wanpeng Li wrote:

On 7/3/15 8:01 PM, Denis V. Lunev wrote:

From: Andrey Smetanin 

Added Hyper-V crash MSR (HV_X64_MSR_CRASH*) data and control
getters and setters. Userspace should check that such MSRs are
available via the KVM_CAP_HYPERV_MSR_CRASH capability.


I didn't see the KVM_CAP_HYPERV_MSR_CRASH in this patchset. :(

Regards,
Wanpeng Li

Actually, I have not updated the comment. Sorry.
This cap has been gone since the previous revision; the
presence of this feature in KVM is now probed through
MSR availability.

Den
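A sketch of the probing scheme Denis describes: rather than testing a
capability, userspace asks KVM for its supported MSR list (the
KVM_GET_MSR_INDEX_LIST system ioctl on /dev/kvm) and looks for the Hyper-V
crash MSRs. The MSR constant below and the ioctl usage in the comment are
assumptions based on the usual Hyper-V definitions, not taken from this
patchset:

```c
#include <stdint.h>

#define HV_X64_MSR_CRASH_CTL 0x40000105u  /* assumed Hyper-V crash-ctl MSR */

/* Return 1 if `msr` appears in an MSR index list of length `n`. */
static int msr_list_contains(const uint32_t *indices, unsigned int n,
			     uint32_t msr)
{
	unsigned int i;

	for (i = 0; i < n; i++)
		if (indices[i] == msr)
			return 1;
	return 0;
}

/* Typical use (requires /dev/kvm; not run here):
 *
 *   struct kvm_msr_list *list = malloc(sizeof(*list) + nmsrs * 4);
 *   list->nmsrs = nmsrs;
 *   ioctl(kvm_fd, KVM_GET_MSR_INDEX_LIST, list);
 *   int have_crash = msr_list_contains(list->indices, list->nmsrs,
 *                                      HV_X64_MSR_CRASH_CTL);
 */
```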


Re: [PATCH v2 4/5] KVM: add KVM_USER_EXIT vcpu ioctl for userspace exit

2015-08-18 Thread Paolo Bonzini


On 18/08/2015 11:30, Avi Kivity wrote:
>> KVM_USER_EXIT in practice should be so rare (at least with in-kernel
>> LAPIC) that I don't think this matters.  KVM_USER_EXIT is relatively
>> uninteresting, it only exists to provide an alternative to signals that
>> doesn't require expensive atomics on each and every KVM_RUN. :(
> 
> Ah, so the idea is to remove the cost of changing the signal mask?

Yes, it's explained in the cover letter.

> Yes, although it looks like a thread-local operation, it takes a
> process-wide lock.

IIRC the lock was only task-wide and uncontended.  Problem is, it's on
the node that created the thread rather than the node that is running
it, and inter-node atomics are really, really slow.

For guests spanning >1 host NUMA nodes it's not really practical to
ensure that the thread is created on the right node.  Even for guests
that fit into 1 host node, if you rely on AutoNUMA the VCPUs are created
too early for AutoNUMA to have any effect.  And newer machines have
frighteningly small nodes (two nodes per socket, so it's something like
7 pCPUs if you don't have hyper-threading enabled).  True, the NUMA
penalty within the same socket is not huge, but it still costs a few
thousand clock cycles on vmexit.flat and this feature sweeps it away
completely.

> I expect most user wakeups are via irqfd, so indeed the performance of
> KVM_USER_EXIT is uninteresting.

Yup, either irqfd or KVM_SET_SIGNAL_MSI.

Paolo


[RFC PATCH 2/5] KVM: add KVM_EXIT_MSR exit reason and capability.

2015-08-18 Thread Peter Hornyack
Define KVM_EXIT_MSR, a new exit reason for accesses to MSRs that kvm
does not handle. Define KVM_CAP_UNHANDLED_MSR_EXITS, a vm-wide
capability that guards the new exit reason and which can be enabled via
the KVM_ENABLE_CAP ioctl.

Signed-off-by: Peter Hornyack 
---
 Documentation/virtual/kvm/api.txt | 48 +++
 include/uapi/linux/kvm.h  | 14 
 2 files changed, 62 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index a4ebcb712375..bda540b3dd03 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3302,6 +3302,36 @@ Valid values for 'type' are:
to ignore the request, or to gather VM memory core dump and/or
reset/shutdown of the VM.
 
+   /* KVM_EXIT_MSR */
+   struct {
+#define KVM_EXIT_MSR_RDMSR 1
+#define KVM_EXIT_MSR_WRMSR 2
+#define KVM_EXIT_MSR_COMPLETION_FAILED 3
+   __u8 direction; /* out */
+#define KVM_EXIT_MSR_UNHANDLED 1
+#define KVM_EXIT_MSR_HANDLED   2
+   __u8 handled;   /* in */
+   __u32 index;/* i.e. ecx; out */
+   __u64 data; /* out (wrmsr) / in (rdmsr) */
+   } msr;
+
+If exit_reason is KVM_EXIT_MSR, then the vcpu has executed a rdmsr or wrmsr
+instruction which could not be satisfied by kvm. The msr struct is used for
+both output to and input from user space. direction indicates whether the
+instruction was rdmsr or wrmsr, and index is the target MSR number held in
+ecx. User space must not modify index. data holds the payload from a wrmsr or
+must be filled in with a payload on a rdmsr.
+
+On the return path into kvm, user space should set handled to
+KVM_EXIT_MSR_HANDLED if it successfully handled the MSR access; otherwise,
+handled should be set to KVM_EXIT_MSR_UNHANDLED, which will cause a general
+protection fault to be injected into the vcpu. If an error occurs during the
+return into kvm, the vcpu will not be run and another KVM_EXIT_MSR will be
+generated with direction KVM_EXIT_MSR_COMPLETION_FAILED.
+
+KVM_EXIT_MSR can only occur when KVM_CAP_UNHANDLED_MSR_EXITS has been enabled;
+a detailed description of this capability is below.
+
/* Fix the size of the union. */
char padding[256];
};
@@ -3620,6 +3650,24 @@ struct {
 
 KVM handlers should exit to userspace with rc = -EREMOTE.
 
+7.5 KVM_CAP_UNHANDLED_MSR_EXITS
+
+Architectures: x86 (vmx-only)
+Parameters: args[0] enables or disables unhandled MSR exits
+Returns: 0 on success; -1 on error
+
+This capability enables exits to user space on unhandled MSR accesses.
+
+When enabled (args[0] != 0), when the guest accesses an MSR that kvm does not
+handle kvm will exit to user space with the reason KVM_EXIT_MSR. When disabled
+(by default, or with args[0] == 0), when the guest accesses an MSR that kvm
+does not handle a GP fault is immediately injected into the guest.
+
+Currently only implemented for vmx; attempts to enable this capability on svm
+systems will return an error. Also, note that this capability is overridden if
+the kvm module's ignore_msrs flag is set, in which case unhandled MSR accesses
+are simply ignored and the guest is re-entered immediately.
+
 
 8. Other capabilities.
 --
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0d831f94f8a8..43d2d1e15ac4 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -183,6 +183,7 @@ struct kvm_s390_skeys {
 #define KVM_EXIT_EPR  23
 #define KVM_EXIT_SYSTEM_EVENT 24
 #define KVM_EXIT_S390_STSI25
+#define KVM_EXIT_MSR  26
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -330,6 +331,18 @@ struct kvm_run {
__u8 sel1;
__u16 sel2;
} s390_stsi;
+   /* KVM_EXIT_MSR */
+   struct {
+#define KVM_EXIT_MSR_RDMSR 1
+#define KVM_EXIT_MSR_WRMSR 2
+#define KVM_EXIT_MSR_COMPLETION_FAILED 3
+   __u8 direction; /* out */
+#define KVM_EXIT_MSR_UNHANDLED 1
+#define KVM_EXIT_MSR_HANDLED   2
+   __u8 handled;   /* in */
+   __u32 index;/* i.e. ecx; out */
+   __u64 data; /* out (wrmsr) / in (rdmsr) */
+   } msr;
/* Fix the size of the union. */
char padding[256];
};
@@ -819,6 +832,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_DISABLE_QUIRKS 116
 #define KVM_CAP_X86_SMM 117
 #define KVM_CAP_MULTI_ADDRESS_SPACE 118
+#define KVM_CAP_UNHANDLED_MSR_EXITS 119
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.5.0.276.gf5e568e
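To make the protocol described in the api.txt hunk above concrete, here is a
minimal sketch of the userspace side of a KVM_EXIT_MSR round trip. The struct
mirrors the kvm_run msr block from the patch; emulating MSR_PLATFORM_INFO
(mentioned in the cover letter) as zero is purely an illustrative assumption:

```c
#include <stdint.h>

/* Local mirror of the kvm_run msr block from this patch (layout assumed). */
struct msr_exit {
	uint8_t  direction;  /* out: KVM_EXIT_MSR_RDMSR / _WRMSR */
	uint8_t  handled;    /* in:  _HANDLED / _UNHANDLED */
	uint32_t index;      /* out: MSR number (ecx) */
	uint64_t data;       /* out (wrmsr) / in (rdmsr) */
};

#define KVM_EXIT_MSR_RDMSR     1
#define KVM_EXIT_MSR_WRMSR     2
#define KVM_EXIT_MSR_UNHANDLED 1
#define KVM_EXIT_MSR_HANDLED   2

#define MSR_PLATFORM_INFO 0xce   /* example MSR from the cover letter */

/* Emulate reads of MSR_PLATFORM_INFO as zero; refuse everything else,
 * which makes kvm inject a #GP on reentry. */
static void handle_msr_exit(struct msr_exit *msr)
{
	if (msr->direction == KVM_EXIT_MSR_RDMSR &&
	    msr->index == MSR_PLATFORM_INFO) {
		msr->data = 0;                         /* benign default */
		msr->handled = KVM_EXIT_MSR_HANDLED;
	} else {
		msr->handled = KVM_EXIT_MSR_UNHANDLED;
	}
	/* Per the documentation, `index` must not be modified here. */
}
```

In a real VMM this would run after KVM_RUN returns with
run->exit_reason == KVM_EXIT_MSR, before reentering the vcpu.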



[RFC PATCH 3/5] KVM: x86: add msr_exits_supported to kvm_x86_ops

2015-08-18 Thread Peter Hornyack
msr_exits_supported will be checked when user space attempts to enable
the KVM_CAP_UNHANDLED_MSR_EXITS capability for the vm. This is needed
because MSR exit support will be implemented for vmx but not svm later
in this patchset.

Signed-off-by: Peter Hornyack 
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/svm.c  | 6 ++
 arch/x86/kvm/vmx.c  | 6 ++
 3 files changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c12e845f59e6..a6e145b1e271 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -854,6 +854,7 @@ struct kvm_x86_ops {
void (*handle_external_intr)(struct kvm_vcpu *vcpu);
bool (*mpx_supported)(void);
bool (*xsaves_supported)(void);
+   bool (*msr_exits_supported)(void);
 
int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr);
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 74d825716f4f..bcbb56f49b9f 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4249,6 +4249,11 @@ static bool svm_xsaves_supported(void)
return false;
 }
 
+static bool svm_msr_exits_supported(void)
+{
+   return false;
+}
+
 static bool svm_has_wbinvd_exit(void)
 {
return true;
@@ -4540,6 +4545,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.invpcid_supported = svm_invpcid_supported,
.mpx_supported = svm_mpx_supported,
.xsaves_supported = svm_xsaves_supported,
+   .msr_exits_supported = svm_msr_exits_supported,
 
.set_supported_cpuid = svm_set_supported_cpuid,
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index acc38e27d221..27fec385d79d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8161,6 +8161,11 @@ static bool vmx_xsaves_supported(void)
SECONDARY_EXEC_XSAVES;
 }
 
+static bool vmx_msr_exits_supported(void)
+{
+   return false;
+}
+
 static void vmx_recover_nmi_blocking(struct vcpu_vmx *vmx)
 {
u32 exit_intr_info;
@@ -10413,6 +10418,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.handle_external_intr = vmx_handle_external_intr,
.mpx_supported = vmx_mpx_supported,
.xsaves_supported = vmx_xsaves_supported,
+   .msr_exits_supported = vmx_msr_exits_supported,
 
.check_nested_events = vmx_check_nested_events,
 
-- 
2.5.0.276.gf5e568e



[RFC PATCH 1/5] KVM: x86: refactor vmx rdmsr/wrmsr completion into new functions

2015-08-18 Thread Peter Hornyack
After handling a rdmsr or wrmsr, refactor the success and failure code
paths into separate functions. This will allow us to also complete or
fail MSR accesses on the entry path from userspace into kvm.

Signed-off-by: Peter Hornyack 
---
 arch/x86/kvm/vmx.c | 44 +---
 1 file changed, 33 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 37eae551857c..acc38e27d221 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5463,6 +5463,34 @@ static int handle_cpuid(struct kvm_vcpu *vcpu)
return 1;
 }
 
+static void complete_rdmsr(struct kvm_vcpu *vcpu, const struct msr_data *msr)
+{
+   trace_kvm_msr_read(msr->index, msr->data);
+
+   /* FIXME: handling of bits 32:63 of rax, rdx */
+   vcpu->arch.regs[VCPU_REGS_RAX] = msr->data & -1u;
+   vcpu->arch.regs[VCPU_REGS_RDX] = (msr->data >> 32) & -1u;
+   skip_emulated_instruction(vcpu);
+}
+
+static void fail_rdmsr(struct kvm_vcpu *vcpu, const struct msr_data *msr)
+{
+   trace_kvm_msr_read_ex(msr->index);
+   kvm_inject_gp(vcpu, 0);
+}
+
+static void complete_wrmsr(struct kvm_vcpu *vcpu, const struct msr_data *msr)
+{
+   trace_kvm_msr_write(msr->index, msr->data);
+   skip_emulated_instruction(vcpu);
+}
+
+static void fail_wrmsr(struct kvm_vcpu *vcpu, const struct msr_data *msr)
+{
+   trace_kvm_msr_write_ex(msr->index, msr->data);
+   kvm_inject_gp(vcpu, 0);
+}
+
 static int handle_rdmsr(struct kvm_vcpu *vcpu)
 {
u32 ecx = vcpu->arch.regs[VCPU_REGS_RCX];
@@ -5471,17 +5499,12 @@ static int handle_rdmsr(struct kvm_vcpu *vcpu)
msr_info.index = ecx;
msr_info.host_initiated = false;
if (vmx_get_msr(vcpu, &msr_info)) {
-   trace_kvm_msr_read_ex(ecx);
-   kvm_inject_gp(vcpu, 0);
+   fail_rdmsr(vcpu, &msr_info);
return 1;
}
 
-   trace_kvm_msr_read(ecx, msr_info.data);
+   complete_rdmsr(vcpu, &msr_info);
 
-   /* FIXME: handling of bits 32:63 of rax, rdx */
-   vcpu->arch.regs[VCPU_REGS_RAX] = msr_info.data & -1u;
-   vcpu->arch.regs[VCPU_REGS_RDX] = (msr_info.data >> 32) & -1u;
-   skip_emulated_instruction(vcpu);
return 1;
 }
 
@@ -5496,13 +5519,12 @@ static int handle_wrmsr(struct kvm_vcpu *vcpu)
msr.index = ecx;
msr.host_initiated = false;
if (kvm_set_msr(vcpu, &msr) != 0) {
-   trace_kvm_msr_write_ex(ecx, data);
-   kvm_inject_gp(vcpu, 0);
+   fail_wrmsr(vcpu, &msr);
return 1;
}
 
-   trace_kvm_msr_write(ecx, data);
-   skip_emulated_instruction(vcpu);
+   complete_wrmsr(vcpu, &msr);
+
return 1;
 }
 
-- 
2.5.0.276.gf5e568e



[RFC PATCH 5/5] KVM: x86: add trace events for unhandled MSR exits

2015-08-18 Thread Peter Hornyack
Add trace_kvm_userspace_msr and call it when user space reenters kvm
after KVM_EXIT_MSR.

Add KVM_EXIT_MSR to kvm_trace_exit_reason list.

Signed-off-by: Peter Hornyack 
---
 arch/x86/kvm/trace.h   | 28 
 arch/x86/kvm/vmx.c |  4 
 arch/x86/kvm/x86.c |  1 +
 include/trace/events/kvm.h |  2 +-
 4 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 4eae7c35ddf5..6d144d424896 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -330,6 +330,34 @@ TRACE_EVENT(kvm_msr,
 #define trace_kvm_msr_read_ex(ecx) trace_kvm_msr(0, ecx, 0, true)
 #define trace_kvm_msr_write_ex(ecx, data)  trace_kvm_msr(1, ecx, data, true)
 
+TRACE_EVENT(kvm_userspace_msr,
+   TP_PROTO(u8 direction, u8 handled, u32 index, u64 data),
+   TP_ARGS(direction, handled, index, data),
+
+   TP_STRUCT__entry(
+   __field(u8, direction)
+   __field(u8, handled)
+   __field(u32, index)
+   __field(u64, data)
+   ),
+
+   TP_fast_assign(
+   __entry->direction  = direction;
+   __entry->handled= handled;
+   __entry->index  = index;
+   __entry->data   = data;
+   ),
+
+   TP_printk("userspace %s %x = 0x%llx, %s",
+ __entry->direction == KVM_EXIT_MSR_RDMSR ? "rdmsr" :
+ __entry->direction == KVM_EXIT_MSR_WRMSR ? "wrmsr" :
+"unknown!",
+ __entry->index, __entry->data,
+ __entry->handled == KVM_EXIT_MSR_UNHANDLED ? "unhandled" :
+ __entry->handled == KVM_EXIT_MSR_HANDLED   ? "handled" :
+  "unknown!")
+);
+
 /*
  * Tracepoint for guest CR access.
  */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ba26d382d785..46d276235f78 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5500,6 +5500,10 @@ static int vmx_complete_userspace_msr(struct kvm_vcpu *vcpu)
 {
struct msr_data msr;
 
+   trace_kvm_userspace_msr(vcpu->run->msr.direction,
+   vcpu->run->msr.handled, vcpu->run->msr.index,
+   vcpu->run->msr.data);
+
if (vcpu->run->msr.index != vcpu->arch.regs[VCPU_REGS_RCX]) {
pr_debug("msr.index 0x%x changed, does not match ecx 0x%lx\n",
 vcpu->run->msr.index,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5c22f4655741..cc74ba1d01e6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8040,6 +8040,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_userspace_msr);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_cr);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmrun);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit);
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index a44062da684b..aa6ce656d658 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -14,7 +14,7 @@
ERSN(SHUTDOWN), ERSN(FAIL_ENTRY), ERSN(INTR), ERSN(SET_TPR),\
ERSN(TPR_ACCESS), ERSN(S390_SIEIC), ERSN(S390_RESET), ERSN(DCR),\
ERSN(NMI), ERSN(INTERNAL_ERROR), ERSN(OSI), ERSN(PAPR_HCALL),   \
-   ERSN(S390_UCONTROL), ERSN(WATCHDOG), ERSN(S390_TSCH)
+   ERSN(S390_UCONTROL), ERSN(WATCHDOG), ERSN(S390_TSCH), ERSN(MSR)
 
 TRACE_EVENT(kvm_userspace_exit,
TP_PROTO(__u32 reason, int errno),
-- 
2.5.0.276.gf5e568e



[RFC PATCH 0/5] KVM: x86: exit to user space on unhandled MSR accesses

2015-08-18 Thread Peter Hornyack
There are numerous MSRs that kvm does not currently handle. On Intel
platforms we have observed guest VMs accessing some of these MSRs (for
example, MSR_PLATFORM_INFO) and behaving poorly (to the point of guest OS
crashes) when they receive a GP fault because the MSR is not emulated. This
patchset adds a new kvm exit path for unhandled MSR accesses that allows
user space to emulate additional MSRs without having to implement them in
kvm.

The core of the patchset modifies the vmx handle_rdmsr and handle_wrmsr
functions to exit to user space on MSR reads/writes that kvm can't handle
itself. Then, on the return path into kvm we check for outstanding user
space MSR completions and either complete the MSR access successfully or
inject a GP fault as kvm would do by default. This new exit path must be
enabled for the vm via the KVM_CAP_UNHANDLED_MSR_EXITS capability.

In the future we plan to extend this functionality to allow user space to
register the MSRs that it would like to handle itself, even if kvm already
provides an implementation. In the long-term we will move the
implementation of all non-performance-sensitive MSRs to user space,
reducing the potential attack surface of kvm and allowing us to respond to
bugs more quickly.

This patchset has been tested with our non-qemu user space hypervisor on
vmx platforms; svm support is not implemented.
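A hedged sketch of how userspace would turn the new exit path on, per
section 7.5 of the documentation patch. The struct layout and ioctl number
are assumptions (they would normally come from <linux/kvm.h>), and the
capability number is taken from this RFC, so both could change:

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local mirror of struct kvm_enable_cap (layout assumed). */
struct enable_cap {
	uint32_t cap;
	uint32_t flags;
	uint64_t args[4];
	uint8_t  pad[64];
};

#define KVM_CAP_UNHANDLED_MSR_EXITS 119     /* from this RFC; may change */
#define KVM_ENABLE_CAP_NR 0x4068aea3u       /* _IOW(0xAE, 0xa3, ...); assumed */

/* Enable unhandled-MSR exits on a vm fd; returns the ioctl result
 * (0 on success, -1 on error, e.g. on svm where it is unsupported). */
static int enable_msr_exits(int vm_fd)
{
	struct enable_cap cap;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_UNHANDLED_MSR_EXITS;
	cap.args[0] = 1;                    /* nonzero = enable */
	return ioctl(vm_fd, KVM_ENABLE_CAP_NR, &cap);
}
```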

Peter Hornyack (5):
  KVM: x86: refactor vmx rdmsr/wrmsr completion into new functions
  KVM: add KVM_EXIT_MSR exit reason and capability.
  KVM: x86: add msr_exits_supported to kvm_x86_ops
  KVM: x86: enable unhandled MSR exits for vmx
  KVM: x86: add trace events for unhandled MSR exits

 Documentation/virtual/kvm/api.txt |  48 +++
 arch/x86/include/asm/kvm_host.h   |   2 +
 arch/x86/kvm/svm.c|   6 ++
 arch/x86/kvm/trace.h  |  28 +
 arch/x86/kvm/vmx.c| 126 ++
 arch/x86/kvm/x86.c|  13 
 include/trace/events/kvm.h|   2 +-
 include/uapi/linux/kvm.h  |  14 +
 8 files changed, 227 insertions(+), 12 deletions(-)

-- 
2.5.0.276.gf5e568e



[RFC PATCH 4/5] KVM: x86: enable unhandled MSR exits for vmx

2015-08-18 Thread Peter Hornyack
Set the vm's unhandled_msr_exits flag when user space calls the
KVM_ENABLE_CAP ioctl with KVM_CAP_UNHANDLED_MSR_EXITS. After kvm fails
to handle a guest rdmsr or wrmsr, check this flag and exit to user space
with KVM_EXIT_MSR rather than immediately injecting a GP fault.

On reentry into kvm, use the complete_userspace_io callback path to call
vmx_complete_userspace_msr. Complete the MSR access if user space was
able to handle it successfully, or fail the MSR access and inject a GP
fault if user space could not handle the access.

Signed-off-by: Peter Hornyack 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/vmx.c  | 74 -
 arch/x86/kvm/x86.c  | 12 +++
 3 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a6e145b1e271..1b06cea06a8e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -682,6 +682,7 @@ struct kvm_arch {
u32 bsp_vcpu_id;
 
u64 disabled_quirks;
+   bool unhandled_msr_exits;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 27fec385d79d..ba26d382d785 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "kvm_cache_regs.h"
 #include "x86.h"
 
@@ -5491,6 +5492,53 @@ static void fail_wrmsr(struct kvm_vcpu *vcpu, const struct msr_data *msr)
kvm_inject_gp(vcpu, 0);
 }
 
+/*
+ * On success, returns 1 so that __vcpu_run() will happen next. On error,
+ * returns 0.
+ */
+static int vmx_complete_userspace_msr(struct kvm_vcpu *vcpu)
+{
+   struct msr_data msr;
+
+   if (vcpu->run->msr.index != vcpu->arch.regs[VCPU_REGS_RCX]) {
+   pr_debug("msr.index 0x%x changed, does not match ecx 0x%lx\n",
+vcpu->run->msr.index,
+vcpu->arch.regs[VCPU_REGS_RCX]);
+   goto err_out;
+   }
+   msr.index = vcpu->run->msr.index;
+   msr.data = vcpu->run->msr.data;
+   msr.host_initiated = false;
+
+   switch (vcpu->run->msr.direction) {
+   case KVM_EXIT_MSR_RDMSR:
+   if (vcpu->run->msr.handled == KVM_EXIT_MSR_HANDLED)
+   complete_rdmsr(vcpu, &msr);
+   else
+   fail_rdmsr(vcpu, &msr);
+   break;
+   case KVM_EXIT_MSR_WRMSR:
+   if (vcpu->run->msr.handled == KVM_EXIT_MSR_HANDLED)
+   complete_wrmsr(vcpu, &msr);
+   else
+   fail_wrmsr(vcpu, &msr);
+   break;
+   default:
+   pr_debug("bad msr.direction %u\n", vcpu->run->msr.direction);
+   goto err_out;
+   }
+
+   return 1;
+err_out:
+   vcpu->run->exit_reason = KVM_EXIT_MSR;
+   vcpu->run->msr.direction = KVM_EXIT_MSR_COMPLETION_FAILED;
+   return 0;
+}
+
+/*
+ * Returns 1 if the rdmsr handling is complete; returns 0 if kvm should exit to
+ * user space to handle this rdmsr.
+ */
 static int handle_rdmsr(struct kvm_vcpu *vcpu)
 {
u32 ecx = vcpu->arch.regs[VCPU_REGS_RCX];
@@ -5499,6 +5547,16 @@ static int handle_rdmsr(struct kvm_vcpu *vcpu)
msr_info.index = ecx;
msr_info.host_initiated = false;
if (vmx_get_msr(vcpu, &msr_info)) {
+   if (vcpu->kvm->arch.unhandled_msr_exits) {
+   vcpu->run->exit_reason = KVM_EXIT_MSR;
+   vcpu->run->msr.direction = KVM_EXIT_MSR_RDMSR;
+   vcpu->run->msr.index = msr_info.index;
+   vcpu->run->msr.data = 0;
+   vcpu->run->msr.handled = 0;
+   vcpu->arch.complete_userspace_io =
+   vmx_complete_userspace_msr;
+   return 0;
+   }
fail_rdmsr(vcpu, &msr_info);
return 1;
}
@@ -5508,6 +5566,10 @@ static int handle_rdmsr(struct kvm_vcpu *vcpu)
return 1;
 }
 
+/*
+ * Returns 1 if the wrmsr handling is complete; returns 0 if kvm should exit to
+ * user space to handle this wrmsr.
+ */
 static int handle_wrmsr(struct kvm_vcpu *vcpu)
 {
struct msr_data msr;
@@ -5519,6 +5581,16 @@ static int handle_wrmsr(struct kvm_vcpu *vcpu)
msr.index = ecx;
msr.host_initiated = false;
if (kvm_set_msr(vcpu, &msr) != 0) {
+   if (vcpu->kvm->arch.unhandled_msr_exits) {
+   vcpu->run->exit_reason = KVM_EXIT_MSR;
+   vcpu->run->msr.direction = KVM_EXIT_MSR_WRMSR;
+   vcpu->run->msr.index = msr.index;
+   vcpu->run->msr.data = msr.data;
+   vcpu->run->msr.handled = 0;
+   vcpu->arch.complete_userspace_io =
+   vmx_complete_userspace_msr;
+   return 0;

Re: [PATCH v2 4/5] KVM: add KVM_USER_EXIT vcpu ioctl for userspace exit

2015-08-18 Thread Avi Kivity

On 08/17/2015 04:15 PM, Paolo Bonzini wrote:


On 16/08/2015 13:27, Avi Kivity wrote:

On 08/05/2015 07:33 PM, Radim Krčmář wrote:

The guest can use KVM_USER_EXIT instead of a signal-based exiting to
userspace.  Availability depends on KVM_CAP_USER_EXIT.
Only x86 is implemented so far.

Signed-off-by: Radim Krčmář 
---
   v2:
* use vcpu ioctl instead of vm one [4/5]
* shrink kvm_user_exit from 64 to 32 bytes [4/5]

   Documentation/virtual/kvm/api.txt | 30 ++
   arch/x86/kvm/x86.c| 24 
   include/uapi/linux/kvm.h  |  7 +++
   virt/kvm/kvm_main.c   |  5 +++--
   4 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 3c714d43a717..c5844f0b8e7c 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3020,6 +3020,36 @@ Returns: 0 on success, -1 on error
 Queues an SMI on the thread's vcpu.
   +
+4.97 KVM_USER_EXIT
+
+Capability: KVM_CAP_USER_EXIT
+Architectures: x86
+Type: vcpu ioctl
+Parameters: struct kvm_user_exit (in)
+Returns: 0 on success,
+ -EFAULT if the parameter couldn't be read,
+ -EINVAL if 'reserved' is not zeroed,
+
+struct kvm_user_exit {
+__u8 reserved[32];
+};
+
+The ioctl is asynchronous to VCPU execution and can be issued from
all threads.

This breaks an invariant of vcpu ioctls, and also forces a cacheline
bounce when we fget() the vcpu fd.

KVM_USER_EXIT in practice should be so rare (at least with in-kernel
LAPIC) that I don't think this matters.  KVM_USER_EXIT is relatively
uninteresting, it only exists to provide an alternative to signals that
doesn't require expensive atomics on each and every KVM_RUN. :(


Ah, so the idea is to remove the cost of changing the signal mask?

Yes, although it looks like a thread-local operation, it takes a 
process-wide lock.


I expect most user wakeups are via irqfd, so indeed the performance of 
KVM_USER_EXIT is uninteresting.





Re: [PATCH v3 04/10] VFIO: platform: add vfio_platform_set_automasked

2015-08-18 Thread Alex Williamson
On Mon, 2015-08-17 at 17:38 +0200, Eric Auger wrote:
> On 08/12/2015 08:56 PM, Alex Williamson wrote:
> > On Mon, 2015-08-10 at 15:20 +0200, Eric Auger wrote:
> >> This function makes possible to change the automasked mode.
> >>
> >> Signed-off-by: Eric Auger 
> >>
> >> ---
> >>
> >> v1 -> v2:
> >> - set forwarded flag
> >> ---
> >>  drivers/vfio/platform/vfio_platform_irq.c | 19 +++
> >>  1 file changed, 19 insertions(+)
> >>
> >> diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
> >> index b31b1f0..a285384 100644
> >> --- a/drivers/vfio/platform/vfio_platform_irq.c
> >> +++ b/drivers/vfio/platform/vfio_platform_irq.c
> >> @@ -186,6 +186,25 @@ static irqreturn_t vfio_handler(int irq, void *dev_id)
> >>return ret;
> >>  }
> >>  
> >> +static int vfio_platform_set_automasked(struct vfio_platform_irq *irq,
> >> + bool automasked)
> >> +{
> >> +  unsigned long flags;
> >> +
> >> +  spin_lock_irqsave(&irq->lock, flags);
> >> +  if (automasked) {
> >> +  irq->forwarded = true;
> >> +  irq->flags |= VFIO_IRQ_INFO_AUTOMASKED;
> >> +  irq->handler = vfio_automasked_irq_handler;
> >> +  } else {
> >> +  irq->forwarded = false;
> >> +  irq->flags &= ~VFIO_IRQ_INFO_AUTOMASKED;
> >> +  irq->handler = vfio_irq_handler;
> >> +  }
> >> +  spin_unlock_irqrestore(&irq->lock, flags);
> >> +  return 0;
> > 
> > In vfio-speak, automasked means level and we're not magically changing
> > the IRQ from level to edge, we're simply able to handle level
> > differently based on a hardware optimization.  Should the user visible
> > flags therefore change based on this?  Aren't we really setting the
> > forwarded state rather than the automasked state?
> 
> Well actually this was following the discussion we had a long time ago
> about that topic:
> 
> http://lkml.iu.edu/hypermail/linux/kernel/1409.1/03659.html
> 
> I did not really know how to conclude ...
> 
> If it is preferred I can hide this to the userspace, no problem.

I think that was based on the user being involved in enabling forwarding
though, now that it's hidden and automatic, it doesn't make much sense
to me to toggle any of the interrupt info details based on the state of
the forward.  The user always needs to handle the interrupt as level
since the bypass can be torn down at any point in time.  We're taking
advantage of the in-kernel path to make further optimizations, which
seems like they should be transparent to the user.  Thanks,

Alex



Re: [Qemu-devel] Help debugging a regression in KVM Module

2015-08-18 Thread Peter Lieven


> Am 18.08.2015 um 17:25 schrieb Radim Krčmář :
> 
> 2015-08-18 16:54+0200, Peter Lieven:
>> After some experiments I was able to find out the bad commit that introduced 
>> the regression:
>> 
>> commit f30ebc312ca9def25650b4e1d01cdb425c310dca
>> Author: Radim Krčmář 
>> Date:   Thu Oct 30 15:06:47 2014 +0100
>> 
>> It seems that this optimisation is not working reliably after live
>> migration. I can't reproduce it if I take a 3.19 kernel and revert
>> this single commit.
> 
> Hello, this bug has gone unnoticed for a long time so it is fixed only
> since v4.1 (and v3.19.stable was dead at that point).

Thanks for the pointer. I noticed the regression some time ago, but never
found the time to debug it. Some distros rely on 3.19, e.g. Ubuntu LTS
14.04.2. I will try to ping the maintainer.

Peter

> 
> commit b6ac069532218027f2991cba01d7a72a200688b0
> Author: Radim Krčmář 
> Date:   Fri Jun 5 20:57:41 2015 +0200
> 
>KVM: x86: fix lapic.timer_mode on restore
> 
>lapic.timer_mode was not properly initialized after migration, which
>broke few useful things, like login, by making every sleep eternal.
> 
>Fix this by calling apic_update_lvtt in kvm_apic_post_state_restore.
> 
>There are other slowpaths that update lvtt, so this patch makes sure
>something similar doesn't happen again by calling apic_update_lvtt
>after every modification.
> 
>Cc: sta...@vger.kernel.org
>Fixes: f30ebc312ca9 ("KVM: x86: optimize some accesses to LVTT and SPIV")
>Signed-off-by: Radim Krčmář 
>Signed-off-by: Marcelo Tosatti 


Re: [Qemu-devel] Help debugging a regression in KVM Module

2015-08-18 Thread Radim Krčmář
2015-08-18 16:54+0200, Peter Lieven:
> After some experiments I was able to find out the bad commit that introduced 
> the regression:
> 
> commit f30ebc312ca9def25650b4e1d01cdb425c310dca
> Author: Radim Krčmář 
> Date:   Thu Oct 30 15:06:47 2014 +0100
> 
> It seems that this optimisation is not working reliably after live 
> migration. I can't reproduce it if
> I take a 3.19 kernel and revert this single commit.

Hello, this bug went unnoticed for a long time, so it was only fixed in
v4.1 (and v3.19.stable was dead at that point).

commit b6ac069532218027f2991cba01d7a72a200688b0
Author: Radim Krčmář 
Date:   Fri Jun 5 20:57:41 2015 +0200

KVM: x86: fix lapic.timer_mode on restore

lapic.timer_mode was not properly initialized after migration, which
broke few useful things, like login, by making every sleep eternal.

Fix this by calling apic_update_lvtt in kvm_apic_post_state_restore.

There are other slowpaths that update lvtt, so this patch makes sure
something similar doesn't happen again by calling apic_update_lvtt
after every modification.

Cc: sta...@vger.kernel.org
Fixes: f30ebc312ca9 ("KVM: x86: optimize some accesses to LVTT and SPIV")
Signed-off-by: Radim Krčmář 
Signed-off-by: Marcelo Tosatti 


Re: [Qemu-devel] Help debugging a regression in KVM Module

2015-08-18 Thread Peter Lieven

On 14.08.2015 at 22:01, Alex Bennée wrote:

Peter Lieven  writes:


Hi,

Some time ago I stumbled across a regression in the KVM module that was 
introduced somewhere
between 3.17 and 3.19.

I have a rather old openSUSE guest with an XFS filesystem which reliably 
crashes after some live migrations.
I originally believed that the issue might be related to my setup with a 3.12 
host kernel and kvm-kmod 3.19,
but I have now found that it is also still present with a 3.19 host kernel with 
its included 3.19 kvm module.

My idea was to continue testing on a 3.12 host kernel and then bisect all 
commits to the kvm-related parts.

Now my question is how to best bisect only the kvm-related changes (those
that go into kvm-kmod)?

In general I don't bother. As it is a bisection you eliminate half the
commits at a time, so you get there fairly quickly anyway. However, you can
tell bisect which parts of the tree you care about:

   git bisect start -- arch/arm64/kvm include/linux/kvm* 
include/uapi/linux/kvm* virt/kvm/


After some experiments I was able to find out the bad commit that introduced 
the regression:

commit f30ebc312ca9def25650b4e1d01cdb425c310dca
Author: Radim Krčmář 
Date:   Thu Oct 30 15:06:47 2014 +0100

It seems that this optimisation is not working reliably after live migration.
I can't reproduce it if I take a 3.19 kernel and revert this single commit.

Peter


Re: Runtime-modified DIMMs and live migration issue

2015-08-18 Thread Andrey Korolyov
"Fixed" with a cherry-pick of
7a72f7a140bfd3a5dae73088947010bfdbcf6a40 and its predecessor
7103f60de8bed21a0ad5d15d2ad5b7a333dda201. Of course this is not a real
fix, as the race's precondition is only shifted or hidden by an obvious
assumption rather than removed. Though there are not too many hotplug
users around, I hope this information will be useful for those who
experience the same in the next year or so, until 3.18+ is stable enough
for the hypervisor kernel role. Any suggestions on further debugging or
re-exposing the race are of course very welcome.

CCing kvm@ as it looks like a hypervisor subsystem issue. The
entire discussion can be found at
https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg03117.html .


Re: [PATCH v4] virt: IRQ bypass manager

2015-08-18 Thread Eric Auger
Reviewed-by: Eric Auger 
Tested-by: Eric Auger 

Best Regards

Eric


On 08/06/2015 07:42 PM, Alex Williamson wrote:
> When a physical I/O device is assigned to a virtual machine through
> facilities like VFIO and KVM, the interrupt for the device generally
> bounces through the host system before being injected into the VM.
> However, hardware technologies exist that often allow the host to be
> bypassed for some of these scenarios.  Intel Posted Interrupts allow
> the specified physical edge interrupts to be directly injected into a
> guest when delivered to a physical processor while the vCPU is
> running.  ARM IRQ Forwarding allows forwarded physical interrupts to
> be directly deactivated by the guest.
> 
> The IRQ bypass manager here is meant to provide the shim to connect
> interrupt producers, generally the host physical device driver, with
> interrupt consumers, generally the hypervisor, in order to configure
> these bypass mechanisms.  To do this, we base the connection on a
> shared, opaque token.  For KVM-VFIO this is expected to be an
> eventfd_ctx since this is the connection we already use to connect an
> eventfd to an irqfd on the in-kernel path.  When a producer and
> consumer with matching tokens are found, callbacks via both registered
> participants allow the bypass facilities to be automatically enabled.
> 
> Signed-off-by: Alex Williamson 
> ---
> 
> v4: All producer callbacks are optional, as with Intel PI, it's
> possible for the producer to be blissfully unaware of the bypass.
> 
>  MAINTAINERS   |7 +
>  include/linux/irqbypass.h |   90 
>  virt/lib/Kconfig  |2 
>  virt/lib/Makefile |1 
>  virt/lib/irqbypass.c  |  257 
> +
>  5 files changed, 357 insertions(+)
>  create mode 100644 include/linux/irqbypass.h
>  create mode 100644 virt/lib/Kconfig
>  create mode 100644 virt/lib/Makefile
>  create mode 100644 virt/lib/irqbypass.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a9ae6c1..10c8b2f 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -10963,6 +10963,13 @@ L:   net...@vger.kernel.org
>  S:   Maintained
>  F:   drivers/net/ethernet/via/via-velocity.*
>  
> +VIRT LIB
> +M:   Alex Williamson 
> +M:   Paolo Bonzini 
> +L:   kvm@vger.kernel.org
> +S:   Supported
> +F:   virt/lib/
> +
>  VIVID VIRTUAL VIDEO DRIVER
>  M:   Hans Verkuil 
>  L:   linux-me...@vger.kernel.org
> diff --git a/include/linux/irqbypass.h b/include/linux/irqbypass.h
> new file mode 100644
> index 000..1551b5b
> --- /dev/null
> +++ b/include/linux/irqbypass.h
> @@ -0,0 +1,90 @@
> +/*
> + * IRQ offload/bypass manager
> + *
> + * Copyright (C) 2015 Red Hat, Inc.
> + * Copyright (c) 2015 Linaro Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +#ifndef IRQBYPASS_H
> +#define IRQBYPASS_H
> +
> +#include 
> +
> +struct irq_bypass_consumer;
> +
> +/*
> + * Theory of operation
> + *
> + * The IRQ bypass manager is a simple set of lists and callbacks that allows
> + * IRQ producers (ex. physical interrupt sources) to be matched to IRQ
> + * consumers (ex. virtualization hardware that allows IRQ bypass or offload)
> + * via a shared token (ex. eventfd_ctx).  Producers and consumers register
> + * independently.  When a token match is found, the optional @stop callback
> + * will be called for each participant.  The pair will then be connected via
> + * the @add_* callbacks, and finally the optional @start callback will allow
> + * any final coordination.  When either participant is unregistered, the
> + * process is repeated using the @del_* callbacks in place of the @add_*
> + * callbacks.  Match tokens must be unique per producer/consumer; 1:N
> + * pairings are not supported.
> + */
> +
> +/**
> + * struct irq_bypass_producer - IRQ bypass producer definition
> + * @node: IRQ bypass manager private list management
> + * @token: opaque token to match between producer and consumer
> + * @irq: Linux IRQ number for the producer device
> + * @add_consumer: Connect the IRQ producer to an IRQ consumer (optional)
> + * @del_consumer: Disconnect the IRQ producer from an IRQ consumer (optional)
> + * @stop: Perform any quiesce operations necessary prior to add/del (optional)
> + * @start: Perform any startup operations necessary after add/del (optional)
> + *
> + * The IRQ bypass producer structure represents an interrupt source for
> + * participation in possible host bypass, for instance an interrupt vector
> + * for a physical device assigned to a VM.
> + */
> +struct irq_bypass_producer {
> + struct list_head node;
> + void *token;
> + int irq;
> + int (*add_consumer)(struct irq_bypass_producer *,
> + struct irq_bypass_consumer *);
> + void (*del_consumer)(struct irq_bypass_producer *,
> + 

Re: [PATCH v2 4/4] irqchip: GIC: Don't deactivate interrupts forwarded to a guest

2015-08-18 Thread Eric Auger
Hi Marc,
On 08/13/2015 10:28 AM, Marc Zyngier wrote:
> Commit 0a4377de3056 ("genirq: Introduce irq_set_vcpu_affinity() to
> target an interrupt to a VCPU") added just what we needed at the
> lowest level to allow an interrupt to be deactivated by a guest.
> 
> When such a request reaches the GIC, it knows it doesn't need to
> perform the deactivation anymore, and can safely leave the guest
> do its magic. This of course requires additional support in both
> VFIO and KVM.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  drivers/irqchip/irq-gic.c | 58 
> +++
>  1 file changed, 58 insertions(+)
> 
> diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
> index b020c3a..ea691be 100644
> --- a/drivers/irqchip/irq-gic.c
> +++ b/drivers/irqchip/irq-gic.c
> @@ -148,6 +148,34 @@ static inline bool primary_gic_irq(struct irq_data *d)
>   return true;
>  }
>  
> +static inline bool cascading_gic_irq(struct irq_data *d)
> +{
> + /*
> +  * If handler_data pointing to one of the secondary GICs, then
> +  * this is the cascading interrupt, and it cannot possibly be
a cascading interrupt?
> +  * forwarded.
> +  */
> + if (d->handler_data >= (void *)(gic_data + 1) &&
> + d->handler_data <  (void *)(gic_data + MAX_GIC_NR))
there is an accessor for d->handler_data: irq_data_get_irq_handler_data

besides:
Reviewed-by: Eric Auger 
Tested-by: Eric Auger  on Calxeda Midway with
VFIO SPI assignment

Best Regards

Eric


> + return true;
> +
> + return false;
> +}
> +
> +static inline bool forwarded_irq(struct irq_data *d)
> +{
> + /*
> +  * A forwarded interrupt:
> +  * - is on the primary GIC
> +  * - has its handler_data set to a value
> +  * - that isn't a secondary GIC
> +  */
> + if (primary_gic_irq(d) && d->handler_data && !cascading_gic_irq(d))
> + return true;
> +
> + return false;
> +}
> +
>  /*
>   * Routines to acknowledge, disable and enable interrupts
>   */
> @@ -166,6 +194,18 @@ static int gic_peek_irq(struct irq_data *d, u32 offset)
>  static void gic_mask_irq(struct irq_data *d)
>  {
>   gic_poke_irq(d, GIC_DIST_ENABLE_CLEAR);
> + /*
> +  * When masking a forwarded interrupt, make sure it is
> +  * deactivated as well.
> +  *
> +  * This ensures that an interrupt that is getting
> +  * disabled/masked will not get "stuck", because there is
> +  * no one to deactivate it (the guest is being terminated).
> +  */
> + if (static_key_true(&supports_deactivate)) {
> + if (forwarded_irq(d))
> + gic_poke_irq(d, GIC_DIST_ACTIVE_CLEAR);
> + }
>  }
>  
>  static void gic_unmask_irq(struct irq_data *d)
> @@ -178,6 +218,10 @@ static void gic_eoi_irq(struct irq_data *d)
>   u32 deact_offset = GIC_CPU_EOI;
>  
>   if (static_key_true(&supports_deactivate)) {
> + /* Do not deactivate an IRQ forwarded to a vcpu. */
> + if (forwarded_irq(d))
> + return;
> +
>   if (primary_gic_irq(d))
>   deact_offset = GIC_CPU_DEACTIVATE;
>   }
> @@ -251,6 +295,19 @@ static int gic_set_type(struct irq_data *d, unsigned int type)
>   return gic_configure_irq(gicirq, type, base, NULL);
>  }
>  
> +static int gic_irq_set_vcpu_affinity(struct irq_data *d, void *vcpu)
> +{
> + /* Only interrupts on the primary GIC can be forwarded to a vcpu. */
> + if (static_key_true(&supports_deactivate)) {
> + if (primary_gic_irq(d) && !cascading_gic_irq(d)) {
> + d->handler_data = vcpu;
> + return 0;
> + }
> + }
> +
> + return -EINVAL;
> +}
> +
>  #ifdef CONFIG_SMP
> static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
>   bool force)
> @@ -346,6 +403,7 @@ static struct irq_chip gic_chip = {
>  #endif
>   .irq_get_irqchip_state  = gic_irq_get_irqchip_state,
>   .irq_set_irqchip_state  = gic_irq_set_irqchip_state,
> + .irq_set_vcpu_affinity  = gic_irq_set_vcpu_affinity,
>   .flags  = IRQCHIP_SET_TYPE_MASKED,
>  };
>  
> 
