Re: [RFC PATCH 1/6] kvm: add device control API
On Wed, Feb 20, 2013 at 07:28:52PM -0600, Scott Wood wrote: > On 02/20/2013 06:14:37 PM, Marcelo Tosatti wrote: > >On Wed, Feb 20, 2013 at 05:53:20PM -0600, Scott Wood wrote: > >> >It is then not necessary to set device attributes on a live > >guest and > >> >deal with the complications associated with that. > >> > >> Which complications? > >> > >> -Scott > > > >Semantics of individual attribute writes, for one. > > When the attribute is a device register, the hardware documentation > takes care of that. Otherwise, the semantics are documented in the > device-specific documentation -- which can include restricting when > the access is allowed. Same as with any other interface > documentation. > > I suppose mpic.txt could use an additional statement that > KVM_DEV_MPIC_GRP_REGISTER performs an access as if it were performed > by the guest. > If access to an attribute has a guest visible side effect you cannot use this interface for migration. This is exactly the point in favor of distinguish accesses that have side effects (COMMAND or whatever) and accesses that set/get attribute (SET|GET_ATTR). -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] KVM: VMX: Add Posted Interrupt supporting
On Thu, Feb 21, 2013 at 06:04:52AM +, Zhang, Yang Z wrote: > Hi Marcelo, > > Can you help to review this patch? Many thanks if you can review it quickly. > The patch is only 2 days on the list. Be patient. > Zhang, Yang Z wrote on 2013-02-19: > > From: Yang Zhang > > > > Posted Interrupt allows APIC interrupts to inject into guest directly > > without any vmexit. > > > > - When delivering a interrupt to guest, if target vcpu is running, > > update Posted-interrupt requests bitmap and send a notification event > > to the vcpu. Then the vcpu will handle this interrupt automatically, > > without any software involvemnt. > > - If target vcpu is not running or there already a notification event > > pending in the vcpu, do nothing. The interrupt will be handled by > > next vm entry > > Signed-off-by: Yang Zhang > > --- > > arch/x86/include/asm/entry_arch.h |4 + > > arch/x86/include/asm/hw_irq.h |1 + > > arch/x86/include/asm/irq_vectors.h |5 + > > arch/x86/include/asm/kvm_host.h|3 + arch/x86/include/asm/vmx.h > > |4 + arch/x86/kernel/entry_64.S |5 + > > arch/x86/kernel/irq.c | 20 + > > arch/x86/kernel/irqinit.c |4 + arch/x86/kvm/lapic.c > > | 19 - arch/x86/kvm/lapic.h |1 + > > arch/x86/kvm/svm.c | 13 +++ arch/x86/kvm/vmx.c > > | 157 +++- arch/x86/kvm/x86.c > > |1 + 13 files changed, 214 insertions(+), 23 > > deletions(-) > > diff --git a/arch/x86/include/asm/entry_arch.h > > b/arch/x86/include/asm/entry_arch.h index 40afa00..9bd4eca 100644 --- > > a/arch/x86/include/asm/entry_arch.h +++ > > b/arch/x86/include/asm/entry_arch.h @@ -19,6 +19,10 @@ > > BUILD_INTERRUPT(reboot_interrupt,REBOOT_VECTOR) > > > > BUILD_INTERRUPT(x86_platform_ipi, X86_PLATFORM_IPI_VECTOR) > > +#ifdef CONFIG_HAVE_KVM +BUILD_INTERRUPT(kvm_posted_intr_ipi, > > POSTED_INTR_VECTOR) +#endif + > > /* > > * every pentium local APIC has two 'local interrupts', with a > > * soft-definable vector attached to both interrupts, one of > > diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h > > index eb92a6e..cebef02 100644 > > --- a/arch/x86/include/asm/hw_irq.h > > +++ b/arch/x86/include/asm/hw_irq.h > > @@ -28,6 +28,7 @@ > > /* Interrupt handlers registered during init_IRQ */ extern void > > apic_timer_interrupt(void); extern void x86_platform_ipi(void); +extern > > void kvm_posted_intr_ipi(void); extern void error_interrupt(void); > > extern void irq_work_interrupt(void); > > diff --git a/arch/x86/include/asm/irq_vectors.h > > b/arch/x86/include/asm/irq_vectors.h index 1508e51..774dc9f 100644 --- > > a/arch/x86/include/asm/irq_vectors.h +++ > > b/arch/x86/include/asm/irq_vectors.h @@ -102,6 +102,11 @@ > > */ > > #define X86_PLATFORM_IPI_VECTOR0xf7 > > +/* Vector for KVM to deliver posted interrupt IPI */ > > +#ifdef CONFIG_HAVE_KVM > > +#define POSTED_INTR_VECTOR 0xf2 > > +#endif > > + > > /* > > * IRQ work vector: > > */ > > diff --git a/arch/x86/include/asm/kvm_host.h > > b/arch/x86/include/asm/kvm_host.h index b8388e9..79da55e 100644 --- > > a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h > > @@ -704,6 +704,9 @@ struct kvm_x86_ops { > > void (*hwapic_isr_update)(struct kvm *kvm, int isr); > > void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap); > > void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set); > > + bool (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector, > > + int *result, bool *delivered); > > + void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu); > > int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); > > int (*get_tdp_level)(void); > > u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); > > diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h > > index 5c9dbad..ce8ac80 100644 > > --- a/arch/x86/include/asm/vmx.h > > +++ b/arch/x86/include/asm/vmx.h > > @@ -158,6 +158,7 @@ > > #define PIN_BASED_EXT_INTR_MASK 0x0001 > > #define PIN_BASED_NMI_EXITING 0x0008 > > #define PIN_BASED_VIRTUAL_NMIS 0x0020 > > +#define PIN_BASED_POSTED_INTR 0x0080 > > > > #define VM_EXIT_SAVE_DEBUG_CONTROLS 0x0002 #define > > VM_EXIT_HOST_ADDR_SPACE_SIZE0x0200 @@ -180,6 +181,7 @@ > > /* VMCS Encodings */ enum vmcs_field { VIRTUAL_PROCESSOR_ID > > = 0x, +POSTED_INTR_NV = 0x0002, > > GUEST_ES_SELECTOR = 0x0800, GUEST_CS_SELECTOR > > = 0x0802, GUEST_SS_SELECTOR = 0x0804, @@ > > -214,6 +216,8 @@ enum vmcs_field { VIRTUAL_APIC_PAGE_ADDR_HIGH > > = > > 0x2013,
RE: [PATCH v3 2/2] KVM: VMX: Add Posted Interrupt supporting
Hi Marcelo, Can you help to review this patch? Many thanks if you can review it quickly. Zhang, Yang Z wrote on 2013-02-19: > From: Yang Zhang > > Posted Interrupt allows APIC interrupts to inject into guest directly > without any vmexit. > > - When delivering a interrupt to guest, if target vcpu is running, > update Posted-interrupt requests bitmap and send a notification event > to the vcpu. Then the vcpu will handle this interrupt automatically, > without any software involvemnt. > - If target vcpu is not running or there already a notification event > pending in the vcpu, do nothing. The interrupt will be handled by > next vm entry > Signed-off-by: Yang Zhang > --- > arch/x86/include/asm/entry_arch.h |4 + > arch/x86/include/asm/hw_irq.h |1 + > arch/x86/include/asm/irq_vectors.h |5 + > arch/x86/include/asm/kvm_host.h|3 + arch/x86/include/asm/vmx.h > |4 + arch/x86/kernel/entry_64.S |5 + > arch/x86/kernel/irq.c | 20 + > arch/x86/kernel/irqinit.c |4 + arch/x86/kvm/lapic.c > | 19 - arch/x86/kvm/lapic.h |1 + > arch/x86/kvm/svm.c | 13 +++ arch/x86/kvm/vmx.c > | 157 +++- arch/x86/kvm/x86.c > |1 + 13 files changed, 214 insertions(+), 23 > deletions(-) > diff --git a/arch/x86/include/asm/entry_arch.h > b/arch/x86/include/asm/entry_arch.h index 40afa00..9bd4eca 100644 --- > a/arch/x86/include/asm/entry_arch.h +++ > b/arch/x86/include/asm/entry_arch.h @@ -19,6 +19,10 @@ > BUILD_INTERRUPT(reboot_interrupt,REBOOT_VECTOR) > > BUILD_INTERRUPT(x86_platform_ipi, X86_PLATFORM_IPI_VECTOR) > +#ifdef CONFIG_HAVE_KVM +BUILD_INTERRUPT(kvm_posted_intr_ipi, > POSTED_INTR_VECTOR) +#endif + > /* > * every pentium local APIC has two 'local interrupts', with a > * soft-definable vector attached to both interrupts, one of > diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h > index eb92a6e..cebef02 100644 > --- a/arch/x86/include/asm/hw_irq.h > +++ b/arch/x86/include/asm/hw_irq.h > @@ -28,6 +28,7 @@ > /* Interrupt handlers registered during init_IRQ */ extern void > apic_timer_interrupt(void); extern void x86_platform_ipi(void); +extern > void kvm_posted_intr_ipi(void); extern void error_interrupt(void); > extern void irq_work_interrupt(void); > diff --git a/arch/x86/include/asm/irq_vectors.h > b/arch/x86/include/asm/irq_vectors.h index 1508e51..774dc9f 100644 --- > a/arch/x86/include/asm/irq_vectors.h +++ > b/arch/x86/include/asm/irq_vectors.h @@ -102,6 +102,11 @@ > */ > #define X86_PLATFORM_IPI_VECTOR 0xf7 > +/* Vector for KVM to deliver posted interrupt IPI */ > +#ifdef CONFIG_HAVE_KVM > +#define POSTED_INTR_VECTOR 0xf2 > +#endif > + > /* > * IRQ work vector: > */ > diff --git a/arch/x86/include/asm/kvm_host.h > b/arch/x86/include/asm/kvm_host.h index b8388e9..79da55e 100644 --- > a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h > @@ -704,6 +704,9 @@ struct kvm_x86_ops { > void (*hwapic_isr_update)(struct kvm *kvm, int isr); > void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap); > void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set); > + bool (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector, > + int *result, bool *delivered); > + void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu); > int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); > int (*get_tdp_level)(void); > u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); > diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h > index 5c9dbad..ce8ac80 100644 > --- a/arch/x86/include/asm/vmx.h > +++ b/arch/x86/include/asm/vmx.h > @@ -158,6 +158,7 @@ > #define PIN_BASED_EXT_INTR_MASK 0x0001 > #define PIN_BASED_NMI_EXITING 0x0008 > #define PIN_BASED_VIRTUAL_NMIS 0x0020 > +#define PIN_BASED_POSTED_INTR 0x0080 > > #define VM_EXIT_SAVE_DEBUG_CONTROLS 0x0002 #define > VM_EXIT_HOST_ADDR_SPACE_SIZE0x0200 @@ -180,6 +181,7 @@ > /* VMCS Encodings */ enum vmcs_field { VIRTUAL_PROCESSOR_ID > = 0x, + POSTED_INTR_NV = 0x0002, > GUEST_ES_SELECTOR = 0x0800, GUEST_CS_SELECTOR > = 0x0802,GUEST_SS_SELECTOR = 0x0804, @@ > -214,6 +216,8 @@ enum vmcs_field { VIRTUAL_APIC_PAGE_ADDR_HIGH = > 0x2013, APIC_ACCESS_ADDR= 0x2014, > APIC_ACCESS_ADDR_HIGH = > 0x2015, > + POSTED_INTR_DESC_ADDR = 0x2016, > + POSTED_INTR_DESC_ADDR_HIGH = 0x2017, > EPT_POINTER = 0x201a, > EPT_POINTER_HIGH
[PATCH v2 13/15] target-s390x: Refactor debug output macros
Make debug output compile-testable even if disabled. Rename dprintf() in kvm.c to kvm_dprintf() due to a conflict with glibc. Drop unused DEBUG_HELPER and LOG_HELPER() in fpu_helper.c. Drop unused LOG_DISAS() in translate.c and inline S390X_DEBUG_DISAS. Signed-off-by: Andreas Färber --- target-s390x/cc_helper.c | 14 ++-- target-s390x/fpu_helper.c |7 -- target-s390x/helper.c | 53 ++-- target-s390x/int_helper.c | 14 ++-- target-s390x/kvm.c | 33 +-- target-s390x/mem_helper.c | 14 ++-- target-s390x/misc_helper.c | 14 ++-- target-s390x/translate.c | 23 +-- 8 Dateien geändert, 120 Zeilen hinzugefügt(+), 52 Zeilen entfernt(-) diff --git a/target-s390x/cc_helper.c b/target-s390x/cc_helper.c index a6d60bf..e45a19a 100644 --- a/target-s390x/cc_helper.c +++ b/target-s390x/cc_helper.c @@ -24,11 +24,21 @@ /* #define DEBUG_HELPER */ #ifdef DEBUG_HELPER -#define HELPER_LOG(x...) qemu_log(x) +static const bool debug_helper = true; #else -#define HELPER_LOG(x...) +static const bool debug_helper; #endif +static void GCC_FMT_ATTR(1, 2) HELPER_LOG(const char *fmt, ...) +{ +if (debug_helper) { +va_list ap; +va_start(ap, fmt); +qemu_log_vprintf(fmt, ap); +va_end(ap); +} +} + static uint32_t cc_calc_ltgt_32(int32_t src, int32_t dst) { if (src == dst) { diff --git a/target-s390x/fpu_helper.c b/target-s390x/fpu_helper.c index 94375b6..937da59 100644 --- a/target-s390x/fpu_helper.c +++ b/target-s390x/fpu_helper.c @@ -25,13 +25,6 @@ #include "exec/softmmu_exec.h" #endif -/* #define DEBUG_HELPER */ -#ifdef DEBUG_HELPER -#define HELPER_LOG(x...) qemu_log(x) -#else -#define HELPER_LOG(x...) -#endif - #define RET128(F) (env->retxl = F.low, F.high) #define convert_bit(mask, from, to) \ diff --git a/target-s390x/helper.c b/target-s390x/helper.c index 1183b45..4a02251 100644 --- a/target-s390x/helper.c +++ b/target-s390x/helper.c @@ -30,24 +30,53 @@ //#define DEBUG_S390_STDOUT #ifdef DEBUG_S390 -#ifdef DEBUG_S390_STDOUT -#define DPRINTF(fmt, ...) \ -do { fprintf(stderr, fmt, ## __VA_ARGS__); \ - qemu_log(fmt, ##__VA_ARGS__); } while (0) +static const bool debug_helper = true; #else -#define DPRINTF(fmt, ...) \ -do { qemu_log(fmt, ## __VA_ARGS__); } while (0) +static const bool debug_helper; #endif + +#ifdef DEBUG_S390_PTE +static const bool debug_pte = true; #else -#define DPRINTF(fmt, ...) \ -do { } while (0) +static const bool debug_pte; #endif -#ifdef DEBUG_S390_PTE -#define PTE_DPRINTF DPRINTF +#ifdef DEBUG_S390_STDOUT +static const bool debug_to_stdout = true; #else -#define PTE_DPRINTF(fmt, ...) \ -do { } while (0) +static const bool debug_to_stdout; +#endif + +#ifndef CONFIG_USER_ONLY +static inline void GCC_FMT_ATTR(1, 0) vDPRINTF(const char *fmt, va_list ap) +{ +if (debug_helper) { +if (debug_to_stdout) { +vfprintf(stderr, fmt, ap); +} +qemu_log_vprintf(fmt, ap); +} +} + +static void GCC_FMT_ATTR(1, 2) DPRINTF(const char *fmt, ...) +{ +if (debug_helper) { +va_list ap; +va_start(ap, fmt); +vDPRINTF(fmt, ap); +va_end(ap); +} +} + +static void GCC_FMT_ATTR(1, 2) PTE_DPRINTF(const char *fmt, ...) +{ +if (debug_pte) { +va_list ap; +va_start(ap, fmt); +vDPRINTF(fmt, ap); +va_end(ap); +} +} #endif #ifndef CONFIG_USER_ONLY diff --git a/target-s390x/int_helper.c b/target-s390x/int_helper.c index 6858301..c7da7e6 100644 --- a/target-s390x/int_helper.c +++ b/target-s390x/int_helper.c @@ -24,11 +24,21 @@ /* #define DEBUG_HELPER */ #ifdef DEBUG_HELPER -#define HELPER_LOG(x...) qemu_log(x) +static const bool debug_helper = true; #else -#define HELPER_LOG(x...) +static const bool debug_helper; #endif +static void GCC_FMT_ATTR(1, 2) HELPER_LOG(const char *fmt, ...) +{ +if (debug_helper) { +va_list ap; +va_start(ap, fmt); +qemu_log_vprintf(fmt, ap); +va_end(ap); +} +} + /* 64/64 -> 128 unsigned multiplication */ uint64_t HELPER(mul128)(CPUS390XState *env, uint64_t v1, uint64_t v2) { diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c index 3929771..8eea7f1 100644 --- a/target-s390x/kvm.c +++ b/target-s390x/kvm.c @@ -38,13 +38,21 @@ /* #define DEBUG_KVM */ #ifdef DEBUG_KVM -#define dprintf(fmt, ...) \ -do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0) +static const bool debug_kvm = true; #else -#define dprintf(fmt, ...) \ -do { } while (0) +static const bool debug_kvm; #endif +static void GCC_FMT_ATTR(1, 2) kvm_dprintf(const char *fmt, ...) +{ +if (debug_kvm) { +va_list ap; +va_start(ap, fmt); +vfprintf(stderr, fmt, ap); +va_end(ap); +} +} + #define IPA0_DIAG 0x8300 #define IPA0_SIGP 0xae00 #define IPA0_B
[PATCH v2 08/15] target-i386: Refactor debug output macros
Make debug output compile-testable even if disabled. Signed-off-by: Andreas Färber Cc: Richard Henderson --- target-i386/helper.c | 42 ++ target-i386/kvm.c| 16 target-i386/seg_helper.c | 24 +++- 3 Dateien geändert, 57 Zeilen hinzugefügt(+), 25 Zeilen entfernt(-) diff --git a/target-i386/helper.c b/target-i386/helper.c index 4bf9db7..696e4e1 100644 --- a/target-i386/helper.c +++ b/target-i386/helper.c @@ -26,6 +26,22 @@ //#define DEBUG_MMU +#ifdef DEBUG_MMU +static const bool debug_mmu = true; +#else +static const bool debug_mmu; +#endif + +static void GCC_FMT_ATTR(1, 2) mmu_dprintf(const char *fmt, ...) +{ +if (debug_mmu) { +va_list ap; +va_start(ap, fmt); +vprintf(fmt, ap); +va_end(ap); +} +} + static void cpu_x86_version(CPUX86State *env, int *family, int *model) { int cpuver = env->cpuid_version; @@ -372,9 +388,8 @@ void x86_cpu_set_a20(X86CPU *cpu, int a20_state) a20_state = (a20_state != 0); if (a20_state != ((env->a20_mask >> 20) & 1)) { -#if defined(DEBUG_MMU) -printf("A20 update: a20=%d\n", a20_state); -#endif +mmu_dprintf("A20 update: a20=%d\n", a20_state); + /* if the cpu is currently executing code, we must unlink it and all the potentially executing TB */ cpu_interrupt(env, CPU_INTERRUPT_EXITTB); @@ -390,9 +405,8 @@ void cpu_x86_update_cr0(CPUX86State *env, uint32_t new_cr0) { int pe_state; -#if defined(DEBUG_MMU) -printf("CR0 update: CR0=0x%08x\n", new_cr0); -#endif +mmu_dprintf("CR0 update: CR0=0x%08x\n", new_cr0); + if ((new_cr0 & (CR0_PG_MASK | CR0_WP_MASK | CR0_PE_MASK)) != (env->cr[0] & (CR0_PG_MASK | CR0_WP_MASK | CR0_PE_MASK))) { tlb_flush(env, 1); @@ -433,18 +447,15 @@ void cpu_x86_update_cr3(CPUX86State *env, target_ulong new_cr3) { env->cr[3] = new_cr3; if (env->cr[0] & CR0_PG_MASK) { -#if defined(DEBUG_MMU) -printf("CR3 update: CR3=" TARGET_FMT_lx "\n", new_cr3); -#endif +mmu_dprintf("CR3 update: CR3=" TARGET_FMT_lx "\n", new_cr3); tlb_flush(env, 0); } } void cpu_x86_update_cr4(CPUX86State *env, uint32_t new_cr4) { -#if defined(DEBUG_MMU) -printf("CR4 update: CR4=%08x\n", (uint32_t)env->cr[4]); -#endif +mmu_dprintf("CR4 update: CR4=%08x\n", (uint32_t)env->cr[4]); + if ((new_cr4 ^ env->cr[4]) & (CR4_PGE_MASK | CR4_PAE_MASK | CR4_PSE_MASK | CR4_SMEP_MASK | CR4_SMAP_MASK)) { @@ -510,10 +521,9 @@ int cpu_x86_handle_mmu_fault(CPUX86State *env, target_ulong addr, target_ulong vaddr, virt_addr; is_user = mmu_idx == MMU_USER_IDX; -#if defined(DEBUG_MMU) -printf("MMU fault: addr=" TARGET_FMT_lx " w=%d u=%d eip=" TARGET_FMT_lx "\n", - addr, is_write1, is_user, env->eip); -#endif +mmu_dprintf("MMU fault: addr=" TARGET_FMT_lx " w=%d u=%d" +" eip=" TARGET_FMT_lx "\n", +addr, is_write1, is_user, env->eip); is_write = is_write1 & 1; if (!(env->cr[0] & CR0_PG_MASK)) { diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 0cf413d..9cbef77 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -37,13 +37,21 @@ //#define DEBUG_KVM #ifdef DEBUG_KVM -#define DPRINTF(fmt, ...) \ -do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0) +static const bool debug_kvm = true; #else -#define DPRINTF(fmt, ...) \ -do { } while (0) +static const bool debug_kvm; #endif +static void GCC_FMT_ATTR(1, 2) DPRINTF(const char *fmt, ...) +{ +if (debug_kvm) { +va_list ap; +va_start(ap, fmt); +vfprintf(stderr, fmt, ap); +va_end(ap); +} +} + #define MSR_KVM_WALL_CLOCK 0x11 #define MSR_KVM_SYSTEM_TIME 0x12 diff --git a/target-i386/seg_helper.c b/target-i386/seg_helper.c index 3247dee..2338d54 100644 --- a/target-i386/seg_helper.c +++ b/target-i386/seg_helper.c @@ -29,14 +29,28 @@ #endif /* !defined(CONFIG_USER_ONLY) */ #ifdef DEBUG_PCALL -# define LOG_PCALL(...) qemu_log_mask(CPU_LOG_PCALL, ## __VA_ARGS__) -# define LOG_PCALL_STATE(env) \ -log_cpu_state_mask(CPU_LOG_PCALL, (env), CPU_DUMP_CCOP) +static const bool debug_pcall = true; #else -# define LOG_PCALL(...) do { } while (0) -# define LOG_PCALL_STATE(env) do { } while (0) +static const bool debug_pcall; #endif +static void GCC_FMT_ATTR(1, 2) LOG_PCALL(const char *fmt, ...) +{ +if (debug_pcall) { +va_list ap; +va_start(ap, fmt); +qemu_log_mask(CPU_LOG_PCALL, fmt, ap); +va_end(ap); +} +} + +static inline void LOG_PCALL_STATE(CPUX86State *env) +{ +if (debug_pcall) { +log_cpu_state_mask(CPU_LOG_PCALL, (env), CPU_DUMP_CCOP); +} +} + /* return non zero if error */ static inline int load_segment(CPUX86State *env, uint32_t *e1_ptr, uint32_t *e2_ptr, int selector) -- 1.7.10.4 -- To
[PATCH v2 12/15] target-ppc: Refactor debug output macros
Make debug output compile-testable even if disabled. Rename dprintf() in kvm.c to kvm_dprintf() to avoid conflict with glibc. Inline DEBUG_OP check in excp_helper.c. Inline LOG_MMU_STATE() in mmu_helper.c. Inline PPC_{DEBUG_SPR,DUMP_SPR_ACCESSES} checks in translate_init.c. Signed-off-by: Andreas Färber --- target-ppc/excp_helper.c| 30 ++ target-ppc/kvm.c| 28 +++-- target-ppc/mem_helper.c |2 -- target-ppc/mmu_helper.c | 71 ++- target-ppc/translate.c | 15 +++-- target-ppc/translate_init.c | 40 +++- 6 Dateien geändert, 137 Zeilen hinzugefügt(+), 49 Zeilen entfernt(-) diff --git a/target-ppc/excp_helper.c b/target-ppc/excp_helper.c index 0a1ac86..9a11e1f 100644 --- a/target-ppc/excp_helper.c +++ b/target-ppc/excp_helper.c @@ -24,10 +24,28 @@ //#define DEBUG_OP //#define DEBUG_EXCEPTIONS +#ifdef DEBUG_OP +static const bool debug_op = true; +#else +static const bool debug_op; +#endif + #ifdef DEBUG_EXCEPTIONS -# define LOG_EXCP(...) qemu_log(__VA_ARGS__) +static const bool debug_exceptions = true; #else -# define LOG_EXCP(...) do { } while (0) +static const bool debug_exceptions; +#endif + +#ifndef CONFIG_USER_ONLY +static void GCC_FMT_ATTR(1, 2) LOG_EXCP(const char *fmt, ...) +{ +if (debug_exceptions) { +va_list ap; +va_start(ap, fmt); +qemu_log_vprintf(fmt, ap); +va_end(ap); +} +} #endif /*/ @@ -777,7 +795,7 @@ void ppc_hw_interrupt(CPUPPCState *env) } #endif /* !CONFIG_USER_ONLY */ -#if defined(DEBUG_OP) +#ifndef CONFIG_USER_ONLY static void cpu_dump_rfi(target_ulong RA, target_ulong msr) { qemu_log("Return from exception at " TARGET_FMT_lx " with flags " @@ -835,9 +853,9 @@ static inline void do_rfi(CPUPPCState *env, target_ulong nip, target_ulong msr, /* XXX: beware: this is false if VLE is supported */ env->nip = nip & ~((target_ulong)0x0003); hreg_store_msr(env, msr, 1); -#if defined(DEBUG_OP) -cpu_dump_rfi(env->nip, env->msr); -#endif +if (debug_op) { +cpu_dump_rfi(env->nip, env->msr); +} /* No need to raise an exception here, * as rfi is always the last insn of a TB */ diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c index 2c64c63..997d400 100644 --- a/target-ppc/kvm.c +++ b/target-ppc/kvm.c @@ -40,13 +40,21 @@ //#define DEBUG_KVM #ifdef DEBUG_KVM -#define dprintf(fmt, ...) \ -do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0) +static const bool debug_kvm = true; #else -#define dprintf(fmt, ...) \ -do { } while (0) +static const bool debug_kvm; #endif +static void GCC_FMT_ATTR(1, 2) kvm_dprintf(const char *fmt, ...) +{ +if (debug_kvm) { +va_list ap; +va_start(ap, fmt); +vfprintf(stderr, fmt, ap); +va_end(ap); +} +} + #define PROC_DEVTREE_CPU "/proc/device-tree/cpus/" const KVMCapabilityInfo kvm_arch_required_capabilities[] = { @@ -769,7 +777,7 @@ void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run) */ irq = KVM_INTERRUPT_SET; -dprintf("injected interrupt %d\n", irq); +kvm_dprintf("injected interrupt %d\n", irq); r = kvm_vcpu_ioctl(cs, KVM_INTERRUPT, &irq); if (r < 0) { printf("cpu %d fail inject %x\n", cs->cpu_index, irq); @@ -831,20 +839,20 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run) switch (run->exit_reason) { case KVM_EXIT_DCR: if (run->dcr.is_write) { -dprintf("handle dcr write\n"); +kvm_dprintf("handle dcr write\n"); ret = kvmppc_handle_dcr_write(env, run->dcr.dcrn, run->dcr.data); } else { -dprintf("handle dcr read\n"); +kvm_dprintf("handle dcr read\n"); ret = kvmppc_handle_dcr_read(env, run->dcr.dcrn, &run->dcr.data); } break; case KVM_EXIT_HLT: -dprintf("handle halt\n"); +kvm_dprintf("handle halt\n"); ret = kvmppc_handle_halt(env); break; #ifdef CONFIG_PSERIES case KVM_EXIT_PAPR_HCALL: -dprintf("handle PAPR hypercall\n"); +kvm_dprintf("handle PAPR hypercall\n"); run->papr_hcall.ret = spapr_hypercall(cpu, run->papr_hcall.nr, run->papr_hcall.args); @@ -852,7 +860,7 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run) break; #endif case KVM_EXIT_EPR: -dprintf("handle epr\n"); +kvm_dprintf("handle epr\n"); run->epr.epr = ldl_phys(env->mpic_iack); ret = 0; break; diff --git a/target-ppc/mem_helper.c b/target-ppc/mem_helper.c index ba383c8..89c51d0 100644 --- a/target-ppc/mem_helper.c +++ b/target-ppc/mem_helper.c @@ -26,8 +26,6 @@ #include "exec/softmmu_exec.
Re: [RFC PATCH 1/6] kvm: add device control API
On 02/20/2013 07:09:49 AM, Gleb Natapov wrote: On Tue, Feb 19, 2013 at 03:16:37PM -0600, Scott Wood wrote: > On 02/19/2013 06:24:18 AM, Gleb Natapov wrote: > >On Mon, Feb 18, 2013 at 05:01:40PM -0600, Scott Wood wrote: > >> The ability to set/get attributes is needed. Sorry, but "get or set > >> one blob of data, up to 512 bytes, for the entire irqchip" is just > >> not good enough -- assuming you don't want us to start sticking > >> pointers and commands in *that* data. :-) > >> > >Proposed interface sticks pointers into ioctl data, so why doing > >the same > >for KVM_SET_IRQCHIP/KVM_GET_IRQCHIP makes you smile. > > There's a difference between putting a pointer in an ioctl control > structure that is specifically documented as being that way (as in > ONE_REG), versus taking an ioctl that claims to be setting/getting a > blob of state and embedding pointers in it. It would be like > sticking a pointer in the attribute payload of this API, which I > think is something to be discouraged. If documentation is what differentiate for you between silly and smart then write documentation instead of new interfaces. You mean like what we did with SREGS, that got deprecated and replaced with ONE_REG? How is writing documentation not creating new interfaces, if the documentation is different from what the interface is currently understood to do? Note that Marcelo seems to view KVM_SET_IRQCHIP as effectively being a device reset, which is rather different. KVM_SET_IRQCHIP/KVM_GET_IRQCHIP is defined to operate on blob of data on x86, nothing prevent you from adding MPIC specifics to the interface, Add mpic state into kvm_irqchip structure and if 512 bytes is not enough for you to transfer the state put pointers there and _document_ them. So basically, you want me to keep this interface but share the ioctl number with an older interface? :-P But with 512 bytes you can transfer properties inline, so you probably do not need pointer there anyway. I see you have three properties 2 of them 32bit and one 64bit. Three *groups* of properties. One of the property groups is per source, and we can have hundreds of sources. Another exposes the register space, which is 64 KiB (admittedly it's somewhat sparse, but there's more than 512 bytes of real data in there). And we don't necessarily want to set *everything*. >It'd also be using > KVM_SET_IRQCHIP to read data, which is the sort of thing you object > to later on regarding KVM_IRQ_LINE_STATUS. > Do not see why. It's either that, or have the data direction of the "chip" field in KVM_GET_IRQCHIP not be entirely in the "get" direction. > Then there's the silliness of transporting 512 bytes just to read a > descriptor for transporting something else. > Yes, agree. But is this enough of a reason to introduce entirely new interface? Is it on performance critical path? Doubt it, unless you abuse the interface to send interrupts, but then isn't it silty to do copy_from_user() twice to inject an interrupt like proposed interface does? It should probably be get_user() instead, which is pretty fast in the absence of a fault. > >For signaling irqs (I think this is what you mean by "commands") > >we have KVM_IRQ_LINE. > > It's one type of command. Another is setting the address. Another > is writing to registers that have side effects (this is how MSI > injection is done on MPIC, just as in real hardware). > Setting the address is setting an attribute. Sending MSI is a command. Things you set/get during init/migration are attributes. Things you do to cause side-effects are commands. What if I set the address at a time that isn't init/migration (the hardware allows moving it, like a PCI BAR)? Suddenly it becomes not an attribute due to how the caller uses it? > What is the benefit of KVM_IRQ_LINE over what MPIC does? What real > (non-glue/wrapper) code can become common? > No new ioctl with exactly same result (well actually even faster since less copying is done). Which ioctl would go away? You need to show us the benefits of the new interface vs existing one, not vice versa. Well, as I said to Marcello, the real reason why we probably need to use the existing interface is irqfd. That doesn't make the device control stuff go away. > And I really hope you don't want us to do MSIs the x86 way. > What is wrong with KVM_SIGNAL_MSI? Except that you'll need code to glue it to mpic. We'll have to write extra code for it compared to the current way where it works with *zero* code beyond what is wanted for other purposes such as debug and (eventually) migration. At least it's more direct than having to establish a GSI route... > In the XICS thread, Paul brought up the possibliity of cascaded > MPICs. It's not relevant to the systems we're trying to model, but > if one did want to use the in-kernel irqchip interface for that, it > would be really nice
Re: [PATCH 3/9] KVM: PPC: Book3S: Add kernel emulation for the XICS interrupt controller
On 02/20/2013 07:09:55 PM, Marcelo Tosatti wrote: On Wed, Feb 20, 2013 at 06:20:51PM -0600, Scott Wood wrote: > On 02/20/2013 01:58:54 PM, Marcelo Tosatti wrote: > >This is probably a stupid question, but why the > >KVM_SET_IRQCHIP/KVM_SET_GSI_ROUTING interface is not appropriate for > >your purposes? > > > >x86 sets up a default GSI->IRQCHIP PIN mapping on creation (during > >KVM_SET_IRQCHIP), but it can be modified with KVM_SET_GSI_ROUTING. > > To start, the whole IRQ routing stuff is poorly documented. > > Am I supposed to make up GSI numbers and use the routing thing to > associate them with real interrupts? I have no idea. Is mapping from one integer linear space (GSIs) to real interrupts suitable for you? I can live with it. > Where does the code live to manage this table, and how APICy is it (looks like the > answer is "irq_comm.c, and very")? Thinking faster than typing? Not sure what you mean. Sorry... The code to manage the table lives in virt/kvm/irq_comm.c. That code is very APIC-specific and not currently in a state that invites sharing. > It looks like I'm going to have to do this anyway for irqfd, though > that doesn't make the other uses of the device control api go away. > Even KVM_DEV_MPIC_GRP_IRQ_ACTIVE would still be useful for reading > the status for debugging (reading device registers doesn't quite do > it, since the "active" bit won't show up if the interrupt is > masked). > At that point, is it more offensive to make it read-only > even though it would be trivial to make it read/write (which would > allow users who don't need it to bypass the routing API), or to make > it read/write and live with there being more than one way to do > something? Can't follow this sentence. Suppose, for the sake of irqfd (and maillist tranquility) MPIC uses existing KVM_IRQ_LINE and such. Will there be objection to being able to use KVM_GET_DEVICE_ATTR to *get* the irq line status for debugging purposes (maybe also useful for migration)? If there's no objection to that, would there be any actual reason, beyond saving a few lines of glue code, to make it a read-only attribute? > KVM_SET_IRQCHIP is not suitable because we have more than 512 bytes > of state, and because it doesn't allow debugging access to device > registers (e.g. inspecting from the QEMU command line), and because > it's hard to add new pieces of state if we realize we left something > out. It reminds be of GET/SET_SREGS. With that, I did what you > seem to want here, which is to adapt the existing interfaces, using > feature flags to control optional state. It ended up being a mess, > and ONE_REG was introduced as a replacement. The device control API > is the equivalent of ONE_REG for things other than vcpus. > > -Scott - ACK on 512 bytes not sufficient. Add another ioctl, SET_IRQCHIP2? Well, that's what KVM_SET_DEVICE_ATTR is. - Agree on poor extensibility of interface. Adding a reserved amount of space as padding and versioning such as has been done so far is not acceptable? That's exactly what we did with SREGS, and it got declared a mess and replaced with ONE_REG. I'm trying to learn from my mistakes. :-) - Debugging: why is reading entire register state not acceptable? Yes, its slow. For one, it's more work. The current way works by simulating a guest MMIO access. No blob format to design, implement, and keep compatible. -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/6] kvm: add device control API
On 02/20/2013 06:14:37 PM, Marcelo Tosatti wrote: On Wed, Feb 20, 2013 at 05:53:20PM -0600, Scott Wood wrote: > >It is then not necessary to set device attributes on a live guest and > >deal with the complications associated with that. > > Which complications? > > -Scott Semantics of individual attribute writes, for one. When the attribute is a device register, the hardware documentation takes care of that. Otherwise, the semantics are documented in the device-specific documentation -- which can include restricting when the access is allowed. Same as with any other interface documentation. I suppose mpic.txt could use an additional statement that KVM_DEV_MPIC_GRP_REGISTER performs an access as if it were performed by the guest. Locking versus currently executing VCPUs, for another (see how KVM_SET_IRQ's RCU usage, for instance, that is something should be shared). If you mean kvm_set_irq() in irq_comm.c, that's only relevant when you have a GSI routing table, which this patchset doesn't. Assuming we end up having a routing table to support irqfd, we still can't share the code as is, since it's APIC-specific. -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/6] kvm: add device control API
On Wed, Feb 20, 2013 at 05:53:20PM -0600, Scott Wood wrote: > >Why exactly you need to set attributes Scott? > > That's best answered by looking at patch 6/6 and discussing the > actual attributes that are defined so far. > > The need to set the base address of the registers is > straightforward. When ARM added their special ioctl for this, we > discussed it being eventually replaced with a more general API (in > fact, that was the reason they put "ARM" in the name). > > Access to device registers was originally intended for debugging, > and eventually migration. It turned out to also be very useful for > injecting MSIs -- nothing special needed to be done. It Just > Works(tm) the same way it does in hardware, with an MMIO write from > a PCI device to a specific MPIC register. Again, irqfd may > complicate this, but there's no need for QEMU-generated MSIs to have > to take a circuitous path. > > Finally, there's the interrupt source attribute. Even if we get rid > of that, we'll need MPIC-specific documentation on how to flatten > the IRQ numberspace, and if we ever have a cascaded PIC situation > things could get ugly since there's no way to identify a specific > IRQ controller in KVM_IRQ_LINE. > > >Also related to this question is the point of avoiding the > >implementation of devices to be spread across userspace and the kernel > >(this is one point Avi brought up often). If the device emulation > >is implemented entirely in the kernel, all is necessary are initial > >values of device registers (and retrieval of those values later for > >savevm/migration). > > MPIC emulation is entirely in the kernel with this patchset -- > though the register that lets you reset cores will likely need to be > kicked back to QEMU. > > >It is then not necessary to set device attributes on a live guest and > >deal with the complications associated with that. > > Which complications? > > -Scott Semantics of individual attribute writes, for one. Locking versus currently executing VCPUs, for another (see how KVM_SET_IRQ's RCU usage, for instance, that is something should be shared). -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/9] KVM: PPC: Book3S: Add kernel emulation for the XICS interrupt controller
On Wed, Feb 20, 2013 at 06:20:51PM -0600, Scott Wood wrote: > On 02/20/2013 01:58:54 PM, Marcelo Tosatti wrote: > >This is probably a stupid question, but why the > >KVM_SET_IRQCHIP/KVM_SET_GSI_ROUTING interface is not appropriate for > >your purposes? > > > >x86 sets up a default GSI->IRQCHIP PIN mapping on creation (during > >KVM_SET_IRQCHIP), but it can be modified with KVM_SET_GSI_ROUTING. > > To start, the whole IRQ routing stuff is poorly documented. > > Am I supposed to make up GSI numbers and use the routing thing to > associate them with real interrupts? I have no idea. Is mapping from one integer linear space (GSIs) to real interrupts suitable for you? > Are there constraints on what > sort of GSI numbers I can choose (I now see in the code that > KVM_MAX_IRQ_ROUTES is returned from the capability check, but where > is that documented? Don't think it is. > It looks like the APIC implementation has > default routes, where is that documented?)? In the code. > Where does the code live to manage this table, and how APICy is it (looks > like the > answer is "irq_comm.c, and very")? Thinking faster than typing? Not sure what you mean. > I suppose I could write another > implementation of the table management code for MPIC, though the > placement of "irqchip" inside the route entry, rather than as an > argument to KVM_IRQ_LINE, suggests the table is supposed to be > global, not in the individual interrupt controller. Yes the table is global. It maps GSI ("Global System Interrupt" IIRC) (integer) to (irqchip,pin) pair. > It looks like I'm going to have to do this anyway for irqfd, though > that doesn't make the other uses of the device control api go away. > Even KVM_DEV_MPIC_GRP_IRQ_ACTIVE would still be useful for reading > the status for debugging (reading device registers doesn't quite do > it, since the "active" bit won't show up if the interrupt is > masked). > At that point, is it more offensive to make it read-only > even though it would be trivial to make it read/write (which would > allow users who don't need it to bypass the routing API), or to make > it read/write and live with there being more than one way to do > something? Can't follow this sentence. > KVM_SET_IRQCHIP is not suitable because we have more than 512 bytes > of state, and because it doesn't allow debugging access to device > registers (e.g. inspecting from the QEMU command line), and because > it's hard to add new pieces of state if we realize we left something > out. It reminds be of GET/SET_SREGS. With that, I did what you > seem to want here, which is to adapt the existing interfaces, using > feature flags to control optional state. It ended up being a mess, > and ONE_REG was introduced as a replacement. The device control API > is the equivalent of ONE_REG for things other than vcpus. > > -Scott - ACK on 512 bytes not sufficient. Add another ioctl, SET_IRQCHIP2? - Agree on poor extensibility of interface. Adding a reserved amount of space as padding and versioning such as has been done so far is not acceptable? - Debugging: why is reading entire register state not acceptable? Yes, its slow. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PULL] KVM updates for the 3.9 merge window
Linus, Please pull from git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/kvm-3.9-1 to receive the KVM updates for the 3.9 merge window, including x86 real mode emulation fixes, stronger memory slot interface restrictions, mmu_lock spinlock hold time reduction, improved handling of large page faults on shadow, initial APICv HW acceleration support, s390 channel IO based virtio, amongst others. -- Alex Williamson (13): KVM: Restrict non-existing slot state transitions KVM: Check userspace_addr when modifying a memory slot KVM: Fix iommu map/unmap to handle memory slot moves KVM: Minor memory slot optimization KVM: Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTS KVM: Make KVM_PRIVATE_MEM_SLOTS optional KVM: struct kvm_memory_slot.user_alloc -> bool KVM: struct kvm_memory_slot.flags -> u32 KVM: struct kvm_memory_slot.id -> short KVM: Increase user memory slots on x86 to 125 kvm: Fix memory slot generation updates kvm: Force IOMMU remapping on memory slot read-only flag changes kvm: Obey read-only mappings in iommu Alexander Graf (17): KVM: PPC: Only WARN on invalid emulation KVM: PPC: Book3S: PR: Enable alternative instruction for SC 1 KVM: PPC: BookE: Allow irq deliveries to inject requests KVM: PPC: BookE: Emulate mfspr on EPR KVM: PPC: BookE: Implement EPR exit KVM: PPC: BookE: Add EPR ONE_REG sync KVM: PPC: E500: Move write_stlbe higher KVM: PPC: E500: Explicitly mark shadow maps invalid KVM: PPC: E500: Propagate errors when shadow mapping KVM: PPC: e500: Call kvmppc_mmu_map for initial mapping KVM: PPC: E500: Split host and guest MMU parts KVM: PPC: e500: Implement TLB1-in-TLB0 mapping KVM: PPC: E500: Make clear_tlb_refs and clear_tlb1_bitmap static KVM: PPC: E500: Remove kvmppc_e500_tlbil_all usage from guest TLB code Merge commit 'origin/next' into kvm-ppc-next KVM: PPC: BookE: Handle alignment interrupts Merge commit 'origin/next' into kvm-ppc-next Avi Kivity (16): KVM: x86 emulator: framework for streamlining arithmetic opcodes KVM: x86 emulator: Support for declaring single operand fastops KVM: x86 emulator: introduce NoWrite flag KVM: x86 emulator: mark CMP, CMPS, SCAS, TEST as NoWrite KVM: x86 emulator: convert NOT, NEG to fastop KVM: x86 emulator: add macros for defining 2-operand fastop emulation KVM: x86 emulator: convert basic ALU ops to fastop KVM: x86 emulator: Convert SHLD, SHRD to fastop KVM: x86 emulator: convert shift/rotate instructions to fastop KVM: x86 emulator: covert SETCC to fastop KVM: x86 emulator: convert INC/DEC to fastop KVM: x86 emulator: convert BT/BTS/BTR/BTC/BSF/BSR to fastop KVM: x86 emulator: convert 2-operand IMUL to fastop KVM: x86 emulator: rearrange fastop definitions KVM: x86 emulator: convert a few freestanding emulations to fastop KVM: x86 emulator: fix test_cc() build failure on i386 Bharat Bhushan (3): KVM: PPC: booke: use vcpu reference from thread_struct KVM: PPC: booke: Allow multiple exception types booke: Added DBCR4 SPR number Christian Borntraeger (3): KVM: s390: Gracefully handle busy conditions on ccw_device_start s390/kvm: Fix store status for ACRS/FPRS s390/kvm: Fix instruction decoding Cong Ding (1): KVM: s390: kvm/sigp.c: fix memory leakage Cornelia Huck (14): KVM: s390: Handle hosts not supporting s390-virtio. s390/ccwdev: Include asm/schid.h. KVM: s390: Add a channel I/O based virtio transport driver. KVM: s390: Constify intercept handler tables. KVM: s390: Decoding helper functions. KVM: s390: Support for I/O interrupts. KVM: s390: Add support for machine checks. KVM: s390: In-kernel handling of I/O instructions. KVM: s390: Base infrastructure for enabling capabilities. KVM: s390: Add support for channel I/O instructions. KVM: s390: Dynamic allocation of virtio-ccw I/O data. KVM: trace: Fix exit decoding. s390/virtio-ccw: Fix setup_vq error handling. KVM: s390: Fix handling of iscs. Dongxiao Xu (1): KVM: VMX: disable SMEP feature when guest is in non-paging mode Geoff Levand (1): KVM: Remove duplicate text in api.txt Gleb Natapov (39): KVM: emulator: implement AAD instruction KVM: inject ExtINT interrupt before APIC interrupts KVM: remove unused variable. KVM: VMX: cleanup rmode_segment_valid() KVM: VMX: relax check for CS register in rmode_segment_valid() KVM: VMX: return correct segment limit and flags for CS/SS registers in real mode KVM: VMX: use fix_rmode_seg() to fix all code/data segments KVM: VMX: remove redundant code from vmx_set_segment() KVM: VMX: clean-up vmx_set_segment() KVM: VMX: remove unneeded temporary variable from vmx_set_segment() KVM
Re: [RFC PATCH 1/6] kvm: add device control API
On 02/20/2013 06:01:00 PM, Marcelo Tosatti wrote: The main problem, to me, is that the interface allows flexibility but the details of using such flexibility are not worked out. That is, lack of definitions such as when setting attributes is allowed. That is defined by the individual attributes. There is one constraint that I did forget to indicate in mpic.txt -- KVM_DEV_MPIC_GRP_REGISTER can only be used when the registers are mapped. With a big blog, that information is implicit: a SET operation is a full device reset. So, for example, how would I handle the guest changing the location of the MPIC registers dynamically (similar to changing a PCI BAR)? QEMU doesn't implement this currently, but the hardware does, so the kernel interface shouldn't preclude it. What if we later discover that we need some extra bit of state that wasn't included in the initial definition? Especially since we don't do migration at all yet on ppc, so we can't even use what QEMU currently saves as a reference. What if a particular array within the state blob needs to be scaled up because we now have more sources? Will we need to version the blob? It just gets to be a mess, like SREGS. And then we need special code for packing/unpacking the blob, whereas with this patchset we reuse the same emulation code guest MMIO uses. With individual attributes: - Which attributes can be set individually? - Is there an order which must be respected? If there are constraints, the device documentation should specify them. - Locking. Same locking I need for MMIO and interrupt injection via other paths. - End result: more logic/code necessary. That's a tradeoff for individual devices to make, in cases where it's actually true. > It's not about how often we do it, but how much state we have, > especially if we ever want to implement migration. Migration reads/writes the device state once, which is supposedly much smaller than VM's RAM, so can't see the logic here. struct kvm_irqchip can only hold 512 bytes. We have more state than that. -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/9] KVM: PPC: Book3S: Add kernel emulation for the XICS interrupt controller
On 02/20/2013 01:58:54 PM, Marcelo Tosatti wrote: This is probably a stupid question, but why the KVM_SET_IRQCHIP/KVM_SET_GSI_ROUTING interface is not appropriate for your purposes? x86 sets up a default GSI->IRQCHIP PIN mapping on creation (during KVM_SET_IRQCHIP), but it can be modified with KVM_SET_GSI_ROUTING. To start, the whole IRQ routing stuff is poorly documented. Am I supposed to make up GSI numbers and use the routing thing to associate them with real interrupts? Are there constraints on what sort of GSI numbers I can choose (I now see in the code that KVM_MAX_IRQ_ROUTES is returned from the capability check, but where is that documented? It looks like the APIC implementation has default routes, where is that documented?)? Where does the code live to manage this table, and how APICy is it (looks like the answer is "irq_comm.c, and very")? I suppose I could write another implementation of the table management code for MPIC, though the placement of "irqchip" inside the route entry, rather than as an argument to KVM_IRQ_LINE, suggests the table is supposed to be global, not in the individual interrupt controller. It looks like I'm going to have to do this anyway for irqfd, though that doesn't make the other uses of the device control api go away. Even KVM_DEV_MPIC_GRP_IRQ_ACTIVE would still be useful for reading the status for debugging (reading device registers doesn't quite do it, since the "active" bit won't show up if the interrupt is masked). At that point, is it more offensive to make it read-only even though it would be trivial to make it read/write (which would allow users who don't need it to bypass the routing API), or to make it read/write and live with there being more than one way to do something? KVM_SET_IRQCHIP is not suitable because we have more than 512 bytes of state, and because it doesn't allow debugging access to device registers (e.g. inspecting from the QEMU command line), and because it's hard to add new pieces of state if we realize we left something out. It reminds be of GET/SET_SREGS. With that, I did what you seem to want here, which is to adapt the existing interfaces, using feature flags to control optional state. It ended up being a mess, and ONE_REG was introduced as a replacement. The device control API is the equivalent of ONE_REG for things other than vcpus. -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/6] kvm: add device control API
On Wed, Feb 20, 2013 at 05:20:50PM -0600, Scott Wood wrote: > On 02/20/2013 03:17:27 PM, Marcelo Tosatti wrote: > >On Mon, Feb 18, 2013 at 05:01:40PM -0600, Scott Wood wrote: > >> On 02/18/2013 06:21:59 AM, Gleb Natapov wrote: > >> >Copying Christoffer since ARM has in kernel irq chip too. > >> > > >> >On Wed, Feb 13, 2013 at 11:49:15PM -0600, Scott Wood wrote: > >> >> Currently, devices that are emulated inside KVM are > >configured in a > >> >> hardcoded manner based on an assumption that any given > >architecture > >> >> only has one way to do it. If there's any need to access device > >> >state, > >> >> it is done through inflexible one-purpose-only IOCTLs (e.g. > >> >> KVM_GET/SET_LAPIC). Defining new IOCTLs for every little > >thing is > >> >> cumbersome and depletes a limited numberspace. > > > >Creation of ioctl has advantages. It makes explicit what the > >data contains and how it should be interpreted. > >This means that for example, policy control can be performed at ioctl > >level (can group, by ioctl number, which ioctl calls are allowed after > >VM creation, for example). > > > >It also makes it explicit that its a userspace interface which > >applications depend. > > > >With a single 'set device attribute' ioctl its more difficult > >to do that. > > I don't see why creating ioctls rather than device attributes (or > whatever you want to call them) differs this way. Device attributes > are inherently userspace interfaces, just as ioctls are. What the > data contains and how it is interpreted are documented, albeit in a > more lightweight format than KVM usually uses for ioctls. > > An ioctl is no guarantee of good documentation. KVM is far better > than most of the kernel in that regard, but even there some ioctls > are missing (e.g. KVM_IRQ_LINE_STATUS as mentioned elsewhere in this > thread), and some others are inadequately explained (e.g. IRQ > routing). > > By "policy control", do you just mean that some ioctls act on a > /dev/kvm fd, others on a vm fd, and others on a vcpu fd? This is > pretty similar to having a device fd, except for the mechanism used. No, i mean policy control acting on ioctl numbers. Its essentially the same issue as with 'strace' (in that the its necessary to parse the ioctl data further). > The main things that I dislike about adding new things at the ioctl > level are: > - limited numberspace, and more prone to merge conflicts than a > device-specific numberspace Can reserve per-architecture ranges to deal with that. > - having to add a new pathway to get into the driver for each ioctl, > rather than having all operations on a particular device go to the > right place, and the device implementation interprets further > (this assumes that we're talking about vm ioctls, and not returning > a > new fd for the device) Agree with that point. > - awkward way of negotiating capabilities with userspace (have to > declare the capability id separately, and add code somewhere outside > the > device implementation to respond to capability requests) Agree with that point. > - api.txt formatting that imposes yet another document-internal > numberspace to conflict on :-) The main problem, to me, is that the interface allows flexibility but the details of using such flexibility are not worked out. That is, lack of definitions such as when setting attributes is allowed. With a big blog, that information is implicit: a SET operation is a full device reset. With individual attributes: - Which attributes can be set individually? - Is there an order which must be respected? - Locking. - End result: more logic/code necessary. > >Abstracting the details also makes it cumbersome to read > >strace output :-) > > You'd have to update strace one way or another if you want > pretty-printed output. Having a more restricted API than arbitrary > ioctls could actually improve the situation -- this could be a good > reason for including the attribute length as an explicit parameter > rather than being implicit in the attribute id. Then you'd just > teach strace about the device control API once, and not for each new > attribute or device that gets added. > > >> >> This API provides a mechanism to instantiate a device of a > >certain > >> >> type, returning an ID that can be used to set/get attributes > >of the > >> >> device. Attributes may include configuration parameters (e.g. > >> >> register base address), device state, operational commands, > >etc. It > >> >> is similar to the ONE_REG API, except that it acts on devices > >rather > >> >> than vcpus. > >> >You are not only provide different way to create in kernel irq > >> >chip you > >> >also use an alternate way to trigger interrupt lines. Before going > >> >into > >> >interface specifics lets think about whether it is really worth it? > >> > >> Which "it" do you mean here? > >> > >> The ability to set/get attributes is needed. Sorry, but "get or set > >> one blob of data, up to 512 bytes, for the entire
Re: [RFC PATCH 1/6] kvm: add device control API
On 02/20/2013 03:28:24 PM, Marcelo Tosatti wrote: On Wed, Feb 20, 2013 at 03:09:49PM +0200, Gleb Natapov wrote: > On Tue, Feb 19, 2013 at 03:16:37PM -0600, Scott Wood wrote: > > On 02/19/2013 06:24:18 AM, Gleb Natapov wrote: > > >On Mon, Feb 18, 2013 at 05:01:40PM -0600, Scott Wood wrote: > > >> The ability to set/get attributes is needed. Sorry, but "get or set > > >> one blob of data, up to 512 bytes, for the entire irqchip" is just > > >> not good enough -- assuming you don't want us to start sticking > > >> pointers and commands in *that* data. :-) > > >> > > >Proposed interface sticks pointers into ioctl data, so why doing > > >the same > > >for KVM_SET_IRQCHIP/KVM_GET_IRQCHIP makes you smile. > > > > There's a difference between putting a pointer in an ioctl control > > structure that is specifically documented as being that way (as in > > ONE_REG), versus taking an ioctl that claims to be setting/getting a > > blob of state and embedding pointers in it. It would be like > > sticking a pointer in the attribute payload of this API, which I > > think is something to be discouraged. > If documentation is what differentiate for you between silly and smart > then write documentation instead of new interfaces. > > KVM_SET_IRQCHIP/KVM_GET_IRQCHIP is defined to operate on blob of data on > x86, nothing prevent you from adding MPIC specifics to the interface, > Add mpic state into kvm_irqchip structure and if 512 bytes is not enough > for you to transfer the state put pointers there and _document_ them. > But with 512 bytes you can transfer properties inline, so you probably > do not need pointer there anyway. I see you have three properties 2 of > them 32bit and one 64bit. > > >It'd also be using > > KVM_SET_IRQCHIP to read data, which is the sort of thing you object > > to later on regarding KVM_IRQ_LINE_STATUS. > > > Do not see why. > > > Then there's the silliness of transporting 512 bytes just to read a > > descriptor for transporting something else. > > > Yes, agree. But is this enough of a reason to introduce entirely new > interface? Is it on performance critical path? Doubt it, unless you > abuse the interface to send interrupts, but then isn't it silty to > do copy_from_user() twice to inject an interrupt like proposed interface > does? > > > >For signaling irqs (I think this is what you mean by "commands") > > >we have KVM_IRQ_LINE. > > > > It's one type of command. Another is setting the address. Another > > is writing to registers that have side effects (this is how MSI > > injection is done on MPIC, just as in real hardware). > > > Setting the address is setting an attribute. Sending MSI is a command. > Things you set/get during init/migration are attributes. Things you do > to cause side-effects are commands. Yes, it would be good to restrict what can be done via the interface (to avoid abstracting away problems). At a first glance, i would say it should allow for "initial configuration of device state", such as registers etc. Why exactly you need to set attributes Scott? That's best answered by looking at patch 6/6 and discussing the actual attributes that are defined so far. The need to set the base address of the registers is straightforward. When ARM added their special ioctl for this, we discussed it being eventually replaced with a more general API (in fact, that was the reason they put "ARM" in the name). Access to device registers was originally intended for debugging, and eventually migration. It turned out to also be very useful for injecting MSIs -- nothing special needed to be done. It Just Works(tm) the same way it does in hardware, with an MMIO write from a PCI device to a specific MPIC register. Again, irqfd may complicate this, but there's no need for QEMU-generated MSIs to have to take a circuitous path. Finally, there's the interrupt source attribute. Even if we get rid of that, we'll need MPIC-specific documentation on how to flatten the IRQ numberspace, and if we ever have a cascaded PIC situation things could get ugly since there's no way to identify a specific IRQ controller in KVM_IRQ_LINE. Also related to this question is the point of avoiding the implementation of devices to be spread across userspace and the kernel (this is one point Avi brought up often). If the device emulation is implemented entirely in the kernel, all is necessary are initial values of device registers (and retrieval of those values later for savevm/migration). MPIC emulation is entirely in the kernel with this patchset -- though the register that lets you reset cores will likely need to be kicked back to QEMU. It is then not necessary to set device attributes on a live guest and deal with the complications associated with that. Which complications? -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord.
Re: [RFC PATCH 1/6] kvm: add device control API
On 02/20/2013 03:17:27 PM, Marcelo Tosatti wrote: On Mon, Feb 18, 2013 at 05:01:40PM -0600, Scott Wood wrote: > On 02/18/2013 06:21:59 AM, Gleb Natapov wrote: > >Copying Christoffer since ARM has in kernel irq chip too. > > > >On Wed, Feb 13, 2013 at 11:49:15PM -0600, Scott Wood wrote: > >> Currently, devices that are emulated inside KVM are configured in a > >> hardcoded manner based on an assumption that any given architecture > >> only has one way to do it. If there's any need to access device > >state, > >> it is done through inflexible one-purpose-only IOCTLs (e.g. > >> KVM_GET/SET_LAPIC). Defining new IOCTLs for every little thing is > >> cumbersome and depletes a limited numberspace. Creation of ioctl has advantages. It makes explicit what the data contains and how it should be interpreted. This means that for example, policy control can be performed at ioctl level (can group, by ioctl number, which ioctl calls are allowed after VM creation, for example). It also makes it explicit that its a userspace interface which applications depend. With a single 'set device attribute' ioctl its more difficult to do that. I don't see why creating ioctls rather than device attributes (or whatever you want to call them) differs this way. Device attributes are inherently userspace interfaces, just as ioctls are. What the data contains and how it is interpreted are documented, albeit in a more lightweight format than KVM usually uses for ioctls. An ioctl is no guarantee of good documentation. KVM is far better than most of the kernel in that regard, but even there some ioctls are missing (e.g. KVM_IRQ_LINE_STATUS as mentioned elsewhere in this thread), and some others are inadequately explained (e.g. IRQ routing). By "policy control", do you just mean that some ioctls act on a /dev/kvm fd, others on a vm fd, and others on a vcpu fd? This is pretty similar to having a device fd, except for the mechanism used. The main things that I dislike about adding new things at the ioctl level are: - limited numberspace, and more prone to merge conflicts than a device-specific numberspace - having to add a new pathway to get into the driver for each ioctl, rather than having all operations on a particular device go to the right place, and the device implementation interprets further (this assumes that we're talking about vm ioctls, and not returning a new fd for the device) - awkward way of negotiating capabilities with userspace (have to declare the capability id separately, and add code somewhere outside the device implementation to respond to capability requests) - api.txt formatting that imposes yet another document-internal numberspace to conflict on :-) Abstracting the details also makes it cumbersome to read strace output :-) You'd have to update strace one way or another if you want pretty-printed output. Having a more restricted API than arbitrary ioctls could actually improve the situation -- this could be a good reason for including the attribute length as an explicit parameter rather than being implicit in the attribute id. Then you'd just teach strace about the device control API once, and not for each new attribute or device that gets added. > >> This API provides a mechanism to instantiate a device of a certain > >> type, returning an ID that can be used to set/get attributes of the > >> device. Attributes may include configuration parameters (e.g. > >> register base address), device state, operational commands, etc. It > >> is similar to the ONE_REG API, except that it acts on devices rather > >> than vcpus. > >You are not only provide different way to create in kernel irq > >chip you > >also use an alternate way to trigger interrupt lines. Before going > >into > >interface specifics lets think about whether it is really worth it? > > Which "it" do you mean here? > > The ability to set/get attributes is needed. Sorry, but "get or set > one blob of data, up to 512 bytes, for the entire irqchip" is just > not good enough -- assuming you don't want us to start sticking > pointers and commands in *that* data. :-) Why not? Is it necessary to constantly read/write attributes? It's not about how often we do it, but how much state we have, especially if we ever want to implement migration. > If you mean the way to inject interrupts, it's simpler this way. Are you injecting interrupts via this new SET_DEVICE_ATTRIBUTE ioctl? Yes, but even if that gets shot down (the best objection IMHO is the one nobody is raising -- how to hook into irqfd), we still need the rest of it. I think we'd even want to keep attributes for IRQ line status so that we have a way to read it, even if just for debugging. -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/6] kvm: add device control API
On Wed, Feb 20, 2013 at 06:28:24PM -0300, Marcelo Tosatti wrote: > On Wed, Feb 20, 2013 at 03:09:49PM +0200, Gleb Natapov wrote: > > On Tue, Feb 19, 2013 at 03:16:37PM -0600, Scott Wood wrote: > > > On 02/19/2013 06:24:18 AM, Gleb Natapov wrote: > > > >On Mon, Feb 18, 2013 at 05:01:40PM -0600, Scott Wood wrote: > > > >> The ability to set/get attributes is needed. Sorry, but "get or set > > > >> one blob of data, up to 512 bytes, for the entire irqchip" is just > > > >> not good enough -- assuming you don't want us to start sticking > > > >> pointers and commands in *that* data. :-) > > > >> > > > >Proposed interface sticks pointers into ioctl data, so why doing > > > >the same > > > >for KVM_SET_IRQCHIP/KVM_GET_IRQCHIP makes you smile. > > > > > > There's a difference between putting a pointer in an ioctl control > > > structure that is specifically documented as being that way (as in > > > ONE_REG), versus taking an ioctl that claims to be setting/getting a > > > blob of state and embedding pointers in it. It would be like > > > sticking a pointer in the attribute payload of this API, which I > > > think is something to be discouraged. > > If documentation is what differentiate for you between silly and smart > > then write documentation instead of new interfaces. > > > > KVM_SET_IRQCHIP/KVM_GET_IRQCHIP is defined to operate on blob of data on > > x86, nothing prevent you from adding MPIC specifics to the interface, > > Add mpic state into kvm_irqchip structure and if 512 bytes is not enough > > for you to transfer the state put pointers there and _document_ them. > > But with 512 bytes you can transfer properties inline, so you probably > > do not need pointer there anyway. I see you have three properties 2 of > > them 32bit and one 64bit. > > > > >It'd also be using > > > KVM_SET_IRQCHIP to read data, which is the sort of thing you object > > > to later on regarding KVM_IRQ_LINE_STATUS. > > > > > Do not see why. > > > > > Then there's the silliness of transporting 512 bytes just to read a > > > descriptor for transporting something else. > > > > > Yes, agree. But is this enough of a reason to introduce entirely new > > interface? Is it on performance critical path? Doubt it, unless you > > abuse the interface to send interrupts, but then isn't it silty to > > do copy_from_user() twice to inject an interrupt like proposed interface > > does? > > > > > >For signaling irqs (I think this is what you mean by "commands") > > > >we have KVM_IRQ_LINE. > > > > > > It's one type of command. Another is setting the address. Another > > > is writing to registers that have side effects (this is how MSI > > > injection is done on MPIC, just as in real hardware). > > > > > Setting the address is setting an attribute. Sending MSI is a command. > > Things you set/get during init/migration are attributes. Things you do > > to cause side-effects are commands. > > Yes, it would be good to restrict what can be done via the interface > (to avoid abstracting away problems). At a first glance, i would say > it should allow for "initial configuration of device state", such as > registers etc. > > Why exactly you need to set attributes Scott? > > > > What is the benefit of KVM_IRQ_LINE over what MPIC does? What real > > > (non-glue/wrapper) code can become common? > > > > > No new ioctl with exactly same result (well actually even faster since > > less copying is done). You need to show us the benefits of the new interface > > vs existing one, not vice versa. > > Also related to this question is the point of avoiding the > implementation of devices to be spread across userspace and the kernel > (this is one point Avi brought up often). If the device emulation > is implemented entirely in the kernel, all is necessary are initial > values of device registers (and retrieval of those values later for > savevm/migration). > > It is then not necessary to set device attributes on a live guest and > deal with the complications associated with that. > > I suppose the atomic-test-and-set type operations introduced to ONE_REG ioctls are one example of such complications. Avi, can you remind us what are the drawbacks of having separate userspace/kernel device emulation? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/6] kvm: add device control API
On Wed, Feb 20, 2013 at 03:09:49PM +0200, Gleb Natapov wrote: > On Tue, Feb 19, 2013 at 03:16:37PM -0600, Scott Wood wrote: > > On 02/19/2013 06:24:18 AM, Gleb Natapov wrote: > > >On Mon, Feb 18, 2013 at 05:01:40PM -0600, Scott Wood wrote: > > >> The ability to set/get attributes is needed. Sorry, but "get or set > > >> one blob of data, up to 512 bytes, for the entire irqchip" is just > > >> not good enough -- assuming you don't want us to start sticking > > >> pointers and commands in *that* data. :-) > > >> > > >Proposed interface sticks pointers into ioctl data, so why doing > > >the same > > >for KVM_SET_IRQCHIP/KVM_GET_IRQCHIP makes you smile. > > > > There's a difference between putting a pointer in an ioctl control > > structure that is specifically documented as being that way (as in > > ONE_REG), versus taking an ioctl that claims to be setting/getting a > > blob of state and embedding pointers in it. It would be like > > sticking a pointer in the attribute payload of this API, which I > > think is something to be discouraged. > If documentation is what differentiate for you between silly and smart > then write documentation instead of new interfaces. > > KVM_SET_IRQCHIP/KVM_GET_IRQCHIP is defined to operate on blob of data on > x86, nothing prevent you from adding MPIC specifics to the interface, > Add mpic state into kvm_irqchip structure and if 512 bytes is not enough > for you to transfer the state put pointers there and _document_ them. > But with 512 bytes you can transfer properties inline, so you probably > do not need pointer there anyway. I see you have three properties 2 of > them 32bit and one 64bit. > > >It'd also be using > > KVM_SET_IRQCHIP to read data, which is the sort of thing you object > > to later on regarding KVM_IRQ_LINE_STATUS. > > > Do not see why. > > > Then there's the silliness of transporting 512 bytes just to read a > > descriptor for transporting something else. > > > Yes, agree. But is this enough of a reason to introduce entirely new > interface? Is it on performance critical path? Doubt it, unless you > abuse the interface to send interrupts, but then isn't it silty to > do copy_from_user() twice to inject an interrupt like proposed interface > does? > > > >For signaling irqs (I think this is what you mean by "commands") > > >we have KVM_IRQ_LINE. > > > > It's one type of command. Another is setting the address. Another > > is writing to registers that have side effects (this is how MSI > > injection is done on MPIC, just as in real hardware). > > > Setting the address is setting an attribute. Sending MSI is a command. > Things you set/get during init/migration are attributes. Things you do > to cause side-effects are commands. Yes, it would be good to restrict what can be done via the interface (to avoid abstracting away problems). At a first glance, i would say it should allow for "initial configuration of device state", such as registers etc. Why exactly you need to set attributes Scott? > > What is the benefit of KVM_IRQ_LINE over what MPIC does? What real > > (non-glue/wrapper) code can become common? > > > No new ioctl with exactly same result (well actually even faster since > less copying is done). You need to show us the benefits of the new interface > vs existing one, not vice versa. Also related to this question is the point of avoiding the implementation of devices to be spread across userspace and the kernel (this is one point Avi brought up often). If the device emulation is implemented entirely in the kernel, all is necessary are initial values of device registers (and retrieval of those values later for savevm/migration). It is then not necessary to set device attributes on a live guest and deal with the complications associated with that. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/6] kvm: add device control API
On Mon, Feb 18, 2013 at 05:01:40PM -0600, Scott Wood wrote: > On 02/18/2013 06:21:59 AM, Gleb Natapov wrote: > >Copying Christoffer since ARM has in kernel irq chip too. > > > >On Wed, Feb 13, 2013 at 11:49:15PM -0600, Scott Wood wrote: > >> Currently, devices that are emulated inside KVM are configured in a > >> hardcoded manner based on an assumption that any given architecture > >> only has one way to do it. If there's any need to access device > >state, > >> it is done through inflexible one-purpose-only IOCTLs (e.g. > >> KVM_GET/SET_LAPIC). Defining new IOCTLs for every little thing is > >> cumbersome and depletes a limited numberspace. Creation of ioctl has advantages. It makes explicit what the data contains and how it should be interpreted. This means that for example, policy control can be performed at ioctl level (can group, by ioctl number, which ioctl calls are allowed after VM creation, for example). It also makes it explicit that its a userspace interface which applications depend. With a single 'set device attribute' ioctl its more difficult to do that. Abstracting the details also makes it cumbersome to read strace output :-) > >> This API provides a mechanism to instantiate a device of a certain > >> type, returning an ID that can be used to set/get attributes of the > >> device. Attributes may include configuration parameters (e.g. > >> register base address), device state, operational commands, etc. It > >> is similar to the ONE_REG API, except that it acts on devices rather > >> than vcpus. > >You are not only provide different way to create in kernel irq > >chip you > >also use an alternate way to trigger interrupt lines. Before going > >into > >interface specifics lets think about whether it is really worth it? > > Which "it" do you mean here? > > The ability to set/get attributes is needed. Sorry, but "get or set > one blob of data, up to 512 bytes, for the entire irqchip" is just > not good enough -- assuming you don't want us to start sticking > pointers and commands in *that* data. :-) Why not? Is it necessary to constantly read/write attributes? > If you mean the way to inject interrupts, it's simpler this way. Are you injecting interrupts via this new SET_DEVICE_ATTRIBUTE ioctl? > Why go out of our way to inject common glue code into a > communication path between hw/kvm/mpic.c in QEMU and > arch/powerpc/kvm/mpic.c in KVM? Or rather, why make that common > glue be specific to this one function when we could reuse the same > communication glue used for other things, such as device state? > > And that's just for regular interrupts. MSIs are vastly simpler on > MPIC than what x86 does. > > >x86 obviously support old way and will have to for some, very > >long, time. > > Sure. > > >ARM vGIC code, that is ready to go upstream, uses old way too. So > >it will > >be 2 archs against one. > > I wasn't aware that that's how it worked. :-P > > I was trying to be considerate by not making the entire thing > gratuitously PPC or MPIC specific, as some others seem inclined to > do (e.g. see irqfd and APIC). We already had a discussion on ARM's > "set address" ioctl and rather than extend *that* interface, they > preferred to just stick something ARM-specific in ASAP with the > understanding that it would be replaced (or more accurately, kept > around as a thin wrapper around the new stuff) later. > > >Christoffer do you think the proposed way it > >better for your needs. Are you willing to make vGIC use it? > > > >Scott, what other devices are you planning to support with this > >interface? > > At the moment I do not have plans for other devices, though what > does it hurt for the capability to be there? > > -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2][QEMU] vmxcap: Report APIC register emulation and RDTSCP control
On Mon, Feb 18, 2013 at 07:56:54AM +0100, Jan Kiszka wrote: > From: Jan Kiszka > > Signed-off-by: Jan Kiszka > --- > > This time I've checked twice that I'm no longer missing a field. Applied, thanks. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/9] KVM: PPC: Book3S: Add kernel emulation for the XICS interrupt controller
On Sat, Feb 16, 2013 at 01:56:14PM +1100, Paul Mackerras wrote: > On Fri, Feb 15, 2013 at 05:59:11PM -0600, Scott Wood wrote: > > On 02/15/2013 05:18:31 PM, Paul Mackerras wrote: > > >On Fri, Feb 15, 2013 at 02:05:41PM -0600, Scott Wood wrote: > > >> On 02/14/2013 06:01:08 PM, Paul Mackerras wrote: > > >> >From: Benjamin Herrenschmidt > > >> > > > >> >This adds in-kernel emulation of the XICS (eXternal Interrupt > > >> >Controller Specification) interrupt controller specified by > > >PAPR, for > > >> >both HV and PR KVM guests. > > >> > > > >> >This adds a new KVM_CREATE_IRQCHIP_ARGS ioctl, which is like > > >> >KVM_CREATE_IRQCHIP in that it indicates that the virtual machine > > >> >should use in-kernel interrupt controller emulation, but also > > >takes an > > >> >argument struct that contains the type of interrupt controller > > >> >architecture and an optional parameter. Currently only one > > >type value > > >> >is defined, that which indicates the XICS architecture. > > >> > > >> Would the device config API I posted a couple days ago work for you? > > > > > >I suppose it could be made to work. It doesn't feel like a natural > > >fit though, because your API seems to assume (AFAICT) that a device is > > >manipulated via registers at specific physical addresses, so I would > > >have to invent an artificial set of registers with addresses and bit > > >layouts, that aren't otherwise required. The XICS is operated from > > >the guest side via hcalls, not via emulated MMIO. > > > > I don't think it makes such an assumption. The MPIC device has > > physical registers, so it exposes them, but it also exposes things > > that are not physical registers (e.g. the per-IRQ input state). The > > generic device control layer leaves interpretation of attributes up > > to the device. > > > > I think it would be easier to fit XICS into the device control api > > model than to fit MPIC into this model, not to mention what would > > happen if we later want to emulate some other type of device -- x86 > > already has at least one non-irqchip emulated device (i8254). > > I have no particular objection to the device control API per se, but > I have two objections to using it as the primary interface to the XICS > emulation. > > First, I dislike the magical side-effect where creating a device of a > particular type (e.g. MPIC or XICS) automatically attaches it to the > interrupt lines of the vcpus. I prefer an explicit request to do > in-kernel interrupt control. This is probably a stupid question, but why the KVM_SET_IRQCHIP/KVM_SET_GSI_ROUTING interface is not appropriate for your purposes? x86 sets up a default GSI->IRQCHIP PIN mapping on creation (during KVM_SET_IRQCHIP), but it can be modified with KVM_SET_GSI_ROUTING. > Further, the magic means that you can > only have one instance of the device, whereas you might want to model > the interrupt controller architecture as several devices. You could > do that using several device types, but then the interconnections > between them would also be magic. > > Secondly, it means that we are completely abandoning any attempt to > define an abstract or generic interface to in-kernel interrupt > controller emulations. Each device will have its own unique set of > attribute groups and its own unique userspace code to drive it, with > no commonality between them. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: nVMX: Rework event injection and recovery
On 2013-02-20 18:24, Jan Kiszka wrote: > On 2013-02-20 18:01, Gleb Natapov wrote: >> On Wed, Feb 20, 2013 at 03:37:51PM +0100, Jan Kiszka wrote: >>> On 2013-02-20 15:14, Nadav Har'El wrote: Hi, By the way, if you haven't seen my description of why the current code did what it did, take a look at http://www.mail-archive.com/kvm@vger.kernel.org/msg54478.html Another description might also come in handy: http://www.mail-archive.com/kvm@vger.kernel.org/msg54476.html On Wed, Feb 20, 2013, Jan Kiszka wrote about "[PATCH] KVM: nVMX: Rework event injection and recovery": > This aligns VMX more with SVM regarding event injection and recovery for > nested guests. The changes allow to inject interrupts directly from L0 > to L2. > > One difference to SVM is that we always transfer the pending event > injection into the architectural state of the VCPU and then drop it from > there if it turns out that we left L2 to enter L1. Last time I checked, if I'm remembering correctly, the nested SVM code did something a bit different: After the exit from L2 to L1 and unnecessarily queuing the pending interrupt for injection, it skipped one entry into L1, and as usual after the entry the interrupt queue is cleared so next time around, when L1 one is really entered, the wrong injection is not attempted. > VMX and SVM are now identical in how they recover event injections from > unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD > still contains a valid event and, if yes, transfer the content into L1's > idt_vectoring_info_field. > To avoid that we incorrectly leak an event into the architectural VCPU > state that L1 wants to inject, we skip cancellation on nested run. I didn't understand this last point. >>> >>> - prepare_vmcs02 sets event to be injected into L2 >>> - while trying to enter L2, a cancel condition is met >>> - we call vmx_cancel_interrupts but should now avoid filling L1's event >>> into the arch event queues - it's kept in vmcs12 >>> >> But what if we put it in arch event queue? It will be reinjected during >> next entry attempt, so nothing bad happens and we have one less if() to >> explain, >> or do I miss something terrible that will happen? > > I started without that if but ran into troubles with KVM-on-KVM (L1 > locks up). Let me dig out the instrumentation and check the event flow > again. OK, got it again: If we transfer an IRQ that L1 wants to send to L2 into the architectural VCPU state, we will also trigger enable_irq_window. And that raises KVM_REQ_IMMEDIATE_EXIT again as it thinks L0 wants inject. That will send us into an endless loop. Not sure if we can and should handle this scenario in enable_irq_window in a nicer way. Open for suggestions. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: nVMX: Rework event injection and recovery
On 2013-02-20 18:01, Gleb Natapov wrote: > On Wed, Feb 20, 2013 at 03:37:51PM +0100, Jan Kiszka wrote: >> On 2013-02-20 15:14, Nadav Har'El wrote: >>> Hi, >>> >>> By the way, if you haven't seen my description of why the current code >>> did what it did, take a look at >>> http://www.mail-archive.com/kvm@vger.kernel.org/msg54478.html >>> Another description might also come in handy: >>> http://www.mail-archive.com/kvm@vger.kernel.org/msg54476.html >>> >>> On Wed, Feb 20, 2013, Jan Kiszka wrote about "[PATCH] KVM: nVMX: Rework >>> event injection and recovery": This aligns VMX more with SVM regarding event injection and recovery for nested guests. The changes allow to inject interrupts directly from L0 to L2. One difference to SVM is that we always transfer the pending event injection into the architectural state of the VCPU and then drop it from there if it turns out that we left L2 to enter L1. >>> >>> Last time I checked, if I'm remembering correctly, the nested SVM code did >>> something a bit different: After the exit from L2 to L1 and unnecessarily >>> queuing the pending interrupt for injection, it skipped one entry into L1, >>> and as usual after the entry the interrupt queue is cleared so next time >>> around, when L1 one is really entered, the wrong injection is not attempted. >>> VMX and SVM are now identical in how they recover event injections from unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD still contains a valid event and, if yes, transfer the content into L1's idt_vectoring_info_field. >>> To avoid that we incorrectly leak an event into the architectural VCPU state that L1 wants to inject, we skip cancellation on nested run. >>> >>> I didn't understand this last point. >> >> - prepare_vmcs02 sets event to be injected into L2 >> - while trying to enter L2, a cancel condition is met >> - we call vmx_cancel_interrupts but should now avoid filling L1's event >> into the arch event queues - it's kept in vmcs12 >> > But what if we put it in arch event queue? It will be reinjected during > next entry attempt, so nothing bad happens and we have one less if() to > explain, > or do I miss something terrible that will happen? I started without that if but ran into troubles with KVM-on-KVM (L1 locks up). Let me dig out the instrumentation and check the event flow again. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bisected][-next-20130204+] [x86/kvm] udevd:[97]: segfault at ffffffffff5fd020 ip 00007fff069e277f sp 00007fff068c9ef8 error d
On Tue, 2013-02-19 at 10:26 +0200, Gleb Natapov wrote: > On Mon, Feb 18, 2013 at 08:12:21PM -0500, Peter Hurley wrote: > > On Mon, 2013-02-18 at 19:59 -0300, Marcelo Tosatti wrote: > > > On Wed, Feb 13, 2013 at 06:57:09AM -0500, Peter Hurley wrote: > > > > On Wed, 2013-02-13 at 12:51 +0200, Gleb Natapov wrote: > > > > > On Tue, Feb 12, 2013 at 04:39:03PM -0800, H. Peter Anvin wrote: > > > > > > On 02/12/2013 04:26 PM, Peter Hurley wrote: > > > > > > > With -next-20130204+ in ubuntu 12.10 VM (so the 80x25 VGA > > > > > > > device/console): > > > > > > > > > > > > > > [0.666410] udevd[97]: starting version 175 > > > > > > > [0.674043] udevd[97]: udevd:[97]: segfault at > > > > > > > ff5fd020 ip 7fff069e277f sp 7fff068c9ef8 error d > > > > > > > > > > > > > > and boots to an initramfs prompt. > > > > > > > > > > > > > > git bisect (log attached) blames: > > > > > > > > > > > > > > commit 7b5c4a65cc27f017c170b025f8d6d75dabb11c6f > > > > > > > Merge: 3596f5b 949db15 > > > > > > > Author: H. Peter Anvin > > > > > > > Date: Fri Jan 25 16:31:21 2013 -0800 > > > > > > > > > > > > > > Merge tag 'v3.8-rc5' into x86/mm > > > > > > > > > > > > > > The __pa() fixup series that follows touches KVM code that is > > > > > > > not > > > > > > > present in the existing branch based on v3.7-rc5, so merge in > > > > > > > the > > > > > > > current upstream from Linus. > > > > > > > > > > > > > > Signed-off-by: H. Peter Anvin > > > > > > > > > > > > > > > > > > > > > This only happens with the VGA device/console but that is the > > > > > > > default > > > > > > > configuration for Ubuntu/KVM because it blacklists pretty much > > > > > > > every fb > > > > > > > driver. > > > > > > > > > > > > > > > > > > > I am guessing this is another bad use of __pa()... need to look > > > > > > into that. > > > > > > Can't find this commit on kvm.git or linux-2.6.git. Is it reproducible > > > there? > > > > This is in the linux-next repo (any git tag after 'next-20130204' will > > reproduce this). It's a pretty large merge commit. > > > > This doesn't happen on 3.8-rc7. > > > > I'll try to repro this on kvm.git sometime this week. Otherwise, we can > > wait for it to show up in 3.9. > > > Can you also drop 5dfd486c4750c9278c63fa96e6e85bdd2fb58e9d from linux-next > and reproduce? Still reproduced on next-20130204 after reverting 5dfd486c. Haven't tested kvm.git and newest next. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: nVMX: Rework event injection and recovery
On Wed, Feb 20, 2013 at 03:37:51PM +0100, Jan Kiszka wrote: > On 2013-02-20 15:14, Nadav Har'El wrote: > > Hi, > > > > By the way, if you haven't seen my description of why the current code > > did what it did, take a look at > > http://www.mail-archive.com/kvm@vger.kernel.org/msg54478.html > > Another description might also come in handy: > > http://www.mail-archive.com/kvm@vger.kernel.org/msg54476.html > > > > On Wed, Feb 20, 2013, Jan Kiszka wrote about "[PATCH] KVM: nVMX: Rework > > event injection and recovery": > >> This aligns VMX more with SVM regarding event injection and recovery for > >> nested guests. The changes allow to inject interrupts directly from L0 > >> to L2. > >> > >> One difference to SVM is that we always transfer the pending event > >> injection into the architectural state of the VCPU and then drop it from > >> there if it turns out that we left L2 to enter L1. > > > > Last time I checked, if I'm remembering correctly, the nested SVM code did > > something a bit different: After the exit from L2 to L1 and unnecessarily > > queuing the pending interrupt for injection, it skipped one entry into L1, > > and as usual after the entry the interrupt queue is cleared so next time > > around, when L1 one is really entered, the wrong injection is not attempted. > > > >> VMX and SVM are now identical in how they recover event injections from > >> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD > >> still contains a valid event and, if yes, transfer the content into L1's > >> idt_vectoring_info_field. > > > >> To avoid that we incorrectly leak an event into the architectural VCPU > >> state that L1 wants to inject, we skip cancellation on nested run. > > > > I didn't understand this last point. > > - prepare_vmcs02 sets event to be injected into L2 > - while trying to enter L2, a cancel condition is met > - we call vmx_cancel_interrupts but should now avoid filling L1's event > into the arch event queues - it's kept in vmcs12 > But what if we put it in arch event queue? It will be reinjected during next entry attempt, so nothing bad happens and we have one less if() to explain, or do I miss something terrible that will happen? -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: nVMX: Rework event injection and recovery
On Wed, Feb 20, 2013 at 05:48:40PM +0100, Jan Kiszka wrote: > On 2013-02-20 17:46, Gleb Natapov wrote: > > On Wed, Feb 20, 2013 at 02:01:47PM +0100, Jan Kiszka wrote: > >> This aligns VMX more with SVM regarding event injection and recovery for > >> nested guests. The changes allow to inject interrupts directly from L0 > >> to L2. > >> > >> One difference to SVM is that we always transfer the pending event > >> injection into the architectural state of the VCPU and then drop it from > >> there if it turns out that we left L2 to enter L1. > >> > >> VMX and SVM are now identical in how they recover event injections from > >> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD > >> still contains a valid event and, if yes, transfer the content into L1's > >> idt_vectoring_info_field. > >> > >> To avoid that we incorrectly leak an event into the architectural VCPU > >> state that L1 wants to inject, we skip cancellation on nested run. > >> > >> Signed-off-by: Jan Kiszka > >> --- > >> > >> Survived moderate testing here and (currently) makes sense to me, but > >> please review very carefully. I wouldn't be surprised if I'm still > >> missing some subtle corner case. > >> > >> arch/x86/kvm/vmx.c | 57 > >> +++ > >> 1 files changed, 26 insertions(+), 31 deletions(-) > >> > >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > >> index dd3a8a0..7d2fbd2 100644 > >> --- a/arch/x86/kvm/vmx.c > >> +++ b/arch/x86/kvm/vmx.c > >> @@ -6489,8 +6489,6 @@ static void __vmx_complete_interrupts(struct > >> vcpu_vmx *vmx, > >> > >> static void vmx_complete_interrupts(struct vcpu_vmx *vmx) > >> { > >> - if (is_guest_mode(&vmx->vcpu)) > >> - return; > >>__vmx_complete_interrupts(vmx, vmx->idt_vectoring_info, > >> VM_EXIT_INSTRUCTION_LEN, > >> IDT_VECTORING_ERROR_CODE); > >> @@ -6498,7 +6496,7 @@ static void vmx_complete_interrupts(struct vcpu_vmx > >> *vmx) > >> > >> static void vmx_cancel_injection(struct kvm_vcpu *vcpu) > >> { > >> - if (is_guest_mode(vcpu)) > >> + if (to_vmx(vcpu)->nested.nested_run_pending) > >>return; > > Why is this needed here? > > Please check if my reply to Nadav explains this sufficiently. > Ah, sorry. Will follow up there if it is not. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: nVMX: Rework event injection and recovery
On 2013-02-20 17:46, Gleb Natapov wrote: > On Wed, Feb 20, 2013 at 02:01:47PM +0100, Jan Kiszka wrote: >> This aligns VMX more with SVM regarding event injection and recovery for >> nested guests. The changes allow to inject interrupts directly from L0 >> to L2. >> >> One difference to SVM is that we always transfer the pending event >> injection into the architectural state of the VCPU and then drop it from >> there if it turns out that we left L2 to enter L1. >> >> VMX and SVM are now identical in how they recover event injections from >> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD >> still contains a valid event and, if yes, transfer the content into L1's >> idt_vectoring_info_field. >> >> To avoid that we incorrectly leak an event into the architectural VCPU >> state that L1 wants to inject, we skip cancellation on nested run. >> >> Signed-off-by: Jan Kiszka >> --- >> >> Survived moderate testing here and (currently) makes sense to me, but >> please review very carefully. I wouldn't be surprised if I'm still >> missing some subtle corner case. >> >> arch/x86/kvm/vmx.c | 57 >> +++ >> 1 files changed, 26 insertions(+), 31 deletions(-) >> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >> index dd3a8a0..7d2fbd2 100644 >> --- a/arch/x86/kvm/vmx.c >> +++ b/arch/x86/kvm/vmx.c >> @@ -6489,8 +6489,6 @@ static void __vmx_complete_interrupts(struct vcpu_vmx >> *vmx, >> >> static void vmx_complete_interrupts(struct vcpu_vmx *vmx) >> { >> -if (is_guest_mode(&vmx->vcpu)) >> -return; >> __vmx_complete_interrupts(vmx, vmx->idt_vectoring_info, >>VM_EXIT_INSTRUCTION_LEN, >>IDT_VECTORING_ERROR_CODE); >> @@ -6498,7 +6496,7 @@ static void vmx_complete_interrupts(struct vcpu_vmx >> *vmx) >> >> static void vmx_cancel_injection(struct kvm_vcpu *vcpu) >> { >> -if (is_guest_mode(vcpu)) >> +if (to_vmx(vcpu)->nested.nested_run_pending) >> return; > Why is this needed here? Please check if my reply to Nadav explains this sufficiently. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: nVMX: Rework event injection and recovery
On Wed, Feb 20, 2013 at 02:01:47PM +0100, Jan Kiszka wrote: > This aligns VMX more with SVM regarding event injection and recovery for > nested guests. The changes allow to inject interrupts directly from L0 > to L2. > > One difference to SVM is that we always transfer the pending event > injection into the architectural state of the VCPU and then drop it from > there if it turns out that we left L2 to enter L1. > > VMX and SVM are now identical in how they recover event injections from > unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD > still contains a valid event and, if yes, transfer the content into L1's > idt_vectoring_info_field. > > To avoid that we incorrectly leak an event into the architectural VCPU > state that L1 wants to inject, we skip cancellation on nested run. > > Signed-off-by: Jan Kiszka > --- > > Survived moderate testing here and (currently) makes sense to me, but > please review very carefully. I wouldn't be surprised if I'm still > missing some subtle corner case. > > arch/x86/kvm/vmx.c | 57 +++ > 1 files changed, 26 insertions(+), 31 deletions(-) > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index dd3a8a0..7d2fbd2 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -6489,8 +6489,6 @@ static void __vmx_complete_interrupts(struct vcpu_vmx > *vmx, > > static void vmx_complete_interrupts(struct vcpu_vmx *vmx) > { > - if (is_guest_mode(&vmx->vcpu)) > - return; > __vmx_complete_interrupts(vmx, vmx->idt_vectoring_info, > VM_EXIT_INSTRUCTION_LEN, > IDT_VECTORING_ERROR_CODE); > @@ -6498,7 +6496,7 @@ static void vmx_complete_interrupts(struct vcpu_vmx > *vmx) > > static void vmx_cancel_injection(struct kvm_vcpu *vcpu) > { > - if (is_guest_mode(vcpu)) > + if (to_vmx(vcpu)->nested.nested_run_pending) > return; Why is this needed here? > __vmx_complete_interrupts(to_vmx(vcpu), > vmcs_read32(VM_ENTRY_INTR_INFO_FIELD), > @@ -6531,21 +6529,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu > *vcpu) > struct vcpu_vmx *vmx = to_vmx(vcpu); > unsigned long debugctlmsr; > > - if (is_guest_mode(vcpu) && !vmx->nested.nested_run_pending) { > - struct vmcs12 *vmcs12 = get_vmcs12(vcpu); > - if (vmcs12->idt_vectoring_info_field & > - VECTORING_INFO_VALID_MASK) { > - vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, > - vmcs12->idt_vectoring_info_field); > - vmcs_write32(VM_ENTRY_INSTRUCTION_LEN, > - vmcs12->vm_exit_instruction_len); > - if (vmcs12->idt_vectoring_info_field & > - VECTORING_INFO_DELIVER_CODE_MASK) > - vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, > - vmcs12->idt_vectoring_error_code); > - } > - } > - > /* Record the guest's net vcpu time for enforced NMI injections. */ > if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked)) > vmx->entry_time = ktime_get(); > @@ -6704,17 +6687,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu > *vcpu) > > vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD); > > - if (is_guest_mode(vcpu)) { > - struct vmcs12 *vmcs12 = get_vmcs12(vcpu); > - vmcs12->idt_vectoring_info_field = vmx->idt_vectoring_info; > - if (vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK) { > - vmcs12->idt_vectoring_error_code = > - vmcs_read32(IDT_VECTORING_ERROR_CODE); > - vmcs12->vm_exit_instruction_len = > - vmcs_read32(VM_EXIT_INSTRUCTION_LEN); > - } > - } > - > vmx->loaded_vmcs->launched = 1; > > vmx->exit_reason = vmcs_read32(VM_EXIT_REASON); > @@ -7403,9 +7375,32 @@ void prepare_vmcs12(struct kvm_vcpu *vcpu, struct > vmcs12 *vmcs12) > vmcs12->vm_exit_instruction_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN); > vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO); > > - /* clear vm-entry fields which are to be cleared on exit */ > - if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY)) > + /* drop what we picked up for L0 via vmx_complete_interrupts */ > + vcpu->arch.nmi_injected = false; > + kvm_clear_exception_queue(vcpu); > + kvm_clear_interrupt_queue(vcpu); > + > + if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) && > + vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK) { > + /* > + * Preserve the event that was supposed to be injected > + * by emulating it would have been re
Re: [PATCH] KVM: nVMX: Rework event injection and recovery
On 2013-02-20 16:57, Gleb Natapov wrote: > On Wed, Feb 20, 2013 at 04:51:39PM +0100, Jan Kiszka wrote: >> On 2013-02-20 16:30, Gleb Natapov wrote: >>> On Wed, Feb 20, 2013 at 03:53:53PM +0100, Jan Kiszka wrote: On 2013-02-20 14:01, Jan Kiszka wrote: > This aligns VMX more with SVM regarding event injection and recovery for > nested guests. The changes allow to inject interrupts directly from L0 > to L2. > > One difference to SVM is that we always transfer the pending event > injection into the architectural state of the VCPU and then drop it from > there if it turns out that we left L2 to enter L1. > > VMX and SVM are now identical in how they recover event injections from > unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD > still contains a valid event and, if yes, transfer the content into L1's > idt_vectoring_info_field. > > To avoid that we incorrectly leak an event into the architectural VCPU > state that L1 wants to inject, we skip cancellation on nested run. > > Signed-off-by: Jan Kiszka > --- > > Survived moderate testing here and (currently) makes sense to me, but > please review very carefully. I wouldn't be surprised if I'm still > missing some subtle corner case. Forgot to point this out again: It still takes "KVM: nVMX: Fix injection of PENDING_INTERRUPT and NMI_WINDOW exits to L1" to make L0->L2 injection work. So this patch logically depends on it. >>> But this patch has hunks from that patch. >> >> Not mechanically. >> > What do you mean? You can apply them in arbitrary order, just minor offset shifts will be the result. > >> If you prefer me merging them together, let me know. >> > For review not necessary, for applying preferably. OK, will wait for review on this, then send out a combo patch. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: nVMX: Rework event injection and recovery
On Wed, Feb 20, 2013 at 04:51:39PM +0100, Jan Kiszka wrote: > On 2013-02-20 16:30, Gleb Natapov wrote: > > On Wed, Feb 20, 2013 at 03:53:53PM +0100, Jan Kiszka wrote: > >> On 2013-02-20 14:01, Jan Kiszka wrote: > >>> This aligns VMX more with SVM regarding event injection and recovery for > >>> nested guests. The changes allow to inject interrupts directly from L0 > >>> to L2. > >>> > >>> One difference to SVM is that we always transfer the pending event > >>> injection into the architectural state of the VCPU and then drop it from > >>> there if it turns out that we left L2 to enter L1. > >>> > >>> VMX and SVM are now identical in how they recover event injections from > >>> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD > >>> still contains a valid event and, if yes, transfer the content into L1's > >>> idt_vectoring_info_field. > >>> > >>> To avoid that we incorrectly leak an event into the architectural VCPU > >>> state that L1 wants to inject, we skip cancellation on nested run. > >>> > >>> Signed-off-by: Jan Kiszka > >>> --- > >>> > >>> Survived moderate testing here and (currently) makes sense to me, but > >>> please review very carefully. I wouldn't be surprised if I'm still > >>> missing some subtle corner case. > >> > >> Forgot to point this out again: It still takes "KVM: nVMX: Fix injection > >> of PENDING_INTERRUPT and NMI_WINDOW exits to L1" to make L0->L2 > >> injection work. So this patch logically depends on it. > >> > > But this patch has hunks from that patch. > > Not mechanically. > What do you mean? > If you prefer me merging them together, let me know. > For review not necessary, for applying preferably. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: nVMX: Rework event injection and recovery
On 2013-02-20 16:30, Gleb Natapov wrote: > On Wed, Feb 20, 2013 at 03:53:53PM +0100, Jan Kiszka wrote: >> On 2013-02-20 14:01, Jan Kiszka wrote: >>> This aligns VMX more with SVM regarding event injection and recovery for >>> nested guests. The changes allow to inject interrupts directly from L0 >>> to L2. >>> >>> One difference to SVM is that we always transfer the pending event >>> injection into the architectural state of the VCPU and then drop it from >>> there if it turns out that we left L2 to enter L1. >>> >>> VMX and SVM are now identical in how they recover event injections from >>> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD >>> still contains a valid event and, if yes, transfer the content into L1's >>> idt_vectoring_info_field. >>> >>> To avoid that we incorrectly leak an event into the architectural VCPU >>> state that L1 wants to inject, we skip cancellation on nested run. >>> >>> Signed-off-by: Jan Kiszka >>> --- >>> >>> Survived moderate testing here and (currently) makes sense to me, but >>> please review very carefully. I wouldn't be surprised if I'm still >>> missing some subtle corner case. >> >> Forgot to point this out again: It still takes "KVM: nVMX: Fix injection >> of PENDING_INTERRUPT and NMI_WINDOW exits to L1" to make L0->L2 >> injection work. So this patch logically depends on it. >> > But this patch has hunks from that patch. Not mechanically. If you prefer me merging them together, let me know. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: nVMX: Rework event injection and recovery
On Wed, Feb 20, 2013 at 03:53:53PM +0100, Jan Kiszka wrote: > On 2013-02-20 14:01, Jan Kiszka wrote: > > This aligns VMX more with SVM regarding event injection and recovery for > > nested guests. The changes allow to inject interrupts directly from L0 > > to L2. > > > > One difference to SVM is that we always transfer the pending event > > injection into the architectural state of the VCPU and then drop it from > > there if it turns out that we left L2 to enter L1. > > > > VMX and SVM are now identical in how they recover event injections from > > unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD > > still contains a valid event and, if yes, transfer the content into L1's > > idt_vectoring_info_field. > > > > To avoid that we incorrectly leak an event into the architectural VCPU > > state that L1 wants to inject, we skip cancellation on nested run. > > > > Signed-off-by: Jan Kiszka > > --- > > > > Survived moderate testing here and (currently) makes sense to me, but > > please review very carefully. I wouldn't be surprised if I'm still > > missing some subtle corner case. > > Forgot to point this out again: It still takes "KVM: nVMX: Fix injection > of PENDING_INTERRUPT and NMI_WINDOW exits to L1" to make L0->L2 > injection work. So this patch logically depends on it. > But this patch has hunks from that patch. > Jan > > > > > arch/x86/kvm/vmx.c | 57 > > +++ > > 1 files changed, 26 insertions(+), 31 deletions(-) > > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > > index dd3a8a0..7d2fbd2 100644 > > --- a/arch/x86/kvm/vmx.c > > +++ b/arch/x86/kvm/vmx.c > > @@ -6489,8 +6489,6 @@ static void __vmx_complete_interrupts(struct vcpu_vmx > > *vmx, > > > > static void vmx_complete_interrupts(struct vcpu_vmx *vmx) > > { > > - if (is_guest_mode(&vmx->vcpu)) > > - return; > > __vmx_complete_interrupts(vmx, vmx->idt_vectoring_info, > > VM_EXIT_INSTRUCTION_LEN, > > IDT_VECTORING_ERROR_CODE); > > @@ -6498,7 +6496,7 @@ static void vmx_complete_interrupts(struct vcpu_vmx > > *vmx) > > > > static void vmx_cancel_injection(struct kvm_vcpu *vcpu) > > { > > - if (is_guest_mode(vcpu)) > > + if (to_vmx(vcpu)->nested.nested_run_pending) > > return; > > __vmx_complete_interrupts(to_vmx(vcpu), > > vmcs_read32(VM_ENTRY_INTR_INFO_FIELD), > > @@ -6531,21 +6529,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu > > *vcpu) > > struct vcpu_vmx *vmx = to_vmx(vcpu); > > unsigned long debugctlmsr; > > > > - if (is_guest_mode(vcpu) && !vmx->nested.nested_run_pending) { > > - struct vmcs12 *vmcs12 = get_vmcs12(vcpu); > > - if (vmcs12->idt_vectoring_info_field & > > - VECTORING_INFO_VALID_MASK) { > > - vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, > > - vmcs12->idt_vectoring_info_field); > > - vmcs_write32(VM_ENTRY_INSTRUCTION_LEN, > > - vmcs12->vm_exit_instruction_len); > > - if (vmcs12->idt_vectoring_info_field & > > - VECTORING_INFO_DELIVER_CODE_MASK) > > - vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, > > - vmcs12->idt_vectoring_error_code); > > - } > > - } > > - > > /* Record the guest's net vcpu time for enforced NMI injections. */ > > if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked)) > > vmx->entry_time = ktime_get(); > > @@ -6704,17 +6687,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu > > *vcpu) > > > > vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD); > > > > - if (is_guest_mode(vcpu)) { > > - struct vmcs12 *vmcs12 = get_vmcs12(vcpu); > > - vmcs12->idt_vectoring_info_field = vmx->idt_vectoring_info; > > - if (vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK) { > > - vmcs12->idt_vectoring_error_code = > > - vmcs_read32(IDT_VECTORING_ERROR_CODE); > > - vmcs12->vm_exit_instruction_len = > > - vmcs_read32(VM_EXIT_INSTRUCTION_LEN); > > - } > > - } > > - > > vmx->loaded_vmcs->launched = 1; > > > > vmx->exit_reason = vmcs_read32(VM_EXIT_REASON); > > @@ -7403,9 +7375,32 @@ void prepare_vmcs12(struct kvm_vcpu *vcpu, struct > > vmcs12 *vmcs12) > > vmcs12->vm_exit_instruction_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN); > > vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO); > > > > - /* clear vm-entry fields which are to be cleared on exit */ > > - if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY)) > > + /* drop what we picked up for L0 via vmx_complete_interrupts */ > > + vcpu->arch.nmi_injected =
[Bug 54161] nVMX: nested vpid
https://bugzilla.kernel.org/show_bug.cgi?id=54161 Nadav Har'El changed: What|Removed |Added Blocks||53601 -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 53601] nVMX meta-bug
https://bugzilla.kernel.org/show_bug.cgi?id=53601 Nadav Har'El changed: What|Removed |Added Depends on||54161 -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 54161] New: nVMX: nested vpid
https://bugzilla.kernel.org/show_bug.cgi?id=54161 Summary: nVMX: nested vpid Product: Virtualization Version: unspecified Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: enhancement Priority: P1 Component: kvm AssignedTo: virtualization_...@kernel-bugs.osdl.org ReportedBy: n...@math.technion.ac.il Regression: No Currently, support for VPID in nested VMX is minimal: L1 doesn't see the vpid feature, and doesn't use it, but L0 may use it (if available) when running the guests: We use the same vpid to run L1 and all its L2 guests, but vmx_flush_tlb(vcpu) when going back and forth between L1 and L2 - this flushes this particular vpid, but at least leave other L1's cache entries behind. A better performing solution would be to support "nested vpid" - L0 exposes to L1 the VPID feature, and maintains a mapping between VPID numbers that L1 uses to VPID numbers that it uses. L0 runs each of its guests - L1s and L2s, as separate guests. Some points to remember while doing this (this is only a partial list!): 1. Remember to set vmx->vpid on nested_vmx_run() and nested_vmx_vmexit() to L1's and L2's vpid. This is important because vmx_flush_tlb() (called on context switch between a guest's processes) flushes only the current vpid (if possible) and we need to do it to the correct one. 2. In free_nested(), also free the vpids that have been allocated to all the L2s. This can probably be done in nested_free_saved_vmcs() (see mention above). However, note that free_nested() is preceded (in vmx_free_vcpu()) by a call to free_vpid() which frees the current vmx->vpid, which might be l1 or l2 and we need to make sure we free all of them, and just once. 3. Implement, of course, also INVVPID for L1. -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 1/2] KVM: VMX: enable acknowledge interupt on vmexit
On Wed, Feb 20, 2013 at 3:10 PM, Zhang, Yang Z wrote: push %%cs >>> "push %%cs" is invalid in x86_64. >> >> Oops. 'push[lq] $__KERNEL_CS' then. > Is this right? Just copy it from other file. > > #define __STR(X) #X > #define STR(X) __STR(X) > > #ifdef CONFIG_X86_64 > "pushq $"STR(__KERNEL_CS)" \n\t" > #else > "pushl $"STR(__KERNEL_CS)" \n\t" > #endif > > #undef STR > #undef __STR > Use __ASM_SIZE and an immediate operand for __KERNEL_CS: asm ( ... : : [cs]"i"(__KERNEL_CS) ); and the code will be cleaner. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: nVMX: Rework event injection and recovery
On 2013-02-20 14:01, Jan Kiszka wrote: > This aligns VMX more with SVM regarding event injection and recovery for > nested guests. The changes allow to inject interrupts directly from L0 > to L2. > > One difference to SVM is that we always transfer the pending event > injection into the architectural state of the VCPU and then drop it from > there if it turns out that we left L2 to enter L1. > > VMX and SVM are now identical in how they recover event injections from > unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD > still contains a valid event and, if yes, transfer the content into L1's > idt_vectoring_info_field. > > To avoid that we incorrectly leak an event into the architectural VCPU > state that L1 wants to inject, we skip cancellation on nested run. > > Signed-off-by: Jan Kiszka > --- > > Survived moderate testing here and (currently) makes sense to me, but > please review very carefully. I wouldn't be surprised if I'm still > missing some subtle corner case. Forgot to point this out again: It still takes "KVM: nVMX: Fix injection of PENDING_INTERRUPT and NMI_WINDOW exits to L1" to make L0->L2 injection work. So this patch logically depends on it. Jan > > arch/x86/kvm/vmx.c | 57 +++ > 1 files changed, 26 insertions(+), 31 deletions(-) > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index dd3a8a0..7d2fbd2 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -6489,8 +6489,6 @@ static void __vmx_complete_interrupts(struct vcpu_vmx > *vmx, > > static void vmx_complete_interrupts(struct vcpu_vmx *vmx) > { > - if (is_guest_mode(&vmx->vcpu)) > - return; > __vmx_complete_interrupts(vmx, vmx->idt_vectoring_info, > VM_EXIT_INSTRUCTION_LEN, > IDT_VECTORING_ERROR_CODE); > @@ -6498,7 +6496,7 @@ static void vmx_complete_interrupts(struct vcpu_vmx > *vmx) > > static void vmx_cancel_injection(struct kvm_vcpu *vcpu) > { > - if (is_guest_mode(vcpu)) > + if (to_vmx(vcpu)->nested.nested_run_pending) > return; > __vmx_complete_interrupts(to_vmx(vcpu), > vmcs_read32(VM_ENTRY_INTR_INFO_FIELD), > @@ -6531,21 +6529,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu > *vcpu) > struct vcpu_vmx *vmx = to_vmx(vcpu); > unsigned long debugctlmsr; > > - if (is_guest_mode(vcpu) && !vmx->nested.nested_run_pending) { > - struct vmcs12 *vmcs12 = get_vmcs12(vcpu); > - if (vmcs12->idt_vectoring_info_field & > - VECTORING_INFO_VALID_MASK) { > - vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, > - vmcs12->idt_vectoring_info_field); > - vmcs_write32(VM_ENTRY_INSTRUCTION_LEN, > - vmcs12->vm_exit_instruction_len); > - if (vmcs12->idt_vectoring_info_field & > - VECTORING_INFO_DELIVER_CODE_MASK) > - vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, > - vmcs12->idt_vectoring_error_code); > - } > - } > - > /* Record the guest's net vcpu time for enforced NMI injections. */ > if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked)) > vmx->entry_time = ktime_get(); > @@ -6704,17 +6687,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu > *vcpu) > > vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD); > > - if (is_guest_mode(vcpu)) { > - struct vmcs12 *vmcs12 = get_vmcs12(vcpu); > - vmcs12->idt_vectoring_info_field = vmx->idt_vectoring_info; > - if (vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK) { > - vmcs12->idt_vectoring_error_code = > - vmcs_read32(IDT_VECTORING_ERROR_CODE); > - vmcs12->vm_exit_instruction_len = > - vmcs_read32(VM_EXIT_INSTRUCTION_LEN); > - } > - } > - > vmx->loaded_vmcs->launched = 1; > > vmx->exit_reason = vmcs_read32(VM_EXIT_REASON); > @@ -7403,9 +7375,32 @@ void prepare_vmcs12(struct kvm_vcpu *vcpu, struct > vmcs12 *vmcs12) > vmcs12->vm_exit_instruction_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN); > vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO); > > - /* clear vm-entry fields which are to be cleared on exit */ > - if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY)) > + /* drop what we picked up for L0 via vmx_complete_interrupts */ > + vcpu->arch.nmi_injected = false; > + kvm_clear_exception_queue(vcpu); > + kvm_clear_interrupt_queue(vcpu); > + > + if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) && > + vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID
Re: [PATCH] KVM: nVMX: Rework event injection and recovery
On 2013-02-20 15:14, Nadav Har'El wrote: > Hi, > > By the way, if you haven't seen my description of why the current code > did what it did, take a look at > http://www.mail-archive.com/kvm@vger.kernel.org/msg54478.html > Another description might also come in handy: > http://www.mail-archive.com/kvm@vger.kernel.org/msg54476.html > > On Wed, Feb 20, 2013, Jan Kiszka wrote about "[PATCH] KVM: nVMX: Rework event > injection and recovery": >> This aligns VMX more with SVM regarding event injection and recovery for >> nested guests. The changes allow to inject interrupts directly from L0 >> to L2. >> >> One difference to SVM is that we always transfer the pending event >> injection into the architectural state of the VCPU and then drop it from >> there if it turns out that we left L2 to enter L1. > > Last time I checked, if I'm remembering correctly, the nested SVM code did > something a bit different: After the exit from L2 to L1 and unnecessarily > queuing the pending interrupt for injection, it skipped one entry into L1, > and as usual after the entry the interrupt queue is cleared so next time > around, when L1 one is really entered, the wrong injection is not attempted. > >> VMX and SVM are now identical in how they recover event injections from >> unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD >> still contains a valid event and, if yes, transfer the content into L1's >> idt_vectoring_info_field. > >> To avoid that we incorrectly leak an event into the architectural VCPU >> state that L1 wants to inject, we skip cancellation on nested run. > > I didn't understand this last point. - prepare_vmcs02 sets event to be injected into L2 - while trying to enter L2, a cancel condition is met - we call vmx_cancel_interrupts but should now avoid filling L1's event into the arch event queues - it's kept in vmcs12 > >> @@ -7403,9 +7375,32 @@ void prepare_vmcs12(struct kvm_vcpu *vcpu, struct >> vmcs12 *vmcs12) >> vmcs12->vm_exit_instruction_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN); >> vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO); >> >> -/* clear vm-entry fields which are to be cleared on exit */ >> -if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY)) >> +/* drop what we picked up for L0 via vmx_complete_interrupts */ >> +vcpu->arch.nmi_injected = false; >> +kvm_clear_exception_queue(vcpu); >> +kvm_clear_interrupt_queue(vcpu); > > It would be nice to move these lines out of prepare_vmcs12(), since they > don't really do anything with vmcs12, and move it into > nested_vmx_vmexit() (which is the one which called prepare_vmcs12()). OK. > > Did you test this both with PIN_BASED_EXT_INTR_MASK (the usual case) and > !PIN_BASED_EXT_INTR_MASK (the case which interests you)? We need to make > sure that in the former case, this doesn't clear the interrupt queue after > we put an interrupt to be injected in it (at first glance it seems fine, > but these code paths are so convoluted, it's hard to be sure). I tested both, but none of my tests was close to cover all potential corner cases. But that unconditional queue clearing surely deserves attention and critical review. > >> +if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) && >> +vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK) { >> +/* >> + * Preserve the event that was supposed to be injected >> + * by emulating it would have been returned in >> + * IDT_VECTORING_INFO_FIELD. >> + */ >> +if (vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) & >> +INTR_INFO_VALID_MASK) { >> +vmcs12->idt_vectoring_info_field = >> +vmcs12->vm_entry_intr_info_field; >> +vmcs12->idt_vectoring_error_code = >> +vmcs12->vm_entry_exception_error_code; >> +vmcs12->vm_exit_instruction_len = >> +vmcs12->vm_entry_instruction_len; >> +vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0); > > I'm afraid I'm missing what you are trying to do here. Why would > vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) & INTR_INFO_VALID_MASK ever be > true? After all, the processor clears it after each sucessful exit so > last if() will only succeed on failed entries - but this is NOT the > case if we're in the enclosing if (note that vmcs12->vm_exit_reason = > vmcs_read32(VM_EXIT_REASON)). Maybe I'm missing something? Canceled vmentry as indicated above. Look at vcpu_enter_guest: kvm_mmu_reload may fail, or we need to handle some async event / perform some reschedule. But those points are past prepare_vmcs02. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at htt
[Bug 54141] nVMX: Support TPR shadow
https://bugzilla.kernel.org/show_bug.cgi?id=54141 Nadav Har'El changed: What|Removed |Added Blocks||53601 -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 53601] nVMX meta-bug
https://bugzilla.kernel.org/show_bug.cgi?id=53601 Nadav Har'El changed: What|Removed |Added Depends on||54141 -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 54141] New: nVMX: Support TPR shadow
https://bugzilla.kernel.org/show_bug.cgi?id=54141 Summary: nVMX: Support TPR shadow Product: Virtualization Version: unspecified Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: enhancement Priority: P1 Component: kvm AssignedTo: virtualization_...@kernel-bugs.osdl.org ReportedBy: n...@math.technion.ac.il Regression: No Add support for TPR shadow, which may be important for performance of Windows L2 guests (which currently don't work anyway - see bug 53641). Note, however, that TPR shadow is an optional feature, and KVM (as L1) won't use it if not available to it. Some of the things we'll need to do (I think): 1. Advertise CPU_BASED_TPR_SHADOW in MSR_IA32_VMX_PROCBASED_CTLS 2. In nested_vmx_exit_handled_cr, in "mov to cr8", consider if we need an "else if CPU_BASED_TPR_SHADOW" which sets the shadow and only exits if below the tpr_threshold. This may be unnecessary, because the processor will already do this if we put vmcs12 desires in vmcs02, and don't merge it with vmcs01. Also, do we need to change anything in "mov from cr8" in the same function? I don't think it will ever get called. 3. In prepare_vmcs02, set the TPR-shadow definitions from vmcs12, ignoring L0's wishes (if I understand correctly, this is the right thing to do): 4. In prepare_vmcs02, if nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW), we need to set VIRTUAL_APIC_PAGE_ADDR. The code will look something like struct page *page = nested_get_page(vcpu, vmcs12->virtual_apic_page_addr); if (!page) return 1; vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, page_to_phys(page)); but we need to save this "page" in vmx->nested and nested_release_page() it on nested_vmx_vmexit() or free_nested(). 4. In prepare_vmcs02, set TPR_THRESHOLD as requested by L1. We used to have this code: if (vm_need_tpr_shadow(vcpu->kvm) && nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW)) vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold); But I need to consider if "vm_need_tpr_shadow" is the right thing to check. 5. In prepare_vmcs02, in setting up CPU_BASED_VM_EXEC_CONTROL, probably *leave* the code which removes the CPU_BASED_TPR_SHADOW from L0's exec_control (we'll get this bit from vmcs12's exec_control). 6. We used to have the following code in prepare_vmcs02, after exec_control |= vmcs12->cpu_based_vm_exec_control, to remove the TPR_SHADOW feature even if L1 asked for it in certain cases. I don't see why this was needed: if (!vm_need_tpr_shadow(vcpu->kvm) || vmcs12->virtual_apic_page_addr == 0) { exec_control &= ~CPU_BASED_TPR_SHADOW; #ifdef CONFIG_X86_64 exec_control |= CPU_BASED_CR8_STORE_EXITING | CPU_BASED_CR8_LOAD_EXITING; #endif } else if (exec_control & CPU_BASED_TPR_SHADOW) { #ifdef CONFIG_X86_64 exec_control &= ~CPU_BASED_CR8_STORE_EXITING; exec_control &= ~CPU_BASED_CR8_LOAD_EXITING; #endif } -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [uq/master PATCH] target-i386: kvm: save/restore steal time MSR
On Tue, Feb 19, 2013 at 11:27:20PM -0300, Marcelo Tosatti wrote: > > Read and write steal time MSR, so that reporting is functional across > migration. > > Signed-off-by: Marcelo Tosatti > Reviewed-by: Gleb Natapov > diff --git a/target-i386/cpu.h b/target-i386/cpu.h > index 7577e4f..17c7293 100644 > --- a/target-i386/cpu.h > +++ b/target-i386/cpu.h > @@ -792,6 +792,7 @@ typedef struct CPUX86State { > #endif > uint64_t system_time_msr; > uint64_t wall_clock_msr; > +uint64_t steal_time_msr; > uint64_t async_pf_en_msr; > uint64_t pv_eoi_en_msr; > > diff --git a/target-i386/kvm.c b/target-i386/kvm.c > index 0cf413d..9ae9d74 100644 > --- a/target-i386/kvm.c > +++ b/target-i386/kvm.c > @@ -68,6 +68,7 @@ static bool has_msr_tsc_deadline; > static bool has_msr_async_pf_en; > static bool has_msr_pv_eoi_en; > static bool has_msr_misc_enable; > +static bool has_msr_kvm_steal_time; > static int lm_capable_kernel; > > bool kvm_allows_irq0_override(void) > @@ -507,6 +508,8 @@ int kvm_arch_init_vcpu(CPUState *cs) > > has_msr_pv_eoi_en = c->eax & (1 << KVM_FEATURE_PV_EOI); > > +has_msr_kvm_steal_time = c->eax & (1 << KVM_FEATURE_STEAL_TIME); > + > cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused); > > for (i = 0; i <= limit; i++) { > @@ -1107,6 +1110,10 @@ static int kvm_put_msrs(X86CPU *cpu, int level) > kvm_msr_entry_set(&msrs[n++], MSR_KVM_PV_EOI_EN, >env->pv_eoi_en_msr); > } > +if (has_msr_kvm_steal_time) { > +kvm_msr_entry_set(&msrs[n++], MSR_KVM_STEAL_TIME, > + env->steal_time_msr); > +} > if (hyperv_hypercall_available()) { > kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_GUEST_OS_ID, 0); > kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_HYPERCALL, 0); > @@ -1360,6 +1367,9 @@ static int kvm_get_msrs(X86CPU *cpu) > if (has_msr_pv_eoi_en) { > msrs[n++].index = MSR_KVM_PV_EOI_EN; > } > +if (has_msr_kvm_steal_time) { > +msrs[n++].index = MSR_KVM_STEAL_TIME; > +} > > if (env->mcg_cap) { > msrs[n++].index = MSR_MCG_STATUS; > @@ -1445,6 +1455,9 @@ static int kvm_get_msrs(X86CPU *cpu) > case MSR_KVM_PV_EOI_EN: > env->pv_eoi_en_msr = msrs[i].data; > break; > +case MSR_KVM_STEAL_TIME: > +env->steal_time_msr = msrs[i].data; > +break; > } > } > > diff --git a/target-i386/machine.c b/target-i386/machine.c > index 8df6a6b..1feb9ca 100644 > --- a/target-i386/machine.c > +++ b/target-i386/machine.c > @@ -287,6 +287,24 @@ static bool pv_eoi_msr_needed(void *opaque) > return cpu->pv_eoi_en_msr != 0; > } > > +static bool steal_time_msr_needed(void *opaque) > +{ > +CPUX86State *cpu = opaque; > + > +return cpu->steal_time_msr != 0; > +} > + > +static const VMStateDescription vmstate_steal_time_msr = { > +.name = "cpu/steal_time_msr", > +.version_id = 1, > +.minimum_version_id = 1, > +.minimum_version_id_old = 1, > +.fields = (VMStateField []) { > +VMSTATE_UINT64(steal_time_msr, CPUX86State), > +VMSTATE_END_OF_LIST() > +} > +}; > + > static const VMStateDescription vmstate_async_pf_msr = { > .name = "cpu/async_pf_msr", > .version_id = 1, > @@ -494,6 +512,9 @@ static const VMStateDescription vmstate_cpu = { > .vmsd = &vmstate_pv_eoi_msr, > .needed = pv_eoi_msr_needed, > } , { > +.vmsd = &vmstate_steal_time_msr, > +.needed = steal_time_msr_needed, > +} , { > .vmsd = &vmstate_fpop_ip_dp, > .needed = fpop_ip_dp_needed, > }, { -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: nVMX: Rework event injection and recovery
Hi, By the way, if you haven't seen my description of why the current code did what it did, take a look at http://www.mail-archive.com/kvm@vger.kernel.org/msg54478.html Another description might also come in handy: http://www.mail-archive.com/kvm@vger.kernel.org/msg54476.html On Wed, Feb 20, 2013, Jan Kiszka wrote about "[PATCH] KVM: nVMX: Rework event injection and recovery": > This aligns VMX more with SVM regarding event injection and recovery for > nested guests. The changes allow to inject interrupts directly from L0 > to L2. > > One difference to SVM is that we always transfer the pending event > injection into the architectural state of the VCPU and then drop it from > there if it turns out that we left L2 to enter L1. Last time I checked, if I'm remembering correctly, the nested SVM code did something a bit different: After the exit from L2 to L1 and unnecessarily queuing the pending interrupt for injection, it skipped one entry into L1, and as usual after the entry the interrupt queue is cleared so next time around, when L1 one is really entered, the wrong injection is not attempted. > VMX and SVM are now identical in how they recover event injections from > unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD > still contains a valid event and, if yes, transfer the content into L1's > idt_vectoring_info_field. > To avoid that we incorrectly leak an event into the architectural VCPU > state that L1 wants to inject, we skip cancellation on nested run. I didn't understand this last point. > @@ -7403,9 +7375,32 @@ void prepare_vmcs12(struct kvm_vcpu *vcpu, struct > vmcs12 *vmcs12) > vmcs12->vm_exit_instruction_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN); > vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO); > > - /* clear vm-entry fields which are to be cleared on exit */ > - if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY)) > + /* drop what we picked up for L0 via vmx_complete_interrupts */ > + vcpu->arch.nmi_injected = false; > + kvm_clear_exception_queue(vcpu); > + kvm_clear_interrupt_queue(vcpu); It would be nice to move these lines out of prepare_vmcs12(), since they don't really do anything with vmcs12, and move it into nested_vmx_vmexit() (which is the one which called prepare_vmcs12()). Did you test this both with PIN_BASED_EXT_INTR_MASK (the usual case) and !PIN_BASED_EXT_INTR_MASK (the case which interests you)? We need to make sure that in the former case, this doesn't clear the interrupt queue after we put an interrupt to be injected in it (at first glance it seems fine, but these code paths are so convoluted, it's hard to be sure). > + if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) && > + vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK) { > + /* > + * Preserve the event that was supposed to be injected > + * by emulating it would have been returned in > + * IDT_VECTORING_INFO_FIELD. > + */ > + if (vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) & > + INTR_INFO_VALID_MASK) { > + vmcs12->idt_vectoring_info_field = > + vmcs12->vm_entry_intr_info_field; > + vmcs12->idt_vectoring_error_code = > + vmcs12->vm_entry_exception_error_code; > + vmcs12->vm_exit_instruction_len = > + vmcs12->vm_entry_instruction_len; > + vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0); I'm afraid I'm missing what you are trying to do here. Why would vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) & INTR_INFO_VALID_MASK ever be true? After all, the processor clears it after each sucessful exit so last if() will only succeed on failed entries - but this is NOT the case if we're in the enclosing if (note that vmcs12->vm_exit_reason = vmcs_read32(VM_EXIT_REASON)). Maybe I'm missing something? Nadav. -- Nadav Har'El| Wednesday, Feb 20 2013, 10 Adar 5773 n...@math.technion.ac.il |- Phone +972-523-790466, ICQ 13349191 |Change is inevitable, except from a http://nadav.harel.org.il |vending machine. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/6] kvm: add device control API
On Tue, Feb 19, 2013 at 03:16:37PM -0600, Scott Wood wrote: > On 02/19/2013 06:24:18 AM, Gleb Natapov wrote: > >On Mon, Feb 18, 2013 at 05:01:40PM -0600, Scott Wood wrote: > >> The ability to set/get attributes is needed. Sorry, but "get or set > >> one blob of data, up to 512 bytes, for the entire irqchip" is just > >> not good enough -- assuming you don't want us to start sticking > >> pointers and commands in *that* data. :-) > >> > >Proposed interface sticks pointers into ioctl data, so why doing > >the same > >for KVM_SET_IRQCHIP/KVM_GET_IRQCHIP makes you smile. > > There's a difference between putting a pointer in an ioctl control > structure that is specifically documented as being that way (as in > ONE_REG), versus taking an ioctl that claims to be setting/getting a > blob of state and embedding pointers in it. It would be like > sticking a pointer in the attribute payload of this API, which I > think is something to be discouraged. If documentation is what differentiate for you between silly and smart then write documentation instead of new interfaces. KVM_SET_IRQCHIP/KVM_GET_IRQCHIP is defined to operate on blob of data on x86, nothing prevent you from adding MPIC specifics to the interface, Add mpic state into kvm_irqchip structure and if 512 bytes is not enough for you to transfer the state put pointers there and _document_ them. But with 512 bytes you can transfer properties inline, so you probably do not need pointer there anyway. I see you have three properties 2 of them 32bit and one 64bit. >It'd also be using > KVM_SET_IRQCHIP to read data, which is the sort of thing you object > to later on regarding KVM_IRQ_LINE_STATUS. > Do not see why. > Then there's the silliness of transporting 512 bytes just to read a > descriptor for transporting something else. > Yes, agree. But is this enough of a reason to introduce entirely new interface? Is it on performance critical path? Doubt it, unless you abuse the interface to send interrupts, but then isn't it silty to do copy_from_user() twice to inject an interrupt like proposed interface does? > >For signaling irqs (I think this is what you mean by "commands") > >we have KVM_IRQ_LINE. > > It's one type of command. Another is setting the address. Another > is writing to registers that have side effects (this is how MSI > injection is done on MPIC, just as in real hardware). > Setting the address is setting an attribute. Sending MSI is a command. Things you set/get during init/migration are attributes. Things you do to cause side-effects are commands. > What is the benefit of KVM_IRQ_LINE over what MPIC does? What real > (non-glue/wrapper) code can become common? > No new ioctl with exactly same result (well actually even faster since less copying is done). You need to show us the benefits of the new interface vs existing one, not vice versa. > And I really hope you don't want us to do MSIs the x86 way. > What is wrong with KVM_SIGNAL_MSI? Except that you'll need code to glue it to mpic. > In the XICS thread, Paul brought up the possibliity of cascaded > MPICs. It's not relevant to the systems we're trying to model, but > if one did want to use the in-kernel irqchip interface for that, it > would be really nice to be able to operate on a specific MPIC for > injection rather than have to come up with some sort of global > identifier (above and beyond the minor flattening we'd need to do to > represent a single MPIC's interrupts in a flat numberspace). > ARM encodes information in irq field of KVM_IRQ_LINE like that: bits: | 31 ... 24 | 23 ... 16 | 15... 0 | field: | irq_type | vcpu_index | irq_number| Why will similar approach not work? > >> If you mean the way to inject interrupts, it's simpler this way. > >> Why go out of our way to inject common glue code into a > >> communication path between hw/kvm/mpic.c in QEMU and > >> arch/powerpc/kvm/mpic.c in KVM? Or rather, why make that common > >> glue be specific to this one function when we could reuse the same > >> communication glue used for other things, such as device state? > >You will need glue anyway and I do no see how amount of it is much > >different one way or the other. > > It uses glue that we need to be present for other things anyway. If > it weren't for XICS we wouldn't need a KVM_IRQ_LINE implementation > at all on PPC. It may not be a major difference, but it doesn't > affect anything but MPIC and it seems more straightforward this way. > We are talking about something like 4 lines of userspace code including bracket. I do not think this is strong point in favor of different interface. > >Gluing qemu_set_irq() to > >ioctl(KVM_IRQ_LINE) or ioctl(KVM_SET_DEVICE_ATTR) is not much > >different. > > qemu_set_irq() is not glued to either of those. It's glued to > kvm_openpic_set_irq(), kvm_ioapic_set_irq(), etc. It's already not > generic code. > OK, this does not invalidates my argument though. > >O
RE: [PATCH v3 1/2] KVM: VMX: enable acknowledge interupt on vmexit
Avi Kivity wrote on 2013-02-20: > On Wed, Feb 20, 2013 at 4:46 AM, Zhang, Yang Z > wrote: +static void vmx_handle_external_intr(struct kvm_vcpu *vcpu) +{ + u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO); + + /* + * If external interrupt exists, IF bit is set in rflags/eflags on the +* interrupt stack frame, and interrupt will be enabled on a return +* from interrupt handler. +*/ + if ((exit_intr_info & (INTR_INFO_VALID_MASK | > INTR_INFO_INTR_TYPE_MASK)) + == (INTR_INFO_VALID_MASK | > INTR_TYPE_EXT_INTR)) { + unsigned int vector; + unsigned long entry; + gate_desc *desc; + struct vcpu_vmx *vmx = to_vmx(vcpu); + + vector = exit_intr_info & INTR_INFO_VECTOR_MASK; +#ifdef CONFIG_X86_64 + desc = (void *)vmx->host_idt_base + vector * 16; +#else + desc = (void *)vmx->host_idt_base + vector * 8; +#endif + + entry = gate_offset(*desc); + asm( + "mov %0, %%" _ASM_DX " \n\t" +#ifdef CONFIG_X86_64 + "mov %%" _ASM_SP ", %%" _ASM_BX " \n\t" + "and $0xfff0, %%" _ASM_SP " \n\t" + "mov %%ss, %%" _ASM_AX " \n\t" + "push %%" _ASM_AX " \n\t" + "push %%" _ASM_BX " \n\t" +#endif >>> >>> Are we sure no interrupts are using the IST feature? I guess it's unlikely. >> Linux uses IST for NMI, stack fault, machine-check, double fault and >> debug interrupt . No external interrupt will use it. This feature is >> only for external interrupt. So we don't need to check it here. > > Ok, thanks for checking. > >> >>> + "pushf \n\t" + "pop %%" _ASM_AX " \n\t" + "or $0x200, %%" _ASM_AX " \n\t" + "push %%" _ASM_AX " \n\t" >>> >>> Can simplify to pushf; orl $0x200, %%rsp. >> Sure. >> + "mov %%cs, %%" _ASM_AX " \n\t" + "push %%" _ASM_AX " \n\t" >>> >>> push %%cs >> "push %%cs" is invalid in x86_64. > > Oops. 'push[lq] $__KERNEL_CS' then. Is this right? Just copy it from other file. #define __STR(X) #X #define STR(X) __STR(X) #ifdef CONFIG_X86_64 "pushq $"STR(__KERNEL_CS)" \n\t" #else "pushl $"STR(__KERNEL_CS)" \n\t" #endif #undef STR #undef __STR >> + "push intr_return \n\t" >>> >>> push $1f. Or even combine with the next instruction, and call %rdx. >> Which is faster? jmp or call? > > Actually it doesn't matter, the processor is clever enough to minimize > the difference. But the code is simpler and shorter with 'call'. -- To Yes, 'call' is better. Best regards, Yang -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: VMX: Pass vcpu to __vmx_complete_interrupts
Cleanup: __vmx_complete_interrupts has no use for the vmx structure. Signed-off-by: Jan Kiszka --- Note: this applies on top of "Rework event injection and recovery" arch/x86/kvm/vmx.c | 31 ++- 1 files changed, 14 insertions(+), 17 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index d99a519..d6ea4a7 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -6430,7 +6430,7 @@ static void vmx_recover_nmi_blocking(struct vcpu_vmx *vmx) ktime_to_ns(ktime_sub(ktime_get(), vmx->entry_time)); } -static void __vmx_complete_interrupts(struct vcpu_vmx *vmx, +static void __vmx_complete_interrupts(struct kvm_vcpu *vcpu, u32 idt_vectoring_info, int instr_len_field, int error_code_field) @@ -6441,46 +6441,43 @@ static void __vmx_complete_interrupts(struct vcpu_vmx *vmx, idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK; - vmx->vcpu.arch.nmi_injected = false; - kvm_clear_exception_queue(&vmx->vcpu); - kvm_clear_interrupt_queue(&vmx->vcpu); + vcpu->arch.nmi_injected = false; + kvm_clear_exception_queue(vcpu); + kvm_clear_interrupt_queue(vcpu); if (!idtv_info_valid) return; - kvm_make_request(KVM_REQ_EVENT, &vmx->vcpu); + kvm_make_request(KVM_REQ_EVENT, vcpu); vector = idt_vectoring_info & VECTORING_INFO_VECTOR_MASK; type = idt_vectoring_info & VECTORING_INFO_TYPE_MASK; switch (type) { case INTR_TYPE_NMI_INTR: - vmx->vcpu.arch.nmi_injected = true; + vcpu->arch.nmi_injected = true; /* * SDM 3: 27.7.1.2 (September 2008) * Clear bit "block by NMI" before VM entry if a NMI * delivery faulted. */ - vmx_set_nmi_mask(&vmx->vcpu, false); + vmx_set_nmi_mask(vcpu, false); break; case INTR_TYPE_SOFT_EXCEPTION: - vmx->vcpu.arch.event_exit_inst_len = - vmcs_read32(instr_len_field); + vcpu->arch.event_exit_inst_len = vmcs_read32(instr_len_field); /* fall through */ case INTR_TYPE_HARD_EXCEPTION: if (idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK) { u32 err = vmcs_read32(error_code_field); - kvm_queue_exception_e(&vmx->vcpu, vector, err); + kvm_queue_exception_e(vcpu, vector, err); } else - kvm_queue_exception(&vmx->vcpu, vector); + kvm_queue_exception(vcpu, vector); break; case INTR_TYPE_SOFT_INTR: - vmx->vcpu.arch.event_exit_inst_len = - vmcs_read32(instr_len_field); + vcpu->arch.event_exit_inst_len = vmcs_read32(instr_len_field); /* fall through */ case INTR_TYPE_EXT_INTR: - kvm_queue_interrupt(&vmx->vcpu, vector, - type == INTR_TYPE_SOFT_INTR); + kvm_queue_interrupt(vcpu, vector, type == INTR_TYPE_SOFT_INTR); break; default: break; @@ -6489,7 +6486,7 @@ static void __vmx_complete_interrupts(struct vcpu_vmx *vmx, static void vmx_complete_interrupts(struct vcpu_vmx *vmx) { - __vmx_complete_interrupts(vmx, vmx->idt_vectoring_info, + __vmx_complete_interrupts(&vmx->vcpu, vmx->idt_vectoring_info, VM_EXIT_INSTRUCTION_LEN, IDT_VECTORING_ERROR_CODE); } @@ -6498,7 +6495,7 @@ static void vmx_cancel_injection(struct kvm_vcpu *vcpu) { if (to_vmx(vcpu)->nested.nested_run_pending) return; - __vmx_complete_interrupts(to_vmx(vcpu), + __vmx_complete_interrupts(vcpu, vmcs_read32(VM_ENTRY_INTR_INFO_FIELD), VM_ENTRY_INSTRUCTION_LEN, VM_ENTRY_EXCEPTION_ERROR_CODE); -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: nVMX: Avoid one redundant vmcs_read in prepare_vmcs12
IDT_VECTORING_INFO_FIELD was already read right after vmexit. Signed-off-by: Jan Kiszka --- arch/x86/kvm/vmx.c |3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 7d2fbd2..d99a519 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -7368,8 +7368,7 @@ void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) vmcs12->vm_exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO); vmcs12->vm_exit_intr_error_code = vmcs_read32(VM_EXIT_INTR_ERROR_CODE); - vmcs12->idt_vectoring_info_field = - vmcs_read32(IDT_VECTORING_INFO_FIELD); + vmcs12->idt_vectoring_info_field = to_vmx(vcpu)->idt_vectoring_info; vmcs12->idt_vectoring_error_code = vmcs_read32(IDT_VECTORING_ERROR_CODE); vmcs12->vm_exit_instruction_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN); -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: nVMX: Rework event injection and recovery
This aligns VMX more with SVM regarding event injection and recovery for nested guests. The changes allow to inject interrupts directly from L0 to L2. One difference to SVM is that we always transfer the pending event injection into the architectural state of the VCPU and then drop it from there if it turns out that we left L2 to enter L1. VMX and SVM are now identical in how they recover event injections from unperformed vmlaunch/vmresume: We detect that VM_ENTRY_INTR_INFO_FIELD still contains a valid event and, if yes, transfer the content into L1's idt_vectoring_info_field. To avoid that we incorrectly leak an event into the architectural VCPU state that L1 wants to inject, we skip cancellation on nested run. Signed-off-by: Jan Kiszka --- Survived moderate testing here and (currently) makes sense to me, but please review very carefully. I wouldn't be surprised if I'm still missing some subtle corner case. arch/x86/kvm/vmx.c | 57 +++ 1 files changed, 26 insertions(+), 31 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index dd3a8a0..7d2fbd2 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -6489,8 +6489,6 @@ static void __vmx_complete_interrupts(struct vcpu_vmx *vmx, static void vmx_complete_interrupts(struct vcpu_vmx *vmx) { - if (is_guest_mode(&vmx->vcpu)) - return; __vmx_complete_interrupts(vmx, vmx->idt_vectoring_info, VM_EXIT_INSTRUCTION_LEN, IDT_VECTORING_ERROR_CODE); @@ -6498,7 +6496,7 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx) static void vmx_cancel_injection(struct kvm_vcpu *vcpu) { - if (is_guest_mode(vcpu)) + if (to_vmx(vcpu)->nested.nested_run_pending) return; __vmx_complete_interrupts(to_vmx(vcpu), vmcs_read32(VM_ENTRY_INTR_INFO_FIELD), @@ -6531,21 +6529,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) struct vcpu_vmx *vmx = to_vmx(vcpu); unsigned long debugctlmsr; - if (is_guest_mode(vcpu) && !vmx->nested.nested_run_pending) { - struct vmcs12 *vmcs12 = get_vmcs12(vcpu); - if (vmcs12->idt_vectoring_info_field & - VECTORING_INFO_VALID_MASK) { - vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, - vmcs12->idt_vectoring_info_field); - vmcs_write32(VM_ENTRY_INSTRUCTION_LEN, - vmcs12->vm_exit_instruction_len); - if (vmcs12->idt_vectoring_info_field & - VECTORING_INFO_DELIVER_CODE_MASK) - vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, - vmcs12->idt_vectoring_error_code); - } - } - /* Record the guest's net vcpu time for enforced NMI injections. */ if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked)) vmx->entry_time = ktime_get(); @@ -6704,17 +6687,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD); - if (is_guest_mode(vcpu)) { - struct vmcs12 *vmcs12 = get_vmcs12(vcpu); - vmcs12->idt_vectoring_info_field = vmx->idt_vectoring_info; - if (vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK) { - vmcs12->idt_vectoring_error_code = - vmcs_read32(IDT_VECTORING_ERROR_CODE); - vmcs12->vm_exit_instruction_len = - vmcs_read32(VM_EXIT_INSTRUCTION_LEN); - } - } - vmx->loaded_vmcs->launched = 1; vmx->exit_reason = vmcs_read32(VM_EXIT_REASON); @@ -7403,9 +7375,32 @@ void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) vmcs12->vm_exit_instruction_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN); vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO); - /* clear vm-entry fields which are to be cleared on exit */ - if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY)) + /* drop what we picked up for L0 via vmx_complete_interrupts */ + vcpu->arch.nmi_injected = false; + kvm_clear_exception_queue(vcpu); + kvm_clear_interrupt_queue(vcpu); + + if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) && + vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK) { + /* +* Preserve the event that was supposed to be injected +* by emulating it would have been returned in +* IDT_VECTORING_INFO_FIELD. +*/ + if (vmcs_read32(VM_ENTRY_INTR_INFO_FIELD) & + INTR_INFO_VALID_MASK) { +
Re: [PATCH v3 1/2] KVM: VMX: enable acknowledge interupt on vmexit
On Wed, Feb 20, 2013 at 4:46 AM, Zhang, Yang Z wrote: >>> >>> +static void vmx_handle_external_intr(struct kvm_vcpu *vcpu) +{ + >>> u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO); + + /* + >>>* If external interrupt exists, IF bit is set in rflags/eflags on >>> the +* interrupt stack frame, and interrupt will be enabled on >>> a return +* from interrupt handler. +*/ + if >>> ((exit_intr_info & (INTR_INFO_VALID_MASK | INTR_INFO_INTR_TYPE_MASK)) + >>> == (INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR)) { >>> + unsigned int vector; + unsigned long >>> entry; + gate_desc *desc; + struct vcpu_vmx >>> *vmx = to_vmx(vcpu); + + vector = exit_intr_info & >>> INTR_INFO_VECTOR_MASK; +#ifdef CONFIG_X86_64 + desc = >>> (void *)vmx->host_idt_base + vector * 16; +#else + desc = >>> (void *)vmx->host_idt_base + vector * 8; +#endif + + >>> entry = gate_offset(*desc); + asm( + >>> "mov %0, %%" _ASM_DX " \n\t" +#ifdef CONFIG_X86_64 + >>> "mov %%" _ASM_SP ", %%" _ASM_BX " \n\t" + >>> "and $0xfff0, %%" _ASM_SP " \n\t" + >>> "mov %%ss, %%" _ASM_AX " \n\t" + "push %%" >>> _ASM_AX " \n\t" + "push %%" _ASM_BX " \n\t" >>> +#endif >> >> Are we sure no interrupts are using the IST feature? I guess it's unlikely. > Linux uses IST for NMI, stack fault, machine-check, double fault and debug > interrupt . No external interrupt will use it. > This feature is only for external interrupt. So we don't need to check it > here. Ok, thanks for checking. > >> >>> + "pushf \n\t" >>> + "pop %%" _ASM_AX " \n\t" >>> + "or $0x200, %%" _ASM_AX " \n\t" >>> + "push %%" _ASM_AX " \n\t" >> >> Can simplify to pushf; orl $0x200, %%rsp. > Sure. > >>> + "mov %%cs, %%" _ASM_AX " \n\t" >>> + "push %%" _ASM_AX " \n\t" >> >> push %%cs > "push %%cs" is invalid in x86_64. Oops. 'push[lq] $__KERNEL_CS' then. > >>> + "push intr_return \n\t" >> >> push $1f. Or even combine with the next instruction, and call %rdx. > Which is faster? jmp or call? Actually it doesn't matter, the processor is clever enough to minimize the difference. But the code is simpler and shorter with 'call'. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v3 1/2] KVM: VMX: enable acknowledge interupt on vmexit
Gleb Natapov wrote on 2013-02-20: > On Wed, Feb 20, 2013 at 02:46:05AM +, Zhang, Yang Z wrote: >> Avi Kivity wrote on 2013-02-20: >>> On Tue, Feb 19, 2013 at 3:39 PM, Yang Zhang > wrote: From: Yang Zhang The "acknowledge interrupt on exit" feature controls processor behavior for external interrupt acknowledgement. When this control is set, the processor acknowledges the interrupt controller to acquire the interrupt vector on VM exit. After enabling this feature, an interrupt which arrived when target cpu is running in vmx non-root mode will be handled by vmx handler instead of handler in idt. Currently, vmx handler only fakes an interrupt stack and jump to idt table to let real handler to handle it. Further, we will recognize the interrupt and only delivery the interrupt which not belong to current vcpu through idt table. The interrupt which belonged to current vcpu will be handled inside vmx handler. This will reduce the interrupt handle cost of KVM. Also, interrupt enable logic is changed if this feature is turnning on: Before this patch, hypervior call local_irq_enable() to enable it directly. Now IF bit is set on interrupt stack frame, and will be enabled on a return from interrupt handler if exterrupt interrupt exists. If no external interrupt, still call local_irq_enable() to enable it. Refer to Intel SDM volum 3, chapter 33.2. +static void vmx_handle_external_intr(struct kvm_vcpu *vcpu) +{ + u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO); + + /* + * If external interrupt exists, IF bit is set in rflags/eflags on the +* interrupt stack frame, and interrupt will be enabled on a return +* from interrupt handler. +*/ + if ((exit_intr_info & (INTR_INFO_VALID_MASK | > INTR_INFO_INTR_TYPE_MASK)) + == (INTR_INFO_VALID_MASK | > INTR_TYPE_EXT_INTR)) { + unsigned int vector; + unsigned long entry; + gate_desc *desc; + struct vcpu_vmx *vmx = to_vmx(vcpu); + + vector = exit_intr_info & INTR_INFO_VECTOR_MASK; +#ifdef CONFIG_X86_64 + desc = (void *)vmx->host_idt_base + vector * 16; +#else + desc = (void *)vmx->host_idt_base + vector * 8; +#endif + + entry = gate_offset(*desc); + asm( + "mov %0, %%" _ASM_DX " \n\t" +#ifdef CONFIG_X86_64 + "mov %%" _ASM_SP ", %%" _ASM_BX " \n\t" + "and $0xfff0, %%" _ASM_SP " \n\t" + "mov %%ss, %%" _ASM_AX " \n\t" + "push %%" _ASM_AX " \n\t" + "push %%" _ASM_BX " \n\t" +#endif >>> >>> Are we sure no interrupts are using the IST feature? I guess it's unlikely. >> Linux uses IST for NMI, stack fault, machine-check, double fault and >> debug interrupt . No external interrupt will use it. This feature is >> only for external interrupt. So we don't need to check it here. >> >>> + "pushf \n\t" + "pop %%" _ASM_AX " \n\t" + "or $0x200, %%" _ASM_AX " \n\t" + "push %%" _ASM_AX " \n\t" >>> >>> Can simplify to pushf; orl $0x200, %%rsp. >> Sure. >> + "mov %%cs, %%" _ASM_AX " \n\t" + "push %%" _ASM_AX " \n\t" >>> >>> push %%cs >> "push %%cs" is invalid in x86_64. >> + "push intr_return \n\t" >>> >>> push $1f. Or even combine with the next instruction, and call %rdx. >> Which is faster? jmp or call? >> > Wrong question. You need to compare push+jmp with call. I do not see why Sorry, I didn't express it clearly. Yes, I want to compare push+jmp with call. > later will be slower. I think so. If push+jmp is not faster than call, I will use the latter. + "jmp *%% " _ASM_DX " \n\t" + "1: \n\t" + ".pushsection .rodata \n\t" + ".global intr_return \n\t" + "intr_return: " _ASM_PTR " 1b \n\t" + ".popsection \n\t" + : : "m"(entry) : +#ifdef CONFIG_X86_64 + "rax", "rbx", "rdx" +#else + "eax", "edx" +#endif + ); + } else + local_irq_enable(); +} + >> >> >> Best regards, >> Yang >> > > -- > Gleb. Best regards, Yang -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 1/2] KVM: VMX: enable acknowledge interupt on vmexit
On Wed, Feb 20, 2013 at 02:46:05AM +, Zhang, Yang Z wrote: > Avi Kivity wrote on 2013-02-20: > > On Tue, Feb 19, 2013 at 3:39 PM, Yang Zhang wrote: > >> From: Yang Zhang > >> > >> The "acknowledge interrupt on exit" feature controls processor behavior > >> for external interrupt acknowledgement. When this control is set, the > >> processor acknowledges the interrupt controller to acquire the > >> interrupt vector on VM exit. > >> > >> After enabling this feature, an interrupt which arrived when target cpu > >> is running in vmx non-root mode will be handled by vmx handler instead > >> of handler in idt. Currently, vmx handler only fakes an interrupt stack > >> and jump to idt table to let real handler to handle it. Further, we > >> will recognize the interrupt and only delivery the interrupt which not > >> belong to current vcpu through idt table. The interrupt which belonged > >> to current vcpu will be handled inside vmx handler. This will reduce > >> the interrupt handle cost of KVM. > >> > >> Also, interrupt enable logic is changed if this feature is turnning on: > >> Before this patch, hypervior call local_irq_enable() to enable it directly. > >> Now IF bit is set on interrupt stack frame, and will be enabled on a > >> return from > >> interrupt handler if exterrupt interrupt exists. If no external interrupt, > >> still > >> call local_irq_enable() to enable it. > >> > >> Refer to Intel SDM volum 3, chapter 33.2. > >> > >> > >> +static void vmx_handle_external_intr(struct kvm_vcpu *vcpu) +{ + > >> u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO); + + /* + > >>* If external interrupt exists, IF bit is set in rflags/eflags on > >> the +* interrupt stack frame, and interrupt will be enabled on > >> a return +* from interrupt handler. +*/ + if > >> ((exit_intr_info & (INTR_INFO_VALID_MASK | INTR_INFO_INTR_TYPE_MASK)) + > >> == (INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR)) { > >> + unsigned int vector; + unsigned long > >> entry; + gate_desc *desc; + struct vcpu_vmx > >> *vmx = to_vmx(vcpu); + + vector = exit_intr_info & > >> INTR_INFO_VECTOR_MASK; +#ifdef CONFIG_X86_64 + desc = > >> (void *)vmx->host_idt_base + vector * 16; +#else + desc = > >> (void *)vmx->host_idt_base + vector * 8; +#endif + + > >> entry = gate_offset(*desc); + asm( + > >> "mov %0, %%" _ASM_DX " \n\t" +#ifdef CONFIG_X86_64 + > >> "mov %%" _ASM_SP ", %%" _ASM_BX " \n\t" + > >> "and $0xfff0, %%" _ASM_SP " \n\t" + > >> "mov %%ss, %%" _ASM_AX " \n\t" + "push %%" > >> _ASM_AX " \n\t" + "push %%" _ASM_BX " \n\t" > >> +#endif > > > > Are we sure no interrupts are using the IST feature? I guess it's unlikely. > Linux uses IST for NMI, stack fault, machine-check, double fault and debug > interrupt . No external interrupt will use it. > This feature is only for external interrupt. So we don't need to check it > here. > > > > >> + "pushf \n\t" > >> + "pop %%" _ASM_AX " \n\t" > >> + "or $0x200, %%" _ASM_AX " \n\t" > >> + "push %%" _ASM_AX " \n\t" > > > > Can simplify to pushf; orl $0x200, %%rsp. > Sure. > > >> + "mov %%cs, %%" _ASM_AX " \n\t" > >> + "push %%" _ASM_AX " \n\t" > > > > push %%cs > "push %%cs" is invalid in x86_64. > > >> + "push intr_return \n\t" > > > > push $1f. Or even combine with the next instruction, and call %rdx. > Which is faster? jmp or call? > Wrong question. You need to compare push+jmp with call. I do not see why later will be slower. > >> + "jmp *%% " _ASM_DX " \n\t" > >> + "1: \n\t" > >> + ".pushsection .rodata \n\t" > >> + ".global intr_return \n\t" > >> + "intr_return: " _ASM_PTR " 1b \n\t" > >> + ".popsection \n\t" > >> + : : "m"(entry) : > >> +#ifdef CONFIG_X86_64 > >> + "rax", "rbx", "rdx" > >> +#else > >> + "eax", "edx" > >> +#endif > >> + ); > >> + } else > >> + local_irq_enable(); > >> +} > >> + > > > Best regards, > Yang > -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH V4 RESEND 15/22] tap: multiqueue support
On 02/11/2013 06:28 PM, Markus Armbruster wrote: > Commit 264986e2 extended NetdevTapOptions without updating the > documentation. Hasn't been addressed since. Must fix for 1.4, in my > opinion. Will send a patch to fix this. Thanks > > This is the offending patch: > > Jason Wang writes: > >> Recently, linux support multiqueue tap which could let userspace call >> TUNSETIFF >> for a signle device many times to create multiple file descriptors as >> independent queues. User could also enable/disabe a specific queue through >> TUNSETQUEUE. >> >> The patch adds the generic infrastructure to create multiqueue taps. To >> achieve >> this a new parameter "queues" were introduced to specify how many queues were >> expected to be created for tap by qemu itself. Alternatively, management >> could >> also pass multiple pre-created tap file descriptors separated with ':' >> through a >> new parameter fds like -netdev tap,id=hn0,fds="X:Y:..:Z". Multiple vhost file >> descriptors could also be passed in this way. >> >> Each TAPState were still associated to a tap fd, which mean multiple >> TAPStates >> were created when user needs multiqueue taps. Since each TAPState contains >> one >> NetClientState, with the multiqueue nic support, an N peers of NetClientState >> were built up. >> >> A new parameter, mq_required were introduce in tap_open() to create >> multiqueue >> tap fds. > [...] >> diff --git a/qapi-schema.json b/qapi-schema.json >> index 3a4817b..cdd8384 100644 >> --- a/qapi-schema.json >> +++ b/qapi-schema.json >> @@ -2533,6 +2533,7 @@ >>'data': { >> '*ifname': 'str', >> '*fd': 'str', >> +'*fds':'str', >> '*script': 'str', >> '*downscript': 'str', >> '*helper': 'str', >> @@ -2540,7 +2541,9 @@ >> '*vnet_hdr': 'bool', >> '*vhost': 'bool', >> '*vhostfd':'str', >> -'*vhostforce': 'bool' } } >> +'*vhostfds': 'str', >> +'*vhostforce': 'bool', >> +'*queues': 'uint32'} } >> >> ## >> # @NetdevSocketOptions -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v12 rebased 2/8] start vm after resetting it
On Thu, Feb 07, 2013 at 11:50:28PM -0200, Marcelo Tosatti wrote: > On Wed, Jan 23, 2013 at 03:19:23PM +0800, Hu Tao wrote: > > From: Wen Congyang > > > > The guest should run after resetting it, but it does not run if its > > old state is RUN_STATE_INTERNAL_ERROR or RUN_STATE_PAUSED. > > > > We don't set runstate to RUN_STATE_PAUSED when resetting the guest, > > so the runstate will be changed from RUN_STATE_INTERNAL_ERROR or > > RUN_STATE_PAUSED to RUN_STATE_RUNNING(not RUN_STATE_PAUSED). > > It appears the last hunk will automatically reset state from > RUN_STATE_INTERNAL_ERROR to RUN_STATE_RUNNING ? Yes. > > I suppose the transition table allows, from RUN_STATE_INTERNAL_ERROR: > > system_reset > cont > > To resume the machine? True. I think the purpose of this patch is to always reset and _run_ the guest by `system_reset', avoiding an additional `cont' following `system_reset'. > > > > > Signed-off-by: Wen Congyang > > --- > > include/block/block.h | 2 ++ > > qmp.c | 2 +- > > vl.c | 7 --- > > 3 files changed, 7 insertions(+), 4 deletions(-) > > > > diff --git a/include/block/block.h b/include/block/block.h > > index ffd1936..5e82ccb 100644 > > --- a/include/block/block.h > > +++ b/include/block/block.h > > @@ -366,6 +366,8 @@ void bdrv_disable_copy_on_read(BlockDriverState *bs); > > void bdrv_set_in_use(BlockDriverState *bs, int in_use); > > int bdrv_in_use(BlockDriverState *bs); > > > > +void iostatus_bdrv_it(void *opaque, BlockDriverState *bs); > > + > > #ifdef CONFIG_LINUX_AIO > > int raw_get_aio_fd(BlockDriverState *bs); > > #else > > diff --git a/qmp.c b/qmp.c > > index 55b056b..5f1bed1 100644 > > --- a/qmp.c > > +++ b/qmp.c > > @@ -130,7 +130,7 @@ SpiceInfo *qmp_query_spice(Error **errp) > > }; > > #endif > > > > -static void iostatus_bdrv_it(void *opaque, BlockDriverState *bs) > > +void iostatus_bdrv_it(void *opaque, BlockDriverState *bs) > > { > > bdrv_iostatus_reset(bs); > > } > > diff --git a/vl.c b/vl.c > > index b0bcf1e..1d2edaa 100644 > > --- a/vl.c > > +++ b/vl.c > > @@ -534,7 +534,7 @@ static const RunStateTransition > > runstate_transitions_def[] = { > > { RUN_STATE_INMIGRATE, RUN_STATE_RUNNING }, > > { RUN_STATE_INMIGRATE, RUN_STATE_PAUSED }, > > > > -{ RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED }, > > +{ RUN_STATE_INTERNAL_ERROR, RUN_STATE_RUNNING }, > > { RUN_STATE_INTERNAL_ERROR, RUN_STATE_FINISH_MIGRATE }, > > > > { RUN_STATE_IO_ERROR, RUN_STATE_RUNNING }, > > @@ -569,7 +569,7 @@ static const RunStateTransition > > runstate_transitions_def[] = { > > > > { RUN_STATE_SAVE_VM, RUN_STATE_RUNNING }, > > > > -{ RUN_STATE_SHUTDOWN, RUN_STATE_PAUSED }, > > +{ RUN_STATE_SHUTDOWN, RUN_STATE_RUNNING }, > > { RUN_STATE_SHUTDOWN, RUN_STATE_FINISH_MIGRATE }, > > > > { RUN_STATE_DEBUG, RUN_STATE_SUSPENDED }, > > @@ -1951,7 +1951,8 @@ static bool main_loop_should_exit(void) > > resume_all_vcpus(); > > if (runstate_check(RUN_STATE_INTERNAL_ERROR) || > > runstate_check(RUN_STATE_SHUTDOWN)) { > > -runstate_set(RUN_STATE_PAUSED); > > +bdrv_iterate(iostatus_bdrv_it, NULL); > > +vm_start(); > > } > > } > > if (qemu_wakeup_requested()) { > > -- > > 1.8.0.1.240.ge8a1f5a > > > > -- > > To unsubscribe from this list: send the line "unsubscribe kvm" in > > the body of a message to majord...@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Regards, Hu Tao -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 54061] guest panic after live migration
https://bugzilla.kernel.org/show_bug.cgi?id=54061 --- Comment #2 from Jay Ren 2013-02-20 08:06:16 --- Marcleo, yes, after reverting that commit "caf6900f2d8", live-migration can work fine. -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html