[PATCH] kvm: qemu: fix kvm_tpr_opt_setup() args
From: Mark McLoughlin mar...@redhat.com

Fixes:

  qemu-kvm.h:110: warning: function declaration isn’t a prototype

Signed-off-by: Mark McLoughlin mar...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/qemu/kvm-tpr-opt.c b/qemu/kvm-tpr-opt.c
index f2a3a1e..b3d26aa 100644
--- a/qemu/kvm-tpr-opt.c
+++ b/qemu/kvm-tpr-opt.c
@@ -370,7 +370,7 @@ static void vtpr_ioport_write(void *opaque, uint32_t addr, uint32_t val)
     enable_vapic(env);
 }
-void kvm_tpr_opt_setup(CPUState *env)
+void kvm_tpr_opt_setup(void)
 {
     register_savevm("kvm-tpr-opt", 0, 1, tpr_save, tpr_load, NULL);
     register_ioport_write(0x7e, 1, 1, vtpr_ioport_write, NULL);
diff --git a/qemu/qemu-kvm.h b/qemu/qemu-kvm.h
index 896df4e..12bd5a0 100644
--- a/qemu/qemu-kvm.h
+++ b/qemu/qemu-kvm.h
@@ -107,7 +107,7 @@ void qemu_kvm_aio_wait_end(void);
 void qemu_kvm_notify_work(void);
-void kvm_tpr_opt_setup();
+void kvm_tpr_opt_setup(void);
 void kvm_tpr_access_report(CPUState *env, uint64_t rip, int is_write);
 int handle_tpr_access(void *opaque, int vcpu, uint64_t rip, int is_write);
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
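For readers unfamiliar with the warning: before C23, an empty parameter list in a declaration leaves the parameters unspecified instead of declaring "no parameters", which is what GCC's -Wstrict-prototypes complains about. A minimal sketch of the difference follows; demo_setup and demo_calls are hypothetical names for illustration and do not appear in the patch.

```c
/* Before C23, a declaration written as "void demo_setup();" says nothing
 * about the parameters, so the compiler cannot check calls against it.
 * GCC's -Wstrict-prototypes reports such declarations as
 * "function declaration isn't a prototype". */

static int demo_calls;  /* hypothetical counter, for illustration only */

/* A real prototype: (void) states that the function takes no arguments. */
void demo_setup(void);

void demo_setup(void)
{
    demo_calls++;
}
```

With the (void) form, a call such as demo_setup(env) is rejected at compile time instead of being silently accepted.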
[PATCH] KVM: mmu_notifiers release method
From: Marcelo Tosatti mtosa...@redhat.com

The destructor for huge pages uses the backing inode for adjusting hugetlbfs accounting.

Hugepage mappings are destroyed by exit_mmap, after mmu_notifier_release, so there are no notifications through unmap_hugepage_range at this point.

The hugetlbfs inode can be freed with pages backed by it referenced by the shadow. When the shadow releases its reference, the huge page destructor will access a now freed inode.

Implement the release operation for kvm mmu notifiers to release page refs before the hugetlbfs inode is gone.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 785c1e3..103bc08 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -814,11 +814,19 @@ static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
         return young;
 }
+static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
+                                     struct mm_struct *mm)
+{
+        struct kvm *kvm = mmu_notifier_to_kvm(mn);
+        kvm_arch_flush_shadow(kvm);
+}
+
 static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
         .invalidate_page = kvm_mmu_notifier_invalidate_page,
         .invalidate_range_start = kvm_mmu_notifier_invalidate_range_start,
         .invalidate_range_end = kvm_mmu_notifier_invalidate_range_end,
         .clear_flush_young = kvm_mmu_notifier_clear_flush_young,
+        .release = kvm_mmu_notifier_release,
 };
 #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
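The ordering problem described in the commit message can be illustrated with a small userspace toy. This is only a sketch under stated assumptions: every toy_* type and function below is hypothetical and merely models "drop references in the release callback, before the backing object is freed"; none of it is kernel code.

```c
#include <stddef.h>

/* Hypothetical stand-ins; none of these types exist in the kernel. */
struct toy_inode {
    int alive;       /* 1 while the backing inode may still be touched */
    int accounted;   /* pages still charged to this inode */
};

struct toy_shadow {
    struct toy_inode *backing;  /* inode behind the page the shadow pins */
    int holds_ref;              /* 1 while the shadow still holds the page */
};

/* Dropping the reference runs the page destructor, which touches the inode.
 * Returns 0 on success, -1 if the inode was already gone (the bug case). */
static int toy_shadow_put(struct toy_shadow *s)
{
    if (!s->holds_ref)
        return 0;
    s->holds_ref = 0;
    if (!s->backing->alive)
        return -1;              /* would be a use-after-free in reality */
    s->backing->accounted--;
    return 0;
}

struct toy_notifier_ops {
    int (*release)(struct toy_shadow *s);   /* may be NULL */
};

/* Address-space teardown: notify first, then let the inode go away. */
static int toy_exit_mmap(const struct toy_notifier_ops *ops,
                         struct toy_shadow *s, struct toy_inode *inode)
{
    int r = 0;

    if (ops->release)
        r = ops->release(s);    /* shadow drops refs while inode is alive */
    inode->alive = 0;           /* mappings and inode are destroyed after */
    return r;
}
```

With a release callback installed, the reference is dropped while the inode is still valid; without one, a later toy_shadow_put() hits the dead inode, which is the situation the patch avoids by flushing the shadow in the release method.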
KVM host kernel hang
Hi,

while trying to run a current openSUSE in VMWare ESX in KVM (using NPT), some KVM code seems to be stuck in an endless loop. The qemu process hangs, I can't attach gdb to it and the kernel module seems to be hanging in a place where I don't see any looping code. One CPU is definitely stuck in sys at 100% though.

This is running git as of yesterday with some minor ESX modifications that should not touch any of these parts (userspace and MSRs).

Maybe one of you guys has a clue what's going on here. You'll find a snippet of a t-sysrq trace with all qemu relevant parts below. The registers (incl. IP) of these don't change over time.

Alex

qemu-system-x D 810001025280 0 27900 9501
 8101000e5c58 0082 8101000e5c1c 81011446e728 807e6280 807e6280 8100388ca680 80601890 8100388ca9c0 00200200 8100388ca9c0
Call Trace:
 [804485ec] __mutex_lock_slowpath+0x72/0xa9
 [8044847a] mutex_lock+0x1e/0x22
 [88d7f630] :kvm:kvm_arch_vm_ioctl+0x30e/0x5ae
 [88d7c78e] :kvm:kvm_vm_ioctl+0x744/0x777
 [802acada] vfs_ioctl+0x2a/0x78
 [802acd6f] do_vfs_ioctl+0x247/0x261
 [802acdde] sys_ioctl+0x55/0x77
 [8020bffa] system_call_after_swapgs+0x8a/0x8f
 [7f2f3b15eb67]

qemu-system-x R running task 0 27908 9501
 88d7d3ad 0390 810100120040 810116491000 fee00390 81011b361d08 88d7f1fb 0001
Call Trace:
Inexact backtrace:
 [88d7d3ad] :kvm:kvm_get_cs_db_l_bits+0x27/0x3e
 [88d7f1fb] :kvm:emulate_instruction+0x199/0x266
 [88d86700] :kvm:kvm_mmu_page_fault+0x49/0x86
 [88a3ebe8] :kvm_amd:pf_interception+0xa8/0xb1
 [88a3e1b4] :kvm_amd:handle_exit+0x218/0x221
 [88d810f6] :kvm:kvm_arch_vcpu_ioctl_run+0x600/0x81a
 [88d7a4f0] :kvm:kvm_vcpu_ioctl+0xf6/0x485
 [802acada] vfs_ioctl+0x2a/0x78
 [802acd6f] do_vfs_ioctl+0x247/0x261
 [802a13a3] fget_light+0x1/0x83
 [802acdde] sys_ioctl+0x55/0x77
 [802a0b48] sys_writev+0x60/0x94
 [8020bffa] system_call_after_swapgs+0x8a/0x8f

dmesg.kvm.gz
Description: GNU Zip compressed data
[PATCH] CPUID Masking MSRs
Current AMD CPUs support masking of CPUID bits. Using this functionality, a VMM can limit what features are exposed to the guest, even if it's not using SVM/VMX.

While I'm not aware of any open source hypervisor that uses these MSRs atm, VMware ESX does and patches exist for Xen, where trapping CPUID is non-trivial.

This patch implements emulation for this masking, which is pretty trivial because we're intercepting CPUID anyways.

Because it's so simple and can be pretty effective, I put it into the generic code paths, so VMX benefits from it as well.

Signed-off-by: Alexander Graf ag...@suse.de

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 863ea73..e2f0dde 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -370,6 +370,9 @@ struct kvm_vcpu_arch {
         unsigned long dr6;
         unsigned long dr7;
         unsigned long eff_db[KVM_NR_DB_REGS];
+
+        u64 cpuid_mask;
+        u64 cpuid_mask_ext;
 };
 struct kvm_mem_alias {
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 1890032..03b53ba 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -337,5 +337,7 @@
 #define MSR_VM_CR 0xc0010114
 #define MSR_VM_HSAVE_PA 0xc0010117
+#define MSR_VM_MASK_CPUID 0xc0011004
+#define MSR_VM_MASK_CPUID_EXT 0xc0011005
 #endif /* _ASM_X86_MSR_INDEX_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 18bba94..83b4877 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -782,6 +784,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
                 kvm_write_guest_time(vcpu);
                 break;
         }
+        case MSR_VM_MASK_CPUID:
+                vcpu->arch.cpuid_mask = data;
+                break;
+        case MSR_VM_MASK_CPUID_EXT:
+                vcpu->arch.cpuid_mask_ext = data;
+                break;
         default:
                 pr_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n", msr, data);
                 return 1;
@@ -896,6 +904,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
         case MSR_KVM_SYSTEM_TIME:
                 data = vcpu->arch.time;
                 break;
+        case MSR_VM_MASK_CPUID:
+                data = vcpu->arch.cpuid_mask;
+                break;
+        case MSR_VM_MASK_CPUID_EXT:
+                data = vcpu->arch.cpuid_mask_ext;
+                break;
         default:
                 pr_unimpl(vcpu, "unhandled rdmsr: 0x%x\n", msr);
                 return 1;
@@ -2901,10 +2915,19 @@ void kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
         kvm_register_write(vcpu, VCPU_REGS_RDX, 0);
         best = kvm_find_cpuid_entry(vcpu, function, index);
         if (best) {
+                u32 ecx = best->ecx;
+                u32 edx = best->edx;
                 kvm_register_write(vcpu, VCPU_REGS_RAX, best->eax);
                 kvm_register_write(vcpu, VCPU_REGS_RBX, best->ebx);
-                kvm_register_write(vcpu, VCPU_REGS_RCX, best->ecx);
-                kvm_register_write(vcpu, VCPU_REGS_RDX, best->edx);
+                if ( function == 1 ) {
+                        ecx &= (u32)vcpu->arch.cpuid_mask;
+                        edx &= (u32)(vcpu->arch.cpuid_mask >> 32);
+                } else if ( function == 0x80000001 ) {
+                        ecx &= (u32)vcpu->arch.cpuid_mask_ext;
+                        edx &= (u32)(vcpu->arch.cpuid_mask_ext >> 32);
+                }
+                kvm_register_write(vcpu, VCPU_REGS_RCX, ecx);
+                kvm_register_write(vcpu, VCPU_REGS_RDX, edx);
         }
         kvm_x86_ops->skip_emulated_instruction(vcpu);
         KVMTRACE_5D(CPUID, vcpu, function,
@@ -4089,6 +4112,8 @@ int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu)
         memset(vcpu->arch.db, 0, sizeof(vcpu->arch.db));
         vcpu->arch.dr6 = DR6_FIXED_1;
         vcpu->arch.dr7 = DR7_FIXED_1;
+        vcpu->arch.cpuid_mask = 0xffffffffffffffff;
+        vcpu->arch.cpuid_mask_ext = 0xffffffffffffffff;
         return kvm_x86_ops->vcpu_reset(vcpu);
 }
--
1.5.6
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
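The masking step in kvm_emulate_cpuid() can be tried out in isolation. The following is a userspace sketch, reading the patch's assignments as AND operations with a 32-bit shift for the upper half of each mask; struct demo_vcpu and demo_mask_cpuid are hypothetical names, and the ECX/EDX split simply mirrors the patch without claiming anything about the hardware register layout.

```c
#include <stdint.h>

/* Hypothetical per-vcpu state mirroring the two fields the patch adds. */
struct demo_vcpu {
    uint64_t cpuid_mask;      /* applied to CPUID function 0x00000001 */
    uint64_t cpuid_mask_ext;  /* applied to CPUID function 0x80000001 */
};

/* AND the feature words with the relevant mask: the low 32 bits cover ECX
 * and the high 32 bits cover EDX, as in the patch. All other functions
 * pass through unchanged. */
static void demo_mask_cpuid(const struct demo_vcpu *v, uint32_t function,
                            uint32_t *ecx, uint32_t *edx)
{
    uint64_t mask;

    if (function == 0x00000001u)
        mask = v->cpuid_mask;
    else if (function == 0x80000001u)
        mask = v->cpuid_mask_ext;
    else
        return;

    *ecx &= (uint32_t)mask;
    *edx &= (uint32_t)(mask >> 32);
}
```

An all-ones mask leaves the feature words untouched, which is why a reset value with every bit set keeps guests that never write the MSRs unaffected.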
Re: [PATCH] kvm: compat: define marker_synchronize_unregister on older kernels
Eduardo Habkost wrote:
> marker_synchronize_unregister() is available only on 2.6.28. However, its definition is very simple, so we can define it if it is missing.
>
> This fixes compilation of kvm_trace.c against older kernels.

Applied, thanks.

--
error compiling committee.c: too many arguments to function
Re: __purge_vmap_area_lazy crash with CONFIG_PREEMPT_RCU=y
Marcelo Tosatti wrote:
> Ok, the bug seems to be gone now. Avi, can you apply the kernel patch please?

Done.
Re: [PATCH] CPUID Masking MSRs
Alexander Graf wrote:
> Current AMD CPUs support masking of CPUID bits. Using this functionality, a VMM can limit what features are exposed to the guest, even if it's not using SVM/VMX.
>
> While I'm not aware of any open source hypervisor that uses these MSRs atm, VMware ESX does and patches exist for Xen, where trapping CPUID is non-trivial.
>
> This patch implements emulation for this masking, which is pretty trivial because we're intercepting CPUID anyways.
>
> Because it's so simple and can be pretty effective, I put it into the generic code paths, so VMX benefits from it as well.

Missing save/restore support.

Note that Intel has similar functionality, called FlexMigration IIRC, likely using different MSRs.
Re: [PATCH] CPUID Masking MSRs
On 07.01.2009, at 11:07, Avi Kivity wrote:

> Alexander Graf wrote:
>> Current AMD CPUs support masking of CPUID bits. Using this functionality, a VMM can limit what features are exposed to the guest, even if it's not using SVM/VMX.
>>
>> While I'm not aware of any open source hypervisor that uses these MSRs atm, VMware ESX does and patches exist for Xen, where trapping CPUID is non-trivial.
>>
>> This patch implements emulation for this masking, which is pretty trivial because we're intercepting CPUID anyways.
>>
>> Because it's so simple and can be pretty effective, I put it into the generic code paths, so VMX benefits from it as well.
>
> Missing save/restore support.

Right. I keep forgetting about that one ;-).

> Note that Intel has similar functionality, called FlexMigration IIRC, likely using different MSRs.

Hum. I'll take a look at it to see if that's as easy to implement then.

Alex
Re: [PATCH] kvm: qemu: fix kvm_tpr_opt_setup() args
Mark McLoughlin wrote:
> Fixes:
>
>   qemu-kvm.h:110: warning: function declaration isn’t a prototype

Applied, thanks.
Re: [PATCH] CPUID Masking MSRs
Alexander Graf wrote:
>> Note that Intel has similar functionality, called FlexMigration IIRC, likely using different MSRs.
>
> Hum. I'll take a look at it to see if that's as easy to implement then.

It's probably easy (well supporting both might be tricky) but if you don't have a real test case then it's best to wait with it.
[PATCH] kvm: qemu: fix kvm_tpr_opt_setup() args
Fixes:

  qemu-kvm.h:110: warning: function declaration isn’t a prototype

Signed-off-by: Mark McLoughlin mar...@redhat.com
---
 qemu/kvm-tpr-opt.c |    2 +-
 qemu/qemu-kvm.h    |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/qemu/kvm-tpr-opt.c b/qemu/kvm-tpr-opt.c
index f2a3a1e..b3d26aa 100644
--- a/qemu/kvm-tpr-opt.c
+++ b/qemu/kvm-tpr-opt.c
@@ -370,7 +370,7 @@ static void vtpr_ioport_write(void *opaque, uint32_t addr, uint32_t val)
     enable_vapic(env);
 }
-void kvm_tpr_opt_setup(CPUState *env)
+void kvm_tpr_opt_setup(void)
 {
     register_savevm("kvm-tpr-opt", 0, 1, tpr_save, tpr_load, NULL);
     register_ioport_write(0x7e, 1, 1, vtpr_ioport_write, NULL);
diff --git a/qemu/qemu-kvm.h b/qemu/qemu-kvm.h
index 896df4e..12bd5a0 100644
--- a/qemu/qemu-kvm.h
+++ b/qemu/qemu-kvm.h
@@ -107,7 +107,7 @@ void qemu_kvm_aio_wait_end(void);
 void qemu_kvm_notify_work(void);
-void kvm_tpr_opt_setup();
+void kvm_tpr_opt_setup(void);
 void kvm_tpr_access_report(CPUState *env, uint64_t rip, int is_write);
 int handle_tpr_access(void *opaque, int vcpu, uint64_t rip, int is_write);
--
1.6.0.5
Re: KVM host kernel hang
Alexander Graf wrote:
> Hi,
>
> while trying to run a current openSUSE in VMWare ESX in KVM (using NPT), some KVM code seems to be stuck in an endless loop. The qemu process hangs, I can't attach gdb to it and the kernel module seems to be hanging in a place where I don't see any looping code. One CPU is definitely stuck in sys at 100% though.
>
> This is running git as of yesterday with some minor ESX modifications that should not touch any of these parts (userspace and MSRs).
>
> Maybe one of you guys has a clue what's going on here. You'll find a snippet of a t-sysrq trace with all qemu relevant parts below. The registers (incl. IP) of these don't change over time.
>
> Alex
>
> qemu-system-x D 810001025280 0 27900 9501
>  8101000e5c58 0082 8101000e5c1c 81011446e728 807e6280 807e6280 8100388ca680 80601890 8100388ca9c0 00200200 8100388ca9c0
> Call Trace:
>  [804485ec] __mutex_lock_slowpath+0x72/0xa9
>  [8044847a] mutex_lock+0x1e/0x22
>  [88d7f630] :kvm:kvm_arch_vm_ioctl+0x30e/0x5ae
>  [88d7c78e] :kvm:kvm_vm_ioctl+0x744/0x777
>  [802acada] vfs_ioctl+0x2a/0x78
>  [802acd6f] do_vfs_ioctl+0x247/0x261
>  [802acdde] sys_ioctl+0x55/0x77
>  [8020bffa] system_call_after_swapgs+0x8a/0x8f
>  [7f2f3b15eb67]

Waiting for kvm->lock, so can't kill or strace.

> qemu-system-x R running task 0 27908 9501
>  88d7d3ad 0390 810100120040 810116491000 fee00390 81011b361d08 88d7f1fb 0001
> Call Trace:
> Inexact backtrace:
>  [88d7d3ad] :kvm:kvm_get_cs_db_l_bits+0x27/0x3e
>  [88d7f1fb] :kvm:emulate_instruction+0x199/0x266
>  [88d86700] :kvm:kvm_mmu_page_fault+0x49/0x86
>  [88a3ebe8] :kvm_amd:pf_interception+0xa8/0xb1
>  [88a3e1b4] :kvm_amd:handle_exit+0x218/0x221
>  [88d810f6] :kvm:kvm_arch_vcpu_ioctl_run+0x600/0x81a
>  [88d7a4f0] :kvm:kvm_vcpu_ioctl+0xf6/0x485
>  [802acada] vfs_ioctl+0x2a/0x78
>  [802acd6f] do_vfs_ioctl+0x247/0x261
>  [802a13a3] fget_light+0x1/0x83
>  [802acdde] sys_ioctl+0x55/0x77
>  [802a0b48] sys_writev+0x60/0x94
>  [8020bffa] system_call_after_swapgs+0x8a/0x8f

But the mutex is not taken here. Looks like we lost it, maybe CONFIG_LOCKDEP can find out where.
Re: [PATCH] KVM: MMU: Segregate mmu pages created with different cr4.pge settings
Alexander Graf wrote:
> Using this patch it works. But if I read it correctly, that doesn't actually fix anything but only treats NPT/EPT special, which it shouldn't, should it?

The patch doesn't fix the bug but is nevertheless correct. cr4.pge only matters to the mmu if using the shadow mmu; with tdp it only wastes memory (and exposes the bug which you encountered). So, wrt to the bug you saw, it's a workaround, but it's also a correct fix for another bug.

> Maybe this actually even breaks EPT?

It shouldn't.

> I remember having seen a lot of CR4 hacks in svm.c when npt is enabled. Maybe that is related?

No. cr4 controls the guest mmu, but with npt the guest mmu is completely virtualized, so we need to ignore those bits.
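To make the point about cr4.pge and tdp concrete, here is a rough sketch of how a page "role" could leave the bit out when two-dimensional paging is in use. It is an illustration under assumptions, not the patch under discussion: struct demo_role and demo_make_role are hypothetical, and only the position of CR4.PGE (bit 7) is taken from the x86 architecture.

```c
#include <stdint.h>

#define DEMO_CR4_PGE (1u << 7)   /* CR4.PGE is bit 7 on x86 */

/* Hypothetical reduced "page role": shadow pages with different roles are
 * kept apart, so every bit in the role can multiply the number of pages. */
struct demo_role {
    unsigned cr4_pge : 1;
};

static struct demo_role demo_make_role(uint32_t guest_cr4, int tdp_enabled)
{
    struct demo_role role = { 0 };

    /* With NPT/EPT the guest's PGE setting does not affect the host-side
     * tables, so it is left out of the role and cannot split pages. */
    if (!tdp_enabled)
        role.cr4_pge = !!(guest_cr4 & DEMO_CR4_PGE);
    return role;
}
```

Under the shadow mmu the guest toggling CR4.PGE yields two distinct roles; with tdp both settings map to the same role, so no memory is spent on duplicate pages.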
[PATCH 07/10] KVM: Unified the delivery of IOAPIC and MSI
Signed-off-by: Sheng Yang sh...@linux.intel.com --- include/linux/kvm_host.h |3 ++ virt/kvm/ioapic.c| 84 virt/kvm/irq_comm.c | 86 -- 3 files changed, 86 insertions(+), 87 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index bfdaab9..2736dbf 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -351,6 +351,9 @@ struct kvm_gsi_route_entry { struct hlist_node link; }; +void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic, + union kvm_ioapic_redirect_entry *entry, + u32 *deliver_bitmask); void kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 gsi, int level); void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi); void kvm_register_irq_ack_notifier(struct kvm *kvm, diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index b6530e9..951df12 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -200,75 +200,53 @@ u32 kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest, static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq) { - u8 dest = ioapic-redirtbl[irq].fields.dest_id; - u8 dest_mode = ioapic-redirtbl[irq].fields.dest_mode; - u8 delivery_mode = ioapic-redirtbl[irq].fields.delivery_mode; - u8 vector = ioapic-redirtbl[irq].fields.vector; - u8 trig_mode = ioapic-redirtbl[irq].fields.trig_mode; + union kvm_ioapic_redirect_entry entry = ioapic-redirtbl[irq]; u32 deliver_bitmask; struct kvm_vcpu *vcpu; int vcpu_id, r = 0; ioapic_debug(dest=%x dest_mode=%x delivery_mode=%x vector=%x trig_mode=%x\n, -dest, dest_mode, delivery_mode, vector, trig_mode); +entry.fields.dest, entry.fields.dest_mode, +entry.fields.delivery_mode, entry.fields.vector, +entry.fields.trig_mode); - deliver_bitmask = kvm_ioapic_get_delivery_bitmask(ioapic, dest, - dest_mode); + kvm_get_intr_delivery_bitmask(ioapic, entry, deliver_bitmask); if (!deliver_bitmask) { ioapic_debug(no target on destination\n); return 0; } - switch (delivery_mode) { - case IOAPIC_LOWEST_PRIORITY: - vcpu = kvm_get_lowest_prio_vcpu(ioapic-kvm, 
vector, - deliver_bitmask); + /* Always delivery PIT interrupt to vcpu 0 */ #ifdef CONFIG_X86 - if (irq == 0) - vcpu = ioapic-kvm-vcpus[0]; + if (irq == 0) + deliver_bitmask = 1 0; #endif - if (vcpu != NULL) - r = ioapic_inj_irq(ioapic, vcpu, vector, - trig_mode, delivery_mode); - else - ioapic_debug(null lowest prio vcpu: -mask=%x vector=%x delivery_mode=%x\n, -deliver_bitmask, vector, IOAPIC_LOWEST_PRIORITY); - break; - case IOAPIC_FIXED: -#ifdef CONFIG_X86 - if (irq == 0) - deliver_bitmask = 1; -#endif - for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) { - if (!(deliver_bitmask (1 vcpu_id))) - continue; - deliver_bitmask = ~(1 vcpu_id); - vcpu = ioapic-kvm-vcpus[vcpu_id]; - if (vcpu) { - r = ioapic_inj_irq(ioapic, vcpu, vector, - trig_mode, delivery_mode); - } - } - break; - case IOAPIC_NMI: - for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) { - if (!(deliver_bitmask (1 vcpu_id))) - continue; - deliver_bitmask = ~(1 vcpu_id); - vcpu = ioapic-kvm-vcpus[vcpu_id]; - if (vcpu) + + for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) { + if (!(deliver_bitmask (1 vcpu_id))) + continue; + deliver_bitmask = ~(1 vcpu_id); + vcpu = ioapic-kvm-vcpus[vcpu_id]; + if (vcpu) { + if (entry.fields.delivery_mode == + IOAPIC_LOWEST_PRIORITY || + entry.fields.delivery_mode == IOAPIC_FIXED) + r = ioapic_inj_irq(ioapic, vcpu, + entry.fields.vector, + entry.fields.trig_mode, +
[PATCH 01/10] KVM: Add a route layer to convert MSI message to GSI
Avi's purpose, to use single kvm_set_irq() to deal with all interrupt, including MSI. So here is it. struct gsi_route_entry is a mapping from a special gsi(with KVM_GSI_MSG_MASK) to MSI/MSI-X message address/data. And the struct can also be extended for other purpose. Now we support up to 256 gsi_route_entry mapping, and gsi is allocated by kernel and provide two ioctls to userspace, which is more flexiable. Signed-off-by: Sheng Yang sh...@linux.intel.com --- include/linux/kvm.h | 26 +++ include/linux/kvm_host.h | 20 + virt/kvm/irq_comm.c | 70 ++ virt/kvm/kvm_main.c | 106 ++ 4 files changed, 222 insertions(+), 0 deletions(-) diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 71c150f..bbefce6 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -399,6 +399,9 @@ struct kvm_trace_rec { #if defined(CONFIG_X86) #define KVM_CAP_REINJECT_CONTROL 24 #endif +#if defined(CONFIG_X86) +#define KVM_CAP_GSI_ROUTE 25 +#endif /* * ioctls for VM fds @@ -433,6 +436,8 @@ struct kvm_trace_rec { #define KVM_ASSIGN_IRQ _IOR(KVMIO, 0x70, \ struct kvm_assigned_irq) #define KVM_REINJECT_CONTROL _IO(KVMIO, 0x71) +#define KVM_REQUEST_GSI_ROUTE_IOWR(KVMIO, 0x72, void *) +#define KVM_FREE_GSI_ROUTE _IOR(KVMIO, 0x73, void *) /* * ioctls for vcpu fds @@ -553,4 +558,25 @@ struct kvm_assigned_irq { #define KVM_DEV_IRQ_ASSIGN_MSI_ACTION KVM_DEV_IRQ_ASSIGN_ENABLE_MSI #define KVM_DEV_IRQ_ASSIGN_ENABLE_MSI (1 0) +struct kvm_gsi_route_guest { + __u32 entries_nr; + struct kvm_gsi_route_entry_guest *entries; +}; + +#define KVM_GSI_ROUTE_MSI (1 0) +struct kvm_gsi_route_entry_guest { + __u32 gsi; + __u32 type; + __u32 flags; + __u32 reserved; + union { + struct { + __u32 addr_lo; + __u32 addr_hi; + __u32 data; + } msi; + __u32 padding[8]; + }; +}; + #endif diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index a8bcad0..6a00201 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -136,6 +136,9 @@ struct kvm { unsigned long mmu_notifier_seq; long 
mmu_notifier_count; #endif + struct hlist_head gsi_route_list; +#define KVM_NR_GSI_ROUTE_ENTRIES256 + DECLARE_BITMAP(gsi_route_bitmap, KVM_NR_GSI_ROUTE_ENTRIES); }; /* The guest did something we don't support. */ @@ -336,6 +339,19 @@ void kvm_unregister_irq_mask_notifier(struct kvm *kvm, int irq, struct kvm_irq_mask_notifier *kimn); void kvm_fire_mask_notifiers(struct kvm *kvm, int irq, bool mask); +#define KVM_GSI_ROUTE_MASK0x100ull +struct kvm_gsi_route_entry { + u32 gsi; + u32 type; + u32 flags; + u32 reserved; + union { + struct msi_msg msi; + u32 reserved[8]; + }; + struct hlist_node link; +}; + void kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level); void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi); void kvm_register_irq_ack_notifier(struct kvm *kvm, @@ -343,6 +359,10 @@ void kvm_register_irq_ack_notifier(struct kvm *kvm, void kvm_unregister_irq_ack_notifier(struct kvm_irq_ack_notifier *kian); int kvm_request_irq_source_id(struct kvm *kvm); void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id); +int kvm_update_gsi_route(struct kvm *kvm, struct kvm_gsi_route_entry *entry); +struct kvm_gsi_route_entry *kvm_find_gsi_route_entry(struct kvm *kvm, u32 gsi); +void kvm_free_gsi_route(struct kvm *kvm, struct kvm_gsi_route_entry *entry); +void kvm_free_gsi_route_list(struct kvm *kvm); #ifdef CONFIG_DMAR int kvm_iommu_map_pages(struct kvm *kvm, gfn_t base_gfn, diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index 5162a41..7460e7f 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -123,3 +123,73 @@ void kvm_fire_mask_notifiers(struct kvm *kvm, int irq, bool mask) kimn-func(kimn, mask); } +int kvm_update_gsi_route(struct kvm *kvm, struct kvm_gsi_route_entry *entry) +{ + struct kvm_gsi_route_entry *found_entry, *new_entry; + int r, gsi; + + mutex_lock(kvm-lock); + /* Find whether we need a update or a new entry */ + found_entry = kvm_find_gsi_route_entry(kvm, entry-gsi); + if (found_entry) + *found_entry = *entry; + else 
{ + gsi = find_first_zero_bit(kvm-gsi_route_bitmap, + KVM_NR_GSI_ROUTE_ENTRIES); + if (gsi = KVM_NR_GSI_ROUTE_ENTRIES) { + r = -ENOSPC; + goto out; + } + __set_bit(gsi, kvm-gsi_route_bitmap); + entry-gsi = gsi | KVM_GSI_ROUTE_MASK; +
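The allocation scheme described in the commit message (up to 256 routing entries, with the GSI number chosen by the kernel from a bitmap and tagged with a mask) can be sketched in userspace as follows. All demo_* names are hypothetical, and DEMO_ROUTE_FLAG is an arbitrary tag bit chosen for the sketch, not the value of KVM_GSI_ROUTE_MASK.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_NR_ROUTES   256          /* the 256-entry limit from the patch */
#define DEMO_ROUTE_FLAG  0x1000000u   /* hypothetical tag bit for routed GSIs */

struct demo_routes {
    uint8_t used[DEMO_NR_ROUTES / 8]; /* one bit per routing entry */
};

/* Claims the lowest free slot; returns the tagged GSI, or -ENOSPC if full. */
static long demo_alloc_gsi(struct demo_routes *r)
{
    for (unsigned i = 0; i < DEMO_NR_ROUTES; i++) {
        if (!(r->used[i / 8] & (1u << (i % 8)))) {
            r->used[i / 8] |= (uint8_t)(1u << (i % 8));
            return (long)(i | DEMO_ROUTE_FLAG);
        }
    }
    return -ENOSPC;
}

/* Releases a slot previously returned by demo_alloc_gsi(). */
static void demo_free_gsi(struct demo_routes *r, long gsi)
{
    unsigned i = (unsigned)gsi & (DEMO_NR_ROUTES - 1);

    r->used[i / 8] &= (uint8_t)~(1u << (i % 8));
}
```

Because the lowest free bit is always chosen, a freed entry is reused by the next request, and a 257th request fails with -ENOSPC as in the patch.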
Re: [PATCH] KVM: MMU: Segregate mmu pages created with different cr4.pge settings
On Wed, Jan 07, 2009 at 12:19:26PM +0200, Avi Kivity wrote:
> Alexander Graf wrote:
>> Using this patch it works. But if I read it correctly, that doesn't actually fix anything but only treats NPT/EPT special, which it shouldn't, should it?
>
> The patch doesn't fix the bug but is nevertheless correct. cr4.pge only matters to the mmu if using the shadow mmu; with tdp it only wastes memory (and exposes the bug which you encountered). So, wrt to the bug you saw, it's a workaround, but it's also a correct fix for another bug.
>
>> Maybe this actually even breaks EPT?
>
> It shouldn't.
>
>> I remember having seen a lot of CR4 hacks in svm.c when npt is enabled. Maybe that is related?
>
> No. cr4 controls the guest mmu, but with npt the guest mmu is completely virtualized, so we need to ignore those bits.

Let me shoot at one direction: a shadow page with PGE bit in either state is created. Later that shadow page is nuked (via mmu notifiers, for example). Then set_cr4 changes base_role.pge to a different value, and a fault creates a new shadow page and instantiates that in the tree.

Perhaps a svm_flush_tlb is required in such case, when updating a previously valid pagetable entry? Joerg?
[PATCH 0/6][v3] Userspace support for MSI
Update from v2: Change API to gsi_route.
[PATCH 1/6] kvm: Replace force type convert with container_of()
Signed-off-by: Sheng Yang sh...@linux.intel.com --- qemu/hw/device-assignment.c | 20 1 files changed, 12 insertions(+), 8 deletions(-) diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c index d5eb7b2..f357d17 100644 --- a/qemu/hw/device-assignment.c +++ b/qemu/hw/device-assignment.c @@ -144,7 +144,7 @@ static uint32_t assigned_dev_ioport_readl(void *opaque, uint32_t addr) static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num, uint32_t e_phys, uint32_t e_size, int type) { -AssignedDevice *r_dev = (AssignedDevice *) pci_dev; +AssignedDevice *r_dev = container_of(pci_dev, AssignedDevice, dev); AssignedDevRegion *region = r_dev-v_addrs[region_num]; uint32_t old_ephys = region-e_physbase; uint32_t old_esize = region-e_size; @@ -178,7 +178,7 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num, static void assigned_dev_ioport_map(PCIDevice *pci_dev, int region_num, uint32_t addr, uint32_t size, int type) { -AssignedDevice *r_dev = (AssignedDevice *) pci_dev; +AssignedDevice *r_dev = container_of(pci_dev, AssignedDevice, dev); AssignedDevRegion *region = r_dev-v_addrs[region_num]; int first_map = (region-e_size == 0); CPUState *env; @@ -227,6 +227,7 @@ static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address, { int fd; ssize_t ret; +AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev); DEBUG((%x.%x): address=%04x val=0x%08x len=%d\n, ((d-devfn 3) 0x1F), (d-devfn 0x7), @@ -248,7 +249,7 @@ static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address, ((d-devfn 3) 0x1F), (d-devfn 0x7), (uint16_t) address, val, len); -fd = ((AssignedDevice *)d)-real_device.config_fd; +fd = pci_dev-real_device.config_fd; again: ret = pwrite(fd, val, len, address); @@ -269,6 +270,7 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address, uint32_t val = 0; int fd; ssize_t ret; +AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev); if ((address = 0x10 address = 0x24) || 
address == 0x34 || address == 0x3c || address == 0x3d) { @@ -282,7 +284,7 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address, if (address == 0xFC) goto do_log; -fd = ((AssignedDevice *)d)-real_device.config_fd; +fd = pci_dev-real_device.config_fd; again: ret = pread(fd, val, len, address); @@ -539,16 +541,18 @@ struct PCIDevice *init_assigned_device(AssignedDevInfo *adev, PCIBus *bus) { int r; AssignedDevice *dev; +PCIDevice *pci_dev; uint8_t e_device, e_intx; struct kvm_assigned_pci_dev assigned_dev_data; DEBUG(Registering real physical device %s (bus=%x dev=%x func=%x)\n, adev-name, adev-bus, adev-dev, adev-func); -dev = (AssignedDevice *) -pci_register_device(bus, adev-name, sizeof(AssignedDevice), --1, assigned_dev_pci_read_config, -assigned_dev_pci_write_config); +pci_dev = pci_register_device(bus, adev-name, + sizeof(AssignedDevice), -1, assigned_dev_pci_read_config, + assigned_dev_pci_write_config); +dev = container_of(pci_dev, AssignedDevice, dev); + if (NULL == dev) { fprintf(stderr, %s: Error: Couldn't register real device %s\n, __func__, adev-name); -- 1.5.4.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
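The reason for preferring container_of() over a plain cast is that the cast is only valid while the embedded structure happens to sit at offset 0. A minimal sketch follows, using a simplified offsetof-based macro and hypothetical demo_* types; the real AssignedDevice and PCIDevice layouts are not reproduced here.

```c
#include <stddef.h>

/* A simplified, portable variant of the macro, without extra type checks. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_pci_device {
    int devfn;
};

struct demo_assigned_device {
    int config_fd;                /* deliberately placed before the member */
    struct demo_pci_device dev;   /* embedded object, not at offset 0 */
};

/* Recover the outer structure from a pointer to its embedded member. */
static struct demo_assigned_device *demo_from_pci(struct demo_pci_device *d)
{
    return demo_container_of(d, struct demo_assigned_device, dev);
}
```

A direct cast of &a.dev would point into the middle of the outer structure here, while the macro subtracts the member offset and returns the correct address. The macro does no NULL handling, so a pointer that may be NULL is best checked before the conversion.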
[PATCH 5/6] Support for device capability
This framework can be easily extended to support device capability, like MSI/MSI-x. Signed-off-by: Sheng Yang sh...@linux.intel.com --- qemu/hw/pci.c | 85 + qemu/hw/pci.h | 30 2 files changed, 115 insertions(+), 0 deletions(-) diff --git a/qemu/hw/pci.c b/qemu/hw/pci.c index 8589dfa..d755516 100644 --- a/qemu/hw/pci.c +++ b/qemu/hw/pci.c @@ -351,11 +351,65 @@ static void pci_update_mappings(PCIDevice *d) } } +int pci_access_cap_config(PCIDevice *pci_dev, uint32_t address, int len) +{ +if (pci_dev-cap.supported address = pci_dev-cap.start +(address + len) pci_dev-cap.start + pci_dev-cap.length) +return 1; +return 0; +} + +uint32_t pci_default_cap_read_config(PCIDevice *pci_dev, + uint32_t address, int len) +{ +uint32_t val = 0; + +if (pci_access_cap_config(pci_dev, address, len)) { +switch(len) { +default: +case 4: +if (address pci_dev-cap.start + pci_dev-cap.length - 4) { +val = le32_to_cpu(*(uint32_t *)(pci_dev-cap.config ++ address - pci_dev-cap.start)); +break; +} +/* fall through */ +case 2: +if (address pci_dev-cap.start + pci_dev-cap.length - 2) { +val = le16_to_cpu(*(uint16_t *)(pci_dev-cap.config ++ address - pci_dev-cap.start)); +break; +} +/* fall through */ +case 1: +val = pci_dev-cap.config[address - pci_dev-cap.start]; +break; +} +} +return val; +} + +void pci_default_cap_write_config(PCIDevice *pci_dev, + uint32_t address, uint32_t val, int len) +{ +if (pci_access_cap_config(pci_dev, address, len)) { +int i; +for (i = 0; i len; i++) { +pci_dev-cap.config[address + i - pci_dev-cap.start] = val; +val = 8; +} +return; +} +} + uint32_t pci_default_read_config(PCIDevice *d, uint32_t address, int len) { uint32_t val; +if (pci_access_cap_config(d, address, len)) +return d-cap.config_read(d, address, len); + switch(len) { default: case 4: @@ -409,6 +463,11 @@ void pci_default_write_config(PCIDevice *d, return; } default_config: +if (pci_access_cap_config(d, address, len)) { +d-cap.config_write(d, address, val, len); +return; +} + /* not efficient, but simple 
*/ addr = address; for(i = 0; i len; i++) { @@ -828,3 +887,29 @@ PCIBus *pci_bridge_init(PCIBus *bus, int devfn, uint32_t id, s-bus = pci_register_secondary_bus(s-dev, map_irq); return s-bus; } + +void pci_enable_capability_support(PCIDevice *pci_dev, + uint32_t config_start, + PCICapConfigReadFunc *config_read, + PCICapConfigWriteFunc *config_write, + PCICapConfigInitFunc *config_init) +{ +if (!pci_dev) +return; + +if (config_start = 0x40 config_start 0xff) +pci_dev-cap.start = config_start; +else +pci_dev-cap.start = PCI_CAPABILITY_CONFIG_DEFAULT_START_ADDR; +if (config_read) +pci_dev-cap.config_read = config_read; +else +pci_dev-cap.config_read = pci_default_cap_read_config; +if (config_write) +pci_dev-cap.config_write = config_write; +else +pci_dev-cap.config_write = pci_default_cap_write_config; +pci_dev-cap.supported = 1; +pci_dev-config[0x34] = pci_dev-cap.start; +config_init(pci_dev); +} diff --git a/qemu/hw/pci.h b/qemu/hw/pci.h index 1f33819..f2a622c 100644 --- a/qemu/hw/pci.h +++ b/qemu/hw/pci.h @@ -28,6 +28,12 @@ typedef void PCIMapIORegionFunc(PCIDevice *pci_dev, int region_num, uint32_t addr, uint32_t size, int type); typedef int PCIUnregisterFunc(PCIDevice *pci_dev); +typedef void PCICapConfigWriteFunc(PCIDevice *pci_dev, + uint32_t address, uint32_t val, int len); +typedef uint32_t PCICapConfigReadFunc(PCIDevice *pci_dev, + uint32_t address, int len); +typedef void PCICapConfigInitFunc(PCIDevice *pci_dev); + #define PCI_ADDRESS_SPACE_MEM 0x00 #define PCI_ADDRESS_SPACE_IO 0x01 #define PCI_ADDRESS_SPACE_MEM_PREFETCH 0x08 @@ -78,6 +84,10 @@ typedef struct PCIIORegion { #define PCI_COMMAND_RESERVED_MASK_HI (PCI_COMMAND_RESERVED 8) +#define PCI_CAPABILITY_CONFIG_MAX_LENGTH 0x60 +#define PCI_CAPABILITY_CONFIG_DEFAULT_START_ADDR 0x40 +#define PCI_CAPABILITY_CONFIG_MSI_LENGTH 0x10 + struct PCIDevice { /* PCI config space */ uint8_t config[256]; @@ -100,6 +110,15 @@ struct PCIDevice { /*
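The core of the capability framework is a range check on config-space accesses plus byte-wise, little-endian storage of written values. The sketch below illustrates both with hypothetical demo_* names. The comparison operators did not survive in the archived patch, so the inclusive upper bound used here is an assumption of this sketch and may differ from what the patch does.

```c
#include <stdint.h>

/* Hypothetical description of a capability window in PCI config space. */
struct demo_cap {
    int supported;
    uint32_t start;    /* first config-space offset of the window */
    uint32_t length;   /* window size in bytes */
};

/* Returns 1 when [address, address + len) lies entirely in the window. */
static int demo_in_cap_window(const struct demo_cap *cap,
                              uint32_t address, int len)
{
    if (!cap->supported || len <= 0)
        return 0;
    return address >= cap->start &&
           address + (uint32_t)len <= cap->start + cap->length;
}

/* Stores the low len bytes of val at window[offset], least significant
 * byte first, matching the byte order of PCI config space. */
static void demo_cap_write(uint8_t *window, uint32_t offset,
                           uint32_t val, int len)
{
    for (int i = 0; i < len; i++) {
        window[offset + i] = (uint8_t)(val & 0xff);
        val >>= 8;
    }
}
```

Accesses that straddle the window boundary are treated as outside it, so they fall back to the ordinary config-space handlers.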
[PATCH 6/6] kvm: expose MSI capability to guest
Signed-off-by: Sheng Yang sh...@linux.intel.com --- qemu/hw/device-assignment.c | 111 --- qemu/hw/device-assignment.h |7 +++ 2 files changed, 111 insertions(+), 7 deletions(-) diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c index 169357f..4c08b00 100644 --- a/qemu/hw/device-assignment.c +++ b/qemu/hw/device-assignment.c @@ -268,7 +268,8 @@ static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address, } if ((address = 0x10 address = 0x24) || address == 0x34 || -address == 0x3c || address == 0x3d) { +address == 0x3c || address == 0x3d || +pci_access_cap_config(d, address, len)) { /* used for update-mappings (BAR emulation) */ pci_default_write_config(d, address, val, len); return; @@ -302,7 +303,8 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address, AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev); if ((address = 0x10 address = 0x24) || address == 0x34 || -address == 0x3c || address == 0x3d) { +address == 0x3c || address == 0x3d || +pci_access_cap_config(d, address, len)) { val = pci_default_read_config(d, address, len); DEBUG((%x.%x): address=%04x val=0x%08x len=%d\n, (d-devfn 3) 0x1F, (d-devfn 0x7), address, val, len); @@ -331,11 +333,13 @@ do_log: DEBUG((%x.%x): address=%04x val=0x%08x len=%d\n, (d-devfn 3) 0x1F, (d-devfn 0x7), address, val, len); -/* kill the special capabilities */ -if (address == 4 len == 4) -val = ~0x10; -else if (address == 6) -val = ~0x10; +if (!pci_dev-cap.available) { +/* kill the special capabilities */ +if (address == 4 len == 4) +val = ~0x10; +else if (address == 6) +val = ~0x10; +} return val; } @@ -566,6 +570,95 @@ void assigned_dev_update_irq(PCIDevice *d) } } +#if defined(KVM_CAP_DEVICE_MSI) defined (KVM_CAP_GSI_ROUTE) +static void assigned_dev_update_msi(PCIDevice *pci_dev, unsigned int ctrl_pos) +{ +struct kvm_assigned_irq assigned_irq_data; +struct kvm_gsi_route_guest gsi_route; +struct kvm_gsi_route_entry_guest gsi_entry[1]; +AssignedDevice *assigned_dev 
= container_of(pci_dev, AssignedDevice, dev); +uint8_t ctrl_byte = pci_dev-cap.config[ctrl_pos]; + +memset(assigned_irq_data, 0, sizeof assigned_irq_data); +assigned_irq_data.assigned_dev_id = +calc_assigned_dev_id(assigned_dev-h_busnr, +(uint8_t)assigned_dev-h_devfn); + +if (ctrl_byte PCI_MSI_FLAGS_ENABLE) { + gsi_route.entries_nr = 1; +gsi_entry[0].msi.addr_lo = *(uint32_t *)(pci_dev-cap.config + +PCI_MSI_ADDRESS_LO); +gsi_entry[0].msi.data = *(uint16_t *)(pci_dev-cap.config + + PCI_MSI_DATA_32); +gsi_entry[0].type = KVM_GSI_ROUTE_MSI; +gsi_route.entries = gsi_entry; +if (kvm_request_gsi_route(kvm_context, gsi_route) 0) { +perror(assigned_dev_enable_msi: kvm_request_gsi_route); +assigned_dev-cap.state = ~ASSIGNED_DEVICE_MSI_ENABLED; +return; +} +assigned_irq_data.guest_irq = gsi_entry[0].gsi; +assigned_irq_data.flags = KVM_DEV_IRQ_ASSIGN_ENABLE_MSI; +} else + assigned_irq_data.guest_irq = assigned_dev-girq; + +if (kvm_assign_irq(kvm_context, assigned_irq_data) 0) +perror(assigned_dev_enable_msi); +if (assigned_irq_data.flags KVM_DEV_IRQ_ASSIGN_ENABLE_MSI) { +assigned_dev-cap.state |= ASSIGNED_DEVICE_MSI_ENABLED; +pci_dev-cap.config[ctrl_pos] |= PCI_MSI_FLAGS_ENABLE; +} else { +assigned_dev-cap.state = ~ASSIGNED_DEVICE_MSI_ENABLED; +pci_dev-cap.config[ctrl_pos] = ~PCI_MSI_FLAGS_ENABLE; +} +} +#endif + +void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t address, + uint32_t val, int len) +{ +AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice, dev); +unsigned int pos = pci_dev-cap.start, ctrl_pos; + +pci_default_cap_write_config(pci_dev, address, val, len); +#if defined(KVM_CAP_DEVICE_MSI) defined (KVM_CAP_GSI_ROUTE) +if (assigned_dev-cap.available ASSIGNED_DEVICE_CAP_MSI) { +ctrl_pos = pos + PCI_MSI_FLAGS; +if (address = ctrl_pos address + len ctrl_pos) +assigned_dev_update_msi(pci_dev, ctrl_pos - pci_dev-cap.start); +pos += PCI_CAPABILITY_CONFIG_MSI_LENGTH; +} +#endif +return; +} + +static void 
assigned_device_pci_cap_init(PCIDevice *pci_dev) +{ +AssignedDevice *dev = container_of(pci_dev, AssignedDevice, dev); +int next_cap_pt; +struct pci_access *pacc; +int h_bus, h_dev, h_func; + +pci_dev-cap.length = 0; +h_bus = dev-h_busnr; +h_dev =
[PATCH 2/6] Make device assignment depend on libpci
Which is used later for capability detection.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 qemu/Makefile.target |    1 +
 qemu/configure       |   20 ++++++++++++++++++++
 2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/qemu/Makefile.target b/qemu/Makefile.target
index f58015b..a58f31d 100644
--- a/qemu/Makefile.target
+++ b/qemu/Makefile.target
@@ -696,6 +696,7 @@ OBJS += device-hotplug.o
 
 ifeq ($(USE_KVM_DEVICE_ASSIGNMENT), 1)
 OBJS+= device-assignment.o
+LIBS+=-lpci
 endif
 
 ifeq ($(TARGET_BASE_ARCH), i386)
diff --git a/qemu/configure b/qemu/configure
index 6eb12ae..f5d3f89 100755
--- a/qemu/configure
+++ b/qemu/configure
@@ -780,6 +780,26 @@ EOF
 fi
 fi
 
+# libpci probe for kvm_cap_device_assignment
+if test $kvm_cap_device_assignment = "yes" ; then
+cat > $TMPC << EOF
+#include <pci/pci.h>
+#ifndef PCI_VENDOR_ID
+#error NO LIBPCI
+#endif
+int main(void) { return 0; }
+EOF
+    if $cc $ARCH_CFLAGS -o $TMPE ${OS_CFLAGS} $TMPC 2> /dev/null ; then
+        :
+    else
+        echo
+        echo "Error: libpci check failed"
+        echo "Disable KVM Device Assignment capability."
+        echo
+        kvm_cap_device_assignment="no"
+    fi
+fi
+
 ##
 # zlib check
 
-- 
1.5.4.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/6] kvm: ioctl for gsi_route
Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 libkvm/libkvm.c |   27 +++++++++++++++++++++++++++
 libkvm/libkvm.h |    8 ++++++++
 2 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
index 0408fdb..6d53f38 100644
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -1164,3 +1164,30 @@ int kvm_reinject_control(kvm_context_t kvm, int pit_reinject)
 #endif
 	return -ENOSYS;
 }
+
+#ifdef KVM_CAP_GSI_ROUTE
+int kvm_request_gsi_route(kvm_context_t kvm,
+			  struct kvm_gsi_route_guest *route)
+{
+	int ret;
+
+	ret = ioctl(kvm->vm_fd, KVM_REQUEST_GSI_ROUTE, route);
+	if (ret < 0)
+		return -errno;
+
+	return ret;
+}
+
+int kvm_free_gsi_route(kvm_context_t kvm,
+		       struct kvm_gsi_route_guest *route)
+{
+	int ret;
+
+	ret = ioctl(kvm->vm_fd, KVM_FREE_GSI_ROUTE, route);
+	if (ret < 0)
+		return -errno;
+
+	return ret;
+}
+
+#endif
diff --git a/libkvm/libkvm.h b/libkvm/libkvm.h
index ee1ba68..2bfcfe3 100644
--- a/libkvm/libkvm.h
+++ b/libkvm/libkvm.h
@@ -720,4 +720,12 @@ int kvm_assign_irq(kvm_context_t kvm,
  */
 int kvm_destroy_memory_region_works(kvm_context_t kvm);
 #endif
+
+#ifdef KVM_CAP_GSI_ROUTE
+int kvm_request_gsi_route(kvm_context_t kvm,
+			  struct kvm_gsi_route_guest *route);
+int kvm_free_gsi_route(kvm_context_t kvm,
+		       struct kvm_gsi_route_guest *route);
+#endif
+
 #endif
-- 
1.5.4.5
[PATCH 4/6] Figure out device capability
Try to figure out the device capability in update_dev_cap(). For now we only
care about the MSI capability.

The function pci_find_cap_offset was originally written by Allen for Xen.

Note that the function needs root privileges to work, and that it depends on
libpci.

Signed-off-by: Allen Kay allen.m@intel.com
Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 qemu/hw/device-assignment.c |   29 +++++++++++++++++++++++++++++
 qemu/hw/device-assignment.h |    1 +
 2 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
index f357d17..169357f 100644
--- a/qemu/hw/device-assignment.c
+++ b/qemu/hw/device-assignment.c
@@ -222,6 +222,35 @@ static void assigned_dev_ioport_map(PCIDevice *pci_dev, int region_num,
                           (r_dev->v_addrs + region_num));
 }
 
+static uint8_t pci_find_cap_offset(struct pci_dev *pci_dev, uint8_t cap)
+{
+    int id;
+    int max_cap = 48;
+    int pos = PCI_CAPABILITY_LIST;
+    int status;
+
+    status = pci_read_byte(pci_dev, PCI_STATUS);
+    if ((status & PCI_STATUS_CAP_LIST) == 0)
+        return 0;
+
+    while (max_cap--) {
+        pos = pci_read_byte(pci_dev, pos);
+        if (pos < 0x40)
+            break;
+
+        pos &= ~3;
+        id = pci_read_byte(pci_dev, pos + PCI_CAP_LIST_ID);
+
+        if (id == 0xff)
+            break;
+        if (id == cap)
+            return pos;
+
+        pos += PCI_CAP_LIST_NEXT;
+    }
+    return 0;
+}
+
 static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address,
                                           uint32_t val, int len)
 {
diff --git a/qemu/hw/device-assignment.h b/qemu/hw/device-assignment.h
index a565948..2d83566 100644
--- a/qemu/hw/device-assignment.h
+++ b/qemu/hw/device-assignment.h
@@ -29,6 +29,7 @@
 #define __DEVICE_ASSIGNMENT_H__
 
 #include <sys/mman.h>
+#include <pci/pci.h>
 #include "qemu-common.h"
 #include "sys-queue.h"
 #include "pci.h"
-- 
1.5.4.5
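For readers without libpci at hand, the capability-list walk can be tried against a plain 256-byte copy of config space. The sketch below is only an illustration under that assumption: `find_cap_offset` and the `CFG_*`/`CAP_*` names are made up for this example, and array reads stand in for pci_read_byte().

```c
#include <assert.h>
#include <stdint.h>

/* Config-space offsets used by the walk; the names are local to this sketch. */
#define CFG_STATUS           0x06
#define CFG_STATUS_CAP_LIST  0x10   /* "capability list present" status bit */
#define CFG_CAPABILITY_LIST  0x34   /* pointer to the first capability */
#define CAP_ID_OFFSET        0      /* capability ID byte within an entry */
#define CAP_NEXT_OFFSET      1      /* next-pointer byte within an entry */

/*
 * Walk the capability linked list in a snapshot of config space and return
 * the offset of the first entry whose ID equals 'cap', or 0 if none exists.
 */
static uint8_t find_cap_offset(const uint8_t cfg[256], uint8_t cap)
{
    int max_cap = 48;               /* bounds the walk if the list loops */
    int pos = CFG_CAPABILITY_LIST;

    assert(cfg);
    if ((cfg[CFG_STATUS] & CFG_STATUS_CAP_LIST) == 0)
        return 0;

    while (max_cap--) {
        pos = cfg[pos];             /* follow the pointer stored at 'pos' */
        if (pos < 0x40)             /* capabilities live above the header */
            break;

        pos &= ~3;                  /* entries are dword aligned */
        if (cfg[pos + CAP_ID_OFFSET] == 0xff)
            break;
        if (cfg[pos + CAP_ID_OFFSET] == cap)
            return (uint8_t)pos;

        pos += CAP_NEXT_OFFSET;     /* next iteration reads the next pointer */
    }
    return 0;
}
```

The loop counter matters more than it looks: a device with a malformed list that points back at itself would otherwise spin forever.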
Re: [PATCH] CPUID Masking MSRs
Alexander Graf wrote:
> Well if I could take the FlexMigration design into account when putting
> variables in the vcpu context, that'd be great. But I can't seem to find it
> in the Intel documentation, so I'll leave it for now.

Not real documentation (tell me if you find some!), but this code shows
almost everything you probably need:
http://xenbits.xensource.com/xen-unstable.hg?rev/be20b11656bb

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 277-84917
to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München
Geschäftsführer: Jochen Polster; Thomas M. McCoy; Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis München
Registergericht München, HRB Nr. 43632
[PATCH 04/10] KVM: Using ioapic_irqchip() macro for kvm_set_irq
Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 virt/kvm/irq_comm.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 7460e7f..f5e2d2c 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -39,7 +39,7 @@ void kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level)
 	 * IOAPIC.  So set the bit in both. The guest will ignore
 	 * writes to the unused one.
 	 */
-	kvm_ioapic_set_irq(kvm->arch.vioapic, irq, !!(*irq_state));
+	kvm_ioapic_set_irq(ioapic_irqchip(kvm), irq, !!(*irq_state));
 #ifdef CONFIG_X86
 	kvm_pic_set_irq(pic_irqchip(kvm), irq, !!(*irq_state));
 #endif
-- 
1.5.4.5
[PATCH 02/10] KVM: Using gsi route for MSI device assignment
Convert MSI userspace interface to support gsi_msg mapping(and nobody should be the user of the old interface...). Signed-off-by: Sheng Yang sh...@linux.intel.com --- include/linux/kvm_host.h |1 - virt/kvm/kvm_main.c | 79 ++ 2 files changed, 45 insertions(+), 35 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 6a00201..eab9588 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -316,7 +316,6 @@ struct kvm_assigned_dev_kernel { int host_irq; bool host_irq_disabled; int guest_irq; - struct msi_msg guest_msi; #define KVM_ASSIGNED_DEV_GUEST_INTX(1 0) #define KVM_ASSIGNED_DEV_GUEST_MSI (1 1) #define KVM_ASSIGNED_DEV_HOST_INTX (1 8) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index bc1a27b..0a59245 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -92,44 +92,56 @@ static void assigned_device_msi_dispatch(struct kvm_assigned_dev_kernel *dev) int vcpu_id; struct kvm_vcpu *vcpu; struct kvm_ioapic *ioapic = ioapic_irqchip(dev-kvm); - int dest_id = (dev-guest_msi.address_lo MSI_ADDR_DEST_ID_MASK) -MSI_ADDR_DEST_ID_SHIFT; - int vector = (dev-guest_msi.data MSI_DATA_VECTOR_MASK) -MSI_DATA_VECTOR_SHIFT; - int dest_mode = test_bit(MSI_ADDR_DEST_MODE_SHIFT, - (unsigned long *)dev-guest_msi.address_lo); - int trig_mode = test_bit(MSI_DATA_TRIGGER_SHIFT, - (unsigned long *)dev-guest_msi.data); - int delivery_mode = test_bit(MSI_DATA_DELIVERY_MODE_SHIFT, - (unsigned long *)dev-guest_msi.data); + struct kvm_gsi_route_entry *gsi_entry; + int dest_id, vector, dest_mode, trig_mode, delivery_mode; u32 deliver_bitmask; BUG_ON(!ioapic); - deliver_bitmask = kvm_ioapic_get_delivery_bitmask(ioapic, + gsi_entry = kvm_find_gsi_route_entry(dev-kvm, dev-guest_irq); + if (!gsi_entry) { + printk(KERN_WARNING kvm: fail to find correlated gsi entry\n); + return; + } + + if (gsi_entry-type KVM_GSI_ROUTE_MSI) { + dest_id = (gsi_entry-msi.address_lo MSI_ADDR_DEST_ID_MASK) +MSI_ADDR_DEST_ID_SHIFT; + vector = (gsi_entry-msi.data 
MSI_DATA_VECTOR_MASK) +MSI_DATA_VECTOR_SHIFT; + dest_mode = test_bit(MSI_ADDR_DEST_MODE_SHIFT, + (unsigned long *)gsi_entry-msi.address_lo); + trig_mode = test_bit(MSI_DATA_TRIGGER_SHIFT, + (unsigned long *)gsi_entry-msi.data); + delivery_mode = test_bit(MSI_DATA_DELIVERY_MODE_SHIFT, + (unsigned long *)gsi_entry-msi.data); + deliver_bitmask = kvm_ioapic_get_delivery_bitmask(ioapic, dest_id, dest_mode); - /* IOAPIC delivery mode value is the same as MSI here */ - switch (delivery_mode) { - case IOAPIC_LOWEST_PRIORITY: - vcpu = kvm_get_lowest_prio_vcpu(ioapic-kvm, vector, - deliver_bitmask); - if (vcpu != NULL) - kvm_apic_set_irq(vcpu, vector, trig_mode); - else - printk(KERN_INFO kvm: null lowest priority vcpu!\n); - break; - case IOAPIC_FIXED: - for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) { - if (!(deliver_bitmask (1 vcpu_id))) - continue; - deliver_bitmask = ~(1 vcpu_id); - vcpu = ioapic-kvm-vcpus[vcpu_id]; - if (vcpu) + /* IOAPIC delivery mode value is the same as MSI here */ + switch (delivery_mode) { + case IOAPIC_LOWEST_PRIORITY: + vcpu = kvm_get_lowest_prio_vcpu(ioapic-kvm, vector, + deliver_bitmask); + if (vcpu != NULL) kvm_apic_set_irq(vcpu, vector, trig_mode); + else + printk(KERN_INFO + kvm: null lowest priority vcpu!\n); + break; + case IOAPIC_FIXED: + for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) { + if (!(deliver_bitmask (1 vcpu_id))) + continue; + deliver_bitmask = ~(1 vcpu_id); + vcpu = ioapic-kvm-vcpus[vcpu_id]; + if (vcpu) + kvm_apic_set_irq(vcpu, vector, +
[PATCH 09/10] KVM: Update intr delivery func to accept unsigned long* bitmap
Would be used with bit ops, and would be easily extended if KVM_MAX_VCPUS is increased. Signed-off-by: Sheng Yang sh...@linux.intel.com --- arch/x86/kvm/lapic.c |8 include/linux/kvm_host.h |2 +- virt/kvm/ioapic.c|4 ++-- virt/kvm/ioapic.h|4 ++-- virt/kvm/irq_comm.c |6 +++--- 5 files changed, 12 insertions(+), 12 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index afac68c..c1e4935 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -403,7 +403,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode, } static struct kvm_lapic *kvm_apic_round_robin(struct kvm *kvm, u8 vector, - unsigned long bitmap) + unsigned long *bitmap) { int last; int next; @@ -415,7 +415,7 @@ static struct kvm_lapic *kvm_apic_round_robin(struct kvm *kvm, u8 vector, do { if (++next == KVM_MAX_VCPUS) next = 0; - if (kvm-vcpus[next] == NULL || !test_bit(next, bitmap)) + if (kvm-vcpus[next] == NULL || !test_bit(next, bitmap)) continue; apic = kvm-vcpus[next]-arch.apic; if (apic apic_enabled(apic)) @@ -431,7 +431,7 @@ static struct kvm_lapic *kvm_apic_round_robin(struct kvm *kvm, u8 vector, } struct kvm_vcpu *kvm_get_lowest_prio_vcpu(struct kvm *kvm, u8 vector, - unsigned long bitmap) + unsigned long *bitmap) { struct kvm_lapic *apic; @@ -502,7 +502,7 @@ static void apic_send_ipi(struct kvm_lapic *apic) } if (delivery_mode == APIC_DM_LOWEST) { - target = kvm_get_lowest_prio_vcpu(vcpu-kvm, vector, lpr_map); + target = kvm_get_lowest_prio_vcpu(vcpu-kvm, vector, lpr_map); if (target != NULL) __apic_accept_irq(target-arch.apic, delivery_mode, vector, level, trig_mode); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 2736dbf..ed1c6bb 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -353,7 +353,7 @@ struct kvm_gsi_route_entry { void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic, union kvm_ioapic_redirect_entry *entry, - u32 *deliver_bitmask); + unsigned long *deliver_bitmask); void 
kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 gsi, int level); void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi); void kvm_register_irq_ack_notifier(struct kvm *kvm, diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index aa4e8d8..0dcb0da 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -159,7 +159,7 @@ static void ioapic_inj_nmi(struct kvm_vcpu *vcpu) } void kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest, -u8 dest_mode, u32 *mask) +u8 dest_mode, unsigned long *mask) { int i; struct kvm *kvm = ioapic-kvm; @@ -200,7 +200,7 @@ void kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest, static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq) { union kvm_ioapic_redirect_entry entry = ioapic-redirtbl[irq]; - u32 deliver_bitmask; + unsigned long deliver_bitmask; struct kvm_vcpu *vcpu; int vcpu_id, r = 0; diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h index e107dbb..c418a7f 100644 --- a/virt/kvm/ioapic.h +++ b/virt/kvm/ioapic.h @@ -65,12 +65,12 @@ static inline struct kvm_ioapic *ioapic_irqchip(struct kvm *kvm) } struct kvm_vcpu *kvm_get_lowest_prio_vcpu(struct kvm *kvm, u8 vector, - unsigned long bitmap); + unsigned long *bitmap); void kvm_ioapic_update_eoi(struct kvm *kvm, int vector, int trigger_mode); int kvm_ioapic_init(struct kvm *kvm); void kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level); void kvm_ioapic_reset(struct kvm_ioapic *ioapic); void kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest, -u8 dest_mode, u32 *mask); +u8 dest_mode, unsigned long *mask); #endif diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index d97cdd6..baee4b7 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -30,7 +30,7 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic, union kvm_ioapic_redirect_entry *entry, - u32 *deliver_bitmask) + unsigned long *deliver_bitmask) { struct kvm_vcpu *vcpu; @@ -40,7 +40,7 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic 
*ioapic, switch
[PATCH 03/10] KVM: Improve MSI dispatch function
Prepare to merge with kvm_set_irq().

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 virt/kvm/kvm_main.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0a59245..717e1b0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -87,7 +87,7 @@ static bool kvm_rebooting;
 
 #ifdef KVM_CAP_DEVICE_ASSIGNMENT
 #ifdef CONFIG_X86
-static void assigned_device_msi_dispatch(struct kvm_assigned_dev_kernel *dev)
+static void assigned_device_msi_dispatch(struct kvm_assigned_dev_kernel *dev, u32 gsi)
 {
 	int vcpu_id;
 	struct kvm_vcpu *vcpu;
@@ -98,7 +98,7 @@ static void assigned_device_msi_dispatch(struct kvm_assigned_dev_kernel *dev)
 
 	BUG_ON(!ioapic);
 
-	gsi_entry = kvm_find_gsi_route_entry(dev->kvm, dev->guest_irq);
+	gsi_entry = kvm_find_gsi_route_entry(dev->kvm, gsi);
 	if (!gsi_entry) {
 		printk(KERN_WARNING "kvm: fail to find correlated gsi entry\n");
 		return;
@@ -145,7 +145,7 @@ static void assigned_device_msi_dispatch(struct kvm_assigned_dev_kernel *dev)
 	}
 }
 #else
-static void assigned_device_msi_dispatch(struct kvm_assigned_dev_kernel *dev) {}
+static void assigned_device_msi_dispatch(struct kvm_assigned_dev_kernel *dev, u32 gsi) {}
 #endif
 
 static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head *head,
@@ -180,7 +180,7 @@ static void kvm_assigned_dev_interrupt_work_handler(struct work_struct *work)
 			    assigned_dev->guest_irq, 1);
 	else if (assigned_dev->irq_requested_type &
 				KVM_ASSIGNED_DEV_GUEST_MSI) {
-		assigned_device_msi_dispatch(assigned_dev);
+		assigned_device_msi_dispatch(assigned_dev, assigned_dev->guest_irq);
 		enable_irq(assigned_dev->host_irq);
 		assigned_dev->host_irq_disabled = false;
 	}
-- 
1.5.4.5
[PATCH 10/10] KVM: bit ops for deliver_bitmap
It's also convenient when we extend KVM supported vcpu number in the future. Signed-off-by: Sheng Yang sh...@linux.intel.com --- arch/x86/kvm/lapic.c |7 --- virt/kvm/ioapic.c| 24 +--- virt/kvm/irq_comm.c | 17 + 3 files changed, 26 insertions(+), 22 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index c1e4935..359e02c 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -477,9 +477,10 @@ static void apic_send_ipi(struct kvm_lapic *apic) struct kvm_vcpu *target; struct kvm_vcpu *vcpu; - unsigned long lpr_map = 0; + DECLARE_BITMAP(lpr_map, KVM_MAX_VCPUS); int i; + bitmap_zero(lpr_map, KVM_MAX_VCPUS); apic_debug(icr_high 0x%x, icr_low 0x%x, short_hand 0x%x, dest 0x%x, trig_mode 0x%x, level 0x%x, dest_mode 0x%x, delivery_mode 0x%x, vector 0x%x\n, @@ -494,7 +495,7 @@ static void apic_send_ipi(struct kvm_lapic *apic) if (vcpu-arch.apic apic_match_dest(vcpu, apic, short_hand, dest, dest_mode)) { if (delivery_mode == APIC_DM_LOWEST) - set_bit(vcpu-vcpu_id, lpr_map); + set_bit(vcpu-vcpu_id, lpr_map); else __apic_accept_irq(vcpu-arch.apic, delivery_mode, vector, level, trig_mode); @@ -502,7 +503,7 @@ static void apic_send_ipi(struct kvm_lapic *apic) } if (delivery_mode == APIC_DM_LOWEST) { - target = kvm_get_lowest_prio_vcpu(vcpu-kvm, vector, lpr_map); + target = kvm_get_lowest_prio_vcpu(vcpu-kvm, vector, lpr_map); if (target != NULL) __apic_accept_irq(target-arch.apic, delivery_mode, vector, level, trig_mode); diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index 0dcb0da..162cbdd 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -200,7 +200,7 @@ void kvm_ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest, static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq) { union kvm_ioapic_redirect_entry entry = ioapic-redirtbl[irq]; - unsigned long deliver_bitmask; + DECLARE_BITMAP(deliver_bitmask, KVM_MAX_VCPUS); struct kvm_vcpu *vcpu; int vcpu_id, r = 0; @@ -210,22 +210,24 @@ static int ioapic_deliver(struct kvm_ioapic 
*ioapic, int irq) entry.fields.delivery_mode, entry.fields.vector, entry.fields.trig_mode); - kvm_get_intr_delivery_bitmask(ioapic, entry, deliver_bitmask); - if (!deliver_bitmask) { - ioapic_debug(no target on destination\n); - return 0; - } + bitmap_zero(deliver_bitmask, KVM_MAX_VCPUS); /* Always delivery PIT interrupt to vcpu 0 */ #ifdef CONFIG_X86 if (irq == 0) - deliver_bitmask = 1 0; + set_bit(0, deliver_bitmask); + else #endif + kvm_get_intr_delivery_bitmask(ioapic, entry, deliver_bitmask); + + if (find_first_bit(deliver_bitmask, KVM_MAX_VCPUS) = KVM_MAX_VCPUS) { + ioapic_debug(no target on destination\n); + return 0; + } - for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) { - if (!(deliver_bitmask (1 vcpu_id))) - continue; - deliver_bitmask = ~(1 vcpu_id); + while ((vcpu_id = find_first_bit(deliver_bitmask, KVM_MAX_VCPUS)) +KVM_MAX_VCPUS) { + clear_bit(vcpu_id, deliver_bitmask); vcpu = ioapic-kvm-vcpus[vcpu_id]; if (vcpu) { if (entry.fields.delivery_mode == diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index baee4b7..bce3cd5 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -41,7 +41,7 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic, case IOAPIC_LOWEST_PRIORITY: vcpu = kvm_get_lowest_prio_vcpu(ioapic-kvm, entry-fields.vector, deliver_bitmask); - *deliver_bitmask = 1 vcpu-vcpu_id; + set_bit(vcpu-vcpu_id, deliver_bitmask); break; case IOAPIC_FIXED: case IOAPIC_NMI: @@ -62,7 +62,7 @@ static void gsi_dispatch(struct kvm *kvm, u32 gsi) struct kvm_ioapic *ioapic = ioapic_irqchip(kvm); struct kvm_gsi_route_entry *gsi_entry; union kvm_ioapic_redirect_entry entry; - unsigned long deliver_bitmask; + DECLARE_BITMAP(deliver_bitmask, KVM_MAX_VCPUS); BUG_ON(!ioapic); @@ -72,6 +72,7 @@ static void gsi_dispatch(struct kvm *kvm, u32 gsi) return; } + bitmap_zero(deliver_bitmask, KVM_MAX_VCPUS); #ifdef CONFIG_X86 if
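The find-first-bit/clear-bit delivery loop introduced by this patch can be tried outside the kernel with a few stand-in helpers. Everything below is an assumption made for illustration: `bm_set`, `bm_clear`, `bm_find_first` and `drain_bitmap` are simplified user-space substitutes for set_bit(), clear_bit() and find_first_bit(), and `MAX_VCPUS` is an arbitrary value chosen so the bitmap spans more than one word.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define MAX_VCPUS        70   /* illustrative stand-in for KVM_MAX_VCPUS */
#define BITS_PER_WORD    (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS(n)  (((n) + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Set one bit in a multi-word bitmap. */
static void bm_set(unsigned long *bm, unsigned bit)
{
    bm[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
}

/* Clear one bit in a multi-word bitmap. */
static void bm_clear(unsigned long *bm, unsigned bit)
{
    bm[bit / BITS_PER_WORD] &= ~(1UL << (bit % BITS_PER_WORD));
}

/* Return the index of the lowest set bit, or nbits if the bitmap is empty. */
static unsigned bm_find_first(const unsigned long *bm, unsigned nbits)
{
    for (unsigned i = 0; i < nbits; i++)
        if (bm[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD)))
            return i;
    return nbits;
}

/*
 * Visit every set bit once, lowest first, clearing each bit as it is
 * handled. The visiting order is recorded in 'order'; the count is returned.
 */
static unsigned drain_bitmap(unsigned long *bm, unsigned nbits,
                             unsigned *order)
{
    unsigned n = 0, bit;

    assert(bm && order);
    while ((bit = bm_find_first(bm, nbits)) < nbits) {
        bm_clear(bm, bit);
        order[n++] = bit;
    }
    return n;
}
```

Unlike the old `deliver_bitmask & (1 << vcpu_id)` form, nothing here depends on the vcpu count fitting into a single word, which is the point the commit message makes about raising KVM_MAX_VCPUS.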
[PATCH 0/10][v4]GSI route layer for MSI/MSI-X
Update from v3:

Addressed Avi's comments: improved struct gsi_route_entry and switched to a
pair of ioctls that handle all entries (including some specific interrupt
routing). For now only MSI/MSI-X is supported.
[PATCH 05/10] KVM: Merge MSI handling to kvm_set_irq
Using kvm_set_irq to handle all interrupt injection. Signed-off-by: Sheng Yang sh...@linux.intel.com --- include/linux/kvm_host.h |2 +- virt/kvm/irq_comm.c | 79 +++-- virt/kvm/kvm_main.c | 79 +++--- 3 files changed, 81 insertions(+), 79 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index eab9588..bfdaab9 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -351,7 +351,7 @@ struct kvm_gsi_route_entry { struct hlist_node link; }; -void kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level); +void kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 gsi, int level); void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi); void kvm_register_irq_ack_notifier(struct kvm *kvm, struct kvm_irq_ack_notifier *kian); diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index f5e2d2c..e9fcd23 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -24,10 +24,81 @@ #include ioapic.h +#ifdef CONFIG_X86 +#include asm/msidef.h +#endif + +static void gsi_dispatch(struct kvm *kvm, u32 gsi) +{ + int vcpu_id; + struct kvm_vcpu *vcpu; + struct kvm_ioapic *ioapic = ioapic_irqchip(kvm); + struct kvm_gsi_route_entry *gsi_entry; + int dest_id, vector, dest_mode, trig_mode, delivery_mode; + u32 deliver_bitmask; + + BUG_ON(!ioapic); + + gsi_entry = kvm_find_gsi_route_entry(kvm, gsi); + if (!gsi_entry) { + printk(KERN_WARNING kvm: fail to find correlated gsi entry\n); + return; + } + +#ifdef CONFIG_X86 + if (gsi_entry-type KVM_GSI_ROUTE_MSI) { + dest_id = (gsi_entry-msi.address_lo MSI_ADDR_DEST_ID_MASK) +MSI_ADDR_DEST_ID_SHIFT; + vector = (gsi_entry-msi.data MSI_DATA_VECTOR_MASK) +MSI_DATA_VECTOR_SHIFT; + dest_mode = test_bit(MSI_ADDR_DEST_MODE_SHIFT, + (unsigned long *)gsi_entry-msi.address_lo); + trig_mode = test_bit(MSI_DATA_TRIGGER_SHIFT, + (unsigned long *)gsi_entry-msi.data); + delivery_mode = test_bit(MSI_DATA_DELIVERY_MODE_SHIFT, + (unsigned long *)gsi_entry-msi.data); + deliver_bitmask = 
kvm_ioapic_get_delivery_bitmask(ioapic, + dest_id, dest_mode); + /* IOAPIC delivery mode value is the same as MSI here */ + switch (delivery_mode) { + case IOAPIC_LOWEST_PRIORITY: + vcpu = kvm_get_lowest_prio_vcpu(ioapic-kvm, vector, + deliver_bitmask); + if (vcpu != NULL) + kvm_apic_set_irq(vcpu, vector, trig_mode); + else + printk(KERN_INFO + kvm: null lowest priority vcpu!\n); + break; + case IOAPIC_FIXED: + for (vcpu_id = 0; deliver_bitmask != 0; vcpu_id++) { + if (!(deliver_bitmask (1 vcpu_id))) + continue; + deliver_bitmask = ~(1 vcpu_id); + vcpu = ioapic-kvm-vcpus[vcpu_id]; + if (vcpu) + kvm_apic_set_irq(vcpu, vector, + trig_mode); + } + break; + default: + break; + } + } +#endif /* CONFIG_X86 */ +} + /* This should be called with the kvm-lock mutex held */ -void kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level) +void kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 gsi, int level) { - unsigned long *irq_state = (unsigned long *)kvm-arch.irq_states[irq]; + unsigned long *irq_state; + + if (gsi KVM_GSI_ROUTE_MASK) { + gsi_dispatch(kvm, gsi); + return; + } + + irq_state = (unsigned long *)kvm-arch.irq_states[gsi]; /* Logical OR for level trig interrupt */ if (level) @@ -39,9 +110,9 @@ void kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level) * IOAPIC. So set the bit in both. The guest will ignore * writes to the unused one. */ - kvm_ioapic_set_irq(ioapic_irqchip(kvm), irq, !!(*irq_state)); + kvm_ioapic_set_irq(ioapic_irqchip(kvm), gsi, !!(*irq_state)); #ifdef CONFIG_X86 - kvm_pic_set_irq(pic_irqchip(kvm), irq, !!(*irq_state)); + kvm_pic_set_irq(pic_irqchip(kvm), gsi, !!(*irq_state)); #endif } diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index
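As a companion to the MSI branch of gsi_dispatch(), the field extraction can be sketched in isolation. The bit positions below follow the x86 MSI message layout as I understand it and are assumptions rather than values copied from asm/msidef.h; `struct msi_fields`, `msi_decode` and the `SKETCH_*` constants are hypothetical names. Note that this sketch extracts delivery mode as a three-bit field, whereas the patch tests a single bit.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86 MSI layout; names and values are local to this sketch. */
#define SKETCH_ADDR_DEST_MODE_SHIFT   2     /* 0 = physical, 1 = logical */
#define SKETCH_ADDR_DEST_ID_SHIFT     12
#define SKETCH_ADDR_DEST_ID_MASK      0xffu
#define SKETCH_DATA_VECTOR_MASK       0xffu
#define SKETCH_DATA_DELIVERY_SHIFT    8
#define SKETCH_DATA_DELIVERY_MASK     0x7u  /* 0 = fixed, 1 = lowest prio */
#define SKETCH_DATA_TRIGGER_SHIFT     15    /* 0 = edge, 1 = level */

struct msi_fields {
    uint8_t dest_id;
    uint8_t dest_mode;
    uint8_t vector;
    uint8_t delivery_mode;
    uint8_t trig_mode;
};

/* Split an MSI address/data pair into the fields used for delivery. */
static struct msi_fields msi_decode(uint32_t address_lo, uint32_t data)
{
    struct msi_fields f;

    /* x86 MSI addresses are expected to sit in the 0xFEExxxxx window. */
    assert((address_lo >> 20) == 0xfee);

    f.dest_id = (uint8_t)((address_lo >> SKETCH_ADDR_DEST_ID_SHIFT) &
                          SKETCH_ADDR_DEST_ID_MASK);
    f.dest_mode = (uint8_t)((address_lo >> SKETCH_ADDR_DEST_MODE_SHIFT) & 1u);
    f.vector = (uint8_t)(data & SKETCH_DATA_VECTOR_MASK);
    f.delivery_mode = (uint8_t)((data >> SKETCH_DATA_DELIVERY_SHIFT) &
                                SKETCH_DATA_DELIVERY_MASK);
    f.trig_mode = (uint8_t)((data >> SKETCH_DATA_TRIGGER_SHIFT) & 1u);
    return f;
}
```

Working on plain integer values with shifts and masks also sidesteps casting a 32-bit field to `unsigned long *` for test_bit(), which reads beyond the field on 64-bit hosts.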
Re: [PATCH] CPUID Masking MSRs
On 07.01.2009, at 12:16, Andre Przywara wrote:
> Alexander Graf wrote:
>> Well if I could take the FlexMigration design into account when putting
>> variables in the vcpu context, that'd be great. But I can't seem to find
>> it in the Intel documentation, so I'll leave it for now.
> Not real documentation (tell me if you find some!), but this code shows
> almost everything you probably need:
> http://xenbits.xensource.com/xen-unstable.hg?rev/be20b11656bb

It only shows two of the four feature values, but it's definitely a start :-).
Thanks a lot! Looks like the Intel way is about the same.

Alex
Re: [PATCH] KVM: MMU: Segregate mmu pages created with different cr4.pge settings
Marcelo Tosatti wrote:
> Let me shoot at one direction: a shadow page with PGE bit in either state
> is created. Later that shadow page is nuked (via mmu notifiers, for
> example).

I doubt that mmu notifiers were invoked in this case (the bug would be very
rare); in any case we flush the tlb.

-- 
error compiling committee.c: too many arguments to function
Re: KVM host kernel hang
On 07.01.2009, at 11:15, Avi Kivity wrote:
> Alexander Graf wrote:
>> Hi,
>>
>> while trying to run a current openSUSE in VMWare ESX in KVM (using NPT),
>> some KVM code seems to be stuck in an endless loop. The qemu process
>> hangs, I can't attach gdb to it and the kernel module seems to be hanging
>> in a place where I don't see any looping code. One CPU is definitely
>> stuck in sys at 100% though.
>>
>> This is running git as of yesterday with some minor ESX modifications
>> that should not touch any of these parts (userspace and MSRs).
>>
>> Maybe one of you guys has a clue what's going on here. You'll find a
>> snippet of a t-sysrq trace with all qemu relevant parts below. The
>> registers (incl. IP) of these don't change over time.
>>
>> Alex
>>
>> qemu-system-x D 810001025280 0 27900 9501
>>  8101000e5c58 0082 8101000e5c1c 81011446e728
>>  807e6280 807e6280 8100388ca680 80601890
>>  8100388ca9c0 00200200 8100388ca9c0
>> Call Trace:
>>  [804485ec] __mutex_lock_slowpath+0x72/0xa9
>>  [8044847a] mutex_lock+0x1e/0x22
>>  [88d7f630] :kvm:kvm_arch_vm_ioctl+0x30e/0x5ae
>>  [88d7c78e] :kvm:kvm_vm_ioctl+0x744/0x777
>>  [802acada] vfs_ioctl+0x2a/0x78
>>  [802acd6f] do_vfs_ioctl+0x247/0x261
>>  [802acdde] sys_ioctl+0x55/0x77
>>  [8020bffa] system_call_after_swapgs+0x8a/0x8f
>>  [7f2f3b15eb67]
>
> Waiting for kvm->lock, so can't kill or strace.
>
>> qemu-system-x R running task 0 27908 9501
>>  88d7d3ad 0390 810100120040 810116491000
>>  fee00390 81011b361d08 88d7f1fb 0001
>> Call Trace:
>> Inexact backtrace:
>>  [88d7d3ad] :kvm:kvm_get_cs_db_l_bits+0x27/0x3e
>>  [88d7f1fb] :kvm:emulate_instruction+0x199/0x266
>>  [88d86700] :kvm:kvm_mmu_page_fault+0x49/0x86
>>  [88a3ebe8] :kvm_amd:pf_interception+0xa8/0xb1
>>  [88a3e1b4] :kvm_amd:handle_exit+0x218/0x221
>>  [88d810f6] :kvm:kvm_arch_vcpu_ioctl_run+0x600/0x81a
>>  [88d7a4f0] :kvm:kvm_vcpu_ioctl+0xf6/0x485
>>  [802acada] vfs_ioctl+0x2a/0x78
>>  [802acd6f] do_vfs_ioctl+0x247/0x261
>>  [802a13a3] fget_light+0x1/0x83
>>  [802acdde] sys_ioctl+0x55/0x77
>>  [802a0b48] sys_writev+0x60/0x94
>>  [8020bffa] system_call_after_swapgs+0x8a/0x8f
>
> But the mutex is not taken here. Looks like we lost it, maybe
> CONFIG_LOCKDEP can find out where.

I have CONFIG_LOCKDEP_SUPPORT=y. How do I make it detect that it's actually
locking itself up?

Btw: The issue seems to be easily reproducible :-)

Alex
Re: KVM host kernel hang
Alexander Graf wrote:
> I have CONFIG_LOCKDEP_SUPPORT=y. How do I make it detect that it's
> actually locking itself up?
>
> Btw: The issue seems to be easily reproducible :-)

Perhaps CONFIG_PROVE_LOCKING and CONFIG_LOCKDEP. _SUPPORT just indicates
the arch can do it if you want, IIUC.

-- 
error compiling committee.c: too many arguments to function
Re: KVM host kernel hang
Avi Kivity wrote:
> Alexander Graf wrote:
>> I have CONFIG_LOCKDEP_SUPPORT=y. How do I make it detect that it's
>> actually locking itself up?
>>
>> Btw: The issue seems to be easily reproducible :-)
>
> Perhaps CONFIG_PROVE_LOCKING and CONFIG_LOCKDEP. _SUPPORT just indicates
> the arch can do it if you want, IIUC.

I just added some debug #define's to show me where exactly things break.

Jan 7 14:34:46 linux-dp8n kernel: 2149: Grabbing lock {
Jan 7 14:34:46 linux-dp8n kernel: 1908: Grabbing lock {

2145 mmio:
2146 	/*
2147 	 * Is this MMIO handled locally?
2148 	 */
2149 	mutex_lock(&vcpu->kvm->lock);
2150 	mmio_dev = vcpu_find_mmio_dev(vcpu, gpa, bytes, 0);
2151 	if (mmio_dev) {
2152 		kvm_iodevice_read(mmio_dev, gpa, bytes, val);
2153 		mutex_unlock(&vcpu->kvm->lock);
2154 		return X86EMUL_CONTINUE;
2155 	}
2156 	mutex_unlock(&vcpu->kvm->lock);

1901 	case KVM_IRQ_LINE: {
1902 		struct kvm_irq_level irq_event;
1903
1904 		r = -EFAULT;
1905 		if (copy_from_user(&irq_event, argp, sizeof irq_event))
1906 			goto out;
1907 		if (irqchip_in_kernel(kvm)) {
1908 			mutex_lock(&kvm->lock);
1909 			kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
1910 				    irq_event.irq, irq_event.level);
1911 			mutex_unlock(&kvm->lock);
1912 			r = 0;
1913 		}
1914 		break;
1915 	}

Any ideas?

Alex
Re: [PATCH] KVM: MMU: Segregate mmu pages created with different cr4.pge settings
On Wed, Jan 07, 2009 at 01:32:41PM +0200, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> Let me shoot at one direction: a shadow page with PGE bit in either
>> state is created. Later that shadow page is nuked (via mmu notifiers,
>> for example).
>
> I doubt that mmu notifiers were invoked in this case (the bug would be
> very rare); in any case we flush the tlb.

This comment is worrying

	/*
	 * FIXME: Tis shouldn't be necessary here, but there is a flush
	 * missing in the MMU code. Until we find this bug, flush the
	 * complete TLB here on an NPF
	 */
	if (npt_enabled)
		svm_flush_tlb(&svm->vcpu);

Alexander, you might want to try this patch, -ENONPT here (and revert
the previous one).

I have no clue, what else could be causing this?

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 10bdb2a..bf68e5b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -33,6 +33,7 @@
 #include <asm/cmpxchg.h>
 #include <asm/io.h>
 #include <asm/vmx.h>
+#include <asm/tlbflush.h>

 /*
  * When setting this variable to true it enables Two-Dimensional-Paging
@@ -1850,6 +1851,11 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 		if (*iterator.sptep == shadow_trap_nonpresent_pte) {
 			pseudo_gfn = (iterator.addr & PT64_DIR_BASE_ADDR_MASK) >> PAGE_SHIFT;
+
+			kvm_flush_remote_tlbs(vcpu->kvm);
+			kvm_mmu_flush_tlb(vcpu);
+			__flush_tlb();
+
 			sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
 					      iterator.level - 1,
 					      1, ACC_ALL, iterator.sptep);
Re: KVM host kernel hang
Alexander Graf wrote:
> Avi Kivity wrote:
>> Alexander Graf wrote:
>>> I have CONFIG_LOCKDEP_SUPPORT=y. How do I make it detect that it's
>>> actually locking itself up?
>>>
>>> Btw: The issue seems to be easily reproducible :-)
>>
>> Perhaps CONFIG_PROVE_LOCKING and CONFIG_LOCKDEP. _SUPPORT just
>> indicates the arch can do it if you want, IIUC.
>
> I just added some debug #define's to show me where exactly things break.
>
> Jan 7 14:34:46 linux-dp8n kernel: 2149: Grabbing lock {
> Jan 7 14:34:46 linux-dp8n kernel: 1908: Grabbing lock {
>
> 2145 mmio:
> 2146 	/*
> 2147 	 * Is this MMIO handled locally?
> 2148 	 */
> 2149 	mutex_lock(&vcpu->kvm->lock);
> 2150 	mmio_dev = vcpu_find_mmio_dev(vcpu, gpa, bytes, 0);
> 2151 	if (mmio_dev) {
> 2152 		kvm_iodevice_read(mmio_dev, gpa, bytes, val);
> 2153 		mutex_unlock(&vcpu->kvm->lock);
> 2154 		return X86EMUL_CONTINUE;
> 2155 	}
> 2156 	mutex_unlock(&vcpu->kvm->lock);

The lock was lost here. But how?

> 1901 	case KVM_IRQ_LINE: {
> 1902 		struct kvm_irq_level irq_event;
> 1903
> 1904 		r = -EFAULT;
> 1905 		if (copy_from_user(&irq_event, argp, sizeof irq_event))
> 1906 			goto out;
> 1907 		if (irqchip_in_kernel(kvm)) {
> 1908 			mutex_lock(&kvm->lock);
> 1909 			kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
> 1910 				    irq_event.irq, irq_event.level);
> 1911 			mutex_unlock(&kvm->lock);
> 1912 			r = 0;
> 1913 		}
> 1914 		break;
> 1915 	}

This is your hung iothread trying to inject an interrupt. It's waiting
for the lost lock.

I suggest enabling all the lock debug magic you can find in kconfig.

-- 
error compiling committee.c: too many arguments to function
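The debugging idea used in this thread, tagging every lock acquisition with its source line so that a lock which is never released can be traced back to the place it was taken, can be sketched in userspace C. This is only an illustration under assumptions: `struct dbg_lock`, `dbg_lock_acquire()`, `dbg_lock_release()` and `DBG_LOCK()` are hypothetical names, the code is plain bookkeeping rather than a real mutex, and it is not how the kernel's lock debugging is implemented.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bookkeeping for one lock: who holds it and where it was taken. */
struct dbg_lock {
	int held;	/* 1 while the lock is held */
	int owner;	/* id of the holder, valid while held */
	int line;	/* source line of the last successful acquire */
};

/* Try to take the lock; returns 0 on success, -1 if it is already held. */
static int dbg_lock_acquire(struct dbg_lock *l, int owner, int line)
{
	if (l->held) {
		/* Report where the current holder took the lock. */
		fprintf(stderr, "%d: lock busy, taken at line %d by %d\n",
			line, l->line, l->owner);
		return -1;
	}
	l->held = 1;
	l->owner = owner;
	l->line = line;
	return 0;
}

/* Release the lock; only the current holder may do so. */
static void dbg_lock_release(struct dbg_lock *l, int owner)
{
	assert(l->held && l->owner == owner);
	l->held = 0;
}

/* Record the caller's source line automatically on every acquire. */
#define DBG_LOCK(l, owner) dbg_lock_acquire((l), (owner), __LINE__)
```

With this kind of wrapper, a second caller that finds the lock busy learns the line number of the acquisition that was never paired with a release, which is the information the "Grabbing lock" log lines were meant to provide.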
routed tap devices
I am using kvm-82 on a 64-bit host, giving my virtual machines routed tap devices and using proxy ARP to provide them with connectivity. My host has two Ethernet adapters: one is connected to the WAN and the other is a private link to another server with a private IP address.

Even though I'm assigning device names (on the host) based on MAC address, the adapters seem to behave differently in terms of IP forwarding depending on the order in which the Linux kernel sees them. If I run `ip link` I see eth1 listed before eth0, and a virtual machine running behind a tap device that uses IP forwarding sees eth1's IP as its first hop in a traceroute. If I swap eth0 and eth1 (via their configuration), the first hop in the guest's traceroute is eth0's IP and `ip link` shows eth0 first.

Is there a way to control this behavior other than switching physical Ethernet adapters? I may be paranoid, but I don't want the virtual machines to see my private IP address when using standard tools such as traceroute.

Anyone have any ideas?
[PATCH 0/6] ATS capability support for Intel IOMMU
This patch series implements Address Translation Service (ATS) support for the Intel IOMMU. ATS lets a PCI Endpoint request DMA address translations from the IOMMU and cache them in the Endpoint, which alleviates IOMMU pressure and improves hardware performance in I/O virtualization environments.

[PATCH 1/6] PCI: support the ATS capability
[PATCH 2/6] VT-d: parse ATSR in DMA Remapping Reporting Structure
[PATCH 3/6] VT-d: add queue invalidation fault status support
[PATCH 4/6] VT-d: add device IOTLB invalidation support
[PATCH 5/6] VT-d: cleanup iommu_flush_iotlb_psi and flush_unmaps
[PATCH 6/6] VT-d: support the device IOTLB
[PATCH 1/6] PCI: support the ATS capability
The ATS spec can be found at http://www.pcisig.com/specifications/iov/ats/
(it requires membership).

Signed-off-by: Yu Zhao yu.z...@intel.com
---
 drivers/pci/pci.c        | 68 ++
 include/linux/pci.h      | 15 ++
 include/linux/pci_regs.h | 10 +++
 3 files changed, 93 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 061d1ee..5abab14 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1337,6 +1337,74 @@ void pci_enable_ari(struct pci_dev *dev)
 	bridge->ari_enabled = 1;
 }

+/**
+ * pci_enable_ats - enable the ATS capability
+ * @dev: the PCI device
+ * @ps: the IOMMU page shift
+ *
+ * Returns 0 on success, or a negative value on error.
+ */
+int pci_enable_ats(struct pci_dev *dev, int ps)
+{
+	int pos;
+	u16 ctrl;
+
+	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ATS);
+	if (!pos)
+		return -ENODEV;
+
+	if (ps < PCI_ATS_MIN_STU)
+		return -EINVAL;
+
+	ctrl = PCI_ATS_CTRL_STU(ps - PCI_ATS_MIN_STU) | PCI_ATS_CTRL_ENABLE;
+	pci_write_config_word(dev, pos + PCI_ATS_CTRL, ctrl);
+
+	dev->ats_enabled = 1;
+
+	return 0;
+}
+
+/**
+ * pci_disable_ats - disable the ATS capability
+ * @dev: the PCI device
+ */
+void pci_disable_ats(struct pci_dev *dev)
+{
+	int pos;
+	u16 ctrl;
+
+	if (!dev->ats_enabled)
+		return;
+
+	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ATS);
+	if (!pos)
+		return;
+
+	pci_read_config_word(dev, pos + PCI_ATS_CTRL, &ctrl);
+	ctrl &= ~PCI_ATS_CTRL_ENABLE;
+	pci_write_config_word(dev, pos + PCI_ATS_CTRL, ctrl);
+}
+
+/**
+ * pci_ats_qdep - query ATS Invalidate Queue Depth
+ * @dev: the PCI device
+ *
+ * Returns the queue depth on success, or 0 on error.
+ */
+int pci_ats_qdep(struct pci_dev *dev)
+{
+	int pos;
+	u16 cap;
+
+	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ATS);
+	if (!pos)
+		return 0;
+
+	pci_read_config_word(dev, pos + PCI_ATS_CAP, &cap);
+
+	return PCI_ATS_CAP_QDEP(cap) ? : PCI_ATS_MAX_QDEP;
+}
+
 int
 pci_get_interrupt_pin(struct pci_dev *dev, struct pci_dev **bridge)
 {
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4bb156b..e6a1b5a 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -227,6 +227,7 @@ struct pci_dev {
 	unsigned int	msi_enabled:1;
 	unsigned int	msix_enabled:1;
 	unsigned int	ari_enabled:1;	/* ARI forwarding */
+	unsigned int	ats_enabled:1;	/* Address Translation Service */
 	unsigned int	is_managed:1;
 	unsigned int	is_pcie:1;
 	pci_dev_flags_t dev_flags;
@@ -1155,5 +1156,19 @@ static inline void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar)
 }
 #endif

+extern int pci_enable_ats(struct pci_dev *dev, int ps);
+extern void pci_disable_ats(struct pci_dev *dev);
+extern int pci_ats_qdep(struct pci_dev *dev);
+/**
+ * pci_ats_enabled - query the ATS status
+ * @dev: the PCI device
+ *
+ * Returns 1 if ATS capability is enabled, or 0 if not.
+ */
+static inline int pci_ats_enabled(struct pci_dev *dev)
+{
+	return dev->ats_enabled;
+}
+
 #endif /* __KERNEL__ */
 #endif /* LINUX_PCI_H */
diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h
index e5effd4..00c9db5 100644
--- a/include/linux/pci_regs.h
+++ b/include/linux/pci_regs.h
@@ -436,6 +436,7 @@
 #define PCI_EXT_CAP_ID_DSN	3
 #define PCI_EXT_CAP_ID_PWR	4
 #define PCI_EXT_CAP_ID_ARI	14
+#define PCI_EXT_CAP_ID_ATS	15

 /* Advanced Error Reporting */
 #define PCI_ERR_UNCOR_STATUS	4	/* Uncorrectable Error Status */
@@ -553,4 +554,13 @@
 #define PCI_ARI_CTRL_ACS	0x0002	/* ACS Function Groups Enable */
 #define PCI_ARI_CTRL_FG(x)	(((x) >> 4) & 7)	/* Function Group */

+/* Address Translation Service */
+#define PCI_ATS_CAP		0x04	/* ATS Capability Register */
+#define PCI_ATS_CAP_QDEP(x)	((x) & 0x1f)	/* Invalidate Queue Depth */
+#define PCI_ATS_MAX_QDEP	32	/* Max Invalidate Queue Depth */
+#define PCI_ATS_CTRL		0x06	/* ATS Control Register */
+#define PCI_ATS_CTRL_ENABLE	0x8000	/* ATS Enable */
+#define PCI_ATS_CTRL_STU(x)	((x) & 0x1f)	/* Smallest Translation Unit */
+#define PCI_ATS_MIN_STU		12	/* shift of minimum STU block */
+
 #endif /* LINUX_PCI_REGS_H */
-- 
1.5.6.4
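The register arithmetic in this patch can be tried out in isolation. The following is a minimal userspace sketch, assuming the `&` operators in the two field macros (the archive dropped them) and using two hypothetical helpers, `ats_ctrl_word()` and `ats_qdep()`, that mirror the computations in `pci_enable_ats()` and `pci_ats_qdep()` without touching any PCI config space.

```c
#include <assert.h>
#include <stdint.h>

/* Constants as posted in the patch; the '&' operators are reconstructed. */
#define PCI_ATS_CAP_QDEP(x)	((x) & 0x1f)	/* Invalidate Queue Depth */
#define PCI_ATS_MAX_QDEP	32		/* Max Invalidate Queue Depth */
#define PCI_ATS_CTRL_ENABLE	0x8000		/* ATS Enable */
#define PCI_ATS_CTRL_STU(x)	((x) & 0x1f)	/* Smallest Translation Unit */
#define PCI_ATS_MIN_STU		12		/* shift of minimum STU block */

/*
 * Hypothetical helper: build the ATS control word for an IOMMU page
 * shift, or return -1 when the shift is below the minimum STU.
 */
static int ats_ctrl_word(int ps)
{
	if (ps < PCI_ATS_MIN_STU)
		return -1;
	return PCI_ATS_CTRL_STU(ps - PCI_ATS_MIN_STU) | PCI_ATS_CTRL_ENABLE;
}

/*
 * Hypothetical helper: decode the invalidate queue depth from the
 * capability register; an encoded value of 0 stands for the maximum.
 */
static int ats_qdep(uint16_t cap)
{
	int qdep = PCI_ATS_CAP_QDEP(cap);

	return qdep ? qdep : PCI_ATS_MAX_QDEP;
}
```

For a 4 KiB IOMMU page (shift 12) the STU field is 0, so the control word is just the enable bit; the queue-depth helper spells out the GNU `? :` shorthand used in the patch.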
[PATCH 2/6] VT-d: parse ATSR in DMA Remapping Reporting Structure
Parse the Root Port ATS Capability Reporting Structure in DMA Remapping
Reporting Structure ACPI table.

Signed-off-by: Yu Zhao yu.z...@intel.com
---
 drivers/pci/dmar.c          | 114 --
 include/linux/dmar.h        |   9 +++
 include/linux/intel-iommu.h |   1 +
 3 files changed, 118 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index f5a662a..f2859d1 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -254,6 +254,86 @@ rmrr_parse_dev(struct dmar_rmrr_unit *rmrru)
 	}
 	return ret;
 }
+
+LIST_HEAD(dmar_atsr_units);
+
+static int __init dmar_parse_one_atsr(struct acpi_dmar_header *hdr)
+{
+	struct acpi_dmar_atsr *atsr;
+	struct dmar_atsr_unit *atsru;
+
+	atsr = container_of(hdr, struct acpi_dmar_atsr, header);
+	atsru = kzalloc(sizeof(*atsru), GFP_KERNEL);
+	if (!atsru)
+		return -ENOMEM;
+
+	atsru->hdr = hdr;
+	atsru->include_all = atsr->flags & 0x1;
+
+	if (atsru->include_all)
+		list_add_tail(&atsru->list, &dmar_atsr_units);
+	else
+		list_add(&atsru->list, &dmar_atsr_units);
+
+	return 0;
+}
+
+static int __init atsr_parse_dev(struct dmar_atsr_unit *atsru)
+{
+	int ret = 0;
+	struct acpi_dmar_atsr *atsr;
+
+	atsr = container_of(atsru->hdr, struct acpi_dmar_atsr, header);
+	if (!atsru->include_all)
+		ret = dmar_parse_dev_scope((void *)(atsr + 1),
+				(void *)atsr + atsr->header.length,
+				&atsru->devices_cnt, &atsru->devices,
+				atsr->segment);
+
+	if (ret || !(atsru->include_all || atsru->devices_cnt)) {
+		list_del(&atsru->list);
+		kfree(atsru);
+	}
+
+	return ret;
+}
+
+int dmar_find_matched_atsr_unit(struct pci_dev *dev)
+{
+	int i;
+	struct pci_bus *bus;
+	struct acpi_dmar_atsr *atsr;
+	struct dmar_atsr_unit *atsru;
+
+	list_for_each_entry(atsru, &dmar_atsr_units, list) {
+		atsr = container_of(atsru->hdr, struct acpi_dmar_atsr, header);
+		if (atsr->segment == pci_domain_nr(dev->bus))
+			goto found;
+	}
+
+	return 0;
+
+found:
+	for (bus = dev->bus; bus; bus = bus->parent) {
+		struct pci_dev *bridge = bus->self;
+
+		if (!bridge || !bridge->is_pcie ||
+		    bridge->pcie_type == PCI_EXP_TYPE_PCI_BRIDGE)
+			return 0;
+
+		if (bridge->pcie_type == PCI_EXP_TYPE_ROOT_PORT) {
+			for (i = 0; i < atsru->devices_cnt; i++)
+				if (atsru->devices[i] == bridge)
+					return 1;
+			break;
+		}
+	}
+
+	if (atsru->include_all)
+		return 1;
+
+	return 0;
+}
 #endif

 static void __init
@@ -261,22 +341,28 @@ dmar_table_print_dmar_entry(struct acpi_dmar_header *header)
 {
 	struct acpi_dmar_hardware_unit *drhd;
 	struct acpi_dmar_reserved_memory *rmrr;
+	struct acpi_dmar_atsr *atsr;

 	switch (header->type) {
 	case ACPI_DMAR_TYPE_HARDWARE_UNIT:
-		drhd = (struct acpi_dmar_hardware_unit *)header;
+		drhd = container_of(header, struct acpi_dmar_hardware_unit,
+				    header);
 		printk (KERN_INFO PREFIX
-			"DRHD (flags: 0x%08x)base: 0x%016Lx\n",
-			drhd->flags, (unsigned long long)drhd->address);
+			"DRHD base: %#016Lx flags: %#x\n",
+			(unsigned long long)drhd->address, drhd->flags);
 		break;
 	case ACPI_DMAR_TYPE_RESERVED_MEMORY:
-		rmrr = (struct acpi_dmar_reserved_memory *)header;
-
+		rmrr = container_of(header, struct acpi_dmar_reserved_memory,
+				    header);
 		printk (KERN_INFO PREFIX
-			"RMRR base: 0x%016Lx end: 0x%016Lx\n",
+			"RMRR base: %#016Lx end: %#016Lx\n",
 			(unsigned long long)rmrr->base_address,
 			(unsigned long long)rmrr->end_address);
 		break;
+	case ACPI_DMAR_TYPE_ATSR:
+		atsr = container_of(header, struct acpi_dmar_atsr, header);
+		printk(KERN_INFO PREFIX "ATSR flags: %#x\n", atsr->flags);
+		break;
 	}
 }

@@ -341,6 +427,11 @@ parse_dmar_table(void)
 			ret = dmar_parse_one_rmrr(entry_header);
 #endif
 			break;
+		case ACPI_DMAR_TYPE_ATSR:
+#ifdef CONFIG_DMAR
+			ret = dmar_parse_one_atsr(entry_header);
+#endif
+			break;
 		default:
 			printk(KERN_WARNING PREFIX
 				"Unknown DMAR structure type\n");
[PATCH 3/6] VT-d: add queue invalidation fault status support
Check the fault register after submitting a queue invalidation request.

Signed-off-by: Yu Zhao yu.z...@intel.com
---
 drivers/pci/dmar.c           | 59 +++--
 drivers/pci/intr_remapping.c | 21 --
 include/linux/intel-iommu.h  |  4 ++-
 3 files changed, 59 insertions(+), 25 deletions(-)

diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index f2859d1..eb77258 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -673,19 +673,49 @@ static inline void reclaim_free_desc(struct q_inval *qi)
 	}
 }

+static int qi_check_fault(struct intel_iommu *iommu, int index)
+{
+	u32 fault;
+	int head;
+	struct q_inval *qi = iommu->qi;
+	int wait_index = (index + 1) % QI_LENGTH;
+
+	fault = readl(iommu->reg + DMAR_FSTS_REG);
+
+	/*
+	 * If IQE happens, the head points to the descriptor associated
+	 * with the error. No new descriptors are fetched until the IQE
+	 * is cleared.
+	 */
+	if (fault & DMA_FSTS_IQE) {
+		head = readl(iommu->reg + DMAR_IQH_REG);
+		if ((head >> DMAR_IQ_OFFSET) == index) {
+			memcpy(&qi->desc[index], &qi->desc[wait_index],
+			       sizeof(struct qi_desc));
+			__iommu_flush_cache(iommu, &qi->desc[index],
+					    sizeof(struct qi_desc));
+			writel(DMA_FSTS_IQE, iommu->reg + DMAR_FSTS_REG);
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
 /*
  * Submit the queued invalidation descriptor to the remapping
  * hardware unit and wait for its completion.
  */
-void qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
+int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 {
+	int rc = 0;
 	struct q_inval *qi = iommu->qi;
 	struct qi_desc *hw, wait_desc;
 	int wait_index, index;
 	unsigned long flags;

 	if (!qi)
-		return;
+		return 0;

 	hw = qi->desc;

@@ -703,7 +733,8 @@ void qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)

 	hw[index] = *desc;

-	wait_desc.low = QI_IWD_STATUS_DATA(2) | QI_IWD_STATUS_WRITE | QI_IWD_TYPE;
+	wait_desc.low = QI_IWD_STATUS_DATA(QI_DONE) |
+			QI_IWD_STATUS_WRITE | QI_IWD_TYPE;
 	wait_desc.high = virt_to_phys(&qi->desc_status[wait_index]);

 	hw[wait_index] = wait_desc;
@@ -714,13 +745,11 @@ void qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 	qi->free_head = (qi->free_head + 2) % QI_LENGTH;
 	qi->free_cnt -= 2;

-	spin_lock(&iommu->register_lock);
 	/*
 	 * update the HW tail register indicating the presence of
 	 * new descriptors.
 	 */
-	writel(qi->free_head << 4, iommu->reg + DMAR_IQT_REG);
-	spin_unlock(&iommu->register_lock);
+	writel(qi->free_head << DMAR_IQ_OFFSET, iommu->reg + DMAR_IQT_REG);

 	while (qi->desc_status[wait_index] != QI_DONE) {
 		/*
@@ -730,6 +759,10 @@ void qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 		 * a deadlock where the interrupt context can wait indefinitely
 		 * for free slots in the queue.
 		 */
+		rc = qi_check_fault(iommu, index);
+		if (rc)
+			break;
+
 		spin_unlock(&qi->q_lock);
 		cpu_relax();
 		spin_lock(&qi->q_lock);
@@ -739,6 +772,8 @@ void qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)

 	reclaim_free_desc(qi);
 	spin_unlock_irqrestore(&qi->q_lock, flags);
+
+	return rc;
 }

 /*
@@ -751,13 +786,13 @@ void qi_global_iec(struct intel_iommu *iommu)
 	desc.low = QI_IEC_TYPE;
 	desc.high = 0;

+	/* should never fail */
 	qi_submit_sync(&desc, iommu);
 }

 int qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid, u8 fm,
 		     u64 type, int non_present_entry_flush)
 {
-
 	struct qi_desc desc;

 	if (non_present_entry_flush) {
@@ -771,10 +806,7 @@ int qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid, u8 fm,
 			| QI_CC_GRAN(type) | QI_CC_TYPE;
 	desc.high = 0;

-	qi_submit_sync(&desc, iommu);
-
-	return 0;
-
+	return qi_submit_sync(&desc, iommu);
 }

 int qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
@@ -804,10 +836,7 @@ int qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
 	desc.high = QI_IOTLB_ADDR(addr) | QI_IOTLB_IH(ih)
 		| QI_IOTLB_AM(size_order);

-	qi_submit_sync(&desc, iommu);
-
-	return 0;
-
+	return qi_submit_sync(&desc, iommu);
 }

 /*
diff --git a/drivers/pci/intr_remapping.c b/drivers/pci/intr_remapping.c
index f78371b..45effc5 100644
--- a/drivers/pci/intr_remapping.c
+++ b/drivers/pci/intr_remapping.c
@@
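The index arithmetic behind `qi_submit_sync()` and `qi_check_fault()` pairs each request descriptor with a wait descriptor in the next ring slot. Here is a minimal userspace sketch of that bookkeeping, under stated assumptions: `QI_LENGTH` is taken to be 256, `struct qi_ring` and `qi_reserve_pair()` are hypothetical names, the request index is assumed to start at the current free head, and the function returns -1 where the kernel code would wait for free slots.

```c
#include <assert.h>

#define QI_LENGTH 256	/* assumed ring size, in descriptors */

/* Hypothetical stand-in for the free-slot bookkeeping of the queue. */
struct qi_ring {
	int free_head;	/* next free slot */
	int free_cnt;	/* number of free slots */
};

/*
 * Reserve two consecutive slots: one for the request and one for the
 * wait descriptor that follows it. Returns the request index, or -1
 * when fewer than three slots are free.
 */
static int qi_reserve_pair(struct qi_ring *qi, int *wait_index)
{
	int index;

	if (qi->free_cnt < 3)
		return -1;

	index = qi->free_head;
	*wait_index = (index + 1) % QI_LENGTH;	/* wraps at the ring end */

	qi->free_head = (qi->free_head + 2) % QI_LENGTH;
	qi->free_cnt -= 2;

	return index;
}
```

Because the wait descriptor always sits at `(index + 1) % QI_LENGTH`, the fault check can recompute it from the request index alone, which is what `qi_check_fault()` does in the patch.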
[PATCH 4/6] VT-d: add device IOTLB invalidation support
Support device IOTLB invalidation to flush the translation cached in the
Endpoint.

Signed-off-by: Yu Zhao yu.z...@intel.com
---
 drivers/pci/dmar.c          | 63 --
 include/linux/intel-iommu.h | 13 -
 2 files changed, 72 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index eb77258..88f6b1f 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -666,7 +666,8 @@ void free_iommu(struct intel_iommu *iommu)
  */
 static inline void reclaim_free_desc(struct q_inval *qi)
 {
-	while (qi->desc_status[qi->free_tail] == QI_DONE) {
+	while (qi->desc_status[qi->free_tail] == QI_DONE ||
+	       qi->desc_status[qi->free_tail] == QI_ABORT) {
 		qi->desc_status[qi->free_tail] = QI_FREE;
 		qi->free_tail = (qi->free_tail + 1) % QI_LENGTH;
 		qi->free_cnt++;
@@ -676,10 +677,13 @@ static inline void reclaim_free_desc(struct q_inval *qi)
 static int qi_check_fault(struct intel_iommu *iommu, int index)
 {
 	u32 fault;
-	int head;
+	int head, tail;
 	struct q_inval *qi = iommu->qi;
 	int wait_index = (index + 1) % QI_LENGTH;

+	if (qi->desc_status[wait_index] == QI_ABORT)
+		return -EAGAIN;
+
 	fault = readl(iommu->reg + DMAR_FSTS_REG);

 	/*
@@ -699,6 +703,32 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
 		}
 	}

+	/*
+	 * If ITE happens, all pending wait_desc commands are aborted.
+	 * No new descriptors are fetched until the ITE is cleared.
+	 */
+	if (fault & DMA_FSTS_ITE) {
+		head = readl(iommu->reg + DMAR_IQH_REG);
+		head = ((head >> DMAR_IQ_OFFSET) - 1 + QI_LENGTH) % QI_LENGTH;
+		head |= 1;
+		tail = readl(iommu->reg + DMAR_IQT_REG);
+		tail = ((tail >> DMAR_IQ_OFFSET) - 1 + QI_LENGTH) % QI_LENGTH;
+
+		writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
+
+		do {
+			if (qi->desc_status[head] == QI_IN_USE)
+				qi->desc_status[head] = QI_ABORT;
+			head = (head - 2 + QI_LENGTH) % QI_LENGTH;
+		} while (head != tail);
+
+		if (qi->desc_status[wait_index] == QI_ABORT)
+			return -EAGAIN;
+	}
+
+	if (fault & DMA_FSTS_ICE)
+		writel(DMA_FSTS_ICE, iommu->reg + DMAR_FSTS_REG);
+
 	return 0;
 }

@@ -708,7 +738,7 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
  */
 int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 {
-	int rc = 0;
+	int rc;
 	struct q_inval *qi = iommu->qi;
 	struct qi_desc *hw, wait_desc;
 	int wait_index, index;
@@ -719,6 +749,9 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)

 	hw = qi->desc;

+restart:
+	rc = 0;
+
 	spin_lock_irqsave(&qi->q_lock, flags);
 	while (qi->free_cnt < 3) {
 		spin_unlock_irqrestore(&qi->q_lock, flags);
@@ -773,6 +806,9 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
 	reclaim_free_desc(qi);
 	spin_unlock_irqrestore(&qi->q_lock, flags);

+	if (rc == -EAGAIN)
+		goto restart;
+
 	return rc;
 }

@@ -839,6 +875,27 @@ int qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
 	return qi_submit_sync(&desc, iommu);
 }

+int qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, int qdep,
+		       u64 addr, unsigned int mask)
+{
+	struct qi_desc desc;
+
+	if (mask) {
+		BUG_ON(addr & ((1 << (VTD_PAGE_SHIFT + mask)) - 1));
+		addr |= (1 << (VTD_PAGE_SHIFT + mask - 1)) - 1;
+		desc.high = QI_DEV_IOTLB_ADDR(addr) | QI_DEV_IOTLB_SIZE;
+	} else
+		desc.high = QI_DEV_IOTLB_ADDR(addr);
+
+	if (qdep >= QI_DEV_IOTLB_MAX_INVS)
+		qdep = 0;
+
+	desc.low = QI_DEV_IOTLB_SID(sid) | QI_DEV_IOTLB_QDEP(qdep) |
+		   QI_DIOTLB_TYPE;
+
+	return qi_submit_sync(&desc, iommu);
+}
+
 /*
  * Enable Queued Invalidation interface. This is a must to support
  * interrupt-remapping. Also used by DMA-remapping, which replaces
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 0a220c9..d82bdac 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -196,6 +196,8 @@ static inline void dmar_writeq(void __iomem *addr, u64 val)
 #define DMA_FSTS_PPF ((u32)2)
 #define DMA_FSTS_PFO ((u32)1)
 #define DMA_FSTS_IQE (1 << 4)
+#define DMA_FSTS_ICE (1 << 5)
+#define DMA_FSTS_ITE (1 << 6)
 #define dma_fsts_fault_record_index(s) (((s) >> 8) & 0xff)

 /* FRCD_REG, 32 bits access */
@@ -224,7 +226,8 @@ do { \
 enum {
 	QI_FREE,
 	QI_IN_USE,
-	QI_DONE
+	QI_DONE,
+	QI_ABORT
 };

 #define
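The address manipulation in `qi_flush_dev_iotlb()` encodes the size of the range to invalidate in the low bits of the address. A minimal userspace sketch of that encoding follows, under assumptions: `VTD_PAGE_SHIFT` is taken to be 12, `dev_iotlb_encode()` is a hypothetical helper, the shifts are done in 64-bit arithmetic, and an unaligned address yields -1 where the patch uses `BUG_ON()`.

```c
#include <assert.h>
#include <stdint.h>

#define VTD_PAGE_SHIFT 12	/* assumed: 4 KiB VT-d pages */

/*
 * Hypothetical helper: encode a range of 2^mask pages starting at addr.
 * For mask > 0 the address must be aligned to the range size; the bits
 * below the top bit of the range are then set to 1 to express the size.
 * Returns 0 on success, -1 if addr is not suitably aligned.
 */
static int dev_iotlb_encode(uint64_t addr, unsigned int mask, uint64_t *out)
{
	if (mask) {
		if (addr & ((1ULL << (VTD_PAGE_SHIFT + mask)) - 1))
			return -1;
		addr |= (1ULL << (VTD_PAGE_SHIFT + mask - 1)) - 1;
	}
	*out = addr;
	return 0;
}
```

A single page (mask 0) leaves the address untouched, while a two-page range at 0x2000 becomes 0x2fff: the zero bit at position 12 followed by ones below it marks the 8 KiB size.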
[PATCH 5/6] VT-d: cleanup iommu_flush_iotlb_psi and flush_unmaps
Make iommu_flush_iotlb_psi() and flush_unmaps() easier to read.

Signed-off-by: Yu Zhao yu.z...@intel.com
---
 drivers/pci/intel-iommu.c | 46 +---
 1 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 235fb7a..261b6bd 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -916,30 +916,27 @@ static int __iommu_flush_iotlb(struct intel_iommu *iommu, u16 did,
 static int iommu_flush_iotlb_psi(struct intel_iommu *iommu, u16 did,
 	u64 addr, unsigned int pages, int non_present_entry_flush)
 {
-	unsigned int mask;
+	int rc;
+	unsigned int mask = ilog2(__roundup_pow_of_two(pages));

 	BUG_ON(addr & (~VTD_PAGE_MASK));
 	BUG_ON(pages == 0);

-	/* Fallback to domain selective flush if no PSI support */
-	if (!cap_pgsel_inv(iommu->cap))
-		return iommu->flush.flush_iotlb(iommu, did, 0, 0,
-						DMA_TLB_DSI_FLUSH,
-						non_present_entry_flush);
-
 	/*
+	 * Fallback to domain selective flush if no PSI support or the size is
+	 * too big.
 	 * PSI requires page size to be 2 ^ x, and the base address is naturally
 	 * aligned to the size
 	 */
-	mask = ilog2(__roundup_pow_of_two(pages));
-	/* Fallback to domain selective flush if size is too big */
-	if (mask > cap_max_amask_val(iommu->cap))
-		return iommu->flush.flush_iotlb(iommu, did, 0, 0,
-			DMA_TLB_DSI_FLUSH, non_present_entry_flush);
-
-	return iommu->flush.flush_iotlb(iommu, did, addr, mask,
-					DMA_TLB_PSI_FLUSH,
-					non_present_entry_flush);
+	if (!cap_pgsel_inv(iommu->cap) || mask > cap_max_amask_val(iommu->cap))
+		rc = iommu->flush.flush_iotlb(iommu, did, 0, 0,
+					      DMA_TLB_DSI_FLUSH,
+					      non_present_entry_flush);
+	else
+		rc = iommu->flush.flush_iotlb(iommu, did, addr, mask,
+					      DMA_TLB_PSI_FLUSH,
+					      non_present_entry_flush);
+	return rc;
 }

 static void iommu_disable_protect_mem_regions(struct intel_iommu *iommu)
@@ -2292,15 +2289,16 @@ static void flush_unmaps(void)
 		if (!iommu)
 			continue;

-		if (deferred_flush[i].next) {
-			iommu->flush.flush_iotlb(iommu, 0, 0, 0,
-						 DMA_TLB_GLOBAL_FLUSH, 0);
-			for (j = 0; j < deferred_flush[i].next; j++) {
-				__free_iova(&deferred_flush[i].domain[j]->iovad,
-						deferred_flush[i].iova[j]);
-			}
-			deferred_flush[i].next = 0;
+		if (!deferred_flush[i].next)
+			continue;
+
+		iommu->flush.flush_iotlb(iommu, 0, 0, 0,
+					 DMA_TLB_GLOBAL_FLUSH, 0);
+		for (j = 0; j < deferred_flush[i].next; j++) {
+			__free_iova(&deferred_flush[i].domain[j]->iovad,
+					deferred_flush[i].iova[j]);
 		}
+		deferred_flush[i].next = 0;
 	}

 	list_size = 0;
-- 
1.5.6.4
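The decision made in the reworked `iommu_flush_iotlb_psi()` can be sketched on its own: round the page count up to a power of two, take its log2 as the address mask, and fall back to a domain-selective flush when page-selective invalidation is unavailable or the mask is too large. The helpers below are hypothetical userspace stand-ins (`psi_mask()` for `ilog2(__roundup_pow_of_two(pages))`, `choose_flush()` for the `if`/`else`), not the kernel functions themselves.

```c
#include <assert.h>

/* Which kind of IOTLB flush to issue. */
enum flush_kind { FLUSH_DSI, FLUSH_PSI };

/*
 * Hypothetical stand-in for ilog2(__roundup_pow_of_two(pages)):
 * the smallest mask such that 2^mask >= pages, for pages >= 1.
 */
static unsigned int psi_mask(unsigned int pages)
{
	unsigned int mask = 0;

	assert(pages != 0);
	while ((1ULL << mask) < pages)
		mask++;
	return mask;
}

/*
 * Pick page-selective invalidation only when the hardware supports it
 * and the rounded-up range fits the maximum address mask.
 */
static enum flush_kind choose_flush(int has_psi, unsigned int pages,
				    unsigned int max_amask)
{
	if (!has_psi || psi_mask(pages) > max_amask)
		return FLUSH_DSI;
	return FLUSH_PSI;
}
```

Note that rounding up means a request for 3 pages flushes a 4-page range (mask 2), which is why the base address has to be naturally aligned to the rounded size.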
[PATCH 6/6] VT-d: support the device IOTLB
Support device IOTLB (i.e. ATS) for both native and KVM environments.

Signed-off-by: Yu Zhao yu.z...@intel.com
---
 drivers/pci/intel-iommu.c   | 97 +-
 include/linux/intel-iommu.h |  1 +
 2 files changed, 95 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 261b6bd..a7ff7cb 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -125,6 +125,7 @@ static inline void context_set_fault_enable(struct context_entry *context)
 }

 #define CONTEXT_TT_MULTI_LEVEL 0
+#define CONTEXT_TT_DEV_IOTLB 1

 static inline void context_set_translation_type(struct context_entry *context,
 						unsigned long value)
@@ -240,6 +241,8 @@ struct device_domain_info {
 	struct list_head global; /* link to global list */
 	u8 bus;		/* PCI bus numer */
 	u8 devfn;	/* PCI devfn number */
+	int qdep;	/* invalidate queue depth */
+	struct intel_iommu *iommu; /* IOMMU used by this device */
 	struct pci_dev *dev; /* it's NULL for PCIE-to-PCI bridge */
 	struct dmar_domain *domain; /* pointer to domain */
 };
@@ -913,6 +916,75 @@ static int __iommu_flush_iotlb(struct intel_iommu *iommu, u16 did,
 	return 0;
 }

+static struct device_domain_info *
+iommu_support_dev_iotlb(struct dmar_domain *domain, u8 bus, u8 devfn)
+{
+	int found = 0;
+	unsigned long flags;
+	struct device_domain_info *info;
+	struct intel_iommu *iommu = device_to_iommu(bus, devfn);
+
+	if (!ecap_dev_iotlb_support(iommu->ecap))
+		return NULL;
+
+	if (!iommu->qi)
+		return NULL;
+
+	spin_lock_irqsave(&device_domain_lock, flags);
+	list_for_each_entry(info, &domain->devices, link)
+		if (info->dev && info->bus == bus && info->devfn == devfn) {
+			found = 1;
+			break;
+		}
+	spin_unlock_irqrestore(&device_domain_lock, flags);
+
+	if (!found)
+		return NULL;
+
+	if (!dmar_find_matched_atsr_unit(info->dev))
+		return NULL;
+
+	info->iommu = iommu;
+	info->qdep = pci_ats_qdep(info->dev);
+	if (!info->qdep)
+		return NULL;
+
+	return info;
+}
+
+static void iommu_enable_dev_iotlb(struct device_domain_info *info)
+{
+	pci_enable_ats(info->dev, VTD_PAGE_SHIFT);
+}
+
+static void iommu_disable_dev_iotlb(struct device_domain_info *info)
+{
+	if (info->dev && pci_ats_enabled(info->dev))
+		pci_disable_ats(info->dev);
+}
+
+static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
+				  u64 addr, unsigned int mask)
+{
+	int rc;
+	u16 sid;
+	unsigned long flags;
+	struct device_domain_info *info;
+
+	spin_lock_irqsave(&device_domain_lock, flags);
+	list_for_each_entry(info, &domain->devices, link) {
+		if (!info->dev || !pci_ats_enabled(info->dev))
+			continue;
+
+		sid = info->bus << 8 | info->devfn;
+		rc = qi_flush_dev_iotlb(info->iommu, sid,
+					info->qdep, addr, mask);
+		if (rc)
+			printk(KERN_ERR "IOMMU: flush device IOTLB failed\n");
+	}
+	spin_unlock_irqrestore(&device_domain_lock, flags);
+}
+
 static int iommu_flush_iotlb_psi(struct intel_iommu *iommu, u16 did,
 	u64 addr, unsigned int pages, int non_present_entry_flush)
 {
@@ -936,6 +1008,9 @@ static int iommu_flush_iotlb_psi(struct intel_iommu *iommu, u16 did,
 		rc = iommu->flush.flush_iotlb(iommu, did, addr, mask,
 					      DMA_TLB_PSI_FLUSH,
 					      non_present_entry_flush);
+	if (!rc && !non_present_entry_flush)
+		iommu_flush_dev_iotlb(iommu->domains[did], addr, mask);
+
 	return rc;
 }

@@ -1460,6 +1535,7 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 	unsigned long ndomains;
 	int id;
 	int agaw;
+	struct device_domain_info *info;

 	pr_debug("Set context mapping for %02x:%02x.%d\n",
 		bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
@@ -1525,7 +1601,11 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 	context_set_domain_id(context, id);
 	context_set_address_width(context, iommu->agaw);
 	context_set_address_root(context, virt_to_phys(pgd));
-	context_set_translation_type(context, CONTEXT_TT_MULTI_LEVEL);
+	info = iommu_support_dev_iotlb(domain, bus, devfn);
+	if (info)
+		context_set_translation_type(context, CONTEXT_TT_DEV_IOTLB);
+	else
+		context_set_translation_type(context, CONTEXT_TT_MULTI_LEVEL);
 	context_set_fault_enable(context);
 	context_set_present(context);
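The source-id passed to `qi_flush_dev_iotlb()` is simply the PCI bus number in the high byte and the devfn in the low byte. The following is a small userspace sketch of that packing, with hypothetical helper names (`pci_source_id()`, `devfn_slot()`, `devfn_func()`); the slot and function split follows the usual 5-bit device / 3-bit function layout of a devfn.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack bus and devfn into a 16-bit source-id. */
static uint16_t pci_source_id(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)((bus << 8) | devfn);
}

/* Hypothetical helper: device (slot) number, the upper 5 bits of devfn. */
static unsigned int devfn_slot(uint8_t devfn)
{
	return (devfn >> 3) & 0x1f;
}

/* Hypothetical helper: function number, the lower 3 bits of devfn. */
static unsigned int devfn_func(uint8_t devfn)
{
	return devfn & 0x07;
}
```

For example, device 02:03.0 has devfn 0x18 and therefore source-id 0x0218, which is the value the invalidation descriptor would carry for that Endpoint.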
Re: [PATCH 01/10] KVM: Add a route layer to convert MSI message to GSI
Hi Sheng,

On Wed, Jan 07, 2009 at 06:42:37PM +0800, Sheng Yang wrote:
> Avi's purpose, to use single kvm_set_irq() to deal with all interrupt,
> including MSI. So here is it.
>
> struct gsi_route_entry is a mapping from a special gsi(with
> KVM_GSI_MSG_MASK) to MSI/MSI-X message address/data. And the struct can
> also be extended for other purpose.
>
> Now we support up to 256 gsi_route_entry mapping, and gsi is allocated
> by kernel and provide two ioctls to userspace, which is more flexible.
>
> Signed-off-by: Sheng Yang sh...@linux.intel.com
> ---
>  include/linux/kvm.h      |  26 +++
>  include/linux/kvm_host.h |  20 +
>  virt/kvm/irq_comm.c      |  70 ++
>  virt/kvm/kvm_main.c      | 106 ++
>  4 files changed, 222 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index 71c150f..bbefce6 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -399,6 +399,9 @@ struct kvm_trace_rec {
>  #if defined(CONFIG_X86)
>  #define KVM_CAP_REINJECT_CONTROL 24
>  #endif
> +#if defined(CONFIG_X86)
> +#define KVM_CAP_GSI_ROUTE 25
> +#endif
>
>  /*
>   * ioctls for VM fds
> @@ -433,6 +436,8 @@ struct kvm_trace_rec {
>  #define KVM_ASSIGN_IRQ _IOR(KVMIO, 0x70, \
>  			    struct kvm_assigned_irq)
>  #define KVM_REINJECT_CONTROL _IO(KVMIO, 0x71)
> +#define KVM_REQUEST_GSI_ROUTE _IOWR(KVMIO, 0x72, void *)
> +#define KVM_FREE_GSI_ROUTE _IOR(KVMIO, 0x73, void *)
>
>  /*
>   * ioctls for vcpu fds
> @@ -553,4 +558,25 @@ struct kvm_assigned_irq {
>  #define KVM_DEV_IRQ_ASSIGN_MSI_ACTION	KVM_DEV_IRQ_ASSIGN_ENABLE_MSI
>  #define KVM_DEV_IRQ_ASSIGN_ENABLE_MSI	(1 << 0)
>
> +struct kvm_gsi_route_guest {
> +	__u32 entries_nr;
> +	struct kvm_gsi_route_entry_guest *entries;
> +};
> +
> +#define KVM_GSI_ROUTE_MSI	(1 << 0)
> +struct kvm_gsi_route_entry_guest {
> +	__u32 gsi;
> +	__u32 type;
> +	__u32 flags;
> +	__u32 reserved;
> +	union {
> +		struct {
> +			__u32 addr_lo;
> +			__u32 addr_hi;
> +			__u32 data;
> +		} msi;
> +		__u32 padding[8];
> +	};
> +};
> +
>  #endif
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index a8bcad0..6a00201 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -136,6 +136,9 @@ struct kvm {
>  	unsigned long mmu_notifier_seq;
>  	long mmu_notifier_count;
>  #endif
> +	struct hlist_head gsi_route_list;
> +#define KVM_NR_GSI_ROUTE_ENTRIES 256
> +	DECLARE_BITMAP(gsi_route_bitmap, KVM_NR_GSI_ROUTE_ENTRIES);
>  };
>
>  /* The guest did something we don't support. */
> @@ -336,6 +339,19 @@ void kvm_unregister_irq_mask_notifier(struct kvm *kvm, int irq,
>  				      struct kvm_irq_mask_notifier *kimn);
>  void kvm_fire_mask_notifiers(struct kvm *kvm, int irq, bool mask);
>
> +#define KVM_GSI_ROUTE_MASK 0x100ull
> +struct kvm_gsi_route_entry {
> +	u32 gsi;
> +	u32 type;
> +	u32 flags;
> +	u32 reserved;
> +	union {
> +		struct msi_msg msi;
> +		u32 reserved[8];
> +	};
> +	struct hlist_node link;
> +};
> +
>  void kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level);
>  void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi);
>  void kvm_register_irq_ack_notifier(struct kvm *kvm,
> @@ -343,6 +359,10 @@ void kvm_register_irq_ack_notifier(struct kvm *kvm,
>  void kvm_unregister_irq_ack_notifier(struct kvm_irq_ack_notifier *kian);
>  int kvm_request_irq_source_id(struct kvm *kvm);
>  void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);
> +int kvm_update_gsi_route(struct kvm *kvm, struct kvm_gsi_route_entry *entry);
> +struct kvm_gsi_route_entry *kvm_find_gsi_route_entry(struct kvm *kvm, u32 gsi);
> +void kvm_free_gsi_route(struct kvm *kvm, struct kvm_gsi_route_entry *entry);
> +void kvm_free_gsi_route_list(struct kvm *kvm);
>
>  #ifdef CONFIG_DMAR
>  int kvm_iommu_map_pages(struct kvm *kvm, gfn_t base_gfn,
> diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
> index 5162a41..7460e7f 100644
> --- a/virt/kvm/irq_comm.c
> +++ b/virt/kvm/irq_comm.c
> @@ -123,3 +123,73 @@ void kvm_fire_mask_notifiers(struct kvm *kvm, int irq, bool mask)
>  			kimn->func(kimn, mask);
>  }
>
> +int kvm_update_gsi_route(struct kvm *kvm, struct kvm_gsi_route_entry *entry)
> +{
> +	struct kvm_gsi_route_entry *found_entry, *new_entry;
> +	int r, gsi;
> +
> +	mutex_lock(&kvm->lock);
> +	/* Find whether we need a update or a new entry */
> +	found_entry = kvm_find_gsi_route_entry(kvm, entry->gsi);
> +	if (found_entry)
> +		*found_entry = *entry;
> +	else {

Having a kvm_find_alloc_gsi_route_entry which either returns a present
entry if found or returns a newly allocated one makes the code easier
to read for me. Then just

entry = kvm_find_alloc_gsi_route_entry
*entry =
[ kvm-Bugs-2030703 ] Virtio Vista drivers
Bugs item #2030703, was opened at 2008-07-29 05:05
Message generated for change (Comment added) made by roy-anonymous
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2030703&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Ross Patterson (rossp)
Assigned to: Nobody/Anonymous (nobody)
Summary: Virtio Vista drivers

Initial Comment:
Neither the Windows 2000 nor the Windows XP drivers for the paravirtualized ethernet adapter or block device seem to work under Windows Vista. It would be nice to have Vista compatible drivers.

--
Comment By: roy anonymous (roy-anonymous)
Date: 2009-01-08 01:09

Message:
Is there really a block device driver for win2k and winxp??
[ kvm-Bugs-2490866 ] repeatable corruption with qcow2 on kvm-79
Bugs item #2490866, was opened at 2009-01-07 05:10
Message generated for change (Comment added) made by roy-anonymous
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2490866&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: qemu
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Adrian Bridgett (abridgett)
Assigned to: Nobody/Anonymous (nobody)
Summary: repeatable corruption with qcow2 on kvm-79

Initial Comment:
Creating a qcow2 image, mkfs.ext3, sometimes mounting it would fail immediately, but in all cases it would corrupt (overwritten with zeros) after starting up backuppc on it.

This is KVM-79 on a Debian lenny host and guest. This occurred using virtio or not. Swapping to a raw file or LV worked flawlessly. I've tested the box with memtest and I don't have issues elsewhere, but I've seen corruptions on other images.

Host and guest are both 2.6.26-1-amd64 kernel (Debian lenny). I'm running 32-bit userspace everywhere. Dual core Intel Core2 E6300.

I see KVM-81 has "improve qcow2 data integrity with cache=writethrough", which _might_ be what I'm hitting, but I can't find more details about this to check (and backport the patch to the Debian package or wait for a newer Debian package).

thanks.

--
Comment By: roy anonymous (roy-anonymous)
Date: 2009-01-08 01:14

Message:
I am not quite sure whether it's true or not. In my case, I get corruption if I do a new FC9 guest installation on qcow2 with virtio_blk. But there is no problem if I do an FC8 qcow2 installation and then upgrade to FC9 with virtio_blk.

--
Comment By: Laszlo Dvornik (ldvornik)
Date: 2009-01-07 15:42

Message:
Same problem here. With Lenny and vanilla 2.6.28 kernel, with KVM 79, and with KVM 82 user tools. Tried with the KVM 82 module compiled for 2.6.28 and with the 2.6.28 builtin KVM sources. 32-bit userspace and kernel, Intel C2D T7100.

Another effect: with empty qcow2 and vmdk disk image formats, when I try to create a partition and save the new partition table, it is not saved until reboot. With the raw image format there is no such problem. I would have liked to try with qcow, but:

qemu: could not open disk image teszt.qcow

I switched all of my disk images to raw until the problem is fixed.

PS: The host filesystem is ext4, but I tested under an ext3 filesystem too and the problem didn't disappear.
[PATCH 2/5][RFC] virtio-net: Add load/save for status bits
virtio-net: Add load/save for status bits

Signed-off-by: Alex Williamson alex.william...@hp.com
---

 hw/virtio-net.c | 10 --
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index bfb7510..77e3077 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -16,6 +16,8 @@
 #include "qemu-timer.h"
 #include "virtio-net.h"
 
+#define VIRTIO_VM_VERSION    2
+
 typedef struct VirtIONet
 {
     VirtIODevice vdev;
@@ -307,13 +309,14 @@ static void virtio_net_save(QEMUFile *f, void *opaque)
 
     qemu_put_buffer(f, n->mac, 6);
     qemu_put_be32(f, n->tx_timer_active);
+    qemu_put_be16(f, n->status);
 }
 
 static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
 {
     VirtIONet *n = opaque;
 
-    if (version_id != 1)
+    if (version_id < 1 || version_id > VIRTIO_VM_VERSION)
         return -EINVAL;
 
     virtio_load(&n->vdev, f);
@@ -321,6 +324,9 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
     qemu_get_buffer(f, n->mac, 6);
     n->tx_timer_active = qemu_get_be32(f);
 
+    if (version_id >= 2)
+        n->status = qemu_get_be16(f);
+
     if (n->tx_timer_active) {
         qemu_mod_timer(n->tx_timer,
                        qemu_get_clock(vm_clock) + TX_TIMER_INTERVAL);
@@ -363,7 +369,7 @@ PCIDevice *virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn)
     n->tx_timer_active = 0;
     n->mergeable_rx_bufs = 0;
 
-    register_savevm("virtio-net", virtio_net_id++, 1,
+    register_savevm("virtio-net", virtio_net_id++, VIRTIO_VM_VERSION,
                     virtio_net_save, virtio_net_load, n);
 
     return (PCIDevice *)n;
[PATCH 4/5][RFC] virtio-net: Enable filtering based on MAC, promisc, broadcast and allmulti
virtio-net: Enable filtering based on MAC, promisc, broadcast and allmulti

Signed-off-by: Alex Williamson alex.william...@hp.com
---

 hw/virtio-net.c | 22 ++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 653cad4..fa8e71c 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -167,6 +167,25 @@ static int receive_header(VirtIONet *n, struct iovec *iov, int iovcnt,
     return offset;
 }
 
+static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
+{
+    static uint8_t bcast[] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
+
+    if (n->status.bits.promisc)
+        return 1;
+
+    if ((buf[0] & 1) && n->status.bits.allmulti)
+        return 1;
+
+    if (!memcmp(buf, bcast, sizeof(bcast)))
+        return 1;
+
+    if (!memcmp(buf, n->mac, 6))
+        return 1;
+
+    return 0;
+}
+
 static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
 {
     VirtIONet *n = opaque;
@@ -176,6 +195,9 @@ static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
     if (!do_virtio_net_can_receive(n, size))
         return;
 
+    if (!receive_filter(n, buf, size))
+        return;
+
     /* hdr_len refers to the header we supply to the guest */
     hdr_len = n->mergeable_rx_bufs ?
         sizeof(struct virtio_net_hdr_mrg_rxbuf) : sizeof(struct virtio_net_hdr);
[PATCH 5/5][RFC] virtio-net: Add additional MACs via a filter table
virtio-net: Add additional MACs via a filter table Signed-off-by: Alex Williamson alex.william...@hp.com --- hw/virtio-net.c | 27 +-- hw/virtio-net.h |4 2 files changed, 29 insertions(+), 2 deletions(-) diff --git a/hw/virtio-net.c b/hw/virtio-net.c index fa8e71c..f7cc36f 100644 --- a/hw/virtio-net.c +++ b/hw/virtio-net.c @@ -16,7 +16,7 @@ #include qemu-timer.h #include virtio-net.h -#define VIRTIO_VM_VERSION 2 +#define VIRTIO_VM_VERSION 3 typedef struct VirtIONet { @@ -28,6 +28,7 @@ typedef struct VirtIONet uint16_t link:1; uint16_t promisc:1; uint16_t allmulti:1; +uint16_t mac_table:1; } bits; } status; VirtQueue *rx_vq; @@ -36,6 +37,7 @@ typedef struct VirtIONet QEMUTimer *tx_timer; int tx_timer_active; int mergeable_rx_bufs; +uint64_t mac_table[16]; } VirtIONet; /* TODO @@ -54,6 +56,7 @@ static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config) netcfg.status.raw = n-status.raw; memcpy(netcfg.mac, n-mac, 6); +memcpy(netcfg.mac_table, n-mac_table, sizeof(netcfg.mac_table)); memcpy(config, netcfg, sizeof(netcfg)); } @@ -77,7 +80,12 @@ static void virtio_net_set_config(VirtIODevice *vdev, const uint8_t *config) n-status.bits.promisc = netcfg.status.bits.promisc; if (netcfg.status.bits.allmulti != n-status.bits.allmulti) n-status.bits.allmulti = netcfg.status.bits.allmulti; + if (netcfg.status.bits.mac_table != n-status.bits.mac_table) +n-status.bits.mac_table = netcfg.status.bits.mac_table; } + +if (memcmp(n-mac_table, netcfg.mac_table, sizeof(n-mac_table))) +memcpy(n-mac_table, netcfg.mac_table, sizeof(n-mac_table)); } static void virtio_net_set_link_status(VLANClientState *vc) @@ -92,7 +100,8 @@ static void virtio_net_set_link_status(VLANClientState *vc) static uint32_t virtio_net_get_features(VirtIODevice *vdev) { -uint32_t features = (1 VIRTIO_NET_F_MAC) | (1 VIRTIO_NET_F_STATUS); +uint32_t features = (1 VIRTIO_NET_F_MAC) | (1 VIRTIO_NET_F_STATUS) | +(1 VIRTIO_NET_F_MAC_TABLE); return features; } @@ -170,6 +179,7 @@ static int 
receive_header(VirtIONet *n, struct iovec *iov, int iovcnt, static int receive_filter(VirtIONet *n, const uint8_t *buf, int size) { static uint8_t bcast[] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff}; +int i; if (n-status.bits.promisc) return 1; @@ -183,6 +193,15 @@ static int receive_filter(VirtIONet *n, const uint8_t *buf, int size) if (!memcmp(buf, n-mac, 6)) return 1; +if (n-status.bits.mac_table) { +for (i = 0; i 16; i++) { +uint8_t *mac = (uint8_t *)n-mac_table[i]; + +if (mac[7] !memcmp(buf, mac, 6)) +return 1; +} +} + return 0; } @@ -342,6 +361,7 @@ static void virtio_net_save(QEMUFile *f, void *opaque) qemu_put_buffer(f, n-mac, 6); qemu_put_be32(f, n-tx_timer_active); qemu_put_be16(f, n-status.raw); +qemu_put_buffer(f, (uint8_t *)n-mac_table, sizeof(n-mac_table)); } static int virtio_net_load(QEMUFile *f, void *opaque, int version_id) @@ -361,6 +381,9 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id) else n-status.raw |= (VIRTIO_NET_S_PROMISC | VIRTIO_NET_S_ALLMULTI); +if (version_id = 3) + qemu_get_buffer(f, (uint8_t *)n-mac_table, sizeof(n-mac_table)); + if (n-tx_timer_active) { qemu_mod_timer(n-tx_timer, qemu_get_clock(vm_clock) + TX_TIMER_INTERVAL); diff --git a/hw/virtio-net.h b/hw/virtio-net.h index 74f1595..532c7c4 100644 --- a/hw/virtio-net.h +++ b/hw/virtio-net.h @@ -38,10 +38,12 @@ #define VIRTIO_NET_F_HOST_UFO 14 /* Host can handle UFO in. */ #define VIRTIO_NET_F_MRG_RXBUF 15 /* Host can merge receive buffers. 
*/ #define VIRTIO_NET_F_STATUS 16 /* virtio_net_config.status available */ +#define VIRTIO_NET_F_MAC_TABLE 17 /* Additional MAC addresses */ #define VIRTIO_NET_S_LINK_UP1 /* Link is up */ #define VIRTIO_NET_S_PROMISC2 /* Promiscuous mode */ #define VIRTIO_NET_S_ALLMULTI 4 /* All-multicast mode */ +#define VIRTIO_NET_S_MAC_TABLE 8 /* Enable MAC filter table */ #define TX_TIMER_INTERVAL 15 /* 150 us */ @@ -59,8 +61,10 @@ struct virtio_net_config uint16_t link:1; uint16_t promisc:1; uint16_t allmulti:1; +uint16_t mac_table:1; } bits; } status; +uint64_t mac_table[16]; } __attribute__((packed)); /* This is the first element of the scatter-gather list. If you don't -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
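The table layout used by this patch can be hard to see in the flattened diff, so here is a minimal standalone sketch of it. It assumes, from reading the host and guest patches together, that each 8-byte slot holds the address in bytes 0 to 5 and a "valid" flag in byte 7. The function names and the flat byte-array representation are hypothetical and are not part of the series.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_MAC_ENTRY_SIZE 8   /* one uint64_t slot per address */

/* Fill one slot: address in bytes 0..5, byte 6 unused, byte 7 = valid. */
static void example_mac_entry_pack(uint8_t *entry, const uint8_t *mac)
{
    memset(entry, 0, EXAMPLE_MAC_ENTRY_SIZE);
    memcpy(entry, mac, 6);
    entry[7] = 1;
}

/* Return 1 if dst matches any valid slot in the table, 0 otherwise. */
static int example_mac_table_match(const uint8_t *table, size_t entries,
                                   const uint8_t *dst)
{
    size_t i;

    for (i = 0; i < entries; i++) {
        const uint8_t *entry = table + i * EXAMPLE_MAC_ENTRY_SIZE;

        /* An all-zero slot has the valid byte clear and never matches. */
        if (entry[7] && !memcmp(dst, entry, 6))
            return 1;
    }
    return 0;
}
```

Keeping the valid flag inside the slot means an unused (zeroed) slot cannot accidentally match a frame addressed to 00:00:00:00:00:00.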
[ kvm-Bugs-2030703 ] Virtio Vista drivers
Bugs item #2030703, was opened at 2008-07-28 23:05
Message generated for change (Comment added) made by martinmaurer
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2030703&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Ross Patterson (rossp)
Assigned to: Nobody/Anonymous (nobody)
Summary: Virtio Vista drivers

Initial Comment:
Neither the Windows 2000 nor the Windows XP drivers for the paravirtualized ethernet adapter or block device seem to work under Windows Vista. It would be nice to have Vista compatible drivers.

--
Comment By: martinmaurer (martinmaurer)
Date: 2009-01-07 18:36

Message:
There are drivers for Vista (network). For the latest release, see
https://sourceforge.net/project/showfiles.php?group_id=180599&package_id=267944

(virtio drivers for a block device on Windows are not available)

--
Comment By: roy anonymous (roy-anonymous)
Date: 2009-01-07 18:09

Message:
Is there really a block device driver for win2k and winxp??
[PATCH 0/5][RFC] virtio-net: MAC filtering
This series is based on some of the work Mark McLoughlin has been doing, so isn't going to apply until that makes it into the tree. The goal is to enable MAC filtering at the qemu/kvm level for virtio-net packets. I start by adding the capability to set the MAC address, naming the bits in the status field, enabling filtering, and finally adding a MAC table for additional MAC addresses. If this looks reasonable, I'll follow up with VLAN filtering support.

A concern here is the growing size of the virtio-net I/O port space config. This series brings it up to 256 bytes with PCI resource rounding. The VLAN filter bitmap would increase that by another 512 bytes, making it 1kB and limiting us to something less than 64 such devices per guest. Is anyone worried? Should filter tables live in MMIO space for virtio devices?

I'll send out the guest side patches for virtio-net in a separate thread.

Thanks,

Alex

--
Alex Williamson HP Open Source Linux Org.
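The sizing concern can be checked with a little arithmetic. The sketch below is not from the series: it assumes a 20-byte legacy virtio-pci header in front of the device config, a BAR size rounded up to a power of two, and the 64 KiB x86 I/O port space. The struct, macro, and function names are hypothetical, and `__attribute__((packed))` assumes GCC or Clang.

```c
#include <stdint.h>

/* Hypothetical mirror of the config layout proposed in this series. */
struct example_net_config {
    uint8_t  mac[6];
    uint16_t status;
    uint64_t mac_table[16];
} __attribute__((packed));

#define EXAMPLE_VIRTIO_PCI_HDR  20u           /* assumed legacy header size */
#define EXAMPLE_VLAN_BITMAP     (4096u / 8u)  /* 4k VLAN IDs, one bit each */
#define EXAMPLE_IOPORT_SPACE    65536u        /* assumed x86 I/O port space */

/* PCI BAR sizes are powers of two, so round the region size up. */
static unsigned example_bar_size(unsigned bytes)
{
    unsigned size = 1;

    while (size < bytes)
        size <<= 1;
    return size;
}

/* Upper bound on devices if each one consumes a BAR of the given size. */
static unsigned example_max_devices(unsigned bar_size)
{
    return EXAMPLE_IOPORT_SPACE / bar_size;
}
```

Under these assumptions the config is 136 bytes, 156 bytes with the header, which rounds up to 256. Adding the 512-byte VLAN bitmap gives 668 bytes, which rounds up to 1024 and leaves room for at most 64 such BARs. The real limit is lower because other devices also occupy I/O ports.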
[PATCH 1/5][RFC] virtio-net: Allow setting the MAC address via set_config
virtio-net: Allow setting the MAC address via set_config

Rename get_config for simplicity

Signed-off-by: Alex Williamson alex.william...@hp.com
---

 hw/virtio-net.c | 21 +++--
 1 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 2c41b3e..bfb7510 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -38,7 +38,7 @@ static VirtIONet *to_virtio_net(VirtIODevice *vdev)
     return (VirtIONet *)vdev;
 }
 
-static void virtio_net_update_config(VirtIODevice *vdev, uint8_t *config)
+static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config)
 {
     VirtIONet *n = to_virtio_net(vdev);
     struct virtio_net_config netcfg;
@@ -48,6 +48,22 @@ static void virtio_net_update_config(VirtIODevice *vdev, uint8_t *config)
     memcpy(config, &netcfg, sizeof(netcfg));
 }
 
+static void virtio_net_set_config(VirtIODevice *vdev, const uint8_t *config)
+{
+    VirtIONet *n = to_virtio_net(vdev);
+    struct virtio_net_config netcfg;
+
+    memcpy(&netcfg, config, sizeof(netcfg));
+
+    if (memcmp(netcfg.mac, n->mac, 6)) {
+        memcpy(n->mac, netcfg.mac, 6);
+        snprintf(n->vc->info_str, sizeof(n->vc->info_str),
+                 "virtio macaddr=%02x:%02x:%02x:%02x:%02x:%02x",
+                 n->mac[0], n->mac[1], n->mac[2],
+                 n->mac[3], n->mac[4], n->mac[5]);
+    }
+}
+
 static void virtio_net_set_link_status(VLANClientState *vc)
 {
     VirtIONet *n = vc->opaque;
@@ -326,7 +342,8 @@ PCIDevice *virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn)
     if (!n)
         return NULL;
 
-    n->vdev.get_config = virtio_net_update_config;
+    n->vdev.get_config = virtio_net_get_config;
+    n->vdev.set_config = virtio_net_set_config;
     n->vdev.get_features = virtio_net_get_features;
     n->vdev.set_features = virtio_net_set_features;
     n->rx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_rx);
[PATCH 3/5][RFC] virtio-net: Name the status bits, adding promisc and allmulti
virtio-net: Name the status bits, adding promisc and allmulti Signed-off-by: Alex Williamson alex.william...@hp.com --- hw/virtio-net.c | 36 hw/virtio-net.h | 11 ++- 2 files changed, 34 insertions(+), 13 deletions(-) diff --git a/hw/virtio-net.c b/hw/virtio-net.c index 77e3077..653cad4 100644 --- a/hw/virtio-net.c +++ b/hw/virtio-net.c @@ -22,7 +22,14 @@ typedef struct VirtIONet { VirtIODevice vdev; uint8_t mac[6]; -uint16_t status; +union { +uint16_t raw; +struct { +uint16_t link:1; +uint16_t promisc:1; +uint16_t allmulti:1; +} bits; +} status; VirtQueue *rx_vq; VirtQueue *tx_vq; VLANClientState *vc; @@ -45,7 +52,7 @@ static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config) VirtIONet *n = to_virtio_net(vdev); struct virtio_net_config netcfg; -netcfg.status = n-status; +netcfg.status.raw = n-status.raw; memcpy(netcfg.mac, n-mac, 6); memcpy(config, netcfg, sizeof(netcfg)); } @@ -64,20 +71,23 @@ static void virtio_net_set_config(VirtIODevice *vdev, const uint8_t *config) n-mac[0], n-mac[1], n-mac[2], n-mac[3], n-mac[4], n-mac[5]); } + +if (netcfg.status.raw != n-status.raw) { + if (netcfg.status.bits.promisc != n-status.bits.promisc) +n-status.bits.promisc = netcfg.status.bits.promisc; + if (netcfg.status.bits.allmulti != n-status.bits.allmulti) +n-status.bits.allmulti = netcfg.status.bits.allmulti; +} } static void virtio_net_set_link_status(VLANClientState *vc) { VirtIONet *n = vc-opaque; -uint16_t old_status = n-status; - -if (vc-link_down) -n-status = ~VIRTIO_NET_S_LINK_UP; -else -n-status |= VIRTIO_NET_S_LINK_UP; -if (n-status != old_status) +if (n-status.bits.link != !(vc-link_down)) { + n-status.bits.link = !(vc-link_down); virtio_notify_config(n-vdev); +} } static uint32_t virtio_net_get_features(VirtIODevice *vdev) @@ -309,7 +319,7 @@ static void virtio_net_save(QEMUFile *f, void *opaque) qemu_put_buffer(f, n-mac, 6); qemu_put_be32(f, n-tx_timer_active); -qemu_put_be16(f, n-status); +qemu_put_be16(f, n-status.raw); } static int 
virtio_net_load(QEMUFile *f, void *opaque, int version_id) @@ -325,7 +335,9 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id) n-tx_timer_active = qemu_get_be32(f); if (version_id = 2) -n-status = qemu_get_be16(f); +n-status.raw = qemu_get_be16(f); +else +n-status.raw |= (VIRTIO_NET_S_PROMISC | VIRTIO_NET_S_ALLMULTI); if (n-tx_timer_active) { qemu_mod_timer(n-tx_timer, @@ -355,7 +367,7 @@ PCIDevice *virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn) n-rx_vq = virtio_add_queue(n-vdev, 256, virtio_net_handle_rx); n-tx_vq = virtio_add_queue(n-vdev, 256, virtio_net_handle_tx); memcpy(n-mac, nd-macaddr, 6); -n-status = VIRTIO_NET_S_LINK_UP; +n-status.raw = VIRTIO_NET_S_LINK_UP; n-vc = qemu_new_vlan_client(nd-vlan, nd-model, nd-name, virtio_net_receive, virtio_net_can_receive, n); n-vc-link_status_changed = virtio_net_set_link_status; diff --git a/hw/virtio-net.h b/hw/virtio-net.h index 9ac9e34..74f1595 100644 --- a/hw/virtio-net.h +++ b/hw/virtio-net.h @@ -40,6 +40,8 @@ #define VIRTIO_NET_F_STATUS 16 /* virtio_net_config.status available */ #define VIRTIO_NET_S_LINK_UP1 /* Link is up */ +#define VIRTIO_NET_S_PROMISC2 /* Promiscuous mode */ +#define VIRTIO_NET_S_ALLMULTI 4 /* All-multicast mode */ #define TX_TIMER_INTERVAL 15 /* 150 us */ @@ -51,7 +53,14 @@ struct virtio_net_config /* The config defining mac address (6 bytes) */ uint8_t mac[6]; /* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */ -uint16_t status; +union { +uint16_t raw; +struct { +uint16_t link:1; +uint16_t promisc:1; +uint16_t allmulti:1; +} bits; +} status; } __attribute__((packed)); /* This is the first element of the scatter-gather list. If you don't -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/2][RFC] virtio_net: MAC filtering
This series builds on some of the patches Mark McLoughlin has sent out recently, so likely won't apply to any current trees until those get upstream. The goal is to enable MAC filtering at the kvm/qemu level for virtio-net packets. Promiscuous and allmulti mode are handled by adding bits to Mark's proposed status field. I also add a 16 entry MAC table for additional unicast and multicast addresses to filter. If this looks reasonable, I'll follow up with VLAN filtering.

As noted in the RFC thread adding the kvm/qemu backing, this does increase the size of the virtio-net device I/O port space, up to 1kB with PCI rounding if we add a 4k entry VLAN bitmap. A 64 device limit is still pretty high for a VM, but maybe we should think about adding MMIO space for virtio-pci.

Thanks,

Alex

--
Alex Williamson HP Open Source Linux Org.
[PATCH 1/2][RFC] virtio_net: Enable setting MAC, promisc, and allmulti mode
virtio_net: Enable setting MAC, promisc, and allmulti mode Signed-off-by: Alex Williamson alex.william...@hp.com --- drivers/net/virtio_net.c | 79 include/linux/virtio_net.h | 11 ++ 2 files changed, 82 insertions(+), 8 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 3af5e33..f502edd 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -41,7 +41,14 @@ struct virtnet_info struct virtqueue *rvq, *svq; struct net_device *dev; struct napi_struct napi; - unsigned int status; + union { + u16 raw; + struct { + u16 link:1; + u16 promisc:1; + u16 allmulti:1; + } bits; + } status; /* The skb we couldn't send because buffers were full. */ struct sk_buff *last_xmit_skb; @@ -476,6 +483,54 @@ static int virtnet_set_tx_csum(struct net_device *dev, u32 data) return ethtool_op_set_tx_hw_csum(dev, data); } +static int virtnet_set_mac_address(struct net_device *dev, void *p) +{ + struct virtnet_info *vi = netdev_priv(dev); + struct virtio_device *vdev = vi-vdev; + struct sockaddr *addr = p; + + if (!is_valid_ether_addr(addr-sa_data)) + return -EADDRNOTAVAIL; + + memcpy(dev-dev_addr, addr-sa_data, dev-addr_len); + + vdev-config-set(vdev, offsetof(struct virtio_net_config, mac), + dev-dev_addr, dev-addr_len); + + return 0; +} + +static void virtnet_set_rx_mode(struct net_device *dev) +{ + struct virtnet_info *vi = netdev_priv(dev); + struct virtio_device *vdev = vi-vdev; + u16 status = vi-status.raw; + + if (!virtio_has_feature(vi-vdev, VIRTIO_NET_F_STATUS)) + return; + + if (dev-flags IFF_PROMISC) + status |= VIRTIO_NET_S_PROMISC; + else + status = ~VIRTIO_NET_S_PROMISC; + + if (dev-flags IFF_ALLMULTI) + status |= VIRTIO_NET_S_ALLMULTI; + else + status = ~VIRTIO_NET_S_ALLMULTI; + + if (dev-uc_count) + status |= VIRTIO_NET_S_PROMISC; + if (dev-mc_count) + status |= VIRTIO_NET_S_ALLMULTI; + + if (status != vi-status.raw) { + vi-status.raw = status; + vdev-config-set(vdev, offsetof(struct virtio_net_config, + status), vi-status, 
sizeof(vi-status)); + } +} + static struct ethtool_ops virtnet_ethtool_ops = { .set_tx_csum = virtnet_set_tx_csum, .set_sg = ethtool_op_set_sg, @@ -494,14 +549,15 @@ static void virtnet_update_status(struct virtnet_info *vi) v, sizeof(v)); /* Ignore unknown (future) status bits */ - v = VIRTIO_NET_S_LINK_UP; + v = VIRTIO_NET_S_LINK_UP | VIRTIO_NET_S_PROMISC | + VIRTIO_NET_S_ALLMULTI; - if (vi-status == v) + if (vi-status.raw == v) return; - vi-status = v; + vi-status.raw = v; - if (vi-status VIRTIO_NET_S_LINK_UP) { + if (vi-status.bits.link) { netif_carrier_on(vi-dev); netif_wake_queue(vi-dev); } else { @@ -563,8 +619,17 @@ static int virtnet_probe(struct virtio_device *vdev) vdev-config-get(vdev, offsetof(struct virtio_net_config, mac), dev-dev_addr, dev-addr_len); - } else + } else { + struct sockaddr addr; + random_ether_addr(dev-dev_addr); + memset(addr, 0, sizeof(addr)); + memcpy(addr.sa_data, dev-dev_addr, dev-addr_len); + virtnet_set_mac_address(dev, addr); + } + + dev-set_mac_address = virtnet_set_mac_address; + dev-set_rx_mode = virtnet_set_rx_mode; /* Set up our device-specific information */ vi = netdev_priv(dev); @@ -621,7 +686,7 @@ static int virtnet_probe(struct virtio_device *vdev) goto unregister; } - vi-status = VIRTIO_NET_S_LINK_UP; + vi-status.raw = VIRTIO_NET_S_LINK_UP; virtnet_update_status(vi); pr_debug(virtnet: registered device %s\n, dev-name); diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h index d9174be..5a70edb 100644 --- a/include/linux/virtio_net.h +++ b/include/linux/virtio_net.h @@ -23,6 +23,8 @@ #define VIRTIO_NET_F_STATUS16 /* virtio_net_config.status available */ #define VIRTIO_NET_S_LINK_UP 1 /* Link is up */ +#define VIRTIO_NET_S_PROMISC 2 /* Promiscuous mode */ +#define VIRTIO_NET_S_ALLMULTI 4 /* All-multicast mode */ struct virtio_net_config { @@ -30,7 +32,14 @@ struct virtio_net_config __u8 mac[6]; /* Status supplied by host; see
[PATCH 2/2][RFC] virtio_net: Add MAC filter table support
virtio_net: Add MAC fitler table support Signed-off-by: Alex Williamson alex.william...@hp.com --- drivers/net/virtio_net.c | 52 +--- include/linux/virtio_net.h |6 - 2 files changed, 54 insertions(+), 4 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index f502edd..d751711 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -505,6 +505,8 @@ static void virtnet_set_rx_mode(struct net_device *dev) struct virtnet_info *vi = netdev_priv(dev); struct virtio_device *vdev = vi-vdev; u16 status = vi-status.raw; + struct dev_addr_list *uc_ptr, *mc_ptr; + int i; if (!virtio_has_feature(vi-vdev, VIRTIO_NET_F_STATUS)) return; @@ -519,11 +521,55 @@ static void virtnet_set_rx_mode(struct net_device *dev) else status = ~VIRTIO_NET_S_ALLMULTI; - if (dev-uc_count) + if (!virtio_has_feature(vi-vdev, VIRTIO_NET_F_MAC_TABLE)) { + if (dev-uc_count) + status |= VIRTIO_NET_S_PROMISC; + if (dev-mc_count) + status |= VIRTIO_NET_S_ALLMULTI; + if (status != vi-status.raw) { + vi-status.raw = status; + vdev-config-set(vdev, + offsetof(struct virtio_net_config, + status), vi-status, + sizeof(vi-status)); + } + return; + } + + if (dev-uc_count 16) { status |= VIRTIO_NET_S_PROMISC; - if (dev-mc_count) + if (dev-mc_count 16) + status |= VIRTIO_NET_S_ALLMULTI; + } else if (dev-uc_count + dev-mc_count 16) status |= VIRTIO_NET_S_ALLMULTI; + if ((dev-uc_count !(status VIRTIO_NET_S_PROMISC)) || + (dev-mc_count !(status VIRTIO_NET_S_ALLMULTI))) + status |= VIRTIO_NET_S_MAC_TABLE; + else + status = ~VIRTIO_NET_S_MAC_TABLE; + + uc_ptr = dev-uc_list; + mc_ptr = dev-mc_list; + + for (i = 0; i 16; i++) { + uint8_t entry[8] = { 0 }; + + if (uc_ptr !(status VIRTIO_NET_S_PROMISC)) { + memcpy(entry, uc_ptr-da_addr, 6); + entry[7] = 1; + uc_ptr = uc_ptr-next; + } else if (mc_ptr !(status VIRTIO_NET_S_ALLMULTI)) { + memcpy(entry, mc_ptr-da_addr, 6); + entry[7] = 1; + mc_ptr = mc_ptr-next; + } + + vdev-config-set(vdev, offsetof(struct virtio_net_config, + 
mac_table) + (sizeof(entry) * i), + entry, sizeof(entry)); + } + if (status != vi-status.raw) { vi-status.raw = status; vdev-config-set(vdev, offsetof(struct virtio_net_config, @@ -744,7 +790,7 @@ static unsigned int features[] = { VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_UFO, VIRTIO_NET_F_HOST_TSO6, VIRTIO_NET_F_HOST_ECN, VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6, VIRTIO_NET_F_GUEST_ECN, /* We don't yet handle UFO input. */ - VIRTIO_NET_F_STATUS, + VIRTIO_NET_F_STATUS, VIRTIO_NET_F_MAC_TABLE, VIRTIO_F_NOTIFY_ON_EMPTY, }; diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h index 5a70edb..905319b 100644 --- a/include/linux/virtio_net.h +++ b/include/linux/virtio_net.h @@ -21,10 +21,12 @@ #define VIRTIO_NET_F_HOST_ECN 13 /* Host can handle TSO[6] w/ ECN in. */ #define VIRTIO_NET_F_HOST_UFO 14 /* Host can handle UFO in. */ #define VIRTIO_NET_F_STATUS16 /* virtio_net_config.status available */ +#define VIRTIO_NET_F_MAC_TABLE 17 /* Additional MAC addresses */ #define VIRTIO_NET_S_LINK_UP 1 /* Link is up */ #define VIRTIO_NET_S_PROMISC 2 /* Promiscuous mode */ #define VIRTIO_NET_S_ALLMULTI 4 /* All-multicast mode */ +#define VIRTIO_NET_S_MAC_TABLE 8 /* Enable MAC filter table */ struct virtio_net_config { @@ -38,8 +40,10 @@ struct virtio_net_config __u16 link:1; __u16 promisc:1; __u16 allmulti:1; + __u16 mac_table:1; } bits; - } status; + } status; + __u64 mac_table[16]; } __attribute__((packed)); /* This is the first element of the scatter-gather list. If you don't -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
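The archive has stripped the comparison and bitwise operators from the rx-mode logic in this patch, so the following standalone sketch restates it as I read it. It assumes the missing comparisons are `>` against the 16-entry table size and that the missing bit operators are `&` and `&=`. The function name and `EXAMPLE_MAC_TABLE_SIZE` are hypothetical; the status values are the ones defined in the patch.

```c
#include <stdint.h>

#define VIRTIO_NET_S_PROMISC    2   /* Promiscuous mode */
#define VIRTIO_NET_S_ALLMULTI   4   /* All-multicast mode */
#define VIRTIO_NET_S_MAC_TABLE  8   /* Enable MAC filter table */
#define EXAMPLE_MAC_TABLE_SIZE  16  /* hypothetical name for the table size */

/*
 * Decide which receive modes to request, given the number of extra
 * unicast (uc_count) and multicast (mc_count) addresses on the device.
 */
static uint16_t example_rx_mode_status(uint16_t status, int uc_count,
                                       int mc_count)
{
    if (uc_count > EXAMPLE_MAC_TABLE_SIZE) {
        /* Unicast list alone overflows the table: fall back to promisc. */
        status |= VIRTIO_NET_S_PROMISC;
        if (mc_count > EXAMPLE_MAC_TABLE_SIZE)
            status |= VIRTIO_NET_S_ALLMULTI;
    } else if (uc_count + mc_count > EXAMPLE_MAC_TABLE_SIZE) {
        /* Unicast fits, but not both lists: give up on multicast filtering. */
        status |= VIRTIO_NET_S_ALLMULTI;
    }

    /* Use the table only if some list is still being filtered by it. */
    if ((uc_count && !(status & VIRTIO_NET_S_PROMISC)) ||
        (mc_count && !(status & VIRTIO_NET_S_ALLMULTI)))
        status |= VIRTIO_NET_S_MAC_TABLE;
    else
        status &= (uint16_t)~VIRTIO_NET_S_MAC_TABLE;

    return status;
}
```

The design choice visible here is that unicast addresses get priority for table slots, and multicast filtering is the first thing to be relaxed when the table is too small.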
Re: [PATCH 3/5][RFC] virtio-net: Name the status bits, adding promisc and allmulti
Alex Williamson wrote:
> virtio-net: Name the status bits, adding promisc and allmulti
>
> Signed-off-by: Alex Williamson alex.william...@hp.com
> ---
>
>  hw/virtio-net.c | 36
>  hw/virtio-net.h | 11 ++-
>  2 files changed, 34 insertions(+), 13 deletions(-)
>
> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
> index 77e3077..653cad4 100644
> --- a/hw/virtio-net.c
> +++ b/hw/virtio-net.c
> @@ -22,7 +22,14 @@ typedef struct VirtIONet
>  {
>      VirtIODevice vdev;
>      uint8_t mac[6];
> -    uint16_t status;
> +    union {
> +        uint16_t raw;
> +        struct {
> +            uint16_t link:1;
> +            uint16_t promisc:1;
> +            uint16_t allmulti:1;
> +        } bits;
> +    } status;

I'd prefer the use of #define's like we have today. Bit fields have really weird packing and ordering properties across architectures.

Regards,

Anthony Liguori
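To illustrate the #define-based alternative suggested above, here is a minimal sketch of mask helpers over a plain `uint16_t` status word. The bit values are the ones used in the series; the helper names are hypothetical and not taken from QEMU or the kernel.

```c
#include <stdint.h>

#define VIRTIO_NET_S_LINK_UP    1   /* Link is up */
#define VIRTIO_NET_S_PROMISC    2   /* Promiscuous mode */
#define VIRTIO_NET_S_ALLMULTI   4   /* All-multicast mode */

/* Return status with the bits in mask set (on != 0) or cleared (on == 0). */
static uint16_t example_status_assign(uint16_t status, uint16_t mask, int on)
{
    return on ? (uint16_t)(status | mask) : (uint16_t)(status & ~mask);
}

/* Return 1 if any bit in mask is set in status, 0 otherwise. */
static int example_status_test(uint16_t status, uint16_t mask)
{
    return (status & mask) != 0;
}
```

Because the masks name fixed bit positions in a 16-bit value, the layout is the same on every compiler and architecture, which is not guaranteed for C bit fields.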
Re: [PATCH 2/5][RFC] virtio-net: Add load/save for status bits
Alex Williamson wrote:
> virtio-net: Add load/save for status bits
>
> Signed-off-by: Alex Williamson alex.william...@hp.com
> ---
>
>  hw/virtio-net.c | 10 --
>  1 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
> index bfb7510..77e3077 100644
> --- a/hw/virtio-net.c
> +++ b/hw/virtio-net.c
> @@ -16,6 +16,8 @@
>  #include "qemu-timer.h"
>  #include "virtio-net.h"
>
> +#define VIRTIO_VM_VERSION 2
> +

virtio-net is now at 2 already because of the mergeable buffers fix, but this is definitely needed for Mark's set_link changes.

Regards,

Anthony Liguori
Re: [PATCH 3/5][RFC] virtio-net: Name the status bits, adding promisc and allmulti
On Wed, 2009-01-07 at 12:09 -0600, Anthony Liguori wrote:
> Alex Williamson wrote:
> > virtio-net: Name the status bits, adding promisc and allmulti
>
> I'd prefer the use of #define's like we have today. bit fields have
> really weird packing and ordering properties across architectures.

Ok, it made a few things easier, but I'll work on using a mask interface.

Thanks,

Alex

--
Alex Williamson                             HP Open Source Linux Org.
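For readers following the thread, the mask interface discussed above could look roughly like the sketch below. It reuses the VIRTIO_NET_S_* values from the virtio_net.h patch in this series; the helper names status_set, status_clear and status_test are hypothetical and do not come from any posted patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status bit values as defined in the virtio_net.h patch of this series. */
#define VIRTIO_NET_S_LINK_UP   1  /* Link is up */
#define VIRTIO_NET_S_PROMISC   2  /* Promiscuous mode */
#define VIRTIO_NET_S_ALLMULTI  4  /* All-multicast mode */

/*
 * Hypothetical helpers. They work on a plain uint16_t, so the layout of
 * the status word does not depend on how a compiler orders bit fields.
 */
static inline uint16_t status_set(uint16_t status, uint16_t mask)
{
    return (uint16_t)(status | mask);
}

static inline uint16_t status_clear(uint16_t status, uint16_t mask)
{
    return (uint16_t)(status & ~mask);
}

/* True only if every bit in mask is set in status. */
static inline bool status_test(uint16_t status, uint16_t mask)
{
    return (status & mask) == mask;
}
```

With this shape, a single 16-bit value can be saved, loaded and written to config space as-is, which is the property the bit-field union was trying to provide through its raw member.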
Re: [PATCH 0/2][RFC] virtio_net: MAC filtering
Alex Williamson wrote:
> This series builds on some of the patches Mark McLoughlin has sent out
> recently, so likely won't apply to any current trees until those get
> upstream. The goal is to enable MAC filtering at the kvm/qemu level for
> virtio-net packets. Promiscuous and allmulti mode are handled by adding
> bits to Mark's proposed status field. I also add a 16 entry MAC table
> for additional unicast and multicast addresses to filter. If this looks
> reasonable, I'll follow-up with VLAN filtering.
>
> As noted in the RFC thread adding the kvm/qemu backing, this does
> increase the size of the virtio-net device I/O port space, up to 1kB
> with PCI rounding if we add a 4k entry VLAN bitmap. A 64 device limit is
> still pretty high for a VM, but maybe we should think about adding MMIO
> space for virtio-pci. Thanks,

I'm not quite sure of the best way to address this. Maybe another control queue for sending commands to control this sort of stuff?

What are your thoughts, Rusty?

Regards,

Anthony Liguori

> Alex
Re: [PATCH] virtio_net: add link status handling
Hi Rusty,

On Fri, 2008-12-12 at 18:34 +1030, Rusty Russell wrote:
> On Thursday 11 December 2008 05:04:44 Mark McLoughlin wrote:
> > On Tue, 2008-12-09 at 21:11 -0600, Anthony Liguori wrote:
> > > Rusty Russell wrote:
> > > > On Wednesday 10 December 2008 08:02:14 Anthony Liguori wrote:
> > > > > It would be nice if the virtio-net card wrote some
> > > > > acknowledgement that it has received the link status down/up
> > > > > events.
> > > >
> > > > How about of every status change event? ie. a generic virtio_pci
> > > > solution?
> > >
> > > A really simple way to do it would just be to have another status
> > > field that was the guest's status (versus the host requested status
> > > which the current field is). All config reads/writes result in exits
> > > so it's easy to track. Adding YA virtio event may be a little
> > > overkill.
> >
> > Sounds very reasonable; that and Rusty's mask out unknown bits
> > suggestion in the version below.
>
> Not quite what I was after. I've taken the original patch, added the
> masking change. I'll test here and feed to DaveM.

This never got pushed to davem, did it? What you've got in your queue looks fine to me ...

Cheers,
Mark.
Re: [PATCH 0/2][RFC] virtio_net: MAC filtering
On Wed, 2009-01-07 at 12:14 -0600, Anthony Liguori wrote:
> Alex Williamson wrote:
> > As noted in the RFC thread adding the kvm/qemu backing, this does
> > increase the size of the virtio-net device I/O port space, up to 1kB
> > with PCI rounding if we add a 4k entry VLAN bitmap. A 64 device limit
> > is still pretty high for a VM, but maybe we should think about adding
> > MMIO space for virtio-pci. Thanks,
>
> I'm not quite sure the best way to address this. Maybe another control
> queue for sending commands to control this sort of stuff?
>
> What are your thoughts Rusty?

This is also a good time to decide if a fixed 16 entry MAC filter table is sufficient. Should the size be programmed into the config space? There's plenty of room to make it a bigger fixed size and still stay at 1kB of I/O port space with the VLAN table. This implementation is a little wasteful of space in using 8 bytes to store the MAC and a valid bit, but I suspect there are some endian issues I'm ignoring, and a standard data type might make that easier later.

Alex

--
Alex Williamson                             HP Open Source Linux Org.
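To make the space trade-off concrete, here is one way a 6-byte MAC plus a valid bit could be packed into a 64-bit table entry. The layout (MAC in the low 48 bits with the first MAC byte most significant, valid flag in bit 63) and the names mac_entry_pack, mac_entry_unpack and MAC_ENTRY_VALID are assumptions made for illustration; the thread does not spell out the encoding used by the RFC patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the valid flag; not taken from the RFC patch. */
#define MAC_ENTRY_VALID (1ULL << 63)

/*
 * Pack a MAC address into the low 48 bits of an entry and mark it valid.
 * Building the value with shifts keeps the numeric result independent of
 * host byte order.
 */
static uint64_t mac_entry_pack(const uint8_t mac[6])
{
    uint64_t entry = 0;

    for (int i = 0; i < 6; i++)
        entry = (entry << 8) | mac[i];
    return entry | MAC_ENTRY_VALID;
}

/* Extract the MAC from an entry; returns false if the entry is not valid. */
static bool mac_entry_unpack(uint64_t entry, uint8_t mac[6])
{
    if (!(entry & MAC_ENTRY_VALID))
        return false;
    for (int i = 5; i >= 0; i--) {
        mac[i] = (uint8_t)(entry & 0xff);
        entry >>= 8;
    }
    return true;
}
```

Under this assumed layout, 16 entries take 128 bytes of config space, of which 15 bits per entry are unused. A real device would still have to fix the byte order in which the 64-bit value itself is stored in config space, which is the endian concern raised above.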
Re: [PATCH 0/2][RFC] virtio_net: MAC filtering
Alex Williamson wrote:
> On Wed, 2009-01-07 at 12:14 -0600, Anthony Liguori wrote:
> > Alex Williamson wrote:
> > > As noted in the RFC thread adding the kvm/qemu backing, this does
> > > increase the size of the virtio-net device I/O port space, up to
> > > 1kB with PCI rounding if we add a 4k entry VLAN bitmap. A 64 device
> > > limit is still pretty high for a VM, but maybe we should think
> > > about adding MMIO space for virtio-pci. Thanks,
> >
> > I'm not quite sure the best way to address this. Maybe another
> > control queue for sending commands to control this sort of stuff?
> >
> > What are your thoughts Rusty?
>
> This is also a good time to decide if a fixed 16 entry MAC filter table
> is sufficient. Should the size be programmed into the config space?
> There's plenty of room to make it a bigger fixed size and still stay at
> 1kB of I/O port space with the VLAN table. This implementation is a
> little wasteful of space in using 8 bytes to store the MAC and a valid
> bit, but I suspect there's some endian issues I'm ignoring and a
> standard data type might make that easier later.

If we switch to a command queue, then there's no need to have any fixed limitation.

Regards,

Anthony Liguori

> Alex
Re: KVM host kernel hang
On 07.01.2009, at 14:53, Avi Kivity a...@redhat.com wrote:
> Alexander Graf wrote:
> > Avi Kivity wrote:
> > > Alexander Graf wrote:
> > > > I have CONFIG_LOCKDEP_SUPPORT=y. How do I make it detect that
> > > > it's actually locking itself up?
> > > >
> > > > Btw: The issue seems to be easily reproducible :-)
> > >
> > > Perhaps CONFIG_PROVE_LOCKING and CONFIG_LOCKDEP. _SUPPORT just
> > > indicates the arch can do it if you want, IIUC.
> >
> > I just added some debug #define's to show me where exactly things
> > break.
> >
> > Jan 7 14:34:46 linux-dp8n kernel: 2149: Grabbing lock {
> > Jan 7 14:34:46 linux-dp8n kernel: 1908: Grabbing lock {
> >
> > 2145 mmio:
> > 2146         /*
> > 2147          * Is this MMIO handled locally?
> > 2148          */
> > 2149         mutex_lock(&vcpu->kvm->lock);
> > 2150         mmio_dev = vcpu_find_mmio_dev(vcpu, gpa, bytes, 0);
> > 2151         if (mmio_dev) {
> > 2152                 kvm_iodevice_read(mmio_dev, gpa, bytes, val);
> > 2153                 mutex_unlock(&vcpu->kvm->lock);
> > 2154                 return X86EMUL_CONTINUE;
> > 2155         }
> > 2156         mutex_unlock(&vcpu->kvm->lock);
>
> The lock was lost here. But how?
>
> > 1901         case KVM_IRQ_LINE: {
> > 1902                 struct kvm_irq_level irq_event;
> > 1903
> > 1904                 r = -EFAULT;
> > 1905                 if (copy_from_user(&irq_event, argp, sizeof irq_event))
> > 1906                         goto out;
> > 1907                 if (irqchip_in_kernel(kvm)) {
> > 1908                         mutex_lock(&kvm->lock);
> > 1909                         kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
> > 1910                                     irq_event.irq, irq_event.level);
> > 1911                         mutex_unlock(&kvm->lock);
> > 1912                         r = 0;
> > 1913                 }
> > 1914                 break;
> > 1915         }
>
> This is your hung iothread trying to inject an interrupt. It's waiting
> for the lost lock.
>
> I suggest enabling all the lock debug magic you can find in kconfig.

I did that and still don't get anything. I'll try digging deeper into this tomorrow.

Alex

> --
> error compiling committee.c: too many arguments to function
Re: BUG() with SCSI-interfaced disk images
On Fri, Dec 26, 2008 at 04:00:28PM -0500, John Morrissey wrote:
> I'm encountering a kernel BUG() in guests using SCSI-interfaced disk
> images. I've tried with the Debian packaging of KVM 79 and 82; both
> exhibit the same behavior (disclaimer: Debian has about a dozen patches
> in their kvm packaging, but they all seem to be changes to the
> build/install process or security-related).

Not to be pushy, but does anyone have any ideas on this, or can I provide any additional information? I'm afraid I'm a bit over my head when debugging kernel internals.

john

> IDE-interfaced disk images seem fine. Host and guest are up-to-date
> Debian lenny (32-bit/i386) running kernel 2.6.26 (Debian
> linux-image-2.6.26-1-amd64 2.6.26-12). After a few minutes of disk
> activity (fsck(8)ing a fairly empty ~20GB filesystem is a reliable
> trigger), the kernel BUGs (oops output below).
>
> I was previously using KVM 72, and tried upgrading to 79 because both
> Debian lenny and Ubuntu hardy guests were panicking due to sym
> disconnects/timeouts. 79 makes the lenny guest start BUGging as
> described above. 82 is not perceivably different from 79 for the lenny
> guest.
>
> FWIW, the upgrade to 79 allowed the Ubuntu hardy guest to stay up,
> although it emits:
>
> Dec 25 00:28:51 vicar kernel: [106621.553272] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
> Dec 25 00:28:51 vicar kernel: [106621.553279] Info fld=0x0
> Dec 25 00:28:51 vicar kernel: [106621.553280] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
>
> at seemingly random intervals. The upgrade to 82 made the hardy guest
> start BUGging on soft lockups at random intervals (I can provide the
> full output if anyone's interested, but I'm much more interested in the
> lenny guest oops at this point).
> john
>
> run via libvirt:
>
> /usr/bin/kvm -S -M pc -m 512 -smp 1 -name test -monitor pty \
>     -boot c -drive file=image.qcow,if=scsi,index=0,boot=on -net nic,macaddr=00:0c:29:1e:ea:b9,vlan=0,model=e1000 \
>     -net tap,fd=17,script=,vlan=0,ifname=vnet2 \
>     -net nic,macaddr=00:0c:29:1e:ea:c3,vlan=1,model=e1000 \
>     -net tap,fd=18,script=,vlan=1,ifname=vnet3 \
>     -serial pty -parallel none -usb -vnc 0.0.0.0:1
>
> [The KVMWiki asks whether the problem is reproducible with
> -no-kvm-irqchip, -no-kvm-pit, or -no-kvm, but when I tried invoking the
> above command line by hand (outside of libvirt), the VNC console was
> always blank and there was no console output on the serial pty. If this
> would be useful information to have in this case, I'd love to know what
> I'm doing wrong, or if there's a way to specify additional command line
> arguments with libvirt.]
>
> oops generated in the guest:
>
> [ 140.101828] sym0: unexpected disconnect
> [ 140.102748] BUG: unable to handle kernel NULL pointer dereference at 0358
> [ 140.103818] IP: [e08e2670] :sym53c8xx:sym_int_sir+0x547/0x118f
> [ 140.106449] *pdpt = 1f5f9001 *pde =
> [ 140.107356] Oops: [#1] SMP
> [ 140.107864] Modules linked in: loop virtio_balloon psmouse pcspkr serio_raw i2c_piix4 i2c_core button evdev ext3 jbd mbcache sd_mod ide_cd_mod cdrom ata_generic libata dock ide_pci_generic floppy virtio_pci virtio_ring virtio sym53c8xx scsi_transport_spi scsi_mod e1000 uhci_hcd usbcore piix ide_core thermal processor fan thermal_sys
> [ 140.108062]
> [ 140.108062] Pid: 131, comm: pdflush Not tainted (2.6.26-1-686-bigmem #1)
> [ 140.108062] EIP: 0060:[e08e2670] EFLAGS: 00010287 CPU: 0
> [ 140.108062] EIP is at sym_int_sir+0x547/0x118f [sym53c8xx]
> [ 140.108062] EAX: 000a EBX: ECX: 1f98c084 EDX: 0030
> [ 140.108062] ESI: df98c084 EDI: df98c000 EBP: df98c000 ESP: de0f3ba0
> [ 140.108062] DS: 007b ES: 007b FS: 00d8 GS: SS: 0068
> [ 140.108062] Process pdflush (pid: 131, ti=de0f2000 task=df48e520 task.ti=de0f2000)
> [ 140.108062] Stack: 000144d6 7f5a222c c011a853 0021d496
> [ 140.108062] df98c000 e08e08cd 0001 df98c000
> [ 140.108062] 0084 e08e3f2f df988c00 0046 df544400 0196
> [ 140.108062] Call Trace:
> [ 140.108062] [c011a853] pvclock_clocksource_read+0x4b/0xd0
> [ 140.108062] [e08e08cd] sym_recover_scsi_int+0xb3/0x10d [sym53c8xx]
> [ 140.108062] [e08e3f2f] sym_interrupt+0x3ee/0x5fd [sym53c8xx]
> [ 140.108062] [e08df3dc] sym53c8xx_intr+0x35/0x56 [sym53c8xx]
> [ 140.108062] [c0158e4e] handle_IRQ_event+0x23/0x51
> [ 140.108062] [c0159f4d] handle_fasteoi_irq+0x71/0xa4
> [ 140.108062] [c010afd2] do_IRQ+0x4d/0x63
> [ 140.108062] [c01092a7] common_interrupt+0x23/0x28
> [ 140.108062] [c01300d8] ptrace_request+0x1ec/0x278
> [ 140.108062] [c012d0c6] __do_softirq+0x57/0xd3
> [ 140.108062] [c012d187] do_softirq+0x45/0x53
> [ 140.108062] [c012d43e] irq_exit+0x35/0x67
> [ 140.108062] [c01152b6] smp_apic_timer_interrupt+0x6b/0x75
> [ 140.108062] [c0109364] apic_timer_interrupt+0x28/0x30
> [ 140.108062]
Re: [PATCH 05/10] KVM: Merge MSI handling to kvm_set_irq
On Wed, Jan 07, 2009 at 06:42:41PM +0800, Sheng Yang wrote:
> Using kvm_set_irq to handle all interrupt injection.
>
> Signed-off-by: Sheng Yang sh...@linux.intel.com
> ---
>  include/linux/kvm_host.h |    2 +-
>  virt/kvm/irq_comm.c      |   79 +++--
>  virt/kvm/kvm_main.c      |   79 +++---
>  3 files changed, 81 insertions(+), 79 deletions(-)
>
> +static void gsi_dispatch(struct kvm *kvm, u32 gsi)
> +{
> +        int vcpu_id;
> +        struct kvm_vcpu *vcpu;
> +        struct kvm_ioapic *ioapic = ioapic_irqchip(kvm);
> +        struct kvm_gsi_route_entry *gsi_entry;
> +        int dest_id, vector, dest_mode, trig_mode, delivery_mode;
> +        u32 deliver_bitmask;
> +
> +        BUG_ON(!ioapic);
> +
> +        gsi_entry = kvm_find_gsi_route_entry(kvm, gsi);
> +        if (!gsi_entry) {
> +                printk(KERN_WARNING "kvm: fail to find correlated gsi entry\n");
> +                return;
> +        }
> +
> +#ifdef CONFIG_X86
> +        if (gsi_entry->type & KVM_GSI_ROUTE_MSI) {
> +                dest_id = (gsi_entry->msi.address_lo & MSI_ADDR_DEST_ID_MASK)
> +                        >> MSI_ADDR_DEST_ID_SHIFT;
> +                vector = (gsi_entry->msi.data & MSI_DATA_VECTOR_MASK)
> +                        >> MSI_DATA_VECTOR_SHIFT;
> +                dest_mode = test_bit(MSI_ADDR_DEST_MODE_SHIFT,
> +                                (unsigned long *)&gsi_entry->msi.address_lo);
> +                trig_mode = test_bit(MSI_DATA_TRIGGER_SHIFT,
> +                                (unsigned long *)&gsi_entry->msi.data);
> +                delivery_mode = test_bit(MSI_DATA_DELIVERY_MODE_SHIFT,
> +                                (unsigned long *)&gsi_entry->msi.data);
> +                deliver_bitmask = kvm_ioapic_get_delivery_bitmask(ioapic,
> +                                dest_id, dest_mode);
> +                /* IOAPIC delivery mode value is the same as MSI here */
> +                switch (delivery_mode) {

Sheng,

This code seems to ignore the RH bit (MSI_ADDR_REDIRECTION_SHIFT):

  4. Destination mode (DM) - This bit indicates whether the Destination ID
     field should be interpreted as logical or physical APIC ID for
     delivery of the lowest priority interrupt.

     If RH is 1 and DM is 0, the Destination ID field is in physical
     destination mode and only the processor in the system that has the
     matching APIC ID is considered for delivery of that interrupt (this
     means no re-direction).

     If RH is 1 and DM is 1, the Destination ID Field is interpreted as in
     logical destination mode and the redirection is limited to only those
     processors that are part of the logical group of processors based on
     the processor's logical APIC ID and the Destination ID field in the
     message. The logical group of processors consists of those identified
     by matching the 8-bit Destination ID with the logical destination
     identified by the Destination Format Register and the Logical
     Destination Register in each local APIC.

Is that intentional?
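As an illustration of the fields under discussion, the sketch below decodes an MSI address/data pair, including the RH bit. The bit positions follow the x86 MSI message format (destination ID in address bits 19:12, RH in bit 3, DM in bit 2; vector in data bits 7:0, delivery mode in bits 10:8, trigger mode in bit 15). The struct and function names are hypothetical and deliberately do not reuse the kernel's MSI_* macros.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the decoded fields of one MSI message. */
struct msi_fields {
    uint8_t dest_id;        /* address bits 19:12 */
    bool    redir_hint;     /* address bit 3 (RH) */
    bool    dest_logical;   /* address bit 2 (DM): true = logical mode */
    uint8_t vector;         /* data bits 7:0 */
    uint8_t delivery_mode;  /* data bits 10:8, a three-bit field */
    bool    level_trigger;  /* data bit 15: true = level, false = edge */
};

static struct msi_fields msi_decode(uint32_t address_lo, uint32_t data)
{
    struct msi_fields f;

    f.dest_id       = (uint8_t)((address_lo >> 12) & 0xff);
    f.redir_hint    = ((address_lo >> 3) & 1) != 0;
    f.dest_logical  = ((address_lo >> 2) & 1) != 0;
    f.vector        = (uint8_t)(data & 0xff);
    f.delivery_mode = (uint8_t)((data >> 8) & 0x7);
    f.level_trigger = ((data >> 15) & 1) != 0;
    return f;
}

/*
 * Per the text quoted above: with RH=1 and DM=0 only the CPU with the
 * matching APIC ID is considered, while with RH=1 and DM=1 redirection
 * is limited to the logical group. So redirection among several CPUs
 * is only possible when both bits are set.
 */
static bool msi_redirects_within_group(const struct msi_fields *f)
{
    return f->redir_hint && f->dest_logical;
}
```

A dispatcher that honours RH would consult something like msi_redirects_within_group() before building the delivery bitmask, rather than looking at DM alone.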
[ kvm-Bugs-2030703 ] Virtio Vista drivers
Bugs item #2030703, was opened at 2008-07-29 00:05
Message generated for change (Comment added) made by thekozmo
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2030703&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Ross Patterson (rossp)
Assigned to: Nobody/Anonymous (nobody)
Summary: Virtio Vista drivers

Initial Comment:
Neither the Windows 2000 nor the Windows XP drivers for the paravirtualized ethernet adapter or block device seem to work under Windows Vista. It would be nice to have Vista compatible drivers.

--

Comment By: Dor Laor (thekozmo)
Date: 2009-01-08 00:30

Message:
What version did you use for pvnet on Vista? Avi uploaded a new one last week. If there is an error, dump, bsod, please provide it.
Currently there is no pv block support for win*. There is work in progress on this one.

--

Comment By: martinmaurer (martinmaurer)
Date: 2009-01-07 19:36

Message:
there are drivers for vista (network). the latest release: see
https://sourceforge.net/project/showfiles.php?group_id=180599&package_id=267944

(virtio drivers for a block device on windows are not available)

--

Comment By: roy anonymous (roy-anonymous)
Date: 2009-01-07 19:09

Message:
Is there really a block device driver for win2k and winxp??

--
Re: [PATCH 09/10] KVM: Update intr delivery func to accept unsigned long* bitmap
Better separate the bitmap patches from this series to ease merging of the MSI changes.

On Wed, Jan 07, 2009 at 06:42:45PM +0800, Sheng Yang wrote:
> Would be used with bit ops, and would be easily extended if KVM_MAX_VCPUS is
> increased.
>
> Signed-off-by: Sheng Yang sh...@linux.intel.com
> ---
>  arch/x86/kvm/lapic.c     |    8
>  include/linux/kvm_host.h |    2 +-
>  virt/kvm/ioapic.c        |    4 ++--
>  virt/kvm/ioapic.h        |    4 ++--
>  virt/kvm/irq_comm.c      |    6 +++---
>  5 files changed, 12 insertions(+), 12 deletions(-)
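For context, a delivery bitmask held as an array of unsigned long can be sketched as below. MAX_VCPUS and the helper names are placeholders rather than the kernel's KVM_MAX_VCPUS, set_bit or test_bit, and the value 64 is picked only for the example. The point is that raising the vCPU limit changes the array length and nothing else.

```c
#include <limits.h>
#include <stdbool.h>

#define MAX_VCPUS      64  /* placeholder limit, chosen for the example */
#define BITS_PER_WORD  (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS   ((MAX_VCPUS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Mark a vCPU as a delivery target. */
static void vcpu_mask_set(unsigned long *mask, unsigned int vcpu)
{
    mask[vcpu / BITS_PER_WORD] |= 1UL << (vcpu % BITS_PER_WORD);
}

/* Check whether a vCPU is a delivery target. */
static bool vcpu_mask_test(const unsigned long *mask, unsigned int vcpu)
{
    return ((mask[vcpu / BITS_PER_WORD] >> (vcpu % BITS_PER_WORD)) & 1UL) != 0;
}
```

A caller would declare `unsigned long mask[BITMAP_WORDS] = {0};` and pass it by pointer, which is the calling convention the patch title describes.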
[ kvm-Bugs-2493108 ] Win2k problems on some (not all) Intel hosts
Bugs item #2493108, was opened at 2009-01-08 12:53
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2493108&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: intel
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Kevin Shanahan (kmshanah)
Assigned to: Nobody/Anonymous (nobody)
Summary: Win2k problems on some (not all) Intel hosts

Initial Comment:
I have a Windows 2000 guest that I have been testing on a desktop machine (E8400 CPU) and that has been working well (apart from being affected by bug 2314737). When I moved the guest to our server (not a live migration, just shutdown, rsync and boot on the new server), which is an IBM X3550 with two Xeon 5130 CPUs, the guest started having short freezes where the guest CPU usage spikes on both virtual CPUs and the guest becomes unresponsive for several seconds at a time.

These guest CPU spikes can last over a minute, but the guest might have moments where it briefly responds to the delayed keystrokes, etc. every few seconds. One symptom of this behaviour for me last night was that Win2k AD replication was failing, so it's not just interactive use that suffers.

Something I noticed that may be relevant: due to the other bug (2314737), which causes each guest CPU to use 100% of the host CPU whether the guest CPU is idle or not, I can tell when the guest is having this problem by looking at the 'top' output on the host. When the guest is operating normally, the host will show the qemu-system-x86_64 process using 200% CPU (the guest is using -smp 2). However, when the guest is misbehaving, the host will show the qemu-system-x86_64 process only using _100%_ CPU. Maybe one of the threads is stuck, or something is forcing them to share a single core?
Both hosts are running Debian Lenny/Sid, 64-bit with a kernel.org 2.6.28 kernel and kvm-82. The command line used in both cases:

/usr/local/kvm/bin/qemu-system-x86_64 \
    -smp 2 \
    -localtime -m 2048 \
    -drive if=ide,file=kvm-ks-02a.img,index=0,media=disk,boot=on \
    -drive if=ide,file=kvm-ks-02b.img,index=2,media=disk \
    -net nic,vlan=0,macaddr=52:54:00:12:34:68,model=virtio \
    -net tap,vlan=0,ifname=tap18,script=no \
    -vnc 127.0.0.1:18 -usbdevice tablet \
    -daemonize

CPUs on the good host:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz
stepping        : 10
cpu MHz         : 3000.000
cache size      : 6144 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority
bogomips        : 5984.98
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz
stepping        : 10
cpu MHz         : 3000.000
cache size      : 6144 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority
bogomips        : 5984.97
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

And the bad host:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU 5130 @ 2.00GHz
stepping        : 6
cpu MHz         : 1995.117
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl
Userspace specific host irq?
Hi all,

This piece of code has always puzzled me:

virt/kvm/kvm_main.c:assigned_device_update_intx()

        if (airq->host_irq)
                adev->host_irq = airq->host_irq;
        else
                adev->host_irq = adev->dev->irq;

I don't know why we let userspace specify a host_irq different from the real one. I've asked Amit and Ben-Ami, the original authors, about this, and they also think this piece of code is redundant.

I'm sending the question to the mailing list; if everyone agrees, I will discard this code.

Thanks!

--
regards
Yang, Sheng
[PATCH] kvm: userspace: change vtd.o to iommu.o in Kbuild
vtd.c has been renamed to iommu.c, so Kbuild needs to be changed accordingly.

Signed-off-by: Wei Huang wei.w.hu...@intel.com
---
 kernel/x86/Kbuild |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/x86/Kbuild b/kernel/x86/Kbuild
index c4723b1..48339b4 100644
--- a/kernel/x86/Kbuild
+++ b/kernel/x86/Kbuild
@@ -10,7 +10,7 @@ ifeq ($(EXT_CONFIG_KVM_TRACE),y)
 kvm-objs += kvm_trace.o
 endif
 ifeq ($(CONFIG_DMAR),y)
-kvm-objs += vtd.o
+kvm-objs += iommu.o
 endif
 kvm-intel-objs := vmx.o vmx-debug.o ../external-module-compat.o
 kvm-amd-objs := svm.o ../external-module-compat.o
--
1.6.1.rc3