Re: qemu-kvm: fix infinite recursion in pci
On 12/15/2009 02:58 PM, Michael S. Tsirkin wrote: Make config reads for assigned devices work like they used to: both pci_default_read_config and pci_default_cap_read_config call to pci_read_config, which does the actual work. This fixes infinite recursion introduced by recent merge from stable-0.12. Applied to master and 0.12, thanks. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: debugging windows guests
On 12/17/2009 09:05 AM, Raindog wrote: On 12/16/2009 9:36 PM, Avi Kivity wrote: On 12/17/2009 12:06 AM, Raindog wrote: Are there any advantages over stock qemu if using kvm w/out the kernel module? No. qemu-kvm is not tested without kvm, so there may be disadvantages. Does that then imply that svm emulation (--enable-nesting) is not well tested either? It's an unrelated feature (but I wouldn't say it is heavily tested). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Tuning guide, -cpu host crashes hosted VM
On 12/16/2009 12:33 AM, Steven Ball wrote: Howdy all, I am attempting to evaluate a few different virtualization platforms. I've been going about this by installing a really basic XP host, and running some benchmarks. I realize these are synthetic benchmarks, but it at least allows me to see how the virtualization platforms differ. I attempted to do some tuning with KVM as per the tuning guide, using -cpu host, and as soon as my benchmarking program under XP uses a sse instruction, I get a blue screen crash and reboot. Any idea what the instruction is? You can set Windows to collect a crash dump, then use windbg to examine the dump and it will point out the offending instruction. Alternatively, post a pointer to the application and someone may reproduce it some day. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/7] Nested VMX patch 5 Simplify fpu handling
On 12/10/2009 08:38 PM, or...@il.ibm.com wrote: From: Orit Wassermanor...@il.ibm.com What exactly is the simplification? Is it intended to have a functional change? --- arch/x86/kvm/vmx.c | 27 +-- 1 files changed, 17 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 8745d44..de1f596 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -1244,8 +1244,6 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu) u32 eb; eb = (1u PF_VECTOR) | (1u UD_VECTOR) | (1u MC_VECTOR); - if (!vcpu-fpu_active) - eb |= 1u NM_VECTOR; /* * Unconditionally intercept #DB so we can maintain dr6 without * reading it every exit. @@ -1463,10 +1461,6 @@ static void vmx_fpu_activate(struct kvm_vcpu *vcpu) if (vcpu-fpu_active) return; vcpu-fpu_active = 1; - vmcs_clear_bits(GUEST_CR0, X86_CR0_TS); - if (vcpu-arch.cr0 X86_CR0_TS) - vmcs_set_bits(GUEST_CR0, X86_CR0_TS); - update_exception_bitmap(vcpu); } static void vmx_fpu_deactivate(struct kvm_vcpu *vcpu) @@ -1474,8 +1468,6 @@ static void vmx_fpu_deactivate(struct kvm_vcpu *vcpu) if (!vcpu-fpu_active) return; vcpu-fpu_active = 0; - vmcs_set_bits(GUEST_CR0, X86_CR0_TS); - update_exception_bitmap(vcpu); } static unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu) @@ -2715,8 +2707,10 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3) vmx_flush_tlb(vcpu); vmcs_writel(GUEST_CR3, guest_cr3); - if (vcpu-arch.cr0 X86_CR0_PE) - vmx_fpu_deactivate(vcpu); + if (vcpu-arch.cr0 X86_CR0_PE) { + if (guest_cr3 != vmcs_readl(GUEST_CR3)) + vmx_fpu_deactivate(vcpu); + } Why the added cr3 check? It may make sense, but it isn't a simplification. static void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) @@ -5208,6 +5202,19 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu) if (vcpu-arch.switch_db_regs) get_debugreg(vcpu-arch.dr6, 6); + if (vcpu-fpu_active) { + if (vmcs_readl(CR0_READ_SHADOW) X86_CR0_TS) + vmcs_set_bits(GUEST_CR0, X86_CR0_TS); + else + vmcs_clear_bits(GUEST_CR0, X86_CR0_TS); + vmcs_write32(EXCEPTION_BITMAP, +vmcs_read32(EXCEPTION_BITMAP) ~(1u NM_VECTOR)); + } else { + vmcs_set_bits(GUEST_CR0, X86_CR0_TS); + vmcs_write32(EXCEPTION_BITMAP, +vmcs_read32(EXCEPTION_BITMAP) | (1u NM_VECTOR)); + } This is executed unconditionally, so the vmreads/vmwrites take place every time. Need to cache the previous fpu_active state and only take action if it changed. Since this is a large piece of code, move it to a function. Please post this as the first patch (or better, separately), so I can apply it independently of the rest. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] KVM: Extended shared_msr_global to per CPU
shared_msr_global saved host value of relevant MSRs, but it have an assumption that all MSRs it tracked shared the value across the different CPUs. It's not true with some MSRs, e.g. MSR_TSC_AUX. Extend it to per CPU to provide the support of MSR_TSC_AUX, and more alike MSRs. Notice now the shared_msr_global still have one assumption: it can only deal with the MSRs that won't change in host after KVM module loaded. Signed-off-by: Sheng Yang sh...@linux.intel.com --- How about this? Move the all initialization to hardware_enable(). And only initialized once for each cpu. arch/x86/kvm/x86.c | 60 --- 1 files changed, 38 insertions(+), 22 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index dd15d7a..e3eae50 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -92,16 +92,17 @@ module_param_named(ignore_msrs, ignore_msrs, bool, S_IRUGO | S_IWUSR); struct kvm_shared_msrs_global { int nr; - struct kvm_shared_msr { - u32 msr; - u64 value; - } msrs[KVM_NR_SHARED_MSRS]; + u32 msrs[KVM_NR_SHARED_MSRS]; }; struct kvm_shared_msrs { struct user_return_notifier urn; bool registered; - u64 current_value[KVM_NR_SHARED_MSRS]; + struct kvm_shared_msr_values { + u64 host; + u64 curr; + bool initialized; + } values[KVM_NR_SHARED_MSRS]; }; static struct kvm_shared_msrs_global __read_mostly shared_msrs_global; @@ -146,53 +147,68 @@ struct kvm_stats_debugfs_item debugfs_entries[] = { static void kvm_on_user_return(struct user_return_notifier *urn) { unsigned slot; - struct kvm_shared_msr *global; struct kvm_shared_msrs *locals = container_of(urn, struct kvm_shared_msrs, urn); + struct kvm_shared_msr_values *values; for (slot = 0; slot shared_msrs_global.nr; ++slot) { - global = shared_msrs_global.msrs[slot]; - if (global-value != locals-current_value[slot]) { - wrmsrl(global-msr, global-value); - locals-current_value[slot] = global-value; + values = locals-values[slot]; + if (values-host != values-curr) { + wrmsrl(shared_msrs_global.msrs[slot], values-host); + values-curr = values-host; } } locals-registered = false; user_return_notifier_unregister(urn); } -void kvm_define_shared_msr(unsigned slot, u32 msr) +static void shared_msr_update(unsigned slot, u32 msr) { - int cpu; + struct kvm_shared_msrs *smsr; u64 value; + smsr = __get_cpu_var(shared_msrs); + /* only read, and nobody should modify it at this time, +* so don't need lock */ + if (slot = shared_msrs_global.nr) { + printk(KERN_ERR kvm: invalid MSR slot!); + return; + } + if (smsr-values[slot].initialized) + return; + rdmsrl_safe(msr, value); + smsr-values[slot].host = value; + smsr-values[slot].curr = value; + smsr-values[slot].initialized = true; + put_cpu_var(shared_msrs); +} + +void kvm_define_shared_msr(unsigned slot, u32 msr) +{ if (slot = shared_msrs_global.nr) shared_msrs_global.nr = slot + 1; - shared_msrs_global.msrs[slot].msr = msr; - rdmsrl_safe(msr, value); - shared_msrs_global.msrs[slot].value = value; - for_each_online_cpu(cpu) - per_cpu(shared_msrs, cpu).current_value[slot] = value; + shared_msrs_global.msrs[slot] = msr; + /* we need ensured the shared_msr_global have been updated */ + smp_wmb(); } EXPORT_SYMBOL_GPL(kvm_define_shared_msr); static void kvm_shared_msr_cpu_online(void) { unsigned i; - struct kvm_shared_msrs *locals = __get_cpu_var(shared_msrs); for (i = 0; i shared_msrs_global.nr; ++i) - locals-current_value[i] = shared_msrs_global.msrs[i].value; + shared_msr_update(i, shared_msrs_global.msrs[i]); } void kvm_set_shared_msr(unsigned slot, u64 value, u64 mask) { struct kvm_shared_msrs *smsr = __get_cpu_var(shared_msrs); - if (((value ^ smsr-current_value[slot]) mask) == 0) + if (((value ^ smsr-values[slot].curr) mask) == 0) return; - smsr-current_value[slot] = value; - wrmsrl(shared_msrs_global.msrs[slot].msr, value); + smsr-values[slot].curr = value; + wrmsrl(shared_msrs_global.msrs[slot], value); if (!smsr-registered) { smsr-urn.on_user_return = kvm_on_user_return; user_return_notifier_register(smsr-urn); -- 1.5.4.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] KVM: VMX: Add instruction rdtscp support for guest
Before enabling, execution of rdtscp in guest would result in #UD. Signed-off-by: Sheng Yang sh...@linux.intel.com --- Reflect guest CPUID on vmcs fields as well, but it involved some more code which would only executed once... Do we need a callback there for post-cpuid setting? arch/x86/include/asm/kvm_host.h |3 ++ arch/x86/include/asm/vmx.h |1 + arch/x86/kvm/svm.c |6 arch/x86/kvm/vmx.c | 49 -- arch/x86/kvm/x86.c | 15 ++- 5 files changed, 69 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 4f865e8..c920285 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -374,6 +374,8 @@ struct kvm_vcpu_arch { /* used for guest single stepping over the given code position */ u16 singlestep_cs; unsigned long singlestep_rip; + + bool cpuid_rdtscp_enabled; }; struct kvm_mem_alias { @@ -532,6 +534,7 @@ struct kvm_x86_ops { int (*get_tdp_level)(void); u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); bool (*gb_page_enable)(void); + bool (*rdtscp_supported)(void); const struct trace_print_flags *exit_reasons_str; }; diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 2b49454..8c39320 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -53,6 +53,7 @@ */ #define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 0x0001 #define SECONDARY_EXEC_ENABLE_EPT 0x0002 +#define SECONDARY_EXEC_RDTSCP 0x0008 #define SECONDARY_EXEC_ENABLE_VPID 0x0020 #define SECONDARY_EXEC_WBINVD_EXITING 0x0040 #define SECONDARY_EXEC_UNRESTRICTED_GUEST 0x0080 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 3de0b37..052d91a 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -2912,6 +2912,11 @@ static bool svm_gb_page_enable(void) return true; } +static bool svm_rdtscp_supported(void) +{ + return false; +} + static struct kvm_x86_ops svm_x86_ops = { .cpu_has_kvm_support = has_svm, .disabled_by_bios = is_disabled, @@ -2978,6 +2983,7 @@ static struct kvm_x86_ops svm_x86_ops = { .exit_reasons_str = svm_exit_reasons_str, .gb_page_enable = svm_gb_page_enable, + .rdtscp_supported = svm_rdtscp_supported, }; static int __init svm_init(void) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 5c464ed..f60e645 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -136,6 +136,8 @@ struct vcpu_vmx { ktime_t entry_time; s64 vnmi_blocked_time; u32 exit_reason; + + bool control_rdtscp_enabled; }; static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu) @@ -210,7 +212,7 @@ static const u32 vmx_msr_index[] = { #ifdef CONFIG_X86_64 MSR_SYSCALL_MASK, MSR_LSTAR, MSR_CSTAR, #endif - MSR_EFER, MSR_K6_STAR, + MSR_EFER, MSR_TSC_AUX, MSR_K6_STAR, }; #define NR_VMX_MSR ARRAY_SIZE(vmx_msr_index) @@ -347,6 +349,12 @@ static inline int cpu_has_vmx_vpid(void) SECONDARY_EXEC_ENABLE_VPID; } +static inline int cpu_has_vmx_rdtscp(void) +{ + return vmcs_config.cpu_based_2nd_exec_ctrl + SECONDARY_EXEC_RDTSCP; +} + static inline int cpu_has_virtual_nmis(void) { return vmcs_config.pin_based_exec_ctrl PIN_BASED_VIRTUAL_NMIS; @@ -878,6 +886,11 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu, unsigned nr, vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr_info); } +static bool vmx_rdtscp_supported(void) +{ + return cpu_has_vmx_rdtscp(); +} + /* * Swap MSR entry in host/guest MSR entry array. */ @@ -913,6 +926,9 @@ static void setup_msrs(struct vcpu_vmx *vmx) index = __find_msr_index(vmx, MSR_CSTAR); if (index = 0) move_msr_up(vmx, index, save_nmsrs++); + index = __find_msr_index(vmx, MSR_TSC_AUX); + if (index = 0) + move_msr_up(vmx, index, save_nmsrs++); /* * MSR_K6_STAR is only needed on long mode guests, and only * if efer.sce is enabled. @@ -1002,6 +1018,10 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata) case MSR_IA32_SYSENTER_ESP: data = vmcs_readl(GUEST_SYSENTER_ESP); break; + case MSR_TSC_AUX: + if (!vcpu-arch.cpuid_rdtscp_enabled || !vmx_rdtscp_supported()) + return 1; + /* Otherwise falls through */ default: vmx_load_host_state(to_vmx(vcpu)); msr = find_msr_entry(to_vmx(vcpu), msr_index); @@ -1065,7 +1085,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
Re: SIGTERM to qemu-kvm process destroys qcow2 image?
2009/12/17 Avi Kivity a...@redhat.com: On 12/17/2009 02:52 AM, Kenni Lund wrote: Yesterday I entered an invalid boot device as an argument to my qemu-kvm command for my Windows XP machine, causing an error about a missing boot device in the qemu BIOS/POST. As I didn't have any filesystems mounted inside the virtual machine (since it was stuck at the BIOS asking for a device to boot), I did a kill $pid, fixed the boot device in the qemu-kvm command and tried booting again...but with no luck, whatever I try now with qemu-kvm gives me the error: qemu: could not open disk image /data/virtualization/WindowsXP.img And qemu-img (check, convert, etc) gives me: qemu-img: Could not open 'WindowsXP.img' Can you post the first 4K of the image? It shouldn't contain private data, but go over it (or don't post) if you sensitive information there. 4K dump attached. Best Regards Kenni Lund WindowsXP.img_4k_dump Description: Binary data
Re: [PATCH 6/7] Nested VMX patch 6 implements vmlaunch and vmresume
On 12/10/2009 08:38 PM, or...@il.ibm.com wrote: From: Orit Wassermanor...@il.ibm.com @@ -287,7 +297,7 @@ static inline int vmcs_field_type(unsigned long field) } /* - Returncs VMCS field size in bits + Returns VMCS field size in bits */ Fold. static inline int vmcs_field_size(int field_type, struct kvm_vcpu *vcpu) { @@ -313,6 +323,10 @@ static inline int vmcs_field_size(int field_type, struct kvm_vcpu *vcpu) return 0; } +#define NESTED_VM_EXIT_CONTROLS_MASK (~(VM_EXIT_LOAD_IA32_PAT | \ + VM_EXIT_SAVE_IA32_PAT)) +#define NESTED_VM_ENTRY_CONTROLS_MASK (~(VM_ENTRY_LOAD_IA32_PAT | \ + I think a whitelist is better here, so if we add a new feature and forget it here, we don't introduce a vulnerability. +static inline int nested_cpu_has_vmx_tpr_shadow(struct kvm_vcpu *vcpu) +{ + return cpu_has_vmx_tpr_shadow() + get_shadow_vmcs(vcpu)-cpu_based_vm_exec_control + CPU_BASED_TPR_SHADOW; +} bools are better. + +static inline int nested_cpu_has_secondary_exec_ctrls(struct kvm_vcpu *vcpu) +{ + return cpu_has_secondary_exec_ctrls() + get_shadow_vmcs(vcpu)-cpu_based_vm_exec_control + CPU_BASED_ACTIVATE_SECONDARY_CONTROLS; +} + +static inline bool nested_vm_need_virtualize_apic_accesses(struct kvm_vcpu + *vcpu) +{ + return get_shadow_vmcs(vcpu)-secondary_vm_exec_control + SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES; +} Need to check for secondary controls first. + +static inline int nested_cpu_has_vmx_ept(struct kvm_vcpu *vcpu) +{ + return get_shadow_vmcs(vcpu)- + secondary_vm_exec_control SECONDARY_EXEC_ENABLE_EPT; +} + +static inline int nested_cpu_has_vmx_vpid(struct kvm_vcpu *vcpu) +{ + return get_shadow_vmcs(vcpu)-secondary_vm_exec_control + SECONDARY_EXEC_ENABLE_VPID; +} A helper nested_check_2ndary_control() can help reduce duplication. + +static inline int nested_cpu_has_vmx_pat(struct kvm_vcpu *vcpu) +{ + return get_shadow_vmcs(vcpu)-vm_entry_controls + VM_ENTRY_LOAD_IA32_PAT; +} Suggest dropping PAT support for now (it's optional in the spec IIRC, and doesn't help much). + +void prepare_vmcs_12(struct kvm_vcpu *vcpu) +{ Not sure what this does. From the name, it appears to prepare a vmcs. From the contents, it appears to read the vmcs and save it into a shadow vmcs. + struct shadow_vmcs *l2_shadow_vmcs = + get_shadow_vmcs(vcpu); + + l2_shadow_vmcs-guest_es_selector = vmcs_read16(GUEST_ES_SELECTOR); + l2_shadow_vmcs-guest_cs_selector = vmcs_read16(GUEST_CS_SELECTOR); + l2_shadow_vmcs-guest_ss_selector = vmcs_read16(GUEST_SS_SELECTOR); + l2_shadow_vmcs-guest_ds_selector = vmcs_read16(GUEST_DS_SELECTOR); + l2_shadow_vmcs-guest_fs_selector = vmcs_read16(GUEST_FS_SELECTOR); + l2_shadow_vmcs-guest_gs_selector = vmcs_read16(GUEST_GS_SELECTOR); + l2_shadow_vmcs-guest_ldtr_selector = vmcs_read16(GUEST_LDTR_SELECTOR); + l2_shadow_vmcs-guest_tr_selector = vmcs_read16(GUEST_TR_SELECTOR); + + l2_shadow_vmcs-tsc_offset = vmcs_read64(TSC_OFFSET); + l2_shadow_vmcs-guest_physical_address = + vmcs_read64(GUEST_PHYSICAL_ADDRESS); + l2_shadow_vmcs-vmcs_link_pointer = vmcs_read64(VMCS_LINK_POINTER); + l2_shadow_vmcs-guest_ia32_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL); + if (vmcs_config.vmentry_ctrl VM_ENTRY_LOAD_IA32_PAT) + l2_shadow_vmcs-guest_ia32_pat = vmcs_read64(GUEST_IA32_PAT); + l2_shadow_vmcs-cr3_target_count = vmcs_read32(CR3_TARGET_COUNT); + l2_shadow_vmcs-vm_entry_intr_info_field = + vmcs_read32(VM_ENTRY_INTR_INFO_FIELD); + l2_shadow_vmcs-vm_entry_exception_error_code = + vmcs_read32(VM_ENTRY_EXCEPTION_ERROR_CODE); + l2_shadow_vmcs-vm_entry_instruction_len = + vmcs_read32(VM_ENTRY_INSTRUCTION_LEN); + l2_shadow_vmcs-vm_instruction_error = + vmcs_read32(VM_INSTRUCTION_ERROR); + l2_shadow_vmcs-vm_exit_reason = vmcs_read32(VM_EXIT_REASON); + l2_shadow_vmcs-vm_exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO); + l2_shadow_vmcs-vm_exit_intr_error_code = + vmcs_read32(VM_EXIT_INTR_ERROR_CODE); + l2_shadow_vmcs-idt_vectoring_info_field = + vmcs_read32(IDT_VECTORING_INFO_FIELD); + l2_shadow_vmcs-idt_vectoring_error_code = + vmcs_read32(IDT_VECTORING_ERROR_CODE); + l2_shadow_vmcs-vm_exit_instruction_len = + vmcs_read32(VM_EXIT_INSTRUCTION_LEN); + l2_shadow_vmcs-vmx_instruction_info = + vmcs_read32(VMX_INSTRUCTION_INFO); + l2_shadow_vmcs-guest_es_limit = vmcs_read32(GUEST_ES_LIMIT); + l2_shadow_vmcs-guest_cs_limit = vmcs_read32(GUEST_CS_LIMIT); +
Re: SIGTERM to qemu-kvm process destroys qcow2 image?
On 12/17/2009 11:38 AM, Kenni Lund wrote: 2009/12/17 Avi Kivitya...@redhat.com: On 12/17/2009 02:52 AM, Kenni Lund wrote: Yesterday I entered an invalid boot device as an argument to my qemu-kvm command for my Windows XP machine, causing an error about a missing boot device in the qemu BIOS/POST. As I didn't have any filesystems mounted inside the virtual machine (since it was stuck at the BIOS asking for a device to boot), I did a kill $pid, fixed the boot device in the qemu-kvm command and tried booting again...but with no luck, whatever I try now with qemu-kvm gives me the error: qemu: could not open disk image /data/virtualization/WindowsXP.img And qemu-img (check, convert, etc) gives me: qemu-img: Could not open 'WindowsXP.img' Can you post the first 4K of the image? It shouldn't contain private data, but go over it (or don't post) if you sensitive information there. 4K dump attached. Seems fine. Kevin can you take a look? You have a backing file. Do qemu-img info and qemu-img check like it? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] KVM: Extended shared_msr_global to per CPU
On 12/17/2009 11:32 AM, Sheng Yang wrote: shared_msr_global saved host value of relevant MSRs, but it have an assumption that all MSRs it tracked shared the value across the different CPUs. It's not true with some MSRs, e.g. MSR_TSC_AUX. Extend it to per CPU to provide the support of MSR_TSC_AUX, and more alike MSRs. Notice now the shared_msr_global still have one assumption: it can only deal with the MSRs that won't change in host after KVM module loaded. Signed-off-by: Sheng Yangsh...@linux.intel.com --- How about this? Move the all initialization to hardware_enable(). And only initialized once for each cpu. -void kvm_define_shared_msr(unsigned slot, u32 msr) +static void shared_msr_update(unsigned slot, u32 msr) { - int cpu; + struct kvm_shared_msrs *smsr; u64 value; + smsr =__get_cpu_var(shared_msrs); + /* only read, and nobody should modify it at this time, +* so don't need lock */ + if (slot= shared_msrs_global.nr) { + printk(KERN_ERR kvm: invalid MSR slot!); + return; + } + if (smsr-values[slot].initialized) + return; I don't think .initialized is worthwhile. shared_msr_update is run very rarely. + smsr-values[slot].initialized = true; + put_cpu_var(shared_msrs); If you use __get_cpu_var(), you need to remove put_cpu_var(). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] KVM: VMX: Add instruction rdtscp support for guest
On 12/17/2009 11:33 AM, Sheng Yang wrote: Before enabling, execution of rdtscp in guest would result in #UD. Signed-off-by: Sheng Yangsh...@linux.intel.com --- Reflect guest CPUID on vmcs fields as well, but it involved some more code which would only executed once... Do we need a callback there for post-cpuid setting? I guess we do need a callback. - /* Otherwise falls through to kvm_set_msr_common */ + ret = kvm_set_msr_common(vcpu, msr_index, data); + break; + case MSR_TSC_AUX: + if (!vcpu-arch.cpuid_rdtscp_enabled || !vmx_rdtscp_supported()) + return 1; + /* Check reserved bit */ + if ((data 0x) != 0) + return 1; It's 0x... @@ -913,6 +926,9 @@ static void setup_msrs(struct vcpu_vmx *vmx) index = __find_msr_index(vmx, MSR_CSTAR); if (index= 0) move_msr_up(vmx, index, save_nmsrs++); + index = __find_msr_index(vmx, MSR_TSC_AUX); + if (index= 0) + move_msr_up(vmx, index, save_nmsrs++); Only do this if rdtscp is enabled! -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SIGTERM to qemu-kvm process destroys qcow2 image?
Am 17.12.2009 11:23, schrieb Avi Kivity: On 12/17/2009 11:38 AM, Kenni Lund wrote: 2009/12/17 Avi Kivitya...@redhat.com: On 12/17/2009 02:52 AM, Kenni Lund wrote: Yesterday I entered an invalid boot device as an argument to my qemu-kvm command for my Windows XP machine, causing an error about a missing boot device in the qemu BIOS/POST. As I didn't have any filesystems mounted inside the virtual machine (since it was stuck at the BIOS asking for a device to boot), I did a kill $pid, fixed the boot device in the qemu-kvm command and tried booting again...but with no luck, whatever I try now with qemu-kvm gives me the error: qemu: could not open disk image /data/virtualization/WindowsXP.img And qemu-img (check, convert, etc) gives me: qemu-img: Could not open 'WindowsXP.img' Can you post the first 4K of the image? It shouldn't contain private data, but go over it (or don't post) if you sensitive information there. 4K dump attached. Seems fine. Kevin can you take a look? Agreed, the dump looks fine. Even qcow2_open succeeds if I just use this header as a qcow2 file. You have a backing file. Do qemu-img info and qemu-img check like it? So I'd definitely have a look at the backing file. It's almost the only thing that can go wrong after qcow2_open. qemu-img doesn't have known problems with backing files in general, but it does have problems with missing (or broken) backing files. Kenni, could you check if your backing file is still alive? /tmp doesn't sound like a safe place for images. If it still exists in the right location (/tmp/WindowsXP.img.backup) and it is qcow2, can you please attach the first 4k of the backing file? Kevin -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Mouse and keyboard dead in XP guest...
Just upgraded from kvm-83 with evdev keyboard codes backoprted (yes, I know, it's old) to 0.12.0rc2 plus the 2.6.32 source for kvm-kmod. This on 64-bit Linux with an evdev Xorg. Mouse and keyboard simply won't work in my existing XP VMs. Installed a fresh XP3 straight from a slipstreamed ISO. The text-based part of the install took my keyboard input; the GUI part was automated, thanks to nLite. Windows installed fine and I ended at the Ctrl-Alt-Del logon screen. No mouse, no keyboard. Tried via VNC. No mouse, no keyboard. Slackware-64 13.0 -- forget what kernel version. 2.6.29, I think. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] vhost: add missing architectures
vhost is completely portable, but Kconfig include was missing for all architectures besides x86, so it did not appear in the menu. Add the relevant Kconfig includes to all architectures that support virtualization. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Hi Rusty, please apply the following trivial fixup patch for 2.6.33. Thanks! arch/ia64/kvm/Kconfig|1 + arch/powerpc/kvm/Kconfig |1 + arch/s390/kvm/Kconfig|1 + 3 files changed, 3 insertions(+), 0 deletions(-) diff --git a/arch/ia64/kvm/Kconfig b/arch/ia64/kvm/Kconfig index ef3e7be..01c7579 100644 --- a/arch/ia64/kvm/Kconfig +++ b/arch/ia64/kvm/Kconfig @@ -47,6 +47,7 @@ config KVM_INTEL Provides support for KVM on Itanium 2 processors equipped with the VT extensions. +source drivers/vhost/Kconfig source drivers/virtio/Kconfig endif # VIRTUALIZATION diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig index c299268..a1b4c5d 100644 --- a/arch/powerpc/kvm/Kconfig +++ b/arch/powerpc/kvm/Kconfig @@ -58,6 +58,7 @@ config KVM_E500 If unsure, say N. +source drivers/vhost/Kconfig source drivers/virtio/Kconfig endif # VIRTUALIZATION diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig index bf164fc..6be6aea 100644 --- a/arch/s390/kvm/Kconfig +++ b/arch/s390/kvm/Kconfig @@ -36,6 +36,7 @@ config KVM # OK, it's a little counter-intuitive to do this, but it puts it neatly under # the virtualization menu. +source drivers/vhost/Kconfig source drivers/virtio/Kconfig endif # VIRTUALIZATION -- 1.6.6.rc1.43.gf55cc -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH-cc-fixed] vhost: add missing architectures
vhost is completely portable, but Kconfig include was missing for all architectures besides x86, so it did not appear in the menu. Add the relevant Kconfig includes to all architectures that support virtualization. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Some IBM addresses seem to be bouncing. Reposting with these removed, please send replies to this shorter list to avoid bounces. Sorry about the noise. Hi Rusty, please apply the following trivial fixup patch for 2.6.33. Thanks! arch/ia64/kvm/Kconfig|1 + arch/powerpc/kvm/Kconfig |1 + arch/s390/kvm/Kconfig|1 + 3 files changed, 3 insertions(+), 0 deletions(-) diff --git a/arch/ia64/kvm/Kconfig b/arch/ia64/kvm/Kconfig index ef3e7be..01c7579 100644 --- a/arch/ia64/kvm/Kconfig +++ b/arch/ia64/kvm/Kconfig @@ -47,6 +47,7 @@ config KVM_INTEL Provides support for KVM on Itanium 2 processors equipped with the VT extensions. +source drivers/vhost/Kconfig source drivers/virtio/Kconfig endif # VIRTUALIZATION diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig index c299268..a1b4c5d 100644 --- a/arch/powerpc/kvm/Kconfig +++ b/arch/powerpc/kvm/Kconfig @@ -58,6 +58,7 @@ config KVM_E500 If unsure, say N. +source drivers/vhost/Kconfig source drivers/virtio/Kconfig endif # VIRTUALIZATION diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig index bf164fc..6be6aea 100644 --- a/arch/s390/kvm/Kconfig +++ b/arch/s390/kvm/Kconfig @@ -36,6 +36,7 @@ config KVM # OK, it's a little counter-intuitive to do this, but it puts it neatly under # the virtualization menu. +source drivers/vhost/Kconfig source drivers/virtio/Kconfig endif # VIRTUALIZATION -- 1.6.6.rc1.43.gf55cc -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/7] Nested VMX patch 7 handling of nested guest exits
On 12/10/2009 08:38 PM, or...@il.ibm.com wrote: From: Orit Wassermanor...@il.ibm.com (changelog) @@ -1525,6 +1539,22 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) new_offset = vmcs_read64(TSC_OFFSET) + delta; vmcs_write64(TSC_OFFSET, new_offset); } + + if (l1_shadow_vmcs != NULL) { + l1_shadow_vmcs-host_tr_base = + vmcs_readl(HOST_TR_BASE); + l1_shadow_vmcs-host_gdtr_base = + vmcs_readl(HOST_GDTR_BASE); + l1_shadow_vmcs-host_ia32_sysenter_esp = + vmcs_readl(HOST_IA32_SYSENTER_ESP); + + if (tsc_this vcpu-arch.host_tsc) + l1_shadow_vmcs-tsc_offset = + vmcs_read64(TSC_OFFSET); + + if (vmx-nested.nested_mode) + load_vmcs_host_state(l1_shadow_vmcs); + } Please share this code with non-nested vmcs setup. @@ -3794,6 +3824,11 @@ static void enable_irq_window(struct kvm_vcpu *vcpu) { u32 cpu_based_vm_exec_control; + if (to_vmx(vcpu)-nested.nested_mode) { + nested_vmx_intr(vcpu); + return; + } I think this happens too late? enable_irq_window() is called after we've given up on injecting the interrupt because interrupts are disabled. But if we're running a guest, we can vmexit and inject the interrupt. This code will only vmexit. Hm, I see the vmexit code has an in_interrupt case, but I'd like this to be more regular: adjust vmx_interrupt_allowed() to allow interrupts if in a guest, and vmx_inject_irq() to force the vmexit. This way interrupts have a single code path. static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu) { + if (to_vmx(vcpu)-nested.nested_mode) { + if (!nested_vmx_intr(vcpu)) + return 0; + } ... and you do that... so I wonder why the changes to enable_irq_window() are needed? + return (vmcs_readl(GUEST_RFLAGS) X86_EFLAGS_IF) !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS)); @@ -4042,6 +4082,10 @@ static int handle_exception(struct kvm_vcpu *vcpu) not interested (exception bitmap 12 does not include NM_VECTOR) enable fpu and resume l2 (avoid switching to l1) */ + + if (vmx-nested.nested_mode) + vmx-nested.nested_run_pending = 1; /* removing this line cause hung on boot of l2*/ + This indicates a hack? vmx_fpu_activate(vcpu); return 1; @@ -4169,7 +4213,33 @@ static int handle_cr(struct kvm_vcpu *vcpu) trace_kvm_cr_write(cr, val); switch (cr) { case 0: - kvm_set_cr0(vcpu, val); + if (to_vmx(vcpu)-nested.nested_mode) { + /* assume only X86_CR0_TS is handled by l0 */ + long new_cr0 = vmcs_readl(GUEST_CR0); + long new_cr0_read_shadow = vmcs_readl(CR0_READ_SHADOW); + + vmx_fpu_deactivate(vcpu); + + if (val X86_CR0_TS) { + new_cr0 |= X86_CR0_TS; + new_cr0_read_shadow |= X86_CR0_TS; + vcpu-arch.cr0 |= X86_CR0_TS; + } else { + new_cr0= ~X86_CR0_TS; + new_cr0_read_shadow= ~X86_CR0_TS; + vcpu-arch.cr0= X86_CR0_TS; + } + + vmcs_writel(GUEST_CR0, new_cr0); + vmcs_writel(CR0_READ_SHADOW, new_cr0_read_shadow); Don't you need to #vmexit if the new cr0 violates the cr0_bits_always_on constraint, or if it changes bits in cr0 that the guest intercepts? + + if (!(val X86_CR0_TS) || !(val X86_CR0_PE)) + vmx_fpu_activate(vcpu); + + to_vmx(vcpu)-nested.nested_run_pending = 1; Please split into a function. + } else + kvm_set_cr0(vcpu, val); + skip_emulated_instruction(vcpu); return 1; case 3: @@ -4196,8 +4266,15 @@ static int handle_cr(struct kvm_vcpu *vcpu) break; case 2: /* clts */ vmx_fpu_deactivate(vcpu); - vcpu-arch.cr0= ~X86_CR0_TS; - vmcs_writel(CR0_READ_SHADOW,
Re: Nested VMX support v4
On 12/10/2009 08:38 PM, or...@il.ibm.com wrote: Avi, We have addressed all of the comments, please apply. I'm afraid there is still a lot of work remaining. This work was inspired by the nested SVM support by Alexander Graf and Joerg Roedel. Please try to make this as readable as the svm work. That means smaller patches, detailed changelogs, reduced forward declarations. I would appreciate documentation of the data structures. I am still confused about the triplication in kvm_vcpu, l1_state, and l2_state - there shouldn't be three, just two. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] KVM: VMX: Add instruction rdtscp support for guest
On Thursday 17 December 2009 18:39:22 Avi Kivity wrote: On 12/17/2009 11:33 AM, Sheng Yang wrote: Before enabling, execution of rdtscp in guest would result in #UD. Signed-off-by: Sheng Yangsh...@linux.intel.com --- Reflect guest CPUID on vmcs fields as well, but it involved some more code which would only executed once... Do we need a callback there for post-cpuid setting? I guess we do need a callback. - /* Otherwise falls through to kvm_set_msr_common */ + ret = kvm_set_msr_common(vcpu, msr_index, data); + break; + case MSR_TSC_AUX: + if (!vcpu-arch.cpuid_rdtscp_enabled || !vmx_rdtscp_supported()) + return 1; + /* Check reserved bit */ + if ((data 0x) != 0) + return 1; It's 0x... ... /me blame himself... @@ -913,6 +926,9 @@ static void setup_msrs(struct vcpu_vmx *vmx) index = __find_msr_index(vmx, MSR_CSTAR); if (index= 0) move_msr_up(vmx, index, save_nmsrs++); + index = __find_msr_index(vmx, MSR_TSC_AUX); + if (index= 0) + move_msr_up(vmx, index, save_nmsrs++); Only do this if rdtscp is enabled! /me blame himself again... -- regards Yang, Sheng -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] KVM: Extended shared_msr_global to per CPU
On Thursday 17 December 2009 18:32:08 Avi Kivity wrote: On 12/17/2009 11:32 AM, Sheng Yang wrote: shared_msr_global saved host value of relevant MSRs, but it have an assumption that all MSRs it tracked shared the value across the different CPUs. It's not true with some MSRs, e.g. MSR_TSC_AUX. Extend it to per CPU to provide the support of MSR_TSC_AUX, and more alike MSRs. Notice now the shared_msr_global still have one assumption: it can only deal with the MSRs that won't change in host after KVM module loaded. Signed-off-by: Sheng Yangsh...@linux.intel.com --- How about this? Move the all initialization to hardware_enable(). And only initialized once for each cpu. -void kvm_define_shared_msr(unsigned slot, u32 msr) +static void shared_msr_update(unsigned slot, u32 msr) { - int cpu; + struct kvm_shared_msrs *smsr; u64 value; + smsr =__get_cpu_var(shared_msrs); + /* only read, and nobody should modify it at this time, +* so don't need lock */ + if (slot= shared_msrs_global.nr) { + printk(KERN_ERR kvm: invalid MSR slot!); + return; + } + if (smsr-values[slot].initialized) + return; I don't think .initialized is worthwhile. shared_msr_update is run very rarely. The reason is, after cpu hotplug, the MSR_TSC_AUX would be rewritten by vsyscall_init again. But in the hotplug notifier chain, KVM has higher priority(20 vs 0 for vsyscall_init), so maybe the rdmsr() here would get a bogus value... Then I think prevent it from initializing again should be safer. But I just think of another issue: if we hot plug in a cpu(without hot plug off), it would have a bogus value as well in the same path? Sound troublesome... + smsr-values[slot].initialized = true; + put_cpu_var(shared_msrs); If you use __get_cpu_var(), you need to remove put_cpu_var(). Sure... -- regards Yang, Sheng -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/3] Device Assignment fixes
While trying out I stumbled across several issues in the device assignment code. This set addresses the most major ones. Namely allowing passthrough of non page aligned BAR region (like on lpfc) and telling users what to do when their device is in use. Alexander Graf (3): Enable non page boundary BAR device assignment Split off sysfs id retrieval Inform users about busy device assignment attempt hw/device-assignment.c | 244 ++-- 1 files changed, 215 insertions(+), 29 deletions(-) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] Split off sysfs id retrieval
To retreive device and vendor ID from a PCI device, we need to read a sysfs file. That code is currently hand written at least two times, the later patch introducing two more calls. So let's move that out to a function. Signed-off-by: Alexander Graf ag...@suse.de --- hw/device-assignment.c | 66 1 files changed, 44 insertions(+), 22 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 000fa61..566671c 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -562,14 +562,46 @@ static int assigned_dev_register_regions(PCIRegion *io_regions, return 0; } +static int get_real_id(const char *devpath, const char *idname, uint16_t *val) +{ +FILE *f; +char name[128]; +long id; + +snprintf(name, sizeof(name), %s%s, devpath, idname); +f = fopen(name, r); +if (f == NULL) { +fprintf(stderr, %s: %s: %m\n, __func__, name); +return -1; +} +if (fscanf(f, %li\n, id) == 1) { +*val = id; +} else { +return -1; +} +fclose(f); + +return 0; +} + +static int get_real_vendor_id(const char *devpath, uint16_t *val) +{ +return get_real_id(devpath, vendor, val); +} + +static int get_real_device_id(const char *devpath, uint16_t *val) +{ +return get_real_id(devpath, device, val); +} + static int get_real_device(AssignedDevice *pci_dev, uint8_t r_bus, uint8_t r_dev, uint8_t r_func) { char dir[128], name[128]; -int fd, r = 0; +int fd, r = 0, v; FILE *f; unsigned long long start, end, size, flags; -unsigned long id; +uint16_t id; struct stat statbuf; PCIRegion *rp; PCIDevRegions *dev = pci_dev-real_device; @@ -635,31 +667,21 @@ again: fclose(f); -/* read and fill device ID */ -snprintf(name, sizeof(name), %svendor, dir); -f = fopen(name, r); -if (f == NULL) { -fprintf(stderr, %s: %s: %m\n, __func__, name); +/* read and fill vendor ID */ +v = get_real_vendor_id(dir, id); +if (v) { return 1; } -if (fscanf(f, %li\n, id) == 1) { - pci_dev-dev.config[0] = id 0xff; - pci_dev-dev.config[1] = (id 0xff00) 8; -} -fclose(f); +pci_dev-dev.config[0] = id 0xff; +pci_dev-dev.config[1] = (id 0xff00) 8; -/* read and fill vendor ID */ -snprintf(name, sizeof(name), %sdevice, dir); -f = fopen(name, r); -if (f == NULL) { -fprintf(stderr, %s: %s: %m\n, __func__, name); +/* read and fill device ID */ +v = get_real_device_id(dir, id); +if (v) { return 1; } -if (fscanf(f, %li\n, id) == 1) { - pci_dev-dev.config[2] = id 0xff; - pci_dev-dev.config[3] = (id 0xff00) 8; -} -fclose(f); +pci_dev-dev.config[2] = id 0xff; +pci_dev-dev.config[3] = (id 0xff00) 8; /* dealing with virtual function device */ snprintf(name, sizeof(name), %sphysfn/, dir); -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] Enable non page boundary BAR device assignment
While trying to get device passthrough working with an emulex hba, kvm refused to pass it through because it has a BAR of 256 bytes: Region 0: Memory at d210 (64-bit, non-prefetchable) [size=4K] Region 2: Memory at d2101000 (64-bit, non-prefetchable) [size=256] Region 4: I/O ports at b100 [size=256] Since the page boundary is an arbitrary optimization to allow 1:1 mapping of physical to virtual addresses, we can still take the old MMIO callback route. So let's add a second code path that allows for size 0xFFF != 0 sized regions by looping it through userspace. I verified that it works by passing through an e1000 with this additional patch applied and the card acted the same way it did without this patch: map_func = assigned_dev_iomem_map; -if (cur_region-size 0xFFF) { +if (i != PCI_ROM_SLOT){ fprintf(stderr, PCI region %d at address 0x%llx Signed-off-by: Alexander Graf ag...@suse.de --- v1 - v2: - don't use map_func function pointer - use the same code for mmap on fast and slow path v2 - v3: - address mst's comments --- hw/device-assignment.c | 117 +-- 1 files changed, 112 insertions(+), 5 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 13a86bb..000fa61 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -148,6 +148,105 @@ static uint32_t assigned_dev_ioport_readl(void *opaque, uint32_t addr) return value; } +static uint32_t slow_bar_readb(void *opaque, target_phys_addr_t addr) +{ +AssignedDevRegion *d = opaque; +uint8_t *in = d-u.r_virtbase + addr; +uint32_t r; + +r = *in; +DEBUG(slow_bar_readl addr=0x TARGET_FMT_plx val=0x%08x\n, addr, r); + +return r; +} + +static uint32_t slow_bar_readw(void *opaque, target_phys_addr_t addr) +{ +AssignedDevRegion *d = opaque; +uint16_t *in = d-u.r_virtbase + addr; +uint32_t r; + +r = *in; +DEBUG(slow_bar_readl addr=0x TARGET_FMT_plx val=0x%08x\n, addr, r); + +return r; +} + +static uint32_t slow_bar_readl(void *opaque, target_phys_addr_t addr) +{ +AssignedDevRegion *d = opaque; +uint32_t *in = d-u.r_virtbase + addr; +uint32_t r; + +r = *in; +DEBUG(slow_bar_readl addr=0x TARGET_FMT_plx val=0x%08x\n, addr, r); + +return r; +} + +static void slow_bar_writeb(void *opaque, target_phys_addr_t addr, uint32_t val) +{ +AssignedDevRegion *d = opaque; +uint8_t *out = d-u.r_virtbase + addr; + +DEBUG(slow_bar_writeb addr=0x TARGET_FMT_plx val=0x%02x\n, addr, val); +*out = val; +} + +static void slow_bar_writew(void *opaque, target_phys_addr_t addr, uint32_t val) +{ +AssignedDevRegion *d = opaque; +uint16_t *out = d-u.r_virtbase + addr; + +DEBUG(slow_bar_writew addr=0x TARGET_FMT_plx val=0x%04x\n, addr, val); +*out = val; +} + +static void slow_bar_writel(void *opaque, target_phys_addr_t addr, uint32_t val) +{ +AssignedDevRegion *d = opaque; +uint32_t *out = d-u.r_virtbase + addr; + +DEBUG(slow_bar_writel addr=0x TARGET_FMT_plx val=0x%08x\n, addr, val); +*out = val; +} + +static CPUWriteMemoryFunc * const slow_bar_write[] = { +slow_bar_writeb, +slow_bar_writew, +slow_bar_writel +}; + +static CPUReadMemoryFunc * const slow_bar_read[] = { +slow_bar_readb, +slow_bar_readw, +slow_bar_readl +}; + +static void assigned_dev_iomem_map_slow(PCIDevice *pci_dev, int region_num, +pcibus_t e_phys, pcibus_t e_size, +int type) +{ +AssignedDevice *r_dev = container_of(pci_dev, AssignedDevice, dev); +AssignedDevRegion *region = r_dev-v_addrs[region_num]; +PCIRegion *real_region = r_dev-real_device.regions[region_num]; +int m; + +DEBUG(slow map\n); +m = cpu_register_io_memory(slow_bar_read, slow_bar_write, region); +cpu_register_physical_memory(e_phys, e_size, m); + +/* MSI-X MMIO page */ +if ((e_size 0) +real_region-base_addr = r_dev-msix_table_addr +real_region-base_addr + real_region-size = r_dev-msix_table_addr) { +int offset = r_dev-msix_table_addr - real_region-base_addr; + +cpu_register_physical_memory(e_phys + offset, +TARGET_PAGE_SIZE, r_dev-mmio_index); +} +} + static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num, pcibus_t e_phys, pcibus_t e_size, int type) { @@ -381,15 +480,22 @@ static int assigned_dev_register_regions(PCIRegion *io_regions, /* handle memory io regions */ if (cur_region-type IORESOURCE_MEM) { +int slow_map = 0; int t = cur_region-type IORESOURCE_PREFETCH ? PCI_BASE_ADDRESS_MEM_PREFETCH : PCI_BASE_ADDRESS_SPACE_MEMORY; + if (cur_region-size 0xFFF) { -fprintf(stderr, Unable to assign
Re: [PATCH 2/4] KVM: Extended shared_msr_global to per CPU
On 12/17/2009 05:01 PM, Sheng Yang wrote: -void kvm_define_shared_msr(unsigned slot, u32 msr) +static void shared_msr_update(unsigned slot, u32 msr) { - int cpu; + struct kvm_shared_msrs *smsr; u64 value; + smsr =__get_cpu_var(shared_msrs); + /* only read, and nobody should modify it at this time, +* so don't need lock */ + if (slot= shared_msrs_global.nr) { + printk(KERN_ERR kvm: invalid MSR slot!); + return; + } + if (smsr-values[slot].initialized) + return; I don't think .initialized is worthwhile. shared_msr_update is run very rarely. The reason is, after cpu hotplug, the MSR_TSC_AUX would be rewritten by vsyscall_init again. But in the hotplug notifier chain, KVM has higher priority(20 vs 0 for vsyscall_init), so maybe the rdmsr() here would get a bogus value... Then I think prevent it from initializing again should be safer. So let's raise vsyscall_init's priority? But I just think of another issue: if we hot plug in a cpu(without hot plug off), it would have a bogus value as well in the same path? Sound troublesome... Removing .initialized would take care of this issue. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] Inform users about busy device assignment attempt
When using -pcidevice on a device that is already in use by a kernel driver all the user gets is the following (very useful) information: Failed to assign device 04:00.0 : Device or resource busy Failed to deassign device 04:00.0 : Invalid argument Error initializing device pci-assign Since I usually prefer to have my computer do the thinking for me, I figured it might be a good idea to check and see if a device is actually used by a driver. If so, tell the user. So with this patch applied you get the following output: Failed to assign device 04:00.0 : Device or resource busy *** The driver 'igb' is occupying your device 04:00.0. *** *** You can try the following commands to free it: *** *** $ echo 8086 150a /sys/bus/pci/drivers/pci-stub/new_id *** $ echo :04:00.0 /sys/bus/pci/drivers/igb/unbind *** $ echo :04:00.0 /sys/bus/pci/drivers/pci-stub/bind *** $ echo 8086 150a /sys/bus/pci/drivers/pci-stub/remove_id *** Failed to deassign device 04:00.0 : Invalid argument Error initializing device pci-assign That should keep people like me from doing the most obvious misuses :-). CC: Daniel P. Berrange berra...@redhat.com Signed-off-by: Alexander Graf ag...@suse.de --- v1 - v2: - add more helpful guidance thanks to Daniel Berrange v2 - v3: - clear name variable before using it, thus 0-terminating the string - fix region numbers - use correct unbind/bind names v3 - v4: - split id retrieval part - mst comments v4 - v5: - address chris' comments --- hw/device-assignment.c | 61 ++- 1 files changed, 59 insertions(+), 2 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 566671c..d6d44eb 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -756,6 +756,54 @@ static uint32_t calc_assigned_dev_id(uint8_t bus, uint8_t devfn) return (uint32_t)bus 8 | (uint32_t)devfn; } +static void assign_failed_examine(AssignedDevice *dev) +{ +char name[PATH_MAX], dir[PATH_MAX], driver[PATH_MAX] = {}, *ns; +uint16_t vendor_id, device_id; +int r; + +/* XXX implement multidomain */ +sprintf(dir, /sys/bus/pci/devices/:%02x:%02x.%01x/, + dev-host.bus, dev-host.dev, dev-host.func); + +sprintf(name, %sdriver, dir); + +r = readlink(name, driver, sizeof(driver)); +if ((r = 0) || r = sizeof(driver) || !(ns = strrchr(driver, '/'))) { +goto fail; +} + +ns++; + +if (get_real_vendor_id(dir, vendor_id) || +get_real_device_id(dir, device_id)) { +goto fail; +} + +fprintf(stderr, *** The driver '%s' is occupying your device +%02x:%02x.%x.\n, +ns, dev-host.bus, dev-host.dev, dev-host.func); +fprintf(stderr, ***\n); +fprintf(stderr, *** You can try the following commands to free it:\n); +fprintf(stderr, ***\n); +fprintf(stderr, *** $ echo \%04x %04x\ /sys/bus/pci/drivers/pci-stub/ +new_id\n, vendor_id, device_id); +fprintf(stderr, *** $ echo \:%02x:%02x.%x\ /sys/bus/pci/drivers/ +%s/unbind\n, +dev-host.bus, dev-host.dev, dev-host.func, ns); +fprintf(stderr, *** $ echo \:%02x:%02x.%x\ /sys/bus/pci/drivers/ +pci-stub/bind\n, +dev-host.bus, dev-host.dev, dev-host.func); +fprintf(stderr, *** $ echo \%04x %04x\ /sys/bus/pci/drivers/pci-stub +/remove_id\n, vendor_id, device_id); +fprintf(stderr, ***\n); + +return; + +fail: +fprintf(stderr, Couldn't find out why.\n); +} + static int assign_device(AssignedDevice *dev) { struct kvm_assigned_pci_dev assigned_dev_data; @@ -781,9 +829,18 @@ static int assign_device(AssignedDevice *dev) #endif r = kvm_assign_pci_device(kvm_context, assigned_dev_data); -if (r 0) - fprintf(stderr, Failed to assign device \%s\ : %s\n, +if (r 0) { +fprintf(stderr, Failed to assign device \%s\ : %s\n, dev-dev.qdev.id, strerror(-r)); + +switch (r) { +case -EBUSY: +assign_failed_examine(dev); +break; +default: +break; +} +} return r; } -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
wifi pci card issue in windows host.
Hello folks. My goal is to make my pci based network card work under windows guest.(obviously using windows drivers) this is device i am interested in: 01:07.0 Network controller: Atheros Communications Inc. AR5416 802.11abgn Wireless PCI Adapter (rev 01) Subsystem: D-Link System Inc Device 3a6b Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B+ DisINTx- Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium TAbort- TAbort- MAbort- SERR- PERR- INTx- Latency: 168, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 16 Region 0: Memory at f9ff (32-bit, non-prefetchable) [size=64K] Capabilities: [40] #00 [] Kernel driver in use: ath9k Kernel modules: ath9k uname -a: Linux dimko 2.6.32 #4 SMP PREEMPT Thu Dec 17 02:45:45 GMT 2009 x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 6000+ AuthenticAMD GNU/Linux below commands I have executed: #echo 1186 3a6b /sys/bus/pci/drivers/pci-stub/new_id #echo :01:07.0 /sys/bus/pci/devices/\:01\:07.0/driver/unbind #echo :01:07.0 /sys/bus/pci/drivers/pci-stub/bind last command gives: echo: write error: No such device I have pci_stub module loaded. and part of dmesg: [ 8471.257299] wlan0: deauthenticating from 00:1e:58:b4:f6:83 by local choice (reason=3) [ 8471.537053] ath9k :01:07.0: PCI INT A disabled Which i guess is ok. than i load my guest with smth like: qemu xp32.img -pcidevice host=01:07.0 -m 2048 I can see device in linux, i can even install driver for it, and there are no other problems with that. But device is not usable. Basically wireless device sees no networks, even though my network has ESSID opened. I was wondering if that previous echo :01:07.0 /sys/bus/pci/drivers/pci-stub/bind failure can affect me? Windows driver is freshly downloaded off dlink site. Access Point is */definitely/* http://www.google.ie/search?hl=ensafe=offclient=firefox-arls=org.gentoo:en-US:officialhs=h3dei=NVcqS5TnA8be4gbw3siBCQsa=Xoi=spellresnum=0ct=resultcd=1ved=0CAYQBSgAq=definitelyspell=1 working, since i am connecting to it using native linux drivers. Thanks in advance. Dmitri -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
kernel memory allocation bug in 2.6.27.32-2.6.27.41 kvm section
Hello! I can't register new account in bugzilla.kernel.org. / my ISP's spamfilter problem (?) maybe./ -- I sent this mail to Greg KH (2.6.27.y maintainer), he sent me: Can you get the kvm maintainers to agree that this is correct? thanks, greg k-h --- So the bug : I found a memory allocation bug in kvm/mmu.c kvm_main.c. /in kvm_destroy_vm()/ Affected kernel: 2.6.27.32-2.6.27.41 Mainline kernel (2.6.32) is not affected. (Modified kvm subsystem.) Cause: http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Fstable%2Flinux-2.6.27.y.git;a=commitdiff_plain;h=d2127c8300fb1ec54af56faee17170e7a525326d Solution: Revert this patch. This bug can cause local DoS in the host system. --- Description: After closing of kvm virtual machine kmv_destroy_vm() doens't free mmu_pages. The allocated memory from host's system is LOST. A script for demonstration is here: www.freeweb.hu/oscon/kvm-memory-eater.sh.gz WARNING: this script can cause local DoS. description of script: script starts a kvm with boot image file. After start this script wait 10sec. -wait to boot guest processor (586+kernel image or a windows guest / 486 kernel_image doesn't activate the kvm_amd for example and it doesn't allocate...it seems... /) to activates kvm_svm. you can see the used memory of the host system with top or kinfocenter or other.. After 10 sec this script killes the kvm userspace process - kvm_destroy_vm() is activated. kvm_destroy_vm doesn't free mmu_pages so you lose ~500 mbyte memory from host system. /kvm -m 512/ this script starts a kvm again...and usw... :-) you lose all memory from host. from swap (partition or file) too. -- comment: oom_killer doesn't protect against this bug. Not kvm userspace process allocates the memory but the kernel module (by me: kvm_amd) so there isn't process to kill. Not user initiated process allocates the memory so ulimit or limits.conf restrictions doesn't protect. Remove kvm_amd and kvm modules from system doesn't help because: :-): After killing the script and killing kvm process I tried rmmod kvm_amd : Effect is: Dec 14 16:12:45 osconsfortress kernel: slab error in kmem_cache_destroy(): cache `kvm_pte_chain': Can't free all objects Dec 14 16:12:45 osconsfortress kernel: Pid: 9343, comm: rmmod Tainted: P 2.6.27.41 #1 Dec 14 16:12:45 osconsfortress kernel: [c02d7a27] ? printk+0x18/0x21 Dec 14 16:12:45 osconsfortress kernel: [c017ed48] kmem_cache_destroy+0xb8/0xf0 Dec 14 16:12:45 osconsfortress kernel: [f9885481] mmu_destroy_caches+0x11/0x30 [kvm] Dec 14 16:12:45 osconsfortress kernel: [f9885558] kvm_mmu_module_exit+0x8/0x20 [kvm] Dec 14 16:12:45 osconsfortress kernel: [f987f4c2] kvm_arch_exit+0x12/0x20 [kvm] Dec 14 16:12:45 osconsfortress kernel: [f987b31a] kvm_exit+0x5a/0x80 [kvm] Dec 14 16:12:46 osconsfortress kernel: [f98ce4d4] svm_exit+0x8/0xa [kvm_amd] Dec 14 16:12:46 osconsfortress kernel: [c014dafa] sys_delete_module+0x15a/0x200 Dec 14 16:12:46 osconsfortress kernel: [c0171bc2] ? do_munmap+0x1d2/0x230 Dec 14 16:12:46 osconsfortress kernel: [c0103271] sysenter_do_call+0x12/0x25 Dec 14 16:12:46 osconsfortress kernel: === Dec 14 16:12:46 osconsfortress kernel: slab error in kmem_cache_destroy(): cache `kvm_rmap_desc': Can't free all objects Dec 14 16:12:46 osconsfortress kernel: Pid: 9343, comm: rmmod Tainted: P 2.6.27.41 #1 Dec 14 16:12:46 osconsfortress kernel: [c02d7a27] ? printk+0x18/0x21 Dec 14 16:12:46 osconsfortress kernel: [c017ed48] kmem_cache_destroy+0xb8/0xf0 Dec 14 16:12:46 osconsfortress kernel: [f988548f] mmu_destroy_caches+0x1f/0x30 [kvm] Dec 14 16:12:46 osconsfortress kernel: [f9885558] kvm_mmu_module_exit+0x8/0x20 [kvm] Dec 14 16:12:46 osconsfortress kernel: [f987f4c2] kvm_arch_exit+0x12/0x20 [kvm] Dec 14 16:12:46 osconsfortress kernel: [f987b31a] kvm_exit+0x5a/0x80 [kvm] Dec 14 16:12:46 osconsfortress kernel: [f98ce4d4] svm_exit+0x8/0xa [kvm_amd] Dec 14 16:12:46 osconsfortress kernel: [c014dafa] sys_delete_module+0x15a/0x200 Dec 14 16:12:46 osconsfortress kernel: [c0171bc2] ? do_munmap+0x1d2/0x230 Dec 14 16:12:46 osconsfortress kernel: [c0103271] sysenter_do_call+0x12/0x25 Dec 14 16:12:46 osconsfortress kernel: === Dec 14 16:12:46 osconsfortress kernel: slab error in kmem_cache_destroy(): cache `kvm_mmu_page_header': Can't free all objects Dec 14 16:12:46 osconsfortress kernel: Pid: 9343, comm: rmmod Tainted: P 2.6.27.41 #1 Dec 14 16:12:46 osconsfortress kernel: [c02d7a27] ? printk+0x18/0x21 Dec 14 16:12:46 osconsfortress kernel: [c017ed48] kmem_cache_destroy+0xb8/0xf0 Dec 14 16:12:46 osconsfortress kernel: [f988549d] mmu_destroy_caches+0x2d/0x30 [kvm] Dec 14 16:12:46 osconsfortress kernel: [f9885558] kvm_mmu_module_exit+0x8/0x20 [kvm] Dec 14 16:12:46 osconsfortress kernel: [f987f4c2] kvm_arch_exit+0x12/0x20 [kvm] Dec 14 16:12:46 osconsfortress kernel:
Re: network shutdown under heavy load
What's the exact guest kernel version? When the network is down, please get onto the guest console to determine which direction (if not both) of the network is not functioning. You can run tcpdump in the guest/host and execute pings on both sides to see which direction is blocked. Cheers, on the hosts: uname -a Linux 2.6.31-16-server #53-Ubuntu SMP Tue Dec 8 05:08:02 UTC 2009 x86_64 GNU/Linux I been told that today the network when down again and one of the guys here had to log using the console and restart it for that particular guests.. on the guest: uname -a Linux 2.6.27.25-170.2.72.fc10.x86_64 #1 SMP Sun Jun 21 18:39:34 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux Next time it goes down I will try to run a sniffer and try both sides. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] fix vhost ioctl handling for 32-bit
VHOST_GET_FEATURES returns high-order garbage on 32-bit machines. This patch fixes it to use 64 bits throughout. +-DLS [in-line for viewing, attached to avoid whitespace mangling] Signed-off-by: David L Stevens dlstev...@us.ibm.com --- a/drivers/vhost/net.c 2009-11-17 22:51:56.0 -0800 +++ b/drivers/vhost/net.c 2009-12-17 11:31:51.0 -0800 @@ -563,7 +563,7 @@ { struct vhost_net *n = f-private_data; void __user *argp = (void __user *)arg; - u32 __user *featurep = argp; + u64 __user *featurep = (u64 __user *)argp; struct vhost_vring_file backend; u64 features; int r; @@ -577,7 +577,7 @@ features = VHOST_FEATURES; return put_user(features, featurep); case VHOST_SET_FEATURES: - r = get_user(features, featurep); + r = copy_from_user(features, featurep, sizeof features); /* No features for now */ if (r 0) return r; VH1.patch Description: Binary data
Re: network shutdown under heavy load
On Thu, Dec 17, 2009 at 01:15:46PM -0500, rek2 wrote: I been told that today the network when down again and one of the guys here had to log using the console and restart it for that particular guests.. on the guest: uname -a Linux 2.6.27.25-170.2.72.fc10.x86_64 #1 SMP Sun Jun 21 18:39:34 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux Next time it goes down I will try to run a sniffer and try both sides. OK I'm fairly sure this version has a buggy virtio-net. Does this patch (if it applies :) help? diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 9eec5a5..74b3854 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -521,8 +521,10 @@ static void xmit_tasklet(unsigned long data) vi-svq-vq_ops-kick(vi-svq); vi-last_xmit_skb = NULL; } - if (vi-free_in_tasklet) + if (vi-free_in_tasklet) { free_old_xmit_skbs(vi); + netif_wake_queue(vi-dev); + } netif_tx_unlock_bh(vi-dev); } Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Autotest PATCH] KVM test: Add a subtest vnc via which interacts with guest
Signed-off-by: Yolkfull Chow yz...@redhat.com --- client/tests/kvm/tests/vnc.py | 24 client/tests/kvm/tests_base.cfg.sample |3 +++ 2 files changed, 27 insertions(+), 0 deletions(-) create mode 100644 client/tests/kvm/tests/vnc.py diff --git a/client/tests/kvm/tests/vnc.py b/client/tests/kvm/tests/vnc.py new file mode 100644 index 000..0f00379 --- /dev/null +++ b/client/tests/kvm/tests/vnc.py @@ -0,0 +1,24 @@ +import logging, pexpect +from autotest_lib.client.common_lib import error +import kvm_test_utils, kvm_subprocess + +def run_vnc(test, params, env): + +Test whether guest could be interacted with vnc. + +@param test: kvm test object +@param params: Dictionary with the test parameters +@param env: Dictionary with test environment. + +vm = kvm_test_utils.get_living_vm(env, params.get(main_vm)) +session = kvm_test_utils.wait_for_login(vm) + +# Start vnc connection test +vnc_port = str(vm.vnc_port - 5900) +vnc_cmd = vncviewer + localhost: + vnc_port +logging.debug(Using command to vnc connect: %s % vnc_cmd) + +p = kvm_subprocess.run_bg(vnc_cmd, None, logging.debug, (vnc) ) +if not p.is_alive(): +raise error.TestFail(Vnc connect to guest failed) +p.close() diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample index a403399..0eaccae 100644 --- a/client/tests/kvm/tests_base.cfg.sample +++ b/client/tests/kvm/tests_base.cfg.sample @@ -270,6 +270,9 @@ variants: type = physical_resources_check catch_uuid_cmd = dmidecode | awk -F: '/UUID/ {print $2}' +- vnc: install setup unattended_install +type = vnc + # NICs variants: - @rtl8139: -- 1.6.5.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next 0/2] Defer skb allocation in virtio_net recv
This patch-set has deferred virtio_net skb allocation in receiving path for both big packets and mergeable buffers. It reduces skb pre-allocations and skb frees. This patch-set also add a new API detach_unused_bufs in virtio. Recv skb queue has been removed in virtio_net. It is based on previous Rusty and Michaels' review, patch has split into two: [PATCH 1/2] virtio: Add detach unused buffer from vring [PATCH 2/2] virtio_net: Defer skb allocation in receive path I copied Rusty's comment as [PATCH 1/2] commit message. This patch is built against Dave's net-next tree. Thanks Shirley -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] virtio: Add detach unused buffer from vring
There's currently no way for a virtio driver to ask for unused buffers, so it has to keep a list itself to reclaim them at shutdown. This is redundant, since virtio_ring stores that information. So add a new hook to do this: virtio_net will be the first user. Signed-off-by: Shirley Ma x...@us.ibm.com --- drivers/virtio/virtio_ring.c | 25 + include/linux/virtio.h |4 2 files changed, 29 insertions(+), 0 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index fbd2ecd..71929ee 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -334,6 +334,30 @@ static bool vring_enable_cb(struct virtqueue *_vq) return true; } +static void *vring_detach_unused_buf(struct virtqueue *_vq) +{ + struct vring_virtqueue *vq = to_vvq(_vq); + unsigned int i; + void *buf; + + START_USE(vq); + + for (i = 0; i vq-vring.num; i++) { + if (!vq-data[i]) + continue; + /* detach_buf clears data, so grab it now. */ + buf = vq-data[i]; + detach_buf(vq, i); + END_USE(vq); + return buf; + } + /* That should have freed everything. */ + BUG_ON(vq-num_free != vq-vring.num); + + END_USE(vq); + return NULL; +} + irqreturn_t vring_interrupt(int irq, void *_vq) { struct vring_virtqueue *vq = to_vvq(_vq); @@ -360,6 +384,7 @@ static struct virtqueue_ops vring_vq_ops = { .kick = vring_kick, .disable_cb = vring_disable_cb, .enable_cb = vring_enable_cb, + .detach_unused_buf = vring_detach_unused_buf, }; struct virtqueue *vring_new_virtqueue(unsigned int num, diff --git a/include/linux/virtio.h b/include/linux/virtio.h index 057a2e0..f508c65 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -51,6 +51,9 @@ struct virtqueue { * This re-enables callbacks; it returns false if there are pending * buffers in the queue, to detect a possible race between the driver * checking for more work, and enabling callbacks. + * @detach_unused_buf: detach first unused buffer + * vq: the struct virtqueue we're talking about. + * Returns NULL or the data token handed to add_buf * * Locking rules are straightforward: the driver is responsible for * locking. No two operations may be invoked simultaneously, with the exception @@ -71,6 +74,7 @@ struct virtqueue_ops { void (*disable_cb)(struct virtqueue *vq); bool (*enable_cb)(struct virtqueue *vq); + void *(*detach_unused_buf)(struct virtqueue *vq); }; /** -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] virtio_net: Defer skb allocation in receive path
virtio_net receives packets from its pre-allocated vring buffers, then it delivers these packets to upper layer protocols as skb buffs. So it's not necessary to pre-allocate skb for each mergable buffer, then frees extra skbs when buffers are merged into a large packet. This patch has deferred skb allocation in receiving packets for both big packets and mergeable buffers to reduce skb pre-allocations and skb frees. It frees unused buffers by calling detach_unused_buf in vring, so recv skb queue is not needed. Signed-off-by: Shirley Ma x...@us.ibm.com --- drivers/net/virtio_net.c | 426 +++--- 1 files changed, 247 insertions(+), 179 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index c708ecc..bb57b96 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -56,8 +56,7 @@ struct virtnet_info /* Host will merge rx buffers for big packets (shake it! shake it!) */ bool mergeable_rx_bufs; - /* Receive send queues. */ - struct sk_buff_head recv; + /* Send queue. */ struct sk_buff_head send; /* Work struct for refilling if we run low on memory. */ @@ -75,34 +74,44 @@ struct skb_vnet_hdr { unsigned int num_sg; }; +struct padded_vnet_hdr { + struct virtio_net_hdr hdr; + /* +* virtio_net_hdr should be in a separated sg buffer because of a +* QEMU bug, and data sg buffer shares same page with this header sg. +* This padding makes next sg 16 byte aligned after virtio_net_hdr. +*/ + char padding[6]; +}; + static inline struct skb_vnet_hdr *skb_vnet_hdr(struct sk_buff *skb) { return (struct skb_vnet_hdr *)skb-cb; } -static void give_a_page(struct virtnet_info *vi, struct page *page) -{ - page-private = (unsigned long)vi-pages; - vi-pages = page; -} - -static void trim_pages(struct virtnet_info *vi, struct sk_buff *skb) +/* + * private is used to chain pages for big packets, put the whole + * most recent used list in the beginning for reuse + */ +static void give_pages(struct virtnet_info *vi, struct page *page) { - unsigned int i; + struct page *end; - for (i = 0; i skb_shinfo(skb)-nr_frags; i++) - give_a_page(vi, skb_shinfo(skb)-frags[i].page); - skb_shinfo(skb)-nr_frags = 0; - skb-data_len = 0; + /* Find end of list, sew whole thing into vi-pages. */ + for (end = page; end-private; end = (struct page *)end-private); + end-private = (unsigned long)vi-pages; + vi-pages = page; } static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask) { struct page *p = vi-pages; - if (p) + if (p) { vi-pages = (struct page *)p-private; - else + /* clear private here, it is used to chain pages */ + p-private = 0; + } else p = alloc_page(gfp_mask); return p; } @@ -118,99 +127,142 @@ static void skb_xmit_done(struct virtqueue *svq) netif_wake_queue(vi-dev); } -static void receive_skb(struct net_device *dev, struct sk_buff *skb, - unsigned len) +static void set_skb_frag(struct sk_buff *skb, struct page *page, +unsigned int offset, unsigned int *len) { - struct virtnet_info *vi = netdev_priv(dev); - struct skb_vnet_hdr *hdr = skb_vnet_hdr(skb); - int err; - int i; + int i = skb_shinfo(skb)-nr_frags; + skb_frag_t *f; + + f = skb_shinfo(skb)-frags[i]; + f-size = min((unsigned)PAGE_SIZE - offset, *len); + f-page_offset = offset; + f-page = page; + + skb-data_len += f-size; + skb-len += f-size; + skb_shinfo(skb)-nr_frags++; + *len -= f-size; +} - if (unlikely(len sizeof(struct virtio_net_hdr) + ETH_HLEN)) { - pr_debug(%s: short packet %i\n, dev-name, len); - dev-stats.rx_length_errors++; - goto drop; - } +static struct sk_buff *page_to_skb(struct virtnet_info *vi, + struct page *page, unsigned int len) +{ + struct sk_buff *skb; + struct skb_vnet_hdr *hdr; + unsigned int copy, hdr_len, offset; + char *p; - if (vi-mergeable_rx_bufs) { - unsigned int copy; - char *p = page_address(skb_shinfo(skb)-frags[0].page); + p = page_address(page); - if (len PAGE_SIZE) - len = PAGE_SIZE; - len -= sizeof(struct virtio_net_hdr_mrg_rxbuf); + /* copy small packet so we can reuse these pages for small data */ + skb = netdev_alloc_skb_ip_align(vi-dev, GOOD_COPY_LEN); + if (unlikely(!skb)) + return NULL; - memcpy(hdr-mhdr, p, sizeof(hdr-mhdr)); - p += sizeof(hdr-mhdr); + hdr = skb_vnet_hdr(skb); - copy = len; -
Re: [PATCH net-next 0/2] Defer skb allocation in virtio_net recv
Send skb queue can also be reduced with detach_unused_buf API by it is not a part of this patch. Shirley -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] vhost: add missing architectures
vhost is completely portable, but Kconfig include was missing for all architectures besides x86, so it did not appear in the menu. Add the relevant Kconfig includes to all architectures that support virtualization. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Hi Rusty, please apply the following trivial fixup patch for 2.6.33. Thanks! arch/ia64/kvm/Kconfig|1 + arch/powerpc/kvm/Kconfig |1 + arch/s390/kvm/Kconfig|1 + 3 files changed, 3 insertions(+), 0 deletions(-) diff --git a/arch/ia64/kvm/Kconfig b/arch/ia64/kvm/Kconfig index ef3e7be..01c7579 100644 --- a/arch/ia64/kvm/Kconfig +++ b/arch/ia64/kvm/Kconfig @@ -47,6 +47,7 @@ config KVM_INTEL Provides support for KVM on Itanium 2 processors equipped with the VT extensions. +source drivers/vhost/Kconfig source drivers/virtio/Kconfig endif # VIRTUALIZATION diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig index c299268..a1b4c5d 100644 --- a/arch/powerpc/kvm/Kconfig +++ b/arch/powerpc/kvm/Kconfig @@ -58,6 +58,7 @@ config KVM_E500 If unsure, say N. +source drivers/vhost/Kconfig source drivers/virtio/Kconfig endif # VIRTUALIZATION diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig index bf164fc..6be6aea 100644 --- a/arch/s390/kvm/Kconfig +++ b/arch/s390/kvm/Kconfig @@ -36,6 +36,7 @@ config KVM # OK, it's a little counter-intuitive to do this, but it puts it neatly under # the virtualization menu. +source drivers/vhost/Kconfig source drivers/virtio/Kconfig endif # VIRTUALIZATION -- 1.6.6.rc1.43.gf55cc -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html