Re: [PATCH 18/18] KVM test: Add subtest of testing offload by ethtool
On Mon, 27 Sep 2010 18:44:04 -0400
Lucas Meneghel Rodrigues l...@redhat.com wrote:

> +
> +    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
> +    session = kvm_test_utils.wait_for_login(vm,
> +                  timeout=int(params.get("login_timeout", 360)))
> +    # Let's just error the test if we identify that there's no ethtool installed
> +    if session.get_command_status("ethtool -h"):
> +        raise error.TestError("Command ethtool not installed on guest")
> +    session2 = kvm_test_utils.wait_for_login(vm,
> +                  timeout=int(params.get("login_timeout", 360)))
> +    mtu = 1514
> +    feature_status = {}
> +    filename = "/tmp/ethtool.dd"
> +    guest_ip = vm.get_address()
> +    ethname = kvm_test_utils.get_linux_ifname(session, vm.get_mac_address(0))
> +    supported_features = params.get("supported_features").split()

I guess split() expects input here.

23:48:03 ERROR| Test failed: AttributeError: 'NoneType' object has no attribute 'split'
22.12', '00:1a:4a:65:09:09': '192.168.122.66', '9a:52:2f:62:12:63': '192.168.122.151', '9a:52:2f:62:6b:28': '192.168.122.35'}, 'version': 0, 'tcpdump': kvm_subprocess.kvm_tail instance at 0x27cb200}
23:48:05 INFO | ['iteration.1']
23:48:05 ERROR| Exception escaping from test:
Traceback (most recent call last):
  File "/home/pradeep/vhost_net/autotest/client/common_lib/test.py", line 412, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/home/pradeep/vhost_net/autotest/client/common_lib/test.py", line 605, in _call_test_function
    raise error.UnhandledTestFail(e)
UnhandledTestFail: Unhandled AttributeError: 'NoneType' object has no attribute 'split'
Traceback (most recent call last):
  File "/home/pradeep/vhost_net/autotest/client/common_lib/test.py", line 598, in _call_test_function
    return func(*args, **dargs)
  File "/home/pradeep/vhost_net/autotest/client/common_lib/test.py", line 284, in execute
    postprocess_profiled_run, args, dargs)
  File "/home/pradeep/vhost_net/autotest/client/common_lib/test.py", line 202, in _call_run_once
    self.run_once_profiling(postprocess_profiled_run, *args, **dargs)
  File "/home/pradeep/vhost_net/autotest/client/common_lib/test.py", line 308, in run_once_profiling
    self.run_once(*args, **dargs)
  File "/home/pradeep/vhost_net/autotest/client/tests/kvm/kvm.py", line 73, in run_once
    run_func(self, params, env)
  File "/home/pradeep/vhost_net/autotest/client/tests/kvm/tests/ethtool.py", line 185, in run_ethtool
    supported_features = params.get("supported_features").split()
AttributeError: 'NoneType' object has no attribute 'split'

--Pradeep

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
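The AttributeError in the report above is what Python raises when .split() is called on the None that a dict-style get() returns for a missing key. A minimal sketch of a guard, assuming params behaves like a plain dict; get_feature_list is a hypothetical helper and the sample feature names are made up, neither is part of the posted patch:

```python
def get_feature_list(params, key="supported_features"):
    """Return the whitespace-separated value stored under key as a list.

    Hypothetical helper: assumes params offers a dict-style get().
    A missing key produces a readable ValueError instead of the
    AttributeError seen in the traceback.
    """
    value = params.get(key)
    if value is None:
        raise ValueError("parameter %r is not set in the test config" % key)
    return value.split()


if __name__ == "__main__":
    # Sample values are illustrative only.
    print(get_feature_list({"supported_features": "tx sg tso"}))
    try:
        get_feature_list({})
    except ValueError as exc:
        print("config problem:", exc)
```

Failing early like this would point at the missing config entry rather than at the line that happens to call split().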
Re: [PATCH] kvm: add oom notifier for virtio balloon
On Tue, 5 Oct 2010 11:15:21 pm Dave Young wrote:
> Balloon could cause guest memory oom killing and panic. Add oom notify to
> leak some memory and retry fill balloon after 5 minutes.

Have you tried registering a shrinker? See mm.h.

Thanks,
Rusty.
Re: [PATCH 0/9] msix/kvm integration cleanups
On Tue, Sep 21, 2010 at 06:05:10PM +0200, Avi Kivity wrote:
> On 09/20/2010 07:02 PM, Michael S. Tsirkin wrote:
> > On Mon, Sep 20, 2010 at 05:06:41PM +0200, Avi Kivity wrote:
> > > This cleans up msix/kvm integration a bit. The really important patch
> > > is the last one, which allows msix.o to be part of non-target-specific
> > > build.
> >
> > I actually thought this later move should be done in a different way:
> > - add all functions msix uses to kvm-stub.c
>
> Isn't that what I did?
>
> > - kvm_irq_routing_entry should also have a stub
> >
> > I sent some minor comments in case you have a reason to prefer this way.
>
> My motivation is really the last patch. If you explain what you'd like
> to see I'll try to do it.

Still looking at this?

> --
> error compiling committee.c: too many arguments to function
Re: [PATCH 0/9] msix/kvm integration cleanups
On 10/06/2010 11:39 AM, Michael S. Tsirkin wrote:
> On Tue, Sep 21, 2010 at 06:05:10PM +0200, Avi Kivity wrote:
> > On 09/20/2010 07:02 PM, Michael S. Tsirkin wrote:
> > > On Mon, Sep 20, 2010 at 05:06:41PM +0200, Avi Kivity wrote:
> > > > This cleans up msix/kvm integration a bit. The really important patch
> > > > is the last one, which allows msix.o to be part of non-target-specific
> > > > build.
> > >
> > > I actually thought this later move should be done in a different way:
> > > - add all functions msix uses to kvm-stub.c
> >
> > Isn't that what I did?
> >
> > > - kvm_irq_routing_entry should also have a stub
> > >
> > > I sent some minor comments in case you have a reason to prefer this way.
> >
> > My motivation is really the last patch. If you explain what you'd like
> > to see I'll try to do it.
>
> Still looking at this?

I plan to do this yes, when I get a bit of time.

--
error compiling committee.c: too many arguments to function
Re: [Autotest] [PATCH 18/18] KVM test: Add subtest of testing offload by ethtool
On Wed, 6 Oct 2010 14:26:46 +0530
pradeep psuri...@linux.vnet.ibm.com wrote:

> On Mon, 27 Sep 2010 18:44:04 -0400
> Lucas Meneghel Rodrigues l...@redhat.com wrote:
>
> > ion, vm.get_mac_address(0))
> > +    supported_features = params.get("supported_features").split()
>
> I guess split this expects input.
>
> 23:48:03 ERROR| Test failed: AttributeError: 'NoneType' object has no attribute 'split'

Neglect my earlier mail. I was using rtl8139, and rtl8139 doesn't support this.

--Pradeep

___
Autotest mailing list
autot...@test.kernel.org
http://test.kernel.org/cgi-bin/mailman/listinfo/autotest
NIC limit
Hi again everybody,

One of the admins at the ProxmoxVE project was gracious enough to quickly
release a package including the previously discussed change to allow up to
32 NICs in qemu. For future reference the .deb is here:

ftp://download.proxmox.com/debian/dists/lenny/pvetest/binary-amd64/pve-qemu-kvm_0.12.5-2_amd64.deb

Upon creating and running the VM with the newly patched qemu-kvm app
installed, I found a NIC limitation remained in place, presumably imposed
by some other aspect of the environment. The machine would start when it
had 33 PCI devices, as long as no more than 28 of them were NICs. This is
still a vast improvement compared to the previous limit of 8 NICs, and is
very good news for my project.

I post here in hopes that maybe someone will come across the link in a
search and have a solution. More likely, however, the new API will be in
place and widely in use by then, but whatever.

Either way, thanks for your help yesterday.
Re: [PATCH v6 10/12] Handle async PF in non preemptable context
On Tue, Oct 05, 2010 at 04:51:50PM -0300, Marcelo Tosatti wrote:
> On Mon, Oct 04, 2010 at 05:56:32PM +0200, Gleb Natapov wrote:
> > If async page fault is received by idle task or when preempt_count is
> > not zero guest cannot reschedule, so do sti; hlt and wait for page to be
> > ready. vcpu can still process interrupts while it waits for the page to
> > be ready.
> >
> > Acked-by: Rik van Riel r...@redhat.com
> > Signed-off-by: Gleb Natapov g...@redhat.com
> > ---
> >  arch/x86/kernel/kvm.c |   40 ++--
> >  1 files changed, 34 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > index 36fb3e4..f73946f 100644
> > --- a/arch/x86/kernel/kvm.c
> > +++ b/arch/x86/kernel/kvm.c
> > @@ -37,6 +37,7 @@
> >  #include <asm/cpu.h>
> >  #include <asm/traps.h>
> >  #include <asm/desc.h>
> > +#include <asm/tlbflush.h>
> >
> >  #define MMU_QUEUE_SIZE 1024
> >
> > @@ -78,6 +79,8 @@ struct kvm_task_sleep_node {
> >      wait_queue_head_t wq;
> >      u32 token;
> >      int cpu;
> > +    bool halted;
> > +    struct mm_struct *mm;
> >  };
> >
> >  static struct kvm_task_sleep_head {
> > @@ -106,6 +109,11 @@ void kvm_async_pf_task_wait(u32 token)
> >      struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
> >      struct kvm_task_sleep_node n, *e;
> >      DEFINE_WAIT(wait);
> > +    int cpu, idle;
> > +
> > +    cpu = get_cpu();
> > +    idle = idle_cpu(cpu);
> > +    put_cpu();
> >
> >      spin_lock(&b->lock);
> >      e = _find_apf_task(b, token);
> > @@ -119,19 +127,33 @@ void kvm_async_pf_task_wait(u32 token)
> >
> >      n.token = token;
> >      n.cpu = smp_processor_id();
> > +    n.mm = current->active_mm;
> > +    n.halted = idle || preempt_count() > 1;
> > +    atomic_inc(&n.mm->mm_count);
>
> Can't see why this reference is needed.

I thought that if a kernel thread faults on behalf of some process, the mm
can go away while the kernel thread is sleeping. But it looks like a kernel
thread takes a reference on the mm it runs with by itself, so maybe this is
redundant (but not harmful).

--
    Gleb.
Re: [PATCH v6 09/12] Inject asynchronous page fault into a PV guest if page is swapped out.
On Tue, Oct 05, 2010 at 04:00:51PM -0300, Marcelo Tosatti wrote: On Mon, Oct 04, 2010 at 05:56:31PM +0200, Gleb Natapov wrote: Send async page fault to a PV guest if it accesses swapped out memory. Guest will choose another task to run upon receiving the fault. Allow async page fault injection only when guest is in user mode since otherwise guest may be in non-sleepable context and will not be able to reschedule. Vcpu will be halted if guest will fault on the same page again or if vcpu executes kernel code. Signed-off-by: Gleb Natapov g...@redhat.com --- arch/x86/include/asm/kvm_host.h |3 ++ arch/x86/kvm/mmu.c |1 + arch/x86/kvm/x86.c | 49 -- include/trace/events/kvm.h | 17 virt/kvm/async_pf.c |3 +- 5 files changed, 58 insertions(+), 15 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index de31551..2f6fc87 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -419,6 +419,7 @@ struct kvm_vcpu_arch { gfn_t gfns[roundup_pow_of_two(ASYNC_PF_PER_VCPU)]; struct gfn_to_hva_cache data; u64 msr_val; + u32 id; } apf; }; @@ -594,6 +595,7 @@ struct kvm_x86_ops { }; struct kvm_arch_async_pf { + u32 token; gfn_t gfn; }; @@ -842,6 +844,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu, struct kvm_async_pf *work); void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work); +bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu); extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn); #endif /* _ASM_X86_KVM_HOST_H */ diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index d85fda8..de53cab 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2580,6 +2580,7 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn) { struct kvm_arch_async_pf arch; + arch.token = (vcpu-arch.apf.id++ 12) | vcpu-vcpu_id; arch.gfn = gfn; return kvm_setup_async_pf(vcpu, gva, gfn, 
arch); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 3e123ab..0e69d37 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6225,25 +6225,58 @@ static void kvm_del_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn) } } +static int apf_put_user(struct kvm_vcpu *vcpu, u32 val) +{ + + return kvm_write_guest_cached(vcpu-kvm, vcpu-arch.apf.data, val, + sizeof(val)); +} + void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu, struct kvm_async_pf *work) { - vcpu-arch.mp_state = KVM_MP_STATE_HALTED; - - if (work == kvm_double_apf) + if (work == kvm_double_apf) { trace_kvm_async_pf_doublefault(kvm_rip_read(vcpu)); - else { - trace_kvm_async_pf_not_present(work-gva); - + vcpu-arch.mp_state = KVM_MP_STATE_HALTED; + } else { + trace_kvm_async_pf_not_present(work-arch.token, work-gva); kvm_add_async_pf_gfn(vcpu, work-arch.gfn); + + if (!(vcpu-arch.apf.msr_val KVM_ASYNC_PF_ENABLED) || + kvm_x86_ops-get_cpl(vcpu) == 0) + vcpu-arch.mp_state = KVM_MP_STATE_HALTED; + else if (!apf_put_user(vcpu, KVM_PV_REASON_PAGE_NOT_PRESENT)) { + vcpu-arch.fault.error_code = 0; + vcpu-arch.fault.address = work-arch.token; + kvm_inject_page_fault(vcpu); + } Missed !kvm_event_needs_reinjection(vcpu) ? This check is done in can_do_async_pf(). We will not get here if event is pending. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
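The token assignment in the quoted mmu.c hunk packs a per-vcpu sequence number and the vcpu id into one 32-bit value. A rough Python model of that packing follows. It assumes the operator lost from the archived text is a left shift by 12 and that the result is truncated to 32 bits, as the u32 token field suggests; make_token and split_token are illustrative names, not kernel functions:

```python
U32_MASK = 0xFFFFFFFF
VCPU_ID_BITS = 12  # assumption: shift amount read from the quoted hunk


def make_token(apf_id, vcpu_id):
    """Model of (apf.id++ << 12) | vcpu_id, truncated to 32 bits."""
    return ((apf_id << VCPU_ID_BITS) | vcpu_id) & U32_MASK


def split_token(token):
    """Recover (sequence, vcpu_id); only meaningful while vcpu_id < 4096."""
    return token >> VCPU_ID_BITS, token & ((1 << VCPU_ID_BITS) - 1)


if __name__ == "__main__":
    token = make_token(apf_id=1, vcpu_id=3)
    print(hex(token), split_token(token))
```

With this layout the low 12 bits identify the vcpu, so two vcpus cannot produce the same token for the same sequence number as long as vcpu ids stay below 4096, and the sequence part wraps once it no longer fits in the remaining 20 bits.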
Re: [Autotest] [PATCH 14/18] KVM test: Add a netperf subtest
> This case can pass with rhel5.5 rhel6.0, not test with fedora.
> it would not be the problem of testcase.
>
> I did not touch this problem, can you provide more debug info ?
> eg, tcpdump, ...

It seems like RHEL 5.5 issue; it fails only with TCP_CRR.
Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.
On 10/05/2010 04:59 PM, Marcelo Tosatti wrote:
> On Mon, Oct 04, 2010 at 05:56:24PM +0200, Gleb Natapov wrote:
> > If a guest accesses swapped out memory do not swap it in from vcpu thread
> > context. Schedule work to do swapping and put vcpu into halted state
> > instead.
> >
> > Interrupts will still be delivered to the guest and if interrupt will
> > cause reschedule guest will continue to run another task.
> >
> > Signed-off-by: Gleb Natapov g...@redhat.com
> > ---
> >  arch/x86/include/asm/kvm_host.h |   17 +++
> >  arch/x86/kvm/Kconfig            |    1 +
> >  arch/x86/kvm/Makefile           |    1 +
> >  arch/x86/kvm/mmu.c              |   51 +-
> >  arch/x86/kvm/paging_tmpl.h      |    4 +-
> >  arch/x86/kvm/x86.c              |  109 +++-
> >  include/linux/kvm_host.h        |   31 ++
> >  include/trace/events/kvm.h      |   88
> >  virt/kvm/Kconfig                |    3 +
> >  virt/kvm/async_pf.c             |  220 +++
> >  virt/kvm/async_pf.h             |   36 +++
> >  virt/kvm/kvm_main.c             |   57 --
> >  12 files changed, 603 insertions(+), 15 deletions(-)
> >  create mode 100644 virt/kvm/async_pf.c
> >  create mode 100644 virt/kvm/async_pf.h
> >
> > +    async_pf_cache = NULL;
> > +}
> > +
> > +void kvm_async_pf_vcpu_init(struct kvm_vcpu *vcpu)
> > +{
> > +    INIT_LIST_HEAD(&vcpu->async_pf.done);
> > +    INIT_LIST_HEAD(&vcpu->async_pf.queue);
> > +    spin_lock_init(&vcpu->async_pf.lock);
> > +}
> > +
> > +static void async_pf_execute(struct work_struct *work)
> > +{
> > +    struct page *page;
> > +    struct kvm_async_pf *apf =
> > +        container_of(work, struct kvm_async_pf, work);
> > +    struct mm_struct *mm = apf->mm;
> > +    struct kvm_vcpu *vcpu = apf->vcpu;
> > +    unsigned long addr = apf->addr;
> > +    gva_t gva = apf->gva;
> > +
> > +    might_sleep();
> > +
> > +    use_mm(mm);
> > +    down_read(&mm->mmap_sem);
> > +    get_user_pages(current, mm, addr, 1, 1, 0, &page, NULL);
> > +    up_read(&mm->mmap_sem);
> > +    unuse_mm(mm);
> > +
> > +    spin_lock(&vcpu->async_pf.lock);
> > +    list_add_tail(&apf->link, &vcpu->async_pf.done);
> > +    apf->page = page;
> > +    spin_unlock(&vcpu->async_pf.lock);
>
> This can fail, and apf->page become NULL.

Does it even become NULL? On error, get_user_pages() won't update the
pages argument, so page becomes garbage here.

--
error compiling committee.c: too many arguments to function
Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.
On Wed, Oct 06, 2010 at 12:50:01PM +0200, Avi Kivity wrote:
> On 10/05/2010 04:59 PM, Marcelo Tosatti wrote:
> > On Mon, Oct 04, 2010 at 05:56:24PM +0200, Gleb Natapov wrote:
> > > If a guest accesses swapped out memory do not swap it in from vcpu thread
> > > context. Schedule work to do swapping and put vcpu into halted state
> > > instead.
> > >
> > > Interrupts will still be delivered to the guest and if interrupt will
> > > cause reschedule guest will continue to run another task.
> > >
> > > Signed-off-by: Gleb Natapov g...@redhat.com
> > > ---
> > >  arch/x86/include/asm/kvm_host.h |   17 +++
> > >  arch/x86/kvm/Kconfig            |    1 +
> > >  arch/x86/kvm/Makefile           |    1 +
> > >  arch/x86/kvm/mmu.c              |   51 +-
> > >  arch/x86/kvm/paging_tmpl.h      |    4 +-
> > >  arch/x86/kvm/x86.c              |  109 +++-
> > >  include/linux/kvm_host.h        |   31 ++
> > >  include/trace/events/kvm.h      |   88
> > >  virt/kvm/Kconfig                |    3 +
> > >  virt/kvm/async_pf.c             |  220 +++
> > >  virt/kvm/async_pf.h             |   36 +++
> > >  virt/kvm/kvm_main.c             |   57 --
> > >  12 files changed, 603 insertions(+), 15 deletions(-)
> > >  create mode 100644 virt/kvm/async_pf.c
> > >  create mode 100644 virt/kvm/async_pf.h
> > >
> > > +    async_pf_cache = NULL;
> > > +}
> > > +
> > > +void kvm_async_pf_vcpu_init(struct kvm_vcpu *vcpu)
> > > +{
> > > +    INIT_LIST_HEAD(&vcpu->async_pf.done);
> > > +    INIT_LIST_HEAD(&vcpu->async_pf.queue);
> > > +    spin_lock_init(&vcpu->async_pf.lock);
> > > +}
> > > +
> > > +static void async_pf_execute(struct work_struct *work)
> > > +{
> > > +    struct page *page;
> > > +    struct kvm_async_pf *apf =
> > > +        container_of(work, struct kvm_async_pf, work);
> > > +    struct mm_struct *mm = apf->mm;
> > > +    struct kvm_vcpu *vcpu = apf->vcpu;
> > > +    unsigned long addr = apf->addr;
> > > +    gva_t gva = apf->gva;
> > > +
> > > +    might_sleep();
> > > +
> > > +    use_mm(mm);
> > > +    down_read(&mm->mmap_sem);
> > > +    get_user_pages(current, mm, addr, 1, 1, 0, &page, NULL);
> > > +    up_read(&mm->mmap_sem);
> > > +    unuse_mm(mm);
> > > +
> > > +    spin_lock(&vcpu->async_pf.lock);
> > > +    list_add_tail(&apf->link, &vcpu->async_pf.done);
> > > +    apf->page = page;
> > > +    spin_unlock(&vcpu->async_pf.lock);
> >
> > This can fail, and apf->page become NULL.
>
> Does it even become NULL? On error, get_user_pages() won't update the
> pages argument, so page becomes garbage here.

apf is allocated with kmem_cache_zalloc() and ->page is set to NULL in
kvm_setup_async_pf() to be extra sure.

--
    Gleb.
Re: [PATCH v6 07/12] Add async PF initialization to PV guest.
On Tue, Oct 05, 2010 at 03:25:54PM -0300, Marcelo Tosatti wrote:
> On Mon, Oct 04, 2010 at 05:56:29PM +0200, Gleb Natapov wrote:
> > Enable async PF in a guest if async PF capability is discovered.
> >
> > Signed-off-by: Gleb Natapov g...@redhat.com
> > ---
> >  Documentation/kernel-parameters.txt |    3 +
> >  arch/x86/include/asm/kvm_para.h     |    5 ++
> >  arch/x86/kernel/kvm.c               |   92 +++
> >  3 files changed, 100 insertions(+), 0 deletions(-)
> >
> > +static int __cpuinit kvm_cpu_notify(struct notifier_block *self,
> > +        unsigned long action, void *hcpu)
> > +{
> > +    int cpu = (unsigned long)hcpu;
> > +    switch (action) {
> > +    case CPU_ONLINE:
> > +    case CPU_DOWN_FAILED:
> > +    case CPU_ONLINE_FROZEN:
> > +        smp_call_function_single(cpu, kvm_guest_cpu_notify, NULL, 0);
>
> wait parameter should probably be 1.

Why should we wait for it? FWIW I copied this from somewhere (maybe
arch/x86/pci/amd_bus.c).

--
    Gleb.
Re: [PATCH v6 03/12] Retry fault before vmentry
On Tue, Oct 05, 2010 at 12:54:09PM -0300, Marcelo Tosatti wrote: On Mon, Oct 04, 2010 at 05:56:25PM +0200, Gleb Natapov wrote: When page is swapped in it is mapped into guest memory only after guest tries to access it again and generate another fault. To save this fault we can map it immediately since we know that guest is going to access the page. Do it only when tdp is enabled for now. Shadow paging case is more complicated. CR[034] and EFER registers should be switched before doing mapping and then switched back. Acked-by: Rik van Riel r...@redhat.com Signed-off-by: Gleb Natapov g...@redhat.com --- arch/x86/include/asm/kvm_host.h |4 +++- arch/x86/kvm/mmu.c | 16 arch/x86/kvm/paging_tmpl.h |6 +++--- arch/x86/kvm/x86.c |7 +++ virt/kvm/async_pf.c |2 ++ 5 files changed, 23 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 5f154d3..b9f263e 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -240,7 +240,7 @@ struct kvm_mmu { void (*new_cr3)(struct kvm_vcpu *vcpu); void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long root); unsigned long (*get_cr3)(struct kvm_vcpu *vcpu); - int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err); + int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err, bool no_apf); void (*inject_page_fault)(struct kvm_vcpu *vcpu); void (*free)(struct kvm_vcpu *vcpu); gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access, @@ -838,6 +838,8 @@ void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu, struct kvm_async_pf *work); void kvm_arch_async_page_present(struct kvm_vcpu *vcpu, struct kvm_async_pf *work); +void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, + struct kvm_async_pf *work); extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn); #endif /* _ASM_X86_KVM_HOST_H */ diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 4d49b5e..d85fda8 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ 
-2558,7 +2558,7 @@ static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gva_t vaddr, } static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva, - u32 error_code) + u32 error_code, bool no_apf) { gfn_t gfn; int r; @@ -2594,8 +2594,8 @@ static bool can_do_async_pf(struct kvm_vcpu *vcpu) return kvm_x86_ops-interrupt_allowed(vcpu); } -static bool try_async_pf(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva, -pfn_t *pfn) +static bool try_async_pf(struct kvm_vcpu *vcpu, bool no_apf, gfn_t gfn, +gva_t gva, pfn_t *pfn) { bool async; @@ -2606,7 +2606,7 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva, put_page(pfn_to_page(*pfn)); - if (can_do_async_pf(vcpu)) { + if (!no_apf can_do_async_pf(vcpu)) { trace_kvm_try_async_get_page(async, *pfn); if (kvm_find_async_pf_gfn(vcpu, gfn)) { vcpu-async_pf.work = kvm_double_apf; @@ -2620,8 +2620,8 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva, return false; } -static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, - u32 error_code) +static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code, + bool no_apf) { pfn_t pfn; int r; @@ -2643,7 +2643,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, mmu_seq = vcpu-kvm-mmu_notifier_seq; smp_rmb(); - if (try_async_pf(vcpu, gfn, gpa, pfn)) + if (try_async_pf(vcpu, no_apf, gfn, gpa, pfn)) return 0; /* mmio */ @@ -3306,7 +3306,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code) int r; enum emulation_result er; - r = vcpu-arch.mmu.page_fault(vcpu, cr2, error_code); + r = vcpu-arch.mmu.page_fault(vcpu, cr2, error_code, false); if (r 0) goto out; diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 8154353..9ad90f8 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -530,8 +530,8 @@ out_gpte_changed: * Returns: 1 if we need to emulate the instruction, 0 otherwise, or * a negative value on error. 
*/ -static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, - u32 error_code) +static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code, +bool no_apf) { int write_fault =
Re: [PATCH v6 04/12] Add memory slot versioning and use it to provide fast guest write interface
On Tue, Oct 05, 2010 at 01:57:38PM -0300, Marcelo Tosatti wrote: On Mon, Oct 04, 2010 at 05:56:26PM +0200, Gleb Natapov wrote: Keep track of memslots changes by keeping generation number in memslots structure. Provide kvm_write_guest_cached() function that skips gfn_to_hva() translation if memslots was not changed since previous invocation. Signed-off-by: Gleb Natapov g...@redhat.com --- include/linux/kvm_host.h |7 + include/linux/kvm_types.h |7 + virt/kvm/kvm_main.c | 57 +--- 3 files changed, 67 insertions(+), 4 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index a08614e..4dff9a1 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -199,6 +199,7 @@ struct kvm_irq_routing_table {}; struct kvm_memslots { int nmemslots; + u32 generation; struct kvm_memory_slot memslots[KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS]; }; @@ -352,12 +353,18 @@ int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data, int offset, int len); int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const void *data, unsigned long len); +int kvm_write_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc, + void *data, unsigned long len); +int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc, + gpa_t gpa); int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len); int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len); struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn); int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn); unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn); void mark_page_dirty(struct kvm *kvm, gfn_t gfn); +void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot, +gfn_t gfn); void kvm_vcpu_block(struct kvm_vcpu *vcpu); void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu); diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h index 7ac0d4e..ee6eb71 100644 --- a/include/linux/kvm_types.h +++ b/include/linux/kvm_types.h @@ 
-67,4 +67,11 @@ struct kvm_lapic_irq { u32 dest_id; }; +struct gfn_to_hva_cache { + u32 generation; + gpa_t gpa; + unsigned long hva; + struct kvm_memory_slot *memslot; +}; + #endif /* __KVM_TYPES_H__ */ diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index db58a1b..45ef50c 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -687,6 +687,7 @@ skip_lpage: memcpy(slots, kvm-memslots, sizeof(struct kvm_memslots)); if (mem-slot = slots-nmemslots) slots-nmemslots = mem-slot + 1; + slots-generation++; slots-memslots[mem-slot].flags |= KVM_MEMSLOT_INVALID; old_memslots = kvm-memslots; @@ -723,6 +724,7 @@ skip_lpage: memcpy(slots, kvm-memslots, sizeof(struct kvm_memslots)); if (mem-slot = slots-nmemslots) slots-nmemslots = mem-slot + 1; + slots-generation++; /* actual memory is freed via old in kvm_free_physmem_slot below */ if (!npages) { @@ -1247,6 +1249,47 @@ int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const void *data, return 0; } +int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc, + gpa_t gpa) +{ + struct kvm_memslots *slots = kvm_memslots(kvm); + int offset = offset_in_page(gpa); + gfn_t gfn = gpa PAGE_SHIFT; + + ghc-gpa = gpa; + ghc-generation = slots-generation; + ghc-memslot = gfn_to_memslot(kvm, gfn); + ghc-hva = gfn_to_hva(kvm, gfn); + if (!kvm_is_error_hva(ghc-hva)) + ghc-hva += offset; + else + return -EFAULT; + + return 0; +} Should use a unique kvm_memslots structure for the cache entry, since it can change in between (use gfn_to_hva_memslot, etc on slots pointer). I do not understand what do you mean here. kvm_memslots structure itself is not cached only various translation that use it are cached. Translation result are never used if kvm_memslots was changed. Also should zap any cached entries on overflow, otherwise malicious userspace could make use of stale slots: There is only one cached entry at each given time. User who wants to write into guest memory often defines gfn_to_hva_cache variable somewhere. 
Init it with kvm_gfn_to_hva_cache_init() and then calls kvm_write_guest_cached() on it. If there was no slot changes in between cached translation are used. Otherwise cache is recalculated. +void mark_page_dirty(struct kvm *kvm, gfn_t gfn) +{ + struct kvm_memory_slot *memslot; + + memslot = gfn_to_memslot(kvm, gfn);
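The scheme described in the reply above (cache one translation, reuse it while the memslots generation is unchanged, recompute it otherwise) can be modelled outside the kernel. The following is a toy Python sketch, not the patch's code: Memslots, GfnToHvaCache, the dict-backed "memory" and the lookups counter are all stand-ins invented for illustration, and error handling for unmapped gfns is omitted.

```python
PAGE_SHIFT = 12


class Memslots:
    """Toy stand-in for struct kvm_memslots: gfn -> hva map plus a generation."""

    def __init__(self, mapping):
        self.mapping = dict(mapping)
        self.generation = 0
        self.lookups = 0  # counts slow-path translations

    def update(self, mapping):
        """Any slot change bumps the generation, invalidating cached users."""
        self.mapping = dict(mapping)
        self.generation += 1

    def gfn_to_hva(self, gfn):
        self.lookups += 1
        return self.mapping[gfn]


class GfnToHvaCache:
    """Caches one gpa -> hva translation together with the generation seen."""

    def __init__(self, slots, gpa):
        self.gpa = gpa
        self._refresh(slots)

    def _refresh(self, slots):
        self.generation = slots.generation
        offset = self.gpa & ((1 << PAGE_SHIFT) - 1)
        self.hva = slots.gfn_to_hva(self.gpa >> PAGE_SHIFT) + offset

    def write(self, slots, memory, data):
        # Reuse the cached hva only if no slot change happened since caching.
        if self.generation != slots.generation:
            self._refresh(slots)
        memory[self.hva] = data


if __name__ == "__main__":
    slots = Memslots({1: 0x10000})
    cache = GfnToHvaCache(slots, 0x1008)
    memory = {}
    cache.write(slots, memory, "first")
    cache.write(slots, memory, "second")
    print(slots.lookups, memory)
```

The point of the design is that a stale translation is never used: a writer compares one integer per write, and only pays for the full lookup after the slots actually changed.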
Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.
On Tue, Oct 05, 2010 at 11:59:16AM -0300, Marcelo Tosatti wrote:
> On Mon, Oct 04, 2010 at 05:56:24PM +0200, Gleb Natapov wrote:
> > If a guest accesses swapped out memory do not swap it in from vcpu thread
> > context. Schedule work to do swapping and put vcpu into halted state
> > instead.
> >
> > Interrupts will still be delivered to the guest and if interrupt will
> > cause reschedule guest will continue to run another task.
> >
> > Signed-off-by: Gleb Natapov g...@redhat.com
> > ---
> >  arch/x86/include/asm/kvm_host.h |   17 +++
> >  arch/x86/kvm/Kconfig            |    1 +
> >  arch/x86/kvm/Makefile           |    1 +
> >  arch/x86/kvm/mmu.c              |   51 +-
> >  arch/x86/kvm/paging_tmpl.h      |    4 +-
> >  arch/x86/kvm/x86.c              |  109 +++-
> >  include/linux/kvm_host.h        |   31 ++
> >  include/trace/events/kvm.h      |   88
> >  virt/kvm/Kconfig                |    3 +
> >  virt/kvm/async_pf.c             |  220 +++
> >  virt/kvm/async_pf.h             |   36 +++
> >  virt/kvm/kvm_main.c             |   57 --
> >  12 files changed, 603 insertions(+), 15 deletions(-)
> >  create mode 100644 virt/kvm/async_pf.c
> >  create mode 100644 virt/kvm/async_pf.h
> >
> > +    async_pf_cache = NULL;
> > +}
> > +
> > +void kvm_async_pf_vcpu_init(struct kvm_vcpu *vcpu)
> > +{
> > +    INIT_LIST_HEAD(&vcpu->async_pf.done);
> > +    INIT_LIST_HEAD(&vcpu->async_pf.queue);
> > +    spin_lock_init(&vcpu->async_pf.lock);
> > +}
> > +
> > +static void async_pf_execute(struct work_struct *work)
> > +{
> > +    struct page *page;
> > +    struct kvm_async_pf *apf =
> > +        container_of(work, struct kvm_async_pf, work);
> > +    struct mm_struct *mm = apf->mm;
> > +    struct kvm_vcpu *vcpu = apf->vcpu;
> > +    unsigned long addr = apf->addr;
> > +    gva_t gva = apf->gva;
> > +
> > +    might_sleep();
> > +
> > +    use_mm(mm);
> > +    down_read(&mm->mmap_sem);
> > +    get_user_pages(current, mm, addr, 1, 1, 0, &page, NULL);
> > +    up_read(&mm->mmap_sem);
> > +    unuse_mm(mm);
> > +
> > +    spin_lock(&vcpu->async_pf.lock);
> > +    list_add_tail(&apf->link, &vcpu->async_pf.done);
> > +    apf->page = page;
> > +    spin_unlock(&vcpu->async_pf.lock);
>
> This can fail, and apf->page become NULL.
>
> > +    if (list_empty_careful(&vcpu->async_pf.done))
> > +        return;
> > +
> > +    spin_lock(&vcpu->async_pf.lock);
> > +    work = list_first_entry(&vcpu->async_pf.done, typeof(*work), link);
> > +    list_del(&work->link);
> > +    spin_unlock(&vcpu->async_pf.lock);
> > +
> > +    kvm_arch_async_page_present(vcpu, work);
> > +
> > +free:
> > +    list_del(&work->queue);
> > +    vcpu->async_pf.queued--;
> > +    put_page(work->page);
> > +    kmem_cache_free(async_pf_cache, work);
> > +}
>
> Better handle it here (and other sites).

Yeah. We should just reenter the guest and let the usual code path handle
the error on the next guest access.

--
    Gleb.
Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net
On Tuesday 05 October 2010, Krishna Kumar2 wrote:
> After testing various combinations of #txqs, #vhosts, #netperf
> sessions, I think the drop for 1 stream is due to TX and RX for
> a flow being processed on different cpus. I did two more tests:
>
> 1. Pin vhosts to same CPU:
>    - BW drop is much lower for 1 stream case (-5 to -8% range)
>    - But performance is not so high for more sessions.
> 2. Changed vhost to be single threaded:
>    - No degradation for 1 session, and improvement for upto
>      8, sometimes 16 streams (5-12%).
>    - BW degrades after that, all the way till 128 netperf sessions.
>    - But overall CPU utilization improves.
>
> Summary of the entire run (for 1-128 sessions):
>    txq=4:  BW: (-2.3)  CPU: (-16.5)  RCPU: (-5.3)
>    txq=16: BW: (-1.9)  CPU: (-24.9)  RCPU: (-9.6)
>
> I don't see any reasons mentioned above. However, for higher
> number of netperf sessions, I see a big increase in retransmissions:
> _______________________________________
> #netperf      ORG             NEW
>            BW (#retr)      BW (#retr)
> _______________________________________
> 1          70244 (0)       64102 (0)
> 4          21421 (0)       36570 (416)
> 8          21746 (0)       38604 (148)
> 16         21783 (0)       40632 (464)
> 32         22677 (0)       37163 (1053)
> 64         23648 (4)       36449 (2197)
> 128        23251 (2)       31676 (3185)
> _______________________________________

This smells like it could be related to a problem that Ben Greear found
recently (see "macvlan: Enable qdisc backoff logic"). When the hardware
is busy, used to just drop the packet. With Ben's patch, we return -EAGAIN
to qemu (or vhost-net) to trigger a resend.

I suppose what we really should do is feed that condition back to the
guest network stack and implement the backoff in there.

    Arnd
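To put the quoted ORG/NEW figures side by side, a small sketch like the following computes the relative bandwidth change and the extra retransmissions per session count. The numbers are copied from the quoted message; the function names and the rounding to one decimal place are illustrative choices, and no claim is made here about the bandwidth unit, which the message does not state:

```python
# (sessions, org_bw, org_retr, new_bw, new_retr), copied from the quoted table
ROWS = [
    (1, 70244, 0, 64102, 0),
    (4, 21421, 0, 36570, 416),
    (8, 21746, 0, 38604, 148),
    (16, 21783, 0, 40632, 464),
    (32, 22677, 0, 37163, 1053),
    (64, 23648, 4, 36449, 2197),
    (128, 23251, 2, 31676, 3185),
]


def bw_change_percent(org_bw, new_bw):
    """Relative bandwidth change of NEW against ORG, in percent."""
    return 100.0 * (new_bw - org_bw) / org_bw


def summarize(rows):
    """Map session count -> (bandwidth change in %, extra retransmissions)."""
    return {
        sessions: (round(bw_change_percent(org_bw, new_bw), 1),
                   new_retr - org_retr)
        for sessions, org_bw, org_retr, new_bw, new_retr in rows
    }


if __name__ == "__main__":
    for sessions, (change, extra_retr) in sorted(summarize(ROWS).items()):
        print("%4d sessions: BW %+6.1f%%, retransmissions +%d"
              % (sessions, change, extra_retr))
```

Laid out this way, the single-session case is the only row where bandwidth drops, while the retransmission count grows with the number of sessions in the NEW column.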
[PATCH] kvm: fix typo in copyright notice
Fix typo in copyright notice.

Signed-off-by: Nicolas Kaiser ni...@nikai.net
---
 arch/x86/kvm/emulate.c     |    2 +-
 arch/x86/kvm/i8254.c       |    2 +-
 arch/x86/kvm/i8259.c       |    2 +-
 arch/x86/kvm/irq.c         |    2 +-
 arch/x86/kvm/lapic.c       |    2 +-
 arch/x86/kvm/mmu.c         |    2 +-
 arch/x86/kvm/mmu_audit.c   |    2 +-
 arch/x86/kvm/paging_tmpl.h |    2 +-
 arch/x86/kvm/svm.c         |    2 +-
 arch/x86/kvm/timer.c       |    2 +-
 arch/x86/kvm/vmx.c         |    2 +-
 arch/x86/kvm/x86.c         |    2 +-
 virt/kvm/irq_comm.c        |    2 +-
 virt/kvm/kvm_main.c        |    2 +-
 14 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index aead72e..cb8bd2e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -9,7 +9,7 @@
  * privileged instructions:
  *
  * Copyright (C) 2006 Qumranet
- * Copyright 2010 Red Hat, Inc. and/or its affilates.
+ * Copyright 2010 Red Hat, Inc. and/or its affiliates.
  *
  *   Avi Kivity a...@qumranet.com
  *   Yaniv Kamay ya...@qumranet.com
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 2ad40a4..efad723 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -5,7 +5,7 @@
  * Copyright (c) 2006 Intel Corporation
  * Copyright (c) 2007 Keir Fraser, XenSource Inc
  * Copyright (c) 2008 Intel Corporation
- * Copyright 2009 Red Hat, Inc. and/or its affilates.
+ * Copyright 2009 Red Hat, Inc. and/or its affiliates.
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 6e77471..cf585f7 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -3,7 +3,7 @@
  *
  * Copyright (c) 2003-2004 Fabrice Bellard
  * Copyright (c) 2007 Intel Corporation
- * Copyright 2009 Red Hat, Inc. and/or its affilates.
+ * Copyright 2009 Red Hat, Inc. and/or its affiliates.
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index f994da4..7e06ba1 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -1,7 +1,7 @@
 /*
  * irq.c: API for in kernel interrupt controller
  * Copyright (c) 2007, Intel Corporation.
- * Copyright 2009 Red Hat, Inc. and/or its affilates.
+ * Copyright 2009 Red Hat, Inc. and/or its affiliates.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms and conditions of the GNU General Public License,
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 22b06f7..ed1a533 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -5,7 +5,7 @@
  * Copyright (C) 2006 Qumranet, Inc.
  * Copyright (C) 2007 Novell
  * Copyright (C) 2007 Intel
- * Copyright 2009 Red Hat, Inc. and/or its affilates.
+ * Copyright 2009 Red Hat, Inc. and/or its affiliates.
  *
  * Authors:
  *   Dor Laor dor.l...@qumranet.com
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6e248d8..3c7d024 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -7,7 +7,7 @@
  * MMU support
  *
  * Copyright (C) 2006 Qumranet, Inc.
- * Copyright 2010 Red Hat, Inc. and/or its affilates.
+ * Copyright 2010 Red Hat, Inc. and/or its affiliates.
  *
  * Authors:
  *   Yaniv Kamay ya...@qumranet.com
diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index bd2b1be..ee0feef 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -4,7 +4,7 @@
  * Audit code for KVM MMU
  *
  * Copyright (C) 2006 Qumranet, Inc.
- * Copyright 2010 Red Hat, Inc. and/or its affilates.
+ * Copyright 2010 Red Hat, Inc. and/or its affiliates.
  *
  * Authors:
  *   Yaniv Kamay ya...@qumranet.com
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 2bdd843..30cde53 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -7,7 +7,7 @@
  * MMU support
  *
  * Copyright (C) 2006 Qumranet, Inc.
- * Copyright 2010 Red Hat, Inc. and/or its affilates.
+ * Copyright 2010 Red Hat, Inc. and/or its affiliates.
  *
  * Authors:
  *   Yaniv Kamay ya...@qumranet.com
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index eeb08d6..a7fdd78 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4,7 +4,7 @@
  * AMD SVM support
  *
  * Copyright (C) 2006 Qumranet, Inc.
- * Copyright 2010 Red Hat, Inc. and/or its affilates.
+ * Copyright 2010 Red Hat, Inc. and/or its affiliates.
  *
  * Authors:
  *   Yaniv Kamay ya...@qumranet.com
diff --git a/arch/x86/kvm/timer.c b/arch/x86/kvm/timer.c
index e16a0db..fc7a101 100644
--- a/arch/x86/kvm/timer.c
+++ b/arch/x86/kvm/timer.c
@@ -6,7 +6,7 @@
  *
  * timer support
  *
- * Copyright 2010 Red Hat, Inc. and/or its affilates.
+ * Copyright 2010 Red Hat, Inc. and/or its affiliates.
  *
  * This work is licensed under the
Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net
On Fri, Sep 17, 2010 at 03:33:07PM +0530, Krishna Kumar wrote:
> For 1 TCP netperf, I ran 7 iterations and summed it. Explanation
> for degradation for 1 stream case:

I thought about possible RX/TX contention reasons, and I realized that
we get/put the mm counter all the time. So I wrote the following. I
haven't seen any performance gain from this in a single queue case, but
maybe this will help multiqueue?

Thanks,

Michael S. Tsirkin (2):
  vhost: put mm after thread stop
  vhost-net: batch use/unuse mm

 drivers/vhost/net.c   |    7 -------
 drivers/vhost/vhost.c |   16 ++++++++++------
 2 files changed, 10 insertions(+), 13 deletions(-)

--
1.7.3-rc1
[PATCH 1/2] vhost: put mm after thread stop
makes it possible to batch use/unuse mm

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/vhost.c |    9 ++++-----
 1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 677d112..8b9d474 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -207,7 +207,7 @@ static int vhost_worker(void *data)
 		if (work) {
 			__set_current_state(TASK_RUNNING);
 			work->fn(work);
-			if (n++) {
+			if (dev->nvqs <= ++n) {
 				__set_current_state(TASK_RUNNING);
 				schedule();
 				n = 0;
@@ -409,15 +409,14 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
 	/* No one will access memory at this point */
 	kfree(dev->memory);
 	dev->memory = NULL;
-	if (dev->mm)
-		mmput(dev->mm);
-	dev->mm = NULL;
-
 	WARN_ON(!list_empty(&dev->work_list));
 	if (dev->worker) {
 		kthread_stop(dev->worker);
 		dev->worker = NULL;
 	}
+	if (dev->mm)
+		mmput(dev->mm);
+	dev->mm = NULL;
 }
 
 static int log_access_ok(void __user *log_base, u64 addr, unsigned long sz)
--
1.7.3-rc1
[PATCH 2/2] vhost-net: batch use/unuse mm
Move use/unuse mm to vhost.c which makes it possible to batch these
operations.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/net.c   |    7 -------
 drivers/vhost/vhost.c |    7 ++++++-
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 271678e..ff02ea4 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -10,7 +10,6 @@
 #include <linux/eventfd.h>
 #include <linux/vhost.h>
 #include <linux/virtio_net.h>
-#include <linux/mmu_context.h>
 #include <linux/miscdevice.h>
 #include <linux/module.h>
 #include <linux/mutex.h>
@@ -136,7 +135,6 @@ static void handle_tx(struct vhost_net *net)
 		return;
 	}
 
-	use_mm(net->dev.mm);
 	mutex_lock(&vq->mutex);
 	vhost_disable_notify(vq);
 
@@ -197,7 +195,6 @@ static void handle_tx(struct vhost_net *net)
 	}
 
 	mutex_unlock(&vq->mutex);
-	unuse_mm(net->dev.mm);
 }
 
 static int peek_head_len(struct sock *sk)
@@ -302,7 +299,6 @@ static void handle_rx_big(struct vhost_net *net)
 	if (!sock || skb_queue_empty(&sock->sk->sk_receive_queue))
 		return;
 
-	use_mm(net->dev.mm);
 	mutex_lock(&vq->mutex);
 	vhost_disable_notify(vq);
 	hdr_size = vq->vhost_hlen;
@@ -381,7 +377,6 @@ static void handle_rx_big(struct vhost_net *net)
 	}
 
 	mutex_unlock(&vq->mutex);
-	unuse_mm(net->dev.mm);
 }
 
 /* Expects to be always run from workqueue - which acts as
@@ -413,7 +408,6 @@ static void handle_rx_mergeable(struct vhost_net *net)
 	if (!sock || skb_queue_empty(&sock->sk->sk_receive_queue))
 		return;
 
-	use_mm(net->dev.mm);
 	mutex_lock(&vq->mutex);
 	vhost_disable_notify(vq);
 	vhost_hlen = vq->vhost_hlen;
@@ -490,7 +484,6 @@ static void handle_rx_mergeable(struct vhost_net *net)
 	}
 
 	mutex_unlock(&vq->mutex);
-	unuse_mm(net->dev.mm);
 }
 
 static void handle_rx(struct vhost_net *net)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 8b9d474..c83d1c2 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -15,6 +15,7 @@
 #include <linux/vhost.h>
 #include <linux/virtio_net.h>
 #include <linux/mm.h>
+#include <linux/mmu_context.h>
 #include <linux/miscdevice.h>
 #include <linux/mutex.h>
 #include <linux/rcupdate.h>
@@ -179,6 +180,8 @@ static int vhost_worker(void *data)
 	unsigned uninitialized_var(seq);
 	int n = 0;
 
+	use_mm(dev->mm);
+
 	for (;;) {
 		/* mb paired w/ kthread_stop */
 		set_current_state(TASK_INTERRUPTIBLE);
@@ -193,7 +196,7 @@ static int vhost_worker(void *data)
 		if (kthread_should_stop()) {
 			spin_unlock_irq(&dev->work_lock);
 			__set_current_state(TASK_RUNNING);
-			return 0;
+			break;
 		}
 		if (!list_empty(&dev->work_list)) {
 			work = list_first_entry(&dev->work_list,
@@ -218,6 +221,8 @@ static int vhost_worker(void *data)
 		}
 	}
 
+	unuse_mm(dev->mm);
+	return 0;
 }
 
 /* Helper to allocate iovec buffers for all vqs. */
--
1.7.3-rc1
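For readers skimming the archive, the effect of the batching can be sketched with a toy model. This is only an illustration, not kernel code: `MmCounter`, `run_per_handler` and `run_per_worker` are hypothetical names, and the model counts calls only, ignoring locking and scheduling.

```python
class MmCounter:
    """Toy stand-in that counts how often the mm context is attached."""

    def __init__(self):
        self.uses = 0
        self.active = False

    def use_mm(self):
        assert not self.active, "mm already in use"
        self.active = True
        self.uses += 1

    def unuse_mm(self):
        assert self.active, "mm not in use"
        self.active = False


def run_per_handler(mm, work_items):
    """Before the patch: every handler brackets its work with use/unuse."""
    for work in work_items:
        mm.use_mm()
        work()
        mm.unuse_mm()


def run_per_worker(mm, work_items):
    """After the patch: the worker attaches once for its whole lifetime."""
    mm.use_mm()
    for work in work_items:
        work()
    mm.unuse_mm()


def count_uses(runner, n_items):
    """Run n_items no-op work items and report how many use_mm calls happened."""
    mm = MmCounter()
    runner(mm, [lambda: None] * n_items)
    return mm.uses
```

With five queued work items, the per-handler variant attaches five times and the per-worker variant once, which is the saving the patch description refers to.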
Re: [PATCH] kvm: add oom notifier for virtio balloon
On Wed, Oct 6, 2010 at 5:05 PM, Rusty Russell ru...@rustcorp.com.au wrote:
> On Tue, 5 Oct 2010 11:15:21 pm Dave Young wrote:
>> Balloon could cause guest memory oom killing and panic. Add oom
>> notify to leak some memory and retry fill balloon after 5 minutes.
>
> Have you tried registering a shrinker? See mm.h.

Hi,

thanks. I didn't know shrinker can shrink mem beyond slab. Will try

> Thanks,
> Rusty.

--
Regards
dave
[PATCH] Fix calculation of number of entries based on number of mce_banks
The number of mce_banks needs to be multiplied by 4 in order to actually
reference all of the entries.

Signed-off-by: Dean Nelson dnel...@redhat.com
---
 qemu-kvm-x86.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index fd974b3..7fd82fb 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -975,7 +975,7 @@ void kvm_arch_load_regs(CPUState *env, int level)
     } else if (level == KVM_PUT_FULL_STATE) {
         kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
         kvm_msr_entry_set(&msrs[n++], MSR_MCG_CTL, env->mcg_ctl);
-        for (i = 0; i < (env->mcg_cap & 0xff); i++) {
+        for (i = 0; i < (env->mcg_cap & 0xff) * 4; i++) {
             kvm_msr_entry_set(&msrs[n++], MSR_MC0_CTL + i, env->mce_banks[i]);
         }
     }
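The arithmetic behind the fix can be checked with a small sketch. It assumes the architectural layout in which each machine-check bank has four consecutive MSRs (CTL, STATUS, ADDR, MISC) starting at MSR_MC0_CTL (0x400), with the bank count in the low 8 bits of MCG_CAP. The helper names (`mce_msr_indices`, `describe`) are made up for illustration and are not qemu functions.

```python
MSR_MC0_CTL = 0x400                      # first MSR of bank 0
REG_NAMES = ("CTL", "STATUS", "ADDR", "MISC")
REGS_PER_BANK = len(REG_NAMES)           # four MSRs per bank


def bank_count(mcg_cap):
    """The low 8 bits of MCG_CAP hold the number of MCE banks."""
    return mcg_cap & 0xFF


def mce_msr_indices(mcg_cap, fixed=True):
    """MSR numbers the loop visits, with or without the '* 4' factor."""
    limit = bank_count(mcg_cap) * (REGS_PER_BANK if fixed else 1)
    return [MSR_MC0_CTL + i for i in range(limit)]


def describe(msr):
    """Map an MSR number back to its bank and register name."""
    offset = msr - MSR_MC0_CTL
    return "MC%d_%s" % (offset // REGS_PER_BANK, REG_NAMES[offset % REGS_PER_BANK])
```

With 10 banks, the unpatched bound stops after 10 MSRs, so the last register written is `MC2_STATUS`; the patched bound covers all 40, ending at `MC9_MISC`.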
Re: [PATCH v6 03/12] Retry fault before vmentry
On Wed, Oct 06, 2010 at 01:07:04PM +0200, Gleb Natapov wrote:
> > Can't you set a bit in vcpu->requests instead, and handle it in "out:"
> > at the end of vcpu_enter_guest?
> >
> > To have a single entry point for pagefaults, after vmexit handling.
> Jumping to "out:" will skip vmexit handling anyway, so we will not reuse
> same call site anyway. I don't see yet why the way you propose will have
> an advantage.

What I meant was to call the pagefault handler after vmexit handling.

Because the way it is in your patch now, with pre pagefault on entry,
one has to make an effort to verify ordering wrt other events on entry
processing.

With pre pagefault after vmexit, it's more natural.

Does that make sense?
Re: [PATCH v6 07/12] Add async PF initialization to PV guest.
On Wed, Oct 06, 2010 at 12:55:04PM +0200, Gleb Natapov wrote:
> On Tue, Oct 05, 2010 at 03:25:54PM -0300, Marcelo Tosatti wrote:
> > On Mon, Oct 04, 2010 at 05:56:29PM +0200, Gleb Natapov wrote:
> > > Enable async PF in a guest if async PF capability is discovered.
> > >
> > > Signed-off-by: Gleb Natapov g...@redhat.com
> > > ---
> > >  Documentation/kernel-parameters.txt |    3 +
> > >  arch/x86/include/asm/kvm_para.h     |    5 ++
> > >  arch/x86/kernel/kvm.c               |   92 +++
> > >  3 files changed, 100 insertions(+), 0 deletions(-)
> > >
> > > +static int __cpuinit kvm_cpu_notify(struct notifier_block *self,
> > > +		unsigned long action, void *hcpu)
> > > +{
> > > +	int cpu = (unsigned long)hcpu;
> > > +	switch (action) {
> > > +	case CPU_ONLINE:
> > > +	case CPU_DOWN_FAILED:
> > > +	case CPU_ONLINE_FROZEN:
> > > +		smp_call_function_single(cpu, kvm_guest_cpu_notify, NULL, 0);
> >
> > wait parameter should probably be 1.
> Why should we wait for it? FWIW I copied this from somewhere (May be
> arch/x86/pci/amd_bus.c).

So that you know it's executed at a defined point in cpu bringup.
Re: [PATCH v6 04/12] Add memory slot versioning and use it to provide fast guest write interface
On Wed, Oct 06, 2010 at 01:14:17PM +0200, Gleb Natapov wrote:
> > > +int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
> > > +			      gpa_t gpa)
> > > +{
> > > +	struct kvm_memslots *slots = kvm_memslots(kvm);
> > > +	int offset = offset_in_page(gpa);
> > > +	gfn_t gfn = gpa >> PAGE_SHIFT;
> > > +
> > > +	ghc->gpa = gpa;
> > > +	ghc->generation = slots->generation;

kvm->memslots can change here.

> > > +	ghc->memslot = gfn_to_memslot(kvm, gfn);
> > > +	ghc->hva = gfn_to_hva(kvm, gfn);

And if so, gfn_to_memslot / gfn_to_hva will use new memslots pointer.

Should dereference all values from one copy of kvm->memslots pointer.

> > > +	if (!kvm_is_error_hva(ghc->hva))
> > > +		ghc->hva += offset;
> > > +	else
> > > +		return -EFAULT;
> > > +
> > > +	return 0;
> > > +}
> >
> > Should use a unique kvm_memslots structure for the cache entry, since it
> > can change in between (use gfn_to_hva_memslot, etc on slots pointer).
> >
> I do not understand what do you mean here. kvm_memslots structure itself
> is not cached only various translation that use it are cached. Translation
> result are never used if kvm_memslots was changed.
>
> > Also should zap any cached entries on overflow, otherwise malicious
> > userspace could make use of stale slots:
> >
> There is only one cached entry at each given time. User who wants to
> write into guest memory often defines gfn_to_hva_cache variable
> somewhere. Init it with kvm_gfn_to_hva_cache_init() and then calls
> kvm_write_guest_cached() on it. If there was no slot changes in between
> cached translation are used. Otherwise cache is recalculated.

Malicious userspace can cause entry to be cached, ioctl
SET_USER_MEMORY_REGION 2^32 times, generation number will match,
mark_page_dirty_in_slot will be called with pointer to freed memory.
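The wraparound concern in the last paragraph can be illustrated with a toy model. This is a sketch under stated assumptions, not KVM code: `Memslots` and `HvaCache` are hypothetical classes, and an 8-bit counter stands in for the 32-bit generation so the wrap happens after 256 updates rather than 2^32.

```python
class Memslots:
    """Toy stand-in for the memslots structure with a fixed-width generation."""

    def __init__(self, bits):
        self.mask = (1 << bits) - 1
        self.generation = 0
        self.layout = "layout-0"

    def update(self, new_layout):
        """Model one memory-region change: new layout, generation + 1 (wrapping)."""
        self.layout = new_layout
        self.generation = (self.generation + 1) & self.mask


class HvaCache:
    """Toy cache entry that remembers the generation it was filled under."""

    def __init__(self, slots):
        self.generation = slots.generation
        self.layout = slots.layout      # the cached translation

    def is_considered_valid(self, slots):
        # Only the generation number is compared, as in the quoted patch.
        return self.generation == slots.generation


def demo(bits=8):
    """Return (cache looks valid, cached layout is actually current)."""
    slots = Memslots(bits)
    cache = HvaCache(slots)
    for i in range(1 << bits):          # exactly one full wrap of the counter
        slots.update("layout-%d" % (i + 1))
    return cache.is_considered_valid(slots), cache.layout == slots.layout
```

After one full wrap the generation numbers match again although the cached translation is stale, which is why zapping cached entries on overflow was suggested.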
[PATCHv2] qemu-kvm/vhost: fix up irqfd support
vhost irqfd support: case where many vqs are mapped to a single msix
vector is currently broken. Fix it up.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

This is on top of the qemu patchset, which is unchanged.

Fixes from v1:
	correct error handling

 hw/msix.c       |   68 ++-
 hw/msix.h       |    4 +-
 hw/pci.h        |    3 +-
 hw/virtio-pci.c |   56 ++---
 4 files changed, 97 insertions(+), 34 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c
index 3dd0456..3d4dd61 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -300,10 +300,8 @@ static void msix_mmio_writel(void *opaque, target_phys_addr_t addr,
     if (kvm_enabled() && kvm_irqchip_in_kernel()) {
         kvm_msix_update(dev, vector, was_masked, msix_is_masked(dev, vector));
     }
-    if (was_masked != msix_is_masked(dev, vector) &&
-        dev->msix_mask_notifier && dev->msix_mask_notifier_opaque[vector]) {
+    if (was_masked != msix_is_masked(dev, vector) && dev->msix_mask_notifier) {
         int r = dev->msix_mask_notifier(dev, vector,
-                                        dev->msix_mask_notifier_opaque[vector],
                                         msix_is_masked(dev, vector));
         assert(r >= 0);
     }
@@ -351,9 +349,8 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
         int was_masked = msix_is_masked(dev, vector);
         dev->msix_table_page[offset] |= MSIX_VECTOR_MASK;
         if (was_masked != msix_is_masked(dev, vector) &&
-            dev->msix_mask_notifier && dev->msix_mask_notifier_opaque[vector]) {
+            dev->msix_mask_notifier) {
             r = dev->msix_mask_notifier(dev, vector,
-                                        dev->msix_mask_notifier_opaque[vector],
                                         msix_is_masked(dev, vector));
             assert(r >= 0);
         }
@@ -379,8 +376,6 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
                                       sizeof *dev->msix_irq_entries);
     }
 #endif
-    dev->msix_mask_notifier_opaque =
-        qemu_mallocz(nentries * sizeof *dev->msix_mask_notifier_opaque);
     dev->msix_mask_notifier = NULL;
     dev->msix_entry_used = qemu_mallocz(MSIX_MAX_ENTRIES *
                                         sizeof *dev->msix_entry_used);
@@ -444,8 +439,6 @@ int msix_uninit(PCIDevice *dev)
     dev->msix_entry_used = NULL;
     qemu_free(dev->msix_irq_entries);
     dev->msix_irq_entries = NULL;
-    qemu_free(dev->msix_mask_notifier_opaque);
-    dev->msix_mask_notifier_opaque = NULL;
     dev->cap_present &= ~QEMU_PCI_CAP_MSIX;
     return 0;
 }
@@ -590,46 +583,79 @@ void msix_unuse_all_vectors(PCIDevice *dev)
     msix_free_irq_entries(dev);
 }
 
-int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque)
+static int msix_set_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
 {
     int r = 0;
     if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
         return 0;
 
     assert(dev->msix_mask_notifier);
-    assert(opaque);
-    assert(!dev->msix_mask_notifier_opaque[vector]);
 
     /* Unmask the new notifier unless vector is masked. */
     if (!msix_is_masked(dev, vector)) {
-        r = dev->msix_mask_notifier(dev, vector, opaque, false);
+        r = dev->msix_mask_notifier(dev, vector, false);
         if (r < 0) {
             return r;
         }
     }
-    dev->msix_mask_notifier_opaque[vector] = opaque;
     return r;
 }
 
-int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector)
+static int msix_unset_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
 {
     int r = 0;
-    void *opaque;
     if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
         return 0;
 
-    opaque = dev->msix_mask_notifier_opaque[vector];
-
     assert(dev->msix_mask_notifier);
-    assert(opaque);
 
     /* Mask the old notifier unless it is already masked. */
     if (!msix_is_masked(dev, vector)) {
-        r = dev->msix_mask_notifier(dev, vector, opaque, true);
+        r = dev->msix_mask_notifier(dev, vector, true);
         if (r < 0) {
             return r;
         }
     }
-    dev->msix_mask_notifier_opaque[vector] = NULL;
+    return r;
+}
+
+int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func f)
+{
+    int r, n;
+    assert(!dev->msix_mask_notifier);
+    dev->msix_mask_notifier = f;
+    for (n = 0; n < dev->msix_entries_nr; ++n) {
+        r = msix_set_mask_notifier_for_vector(dev, n);
+        if (r < 0) {
+            goto undo;
+        }
+    }
+    return 0;
+
+undo:
+    while (--n >= 0) {
+        msix_unset_mask_notifier_for_vector(dev, n);
+    }
+    dev->msix_mask_notifier = NULL;
+    return r;
+}
+
+int msix_unset_mask_notifier(PCIDevice *dev)
+{
+    int r, n;
+    assert(dev->msix_mask_notifier);
+    for (n = 0; n < dev->msix_entries_nr; ++n) {
+        r = msix_unset_mask_notifier_for_vector(dev, n);
+        if (r < 0) {
+            goto undo;
Re: [PATCH 18/18] KVM test: Add subtest of testing offload by ethtool
* pradeep psuri...@linux.vnet.ibm.com [2010-10-06 03:57]:
> On Mon, 27 Sep 2010 18:44:04 -0400
> Lucas Meneghel Rodrigues l...@redhat.com wrote:
>
> > +
> > +    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
> > +    session = kvm_test_utils.wait_for_login(vm,
> > +                  timeout=int(params.get("login_timeout", 360)))
> > +    # Let's just error the test if we identify that there's no ethtool installed
> > +    if session.get_command_status("ethtool -h"):
> > +        raise error.TestError("Command ethtool not installed on guest")
> > +    session2 = kvm_test_utils.wait_for_login(vm,
> > +                  timeout=int(params.get("login_timeout", 360)))
> > +    mtu = 1514
> > +    feature_status = {}
> > +    filename = "/tmp/ethtool.dd"
> > +    guest_ip = vm.get_address()
> > +    ethname = kvm_test_utils.get_linux_ifname(session, vm.get_mac_address(0))
> > +    supported_features = params.get("supported_features").split()
>
> I guess split this expects input.
>
> 23:48:03 ERROR| Test failed: AttributeError: 'NoneType' object has no attribute 'split'

That'll need an update to the tests_base.cfg file to ensure the test
type has that config value set. Did the patchset miss updating
tests_base.cfg.sample with this one?

> 22.12', '00:1a:4a:65:09:09': '192.168.122.66', '9a:52:2f:62:12:63': '192.168.122.151', '9a:52:2f:62:6b:28': '192.168.122.35'}, 'version': 0, 'tcpdump': <kvm_subprocess.kvm_tail instance at 0x27cb200>}
> 23:48:05 INFO | ['iteration.1']
> 23:48:05 ERROR| Exception escaping from test:
> Traceback (most recent call last):
>   File "/home/pradeep/vhost_net/autotest/client/common_lib/test.py", line 412, in _exec
>     _call_test_function(self.execute, *p_args, **p_dargs)
>   File "/home/pradeep/vhost_net/autotest/client/common_lib/test.py", line 605, in _call_test_function
>     raise error.UnhandledTestFail(e)
> UnhandledTestFail: Unhandled AttributeError: 'NoneType' object has no attribute 'split'
> Traceback (most recent call last):
>   File "/home/pradeep/vhost_net/autotest/client/common_lib/test.py", line 598, in _call_test_function
>     return func(*args, **dargs)
>   File "/home/pradeep/vhost_net/autotest/client/common_lib/test.py", line 284, in execute
>     postprocess_profiled_run, args, dargs)
>   File "/home/pradeep/vhost_net/autotest/client/common_lib/test.py", line 202, in _call_run_once
>     self.run_once_profiling(postprocess_profiled_run, *args, **dargs)
>   File "/home/pradeep/vhost_net/autotest/client/common_lib/test.py", line 308, in run_once_profiling
>     self.run_once(*args, **dargs)
>   File "/home/pradeep/vhost_net/autotest/client/tests/kvm/kvm.py", line 73, in run_once
>     run_func(self, params, env)
>   File "/home/pradeep/vhost_net/autotest/client/tests/kvm/tests/ethtool.py", line 185, in run_ethtool
>     supported_features = params.get("supported_features").split()
> AttributeError: 'NoneType' object has no attribute 'split'
>
> --Pradeep

--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com
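The failure above comes from calling `.split()` on the `None` that `params.get()` returns when the key is absent. A minimal sketch of a clearer failure mode follows, assuming `params` behaves like a dict. `TestError` here is a local stand-in for autotest's `error.TestError`, and `get_required_list` is a hypothetical helper, not part of the patchset.

```python
class TestError(Exception):
    """Local stand-in for autotest's error.TestError (assumption)."""


def get_required_list(params, key):
    """Return a whitespace-separated config value as a list, or fail clearly."""
    value = params.get(key)
    if value is None:
        # Point at the likely cause instead of raising AttributeError later.
        raise TestError("Parameter '%s' is not set; check tests_base.cfg" % key)
    return value.split()
```

With such a helper, a missing `supported_features` entry would be reported as a configuration problem rather than an unhandled `AttributeError`.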
Re: [patch uq/master 7/8] MCE: Relay UCR MCE to guest
On Wed, Oct 06, 2010 at 10:10:51AM +0900, Hidetoshi Seto wrote:
> (snip)
> > Index: qemu/kvm.h
> > ===================================================================
> > --- qemu.orig/kvm.h
> > +++ qemu/kvm.h
> > @@ -110,6 +110,9 @@ int kvm_arch_init_vcpu(CPUState *env);
> >
> >  void kvm_arch_reset_vcpu(CPUState *env);
> >
> > +int kvm_on_sigbus(CPUState *env, int code, void *addr);
> > +int kvm_on_sigbus_vcpu(int code, void *addr);
> > +
> >  struct kvm_guest_debug;
> >  struct kvm_debug_exit_arch;
>
> So kvm_on_sigbus() is called from qemu_kvm_eat_signal() that is
> called on vcpu thread, while kvm_on_sigbus_vcpu() is called via
> sigbus_handler that invoked on iothread using signalfd.
>
> ... Inverse naming?

Yes, fixed.
Re: [patch uq/master 7/8] MCE: Relay UCR MCE to guest
On Wed, Oct 06, 2010 at 10:58:36AM +0900, Hidetoshi Seto wrote:
> I got some more question:
>
> (2010/10/05 3:54), Marcelo Tosatti wrote:
> > Index: qemu/target-i386/cpu.h
> > ===================================================================
> > --- qemu.orig/target-i386/cpu.h
> > +++ qemu/target-i386/cpu.h
> > @@ -250,16 +250,32 @@
> >  #define PG_ERROR_RSVD_MASK 0x08
> >  #define PG_ERROR_I_D_MASK  0x10
> >
> > -#define MCG_CTL_P	(1UL<<8)   /* MCG_CAP register available */
> > +#define MCG_CTL_P	(1ULL<<8)   /* MCG_CAP register available */
> > +#define MCG_SER_P	(1ULL<<24) /* MCA recovery/new status bits */
> >
> > -#define MCE_CAP_DEF	MCG_CTL_P
> > +#define MCE_CAP_DEF	(MCG_CTL_P|MCG_SER_P)
> >  #define MCE_BANKS_DEF	10
> >
>
> It seems that current kvm doesn't support SER_P, so injecting SRAO
> to guest will mean that guest receives VAL|UC|!PCC and RIPV event
> from virtual processor that doesn't have SER_P.

Dean also noted this. I don't think it was deliberate choice to not
expose SER_P. Huang?

> I think most OSes don't expect that it can receives MCE with !PCC
> on traditional x86 processor without SER_P.
>
> Q1: Is it safe to expect that guests can handle such !PCC event?
> Q2: What is the expected behavior on the guest?
> Q3: What happen if guest reboots itself in response to the MCE?
>
> Thanks,
> H.Seto
Re: [PATCHv2] qemu-kvm/vhost: fix up irqfd support
On Wed, 2010-10-06 at 16:56 +0200, Michael S. Tsirkin wrote:
> vhost irqfd support: case where many vqs are mapped to a single msix
> vector is currently broken. Fix it up.
>
> Signed-off-by: Michael S. Tsirkin m...@redhat.com
> ---
>
> This is on top of the qemu patchset, which is unchanged.
>
> Fixes from v1:
> 	correct error handling
>
>  hw/msix.c       |   68 ++-
>  hw/msix.h       |    4 +-
>  hw/pci.h        |    3 +-
>  hw/virtio-pci.c |   56 ++---
>  4 files changed, 97 insertions(+), 34 deletions(-)
>
> diff --git a/hw/msix.c b/hw/msix.c
> index 3dd0456..3d4dd61 100644
> --- a/hw/msix.c
> +++ b/hw/msix.c
> @@ -300,10 +300,8 @@ static void msix_mmio_writel(void *opaque, target_phys_addr_t addr,
>      if (kvm_enabled() && kvm_irqchip_in_kernel()) {
>          kvm_msix_update(dev, vector, was_masked, msix_is_masked(dev, vector));
>      }
> -    if (was_masked != msix_is_masked(dev, vector) &&
> -        dev->msix_mask_notifier && dev->msix_mask_notifier_opaque[vector]) {
> +    if (was_masked != msix_is_masked(dev, vector) && dev->msix_mask_notifier) {
>          int r = dev->msix_mask_notifier(dev, vector,
> -                                        dev->msix_mask_notifier_opaque[vector],
>                                          msix_is_masked(dev, vector));
>          assert(r >= 0);
>      }
> @@ -351,9 +349,8 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
>          int was_masked = msix_is_masked(dev, vector);
>          dev->msix_table_page[offset] |= MSIX_VECTOR_MASK;
>          if (was_masked != msix_is_masked(dev, vector) &&
> -            dev->msix_mask_notifier && dev->msix_mask_notifier_opaque[vector]) {
> +            dev->msix_mask_notifier) {
>              r = dev->msix_mask_notifier(dev, vector,
> -                                        dev->msix_mask_notifier_opaque[vector],
>                                          msix_is_masked(dev, vector));
>              assert(r >= 0);
>          }
> @@ -379,8 +376,6 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
>                                        sizeof *dev->msix_irq_entries);
>      }
>  #endif
> -    dev->msix_mask_notifier_opaque =
> -        qemu_mallocz(nentries * sizeof *dev->msix_mask_notifier_opaque);
>      dev->msix_mask_notifier = NULL;
>      dev->msix_entry_used = qemu_mallocz(MSIX_MAX_ENTRIES *
>                                          sizeof *dev->msix_entry_used);
> @@ -444,8 +439,6 @@ int msix_uninit(PCIDevice *dev)
>      dev->msix_entry_used = NULL;
>      qemu_free(dev->msix_irq_entries);
>      dev->msix_irq_entries = NULL;
> -    qemu_free(dev->msix_mask_notifier_opaque);
> -    dev->msix_mask_notifier_opaque = NULL;
>      dev->cap_present &= ~QEMU_PCI_CAP_MSIX;
>      return 0;
>  }
> @@ -590,46 +583,79 @@ void msix_unuse_all_vectors(PCIDevice *dev)
>      msix_free_irq_entries(dev);
>  }
>
> -int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque)
> +static int msix_set_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
>  {
>      int r = 0;
>      if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
>          return 0;
>
>      assert(dev->msix_mask_notifier);
> -    assert(opaque);
> -    assert(!dev->msix_mask_notifier_opaque[vector]);
>
>      /* Unmask the new notifier unless vector is masked. */
>      if (!msix_is_masked(dev, vector)) {
> -        r = dev->msix_mask_notifier(dev, vector, opaque, false);
> +        r = dev->msix_mask_notifier(dev, vector, false);
>          if (r < 0) {
>              return r;
>          }
>      }
> -    dev->msix_mask_notifier_opaque[vector] = opaque;
>      return r;
>  }
>
> -int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector)
> +static int msix_unset_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
>  {
>      int r = 0;
> -    void *opaque;
>      if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
>          return 0;
>
> -    opaque = dev->msix_mask_notifier_opaque[vector];
> -
>      assert(dev->msix_mask_notifier);
> -    assert(opaque);
>
>      /* Mask the old notifier unless it is already masked. */
>      if (!msix_is_masked(dev, vector)) {
> -        r = dev->msix_mask_notifier(dev, vector, opaque, true);
> +        r = dev->msix_mask_notifier(dev, vector, true);
>          if (r < 0) {
>              return r;
>          }
>      }
> -    dev->msix_mask_notifier_opaque[vector] = NULL;
> +    return r;
> +}

The above need to be combined to a single function now since the only
difference is s/true/false.

Alex

> +
> +int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func f)
> +{
> +    int r, n;
> +    assert(!dev->msix_mask_notifier);
> +    dev->msix_mask_notifier = f;
> +    for (n = 0; n < dev->msix_entries_nr; ++n) {
> +        r = msix_set_mask_notifier_for_vector(dev, n);
> +        if (r < 0) {
> +            goto undo;
> +        }
> +    }
> +    return 0;
> +
> +undo:
> +    while (--n >= 0) {
> +        msix_unset_mask_notifier_for_vector(dev, n);
> +    }
Re: [PATCH 18/18] KVM test: Add subtest of testing offload by ethtool
On Wed, 2010-10-06 at 10:55 -0500, Ryan Harper wrote:
> * pradeep psuri...@linux.vnet.ibm.com [2010-10-06 03:57]:
> > On Mon, 27 Sep 2010 18:44:04 -0400
> > Lucas Meneghel Rodrigues l...@redhat.com wrote:
> >
> > > +
> > > +    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
> > > +    session = kvm_test_utils.wait_for_login(vm,
> > > +                  timeout=int(params.get("login_timeout", 360)))
> > > +    # Let's just error the test if we identify that there's no ethtool installed
> > > +    if session.get_command_status("ethtool -h"):
> > > +        raise error.TestError("Command ethtool not installed on guest")
> > > +    session2 = kvm_test_utils.wait_for_login(vm,
> > > +                  timeout=int(params.get("login_timeout", 360)))
> > > +    mtu = 1514
> > > +    feature_status = {}
> > > +    filename = "/tmp/ethtool.dd"
> > > +    guest_ip = vm.get_address()
> > > +    ethname = kvm_test_utils.get_linux_ifname(session, vm.get_mac_address(0))
> > > +    supported_features = params.get("supported_features").split()
> >
> > I guess split this expects input.
> >
> > 23:48:03 ERROR| Test failed: AttributeError: 'NoneType' object has no attribute 'split'
>
> That'll need an update to the tests_base.cfg file to ensure the test
> type has that config value set. Did the patchset miss updating
> tests_base.cfg.sample with this one?

I think pradeep forgot to update tests_base.cfg indeed. It's working
fine for me.
Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net
Michael S. Tsirkin m...@redhat.com wrote on 10/06/2010 07:04:31 PM:

> Michael S. Tsirkin m...@redhat.com
> 10/06/2010 07:04 PM
>
> To: Krishna Kumar2/India/i...@ibmin
> cc: ru...@rustcorp.com.au, da...@davemloft.net, kvm@vger.kernel.org,
>     a...@arndb.de, net...@vger.kernel.org, a...@redhat.com,
>     anth...@codemonkey.ws
> Subject: Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net
>
> On Fri, Sep 17, 2010 at 03:33:07PM +0530, Krishna Kumar wrote:
> > For 1 TCP netperf, I ran 7 iterations and summed it. Explanation
> > for degradation for 1 stream case:
>
> I thought about possible RX/TX contention reasons, and I realized that
> we get/put the mm counter all the time. So I write the following: I
> haven't seen any performance gain from this in a single queue case, but
> maybe this will help multiqueue?

Great! I am on vacation tomorrow, but will test with this patch
tomorrow night.

Thanks,

- KK
Re: [PATCHv2] qemu-kvm/vhost: fix up irqfd support
On Wed, Oct 06, 2010 at 10:48:44AM -0600, Alex Williamson wrote:
> > -int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector)
> > +static int msix_unset_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
> >  {
> >      int r = 0;
> > -    void *opaque;
> >      if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
> >          return 0;
> >
> > -    opaque = dev->msix_mask_notifier_opaque[vector];
> > -
> >      assert(dev->msix_mask_notifier);
> > -    assert(opaque);
> >
> >      /* Mask the old notifier unless it is already masked. */
> >      if (!msix_is_masked(dev, vector)) {
> > -        r = dev->msix_mask_notifier(dev, vector, opaque, true);
> > +        r = dev->msix_mask_notifier(dev, vector, true);
> >          if (r < 0) {
> >              return r;
> >          }
> >      }
> > -    dev->msix_mask_notifier_opaque[vector] = NULL;
> > +    return r;
> > +}
>
> The above need to be combined to a single function now since the only
> difference is s/true/false.
>
> Alex

This is the way it was in the past, and it turned out to be very
confusing to read since both variables, mask and assign, are bool but
polarity is reversed. Unrolled it seems easier to grok.

--
MST
Re: 8 NIC limit - patch - places limit at 32
It's 8 otherwise, and after the patch is applied, it still only goes
to 28 for some reason.

28's acceptable for my needs, so I'll step aside from here and leave it
to the experts.

As for the new -device method, that's all fine and good, but AFAIK it's
not implemented on my platform, so this was the answer.

On Wed, 06 Oct 2010 07:54 -0500, Anthony Liguori anth...@codemonkey.ws wrote:
> On 10/06/2010 12:46 AM, linux_...@proinbox.com wrote:
> > Attached is a patch that allows qemu to have up to 32 NICs, without
> > using the qdev -device method.
>
> I'd rather there be no fixed limit and we validate that when add fails
> because there isn't a TCP slot available, we do the right thing.
>
> BTW, using -device, it should be possible to add a very high number of
> nics because you can specify the PCI address including a function.
>
> If this doesn't Just Work today, we should make it work.
>
> Regards,
>
> Anthony Liguori
Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net
Arnd Bergmann a...@arndb.de wrote on 10/06/2010 05:49:00 PM:

> > I don't see any reasons mentioned above. However, for higher
> > number of netperf sessions, I see a big increase in retransmissions:
> > ________________________________________
> > #netperf       ORG              NEW
> >            BW    (#retr)    BW    (#retr)
> > ________________________________________
> > 1          70244 (0)        64102 (0)
> > 4          21421 (0)        36570 (416)
> > 8          21746 (0)        38604 (148)
> > 16         21783 (0)        40632 (464)
> > 32         22677 (0)        37163 (1053)
> > 64         23648 (4)        36449 (2197)
> > 128        23251 (2)        31676 (3185)
> > ________________________________________
>
> This smells like it could be related to a problem that Ben Greear found
> recently (see "macvlan: Enable qdisc backoff logic"). When the hardware
> is busy, we used to just drop the packet. With Ben's patch, we return
> -EAGAIN to qemu (or vhost-net) to trigger a resend.
>
> I suppose what we really should do is feed that condition back to the
> guest network stack and implement the backoff in there.

Thanks for the pointer. I will take a look at this as I hadn't seen
this patch earlier. Is there any way to figure out if this is the
issue?

Thanks,

- KK
virtio network performance [was: Re: BCM5708 performance issues]
* Chris Wright (chr...@sous-sol.org) wrote:
> * Pete Ashdown (pashd...@xmission.com) wrote:
> > ProxMox guest:
> >
> > /usr/bin/kvm -monitor unix:/var/run/qemu-server/104.mon,server,nowait -vnc unix:/var/run/qemu-server/104.vnc,password -pidfile /var/run/qemu-server/104.pid -daemonize -usbdevice tablet -name UbuntuServer -smp sockets=2,cores=2 -nodefaults -boot menu=on -vga cirrus -tdf -k en-us -drive file=/var/lib/vz/images/104/vm-104-disk-2.raw,if=ide,index=3 -drive file=/var/lib/vz/images/104/vm-104-disk-1.raw,if=virtio,index=0,boot=on -m 1024 -net tap,vlan=0,ifname=vmtab104i0,script=/var/lib/qemu-server/bridge-vlan -net nic,vlan=0,model=virtio,macaddr=76:3F:1A:03:6D:6F
> >
> > Ubuntu guest:
> >
> > /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 1024 -smp 1 -name ubutest -uuid c0537369-fffa-9680-2f29-2e0cc0406561 -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/ubutest.monitor,server,nowait -monitor chardev:monitor -boot c -drive file=/dev/vg/ubutest,if=virtio,index=0,boot=on -net nic,macaddr=52:54:00:35:11:f1,vlan=0,model=virtio,name=virtio.0 -net tap,fd=51,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 -parallel none -usb -vnc 0.0.0.0
>
> Not sure what userspace you are using, but you are probably not getting
> any of the useful offload features set. Checking ethtool -k $ETH in the
> guest will verify that. Try changing this:
>
>   -net nic,macaddr=52:54:00:35:11:f1,vlan=0,model=virtio,name=virtio.0 \
>   -net tap,fd=51,vlan=0,name=tap.0
>
> to use newer syntax:
>
>   -netdev type=tap,id=netdev0
>   -device virtio-net-pci,mac=52:54:00:35:11:f1,netdev=netdev0
>
> With just a 1Gb link, you should see line rate from guest via virtio.

Just to follow up for the archives: Pete replied offlist that using the above cmdline eliminates the performance issue.

thanks,
-chris
Re: NIC limit
* linux_...@proinbox.com (linux_...@proinbox.com) wrote:
> Hi again everybody,
>
> One of the admins at the ProxmoxVE project was gracious enough to
> quickly release a package including the previously discussed change to
> allow up to 32 NICs in qemu.

You mean they patched qemu to increase the MAX_NICS constant? Nice to get the quick turn around. The better choice is to use a newer command line. Not only does it avoid the MAX_NICS limitation, but it also enables standard virtio-net offload accelerations.

> For future reference the .deb is here:
> ftp://download.proxmox.com/debian/dists/lenny/pvetest/binary-amd64/pve-qemu-kvm_0.12.5-2_amd64.deb
>
> Upon creating and running the VM with the newly patched qemu-kvm app
> installed, I found a NIC limitation remained in place, presumably
> imposed by some other aspect of the environment.
>
> The machine would start when it had 33 PCI devices, as long as no more
> than 28 of them were NICs.

The PCI bus has only 32 slots (devices), 3 taken by chipset + vga, and a 4th if you have, for example, a virtio disk. Are you sure these are 33 PCI devices and not 33 PCI functions?

thanks,
-chris
Re: [PATCHv2] qemu-kvm/vhost: fix up irqfd support
On Wed, 2010-10-06 at 19:02 +0200, Michael S. Tsirkin wrote:
> On Wed, Oct 06, 2010 at 10:48:44AM -0600, Alex Williamson wrote:
> > > -int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector)
> > > +static int msix_unset_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
> > >  {
> > >      int r = 0;
> > > -    void *opaque;
> > >      if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
> > >          return 0;
> > >
> > > -    opaque = dev->msix_mask_notifier_opaque[vector];
> > > -
> > >      assert(dev->msix_mask_notifier);
> > > -    assert(opaque);
> > >
> > >      /* Mask the old notifier unless it is already masked. */
> > >      if (!msix_is_masked(dev, vector)) {
> > > -        r = dev->msix_mask_notifier(dev, vector, opaque, true);
> > > +        r = dev->msix_mask_notifier(dev, vector, true);
> > >          if (r < 0) {
> > >              return r;
> > >          }
> > >      }
> > > -    dev->msix_mask_notifier_opaque[vector] = NULL;
> > > +    return r;
> > > +}
> >
> > The above need to be combined to a single function now since the
> > only difference is s/true/false.
> >
> > Alex
>
> This is the way it was in the past, and it turned out to be very
> confusing to read since both variables: mask and assign are bool but
> polarity is reversed. Unrolled it seems easier to grok.

You could always keep the functions as separate wrapper callers of the common function so you only need to keep true = unset, false = set straight in one place. Thanks,

Alex
Re: [PATCHv2] qemu-kvm/vhost: fix up irqfd support
On Wed, Oct 06, 2010 at 11:24:24AM -0600, Alex Williamson wrote:
> On Wed, 2010-10-06 at 19:02 +0200, Michael S. Tsirkin wrote:
> > On Wed, Oct 06, 2010 at 10:48:44AM -0600, Alex Williamson wrote:
> > > > -int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector)
> > > > +static int msix_unset_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
> > > >  {
> > > >      int r = 0;
> > > > -    void *opaque;
> > > >      if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
> > > >          return 0;
> > > >
> > > > -    opaque = dev->msix_mask_notifier_opaque[vector];
> > > > -
> > > >      assert(dev->msix_mask_notifier);
> > > > -    assert(opaque);
> > > >
> > > >      /* Mask the old notifier unless it is already masked. */
> > > >      if (!msix_is_masked(dev, vector)) {
> > > > -        r = dev->msix_mask_notifier(dev, vector, opaque, true);
> > > > +        r = dev->msix_mask_notifier(dev, vector, true);
> > > >          if (r < 0) {
> > > >              return r;
> > > >          }
> > > >      }
> > > > -    dev->msix_mask_notifier_opaque[vector] = NULL;
> > > > +    return r;
> > > > +}
> > >
> > > The above need to be combined to a single function now since the
> > > only difference is s/true/false.
> > >
> > > Alex
> >
> > This is the way it was in the past, and it turned out to be very
> > confusing to read since both variables: mask and assign are bool but
> > polarity is reversed. Unrolled it seems easier to grok.
>
> You could always keep the functions as separate wrapper callers of the
> common function so you only need to keep true = unset, false = set
> straight in one place. Thanks,
>
> Alex

Wrappers still make this confusing. We had so many bugs here, I feel minor duplication is worth it.

--
MST
[patch uq/master 3/8] Expose thread_id in info cpus
commit ce6325ff1af34dbaee91c8d28e792277e43f1227
Author: Glauber Costa gco...@redhat.com
Date:   Wed Mar 5 17:01:10 2008 -0300

    Augment info cpus

    This patch exposes the thread id associated with each
    cpu through the already well known 'info cpus' interface.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu/cpu-defs.h
===
--- qemu.orig/cpu-defs.h
+++ qemu/cpu-defs.h
@@ -197,6 +197,7 @@ typedef struct CPUWatchpoint {
     int nr_cores;  /* number of cores within this CPU package */        \
     int nr_threads;/* number of threads within this CPU */              \
     int running; /* Nonzero if cpu is currently running(usermode).  */  \
+    int thread_id;                                                      \
     /* user data */                                                     \
     void *opaque;                                                       \
                                                                         \
Index: qemu/cpus.c
===
--- qemu.orig/cpus.c
+++ qemu/cpus.c
@@ -539,6 +539,7 @@ static void *kvm_cpu_thread_fn(void *arg
 
     qemu_mutex_lock(&qemu_global_mutex);
     qemu_thread_self(env->thread);
+    env->thread_id = get_thread_id();
     if (kvm_enabled())
         kvm_init_vcpu(env);
 
@@ -578,6 +579,10 @@ static void *tcg_cpu_thread_fn(void *arg
     while (!qemu_system_ready)
         qemu_cond_timedwait(&qemu_system_cond, &qemu_global_mutex, 100);
 
+    for (env = first_cpu; env != NULL; env = env->next_cpu) {
+        env->thread_id = get_thread_id();
+    }
+
     while (1) {
         cpu_exec_all();
         qemu_tcg_wait_io_event();
Index: qemu/exec.c
===
--- qemu.orig/exec.c
+++ qemu/exec.c
@@ -637,6 +637,7 @@ void cpu_exec_init(CPUState *env)
     env->numa_node = 0;
     QTAILQ_INIT(&env->breakpoints);
     QTAILQ_INIT(&env->watchpoints);
+    env->thread_id = get_thread_id();
     *penv = env;
 #if defined(CONFIG_USER_ONLY)
     cpu_list_unlock();
Index: qemu/osdep.c
===
--- qemu.orig/osdep.c
+++ qemu/osdep.c
@@ -44,6 +44,10 @@
 extern int madvise(caddr_t, size_t, int);
 #endif
 
+#ifdef CONFIG_LINUX
+#include <sys/syscall.h>
+#endif
+
 #ifdef CONFIG_EVENTFD
 #include <sys/eventfd.h>
 #endif
@@ -200,6 +204,17 @@ int qemu_create_pidfile(const char *file
     return 0;
 }
 
+int get_thread_id(void)
+{
+#if defined (_WIN32)
+    return GetCurrentThreadId();
+#elif defined (__linux__)
+    return syscall(SYS_gettid);
+#else
+    return getpid();
+#endif
+}
+
 #ifdef _WIN32
 
 /* mingw32 needs ffs for compilations without optimization. */
Index: qemu/osdep.h
===
--- qemu.orig/osdep.h
+++ qemu/osdep.h
@@ -126,6 +126,7 @@ void qemu_vfree(void *ptr);
 int qemu_madvise(void *addr, size_t len, int advice);
 
 int qemu_create_pidfile(const char *filename);
+int get_thread_id(void);
 
 #ifdef _WIN32
 int ffs(int i);
Index: qemu/monitor.c
===
--- qemu.orig/monitor.c
+++ qemu/monitor.c
@@ -878,6 +878,9 @@ static void print_cpu_iter(QObject *obj,
         monitor_printf(mon, " (halted)");
     }
 
+    monitor_printf(mon, " thread_id=%" PRId64 " ",
+                   qdict_get_int(cpu, "thread_id"));
+
     monitor_printf(mon, "\n");
 }
 
@@ -922,6 +925,7 @@ static void do_info_cpus(Monitor *mon, Q
 #elif defined(TARGET_MIPS)
         qdict_put(cpu, "PC", qint_from_int(env->active_tc.PC));
 #endif
+        qdict_put(cpu, "thread_id", qint_from_int(env->thread_id));
 
         qlist_append(cpu_list, cpu);
     }
[patch uq/master 6/8] Add RAM - physical addr mapping in MCE simulation
From: Huang Ying ying.hu...@intel.com

In QEMU-KVM, physical address != RAM address. While MCE simulation needs
physical address instead of RAM address. So
kvm_physical_memory_addr_from_ram() is implemented to do the conversion,
and it is invoked before being filled in the IA32_MCi_ADDR MSR.

Reported-by: Dean Nelson dnel...@redhat.com
Signed-off-by: Huang Ying ying.hu...@intel.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu/kvm-all.c
===
--- qemu.orig/kvm-all.c
+++ qemu/kvm-all.c
@@ -137,6 +137,24 @@ static KVMSlot *kvm_lookup_overlapping_s
     return found;
 }
 
+int kvm_physical_memory_addr_from_ram(KVMState *s, ram_addr_t ram_addr,
+                                      target_phys_addr_t *phys_addr)
+{
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
+        KVMSlot *mem = &s->slots[i];
+
+        if (ram_addr >= mem->phys_offset &&
+            ram_addr < mem->phys_offset + mem->memory_size) {
+            *phys_addr = mem->start_addr + (ram_addr - mem->phys_offset);
+            return 1;
+        }
+    }
+
+    return 0;
+}
+
 static int kvm_set_user_memory_region(KVMState *s, KVMSlot *slot)
 {
     struct kvm_userspace_memory_region mem;
Index: qemu/kvm.h
===
--- qemu.orig/kvm.h
+++ qemu/kvm.h
@@ -174,6 +174,9 @@ static inline void cpu_synchronize_post_
     }
 }
 
+int kvm_physical_memory_addr_from_ram(KVMState *s, ram_addr_t ram_addr,
+                                      target_phys_addr_t *phys_addr);
+
 #endif
 int kvm_set_ioeventfd_mmio_long(int fd, uint32_t adr, uint32_t val, bool assign);
 
[patch uq/master 8/8] Add savevm/loadvm support for MCE
Port qemu-kvm's

commit 1bab5d11545d8de5facf46c28630085a2f9651ae
Author: Huang Ying ying.hu...@intel.com
Date:   Wed Mar 3 16:52:46 2010 +0800

    Add savevm/loadvm support for MCE

    MCE registers are saved/load into/from CPUState in
    kvm_arch_save/load_regs. To simulate the MCG_STATUS clearing upon
    reset, MSR_MCG_STATUS is set to 0 for KVM_PUT_RESET_STATE.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu/target-i386/kvm.c
===
--- qemu.orig/target-i386/kvm.c
+++ qemu/target-i386/kvm.c
@@ -774,7 +774,7 @@ static int kvm_put_msrs(CPUState *env, i
         struct kvm_msr_entry entries[100];
     } msr_data;
     struct kvm_msr_entry *msrs = msr_data.entries;
-    int n = 0;
+    int i, n = 0;
 
     kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_CS, env->sysenter_cs);
     kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_ESP, env->sysenter_esp);
@@ -794,6 +794,18 @@ static int kvm_put_msrs(CPUState *env, i
                           env->system_time_msr);
         kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
     }
+#ifdef KVM_CAP_MCE
+    if (env->mcg_cap) {
+        if (level == KVM_PUT_RESET_STATE)
+            kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
+        else if (level == KVM_PUT_FULL_STATE) {
+            kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
+            kvm_msr_entry_set(&msrs[n++], MSR_MCG_CTL, env->mcg_ctl);
+            for (i = 0; i < (env->mcg_cap & 0xff) * 4; i++)
+                kvm_msr_entry_set(&msrs[n++], MSR_MC0_CTL + i, env->mce_banks[i]);
+        }
+    }
+#endif
 
     msr_data.info.nmsrs = n;
 
@@ -1001,6 +1013,15 @@ static int kvm_get_msrs(CPUState *env)
     msrs[n++].index = MSR_KVM_SYSTEM_TIME;
     msrs[n++].index = MSR_KVM_WALL_CLOCK;
 
+#ifdef KVM_CAP_MCE
+    if (env->mcg_cap) {
+        msrs[n++].index = MSR_MCG_STATUS;
+        msrs[n++].index = MSR_MCG_CTL;
+        for (i = 0; i < (env->mcg_cap & 0xff) * 4; i++)
+            msrs[n++].index = MSR_MC0_CTL + i;
+    }
+#endif
+
     msr_data.info.nmsrs = n;
     ret = kvm_vcpu_ioctl(env, KVM_GET_MSRS, &msr_data);
     if (ret < 0)
@@ -1043,6 +1064,22 @@ static int kvm_get_msrs(CPUState *env)
         case MSR_KVM_WALL_CLOCK:
             env->wall_clock_msr = msrs[i].data;
             break;
+#ifdef KVM_CAP_MCE
+        case MSR_MCG_STATUS:
+            env->mcg_status = msrs[i].data;
+            break;
+        case MSR_MCG_CTL:
+            env->mcg_ctl = msrs[i].data;
+            break;
+#endif
+        default:
+#ifdef KVM_CAP_MCE
+            if (msrs[i].index >= MSR_MC0_CTL &&
+                msrs[i].index < MSR_MC0_CTL + (env->mcg_cap & 0xff) * 4) {
+                env->mce_banks[msrs[i].index - MSR_MC0_CTL] = msrs[i].data;
+                break;
+            }
+#endif
         }
     }
 
[patch uq/master 5/8] Export qemu_ram_addr_from_host
To be used by next patches.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu/cpu-common.h
===
--- qemu.orig/cpu-common.h
+++ qemu/cpu-common.h
@@ -47,7 +47,8 @@ void qemu_ram_free(ram_addr_t addr);
 /* This should only be used for ram local to a device.  */
 void *qemu_get_ram_ptr(ram_addr_t addr);
 /* This should not be used by devices.  */
-ram_addr_t qemu_ram_addr_from_host(void *ptr);
+int qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr);
+ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr);
 
 int cpu_register_io_memory(CPUReadMemoryFunc * const *mem_read,
                            CPUWriteMemoryFunc * const *mem_write,
Index: qemu/exec.c
===
--- qemu.orig/exec.c
+++ qemu/exec.c
@@ -2086,7 +2086,7 @@ static inline void tlb_update_dirty(CPUT
     if ((tlb_entry->addr_write & ~TARGET_PAGE_MASK) == IO_MEM_RAM) {
         p = (void *)(unsigned long)((tlb_entry->addr_write & TARGET_PAGE_MASK)
             + tlb_entry->addend);
-        ram_addr = qemu_ram_addr_from_host(p);
+        ram_addr = qemu_ram_addr_from_host_nofail(p);
         if (!cpu_physical_memory_is_dirty(ram_addr)) {
             tlb_entry->addr_write |= TLB_NOTDIRTY;
         }
@@ -2938,23 +2938,31 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
     return NULL;
 }
 
-/* Some of the softmmu routines need to translate from a host pointer
-   (typically a TLB entry) back to a ram offset.  */
-ram_addr_t qemu_ram_addr_from_host(void *ptr)
+int qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
 {
     RAMBlock *block;
     uint8_t *host = ptr;
 
     QLIST_FOREACH(block, &ram_list.blocks, next) {
         if (host - block->host < block->length) {
-            return block->offset + (host - block->host);
+            *ram_addr = block->offset + (host - block->host);
+            return 0;
         }
     }
+    return -1;
+}
 
-    fprintf(stderr, "Bad ram pointer %p\n", ptr);
-    abort();
+/* Some of the softmmu routines need to translate from a host pointer
+   (typically a TLB entry) back to a ram offset.  */
+ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr)
+{
+    ram_addr_t ram_addr;
 
-    return 0;
+    if (qemu_ram_addr_from_host(ptr, &ram_addr)) {
+        fprintf(stderr, "Bad ram pointer %p\n", ptr);
+        abort();
+    }
+    return ram_addr;
 }
 
 static uint32_t unassigned_mem_readb(void *opaque, target_phys_addr_t addr)
@@ -3703,7 +3711,7 @@ void cpu_physical_memory_unmap(void *buf
 {
     if (buffer != bounce.buffer) {
         if (is_write) {
-            ram_addr_t addr1 = qemu_ram_addr_from_host(buffer);
+            ram_addr_t addr1 = qemu_ram_addr_from_host_nofail(buffer);
             while (access_len) {
                 unsigned l;
                 l = TARGET_PAGE_SIZE;
Index: qemu/exec-all.h
===
--- qemu.orig/exec-all.h
+++ qemu/exec-all.h
@@ -334,7 +334,7 @@ static inline tb_page_addr_t get_page_ad
     }
     p = (void *)(unsigned long)addr
         + env1->tlb_table[mmu_idx][page_index].addend;
-    return qemu_ram_addr_from_host(p);
+    return qemu_ram_addr_from_host_nofail(p);
 }
 #endif
 
Re: [PATCH] Fix calculation of number of entries based on number of mce_banks
On Wed, Oct 06, 2010 at 10:08:19AM -0400, Dean Nelson wrote:
> The number of mce_banks needs to be multiplied by 4 in order to
> actually reference all of the entries.
>
> Signed-off-by: Dean Nelson dnel...@redhat.com

Applied, thanks.
[patch uq/master 2/8] iothread: use signalfd
Block SIGALRM, SIGIO and consume them via signalfd.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu/cpus.c
===
--- qemu.orig/cpus.c
+++ qemu/cpus.c
@@ -33,6 +33,7 @@
 #include "exec-all.h"
 
 #include "cpus.h"
+#include "compatfd.h"
 
 #ifdef SIGRTMIN
 #define SIG_IPI (SIGRTMIN+4)
@@ -329,14 +330,75 @@ static QemuCond qemu_work_cond;
 
 static void tcg_init_ipi(void);
 static void kvm_init_ipi(CPUState *env);
-static void unblock_io_signals(void);
+static sigset_t block_io_signals(void);
+
+/* If we have signalfd, we mask out the signals we want to handle and then
+ * use signalfd to listen for them.  We rely on whatever the current signal
+ * handler is to dispatch the signals when we receive them.
+ */
+static void sigfd_handler(void *opaque)
+{
+    int fd = (unsigned long) opaque;
+    struct qemu_signalfd_siginfo info;
+    struct sigaction action;
+    ssize_t len;
+
+    while (1) {
+        do {
+            len = read(fd, &info, sizeof(info));
+        } while (len == -1 && errno == EINTR);
+
+        if (len == -1 && errno == EAGAIN) {
+            break;
+        }
+
+        if (len != sizeof(info)) {
+            printf("read from sigfd returned %zd: %m\n", len);
+            return;
+        }
+
+        sigaction(info.ssi_signo, NULL, &action);
+        if ((action.sa_flags & SA_SIGINFO) && action.sa_sigaction) {
+            action.sa_sigaction(info.ssi_signo,
+                                (siginfo_t *)&info, NULL);
+        } else if (action.sa_handler) {
+            action.sa_handler(info.ssi_signo);
+        }
+    }
+}
+
+static int qemu_signalfd_init(sigset_t mask)
+{
+    int sigfd;
+
+    sigfd = qemu_signalfd(&mask);
+    if (sigfd == -1) {
+        fprintf(stderr, "failed to create signalfd\n");
+        return -errno;
+    }
+
+    fcntl_setfl(sigfd, O_NONBLOCK);
+
+    qemu_set_fd_handler2(sigfd, NULL, sigfd_handler, NULL,
+                         (void *)(unsigned long) sigfd);
+
+    return 0;
+}
 
 int qemu_init_main_loop(void)
 {
     int ret;
+    sigset_t blocked_signals;
 
     cpu_set_debug_excp_handler(cpu_debug_handler);
 
+    blocked_signals = block_io_signals();
+
+    ret = qemu_signalfd_init(blocked_signals);
+    if (ret)
+        return ret;
+
+    /* Note eventfd must be drained before signalfd handlers run */
     ret = qemu_event_init();
     if (ret)
         return ret;
@@ -347,7 +409,6 @@ int qemu_init_main_loop(void)
     qemu_mutex_init(&qemu_global_mutex);
     qemu_mutex_lock(&qemu_global_mutex);
 
-    unblock_io_signals();
     qemu_thread_self(&io_thread);
 
     return 0;
@@ -586,19 +647,22 @@ static void kvm_init_ipi(CPUState *env)
     }
 }
 
-static void unblock_io_signals(void)
+static sigset_t block_io_signals(void)
 {
     sigset_t set;
 
+    /* SIGUSR2 used by posix-aio-compat.c */
     sigemptyset(&set);
     sigaddset(&set, SIGUSR2);
-    sigaddset(&set, SIGIO);
-    sigaddset(&set, SIGALRM);
     pthread_sigmask(SIG_UNBLOCK, &set, NULL);
 
     sigemptyset(&set);
+    sigaddset(&set, SIGIO);
+    sigaddset(&set, SIGALRM);
     sigaddset(&set, SIG_IPI);
     pthread_sigmask(SIG_BLOCK, &set, NULL);
+
+    return set;
 }
 
 void qemu_mutex_lock_iothread(void)
[patch uq/master 1/8] signalfd compatibility
Port qemu-kvm's signalfd compat code.

commit 5a7fdd0abd7cd24dac205317a4195446ab8748b5
Author: Anthony Liguori aligu...@us.ibm.com
Date:   Wed May 7 11:55:47 2008 -0500

    Use signalfd() in io-thread

    This patch reworks the IO thread to use signalfd() instead of
    sigtimedwait(). This will eliminate the need to use SIGIO everywhere.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu/compatfd.c
===
--- /dev/null
+++ qemu/compatfd.c
@@ -0,0 +1,117 @@
+/*
+ * signalfd/eventfd compatibility
+ *
+ * Copyright IBM, Corp. 2008
+ *
+ * Authors:
+ *  Anthony Liguori aligu...@us.ibm.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include "compatfd.h"
+
+#include <sys/syscall.h>
+#include <pthread.h>
+
+struct sigfd_compat_info
+{
+    sigset_t mask;
+    int fd;
+};
+
+static void *sigwait_compat(void *opaque)
+{
+    struct sigfd_compat_info *info = opaque;
+    int err;
+    sigset_t all;
+
+    sigfillset(&all);
+    sigprocmask(SIG_BLOCK, &all, NULL);
+
+    do {
+        siginfo_t siginfo;
+
+        err = sigwaitinfo(&info->mask, &siginfo);
+        if (err == -1 && errno == EINTR) {
+            err = 0;
+            continue;
+        }
+
+        if (err > 0) {
+            char buffer[128];
+            size_t offset = 0;
+
+            memcpy(buffer, &err, sizeof(err));
+            while (offset < sizeof(buffer)) {
+                ssize_t len;
+
+                len = write(info->fd, buffer + offset,
+                            sizeof(buffer) - offset);
+                if (len == -1 && errno == EINTR)
+                    continue;
+
+                if (len <= 0) {
+                    err = -1;
+                    break;
+                }
+
+                offset += len;
+            }
+        }
+    } while (err >= 0);
+
+    return NULL;
+}
+
+static int qemu_signalfd_compat(const sigset_t *mask)
+{
+    pthread_attr_t attr;
+    pthread_t tid;
+    struct sigfd_compat_info *info;
+    int fds[2];
+
+    info = malloc(sizeof(*info));
+    if (info == NULL) {
+        errno = ENOMEM;
+        return -1;
+    }
+
+    if (pipe(fds) == -1) {
+        free(info);
+        return -1;
+    }
+
+    qemu_set_cloexec(fds[0]);
+    qemu_set_cloexec(fds[1]);
+
+    memcpy(&info->mask, mask, sizeof(*mask));
+    info->fd = fds[1];
+
+    pthread_attr_init(&attr);
+    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
+
+    pthread_create(&tid, &attr, sigwait_compat, info);
+
+    pthread_attr_destroy(&attr);
+
+    return fds[0];
+}
+
+int qemu_signalfd(const sigset_t *mask)
+{
+#if defined(CONFIG_SIGNALFD)
+    int ret;
+
+    ret = syscall(SYS_signalfd, -1, mask, _NSIG / 8);
+    if (ret != -1) {
+        qemu_set_cloexec(ret);
+        return ret;
+    }
+#endif
+
+    return qemu_signalfd_compat(mask);
+}
Index: qemu/compatfd.h
===
--- /dev/null
+++ qemu/compatfd.h
@@ -0,0 +1,43 @@
+/*
+ * signalfd/eventfd compatibility
+ *
+ * Copyright IBM, Corp. 2008
+ *
+ * Authors:
+ *  Anthony Liguori aligu...@us.ibm.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_COMPATFD_H
+#define QEMU_COMPATFD_H
+
+#include <signal.h>
+
+struct qemu_signalfd_siginfo {
+    uint32_t ssi_signo;   /* Signal number */
+    int32_t  ssi_errno;   /* Error number (unused) */
+    int32_t  ssi_code;    /* Signal code */
+    uint32_t ssi_pid;     /* PID of sender */
+    uint32_t ssi_uid;     /* Real UID of sender */
+    int32_t  ssi_fd;      /* File descriptor (SIGIO) */
+    uint32_t ssi_tid;     /* Kernel timer ID (POSIX timers) */
+    uint32_t ssi_band;    /* Band event (SIGIO) */
+    uint32_t ssi_overrun; /* POSIX timer overrun count */
+    uint32_t ssi_trapno;  /* Trap number that caused signal */
+    int32_t  ssi_status;  /* Exit status or signal (SIGCHLD) */
+    int32_t  ssi_int;     /* Integer sent by sigqueue(2) */
+    uint64_t ssi_ptr;     /* Pointer sent by sigqueue(2) */
+    uint64_t ssi_utime;   /* User CPU time consumed (SIGCHLD) */
+    uint64_t ssi_stime;   /* System CPU time consumed (SIGCHLD) */
+    uint64_t ssi_addr;    /* Address that generated signal
+                             (for hardware-generated signals) */
+    uint8_t  pad[48];     /* Pad size to 128 bytes (allow for
+                             additional fields in the future) */
+};
+
+int qemu_signalfd(const sigset_t *mask);
+
+#endif
Index: qemu/Makefile.objs
===
--- qemu.orig/Makefile.objs
+++ qemu/Makefile.objs
@@ -121,6 +121,7 @@ common-obj-y += $(addprefix ui/, $(ui-ob
 
 common-obj-y += iov.o acl.o
 common-obj-$(CONFIG_THREAD) += qemu-thread.o
+common-obj-$(CONFIG_IOTHREAD) += compatfd.o
 common-obj-y +=
[patch uq/master 0/8] port qemu-kvm's MCE support (v2)
Port qemu-kvm's KVM MCE (Machine Check Exception) handling to qemu. It allows qemu to propagate MCEs to the guest.

v2:
- rename do_qemu_ram_addr_from_host.
- fix kvm_on_sigbus/kvm_on_sigbus_vcpu naming.
- fix bank register restoration (Dean Nelson).
[patch uq/master 4/8] kvm: x86: add mce support
Port qemu-kvm's MCE support

commit c68b2374c9048812f488e00ffb95db66c0bc07a7
Author: Huang Ying ying.hu...@intel.com
Date:   Mon Jul 20 10:00:53 2009 +0800

    Add MCE simulation support to qemu/kvm

    KVM ioctls are used to initialize MCE simulation and inject MCE. The
    real MCE simulation is implemented in Linux kernel. The Kernel part
    has been merged.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu/target-i386/helper.c
===
--- qemu.orig/target-i386/helper.c
+++ qemu/target-i386/helper.c
@@ -27,6 +27,7 @@
 #include "exec-all.h"
 #include "qemu-common.h"
 #include "kvm.h"
+#include "kvm_x86.h"
 
 //#define DEBUG_MMU
 
@@ -1030,6 +1031,11 @@ void cpu_inject_x86_mce(CPUState *cenv,
     if (bank >= bank_num || !(status & MCI_STATUS_VAL))
         return;
 
+    if (kvm_enabled()) {
+        kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc);
+        return;
+    }
+
     /*
      * if MSR_MCG_CTL is not all 1s, the uncorrected error
      * reporting is disabled
Index: qemu/target-i386/kvm.c
===
--- qemu.orig/target-i386/kvm.c
+++ qemu/target-i386/kvm.c
@@ -27,6 +27,7 @@
 #include "hw/pc.h"
 #include "hw/apic.h"
 #include "ioport.h"
+#include "kvm_x86.h"
 
 #ifdef CONFIG_KVM_PARA
 #include <linux/kvm_para.h>
@@ -167,6 +168,67 @@ static int get_para_features(CPUState *e
 }
 #endif
 
+#ifdef KVM_CAP_MCE
+static int kvm_get_mce_cap_supported(KVMState *s, uint64_t *mce_cap,
+                                     int *max_banks)
+{
+    int r;
+
+    r = kvm_ioctl(s, KVM_CHECK_EXTENSION, KVM_CAP_MCE);
+    if (r > 0) {
+        *max_banks = r;
+        return kvm_ioctl(s, KVM_X86_GET_MCE_CAP_SUPPORTED, mce_cap);
+    }
+    return -ENOSYS;
+}
+
+static int kvm_setup_mce(CPUState *env, uint64_t *mcg_cap)
+{
+    return kvm_vcpu_ioctl(env, KVM_X86_SETUP_MCE, mcg_cap);
+}
+
+static int kvm_set_mce(CPUState *env, struct kvm_x86_mce *m)
+{
+    return kvm_vcpu_ioctl(env, KVM_X86_SET_MCE, m);
+}
+
+struct kvm_x86_mce_data
+{
+    CPUState *env;
+    struct kvm_x86_mce *mce;
+};
+
+static void kvm_do_inject_x86_mce(void *_data)
+{
+    struct kvm_x86_mce_data *data = _data;
+    int r;
+
+    r = kvm_set_mce(data->env, data->mce);
+    if (r < 0)
+        perror("kvm_set_mce FAILED");
+}
+#endif
+
+void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
+                        uint64_t mcg_status, uint64_t addr, uint64_t misc)
+{
+#ifdef KVM_CAP_MCE
+    struct kvm_x86_mce mce = {
+        .bank = bank,
+        .status = status,
+        .mcg_status = mcg_status,
+        .addr = addr,
+        .misc = misc,
+    };
+    struct kvm_x86_mce_data data = {
+        .env = cenv,
+        .mce = &mce,
+    };
+
+    run_on_cpu(cenv, kvm_do_inject_x86_mce, &data);
+#endif
+}
+
 int kvm_arch_init_vcpu(CPUState *env)
 {
     struct {
@@ -274,6 +336,28 @@ int kvm_arch_init_vcpu(CPUState *env)
 
     cpuid_data.cpuid.nent = cpuid_i;
 
+#ifdef KVM_CAP_MCE
+    if (((env->cpuid_version >> 8)&0xF) >= 6
+        && (env->cpuid_features&(CPUID_MCE|CPUID_MCA)) == (CPUID_MCE|CPUID_MCA)
+        && kvm_check_extension(env->kvm_state, KVM_CAP_MCE) > 0) {
+        uint64_t mcg_cap;
+        int banks;
+
+        if (kvm_get_mce_cap_supported(env->kvm_state, &mcg_cap, &banks))
+            perror("kvm_get_mce_cap_supported FAILED");
+        else {
+            if (banks > MCE_BANKS_DEF)
+                banks = MCE_BANKS_DEF;
+            mcg_cap &= MCE_CAP_DEF;
+            mcg_cap |= banks;
+            if (kvm_setup_mce(env, &mcg_cap))
+                perror("kvm_setup_mce FAILED");
+            else
+                env->mcg_cap = mcg_cap;
+        }
+    }
+#endif
+
     return kvm_vcpu_ioctl(env, KVM_SET_CPUID2, &cpuid_data);
 }
 
Index: qemu/target-i386/kvm_x86.h
===
--- /dev/null
+++ qemu/target-i386/kvm_x86.h
@@ -0,0 +1,21 @@
+/*
+ * QEMU KVM support
+ *
+ * Copyright (C) 2009 Red Hat Inc.
+ * Copyright IBM, Corp. 2008
+ *
+ * Authors:
+ *  Anthony Liguori aligu...@us.ibm.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef __KVM_X86_H__
+#define __KVM_X86_H__
+
+void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
+                        uint64_t mcg_status, uint64_t addr, uint64_t misc);
+
+#endif
[patch uq/master 7/8] MCE: Relay UCR MCE to guest
Port qemu-kvm's

commit 4b62fff1101a7ad77553147717a8bd3bf79df7ef
Author: Huang Ying ying.hu...@intel.com
Date:   Mon Sep 21 10:43:25 2009 +0800

    MCE: Relay UCR MCE to guest

    UCR (uncorrected recovery) MCE is supported in recent Intel CPUs,
    where some hardware error such as some memory error can be reported
    without PCC (processor context corrupted). To recover from such MCE,
    the corresponding memory will be unmapped, and all processes accessing
    the memory will be killed via SIGBUS.

    For KVM, if QEMU/KVM is killed, all guest processes will be killed
    too. So we relay SIGBUS from host OS to guest system via a UCR MCE
    injection. Then guest OS can isolate corresponding memory and kill
    necessary guest processes only. SIGBUS sent to main thread (not VCPU
    threads) will be broadcast to all VCPU threads as UCR MCE.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu/cpus.c
===
--- qemu.orig/cpus.c
+++ qemu/cpus.c
@@ -34,6 +34,10 @@
 
 #include "cpus.h"
 #include "compatfd.h"
+#ifdef CONFIG_LINUX
+#include <sys/prctl.h>
+#include <sys/signalfd.h>
+#endif
 
 #ifdef SIGRTMIN
 #define SIG_IPI (SIGRTMIN+4)
@@ -41,6 +45,10 @@
 #define SIG_IPI SIGUSR1
 #endif
 
+#ifndef PR_MCE_KILL
+#define PR_MCE_KILL 33
+#endif
+
 static CPUState *next_cpu;
 
 /***/
@@ -498,28 +506,77 @@ static void qemu_tcg_wait_io_event(void)
     }
 }
 
+static void sigbus_reraise(void)
+{
+    sigset_t set;
+    struct sigaction action;
+
+    memset(&action, 0, sizeof(action));
+    action.sa_handler = SIG_DFL;
+    if (!sigaction(SIGBUS, &action, NULL)) {
+        raise(SIGBUS);
+        sigemptyset(&set);
+        sigaddset(&set, SIGBUS);
+        sigprocmask(SIG_UNBLOCK, &set, NULL);
+    }
+    perror("Failed to re-raise SIGBUS!\n");
+    abort();
+}
+
+static void sigbus_handler(int n, struct qemu_signalfd_siginfo *siginfo,
+                           void *ctx)
+{
+#if defined(TARGET_I386)
+    if (kvm_on_sigbus(siginfo->ssi_code, (void *)(intptr_t)siginfo->ssi_addr))
+#endif
+        sigbus_reraise();
+}
+
 static void qemu_kvm_eat_signal(CPUState *env, int timeout)
 {
     struct timespec ts;
     int r, e;
     siginfo_t siginfo;
     sigset_t waitset;
+    sigset_t chkset;
 
     ts.tv_sec = timeout / 1000;
     ts.tv_nsec = (timeout % 1000) * 1000000;
 
     sigemptyset(&waitset);
     sigaddset(&waitset, SIG_IPI);
+    sigaddset(&waitset, SIGBUS);
 
-    qemu_mutex_unlock(&qemu_global_mutex);
-    r = sigtimedwait(&waitset, &siginfo, &ts);
-    e = errno;
-    qemu_mutex_lock(&qemu_global_mutex);
+    do {
+        qemu_mutex_unlock(&qemu_global_mutex);
 
-    if (r == -1 && !(e == EAGAIN || e == EINTR)) {
-        fprintf(stderr, "sigtimedwait: %s\n", strerror(e));
-        exit(1);
-    }
+        r = sigtimedwait(&waitset, &siginfo, &ts);
+        e = errno;
+
+        qemu_mutex_lock(&qemu_global_mutex);
+
+        if (r == -1 && !(e == EAGAIN || e == EINTR)) {
+            fprintf(stderr, "sigtimedwait: %s\n", strerror(e));
+            exit(1);
+        }
+
+        switch (r) {
+        case SIGBUS:
+#ifdef TARGET_I386
+            if (kvm_on_sigbus_vcpu(env, siginfo.si_code, siginfo.si_addr))
+#endif
+                sigbus_reraise();
+            break;
+        default:
+            break;
+        }
+
+        r = sigpending(&chkset);
+        if (r == -1) {
+            fprintf(stderr, "sigpending: %s\n", strerror(e));
+            exit(1);
+        }
+    } while (sigismember(&chkset, SIG_IPI) || sigismember(&chkset, SIGBUS));
 }
 
 static void qemu_kvm_wait_io_event(CPUState *env)
@@ -645,6 +702,7 @@ static void kvm_init_ipi(CPUState *env)
 
     pthread_sigmask(SIG_BLOCK, NULL, &set);
     sigdelset(&set, SIG_IPI);
+    sigdelset(&set, SIGBUS);
     r = kvm_set_signal_mask(env, &set);
     if (r) {
         fprintf(stderr, "kvm_set_signal_mask: %s\n", strerror(r));
@@ -655,6 +713,7 @@ static void kvm_init_ipi(CPUState *env)
 static sigset_t block_io_signals(void)
 {
     sigset_t set;
+    struct sigaction action;
 
     /* SIGUSR2 used by posix-aio-compat.c */
     sigemptyset(&set);
@@ -665,8 +724,15 @@ static sigset_t block_io_signals(void)
     sigaddset(&set, SIGIO);
     sigaddset(&set, SIGALRM);
     sigaddset(&set, SIG_IPI);
+    sigaddset(&set, SIGBUS);
     pthread_sigmask(SIG_BLOCK, &set, NULL);
 
+    memset(&action, 0, sizeof(action));
+    action.sa_flags = SA_SIGINFO;
+    action.sa_sigaction = (void (*)(int, siginfo_t*, void*))sigbus_handler;
+    sigaction(SIGBUS, &action, NULL);
+    prctl(PR_MCE_KILL, 1, 1, 0, 0);
+
     return set;
 }
 
Index: qemu/kvm.h
===
--- qemu.orig/kvm.h
+++ qemu/kvm.h
@@ -110,6 +110,9 @@ int kvm_arch_init_vcpu(CPUState *env);
 
 void kvm_arch_reset_vcpu(CPUState *env);
 
+int kvm_on_sigbus_vcpu(CPUState *env, int code, void *addr);
+int kvm_on_sigbus(int code, void *addr);
+
 struct
Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net
Michael S. Tsirkin m...@redhat.com wrote on 10/05/2010 11:53:23 PM:

Any idea where does this come from? Do you see more TX interrupts? RX interrupts? Exits? Do interrupts bounce more between guest CPUs?

4. Identify reasons for single netperf BW regression.

After testing various combinations of #txqs, #vhosts, #netperf sessions, I think the drop for 1 stream is due to TX and RX for a flow being processed on different cpus.

Right. Can we fix it?

I am not sure how to. My initial patch had one thread but gave small gains and ran into limitations once number of sessions became large.

I did two more tests:
1. Pin vhosts to same CPU:
   - BW drop is much lower for 1 stream case (-5 to -8% range)
   - But performance is not so high for more sessions.
2. Changed vhost to be single threaded:
   - No degradation for 1 session, and improvement for up to 8, sometimes 16 streams (5-12%).
   - BW degrades after that, all the way till 128 netperf sessions.
   - But overall CPU utilization improves.
   Summary of the entire run (for 1-128 sessions):
       txq=4:  BW: (-2.3) CPU: (-16.5) RCPU: (-5.3)
       txq=16: BW: (-1.9) CPU: (-24.9) RCPU: (-9.6)

I don't see any reasons mentioned above. However, for higher number of netperf sessions, I see a big increase in retransmissions:

Hmm, ok, and do you see any errors?

I haven't seen any in any statistics, messages, etc. Also no retransmissions for txq=1.

Single netperf case didn't have any retransmissions so that is not the cause for drop. I tested ixgbe (MQ):

___________________________________________________________
#netperf      ixgbe            ixgbe (pin intrs to cpu#0 on
                               both server/client)
              BW (#retr)       BW (#retr)
___________________________________________________________
1             3567 (117)       6000 (251)
2             4406 (477)       6298 (725)
4             6119 (1085)      7208 (3387)
8             6595 (4276)      7381 (15296)
16            6651 (11651)     6856 (30394)

Interesting. You are saying we get much more retransmissions with physical nic as well?

Yes, with ixgbe. I re-ran with 16 netperfs running for 15 secs on both ixgbe and cxgb3 just now to reconfirm:

ixgbe: BW: 6186.85 SD/Remote: 135.711, 339.376 CPU/Remote: 79.99, 200.00, Retrans: 545
cxgb3: BW: 8051.07 SD/Remote: 144.416, 260.487 CPU/Remote: 110.88, 200.00, Retrans: 0

However 64 netperfs for 30 secs gave:

ixgbe: BW: 6691.12 SD/Remote: 8046.617, 5259.992 CPU/Remote: 1223.86, 799.97, Retrans: 1424
cxgb3: BW: 7799.16 SD/Remote: 2589.875, 4317.013 CPU/Remote: 480.39, 800.64, Retrans: 649

# ethtool -i eth4
driver: ixgbe
version: 2.0.84-k2
firmware-version: 0.9-3
bus-info: :1f:00.1

# ifconfig output:
RX packets:783241 errors:0 dropped:0 overruns:0 frame:0
TX packets:689533 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000

# lspci output:
1f:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit Network Connection (rev 01)
    Subsystem: Intel Corporation Ethernet Server Adapter X520-2
    Flags: bus master, fast devsel, latency 0, IRQ 30
    Memory at 9890 (64-bit, prefetchable) [size=512K]
    I/O ports at 2020 [size=32]
    Memory at 98a0 (64-bit, prefetchable) [size=16K]
    Capabilities: [40] Power Management version 3
    Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
    Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
    Capabilities: [a0] Express Endpoint, MSI 00
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [140] Device Serial Number 00-1b-21-ff-ff-40-4a-b4
    Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
    Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
    Kernel driver in use: ixgbe
    Kernel modules: ixgbe

I haven't done this right now since I don't have a setup. I guess it would be limited by wire speed and gains may not be there. I will try to do this later when I get the setup.

OK but at least need to check that it does not hurt things.

Yes, sure.

Summary:
1. Average BW increase for regular I/O is best for #txq=16 with the least CPU utilization increase.
2. The average BW for 512 byte I/O is best for lower #txq=2. For higher #txqs, BW increased only after a particular #netperf sessions - in my testing that limit was 32 netperf sessions.
3. Multiple txq for guest by itself doesn't seem to have any issues. Guest CPU% increase is slightly higher than BW improvement. I think it is true for all mq drivers since more paths run in parallel up to the device instead of sleeping and allowing one thread to send all packets via qdisc_restart.
4. Having high number of txqs gives better gains
Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net
On Wednesday 06 October 2010 19:14:42 Krishna Kumar2 wrote:

Arnd Bergmann a...@arndb.de wrote on 10/06/2010 05:49:00 PM:

I don't see any reasons mentioned above. However, for higher number of netperf sessions, I see a big increase in retransmissions:

___________________________________________
#netperf      ORG            NEW
              BW (#retr)     BW (#retr)
___________________________________________
1             70244 (0)      64102 (0)
4             21421 (0)      36570 (416)
8             21746 (0)      38604 (148)
16            21783 (0)      40632 (464)
32            22677 (0)      37163 (1053)
64            23648 (4)      36449 (2197)
128           23251 (2)      31676 (3185)
___________________________________________

This smells like it could be related to a problem that Ben Greear found recently (see macvlan: Enable qdisc backoff logic). When the hardware is busy, we used to just drop the packet. With Ben's patch, we return -EAGAIN to qemu (or vhost-net) to trigger a resend. I suppose what we really should do is feed that condition back to the guest network stack and implement the backoff in there.

Thanks for the pointer. I will take a look at this as I hadn't seen this patch earlier. Is there any way to figure out if this is the issue?

I think a good indication would be if this changes with/without the patch, and if you see -EAGAIN in qemu with the patch applied.

Arnd

--
To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
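The drop-versus-resend behaviour discussed in this message can be pictured with a small sketch. It is only an illustration under stated assumptions: fake_dev, fake_xmit and xmit_with_backoff are hypothetical names, not functions from macvlan, vhost-net or qemu, and the doubling delay is a generic choice rather than what Ben's patch actually does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device that reports "busy" for the first busy_left calls. */
struct fake_dev {
    int busy_left;   /* remaining calls that will fail with -EAGAIN */
    int sent;        /* packets accepted so far */
};

/* Hypothetical transmit hook: returns 0 on success, -EAGAIN when busy. */
static int fake_xmit(struct fake_dev *dev)
{
    if (dev->busy_left > 0) {
        dev->busy_left--;
        return -EAGAIN;
    }
    dev->sent++;
    return 0;
}

/*
 * Retry on -EAGAIN instead of dropping the packet. The delay doubles after
 * each busy result; it is only accumulated in *waited_us so the sketch stays
 * deterministic (a real caller would sleep or re-arm a timer instead).
 */
static int xmit_with_backoff(struct fake_dev *dev, int max_tries,
                             unsigned int *waited_us)
{
    unsigned int delay_us = 1;
    int ret = -EAGAIN;

    assert(dev != NULL && waited_us != NULL);
    *waited_us = 0;
    for (int i = 0; i < max_tries; i++) {
        ret = fake_xmit(dev);
        if (ret != -EAGAIN) {
            return ret;          /* success, or a hard error */
        }
        *waited_us += delay_us;  /* back off before the next attempt */
        delay_us *= 2;
    }
    return ret;                  /* still busy: the caller decides what to do */
}
```

Whether the retry loop belongs in the host or, as suggested in the message, in the guest network stack is exactly the open design question; the sketch takes no position on that.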
Re: [patch uq/master 7/8] MCE: Relay UCR MCE to guest
On 10/06/2010 11:05 AM, Marcelo Tosatti wrote:

On Wed, Oct 06, 2010 at 10:58:36AM +0900, Hidetoshi Seto wrote:

I got some more questions:

(2010/10/05 3:54), Marcelo Tosatti wrote:

Index: qemu/target-i386/cpu.h
===================================================================
--- qemu.orig/target-i386/cpu.h
+++ qemu/target-i386/cpu.h
@@ -250,16 +250,32 @@
 #define PG_ERROR_RSVD_MASK 0x08
 #define PG_ERROR_I_D_MASK  0x10
 
-#define MCG_CTL_P (1UL<<8)   /* MCG_CAP register available */
+#define MCG_CTL_P (1ULL<<8)   /* MCG_CAP register available */
+#define MCG_SER_P (1ULL<<24) /* MCA recovery/new status bits */
 
-#define MCE_CAP_DEF MCG_CTL_P
+#define MCE_CAP_DEF (MCG_CTL_P|MCG_SER_P)
 #define MCE_BANKS_DEF 10

It seems that current kvm doesn't support SER_P, so injecting SRAO to guest will mean that guest receives VAL|UC|!PCC and RIPV event from virtual processor that doesn't have SER_P.

Dean also noted this. I don't think it was a deliberate choice to not expose SER_P. Huang?

In my testing, I found that MCG_SER_P was not being set (and I was running on a Nehalem-EX system). Injecting an MCE resulted in the guest entering panic() from mce_panic(). If crash_kexec() finds a kexec_crash_image the system ends up rebooting; otherwise, what happens next requires operator intervention.

When I applied a patch to the guest's kernel which forces mce_ser to be set, as if MCG_SER_P was set (see __mcheck_cpu_cap_init()), I found that when the memory page was 'owned' by a guest process, the process would be killed (if the page was dirty), and the guest would stay running. The HWPoisoned page would be sidelined and not cause any more issues.

I think most OSes don't expect that they can receive an MCE with !PCC on a traditional x86 processor without SER_P.

Q1: Is it safe to expect that guests can handle such !PCC event?

This might be best answered by Huang, but as I mentioned above, without MCG_SER_P being set, the result was an orderly system panic on the guest.

Q2: What is the expected behavior on the guest?

I think I answered this above.

Q3: What happens if the guest reboots itself in response to the MCE?

That depends... And the following issue also holds for a guest that is rebooted at some point having successfully sidelined the bad page.

After the guest has panic'd, a system_reset of the guest or a restart initiated by crash_kexec() (called by panic() on the guest) usually results in the guest hanging because the bad page still belongs to qemu-kvm and is now being referenced by the new guest in some way. (It actually may not hang, but successfully reboot and be runnable, with the bad page lurking in the background. It all seems to depend on where the bad page ends up, and whether it's ever referenced.)

I believe there was an attempt to deal with this in kvm on the host. See kvm_handle_bad_page(). This function was supposed to result in the sending of a BUS_MCEERR_AR flavored SIGBUS by do_sigbus() to qemu-kvm, which in theory would result in the right thing happening. But commit 96054569190bdec375fe824e48ca1f4e3b53dd36 prevents the signal from being sent. So this mechanism needs to be re-worked, and the issue remains.

I would think that if the bad page can't be sidelined, such that the newly booting guest can't use it, then the new guest shouldn't be allowed to boot. But perhaps there is some merit in letting it try to boot and see if one gets 'lucky'.

I understand that Huang is looking into what should be done. He can give you better information than I in answer to your questions.

Dean
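As a rough illustration of the behaviour Dean describes (panic without MCG_SER_P, recovery attempt with it), here is a deliberately simplified sketch. The MCG_CAP bit values come from the quoted cpu.h hunk and the MCi_STATUS bit positions are the architectural ones; the function names, the mce_outcome enum and the decision logic are assumptions made for illustration and are far coarser than the severity handling in a real kernel, which also looks at bits such as S, AR, RIPV and EIPV.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCG_CAP bits, as in the quoted cpu.h hunk. */
#define MCG_CTL_P      (1ULL << 8)
#define MCG_SER_P      (1ULL << 24)

/* IA32_MCi_STATUS bits (architectural positions). */
#define MCI_STATUS_VAL (1ULL << 63)   /* register contents are valid */
#define MCI_STATUS_UC  (1ULL << 61)   /* uncorrected error */
#define MCI_STATUS_PCC (1ULL << 57)   /* processor context corrupt */

/* Hypothetical outcome labels for this sketch only. */
enum mce_outcome { MCE_IGNORE, MCE_CORRECTED, MCE_TRY_RECOVER, MCE_PANIC };

/* Build an MCG_CAP value the way the patch does: capability bits | bank count. */
static uint64_t mce_cap_default(int banks)
{
    assert(banks >= 0 && banks <= 0xff);   /* bank count lives in bits 7:0 */
    return (MCG_CTL_P | MCG_SER_P) | (uint64_t)banks;
}

/* Simplified guest-side decision: UC without PCC is recoverable only with SER_P. */
static enum mce_outcome classify_mce(uint64_t mcg_cap, uint64_t status)
{
    bool ser = (mcg_cap & MCG_SER_P) != 0;

    if (!(status & MCI_STATUS_VAL)) {
        return MCE_IGNORE;
    }
    if (!(status & MCI_STATUS_UC)) {
        return MCE_CORRECTED;
    }
    if (status & MCI_STATUS_PCC) {
        return MCE_PANIC;
    }
    return ser ? MCE_TRY_RECOVER : MCE_PANIC;
}
```

With this model, a VAL|UC|!PCC event delivered to a virtual CPU whose MCG_CAP lacks MCG_SER_P lands in MCE_PANIC, which matches the panic observed in the test above; the same event with MCG_SER_P set takes the recovery path.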
Re: [PATCH 3/3] Add svm cpuid features
On Tue, Sep 28, 2010 at 12:05:20PM +0200, Roedel, Joerg wrote:

On Tue, Sep 28, 2010 at 05:37:58AM -0400, Avi Kivity wrote:

On 09/28/2010 11:28 AM, Roedel, Joerg wrote:

Weird, it worked here as I tested it. I had it on qemu/master and with all three patches. But patch 1 should not make the difference. I'll take a look, have you pushed the failing uq/master?

Yes, 8fe6a21c76.

What was your command line?

qemu-system-x86_64 -m 2G -cpu kvm64,+svm,+npt -enable-kvm ...

Note this is qemu.git, so -enable-kvm is needed.

Ok, I apparently forgot to force the CPUID xlevel to be 0x8000000A when SVM is enabled, probably because I only tested CPUID models where xlevel already defaults to 0x8000000A. Attached is a fix, thanks for catching this.

Joerg

Applied, thanks.
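The attached fix is not included in this archive, so the following is only a sketch of the idea, not Joerg's patch. It assumes the SVM feature flag is CPUID 0x80000001 ECX bit 2 and that the SVM feature leaf is 0x8000000A; the struct cpu_def_sketch type and the fixup_xlevel_for_svm() helper are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CPUID_EXT3_SVM  (1U << 2)       /* CPUID 0x80000001, ECX bit 2 */
#define CPUID_SVM_LEAF  0x8000000AU     /* leaf that reports SVM features */

/* Hypothetical, stripped-down CPU model definition. */
struct cpu_def_sketch {
    uint32_t xlevel;          /* highest extended CPUID leaf exposed */
    uint32_t ext3_features;   /* CPUID 0x80000001 ECX feature bits */
};

/* Raise xlevel so the SVM leaf is reachable whenever SVM is advertised. */
static void fixup_xlevel_for_svm(struct cpu_def_sketch *def)
{
    assert(def != NULL);
    if ((def->ext3_features & CPUID_EXT3_SVM) &&
        def->xlevel < CPUID_SVM_LEAF) {
        def->xlevel = CPUID_SVM_LEAF;
    }
}
```

A model such as kvm64 with +svm would go through the first branch, while models that already expose a higher xlevel are left untouched.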
Re: [PATCH 1/3] Make kvm64 the default cpu model when kvm_enabled()
On Mon, Sep 27, 2010 at 03:16:15PM +0200, Joerg Roedel wrote: As requested by Alex this patch makes kvm64 the default CPU model when qemu is started with -enable-kvm. This takes only effect for qemu-versions newer or equal to 0.14.0. Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- hw/boards.h|1 + hw/pc.c| 21 - hw/pc_piix.c |6 ++ qemu-version.h | 35 +++ vl.c |4 5 files changed, 62 insertions(+), 5 deletions(-) create mode 100644 qemu-version.h diff --git a/hw/boards.h b/hw/boards.h index 6f0f0d7..2d41b2d 100644 --- a/hw/boards.h +++ b/hw/boards.h @@ -19,6 +19,7 @@ typedef struct QEMUMachine { QEMUMachineInitFunc *init; int use_scsi; int max_cpus; +unsigned int compat_version; unsigned int no_serial:1, no_parallel:1, use_virtcon:1, diff --git a/hw/pc.c b/hw/pc.c index 69b13bf..372ec4c 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -40,6 +40,16 @@ #include sysbus.h #include sysemu.h #include blockdev.h +#include kvm.h +#include qemu-version.h + +#ifdef TARGET_X86_64 +#define DEFAULT_KVM_CPU_MODEL kvm64 +#define DEFAULT_QEMU_CPU_MODEL qemu64 +#else +#define DEFAULT_KVM_CPU_MODEL kvm32 +#define DEFAULT_QEMU_CPU_MODEL qemu32 +#endif /* output Bochs bios info messages */ //#define DEBUG_BIOS @@ -867,11 +877,12 @@ void pc_cpus_init(const char *cpu_model) /* init CPUs */ if (cpu_model == NULL) { -#ifdef TARGET_X86_64 -cpu_model = qemu64; -#else -cpu_model = qemu32; -#endif +if (kvm_enabled() +qemu_compat_version = QEMU_COMPAT_VERSION(0, 14, 0)) { +cpu_model = DEFAULT_KVM_CPU_MODEL; +} else { +cpu_model = DEFAULT_QEMU_CPU_MODEL; +} } for(i = 0; i smp_cpus; i++) { diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 12359a7..9e46b71 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -35,6 +35,7 @@ #include sysemu.h #include sysbus.h #include blockdev.h +#include qemu-version.h #define MAX_IDE_BUS 2 @@ -217,6 +218,7 @@ static QEMUMachine pc_machine = { .desc = Standard PC, .init = pc_init_pci, .max_cpus = 255, +.compat_version = QEMU_COMPAT_VERSION(0, 13, 0), .is_default = 1, }; @@ -225,6 
+227,7 @@ static QEMUMachine pc_machine_v0_12 = { .desc = Standard PC, .init = pc_init_pci, .max_cpus = 255, +.compat_version = QEMU_COMPAT_VERSION(0, 12, 0), .compat_props = (GlobalProperty[]) { { .driver = virtio-serial-pci, @@ -244,6 +247,7 @@ static QEMUMachine pc_machine_v0_11 = { .desc = Standard PC, qemu 0.11, .init = pc_init_pci, .max_cpus = 255, +.compat_version = QEMU_COMPAT_VERSION(0, 11, 0), .compat_props = (GlobalProperty[]) { { .driver = virtio-blk-pci, @@ -279,6 +283,7 @@ static QEMUMachine pc_machine_v0_10 = { .desc = Standard PC, qemu 0.10, .init = pc_init_pci, .max_cpus = 255, +.compat_version = QEMU_COMPAT_VERSION(0, 10, 0), .compat_props = (GlobalProperty[]) { { .driver = virtio-blk-pci, @@ -325,6 +330,7 @@ static QEMUMachine isapc_machine = { .name = isapc, .desc = ISA-only PC, .init = pc_init_isa, +.compat_version = QEMU_COMPAT_VERSION(0, 10, 0), .max_cpus = 1, }; diff --git a/qemu-version.h b/qemu-version.h new file mode 100644 index 000..b4bfe48 --- /dev/null +++ b/qemu-version.h @@ -0,0 +1,35 @@ +/* + * qemu-version.h + * + * Defines needed for handling QEMU version compatibility + * + * Copyright (c) 2010 Joerg Roedel joerg.roe...@amd.com + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the Software), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#ifndef _QEMU_VERSION_H_ +#define _QEMU_VERSION_H_ + +extern unsigned int
Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net
On Wed, Oct 06, 2010 at 11:13:31PM +0530, Krishna Kumar2 wrote:

Michael S. Tsirkin m...@redhat.com wrote on 10/05/2010 11:53:23 PM:

Any idea where does this come from? Do you see more TX interrupts? RX interrupts? Exits? Do interrupts bounce more between guest CPUs?

4. Identify reasons for single netperf BW regression.

After testing various combinations of #txqs, #vhosts, #netperf sessions, I think the drop for 1 stream is due to TX and RX for a flow being processed on different cpus.

Right. Can we fix it?

I am not sure how to. My initial patch had one thread but gave small gains and ran into limitations once number of sessions became large.

Sure. We will need multiple RX queues, and have a single thread handle a TX and RX pair. Then we need to make sure packets from a given flow on TX land on the same thread on RX. As flows can be hashed differently, for this to work we'll have to expose this info in host/guest interface. But since multiqueue implies host/guest ABI changes anyway, this point is moot.

BTW, an interesting approach could be using bonding and multiple virtio-net interfaces. What are the disadvantages of such a setup? One advantage is it can be made to work in existing guests.

I did two more tests:
1. Pin vhosts to same CPU:
   - BW drop is much lower for 1 stream case (-5 to -8% range)
   - But performance is not so high for more sessions.
2. Changed vhost to be single threaded:
   - No degradation for 1 session, and improvement for up to 8, sometimes 16 streams (5-12%).
   - BW degrades after that, all the way till 128 netperf sessions.
   - But overall CPU utilization improves.
   Summary of the entire run (for 1-128 sessions):
       txq=4:  BW: (-2.3) CPU: (-16.5) RCPU: (-5.3)
       txq=16: BW: (-1.9) CPU: (-24.9) RCPU: (-9.6)

I don't see any reasons mentioned above. However, for higher number of netperf sessions, I see a big increase in retransmissions:

Hmm, ok, and do you see any errors?

I haven't seen any in any statistics, messages, etc.

Herbert, could you help out debugging this increase in retransmissions please? Older mail on netdev in this thread has some numbers that seem to imply that we start hitting retransmissions much more as # of flows goes up.

Also no retransmissions for txq=1.

While it's nice that we have this parameter, the need to choose between single stream and multi stream performance when you start the vm makes this patch much less interesting IMHO.

--
MST
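The requirement that a flow's TX and RX traffic end up on the same thread can be sketched with a direction-independent hash. This is a generic illustration, not the hashing used by virtio-net, vhost or any NIC: struct flow, mix32() and flow_to_queue() are hypothetical names, and the mixing constants are just one common integer-hash choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-tuple describing one direction of a TCP/UDP flow. */
struct flow {
    uint32_t saddr, daddr;   /* IPv4 source and destination address */
    uint16_t sport, dport;   /* source and destination port */
};

/* Generic 32-bit integer mixer (an arbitrary choice for this sketch). */
static uint32_t mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x7feb352dU;
    x ^= x >> 15;
    x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

/*
 * Map a flow to a queue/thread pair. Addresses and ports are combined with
 * XOR, which is order-independent, so the A->B and B->A directions of the
 * same connection always select the same queue.
 */
static unsigned int flow_to_queue(const struct flow *f, unsigned int nqueues)
{
    uint32_t addrs, ports;

    assert(f != NULL && nqueues > 0);
    addrs = f->saddr ^ f->daddr;
    ports = (uint32_t)(f->sport ^ f->dport);
    return mix32(addrs ^ mix32(ports)) % nqueues;
}
```

The catch raised in the message still applies: host and guest must agree on the function, which is why the hash would have to become part of the host/guest interface.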
Re: [PATCH 1/3] Make kvm64 the default cpu model when kvm_enabled()
On 10/06/2010 01:53 PM, Marcelo Tosatti wrote: On Mon, Sep 27, 2010 at 03:16:15PM +0200, Joerg Roedel wrote: As requested by Alex this patch makes kvm64 the default CPU model when qemu is started with -enable-kvm. This takes only effect for qemu-versions newer or equal to 0.14.0. Signed-off-by: Joerg Roedeljoerg.roe...@amd.com --- hw/boards.h|1 + hw/pc.c| 21 - hw/pc_piix.c |6 ++ qemu-version.h | 35 +++ vl.c |4 5 files changed, 62 insertions(+), 5 deletions(-) create mode 100644 qemu-version.h diff --git a/hw/boards.h b/hw/boards.h index 6f0f0d7..2d41b2d 100644 --- a/hw/boards.h +++ b/hw/boards.h @@ -19,6 +19,7 @@ typedef struct QEMUMachine { QEMUMachineInitFunc *init; int use_scsi; int max_cpus; +unsigned int compat_version; unsigned int no_serial:1, no_parallel:1, use_virtcon:1, diff --git a/hw/pc.c b/hw/pc.c index 69b13bf..372ec4c 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -40,6 +40,16 @@ #include sysbus.h #include sysemu.h #include blockdev.h +#include kvm.h +#include qemu-version.h + +#ifdef TARGET_X86_64 +#define DEFAULT_KVM_CPU_MODEL kvm64 +#define DEFAULT_QEMU_CPU_MODEL qemu64 +#else +#define DEFAULT_KVM_CPU_MODEL kvm32 +#define DEFAULT_QEMU_CPU_MODEL qemu32 +#endif /* output Bochs bios info messages */ //#define DEBUG_BIOS @@ -867,11 +877,12 @@ void pc_cpus_init(const char *cpu_model) /* init CPUs */ if (cpu_model == NULL) { -#ifdef TARGET_X86_64 -cpu_model = qemu64; -#else -cpu_model = qemu32; -#endif +if (kvm_enabled() +qemu_compat_version= QEMU_COMPAT_VERSION(0, 14, 0)) { +cpu_model = DEFAULT_KVM_CPU_MODEL; +} else { +cpu_model = DEFAULT_QEMU_CPU_MODEL; +} } for(i = 0; i smp_cpus; i++) { diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 12359a7..9e46b71 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -35,6 +35,7 @@ #include sysemu.h #include sysbus.h #include blockdev.h +#include qemu-version.h #define MAX_IDE_BUS 2 @@ -217,6 +218,7 @@ static QEMUMachine pc_machine = { .desc = Standard PC, .init = pc_init_pci, .max_cpus = 255, +.compat_version = 
QEMU_COMPAT_VERSION(0, 13, 0), .is_default = 1, }; @@ -225,6 +227,7 @@ static QEMUMachine pc_machine_v0_12 = { .desc = Standard PC, .init = pc_init_pci, .max_cpus = 255, +.compat_version = QEMU_COMPAT_VERSION(0, 12, 0), .compat_props = (GlobalProperty[]) { { .driver = virtio-serial-pci, @@ -244,6 +247,7 @@ static QEMUMachine pc_machine_v0_11 = { .desc = Standard PC, qemu 0.11, .init = pc_init_pci, .max_cpus = 255, +.compat_version = QEMU_COMPAT_VERSION(0, 11, 0), .compat_props = (GlobalProperty[]) { { .driver = virtio-blk-pci, @@ -279,6 +283,7 @@ static QEMUMachine pc_machine_v0_10 = { .desc = Standard PC, qemu 0.10, .init = pc_init_pci, .max_cpus = 255, +.compat_version = QEMU_COMPAT_VERSION(0, 10, 0), .compat_props = (GlobalProperty[]) { { .driver = virtio-blk-pci, @@ -325,6 +330,7 @@ static QEMUMachine isapc_machine = { .name = isapc, .desc = ISA-only PC, .init = pc_init_isa, +.compat_version = QEMU_COMPAT_VERSION(0, 10, 0), .max_cpus = 1, }; diff --git a/qemu-version.h b/qemu-version.h new file mode 100644 index 000..b4bfe48 --- /dev/null +++ b/qemu-version.h @@ -0,0 +1,35 @@ +/* + * qemu-version.h + * + * Defines needed for handling QEMU version compatibility + * + * Copyright (c) 2010 Joerg Roedeljoerg.roe...@amd.com + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the Software), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. 
+ * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#ifndef _QEMU_VERSION_H_ +#define _QEMU_VERSION_H_ + +extern unsigned int qemu_compat_version; + +#define QEMU_COMPAT_VERSION(major, minor,
Re: [patch uq/master 4/8] kvm: x86: add mce support
On 10/06/2010 12:34 PM, Marcelo Tosatti wrote: Port qemu-kvm's MCE support commit c68b2374c9048812f488e00ffb95db66c0bc07a7 Author: Huang Yingying.hu...@intel.com Date: Mon Jul 20 10:00:53 2009 +0800 Add MCE simulation support to qemu/kvm KVM ioctls are used to initialize MCE simulation and inject MCE. The real MCE simulation is implemented in Linux kernel. The Kernel part has been merged. Signed-off-by: Marcelo Tosattimtosa...@redhat.com Index: qemu/target-i386/helper.c === --- qemu.orig/target-i386/helper.c +++ qemu/target-i386/helper.c @@ -27,6 +27,7 @@ #include exec-all.h #include qemu-common.h #include kvm.h +#include kvm_x86.h //#define DEBUG_MMU @@ -1030,6 +1031,11 @@ void cpu_inject_x86_mce(CPUState *cenv, if (bank= bank_num || !(status MCI_STATUS_VAL)) return; +if (kvm_enabled()) { +kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc); +return; +} + /* * if MSR_MCG_CTL is not all 1s, the uncorrected error * reporting is disabled Index: qemu/target-i386/kvm.c === --- qemu.orig/target-i386/kvm.c +++ qemu/target-i386/kvm.c @@ -27,6 +27,7 @@ #include hw/pc.h #include hw/apic.h #include ioport.h +#include kvm_x86.h #ifdef CONFIG_KVM_PARA #includelinux/kvm_para.h @@ -167,6 +168,67 @@ static int get_para_features(CPUState *e } #endif +#ifdef KVM_CAP_MCE +static int kvm_get_mce_cap_supported(KVMState *s, uint64_t *mce_cap, + int *max_banks) +{ +int r; + +r = kvm_ioctl(s, KVM_CHECK_EXTENSION, KVM_CAP_MCE); +if (r 0) { +*max_banks = r; +return kvm_ioctl(s, KVM_X86_GET_MCE_CAP_SUPPORTED, mce_cap); +} +return -ENOSYS; +} + +static int kvm_setup_mce(CPUState *env, uint64_t *mcg_cap) +{ +return kvm_vcpu_ioctl(env, KVM_X86_SETUP_MCE, mcg_cap); +} + +static int kvm_set_mce(CPUState *env, struct kvm_x86_mce *m) +{ +return kvm_vcpu_ioctl(env, KVM_X86_SET_MCE, m); +} + +struct kvm_x86_mce_data +{ +CPUState *env; +struct kvm_x86_mce *mce; +}; + +static void kvm_do_inject_x86_mce(void *_data) +{ +struct kvm_x86_mce_data *data = _data; +int r; + +r = 
kvm_set_mce(data-env, data-mce); +if (r 0) +perror(kvm_set_mce FAILED); +} +#endif + +void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status, +uint64_t mcg_status, uint64_t addr, uint64_t misc) +{ +#ifdef KVM_CAP_MCE +struct kvm_x86_mce mce = { +.bank = bank, +.status = status, +.mcg_status = mcg_status, +.addr = addr, +.misc = misc, +}; +struct kvm_x86_mce_data data = { +.env = cenv, +.mce =mce, +}; + +run_on_cpu(cenv, kvm_do_inject_x86_mce,data); +#endif +} + int kvm_arch_init_vcpu(CPUState *env) { struct { @@ -274,6 +336,28 @@ int kvm_arch_init_vcpu(CPUState *env) cpuid_data.cpuid.nent = cpuid_i; +#ifdef KVM_CAP_MCE +if (((env-cpuid_version 8)0xF)= 6 + (env-cpuid_features(CPUID_MCE|CPUID_MCA)) == (CPUID_MCE|CPUID_MCA) + kvm_check_extension(env-kvm_state, KVM_CAP_MCE) 0) { +uint64_t mcg_cap; +int banks; + +if (kvm_get_mce_cap_supported(env-kvm_state,mcg_cap,banks)) +perror(kvm_get_mce_cap_supported FAILED); +else { +if (banks MCE_BANKS_DEF) +banks = MCE_BANKS_DEF; +mcg_cap= MCE_CAP_DEF; +mcg_cap |= banks; +if (kvm_setup_mce(env,mcg_cap)) +perror(kvm_setup_mce FAILED); +else +env-mcg_cap = mcg_cap; +} +} +#endif + return kvm_vcpu_ioctl(env, KVM_SET_CPUID2,cpuid_data); } Index: qemu/target-i386/kvm_x86.h === --- /dev/null +++ qemu/target-i386/kvm_x86.h @@ -0,0 +1,21 @@ +/* + * QEMU KVM support + * + * Copyright (C) 2009 Red Hat Inc. + * Copyright IBM, Corp. 2008 + * + * Authors: + * Anthony Liguorialigu...@us.ibm.com + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + * + */ BTW, I'm fairly sure I didn't write any of this code so this copyright statement is probably bogus. 
Regards,

Anthony Liguori

+
+#ifndef __KVM_X86_H__
+#define __KVM_X86_H__
+
+void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
+                        uint64_t mcg_status, uint64_t addr, uint64_t misc);
+
+#endif
Re: [PATCH v6 07/12] Add async PF initialization to PV guest.
On Wed, Oct 06, 2010 at 11:45:12AM -0300, Marcelo Tosatti wrote:

On Wed, Oct 06, 2010 at 12:55:04PM +0200, Gleb Natapov wrote:

On Tue, Oct 05, 2010 at 03:25:54PM -0300, Marcelo Tosatti wrote:

On Mon, Oct 04, 2010 at 05:56:29PM +0200, Gleb Natapov wrote:

Enable async PF in a guest if async PF capability is discovered.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 Documentation/kernel-parameters.txt |  3 +
 arch/x86/include/asm/kvm_para.h     |  5 ++
 arch/x86/kernel/kvm.c               | 92 +++
 3 files changed, 100 insertions(+), 0 deletions(-)

+static int __cpuinit kvm_cpu_notify(struct notifier_block *self,
+                                    unsigned long action, void *hcpu)
+{
+    int cpu = (unsigned long)hcpu;
+    switch (action) {
+    case CPU_ONLINE:
+    case CPU_DOWN_FAILED:
+    case CPU_ONLINE_FROZEN:
+        smp_call_function_single(cpu, kvm_guest_cpu_notify, NULL, 0);

wait parameter should probably be 1.

Why should we wait for it? FWIW I copied this from somewhere (maybe arch/x86/pci/amd_bus.c).

So that you know it's executed in a defined point in cpu bringup.

If I read the code correctly, the CPU we are notified about is already running when the callback is called, so I do not see what waiting for the IPI to be processed will accomplish here. With many cpus we will make boot a little bit slower. I don't care too much though, so if you still think that 1 is required here I'll make it so.

--
Gleb.
Re: [PATCH v6 04/12] Add memory slot versioning and use it to provide fast guest write interface
On Wed, Oct 06, 2010 at 11:38:47AM -0300, Marcelo Tosatti wrote:

On Wed, Oct 06, 2010 at 01:14:17PM +0200, Gleb Natapov wrote:

+int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
+                              gpa_t gpa)
+{
+    struct kvm_memslots *slots = kvm_memslots(kvm);
+    int offset = offset_in_page(gpa);
+    gfn_t gfn = gpa >> PAGE_SHIFT;
+
+    ghc->gpa = gpa;
+    ghc->generation = slots->generation;

kvm->memslots can change here.

+    ghc->memslot = gfn_to_memslot(kvm, gfn);
+    ghc->hva = gfn_to_hva(kvm, gfn);

And if so, gfn_to_memslot / gfn_to_hva will use new memslots pointer. Should dereference all values from one copy of kvm->memslots pointer.

Ah, I see now. Thanks! Will fix.

+    if (!kvm_is_error_hva(ghc->hva))
+        ghc->hva += offset;
+    else
+        return -EFAULT;
+
+    return 0;
+}

Should use a unique kvm_memslots structure for the cache entry, since it can change in between (use gfn_to_hva_memslot, etc on slots pointer).

I do not understand what you mean here. The kvm_memslots structure itself is not cached; only the various translations that use it are cached. Translation results are never used if kvm_memslots was changed.

Also should zap any cached entries on overflow, otherwise malicious userspace could make use of stale slots:

There is only one cached entry at each given time. A user who wants to write into guest memory often defines a gfn_to_hva_cache variable somewhere, inits it with kvm_gfn_to_hva_cache_init() and then calls kvm_write_guest_cached() on it. If there were no slot changes in between, the cached translation is used. Otherwise the cache is recalculated.

Malicious userspace can cause entry to be cached, ioctl SET_USER_MEMORY_REGION 2^32 times, generation number will match, mark_page_dirty_in_slot will be called with pointer to freed memory.

Hmm. To zap all cached entries on overflow we need to track them. If we track them, then we can zap them on each slot update and drop generation entirely.

--
Gleb.
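The generation check and the wraparound concern from the exchange above can be reproduced in a few lines. This is a sketch under stated assumptions, not the kernel code: the slots_sketch and hva_cache_sketch types and all four helpers are hypothetical, the generation counter is assumed to be 32 bits wide as the "2^32 times" remark implies, and slots_update_n() applies many updates at once only so that the wrap can be shown without looping four billion times.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the memslots structure. */
struct slots_sketch {
    uint32_t generation;   /* bumped on every slot update */
    uintptr_t base_hva;    /* where guest memory currently lives */
};

/* Hypothetical stand-in for the cached translation. */
struct hva_cache_sketch {
    uint32_t generation;   /* generation the translation was computed for */
    uintptr_t hva;         /* cached host virtual address */
};

/* Apply n slot updates; only the last base address survives. */
static void slots_update_n(struct slots_sketch *s, uintptr_t new_base, uint64_t n)
{
    assert(s != NULL);
    s->base_hva = new_base;
    s->generation = (uint32_t)(s->generation + n);   /* wraps modulo 2^32 */
}

/* Fill the cache from ONE snapshot of the slots, as the review asks. */
static void cache_init(struct hva_cache_sketch *c, const struct slots_sketch *s,
                       uintptr_t offset)
{
    assert(c != NULL && s != NULL);
    c->generation = s->generation;
    c->hva = s->base_hva + offset;
}

/* The fast path trusts the cached hva only while the generations match. */
static bool cache_is_valid(const struct hva_cache_sketch *c,
                           const struct slots_sketch *s)
{
    assert(c != NULL && s != NULL);
    return c->generation == s->generation;
}
```

After a single update cache_is_valid() returns false and the translation would be recomputed; after exactly 2^32 updates the counter is back at its old value, so the stale hva passes the check, which is the scenario Marcelo describes.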
Re: [GIT PULL net-next-2.6] vhost-net patchset for 2.6.37
From: Michael S. Tsirkin m...@redhat.com
Date: Tue, 5 Oct 2010 20:27:32 +0200

It looks like it was a quiet cycle for vhost-net: probably because most of energy was spent on bugfixes that went in for 2.6.36. People are working on multiqueue, tracing but I'm not sure it'll get done in time for 2.6.37 - so here's a tree with a single patch that helps windows guests which we definitely want in the next kernel. Please merge for 2.6.37.

Pulled, thanks Michael.
Re: [PATCH 0/2] device-assignment: Re-work PCI option ROM support
On Mon, Oct 04, 2010 at 03:26:18PM -0600, Alex Williamson wrote:

This cleans up device assignment option ROM support and allows us to use romfile and rombar default PCI options. Thanks,

Alex
---
Alex Williamson (2):
      device-assignment: Allow PCI to manage the option ROM
      PCI: Export pci_map_option_rom()

 hw/device-assignment.c | 155 +---
 hw/device-assignment.h |   4 +
 hw/pci.c               |   2 -
 hw/pci.h               |   3 +
 4 files changed, 75 insertions(+), 89 deletions(-)

Applied, thanks.
[PATCH 0/6] Save state error handling (kill off no_migrate)
Our code paths for saving or migrating a VM are full of functions that return void, leaving no opportunity for a device to cancel a migration, whether because of an error or an incompatibility. The ivshmem driver attempted to solve this with a no_migrate flag on the save state entry. I think the more generic and flexible way to solve this is to allow driver save functions to fail. This series implements that and converts ivshmem to use a set_params function to NAK migration much earlier in the process.

This touches a lot of files, but the bulk of those changes are simply s/void/int/ and tacking a return 0 onto the end of functions.

Thanks,
Alex

---

Alex Williamson (6):
      savevm: Remove register_device_unmigratable()
      savevm: Allow set_params and save_live_state to error
      virtio: Allow virtio_save() errors
      pci: Allow pci_device_save() to return error
      savevm: Allow vmsd-pre_save to return error
      savevm: Allow SaveStateHandler() to return error

 block-migration.c | 4 +-
 hw/adb.c | 8 +++-
 hw/ads7846.c | 4 +-
 hw/arm_gic.c | 4 +-
 hw/arm_timer.c | 6 ++-
 hw/armv7m_nvic.c | 4 +-
 hw/cuda.c | 4 +-
 hw/fdc.c | 3 +
 hw/g364fb.c | 4 +-
 hw/grackle_pci.c | 4 +-
 hw/gt64xxx.c | 4 +-
 hw/heathrow_pic.c | 4 +-
 hw/hpet.c | 3 +
 hw/hw.h | 12 ++
 hw/i2c.c | 3 +
 hw/ide/core.c | 4 +-
 hw/ivshmem.c | 30 +++
 hw/lsi53c895a.c | 4 +-
 hw/m48t59.c | 4 +-
 hw/mac_dbdma.c | 4 +-
 hw/mac_nvram.c | 4 +-
 hw/max111x.c | 4 +-
 hw/mipsnet.c | 4 +-
 hw/mst_fpga.c | 3 +
 hw/nand.c | 3 +
 hw/openpic.c | 4 +-
 hw/pci.c | 9 +++-
 hw/pci.h | 2 -
 hw/piix4.c | 4 +-
 hw/pl011.c | 4 +-
 hw/pl022.c | 4 +-
 hw/pl061.c | 4 +-
 hw/ppc4xx_pci.c | 11 -
 hw/ppce500_pci.c | 11 -
 hw/pxa2xx.c | 28 ++
 hw/pxa2xx_dma.c | 4 +-
 hw/pxa2xx_gpio.c | 4 +-
 hw/pxa2xx_keypad.c | 3 +
 hw/pxa2xx_lcd.c | 4 +-
 hw/pxa2xx_mmci.c | 4 +-
 hw/pxa2xx_pic.c | 4 +-
 hw/pxa2xx_timer.c | 4 +-
 hw/rc4030.c | 4 +-
 hw/rtl8139.c | 4 +-
 hw/serial.c | 3 +
 hw/spitz.c | 14 +--
 hw/ssd0323.c | 4 +-
 hw/ssi-sd.c | 4 +-
 hw/stellaris.c | 20 +++---
 hw/stellaris_enet.c | 4 +-
 hw/stellaris_input.c | 4 +-
 hw/syborg_fb.c | 4 +-
 hw/syborg_interrupt.c | 3 +
 hw/syborg_keyboard.c | 3 +
 hw/syborg_pointer.c | 3 +
 hw/syborg_rtc.c | 4 +-
 hw/syborg_serial.c | 4 +-
 hw/syborg_timer.c | 4 +-
 hw/tsc2005.c | 4 +-
 hw/tsc210x.c | 4 +-
 hw/twl92230.c | 3 +
 hw/unin_pci.c | 4 +-
 hw/usb-uhci.c | 3 +
 hw/virtio-balloon.c | 9 +++-
 hw/virtio-blk.c | 10 -
 hw/virtio-net.c | 11 -
 hw/virtio-pci.c | 10 -
 hw/virtio-serial-bus.c | 10 -
 hw/virtio.c | 14 +--
 hw/virtio.h | 4 +-
 hw/wm8750.c | 3 +
 hw/zaurus.c | 4 +-
 qemu-common.h | 2 -
 savevm.c | 88 +++
 slirp/slirp.c | 6 ++-
 target-arm/machine.c | 3 +
 target-cris/machine.c | 3 +
 target-i386/machine.c | 7 ++-
 target-microblaze/machine.c | 3 +
 target-mips/machine.c | 3 +
 target-ppc/machine.c | 3 +
 target-s390x/machine.c | 3 +
 target-sparc/machine.c | 3 +
 83 files changed, 365 insertions(+), 181 deletions(-)
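The idea behind the series, save handlers that return int so the caller can stop at the first failure, can be sketched outside of QEMU. The following is only an illustration under stated assumptions: the types and names (save_handler_fn, struct save_entry, save_all) are hypothetical stand-ins and not QEMU's actual API.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a save handler after the s/void/int/ change:
 * returns 0 on success or a negative errno-style value on failure. */
typedef int save_handler_fn(void *opaque);

struct save_entry {
    const char *name;        /* hypothetical device name */
    save_handler_fn *save;   /* handler that may now fail */
    void *opaque;            /* per-device state; here a call counter */
};

/* A device that can always be saved. */
static int always_ok_save(void *opaque)
{
    int *calls = opaque;
    (*calls)++;
    return 0;
}

/* A device that refuses to be saved. */
static int refusing_save(void *opaque)
{
    int *calls = opaque;
    (*calls)++;
    return -EINVAL;
}

/* Walk the entries in order and return the first error encountered.
 * Entries after the failing one are never asked to save. */
static int save_all(const struct save_entry *entries, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int ret = entries[i].save(entries[i].opaque);
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}
```

With void handlers, save_all() would have no way to learn that the second device refused; with int handlers the error reaches the caller unchanged.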
[PATCH 1/6] savevm: Allow SaveStateHandler() to return error
Some devices may not always be able to save their state; allow the save handler to return an error.

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---
 hw/adb.c | 8 ++--
 hw/ads7846.c | 4 +++-
 hw/arm_gic.c | 4 +++-
 hw/arm_timer.c | 6 --
 hw/armv7m_nvic.c | 4 +++-
 hw/cuda.c | 4 +++-
 hw/g364fb.c | 4 +++-
 hw/grackle_pci.c | 4 +++-
 hw/gt64xxx.c | 3 ++-
 hw/heathrow_pic.c | 4 +++-
 hw/hw.h | 2 +-
 hw/ivshmem.c | 3 ++-
 hw/m48t59.c | 4 +++-
 hw/mac_dbdma.c | 4 +++-
 hw/mac_nvram.c | 4 +++-
 hw/max111x.c | 4 +++-
 hw/mipsnet.c | 4 +++-
 hw/mst_fpga.c | 3 ++-
 hw/nand.c | 3 ++-
 hw/openpic.c | 4 +++-
 hw/piix4.c | 3 ++-
 hw/pl011.c | 4 +++-
 hw/pl022.c | 4 +++-
 hw/pl061.c | 4 +++-
 hw/ppc4xx_pci.c | 4 +++-
 hw/ppce500_pci.c | 4 +++-
 hw/pxa2xx.c | 28 +---
 hw/pxa2xx_dma.c | 4 +++-
 hw/pxa2xx_gpio.c | 4 +++-
 hw/pxa2xx_keypad.c | 3 ++-
 hw/pxa2xx_lcd.c | 4 +++-
 hw/pxa2xx_mmci.c | 4 +++-
 hw/pxa2xx_pic.c | 4 +++-
 hw/pxa2xx_timer.c | 4 +++-
 hw/rc4030.c | 4 +++-
 hw/spitz.c | 14 ++
 hw/ssd0323.c | 4 +++-
 hw/ssi-sd.c | 4 +++-
 hw/stellaris.c | 20 +++-
 hw/stellaris_enet.c | 4 +++-
 hw/stellaris_input.c | 4 +++-
 hw/syborg_fb.c | 4 +++-
 hw/syborg_interrupt.c | 3 ++-
 hw/syborg_keyboard.c | 3 ++-
 hw/syborg_pointer.c | 3 ++-
 hw/syborg_rtc.c | 4 +++-
 hw/syborg_serial.c | 4 +++-
 hw/syborg_timer.c | 4 +++-
 hw/tsc2005.c | 4 +++-
 hw/tsc210x.c | 4 +++-
 hw/unin_pci.c | 4 +++-
 hw/virtio-balloon.c | 3 ++-
 hw/virtio-blk.c | 4 +++-
 hw/virtio-net.c | 4 +++-
 hw/virtio-serial-bus.c | 4 +++-
 hw/zaurus.c | 4 +++-
 qemu-common.h | 2 +-
 savevm.c | 3 +--
 slirp/slirp.c | 6 --
 target-arm/machine.c | 3 ++-
 target-cris/machine.c | 3 ++-
 target-i386/machine.c | 3 ++-
 target-microblaze/machine.c | 3 ++-
 target-mips/machine.c | 3 ++-
 target-ppc/machine.c | 3 ++-
 target-s390x/machine.c | 3 ++-
 target-sparc/machine.c | 3 ++-
 67 files changed, 219 insertions(+), 84 deletions(-)

diff --git a/hw/adb.c b/hw/adb.c
index 99b30f6..f400d12 100644
--- a/hw/adb.c
+++ b/hw/adb.c
@@ -261,7 +261,7 @@ static int adb_kbd_request(ADBDevice *d, uint8_t *obuf,
     return olen;
 }
-static void adb_kbd_save(QEMUFile *f, void *opaque)
+static int adb_kbd_save(QEMUFile *f, void *opaque)
 {
     KBDState *s = (KBDState *)opaque;
@@ -269,6 +269,8 @@ static void adb_kbd_save(QEMUFile *f, void *opaque)
     qemu_put_sbe32s(f, &s->rptr);
     qemu_put_sbe32s(f, &s->wptr);
     qemu_put_sbe32s(f, &s->count);
+
+    return 0;
 }
 static int adb_kbd_load(QEMUFile *f, void *opaque, int version_id)
@@ -439,7 +441,7 @@ static int adb_mouse_reset(ADBDevice *d)
     return 0;
 }
-static void adb_mouse_save(QEMUFile *f, void *opaque)
+static int adb_mouse_save(QEMUFile *f, void *opaque)
 {
     MouseState *s = (MouseState *)opaque;
@@ -448,6 +450,8 @@ static void adb_mouse_save(QEMUFile *f, void *opaque)
     qemu_put_sbe32s(f, &s->dx);
     qemu_put_sbe32s(f, &s->dy);
     qemu_put_sbe32s(f, &s->dz);
+
+    return 0;
 }
 static int adb_mouse_load(QEMUFile *f, void *opaque, int version_id)
diff --git a/hw/ads7846.c b/hw/ads7846.c
index b3bbeaf..4440ed2 100644
--- a/hw/ads7846.c
+++ b/hw/ads7846.c
@@ -105,7 +105,7 @@ static void ads7846_ts_event(void *opaque,
     }
 }
-static void ads7846_save(QEMUFile *f, void *opaque)
+static int ads7846_save(QEMUFile *f, void *opaque)
 {
     ADS7846State *s = (ADS7846State *) opaque;
     int i;
@@ -115,6 +115,8 @@ static void ads7846_save(QEMUFile *f, void *opaque)
     qemu_put_be32(f, s->noise);
     qemu_put_be32(f, s->cycle);
     qemu_put_be32(f, s->output);
+
+    return 0;
 }
 static int ads7846_load(QEMUFile *f, void *opaque, int version_id)
diff --git a/hw/arm_gic.c b/hw/arm_gic.c
index 8286a28..7790a10 100644
--- a/hw/arm_gic.c
+++ b/hw/arm_gic.c
@@ -653,7 +653,7 @@ static void gic_reset(gic_state *s)
 #endif
 }
-static void gic_save(QEMUFile *f, void *opaque)
+static int gic_save(QEMUFile *f, void *opaque)
 {
[PATCH 2/6] savevm: Allow vmsd-pre_save to return error
This allows vmsd based saves to also have a way to signal that they can't be saved or migrated.

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---
 hw/fdc.c | 3 ++-
 hw/hpet.c | 3 ++-
 hw/hw.h | 6 +++---
 hw/i2c.c | 3 ++-
 hw/ide/core.c | 4 +++-
 hw/lsi53c895a.c | 4 +++-
 hw/rtl8139.c | 4 +++-
 hw/serial.c | 3 ++-
 hw/twl92230.c | 3 ++-
 hw/usb-uhci.c | 3 ++-
 hw/wm8750.c | 3 ++-
 savevm.c | 36 +++-
 target-i386/machine.c | 6 +++---
 13 files changed, 52 insertions(+), 29 deletions(-)

diff --git a/hw/fdc.c b/hw/fdc.c
index c159dcb..ff48c70 100644
--- a/hw/fdc.c
+++ b/hw/fdc.c
@@ -643,11 +643,12 @@ static const VMStateDescription vmstate_fdrive = {
     }
 };
-static void fdc_pre_save(void *opaque)
+static int fdc_pre_save(void *opaque)
 {
     FDCtrl *s = opaque;
     s->dor_vmstate = s->dor | GET_CUR_DRV(s);
+    return 0;
 }
 static int fdc_post_load(void *opaque, int version_id)
diff --git a/hw/hpet.c b/hw/hpet.c
index d5c406c..e586e68 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -204,12 +204,13 @@ static void update_irq(struct HPETTimer *timer, int set)
     }
 }
-static void hpet_pre_save(void *opaque)
+static int hpet_pre_save(void *opaque)
 {
     HPETState *s = opaque;
     /* save current counter value */
     s->hpet_counter = hpet_get_ticks(s);
+    return 0;
 }
 static int hpet_pre_load(void *opaque)
diff --git a/hw/hw.h b/hw/hw.h
index b6f1236..91a60ca 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -328,7 +328,7 @@ struct VMStateDescription {
     LoadStateHandler *load_state_old;
     int (*pre_load)(void *opaque);
     int (*post_load)(void *opaque, int version_id);
-    void (*pre_save)(void *opaque);
+    int (*pre_save)(void *opaque);
     VMStateField *fields;
     const VMStateSubsection *subsections;
 };
@@ -773,8 +773,8 @@ extern const VMStateDescription vmstate_i2c_slave;
 extern int vmstate_load_state(QEMUFile *f, const VMStateDescription *vmsd,
                               void *opaque, int version_id);
-extern void vmstate_save_state(QEMUFile *f, const VMStateDescription *vmsd,
-                               void *opaque);
+extern int vmstate_save_state(QEMUFile *f, const VMStateDescription *vmsd,
+                              void *opaque);
 extern int vmstate_register(DeviceState *dev, int instance_id,
                             const VMStateDescription *vmsd, void *base);
 extern int vmstate_register_with_alias_id(DeviceState *dev,
diff --git a/hw/i2c.c b/hw/i2c.c
index f80d12d..f05c2ef 100644
--- a/hw/i2c.c
+++ b/hw/i2c.c
@@ -26,11 +26,12 @@ static struct BusInfo i2c_bus_info = {
     }
 };
-static void i2c_bus_pre_save(void *opaque)
+static int i2c_bus_pre_save(void *opaque)
 {
     i2c_bus *bus = opaque;
     bus->saved_address = bus->current_dev ? bus->current_dev->address : -1;
+    return 0;
 }
 static int i2c_bus_post_load(void *opaque, int version_id)
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 06b6e14..eb5f095 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2792,7 +2792,7 @@ static int ide_drive_pio_post_load(void *opaque, int version_id)
     return 0;
 }
-static void ide_drive_pio_pre_save(void *opaque)
+static int ide_drive_pio_pre_save(void *opaque)
 {
     IDEState *s = opaque;
     int idx;
@@ -2808,6 +2808,8 @@ static void ide_drive_pio_pre_save(void *opaque)
     } else {
         s->end_transfer_fn_idx = idx;
     }
+
+    return 0;
 }
 static bool ide_drive_pio_state_needed(void *opaque)
diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index 5eaf69e..7315a3f 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -2045,7 +2045,7 @@ static void lsi_scsi_reset(DeviceState *dev)
     lsi_soft_reset(s);
 }
-static void lsi_pre_save(void *opaque)
+static int lsi_pre_save(void *opaque)
 {
     LSIState *s = opaque;
@@ -2054,6 +2054,8 @@ static void lsi_pre_save(void *opaque)
         assert(s->current->dma_len == 0);
     }
     assert(QTAILQ_EMPTY(&s->queue));
+
+    return 0;
 }
 static const VMStateDescription vmstate_lsi_scsi = {
diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index d92981d..56271fb 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -3173,7 +3173,7 @@ static int rtl8139_post_load(void *opaque, int version_id)
     return 0;
 }
-static void rtl8139_pre_save(void *opaque)
+static int rtl8139_pre_save(void *opaque)
 {
     RTL8139State* s = opaque;
     int64_t current_time = qemu_get_clock(vm_clock);
@@ -3182,6 +3182,8 @@ static void rtl8139_pre_save(void *opaque)
     rtl8139_set_next_tctr_time(s, current_time);
     s->TCTR = muldiv64(current_time - s->TCTR_base, PCI_FREQUENCY,
                        get_ticks_per_sec());
+
+    return 0;
 }
 static const VMStateDescription vmstate_rtl8139 = {
diff --git a/hw/serial.c b/hw/serial.c
index 9ebc452..edfdd4d 100644
--- a/hw/serial.c
+++ b/hw/serial.c
@@ -659,10 +659,11 @@ static void
[PATCH 3/6] pci: Allow pci_device_save() to return error
Carry the vmsd pre_save error reporting through pci_device_save().

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---
 hw/grackle_pci.c | 4 +---
 hw/gt64xxx.c | 3 +--
 hw/ivshmem.c | 7 ++-
 hw/openpic.c | 4 +---
 hw/pci.c | 9 +++--
 hw/pci.h | 2 +-
 hw/piix4.c | 3 +--
 hw/ppc4xx_pci.c | 7 +--
 hw/ppce500_pci.c | 7 +--
 hw/unin_pci.c | 4 +---
 10 files changed, 29 insertions(+), 21 deletions(-)

diff --git a/hw/grackle_pci.c b/hw/grackle_pci.c
index f6905fb..c7164c5 100644
--- a/hw/grackle_pci.c
+++ b/hw/grackle_pci.c
@@ -61,9 +61,7 @@ static int pci_grackle_save(QEMUFile* f, void *opaque)
 {
     PCIDevice *d = opaque;
-    pci_device_save(d, f);
-
-    return 0;
+    return pci_device_save(d, f);
 }
 static int pci_grackle_load(QEMUFile* f, void *opaque, int version_id)
diff --git a/hw/gt64xxx.c b/hw/gt64xxx.c
index 7d8c3b3..21a0e57 100644
--- a/hw/gt64xxx.c
+++ b/hw/gt64xxx.c
@@ -1089,8 +1089,7 @@ static void gt64120_reset(void *opaque)
 static int gt64120_save(QEMUFile* f, void *opaque)
 {
     PCIDevice *d = opaque;
-    pci_device_save(d, f);
-    return 0;
+    return pci_device_save(d, f);
 }
 static int gt64120_load(QEMUFile* f, void *opaque, int version_id)
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index 0919c4e..3726a7f 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -619,9 +619,14 @@ static void ivshmem_setup_msi(IVShmemState * s) {
 static int ivshmem_save(QEMUFile* f, void *opaque)
 {
     IVShmemState *proxy = opaque;
+    int ret;
     IVSHMEM_DPRINTF("ivshmem_save\n");
-    pci_device_save(&proxy->dev, f);
+
+    ret = pci_device_save(&proxy->dev, f);
+    if (ret < 0) {
+        return ret;
+    }
     if (ivshmem_has_feature(proxy, IVSHMEM_MSI)) {
         msix_save(&proxy->dev, f);
diff --git a/hw/openpic.c b/hw/openpic.c
index 4ca4ba3..4537239 100644
--- a/hw/openpic.c
+++ b/hw/openpic.c
@@ -1102,9 +1102,7 @@ static int openpic_save(QEMUFile* f, void *opaque)
     }
 #endif
-    pci_device_save(&opp->pci_dev, f);
-
-    return 0;
+    return pci_device_save(&opp->pci_dev, f);
 }
 static void openpic_load_IRQ_queue(QEMUFile* f, IRQ_queue_t *q)
diff --git a/hw/pci.c b/hw/pci.c
index 15416dd..a30f6ec 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -434,16 +434,21 @@ static inline const VMStateDescription *pci_get_vmstate(PCIDevice *s)
     return pci_is_express(s) ? &vmstate_pcie_device : &vmstate_pci_device;
 }
-void pci_device_save(PCIDevice *s, QEMUFile *f)
+int pci_device_save(PCIDevice *s, QEMUFile *f)
 {
+    int ret;
     /* Clear interrupt status bit: it is implicit
      * in irq_state which we are saving.
      * This makes us compatible with old devices
      * which never set or clear this bit. */
     s->config[PCI_STATUS] &= ~PCI_STATUS_INTERRUPT;
-    vmstate_save_state(f, pci_get_vmstate(s), s);
+    ret = vmstate_save_state(f, pci_get_vmstate(s), s);
+    if (ret < 0) {
+        return ret;
+    }
     /* Restore the interrupt status bit. */
     pci_update_irq_status(s);
+    return 0;
 }
 int pci_device_load(PCIDevice *s, QEMUFile *f)
diff --git a/hw/pci.h b/hw/pci.h
index 3d23f03..bb9ad79 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -198,7 +198,7 @@ uint32_t pci_default_read_config(PCIDevice *d,
                                  uint32_t address, int len);
 void pci_default_write_config(PCIDevice *d,
                               uint32_t address, uint32_t val, int len);
-void pci_device_save(PCIDevice *s, QEMUFile *f);
+int pci_device_save(PCIDevice *s, QEMUFile *f);
 int pci_device_load(PCIDevice *s, QEMUFile *f);
 typedef void (*pci_set_irq_fn)(void *opaque, int irq_num, int level);
diff --git a/hw/piix4.c b/hw/piix4.c
index 5209061..9f560ac 100644
--- a/hw/piix4.c
+++ b/hw/piix4.c
@@ -71,8 +71,7 @@ static void piix4_reset(void *opaque)
 static int piix_save(QEMUFile* f, void *opaque)
 {
     PCIDevice *d = opaque;
-    pci_device_save(d, f);
-    return 0;
+    return pci_device_save(d, f);
 }
 static int piix_load(QEMUFile* f, void *opaque, int version_id)
diff --git a/hw/ppc4xx_pci.c b/hw/ppc4xx_pci.c
index 7507d08..3499270 100644
--- a/hw/ppc4xx_pci.c
+++ b/hw/ppc4xx_pci.c
@@ -301,9 +301,12 @@ static void ppc4xx_pci_set_irq(void *opaque, int irq_num, int level)
 static int ppc4xx_pci_save(QEMUFile *f, void *opaque)
 {
     PPC4xxPCIState *controller = opaque;
-    int i;
+    int i, ret;
-    pci_device_save(controller->pci_dev, f);
+    ret = pci_device_save(controller->pci_dev, f);
+    if (ret < 0) {
+        return ret;
+    }
     for (i = 0; i < PPC4xx_PCI_NR_PMMS; i++) {
         qemu_put_be32s(f, &controller->pmm[i].la);
diff --git a/hw/ppce500_pci.c b/hw/ppce500_pci.c
index 9babe05..97a7743 100644
--- a/hw/ppce500_pci.c
+++ b/hw/ppce500_pci.c
@@ -219,9 +219,12 @@ static void mpc85xx_pci_set_irq(void *opaque, int irq_num, int level)
 static int ppce500_pci_save(QEMUFile *f, void *opaque)
 {
     PPCE500PCIState *controller = opaque;
-    int i;
+    int i, ret;
-
[PATCH 4/6] virtio: Allow virtio_save() errors
Carry pci_device_save() error through to virtio_save().

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---
 hw/virtio-balloon.c | 6 +-
 hw/virtio-blk.c | 6 +-
 hw/virtio-net.c | 7 ++-
 hw/virtio-pci.c | 10 --
 hw/virtio-serial-bus.c | 6 +-
 hw/virtio.c | 14 ++
 hw/virtio.h | 4 ++--
 7 files changed, 41 insertions(+), 12 deletions(-)

diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
index 719e72c..2bf009d 100644
--- a/hw/virtio-balloon.c
+++ b/hw/virtio-balloon.c
@@ -230,8 +230,12 @@ static void virtio_balloon_to_target(void *opaque, ram_addr_t target,
 static int virtio_balloon_save(QEMUFile *f, void *opaque)
 {
     VirtIOBalloon *s = opaque;
+    int ret;
-    virtio_save(&s->vdev, f);
+    ret = virtio_save(&s->vdev, f);
+    if (ret < 0) {
+        return ret;
+    }
     qemu_put_be32(f, s->num_pages);
     qemu_put_be32(f, s->actual);
diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 3770901..b4772bf 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -464,8 +464,12 @@ static int virtio_blk_save(QEMUFile *f, void *opaque)
 {
     VirtIOBlock *s = opaque;
     VirtIOBlockReq *req = s->rq;
+    int ret;
-    virtio_save(&s->vdev, f);
+    ret = virtio_save(&s->vdev, f);
+    if (ret < 0) {
+        return ret;
+    }
     while (req) {
         qemu_put_sbyte(f, 1);
diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 1b683d9..6673320 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -782,6 +782,7 @@ static void virtio_net_tx_bh(void *opaque)
 static int virtio_net_save(QEMUFile *f, void *opaque)
 {
     VirtIONet *n = opaque;
+    int ret;
     if (n->vhost_started) {
         /* TODO: should we really stop the backend?
@@ -789,7 +790,11 @@ static int virtio_net_save(QEMUFile *f, void *opaque)
         vhost_net_stop(tap_get_vhost_net(n->nic->nc.peer), &n->vdev);
         n->vhost_started = 0;
     }
-    virtio_save(&n->vdev, f);
+
+    ret = virtio_save(&n->vdev, f);
+    if (ret < 0) {
+        return ret;
+    }
     qemu_put_buffer(f, n->mac, ETH_ALEN);
     qemu_put_be32(f, n->tx_waiting);
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 86e6b0a..a7603bb 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -121,13 +121,19 @@ static void virtio_pci_notify(void *opaque, uint16_t vector)
     qemu_set_irq(proxy->pci_dev.irq[0], proxy->vdev->isr & 1);
 }
-static void virtio_pci_save_config(void * opaque, QEMUFile *f)
+static int virtio_pci_save_config(void * opaque, QEMUFile *f)
 {
     VirtIOPCIProxy *proxy = opaque;
-    pci_device_save(&proxy->pci_dev, f);
+    int ret;
+
+    ret = pci_device_save(&proxy->pci_dev, f);
+    if (ret < 0) {
+        return ret;
+    }
     msix_save(&proxy->pci_dev, f);
     if (msix_present(&proxy->pci_dev))
         qemu_put_be16(f, proxy->vdev->config_vector);
+    return 0;
 }
 static void virtio_pci_save_queue(void * opaque, int n, QEMUFile *f)
diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 7f00fcf..ca57dda 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -459,9 +459,13 @@ static int virtio_serial_save(QEMUFile *f, void *opaque)
     VirtIOSerialPort *port;
     uint32_t nr_active_ports;
     unsigned int i;
+    int ret;
     /* The virtio device */
-    virtio_save(&s->vdev, f);
+    ret = virtio_save(&s->vdev, f);
+    if (ret < 0) {
+        return ret;
+    }
     /* The config space */
     qemu_put_be16s(f, &s->config.cols);
diff --git a/hw/virtio.c b/hw/virtio.c
index fbef788..27b0e84 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -640,12 +640,16 @@ void virtio_notify_config(VirtIODevice *vdev)
     virtio_notify_vector(vdev, vdev->config_vector);
 }
-void virtio_save(VirtIODevice *vdev, QEMUFile *f)
+int virtio_save(VirtIODevice *vdev, QEMUFile *f)
 {
-    int i;
+    int i, ret;
-    if (vdev->binding->save_config)
-        vdev->binding->save_config(vdev->binding_opaque, f);
+    if (vdev->binding->save_config) {
+        ret = vdev->binding->save_config(vdev->binding_opaque, f);
+        if (ret < 0) {
+            return ret;
+        }
+    }
     qemu_put_8s(f, &vdev->status);
     qemu_put_8s(f, &vdev->isr);
@@ -671,6 +675,8 @@ void virtio_save(VirtIODevice *vdev, QEMUFile *f)
         if (vdev->binding->save_queue)
             vdev->binding->save_queue(vdev->binding_opaque, i, f);
     }
+
+    return 0;
 }
 int virtio_load(VirtIODevice *vdev, QEMUFile *f)
diff --git a/hw/virtio.h b/hw/virtio.h
index 96514e6..5c5da3a 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -88,7 +88,7 @@ typedef struct VirtQueueElement
 typedef struct {
     void (*notify)(void * opaque, uint16_t vector);
-    void (*save_config)(void * opaque, QEMUFile *f);
+    int (*save_config)(void * opaque, QEMUFile *f);
     void (*save_queue)(void * opaque, int n, QEMUFile *f);
     int (*load_config)(void * opaque, QEMUFile *f);
     int (*load_queue)(void * opaque, int n, QEMUFile *f);
@@ -150,7 +150,7 @@ int
[PATCH 5/6] savevm: Allow set_params and save_live_state to error
This lets a save state handler NAK a migration or cancel if it runs into problems.

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---
 block-migration.c | 4 +++-
 hw/hw.h | 2 +-
 savevm.c | 18 +++---
 3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index 0bfdb73..5fb3b72 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -628,13 +628,15 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
     return 0;
 }
-static void block_set_params(int blk_enable, int shared_base, void *opaque)
+static int block_set_params(int blk_enable, int shared_base, void *opaque)
 {
     block_mig_state.blk_enable = blk_enable;
     block_mig_state.shared_base = shared_base;
     /* shared base means that blk_enable = 1 */
     block_mig_state.blk_enable |= shared_base;
+
+    return 0;
 }
 void blk_mig_init(void)
diff --git a/hw/hw.h b/hw/hw.h
index 91a60ca..95f2d52 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -239,7 +239,7 @@ static inline void qemu_get_sbe64s(QEMUFile *f, int64_t *pv)
 int64_t qemu_ftell(QEMUFile *f);
 int64_t qemu_fseek(QEMUFile *f, int64_t pos, int whence);
-typedef void SaveSetParamsHandler(int blk_enable, int shared, void * opaque);
+typedef int SaveSetParamsHandler(int blk_enable, int shared, void * opaque);
 typedef int SaveStateHandler(QEMUFile *f, void *opaque);
 typedef int SaveLiveStateHandler(Monitor *mon, QEMUFile *f, int stage,
                                  void *opaque);
diff --git a/savevm.c b/savevm.c
index 89c5fac..ad3ab86 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1414,12 +1414,16 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
                             int shared)
 {
     SaveStateEntry *se;
+    int ret;
     QTAILQ_FOREACH(se, &savevm_handlers, entry) {
         if(se->set_params == NULL) {
             continue;
         }
-        se->set_params(blk_enable, shared, se->opaque);
+        ret = se->set_params(blk_enable, shared, se->opaque);
+        if (ret < 0) {
+            return ret;
+        }
     }
     qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
@@ -1443,7 +1447,10 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
         qemu_put_be32(f, se->instance_id);
         qemu_put_be32(f, se->version_id);
-        se->save_live_state(mon, f, QEMU_VM_SECTION_START, se->opaque);
+        ret = se->save_live_state(mon, f, QEMU_VM_SECTION_START, se->opaque);
+        if (ret < 0) {
+            return ret;
+        }
     }
     if (qemu_file_has_error(f)) {
@@ -1474,6 +1481,8 @@ int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f)
                and reduces the probability that a faster changing state is
                synchronized over and over again. */
             break;
+        } else if (ret < 0) {
+            return ret;
         }
     }
@@ -1503,7 +1512,10 @@ int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f)
         qemu_put_byte(f, QEMU_VM_SECTION_END);
         qemu_put_be32(f, se->section_id);
-        se->save_live_state(mon, f, QEMU_VM_SECTION_END, se->opaque);
+        r = se->save_live_state(mon, f, QEMU_VM_SECTION_END, se->opaque);
+        if (r < 0) {
+            return r;
+        }
     }
     QTAILQ_FOREACH(se, &savevm_handlers, entry) {
[PATCH 6/6] savevm: Remove register_device_unmigratable()
Now that the save state handlers can return an error, individual drivers can cancel a migration if they hit an error or don't support it. This makes the unmigratable callback redundant. Remove it and change the only user to cancel the migration in a set_params callback, which actually happens much earlier in the migration than the point where the unmigratable flag was checked.

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---
 hw/hw.h | 2 --
 hw/ivshmem.c | 20 ++--
 savevm.c | 31 ---
 3 files changed, 14 insertions(+), 39 deletions(-)

diff --git a/hw/hw.h b/hw/hw.h
index 95f2d52..6c0aefe 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -264,8 +264,6 @@ int register_savevm_live(DeviceState *dev,
                          void *opaque);
 void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque);
-void register_device_unmigratable(DeviceState *dev, const char *idstr,
-                                  void *opaque);
 typedef void QEMUResetHandler(void *opaque);
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index 3726a7f..4164861 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -616,6 +616,18 @@ static void ivshmem_setup_msi(IVShmemState * s) {
     s->eventfd_table = qemu_mallocz(s->vectors * sizeof(EventfdEntry));
 }
+static int ivshmem_set_param(int blk_enable, int shared, void *opaque)
+{
+    IVShmemState *proxy = opaque;
+
+    if (proxy->role_val == IVSHMEM_PEER) {
+        fprintf(stderr,
+                "ivshmem device in peer role, cannot be migrated or saved\n");
+        return -EINVAL;
+    }
+    return 0;
+}
+
 static int ivshmem_save(QEMUFile* f, void *opaque)
 {
     IVShmemState *proxy = opaque;
@@ -683,8 +695,8 @@ static int pci_ivshmem_init(PCIDevice *dev)
         s->ivshmem_size = ivshmem_get_size(s);
     }
-    register_savevm(&s->dev.qdev, "ivshmem", 0, 0, ivshmem_save, ivshmem_load,
-                    dev);
+    register_savevm_live(&s->dev.qdev, "ivshmem", 0, 0, ivshmem_set_param,
+                         NULL, ivshmem_save, ivshmem_load, dev);
     /* IRQFD requires MSI */
     if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD) &&
@@ -707,10 +719,6 @@ static int pci_ivshmem_init(PCIDevice *dev)
         s->role_val = IVSHMEM_MASTER; /* default */
     }
-    if (s->role_val == IVSHMEM_PEER) {
-        register_device_unmigratable(&s->dev.qdev, "ivshmem", s);
-    }
-
     pci_conf = s->dev.config;
     pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_REDHAT_QUMRANET);
     pci_conf[0x02] = 0x10;
diff --git a/savevm.c b/savevm.c
index ad3ab86..1b4ee08 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1018,7 +1018,6 @@ typedef struct SaveStateEntry {
     const VMStateDescription *vmsd;
     void *opaque;
     CompatEntry *compat;
-    int no_migrate;
 } SaveStateEntry;
@@ -1082,7 +1081,6 @@ int register_savevm_live(DeviceState *dev,
     se->load_state = load_state;
     se->opaque = opaque;
     se->vmsd = NULL;
-    se->no_migrate = 0;
     if (dev && dev->parent_bus && dev->parent_bus->info->get_dev_path) {
         char *id = dev->parent_bus->info->get_dev_path(dev);
@@ -1149,31 +1147,6 @@ void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque)
     }
 }
-/* mark a device as not to be migrated, that is the device should be
-   unplugged before migration */
-void register_device_unmigratable(DeviceState *dev, const char *idstr,
-                                  void *opaque)
-{
-    SaveStateEntry *se;
-    char id[256] = "";
-
-    if (dev && dev->parent_bus && dev->parent_bus->info->get_dev_path) {
-        char *path = dev->parent_bus->info->get_dev_path(dev);
-        if (path) {
-            pstrcpy(id, sizeof(id), path);
-            pstrcat(id, sizeof(id), "/");
-            qemu_free(path);
-        }
-    }
-    pstrcat(id, sizeof(id), idstr);
-
-    QTAILQ_FOREACH(se, &savevm_handlers, entry) {
-        if (strcmp(se->idstr, id) == 0 && se->opaque == opaque) {
-            se->no_migrate = 1;
-        }
-    }
-}
-
 int vmstate_register_with_alias_id(DeviceState *dev, int instance_id,
                                    const VMStateDescription *vmsd,
                                    void *opaque, int alias_id,
@@ -1389,10 +1362,6 @@ static int vmstate_load(QEMUFile *f, SaveStateEntry *se, int version_id)
 static int vmstate_save(QEMUFile *f, SaveStateEntry *se)
 {
-    if (se->no_migrate) {
-        return -1;
-    }
-
     if (!se->vmsd) {         /* Old style */
         return se->save_state(f, se->opaque);
     }
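Why refusing in set_params is "much earlier" can be seen in a small model: every device is asked first, and only if none objects does any state get written. This is a sketch under assumptions, not QEMU code; struct device, enum role, device_set_params(), device_save() and migrate() are hypothetical names that only mirror the shape of ivshmem_set_param() and qemu_savevm_state_begin() in the patches above.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical roles, mirroring IVSHMEM_MASTER / IVSHMEM_PEER. */
enum role { ROLE_MASTER, ROLE_PEER };

struct device {
    enum role role;
    int bytes_written;   /* how much state this device has emitted */
};

/* set_params-style check: a peer refuses before anything is written. */
static int device_set_params(struct device *d)
{
    return d->role == ROLE_PEER ? -EINVAL : 0;
}

/* Stand-in for the real save handler: pretend to emit 4 bytes of state. */
static int device_save(struct device *d)
{
    d->bytes_written += 4;
    return 0;
}

static int migrate(struct device *devs, size_t n)
{
    int ret;

    /* Phase 1: ask every device; a NAK here costs nothing. */
    for (size_t i = 0; i < n; i++) {
        ret = device_set_params(&devs[i]);
        if (ret < 0) {
            return ret;
        }
    }

    /* Phase 2: only now does any state hit the stream. */
    for (size_t i = 0; i < n; i++) {
        ret = device_save(&devs[i]);
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}
```

In this model a peer-role device makes migrate() fail with -EINVAL while bytes_written is still zero for every device, whereas a check inside the save path would only fire after earlier devices had already written their state.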
Re: [PATCHv2] qemu-kvm/vhost: fix up irqfd support
On Wed, Oct 06, 2010 at 11:24:24AM -0600, Alex Williamson wrote:
> You could always keep the functions as separate wrapper callers of the
> common function so you only need to keep true = unset, false = set
> straight in one place. Thanks,

Just to show why it does not work, I did exactly this: as you see the code is shorter but the true/false magic gets spread: it was in 2 places (set/unset), now it is in 4 places and it is within the loop, in code that is more complex.

So I think I'll stick to the original version and we can patch it up later if there's a will.

diff --git a/hw/msix.c b/hw/msix.c
index 3d4dd61..4b705a0 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -583,40 +583,15 @@ void msix_unuse_all_vectors(PCIDevice *dev)
     msix_free_irq_entries(dev);
 }
-static int msix_set_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
+/* Invoke the notifier if vector entry is used and unmasked. */
+static int msix_notify_if_unmasked(PCIDevice *dev, unsigned vector, bool masked)
 {
     int r = 0;
-    if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
+    if (!dev->msix_entry_used[vector] || msix_is_masked(dev, vector)) {
         return 0;
-
-    assert(dev->msix_mask_notifier);
-
-    /* Unmask the new notifier unless vector is masked. */
-    if (!msix_is_masked(dev, vector)) {
-        r = dev->msix_mask_notifier(dev, vector, false);
-        if (r < 0) {
-            return r;
-        }
     }
-    return r;
-}
-
-static int msix_unset_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
-{
-    int r = 0;
-    if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
-        return 0;
-
     assert(dev->msix_mask_notifier);
-
-    /* Mask the old notifier unless it is already masked. */
-    if (!msix_is_masked(dev, vector)) {
-        r = dev->msix_mask_notifier(dev, vector, true);
-        if (r < 0) {
-            return r;
-        }
-    }
-    return r;
+    return dev->msix_mask_notifier(dev, vector, masked);
 }
 int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func f)
@@ -625,7 +600,7 @@ int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func f)
     assert(!dev->msix_mask_notifier);
     dev->msix_mask_notifier = f;
     for (n = 0; n < dev->msix_entries_nr; ++n) {
-        r = msix_set_mask_notifier_for_vector(dev, n);
+        r = msix_notify_if_unmasked(dev, n, false);
         if (r < 0) {
             goto undo;
         }
@@ -634,7 +609,7 @@ int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func f)
 undo:
     while (--n >= 0) {
-        msix_unset_mask_notifier_for_vector(dev, n);
+        msix_notify_if_unmasked(dev, n, true);
     }
     dev->msix_mask_notifier = NULL;
     return r;
@@ -645,7 +620,7 @@ int msix_unset_mask_notifier(PCIDevice *dev)
     int r, n;
     assert(dev->msix_mask_notifier);
     for (n = 0; n < dev->msix_entries_nr; ++n) {
-        r = msix_unset_mask_notifier_for_vector(dev, n);
+        r = msix_notify_if_unmasked(dev, n, true);
         if (r < 0) {
             goto undo;
         }
@@ -655,7 +630,7 @@ int msix_unset_mask_notifier(PCIDevice *dev)
 undo:
     while (--n >= 0) {
-        msix_set_mask_notifier_for_vector(dev, n);
+        msix_notify_if_unmasked(dev, n, false);
     }
     return r;
 }
Re: [PATCHv2] qemu-kvm/vhost: fix up irqfd support
On Wed, 2010-10-06 at 23:44 +0200, Michael S. Tsirkin wrote:
> On Wed, Oct 06, 2010 at 11:24:24AM -0600, Alex Williamson wrote:
>> You could always keep the functions as separate wrapper callers of the
>> common function so you only need to keep true = unset, false = set
>> straight in one place. Thanks,
>
> Just to show why it does not work, I did exactly this:
> as you see the code is shorter but the true/false
> magic gets spread: it was in 2 places, (set/unset)
> now it is in 4 places and it is within the loop,
> in code that is more complex.

You seem to have missed the wrapper function. I'm simply suggesting
something like this:

static int __do_msix_mask_notifier_for_vector(PCIDevice *dev, unsigned vector,
                                              bool mask)
{
    if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
        return 0;

    assert(dev->msix_mask_notifier);

    /* Set the new notifier unless vector is masked. */
    if (!msix_is_masked(dev, vector)) {
        return dev->msix_mask_notifier(dev, vector, mask);
    }
    return 0;
}

static int msix_set_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
{
    return __do_msix_mask_notifier_for_vector(dev, vector, false);
}

static int msix_unset_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
{
    return __do_msix_mask_notifier_for_vector(dev, vector, true);
}

Which then doesn't go on to complicate the callers like the below does.
Thanks,

Alex

> So I think I'll stick to the original version and we can patch
> it up later if there's a will.
>
> diff --git a/hw/msix.c b/hw/msix.c
> index 3d4dd61..4b705a0 100644
> --- a/hw/msix.c
> +++ b/hw/msix.c
> @@ -583,40 +583,15 @@ void msix_unuse_all_vectors(PCIDevice *dev)
>      msix_free_irq_entries(dev);
>  }
>
> -static int msix_set_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
> +/* Invoke the notifier if vector entry is used and unmasked. */
> +static int msix_notify_if_unmasked(PCIDevice *dev, unsigned vector, bool masked)
>  {
>      int r = 0;
> -    if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
> +    if (!dev->msix_entry_used[vector] || msix_is_masked(dev, vector)) {
>          return 0;
> -
> -    assert(dev->msix_mask_notifier);
> -
> -    /* Unmask the new notifier unless vector is masked. */
> -    if (!msix_is_masked(dev, vector)) {
> -        r = dev->msix_mask_notifier(dev, vector, false);
> -        if (r < 0) {
> -            return r;
> -        }
>      }
> -    return r;
> -}
> -
> -static int msix_unset_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
> -{
> -    int r = 0;
> -    if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
> -        return 0;
> -
>      assert(dev->msix_mask_notifier);
> -
> -    /* Mask the old notifier unless it is already masked. */
> -    if (!msix_is_masked(dev, vector)) {
> -        r = dev->msix_mask_notifier(dev, vector, true);
> -        if (r < 0) {
> -            return r;
> -        }
> -    }
> -    return r;
> +    return dev->msix_mask_notifier(dev, vector, masked);
>  }
>
>  int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func f)
> @@ -625,7 +600,7 @@ int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func f)
>      assert(!dev->msix_mask_notifier);
>      dev->msix_mask_notifier = f;
>      for (n = 0; n < dev->msix_entries_nr; ++n) {
> -        r = msix_set_mask_notifier_for_vector(dev, n);
> +        r = msix_notify_if_unmasked(dev, n, false);
>          if (r < 0) {
>              goto undo;
>          }
> @@ -634,7 +609,7 @@ int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func f)
>
>  undo:
>      while (--n >= 0) {
> -        msix_unset_mask_notifier_for_vector(dev, n);
> +        msix_notify_if_unmasked(dev, n, true);
>      }
>      dev->msix_mask_notifier = NULL;
>      return r;
> @@ -645,7 +620,7 @@ int msix_unset_mask_notifier(PCIDevice *dev)
>      int r, n;
>      assert(dev->msix_mask_notifier);
>      for (n = 0; n < dev->msix_entries_nr; ++n) {
> -        r = msix_unset_mask_notifier_for_vector(dev, n);
> +        r = msix_notify_if_unmasked(dev, n, true);
>          if (r < 0) {
>              goto undo;
>          }
> @@ -655,7 +630,7 @@ int msix_unset_mask_notifier(PCIDevice *dev)
>
>  undo:
>      while (--n >= 0) {
> -        msix_set_mask_notifier_for_vector(dev, n);
> +        msix_notify_if_unmasked(dev, n, false);
>      }
>      return r;
>  }

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
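For readers following the set/undo argument in this thread, the pattern can be modelled outside QEMU. What follows is only a Python sketch under stated assumptions, not QEMU code: `notify_if_unmasked`, `apply_to_all`, `failing_notifier` and the dict-based vectors are hypothetical names invented for illustration. It shows one helper that takes the masked flag, with the caller rolling back the already-notified vectors in reverse order when a notification fails, which is what the `undo:` loops in the diff do.

```python
# Hypothetical model of the helper-plus-rollback pattern discussed in the
# thread. None of these names exist in QEMU; vectors are plain dicts here.

def notify_if_unmasked(vectors, n, masked, notifier):
    """Call notifier(n, masked) only if vector n is in use and not masked."""
    v = vectors[n]
    if not v["used"] or v["masked"]:
        return 0
    return notifier(n, masked)


def apply_to_all(vectors, masked, notifier):
    """Notify every vector; on the first failure, undo the earlier ones."""
    for n in range(len(vectors)):
        r = notify_if_unmasked(vectors, n, masked, notifier)
        if r < 0:
            # Roll back vectors n-1 .. 0 with the opposite flag,
            # mirroring "while (--n >= 0)" in the C code.
            for m in range(n - 1, -1, -1):
                notify_if_unmasked(vectors, m, not masked, notifier)
            return r
    return 0


calls = []

def failing_notifier(n, masked):
    """Record each call and fail when asked to set vector 2."""
    calls.append((n, masked))
    return -1 if n == 2 and not masked else 0


vectors = [
    {"used": True, "masked": False},
    {"used": True, "masked": True},   # skipped because it is masked
    {"used": True, "masked": False},  # the notifier fails on this one
]
result = apply_to_all(vectors, False, failing_notifier)
```

In this model the true/false choice appears once per caller, which is the trade-off the two authors disagree about: a single helper is shorter, while separate wrappers keep the flag's meaning in one named place.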
[PATCH 17/19] KVM test: vlan subtest - Replace extra_params '-snapshot' with image_snapshot
From: Amos Kong ak...@redhat.com

The framework cannot combine the default extra_params with extra_params_vm1
in the following case, and this is hard to handle when parsing the config
file or calling get_sub_dict*().

    extra_params += ' str1'
    - case:
        extra_params_vm1 += "str2"

Signed-off-by: Amos Kong ak...@redhat.com
---
 client/tests/kvm/tests_base.cfg.sample |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index ceabbf1..e9cb1b4 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -467,8 +467,7 @@ variants:
         send_cmd = "nc %s %s < %s"
         nic_mode = tap
         vms += " vm2"
-        extra_params_vm1 += " -snapshot"
-        extra_params_vm2 += " -snapshot"
+        image_snapshot = yes
         kill_vm_vm2 = yes
         kill_vm_gracefully_vm2 = no
--
1.7.1
[PATCH 09/19] KVM test: Add a subtest of load/unload nic driver
Repeatedly load/unload nic driver, try to transfer file between guest and host
by threads at the same time, and check the md5sum.

Changes from v4:
- Give some time for the interface to be present after modprobe is executed.

Changes from v1:
- Use a new method to get nic driver name
- Use utils.hash_file() to get md5sum

Signed-off-by: Amos Kong ak...@redhat.com
Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tests/kvm/tests/nicdriver_unload.py |  115
 client/tests/kvm/tests_base.cfg.sample     |   10 ++-
 2 files changed, 124 insertions(+), 1 deletions(-)
 create mode 100644 client/tests/kvm/tests/nicdriver_unload.py

diff --git a/client/tests/kvm/tests/nicdriver_unload.py b/client/tests/kvm/tests/nicdriver_unload.py
new file mode 100644
index 000..47318ba
--- /dev/null
+++ b/client/tests/kvm/tests/nicdriver_unload.py
@@ -0,0 +1,115 @@
+import logging, threading, os
+from autotest_lib.client.common_lib import error
+from autotest_lib.client.bin import utils
+import kvm_utils, kvm_test_utils
+
+def run_nicdriver_unload(test, params, env):
+    """
+    Test nic driver.
+
+    1) Boot a VM.
+    2) Get the NIC driver name.
+    3) Repeatedly unload/load NIC driver.
+    4) Multi-session TCP transfer on test interface.
+    5) Check whether the test interface should still work.
+
+    @param test: KVM test object.
+    @param params: Dictionary with the test parameters.
+    @param env: Dictionary with test environment.
+    """
+    timeout = int(params.get("login_timeout", 360))
+    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
+    session = kvm_test_utils.wait_for_login(vm, timeout=timeout)
+    logging.info("Trying to log into guest '%s' by serial", vm.name)
+    session2 = kvm_utils.wait_for(lambda: vm.serial_login(),
+                                  timeout, 0, step=2)
+    if not session2:
+        raise error.TestFail("Could not log into guest '%s'" % vm.name)
+
+    ethname = kvm_test_utils.get_linux_ifname(session, vm.get_mac_address(0))
+    sys_path = "/sys/class/net/%s/device/driver" % (ethname)
+    s, o = session.get_command_status_output('readlink -e %s' % sys_path)
+    if s:
+        raise error.TestError("Could not find driver name")
+    driver = os.path.basename(o.strip())
+    logging.info("driver is %s", driver)
+
+    class ThreadScp(threading.Thread):
+        def run(self):
+            remote_file = '/tmp/' + self.getName()
+            file_list.append(remote_file)
+            ret = vm.copy_files_to(file_name, remote_file, timeout=scp_timeout)
+            if ret:
+                logging.debug("File %s was transfered successfuly", remote_file)
+            else:
+                logging.debug("Failed to transfer file %s", remote_file)
+
+    def compare(origin_file, receive_file):
+        cmd = "md5sum %s"
+        check_sum1 = utils.hash_file(origin_file, method="md5")
+        s, output2 = session.get_command_status_output(cmd % receive_file)
+        if s != 0:
+            logging.error("Could not get md5sum of receive_file")
+            return False
+        check_sum2 = output2.strip().split()[0]
+        logging.debug("original file md5: %s, received file md5: %s",
+                      check_sum1, check_sum2)
+        if check_sum1 != check_sum2:
+            logging.error("MD5 hash of origin and received files doesn't match")
+            return False
+        return True
+
+    #produce sized file in host
+    file_size = params.get("file_size")
+    file_name = "/tmp/nicdriver_unload_file"
+    cmd = "dd if=/dev/urandom of=%s bs=%sM count=1"
+    utils.system(cmd % (file_name, file_size))
+
+    file_list = []
+    connect_time = params.get("connect_time")
+    scp_timeout = int(params.get("scp_timeout"))
+    thread_num = int(params.get("thread_num"))
+    unload_load_cmd = ("sleep %s && ifconfig %s down && modprobe -r %s && "
+                       "sleep 1 && modprobe %s && sleep 4 && ifconfig %s up" %
+                       (connect_time, ethname, driver, driver, ethname))
+    pid = os.fork()
+    if pid != 0:
+        logging.info("Unload/load NIC driver repeatedly in guest...")
+        while True:
+            logging.debug("Try to unload/load nic drive once")
+            if session2.get_command_status(unload_load_cmd, timeout=120) != 0:
+                session.get_command_output("rm -rf /tmp/Thread-*")
+                raise error.TestFail("Unload/load nic driver failed")
+            pid, s = os.waitpid(pid, os.WNOHANG)
+            status = os.WEXITSTATUS(s)
+            if (pid, status) != (0, 0):
+                logging.debug("Child process ending")
+                break
+    else:
+        logging.info("Multi-session TCP data transfer")
+        threads = []
+        for i in range(thread_num):
+            t = ThreadScp()
+            t.start()
+            threads.append(t)
+        for t in threads:
+            t.join(timeout = scp_timeout)
+        os._exit(0)
+
+    session2.close()
+
+    try:
+        logging.info("Check MD5 hash for
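The compare() step in the patch above hashes the original file on the host and checks it against the first field of the guest's `md5sum` output. Here is a minimal standalone sketch of that check using only the Python standard library. It stands in `hashlib` for autotest's `utils.hash_file()`, and `md5_of_file`, `digest_from_md5sum_output`, `files_match` and the sample `/tmp/Thread-1` path are hypothetical names used for illustration, not part of the patch.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a local file in chunks so large files need not fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def digest_from_md5sum_output(output):
    """md5sum prints '<digest>  <filename>'; keep only the digest field."""
    return output.strip().split()[0]


def files_match(local_path, md5sum_output):
    """Compare the local digest with the one reported by a remote md5sum."""
    return md5_of_file(local_path) == digest_from_md5sum_output(md5sum_output)


# Demonstration with a temporary file and hand-built md5sum-style output.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example payload")
    local_path = tmp.name

good_output = "%s  /tmp/Thread-1\n" % md5_of_file(local_path)
bad_output = "%s  /tmp/Thread-1\n" % ("0" * 32)
matches = files_match(local_path, good_output)
mismatch = files_match(local_path, bad_output)
os.remove(local_path)
```

Taking only the first whitespace-separated field means the check does not depend on the remote file name that `md5sum` echoes back.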
[PATCH 16/19] KVM test: Improve vlan subtest
From: Amos Kong ak...@redhat.com

This is an enhancement of the existing vlan test. Rename vlan_tag.py to
vlan.py, which is a more reasonable name.
. Setup arp from /proc/sys/net/ipv4/conf/all/arp_ignore
. Multiple vlans exist simultaneously
. Test ping between same and different vlans
. Test by TCP data transfer, flood ping between same vlan
. Maximal plumb/unplumb vlans

Changes from v4:
- Do not use hardcoded nw interfaces

Signed-off-by: Amos Kong ak...@redhat.com
---
 client/tests/kvm/tests/vlan.py         |  185
 client/tests/kvm/tests/vlan_tag.py     |   68
 client/tests/kvm/tests_base.cfg.sample |   16 ++-
 3 files changed, 195 insertions(+), 74 deletions(-)
 create mode 100644 client/tests/kvm/tests/vlan.py
 delete mode 100644 client/tests/kvm/tests/vlan_tag.py

diff --git a/client/tests/kvm/tests/vlan.py b/client/tests/kvm/tests/vlan.py
new file mode 100644
index 000..f41ea6a
--- /dev/null
+++ b/client/tests/kvm/tests/vlan.py
@@ -0,0 +1,185 @@
+import logging, time, re
+from autotest_lib.client.common_lib import error
+import kvm_test_utils, kvm_utils
+
+def run_vlan(test, params, env):
+    """
+    Test 802.1Q vlan of NIC, config it by vconfig command.
+
+    1) Create two VMs.
+    2) Setup guests in 10 different vlans by vconfig and using hard-coded
+       ip address.
+    3) Test by ping between same and different vlans of two VMs.
+    4) Test by TCP data transfer, floop ping between same vlan of two VMs.
+    5) Test maximal plumb/unplumb vlans.
+    6) Recover the vlan config.
+
+    @param test: KVM test object.
+    @param params: Dictionary with the test parameters.
+    @param env: Dictionary with test environment.
+    """
+
+    vm = []
+    session = []
+    ifname = []
+    vm_ip = []
+    digest_origin = []
+    vlan_ip = ['', '']
+    ip_unit = ['1', '2']
+    subnet = params.get("subnet")
+    vlan_num = int(params.get("vlan_num"))
+    maximal = int(params.get("maximal"))
+    file_size = params.get("file_size")
+
+    vm.append(kvm_test_utils.get_living_vm(env, params.get("main_vm")))
+    vm.append(kvm_test_utils.get_living_vm(env, "vm2"))
+
+    def add_vlan(session, id, iface="eth0"):
+        if session.get_command_status("vconfig add %s %s" % (iface, id)) != 0:
+            raise error.TestError("Fail to add %s.%s" % (iface, id))
+
+    def set_ip_vlan(session, id, ip, iface="eth0"):
+        iface = "%s.%s" % (iface, id)
+        if session.get_command_status("ifconfig %s %s" % (iface, ip)) != 0:
+            raise error.TestError("Fail to configure ip for %s" % iface)
+
+    def set_arp_ignore(session, iface="eth0"):
+        ignore_cmd = "echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore"
+        if session.get_command_status(ignore_cmd) != 0:
+            raise error.TestError("Fail to set arp_ignore of %s" % session)
+
+    def rem_vlan(session, id, iface="eth0"):
+        rem_vlan_cmd = "if [[ -e /proc/net/vlan/%s ]];then vconfig rem %s;fi"
+        iface = "%s.%s" % (iface, id)
+        s = session.get_command_status(rem_vlan_cmd % (iface, iface))
+        return s
+
+    def nc_transfer(src, dst):
+        nc_port = kvm_utils.find_free_port(1025, 5334, vm_ip[dst])
+        listen_cmd = params.get("listen_cmd")
+        send_cmd = params.get("send_cmd")
+
+        #listen in dst
+        listen_cmd = listen_cmd % (nc_port, "receive")
+        session[dst].sendline(listen_cmd)
+        time.sleep(2)
+        #send file from src to dst
+        send_cmd = send_cmd % (vlan_ip[dst], str(nc_port), "file")
+        if session[src].get_command_status(send_cmd, timeout = 60) != 0:
+            raise error.TestFail ("Fail to send file"
+                                    " from vm%s to vm%s" % (src+1, dst+1))
+        s, o = session[dst].read_up_to_prompt(timeout=60)
+        if s != True:
+            raise error.TestFail ("Fail to receive file"
+                                    " from vm%s to vm%s" % (src+1, dst+1))
+        #check MD5 message digest of receive file in dst
+        output = session[dst].get_command_output("md5sum receive").strip()
+        digest_receive = re.findall(r'(\w+)', output)[0]
+        if digest_receive == digest_origin[src]:
+            logging.info("file succeed received in vm %s" % vlan_ip[dst])
+        else:
+            logging.info("digest_origin is %s" % digest_origin[src])
+            logging.info("digest_receive is %s" % digest_receive)
+            raise error.TestFail("File transfered differ from origin")
+        session[dst].get_command_status("rm -f receive")
+
+    for i in range(2):
+        session.append(kvm_test_utils.wait_for_login(vm[i],
+                       timeout=int(params.get("login_timeout", 360))))
+        if not session[i] :
+            raise error.TestError("Could not log into guest(vm%d)" % i)
+        logging.info("Logged in")
+
+        ifname.append(kvm_test_utils.get_linux_ifname(session[i],
+                      vm[i].get_mac_address()))
+        #get guest ip
+        vm_ip.append(vm[i].get_address())
+
+        #produce sized file
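The helper functions in the patch above all derive a sub-interface name of the form iface.id and feed it to `vconfig` and `ifconfig`. As a rough illustration, the command strings can be built and inspected without a guest session. This is a sketch only: `vlan_iface`, `vlan_commands` and the sample address 192.168.3.1 are hypothetical and do not come from the patch, while the command templates are copied from it.

```python
def vlan_iface(iface, vlan_id):
    """Name of the 802.1Q sub-interface, e.g. eth0.3."""
    return "%s.%s" % (iface, vlan_id)


def vlan_commands(iface, vlan_id, ip):
    """Return the guest shell commands used to add, address and remove a vlan."""
    sub = vlan_iface(iface, vlan_id)
    return {
        "add": "vconfig add %s %s" % (iface, vlan_id),
        "set_ip": "ifconfig %s %s" % (sub, ip),
        # Only remove the vlan if its /proc entry exists.
        "remove": "if [[ -e /proc/net/vlan/%s ]];then vconfig rem %s;fi"
                  % (sub, sub),
    }


# Sample values; the address is an assumption made for this example.
cmds = vlan_commands("eth0", 3, "192.168.3.1")
```

Building the strings in one place keeps the iface.id naming consistent between the add, configure and remove steps.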
Re: 8 NIC limit - patch - places limit at 32
* Anthony Liguori (anth...@codemonkey.ws) wrote:
> BTW, using -device, it should be possible to add a very high number of
> nics because you can specify the PCI address including a function.
>
> If this doesn't Just Work today, we should make it work.

Should work...test...mostly[1], but I don't actually know of any tools that
make use of it.

thanks,
-chris

[1] 40 worked, 48 caused guest kernel stack corruption, didn't dig in to see
why yet. Here's my simple wrapper to build up the command line:

QEMU=/home/chrisw/git/kvm/qemu-kvm/x86_64-softmmu/qemu-system-x86_64
BIOS=/home/chrisw/git/kvm/qemu-kvm/pc-bios
DISK=/home/chrisw/disk-snap1.img
SCRIPT=/home/chrisw/git/kvm/qemu-kvm/kvm/scripts/qemu-ifup

unset NETARGS
i=0
dev=4
func=0
max_dev=40
while [ $i -lt $max_dev ]
do
    unset MULTIFUNC
    if [ $(($i + 1)) -lt $max_dev -a $func -eq 0 ]; then
        MULTIFUNC=",multifunction=on"
    fi
    NETARGS="${NETARGS} -netdev type=tap,id=netdev$i,script=$SCRIPT -device virtio-net-pci,mac=52:54:00:12:34:$(printf "%.2x\n" $i),netdev=netdev$i,bus=pci.0,addr=$dev.$func$MULTIFUNC"
    i=$(($i+1))
    func=$(($func+1))
    if [ $func -eq 8 ]; then
        func=0
        dev=$(($dev+1))
    fi
done

$QEMU -L $BIOS -m 1024 -drive file=$DISK,if=virtio,boot=on $NETARGS -vnc :0
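The wrapper script above packs eight NICs into each PCI slot, starting at slot 4, and marks function 0 of a slot as multifunction whenever another NIC follows it. A small Python sketch of the same bookkeeping may make the numbering easier to check. `nic_layout` and its parameter names are hypothetical; the sketch only computes the values that the shell loop interpolates into the `-device` arguments and does not invoke QEMU.

```python
def nic_layout(max_dev=40, first_slot=4, funcs_per_slot=8):
    """Mirror the shell loop: assign each NIC a slot.function address."""
    layout = []
    dev, func = first_slot, 0
    for i in range(max_dev):
        # Function 0 gets multifunction=on only if another NIC follows it.
        multifunction = (i + 1 < max_dev) and func == 0
        layout.append({
            "id": "netdev%d" % i,
            "mac": "52:54:00:12:34:%02x" % i,
            "addr": "%d.%d" % (dev, func),
            "multifunction": multifunction,
        })
        func += 1
        if func == funcs_per_slot:
            func = 0
            dev += 1
    return layout


layout = nic_layout()
```

With the defaults, the 40 NICs fill slots 4 through 8 completely, so the last NIC sits at function 7 of slot 8.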
Re: [PATCH 00/18] Network Patchset v4
On Mon, 2010-09-27 at 18:43 -0400, Lucas Meneghel Rodrigues wrote:
> We are close to the end of this journey. Several little problems were
> fixed and we are down to some little problems:

Ok, all patches applied. Thanks to everyone that helped on this effort!

> 1 - jumbo test - tap interface name shouldn't be used to establish arp
>     static entry, use bridge name instead - need validation from akong
>     and/or jasonwang
> 2 - vlan subtest - still has some problems with it
> 3 - ethtool - find a way to install ethtool in guest using distro
>     packages - this one can be easily postponed
>
> Please give me some feedback on it.
>
> Amos Kong (11):
>   KVM test: Add a new macaddress pool algorithm
>   KVM test: Add a new subtest ping
>   KVM test: Add basic file transfer test
>   KVM test: Add a subtest of nic promisc
>   KVM test: Add a subtest of multicast
>   KVM test: Add a subtest of pxe
>   KVM test: Add a subtest of changing MAC address
>   KVM test: Add a netperf subtest
>   KVM test: kvm_utils - Add support of check if remote port free
>   KVM test: Improve vlan subtest
>   KVM test: vlan subtest - Replace extra_params '-snapshot' with image_snapshot
>
> Lucas Meneghel Rodrigues (7):
>   KVM test: Make physical_resources_check to work with MAC management
>   KVM test: Remove address_pools.cfg dependency
>   KVM test: Add a get_ifname function
>   KVM Test: Add nw related functions ping and get_linux_ifname
>   KVM test: Add a subtest jumbo
>   KVM test: Add a subtest of load/unload nic driver
>   KVM test: Add subtest of testing offload by ethtool
>
>  client/tests/kvm/address_pools.cfg.sample          |   65 --
>  client/tests/kvm/control                           |    8 -
>  client/tests/kvm/control.parallel                  |    9 -
>  client/tests/kvm/get_started.py                    |    4 +-
>  client/tests/kvm/kvm_test_utils.py                 |  130 -
>  client/tests/kvm/kvm_utils.py                      |  139 -
>  client/tests/kvm/kvm_vm.py                         |  104 +-
>  client/tests/kvm/scripts/join_mcast.py             |   37
>  client/tests/kvm/tests/ethtool.py                  |  222
>  client/tests/kvm/tests/file_transfer.py            |   58 +
>  client/tests/kvm/tests/jumbo.py                    |  136
>  client/tests/kvm/tests/mac_change.py               |   68 ++
>  client/tests/kvm/tests/multicast.py                |   91
>  client/tests/kvm/tests/netperf.py                  |   56 +
>  client/tests/kvm/tests/nic_promisc.py              |  103 +
>  client/tests/kvm/tests/nicdriver_unload.py         |  115 ++
>  client/tests/kvm/tests/physical_resources_check.py |    7 +-
>  client/tests/kvm/tests/ping.py                     |   72 +++
>  client/tests/kvm/tests/pxe.py                      |   31 +++
>  client/tests/kvm/tests/vlan.py                     |  186
>  client/tests/kvm/tests/vlan_tag.py                 |   68 --
>  client/tests/kvm/tests_base.cfg.sample             |   97 -
>  22 files changed, 1628 insertions(+), 178 deletions(-)
>  delete mode 100644 client/tests/kvm/address_pools.cfg.sample
>  create mode 100755 client/tests/kvm/scripts/join_mcast.py
>  create mode 100644 client/tests/kvm/tests/ethtool.py
>  create mode 100644 client/tests/kvm/tests/file_transfer.py
>  create mode 100644 client/tests/kvm/tests/jumbo.py
>  create mode 100644 client/tests/kvm/tests/mac_change.py
>  create mode 100644 client/tests/kvm/tests/multicast.py
>  create mode 100644 client/tests/kvm/tests/netperf.py
>  create mode 100644 client/tests/kvm/tests/nic_promisc.py
>  create mode 100644 client/tests/kvm/tests/nicdriver_unload.py
>  create mode 100644 client/tests/kvm/tests/ping.py
>  create mode 100644 client/tests/kvm/tests/pxe.py
>  create mode 100644 client/tests/kvm/tests/vlan.py
>  delete mode 100644 client/tests/kvm/tests/vlan_tag.py
Re: [patch uq/master 7/8] MCE: Relay UCR MCE to guest
(2010/10/07 3:10), Dean Nelson wrote:
> On 10/06/2010 11:05 AM, Marcelo Tosatti wrote:
>> On Wed, Oct 06, 2010 at 10:58:36AM +0900, Hidetoshi Seto wrote:
>>> I got some more question:
>>>
>>> (2010/10/05 3:54), Marcelo Tosatti wrote:
>>>> Index: qemu/target-i386/cpu.h
>>>> ===
>>>> --- qemu.orig/target-i386/cpu.h
>>>> +++ qemu/target-i386/cpu.h
>>>> @@ -250,16 +250,32 @@
>>>>  #define PG_ERROR_RSVD_MASK 0x08
>>>>  #define PG_ERROR_I_D_MASK 0x10
>>>>
>>>> -#define MCG_CTL_P (1UL<<8) /* MCG_CAP register available */
>>>> +#define MCG_CTL_P (1ULL<<8) /* MCG_CAP register available */
>>>> +#define MCG_SER_P (1ULL<<24) /* MCA recovery/new status bits */
>>>>
>>>> -#define MCE_CAP_DEF MCG_CTL_P
>>>> +#define MCE_CAP_DEF (MCG_CTL_P|MCG_SER_P)
>>>>  #define MCE_BANKS_DEF 10
>>>
>>> It seems that current kvm doesn't support SER_P, so injecting SRAO
>>> to guest will mean that guest receives VAL|UC|!PCC and RIPV event
>>> from virtual processor that doesn't have SER_P.
>>
>> Dean also noted this. I don't think it was deliberate choice to not
>> expose SER_P. Huang?
>
> In my testing, I found that MCG_SER_P was not being set (and I was
> running on a Nehalem-EX system). Injecting a MCE resulted in the guest
> entering into panic() from mce_panic(). If crash_kexec() finds a
> kexec_crash_image the system ends up rebooting, otherwise, what happens
> next requires operator intervention.

Good to know.
What I'm concerning is that if memory scrubbing SRAO event is
injected when !SER_P, linux guest with certain mce tolerant level
might grade it as UC severity and continue running with none of
panicking, killing and poisoning because of !PCC and RIPV.

Could you provide the panic message of the guest in your test?
I think it can tell me why the mce handler decided to go panic.

> When I applied a patch to the guest's kernel which forces mce_ser to be
> set, as if MCG_SER_P was set (see __mcheck_cpu_cap_init()), I found
> that when the memory page was 'owned' by a guest process, the process
> would be killed (if the page was dirty), and the guest would stay
> running. The HWPoisoned page would be sidelined and not cause any more
> issues.

Excellent.
So while guest kernel knows which page is poisoned, guest processes
are controlled not to touch the page.

... Therefore rebooting the vm and renewing kernel will lost the
information where is poisoned.

>>> I think most OSes don't expect that it can receives MCE with !PCC
>>> on traditional x86 processor without SER_P.
>>>
>>> Q1: Is it safe to expect that guests can handle such !PCC event?
>
> This might be best answered by Huang, but as I mentioned above, without
> MCG_SER_P being set, the result was an orderly system panic on the
> guest.

Though I'll wait Huang (I think he is on holiday), I believe that
system panic is just a possible option for AO (Action Optional)
event, no matter how the SER_P is.

>>> Q2: What is the expected behavior on the guest?
>
> I think I answered this above.

Yeah, thanks.

>>> Q3: What happen if guest reboots itself in response to the MCE?
>
> That depends...
>
> And the following issue also holds for a guest that is rebooted at some
> point having successfully sidelined the bad page.
>
> After the guest has panic'd, a system_reset of the guest or a restart
> initiated by crash_kexec() (called by panic() on the guest), usually
> results in the guest hanging because the bad page still belongs to
> qemu-kvm and is now being referenced by the new guest in some way.

Yes. In other words my concern about reboot is that new guest kernel
including kdump kernel might try to read the bad page. If there is
no AR-SIGBUS etc., we need some tricks to inhibit such accesses.

> (It actually may not hang, but successfully reboot and be runnable,
> with the bad page lurking in the background. It all seems to depend on
> where the bad page ends up, and whether it's ever referenced.)

I know some tough guys using their PC with buggy DIMMs :-)

> I believe there was an attempt to deal with this in kvm on the host.
> See kvm_handle_bad_page(). This function was supposed to result in the
> sending of a BUS_MCEERR_AR flavored SIGBUS by do_sigbus() to qemu-kvm
> which in theory would result in the right thing happening. But commit
> 96054569190bdec375fe824e48ca1f4e3b53dd36 prevents the signal from being
> sent. So this mechanism needs to be re-worked, and the issue remains.

Definitely.
I guess Huang has some plan or hint for rework this point.

> I would think that if the bad page can't be sidelined, such that
> the newly booting guest can't use it, then the new guest shouldn't be
> allowed to boot. But perhaps there is some merit in letting it try to
> boot and see if one gets 'lucky'.

In case of booting a real machine in real world, hardware and
firmware usually (or often) do self-test before passing control to
OS. Some platform can boot OS with degraded configuration (for
example, fewer memory) if it has trouble on its component. Some
BIOS may stop booting and show messages like please reseat [component] on