Re: [Qemu-devel] hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time
I agree it's either COW breaking or (similarly) locking pages that the guest hasn't touched yet. You can use prealloc or -rt mlock=on to avoid this problem.

Paolo

Or the new shared flag - IIRC shared VMAs don't do COW either.

Only if the problem isn't locking and zeroing of untouched pages (also, it is not upstream, is it?). Can you make a profile with perf?

With the -rt mlock=on option not set, perf top -p <qemu pid> shows:

21699 root 20 0 24.2g 24g 5312 S 0 33.8 0:24.39 qemu-system-x8

PerfTop: 95 irqs/sec kernel:17.9% us:1.1% guest kernel:47.4% guest us:32.6% exact:0.0% [1000Hz cycles], (target_pid: 15950)

  samples  pcnt   function                             DSO
  2984.00  77.8%  clear_page_c                         [kernel]
   135.00   3.5%  gup_huge_pmd                         [kernel]
   134.00   3.5%  pfn_to_dma_pte                       [kernel]
    83.00   2.2%  __domain_mapping                     [kernel]
    63.00   1.6%  update_memslots                      [kvm]
    59.00   1.5%  prep_new_page                        [kernel]
    50.00   1.3%  get_user_pages_fast                  [kernel]
    45.00   1.2%  up_read                              [kernel]
    42.00   1.1%  down_read                            [kernel]
    38.00   1.0%  gup_pud_range                        [kernel]
    34.00   0.9%  kvm_clear_async_pf_completion_queue  [kvm]
    18.00   0.5%  intel_iommu_map                      [kernel]
    16.00   0.4%  _cond_resched                        [kernel]
    16.00   0.4%  gfn_to_hva                           [kvm]
    15.00   0.4%  kvm_set_apic_base                    [kvm]
    15.00   0.4%  load_vmcs12_host_state               [kvm_intel]
    14.00   0.4%  clear_huge_page                      [kernel]
     7.00   0.2%  intel_iommu_iova_to_phys             [kernel]
     6.00   0.2%  is_error_pfn                         [kvm]
     6.00   0.2%  iommu_map                            [kernel]
     6.00   0.2%  native_write_msr_safe                [kernel]
     5.00   0.1%  find_vma                             [kernel]

With the -rt mlock=on option set, perf top -p <qemu pid> shows:

PerfTop: 326 irqs/sec kernel:17.5% us:2.8% guest kernel:37.4% guest us:42.3% exact:0.0% [1000Hz cycles], (target_pid: 25845)

  samples  pcnt   function                             DSO
   182.00  17.5%  pfn_to_dma_pte                       [kernel]
   178.00  17.1%  gup_huge_pmd                         [kernel]
    91.00   8.8%  __domain_mapping                     [kernel]
    71.00   6.8%  update_memslots                      [kvm]
    65.00   6.3%  gup_pud_range                        [kernel]
    62.00   6.0%  get_user_pages_fast                  [kernel]
    52.00   5.0%  kvm_clear_async_pf_completion_queue  [kvm]
    50.00   4.8%  down_read                            [kernel]
    37.00   3.6%  up_read                              [kernel]
    26.00   2.5%  intel_iommu_map                      [kernel]
    20.00   1.9%  native_write_msr_safe                [kernel]
    16.00   1.5%  gfn_to_hva                           [kvm]
    14.00   1.3%  load_vmcs12_host_state               [kvm_intel]
     8.00   0.8%  find_busiest_group                   [kernel]
     8.00   0.8%  _raw_spin_lock                       [kernel]
     8.00   0.8%  hrtimer_interrupt                    [kernel]
     8.00   0.8%  intel_iommu_iova_to_phys             [kernel]
     7.00   0.7%  iommu_map                            [kernel]
     6.00   0.6%  kvm_mmu_pte_write                    [kvm]
     6.00   0.6%  is_error_pfn                         [kvm]
     5.00   0.5%  kvm_set_apic_base                    [kvm]
     5.00   0.5%  clear_page_c                         [kernel]
     5.00   0.5%  iommu_iova_to_phys                   [kernel]

With the -rt mlock=on option not set, iommu_map has to allocate and clear many new pages, and the clearing is expensive. But whether or not -rt mlock=on is set, the GPA->HPA DMAR page table must still be built, and that operation is also expensive: about 1-2 seconds for 25GB of memory.

Thanks, Zhang Haoyu

Paolo
Re: [Qemu-devel] hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time
Hi, all

The VM will get stuck for a while (about 6s for a VM with 20GB memory) when attaching a pass-through PCI card to a non-pass-through VM for the first time. The reason is that the host has to build the whole VT-d GPA->HPA DMAR page table, which takes a lot of time, and during this time the qemu_global_mutex lock is held by the main thread; if the vcpu thread's ioctl returns, it is blocked waiting for the main thread to release the qemu_global_mutex lock, so the VM gets stuck.

The race between the QEMU main thread and the vcpu thread is shown below:

QEMU-main-thread                          vcpu-thread
       |                                       |
qemu_mutex_lock_iothread            qemu_mutex_lock(qemu_global_mutex)
       |                                       |
+-- loop --------------------+      +-- loop --------------------+
| qemu_mutex_unlock_iothread |      | qemu_mutex_unlock_iothread |
| poll                       |      | kvm_vcpu_ioctl(KVM_RUN)    |
| qemu_mutex_lock_iothread   |      | qemu_mutex_lock_iothread   |
|   kvm_device_pci_assign    |      |   blocked waiting for the  |
|   (about 6 sec for 20GB    |      |   main thread to release   |
|    memory)                 |      |   the qemu lock            |
+----------------------------+      +----------------------------+

Any advice?

Thanks, Zhang Haoyu

What if you detach and re-attach? Is it fast then?

Yes, because the VT-d GPA->HPA DMAR page table has already been built, there is no need to rebuild it.

If yes this means the issue is COW breaking that occurs with get_user_pages, not translation as such. Try hugepages with prealloc - does it help?

Yes, it helps a bit, but it cannot resolve the problem completely; the stall still happens.

Thanks, Zhang Haoyu
Re: [Qemu-devel] hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time
What if you detach and re-attach? Is it fast then? If yes this means the issue is COW breaking that occurs with get_user_pages, not translation as such. Try hugepages with prealloc - does it help?

I agree it's either COW breaking or (similarly) locking pages that the guest hasn't touched yet. You can use prealloc or -rt mlock=on to avoid this problem.

It gets better with -rt mlock=on, but that still cannot resolve the problem completely. VT-d and EPT do not share the GPA->HPA page table, so the VT-d GPA->HPA DMAR page table still has to be built. Although the -rt mlock=on option guarantees that all of the VM's memory has been touched before the pass-through device is attached, which makes the building faster, it still takes some time.

Thanks, Zhang Haoyu

Paolo
[Qemu-devel] hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time
Hi, all

The VM will get stuck for a while (about 6s for a VM with 20GB memory) when attaching a pass-through PCI card to a non-pass-through VM for the first time. The reason is that the host has to build the whole VT-d GPA->HPA DMAR page table, which takes a lot of time, and during this time the qemu_global_mutex lock is held by the main thread; if the vcpu thread's ioctl returns, it is blocked waiting for the main thread to release the qemu_global_mutex lock, so the VM gets stuck.

The race between the QEMU main thread and the vcpu thread is shown below:

QEMU-main-thread                          vcpu-thread
       |                                       |
qemu_mutex_lock_iothread            qemu_mutex_lock(qemu_global_mutex)
       |                                       |
+-- loop --------------------+      +-- loop --------------------+
| qemu_mutex_unlock_iothread |      | qemu_mutex_unlock_iothread |
| poll                       |      | kvm_vcpu_ioctl(KVM_RUN)    |
| qemu_mutex_lock_iothread   |      | qemu_mutex_lock_iothread   |
|   kvm_device_pci_assign    |      |   blocked waiting for the  |
|   (about 6 sec for 20GB    |      |   main thread to release   |
|    memory)                 |      |   the qemu lock            |
+----------------------------+      +----------------------------+

Any advice?

Thanks, Zhang Haoyu
Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table
On Tue, Nov 26, 2013 at 06:14:27PM +0200, Gleb Natapov wrote:
On Tue, Nov 26, 2013 at 06:05:37PM +0200, Michael S. Tsirkin wrote:
On Tue, Nov 26, 2013 at 02:56:10PM +0200, Gleb Natapov wrote:
On Tue, Nov 26, 2013 at 01:47:03PM +0100, Paolo Bonzini wrote:
On 26/11/2013 13:40, Zhanghaoyu (A) wrote:

When the guest sets irq smp_affinity, a VMEXIT occurs, the vcpu thread's ioctl returns to QEMU from the hypervisor, and the vcpu thread then asks the hypervisor to update the irq routing table. In kvm_set_irq_routing, synchronize_rcu is called, and the current vcpu thread is blocked for a long time waiting for the RCU grace period. During this period the vcpu cannot provide service to the VM, so interrupts delivered to this vcpu cannot be handled in time, and the apps running on this vcpu cannot be serviced either. This is unacceptable in some real-time scenarios, e.g. telecom.

So I want to create a single workqueue for each VM to asynchronously perform the RCU synchronization for the irq routing table, and let the vcpu thread return and VMENTRY to service the VM immediately, with no more need to block waiting for the RCU grace period. I have implemented a raw patch and tested it in our telecom environment; the above problem disappeared.

I don't think a workqueue is even needed. You just need to use call_rcu to free "old" after releasing kvm->irq_lock. What do you think?

It should be rate limited somehow. Since it is guest-triggerable, a guest may cause the host to allocate a lot of memory this way.

The checks in __call_rcu() should handle this, I think. These keep a per-CPU counter, which can be adjusted via rcutree.blimit, which defaults to taking evasive action if more than 10K callbacks are waiting on a given CPU.

Documentation/RCU/checklist.txt has:

    An especially important property of the synchronize_rcu() primitive
    is that it automatically self-limits: if grace periods are delayed
    for whatever reason, then the synchronize_rcu() primitive will
    correspondingly delay updates.
    In contrast, code using call_rcu() should explicitly limit update
    rate in cases where grace periods are delayed, as failing to do so
    can result in excessive realtime latencies or even OOM conditions.

I just asked Paul what this means. My understanding is shown below: the synchronous grace-period API synchronize_rcu() prevents the current thread from generating a large number of subsequent RCU updates, just as the self-limiting behavior described above in Documentation/RCU/checklist.txt says, and so avoids memory exhaustion; but the asynchronous API call_rcu() cannot limit the update rate, and needs explicit rate limiting.

Thanks, Zhang Haoyu

--
Gleb.
Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table
No, this would be exactly the same code that is running now:

    mutex_lock(&kvm->irq_lock);
    old = kvm->irq_routing;
    kvm_irq_routing_update(kvm, new);
    mutex_unlock(&kvm->irq_lock);
    synchronize_rcu();
    kfree(old);
    return 0;

Except that the kfree would run in the call_rcu kernel thread instead of the vcpu thread. But the vcpus already see the new routing table after the rcu_assign_pointer that is in kvm_irq_routing_update.

I understood the proposal was also to eliminate the synchronize_rcu(), so while new interrupts would see the new routing table, interrupts already in flight could pick up the old one.

Isn't that always the case with RCU? (See my answer above: the vcpus already see the new routing table after the rcu_assign_pointer that is in kvm_irq_routing_update.)

With synchronize_rcu(), you have the additional guarantee that any parallel accesses to the old routing table have completed. Since we also trigger the irq from rcu context, you know that after synchronize_rcu() you won't get any interrupts to the old destination (see kvm_set_irq_inatomic()).

We do not have this guarantee for other vcpus that do not call synchronize_rcu(). They may still use an outdated routing table while a vcpu or iothread that performed the table update sits in synchronize_rcu().

Consider this guest code:

    write msi entry, directing the interrupt away from this vcpu
    nop
    memset(idt, 0, sizeof(idt));

Currently, this code will never trigger a triple fault. With the change to call_rcu(), it may. Now it may be that the guest does not expect this to work (PCI writes are posted; and interrupts can be delayed indefinitely by the pci fabric), but we don't know if there's a path that guarantees the guest something that we're taking away with this change.

In a native environment, if a CPU's LAPIC IRR and ISR have many interrupts pending, and the OS zeroes this CPU's IDT before servicing them, will the same problem happen?

Zhang Haoyu
Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table
I don't think a workqueue is even needed. You just need to use call_rcu to free "old" after releasing kvm->irq_lock. What do you think?

It should be rate limited somehow. Since it is guest-triggerable, a guest may cause the host to allocate a lot of memory this way.

Why may using call_rcu to free "old" after releasing kvm->irq_lock cause the host to allocate a lot of memory? Do you mean that a malicious guest's frequent irq-routing-table updates will result in too many delayed mem-frees of old irq routing tables?

Thanks, Zhang Haoyu

True, though if I understand Zhanghaoyu's proposal, a workqueue would be even worse.

Paolo
Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table
I understood the proposal was also to eliminate the synchronize_rcu(), so while new interrupts would see the new routing table, interrupts already in flight could pick up the old one.

Isn't that always the case with RCU? (See my answer above: the vcpus already see the new routing table after the rcu_assign_pointer that is in kvm_irq_routing_update.)

With synchronize_rcu(), you have the additional guarantee that any parallel accesses to the old routing table have completed. Since we also trigger the irq from rcu context, you know that after synchronize_rcu() you won't get any interrupts to the old destination (see kvm_set_irq_inatomic()).

We do not have this guarantee for other vcpus that do not call synchronize_rcu(). They may still use an outdated routing table while a vcpu or iothread that performed the table update sits in synchronize_rcu().

Avi's point is that, after the VCPU resumes execution, you know that no interrupt will be sent to the old destination because kvm_set_msi_inatomic (and ultimately kvm_irq_delivery_to_apic_fast) is also called within the RCU read-side critical section. Without synchronize_rcu you could have:

    VCPU writes to routing table
                                      e = entry from IRQ routing table
    kvm_irq_routing_update(kvm, new);
    VCPU resumes execution
                                      kvm_set_msi_irq(e, irq);
                                      kvm_irq_delivery_to_apic_fast();

where the entry is stale but the VCPU has already resumed execution.

If we use call_rcu() (not considering the problem that Gleb pointed out for the moment) instead of synchronize_rcu(), should we still ensure this?

Thanks, Zhang Haoyu

If we want to ensure this, we need to use a different mechanism for synchronization than the global RCU. QRCU would work; readers are not wait-free, but only if there is a concurrent synchronize_qrcu, which should be rare.

Paolo
[Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table
Hi all,

When the guest sets irq smp_affinity, a VMEXIT occurs, the vcpu thread's ioctl returns to QEMU from the hypervisor, and the vcpu thread then asks the hypervisor to update the irq routing table. In kvm_set_irq_routing, synchronize_rcu is called, and the current vcpu thread is blocked for a long time waiting for the RCU grace period. During this period the vcpu cannot provide service to the VM, so interrupts delivered to this vcpu cannot be handled in time, and the apps running on this vcpu cannot be serviced either. This is unacceptable in some real-time scenarios, e.g. telecom.

So I want to create a single workqueue for each VM to asynchronously perform the RCU synchronization for the irq routing table, and let the vcpu thread return and VMENTRY to service the VM immediately, with no more need to block waiting for the RCU grace period. I have implemented a raw patch and tested it in our telecom environment; the above problem disappeared.

Any better ideas?

Thanks, Zhang Haoyu
[Qemu-devel] question about VM kernel parameter idle=poll/mwait/halt/nomwait
Hi, all

What's the difference between the Linux guest kernel parameters idle=poll, idle=mwait, idle=halt, and idle=nomwait, especially with respect to performance? Taking performance into account, which one is best?

In my opinion, if the number of all VMs' vcpus is far greater than the number of pcpus, e.g. in a SPECvirt test, idle=halt is better for the server's total throughput; otherwise, e.g. in some CT scenarios where the total number of vcpus is not greater than the number of pcpus, idle=poll is better for the server's total throughput because of lower latency and fewer VMEXITs. On linux-3.9 and above, idle=mwait is not recommended.

Thanks, Zhang Haoyu
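For reference, a sketch of how one of these values would be pinned inside a guest. The file paths and the grub regeneration command are distro-dependent assumptions, not part of the discussion above:

```shell
# /etc/default/grub inside the guest (hypothetical distro layout):
GRUB_CMDLINE_LINUX="... idle=halt"    # over-committed vcpus: cheap halts for the host
#GRUB_CMDLINE_LINUX="... idle=poll"   # vcpus <= pcpus: lowest wakeup latency, burns the pcpu

# Regenerate the boot config (command varies by distro), e.g.:
#   grub2-mkconfig -o /boot/grub2/grub.cfg

# After reboot, confirm the parameter took effect:
#   cat /proc/cmdline
```

The trade-off stated above applies here: idle=poll never yields the pcpu, so it only pays off when each vcpu effectively owns a pcpu.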
[Qemu-devel] [patch] avoid a bogus COMPLETED-CANCELLED transition
Avoid a bogus COMPLETED->CANCELLED transition. There is a period of time from when the COMPLETED state is set to when the migration thread exits, during which a COMPLETED->CANCELLED transition is problematic.

Signed-off-by: Zeng Junliang zengjunli...@huawei.com
Signed-off-by: Zhang Haoyu haoyu.zh...@huawei.com
---
 migration.c | 9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/migration.c b/migration.c
index 2b1ab20..fd73b97 100644
--- a/migration.c
+++ b/migration.c
@@ -326,9 +326,16 @@ void migrate_fd_error(MigrationState *s)
 
 static void migrate_fd_cancel(MigrationState *s)
 {
+    int old_state;
+
     DPRINTF("cancelling migration\n");
 
-    migrate_set_state(s, s->state, MIG_STATE_CANCELLED);
+    do {
+        old_state = s->state;
+        if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE) {
+            break;
+        }
+        migrate_set_state(s, old_state, MIG_STATE_CANCELLED);
+    } while (s->state != MIG_STATE_CANCELLED);
 }
-- 
1.7.3.1.msysgit.0
[Qemu-devel] [patch] introduce MIG_STATE_CANCELLING state
Introduce a MIG_STATE_CANCELLING state to avoid starting a new migration task while the previous one still exists.

Signed-off-by: Zeng Junliang zengjunli...@huawei.com
Signed-off-by: Zhang Haoyu haoyu.zh...@huawei.com
---
 migration.c | 26 ++++++++++++++----------
 1 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/migration.c b/migration.c
index fd73b97..af8a09c 100644
--- a/migration.c
+++ b/migration.c
@@ -40,6 +40,7 @@ enum {
     MIG_STATE_ERROR = -1,
     MIG_STATE_NONE,
     MIG_STATE_SETUP,
+    MIG_STATE_CANCELLING,
     MIG_STATE_CANCELLED,
     MIG_STATE_ACTIVE,
     MIG_STATE_COMPLETED,
@@ -196,6 +197,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
         info->has_total_time = false;
         break;
     case MIG_STATE_ACTIVE:
+    case MIG_STATE_CANCELLING:
         info->has_status = true;
         info->status = g_strdup("active");
         info->has_total_time = true;
@@ -282,6 +284,13 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 
 /* shared migration helpers */
 
+static void migrate_set_state(MigrationState *s, int old_state, int new_state)
+{
+    if (atomic_cmpxchg(&s->state, old_state, new_state) == new_state) {
+        trace_migrate_set_state(new_state);
+    }
+}
+
 static void migrate_fd_cleanup(void *opaque)
 {
     MigrationState *s = opaque;
@@ -303,18 +312,14 @@ static void migrate_fd_cleanup(void *opaque)
 
     if (s->state != MIG_STATE_COMPLETED) {
         qemu_savevm_state_cancel();
+        if (s->state == MIG_STATE_CANCELLING) {
+            migrate_set_state(s, MIG_STATE_CANCELLING, MIG_STATE_CANCELLED);
+        }
     }
 
     notifier_list_notify(&migration_state_notifiers, s);
 }
 
-static void migrate_set_state(MigrationState *s, int old_state, int new_state)
-{
-    if (atomic_cmpxchg(&s->state, old_state, new_state) == new_state) {
-        trace_migrate_set_state(new_state);
-    }
-}
-
 void migrate_fd_error(MigrationState *s)
 {
     DPRINTF("setting error state\n");
@@ -334,8 +339,8 @@ static void migrate_fd_cancel(MigrationState *s)
         if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE) {
             break;
         }
-        migrate_set_state(s, old_state, MIG_STATE_CANCELLED);
-    } while (s->state != MIG_STATE_CANCELLED);
+        migrate_set_state(s, old_state, MIG_STATE_CANCELLING);
+    } while (s->state != MIG_STATE_CANCELLING);
 }
 
 void add_migration_state_change_notifier(Notifier *notify)
@@ -412,7 +417,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     params.blk = has_blk && blk;
     params.shared = has_inc && inc;
 
-    if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP) {
+    if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP ||
+        s->state == MIG_STATE_CANCELLING) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
         return;
     }
-- 
1.7.3.1.msysgit.0
Re: [Qemu-devel] [migration] questions about removing the old block-migration code
I read the words below in the KVM Live Migration report, "Weather forecast" (May 29, 2013):

    We were going to remove the old block-migration code
    Then people fixed it
    Good: it works now
    Bad: We have to maintain both
    It uses the same port as migration
    You need to migrate all/none of block devices

The old block-migration code mentioned above is the code in block-migration.c?

Yes.

What are the reasons for removing the old block-migration code? A buggy implementation? Or the need to migrate all/none of the block devices?

Buggy and tightly coupled with the live migration code, making it hard to modify either area independently.

Thanks a lot for explaining. Till now, we still use the old block-migration code in our virtualization solution. Could you detail the bugs that the old block-migration code has?

Thanks, Zhang Haoyu

What's the substitute method? drive_mirror?

drive-mirror over NBD is an alternative. There are security and integration challenges with those approaches, but libvirt has added drive-mirror block migration support.

Stefan
Re: [Qemu-devel] [PATCH] migration: avoid starting a new migration task while the previous one still exist
Avoid starting a new migration task while the previous one still exists.

Can you explain how to reproduce the problem?

When a network disconnection between source and destination happens, the migration thread gets stuck at the stack below. Then I cancel the migration task; the migration state in qemu is set to MIG_STATE_CANCELLED, so the migration job in libvirt quits. Then I perform migration again; at this time the network has reconnected successfully, but because of TCP retransmission timeouts the stack above does not return immediately, so two migration tasks exist at the same time. Even worse, the source qemu will crash, because the latter migration task dereferences a NULL pointer in the qemu_bh_schedule(s->cleanup_bh); statement, since s->cleanup_bh had been deleted by the previous migration task.

Thanks for explaining. CANCELLING looks like a useful addition. Why do you need both CANCELLING and COMPLETING? The COMPLETED state should be set only after all I/O is done.

There is a period of time from when the COMPLETED state is set to when the migration task exits, so the COMPLETED->CANCELLED transition is problematic; but if your proposal below is applied, the problem goes away.

    do {
        old_state = s->state;
        if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE) {
            break;
        }
        migrate_set_state(s, old_state, MIG_STATE_CANCELLED);
    } while (s->state != MIG_STATE_CANCELLED);

I agree with Eric that the CANCELLING state should not be exposed via QMP. "info migrate" and query-migrate can keep showing "active" for maximum backwards compatibility. More comments below.

    -    if (s->state != MIG_STATE_COMPLETED) {
    +    if (s->state != MIG_STATE_COMPLETING) {
             qemu_savevm_state_cancel();
    +        if (s->state == MIG_STATE_CANCELLING) {
    +            migrate_set_state(s, MIG_STATE_CANCELLING, MIG_STATE_CANCELLED);
    +        }

I think you can remove the if and unconditionally call migrate_set_state.

Do you mean to remove the if (s->state == MIG_STATE_CANCELLING)? The s->state probably is MIG_STATE_ERROR here; is it okay to unconditionally call migrate_set_state?

Thanks, Zhang Haoyu

    +    } else {
    +        migrate_set_state(s, MIG_STATE_COMPLETING, MIG_STATE_COMPLETED);
         }
         notifier_list_notify(&migration_state_notifiers, s);
     }
    
    -static void migrate_set_state(MigrationState *s, int old_state, int new_state)
    -{
    -    if (atomic_cmpxchg(&s->state, old_state, new_state) == new_state) {
    -        trace_migrate_set_state(new_state);
    -    }
    -}
    -
     void migrate_fd_error(MigrationState *s)
     {
         DPRINTF("setting error state\n");
    @@ -328,7 +337,7 @@ static void migrate_fd_cancel(MigrationState *s)
     {
         DPRINTF("cancelling migration\n");
    
    -    migrate_set_state(s, s->state, MIG_STATE_CANCELLED);
    +    migrate_set_state(s, s->state, MIG_STATE_CANCELLING);

Here probably we want something like

    do {
        old_state = s->state;
        if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE) {
            break;
        }
        migrate_set_state(s, old_state, MIG_STATE_CANCELLING);
    } while (s->state != MIG_STATE_CANCELLING);

to avoid a bogus COMPLETED->CANCELLED transition.

Please separate the patch in two parts: (1) the first uses the above code, with CANCELLED instead of CANCELLING; (2) the second, similar to the one you have posted, introduces the new CANCELLING state.

Thanks, Paolo
Re: [Qemu-devel] About the IO-mirroring functionality inside the qemu
Hi all,

Does QEMU have a storage migration tool like the IO-mirroring inside VMware? IO-mirroring means that all IOs are sent to both source and destination at the same time.

drive_mirror may be your choice.

Thanks, Zhang Haoyu

Thanks!
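For reference, a drive-mirror invocation over QMP might look like the following. The device name and target path are made-up examples, and the exact argument set varies by QEMU version, so consult the QMP schema of the version in use:

```json
{ "execute": "drive-mirror",
  "arguments": { "device": "drive-virtio-disk0",
                 "target": "/mnt/dest/mirror.qcow2",
                 "format": "qcow2",
                 "sync": "full" } }
```

With "sync": "full" the whole disk is copied while new guest writes are mirrored to the target; once the job converges, the mirror can be completed or the target handed over, which is how libvirt implements block migration on top of this command.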
[Qemu-devel] [PATCH] migration: avoid starting a new migration task while the previous one still exist
Avoid starting a new migration task while the previous one still exists.

Signed-off-by: Zeng Junliang zengjunli...@huawei.com
---
 migration.c | 34 ++++++++++++++++++++++------------
 1 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/migration.c b/migration.c
index 2b1ab20..ab4c439 100644
--- a/migration.c
+++ b/migration.c
@@ -40,8 +40,10 @@ enum {
     MIG_STATE_ERROR = -1,
     MIG_STATE_NONE,
     MIG_STATE_SETUP,
+    MIG_STATE_CANCELLING,
     MIG_STATE_CANCELLED,
     MIG_STATE_ACTIVE,
+    MIG_STATE_COMPLETING,
     MIG_STATE_COMPLETED,
 };
 
@@ -196,6 +198,8 @@ MigrationInfo *qmp_query_migrate(Error **errp)
         info->has_total_time = false;
         break;
     case MIG_STATE_ACTIVE:
+    case MIG_STATE_CANCELLING:
+    case MIG_STATE_COMPLETING:
         info->has_status = true;
         info->status = g_strdup("active");
         info->has_total_time = true;
@@ -282,6 +286,13 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 
 /* shared migration helpers */
 
+static void migrate_set_state(MigrationState *s, int old_state, int new_state)
+{
+    if (atomic_cmpxchg(&s->state, old_state, new_state) == new_state) {
+        trace_migrate_set_state(new_state);
+    }
+}
+
 static void migrate_fd_cleanup(void *opaque)
 {
     MigrationState *s = opaque;
@@ -301,20 +312,18 @@ static void migrate_fd_cleanup(void *opaque)
 
     assert(s->state != MIG_STATE_ACTIVE);
 
-    if (s->state != MIG_STATE_COMPLETED) {
+    if (s->state != MIG_STATE_COMPLETING) {
         qemu_savevm_state_cancel();
+        if (s->state == MIG_STATE_CANCELLING) {
+            migrate_set_state(s, MIG_STATE_CANCELLING, MIG_STATE_CANCELLED);
+        }
+    } else {
+        migrate_set_state(s, MIG_STATE_COMPLETING, MIG_STATE_COMPLETED);
     }
 
     notifier_list_notify(&migration_state_notifiers, s);
 }
 
-static void migrate_set_state(MigrationState *s, int old_state, int new_state)
-{
-    if (atomic_cmpxchg(&s->state, old_state, new_state) == new_state) {
-        trace_migrate_set_state(new_state);
-    }
-}
-
 void migrate_fd_error(MigrationState *s)
 {
     DPRINTF("setting error state\n");
@@ -328,7 +337,7 @@ static void migrate_fd_cancel(MigrationState *s)
 {
     DPRINTF("cancelling migration\n");
 
-    migrate_set_state(s, s->state, MIG_STATE_CANCELLED);
+    migrate_set_state(s, s->state, MIG_STATE_CANCELLING);
 }
 
 void add_migration_state_change_notifier(Notifier *notify)
@@ -405,7 +414,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     params.blk = has_blk && blk;
     params.shared = has_inc && inc;
 
-    if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP) {
+    if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP ||
+        s->state == MIG_STATE_COMPLETING || s->state == MIG_STATE_CANCELLING) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
         return;
     }
@@ -594,7 +604,7 @@ static void *migration_thread(void *opaque)
         }
 
         if (!qemu_file_get_error(s->file)) {
-            migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_COMPLETED);
+            migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_COMPLETING);
             break;
         }
     }
@@ -634,7 +644,7 @@ static void *migration_thread(void *opaque)
     }
 
     qemu_mutex_lock_iothread();
-    if (s->state == MIG_STATE_COMPLETED) {
+    if (s->state == MIG_STATE_COMPLETING) {
         int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
         s->total_time = end_time - s->total_time;
         s->downtime = end_time - start_time;
-- 
1.7.3.1.msysgit.0

BTW, when an error happens during migration, do we need an ERRORING state to avoid starting a new migration task while the current migration task still exists? And do the newly added migration states need to be reported to libvirt?
Re: [Qemu-devel] [PATCH] migration: avoid starting a new migration task while the previous one still exist
Avoid starting a new migration task while the previous one still exists.

Can you explain how to reproduce the problem?

When a network disconnection between source and destination happens, the migration thread gets stuck at the stack below:

#0 0x7f07e96c8288 in writev () from /lib64/libc.so.6
#1 0x7f07eb9bf11d in unix_writev_buffer (opaque=0x7f07eca2de80, iov=0x7f07ede9b1e0, iovcnt=64, pos=259870577) at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/savevm.c:354
#2 0x7f07eb9bf999 in qemu_fflush (f=0x7f07ede931b0) at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/savevm.c:600
#3 0x7f07eb9c011f in add_to_iovec (f=0x7f07ede931b0, buf=0x7f000ee23000, size=4096) at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/savevm.c:756
#4 0x7f07eb9c01c0 in qemu_put_buffer_async (f=0x7f07ede931b0, buf=0x7f000ee23000, size=4096) at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/savevm.c:772
#5 0x7f07eb92ad2f in ram_save_block (f=0x7f07ede931b0, last_stage=false) at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/arch_init.c:493
#6 0x7f07eb92b30c in ram_save_iterate (f=0x7f07ede931b0, opaque=0x0) at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/arch_init.c:654
#7 0x7f07eb9c2e12 in qemu_savevm_state_iterate (f=0x7f07ede931b0) at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/savevm.c:1914
#8 0x7f07eb8975e1 in migration_thread (opaque=0x7f07ebf53300 current_migration.25325) at migration.c:578

Then I cancel the migration task; the migration state in qemu is set to MIG_STATE_CANCELLED, so the migration job in libvirt quits. Then I perform migration again; at this time the network has reconnected successfully, but because of TCP retransmission timeouts the stack above does not return immediately, so two migration tasks exist at the same time. Even worse, the source qemu will crash, because the latter migration task dereferences a NULL pointer in the qemu_bh_schedule(s->cleanup_bh); statement, since s->cleanup_bh had been deleted by the previous migration task.

Also please use pbonz...@redhat.com instead. My Gmail address is an implementation detail. :)

Signed-off-by: Zeng Junliang zengjunli...@huawei.com

It looks like the author of the patch is not the same as you. If so, you need to make Zeng Junliang the author (using --author on the git commit command line) and add your own signoff line.

So sorry for my poor experience.

Paolo

Avoid starting a new migration task while the previous one still exists.

Signed-off-by: Zeng Junliang zengjunli...@huawei.com
Signed-off-by: Zhang Haoyu haoyu.zh...@huawei.com
---
 migration.c | 34 ++++++++++++++++++++++------------
 1 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/migration.c b/migration.c
index 2b1ab20..ab4c439 100644
--- a/migration.c
+++ b/migration.c
@@ -40,8 +40,10 @@ enum {
     MIG_STATE_ERROR = -1,
     MIG_STATE_NONE,
     MIG_STATE_SETUP,
+    MIG_STATE_CANCELLING,
     MIG_STATE_CANCELLED,
     MIG_STATE_ACTIVE,
+    MIG_STATE_COMPLETING,
     MIG_STATE_COMPLETED,
 };
 
@@ -196,6 +198,8 @@ MigrationInfo *qmp_query_migrate(Error **errp)
         info->has_total_time = false;
         break;
     case MIG_STATE_ACTIVE:
+    case MIG_STATE_CANCELLING:
+    case MIG_STATE_COMPLETING:
         info->has_status = true;
         info->status = g_strdup("active");
         info->has_total_time = true;
@@ -282,6 +286,13 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 
 /* shared migration helpers */
 
+static void migrate_set_state(MigrationState *s, int old_state, int new_state)
+{
+    if (atomic_cmpxchg(&s->state, old_state, new_state) == new_state) {
+        trace_migrate_set_state(new_state);
+    }
+}
+
 static void migrate_fd_cleanup(void *opaque)
 {
     MigrationState *s = opaque;
@@ -301,20 +312,18 @@ static void migrate_fd_cleanup(void *opaque)
 
     assert(s->state != MIG_STATE_ACTIVE);
 
-    if (s->state != MIG_STATE_COMPLETED) {
+    if (s->state != MIG_STATE_COMPLETING) {
         qemu_savevm_state_cancel();
+        if (s->state == MIG_STATE_CANCELLING) {
+            migrate_set_state(s, MIG_STATE_CANCELLING, MIG_STATE_CANCELLED);
+        }
+    } else {
+        migrate_set_state(s, MIG_STATE_COMPLETING, MIG_STATE_COMPLETED);
     }
 
     notifier_list_notify(&migration_state_notifiers, s);
 }
 
-static void migrate_set_state(MigrationState *s, int old_state, int new_state)
-{
-    if (atomic_cmpxchg(&s->state, old_state, new_state) == new_state) {
-        trace_migrate_set_state(new_state);
-    }
-}
-
 void migrate_fd_error(MigrationState *s)
 {
     DPRINTF("setting error state\n");
@@ -328,7 +337,7 @@ static void migrate_fd_cancel(MigrationState *s)
 {
     DPRINTF("cancelling migration\n");
 
-    migrate_set_state(s, s->state, MIG_STATE_CANCELLED);
+    migrate_set_state(s, s->state, MIG_STATE_CANCELLING);
 }
 
 void add_migration_state_change_notifier(Notifier *notify)
@@ -405,7 +414,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
[Qemu-devel] [migration] questions about removing the old block-migration code
Hi, Juan I read the words below in the report of KVM Live Migration: Weather forecast (May 29, 2013): we were going to remove the old block-migration code, then people fixed it. Good: it works now. Bad: we have to maintain both; it uses the same port as migration; you need to migrate all/none of block devices. Is the old block-migration code mentioned above the code in block-migration.c? What are the reasons for removing the old block-migration code? A buggy implementation? Or the need to migrate all/none of block devices? What is the substitute method? drive_mirror? Thanks, Zhang Haoyu
Re: [Qemu-devel] [RESEND][PATCH] migration: drop MADVISE_DONT_NEED for incoming zero pages
The comments of ram_handle_compressed need to be changed accordingly: do not memset pages to zero if they already read as zero, to avoid allocating zero pages and consuming memory unnecessarily. Thanks, Zhang Haoyu The madvise for zeroed out pages was introduced when every transferred zero page was memset to zero and thus allocated. Since commit 211ea740 we check for zeroness of a target page before we memset it to zero. Additionally we mmap target memory so it is essentially zero initialized (except for e.g. option ROMs and BIOS which are loaded into target memory although they shouldn't be). It was reported recently that this madvise causes a performance degradation in some situations. As the madvise should only be called rarely, and if it is called it is likely on a busy page (it was non-zero and changed to zero during migration), drop it completely.

Reported-by: Zhang Haoyu haoyu.zh...@huawei.com
Acked-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Peter Lieven p...@kamp.de
---
 arch_init.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 7545d96..e0acbc5 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -850,14 +850,6 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
 {
     if (ch != 0 || !is_zero_range(host, size)) {
         memset(host, ch, size);
-#ifndef _WIN32
-        if (ch == 0 && (!kvm_enabled() || kvm_has_sync_mmu())) {
-            size &= ~(getpagesize() - 1);
-            if (size > 0) {
-                qemu_madvise(host, size, QEMU_MADV_DONTNEED);
-            }
-        }
-#endif
     }
 }
--
1.7.9.5
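The zero-page fast path that makes the madvise unnecessary can be sketched as follows. This is a simplified model, not the actual QEMU implementation (QEMU's is_zero_range is optimized); the point is that a zero page arriving over never-touched (hence zero) destination memory triggers no write at all, so the page is never faulted in or allocated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Naive stand-in for QEMU's is_zero_range(). */
static bool is_zero_range_sketch(const uint8_t *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Mirrors ram_handle_compressed after the patch: write only when needed.
 * For a zero page over already-zero memory this does nothing, so the
 * destination page stays unallocated. */
static void handle_compressed_sketch(uint8_t *host, uint8_t ch, size_t size)
{
    if (ch != 0 || !is_zero_range_sketch(host, size)) {
        memset(host, ch, size);   /* the only place the page is dirtied */
    }
}
```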
[Qemu-devel] migration: question about the buggy implementation of traditional live migration with storage that migrates the storage iteratively
Hi, all Could someone explain in detail the buggy implementation of traditional live migration with storage, which migrates the storage iteratively? Thanks, Zhang Haoyu hi Michal, I used libvirt-1.0.3 and ran the command below to perform live migration; why is no progress shown? virsh migrate --live --verbose --copy-storage-all domain qemu+tcp://dest ip/system If I replace libvirt-1.0.3 with libvirt-1.0.2, the migration progress shows up; if I perform migration without --copy-storage-all, the migration progress shows up, too. Thanks, Zhang Haoyu Because since 1.0.3 we are using NBD to migrate storage. Truth is, qemu is reporting progress of storage migration, however, there is no generic formula to combine storage migration and internal state migration into one number. With NBD the process is something like this: How does one use NBD to migrate storage? Does the NBD server on the destination start automatically as soon as migration is initiated, or is some other configuration needed? What are the advantages of using NBD to migrate storage over the traditional method that migrates the storage iteratively, just like the way memory is migrated? Sorry for my poor knowledge of NBD, which I used to implement shared storage for live migration without storage. NBD is used whenever both src and dst of migration are new enough to use it. That is, libvirt >= 1.0.3 and qemu >= 1.0.3. The NBD is turned on by libvirt whenever the conditions are met. User has no control over this. The advantages are: only specified disks can be transferred (currently not supported in libvirt), the previous implementation was buggy (according to some qemu developers), and the storage is migrated via a separate channel (a new connection) so it may become possible (in the future) to split migration of RAM + internal state and storage. So frankly speaking, there's no real advantage for users now - besides not using the buggy implementation. Michal
Re: [Qemu-devel] why no progress is shown after introducing NBD migration cookie
Hi, all Could someone explain in detail the buggy implementation of the traditional storage-migration method that migrates the storage iteratively? Thanks, Zhang Haoyu hi Michal, I used libvirt-1.0.3 and ran the command below to perform live migration; why is no progress shown? virsh migrate --live --verbose --copy-storage-all domain qemu+tcp://dest ip/system If I replace libvirt-1.0.3 with libvirt-1.0.2, the migration progress shows up; if I perform migration without --copy-storage-all, the migration progress shows up, too. Thanks, Zhang Haoyu Because since 1.0.3 we are using NBD to migrate storage. Truth is, qemu is reporting progress of storage migration, however, there is no generic formula to combine storage migration and internal state migration into one number. With NBD the process is something like this: How does one use NBD to migrate storage? Does the NBD server on the destination start automatically as soon as migration is initiated, or is some other configuration needed? What are the advantages of using NBD to migrate storage over the traditional method that migrates the storage iteratively, just like the way memory is migrated? Sorry for my poor knowledge of NBD, which I used to implement shared storage for live migration without storage. NBD is used whenever both src and dst of migration are new enough to use it. That is, libvirt >= 1.0.3 and qemu >= 1.0.3. The NBD is turned on by libvirt whenever the conditions are met. User has no control over this. The advantages are: only specified disks can be transferred (currently not supported in libvirt), the previous implementation was buggy (according to some qemu developers), and the storage is migrated via a separate channel (a new connection) so it may become possible (in the future) to split migration of RAM + internal state and storage. So frankly speaking, there's no real advantage for users now - besides not using the buggy implementation.
Michal BTW: It's better to ask this kind of info on the libvir-list next time; others might contribute much more info as well (e.g. some qemu developers tend to watch the libvir-list too).
[Qemu-devel] [PATCH] rdma: fix multiple VMs parallel migration
When several VMs migrate with RDMA at the same time, the increased pressure causes packet loss probabilistically and makes source and destination wait for each other. Some VMs might be blocked during the migration. Fix the bug by using two completion queues, for sending and receiving respectively.

Signed-off-by: Frank Yang frank.yang...@gmail.com
---
 migration-rdma.c | 58 +++++++++++++++++++++++++++++++++---------------
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/migration-rdma.c b/migration-rdma.c
index f94f3b4..33e8a92 100644
--- a/migration-rdma.c
+++ b/migration-rdma.c
@@ -363,7 +363,8 @@ typedef struct RDMAContext {
     struct ibv_qp *qp;                      /* queue pair */
     struct ibv_comp_channel *comp_channel;  /* completion channel */
     struct ibv_pd *pd;                      /* protection domain */
-    struct ibv_cq *cq;                      /* completion queue */
+    struct ibv_cq *send_cq;                 /* send completion queue */
+    struct ibv_cq *recv_cq;                 /* receive completion queue */

     /*
      * If a previous write failed (perhaps because of a failed
@@ -1008,13 +1009,15 @@ static int qemu_rdma_alloc_pd_cq(RDMAContext *rdma)
     }

     /*
-     * Completion queue can be filled by both read and write work requests,
-     * so must reflect the sum of both possible queue sizes.
+     * Send completion queue is filled by both send and write work requests,
+     * Receive completion queue is filled by receive work requests.
      */
-    rdma->cq = ibv_create_cq(rdma->verbs, (RDMA_SIGNALED_SEND_MAX * 3),
+    rdma->send_cq = ibv_create_cq(rdma->verbs, (RDMA_SIGNALED_SEND_MAX * 2),
                              NULL, rdma->comp_channel, 0);
-    if (!rdma->cq) {
-        fprintf(stderr, "failed to allocate completion queue\n");
+    rdma->recv_cq = ibv_create_cq(rdma->verbs, RDMA_SIGNALED_SEND_MAX, NULL,
+                                  rdma->comp_channel, 0);
+    if (!rdma->send_cq || !rdma->recv_cq) {
+        fprintf(stderr, "failed to allocate completion queues\n");
         goto err_alloc_pd_cq;
     }

@@ -1045,8 +1048,8 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma)
     attr.cap.max_recv_wr = 3;
     attr.cap.max_send_sge = 1;
     attr.cap.max_recv_sge = 1;
-    attr.send_cq = rdma->cq;
-    attr.recv_cq = rdma->cq;
+    attr.send_cq = rdma->send_cq;
+    attr.recv_cq = rdma->recv_cq;
     attr.qp_type = IBV_QPT_RC;

     ret = rdma_create_qp(rdma->cm_id, rdma->pd, &attr);
@@ -1366,13 +1369,18 @@ static void qemu_rdma_signal_unregister(RDMAContext *rdma, uint64_t index,
  * Return the work request ID that completed.
  */
 static uint64_t qemu_rdma_poll(RDMAContext *rdma, uint64_t *wr_id_out,
-                               uint32_t *byte_len)
+                               uint32_t *byte_len, int wrid_requested)
 {
     int ret;
     struct ibv_wc wc;
     uint64_t wr_id;

-    ret = ibv_poll_cq(rdma->cq, 1, &wc);
+    if (wrid_requested == RDMA_WRID_RDMA_WRITE ||
+        wrid_requested == RDMA_WRID_SEND_CONTROL) {
+        ret = ibv_poll_cq(rdma->send_cq, 1, &wc);
+    } else if (wrid_requested >= RDMA_WRID_RECV_CONTROL) {
+        ret = ibv_poll_cq(rdma->recv_cq, 1, &wc);
+    }

     if (!ret) {
         *wr_id_out = RDMA_WRID_NONE;
@@ -1465,12 +1473,9 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma, int wrid_requested,
     void *cq_ctx;
     uint64_t wr_id = RDMA_WRID_NONE, wr_id_in;

-    if (ibv_req_notify_cq(rdma->cq, 0)) {
-        return -1;
-    }
     /* poll cq first */
     while (wr_id != wrid_requested) {
-        ret = qemu_rdma_poll(rdma, &wr_id_in, &byte_len);
+        ret = qemu_rdma_poll(rdma, &wr_id_in, &byte_len, wrid_requested);
         if (ret < 0) {
             return ret;
         }
@@ -1492,6 +1497,17 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma, int wrid_requested,
     }

     while (1) {
+        if (wrid_requested == RDMA_WRID_RDMA_WRITE ||
+            wrid_requested == RDMA_WRID_SEND_CONTROL) {
+            if (ibv_req_notify_cq(rdma->send_cq, 0)) {
+                return -1;
+            }
+        } else if (wrid_requested >= RDMA_WRID_RECV_CONTROL) {
+            if (ibv_req_notify_cq(rdma->recv_cq, 0)) {
+                return -1;
+            }
+        }
+
         /*
          * Coroutine doesn't start until process_incoming_migration()
          * so don't yield unless we know we're running inside of a coroutine.
@@ -1512,7 +1528,7 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma, int wrid_requested,
         }

         while (wr_id != wrid_requested) {
-            ret = qemu_rdma_poll(rdma, &wr_id_in, &byte_len);
+            ret = qemu_rdma_poll(rdma, &wr_id_in, &byte_len, wrid_requested);
             if (ret < 0) {
                 goto err_block_for_wrid;
             }
@@ -2241,9 +2257,13 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
         rdma_destroy_qp(rdma->cm_id);
         rdma->qp = NULL;
     }
-    if (rdma->cq) {
-        ibv_destroy_cq(rdma->cq);
-        rdma->cq = NULL;
+
Re: [Qemu-devel] [RFC] sync NIC's MAC maintained in NICConf as soon as emulated NIC's MAC changed in guest
Hi, all Do live migration if emulated NIC's MAC has been changed, RARP with wrong MAC address will broadcast via qemu_announce_self in destination, so, long time network disconnection probably happen. Good catch. I want to do below works to resolve this problem, 1. change NICConf's MAC as soon as emulated NIC's MAC changed in guest This will make it impossible to revert it correctly on reset, won't it? You are right. virsh reboot domain, or virsh reset domain, or reboot VM from guest, will revert emulated NIC's MAC to original one maintained in NICConf. During the reboot/reset flow in qemu, emulated NIC's reset handler will sync the MAC address in NICConf to the MAC address in emulated NIC structure, e.g., virtio_net_reset sync the MAC address in NICConf to VirtIONet'mac. BTW, in native scenario, reboot will revert the changed MAC to original one, too. 2. sync NIC's (more precisely, queue) MAC to corresponding NICConf in NIC's migration load handler Any better ideas? Thanks, Zhang Haoyu I think announce needs to poke at the current MAC instead of the default one in NICConf. We can make it respect link down state while we are at it. NICConf structures are incorporated in different emulated NIC's structure, e.g., VirtIONet, E1000State_st, RTL8139State, etc., since so many kinds of emulated NICs, they are described by different structures, how to find all NICs' current MAC? Maybe we can introduce a pointer member 'current_mac' to NICConf structure, which points to the current MAC, then we can find all current MACs from NICConf.current_mac. Can we broadcast the RARP with current MAC in NIC's migration load handler respectively? Thanks, Zhang Haoyu Happily recent linux guests aren't affected since they do announcements from guest. -- MST
Re: [Qemu-devel] [RFC] sync NIC's MAC maintained in NICConf as soon as emulated NIC's MAC changed in guest
Hi, all Do live migration if emulated NIC's MAC has been changed, RARP with wrong MAC address will broadcast via qemu_announce_self in destination, so, long time network disconnection probably happen. Good catch. I want to do below works to resolve this problem, 1. change NICConf's MAC as soon as emulated NIC's MAC changed in guest This will make it impossible to revert it correctly on reset, won't it? You are right. virsh reboot domain, or virsh reset domain, or reboot VM from guest, will revert emulated NIC's MAC to original one maintained in NICConf. During the reboot/reset flow in qemu, emulated NIC's reset handler will sync the MAC address in NICConf to the MAC address in emulated NIC structure, e.g., virtio_net_reset sync the MAC address in NICConf to VirtIONet'mac. BTW, in native scenario, reboot will revert the changed MAC to original one, too. 2. sync NIC's (more precisely, queue) MAC to corresponding NICConf in NIC's migration load handler Any better ideas? Thanks, Zhang Haoyu I think announce needs to poke at the current MAC instead of the default one in NICConf. We can make it respect link down state while we are at it. NICConf structures are incorporated in different emulated NIC's structure, e.g., VirtIONet, E1000State_st, RTL8139State, etc., since so many kinds of emulated NICs, they are described by different structures, how to find all NICs' current MAC? Maybe we can introduce a pointer member 'current_mac' to NICConf structure, which points to the current MAC, then we can find all current MACs from NICConf.current_mac. I wouldn't make it a pointer, just a buffer with the mac, copy it there. Maybe call it softmac that's what it is really. Can we broadcast the RARP with current MAC in NIC's migration load handler respectively? Thanks, Zhang Haoyu It's not so simple, you need to retry several times. Could you make a statement for 'retry several times' ? Is it the process of retrying several times to sending RARP in qemu_announce_self_once? 
'broadcast the RARP with current MAC in NIC's migration load handler respectively' means distributing the job of what qemu_announce_self does to every NIC's migration load handler, e.g., in virtio NIC's migration load handler virtio_net_load, we can create a timer to retry several times to send RARP with the current MAC for this NIC, just as qemu_announce_self does. Happily recent linux guests aren't affected since they do announcements from guest. -- MST
Re: [Qemu-devel] [RFC] sync NIC's MAC maintained in NICConf as soon as emulated NIC's MAC changed in guest
Hi, all Do live migration if emulated NIC's MAC has been changed, RARP with wrong MAC address will broadcast via qemu_announce_self in destination, so, long time network disconnection probably happen. Good catch. I want to do below works to resolve this problem, 1. change NICConf's MAC as soon as emulated NIC's MAC changed in guest This will make it impossible to revert it correctly on reset, won't it? You are right. virsh reboot domain, or virsh reset domain, or reboot VM from guest, will revert emulated NIC's MAC to original one maintained in NICConf. During the reboot/reset flow in qemu, emulated NIC's reset handler will sync the MAC address in NICConf to the MAC address in emulated NIC structure, e.g., virtio_net_reset sync the MAC address in NICConf to VirtIONet'mac. BTW, in native scenario, reboot will revert the changed MAC to original one, too. 2. sync NIC's (more precisely, queue) MAC to corresponding NICConf in NIC's migration load handler Any better ideas? Thanks, Zhang Haoyu I think announce needs to poke at the current MAC instead of the default one in NICConf. We can make it respect link down state while we are at it. NICConf structures are incorporated in different emulated NIC's structure, e.g., VirtIONet, E1000State_st, RTL8139State, etc., since so many kinds of emulated NICs, they are described by different structures, how to find all NICs' current MAC? Maybe we can introduce a pointer member 'current_mac' to NICConf structure, which points to the current MAC, then we can find all current MACs from NICConf.current_mac. I wouldn't make it a pointer, just a buffer with the mac, copy it there. Maybe call it softmac that's what it is really. Can we broadcast the RARP with current MAC in NIC's migration load handler respectively? Thanks, Zhang Haoyu It's not so simple, you need to retry several times. Could you make a statement for 'retry several times' ? Is it the process of retrying several times to sending RARP in qemu_announce_self_once? 
yes 'broadcast the RARP with current MAC in NIC's migration load handler respectively' means distributing the job of what qemu_announce_self does to every NIC's migration load handler, e.g., in virtio NIC's migration load handler virtio_net_load, we can create a timer to retry several times to send RARP with the current MAC for this NIC, just as qemu_announce_self does. I don't see a lot of value in this yet. In my opinion, it's not so good to introduce a 'softmac' member to NICConf, which is not an essential function of NICConf. And, distributing the job of what qemu_announce_self does to every NIC's migration load handler has no disadvantages over qemu_announce_self, and may even update the forwarding table of switches/bridges more promptly. Happily recent linux guests aren't affected since they do announcements from guest. -- MST
Re: [Qemu-devel] [RFC] sync NIC's MAC maintained in NICConf as soon as emulated NIC's MAC changed in guest
Hi, all Do live migration if emulated NIC's MAC has been changed, RARP with wrong MAC address will broadcast via qemu_announce_self in destination, so, long time network disconnection probably happen. Good catch. I want to do below works to resolve this problem, 1. change NICConf's MAC as soon as emulated NIC's MAC changed in guest This will make it impossible to revert it correctly on reset, won't it? You are right. virsh reboot domain, or virsh reset domain, or reboot VM from guest, will revert emulated NIC's MAC to original one maintained in NICConf. During the reboot/reset flow in qemu, emulated NIC's reset handler will sync the MAC address in NICConf to the MAC address in emulated NIC structure, e.g., virtio_net_reset sync the MAC address in NICConf to VirtIONet'mac. BTW, in native scenario, reboot will revert the changed MAC to original one, too. 2. sync NIC's (more precisely, queue) MAC to corresponding NICConf in NIC's migration load handler Any better ideas? Thanks, Zhang Haoyu I think announce needs to poke at the current MAC instead of the default one in NICConf. We can make it respect link down state while we are at it. NICConf structures are incorporated in different emulated NIC's structure, e.g., VirtIONet, E1000State_st, RTL8139State, etc., since so many kinds of emulated NICs, they are described by different structures, how to find all NICs' current MAC? Maybe we can introduce a pointer member 'current_mac' to NICConf structure, which points to the current MAC, then we can find all current MACs from NICConf.current_mac. I wouldn't make it a pointer, just a buffer with the mac, copy it there. Maybe call it softmac that's what it is really. Can we broadcast the RARP with current MAC in NIC's migration load handler respectively? Thanks, Zhang Haoyu It's not so simple, you need to retry several times. Could you make a statement for 'retry several times' ? Is it the process of retrying several times to sending RARP in qemu_announce_self_once? 
yes 'broadcast the RARP with current MAC in NIC's migration load handler respectively' means distributing the job of what qemu_announce_self does to every NIC's migration load handler, e.g., in virtio NIC's migration load handler virtio_net_load, we can create a timer to retry several times to send RARP with the current MAC for this NIC, just as qemu_announce_self does. I don't see a lot of value in this yet. In my opinion, it's not so good to introduce a 'softmac' member to NICConf, which is not an essential function of NICConf. Maybe not essential but 100% of hardware we emulate supports softmacs. Yes, but NICConf is about NIC *configuration*, not random common NIC state. We can capture common NIC state in a separate, properly named data type. If we want to bunch it together with common configuration in NICConf instead, then better rename NICConf to something that actually reflects its changed purpose. I doubt this would be a good idea. I agree, it should go into NetClientState, not NICConf. My main point is it's a common thing, let's not duplicate code. Yes, putting it into NetClientState is better. But we need to add updating code for NetClientState.softmac to all devices, right? And, distributing the job of what qemu_announce_self does to every NIC's migration load handler has no disadvantages over qemu_announce_self, I see some disadvantages, yes. You are going to add code to all devices instead of doing it in one place; there had better be a good reason for this. Compared with qemu_announce_self, there are indeed no advantages; on the contrary, there are disadvantages, just as you said. But compared with introducing 'softmac' or something into NIC-related structures, it does not need to add any data to NIC-related structures, and introducing 'softmac' also needs updating code for 'softmac' in all devices, right? And, I don't think it's a good idea to store the identical data in two buffers; its consistency would have to be guaranteed.
Thanks, Zhang Haoyu Keeping code common to many (most?) NICs factored out makes sense. We've started doing that for block devices, in hw/block/block.c. So far, the only code there is about configuration, thus we work with BlockConf. [...]
Re: [Qemu-devel] [RFC] sync NIC's MAC maintained in NICConf as soon as emulated NIC's MAC changed in guest
Hi, all Do live migration if emulated NIC's MAC has been changed, RARP with wrong MAC address will broadcast via qemu_announce_self in destination, so, long time network disconnection probably happen. I want to do below works to resolve this problem, 1. change NICConf's MAC as soon as emulated NIC's MAC changed in guest 2. sync NIC's (more precisely, queue) MAC to corresponding NICConf in NIC's migration load handler Any better ideas? As Michael points out, the only possible solution is to do it inside the guest instead of qemu (using a pv device). You can have a look at my RFCs in http://lists.nongnu.org/archive/html/qemu-devel/2013-03/msg01127.html which let the virtio driver send the gARP. Xen, Hyperv do the same thing. How about other emulated NICs, like rtl8139, etc.? The point is qemu does not know how the macs were used. So your method only solves the issue partially because: - A card can have several macs, see virtio_net and e1000's mac table, and it can be overflowed also. - Vlan could be used so we need to send tagged gARP instead of untagged. Does the emulated NIC in qemu have knowledge of all of its MACs? We can provide an interface nic_announce_self(NetClientState *nc, uint8_t *mac_addr) which will try several times to send RARP just as qemu_announce_self does; all emulated NICs' migration load handlers can call nic_announce_self to announce themselves for all their MACs. Thanks, Zhang Haoyu
[Qemu-devel] [RFC] sync NIC's MAC maintained in NICConf as soon as emulated NIC's MAC changed in guest
Hi, all Do live migration if emulated NIC's MAC has been changed, RARP with wrong MAC address will broadcast via qemu_announce_self in destination, so, long time network disconnection probably happen. I want to do below works to resolve this problem, 1. change NICConf's MAC as soon as emulated NIC's MAC changed in guest 2. sync NIC's (more precisely, queue) MAC to corresponding NICConf in NIC's migration load handler Any better ideas? Thanks, Zhang Haoyu
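For reference, the announcement under discussion is a RARP frame carrying the NIC's MAC. A minimal sketch of building one is below; the field layout follows RFC 903 and the offsets are my reconstruction, not code copied from qemu_announce_self — the point is that the MAC placed in the frame must be the NIC's *current* MAC, which is exactly what goes wrong after the guest changes it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_P_RARP              0x8035
#define RARP_OP_REQUEST_REVERSE 3

/* Build a broadcast RARP announcement for `mac` into buf (>= 60 bytes).
 * Layout per RFC 903 over Ethernet; padded to the minimum frame size. */
static size_t make_rarp_announce(uint8_t *buf, const uint8_t mac[6])
{
    memset(buf, 0, 60);                       /* zero-pad the whole frame */
    memset(buf, 0xff, 6);                     /* dst: broadcast           */
    memcpy(buf + 6, mac, 6);                  /* src: NIC's current MAC   */
    buf[12] = ETH_P_RARP >> 8;                /* ethertype 0x8035         */
    buf[13] = ETH_P_RARP & 0xff;
    buf[14] = 0; buf[15] = 1;                 /* htype: Ethernet          */
    buf[16] = 0x08; buf[17] = 0x00;           /* ptype: IPv4              */
    buf[18] = 6; buf[19] = 4;                 /* hlen, plen               */
    buf[20] = 0; buf[21] = RARP_OP_REQUEST_REVERSE;
    memcpy(buf + 22, mac, 6);                 /* sender hardware address  */
    memcpy(buf + 32, mac, 6);                 /* target hardware address  */
    return 60;
}
```

If the frame is built from a stale NICConf MAC instead of the current one, switches learn the wrong address and traffic to the migrated VM is black-holed until the guest itself transmits.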
[Qemu-devel] [KVM] segmentation fault happened when reboot VM after hot-unplug virtio NIC
Hi, all Segmentation fault happened when rebooting the VM after hot-unplugging a virtio NIC, which can be reproduced 100%. See the similar bug report at https://bugzilla.redhat.com/show_bug.cgi?id=988256

test environment:
host: SLES11SP2 (kernel version: 3.0.58)
qemu: 1.5.1, upstream-qemu (commit 545825d4cda03ea292b7788b3401b99860efe8bc)
libvirt: 1.1.0
guest os: win2k8 R2 x64bit or sles11sp2 x64 or win2k3 32bit

You can reproduce this problem by following steps:
1. start a VM with virtio NIC(s)
2. hot-unplug a virtio NIC from the VM
3. reboot the VM, then segmentation fault happened during starting period

the qemu backtrace shown as below:
#0  0x7ff4be3288d0 in __memcmp_sse4_1 () from /lib64/libc.so.6
#1  0x7ff4c07f82c0 in patch_hypercalls (s=0x7ff4c15dd610) at /mnt/zhanghaoyu/qemu/qemu-1.5.1/hw/i386/kvmvapic.c:549
#2  0x7ff4c07f84f0 in vapic_prepare (s=0x7ff4c15dd610) at /mnt/zhanghaoyu/qemu/qemu-1.5.1/hw/i386/kvmvapic.c:614
#3  0x7ff4c07f85e7 in vapic_write (opaque=0x7ff4c15dd610, addr=0, data=32, size=2) at /mnt/zhanghaoyu/qemu/qemu-1.5.1/hw/i386/kvmvapic.c:651
#4  0x7ff4c082a917 in memory_region_write_accessor (opaque=0x7ff4c15df938, addr=0, value=0x7ff4bbfe3d00, size=2, shift=0, mask=65535) at /mnt/zhanghaoyu/qemu/qemu-1.5.1/memory.c:334
#5  0x7ff4c082a9ee in access_with_adjusted_size (addr=0, value=0x7ff4bbfe3d00, size=2, access_size_min=1, access_size_max=4, access=0x7ff4c082a89a <memory_region_write_accessor>, opaque=0x7ff4c15df938) at /mnt/zhanghaoyu/qemu/qemu-1.5.1/memory.c:364
#6  0x7ff4c082ae49 in memory_region_iorange_write (iorange=0x7ff4c15dfca0, offset=0, width=2, data=32) at /mnt/zhanghaoyu/qemu/qemu-1.5.1/memory.c:439
#7  0x7ff4c08236f7 in ioport_writew_thunk (opaque=0x7ff4c15dfca0, addr=126, data=32) at /mnt/zhanghaoyu/qemu/qemu-1.5.1/ioport.c:219
#8  0x7ff4c0823078 in ioport_write (index=1, address=126, data=32) at /mnt/zhanghaoyu/qemu/qemu-1.5.1/ioport.c:83
#9  0x7ff4c0823ca9 in cpu_outw (addr=126, val=32) at /mnt/zhanghaoyu/qemu/qemu-1.5.1/ioport.c:296
#10 0x7ff4c0827485 in kvm_handle_io (port=126, data=0x7ff4c051, direction=1, size=2, count=1) at /mnt/zhanghaoyu/qemu/qemu-1.5.1/kvm-all.c:1485
#11 0x7ff4c0827e14 in kvm_cpu_exec (env=0x7ff4c15bf270) at /mnt/zhanghaoyu/qemu/qemu-1.5.1/kvm-all.c:1634
#12 0x7ff4c07b6f27 in qemu_kvm_cpu_thread_fn (arg=0x7ff4c15bf270) at /mnt/zhanghaoyu/qemu/qemu-1.5.1/cpus.c:759
#13 0x7ff4be58af05 in start_thread () from /lib64/libpthread.so.0
#14 0x7ff4be2cd53d in clone () from /lib64/libc.so.6

If I apply below patch to the upstream qemu, this problem will disappear,
---
 hw/i386/kvmvapic.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
index 15beb80..6fff299 100644
--- a/hw/i386/kvmvapic.c
+++ b/hw/i386/kvmvapic.c
@@ -652,11 +652,11 @@ static void vapic_write(void *opaque, hwaddr addr, uint64_t data,
     switch (size) {
     case 2:
         if (s->state == VAPIC_INACTIVE) {
-            rom_paddr = (env->segs[R_CS].base + env->eip) & ROM_BLOCK_MASK;
-            s->rom_state_paddr = rom_paddr + data;
-
             s->state = VAPIC_STANDBY;
         }
+        rom_paddr = (env->segs[R_CS].base + env->eip) & ROM_BLOCK_MASK;
+        s->rom_state_paddr = rom_paddr + data;
+
         if (vapic_prepare(s) < 0) {
             s->state = VAPIC_INACTIVE;
             break;
--
1.8.1.4

Thanks, Daniel
Re: [Qemu-devel] [KVM] segmentation fault happened when reboot VM after hot-unplug virtio NIC
Hi, all Segmentation fault happened when rebooting the VM after hot-unplugging a virtio NIC, which can be reproduced 100%. See the similar bug report at https://bugzilla.redhat.com/show_bug.cgi?id=988256

test environment:
host: SLES11SP2 (kernel version: 3.0.58)
qemu: 1.5.1, upstream-qemu (commit 545825d4cda03ea292b7788b3401b99860efe8bc)
libvirt: 1.1.0
guest os: win2k8 R2 x64bit or sles11sp2 x64 or win2k3 32bit

You can reproduce this problem by following steps:
1. start a VM with virtio NIC(s)
2. hot-unplug a virtio NIC from the VM
3. reboot the VM, then segmentation fault happened during starting period

the qemu backtrace shown as below:
#0  0x7ff4be3288d0 in __memcmp_sse4_1 () from /lib64/libc.so.6
#1  0x7ff4c07f82c0 in patch_hypercalls (s=0x7ff4c15dd610) at /mnt/zhanghaoyu/qemu/qemu-1.5.1/hw/i386/kvmvapic.c:549
#2  0x7ff4c07f84f0 in vapic_prepare (s=0x7ff4c15dd610) at /mnt/zhanghaoyu/qemu/qemu-1.5.1/hw/i386/kvmvapic.c:614
#3  0x7ff4c07f85e7 in vapic_write (opaque=0x7ff4c15dd610, addr=0, data=32, size=2) at /mnt/zhanghaoyu/qemu/qemu-1.5.1/hw/i386/kvmvapic.c:651
#4  0x7ff4c082a917 in memory_region_write_accessor (opaque=0x7ff4c15df938, addr=0, value=0x7ff4bbfe3d00, size=2, shift=0, mask=65535) at /mnt/zhanghaoyu/qemu/qemu-1.5.1/memory.c:334
#5  0x7ff4c082a9ee in access_with_adjusted_size (addr=0, value=0x7ff4bbfe3d00, size=2, access_size_min=1, access_size_max=4, access=0x7ff4c082a89a <memory_region_write_accessor>, opaque=0x7ff4c15df938) at /mnt/zhanghaoyu/qemu/qemu-1.5.1/memory.c:364
#6  0x7ff4c082ae49 in memory_region_iorange_write (iorange=0x7ff4c15dfca0, offset=0, width=2, data=32) at /mnt/zhanghaoyu/qemu/qemu-1.5.1/memory.c:439
#7  0x7ff4c08236f7 in ioport_writew_thunk (opaque=0x7ff4c15dfca0, addr=126, data=32) at /mnt/zhanghaoyu/qemu/qemu-1.5.1/ioport.c:219
#8  0x7ff4c0823078 in ioport_write (index=1, address=126, data=32) at /mnt/zhanghaoyu/qemu/qemu-1.5.1/ioport.c:83
#9  0x7ff4c0823ca9 in cpu_outw (addr=126, val=32) at /mnt/zhanghaoyu/qemu/qemu-1.5.1/ioport.c:296
#10 0x7ff4c0827485 in kvm_handle_io (port=126, data=0x7ff4c051, direction=1, size=2, count=1) at /mnt/zhanghaoyu/qemu/qemu-1.5.1/kvm-all.c:1485
#11 0x7ff4c0827e14 in kvm_cpu_exec (env=0x7ff4c15bf270) at /mnt/zhanghaoyu/qemu/qemu-1.5.1/kvm-all.c:1634
#12 0x7ff4c07b6f27 in qemu_kvm_cpu_thread_fn (arg=0x7ff4c15bf270) at /mnt/zhanghaoyu/qemu/qemu-1.5.1/cpus.c:759
#13 0x7ff4be58af05 in start_thread () from /lib64/libpthread.so.0
#14 0x7ff4be2cd53d in clone () from /lib64/libc.so.6

If I apply below patch to the upstream qemu, this problem will disappear,
---
 hw/i386/kvmvapic.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
index 15beb80..6fff299 100644
--- a/hw/i386/kvmvapic.c
+++ b/hw/i386/kvmvapic.c
@@ -652,11 +652,11 @@ static void vapic_write(void *opaque, hwaddr addr, uint64_t data,
     switch (size) {
     case 2:
         if (s->state == VAPIC_INACTIVE) {
-            rom_paddr = (env->segs[R_CS].base + env->eip) & ROM_BLOCK_MASK;
-            s->rom_state_paddr = rom_paddr + data;
-
             s->state = VAPIC_STANDBY;
         }
+        rom_paddr = (env->segs[R_CS].base + env->eip) & ROM_BLOCK_MASK;
+        s->rom_state_paddr = rom_paddr + data;
+
         if (vapic_prepare(s) < 0) {
             s->state = VAPIC_INACTIVE;
             break;

Yes, we need to update the ROM's physical address after the BIOS reshuffled the layout. But I'm not happy with simply updating the address unconditionally. We need to understand the crash first, then make QEMU robust against the guest not issuing this initial write after a ROM region layout change. And finally make it work properly in the normal case. The direct cause of the crash is trying to access an invalid address, which is due to not updating the ROM's physical address. In my opinion, since hot-plug/unplug is involved, we need to re-calculate the ROM's physical address for all devices which have a ROM during the starting period when we reboot/reset the VM; is it reasonable to set vapic's state to VAPIC_INACTIVE during vapic's reset? Jan
Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
I tested below combos of qemu and kernel:

+------------------------+-------------+-----------+
| kernel                 | QEMU        | migration |
+------------------------+-------------+-----------+
| SLES11SP2+kvm-kmod-3.6 | qemu-1.6.0  | GOOD      |
| SLES11SP2+kvm-kmod-3.6 | qemu-1.6.0* | BAD       |
| SLES11SP2+kvm-kmod-3.6 | qemu-1.5.1  | BAD       |
| SLES11SP2+kvm-kmod-3.6*| qemu-1.5.1  | GOOD      |
| SLES11SP2+kvm-kmod-3.6 | qemu-1.5.1* | GOOD      |
| SLES11SP2+kvm-kmod-3.6 | qemu-1.5.2  | BAD       |
| kvm-3.11-2             | qemu-1.5.1  | BAD       |
+------------------------+-------------+-----------+

NOTE:
1. kvm-3.11-2: the whole tag kernel downloaded from https://git.kernel.org/pub/scm/virt/kvm/kvm.git
2. SLES11SP2+kvm-kmod-3.6: our release kernel, replacing SLES11SP2's default kvm-kmod with kvm-kmod-3.6; SLES11SP2's kernel version is 3.0.13-0.27
3. qemu-1.6.0*: revert the commit 211ea74022f51164a7729030b28eec90b6c99a08 on qemu-1.6.0
4. kvm-kmod-3.6*: kvm-kmod-3.6 with EPT disabled
5. qemu-1.5.1*: apply the patch below to qemu-1.5.1 to delete the qemu_madvise() statement in the ram_load() function

--- qemu-1.5.1/arch_init.c	2013-06-27 05:47:29.0 +0800
+++ qemu-1.5.1_fix3/arch_init.c	2013-08-28 19:43:42.0 +0800
@@ -842,7 +842,6 @@ static int ram_load(QEMUFile *f, void *o
             if (ch == 0 && (!kvm_enabled() || kvm_has_sync_mmu()) &&
                 getpagesize() <= TARGET_PAGE_SIZE) {
-                qemu_madvise(host, TARGET_PAGE_SIZE, QEMU_MADV_DONTNEED);
             }
 #endif
         } else if (flags & RAM_SAVE_FLAG_PAGE) {

If I apply the above patch to qemu-1.5.1 to delete the qemu_madvise() statement, the test result of the combo of SLES11SP2+kvm-kmod-3.6 and qemu-1.5.1 is good. Why do we perform qemu_madvise(QEMU_MADV_DONTNEED) for those zero pages? Does qemu_madvise() have a sustained effect on the range of virtual addresses? In other words, does qemu_madvise() have a sustained effect on the VM performance? If we later frequently read/write the range of virtual addresses which has been advised to DONTNEED, could performance degradation happen?
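To the question of whether the madvise has a sustained effect: on Linux, MADV_DONTNEED on an anonymous private mapping drops the backing page immediately, so the next access refaults a fresh zero page and the next write has to allocate and clear a new one. A small, Linux-specific demonstration (a sketch of the kernel semantics being discussed, not QEMU code):

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Dirty a page, drop it with MADV_DONTNEED, and return what a subsequent
 * read observes.  For anonymous private memory on Linux this is 0: the
 * old page is gone, and touching the range again faults in a fresh
 * zeroed page - the per-access cost the thread is asking about. */
static int dontneed_then_read(void)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    p[0] = 42;                                 /* fault in and dirty the page */
    madvise(p, (size_t)pagesz, MADV_DONTNEED); /* discard the backing page    */
    int v = p[0];                              /* refault: reads back as zero */
    munmap(p, (size_t)pagesz);
    return v;
}
```

The advice itself is not "sticky" beyond that: once a new page has been faulted in, later accesses are ordinary; the degradation comes from re-paying the fault and page-clearing cost for every discarded page after migration.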
The reason the SLES11SP2+kvm-kmod-3.6 / qemu-1.6.0 combo is good is commit 211ea74022f51164a7729030b28eec90b6c99a08: if I revert that commit on qemu-1.6.0, the test result of the SLES11SP2+kvm-kmod-3.6 / qemu-1.6.0 combo is bad, and the performance degradation happens, too.

Thanks,
Zhang Haoyu

The QEMU command line (/var/log/libvirt/qemu/[domain name].log):

LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ QEMU_AUDIO_DRV=none /usr/local/bin/qemu-system-x86_64 -name ATS1 -S -M pc-0.12 -cpu qemu32 -enable-kvm -m 12288 -smp 4,sockets=4,cores=1,threads=1 -uuid 0505ec91-382d-800e-2c79-e5b286eb60b5 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/ATS1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/opt/ne/vm/ATS1.img,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:00,bus=pci.0,addr=0x3,bootindex=2 -netdev tap,fd=22,id=hostnet1,vhost=on,vhostfd=23 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:00,bus=pci.0,addr=0x4 -netdev tap,fd=24,id=hostnet2,vhost=on,vhostfd=25 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:00,bus=pci.0,addr=0x5 -netdev tap,fd=26,id=hostnet3,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:00,bus=pci.0,addr=0x6 -netdev tap,fd=28,id=hostnet4,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:00,bus=pci.0,addr=0x7 -netdev tap,fd=30,id=hostnet5,vhost=on,vhostfd=31 -device virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:00,bus=pci.0,addr=0x9 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc *:0 -k en-us -vga cirrus -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb -watchdog-action poweroff -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa

Which QEMU version is this? Can you try with e1000 NICs instead of virtio?

This QEMU version is 1.0.0, but
[Qemu-devel] [kvm] segmentation fault when guest reboot or reset after hotunplug virtio NIC
Description of problem: when the guest reboots or resets after hot-unplugging a virtio NIC, a segmentation fault occurs. It reproduces 100%. Similar to https://bugzilla.redhat.com/show_bug.cgi?id=988256

Version-Release number of selected component (if applicable):
Host OS: sles11sp2, kernel version 3.0.58
qemu-1.5.1
libvirt-1.1.0
Guest OS: win2k8 R2 x64 or sles11sp2 x64 or win2k3 32bit

Steps to reproduce:
1. use virsh to start a VM with a virtio NIC
2. after booting, use virsh detach-device to hot-unplug the virtio NIC
3. use virsh reboot/reset to restart the VM
4. while the VM is rebooting, the segmentation fault appears

The backtrace:

#0  0x7ff4be3288d0 in __memcmp_sse4_1 () from /lib64/libc.so.6
#1  0x7ff4c07f82c0 in patch_hypercalls (s=0x7ff4c15dd610)
    at /mnt/zhanghaoyu/qemu/qemu-1.5.1/hw/i386/kvmvapic.c:549
#2  0x7ff4c07f84f0 in vapic_prepare (s=0x7ff4c15dd610)
    at /mnt/zhanghaoyu/qemu/qemu-1.5.1/hw/i386/kvmvapic.c:614
#3  0x7ff4c07f85e7 in vapic_write (opaque=0x7ff4c15dd610, addr=0, data=32, size=2)
    at /mnt/zhanghaoyu/qemu/qemu-1.5.1/hw/i386/kvmvapic.c:651
#4  0x7ff4c082a917 in memory_region_write_accessor (opaque=0x7ff4c15df938, addr=0, value=0x7ff4bbfe3d00, size=2, shift=0, mask=65535)
    at /mnt/zhanghaoyu/qemu/qemu-1.5.1/memory.c:334
#5  0x7ff4c082a9ee in access_with_adjusted_size (addr=0, value=0x7ff4bbfe3d00, size=2, access_size_min=1, access_size_max=4, access=0x7ff4c082a89a <memory_region_write_accessor>, opaque=0x7ff4c15df938)
    at /mnt/zhanghaoyu/qemu/qemu-1.5.1/memory.c:364
#6  0x7ff4c082ae49 in memory_region_iorange_write (iorange=0x7ff4c15dfca0, offset=0, width=2, data=32)
    at /mnt/zhanghaoyu/qemu/qemu-1.5.1/memory.c:439
#7  0x7ff4c08236f7 in ioport_writew_thunk (opaque=0x7ff4c15dfca0, addr=126, data=32)
    at /mnt/zhanghaoyu/qemu/qemu-1.5.1/ioport.c:219
#8  0x7ff4c0823078 in ioport_write (index=1, address=126, data=32)
    at /mnt/zhanghaoyu/qemu/qemu-1.5.1/ioport.c:83
#9  0x7ff4c0823ca9 in cpu_outw (addr=126, val=32)
    at /mnt/zhanghaoyu/qemu/qemu-1.5.1/ioport.c:296
#10 0x7ff4c0827485 in kvm_handle_io (port=126, data=0x7ff4c051, direction=1, size=2, count=1)
    at /mnt/zhanghaoyu/qemu/qemu-1.5.1/kvm-all.c:1485
#11 0x7ff4c0827e14 in kvm_cpu_exec (env=0x7ff4c15bf270)
    at /mnt/zhanghaoyu/qemu/qemu-1.5.1/kvm-all.c:1634
#12 0x7ff4c07b6f27 in qemu_kvm_cpu_thread_fn (arg=0x7ff4c15bf270)
    at /mnt/zhanghaoyu/qemu/qemu-1.5.1/cpus.c:759
#13 0x7ff4be58af05 in start_thread () from /lib64/libpthread.so.0
#14 0x7ff4be2cd53d in clone () from /lib64/libc.so.6

In function vapic_write(), when the VM is rebooted or reset after hot-unplugging the virtio NIC, rom_paddr may have changed, since the virtio NIC ROM will no longer be loaded into RAM:

    switch (size) {
    case 2:
        if (s->state == VAPIC_INACTIVE) {
            rom_paddr = (env->segs[R_CS].base + env->eip) & ROM_BLOCK_MASK;
            s->rom_state_paddr = rom_paddr + data;
            s->state = VAPIC_STANDBY;
        }
        if (vapic_prepare(s) < 0) {
            s->state = VAPIC_INACTIVE;
            break;
        }

So I changed this code like this:

    switch (size) {
    case 2:
        if (s->state == VAPIC_INACTIVE) {
            s->state = VAPIC_STANDBY;
        }
        rom_paddr = (env->segs[R_CS].base + env->eip) & ROM_BLOCK_MASK;
        s->rom_state_paddr = rom_paddr + data;
        if (vapic_prepare(s) < 0) {
            s->state = VAPIC_INACTIVE;
            break;
        }

With the above change applied, the segmentation fault disappears and the VM reboots or resets successfully. Is the above change the correct way to fix the problem?

Thanks,
Daniel
Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
The QEMU command line (/var/log/libvirt/qemu/[domain name].log) is as quoted in full earlier in the thread.

Which QEMU version is this? Can you try with e1000 NICs instead of virtio?
This QEMU version is 1.0.0, but I also tested QEMU 1.5.2, and the same problem exists, including the performance degradation and the readonly GFNs' flooding. I also tried with e1000 NICs instead of virtio (QEMU version 1.5.2), and both the performance degradation and the readonly GFNs' flooding remain. No matter whether e1000 NICs or virtio NICs are used, the GFNs' flooding is initiated at the post-restore stage (i.e. the running stage); as soon as the restore completes, the flooding starts.

Thanks,
Zhang Haoyu

--
Gleb.

Should we focus on the first bad commit (612819c3c6e67bac8fceaa7cc402f13b1b63f7e4) and the surprising GFNs' flooding?

Not really. There is no point in debugging a very old version compiled with kvm-kmod; there are too many variables in the environment. I cannot reproduce the GFN flooding on upstream, so the problem may be gone, may be a result of a kvm-kmod problem, or of something different in how I invoke qemu. So the best way to proceed is for you to reproduce with the upstream version; then at least I will be sure that we are using the same code.

Thanks, I will test the combo of upstream kvm kernel and upstream qemu. And the guest OS version I stated above was wrong; the currently running guest OS is SLES10SP4.

I tested the following combos of QEMU and kernel:

+------------+----------------+-------------+
| kvm kernel | QEMU           | test result |
+------------+----------------+-------------+
| kvm-3.11-2 | qemu-1.5.2     | GOOD        |
| SLES11SP2  | qemu-1.0.0     | BAD         |
| SLES11SP2  | qemu-1.4.0     | BAD         |
| SLES11SP2  | qemu-1.4.2     | BAD         |
| SLES11SP2  | qemu-1.5.0-rc0 | GOOD        |
| SLES11SP2  | qemu-1.5.0     | GOOD        |
| SLES11SP2  | qemu-1.5.1     | GOOD        |
| SLES11SP2  | qemu-1.5.2     | GOOD        |
+------------+----------------+-------------+

NOTE:
1. kvm-3.11-2 in the table above is the whole tagged kernel downloaded from https://git.kernel.org/pub/scm/virt/kvm/kvm.git
2. SLES11SP2's kernel version is 3.0.13-0.27

Then I git-bisected the qemu changes between qemu-1.4.2 and qemu-1.5.0-rc0, marking the good version as bad and the bad version as good, so the "first bad commit" found is exactly the patch which fixes the degradation problem.

+------------+--------+--------------+-----------+
| bisect No. | commit | save-restore | migration |
Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
The QEMU command line (/var/log/libvirt/qemu/[domain name].log) is as quoted in full earlier in the thread.

Which QEMU version is this? Can you try with e1000 NICs instead of virtio?
[quoted exchange trimmed; see the full text earlier in the thread]

Thanks,
Zhang Haoyu

I applied the patch below to __direct_map():

@@ -2223,6 +2223,8 @@ static int __direct_map(struct kvm_vcpu
 	int pt_write = 0;
 	gfn_t pseudo_gfn;
 
+	map_writable = true;
+
 	for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
 		if (iterator.level == level) {
 			unsigned pte_access = ACC_ALL;

and rebuilt the kvm-kmod, then re-insmoded it. After I started a VM, the host behaved abnormally: many programs could not be started successfully, and segmentation faults were reported. In my opinion, with the above patch applied, commit 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 should have no effect, but the test result proved me wrong. Does the way the map_writable value is obtained in hva_to_pfn() have an effect on the result?
If hva_to_pfn() returns map_writable == false, it means that the page is mapped read-only on the primary MMU, so it should not be mapped writable on the secondary MMU either. This should not usually happen.

--
Gleb.
Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
hi all,

I met a problem similar to these while performing live migration or save-restore tests on the KVM platform (qemu: 1.4.0, host: suse11sp2, guest: suse11sp2), running a tele-communication software suite in the guest:
https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
https://bugzilla.kernel.org/show_bug.cgi?id=58771

After live migration or virsh restore [savefile], one process's CPU utilization went up by about 30%, resulting in throughput degradation of this process. If EPT is disabled, this problem is gone, so I suspect the KVM hypervisor is involved in this problem.

Based on the above suspicion, I want to find the two adjacent versions of kvm-kmod between which this problem starts to trigger (e.g. 2.6.39, 3.0-rc1), and either analyze the differences between these two versions or apply the patches between them by bisection, finally finding the key patches. Any better ideas?

Thanks,
Zhang Haoyu

I've attempted to duplicate this on a number of machines that are as similar to yours as I am able to get my hands on, and so far have not been able to see any performance degradation. And from what I've read in the above links, huge pages do not seem to be part of the problem. So, if you are in a position to bisect the kernel changes, that would probably be the best avenue to pursue in my opinion.

Bruce

I found the first bad commit ([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4] KVM: propagate fault r/w information to gup(), allow read-only memory) which triggers this problem by git-bisecting the kvm kernel (downloaded from https://git.kernel.org/pub/scm/virt/kvm/kvm.git) changes.
And:

git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
git diff 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff

Then I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log and 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff and came to the conclusion that all of the differences between 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 are contributed by no other than 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4, so this commit is the peace-breaker which directly or indirectly causes the degradation.

Does the map_writable flag passed to the mmu_set_spte() function have an effect on the PTE's PAT flag, or does it increase the VMEXITs induced by the guest trying to write read-only memory?

Thanks,
Zhang Haoyu

There should be no read-only memory maps backing guest RAM. Can you confirm map_writable = false is being passed to __direct_map? (This should not happen, for guest RAM.) And if it is false, please capture the associated GFN.

I added the check and printk below at the start of __direct_map() at the first bad commit version:

--- kvm-612819c3c6e67bac8fceaa7cc402f13b1b63f7e4/arch/x86/kvm/mmu.c	2013-07-26 18:44:05.0 +0800
+++ kvm-612819/arch/x86/kvm/mmu.c	2013-07-31 00:05:48.0 +0800
@@ -2223,6 +2223,9 @@ static int __direct_map(struct kvm_vcpu
 	int pt_write = 0;
 	gfn_t pseudo_gfn;
 
+	if (!map_writable)
+		printk(KERN_ERR "%s: %s: gfn = %llu\n", __FILE__, __func__, gfn);
+
 	for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
 		if (iterator.level == level) {
 			unsigned pte_access = ACC_ALL;

I virsh-saved the VM and then virsh-restored it; so many GFNs were printed that you could absolutely describe it as flooding.

The flooding you see happens during the migrate-to-file stage because of dirty page tracking. If you clear dmesg after virsh-save you should not see any flooding after virsh-restore. I just checked with the latest tree; I do not.
I made a verification again. I virsh-saved the VM; during the saving stage I ran 'dmesg', and no GFN was printed. Maybe the switch from the running stage to the pause stage takes such a short time that no guest write happens during this switching period. After the completion of the save operation I ran 'dmesg -c' to clear the buffer all the same, then I virsh-restored the VM; so many GFNs were printed by running 'dmesg', and I also ran 'tail -f /var/log/messages' during the restore stage, where so many GFNs were flooded dynamically, too. I'm sure that the flooding happens during the virsh-restore stage, not the migration stage.

At the VM's normal starting stage, only very few GFNs are printed, shown as below:

gfn = 16
gfn = 604
gfn = 605
gfn = 606
gfn = 607
gfn = 608
gfn = 609

But at the VM's restore stage, very many GFNs are printed; some examples are shown below:

2042600
279
2797778
2797779
2797780
2797781
2797782
2797783
2797784
Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
I'm sure that the flooding happens during the virsh-restore stage, not the migration stage.

Interesting, is this with upstream kernel? For me the situation is exactly the opposite. What is your command line? I made the verification on the first bad commit
Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
Hi,

On 05.08.2013 11:09, Zhanghaoyu (A) wrote:

When I build the upstream, I encounter a problem. I compile and install the upstream (commit e769ece3b129698d2b09811a6f6d304e4eaa8c29) on the sles11sp2 environment via the commands below:

cp /boot/config-3.0.13-0.27-default ./.config
yes | make oldconfig
make
make modules_install
make install

Then I reboot the host and select the upstream kernel, but during the boot stage the following problem happens:

Could not find /dev/disk/by-id/scsi-3600508e0864407c5b8f7ad01-part3

I'm trying to resolve it.

Possibly you need to enable loading unsupported kernel modules? At least that's needed when testing a kmod with a SUSE kernel.

I have tried to set 'allow_unsupported_modules 1' in /etc/modprobe.d/unsupported-modules, but the problem still happens. I replaced the whole kernel with the kvm kernel, not only the kvm modules.

Regards,
Andreas
Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
It's probably an issue with an older get_user_pages variant (either in kvm-kmod or the older kernel). Is there any indication of a similar issue with the upstream kernel?
I will test the upstream kvm host (https://git.kernel.org/pub/scm/virt/kvm/kvm.git) later; if the problem is still there, I will revert the first bad commit 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 on upstream and test again. And I collected the VMEXIT statistics in the pre-save and post-restore periods at the first bad commit version.

pre-save: COTS-F10S03:~ # perf stat -e kvm:* -a sleep 30

Performance counter stats for 'sleep 30':
  1222318  kvm:kvm_entry
        0  kvm:kvm_hypercall
        0  kvm:kvm_hv_hypercall
   351755  kvm:kvm_pio
     6703  kvm:kvm_cpuid
   692502  kvm:kvm_apic
  1234173  kvm:kvm_exit
   223956  kvm:kvm_inj_virq
        0  kvm:kvm_inj_exception
    16028  kvm:kvm_page_fault
    59872  kvm:kvm_msr
        0  kvm:kvm_cr
   169596  kvm:kvm_pic_set_irq
    81455  kvm:kvm_apic_ipi
   245103  kvm:kvm_apic_accept_irq
        0  kvm:kvm_nested_vmrun
        0  kvm:kvm_nested_intercepts
Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with ETP enabled
[Qemu-devel] vm performance degradation after kvm live migration or save-restore with ETP enabled
hi all, I met a similar problem to these while performing live-migration and save-restore tests on the KVM platform (qemu:1.4.0, host:suse11sp2, guest:suse11sp2), running a tele-communication software suite in the guest:
https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
https://bugzilla.kernel.org/show_bug.cgi?id=58771
After live migration or virsh restore [savefile], one process's CPU utilization went up by about 30%, which resulted in throughput degradation of that process. oprofile report on this process in the guest:

pre live migration:
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        app name         symbol name
248      12.3016  no-vmlinux       (no symbols)
78        3.8690  libc.so.6        memset
68        3.3730  libc.so.6        memcpy
30        1.4881  cscf.scu         SipMmBufMemAlloc
29        1.4385  libpthread.so.0  pthread_mutex_lock
26        1.2897  cscf.scu         SipApiGetNextIe
25        1.2401  cscf.scu         DBFI_DATA_Search
20        0.9921  libpthread.so.0  __pthread_mutex_unlock_usercnt
16        0.7937  cscf.scu         DLM_FreeSlice
16        0.7937  cscf.scu         receivemessage
15        0.7440  cscf.scu         SipSmCopyString
14        0.6944  cscf.scu         DLM_AllocSlice

post live migration:
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        app name         symbol name
1586     42.2370  libc.so.6        memcpy
271       7.2170  no-vmlinux       (no symbols)
83        2.2104  libc.so.6        memset
41        1.0919  libpthread.so.0  __pthread_mutex_unlock_usercnt
35        0.9321  cscf.scu         SipMmBufMemAlloc
29        0.7723  cscf.scu         DLM_AllocSlice
28        0.7457  libpthread.so.0  pthread_mutex_lock
23        0.6125  cscf.scu         SipApiGetNextIe
17        0.4527  cscf.scu         SipSmCopyString
16        0.4261  cscf.scu         receivemessage
15        0.3995  cscf.scu         SipcMsgStatHandle
14        0.3728  cscf.scu         Urilex
12        0.3196  cscf.scu         DBFI_DATA_Search
12        0.3196  cscf.scu         SipDsmGetHdrBitValInner
12        0.3196  cscf.scu         SipSmGetDataFromRefString

So memcpy costs many more CPU cycles after live migration.
Then I restarted the process, and this problem disappeared. save-restore has a similar problem. perf stat on the vcpu thread in the host:

pre live migration:
Performance counter stats for thread id '21082':
           0  page-faults
           0  minor-faults
           0  major-faults
       31616  cs
         506  migrations
           0  alignment-faults
           0  emulation-faults
  5075957539  L1-dcache-loads                                      [21.32%]
   324685106  L1-dcache-load-misses     #  6.40% of all L1-dcache hits  [21.85%]
  3681777120  L1-dcache-stores                                     [21.65%]
    65251823  L1-dcache-store-misses    #  1.77%                   [22.78%]
           0  L1-dcache-prefetches                                 [22.84%]
           0  L1-dcache-prefetch-misses                            [22.32%]
  9321652613  L1-icache-loads                                      [22.60%]
  1353418869  L1-icache-load-misses     # 14.52% of all L1-icache hits  [21.92%]
   169126969  LLC-loads                                            [21.87%]
    12583605  LLC-load-misses           #  7.44% of all LL-cache hits  [ 5.84%]
   132853447  LLC-stores                                           [ 6.61%]
    10601171  LLC-store-misses          #  7.9%                    [ 5.01%]
    25309497  LLC-prefetches            # 30%                      [ 4.96%]
     7723198  LLC-prefetch-misses                                  [ 6.04%]
  4954075817  dTLB-loads                                           [11.56%]
    26753106  dTLB-load-misses          #  0.54% of all dTLB cache hits  [16.80%]
  3553702874  dTLB-stores                                          [22.37%]
     4720313  dTLB-store-misses         #  0.13%                   [21.46%]
 not counted  dTLB-prefetches
 not counted  dTLB-prefetch-misses

60.000920666 seconds time elapsed

post live migration:
Performance counter stats for thread id '1579':
           0  page-faults  [100.00%]
           0
Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with ETP enabled
Hi, on 11.07.2013 11:36, Zhanghaoyu (A) wrote: I met a similar problem to these while performing live-migration and save-restore tests on the KVM platform (qemu:1.4.0, host:suse11sp2, guest:suse11sp2), running a tele-communication software suite in the guest: https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506 http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592 https://bugzilla.kernel.org/show_bug.cgi?id=58771 After live migration or virsh restore [savefile], one process's CPU utilization went up by about 30%, which resulted in throughput degradation of that process. oprofile report on this process in guest, pre live migration: So far we've been unable to reproduce this with a pure qemu-kvm / qemu-system-x86_64 command line on several EPT machines, whereas for virsh it was reported as confirmed. Can you please share the resulting QEMU command line from the libvirt logs or process list? qemu command line from /var/log/libvirt/qemu/[domain].log: LC_ALL=C PATH=/sbin:/usr/sbin:/usr/local/sbin:/root/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin HOME=/root USER=root LOGNAME=root QEMU_AUDIO_DRV=none /usr/local/bin/qemu-system-x86_64 -name CSC2 -S -M pc-0.12 -cpu qemu32 -enable-kvm -m 12288 -smp 4,sockets=4,cores=1,threads=1 -uuid 76e03575-a3ad-589a-e039-40160274bb97 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/CSC2.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/opt/ne/vm/CSC2.img,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=22 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:01,bus=pci.0,addr=0x3,bootindex=2 -netdev 
tap,fd=23,id=hostnet1,vhost=on,vhostfd=24 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:01,bus=pci.0,addr=0x4 -netdev tap,fd=25,id=hostnet2,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:01,bus=pci.0,addr=0x5 -netdev tap,fd=27,id=hostnet3,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:01,bus=pci.0,addr=0x6 -netdev tap,fd=29,id=hostnet4,vhost=on,vhostfd=30 -device virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:01,bus=pci.0,addr=0x7 -netdev tap,fd=31,id=hostnet5,vhost=on,vhostfd=32 -device virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:01,bus=pci.0,addr=0x9 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc *:1 -k en-us -vga cirrus -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb -watchdog-action poweroff -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa Are both host and guest kernel at 3.0.80 (latest SLES updates)? No, both host and guest are just raw sles11-sp2-64-GM, kernel version: 3.0.13-0.27. Thanks, Zhang Haoyu Thanks, Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] meaningless to compare irqfd's msi message with new msi message in virtio_pci_vq_vector_unmask
I searched vector_irqfd globally and found no place that sets/changes irqfd's msi message; only irqfd's virq or users members may be changed, in kvm_virtio_pci_vq_vector_use, kvm_virtio_pci_vq_vector_release, etc. So I think it is meaningless to do the below check in virtio_pci_vq_vector_unmask:

if (irqfd->msg.data != msg.data || irqfd->msg.address != msg.address)

And I think the comparison between the old msi message and the new msi message should be performed in kvm_update_routing_entry; the raw patch is shown below.

Signed-off-by: Zhang Haoyu haoyu.zh...@huawei.com
Signed-off-by: Zhang Huanzhong zhanghuanzh...@huawei.com
---
 hw/virtio/virtio-pci.c | 8 +++-
 kvm-all.c              | 5 +
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index b070b64..e4829a3 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -613,11 +613,9 @@ static int virtio_pci_vq_vector_unmask(VirtIOPCIProxy *proxy,
     if (proxy->vector_irqfd) {
         irqfd = &proxy->vector_irqfd[vector];
-        if (irqfd->msg.data != msg.data || irqfd->msg.address != msg.address) {
-            ret = kvm_irqchip_update_msi_route(kvm_state, irqfd->virq, msg);
-            if (ret < 0) {
-                return ret;
-            }
+        ret = kvm_irqchip_update_msi_route(kvm_state, irqfd->virq, msg);
+        if (ret < 0) {
+            return ret;
         }
     }

diff --git a/kvm-all.c b/kvm-all.c
index e6b262f..63a33b4 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1034,6 +1034,11 @@ static int kvm_update_routing_entry(KVMState *s,
             continue;
         }
 
+        if (entry->type == new_entry->type &&
+            entry->flags == new_entry->flags &&
+            !memcmp(&entry->u, &new_entry->u, sizeof(entry->u))) {
+            return 0;
+        }
         entry->type = new_entry->type;
         entry->flags = new_entry->flags;
         entry->u = new_entry->u;
-- 
1.7.3.1.msysgit.0

This patch works for both virtio-pci devices and pci-passthrough devices.
MST and I discussed this patch before. It avoids meaninglessly updating the routing entry in the kvm hypervisor when the new msi message is identical to the old msi message, which gains much in some cases, for example, frequent mask/unmask of the per-vector masking control bit in the ISR on some old linux guests (e.g., rhel-5.5). At MST's request, the numbers will be provided later.

I started a VM (rhel-5.5) with a direct-assigned intel 82599 VF, and ran an iperf client on the VM and the iperf server on the host where the VM resides, so communication between VM and host was switched in the 82599 NIC. The throughput comparison with and without the above patch applied:

before this patch applied:
[ID] Interval       Transfer     Bandwidth
[SUM] 0.0-10.1 sec  96.5 MBytes  80.1 Mbits/sec

after this patch applied:
[ID] Interval       Transfer     Bandwidth
[SUM] 0.0-10.0 sec  10.9 GBytes  9.37 Gbits/sec

Then I ran a netperf client on the VM and netserver on the host where the VM resides, with the commands:
netperf client: netperf -H [host ip] -l 120 -t TCP_RR -- -m 1024 -r 32,1024
netperf server: netserver

The transaction rate comparison with and without the above patch applied:

before this patch applied:
Socket Size   Request  Resp.  Elapsed  Trans.
Send   Recv   Size     Size   Time     Rate
bytes  Bytes  bytes    bytes  secs.    per sec
16384  87380  32       1024   120.01   36.61
65536  87380

after this patch applied:
Socket Size   Request  Resp.  Elapsed  Trans.
Send   Recv   Size     Size   Time     Rate
bytes  Bytes  bytes    bytes  secs.    per sec
16384  87380  32       1024   120.01   7464.89
65536  87380

Thanks, Zhang Haoyu
[Qemu-devel] [PATCH] migration: add timeout option for tcp migration send/receive socket
When a network disconnection occurs during live migration, the migration thread will be stuck in sendmsg(), as the migration socket is in blocking (~O_NONBLOCK) mode now.

Signed-off-by: Zeng Junliang zengjunli...@huawei.com
---
 include/migration/migration.h |  4 ++++
 migration-tcp.c               | 23 ++++++++++++++++++-
 2 files changed, 26 insertions(+), 1 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index f0640e0..1a56248 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -23,6 +23,8 @@
 #include "qapi-types.h"
 #include "exec/cpu-common.h"
 
+#define QEMU_MIGRATE_SOCKET_OP_TIMEOUT 60
+
 struct MigrationParams {
     bool blk;
     bool shared;
@@ -109,6 +111,8 @@ uint64_t xbzrle_mig_pages_transferred(void);
 uint64_t xbzrle_mig_pages_overflow(void);
 uint64_t xbzrle_mig_pages_cache_miss(void);
 
+int tcp_migration_set_socket_timeout(int fd, int optname, int timeout_in_sec);
+
 /**
  * @migrate_add_blocker - prevent migration from proceeding
  *
diff --git a/migration-tcp.c b/migration-tcp.c
index b20ee58..860238b 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -29,11 +29,28 @@
 do { } while (0)
 #endif
 
+int tcp_migration_set_socket_timeout(int fd, int optname, int timeout_in_sec)
+{
+    struct timeval timeout;
+    int ret = 0;
+
+    if (fd < 0 || timeout_in_sec < 0 ||
+        (optname != SO_RCVTIMEO && optname != SO_SNDTIMEO)) {
+        return -1;
+    }
+
+    timeout.tv_sec = timeout_in_sec;
+    timeout.tv_usec = 0;
+
+    ret = qemu_setsockopt(fd, SOL_SOCKET, optname, &timeout, sizeof(timeout));
+
+    return ret;
+}
+
 static void tcp_wait_for_connect(int fd, void *opaque)
 {
     MigrationState *s = opaque;
 
-    if (fd < 0) {
+    if (tcp_migration_set_socket_timeout(fd, SO_SNDTIMEO, QEMU_MIGRATE_SOCKET_OP_TIMEOUT) < 0) {
         DPRINTF("migrate connect error\n");
         s->file = NULL;
         migrate_fd_error(s);
@@ -76,6 +93,10 @@ static void tcp_accept_incoming_migration(void *opaque)
         goto out;
     }
 
+    if (tcp_migration_set_socket_timeout(c, SO_RCVTIMEO, QEMU_MIGRATE_SOCKET_OP_TIMEOUT) < 0) {
+        fprintf(stderr, "set tcp migration socket receive timeout error\n");
+        goto out;
+    }
     process_incoming_migration(f);
     return;
-- 
1.7.3.1.msysgit.0
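A minimal, self-contained sketch of what the helper above does (the standalone name and plain setsockopt() call here are illustrative, not QEMU's): `SO_SNDTIMEO`/`SO_RCVTIMEO` make blocking `send()`/`recv()` return with `EAGAIN`/`EWOULDBLOCK` once the timeout expires, so a stuck migration socket eventually errors out instead of blocking forever.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical standalone equivalent of tcp_migration_set_socket_timeout();
 * QEMU wraps setsockopt() as qemu_setsockopt(). */
static int set_socket_timeout(int fd, int optname, int timeout_in_sec)
{
    struct timeval timeout;

    /* Only the two timeout options make sense here */
    if (fd < 0 || timeout_in_sec < 0 ||
        (optname != SO_RCVTIMEO && optname != SO_SNDTIMEO)) {
        return -1;
    }
    timeout.tv_sec = timeout_in_sec;
    timeout.tv_usec = 0;
    return setsockopt(fd, SOL_SOCKET, optname, &timeout, sizeof(timeout));
}
```

After this call, a recv() on the socket with no data pending returns -1 with errno set to EAGAIN or EWOULDBLOCK after roughly one timeout period, rather than blocking indefinitely.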
[Qemu-devel] [PATCH] migration: add timeout option for tcp migration send/receive socket
When a network disconnection occurs during live migration, the migration thread will be stuck in sendmsg(), as the migration socket is in blocking (~O_NONBLOCK) mode now.

Signed-off-by: Zeng Junliang zengjunli...@huawei.com
---
 include/migration/migration.h |  4 ++++
 migration-tcp.c               | 23 ++++++++++++++++++-
 2 files changed, 26 insertions(+), 1 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index f0640e0..1a56248 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -23,6 +23,8 @@
 #include "qapi-types.h"
 #include "exec/cpu-common.h"
 
+#define QEMU_MIGRATE_SOCKET_OP_TIMEOUT 60
+
 struct MigrationParams {
     bool blk;
     bool shared;
@@ -109,6 +111,8 @@ uint64_t xbzrle_mig_pages_transferred(void);
 uint64_t xbzrle_mig_pages_overflow(void);
 uint64_t xbzrle_mig_pages_cache_miss(void);
 
+int tcp_migration_set_socket_timeout(int fd, int optname, int timeout_in_sec);
+
 /**
  * @migrate_add_blocker - prevent migration from proceeding
  *
diff --git a/migration-tcp.c b/migration-tcp.c
index b20ee58..391db0a 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -33,7 +33,7 @@ static void tcp_wait_for_connect(int fd, void *opaque)
 {
     MigrationState *s = opaque;
 
-    if (fd < 0) {
+    if (tcp_migration_set_socket_timeout(fd, SO_SNDTIMEO, QEMU_MIGRATE_SOCKET_OP_TIMEOUT) < 0) {
         DPRINTF("migrate connect error\n");
         s->file = NULL;
         migrate_fd_error(s);
@@ -76,6 +76,10 @@ static void tcp_accept_incoming_migration(void *opaque)
         goto out;
     }
 
+    if (tcp_migration_set_socket_timeout(c, SO_RCVTIMEO, QEMU_MIGRATE_SOCKET_OP_TIMEOUT) < 0) {
+        fprintf(stderr, "set tcp migration socket receive timeout error\n");
+        goto out;
+    }
     process_incoming_migration(f);
     return;
@@ -95,3 +99,20 @@ void tcp_start_incoming_migration(const char *host_port, Error **errp)
     qemu_set_fd_handler2(s, NULL, tcp_accept_incoming_migration, NULL,
                          (void *)(intptr_t)s);
 }
+
+int tcp_migration_set_socket_timeout(int fd, int optname, int timeout_in_sec)
+{
+    struct timeval timeout;
+    int ret = 0;
+
+    if (fd < 0 || timeout_in_sec < 0 ||
+        (optname != SO_RCVTIMEO && optname != SO_SNDTIMEO)) {
+        return -1;
+    }
+
+    timeout.tv_sec = timeout_in_sec;
+    timeout.tv_usec = 0;
+
+    ret = qemu_setsockopt(fd, SOL_SOCKET, optname, &timeout, sizeof(timeout));
+
+    return ret;
+}
\ No newline at end of file
-- 
1.7.3.1.msysgit.0
[Qemu-devel] meaningless to compare irqfd's msi message with new msi message in virtio_pci_vq_vector_unmask
Re: [Qemu-devel] [PATCH] [KVM] Needless to update msi route when only msi-x entry control section changed
With regard to old-version linux guests (e.g., rhel-5.5), in ISR processing, the msi-x vector is masked and unmasked every time, which results in a VMEXIT; then QEMU invokes kvm_irqchip_update_msi_route() to ask the KVM hypervisor to update the VM irq routing table. In the KVM hypervisor, RCU synchronization is needed after updating the routing table, so much time is consumed waiting in wait_rcu_gp(). So the CPU usage seen inside the VM is very high, while from the view of the host, the VM's total CPU usage is very low. Masking/unmasking an msi-x vector only sets the msi-x entry control section; there is no need to update the VM irq routing table.

Signed-off-by: Zhang Haoyu haoyu.zh...@huawei.com
Signed-off-by: Huang Weidong weidong.hu...@huawei.com
Signed-off-by: Qin Chuanyu qinchua...@huawei.com
---
 hw/i386/kvm/pci-assign.c | 3 +++
 1 files changed, 3 insertions(+)

--- a/hw/i386/kvm/pci-assign.c 2013-05-04 15:53:18.0 +0800
+++ b/hw/i386/kvm/pci-assign.c 2013-05-04 15:50:46.0 +0800
@@ -1576,6 +1576,8 @@ static void assigned_dev_msix_mmio_write
             MSIMessage msg;
             int ret;
 
+            /* No need to update the msi route when only the msi-x entry control section changed */
+            if ((addr & (PCI_MSIX_ENTRY_SIZE - 1)) != PCI_MSIX_ENTRY_VECTOR_CTRL) {
                 msg.address = entry->addr_lo |
                     ((uint64_t)entry->addr_hi << 32);
                 msg.data = entry->data;
@@ -1585,6 +1587,7 @@ static void assigned_dev_msix_mmio_write
                 if (ret) {
                     error_report("Error updating irq routing entry (%d)", ret);
                 }
+            }
         }
     }
 }

Thanks, Zhang Haoyu

If the guest wants to update the vector, it does it like this: mask, update, unmask. And it looks like the only point where we update the vector is on unmask, so this patch will mean we never update the vector. I'm not sure this combination (old guest + legacy device assignment framework) is worth optimizing. Can you try VFIO instead? But if it is, the right way to do this is probably along the lines of the below patch. Want to try it out?
diff --git a/kvm-all.c b/kvm-all.c
index 2d92721..afe2327 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1006,6 +1006,11 @@ static int kvm_update_routing_entry(KVMState *s,
             continue;
         }
 
+        if (entry->type == new_entry->type &&
+            entry->flags == new_entry->flags &&
+            entry->u == new_entry->u) {
+            return 0;
+        }
         entry->type = new_entry->type;
         entry->flags = new_entry->flags;
         entry->u = new_entry->u;

A union type cannot be directly compared, so I tried out the below patch instead:

--- a/kvm-all.c 2013-05-06 09:56:38.0 +0800
+++ b/kvm-all.c 2013-05-06 09:56:45.0 +0800
@@ -1008,6 +1008,12 @@ static int kvm_update_routing_entry(KVMS
             continue;
         }
 
+        if (entry->type == new_entry->type &&
+            entry->flags == new_entry->flags &&
+            !memcmp(&entry->u, &new_entry->u, sizeof(entry->u))) {
+            return 0;
+        }
+
         entry->type = new_entry->type;
         entry->flags = new_entry->flags;
         entry->u = new_entry->u;

MST's patch is more universal than my first patch, which was confined to assigned_dev_msix_mmio_write(). In the case where some section of the msix entry other than the control section is written with a value identical to the old entry's, MST's patch also works, and it works in the non-passthrough scenario too.

Any numbers for either case?

I'm not sure what you said exactly means. Do you want me to make a further comparison between the above two patches? If yes, no other comments. And, after MST's patch is applied, the below check in virtio_pci_vq_vector_unmask() can be removed:

--- a/hw/virtio/virtio-pci.c 2013-05-04 15:53:20.0 +0800
+++ b/hw/virtio/virtio-pci.c 2013-05-06 10:25:58.0 +0800
@@ -619,12 +619,10 @@ static int virtio_pci_vq_vector_unmask(V
     if (proxy->vector_irqfd) {
         irqfd = &proxy->vector_irqfd[vector];
-        if (irqfd->msg.data != msg.data || irqfd->msg.address != msg.address) {
             ret = kvm_irqchip_update_msi_route(kvm_state, irqfd->virq, msg);
             if (ret < 0) {
                 return ret;
             }
-        }
     }
 
     /* If guest supports masking, irqfd is already setup, unmask it.

Thanks, Zhang Haoyu
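The union-comparison point above can be shown in isolation. The struct below is a simplified, hypothetical stand-in for the real routing-entry type (field names are illustrative): C has no `==` for unions, so the usual workaround is `memcmp` over the union's storage, which is deterministic here as long as the compared objects are fully initialized.

```c
#include <string.h>

/* Simplified stand-in for a KVM irq routing entry; not the real layout */
struct route_entry {
    unsigned type;
    unsigned flags;
    union {
        unsigned irqchip_pin;
        unsigned long long msi_address;
    } u;
};

/* Mirrors the check added to kvm_update_routing_entry(): skip the
 * expensive route update when the new entry is identical to the old one.
 * The scalar fields use ==; the union falls back to a byte comparison. */
static int entry_unchanged(const struct route_entry *a,
                           const struct route_entry *b)
{
    return a->type == b->type &&
           a->flags == b->flags &&
           !memcmp(&a->u, &b->u, sizeof(a->u));
}
```

This is exactly why MST's `entry->u == new_entry->u` draft did not compile and the posted version switched to memcmp.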
Re: [Qemu-devel] [PATCH] [KVM] Needless to update msi route when only msi-x entry control section changed
[Qemu-devel] [PATCH] [KVM] Needless to update msi route when only msi-x entry control section changed
With regard to old-version linux guests (e.g., rhel-5.5), in ISR processing, the msi-x vector is masked and unmasked every time, which results in a VMEXIT; then QEMU invokes kvm_irqchip_update_msi_route() to ask the KVM hypervisor to update the VM irq routing table. In the KVM hypervisor, RCU synchronization is needed after updating the routing table, so much time is consumed waiting in wait_rcu_gp(). So the CPU usage seen inside the VM is very high, while from the view of the host, the VM's total CPU usage is very low. Masking/unmasking an msi-x vector only sets the msi-x entry control section; there is no need to update the VM irq routing table.

Signed-off-by: Zhang Haoyu haoyu.zh...@huawei.com
Signed-off-by: Huang Weidong weidong.hu...@huawei.com
Signed-off-by: Qin Chuanyu qinchua...@huawei.com
---
 hw/i386/kvm/pci-assign.c | 3 +++
 1 files changed, 3 insertions(+)

--- a/hw/i386/kvm/pci-assign.c 2013-05-04 15:53:18.0 +0800
+++ b/hw/i386/kvm/pci-assign.c 2013-05-04 15:50:46.0 +0800
@@ -1576,6 +1576,8 @@ static void assigned_dev_msix_mmio_write
             MSIMessage msg;
             int ret;
 
+            /* No need to update the msi route when only the msi-x entry control section changed */
+            if ((addr & (PCI_MSIX_ENTRY_SIZE - 1)) != PCI_MSIX_ENTRY_VECTOR_CTRL) {
                 msg.address = entry->addr_lo |
                     ((uint64_t)entry->addr_hi << 32);
                 msg.data = entry->data;
@@ -1585,6 +1587,7 @@ static void assigned_dev_msix_mmio_write
                 if (ret) {
                     error_report("Error updating irq routing entry (%d)", ret);
                 }
+            }
         }
     }
 }

Thanks, Zhang Haoyu
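The guard in the patch above relies on the MSI-X table layout: each table entry is 16 bytes (address low, address high, data, vector control), so the low 4 bits of the MMIO offset identify which dword of an entry is being written. A minimal sketch of that arithmetic, using the standard constant values from the PCI MSI-X layout (the helper names here are illustrative):

```c
#include <stdint.h>

/* Per-entry dword offsets as defined by the PCI MSI-X table layout */
#define PCI_MSIX_ENTRY_SIZE        16
#define PCI_MSIX_ENTRY_LOWER_ADDR   0
#define PCI_MSIX_ENTRY_UPPER_ADDR   4
#define PCI_MSIX_ENTRY_DATA         8
#define PCI_MSIX_ENTRY_VECTOR_CTRL 12

/* A write only toggles the per-vector mask bit when it lands on the
 * vector-control dword; address/data writes do need a route update. */
static int write_is_vector_ctrl(uint64_t addr)
{
    return (addr & (PCI_MSIX_ENTRY_SIZE - 1)) == PCI_MSIX_ENTRY_VECTOR_CTRL;
}

/* Which MSI-X table entry a write at this offset belongs to */
static unsigned entry_index(uint64_t addr)
{
    return (unsigned)(addr / PCI_MSIX_ENTRY_SIZE);
}
```

So a mask/unmask from the guest (a write at offset 12 within some entry) is filtered out, while writes to the address or data dwords still trigger kvm_irqchip_update_msi_route().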
Re: [Qemu-devel] KVM VM(rhel-5.5) %si is too high when TX/RX packets
I am running a VM (RHEL-5.5) on a KVM hypervisor (linux-3.8 + QEMU-1.4.1), with an intel 82576 VF direct-assigned to the VM. When TX/RX-ing packets from the VM to another host via the iperf tool, top on the VM showed that %si is too high, approximately 95% ~ 100%, but from the view of the host, the VM's total CPU usage is about 20% - 30%. And the throughput rate is approximately 200Mb/s, far from the line rate of 1Gb/s. Also, I found the hardirq rate is lower than normal by running watch -d -n 1 cat /proc/interrupts; I think it's caused by the too-high %si, because the NIC's hardirq is disabled during softirq processing. Then I direct-assigned the whole intel 82576 to the VM, and the same thing happened. I found that both the intel 82576's and the intel 82576 VF's interrupt mode is PCI-MSI-X. When I rmmod-ed the igb driver and re-insmod-ed it (igb-4.1.2) with the parameter IntMode=0/1 (0: legacy, 1: MSI, 2: MSI-X), the problem was gone: %si is approximately 20% - 30%, and the throughput rate reached the line rate, about 940Mb/s. Updating the VM to RHEL-6.1 also made the problem disappear. And I found a very strange thing: the VM's 82576 VF's irq routing is set once per VF interrupt received, very frequently.

RHEL 5.5 is a very old update. Can you try RHEL 5.9? In any case, this looks a lot like a bug in the version of the driver that was included in RHEL 5.5; you should contact Red Hat support services if you can still reproduce it with the latest RHEL5 update. Paolo

One patch has been proposed for QEMU, shown below:

[PATCH] [KVM] Needless to update msi route when only msi-x entry control section changed

With regard to old-version linux guests (e.g., rhel-5.5), in ISR processing, the msi-x vector is masked and unmasked every time, which results in a VMEXIT; then QEMU invokes kvm_irqchip_update_msi_route() to ask the KVM hypervisor to update the VM irq routing table. In the KVM hypervisor, RCU synchronization is needed after updating the routing table, so much time is consumed waiting in wait_rcu_gp(). So the CPU usage seen inside the VM is very high, while from the view of the host, the VM's total CPU usage is very low. Masking/unmasking an msi-x vector only sets the msi-x entry control section; there is no need to update the VM irq routing table.

 hw/i386/kvm/pci-assign.c | 3 +++
 1 files changed, 3 insertions(+)

--- a/hw/i386/kvm/pci-assign.c 2013-05-04 15:53:18.0 +0800
+++ b/hw/i386/kvm/pci-assign.c 2013-05-04 15:50:46.0 +0800
@@ -1576,6 +1576,8 @@ static void assigned_dev_msix_mmio_write
             MSIMessage msg;
             int ret;
 
+            /* No need to update the msi route when only the msi-x entry control section changed */
+            if ((addr & (PCI_MSIX_ENTRY_SIZE - 1)) != PCI_MSIX_ENTRY_VECTOR_CTRL) {
                 msg.address = entry->addr_lo |
                     ((uint64_t)entry->addr_hi << 32);
                 msg.data = entry->data;
@@ -1585,6 +1587,7 @@ static void assigned_dev_msix_mmio_write
                 if (ret) {
                     error_report("Error updating irq routing entry (%d)", ret);
                 }
+            }
         }
     }
 }

Thanks, Zhang Haoyu
Re: [Qemu-devel] KVM VM(rhel-5.5) %si is too high when TX/RX packets
I am running a VM (RHEL-5.5) on a KVM hypervisor (linux-3.8 + QEMU-1.4.1), with an Intel 82576 VF directly assigned to the VM. When TX/RX-ing packets from the VM to another host via iperf, top inside the VM shows %si at roughly 95%-100%, yet from the host's point of view the VM's total CPU usage is only about 20%-30%. Throughput is approximately 200Mb/s, far below the 1Gb/s line rate. Running watch -d -n 1 cat /proc/interrupts also shows the hardirq rate is lower than normal; I think this is caused by the very high %si, since the NIC's hardirq is disabled while the softirq is being processed.

I then directly assigned the full Intel 82576 to the VM, and the same thing happened. Both the 82576 and the 82576 VF use PCI-MSI-X interrupts. After rmmod-ing the igb driver and re-insmod-ing it (igb-4.1.2) with the parameter IntMode=0/1 (0: legacy, 1: MSI, 2: MSI-X), the problem was gone: %si dropped to roughly 20%-30% and throughput reached line rate, about 940Mb/s. Updating the VM to RHEL-6.1 also made the problem disappear. I also noticed something very strange: the VM's 82576 VF irq routing is updated once per VF interrupt received, which is very frequent.

With regard to rhel-5.5 (linux-2.6.18), in the ISR the mask/unmask function msi_set_mask_bit() is invoked every time, which results in a VMEXIT; QEMU then invokes kvm_irqchip_update_msi_route() to ask the KVM hypervisor to update the VM irq routing table. In the KVM hypervisor, a synchronization is needed after updating the routing table, so much time is spent waiting in wait_rcu_gp(). That is why %si in the VM is so high, while from the host's point of view the VM's total CPU usage is so low. Why does the ISR need to mask and unmask the MSI-X vector every time?

Thanks, Zhang Haoyu
[Qemu-devel] KVM VM(rhel-5.5) %si is too high when TX/RX packets
I am running a VM (RHEL-5.5) on a KVM hypervisor (linux-3.8 + QEMU-1.4.1), with an Intel 82576 VF directly assigned to the VM. When TX/RX-ing packets from the VM to another host via iperf, top inside the VM shows %si at roughly 95%-100%, yet from the host's point of view the VM's total CPU usage is only about 20%-30%. Throughput is approximately 200Mb/s, far below the 1Gb/s line rate. Running watch -d -n 1 cat /proc/interrupts also shows the hardirq rate is lower than normal; I think this is caused by the very high %si, since the NIC's hardirq is disabled while the softirq is being processed.

I then directly assigned the full Intel 82576 to the VM, and the same thing happened. Both the 82576 and the 82576 VF use PCI-MSI-X interrupts. After rmmod-ing the igb driver and re-insmod-ing it (igb-4.1.2) with the parameter IntMode=0/1 (0: legacy, 1: MSI, 2: MSI-X), the problem was gone: %si dropped to roughly 20%-30% and throughput reached line rate, about 940Mb/s. Updating the VM to RHEL-6.1 also made the problem disappear. I also noticed something very strange: the VM's 82576 VF irq routing is updated once per VF interrupt received, which is very frequent.

Thanks, Zhang Haoyu
Re: [Qemu-devel] KVM VM(windows xp) reset when running geekbench for about 2 days
On Thu, Apr 18, 2013 at 12:00:49PM +, Zhanghaoyu (A) wrote:

I started 10 VMs (windows xp), then ran the geekbench tool on them; after about 2 days, one of them was reset. I found the reset operation is done by

int kvm_cpu_exec(CPUArchState *env)
{
    ...
    switch (run->exit_reason) {
    ...
    case KVM_EXIT_SHUTDOWN:
        DPRINTF("shutdown\n");
        qemu_system_reset_request();
        ret = EXCP_INTERRUPT;
        break;
    ...
    }
}

The KVM_EXIT_SHUTDOWN exit reason was set previously in the triple fault handler handle_triple_fault().

How do you know that the reset was done here? This is not the only place where qemu_system_reset_request() is called.

I used gdb to debug the QEMU process and added a breakpoint in qemu_system_reset_request(); when the case occurred, the backtrace is shown below:

(gdb) bt
#0 qemu_system_reset_request () at vl.c:1964
#1 0x7f9ef9dc5991 in kvm_cpu_exec (env=0x7f9efac47100) at /gt/qemu-kvm-1.4/qemu-kvm-1.4/kvm-all.c:1602
#2 0x7f9ef9d5b229 in qemu_kvm_cpu_thread_fn (arg=0x7f9efac47100) at /gt/qemu-kvm-1.4/qemu-kvm-1.4/cpus.c:759
#3 0x7f9ef898b5f0 in start_thread () from /lib64/libpthread.so.0
#4 0x7f9ef86fa84d in clone () from /lib64/libc.so.6
#5 0x in ?? ()

Also, I added printk logging in all places where the KVM_EXIT_SHUTDOWN exit reason is set; only handle_triple_fault() was called.

Make sure XP is not set to auto-reset in case of BSOD.

No, winxp is not set to auto-reset in case of BSOD. No winxp event log was reported.

Best regards, Yan.

What causes the triple fault?

Are you asking what a triple fault is, or why it happened in your case?

What I asked is why the triple fault happened in my case.

For the former, see here: http://en.wikipedia.org/wiki/Triple_fault. For the latter, it is too late to tell after the VM reset. You can run QEMU with -no-reboot -no-shutdown; the VM will pause instead of rebooting, and then you can examine what is going on.

Great, thanks. I'll run QEMU with the -no-reboot -no-shutdown options. If the VM pauses in my case, what should I examine?

Register state: info registers in the monitor, for each vcpu.
Code around the instruction that faulted.

I ran QEMU with the -no-reboot -no-shutdown options; the VM paused when the case happened. I then ran info registers in the QEMU monitor, shown below:

CS =0008 00c09b00 DPL=0 CS32 [-RA]
SS =0010 00c09300 DPL=0 DS [-WA]
DS =0023 00c0f300 DPL=3 DS [-WA]
FS =0030 ffdff000 1fff 00c09300 DPL=0 DS [-WA]
GS = 00c0
LDT= 00c0
TR =0028 80042000 20ab 8b00 DPL=0 TSS32-busy
GDT= 8003f000 03ff
IDT= 8003f400 07ff
CR0=8001003b CR2=760d7fe4 CR3=002ec000 CR4=06f8
DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400
EFER=0800
FCW=027f FSW= [ST=0] FTW=00 MXCSR=1f80
FPR0= FPR1= FPR2= FPR3= FPR4= FPR5= FPR6= FPR7=
XMM00= XMM01= XMM02= XMM03= XMM04= XMM05= XMM06= XMM07=

In the normal case, info registers in the QEMU monitor shows:

CS =001b 00c0fb00 DPL=3 CS32 [-RA]
SS =0023 00c0f300 DPL=3 DS [-WA]
DS =0023 00c0f300 DPL=3 DS [-WA]
FS =0038 7ffda000 0fff 0040f300 DPL=3 DS [-WA]
GS = 0100
LDT=
TR =0028 80042000 20ab 8b00 DPL=0 TSS32-busy
GDT= 8003f000 03ff
IDT= 8003f400 07ff
CR0=80010031 CR2=0167fd20 CR3=0af00220 CR4=06f8
DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400
EFER=0800
FCW=027f FSW= [ST=0] FTW=00 MXCSR=1f80
FPR0=00a400a40a18 d830 FPR1=0012f9c07c90e900 e900
FPR2=7c910202 5d40 FPR3=01e27c903400 f808
FPR4=05230012f87a FPR5=7c905d40 0001
FPR6=0001 FPR7=a9dfde00 4018
XMM00=7c917d9a0012f8d47c90 XMM01=0012f8740012f8740012f87a7c90
XMM02=7c917de97c97b1787c917e3f0012f87a XMM03=0012fa687c80901a0012f9186cfd
XMM04=7c9102027c9034007c9102087c90e900 XMM05=000c7c900012f9907c91017b
XMM06=9a400012f8780012f878 XMM07=6365446c74527c91340500241f18

N.B. between the two cases, the CS DPL, SS DPL, FS DPL, FPR, XMM, FSW, ST, and FTW values are quite distinct.

Thanks, Zhang Haoyu
[Qemu-devel] reply: reply: reply: qemu crashed when starting vm(kvm) with vnc connect
On Mon, Apr 08, 2013 at 12:27:06PM +, Zhanghaoyu (A) wrote:
On Sun, Apr 07, 2013 at 04:58:07AM +, Zhanghaoyu (A) wrote:

I started a kvm VM with a vnc connection (using the zrle protocol); sometimes the qemu program crashed during the starting period, receiving signal SIGABRT. Trying about 20 times, this crash can be reproduced. I guess the cause is memory corruption or a double free.

Which version of QEMU are you running? Please try qemu.git/master.

Please try again with the latest master; it might be fixed meanwhile. If it still happens, please provide the full qemu and vnc client command lines.

The backtrace from the core file is shown below:

Program received signal SIGABRT, Aborted.
#8 0x7f32efd26d07 in vnc_disconnect_finish (vs=0x7f32f0c762d0) at ui/vnc.c:1050

Do you have a vnc client connected? Do you close it?

I have a vnc client connected; it was closed automatically when qemu crashed.

Any errors reported by the vnc client (maybe it disconnects due to an error in the data stream)?

No errors were reported by the vnc client; it just popped up a reconnect window.
And I have tried to fix this bug; the crash did not reproduce after about 100 tries. The patch is shown below:

--- a/ui/vnc-jobs.c	2013-04-18 20:10:07.0 +0800
+++ b/ui/vnc-jobs.c	2013-04-18 20:14:06.0 +0800
@@ -234,7 +234,6 @@ static int vnc_worker_thread_loop(VncJob
         vnc_unlock_output(job->vs);
         goto disconnected;
     }
-    vnc_unlock_output(job->vs);

     /* Make a local copy of vs and switch output buffers */
     vnc_async_encoding_start(job->vs, &vs);
@@ -252,6 +251,8 @@
     if (job->vs->csock == -1) {
         vnc_unlock_display(job->vs->vd);
+        vnc_async_encoding_end(job->vs, &vs);
+        vnc_unlock_output(job->vs);
         goto disconnected;
     }
@@ -269,7 +270,6 @@
     vs.output.buffer[saved_offset] = (n_rectangles >> 8) & 0xFF;
     vs.output.buffer[saved_offset + 1] = n_rectangles & 0xFF;
-    vnc_lock_output(job->vs);
     if (job->vs->csock != -1) {
         buffer_reserve(&job->vs->jobs_buffer, vs.output.offset);
         buffer_append(&job->vs->jobs_buffer, vs.output.buffer, vs.output.offset);
@@ -278,6 +278,8 @@
         vnc_async_encoding_end(job->vs, &vs);
         qemu_bh_schedule(job->vs->bh);
+    } else {
+        vnc_async_encoding_end(job->vs, &vs);
     }
     vnc_unlock_output(job->vs);

Thanks, Zhang Haoyu
[Qemu-devel] KVM VM(windows xp) reset when running geekbench for about 2 days
I started 10 VMs (windows xp), then ran the geekbench tool on them; after about 2 days, one of them was reset. I found the reset operation is done by

int kvm_cpu_exec(CPUArchState *env)
{
    ...
    switch (run->exit_reason) {
    ...
    case KVM_EXIT_SHUTDOWN:
        DPRINTF("shutdown\n");
        qemu_system_reset_request();
        ret = EXCP_INTERRUPT;
        break;
    ...
    }
}

The KVM_EXIT_SHUTDOWN exit reason was set previously in the triple fault handler handle_triple_fault(). What causes the triple fault?

Thanks, Zhang Haoyu
Re: [Qemu-devel] KVM VM(windows xp) reset when running geekbench for about 2 days
On Thu, Apr 18, 2013 at 12:00:49PM +, Zhanghaoyu (A) wrote:

I started 10 VMs (windows xp), then ran the geekbench tool on them; after about 2 days, one of them was reset. I found the reset operation is done by

int kvm_cpu_exec(CPUArchState *env)
{
    ...
    switch (run->exit_reason) {
    ...
    case KVM_EXIT_SHUTDOWN:
        DPRINTF("shutdown\n");
        qemu_system_reset_request();
        ret = EXCP_INTERRUPT;
        break;
    ...
    }
}

The KVM_EXIT_SHUTDOWN exit reason was set previously in the triple fault handler handle_triple_fault().

How do you know that the reset was done here? This is not the only place where qemu_system_reset_request() is called.

I used gdb to debug the QEMU process and added a breakpoint in qemu_system_reset_request(); when the case occurred, the backtrace is shown below:

(gdb) bt
#0 qemu_system_reset_request () at vl.c:1964
#1 0x7f9ef9dc5991 in kvm_cpu_exec (env=0x7f9efac47100) at /gt/qemu-kvm-1.4/qemu-kvm-1.4/kvm-all.c:1602
#2 0x7f9ef9d5b229 in qemu_kvm_cpu_thread_fn (arg=0x7f9efac47100) at /gt/qemu-kvm-1.4/qemu-kvm-1.4/cpus.c:759
#3 0x7f9ef898b5f0 in start_thread () from /lib64/libpthread.so.0
#4 0x7f9ef86fa84d in clone () from /lib64/libc.so.6
#5 0x in ?? ()

Also, I added printk logging in all places where the KVM_EXIT_SHUTDOWN exit reason is set; only handle_triple_fault() was called.

Make sure XP is not set to auto-reset in case of BSOD.

No, winxp is not set to auto-reset in case of BSOD. No winxp event log was reported.

Best regards, Yan.

What causes the triple fault?

Are you asking what a triple fault is, or why it happened in your case?

What I asked is why the triple fault happened in my case.

For the former, see here: http://en.wikipedia.org/wiki/Triple_fault. For the latter, it is too late to tell after the VM reset. You can run QEMU with -no-reboot -no-shutdown; the VM will pause instead of rebooting, and then you can examine what is going on.

Great, thanks. I'll run QEMU with the -no-reboot -no-shutdown options. If the VM pauses in my case, what should I examine?

Thanks, Zhang Haoyu
Re: [Qemu-devel] latest version qemu compile error
The log of make V=1 is identical to that of make, shown below:

hw/virtio/dataplane/vring.c: In function 'vring_enable_notification':
hw/virtio/dataplane/vring.c:72: warning: implicit declaration of function 'vring_avail_event'
hw/virtio/dataplane/vring.c:72: warning: nested extern declaration of 'vring_avail_event'
hw/virtio/dataplane/vring.c:72: error: lvalue required as left operand of assignment
hw/virtio/dataplane/vring.c: In function 'vring_should_notify':
hw/virtio/dataplane/vring.c:107: warning: implicit declaration of function 'vring_need_event'
hw/virtio/dataplane/vring.c:107: warning: nested extern declaration of 'vring_need_event'
hw/virtio/dataplane/vring.c:107: warning: implicit declaration of function 'vring_used_event'
hw/virtio/dataplane/vring.c:107: warning: nested extern declaration of 'vring_used_event'
hw/virtio/dataplane/vring.c: In function 'vring_pop':
hw/virtio/dataplane/vring.c:262: error: lvalue required as left operand of assignment
make: *** [hw/virtio/dataplane/vring.o] Error 1

I don't need the errors, I need the compiler command line.

Paolo

The gcc command line:

cc -I. -I/home/zhanghaoyu/qemu_201304091521 -I/home/zhanghaoyu/qemu_201304091521/include -fPIE -DPIE -m64 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fstack-protector-all -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -I/usr/include/pixman-1 -Ihw/virtio/dataplane -Ihw/virtio/dataplane -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -MMD -MP -MT hw/virtio/dataplane/vring.o -MF hw/virtio/dataplane/vring.d -O2 -D_FORTIFY_SOURCE=2 -g -c -o hw/virtio/dataplane/vring.o hw/virtio/dataplane/vring.c

Thanks, Zhang Haoyu
[Qemu-devel] latest version qemu compile error
I compiled the QEMU source downloaded from qemu.git (http://git.qemu.org/git/qemu.git) on 2013-04-09; errors were reported as below:

hw/virtio/dataplane/vring.c: In function 'vring_enable_notification':
hw/virtio/dataplane/vring.c:72: warning: implicit declaration of function 'vring_avail_event'
hw/virtio/dataplane/vring.c:72: warning: nested extern declaration of 'vring_avail_event'
hw/virtio/dataplane/vring.c:72: error: lvalue required as left operand of assignment
hw/virtio/dataplane/vring.c: In function 'vring_should_notify':
hw/virtio/dataplane/vring.c:107: warning: implicit declaration of function 'vring_need_event'
hw/virtio/dataplane/vring.c:107: warning: nested extern declaration of 'vring_need_event'
hw/virtio/dataplane/vring.c:107: warning: implicit declaration of function 'vring_used_event'
hw/virtio/dataplane/vring.c:107: warning: nested extern declaration of 'vring_used_event'
hw/virtio/dataplane/vring.c: In function 'vring_pop':
hw/virtio/dataplane/vring.c:262: error: lvalue required as left operand of assignment
make: *** [hw/virtio/dataplane/vring.o] Error 1

'vring_avail_event' and 'vring_need_event' are defined in linux-headers/linux/virtio_ring.h; why are they not available in vring.c?
Re: [Qemu-devel] latest version qemu compile error
I compiled the QEMU source downloaded from qemu.git (http://git.qemu.org/git/qemu.git) on 2013-04-09; errors were reported as below:

hw/virtio/dataplane/vring.c: In function 'vring_enable_notification':
hw/virtio/dataplane/vring.c:72: warning: implicit declaration of function 'vring_avail_event'
hw/virtio/dataplane/vring.c:72: warning: nested extern declaration of 'vring_avail_event'
hw/virtio/dataplane/vring.c:72: error: lvalue required as left operand of assignment
hw/virtio/dataplane/vring.c: In function 'vring_should_notify':
hw/virtio/dataplane/vring.c:107: warning: implicit declaration of function 'vring_need_event'
hw/virtio/dataplane/vring.c:107: warning: nested extern declaration of 'vring_need_event'
hw/virtio/dataplane/vring.c:107: warning: implicit declaration of function 'vring_used_event'
hw/virtio/dataplane/vring.c:107: warning: nested extern declaration of 'vring_used_event'
hw/virtio/dataplane/vring.c: In function 'vring_pop':
hw/virtio/dataplane/vring.c:262: error: lvalue required as left operand of assignment
make: *** [hw/virtio/dataplane/vring.o] Error 1

'vring_avail_event' and 'vring_need_event' are defined in linux-headers/linux/virtio_ring.h; why are they not available in vring.c?

Please send the log of make V=1.
Paolo

The log of make V=1 is identical to that of make, shown below:

hw/virtio/dataplane/vring.c: In function 'vring_enable_notification':
hw/virtio/dataplane/vring.c:72: warning: implicit declaration of function 'vring_avail_event'
hw/virtio/dataplane/vring.c:72: warning: nested extern declaration of 'vring_avail_event'
hw/virtio/dataplane/vring.c:72: error: lvalue required as left operand of assignment
hw/virtio/dataplane/vring.c: In function 'vring_should_notify':
hw/virtio/dataplane/vring.c:107: warning: implicit declaration of function 'vring_need_event'
hw/virtio/dataplane/vring.c:107: warning: nested extern declaration of 'vring_need_event'
hw/virtio/dataplane/vring.c:107: warning: implicit declaration of function 'vring_used_event'
hw/virtio/dataplane/vring.c:107: warning: nested extern declaration of 'vring_used_event'
hw/virtio/dataplane/vring.c: In function 'vring_pop':
hw/virtio/dataplane/vring.c:262: error: lvalue required as left operand of assignment
make: *** [hw/virtio/dataplane/vring.o] Error 1

Thanks, Zhang Haoyu
[Qemu-devel] reply: reply: qemu crashed when starting vm(kvm) with vnc connect
On Sun, Apr 07, 2013 at 04:58:07AM +, Zhanghaoyu (A) wrote:

I started a kvm VM with a vnc connection (using the zrle protocol); sometimes the qemu program crashed during the starting period, receiving signal SIGABRT. Trying about 20 times, this crash can be reproduced. I guess the cause is memory corruption or a double free.

Which version of QEMU are you running? Please try qemu.git/master.

Stefan

I used the QEMU downloaded from qemu.git (http://git.qemu.org/git/qemu.git).

Great, thanks! Can you please post a backtrace? The easiest way is:

$ ulimit -c unlimited
$ qemu-system-x86_64 -enable-kvm -m 1024 ...
...crash...
$ gdb -c qemu-system-x86_64.core
(gdb) bt

Depending on how your system is configured the core file might have a different filename, but there should be a file named *core* in the current working directory after the crash. The backtrace will make it possible to find out where the crash occurred.

Thanks, Stefan

The backtrace from the core file is shown below:

Program received signal SIGABRT, Aborted.
0x7f32eda3dd95 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x7f32eda3dd95 in raise () from /lib64/libc.so.6
#1 0x7f32eda3f2ab in abort () from /lib64/libc.so.6
#2 0x7f32eda77ece in __libc_message () from /lib64/libc.so.6
#3 0x7f32eda7dc06 in malloc_printerr () from /lib64/libc.so.6
#4 0x7f32eda7ecda in _int_free () from /lib64/libc.so.6
#5 0x7f32efd3452c in free_and_trace (mem=0x7f329cd0) at vl.c:2880
#6 0x7f32efd251a1 in buffer_free (buffer=0x7f32f0c82890) at ui/vnc.c:505
#7 0x7f32efd20c56 in vnc_zrle_clear (vs=0x7f32f0c762d0) at ui/vnc-enc-zrle.c:364
#8 0x7f32efd26d07 in vnc_disconnect_finish (vs=0x7f32f0c762d0) at ui/vnc.c:1050
#9 0x7f32efd275c5 in vnc_client_read (opaque=0x7f32f0c762d0) at ui/vnc.c:1349
#10 0x7f32efcb397c in qemu_iohandler_poll (readfds=0x7f32f074d020, writefds=0x7f32f074d0a0, xfds=0x7f32f074d120, ret=1) at iohandler.c:124
#11 0x7f32efcb46e8 in main_loop_wait (nonblocking=0) at main-loop.c:417
#12 0x7f32efd31159 in main_loop () at vl.c:2133
#13 0x7f32efd38070 in main (argc=46, argv=0x7fff7f5df178, envp=0x7fff7f5df2f0) at vl.c:4481

Zhang Haoyu
[Qemu-devel] reply: qemu crashed when starting vm(kvm) with vnc connect
I started a kvm VM with a vnc connection (using the zrle protocol); sometimes the qemu program crashed during the starting period, receiving signal SIGABRT. Trying about 20 times, this crash can be reproduced. I guess the cause is memory corruption or a double free.

Which version of QEMU are you running? Please try qemu.git/master.

Stefan

I used the QEMU downloaded from qemu.git (http://git.qemu.org/git/qemu.git).

Zhang Haoyu
[Qemu-devel] qemu crashed when starting vm(kvm) with vnc connect
I started a kvm VM with a vnc connection (using the zrle protocol); sometimes the qemu program crashed during the starting period, receiving signal SIGABRT. Trying about 20 times, this crash can be reproduced. I guess the cause is memory corruption or a double free. The backtrace is shown below:

0x7f32eda3dd95 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x7f32eda3dd95 in raise () from /lib64/libc.so.6
#1 0x7f32eda3f2ab in abort () from /lib64/libc.so.6
#2 0x7f32eda77ece in __libc_message () from /lib64/libc.so.6
#3 0x7f32eda7dc06 in malloc_printerr () from /lib64/libc.so.6
#4 0x7f32eda7ecda in _int_free () from /lib64/libc.so.6
#5 0x7f32efd3452c in free_and_trace (mem=0x7f329cd0) at vl.c:2880
#6 0x7f32efd251a1 in buffer_free (buffer=0x7f32f0c82890) at ui/vnc.c:505
#7 0x7f32efd20c56 in vnc_zrle_clear (vs=0x7f32f0c762d0) at ui/vnc-enc-zrle.c:364
#8 0x7f32efd26d07 in vnc_disconnect_finish (vs=0x7f32f0c762d0) at ui/vnc.c:1050
#9 0x7f32efd275c5 in vnc_client_read (opaque=0x7f32f0c762d0) at ui/vnc.c:1349
#10 0x7f32efcb397c in qemu_iohandler_poll (readfds=0x7f32f074d020, writefds=0x7f32f074d0a0, xfds=0x7f32f074d120, ret=1) at iohandler.c:124
#11 0x7f32efcb46e8 in main_loop_wait (nonblocking=0) at main-loop.c:417
#12 0x7f32efd31159 in main_loop () at vl.c:2133
#13 0x7f32efd38070 in main (argc=46, argv=0x7fff7f5df178, envp=0x7fff7f5df2f0) at vl.c:4481