Re: [Qemu-devel] hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time

2014-02-24 Thread Zhanghaoyu (A)
  I agree it's either COW breaking or (similarly) locking pages that 
  the guest hasn't touched yet.
 
  You can use prealloc or -rt mlock=on to avoid this problem.
 
  Paolo
 Or the new shared flag - IIRC shared VMAs don't do COW either.

Only if the problem isn't locking and zeroing of untouched pages (also, it is 
not upstream is it?).

Can you make a profile with perf?

With the -rt mlock=on option not set, the perf top -p <qemu pid> result is:
21699 root  20   0 24.2g  24g 5312 S  0 33.8   0:24.39 qemu-system-x8
   PerfTop:  95 irqs/sec  kernel:17.9% us: 1.1% guest kernel:47.4% guest 
us:32.6% exact:  0.0% [1000Hz cycles],  (target_pid: 15950)


 samples  pcnt function  DSO
 ___ _ _ ___

 2984.00 77.8% clear_page_c  [kernel]
  135.00  3.5% gup_huge_pmd  [kernel]
  134.00  3.5% pfn_to_dma_pte[kernel]
   83.00  2.2% __domain_mapping  [kernel]
   63.00  1.6% update_memslots   [kvm]
   59.00  1.5% prep_new_page [kernel]
   50.00  1.3% get_user_pages_fast   [kernel]
   45.00  1.2% up_read   [kernel]
   42.00  1.1% down_read [kernel]
   38.00  1.0% gup_pud_range [kernel]
   34.00  0.9% kvm_clear_async_pf_completion_queue   [kvm]
   18.00  0.5% intel_iommu_map   [kernel]
   16.00  0.4% _cond_resched [kernel]
   16.00  0.4% gfn_to_hva[kvm]
   15.00  0.4% kvm_set_apic_base [kvm]
   15.00  0.4% load_vmcs12_host_state[kvm_intel]
   14.00  0.4% clear_huge_page   [kernel]
    7.00  0.2% intel_iommu_iova_to_phys  [kernel]
    6.00  0.2% is_error_pfn  [kvm]
    6.00  0.2% iommu_map [kernel]
    6.00  0.2% native_write_msr_safe [kernel]
    5.00  0.1% find_vma  [kernel]

With the -rt mlock=on option set, the perf top -p <qemu pid> result is:
   PerfTop: 326 irqs/sec  kernel:17.5% us: 2.8% guest kernel:37.4% guest 
us:42.3% exact:  0.0% [1000Hz cycles],  (target_pid: 25845)


 samples  pcnt function  DSO
 ___ _ _ ___

  182.00 17.5% pfn_to_dma_pte[kernel]
  178.00 17.1% gup_huge_pmd  [kernel]
   91.00  8.8% __domain_mapping  [kernel]
   71.00  6.8% update_memslots   [kvm]
   65.00  6.3% gup_pud_range [kernel]
   62.00  6.0% get_user_pages_fast   [kernel]
   52.00  5.0% kvm_clear_async_pf_completion_queue   [kvm]
   50.00  4.8% down_read [kernel]
   37.00  3.6% up_read   [kernel]
   26.00  2.5% intel_iommu_map   [kernel]
   20.00  1.9% native_write_msr_safe [kernel]
   16.00  1.5% gfn_to_hva[kvm]
   14.00  1.3% load_vmcs12_host_state[kvm_intel]
    8.00  0.8% find_busiest_group[kernel]
    8.00  0.8% _raw_spin_lock[kernel]
    8.00  0.8% hrtimer_interrupt [kernel]
    8.00  0.8% intel_iommu_iova_to_phys  [kernel]
    7.00  0.7% iommu_map [kernel]
    6.00  0.6% kvm_mmu_pte_write [kvm]
    6.00  0.6% is_error_pfn  [kvm]
    5.00  0.5% kvm_set_apic_base [kvm]
    5.00  0.5% clear_page_c  [kernel]
    5.00  0.5% iommu_iova_to_phys[kernel]

With the -rt mlock=on option not set, many new pages have to be allocated and
cleared during iommu_map, and the clearing operation is expensive.
But no matter whether -rt mlock=on is set or not, the GPA->HPA DMAR page table
MUST be built, and that operation is also expensive: about 1-2 seconds for 25GB
of memory.
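
For reference, the two mitigations mentioned in this thread can be combined on the QEMU command line roughly as follows. This is an illustrative sketch only: the memory size and hugepage mount point are assumptions, and the thread's "-rt mlock=on" corresponds to "-realtime mlock=on" in upstream QEMU of that era.

# illustrative sketch, not a command line taken from this thread
qemu-system-x86_64 \
    -m 25600 \
    -mem-path /dev/hugepages \
    -mem-prealloc \
    -realtime mlock=on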

Thanks,
Zhang Haoyu
Paolo


Re: [Qemu-devel] hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time

2014-02-18 Thread Zhanghaoyu (A)
 Hi, all
 
 The VM gets stuck for a while (about 6s for a VM with 20GB of memory) when
 attaching a pass-through PCI card to a non-pass-through VM for the first time.
 The reason is that the host has to build the whole VT-d GPA->HPA DMAR
 page table, which takes a lot of time; during this time the qemu_global_mutex
 lock is held by the main thread, so when the vcpu thread's KVM_RUN ioctl
 returns it blocks waiting for the main thread to release the qemu_global_mutex
 lock, and the VM appears stuck.
 The race between the QEMU main thread and the vcpu thread is shown below:
 
   QEMU-main-thread                          vcpu-thread
         |                                        |
   qemu_mutex_lock_iothread            qemu_mutex_lock(qemu_global_mutex)
         |                                        |
   +---- loop ----------------------+  +---- loop ----------------------+
         |                                        |
     qemu_mutex_unlock_iothread          qemu_mutex_unlock_iothread
         |                                        |
     poll                                kvm_vcpu_ioctl(KVM_RUN)
         |                                        |
     qemu_mutex_lock_iothread                     |
         |                                        |
     kvm_device_pci_assign               qemu_mutex_lock_iothread
     (about 6 sec for 20GB memory)                |
         |                               blocked, waiting for the
         |                               main thread to release the
         |                               qemu_global_mutex
         |                                        |
   +--------------------------------+  +--------------------------------+
 
 
 Any advice?
 
 Thanks,
 Zhang Haoyu

What if you detach and re-attach?
Is it fast then?
Yes, because the VT-d GPA->HPA DMAR page table has already been built, there is
no need to rebuild it.

If yes this means the issue is COW breaking that occurs with get_user_pages, 
not translation as such.
Try hugepages with prealloc - does it help?
Yes, it helps a bit, but it cannot resolve the problem completely; the stall
still happens.

Thanks,
Zhang Haoyu



Re: [Qemu-devel] hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time

2014-02-18 Thread Zhanghaoyu (A)
 What if you detach and re-attach?
 Is it fast then?
 If yes this means the issue is COW breaking that occurs with 
 get_user_pages, not translation as such.
 Try hugepages with prealloc - does it help?

I agree it's either COW breaking or (similarly) locking pages that the guest 
hasn't touched yet.

You can use prealloc or -rt mlock=on to avoid this problem.

It gets better with -rt mlock=on, but that still cannot resolve the problem
completely.
VT-d and EPT do not share the GPA->HPA page table, so the VT-d GPA->HPA DMAR
page table still has to be built.
Although the -rt mlock=on option guarantees that all of the VM's memory has
been touched before the pass-through device is attached, which makes the build
faster, it still takes some time.
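
For completeness, the same prealloc/mlock request expressed through a libvirt domain definition looks roughly like the fragment below; this is a sketch only, element availability depends on the libvirt version and the memory size is an assumption.

<memory unit='GiB'>25</memory>
<memoryBacking>
  <hugepages/>
  <locked/>
</memoryBacking>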

Thanks,
Zhang Haoyu

Paolo



[Qemu-devel] hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time

2014-02-17 Thread Zhanghaoyu (A)
Hi, all

The VM gets stuck for a while (about 6s for a VM with 20GB of memory) when
attaching a pass-through PCI card to a non-pass-through VM for the first time.
The reason is that the host has to build the whole VT-d GPA->HPA DMAR page
table, which takes a lot of time; during this time the qemu_global_mutex lock
is held by the main thread, so when the vcpu thread's KVM_RUN ioctl returns it
blocks waiting for the main thread to release the qemu_global_mutex lock, and
the VM appears stuck.
The race between the QEMU main thread and the vcpu thread is shown below:

  QEMU-main-thread                          vcpu-thread
        |                                        |
  qemu_mutex_lock_iothread            qemu_mutex_lock(qemu_global_mutex)
        |                                        |
  +---- loop ----------------------+  +---- loop ----------------------+
        |                                        |
    qemu_mutex_unlock_iothread          qemu_mutex_unlock_iothread
        |                                        |
    poll                                kvm_vcpu_ioctl(KVM_RUN)
        |                                        |
    qemu_mutex_lock_iothread                     |
        |                                        |
    kvm_device_pci_assign               qemu_mutex_lock_iothread
    (about 6 sec for 20GB memory)                |
        |                               blocked, waiting for the
        |                               main thread to release the
        |                               qemu_global_mutex
        |                                        |
  +--------------------------------+  +--------------------------------+


Any advice?

Thanks,
Zhang Haoyu



Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table

2013-11-29 Thread Zhanghaoyu (A)
On Tue, Nov 26, 2013 at 06:14:27PM +0200, Gleb Natapov wrote:
 On Tue, Nov 26, 2013 at 06:05:37PM +0200, Michael S. Tsirkin wrote:
  On Tue, Nov 26, 2013 at 02:56:10PM +0200, Gleb Natapov wrote:
   On Tue, Nov 26, 2013 at 01:47:03PM +0100, Paolo Bonzini wrote:
On 26/11/2013 13:40, Zhanghaoyu (A) wrote:
 When guest set irq smp_affinity, VMEXIT occurs, then the vcpu 
 thread will IOCTL return to QEMU from hypervisor, then vcpu 
 thread ask the hypervisor to update the irq routing table, in 
 kvm_set_irq_routing, synchronize_rcu is called, current vcpu thread 
 is blocked for so much time to wait RCU grace period, and during 
 this period, this vcpu cannot provide service to VM, so those 
 interrupts delivered to this vcpu cannot be handled in time, and the 
 apps running on this vcpu cannot be serviced too.
 It's unacceptable in some real-time scenario, e.g. telecom. 
 
 So, I want to create a single workqueue for each VM, to 
 asynchronously performing the RCU synchronization for irq routing 
 table, and let the vcpu thread return and VMENTRY to service VM 
 immediately, no more need to blocked to wait RCU grace period.
 And, I have implemented a raw patch, took a test in our telecom 
 environment, above problem disappeared.

I don't think a workqueue is even needed.  You just need to use
call_rcu to free "old" after releasing kvm->irq_lock.

What do you think?

   It should be rate-limited somehow. Since it is guest-triggerable, a guest
   may cause the host to allocate a lot of memory this way.
  
  The checks in __call_rcu(), should handle this I think.  These keep 
  a per-CPU counter, which can be adjusted via rcutree.blimit, which 
  defaults to taking evasive action if more than 10K callbacks are 
  waiting on a given CPU.
  
  
 Documentation/RCU/checklist.txt has:
 
 An especially important property of the synchronize_rcu()
 primitive is that it automatically self-limits: if grace periods
 are delayed for whatever reason, then the synchronize_rcu()
 primitive will correspondingly delay updates.  In contrast,
 code using call_rcu() should explicitly limit update rate in
 cases where grace periods are delayed, as failing to do so can
 result in excessive realtime latencies or even OOM conditions.

I just asked Paul what this means.

My understanding is shown below:
The synchronous grace-period API synchronize_rcu() prevents the current thread
from generating a large number of subsequent RCU updates, which is the
self-limiting behaviour described above in Documentation/RCU/checklist.txt and
avoids memory exhaustion; the asynchronous API call_rcu() cannot limit the
update rate, so it needs explicit rate limiting.
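
As a rough illustration of the call_rcu() alternative being discussed (a sketch only; the struct layout and helper names below are assumptions, not the actual KVM code), the old table would carry an rcu_head and be freed from the RCU callback instead of blocking the vcpu thread in synchronize_rcu():

#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct kvm_irq_routing_table {
    /* ... routing entries ... */
    struct rcu_head rcu;   /* assumed: embedded so call_rcu() can be used */
};

static void free_irq_routing_table_rcu(struct rcu_head *head)
{
    struct kvm_irq_routing_table *old =
        container_of(head, struct kvm_irq_routing_table, rcu);

    kfree(old);            /* runs after a grace period, off the vcpu thread */
}

static void queue_old_table_for_free(struct kvm_irq_routing_table *old)
{
    /* called after the new table has been published with rcu_assign_pointer()
     * and kvm->irq_lock has been dropped */
    call_rcu(&old->rcu, free_irq_routing_table_rcu);
}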

Thanks,
Zhang Haoyu

 --
  Gleb.



Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table

2013-11-28 Thread Zhanghaoyu (A)
  No, this would be exactly the same code that is running now:

  mutex_lock(&kvm->irq_lock);
  old = kvm->irq_routing;
  kvm_irq_routing_update(kvm, new);
  mutex_unlock(&kvm->irq_lock);

  synchronize_rcu();
  kfree(old);
  return 0;

  Except that the kfree would run in the call_rcu kernel thread 
 instead of
  the vcpu thread.  But the vcpus already see the new routing table 
 after
  the rcu_assign_pointer that is in kvm_irq_routing_update.

 I understood the proposal was also to eliminate the 
 synchronize_rcu(), so while new interrupts would see the new 
 routing table, interrupts already in flight could pick up the old one.
 Isn't that always the case with RCU?  (See my answer above: the 
 vcpus already see the new routing table after the rcu_assign_pointer 
 that is in kvm_irq_routing_update).
 With synchronize_rcu(), you have the additional guarantee that any 
 parallel accesses to the old routing table have completed.  Since we 
 also trigger the irq from rcu context, you know that after
 synchronize_rcu() you won't get any interrupts to the old destination 
 (see kvm_set_irq_inatomic()).
 We do not have this guarantee for other vcpus that do not call
 synchronize_rcu(). They may still use an outdated routing table while the
 vcpu or iothread that performed the table update sits in synchronize_rcu().


Consider this guest code:

   write msi entry, directing the interrupt away from this vcpu
   nop
   memset(idt, 0, sizeof(idt));

Currently, this code will never trigger a triple fault.  With the change to 
call_rcu(), it may.

Now it may be that the guest does not expect this to work (PCI writes are 
posted; and interrupts can be delayed indefinitely by the pci fabric), 
but we don't know if there's a path that guarantees the guest something that 
we're taking away with this change.

In a native environment, if a CPU's LAPIC IRR and ISR have many interrupts
pending and the OS zeroes this CPU's IDT before servicing them, will the same
problem happen?

Zhang Haoyu



Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table

2013-11-27 Thread Zhanghaoyu (A)
  I don't think a workqueue is even needed.  You just need to use 
  call_rcu to free old after releasing kvm-irq_lock.
  
  What do you think?
 
 It should be rate-limited somehow. Since it is guest-triggerable, a guest may
 cause the host to allocate a lot of memory this way.

Why would using call_rcu to free "old" after releasing kvm->irq_lock cause the
host to allocate a lot of memory?
Do you mean that a malicious guest's frequent irq-routing-table updates would
result in too many delayed frees of old irq routing tables?

Thanks,
Zhang Haoyu

True, though if I understand Zhanghaoyu's proposal a workqueue would be even 
worse.


Paolo



Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table

2013-11-27 Thread Zhanghaoyu (A)
  I understood the proposal was also to eliminate the 
  synchronize_rcu(), so while new interrupts would see the new 
  routing table, interrupts already in flight could pick up the old one.
  Isn't that always the case with RCU?  (See my answer above: the 
  vcpus already see the new routing table after the 
  rcu_assign_pointer that is in kvm_irq_routing_update).
  
  With synchronize_rcu(), you have the additional guarantee that any 
  parallel accesses to the old routing table have completed.  Since 
  we also trigger the irq from rcu context, you know that after
  synchronize_rcu() you won't get any interrupts to the old 
  destination (see kvm_set_irq_inatomic()).
 We do not have this guarantee for other vcpus that do not call
 synchronize_rcu(). They may still use an outdated routing table while a
 vcpu or iothread that performed the table update sits in synchronize_rcu().

Avi's point is that, after the VCPU resumes execution, you know that no 
interrupt will be sent to the old destination because kvm_set_msi_inatomic 
(and ultimately kvm_irq_delivery_to_apic_fast) is also called within the RCU 
read-side critical section.
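
For context, the read side referred to here has roughly this shape (a sketch; rcu_read_lock/rcu_dereference are the real kernel primitives, the surrounding lookup is paraphrased):

rcu_read_lock();
irq_rt = rcu_dereference(kvm->irq_routing);
/* look up the MSI entry and deliver it, e.g. via kvm_set_msi_irq() and
 * kvm_irq_delivery_to_apic_fast(), all under the read-side lock */
rcu_read_unlock();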

Without synchronize_rcu you could have

VCPU writes to routing table
   e = entry from IRQ routing table
kvm_irq_routing_update(kvm, new);
VCPU resumes execution
   kvm_set_msi_irq(e, irq);
   kvm_irq_delivery_to_apic_fast();

where the entry is stale but the VCPU has already resumed execution.

If we use call_rcu() (setting aside for the moment the problem that Gleb
pointed out) instead of synchronize_rcu(), can we still ensure this?

Thanks,
Zhang Haoyu

If we want to ensure that, we need to use a different mechanism for
synchronization than the global RCU.  QRCU would work; readers are not
wait-free, but only if there is a concurrent synchronize_qrcu, which should be
rare.
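
To illustrate the idea of a non-global grace period with an API that is in mainline, here is a sketch using SRCU as a stand-in for QRCU; the srcu_struct name and its placement are assumptions:

#include <linux/srcu.h>
#include <linux/slab.h>

static struct srcu_struct irq_srcu;   /* init_srcu_struct(&irq_srcu) at setup */

static void reader_side_sketch(void)
{
    int idx = srcu_read_lock(&irq_srcu);
    /* ... dereference and use the routing table ... */
    srcu_read_unlock(&irq_srcu, idx);
}

static void updater_side_sketch(void *old_table)
{
    /* the new table has already been published */
    synchronize_srcu(&irq_srcu);      /* waits only for irq_srcu readers */
    kfree(old_table);
}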

Paolo



[Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table

2013-11-26 Thread Zhanghaoyu (A)
Hi all,

When a guest sets an irq's smp_affinity, a VMEXIT occurs and the vcpu thread's
ioctl returns to QEMU from the hypervisor; the vcpu thread then asks the
hypervisor to update the irq routing table. In kvm_set_irq_routing,
synchronize_rcu is called, so the current vcpu thread is blocked for a long
time waiting for the RCU grace period. During this period the vcpu cannot
service the VM, so interrupts delivered to this vcpu cannot be handled in time,
and the applications running on this vcpu are not serviced either.
This is unacceptable in some real-time scenarios, e.g. telecom.

So I want to create a single workqueue for each VM to perform the RCU
synchronization for the irq routing table asynchronously, and let the vcpu
thread return and VMENTER to service the VM immediately, with no need to block
waiting for the RCU grace period.
I have implemented a raw patch and tested it in our telecom environment; the
problem described above disappeared.

Any better ideas?

Thanks,
Zhang Haoyu




[Qemu-devel] question about VM kernel parameter idle=poll/mwait/halt/nomwait

2013-11-20 Thread Zhanghaoyu (A)
Hi, all

What is the difference between the Linux guest kernel parameters
idle=poll/mwait/halt/nomwait, especially with respect to performance?

Taking the performance into account, which one is best?

In my opinion, if the total number of the VMs' vcpus is far greater than the
number of pcpus, e.g. in a SPECvirt test, idle=halt is better for the server's
total throughput; otherwise, e.g. in some CT (telecom) scenarios where the
total number of vcpus is not greater than the number of pcpus, idle=poll is
better for the server's total throughput because of lower latency and fewer
VMEXITs.

On linux-3.9 and above, idle=mwait is not recommended.
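
For reference, the parameter is simply appended to the guest kernel command line; an illustrative GRUB (legacy) entry, in which the kernel image and root device are assumptions:

kernel /boot/vmlinuz-3.0.58 root=/dev/vda1 idle=halt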

Thanks,
Zhang Haoyu



[Qemu-devel] [patch] avoid a bogus COMPLETED-CANCELLED transition

2013-11-07 Thread Zhanghaoyu (A)
Avoid a bogus COMPLETED->CANCELLED transition.
There is a window between setting the COMPLETED state and the migration thread
exiting, during which a COMPLETED->CANCELLED transition would be problematic.

Signed-off-by: Zeng Junliang zengjunli...@huawei.com
Signed-off-by: Zhang Haoyu haoyu.zh...@huawei.com
---
 migration.c |9 -
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/migration.c b/migration.c
index 2b1ab20..fd73b97 100644
--- a/migration.c
+++ b/migration.c
@@ -326,9 +326,16 @@ void migrate_fd_error(MigrationState *s)
 
 static void migrate_fd_cancel(MigrationState *s)
 {
+    int old_state;
     DPRINTF("cancelling migration\n");
 
-    migrate_set_state(s, s->state, MIG_STATE_CANCELLED);
+    do {
+        old_state = s->state;
+        if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE) {
+            break;
+        }
+        migrate_set_state(s, old_state, MIG_STATE_CANCELLED);
+    } while (s->state != MIG_STATE_CANCELLED);
 }
 
 void add_migration_state_change_notifier(Notifier *notify)
-- 
1.7.3.1.msysgit.0




[Qemu-devel] [patch] introduce MIG_STATE_CANCELLING state

2013-11-07 Thread Zhanghaoyu (A)
Introduce a MIG_STATE_CANCELLING state to avoid starting a new migration task
while the previous one still exists.

Signed-off-by: Zeng Junliang zengjunli...@huawei.com
Signed-off-by: Zhang Haoyu haoyu.zh...@huawei.com
---
 migration.c |   26 --
 1 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/migration.c b/migration.c
index fd73b97..af8a09c 100644
--- a/migration.c
+++ b/migration.c
@@ -40,6 +40,7 @@ enum {
 MIG_STATE_ERROR = -1,
 MIG_STATE_NONE,
 MIG_STATE_SETUP,
+MIG_STATE_CANCELLING,
 MIG_STATE_CANCELLED,
 MIG_STATE_ACTIVE,
 MIG_STATE_COMPLETED,
@@ -196,6 +197,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 info-has_total_time = false;
 break;
 case MIG_STATE_ACTIVE:
+case MIG_STATE_CANCELLING:
 info-has_status = true;
 info-status = g_strdup(active);
 info-has_total_time = true;
@@ -282,6 +284,13 @@ void 
qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 
 /* shared migration helpers */
 
+static void migrate_set_state(MigrationState *s, int old_state, int new_state)
+{
+    if (atomic_cmpxchg(&s->state, old_state, new_state) == new_state) {
+        trace_migrate_set_state(new_state);
+    }
+}
+
 static void migrate_fd_cleanup(void *opaque)
 {
 MigrationState *s = opaque;
@@ -303,18 +312,14 @@ static void migrate_fd_cleanup(void *opaque)
 
     if (s->state != MIG_STATE_COMPLETED) {
         qemu_savevm_state_cancel();
+        if (s->state == MIG_STATE_CANCELLING) {
+            migrate_set_state(s, MIG_STATE_CANCELLING, MIG_STATE_CANCELLED);
+        }
     }
 
     notifier_list_notify(&migration_state_notifiers, s);
 }
 
-static void migrate_set_state(MigrationState *s, int old_state, int new_state)
-{
-    if (atomic_cmpxchg(&s->state, old_state, new_state) == new_state) {
-        trace_migrate_set_state(new_state);
-    }
-}
-
 void migrate_fd_error(MigrationState *s)
 {
     DPRINTF("setting error state\n");
@@ -334,8 +339,8 @@ static void migrate_fd_cancel(MigrationState *s)
         if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE) {
             break;
         }
-        migrate_set_state(s, old_state, MIG_STATE_CANCELLED);
-    } while (s->state != MIG_STATE_CANCELLED);
+        migrate_set_state(s, old_state, MIG_STATE_CANCELLING);
+    } while (s->state != MIG_STATE_CANCELLING);
 }
 
 void add_migration_state_change_notifier(Notifier *notify)
@@ -412,7 +417,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 params.blk = has_blk  blk;
 params.shared = has_inc  inc;
 
-    if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP) {
+    if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP ||
+        s->state == MIG_STATE_CANCELLING) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
         return;
     }
-- 
1.7.3.1.msysgit.0




Re: [Qemu-devel] [migration] questions about removing the old block-migration code

2013-11-07 Thread Zhanghaoyu (A)
 I read below words on the report of KVM Live Migration: Weather 
 forecast (May 29, 2013), We were going to remove the old 
 block-migration code Then people fixed it
 Good: it works now
 Bad: We have to maintain both
 It uses the same port than migration
 You need to migrate all/none of block devices
 
 The old block-migration code said above is that in block-migration.c?

Yes.

 What are the reasons of removing the old block-migration code? Buggy 
 implementation? Or need to migrate all/none of block devices?

Buggy and tightly coupled with the live migration code, making it hard to 
modify either area independently.

Thanks a lot for explaining.
Until now, we have still been using the old block-migration code in our
virtualization solution.
Could you detail the bugs that the old block-migration code has?

Thanks,
Zhang Haoyu


 What's the substitutional method? drive_mirror?

drive_mirror over NBD is an alternative.  There are security and integration 
challenges with those approaches but libvirt has added drive-mirror block 
migration support.
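
As a rough illustration of that direction, a drive-mirror job towards an NBD export on the destination can be issued over QMP as below; the device name, host, port and export name are assumptions, and libvirt normally drives this itself:

-> { "execute": "drive-mirror",
     "arguments": { "device": "drive-virtio-disk0",
                    "target": "nbd:dst-host:10809:exportname=drive-virtio-disk0",
                    "sync": "full",
                    "mode": "existing" } }
<- { "return": {} }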

Stefan



Re: [Qemu-devel] [PATCH] migration: avoid starting a new migration task while the previous one still exist

2013-11-05 Thread Zhanghaoyu (A)
 Avoid starting a new migration task while the previous one still
 exist.

 Can you explain how to reproduce the problem?

 When network disconnection between source and destination happened, 
 the migration thread stuck at below stack,
 Then I cancel the migration task, the migration state in qemu will be set to 
 MIG_STATE_CANCELLED, so the migration job in libvirt quits.
 Then I perform migration again, at this time, the network reconnected 
 successfully, since the TCP timeout retransmission, above stack will not 
 return immediately, so two migration tasks exist at the same time.
 And still worse, source qemu will crash, because of accessing the NULL 
 pointer in qemu_bh_schedule(s-cleanup_bh); statement in latter migration 
 task, since the s-cleanup_bh had been deleted by previous migration task.

Thanks for explaining.  CANCELLING looks like a useful addition.

Why do you need both CANCELLING and COMPLETING?  The COMPLETED state should be 
set only after all I/O is done.

There is a window between setting the COMPLETED state and the migration task
exiting, so a COMPLETED->CANCELLED transition is problematic in that window;
but with your proposal below, the problem is gone.
do {
    old_state = s->state;
    if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE) {
        break;
    }
    migrate_set_state(s, old_state, MIG_STATE_CANCELLED);
} while (s->state != MIG_STATE_CANCELLED);

I agree with Eric that the CANCELLING state should not be exposed via QMP.
"info migrate" and "query-migrate" can keep showing "active" for maximum
backwards compatibility.
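
For example, while cancelling, the QMP query could keep returning something like the following (the values are illustrative):

-> { "execute": "query-migrate" }
<- { "return": { "status": "active", "total-time": 12345 } }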

More comments below.


 -    if (s->state != MIG_STATE_COMPLETED) {
 +    if (s->state != MIG_STATE_COMPLETING) {
          qemu_savevm_state_cancel();
 +        if (s->state == MIG_STATE_CANCELLING) {
 +            migrate_set_state(s, MIG_STATE_CANCELLING, MIG_STATE_CANCELLED);
 +        }

I think you can remove the if and unconditionally call migrate_set_state.

Do you mean removing the "if (s->state == MIG_STATE_CANCELLING)" check?
s->state is probably MIG_STATE_ERROR here; is it okay to call
migrate_set_state unconditionally?

Thanks,
Zhang Haoyu


 +    } else {
 +        migrate_set_state(s, MIG_STATE_COMPLETING, MIG_STATE_COMPLETED);
      }
 
      notifier_list_notify(&migration_state_notifiers, s);
  }
  
 -static void migrate_set_state(MigrationState *s, int old_state, int new_state)
 -{
 -    if (atomic_cmpxchg(&s->state, old_state, new_state) == new_state) {
 -        trace_migrate_set_state(new_state);
 -    }
 -}
 -
  void migrate_fd_error(MigrationState *s)
  {
      DPRINTF("setting error state\n");
 @@ -328,7 +337,7 @@ static void migrate_fd_cancel(MigrationState *s)
  {
      DPRINTF("cancelling migration\n");
 
 -    migrate_set_state(s, s->state, MIG_STATE_CANCELLED);
 +    migrate_set_state(s, s->state, MIG_STATE_CANCELLING);

Here probably we want something like

do {
    old_state = s->state;
    if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE) {
        break;
    }
    migrate_set_state(s, old_state, MIG_STATE_CANCELLING);
} while (s->state != MIG_STATE_CANCELLING);

to avoid a bogus COMPLETED->CANCELLED transition.  Please split the patch
into two parts:

(1) the first uses the above code, with CANCELLED instead of CANCELLING

(2) the second, similar to the one you have posted, introduces the new 
CANCELLING state

Thanks,

Paolo



Re: [Qemu-devel] About the IO-mirroring functionality inside the qemu

2013-11-05 Thread Zhanghaoyu (A)
Hi all,

Does QEMU have a storage-migration tool like the IO mirroring inside VMware?
IO mirroring means that all IOs are sent to both the source and the
destination at the same time.

drive_mirror may be your choice.

Thanks,
Zhang Haoyu

Thanks!



[Qemu-devel] [PATCH] migration: avoid starting a new migration task while the previous one still exist

2013-11-04 Thread Zhanghaoyu (A)
Avoid starting a new migration task while the previous one still exists.

Signed-off-by: Zeng Junliang zengjunli...@huawei.com
---
 migration.c |   34 ++
 1 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/migration.c b/migration.c
index 2b1ab20..ab4c439 100644
--- a/migration.c
+++ b/migration.c
@@ -40,8 +40,10 @@ enum {
 MIG_STATE_ERROR = -1,
 MIG_STATE_NONE,
 MIG_STATE_SETUP,
+MIG_STATE_CANCELLING,
 MIG_STATE_CANCELLED,
 MIG_STATE_ACTIVE,
+MIG_STATE_COMPLETING,
 MIG_STATE_COMPLETED,
 };
 
@@ -196,6 +198,8 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 info-has_total_time = false;
 break;
 case MIG_STATE_ACTIVE:
+case MIG_STATE_CANCELLING:
+case MIG_STATE_COMPLETING:
 info-has_status = true;
 info-status = g_strdup(active);
 info-has_total_time = true;
@@ -282,6 +286,13 @@ void 
qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 
 /* shared migration helpers */
 
+static void migrate_set_state(MigrationState *s, int old_state, int new_state)
+{
+    if (atomic_cmpxchg(&s->state, old_state, new_state) == new_state) {
+        trace_migrate_set_state(new_state);
+    }
+}
+
 static void migrate_fd_cleanup(void *opaque)
 {
 MigrationState *s = opaque;
@@ -301,20 +312,18 @@ static void migrate_fd_cleanup(void *opaque)
 
     assert(s->state != MIG_STATE_ACTIVE);
 
-    if (s->state != MIG_STATE_COMPLETED) {
+    if (s->state != MIG_STATE_COMPLETING) {
         qemu_savevm_state_cancel();
+        if (s->state == MIG_STATE_CANCELLING) {
+            migrate_set_state(s, MIG_STATE_CANCELLING, MIG_STATE_CANCELLED);
+        }
+    } else {
+        migrate_set_state(s, MIG_STATE_COMPLETING, MIG_STATE_COMPLETED);
     }
 
     notifier_list_notify(&migration_state_notifiers, s);
 }
 
-static void migrate_set_state(MigrationState *s, int old_state, int new_state)
-{
-    if (atomic_cmpxchg(&s->state, old_state, new_state) == new_state) {
-        trace_migrate_set_state(new_state);
-    }
-}
-
 void migrate_fd_error(MigrationState *s)
 {
     DPRINTF("setting error state\n");
@@ -328,7 +337,7 @@ static void migrate_fd_cancel(MigrationState *s)
 {
     DPRINTF("cancelling migration\n");
 
-    migrate_set_state(s, s->state, MIG_STATE_CANCELLED);
+    migrate_set_state(s, s->state, MIG_STATE_CANCELLING);
 }
 
 void add_migration_state_change_notifier(Notifier *notify)
@@ -405,7 +414,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 params.blk = has_blk  blk;
 params.shared = has_inc  inc;
 
-    if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP) {
+    if (s->state == MIG_STATE_ACTIVE || s->state == MIG_STATE_SETUP ||
+        s->state == MIG_STATE_COMPLETING || s->state == MIG_STATE_CANCELLING) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
         return;
     }
@@ -594,7 +604,7 @@ static void *migration_thread(void *opaque)
             }
 
             if (!qemu_file_get_error(s->file)) {
-                migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_COMPLETED);
+                migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_COMPLETING);
                 break;
             }
         }
@@ -634,7 +644,7 @@ static void *migration_thread(void *opaque)
     }
 
     qemu_mutex_lock_iothread();
-    if (s->state == MIG_STATE_COMPLETED) {
+    if (s->state == MIG_STATE_COMPLETING) {
         int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
         s->total_time = end_time - s->total_time;
         s->downtime = end_time - start_time;
-- 
1.7.3.1.msysgit.0

BTW, when an error happens during migration, is an "erroring" state needed to
avoid starting a new migration task while the current one still exists?
And do the newly added migration states need to be reported to libvirt?



Re: [Qemu-devel] [PATCH] migration: avoid starting a new migration task while the previous one still exist

2013-11-04 Thread Zhanghaoyu (A)
  Avoid starting a new migration task while the previous one still
 exist.
 
 Can you explain how to reproduce the problem?
 
When a network disconnection between source and destination happened, the
migration thread got stuck at the stack below:
#0  0x7f07e96c8288 in writev () from /lib64/libc.so.6
#1  0x7f07eb9bf11d in unix_writev_buffer (opaque=0x7f07eca2de80, 
iov=0x7f07ede9b1e0, iovcnt=64,
pos=259870577) at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/savevm.c:354
#2  0x7f07eb9bf999 in qemu_fflush (f=0x7f07ede931b0)
at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/savevm.c:600
#3  0x7f07eb9c011f in add_to_iovec (f=0x7f07ede931b0, buf=0x7f000ee23000 
, size=4096)
at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/savevm.c:756
#4  0x7f07eb9c01c0 in qemu_put_buffer_async (f=0x7f07ede931b0, 
buf=0x7f000ee23000 , size=4096)
at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/savevm.c:772
#5  0x7f07eb92ad2f in ram_save_block (f=0x7f07ede931b0, last_stage=false)
at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/arch_init.c:493
#6  0x7f07eb92b30c in ram_save_iterate (f=0x7f07ede931b0, opaque=0x0)
at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/arch_init.c:654
#7  0x7f07eb9c2e12 in qemu_savevm_state_iterate (f=0x7f07ede931b0)
at /mnt/sdb2/zjl/bsod0x101_0821/qemu-kvm-1.5.1/savevm.c:1914
#8  0x7f07eb8975e1 in migration_thread (opaque=0x7f07ebf53300 
current_migration.25325)
at migration.c:578
Then I cancelled the migration task; the migration state in qemu was set to
MIG_STATE_CANCELLED, so the migration job in libvirt quit.
Then I performed migration again. At this time the network had reconnected
successfully, but because of TCP retransmission timeouts the stack above did
not return immediately, so two migration tasks existed at the same time.
And still worse, the source qemu crashed because it accessed a NULL pointer in
the qemu_bh_schedule(s->cleanup_bh); statement of the latter migration task,
since s->cleanup_bh had been deleted by the previous migration task.

 Also please use pbonz...@redhat.com instead.  My Gmail address is an
 implementation detail. :)
 
  Signed-off-by: Zeng Junliang zengjunli...@huawei.com
 
 It looks like the author of the patch is not the same as you.  If so,
 you need to make Zeng Junliang the author (using --author on the git
 commit command line) and add your own signoff line.
So sorry for my poor experience.

 
 Paolo
 

Avoid starting a new migration task while the previous one still exists.

Signed-off-by: Zeng Junliang zengjunli...@huawei.com
Signed-off-by: Zhang Haoyu haoyu.zh...@huawei.com
---
 migration.c |   34 ++
 1 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/migration.c b/migration.c
index 2b1ab20..ab4c439 100644
--- a/migration.c
+++ b/migration.c
@@ -40,8 +40,10 @@ enum {
 MIG_STATE_ERROR = -1,
 MIG_STATE_NONE,
 MIG_STATE_SETUP,
+MIG_STATE_CANCELLING,
 MIG_STATE_CANCELLED,
 MIG_STATE_ACTIVE,
+MIG_STATE_COMPLETING,
 MIG_STATE_COMPLETED,
 };
 
@@ -196,6 +198,8 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 info-has_total_time = false;
 break;
 case MIG_STATE_ACTIVE:
+case MIG_STATE_CANCELLING:
+case MIG_STATE_COMPLETING:
 info-has_status = true;
 info-status = g_strdup(active);
 info-has_total_time = true;
@@ -282,6 +286,13 @@ void 
qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 
 /* shared migration helpers */
 
+static void migrate_set_state(MigrationState *s, int old_state, int new_state)
+{
+    if (atomic_cmpxchg(&s->state, old_state, new_state) == new_state) {
+        trace_migrate_set_state(new_state);
+    }
+}
+
 static void migrate_fd_cleanup(void *opaque)
 {
 MigrationState *s = opaque;
@@ -301,20 +312,18 @@ static void migrate_fd_cleanup(void *opaque)
 
     assert(s->state != MIG_STATE_ACTIVE);
 
-    if (s->state != MIG_STATE_COMPLETED) {
+    if (s->state != MIG_STATE_COMPLETING) {
         qemu_savevm_state_cancel();
+        if (s->state == MIG_STATE_CANCELLING) {
+            migrate_set_state(s, MIG_STATE_CANCELLING, MIG_STATE_CANCELLED);
+        }
+    } else {
+        migrate_set_state(s, MIG_STATE_COMPLETING, MIG_STATE_COMPLETED);
     }
 
     notifier_list_notify(&migration_state_notifiers, s);
 }
 
-static void migrate_set_state(MigrationState *s, int old_state, int new_state)
-{
-    if (atomic_cmpxchg(&s->state, old_state, new_state) == new_state) {
-        trace_migrate_set_state(new_state);
-    }
-}
-
 void migrate_fd_error(MigrationState *s)
 {
     DPRINTF("setting error state\n");
@@ -328,7 +337,7 @@ static void migrate_fd_cancel(MigrationState *s)
 {
     DPRINTF("cancelling migration\n");
 
-    migrate_set_state(s, s->state, MIG_STATE_CANCELLED);
+    migrate_set_state(s, s->state, MIG_STATE_CANCELLING);
 }
 
 void add_migration_state_change_notifier(Notifier *notify)
@@ -405,7 +414,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 

[Qemu-devel] [migration] questions about removing the old block-migration code

2013-11-02 Thread Zhanghaoyu (A)
Hi, Juan

I read below words on the report of KVM Live Migration: Weather forecast (May 
29, 2013),
We were going to remove the old block-migration code
Then people fixed it
Good: it works now
Bad: We have to maintain both
It uses the same port than migration
You need to migrate all/none of block devices

The old block-migration code said above is that in block-migration.c?
What are the reasons of removing the old block-migration code? Buggy 
implementation? Or need to migrate all/none of block devices?
What's the substitutional method? drive_mirror?

Thanks,
Zhang Haoyu



Re: [Qemu-devel] [RESEND][PATCH] migration: drop MADVISE_DONT_NEED for incoming zero pages

2013-10-29 Thread Zhanghaoyu (A)
The comment of ram_handle_compressed needs to be changed accordingly:

"Do not memset pages to zero if they already read as zero, to avoid allocating
zero pages and consuming memory unnecessarily."

Thanks,
Zhang Haoyu

 The madvise for zeroed out pages was introduced when every transferred
 zero page was memset to zero and thus allocated. Since commit
 211ea740 we check for zeroness of a target page before we memset
 it to zero. Additionally we memmap target memory so it is essentially
 zero initialized (except for e.g. option roms and bios which are loaded
 into target memory although they shouldn't).
 
 It was reported recently that this madvise causes a performance
 degradation
 in some situations. As the madvise should only be called rarely and if
 it's called
 it is likely on a busy page (it was non-zero and changed to zero during
 migration)
 drop it completely.
 
 Reported-By: Zhang Haoyu haoyu.zh...@huawei.com
 Acked-by: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Peter Lieven p...@kamp.de
 ---
  arch_init.c |8 
  1 file changed, 8 deletions(-)
 
 diff --git a/arch_init.c b/arch_init.c
 index 7545d96..e0acbc5 100644
 --- a/arch_init.c
 +++ b/arch_init.c
 @@ -850,14 +850,6 @@ void ram_handle_compressed(void *host, uint8_t ch,
 uint64_t size)
  {
     if (ch != 0 || !is_zero_range(host, size)) {
         memset(host, ch, size);
 -#ifndef _WIN32
 -        if (ch == 0 && (!kvm_enabled() || kvm_has_sync_mmu())) {
 -            size = size & ~(getpagesize() - 1);
 -            if (size > 0) {
 -                qemu_madvise(host, size, QEMU_MADV_DONTNEED);
 -            }
 -        }
 -#endif
     }
 }
 
 --
 1.7.9.5
 




[Qemu-devel] migration: question about buggy implementation of traditional live migration with storage that migrating the storage in iteration way

2013-10-25 Thread Zhanghaoyu (A)
Hi, all

Could someone explain in detail why the traditional live migration with
storage, which migrates the storage iteratively, is considered a buggy
implementation?

Thanks,
Zhang Haoyu

 hi Michal,

 I used libvirt-1.0.3, ran below command to perform live migration, why no 
 progress shown up?
 virsh migrate --live --verbose --copy-storage-all domain
 qemu+tcp://dest ip/system

 If replacing libvirt-1.0.3 with libvirt-1.0.2, the migration 
 progress shown up, if performing migration without --copy-storage-all, 
 the migration progress shown up, too.

 Thanks,
 Zhang Haoyu


 Because since 1.0.3 we are using NBD to migrate storage. Truth is, 
 qemu is reporting progress of storage migration, however, there is no 
 generic formula to combine storage migration and internal state migration 
 into one number. With NBD the process is something like this:
 
 How to use NBD to migrate storage?
 Does NBD server in destination start automatically as soon as migration 
 initiated, or some other configurations needed?
 What's the advantages of using NBD to migrate storage over traditional 
 method that migrating the storage in iteration way, just like the way in 
 which migrating the memory?
 Sorry for my poor knowledge in NBD, by which I used to implement shared 
 storage for live migration without storage.

NBD is used whenever both the src and dst of the migration are new enough to use it. 
That is, libvirt >= 1.0.3 and qemu >= 1.0.3. The NBD is turned on by libvirt 
whenever the conditions are met. User has no control over this.
The advantage is: only specified disks can be transferred (currently not 
supported in libvirt), the previous implementation was buggy (according to 
some qemu developers), the storage is migrated via separate channel (a new 
connection) so it can be possible (in the future) to split migration of RAM + 
internal state and storage.

So frankly speaking, there's no real advantage for users now - besides not 
using buggy implementation.

Michal



Re: [Qemu-devel] why no progress shown after introduce NBD migration cookie

2013-10-22 Thread Zhanghaoyu (A)
Hi, all

Could someone explain in detail why the traditional storage-migration method,
which migrates the storage iteratively, is considered a buggy implementation?

Thanks,
Zhang Haoyu

 hi Michal,

 I used libvirt-1.0.3, ran below command to perform live migration, why no 
 progress shown up?
 virsh migrate --live --verbose --copy-storage-all domain
 qemu+tcp://dest ip/system

 If replacing libvirt-1.0.3 with libvirt-1.0.2, the migration 
 progress shown up, if performing migration without --copy-storage-all, 
 the migration progress shown up, too.

 Thanks,
 Zhang Haoyu


 Because since 1.0.3 we are using NBD to migrate storage. Truth is, 
 qemu is reporting progress of storage migration, however, there is no 
 generic formula to combine storage migration and internal state migration 
 into one number. With NBD the process is something like this:
 
 How to use NBD to migrate storage?
 Does NBD server in destination start automatically as soon as migration 
 initiated, or some other configurations needed?
 What's the advantages of using NBD to migrate storage over traditional 
 method that migrating the storage in iteration way, just like the way in 
 which migrating the memory?
 Sorry for my poor knowledge in NBD, by which I used to implement shared 
 storage for live migration without storage.

NBD is used whenever both the src and dst of the migration are new enough to use it. 
That is, libvirt >= 1.0.3 and qemu >= 1.0.3. The NBD is turned on by libvirt 
whenever the conditions are met. User has no control over this.
The advantage is: only specified disks can be transferred (currently not 
supported in libvirt), the previous implementation was buggy (according to 
some qemu developers), the storage is migrated via separate channel (a new 
connection) so it can be possible (in the future) to split migration of RAM + 
internal state and storage.

So frankly speaking, there's no real advantage for users now - besides not 
using buggy implementation.

Michal

BTW: It's better to ask these kind of info on the libvir-list next time, 
others might contribute with much more info as well (e.g. some qemu developers 
tend to watch the libvir-list too).



[Qemu-devel] [PATCH] rdma: fix multiple VMs parallel migration

2013-10-10 Thread Zhanghaoyu (A)
When several VMs migrate with RDMA at the same time, the increased pressure
causes packet loss probabilistically and makes the source and destination wait
for each other, so some VMs may be blocked during the migration.
Fix the bug by using two completion queues, for sending and receiving
respectively.

Signed-off-by: Frank Yang frank.yang...@gmail.com
---
 migration-rdma.c | 58 +---
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/migration-rdma.c b/migration-rdma.c
index f94f3b4..33e8a92 100644
--- a/migration-rdma.c
+++ b/migration-rdma.c
@@ -363,7 +363,8 @@ typedef struct RDMAContext {
 struct ibv_qp *qp;  /* queue pair */
 struct ibv_comp_channel *comp_channel;  /* completion channel */
 struct ibv_pd *pd;  /* protection domain */
-struct ibv_cq *cq;  /* completion queue */
+struct ibv_cq *send_cq; /* completion queue */
+struct ibv_cq *recv_cq; /* receive completion queue */
 
 /*
  * If a previous write failed (perhaps because of a failed
@@ -1008,13 +1009,15 @@ static int qemu_rdma_alloc_pd_cq(RDMAContext *rdma)
 }
 
 /*
- * Completion queue can be filled by both read and write work requests,
- * so must reflect the sum of both possible queue sizes.
+ * Send completion queue is filled by both send and write work requests,
+ * Receive completion queue is filled by receive work requesets.
  */
-    rdma->cq = ibv_create_cq(rdma->verbs, (RDMA_SIGNALED_SEND_MAX * 3),
+    rdma->send_cq = ibv_create_cq(rdma->verbs, (RDMA_SIGNALED_SEND_MAX * 2),
                               NULL, rdma->comp_channel, 0);
-    if (!rdma->cq) {
-        fprintf(stderr, "failed to allocate completion queue\n");
+    rdma->recv_cq = ibv_create_cq(rdma->verbs, RDMA_SIGNALED_SEND_MAX, NULL,
+                                  rdma->comp_channel, 0);
+    if (!rdma->send_cq || !rdma->recv_cq) {
+        fprintf(stderr, "failed to allocate completion queues\n");
         goto err_alloc_pd_cq;
     }
 
@@ -1045,8 +1048,8 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma)
 attr.cap.max_recv_wr = 3;
 attr.cap.max_send_sge = 1;
 attr.cap.max_recv_sge = 1;
-    attr.send_cq = rdma->cq;
-    attr.recv_cq = rdma->cq;
+    attr.send_cq = rdma->send_cq;
+    attr.recv_cq = rdma->recv_cq;
     attr.qp_type = IBV_QPT_RC;
 
     ret = rdma_create_qp(rdma->cm_id, rdma->pd, &attr);
@@ -1366,13 +1369,18 @@ static void qemu_rdma_signal_unregister(RDMAContext 
*rdma, uint64_t index,
  * Return the work request ID that completed.
  */
 static uint64_t qemu_rdma_poll(RDMAContext *rdma, uint64_t *wr_id_out,
-   uint32_t *byte_len)
+   uint32_t *byte_len, int wrid_requested)
 {
 int ret;
 struct ibv_wc wc;
 uint64_t wr_id;
 
-    ret = ibv_poll_cq(rdma->cq, 1, &wc);
+    if (wrid_requested == RDMA_WRID_RDMA_WRITE ||
+        wrid_requested == RDMA_WRID_SEND_CONTROL) {
+        ret = ibv_poll_cq(rdma->send_cq, 1, &wc);
+    } else if (wrid_requested >= RDMA_WRID_RECV_CONTROL) {
+        ret = ibv_poll_cq(rdma->recv_cq, 1, &wc);
+    }
 
 if (!ret) {
 *wr_id_out = RDMA_WRID_NONE;
@@ -1465,12 +1473,9 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma, 
int wrid_requested,
 void *cq_ctx;
 uint64_t wr_id = RDMA_WRID_NONE, wr_id_in;
 
-    if (ibv_req_notify_cq(rdma->cq, 0)) {
-        return -1;
-    }
     /* poll cq first */
     while (wr_id != wrid_requested) {
-        ret = qemu_rdma_poll(rdma, &wr_id_in, &byte_len);
+        ret = qemu_rdma_poll(rdma, &wr_id_in, &byte_len, wrid_requested);
         if (ret < 0) {
             return ret;
         }
@@ -1492,6 +1497,17 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma, 
int wrid_requested,
 }
 
 while (1) {
+        if (wrid_requested == RDMA_WRID_RDMA_WRITE ||
+            wrid_requested == RDMA_WRID_SEND_CONTROL) {
+            if (ibv_req_notify_cq(rdma->send_cq, 0)) {
+                return -1;
+            }
+        } else if (wrid_requested >= RDMA_WRID_RECV_CONTROL) {
+            if (ibv_req_notify_cq(rdma->recv_cq, 0)) {
+                return -1;
+            }
+        }
+
 /*
  * Coroutine doesn't start until process_incoming_migration()
  * so don't yield unless we know we're running inside of a coroutine.
@@ -1512,7 +1528,7 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma, 
int wrid_requested,
 }
 
         while (wr_id != wrid_requested) {
-            ret = qemu_rdma_poll(rdma, &wr_id_in, &byte_len);
+            ret = qemu_rdma_poll(rdma, &wr_id_in, &byte_len, wrid_requested);
             if (ret < 0) {
                 goto err_block_for_wrid;
             }
@@ -2241,9 +2257,13 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
         rdma_destroy_qp(rdma->cm_id);
         rdma->qp = NULL;
     }
-    if (rdma->cq) {
-        ibv_destroy_cq(rdma->cq);
-        rdma->cq = NULL;
+  

Re: [Qemu-devel] [RFC] sync NIC's MAC maintained in NICConf as soon as emualted NIC's MAC changed in guest

2013-09-25 Thread Zhanghaoyu (A)
 Hi, all
 
 Do live migration if emulated NIC's MAC has been changed, RARP with 
 wrong MAC address will broadcast via qemu_announce_self in destination, so, 
 long time network disconnection probably happen.

Good catch.

 I want to do below works to resolve this problem, 1. change NICConf's 
 MAC as soon as emulated NIC's MAC changed in guest

This will make it impossible to revert it correctly on reset, won't it?

You are right.
virsh reboot <domain>, virsh reset <domain>, or rebooting the VM from within
the guest will revert the emulated NIC's MAC to the original one maintained in
NICConf.
During the reboot/reset flow in qemu, the emulated NIC's reset handler syncs
the MAC address in NICConf into the emulated NIC's own structure; e.g.,
virtio_net_reset copies the MAC address in NICConf to VirtIONet's mac.

BTW, in the native scenario, a reboot also reverts a changed MAC to the
original one.

 2. sync NIC's (more precisely, queue) MAC to corresponding NICConf in 
 NIC's migration load handler
 
 Any better ideas?
 
 Thanks,
 Zhang Haoyu

I think announce needs to poke at the current MAC instead of the default one 
in NICConf.
We can make it respect link down state while we are at it.

NICConf structures are embedded in the different emulated NICs' structures,
e.g., VirtIONet, E1000State_st, RTL8139State, etc. Since there are so many
kinds of emulated NICs, each described by a different structure, how do we
find every NIC's current MAC?

Maybe we can introduce a pointer member 'current_mac' in the NICConf
structure which points to the current MAC; then we can find all current MACs
via NICConf.current_mac.

Alternatively, can we broadcast the RARP with the current MAC in each NIC's
migration load handler?

Thanks,
Zhang Haoyu

Happily recent linux guests aren't affected since they do announcements from 
guest.

--
MST



Re: [Qemu-devel] [RFC] sync NIC's MAC maintained in NICConf as soon as emualted NIC's MAC changed in guest

2013-09-25 Thread Zhanghaoyu (A)
  Hi, all
  
  Do live migration if emulated NIC's MAC has been changed, RARP with 
  wrong MAC address will broadcast via qemu_announce_self in destination, 
  so, long time network disconnection probably happen.
 
 Good catch.
 
  I want to do below works to resolve this problem, 1. change 
  NICConf's MAC as soon as emulated NIC's MAC changed in guest
 
 This will make it impossible to revert it correctly on reset, won't it?
 
 You are right.
 virsh reboot domain, or virsh reset domain, or reboot VM from guest, 
 will revert emulated NIC's MAC to original one maintained in NICConf.
 During the reboot/reset flow in qemu, emulated NIC's reset handler 
 will sync the MAC address in NICConf to the MAC address in emulated NIC 
 structure, e.g., virtio_net_reset sync the MAC address in NICConf to 
 VirtIONet'mac.
 
 BTW, in native scenario, reboot will revert the changed MAC to original one, 
 too.
 
  2. sync NIC's (more precisely, queue) MAC to corresponding NICConf 
  in NIC's migration load handler
  
  Any better ideas?
  
  Thanks,
  Zhang Haoyu
 
 I think announce needs to poke at the current MAC instead of the default 
 one in NICConf.
 We can make it respect link down state while we are at it.
 
 NICConf structures are incorporated in different emulated NIC's 
 structure, e.g., VirtIONet, E1000State_st, RTL8139State, etc., since so many 
 kinds of emulated NICs, they are described by different structures, how to 
 find all NICs' current MAC?
 
 Maybe we can introduce a pointer member 'current_mac' to NICConf 
 structure, which points to the current MAC, then we can find all current 
 MACs from NICConf.current_mac.

I wouldn't make it a pointer, just a buffer with the mac, copy it there.
Maybe call it softmac that's what it is really.

 Can we broadcast the RARP with current MAC in NIC's migration load handler 
 respectively?
 
 Thanks,
 Zhang Haoyu

It's not so simple, you need to retry several times.

Could you elaborate on 'retry several times'?
Do you mean the process of retrying the RARP send several times, as in
qemu_announce_self_once?

'Broadcast the RARP with the current MAC in each NIC's migration load handler'
means distributing the job that qemu_announce_self does to every NIC's
migration load handler; e.g., in the virtio NIC's migration load handler
virtio_net_load, we can create a timer that retries sending the RARP with the
current MAC for this NIC several times, just as qemu_announce_self does.

 Happily recent linux guests aren't affected since they do announcements 
 from guest.
 
 --
 MST



Re: [Qemu-devel] [RFC] sync NIC's MAC maintained in NICConf as soon as emualted NIC's MAC changed in guest

2013-09-25 Thread Zhanghaoyu (A)
   Hi, all
   
   Do live migration if emulated NIC's MAC has been changed, RARP 
   with wrong MAC address will broadcast via qemu_announce_self in 
   destination, so, long time network disconnection probably happen.
  
  Good catch.
  
   I want to do below works to resolve this problem, 1. change 
   NICConf's MAC as soon as emulated NIC's MAC changed in guest
  
  This will make it impossible to revert it correctly on reset, won't it?
  
  You are right.
  virsh reboot domain, or virsh reset domain, or reboot VM from guest, 
  will revert emulated NIC's MAC to original one maintained in NICConf.
  During the reboot/reset flow in qemu, emulated NIC's reset handler 
  will sync the MAC address in NICConf to the MAC address in emulated NIC 
  structure, e.g., virtio_net_reset sync the MAC address in NICConf to 
  VirtIONet'mac.
  
  BTW, in native scenario, reboot will revert the changed MAC to original 
  one, too.
  
   2. sync NIC's (more precisely, queue) MAC to corresponding 
   NICConf in NIC's migration load handler
   
   Any better ideas?
   
   Thanks,
   Zhang Haoyu
  
  I think announce needs to poke at the current MAC instead of the default 
  one in NICConf.
  We can make it respect link down state while we are at it.
  
  NICConf structures are incorporated in different emulated NIC's 
  structure, e.g., VirtIONet, E1000State_st, RTL8139State, etc., since so 
  many kinds of emulated NICs, they are described by different structures, 
  how to find all NICs' current MAC?
  
  Maybe we can introduce a pointer member 'current_mac' to NICConf 
  structure, which points to the current MAC, then we can find all current 
  MACs from NICConf.current_mac.
 
 I wouldn't make it a pointer, just a buffer with the mac, copy it there.
 Maybe call it softmac that's what it is really.
 
  Can we broadcast the RARP with current MAC in NIC's migration load 
  handler respectively?
  
  Thanks,
  Zhang Haoyu
 
 It's not so simple, you need to retry several times.
 
 Could you make a statement for 'retry several times' ?
 Is it the process of retrying several times to sending RARP in 
 qemu_announce_self_once?

yes

 'broadcast the RARP with current MAC in NIC's migration load handler 
 respectively' is distributing the job of what qemu_announce_self does to 
 every NIC's migration load handler, e.g., in virtio NIC's migration load 
 handler virtio_net_load, we can create a timer to retry several times to 
 send ARAP with current MAC for this NIC, just as same as qemu_announce_self 
 does.

I don't see a lot of value in this yet.

In my opinion, it's not so good to introduce a 'softmac' member in NICConf,
since that is not an essential function of NICConf.
And distributing the job that qemu_announce_self does to every NIC's migration
load handler has no disadvantages compared with qemu_announce_self; it may
even update the forwarding tables of switches/bridges more promptly.

  Happily recent linux guests aren't affected since they do announcements 
  from guest.
  
  --
  MST



Re: [Qemu-devel] [RFC] sync NIC's MAC maintained in NICConf as soon as emualted NIC's MAC changed in guest

2013-09-25 Thread Zhanghaoyu (A)
 Hi, all
 
 Do live migration if emulated NIC's MAC has been changed, 
 RARP with wrong MAC address will broadcast via 
 qemu_announce_self in destination, so, long time network 
 disconnection probably happen.

Good catch.

 I want to do below works to resolve this problem, 1. 
 change NICConf's MAC as soon as emulated NIC's MAC changed 
 in guest

This will make it impossible to revert it correctly on reset, 
won't it?

You are right.
virsh reboot domain, or virsh reset domain, or reboot VM 
from guest, will revert emulated NIC's MAC to original one 
maintained in NICConf.
During the reboot/reset flow in qemu, emulated NIC's reset 
handler will sync the MAC address in NICConf to the MAC 
address in emulated NIC structure, e.g., virtio_net_reset 
sync the MAC address in NICConf to VirtIONet'mac.

BTW, in native scenario, reboot will revert the changed MAC 
to original one, too.

 2. sync NIC's (more precisely, queue) MAC to corresponding 
 NICConf in NIC's migration load handler
 
 Any better ideas?
 
 Thanks,
 Zhang Haoyu

I think announce needs to poke at the current MAC instead of  
the default one in NICConf.
We can make it respect link down state while we are at it.

NICConf structures are incorporated in different emulated 
NIC's structure, e.g., VirtIONet, E1000State_st, 
RTL8139State, etc., since so many kinds of emulated NICs, 
they are described by different structures, how to find all NICs' 
current MAC?

Maybe we can introduce a pointer member 'current_mac' to 
NICConf structure, which points to the current MAC, then we 
can find all current MACs from NICConf.current_mac.
   
   I wouldn't make it a pointer, just a buffer with the mac, copy it 
   there.
   Maybe call it softmac that's what it is really.
   
Can we broadcast the RARP with current MAC in NIC's migration 
load handler respectively?

Thanks,
Zhang Haoyu
   
   It's not so simple, you need to retry several times.
   
   Could you make a statement for 'retry several times' ?
   Is it the process of retrying several times to sending RARP in 
   qemu_announce_self_once?
  
  yes
  
   'broadcast the RARP with current MAC in NIC's migration load 
   handler respectively' is distributing the job of what 
   qemu_announce_self does to every NIC's migration load handler, 
   e.g., in virtio NIC's migration load handler virtio_net_load, we 
   can create a timer to retry several times to send ARAP with 
   current MAC for this NIC, just as same as qemu_announce_self does.
  
  I don't see a lot of value in this yet.
  
  In my opinion, it's not so good to introduce a 'softmac' member to 
  NICConf, which is not essential function of NICConf.
 
  Maybe not essential but 100% of hardware we emulate supports softmacs.
 
 Yes, but NICConf is about NIC *configuration*, not random common NIC 
 state.
 
 We can capture common NIC state in a separate, properly named data type.
 
 If we want to bunch it together with common configuration in NICConf 
 instead, then better rename NICConf to something that actually 
 reflects its changed purpose.  I doubt this would be a good idea.

I agree, it should go into NetClientState, not NICConf.
My main point is it's a common thing, let's not duplicate code.

Yes, putting it into NetClientState is better.
But we would need to add code that updates NetClientState.softmac to all
devices, right?

  And, distributing the job of what qemu_announce_self does to every 
  NIC's migration load handler has no disadvantages over 
  qemu_announce_self,
 
  I see some disadvantages, yes.
  You are going to add code to all devices instead of doing it in one 
  place, there better be a good reason for this.
Compared with qemu_announce_self, there are indeed no advantages; on the
contrary, there are disadvantages, just as you said. But compared with
introducing 'softmac' or something similar into the NIC-related structures,
it does not need to add any data to those structures, and introducing
'softmac' would also require adding code that updates 'softmac' to all
devices, right?
And I don't think it's a good idea to store identical data in two buffers
whose consistency then has to be guaranteed.

Thanks,
Zhang Haoyu

 
 Keeping code common to many (most?) NICs factored out makes sense.
 We've started doing that for block devices, in hw/block/block.c.  So 
 far, the only code there is about configuration, thus we work with 
 BlockConf.
 
 [...]



Re: [Qemu-devel] [RFC] sync NIC's MAC maintained in NICConf as soon as emulated NIC's MAC changed in guest

2013-09-25 Thread Zhanghaoyu (A)
 Hi, all

 If live migration is done after the emulated NIC's MAC has been changed, a RARP 
 with the wrong MAC address will be broadcast via qemu_announce_self on the 
 destination, so a long network disconnection can happen.

 I want to do below works to resolve this problem, 1. change NICConf's 
 MAC as soon as emulated NIC's MAC changed in guest 2. sync NIC's (more 
 precisely, queue) MAC to corresponding NICConf in NIC's migration load 
 handler

 Any better ideas?

As Michael points out, the only possible solution is to do it inside the 
guest instead of in qemu (using a pv device). You can have a look at my RFCs 
in http://lists.nongnu.org/archive/html/qemu-devel/2013-03/msg01127.html
which let the virtio driver send the gARP. Xen and Hyper-V do the same thing.

How about other emulated NICs, like rtl8139, etc.?

The point is that qemu does not know how the MACs are used. So your method only 
solves the issue partially because:

- A card can have several MACs; see virtio_net's and e1000's MAC tables, which 
can also overflow.
- VLANs could be in use, so we would need to send tagged gARPs instead of untagged ones.

Does the emulated NIC in qemu have knowledge of all of its MACs?
We can provide an interface nic_announce_self(NetClientState *nc, uint8_t 
*mac_addr) which tries several times to send a RARP, just as 
qemu_announce_self does; every emulated NIC's migration load handler can then 
call nic_announce_self to announce itself for all of its MACs. A rough sketch of 
the frame such a helper would broadcast is given below.
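To make the proposal concrete, here is a self-contained sketch of the RARP announce frame such a nic_announce_self() helper would broadcast for one MAC, modelled on the frame qemu_announce_self sends today; how the frame is actually handed to the NetClientState and retried from a timer is left out here and is an assumption about the eventual wiring.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Build a minimum-size Ethernet/RARP "reverse request" announcing 'mac'. */
static int build_rarp(uint8_t *buf, size_t len, const uint8_t *mac)
{
    if (len < 60)
        return -1;
    memset(buf, 0, 60);                 /* pad to minimum Ethernet size */
    memset(buf, 0xff, 6);               /* dst: broadcast               */
    memcpy(buf + 6, mac, 6);            /* src: the MAC being announced */
    buf[12] = 0x80; buf[13] = 0x35;     /* ethertype: RARP              */
    buf[14] = 0x00; buf[15] = 0x01;     /* hw type: Ethernet            */
    buf[16] = 0x08; buf[17] = 0x00;     /* proto: IPv4                  */
    buf[18] = 6;    buf[19] = 4;        /* hw len / proto len           */
    buf[20] = 0x00; buf[21] = 0x03;     /* opcode: reverse request      */
    memcpy(buf + 22, mac, 6);           /* sender hw addr               */
    memcpy(buf + 32, mac, 6);           /* target hw addr               */
    return 60;
}

int main(void)
{
    uint8_t frame[60];
    const uint8_t mac[6] = {0x02, 0x00, 0x00, 0xaa, 0xbb, 0xcc};
    int n = build_rarp(frame, sizeof(frame), mac);
    printf("built %d-byte RARP announce frame\n", n);
    return 0;
}

Calling such a helper once per MAC (including the entries of a device's MAC table) from each migration load handler would address the multi-MAC concern above, though the VLAN-tagging point would still need separate handling.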



 Thanks,
 Zhang Haoyu



[Qemu-devel] [RFC] sync NIC's MAC maintained in NICConf as soon as emulated NIC's MAC changed in guest

2013-09-22 Thread Zhanghaoyu (A)
Hi, all

If live migration is done after the emulated NIC's MAC has been changed, a RARP with 
the wrong MAC address will be broadcast via qemu_announce_self on the destination,
so a long network disconnection can happen.

I want to do below works to resolve this problem,
1. change NICConf's MAC as soon as emulated NIC's MAC changed in guest
2. sync NIC's (more precisely, queue) MAC to corresponding NICConf in NIC's 
migration load handler

Any better ideas?

Thanks,
Zhang Haoyu



[Qemu-devel] [KVM] segmentation fault happened when reboot VM after hot-unplug virtio NIC

2013-09-03 Thread Zhanghaoyu (A)
Hi, all

A segmentation fault happens when rebooting the VM after hot-unplugging a virtio NIC, 
and it can be reproduced 100% of the time.
See the similar bug report at https://bugzilla.redhat.com/show_bug.cgi?id=988256

test environment:
host: SLES11SP2 (kernel version: 3.0.58)
qemu: 1.5.1, upstream-qemu (commit 545825d4cda03ea292b7788b3401b99860efe8bc)
libvirt: 1.1.0
guest os: win2k8 R2 x64bit or sles11sp2 x64 or win2k3 32bit

You can reproduce this problem by following steps:
1. start a VM with virtio NIC(s)
2. hot-unplug a virtio NIC from the VM
3. reboot the VM; a segmentation fault then happens during the boot period

the qemu backtrace shown as below:
#0  0x7ff4be3288d0 in __memcmp_sse4_1 () from /lib64/libc.so.6
#1  0x7ff4c07f82c0 in patch_hypercalls (s=0x7ff4c15dd610) at 
/mnt/zhanghaoyu/qemu/qemu-1.5.1/hw/i386/kvmvapic.c:549
#2  0x7ff4c07f84f0 in vapic_prepare (s=0x7ff4c15dd610) at 
/mnt/zhanghaoyu/qemu/qemu-1.5.1/hw/i386/kvmvapic.c:614
#3  0x7ff4c07f85e7 in vapic_write (opaque=0x7ff4c15dd610, addr=0, data=32, 
size=2)
at /mnt/zhanghaoyu/qemu/qemu-1.5.1/hw/i386/kvmvapic.c:651
#4  0x7ff4c082a917 in memory_region_write_accessor (opaque=0x7ff4c15df938, 
addr=0, value=0x7ff4bbfe3d00, size=2, 
shift=0, mask=65535) at /mnt/zhanghaoyu/qemu/qemu-1.5.1/memory.c:334
#5  0x7ff4c082a9ee in access_with_adjusted_size (addr=0, 
value=0x7ff4bbfe3d00, size=2, access_size_min=1, 
access_size_max=4, access=0x7ff4c082a89a memory_region_write_accessor, 
opaque=0x7ff4c15df938)
at /mnt/zhanghaoyu/qemu/qemu-1.5.1/memory.c:364
#6  0x7ff4c082ae49 in memory_region_iorange_write (iorange=0x7ff4c15dfca0, 
offset=0, width=2, data=32)
at /mnt/zhanghaoyu/qemu/qemu-1.5.1/memory.c:439
#7  0x7ff4c08236f7 in ioport_writew_thunk (opaque=0x7ff4c15dfca0, addr=126, 
data=32)
at /mnt/zhanghaoyu/qemu/qemu-1.5.1/ioport.c:219
#8  0x7ff4c0823078 in ioport_write (index=1, address=126, data=32) at 
/mnt/zhanghaoyu/qemu/qemu-1.5.1/ioport.c:83
#9  0x7ff4c0823ca9 in cpu_outw (addr=126, val=32) at 
/mnt/zhanghaoyu/qemu/qemu-1.5.1/ioport.c:296
#10 0x7ff4c0827485 in kvm_handle_io (port=126, data=0x7ff4c051, 
direction=1, size=2, count=1)
at /mnt/zhanghaoyu/qemu/qemu-1.5.1/kvm-all.c:1485
#11 0x7ff4c0827e14 in kvm_cpu_exec (env=0x7ff4c15bf270) at 
/mnt/zhanghaoyu/qemu/qemu-1.5.1/kvm-all.c:1634
#12 0x7ff4c07b6f27 in qemu_kvm_cpu_thread_fn (arg=0x7ff4c15bf270) at 
/mnt/zhanghaoyu/qemu/qemu-1.5.1/cpus.c:759
#13 0x7ff4be58af05 in start_thread () from /lib64/libpthread.so.0
#14 0x7ff4be2cd53d in clone () from /lib64/libc.so.6

If I apply the patch below to the upstream qemu, this problem disappears:
---
 hw/i386/kvmvapic.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
index 15beb80..6fff299 100644
--- a/hw/i386/kvmvapic.c
+++ b/hw/i386/kvmvapic.c
@@ -652,11 +652,11 @@ static void vapic_write(void *opaque, hwaddr addr, uint64_t data,
     switch (size) {
     case 2:
         if (s->state == VAPIC_INACTIVE) {
-            rom_paddr = (env->segs[R_CS].base + env->eip) & ROM_BLOCK_MASK;
-            s->rom_state_paddr = rom_paddr + data;
-
             s->state = VAPIC_STANDBY;
         }
+        rom_paddr = (env->segs[R_CS].base + env->eip) & ROM_BLOCK_MASK;
+        s->rom_state_paddr = rom_paddr + data;
+
         if (vapic_prepare(s) < 0) {
             s->state = VAPIC_INACTIVE;
             break;
--
1.8.1.4

Thanks,
Daniel






Re: [Qemu-devel] [KVM] segmentation fault happened when reboot VM after hot-unplug virtio NIC

2013-09-03 Thread Zhanghaoyu (A)
 Hi, all
 
 Segmentation fault happened when reboot VM after hot-unplug virtio NIC, 
 which can be reproduced 100%.
 See similar bug report to 
 https://bugzilla.redhat.com/show_bug.cgi?id=988256
 
 test environment:
 host: SLES11SP2 (kernel version: 3.0.58)
 qemu: 1.5.1, upstream-qemu (commit 
 545825d4cda03ea292b7788b3401b99860efe8bc)
 libvirt: 1.1.0
 guest os: win2k8 R2 x64bit or sles11sp2 x64 or win2k3 32bit
 
 You can reproduce this problem by following steps:
 1. start a VM with virtio NIC(s)
 2. hot-unplug a virtio NIC from the VM
 3. reboot the VM; a segmentation fault then happens during the boot period
 
 the qemu backtrace shown as below:
 #0  0x7ff4be3288d0 in __memcmp_sse4_1 () from /lib64/libc.so.6
 #1  0x7ff4c07f82c0 in patch_hypercalls (s=0x7ff4c15dd610) at 
 /mnt/zhanghaoyu/qemu/qemu-1.5.1/hw/i386/kvmvapic.c:549
 #2  0x7ff4c07f84f0 in vapic_prepare (s=0x7ff4c15dd610) at 
 /mnt/zhanghaoyu/qemu/qemu-1.5.1/hw/i386/kvmvapic.c:614
 #3  0x7ff4c07f85e7 in vapic_write (opaque=0x7ff4c15dd610, addr=0, 
 data=32, size=2)
 at /mnt/zhanghaoyu/qemu/qemu-1.5.1/hw/i386/kvmvapic.c:651
 #4  0x7ff4c082a917 in memory_region_write_accessor 
 (opaque=0x7ff4c15df938, addr=0, value=0x7ff4bbfe3d00, size=2, 
 shift=0, mask=65535) at 
 /mnt/zhanghaoyu/qemu/qemu-1.5.1/memory.c:334
 #5  0x7ff4c082a9ee in access_with_adjusted_size (addr=0, 
 value=0x7ff4bbfe3d00, size=2, access_size_min=1, 
 access_size_max=4, access=0x7ff4c082a89a memory_region_write_accessor, 
 opaque=0x7ff4c15df938)
 at /mnt/zhanghaoyu/qemu/qemu-1.5.1/memory.c:364
 #6  0x7ff4c082ae49 in memory_region_iorange_write 
 (iorange=0x7ff4c15dfca0, offset=0, width=2, data=32)
 at /mnt/zhanghaoyu/qemu/qemu-1.5.1/memory.c:439
 #7  0x7ff4c08236f7 in ioport_writew_thunk (opaque=0x7ff4c15dfca0, 
 addr=126, data=32)
 at /mnt/zhanghaoyu/qemu/qemu-1.5.1/ioport.c:219
 #8  0x7ff4c0823078 in ioport_write (index=1, address=126, data=32) 
 at /mnt/zhanghaoyu/qemu/qemu-1.5.1/ioport.c:83
 #9  0x7ff4c0823ca9 in cpu_outw (addr=126, val=32) at 
 /mnt/zhanghaoyu/qemu/qemu-1.5.1/ioport.c:296
 #10 0x7ff4c0827485 in kvm_handle_io (port=126, data=0x7ff4c051, 
 direction=1, size=2, count=1)
 at /mnt/zhanghaoyu/qemu/qemu-1.5.1/kvm-all.c:1485
 #11 0x7ff4c0827e14 in kvm_cpu_exec (env=0x7ff4c15bf270) at 
 /mnt/zhanghaoyu/qemu/qemu-1.5.1/kvm-all.c:1634
 #12 0x7ff4c07b6f27 in qemu_kvm_cpu_thread_fn (arg=0x7ff4c15bf270) 
 at /mnt/zhanghaoyu/qemu/qemu-1.5.1/cpus.c:759
 #13 0x7ff4be58af05 in start_thread () from /lib64/libpthread.so.0
 #14 0x7ff4be2cd53d in clone () from /lib64/libc.so.6
 
 If I apply below patch to the upstream qemu, this problem will 
 disappear,
 ---
  hw/i386/kvmvapic.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)
 
 diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
 index 15beb80..6fff299 100644
 --- a/hw/i386/kvmvapic.c
 +++ b/hw/i386/kvmvapic.c
 @@ -652,11 +652,11 @@ static void vapic_write(void *opaque, hwaddr addr, uint64_t data,
      switch (size) {
      case 2:
          if (s->state == VAPIC_INACTIVE) {
 -            rom_paddr = (env->segs[R_CS].base + env->eip) & ROM_BLOCK_MASK;
 -            s->rom_state_paddr = rom_paddr + data;
 -
              s->state = VAPIC_STANDBY;
          }
 +        rom_paddr = (env->segs[R_CS].base + env->eip) & ROM_BLOCK_MASK;
 +        s->rom_state_paddr = rom_paddr + data;
 +
          if (vapic_prepare(s) < 0) {
              s->state = VAPIC_INACTIVE;
              break;

Yes, we need to update the ROM's physical address after the BIOS reshuffled 
the layout.

But I'm not happy with simply updating the address unconditionally. We need to 
understand the crash first, then make QEMU robust against the guest not 
issuing this initial write after a ROM region layout change.
And finally make it work properly in the normal case.

The direct cause of the crash is an access to an invalid address, which is due to 
the rom's physical address not being updated.
In my opinion, since hot-plug/unplug is involved, we need to re-calculate the rom's 
physical address for every device that has a rom during the boot period when the 
VM is rebooted/reset;
is it reasonable to set the vapic's state to VAPIC_INACTIVE in the vapic's reset handler?

Jan



Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-08-31 Thread Zhanghaoyu (A)
I tested below combos of qemu and kernel,
++-+-+
|kernel  |  QEMU   |  migration  |
++-+-+
| SLES11SP2+kvm-kmod-3.6 |   qemu-1.6.0|GOOD |
++-+-+
| SLES11SP2+kvm-kmod-3.6 |   qemu-1.6.0*   |BAD  |
++-+-+
| SLES11SP2+kvm-kmod-3.6 |   qemu-1.5.1|BAD  |
++-+-+
| SLES11SP2+kvm-kmod-3.6*|   qemu-1.5.1|GOOD |
++-+-+
| SLES11SP2+kvm-kmod-3.6 |   qemu-1.5.1*   |GOOD |
++-+-+
| SLES11SP2+kvm-kmod-3.6 |   qemu-1.5.2|BAD  |
++-+-+
| kvm-3.11-2 |   qemu-1.5.1|BAD  |
++-+-+
NOTE:
1. kvm-3.11-2 : the whole tag kernel downloaded from 
https://git.kernel.org/pub/scm/virt/kvm/kvm.git
2. SLES11SP2+kvm-kmod-3.6 : our release kernel, replace the SLES11SP2's default 
kvm-kmod with kvm-kmod-3.6, SLES11SP2's kernel version is 3.0.13-0.27
3. qemu-1.6.0* : revert the commit 211ea74022f51164a7729030b28eec90b6c99a08 on 
qemu-1.6.0
4. kvm-kmod-3.6* : kvm-kmod-3.6 with EPT disabled
5. qemu-1.5.1* : apply below patch to qemu-1.5.1 to delete qemu_madvise() 
statement in ram_load() function

--- qemu-1.5.1/arch_init.c  2013-06-27 05:47:29.0 +0800
+++ qemu-1.5.1_fix3/arch_init.c 2013-08-28 19:43:42.0 +0800
@@ -842,7 +842,6 @@ static int ram_load(QEMUFile *f, void *o
             if (ch == 0 &&
                 (!kvm_enabled() || kvm_has_sync_mmu()) &&
                 getpagesize() <= TARGET_PAGE_SIZE) {
-                qemu_madvise(host, TARGET_PAGE_SIZE, QEMU_MADV_DONTNEED);
             }
 #endif
         } else if (flags & RAM_SAVE_FLAG_PAGE) {

If I apply the above patch to qemu-1.5.1 to delete the qemu_madvise() statement, 
the test result of the combo of SLES11SP2+kvm-kmod-3.6 and qemu-1.5.1 is good.
Why do we perform the qemu_madvise(QEMU_MADV_DONTNEED) for those zero pages?
Does qemu_madvise() have a sustained effect on the range of virtual addresses?
In other words, does qemu_madvise() have a sustained effect on VM performance?
If the range of virtual addresses that was advised DONTNEED is later read/written 
frequently, could performance degradation happen?
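For reference, here is a minimal standalone demo (plain C, not QEMU code) of what MADV_DONTNEED does to an anonymous mapping: the old pages are dropped immediately, and the next access has to fault in fresh zero-filled pages, so a previously populated range must be re-faulted and re-mapped by the host MMU (and, for guest RAM, by KVM/EPT). Whether that re-faulting is what causes the degradation observed here is exactly the open question above; the snippet only illustrates the syscall's semantics.

#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 2 * 1024 * 1024;                  /* one THP-sized region */
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(buf != MAP_FAILED);

    memset(buf, 0xAA, len);                        /* populate the pages   */
    madvise(buf, len, MADV_DONTNEED);              /* ...and discard them  */

    /* The old contents are gone; this read faults in a new zero page. */
    printf("first byte after MADV_DONTNEED: 0x%02x\n", buf[0]);

    munmap(buf, len);
    return 0;
}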

The reason why the combo of SLES11SP2+kvm-kmod-3.6 and qemu-1.6.0 is good is 
commit 211ea74022f51164a7729030b28eec90b6c99a08;
if I revert commit 211ea74022f51164a7729030b28eec90b6c99a08 on qemu-1.6.0, 
the test result of the combo of SLES11SP2+kvm-kmod-3.6 and qemu-1.6.0 is bad, and 
the performance degradation happens, too.

Thanks,
Zhang Haoyu

  The QEMU command line (/var/log/libvirt/qemu/[domain name].log), 
  LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ 
  QEMU_AUDIO_DRV=none
  /usr/local/bin/qemu-system-x86_64 -name ATS1 -S -M pc-0.12 -cpu
  qemu32 -enable-kvm -m 12288 -smp 4,sockets=4,cores=1,threads=1 
  -uuid
  0505ec91-382d-800e-2c79-e5b286eb60b5 -no-user-config -nodefaults 
  -chardev 
  socket,id=charmonitor,path=/var/lib/libvirt/qemu/ATS1.monitor,serv
  er, n owait -mon chardev=charmonitor,id=monitor,mode=control -rtc 
  base=localtime -no-shutdown -device
  piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
  file=/opt/ne/vm/ATS1.img,if=none,id=drive-virtio-disk0,format=raw,
  cac
  h
  e=none -device
  virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk
  0,i
  d
  =virtio-disk0,bootindex=1 -netdev
  tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device 
  virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:00,bus=pci.
  0
  ,addr=0x3,bootindex=2 -netdev
  tap,fd=22,id=hostnet1,vhost=on,vhostfd=23 -device 
  virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:00,bus=pci.
  0
  ,addr=0x4 -netdev tap,fd=24,id=hostnet2,vhost=on,vhostfd=25 
  -device 
  virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:00,bus=pci.
  0
  ,addr=0x5 -netdev tap,fd=26,id=hostnet3,vhost=on,vhostfd=27 
  -device 
  virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:00,bus=pci.
  0
  ,addr=0x6 -netdev tap,fd=28,id=hostnet4,vhost=on,vhostfd=29 
  -device 
  virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:00,bus=pci.
  0
  ,addr=0x7 -netdev tap,fd=30,id=hostnet5,vhost=on,vhostfd=31 
  -device 
  virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:00,bus=pci.
  0
  ,addr=0x9 -chardev pty,id=charserial0 -device 
  isa-serial,chardev=charserial0,id=serial0 -vnc *:0 -k en-us -vga 
  cirrus -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb
  -watchdog-action poweroff -device
  virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa
  
 Which QEMU version is this? Can you try with e1000 NICs instead of virtio?
 
 This QEMU version is 1.0.0, but 

[Qemu-devel] [kvm] segmentation fault when guest reboot or reset after hotunplug virtio NIC

2013-08-29 Thread Zhanghaoyu (A)
Description of problem:
when the guest reboots or resets after hot-unplugging a virtio NIC, a segmentation fault 
occurs. It can be reproduced 100% of the time.
Similar to https://bugzilla.redhat.com/show_bug.cgi?id=988256

Version-Release number of selected component (if applicable):
Host OS:sles11sp2 kernel version:3.0.58
qemu-1.5.1
libvirt-1.1.0
guest os:win2k8 R2 x64bit or sles11sp2 x64 or win2k3 32bit


Steps shown as below:
1. use virsh to start a vm with a virtio NIC

2. after booting, use virsh detach-device to hot-unplug the virtio NIC

3. use virsh reboot/reset to restart the vm

4. when the vm is rebooting, the segmentation fault appears.

the backtrace:

#0  0x7ff4be3288d0 in __memcmp_sse4_1 () from /lib64/libc.so.6
#1  0x7ff4c07f82c0 in patch_hypercalls (s=0x7ff4c15dd610) at 
/mnt/zhanghaoyu/qemu/qemu-1.5.1/hw/i386/kvmvapic.c:549
#2  0x7ff4c07f84f0 in vapic_prepare (s=0x7ff4c15dd610) at 
/mnt/zhanghaoyu/qemu/qemu-1.5.1/hw/i386/kvmvapic.c:614
#3  0x7ff4c07f85e7 in vapic_write (opaque=0x7ff4c15dd610, addr=0, data=32, 
size=2)
at /mnt/zhanghaoyu/qemu/qemu-1.5.1/hw/i386/kvmvapic.c:651
#4  0x7ff4c082a917 in memory_region_write_accessor (opaque=0x7ff4c15df938, 
addr=0, value=0x7ff4bbfe3d00, size=2, 
shift=0, mask=65535) at /mnt/zhanghaoyu/qemu/qemu-1.5.1/memory.c:334
#5  0x7ff4c082a9ee in access_with_adjusted_size (addr=0, 
value=0x7ff4bbfe3d00, size=2, access_size_min=1, 
access_size_max=4, access=0x7ff4c082a89a memory_region_write_accessor, 
opaque=0x7ff4c15df938)
at /mnt/zhanghaoyu/qemu/qemu-1.5.1/memory.c:364
#6  0x7ff4c082ae49 in memory_region_iorange_write (iorange=0x7ff4c15dfca0, 
offset=0, width=2, data=32)
at /mnt/zhanghaoyu/qemu/qemu-1.5.1/memory.c:439
#7  0x7ff4c08236f7 in ioport_writew_thunk (opaque=0x7ff4c15dfca0, addr=126, 
data=32)
at /mnt/zhanghaoyu/qemu/qemu-1.5.1/ioport.c:219
#8  0x7ff4c0823078 in ioport_write (index=1, address=126, data=32) at 
/mnt/zhanghaoyu/qemu/qemu-1.5.1/ioport.c:83
#9  0x7ff4c0823ca9 in cpu_outw (addr=126, val=32) at 
/mnt/zhanghaoyu/qemu/qemu-1.5.1/ioport.c:296
#10 0x7ff4c0827485 in kvm_handle_io (port=126, data=0x7ff4c051, 
direction=1, size=2, count=1)
at /mnt/zhanghaoyu/qemu/qemu-1.5.1/kvm-all.c:1485
#11 0x7ff4c0827e14 in kvm_cpu_exec (env=0x7ff4c15bf270) at 
/mnt/zhanghaoyu/qemu/qemu-1.5.1/kvm-all.c:1634
#12 0x7ff4c07b6f27 in qemu_kvm_cpu_thread_fn (arg=0x7ff4c15bf270) at 
/mnt/zhanghaoyu/qemu/qemu-1.5.1/cpus.c:759
#13 0x7ff4be58af05 in start_thread () from /lib64/libpthread.so.0
#14 0x7ff4be2cd53d in clone () from /lib64/libc.so.6

In function vapic_write(), when the vm is rebooted or reset after hot-unplugging the 
virtio NIC, the rom_paddr may have changed, since the virtio NIC rom will no longer be 
loaded into ram.
switch (size) {
case 2:
    if (s->state == VAPIC_INACTIVE) {
        rom_paddr = (env->segs[R_CS].base + env->eip) & ROM_BLOCK_MASK;
        s->rom_state_paddr = rom_paddr + data;

        s->state = VAPIC_STANDBY;
    }
    if (vapic_prepare(s) < 0) {
        s->state = VAPIC_INACTIVE;
        break;
    }

So I changed this code like this:
switch (size) {
case 2:
    if (s->state == VAPIC_INACTIVE) {
        s->state = VAPIC_STANDBY;
    }

    rom_paddr = (env->segs[R_CS].base + env->eip) & ROM_BLOCK_MASK;
    s->rom_state_paddr = rom_paddr + data;

    if (vapic_prepare(s) < 0) {
        s->state = VAPIC_INACTIVE;
        break;
    }

After applying the above change, the segmentation fault disappears and the vm reboots or 
resets successfully. 
Is the above change the correct way to fix the problem?

Thanks,
Daniel



Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-08-20 Thread Zhanghaoyu (A)
  The QEMU command line (/var/log/libvirt/qemu/[domain name].log), 
  LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ 
  QEMU_AUDIO_DRV=none
  /usr/local/bin/qemu-system-x86_64 -name ATS1 -S -M pc-0.12 -cpu
  qemu32 -enable-kvm -m 12288 -smp 4,sockets=4,cores=1,threads=1 
  -uuid
  0505ec91-382d-800e-2c79-e5b286eb60b5 -no-user-config -nodefaults 
  -chardev 
  socket,id=charmonitor,path=/var/lib/libvirt/qemu/ATS1.monitor,ser
  ver, n owait -mon chardev=charmonitor,id=monitor,mode=control 
  -rtc base=localtime -no-shutdown -device
  piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
  file=/opt/ne/vm/ATS1.img,if=none,id=drive-virtio-disk0,format=raw
  ,cac
  h
  e=none -device
  virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-dis
  k0,i
  d
  =virtio-disk0,bootindex=1 -netdev
  tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device 
  virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:00,bus=pci.
  0
  ,addr=0x3,bootindex=2 -netdev
  tap,fd=22,id=hostnet1,vhost=on,vhostfd=23 -device 
  virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:00,bus=pci.
  0
  ,addr=0x4 -netdev tap,fd=24,id=hostnet2,vhost=on,vhostfd=25 
  -device 
  virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:00,bus=pci.
  0
  ,addr=0x5 -netdev tap,fd=26,id=hostnet3,vhost=on,vhostfd=27 
  -device 
  virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:00,bus=pci.
  0
  ,addr=0x6 -netdev tap,fd=28,id=hostnet4,vhost=on,vhostfd=29 
  -device 
  virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:00,bus=pci.
  0
  ,addr=0x7 -netdev tap,fd=30,id=hostnet5,vhost=on,vhostfd=31 
  -device 
  virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:00,bus=pci.
  0
  ,addr=0x9 -chardev pty,id=charserial0 -device 
  isa-serial,chardev=charserial0,id=serial0 -vnc *:0 -k en-us -vga 
  cirrus -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb
  -watchdog-action poweroff -device 
  virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa
  
 Which QEMU version is this? Can you try with e1000 NICs instead of virtio?
 
 This QEMU version is 1.0.0, but I also tested QEMU 1.5.2, and the same problem 
 exists, including the performance degradation and the readonly GFNs' flooding.
 I tried with e1000 NICs instead of virtio and saw the same performance 
 degradation and readonly GFNs' flooding; that QEMU version was 1.5.2.
 No matter whether e1000 NICs or virtio NICs are used, the GFN flooding starts at the 
 post-restore stage (i.e. the running stage); as soon as the restore 
 completes, the flooding begins.
 
 Thanks,
 Zhang Haoyu
 
 --
   Gleb.
 
 Should we focus on the first bad 
 commit(612819c3c6e67bac8fceaa7cc402f13b1b63f7e4) and the surprising GFNs' 
 flooding?
 
Not really. There is no point in debugging a very old version compiled 
with kvm-kmod; there are too many variables in the environment. I cannot 
reproduce the GFN flooding on upstream, so the problem may be gone, may 
be a result of kvm-kmod problem or something different in how I invoke 
qemu. So the best way to proceed is for you to reproduce with upstream 
version then at least I will be sure that we are using the same code.

Thanks, I will test the combo of upstream kvm kernel and upstream qemu.
And, the guest os version I gave above was wrong; the currently running guest os is 
SLES10SP4.

I tested below combos of qemu and kernel,
+-+-+-+
|  kvm kernel |  QEMU   |   test result   |
+-+-+-+
|  kvm-3.11-2 |   qemu-1.5.2|  GOOD   |
+-+-+-+
|  SLES11SP2  |   qemu-1.0.0|  BAD|
+-+-+-+
|  SLES11SP2  |   qemu-1.4.0|  BAD|
+-+-+-+
|  SLES11SP2  |   qemu-1.4.2|  BAD|
+-+-+-+
|  SLES11SP2  | qemu-1.5.0-rc0  |  GOOD   |
+-+-+-+
|  SLES11SP2  |   qemu-1.5.0|  GOOD   |
+-+-+-+
|  SLES11SP2  |   qemu-1.5.1|  GOOD   |
+-+-+-+
|  SLES11SP2  |   qemu-1.5.2|  GOOD   |
+-+-+-+
NOTE:
1. above kvm-3.11-2 in the table is the whole tag kernel download from 
https://git.kernel.org/pub/scm/virt/kvm/kvm.git
2. SLES11SP2's kernel version is 3.0.13-0.27

Then I git-bisected the qemu changes between qemu-1.4.2 and qemu-1.5.0-rc0, 
marking the good versions as bad and the bad versions as good,
so the 'first bad commit' found is just the patch which fixes the degradation problem.
++---+-+-+
| bisect No. |  commit   |  save-restore   |
migration|

Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-08-14 Thread Zhanghaoyu (A)
  The QEMU command line (/var/log/libvirt/qemu/[domain name].log), 
  LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ 
  QEMU_AUDIO_DRV=none
  /usr/local/bin/qemu-system-x86_64 -name ATS1 -S -M pc-0.12 -cpu 
  qemu32 -enable-kvm -m 12288 -smp 4,sockets=4,cores=1,threads=1 -uuid
  0505ec91-382d-800e-2c79-e5b286eb60b5 -no-user-config -nodefaults 
  -chardev 
  socket,id=charmonitor,path=/var/lib/libvirt/qemu/ATS1.monitor,server,
  n owait -mon chardev=charmonitor,id=monitor,mode=control -rtc 
  base=localtime -no-shutdown -device
  piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
  file=/opt/ne/vm/ATS1.img,if=none,id=drive-virtio-disk0,format=raw,cac
  h
  e=none -device
  virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,i
  d
  =virtio-disk0,bootindex=1 -netdev
  tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device 
  virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:00,bus=pci.
  0
  ,addr=0x3,bootindex=2 -netdev
  tap,fd=22,id=hostnet1,vhost=on,vhostfd=23 -device 
  virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:00,bus=pci.
  0
  ,addr=0x4 -netdev tap,fd=24,id=hostnet2,vhost=on,vhostfd=25 -device 
  virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:00,bus=pci.
  0
  ,addr=0x5 -netdev tap,fd=26,id=hostnet3,vhost=on,vhostfd=27 -device 
  virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:00,bus=pci.
  0
  ,addr=0x6 -netdev tap,fd=28,id=hostnet4,vhost=on,vhostfd=29 -device 
  virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:00,bus=pci.
  0
  ,addr=0x7 -netdev tap,fd=30,id=hostnet5,vhost=on,vhostfd=31 -device 
  virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:00,bus=pci.
  0
  ,addr=0x9 -chardev pty,id=charserial0 -device 
  isa-serial,chardev=charserial0,id=serial0 -vnc *:0 -k en-us -vga 
  cirrus -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb
  -watchdog-action poweroff -device
  virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa
  
 Which QEMU version is this? Can you try with e1000 NICs instead of virtio?
 
 This QEMU version is 1.0.0, but I also tested QEMU 1.5.2, and the same problem 
 exists, including the performance degradation and the readonly GFNs' flooding.
 I tried with e1000 NICs instead of virtio and saw the same performance 
 degradation and readonly GFNs' flooding; that QEMU version was 1.5.2.
 No matter whether e1000 NICs or virtio NICs are used, the GFN flooding starts at the 
 post-restore stage (i.e. the running stage); as soon as the restore 
 completes, the flooding begins.
 
 Thanks,
 Zhang Haoyu
 
 --
Gleb.
 
 Should we focus on the first bad 
 commit(612819c3c6e67bac8fceaa7cc402f13b1b63f7e4) and the surprising GFNs' 
 flooding?
 
Not really. There is no point in debugging a very old version compiled
with kvm-kmod; there are too many variables in the environment. I cannot
reproduce the GFN flooding on upstream, so the problem may be gone, may
be a result of kvm-kmod problem or something different in how I invoke
qemu. So the best way to proceed is for you to reproduce with upstream
version then at least I will be sure that we are using the same code.

Thanks, I will test the combo of upstream kvm kernel and upstream qemu.
And, the guest os version I gave above was wrong; the currently running guest os is 
SLES10SP4.

Thanks,
Zhang Haoyu

 I applied the patch below to __direct_map(),
 @@ -2223,6 +2223,8 @@ static int __direct_map(struct kvm_vcpu
      int pt_write = 0;
      gfn_t pseudo_gfn;
 
 +    map_writable = true;
 +
      for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
          if (iterator.level == level) {
              unsigned pte_access = ACC_ALL;
 and rebuilt the kvm-kmod, then re-insmod'ed it.
 After I started a VM, the host seemed to be abnormal: many programs 
 could not be started successfully, and segmentation faults were reported.
 In my opinion, after the above patch is applied, the commit 
 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 should have no effect, but the 
 test result proved me wrong.
 Does the way the map_writable value is obtained in hva_to_pfn() have an effect on 
 the result?
 
If hva_to_pfn() returns map_writable == false it means that the page is
mapped read-only on the primary MMU, so it should not be mapped writable
on the secondary MMU either. This should not usually happen.

--
   Gleb.


Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-08-06 Thread Zhanghaoyu (A)
 The QEMU command line (/var/log/libvirt/qemu/[domain name].log), 
 LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ QEMU_AUDIO_DRV=none 
 /usr/local/bin/qemu-system-x86_64 -name ATS1 -S -M pc-0.12 -cpu qemu32 
 -enable-kvm -m 12288 -smp 4,sockets=4,cores=1,threads=1 -uuid 
 0505ec91-382d-800e-2c79-e5b286eb60b5 -no-user-config -nodefaults 
 -chardev 
 socket,id=charmonitor,path=/var/lib/libvirt/qemu/ATS1.monitor,server,n
 owait -mon chardev=charmonitor,id=monitor,mode=control -rtc 
 base=localtime -no-shutdown -device 
 piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
 file=/opt/ne/vm/ATS1.img,if=none,id=drive-virtio-disk0,format=raw,cach
 e=none -device 
 virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,id
 =virtio-disk0,bootindex=1 -netdev 
 tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device 
 virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:00,bus=pci.0
 ,addr=0x3,bootindex=2 -netdev 
 tap,fd=22,id=hostnet1,vhost=on,vhostfd=23 -device 
 virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:00,bus=pci.0
 ,addr=0x4 -netdev tap,fd=24,id=hostnet2,vhost=on,vhostfd=25 -device 
 virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:00,bus=pci.0
 ,addr=0x5 -netdev tap,fd=26,id=hostnet3,vhost=on,vhostfd=27 -device 
 virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:00,bus=pci.0
 ,addr=0x6 -netdev tap,fd=28,id=hostnet4,vhost=on,vhostfd=29 -device 
 virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:00,bus=pci.0
 ,addr=0x7 -netdev tap,fd=30,id=hostnet5,vhost=on,vhostfd=31 -device 
 virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:00,bus=pci.0
 ,addr=0x9 -chardev pty,id=charserial0 -device 
 isa-serial,chardev=charserial0,id=serial0 -vnc *:0 -k en-us -vga 
 cirrus -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb 
 -watchdog-action poweroff -device 
 virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa
 
Which QEMU version is this? Can you try with e1000 NICs instead of virtio?

This QEMU version is 1.0.0, but I also tested QEMU 1.5.2, and the same problem 
exists, including the performance degradation and the readonly GFNs' flooding.
I tried with e1000 NICs instead of virtio and saw the same performance 
degradation and readonly GFNs' flooding; that QEMU version was 1.5.2.
No matter whether e1000 NICs or virtio NICs are used, the GFN flooding starts at the 
post-restore stage (i.e. the running stage); as soon as the restore completes, 
the flooding begins.

Thanks,
Zhang Haoyu

--
   Gleb.



Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-08-06 Thread Zhanghaoyu (A)
 The QEMU command line (/var/log/libvirt/qemu/[domain name].log), 
 LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ 
 QEMU_AUDIO_DRV=none
 /usr/local/bin/qemu-system-x86_64 -name ATS1 -S -M pc-0.12 -cpu 
 qemu32 -enable-kvm -m 12288 -smp 4,sockets=4,cores=1,threads=1 -uuid
 0505ec91-382d-800e-2c79-e5b286eb60b5 -no-user-config -nodefaults 
 -chardev 
 socket,id=charmonitor,path=/var/lib/libvirt/qemu/ATS1.monitor,server,
 n owait -mon chardev=charmonitor,id=monitor,mode=control -rtc 
 base=localtime -no-shutdown -device
 piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
 file=/opt/ne/vm/ATS1.img,if=none,id=drive-virtio-disk0,format=raw,cac
 h
 e=none -device
 virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,i
 d
 =virtio-disk0,bootindex=1 -netdev
 tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device 
 virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:00,bus=pci.
 0
 ,addr=0x3,bootindex=2 -netdev
 tap,fd=22,id=hostnet1,vhost=on,vhostfd=23 -device 
 virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:00,bus=pci.
 0
 ,addr=0x4 -netdev tap,fd=24,id=hostnet2,vhost=on,vhostfd=25 -device 
 virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:00,bus=pci.
 0
 ,addr=0x5 -netdev tap,fd=26,id=hostnet3,vhost=on,vhostfd=27 -device 
 virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:00,bus=pci.
 0
 ,addr=0x6 -netdev tap,fd=28,id=hostnet4,vhost=on,vhostfd=29 -device 
 virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:00,bus=pci.
 0
 ,addr=0x7 -netdev tap,fd=30,id=hostnet5,vhost=on,vhostfd=31 -device 
 virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:00,bus=pci.
 0
 ,addr=0x9 -chardev pty,id=charserial0 -device 
 isa-serial,chardev=charserial0,id=serial0 -vnc *:0 -k en-us -vga 
 cirrus -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb
 -watchdog-action poweroff -device
 virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa
 
Which QEMU version is this? Can you try with e1000 NICs instead of virtio?

This QEMU version is 1.0.0, but I also tested QEMU 1.5.2, and the same problem 
exists, including the performance degradation and the readonly GFNs' flooding.
I tried with e1000 NICs instead of virtio and saw the same performance 
degradation and readonly GFNs' flooding; that QEMU version was 1.5.2.
No matter whether e1000 NICs or virtio NICs are used, the GFN flooding starts at the 
post-restore stage (i.e. the running stage); as soon as the restore completes, 
the flooding begins.

Thanks,
Zhang Haoyu

--
  Gleb.

Should we focus on the first bad 
commit(612819c3c6e67bac8fceaa7cc402f13b1b63f7e4) and the surprising GFNs' 
flooding?

I applied the patch below to __direct_map(),
@@ -2223,6 +2223,8 @@ static int __direct_map(struct kvm_vcpu
     int pt_write = 0;
     gfn_t pseudo_gfn;

+    map_writable = true;
+
     for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
         if (iterator.level == level) {
             unsigned pte_access = ACC_ALL;
and rebuilt the kvm-kmod, then re-insmod'ed it.
After I started a VM, the host seemed to be abnormal: many programs could not 
be started successfully, and segmentation faults were reported.
In my opinion, after the above patch is applied, the commit 
612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 should have no effect, but the test 
result proved me wrong.
Does the way the map_writable value is obtained in hva_to_pfn() have an effect on 
the result?

Thanks,
Zhang Haoyu



Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-08-05 Thread Zhanghaoyu (A)
   hi all,
   
   I met similar problem to these, while performing live migration or 
   save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, 
   guest:suse11sp2), running tele-communication software suite in 
   guest, 
   https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
   http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
   http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
   https://bugzilla.kernel.org/show_bug.cgi?id=58771
   
   After live migration or virsh restore [savefile], one process's CPU 
   utilization went up by about 30%, resulted in throughput 
   degradation of this process.
   
   If EPT disabled, this problem gone.
   
   I suspect that kvm hypervisor has business with this problem.
   Based on above suspect, I want to find the two adjacent versions of 
   kvm-kmod which triggers this problem or not (e.g. 2.6.39, 3.0-rc1), 
   and analyze the differences between this two versions, or apply the 
   patches between this two versions by bisection method, finally find 
   the key patches.
   
   Any better ideas?
   
   Thanks,
   Zhang Haoyu
  
  I've attempted to duplicate this on a number of machines that are as 
  similar to yours as I am able to get my hands on, and so far have not 
  been able to see any performance degradation. And from what I've read in 
  the above links, huge pages do not seem to be part of the problem.
  
  So, if you are in a position to bisect the kernel changes, that would 
  probably be the best avenue to pursue in my opinion.
  
  Bruce
  
  I found the first bad 
  commit([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4] KVM: propagate fault 
  r/w information to gup(), allow read-only memory) which triggers this 
  problem by git bisecting the kvm kernel (download from 
  https://git.kernel.org/pub/scm/virt/kvm/kvm.git) changes.
  
  And,
  git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p  
  612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
  git diff 
  612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc4
  02f13b1b63f7e4  612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff
  
  Then, I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log and 
  612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff,
  came to a conclusion that all of the differences between 
  612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and 
  612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
  are contributed by no other than 
  612819c3c6e67bac8fceaa7cc402f13b1b63f7e4, so this commit is the 
  peace-breaker which directly or indirectly causes the degradation.
  
  Does the map_writable flag passed to mmu_set_spte() function have effect 
  on PTE's PAT flag or increase the VMEXITs induced by that guest tried to 
  write read-only memory?
  
  Thanks,
  Zhang Haoyu
  
 
 There should be no read-only memory maps backing guest RAM.
 
 Can you confirm map_writable = false is being passed to __direct_map? (this 
 should not happen, for guest RAM).
 And if it is false, please capture the associated GFN.
 
 I added the check and printk below at the start of __direct_map() at the first 
 bad commit version,
 --- kvm-612819c3c6e67bac8fceaa7cc402f13b1b63f7e4/arch/x86/kvm/mmu.c 2013-07-26 18:44:05.0 +0800
 +++ kvm-612819/arch/x86/kvm/mmu.c   2013-07-31 00:05:48.0 +0800
 @@ -2223,6 +2223,9 @@ static int __direct_map(struct kvm_vcpu
      int pt_write = 0;
      gfn_t pseudo_gfn;
 
 +    if (!map_writable)
 +        printk(KERN_ERR "%s: %s: gfn = %llu\n", __FILE__, __func__, gfn);
 +
      for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
          if (iterator.level == level) {
              unsigned pte_access = ACC_ALL;
 
 I virsh-save the VM, and then virsh-restore it, so many GFNs were printed, 
 you can absolutely describe it as flooding.
 
The flooding you see happens during migrate to file stage because of dirty
page tracking. If you clear dmesg after virsh-save you should not see any
flooding after virsh-restore. I just checked with latest tree, I do not.

I made a verification again.
I virsh-saved the VM; during the saving stage I ran 'dmesg', and no GFN was printed. 
Maybe the switch from the running stage to the paused stage takes such a short time 
that no guest write happens during this switching period.
After the completion of the save operation, I ran 'dmesg -c' to clear the buffer 
all the same, then I virsh-restored the VM, and very many GFNs were printed by running 
'dmesg';
I also ran 'tail -f /var/log/messages' during the restore stage, and very many 
GFNs were flooded there dynamically too.
I'm sure that the flooding happens during the virsh-restore stage, not the 
migration stage.

On the VM's normal start, only very few GFNs are printed, as shown below
gfn = 16
gfn = 604
gfn = 605
gfn = 606
gfn = 607
gfn = 608
gfn = 609

but during the VM's restore stage, very many GFNs are printed; some examples are 
shown below,
2042600
279
2797778
2797779
2797780
2797781
2797782
2797783
2797784

Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-08-05 Thread Zhanghaoyu (A)
hi all,

I met similar problem to these, while performing live migration or 
save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, 
guest:suse11sp2), running tele-communication software suite in 
guest, 
https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
https://bugzilla.kernel.org/show_bug.cgi?id=58771

After live migration or virsh restore [savefile], one process's CPU 
utilization went up by about 30%, resulted in throughput 
degradation of this process.

If EPT disabled, this problem gone.

I suspect that kvm hypervisor has business with this problem.
Based on above suspect, I want to find the two adjacent versions of 
kvm-kmod which triggers this problem or not (e.g. 2.6.39, 3.0-rc1), 
and analyze the differences between this two versions, or apply the 
patches between this two versions by bisection method, finally find 
the key patches.

Any better ideas?

Thanks,
Zhang Haoyu
   
   I've attempted to duplicate this on a number of machines that are as 
   similar to yours as I am able to get my hands on, and so far have not 
   been able to see any performance degradation. And from what I've read 
   in the above links, huge pages do not seem to be part of the problem.
   
   So, if you are in a position to bisect the kernel changes, that would 
   probably be the best avenue to pursue in my opinion.
   
   Bruce
   
   I found the first bad 
   commit([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4] KVM: propagate fault 
   r/w information to gup(), allow read-only memory) which triggers this 
   problem by git bisecting the kvm kernel (download from 
   https://git.kernel.org/pub/scm/virt/kvm/kvm.git) changes.
   
   And,
   git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p  
   612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
   git diff 
   612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc4
   02f13b1b63f7e4  612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff
   
   Then, I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log and 
   612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff,
   came to a conclusion that all of the differences between 
   612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and 
   612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
   are contributed by no other than 
   612819c3c6e67bac8fceaa7cc402f13b1b63f7e4, so this commit is the 
   peace-breaker which directly or indirectly causes the degradation.
   
   Does the map_writable flag passed to mmu_set_spte() function have 
   effect on PTE's PAT flag or increase the VMEXITs induced by that guest 
   tried to write read-only memory?
   
   Thanks,
   Zhang Haoyu
   
  
  There should be no read-only memory maps backing guest RAM.
  
  Can you confirm map_writable = false is being passed to __direct_map? 
  (this should not happen, for guest RAM).
  And if it is false, please capture the associated GFN.
  
  I added the check and printk below at the start of __direct_map() at the first 
  bad commit version,
  --- kvm-612819c3c6e67bac8fceaa7cc402f13b1b63f7e4/arch/x86/kvm/mmu.c 2013-07-26 18:44:05.0 +0800
  +++ kvm-612819/arch/x86/kvm/mmu.c   2013-07-31 00:05:48.0 +0800
  @@ -2223,6 +2223,9 @@ static int __direct_map(struct kvm_vcpu
       int pt_write = 0;
       gfn_t pseudo_gfn;
  
  +    if (!map_writable)
  +        printk(KERN_ERR "%s: %s: gfn = %llu\n", __FILE__, __func__, gfn);
  +
       for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
           if (iterator.level == level) {
               unsigned pte_access = ACC_ALL;
  
  I virsh-save the VM, and then virsh-restore it, so many GFNs were 
  printed, you can absolutely describe it as flooding.
  
 The flooding you see happens during migrate to file stage because of dirty
 page tracking. If you clear dmesg after virsh-save you should not see any
 flooding after virsh-restore. I just checked with latest tree, I do not.
 
 I made a verification again.
 I virsh-save the VM, during the saving stage, I run 'dmesg', no GFN printed, 
 maybe the switching from running stage to pause stage takes so short time, 
 no guest-write happens during this switching period.
 After the completion of saving operation, I run 'dmesg -c' to clear the 
 buffer all the same, then I virsh-restore the VM, so many GFNs are printed 
 by running 'dmesg',
 and I also run 'tail -f /var/log/messages' during the restoring stage, so 
 many GFNs are flooded dynamically too.
 I'm sure that the flooding happens during the virsh-restore stage, not the 
 migration stage.
 
Interesting, is this with upstream kernel? For me the situation is
exactly the opposite. What is your command line?
 
I made the verification on the first bad commit 

Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-08-05 Thread Zhanghaoyu (A)
Hi,

Am 05.08.2013 11:09, schrieb Zhanghaoyu (A):
 When I build the upstream, I encounter a problem: I compile and 
 install the upstream (commit: e769ece3b129698d2b09811a6f6d304e4eaa8c29) 
 on the sles11sp2 environment via the commands below
 cp /boot/config-3.0.13-0.27-default ./.config
 yes "" | make oldconfig && make && make modules_install && make install
 then I reboot the host and select the upstream kernel, but during the 
 boot stage the problem below happened: Could not find 
 /dev/disk/by-id/scsi-3600508e0864407c5b8f7ad01-part3
 
 I'm trying to resolve it.

Possibly you need to enable loading unsupported kernel modules?
At least that's needed when testing a kmod with a SUSE kernel.

I have tried to set allow_unsupported_modules 1 in 
/etc/modprobe.d/unsupported-modules, but the problem still happened.
I replaced the whole kernel with the kvm kernel, not only the kvm modules.

Regards,
Andreas


Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-07-30 Thread Zhanghaoyu (A)

  hi all,
  
  I met similar problem to these, while performing live migration or 
  save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, 
  guest:suse11sp2), running tele-communication software suite in 
  guest, 
  https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
  http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
  http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
  https://bugzilla.kernel.org/show_bug.cgi?id=58771
  
  After live migration or virsh restore [savefile], one process's CPU 
  utilization went up by about 30%, resulted in throughput 
  degradation of this process.
  
  If EPT disabled, this problem gone.
  
  I suspect that kvm hypervisor has business with this problem.
  Based on above suspect, I want to find the two adjacent versions of 
  kvm-kmod which triggers this problem or not (e.g. 2.6.39, 3.0-rc1), 
  and analyze the differences between this two versions, or apply the 
  patches between this two versions by bisection method, finally find the 
  key patches.
  
  Any better ideas?
  
  Thanks,
  Zhang Haoyu
 
 I've attempted to duplicate this on a number of machines that are as 
 similar to yours as I am able to get my hands on, and so far have not been 
 able to see any performance degradation. And from what I've read in the 
 above links, huge pages do not seem to be part of the problem.
 
 So, if you are in a position to bisect the kernel changes, that would 
 probably be the best avenue to pursue in my opinion.
 
 Bruce
 
 I found the first bad 
 commit([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4] KVM: propagate fault r/w 
 information to gup(), allow read-only memory) which triggers this problem by 
 git bisecting the kvm kernel (download from 
 https://git.kernel.org/pub/scm/virt/kvm/kvm.git) changes.
 
 And,
 git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p  
 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
 git diff 
 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc4
 02f13b1b63f7e4  612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff
 
 Then, I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log and 
 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff,
 came to a conclusion that all of the differences between 
 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and 
 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
 are contributed by no other than 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4, 
 so this commit is the peace-breaker which directly or indirectly causes the 
 degradation.
 
 Does the map_writable flag passed to mmu_set_spte() function have effect on 
 PTE's PAT flag or increase the VMEXITs induced by that guest tried to write 
 read-only memory?
 
 Thanks,
 Zhang Haoyu
 

There should be no read-only memory maps backing guest RAM.

Can you confirm map_writable = false is being passed to __direct_map? (this 
should not happen, for guest RAM).
And if it is false, please capture the associated GFN.

I added the check and printk below at the start of __direct_map() at the first bad 
commit version,
--- kvm-612819c3c6e67bac8fceaa7cc402f13b1b63f7e4/arch/x86/kvm/mmu.c 2013-07-26 18:44:05.0 +0800
+++ kvm-612819/arch/x86/kvm/mmu.c   2013-07-31 00:05:48.0 +0800
@@ -2223,6 +2223,9 @@ static int __direct_map(struct kvm_vcpu
     int pt_write = 0;
     gfn_t pseudo_gfn;

+    if (!map_writable)
+        printk(KERN_ERR "%s: %s: gfn = %llu\n", __FILE__, __func__, gfn);
+
     for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
         if (iterator.level == level) {
             unsigned pte_access = ACC_ALL;

I virsh-save the VM, and then virsh-restore it, so many GFNs were printed, you 
can absolutely describe it as flooding.

It's probably an issue with an older get_user_pages variant (either in kvm-kmod 
or the older kernel). Is there any indication of a similar issue with the upstream 
kernel?
I will test the upstream kvm 
host (https://git.kernel.org/pub/scm/virt/kvm/kvm.git) later; if the problem is 
still there, 
I will revert the first bad commit patch 
612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 on the upstream, then test it again.

And, I collected the VMEXITs statistics in pre-save and post-restore period at 
first bad commit version,
pre-save:
COTS-F10S03:~ # perf stat -e kvm:* -a sleep 30

 Performance counter stats for 'sleep 30':

   1222318 kvm:kvm_entry
 0 kvm:kvm_hypercall
 0 kvm:kvm_hv_hypercall
351755 kvm:kvm_pio
  6703 kvm:kvm_cpuid
692502 kvm:kvm_apic
   1234173 kvm:kvm_exit
223956 kvm:kvm_inj_virq
 0 kvm:kvm_inj_exception
 16028 kvm:kvm_page_fault
 59872 kvm:kvm_msr
 0 kvm:kvm_cr
169596 kvm:kvm_pic_set_irq
 81455 kvm:kvm_apic_ipi
245103 kvm:kvm_apic_accept_irq
 0 kvm:kvm_nested_vmrun
 0 kvm:kvm_nested_intercepts
  

Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-07-27 Thread Zhanghaoyu (A)
 hi all,
 
 I met similar problem to these, while performing live migration or 
 save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2, 
 guest:suse11sp2), running tele-communication software suite in guest, 
 https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
 http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
 http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
 https://bugzilla.kernel.org/show_bug.cgi?id=58771
 
 After live migration or virsh restore [savefile], one process's CPU 
 utilization went up by about 30%, resulted in throughput degradation 
 of this process.
 
 If EPT disabled, this problem gone.
 
 I suspect that kvm hypervisor has business with this problem.
 Based on above suspect, I want to find the two adjacent versions of 
 kvm-kmod which triggers this problem or not (e.g. 2.6.39, 3.0-rc1), 
 and analyze the differences between this two versions, or apply the 
 patches between this two versions by bisection method, finally find the key 
 patches.
 
 Any better ideas?
 
 Thanks,
 Zhang Haoyu

I've attempted to duplicate this on a number of machines that are as similar 
to yours as I am able to get my hands on, and so far have not been able to see 
any performance degradation. And from what I've read in the above links, huge 
pages do not seem to be part of the problem.

So, if you are in a position to bisect the kernel changes, that would probably 
be the best avenue to pursue in my opinion.

Bruce

I found the first bad commit([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4] KVM: 
propagate fault r/w information to gup(), allow read-only memory) which 
triggers this problem 
by git bisecting the kvm kernel (download from 
https://git.kernel.org/pub/scm/virt/kvm/kvm.git) changes.

And, 
git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
git diff 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff

Then, I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log and 
612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff, 
came to a conclusion that all of the differences between 
612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and 
612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
are contributed by no other than 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4, so 
this commit is the peace-breaker which directly or indirectly causes the 
degradation.

Does the map_writable flag passed to the mmu_set_spte() function have an effect on 
the PTE's PAT flag, or increase the VMEXITs induced by the guest trying to write 
read-only memory?

Thanks,
Zhang Haoyu





[Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-07-11 Thread Zhanghaoyu (A)
hi all,

I met similar problem to these, while performing live migration or save-restore 
test on the kvm platform (qemu:1.4.0, host:suse11sp2, guest:suse11sp2), running 
tele-communication software suite in guest,
https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
https://bugzilla.kernel.org/show_bug.cgi?id=58771

After live migration or virsh restore [savefile], one process's CPU utilization 
went up by about 30%, resulting in throughput degradation of this process.
oprofile report on this process in guest,
pre live migration:
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        app name         symbol name
248      12.3016  no-vmlinux       (no symbols)
78        3.8690  libc.so.6        memset
68        3.3730  libc.so.6        memcpy
30        1.4881  cscf.scu         SipMmBufMemAlloc
29        1.4385  libpthread.so.0  pthread_mutex_lock
26        1.2897  cscf.scu         SipApiGetNextIe
25        1.2401  cscf.scu         DBFI_DATA_Search
20        0.9921  libpthread.so.0  __pthread_mutex_unlock_usercnt
16        0.7937  cscf.scu         DLM_FreeSlice
16        0.7937  cscf.scu         receivemessage
15        0.7440  cscf.scu         SipSmCopyString
14        0.6944  cscf.scu         DLM_AllocSlice

post live migration:
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        app name         symbol name
1586     42.2370  libc.so.6        memcpy
271       7.2170  no-vmlinux       (no symbols)
83        2.2104  libc.so.6        memset
41        1.0919  libpthread.so.0  __pthread_mutex_unlock_usercnt
35        0.9321  cscf.scu         SipMmBufMemAlloc
29        0.7723  cscf.scu         DLM_AllocSlice
28        0.7457  libpthread.so.0  pthread_mutex_lock
23        0.6125  cscf.scu         SipApiGetNextIe
17        0.4527  cscf.scu         SipSmCopyString
16        0.4261  cscf.scu         receivemessage
15        0.3995  cscf.scu         SipcMsgStatHandle
14        0.3728  cscf.scu         Urilex
12        0.3196  cscf.scu         DBFI_DATA_Search
12        0.3196  cscf.scu         SipDsmGetHdrBitValInner
12        0.3196  cscf.scu         SipSmGetDataFromRefString

So, memcpy costs many more cpu cycles after live migration. Then I restarted the 
process, and this problem disappeared. save-restore has a similar problem.

perf report on vcpu thread in host,
pre live migration:
Performance counter stats for thread id '21082':

 0 page-faults
 0 minor-faults
 0 major-faults
 31616 cs
   506 migrations
 0 alignment-faults
 0 emulation-faults
5075957539 L1-dcache-loads  
[21.32%]
 324685106 L1-dcache-load-misses #6.40% of all L1-dcache hits   
[21.85%]
3681777120 L1-dcache-stores 
[21.65%]
  65251823 L1-dcache-store-misses# 1.77%
   [22.78%]
 0 L1-dcache-prefetches 
[22.84%]
 0 L1-dcache-prefetch-misses
[22.32%]
9321652613 L1-icache-loads  
[22.60%]
1353418869 L1-icache-load-misses #   14.52% of all L1-icache hits   
[21.92%]
 169126969 LLC-loads
[21.87%]
  12583605 LLC-load-misses   #7.44% of all LL-cache hits
[ 5.84%]
 132853447 LLC-stores   
[ 6.61%]
  10601171 LLC-store-misses  #7.9%  
 [ 5.01%]
  25309497 LLC-prefetches #30%  
[ 4.96%]
   7723198 LLC-prefetch-misses  
[ 6.04%]
4954075817 dTLB-loads   
[11.56%]
  26753106 dTLB-load-misses  #0.54% of all dTLB cache hits  
[16.80%]
3553702874 dTLB-stores  
[22.37%]
   4720313 dTLB-store-misses#0.13%  
  [21.46%]
 not counted dTLB-prefetches
 not counted dTLB-prefetch-misses

  60.000920666 seconds time elapsed

post live migration:
Performance counter stats for thread id '1579':

 0 page-faults  
[100.00%]
 0 

Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-07-11 Thread Zhanghaoyu (A)
 Hi,
 
 Am 11.07.2013 11:36, schrieb Zhanghaoyu (A):
  I met similar problem to these, while performing live migration or
 save-restore test on the kvm platform (qemu:1.4.0, host:suse11sp2,
 guest:suse11sp2), running tele-communication software suite in guest,
  https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
  http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
  http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
  https://bugzilla.kernel.org/show_bug.cgi?id=58771
 
  After live migration or virsh restore [savefile], one process's CPU
 utilization went up by about 30%, resulted in throughput degradation of
 this process.
  oprofile report on this process in guest,
  pre live migration:
 
 So far we've been unable to reproduce this with a pure qemu-kvm /
 qemu-system-x86_64 command line on several EPT machines, whereas for
 virsh it was reported as confirmed. Can you please share the resulting
 QEMU command line from libvirt logs or process list?
qemu command line from /var/log/libvirt/qemu/[domain].log, 
LC_ALL=C 
PATH=/sbin:/usr/sbin:/usr/local/sbin:/root/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin
 HOME=/root USER=root LOGNAME=root QEMU_AUDIO_DRV=none 
/usr/local/bin/qemu-system-x86_64 -name CSC2 -S -M pc-0.12 -cpu qemu32 
-enable-kvm -m 12288 -smp 4,sockets=4,cores=1,threads=1 -uuid 
76e03575-a3ad-589a-e039-40160274bb97 -no-user-config -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/CSC2.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime 
-no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
file=/opt/ne/vm/CSC2.img,if=none,id=drive-virtio-disk0,format=raw,cache=none 
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=22 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:01,bus=pci.0,addr=0x3,bootindex=2
 -netdev tap,fd=23,id=hostnet1,vhost=on,vhostfd=24 -device 
virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:01,bus=pci.0,addr=0x4 
-netdev tap,fd=25,id=hostnet2,vhost=on,vhostfd=26 -device 
virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:01,bus=pci.0,addr=0x5 
-netdev tap,fd=27,id=hostnet3,vhost=on,vhostfd=28 -device 
virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:01,bus=pci.0,addr=0x6 
-netdev tap,fd=29,id=hostnet4,vhost=on,vhostfd=30 -device 
virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:01,bus=pci.0,addr=0x7 
-netdev tap,fd=31,id=hostnet5,vhost=on,vhostfd=32 -device 
virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:01,bus=pci.0,addr=0x9 
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
-vnc *:1 -k en-us -vga cirrus -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb 
-watchdog-action poweroff -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa
 
 Are both host and guest kernel at 3.0.80 (latest SLES updates)?
No, both host and guest are just raw sles11-sp2-64-GM, kernel version: 
3.0.13-0.27.

Thanks,
Zhang Haoyu
 
 Thanks,
 Andreas
 
 --
 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
 GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



Re: [Qemu-devel] meaningless to compare irqfd's msi message with new msi message in virtio_pci_vq_vector_unmask

2013-07-04 Thread Zhanghaoyu (A)
 I searched vector_irqfd globally,  no place found to set/change irqfd's msi 
 message, only irqfd's virq or users member may be changed in 
 kvm_virtio_pci_vq_vector_use, kvm_virtio_pci_vq_vector_release, etc.
 So I think it's meaningless to do below check in virtio_pci_vq_vector_unmask, 
 if (irqfd->msg.data != msg.data || irqfd->msg.address != msg.address)
 
 And, I think the comparison between the old msi message and the new msi message 
 should be performed in kvm_update_routing_entry; the raw patch is shown below,
 Signed-off-by: Zhang Haoyu haoyu.zh...@huawei.com
 Signed-off-by: Zhang Huanzhong zhanghuanzh...@huawei.com
 ---
  hw/virtio/virtio-pci.c |8 +++-
  kvm-all.c  |5 +
  2 files changed, 8 insertions(+), 5 deletions(-)
 
 diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
 index b070b64..e4829a3 100644
 --- a/hw/virtio/virtio-pci.c
 +++ b/hw/virtio/virtio-pci.c
 @@ -613,11 +613,9 @@ static int virtio_pci_vq_vector_unmask(VirtIOPCIProxy *proxy,
  
      if (proxy->vector_irqfd) {
          irqfd = &proxy->vector_irqfd[vector];
 -        if (irqfd->msg.data != msg.data || irqfd->msg.address != msg.address) {
 -            ret = kvm_irqchip_update_msi_route(kvm_state, irqfd->virq, msg);
 -            if (ret < 0) {
 -                return ret;
 -            }
 +        ret = kvm_irqchip_update_msi_route(kvm_state, irqfd->virq, msg);
 +        if (ret < 0) {
 +            return ret;
          }
      }
  
 diff --git a/kvm-all.c b/kvm-all.c
 index e6b262f..63a33b4 100644
 --- a/kvm-all.c
 +++ b/kvm-all.c
 @@ -1034,6 +1034,11 @@ static int kvm_update_routing_entry(KVMState *s,
              continue;
          }
  
 +        if (entry->type == new_entry->type &&
 +            entry->flags == new_entry->flags &&
 +            !memcmp(&entry->u, &new_entry->u, sizeof(entry->u))) {
 +            return 0;
 +        }
          entry->type = new_entry->type;
          entry->flags = new_entry->flags;
          entry->u = new_entry->u;
 --
 1.7.3.1.msysgit.0
 
 
 This patch works for both virtio-pci devices and pci-passthrough devices.
 MST and I discussed this patch before; it avoids meaninglessly updating the 
 routing entry in the kvm hypervisor when the new msi message is identical to 
 the old msi message, which gains a lot in some cases, for example when an old 
 linux guest (e.g., rhel-5.5) frequently masks/unmasks the per-vector masking 
 control bit in its ISR.
 At MST's request, the numbers will be provided later.

I started a VM (rhel-5.5) with a direct-assigned intel 82599 VF. Then I ran an 
iperf client on the VM and an iperf server on the host where the VM resides, so 
traffic between the VM and the host was switched inside the 82599 NIC. The 
throughput comparison with and without the above patch applied is shown below,
before this patch applied:
[ID]   Interval        Transfer      Bandwidth
[SUM]  0.0-10.1 sec    96.5 MBytes   80.1 Mbits/sec
after this patch applied:
[ID]   Interval        Transfer      Bandwidth
[SUM]  0.0-10.0 sec    10.9 GBytes   9.37 Gbits/sec

Then I ran a netperf client on the VM and a netperf server on the host where the VM 
resides; the commands are shown below
netperf-client: netperf -H [host ip] -l 120 -t TCP_RR -- -m 1024 -r 32,1024
netperf-server: netserver
The transaction rate comparison with and without the above patch applied is shown 
below,
before this patch applied:
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
Bytes  Bytes  bytes    bytes   secs.    per sec
16384  87380  32       1024    120.01   36.61
65536  87380
after this patch applied:
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
Bytes  Bytes  bytes    bytes   secs.    per sec
16384  87380  32       1024    120.01   7464.89
65536  87380

 Thanks,
 Zhang Haoyu



[Qemu-devel] [PATCH] migration: add timeout option for tcp migration send/receive socket

2013-06-29 Thread Zhanghaoyu (A)
When a network disconnection occurs during live migration, the migration thread 
gets stuck in the function sendmsg(), because the migration socket is currently 
in blocking (not O_NONBLOCK) mode.
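
For illustration only (not part of the patch below; the helper name is 
hypothetical): with SO_SNDTIMEO set, a blocking send() returns -1 with errno 
EAGAIN/EWOULDBLOCK once the timeout expires instead of hanging forever, which is 
the mechanism the patch relies on.

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

/* hypothetical helper: give a blocking socket a bounded send time */
static ssize_t send_with_timeout(int fd, const void *buf, size_t len, int seconds)
{
    struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };

    if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) < 0) {
        return -1;
    }
    /* fails with EAGAIN/EWOULDBLOCK after 'seconds' without progress */
    return send(fd, buf, len, 0);
}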

Signed-off-by: Zeng Junliang zengjunli...@huawei.com
---
 include/migration/migration.h |4 
 migration-tcp.c   |   23 ++-
 2 files changed, 26 insertions(+), 1 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index f0640e0..1a56248 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -23,6 +23,8 @@
 #include "qapi-types.h"
 #include "exec/cpu-common.h"
 
+#define QEMU_MIGRATE_SOCKET_OP_TIMEOUT 60
+
 struct MigrationParams {
 bool blk;
 bool shared;
@@ -109,6 +111,8 @@ uint64_t xbzrle_mig_pages_transferred(void);
 uint64_t xbzrle_mig_pages_overflow(void);
 uint64_t xbzrle_mig_pages_cache_miss(void);
 
+int tcp_migration_set_socket_timeout(int fd, int optname, int timeout_in_sec);
+
 /**
  * @migrate_add_blocker - prevent migration from proceeding
  *
diff --git a/migration-tcp.c b/migration-tcp.c
index b20ee58..860238b 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -29,11 +29,28 @@
 do { } while (0)
 #endif
 
+int tcp_migration_set_socket_timeout(int fd, int optname, int timeout_in_sec)
+{
+    struct timeval timeout;
+    int ret = 0;
+
+    if (fd < 0 || timeout_in_sec < 0 ||
+        (optname != SO_RCVTIMEO && optname != SO_SNDTIMEO))
+        return -1;
+
+    timeout.tv_sec = timeout_in_sec;
+    timeout.tv_usec = 0;
+
+    ret = qemu_setsockopt(fd, SOL_SOCKET, optname, &timeout, sizeof(timeout));
+
+    return ret;
+}
+
 static void tcp_wait_for_connect(int fd, void *opaque)
 {
 MigrationState *s = opaque;
 
-    if (fd < 0) {
+    if (tcp_migration_set_socket_timeout(fd, SO_SNDTIMEO, QEMU_MIGRATE_SOCKET_OP_TIMEOUT) < 0) {
         DPRINTF("migrate connect error\n");
         s->file = NULL;
         migrate_fd_error(s);
@@ -76,6 +93,10 @@ static void tcp_accept_incoming_migration(void *opaque)
 goto out;
 }
 
+    if (tcp_migration_set_socket_timeout(c, SO_RCVTIMEO, QEMU_MIGRATE_SOCKET_OP_TIMEOUT) < 0) {
+        fprintf(stderr, "set tcp migration socket receive timeout error\n");
+        goto out;
+    }
 process_incoming_migration(f);
 return;
 
-- 
1.7.3.1.msysgit.0



[Qemu-devel] [PATCH] migration: add timeout option for tcp migraion send/receive socket

2013-06-28 Thread Zhanghaoyu (A)
When a network disconnection occurs during live migration, the migration thread 
gets stuck in the function sendmsg(), because the migration socket is currently 
in blocking (not O_NONBLOCK) mode.

Signed-off-by: Zeng Junliang zengjunli...@huawei.com
---
 include/migration/migration.h |4 
 migration-tcp.c   |   23 ++-
 2 files changed, 26 insertions(+), 1 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index f0640e0..1a56248 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -23,6 +23,8 @@
 #include "qapi-types.h"
 #include "exec/cpu-common.h"
 
+#define QEMU_MIGRATE_SOCKET_OP_TIMEOUT 60
+
 struct MigrationParams {
 bool blk;
 bool shared;
@@ -109,6 +111,8 @@ uint64_t xbzrle_mig_pages_transferred(void);
 uint64_t xbzrle_mig_pages_overflow(void);
 uint64_t xbzrle_mig_pages_cache_miss(void);
 
+int tcp_migration_set_socket_timeout(int fd, int optname, int timeout_in_sec);
+
 /**
  * @migrate_add_blocker - prevent migration from proceeding
  *
diff --git a/migration-tcp.c b/migration-tcp.c
index b20ee58..391db0a 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -33,7 +33,7 @@ static void tcp_wait_for_connect(int fd, void *opaque)
 {
 MigrationState *s = opaque;
 
-    if (fd < 0) {
+    if (tcp_migration_set_socket_timeout(fd, SO_SNDTIMEO, QEMU_MIGRATE_SOCKET_OP_TIMEOUT) < 0) {
         DPRINTF("migrate connect error\n");
         s->file = NULL;
         migrate_fd_error(s);
@@ -76,6 +76,10 @@ static void tcp_accept_incoming_migration(void *opaque)
 goto out;
 }
 
+    if (tcp_migration_set_socket_timeout(c, SO_RCVTIMEO, QEMU_MIGRATE_SOCKET_OP_TIMEOUT) < 0) {
+        fprintf(stderr, "set tcp migration socket receive timeout error\n");
+        goto out;
+    }
 process_incoming_migration(f);
 return;
 
@@ -95,3 +99,20 @@ void tcp_start_incoming_migration(const char *host_port, Error **errp)
 qemu_set_fd_handler2(s, NULL, tcp_accept_incoming_migration, NULL,
  (void *)(intptr_t)s);
 }
+
+int tcp_migration_set_socket_timeout(int fd, int optname, int timeout_in_sec)
+{
+    struct timeval timeout;
+    int ret = 0;
+
+    if (fd < 0 || timeout_in_sec < 0 ||
+        (optname != SO_RCVTIMEO && optname != SO_SNDTIMEO))
+        return -1;
+
+    timeout.tv_sec = timeout_in_sec;
+    timeout.tv_usec = 0;
+
+    ret = qemu_setsockopt(fd, SOL_SOCKET, optname, &timeout, sizeof(timeout));
+
+    return ret;
+}
\ No newline at end of file
-- 
1.7.3.1.msysgit.0



[Qemu-devel] meaningless to compare irqfd's msi message with new msi message in virtio_pci_vq_vector_unmask

2013-06-25 Thread Zhanghaoyu (A)
I searched for vector_irqfd globally; no place was found that sets or changes irqfd's 
msi message, only irqfd's virq or users member may be changed, in 
kvm_virtio_pci_vq_vector_use, kvm_virtio_pci_vq_vector_release, etc.
So I think it's meaningless to do the below check in virtio_pci_vq_vector_unmask,
if (irqfd->msg.data != msg.data || irqfd->msg.address != msg.address)

And, I think the comparison between the old msi message and the new msi message should 
be performed in kvm_update_routing_entry; the raw patch is shown below,
Signed-off-by: Zhang Haoyu haoyu.zh...@huawei.com
Signed-off-by: Zhang Huanzhong zhanghuanzh...@huawei.com
---
 hw/virtio/virtio-pci.c |8 +++-
 kvm-all.c  |5 +
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index b070b64..e4829a3 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -613,11 +613,9 @@ static int virtio_pci_vq_vector_unmask(VirtIOPCIProxy *proxy,
 
     if (proxy->vector_irqfd) {
         irqfd = &proxy->vector_irqfd[vector];
-        if (irqfd->msg.data != msg.data || irqfd->msg.address != msg.address) {
-            ret = kvm_irqchip_update_msi_route(kvm_state, irqfd->virq, msg);
-            if (ret < 0) {
-                return ret;
-            }
+        ret = kvm_irqchip_update_msi_route(kvm_state, irqfd->virq, msg);
+        if (ret < 0) {
+            return ret;
         }
     }
 
diff --git a/kvm-all.c b/kvm-all.c
index e6b262f..63a33b4 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1034,6 +1034,11 @@ static int kvm_update_routing_entry(KVMState *s,
             continue;
         }
 
+        if (entry->type == new_entry->type &&
+            entry->flags == new_entry->flags &&
+            !memcmp(&entry->u, &new_entry->u, sizeof(entry->u))) {
+            return 0;
+        }
         entry->type = new_entry->type;
         entry->flags = new_entry->flags;
         entry->u = new_entry->u;
-- 
1.7.3.1.msysgit.0


This patch works for both virtio-pci devices and pci-passthrough devices.
MST and I discussed this patch before; it avoids meaninglessly updating the routing 
entry in the kvm hypervisor when the new msi message is identical to the old msi 
message, which gains a lot in some cases, for example when an old linux guest 
(e.g., rhel-5.5) frequently masks/unmasks the per-vector masking control bit in its ISR.
At MST's request, the numbers will be provided later.

Thanks,
Zhang Haoyu



Re: [Qemu-devel] [PATCH] [KVM] Needless to update msi route when only msi-x entry control section changed

2013-05-06 Thread Zhanghaoyu (A)
  With regard to old version linux guest(e.g., rhel-5.5), in ISR 
  processing, mask and unmask msi-x vector every time, which result in 
  VMEXIT, then QEMU will invoke kvm_irqchip_update_msi_route() to ask KVM 
  hypervisor to update the VM irq routing table. In KVM hypervisor, 
  synchronizing RCU needed after updating routing table, so much time 
  consumed for waiting in wait_rcu_gp(). So CPU usage in VM is so high, 
  while from the view of host, VM's total CPU usage is so low. 
  Masking/unmasking msi-x vector only set msi-x entry control section, 
  needless to update VM irq routing table.
  
  Signed-off-by: Zhang Haoyu haoyu.zh...@huawei.com
  Signed-off-by: Huang Weidong weidong.hu...@huawei.com
  Signed-off-by: Qin Chuanyu qinchua...@huawei.com
  ---
  hw/i386/kvm/pci-assign.c | 3 +++
  1 files changed, 3 insertions(+)
  
  --- a/hw/i386/kvm/pci-assign.c  2013-05-04 15:53:18.0 +0800
  +++ b/hw/i386/kvm/pci-assign.c  2013-05-04 15:50:46.0 +0800
  @@ -1576,6 +1576,8 @@ static void assigned_dev_msix_mmio_write
   MSIMessage msg;
   int ret;
  
   +/* Needless to update msi route when only msi-x entry control section changed */
   +if ((addr & (PCI_MSIX_ENTRY_SIZE - 1)) != PCI_MSIX_ENTRY_VECTOR_CTRL) {
    msg.address = entry->addr_lo |
        ((uint64_t)entry->addr_hi << 32);
    msg.data = entry->data;
   @@ -1585,6 +1587,7 @@ static void assigned_dev_msix_mmio_write
    if (ret) {
        error_report("Error updating irq routing entry (%d)", ret);
    }
   +}
    }
    }
    }
  
  Thanks,
  Zhang Haoyu
 
 
 If guest wants to update the vector, it does it like this:
 mask
 update
 unmask
 and it looks like the only point where we update the vector is on unmask, 
 so this patch will mean we don't update the vector ever.
 
 I'm not sure this combination (old guest + legacy device assignment
 framework) is worth optimizing. Can you try VFIO instead?
 
 But if it is, the right way to do this is probably along the lines of the 
 below patch. Want to try it out?
 
 diff --git a/kvm-all.c b/kvm-all.c
 index 2d92721..afe2327 100644
 --- a/kvm-all.c
 +++ b/kvm-all.c
 @@ -1006,6 +1006,11 @@ static int kvm_update_routing_entry(KVMState *s,
  continue;
  }
  
  +        if (entry->type == new_entry->type &&
  +            entry->flags == new_entry->flags &&
  +            entry->u == new_entry->u) {
  +            return 0;
  +        }
           entry->type = new_entry->type;
           entry->flags = new_entry->flags;
           entry->u = new_entry->u;
 
 
  A union type cannot be directly compared with ==, so I tried out the below patch 
  instead,
 --- a/kvm-all.c 2013-05-06 09:56:38.0 +0800
 +++ b/kvm-all.c 2013-05-06 09:56:45.0 +0800
 @@ -1008,6 +1008,12 @@ static int kvm_update_routing_entry(KVMS
  continue;
  }
 
  +        if (entry->type == new_entry->type &&
  +            entry->flags == new_entry->flags &&
  +            !memcmp(&entry->u, &new_entry->u, sizeof(entry->u))) {
  +            return 0;
  +        }
  +
           entry->type = new_entry->type;
           entry->flags = new_entry->flags;
           entry->u = new_entry->u;
 
  MST's patch is more general than my first patch, which was confined to 
  assigned_dev_msix_mmio_write().
  In the case where a section of the msi-x entry other than the control section is 
  written with a value identical to the old entry's, MST's patch also works.
  MST's patch also works in the non-passthrough scenario.

Any numbers for either case?

I'm not sure exactly what you mean. 
Do you want me to make a further comparison between the above two patches?
If yes, no other comments.

 And, after MST's patch applied, the below check in 
 virtio_pci_vq_vector_unmask() can be removed.
 --- a/hw/virtio/virtio-pci.c2013-05-04 15:53:20.0 +0800
 +++ b/hw/virtio/virtio-pci.c2013-05-06 10:25:58.0 +0800
 @@ -619,12 +619,10 @@ static int virtio_pci_vq_vector_unmask(V
 
      if (proxy->vector_irqfd) {
          irqfd = &proxy->vector_irqfd[vector];
  -        if (irqfd->msg.data != msg.data || irqfd->msg.address != msg.address) {
           ret = kvm_irqchip_update_msi_route(kvm_state, irqfd->virq, msg);
           if (ret < 0) {
               return ret;
           }
  -        }
  }
 
  /* If guest supports masking, irqfd is already setup, unmask it.
 
 Thanks,
 Zhang Haoyu



Re: [Qemu-devel] [PATCH] [KVM] Needless to update msi route when only msi-x entry control section changed

2013-05-05 Thread Zhanghaoyu (A)
 With regard to old version linux guest(e.g., rhel-5.5), in ISR processing, 
 mask and unmask msi-x vector every time, which result in VMEXIT, then QEMU 
 will invoke kvm_irqchip_update_msi_route() to ask KVM hypervisor to update 
 the VM irq routing table. In KVM hypervisor, synchronizing RCU needed after 
 updating routing table, so much time consumed for waiting in wait_rcu_gp(). 
 So CPU usage in VM is so high, while from the view of host, VM's total CPU 
 usage is so low. 
 Masking/unmasking msi-x vector only set msi-x entry control section, 
 needless to update VM irq routing table.
 
 Signed-off-by: Zhang Haoyu haoyu.zh...@huawei.com
 Signed-off-by: Huang Weidong weidong.hu...@huawei.com
 Signed-off-by: Qin Chuanyu qinchua...@huawei.com
 ---
 hw/i386/kvm/pci-assign.c | 3 +++
 1 files changed, 3 insertions(+)
 
 --- a/hw/i386/kvm/pci-assign.c  2013-05-04 15:53:18.0 +0800
 +++ b/hw/i386/kvm/pci-assign.c  2013-05-04 15:50:46.0 +0800
 @@ -1576,6 +1576,8 @@ static void assigned_dev_msix_mmio_write
  MSIMessage msg;
  int ret;
 
 +/* Needless to update msi route when only msi-x entry control section changed */
 +if ((addr & (PCI_MSIX_ENTRY_SIZE - 1)) != PCI_MSIX_ENTRY_VECTOR_CTRL) {
  msg.address = entry->addr_lo |
      ((uint64_t)entry->addr_hi << 32);
  msg.data = entry->data;
 @@ -1585,6 +1587,7 @@ static void assigned_dev_msix_mmio_write
  if (ret) {
      error_report("Error updating irq routing entry (%d)", ret);
  }
 +}
  }
  }
  }
 
 Thanks,
 Zhang Haoyu


If guest wants to update the vector, it does it like this:
mask
update
unmask
and it looks like the only point where we update the vector is on unmask, so 
this patch will mean we don't update the vector ever.

I'm not sure this combination (old guest + legacy device assignment
framework) is worth optimizing. Can you try VFIO instead?

But if it is, the right way to do this is probably along the lines of the 
below patch. Want to try it out?

diff --git a/kvm-all.c b/kvm-all.c
index 2d92721..afe2327 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1006,6 +1006,11 @@ static int kvm_update_routing_entry(KVMState *s,
             continue;
         }
 
+        if (entry->type == new_entry->type &&
+            entry->flags == new_entry->flags &&
+            entry->u == new_entry->u) {
+            return 0;
+        }
         entry->type = new_entry->type;
         entry->flags = new_entry->flags;
         entry->u = new_entry->u;


A union type cannot be directly compared with ==, so I tried out the below patch instead,
--- a/kvm-all.c 2013-05-06 09:56:38.0 +0800
+++ b/kvm-all.c 2013-05-06 09:56:45.0 +0800
@@ -1008,6 +1008,12 @@ static int kvm_update_routing_entry(KVMS
             continue;
         }
 
+        if (entry->type == new_entry->type &&
+            entry->flags == new_entry->flags &&
+            !memcmp(&entry->u, &new_entry->u, sizeof(entry->u))) {
+            return 0;
+        }
+
         entry->type = new_entry->type;
         entry->flags = new_entry->flags;
         entry->u = new_entry->u;
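
For context, a standalone illustration (names made up, not from either patch) of why 
memcmp is used here: C provides no == operator for union or struct values, so the 
routing entry's union member has to be compared bytewise.

#include <string.h>

/* hypothetical miniature of the routing entry's union payload */
union payload {
    int irq;
    struct { unsigned int address, data; } msi;
};

static int payload_equal(const union payload *a, const union payload *b)
{
    /* '*a == *b' does not compile for a union type; compare the raw bytes.
     * Note this also compares padding/unused bytes, which the patch above
     * accepts as well. */
    return memcmp(a, b, sizeof(*a)) == 0;
}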

MST's patch is more general than my first patch, which was confined to 
assigned_dev_msix_mmio_write().
In the case where a section of the msi-x entry other than the control section is 
written with a value identical to the old entry's, MST's patch also works.
MST's patch also works in the non-passthrough scenario.

And, after MST's patch applied, the below check in 
virtio_pci_vq_vector_unmask() can be removed.
--- a/hw/virtio/virtio-pci.c2013-05-04 15:53:20.0 +0800
+++ b/hw/virtio/virtio-pci.c2013-05-06 10:25:58.0 +0800
@@ -619,12 +619,10 @@ static int virtio_pci_vq_vector_unmask(V

     if (proxy->vector_irqfd) {
         irqfd = &proxy->vector_irqfd[vector];
-        if (irqfd->msg.data != msg.data || irqfd->msg.address != msg.address) {
         ret = kvm_irqchip_update_msi_route(kvm_state, irqfd->virq, msg);
         if (ret < 0) {
             return ret;
         }
-        }
 }

 /* If guest supports masking, irqfd is already setup, unmask it.

Thanks,
Zhang Haoyu



[Qemu-devel] [PATCH] [KVM] Needless to update msi route when only msi-x entry control section changed

2013-05-04 Thread Zhanghaoyu (A)
With regard to old-version linux guests (e.g., rhel-5.5), the ISR masks and unmasks the 
msi-x vector every time it runs, which results in a VMEXIT; QEMU then invokes 
kvm_irqchip_update_msi_route() to ask the KVM hypervisor to update the VM irq routing 
table. In the KVM hypervisor, an RCU synchronization is needed after updating the 
routing table, so much time is spent waiting in wait_rcu_gp(). As a result, CPU usage 
inside the VM is very high, while from the host's point of view the VM's total CPU 
usage is very low.
Masking/unmasking an msi-x vector only sets the msi-x entry's control section, so there 
is no need to update the VM irq routing table.
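
For reference, a sketch of the MSI-X table entry layout that this check relies on 
(offsets are per the PCI spec; the macro names for the first three offsets and the 
helper below are illustrative, not taken from the patch):

/* one MSI-X table entry is 16 bytes */
#define PCI_MSIX_ENTRY_SIZE         16
#define PCI_MSIX_ENTRY_LOWER_ADDR   0x0   /* message address, low 32 bits  */
#define PCI_MSIX_ENTRY_UPPER_ADDR   0x4   /* message address, high 32 bits */
#define PCI_MSIX_ENTRY_DATA         0x8   /* message data                  */
#define PCI_MSIX_ENTRY_VECTOR_CTRL  0xc   /* vector control (mask bit)     */

/* illustrative helper: a write whose offset within its entry is 0xc only
 * touches the mask bit, so the MSI address/data are unchanged and the irq
 * routing table does not need to be rebuilt */
static int touches_only_vector_ctrl(unsigned int addr)
{
    return (addr & (PCI_MSIX_ENTRY_SIZE - 1)) == PCI_MSIX_ENTRY_VECTOR_CTRL;
}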

Signed-off-by: Zhang Haoyu haoyu.zh...@huawei.com
Signed-off-by: Huang Weidong weidong.hu...@huawei.com
Signed-off-by: Qin Chuanyu qinchua...@huawei.com
---
hw/i386/kvm/pci-assign.c | 3 +++
1 files changed, 3 insertions(+)

--- a/hw/i386/kvm/pci-assign.c  2013-05-04 15:53:18.0 +0800
+++ b/hw/i386/kvm/pci-assign.c  2013-05-04 15:50:46.0 +0800
@@ -1576,6 +1576,8 @@ static void assigned_dev_msix_mmio_write
 MSIMessage msg;
 int ret;

+/* Needless to update msi route when only msi-x entry control section changed */
+if ((addr & (PCI_MSIX_ENTRY_SIZE - 1)) != PCI_MSIX_ENTRY_VECTOR_CTRL) {
 msg.address = entry->addr_lo |
     ((uint64_t)entry->addr_hi << 32);
 msg.data = entry->data;
@@ -1585,6 +1587,7 @@ static void assigned_dev_msix_mmio_write
 if (ret) {
     error_report("Error updating irq routing entry (%d)", ret);
 }
+}
 }
 }
 }

Thanks,
Zhang Haoyu



Re: [Qemu-devel] KVM VM(rhel-5.5) %si is too high when TX/RX packets

2013-05-04 Thread Zhanghaoyu (A)
 I running a VM(RHEL-5.5) on KVM hypervisor(linux-3.8 + QEMU-1.4.1), 
 and direct-assign intel 82576 VF to the VM. When TX/RX packets on VM to the 
 other host via iperf tool, top tool result on VM shown that the %si is too 
 high, approximately 95% ~ 100%, but from the view of host, the VM's total 
 CPU usage is about 20% - 30%. And the throughput rate is approximately 
 200Mb/s, far from the line rate 1Gb/s, And, I found  the hardirq rate is 
 lower than normal by running watch -d -n 1 cat /proc/interrupts, I think 
 it's caused by the too high %si, because the NIC's hardirq was disabled 
 during the softirq process.
 Then, I direct-assign the intel 82576 to the VM, the same case happened too. 
 I found the intel 82576 and intel 82576 VF's interrupt mode are both 
 PCI-MSI-X.
 
 And,
 I rmmod the igb driver, and, re-insmod the igb driver(igb-4.1.2) with the 
 parameter IntMode=0/1(0:legacy, 1:MSI, 2:MSI-x), the problem then gone, the 
 %si is approximately 20% -30%, and the throughput rate came to the line 
 rate, about 940Mb/s.
 I update the VM to RHEL-6.1, the problem disappeared too.
 And, I found a very strange thing, the VM's 82576VF's irq routing is set one 
 time on Vf's one interrupt received, so frequently.

RHEL 5.5 is a very old update.  Can you try RHEL 5.9?

In any case, this looks a lot like a bug in the version of the driver that was 
included in RHEL5.5; you should contact Red Hat support services if you can 
still reproduce it with the latest RHEL5 update.

Paolo

One patch has been proposed to QEMU, shown as below,

[PATCH] [KVM] Needless to update msi route when only msi-x entry control 
section changed
With regard to old-version linux guests (e.g., rhel-5.5), the ISR masks and unmasks the 
msi-x vector every time it runs, which results in a VMEXIT; QEMU then invokes 
kvm_irqchip_update_msi_route() to ask the KVM hypervisor to update the VM irq routing 
table. In the KVM hypervisor, an RCU synchronization is needed after updating the 
routing table, so much time is spent waiting in wait_rcu_gp(). As a result, CPU usage 
inside the VM is very high, while from the host's point of view the VM's total CPU 
usage is very low.
Masking/unmasking an msi-x vector only sets the msi-x entry's control section, so there 
is no need to update the VM irq routing table.

hw/i386/kvm/pci-assign.c | 3 +++
1 files changed, 3 insertions(+)

--- a/hw/i386/kvm/pci-assign.c  2013-05-04 15:53:18.0 +0800
+++ b/hw/i386/kvm/pci-assign.c  2013-05-04 15:50:46.0 +0800
@@ -1576,6 +1576,8 @@ static void assigned_dev_msix_mmio_write
 MSIMessage msg;
 int ret;

+/* Needless to update msi route when only msi-x entry control section changed */
+if ((addr & (PCI_MSIX_ENTRY_SIZE - 1)) != PCI_MSIX_ENTRY_VECTOR_CTRL) {
 msg.address = entry->addr_lo |
     ((uint64_t)entry->addr_hi << 32);
 msg.data = entry->data;
@@ -1585,6 +1587,7 @@ static void assigned_dev_msix_mmio_write
 if (ret) {
     error_report("Error updating irq routing entry (%d)", ret);
 }
+}
 }
 }
 }

Thanks,
Zhang Haoyu



Re: [Qemu-devel] KVM VM(rhel-5.5) %si is too high when TX/RX packets

2013-05-03 Thread Zhanghaoyu (A)
 I running a VM(RHEL-5.5) on KVM hypervisor(linux-3.8 + QEMU-1.4.1), and 
 direct-assign intel 82576 VF to the VM. When TX/RX packets on VM to the other 
 host via iperf tool, top tool result on VM shown that the %si is too high, 
 approximately 95% ~ 100%, but from the view of host, the VM's total CPU usage 
 is about 20% - 30%. And the throughput rate is approximately 200Mb/s, far 
 from the line rate 1Gb/s, And, I found  the hardirq rate is lower than normal 
 by running watch -d -n 1 cat /proc/interrupts, I think it's caused by the 
 too high %si, because the NIC's hardirq was disabled during the softirq 
 process.
 Then, I direct-assign the intel 82576 to the VM, the same case happened too. 
 I found the intel 82576 and intel 82576 VF's interrupt mode are both 
 PCI-MSI-X.
 
 And,
 I rmmod the igb driver, and, re-insmod the igb driver(igb-4.1.2) with the 
 parameter IntMode=0/1(0:legacy, 1:MSI, 2:MSI-x), the problem then gone, the 
 %si is approximately 20% -30%, and the throughput rate came to the line rate, 
 about 940Mb/s.
 I update the VM to RHEL-6.1, the problem disappeared too.
 And, I found a very strange thing, the VM's 82576VF's irq routing is set one 
 time on Vf's one interrupt received, so frequently.

With regard to rhel-5.5 (linux-2.6.18), the ISR invokes the msi-x mask/unmask function 
msi_set_mask_bit() every time it runs, which results in a VMEXIT; QEMU then invokes 
kvm_irqchip_update_msi_route() to ask the KVM hypervisor to update the VM irq routing 
table. In the KVM hypervisor, a synchronization is needed after updating the routing 
table, so much time is spent waiting in wait_rcu_gp().
So %si in the VM is very high, while from the host's point of view the VM's total CPU 
usage is very low.

Why is masking and unmasking the msi-x vector needed in the ISR every time?

Thanks,
Zhang Haoyu



[Qemu-devel] KVM VM(rhel-5.5) %si is too high when TX/RX packets

2013-05-02 Thread Zhanghaoyu (A)
I am running a VM (RHEL-5.5) on a KVM hypervisor (linux-3.8 + QEMU-1.4.1), with an 
intel 82576 VF directly assigned to the VM. When TX/RX-ing packets between the VM and 
the other host via the iperf tool, the top tool on the VM shows that %si is too high, 
approximately 95% ~ 100%, but from the host's point of view the VM's total CPU usage 
is only about 20% - 30%. The throughput rate is approximately 200Mb/s, far from the 
line rate of 1Gb/s.
And, I found the hardirq rate is lower than normal by running watch -d -n 1 cat 
/proc/interrupts; I think this is caused by the too-high %si, because the NIC's hardirq 
is disabled during softirq processing.
Then I directly assigned the intel 82576 (PF) to the VM, and the same problem happened. 
I found that the intel 82576 and the intel 82576 VF both use PCI-MSI-X interrupt mode.

And,
I rmmod'ed the igb driver and re-insmod'ed it (igb-4.1.2) with the parameter 
IntMode=0/1 (0: legacy, 1: MSI, 2: MSI-X); the problem was then gone, %si was 
approximately 20% - 30%, and the throughput rate reached the line rate, about 940Mb/s.
I updated the VM to RHEL-6.1 and the problem disappeared too.
And, I found a very strange thing: the VM's 82576 VF irq routing is set once per 
interrupt received from the VF, which is very frequent.

Thanks,
Zhang Haoyu



Re: [Qemu-devel] KVM VM(windows xp) reseted when running geekbench for about 2 days

2013-04-25 Thread Zhanghaoyu (A)
   On Thu, Apr 18, 2013 at 12:00:49PM +, Zhanghaoyu (A) wrote:
   I start 10 VMs(windows xp), then running geekbench tool on 
   them, about 2 days, one of them was reset, I found the reset 
   operation is done by int kvm_cpu_exec(CPUArchState *env) {
  ...
 switch (run-exit_reason)
 ...
  case KVM_EXIT_SHUTDOWN:
  DPRINTF(shutdown\n);
  qemu_system_reset_request();
  ret = EXCP_INTERRUPT;
  break;
  ...
   }
   
   KVM_EXIT_SHUTDOWN exit reason was set previously in triple fault 
   handle handle_triple_fault().
   
   How do you know that reset was done here? This is not the only 
   place where qemu_system_reset_request() is called.
  I used gdb to debug QEMU process, and add a breakpoint in 
  qemu_system_reset_request(), when the case occurred, backtrace 
  shown as below,
  (gdb) bt
  #0  qemu_system_reset_request () at vl.c:1964
  #1  0x7f9ef9dc5991 in kvm_cpu_exec (env=0x7f9efac47100)
  at /gt/qemu-kvm-1.4/qemu-kvm-1.4/kvm-all.c:1602
  #2  0x7f9ef9d5b229 in qemu_kvm_cpu_thread_fn (arg=0x7f9efac47100)
  at /gt/qemu-kvm-1.4/qemu-kvm-1.4/cpus.c:759
  #3  0x7f9ef898b5f0 in start_thread () from 
  /lib64/libpthread.so.0
  #4  0x7f9ef86fa84d in clone () from /lib64/libc.so.6
  #5  0x in ?? ()
  
  And, I add printk log in all places where KVM_EXIT_SHUTDOWN exit reason 
  is set, only handle_triple_fault() was called.
  
  Make sure XP is not set to auto-reset in case of BSOD. 
  No, winxp is not set to auto-reset in case of BSOD. No Winxp event log 
  reported.
  
  Best regards,
  Yan.
  
   
   What causes the triple fault?
   
   Are you asking what is triple fault or why it happened in your case?
  What I asked is why triple fault happened in my case.
   For the former see here: 
   http://en.wikipedia.org/wiki/Triple_fault
   For the later it is to late to tell after VM reset. You can run 
   QEMU with -no-reboot -no-shutdown. VM will pause instead of 
   rebooting and then you can examine what is going on.
  Great thanks, I'll run QEMU with -no-reboot -no-shutdown options, if VM 
  paused in my case, what should I examined?
  
 Register state info registers in the monitor for each vcpu. Code around 
 the instruction that faulted.
 
 I ran the QEMU with -no-reboot -no-shutdown options, the VM paused 
 When the case happened, then I info registers in QEMU monitor, shown as 
 below, CS =0008   00c09b00 DPL =0 CS32 [-RA]
 SS =0010   00c09300 DPL =0 DS   [-WA]
 DS =0023   00c0f300 DPL =3 DS   [-WA]
 FS =0030 ffdff000 1fff 00c09300 DPL =0 DS   [-WA]
 GS =   00c0
 LDT=   00c0
 TR =0028 80042000 20ab 8b00 DPL=0 TSS32-busy
 GDT= 8003f000 03ff
 IDT= 8003f400 07ff
 CR0=8001003b CR2=760d7fe4 CR3=002ec000 CR4=06f8 
 DR0= DR1= DR2= 
 DR3= DR6=0ff0 DR7=0400 
 EFER=0800 FCW=027f FSW= [ST=0] FTW=00 MXCSR=1f80 
 FPR0=  FPR1=  
 FPR2=  FPR3=  
 FPR4=  FPR5=  
 FPR6=  FPR7=  
 XMM00= 
 XMM01=
 XMM02= 
 XMM03=
 XMM04= 
 XMM05=
 XMM06= 
 XMM07=
 
 In normal case, info registers in QEMU monitor, shown as below CS 
 =001b   00c0fb00 DPL=3 CS32 [-RA]
 SS =0023   00c0f300 DPL=3 DS   [-WA]
 DS =0023   00c0f300 DPL=3 DS   [-WA]
 FS =0038 7ffda000 0fff 0040f300 DPL=3 DS   [-WA]
 GS =   0100
 LDT=   
 TR =0028 80042000 20ab 8b00 DPL=0 TSS32-busy
 GDT= 8003f000 03ff
 IDT= 8003f400 07ff
 CR0=80010031 CR2=0167fd20 CR3=0af00220 CR4=06f8 
 DR0= DR1= DR2= 
 DR3= DR6=0ff0 DR7=0400 
 EFER=0800 FCW=027f FSW= [ST=0] FTW=00 MXCSR=1f80
 FPR0=00a400a40a18 d830 FPR1=0012f9c07c90e900 e900 
 FPR2=7c910202 5d40 FPR3=01e27c903400 f808 
 FPR4=05230012f87a  FPR5=7c905d40 0001 
 FPR6=0001  FPR7=a9dfde00 4018 
 XMM00=7c917d9a0012f8d47c90 
 XMM01=0012f8740012f8740012f87a7c90
 XMM02=7c917de97c97b1787c917e3f0012f87a 
 XMM03=0012fa687c80901a0012f9186cfd
 XMM04=7c9102027c9034007c9102087c90e900 
 XMM05=000c7c900012f9907c91017b
 XMM06=9a400012f8780012f878 
 XMM07=6365446c74527c91340500241f18
 
 N.B. in two cases, CS DPL, SS DPL, FS DPL, FPR, XMM, FSW, ST, FTW

Re: [Qemu-devel] KVM VM(windows xp) reseted when running geekbench for about 2 days

2013-04-23 Thread Zhanghaoyu (A)
  On Thu, Apr 18, 2013 at 12:00:49PM +, Zhanghaoyu (A) wrote:
  I start 10 VMs(windows xp), then running geekbench tool on them, 
  about 2 days, one of them was reset, I found the reset operation 
  is done by int kvm_cpu_exec(CPUArchState *env) {
 ...
switch (run-exit_reason)
...
 case KVM_EXIT_SHUTDOWN:
 DPRINTF(shutdown\n);
 qemu_system_reset_request();
 ret = EXCP_INTERRUPT;
 break;
 ...
  }
  
  KVM_EXIT_SHUTDOWN exit reason was set previously in triple fault handle 
  handle_triple_fault().
  
  How do you know that reset was done here? This is not the only 
  place where qemu_system_reset_request() is called.
 I used gdb to debug QEMU process, and add a breakpoint in 
 qemu_system_reset_request(), when the case occurred, backtrace shown 
 as below,
 (gdb) bt
 #0  qemu_system_reset_request () at vl.c:1964
 #1  0x7f9ef9dc5991 in kvm_cpu_exec (env=0x7f9efac47100)
 at /gt/qemu-kvm-1.4/qemu-kvm-1.4/kvm-all.c:1602
 #2  0x7f9ef9d5b229 in qemu_kvm_cpu_thread_fn (arg=0x7f9efac47100)
 at /gt/qemu-kvm-1.4/qemu-kvm-1.4/cpus.c:759
 #3  0x7f9ef898b5f0 in start_thread () from /lib64/libpthread.so.0
 #4  0x7f9ef86fa84d in clone () from /lib64/libc.so.6
 #5  0x in ?? ()
 
 And, I add printk log in all places where KVM_EXIT_SHUTDOWN exit reason is 
 set, only handle_triple_fault() was called.
 
 Make sure XP is not set to auto-reset in case of BSOD. 
 No, winxp is not set to auto-reset in case of BSOD. No Winxp event log 
 reported.
 
 Best regards,
 Yan.
 
  
  What causes the triple fault?
  
  Are you asking what is triple fault or why it happened in your case?
 What I asked is why triple fault happened in my case.
  For the former see here: http://en.wikipedia.org/wiki/Triple_fault
  For the later it is to late to tell after VM reset. You can run 
  QEMU with -no-reboot -no-shutdown. VM will pause instead of 
  rebooting and then you can examine what is going on.
 Great thanks, I'll run QEMU with -no-reboot -no-shutdown options, if VM 
 paused in my case, what should I examined?
 
Register state ("info registers" in the monitor) for each vcpu. Code around the 
instruction that faulted.

I ran QEMU with the -no-reboot -no-shutdown options; the VM paused when the case 
happened, then I ran "info registers" in the QEMU monitor, shown as below,
CS =0008   00c09b00 DPL =0 CS32 [-RA]
SS =0010   00c09300 DPL =0 DS   [-WA]
DS =0023   00c0f300 DPL =3 DS   [-WA]
FS =0030 ffdff000 1fff 00c09300 DPL =0 DS   [-WA]
GS =   00c0
LDT=   00c0
TR =0028 80042000 20ab 8b00 DPL=0 TSS32-busy
GDT= 8003f000 03ff
IDT= 8003f400 07ff
CR0=8001003b CR2=760d7fe4 CR3=002ec000 CR4=06f8
DR0= DR1= DR2= 
DR3=
DR6=0ff0 DR7=0400
EFER=0800
FCW=027f FSW= [ST=0] FTW=00 MXCSR=1f80
FPR0=  FPR1= 
FPR2=  FPR3= 
FPR4=  FPR5= 
FPR6=  FPR7= 
XMM00= XMM01=
XMM02= XMM03=
XMM04= XMM05=
XMM06= XMM07=

In normal case, info registers in QEMU monitor, shown as below
CS =001b   00c0fb00 DPL=3 CS32 [-RA]
SS =0023   00c0f300 DPL=3 DS   [-WA]
DS =0023   00c0f300 DPL=3 DS   [-WA]
FS =0038 7ffda000 0fff 0040f300 DPL=3 DS   [-WA]
GS =   0100
LDT=   
TR =0028 80042000 20ab 8b00 DPL=0 TSS32-busy
GDT= 8003f000 03ff
IDT= 8003f400 07ff
CR0=80010031 CR2=0167fd20 CR3=0af00220 CR4=06f8
DR0= DR1= DR2= 
DR3=
DR6=0ff0 DR7=0400
EFER=0800
FCW=027f FSW= [ST=0] FTW=00 MXCSR=1f80
FPR0=00a400a40a18 d830 FPR1=0012f9c07c90e900 e900
FPR2=7c910202 5d40 FPR3=01e27c903400 f808
FPR4=05230012f87a  FPR5=7c905d40 0001
FPR6=0001  FPR7=a9dfde00 4018
XMM00=7c917d9a0012f8d47c90 XMM01=0012f8740012f8740012f87a7c90
XMM02=7c917de97c97b1787c917e3f0012f87a XMM03=0012fa687c80901a0012f9186cfd
XMM04=7c9102027c9034007c9102087c90e900 XMM05=000c7c900012f9907c91017b
XMM06=9a400012f8780012f878 XMM07=6365446c74527c91340500241f18

N.B. in two cases, CS DPL, SS DPL, FS DPL, FPR, XMM, FSW, ST, FTW values are 
quite distinct.

Thanks,
Zhang Haoyu



[Qemu-devel] reply: reply: reply: qemu crashed when starting vm(kvm) with vnc connect

2013-04-18 Thread Zhanghaoyu (A)
  On Mon, Apr 08, 2013 at 12:27:06PM +, Zhanghaoyu (A) wrote:
  On Sun, Apr 07, 2013 at 04:58:07AM +, Zhanghaoyu (A) wrote:
  I start a kvm VM with vnc(using the zrle protocol) connect, sometimes 
  qemu program crashed during starting period, received signal SIGABRT.
  Trying about 20 times, this crash may be reproduced.
  I guess the cause memory corruption or double free.
 
  Which version of QEMU are you running?
 
  Please try qemu.git/master.
 
 Please try again with latest master, might be fixed meanwhile.
 
 If it still happens pleas provide full qemu and vnc client command lines.
 
  backtrace from core file is shown as below:
 
  Program received signal SIGABRT, Aborted.
 
  #8  0x7f32efd26d07 in vnc_disconnect_finish (vs=0x7f32f0c762d0)
  at ui/vnc.c:1050
 
 Do you have a vnc client connected?  Do you close it?
 
I have a vnc client connected; it was closed automatically when qemu crashed.

 Any errors reported by the vnc client (maybe it disconnects due to an error 
 in the data stream)?
 
No errors reported by the vnc client, just popup a reconnect window.

And, I have tried to fix this bug; the crash was not reproduced after about 100 tries. 
The patch is shown below,
--- a/ui/vnc-jobs.c 2013-04-18 20:10:07.0 +0800
+++ b/ui/vnc-jobs.c 2013-04-18 20:14:06.0 +0800
@@ -234,7 +234,6 @@ static int vnc_worker_thread_loop(VncJob
         vnc_unlock_output(job->vs);
         goto disconnected;
     }
-    vnc_unlock_output(job->vs);
 
     /* Make a local copy of vs and switch output buffers */
     vnc_async_encoding_start(job->vs, &vs);
@@ -252,6 +251,8 @@ static int vnc_worker_thread_loop(VncJob
 
         if (job->vs->csock == -1) {
             vnc_unlock_display(job->vs->vd);
+            vnc_async_encoding_end(job->vs, &vs);
+            vnc_unlock_output(job->vs);
             goto disconnected;
         }
 
@@ -269,7 +270,6 @@ static int vnc_worker_thread_loop(VncJob
     vs.output.buffer[saved_offset] = (n_rectangles >> 8) & 0xFF;
     vs.output.buffer[saved_offset + 1] = n_rectangles & 0xFF;
 
-    vnc_lock_output(job->vs);
     if (job->vs->csock != -1) {
         buffer_reserve(&job->vs->jobs_buffer, vs.output.offset);
         buffer_append(&job->vs->jobs_buffer, vs.output.buffer,
@@ -278,6 +278,8 @@ static int vnc_worker_thread_loop(VncJob
         vnc_async_encoding_end(job->vs, &vs);
 
         qemu_bh_schedule(job->vs->bh);
+    } else {
+        vnc_async_encoding_end(job->vs, &vs);
     }
     vnc_unlock_output(job->vs);

Thanks,
Zhang Haoyu



[Qemu-devel] KVM VM(windows xp) reseted when running geekbench for about 2 days

2013-04-18 Thread Zhanghaoyu (A)
I started 10 VMs (windows xp) and ran the geekbench tool on them; after about 2 days, 
one of them was reset.
I found the reset operation is done by
int kvm_cpu_exec(CPUArchState *env)
{
...
    switch (run->exit_reason)
    ...
    case KVM_EXIT_SHUTDOWN:
        DPRINTF("shutdown\n");
        qemu_system_reset_request();
        ret = EXCP_INTERRUPT;
        break;
    ...
}

The KVM_EXIT_SHUTDOWN exit reason was set previously in the triple fault handler 
handle_triple_fault().

What causes the triple fault?

Thanks,
Zhang Haoyu



Re: [Qemu-devel] KVM VM(windows xp) reseted when running geekbench for about 2 days

2013-04-18 Thread Zhanghaoyu (A)
 On Thu, Apr 18, 2013 at 12:00:49PM +, Zhanghaoyu (A) wrote:
 I start 10 VMs(windows xp), then running geekbench tool on them, 
 about 2 days, one of them was reset, I found the reset operation is 
 done by int kvm_cpu_exec(CPUArchState *env) {
...
   switch (run-exit_reason)
   ...
case KVM_EXIT_SHUTDOWN:
DPRINTF(shutdown\n);
qemu_system_reset_request();
ret = EXCP_INTERRUPT;
break;
...
 }
 
 KVM_EXIT_SHUTDOWN exit reason was set previously in triple fault handle 
 handle_triple_fault().
 
 How do you know that reset was done here? This is not the only place 
 where qemu_system_reset_request() is called.
I used gdb to debug the QEMU process and added a breakpoint in 
qemu_system_reset_request(); when the case occurred, the backtrace was as shown below,
(gdb) bt
#0  qemu_system_reset_request () at vl.c:1964
#1  0x7f9ef9dc5991 in kvm_cpu_exec (env=0x7f9efac47100)
at /gt/qemu-kvm-1.4/qemu-kvm-1.4/kvm-all.c:1602
#2  0x7f9ef9d5b229 in qemu_kvm_cpu_thread_fn (arg=0x7f9efac47100)
at /gt/qemu-kvm-1.4/qemu-kvm-1.4/cpus.c:759
#3  0x7f9ef898b5f0 in start_thread () from /lib64/libpthread.so.0
#4  0x7f9ef86fa84d in clone () from /lib64/libc.so.6
#5  0x in ?? ()
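
For reference, the session to obtain such a backtrace is roughly as follows (a sketch; 
<qemu-pid> is a placeholder for the qemu-system-x86_64 process id):

$ gdb -p <qemu-pid>                     # attach to the running QEMU process
(gdb) break qemu_system_reset_request   # stop whenever a reset is requested
(gdb) continue
...                                     # wait for the guest to hit the reset path
(gdb) bt                                # backtrace as shown above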

And, I added printk logs in all places where the KVM_EXIT_SHUTDOWN exit reason is set; 
only handle_triple_fault() was called.

Make sure XP is not set to auto-reset in case of BSOD. 
No, winxp is not set to auto-reset in case of BSOD. No Winxp event log reported.

Best regards,
Yan.

 
 What causes the triple fault?
 
 Are you asking what is triple fault or why it happened in your case?
What I asked is why triple fault happened in my case.
 For the former see here: http://en.wikipedia.org/wiki/Triple_fault
 For the latter it is too late to tell after the VM reset. You can run QEMU 
 with -no-reboot -no-shutdown. VM will pause instead of rebooting and 
 then you can examine what is going on.
Many thanks, I'll run QEMU with the -no-reboot -no-shutdown options; if the VM pauses 
in my case, what should I examine?

Thanks,
Zhang Haoyu



Re: [Qemu-devel] latest version qemu compile error

2013-04-10 Thread Zhanghaoyu (A)
  The log of make V=1 is identical with that of make, shown as below,
  
  hw/virtio/dataplane/vring.c: In function 'vring_enable_notification':
  hw/virtio/dataplane/vring.c:72: warning: implicit declaration of function 
  'vring_avail_event'
  hw/virtio/dataplane/vring.c:72: warning: nested extern declaration of 
  'vring_avail_event'
  hw/virtio/dataplane/vring.c:72: error: lvalue required as left operand of 
  assignment
  hw/virtio/dataplane/vring.c: In function 'vring_should_notify':
  hw/virtio/dataplane/vring.c:107: warning: implicit declaration of function 
  'vring_need_event'
  hw/virtio/dataplane/vring.c:107: warning: nested extern declaration of 
  'vring_need_event'
  hw/virtio/dataplane/vring.c:107: warning: implicit declaration of function 
  'vring_used_event'
  hw/virtio/dataplane/vring.c:107: warning: nested extern declaration of 
  'vring_used_event'
  hw/virtio/dataplane/vring.c: In function 'vring_pop':
  hw/virtio/dataplane/vring.c:262: error: lvalue required as left operand of 
  assignment
  make: *** [hw/virtio/dataplane/vring.o] Error 1

 I don't need the errors, I need the compiler command line.
 
 Paolo

The gcc command line,
cc -I. -I/home/zhanghaoyu/qemu_201304091521 
-I/home/zhanghaoyu/qemu_201304091521/include -fPIE -DPIE -m64 -D_GNU_SOURCE 
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing  -fstack-protector-all -Wendif-labels 
-Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security 
-Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration 
-Wold-style-definition -Wtype-limits   -I/usr/include/pixman-1   
-Ihw/virtio/dataplane -Ihw/virtio/dataplane -pthread -I/usr/include/glib-2.0 
-I/usr/lib64/glib-2.0/include   -MMD -MP -MT hw/virtio/dataplane/vring.o -MF 
hw/virtio/dataplane/vring.d -O2 -D_FORTIFY_SOURCE=2 -g  -c -o 
hw/virtio/dataplane/vring.o hw/virtio/dataplane/vring.c

Thanks,
Zhang Haoyu



[Qemu-devel] latest version qemu compile error

2013-04-09 Thread Zhanghaoyu (A)
I compiled the QEMU source downloaded from qemu.git 
(http://git.qemu.org/git/qemu.git) on 4-9-2013; the errors reported are shown below,

hw/virtio/dataplane/vring.c: In function 'vring_enable_notification':
hw/virtio/dataplane/vring.c:72: warning: implicit declaration of function 
'vring_avail_event'
hw/virtio/dataplane/vring.c:72: warning: nested extern declaration of 
'vring_avail_event'
hw/virtio/dataplane/vring.c:72: error: lvalue required as left operand of 
assignment
hw/virtio/dataplane/vring.c: In function 'vring_should_notify':
hw/virtio/dataplane/vring.c:107: warning: implicit declaration of function 
'vring_need_event'
hw/virtio/dataplane/vring.c:107: warning: nested extern declaration of 
'vring_need_event'
hw/virtio/dataplane/vring.c:107: warning: implicit declaration of function 
'vring_used_event'
hw/virtio/dataplane/vring.c:107: warning: nested extern declaration of 
'vring_used_event'
hw/virtio/dataplane/vring.c: In function 'vring_pop':
hw/virtio/dataplane/vring.c:262: error: lvalue required as left operand of 
assignment
make: *** [hw/virtio/dataplane/vring.o] Error 1

Are 'vring_avail_event' and 'vring_need_event', which are defined in 
/linux-headers/linux/virtio_ring.h, not available in vring.c?


Re: [Qemu-devel] latest version qemu compile error

2013-04-09 Thread Zhanghaoyu (A)
  I compile the QEMU source download from qemu.git
  (http://git.qemu.org/git/qemu.git) on 4-9-2013, errors reported as 
  below,
  
   
  
  hw/virtio/dataplane/vring.c: In function 'vring_enable_notification':
  
  hw/virtio/dataplane/vring.c:72: warning: implicit declaration of 
  function 'vring_avail_event'
  
  hw/virtio/dataplane/vring.c:72: warning: nested extern declaration of 
  'vring_avail_event'
  
  hw/virtio/dataplane/vring.c:72: error: lvalue required as left operand 
  of assignment
  
  hw/virtio/dataplane/vring.c: In function 'vring_should_notify':
  
  hw/virtio/dataplane/vring.c:107: warning: implicit declaration of 
  function 'vring_need_event'
  
  hw/virtio/dataplane/vring.c:107: warning: nested extern declaration of 
  'vring_need_event'
  
  hw/virtio/dataplane/vring.c:107: warning: implicit declaration of 
  function 'vring_used_event'
  
  hw/virtio/dataplane/vring.c:107: warning: nested extern declaration of 
  'vring_used_event'
  
  hw/virtio/dataplane/vring.c: In function 'vring_pop':
  
  hw/virtio/dataplane/vring.c:262: error: lvalue required as left 
  operand of assignment
  
  make: *** [hw/virtio/dataplane/vring.o] Error 1
  
   
  
  'vring_avail_event' and 'vring_need_event' defined in 
  /linux-headers/linux/virtio_ring.h, are not available  in vring.c ?
 
 Please send the log of make V=1.
 
 Paolo

The log of make V=1 is identical to that of make, shown as below,

hw/virtio/dataplane/vring.c: In function 'vring_enable_notification':
hw/virtio/dataplane/vring.c:72: warning: implicit declaration of function 
'vring_avail_event'
hw/virtio/dataplane/vring.c:72: warning: nested extern declaration of 
'vring_avail_event'
hw/virtio/dataplane/vring.c:72: error: lvalue required as left operand of 
assignment
hw/virtio/dataplane/vring.c: In function 'vring_should_notify':
hw/virtio/dataplane/vring.c:107: warning: implicit declaration of function 
'vring_need_event'
hw/virtio/dataplane/vring.c:107: warning: nested extern declaration of 
'vring_need_event'
hw/virtio/dataplane/vring.c:107: warning: implicit declaration of function 
'vring_used_event'
hw/virtio/dataplane/vring.c:107: warning: nested extern declaration of 
'vring_used_event'
hw/virtio/dataplane/vring.c: In function 'vring_pop':
hw/virtio/dataplane/vring.c:262: error: lvalue required as left operand of 
assignment
make: *** [hw/virtio/dataplane/vring.o] Error 1

Thanks,
Zhang Haoyu



[Qemu-devel] reply: reply: qemu crashed when starting vm(kvm) with vnc connect

2013-04-08 Thread Zhanghaoyu (A)
On Sun, Apr 07, 2013 at 04:58:07AM +, Zhanghaoyu (A) wrote:
  I start a kvm VM with vnc(using the zrle protocol) connect, sometimes 
  qemu program crashed during starting period, received signal SIGABRT.
  Trying about 20 times, this crash may be reproduced.
  I guess the cause memory corruption or double free.
 
  Which version of QEMU are you running?
  
  Please try qemu.git/master.
  
  Stefan
 
 I used the QEMU download from qemu.git (http://git.qemu.org/git/qemu.git).

 Great, thanks!  Can you please post a backtrace?
 
 The easiest way is:
 
  $ ulimit -c unlimited
  $ qemu-system-x86_64 -enable-kvm -m 1024 ...
  ...crash...
  $ gdb -c qemu-system-x86_64.core
  (gdb) bt
 
 Depending on how your system is configured the core file might have a different 
 filename but there should be a file named *core* in the current working directory 
 after the crash.
 
 The backtrace will make it possible to find out where the crash occurred.
 
 Thanks,
 Stefan

backtrace from core file is shown as below:

Program received signal SIGABRT, Aborted.
0x7f32eda3dd95 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x7f32eda3dd95 in raise () from /lib64/libc.so.6
#1  0x7f32eda3f2ab in abort () from /lib64/libc.so.6
#2  0x7f32eda77ece in __libc_message () from /lib64/libc.so.6
#3  0x7f32eda7dc06 in malloc_printerr () from /lib64/libc.so.6
#4  0x7f32eda7ecda in _int_free () from /lib64/libc.so.6
#5  0x7f32efd3452c in free_and_trace (mem=0x7f329cd0) at vl.c:2880
#6  0x7f32efd251a1 in buffer_free (buffer=0x7f32f0c82890) at ui/vnc.c:505
#7  0x7f32efd20c56 in vnc_zrle_clear (vs=0x7f32f0c762d0)
at ui/vnc-enc-zrle.c:364
#8  0x7f32efd26d07 in vnc_disconnect_finish (vs=0x7f32f0c762d0)
at ui/vnc.c:1050
#9  0x7f32efd275c5 in vnc_client_read (opaque=0x7f32f0c762d0)
at ui/vnc.c:1349
#10 0x7f32efcb397c in qemu_iohandler_poll (readfds=0x7f32f074d020,
writefds=0x7f32f074d0a0, xfds=0x7f32f074d120, ret=1) at iohandler.c:124
#11 0x7f32efcb46e8 in main_loop_wait (nonblocking=0) at main-loop.c:417
#12 0x7f32efd31159 in main_loop () at vl.c:2133
#13 0x7f32efd38070 in main (argc=46, argv=0x7fff7f5df178,
envp=0x7fff7f5df2f0) at vl.c:4481

Zhang Haoyu



[Qemu-devel] Reply: qemu crashed when starting vm(kvm) with vnc connect

2013-04-06 Thread Zhanghaoyu (A)
 I start a kvm VM with vnc(using the zrle protocol) connect, sometimes qemu 
 program crashed during starting period, received signal SIGABRT.
 Trying about 20 times, this crash may be reproduced.
 I guess the cause memory corruption or double free.

 Which version of QEMU are you running?
 
 Please try qemu.git/master.
 
 Stefan

I used the QEMU download from qemu.git (http://git.qemu.org/git/qemu.git).

Zhang Haoyu


[Qemu-devel] qemu crashed when starting vm(kvm) with vnc connect

2013-04-02 Thread Zhanghaoyu (A)
I start a kvm VM with a vnc connection (using the zrle protocol); sometimes the qemu 
program crashes during the startup period with signal SIGABRT.
Trying about 20 times, this crash can be reproduced.
I guess the cause is memory corruption or a double free.

The backtrace shown as below:

0x7f32eda3dd95 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x7f32eda3dd95 in raise () from /lib64/libc.so.6
#1  0x7f32eda3f2ab in abort () from /lib64/libc.so.6
#2  0x7f32eda77ece in __libc_message () from /lib64/libc.so.6
#3  0x7f32eda7dc06 in malloc_printerr () from /lib64/libc.so.6
#4  0x7f32eda7ecda in _int_free () from /lib64/libc.so.6
#5  0x7f32efd3452c in free_and_trace (mem=0x7f329cd0) at vl.c:2880
#6  0x7f32efd251a1 in buffer_free (buffer=0x7f32f0c82890) at ui/vnc.c:505
#7  0x7f32efd20c56 in vnc_zrle_clear (vs=0x7f32f0c762d0)
at ui/vnc-enc-zrle.c:364
#8  0x7f32efd26d07 in vnc_disconnect_finish (vs=0x7f32f0c762d0)
at ui/vnc.c:1050
#9  0x7f32efd275c5 in vnc_client_read (opaque=0x7f32f0c762d0)
at ui/vnc.c:1349
#10 0x7f32efcb397c in qemu_iohandler_poll (readfds=0x7f32f074d020,
writefds=0x7f32f074d0a0, xfds=0x7f32f074d120, ret=1) at iohandler.c:124
#11 0x7f32efcb46e8 in main_loop_wait (nonblocking=0) at main-loop.c:417
#12 0x7f32efd31159 in main_loop () at vl.c:2133
#13 0x7f32efd38070 in main (argc=46, argv=0x7fff7f5df178,
envp=0x7fff7f5df2f0) at vl.c:4481