Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-08-01 Thread Gleb Natapov
On Tue, Jul 30, 2013 at 09:04:56AM +, Zhanghaoyu (A) wrote:
 
hi all,

I hit a problem similar to the ones reported below while performing
live-migration and save-restore tests on a KVM platform (qemu:1.4.0,
host:suse11sp2, guest:suse11sp2) running a telecommunication software
suite in the guest:
https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
https://bugzilla.kernel.org/show_bug.cgi?id=58771

After a live migration or virsh restore [savefile], one process's CPU
utilization goes up by about 30%, which degrades the throughput of that
process.

With EPT disabled, the problem goes away.

I suspect the KVM hypervisor is involved in this problem.
Based on that suspicion, I want to find the two adjacent kvm-kmod versions
of which one shows the problem and the other does not (e.g. 2.6.39,
3.0-rc1), then analyze the differences between them, or bisect the patches
between those two versions, to finally find the key patches.
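(For reference, such a bisection can be driven with git bisect; a rough
sketch only, where the good/bad versions are just the examples mentioned
above and the build-and-test step is left abstract:

    git clone https://git.kernel.org/pub/scm/virt/kvm/kvm.git && cd kvm
    git bisect start
    git bisect bad v3.0-rc1     # example: first version assumed to show the regression
    git bisect good v2.6.39     # example: last version assumed to be good
    # build/install/boot each candidate, run the migration test, then mark:
    git bisect good             # or: git bisect bad
    git bisect reset            # when git has reported the first bad commit
)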
   
   Any better ideas?
   
   Thanks,
   Zhang Haoyu
  
  I've attempted to duplicate this on a number of machines that are as 
  similar to yours as I am able to get my hands on, and so far have not 
  been able to see any performance degradation. And from what I've read in 
  the above links, huge pages do not seem to be part of the problem.
  
  So, if you are in a position to bisect the kernel changes, that would 
  probably be the best avenue to pursue in my opinion.
  
  Bruce
  
I found the first bad commit, 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
("KVM: propagate fault r/w information to gup(), allow read-only memory"),
which triggers this problem, by git-bisecting the kvm kernel tree (cloned
from https://git.kernel.org/pub/scm/virt/kvm/kvm.git).

Then I ran:
git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
git diff 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff

Diffing 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log against
612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff shows they are identical, i.e.
all of the differences between 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and
612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 come from that single commit, so this
commit directly or indirectly causes the degradation.

Does the map_writable flag passed to mmu_set_spte() affect the PTE's PAT flag,
or does it increase the number of VM exits caused by the guest writing to
read-only memory?
  
  Thanks,
  Zhang Haoyu
  
 
 There should be no read-only memory maps backing guest RAM.
 
 Can you confirm map_writable = false is being passed to __direct_map? (this 
 should not happen, for guest RAM).
 And if it is false, please capture the associated GFN.
 
I added the check and printk below at the start of __direct_map() at the first
bad commit:
--- kvm-612819c3c6e67bac8fceaa7cc402f13b1b63f7e4/arch/x86/kvm/mmu.c	2013-07-26 18:44:05.0 +0800
+++ kvm-612819/arch/x86/kvm/mmu.c	2013-07-31 00:05:48.0 +0800
@@ -2223,6 +2223,9 @@ static int __direct_map(struct kvm_vcpu
 	int pt_write = 0;
 	gfn_t pseudo_gfn;
 
+	if (!map_writable)
+		printk(KERN_ERR "%s: %s: gfn = %llu\n", __FILE__, __func__, gfn);
+
 	for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
 		if (iterator.level == level) {
 			unsigned pte_access = ACC_ALL;

I virsh-saved the VM and then virsh-restored it; so many GFNs were printed that
you could absolutely describe it as flooding.
 
The flooding you see happens during the migrate-to-file stage because of dirty
page tracking. If you clear dmesg after virsh-save, you should not see any
flooding after virsh-restore. I just checked with the latest tree, and I do not.
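For example, one way to verify this (a rough sketch; the domain name and file
path are placeholders) is:

    virsh save guest1 /tmp/guest1.sav   # the __direct_map printk flood is expected here
    dmesg -c > /dev/null                # clear the kernel ring buffer
    virsh restore /tmp/guest1.sav
    dmesg | grep 'gfn ='                # should print nothing after the restore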


--
Gleb.



Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-07-30 Thread Zhanghaoyu (A)

hi all,

I hit a problem similar to the ones reported below while performing
live-migration and save-restore tests on a KVM platform (qemu:1.4.0,
host:suse11sp2, guest:suse11sp2) running a telecommunication software
suite in the guest:
https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
https://bugzilla.kernel.org/show_bug.cgi?id=58771

After a live migration or virsh restore [savefile], one process's CPU
utilization goes up by about 30%, which degrades the throughput of that
process.

With EPT disabled, the problem goes away.

I suspect the KVM hypervisor is involved in this problem.
Based on that suspicion, I want to find the two adjacent kvm-kmod versions
of which one shows the problem and the other does not (e.g. 2.6.39,
3.0-rc1), then analyze the differences between them, or bisect the patches
between those two versions, to finally find the key patches.
  
  Any better ideas?
  
  Thanks,
  Zhang Haoyu
 
 I've attempted to duplicate this on a number of machines that are as 
 similar to yours as I am able to get my hands on, and so far have not been 
 able to see any performance degradation. And from what I've read in the 
 above links, huge pages do not seem to be part of the problem.
 
 So, if you are in a position to bisect the kernel changes, that would 
 probably be the best avenue to pursue in my opinion.
 
 Bruce
 
I found the first bad commit, 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
("KVM: propagate fault r/w information to gup(), allow read-only memory"),
which triggers this problem, by git-bisecting the kvm kernel tree (cloned
from https://git.kernel.org/pub/scm/virt/kvm/kvm.git).

Then I ran:
git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
git diff 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff

Diffing 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log against
612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff shows they are identical, i.e.
all of the differences between 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and
612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 come from that single commit, so this
commit directly or indirectly causes the degradation.

Does the map_writable flag passed to mmu_set_spte() affect the PTE's PAT flag,
or does it increase the number of VM exits caused by the guest writing to
read-only memory?
 
 Thanks,
 Zhang Haoyu
 

There should be no read-only memory maps backing guest RAM.

Can you confirm map_writable = false is being passed to __direct_map? (this 
should not happen, for guest RAM).
And if it is false, please capture the associated GFN.

I added the check and printk below at the start of __direct_map() at the first
bad commit:
--- kvm-612819c3c6e67bac8fceaa7cc402f13b1b63f7e4/arch/x86/kvm/mmu.c	2013-07-26 18:44:05.0 +0800
+++ kvm-612819/arch/x86/kvm/mmu.c	2013-07-31 00:05:48.0 +0800
@@ -2223,6 +2223,9 @@ static int __direct_map(struct kvm_vcpu
 	int pt_write = 0;
 	gfn_t pseudo_gfn;
 
+	if (!map_writable)
+		printk(KERN_ERR "%s: %s: gfn = %llu\n", __FILE__, __func__, gfn);
+
 	for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
 		if (iterator.level == level) {
 			unsigned pte_access = ACC_ALL;

I virsh-saved the VM and then virsh-restored it; so many GFNs were printed that
you could absolutely describe it as flooding.

It's probably an issue with an older get_user_pages variant (either in kvm-kmod
or the older kernel). Is there any indication of a similar issue with the
upstream kernel?
I will test the upstream kvm host
(https://git.kernel.org/pub/scm/virt/kvm/kvm.git) later; if the problem is
still there, I will revert the first bad commit,
612819c3c6e67bac8fceaa7cc402f13b1b63f7e4, on upstream and then test again.
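(As a rough sketch, the revert-and-retest step could look like the following;
the branch name is just an example, and the revert may need manual conflict
resolution on a newer tree:

    git clone https://git.kernel.org/pub/scm/virt/kvm/kvm.git && cd kvm
    git checkout -b test-revert
    git revert 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
    # rebuild the kernel/modules, reboot the host, re-run the save/restore test
)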

And I collected VM-exit statistics for the pre-save and post-restore periods at
the first bad commit.
pre-save:
COTS-F10S03:~ # perf stat -e kvm:* -a sleep 30

 Performance counter stats for 'sleep 30':

   1222318 kvm:kvm_entry
 0 kvm:kvm_hypercall
 0 kvm:kvm_hv_hypercall
351755 kvm:kvm_pio
  6703 kvm:kvm_cpuid
692502 kvm:kvm_apic
   1234173 kvm:kvm_exit
223956 kvm:kvm_inj_virq
 0 kvm:kvm_inj_exception
 16028 kvm:kvm_page_fault
 59872 kvm:kvm_msr
 0 kvm:kvm_cr
169596 kvm:kvm_pic_set_irq
 81455 kvm:kvm_apic_ipi
245103 kvm:kvm_apic_accept_irq
 0 kvm:kvm_nested_vmrun
 0 kvm:kvm_nested_intercepts
  

Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-07-29 Thread Andrea Arcangeli
Hi,

On Sat, Jul 27, 2013 at 07:47:49AM +, Zhanghaoyu (A) wrote:
hi all,

I hit a problem similar to the ones reported below while performing
live-migration and save-restore tests on a KVM platform (qemu:1.4.0,
host:suse11sp2, guest:suse11sp2) running a telecommunication software
suite in the guest:
https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
https://bugzilla.kernel.org/show_bug.cgi?id=58771

After a live migration or virsh restore [savefile], one process's CPU
utilization goes up by about 30%, which degrades the throughput of that
process.

With EPT disabled, the problem goes away.

I suspect the KVM hypervisor is involved in this problem.
Based on that suspicion, I want to find the two adjacent kvm-kmod versions
of which one shows the problem and the other does not (e.g. 2.6.39,
3.0-rc1), then analyze the differences between them, or bisect the patches
between those two versions, to finally find the key patches.
  
  Any better ideas?
  
  Thanks,
  Zhang Haoyu
 
 I've attempted to duplicate this on a number of machines that are as similar 
 to yours as I am able to get my hands on, and so far have not been able to 
 see any performance degradation. And from what I've read in the above links, 
 huge pages do not seem to be part of the problem.
 
 So, if you are in a position to bisect the kernel changes, that would 
 probably be the best avenue to pursue in my opinion.
 
 Bruce
 
I found the first bad commit, 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
("KVM: propagate fault r/w information to gup(), allow read-only memory"),
which triggers this problem, by git-bisecting the kvm kernel tree (cloned
from https://git.kernel.org/pub/scm/virt/kvm/kvm.git).

Then I ran:
git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
git diff 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff

Diffing 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log against
612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff shows they are identical, i.e.
all of the differences between 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and
612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 come from that single commit, so this
commit directly or indirectly causes the degradation.

Something must be generating read-only host PTEs for this commit to make a
difference. Considering that live migration or startup actions are involved,
the most likely culprit is a fork() used to start some script or helper.

A fork marks all the PTEs read-only and invalidates the SPTEs through the
MMU notifier.

So with all SPTEs dropped and the whole guest address space mapped read-only,
depending on the application we can sometimes get one vmexit to establish a
read-only SPTE on top of the read-only PTE, and then another vmexit to execute
the COW at the first write fault that follows.

It won't actually run a COW unless the child is still around (and normally the
child does fork() + a little work + exec(), so it is unlikely to still be
there).

But it is still two vmexits where before there was just one.

The same overhead should occur with and without EPT: there would be two
vmexits in the no-EPT case as well, since there is no way the SPTE can be
marked writable while the host PTE is still read-only.

If you get a massive overhead and a CPU loop in host kernel mode, maybe a
global TLB flush is missing: the read-only copy of the SPTE stays cached in
the CPU, and all CPUs tend to exit on the same SPTE at the same time. Or we
may be missing the TLB flush even for the current CPU, although we should
really flush them all (in the old days the current CPU's TLB flush was
implicit in the vmexit, but CPUs have since gained more features).

I don't know exactly which kind of overhead we're talking about, but merely
doubling the number of vmexits would probably not be measurable. If you
monitor the number of vmexits: with a missing TLB flush you'll see a flood,
otherwise you'll just see double the amount before/after that commit.
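(One way to monitor this, reusing the tracepoints already shown elsewhere in
this thread; the 30-second window is arbitrary:

    # count vmexits and guest page faults host-wide, before and after the restore
    perf stat -e kvm:kvm_exit -e kvm:kvm_page_fault -a sleep 30
)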

If the read-only PTEs are generated by fork and it's just a doubling of the
number of vmexits, the only thing you need is the patch I posted a few days
ago that adds the missing madvise(MADV_DONTFORK).
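(The idea, in a minimal standalone C sketch rather than the actual QEMU patch;
the buffer size and error handling here are purely illustrative:

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t ram_size = 1UL << 30;             /* illustrative 1 GiB "guest RAM" */
    void *ram = mmap(NULL, ram_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ram == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /*
     * Exclude the region from fork(): children started later (helper
     * scripts etc.) do not inherit it, so the parent's PTEs are never
     * write-protected for COW and the extra write-fault vmexits go away.
     */
    if (madvise(ram, ram_size, MADV_DONTFORK) != 0) {
        perror("madvise(MADV_DONTFORK)");
        return 1;
    }

    /* ... hand "ram" to KVM as guest memory, fork helpers as needed ... */
    return 0;
}
)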

If instead the overhead is massive and it's a vmexit flood, we also
have a missing tlb flush. In that case let's fix the tlb flush first,
and then you can still apply the MADV_DONTFORK. This kind of fault
activity also happens after a swapin from readonly swapcache so if
there's a vmexit flood we need to fix it before applying
MADV_DONTFORK.

Thanks,
Andrea



Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-07-29 Thread Marcelo Tosatti
On Sat, Jul 27, 2013 at 07:47:49AM +, Zhanghaoyu (A) wrote:
hi all,

I hit a problem similar to the ones reported below while performing
live-migration and save-restore tests on a KVM platform (qemu:1.4.0,
host:suse11sp2, guest:suse11sp2) running a telecommunication software
suite in the guest:
https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
https://bugzilla.kernel.org/show_bug.cgi?id=58771

After a live migration or virsh restore [savefile], one process's CPU
utilization goes up by about 30%, which degrades the throughput of that
process.

With EPT disabled, the problem goes away.

I suspect the KVM hypervisor is involved in this problem.
Based on that suspicion, I want to find the two adjacent kvm-kmod versions
of which one shows the problem and the other does not (e.g. 2.6.39,
3.0-rc1), then analyze the differences between them, or bisect the patches
between those two versions, to finally find the key patches.
  
  Any better ideas?
  
  Thanks,
  Zhang Haoyu
 
 I've attempted to duplicate this on a number of machines that are as similar 
 to yours as I am able to get my hands on, and so far have not been able to 
 see any performance degradation. And from what I've read in the above links, 
 huge pages do not seem to be part of the problem.
 
 So, if you are in a position to bisect the kernel changes, that would 
 probably be the best avenue to pursue in my opinion.
 
 Bruce
 
I found the first bad commit, 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
("KVM: propagate fault r/w information to gup(), allow read-only memory"),
which triggers this problem, by git-bisecting the kvm kernel tree (cloned
from https://git.kernel.org/pub/scm/virt/kvm/kvm.git).

Then I ran:
git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
git diff 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff

Diffing 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log against
612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff shows they are identical, i.e.
all of the differences between 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and
612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 come from that single commit, so this
commit directly or indirectly causes the degradation.

Does the map_writable flag passed to mmu_set_spte() affect the PTE's PAT flag,
or does it increase the number of VM exits caused by the guest writing to
read-only memory?
 
 Thanks,
 Zhang Haoyu
 

There should be no read-only memory maps backing guest RAM.

Can you confirm map_writable = false is being passed
to __direct_map? (this should not happen, for guest RAM).
And if it is false, please capture the associated GFN.

It's probably an issue with an older get_user_pages variant
(either in kvm-kmod or the older kernel). Is there any
indication of a similar issue with the upstream kernel?





Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-07-27 Thread Zhanghaoyu (A)
hi all,

I hit a problem similar to the ones reported below while performing
live-migration and save-restore tests on a KVM platform (qemu:1.4.0,
host:suse11sp2, guest:suse11sp2) running a telecommunication software
suite in the guest:
https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
https://bugzilla.kernel.org/show_bug.cgi?id=58771

After a live migration or virsh restore [savefile], one process's CPU
utilization goes up by about 30%, which degrades the throughput of that
process.

With EPT disabled, the problem goes away.

I suspect the KVM hypervisor is involved in this problem.
Based on that suspicion, I want to find the two adjacent kvm-kmod versions
of which one shows the problem and the other does not (e.g. 2.6.39,
3.0-rc1), then analyze the differences between them, or bisect the patches
between those two versions, to finally find the key patches.
 
 Any better ideas?
 
 Thanks,
 Zhang Haoyu

I've attempted to duplicate this on a number of machines that are as similar 
to yours as I am able to get my hands on, and so far have not been able to see 
any performance degradation. And from what I've read in the above links, huge 
pages do not seem to be part of the problem.

So, if you are in a position to bisect the kernel changes, that would probably 
be the best avenue to pursue in my opinion.

Bruce

I found the first bad commit, 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
("KVM: propagate fault r/w information to gup(), allow read-only memory"),
which triggers this problem, by git-bisecting the kvm kernel tree (cloned
from https://git.kernel.org/pub/scm/virt/kvm/kvm.git).

Then I ran:
git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
git diff 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff

Diffing 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log against
612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff shows they are identical, i.e.
all of the differences between 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and
612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 come from that single commit, so this
commit directly or indirectly causes the degradation.

Does the map_writable flag passed to mmu_set_spte() affect the PTE's PAT flag,
or does it increase the number of VM exits caused by the guest writing to
read-only memory?

Thanks,
Zhang Haoyu





Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-07-11 Thread Michael S. Tsirkin
On Thu, Jul 11, 2013 at 09:36:47AM +, Zhanghaoyu (A) wrote:
hi all,

I hit a problem similar to the ones reported below while performing
live-migration and save-restore tests on a KVM platform (qemu:1.4.0,
host:suse11sp2, guest:suse11sp2) running a telecommunication software
suite in the guest:
https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
https://bugzilla.kernel.org/show_bug.cgi?id=58771

After a live migration or virsh restore [savefile], one process's CPU
utilization goes up by about 30%, which degrades the throughput of that
process.
oprofile report on this process in the guest:
pre live migration:
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        app name         symbol name
248      12.3016  no-vmlinux       (no symbols)
78        3.8690  libc.so.6        memset
68        3.3730  libc.so.6        memcpy
30        1.4881  cscf.scu         SipMmBufMemAlloc
29        1.4385  libpthread.so.0  pthread_mutex_lock
26        1.2897  cscf.scu         SipApiGetNextIe
25        1.2401  cscf.scu         DBFI_DATA_Search
20        0.9921  libpthread.so.0  __pthread_mutex_unlock_usercnt
16        0.7937  cscf.scu         DLM_FreeSlice
16        0.7937  cscf.scu         receivemessage
15        0.7440  cscf.scu         SipSmCopyString
14        0.6944  cscf.scu         DLM_AllocSlice

post live migration:
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        app name         symbol name
1586     42.2370  libc.so.6        memcpy
271       7.2170  no-vmlinux       (no symbols)
83        2.2104  libc.so.6        memset
41        1.0919  libpthread.so.0  __pthread_mutex_unlock_usercnt
35        0.9321  cscf.scu         SipMmBufMemAlloc
29        0.7723  cscf.scu         DLM_AllocSlice
28        0.7457  libpthread.so.0  pthread_mutex_lock
23        0.6125  cscf.scu         SipApiGetNextIe
17        0.4527  cscf.scu         SipSmCopyString
16        0.4261  cscf.scu         receivemessage
15        0.3995  cscf.scu         SipcMsgStatHandle
14        0.3728  cscf.scu         Urilex
12        0.3196  cscf.scu         DBFI_DATA_Search
12        0.3196  cscf.scu         SipDsmGetHdrBitValInner
12        0.3196  cscf.scu         SipSmGetDataFromRefString

So memcpy costs many more CPU cycles after live migration. When I restart the
process, the problem disappears. save-restore shows the same problem.
 
 perf report on vcpu thread in host,
 pre live migration:
 Performance counter stats for thread id '21082':
 
  0 page-faults
  0 minor-faults
  0 major-faults
  31616 cs
506 migrations
  0 alignment-faults
  0 emulation-faults
 5075957539 L1-dcache-loads
   [21.32%]
  324685106 L1-dcache-load-misses #6.40% of all L1-dcache hits 
   [21.85%]
 3681777120 L1-dcache-stores   
   [21.65%]
   65251823 L1-dcache-store-misses# 1.77%  
  [22.78%]
  0 L1-dcache-prefetches   
   [22.84%]
  0 L1-dcache-prefetch-misses  
   [22.32%]
 9321652613 L1-icache-loads
   [22.60%]
 1353418869 L1-icache-load-misses #   14.52% of all L1-icache hits 
   [21.92%]
  169126969 LLC-loads  
   [21.87%]
   12583605 LLC-load-misses   #7.44% of all LL-cache hits  
   [ 5.84%]
  132853447 LLC-stores 
   [ 6.61%]
   10601171 LLC-store-misses  #7.9%
[ 5.01%]
   25309497 LLC-prefetches #30%
   [ 4.96%]
7723198 LLC-prefetch-misses
   [ 6.04%]
 4954075817 dTLB-loads 
   [11.56%]
   26753106 dTLB-load-misses  #0.54% of all dTLB cache 
 hits  [16.80%]
 3553702874 dTLB-stores
   [22.37%]
4720313 dTLB-store-misses#0.13%
 [21.46%]
  not counted dTLB-prefetches
  not counted dTLB-prefetch-misses
 
   60.000920666 seconds time elapsed
 
 post 

Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-07-11 Thread Gleb Natapov
On Thu, Jul 11, 2013 at 09:36:47AM +, Zhanghaoyu (A) wrote:
hi all,

I hit a problem similar to the ones reported below while performing
live-migration and save-restore tests on a KVM platform (qemu:1.4.0,
host:suse11sp2, guest:suse11sp2) running a telecommunication software
suite in the guest:
https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
https://bugzilla.kernel.org/show_bug.cgi?id=58771

After a live migration or virsh restore [savefile], one process's CPU
utilization goes up by about 30%, which degrades the throughput of that
process.
oprofile report on this process in the guest:
pre live migration:
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        app name         symbol name
248      12.3016  no-vmlinux       (no symbols)
78        3.8690  libc.so.6        memset
68        3.3730  libc.so.6        memcpy
30        1.4881  cscf.scu         SipMmBufMemAlloc
29        1.4385  libpthread.so.0  pthread_mutex_lock
26        1.2897  cscf.scu         SipApiGetNextIe
25        1.2401  cscf.scu         DBFI_DATA_Search
20        0.9921  libpthread.so.0  __pthread_mutex_unlock_usercnt
16        0.7937  cscf.scu         DLM_FreeSlice
16        0.7937  cscf.scu         receivemessage
15        0.7440  cscf.scu         SipSmCopyString
14        0.6944  cscf.scu         DLM_AllocSlice

post live migration:
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        app name         symbol name
1586     42.2370  libc.so.6        memcpy
271       7.2170  no-vmlinux       (no symbols)
83        2.2104  libc.so.6        memset
41        1.0919  libpthread.so.0  __pthread_mutex_unlock_usercnt
35        0.9321  cscf.scu         SipMmBufMemAlloc
29        0.7723  cscf.scu         DLM_AllocSlice
28        0.7457  libpthread.so.0  pthread_mutex_lock
23        0.6125  cscf.scu         SipApiGetNextIe
17        0.4527  cscf.scu         SipSmCopyString
16        0.4261  cscf.scu         receivemessage
15        0.3995  cscf.scu         SipcMsgStatHandle
14        0.3728  cscf.scu         Urilex
12        0.3196  cscf.scu         DBFI_DATA_Search
12        0.3196  cscf.scu         SipDsmGetHdrBitValInner
12        0.3196  cscf.scu         SipSmGetDataFromRefString

So memcpy costs many more CPU cycles after live migration. When I restart the
process, the problem disappears. save-restore shows the same problem.
 
Does the slowdown persist several minutes after the restore? Can you check how
many hugepages are used by the qemu process before and after save/restore?
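For example (a rough sketch; it assumes transparent hugepages and a single
qemu-system-x86_64 process on the host):

    grep AnonHugePages /proc/$(pidof qemu-system-x86_64)/smaps | awk '{s+=$2} END {print s " kB"}'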

 perf report on vcpu thread in host,
 pre live migration:
 Performance counter stats for thread id '21082':
 
  0 page-faults
  0 minor-faults
  0 major-faults
  31616 cs
506 migrations
  0 alignment-faults
  0 emulation-faults
 5075957539 L1-dcache-loads
   [21.32%]
  324685106 L1-dcache-load-misses #6.40% of all L1-dcache hits 
   [21.85%]
 3681777120 L1-dcache-stores   
   [21.65%]
   65251823 L1-dcache-store-misses# 1.77%  
  [22.78%]
  0 L1-dcache-prefetches   
   [22.84%]
  0 L1-dcache-prefetch-misses  
   [22.32%]
 9321652613 L1-icache-loads
   [22.60%]
 1353418869 L1-icache-load-misses #   14.52% of all L1-icache hits 
   [21.92%]
  169126969 LLC-loads  
   [21.87%]
   12583605 LLC-load-misses   #7.44% of all LL-cache hits  
   [ 5.84%]
  132853447 LLC-stores 
   [ 6.61%]
   10601171 LLC-store-misses  #7.9%
[ 5.01%]
   25309497 LLC-prefetches #30%
   [ 4.96%]
7723198 LLC-prefetch-misses
   [ 6.04%]
 4954075817 dTLB-loads 
   [11.56%]
   26753106 dTLB-load-misses  #0.54% of all dTLB cache 
 hits  [16.80%]
 3553702874 dTLB-stores
   [22.37%]
4720313 dTLB-store-misses#0.13%
   

Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-07-11 Thread Xiao Guangrong
Hi,

Could you please test this patch?

From 48df7db2ec2721e35d024a8d9850dbb34b557c1c Mon Sep 17 00:00:00 2001
From: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
Date: Thu, 6 Sep 2012 16:56:01 +0800
Subject: [PATCH 10/11] using huge page on fast page fault path

---
 arch/x86/kvm/mmu.c |   27 ++++++++++++++++++++-------
 1 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6945ef4..7d177c7 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2663,6 +2663,13 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, pfn_t pfn)
 	return -EFAULT;
 }
 
+static bool pfn_can_adjust(pfn_t pfn, int level)
+{
+	return !is_error_pfn(pfn) && !kvm_is_mmio_pfn(pfn) &&
+		level == PT_PAGE_TABLE_LEVEL &&
+		PageTransCompound(pfn_to_page(pfn));
+}
+
 static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
 					gfn_t *gfnp, pfn_t *pfnp, int *levelp)
 {
@@ -2676,10 +2683,8 @@ static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
 	 * PT_PAGE_TABLE_LEVEL and there would be no adjustment done
 	 * here.
 	 */
-	if (!is_error_pfn(pfn) && !kvm_is_mmio_pfn(pfn) &&
-	    level == PT_PAGE_TABLE_LEVEL &&
-	    PageTransCompound(pfn_to_page(pfn)) &&
-	    !has_wrprotected_page(vcpu->kvm, gfn, PT_DIRECTORY_LEVEL)) {
+	if (pfn_can_adjust(pfn, level) &&
+	    !has_wrprotected_page(vcpu->kvm, gfn, PT_DIRECTORY_LEVEL)) {
 		unsigned long mask;
 		/*
 		 * mmu_notifier_retry was successful and we hold the
@@ -2768,7 +2773,7 @@ fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 spte)
  * - false: let the real page fault path to fix it.
  */
 static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
-			    u32 error_code)
+			    u32 error_code, bool force_pt_level)
 {
 	struct kvm_shadow_walk_iterator iterator;
 	bool ret = false;
@@ -2795,6 +2800,14 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
 		goto exit;
 
 	/*
+	 * Let the real page fault path change the mapping if large
+	 * mapping is allowed, for example, the memslot dirty log is
+	 * disabled.
+	 */
+	if (!force_pt_level && pfn_can_adjust(spte_to_pfn(spte), level))
+		goto exit;
+
+	/*
 	 * Check if it is a spurious fault caused by TLB lazily flushed.
 	 *
 	 * Need not check the access of upper level table entries since
@@ -2854,7 +2867,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code,
 	} else
 		level = PT_PAGE_TABLE_LEVEL;
 
-	if (fast_page_fault(vcpu, v, level, error_code))
+	if (fast_page_fault(vcpu, v, level, error_code, force_pt_level))
 		return 0;
 
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
@@ -3323,7 +3336,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
 	} else
 		level = PT_PAGE_TABLE_LEVEL;
 
-	if (fast_page_fault(vcpu, gpa, level, error_code))
+	if (fast_page_fault(vcpu, gpa, level, error_code, force_pt_level))
 		return 0;
 
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
-- 
1.7.7.6


On 07/11/2013 05:36 PM, Zhanghaoyu (A) wrote:
hi all,

I hit a problem similar to the ones reported below while performing
live-migration and save-restore tests on a KVM platform (qemu:1.4.0,
host:suse11sp2, guest:suse11sp2) running a telecommunication software
suite in the guest:
https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
https://bugzilla.kernel.org/show_bug.cgi?id=58771

After a live migration or virsh restore [savefile], one process's CPU
utilization goes up by about 30%, which degrades the throughput of that
process.
oprofile report on this process in the guest:
pre live migration:
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        app name         symbol name
248      12.3016  no-vmlinux       (no symbols)
78        3.8690  libc.so.6        memset
68        3.3730  libc.so.6        memcpy
30        1.4881  cscf.scu         SipMmBufMemAlloc
29        1.4385  libpthread.so.0  pthread_mutex_lock
26        1.2897  cscf.scu         SipApiGetNextIe
25        1.2401  cscf.scu         DBFI_DATA_Search
20        0.9921  libpthread.so.0  __pthread_mutex_unlock_usercnt
16        0.7937  cscf.scu         DLM_FreeSlice
16        0.7937  cscf.scu         receivemessage
15        0.7440  cscf.scu         SipSmCopyString
14        0.6944  cscf.scu         DLM_AllocSlice
 
 post live migration:
 

Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-07-11 Thread Andreas Färber
Hi,

On 11.07.2013 11:36, Zhanghaoyu (A) wrote:
 I hit a problem similar to the ones reported below while performing
 live-migration and save-restore tests on a KVM platform (qemu:1.4.0,
 host:suse11sp2, guest:suse11sp2) running a telecommunication software
 suite in the guest:
 https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
 http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
 http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
 https://bugzilla.kernel.org/show_bug.cgi?id=58771

 After a live migration or virsh restore [savefile], one process's CPU
 utilization goes up by about 30%, which degrades the throughput of that
 process.
 oprofile report on this process in the guest:
 pre live migration:

So far we've been unable to reproduce this with a pure qemu-kvm /
qemu-system-x86_64 command line on several EPT machines, whereas for
virsh it was reported as confirmed. Can you please share the resulting
QEMU command line from libvirt logs or process list?

Are both host and guest kernel at 3.0.80 (latest SLES updates)?

Thanks,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-07-11 Thread Zhang Haoyu
Hi,

Could you please test this patch?

I tried this patch, but the problem is still there.

Thanks,
Zhang Haoyu

From 48df7db2ec2721e35d024a8d9850dbb34b557c1c Mon Sep 17 00:00:00 2001
From: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
Date: Thu, 6 Sep 2012 16:56:01 +0800
Subject: [PATCH 10/11] using huge page on fast page fault path

---
 arch/x86/kvm/mmu.c |   27 ++++++++++++++++++++-------
 1 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6945ef4..7d177c7 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2663,6 +2663,13 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, pfn_t pfn)
 	return -EFAULT;
 }
 
+static bool pfn_can_adjust(pfn_t pfn, int level)
+{
+	return !is_error_pfn(pfn) && !kvm_is_mmio_pfn(pfn) &&
+		level == PT_PAGE_TABLE_LEVEL &&
+		PageTransCompound(pfn_to_page(pfn));
+}
+
 static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
 					gfn_t *gfnp, pfn_t *pfnp, int *levelp)
 {
@@ -2676,10 +2683,8 @@ static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
 	 * PT_PAGE_TABLE_LEVEL and there would be no adjustment done
 	 * here.
 	 */
-	if (!is_error_pfn(pfn) && !kvm_is_mmio_pfn(pfn) &&
-	    level == PT_PAGE_TABLE_LEVEL &&
-	    PageTransCompound(pfn_to_page(pfn)) &&
-	    !has_wrprotected_page(vcpu->kvm, gfn, PT_DIRECTORY_LEVEL)) {
+	if (pfn_can_adjust(pfn, level) &&
+	    !has_wrprotected_page(vcpu->kvm, gfn, PT_DIRECTORY_LEVEL)) {
 		unsigned long mask;
 		/*
 		 * mmu_notifier_retry was successful and we hold the
@@ -2768,7 +2773,7 @@ fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 spte)
  * - false: let the real page fault path to fix it.
  */
 static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
-			    u32 error_code)
+			    u32 error_code, bool force_pt_level)
 {
 	struct kvm_shadow_walk_iterator iterator;
 	bool ret = false;
@@ -2795,6 +2800,14 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
 		goto exit;
 
 	/*
+	 * Let the real page fault path change the mapping if large
+	 * mapping is allowed, for example, the memslot dirty log is
+	 * disabled.
+	 */
+	if (!force_pt_level && pfn_can_adjust(spte_to_pfn(spte), level))
+		goto exit;
+
+	/*
 	 * Check if it is a spurious fault caused by TLB lazily flushed.
 	 *
 	 * Need not check the access of upper level table entries since
@@ -2854,7 +2867,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code,
 	} else
 		level = PT_PAGE_TABLE_LEVEL;
 
-	if (fast_page_fault(vcpu, v, level, error_code))
+	if (fast_page_fault(vcpu, v, level, error_code, force_pt_level))
 		return 0;
 
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
@@ -3323,7 +3336,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
 	} else
 		level = PT_PAGE_TABLE_LEVEL;
 
-	if (fast_page_fault(vcpu, gpa, level, error_code))
+	if (fast_page_fault(vcpu, gpa, level, error_code, force_pt_level))
 		return 0;
 
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
-- 
1.7.7.6

On 07/11/2013 05:36 PM, Zhanghaoyu (A) wrote:
 hi all,

 I hit a problem similar to the ones reported below while performing
 live-migration and save-restore tests on a KVM platform (qemu:1.4.0,
 host:suse11sp2, guest:suse11sp2) running a telecommunication software
 suite in the guest:
 https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
 http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
 http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
 https://bugzilla.kernel.org/show_bug.cgi?id=58771

 After a live migration or virsh restore [savefile], one process's CPU
 utilization goes up by about 30%, which degrades the throughput of that
 process.
 oprofile report on this process in the guest:
 pre live migration:
 CPU: CPU with timer interrupt, speed 0 MHz (estimated)
 Profiling through timer interrupt
 samples  %        app name         symbol name
 248      12.3016  no-vmlinux       (no symbols)
 78        3.8690  libc.so.6        memset
 68        3.3730  libc.so.6        memcpy
 30        1.4881  cscf.scu         SipMmBufMemAlloc
 29        1.4385  libpthread.so.0  pthread_mutex_lock
 26        1.2897  cscf.scu         SipApiGetNextIe
 25        1.2401  cscf.scu         DBFI_DATA_Search
 20        0.9921  libpthread.so.0  __pthread_mutex_unlock_usercnt
 16        0.7937  cscf.scu         DLM_FreeSlice
 16        0.7937  cscf.scu         receivemessage
 15        0.7440  cscf.scu         SipSmCopyString
 14        0.6944  cscf.scu         DLM_AllocSlice

 post live 

Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-07-11 Thread Bruce Rogers
  On 7/11/2013 at 03:36 AM, Zhanghaoyu (A) haoyu.zh...@huawei.com wrote: 
hi all,

I hit a problem similar to the ones reported below while performing
live-migration and save-restore tests on a KVM platform (qemu:1.4.0,
host:suse11sp2, guest:suse11sp2) running a telecommunication software
suite in the guest:
https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
https://bugzilla.kernel.org/show_bug.cgi?id=58771

After a live migration or virsh restore [savefile], one process's CPU
utilization goes up by about 30%, which degrades the throughput of that
process.
oprofile report on this process in the guest:
pre live migration:
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        app name         symbol name
248      12.3016  no-vmlinux       (no symbols)
78        3.8690  libc.so.6        memset
68        3.3730  libc.so.6        memcpy
30        1.4881  cscf.scu         SipMmBufMemAlloc
29        1.4385  libpthread.so.0  pthread_mutex_lock
26        1.2897  cscf.scu         SipApiGetNextIe
25        1.2401  cscf.scu         DBFI_DATA_Search
20        0.9921  libpthread.so.0  __pthread_mutex_unlock_usercnt
16        0.7937  cscf.scu         DLM_FreeSlice
16        0.7937  cscf.scu         receivemessage
15        0.7440  cscf.scu         SipSmCopyString
14        0.6944  cscf.scu         DLM_AllocSlice

post live migration:
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        app name         symbol name
1586     42.2370  libc.so.6        memcpy
271       7.2170  no-vmlinux       (no symbols)
83        2.2104  libc.so.6        memset
41        1.0919  libpthread.so.0  __pthread_mutex_unlock_usercnt
35        0.9321  cscf.scu         SipMmBufMemAlloc
29        0.7723  cscf.scu         DLM_AllocSlice
28        0.7457  libpthread.so.0  pthread_mutex_lock
23        0.6125  cscf.scu         SipApiGetNextIe
17        0.4527  cscf.scu         SipSmCopyString
16        0.4261  cscf.scu         receivemessage
15        0.3995  cscf.scu         SipcMsgStatHandle
14        0.3728  cscf.scu         Urilex
12        0.3196  cscf.scu         DBFI_DATA_Search
12        0.3196  cscf.scu         SipDsmGetHdrBitValInner
12        0.3196  cscf.scu         SipSmGetDataFromRefString

So memcpy costs many more CPU cycles after live migration. When I restart the
process, the problem disappears. save-restore shows the same problem.
 
 perf report on vcpu thread in host,
 pre live migration:
 Performance counter stats for thread id '21082':
 
  0 page-faults
  0 minor-faults
  0 major-faults
  31616 cs
506 migrations
  0 alignment-faults
  0 emulation-faults
 5075957539 L1-dcache-loads
  
  [21.32%]
  324685106 L1-dcache-load-misses #6.40% of all L1-dcache hits 
   
 [21.85%]
 3681777120 L1-dcache-stores   
  
  [21.65%]
   65251823 L1-dcache-store-misses# 1.77%  
   
[22.78%]
  0 L1-dcache-prefetches   
  
  [22.84%]
  0 L1-dcache-prefetch-misses  
   
 [22.32%]
 9321652613 L1-icache-loads
  
  [22.60%]
 1353418869 L1-icache-load-misses #   14.52% of all L1-icache hits 
   
 [21.92%]
  169126969 LLC-loads  
   [21.87%]
   12583605 LLC-load-misses   #7.44% of all LL-cache hits  
   
 [ 5.84%]
  132853447 LLC-stores 
   [ 6.61%]
   10601171 LLC-store-misses  #7.9%
  
   [ 5.01%]
   25309497 LLC-prefetches #30%
   [ 4.96%]
7723198 LLC-prefetch-misses
  
  [ 6.04%]
 4954075817 dTLB-loads 
   [11.56%]
   26753106 dTLB-load-misses  #0.54% of all dTLB cache 
 hits 
  [16.80%]
 3553702874 dTLB-stores
   [22.37%]
4720313 dTLB-store-misses#0.13%
  
[21.46%]
  not counted dTLB-prefetches
  not counted dTLB-prefetch-misses
 
   

Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

2013-07-11 Thread Zhanghaoyu (A)
 Hi,

 On 11.07.2013 11:36, Zhanghaoyu (A) wrote:
  I hit a problem similar to the ones reported below while performing
  live-migration and save-restore tests on a KVM platform (qemu:1.4.0,
  host:suse11sp2, guest:suse11sp2) running a telecommunication software
  suite in the guest:
  https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
  http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
  http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
  https://bugzilla.kernel.org/show_bug.cgi?id=58771

  After a live migration or virsh restore [savefile], one process's CPU
  utilization goes up by about 30%, which degrades the throughput of that
  process.
  oprofile report on this process in the guest:
  pre live migration:

 So far we've been unable to reproduce this with a pure qemu-kvm /
 qemu-system-x86_64 command line on several EPT machines, whereas for
 virsh it was reported as confirmed. Can you please share the resulting
 QEMU command line from libvirt logs or process list?
The qemu command line, taken from /var/log/libvirt/qemu/[domain].log:
LC_ALL=C 
PATH=/sbin:/usr/sbin:/usr/local/sbin:/root/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin
 HOME=/root USER=root LOGNAME=root QEMU_AUDIO_DRV=none 
/usr/local/bin/qemu-system-x86_64 -name CSC2 -S -M pc-0.12 -cpu qemu32 
-enable-kvm -m 12288 -smp 4,sockets=4,cores=1,threads=1 -uuid 
76e03575-a3ad-589a-e039-40160274bb97 -no-user-config -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/CSC2.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime 
-no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
file=/opt/ne/vm/CSC2.img,if=none,id=drive-virtio-disk0,format=raw,cache=none 
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=22 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:01,bus=pci.0,addr=0x3,bootindex=2
 -netdev tap,fd=23,id=hostnet1,vhost=on,vhostfd=24 -device 
virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:01,bus=pci.0,addr=0x4 
-netdev tap,fd=25,id=hostnet2,vhost=on,vhostfd=26 -device 
virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:01,bus=pci.0,addr=0x5 
-netdev tap,fd=27,id=hostnet3,vhost=on,vhostfd=28 -device 
virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:01,bus=pci.0,addr=0x6 
-netdev tap,fd=29,id=hostnet4,vhost=on,vhostfd=30 -device 
virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:01,bus=pci.0,addr=0x7 
-netdev tap,fd=31,id=hostnet5,vhost=on,vhostfd=32 -device 
virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:01,bus=pci.0,addr=0x9 
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
-vnc *:1 -k en-us -vga cirrus -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb 
-watchdog-action poweroff -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa
 
 Are both host and guest kernel at 3.0.80 (latest SLES updates)?
No, both host and guest are just raw sles11-sp2-64-GM, kernel version: 
3.0.13-0.27.

Thanks,
Zhang Haoyu
 
 Thanks,
 Andreas
 
 --
 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
 GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg