Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/13/2011 04:34 PM, Avi Kivity wrote:
> This patchset exposes an emulated version 1 architectural performance
> monitoring unit to KVM guests. The PMU is emulated using perf_events, so
> the host kernel can multiplex host-wide, host-user, and guest events on
> the available resources.
>
> Caveats:
> - counters that have PMI (interrupt) enabled stop counting after the
>   interrupt is signalled. This is because we need one-shot samples that
>   keep counting, which perf doesn't support yet
> - some combinations of INV and CMASK are not supported
> - counters keep on counting in the host as well as in the guest
>
> perf maintainers: please consider the first three patches for merging
> (the first two make sense even without the rest). If you're familiar
> with the Intel PMU, please review patch 5 as well - it effectively
> undoes all your work of abstracting the PMU into perf_events by
> unabstracting perf_events into what is hoped to be a very similar PMU.
>
> v2:
> - don't pass the perf_event handler context to the callback; extract it
>   via the 'event' parameter instead
> - RDPMC emulation and interception
> - CR4.PCE emulation

Peter, can you look at 1-3 please?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On Wed, 2011-06-29 at 10:52 +0300, Avi Kivity wrote:
> Peter, can you look at 1-3 please?

Queued them, thanks! I was more or less waiting for a next iteration of
the series because of those problems reported, but those three stand
well on their own.
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/29/2011 11:38 AM, Peter Zijlstra wrote:
> Queued them, thanks! I was more or less waiting for a next iteration of
> the series because of those problems reported, but those three stand
> well on their own.

Thanks. I'm mired in other work but will return to investigate and fix
those issues.
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/15/2011 07:51 PM, David Ahern wrote:
> The qemu-kvm change is setting the pmu version to 1, and your patchset
> introduces v1 event constraints. So based on intel_pmu_init, model=0 is
> an appropriate model - and a required parameter (-cpu host,model=0).
> With that option I get the <not supported> label as expected.
>
> Guest side:
>
>  Performance counter stats for 'openssl speed aes':
>
>       45160.015949 task-clock            #    0.998 CPUs utilized
>                192 context-switches      #    0.000 M/sec
>                  0 CPU-migrations        #    0.000 M/sec
>                650 page-faults           #    0.000 M/sec
>     57,064,592,321 cycles                #    1.264 GHz              [49.96%]
>    138,608,368,094 instructions          #    2.43  insns per cycle  [50.04%]
>      3,003,337,751 branches              #   66.504 M/sec            [50.04%]
>         21,890,537 branch-misses         #    0.73% of all branches  [49.96%]
>
>       45.242117218 seconds time elapsed
>
> (<not supported> events removed). And comparable events from running
> the same command host side:
>
>  Performance counter stats for 'openssl speed aes':
>
>       44947.093539 task-clock            #    0.998 CPUs utilized
>              4,800 context-switches      #    0.000 M/sec
>                  5 CPU-migrations        #    0.000 M/sec
>                481 page-faults           #    0.000 M/sec
>    124,610,137,228 cycles                #    2.772 GHz              [27.77%]
>    338,982,292,106 instructions          #    2.72  insns per cycle
>      6,061,899,079 branches              #  134.867 M/sec            [33.33%]
>          2,236,965 branch-misses         #    0.04% of all branches  [33.33%]
>
>       45.043442068 seconds time elapsed
>
> So cycles are off by roughly a factor of 2, instructions by roughly a
> factor of 2.5, and branches by a factor of 2. Those 3 events are fairly
> close from one run to the next in the host.

Oh, there's the scaling issue that Peter pointed out. Can you try the
tests again, but now measuring just one counter per run
(perf stat -e xxx command)?
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/16/2011 07:53 AM, Avi Kivity wrote:
> Oh, there's the scaling issue that Peter pointed out. Can you try the
> tests again, but now measuring just one counter per run
> (perf stat -e xxx command)?

Command: perf stat -e instructions openssl speed aes

Guest:  135,522,189,056 instructions   # 0.00 insns per cycle
Host:   346,082,922,185 instructions   # 0.00 insns per cycle

Adding '--no-scale' to the perf-stat command had no effect on the
relative difference.
David
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/16/2011 08:08 AM, David Ahern wrote:
> Command: perf stat -e instructions openssl speed aes

Hmm, this might be the wrong benchmark for this. I thought openssl speed
was a purely CPU-intensive benchmark which should have fairly similar
performance numbers in both host and guest. I seem to recall that being
true 2 or so years ago, but it is not the case with 3.0-rc2 and F14.

Using a benchmark Vince W. wrote seems better:
http://www.csl.cornell.edu/~vince/projects/perf_counter/million.s

perf stat -e instructions ./million

 Performance counter stats for './million':

          1,113,650 instructions   # 0.00 insns per cycle

David
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/16/2011 05:19 PM, David Ahern wrote:
> Hmm, this might be the wrong benchmark for this. I thought openssl
> speed was a purely CPU-intensive benchmark which should have fairly
> similar performance numbers in both host and guest. I seem to recall
> that being true 2 or so years ago, but it is not the case with 3.0-rc2
> and F14.

Maybe it's sensitive to a cpuid bit which we don't pass through - likely
a bug in qemu or perhaps in kvm.
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/16/2011 08:20 AM, Avi Kivity wrote:
> Maybe it's sensitive to a cpuid bit which we don't pass through -
> likely a bug in qemu or perhaps in kvm.

Seems to be a side effect of running perf-stat in the guest. Running
just 'openssl speed aes' in both host and guest shows very similar
numbers (for the first 3 columns). Adding 'perf stat' to the command
(i.e., perf stat openssl speed aes) causes a significant decline in the
guest - by a factor of 2. For comparison, 'perf stat' in the host has a
negligible impact.

David
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/16/2011 05:32 PM, David Ahern wrote:
> Seems to be a side effect of running perf-stat in the guest. Running
> just 'openssl speed aes' in both host and guest shows very similar
> numbers (for the first 3 columns). Adding 'perf stat' to the command
> (i.e., perf stat openssl speed aes) causes a significant decline in the
> guest - by a factor of 2. For comparison, 'perf stat' in the host has a
> negligible impact.

That's pretty bad. I'll investigate.
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/16/2011 08:36 AM, Avi Kivity wrote:
> That's pretty bad. I'll investigate.

Before I let this go for the day: running perf in the host shows
arch_local_irq_enable is a lot more prevalent when adding 'perf stat' to
the command in the guest.

David
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On Thu, 2011-06-16 at 08:08 -0600, David Ahern wrote:
> Command: perf stat -e instructions openssl speed aes
>
> Guest:  135,522,189,056 instructions   # 0.00 insns per cycle
> Host:   346,082,922,185 instructions   # 0.00 insns per cycle

How does: perf stat -e instructions:u openssl speed aes, compare?
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/16/2011 09:08 AM, Peter Zijlstra wrote:
> How does: perf stat -e instructions:u openssl speed aes, compare?

I think the problem is that perf stat in the guest introduces
significant overhead. I ran perf-record in the host on the VM pid while
running 'perf stat openssl speed aes' in the guest. perf-report on that
data shows:

    18.06%  9226  [k] arch_local_irq_enable
            |
            |--99.77%-- kvm_arch_vcpu_ioctl_run
            |           kvm_vcpu_ioctl
            |           do_vfs_ioctl
            |           sys_ioctl
            |           system_call_fastpath
            |           __GI___ioctl
            |           0x1010002
             --0.23%-- [...]

and then perf-annotate on kvm_arch_vcpu_ioctl_run shows:

          : vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
    21.47 :   1613a:  48 8b 3b  mov (%rbx),%rdi

David
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/16/2011 09:08 AM, Peter Zijlstra wrote:
> How does: perf stat -e instructions:u openssl speed aes, compare?

In the past couple of months I recall you posted a one billion
instruction benchmark in analyzing perf correctness. I can't seem to
find that email. Do you recall the benchmark, and if so can you resend?

David
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On Thu, 2011-06-16 at 09:19 -0600, David Ahern wrote:
> In the past couple of months I recall you posted a one billion
> instruction benchmark in analyzing perf correctness. I can't seem to
> find that email. Do you recall the benchmark, and if so can you resend?

Sure, I've got a couple of those things lying around:

# perf stat -e instructions:u ./loop_1b_instructions-4x

 Performance counter stats for './loop_1b_instructions-4x':

     4,000,085,344 instructions:u   # 0.00 insns per cycle

       0.311861278 seconds time elapsed

---
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/wait.h>

int main()
{
	int i;

	fork();
	fork();
	for (i = 0; i < 100000000; i++) {
		asm("nop");
		asm("nop");
		asm("nop");
		asm("nop");
		asm("nop");
		asm("nop");
		asm("nop");
	}
	wait(NULL);
	wait(NULL);
	wait(NULL);
	wait(NULL);
}
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/16/2011 09:27 AM, Peter Zijlstra wrote:
> Sure, I've got a couple of those things lying around:
>
> # perf stat -e instructions:u ./loop_1b_instructions-4x
>
>  Performance counter stats for './loop_1b_instructions-4x':
>
>      4,000,085,344 instructions:u   # 0.00 insns per cycle
>
>        0.311861278 seconds time elapsed

That's the one.

Guest:

 perf stat -e instructions:u /tmp/a.out

 Performance counter stats for '/tmp/a.out':

     4,000,090,357 instructions:u   # 0.00 insns per cycle

       2.972828828 seconds time elapsed

Host:

 perf stat -e instructions:u /tmp/a.out

 Performance counter stats for '/tmp/a.out':

     4,000,083,592 instructions:u   # 0.00 insns per cycle

       0.278185315 seconds time elapsed

So the counting is correct, but the time to run the command is
significantly longer in the guest. That emphasizes the performance
overhead of running perf-stat in the VM.

Even the default counters for perf-stat are similar, showing correctness
in counting:

Guest:

 perf stat ./a.out

 Performance counter stats for './a.out':

       2707.156752 task-clock               #    0.996 CPUs utilized
               337 context-switches         #    0.000 M/sec
                 0 CPU-migrations           #    0.000 M/sec
               209 page-faults              #    0.000 M/sec
     3,103,481,148 cycles                   #    1.146 GHz              [50.25%]
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
     3,999,894,345 instructions             #    1.29  insns per cycle  [50.03%]
       406,716,307 branches                 #  150.237 M/sec            [49.85%]
           270,801 branch-misses            #    0.07% of all branches  [50.02%]

       2.717859741 seconds time elapsed

Host:

 perf stat /tmp/a.out

 Performance counter stats for '/tmp/a.out':

       1117.694687 task-clock               #    3.845 CPUs utilized
               140 context-switches         #    0.000 M/sec
                 3 CPU-migrations           #    0.000 M/sec
               203 page-faults              #    0.000 M/sec
     3,052,677,262 cycles                   #    2.731 GHz
     1,449,951,708 stalled-cycles-frontend  #   47.50% frontend cycles idle
       471,788,212 stalled-cycles-backend   #   15.45% backend cycles idle
     4,006,074,559 instructions             #    1.31  insns per cycle
                                            #    0.36  stalled cycles per insn
       401,265,264 branches                 #  359.012 M/sec
            29,376 branch-misses            #    0.01% of all branches

       0.290722796 seconds time elapsed

David
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/16/2011 06:34 PM, David Ahern wrote:
> int main()
> {
>         int i;
>
>         fork();
>         fork();

What happens without the two forks?
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/16/2011 09:59 AM, Avi Kivity wrote:
> What happens without the two forks?

You have a 1-billion instruction benchmark, since there is only 1
process.

David
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/16/2011 07:04 PM, David Ahern wrote:
> You have a 1-billion instruction benchmark, since there is only 1
> process.

I mean in terms of the overhead. Is the overhead due to context switches
being made more expensive by the pmu, or is it something else? But there
were only 337 context switches in your measurement; they couldn't
possibly be so bad. Anyway, I'll investigate it.
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/16/2011 10:31 AM, Avi Kivity wrote:
> I mean in terms of the overhead. Is the overhead due to context
> switches being made more expensive by the pmu, or is it something else?

I figured you meant something else by the question.

> But there were only 337 context switches in your measurement; they
> couldn't possibly be so bad. Anyway, I'll investigate it.

I don't think it's the context switching. See the email on perf-report
and perf-annotate from the host side while running perf-stat in the
guest. Perhaps more vmexits and associated preemption disable/enable
overhead - or the rcu change?

David
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/14/2011 09:11 PM, David Ahern wrote:
> Based on Patch 2 you are expecting the guest to have this feature set.
> I've tried +perfmon and +arch_perfmon in the cpu definition for
> qemu-kvm (e.g., -cpu host,model=0,+perfmon) - no luck.
>
> Never mind. I had hand-applied your qemu-kvm patch and changed ebx, not
> eax. I noticed init_intel() looked at eax and discovered the user
> error. Applying the patch properly fixed it and it works. :-)

Okay. If you do anything interesting with it, please let us know. I only
tested the watchdog, 'perf top', and 'perf stat'.
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/15/2011 02:57 AM, Avi Kivity wrote:
> Okay. If you do anything interesting with it, please let us know. I
> only tested the watchdog, 'perf top', and 'perf stat'.

For the following I was using the userspace command from the latest
perf-core branch.

The cycles H/W event is not working for me, so perf-top did not do much
other than start.

perf-stat -ddd shows a whole lot of 0's - which is interesting. It means
time enabled and time running are non-0, yet the counter value is 0. The
cycles and instructions events also show as <not counted>.

Command I was playing with:

taskset -c 1 chrt -r 1 perf stat -ddd openssl speed aes

 Performance counter stats for 'openssl speed aes':

      46111.369065 task-clock                #    0.984 CPUs utilized
               195 context-switches          #    0.000 M/sec
                 0 CPU-migrations            #    0.000 M/sec
               650 page-faults               #    0.000 M/sec
     <not counted> cycles
                 0 stalled-cycles-frontend   #    0.00% frontend cycles idle   [ 7.63%]
                 0 stalled-cycles-backend    #    0.00% backend cycles idle    [12.70%]
     <not counted> instructions
       801,002,999 branches                  #   17.371 M/sec                  [ 8.15%]
         8,491,676 branch-misses             #    1.06% of all branches        [15.17%]
                 0 L1-dcache-loads           #    0.000 M/sec                  [ 9.23%]
                 0 L1-dcache-load-misses     #    0.00% of all L1-dcache hits  [ 8.48%]
                 0 LLC-loads                 #    0.000 M/sec                  [13.89%]
                 0 LLC-load-misses           #    0.00% of all LL-cache hits   [12.47%]
                 0 L1-icache-loads           #    0.000 M/sec                  [ 9.46%]
                 0 L1-icache-load-misses     #    0.00% of all L1-icache hits  [ 9.44%]
                 0 dTLB-loads                #    0.000 M/sec                  [ 9.59%]
                 0 dTLB-load-misses          #    0.00% of all dTLB cache hits [11.00%]
                 0 iTLB-loads                #    0.000 M/sec                  [11.13%]
                 0 iTLB-load-misses          #    0.00% of all iTLB cache hits [ 9.73%]
                 0 L1-dcache-prefetches      #    0.000 M/sec                  [10.98%]
                 0 L1-dcache-prefetch-misses #    0.000 M/sec                  [12.51%]

      46.851192693 seconds time elapsed

Also, the numbers for branches and branch-misses just seem wrong
compared to the same command run in the host, as well as running
perf-stat in the host on the vcpu thread running openssl (with the vcpu
pinned to a pcpu). And then reality kicked in and I had to move on to
other items.
David
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/15/2011 03:40 PM, David Ahern wrote:
> The cycles H/W event is not working for me, so perf-top did not do much
> other than start.

Strange, IIRC it did for me. I'll re-test.

> perf-stat -ddd shows a whole lot of 0's - which is interesting. It
> means time enabled and time running are non-0, yet the counter value is
> 0. The cycles and instructions events also show as <not counted>.

Most of those counters aren't supported by the emulated PMU. What does
dmesg say about Perf?

> Also, the numbers for branches and branch-misses just seem wrong
> compared to the same command run in the host, as well as running
> perf-stat in the host on the vcpu thread running openssl (with the vcpu
> pinned to a pcpu).

Could be due to the fact that the counter is running in host mode. Will
be fixed once the exclude_host/exclude_guest patch makes it in (and
gains Intel support).
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/15/2011 07:22 AM, Avi Kivity wrote:
> Most of those counters aren't supported by the emulated PMU.

If the counter is unsupported, perf-stat should show either <not
counted> or <not supported> (I submitted a patch for the latter which is
in the perf-core branch). If you add -v to perf-stat you see the
counters are enabled and the time running is getting incremented, i.e.,
something is probably not implemented correctly.

> What does dmesg say about Perf?

[    0.050995] Performance Events: Nehalem events, core PMU driver.
[    0.051466] ... version:              1
[    0.052998] ... bit width:            40
[    0.053999] ... generic registers:    2
[    0.054998] ... value mask:           000000ffffffffff
[    0.055998] ... max period:           000000007fffffff
[    0.057997] ... fixed-purpose events: 0
[    0.058998] ... event mask:           0000000000000003

> Could be due to the fact that the counter is running in host mode.

You mean when perf is run in the guest?

> Will be fixed once the exclude_host/exclude_guest patch makes it in
> (and gains Intel support).

How does exclude_{host,guest} help if the guest-side counters are low -
by orders of magnitude?
David
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/15/2011 07:08 PM, David Ahern wrote:
> [    0.050995] Performance Events: Nehalem events, core PMU driver.
> [    0.051466] ... version:              1

Well, it's not a Nehalem. Can you tweak the model/family (via -cpu host)
so it doesn't match a Nehalem and instead falls back on the
architectural PMU? Trial-and-error should work to find a good combo.

> You mean when perf is run in the guest?

Yes - it's counting host events (mostly kvm.ko) as well as guest events.

> How does exclude_{host,guest} help if the guest-side counters are low -
> by orders of magnitude?

It's probably the misidentification as a Nehalem.
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/15/2011 10:27 AM, Avi Kivity wrote:
> Well, it's not a Nehalem. Can you tweak the model/family (via -cpu
> host) so it doesn't match a Nehalem and instead falls back on the
> architectural PMU? Trial and error should work to find a good combo.

The qemu-kvm change is setting the pmu version to 1, and your patchset introduces v1 event constraints. So based on intel_pmu_init, model=0 is an appropriate model - and a required parameter (-cpu host,model=0). With that option I get the "not supported" label as expected.

Guest side:

 Performance counter stats for 'openssl speed aes':

      45160.015949 task-clock               #    0.998 CPUs utilized
               192 context-switches         #    0.000 M/sec
                 0 CPU-migrations           #    0.000 M/sec
               650 page-faults              #    0.000 M/sec
    57,064,592,321 cycles                   #    1.264 GHz                 [49.96%]
   138,608,368,094 instructions             #    2.43  insns per cycle     [50.04%]
     3,003,337,751 branches                 #   66.504 M/sec               [50.04%]
        21,890,537 branch-misses            #    0.73% of all branches     [49.96%]

      45.242117218 seconds time elapsed

(not supported events removed). And comparable events from running the same command host side:

 Performance counter stats for 'openssl speed aes':

      44947.093539 task-clock               #    0.998 CPUs utilized
             4,800 context-switches         #    0.000 M/sec
                 5 CPU-migrations           #    0.000 M/sec
               481 page-faults              #    0.000 M/sec
   124,610,137,228 cycles                   #    2.772 GHz                 [27.77%]
   338,982,292,106 instructions             #    2.72  insns per cycle
     6,061,899,079 branches                 #  134.867 M/sec               [33.33%]
         2,236,965 branch-misses            #    0.04% of all branches     [33.33%]

      45.043442068 seconds time elapsed

So cycles are off by roughly a factor of 2, instructions by roughly a factor of 2.5, and branches by a factor of 2. Those 3 events are fairly close from one run to the next in the host.
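[Editor's note: the host/guest gap is easy to quantify from the exact counts in the two perf-stat runs above; a quick sketch:]

```python
# Host vs. guest event counts, taken verbatim from the two
# perf-stat outputs above (openssl speed aes, ~45 s each).
counts = {
    #                 guest             host
    "cycles":       (57_064_592_321,  124_610_137_228),
    "instructions": (138_608_368_094, 338_982_292_106),
    "branches":     (3_003_337_751,   6_061_899_079),
}

for event, (guest, host) in counts.items():
    print(f"{event}: host/guest = {host / guest:.2f}x")
# -> cycles: host/guest = 2.18x
# -> instructions: host/guest = 2.45x
# -> branches: host/guest = 2.02x
```

The ratios confirm the "roughly 2, 2.5, and 2" estimates in the message.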
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/13/2011 10:55 PM, David Ahern wrote:
> On 06/13/2011 07:34 AM, Avi Kivity wrote:
>> This patchset exposes an emulated version 1 architectural performance
>> monitoring unit to KVM guests.  The PMU is emulated using perf_events,
>> so the host kernel can multiplex host-wide, host-user, and the guest
>> on available resources.
>
> Any particular magic needed to try this patchset?

You'll need the attached patch, '-cpu host' (or '-cpu host,model=0' sometimes), and, as patch 2 is a guest bug fix, you'll need to run the patched kernel in the guest as well.

-- 
error compiling committee.c: too many arguments to function

From 520cf568954500457e1efe37e144c022a767e41f Mon Sep 17 00:00:00 2001
From: Avi Kivity <a...@redhat.com>
Date: Mon, 9 May 2011 09:59:52 +0300
Subject: [PATCH] pmu hack

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 target-i386/cpuid.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
index 091d812..52ee7a6 100644
--- a/target-i386/cpuid.c
+++ b/target-i386/cpuid.c
@@ -1124,7 +1124,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         break;
     case 0xA:
         /* Architectural Performance Monitoring Leaf */
-        *eax = 0;
+        *eax = 0x07280201;
         *ebx = 0;
         *ecx = 0;
         *edx = 0;
-- 
1.7.5.3
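[Editor's note: as a sanity check on the magic constant in the patch above — CPUID leaf 0xA packs the PMU description into EAX bitfields (PMU version in bits 7:0, general-purpose counter count in bits 15:8, counter bit width in bits 23:16, EBX event-vector length in bits 31:24), so 0x07280201 reproduces the dmesg numbers quoted earlier in the thread:]

```python
# Decode the EAX value of CPUID leaf 0xA (Architectural Performance
# Monitoring) as set by the qemu-kvm patch above.
eax = 0x07280201

version        = eax & 0xff          # bits  7:0  - PMU version
num_counters   = (eax >> 8) & 0xff   # bits 15:8  - general-purpose counters
counter_width  = (eax >> 16) & 0xff  # bits 23:16 - counter bit width
ebx_vec_length = (eax >> 24) & 0xff  # bits 31:24 - length of EBX event vector

print(version, num_counters, counter_width, ebx_vec_length)  # -> 1 2 40 7
```

These match the guest dmesg: version 1, 2 generic registers, 40-bit width.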
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/14/2011 02:36 AM, Avi Kivity wrote:
> You'll need the attached patch, '-cpu host' (or '-cpu host,model=0'
> sometimes), and, as patch 2 is a guest bug fix, you'll need to run the
> patched kernel in the guest as well.

qemu-kvm is not cooperating. git repo as of 05f1737582 with your patch is aborting:

Welcome to Fedora
Starting udev: [    4.031626] udev[409]: starting version 161
[    4.831159] piix4_smbus :00:01.3: SMBus Host Controller at 0xb100, revision 0
qemu-kvm: /exports/daahern/qemu-kvm.git/hw/msix.c:616: msix_unset_mask_notifier: Assertion `dev->msix_mask_notifier' failed.

David
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 2011-06-14 19:15, David Ahern wrote:
> qemu-kvm is not cooperating. git repo as of 05f1737582 with your patch
> is aborting:
>
> qemu-kvm: /exports/daahern/qemu-kvm.git/hw/msix.c:616: msix_unset_mask_notifier: Assertion `dev->msix_mask_notifier' failed.

Use the qemu-kvm next branch. It has the fix you need.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/14/2011 11:24 AM, Jan Kiszka wrote:
> Use the qemu-kvm next branch. It has the fix you need.

Indeed it does. Thanks.

Avi: still no luck:

[    0.047996] Performance Events: unsupported p6 CPU model 0 no PMU driver, software events only.

qemu-kvm next branch, ce5f0a588b740e8f28f46a6009e12cfa72edc51f with your perfmon cpuid change. Host and guest are both running your kvm next branch with the pmu patch series.

David
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/14/2011 11:33 AM, David Ahern wrote:
> Avi: still no luck:
>
> [    0.047996] Performance Events: unsupported p6 CPU model 0 no PMU driver, software events only.

The perf init code is going down the !perfmon route:

	if (!cpu_has(boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
		switch (boot_cpu_data.x86) {
		case 0x6:
			return p6_pmu_init();
		case 0xf:
			return p4_pmu_init();
		}
		return -ENODEV;
	}

Based on patch 2 you are expecting the guest to have this feature set. I've tried +perfmon and +arch_perfmon in the cpu definition for qemu-kvm (e.g., -cpu host,model=0,+perfmon); no luck.

David
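[Editor's note: for anyone puzzling over why '-cpu host,model=0' lands in the p6 path — the dispatch above only falls back on CPU family when the arch_perfmon CPUID bit is absent, and the p6 driver then rejects unknown models. A tiny sketch of that decision; the function name is hypothetical and merely mirrors the kernel logic quoted above:]

```python
# Mirrors the kernel init dispatch quoted above: without
# X86_FEATURE_ARCH_PERFMON, the choice falls back on CPU family,
# and the p6 driver later rejects unknown models (hence the
# "unsupported p6 CPU model 0" dmesg line).
def pick_pmu_driver(family, has_arch_perfmon):
    if has_arch_perfmon:
        return "architectural"
    if family == 0x6:
        return "p6"   # model-specific; unknown models rejected later
    if family == 0xf:
        return "p4"
    return None       # -ENODEV

print(pick_pmu_driver(0x6, False))  # -> p6 (what David's guest hit)
print(pick_pmu_driver(0x6, True))   # -> architectural (what the CPUID patch enables)
```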
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/14/2011 11:48 AM, David Ahern wrote:
> I've tried +perfmon and +arch_perfmon in the cpu definition for
> qemu-kvm (e.g., -cpu host,model=0,+perfmon); no luck.

Never mind. I had hand-applied your qemu-kvm patch and changed ebx, not eax. I noticed init_intel() looked at eax and discovered the user error. Applying the patch correctly fixed it, and it works. :-)

David
Re: [PATCH v2 00/11] KVM in-guest performance monitoring
On 06/13/2011 07:34 AM, Avi Kivity wrote:
> This patchset exposes an emulated version 1 architectural performance
> monitoring unit to KVM guests.  The PMU is emulated using perf_events,
> so the host kernel can multiplex host-wide, host-user, and the guest
> on available resources.

Any particular magic needed to try this patchset?

Host and guest both 64-bit, Fedora 14. Kernel for both is your 'kvm.git next' with this patchset applied.

Host: 2 x Intel(R) Xeon(R) CPU E5540 @ 2.53GHz; qemu-kvm git as of May 9.

Guest: tried '-cpu host' and without a -cpu arg (so the qemu-kvm default). In both cases I get:

[    0.044999] CPU0: Intel(R) Xeon(R) CPU E5540 @ 2.53GHz stepping 05
[    0.046996] Performance Events: unsupported p6 CPU model 26 no PMU driver, software events only.

David

> Caveats:
> - counters that have PMI (interrupt) enabled stop counting after the
>   interrupt is signalled.  This is because we need one-shot samples
>   that keep counting, which perf doesn't support yet
> - some combinations of INV and CMASK are not supported
> - counters keep on counting in the host as well as the guest