Re: [PATCH v5 00/21] KVM: ARM64: Add guest PMU support
On 07/12/15 14:47, Shannon Zhao wrote:
> Hi Marc,
>
> On 2015/12/7 22:11, Marc Zyngier wrote:
>> Shannon,
>>
>> On 03/12/15 06:11, Shannon Zhao wrote:
>>> From: Shannon Zhao
>>>
>>> [...]
>>
>> I've commented on more issues I've found. Hopefully you'll be able to
>> respin this quickly enough, and end up with a simpler code base (state
>> duplication is a bit messy).
>>
> Ok, will try my best :)
>
>> Another thing I have noticed is that you have dropped the vgic changes
>> that were configuring the interrupt. It feels like they should be
>> included, and configure the PPI as a LEVEL interrupt.
> The reason I dropped that is that in upstream code PPIs are LEVEL
> interrupts by default, which was changed by the arch_timers patches. So
> is it necessary to configure it again?

Ah, yes. Missed that. No, that's fine.

>
>> Also, looking at
>> your QEMU code, you seem to configure the interrupt as EDGE, which is
>> not how your emulated HW behaves.
>>
> Sorry, the published QEMU code is out of date; the version I use for
> local testing configures the interrupt as LEVEL. I will push the newest
> one tomorrow.

That'd be good.

Thanks,

	M.
--
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 00/21] KVM: ARM64: Add guest PMU support
Hi Marc,

On 2015/12/7 22:11, Marc Zyngier wrote:
> Shannon,
>
> On 03/12/15 06:11, Shannon Zhao wrote:
>> From: Shannon Zhao
>>
>> [...]
>
> I've commented on more issues I've found. Hopefully you'll be able to
> respin this quickly enough, and end up with a simpler code base (state
> duplication is a bit messy).
>
Ok, will try my best :)

> Another thing I have noticed is that you have dropped the vgic changes
> that were configuring the interrupt. It feels like they should be
> included, and configure the PPI as a LEVEL interrupt.
The reason I dropped that is that in upstream code PPIs are LEVEL
interrupts by default, which was changed by the arch_timers patches. So
is it necessary to configure it again?

> Also, looking at
> your QEMU code, you seem to configure the interrupt as EDGE, which is
> not how your emulated HW behaves.
>
Sorry, the published QEMU code is out of date; the version I use for
local testing configures the interrupt as LEVEL. I will push the newest
one tomorrow.

> Looking forward to reviewing the next version.
>
> Thanks,
>
> 	M.

--
Shannon
Re: [PATCH v5 00/21] KVM: ARM64: Add guest PMU support
Shannon,

On 03/12/15 06:11, Shannon Zhao wrote:
> From: Shannon Zhao
>
> This patchset adds guest PMU support for KVM on ARM64. It takes a
> trap-and-emulate approach: when the guest wants to monitor an event, the
> access traps to KVM, which calls the perf_event API to create a perf
> event and then uses the relevant perf_event APIs to read the event's
> count.
>
> To test this patchset in the guest, use perf. "perf list" shows the
> hardware events and hardware cache events that perf supports; then use
> "perf stat -e EVENT" to monitor an event. For example, use
> "perf stat -e cycles" to count CPU cycles and "perf stat -e cache-misses"
> to count cache misses.
>
> Below is the output of "perf stat -r 5 sleep 5" when run in the host and
> in the guest.
>
> Host:
>  Performance counter stats for 'sleep 5' (5 runs):
>
>     0.510276  task-clock (msec)       #  0.000 CPUs utilized    ( +- 1.57% )
>            1  context-switches        #  0.002 M/sec
>            0  cpu-migrations          #  0.000 K/sec
>           49  page-faults             #  0.096 M/sec            ( +- 0.77% )
>      1064117  cycles                  #  2.085 GHz              ( +- 1.56% )
>               stalled-cycles-frontend
>               stalled-cycles-backend
>       529051  instructions            #   0.50 insns per cycle  ( +- 0.55% )
>               branches
>         9894  branch-misses           # 19.390 M/sec            ( +- 1.70% )
>
>  5.000853900 seconds time elapsed                               ( +- 0.00% )
>
> Guest:
>  Performance counter stats for 'sleep 5' (5 runs):
>
>     0.642456  task-clock (msec)       #  0.000 CPUs utilized    ( +- 1.81% )
>            1  context-switches        #  0.002 M/sec
>            0  cpu-migrations          #  0.000 K/sec
>           49  page-faults             #  0.076 M/sec            ( +- 1.64% )
>      1322717  cycles                  #  2.059 GHz              ( +- 1.88% )
>               stalled-cycles-frontend
>               stalled-cycles-backend
>       640944  instructions            #   0.48 insns per cycle  ( +- 1.10% )
>               branches
>        10665  branch-misses           # 16.600 M/sec            ( +- 2.23% )
>
>  5.001181452 seconds time elapsed                               ( +- 0.00% )
>
> I ran the following cycle counter read test in both guest and host:
>
> static void test(void)
> {
> 	unsigned long count = 0, count1, count2;
> 	count1 = read_cycles();
> 	count++;
> 	count2 = read_cycles();
> }
>
> Host:
> count1: 3046186213
> count2: 3046186347
> delta: 134
>
> Guest:
> count1: 5645797121
> count2: 5645797270
> delta: 149
>
> The gap between guest and host is very small. I think one reason is
> that the guest counter does not count cycles spent at EL2 or in the
> host, since we set exclude_hv = 1, so the cycles spent saving/restoring
> registers, which happens at EL2, are not included.
>
> This patchset can be fetched from [1], and the relevant QEMU version
> for testing from [2].
>
> The results of 'perf test' can be found at [3][4].
> The results of the perf_event_tests test suite can be found at [5][6].
>
> I have also tested "perf top" in two VMs and the host at the same time,
> and it works well.

I've commented on more issues I've found. Hopefully you'll be able to
respin this quickly enough, and end up with a simpler code base (state
duplication is a bit messy).

Another thing I have noticed is that you have dropped the vgic changes
that were configuring the interrupt. It feels like they should be
included, and configure the PPI as a LEVEL interrupt. Also, looking at
your QEMU code, you seem to configure the interrupt as EDGE, which is
not how your emulated HW behaves.

Looking forward to reviewing the next version.

Thanks,

	M.