On 20/01/16 15:06, Andrew Jones wrote:
> On Wed, Jan 20, 2016 at 02:28:05PM +0000, Marc Zyngier wrote:
>> On 20/01/16 14:01, Andrew Jones wrote:
>>> On Tue, Jan 19, 2016 at 07:48:14PM +0100, Andrew Jones wrote:
>>>> On Tue, Jan 19, 2016 at 01:43:07PM +0000, Marc Zyngier wrote:
>>>>>>> On Tue, Jan 19, 2016 at 01:37:16PM +0100, Andrew Jones wrote:
>>>>>> OK, CCing him. One thing I see is that without this change we're
>>>>>> currently setting the clock feature CLOCK_EVT_FEAT_C3STOP, even though
>>>>>> it's not true. Having that set may disable the oneshot capability
>>>>>> necessary to switch to nohz mode? I'll just stop there with my
>>>>>> speculation though, so Marc won't have to correct too much...
>>>>>
>>>>> You're spot on. See 82a5619 in the kernel tree. When I did a similar
>>>>> change in kvmtool, I saw a massive reduction in the number of timer
>>>>> interrupts injected (specially when the number of vcpu is relatively
>>>>> high).
>>>>>
>>>>> This also have interesting benefits when running on a model, where
>>>>> you're trying to squeeze the last bits of "performance" from the
>>>>> monster...
>>>>>
>>>>
>>>> Hmm, I'm probably testing this wrong, but I don't see any difference in
>>>> the number of injected timer interrupts. My guest, which I boot with
>>>> UEFI, has
>>>>
>>>> CONFIG_ARM_ARCH_TIMER=y
>>>> CONFIG_ARM_ARCH_TIMER_EVTSTREAM=y
>>>> CONFIG_ARM_TIMER_SP804=y
>>>> CONFIG_HIGH_RES_TIMERS=y
>>>> CONFIG_TICK_ONESHOT=y
>>>> CONFIG_NO_HZ_COMMON=y
>>>> # CONFIG_HZ_PERIODIC is not set
>>>> CONFIG_NO_HZ_IDLE=y
>>>> # CONFIG_NO_HZ_FULL is not set
>>>> CONFIG_NO_HZ=y
>>>> CONFIG_HZ_1000=y
>>>> CONFIG_HZ=1000
>>>>
>>>> I've boot a guest using DT with and without this patch
>>>>
>>>> ---WITHOUT---
>>>>
>>>> # ls /proc/device-tree/timer
>>>> compatible  interrupts  name
>>>> # cat /proc/interrupts
>>>>       CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7
>>>>   3:  6958   5766   5166   5187   5576   5129   4695   4398   GIC  27 Edge  arch_timer
>>>> # sleep 120 && cat /proc/interrupts
>>>>       CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7
>>>>   3:  7557   5986   5487   5265   6232   5868   5464   4438   GIC  27 Edge  arch_timer
>>>>
>>>> ---WITH---
>>>>
>>>> # ls /proc/device-tree/timer
>>>> always-on  compatible  interrupts  name
>>>> # cat /proc/interrupts
>>>>       CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7
>>>>   3:  7005   6080   4996   5391   5165   5257   4930   4844   GIC  27 Edge  arch_timer
>>>> # sleep 120 && cat /proc/interrupts
>>>>       CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7
>>>>   3:  7523   6505   5264   6717   5273   5391   5526   4901   GIC  27 Edge  arch_timer
>>>>
>>>>
>>>> And kvm trace data has
>>>>
>>>> ---WITHOUT---
>>>> $ grep kvm_timer_update_irq trace.out | wc -l
>>>> 94336
>>>> ---WITH---
>>>> $ grep kvm_timer_update_irq trace.out | wc -l
>>>> 95838
>>>>
>>>
>>> Must be how I'm looking, because I just tried kvmtool with/without
>>> Marc's patch that adds always-on, but don't see any reduction of
>>> interrupts there either. I used a defconfig guest kernel. Also,
>>> not that I think it should matter, but my host kernel is 4.4-rc4
>>> based.
>>>
>>> I'd like to be able to see a difference with/without this always-on
>>> patch, not because I don't think we should take it anyway, but because
>>> I need a test case for the ACPI counterpart.
>>
>> I just run a couple of quick tests, measuring interrupt rate (vmstat 1)
>> on the host, with one VM (2 vcpus) idling, and I'm seeing the following
>> thing:
>>
>> Without "always-on": ~380 interrupts per second
>> With "always-on": ~40 interrupts per second
>>
>> This is with kvmtool, 32bit host (but none of that is arch specific anyway).
>>
>
> For me (64bit host, one VM (8 vcpus)) of 100 'vmstat 1' samples I have the
> following.
>
> Without "always-on": mean=56.370 sd=33.404 min=1 max=244
> With "always-on": mean=51.580 sd=33.361 min=1 max=273
>
> I'm also using kvmtool, and my guest is idle.
>
> So a difference between 32 and 64bit hosts? Again, my guest config is
> now just a defconfig. My host config is not, but I'm not sure what
> options to look for other than what I wrote above, which are the same
> for my host.
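As a side note, the two /proc/interrupts readings quoted above can be turned into an interrupt rate with a few lines of script. This is only a minimal sketch: the counts are copied from the quoted samples, the 120 second interval comes from the `sleep 120`, and the names `SAMPLES` and `timer_irq_rate` are made up for illustration.

```python
# Per-CPU arch_timer counts copied from the quoted /proc/interrupts samples.
# Each pair is (first reading, reading taken after `sleep 120`).
SAMPLES = {
    "without always-on": (
        [6958, 5766, 5166, 5187, 5576, 5129, 4695, 4398],
        [7557, 5986, 5487, 5265, 6232, 5868, 5464, 4438],
    ),
    "with always-on": (
        [7005, 6080, 4996, 5391, 5165, 5257, 4930, 4844],
        [7523, 6505, 5264, 6717, 5273, 5391, 5526, 4901],
    ),
}

INTERVAL_S = 120  # length of the `sleep` between the two readings


def timer_irq_rate(before, after, interval_s=INTERVAL_S):
    """Return (per-CPU deltas, total interrupts per second) for one run."""
    deltas = [b - a for a, b in zip(before, after)]
    return deltas, sum(deltas) / interval_s


if __name__ == "__main__":
    for label, (before, after) in SAMPLES.items():
        deltas, rate = timer_irq_rate(before, after)
        print(f"{label}: deltas={deltas} total={sum(deltas)} rate={rate:.1f}/s")
```

On those figures the totals are 3422 and 3432 timer interrupts over the interval, i.e. roughly 28.5 per second across the 8 vcpus in both runs, which agrees with the "no visible difference" observation for this guest.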

Just tried on Seattle with a 64bit guest, and there is hardly any
difference indeed. Both host and guest are "mostly" defconfig as well.
So there is a kernel configuration difference.

Running my 32bit guest on a 64bit host definitely shows a massive
difference (with 8 vcpus):

Without "always-on": ~1200 interrupts per second
With "always-on": ~50 interrupts per second

[Head scratching, poking Mark]

Right, I now know what is going on:

The arm64 kernel uses tick_setup_hrtimer_broadcast() so that it can
still use the arch timer as a broadcast timer (forcing one CPU to remain
on), while the 32bit kernel relies on the presence of a backup timer
(sp804 anyone?) or the guarantee that the timer cannot go away
(always-on).

This is probably why I'm seeing such a gain with a 32bit guest, and none
with a 64bit guest (the kernel already does the right thing).

As to why there is such a difference between the two architectures, this
is a story for another day...

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...