On Sunday, March 21, 2021 at 4:34:50 AM UTC-4 Nadav Har'El wrote:
> On Fri, Mar 19, 2021 at 11:50 PM Waldemar Kozaczuk <jwkoz...@gmail.com> > wrote: > >> The AArch64 Programmer's Guides Generic Timer (chapter 3.4, page 10) >> states following: >> >> "The interrupts generated by the timer behave in a level-sensitive manner. >> This means that, once the timer firing condition is reached, >> the timer will continue to signal an interrupt until one of the >> following situations occurs: >> - IMASK is set to one, which masks the interrupt. >> - ENABLE is cleared to 0, which disables the timer. >> - TVAL or CVAL is written, so that firing condition is no longer met." >> >> This indicates that the timer interrupt handler needs to mask >> or disable the timer, otherwise the interrupt will be re-delivered >> and in some scenarios might cause underlying VMM (like QEMU in TCG mode) >> stop delivering subsequent interrupts. > > > While I understand why the interrupt needs to be cleared, this description > seems to indicate that > this is a bug in the VMM, not in OSv, no? The above description says that > if the timer fired and > OSv did not clear it, it should re-fire immediately, and if it doesn't it > sounds like a (virtual) processor bug. > It very well could be a QEMU bug. With the original code, OSv would not get any interrupts for tiny tval values, not just zero. Having said that, it is sad we do not understand what exactly is going on on the QEMU side in TCG mode and why masking the interrupt fixes the problem. > > That being said, it indeed sounds better to disable the timer, not because > of this bug but because if > we don't, we'll get spurious timer events, no? So this fix is good. > Perhaps it works around the QEMU > bug because nobody thought of testing QEMU in the unexpected case where > the timer is not disabled > after receiving it? > Very likely because Linux does mask timer interrupt (set TIMER_CTL_IMASK_BIT) so the interrupt does not get re-delivered. > > I didn't understand what the code below does, so I have some questions > below - but it's likely it > reflects more on the quality of my understanding than that of the code :-) > > >> In any case per the description >> from the guide above, it is correct to mask the interrupt and/or disable >> the timer in the timer interrupt handler. This is also what Linux does - >> >> https://github.com/torvalds/linux/blob/edd7ab76847442e299af64a761febd180d71f98d/drivers/clocksource/arm_arch_timer.c#L638-L652 >> . >> >> Besides the above, the patch also sligthly optimizes the method set() >> to make it set new timer only if calculated number of ticks (tval) >> is greater than 0. >> >> Fixes #1127 >> >> Signed-off-by: Waldemar Kozaczuk <jwkoz...@gmail.com> >> --- >> arch/aarch64/arm-clock.cc | 39 +++++++++++++++++++++++++++++++++------ >> 1 file changed, 33 insertions(+), 6 deletions(-) >> >> diff --git a/arch/aarch64/arm-clock.cc b/arch/aarch64/arm-clock.cc >> index 7d11e454..d32f58b1 100644 >> --- a/arch/aarch64/arm-clock.cc >> +++ b/arch/aarch64/arm-clock.cc >> @@ -98,6 +98,10 @@ u64 arm_clock::processor_to_nano(u64 ticks) >> return cntvct; >> } >> >> +#define TIMER_CTL_ISTATUS_BIT 4 >> +#define TIMER_CTL_IMASK_BIT 2 >> +#define TIMER_CTL_ENABLE_BIT 1 >> + >> class arm_clock_events : public clock_event_driver { >> public: >> arm_clock_events(); >> @@ -120,7 +124,25 @@ arm_clock_events::arm_clock_events() >> res = 16 + 11; /* default PPI 11 */ >> } >> _irq.reset(new ppi_interrupt(gic::irq_type::IRQ_TYPE_EDGE, res, >> - [this] { this->_callback->fired(); })); >> + [this] { >> + /* From AArch64 Programmer's Guides Generic Timer (chapter 3.4, page >> 10): >> + * The interrupts generated by the timer behave in a level-sensitive >> manner. >> + * This means that, once the timer firing condition is reached, >> + * the timer will continue to signal an interrupt until one of the >> following situations occurs: >> + - IMASK is set to one, which masks the interrupt. >> + - ENABLE is cleared to 0, which disables the timer. >> + - TVAL or CVAL is written, so that firing condition is no longer met. >> + When writing an interrupt handler for the timers, it is important >> + that software clears the interrupt before deactivating the interrupt >> in the GIC. >> + Otherwise the GIC will re-signal the same interrupt again. */ >> + u32 ctl = this->read_ctl(); >> + if (ctl & TIMER_CTL_ISTATUS_BIT) { // check if timer condition >> met >> > > We didn't have this ctl & TIMER_CTL_ISTATUS_BIT before. Why do we need to > add it now? > Could we get a timer interrupt without this bit? > In theory, we could get a timer interrupt without this bit set. In any case, I am following what Linux does (see the github link above). > > + ctl |= TIMER_CTL_IMASK_BIT; // mask timer interrupt >> + ctl &= ~TIMER_CTL_ENABLE_BIT; // disable timer >> + this->write_ctl(ctl); >> > > I'm afraid I don't understand this - when we handle one timer interrupt, > we mask the timer interrupt and disable the timer, > First, why do we need to do both? (what's the difference between disabling > and masking)? > Linux only masks the timer interrupt and this was enough to fix the problem with QEMU/TCG. I added disabling of timer to make armclock disable the timer output signal which I found somewhere saves cpu energy. But you are right it is not necessary. Maybe we should remove it. On other hand, it makes this code symmetrical to what we do in set. > > But thinking of the performance of this, what happens in the likely case > that there is a *next* timer interrupt? > I see below in set() that we re-enable the timer, and un-mask it? > Maybe instead of disabling and immediately re-enabling the timer, we could > avoid doing both and save a few nanoseconds? > I wonder if would save any as regardless of how many bits of CTL you enable/disable, it just requires one msr instruction (write to CTL register). > > Shouldn't we need to do something like: > > 1. If the timer list has a next timer, reconfigure the timer to that next > time > 2. Otherwise, disable the timer. > 3. However, the "otherwise" in 2 also includes the case that the next > timer happens right now or in the past. In that case, we need to manually > run the callbacks, because after we disable the timer we won't get more > interrupts for the same timer. > > You are right. My original patch proposal had this: class arm_clock *c = static_cast<arm_clock *>(clock::get()); tval = ((__uint128_t)tval * c->freq_hz) / NANO_PER_SEC; - u32 ctl = this->read_ctl(); - ctl |= 1; /* set enable */ - ctl &= ~2; /* unmask timer interrupt */ - this->write_tval(tval); - this->write_ctl(ctl); + if (tval) { + u32 ctl = this->read_ctl(); + ctl |= 1; /* set enable */ + ctl &= ~2; /* unmask timer interrupt */ + this->write_tval(tval); + this->write_ctl(ctl); + if (this->read_ctl() & 4) { // is timer condition met? + ctl |= 2; /* mask */ + this->write_ctl(ctl); + _callback->fired(); + } + } else { + _callback->fired(); + } } And this was enough (based on my empirical tests) to deal with the QEMU bug (which we think there is one but cannot prove 100%). But since then I saw Linux code and discovered, that it masks the interrupt in the handler and that adding code to mask the interrupt in OSv arm clock interrupt handler would alone work around QEMU bug. So either of the two would help us. But if we do what I proposed originally in the set (see above), it would require reading CTL registers twice - one before writing tval and one after to check if the timer condition is already met to disable it. Which would be more expensive, right? I'm thinking that maybe missing 3 is the bug we had with QEMU - if our code > did not have a special case for a timer 0 nanoseconds in the future (or > even in the past) than it would have relied on the processor serving the > same interrupt again, which QEMU probably didn't (because of a bug). > You now added this special case (the !tval case), so I wonder if that > change alone would have fixed the bug. > !tval alone was not enough. So the !tval in this patch is only meant to save us from setting the timer (not big saving). > > It's been ages since I looked at the details here, so maybe I'm just being > confused. > > > >> + this->_callback->fired(); >> + } >> + })); >> } >> >> arm_clock_events::~arm_clock_events() >> @@ -171,11 +193,16 @@ void arm_clock_events::set(std::chrono::nanoseconds >> nanos) >> class arm_clock *c = static_cast<arm_clock *>(clock::get()); >> tval = ((__uint128_t)tval * c->freq_hz) / NANO_PER_SEC; >> >> - u32 ctl = this->read_ctl(); >> - ctl |= 1; /* set enable */ >> - ctl &= ~2; /* unmask timer interrupt */ >> - this->write_tval(tval); >> - this->write_ctl(ctl); >> + if (tval) { >> + u32 ctl = this->read_ctl(); >> + ctl |= TIMER_CTL_ENABLE_BIT; // set enable >> + ctl &= ~TIMER_CTL_IMASK_BIT; // unmask timer interrupt >> + this->write_tval(tval); >> + this->write_ctl(ctl); >> + } else { >> + //No point to set timer if tval is 0 >> + _callback->fired(); >> > > As I noted above, I wonder if this change alone (the !tval case where you > run fired() immediately) would have been enough to fix this QEMU bug. > Which doesn't mean, of course, that the rest of the changes aren't good as > well. But I'm just trying to understand. > No, it was not enough. > > + } >> } >> } >> >> -- >> 2.30.2 >> >> -- >> You received this message because you are subscribed to the Google Groups >> "OSv Development" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to osv-dev+u...@googlegroups.com. >> To view this discussion on the web visit >> https://groups.google.com/d/msgid/osv-dev/20210319214955.93417-1-jwkozaczuk%40gmail.com >> . >> > -- You received this message because you are subscribed to the Google Groups "OSv Development" group. To unsubscribe from this group and stop receiving emails from it, send an email to osv-dev+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/osv-dev/893380d8-d078-45c5-bae3-32f004c7b85cn%40googlegroups.com.