After more digging and experimenting, I am more inclined to say this is not 
a QEMU bug but OSv one after all. And this smaller patch below is enough to 
make the tests that used to hang in TCG mode run consistently without any 
problems based on many runs:

diff --git a/arch/aarch64/arm-clock.cc b/arch/aarch64/arm-clock.cc
index 7d11e454..a5542718 100644
--- a/arch/aarch64/arm-clock.cc
+++ b/arch/aarch64/arm-clock.cc
@@ -120,7 +120,23 @@ arm_clock_events::arm_clock_events()
         res = 16 + 11; /* default PPI 11 */
     }
     _irq.reset(new ppi_interrupt(gic::irq_type::IRQ_TYPE_EDGE, res,
-                                 [this] { this->_callback->fired(); }));
+                                 [this] {
+ /* From AArch64 Programmer's Guides Generic Timer (chapter 3.4, page 10):
+  * The interrupts generated by the timer behave in a level-sensitive 
manner.
+  * This means that, once the timer firing condition is reached,
+  * the timer will continue to signal an interrupt until one of the 
following situations occurs:
+    - IMASK is set to one, which masks the interrupt.
+    - ENABLE is cleared to 0, which disables the timer.
+    - TVAL or CVAL is written, so that firing condition is no longer met.
+    When writing an interrupt handler for the timers, it is important
+    that software clears the interrupt before deactivating the interrupt 
in the GIC.
+    Otherwise the GIC will re-signal the same interrupt again. */
+        u32 ctl = this->read_ctl();
+        ctl |= 2;  // mask timer interrupt
+        ctl &= ~1; // disable timer
+        this->write_ctl(ctl);
+        this->_callback->fired();
+    }));
 }
 
 arm_clock_events::~arm_clock_events()

As the comment states, our timer interrupt handler missed disabling the 
timer and/or masking the interrupt right after an interrupt was raised and 
that seemed to have caused QEMU to not fire the next timer event. On real 
hardware with KVM on that did not seem to matter but disabling the timer 
and masking the interrupt is the correct way to handle the timer interrupt.

The other optimizations to the set() method (at least the check to verify 
non-0 tval) I had in the original patch seem to be correct as well so it 
does not hurt to add them as well.

Waldek

BTW. Linux seems to be masking the interrupt in the handler 
- 
https://github.com/torvalds/linux/blob/edd7ab76847442e299af64a761febd180d71f98d/drivers/clocksource/arm_arch_timer.c#L638-L652.
 
But disabling the timer seems to be the correct thing to do as well per 
"Setting this bit to 0 disables the timer output signal, but the timer 
value accessible from CNTV_TVAL_EL0 continues to count down." from the 
description of ENABLE bit of the CTL register.
On Sunday, March 14, 2021 at 10:12:56 PM UTC-4 Waldek Kozaczuk wrote:

> I after more research I may more inclined to say this might be a QEMU bug. 
> In essence, in TCG mode QEMU does NOT seem to correctly emulate 
> CNTV_TVAL_EL0 (Virtual Timer TimerValue) and CNTV_CVAL_EL0 (Virtual Timer 
> CompareValue) registers when one sets a tiny value to the former one or 
> increments the latter one by a tiny value. In situations when OSv would 
> seem to have ceased to receive any new timer interrupts the last value 
> assigned to CNTV_TVAL_EL0 would typically be 3-4 ticks which given the 
> frequency of 62500000 equals setting a timer in 48 or 64 nanoseconds.
>
> Based on the descriptions of these 4 registers taken from ARMv8 Arch 
> Documentation:
>
>    - CNTVCT_EL0, Counter-timer Virtual Count register - Holds the 64-bit 
>    virtual count value
>    - CNTV_CTL_EL0, Counter-timer Virtual Timer Control register
>       - 
>       - ISTATUS, bit [2]
>          - The status of the timer. This bit indicates whether the timer 
>          condition is met:
>             - 0b0 Timer condition is not met.
>             - 0b1 Timer condition is met.
>          - When the value of the ENABLE bit is 1, ISTATUS indicates 
>          whether the timer condition is met. ISTATUS takes no account of the 
> value 
>          of the IMASK bit. If the value of ISTATUS is 1 and the value of 
> IMASK is 0 
>          then the timer interrupt is asserted.
>          - When the value of the ENABLE bit is 0, the ISTATUS field is 
>          UNKNOWN. *This bit is read-only.*
>       - 
>       - IMASK, bit [1]
>          - Timer interrupt mask bit. Permitted values are:
>             - 0b0 Timer interrupt is not masked by the IMASK bit.
>             - 0b1 Timer interrupt is masked by the IMASK bit.
>          - ENABLE, bit [0]
>          - Enables the timer. Permitted values are:
>             - 0b0 Timer disabled.
>             - 0b1 Timer enabled.
>          - Setting this bit to 0 disables the timer output signal, but 
>          the timer value accessible from CNTV_TVAL_EL0 continues to count 
> down.
>       - CNTV_CVAL_EL0, Counter-timer Virtual Timer CompareValue register 
>    - Holds the compare value for the virtual timer
>    - 
>       - When CNTV_CTL_EL0.ENABLE is 1, the timer condition is met when 
>       (CNTVCT_EL0 - CompareValue) is greater than or equal to zero. This 
> means 
>       that CompareValue acts like a 64-bit upcounter timer. When the timer 
>       condition is met:
>          - CNTV_CTL_EL0.ISTATUS is set to 1.
>          - If CNTV_CTL_EL0.IMASK is 0, an interrupt is generated.
>       - CNTV_TVAL_EL0, Counter-timer Virtual Timer TimerValue register
>    - 
>       - On a write of this register, CNTV_CVAL_EL0 is set to (CNTVCT_EL0 
>       + TimerValue), where TimerValue is treated as a signed 32-bit integer.
>       - When CNTV_CTL_EL0.ENABLE is 1, the timer condition is met when 
>       (CNTVCT_EL0 - CNTV_CVAL_EL0) is greater than or equal to zero. This 
> means 
>       that TimerValue acts like a 32-bit downcounter timer. When the timer 
>       condition is met:
>          - CNTV_CTL_EL0.ISTATUS is set to 1.
>          - If CNTV_CTL_EL0.IMASK is 0, an interrupt is generated.
>       
> Every time, we write a tval (nanos converted to ticks) to CNTV_TVAL_EL0, 
> the CNTV_CVAL_EL0 should be set to the value of CNTVCT_EL0 + tval. Then 
> once CNTVCT_EL0 >= CNTV_CVAL_EL0, the ISTATUS bit of CNTV_CTL_EL0 should be 
> on when read and trigger interrupt if IMASK is 0.
>
> But what should happen if by the time we set CNTV_TVAL_EL0 and write to 
> CNTV_CTL_EL0 to set IMASK to 0 and ENABLE bit to 1, the timer condition is 
> already met because the tval was really tiny like 3 or 4, etc. Per the spec 
> the timer condition should be met and we should be receiving an interrupt. 
> But it seems in TCG mode QEMU does not seem to raise one.
>
> But also please note that after we write CNTV_TVAL_EL0 followed by 
> CNTV_CTL_EL0, subsequently reading CNTV_CTL_EL0 should return value with 
> ISTATUS on if the condition is met.
>
> Interestingly enough if I take advantage of the above and re-read 
> CNTV_CTL_EL0 and detect that ISTATUS is met and fire the timer event on the 
> spot and enable IMASK, all the tests that kept handing with TCG work fine 
> based on hundreds of runs (one test like tst-hub.cc would hang always in 
> the test_timer part before the change, now it always passes).
>
> So this patch seems to be fixing the issue and make handling clock events 
> slightly more efficient by skipping settning unnecessary alarms. This 
> optimization also work in native mode with KVM (for KVM mode this patch is 
> unnecessary as the native cpu is able to handle tiny TVAL values it seems).
>
> diff --git a/arch/aarch64/arm-clock.cc b/arch/aarch64/arm-clock.cc
> index c1f4b277..c7aafc5a 100644
> --- a/arch/aarch64/arm-clock.cc
> +++ b/arch/aarch64/arm-clock.cc
> @@ -106,7 +106,22 @@ arm_clock_events::arm_clock_events()
>          res = 16 + 11; /* default PPI 11 */
>      }
>      _irq.reset(new ppi_interrupt(gic::irq_type::IRQ_TYPE_EDGE, res,
> -                                 [this] { this->_callback->fired(); }));
> +                                 [this] {
> +/* From AArch64 Programmer's Guides Generic Timer:
> + * The interrupts generated by the timer behave in a level-sensitive 
> manner.
> + * This means that, once the timer firing condition is reached,
> + * the timer will continue to signal an interrupt until one of the 
> following situations occurs:
> +   - IMASK is set to one, which masks the interrupt.
> +   - ENABLE is cleared to 0, which disables the timer.
> +   - TVAL or CVAL is written, so that firing condition is no longer met.
> +   When writing an interrupt handler for the timers, it is important
> +   that software clears the interrupt before deactivating the interrupt 
> in the GIC.
> +   Otherwise the GIC will re-signal the same interrupt again. */
> +        u32 ctl = this->read_ctl();
> +        ctl |= 2; /* mask timer interrupt */
> +        this->write_ctl(ctl);
> +        this->_callback->fired();
> +    }));
>  }
>  
>  arm_clock_events::~arm_clock_events()
> @@ -157,11 +172,20 @@ void arm_clock_events::set(std::chrono::nanoseconds 
> nanos)
>          class arm_clock *c = static_cast<arm_clock *>(clock::get());
>          tval = ((__uint128_t)tval * c->freq_hz) / NANO_PER_SEC;
>  
> -        u32 ctl = this->read_ctl();
> -        ctl |= 1;  /* set enable */
> -        ctl &= ~2; /* unmask timer interrupt */
> -        this->write_tval(tval);
> -        this->write_ctl(ctl);
> +        if (tval) {
> +           u32 ctl = this->read_ctl();
> +           ctl |= 1;  /* set enable */
> +           ctl &= ~2; /* unmask timer interrupt */
> +           this->write_tval(tval);
> +           this->write_ctl(ctl);
> +           if (this->read_ctl() & 4) { // is timer condition met?
> +               ctl |= 2; /* mask */
> +               this->write_ctl(ctl);
> +              _callback->fired();
> +           }
> +        } else {
> +           _callback->fired();
> +        }
>      }
>  }
>
> So it seems that QEMU at least properly emulates the ISTATUS bit of the 
> CNTV_CTL_EL0 register.
>
> Waldek
>
> On Sunday, March 14, 2021 at 11:07:37 AM UTC-4 Waldek Kozaczuk wrote:
>
>> Thanks for your suggestions.
>>
>> Can you also verify if my thinking is correct: the idle thread for each 
>> CPU should ALWAYS advance which can be observed by constantly increasing 
>> value of *_total_cpu_time* unless all other threads on that cpu use the 
>> time (the case of 100% cpu utilization)? Also, any active timers should 
>> only stay active until they expire. In other words, if I see not changing 
>> *_total_cpu_time* and the same active timers every time I connect with 
>> gdb or continue it means that something is wrong meaning it is an abnormal 
>> state. It could be explained by either timer interrupts not being raised by 
>> QEMU or a bug in OSv where it misses to set up the next timer event or sets 
>> in an overly distant future or some other condition of that sort. Right?
>>
>>
>>
>> On Sunday, March 14, 2021 at 8:16:16 AM UTC-4 Nadav Har'El wrote:
>>
>>> On Thu, Mar 11, 2021 at 9:12 PM Waldek Kozaczuk <jwkoz...@gmail.com> 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> In the last couple of days, I have been troubleshooting the scenarios 
>>>> when some unit test would hang when running on QEMU in TCG mode. I have 
>>>> opened an issue with extensive details and some findings - 
>>>> https://github.com/cloudius-systems/osv/issues/1127. My running theory 
>>>> is that we may have a bug in arm-clock implementation or a bug in QEMU 
>>>> aarch64 TCG code that simulates the virtual clock counters and affects how 
>>>> we use them. It also relates to the scheduler logic and how it interacts 
>>>> with a clock so I wonder if some of you might have any interesting 
>>>> insights.
>>>>
>>>> Please note that this problem does not seem to happen on KVM when 
>>>> running the same tests on real ARM hardware. The code I am testing is 
>>>> actually patched with some of the latest improvements like the "implement 
>>>> reschedule_from_interrupt() in assembly" and some other ones fixing FPU 
>>>> state save/restore which have not been applied to the master.
>>>>
>>>> Any help or suggestions would be greatly appreciated.
>>>> Waldek
>>>>
>>>
>>> Unfortunately I can't really help here, I have zero experience with ARM. 
>>> I can also provide guesses:
>>>
>>> Your description makes it sound - although I really have no idea if that 
>>> is possible or likely - that we ask the timer hardware for a timer 
>>> interrupt but never receive it, which would make this CPU just hang forever 
>>> (if nobody on this CPU is waiting for any mutex or something like that, 
>>> nobody will ever bother to send a wakeup IPI to this CPU, so it will never 
>>> wake up and reach the scheduler).
>>> I don't know where is our ARM code that sets up the high-precision timer 
>>> interrupt, or how it works. 
>>>
>>> I suggest you try to add *tracepoints* in the timer code so after a 
>>> hang you can see a list of the last timer operations - sets, unsets and 
>>> wakeups: Did we set up a timer even but never got work up? Did we forget to 
>>> set a timer? Did we cancel a timer and forget to reinstate it?
>>>
>>> I wouldn't be surprised if this is a QEMU bug in the emulation of the 
>>> timer hardware. I've seen in the past that QEMU emulation bugs can linger 
>>> for years (e.g., see https://github.com/cloudius-systems/osv/issues/855). 
>>> Maybe it's because hardly anyone uses QEMU without hardware support (like 
>>> KVM), so these bugs can go unnoticed? Although certainly a OSv bug is even 
>>> more likely.
>>>
>>> Nadav.
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/410891c2-e296-4f8e-881e-6881a7a54636n%40googlegroups.com.

Reply via email to