Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback

Jan Kiszka Sat, 29 May 2010 09:40:49 -0700

Blue Swirl wrote:
>>> On the contrary, APIC is actually the only source of the IRQ ack
>>> information. RTC hack would not work without APIC (or the
>>> bidirectional IRQ) passing this info to RTC.
>>>
>>> What APIC doesn't have now is the timer frequency or period info. This
>>> is known by RTC and also higher levels managing the clocks.
>>>
>> So APIC has one bit of information and RTC everything else.
> 
> The information known by RTC (timer period) is also known by higher levels.


Curious to see where you'll find this.

> 
>> The current
>> approach (and proposed patch) brings this one bit of information to RTC,
>> you are arguing that RTC should be able to communicate all its info to
>> APIC. Sorry I don't see that your way has any advantage. Just more
>> complex interface and it is much easier to get it wrong for other time
>> sources.
> 
> I don't think anymore that APIC should be handling this but the
> generic stuff, like vl.c or exec.c. Then there would be only
> information passing from APIC to higher levels.

You neglect the information required to associate a periodic source
(e.g. RTC) with an IRQ sink (e.g. APIC). Without that, you will have a
hard time figuring out if a reported IRQ coalescing requires any
activities or should simply be welcomed (for I/O IRQs).

> 
>>> I keep ignoring the idea that the current model, where both RTC and
>>> APIC must somehow work together to make coalescing work, is the only
>>> possible just because it is committed and it happens to work in some
>>> cases. It would be much better to concentrate this to one place, APIC
>>> or preferably higher level where it may benefit other timers too.
>>> Provided of course that the other models can be made to work.
>>>
>> So write the code and show us. You haven't shown any evidence that RTC is
>> the wrong place. RTC knows when interrupt was acknowledged to RTC, it
>> knows when clock frequency changes, it knows when device reset happened.
>> APIC knows only that interrupt was coalesced. It doesn't even know that
>> it may be masked by a guest in IOAPIC (interrupts delivered while they
>> are masked are not considered coalesced).
> 
> Oh, I thought interrupt masking was the reason for coalescing! What
> exactly is the reason then?

Missing acks, i.e. the IRQ is still pending when the next one arrives.
You want to filter out masked/suppressed IRQs to avoid running the
de-coalescing logic on sources that are actually cut off (like the RTC
IRQ when the HPET took over).
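
To make the distinction concrete, here is a minimal sketch in C. It is not
the code from the patch: the names irq_result, irq_line, periodic_source,
irq_raise, irq_ack and source_tick are all hypothetical and only illustrate
the idea that a masked line and a coalesced line should be reported
differently, so that only real coalescing feeds the de-coalescing logic.

```c
#include <stdbool.h>

/* Hypothetical delivery results; not the identifiers used in the patch. */
enum irq_result { IRQ_DELIVERED, IRQ_COALESCED, IRQ_MASKED };

/* Hypothetical IRQ sink state (think APIC/IOAPIC input line). */
struct irq_line {
    bool masked;   /* guest has cut the line off, e.g. in the IOAPIC */
    bool pending;  /* previous interrupt has not been acked yet */
};

/* Hypothetical periodic source (think RTC) that tracks lost ticks. */
struct periodic_source {
    unsigned backlog;  /* ticks that still have to be reinjected */
};

/* Raise the line and report what happened to the interrupt. */
static enum irq_result irq_raise(struct irq_line *line)
{
    if (line->masked) {
        return IRQ_MASKED;      /* suppressed, not coalesced */
    }
    if (line->pending) {
        return IRQ_COALESCED;   /* missing ack: merged with the pending one */
    }
    line->pending = true;
    return IRQ_DELIVERED;
}

/* Guest acknowledged the interrupt. */
static void irq_ack(struct irq_line *line)
{
    line->pending = false;
}

/* One timer period elapsed: only a coalesced tick is queued for later. */
static void source_tick(struct periodic_source *src, struct irq_line *line)
{
    if (irq_raise(line) == IRQ_COALESCED) {
        src->backlog++;
    }
}
```

With this split, a source whose line is masked never accumulates a backlog,
while a source whose ticks hit a still-pending IRQ does.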

> 
>> Time source knows only when
>> frequency changes and maybe when device reset happens if timer is
>> stopped by device on reset. So RTC is actually a sweet spot if you want
>> to minimize amount of info you need to pass between various layers.
>>
>>>>> Maybe that version would not bend backwards as much as the current to
>>>>> cater for buggy hosts.
>>>>>
>>>> You mean "buggy guests"?
>>> Yes, sorry.
>>>
>>>> What guests are not buggy in your opinion?
>>>> Linux tries hard to be smart and as a result the only way to have stable
>>>> clock with it is to go paravirt.
>>> I'm not an OS designer, but I think an OS should never crash, even if
>>> a burst of IRQs is received. Reprogramming the timer should consider
>>> the pending IRQ situation (0 or 1 with real HW). Those bugs are one
>>> cause of the problem.
>> OS should never crash in the absence of HW bugs? I doubt you can design
>> an OS that can run in the face of any HW failure. Anyway here we are
>> trying to solve guests' time keeping problem, not crashes. Do you think
>> you can design OS that can keep time accurately no matter how crazy all
>> HW clocks behave?
> 
> I think my OS design skills are not relevant in this discussion, but
> IIRC there are fault tolerant operating systems for extreme conditions
> so it can be done.

No one can influence the design of released OS versions anymore.

> 
>>>>>> The fact is that timer device is not "just like any
>>>>>> other device" in virtual world. Any other device is easy: you just
>>>>>> implement spec as close as possible and everything works. For time
>>>>>> source device this is not enough. You can implement RTC+HPET to the
>>>>>> letter and your guest will drift like crazy.
>>>>> It's doable: a cycle accurate emulator will not cause any drift,
>>>>> without any voodoo. The interrupts would come after executing the same
>>>>> instruction as the real HW. For emulating any sufficiently buggy
>>>>> guests in any sufficiently desperate low resource conditions, this may
>>>>> be the only option that will always work.
>>>>>
>>>> Yes, but qemu and kvm are not cycle accurate emulators and don't strive
>>>> to be one. On the contrary KVM runs at native host CPU speed most of the
>>>> time, so any emulation done between two instructions is theoretically
>>>> noticeable for a guest. TSC is bypassed directly to a guest too, so
>>>> keeping all time sources in perfect sync is also impossible.
>>> That is actually another cause of the problem. KVM gives the guest an
>>> illusion that the VCPU speed is equal to host speed. When they don't
>>> match, especially in critical code, there can be problems. It would be
>>> better to tell the guest a lower speed, which also can be guaranteed.
>>>
>> Not possible. It's that simple. You should take it into account in your
>> architecture design stage. In case of KVM real physical CPU executes guest
>> instructions and it does this as fast as it can. The only way we can hide
>> that from a guest is by intercepting each access to TSC and at that
>> point we can use bochs instead.
> 
> Well, as Paul pointed out, there's also icount option.

Which is not available in virtualization mode.

Jan
