Blue Swirl wrote:
>>> On the contrary, APIC is actually the only source of the IRQ ack
>>> information. RTC hack would not work without APIC (or the
>>> bidirectional IRQ) passing this info to RTC.
>>>
>>> What APIC doesn't have now is the timer frequency or period info. This
>>> is known by RTC and also higher levels managing the clocks.
>>>
>> So APIC has one bit of information and RTC everything else.
>
> The information known by RTC (timer period) is also known by higher levels.

Curious to see where you'll find this.

>
>> The current
>> approach (and proposed patch) brings this one bit of information to RTC,
>> you are arguing that RTC should be able to communicate all its info to
>> APIC. Sorry I don't see that your way has any advantage. Just more
>> complex interface and it is much easier to get it wrong for other time
>> sources.
>
> I don't think anymore that APIC should be handling this but the
> generic stuff, like vl.c or exec.c. Then there would be only
> information passing from APIC to higher levels.

You neglect the information required to associate a periodic source
(e.g. RTC) with an IRQ sink (e.g. APIC). Without that, you will have a
hard time figuring out if a reported IRQ coalescing requires any
activities or should simply be welcomed (for I/O IRQs).

>
>>> I keep ignoring the idea that the current model, where both RTC and
>>> APIC must somehow work together to make coalescing work, is the only
>>> possible just because it is committed and it happens to work in some
>>> cases. It would be much better to concentrate this to one place, APIC
>>> or preferably higher level where it may benefit other timers too.
>>> Provided of course that the other models can be made to work.
>>>
>> So write the code and show us. You haven't show any evidence that RTC is
>> the wrong place. RTC knows when interrupt was acknowledge to RTC, it
>> know when clock frequency changes, it know when device reset happened.
>> APIC knows only that interrupt was coalesced. It doesn't even know that
>> it may be masked by a guest in IOAPIC (interrupts delivered while they
>> are masked not considered coalesced).
>
> Oh, I thought interrupt masking was the reason for coalescing! What
> exactly is the reason then?

Missing acks, i.e. the IRQ is still pending when the next one arrives.
You want to filter out masked/suppressed IRQs to avoid running the
de-coalescing logic on sources that are actually cut off (like the RTC
IRQ when the HPET took over).
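To make the rule above concrete, here is a minimal sketch of how an IRQ
sink could classify an incoming interrupt. It is only an illustration
under stated assumptions: IrqLine, raise_irq and ack are hypothetical
names, not QEMU interfaces, and the model ignores everything except the
mask and pending state.

```python
from dataclasses import dataclass


@dataclass
class IrqLine:
    """Hypothetical model of one IRQ line at the sink (not a QEMU API)."""
    masked: bool = False    # guest has masked the line (e.g. in the IOAPIC)
    pending: bool = False   # previous IRQ delivered but not yet acked
    coalesced: int = 0      # IRQs lost because the previous one was pending

    def raise_irq(self) -> str:
        # A masked line is cut off: do not treat the IRQ as coalesced,
        # so no de-coalescing logic runs for this source.
        if self.masked:
            return "masked"
        # Missing ack: the previous IRQ is still pending, so this one
        # merges with it and is reported as coalesced.
        if self.pending:
            self.coalesced += 1
            return "coalesced"
        self.pending = True
        return "delivered"

    def ack(self) -> None:
        # The guest acknowledged the interrupt; the line is free again.
        self.pending = False
```

A periodic source such as the RTC would be the consumer of the
"coalesced" result, since only the source knows its period and can decide
whether and when to reinject. For an I/O source the same result could
simply be ignored.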
>
>> Time source knows only when
>> frequency changes and may be when device reset happens if timer is
>> stopped by device on reset. So RTC is actually a sweet spot if you want
>> to minimize amount of info you need to pass between various layers.
>>
>>>>> Maybe that version would not bend backwards as much as the current to
>>>>> cater for buggy hosts.
>>>>>
>>>> You mean "buggy guests"?
>>> Yes, sorry.
>>>
>>>> What guests are not buggy in your opinion?
>>>> Linux tries hard to be smart and as a result the only way to have stable
>>>> clock with it is to go paravirt.
>>> I'm not an OS designer, but I think an OS should never crash, even if
>>> a burst of IRQs is received. Reprogramming the timer should consider
>>> the pending IRQ situation (0 or 1 with real HW). Those bugs are one
>>> cause of the problem.
>> OS should never crash in the absence of HW bugs? I doubt you can design
>> an OS that can run in a face of any HW failure. Anyway here we are
>> trying to solve guests time keeping problem not crashes. Do you think
>> you can design OS that can keep time accurately no matter how crazy all
>> HW clock behaves?
>
> I think my OS design skills are not relevant in this discussion, but
> IIRC there are fault tolerant operating systems for extreme conditions
> so it can be done.

No one can influence the design of released OS versions anymore.

>
>>>>>> The fact is that timer device is not "just like any
>>>>>> other device" in virtual world. Any other device is easy: you just
>>>>>> implement spec as close as possible and everything works. For time
>>>>>> source device this is not enough. You can implement RTC+HPET to the
>>>>>> letter and your guest will drift like crazy.
>>>>> It's doable: a cycle accurate emulator will not cause any drift,
>>>>> without any voodoo. The interrupts would come after executing the same
>>>>> instruction as the real HW. For emulating any sufficiently buggy
>>>>> guests in any sufficiently desperate low resource conditions, this may
>>>>> be the only option that will always work.
>>>>>
>>>> Yes, but qemu and kvm are not cycle accurate emulators and don't strive
>>>> to be one. On the contrary KVM runs at native host CPU speed most of the
>>>> time, so any emulation done between two instruction is theoretically
>>>> noticeable for a guest. TSC is bypassed directly to a guest too, so
>>>> keeping all time source in perfect sync is also impossible.
>>> That is actually another cause of the problem. KVM gives the guest an
>>> illusion that the VCPU speed is equal to host speed. When they don't
>>> match, especially in critical code, there can be problems. It would be
>>> better to tell the guest a lower speed, which also can be guaranteed.
>>>
>> Not possible. It's that simple. You should take it into account in your
>> architecture design stage. In case of KVM real physical CPU executes guest
>> instruction and it does this as fast as it can. The only way we can hide
>> that from a guest is by intercepting each access to TSC and at that
>> point we can use bochs instead.
>
> Well, as Paul pointed out, there's also icount option.

Which is not available in virtualization mode.

Jan