Changes since previous version: * Patches 1, 2, 3, and 4, and 7 haven't changed at all. * The only change to patch 6 is I added test cases output to the commit message. * Patches 5 and 8 are new. Patch 8 has downsides; see migration notes below.
Disclaimer: The PIT patches and the analysis below are mostly based on reading the spec and the code, with only a minimal amount of sanity testing so far. --------------------- Remaining inaccuracies in the qemu (user mode) 8254 PIT (timer) model after these patches: * Generally, a new guest-supplied initial count is only loaded into the actual counter on the next CLK tick. This could be delayed by about a microsecond (CLK runs at about 1.1 MHz). * CLK ticks should be predefined; nothing done in software should change their timing. But we essentially restart CLK (throwing out fractional ticks) whenever the guest programs the PIT. * It doesn't model a paused counter (for example, when GATE is low), and reading back the paused count from the counter. Also, when it is unpaused, it always starts back at the beginning, rather than where it was paused. This shouldn't affect interrupts (channel 0's GATE is hard-wired high), but it could affect reading back the count from channel 2 (PC speaker). * Lost tick risks: * When the guest leaves interrupts disabled for an extended period of time, the level might return low before the guest sees the inturrpt. This can also happen on real hardware, so probably isn't worth worrying about. * Depending on the underlying qemu-timer model (used for CLK), there might be a risk that the timer could skip a significant chunk of time without providing any processor time to the 8254 model or the guest (perhaps if the host is scheduling other processes). I'm not sure if this is possible or likely, but if it happens, the 8254 will dutifully provide all the missing edges to the 8259. But the 8254 will provide them all at once (until it catches up to the qemu-timer time), and the 8259 and guest will effectively not notice nor process the earlier edges in the batch. Again, not sure when/if this can actually happen. Maybe it is more likely to happen with KVM in the "kernel_irqchip=off" case, but not in the native case? Not sure. My general impression is that none of these are worth worrying about. They are fairly tangential to any corrections to the interrupt trailing edge model. --------------------- Question about best way to handle migration issues: There is a tradeoff between potentially losing an IRQ0 tick in some cases, vs potentially gaining an IRQ0 tick in other cases, and how to deal with interrupts. NOTE: In addition to the lost leading edge issue mentioned in the comment in the patch, some of my changes in patch 5 fixes where the leading edge occurs by 1 CLK count. Migration might risk delivering an interrupt twice if it happens in the 1 CLK interval. Evaluate ways of dealing with this. Perhaps intentionally leave the off-by-one in perpetuity, or is there a clever way to avoid issues? I'm not sure how to resolve it, but here are some ideas: 1. Leave out patch 8 and do nothing. Potentially lose an interrupt when migrating back to an older version, because the old version always expects to be able to produce a leading edge by simply setting IRQ0 high, but if it is normally high... 2. Use patch 8. If we keep it, it probably belongs near patch 5 or 6, or squashed in with one of them. But if the guest uses something like the one-shot mode 4, it would get an extra interrupt whenever it restarts the timer if the previous interrupt has been serviced. (Some of these could match real hardware, if mode 4 happens to be started when IRQ0 happens to be low. But this is relatively unlikely.) 3. Use something similar to patch 8, but only in the exported data used for migration, not in normal operation. This could possibly generate a single extra interrupt when migrating, but with luck most guests should be able to handle that. 4. Or only prepare to fix it sometime in the future: * Explicitly do not fix the trailing edge logic for IRQ0 in the 8259. Only fix other IRQ's for now. * Leave the PIT functionally mostly as-is, except: * Try to tweak the PIT model so that it could handle migration to/from some future fixed version without problems. For example, when it wants to generate a leading edge, make sure it is a leading edge by first explicitly lowering the IRQ, even if it "should have" been low already. * Wait until sometime in the future (years from now?) when back-migrating to 2012 version is no longer important to actually fix the PIT interrupt level model. 5. Or implement two models of operation, and a command line switch to select one. At some point, change the default to the more accurate model. And maybe drop the old model sometime in the distant future. Anyone have any better ideas? Or preferences for or against any of the above? --------------------- Some notes about the KVM (kernel mode) 8254 PIT (timer) model: As far as it goes, the kernel model appears to need the similar changes and have similar remaining inaccuracies was the qemu model. But it also has other issues: But it will also have problems related to the different way the kernel 8254 ties the OUT logic to the interrupt controller. The kernel makes no attempt to model two edges separately; instead it delivers both at the same time, in essentially a zero-width pulse. This model never looses a tick; it delays delivering a new leading edge until the previous leading edge has been acknowledged with an EOI in the 8259. This functionality may be necessary due to how the kernel schedules tasks, and the nature of the kernel's timers (otherwise on a heavily loaded host, the guest could unavoidably lose a large number of ticks). It also has the edges backwards. The almost-zero-width pulse is currently briefly pulsing positive instead of briefly pulsing negative, which is clearly won't work at all with a proper model of the trailing edge. Clearly the order of the edges in the brief pulse needs to be fixed if we want to fix the interrupt trailing edge model. (See migration considerations above...) What is less clear is if it is worth the effort to try to model the trailing edge more accurately than that. Note that if we want to keep the tick catchup logic, then anytime it is "catching up" it is probably not desirable to leave it low for long. That would delay or prevent catching up, so in the "catching up" case, we would essentially fall back on the current simplified model. Potentially we could model the low level in a few cases (like when starting mode 0, or getting halfway through counting mode 3, or the one-CLK-cylce low in various other modes, or we get the EOI after the IRQ0 should be low, etc), as long as it isn't trying to catch up. This could lead to somewhat random durations of IRQ0=low, which might cause more intermitment/confusing bugs than just always modeling it as short-duration. Perhaps if some rare guest cares about IRQ0=low duration, then user could just disable the in-kernel IRQ chip for that guest? --------------------- Matthew Ogilvie (8): fix some debug printf format strings vl: fix -hdachs/-hda argument order parsing issues qemu-options.hx: mention retrace= VGA option vga: add some optional CGA compatibility hacks i8254: fix inaccuracies in pit_get_out() i8259: fix so that dropping IRQ level always clears the interrupt request i8259: refactor pic_set_irq level logic i8259/i8254: migration workaround for timer hw/cirrus_vga.c | 4 ++-- hw/i8254.c | 24 ++++++++++++++++++-- hw/i8254_common.c | 18 +++++---------- hw/i8259.c | 67 +++++++++++++++++++++++++++++++++++++++++-------------- hw/ide/cmd646.c | 5 +++-- hw/ide/via.c | 5 +++-- hw/pc.h | 4 ++++ hw/vga.c | 37 +++++++++++++++++++++++------- qemu-options.hx | 27 +++++++++++++++++++++- vl.c | 62 +++++++++++++++++++++++++++++++++----------------- 10 files changed, 186 insertions(+), 67 deletions(-) -- 1.7.10.2.484.gcd07cc5