Re: [Qemu-devel] fix clearing i8259 IRQ lines (Was: Should the i8259 devices remain no-user?)

2013-10-29 Thread BALATON Zoltan

On Sat, 26 Oct 2013, Matthew Ogilvie wrote:

> Although the 8259 (interrupts) model is clearly wrong with respect
> to clearing an IRQ request line, only one ancient unimportant guest
> (Microport UNIX ca. 1987) seems to care, and there are potentially
> significant risks to more important guests if we try to fix it:


I know of at least one more guest that cares, which is less ancient
but maybe just as unimportant: OPENSTEP for Mach. Nevertheless, this
is now a known bug that just happens to be tolerated by the OSes most
commonly run under QEMU. What was not clear to me is how significant
the risks of the fix are, and whether they were considered or the
patch was simply forgotten without merging ever being given any
thought.



> Risks: The 8254 (timers) model is wrong in various ways, some of
> which are hidden by the incorrect 8259 model, and fixing it could
> potentially break migration, depending on exact circumstances.
> Also, it isn't clear if there are other device models depending
> on the incorrect 8259 that would also need to be fixed.


I had the impression from the previous discussion that the main risk
was a potential lost timer interrupt in some circumstances at
migration, which may affect some guests, but it was not clear (to me
at least) how big a risk this is. IMO if other models depend on a bug
they are also buggy and should be fixed, but I don't know how many
models that could affect.



> If someone actually showed real interest in actually merging
> these, including the selection of a migration compatibility
> strategy they would actually be willing to merge (and above:
> other devices, KVM, etc), I could look into updating
> the patches to match.  But the "if" parts aren't looking
> particularly likely.  This seems like a rather core-level,
> wide-implication change for a newbie to be messing
> with.  (I've already spent noticeably more time on qemu
> patches than I had intended to spend total on playing with
> this guest, although I may continue if I have a clearly
> defined strategy.)


I think you have already provided detailed analysis, test cases, and
multiple options and patch versions, so it is not you who should spend
more time on this now. What I think is needed is for the people who
have the knowledge and insight to analyse and decide about the patches
to take some time to think about it, come to a decision, and then say
what to do or why it's better to leave it unfixed. Can this be done in
this thread? Or maybe on one of the upcoming phone conferences, where
the right people are together anyway to discuss it?


Regards,
BALATON Zoltan



Re: [Qemu-devel] fix clearing i8259 IRQ lines (Was: Should the i8259 devices remain no-user?)

2013-10-26 Thread Matthew Ogilvie

On Wed, Oct 16, 2013 at 06:23:11PM +0200, Paolo Bonzini wrote:
> On 16/10/2013 18:21, BALATON Zoltan wrote:
> > A bit off topic but this reminded me of these patches:
> >
> > http://patchwork.ozlabs.org/patch/206753/
> > http://patchwork.ozlabs.org/patch/208252/
> >
> > which never got merged. Is there a chance that these fixes get merged
> > sometime, or is there an explanation why it won't be fixed? As far as I
> > remember the patches were reviewed and multiple versions were proposed,
> > but in the end no decision was reached on which one to merge and it was
> > just left uncorrected.
>
> Right, thank you very much.  ISTR the unanswered question was what to do
> about migration, but I need to reread all the threads.
>
> Paolo

Essentially correct.

Although the 8259 (interrupts) model is clearly wrong with respect
to clearing an IRQ request line, only one ancient unimportant guest
(Microport UNIX ca. 1987) seems to care, and there are potentially
significant risks to more important guests if we try to fix it:

Risks: The 8254 (timers) model is wrong in various ways, some of
which are hidden by the incorrect 8259 model, and fixing it could
potentially break migration, depending on exact circumstances.
Also, it isn't clear if there are other device models depending
on the incorrect 8259 that would also need to be fixed.
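
To make the line-clearing issue concrete, here is a minimal sketch in C
of an edge-triggered request latch. It is a toy model under stated
assumptions, not the QEMU i8259 code and not the proposed patch: the
ToyPic type and both toy_set_irq_* functions are hypothetical names
invented for illustration, and only the request register (IRR) is
modelled. The first function keeps a request latched after the input
line is lowered, which is the kind of behaviour described above as
wrong; the second withdraws the request when the line drops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified PIC state (illustration only). */
typedef struct {
    uint8_t irr;        /* pending requests, one bit per input line */
    uint8_t last_level; /* last level seen on each input line */
} ToyPic;

/* Variant A: a rising edge latches the request, and lowering the
 * line afterwards leaves the request pending. */
static void toy_set_irq_sticky(ToyPic *s, int irq, bool level)
{
    assert(irq >= 0 && irq < 8);
    uint8_t mask = (uint8_t)(1u << irq);

    if (level) {
        if (!(s->last_level & mask)) {
            s->irr |= mask;             /* rising edge: latch request */
        }
        s->last_level |= mask;
    } else {
        s->last_level &= (uint8_t)~mask; /* request stays latched */
    }
}

/* Variant B: same edge detection, but lowering the line also
 * withdraws a request that has not been serviced yet. */
static void toy_set_irq_clearing(ToyPic *s, int irq, bool level)
{
    assert(irq >= 0 && irq < 8);
    uint8_t mask = (uint8_t)(1u << irq);

    if (level) {
        if (!(s->last_level & mask)) {
            s->irr |= mask;             /* rising edge: latch request */
        }
        s->last_level |= mask;
    } else {
        s->last_level &= (uint8_t)~mask;
        s->irr &= (uint8_t)~mask;       /* line dropped: clear request */
    }
}
```

Raising and then lowering line 0 leaves bit 0 of irr set under variant A
and clear under variant B. A device model that only pulses its line
briefly could therefore lose its interrupt under variant B, which is one
way a dependency on the current behaviour could arise.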

Similar changes are needed in KVM for consistency, although some of
the 8254 modes are implemented in a more simplistic way (pulses
handled as fast as possible directly, instead of
1-millisecond-long pulses on real hardware).  Note that I was
never able to get my guest running successfully under KVM; I'm
not sure what the remaining problems were.

Also, the patch series included a few other things:
  - A couple of low priority fixes that can still be worked
    around without code changes, but could probably qualify
    as trivial patches.
  - Some test cases to test for the 8259 problem.
  - Plus an optional VGA hack to make it work when
    my ancient guest tries to directly (no BIOS) configure it
    for CGA text mode.

I didn't get much feedback about these.

-

If someone actually showed real interest in actually merging
these, including the selection of a migration compatibility
strategy they would actually be willing to merge (and above:
other devices, KVM, etc), I could look into updating
the patches to match.  But the "if" parts aren't looking
particularly likely.  This seems like a rather core-level,
wide-implication change for a newbie to be messing
with.  (I've already spent noticeably more time on qemu
patches than I had intended to spend total on playing with
this guest, although I may continue if I have a clearly
defined strategy.)

- Matthew Ogilvie