Re: [Qemu-devel] Re: Network connections stalling (due to lost interrupts/ticks?)

2007-08-03 Thread n schembr
I'm seeing the same RTC error, but my systems are not hanging. I can still get 
to them, and they seem to handle a good load from time to time (4 running 
processes).

Is this a stability or performance issue? 

If it is a stability issue, how do I test it?

Re: [Qemu-devel] Re: Network connections stalling (due to lost interrupts/ticks?)

2007-08-03 Thread Jason Wessel

The RTC message has nothing to do with the interrupt controller load.

The patch I mentioned was aimed at stability/bug fix.  Nothing to do 
with performance whatsoever.


The simple test that usually breaks the qemu interrupt controller is 
to do a ping -f to the target when using TAP.  Then just run some other 
processes on the target, or try to use the network with telnet, or 
write to the disk with echo file > blah ; sync...  It usually doesn't 
last too long.  It is the ping -f that will keep the interrupt load 
at the max.


Jason.

n schembr wrote:
> I'm seeing the same rtc error but my systems are not hanging. I can
> still get to them and they seem to handle a good load from time to
> time, 4 running proc.
>
> Is this a stability or performance issue?
>
> If it is a stability issue how do I test it?

Re: [Qemu-devel] Re: Network connections stalling (due to lost interrupts/ticks?)

2007-08-03 Thread Jason Wessel

Charles,

Are you willing to try an experimental patch?

Perhaps you could try the attached patch and post back if it happens to 
solve your problem.  There is most definitely a problem where qemu can 
get hung up indefinitely after an interrupt storm.  I had not ever 
submitted it because there is no clean way to do this via the opaque 
information that is passed around.  It seems wrong to have to make the 
ioapic a global.  If this does fix the problem perhaps someone will 
decide to fix this up in a cleaner fashion via the opaque structures.


Jason.

Charles Duffy wrote:
> Charles Duffy wrote:
>> There's a warning on startup that the system can't set a 1024Hz timer,
>> which persists even after I set /proc/sys/dev/rtc/max-user-freq to 1024,
>> and I occasionally get warnings at runtime (Your time source seems to
>> be instable or some driver is hogging interrupts).
>
> This was happening because my host kernel was compiled with
> CONFIG_HPET_RTC_IRQ=y. I've disabled this option, recompiled and
> rebooted, and it resolved the RTC warning (and apparently, the unstable
> time source messages) -- but my network connections are still stalling.


Recover from an interrupt flood by propagating the end of interrupt state.

Signed-off-by: Jason Wessel [EMAIL PROTECTED]
---
 hw/apic.c |   23 +++++++++++++++++++++--
 hw/pc.c   |    2 +-
 2 files changed, 22 insertions(+), 3 deletions(-)

Index: qemu/hw/apic.c
===================================================================
--- qemu.orig/hw/apic.c
+++ qemu/hw/apic.c
@@ -332,6 +332,26 @@ static void apic_set_irq(APICState *s, i
     apic_update_irq(s);
 }
 
+struct IOAPICState *ioapic;
+/* XXX Multi IOAPIC support */
+static void apic_propogate_eoi(int vector) {
+    uint32_t irr;
+    int pin;
+
+    if ((vector < 0x10) || (vector > 0xfe))
+        return;
+
+    irr = ioapic->irr;
+    while (irr) {
+        pin = ffs_bit(irr);
+        irr &= ~(1 << pin);
+        if ((ioapic->ioredtbl[pin] & 0xff) == vector) {
+            ioapic->irr &= ~(1 << pin);
+            break;
+        }
+    }
+}
+
 static void apic_eoi(APICState *s)
 {
     int isrv;
@@ -339,8 +359,7 @@ static void apic_eoi(APICState *s)
     if (isrv < 0)
         return;
     reset_bit(s->isr, isrv);
-    /* XXX: send the EOI packet to the APIC bus to allow the I/O APIC to
-            set the remote IRR bit for level triggered interrupts. */
+    apic_propogate_eoi(isrv);
     apic_update_irq(s);
 }
 
Index: qemu/hw/pc.c
===================================================================
--- qemu.orig/hw/pc.c
+++ qemu/hw/pc.c
@@ -36,7 +36,7 @@
 static fdctrl_t *floppy_controller;
 static RTCState *rtc_state;
 static PITState *pit;
-static IOAPICState *ioapic;
+extern IOAPICState *ioapic;
 static PCIDevice *i440fx_state;
 
 static void ioport80_write(void *opaque, uint32_t addr, uint32_t data)
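
For anyone who wants to poke at the EOI propagation idea outside of qemu, here is a minimal standalone sketch of the same loop. It is not the qemu code: ToyIOAPIC, TOY_IOAPIC_NUM_PINS, toy_lowest_set_bit and toy_propagate_eoi are hypothetical names made up for illustration, and the struct only models the two fields the patch touches (irr and the low 8 vector bits of each ioredtbl entry). The vector range check (0x10 to 0xfe) is assumed to match the intent of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_IOAPIC_NUM_PINS 24

/* Hypothetical stand-in for qemu's IOAPICState: only the fields the
 * patch reads or writes are modeled here. */
typedef struct ToyIOAPIC {
    uint32_t irr;                           /* one pending bit per input pin */
    uint64_t ioredtbl[TOY_IOAPIC_NUM_PINS]; /* low 8 bits hold the vector */
} ToyIOAPIC;

/* Index of the lowest set bit; the caller must pass a non-zero value. */
static int toy_lowest_set_bit(uint32_t value)
{
    int bit = 0;

    assert(value != 0);
    while (!(value & 1u)) {
        value >>= 1;
        bit++;
    }
    return bit;
}

/* On EOI for `vector`, clear the IRR bit of the first pending pin whose
 * redirection entry maps to that vector.  Returns the pin that was
 * cleared, or -1 if the vector is out of range or no pending pin matches. */
static int toy_propagate_eoi(ToyIOAPIC *ioapic, int vector)
{
    uint32_t pending;

    if (vector < 0x10 || vector > 0xfe)
        return -1;

    pending = ioapic->irr;
    while (pending) {
        int pin = toy_lowest_set_bit(pending);

        pending &= ~(UINT32_C(1) << pin);
        if (pin < TOY_IOAPIC_NUM_PINS &&
            (ioapic->ioredtbl[pin] & 0xff) == (uint64_t)vector) {
            ioapic->irr &= ~(UINT32_C(1) << pin);
            return pin;
        }
    }
    return -1;
}
```

With pins 3 and 5 pending and mapped to vectors 0x31 and 0x41, calling toy_propagate_eoi(&io, 0x41) clears only the bit for pin 5 and leaves pin 3 pending. Unlike the patch, the sketch takes the IOAPIC as a parameter rather than a global, which is roughly what a cleaner version via the opaque structures would need.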


[Qemu-devel] Re: Network connections stalling (due to lost interrupts/ticks?)

2007-08-03 Thread Charles Duffy
Well, behavior with the patch applied is certainly different.

The large download I'm running still times out; however, it is now able
to resume without needing to bring the interface down and back up.
However, after the first timeout, subsequent timeouts occur with much
greater frequency -- still making this multi-GB download an
impracticality when using -net tap.

The flood ping is not killing the network connection, though it is
interrupted by frequent messages: "Warning: time of day goes back
(-23150us), taking countermeasures." (This is on the high end of the
time variances shown; the smallest are on the scale of 120us