Re: [patch] 2.4.1, 2.4.2-pre3: APIC lockups
On Wed, 14 Feb 2001, Roeland Th. Jansen wrote:
> On Tue, Feb 13, 2001 at 09:13:10PM +0100, Maciej W. Rozycki wrote:
> > Please test it extensively, as much as you can, before I submit it for
> > inclusion. If you ever get "Aieee!!! Remote IRR still set after unlock!"
> > message, please report it to me immediately -- it means the code failed.
> >
> ok, so far so good.
>
> > There is also an additional debugging/statistics counter provided in
> > /proc/cpuinfo that counts interrupts which got delivered with its trigger
> > mode mismatched. Check it out to find if you get any misdelivered
> > interrupts at all.
>
> currently attacking the box with a flood ping. I used a pristine 2.4.1.
> to be sure I didn't leave stuff and applied the patch.

ping -l is a good test also...

Jeff

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] 2.4.1, 2.4.2-pre3: APIC lockups
On Wed, Feb 14, 2001 at 05:30:57PM +, Roeland Th. Jansen wrote:
> other observations -- approx 6000 ints from the ne2k card/sec.
> MIS shows approx 1% that goes wrong with a ping flood.

oops. had to count both CPU0 and CPU1's interrupts.

after 23 minutes :

           CPU0       CPU1
 19:    3824114    3823371   IO-APIC-level  eth0
MIS:      29025

makes approx 0.3%..

-- 
Grobbebol's Home                   |  Don't give in to spammers.   -o)
http://www.xs4all.nl/~bengel       | Use your real e-mail address   /\
Linux 2.2.16 SMP 2x466MHz / 256 MB |        on Usenet.             _\_v
Re: [patch] 2.4.1, 2.4.2-pre3: APIC lockups
On Tue, Feb 13, 2001 at 09:13:10PM +0100, Maciej W. Rozycki wrote:
> Please test it extensively, as much as you can, before I submit it for
> inclusion. If you ever get "Aieee!!! Remote IRR still set after unlock!"
> message, please report it to me immediately -- it means the code failed.

ok, so far so good.

> There is also an additional debugging/statistics counter provided in
> /proc/cpuinfo that counts interrupts which got delivered with its trigger
> mode mismatched. Check it out to find if you get any misdelivered
> interrupts at all.

currently attacking the box with a flood ping. I used a pristine 2.4.1.
to be sure I didn't leave stuff and applied the patch.

observations -- system doesn't crash; usually I had to use disable focus
processor -- else it fails.

other observations -- approx 6000 ints from the ne2k card/sec.
MIS shows approx 1% that goes wrong with a ping flood.

           CPU0       CPU1
  0:      35345      36195    IO-APIC-edge  timer
  1:       1632       1534    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  3:        826        832    IO-APIC-edge  serial
  4:          4          4    IO-APIC-edge  serial
  5:      12213      12201    IO-APIC-edge  soundblaster
  8:          0          1    IO-APIC-edge  rtc
 14:       3079       2906    IO-APIC-edge  ide0
 15:          3          3    IO-APIC-edge  ide1
 18:         69         85   IO-APIC-level  BusLogic BT-930
 19:    1758280    1758266   IO-APIC-level  eth0
NMI:      71480      71480
LOC:      71459      71456
ERR:          3
MIS:      15814

good work !
Re: [patch] 2.4.1, 2.4.2-pre3: APIC lockups
On Wed, 14 Feb 2001, Andrew Morton wrote:

> Tell me, please: what tradeoffs are involved in this patch?
> Obviously it works around a pretty fatal problem, but
> what are we giving away?

The change decreases performance a bit. For well-behaved systems the
loss is fifteen instructions: a local APIC read (uncached but supposedly
cheap), a global memory read (a cache line invalidation and fetch), seven
stack accesses (cached for sure), a taken branch and five ALU. With the
version you have I see gcc is actually doing an extra memory read due to
the volatile APIC access presumably -- this is now fixed.

For misdelivered interrupts the overhead is much, much bigger, involving
acquiring a spinlock and multiple (uncached and possibly slow) I/O APIC
accesses.

We may lower the overhead by undefining APIC_LOCKUP_DEBUG, which we
should do after a bit of testing. I think we might leave
APIC_MISMATCH_DEBUG intact -- its cost is a single locked instruction
which is negligible IMO.

Note the original version consisted of two instructions only -- a local
APIC write and "ret", sigh...

  Maciej

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--+
+  e-mail: [EMAIL PROTECTED], PGP key available  +
Re: [patch] 2.4.1, 2.4.2-pre3: APIC lockups
"Maciej W. Rozycki" wrote:
>
> Hi,
>
> After performing various tests I came to the following workaround for
> APIC lockups which people observe under IRQ load, mostly for networking
> stuff.

Works fine on the dual-PII. No "Aieee!!!" messages at all.

After sending a few gigs across the ethernet, running irq-whacker:

mnm:/usr/src/cptimer> cat /proc/interrupts
           CPU0       CPU1
  0:      77613      61869    IO-APIC-edge  timer
  1:        253        258    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
  9:          0          0          XT-PIC  acpi
 12:          0          0    IO-APIC-edge  PS/2 Mouse
 17:    5104855    3919759   IO-APIC-level  eth0
 18:       2334       2313   IO-APIC-level  ide2
NMI:     139418     139418
LOC:     139403     139402
ERR:        221
MIS:    5299867

And without irq-whacker:

mnm:/home/morton> cat /proc/interrupts
           CPU0       CPU1
  0:      55384      70899    IO-APIC-edge  timer
  1:          2          3    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
  9:          0          0          XT-PIC  acpi
 12:          0          0    IO-APIC-edge  PS/2 Mouse
 17:    2554705    2554064   IO-APIC-level  eth0
 18:       1814       1812   IO-APIC-level  ide2
NMI:     126220     126220
LOC:     126202     126201
ERR:         35
MIS:          0

Tell me, please: what tradeoffs are involved in this patch?
Obviously it works around a pretty fatal problem, but
what are we giving away?

Oh: and thanks :)
Re: [patch] 2.4.1, 2.4.2-pre3: APIC lockups
On Tue, Feb 13, 2001 at 09:13:10PM +0100, Maciej W. Rozycki wrote:
> There is also an additional debugging/statistics counter provided in
> /proc/cpuinfo that counts interrupts which got delivered with its trigger
> mode mismatched. Check it out to find if you get any misdelivered
> interrupts at all.

I guess you mean the MIS: counter in /proc/interrupts? This is what it says
on my box after running some 330000 interrupts (at a rate of approx.
900/second) through the network/usb IRQ:

cat /proc/interrupts
           CPU0       CPU1
  0:      31693      32749    IO-APIC-edge  timer
  1:       1208       1174    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  3:        113         26    IO-APIC-edge  serial
  4:       4689       4567    IO-APIC-edge  serial
 14:       4440       4545    IO-APIC-edge  ide0
 15:       1911       2132    IO-APIC-edge  ide1
 16:      85021      84227   IO-APIC-level  es1371, mga@PCI:1:0:0
 17:         26         26   IO-APIC-level  sym53c8xx
 18:          0          0   IO-APIC-level  btaudio, bttv
 19:     165467     166254   IO-APIC-level  eth0, eth1, usb-uhci
NMI:      64376      64376
LOC:      64364      64362
ERR:          0
MIS:        647

So, that's about 650 misdelivered interrupts for 330000 deliveries (the
other interrupts never gave me any trouble, so I guess the misdelivered
ones are all from IRQ 19), or about .2%.

When I load the network and stream some audio over it, the sound becomes a
bit choppy. The MIS: counter only increases when the network (read: IRQ 19)
is loaded; a single audio stream (approx. 220 int/sec) causes no MISses to
occur.

In general, I'd say the stability WITH the patch is good, and timeouts are
within tolerable levels. If I need something better, I'll probably get
myself a better set of network cards...

So, quick conclusion, this seems a reasonable fix...

Cheers//Frank

-- 
Frank de Lange
+31-320-252965
[EMAIL PROTECTED]
[ "For anything that is not diminished by being given away is not yet
   possessed as it ought to be, so long as it is held and not given." ]
Re: [patch] 2.4.1, 2.4.2-pre3: APIC lockups
"Maciej W. Rozycki" wrote:
>
> Hi,
>
> After performing various tests I came to the following workaround for
> APIC lockups which people observe under IRQ load, mostly for networking
> stuff. I believe the test should work in all cases as it basically
> implements a manual replacement for EOI messages. In my simulated
> environment I was unable to get a lockup with the code in place, even
> though I was getting about every other level-triggered IRQ misdelivered.
>
> Please test it extensively, as much as you can, before I submit it for
> inclusion. If you ever get "Aieee!!! Remote IRR still set after unlock!"
> message, please report it to me immediately -- it means the code failed.
>
No messages.

> There is also an additional debugging/statistics counter provided in
> /proc/cpuinfo that counts interrupts which got delivered with its trigger
> mode mismatched. Check it out to find if you get any misdelivered
> interrupts at all.
>
I'm running my default webserver load test, and I get ~40 /second, 92735
total. bw_tcp says 1.13 MB/sec, that's wire speed. tcpdump | grep 'sack '
doesn't show unusually many lost packets.

Looks promising.

--
Manfred