Thor Lancelot Simon <t...@panix.com> writes:

> We saw this on a platform of similar vintage at a former employer of
> mine, and indeed the uhci was one of the devices involved.
While googling just now, I discovered that you've observed the problem before within the NetBSD Foundation, as well:

http://mail-index.netbsd.org/port-amd64/2006/03/01/0004.html

What seems to happen is that the Intel E7520 chip set has a bug: while an interrupt is being handled and its ioapic pin is temporarily masked, the chip set somehow decides to make the masked interrupt pop up somewhere else. I've seen others describe the exact symptoms you did, so I'm assuming that you had the same Intel chip set I do, but in a different machine, where it was wired up differently. Thus, you got leakage from the amr to the bge, whereas I (and others with Dell products) get a different pattern.

I've been observing my system closely, with uhci in polled mode (using Joerg's patch), and now also with SMP enabled. What I've found is that disk I/O, with the amr interrupting at ioapic1, pin 14, leaks interrupts to ioapic0, pin 18 (uhci2), while network I/O, with the wm interrupting at ioapic2, pin 0, leaks interrupts to ioapic0, pin 16 (uhci0).

When the source interrupt rate is low, a low percentage of the interrupts leak, but when the source driver is loaded down with lots of work, the percentage increases. When I'm running a "-j 4" system build while also spooling a full backup to a scratch disk on a neighboring system, I end up with something on the order of 10% of network interrupts and 30% of disk interrupts leaking to the wrong ioapic, as opposed to about 1% and 2%, respectively, when the system is lightly loaded. It feels exponential, but I haven't plotted the data.

The hangs I've seen are probably related to a feedback loop: a busy source interrupt handler means more work for the leaked interrupt handler, which in turn reduces the system's ability to handle interrupts quickly, leading to yet more leaked interrupts.
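For anyone wanting to reproduce the leak percentages quoted above, they boil down to a ratio of interrupt-counter deltas over a measurement interval. The sketch below is a minimal, hypothetical helper, not anything from NetBSD: it assumes you have read the per-pin counts by hand from two snapshots (for example two `vmstat -i` runs), that the percentage means leaked interrupts divided by source interrupts, and that the receiving pin (uhci in polled mode) sees no legitimate interrupts of its own. The numbers in the example are illustrative, not measurements.

```python
# Hypothetical helper for estimating interrupt leakage from counter snapshots.
# Assumptions: counts were read by hand (e.g. from two `vmstat -i` snapshots),
# "leak percent" = leaked / source, and the receiving pin has no real traffic.

def delta(before: int, after: int) -> int:
    """Interrupts counted on one pin during the interval."""
    if after < before:
        raise ValueError("counter went backwards (wrapped or reset?)")
    return after - before


def leak_percent(src_before: int, src_after: int,
                 leak_before: int, leak_after: int) -> float:
    """Leaked interrupts as a percentage of the source device's interrupts."""
    src = delta(src_before, src_after)
    leaked = delta(leak_before, leak_after)
    if src == 0:
        return 0.0  # no source activity, so nothing could have leaked
    return 100.0 * leaked / src


if __name__ == "__main__":
    # Illustrative only: amr (ioapic1, pin 14) as the source,
    # uhci2 (ioapic0, pin 18) receiving the leaked interrupts.
    # 10000 source interrupts, 3000 leaked -> prints 30.0%
    print(f"{leak_percent(1_000, 11_000, 50, 3_050):.1f}%")
```

Sampling the same pair of pins at several load levels would give the data points needed to check whether the growth really is exponential.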
I'm guessing that my system is surviving this, and not letting it escalate into full-on hangs, because Joerg's patch has it spending much less time on each leaked interrupt, so load peaks don't escalate out of control.

-tih
-- 
Popularity is the hallmark of mediocrity. --Niles Crane, "Frasier"