i agree that mikebs change should go in. On 05/03/2011, at 12:10 AM, Mark Kettenis wrote:
>> Date: Fri, 4 Mar 2011 07:30:24 -0600 >> From: Marco Peereboom <[email protected]> >> >> That is a huge penalty because it is read over the pci bus. The trick >> with 0xffffffff should work just fine per the doco and other os' drivers >> (on top of my head). The question I have is does Linux only have one >> device per interrupt? > > Linux probably does a better job at avoiding shared interrupts than we > do, but it on some hardware it can't be avoided so it has to deal with > it. > > If you wantto avoid reading the interrupt status register, you'll have > to stop trusting the hardware (or rather the firmware) in make > mpi_reply(), and do bounds checks before accessing sc->sc_rcbs[] and > sc->sc_ccbs[]. To be honest, that would be a good idea even if we > didn't have this bug. > > In the meantime I think mikeb's fix should be committed. > >> I am going to reference the doco one more time on this. >> >> On Thu, Mar 03, 2011 at 10:35:59PM -0500, Kenneth R Westerback wrote: >>> On Thu, Mar 03, 2011 at 07:11:52PM +0100, Mike Belopuhov wrote: >>>> On Fri, Feb 04, 2011 at 14:53 +0000, emeric boit wrote: >>>>> Hello, >>>>> >>>>> After doing a clean install of OpenBSD 4.8 (AMD64) on an IBM x3550 M3, >>>>> I find >>>>> the >>>>> system randomly panics after a period of use. >>>>> uvm_fault(0xffffffff80cc8360, 0xffff8000149b7000, 0, 1) -> e >>>>> kernel: page >>>>> fault trap, code=0 >>>>> Stopped at mpi_reply+0x102: movq >>>>> 0(%r13),%rax >>>>> ddb{0}> >>>>> >>>>> ddb{0}> trace >>>>> mpi_reply() at mpi_reply+0x102 >>>>> mpi_intr() >>>>> at mpi_intr+0x20 >>>>> Xintr_ioapic_level18() at Xintr_ioapic_level18+0xec >>>>> --- >>>>> interrupt --- >>>>> Bad frame pointer: 0xffff8000194e1920 >>>>> end trace frame: >>>>> 0xffff8000194e1920, count: -3 >>>>> Xspllower+0xe: >>>>> ddb{0}> >>>>> >>>> >>>> We've tried different things, but after this hint i realised >>>> that what might be happening is that bnx and mpi interrupts >>>> are chained (it's bnx0 actually, my initial guess about bnx1 >>>> was wrong) and mpi_intr is called first. Currently neither >>>> mpi(4) nor mpii(4) don't check the interrupt status register >>>> but look directly into the reply post queue. Although, >>>> there's not supposed to be any race between host cpu reading >>>> from the memory and ioc writing to it, in practice it turns >>>> out that in some particular hardware configurations this rule >>>> is violated and we read a garbled reply from the controller. >>>> >>>> If my memory serves, I've considered this for the mpii_intr >>>> but never got into the situation where it was needed and >>>> thus omitted it. I guess I have to bring it back too. >>>> >>>> Emeric tortured the machine with this diff and reported that >>>> it solves the issue for him. OK to commit? >>>> >>>> On Wed, Mar 02, 2011 at 17:20 +0000, emeric boit wrote: >>>>> hi, >>>>> >>>>> This change doesn't solve the issue. >>>>> >>>>> I have remarked that the server crash when I use the network. >>>>> >>>>> I copy a small file several times without problem. >>>>> On the IBM I do : >>>>> scp USER@IP:/tmp/mpi.c . >>>>> >>>>> And when I copy a larger file the server crash : >>>>> scp USER@IP:/bsd . >>>>> >>>>> >>>>> And when I copy th same file (bsd) from an usb key I don't have problem. >>>>> >>>>> Emeric. >>>>> >>>> >>>> that sounds like an interrupt sharing bug of some sort. >>>> is it bnx1 that you're using to reproduce a crash? >>>> >>>> try the following diff please (on a clean checkout): >>>> >>>> Index: mpi.c >>>> =================================================================== >>>> RCS file: /home/cvs/src/sys/dev/ic/mpi.c,v >>>> retrieving revision 1.166 >>>> diff -u -p -r1.166 mpi.c >>>> --- mpi.c 1 Mar 2011 23:48:33 -0000 1.166 >>>> +++ mpi.c 2 Mar 2011 17:40:13 -0000 >>>> @@ -887,6 +887,9 @@ mpi_intr(void *arg) >>>> u_int32_t reg; >>>> int rv = 0; >>>> >>>> + if ((mpi_read_intr(sc) & MPI_INTR_STATUS_REPLY) == 0) >>>> + return (rv); >>>> + >>>> while ((reg = mpi_pop_reply(sc)) != 0xffffffff) { >>>> mpi_reply(sc, reg); >>>> rv = 1; >>>> >>> >>> ok krw@. >>> >>> .... Ken
