> Date: Fri, 4 Mar 2011 07:30:24 -0600
> From: Marco Peereboom <[email protected]>
>
> That is a huge penalty because it is read over the pci bus. The trick
> with 0xffffffff should work just fine per the doco and other os' drivers
> (on top of my head). The question I have is does Linux only have one
> device per interrupt?
Linux probably does a better job at avoiding shared interrupts than we
do, but it on some hardware it can't be avoided so it has to deal with
it.
If you wantto avoid reading the interrupt status register, you'll have
to stop trusting the hardware (or rather the firmware) in make
mpi_reply(), and do bounds checks before accessing sc->sc_rcbs[] and
sc->sc_ccbs[]. To be honest, that would be a good idea even if we
didn't have this bug.
In the meantime I think mikeb's fix should be committed.
> I am going to reference the doco one more time on this.
>
> On Thu, Mar 03, 2011 at 10:35:59PM -0500, Kenneth R Westerback wrote:
> > On Thu, Mar 03, 2011 at 07:11:52PM +0100, Mike Belopuhov wrote:
> > > On Fri, Feb 04, 2011 at 14:53 +0000, emeric boit wrote:
> > > > Hello,
> > > >
> > > > After doing a clean install of OpenBSD 4.8 (AMD64) on an IBM x3550 M3,
> > > > I find
> > > > the
> > > > system randomly panics after a period of use.
> > > > uvm_fault(0xffffffff80cc8360, 0xffff8000149b7000, 0, 1) -> e
> > > > kernel: page
> > > > fault trap, code=0
> > > > Stopped at mpi_reply+0x102: movq
> > > > 0(%r13),%rax
> > > > ddb{0}>
> > > >
> > > > ddb{0}> trace
> > > > mpi_reply() at mpi_reply+0x102
> > > > mpi_intr()
> > > > at mpi_intr+0x20
> > > > Xintr_ioapic_level18() at Xintr_ioapic_level18+0xec
> > > > ---
> > > > interrupt ---
> > > > Bad frame pointer: 0xffff8000194e1920
> > > > end trace frame:
> > > > 0xffff8000194e1920, count: -3
> > > > Xspllower+0xe:
> > > > ddb{0}>
> > > >
> > >
> > > We've tried different things, but after this hint i realised
> > > that what might be happening is that bnx and mpi interrupts
> > > are chained (it's bnx0 actually, my initial guess about bnx1
> > > was wrong) and mpi_intr is called first. Currently neither
> > > mpi(4) nor mpii(4) don't check the interrupt status register
> > > but look directly into the reply post queue. Although,
> > > there's not supposed to be any race between host cpu reading
> > > from the memory and ioc writing to it, in practice it turns
> > > out that in some particular hardware configurations this rule
> > > is violated and we read a garbled reply from the controller.
> > >
> > > If my memory serves, I've considered this for the mpii_intr
> > > but never got into the situation where it was needed and
> > > thus omitted it. I guess I have to bring it back too.
> > >
> > > Emeric tortured the machine with this diff and reported that
> > > it solves the issue for him. OK to commit?
> > >
> > > On Wed, Mar 02, 2011 at 17:20 +0000, emeric boit wrote:
> > > > hi,
> > > >
> > > > This change doesn't solve the issue.
> > > >
> > > > I have remarked that the server crash when I use the network.
> > > >
> > > > I copy a small file several times without problem.
> > > > On the IBM I do :
> > > > scp USER@IP:/tmp/mpi.c .
> > > >
> > > > And when I copy a larger file the server crash :
> > > > scp USER@IP:/bsd .
> > > >
> > > >
> > > > And when I copy th same file (bsd) from an usb key I don't have problem.
> > > >
> > > > Emeric.
> > > >
> > >
> > > that sounds like an interrupt sharing bug of some sort.
> > > is it bnx1 that you're using to reproduce a crash?
> > >
> > > try the following diff please (on a clean checkout):
> > >
> > > Index: mpi.c
> > > ===================================================================
> > > RCS file: /home/cvs/src/sys/dev/ic/mpi.c,v
> > > retrieving revision 1.166
> > > diff -u -p -r1.166 mpi.c
> > > --- mpi.c 1 Mar 2011 23:48:33 -0000 1.166
> > > +++ mpi.c 2 Mar 2011 17:40:13 -0000
> > > @@ -887,6 +887,9 @@ mpi_intr(void *arg)
> > > u_int32_t reg;
> > > int rv = 0;
> > >
> > > + if ((mpi_read_intr(sc) & MPI_INTR_STATUS_REPLY) == 0)
> > > + return (rv);
> > > +
> > > while ((reg = mpi_pop_reply(sc)) != 0xffffffff) {
> > > mpi_reply(sc, reg);
> > > rv = 1;
> > >
> >
> > ok krw@.
> >
> > .... Ken