Re: OpenBSD crash on an IBM x3550 M3
That is a huge penalty because it is read over the PCI bus. The trick
with 0xffffffff should work just fine per the doco and other OSes'
drivers (off the top of my head). The question I have is: does Linux
only have one device per interrupt?

I am going to reference the doco one more time on this.

On Thu, Mar 03, 2011 at 10:35:59PM -0500, Kenneth R Westerback wrote:
> On Thu, Mar 03, 2011 at 07:11:52PM +0100, Mike Belopuhov wrote:
>> On Fri, Feb 04, 2011 at 14:53 +0000, emeric boit wrote:
>>> Hello,
>>>
>>> After doing a clean install of OpenBSD 4.8 (AMD64) on an IBM x3550 M3,
>>> I find the system randomly panics after a period of use.
>>>
>>> uvm_fault(0x80cc8360, 0x8000149b7000, 0, 1) -> e
>>> kernel: page fault trap, code=0
>>> Stopped at mpi_reply+0x102: movq 0(%r13),%rax
>>> ddb{0}>
>>> ddb{0}> trace
>>> mpi_reply() at mpi_reply+0x102
>>> mpi_intr() at mpi_intr+0x20
>>> Xintr_ioapic_level18() at Xintr_ioapic_level18+0xec
>>> --- interrupt ---
>>> Bad frame pointer: 0x8000194e1920
>>> end trace frame: 0x8000194e1920, count: -3
>>> Xspllower+0xe:
>>> ddb{0}>
>>
>> We've tried different things, but after this hint I realised that what
>> might be happening is that the bnx and mpi interrupts are chained (it's
>> bnx0 actually; my initial guess about bnx1 was wrong) and mpi_intr is
>> called first. Currently neither mpi(4) nor mpii(4) checks the interrupt
>> status register; both look directly into the reply post queue.
>> Although there's not supposed to be any race between the host CPU
>> reading from memory and the IOC writing to it, in practice it turns out
>> that in some particular hardware configurations this rule is violated
>> and we read a garbled reply from the controller.
>>
>> If my memory serves, I considered this for mpii_intr but never got into
>> a situation where it was needed and thus omitted it. I guess I have to
>> bring it back too.
>>
>> Emeric tortured the machine with this diff and reported that it solves
>> the issue for him.
>>
>> OK to commit?
>>
>> On Wed, Mar 02, 2011 at 17:20 +0000, emeric boit wrote:
>>> hi,
>>>
>>> This change doesn't solve the issue.
>>>
>>> I have noticed that the server crashes when I use the network.
>>> I can copy a small file several times without problem.
>>> On the IBM I do:
>>> scp USER@IP:/tmp/mpi.c .
>>>
>>> And when I copy a larger file the server crashes:
>>> scp USER@IP:/bsd .
>>>
>>> And when I copy the same file (bsd) from a USB key I don't have a
>>> problem.
>>>
>>> Emeric.
>>
>> that sounds like an interrupt sharing bug of some sort.
>> is it bnx1 that you're using to reproduce a crash?
>>
>> try the following diff please (on a clean checkout):
>>
>> Index: mpi.c
>> ===================================================================
>> RCS file: /home/cvs/src/sys/dev/ic/mpi.c,v
>> retrieving revision 1.166
>> diff -u -p -r1.166 mpi.c
>> --- mpi.c	1 Mar 2011 23:48:33 -0000	1.166
>> +++ mpi.c	2 Mar 2011 17:40:13 -0000
>> @@ -887,6 +887,9 @@ mpi_intr(void *arg)
>>  	u_int32_t reg;
>>  	int rv = 0;
>>
>> +	if ((mpi_read_intr(sc) & MPI_INTR_STATUS_REPLY) == 0)
>> +		return (rv);
>> +
>>  	while ((reg = mpi_pop_reply(sc)) != 0xffffffff) {
>>  		mpi_reply(sc, reg);
>>  		rv = 1;
>
> ok krw@.
>
> Ken
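The diff quoted above makes mpi_intr() return early when the interrupt status register shows no pending reply, so a handler that is called only because another device on the same line interrupted never touches the reply queue. A minimal, self-contained sketch of that pattern follows. It is not the driver code: struct fake_ioc, FAKE_INTR_STATUS_REPLY, FAKE_REPLY_EMPTY and the fake_* functions are hypothetical stand-ins for the hardware, and the bit position chosen for the status flag is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_INTR_STATUS_REPLY	(1U << 3)	/* assumed bit, for illustration */
#define FAKE_REPLY_EMPTY	0xffffffffU	/* value read from a drained FIFO */
#define FAKE_FIFO_DEPTH		8

/* Hypothetical model of the controller state the host can see. */
struct fake_ioc {
	uint32_t intr_status;			/* interrupt status register */
	uint32_t fifo[FAKE_FIFO_DEPTH];		/* reply post FIFO */
	size_t head, tail;			/* entries head..tail-1 are queued */
	int handled;				/* replies processed so far */
};

/* Controller side: queue one reply and raise the status bit. */
static void
fake_post_reply(struct fake_ioc *ioc, uint32_t reg)
{
	assert(ioc->tail < FAKE_FIFO_DEPTH);
	ioc->fifo[ioc->tail++] = reg;
	ioc->intr_status |= FAKE_INTR_STATUS_REPLY;
}

/* Host side: pop one entry; an empty FIFO clears the status bit. */
static uint32_t
fake_pop_reply(struct fake_ioc *ioc)
{
	if (ioc->head == ioc->tail) {
		ioc->intr_status &= ~FAKE_INTR_STATUS_REPLY;
		return FAKE_REPLY_EMPTY;
	}
	return ioc->fifo[ioc->head++];
}

/*
 * Interrupt handler for a possibly shared line. Returns 1 if this
 * device had work, 0 so the next handler on the line gets its turn.
 */
static int
fake_intr(struct fake_ioc *ioc)
{
	int rv = 0;

	/* Not our interrupt: leave the reply queue alone. */
	if ((ioc->intr_status & FAKE_INTR_STATUS_REPLY) == 0)
		return rv;

	while (fake_pop_reply(ioc) != FAKE_REPLY_EMPTY) {
		ioc->handled++;		/* a real driver would process the reply */
		rv = 1;
	}
	return rv;
}
```

The cost Marco objects to is the extra register read at the top of fake_intr(); in the sketch it is a plain memory load, whereas on real hardware it would be a read across the bus on every interrupt.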
Re: OpenBSD crash on an IBM x3550 M3
i agree that mikeb's change should go in.

On 05/03/2011, at 12:10 AM, Mark Kettenis wrote:
>> Date: Fri, 4 Mar 2011 07:30:24 -0600
>> From: Marco Peereboom sl...@peereboom.us
>>
>> That is a huge penalty because it is read over the PCI bus. The trick
>> with 0xffffffff should work just fine per the doco and other OSes'
>> drivers (off the top of my head). The question I have is: does Linux
>> only have one device per interrupt?
>
> Linux probably does a better job at avoiding shared interrupts than we
> do, but on some hardware it can't be avoided so it has to deal with it.
>
> If you want to avoid reading the interrupt status register, you'll have
> to stop trusting the hardware (or rather the firmware) in mpi_reply(),
> and do bounds checks before accessing sc->sc_rcbs[] and sc->sc_ccbs[].
> To be honest, that would be a good idea even if we didn't have this bug.
>
> In the meantime I think mikeb's fix should be committed.
>
>> I am going to reference the doco one more time on this.
>>
>> On Thu, Mar 03, 2011 at 10:35:59PM -0500, Kenneth R Westerback wrote:
>>> On Thu, Mar 03, 2011 at 07:11:52PM +0100, Mike Belopuhov wrote:
>>>> On Fri, Feb 04, 2011 at 14:53 +0000, emeric boit wrote:
>>>>> Hello,
>>>>>
>>>>> After doing a clean install of OpenBSD 4.8 (AMD64) on an IBM x3550
>>>>> M3, I find the system randomly panics after a period of use.
>>>>>
>>>>> uvm_fault(0x80cc8360, 0x8000149b7000, 0, 1) -> e
>>>>> kernel: page fault trap, code=0
>>>>> Stopped at mpi_reply+0x102: movq 0(%r13),%rax
>>>>> ddb{0}>
>>>>> ddb{0}> trace
>>>>> mpi_reply() at mpi_reply+0x102
>>>>> mpi_intr() at mpi_intr+0x20
>>>>> Xintr_ioapic_level18() at Xintr_ioapic_level18+0xec
>>>>> --- interrupt ---
>>>>> Bad frame pointer: 0x8000194e1920
>>>>> end trace frame: 0x8000194e1920, count: -3
>>>>> Xspllower+0xe:
>>>>> ddb{0}>
>>>>
>>>> We've tried different things, but after this hint I realised that
>>>> what might be happening is that the bnx and mpi interrupts are
>>>> chained (it's bnx0 actually; my initial guess about bnx1 was wrong)
>>>> and mpi_intr is called first. Currently neither mpi(4) nor mpii(4)
>>>> checks the interrupt status register; both look directly into the
>>>> reply post queue. Although there's not supposed to be any race
>>>> between the host CPU reading from memory and the IOC writing to it,
>>>> in practice it turns out that in some particular hardware
>>>> configurations this rule is violated and we read a garbled reply
>>>> from the controller.
>>>>
>>>> If my memory serves, I considered this for mpii_intr but never got
>>>> into a situation where it was needed and thus omitted it. I guess I
>>>> have to bring it back too.
>>>>
>>>> Emeric tortured the machine with this diff and reported that it
>>>> solves the issue for him.
>>>>
>>>> OK to commit?
>>>>
>>>> On Wed, Mar 02, 2011 at 17:20 +0000, emeric boit wrote:
>>>>> hi,
>>>>>
>>>>> This change doesn't solve the issue.
>>>>>
>>>>> I have noticed that the server crashes when I use the network.
>>>>> I can copy a small file several times without problem.
>>>>> On the IBM I do:
>>>>> scp USER@IP:/tmp/mpi.c .
>>>>>
>>>>> And when I copy a larger file the server crashes:
>>>>> scp USER@IP:/bsd .
>>>>>
>>>>> And when I copy the same file (bsd) from a USB key I don't have a
>>>>> problem.
>>>>>
>>>>> Emeric.
>>>>
>>>> that sounds like an interrupt sharing bug of some sort.
>>>> is it bnx1 that you're using to reproduce a crash?
>>>>
>>>> try the following diff please (on a clean checkout):
>>>>
>>>> Index: mpi.c
>>>> ===================================================================
>>>> RCS file: /home/cvs/src/sys/dev/ic/mpi.c,v
>>>> retrieving revision 1.166
>>>> diff -u -p -r1.166 mpi.c
>>>> --- mpi.c	1 Mar 2011 23:48:33 -0000	1.166
>>>> +++ mpi.c	2 Mar 2011 17:40:13 -0000
>>>> @@ -887,6 +887,9 @@ mpi_intr(void *arg)
>>>>  	u_int32_t reg;
>>>>  	int rv = 0;
>>>>
>>>> +	if ((mpi_read_intr(sc) & MPI_INTR_STATUS_REPLY) == 0)
>>>> +		return (rv);
>>>> +
>>>>  	while ((reg = mpi_pop_reply(sc)) != 0xffffffff) {
>>>>  		mpi_reply(sc, reg);
>>>>  		rv = 1;
>>>
>>> ok krw@.
>>>
>>> Ken
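Mark's suggestion above is to validate whatever the controller hands back before using it to index sc->sc_rcbs[] or sc->sc_ccbs[]. A rough sketch of such a check for a reply address is given below. The helper name reply_slot() and its parameters are hypothetical and do not come from mpi.c; the slot count and slot size used in the usage note are made-up values, since the thread does not state the real ones.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: map a reply DMA address back to a slot index,
 * refusing anything the controller could not legitimately have posted.
 * Returns 0 and stores the index on success, -1 on a bogus address
 * (in which case *idx is left untouched).
 */
static int
reply_slot(uint32_t reply_dva, uint32_t base, uint32_t nslots,
    uint32_t slot_size, uint32_t *idx)
{
	uint32_t off;

	assert(slot_size != 0);

	if (reply_dva < base)
		return -1;		/* below the start of the reply area */

	off = reply_dva - base;
	if (off % slot_size != 0)
		return -1;		/* not on a slot boundary */
	if (off / slot_size >= nslots)
		return -1;		/* past the last slot */

	*idx = off / slot_size;
	return 0;
}
```

With an assumed 128 slots of 80 bytes each, reply_slot(0xc160d7fe, 0x7d198000, 128, 80, &i) returns -1 for the dva and map values from the "choked on reg" panic reported elsewhere in this thread, while an address exactly two slots past the base yields index 2. A driver using a check like this could drop or log the bad reply instead of faulting on an out-of-range array access.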
Re: OpenBSD crash on an IBM x3550 M3
On Fri, Feb 04, 2011 at 14:53 +0000, emeric boit wrote:
> Hello,
>
> After doing a clean install of OpenBSD 4.8 (AMD64) on an IBM x3550 M3,
> I find the system randomly panics after a period of use.
>
> uvm_fault(0x80cc8360, 0x8000149b7000, 0, 1) -> e
> kernel: page fault trap, code=0
> Stopped at mpi_reply+0x102: movq 0(%r13),%rax
> ddb{0}>
> ddb{0}> trace
> mpi_reply() at mpi_reply+0x102
> mpi_intr() at mpi_intr+0x20
> Xintr_ioapic_level18() at Xintr_ioapic_level18+0xec
> --- interrupt ---
> Bad frame pointer: 0x8000194e1920
> end trace frame: 0x8000194e1920, count: -3
> Xspllower+0xe:
> ddb{0}>

We've tried different things, but after this hint I realised that what
might be happening is that the bnx and mpi interrupts are chained (it's
bnx0 actually; my initial guess about bnx1 was wrong) and mpi_intr is
called first. Currently neither mpi(4) nor mpii(4) checks the interrupt
status register; both look directly into the reply post queue. Although
there's not supposed to be any race between the host CPU reading from
memory and the IOC writing to it, in practice it turns out that in some
particular hardware configurations this rule is violated and we read a
garbled reply from the controller.

If my memory serves, I considered this for mpii_intr but never got into
a situation where it was needed and thus omitted it. I guess I have to
bring it back too.

Emeric tortured the machine with this diff and reported that it solves
the issue for him.

OK to commit?

On Wed, Mar 02, 2011 at 17:20 +0000, emeric boit wrote:
> hi,
>
> This change doesn't solve the issue.
>
> I have noticed that the server crashes when I use the network.
> I can copy a small file several times without problem.
> On the IBM I do:
> scp USER@IP:/tmp/mpi.c .
>
> And when I copy a larger file the server crashes:
> scp USER@IP:/bsd .
>
> And when I copy the same file (bsd) from a USB key I don't have a
> problem.
>
> Emeric.

that sounds like an interrupt sharing bug of some sort.
is it bnx1 that you're using to reproduce a crash?

try the following diff please (on a clean checkout):

Index: mpi.c
===================================================================
RCS file: /home/cvs/src/sys/dev/ic/mpi.c,v
retrieving revision 1.166
diff -u -p -r1.166 mpi.c
--- mpi.c	1 Mar 2011 23:48:33 -0000	1.166
+++ mpi.c	2 Mar 2011 17:40:13 -0000
@@ -887,6 +887,9 @@ mpi_intr(void *arg)
 	u_int32_t reg;
 	int rv = 0;

+	if ((mpi_read_intr(sc) & MPI_INTR_STATUS_REPLY) == 0)
+		return (rv);
+
 	while ((reg = mpi_pop_reply(sc)) != 0xffffffff) {
 		mpi_reply(sc, reg);
 		rv = 1;
Re: OpenBSD crash on an IBM x3550 M3
On Thu, Mar 03, 2011 at 07:11:52PM +0100, Mike Belopuhov wrote:
> On Fri, Feb 04, 2011 at 14:53 +0000, emeric boit wrote:
>> Hello,
>>
>> After doing a clean install of OpenBSD 4.8 (AMD64) on an IBM x3550 M3,
>> I find the system randomly panics after a period of use.
>>
>> uvm_fault(0x80cc8360, 0x8000149b7000, 0, 1) -> e
>> kernel: page fault trap, code=0
>> Stopped at mpi_reply+0x102: movq 0(%r13),%rax
>> ddb{0}>
>> ddb{0}> trace
>> mpi_reply() at mpi_reply+0x102
>> mpi_intr() at mpi_intr+0x20
>> Xintr_ioapic_level18() at Xintr_ioapic_level18+0xec
>> --- interrupt ---
>> Bad frame pointer: 0x8000194e1920
>> end trace frame: 0x8000194e1920, count: -3
>> Xspllower+0xe:
>> ddb{0}>
>
> We've tried different things, but after this hint I realised that what
> might be happening is that the bnx and mpi interrupts are chained (it's
> bnx0 actually; my initial guess about bnx1 was wrong) and mpi_intr is
> called first. Currently neither mpi(4) nor mpii(4) checks the interrupt
> status register; both look directly into the reply post queue. Although
> there's not supposed to be any race between the host CPU reading from
> memory and the IOC writing to it, in practice it turns out that in some
> particular hardware configurations this rule is violated and we read a
> garbled reply from the controller.
>
> If my memory serves, I considered this for mpii_intr but never got into
> a situation where it was needed and thus omitted it. I guess I have to
> bring it back too.
>
> Emeric tortured the machine with this diff and reported that it solves
> the issue for him.
>
> OK to commit?
>
> On Wed, Mar 02, 2011 at 17:20 +0000, emeric boit wrote:
>> hi,
>>
>> This change doesn't solve the issue.
>>
>> I have noticed that the server crashes when I use the network.
>> I can copy a small file several times without problem.
>> On the IBM I do:
>> scp USER@IP:/tmp/mpi.c .
>>
>> And when I copy a larger file the server crashes:
>> scp USER@IP:/bsd .
>>
>> And when I copy the same file (bsd) from a USB key I don't have a
>> problem.
>>
>> Emeric.
>
> that sounds like an interrupt sharing bug of some sort.
> is it bnx1 that you're using to reproduce a crash?
>
> try the following diff please (on a clean checkout):
>
> Index: mpi.c
> ===================================================================
> RCS file: /home/cvs/src/sys/dev/ic/mpi.c,v
> retrieving revision 1.166
> diff -u -p -r1.166 mpi.c
> --- mpi.c	1 Mar 2011 23:48:33 -0000	1.166
> +++ mpi.c	2 Mar 2011 17:40:13 -0000
> @@ -887,6 +887,9 @@ mpi_intr(void *arg)
>  	u_int32_t reg;
>  	int rv = 0;
>
> +	if ((mpi_read_intr(sc) & MPI_INTR_STATUS_REPLY) == 0)
> +		return (rv);
> +
>  	while ((reg = mpi_pop_reply(sc)) != 0xffffffff) {
>  		mpi_reply(sc, reg);
>  		rv = 1;

ok krw@.

Ken
Re : Re : OpenBSD crash on an IBM x3550 M3
Does someone else have an idea?

Regards,
Emeric.

----- Original Message -----
From: emeric boit emericb...@yahoo.fr
To: Mike Belopuhov m...@crypt.org.ru
Cc: bugs@openbsd.org
Sent: Tue, 22 February 2011, 18:26:46
Subject: Re : OpenBSD crash on an IBM x3550 M3

With the OpenBSD 4.9 snapshot (Jan. 2011) the message is:

panic: mpi0: choked on reg 0xe0b06bff dva 0xc160d7fe map 0x7d198000
Stopped at Debugger+0x5: leave

----- Original Message -----
From: Mike Belopuhov m...@crypt.org.ru
To: emeric boit emericb...@yahoo.fr
Cc: bugs@openbsd.org
Sent: Tue, 22 February 2011, 16:42:42
Subject: Re: OpenBSD crash on an IBM x3550 M3

On Tue, Feb 22, 2011 at 15:21 +0000, emeric boit wrote:
> Stuart,
>
> Thanks for your response, but this patch doesn't resolve the problem.
>
> With the OpenBSD 4.9 snapshot (Jan. 2011) the problem is the same:
>
> uvm_fault(0x80cef780, 0x80001b6fe000, 0, 1) -> e
> kernel: page fault trap, code=0
> Stopped at mpi_reply+0xd0: movq 0x10(%r13),%rdx
> ddb{0}>
> ddb{0}> trace
> mpi_reply() at mpi_reply+0xd0
> mpi_intr() at mpi_intr+0x20
> Xintr_ioapic_level18() at Xintr_ioapic_level18+0xec
> --- interrupt ---
> Bad frame pointer: 0x80001942dac0
> end trace frame: 0x80001942dac0, count: -3
> Xspllower+0xe:
> ddb{0}>
>
> I think the bug is in this file: /usr/src/sys/dev/ic/mpi.c
>
> Regards,
> Emeric.

let's prove a theory:

Index: dev/ic/mpi.c
===================================================================
RCS file: /home/cvs/src/sys/dev/ic/mpi.c,v
retrieving revision 1.165
diff -u -p -r1.165 mpi.c
--- dev/ic/mpi.c	24 Sep 2010 01:27:11 -0000	1.165
+++ dev/ic/mpi.c	22 Feb 2011 15:38:19 -0000
@@ -914,6 +914,10 @@ mpi_reply(struct mpi_softc *sc, u_int32_
 		reply_dva = (reg & MPI_REPLY_QUEUE_ADDRESS_MASK) << 1;
 		i = (reply_dva - (u_int32_t)MPI_DMA_DVA(sc->sc_replies)) /
 		    MPI_REPLY_SIZE;
+		if (i < 0 || i > sc->sc_repq)
+			panic("%s: choked on reg %#x dva %#x map %#x",
+			    DEVNAME(sc), reg, reply_dva,
+			    (u_int32_t)MPI_DMA_DVA(sc->sc_replies));
 		rcb = &sc->sc_rcbs[i];

 		bus_dmamap_sync(sc->sc_dmat,
Re : Re : OpenBSD crash on an IBM x3550 M3
Stuart,

Thanks for your response, but this patch doesn't resolve the problem.

With the OpenBSD 4.9 snapshot (Jan. 2011) the problem is the same:

uvm_fault(0x80cef780, 0x80001b6fe000, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at mpi_reply+0xd0: movq 0x10(%r13),%rdx
ddb{0}>
ddb{0}> trace
mpi_reply() at mpi_reply+0xd0
mpi_intr() at mpi_intr+0x20
Xintr_ioapic_level18() at Xintr_ioapic_level18+0xec
--- interrupt ---
Bad frame pointer: 0x80001942dac0
end trace frame: 0x80001942dac0, count: -3
Xspllower+0xe:
ddb{0}>

I think the bug is in this file: /usr/src/sys/dev/ic/mpi.c

Regards,
Emeric.

----- Original Message -----
From: Stuart Henderson s...@spacehopper.org
To: emeric boit emericb...@yahoo.fr
Cc: bugs@openbsd.org
Sent: Mon, 21 February 2011, 13:28:13
Subject: Re: Re : OpenBSD crash on an IBM x3550 M3

You could try the patch for errata 002.
http://www.openbsd.org/errata48.html

On 2011/02/21 11:22, emeric boit wrote:
> Hello,
>
> Don't hesitate to contact me if you need more information about this
> bug.
>
> Thanks for your help.
>
> Regards,
> Emeric.
>
> ----- Original Message -----
> From: emeric boit emericb...@yahoo.fr
> To: bugs@openbsd.org
> Sent: Fri, 4 February 2011, 15:53:11
> Subject: OpenBSD crash on an IBM x3550 M3
>
> Hello,
>
> After doing a clean install of OpenBSD 4.8 (AMD64) on an IBM x3550 M3,
> I find the system randomly panics after a period of use.
>
> uvm_fault(0x80cc8360, 0x8000149b7000, 0, 1) -> e
> kernel: page fault trap, code=0
> Stopped at mpi_reply+0x102: movq 0(%r13),%rax
> ddb{0}>
> ddb{0}> trace
> mpi_reply() at mpi_reply+0x102
> mpi_intr() at mpi_intr+0x20
> Xintr_ioapic_level18() at Xintr_ioapic_level18+0xec
> --- interrupt ---
> Bad frame pointer: 0x8000194e1920
> end trace frame: 0x8000194e1920, count: -3
> Xspllower+0xe:
> ddb{0}>
>
> ddb{0}> dmesg
> OpenBSD 4.8 (GENERIC.MP) #335: Mon Aug 16 09:09:20 MDT 2010
>     dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> RTC BIOS diagnostic error 80<clock_battery>
> real mem = 2135011328 (2036MB)
> avail mem = 2064355328 (1968MB)
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.5 @ 0x7f6bd000 (82 entries)
> bios0: vendor IBM Corp. version "-[D6E149AUS-1.09]-" date 09/21/2010
> bios0: IBM 69Y4438
> acpi0 at bios0: rev 2
> acpi0: sleep states S0 S1 S5
> acpi0: tables DSDT FACP TCPA APIC MCFG SLIC HPET SSDT SSDT ERST DMAR
> acpi0: wakeup devices UHC1(S4) UHC2(S4) UHC3(S4) UHC4(S4) UHC5(S4) EHC1(S4) EHC2(S4)
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz, 2133.68 MHz
> cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT,NXE,LONG
> cpu0: 256KB 64b/line 8-way L2 cache
> cpu0: apic clock running at 133MHz
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz, 2133.41 MHz
> cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT,NXE,LONG
> cpu1: 256KB 64b/line 8-way L2 cache
> cpu2 at mainbus0: apid 4 (application processor)
> cpu2: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz, 2133.41 MHz
> cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT,NXE,LONG
> cpu2: 256KB 64b/line 8-way L2 cache
> cpu3 at mainbus0: apid 6 (application processor)
> cpu3: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz, 2133.41 MHz
> cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT,NXE,LONG
> cpu3: 256KB 64b/line 8-way L2 cache
> ioapic0 at mainbus0: apid 8 pa 0xfec0, version 20, 24 pins
> ioapic1 at mainbus0: apid 9 pa 0xfec8, version 20, 24 pins
> acpihpet0 at acpi0: 14318179 Hz
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus 11 (PCI1)
> acpiprt2 at acpi0: bus 21 (PCI3)
> acpiprt3 at acpi0: bus -1 (PCI6)
> acpiprt4 at acpi0: bus 31 (PCI7)
> acpiprt5 at acpi0: bus -1 (PCI9)
> acpicpu0 at acpi0
> acpicpu1 at acpi0
> acpicpu2 at acpi0
> acpicpu3 at acpi0
> ipmi at mainbus0 not configured
> pci0 at mainbus0 bus 0
> pchb0 at pci0 dev 0 function 0 "Intel 5520 Host" rev 0x22
> ppb0 at pci0 dev 1 function 0 "Intel X58 PCIE" rev 0x22
> pci1 at ppb0 bus 11
> bnx0 at pci1 dev 0 function 0 "Broadcom BCM5709" rev 0x20: apic 9 int 4 (irq 11)
> bnx1 at pci1 dev 0 function 1 "Broadcom BCM5709" rev 0x20: apic 9 int 16 (irq 10)
> ppb1 at pci0 dev 2 function 0 "Intel X58 PCIE" rev 0x22
> pci2 at ppb1 bus 16
> ppb2 at pci0 dev 3 function
Re: OpenBSD crash on an IBM x3550 M3
On Tue, Feb 22, 2011 at 15:21 +0000, emeric boit wrote:
> Stuart,
>
> Thanks for your response, but this patch doesn't resolve the problem.
>
> With the OpenBSD 4.9 snapshot (Jan. 2011) the problem is the same:
>
> uvm_fault(0x80cef780, 0x80001b6fe000, 0, 1) -> e
> kernel: page fault trap, code=0
> Stopped at mpi_reply+0xd0: movq 0x10(%r13),%rdx
> ddb{0}>
> ddb{0}> trace
> mpi_reply() at mpi_reply+0xd0
> mpi_intr() at mpi_intr+0x20
> Xintr_ioapic_level18() at Xintr_ioapic_level18+0xec
> --- interrupt ---
> Bad frame pointer: 0x80001942dac0
> end trace frame: 0x80001942dac0, count: -3
> Xspllower+0xe:
> ddb{0}>
>
> I think the bug is in this file: /usr/src/sys/dev/ic/mpi.c
>
> Regards,
> Emeric.

let's prove a theory:

Index: dev/ic/mpi.c
===================================================================
RCS file: /home/cvs/src/sys/dev/ic/mpi.c,v
retrieving revision 1.165
diff -u -p -r1.165 mpi.c
--- dev/ic/mpi.c	24 Sep 2010 01:27:11 -0000	1.165
+++ dev/ic/mpi.c	22 Feb 2011 15:38:19 -0000
@@ -914,6 +914,10 @@ mpi_reply(struct mpi_softc *sc, u_int32_
 		reply_dva = (reg & MPI_REPLY_QUEUE_ADDRESS_MASK) << 1;
 		i = (reply_dva - (u_int32_t)MPI_DMA_DVA(sc->sc_replies)) /
 		    MPI_REPLY_SIZE;
+		if (i < 0 || i > sc->sc_repq)
+			panic("%s: choked on reg %#x dva %#x map %#x",
+			    DEVNAME(sc), reg, reply_dva,
+			    (u_int32_t)MPI_DMA_DVA(sc->sc_replies));
 		rcb = &sc->sc_rcbs[i];

 		bus_dmamap_sync(sc->sc_dmat,
Re : OpenBSD crash on an IBM x3550 M3
With the OpenBSD 4.9 snapshot (Jan. 2011) the message is:

panic: mpi0: choked on reg 0xe0b06bff dva 0xc160d7fe map 0x7d198000
Stopped at Debugger+0x5: leave

----- Original Message -----
From: Mike Belopuhov m...@crypt.org.ru
To: emeric boit emericb...@yahoo.fr
Cc: bugs@openbsd.org
Sent: Tue, 22 February 2011, 16:42:42
Subject: Re: OpenBSD crash on an IBM x3550 M3

On Tue, Feb 22, 2011 at 15:21 +0000, emeric boit wrote:
> Stuart,
>
> Thanks for your response, but this patch doesn't resolve the problem.
>
> With the OpenBSD 4.9 snapshot (Jan. 2011) the problem is the same:
>
> uvm_fault(0x80cef780, 0x80001b6fe000, 0, 1) -> e
> kernel: page fault trap, code=0
> Stopped at mpi_reply+0xd0: movq 0x10(%r13),%rdx
> ddb{0}>
> ddb{0}> trace
> mpi_reply() at mpi_reply+0xd0
> mpi_intr() at mpi_intr+0x20
> Xintr_ioapic_level18() at Xintr_ioapic_level18+0xec
> --- interrupt ---
> Bad frame pointer: 0x80001942dac0
> end trace frame: 0x80001942dac0, count: -3
> Xspllower+0xe:
> ddb{0}>
>
> I think the bug is in this file: /usr/src/sys/dev/ic/mpi.c
>
> Regards,
> Emeric.

let's prove a theory:

Index: dev/ic/mpi.c
===================================================================
RCS file: /home/cvs/src/sys/dev/ic/mpi.c,v
retrieving revision 1.165
diff -u -p -r1.165 mpi.c
--- dev/ic/mpi.c	24 Sep 2010 01:27:11 -0000	1.165
+++ dev/ic/mpi.c	22 Feb 2011 15:38:19 -0000
@@ -914,6 +914,10 @@ mpi_reply(struct mpi_softc *sc, u_int32_
 		reply_dva = (reg & MPI_REPLY_QUEUE_ADDRESS_MASK) << 1;
 		i = (reply_dva - (u_int32_t)MPI_DMA_DVA(sc->sc_replies)) /
 		    MPI_REPLY_SIZE;
+		if (i < 0 || i > sc->sc_repq)
+			panic("%s: choked on reg %#x dva %#x map %#x",
+			    DEVNAME(sc), reg, reply_dva,
+			    (u_int32_t)MPI_DMA_DVA(sc->sc_replies));
 		rcb = &sc->sc_rcbs[i];

 		bus_dmamap_sync(sc->sc_dmat,
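The three numbers in the panic message can be related to each other with the arithmetic from the debugging diff: reply_dva is derived from reg, and the index from the distance between reply_dva and the map base. The sketch below redoes the first two steps outside the kernel. The function names are hypothetical, and the mask value 0x7fffffff is an assumption: it is not stated in the thread and was chosen because it reproduces the dva printed in the panic.

```c
#include <stdint.h>

/* Assumed mask: the low 31 bits of reg carry the reply address >> 1. */
#define SKETCH_REPLY_ADDRESS_MASK	0x7fffffffU

/* Same expression as in the diff: (reg & MASK) << 1. */
static uint32_t
sketch_reply_dva(uint32_t reg)
{
	return (reg & SKETCH_REPLY_ADDRESS_MASK) << 1;
}

/* Byte distance of the reply from the start of the reply DMA area. */
static uint32_t
sketch_reply_offset(uint32_t reg, uint32_t map)
{
	return sketch_reply_dva(reg) - map;
}
```

For reg 0xe0b06bff the first function gives 0xc160d7fe, matching the dva in the panic, and the offset from map 0x7d198000 is 0x444757fe, which is more than 1 GB past the start of the reply area. Whatever the real reply size and queue depth are, an offset of that magnitude suggests the value popped from the queue was not a genuine reply address.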
Re : OpenBSD crash on an IBM x3550 M3
Hello,

Don't hesitate to contact me if you need more information about this
bug.

Thanks for your help.

Regards,
Emeric.

----- Original Message -----
From: emeric boit emericb...@yahoo.fr
To: bugs@openbsd.org
Sent: Fri, 4 February 2011, 15:53:11
Subject: OpenBSD crash on an IBM x3550 M3

Hello,

After doing a clean install of OpenBSD 4.8 (AMD64) on an IBM x3550 M3,
I find the system randomly panics after a period of use.

uvm_fault(0x80cc8360, 0x8000149b7000, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at mpi_reply+0x102: movq 0(%r13),%rax
ddb{0}>
ddb{0}> trace
mpi_reply() at mpi_reply+0x102
mpi_intr() at mpi_intr+0x20
Xintr_ioapic_level18() at Xintr_ioapic_level18+0xec
--- interrupt ---
Bad frame pointer: 0x8000194e1920
end trace frame: 0x8000194e1920, count: -3
Xspllower+0xe:
ddb{0}>

ddb{0}> dmesg
OpenBSD 4.8 (GENERIC.MP) #335: Mon Aug 16 09:09:20 MDT 2010
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
RTC BIOS diagnostic error 80<clock_battery>
real mem = 2135011328 (2036MB)
avail mem = 2064355328 (1968MB)
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.5 @ 0x7f6bd000 (82 entries)
bios0: vendor IBM Corp. version "-[D6E149AUS-1.09]-" date 09/21/2010
bios0: IBM 69Y4438
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S5
acpi0: tables DSDT FACP TCPA APIC MCFG SLIC HPET SSDT SSDT ERST DMAR
acpi0: wakeup devices UHC1(S4) UHC2(S4) UHC3(S4) UHC4(S4) UHC5(S4) EHC1(S4) EHC2(S4)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz, 2133.68 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT,NXE,LONG
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: apic clock running at 133MHz
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz, 2133.41 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT,NXE,LONG
cpu1: 256KB 64b/line 8-way L2 cache
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz, 2133.41 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT,NXE,LONG
cpu2: 256KB 64b/line 8-way L2 cache
cpu3 at mainbus0: apid 6 (application processor)
cpu3: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz, 2133.41 MHz
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT,NXE,LONG
cpu3: 256KB 64b/line 8-way L2 cache
ioapic0 at mainbus0: apid 8 pa 0xfec0, version 20, 24 pins
ioapic1 at mainbus0: apid 9 pa 0xfec8, version 20, 24 pins
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 11 (PCI1)
acpiprt2 at acpi0: bus 21 (PCI3)
acpiprt3 at acpi0: bus -1 (PCI6)
acpiprt4 at acpi0: bus 31 (PCI7)
acpiprt5 at acpi0: bus -1 (PCI9)
acpicpu0 at acpi0
acpicpu1 at acpi0
acpicpu2 at acpi0
acpicpu3 at acpi0
ipmi at mainbus0 not configured
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 5520 Host" rev 0x22
ppb0 at pci0 dev 1 function 0 "Intel X58 PCIE" rev 0x22
pci1 at ppb0 bus 11
bnx0 at pci1 dev 0 function 0 "Broadcom BCM5709" rev 0x20: apic 9 int 4 (irq 11)
bnx1 at pci1 dev 0 function 1 "Broadcom BCM5709" rev 0x20: apic 9 int 16 (irq 10)
ppb1 at pci0 dev 2 function 0 "Intel X58 PCIE" rev 0x22
pci2 at ppb1 bus 16
ppb2 at pci0 dev 3 function 0 "Intel X58 PCIE" rev 0x22
pci3 at ppb2 bus 21
ppb3 at pci0 dev 5 function 0 "Intel X58 PCIE" rev 0x22
pci4 at ppb3 bus 26
ppb4 at pci0 dev 7 function 0 "Intel X58 PCIE" rev 0x22: apic 9 int 6 (irq 11)
pci5 at ppb4 bus 31
"Intel I340-T4 (82580)" rev 0x01 at pci5 dev 0 function 0 not configured
"Intel I340-T4 (82580)" rev 0x01 at pci5 dev 0 function 1 not configured
"Intel I340-T4 (82580)" rev 0x01 at pci5 dev 0 function 2 not configured
"Intel I340-T4 (82580)" rev 0x01 at pci5 dev 0 function 3 not configured
"Intel X58 QuickPath" rev 0x22 at pci0 dev 16 function 0 not configured
"Intel X58 QuickPath" rev 0x22 at pci0 dev 16 function 1 not configured
"Intel X58 QuickPath" rev 0x22 at pci0 dev 17 function 0 not configured
"Intel X58 QuickPath" rev 0x22 at pci0 dev 17 function 1 not configured
"Intel X58 Misc" rev 0x22 at pci0 dev 20 function 0 not configured
"Intel X58 GPIO" rev 0x22 at pci0 dev 20 function 1 not configured
"Intel X58 RAS" rev 0x22 at pci0 dev 20 function 2 not configured
"Intel X58 Throttle" rev 0x22 at pci0 dev 20 function 3 not configured
vendor "Intel", unknown product 0x342f (class system subclass interrupt, rev 0x22