Re: OpenBSD crash on an IBM x3550 M3

2011-03-04 Thread Marco Peereboom
That is a huge penalty, because the interrupt status register is read over
the PCI bus.  The trick with 0xffffffff should work just fine per the doco
and other OSes' drivers (off the top of my head).  The question I have is:
does Linux only have one device per interrupt?

I am going to reference the doco one more time on this.
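Marco's cost argument can be made concrete with a small simulation. This is a minimal sketch, not the mpi(4) code: `struct sim_ctrl`, `sim_pop_reply()`, `sim_reply_pending()`, `sim_intr()` and `REPLY_FIFO_EMPTY` are all hypothetical names, and each call that touches the simulated hardware stands for one register read across the PCI bus. It assumes, as the 0xffffffff trick does, that the reply FIFO reliably returns the all-ones sentinel when nothing is queued.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value a read of the reply FIFO returns when it is empty. */
#define REPLY_FIFO_EMPTY	0xffffffffu

/*
 * Hypothetical, simulated controller.  Each helper below that touches
 * it stands for one uncached register read and bumps mmio_reads.
 */
struct sim_ctrl {
	uint32_t	fifo[8];	/* posted reply words */
	int		head, tail;	/* consumer / producer positions */
	unsigned int	mmio_reads;	/* register reads issued so far */
};

/* Reply FIFO read: next posted word, or the empty sentinel. */
static uint32_t
sim_pop_reply(struct sim_ctrl *c)
{
	c->mmio_reads++;
	if (c->head == c->tail)
		return (REPLY_FIFO_EMPTY);
	return (c->fifo[c->head++]);
}

/* Interrupt status read: nonzero if a reply has been posted. */
static int
sim_reply_pending(struct sim_ctrl *c)
{
	c->mmio_reads++;
	return (c->head != c->tail);
}

/*
 * Drain the reply queue and return how many replies were handled.
 * With check_status set, the status register is read first, which is
 * what the proposed mpi_intr() change adds.
 */
static int
sim_intr(struct sim_ctrl *c, int check_status)
{
	int handled = 0;

	if (check_status && !sim_reply_pending(c))
		return (0);
	while (sim_pop_reply(c) != REPLY_FIFO_EMPTY)
		handled++;
	return (handled);
}
```

With three replies queued, the sentinel-only loop issues four register reads and the status-first loop five; with nothing queued, either variant issues a single read. Under these assumptions the status check costs one extra read per interrupt that finds replies; how much that matters depends on the interrupt rate, which the sketch does not model.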




Re: OpenBSD crash on an IBM x3550 M3

2011-03-04 Thread David Gwynne
i agree that mikeb's change should go in.

On 05/03/2011, at 12:10 AM, Mark Kettenis wrote:

 Date: Fri, 4 Mar 2011 07:30:24 -0600
 From: Marco Peereboom sl...@peereboom.us

 That is a huge penalty, because the interrupt status register is read over
 the PCI bus.  The trick with 0xffffffff should work just fine per the doco
 and other OSes' drivers (off the top of my head).  The question I have is:
 does Linux only have one device per interrupt?

 Linux probably does a better job at avoiding shared interrupts than we
 do, but on some hardware it can't be avoided, so it has to deal with
 it.

 If you want to avoid reading the interrupt status register, you'll have
 to stop trusting the hardware (or rather the firmware) in mpi_reply()
 and do bounds checks before accessing sc->sc_rcbs[] and
 sc->sc_ccbs[].  To be honest, that would be a good idea even if we
 didn't have this bug.
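Mark's suggestion could look roughly like the following. This is a sketch under stated assumptions, not a patch against mpi.c: `reply_index()`, `REPLY_ADDR_MASK` and `REPLY_FRAME_SIZE` are hypothetical names, the mask is assumed to equal the driver's MPI_REPLY_QUEUE_ADDRESS_MASK (0x7fffffff reproduces the reg/dva pair in the 4.9 panic reported in this thread), and the 80-byte frame size is purely illustrative. The idea is that every word read from the reply queue is validated (inside the reply area and frame-aligned) before it is used as an array index.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match MPI_REPLY_QUEUE_ADDRESS_MASK in the driver. */
#define REPLY_ADDR_MASK		0x7fffffffu

/* Hypothetical reply frame size, for illustration only. */
#define REPLY_FRAME_SIZE	80u

/*
 * Translate a reply-queue word into an index into the reply frame
 * array.  Anything that does not point at the start of a frame inside
 * [base, base + nframes * REPLY_FRAME_SIZE) is rejected with -1
 * instead of being trusted.
 */
static int
reply_index(uint32_t reg, uint32_t base, unsigned int nframes)
{
	uint32_t dva = (reg & REPLY_ADDR_MASK) << 1;
	uint32_t off;

	if (dva < base)
		return (-1);		/* before the reply area */
	off = dva - base;
	if (off % REPLY_FRAME_SIZE != 0)
		return (-1);		/* not the start of a frame */
	if (off / REPLY_FRAME_SIZE >= nframes)
		return (-1);		/* past the last frame */
	return ((int)(off / REPLY_FRAME_SIZE));
}
```

A caller would treat -1 as a garbled reply and drop it, rather than dereference sc->sc_rcbs[] with it.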

 In the meantime I think mikeb's fix should be committed.




Re: OpenBSD crash on an IBM x3550 M3

2011-03-03 Thread Mike Belopuhov
On Fri, Feb 04, 2011 at 14:53 +0000, emeric boit wrote:
 Hello,
 
 After doing a clean install of OpenBSD 4.8 (AMD64) on an IBM x3550 M3,
 I find the system randomly panics after a period of use.
 uvm_fault(0x80cc8360, 0x8000149b7000, 0, 1) -> e
 kernel: page fault trap, code=0
 Stopped at  mpi_reply+0x102: movq    0(%r13),%rax
 ddb{0}>
 
 ddb{0}> trace
 mpi_reply() at mpi_reply+0x102
 mpi_intr() at mpi_intr+0x20
 Xintr_ioapic_level18() at Xintr_ioapic_level18+0xec
 --- interrupt ---
 Bad frame pointer: 0x8000194e1920
 end trace frame: 0x8000194e1920, count: -3
 Xspllower+0xe:
 ddb{0}>
 

We've tried different things, but after this hint I realised
that what might be happening is that the bnx and mpi interrupts
are chained (it's bnx0 actually; my initial guess about bnx1
was wrong) and mpi_intr is called first.  Currently neither
mpi(4) nor mpii(4) checks the interrupt status register;
both look directly into the reply post queue.  Although
there's not supposed to be any race between the host CPU
reading from memory and the IOC writing to it, in practice it
turns out that in some hardware configurations this rule is
violated and we read a garbled reply from the controller.
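The failure mode described above can be modelled in a few lines. This is a simplified sketch with hypothetical names (`struct shared_line`, `hba_intr()`, `nic_intr()`, `dispatch()`); it assumes every handler on the shared pin is called in turn, the storage controller's first, and it models the theory that a read of the reply FIFO can return a garbled, non-sentinel word even though no reply was posted. The garbled word used in the check is the 0xe0b06bff value from the 4.9 panic reported in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define REPLY_FIFO_EMPTY	0xffffffffu

/* Hypothetical state of two devices sharing one level-triggered pin. */
struct shared_line {
	int		nic_pending;		/* NIC raised the interrupt */
	int		hba_reply_pending;	/* HBA status: reply posted? */
	uint32_t	hba_fifo_word;		/* what a FIFO read returns */
	int		hba_replies_consumed;	/* words the HBA acted on */
};

/*
 * Storage controller handler, called first on the shared pin.
 * check_status == 0 trusts the reply FIFO alone (the behaviour
 * described above); check_status == 1 mimics the proposed fix.
 */
static int
hba_intr(struct shared_line *l, int check_status)
{
	uint32_t reg;

	if (check_status && !l->hba_reply_pending)
		return (0);		/* not ours; leave it for the NIC */
	reg = l->hba_fifo_word;
	if (reg == REPLY_FIFO_EMPTY)
		return (0);
	/* A real driver would now index its reply array with reg. */
	l->hba_replies_consumed++;
	l->hba_fifo_word = REPLY_FIFO_EMPTY;
	l->hba_reply_pending = 0;
	return (1);
}

static int
nic_intr(struct shared_line *l)
{
	if (!l->nic_pending)
		return (0);
	l->nic_pending = 0;
	return (1);
}

/* Simplified: run every handler on the pin; serviced if any claims it. */
static int
dispatch(struct shared_line *l, int check_status)
{
	int claimed = 0;

	claimed |= hba_intr(l, check_status);
	claimed |= nic_intr(l);
	return (claimed);
}
```

Without the status check, the storage handler consumes the bogus word even though the interrupt belonged to the NIC; with it, the storage handler declines and only the NIC handler claims the interrupt.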

If memory serves, I considered this for mpii_intr but
never ran into a situation where it was needed, so I
omitted it.  I guess I have to bring it back too.

Emeric tortured the machine with this diff and reported that
it solves the issue for him.  OK to commit?

On Wed, Mar 02, 2011 at 17:20 +0000, emeric boit wrote:
 hi,
 
 This change doesn't solve the issue.
 
 I have noticed that the server crashes when I use the network.
 
 I can copy a small file several times without a problem.
 On the IBM I do:
 scp USER@IP:/tmp/mpi.c .
 
 But when I copy a larger file, the server crashes:
 scp USER@IP:/bsd .
 
 
 When I copy the same file (bsd) from a USB key, I don't have a problem.
 
 Emeric.
 

that sounds like an interrupt sharing bug of some sort.
is it bnx1 that you're using to reproduce a crash?

try the following diff please (on a clean checkout):

Index: mpi.c
===================================================================
RCS file: /home/cvs/src/sys/dev/ic/mpi.c,v
retrieving revision 1.166
diff -u -p -r1.166 mpi.c
--- mpi.c	1 Mar 2011 23:48:33 -0000	1.166
+++ mpi.c	2 Mar 2011 17:40:13 -0000
@@ -887,6 +887,9 @@ mpi_intr(void *arg)
 	u_int32_t			reg;
 	int				rv = 0;
 
+	if ((mpi_read_intr(sc) & MPI_INTR_STATUS_REPLY) == 0)
+		return (rv);
+
 	while ((reg = mpi_pop_reply(sc)) != 0xffffffff) {
 		mpi_reply(sc, reg);
 		rv = 1;



Re: OpenBSD crash on an IBM x3550 M3

2011-03-03 Thread Kenneth R Westerback
On Thu, Mar 03, 2011 at 07:11:52PM +0100, Mike Belopuhov wrote:
 Emeric tortured the machine with this diff and reported that
 it solves the issue for him.  OK to commit?
 

ok krw@.

 Ken



Re: Re: OpenBSD crash on an IBM x3550 M3

2011-02-24 Thread emeric boit
Does anyone else have an idea?

Regards,
Emeric.






Re: Re: OpenBSD crash on an IBM x3550 M3

2011-02-22 Thread emeric boit
Stuart,

Thanks for your response, but this patch doesn't resolve the problem.
With the OpenBSD 4.9 snapshot (Jan. 2011) the problem is the same:
uvm_fault(0x80cef780, 0x80001b6fe000, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at  mpi_reply+0xd0: movq    0x10(%r13),%rdx
ddb{0}>

ddb{0}> trace
mpi_reply() at mpi_reply+0xd0
mpi_intr() at mpi_intr+0x20
Xintr_ioapic_level18() at Xintr_ioapic_level18+0xec
--- interrupt ---
Bad frame pointer: 0x80001942dac0
end trace frame: 0x80001942dac0, count: -3
Xspllower+0xe:
ddb{0}>


I think the bug is in this file: /usr/src/sys/dev/ic/mpi.c

Regards,
Emeric.


----- Original Message -----
From: Stuart Henderson s...@spacehopper.org
To: emeric boit emericb...@yahoo.fr
Cc: bugs@openbsd.org
Sent: Mon 21 February 2011, 13:28:13
Subject: Re: Re: OpenBSD crash on an IBM x3550 M3

You could try the patch for errata 002.

http://www.openbsd.org/errata48.html




Re: OpenBSD crash on an IBM x3550 M3

2011-02-22 Thread Mike Belopuhov
On Tue, Feb 22, 2011 at 15:21 +0000, emeric boit wrote:
 Stuart,
 
 Thanks for your response, but this patch doesn't resolve the problem.
 With the OpenBSD 4.9 snapshot (Jan. 2011) the problem is the same:
 uvm_fault(0x80cef780, 0x80001b6fe000, 0, 1) -> e
 kernel: page fault trap, code=0
 Stopped at  mpi_reply+0xd0: movq    0x10(%r13),%rdx
 ddb{0}>
 
 ddb{0}> trace
 mpi_reply() at mpi_reply+0xd0
 mpi_intr() at mpi_intr+0x20
 Xintr_ioapic_level18() at Xintr_ioapic_level18+0xec
 --- interrupt ---
 Bad frame pointer: 0x80001942dac0
 end trace frame: 0x80001942dac0, count: -3
 Xspllower+0xe:
 ddb{0}>
 
 
 I think the bug is in this file: /usr/src/sys/dev/ic/mpi.c
 
 Regards,
 Emeric.
 

let's prove a theory:

Index: dev/ic/mpi.c
===================================================================
RCS file: /home/cvs/src/sys/dev/ic/mpi.c,v
retrieving revision 1.165
diff -u -p -r1.165 mpi.c
--- dev/ic/mpi.c	24 Sep 2010 01:27:11 -0000	1.165
+++ dev/ic/mpi.c	22 Feb 2011 15:38:19 -0000
@@ -914,6 +914,10 @@ mpi_reply(struct mpi_softc *sc, u_int32_
 	reply_dva = (reg & MPI_REPLY_QUEUE_ADDRESS_MASK) << 1;
 	i = (reply_dva - (u_int32_t)MPI_DMA_DVA(sc->sc_replies)) /
 	    MPI_REPLY_SIZE;
+	if (i < 0 || i > sc->sc_repq)
+		panic("%s: choked on reg %#x dva %#x map %#x",
+		    DEVNAME(sc), reg, reply_dva,
+		    (u_int32_t)MPI_DMA_DVA(sc->sc_replies));
 	rcb = &sc->sc_rcbs[i];
 
 	bus_dmamap_sync(sc->sc_dmat,



Re: OpenBSD crash on an IBM x3550 M3

2011-02-22 Thread emeric boit
With the OpenBSD 4.9 snapshot (Jan. 2011) the message is:

panic: mpi0: choked on reg 0xe0b06bff dva 0xc160d7fe map 0x7d198000
Stopped at  Debugger+0x5:   leave
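The values in this panic can be checked by hand against the arithmetic in the debugging diff. A minimal sketch, assuming MPI_REPLY_QUEUE_ADDRESS_MASK is 0x7fffffff (an assumption, though it reproduces the reported dva exactly); `reply_dva()` and `reply_offset()` are hypothetical helper names, not functions from mpi.c.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of MPI_REPLY_QUEUE_ADDRESS_MASK. */
#define REPLY_ADDR_MASK	0x7fffffffu

/* Same arithmetic as the reply_dva line in mpi_reply(): drop bit 31
 * of the reply word, then shift left by one. */
static uint32_t
reply_dva(uint32_t reg)
{
	return ((reg & REPLY_ADDR_MASK) << 1);
}

/* Byte offset of the decoded reply from the start of the reply area
 * ("map" in the panic message). */
static uint32_t
reply_offset(uint32_t reg, uint32_t map)
{
	return (reply_dva(reg) - map);
}
```

reply_dva(0xe0b06bff) yields 0xc160d7fe, the dva in the panic, and its offset from the start of the reply area (map 0x7d198000) is 0x444757fe bytes, roughly 1.07 GiB, so for any plausible reply frame size the computed index lands far beyond sc_repq. That is consistent with a garbled word being read from the reply queue rather than an index that is merely off by one.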






Re: OpenBSD crash on an IBM x3550 M3

2011-02-21 Thread emeric boit
Hello,

Don't hesitate to contact me if you need more information about this bug.


Thanks for your help.

Regards,

Emeric.


----- Original Message -----
From: emeric boit emericb...@yahoo.fr
To: bugs@openbsd.org
Sent: Fri 4 February 2011, 15:53:11
Subject: OpenBSD crash on an IBM x3550 M3

Hello,
After doing a clean install of OpenBSD 4.8 (AMD64) on an IBM x3550 M3, I find
the system randomly panics after a period of use.
uvm_fault(0x80cc8360, 0x8000149b7000, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at  mpi_reply+0x102: movq    0(%r13),%rax
ddb{0}>

ddb{0}> trace
mpi_reply() at mpi_reply+0x102
mpi_intr() at mpi_intr+0x20
Xintr_ioapic_level18() at Xintr_ioapic_level18+0xec
--- interrupt ---
Bad frame pointer: 0x8000194e1920
end trace frame: 0x8000194e1920, count: -3
Xspllower+0xe:
ddb{0}>

ddb{0}> dmesg
OpenBSD 4.8 (GENERIC.MP) #335: Mon Aug 16 09:09:20 MDT 2010
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
RTC BIOS diagnostic error 80<clock_battery>
real mem = 2135011328 (2036MB)
avail mem = 2064355328 (1968MB)
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.5 @ 0x7f6bd000 (82 entries)
bios0: vendor IBM Corp. version "-[D6E149AUS-1.09]-" date 09/21/2010
bios0: IBM 69Y4438
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S5
acpi0: tables DSDT FACP TCPA APIC MCFG SLIC HPET SSDT SSDT ERST DMAR
acpi0: wakeup devices UHC1(S4) UHC2(S4) UHC3(S4) UHC4(S4) UHC5(S4) EHC1(S4) EHC2(S4)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz, 2133.68 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT,NXE,LONG
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: apic clock running at 133MHz
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz, 2133.41 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT,NXE,LONG
cpu1: 256KB 64b/line 8-way L2 cache
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz, 2133.41 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT,NXE,LONG
cpu2: 256KB 64b/line 8-way L2 cache
cpu3 at mainbus0: apid 6 (application processor)
cpu3: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz, 2133.41 MHz
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT,NXE,LONG
cpu3: 256KB 64b/line 8-way L2 cache
ioapic0 at mainbus0: apid 8 pa 0xfec0, version 20, 24 pins
ioapic1 at mainbus0: apid 9 pa 0xfec8, version 20, 24 pins
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 11 (PCI1)
acpiprt2 at acpi0: bus 21 (PCI3)
acpiprt3 at acpi0: bus -1 (PCI6)
acpiprt4 at acpi0: bus 31 (PCI7)
acpiprt5 at acpi0: bus -1 (PCI9)
acpicpu0 at acpi0
acpicpu1 at acpi0
acpicpu2 at acpi0
acpicpu3 at acpi0
ipmi at mainbus0 not configured
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 5520 Host" rev 0x22
ppb0 at pci0 dev 1 function 0 "Intel X58 PCIE" rev 0x22
pci1 at ppb0 bus 11
bnx0 at pci1 dev 0 function 0 "Broadcom BCM5709" rev 0x20: apic 9 int 4 (irq 11)
bnx1 at pci1 dev 0 function 1 "Broadcom BCM5709" rev 0x20: apic 9 int 16 (irq 10)
ppb1 at pci0 dev 2 function 0 "Intel X58 PCIE" rev 0x22
pci2 at ppb1 bus 16
ppb2 at pci0 dev 3 function 0 "Intel X58 PCIE" rev 0x22
pci3 at ppb2 bus 21
ppb3 at pci0 dev 5 function 0 "Intel X58 PCIE" rev 0x22
pci4 at ppb3 bus 26
ppb4 at pci0 dev 7 function 0 "Intel X58 PCIE" rev 0x22: apic 9 int 6 (irq 11)
pci5 at ppb4 bus 31
"Intel I340-T4 (82580)" rev 0x01 at pci5 dev 0 function 0 not configured
"Intel I340-T4 (82580)" rev 0x01 at pci5 dev 0 function 1 not configured
"Intel I340-T4 (82580)" rev 0x01 at pci5 dev 0 function 2 not configured
"Intel I340-T4 (82580)" rev 0x01 at pci5 dev 0 function 3 not configured
"Intel X58 QuickPath" rev 0x22 at pci0 dev 16 function 0 not configured
"Intel X58 QuickPath" rev 0x22 at pci0 dev 16 function 1 not configured
"Intel X58 QuickPath" rev 0x22 at pci0 dev 17 function 0 not configured
"Intel X58 QuickPath" rev 0x22 at pci0 dev 17 function 1 not configured
"Intel X58 Misc" rev 0x22 at pci0 dev 20 function 0 not configured
"Intel X58 GPIO" rev 0x22 at pci0 dev 20 function 1 not configured
"Intel X58 RAS" rev 0x22 at pci0 dev 20 function 2 not configured
"Intel X58 Throttle" rev 0x22 at pci0 dev 20 function 3 not configured
vendor "Intel", unknown product 0x342f (class system subclass interrupt, rev 0x22