Adaptec aic7892 Ultra160 SCSI adapter ERROR

2004-05-20 Thread Aurimas Mikalauskas

Hi,

I'm new to SCSI and all its subsystems and this is the first time I'm
having trouble with it. This morning I got the following kernel error
message:

---
May 20 10:39:59 banners /kernel: (da0:ahc0:0:0:0): SCB 0x35 - timed out
May 20 10:40:04 banners /kernel:  Dump Card State Begins 

May 20 10:40:04 banners /kernel: ahc0: Dumping Card State while idle, at SEQADDR 0x8
May 20 10:40:04 banners /kernel: Card was paused
May 20 10:40:04 banners /kernel: ACCUM = 0x0, SINDEX = 0x3a, DINDEX = 0xe4, ARG_2 = 0x0
May 20 10:40:04 banners /kernel: HCNT = 0x0 SCBPTR = 0x12
May 20 10:40:04 banners /kernel: SCSIPHASE[0x0] SCSISIGI[0x0] ERROR[0x0] SCSIBUSL[0x0]
May 20 10:40:04 banners /kernel: LASTPHASE[0x1] SCSISEQ[0x12] SBLKCTL[0xa] SCSIRATE[0x0]
May 20 10:40:04 banners /kernel: SEQCTL[0x10] SEQ_FLAGS[0xc0] SSTAT0[0x0] SSTAT1[0x8]
May 20 10:40:04 banners /kernel: SSTAT2[0x0] SSTAT3[0x0] SIMODE0[0x8] SIMODE1[0xa4]
May 20 10:40:04 banners /kernel: SXFRCTL0[0x80] DFCNTRL[0x0] DFSTATUS[0x89]
May 20 10:40:04 banners /kernel: STACK: 0x0 0x163 0x109 0x3
May 20 10:40:04 banners /kernel: SCB count = 70
May 20 10:40:04 banners /kernel: Kernel NEXTQSCB = 11
May 20 10:40:04 banners /kernel: Card NEXTQSCB = 11
May 20 10:40:04 banners /kernel: QINFIFO entries:
May 20 10:40:04 banners /kernel: Waiting Queue entries:
May 20 10:40:05 banners /kernel: Disconnected Queue entries: 11:53
May 20 10:40:05 banners /kernel: QOUTFIFO entries:
May 20 10:40:05 banners /kernel: Sequencer Free SCB List: 18 26 12 15 24 25 31 27 2 0 7 5 9 4 23 28 17 1 29 22 16 3 21 8 10 20 13 30 14 6 19
May 20 10:40:05 banners /kernel: Sequencer SCB Info:
May 20 10:40:05 banners /kernel: 0 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
May 20 10:40:05 banners /kernel: 1 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
 skipped ...
May 20 10:40:05 banners /kernel: 31 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff]
May 20 10:40:05 banners /kernel: Pending list:
May 20 10:40:06 banners /kernel: 53 SCB_CONTROL[0x60] SCB_SCSIID[0x7] SCB_LUN[0x0]
May 20 10:40:06 banners /kernel: Kernel Free SCB list: 58 0 48 19 26 49 2 47 43 28 52 15 14 30 33 50 64 62 18 68 42 6 16 24 60 41 10 21 7 29 56 3 35 5 45 39 44 34 38 51 13 8 27 20 59 66 37 69 25 61 54 46 31 9 65 12 63 4 36 17 55 22 1 23 40 57 67 32
May 20 10:40:06 banners /kernel:
May 20 10:40:06 banners /kernel:  Dump Card State Ends 

May 20 10:40:06 banners /kernel: sg[0] - Addr 0xda7 : Length 4096
May 20 10:40:06 banners /kernel: sg[1] - Addr 0xa191000 : Length 4096
May 20 10:40:06 banners /kernel: sg[2] - Addr 0xc512000 : Length 4096
May 20 10:40:06 banners /kernel: sg[3] - Addr 0xf13000 : Length 4096
May 20 10:40:06 banners /kernel: sg[4] - Addr 0x2394000 : Length 4096
May 20 10:40:06 banners /kernel: sg[5] - Addr 0xde15000 : Length 4096
May 20 10:40:06 banners /kernel: sg[6] - Addr 0x41f6000 : Length 4096
May 20 10:40:06 banners /kernel: sg[7] - Addr 0x92f7000 : Length 4096
May 20 10:40:06 banners /kernel: sg[8] - Addr 0xb0f8000 : Length 4096
May 20 10:40:06 banners /kernel: sg[9] - Addr 0xa8d9000 : Length 4096
May 20 10:40:06 banners /kernel: sg[10] - Addr 0xd07a000 : Length 4096
May 20 10:40:06 banners /kernel: sg[11] - Addr 0x463b000 : Length 4096
May 20 10:40:06 banners /kernel: sg[12] - Addr 0x55bc000 : Length 4096
May 20 10:40:06 banners /kernel: sg[13] - Addr 0x4b5d000 : Length 4096
May 20 10:40:06 banners /kernel: sg[14] - Addr 0x335e000 : Length 4096
May 20 10:40:06 banners /kernel: sg[15] - Addr 0xd05f000 : Length 4096
May 20 10:40:06 banners /kernel: (da0:ahc0:0:0:0): Queuing a BDR SCB
May 20 10:40:06 banners /kernel: (da0:ahc0:0:0:0): Bus Device Reset Message Sent
May 20 10:40:06 banners /kernel: (da0:ahc0:0:0:0): no longer in timeout, status = 34b
May 20 10:40:06 banners /kernel: ahc0: Bus Device Reset on A:0. 1 SCBs aborted
---
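Most lines in the log above start with a parenthesized prefix such as (da0:ahc0:0:0:0), which names the device the message is about. This prefix appears to follow FreeBSD CAM's convention of peripheral driver and unit, controller (SIM) driver and unit, bus, target ID and LUN; read that way, the complaint is about disk da0 on the Adaptec controller ahc0, bus 0, target 0, LUN 0. The following is a minimal Python sketch that splits such a prefix, assuming that field order. CamPath and parse_cam_path are hypothetical names, not part of any FreeBSD tool.

```python
import re
from typing import NamedTuple, Optional


class CamPath(NamedTuple):
    periph: str  # peripheral driver + unit, e.g. "da0" (SCSI disk 0)
    sim: str     # host adapter (SIM) driver + unit, e.g. "ahc0"
    bus: int
    target: int  # SCSI target ID of the device
    lun: int     # logical unit number


# Assumed layout of the prefix: "(periph:sim:bus:target:lun)".
_CAM_RE = re.compile(r"\((\w+):(\w+):(\d+):(\d+):(\d+)\)")


def parse_cam_path(line: str) -> Optional[CamPath]:
    """Return the first CAM path found in a kernel log line, or None."""
    m = _CAM_RE.search(line)
    if m is None:
        return None
    periph, sim, bus, target, lun = m.groups()
    return CamPath(periph, sim, int(bus), int(target), int(lun))


line = "May 20 10:39:59 banners /kernel: (da0:ahc0:0:0:0): SCB 0x35 - timed out"
print(parse_cam_path(line))
```

Lines such as "ahc0: Bus Device Reset on A:0. 1 SCBs aborted" carry no CAM prefix and make the function return None.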

And I have no idea what this could mean. To me it's just a list of
interesting-sounding words like BDR, SCB, etc. Could you please tell me
what I should expect from this system soon? It's a high-priority
system, running hundreds of Apache daemons serving hundreds of banners
every second.

dmesg:

---
FreeBSD 4.9-STABLE #0: Wed Dec 17 16:15:51 EET 2003
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/ADNET
Timecounter i8254  frequency 1193182 Hz
CPU: Intel(R) Pentium(R) 4 CPU 2.66GHz (2665.39-MHz 686-class CPU)
  Origin = GenuineIntel  Id = 0xf29  Stepping = 9
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
real memory  = 268238848 (261952K bytes)
avail memory = 258314240 (252260K bytes)
Preloaded elf kernel "kernel" at 0xc02b1000.
Warning: Pentium 

Re: Adaptec aic7892 Ultra160 SCSI adapter ERROR

2004-05-20 Thread Justin T. Gibbs
> Hi,
>
> I'm new to SCSI and all its subsystems and this is the first time I'm
> having trouble with it. This morning I got the following kernel error
> message:

This means that your drive decided not to return a command back to the
controller.  The controller driver was able to clear up the error by
resetting the device (BDR = Bus Device Reset).

The failure could indicate a drive firmware bug or that the drive is
failing.

--
Justin

___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]
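If Justin's second explanation (a failing drive) is the right one, the timeouts are likely to come back. One rough way to watch for that is to count timeout and Bus Device Reset messages per device in the system log. The sketch below is a hypothetical helper, not an existing tool: it assumes the messages keep the wording seen in this thread ("timed out", "Bus Device Reset Message Sent") and the (periph:sim:bus:target:lun) prefix, and other driver versions may print something different.

```python
import re
from collections import Counter
from typing import Iterable

# CAM path prefix, e.g. "(da0:ahc0:0:0:0)"; field order assumed to be
# periph:sim:bus:target:lun.
_PATH = re.compile(r"\((\w+:\w+:\d+:\d+:\d+)\)")

# Substrings copied from the log in this thread (assumption: other
# driver versions may word these messages differently).
_EVENTS = {
    "timeout": "timed out",
    "bdr": "Bus Device Reset Message Sent",
}


def count_scsi_events(lines: Iterable[str]) -> Counter:
    """Count (cam_path, event) pairs found in kernel log lines."""
    counts: Counter = Counter()
    for line in lines:
        m = _PATH.search(line)
        if m is None:
            continue  # e.g. "ahc0: Bus Device Reset on A:0 ..." has no CAM path
        for event, needle in _EVENTS.items():
            if needle in line:
                counts[(m.group(1), event)] += 1
    return counts


sample = [
    "May 20 10:39:59 banners /kernel: (da0:ahc0:0:0:0): SCB 0x35 - timed out",
    "May 20 10:40:06 banners /kernel: (da0:ahc0:0:0:0): Queuing a BDR SCB",
    "May 20 10:40:06 banners /kernel: (da0:ahc0:0:0:0): Bus Device Reset Message Sent",
    "May 20 10:40:06 banners /kernel: (da0:ahc0:0:0:0): no longer in timeout, status = 34b",
]

for (path, event), n in sorted(count_scsi_events(sample).items()):
    print(path, event, n)
```

On a live machine the same function could be fed an open log file instead of the sample list, e.g. `with open("/var/log/messages") as f: print(count_scsi_events(f))`; that path is an assumption about the local syslog configuration. A count that keeps growing for the same device would point more strongly at the drive than a single event does.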


Re: Adaptec aic7892 Ultra160 SCSI adapter ERROR

2004-05-20 Thread Mark Terribile


Aurimas Mikalauskas [EMAIL PROTECTED] writes:

> May 20 10:39:59 banners /kernel: (da0:ahc0:0:0:0): SCB 0x35 - timed out
> May 20 10:40:04 banners /kernel:  Dump Card State Begins
>
> :
> :

and Justin T. Gibbs [EMAIL PROTECTED] replies:

> This means that your drive decided not to return a command back to the
> controller.  The controller driver was able to clear up the error by
> resetting the device (BDR = Bus Device Reset).
>
> The failure could indicate a drive firmware bug or that the drive is
> failing.

I have seen similar messages that were traced to SCSI cabling problems.  At the
higher SCSI speeds, the electrical signal barely has time to settle before a
new signal is introduced (this is _GREATLY_ simplifying the problem; the
science to study is Transmission Line Theory).  The only reason it works at all
is that the `terminators' absorb the signal's energy and keep it from
reflecting back into the line, and the cabling rules (on where connectors may
be placed) prevent reflections from occurring where they can cause problems.
The whole thing is sensitive to poor connections, connectors placed too close
together, sharp bends in the cable, cables laced together face-to-face, etc.
And the problems can be intermittent, or show up only under extreme load.

If you haven't already, make sure that the connectors are properly seated,
that the correct termination is installed or enabled, and that the cable is
not folded on itself or clamped face-to-face with another flat cable.
And, of course, make sure that all the device IDs are set correctly.

   Mark Terribile
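Mark's point about terminators can be made a little more concrete with the standard reflection coefficient from transmission line theory. For a line of characteristic impedance Z_0 that ends in a load Z_L, the fraction of an incoming voltage wave that is sent back is Γ below. This is a textbook simplification (lossless line, purely resistive load), not a model of any particular SCSI bus, and the Z_L = 2Z_0 case is only an illustrative number.

```latex
\Gamma = \frac{Z_L - Z_0}{Z_L + Z_0}
\qquad
\begin{cases}
Z_L = Z_0     & \Rightarrow \Gamma = 0  \quad \text{(matched terminator: nothing reflected)} \\
Z_L \to \infty & \Rightarrow \Gamma = +1 \quad \text{(open end, no terminator: full reflection)} \\
Z_L = 2Z_0    & \Rightarrow \Gamma = \dfrac{2Z_0 - Z_0}{2Z_0 + Z_0} = \dfrac{1}{3}
\end{cases}
```

A missing or mismatched terminator therefore sends a sizeable part of each signal edge back down the cable, where it can interfere with the next one; any other impedance discontinuity along the line, such as a badly seated connector, reflects part of the signal in the same way.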
