ahc lockups in -current

2000-07-23 Thread Mike Meyer

It seems that the last changes to the ahc drivers (committed on the
18th) are causing my system to lock up. I'd check the aic7890 specific
changes first, but that's just me.

The problem is that when I start doing I/O to two drives, the system
hangs. The SCSI controller and both drives(*) turn on their "I'm busy"
LED, so I assume the SCSI bus is hung. The OS is still there, but
trying to do anything that touches the drives causes the process to
lock up. I get no core dump and no messages to the console indicating
any problems.

With this version, I *do* get the following message at boot time that
I didn't get before:

(noperiph:ahc0:0:-1:-1): SCSI bus reset delivered. 0 SCBs aborted.

The system configuration is:

Supermicro motherboard with two PII/Xeons and an aic7890 on it. The aic
has BIOS version 2.01 on it. Attached to that are:

su-2.04# camcontrol devlist
<SEAGATE ST39236LW 0004>           at scbus0 target 0 lun 0 (pass0,da0)
<SEAGATE ST39173W 5958>            at scbus0 target 1 lun 0 (pass1,da1)
<iomega jaz 1GB J.86>              at scbus0 target 3 lun 0 (pass2,da2)
<PIONEER CD-ROM DR-124X 1.06>      at scbus0 target 4 lun 0 (pass3,cd0)
<YAMAHA CRW4260 1.0q>              at scbus0 target 5 lun 0 (pass4,cd1)
<ARTEC AM12S 1.06>                 at scbus0 target 6 lun 0 (pass5)

Target 0 is the system disk: /, /var, /usr, swap and some scratch
space.

Target 1 is data: /home, more scratch space (/usr/obj lives there) and
more swap.

The SCSI bus is:

AM12S(6) -- AIC(7) -- da(1) -- da(0) -- jaz(3) -- cd(4) -- cd(5)
 -- term plug

I'm a bit leery of the external scanner, so I unplugged it, made sure
the AIC had termination set properly, and rebooted into single-user
mode. I mounted /usr read-only, mounted the scratch space on da1, and
did a cp -r of /usr to the scratch space. The system locked up in the
same state as described above.
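
A rough sketch of that reproduction, for anyone who wants to try it on
their own hardware. The device nodes and the mount point are
hypothetical (the partitions aren't named here), and by default the
script only prints the commands rather than running them:

```shell
#!/bin/sh
# Sketch of the single-user reproduction described above.
# ASSUMPTIONS: USR_DEV, SCRATCH_DEV and SCRATCH_MNT are hypothetical
# placeholders; substitute the partitions that hold /usr and the
# scratch space on your own system.
USR_DEV="${USR_DEV:-/dev/da0s1f}"
SCRATCH_DEV="${SCRATCH_DEV:-/dev/da1s1e}"
SCRATCH_MNT="${SCRATCH_MNT:-/mnt}"

# Dry run by default: each command is echoed instead of executed.
# Invoke with RUN= (empty) to actually run the commands.
RUN="${RUN-echo}"

repro() {
    # /usr on the first disk, read-only, so it is only ever read from.
    $RUN mount -r "$USR_DEV" /usr
    # Scratch space on the second disk, so it takes all the writes.
    $RUN mount "$SCRATCH_DEV" "$SCRATCH_MNT"
    # Sustained I/O to both drives at once: read da0, write da1.
    $RUN cp -r /usr "$SCRATCH_MNT/usr-copy"
}

repro
```

Setting RUN to the empty string executes the commands instead of
printing them. The point of the read-only mount is that one drive is
purely a source and the other purely a destination, which keeps both
busy at the same time.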

Trying the same test - except I left the scanner plugged in - with a
kernel built with the old version of the ahc driver worked fine. In
fact, building the world with /usr/src and /usr/obj on different disks
has been working fine for a while now.

I'm hoping to get some guidance from someone who's familiar with the
code before I start digging into it. If more information would be
useful (dmesg output? config file? other?), let me know. If there's
some specific testing to do - including, if needed, borrowing a 2940
and moving the drives to that to try things on - let me know.

Thanx,
mike

*) When rebooting after the first such crash, the system locked up
during fsck with the Jaz drive lit as active as well. Yes, the Jaz
filesystem is mounted at boot.


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: ahc lockups in -current

2000-07-23 Thread Brandon Hume

> It seems that the last changes to the ahc drivers (committed on the
> 18th) are causing my system to lock up. I'd check the aic7890 specific

I just upgraded my system to the latest -current today, from a long
hiatus... last time I did a world was July 3rd.

I can no longer boot the system.  I was beginning to sort through the boot
floppies, to figure out when the changes were made that sunk me.  Thanks for
saving me the trouble.  :)

I'm booting off the onboard AIC7895 on a Tyan Thunder/100.  I get the same
error you do initially, followed by many, many SCSI bus resets, errors about
lost devices, SCBs aborted, and the like.  After a period, the system
panics... not about being unable to mount the root fs like I expected, but
about 'page fault in kernel mode' or something similar.

(da0:ahc0:0:0:0): SCB 0x9 - timed out in Command phase, SEQADDR == 0xa0
(da0:ahc0:0:0:0): BDR message in message buffer
(da0:ahc0:0:0:0): SCB 0x9 - timed out in Command phase, SEQADDR == 0x9f
(da0:ahc0:0:0:0): no longer in timeout, status = 34b
ahc0: Issued Channel A Bus Reset. 4 SCBs aborted

etc...

Finally it ends with 'Fatal trap 12: page fault while in kernel mode'
fault virtual address = 0x3c
fault code = supervisor write, page not present

The machine locks up hard at that, needing a power cycle.  The SCSI activity
light blazes.

Sorry I can't cut'n'paste the errors to be more useful; I don't have a serial
console.  I'm copying them by hand as best I can (I'd appreciate being told
a better method... :) )

-- 
Brandon Hume- hume - BOFH.Halifax.NS.Ca, http://WWW.BOFH.Halifax.NS.Ca/




Re: ahc lockups in -current

2000-07-23 Thread Manfred Antar

At 06:07 PM 7/23/2000 -0500, Mike Meyer wrote:
> It seems that the last changes to the ahc drivers (committed on the
> 18th) are causing my system to lock up. I'd check the aic7890 specific
> changes first, but that's just me.
>
> The problem is that when I start doing I/O to two drives, the system
> hangs. The SCSI controller and both drives(*) turn on their "I'm busy"
> LED, so I assume the SCSI bus is hung. The OS is still there, but
> trying to do anything that touches the drives causes the process to
> lock up. I get no core dump and no messages to the console indicating
> any problems.
>
> With this version, I *do* get the following message at boot time that
> I didn't get before:
>
> (noperiph:ahc0:0:-1:-1): SCSI bus reset delivered. 0 SCBs aborted.
>
> The system configuration is:

I get the same hang when doing a dump to an Exabyte 8505 connected to the onboard aic7880.
My disks are hooked up to an internal DPT RAID controller, and there are no problems there.
Any time I try to access the tape drive, it panics.
Sorry I don't have the panic message; I'm running this machine headless at the
moment.
Manfred

==
||  [EMAIL PROTECTED]   ||
||  Ph. (415) 681-6235  ||
==


