Hi,
I had a FreeBSD 4.11-RC3 machine reboot unexpectedly. The last thing the
network syslogd captured was an attempted ahc0 (Adaptec 2940 UW Pro)
recovery.
Below is the syslog excerpt as captured by the remote machine, with the
date, hostname, "/kernel:" prefix and card state dumps removed (they can
be provided if necessary). I wonder whether the SCSI error recovery
attempts caused the reboot. I have no hints either way, but this machine
is otherwise stable.
13:28:35 ahc0: Recovery Initiated
13:28:53 (da0:ahc0:0:0:0): SCB 0x16 - timed out
13:28:53 sg[0] - Addr 0x6da3800 : Length 2048
13:28:53 (da0:ahc0:0:0:0): Other SCB Timeout
13:28:53 ahc0: Timedout SCBs already complete. Interrupts may not be functioning.
13:28:53 ahc0: Recovery Initiated
13:29:02 (da0:ahc0:0:0:0): SCB 0x1b - timed out
13:29:04 (da0:ahc0:0:0:0): BDR message in message buffer
13:29:04 ahc0: Timedout SCBs already complete. Interrupts may not be functioning.
13:29:04 ahc0: Recovery Initiated
13:29:16 Kernel Free SCB list: 9 4 15 20
13:29:17 sg[7] - Addr 0x3bea000 : Length 4096
13:29:18 ahc0: Issued Channel A Bus Reset. 25 SCBs aborted
When the machine came back up, it stayed in single-user mode because
fsck reported a softupdates inconsistency:
| # fsck -p /usr
| /dev/da0s1g: DIRECTORY CORRUPTED I=175105 OWNER=root MODE=40755
| /dev/da0s1g: SIZE=512 MTIME=Jan 18 15:14 2005
| /dev/da0s1g: DIR=?
|
| /dev/da0s1g: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY.
I have not yet run fsck interactively, because I want to understand what
happened first and keep the file system available for debugging.
At the time of the crash, these tasks were running:
1. amanda was running a dump(8)
2. I was installing manpages from /usr/src/share/man/man4
3. a cvsup for the ports tree was running (this is likely related to the
problem)
| # fsdb -r /dev/da0s1g
| fsdb (inum: 2) inode 175105
| current inode: directory
| I=175105 MODE=40755 SIZE=512
| MTIME=Jan 18 15:14:48 2005 [0 nsec]
| CTIME=Jan 18 15:14:48 2005 [0 nsec]
| ATIME=Jun 19 03:05:43 2003 [0 nsec]
| OWNER=root GRP=wheel LINKCNT=2 FLAGS=0 BLKCNT=4 GEN=4e5151f9
| fsdb (inum: 175105) cd ..
| component `..': fsdb: name `..' not found in current inode directory
I checked with camcontrol: the write cache is off (see below), but the
Queue Algorithm Modifier is set to 1 and cannot be switched off.
Digging through the old structures with find reveals:
| 175101 4 drwxr-xr-x  3 root wheel  512 Sep  1  2002 /usr/X11R6/lib/perl5/site_perl/5.005/i386-freebsd
| 175102 4 drwxr-xr-x  2 root wheel  512 Sep  1  2002 /usr/X11R6/lib/perl5/site_perl/5.005/i386-freebsd/auto
| 175103 4 drwxr-xr-x  5 root wheel  512 Aug 23  2002 /usr/sup
| 175104 4 drwxr-xr-x  2 root wheel  512 Jan 19 13:29 /usr/sup/src-all
| 175105 4 drwxr-xr-x  2 root wheel  512 Jan 18 15:14 /usr/sup/ports-all
| 175106 4 drwxr-xr-x  2 root wheel  512 Jan 18 15:14 /usr/sup/doc-all
| 175107 4 drwxr-xr-x 22 root wheel 1024 Sep 28 19:47 /usr/doc
| 175108 4 drwxr-xr-x  6 root wheel  512 Dec 19 13:26 /usr/doc/de_DE.ISO8859-1
| 175109 4 drwxr-xr-x  5 root wheel  512 Dec 27  2003 /usr/doc/de_DE.ISO8859-1/books
And, as expected:
| # ls -la /usr/sup/ports-all/
| #
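As a cross-check, the damaged directory can also be looked up by inode
number rather than by name. This is only a sketch: the helper name
find_by_inode is made up for illustration, and it assumes the file
system is mounted (read-only is sufficient) and that find(1) supports
-xdev, -inum and -ls.

```shell
# find_by_inode DIR INUM
# List, in "ls -dils" style, every entry below DIR whose inode number
# is INUM. -xdev keeps find from descending into other file systems,
# where the same inode number would refer to an unrelated file.
find_by_inode() {
    find "$1" -xdev -inum "$2" -ls
}

# Example (not executed here): find_by_inode /usr 175105
```

An empty result would mean that no directory entry reachable from the
given starting point refers to that inode any more.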
Why can a softupdates filesystem, under such circumstances, become so
corrupt that fsck -p cannot fix it, and end up with directories lacking
. and ..? Is this a kernel/softupdates bug? How can this directory become
empty?
locate has this information recorded:
/usr/sup/ports-all
/usr/sup/ports-all/#cvs.cvsup-2279.0
/usr/sup/ports-all/checkouts.cvs:.
So apparently three (checkouts.cvs:., . and ..) or four (perhaps also
the # file) entries have disappeared. I'm not sure whether fsck will
revive them, and I want to avoid destroying data useful for debugging.
Is the Queue Algorithm Modifier a problem (see below)? I cannot set it
to 0 on this drive (Micropolis 4345WS): camcontrol reports "error
sending mode select command" with both -P0 and -P3.
How do I go about providing the file system metadata so someone can take
a look at it? The file system is 3.5 GB in size, so anything beyond the
metadata is not feasible. Providing SSH access to the failed machine may
work, though, if you send me your OpenSSH v2-format public key.
# camcontrol inquiry da0
pass0: MICROP 4345WS x43h Fixed Direct Access SCSI-2 device
pass0: Serial Number 77HT45
pass0: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
# camcontrol modepage da0 -m8
IC: 0
ABPF: 0
CAP: 0
DISC: 0
SIZE: 0
WCE: 0
MF: 0
RCD: 0
...
# camcontrol modepage da0 -m10
RLEC: 0
Queue Algorithm Modifier: 1
QErr: 0
DQue: 0
...
--
Matthias Andree