Hi,
I think I may have found the cause of the pst timeout panics. I'm using
the Promise SX6000 RAID on -CURRENT, using the pst driver. Unfortunately,
under sufficiently high I/O load, the box starts printing:
pst: timeout mfa=0x00327b90 cmd=0x01
The 'mfa' address varies. It starts printing more and more rapidly, and
then eventually the machine wedges solid. Sometimes it makes it to:
panic: timeout table full
Here's what I think is happening. Two timeouts are being scheduled every
time a timeout triggers, because pst_timeout schedules a timeout before
calling pst_rw to retry the operation. Then pst_rw schedules ANOTHER
timeout. Both of these timeouts call pst_timeout, so the number of pending
timeouts doubles every 10 seconds, each one retrying the same I/O operation,
until the timeout table fills and the machine panics.
Check out the following diff:
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/pst/pst-raid.c.diff?r1=1.8&r2=1.9&f=h
This is where pst_rw was changed to schedule its own timeouts, but the
timeout() call in pst_timeout was not removed.
Do you think this could be the correct explanation? It seems like once
pst_timeout is called, the machine is doomed... I'm recompiling my kernel
now to test the fix under load.
--Aaron
Index: /sys/dev/pst/pst-raid.c
===================================================================
RCS file: /usr/cvs/src/sys/dev/pst/pst-raid.c,v
retrieving revision 1.11
diff -u -r1.11 pst-raid.c
--- /sys/dev/pst/pst-raid.c 24 Aug 2003 17:54:17 - 1.11
+++ /sys/dev/pst/pst-raid.c 8 Sep 2003 02:32:58 -
@@ -316,11 +316,6 @@
         mtx_unlock(&request->psc->iop->mtx);
         return;
     }
-    if (dumping)
-        request->timeout_handle.callout = NULL;
-    else
-        request->timeout_handle =
-            timeout((timeout_t*)pst_timeout, request, 10 * hz);
     if (pst_rw(request)) {
         iop_free_mfa(request->psc->iop, request->mfa);
         biofinish(request->bp, NULL, EIO);
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current