pst driver: timeout explosion? (patch is attached)

2003-09-07 Thread Aaron Smith
Hi,

I think I may have found the cause of the pst timeout panics.  I'm using
a Promise SX6000 RAID controller on -CURRENT with the pst driver.
Unfortunately, under sufficiently high I/O load, the box starts printing:

  pst: timeout mfa=0x00327b90 cmd=0x01

The 'mfa' address varies. The message appears more and more rapidly,
and eventually the machine wedges solid. Sometimes it makes it to:

  panic: timeout table full

Here's what I think is happening. Every time a timeout fires, two new
timeouts get scheduled: pst_timeout schedules one before calling pst_rw
to retry the operation, and then pst_rw schedules ANOTHER one. Both of
these call pst_timeout when they fire, so the number of pending timeouts
doubles every 10 seconds, all of them retrying the same I/O operation,
until the timeout table fills and the machine panics.

Check out the following diff:

  http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/pst/pst-raid.c.diff?r1=1.8&r2=1.9&f=h

This is where pst_rw was changed to schedule its own timeouts, but
pst_timeout's own call to timeout() was never removed.

Do you think this could be the correct explanation? It seems like once
pst_timeout is called, the machine is doomed... I'm recompiling my kernel
now to test the fix under load.

--Aaron
Index: /sys/dev/pst/pst-raid.c
===================================================================
RCS file: /usr/cvs/src/sys/dev/pst/pst-raid.c,v
retrieving revision 1.11
diff -u -r1.11 pst-raid.c
--- /sys/dev/pst/pst-raid.c 24 Aug 2003 17:54:17 -0000  1.11
+++ /sys/dev/pst/pst-raid.c 8 Sep 2003 02:32:58 -0000
@@ -316,11 +316,6 @@
         mtx_unlock(&request->psc->iop->mtx);
         return;
     }
-    if (dumping)
-        request->timeout_handle.callout = NULL;
-    else
-        request->timeout_handle =
-            timeout((timeout_t*)pst_timeout, request, 10 * hz);
     if (pst_rw(request)) {
         iop_free_mfa(request->psc->iop, request->mfa);
         biofinish(request->bp, NULL, EIO);
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]


can't log in! openpam_load_module failures on strcpy, cgetclose

2003-09-06 Thread Aaron Smith
Hi everyone. login and sshd are both failing for me after a make world on
-CURRENT. I have run mergemaster and rebuilt world, ad infinitum. I've
compared /etc/pam.d with the version in the source tree and they're identical.

I instrumented openpam_dynamic, and login is failing on 'strcpy':

  login: in openpam_dynamic(): pam_nologin.so: /usr/lib/pam_nologin.so: Undefined symbol strcpy
  login: in openpam_load_module(): no pam_nologin.so found
  login: pam_start(): system error

For sshd, the failure is on 'cgetclose':

  sshd: in openpam_dynamic(): pam_nologin.so: /usr/lib/pam_nologin.so: Undefined symbol cgetclose
  sshd: in openpam_load_module(): no pam_nologin.so found
  sshd: fatal: PAM: initialisation failed

Now, according to nm, the respective binaries do define both of these as
T symbols (at least before they are stripped). In case stripping was the
problem, I rebuilt them unstripped and tried again. No luck.

Can anyone help me out?

Thanks,
--Aaron