Recently, "hangs" (i.e. the proxy and the remote servers do nothing) reappeared on my laptop during test036/039. They occur in roughly 50% of the runs; this drops to one in every 10-15 runs when logging is reduced to 0, or to just stats. After attaching to slapd with gdb many times, only to see that it is completely idle in the daemon, waiting on select(), I decided to attach to one of the tester clients that are waiting for results. All of them appear to hang in poll() called with a NULL timeout, i.e. blocking indefinitely.

BTW, my system is CentOS 4.2, which is a clone of RHEL4 with kernel 2.6.9. portable.h defines HAVE_POLL but not HAVE_EPOLL. After undef'ing HAVE_POLL, and thus using select(), the hangs became much less frequent: the test completed 20 runs with full logging (which, with poll(), would have had a 50% chance of hanging on each run).
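To make the difference between an unbounded and a bounded wait concrete, here is a minimal sketch in C. It is not taken from libldap or from the test clients; `wait_readable()` and `demo_bounded_wait()` are hypothetical helpers written only for illustration. A negative timeout reproduces the "block forever" behavior seen in the hung clients, while a non-negative one lets the caller notice that nothing arrived.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper, not OpenLDAP code: wait until fd becomes readable.
 * timeout_ms < 0  blocks indefinitely (like the hung clients);
 * timeout_ms >= 0 bounds the wait in milliseconds.
 * Returns 1 if readable (or hung up), 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(fd >= 0);            /* precondition: a valid descriptor */

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;              /* includes EINTR; the caller may retry */
    if (rc == 0)
        return 0;               /* timed out, nothing to read */
    return (pfd.revents & (POLLIN | POLLHUP)) ? 1 : -1;
}

/* Hypothetical self-check on a pipe.
 * Returns 0 if an empty pipe times out and a written pipe is readable. */
static int demo_bounded_wait(void)
{
    int fds[2];
    int before, after;

    if (pipe(fds) != 0)
        return -1;

    before = wait_readable(fds[0], 50);     /* nothing written yet */
    if (write(fds[1], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    after = wait_readable(fds[0], 50);      /* one byte is pending */

    close(fds[0]);
    close(fds[1]);
    return (before == 0 && after == 1) ? 0 : 1;
}
```

With a bounded wait, a client that never receives its result would at least return to its caller and could log the event, which might help tell a lost wakeup apart from a response that was never sent.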
I note that hangs between the proxy and the remote servers are no longer possible, because synchronous operations are no longer in use, and the admin can set timeouts that would resolve the issue at some point. Moreover, in all of the cases I saw, all threads of the proxy were idle, not waiting or looping on select/poll (that would be ITS#4246, which I have not seen since). I'm really at a loss to identify the reason for this behavior, given that the logs do not appear to indicate any faulty behavior before the hang; yet many clients (~15 simultaneously) appear to be waiting for something that doesn't happen.

p.

Ing. Pierangelo Masarati
Responsabile Open Solution
OpenLDAP Core Team

SysNet s.n.c.
Via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
------------------------------------------
Office: +39.02.23998309
Mobile: +39.333.4963172
Email: [EMAIL PROTECTED]
------------------------------------------
