Re: One more attempt: stuck processes

2007-11-20 Thread Ken Murchison
Gary Mills wrote: On Mon, Nov 19, 2007 at 12:35:46PM -0500, Ken Murchison wrote: Sebastian Hagedorn wrote: -- Ken Murchison [EMAIL PROTECTED] is rumored to have mumbled on 17. November 2007 11:21:38 -0500 regarding Re: One more attempt: stuck processes: Here's a patch that seems to fix

Re: One more attempt: stuck processes

2007-11-20 Thread Sebastian Hagedorn
--On 20. November 2007 09:20:42 -0500 Ken Murchison [EMAIL PROTECTED] wrote: OK. Can you both try this alternate patch? It should be portable, and GDB shouldn't cause it to kick out. I've set it up so that for SSL-wrapped services it will timeout after 3 minutes, otherwise it uses the

Re: One more attempt: stuck processes

2007-11-20 Thread Sebastian Hagedorn
--On 20. November 2007 15:59:18 +0100 Sebastian Hagedorn [EMAIL PROTECTED] wrote: I can fix this myself, but it's probably easier if you do it. Just FYI: I fixed it locally with a 3 minute timeout and it compiled fine. I'll start testing it now. -- .:.Sebastian Hagedorn - RZKR-R1

Re: One more attempt: stuck processes

2007-11-19 Thread Sebastian Hagedorn
-- Ken Murchison [EMAIL PROTECTED] is rumored to have mumbled on 19. November 2007 12:35:46 -0500 regarding Re: One more attempt: stuck processes: How are things looking today? Good! When I just checked I thought I'd found a new hanging pop3d process, because it's been around for 6 hours

Re: One more attempt: stuck processes

2007-11-19 Thread Sebastian Hagedorn
-- Ken Murchison [EMAIL PROTECTED] is rumored to have mumbled on 19. November 2007 13:17:07 -0500 regarding Re: One more attempt: stuck processes: The only other potential downside the patch has is that stracing or gdb'ing it causes the timeout to trigger prematurely. AFAIK that's a common

Re: One more attempt: stuck processes

2007-11-19 Thread Ken Murchison
Sebastian Hagedorn wrote: -- Ken Murchison [EMAIL PROTECTED] is rumored to have mumbled on 19. November 2007 13:17:07 -0500 regarding Re: One more attempt: stuck processes: The only other potential downside the patch has is that stracing or gdb'ing it causes the timeout to trigger

Re: One more attempt: stuck processes

2007-11-19 Thread Ken Murchison
Sebastian Hagedorn wrote: -- Ken Murchison [EMAIL PROTECTED] is rumored to have mumbled on 19. November 2007 12:35:46 -0500 regarding Re: One more attempt: stuck processes: How are things looking today? Good! When I just checked I thought I'd found a new hanging pop3d process, because

Re: One more attempt: stuck processes

2007-11-19 Thread Gary Mills
On Mon, Nov 19, 2007 at 12:35:46PM -0500, Ken Murchison wrote: Sebastian Hagedorn wrote: -- Ken Murchison [EMAIL PROTECTED] is rumored to have mumbled on 17. November 2007 11:21:38 -0500 regarding Re: One more attempt: stuck processes: Here's a patch that seems to fix the problem. I did

Re: One more attempt: stuck processes

2007-11-17 Thread Gabor Gombas
On Fri, Nov 16, 2007 at 06:37:52PM +0100, Sebastian Hagedorn wrote: OK. Still the symptom seems to be different from what I'm seeing. It may be. As I said I had no time so far to investigate it in depth, I just wanted to say "me too" for the hung process problem. Could it be that you have a

Re: One more attempt: stuck processes

2007-11-17 Thread Ken Murchison
Sebastian Hagedorn wrote: -- Ken Murchison [EMAIL PROTECTED] is rumored to have mumbled on 16. November 2007 15:54:50 -0500 regarding Re: One more attempt: stuck processes: That's exactly what Gary is seeing. Right. Apparently stripped binaries aren't any good for straces. Its

Re: One more attempt: stuck processes

2007-11-16 Thread Sebastian Hagedorn
--On 15. November 2007 19:25:19 +0100 Simon Matter [EMAIL PROTECTED] wrote: It's blinking red, which normally means a broken link. I'm not sure how The file 0 is a symbolic link which doesn't really point to a file, that's why the shell shows it blinking. Everything okay here. Thanks.

Re: One more attempt: stuck processes

2007-11-16 Thread Sebastian Hagedorn
--On 16. November 2007 16:52:27 +0100 Gabor Gombas [EMAIL PROTECTED] wrote: On Fri, Nov 16, 2007 at 12:36:49PM +0100, Sebastian Hagedorn wrote: He suggested that the trace is unreliable. Perhaps a bug in RHEL 3's version of OpenSSL messes up the stack. That would also explain why nobody

Re: One more attempt: stuck processes

2007-11-16 Thread Gary Mills
On Fri, Nov 16, 2007 at 03:20:57PM +0100, Sebastian Hagedorn wrote: --On 16. November 2007 08:00:07 -0600 Gary Mills [EMAIL PROTECTED] wrote: This timeout doesn't work in some cases. We have lots of POP sessions that never terminate. That's interesting to hear! Especially since you are

Re: One more attempt: stuck processes

2007-11-16 Thread Ken Murchison
Sebastian Hagedorn wrote: I think I will try one more approach: I reverted cyrus.conf to not use -U 1 anymore, so that processes should be reused. I will strace one of the pop3d processes in the hope that it gets stuck. That way I should be able to see where things go wrong. If the process

Re: One more attempt: stuck processes

2007-11-16 Thread Gabor Gombas
On Fri, Nov 16, 2007 at 05:20:00PM +0100, Sebastian Hagedorn wrote: That's a 2.6 kernel, right? Yes, 2.6.18-2-amd64. Hm, we don't suffer any actual slowdown, it's just that the number of processes increases over time. It's not a slowdown - the client connects, and hangs. It never even gets

Re: One more attempt: stuck processes

2007-11-16 Thread Ken Murchison
Sebastian Hagedorn wrote: --On 16. November 2007 09:37:42 -0600 Gary Mills [EMAIL PROTECTED] wrote: Could you get a stack trace? If you have gdb you just call it with gdb -p 19175. Then you can do bt at the prompt. I forget how to do it with Sun's debugger. Easy: # pstack 19175

Re: One more attempt: stuck processes

2007-11-16 Thread Sebastian Hagedorn
--On 16. November 2007 18:07:51 +0100 Gabor Gombas [EMAIL PROTECTED] wrote: Hm, we don't suffer any actual slowdown, it's just that the number of processes increases over time. It's not a slowdown - the client connects, and hangs. It never even gets to the authentication phase (at least it's

Re: One more attempt: stuck processes

2007-11-16 Thread Gary Mills
On Fri, Nov 16, 2007 at 01:54:24PM +0100, Alain Spineux wrote: On Nov 16, 2007 12:36 PM, Sebastian Hagedorn [EMAIL PROTECTED] wrote: --On 16. November 2007 11:27:09 +0100 Sebastian Hagedorn [EMAIL PROTECTED] wrote: 1. In the absence of the SO_KEEPALIVE option it is entirely possible that

Re: One more attempt: stuck processes

2007-11-16 Thread Sebastian Hagedorn
--On 16. November 2007 11:27:09 +0100 Sebastian Hagedorn [EMAIL PROTECTED] wrote: 1) Since it only happens on dialup connections, could it be that the dialin router at the provider's end sends TCP/RST when a client hangs up and those packets are filtered somewhere, maybe on your firewall? OK,

Re: One more attempt: stuck processes

2007-11-16 Thread Ken Murchison
Sebastian Hagedorn wrote: The only reason I could imagine for the sequence of calls was signal handling. But let's be methodical. There's only one spot where SSL_accept() is called: in tls_start_servertls(). In pop3d.c that's only called in cmd_starttls(). That in turn is called either in

Re: One more attempt: stuck processes

2007-11-16 Thread Sebastian Hagedorn
--On 16. November 2007 13:54:24 +0100 Alain Spineux [EMAIL PROTECTED] wrote: On Nov 16, 2007 12:36 PM, Sebastian Hagedorn [EMAIL PROTECTED] wrote: I just had a discussion with a colleague regarding this. He made two observations: 1. In the absence of the SO_KEEPALIVE option it is entirely
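
The SO_KEEPALIVE observation above can be illustrated with a minimal sketch. This is not code from the thread or from Cyrus; the helper name `enable_keepalive` is hypothetical. It only shows how the option is switched on for a TCP socket so that the kernel eventually probes an idle connection whose peer has silently gone away.

```python
import socket


def enable_keepalive(sock: socket.socket) -> None:
    """Turn on SO_KEEPALIVE so the kernel probes an idle TCP connection.

    Hypothetical helper for illustration only. Probe timing is left at the
    operating system defaults, which are typically long.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    # A non-zero value means the option is set on this socket.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

Without the option, a server blocked in a read has no way to notice a peer that disappeared without sending FIN or RST, which is the situation described in the snippet.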

Re: One more attempt: stuck processes

2007-11-16 Thread Sebastian Hagedorn
--On 16. November 2007 08:00:07 -0600 Gary Mills [EMAIL PROTECTED] wrote: This timeout doesn't work in some cases. We have lots of POP sessions that never terminate. That's interesting to hear! Especially since you are using Solaris. About 30 out of 40 are in that state now. Here's an

Re: One more attempt: stuck processes

2007-11-16 Thread Sebastian Hagedorn
--On 16. November 2007 09:37:42 -0600 Gary Mills [EMAIL PROTECTED] wrote: Could you get a stack trace? If you have gdb you just call it with gdb -p 19175. Then you can do bt at the prompt. I forget how to do it with Sun's debugger. Easy: # pstack 19175 19175: pop3d -s fef9f810

Re: One more attempt: stuck processes

2007-11-16 Thread Gabor Gombas
On Fri, Nov 16, 2007 at 12:36:49PM +0100, Sebastian Hagedorn wrote: He suggested that the trace is unreliable. Perhaps a bug in RHEL 3's version of OpenSSL messes up the stack. That would also explain why nobody else seems to have this problem. FYI I also know a system that has problems

Re: One more attempt: stuck processes

2007-11-16 Thread Gabor Gombas
On Fri, Nov 16, 2007 at 06:11:01PM +0100, Sebastian Hagedorn wrote: Well, that just sounds like you're running out of entropy. That's a different issue. Recompile your cyrus-sasl to use /dev/urandom instead of /dev/random or disable apop in /etc/imapd.conf: Debian uses /dev/urandom for a

Re: One more attempt: stuck processes

2007-11-16 Thread Ken Murchison
Sebastian Hagedorn wrote: --On 16. November 2007 12:39:28 -0500 Ken Murchison [EMAIL PROTECTED] wrote: Sorry, my patch wasn't complete. It wasn't logging the value that I wanted. OK: Nov 16 18:48:17 lvr13 pop3s[1385]: SSL_read() returned 0:5 Nov 16 18:48:33 lvr13 pop3s[1375]:

Re: One more attempt: stuck processes

2007-11-16 Thread Alain Spineux
On Nov 16, 2007 6:11 PM, Sebastian Hagedorn [EMAIL PROTECTED] wrote: --On 16. November 2007 18:07:51 +0100 Gabor Gombas [EMAIL PROTECTED] wrote: Hm, we don't suffer any actual slowdown, it's just that the number of processes increases over time. It's not a slowdown - the client

Re: One more attempt: stuck processes

2007-11-16 Thread Gary Mills
On Fri, Nov 16, 2007 at 05:13:13PM +0100, Sebastian Hagedorn wrote: --On 16. November 2007 14:23:17 +0100 Simon Matter [EMAIL PROTECTED] wrote: Did you ever see non SSL connections get stuck? No. Most of mine are `pop3d -s', but I have seen a few without the `-s'. When I did a stack

Re: One more attempt: stuck processes

2007-11-16 Thread Sebastian Hagedorn
OK, now I got this: Nov 16 18:37:06 lvr13 pop3s[23089]: SSL_read() returned -1 But that process terminated normally. -- .:.Sebastian Hagedorn - RZKR-R1 (Gebäude 52), Zimmer 18.:. Zentrum für angewandte Informatik - Universitätsweiter Service RRZK .:.Universität zu Köln / Cologne University

Re: One more attempt: stuck processes

2007-11-16 Thread Alain Spineux
On Nov 16, 2007 12:36 PM, Sebastian Hagedorn [EMAIL PROTECTED] wrote: --On 16. November 2007 11:27:09 +0100 Sebastian Hagedorn [EMAIL PROTECTED] wrote: 1) Since it only happens on dialup connections, could it be that the dialin router at the provider's end sends TCP/RST when a client hangs

Re: One more attempt: stuck processes

2007-11-16 Thread Sebastian Hagedorn
--On 16. November 2007 14:23:17 +0100 Simon Matter [EMAIL PROTECTED] wrote: Did you ever see non SSL connections get stuck? No. -- .:.Sebastian Hagedorn - RZKR-R1 (Gebäude 52), Zimmer 18.:. Zentrum für angewandte Informatik - Universitätsweiter Service RRZK .:.Universität zu Köln /

Re: One more attempt: stuck processes

2007-11-16 Thread Sebastian Hagedorn
--On 16. November 2007 11:27:52 -0500 Ken Murchison [EMAIL PROTECTED] wrote: Sebastian Hagedorn wrote: The only reason I could imagine for the sequence of calls was signal handling. But let's be methodical. There's only one spot where SSL_accept() is called: in tls_start_servertls(). In

Re: One more attempt: stuck processes

2007-11-16 Thread Ken Murchison
Sebastian Hagedorn wrote: Nov 16 18:00:26 lvr13 pop3s[3847]: SSL_read() returned 0 Nov 16 18:00:34 lvr13 pop3s[3215]: SSL_read() returned 0 Nov 16 18:00:34 lvr13 pop3s[3199]: SSL_read() returned 0 Nov 16 18:00:39 lvr13 pop3s[3199]: SSL_read() returned 0 Nov 16 18:00:43 lvr13 pop3s[3229]:

Re: One more attempt: stuck processes

2007-11-16 Thread Sebastian Hagedorn
--On 16. November 2007 12:39:28 -0500 Ken Murchison [EMAIL PROTECTED] wrote: Sorry, my patch wasn't complete. It wasn't logging the value that I wanted. OK: Nov 16 18:48:17 lvr13 pop3s[1385]: SSL_read() returned 0:5 Nov 16 18:48:33 lvr13 pop3s[1375]: SSL_read() returned 0:5 Nov 16 18:48:50
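
For readers decoding log lines like the ones above, here is a small sketch. It assumes (the snippet does not say so) that the logged pair is the SSL_read() return value followed by the SSL_get_error() code. The numeric codes are the ones defined in OpenSSL's ssl.h; the function name and the regular expression are hypothetical.

```python
import re

# SSL_get_error() result codes as defined in OpenSSL's <openssl/ssl.h>.
SSL_ERROR_NAMES = {
    0: "SSL_ERROR_NONE",
    1: "SSL_ERROR_SSL",
    2: "SSL_ERROR_WANT_READ",
    3: "SSL_ERROR_WANT_WRITE",
    4: "SSL_ERROR_WANT_X509_LOOKUP",
    5: "SSL_ERROR_SYSCALL",
    6: "SSL_ERROR_ZERO_RETURN",
}

# Assumed log format: "SSL_read() returned <ret>:<err>".
LOG_PATTERN = re.compile(r"SSL_read\(\) returned (-?\d+):(\d+)")


def decode_ssl_read_log(line):
    """Return (ret, error_name) for a matching log line, else None."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    ret, err = int(match.group(1)), int(match.group(2))
    return ret, SSL_ERROR_NAMES.get(err, "unknown")


if __name__ == "__main__":
    print(decode_ssl_read_log(
        "Nov 16 18:48:17 lvr13 pop3s[1385]: SSL_read() returned 0:5"))
```

Under that assumption, "0:5" would read as a return value of 0 with SSL_ERROR_SYSCALL, i.e. an I/O-level failure rather than a clean TLS shutdown.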

Re: One more attempt: stuck processes

2007-11-16 Thread Alain Spineux
Hi, can I summarize the problem as follows: The server is blocked in a read, waiting for the client's next command (this is normal, 99% of the processes are in this state). But the autologout procedure is not working! This means the SIGALRM that should wake the process never comes or is not handled

Re: One more attempt: stuck processes

2007-11-16 Thread Sebastian Hagedorn
--On 16. November 2007 18:21:21 +0100 Gabor Gombas [EMAIL PROTECTED] wrote: On Fri, Nov 16, 2007 at 06:11:01PM +0100, Sebastian Hagedorn wrote: Well, that just sounds like you're running out of entropy. That's a different issue. Recompile your cyrus-sasl to use /dev/urandom instead of

Re: One more attempt: stuck processes

2007-11-16 Thread Sebastian Hagedorn
-- Ken Murchison [EMAIL PROTECTED] is rumored to have mumbled on 16. November 2007 12:58:49 -0500 regarding Re: One more attempt: stuck processes: So should I add a call to ERR_get_error()? Not yet. I'm assuming that none of these processes has hung. We're getting an I/O error most

Re: One more attempt: stuck processes

2007-11-16 Thread Michael M. Rach
I know it has been asked before and may be redundant, but... You answered that cyrus-sasl is using /dev/urandom and should not run out of entropy. However, what about openssl itself? It also uses random numbers. Perhaps, as a test renaming /dev/random and ln -s /dev/urandom /dev/random.

Re: One more attempt: stuck processes

2007-11-16 Thread Michael Bacon
--On Friday, November 16, 2007 3:54 PM -0500 Ken Murchison [EMAIL PROTECTED] wrote: I've reproduced the former by telneting to port 995 and doing nothing. I have been unable to reproduce the latter because as soon as I QUIT the telnet session or kill() the telnet process, pop3d exits

Re: One more attempt: stuck processes

2007-11-16 Thread Gary Mills
On Fri, Nov 16, 2007 at 03:54:50PM -0500, Ken Murchison wrote: That's exactly what Gary is seeing. It's blocking in SSL_accept(). Apparently the client connects to port 995, and then either sends nothing, or goes away and leaves the socket open. I've reproduced the former by telneting to
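
The "client connects and then sends nothing" case can be sketched as follows. This is not the patch discussed in the thread. A plain socket read stands in for the blocking SSL_accept() call, a local socket pair stands in for the client connection, and `read_with_deadline` is a hypothetical name. The point is only that a per-socket deadline turns an indefinite block into a detectable timeout.

```python
import socket


def read_with_deadline(conn, seconds):
    """Return the bytes read, or None if the peer stayed silent too long.

    Hypothetical helper: a plain recv() stands in for the TLS handshake.
    """
    conn.settimeout(seconds)
    try:
        return conn.recv(4096)
    except socket.timeout:
        return None


if __name__ == "__main__":
    server_side, client_side = socket.socketpair()
    # The "client" is connected but silent, so the read hits the deadline.
    print(read_with_deadline(server_side, 0.2))
    # Once the client sends something, the same call returns the data.
    client_side.sendall(b"QUIT\r\n")
    print(read_with_deadline(server_side, 0.2))
    server_side.close()
    client_side.close()
```

A server process that gets None back can log the event and exit instead of lingering, which is the behaviour the thread is trying to reach for stuck pop3d processes.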

Re: One more attempt: stuck processes

2007-11-16 Thread Alain Spineux
On Nov 16, 2007 6:24 PM, Alain Spineux [EMAIL PROTECTED] wrote: Hi, can I summarize the problem as follows: I'm wrong. The server is blocked in a read, waiting for the client's next command (this is normal, 99% of the processes are in this state). No, it is waiting in select, and the select has a

Re: One more attempt: stuck processes

2007-11-15 Thread Ken Murchison
Sebastian Hagedorn wrote: Thanks. I will try this patch as soon as I can, but it's clearly not the only issue, because the same thing happens with POP processes. Here's an example for one: (gdb) bt #0 0x0096441e in __read_nocancel () from /lib/tls/libc.so.6 #1 0x00ac02f7 in

Re: One more attempt: stuck processes

2007-11-15 Thread Sebastian Hagedorn
--On 15. November 2007 06:55:44 -0500 Ken Murchison [EMAIL PROTECTED] wrote: OK. What version of OpenSSL? cyradm says: Built w/OpenSSL 0.9.7a Feb 19 2003 Running w/OpenSSL 0.9.7a Feb 19 2003 rpm says: openssl-0.9.7a-33.23 This is RHEL 3. Are they imaps/pop3s

Re: One more attempt: stuck processes

2007-11-15 Thread Sebastian Hagedorn
--On 14. November 2007 16:39:44 -0500 Ken Murchison [EMAIL PROTECTED] wrote: It looks to me like we are timing out the client while the client is IDLEing, but we get a signal from idled in the middle of shutdown(). Try this patch. --- imapd.c.~1.535.~2007-11-14 16:16:21.0 -0500

Re: One more attempt: stuck processes

2007-11-15 Thread Ken Murchison
Sebastian Hagedorn wrote: No. Since this potentially affects all IMAP and POP processes I would have to do it for all entries. Do you recommend that I try that? Since it looks like things are hanging when a process is being used, I'd like to see if the problem goes away if we don't reuse the

Re: One more attempt: stuck processes

2007-11-15 Thread Sebastian Hagedorn
--On 15. November 2007 08:21:48 -0500 Ken Murchison [EMAIL PROTECTED] wrote: No. Since this potentially affects all IMAP and POP processes I would have to do it for all entries. Do you recommend that I try that? Since it looks like things are hanging when a process is being used, I'd like to

Re: One more attempt: stuck processes

2007-11-15 Thread Sebastian Hagedorn
--On 15. November 2007 08:32:18 -0500 Ken Murchison [EMAIL PROTECTED] wrote: Since it looks like things are hanging when a process is being used, I'd like to see if the problem goes away if we don't reuse the processes. I'm just trying to do a bsearch() on the problem. OK. I've made the

Re: One more attempt: stuck processes

2007-11-15 Thread Ken Murchison
Sebastian Hagedorn wrote: --On 15. November 2007 08:21:48 -0500 Ken Murchison [EMAIL PROTECTED] wrote: No. Since this potentially affects all IMAP and POP processes I would have to do it for all entries. Do you recommend that I try that? Since it looks like things are hanging when a

Re: One more attempt: stuck processes

2007-11-15 Thread Ken Murchison
Sebastian Hagedorn wrote: --On 15. November 2007 08:32:18 -0500 Ken Murchison [EMAIL PROTECTED] wrote: Since it looks like things are hanging when a process is being used, I'd like to see if the problem goes away if we don't reuse the processes. I'm just trying to do a bsearch() on the

Re: One more attempt: stuck processes

2007-11-15 Thread Sebastian Hagedorn
--On 15. November 2007 11:00:39 -0500 Ken Murchison [EMAIL PROTECTED] wrote: (gdb) bt # 0 0x0079f41e in __read_nocancel () from /lib/tls/libc.so.6 # 1 0x00d0b2f7 in BIO_new_socket () from /lib/libcrypto.so.4 # 2 0x00d092b2 in BIO_read () from /lib/libcrypto.so.4 # 3 0x005dae13 in

Re: One more attempt: stuck processes

2007-11-15 Thread Alain Spineux
On Nov 15, 2007 4:54 PM, Sebastian Hagedorn [EMAIL PROTECTED] wrote: --On 15. November 2007 08:32:18 -0500 Ken Murchison [EMAIL PROTECTED] wrote: Since it looks like things are hanging when a process is being used, I'd like to see if the problem goes away if we don't reuse the processes.

Re: One more attempt: stuck processes

2007-11-15 Thread Sebastian Hagedorn
--On 15. November 2007 18:14:05 +0100 Alain Spineux [EMAIL PROTECTED] wrote: # strace -p 25038 Process 25038 attached - interrupt to quit read(0, unfinished ... Do you know what 0 is? If it were a socket it should time out, shouldn't it? It should, I guess, but it doesn't. # ls -l

Re: One more attempt: stuck processes

2007-11-15 Thread Simon Matter
--On 15. November 2007 18:14:05 +0100 Alain Spineux [EMAIL PROTECTED] wrote: # strace -p 25038 Process 25038 attached - interrupt to quit read(0, unfinished ... Do you know what 0 is? If it were a socket it should time out, shouldn't it? It should, I guess, but it doesn't. # ls -l

One more attempt: stuck processes

2007-11-14 Thread Sebastian Hagedorn
Hi, I've brought up this topic before. We've been running cyrus-imapd very happily for several years. Yet there's one issue that none of the updates have resolved. The last time I reported it we were running 2.2.12. Now we're running 2.3.8, but the issue is the same: POP and IMAP processes

Re: One more attempt: stuck processes

2007-11-14 Thread Gary Mills
On Wed, Nov 14, 2007 at 04:15:13PM +0100, Sebastian Hagedorn wrote: I've brought up this topic before. We've been running cyrus-imapd very happily for several years. Yet there's one issue that none of the updates have resolved. The last time I reported it we were running 2.2.12. Now we're

Re: One more attempt: stuck processes

2007-11-14 Thread Sebastian Hagedorn
--On 14. November 2007 09:30:45 -0600 Gary Mills [EMAIL PROTECTED] wrote: On Wed, Nov 14, 2007 at 04:15:13PM +0100, Sebastian Hagedorn wrote: I've brought up this topic before. We've been running cyrus-imapd very happily for several years. Yet there's one issue that none of the updates have

Re: One more attempt: stuck processes

2007-11-14 Thread Ken Murchison
Sebastian Hagedorn wrote: Hi, I've brought up this topic before. We've been running cyrus-imapd very happily for several years. Yet there's one issue that none of the updates have resolved. The last time I reported it we were running 2.2.12. Now we're running 2.3.8, but the issue is the