Gary Mills wrote:
On Mon, Nov 19, 2007 at 12:35:46PM -0500, Ken Murchison wrote:
Sebastian Hagedorn wrote:
-- Ken Murchison [EMAIL PROTECTED] is rumored to have mumbled on
17. November 2007 11:21:38 -0500 regarding Re: One more attempt: stuck
processes:
Here's a patch that seems to fix
--On 20. November 2007 09:20:42 -0500 Ken Murchison [EMAIL PROTECTED]
wrote:
OK. Can you both try this alternate patch? It should be portable, and
GDB shouldn't cause it to kick out. I've set it up so that for
SSL-wrapped services it will time out after 3 minutes, otherwise it uses
the
--On 20. November 2007 15:59:18 +0100 Sebastian Hagedorn
[EMAIL PROTECTED] wrote:
I can fix this myself, but it's probably easier if you do it.
Just FYI: I fixed it locally with a 3 minute timeout and it compiled fine.
I'll start testing it now.
--
.:.Sebastian Hagedorn - RZKR-R1
-- Ken Murchison [EMAIL PROTECTED] is rumored to have mumbled on 19.
November 2007 12:35:46 -0500 regarding Re: One more attempt: stuck
processes:
How are things looking today?
Good! When I just checked I thought I'd found a new hanging pop3d process,
because it's been around for 6 hours
-- Ken Murchison [EMAIL PROTECTED] is rumored to have mumbled on 19.
November 2007 13:17:07 -0500 regarding Re: One more attempt: stuck
processes:
The only other potential downside
the patch has is that stracing or gdb'ing it causes the timeout to
trigger prematurely. AFAIK that's a common
Sebastian Hagedorn wrote:
-- Ken Murchison [EMAIL PROTECTED] is rumored to have mumbled on
19. November 2007 13:17:07 -0500 regarding Re: One more attempt: stuck
processes:
The only other potential downside
the patch has is that stracing or gdb'ing it causes the timeout to
trigger
Sebastian Hagedorn wrote:
-- Ken Murchison [EMAIL PROTECTED] is rumored to have mumbled on
19. November 2007 12:35:46 -0500 regarding Re: One more attempt: stuck
processes:
How are things looking today?
Good! When I just checked I thought I'd found a new hanging pop3d
process, because
On Mon, Nov 19, 2007 at 12:35:46PM -0500, Ken Murchison wrote:
Sebastian Hagedorn wrote:
-- Ken Murchison [EMAIL PROTECTED] is rumored to have mumbled on
17. November 2007 11:21:38 -0500 regarding Re: One more attempt: stuck
processes:
Here's a patch that seems to fix the problem. I did
On Fri, Nov 16, 2007 at 06:37:52PM +0100, Sebastian Hagedorn wrote:
OK. Still the symptom seems to be different from what I'm seeing.
It may be. As I said I had no time so far to investigate it in depth, I
just wanted to say me too for the hung process problem.
Could it be that you have a
Sebastian Hagedorn wrote:
-- Ken Murchison [EMAIL PROTECTED] is rumored to have mumbled on
16. November 2007 15:54:50 -0500 regarding Re: One more attempt: stuck
processes:
That's exactly what Gary is seeing.
Right. Apparently stripped binaries aren't any good for straces.
It's
--On 15. November 2007 19:25:19 +0100 Simon Matter [EMAIL PROTECTED]
wrote:
It's blinking red, which normally means a broken link. I'm not sure how
The file 0 is a symbolic link which doesn't really point to a file,
that's why the shell shows it blinking. Everything okay here.
Thanks.
--On 16. November 2007 16:52:27 +0100 Gabor Gombas [EMAIL PROTECTED]
wrote:
On Fri, Nov 16, 2007 at 12:36:49PM +0100, Sebastian Hagedorn wrote:
He suggested that the trace is unreliable. Perhaps a bug in RHEL 3's
version of OpenSSL messes up the stack. That would also explain why
nobody
On Fri, Nov 16, 2007 at 03:20:57PM +0100, Sebastian Hagedorn wrote:
--On 16. November 2007 08:00:07 -0600 Gary Mills [EMAIL PROTECTED]
wrote:
This timeout doesn't work in some cases. We have lots of POP sessions
that never terminate.
That's interesting to hear! Especially since you are
Sebastian Hagedorn wrote:
I think I will try one more approach: I reverted cyrus.conf to not use
-U 1 anymore, so that processes should be reused. I will strace one of
the pop3d processes in the hope that it gets stuck. That way I should be
able to see where things go wrong. If the process
On Fri, Nov 16, 2007 at 05:20:00PM +0100, Sebastian Hagedorn wrote:
That's a 2.6 kernel, right?
Yes, 2.6.18-2-amd64.
Hm, we don't suffer any actual slowdown, it's just that the number of
processes increases over time.
It's not a slowdown - the client connects, and hangs. It never even gets
Sebastian Hagedorn wrote:
--On 16. November 2007 09:37:42 -0600 Gary Mills [EMAIL PROTECTED]
wrote:
Could you get a stack trace? If you have gdb you just call it with gdb
-p 19175. Then you can do bt at the prompt. I forget how to do it
with Sun's debugger.
Easy:
# pstack 19175
--On 16. November 2007 18:07:51 +0100 Gabor Gombas [EMAIL PROTECTED]
wrote:
Hm, we don't suffer any actual slowdown, it's just that the number of
processes increases over time.
It's not a slowdown - the client connects, and hangs. It never even gets
to the authentication phase (at least it's
On Fri, Nov 16, 2007 at 01:54:24PM +0100, Alain Spineux wrote:
On Nov 16, 2007 12:36 PM, Sebastian Hagedorn [EMAIL PROTECTED] wrote:
--On 16. November 2007 11:27:09 +0100 Sebastian Hagedorn
[EMAIL PROTECTED] wrote:
1. In the absence of the SO_KEEPALIVE option it is entirely possible that
--On 16. November 2007 11:27:09 +0100 Sebastian Hagedorn
[EMAIL PROTECTED] wrote:
1) Since it only happens on dialup connections, could it be that the
dialin router at the providers end sends TCP/RST when a client hangs up
and those packets are filtered somewhere, maybe on your firewall?
OK,
Sebastian Hagedorn wrote:
The only reason I could imagine for the sequence of calls was signal
handling. But let's be methodical. There's only one spot where
SSL_accept() is called: in tls_start_servertls(). In pop3d.c that's only
called in cmd_starttls(). That in turn is called either in
--On 16. November 2007 13:54:24 +0100 Alain Spineux [EMAIL PROTECTED]
wrote:
On Nov 16, 2007 12:36 PM, Sebastian Hagedorn [EMAIL PROTECTED]
wrote:
I just had a discussion with a colleague regarding this. He made two
observations:
1. In the absence of the SO_KEEPALIVE option it is entirely
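For reference, the SO_KEEPALIVE point can be illustrated with a minimal sketch. Without keepalive, a TCP peer that disappears without sending FIN or RST leaves the connection looking established, so a blocked read on it never returns. The helper name enable_keepalive() is hypothetical and is not taken from the Cyrus source; only the setsockopt() call itself is standard.

```c
#include <sys/socket.h>

/* Hypothetical helper (not from the Cyrus source): turn on TCP keepalive
 * probes for a socket. Returns 0 on success, -1 on error with errno set. */
int enable_keepalive(int sock)
{
    int on = 1;

    /* SO_KEEPALIVE is a socket-level option, hence SOL_SOCKET. */
    return setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on);
}
```

Note that the idle time before the first probe is a system setting and is typically long (two hours by default on Linux), so keepalive alone would only reap dead sessions slowly.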
--On 16. November 2007 08:00:07 -0600 Gary Mills [EMAIL PROTECTED]
wrote:
This timeout doesn't work in some cases. We have lots of POP sessions
that never terminate.
That's interesting to hear! Especially since you are using Solaris.
About 30 out of 40 are in that state now.
Here's an
--On 16. November 2007 09:37:42 -0600 Gary Mills [EMAIL PROTECTED]
wrote:
Could you get a stack trace? If you have gdb you just call it with gdb
-p 19175. Then you can do bt at the prompt. I forget how to do it
with Sun's debugger.
Easy:
# pstack 19175
19175: pop3d -s
fef9f810
On Fri, Nov 16, 2007 at 12:36:49PM +0100, Sebastian Hagedorn wrote:
He suggested that the trace is unreliable. Perhaps a bug in RHEL 3's
version of OpenSSL messes up the stack. That would also explain why nobody
else seems to have this problem.
FYI I also know a system that has problems
On Fri, Nov 16, 2007 at 06:11:01PM +0100, Sebastian Hagedorn wrote:
Well, that just sounds like you're running out of entropy. That's a
different issue. Recompile your cyrus-sasl to use /dev/urandom instead of
/dev/random or disable apop in /etc/imapd.conf:
Debian uses /dev/urandom for a
Sebastian Hagedorn wrote:
--On 16. November 2007 12:39:28 -0500 Ken Murchison
[EMAIL PROTECTED] wrote:
Sorry, my patch wasn't complete. It wasn't logging the value that I
wanted.
OK:
Nov 16 18:48:17 lvr13 pop3s[1385]: SSL_read() returned 0:5
Nov 16 18:48:33 lvr13 pop3s[1375]:
On Nov 16, 2007 6:11 PM, Sebastian Hagedorn [EMAIL PROTECTED] wrote:
--On 16. November 2007 18:07:51 +0100 Gabor Gombas [EMAIL PROTECTED]
wrote:
Hm, we don't suffer any actual slowdown, it's just that the number of
processes increases over time.
It's not a slowdown - the client
On Fri, Nov 16, 2007 at 05:13:13PM +0100, Sebastian Hagedorn wrote:
--On 16. November 2007 14:23:17 +0100 Simon Matter [EMAIL PROTECTED]
wrote:
Did you ever see non SSL connections get stuck?
No.
Most of mine are `pop3d -s', but I have seen a few without the `-s'.
When I did a stack
OK, now I got this:
Nov 16 18:37:06 lvr13 pop3s[23089]: SSL_read() returned -1
But that process terminated normally.
--
.:.Sebastian Hagedorn - RZKR-R1 (Gebäude 52), Zimmer 18.:.
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
.:.Universität zu Köln / Cologne University
On Nov 16, 2007 12:36 PM, Sebastian Hagedorn [EMAIL PROTECTED] wrote:
--On 16. November 2007 11:27:09 +0100 Sebastian Hagedorn
[EMAIL PROTECTED] wrote:
1) Since it only happens on dialup connections, could it be that the
dialin router at the providers end sends TCP/RST when a client hangs
--On 16. November 2007 14:23:17 +0100 Simon Matter [EMAIL PROTECTED]
wrote:
Did you ever see non SSL connections get stuck?
No.
--
.:.Sebastian Hagedorn - RZKR-R1 (Gebäude 52), Zimmer 18.:.
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
.:.Universität zu Köln /
--On 16. November 2007 11:27:52 -0500 Ken Murchison [EMAIL PROTECTED]
wrote:
Sebastian Hagedorn wrote:
The only reason I could imagine for the sequence of calls was signal
handling. But let's be methodical. There's only one spot where
SSL_accept() is called: in tls_start_servertls(). In
Sebastian Hagedorn wrote:
Nov 16 18:00:26 lvr13 pop3s[3847]: SSL_read() returned 0
Nov 16 18:00:34 lvr13 pop3s[3215]: SSL_read() returned 0
Nov 16 18:00:34 lvr13 pop3s[3199]: SSL_read() returned 0
Nov 16 18:00:39 lvr13 pop3s[3199]: SSL_read() returned 0
Nov 16 18:00:43 lvr13 pop3s[3229]:
--On 16. November 2007 12:39:28 -0500 Ken Murchison [EMAIL PROTECTED]
wrote:
Sorry, my patch wasn't complete. It wasn't logging the value that I
wanted.
OK:
Nov 16 18:48:17 lvr13 pop3s[1385]: SSL_read() returned 0:5
Nov 16 18:48:33 lvr13 pop3s[1375]: SSL_read() returned 0:5
Nov 16 18:48:50
Hi
Can I summarize the problem as:
The server is blocked in a read, waiting for the client's next command.
(this is normal,
99% of the processes are in this state). But the autologout procedure is
not working!
Then this means the SIGALRM that should wake the process never comes or is not
handled
--On 16. November 2007 18:21:21 +0100 Gabor Gombas [EMAIL PROTECTED]
wrote:
On Fri, Nov 16, 2007 at 06:11:01PM +0100, Sebastian Hagedorn wrote:
Well, that just sounds like you're running out of entropy. That's a
different issue. Recompile your cyrus-sasl to use /dev/urandom instead
of
-- Ken Murchison [EMAIL PROTECTED] is rumored to have mumbled on 16.
November 2007 12:58:49 -0500 regarding Re: One more attempt: stuck
processes:
So should I add a call to ERR_get_error()?
Not yet. I'm assuming that none of these processes has hung. We're
getting an I/O error most
I know it has been asked before and may be redundant, but... You
answered that cyrus-sasl is using /dev/urandom and should not run out of
entropy. However, what about openssl itself? It also uses random
numbers. Perhaps, as a test renaming /dev/random and ln -s /dev/urandom
/dev/random.
--On Friday, November 16, 2007 3:54 PM -0500 Ken Murchison
[EMAIL PROTECTED] wrote:
I've reproduced the former by telnetting to port 995 and doing nothing.
I have been unable to reproduce the latter because as soon as I QUIT the
telnet session or kill() the telnet process, pop3d exits
On Fri, Nov 16, 2007 at 03:54:50PM -0500, Ken Murchison wrote:
That's exactly what Gary is seeing. It's blocking in SSL_accept().
Apparently the client connects to port 995, and then either sends
nothing, or goes away and leaves the socket open.
I've reproduced the former by telnetting to
On Nov 16, 2007 6:24 PM, Alain Spineux [EMAIL PROTECTED] wrote:
Hi
Can I summarize the problem as:
I'm wrong
The server is blocked in a read, waiting for the client's next command.
(this is normal,
99% of the processes are in this state).
No it is waiting in select, and the select has a
Sebastian Hagedorn wrote:
Thanks. I will try this patch as soon as I can, but it's clearly not the
only issue, because the same thing happens with POP processes. Here's an
example for one:
(gdb) bt
#0 0x0096441e in __read_nocancel () from /lib/tls/libc.so.6
#1 0x00ac02f7 in
--On 15. November 2007 06:55:44 -0500 Ken Murchison [EMAIL PROTECTED]
wrote:
OK. What version of OpenSSL?
cyradm says:
Built w/OpenSSL 0.9.7a Feb 19 2003
Running w/OpenSSL 0.9.7a Feb 19 2003
rpm says:
openssl-0.9.7a-33.23
This is RHEL 3.
Are they imaps/pop3s
--On 14. November 2007 16:39:44 -0500 Ken Murchison [EMAIL PROTECTED]
wrote:
It looks to me like we are timing out the client while the client is
IDLEing, but we get a signal from idled in the middle of shutdown(). Try
this patch.
--- imapd.c.~1.535.~ 2007-11-14 16:16:21.0 -0500
Sebastian Hagedorn wrote:
No. Since this potentially affects all IMAP and POP processes I would
have to do it for all entries. Do you recommend that I try that?
Since it looks like things are hanging when a process is being used, I'd
like to see if the problem goes away if we don't reuse the
--On 15. November 2007 08:21:48 -0500 Ken Murchison [EMAIL PROTECTED]
wrote:
No. Since this potentially affects all IMAP and POP processes I would
have to do it for all entries. Do you recommend that I try that?
Since it looks like things are hanging when a process is being used, I'd
like to
--On 15. November 2007 08:32:18 -0500 Ken Murchison [EMAIL PROTECTED]
wrote:
Since it looks like things are hanging when a process is being used, I'd
like to see if the problem goes away if we don't reuse the processes.
I'm just trying to do a bsearch() on the problem.
OK. I've made the
Sebastian Hagedorn wrote:
--On 15. November 2007 08:21:48 -0500 Ken Murchison
[EMAIL PROTECTED] wrote:
No. Since this potentially affects all IMAP and POP processes I would
have to do it for all entries. Do you recommend that I try that?
Since it looks like things are hanging when a
Sebastian Hagedorn wrote:
--On 15. November 2007 08:32:18 -0500 Ken Murchison
[EMAIL PROTECTED] wrote:
Since it looks like things are hanging when a process is being used,
I'd
like to see if the problem goes away if we don't reuse the processes.
I'm just trying to do a bsearch() on the
--On 15. November 2007 11:00:39 -0500 Ken Murchison [EMAIL PROTECTED]
wrote:
(gdb) bt
#0 0x0079f41e in __read_nocancel () from /lib/tls/libc.so.6
#1 0x00d0b2f7 in BIO_new_socket () from /lib/libcrypto.so.4
#2 0x00d092b2 in BIO_read () from /lib/libcrypto.so.4
#3 0x005dae13 in
On Nov 15, 2007 4:54 PM, Sebastian Hagedorn [EMAIL PROTECTED] wrote:
--On 15. November 2007 08:32:18 -0500 Ken Murchison [EMAIL PROTECTED]
wrote:
Since it looks like things are hanging when a process is being used, I'd
like to see if the problem goes away if we don't reuse the processes.
--On 15. November 2007 18:14:05 +0100 Alain Spineux [EMAIL PROTECTED]
wrote:
# strace -p 25038
Process 25038 attached - interrupt to quit
read(0, unfinished ...
Do you know what 0 is? If it were a socket it should time out, shouldn't it?
It should, I guess, but it doesn't.
# ls -l
Hi,
I've brought up this topic before. We've been running cyrus-imapd very
happily for several years. Yet there's one issue that none of the updates
have resolved. The last time I reported it we were running 2.2.12. Now
we're running 2.3.8, but the issue is the same: POP and IMAP processes
On Wed, Nov 14, 2007 at 04:15:13PM +0100, Sebastian Hagedorn wrote:
I've brought up this topic before. We've been running cyrus-imapd very
happily for several years. Yet there's one issue that none of the updates
have resolved. The last time I reported it we were running 2.2.12. Now
we're
--On 14. November 2007 09:30:45 -0600 Gary Mills [EMAIL PROTECTED]
wrote:
On Wed, Nov 14, 2007 at 04:15:13PM +0100, Sebastian Hagedorn wrote:
I've brought up this topic before. We've been running cyrus-imapd very
happily for several years. Yet there's one issue that none of the
updates have
Sebastian Hagedorn wrote:
Hi,
I've brought up this topic before. We've been running cyrus-imapd very
happily for several years. Yet there's one issue that none of the
updates have resolved. The last time I reported it we were running
2.2.12. Now we're running 2.3.8, but the issue is the