date:20071116

RE: Collaboration replacement via Toltec/Bynari (was How many people to admin a Cyrus system?)

2007-11-16 Thread Olaf Fraczyk

On Thu, 2007-11-15 at 22:10 +0200, Joon Radley wrote:
 Hi Olaf,
 
  That's interesting information. I had always thought that in the
  Exchange-Outlook world the processing happened on the server side and
  the messages sat on the server.
  Or is the client-side processing limited to the Toltec/Bynari solution?
 
 With Exchange-Outlook the Outlook message store resides on the server. When 
 new mail is delivered to the Exchange server it does the special processing 
 before injecting it into the mail store. With the IMAP4 and SMTP servers of 
 your choice, the mail must be downloaded to be processed before it can be 
 stored in the Outlook message store.
OK. Now everything is clear.
Didn't Bynari have a server-side solution in the past?
I remember (though I have not used it) a product called Bynari Server or something like that.

Regards,

Olaf
-- 
Olaf Frączyk [EMAIL PROTECTED]
NAVI
http://www.navi.pl
http://www.ntp.navi.pl


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html

Re: LARGE single-system Cyrus installs?

2007-11-16 Thread Pascal Gienger

Rob Mueller [EMAIL PROTECTED] wrote:


 About 30% of all I/O is to mailboxes.db, most of which is read.  I
 haven't personally deployed a split-meta configuration, but I
 understand the meta files are similarly heavy I/O concentrators.

 That sounds odd.

 Given the size and hotness of mailboxes.db, and in most cases the size
 of  mailboxes.db compared to the memory your machine has, basically the
 OS  should end up caching the entire thing in memory.

Solaris 10 does this in my case. Via dtrace you'll see that open() on the 
mailboxes.db and read-calls do not exceed microsecond ranges. mailboxes.db 
is not the problem here. It is entirely cached and rarely written 
(creating, deleting and moving a mailbox).

Pascal





Re: One more attempt: stuck processes

2007-11-16 Thread Sebastian Hagedorn

--On 15. November 2007 19:25:19 +0100 Simon Matter [EMAIL PROTECTED] 
wrote:



It's blinking red, which normally means a broken link. I'm not sure how


The file 0 is a symbolic link which doesn't really point to a file,
that's why the shell shows it blinking. Everything okay here.


Thanks. That's what I thought, but I wasn't sure.


reliable that is in this case. Anyway, lsof reports:

pop3d   25038 cyrus    0u  IPv4  -64802663       TCP
cyrus.rrz.uni-koeln.de:pop3s->p50865F5D.dip.t-dialin.net:1064
(ESTABLISHED)

It *thinks* the connection is still open. So does netstat:

# LANG=C netstat -a|grep p50865F5D
tcp        0      0 cyrus.rrz.uni-koeln.d:pop3s p50865F5D.dip.t-dialin:1064 ESTABLISHED

But obviously that connection is dead. I don't know what conclusions to
draw from that ...


Just two ideas come to mind:

1) Since it only happens on dialup connections, could it be that the
dial-in router at the provider's end sends TCP/RST when a client hangs up
and those packets are filtered somewhere, maybe on your firewall?


OK, let's run with that one.

a) We don't really have a firewall, we only use ACLs on the Cisco routers. 
You can't even filter TCP/RST there.


b) Even *if* a TCP/RST had been dropped, lost or whatever, the server 
*still* should time out eventually!



2) Could it be that SO_LINGER should be used as a socket option in
service_create() in master/master.c?


I didn't remember that option, so I just read up on it. It seems as though 
SO_LINGER is very dependent on implementation.  If I get your intention 
correctly SO_LINGER would have to be set with l_onoff set to non-zero and 
l_linger to zero, right? So close() would return immediately? That might 
make sense if the stack trace showed a call to close(). But if I understand 
the code correctly, close() isn't called at all. The socket is closed as a 
result of a call to exit(). And that defeats all use of SO_LINGER:


When the socket is closed as part of exit(2), it always lingers in the 
background.
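
For reference, the setting described above (l_onoff non-zero, l_linger zero) can be sketched as follows. This is only an illustration of the socket option, not code from Cyrus; the helper name set_abortive_close is hypothetical, and as noted above the option matters only when close() is actually called on the socket.

```c
#include <assert.h>
#include <sys/socket.h>

/* Hypothetical helper (not part of Cyrus): enable SO_LINGER with a
 * zero timeout, so that an explicit close() returns immediately
 * instead of waiting for unsent data to drain.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int set_abortive_close(int fd)
{
    struct linger lg;

    assert(fd >= 0);
    lg.l_onoff = 1;   /* non-zero: lingering is switched on */
    lg.l_linger = 0;  /* zero seconds: do not wait on close() */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

How the stack treats such a socket in detail is implementation dependent, which is the caveat raised in the message itself.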



If it's complete nonsense, ignore it.


I wouldn't know :-)
--
.:.Sebastian Hagedorn - RZKR-R1 (Gebäude 52), Zimmer 18.:.
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
.:.Universität zu Köln / Cologne University - ✆ +49-221-478-5587.:.
  .:.:.:.Skype: shagedorn.:.:.:.


Re: One more attempt: stuck processes

2007-11-16 Thread Sebastian Hagedorn

--On 16. November 2007 16:52:27 +0100 Gabor Gombas [EMAIL PROTECTED] 
wrote:



On Fri, Nov 16, 2007 at 12:36:49PM +0100, Sebastian Hagedorn wrote:


He suggested that the trace is unreliable. Perhaps a bug in RHEL 3's
version of OpenSSL messes up the stack. That would also explain why
nobody  else seems to have this problem.


FYI I also know a system that has problems with hung Cyrus processes.
AFAIR they have problems with pop3s only, but that may be because there
are more POP3 than IMAP users, I don't know. The system in question runs
2.3.8 on Debian Etch currently.


That's a 2.6 kernel, right?


I intend to help diagnose that system but had no time so far; they're
now running a script that does a POP3 connection every couple of minutes
and if that takes too long it restarts Cyrus.


Hm, we don't suffer any actual slowdown, it's just that the number of 
processes increases over time.



There is nothing interesting in the logs:

Oct 15 02:39:31 host cyrus/master[6102]: about to exec /usr/local/cyrus/sbin/pop3d
Oct 15 02:39:31 host cyrus/pop3s[6102]: executed
Oct 15 02:39:31 host cyrus/pop3s[6102]: accepted connection


That's what I'm seeing. Could you get a stack trace?


OTOH there are a lot of messages like the following:

Oct 16 14:13:10 host cyrus/master[26136]: about to exec /usr/local/cyrus/sbin/pop3d
Oct 16 14:13:10 host cyrus/pop3s[26136]: executed
Oct 16 14:13:10 host cyrus/pop3s[26136]: accepted connection
Oct 16 14:13:10 host cyrus/pop3s[26136]: pop3s failed: [XX.XXX.XX.XXX]
Oct 16 14:13:10 host cyrus/pop3s[26136]: Fatal error: tls_start_servertls() failed
Oct 16 14:13:10 host cyrus/master[15923]: process 26136 exited, status 75
Oct 16 14:13:10 host cyrus/master[15923]: service pop3s pid 26136 in BUSY state: terminated abnormally

Any idea what's causing that?


I have many of those as well. I suppose that could be any number of things. 
Faulty protocol or dropped connections.

--
.:.Sebastian Hagedorn - RZKR-R1 (Gebäude 52), Zimmer 18.:.
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
.:.Universität zu Köln / Cologne University - ✆ +49-221-478-5587.:.
  .:.:.:.Skype: shagedorn.:.:.:.


Re: Timed Actions in Sieve

2007-11-16 Thread Gary Mills

On Tue, Nov 13, 2007 at 11:24:48AM +, Ian G Batten wrote:
 We've been having a chat about how useful it would be to have timed  
 actions in sieve: so that a vacation message could be set up for a  
 duration which would automatically revert, so that a forwarding could  
 be set up for the duration of a short-term project, etc, etc.  The  
 naive way is to add support to the sieve interface of choice (the  
 squirrelmail plugin in our case) to handle deferred actions, but I  
 can think of all sorts of security problems with that.  Another would  
 be a means to auto-generate regexps to match on Date: headers, but  
 that's really tacky.  The full solution would be to have the current  
 time available in sieve scripts, to then match on.  Has anyone else  
 thought about this area?

We've had occasional complaints from people who set up a vacation
message and then forgot to remove it later.  They would like to be
able to put a time limit on such things, so that they would stop
working when that limit expires.  More generally, I suppose they
could specify start and stop times, so that they could set up the
sieve script in advance of their vacation.
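
The start/stop idea can be sketched as a simple window check. This is not sieve syntax and not an existing Cyrus feature; it only shows the comparison an implementation would need once the current time is available to it. The function name and the half-open window (active from the start time up to, but not including, the stop time) are assumptions made for the illustration.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical check: is a timed action (e.g. a vacation reply) in
 * force at time `now`? The window is half-open: the action is active
 * from `start` up to, but not including, `stop`. */
int timed_action_active(time_t now, time_t start, time_t stop)
{
    assert(start <= stop);  /* a window must not end before it begins */
    return now >= start && now < stop;
}
```

With such a check a script could be installed before the vacation begins and would stop acting by itself once the stop time has passed.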

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-


Re: One more attempt: stuck processes

2007-11-16 Thread Gary Mills

On Fri, Nov 16, 2007 at 03:20:57PM +0100, Sebastian Hagedorn wrote:
 --On 16. November 2007 08:00:07 -0600 Gary Mills [EMAIL PROTECTED] 
 wrote:
 
 This timeout doesn't work in some cases.  We have lots of POP sessions
 that never terminate.
 
 That's interesting to hear! Especially since you are using Solaris.
 
  About 30 out of 40 are in that state now.
 Here's an example:
 
cyrus 13075   708  0   Oct 14 ?        0:05 pop3d -s
cyrus 20023   708  0   Oct 29 ?        0:00 pop3d
cyrus 24560   708  1 07:38:03 ?        0:03 pop3d
cyrus   631   708  0   Oct 03 ?        0:10 pop3d -s
cyrus  6786   708  0   Oct 20 ?        0:00 pop3d -s
cyrus 29777   708  0 07:45:03 ?        0:00 pop3d
cyrus 19175   708  0   Oct 04 ?        0:04 pop3d -s
 
 One I just checked is stuck in a read():
 
   # truss -p 19175
   read(0, 0x002316F0, 5)  (sleeping...)
   ^?# pfiles 19175
   19175:  pop3d -s
 Current rlimit: 256 file descriptors
  0: S_IFSOCK mode:0666 dev:271,0 ino:25813 uid:0 gid:0 size:0
 O_RDWR
   sockname: AF_INET 130.179.16.23  port: 995
   peername: AF_INET 130.179.188.184  port: 51771
 
 Could you get a stack trace? If you have gdb you just call it with gdb -p 
 19175. Then you can do bt at the prompt. I forget how to do it with 
 Sun's debugger.

Easy:

  # pstack 19175
  19175:  pop3d -s
   fef9f810 read (0, 2316f0, 5)
   fee1d2d0 read (0, 2316f0, 5, 0, 0, 0) + 5c
   ff06bb38 sock_read (1f0860, 2316f0, 5, 5, 0, 0) + 24
   ff068af0 BIO_read (1f0860, 2316f0, 5, fef98b84, 0, 0) + 110
   ff278488 ssl3_read_n (212798, 5, 8805, 0, 0, 203958) + 174
   ff2785fc ssl3_get_record (204ce0, 8000, 8400, 4400, f1, f0) + d0
   ff279424 ssl3_read_bytes (212798, 1000, 2000, 4, 0, ffbfe731) + 228
   ff27a99c ssl3_get_message (ff2a259c, 2070a0, 0, , 19000, ffbfe7a0) + d0
   ff27042c ssl3_accept (2150, 2160, 2180, 21e0, 2110, 2122) + 904
   ff27bd2c ssl23_get_client_hello (2316fb, 6c, 6c, 4, fe79, 0) + 828
   ff27b4b4 ssl23_accept (4000, 2000, 0, 0, 0, 0) + 2a4
   00032d00 tls_start_servertls (0, 1, ffbfee24, ffbfee20, 1849a8, ff00) + 198
   0002c504 cmd_starttls (1, 1fd8b8, 0, 0, 0, 0) + 184
   0002a638 service_main (2, 192198, ffbffce0, 1aec4, 3508c, 1) + 488
   00035250 main (2, ffbffcd4, ffbffce0, 17c400, 0, 0) + e18
   00029298 _start   (0, 0, 0, 0, 0, 0) + 108

I've confirmed that the client went away a long time ago.
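
The trace shows a plain blocking read() with nothing to bound how long it waits. One general way to bound such a wait is to poll the descriptor with a timeout before reading. The sketch below is an illustration with plain POSIX calls, not Cyrus code and not the patch discussed in this thread; the name read_with_timeout and the -2 return value for a timeout are assumptions. Applying the same idea to an SSL handshake would additionally need the socket in non-blocking mode.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper (not Cyrus code): read from fd, but give up if
 * no data arrives within timeout_ms milliseconds.
 * Returns the number of bytes read, 0 on EOF, -1 on error and
 * -2 on timeout. */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd;
    int ready;

    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    do {
        /* a signal restarts the wait with the full timeout */
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);

    if (ready < 0) return -1;   /* poll() itself failed */
    if (ready == 0) return -2;  /* nothing arrived in time: peer may be gone */
    return read(fd, buf, len);  /* data, EOF or an error is available now */
}
```

A caller that sees the timeout value can close the connection and exit instead of sleeping in read() indefinitely.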

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-


Re: One more attempt: stuck processes

2007-11-16 Thread Ken Murchison

Sebastian Hagedorn wrote:

 I think I will try one more approach: I reverted cyrus.conf to not use 
 -U 1 anymore, so that processes should be reused. I will strace one of 
 the pop3d processes in the hope that it gets stuck. That way I should be 
 able to see where things go wrong. If the process terminates normally I 
 will try with another one.

Please let me know if you get a trace from a hung process.

-- 
Kenneth Murchison
Systems Programmer
Project Cyrus Developer/Maintainer
Carnegie Mellon University


Re: One more attempt: stuck processes

2007-11-16 Thread Gabor Gombas

On Fri, Nov 16, 2007 at 05:20:00PM +0100, Sebastian Hagedorn wrote:

 That's a 2.6 kernel, right?

Yes, 2.6.18-2-amd64.

 Hm, we don't suffer any actual slowdown, it's just that the number of 
 processes increases over time.

It's not a slowdown - the client connects, and hangs. It never even gets
to the authentication phase (at least it's not logged). Clients that
happen to connect to a non-affected process run normally. Also, IMAP
connections do not seem to be affected, at least I did not hear any
complaints about that.

 That's what I'm seeing. Could you get a stack trace?

I intend to but I do not have the time currently. I'm not involved in
the daily management of that machine and the operators are happy to just
restart Cyrus when the hangs begin, and so far I never was around just
when that happened.

Gabor

-- 
 -
 MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
 -


Re: One more attempt: stuck processes

2007-11-16 Thread Ken Murchison

Sebastian Hagedorn wrote:
 --On 16. November 2007 09:37:42 -0600 Gary Mills [EMAIL PROTECTED] 
 wrote:
 
 Could you get a stack trace? If you have gdb you just call it with gdb
 -p  19175. Then you can do bt at the prompt. I forget how to do it
 with  Sun's debugger.

 Easy:

   # pstack 19175
   19175:  pop3d -s
fef9f810 read (0, 2316f0, 5)
fee1d2d0 read (0, 2316f0, 5, 0, 0, 0) + 5c
ff06bb38 sock_read (1f0860, 2316f0, 5, 5, 0, 0) + 24
ff068af0 BIO_read (1f0860, 2316f0, 5, fef98b84, 0, 0) + 110
ff278488 ssl3_read_n (212798, 5, 8805, 0, 0, 203958) + 174
ff2785fc ssl3_get_record (204ce0, 8000, 8400, 4400, f1, f0) + d0
ff279424 ssl3_read_bytes (212798, 1000, 2000, 4, 0, ffbfe731) + 228
ff27a99c ssl3_get_message (ff2a259c, 2070a0, 0, , 19000, ffbfe7a0) + d0
ff27042c ssl3_accept (2150, 2160, 2180, 21e0, 2110, 2122) + 904
ff27bd2c ssl23_get_client_hello (2316fb, 6c, 6c, 4, fe79, 0) + 828
ff27b4b4 ssl23_accept (4000, 2000, 0, 0, 0, 0) + 2a4
00032d00 tls_start_servertls (0, 1, ffbfee24, ffbfee20, 1849a8, ff00) + 198
0002c504 cmd_starttls (1, 1fd8b8, 0, 0, 0, 0) + 184
0002a638 service_main (2, 192198, ffbffce0, 1aec4, 3508c, 1) + 488
00035250 main (2, ffbffcd4, ffbffce0, 17c400, 0, 0) + e18
00029298 _start   (0, 0, 0, 0, 0, 0) + 108
 
 Thanks, that looks like progress! That stack trace looks similar enough 
 to the one I'm seeing that I could imagine that it is what I *should* be 
 seeing if the stack weren't garbled. Of course that's only speculation.
 
 Ken, is it possible that the call to SSL_accept() in 
 tls_start_servertls() blocks when the client goes away? That could 
 explain everything 

Yes.  Gary's problem might be very similar to yours, depending on what I 
see from the patch that I just sent you.

-- 
Kenneth Murchison
Systems Programmer
Project Cyrus Developer/Maintainer
Carnegie Mellon University


Re: One more attempt: stuck processes

2007-11-16 Thread Sebastian Hagedorn

--On 16. November 2007 18:07:51 +0100 Gabor Gombas [EMAIL PROTECTED] 
wrote:



Hm, we don't suffer any actual slowdown, it's just that the number of
processes increases over time.


It's not a slowdown - the client connects, and hangs. It never even gets
to the authentication phase (at least it's not logged). Clients that
happen to connect to a non-affected process run normally.


Well, that just sounds like you're running out of entropy. That's a 
different issue. Recompile your cyrus-sasl to use /dev/urandom instead of 
/dev/random or disable apop in /etc/imapd.conf:


allowapop: 0

Either of those things should get rid of that.
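
To illustrate the difference between the two devices: reads from /dev/urandom are not expected to block once the system is up, whereas /dev/random could block on kernels of that era when the entropy pool ran low. The sketch below only shows reading random bytes from /dev/urandom; it is not cyrus-sasl code, and the function name fill_random is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not cyrus-sasl code): fill buf with len random
 * bytes from /dev/urandom. Returns 0 on success, -1 on failure. */
int fill_random(unsigned char *buf, size_t len)
{
    size_t got = 0;
    int fd;

    assert(buf != NULL || len == 0);
    fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0) return -1;

    while (got < len) {
        ssize_t n = read(fd, buf + got, len - got);
        if (n < 0 && errno == EINTR) continue;  /* interrupted: try again */
        if (n <= 0) { close(fd); return -1; }   /* error or unexpected EOF */
        got += (size_t)n;
    }
    close(fd);
    return 0;
}
```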
--
.:.Sebastian Hagedorn - RZKR-R1 (Gebäude 52), Zimmer 18.:.
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
.:.Universität zu Köln / Cologne University - ✆ +49-221-478-5587.:.
  .:.:.:.Skype: shagedorn.:.:.:.


Re: One more attempt: stuck processes

2007-11-16 Thread Gary Mills

On Fri, Nov 16, 2007 at 01:54:24PM +0100, Alain Spineux wrote:
 On Nov 16, 2007 12:36 PM, Sebastian Hagedorn [EMAIL PROTECTED] wrote:
  --On 16. November 2007 11:27:09 +0100 Sebastian Hagedorn
  [EMAIL PROTECTED] wrote:
 
  1. In the absence of the SO_KEEPALIVE option it is entirely possible that a
  TCP connection remains ESTABLISHED even when the other side has gone.
 
 I said that the socket should time out, but this is true only when the
 protocol (TCP here) requires a response (usually an ACK here) or at
 connection establishment. Otherwise it should stay open indefinitely
 until something happens. A router doing NAT can drop a connection that
 is too old, because it has to maintain a NAT table and clean it up from
 time to time; this is where KEEPALIVE becomes useful.
 
  This may not be a solution to this particular problem, but it made me
  wonder why Cyrus does *not* use SO_KEEPALIVE. Is there a downside to it?
 
 Cyrus already has a built-in timeout; it seems a little conflicting to actively
 maintain the connection until it drops it itself!
 It is the job of the client to actively maintain the connection,
 if it wants it!

This timeout doesn't work in some cases.  We have lots of POP sessions
that never terminate.  About 30 out of 40 are in that state now.
Here's an example:

   cyrus 13075   708  0   Oct 14 ?        0:05 pop3d -s
   cyrus 20023   708  0   Oct 29 ?        0:00 pop3d
   cyrus 24560   708  1 07:38:03 ?        0:03 pop3d
   cyrus   631   708  0   Oct 03 ?        0:10 pop3d -s
   cyrus  6786   708  0   Oct 20 ?        0:00 pop3d -s
   cyrus 29777   708  0 07:45:03 ?        0:00 pop3d
   cyrus 19175   708  0   Oct 04 ?        0:04 pop3d -s

One I just checked is stuck in a read():

  # truss -p 19175
  read(0, 0x002316F0, 5)  (sleeping...)
  ^?# pfiles 19175
  19175:  pop3d -s
Current rlimit: 256 file descriptors
 0: S_IFSOCK mode:0666 dev:271,0 ino:25813 uid:0 gid:0 size:0
O_RDWR
  sockname: AF_INET 130.179.16.23  port: 995
  peername: AF_INET 130.179.188.184  port: 51771

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-


Re: One more attempt: stuck processes

2007-11-16 Thread Sebastian Hagedorn

--On 16. November 2007 11:27:09 +0100 Sebastian Hagedorn 
[EMAIL PROTECTED] wrote:



1) Since it only happens on dialup connections, could it be that the
dial-in router at the provider's end sends TCP/RST when a client hangs up
and those packets are filtered somewhere, maybe on your firewall?


OK, let's run with that one.

a) We don't really have a firewall, we only use ACLs on the Cisco
routers. You can't even filter TCP/RST there.

b) Even *if* a TCP/RST had been dropped, lost or whatever, the server
*still* should time out eventually!


I just had a discussion with a colleague regarding this. He made two 
observations:


1. In the absence of the SO_KEEPALIVE option it is entirely possible that a 
TCP connection remains ESTABLISHED even when the other side has gone.


This may not be a solution to this particular problem, but it made me 
wonder why Cyrus does *not* use SO_KEEPALIVE. Is there a downside to it?
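
For reference, enabling the option on a connected socket is a single setsockopt() call. The sketch below is an illustration only, not code from Cyrus, and the helper name enable_keepalive is hypothetical. How soon the first probe is sent and how many probes go unanswered before the connection is declared dead are system-wide settings that differ between operating systems, and the defaults are often on the order of hours.

```c
#include <assert.h>
#include <sys/socket.h>

/* Hypothetical helper (not part of Cyrus): ask the TCP stack to send
 * keepalive probes on an otherwise idle connection, so that a peer
 * which vanished without closing is eventually detected.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int enable_keepalive(int fd)
{
    int on = 1;

    assert(fd >= 0);
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
}
```

Once the probes fail, a read() blocked on the socket returns with an error instead of sleeping indefinitely.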


2. The stack trace looks garbled:

(gdb) bt
#0  0x0079f41e in __read_nocancel () from /lib/tls/libc.so.6
#1  0x00d0b2f7 in BIO_new_socket () from /lib/libcrypto.so.4
#2  0x00d092b2 in BIO_read () from /lib/libcrypto.so.4
#3  0x005dae13 in ssl23_read_bytes () from /lib/libssl.so.4
#4  0x005d9c51 in ssl23_get_client_hello () from /lib/libssl.so.4
#5  0x005d9712 in ssl23_accept () from /lib/libssl.so.4
#6  0x005ddc9a in SSL_accept () from /lib/libssl.so.4
#7  0x08052cb3 in shut_down ()
#8  0x0804e513 in shut_down ()
#9  0x0804d58c in ?? ()
#10 0x0001 in ?? ()
#11 0x082ee848 in ?? ()
#12 0x in ?? ()

He suggested that the trace is unreliable. Perhaps a bug in RHEL 3's 
version of OpenSSL messes up the stack. That would also explain why nobody 
else seems to have this problem.


I think I will try one more approach: I reverted cyrus.conf to not use -U 
1 anymore, so that processes should be reused. I will strace one of the 
pop3d processes in the hope that it gets stuck. That way I should be able 
to see where things go wrong. If the process terminates normally I will try 
with another one. If that doesn't go anywhere, I guess I'll drop this 
investigation. We will upgrade to RHEL 5 some time next year, so hopefully 
that will bring new bugs :-)

--
.:.Sebastian Hagedorn - RZKR-R1 (Gebäude 52), Zimmer 18.:.
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
.:.Universität zu Köln / Cologne University - ✆ +49-221-478-5587.:.
  .:.:.:.Skype: shagedorn.:.:.:.


Re: LARGE single-system Cyrus installs?

2007-11-16 Thread Michael Bacon

--On Friday, November 16, 2007 7:39 AM +0100 Pascal Gienger 
[EMAIL PROTECTED] wrote:

 Solaris 10 does this in my case. Via dtrace you'll see that open() on the
 mailboxes.db and read-calls do not exceed microsecond ranges.
 mailboxes.db  is not the problem here. It is entirely cached and rarely
 written  (creating, deleting and moving a mailbox).

This is where I think the actual user count may really influence this 
behavior.  On our system, during heavy times, we can see writes to the 
mailboxes file separated by no more than 5-10 seconds.

If you're constantly freezing all cyrus processes for the duration of those 
writes, and those writes are taking any appreciable time at all, you're 
going to have a stuttering server with big load averages.

Again, it's not I/O throughput to be worried about here -- it's latency. 
If you don't have write caches in front of your disk, even with RAID you're 
still at the mercy of drive latency in the millisecond range.  Not a 
problem if those writes are once every five minutes, but if you're at peak 
load on a big system and seeing them every couple of seconds, that's brutal.

-Michael


Re: One more attempt: stuck processes

2007-11-16 Thread Ken Murchison

Sebastian Hagedorn wrote:

 The only reason I could imagine for the sequence of calls was signal 
 handling. But let's be methodical. There's only one spot where 
 SSL_accept() is called: in tls_start_servertls(). In pop3d.c that's only 
 called in cmd_starttls(). That in turn is called either in cmdloop (for 
 handling of STLS) or in service_main() for connections to port 995.

Actually, now that I think about it, I believe SSL_accept() can be 
called from SSL_read() at any time if a renegotiation is required. 
Since shut_down() calls prot_fill(), which in turn can call SSL_read(), 
it's possible that we can get an SSL_accept() call.  Before I start 
hacking code, can you apply the following patch (sorry about the line 
breaks) and see if I'm heading in the right direction?  Let me know if 
you get any of the WARNING messages in your logs.


--- prot.c.~1.93.~  2007-11-16 11:21:56.0 -0500
+++ prot.c  2007-11-16 11:23:32.0 -0500
@@ -468,6 +468,7 @@
        /* just do a SSL read instead if we're under a tls layer */
        if (s->tls_conn != NULL) {
            n = SSL_read(s->tls_conn, (char *) s->buf, PROT_BUFSIZE);
+           if (n <= 0) syslog(LOG_WARNING, "SSL_read() returned %d", n);
        } else {
            n = read(s->fd, s->buf, PROT_BUFSIZE);
        }
-- 
Kenneth Murchison
Systems Programmer
Project Cyrus Developer/Maintainer
Carnegie Mellon University


Re: One more attempt: stuck processes

2007-11-16 Thread Sebastian Hagedorn

--On 16. November 2007 13:54:24 +0100 Alain Spineux [EMAIL PROTECTED] 
wrote:



On Nov 16, 2007 12:36 PM, Sebastian Hagedorn [EMAIL PROTECTED]
wrote:

I just had a discussion with a colleague regarding this. He made two
observations:

1. In the absence of the SO_KEEPALIVE option it is entirely possible
that a TCP connection remains ESTABLISHED even when the other side has
gone.


I said that the socket should time out, but this is true only when the
protocol (TCP here) requires a response (usually an ACK here) or at
connection establishment.


Right.


Otherwise it should stay open indefinitely until something happens. A
router doing NAT can drop a connection that is too old, because it has to
maintain a NAT table and clean it up from time to time; this is where
KEEPALIVE becomes useful.


Not only there, but I think also in the case of unilaterally dropped 
connections.



This may not be a solution to this particular problem, but it made me
wonder why Cyrus does *not* use SO_KEEPALIVE. Is there a downside to it?


Cyrus already has a built-in timeout; it seems a little conflicting to
actively maintain the connection until it drops it itself!


I'm not sure I understand that sentence.


It is the job of the client to actively maintain the connection,
if it wants it!


Yes, but what if the client is gone? I realise that *normally* the server 
keeps a built-in timeout, but I'm guessing that sometimes it doesn't work, 
perhaps because something (in prot_fill() perhaps?) blocks.



I think I will try one more approach: I reverted cyrus.conf to not use
-U 1 anymore, so that processes should be reused. I will strace one of
the pop3d processes in the hope that it gets stuck. That way I should be
able to see where things go wrong. If the process terminates normally I
will try with another one. If that doesn't go anywhere, I guess I'll
drop this


You could try replacing imapd with a home-made wrapper script, something like:

mv imapd imapd_
echo '#!/bin/sh' > imapd
echo 'exec strace -o /tmp/imapd.$$ imapd_ "$@"' >> imapd
chmod a+x imapd


Thanks for the suggestion. I'll think about it, although I'm wary of doing 
that on a production server.



investigation. We will upgrade to RHEL 5 some time next year, so
hopefully that will bring new bugs :-)


Sorry, but I don't understand what you are complaining about!


I'm not complaining ...


Is it because the IMAP or POP client is losing its connection and
this disturbs the user


No.


or just because you are getting some sleeping processes ?


If it were some I wouldn't worry. I'm talking hundreds of processes! I 
know I can kill them, in fact for the pop3d processes we run this command 
once a month:


ps -C pop3d -o pid,start|grep '[a-z]'|awk '{print $1}'|xargs kill

(It kills pop3d processes that have the month in their start time, i.e. are 
more than a day old)


But for imapd processes it's not as easy to tell if they are just 
long-living or stuck.



Do you have a timeout option in your imapd.conf to force the
imap/pop server to autologout ?


No. But both POP and IMAP have default timeouts. They just don't work in my 
case.

--
.:.Sebastian Hagedorn - RZKR-R1 (Gebäude 52), Zimmer 18.:.
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
.:.Universität zu Köln / Cologne University - ✆ +49-221-478-5587.:.
  .:.:.:.Skype: shagedorn.:.:.:.


Re: One more attempt: stuck processes

2007-11-16 Thread Sebastian Hagedorn

--On 16. November 2007 08:00:07 -0600 Gary Mills [EMAIL PROTECTED] 
wrote:



This timeout doesn't work in some cases.  We have lots of POP sessions
that never terminate.


That's interesting to hear! Especially since you are using Solaris.


 About 30 out of 40 are in that state now.
Here's an example:

   cyrus 13075   708  0   Oct 14 ?        0:05 pop3d -s
   cyrus 20023   708  0   Oct 29 ?        0:00 pop3d
   cyrus 24560   708  1 07:38:03 ?        0:03 pop3d
   cyrus   631   708  0   Oct 03 ?        0:10 pop3d -s
   cyrus  6786   708  0   Oct 20 ?        0:00 pop3d -s
   cyrus 29777   708  0 07:45:03 ?        0:00 pop3d
   cyrus 19175   708  0   Oct 04 ?        0:04 pop3d -s

One I just checked is stuck in a read():

  # truss -p 19175
  read(0, 0x002316F0, 5)  (sleeping...)
  ^?# pfiles 19175
  19175:  pop3d -s
Current rlimit: 256 file descriptors
 0: S_IFSOCK mode:0666 dev:271,0 ino:25813 uid:0 gid:0 size:0
O_RDWR
  sockname: AF_INET 130.179.16.23  port: 995
  peername: AF_INET 130.179.188.184  port: 51771


Could you get a stack trace? If you have gdb you just call it with gdb -p 
19175. Then you can do bt at the prompt. I forget how to do it with 
Sun's debugger.

--
.:.Sebastian Hagedorn - RZKR-R1 (Gebäude 52), Zimmer 18.:.
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
.:.Universität zu Köln / Cologne University - ✆ +49-221-478-5587.:.
  .:.:.:.Skype: shagedorn.:.:.:.


Re: One more attempt: stuck processes

2007-11-16 Thread Sebastian Hagedorn

--On 16. November 2007 09:37:42 -0600 Gary Mills [EMAIL PROTECTED] 
wrote:



Could you get a stack trace? If you have gdb you just call it with gdb
-p  19175. Then you can do bt at the prompt. I forget how to do it
with  Sun's debugger.


Easy:

  # pstack 19175
  19175:  pop3d -s
   fef9f810 read (0, 2316f0, 5)
   fee1d2d0 read (0, 2316f0, 5, 0, 0, 0) + 5c
   ff06bb38 sock_read (1f0860, 2316f0, 5, 5, 0, 0) + 24
   ff068af0 BIO_read (1f0860, 2316f0, 5, fef98b84, 0, 0) + 110
   ff278488 ssl3_read_n (212798, 5, 8805, 0, 0, 203958) + 174
   ff2785fc ssl3_get_record (204ce0, 8000, 8400, 4400, f1, f0) + d0
   ff279424 ssl3_read_bytes (212798, 1000, 2000, 4, 0, ffbfe731) + 228
   ff27a99c ssl3_get_message (ff2a259c, 2070a0, 0, , 19000, ffbfe7a0) + d0
   ff27042c ssl3_accept (2150, 2160, 2180, 21e0, 2110, 2122) + 904
   ff27bd2c ssl23_get_client_hello (2316fb, 6c, 6c, 4, fe79, 0) + 828
   ff27b4b4 ssl23_accept (4000, 2000, 0, 0, 0, 0) + 2a4
   00032d00 tls_start_servertls (0, 1, ffbfee24, ffbfee20, 1849a8, ff00) + 198
   0002c504 cmd_starttls (1, 1fd8b8, 0, 0, 0, 0) + 184
   0002a638 service_main (2, 192198, ffbffce0, 1aec4, 3508c, 1) + 488
   00035250 main (2, ffbffcd4, ffbffce0, 17c400, 0, 0) + e18
   00029298 _start   (0, 0, 0, 0, 0, 0) + 108


Thanks, that looks like progress! That stack trace looks similar enough to 
the one I'm seeing that I could imagine that it is what I *should* be 
seeing if the stack weren't garbled. Of course that's only speculation.


Ken, is it possible that the call to SSL_accept() in tls_start_servertls() 
blocks when the client goes away? That could explain everything 

--
.:.Sebastian Hagedorn - RZKR-R1 (Gebäude 52), Zimmer 18.:.
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
.:.Universität zu Köln / Cologne University - ✆ +49-221-478-5587.:.
  .:.:.:.Skype: shagedorn.:.:.:.


Re: One more attempt: stuck processes

2007-11-16 Thread Gabor Gombas

On Fri, Nov 16, 2007 at 12:36:49PM +0100, Sebastian Hagedorn wrote:

 He suggested that the trace is unreliable. Perhaps a bug in RHEL 3's 
 version of OpenSSL messes up the stack. That would also explain why nobody 
 else seems to have this problem.

FYI I also know a system that has problems with hung Cyrus processes.
AFAIR they have problems with pop3s only, but that may be because there
are more POP3 than IMAP users, I don't know. The system in question runs
2.3.8 on Debian Etch currently.

I intend to help diagnose that system but had no time so far; they're
now running a script that does a POP3 connection every couple of minutes
and if that takes too long it restarts Cyrus.

There is nothing interesting in the logs:

Oct 15 02:39:31 host cyrus/master[6102]: about to exec 
/usr/local/cyrus/sbin/pop3d
Oct 15 02:39:31 host cyrus/pop3s[6102]: executed
Oct 15 02:39:31 host cyrus/pop3s[6102]: accepted connection

... and that's about it, nothing else is logged about the stuck process.
As can be seen the process gets stuck just after it has been created, so
-U 1 can not help.

OTOH there are a lot of messages like the following:

Oct 16 14:13:10 host cyrus/master[26136]: about to exec 
/usr/local/cyrus/sbin/pop3d
Oct 16 14:13:10 host cyrus/pop3s[26136]: executed
Oct 16 14:13:10 host cyrus/pop3s[26136]: accepted connection
Oct 16 14:13:10 host cyrus/pop3s[26136]: pop3s failed: 
[XX.XXX.XX.XXX]
Oct 16 14:13:10 host cyrus/pop3s[26136]: Fatal error: tls_start_servertls() 
failed
Oct 16 14:13:10 host cyrus/master[15923]: process 26136 exited, status 75
Oct 16 14:13:10 host cyrus/master[15923]: service pop3s pid 26136 in BUSY 
state: terminated abnormally

Any idea what's causing that?

Gabor

-- 
 -
 MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
 -


RE: Collaboration replacement via Toltec/Bynari (was How many people to admin a Cyrus system?)

2007-11-16 Thread Ian Eiloart



--On 15 November 2007 22:15:32 +0200 Joon Radley [EMAIL PROTECTED] wrote:

 Hi Ian,

 Cyrus Mailstore does handle final delivery, but there's plenty of
 opportunity to handle messages before that point. For example, we now
 use Exim and Cyrus Mailstore, and we have plenty of processing going on
 in Exim before hand off to Cyrus (with LMTP) including spamassassin,
 clamav and Exim filters. There are also processes between the two, for
 example Mailman.

 Very true, but it does not do the processing needed by Outlook. It cannot
 convert iTip and winmail.dat attachments to the related message objects
 and do the linking in the Outlook message store. This is where you need
 the transport mechanism of Outlook.


So, the problem has nothing to do with IMAP, and everything to do with 
message handling before delivery to the mailbox.

-- 
Ian Eiloart
IT Services, University of Sussex
x3148


Re: One more attempt: stuck processes

2007-11-16 Thread Gabor Gombas

On Fri, Nov 16, 2007 at 06:11:01PM +0100, Sebastian Hagedorn wrote:

 Well, that just sounds like you're running out of entropy. That's a 
 different issue. Recompile your cyrus-sasl to use /dev/urandom instead of 
 /dev/random or disable apop in /etc/imapd.conf:

Debian has used /dev/urandom for a long time:

# strings /usr/lib/libsasl2.so.2 | grep random
/dev/urandom

And according to the logs I have, after a pop3 process got stuck other
IMAP users can still log in using TLS+PLAIN, so entropy can be ruled
out.

Gabor

-- 
 -
 MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
 -


Re: One more attempt: stuck processes

2007-11-16 Thread Ken Murchison

Sebastian Hagedorn wrote:
 --On 16. November 2007 12:39:28 -0500 Ken Murchison 
 [EMAIL PROTECTED] wrote:
 
 Sorry, my patch wasn't complete.  It wasn't logging the value that I
 wanted.
 
 OK:
 
 Nov 16 18:48:17 lvr13 pop3s[1385]: SSL_read() returned 0:5
 Nov 16 18:48:33 lvr13 pop3s[1375]: SSL_read() returned 0:5
 Nov 16 18:48:50 lvr13 pop3s[1980]: SSL_read() returned 0:6
 Nov 16 18:48:54 lvr13 pop3s[1376]: SSL_read() returned 0:5
 Nov 16 18:49:03 lvr13 pop3s[1375]: SSL_read() returned 0:5
 Nov 16 18:49:11 lvr13 pop3s[1375]: SSL_read() returned 0:5
 Nov 16 18:49:38 lvr13 pop3s[1375]: SSL_read() returned 0:5
 Nov 16 18:49:54 lvr13 pop3s[1404]: SSL_read() returned 0:5
 
 I'm guessing that's still not enough:
 
 #define SSL_ERROR_SYSCALL   5 /* look at error stack/return value/errno */
 #define SSL_ERROR_ZERO_RETURN   6
 
   SSL_ERROR_SYSCALL
   Some I/O error occurred.  The OpenSSL error queue may contain more
   information on the error.  If the error queue is empty (i.e.
   ERR_get_error() returns 0), ret can be used to find out more about
   the error: If ret == 0, an EOF was observed that violates the
   protocol.  If ret == -1, the underlying BIO reported an I/O error (for
   socket I/O on Unix systems, consult errno for details).
 
 So should I add a call to ERR_get_error()?


Not yet.  I'm assuming that none of these processes has hung.  We're 
getting an I/O error most likely because the client has closed the 
connection immediately after sending QUIT.  This is harmless.

What I really want to see is if we get a SSL_ERROR_WANT_xxx return code 
when we're hung.
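
For readers decoding the "n:err" pairs in the log lines above, here is a small sketch that classifies an SSL_read() return value together with its SSL_get_error() code. The numeric values are the ones defined in OpenSSL's ssl.h; they are repeated here with a DEMO_ prefix only so the sketch compiles without the OpenSSL headers. The function names and the result strings are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Same numeric values as the SSL_ERROR_* macros in <openssl/ssl.h>. */
enum {
    DEMO_SSL_ERROR_NONE             = 0,
    DEMO_SSL_ERROR_SSL              = 1,
    DEMO_SSL_ERROR_WANT_READ        = 2,
    DEMO_SSL_ERROR_WANT_WRITE       = 3,
    DEMO_SSL_ERROR_WANT_X509_LOOKUP = 4,
    DEMO_SSL_ERROR_SYSCALL          = 5,
    DEMO_SSL_ERROR_ZERO_RETURN      = 6,
    DEMO_SSL_ERROR_WANT_CONNECT     = 7,
    DEMO_SSL_ERROR_WANT_ACCEPT      = 8
};

/* Returns 1 for the SSL_ERROR_WANT_xxx family, 0 otherwise. */
int is_want_code(int err)
{
    return err == DEMO_SSL_ERROR_WANT_READ ||
           err == DEMO_SSL_ERROR_WANT_WRITE ||
           err == DEMO_SSL_ERROR_WANT_X509_LOOKUP ||
           err == DEMO_SSL_ERROR_WANT_CONNECT ||
           err == DEMO_SSL_ERROR_WANT_ACCEPT;
}

/* Classify a failed SSL_read(): ret is its return value (<= 0),
 * err is what SSL_get_error() reported for it. The SYSCALL cases
 * assume the OpenSSL error queue was empty. */
const char *classify_ssl_read(int ret, int err)
{
    assert(ret <= 0);
    if (is_want_code(err))
        return "retry";              /* operation did not complete */
    if (err == DEMO_SSL_ERROR_ZERO_RETURN)
        return "clean-close";        /* peer closed the TLS session */
    if (err == DEMO_SSL_ERROR_SYSCALL)
        return ret == 0 ? "abrupt-eof" : "io-error";
    if (err == DEMO_SSL_ERROR_SSL)
        return "protocol-error";
    return "unknown";
}
```

Read this way, "0:5" is an EOF without a TLS shutdown and "0:6" is a clean TLS shutdown; neither belongs to the WANT_xxx family.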

-- 
Kenneth Murchison
Systems Programmer
Project Cyrus Developer/Maintainer
Carnegie Mellon University


Re: One more attempt: stuck processes

2007-11-16 Thread Alain Spineux

On Nov 16, 2007 6:11 PM, Sebastian Hagedorn [EMAIL PROTECTED] wrote:
 --On 16. November 2007 18:07:51 +0100 Gabor Gombas [EMAIL PROTECTED]
 wrote:

  Hm, we don't suffer any actual slowdown, it's just that the number of
  processes increases over time.
 
  It's not a slowdown - the client connects, and hangs. It never even gets
  to the authentication phase (at least it's not logged). Clients that
  happen to connect to a non-affected process run normally.

 Well, that just sounds like you're running out of entropy. That's a
 different issue. Recompile your cyrus-sasl to use /dev/urandom instead of
 /dev/random or disable apop in /etc/imapd.conf:

 allowapop: 0

 Either of those things should get rid of that.

The quick and dirty way to do this, for testing only, is:

# ls -l /dev/*random
crw-rw-rw- 1 root root 1, 8 Nov 16 06:18 /dev/random
cr--r--r-- 1 root root 1, 9 Nov  7 22:47 /dev/urandom

# mv /dev/random /dev/random.orig
# ln -sf /dev/urandom /dev/random

And then, because leaving /dev/random that way for too long is insecure:

# rm -f /dev/random
# mv /dev/random.orig /dev/random

This avoids recompiling the source just for testing.




 --
  .:.Sebastian Hagedorn - RZKR-R1 (Gebäude 52), Zimmer 18.:.
 Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
 .:.Universität zu Köln / Cologne University - ✆ +49-221-478-5587.:.
.:.:.:.Skype: shagedorn.:.:.:.
 




-- 
Alain Spineux
aspineux gmail com
May the sources be with you


Re: One more attempt: stuck processes

2007-11-16 Thread Gary Mills

On Fri, Nov 16, 2007 at 05:13:13PM +0100, Sebastian Hagedorn wrote:
 --On 16. November 2007 14:23:17 +0100 Simon Matter [EMAIL PROTECTED] 
 wrote:
 
 Did you ever see non SSL connections get stuck?
 
 No.

Most of mine are `pop3d -s', but I have seen a few without the `-s'.
When I did a stack trace on one, it also turned out to be for an SSL
session.  So, I have to agree.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-


Re: One more attempt: stuck processes

2007-11-16 Thread Sebastian Hagedorn


OK, now I got this:

Nov 16 18:37:06 lvr13 pop3s[23089]: SSL_read() returned -1

But that process terminated normally.
--
.:.Sebastian Hagedorn - RZKR-R1 (Gebäude 52), Zimmer 18.:.
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
.:.Universität zu Köln / Cologne University - ✆ +49-221-478-5587.:.
  .:.:.:.Skype: shagedorn.:.:.:.


Re: LARGE single-system Cyrus installs?

2007-11-16 Thread Ian G Batten



On 15 Nov 07, at 1504, Michael Bacon wrote:

Interesting thought.  We haven't gone to ZFS yet, although I like the idea
a lot.  My hunch is it's an enormous win for the mailbox partitions, but
perhaps it's not a good thing for the meta partition.  I'll have to let
someone else who knows more about ZFS and write speeds vs. read speeds
chime in here.


We're finding it a real win for the meta-partition.  We're handling
~1000 users on a 2-way stripe by two-way mirror on the internal disks
in a T2000 for the meta-data, with the message data coming in over
NFS.  We do see a few spikes of write operations (this is one
instance from zpool iostat -v 1):


                 capacity     operations    bandwidth
pool           used  avail   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
pool1         52.1G  25.9G      4    657  3.96K  3.71M
  mirror      26.0G  13.0G      4    354  3.96K  1.42M
    c0t0d0s4      -      -      0    135      0  1.42M
    c0t1d0s4      -      -      0    126  63.4K  1.42M
  mirror      26.0G  13.0G      0    302      0  2.29M
    c0t2d0s4      -      -      0    112      0  2.29M
    c0t3d0s4      -      -      0    109      0  2.29M
------------  -----  -----  -----  -----  -----  -----


but it's showing no signs at all of being IO bound on the metadata.
The spikes are really just spikes for a second: the typical level is  
about 10 ops / disk / sec.


ian



Re: One more attempt: stuck processes

2007-11-16 Thread Alain Spineux

On Nov 16, 2007 12:36 PM, Sebastian Hagedorn [EMAIL PROTECTED] wrote:
 --On 16. November 2007 11:27:09 +0100 Sebastian Hagedorn
 [EMAIL PROTECTED] wrote:

  1) Since it only happens on dialup connections, could it be that the
  dialin router at the providers end sends TCP/RST when a client hangs up
  and those packets are filtered somewhere, maybe on your firewall?
 
  OK, let's run with that one.
 
  a) We don't really have a firewall, we only use ACLs on the Cisco
  routers. You can't even filter TCP/RST there.
 
  b) Even *if* a TCP/RST had been dropped, lost or whatever, the server
  *still* should timeout eventually!

 I just had a discussion with a colleague regarding this. He made two
 observations:

 1. In the absence of the SO_KEEPALIVE option it is entirely possible that a
 TCP connection remains ESTABLISHED even when the other side has gone.

I said that the socket should time out, but that is true only when the
protocol (TCP here) is waiting for a response (usually an ACK) or during
connection establishment. Otherwise the connection stays open indefinitely
until something happens. A router doing NAT can drop a connection that has
been idle for too long, because it has to maintain a NAT table and clean it
up from time to time; this is where KEEPALIVE becomes useful.


 This may not be a solution to this particular problem, but it made me
 wonder why Cyrus does *not* use SO_KEEPALIVE. Is there a downside to it?

Cyrus already has a built-in timeout, so it seems a little contradictory to
actively keep the connection alive until Cyrus drops it itself!
It is the client's job to actively keep the connection alive, if it
wants to.
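
For reference, turning on SO_KEEPALIVE is a single setsockopt() call on the connected socket. The sketch below shows that call and a matching check; it is an illustration only, and enable_keepalive() and keepalive_enabled() are hypothetical names, not Cyrus functions. Keep in mind that the idle time before the first probe is a system-wide default (commonly around two hours) unless tuned, so this is not a replacement for an application-level timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to probe an idle TCP connection.
 * Returns 0 on success, -1 on failure (errno set by setsockopt). */
int enable_keepalive(int fd)
{
    int on = 1;

    assert(fd >= 0);
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
}

/* Returns 1 if SO_KEEPALIVE is set on fd, 0 if not, -1 on error. */
int keepalive_enabled(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, &len) == -1)
        return -1;
    return val != 0;
}
```

With keepalive enabled, a peer that vanished without sending FIN or RST is eventually detected by the kernel, and a blocked read() on that socket then returns an error instead of waiting forever.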


 2. The stack trace looks garbled:

 (gdb) bt
 #0  0x0079f41e in __read_nocancel () from /lib/tls/libc.so.6
 #1  0x00d0b2f7 in BIO_new_socket () from /lib/libcrypto.so.4
 #2  0x00d092b2 in BIO_read () from /lib/libcrypto.so.4
 #3  0x005dae13 in ssl23_read_bytes () from /lib/libssl.so.4
 #4  0x005d9c51 in ssl23_get_client_hello () from /lib/libssl.so.4
 #5  0x005d9712 in ssl23_accept () from /lib/libssl.so.4
 #6  0x005ddc9a in SSL_accept () from /lib/libssl.so.4
 #7  0x08052cb3 in shut_down ()
 #8  0x0804e513 in shut_down ()
 #9  0x0804d58c in ?? ()
 #10 0x0001 in ?? ()
 #11 0x082ee848 in ?? ()
 #12 0x in ?? ()

 He suggested that the trace is unreliable. Perhaps a bug in RHEL 3's
 version of OpenSSL messes up the stack. That would also explain why nobody
 else seems to have this problem.

 I think I will try one more approach: I reverted cyrus.conf to not use -U
 1 anymore, so that processes should be reused. I will strace one of the
 pop3d processes in the hope that it gets stuck. That way I should be able
 to see where things go wrong. If the process terminates normally I will try
 with another one. If that doesn't go anywhere, I guess I'll drop this

You could try replacing imapd with a home-made wrapper script, something
like this (use the full path to imapd_ if it is not in PATH):

mv imapd imapd_
printf '#!/bin/sh\nexec strace -o /tmp/imapd.$$ imapd_ "$@"\n' > imapd
chmod a+x imapd


 investigation. We will upgrade to RHEL 5 some time next year, so hopefully
 that will bring new bugs :-)

Sorry, but I don't understand what you are complaining about!
Is it because the IMAP or POP client is losing its connection and this
disturbs the user, or just because you are getting some sleeping
processes? Or both :-)

Do you have a timeout option in your imapd.conf to force the
IMAP/POP server to autologout?


Regards.

Alain

 --
  .:.Sebastian Hagedorn - RZKR-R1 (Gebäude 52), Zimmer 18.:.
 Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
 .:.Universität zu Köln / Cologne University - ✆ +49-221-478-5587.:.
.:.:.:.Skype: shagedorn.:.:.:.
 




-- 
Alain Spineux
aspineux gmail com
May the sources be with you


Re: One more attempt: stuck processes

2007-11-16 Thread Sebastian Hagedorn

--On 16. November 2007 14:23:17 +0100 Simon Matter [EMAIL PROTECTED] 
wrote:



Did you ever see non SSL connections get stuck?


No.
--
.:.Sebastian Hagedorn - RZKR-R1 (Gebäude 52), Zimmer 18.:.
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
.:.Universität zu Köln / Cologne University - ✆ +49-221-478-5587.:.
  .:.:.:.Skype: shagedorn.:.:.:.


Re: LARGE single-system Cyrus installs?

2007-11-16 Thread Ken Murchison

Dale Ghent wrote:
 On Nov 16, 2007, at 1:39 AM, Pascal Gienger wrote:
 
 Solaris 10 does this in my case. Via dtrace you'll see that open()  
 on the
 mailboxes.db and read-calls do not exceed microsecond ranges.  
 mailboxes.db
 is not the problem here. It is entirely cached and rarely written
 (creating, deleting and moving a mailbox).
 
 
 Hmm, I'm wondering if the Cyrus devs would be receptive to the idea of  
 implementing some dtrace probes in Cyrus.
 
 Stuff such as mailbox open/close, IMAP operations such as SELECTs,  
 message retrievals, and so on.

We'd probably accept a patch, as long as it's portable.

-- 
Kenneth Murchison
Systems Programmer
Project Cyrus Developer/Maintainer
Carnegie Mellon University


Re: One more attempt: stuck processes

2007-11-16 Thread Sebastian Hagedorn

--On 16. November 2007 11:27:52 -0500 Ken Murchison [EMAIL PROTECTED] 
wrote:



Sebastian Hagedorn wrote:


The only reason I could imagine for the sequence of calls was signal
handling. But let's be methodical. There's only one spot where
SSL_accept() is called: in tls_start_servertls(). In pop3d.c that's only
called in cmd_starttls(). That in turn is called either in cmdloop (for
handling of STLS) or in service_main() for connections to port 995.


Actually, now that I think about it, I believe SSL_accept() can be called
from SSL_read() at any time if a renegotiation is required. Since
shut_down() calls prot_fill(), which in turn can call SSL_read(), it's
possible that we can get an SSL_accept() call.  Before I start hacking
code, can you apply the following patch (sorry about the line breaks) and
see if I'm heading in the right direction?  Let me know if you get any of
the WARNING messages in your logs.


--- prot.c.~1.93.~  2007-11-16 11:21:56.0 -0500
+++ prot.c  2007-11-16 11:23:32.0 -0500
@@ -468,6 +468,7 @@
        /* just do a SSL read instead if we're under a tls layer */
        if (s->tls_conn != NULL) {
            n = SSL_read(s->tls_conn, (char *) s->buf, PROT_BUFSIZE);
+           if (n <= 0) syslog(LOG_WARNING, "SSL_read() returned %d", n);
        } else {
            n = read(s->fd, s->buf, PROT_BUFSIZE);
        }


Yes, I do:

Nov 16 17:59:34 lvr13 pop3s[3196]: SSL_read() returned 0
Nov 16 17:59:38 lvr13 pop3s[3196]: SSL_read() returned 0
Nov 16 18:00:09 lvr13 pop3s[3215]: SSL_read() returned 0
Nov 16 18:00:26 lvr13 pop3s[3847]: SSL_read() returned 0
Nov 16 18:00:34 lvr13 pop3s[3215]: SSL_read() returned 0
Nov 16 18:00:34 lvr13 pop3s[3199]: SSL_read() returned 0
Nov 16 18:00:39 lvr13 pop3s[3199]: SSL_read() returned 0
Nov 16 18:00:43 lvr13 pop3s[3229]: SSL_read() returned 0

Not all of these processes are stuck, though. (Maybe none are). Should I be 
looking for something specific?

--
.:.Sebastian Hagedorn - RZKR-R1 (Gebäude 52), Zimmer 18.:.
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
.:.Universität zu Köln / Cologne University - ✆ +49-221-478-5587.:.
  .:.:.:.Skype: shagedorn.:.:.:.


Re: One more attempt: stuck processes

2007-11-16 Thread Ken Murchison

Sebastian Hagedorn wrote:

 Nov 16 18:00:26 lvr13 pop3s[3847]: SSL_read() returned 0
 Nov 16 18:00:34 lvr13 pop3s[3215]: SSL_read() returned 0
 Nov 16 18:00:34 lvr13 pop3s[3199]: SSL_read() returned 0
 Nov 16 18:00:39 lvr13 pop3s[3199]: SSL_read() returned 0
 Nov 16 18:00:43 lvr13 pop3s[3229]: SSL_read() returned 0
 
 Not all of these processes are stuck, though. (Maybe none are). Should I 
 be looking for something specific?

Sorry, my patch wasn't complete.  It wasn't logging the value that I 
wanted.  Try this:

--- prot.c.~1.93.~  2007-11-16 11:21:56.0 -0500
+++ prot.c  2007-11-16 12:37:55.0 -0500
@@ -468,6 +468,10 @@
        /* just do a SSL read instead if we're under a tls layer */
        if (s->tls_conn != NULL) {
            n = SSL_read(s->tls_conn, (char *) s->buf, PROT_BUFSIZE);
+           if (n <= 0) {
+               syslog(LOG_WARNING, "SSL_read() returned %d:%d",
+                      n, SSL_get_error(s->tls_conn, n));
+           }
        } else {
            n = read(s->fd, s->buf, PROT_BUFSIZE);
        }


-- 
Kenneth Murchison
Systems Programmer
Project Cyrus Developer/Maintainer
Carnegie Mellon University


Re: One more attempt: stuck processes

2007-11-16 Thread Sebastian Hagedorn

--On 16. November 2007 12:39:28 -0500 Ken Murchison [EMAIL PROTECTED] 
wrote:



Sorry, my patch wasn't complete.  It wasn't logging the value that I
wanted.


OK:

Nov 16 18:48:17 lvr13 pop3s[1385]: SSL_read() returned 0:5
Nov 16 18:48:33 lvr13 pop3s[1375]: SSL_read() returned 0:5
Nov 16 18:48:50 lvr13 pop3s[1980]: SSL_read() returned 0:6
Nov 16 18:48:54 lvr13 pop3s[1376]: SSL_read() returned 0:5
Nov 16 18:49:03 lvr13 pop3s[1375]: SSL_read() returned 0:5
Nov 16 18:49:11 lvr13 pop3s[1375]: SSL_read() returned 0:5
Nov 16 18:49:38 lvr13 pop3s[1375]: SSL_read() returned 0:5
Nov 16 18:49:54 lvr13 pop3s[1404]: SSL_read() returned 0:5

I'm guessing that's still not enough:

#define SSL_ERROR_SYSCALL   5 /* look at error stack/return value/errno */
#define SSL_ERROR_ZERO_RETURN   6

  SSL_ERROR_SYSCALL
  Some I/O error occurred.  The OpenSSL error queue may contain more
  information on the error.  If the error queue is empty (i.e.
  ERR_get_error() returns 0), ret can be used to find out more about
  the error: If ret == 0, an EOF was observed that violates the
  protocol.  If ret == -1, the underlying BIO reported an I/O error (for
  socket I/O on Unix systems, consult errno for details).

So should I add a call to ERR_get_error()?
--
.:.Sebastian Hagedorn - RZKR-R1 (Gebäude 52), Zimmer 18.:.
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
.:.Universität zu Köln / Cologne University - ✆ +49-221-478-5587.:.
  .:.:.:.Skype: shagedorn.:.:.:.


Re: One more attempt: stuck processes

2007-11-16 Thread Alain Spineux

Hi

Can I summarize the problem as follows?

The server is blocked in a read, waiting for the client's next command
(this is normal; 99% of the processes are in this state). But the
autologout procedure is not working!

This means that the SIGALRM that should wake the process up either never
arrives or is not handled properly! A simple call to sleep() or signal()
could disturb this. If this happens only when using SSL, maybe the problem
is there and the alarm has to be re-armed somewhere.
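
To illustrate the mechanism, here is a minimal sketch of an alarm-based read timeout. It is not the Cyrus code; read_with_alarm() is a hypothetical helper. The detail that matters is that the handler is installed with sigaction() and without SA_RESTART. If the handler is installed in a way that restarts system calls (signal() does this on some platforms), the blocked read() is simply resumed after the signal and never returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* The handler only needs to exist; its delivery interrupts read(). */
static void on_alarm(int sig)
{
    (void)sig;
}

/* Hypothetical helper: read() that gives up after `seconds`.
 * On timeout it returns -1 with errno == EINTR.
 * Caveat: if the alarm fired before read() was entered, the read
 * would still block; acceptable for a sketch with timeouts >= 1s. */
ssize_t read_with_alarm(int fd, void *buf, size_t len, unsigned seconds)
{
    struct sigaction sa, old;
    ssize_t n;
    int saved;

    assert(seconds > 0);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                 /* deliberately no SA_RESTART */
    if (sigaction(SIGALRM, &sa, &old) == -1)
        return -1;

    alarm(seconds);
    n = read(fd, buf, len);
    saved = errno;
    alarm(0);                        /* cancel a still-pending alarm */
    sigaction(SIGALRM, &old, NULL);  /* restore the previous handler */
    errno = saved;
    return n;
}
```

Any library call made between arming the alarm and the blocking read that replaces the SIGALRM handler or cancels the alarm would defeat this scheme, which is the kind of interference suspected above.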

This is of no use right now, but the files in $cyrus_imap/proc/* contain the
user and the selected mailbox of each of these processes. That could help to
find out whether it is always the same user who triggers the problem, for
example because he is using an old Outlook or something.

Regards

On Nov 16, 2007 5:33 PM, Ken Murchison [EMAIL PROTECTED] wrote:

 Sebastian Hagedorn wrote:
  --On 16. November 2007 09:37:42 -0600 Gary Mills [EMAIL PROTECTED]
  wrote:
 
  Could you get a stack trace? If you have gdb you just call it with gdb
  -p  19175. Then you can do bt at the prompt. I forget how to do it
  with  Sun's debugger.
 
  Easy:
 
# pstack 19175
19175:  pop3d -s
 fef9f810 read (0, 2316f0, 5)
 fee1d2d0 read (0, 2316f0, 5, 0, 0, 0) + 5c
 ff06bb38 sock_read (1f0860, 2316f0, 5, 5, 0, 0) + 24
 ff068af0 BIO_read (1f0860, 2316f0, 5, fef98b84, 0, 0) + 110
 ff278488 ssl3_read_n (212798, 5, 8805, 0, 0, 203958) + 174
 ff2785fc ssl3_get_record (204ce0, 8000, 8400, 4400, f1, f0) + d0
 ff279424 ssl3_read_bytes (212798, 1000, 2000, 4, 0, ffbfe731) + 228
 ff27a99c ssl3_get_message (ff2a259c, 2070a0, 0, , 19000, ffbfe7a0) + d0
 ff27042c ssl3_accept (2150, 2160, 2180, 21e0, 2110, 2122) + 904
 ff27bd2c ssl23_get_client_hello (2316fb, 6c, 6c, 4, fe79, 0) + 828
 ff27b4b4 ssl23_accept (4000, 2000, 0, 0, 0, 0) + 2a4
 00032d00 tls_start_servertls (0, 1, ffbfee24, ffbfee20, 1849a8, ff00) + 198
 0002c504 cmd_starttls (1, 1fd8b8, 0, 0, 0, 0) + 184
 0002a638 service_main (2, 192198, ffbffce0, 1aec4, 3508c, 1) + 488
 00035250 main (2, ffbffcd4, ffbffce0, 17c400, 0, 0) + e18
 00029298 _start   (0, 0, 0, 0, 0, 0) + 108
 
  Thanks, that looks like progress! That stack trace looks similar enough
  to the one I'm seeing that I could imagine that it is what I *should* be
  seeing if the stack weren't garbled. Of course that's only speculation.
 
  Ken, is it possible that the call to SSL_accept() in
  tls_start_servertls() blocks when the client goes away? That could
  explain everything 

 Yes.  Gary's problem might be very similar to yours, depending on what I
 see from the patch that I just sent you.

 --
 Kenneth Murchison
 Systems Programmer
 Project Cyrus Developer/Maintainer
 Carnegie Mellon University
 





-- 
Alain Spineux
aspineux gmail com
May the sources be with you


Re: One more attempt: stuck processes

2007-11-16 Thread Sebastian Hagedorn

--On 16. November 2007 18:21:21 +0100 Gabor Gombas [EMAIL PROTECTED] 
wrote:



On Fri, Nov 16, 2007 at 06:11:01PM +0100, Sebastian Hagedorn wrote:


Well, that just sounds like you're running out of entropy. That's a
different issue. Recompile your cyrus-sasl to use /dev/urandom instead
of  /dev/random or disable apop in /etc/imapd.conf:


Debian uses /dev/urandom for a long time:

# strings /usr/lib/libsasl2.so.2 | grep random
/dev/urandom

And according to the logs I have, after a pop3 process got stuck other
IMAP users can still log in using TLS+PLAIN, so entropy can be ruled
out.


OK. Still the symptom seems to be different from what I'm seeing. Could it 
be that you have a process limit in /etc/cyrus.conf?

--
.:.Sebastian Hagedorn - RZKR-R1 (Gebäude 52), Zimmer 18.:.
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
.:.Universität zu Köln / Cologne University - ✆ +49-221-478-5587.:.
  .:.:.:.Skype: shagedorn.:.:.:.


Re: One more attempt: stuck processes

2007-11-16 Thread Sebastian Hagedorn

-- Ken Murchison [EMAIL PROTECTED] is rumored to have mumbled on 16. 
November 2007 12:58:49 -0500 regarding Re: One more attempt: stuck 
processes:



So should I add a call to ERR_get_error()?



Not yet.  I'm assuming that none of these processes has hung.  We're
getting an I/O error most likely because the client has closed the
connection immediately after sending QUIT.  This is harmless.

What I really want to see is if we get a SSL_ERROR_WANT_xxx return code
when we're hung.


I have both good and bad news. Bad news first: there is a stuck process 
that did *not* log that SSL_read line.


Good news: the binary I'm running now isn't stripped and has much more 
detail in its stack trace:


(gdb) bt
#0  0x003d341e in __read_nocancel () from /lib/tls/libc.so.6
#1  0x0017f2f7 in BIO_new_socket () from /lib/libcrypto.so.4
#2  0x0017d2b2 in BIO_read () from /lib/libcrypto.so.4
#3  0x0089ec30 in ssl3_alert_code () from /lib/libssl.so.4
#4  0x0089edcc in ssl3_alert_code () from /lib/libssl.so.4
#5  0x008a00cf in ssl3_read_bytes () from /lib/libssl.so.4
#6  0x008a0ffc in ssl3_get_message () from /lib/libssl.so.4
#7  0x00896cab in ssl3_accept () from /lib/libssl.so.4
#8  0x00896944 in ssl3_accept () from /lib/libssl.so.4
#9  0x008a5c9a in SSL_accept () from /lib/libssl.so.4
#10 0x008a180d in ssl23_get_client_hello () from /lib/libssl.so.4
#11 0x008a1712 in ssl23_accept () from /lib/libssl.so.4
#12 0x008a5c9a in SSL_accept () from /lib/libssl.so.4
#13 0x08052cf3 in tls_start_servertls (readfd=-512, writefd=-512, layerbits=0xbfff7a78, authid=0xbfff7a74, ret=0x810bca0) at tls.c:803
#14 0x0804e553 in cmd_starttls (pop3s=1) at pop3d.c:1076
#15 0x0804d5cc in service_main (argc=2, argv=0x9e84008, envp=0xbfff9850) at pop3d.c:537
#16 0x08054550 in main (argc=2, argv=0x9, envp=0xbfff9850) at service.c:539

There's much less POP activity now, so I may have to wait until Monday for 
more results.

--
Sebastian Hagedorn - Postmaster - RZKR-R1 (Flachbau), Zimmer 18
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
Universität zu Köln / Cologne University - Tel. +49-221-478-5587


Re: LARGE single-system Cyrus installs?

2007-11-16 Thread Dale Ghent

On Nov 16, 2007, at 1:39 AM, Pascal Gienger wrote:

 Solaris 10 does this in my case. Via dtrace you'll see that open()  
 on the
 mailboxes.db and read-calls do not exceed microsecond ranges.  
 mailboxes.db
 is not the problem here. It is entirely cached and rarely written
 (creating, deleting and moving a mailbox).


Hmm, I'm wondering if the Cyrus devs would be receptive to the idea of  
implementing some dtrace probes in Cyrus.

Stuff such as mailbox open/close, IMAP operations such as SELECTs,  
message retrievals, and so on.

I run cyrus on my personal server now, so maybe I'll fool around with  
that idea.

/dale

--
Dale Ghent
Specialist, Storage and UNIX Systems
UMBC - Office of Information Technology
ECS 201 - x51705





Re: LARGE single-system Cyrus installs?

2007-11-16 Thread Wesley Craig

On 15 Nov 2007, at 18:25, Rob Mueller wrote:
 About 30% of all I/O is to mailboxes.db, most of which is read.  I
 haven't personally deployed a split-meta configuration, but I
 understand the meta files are similarly heavy I/O concentrators.

 That sounds odd.

Yeah, it's not right.  I was reading my iostat output backwards.  In  
fact, it's writes and presumably an artifact of having system logs on  
the same device as mailboxes.db.  Sorry for the confusion.

:wes


Re: One more attempt: stuck processes

2007-11-16 Thread Michael M. Rach

I know it has been asked before and may be redundant, but...  You
answered that cyrus-sasl is using /dev/urandom and should not run out of
entropy.  However, what about openssl itself?  It also uses random
numbers.  Perhaps, as a test, try renaming /dev/random and running
ln -s /dev/urandom /dev/random.
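
A less invasive check than swapping the device nodes is to see whether a read from /dev/random would block at this moment. The sketch below opens the device non-blocking and tries to read a few bytes. random_would_block() is a hypothetical helper, the 16-byte probe size is arbitrary, and the probe itself consumes a little entropy, so treat it as a diagnostic aid only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical diagnostic: returns 1 if reading from `path` would
 * block right now (no random data available), 0 if data could be
 * read, -1 on any other error (errno describes it). */
int random_would_block(const char *path)
{
    unsigned char buf[16];
    ssize_t n;
    int fd, saved;

    assert(path != NULL);
    fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd == -1)
        return -1;

    n = read(fd, buf, sizeof(buf));
    saved = errno;
    close(fd);

    if (n > 0)
        return 0;
    if (n == -1 && (saved == EAGAIN || saved == EWOULDBLOCK))
        return 1;
    errno = saved;
    return -1;
}
```

Calling random_would_block("/dev/random") while a process is stuck would show whether the entropy pool is empty at that point.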

Gary Mills wrote:
 On Fri, Nov 16, 2007 at 05:13:13PM +0100, Sebastian Hagedorn wrote:
   
 --On 16. November 2007 14:23:17 +0100 Simon Matter [EMAIL PROTECTED] 
 wrote:

 
 Did you ever see non SSL connections get stuck?
   
 No.
 

 Most of mine are `pop3d -s', but I have seen a few without the `-s'.
 When I did a stack trace on one, it also turned out to be for an SSL
 session.  So, I have to agree.

   


Help with xfermailbox

2007-11-16 Thread Dan White

I'm experiencing errors when attempting to transfer a mailbox 
from one backend to another in a murder environment.

This is my first try, so this could be due to misconfiguration.

I have three servers in my setup:

kaled.olp.net - MUPDATE master and frontend
gandalf.olp.net - backend #1
neo.olp.net - backend #2

When I issue the command xfermailbox user/9183641498 
neo.olp.net from gandalf, I receive the error:

gandalf.olp.net> xfer user/9183641498 neo.olp.net
xfermailbox: The remote Server(s) denied the operation

And in neo's (destination backend) logs, I see:
Nov 16 14:16:18 neo cyrus/imap[6183]: accepted connection
Nov 16 14:16:19 neo cyrus/imap[6183]: login: gandalf.olp.net 
[65.161.252.87] cyrus-gandalf.olp.net GSSAPI User logged in
Nov 16 14:16:19 neo cyrus/imap[6183]: kick_mupdate: can't connect 
to target: No such file or directory

Sometimes I also get (in addition to the "No such file or
directory" error):

Nov 16 13:44:57 neo cyrus/imap[6171]: decoding error: generic 
failure; SASL(-1): generic failure: , closing connection

The relevant portion of the code that generates this error 
appears to be in mupdate-client.c:

 strlcpy(buf, config_dir, sizeof(buf));
 strlcat(buf, FNAME_MUPDATE_TARGET_SOCK, sizeof(buf));
 memset((char *)&srvaddr, 0, sizeof(srvaddr));
 srvaddr.sun_family = AF_UNIX;
 strcpy(srvaddr.sun_path, buf);
 len = sizeof(srvaddr.sun_family) + strlen(srvaddr.sun_path) + 1;

 r = connect(s, (struct sockaddr *)&srvaddr, len);
 if (r == -1) {
     syslog(LOG_ERR, "kick_mupdate: can't connect to target: %m");
     goto done;
 }

FNAME_MUPDATE_TARGET_SOCK is defined in mupdate-client.h as:
#define FNAME_MUPDATE_TARGET_SOCK "/socket/mupdate.target"

I can't find any sockets named mupdate.target on neo (my 
destination backend).
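
One way to confirm this from outside Cyrus is a small probe that attempts the same kind of AF_UNIX connect and reports the resulting errno. The following is only a minimal sketch: `probe_unix_socket` is a hypothetical helper, not part of Cyrus, and the path to test is supplied by the caller. An ENOENT result would correspond to the "No such file or directory" in the log above.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper (not part of Cyrus): try to connect to a
 * UNIX-domain stream socket at `path`.
 * Returns 0 on success, otherwise the errno of the failing call. */
int probe_unix_socket(const char *path)
{
    struct sockaddr_un addr;
    int s, r, saved;

    /* Refuse paths that would not fit in sun_path. */
    if (strlen(path) >= sizeof(addr.sun_path))
        return ENAMETOOLONG;

    s = socket(AF_UNIX, SOCK_STREAM, 0);
    if (s == -1)
        return errno;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);   /* length checked above */

    r = connect(s, (struct sockaddr *)&addr, sizeof(addr));
    saved = (r == -1) ? errno : 0; /* keep errno before close() can change it */
    close(s);
    return saved;
}
```

Calling it with the configuration directory followed by /socket/mupdate.target would show whether anything is listening on that socket on neo.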

Relevant configurations can be found at:
http://support.olp.net/cyrus/kaled-imapd.conf
http://support.olp.net/cyrus/kaled-cyrus.conf
http://support.olp.net/cyrus/gandalf-imapd.conf
http://support.olp.net/cyrus/gandalf-cyrus.conf
http://support.olp.net/cyrus/neo-imapd.conf
http://support.olp.net/cyrus/neo-cyrus.conf

I'm running 2.3.10, with several Debian patches.

Thanks for any help,
-- 
Dan White [EMAIL PROTECTED]
BTC Broadband


Re: One more attempt: stuck processes

2007-11-16 Thread Michael Bacon

--On Friday, November 16, 2007 3:54 PM -0500 Ken Murchison 
[EMAIL PROTECTED] wrote:

 I've reproduced the former by telnetting to port 995 and doing nothing.
 I have been unable to reproduce the latter because as soon as I QUIT the
 telnet session or kill() the telnet process, pop3d exits gracefully.

I agree with others that you want to do something other than kill the 
telnet session, like unplugging the cable.  I've seen similar behavior out 
of cyrus that I thought could easily be people on laptops shutting their 
computers down hard, in some way that the TCP/IP stack never got a chance 
to clean up the connection properly.  With a QUIT or a kill, you're giving 
the OS a chance to do the right thing.
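
For the hard-disconnect case, one general mitigation is TCP keepalive, which lets the kernel eventually notice a peer that vanished without closing the connection. The sketch below shows only the socket option itself; `enable_keepalive` and `keepalive_enabled` are hypothetical names, and nothing here is meant to describe what Cyrus does or does not already set. On many systems the default idle time before the first probe is on the order of two hours, so this alone is not a quick fix.

```c
#include <sys/socket.h>

/* Enable TCP keepalive probes on socket `fd` so that the kernel
 * eventually reports a dead peer as an error on the socket.
 * Returns 0 on success, -1 on failure (errno set by setsockopt). */
int enable_keepalive(int fd)
{
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
}

/* Report whether keepalive is enabled: 1 = on, 0 = off, -1 = error. */
int keepalive_enabled(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, &len) == -1)
        return -1;
    return val != 0;
}
```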

-Michael


Bingo!

2007-11-16 Thread Sebastian Hagedorn

-- Sebastian Hagedorn [EMAIL PROTECTED] is rumored to have mumbled on 
16. November 2007 22:03:21 +0100 regarding Re: One more attempt: stuck 
processes:



The question is how pop3d knows that the connection is dropped. And maybe
that's really where dial-up comes into play. I don't know if you're in a
position to test that, but what happens if you telnet to port 995 from
dial-up and then drop the dial-up connection? I guess I might try that
from home now.


That does it ... I disconnected my cable modem while having an open telnet 
connection to 995. Now that process is stuck.

--
Sebastian Hagedorn - RZKR-R1 (Flachbau), Zi. 18, Robert-Koch-Str. 10
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
Universität zu Köln / Cologne University - Tel. +49-221-478-5587


Re: LARGE single-system Cyrus installs?

2007-11-16 Thread Ken Murchison

Dale Ghent wrote:
 On Nov 16, 2007, at 2:56 PM, Ken Murchison wrote:
 
 Dale Ghent wrote:
 On Nov 16, 2007, at 1:39 AM, Pascal Gienger wrote:
 Solaris 10 does this in my case. Via dtrace you'll see that open()  
 on the
 mailboxes.db and read-calls do not exceed microsecond ranges.  
 mailboxes.db
 is not the problem here. It is entirely cached and rarely written
 (creating, deleting and moving a mailbox).
 Hmm, I'm wondering if the Cyrus devs would be receptive to the idea 
 of  implementing some dtrace probes in Cyrus.
 Stuff such as mailbox open/close, IMAP operations such as SELECTs,  
 message retrievals, and so on.

 We'd probably accept a patch, as long as it's portable.
 
 
 Portable in what sense, exactly?
 
 Currently the only OSes which offer DTrace are OS X 10.5 and Solaris 10 
 (and Solaris Next), so would I be correct to assume that you mean that a 
 dtrace feature would have to work on those two OSes?

I don't care if it only works on Solaris 10, but the code can't get in 
the way of Cyrus compiling and running on any non-DTrace system.

-- 
Kenneth Murchison
Systems Programmer
Project Cyrus Developer/Maintainer
Carnegie Mellon University


Re: Bingo!

2007-11-16 Thread Sebastian Hagedorn

-- Ken Murchison [EMAIL PROTECTED] is rumored to have mumbled on 16. 
November 2007 16:29:20 -0500 regarding Re: Bingo!:



That does it ... I disconnected my cable modem while having an open
telnet connection to 995. Now that process is stuck.


Does the same thing happen if you telnet to port 110?


Actually yes - so far! But the stack trace and strace are instructive:

(gdb) bt
#0  0x006cf2e8 in ___newselect_nocancel () from /lib/tls/libc.so.6
#1  0x08073f76 in prot_fill (s=0x9f6bf48) at prot.c:439
#2  0x080757ad in prot_fgets (buf=0xbfff7a30 "quit", size=8191, s=0x9f6bf48) at prot.c:1196
#3  0x0804da6b in cmdloop () at pop3d.c:762
#4  0x0804d516 in service_main (argc=1, argv=0x9f1f008, envp=0xbfffb80c) at pop3d.c:543
#5  0x08054550 in main (argc=1, argv=0x9, envp=0xbfffb80c) at service.c:539

# strace -p 18432
Process 18432 attached - interrupt to quit
select(1, [0], NULL, NULL, {463, 9}

The select() will time out eventually, I'm sure. I'm currently waiting for 
that to happen.

--
Sebastian Hagedorn - Postmaster - RZKR-R1 (Flachbau), Zimmer 18
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
Universität zu Köln / Cologne University - Tel. +49-221-478-5587


Re: Bingo!

2007-11-16 Thread Sebastian Hagedorn

-- Sebastian Hagedorn [EMAIL PROTECTED] is rumored to have mumbled on 
16. November 2007 22:36:09 +0100 regarding Re: Bingo!:



The select() will time out eventually, I'm sure. I'm currently waiting
for that to happen.


Here we go:

# strace -p 18432
Process 18432 attached - interrupt to quit
select(1, [0], NULL, NULL, {463, 9}) = 0 (Timeout)
time(NULL)  = 1195249308
close(9)= 0
munmap(0xb47a4000, 4096)= 0
unlink("/var/lib/imap/proc/18432") = 0
munmap(0xb47a5000, 12214272)= 0
close(6)= 0
munmap(0xb41da000, 6070272) = 0
close(10)   = 0
munmap(0xb534b000, 32768)   = 0
munmap(0xb6953000, 2621440) = 0
munmap(0xb5353000, 23068672)= 0
munmap(0xb74a1000, 1318912) = 0
munmap(0xb735f000, 1318912) = 0
munmap(0xb721d000, 1318912) = 0
munmap(0xb70db000, 1318912) = 0
munmap(0xb6f99000, 1318912) = 0
munmap(0xb6e57000, 1318912) = 0
munmap(0xb6d15000, 1318912) = 0
munmap(0xb6bd3000, 1318912) = 0
munmap(0xb75f4000, 16384)   = 0
exit_group(0)   = ?

I suppose an alarm handler is in order?
--
Sebastian Hagedorn - RZKR-R1 (Flachbau), Zi. 18, Robert-Koch-Str. 10
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
Universität zu Köln / Cologne University - Tel. +49-221-478-5587


RE: Bingo!

2007-11-16 Thread Ken Murchison

It looks like it timed out properly, correct?

(from my phone)
-- 
Kenneth Murchison
Systems Programmer
Project Cyrus Developer/Maintainer
Carnegie Mellon University

-Original Message-
From: Sebastian Hagedorn [EMAIL PROTECTED]
To: Ken Murchison [EMAIL PROTECTED]
Cc: Postmaster Uni Köln [EMAIL PROTECTED]; Cyrus IMAP 
info-cyrus@lists.andrew.cmu.edu
Sent: 11/16/07 4:43 PM
Subject: Re: Bingo!

-- Sebastian Hagedorn [EMAIL PROTECTED] is rumored to have mumbled on 
16. November 2007 22:36:09 +0100 regarding Re: Bingo!:

 The select() will time out eventually, I'm sure. I'm currently waiting
 for that to happen.

Here we go:

# strace -p 18432
Process 18432 attached - interrupt to quit
select(1, [0], NULL, NULL, {463, 9}) = 0 (Timeout)
time(NULL)  = 1195249308
close(9)= 0
munmap(0xb47a4000, 4096)= 0
unlink("/var/lib/imap/proc/18432") = 0
munmap(0xb47a5000, 12214272)= 0
close(6)= 0
munmap(0xb41da000, 6070272) = 0
close(10)   = 0
munmap(0xb534b000, 32768)   = 0
munmap(0xb6953000, 2621440) = 0
munmap(0xb5353000, 23068672)= 0
munmap(0xb74a1000, 1318912) = 0
munmap(0xb735f000, 1318912) = 0
munmap(0xb721d000, 1318912) = 0
munmap(0xb70db000, 1318912) = 0
munmap(0xb6f99000, 1318912) = 0
munmap(0xb6e57000, 1318912) = 0
munmap(0xb6d15000, 1318912) = 0
munmap(0xb6bd3000, 1318912) = 0
munmap(0xb75f4000, 16384)   = 0
exit_group(0)   = ?

I suppose an alarm handler is in order?
--
Sebastian Hagedorn - RZKR-R1 (Flachbau), Zi. 18, Robert-Koch-Str. 10
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
Universität zu Köln / Cologne University - Tel. +49-221-478-5587



Re: One more attempt: stuck processes

2007-11-16 Thread Gary Mills

On Fri, Nov 16, 2007 at 03:54:50PM -0500, Ken Murchison wrote:
 
 That's exactly what Gary is seeing.  It's blocking in SSL_accept(). 
 Apparently the client connects to port 995, and then either sends 
 nothing, or goes away and leaves the socket open.
 
 I've reproduced the former by telnetting to port 995 and doing nothing. 
 I have been unable to reproduce the latter because as soon as I QUIT the 
 telnet session or kill() the telnet process, pop3d exits gracefully.

You probably have to reboot the client at that point, or just
disconnect the cable and take it home.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-


Re: One more attempt: stuck processes

2007-11-16 Thread Alain Spineux

On Nov 16, 2007 6:24 PM, Alain Spineux [EMAIL PROTECTED] wrote:
 Hi

 Can I summarize the problem as:

I was wrong.


 The server is blocked in a read, waiting for the client's next command
 (this is normal; 99% of the processes are in this state).

No, it is waiting in select(), and the select() has a timeout!

 But the autologout procedure is
 not working!

 This means the SIGALRM that should wake the process never arrives or is
 not handled properly! A simple call to sleep() or signal() could disturb
 this. If this happens only when using SSL, maybe the problem is here and
 the alarm should be re-armed somewhere.

Wrong, wrong! That could be right in principle, but here the timeout
appears to be implemented using select()!


 This is useless now, but the files in $cyrus_imap/proc/* contain the user
 and the selected mailbox of each of these processes. This could be useful
 for finding out whether it was always the same user at the origin of the
 problem, e.g. because he was using an old Outlook or something.

 Regards


 On Nov 16, 2007 5:33 PM, Ken Murchison [EMAIL PROTECTED] wrote:
 
  Sebastian Hagedorn wrote:
   --On 16. November 2007 09:37:42 -0600 Gary Mills [EMAIL PROTECTED]
   wrote:
  
   Could you get a stack trace? If you have gdb you just call it with gdb
   -p  19175. Then you can do bt at the prompt. I forget how to do it
   with  Sun's debugger.
  
   Easy:
  
 # pstack 19175
 19175:  pop3d -s
  fef9f810 read (0, 2316f0, 5)
  fee1d2d0 read (0, 2316f0, 5, 0, 0, 0) + 5c
  ff06bb38 sock_read (1f0860, 2316f0, 5, 5, 0, 0) + 24
  ff068af0 BIO_read (1f0860, 2316f0, 5, fef98b84, 0, 0) + 110
  ff278488 ssl3_read_n (212798, 5, 8805, 0, 0, 203958) + 174
  ff2785fc ssl3_get_record (204ce0, 8000, 8400, 4400, f1, f0) + d0
  ff279424 ssl3_read_bytes (212798, 1000, 2000, 4, 0, ffbfe731) + 228
  ff27a99c ssl3_get_message (ff2a259c, 2070a0, 0, , 19000, ffbfe7a0) + d0
  ff27042c ssl3_accept (2150, 2160, 2180, 21e0, 2110, 2122) + 904
  ff27bd2c ssl23_get_client_hello (2316fb, 6c, 6c, 4, fe79, 0) + 828
  ff27b4b4 ssl23_accept (4000, 2000, 0, 0, 0, 0) + 2a4
  00032d00 tls_start_servertls (0, 1, ffbfee24, ffbfee20, 1849a8, ff00) + 198
  0002c504 cmd_starttls (1, 1fd8b8, 0, 0, 0, 0) + 184
  0002a638 service_main (2, 192198, ffbffce0, 1aec4, 3508c, 1) + 488
  00035250 main (2, ffbffcd4, ffbffce0, 17c400, 0, 0) + e18
  00029298 _start   (0, 0, 0, 0, 0, 0) + 108
  
   Thanks, that looks like progress! That stack trace looks similar enough
   to the one I'm seeing that I could imagine that it is what I *should* be
   seeing if the stack weren't garbled. Of course that's only speculation.
  
   Ken, is it possible that the call to SSL_accept() in
   tls_start_servertls() blocks when the client goes away? That could
   explain everything 
 
  Yes.  Gary's problem might be very similar to yours, depending on what I
  see from the patch that I just sent you.
 
  --
  Kenneth Murchison
  Systems Programmer
  Project Cyrus Developer/Maintainer
  Carnegie Mellon University
  
 
 




 --
 Alain Spineux
 aspineux gmail com
 May the sources be with you




-- 
Alain Spineux
aspineux gmail com
May the sources be with you


building cyrus 2.3.10 on x86_64

2007-11-16 Thread Andrew Morgan

I'm trying to build Cyrus 2.3.10 on an x86_64 box running Debian Etch 
(stable).  The first problem I ran into was out-of-date config.guess and 
config.sub scripts in the 2.3.10 tarball.  I grabbed the latest copies of 
those files from the GNU website:

   http://cvs.savannah.gnu.org/viewvc/*checkout*/config/config/config.guess
   http://cvs.savannah.gnu.org/viewvc/*checkout*/config/config/config.sub

Could we get those added to the next tarball release?


Now, I'm getting an error during the make process when the Cyrus perl bits 
are compiled:

-
### Making all in /private/src/cyrus-imapd-2.3.10/perl/imap
Checking if your kit is complete...
Looks good
Writing Makefile for Cyrus::IMAP
make[2]: Entering directory `/private/src/cyrus-imapd-2.3.10/perl/imap'
cp IMAP/Admin.pm blib/lib/Cyrus/IMAP/Admin.pm
cp IMAP.pm blib/lib/Cyrus/IMAP.pm
cp IMAP/Shell.pm blib/lib/Cyrus/IMAP/Shell.pm
cp IMAP/IMSP.pm blib/lib/Cyrus/IMAP/IMSP.pm
/usr/bin/perl /usr/share/perl/5.8/ExtUtils/xsubpp  -typemap 
/usr/share/perl/5.8/ExtUtils/typemap -typemap typemap  IMAP.xs > IMAP.xsc 
&& mv IMAP.xsc IMAP.c
cc -c  -I../../lib -I../.. -I../../com_err/et   -D_REENTRANT -D_GNU_SOURCE 
-DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe 
-I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 
-DVERSION=\"1.00\" -DXS_VERSION=\"1.00\" -fPIC -I/usr/lib/perl/5.8/CORE 
-DPERL_POLLUTE IMAP.c
Running Mkbootstrap for Cyrus::IMAP ()
chmod 644 IMAP.bs
rm -f blib/arch/auto/Cyrus/IMAP/IMAP.so
cc  -shared -L/usr/local/lib IMAP.o  -o blib/arch/auto/Cyrus/IMAP/IMAP.so 
../../lib/libcyrus.a ../../lib/libcyrus_min.a\
-ldb-4.4 -lsasl2 -lssl -lcrypto  \

/usr/bin/ld: ../../lib/libcyrus.a(imclient.o): relocation R_X86_64_32 
against `a local symbol' can not be used when making a shared object; 
recompile with -fPIC
../../lib/libcyrus.a: could not read symbols: Bad value
collect2: ld returned 1 exit status
make[2]: *** [blib/arch/auto/Cyrus/IMAP/IMAP.so] Error 1
make[2]: Leaving directory `/private/src/cyrus-imapd-2.3.10/perl/imap'
make[1]: *** [all] Error 1
make[1]: Leaving directory `/private/src/cyrus-imapd-2.3.10/perl'
make: *** [all] Error 1
-

I was able to get it to compile cleanly by adding -fPIC to the CFLAGS 
definition in each Makefile.  I'm not sure if this is the correct solution 
though!

Any feedback?

Andy
