Re: [Dovecot] Leaky dovecot-auth ?
On Wed, 27 Jun 2007 23:15:32 +0300 Timo Sirainen [EMAIL PROTECTED] wrote: On Thu, 2007-06-21 at 16:49 +0900, Christian Balzer wrote: You could try http://dovecot.org/patches/debug/mempool-accounting.diff and send USR1 signal to dovecot-auth after a while. It logs how much memory is used by all existing memory pools. Each auth request has its own pool, so if it's really leaking them it's probably logging a lot of lines. If not, then the leak is elsewhere. I grabbed the Debian package source on a test machine (not gonna chance anything on the production servers), applied the patch, did add --enable-debug to the debian/rules file (and got the #define DEBUG in config.h), created the binary packages, installed, configured, started them, tested a few logins and... nothing gets logged in mail.* if I send a USR1 to dovecot-auth. Anything I'm missing? Bug, fixed: http://hg.dovecot.org/dovecot-1.0/rev/a098e94cd318 Thanks, that fixed the silence of the auth-sheep. This is the output after start-up: --- Jul 2 13:59:54 engtest03 dovecot: auth(default): pool auth request handler: 104 / 4080 bytes Jul 2 13:59:54 engtest03 last message repeated 19 times Jul 2 13:59:54 engtest03 dovecot: auth(default): pool passwd_file: 56 / 10224 bytes Jul 2 13:59:54 engtest03 dovecot: auth(default): pool Environment: 224 / 2032 bytes Jul 2 13:59:54 engtest03 dovecot: auth(default): pool ldap_connection: 576 / 1008 bytes Jul 2 13:59:54 engtest03 dovecot: auth(default): pool auth: 1520 / 2032 bytes --- Used memory of dovecot-auth after 1 login was 3148KB(RSS). This is after a good trashing with rabid (from the postal package), with just 2 users though, using POP3 logins: --- Jul 2 14:12:30 engtest03 dovecot: auth(default): pool auth request handler: 104 / 4080 bytes Jul 2 14:12:30 engtest03 last message repeated 128 times Jul 2 14:12:30 engtest03 dovecot: auth(default): pool passwd_file: 56 / 10224 bytes Jul 2 14:12:30 engtest03 dovecot: auth(default): pool Environment: 224 / 2032 bytes Jul 2 14:12:30 engtest03 dovecot: auth(default): pool ldap_connection: 576 / 1008 bytes Jul 2 14:12:30 engtest03 dovecot: auth(default): pool auth: 1520 / 2032 bytes --- Note that the amount of auth request handler pools have grown to 128. After another short round of rabid the handler pools grew to 137 and the size of dovecot-auth to 5100KB. The number of handler pools never fell, nor did the memory footprint, obviously. :-p At about 800k logins/day/node here it's obvious now why dovecot-auth explodes after less than a week with max size of 512MB. But no matter, it is clearly leaking just as bad as 0.99 and I venture that his is the largest installation with LDAP as authentication backend. I wonder if this leak would be avoided by having LDAP lookups performed by worker processes as with SQL. Then you'd only have multiple leaking worker processes. Yes, I realize that. But I presume since these are designed to die off and be recreated on the fly the repercussions would be much better. ;) Of course now it looks like this is not LDAP related after all. The same as 0.99. You could also kill -HUP dovecot when dovecot-auth is nearing the limit. That makes it a bit nicer, although not perfectly safe either (should fix this some day..). If that leak can't be found I would very much appreciate a solution that at least avoids failed and/or delayed logins. That would require that login processes don't fail logins if connection to dovecot-auth drops, but instead wait until they can connect back to it and try again. And maybe another alternative would be to just disconnect the client instead of giving login failure. Anything that fixes this one way or the other would be nice. ^_^ Oh and HUP'ing the master is not an option here, I guess the system load triggers a race condition in dovecot because several times when doing so I got this: --- Jun 22 15:08:58 mb11 dovecot: listen(143) failed: Interrupted system call --- Which results in a killed off dovecot, including all active sessions. The self terminating dovecot-auth is not nice, but at least more predictable and does recover by itself: --- Jun 30 19:03:27 mb12 dovecot: auth(default): pool_system_malloc(): Out of memory Jun 30 19:03:27 mb12 dovecot: child 0 (auth) returned error 83 (Out of memory) Jun 30 19:03:28 mb12 dovecot: pop3-login: Can't connect to auth server at default: Resource temporarily unavailable Jun 30 19:03:28 mb12 last message repeated 11 times --- Of course the 12 users that tried to log in at this time are probably not amused or at least confused. Regards, Christian -- Christian BalzerNetwork/Systems EngineerNOC [EMAIL PROTECTED] Global OnLine Japan/Fusion Network Services http://www.gol.com/
Re: [Dovecot] Leaky dovecot-auth ?
On Mon, 2007-07-02 at 15:20 +0900, Christian Balzer wrote: Jul 2 14:12:30 engtest03 dovecot: auth(default): pool auth request handler: 104 / 4080 bytes Jul 2 14:12:30 engtest03 last message repeated 128 times Auth request handler is created for each imap-login connection. So if you have 128 imap-login processes this isn't a leak. Hmm. Does this help: http://hg.dovecot.org/dovecot-1.0/rev/50c79521e8f5 Oh and HUP'ing the master is not an option here, I guess the system load triggers a race condition in dovecot because several times when doing so I got this: --- Jun 22 15:08:58 mb11 dovecot: listen(143) failed: Interrupted system call Did you use killall? I think this happens only with it. signature.asc Description: This is a digitally signed message part
Re: [Dovecot] Leaky dovecot-auth ?
On Mon, 02 Jul 2007 17:37:05 +0300 Timo Sirainen [EMAIL PROTECTED] wrote: On Mon, 2007-07-02 at 15:20 +0900, Christian Balzer wrote: Jul 2 14:12:30 engtest03 dovecot: auth(default): pool auth request handler: 104 / 4080 bytes Jul 2 14:12:30 engtest03 last message repeated 128 times Auth request handler is created for each imap-login connection. So if you have 128 imap-login processes this isn't a leak. At that point in time only POP3 was tried, since this is by far the most used protocol here and rabid defaults to it anyway. But there were plenty of pop3-login processes indeed. Enough to make up that number combined with the IMAP ones. Which is interesting, as this does NOT happen on the production servers, I guess rabid can dish out even more stress than my users (and cause these login processes to be left hanging around). But that's not the issue anyway, with identical pool outputs the local DB incarnation retains its size (I got an internal IMAP server with 1.0.0 and PAM and a few dozen intense users which also shows no signs of a growing dovecot-auth) while the LDAP DB one keeps growing with nothing to show for in that pool debug output. Hmm. Does this help: http://hg.dovecot.org/dovecot-1.0/rev/50c79521e8f5 Will try that tomorrow if I can. Oh and HUP'ing the master is not an option here, I guess the system load triggers a race condition in dovecot because several times when doing so I got this: --- Jun 22 15:08:58 mb11 dovecot: listen(143) failed: Interrupted system call Did you use killall? I think this happens only with it. Nope, this is a Debian/Linux show and I did HUP just the master process. It only happened some of the times on the (then) busiest node, but it clearly is a race condition of sorts. Set up a test environment with about 30-50 logins/second and I'm sure you can reproduce it. ;) Regards, Christian -- Christian BalzerNetwork/Systems EngineerNOC [EMAIL PROTECTED] Global OnLine Japan/Fusion Network Services http://www.gol.com/
Re: [Dovecot] Leaky dovecot-auth ?
On Tue, 2007-07-03 at 00:17 +0900, Christian Balzer wrote: Jun 22 15:08:58 mb11 dovecot: listen(143) failed: Interrupted system call Did you use killall? I think this happens only with it. Not even with it actually. I was probably thinking about something else. Nope, this is a Debian/Linux show and I did HUP just the master process. It only happened some of the times on the (then) busiest node, but it clearly is a race condition of sorts. Set up a test environment with about 30-50 logins/second and I'm sure you can reproduce it. ;) Not even with 1100 logins/sec. :) And in the code I'm already catching EINTR. Only if listen() fails 10 times with EINTR it exits with that error. And between those 10 listen() calls is a 1 second sleep. But of course if the process is getting signals all the time the sleep also fails with EINTR. But a single HUP signal shouldn't interrupt more than a single syscall, so I've really no idea what's happening. signature.asc Description: This is a digitally signed message part
Re: [Dovecot] Leaky dovecot-auth ?
On Mon, 02 Jul 2007 17:37:05 +0300 Timo Sirainen [EMAIL PROTECTED] wrote: Hmm. Does this help: http://hg.dovecot.org/dovecot-1.0/rev/50c79521e8f5 We have a winner! Auth process grows to the same size as with a local DB and stays there. Now I just have to get this into a security maintained Debian package... (looks around for the official package masters and backports maintainers) Am I correct in assuming that this code did not change since 0.99, read that the leak I saw for the last 4 years was the same thing? ;) Regards, Christian -- Christian BalzerNetwork/Systems EngineerNOC [EMAIL PROTECTED] Global OnLine Japan/Fusion Network Services http://www.gol.com/
Re: [Dovecot] Leaky dovecot-auth ?
On Thu, 2007-06-21 at 16:49 +0900, Christian Balzer wrote: You could try http://dovecot.org/patches/debug/mempool-accounting.diff and send USR1 signal to dovecot-auth after a while. It logs how much memory is used by all existing memory pools. Each auth request has its own pool, so if it's really leaking them it's probably logging a lot of lines. If not, then the leak is elsewhere. I grabbed the Debian package source on a test machine (not gonna chance anything on the production servers), applied the patch, did add --enable-debug to the debian/rules file (and got the #define DEBUG in config.h), created the binary packages, installed, configured, started them, tested a few logins and... nothing gets logged in mail.* if I send a USR1 to dovecot-auth. Anything I'm missing? Bug, fixed: http://hg.dovecot.org/dovecot-1.0/rev/a098e94cd318 But no matter, it is clearly leaking just as bad as 0.99 and I venture that his is the largest installation with LDAP as authentication backend. I wonder if this leak would be avoided by having LDAP lookups performed by worker processes as with SQL. Then you'd only have multiple leaking worker processes. The same as 0.99. You could also kill -HUP dovecot when dovecot-auth is nearing the limit. That makes it a bit nicer, although not perfectly safe either (should fix this some day..). If that leak can't be found I would very much appreciate a solution that at least avoids failed and/or delayed logins. That would require that login processes don't fail logins if connection to dovecot-auth drops, but instead wait until they can connect back to it and try again. And maybe another alternative would be to just disconnect the client instead of giving login failure. signature.asc Description: This is a digitally signed message part
Re: [Dovecot] Leaky dovecot-auth ?
On Tue, 19 Jun 2007 14:39:59 +0300 Timo Sirainen [EMAIL PROTECTED] wrote: On Tue, 2007-06-19 at 13:40 +0900, Christian Balzer wrote: 1. How and why would the memory footprint of dovecot-auth grow when there is no change in the amount of users in the DB? The only thing that's changing the size of dovecot-auth is how many requests it's simultaneously handling. For example if you try to login with invalid user/pass 1000 times within a second, dovecot-auth keeps those 1000 requests in memory for a couple of seconds until it returns with failure. But this happens also with normal requests, just not for so long. There is nowhere near anything of this kind of concurrency and memory should be returned after a while, but that is clearly not happening. The dovecot-auth is now at 200/160MB and thus prone to blow up over the weekend I guess. You could try http://dovecot.org/patches/debug/mempool-accounting.diff and send USR1 signal to dovecot-auth after a while. It logs how much memory is used by all existing memory pools. Each auth request has its own pool, so if it's really leaking them it's probably logging a lot of lines. If not, then the leak is elsewhere. I grabbed the Debian package source on a test machine (not gonna chance anything on the production servers), applied the patch, did add --enable-debug to the debian/rules file (and got the #define DEBUG in config.h), created the binary packages, installed, configured, started them, tested a few logins and... nothing gets logged in mail.* if I send a USR1 to dovecot-auth. Anything I'm missing? But no matter, it is clearly leaking just as bad as 0.99 and I venture that his is the largest installation with LDAP as authentication backend. I wonder if this leak would be avoided by having LDAP lookups performed by worker processes as with SQL. 2. What will happen when the single dovecot-auth process reaches 256MB in the end? Internal housekeeping attempts of that process? A whack to the head from the master process like in 0.99 and thus more erroneous authentication failures, potentially aggravated by the fact that there is just single dovecot-auth process? The same as 0.99. You could also kill -HUP dovecot when dovecot-auth is nearing the limit. That makes it a bit nicer, although not perfectly safe either (should fix this some day..). If that leak can't be found I would very much appreciate a solution that at least avoids failed and/or delayed logins. Regards, Christian -- Christian BalzerNetwork/Systems EngineerNOC [EMAIL PROTECTED] Global OnLine Japan/Fusion Network Services http://www.gol.com/
Re: [Dovecot] Leaky dovecot-auth ?
On Tue, 2007-06-19 at 13:40 +0900, Christian Balzer wrote: Hello, as mentioned before, we are migrating our mailboxes from a 0.99 cluster to a 1.0.0 one. With 0.99 dovecot-auth (with LDAP as backend) was leaking quite happily and the dovecot-auth processes frequently did hit their size limit and thus were killed and restarted. Which in 0.99 at least lead to authentication failures on a busy server, as the dovecot master process just killed off the auth process w/o disabling new connections to it first and letting any current authentications finish. It's actually your kernel that kills the process. 1. How and why would the memory footprint of dovecot-auth grow when there is no change in the amount of users in the DB? The only thing that's changing the size of dovecot-auth is how many requests it's simultaneously handling. For example if you try to login with invalid user/pass 1000 times within a second, dovecot-auth keeps those 1000 requests in memory for a couple of seconds until it returns with failure. But this happens also with normal requests, just not for so long. You could try http://dovecot.org/patches/debug/mempool-accounting.diff and send USR1 signal to dovecot-auth after a while. It logs how much memory is used by all existing memory pools. Each auth request has its own pool, so if it's really leaking them it's probably logging a lot of lines. If not, then the leak is elsewhere. 2. What will happen when the single dovecot-auth process reaches 256MB in the end? Internal housekeeping attempts of that process? A whack to the head from the master process like in 0.99 and thus more erroneous authentication failures, potentially aggravated by the fact that there is just single dovecot-auth process? The same as 0.99. You could also kill -HUP dovecot when dovecot-auth is nearing the limit. That makes it a bit nicer, although not perfectly safe either (should fix this some day..). signature.asc Description: This is a digitally signed message part