Re: [Dovecot] Leaky dovecot-auth ?

2007-07-02 Thread Christian Balzer
On Wed, 27 Jun 2007 23:15:32 +0300 Timo Sirainen [EMAIL PROTECTED] wrote:
 On Thu, 2007-06-21 at 16:49 +0900, Christian Balzer wrote:
   You could try
   http://dovecot.org/patches/debug/mempool-accounting.diff and send
   USR1 signal to dovecot-auth after a while. It logs how much memory
   is used by all existing memory pools. Each auth request has its own
   pool, so if it's really leaking them it's probably logging a lot of
   lines. If not, then the leak is elsewhere.
   
  I grabbed the Debian package source on a test machine (not gonna chance
  anything on the production servers), applied the patch, did add
  --enable-debug to the debian/rules file (and got the #define DEBUG 
  in config.h), created the binary packages, installed, configured,
  started them, tested a few logins and... nothing gets logged 
  in mail.* if I send a USR1 to dovecot-auth. Anything I'm missing?
 
 Bug, fixed: http://hg.dovecot.org/dovecot-1.0/rev/a098e94cd318
 
Thanks, that fixed the silence of the auth-sheep.

This is the output after start-up:
---
Jul  2 13:59:54 engtest03 dovecot: auth(default): pool auth request handler: 
104 / 4080 bytes
Jul  2 13:59:54 engtest03 last message repeated 19 times
Jul  2 13:59:54 engtest03 dovecot: auth(default): pool passwd_file: 56 / 10224 
bytes
Jul  2 13:59:54 engtest03 dovecot: auth(default): pool Environment: 224 / 2032 
bytes
Jul  2 13:59:54 engtest03 dovecot: auth(default): pool ldap_connection: 576 / 
1008 bytes
Jul  2 13:59:54 engtest03 dovecot: auth(default): pool auth: 1520 / 2032 bytes
---

Used memory of dovecot-auth after 1 login was 3148KB(RSS).

This is after a good trashing with rabid (from the postal package), with
just 2 users though, using POP3 logins:
---
Jul  2 14:12:30 engtest03 dovecot: auth(default): pool auth request handler: 
104 / 4080 bytes
Jul  2 14:12:30 engtest03 last message repeated 128 times
Jul  2 14:12:30 engtest03 dovecot: auth(default): pool passwd_file: 56 / 10224 
bytes
Jul  2 14:12:30 engtest03 dovecot: auth(default): pool Environment: 224 / 2032 
bytes
Jul  2 14:12:30 engtest03 dovecot: auth(default): pool ldap_connection: 576 / 
1008 bytes
Jul  2 14:12:30 engtest03 dovecot: auth(default): pool auth: 1520 / 2032 bytes
---
Note that the amount of auth request handler pools have grown to 128. 
After another short round of rabid the handler pools grew to 137 and the
size of dovecot-auth to 5100KB. The number of handler pools never fell,
nor did the memory footprint, obviously. :-p

At about 800k logins/day/node here it's obvious now why dovecot-auth
explodes after less than a week with max size of 512MB. 

  But no matter, it is clearly leaking just as bad as 0.99 and I venture
  that his is the largest installation with LDAP as authentication
  backend. I wonder if this leak would be avoided by having LDAP lookups
  performed by worker processes as with SQL. 
 
 Then you'd only have multiple leaking worker processes.

Yes, I realize that. But I presume since these are designed to die off
and be recreated on the fly the repercussions would be much better. ;)
Of course now it looks like this is not LDAP related after all.

   The same as 0.99. You could also kill -HUP dovecot when dovecot-auth
   is nearing the limit. That makes it a bit nicer, although not
   perfectly safe either (should fix this some day..).
  
  If that leak can't be found I would very much appreciate a solution
  that at least avoids failed and/or delayed logins.
 
 That would require that login processes don't fail logins if connection
 to dovecot-auth drops, but instead wait until they can connect back to
 it and try again. And maybe another alternative would be to just
 disconnect the client instead of giving login failure.
 
Anything that fixes this one way or the other would be nice. ^_^

Oh and HUP'ing the master is not an option here, I guess the system load
triggers a race condition in dovecot because several times when doing so
I got this:
---
Jun 22 15:08:58 mb11 dovecot: listen(143) failed: Interrupted system call
---
Which results in a killed off dovecot, including all active sessions.

The self terminating dovecot-auth is not nice, but at least more
predictable and does recover by itself:
---
Jun 30 19:03:27 mb12 dovecot: auth(default): pool_system_malloc(): Out of memory
Jun 30 19:03:27 mb12 dovecot: child 0 (auth) returned error 83 (Out of 
memory)
Jun 30 19:03:28 mb12 dovecot: pop3-login: Can't connect to auth server at 
default: Resource temporarily unavailable
Jun 30 19:03:28 mb12 last message repeated 11 times
---
Of course the 12 users that tried to log in at this time are probably not
amused or at least confused.

Regards,

Christian
-- 
Christian BalzerNetwork/Systems EngineerNOC
[EMAIL PROTECTED]   Global OnLine Japan/Fusion Network Services
http://www.gol.com/


Re: [Dovecot] Leaky dovecot-auth ?

2007-07-02 Thread Timo Sirainen
On Mon, 2007-07-02 at 15:20 +0900, Christian Balzer wrote:
 Jul  2 14:12:30 engtest03 dovecot: auth(default): pool auth request handler: 
 104 / 4080 bytes
 Jul  2 14:12:30 engtest03 last message repeated 128 times

Auth request handler is created for each imap-login connection. So if
you have 128 imap-login processes this isn't a leak.

Hmm. Does this help: http://hg.dovecot.org/dovecot-1.0/rev/50c79521e8f5

 Oh and HUP'ing the master is not an option here, I guess the system load
 triggers a race condition in dovecot because several times when doing so
 I got this:
 ---
 Jun 22 15:08:58 mb11 dovecot: listen(143) failed: Interrupted system call

Did you use killall? I think this happens only with it.



signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] Leaky dovecot-auth ?

2007-07-02 Thread Christian Balzer
On Mon, 02 Jul 2007 17:37:05 +0300 Timo Sirainen [EMAIL PROTECTED] wrote:

 On Mon, 2007-07-02 at 15:20 +0900, Christian Balzer wrote:
  Jul  2 14:12:30 engtest03 dovecot: auth(default): pool auth request
  handler: 104 / 4080 bytes Jul  2 14:12:30 engtest03 last message
  repeated 128 times
 
 Auth request handler is created for each imap-login connection. So if
 you have 128 imap-login processes this isn't a leak.
 
At that point in time only POP3 was tried, since this is by far the 
most used protocol here and rabid defaults to it anyway. But there 
were plenty of pop3-login processes indeed. Enough to make up that 
number combined with the IMAP ones.
Which is interesting, as this does NOT happen on the production servers,
I guess rabid can dish out even more stress than my users (and cause
these login processes to be left hanging around). 

But that's not the issue anyway, with identical pool outputs the local
DB incarnation retains its size (I got an internal IMAP server with 1.0.0 
and PAM and a few dozen intense users which also shows no signs of a
growing dovecot-auth) while the LDAP DB one keeps growing with nothing
to show for in that pool debug output. 

 Hmm. Does this help: http://hg.dovecot.org/dovecot-1.0/rev/50c79521e8f5
 
Will try that tomorrow if I can.

  Oh and HUP'ing the master is not an option here, I guess the system
  load triggers a race condition in dovecot because several times when
  doing so I got this:
  ---
  Jun 22 15:08:58 mb11 dovecot: listen(143) failed: Interrupted system
  call
 
 Did you use killall? I think this happens only with it.
 
Nope, this is a Debian/Linux show and I did HUP just the master process.
It only happened some of the times on the (then) busiest node, but it 
clearly is a race condition of sorts. Set up a test environment with
about 30-50 logins/second and I'm sure you can reproduce it. ;)


Regards,

Christian
-- 
Christian BalzerNetwork/Systems EngineerNOC
[EMAIL PROTECTED]   Global OnLine Japan/Fusion Network Services
http://www.gol.com/


Re: [Dovecot] Leaky dovecot-auth ?

2007-07-02 Thread Timo Sirainen
On Tue, 2007-07-03 at 00:17 +0900, Christian Balzer wrote:

   Jun 22 15:08:58 mb11 dovecot: listen(143) failed: Interrupted system
   call
  
  Did you use killall? I think this happens only with it.

Not even with it actually. I was probably thinking about something else.

 Nope, this is a Debian/Linux show and I did HUP just the master process.
 It only happened some of the times on the (then) busiest node, but it 
 clearly is a race condition of sorts. Set up a test environment with
 about 30-50 logins/second and I'm sure you can reproduce it. ;)

Not even with 1100 logins/sec. :)

And in the code I'm already catching EINTR. Only if listen() fails 10
times with EINTR it exits with that error. And between those 10 listen()
calls is a 1 second sleep. But of course if the process is getting
signals all the time the sleep also fails with EINTR. But a single HUP
signal shouldn't interrupt more than a single syscall, so I've really no
idea what's happening.



signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] Leaky dovecot-auth ?

2007-07-02 Thread Christian Balzer
On Mon, 02 Jul 2007 17:37:05 +0300 Timo Sirainen [EMAIL PROTECTED] wrote:
 
 Hmm. Does this help: http://hg.dovecot.org/dovecot-1.0/rev/50c79521e8f5

We have a winner!
Auth process grows to the same size as with a local DB and stays there.

Now I just have to get this into a security maintained Debian package...
(looks around for the official package masters and backports maintainers)

Am I correct in assuming that this code did not change since 0.99, read
that the leak I saw for the last 4 years was the same thing? ;)

Regards,

Christian
-- 
Christian BalzerNetwork/Systems EngineerNOC
[EMAIL PROTECTED]   Global OnLine Japan/Fusion Network Services
http://www.gol.com/


Re: [Dovecot] Leaky dovecot-auth ?

2007-06-27 Thread Timo Sirainen
On Thu, 2007-06-21 at 16:49 +0900, Christian Balzer wrote:
  You could try http://dovecot.org/patches/debug/mempool-accounting.diff
  and send USR1 signal to dovecot-auth after a while. It logs how much
  memory is used by all existing memory pools. Each auth request has its
  own pool, so if it's really leaking them it's probably logging a lot of
  lines. If not, then the leak is elsewhere.
  
 I grabbed the Debian package source on a test machine (not gonna chance
 anything on the production servers), applied the patch, did add
 --enable-debug to the debian/rules file (and got the #define DEBUG 
 in config.h), created the binary packages, installed, configured,
 started them, tested a few logins and... nothing gets logged 
 in mail.* if I send a USR1 to dovecot-auth. Anything I'm missing?

Bug, fixed: http://hg.dovecot.org/dovecot-1.0/rev/a098e94cd318

 But no matter, it is clearly leaking just as bad as 0.99 and I venture
 that his is the largest installation with LDAP as authentication backend.
 I wonder if this leak would be avoided by having LDAP lookups performed
 by worker processes as with SQL. 

Then you'd only have multiple leaking worker processes.

  The same as 0.99. You could also kill -HUP dovecot when dovecot-auth is
  nearing the limit. That makes it a bit nicer, although not perfectly
  safe either (should fix this some day..).
 
 If that leak can't be found I would very much appreciate a solution that
 at least avoids failed and/or delayed logins.

That would require that login processes don't fail logins if connection
to dovecot-auth drops, but instead wait until they can connect back to
it and try again. And maybe another alternative would be to just
disconnect the client instead of giving login failure.



signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] Leaky dovecot-auth ?

2007-06-21 Thread Christian Balzer
On Tue, 19 Jun 2007 14:39:59 +0300 Timo Sirainen [EMAIL PROTECTED] wrote:
 On Tue, 2007-06-19 at 13:40 +0900, Christian Balzer wrote:
 
  1. How and why would the memory footprint of dovecot-auth grow when
  there is no change in the amount of users in the DB? 
 
 The only thing that's changing the size of dovecot-auth is how many
 requests it's simultaneously handling. For example if you try to login
 with invalid user/pass 1000 times within a second, dovecot-auth keeps
 those 1000 requests in memory for a couple of seconds until it returns
 with failure. But this happens also with normal requests, just not for
 so long.
 
There is nowhere near anything of this kind of concurrency and 
memory should be returned after a while, but that is clearly not
happening. The dovecot-auth is now at 200/160MB and thus prone to
blow up over the weekend I guess.

 You could try http://dovecot.org/patches/debug/mempool-accounting.diff
 and send USR1 signal to dovecot-auth after a while. It logs how much
 memory is used by all existing memory pools. Each auth request has its
 own pool, so if it's really leaking them it's probably logging a lot of
 lines. If not, then the leak is elsewhere.
 
I grabbed the Debian package source on a test machine (not gonna chance
anything on the production servers), applied the patch, did add
--enable-debug to the debian/rules file (and got the #define DEBUG 
in config.h), created the binary packages, installed, configured,
started them, tested a few logins and... nothing gets logged 
in mail.* if I send a USR1 to dovecot-auth. Anything I'm missing?

But no matter, it is clearly leaking just as bad as 0.99 and I venture
that his is the largest installation with LDAP as authentication backend.
I wonder if this leak would be avoided by having LDAP lookups performed
by worker processes as with SQL. 

  2. What will happen when the single dovecot-auth process reaches 256MB
  in the end? Internal housekeeping attempts of that process? A whack to
  the head from the master process like in 0.99 and thus more erroneous 
  authentication failures, potentially aggravated by the fact that there
  is just single dovecot-auth process?
 
 The same as 0.99. You could also kill -HUP dovecot when dovecot-auth is
 nearing the limit. That makes it a bit nicer, although not perfectly
 safe either (should fix this some day..).

If that leak can't be found I would very much appreciate a solution that
at least avoids failed and/or delayed logins.

Regards,

Christian
-- 
Christian BalzerNetwork/Systems EngineerNOC
[EMAIL PROTECTED]   Global OnLine Japan/Fusion Network Services
http://www.gol.com/


Re: [Dovecot] Leaky dovecot-auth ?

2007-06-19 Thread Timo Sirainen
On Tue, 2007-06-19 at 13:40 +0900, Christian Balzer wrote:
 Hello,
 
 as mentioned before, we are migrating our mailboxes from a 0.99 cluster
 to a 1.0.0 one. With 0.99 dovecot-auth (with LDAP as backend) was leaking
 quite happily and the dovecot-auth processes frequently did hit their
 size limit and thus were killed and restarted. Which in 0.99 at least
 lead to authentication failures on a busy server, as the dovecot
 master process just killed off the auth process w/o disabling new
 connections to it first and letting any current authentications finish.

It's actually your kernel that kills the process.

 1. How and why would the memory footprint of dovecot-auth grow when
 there is no change in the amount of users in the DB? 

The only thing that's changing the size of dovecot-auth is how many
requests it's simultaneously handling. For example if you try to login
with invalid user/pass 1000 times within a second, dovecot-auth keeps
those 1000 requests in memory for a couple of seconds until it returns
with failure. But this happens also with normal requests, just not for
so long.

You could try http://dovecot.org/patches/debug/mempool-accounting.diff
and send USR1 signal to dovecot-auth after a while. It logs how much
memory is used by all existing memory pools. Each auth request has its
own pool, so if it's really leaking them it's probably logging a lot of
lines. If not, then the leak is elsewhere.

 2. What will happen when the single dovecot-auth process reaches 256MB
 in the end? Internal housekeeping attempts of that process? A whack to
 the head from the master process like in 0.99 and thus more erroneous 
 authentication failures, potentially aggravated by the fact that there
 is just single dovecot-auth process?

The same as 0.99. You could also kill -HUP dovecot when dovecot-auth is
nearing the limit. That makes it a bit nicer, although not perfectly
safe either (should fix this some day..).


signature.asc
Description: This is a digitally signed message part