Re: [Dovecot] Connection refused with auth-master afterupgradingto Dovecot 1.0 rc 28

2007-04-12 Thread Timo Sirainen
On Tue, 2007-04-10 at 13:12 -0600, Jason Warner wrote:
  Anyway I should be able to figure out the assert by looking at the
  code too. Just a bit more difficult, since a couple of minutes wasn't
  enough. :)

This should fix it:
http://dovecot.org/list/dovecot-cvs/2007-April/008653.html


signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] Connection refused with auth-master afterupgradingto Dovecot 1.0 rc 28

2007-04-10 Thread Timo Sirainen

On 10.4.2007, at 19.30, Jason Warner wrote:


 write(6, [EMAIL PROTECTED]..., 55) = 55

..

write(6, [EMAIL PROTECTED]..., 54) = 54


Your original error was: net_connect(/var/run/dovecot/auth-master)  
failed: Connection refused. However here both of them were  
successfully connected to, and this shows that the authentication  
handshake was successfully sent. So I think it wrote a different  
error message to Dovecot's logs this time.




PGP.sig
Description: This is a digitally signed message part


Re: [Dovecot] Connection refused with auth-master afterupgradingto Dovecot 1.0 rc 28

2007-04-10 Thread Jason Warner
 On 10.4.2007, at 19.30, Jason Warner wrote:
 
   write(6, [EMAIL PROTECTED]..., 55) = 55
 ..
  write(6, [EMAIL PROTECTED]..., 54) = 54
 
 Your original error was: net_connect(/var/run/dovecot/auth-master)
 failed: Connection refused. However here both of them were
 successfully connected to, and this shows that the authentication
 handshake was successfully sent. So I think it wrote a different
 error message to Dovecot's logs this time.

I'm afraid that I've tampered with stuff so much on this server that things
have changed from when I originally posted. 

Apr 10 10:12:14 mail postfix/smtpd[562]: A7F1B168734:
client=mail.domain.com[127.0.0.1]
Apr 10 10:12:14 mail postfix/cleanup[496]: A7F1B168734:
message-id=[EMAIL PROTECTED]
Apr 10 10:12:14 mail postfix/qmgr[452]: A7F1B168734: from=[EMAIL PROTECTED],
size=2600, nrcpt=1 (queue active)
Apr 10 10:12:14 mail postfix/smtp[464]: E4B351686E4: to=[EMAIL PROTECTED],
relay=127.0.0.1[127.0.0.1]:10024, delay=7, delays=0.22/0/0.01/6.8,
dsn=2.6.0, status=sent (250 2.6.0 Ok, id=31657-06, from MTA: 250 2.0.0 Ok:
queued as A7F1B168734)
Apr 10 10:12:15 mail dovecot: auth(default): file auth-request.c: line 474
(auth_request_lookup_credentials_callback): assertion failed:
(request-state == AUTH_REQUEST_STATE_PASSDB)
Apr 10 10:12:15 mail dovecot: auth(default): Raw backtrace: dovecot-auth
[0x806bb91] - dovecot-auth [0x806baac] - dovecot-auth [0x80552ce] -
dovecot-auth [0x805f2bd] - dovecot-auth [0x805f684] - dovecot-auth
[0x805906b] - dovecot-auth(io_loop_handler_run+0x110) [0x806ef30] -
dovecot-auth(io_loop_run+0x1c) [0x806e27c] - dovecot-auth(main+0x2fe)
[0x805a51e] - /lib/libc.so.6(__libc_start_main+0xdc) [0x673f2c] -
dovecot-auth {0x8050ba1]
Apr 10 10:12:15 mail dovecot: child 31532 (auth) killed with signal 6
Apr 10 10:12:15 mail postfix/pipe[563]: A7F1B168734: to=[EMAIL PROTECTED],
relay=dovecot, delay=0.75, delays=0.24/0.06/0/0.45, dsn=4.3.0,
status=deferred (temporary failure)

This looks like the auth fails due to some error, but I'm not sure what.

Apr 10 10:13:55 mail postfix/qmgr[864]: A7F1B168734: from=[EMAIL PROTECTED],
size=2600, nrcpt=1 (queue active)
Apr 10 10:13:55 mail postfix/pipe[890]: A7F1B168734: to=[EMAIL PROTECTED],
relay=dovecot, delay=101, delays=101/0.02/0/0.11, dsn=2.0.0, status=sent
(delivered via dovecot service)
Apr 10 10:13:55 mail postfix/qmgr[864]: A7F1B168734: removed

At this point, it looks like there was a retry and the dovecot service
delivers the mail, but the mail disappears into thin air. The problem is
that I have a ton of mail flooding the server and errors are overwritten
before I can take a look at them. I wasn't even able to compare the same
mail being sent in the delivery logs because by the time I stopped the
logging more mail had come into the system.




Re: [Dovecot] Connection refused with auth-master afterupgradingto Dovecot 1.0 rc 28

2007-04-10 Thread Jason Warner


 -Original Message-
 From: Timo Sirainen [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, April 10, 2007 11:45 AM
 To: Dovecot Mailing List
 Cc: Jason Warner; 'Jason Warner'
 Subject: Re: [Dovecot] Connection refused with auth-master
 afterupgradingto Dovecot 1.0 rc 28
 
 On 10.4.2007, at 20.35, Timo Sirainen wrote:
 
  On 10.4.2007, at 20.22, Jason Warner wrote:
 
  Apr 10 10:12:15 mail dovecot: auth(default): file auth-request.c:
  line 474
  (auth_request_lookup_credentials_callback): assertion failed:
  (request-state == AUTH_REQUEST_STATE_PASSDB)
 
  Oh. So the connection refused happens because dovecot-auth crashes.
  Now this is beginning to make sense. The easiest way to fix this is
  to add allow_all_users to userdb static's args. I'll try to figure
  out why this crash is happening.

Adding allow_all_users to my userdb static args gets rid of the previous
error message, but now the logs contain this message:

Apr 10 11:48:28 mail deliver([EMAIL PROTECTED]): setgid(999) failed: Operation
not permitted
Apr 10 11:48:28 mail postfix/pipe[17576]: 95FC916872D: to=[EMAIL PROTECTED],
relay=dovecot, delay=0.31, delays=0.26/0.02/0/0.02, dsn=2.0.0, status=sent
(delivered via dovecot service)

999 is the gid of my vmail user.

 
 Do you have core file in Dovecot's base_dir (/var/run/dovecot/ or /
 usr/local/var/run/dovecot/ probably)? If so, could you do:
 
 gdb /path/to/dovecot-auth /path/to/core
 bt full
 fr 4
 p *request
 
 (I think fr 4 is correct to give a usable output for p *request, if
 it just says No symbol try with different fr numbers. It's anyway
 the one that bt full shows auth_request_lookup_credentials_callback()
 being in)
 

I couldn't find a core dump file in the base_dir (/var/run/dovecot). Is
there a way to force a core dump?





Re: [Dovecot] Connection refused with auth-master afterupgradingto Dovecot 1.0 rc 28

2007-04-10 Thread Timo Sirainen

On 10.4.2007, at 21.33, Jason Warner wrote:

I still didn't get a core dump in /var/run/dovecot. Should I be  
looking
somewhere else? That is the directory that the base_dir variable is  
set to

in my dovecot.conf file.


Well, another way would be to attach gdb directly into dovecot-auth  
while it's still running:


gdb attach `pidof dovecot-auth`
cont
(wait for crash)
bt full
fr 4
p *request

Anyway I should be able to figure out the assert by looking at the  
code too. Just a bit more difficult, since a couple of minutes wasn't  
enough. :)


I tried this with both the allow_all_users set and without it set.  
That was

the fix though. By adding allow_all_users to my userdb static args,
everything is running just fine.


Only problem with that is that it doesn't know if the user exists or  
not. So depending on how you've set up permissions, it's possible  
that it delivers mails to unknown users as well, creating the  
directories if needed..


PGP.sig
Description: This is a digitally signed message part


Re: [Dovecot] Connection refused with auth-master afterupgradingto Dovecot 1.0 rc 28

2007-04-10 Thread Jason Warner
 On 10.4.2007, at 21.33, Jason Warner wrote:
 
  I still didn't get a core dump in /var/run/dovecot. Should I be
  looking
  somewhere else? That is the directory that the base_dir variable is
  set to
  in my dovecot.conf file.
 
 Well, another way would be to attach gdb directly into dovecot-auth
 while it's still running:
 
 gdb attach `pidof dovecot-auth`
 cont
 (wait for crash)
 bt full
 fr 4
 p *request
 
 Anyway I should be able to figure out the assert by looking at the
 code too. Just a bit more difficult, since a couple of minutes wasn't
 enough. :)

Here's the best I could do:

(gdb) bt full
#0  0x004b9402 in __kernel_vsyscall ()
No symbol table info available.
#1  0x00686d40 in raise () from /lib/libc.so.6
No symbol table info available.
#2  0x00688591 in abort () from /lib/libc.so.6
No symbol table info available.
#3  0x0806bb9a in i_error ()
No symbol table info available.
#4  0x0806baac in i_panic ()
No symbol table info available.
#5  0x080552ce in auth_request_lookup_credentials_callback ()
No symbol table info available.
#6  0x0805f2bd in passdb_cache_init ()
No symbol table info available.
#7  0x0805f684 in passdb_cache_init ()
No symbol table info available.
#8  0x0805906b in db_ldap_connect ()
No symbol table info available.
#9  0x0806ef30 in io_loop_handler_run ()
No symbol table info available.
#10 0x0806e27c in io_loop_run ()
No symbol table info available.
#11 0x0805a51e in main ()
No symbol table info available.

It looks like a stack trace, but the p *request just gives No symbol table
info available for all 11 integers.

 
  I tried this with both the allow_all_users set and without it set.
  That was
  the fix though. By adding allow_all_users to my userdb static args,
  everything is running just fine.
 
 Only problem with that is that it doesn't know if the user exists or
 not. So depending on how you've set up permissions, it's possible
 that it delivers mails to unknown users as well, creating the
 directories if needed..

That is a problem. I'll go back to Postfix's virtual delivery method until I
don't have to use the allow_all_users flag.