Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-24 Thread Vesa-Matti J Kari
Hello, On Mon, 23 Sep 2013, Stephen Frost wrote: I've now committed a fix for this issue. I cloned the 9.4devel branch and linked my authmilter and a test program (based on Heikki's earlier design) against the libpq that comes with it. After hours of pretty extensive stress testing using 2,

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-24 Thread Stephen Frost
* Vesa-Matti J Kari (vmk...@cc.helsinki.fi) wrote: Many thanks to all who contributed to the fix. Great! Thanks for the report and the testing. Stephen

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-23 Thread Stephen Frost
Vesa-Matti J Kari, I've now committed a fix for this issue. If you have opportunity to, it'd be great to pull down the latest git (for whichever supported branch you'd like) and give it a try. Otherwise, the fix should be out with our next round of point releases (which I expect will be

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-13 Thread Stephen Frost
Heikki, all, * Stephen Frost (sfr...@snowman.net) wrote: Very curious. Out of time right now to look into it, but will probably be back at it later tonight. Alright, I was back at this a bit today and decided to go with a hunch- and it looks like I might have been right to try. Leaving the

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-13 Thread Andres Freund
On 2013-09-13 12:40:11 -0400, Stephen Frost wrote: Heikki, all, * Stephen Frost (sfr...@snowman.net) wrote: Very curious. Out of time right now to look into it, but will probably be back at it later tonight. Alright, I was back at this a bit today and decided to go with a hunch- and

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-13 Thread Stephen Frost
Andres, On Friday, September 13, 2013, Andres Freund wrote: It'd be interesting to replace the origin callbacks with one immediately doing an abort() or similar to see whether they maybe are called after they shouldn't be and from where. Good thought. Got sucked into a meeting but once I'm

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-13 Thread Andres Freund
On 2013-09-13 13:15:34 -0400, Stephen Frost wrote: Andres, On Friday, September 13, 2013, Andres Freund wrote: It'd be interesting to replace the origin callbacks with one immediately doing an abort() or similar to see whether they maybe are called after they shouldn't be and from

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-13 Thread Stephen Frost
* Stephen Frost (sfr...@snowman.net) wrote: * Andres Freund (and...@2ndquadrant.com) wrote: Hm. close_SSL() first does pqsecure_destroy() which will unset the callbacks, and the count and then goes on to do X509_free() and ENGINE_finish(), ENGINE_free() if either is used. It's not

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-13 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote: On 2013-09-13 13:15:34 -0400, Stephen Frost wrote: Good thought. Got sucked into a meeting but once I'm out I'll try having the lock/unlock routines abort if they're called while ssl_open_connections is zero, which should not be happening, but

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-13 Thread Andres Freund
On 2013-09-13 14:33:25 -0400, Stephen Frost wrote: * Stephen Frost (sfr...@snowman.net) wrote: * Andres Freund (and...@2ndquadrant.com) wrote: Hm. close_SSL() first does pqsecure_destroy() which will unset the callbacks, and the count and then goes on to do X509_free() and

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-13 Thread Andres Freund
On 2013-09-13 13:59:54 -0400, Stephen Frost wrote: Unfortunately, while I can still easily get the deadlock to happen when the hooks are reset, the hooks don't appear to ever get called when ssl_open_connections is set to zero. You have a good point about the additional SSL calls after the

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-13 Thread Heikki Linnakangas
On 13.09.2013 22:26, Heikki Linnakangas wrote: I'm afraid the move_locks.diff patch you posted earlier is also broken; close_SSL() is called in error scenarios from pqsecure_open_client(), while already holding the mutex. So it will deadlock with itself if the connection cannot be established.
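
The self-deadlock described in this message can be illustrated with a minimal sketch. This is not libpq code: demo_mutex, demo_close() and demo_open_with_failure() are hypothetical stand-ins for the SSL mutex, close_SSL() and the error path in pqsecure_open_client(). The sketch assumes a POSIX threads platform and uses an error-checking mutex, so the second lock attempt returns EDEADLK instead of hanging the way a default mutex typically would.

```c
#define _XOPEN_SOURCE 700

#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the mutex guarding SSL setup and teardown. */
static pthread_mutex_t demo_mutex;

/*
 * Hypothetical cleanup routine that takes the mutex itself, the way
 * close_SSL() would if the locking were moved into it.
 */
static int
demo_close(void)
{
    int rc = pthread_mutex_lock(&demo_mutex);

    if (rc != 0)
        return rc;      /* EDEADLK: this thread already holds the mutex */

    /* ... teardown work would go here ... */
    pthread_mutex_unlock(&demo_mutex);
    return 0;
}

/*
 * Hypothetical open routine whose error path calls the cleanup routine
 * while still holding the mutex. Returns whatever demo_close() reports.
 */
int
demo_open_with_failure(void)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    /* Error-checking type: relocking by the owner fails instead of blocking. */
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&demo_mutex, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&demo_mutex);
    assert(rc == 0);

    /* Simulated connection failure: clean up while the mutex is held. */
    rc = demo_close();

    pthread_mutex_unlock(&demo_mutex);
    pthread_mutex_destroy(&demo_mutex);
    return rc;
}
```

Calling demo_open_with_failure() reports EDEADLK on the error path. With a default (non-error-checking) mutex, the same call sequence would usually block forever in the second pthread_mutex_lock(), which is the hang described above.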

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-13 Thread Stephen Frost
* Heikki Linnakangas (hlinnakan...@vmware.com) wrote: Umm, with that patch, pqsecure_destroy() is never called. The if (conn->ssl) test that's now at the end of the close_SSL function is never true, because conn->ssl is set to NULL earlier. Yeah, got ahead of myself, as Andres pointed out. I'm

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-13 Thread Heikki Linnakangas
On 13.09.2013 22:03, Stephen Frost wrote: * Andres Freund (and...@2ndquadrant.com) wrote: It seems slightly cleaner to just move the pqsecure_destroy(); to the end of that function, based on a boolean. But if you think otherwise, I won't protest... Hmm, agreed; I had originally been concerned

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-13 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote: It seems slightly cleaner to just move the pqsecure_destroy(); to the end of that function, based on a boolean. But if you think otherwise, I won't protest... Hmm, agreed; I had originally been concerned that the SIGPIPE madness needed to be

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-13 Thread Andres Freund
On 2013-09-13 15:03:31 -0400, Stephen Frost wrote: * Andres Freund (and...@2ndquadrant.com) wrote: It seems slightly cleaner to just move the pqsecure_destroy(); to the end of that function, based on a boolean. But if you think otherwise, I won't protest... Hmm, agreed; I had originally

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-13 Thread Stephen Frost
* Heikki Linnakangas (hlinnakan...@vmware.com) wrote: Actually, I think there's a pre-existing bug there in git master. If the SSL_set_app_data or SSL_set_fd call in pqsecure_open_client fails for some reason, it will call close_SSL() with conn->ssl already set, and the mutex held. close_SSL()

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-13 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote: That patch looks wrong to me. Note that the if (conn->ssl) branch resets conn->ssl to NULL. huh, it figures one would overlook the simplest things. Of course it's not locking up now- we never remove the hooks (as my original patch was doing :).
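
The mistake discussed here, and the boolean-based alternative suggested elsewhere in the thread, can be sketched roughly as follows. This is an illustration only, not the actual libpq patch: demo_conn, demo_destroy(), demo_close_broken() and demo_close_fixed() are hypothetical names, and the ssl field merely stands in for the connection's SSL pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical connection object; ssl stands in for the SSL pointer. */
typedef struct
{
    void *ssl;
    int   destroy_calls;    /* counts calls to demo_destroy() */
} demo_conn;

/* Hypothetical stand-in for the routine that removes the locking hooks. */
static void
demo_destroy(demo_conn *conn)
{
    conn->destroy_calls++;
}

/* Broken variant: the pointer is cleared before the final test. */
void
demo_close_broken(demo_conn *conn)
{
    assert(conn != NULL);

    if (conn->ssl)
    {
        /* ... shut down and free the SSL object ... */
        conn->ssl = NULL;
    }

    /* ... free other per-connection SSL resources ... */

    if (conn->ssl)          /* never true: ssl was reset above */
        demo_destroy(conn);
}

/* Variant that remembers the decision in a boolean and acts at the end. */
void
demo_close_fixed(demo_conn *conn)
{
    bool destroy_needed = false;

    assert(conn != NULL);

    if (conn->ssl)
    {
        /* ... shut down and free the SSL object ... */
        conn->ssl = NULL;
        destroy_needed = true;
    }

    /* ... free other per-connection SSL resources ... */

    if (destroy_needed)
        demo_destroy(conn);
}
```

In the broken variant the cleanup hook is never invoked, which would explain why the lockup seemed to disappear: the hooks were simply never removed. The fixed variant records whether an SSL object existed before clearing the pointer, so the hook still runs last.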

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-10 Thread Stephen Frost
Heikki, * Heikki Linnakangas (hlinnakan...@vmware.com) wrote: Thanks! I tested with git master. I've run your test program against both git master and 9.2.4 on a couple of Ubuntu 13.04 boxes and all I see are tons of these: 1: DEBUG: database connection established 1: DEBUG: about to call

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-10 Thread Heikki Linnakangas
On 10.09.2013 18:10, Stephen Frost wrote: I've run your test program against both git master and 9.2.4 on a couple of Ubuntu 13.04 boxes and all I see are tons of these: 1: DEBUG: database connection established 1: DEBUG: about to call PQfinish() 1: DEBUG: database connection established 1:

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-10 Thread Stephen Frost
Heikki, * Heikki Linnakangas (hlinnakan...@vmware.com) wrote: Hmm. Are you sure you're getting an SSL connection? Run it with something like this to make sure: sslmode=require doesn't help on Unix domain connections. :) Was able to get it to lock with both 9.2.4 and master, and with both

[HACKERS] Strange hanging bug in a simple milter

2013-09-09 Thread Vesa-Matti J Kari
Hello PostgreSQL gurus, (I have already posted a very similar message to comp.mail.sendmail newsgroup on August 22nd, but I haven't received any responses there. I have also tried pgsql-interfa...@postgresql.org but to no avail. Solving this problem requires some Sendmail/Postfix experience

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-09 Thread Heikki Linnakangas
On 09.09.2013 09:34, Vesa-Matti J Kari wrote: Basically all that the authmilter now does is to connect to PostgreSQL in authmilt_connect() and close the connection in authmilt_close(). Based on the authmilter debug logging it seems to me that when the hanging occurs, the authmilter never

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-09 Thread Vesa-Matti J Kari
Hello, On Mon, 9 Sep 2013, Heikki Linnakangas wrote: I managed to set that up and got it running. Many thanks for taking the time. But it works fine for me, does not hang. Okay. Have you tried increasing the iterations for the smtp sender scripts? And could you please specify what is your

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-09 Thread Stephen Frost
Vesa-Matti, Heikki, * Heikki Linnakangas (hlinnakan...@vmware.com) wrote: On 09.09.2013 15:36, Vesa-Matti J Kari wrote: If I interpret this correctly, threads #2 and #3 are waiting for the same lock but they make no progress. A-ha, the deadlock happens while doing SSL stuff. I didn't have

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-09 Thread Heikki Linnakangas
On 09.09.2013 15:36, Vesa-Matti J Kari wrote: It looks like a deadlock situation of some kind... (gdb) thread 2 [Switching to thread 2 (Thread 0x7fe62f7fe700 (LWP 27284))] #0 0x7fe64c0b589c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0 (gdb) bt #0 0x7fe64c0b589c in

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-09 Thread Heikki Linnakangas
On 09.09.2013 18:20, Stephen Frost wrote: Vesa-Matti, Heikki, * Heikki Linnakangas (hlinnakan...@vmware.com) wrote: On 09.09.2013 15:36, Vesa-Matti J Kari wrote: If I interpret this correctly, threads #2 and #3 are waiting for the same lock but they make no progress. A-ha, the deadlock

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-09 Thread Alvaro Herrera
Heikki Linnakangas wrote: I'll dig into that, but right now it seems like an OpenSSL or libcrypto bug to me. Or something in the way we use them, although I can't see anything obviously wrong in the libpq code at a quick glance. Can you please try with ssl_renegotiation_limit=0? [ looks ]

Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-09 Thread Stephen Frost
Alvaro, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: Heikki Linnakangas wrote: I'll dig into that, but right now it seems like an OpenSSL or libcrypto bug to me. Or something in the way we use them, although I can't see anything obviously wrong in the libpq code at a quick glance.