Re: [HACKERS] libpq's multi-threaded SSL callback handling is busted

2015-04-09 Thread Jan Urbański

Peter Eisentraut writes:

 On 2/12/15 7:28 AM, Jan Urbański wrote:
 +#if OPENSSL_VERSION_NUMBER  0x1000
 +/* OpenSSL 1.0.0 deprecates the CRYPTO_set_id_callback function and 
 provides a
 + * default implementation, so there's no need for our own. */

 I have some additional concerns about this.  It is true that OpenSSL
 1.0.0 deprecates CRYPTO_set_id_callback(), but it replaces it with
 CRYPTO_THREADID_set_callback().  There is no indication that you don't
 need to set a callback anymore.  The man page
 (https://www.openssl.org/docs/crypto/threads.html) still says you need
 to set two callbacks, and points to the new interface.

Truly, no good deed can ever go unpunished.

I spent around an hour tracking down why setting both callbacks
for OpenSSL = 1.0.0 brought back the PHP segfaults. Turns out, in OpenSSL
there's *no way* to unregister a callback registered with
CRYPTO_THREADID_set_callback()!

Observe: 
https://github.com/openssl/openssl/blob/35a1cc90bc1795e8893c11e442790ee7f659fffb/crypto/thr_id.c#L174-L180

Once you set a callback, game over - unloading your library will cause a
segfault. I cannot express how joyful I felt when I discovered it.

 I suggest you remove this part from your patch and submit a separate
 patch for consideration if you want to.

At this point I will propose an even simpler patch (attached). I gave up on
trying to use the more modern CRYPTO_THREADID_* functions.

Right now, HEAD libpq won't compile against OpenSSL compiled with
OPENSSL_NO_DEPRECATED, which I think is fine, given the lack of complaints. So
let's just keep using the older, saner functions and ignore the THREADID crap.

By the way, might I take the opportunity to breach the subject of backpatching
this change, should it get accepted?

Cheers,
Jan

diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index a32af34..02c9177 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -806,9 +806,12 @@ pgtls_init(PGconn *conn)
 
 		if (ssl_open_connections++ == 0)
 		{
-			/* These are only required for threaded libcrypto applications */
-			CRYPTO_set_id_callback(pq_threadidcallback);
-			CRYPTO_set_locking_callback(pq_lockingcallback);
+			/* These are only required for threaded libcrypto applications, but
+			 * make sure we don't stomp on them if they're already set. */
+			if (CRYPTO_get_id_callback() == NULL)
+CRYPTO_set_id_callback(pq_threadidcallback);
+			if (CRYPTO_get_locking_callback() == NULL)
+CRYPTO_set_locking_callback(pq_lockingcallback);
 		}
 	}
 #endif   /* ENABLE_THREAD_SAFETY */
@@ -885,9 +888,12 @@ destroy_ssl_system(void)
 
 	if (pq_init_crypto_lib  ssl_open_connections == 0)
 	{
-		/* No connections left, unregister libcrypto callbacks */
-		CRYPTO_set_locking_callback(NULL);
-		CRYPTO_set_id_callback(NULL);
+		/* No connections left, unregister libcrypto callbacks, if no one
+		 * registered different ones in the meantime. */
+		if (CRYPTO_get_id_callback() == pq_threadidcallback)
+			CRYPTO_set_id_callback(NULL);
+		if (CRYPTO_get_locking_callback() == pq_lockingcallback)
+			CRYPTO_set_locking_callback(NULL);
 
 		/*
 		 * We don't free the lock array or the SSL_context. If we get another

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq's multi-threaded SSL callback handling is busted

2015-04-09 Thread Peter Eisentraut
On 2/12/15 7:28 AM, Jan Urbański wrote:
 +#if OPENSSL_VERSION_NUMBER  0x1000
 +/* OpenSSL 1.0.0 deprecates the CRYPTO_set_id_callback function and provides 
 a
 + * default implementation, so there's no need for our own. */

I have some additional concerns about this.  It is true that OpenSSL
1.0.0 deprecates CRYPTO_set_id_callback(), but it replaces it with
CRYPTO_THREADID_set_callback().  There is no indication that you don't
need to set a callback anymore.  The man page
(https://www.openssl.org/docs/crypto/threads.html) still says you need
to set two callbacks, and points to the new interface.

It is true that there is a fallback implementation for some platforms,
but there is no indication that one is invited to rely on those.  Let's
keep in mind that libpq is potentially used on obscure platforms, so I'd
rather stick with the documented approaches.

I suggest you remove this part from your patch and submit a separate
patch for consideration if you want to.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq's multi-threaded SSL callback handling is busted

2015-04-07 Thread Peter Eisentraut
On 4/3/15 7:44 AM, Jan Urbański wrote:
 To reiterate: I recognise that not handling the callbacks is not the right
 answer. But not stomping on callbacks others might have set sounds like a
 minimal and safe improvement.

I think your patch is okay in that it is not a good idea to overwrite or
unset someone else's callbacks.  But we shouldn't mistake that for
fixing the underlying problem.  The only reason this patch appears to
fix the presented test cases is because other OpenSSL users are also
misbehaving and/or the OpenSSL interfaces are so stupid that they cannot
be worked with reasonably.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq's multi-threaded SSL callback handling is busted

2015-04-03 Thread Jan Urbański

Peter Eisentraut writes:
 On 4/2/15 4:32 AM, Jan Urbański wrote:
 
 Peter Eisentraut writes:
 I don't think this patch would actually fix the problem that was
 described after the original bug report
 (http://www.postgresql.org/message-id/5436991b.5020...@vmware.com),
 namely that another thread acquires a lock while the libpq callbacks are
 set and then cannot release the lock if libpq has been shut down in the
 meantime.
 
 I did test both the Python and the PHP repro scripts and the patch fixed both
 the deadlock and the segfault.
 
 What happens is that Python (for instance) stops over the callback
 unconditionally. So when libpq gets unloaded, it sees that the currently set
 callback is no the one it originally set and refrains from NULLing it.

 So this works because Python is just as broken as libpq right now.  What
 happens if Python is fixed as well?  Then we'll have the problem I
 described above.

Well, not really, libpq sets and unsets the callbacks every time an SSL
connection is opened and closed. Python sets the callbacks once when the ssl
module is imported and never touches them again.

Arguably, it should set them even earlier, but it's still saner than
flip-flopping them. AFAIK you can't unload Python, so setting the callback
and keeping it there is a sound strategy.

The change I'm proposing is not to set the callback in libpq if one is already
set and not remove it if the one that's set is not libpq's. That's as sane as
it gets.

With multiple libraries wanting to use OpenSSL in threaded code, the mechanism
seems to be last one wins. It doesn't matter *who's* callbacks are used, as
long as they're there and they stay there and that's Python's stance. This
doesn't work if you want to be able to unload the library, so the next best
thing is not touching the callback if someone else set his in the meantime.

What's broken is OpenSSL for offering such a bad way of dealing with
concurrency.

To reiterate: I recognise that not handling the callbacks is not the right
answer. But not stomping on callbacks others might have set sounds like a
minimal and safe improvement.

Cheers,
Jan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq's multi-threaded SSL callback handling is busted

2015-04-02 Thread Jan Urbański

Peter Eisentraut writes:

 On 2/12/15 7:28 AM, Jan Urbański wrote:
 For the sake of discussion, here's a patch to prevent stomping on
 previously-set callbacks, racy as it looks.

 FWIW, it does fix the Python deadlock and doesn't cause the PHP segfault...

 I don't think this patch would actually fix the problem that was
 described after the original bug report
 (http://www.postgresql.org/message-id/5436991b.5020...@vmware.com),
 namely that another thread acquires a lock while the libpq callbacks are
 set and then cannot release the lock if libpq has been shut down in the
 meantime.

I did test both the Python and the PHP repro scripts and the patch fixed both
the deadlock and the segfault.

What happens is that Python (for instance) stops over the callback
unconditionally. So when libpq gets unloaded, it sees that the currently set
callback is no the one it originally set and refrains from NULLing it.

There's a small race window there, to be sure, but it's a lot better than what
we have now.

Cheers,
Jan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq's multi-threaded SSL callback handling is busted

2015-04-02 Thread Peter Eisentraut
On 4/2/15 4:32 AM, Jan Urbański wrote:
 
 Peter Eisentraut writes:
 
 On 2/12/15 7:28 AM, Jan Urbański wrote:
 For the sake of discussion, here's a patch to prevent stomping on
 previously-set callbacks, racy as it looks.

 FWIW, it does fix the Python deadlock and doesn't cause the PHP segfault...

 I don't think this patch would actually fix the problem that was
 described after the original bug report
 (http://www.postgresql.org/message-id/5436991b.5020...@vmware.com),
 namely that another thread acquires a lock while the libpq callbacks are
 set and then cannot release the lock if libpq has been shut down in the
 meantime.
 
 I did test both the Python and the PHP repro scripts and the patch fixed both
 the deadlock and the segfault.
 
 What happens is that Python (for instance) stops over the callback
 unconditionally. So when libpq gets unloaded, it sees that the currently set
 callback is no the one it originally set and refrains from NULLing it.

So this works because Python is just as broken as libpq right now.  What
happens if Python is fixed as well?  Then we'll have the problem I
described above.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq's multi-threaded SSL callback handling is busted

2015-03-31 Thread Peter Eisentraut
On 2/12/15 7:28 AM, Jan Urbański wrote:
 * If there's already callbacks set: Remember that fact and don't
   overwrite. In the next major version: warn.

 So yeah, that was my initial approach - check if callbacks are set, don't 
 do
 the dance if they are. It felt like a crutch, though, and racy at that. 
 There's
 no atomic way to test-and-set those callbacks. The window for racyness is
 small, though.

 If you do that check during library initialization instead of every
 connection it shouldn't be racy - if that part is run in a multithreaded
 fashion you're doing something crazy.

 Yes, that's true. The problem is that there's no real libpq initialisation
 function. The docs say that:

 If your application initializes libssl and/or libcrypto libraries and libpq 
 is
 built with SSL support, you should call PQinitOpenSSL

 So most apps will just not bother. The moment you know you'll need SSL is 
 only
 when you get an 'S' message from the server...
 
 For the sake of discussion, here's a patch to prevent stomping on
 previously-set callbacks, racy as it looks.
 
 FWIW, it does fix the Python deadlock and doesn't cause the PHP segfault...

I don't think this patch would actually fix the problem that was
described after the original bug report
(http://www.postgresql.org/message-id/5436991b.5020...@vmware.com),
namely that another thread acquires a lock while the libpq callbacks are
set and then cannot release the lock if libpq has been shut down in the
meantime.

The only way to fix that is to never unset the callbacks.  But we don't
want that or can't do that for other reasons.

I think the only way out is to declare that if there are multiple
threads and other threads might be using OpenSSL not through libpq, then
the callbacks need to be managed outside of libpq.

In environments like PHP or Python this would require some coordination
work across modules somehow, but I don't see a way around that.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq's multi-threaded SSL callback handling is busted

2015-02-12 Thread Jan Urbański

Andres Freund writes:

 On 2015-02-12 09:31:27 +0100, Jan Urbański wrote:
 That doesn't solve the problem of the Python deadlock, where you're not at
 leisure to call a C function at the beginning of your module.

 We could just never unload the hooks...

That's what we did before 4e816286533dd34c10b368487d4079595a3e1418 :) And it
got changed after 
http://www.postgresql.org/message-id/48620925.6070...@pws.com.au


  * If there's already callbacks set: Remember that fact and don't
overwrite. In the next major version: warn.

 So yeah, that was my initial approach - check if callbacks are set, don't do
 the dance if they are. It felt like a crutch, though, and racy at that. 
 There's
 no atomic way to test-and-set those callbacks. The window for racyness is
 small, though.

 If you do that check during library initialization instead of every
 connection it shouldn't be racy - if that part is run in a multithreaded
 fashion you're doing something crazy.

Yes, that's true. The problem is that there's no real libpq initialisation
function. The docs say that:

If your application initializes libssl and/or libcrypto libraries and libpq is
built with SSL support, you should call PQinitOpenSSL

So most apps will just not bother. The moment you know you'll need SSL is only
when you get an 'S' message from the server...

Cheers,
Jan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq's multi-threaded SSL callback handling is busted

2015-02-12 Thread Jan Urbański

Andres Freund writes:

 On 2015-02-09 18:17:14 +0100, Jan Urbański wrote:
 First of all, the current behaviour is crazy. We're setting and unsetting the
 locking callback every time a connection is made/closed, which is not how
 OpenSSL is supposed to be used. The *application* using libpq should set a
 callback before it starts threads, it's no business of the library's.

 I don't buy this argument at all. Delivering a libpq that's not
 threadsafe in some circumstances is a really bad idea.

I knew this would be a hard sell :( What I know is that the current situation
is not good and leaving it as-is causes real grief.

 The old behaviour was slightly less insane (set callbacks first time we're
 engaging OpenSSL code, never touch them again). The real sane solution is to
 leave it up to the application.

 The real solution would be for openssl to do it itself.

I think that the OpenSSL API is the real culprit there, requiring threads to
register a callback is what's causing all the issues, but I don't think this
will get ever changed.

 Note that the PHP pgsql extension doesn't set the OpenSSL callbacks, because
 libpq was setting them on its own. If we remove the callback handling from
 libpq, PHP will need to add them. By the way, the MySQL extension for PHP 
 also
 does not set those callbacks.

 I think this shows pretty clearly how insane it would be to change this
 in a minor release. Do you really expect people to update libpq and php
 in unison. After a minor release?

Well, I haven't found reports of threaded PHP+MySQL+SSL programs crashing, and
the MySQL extension also doesn't care about the callbacks. I think it's because
it's both because it's a rare combination, or because almost everyone loads the
cURL extension, which *does* set up callbacks. Like I said, it's not a good
situation.

 Note that we *already* provide applications with the ability to set the
 callbacks themselves and prevent us from doing so: PQinitSSL(false).

Ah, I only now saw that with PQinitOpenSSL(true, false) you can work around the
problem, if you set up your own callbacks first. That seems to at least provide
a way to make libpq not do the callbacks dance in released versions. Thank you.

 Let me reiterate: I now believe the callbacks should be set by the 
 application,
 libraries should not touch them, since they don't know what will they be
 stomping on. If the application is run through a VM like Python or PHP, it's
 the VM that should make sure the callbacks are set.

 I fail to see why it's any better to do it in the VM. That relies on
 either always loading the VM's openssl module, even if you don't
 actually need it because the library you use deals with openssl. It
 prevents other implementations of openssl in the VM.

The callbacks are a thing that's owned by the application, so libraries have no
business in setting them up. The way OpenSSL's API is specified (very
unfortunately IMHO) is that before you use OpenSSL from threads, you need to
set the callbacks. If you don't control your application's startup (like when
you're a script executed through a VM), you assume the VM took care of it at
startup. If you're a library, you assume the user took care of it, as they
should.

Now I know that a lot of apps get this wrong. Python almost does the right
thing, but you're right, it only sets up callbacks if you load the 'ssl'
module. It still feels like setting them up in library code is stomping on
things not owned by the library - like libperl setting up signal handlers,
which caused problems in Postgres. They're resources not owned by the library!

 What I think we should do is the following:

 * Emphasize the fact that it's important to use PQinitSSL(false) if you
   did things yourself.

PQinitOpenSSL(true, false) if anything. The reason that function got introduced
is precisely because PQinitSSL(false) isn't exactly right, see
http://www.postgresql.org/message-id/49820d7d.7030...@esilo.com

That doesn't solve the problem of the Python deadlock, where you're not at
leisure to call a C function at the beginning of your module.

 * If there's already callbacks set: Remember that fact and don't
   overwrite. In the next major version: warn.

So yeah, that was my initial approach - check if callbacks are set, don't do
the dance if they are. It felt like a crutch, though, and racy at that. There's
no atomic way to test-and-set those callbacks. The window for racyness is
small, though.

 * Assert or something if the callback when unregistering isn't the same
   as it was when registering. That's pretty much guaranteed to cause
   issues.

So let me try another proposal and see if it doesn't set alarm bells off.

  * When opening a connection, if there's a callback set, don't overwrite it
(small race window).

  * When closing a connection, if the callback set is not
pq_lockingcallback/pq_threadidcallback, don't NULL it (small race window)

Asserts in frontend code are impossible, right? There's no way to 

Re: [HACKERS] libpq's multi-threaded SSL callback handling is busted

2015-02-12 Thread Andres Freund
On 2015-02-12 09:31:27 +0100, Jan Urbański wrote:
 
 Andres Freund writes:
 
  On 2015-02-09 18:17:14 +0100, Jan Urbański wrote:
  First of all, the current behaviour is crazy. We're setting and unsetting 
  the
  locking callback every time a connection is made/closed, which is not how
  OpenSSL is supposed to be used. The *application* using libpq should set a
  callback before it starts threads, it's no business of the library's.
 
  I don't buy this argument at all. Delivering a libpq that's not
  threadsafe in some circumstances is a really bad idea.
 
 I knew this would be a hard sell :( What I know is that the current situation
 is not good and leaving it as-is causes real grief.

It certainly causes less grief than just breaking working
applications. While really shitty the current situation works in a large
number of scenarios. It's not that common to have several users of
openssl in an application *and* unload libpq.

  I fail to see why it's any better to do it in the VM. That relies on
  either always loading the VM's openssl module, even if you don't
  actually need it because the library you use deals with openssl. It
  prevents other implementations of openssl in the VM.
 
 The callbacks are a thing that's owned by the application, so libraries have 
 no
 business in setting them up. The way OpenSSL's API is specified (very
 unfortunately IMHO) is that before you use OpenSSL from threads, you need to
 set the callbacks. If you don't control your application's startup (like when
 you're a script executed through a VM), you assume the VM took care of it at
 startup. If you're a library, you assume the user took care of it, as they
 should.

That's a bogus argument. The VM cannot setup up every library in the
world in a correct way. It'd be obviously be completely insane to load
the ssl module in every library just because some part of some random
application might need it. In fact, the ssl library for python does
pretty much the same thing as libpq does (although it's slightly more
careful). It's not the VM at all.

  What I think we should do is the following:
 
  * Emphasize the fact that it's important to use PQinitSSL(false) if you
did things yourself.
 
 PQinitOpenSSL(true, false) if anything. The reason that function got 
 introduced
 is precisely because PQinitSSL(false) isn't exactly right, see
 http://www.postgresql.org/message-id/49820d7d.7030...@esilo.com

Well, that really depends on what the application is actually using...

 That doesn't solve the problem of the Python deadlock, where you're not at
 leisure to call a C function at the beginning of your module.

We could just never unload the hooks...

  * If there's already callbacks set: Remember that fact and don't
overwrite. In the next major version: warn.
 
 So yeah, that was my initial approach - check if callbacks are set, don't do
 the dance if they are. It felt like a crutch, though, and racy at that. 
 There's
 no atomic way to test-and-set those callbacks. The window for racyness is
 small, though.

If you do that check during library initialization instead of every
connection it shouldn't be racy - if that part is run in a multithreaded
fashion you're doing something crazy.

  * Assert or something if the callback when unregistering isn't the same
as it was when registering. That's pretty much guaranteed to cause
issues.
 
 So let me try another proposal and see if it doesn't set alarm bells off.
 
   * When opening a connection, if there's a callback set, don't overwrite it
 (small race window).

   * When closing a connection, if the callback set is not
 pq_lockingcallback/pq_threadidcallback, don't NULL it (small race window)

If we do the tests once during initialization there shouldn't be a
race

 Asserts in frontend code are impossible, right? There's no way to warn, 
 either.

You can write to stderr...

 That would be a ~8 line patch. Does it feel back-patcheable?

I think we first need to find out what we all can agree on before
deciding about that.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq's multi-threaded SSL callback handling is busted

2015-02-12 Thread Jan Urbański

Jan Urbański writes:

 Andres Freund writes:

 On 2015-02-12 09:31:27 +0100, Jan Urbański wrote:
 That doesn't solve the problem of the Python deadlock, where you're not at
 leisure to call a C function at the beginning of your module.

 We could just never unload the hooks...

 That's what we did before 4e816286533dd34c10b368487d4079595a3e1418 :) And it
 got changed after 
 http://www.postgresql.org/message-id/48620925.6070...@pws.com.au


  * If there's already callbacks set: Remember that fact and don't
overwrite. In the next major version: warn.

 So yeah, that was my initial approach - check if callbacks are set, don't do
 the dance if they are. It felt like a crutch, though, and racy at that. 
 There's
 no atomic way to test-and-set those callbacks. The window for racyness is
 small, though.

 If you do that check during library initialization instead of every
 connection it shouldn't be racy - if that part is run in a multithreaded
 fashion you're doing something crazy.

 Yes, that's true. The problem is that there's no real libpq initialisation
 function. The docs say that:

 If your application initializes libssl and/or libcrypto libraries and libpq 
 is
 built with SSL support, you should call PQinitOpenSSL

 So most apps will just not bother. The moment you know you'll need SSL is only
 when you get an 'S' message from the server...

For the sake of discussion, here's a patch to prevent stomping on
previously-set callbacks, racy as it looks.

FWIW, it does fix the Python deadlock and doesn't cause the PHP segfault...

J
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index a32af34..54312a5 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -710,6 +710,9 @@ verify_peer_name_matches_certificate(PGconn *conn)
  *	Callback functions for OpenSSL internal locking
  */
 
+#if OPENSSL_VERSION_NUMBER  0x1000
+/* OpenSSL 1.0.0 deprecates the CRYPTO_set_id_callback function and provides a
+ * default implementation, so there's no need for our own. */
 static unsigned long
 pq_threadidcallback(void)
 {
@@ -720,6 +723,7 @@ pq_threadidcallback(void)
 	 */
 	return (unsigned long) pthread_self();
 }
+#endif   /* OPENSSL_VERSION_NUMBER  0x1000 */
 
 static pthread_mutex_t *pq_lockarray;
 
@@ -806,9 +810,14 @@ pgtls_init(PGconn *conn)
 
 		if (ssl_open_connections++ == 0)
 		{
-			/* These are only required for threaded libcrypto applications */
-			CRYPTO_set_id_callback(pq_threadidcallback);
-			CRYPTO_set_locking_callback(pq_lockingcallback);
+			/* These are only required for threaded libcrypto applications, but
+			 * make sure we don't stomp on them if they're already set. */
+#if OPENSSL_VERSION_NUMBER  0x1000
+			if (CRYPTO_get_id_callback() == NULL)
+CRYPTO_set_id_callback(pq_threadidcallback);
+#endif
+			if (CRYPTO_get_locking_callback() == NULL)
+CRYPTO_set_locking_callback(pq_lockingcallback);
 		}
 	}
 #endif   /* ENABLE_THREAD_SAFETY */
@@ -885,9 +894,14 @@ destroy_ssl_system(void)
 
 	if (pq_init_crypto_lib  ssl_open_connections == 0)
 	{
-		/* No connections left, unregister libcrypto callbacks */
-		CRYPTO_set_locking_callback(NULL);
-		CRYPTO_set_id_callback(NULL);
+		/* No connections left, unregister libcrypto callbacks, if no one
+		 * registered different ones in the meantime. */
+#if OPENSSL_VERSION_NUMBER  0x1000
+		if (CRYPTO_get_id_callback() == pq_threadidcallback)
+			CRYPTO_set_id_callback(NULL);
+#endif
+		if (CRYPTO_get_locking_callback() == pq_lockingcallback)
+			CRYPTO_set_locking_callback(NULL);
 
 		/*
 		 * We don't free the lock array or the SSL_context. If we get another

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq's multi-threaded SSL callback handling is busted

2015-02-11 Thread Jan Urbański
Jan Urbański writes:

 I did some more digging on bug
 http://www.postgresql.org/message-id/cahul3dpwyfnugdgo95ohydq4kugdnbkptjq0mnbtubhcmg4...@mail.gmail.com
 which describes a deadlock when using libpq with SSL in a multi-threaded
 environment with other threads doing SSL independently.

 [reproducing instructions]

 I posit we should remove all CRYPTO_set_*_callback functions and associated
 cruft from libpq.

 I could submit a patch to get rid of the crazy CRYPTO_*_callback dance in
 libpq, but at the very least this will require a warning in the release notes

Attached is a patch doing just that.

 I would very much like to have this change back-patched, since setting and
 resetting the callback makes using libpq in a threaded OpenSSL-enabled app
 arguably less safe than if it didn't use any locking.

Also attached is a patch for 9.4 and all previous supported releases, which is
the same thing, but adjusted for when we didn't have a separate fe-secure.c and
fe-secure-openssl.c

If committed, this change will warrant a notice in the release notes. I could
try drafting it, if that'd be helpful.

Cheers,
Jan
commit 62f7433f697d49ab005ad22822dc943661a4e48a
Author: Jan Urbański wulc...@wulczer.org
Date:   Wed Feb 11 17:47:09 2015 +0100

Do not attempt to manage OpenSSL locking callbacks in libpq.

According to OpenSSL documentation, threaded programs using SSL should register
locking callbacks before starting work. In 6daf396879b6502c41146cdb1013568654f52ff6
libpq started doing that, registering its callbacks when the connection would
switch to SSL mode.

This would lead to overwriting any locking callback a properly written threaded
SSL application might have already set, but only got uncovered as a bug in
http://www.postgresql.org/message-id/48620925.6070...@pws.com.au where it
turned out that unloading libpq would leave a dangling function
pointer. Specifically, the PHP runtime would segfault after unloading libpq, as
reported in https://bugs.php.net/bug.php?id=40926

Commit 4e816286533dd34c10b368487d4079595a3e1418 made libpq unset the callback
after the last SSL connection was closed. This solved the segfault issue, but
introduced a potential for deadlocks when one thread using libpq with SSL and
another doing an unrelated SSL operation.

T1 (libpq) T2 (other)

start libpq connection
add locking callback

   start SSL operation
   take lock

finish libpq connection
remove locking callback

   (!) release lock, noop since no callback

This got reported in bug
http://www.postgresql.org/message-id/cahul3dpwyfnugdgo95ohydq4kugdnbkptjq0mnbtubhcmg4...@mail.gmail.com

The reality is that it's not libpq's responsibility to manage those OpenSSL
callbacks. It's up to application code to set them up before using OpenSSL in
threads and trying to handle them in libpq is just stomping on the ones that
should already by in place by the time we start using shared OpenSSL
structures.

This commit gets rid of all OpenSSL callback handling from libpq. For
well-behaved applications, this should increase reliability, since they will
now have certainty that the callbacks they set up before attempting to use
OpenSSL will be used throughout the program execution.

Applications that relied on libpq to set up locking callbacks will need to be
fixed.

diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index a32af34..5d0b132 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -67,7 +67,6 @@ static int	verify_cb(int ok, X509_STORE_CTX *ctx);
 static int	verify_peer_name_matches_certificate_name(PGconn *conn,
   ASN1_STRING *name,
   char **store_name);
-static void destroy_ssl_system(void);
 static int	initialize_SSL(PGconn *conn);
 static PostgresPollingStatusType open_client_SSL(PGconn *);
 static char *SSLerrmessage(void);
@@ -90,8 +89,6 @@ static bool pq_init_crypto_lib = true;
 static SSL_CTX *SSL_context = NULL;
 
 #ifdef ENABLE_THREAD_SAFETY
-static long ssl_open_connections = 0;
-
 #ifndef WIN32
 static pthread_mutex_t ssl_config_mutex = PTHREAD_MUTEX_INITIALIZER;
 #else
@@ -112,16 +109,6 @@ static long win32_ssl_create_mutex = 0;
 void
 pgtls_init_library(bool do_ssl, int do_crypto)
 {
-#ifdef ENABLE_THREAD_SAFETY
-
-	/*
-	 * Disallow changing the flags while we have open connections, else we'd
-	 * get completely confused.
-	 */
-	if (ssl_open_connections != 0)
-		return;
-#endif
-
 	pq_init_ssl_lib = do_ssl;
 	pq_init_crypto_lib = do_crypto;
 }
@@ -705,40 +692,6 @@ verify_peer_name_matches_certificate(PGconn *conn)
 	return found_match  !got_error;
 }
 
-#ifdef ENABLE_THREAD_SAFETY
-/*
- *	Callback functions 

Re: [HACKERS] libpq's multi-threaded SSL callback handling is busted

2015-02-11 Thread Andres Freund
On 2015-02-09 18:17:14 +0100, Jan Urbański wrote:
 We added unsetting the locking callback in
 4e816286533dd34c10b368487d4079595a3e1418 due to this bug report:
 http://www.postgresql.org/message-id/48620925.6070...@pws.com.au
 
 Indeed, commenting out the CRYPTO_set_locking_callback(NULL) call in
 fe-secure-openssl.c gets rid of the deadlock. However, it makes php segfault
 with the (attached) reproduction script from the original 2008 bug report. If
 your php.ini loads both the pgsql and curl extensions, reproduce the segfault 
 with:
 
 php -f pg_segfault.php
 
 The most difficult part about fixing this bug is to determine *who's at
 fault*. I now lean towards the opinion that we shouldn't be messing with
 OpenSSL callbacks *at all*.
 
 First of all, the current behaviour is crazy. We're setting and unsetting the
 locking callback every time a connection is made/closed, which is not how
 OpenSSL is supposed to be used. The *application* using libpq should set a
 callback before it starts threads, it's no business of the library's.

I don't buy this argument at all. Delivering a libpq that's not
threadsafe in some circumstances is a really bad idea.

 The old behaviour was slightly less insane (set callbacks first time we're
 engaging OpenSSL code, never touch them again). The real sane solution is to
 leave it up to the application.

The real solution would be for openssl to do it itself.

 I posit we should remove all CRYPTO_set_*_callback functions and associated
 cruft from libpq. This unfortunately means that multi-threaded applications
 using libpq and SSL will break if they haven't been setting their own 
 callbacks
 (if they have, well, tough luck! libpq will just stomp over them the first 
 time
 it connects to Postgres, but at least *some* callbacks are left present after
 that).

I think there's no chance in hell we can do that. Breaking a noticeable
amount of libpq users in a minor upgrade is imo a cure *FAR* worse than
the cure.

 Note that the PHP pgsql extension doesn't set the OpenSSL callbacks, because
 libpq was setting them on its own. If we remove the callback handling from
 libpq, PHP will need to add them. By the way, the MySQL extension for PHP also
 does not set those callbacks.

I think this shows pretty clearly how insane it would be to change this
in a minor release. Do you really expect people to update libpq and php
in unison. After a minor release?

Note that we *already* provide applications with the ability to set the
callbacks themselves and prevent us from doing so: PQinitSSL(false).

 Let me reiterate: I now believe the callbacks should be set by the 
 application,
 libraries should not touch them, since they don't know what will they be
 stomping on. If the application is run through a VM like Python or PHP, it's
 the VM that should make sure the callbacks are set.

I fail to see why it's any better to do it in the VM. That relies on
either always loading the VM's openssl module, even if you don't
actually need it because the library you use deals with openssl. It
prevents other implementations of openssl in the VM.

What I think we should do is the following:

* Emphasize the fact that it's important to use PQinitSSL(false) if you
  did things yourself.
* If there's already callbacks set: Remember that fact and don't
  overwrite. In the next major version: warn.
* Assert or something if the callback when unregistering isn't the same
  as it was when registering. That's pretty much guaranteed to cause
  issues.

 I would very much like to have this change back-patched, since setting and
 resetting the callback makes using libpq in a threaded OpenSSL-enabled app
 arguably less safe than if it didn't use any locking.

Meh^2

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] libpq's multi-threaded SSL callback handling is busted

2015-02-09 Thread Jan Urbański
I did some more digging on bug
http://www.postgresql.org/message-id/cahul3dpwyfnugdgo95ohydq4kugdnbkptjq0mnbtubhcmg4...@mail.gmail.com
which describes a deadlock when using libpq with SSL in a multi-threaded
environment with other threads doing SSL independently.

Attached is a reproducing Python script in my experience is faster at showing
the problem. Run it with

python -u pg_lock.py

As Heikki correctly diagnosed, the problem is with libpq unsetting the OpenSSL
locking callback while another thread is holding one of the locks. The other
thread then never releases the lock and the next time anyone tries to take it,
the application deadlocks.

The exact way it goes down is like this:



T1 (libpq) T2 (Python)

start libpq connection
init ssl system
add locking callback

   start ssl operation
   take lock

finish libpq connection
destroy ssl system
remove locking callback

   (!) release lock, noop since no callback



And the next time any thread tries to take the lock, it deadlocks.

We added unsetting the locking callback in
4e816286533dd34c10b368487d4079595a3e1418 due to this bug report:
http://www.postgresql.org/message-id/48620925.6070...@pws.com.au

Indeed, commenting out the CRYPTO_set_locking_callback(NULL) call in
fe-secure-openssl.c gets rid of the deadlock. However, it makes php segfault
with the (attached) reproduction script from the original 2008 bug report. If
your php.ini loads both the pgsql and curl extensions, reproduce the segfault 
with:

php -f pg_segfault.php

The most difficult part about fixing this bug is to determine *who's at
fault*. I now lean towards the opinion that we shouldn't be messing with
OpenSSL callbacks *at all*.

First of all, the current behaviour is crazy. We're setting and unsetting the
locking callback every time a connection is made/closed, which is not how
OpenSSL is supposed to be used. The *application* using libpq should set a
callback before it starts threads, it's no business of the library's.

The old behaviour was slightly less insane (set callbacks first time we're
engaging OpenSSL code, never touch them again). The real sane solution is to
leave it up to the application.

I posit we should remove all CRYPTO_set_*_callback functions and associated
cruft from libpq. This unfortunately means that multi-threaded applications
using libpq and SSL will break if they haven't been setting their own callbacks
(if they have, well, tough luck! libpq will just stomp over them the first time
it connects to Postgres, but at least *some* callbacks are left present after
that).

However, AFAICS if your app is not in C, then runtimes already handle that for
you (as they should).

Python:

https://hg.python.org/cpython/file/dc820b44ce21/Modules/_ssl.c#l4284

PHP:

https://github.com/php/php-src/blob/master/ext/curl/interface.c#L1235

Note that the PHP pgsql extension doesn't set the OpenSSL callbacks, because
libpq was setting them on its own. If we remove the callback handling from
libpq, PHP will need to add them. By the way, the MySQL extension for PHP also
does not set those callbacks.

Let me reiterate: I now believe the callbacks should be set by the application,
libraries should not touch them, since they don't know what will they be
stomping on. If the application is run through a VM like Python or PHP, it's
the VM that should make sure the callbacks are set.

I could submit a patch to get rid of the crazy CRYPTO_*_callback dance in
libpq, but at the very least this will require a warning in the release notes
about how you can't assume that libpq will take care of making sure your app is
multi-threaded safe when using OpenSSL. I also don't know how far that's
back-patcheable.

I would very much like to have this change back-patched, since setting and
resetting the callback makes using libpq in a threaded OpenSSL-enabled app
arguably less safe than if it didn't use any locking. If the app is written
correctly, it will have set locking callbacks before starting. Then libpq will
happily stomp on them. If the app hasn't set callbacks, it wasn't written
correctly in the first place and it will get segfaults instead of deadlocks.

Thanks,
Jan
#!/usr/bin/env python
import sys
import time
import threading
import urllib

import psycopg2


DB_HOST = '127.0.0.1'
DB_USER = 'postgres'
DB_NAME = 'postgres'

HTTPS_URL = 'https://google.com'
NUM_HTTPS_THREADS = 10


def connect():
conn = psycopg2.connect(
host=DB_HOST, user=DB_USER,
database=DB_NAME, sslmode='require')
return conn


class Worker(threading.Thread):

def __init__(self):
super(Worker, self).__init__()
self._stop = threading.Event()

def stop(self):
self._stop.set()

def stopped(self):
return self._stop.isSet()

def run(self):
i = 0
while not self.stopped():
i += 1
self.do_work(i)


class