Re: [HACKERS] Fix for OpenSSL error queue bug

2016-05-06 Thread Tom Lane
Peter Geoghegan  writes:
> On Fri, May 6, 2016 at 9:12 PM, Peter Eisentraut
>  wrote:
>> All committed.

> Thanks!

> This should no longer be referenced in the 9.6 release notes. It
> should just appear in the next batch of point releases. Tom has an
> sgml comment in the draft 9.6 release notes to that effect.

Yeah, I was just on that ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-05-06 Thread Peter Geoghegan
On Fri, May 6, 2016 at 9:12 PM, Peter Eisentraut
 wrote:
> All committed.

Thanks!

This should no longer be referenced in the 9.6 release notes. It
should just appear in the next batch of point releases. Tom has an
sgml comment in the draft 9.6 release notes to that effect.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-05-06 Thread Peter Eisentraut

On 4/25/16 8:37 PM, Peter Geoghegan wrote:

Attached is a series of patches for each supported release branch. As
discussed, I would like to target every released version of Postgres
with this bugfix. This is causing users real pain at the moment.


All committed.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-05-02 Thread Tom Lane
Peter Eisentraut  writes:
> On 04/27/2016 11:04 PM, Tom Lane wrote:
>> Actually, git_changelog can merge identically-messaged commits despite
>> intervening commits.  It's set up to not merge commits more than 24 hours
>> apart, though.  We could loosen that requirement but I'm afraid it would
>> make things confusing to merge far-apart commits.

> ISTM that git_changelog should be looking at the AuthorDate instead of
> the CommitDate.  Then it would work correctly for backpatches done using
> cherry-pick.

Meh.  That would also make it noticeably *less* accurate for some other
scenarios.  It actually did look at AuthorDate to start with, and we
changed it because we didn't like the results; cf 7299778a9.

Also, IME, backpatching with cherry-pick fails often enough that designing
one's process around it is just asking for pain.  Certainly, many fewer
than half of my own back-patches manage to use cherry-pick.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-05-02 Thread Peter Eisentraut
On 04/27/2016 11:04 PM, Tom Lane wrote:
>> There is no value in providing exact message matches when the backpatch
>> occurs even after one other commit in the master branch.
> 
> Actually, git_changelog can merge identically-messaged commits despite
> intervening commits.  It's set up to not merge commits more than 24 hours
> apart, though.  We could loosen that requirement but I'm afraid it would
> make things confusing to merge far-apart commits.

ISTM that git_changelog should be looking at the AuthorDate instead of
the CommitDate.  Then it would work correctly for backpatches done using
cherry-pick.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-04-27 Thread Tom Lane
Alvaro Herrera  writes:
> Peter Geoghegan wrote:
>> I'm not sure if project policy around backpatching (that commit
>> messages and so on should match exactly) has anything to say about the
>> case where backpatching follows several weeks after commit to the
>> master branch.

> There is no value in providing exact message matches when the backpatch
> occurs even after one other commit in the master branch.

Actually, git_changelog can merge identically-messaged commits despite
intervening commits.  It's set up to not merge commits more than 24 hours
apart, though.  We could loosen that requirement but I'm afraid it would
make things confusing to merge far-apart commits.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-04-27 Thread Alvaro Herrera
Peter Geoghegan wrote:

> I'm not sure if project policy around backpatching (that commit
> messages and so on should match exactly) has anything to say about the
> case where backpatching follows several weeks after commit to the
> master branch.

There is no value in providing exact message matches when the backpatch
occurs even after one other commit in the master branch.  The principal
reason for the requirement is so that src/tools/git_changelog is able to
merge the commits as a single entry, and that's not going to happen when
you have one or more other commits in the master branch anyway.  I find
it more productive to mention, in the backpatched commit message, what
is the commit ID in the master branch.  That way it can be more easily
be searched for, later on.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-04-25 Thread Peter Geoghegan
On Mon, Apr 25, 2016 at 6:44 PM, Michael Paquier
 wrote:
>> I'm not sure if project policy around backpatching (that commit
>> messages and so on should match exactly) has anything to say about the
>> case where backpatching follows several weeks after commit to the
>> master branch. In the absence of any clear direction on that, I've
>> created commits that look like what Peter E might have pushed in early
>> April, had he decided to do that backpatch leg-work up front.
>
> It seems to me that we definitely want to get this stuff backpatched
> at the end. So +1 for this move.

Right. This issue has a long history of causing users significant
(though often intermittent) problems. As an example, consider this
problem report from a co-worker of mine that dates back to 2012:

https://bitbucket.org/ged/ruby-pg/issues/142/async_exec-over-ssl-connection-can-fail-on

There are numerous problem reports like this that are easily found
using Google. I think that there are probably a variety of unpleasant
interactions and symptoms. Crashes are one rarer symptom, seen in
certain scenarios only (crashes are not described in the link above).

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-04-25 Thread Michael Paquier
On Tue, Apr 26, 2016 at 9:37 AM, Peter Geoghegan  wrote:
> Only the 9.5 backpatch was a simple, conflict-free "git cherry-pick".
> Most of the effort here involved producing a clean 9.4 patch. This was
> largely mechanical, if a little tricky. In release branches for
> releases that preceded 9.4, there were a few further merge conflicts
> as I worked backwards through the branches, but those were trivial.

Looking again at this thread, the general agreement was to clear the
error stack before calling any SSL routine. Those patches are doing
so, and they look in good shape to me. Note: there is
SSL_do_handshake() on back-branches for the SSL renegotiation but we
don't need to bother about clearing the error queue as any error
occurring in those cases just stops the session, and we've never
bothered calling ERR_get_error there to get more details about the
errors.

> I'm not sure if project policy around backpatching (that commit
> messages and so on should match exactly) has anything to say about the
> case where backpatching follows several weeks after commit to the
> master branch. In the absence of any clear direction on that, I've
> created commits that look like what Peter E might have pushed in early
> April, had he decided to do that backpatch leg-work up front.

It seems to me that we definitely want to get this stuff backpatched
at the end. So +1 for this move.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-04-25 Thread Peter Geoghegan
On Fri, Apr 8, 2016 at 11:43 AM, Peter Geoghegan  wrote:
> I'll do so soon. I was waiting on Peter E to take me up on the offer.

Attached is a series of patches for each supported release branch. As
discussed, I would like to target every released version of Postgres
with this bugfix. This is causing users real pain at the moment.

Only the 9.5 backpatch was a simple, conflict-free "git cherry-pick".
Most of the effort here involved producing a clean 9.4 patch. This was
largely mechanical, if a little tricky. In release branches for
releases that preceded 9.4, there were a few further merge conflicts
as I worked backwards through the branches, but those were trivial.

I'm not sure if project policy around backpatching (that commit
messages and so on should match exactly) has anything to say about the
case where backpatching follows several weeks after commit to the
master branch. In the absence of any clear direction on that, I've
created commits that look like what Peter E might have pushed in early
April, had he decided to do that backpatch leg-work up front.

-- 
Peter Geoghegan


openssl-patches.tar.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-04-08 Thread Peter Geoghegan
On Fri, Apr 8, 2016 at 11:42 AM, Tom Lane  wrote:
> Seems like a reasonable thing to do, but somebody would have to do the
> legwork to produce back-branch patches.

I'll do so soon. I was waiting on Peter E to take me up on the offer.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-04-08 Thread Tom Lane
Peter Geoghegan  writes:
> On Fri, Apr 8, 2016 at 11:20 AM, Peter Geoghegan  wrote:
>> That seems reasonable. I'm glad we finally got this done. Thanks.

> Are we going to backpatch this?

Seems like a reasonable thing to do, but somebody would have to do the
legwork to produce back-branch patches.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-04-08 Thread Peter Geoghegan
On Fri, Apr 8, 2016 at 11:20 AM, Peter Geoghegan  wrote:
> That seems reasonable. I'm glad we finally got this done. Thanks.

Are we going to backpatch this?


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-04-08 Thread Peter Geoghegan
On Fri, Apr 8, 2016 at 11:12 AM, Peter Eisentraut  wrote:
> I have committed this so that the comments are only in the first instance in
> each file.  I think that should give enough information to someone who is
> curious about the details of the error handling.
>
> Also, I have adjusted this so that we check for n<0 after SSL_read and
> SSL_write, as you had in the frontend code but not in the backend code.

That seems reasonable. I'm glad we finally got this done. Thanks.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-04-08 Thread Peter Eisentraut

On 04/07/2016 09:36 PM, Peter Geoghegan wrote:

On Thu, Apr 7, 2016 at 6:08 PM, Peter Eisentraut  wrote:

I wish we could avoid the huge, repeated comment blocks.  Perhaps we could
put them at the top of the files once?


I'm fine with that. Do you want to take care of that, or should I?


I have committed this so that the comments are only in the first 
instance in each file.  I think that should give enough information to 
someone who is curious about the details of the error handling.


Also, I have adjusted this so that we check for n<0 after SSL_read and 
SSL_write, as you had in the frontend code but not in the backend code.




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-04-07 Thread Michael Paquier
On Fri, Apr 8, 2016 at 10:23 AM, Peter Eisentraut  wrote:
> On 04/07/2016 03:47 AM, Michael Paquier wrote:
>>
>> I have looked at this patch. Do we need to worry as well about
>> SSL_shutdown in disconnection code path? I believe that we don't care
>> much if an error happens at this point but we surely should consume
>> any error generated because the SSL context is kept after
>> destroy_ssl_system and another connection attempt may be done using
>> the same SSL context, no?
>
>
> But we are the only user of our SSL context, and we clear the error before
> every call we make (with this patch).  The clean up afterwards is only if
> someone else is also using SSL in the same process, and they won't use our
> SSL context.

Argh, yes. Because SSL_free() is directly used. I should read those
docs more carefully.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-04-07 Thread Peter Geoghegan
On Thu, Apr 7, 2016 at 6:08 PM, Peter Eisentraut  wrote:
> I wish we could avoid the huge, repeated comment blocks.  Perhaps we could
> put them at the top of the files once?

I'm fine with that. Do you want to take care of that, or should I?

> Also, why do you write 0UL instead of just 0?

Because the variable is declared as a raw unsigned long, IIRC. It
hardly matters.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-04-07 Thread Peter Eisentraut

On 04/07/2016 03:47 AM, Michael Paquier wrote:

I have looked at this patch. Do we need to worry as well about
SSL_shutdown in disconnection code path? I believe that we don't care
much if an error happens at this point but we surely should consume
any error generated because the SSL context is kept after
destroy_ssl_system and another connection attempt may be done using
the same SSL context, no?


But we are the only user of our SSL context, and we clear the error 
before every call we make (with this patch).  The clean up afterwards is 
only if someone else is also using SSL in the same process, and they 
won't use our SSL context.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-04-07 Thread Peter Eisentraut

On 03/14/2016 09:44 PM, Peter Geoghegan wrote:

On Mon, Mar 14, 2016 at 4:11 PM, Peter Geoghegan  wrote:

Yes, with one small difference: I wouldn't be calling ERR_get_error()
in the common case where SSL_get_error() returns SSL_ERROR_NONE, on
the theory that skipping that case represents no risk. I'm making a
concession to Peter E's view that that will calling ERR_get_error()
more will add useless cycles.


The attached patch is what I have in mind.

I can produce a back-patchable variant of this if you and Peter E.
think this approach is okay.


I think this patch is OK under the premises that we have established.

I wish we could avoid the huge, repeated comment blocks.  Perhaps we 
could put them at the top of the files once?


Also, why do you write 0UL instead of just 0?


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-04-07 Thread Michael Paquier
On Sun, Mar 27, 2016 at 9:01 AM, Tom Lane  wrote:
> Peter Geoghegan  writes:
>> Will this make it into the next point release? I was rather hoping it would.
>
> I dunno.  I certainly haven't reviewed it carefully enough to commit it.
> Perhaps Peter has, but time grows short ...

I have looked at this patch. Do we need to worry as well about
SSL_shutdown in disconnection code path? I believe that we don't care
much if an error happens at this point but we surely should consume
any error generated because the SSL context is kept after
destroy_ssl_system and another connection attempt may be done using
the same SSL context, no?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-26 Thread Tom Lane
Peter Geoghegan  writes:
> Will this make it into the next point release? I was rather hoping it would.

I dunno.  I certainly haven't reviewed it carefully enough to commit it.
Perhaps Peter has, but time grows short ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-26 Thread Peter Geoghegan
On Wed, Mar 23, 2016 at 9:18 PM, Tom Lane  wrote:
> in the other order, so as not to assume that ERR_clear_error doesn't
> set errno.  On the other hand, if it does, things are probably hopelessly
> broken anyway; so I'm not sure there is any case where this helps.

I'm fine with that.

Will this make it into the next point release? I was rather hoping it would.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-23 Thread Tom Lane
Peter Geoghegan  writes:
> On Mon, Mar 14, 2016 at 6:44 PM, Peter Geoghegan  wrote:
>> I can produce a back-patchable variant of this if you and Peter E.
>> think this approach is okay.

> Where are we on this?

I'm generally okay with the approach used in
http://www.postgresql.org/message-id/CAM3SWZS0iV1HQ2fqNxvmTZm4+hPLAfH=7LeC4znmFX8az=a...@mail.gmail.com
though I have not reviewed that patch in detail.

One minor thought is that maybe it'd be better to do this:

errno = 0;
+   ERR_clear_error();

in the other order, so as not to assume that ERR_clear_error doesn't
set errno.  On the other hand, if it does, things are probably hopelessly
broken anyway; so I'm not sure there is any case where this helps.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-23 Thread Peter Geoghegan
On Mon, Mar 14, 2016 at 6:44 PM, Peter Geoghegan  wrote:
> I can produce a back-patchable variant of this if you and Peter E.
> think this approach is okay.

Where are we on this?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-14 Thread Peter Geoghegan
On Mon, Mar 14, 2016 at 4:11 PM, Peter Geoghegan  wrote:
> Yes, with one small difference: I wouldn't be calling ERR_get_error()
> in the common case where SSL_get_error() returns SSL_ERROR_NONE, on
> the theory that skipping that case represents no risk. I'm making a
> concession to Peter E's view that that will calling ERR_get_error()
> more will add useless cycles.

The attached patch is what I have in mind.

I can produce a back-patchable variant of this if you and Peter E.
think this approach is okay.

-- 
Peter Geoghegan
From f7a72e36cdf2ff58857bd962e26daabdc5747fe1 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Tue, 26 Jan 2016 15:11:15 -0800
Subject: [PATCH] Distrust external OpenSSL clients; clear err queue

OpenSSL has an unfortunate tendency to mix per-session state error
handling with per-thread error handling.  This can cause problems when
programs that link to libpq with OpenSSL enabled have some other use of
OpenSSL; without care, one caller of OpenSSL may cause problems for the
other caller.  Backend code might similarly be affected, for example
when a third party extension independently uses OpenSSL without taking
the appropriate precautions.

To fix, don't trust other users of OpenSSL to clear the per-thread error
queue.  Instead, clear the entire per-thread queue ahead of certain I/O
operations when it appears that there might be trouble (these I/O
operations mostly need to call SSL_get_error() to check for success,
which relies on the queue being empty).  This is slightly aggressive,
but it's pretty clear that the other callers have a very dubious claim
to ownership of the per-thread queue.  Do this is both frontend and
backend code.

Finally, be more careful about clearing our own error queue, so as to
not cause these problems ourself.  It's possibly that control previously
did not always reach SSLerrmessage(), where ERR_get_error() was supposed
to be called to clear the queue's earliest code.  Make sure
ERR_get_error() is always called, so as to spare other users of OpenSSL
the possibility of similar problems caused by libpq (as opposed to
problems caused by a third party OpenSSL library like PHP's OpenSSL
extension).  Again, do this is both frontend and backend code.

See bug #12799 and https://bugs.php.net/bug.php?id=68276

Based on patches by Dave Vitek and Peter Eisentraut.

Back-patch to all supported versions.
---
 src/backend/libpq/be-secure-openssl.c| 104 ++--
 src/interfaces/libpq/fe-secure-openssl.c | 114 ---
 2 files changed, 173 insertions(+), 45 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 1e3dfb6..8fd92ab 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -78,7 +78,7 @@ static DH  *tmp_dh_cb(SSL *s, int is_export, int keylength);
 static int	verify_cb(int, X509_STORE_CTX *);
 static void info_cb(const SSL *ssl, int type, int args);
 static void initialize_ecdh(void);
-static const char *SSLerrmessage(void);
+static const char *SSLerrmessage(unsigned long ecode);
 
 static char *X509_NAME_to_cstring(X509_NAME *name);
 
@@ -182,7 +182,7 @@ be_tls_init(void)
 		if (!SSL_context)
 			ereport(FATAL,
 	(errmsg("could not create SSL context: %s",
-			SSLerrmessage(;
+			SSLerrmessage(ERR_get_error();
 
 		/*
 		 * Disable OpenSSL's moving-write-buffer sanity check, because it
@@ -198,7 +198,7 @@ be_tls_init(void)
 			ereport(FATAL,
 	(errcode(ERRCODE_CONFIG_FILE_ERROR),
   errmsg("could not load server certificate file \"%s\": %s",
-		 ssl_cert_file, SSLerrmessage(;
+		 ssl_cert_file, SSLerrmessage(ERR_get_error();
 
 		if (stat(ssl_key_file, ) != 0)
 			ereport(FATAL,
@@ -228,12 +228,12 @@ be_tls_init(void)
 		SSL_FILETYPE_PEM) != 1)
 			ereport(FATAL,
 	(errmsg("could not load private key file \"%s\": %s",
-			ssl_key_file, SSLerrmessage(;
+			ssl_key_file, SSLerrmessage(ERR_get_error();
 
 		if (SSL_CTX_check_private_key(SSL_context) != 1)
 			ereport(FATAL,
 	(errmsg("check of private key failed: %s",
-			SSLerrmessage(;
+			SSLerrmessage(ERR_get_error();
 	}
 
 	/* set up ephemeral DH keys, and disallow SSL v2/v3 while at it */
@@ -262,7 +262,7 @@ be_tls_init(void)
 			(root_cert_list = SSL_load_client_CA_file(ssl_ca_file)) == NULL)
 			ereport(FATAL,
 	(errmsg("could not load root certificate file \"%s\": %s",
-			ssl_ca_file, SSLerrmessage(;
+			ssl_ca_file, SSLerrmessage(ERR_get_error();
 	}
 
 	/*--
@@ -293,7 +293,7 @@ be_tls_init(void)
 			else
 ereport(FATAL,
 		(errmsg("could not load SSL certificate revocation list file \"%s\": %s",
-ssl_crl_file, SSLerrmessage(;
+ssl_crl_file, SSLerrmessage(ERR_get_error();
 		}
 	}
 
@@ -330,6 +330,7 @@ be_tls_open_server(Port *port)
 	int			r;
 	int			err;
 	int			waitfor;
+	

Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-14 Thread Peter Geoghegan
On Mon, Mar 14, 2016 at 4:05 PM, Tom Lane  wrote:
> So your proposal is basically to do #2 in all branches?  I won't fight it,
> if it doesn't bloat the code much.  The overhead should surely be trivial
> compared to network communication costs, and I'm afraid you might be right
> about the risk of latent bugs.

Yes, with one small difference: I wouldn't be calling ERR_get_error()
in the common case where SSL_get_error() returns SSL_ERROR_NONE, on
the theory that skipping that case represents no risk. I'm making a
concession to Peter E's view that that will calling ERR_get_error()
more will add useless cycles.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-14 Thread Tom Lane
Peter Geoghegan  writes:
> On Mon, Mar 14, 2016 at 3:06 PM, Tom Lane  wrote:
>> Agreed, we need to deal with this one way or the other.  My proposal
>> is:
>> 
>> 1. In HEAD, do it as Peter E. suggests, ie clear error queue before calls.
>> 
>> 2. In back branches, clear error queue before *and* after calls.  This
>> will waste a few nanoseconds but will avoid any risk of breaking
>> existing third-party code.

> I am concerned that users will never be able to get this right, since
> I think it requires every Ruby or PHP app using some thin OpenSSL
> wrapper to clear the per-queue thread. It's a big mess, but it's our
> mess to some degree.

So your proposal is basically to do #2 in all branches?  I won't fight it,
if it doesn't bloat the code much.  The overhead should surely be trivial
compared to network communication costs, and I'm afraid you might be right
about the risk of latent bugs.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-14 Thread Peter Geoghegan
On Mon, Mar 14, 2016 at 3:06 PM, Tom Lane  wrote:
> Agreed, we need to deal with this one way or the other.  My proposal
> is:
>
> 1. In HEAD, do it as Peter E. suggests, ie clear error queue before calls.
>
> 2. In back branches, clear error queue before *and* after calls.  This
> will waste a few nanoseconds but will avoid any risk of breaking
> existing third-party code.
>
> I think it's reasonable to expect extensions to update to the newer
> behavior in a major release, but we're taking risks if we expect
> that to happen in minor releases.

I am concerned that users will never be able to get this right, since
I think it requires every Ruby or PHP app using some thin OpenSSL
wrapper to clear the per-queue thread. It's a big mess, but it's our
mess to some degree.

I wonder if it would be just as good if we ensured that
ERR_get_error() was definitely called in any circumstance where it
looked like we might have an error: ERR_get_error() would be
*reliably* called, as in my patch, but unlike my patch only when
SSL_get_error() indicated a problem (not all the time).

Heikki believed that clearing the error queue by calling
ERR_clear_error() before calling an I/O function like SSL_read() was
necessary, as we all do; no controversy there. But Heikki also
believed, even prior to hearing about this bug, that it was important
and necessary for ERR_get_error() to be reached when there might be an
error added to the queue following a Postgres/libpq call to an I/O
function like SSL_read() followed by SSL_get_error() indicating
trouble. He thought, as I do, that it would be a good idea to not rely
on that happening from a distance (i.e. not relying on reaching
SSLerrmessage()). Peter E. seems to believe that there is absolutely
no reason to rely on ERR_get_error() getting called at all, and that
the existing SSLerrmessage() only exists for the purposes of producing
a human-readable error message.

Anyway, thinking about it some more, perhaps the best solution is to
do the ERR_get_error() call iff SSL_get_error() seems unhappy, perhaps
even placing the two into a utility function. That's probably almost
the same as the existing behavior, as far as clearing up the queue
after we may have added to it goes. I don't know if that's any less
safe then my patch's pessimistic approach. It seems like it might be
just as safe. Under this compromise, I think we'd probably clear the
error queue after SSL_get_error() returned a value that is not
SSL_ERROR_NONE, though (including SSL_ERROR_WANT_READ, etc). What do
you think about that?

> In any case, we need a patch that touches the backend-side code as well.

I agree that the backend-side code should be covered. I quickly
changed my mind about that, and am happy to produce a revision along
those lines.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-14 Thread Tom Lane
Peter Geoghegan  writes:
> On Thu, Mar 10, 2016 at 8:16 PM, Peter Geoghegan  wrote:
>> On Thu, Mar 10, 2016 at 7:22 PM, Peter Eisentraut  wrote:
>>> Arguably, if everyone followed "my" approach, this should be very easy
>>> to fix everywhere.

>> I don't think that there is any clear indication that the OpenSSL
>> people would share that view. Or my view. Or anything that's sensible
>> or practical or actionable.

> Ideally, we'd be able to get this into the upcoming minor release.

Agreed, we need to deal with this one way or the other.  My proposal
is:

1. In HEAD, do it as Peter E. suggests, ie clear error queue before calls.

2. In back branches, clear error queue before *and* after calls.  This
will waste a few nanoseconds but will avoid any risk of breaking
existing third-party code.

I think it's reasonable to expect extensions to update to the newer
behavior in a major release, but we're taking risks if we expect
that to happen in minor releases.

In any case, we need a patch that touches the backend-side code as well.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-14 Thread Peter Geoghegan
On Thu, Mar 10, 2016 at 8:16 PM, Peter Geoghegan  wrote:
> On Thu, Mar 10, 2016 at 7:22 PM, Peter Eisentraut  wrote:
>> Arguably, if everyone followed "my" approach, this should be very easy
>> to fix everywhere.
>
> I don't think that there is any clear indication that the OpenSSL
> people would share that view. Or my view. Or anything that's sensible
> or practical or actionable.

Ideally, we'd be able to get this into the upcoming minor release.
This bug has caused Heroku some serious problems.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-10 Thread Peter Geoghegan
On Thu, Mar 10, 2016 at 7:22 PM, Peter Eisentraut  wrote:
> Arguably, if everyone followed "my" approach, this should be very easy
> to fix everywhere.

I don't think that there is any clear indication that the OpenSSL
people would share that view. Or my view. Or anything that's sensible
or practical or actionable.

> Instead, reading through the PHP bug report, they
> are coming up with a fairly complex solution for clearing the error
> queue afterwards so as to not leave "landmines" for the next guy.  But
> their code will (AFAICT) still be wrong because they are not clearing
> the error *before* the API calls where it is required per documentation.
>  So "everyone" (sample of 2) is scrambling to clean up for the next guy
> instead of doing the straightforward fix of following the API
> documentation and cleaning up before their own calls.

It will be less wrong, though.

PostgreSQL is the project that doesn't trust a C90 standard library
function to not blithely write passed the bounds of a passed buffer,
all because of a bug in certain versions of Solaris based systems that
was several years old at the time. See commit be8b06c364. My view that
that wasn't really worth worrying about was clearly the minority view
when this was discussed (a minority of 1, and a majority of 4 or 5,
IIRC). I think that this case vastly exceeds that standard for
worrying about other people's broken code; in this case, we ourselves
made the same mistake for years and years.

> I also see the clean-up-afterwards approach in the Python ssl module.  I
> fear there is de facto a second API specification that requires you to
> clean up errors after yourself and gives an implicit guarantee that the
> error queue is empty whenever you want to make any SSL calls.  I don't
> think this actually works in all cases, but maybe if everyone else is
> convinced of that (in plain violation of the published OpenSSL
> documentation, AFAICT) we need to get on board with that for
> interoperability.

I didn't know that Python's ssl module did that. That seems to lend
support to my view, which is that we should similarly clear the
thread's queue lest anyone else be affected. Yes, this approach is
fairly scatter-gun, but frankly that's just the situation we find
ourselves in.

>>> Also, there is nothing that
>>> says that an error produces exactly one entry in the error queue; it
>>> could be multiple.  Or that errors couldn't arise at random times
>>> between the reset and whatever happens next.
>>
>> I think that it's kind of implied, since calling ERR_get_error() pops
>> the stack. But even if that isn't so, it might be worth preventing bad
>> things from happening to client applications only sometimes.
>
> The lore on the internet suggests that multiple errors could definitely
> happen.  So popping one error afterwards isn't going to fix it, it just
> moves the edge case around.

Are you sure, specifically, that an I/O function is known to add more
than one error to the per-thread queue? Obviously there can be more
than one error in the queue. But I haven't seen any specific
indication, either in the docs or in the lore, that more than one
error can be added by a single call to an I/O function such as
SSL_read(). Perhaps you can share where you encountered the lore.

>> doesn't it give you pause? Heikki seemed to think that clearing our
>> own queue was important when he looked at this a year ago:
>>
>> http://www.postgresql.org/message-id/54edd30d.5050...@vmware.com
>
> I think that message suggests that we should clear the queue before each
> call, not after.

Uh, it very clearly *is* Heikki's view that we should clear the queue
*afterwards*. Certainly, I think Heikki also wanted us to clear the
queue before, so we aren't screwed, "just to be sure", as he puts it
-- but nobody disputes that that's necessary anyway. That it might not
be *sufficient* to just call ERR_get_error() is the new information in
the bug report. Heikki said:

"""

The OpenSSL manual doesn't directly require you to call
ERR_clear_error() before every SSL_* call. It just requires that you
ensure that the error queue is empty. Libpq ensures that by always
clearing the queue *after* an error happens, in SSLerrmessage().


"""

The problem with this, as Heikki goes on to say, it that we might not
get to that point in SSLerrmessage(); we may not be able to pop the
queue/call ERR_get_error(), more or less by accident (e.g. I noticed
an OOM could do that). That's why I proposed to fix that by calling
ERR_get_error() early and unambiguously. If we must rely on that
happening, it should not be from such a long distance (i.e. from
within SSLerrmessage(), which is kind of far removed from the original
I/O function calls).

Thanks
--
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-10 Thread Peter Eisentraut
On 3/10/16 9:38 PM, Peter Geoghegan wrote:
> Looked at your proposed patch. Will respond to your original mail on the 
> matter.
> 
> On Thu, Mar 3, 2016 at 4:15 PM, Peter Eisentraut  wrote:
>> I think clearing the error after a call is not necessary.  The API
>> clearly requires that you should clear the error queue before a call, so
>> clearing it afterwards does not accomplish anything, except maybe make
>> broken code work sometimes, for a while.
> 
> Uh, if it's so clear, then why haven't we been doing it all along?

The issue only happens if two interleaving trains of execution, one of
which is libpq, use OpenSSL.  Not many applications do that.  And you
also need to provoke the errors in a certain order.  And even then, in
some cases you might just see a false positive error, rather than a
crash.  So it's an edge case.

> Part of the problem is that various scripting language OpenSSL
> wrappers are only very thin wrappers. They effectively pass the buck
> on to PHP and Ruby devs. If we cannot get it right, what chance have
> they? I've personally experienced a bit uptick in complaints about
> this recently. I think there are 3 separate groups within Heroku that
> regularly ask me how this patch is doing.

I think they have been getting away with it for so long for the same
reasons.

Arguably, if everyone followed "my" approach, this should be very easy
to fix everywhere.  Instead, reading through the PHP bug report, they
are coming up with a fairly complex solution for clearing the error
queue afterwards so as to not leave "landmines" for the next guy.  But
their code will (AFAICT) still be wrong because they are not clearing
the error *before* the API calls where it is required per documentation.
 So "everyone" (sample of 2) is scrambling to clean up for the next guy
instead of doing the straightforward fix of following the API
documentation and cleaning up before their own calls.

I also see the clean-up-afterwards approach in the Python ssl module.  I
fear there is de facto a second API specification that requires you to
clean up errors after yourself and gives an implicit guarantee that the
error queue is empty whenever you want to make any SSL calls.  I don't
think this actually works in all cases, but maybe if everyone else is
convinced of that (in plain violation of the published OpenSSL
documentation, AFAICT) we need to get on board with that for
interoperability.

>> Also, there is nothing that
>> says that an error produces exactly one entry in the error queue; it
>> could be multiple.  Or that errors couldn't arise at random times
>> between the reset and whatever happens next.
> 
> I think that it's kind of implied, since calling ERR_get_error() pops
> the stack. But even if that isn't so, it might be worth preventing bad
> things from happening to client applications only sometimes.

The lore on the internet suggests that multiple errors could definitely
happen.  So popping one error afterwards isn't going to fix it, it just
moves the edge case around.  At least what we should do is clear the
entire queue afterwards instead of just the first error.

> doesn't it give you pause? Heikki seemed to think that clearing our
> own queue was important when he looked at this a year ago:
> 
> http://www.postgresql.org/message-id/54edd30d.5050...@vmware.com

I think that message suggests that we should clear the queue before each
call, not after.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-10 Thread Peter Geoghegan
Looked at your proposed patch. Will respond to your original mail on the matter.

On Thu, Mar 3, 2016 at 4:15 PM, Peter Eisentraut  wrote:
> I think clearing the error after a call is not necessary.  The API
> clearly requires that you should clear the error queue before a call, so
> clearing it afterwards does not accomplish anything, except maybe make
> broken code work sometimes, for a while.

Uh, if it's so clear, then why haven't we been doing it all along? The
API doesn't require you to take *any* specific practical measure (for
example, the specific practical measure of resetting the queue before
calling an I/O function). It simply says "this exact thing cannot be
allowed to happen; the consequences are undefined", with nothing in
the way of guidance on what that means in the real world. It's a
shockingly bad API, but that's the reality.

Part of the problem is that various scripting language OpenSSL
wrappers are only very thin wrappers. They effectively pass the buck
on to PHP and Ruby devs. If we cannot get it right, what chance have
they? I've personally experienced a bit uptick in complaints about
this recently. I think there are 3 separate groups within Heroku that
regularly ask me how this patch is doing.

> Also, there is nothing that
> says that an error produces exactly one entry in the error queue; it
> could be multiple.  Or that errors couldn't arise at random times
> between the reset and whatever happens next.

I think that it's kind of implied, since calling ERR_get_error() pops
the stack. But even if that isn't so, it might be worth preventing bad
things from happening to client applications only sometimes.

> I think this is analogous to clearing errno before a C library call.
> You could clear it afterwards as well, to be nice to the next guy, but
> the next guy should really take care of that themselves, and we can't
> rely on what happens in between anyway.

It sounds like you're saying "well, we cannot be expected to bend over
backwards to make broken code work". But that broken code includes
every single version of libpq + OpenSSL currently distributed. Seems
like a very high standard. I'm not saying that that means we
definitely should clear the error queue reliably ourselves, but
doesn't it give you pause? Heikki seemed to think that clearing our
own queue was important when he looked at this a year ago:

http://www.postgresql.org/message-id/54edd30d.5050...@vmware.com

Again, not conclusive, but I would like to hear a rationale for why
you think it's okay to not consistently clear our own queue for the
benefit of others. Is this informed by a concern about some specific
downside to taking that extra precaution?

Thanks
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-10 Thread Peter Eisentraut
On 3/10/16 6:10 PM, Peter Geoghegan wrote:
> On Thu, Mar 10, 2016 at 3:09 PM, Peter Geoghegan  wrote:
>> Getting to it very soon. Just really busy right this moment.
> 
> That said, I agree with Peter's remarks about doing this frontend and
> backend. So, while I'm not sure, I think we're in agreement on all
> issues. I would have no problem with Peter E following through with
> final steps + commit as Robert outlined, if that works for him.

My proposal is the attached patch.


From 697b4f75fccbb5eda530211f3ef58c2b226c5461 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 10 Mar 2016 20:59:30 -0500
Subject: [PATCH] Clear OpenSSL error queue before OpenSSL calls

OpenSSL requires that the error queue be cleared before certain OpenSSL
API calls, so that you can be sure that the error you are checking
afterwards actually come from you and was not left over from other
activity.  We had never done that, which appears to have worked as long
as we are the only users of OpenSSL in the process.  But if a process
using libpq or a backend plugin uses OpenSSL as well, this can lead to
confusion and crashes.

see bug #12799 and https://bugs.php.net/bug.php?id=68276

based on patches by Dave Vitek and Peter Geoghegan
---
 src/backend/libpq/be-secure-openssl.c| 3 +++
 src/interfaces/libpq/fe-secure-openssl.c | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 1e3dfb6..be337f5 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -353,6 +353,7 @@ be_tls_open_server(Port *port)
 	port->ssl_in_use = true;
 
 aloop:
+	ERR_clear_error();
 	r = SSL_accept(port->ssl);
 	if (r <= 0)
 	{
@@ -501,6 +502,7 @@ be_tls_read(Port *port, void *ptr, size_t len, int *waitfor)
 	int			err;
 
 	errno = 0;
+	ERR_clear_error();
 	n = SSL_read(port->ssl, ptr, len);
 	err = SSL_get_error(port->ssl, n);
 	switch (err)
@@ -558,6 +560,7 @@ be_tls_write(Port *port, void *ptr, size_t len, int *waitfor)
 	int			err;
 
 	errno = 0;
+	ERR_clear_error();
 	n = SSL_write(port->ssl, ptr, len);
 	err = SSL_get_error(port->ssl, n);
 	switch (err)
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index 133546b..0535338 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -212,6 +212,7 @@ pgtls_read(PGconn *conn, void *ptr, size_t len)
 
 rloop:
 	SOCK_ERRNO_SET(0);
+	ERR_clear_error();
 	n = SSL_read(conn->ssl, ptr, len);
 	err = SSL_get_error(conn->ssl, n);
 	switch (err)
@@ -320,6 +321,7 @@ pgtls_write(PGconn *conn, const void *ptr, size_t len)
 	int			err;
 
 	SOCK_ERRNO_SET(0);
+	ERR_clear_error();
 	n = SSL_write(conn->ssl, ptr, len);
 	err = SSL_get_error(conn->ssl, n);
 	switch (err)
@@ -1327,6 +1329,7 @@ open_client_SSL(PGconn *conn)
 {
 	int			r;
 
+	ERR_clear_error();
 	r = SSL_connect(conn->ssl);
 	if (r <= 0)
 	{
-- 
2.7.2


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-10 Thread Peter Geoghegan
On Thu, Mar 10, 2016 at 3:09 PM, Peter Geoghegan  wrote:
> Getting to it very soon. Just really busy right this moment.

That said, I agree with Peter's remarks about doing this frontend and
backend. So, while I'm not sure, I think we're in agreement on all
issues. I would have no problem with Peter E following through with
final steps + commit as Robert outlined, if that works for him.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-10 Thread Peter Geoghegan
On Thu, Mar 10, 2016 at 2:50 PM, Robert Haas  wrote:
> So what's the next step here?  Peter G, are you planning to update the
> patch based on this review from Peter E?  If not, Peter E, do you want
> to update the patch and commit?  If neither, I'm going to mark this
> Returned with Feedback in the CF and move on, which seems a bit of a
> shame since this appears to be a bona fide bug, but if nobody's
> willing to work on it, it ain't gettin' fixed.

Getting to it very soon. Just really busy right this moment.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-10 Thread Robert Haas
On Thu, Mar 3, 2016 at 7:15 PM, Peter Eisentraut  wrote:
> On 2/5/16 5:04 AM, Peter Geoghegan wrote:
>> As Heikki goes into on that thread, the appropriate action seems to be
>> to constantly reset the error queue, and to make sure that we
>> ourselves clear the queue consistently. (Note that we might not have
>> consistently called ERR_get_error() in the event of an OOM within
>> SSLerrmessage(), for example). I have not changed backend code in the
>> patch, though; I felt that we had enough control of the general
>> situation there for it to be unnecessary to lock everything down.
>
> I think clearing the error after a call is not necessary.  The API
> clearly requires that you should clear the error queue before a call, so
> clearing it afterwards does not accomplish anything, except maybe make
> broken code work sometimes, for a while.  Also, there is nothing that
> says that an error produces exactly one entry in the error queue; it
> could be multiple.  Or that errors couldn't arise at random times
> between the reset and whatever happens next.
>
> I think this is analogous to clearing errno before a C library call.
> You could clear it afterwards as well, to be nice to the next guy, but
> the next guy should really take care of that themselves, and we can't
> rely on what happens in between anyway.
>
> The places that you identified for change look correct as far as libpq
> goes.  I do think that the backend should be updated in the same way,
> because it's a) correct, b) easy enough, and c) there could well be
> interactions with postgres_fdw, plproxy, plperl, or who knows what.

So what's the next step here?  Peter G, are you planning to update the
patch based on this review from Peter E?  If not, Peter E, do you want
to update the patch and commit?  If neither, I'm going to mark this
Returned with Feedback in the CF and move on, which seems a bit of a
shame since this appears to be a bona fide bug, but if nobody's
willing to work on it, it ain't gettin' fixed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-03 Thread Peter Eisentraut
On 2/5/16 5:04 AM, Peter Geoghegan wrote:
> As Heikki goes into on that thread, the appropriate action seems to be
> to constantly reset the error queue, and to make sure that we
> ourselves clear the queue consistently. (Note that we might not have
> consistently called ERR_get_error() in the event of an OOM within
> SSLerrmessage(), for example). I have not changed backend code in the
> patch, though; I felt that we had enough control of the general
> situation there for it to be unnecessary to lock everything down.

I think clearing the error after a call is not necessary.  The API
clearly requires that you should clear the error queue before a call, so
clearing it afterwards does not accomplish anything, except maybe make
broken code work sometimes, for a while.  Also, there is nothing that
says that an error produces exactly one entry in the error queue; it
could be multiple.  Or that errors couldn't arise at random times
between the reset and whatever happens next.

I think this is analogous to clearing errno before a C library call.
You could clear it afterwards as well, to be nice to the next guy, but
the next guy should really take care of that themselves, and we can't
rely on what happens in between anyway.

The places that you identified for change look correct as far as libpq
goes.  I do think that the backend should be updated in the same way,
because it's a) correct, b) easy enough, and c) there could well be
interactions with postgres_fdw, plproxy, plperl, or who knows what.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Fix for OpenSSL error queue bug

2016-02-05 Thread Peter Geoghegan
Attached patch fixes an issue reported by William Felipe Welter about
a year ago [1]. It is loosely based on his original patch.

As Heikki goes into on that thread, the appropriate action seems to be
to constantly reset the error queue, and to make sure that we
ourselves clear the queue consistently. (Note that we might not have
consistently called ERR_get_error() in the event of an OOM within
SSLerrmessage(), for example). I have not changed backend code in the
patch, though; I felt that we had enough control of the general
situation there for it to be unnecessary to lock everything down.

The interface that OpenSSL exposes for all of this is very poorly
thought out. It's not exactly clear how a client of OpenSSL can be a
"good citizen" in handling the error queue. Correctly using the
library is only ever described in terms of a very exact thing that
must happen or must not happen. There is no overarching concept of how
things fit together so that each OpenSSL client doesn't clobber the
other. It's all rather impractical. Clearly, this patch needs careful
review.

[1] 
http://www.postgresql.org/message-id/flat/20150224030956.2529.83...@wrigleys.postgresql.org#20150224030956.2529.83...@wrigleys.postgresql.org
-- 
Peter Geoghegan
From d5433bc562f792afd75ef8664729db9e6a60ee62 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Tue, 26 Jan 2016 15:11:15 -0800
Subject: [PATCH] Distrust external OpenSSL clients; clear err queue

OpenSSL has an unfortunate tendency to mix up per-session state error
handling, and per-thread OpenSSL error handling.  This can cause
problems when programs that link to libpq with OpenSSL enabled have some
other use of OpenSSL; without care, one caller of OpenSSL may cause
problems for the other caller.

To fix, don't trust other users of OpenSSL to clear the per-thread error
queue.  Instead, clear the entire per-thread queue ahead of certain I/O
operations (these I/O operations mostly need to call SSL_get_error() to
check for success, which relies on the queue being empty).  This is a
bit aggressive, but it's pretty clear that the other callers have a very
dubious claim to ownership of the per-thread queue; problem reports
involving PHP scripts that use both PHP's OpenSSL extension, and the
pgsql PHP extension (libpq) are themselves evidence of this.

Finally, be more careful about clearing our own error queue, so as to
not cause these problems ourself.  It's possibly that control previously
did not always reach SSLerrmessage(), where ERR_get_error() was supposed
to be called to clear the queue's earliest code.  Make sure
ERR_get_error() is always called, so as to spare other users of OpenSSL
the possibility of similar problems caused by libpq (as opposed to
problems caused by a third party OpenSSL library like PHP's OpenSSL
extension).
---
 src/interfaces/libpq/fe-secure-openssl.c | 125 +--
 1 file changed, 104 insertions(+), 21 deletions(-)

diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index 133546b..270d184 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -70,7 +70,7 @@ static int verify_peer_name_matches_certificate_name(PGconn *conn,
 static void destroy_ssl_system(void);
 static int	initialize_SSL(PGconn *conn);
 static PostgresPollingStatusType open_client_SSL(PGconn *);
-static char *SSLerrmessage(void);
+static char *SSLerrmessage(unsigned long errcode);
 static void SSLerrfree(char *buf);
 
 static int	my_sock_read(BIO *h, char *buf, int size);
@@ -152,7 +152,7 @@ pgtls_open_client(PGconn *conn)
 			!SSL_set_app_data(conn->ssl, conn) ||
 			!my_SSL_set_fd(conn, conn->sock))
 		{
-			char	   *err = SSLerrmessage();
+			char	   *err = SSLerrmessage(ERR_get_error());
 
 			printfPQExpBuffer(>errorMessage,
    libpq_gettext("could not establish SSL connection: %s\n"),
@@ -209,11 +209,37 @@ pgtls_read(PGconn *conn, void *ptr, size_t len)
 	int			result_errno = 0;
 	char		sebuf[256];
 	int			err;
+	unsigned long errcode;
 
 rloop:
 	SOCK_ERRNO_SET(0);
+
+	/*
+	 * Prepare to call SSL_get_error() by clearing thread's OpenSSL error
+	 * queue.  In general, the current thread's error queue must be empty
+	 * before the TLS/SSL I/O operation is attempted, or SSL_get_error()
+	 * will not work reliably.  Since the possibility exists that other
+	 * OpenSSL clients running in the same thread but not under our control
+	 * will fail to call ERR_get_error() themselves (after their own I/O
+	 * operations), pro-actively clear the per-thread error queue now.
+	 */
+	ERR_clear_error();
 	n = SSL_read(conn->ssl, ptr, len);
 	err = SSL_get_error(conn->ssl, n);
+
+	/*
+	 * Other clients of OpenSSL may fail to call ERR_get_error(), but we
+	 * always do (so as to not cause problems for OpenSSL clients that
+	 * don't call ERR_clear_error() defensively, but are responsible enough
+	 * to call ERR_get_error() to clear error