While investigating a ruby-pg issue [1], we noticed that a libpq SSL connection can fail if the running application also uses OpenSSL for other work. The root cause is OpenSSL's thread-local error queue, which is used to pass textual error messages to the application after a failed crypto operation. If the application leaves errors on the queue, communication with the PostgreSQL server can fail with a message left over from the previous failed OpenSSL operation, in particular when using non-blocking operations on the socket. This OpenSSL issue is quite old by now; see [3].

For [1] it turned out that the issue has three parts:
1. The ruby-openssl binding does not clear OpenSSL's thread-local error queue after a certificate verification.
2. OpenSSL uses a single shared error queue for different crypto contexts.
3. libpq does not ensure that the error queue is empty when making SSL_* calls.

To 1: Messages remaining on the error queue can generally cause later operations to fail. I'll talk to the ruby-openssl developers to discuss how we can avoid leaving messages on the queue.

To 2: SSL_get_error() inspects the shared error queue under some conditions. This may be poor API design, but it is documented behaviour [2], so we have to live with it.
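
To illustrate why a leftover entry matters, here is a deliberately simplified model in plain C. It is not OpenSSL's code: every name in it (model_get_error, model_push_error, model_clear_errors and the MODEL_* codes) is made up for illustration, and the order of the checks is my assumption about the behaviour described in [2]. It only shows the general effect: if the queue is consulted when classifying a failed call, a stale entry can turn a harmless "would block" condition into a fatal error.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes for this model; not OpenSSL's SSL_ERROR_* values. */
enum model_result { MODEL_OK, MODEL_WANT_READ, MODEL_FATAL };

/* Stand-in for the thread-local error queue: a count of pending entries. */
static _Thread_local int pending_errors = 0;

/* What a failed, unrelated crypto operation does: leave an entry behind. */
static void model_push_error(void) { pending_errors++; }

/* Counterpart of ERR_clear_error() in this model. */
static void model_clear_errors(void) { pending_errors = 0; }

/*
 * Classify the outcome of an I/O call.  The queue is consulted before the
 * "would block" state (an assumption of this model), so a stale entry
 * turns a retry condition into a fatal error.
 */
static enum model_result
model_get_error(int ret, bool would_block)
{
	if (ret > 0)
		return MODEL_OK;
	if (pending_errors > 0)
		return MODEL_FATAL;
	return would_block ? MODEL_WANT_READ : MODEL_FATAL;
}

/* Walk through the three states; returns 1 if the model behaves as described. */
static int
model_demo(void)
{
	/* Clean queue: a non-blocking read that would block is just a retry. */
	assert(model_get_error(-1, true) == MODEL_WANT_READ);

	/* The application fails some unrelated operation and leaves an entry. */
	model_push_error();
	assert(model_get_error(-1, true) == MODEL_FATAL);

	/* Clearing the queue before the call restores the correct result. */
	model_clear_errors();
	assert(model_get_error(-1, true) == MODEL_WANT_READ);

	return 1;
}
```

In this model, the second case corresponds to what we saw in [1]: the non-blocking operation itself was fine, but the leftover entry made it look like a failure.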

To 3: To make libpq independent of any previous error state, the error queue could be cleared with a call to ERR_clear_error() before SSL_connect/read/write, as in the attached trivial patch. This would make libpq robust against other uses of OpenSSL within the application.

What do you think about clearing the OpenSSL error queue in libpq in that way?

[1] https://bitbucket.org/ged/ruby-pg/issue/142/async_exec-over-ssl-connection-can-fail-on
[2] http://www.openssl.org/docs/ssl/SSL_get_error.html
[3] http://www.educatedguesswork.org/movabletype/archives/2005/03/curse_you_opens.html

diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c
index b1ad776..2a09c5c 100644
--- a/src/interfaces/libpq/fe-secure.c
+++ b/src/interfaces/libpq/fe-secure.c
@@ -323,6 +323,8 @@ pqsecure_read(PGconn *conn, void *ptr, size_t len)
 
 		/* SSL_read can write to the socket, so we need to disable SIGPIPE */
 		DISABLE_SIGPIPE(conn, spinfo, return -1);
+		/* There could be errors left on OpenSSL's error queue from the application */
+		ERR_clear_error();
 
 rloop:
 		SOCK_ERRNO_SET(0);
@@ -485,6 +487,8 @@ pqsecure_write(PGconn *conn, const void *ptr, size_t len)
 		int			err;
 
 		DISABLE_SIGPIPE(conn, spinfo, return -1);
+		/* There could be errors left on OpenSSL's error queue from the application */
+		ERR_clear_error();
 
 		SOCK_ERRNO_SET(0);
 		n = SSL_write(conn->ssl, ptr, len);
@@ -1375,6 +1379,9 @@ open_client_SSL(PGconn *conn)
 {
 	int			r;
 
+	/* There could be errors left on OpenSSL's error queue from the application */
+	ERR_clear_error();
+
 	r = SSL_connect(conn->ssl);
 	if (r <= 0)
 	{
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
