Hello all,

Not sure this is exactly the right list, so feel free to point me to another one as appropriate.


While working on a higher-level binding to the libpq library, I've (likely) discovered a problem with non-blocking operation when OpenSSL is used. It looks so striking that I'd like to share my observation.

For libpq, non-blocking operation is documented as a normal supported feature, e.g. [1]. OpenSSL transport is also documented as a normal supported feature, e.g. [2]. I have not found anywhere in the documentation any clear warning that non-blocking operation and OpenSSL transport are mutually exclusive, or that they might not work together as specified.

From [1] we learn (through some intricate wording) that in order to avoid blocking in PQgetResult() one can employ PQsetnonblocking(), PQflush(), PQconsumeInput() and PQisBusy(). Supposedly all of them are non-blocking after calling PQsetnonblocking(); this is not stated explicitly, but otherwise it would make no sense whatsoever, right?

Now let's have a look at e.g. PQconsumeInput():

===================
.....
/*
 * Load more data, if available. We do this no matter what state we are
 * in, since we are probably getting called because the application wants
 * to get rid of a read-select condition. Note that we will NOT block
 * waiting for more input.
 */
if (pqReadData(conn) < 0)
        return 0;

/* Parsing of the data waits till later. */
return 1;
}
===================

It is stated that pqReadData() will NOT block. Now let's get inside:

===================
.....
/* OK, try to read some data */
retry3:
        nread = pqsecure_read(conn, conn->inBuffer + conn->inEnd,
                         conn->inBufSize - conn->inEnd);
.....
/*
 * Still not sure that it's EOF, because some data could have just
 * arrived.
 */
retry4:
        nread = pqsecure_read(conn, conn->inBuffer + conn->inEnd,
                        conn->inBufSize - conn->inEnd);
....
====================

Now, in the SSL case, this pqsecure_read() is just a wrapper around pgtls_read(), so let's look further:

====================
pgtls_read(PGconn *conn, void *ptr, size_t len)
{
.....
rloop:
        SOCK_ERRNO_SET(0);
        n = SSL_read(conn->ssl, ptr, len);
        err = SSL_get_error(conn->ssl, n);
        switch (err)
        {
......
                        break;
        case SSL_ERROR_WANT_WRITE:
        /* Returning 0 here would cause caller to wait for read-ready,
         * which is not correct since what SSL wants is wait for
         * write-ready.  The former could get us stuck in an infinite
         * wait, so don't risk it; busy-loop instead. */
        goto rloop;
======================

So, following PQconsumeInput() -> pqReadData() -> pqsecure_read() -> pgtls_read() in a supposedly non-blocking operation, we finally arrive at a tight busy-loop that spins for as long as SSL_read() keeps reporting SSL_ERROR_WANT_WRITE! How could such a thing ever be acceptable:

- with not even a sleep(1),
- no timeout,
- no diagnostics of any sort,
- a comment implying that getting stuck in a (potentially) infinite sleepless loop deep inside a library is OK.
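
For comparison, waiting on a socket with a bounded timeout takes only a few lines. Below is a minimal sketch, assuming a POSIX system with poll(). The helper name wait_for_io() and its return convention are my own invention for illustration and are not part of libpq:

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper (not part of libpq): wait until fd is readable
 * (want_write == 0) or writable (want_write != 0), for at most
 * timeout_ms milliseconds.
 *
 * Returns 1 if poll() reported an event on fd, 0 on timeout, -1 on error.
 * An error or hang-up condition on fd also counts as an event; the next
 * read or write on fd will then report the actual failure.
 */
int
wait_for_io(int fd, int want_write, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(timeout_ms >= 0);    /* this sketch never waits indefinitely */

    pfd.fd = fd;
    pfd.events = want_write ? POLLOUT : POLLIN;
    pfd.revents = 0;

    /* Simplification: a retry after EINTR restarts the full timeout. */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}
```

With something like this, a caller that is told "write-ready needed" could wait for POLLOUT with a deadline and report a timeout, rather than spinning.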

Looking further into pgtls_read(), it seems the function simply has an inadequate interface: it has no way to reliably indicate some important details to its caller, namely the need to wait for write-readiness. It's as if SSL support were a quick-and-dirty hack rather than a consistently integrated feature. Or do I read it all wrong?
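
To make the point concrete, here is a rough sketch of the kind of interface I have in mind: the read routine makes a single attempt and tells its caller which direction to wait for, rather than looping. All names below (io_status, tls_result, tls_read_fn, nb_read, scripted_read) are hypothetical stand-ins of mine. Real code would map OpenSSL's SSL_ERROR_* values, and this is not how libpq is structured today:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: what the caller must do before retrying the read. */
typedef enum { IO_DONE, IO_WANT_READ, IO_WANT_WRITE, IO_FAILED } io_status;

/* Hypothetical stand-ins for the SSL_ERROR_* codes of the TLS layer. */
typedef enum { TLS_OK, TLS_WANT_READ, TLS_WANT_WRITE, TLS_FATAL } tls_result;

/* Signature assumed for the underlying TLS read; sets *nread on TLS_OK. */
typedef tls_result (*tls_read_fn)(void *ctx, void *buf, size_t len,
                                  size_t *nread);

/* One non-blocking read attempt: never loops, always says why it stopped. */
io_status
nb_read(tls_read_fn read_fn, void *ctx, void *buf, size_t len, size_t *nread)
{
    assert(read_fn != NULL && nread != NULL);
    *nread = 0;

    switch (read_fn(ctx, buf, len, nread))
    {
        case TLS_OK:         return IO_DONE;
        case TLS_WANT_READ:  return IO_WANT_READ;   /* wait for read-ready */
        case TLS_WANT_WRITE: return IO_WANT_WRITE;  /* wait for write-ready */
        default:             return IO_FAILED;
    }
}

/*
 * Mock TLS layer for illustration only: the first call asks for a write,
 * later calls deliver the two bytes "ok". ctx points to an int call counter.
 */
tls_result
scripted_read(void *ctx, void *buf, size_t len, size_t *nread)
{
    int *calls = ctx;

    if ((*calls)++ == 0)
        return TLS_WANT_WRITE;
    if (len < 2)
        return TLS_FATAL;
    memcpy(buf, "ok", 2);
    *nread = 2;
    return TLS_OK;
}
```

The caller would wait for the reported direction (write-ready on IO_WANT_WRITE) in its own event loop and then call nb_read() again, so the decision about sleeping and timeouts stays with the application.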
Any thoughts?

[1] https://www.postgresql.org/docs/9.5/static/libpq-async.html
[2] https://www.postgresql.org/docs/9.5/static/libpq-ssl.html

Thank you,
Regards,

Nikolai


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
