On 11/04/2013 02:51 AM, Claudio Freire wrote:
On Sun, Nov 3, 2013 at 3:58 PM, Florian Weimer <fwei...@redhat.com> wrote:
I would like to add truly asynchronous query processing to libpq, enabling
command pipelining.  The idea is to allow applications to auto-tune to
the bandwidth-delay product and reduce the number of context switches when
running against a local server.
...
If the application is not interested in intermediate query results, it would
use something like this:
...
If there is no need to exit from the loop early (say, because errors are
expected to be extremely rare), the PQgetResultNoWait call can be left out.

It doesn't seem wise to me to make such a distinction. It sounds like
you're oversimplifying, and that's why you need "modes": to overcome
the evidently restrictive limits of the simplified interface. It would
only be a matter of (a short) time before some other limitation
requires some other mode.

I need modes because I want to avoid unbounded buffering, which means that result data has to be consumed in the order queries are issued.

   PGAsyncMode oldMode = PQsetsendAsyncMode(conn, PQASYNC_RESULT);
   bool more_data;
   do {
      more_data = ...;
      if (more_data) {
        int ret = PQsendQueryParams(conn,
          "INSERT ... RETURNING ...", ...);
        if (ret == 0) {
          // handle low-level error
        }
      }
      // Consume all pending results.
      while (1) {
        PGresult *res;
        if (more_data) {
          res = PQgetResultNoWait(conn);
        } else {
          res = PQgetResult(conn);
        }

Somehow, that code looks backwards. I mean, really backwards. Wouldn't
that be !more_data?

No, if more data is available to transfer to the server, the no-wait variant has to be used to avoid a needless synchronization with the server.
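
To spell that out with a minimal sketch (PQgetResultNoWait and PQASYNC_RESULT are only the names proposed in this thread, not existing libpq calls): while the application still has rows to send, it drains whatever results have already arrived without blocking and goes straight back to sending; only once the last command is out does it block for the remaining results.

   // Sketch only; PQgetResultNoWait is proposed, not existing, API.
   for (;;) {
     PGresult *res = more_data ? PQgetResultNoWait(conn)  // never blocks
                               : PQgetResult(conn);       // waits if needed
     if (res == NULL)
       break;  // no-wait: nothing buffered yet; blocking: everything consumed
     if (PQresultStatus(res) == PGRES_FATAL_ERROR) {
       // handle the error; typically stop issuing further commands
     }
     PQclear(res);
   }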

In any case, pipelining like that, without a clear distinction in the
wire protocol of which results pertain to which query, could be a
recipe for trouble when subtle bugs, either in lib usage or
implementation, mistakenly treat one query's result as another's.

We already use pipelining in libpq (see pqFlush, PQsendQueryGuts and pqParseInput3), the server is supposed to support it, and there is a lack of a clear tit-for-tat response mechanism anyway because of NOTIFY/LISTEN and the way certain errors are reported.
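
To illustrate the NOTIFY/LISTEN point with calls that already exist (nothing here depends on the proposed API): asynchronous notifications are handed to the client out of band through PQnotifies(), so applications already have to match results to queries by ordering rather than by any per-query tag in the protocol.

   // Existing libpq API only.
   PGnotify *note;
   if (!PQconsumeInput(conn)) {
     // connection-level error
   }
   while ((note = PQnotifies(conn)) != NULL) {
     // note->relname is the channel, note->extra the payload
     PQfreemem(note);
   }

Notifications like these can show up before, between, or after the results of whatever queries happen to be in flight.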

Instead of buffering the results, we could buffer the encoded command
messages in PQASYNC_RESULT mode.  This means that PQsendQueryParams would
not block when it cannot send the (complete) command message, but store it in
the connection object so that the subsequent PQgetResultNoWait and
PQgetResult would send it.  This might work better with single-tuple result
mode.  We cannot avoid buffering either multiple queries or multiple
responses if we want to utilize the link bandwidth, or we'd risk deadlocks.
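
(For reference, the single-tuple result mode mentioned there is the existing PQsetSingleRowMode() facility. A rough sketch with the stock, synchronous API, just to show how it already bounds buffering on the result side by handing back one row per PGresult; the query text is a placeholder.)

   // Existing libpq API; "SELECT ..." stands in for a real query.
   if (PQsendQueryParams(conn, "SELECT ...", 0, NULL, NULL, NULL, NULL, 0)
       && PQsetSingleRowMode(conn)) {
     PGresult *res;
     while ((res = PQgetResult(conn)) != NULL) {
       if (PQresultStatus(res) == PGRES_SINGLE_TUPLE) {
         // process exactly one row
       }
       // a final zero-row PGRES_TUPLES_OK marks the end of the set
       PQclear(res);
     }
   }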

This is a non-solution. Such an implementation, at least as described,
would remove neither network latency nor context switches; it
would be purely an API change with no externally visible behavior
change.

Ugh, why?

An effective solution must include multi-command packets. Without
knowing the wire protocol in detail, something like:

PARSE: INSERT blah
BIND: args
EXECUTE with DISCARD
PARSE: INSERT blah
BIND: args
EXECUTE with DISCARD
PARSE: SELECT blah
BIND: args
EXECUTE with FETCH ALL

All in one packet would be efficient and error-free (IMO).

No, because this doesn't scale automatically with the bandwidth-delay product. It also requires the client to buffer queries and their parameters even though the network has to do that anyway.
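
To put a rough number on it: on a 1 Gbit/s link with a 1 ms round trip, about 125 kB can be in flight, so with INSERT messages and their responses at a couple of hundred bytes each, several hundred statements stay outstanding simply because the socket buffers hold that much; over a local Unix-domain socket the very same code keeps almost nothing queued. The pipeline depth follows the link, without a tuning knob in the client or any client-side copy of the queued statements.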

In any case, I don't want to change the wire protocol, I just want to enable libpq clients to use more of its capabilities.

--
Florian Weimer / Red Hat Product Security Team


