Re: [HACKERS] libpq pipelining

Heikki Linnakangas Thu, 04 Dec 2014 13:40:57 -0800

On 12/04/2014 09:11 PM, Matt Newell wrote:

With the API I am proposing, only 2 new functions (PQgetFirstQuery,
PQgetLastQuery) are required to be able to match each result to the query that
caused it.  Another function, PQgetNextQuery allows iterating through the
pending queries, and PQgetQueryCommand permits getting the original query
text.


Adding the ability to set a user supplied pointer on the PGquery struct might
make it much easier for some frameworks, and other users might want a
callback, but I don't think either are required.

I don't like exposing the PGquery struct to the application like that. Access to all other libpq objects is done via functions. The application can't (or shouldn't, anyway) directly access the fields of PGresult, for example. It has to call PQnfields(), PQntuples() etc.

The user-supplied pointer seems quite pointless. It would make sense if the pointer was passed to PQsendQuery(), and you'd get it back in PGquery. You could then use it to tag the query when you send it with whatever makes sense for the application, and use the tag in the result to match it with the original query. But as it stands, I don't see the point.

The original query string might be handy for some things, but for others it's useless. It's not enough as a general method to identify the query the result belongs to. A common use case for this is to execute the same query many times with different parameters.

So I don't think you've quite nailed the problem of how to match the results to the commands that originated them, yet. One idea is to add a function that can be called after PQgetResult(), to get some identifier of the original command. But there needs to be a mechanism to tag the PQsendQuery() calls. Or you can assign each call a unique ID automatically, and have a way to ask for that ID after calling PQsendQuery().
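
As a rough illustration of the tagging idea above, here is a minimal sketch of the bookkeeping it implies. It assumes results come back in the order the commands were sent, so a FIFO of pending commands is enough to identify the owner of the next completed result. Every name here (pending_cmd, pending_queue, pending_push, pending_pop) is hypothetical and is not part of libpq; the fixed-size ring buffer is just one simple way to store the queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping; none of these names exist in libpq. */
#define MAX_PENDING 64

typedef struct {
    unsigned long id;   /* assigned automatically at send time */
    void *tag;          /* opaque pointer supplied by the application */
} pending_cmd;

typedef struct {
    pending_cmd slots[MAX_PENDING];
    size_t head;            /* index of the oldest unanswered command */
    size_t count;           /* number of unanswered commands */
    unsigned long next_id;  /* next ID to hand out; 0 is reserved for "error" */
} pending_queue;

static void pending_init(pending_queue *q)
{
    q->head = 0;
    q->count = 0;
    q->next_id = 1;
}

/* Record a command when it is sent. Returns its ID, or 0 if the queue is full. */
static unsigned long pending_push(pending_queue *q, void *tag)
{
    size_t tail;

    if (q->count == MAX_PENDING)
        return 0;
    tail = (q->head + q->count) % MAX_PENDING;
    q->slots[tail].id = q->next_id++;
    q->slots[tail].tag = tag;
    q->count++;
    return q->slots[tail].id;
}

/*
 * Call once all results of the oldest command have been read.
 * Returns 1 and fills *out, or 0 if nothing is pending.
 */
static int pending_pop(pending_queue *q, pending_cmd *out)
{
    if (q->count == 0)
        return 0;
    *out = q->slots[q->head];
    q->head = (q->head + 1) % MAX_PENDING;
    q->count--;
    return 1;
}
```

An application would call pending_push() next to each send and pending_pop() each time a command's results are exhausted; the popped entry then carries both the automatic ID and the application's own tag.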

The explanation of PQgetFirstQuery makes it sound pretty hard to match up the result with the query. You have to pay attention to PQisBusy.

It would be good to make it explicit when you start a pipelined operation. Currently, you get an error if you call PQsendQuery() twice in a row, without reading the result in between. That's a good thing, to catch application errors, when you're not trying to do pipelining. Otherwise, if you forget to get the result of a query you've sent, and then send another query, you'll merrily read the result of the first query and think that it belongs to the second.
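
To make the explicit opt-in concrete, here is a toy model of the check described above. It is only a sketch: toy_conn and the toy_* functions are hypothetical stand-ins, not libpq's PGconn or any real libpq call. A second send without an intervening read is rejected unless the application has explicitly asked for pipelining.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a connection; hypothetical, not libpq's PGconn. */
typedef struct {
    bool pipelining;   /* set only by an explicit opt-in call */
    int outstanding;   /* commands sent whose results are still unread */
} toy_conn;

static void toy_init(toy_conn *c)
{
    c->pipelining = false;
    c->outstanding = 0;
}

/* The application states that it intends to pipeline from now on. */
static void toy_begin_pipeline(toy_conn *c)
{
    c->pipelining = true;
}

/* Returns true if the send is accepted, false if it is rejected. */
static bool toy_send(toy_conn *c)
{
    /* Outside pipeline mode, an unread result most likely means a bug. */
    if (!c->pipelining && c->outstanding > 0)
        return false;
    c->outstanding++;
    return true;
}

/* Returns true if a result was consumed, false if none was pending. */
static bool toy_read_result(toy_conn *c)
{
    if (c->outstanding == 0)
        return false;
    c->outstanding--;
    return true;
}
```

With this shape, existing non-pipelined callers keep the current error behaviour, and only code that calls the opt-in function can queue several commands.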

Are you trying to support "continuous pipelining", where you send new queries all the time, and read results as they arrive, without ever draining the pipe? Or are you just trying to do "batches", where you send a bunch of queries, and wait for all the results to arrive, before sending more? A batched API would be easier to understand and work with, although a "continuous" pipeline could be more efficient for an application that can take advantage of it.

Consideration of implicit transactions (autocommit), the whole pipeline
being one transaction, or multiple transactions is needed.

The more I think about this the more confident I am that no extra work is
needed.

Unless we start doing some preliminary processing of the query inside of
libpq, our hands are tied wrt sending a sync at the end of each query.  The
reason for this is that we rely on the ReadyForQuery message to indicate the
end of a query, so without the sync there is no way to tell if the next result
is from another statement in the current query, or the first statement in the
next query.
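
A small sketch of the boundary problem described above, under simplifying assumptions: the backend's reply stream is reduced to its message type bytes, where 'C' is CommandComplete and 'Z' is ReadyForQuery in protocol version 3, and errors are ignored. The function name assign_results is hypothetical. Each CommandComplete is attributed to the current query, and only a ReadyForQuery advances to the next one.

```c
#include <assert.h>
#include <stddef.h>

/*
 * For each CommandComplete ('C') in msgs, store in owner[] the zero-based
 * index of the query it is attributed to. A ReadyForQuery ('Z') ends the
 * current query. All other message types are skipped.
 * Returns the number of CommandComplete messages seen; at most max_owner
 * entries are written to owner[].
 */
static size_t assign_results(const char *msgs, size_t n,
                             int *owner, size_t max_owner)
{
    int query = 0;
    size_t found = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (msgs[i] == 'C') {
            if (found < max_owner)
                owner[found] = query;
            found++;
        } else if (msgs[i] == 'Z') {
            query++;
        }
    }
    return found;
}
```

For the stream "TDCCZTDCZ" the three results are attributed to queries 0, 0 and 1. Drop the first 'Z' and all three land on query 0, which is the ambiguity being described: nothing else in the stream says where one query's statements end.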

I also don't see a reason to need multiple queries without a sync statement.
If the user wants all queries to succeed or fail together it should be no
problem to start the pipeline with begin and complete it with commit.  But I may be
missing some detail...

True. It makes me a bit uneasy, though, to not be sure that the whole batch is committed or rolled back as one unit. There are many ways the user can shoot himself in the foot with that. Error handling would be a lot simpler if you would only send one Sync for the whole batch. Tom explained it better on this recent thread: http://www.postgresql.org/message-id/32086.1415063...@sss.pgh.pa.us.

Another thought is that for many applications, it would actually be OK to not know which query each result belongs to. For example, if you execute a bunch of inserts, you often just want to get back the total number of inserted rows, or maybe not even that. Or if you execute a "CREATE TEMPORARY TABLE ... ON COMMIT DROP", followed by some insertions to it, some more data manipulations, and finally a SELECT to get the results back. All you want is the last result set.

If we could modify the wire protocol, we'd want to have a MiniSync message that is like Sync except that it wouldn't close the current transaction. The server would respond to it with a ReadyForQuery message (which could carry an ID number, to match it up with the MiniSync command). But I really wish we'd find a way to do this without changing the wire protocol.


- Heikki



