Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Fri, Aug 3, 2012 at 12:01 AM, Tom Lane wrote: > Marko Kreen writes: >> On Thu, Aug 2, 2012 at 8:19 AM, Tom Lane wrote: >>> In principle I suppose we could hack PQconsumeInput enough that it >>> didn't damage the row buffer while still meeting its API contract of >>> clearing the read-ready condition on the socket; but it wouldn't be >>> simple or cheap to do so. > >> Any code using single-row-mode is new. Any code calling consumeInput >> before fully draining rows from buffer is buggy, as it allows unlimited >> growth >> of network buffer, which breaks whole reason to use single-row mode. > > Sorry, that argument will not fly. The API specification for > PQconsumeInput is that it can be called at any time to drain the kernel > input buffer, without changing the logical state of the PGconn. It's > not acceptable to insert a parenthetical "oh, but you'd better not try > it in single-row mode" caveat. Well, if the old API contract must be kept, then so be it. > The reason I find this of importance is that PQconsumeInput is meant to > be used in an application's main event loop for the sole purpose of > clearing the read-ready condition on the connection's socket, so that > it can wait for other conditions. This could very well be decoupled > from the logic that is involved in testing PQisBusy and > fetching/processing actual results. (If we had not intended to allow > those activities to be decoupled, we'd not have separated PQconsumeInput > and PQisBusy in the first place.) Introducing a dependency between > these things could lead to non-reproducible, timing-dependent, very > hard-to-find bugs. > > By the same token, if somebody were trying to put single-row-mode > support into an existing async application, they'd be fooling with the > parts that issue new queries and collect results, but probably not with > the main event loop. Thus, I do not believe your argument above that > "any code using single-row mode is new". The whole app is not going to > be new. 
> It could be very hard to guarantee that there is no path of
> control that invokes PQconsumeInput before the new data is actually
> processed, because there was never before a reason to avoid extra
> PQconsumeInput calls.

I've thought about this. At first glance, indeed, if an async app has both a reader and a writer registered in the event loop, it might be simpler to keep reading from the source socket even if the writer stalls. But this is exactly the situation where an error from PQconsumeInput would help. Imagine a fast source and a slow target (a common scenario - a database query for each row). Reading from the source *must* be stopped to get predictable memory usage.

> As I said, this is the exact same objection I had to the original scheme
> of exposing the row buffer directly. Putting a libpq function in front
> of the row buffer doesn't solve the problem if that function is
> implemented in a way that's still vulnerable to misuse or race conditions.

>> And now libpq allows such apps to go from 2x row size to full resultset
>> size with unfortunate coding. Why?

> This argument is completely nonsensical. The contract for
> PQconsumeInput has always been that it consumes whatever the kernel has
> available. If you don't extract data from the library before calling it
> again, the library's internal buffer may grow to more than the minimum
> workable size, but that's a tradeoff you may be willing to make to
> simplify your application logic. I do not think that it's an
> improvement to change the API spec to say either that you can't call
> PQconsumeInput in certain states, or that you can but it won't absorb
> data from the kernel.

The old patch had a nice property: we could replace PQgetResult() with something else. To allow that and also PQconsumeInput(), we could store offsets in rowBuf and then skip the realign in PQconsumeInput().
Not sure if the replacement reason is worth keeping, but the offsets may be useful even now, as they might give an additional speedup by decreasing the per-column storage from 16 to 8 bytes on 64-bit architectures. May be better left for 9.3 though...

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [patch] libpq one-row-at-a-time API
Marko Kreen writes: > On Thu, Aug 2, 2012 at 8:19 AM, Tom Lane wrote: >> In principle I suppose we could hack PQconsumeInput enough that it >> didn't damage the row buffer while still meeting its API contract of >> clearing the read-ready condition on the socket; but it wouldn't be >> simple or cheap to do so. > Any code using single-row-mode is new. Any code calling consumeInput > before fully draining rows from buffer is buggy, as it allows unlimited growth > of network buffer, which breaks whole reason to use single-row mode. Sorry, that argument will not fly. The API specification for PQconsumeInput is that it can be called at any time to drain the kernel input buffer, without changing the logical state of the PGconn. It's not acceptable to insert a parenthetical "oh, but you'd better not try it in single-row mode" caveat. The reason I find this of importance is that PQconsumeInput is meant to be used in an application's main event loop for the sole purpose of clearing the read-ready condition on the connection's socket, so that it can wait for other conditions. This could very well be decoupled from the logic that is involved in testing PQisBusy and fetching/processing actual results. (If we had not intended to allow those activities to be decoupled, we'd not have separated PQconsumeInput and PQisBusy in the first place.) Introducing a dependency between these things could lead to non-reproducible, timing-dependent, very hard-to-find bugs. By the same token, if somebody were trying to put single-row-mode support into an existing async application, they'd be fooling with the parts that issue new queries and collect results, but probably not with the main event loop. Thus, I do not believe your argument above that "any code using single-row mode is new". The whole app is not going to be new. 
It could be very hard to guarantee that there is no path of control that invokes PQconsumeInput before the new data is actually processed, because there was never before a reason to avoid extra PQconsumeInput calls. As I said, this is the exact same objection I had to the original scheme of exposing the row buffer directly. Putting a libpq function in front of the row buffer doesn't solve the problem if that function is implemented in a way that's still vulnerable to misuse or race conditions. > And now libpq allows such apps to go from 2x row size to full resultset > size with unfortunate coding. Why? This argument is completely nonsensical. The contract for PQconsumeInput has always been that it consumes whatever the kernel has available. If you don't extract data from the library before calling it again, the library's internal buffer may grow to more than the minimum workable size, but that's a tradeoff you may be willing to make to simplify your application logic. I do not think that it's an improvement to change the API spec to say either that you can't call PQconsumeInput in certain states, or that you can but it won't absorb data from the kernel. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
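The decoupling Tom describes is the standard libpq async pattern: the main loop's only duty is to clear the read-ready condition with PQconsumeInput, while a separate code path tests PQisBusy and collects results. A minimal sketch of that split (assumes an already-connected PGconn with a query already sent; error handling elided, so treat it as illustration rather than a complete implementation):

```c
#include <sys/select.h>
#include <libpq-fe.h>

/* Wait for the socket, drain the kernel buffer, and only then check
 * whether a complete result is available for processing. */
static int pump(PGconn *conn)
{
    int sock = PQsocket(conn);
    fd_set rfds;

    for (;;)
    {
        FD_ZERO(&rfds);
        FD_SET(sock, &rfds);
        if (select(sock + 1, &rfds, NULL, NULL, NULL) < 0)
            return -1;

        /* main-loop duty: clear read-ready, nothing else */
        if (!PQconsumeInput(conn))
            return -1;

        /* separate duty: process whatever is now complete */
        while (!PQisBusy(conn))
        {
            PGresult *res = PQgetResult(conn);

            if (res == NULL)
                return 0;       /* query fully processed */
            /* ... handle res (e.g. PGRES_SINGLE_TUPLE rows) ... */
            PQclear(res);
        }
    }
}
```

The point of contention in the thread is exactly what happens if the `PQconsumeInput` call runs while undrained single-row data still sits in the row buffer.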
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Thu, Aug 2, 2012 at 8:19 AM, Tom Lane wrote:
> Marko Kreen writes:
>> On Wed, Aug 1, 2012 at 6:18 PM, Tom Lane wrote:
>>> So I'm working from the first set of patches in your message
>>> <20120721194907.ga28...@gmail.com>.
>
>> Great!
>
> Here's an updated draft patch. I've not looked at the docs yet so
> this doesn't include that, but I'm reasonably happy with the code
> changes now. The main difference from what you had is that I pushed
> the creation of the single-row PGresults into pqRowProcessor, so that
> they're formed immediately while scanning the input message. That
> other method with postponing examination of the rowBuf does not work,
> any more than it did with the original patch, because you can't assume
> that the network input buffer is going to hold still. For example,
> calling PQconsumeInput after parseInput has absorbed a row-data message
> but before calling PQgetResult would likely break things.
>
> In principle I suppose we could hack PQconsumeInput enough that it
> didn't damage the row buffer while still meeting its API contract of
> clearing the read-ready condition on the socket; but it wouldn't be
> simple or cheap to do so.

Any code using single-row-mode is new. Any code calling consumeInput before fully draining rows from buffer is buggy, as it allows unlimited growth of network buffer, which breaks whole reason to use single-row mode.

It was indeed a bug in the previous patch that it did not handle this situation, but IMHO it should be handled with an error, not by allowing such code to work.

Previously, whatever sequence the fetch functions were called in, the app's max memory usage was either 1x resultset size, or at most 2x resultset size if it messed up the sequence somehow - but no more. So if the app knew the resultset was big, it needed to split it up.

Now with single-row mode, apps can assume their max memory usage is 1x row size + network buffer; let's simplify that to 2x row size. But no more.
And now they can process huge resultsets assuming their memory usage will be no more than 2x row size. And now libpq allows such apps to go from 2x row size to full resultset size with unfortunate coding. Why?

I have opinions about the reorg details too, but those are mostly a matter of taste, so rather unimportant compared to the question of whether we should allow code to break the guarantees that single-row mode gives.

-- 
marko
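For reference, the bounded-memory pattern being argued over - single-row mode with every row drained via PQgetResult before anything else touches the connection - looks roughly like this sketch (real 9.2-era API calls, but error handling elided):

```c
#include <libpq-fe.h>

/* Stream a query one row at a time: memory stays bounded by one row
 * plus libpq's network buffer, provided each row is drained promptly. */
static void stream_query(PGconn *conn, const char *sql)
{
    PGresult *res;

    PQsendQuery(conn, sql);
    PQsetSingleRowMode(conn);   /* must follow the PQsend* immediately */

    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
        {
            /* exactly one row in this result */
            /* ... use PQnfields(res) / PQgetvalue(res, 0, col) ... */
        }
        /* the final result carries PGRES_TUPLES_OK or an error status */
        PQclear(res);
    }
}
```

The dispute above is about what guarantee remains if the application interleaves PQconsumeInput calls into this loop without draining rows first.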
Re: [HACKERS] [patch] libpq one-row-at-a-time API
Marko Kreen writes: > On Wed, Aug 1, 2012 at 6:18 PM, Tom Lane wrote: >> So I'm working from the first set of patches in your message >> <20120721194907.ga28...@gmail.com>. > Great! Here's an updated draft patch. I've not looked at the docs yet so this doesn't include that, but I'm reasonably happy with the code changes now. The main difference from what you had is that I pushed the creation of the single-row PGresults into pqRowProcessor, so that they're formed immediately while scanning the input message. That other method with postponing examination of the rowBuf does not work, any more than it did with the original patch, because you can't assume that the network input buffer is going to hold still. For example, calling PQconsumeInput after parseInput has absorbed a row-data message but before calling PQgetResult would likely break things. In principle I suppose we could hack PQconsumeInput enough that it didn't damage the row buffer while still meeting its API contract of clearing the read-ready condition on the socket; but it wouldn't be simple or cheap to do so. regards, tom lane diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index 1e62d8091a9d2bdf60af6745d5a01ee14ee5cf5a..7cf86176c2385b9e4ee37c72d7e3c662ea079f7a 100644 *** a/contrib/dblink/dblink.c --- b/contrib/dblink/dblink.c *** typedef struct storeInfo *** 70,75 --- 70,78 AttInMetadata *attinmeta; MemoryContext tmpcontext; char **cstrs; + /* temp storage for results to avoid leaks on exception */ + PGresult *last_res; + PGresult *cur_res; } storeInfo; /* *** static void materializeQueryResult(Funct *** 83,90 const char *conname, const char *sql, bool fail); ! static int storeHandler(PGresult *res, const PGdataValue *columns, ! const char **errmsgp, void *param); static remoteConn *getConnectionByName(const char *name); static HTAB *createConnHash(void); static void createNewConnection(const char *name, remoteConn *rconn); --- 86,93 const char *conname, const char *sql, bool fail); ! 
static PGresult *storeQueryResult(storeInfo *sinfo, PGconn *conn, const char *sql); ! static void storeRow(storeInfo *sinfo, PGresult *res, bool first); static remoteConn *getConnectionByName(const char *name); static HTAB *createConnHash(void); static void createNewConnection(const char *name, remoteConn *rconn); *** materializeResult(FunctionCallInfo fcinf *** 927,934 /* * Execute the given SQL command and store its results into a tuplestore * to be returned as the result of the current function. * This is equivalent to PQexec followed by materializeResult, but we make ! * use of libpq's "row processor" API to reduce per-row overhead. */ static void materializeQueryResult(FunctionCallInfo fcinfo, --- 930,939 /* * Execute the given SQL command and store its results into a tuplestore * to be returned as the result of the current function. + * * This is equivalent to PQexec followed by materializeResult, but we make ! * use of libpq's single-row mode to avoid accumulating the whole result ! * inside libpq before it gets transferred to the tuplestore. */ static void materializeQueryResult(FunctionCallInfo fcinfo, *** materializeQueryResult(FunctionCallInfo *** 944,962 /* prepTuplestoreResult must have been called previously */ Assert(rsinfo->returnMode == SFRM_Materialize); PG_TRY(); { ! /* initialize storeInfo to empty */ ! memset(&sinfo, 0, sizeof(sinfo)); ! sinfo.fcinfo = fcinfo; ! ! /* We'll collect tuples using storeHandler */ ! PQsetRowProcessor(conn, storeHandler, &sinfo); ! ! res = PQexec(conn, sql); ! ! /* We don't keep the custom row processor installed permanently */ ! PQsetRowProcessor(conn, NULL, NULL); if (!res || (PQresultStatus(res) != PGRES_COMMAND_OK && --- 949,962 /* prepTuplestoreResult must have been called previously */ Assert(rsinfo->returnMode == SFRM_Materialize); + /* initialize storeInfo to empty */ + memset(&sinfo, 0, sizeof(sinfo)); + sinfo.fcinfo = fcinfo; + PG_TRY(); { ! /* execute query, collecting any tuples into the tuplestore */ ! 
res = storeQueryResult(&sinfo, conn, sql); if (!res || (PQresultStatus(res) != PGRES_COMMAND_OK && *** materializeQueryResult(FunctionCallInfo *** 975,982 else if (PQresultStatus(res) == PGRES_COMMAND_OK) { /* ! * storeHandler didn't get called, so we need to convert the ! * command status string to a tuple manually */ TupleDesc tupdesc; AttInMetadata *attinmeta; --- 975,982 else if (PQresultStatus(res) == PGRES_COMMAND_OK) { /* ! * storeRow didn't get called, so we need to convert the command ! * status string to a tuple manually */ TupleDesc tupdesc; AttI
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Wed, Aug 1, 2012 at 6:18 PM, Tom Lane wrote:
> Marko Kreen writes:
>> * Is there better API than PQsetSingleRowMode()? New PQsend*
>> functions is my alternative.
>
> After thinking it over, I'm really unexcited about adding new versions
> of all the PQsend functions. If we had the prospect of more flags in
> future that could be added to a bitmask flags argument, it would be
> more palatable, but it's far from clear what those might be. So I'm
> now leaning towards using PQsetSingleRowMode as-is.

I can imagine such a flag - I'd like to have a flag that allows sending more queries to the server without waiting for the old ones to finish. Currently, when a batch-processing app wants to keep the backend busy, it needs to stuff many statements into a single query, which is ugly. Among other things, it loses the possibility to separate arguments from the query, and it breaks error reporting. The flag would fix this, and it would be useful in other scenarios as well.

Interestingly, although it affects a different area, it would also be a flag for PQsend* and not for PQexec*. So maybe the direction is not completely wrong. But I cannot imagine why PQsetSingleRowMode would be hard to keep working even if we later add PQsend functions with flags, so I'm not worried about getting it exactly right from the start.

>> * Should we rollback rowBuf change? I think no, as per benchmark
>> it performs better than old code.
>
> I had already pretty much come to that conclusion just based on code
> review, without thinking about performance. In particular, we had done
> some nontrivial work towards improving error-case handling in the data
> message parsing code, and I don't really want to give that up, nor
> rewrite it on the fly now. About the only reason I could see for
> reverting rowBuf was that I thought it might hurt performance; so now
> that you've proven the opposite, we should leave it alone.

An additional argument for rowBuf is Merlin's wish for weirdly optimized PGresults.
Basically, with rowBuf, anybody who wants to experiment with different ways to process row data just needs to implement a single function that processes the data and then advances the state machine, like I did with PQgetRowData(). Without it, quite a lot of code needs to be patched.

> So I'm working from the first set of patches in your message
> <20120721194907.ga28...@gmail.com>.

Great!

-- 
marko
Re: [HACKERS] [patch] libpq one-row-at-a-time API
[ getting back to this now after assorted distractions ]

Marko Kreen writes:
> Just to show agreement: both PQgetRowData() and optimized PGresult
> do not belong to 9.2.

OK, we're all on board with leaving those for later.

> Only open questions are:
> * Is there better API than PQsetSingleRowMode()? New PQsend*
> functions is my alternative.

After thinking it over, I'm really unexcited about adding new versions of all the PQsend functions. If we had the prospect of more flags in future that could be added to a bitmask flags argument, it would be more palatable, but it's far from clear what those might be. So I'm now leaning towards using PQsetSingleRowMode as-is.

> * Should we rollback rowBuf change? I think no, as per benchmark
> it performs better than old code.

I had already pretty much come to that conclusion just based on code review, without thinking about performance. In particular, we had done some nontrivial work towards improving error-case handling in the data message parsing code, and I don't really want to give that up, nor rewrite it on the fly now. About the only reason I could see for reverting rowBuf was that I thought it might hurt performance; so now that you've proven the opposite, we should leave it alone.

So I'm working from the first set of patches in your message <20120721194907.ga28...@gmail.com>.

regards, tom lane
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Mon, Jul 30, 2012 at 10:26 PM, Jan Wieck wrote: > On 7/30/2012 10:31 PM, Leon Smith wrote: >> >> This is not necessarily true, on multiple levels. I mean, some of >> the programs I write are highly concurrent, and this form of batching >> would have almost no risk of stalling the network buffer.And >> the possible use case would be when you are dealing with very small >> rows, when there would typically be several rows inside a single >> network packet or network buffer. > > > With "highly concurrent" you mean multi-threaded? Like one thread reads the > rows in batches and pushes them into a queue while another thread processes > them from that queue? > > If that is the case, then you just added a useless layer of buffering and > the need for thread/thread context switches to PQsetSingleRowMode. Libpq's > "receiver thread" is the kernel itself. Libpq tries to never read partial > kernel buffers already. It always makes sure that there are at least 8K of > free space in the inBuffer. In the case you describe above, where several > rows fit into a single packet, libpq will receive them with a single system > call in one read(2), then the application can get them as fast as possible, > without causing any further context switches because they are already in the > inBuffer. Yeah: with asynchronous query processing the query gets sent and control returns immediately to your code: that's the whole point. Even if some data races to the network buffer, libpq doesn't 'see' any data until you tell it to by asking for a result (which can block) or draining the buffers with PQconsumeInput. So there is no race in the traditional sense and I'm ok with the PQsetSingleRowMode as such. Removing malloc/free on row iteration seems only to be possible via one of two methods: either a) you introduce a non-PGresult based method of data extraction or b) you preserve the PGresult across row iterations. 
merlin
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On 7/30/2012 10:31 PM, Leon Smith wrote:
> This is not necessarily true, on multiple levels. I mean, some of
> the programs I write are highly concurrent, and this form of batching
> would have almost no risk of stalling the network buffer. And the
> possible use case would be when you are dealing with very small rows,
> when there would typically be several rows inside a single network
> packet or network buffer.

With "highly concurrent" you mean multi-threaded? Like one thread reads the rows in batches and pushes them into a queue while another thread processes them from that queue?

If that is the case, then you just added a useless layer of buffering and the need for thread/thread context switches to PQsetSingleRowMode. Libpq's "receiver thread" is the kernel itself. Libpq tries to never read partial kernel buffers already. It always makes sure that there are at least 8K of free space in the inBuffer. In the case you describe above, where several rows fit into a single packet, libpq will receive them with a single system call in one read(2), then the application can get them as fast as possible, without causing any further context switches because they are already in the inBuffer.

I've written that sort of code myself in the past. Look at the Slony worker thread prior to 2.2. We switched to the COPY protocol instead of waiting for the single row mode and got rid of all that extra buffering already (and then some more).

Jan

-- 
Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Mon, Jul 30, 2012 at 9:59 PM, Jan Wieck wrote:
> On 7/30/2012 8:11 PM, Leon Smith wrote:
>> One other possibility, Tom Lane fretted ever so slightly about the use
>> of malloc/free per row... what about instead of PQsetSingleRowMode, you
>> have PQsetChunkedRowMode that takes a chunkSize parameter. A chunkSize
>> <= 0 would be equivalent to what we have today, a chunkSize of 1 means
>> you get what you have from PQsetSingleRowMode, and larger chunkSizes
>> would wait until n rows have been received before returning them all in
>> a single result. I don't know that this suggestion is all that
>> important, but it seems like an obvious generalization that might
>> possibly be useful.
>
> It is questionable if that actually adds any useful functionality.

This is true, I'm not sure my suggestion is necessarily useful. I'm just throwing it out there.

> Any "collecting" of multiple rows will only run the risk to stall
> receiving the following rows while processing this batch. Processing each
> row as soon as it is available will ensure making the most use of network
> buffers.

This is not necessarily true, on multiple levels. I mean, some of the programs I write are highly concurrent, and this form of batching would have almost no risk of stalling the network buffer. And the possible use case would be when you are dealing with very small rows, when there would typically be several rows inside a single network packet or network buffer.

> Collecting multiple rows, like the FETCH command for cursors does,
> makes sense when each batch introduces a network round trip, like for the
> FETCH command. But does it make any sense for a true streaming mode, like
> what is discussed here?

Maybe? I mean, I anticipate that there are (probably) still use cases for FETCH, even when the row-at-a-time interface is a viable option and the transport between postgres and the client has reasonable flow-control.

Leon
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On 7/30/2012 8:11 PM, Leon Smith wrote:
> One other possibility, Tom Lane fretted ever so slightly about the use
> of malloc/free per row... what about instead of PQsetSingleRowMode, you
> have PQsetChunkedRowMode that takes a chunkSize parameter. A chunkSize
> <= 0 would be equivalent to what we have today, a chunkSize of 1 means
> you get what you have from PQsetSingleRowMode, and larger chunkSizes
> would wait until n rows have been received before returning them all in
> a single result. I don't know that this suggestion is all that
> important, but it seems like an obvious generalization that might
> possibly be useful.

It is questionable if that actually adds any useful functionality. Any "collecting" of multiple rows will only run the risk to stall receiving the following rows while processing this batch. Processing each row as soon as it is available will ensure making the most use of network buffers.

Collecting multiple rows, like the FETCH command for cursors does, makes sense when each batch introduces a network round trip, like for the FETCH command. But does it make any sense for a true streaming mode, like what is discussed here?

Jan

-- 
Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
Re: [HACKERS] [patch] libpq one-row-at-a-time API
Hey, this thread was pointed out to me just a few days ago, but I'll start by saying that I think this thread is on exactly the right track. I don't like the callback API, and think that PQsetSingleRowMode should be offered in place of it. But I do have one question.

On Sat, Jun 16, 2012 at 10:22 AM, Marko Kreen wrote:
> The function can be called only after PQsend* and before any
> rows have arrived. This guarantees there will be no surprises
> to PQexec* users who expect full resultset at once.

Ok, I'm guessing you mean "before you call PQgetResult or PQgetRowData", or maybe "before you call PQgetResult or PQgetRowData and it returns a result or partial result." Because it would be a race condition if you meant exactly what you said. (Though I don't understand how this could possibly be implemented without some source of concurrency, which libpq doesn't do.) Maybe this is a little overly pedantic, but I do want to confirm the intention here.

One other possibility, Tom Lane fretted ever so slightly about the use of malloc/free per row... what about instead of PQsetSingleRowMode, you have PQsetChunkedRowMode that takes a chunkSize parameter. A chunkSize <= 0 would be equivalent to what we have today, a chunkSize of 1 means you get what you have from PQsetSingleRowMode, and larger chunkSizes would wait until n rows have been received before returning them all in a single result. I don't know that this suggestion is all that important, but it seems like an obvious generalization that might possibly be useful.

Best,
Leon
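The ordering being asked about here - set the mode in the window between the send and the first fetch - comes down to a short sketch. This reflects my reading of the proposed API (PQsetSingleRowMode returning 0 when called in any other state is an assumption drawn from the discussion, with error handling otherwise elided):

```c
#include <libpq-fe.h>

/* PQsetSingleRowMode has to be slipped in right after PQsendQuery,
 * before any result processing has begun; once rows start arriving
 * it is too late and the call fails. */
static int start_streaming(PGconn *conn, const char *sql)
{
    if (!PQsendQuery(conn, sql))
        return 0;               /* dispatch failed */
    if (!PQsetSingleRowMode(conn))
        return 0;               /* wrong state: too late to switch */
    return 1;                   /* now drain rows with PQgetResult */
}
```

There is no race in the libpq-internal sense because nothing is read from the socket between the two calls unless the application itself asks for input.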
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Tuesday, July 24, 2012, Marko Kreen wrote: > On Wed, Jul 25, 2012 at 1:29 AM, Merlin Moncure wrote: >> On Tue, Jul 24, 2012 at 4:59 PM, Marko Kreen wrote: >>> So if we give only PQgetResult() in 9.2, I do not see that we >>> are locked out from any interesting optimizations. >> >> Well, you are locked out of having PQgetResult reuse the conn's result >> since that would then introduce potentially breaking changes to user >> code. > > You can specify special flags to PQsend or have special PQgetResultWeird() > calls to get PGresults with unusual behavior. Like I did with PQgetRowData(). > > I see no reason here to reject PQgetResult() that returns normal PGresult. Yeah -- I agree. So, given the scheduling, I think we should go with non-PQgetRowData patch and reserve optimized path for 9.3. merlin
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Wed, Jul 25, 2012 at 1:29 AM, Merlin Moncure wrote:
> On Tue, Jul 24, 2012 at 4:59 PM, Marko Kreen wrote:
>> So if we give only PQgetResult() in 9.2, I do not see that we
>> are locked out from any interesting optimizations.
>
> Well, you are locked out of having PQgetResult reuse the conn's result
> since that would then introduce potentially breaking changes to user
> code.

You can specify special flags to PQsend or have special PQgetResultWeird() calls to get PGresults with unusual behavior. Like I did with PQgetRowData().

I see no reason here to reject PQgetResult() that returns normal PGresult.

-- 
marko
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Tue, Jul 24, 2012 at 5:29 PM, Merlin Moncure wrote:
> so #2 seems like the lowest common
> denominator (it would permanently preclude #3 and would require #4 to
> introduce two new functions instead of just one). #1 of course would
> bolt on to #2.

oops, got #1 and #2 backwards there.

merlin
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Tue, Jul 24, 2012 at 4:59 PM, Marko Kreen wrote:
> So if we give only PQgetResult() in 9.2, I do not see that we
> are locked out from any interesting optimizations.

Well, you are locked out of having PQgetResult reuse the conn's result, since that would then introduce potentially breaking changes to user code. Here are the approaches I see:

1) introduce the rowbuf (but not user-exposed rowbuf) patch: Not the fastest method. Users would begin using the feature and API behaviors would be locked in -- the only possible optimization route would be to try to make PQcopyResult as fast as possible.

2) expose PQgetRowData: Very fast, but foreign and somewhat awkward. Existing libpq wrappers (libpqtypes, libpqxx etc) could not be converted to use it without extensive revision, if at all, and would be stuck with the slower version of iteration. Also a side benefit here is that optimizing result copying later has usefulness outside of row-by-row processing.

3) as #1, but instead of copying the result, overwrite the data area: This is bending the rule of 'user-defined lifetime of result', since each iteration clears the data area of the PGresult. This is almost as fast as #2, and existing libpq wrappers would be trivially converted to the API.

4) as #3, but introduce a new iteration function (PQiterateResult(conn, result)) that would be called instead of a paired PQgetResult/PQclear: This would also be fast, and could almost lay directly on top of #2 as an alternate optimization strategy -- the set-result-mode function would have to be modified (or an additional function introduced) to return a result.

I feel like you're partial to #2 and appreciate that, but I'm really quite skeptical about it as noted. #3 and #4 are not as well thought out as what you've put together and would need some extra work and buy-in to get out the door, so #2 seems like the lowest common denominator (it would permanently preclude #3 and would require #4 to introduce two new functions instead of just one).
#1 of course would bolt on to #2.

merlin
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Wed, Jul 25, 2012 at 12:35 AM, Merlin Moncure wrote:
> On Tue, Jul 24, 2012 at 3:52 PM, Marko Kreen wrote:
>> * Is there better API than PQsetSingleRowMode()? New PQsend*
>> functions are my alternative.
>
> I would prefer to have PQsetSingleRowMode() over a new flavor of PQsend.
>
> To consolidate my argument above: since you're throwing PQgetResult in
> the main iteration loop you're potentially walling yourself off from
> one important optimization: not copying the result at all and reusing
> the old one; that's going to produce the fastest possible code since
> you're recovering all the attribute structures that have already been
> set up, unless you're willing to do the following:

I am not forgetting such an optimization, I've already implemented it: it's called PQgetRowData(). The main reason why it works, and so simply, is that it does not try to give you a PGresult. PGresult carries various historical assumptions:

- Fully user-controlled lifetime.
- Zero-terminated strings.
- Aligned binary values.

Breaking them all does not seem like a good idea. Trying to make them work with zero-copy seems not worth the pain. And if you are ready to introduce new accessor functions, why care about PGresult at all? Why not give the user PQgetRowData()?

Btw, PQgetRowData(), or any other potential zero-copy API, is compatible with both the slow PQgetResult() and an optimized PQgetResult(). So if we give only PQgetResult() in 9.2, I do not see that we are locked out from any interesting optimizations.

-- marko
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Tue, Jul 24, 2012 at 3:52 PM, Marko Kreen wrote:
> * Is there better API than PQsetSingleRowMode()? New PQsend*
> functions are my alternative.

I would prefer to have PQsetSingleRowMode() over a new flavor of PQsend.

To consolidate my argument above: since you're throwing PQgetResult in the main iteration loop you're potentially walling yourself off from one important optimization: not copying the result at all and reusing the old one. That's going to produce the fastest possible code, since you're recovering all the attribute structures that have already been set up. Unless, that is, you're willing to do the following: reading your original patch, what if, at the line inside PQgetResult

    res = pqSingleRowResult(conn);

you instead manipulated the result the connection already had and directly returned it -- discarding the result data but not the attributes etc.?

Actually, you could even keep your API 'as is' if you're willing to adjust the behavior of PQclear, while the connection is doing row-by-row results processing, to be a no-op until done. A single-row result's data would then be volatile across iterations, not const -- even if you save off the pointer, the connection can and will rewrite the data portion of the PGresult. Any pointers to PQgetdata'd values would have to be copied off between iterations, naturally (or the user can copy off using the user-facing copy-result function). This would be extremely efficient, since we'd only ever need to extend the PGresult buffer if a particular row was longer than the longest one already fetched.

If all that's a bridge too far, we can look at re-jiggering the API like I was thinking earlier to make the adjusted result scoping more user-visible, or run your non-rowbuf-exposing patch as-is and hope for optimizations down the line.

merlin
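The reuse-the-result scheme Merlin sketches above can be illustrated with a self-contained toy model. This is not libpq code -- toy_result and toy_iterate are invented names -- but it shows the "volatile across iterations" behavior he describes: attribute metadata is built once, the data area is rewritten in place on each row, and a pointer saved from an earlier row silently sees the newer row.

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Toy model (NOT libpq; names are invented) of the reuse idea: one
 * result object whose attribute metadata is set up once, while the
 * per-row data area is overwritten on every iteration. */
struct toy_result {
    int  nattrs;        /* column metadata, built on the first row */
    int  meta_setups;   /* how many times the metadata was built */
    char data[64];      /* per-row data area, rewritten each row */
};

/* Feed the next row into the same result, reusing its data area. */
static const char *toy_iterate(struct toy_result *res, const char *row)
{
    if (res->meta_setups == 0) {
        res->nattrs = 1;       /* pretend to parse the row description */
        res->meta_setups = 1;  /* ...but only once per query */
    }
    /* overwrite in place -- no per-row malloc or attribute copy */
    snprintf(res->data, sizeof res->data, "%s", row);
    return res->data;
}
```

The tradeoff Merlin flags follows directly: any pointer taken from such a result must be copied off before the next iteration, because the data area underneath it is rewritten.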
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Tue, Jul 24, 2012 at 10:37 PM, Tom Lane wrote:
> Merlin Moncure writes:
>> Either way, it looks like there are plausible paths to optimizing
>> repeated result fetch without having to expose an alternate data-fetching
>> API, and that the proposed implementation doesn't appear to be a wall
>> in terms of getting there. So I'm firmly in the non-exposed-rowbuf
>> camp, even if we can't expose an optimal implementation in time for
>> 9.2.
>
> Yeah, the schedule argument is a strong one. If we put in PQgetRowData
> now, we'll be stuck with it forevermore. It would be better to push
> that part of the patch to 9.3 so we can have more time to think it over
> and investigate alternatives. The single-row mode is already enough to
> solve the demonstrated client-side performance issues we know about.
> So IMO it would be a lot smarter to be spending our time right now on
> making sure we have *that* part of the patch right.

Just to show agreement: both PQgetRowData() and the optimized PGresult do not belong in 9.2.

The only open questions are:

* Is there a better API than PQsetSingleRowMode()? New PQsend* functions are my alternative.

* Should we roll back the rowBuf change? I think not, as per the benchmark it performs better than the old code.

> The thing I fundamentally don't like about PQsetSingleRowMode is that
> there's such a narrow window of time to use it correctly -- as soon as
> you've done even one PQconsumeInput, it's too late. (Or maybe not, if
> the server is slow, which makes it timing-dependent whether you'll
> observe a bug. Maybe we should add more internal state so that we can
> consistently throw error if you've done any PQconsumeInput?) The most
> obvious alternative is to invent new versions of the PQsendXXX functions
> with an additional flag parameter; but there are enough of them to make
> that not very attractive either.

It belongs logically together with PQsend, so if you don't like the current separation, then the proper fix is to make them a single function call.
-- marko
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Tue, Jul 24, 2012 at 11:34 PM, Merlin Moncure wrote:
> On Tue, Jul 24, 2012 at 2:37 PM, Tom Lane wrote:
>> In particular I agree that PQsetSingleRowMode is a bit ugly, so I'm
>> wondering about your thoughts on providing PQgetSingleRowResult instead.
>> I don't see how to make that work in async mode. I think the library
>> needs to be aware of whether it's supposed to return a single-row result
>> in advance of actually doing so --- for instance, how can PQisBusy
>> possibly give the correct answer if it doesn't know whether having
>> received the first row is sufficient?
>
> Well (Marko probably wants to slap me with a halibut by now), the
> original intent was for it not to have to: PQgetSingleRowResult is
> more 'construct result for single-row iteration', so it would never
> block -- it only sets the internal single-row mode and instantiates
> the result object. After that, you can do standard
> consumeinput/isbusy processing on the conn. The actual iteration
> routine would be something like PQiterateResult, which you could guard
> with PQisBusy. Like I said: it's the same general structure but the
> result construction is moved out of the loop.

If you really don't like PQsetSingleRowMode, then I would prefer new PQsend* functions to new fetch functions. Because while send is a one-shot affair, receive is a complex state machine which requires multiple function calls.

-- marko
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Tue, Jul 24, 2012 at 2:37 PM, Tom Lane wrote:
> Merlin Moncure writes:
>> Either way, it looks like there are plausible paths to optimizing
>> repeated result fetch without having to expose an alternate data-fetching
>> API, and that the proposed implementation doesn't appear to be a wall
>> in terms of getting there. So I'm firmly in the non-exposed-rowbuf
>> camp, even if we can't expose an optimal implementation in time for
>> 9.2.
>
> Yeah, the schedule argument is a strong one. If we put in PQgetRowData
> now, we'll be stuck with it forevermore. It would be better to push
> that part of the patch to 9.3 so we can have more time to think it over
> and investigate alternatives. The single-row mode is already enough to
> solve the demonstrated client-side performance issues we know about.
> So IMO it would be a lot smarter to be spending our time right now on
> making sure we have *that* part of the patch right.
>
> In particular I agree that PQsetSingleRowMode is a bit ugly, so I'm
> wondering about your thoughts on providing PQgetSingleRowResult instead.
> I don't see how to make that work in async mode. I think the library
> needs to be aware of whether it's supposed to return a single-row result
> in advance of actually doing so --- for instance, how can PQisBusy
> possibly give the correct answer if it doesn't know whether having
> received the first row is sufficient?

Well (Marko probably wants to slap me with a halibut by now), the original intent was for it not to have to: PQgetSingleRowResult is more 'construct result for single-row iteration', so it would never block -- it only sets the internal single-row mode and instantiates the result object. After that, you can do standard consumeinput/isbusy processing on the conn. The actual iteration routine would be something like PQiterateResult, which you could guard with PQisBusy. Like I said: it's the same general structure but the result construction is moved out of the loop.
I don't think this breaks the result scoping rules... the conn keeps a copy of the result during 'result construction', which we are going to define as ending when the query terminates. Until the query resolves, the result remains 'under construction' and avoids the expectation of const-ness that you normally get, so you don't get to interleave threads reading the result with threads iterating from the connection (which is fine), and you have to avoid doing stupid things like closing the connection before all the data has been read through. (But maybe all this is moot per Marko's latest benchmarks.)

merlin
Re: [HACKERS] [patch] libpq one-row-at-a-time API
> Hm, I think it's possible to rig the test to do dummy
> copy of pgresult, thus it's possible to see what kind
> of speed is possible.. Will try.

I added a new method (-x) to rowdump where it asks for the row with PQgetRowData() and then tries to emulate a super-efficient PGresult copy, then loads data from that PGresult.

Quick reference:

  rowdump1 - single-row-mode1 [~ libpq 9.2]
  rowdump2 - single-row-mode2 [~ libpq 9.1]
  -s - single-row mode with PQgetResult()
  -z - single-row mode with PQgetRowData()
  -x - simulated optimized PQgetResult()

QUERY: select 1,200,30,rpad('x',30,'z') from generate_series(1,500)
  ./rowdump1 -s:  6.28  6.25  6.39  avg: 6.31  [ 100.00 % ]
  ./rowdump2 -s:  7.49  7.40  7.57  avg: 7.49  [ 118.71 % ]
  ./rowdump1 -z:  2.86  2.77  2.79  avg: 2.81  [  44.50 % ]
  ./rowdump1 -x:  3.46  3.27  3.29  avg: 3.34  [  52.96 % ]

QUERY: select rpad('x',10,'z'),rpad('x',20,'z'),rpad('x',30,'z'),rpad('x',40,'z'),rpad('x',50,'z'),rpad('x',60,'z') from generate_series(1,300)
  ./rowdump1 -s:  7.76  7.76  7.68  avg: 7.73  [ 100.00 % ]
  ./rowdump2 -s:  8.24  8.12  8.66  avg: 8.34  [ 107.85 % ]
  ./rowdump1 -z:  5.34  5.07  5.23  avg: 5.21  [  67.41 % ]
  ./rowdump1 -x:  5.53  5.61  5.61  avg: 5.58  [  72.20 % ]

QUERY: select 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100 from generate_series(1,80)
  ./rowdump1 -s:  7.49  7.66  7.59  avg: 7.58  [ 100.00 % ]
  ./rowdump2 -s:  7.56  8.12  7.95  avg: 7.88  [ 103.91 % ]
  ./rowdump1 -z:  2.77  2.76  2.76  avg: 2.76  [  36.46 % ]
  ./rowdump1 -x:  3.07  3.05  3.18  avg: 3.10  [  40.90 % ]

QUERY: select 1000,rpad('x', 400, 'z'),rpad('x', 4000, 'z') from generate_series(1,10)
  ./rowdump1 -s:  2.66  2.62  2.67  avg: 2.65  [ 100.00 % ]
  ./rowdump2 -s:  3.11  3.14  3.11  avg: 3.12  [ 117.74 % ]
  ./rowdump1 -z:  2.49  2.46  2.47  avg: 2.47  [  93.33 % ]
  ./rowdump1 -x:  2.59  2.57  2.57  avg: 2.58  [  97.23 % ]

It shows that even if the actual "fast" row copy will be slower than the one simulated here, it is still a quite competitive approach compared to PQgetRowData(), as long as it's not too slow. So an optimized PQgetResult() may be good enough, and thus we can drop the idea of PQgetRowData().

Code attached, also in the https://github.com/markokr/pqtest repo.

-- marko

Makefile:

  pg1 = /opt/apps/pgsql92mode1
  pg2 = /opt/apps/pgsql92mode2
  X1 = -DHAVE_ROWDATA -I$(pg1)/include/internal -I$(pg1)/include/server
  CFLAGS = -O -g -Wall

  all: rowdump1 rowdump2

  rowdump1: rowdump.c
          $(CC) -I$(pg1)/include $(CFLAGS) -o $@ $< -L$(pg1)/lib -Wl,-rpath=$(pg1)/lib -lpq $(X1)

  rowdump2: rowdump.c
          $(CC) -I$(pg2)/include $(CFLAGS) -o $@ $< -L$(pg2)/lib -Wl,-rpath=$(pg2)/lib -lpq

  clean:
          rm -f rowdump1 rowdump2 time.tmp README.html

  html: README.html

  README.html: README.rst
          rst2html $< > $@

rowdump.c:

  #include
  #include
  #include
  #include
  #include
  #include
  #ifdef HAVE_ROWDATA
  #include
  #endif

  struct Context {
          PGconn *db;
          int count;
          char *buf;
          int buflen;
          int bufpos;
  };

  /* print error and exit */
  static void die(PGconn *db, const char *msg)
  {
          if (db)
                  fprintf(stderr, "%s: %s\n", msg, PQerrorMessage(db));
          else
                  fprintf(stderr, "%s\n", msg);
          exit(1);
  }

  /* write out buffer */
  static void out_flush(struct Context *ctx)
  {
          int out;
          if (!ctx->buf)
                  return;
          out = write(1, ctx->buf, ctx->bufpos);
          if (out != ctx->bufpos)
                  die(NULL, "failed to write file");
          ctx->bufpos = 0;
          ctx->buflen = 0;
          free(ctx->buf);
          ctx->buf = NULL;
  }

  /* add char to buffer */
  static void out_char(struct Context *ctx, char c)
  {
          if (ctx->bufpos + 1 > ctx->buflen) {
                  if (!ctx->buf) {
                          ctx->buflen = 16;
                          ctx->buf = malloc(ctx->buflen);
                          if (!ctx->buf)
                                  die(NULL, "failed to allocate buffer");
                  } else {
                          ctx->buflen *= 2;
                          ctx->buf = realloc(ctx->buf, ctx->buflen);
                          if (!ctx->buf)
                                  die(NULL, "failed to resize buffer");
                  }
          }
          ctx->buf[ctx->bufpos++] = c;
  }

  /* quote string for copy */
  static void proc_value(struct Context *ctx, const char *val, int vlen)
  {
          int i;
          char c;
          for (i = 0; i < vlen; i++) {
                  c = val[i];
                  switch (c) {
                  case '\\': out_char(ctx, '\\'); out_char(ctx, '\\'); break;
                  case '\t': out_char(ctx, '\\'); out_char(ctx, 't'); break;
                  case '\n': out_char(ctx, '\\'); out_char(ctx, 'n'); break;
                  case '\r': out_char(ctx, '\\'); out_char(ctx, 'r'); break;
                  default: out_char(ctx, c); break;
                  }
          }
  }

  /* quote one row for copy from regular PGresult */
  static void proc_row(struct Context *ctx, PGresult *res, int tup)
  {
          int n = PQnfields(res);
          const char *val;
          int i, vlen;
          ctx->count++;
          for (i = 0; i < n; i++) {
                  if
Re: [HACKERS] [patch] libpq one-row-at-a-time API
Merlin Moncure writes:
> Either way, it looks like there are plausible paths to optimizing
> repeated result fetch without having to expose an alternate data-fetching
> API, and that the proposed implementation doesn't appear to be a wall
> in terms of getting there. So I'm firmly in the non-exposed-rowbuf
> camp, even if we can't expose an optimal implementation in time for
> 9.2.

Yeah, the schedule argument is a strong one. If we put in PQgetRowData now, we'll be stuck with it forevermore. It would be better to push that part of the patch to 9.3 so we can have more time to think it over and investigate alternatives. The single-row mode is already enough to solve the demonstrated client-side performance issues we know about. So IMO it would be a lot smarter to be spending our time right now on making sure we have *that* part of the patch right.

In particular I agree that PQsetSingleRowMode is a bit ugly, so I'm wondering about your thoughts on providing PQgetSingleRowResult instead. I don't see how to make that work in async mode. I think the library needs to be aware of whether it's supposed to return a single-row result in advance of actually doing so --- for instance, how can PQisBusy possibly give the correct answer if it doesn't know whether having received the first row is sufficient? (Yeah, I know we could invent a separate flavor of PQisBusy for single-row usage, but ick. There are implementation problems too, such as possibly having already copied a lot of rows into a work-in-progress PGresult.)

The thing I fundamentally don't like about PQsetSingleRowMode is that there's such a narrow window of time to use it correctly -- as soon as you've done even one PQconsumeInput, it's too late. (Or maybe not, if the server is slow, which makes it timing-dependent whether you'll observe a bug. Maybe we should add more internal state so that we can consistently throw error if you've done any PQconsumeInput?)
The most obvious alternative is to invent new versions of the PQsendXXX functions with an additional flag parameter; but there are enough of them to make that not very attractive either.

regards, tom lane
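The PQconsumeInput/PQisBusy split that underlies this discussion can be modeled with a self-contained toy connection (invented names, not libpq): consuming only moves bytes from the "kernel" into the connection's buffer and never changes logical state, while the busy test is a side-effect-free check for a complete row, so an event loop can call the two independently.

```c
#include <string.h>
#include <stddef.h>
#include <assert.h>

/* Toy connection (NOT libpq): a '\n'-terminated line stands in for a
 * complete data row from the server. */
struct toy_conn {
    char   buf[128];
    size_t len;
};

/* "PQconsumeInput": drain bytes from the socket into our buffer.
 * Callable at any time; it only appends, never reinterprets. */
static void toy_consume(struct toy_conn *c, const char *bytes, size_t n)
{
    if (c->len + n <= sizeof c->buf) {
        memcpy(c->buf + c->len, bytes, n);
        c->len += n;
    }
}

/* "PQisBusy": still waiting for a complete row?  A pure test with no
 * side effects, so the event loop may ask whenever it likes. */
static int toy_is_busy(const struct toy_conn *c)
{
    return memchr(c->buf, '\n', c->len) == NULL;
}
```

The point of the model is the contract: calling toy_consume at an arbitrary moment never invalidates what toy_is_busy already reported, which is exactly the property a real consume/busy pair must preserve.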
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Tue, Jul 24, 2012 at 1:49 PM, Merlin Moncure wrote:
> On Tue, Jul 24, 2012 at 1:33 PM, Merlin Moncure wrote:
>> The 'source' result (or source data that would be copied into the
>> destination result) would be stored in the PGconn, right? So, the idea
>> is that when you set up single-row mode the connection generates a
>> template PGresult which is then copied out repeatedly during row-by-row
>> processing. I like it, but only if we're reasonably confident the
>> PGresult can be sufficiently optimized like that.
>
> hm, PGresAttDesc is unfortunately in the public header and as such
> probably can't be changed?

It can't -- at least not without breaking compatibility. So, as attractive as it sounds, it looks like a memcpy-based PGresult copy is out unless we break the rules and change it anyway (with the probably safe assumption that the only userland code that cares is libpqtypes, which we'd certainly change). However, it's not clear that a shared-metadata implementation would require a mutex -- if you stored the shared data in the conn and were willing to walk the results in the event the PGconn was freed before the results, you'd probably be ok. That's really unpleasant though.

Either way, it looks like there are plausible paths to optimizing repeated result fetch without having to expose an alternate data-fetching API, and the proposed implementation doesn't appear to be a wall in terms of getting there. So I'm firmly in the non-exposed-rowbuf camp, even if we can't expose an optimal implementation in time for 9.2.

merlin
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Tue, Jul 24, 2012 at 1:33 PM, Merlin Moncure wrote:
> The 'source' result (or source data that would be copied into the
> destination result) would be stored in the PGconn, right? So, the idea
> is that when you set up single-row mode the connection generates a
> template PGresult which is then copied out repeatedly during row-by-row
> processing. I like it, but only if we're reasonably confident the
> PGresult can be sufficiently optimized like that.

hm, PGresAttDesc is unfortunately in the public header and as such probably can't be changed?

merlin
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Tue, Jul 24, 2012 at 11:57 AM, Marko Kreen wrote:
> On Tue, Jul 24, 2012 at 7:52 PM, Merlin Moncure wrote:
>> But, the faster rowbuf method is a generally incompatible way of
>> dealing with data vs current libpq -- this is bad. If it's truly
>> impossible to get those benefits without bypassing the result API,
>> then I remove my objection on the grounds it's optional behavior (my
>> gut tells me it is possible though).
>
> Um, please clarify what you are talking about here?
>
> What is the incompatibility of PGresult from branch 1?

Incompatibility in terms of usage -- we should be getting data with PQgetdata. I think you're suspecting that I incorrectly believe you're forced to use the rowbuf API -- I don't (although I wasn't clear on that earlier). Basically I'm saying that we should only buy into that if all other alternative routes to getting the faster performance are exhausted.

On Tue, Jul 24, 2012 at 11:59 AM, Tom Lane wrote:
> Merlin Moncure writes:
>> I think the dummy copy of PGresult is plausible (if by that you mean
>> optimizing PQgetResult when in single-row mode). That would be even
>> better: you'd remove the need for the rowbuf mode.
>
> I haven't spent any time looking at this, but my gut tells me that a big
> chunk of the expense is copying the PGresult's metadata (the column
> names, types, etc). It has to be possible to make that cheaper.
>
> One idea is to rearrange the internal storage so that that part reduces
> to one memcpy(). Another thought is to allow PGresults to share
> metadata by treating the metadata as a separate reference-counted
> object. The second could be a bit hazardous though, as we advertise
> that PGresults are independent objects that can be manipulated by
> separate threads. I don't want to introduce mutexes into PGresults,
> but I'm not sure reference-counted metadata can be safe without them.
> So maybe the memcpy idea is the only workable one.
Yeah -- we had a very similar problem in libpqtypes and we solved it exactly as you're thinking. libpqtypes potentially has to create a result with each row iteration (we expose rows and composites as on-the-fly-created result objects) and stores some extra non-trivial data with the result. We solved it with the optimized-memcpy method (look here: http://libpqtypes.esilo.com/browse_source.html?file=libpqtypes.h and you'll see that all the important structs like PGtypeHandler are somewhat haphazardly designed to be run through a memcpy). We couldn't do anything about internal libpq issues though, but some micro-optimization of PQsetResultAttrs (which is called via PQcopyResult) might fit the bill.

The 'source' result (or source data that would be copied into the destination result) would be stored in the PGconn, right? So, the idea is that when you set up single-row mode the connection generates a template PGresult which is then copied out repeatedly during row-by-row processing. I like it, but only if we're reasonably confident the PGresult can be sufficiently optimized like that.

merlin
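The libpqtypes trick Merlin mentions (structs "designed to be run through a memcpy") boils down to keeping the struct flat and pointer-free. A minimal sketch; flat_handler is an invented name, not the real PGtypeHandler layout:

```c
#include <string.h>
#include <assert.h>

/* A flat, pointer-free struct: the name lives in an inline buffer
 * rather than behind a char *, so a single memcpy yields a fully
 * independent deep copy. */
struct flat_handler {
    char name[32];
    int  typoid;
    int  typlen;
};

static void handler_copy(struct flat_handler *dst, const struct flat_handler *src)
{
    memcpy(dst, src, sizeof *dst);  /* deep copy for free: nothing to chase */
}
```

The design cost is fixed-size fields (names longer than the buffer must be truncated or rejected); the payoff is that copying never allocates and never shares memory with the source.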
Re: [HACKERS] [patch] libpq one-row-at-a-time API
Merlin Moncure writes:
> I think the dummy copy of PGresult is plausible (if by that you mean
> optimizing PQgetResult when in single row mode). That would be even
> better: you'd remove the need for the rowbuf mode.

I haven't spent any time looking at this, but my gut tells me that a big chunk of the expense is copying the PGresult's metadata (the column names, types, etc). It has to be possible to make that cheaper.

One idea is to rearrange the internal storage so that that part reduces to one memcpy(). Another thought is to allow PGresults to share metadata by treating the metadata as a separate reference-counted object. The second could be a bit hazardous though, as we advertise that PGresults are independent objects that can be manipulated by separate threads. I don't want to introduce mutexes into PGresults, but I'm not sure reference-counted metadata can be safe without them. So maybe the memcpy idea is the only workable one.

regards, tom lane
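Tom's second idea, reference-counted shared metadata, can be sketched as a toy single-threaded model (struct names invented). The thread-safety worry is visible right in the code: the bare refcount increments and decrements are exactly the operations that would need a mutex or atomics once two threads hold results sharing one metadata object.

```c
#include <stdlib.h>
#include <assert.h>

/* Shared, reference-counted metadata (stand-in for column names,
 * types, formats, ...). */
struct meta {
    int refcount;
    int ncols;
};

struct result {
    struct meta *meta;  /* shared with sibling results, not owned */
};

static struct meta *meta_new(int ncols)
{
    struct meta *m = malloc(sizeof *m);
    m->refcount = 1;    /* creator's reference */
    m->ncols = ncols;
    return m;
}

static void result_init(struct result *r, struct meta *m)
{
    m->refcount++;      /* NOT thread-safe as written */
    r->meta = m;
}

static void meta_release(struct meta *m)
{
    if (--m->refcount == 0)  /* last reference frees the metadata */
        free(m);
}
```

Each new result costs one pointer store and one increment instead of a copy of the whole column-description array, which is the saving being weighed against the locking hazard.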
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Tue, Jul 24, 2012 at 7:52 PM, Merlin Moncure wrote:
> But, the faster rowbuf method is a generally incompatible way of
> dealing with data vs current libpq -- this is bad. If it's truly
> impossible to get those benefits without bypassing the result API,
> then I remove my objection on the grounds it's optional behavior (my
> gut tells me it is possible though).

Um, please clarify what you are talking about here?

What is the incompatibility of PGresult from branch 1?

-- marko
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Tue, Jul 24, 2012 at 11:39 AM, Marko Kreen wrote:
> On Tue, Jul 24, 2012 at 7:25 PM, Merlin Moncure wrote:
>> On Tue, Jul 24, 2012 at 11:08 AM, Marko Kreen wrote:
>>>> I'm arguing that *all* data getting must continue to do so through the
>>>> result object, and bypassing the result to get at data is breaking the
>>>> result abstraction in the libpq api. I bet you can still maintain
>>>> data access through the result object while avoiding extra copies.
>>>
>>> Well, the main problem this week is whether we should
>>> apply PQsetSingleRowMode() from single-row-mode1
>>> or from single-row-mode2 branch?
>>>
>>> The PQgetRowData() is unimportant as it just exposes
>>> the rowBuf to user and that's all.
>>
>> right. branch 1 (containing PQgetRowData) seems wrong to me.
>
> Indeed, you are missing the fact that PQgetResult works the same
> in both branches.
>
> And please see the benchmark I posted.
>
> Even better, run it yourself...
>
>>> What do you mean by that? And have you thought about both
>>> sync and async usage patterns?
>>
>> No, I haven't -- at least not very well. The basic idea is that
>> PQsetSingleRowMode turns into PQgetSingleRowResult() and returns a
>> result object. For row by row an extra API call gets called (maybe
>> PQgetNextRow(PGconn, PGresult)) that does the behind-the-scenes work
>> under the existing result object. This is the same general structure
>> you have in branch 2, but the result allocation is moved out of the
>> loop. Presumably sync and async would then follow the same pattern,
>> but we'd still have to be able to guarantee non-blocking behavior in
>> the async api.
>
> Well, as discussed previously it's worthwhile to keep standard functions
> like PQisBusy() and PQgetResult() working sensibly in the new mode,
> and currently they do so.
>
> If you add new functions, you also need to define the behavior
> when new and old ones are mixed, and it gets messy quickly.

Right, I'm aware that you can use the 'slow' method in branch 1.
I also saw the benchmarks and agree that single-row mode should be at least competitive with the current methods for pretty much all cases.

But the faster rowbuf method is a generally incompatible way of dealing with data vs current libpq -- this is bad. If it's truly impossible to get those benefits without bypassing the result API, then I remove my objection on the grounds that it's optional behavior (my gut tells me it is possible though).

I think the dummy copy of PGresult is plausible (if by that you mean optimizing PQgetResult when in single-row mode). That would be even better: you'd remove the need for the rowbuf mode.

merlin
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Tue, Jul 24, 2012 at 7:25 PM, Merlin Moncure wrote:
> On Tue, Jul 24, 2012 at 11:08 AM, Marko Kreen wrote:
>>> I'm arguing that *all* data getting must continue to do so through the
>>> result object, and bypassing the result to get at data is breaking the
>>> result abstraction in the libpq api. I bet you can still maintain
>>> data access through the result object while avoiding extra copies.
>>
>> Well, the main problem this week is whether we should
>> apply PQsetSingleRowMode() from single-row-mode1
>> or from single-row-mode2 branch?
>>
>> The PQgetRowData() is unimportant as it just exposes
>> the rowBuf to user and that's all.
>
> right. branch 1 (containing PQgetRowData) seems wrong to me.

Indeed, you are missing the fact that PQgetResult works the same in both branches.

And please see the benchmark I posted.

Even better, run it yourself...

>> What do you mean by that? And have you thought about both
>> sync and async usage patterns?
>
> No, I haven't -- at least not very well. The basic idea is that
> PQsetSingleRowMode turns into PQgetSingleRowResult() and returns a
> result object. For row by row an extra API call gets called (maybe
> PQgetNextRow(PGconn, PGresult)) that does the behind-the-scenes work
> under the existing result object. This is the same general structure
> you have in branch 2, but the result allocation is moved out of the
> loop. Presumably sync and async would then follow the same pattern,
> but we'd still have to be able to guarantee non-blocking behavior in
> the async api.

Well, as discussed previously it's worthwhile to keep standard functions like PQisBusy() and PQgetResult() working sensibly in the new mode, and currently they do so.

If you add new functions, you also need to define the behavior when new and old ones are mixed, and it gets messy quickly.

-- marko
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Tue, Jul 24, 2012 at 7:08 PM, Marko Kreen wrote:
> The PQgetRowData() is unimportant as it just exposes
> the rowBuf to user and that's all.

So, to get back to the more interesting PQgetRowData() discussion...

Did you notice that it's up to the app to decide whether it calls PQgetResult() or PQgetRowData() after PQsetSingleRowMode()? So there is no way for it to get into conflict with anything. If libpqtypes keeps using PQgetResult it keeps getting a PGresult. No problem.

The PQgetRowData() is meant for use cases where PGresult is *not* the main data structure the app operates on. If the app does a dumb copy out of the PGresult as soon as possible, it can move to PQgetRowData(). If the app wants to do complex operations with the PGresult, or just store it, then it can keep doing so. Nobody forces the use of PQgetRowData().

Now, about the idea of a more optimized PGresult. One way of doing it would be to create a zero-copy PGresult that points into the network buffer like PQgetRowData() does. But this breaks all expectations of lifetime rules for PGresult, thus it seems like a bad idea.

A second way is to optimize the copying step - basically just do a malloc and 2 or 3 memcopies - for struct, headers and data. This leaves a standalone PGresult around that behaves as expected. But for apps that don't care about PGresult it is still an unnecessary copy. So even if we optimize PGresult that way, it still seems worthwhile to have PQgetRowData() around.

Hm, I think it's possible to rig the test to do a dummy copy of the PGresult, thus it's possible to see what kind of speed is possible.. Will try.

-- marko
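The "malloc and 2 or 3 memcopies" copy Marko describes can be sketched with a toy packed layout (invented names; not the real PGresult layout): the struct header, per-column headers and row data land in one allocation, so the number of copies is fixed regardless of column count and the caller gets a standalone object with normal lifetime rules.

```c
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/* One allocation holding the fixed header plus variable payload
 * (column headers followed by row data) back to back. */
struct packed_result {
    int  ncols;
    int  datalen;
    char payload[];  /* column headers, then row data */
};

static struct packed_result *pack_result(int ncols,
                                         const char *hdrs, size_t hlen,
                                         const char *data, size_t dlen)
{
    struct packed_result *r = malloc(sizeof *r + hlen + dlen);
    r->ncols = ncols;
    r->datalen = (int) dlen;
    memcpy(r->payload, hdrs, hlen);         /* copy 1: headers */
    memcpy(r->payload + hlen, data, dlen);  /* copy 2: row data */
    return r;                               /* standalone; caller frees */
}
```

Compared to the zero-copy route, this keeps the "fully user-controlled lifetime" assumption intact: the returned block shares no memory with the network buffer.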
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Tue, Jul 24, 2012 at 11:08 AM, Marko Kreen wrote:
>> I'm arguing that *all* data getting must continue to do so through the
>> result object, and bypassing the result to get at data is breaking the
>> result abstraction in the libpq api. I bet you can still maintain
>> data access through the result object while avoiding extra copies.
>
> Well, the main problem this week is whether we should
> apply PQsetSingleRowMode() from single-row-mode1
> or from single-row-mode2 branch?
>
> The PQgetRowData() is unimportant as it just exposes
> the rowBuf to user and that's all.

right. branch 1 (containing PQgetRowData) seems wrong to me. So, if given that choice, I'd argue for branch 2, forcing a PGresult pull on each row. However, what you were gunning for via branch 1, which is extra performance via removing the extra allocs, is important and useful; hopefully we can get the best of both worlds, or punt and settle on branch 2.

>> For example, maybe PQsetSingleRowMode should return a result object?
>
> What do you mean by that? And have you thought about both
> sync and async usage patterns?

No, I haven't -- at least not very well. The basic idea is that PQsetSingleRowMode turns into PQgetSingleRowResult() and returns a result object. For row-by-row processing an extra API call gets called (maybe PQgetNextRow(PGconn, PGresult)) that does the behind-the-scenes work under the existing result object. This is the same general structure you have in branch 2, but the result allocation is moved out of the loop. Presumably sync and async would then follow the same pattern, but we'd still have to be able to guarantee non-blocking behavior in the async api.

merlin
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Tue, Jul 24, 2012 at 6:18 PM, Merlin Moncure wrote: > On Mon, Jul 23, 2012 at 5:07 PM, Marko Kreen wrote: >>> Does PQgetRowData() break the ability to call PQgetvalue() against the >>> result as well as other functions like PQgetisnull()? If so, I >>> object. >> >> I don't get what are you objecting to. The PQgetRowData() >> is called instead of PQgetResult() and it returns zero-row PGresult >> for row header and list of PGdataValues that point to actual >> data. You call the value functions, they don't crash, but as >> the result has zero rows (the data is not copied to it) >> they don't do anything useful either. >> >> The whole point of the API is to avoid the unnecessary copying. >> >> You can mix the calls to PQgetRowData() and PQgetResult() >> during one resultset, it works fine although does not seem >> that useful. >> >> I guess you fear that some code can unexpectedly see >> new behavior, and that exactly why the row-callback API >> needs to be scrapped - it allowed such possibility. >> >> But with PQsetSingleRowMode and PQgetRowData, the >> new behavior is seen only by new code that clearly >> expects it, so I don't see what the problem is. > > Well, for one, it breaks libpqtypes (see line 171@ > http://libpqtypes.esilo.com/browse_source.html?file=exec.c), or at > least makes it unable to use the row-processing mode. libpqtypes > wraps the libpq getter functions and exposes a richer api using only > the result object. When writing libpqtypes we went through great > pains to keep access to server side data through the existing result > API. Note, I'm not arguing that compatibility with libpqtypes is a > desired objective, but the wrapping model that it uses is pretty > reasonably going to be used by other code -- the awkwardness there > should be a red flag of sorts. > > I'm arguing that *all* data getting must continue to do so through the > result object, and bypassing the result to get at data is breaking the > result abstraction in the libpq api. 
I bet you can still maintain > data access through result object while avoiding extra copies. Well, the main problem this week is whether we should apply PQsetSingleRowMode() from the single-row-mode1 or the single-row-mode2 branch? The PQgetRowData() is unimportant as it just exposes the rowBuf to the user and that's all. Do you have an opinion about that? > For example, maybe PQsetSingleRowMode maybe should return a result object? What do you mean by that? And have you thought about both sync and async usage patterns? -- marko
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Tue, Jul 24, 2012 at 10:49 AM, Tom Lane wrote: > Merlin Moncure writes: >> I'm arguing that *all* data getting must continue to do so through the >> result object, and bypassing the result to get at data is breaking the >> result abstraction in the libpq api. > > That's a fair point, but the single-row mode without PQgetRowData still > fits that model, doesn't it? From the point of view of libpqtypes it > just looks like you got a lot of one-row query results. Sure: Marko's exec_query_single_row example looks like 100% reasonable libpq code. That said, I'd still spend a few cycles to think this through and make sure we aren't walling ourselves off from 'copy free' behavior, even if that's reserved for a future improvement. In particular, I'd like to explore if PQsetSingleRowMode() should be returning a result. merlin
Re: [HACKERS] [patch] libpq one-row-at-a-time API
Merlin Moncure writes: > I'm arguing that *all* data getting must continue to do so through the > result object, and bypassing the result to get at data is breaking the > result abstraction in the libpq api. That's a fair point, but the single-row mode without PQgetRowData still fits that model, doesn't it? From the point of view of libpqtypes it just looks like you got a lot of one-row query results. regards, tom lane
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Mon, Jul 23, 2012 at 5:07 PM, Marko Kreen wrote: >> Does PQgetRowData() break the ability to call PQgetvalue() against the >> result as well as other functions like PQgetisnull()? If so, I >> object. > > I don't get what are you objecting to. The PQgetRowData() > is called instead of PQgetResult() and it returns zero-row PGresult > for row header and list of PGdataValues that point to actual > data. You call the value functions, they don't crash, but as > the result has zero rows (the data is not copied to it) > they don't do anything useful either. > > The whole point of the API is to avoid the unnecessary copying. > > You can mix the calls to PQgetRowData() and PQgetResult() > during one resultset, it works fine although does not seem > that useful. > > I guess you fear that some code can unexpectedly see > new behavior, and that exactly why the row-callback API > needs to be scrapped - it allowed such possibility. > > But with PQsetSingleRowMode and PQgetRowData, the > new behavior is seen only by new code that clearly > expects it, so I don't see what the problem is. Well, for one, it breaks libpqtypes (see line 171@ http://libpqtypes.esilo.com/browse_source.html?file=exec.c), or at least makes it unable to use the row-processing mode. libpqtypes wraps the libpq getter functions and exposes a richer api using only the result object. When writing libpqtypes we went through great pains to keep access to server side data through the existing result API. Note, I'm not arguing that compatibility with libpqtypes is a desired objective, but the wrapping model that it uses is pretty reasonably going to be used by other code -- the awkwardness there should be a red flag of sorts. I'm arguing that *all* data getting must continue to do so through the result object, and bypassing the result to get at data is breaking the result abstraction in the libpq api. I bet you can still maintain data access through result object while avoiding extra copies. 
For example, maybe PQsetSingleRowMode should return a result object? merlin
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Tue, Jul 24, 2012 at 12:27 AM, Merlin Moncure wrote: > On Mon, Jul 23, 2012 at 2:05 PM, Marko Kreen wrote: > It seems odd (but maybe ok) that you have to set the single row mode > on the connection only to have the server reset it whenever you call a > send function: maybe rename to PQsetResultSingleRowMode? The server does not reset it; it's a purely client-side flag. It is reset by the next PQsend*/PQexec* call. If there are several resultsets sent by the server for one query, they all share the same setting. I don't think extra-long function names that "describe exactly" the function's behavior are an improvement over shorter but inexact names, as you need to read the docs anyway to know the real behavior. But it's a matter of taste, so it can be renamed if people want it. An alternative is to create parallel PQsend* functions that set the flag. That gets rid of the possibility of any extra activity between PQsend* and PQsetSingleRowMode(). But it seems messy to do that just for single-row mode; instead it makes sense to have PQsend* variants that take a flags argument to tune their behavior. But that is worth doing only if we have more flags than just this one. > Does PQgetRowData() break the ability to call PQgetvalue() against the > result as well as other functions like PQgetisnull()? If so, I > object. I don't get what you are objecting to. PQgetRowData() is called instead of PQgetResult() and it returns a zero-row PGresult for the row header and a list of PGdataValues that point to the actual data. You call the value functions, they don't crash, but as the result has zero rows (the data is not copied to it) they don't do anything useful either. The whole point of the API is to avoid the unnecessary copying. You can mix calls to PQgetRowData() and PQgetResult() during one resultset; it works fine, although it does not seem that useful. I guess you fear that some code can unexpectedly see new behavior, and that's exactly why the row-callback API needs to be scrapped - it allowed such a possibility.
But with PQsetSingleRowMode and PQgetRowData, the new behavior is seen only by new code that clearly expects it, so I don't see what the problem is. -- marko
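For contrast with the PQgetResult() path, here is what consuming rows via the patch-branch API might look like. PQgetRowData() and PGdataValue were never part of mainline libpq, so the signature, the struct layout, and the helper functions (handle_null, handle_value) below are all assumptions reconstructed from this thread's description, not a shipped API:

```c
/* Hypothetical, reconstructed from this thread: PQgetRowData() is called
 * where PQgetResult() would be; "hdr" is a zero-row PGresult carrying only
 * the row description, and "cols" points straight into libpq's network
 * buffer, so nothing is copied. */
PGresult *hdr;
PGdataValue *cols;              /* assumed shape: { int len; const char *value; } */

while (PQgetRowData(conn, &hdr, &cols)) {
    int i;
    for (i = 0; i < PQnfields(hdr); i++) {
        if (cols[i].len < 0)
            handle_null(i);                               /* placeholder */
        else
            handle_value(i, cols[i].value, cols[i].len);  /* no copy made */
    }
    /* cols[] is only valid until the next call that reads network data,
     * so the row must be fully consumed (or copied) right here. */
}
```

The validity window in the last comment is exactly the "lifetime rules" hazard that Tom objects to elsewhere in the thread.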
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Mon, Jul 23, 2012 at 2:05 PM, Marko Kreen wrote: > > Here is a simple test program that takes a SELECT > query, reads it and outputs a COPY-formatted stream > to standard output, to simulate some activity. > > It operates on 3 modes, specified by commant-line switches: > > -f Load full resultset at once - old way. > -s Single-Row mode using PQgetResult(). > -z Single-Row mode using PQgetRowData(). > > It is compiled with 2 different libpqs that correspond to > single-row-modeX branches in my github repo: > > rowdump1 - libpq with rowBuf + PQgetRowData(). rowBuf is >required for PQgetRowData. >[ https://github.com/markokr/postgres/tree/single-row-mode1 ] > > rowdump2 - Plain libpq patched for single-row mode. >No PQgetRowData() here. >[ https://github.com/markokr/postgres/tree/single-row-mode2 ] > > Notes: > > * Hardest part is picking realistic queries that matter. > It's possible to construct artificial queries that make > results go either way. > > * It does not make sense for compare -f with others. But it > does make sense to compare -f from differently patched libpqs > to detect any potential slowdowns. > > * The time measured is User Time of client process. 
> > --- > QUERY: select 1,200,30,rpad('x',30,'z') from > generate_series(1,500) > ./rowdump1 -f: 3.90 3.90 3.93 avg: 3.91 > ./rowdump2 -f: 4.03 4.13 4.05 avg: 4.07 > ./rowdump1 -s: 6.26 6.33 6.49 avg: 6.36 > ./rowdump2 -s: 7.48 7.46 7.50 avg: 7.48 > ./rowdump1 -z: 2.88 2.90 2.79 avg: 2.86 > QUERY: select > rpad('x',10,'z'),rpad('x',20,'z'),rpad('x',30,'z'),rpad('x',40,'z'),rpad('x',50,'z'),rpad('x',60,'z') > from generate_series(1,300) > ./rowdump1 -f: 6.29 6.36 6.14 avg: 6.26 > ./rowdump2 -f: 6.79 6.69 6.72 avg: 6.73 > ./rowdump1 -s: 7.71 7.72 7.80 avg: 7.74 > ./rowdump2 -s: 8.14 8.16 8.57 avg: 8.29 > ./rowdump1 -z: 6.45 5.15 5.16 avg: 5.59 > QUERY: select > 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100 > from generate_series(1,80) > ./rowdump1 -f: 5.68 5.66 5.72 avg: 5.69 > ./rowdump2 -f: 5.69 5.84 5.67 avg: 5.73 > ./rowdump1 -s: 7.68 7.76 7.67 avg: 7.70 > ./rowdump2 -s: 7.57 7.54 7.62 avg: 7.58 > ./rowdump1 -z: 2.78 2.82 2.72 avg: 2.77 > QUERY: select 1000,rpad('x', 400, 'z'),rpad('x', 4000, 'z') from > generate_series(1,10) > ./rowdump1 -f: 2.71 2.66 2.58 avg: 2.65 > ./rowdump2 -f: 3.11 3.14 3.16 avg: 3.14 > ./rowdump1 -s: 2.64 2.61 2.64 avg: 2.63 > ./rowdump2 -s: 3.15 3.11 3.11 avg: 3.12 > ./rowdump1 -z: 2.53 2.51 2.46 avg: 2.50 > --- > > Test code attached. Please play with it. > > By this test, both rowBuf and PQgetRowData() look good. I agree on performance grounds. It's important for libpq to be fast. It seems odd (but maybe ok) that you have to set the single row mode on the connection only to have the server reset it whenever you call a send function: maybe rename to PQsetResultSingleRowMode? 
Does PQgetRowData() break the ability to call PQgetvalue() against the result as well as other functions like PQgetisnull()? If so, I object. merlin
Re: [HACKERS] [patch] libpq one-row-at-a-time API
Here is a simple test program that takes a SELECT query, reads it and outputs a COPY-formatted stream to standard output, to simulate some activity. It operates in 3 modes, specified by command-line switches: -f Load full resultset at once - old way. -s Single-Row mode using PQgetResult(). -z Single-Row mode using PQgetRowData(). It is compiled with 2 different libpqs that correspond to the single-row-modeX branches in my github repo: rowdump1 - libpq with rowBuf + PQgetRowData(). rowBuf is required for PQgetRowData. [ https://github.com/markokr/postgres/tree/single-row-mode1 ] rowdump2 - Plain libpq patched for single-row mode. No PQgetRowData() here. [ https://github.com/markokr/postgres/tree/single-row-mode2 ] Notes: * The hardest part is picking realistic queries that matter. It's possible to construct artificial queries that make the results go either way. * It does not make sense to compare -f with the others. But it does make sense to compare -f from differently patched libpqs to detect any potential slowdowns. * The time measured is the User Time of the client process.
--- QUERY: select 1,200,30,rpad('x',30,'z') from generate_series(1,500) ./rowdump1 -f: 3.90 3.90 3.93 avg: 3.91 ./rowdump2 -f: 4.03 4.13 4.05 avg: 4.07 ./rowdump1 -s: 6.26 6.33 6.49 avg: 6.36 ./rowdump2 -s: 7.48 7.46 7.50 avg: 7.48 ./rowdump1 -z: 2.88 2.90 2.79 avg: 2.86 QUERY: select rpad('x',10,'z'),rpad('x',20,'z'),rpad('x',30,'z'),rpad('x',40,'z'),rpad('x',50,'z'),rpad('x',60,'z') from generate_series(1,300) ./rowdump1 -f: 6.29 6.36 6.14 avg: 6.26 ./rowdump2 -f: 6.79 6.69 6.72 avg: 6.73 ./rowdump1 -s: 7.71 7.72 7.80 avg: 7.74 ./rowdump2 -s: 8.14 8.16 8.57 avg: 8.29 ./rowdump1 -z: 6.45 5.15 5.16 avg: 5.59 QUERY: select 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100 from generate_series(1,80) ./rowdump1 -f: 5.68 5.66 5.72 avg: 5.69 ./rowdump2 -f: 5.69 5.84 5.67 avg: 5.73 ./rowdump1 -s: 7.68 7.76 7.67 avg: 7.70 ./rowdump2 -s: 7.57 7.54 7.62 avg: 7.58 ./rowdump1 -z: 2.78 2.82 2.72 avg: 2.77 QUERY: select 1000,rpad('x', 400, 'z'),rpad('x', 4000, 'z') from generate_series(1,10) ./rowdump1 -f: 2.71 2.66 2.58 avg: 2.65 ./rowdump2 -f: 3.11 3.14 3.16 avg: 3.14 ./rowdump1 -s: 2.64 2.61 2.64 avg: 2.63 ./rowdump2 -s: 3.15 3.11 3.11 avg: 3.12 ./rowdump1 -z: 2.53 2.51 2.46 avg: 2.50 --- Test code attached. Please play with it. By this test, both rowBuf and PQgetRowData() look good. 
-- marko

Makefile:

pg1 = /opt/apps/pgsql92mode1
pg2 = /opt/apps/pgsql92mode2
CFLAGS = -O -g -Wall

all: rowdump1 rowdump2

rowdump1: rowdump.c
	$(CC) -I$(pg1)/include $(CFLAGS) -o $@ $< -L$(pg1)/lib -Wl,-rpath=$(pg1)/lib -lpq -DHAVE_ROWDATA

rowdump2: rowdump.c
	$(CC) -I$(pg2)/include $(CFLAGS) -o $@ $< -L$(pg2)/lib -Wl,-rpath=$(pg2)/lib -lpq

clean:
	rm -f rowdump1 rowdump2

xtest.sh Description: Bourne shell script

rowdump.c:

/* NB: the include targets were eaten by the archiver; the headers below
 * are reconstructed from what the code uses. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <libpq-fe.h>

struct Context {
	PGconn *db;
	int count;
	char *buf;
	int buflen;
	int bufpos;
};

static void die(PGconn *db, const char *msg)
{
	if (db)
		fprintf(stderr, "%s: %s\n", msg, PQerrorMessage(db));
	else
		fprintf(stderr, "%s\n", msg);
	exit(1);
}

static void out_flush(struct Context *ctx)
{
	int out;
	if (!ctx->buf)
		return;
	out = write(1, ctx->buf, ctx->bufpos);
	if (out != ctx->bufpos)
		die(NULL, "failed to write file");
	ctx->bufpos = 0;
	ctx->buflen = 0;
	free(ctx->buf);
	ctx->buf = NULL;
}

static void out_char(struct Context *ctx, char c)
{
	if (ctx->bufpos + 1 > ctx->buflen) {
		if (!ctx->buf) {
			ctx->buflen = 16;
			ctx->buf = malloc(ctx->buflen);
			if (!ctx->buf)
				die(NULL, "failed to allocate buffer");
		} else {
			ctx->buflen *= 2;
			ctx->buf = realloc(ctx->buf, ctx->buflen);
			if (!ctx->buf)
				die(NULL, "failed to resize buffer");
		}
	}
	ctx->buf[ctx->bufpos++] = c;
}

static void proc_value(struct Context *ctx, const char *val, int vlen)
{
	int i;
	char c;
	for (i = 0; i < vlen; i++) {
		c = val[i];
		switch (c) {
		case '\\': out_char(ctx, '\\'); out_char(ctx, '\\'); break;
		case '\t': out_char(ctx, '\\'); out_char(ctx, 't'); break;
		case '\n': out_char(ctx, '\\'); out_char(ctx, 'n'); break;
		case '\r': out_char(ctx, '\\'); out_char(ctx, 'r'); break;
		default: out_char(ctx, c); break;
		}
	}
}

static void proc_row(struct Context *ctx, PGresult *res, int tup)
{
	int n = PQnfields(res);
	const char *val;
	int i, vlen;
	ctx
Re: [HACKERS] [patch] libpq one-row-at-a-time API
Marko Kreen writes: > Here is 2 approaches how to get to state where only PQsetSingleRowMode() > is available. Both on top of REL9_2_STABLE branch. > a) Remove callback hooks, keep rowBuf, implement single-row-mode on >top of that. This was posted before, I just additionally removed >the PQgetRowData() function. > b) Revert row-callback changes completely, implement single-row-mode on >top of virgin libpq. Only problem here was keeping fixes implemented >as part of row-callback patch. Single-row-mode itself is quite simple. Yeah, I was wondering if we shouldn't revisit the whole patch given that where we're ending up looks little like where we thought we were going. I hadn't had time to pursue the idea, but I'm glad you did. Will look at these. One reason to stick with approach (a) is that it gives us the option to easily add PQgetRowData(), in case future testing shows that that's actually worth the risk and usage restrictions. While I'm a bit dubious of that, I'm prepared to be proven wrong. regards, tom lane
Re: [HACKERS] [patch] libpq one-row-at-a-time API
Here are 2 approaches for getting to a state where only PQsetSingleRowMode() is available. Both are on top of the REL9_2_STABLE branch.

a) Remove callback hooks, keep rowBuf, implement single-row-mode on top of that. This was posted before, I just additionally removed the PQgetRowData() function.

git pull git://github.com/markokr/postgres.git single-row-mode1
https://github.com/markokr/postgres/commits/single-row-mode1

Commits:
libpq: Single-row based processing
libpq, dblink: Remove row processor API

Advantage: makes it easier to play with PQgetRowData() or potential faster PGresult creation methods. Smaller change compared to libpq from 9.2beta than b).

b) Revert row-callback changes completely, implement single-row-mode on top of virgin libpq. The only problem here was keeping fixes implemented as part of the row-callback patch. Single-row-mode itself is quite simple.

git pull git://github.com/markokr/postgres.git single-row-mode2
https://github.com/markokr/postgres/commits/single-row-mode2

Feature patch: https://github.com/markokr/postgres/commit/b5e822125c655f189875401c61317242705143b9

Commits:
dblink: revert conversion to row processor API patch
libpq: revert row processor API patch
libpq: random fixes
libpq: single-row mode
dblink: use single-row-mode

Advantage: smaller change compared to libpq from 9.1 than a).

As the patch has suffered a lot from trying to provide both macro- and micro-optimization (on-the-fly row processing vs. more efficient row processing), maybe b) is the safer choice for 9.2? In case somebody wants to look at the patches without git (or web), I attach them as a tgz too. -- marko single-row.tgz Description: GNU Unix tar archive
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Mon, Jul 16, 2012 at 6:47 PM, Tom Lane wrote: > Marko Kreen writes: >> On Mon, Jul 16, 2012 at 4:33 PM, Tom Lane wrote: >>> Mm. I still think we should drop it, because it's still a dangerous API >>> that's not necessary for the principal benefit of this feature. > >> Yes, it is a secondary feature, but it fits the needs of the actual target >> audience of the single-row feature - various high-level wrappers of libpq. > >> Also it is needed for high-performance situations, where the >> single-row-mode fits well even for C clients, except the >> advantage is negated by new malloc-per-row overhead. > > Absolutely no evidence has been presented that there's any useful > performance gain to be had there. Moreover, if there were, we could > probably work a bit harder at making PGresult creation cheaper, rather > than having to expose a dangerous API. Ok, I'm more interested in the primary feature, so no more objections from me. -- marko
Re: [HACKERS] [patch] libpq one-row-at-a-time API
Marko Kreen writes: > On Mon, Jul 16, 2012 at 4:33 PM, Tom Lane wrote: >> Mm. I still think we should drop it, because it's still a dangerous API >> that's not necessary for the principal benefit of this feature. > Yes, it is a secondary feature, but it fits the needs of the actual target > audience of the single-row feature - various high-level wrappers of libpq. > Also it is needed for high-performance situations, where the > single-row-mode fits well even for C clients, except the > advantage is negated by new malloc-per-row overhead. Absolutely no evidence has been presented that there's any useful performance gain to be had there. Moreover, if there were, we could probably work a bit harder at making PGresult creation cheaper, rather than having to expose a dangerous API. regards, tom lane
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Mon, Jul 16, 2012 at 4:33 PM, Tom Lane wrote: > Marko Kreen writes: >> On Mon, Jul 16, 2012 at 4:11 AM, Tom Lane wrote: >>> I'm starting to look at this patch now. I think we could drop the >>> PQgetRowData() API: it complicates matters for little gain that I can >>> see. The argument for it was to avoid the cost of creating a PGresult >>> per row, but we're already going to pay the cost of creating a >>> PGresult in order to return the PGRES_SINGLE_TUPLE status. > >> No. Please look again, it is supposed to be called instead of PGgetResult(). > > Mm. I still think we should drop it, because it's still a dangerous API > that's not necessary for the principal benefit of this feature. Yes, it is a secondary feature, but it fits the needs of the actual target audience of the single-row feature - various high-level wrappers of libpq. Also it is needed for high-performance situations, where the single-row mode fits well even for C clients, except that the advantage is negated by the new malloc-per-row overhead. -- marko
Re: [HACKERS] [patch] libpq one-row-at-a-time API
Marko Kreen writes: > On Mon, Jul 16, 2012 at 4:11 AM, Tom Lane wrote: >> I'm starting to look at this patch now. I think we could drop the >> PQgetRowData() API: it complicates matters for little gain that I can >> see. The argument for it was to avoid the cost of creating a PGresult >> per row, but we're already going to pay the cost of creating a >> PGresult in order to return the PGRES_SINGLE_TUPLE status. > No. Please look again, it is supposed to be called instead of PGgetResult(). Mm. I still think we should drop it, because it's still a dangerous API that's not necessary for the principal benefit of this feature. regards, tom lane
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Mon, Jul 16, 2012 at 4:11 AM, Tom Lane wrote: > Marko Kreen writes: >> Now, looking at the problem with some perspective, the solution >> is obvious: when in single-row mode, the PQgetResult() must return >> proper PGresult for that single row. And everything else follows that. > >> Such API is implemented in attached patch: > > I'm starting to look at this patch now. I think we could drop the > PQgetRowData() API: it complicates matters for little gain that I can > see. The argument for it was to avoid the cost of creating a PGresult > per row, but we're already going to pay the cost of creating a > PGresult in order to return the PGRES_SINGLE_TUPLE status. No. Please look again, it is supposed to be called instead of PQgetResult(). -- marko
Re: [HACKERS] [patch] libpq one-row-at-a-time API
Marko Kreen writes: > Now, looking at the problem with some perspective, the solution > is obvious: when in single-row mode, the PQgetResult() must return > proper PGresult for that single row. And everything else follows that. > Such API is implemented in attached patch: I'm starting to look at this patch now. I think we could drop the PQgetRowData() API: it complicates matters for little gain that I can see. The argument for it was to avoid the cost of creating a PGresult per row, but we're already going to pay the cost of creating a PGresult in order to return the PGRES_SINGLE_TUPLE status. And as was pointed out upthread, any per-tuple malloc costs are going to be in the noise compared to the server-side effort expended to create the tuple, anyway. The point of this feature is to avoid accumulating the entire resultset in memory, not to micro-optimize linear-time costs. Moreover, if the argument for changing 9.2 at this late date is to get rid of a fragile, breakable API, surely an API that's designed around returning pointers into the library's network buffer ought to be a prime target. And lastly, since the proposed patch for dblink doesn't use PQgetRowData, there's not even much reason to think that it's bug-free. regards, tom lane
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On 17 June 2012 19:37, Marko Kreen wrote: > On Sun, Jun 17, 2012 at 2:07 PM, Simon Riggs wrote: >> I prefer the description of Marko's API than the one we have now. >> >> Adopting one API in 9.2 and another in 9.3 would be fairly bad. >> Perhaps we can have both? > > I see no reason the keep the (public) callback API around, > except if we don't bother to remove it now. OK by me. >> Can we see a performance test? "Add a row processor API to libpq for >> better handling of large result sets". So idea is we do this many, >> many times so we need to double check the extra overhead is not a >> problem in cases where the dumping overhead is significant. ... > I did benchmark it, and it seems there are column-size > + column-count patterns where new way is faster, > and some patterns where old way is faster. But the > difference did not raise above test noise so I concluded > it is insignificant and the malloc+copy avoidance is worth it. As long as we've checked that's fine. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Sun, Jun 17, 2012 at 2:07 PM, Simon Riggs wrote: > I prefer the description of Marko's API than the one we have now. > > Adopting one API in 9.2 and another in 9.3 would be fairly bad. > Perhaps we can have both? I see no reason to keep the (public) callback API around, except if we don't bother to remove it now. > Can we see a performance test? "Add a row processor API to libpq for > better handling of large result sets". So idea is we do this many, > many times so we need to double check the extra overhead is not a > problem in cases where the dumping overhead is significant. Not sure what you want to performance-test. PQgetRowData() uses exactly the same pipeline that the callbacks used. It will use a few more C calls; not sure it makes sense to benchmark them. The recent dblink change replaced the palloc() + copy zero-termination dance with PQgetResult(), which does the malloc() + copy dance internally. This malloc vs. palloc might be benchmarkable, but it seems to go into micro-benchmarking territory, as the big win came from avoiding buffering rows. So yeah, using PQgetRowData() might be a tiny bit faster, but I don't expect much difference. But all this affects new users only. The thing that affects everybody was the 2-step row processing change that was done during the rowproc patch. I did benchmark it, and it seems there are column-size + column-count patterns where the new way is faster, and some patterns where the old way is faster. But the difference did not rise above test noise, so I concluded it is insignificant and the malloc+copy avoidance is worth it. Of course, any additional benchmarking is welcome, so feel free to pick any situation you care about. -- marko
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Sat, Jun 16, 2012 at 7:58 PM, Marko Kreen wrote: > So my preference would be to simply remove the callback API > but keep the processing and provide PQgetRowData() instead. This is implemented in the attached patch. It also converts dblink to use the single-row API. The patch should be applied on top of the previous single-row patch. Both can also be seen here: https://github.com/markokr/postgres/commits/single-row -- marko remove-rowproc.diff.gz Description: GNU Zip compressed data
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On 16 June 2012 23:09, Tom Lane wrote: > Marko Kreen writes: >>> Now, looking at the problem with some perspective, the solution >>> is obvious: when in single-row mode, the PQgetResult() must return >>> proper PGresult for that single row. And everything else follows that. >>> >>> * PQgetRowData(): can be called instead PQgetResult() to get raw row data >>> in buffer, for more efficient processing. This is optional feature >>> that provides the original row-callback promise of avoiding unnecessary >>> row data copy. >>> >>> * Although PQgetRowData() makes callback API unnecessary, it is still >>> fully compatible with it - the callback should not see any difference >>> whether the resultset is processed in single-row mode or >>> old single-PGresult mode. Unless it wants to - it can check >>> PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE. > > I guess this raises the question of whether we ought to revert the > row-callback patch entirely and support only this approach. IMO > it is (barely) not too late to do that for 9.2, if we want to. > If we don't want to, then this is just another new feature and > should be considered for 9.3. > > What I like about this is the greatly simpler and harder-to-misuse > API. The only arguable drawback is that there's still at least one > malloc/free cycle per tuple, imposed by the creation of a PGresult > for each one, whereas the callback approach avoids that. But worrying > about that could be considered to be vast overoptimization; the backend > has certainly spent a lot more overhead than that generating the tuple. I prefer the description of Marko's API than the one we have now. Adopting one API in 9.2 and another in 9.3 would be fairly bad. Perhaps we can have both? Can we see a performance test? "Add a row processor API to libpq for better handling of large result sets". So idea is we do this many, many times so we need to double check the extra overhead is not a problem in cases where the dumping overhead is significant. 
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Sat, Jun 16, 2012 at 6:09 PM, Tom Lane wrote:
> I guess this raises the question of whether we ought to revert the
> row-callback patch entirely and support only this approach. IMO
> it is (barely) not too late to do that for 9.2, if we want to.
> If we don't want to, then this is just another new feature and
> should be considered for 9.3.

I think the row-callback is a dangerous API that does not solve any important problems. But I do like the 2-phase processing the rowproc patch introduced, and having a way to bypass the unnecessary malloc()+copy. So my preference would be to simply remove the callback API but keep the processing, and provide PQgetRowData() instead.

Although the win it brings is significantly smaller now, thanks to single-row PQgetResult(). So if it does not sound interesting to others, it can be dropped; the single-row processing is the important feature we need, the rest is extra.

--
marko
Re: [HACKERS] [patch] libpq one-row-at-a-time API
Marko Kreen writes:
>> Now, looking at the problem with some perspective, the solution
>> is obvious: when in single-row mode, PQgetResult() must return
>> a proper PGresult for that single row. And everything else follows
>> from that.
>>
>> * PQgetRowData(): can be called instead of PQgetResult() to get raw
>> row data in the buffer, for more efficient processing. This is an
>> optional feature that provides the original row-callback promise of
>> avoiding unnecessary row data copy.
>>
>> * Although PQgetRowData() makes the callback API unnecessary, it is
>> still fully compatible with it - the callback should not see any
>> difference whether the resultset is processed in single-row mode or
>> the old single-PGresult mode. Unless it wants to - it can check
>> PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE.

I guess this raises the question of whether we ought to revert the row-callback patch entirely and support only this approach. IMO it is (barely) not too late to do that for 9.2, if we want to. If we don't want to, then this is just another new feature and should be considered for 9.3.

What I like about this is the greatly simpler and harder-to-misuse API. The only arguable drawback is that there's still at least one malloc/free cycle per tuple, imposed by the creation of a PGresult for each one, whereas the callback approach avoids that. But worrying about that could be considered vast overoptimization; the backend has certainly spent a lot more overhead than that generating the tuple.

regards, tom lane
Re: [HACKERS] [patch] libpq one-row-at-a-time API
Demos:

https://github.com/markokr/libpq-rowproc-demos/blob/master/demo-onerow-sync.c
https://github.com/markokr/libpq-rowproc-demos/blob/master/demo-onerow-async.c

A few clarifications below.

On Fri, Jun 15, 2012 at 9:21 PM, Marko Kreen wrote:
> Now, looking at the problem with some perspective, the solution
> is obvious: when in single-row mode, PQgetResult() must return
> a proper PGresult for that single row. And everything else follows
> from that.
>
> Such an API is implemented in the attached patch:
>
> * PQsetSingleRowMode(conn): sets single-row mode.

The function can be called only after PQsend* and before any rows have arrived. This guarantees there will be no surprises for PQexec* users who expect the full resultset at once. It also guarantees that the user will process the resultset with a PQgetResult() loop, either sync or async. The next PQexec/PQsend call will reset the flag, so it is active only for the duration of processing the results from one command.

Currently it returns FALSE if called in the wrong place, and does nothing. The only question I see here is whether it should also set an error state on the connection; that does not seem to be an improvement.

> * PQgetRowData(): can be called instead of PQgetResult() to get raw
> row data in the buffer, for more efficient processing. This is an
> optional feature that provides the original row-callback promise of
> avoiding unnecessary row data copy.
>
> * Although PQgetRowData() makes the callback API unnecessary, it is
> still fully compatible with it - the callback should not see any
> difference whether the resultset is processed in single-row mode or
> the old single-PGresult mode. Unless it wants to - it can check
> PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE.

The PQgetResult() path is compatible with callbacks; PQgetRowData() bypasses them.

--
marko
Re: [HACKERS] [patch] libpq one-row-at-a-time API
On Fri, Jun 15, 2012 at 1:21 PM, Marko Kreen wrote:
> The row-processor API is now in 9.2, but it solves only the
> "different-row-storage" problem, not the "one-row-at-a-time"
> problem, as libpq is still in control until all rows are received.
>
> This means libpq still cannot be used to implement the iterative
> result processing that almost all high-level languages are using.
>
> We discussed a potential API for fetching one row at a time,
> but did not reach a conclusion. The basic arguments were:
>
> 1) Tom: PQisBusy must keep its current behaviour. Thus PQgetResult()
> must also keep its current behaviour:
> * PQisBusy() -> 0: need to call PQgetResult(), which returns a PGresult
> * PQisBusy() -> 1: need to call PQconsumeInput()
> * PQisBusy() must be callable several times in a row, thus be
> stateless from the client's POV.
>
> 2) Me: behaviour must not be controlled by a callback, but by client
> code that uses PQgetResult() + PQisBusy().
>
> Now, looking at the problem with some perspective, the solution
> is obvious: when in single-row mode, PQgetResult() must return
> a proper PGresult for that single row. And everything else follows
> from that.
>
> Such an API is implemented in the attached patch:
>
> * PQsetSingleRowMode(conn): sets single-row mode.
>
> * PQisBusy(): stops after each row in single-row mode, sets
> PGASYNC_ROW_READY. Thus keeping the property of being repeatedly
> callable.
>
> * PQgetResult(): returns a copy of the row if PGASYNC_ROW_READY. Sets
> the row's resultStatus to PGRES_SINGLE_TUPLE. This needs to be
> different from PGRES_TUPLES_OK to detect resultset end.
>
> * PQgetRowData(): can be called instead of PQgetResult() to get raw
> row data in the buffer, for more efficient processing. This is an
> optional feature that provides the original row-callback promise of
> avoiding unnecessary row data copy.
> * Although PQgetRowData() makes the callback API unnecessary, it is
> still fully compatible with it - the callback should not see any
> difference whether the resultset is processed in single-row mode or
> the old single-PGresult mode. Unless it wants to - it can check
> PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE.
>
> There is some duplicate code here that can be refactored (callback
> exec), but I did not do it yet to avoid affecting existing code too
> much.
>
> --
> marko
>
> PS. If I squint, it seems like a fix of the existing API instead of a
> new feature, so perhaps it can still fit into 9.2?

+1 on rushing row processing into 9.2, but only if the API feels right (I'll spend some time reviewing it). I found the lack of iterative row processing to be really unfortunate.

merlin
[HACKERS] [patch] libpq one-row-at-a-time API
The row-processor API is now in 9.2, but it solves only the "different-row-storage" problem, not the "one-row-at-a-time" problem, as libpq is still in control until all rows are received.

This means libpq still cannot be used to implement the iterative result processing that almost all high-level languages are using.

We discussed a potential API for fetching one row at a time, but did not reach a conclusion. The basic arguments were:

1) Tom: PQisBusy must keep its current behaviour. Thus PQgetResult() must also keep its current behaviour:
* PQisBusy() -> 0: need to call PQgetResult(), which returns a PGresult
* PQisBusy() -> 1: need to call PQconsumeInput()
* PQisBusy() must be callable several times in a row, thus be stateless from the client's POV.

2) Me: behaviour must not be controlled by a callback, but by client code that uses PQgetResult() + PQisBusy().

Now, looking at the problem with some perspective, the solution is obvious: when in single-row mode, PQgetResult() must return a proper PGresult for that single row. And everything else follows from that.

Such an API is implemented in the attached patch:

* PQsetSingleRowMode(conn): sets single-row mode.

* PQisBusy(): stops after each row in single-row mode, sets PGASYNC_ROW_READY. Thus keeping the property of being repeatedly callable.

* PQgetResult(): returns a copy of the row if PGASYNC_ROW_READY. Sets the row's resultStatus to PGRES_SINGLE_TUPLE. This needs to be different from PGRES_TUPLES_OK to detect resultset end.

* PQgetRowData(): can be called instead of PQgetResult() to get raw row data in the buffer, for more efficient processing. This is an optional feature that provides the original row-callback promise of avoiding unnecessary row data copy.

* Although PQgetRowData() makes the callback API unnecessary, it is still fully compatible with it - the callback should not see any difference whether the resultset is processed in single-row mode or the old single-PGresult mode. Unless it wants to - it can check PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE.
There is some duplicate code here that can be refactored (callback exec), but I did not do it yet to avoid affecting existing code too much.

--
marko

PS. If I squint, it seems like a fix of the existing API instead of a new feature, so perhaps it can still fit into 9.2?

commit 4114613 (HEAD, single-row)
Author: Marko Kreen
Date: Sat Apr 7 15:05:01 2012 +0300

    Single-row based processing

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 5c5dd68..0ea2c1f 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -4018,6 +4018,75 @@ PGresult *PQgetResult(PGconn *conn);

+ PQsetSingleRowMode
+
+ Instead of buffering all rows in PGresult until the full resultset has
+ arrived, this changes resultset processing to return rows as soon as
+ they arrive from the network.
+
+ int PQsetSingleRowMode(PGconn *conn);
+
+ The mode can be changed directly after a PQsendQuery,
+ PQsendQueryParams, or PQsendQueryPrepared call, and before any result
+ rows have arrived from the network. Then this function changes the
+ mode and returns 1. Otherwise the mode stays unchanged and this
+ function returns 0.
+
+ The rows returned have a PQresultStatus() of PGRES_SINGLE_TUPLE.
+ There will be a final PGresult that has either PGRES_TUPLES_OK or
+ PGRES_FATAL_ERROR result status. In case of error status, the actual
+ query failed in the middle and the received rows should be dropped.
+
+ PQgetRowData
+
+ In single-row mode it is possible to get row data directly, without
+ constructing a PGresult for each row.
+
+ int PQgetRowData(PGconn *conn, PGresult **hdr, PGdataValue **columns);
+
+ It can be called everywhere PQgetResult can. It returns 1 and fills
+ the pointers if there is row data available. It returns 0 otherwise;
+ then PQgetResult should be called to get the final status.
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1251455..a228a71 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -163,3 +163,5 @@ PQlibVersion 160
 PQsetRowProcessor 161
 PQgetRowProcessor 162
 PQskipResult 163
+PQsetSingleRowMode 164
+PQgetRowData 165

diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index badc0b3..ba9215b 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -1344,6 +1344,9 @@ PQsendQueryStart(PGconn *conn)
 	/* initialize async result-accumulation state */
 	conn->result = NULL;
+	/* reset single-row process