Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-08-02 Thread Marko Kreen
On Thu, Aug 2, 2012 at 8:19 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Marko Kreen mark...@gmail.com writes:
 On Wed, Aug 1, 2012 at 6:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 So I'm working from the first set of patches in your message
 20120721194907.ga28...@gmail.com.

 Great!

 Here's an updated draft patch.  I've not looked at the docs yet so
 this doesn't include that, but I'm reasonably happy with the code
 changes now.  The main difference from what you had is that I pushed
 the creation of the single-row PGresults into pqRowProcessor, so that
 they're formed immediately while scanning the input message.  That
 other method with postponing examination of the rowBuf does not work,
 any more than it did with the original patch, because you can't assume
 that the network input buffer is going to hold still.  For example,
 calling PQconsumeInput after parseInput has absorbed a row-data message
 but before calling PQgetResult would likely break things.

 In principle I suppose we could hack PQconsumeInput enough that it
 didn't damage the row buffer while still meeting its API contract of
 clearing the read-ready condition on the socket; but it wouldn't be
 simple or cheap to do so.

Any code using single-row mode is new.  Any code calling PQconsumeInput
before fully draining rows from the buffer is buggy, as it allows unlimited growth
of the network buffer, which defeats the whole reason to use single-row mode.

It was indeed a bug in the previous patch that it did not handle this situation,
but IMHO it should be handled with an error, not by allowing such code
to work.

Previously, whatever sequence the fetch functions were called in, the app's
maximum memory usage was either 1x the resultset size, or at most 2x the
resultset size if it messed up the sequence somehow.  But no more than that.
So if the app knew the resultset was big, it needed to split it up.

Now with single-row mode, apps can assume their maximum memory usage
is 1x row size + network buffer; let's simplify that to 2x row size.
But no more than that.  And now they can process huge resultsets knowing
their memory usage will stay within 2x row size.

And now libpq allows such apps to go from 2x row size to full resultset
size with unfortunate coding.  Why?


I have opinions about the reorg details too, but those are mostly a matter of taste,
so rather unimportant compared to the question of whether we should allow
code to break the guarantees that single-row mode gives.

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-08-02 Thread Tom Lane
Marko Kreen mark...@gmail.com writes:
 On Thu, Aug 2, 2012 at 8:19 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 In principle I suppose we could hack PQconsumeInput enough that it
 didn't damage the row buffer while still meeting its API contract of
 clearing the read-ready condition on the socket; but it wouldn't be
 simple or cheap to do so.

 Any code using single-row mode is new.  Any code calling PQconsumeInput
 before fully draining rows from the buffer is buggy, as it allows unlimited growth
 of the network buffer, which defeats the whole reason to use single-row mode.

Sorry, that argument will not fly.  The API specification for
PQconsumeInput is that it can be called at any time to drain the kernel
input buffer, without changing the logical state of the PGconn.  It's
not acceptable to insert a parenthetical "oh, but you'd better not try
it in single-row mode" caveat.

The reason I find this of importance is that PQconsumeInput is meant to
be used in an application's main event loop for the sole purpose of
clearing the read-ready condition on the connection's socket, so that
it can wait for other conditions.  This could very well be decoupled
from the logic that is involved in testing PQisBusy and
fetching/processing actual results.  (If we had not intended to allow
those activities to be decoupled, we'd not have separated PQconsumeInput
and PQisBusy in the first place.)  Introducing a dependency between
these things could lead to non-reproducible, timing-dependent, very
hard-to-find bugs.

By the same token, if somebody were trying to put single-row-mode
support into an existing async application, they'd be fooling with the
parts that issue new queries and collect results, but probably not with
the main event loop.  Thus, I do not believe your argument above that
"any code using single-row mode is new".  The whole app is not going to
be new.  It could be very hard to guarantee that there is no path of
control that invokes PQconsumeInput before the new data is actually
processed, because there was never before a reason to avoid extra
PQconsumeInput calls.
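To illustrate the decoupling Tom describes, here is a minimal sketch (not from the patch; `service_connection` is an illustrative name and error handling is elided) of an event loop that clears the read-ready condition with PQconsumeInput and, separately, polls PQisBusy before fetching results:

```c
#include <sys/select.h>
#include <libpq-fe.h>

static void
service_connection(PGconn *conn)
{
	fd_set		readfds;
	int			sock = PQsocket(conn);

	FD_ZERO(&readfds);
	FD_SET(sock, &readfds);

	if (select(sock + 1, &readfds, NULL, NULL, NULL) > 0)
	{
		/* Clear the socket's read-ready condition; per the API
		 * contract this is safe to call at any time. */
		if (!PQconsumeInput(conn))
			return;				/* connection trouble */

		/* Independently, fetch results only when fully available. */
		while (!PQisBusy(conn))
		{
			PGresult   *res = PQgetResult(conn);

			if (res == NULL)
				break;			/* query complete */
			/* ... process res ... */
			PQclear(res);
		}
	}
}
```

The point of the sketch is that the PQconsumeInput call and the PQisBusy/PQgetResult loop could just as easily live in different parts of the application, which is why a state-dependent restriction on PQconsumeInput would be hazardous.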

As I said, this is the exact same objection I had to the original scheme
of exposing the row buffer directly.  Putting a libpq function in front
of the row buffer doesn't solve the problem if that function is
implemented in a way that's still vulnerable to misuse or race conditions.

 And now libpq allows such apps to go from 2x row size to full resultset
 size with unfortunate coding.  Why?

This argument is completely nonsensical.  The contract for
PQconsumeInput has always been that it consumes whatever the kernel has
available.  If you don't extract data from the library before calling it
again, the library's internal buffer may grow to more than the minimum
workable size, but that's a tradeoff you may be willing to make to
simplify your application logic.  I do not think that it's an
improvement to change the API spec to say either that you can't call
PQconsumeInput in certain states, or that you can but it won't absorb
data from the kernel.

regards, tom lane



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-08-02 Thread Marko Kreen
On Fri, Aug 3, 2012 at 12:01 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Marko Kreen mark...@gmail.com writes:
 On Thu, Aug 2, 2012 at 8:19 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 In principle I suppose we could hack PQconsumeInput enough that it
 didn't damage the row buffer while still meeting its API contract of
 clearing the read-ready condition on the socket; but it wouldn't be
 simple or cheap to do so.

 Any code using single-row mode is new.  Any code calling PQconsumeInput
 before fully draining rows from the buffer is buggy, as it allows unlimited
 growth of the network buffer, which defeats the whole reason to use single-row mode.

 Sorry, that argument will not fly.  The API specification for
 PQconsumeInput is that it can be called at any time to drain the kernel
 input buffer, without changing the logical state of the PGconn.  It's
 not acceptable to insert a parenthetical "oh, but you'd better not try
 it in single-row mode" caveat.

Well, if the old API contract must be kept, then so be it.

 The reason I find this of importance is that PQconsumeInput is meant to
 be used in an application's main event loop for the sole purpose of
 clearing the read-ready condition on the connection's socket, so that
 it can wait for other conditions.  This could very well be decoupled
 from the logic that is involved in testing PQisBusy and
 fetching/processing actual results.  (If we had not intended to allow
 those activities to be decoupled, we'd not have separated PQconsumeInput
 and PQisBusy in the first place.)  Introducing a dependency between
 these things could lead to non-reproducible, timing-dependent, very
 hard-to-find bugs.

 By the same token, if somebody were trying to put single-row-mode
 support into an existing async application, they'd be fooling with the
 parts that issue new queries and collect results, but probably not with
 the main event loop.  Thus, I do not believe your argument above that
 "any code using single-row mode is new".  The whole app is not going to
 be new.  It could be very hard to guarantee that there is no path of
 control that invokes PQconsumeInput before the new data is actually
 processed, because there was never before a reason to avoid extra
 PQconsumeInput calls.

I've thought about this.  At first glance indeed, if an async app
has both a reader and a writer registered in the event loop, it might be
simpler to keep reading from the source socket even if the writer stalls.

But this is exactly the situation where an error from PQconsumeInput would help.
Imagine a fast source and a slow target (a common scenario - a database query
for each row).  Reading from the source *must* be stopped to get
predictable memory usage.

 As I said, this is the exact same objection I had to the original scheme
 of exposing the row buffer directly.  Putting a libpq function in front
 of the row buffer doesn't solve the problem if that function is
 implemented in a way that's still vulnerable to misuse or race conditions.

 And now libpq allows such apps to go from 2x row size to full resultset
 size with unfortunate coding.  Why?

 This argument is completely nonsensical.  The contract for
 PQconsumeInput has always been that it consumes whatever the kernel has
 available.  If you don't extract data from the library before calling it
 again, the library's internal buffer may grow to more than the minimum
 workable size, but that's a tradeoff you may be willing to make to
 simplify your application logic.  I do not think that it's an
 improvement to change the API spec to say either that you can't call
 PQconsumeInput in certain states, or that you can but it won't absorb
 data from the kernel.

The old patch had a nice property: we could replace PQgetResult()
with something else.  To allow that and also PQconsumeInput(),
we could store offsets in rowBuf, and then skip the realign in PQconsumeInput().

Not sure if the replacement reason is worth keeping, but the
offsets may be useful even now, as they might give an additional
speedup by decreasing the per-column storage
from 16 to 8 bytes on 64-bit architectures.

May be better left for 9.3 though...

-- 
marko



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-08-01 Thread Tom Lane
[ getting back to this now after assorted distractions ]

Marko Kreen mark...@gmail.com writes:
 Just to show agreement: both PQgetRowData() and optimized PGresult
 do not belong to 9.2.

OK, we're all on board with leaving those for later.

 Only open questions are:

 * Is there better API than PQsetSingleRowMode()?  New PQsend*
   functions is my alternative.

After thinking it over, I'm really unexcited about adding new versions
of all the PQsend functions.  If we had the prospect of more flags in
future that could be added to a bitmask flags argument, it would be
more palatable, but it's far from clear what those might be.  So I'm
now leaning towards using PQsetSingleRowMode as-is.
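For reference, the usage pattern under discussion looks roughly like this: a sketch against the API as proposed, where PQsetSingleRowMode is called right after dispatch and each row then arrives as its own one-row PGresult with status PGRES_SINGLE_TUPLE (`stream_query` is an illustrative name, and real code would check column counts before PQgetvalue):

```c
#include <stdio.h>
#include <libpq-fe.h>

static int
stream_query(PGconn *conn, const char *sql)
{
	PGresult   *res;

	if (!PQsendQuery(conn, sql))
		return -1;
	/* Must be requested before any rows have been returned. */
	if (!PQsetSingleRowMode(conn))
		return -1;

	while ((res = PQgetResult(conn)) != NULL)
	{
		if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
		{
			/* exactly one row in this result */
			printf("%s\n", PQgetvalue(res, 0, 0));
		}
		/* the final result is a zero-row PGRES_TUPLES_OK (or an error) */
		PQclear(res);
	}
	return 0;
}
```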

 * Should we rollback rowBuf change? I think no, as per benchmark
   it performs better than old code.

I had already pretty much come to that conclusion just based on code
review, without thinking about performance.  In particular, we had done
some nontrivial work towards improving error-case handling in the data
message parsing code, and I don't really want to give that up, nor
rewrite it on the fly now.  About the only reason I could see for
reverting rowBuf was that I thought it might hurt performance; so now
that you've proven the opposite, we should leave it alone.

So I'm working from the first set of patches in your message
20120721194907.ga28...@gmail.com.

regards, tom lane



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-08-01 Thread Marko Kreen
On Wed, Aug 1, 2012 at 6:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Marko Kreen mark...@gmail.com writes:
 * Is there better API than PQsetSingleRowMode()?  New PQsend*
   functions is my alternative.

 After thinking it over, I'm really unexcited about adding new versions
 of all the PQsend functions.  If we had the prospect of more flags in
 future that could be added to a bitmask flags argument, it would be
 more palatable, but it's far from clear what those might be.  So I'm
 now leaning towards using PQsetSingleRowMode as-is.

I can imagine such a flag - I'd like to have a flag that allows sending more
queries to the server without waiting for the old ones to finish.

Currently, when a batch-processing app wants to keep the
backend busy, it needs to stuff many statements into a
single query.  Which is ugly.  Among other things it
loses the possibility to separate arguments from the query,
and it breaks error reporting.  The flag would fix this,
and it would also be useful in other scenarios.

Interestingly, although it affects a different area, it would also
be a flag for PQsend* and not for PQexec*.  So maybe
the direction is not completely wrong.

But I cannot imagine why PQsetSingleRowMode would be
hard to keep working even if we later add PQsend functions with flags,
so I'm not worried about getting it exactly right from the start.

 * Should we rollback rowBuf change? I think no, as per benchmark
   it performs better than old code.

 I had already pretty much come to that conclusion just based on code
 review, without thinking about performance.  In particular, we had done
 some nontrivial work towards improving error-case handling in the data
 message parsing code, and I don't really want to give that up, nor
 rewrite it on the fly now.  About the only reason I could see for
 reverting rowBuf was that I thought it might hurt performance; so now
 that you've proven the opposite, we should leave it alone.

An additional argument for rowBuf is Merlin's wish for weirdly optimized
PGresults.  Basically, with rowBuf anybody who wants to experiment
with different ways of processing row data just needs to implement a
single function that processes the data and then advances the state
machine.  Like I did with PQgetRowData().

Without it, quite a lot of code needs to be patched.

 So I'm working from the first set of patches in your message
 20120721194907.ga28...@gmail.com.

Great!

-- 
marko



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-08-01 Thread Tom Lane
Marko Kreen mark...@gmail.com writes:
 On Wed, Aug 1, 2012 at 6:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 So I'm working from the first set of patches in your message
 20120721194907.ga28...@gmail.com.

 Great!

Here's an updated draft patch.  I've not looked at the docs yet so
this doesn't include that, but I'm reasonably happy with the code
changes now.  The main difference from what you had is that I pushed
the creation of the single-row PGresults into pqRowProcessor, so that
they're formed immediately while scanning the input message.  That
other method with postponing examination of the rowBuf does not work,
any more than it did with the original patch, because you can't assume
that the network input buffer is going to hold still.  For example,
calling PQconsumeInput after parseInput has absorbed a row-data message
but before calling PQgetResult would likely break things.

In principle I suppose we could hack PQconsumeInput enough that it
didn't damage the row buffer while still meeting its API contract of
clearing the read-ready condition on the socket; but it wouldn't be
simple or cheap to do so.

regards, tom lane

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 1e62d8091a9d2bdf60af6745d5a01ee14ee5cf5a..7cf86176c2385b9e4ee37c72d7e3c662ea079f7a 100644
*** a/contrib/dblink/dblink.c
--- b/contrib/dblink/dblink.c
*** typedef struct storeInfo
*** 70,75 
--- 70,78 
  	AttInMetadata *attinmeta;
  	MemoryContext tmpcontext;
  	char	  **cstrs;
+ 	/* temp storage for results to avoid leaks on exception */
+ 	PGresult   *last_res;
+ 	PGresult   *cur_res;
  } storeInfo;
  
  /*
*** static void materializeQueryResult(Funct
*** 83,90 
  	   const char *conname,
  	   const char *sql,
  	   bool fail);
! static int storeHandler(PGresult *res, const PGdataValue *columns,
! 			 const char **errmsgp, void *param);
  static remoteConn *getConnectionByName(const char *name);
  static HTAB *createConnHash(void);
  static void createNewConnection(const char *name, remoteConn *rconn);
--- 86,93 
  	   const char *conname,
  	   const char *sql,
  	   bool fail);
! static PGresult *storeQueryResult(storeInfo *sinfo, PGconn *conn, const char *sql);
! static void storeRow(storeInfo *sinfo, PGresult *res, bool first);
  static remoteConn *getConnectionByName(const char *name);
  static HTAB *createConnHash(void);
  static void createNewConnection(const char *name, remoteConn *rconn);
*** materializeResult(FunctionCallInfo fcinf
*** 927,934 
  /*
   * Execute the given SQL command and store its results into a tuplestore
   * to be returned as the result of the current function.
   * This is equivalent to PQexec followed by materializeResult, but we make
!  * use of libpq's row processor API to reduce per-row overhead.
   */
  static void
  materializeQueryResult(FunctionCallInfo fcinfo,
--- 930,939 
  /*
   * Execute the given SQL command and store its results into a tuplestore
   * to be returned as the result of the current function.
+  *
   * This is equivalent to PQexec followed by materializeResult, but we make
!  * use of libpq's single-row mode to avoid accumulating the whole result
!  * inside libpq before it gets transferred to the tuplestore.
   */
  static void
  materializeQueryResult(FunctionCallInfo fcinfo,
*** materializeQueryResult(FunctionCallInfo 
*** 944,962 
  	/* prepTuplestoreResult must have been called previously */
  	Assert(rsinfo->returnMode == SFRM_Materialize);
  
  	PG_TRY();
  	{
! 		/* initialize storeInfo to empty */
! 		memset(&sinfo, 0, sizeof(sinfo));
! 		sinfo.fcinfo = fcinfo;
! 
! 		/* We'll collect tuples using storeHandler */
! 		PQsetRowProcessor(conn, storeHandler, &sinfo);
! 
! 		res = PQexec(conn, sql);
! 
! 		/* We don't keep the custom row processor installed permanently */
! 		PQsetRowProcessor(conn, NULL, NULL);
  
  		if (!res ||
  			(PQresultStatus(res) != PGRES_COMMAND_OK &&
--- 949,962 
  	/* prepTuplestoreResult must have been called previously */
  	Assert(rsinfo->returnMode == SFRM_Materialize);
  
+ 	/* initialize storeInfo to empty */
+ 	memset(&sinfo, 0, sizeof(sinfo));
+ 	sinfo.fcinfo = fcinfo;
+ 
  	PG_TRY();
  	{
! 		/* execute query, collecting any tuples into the tuplestore */
! 		res = storeQueryResult(&sinfo, conn, sql);
  
  		if (!res ||
  			(PQresultStatus(res) != PGRES_COMMAND_OK &&
*** materializeQueryResult(FunctionCallInfo 
*** 975,982 
  		else if (PQresultStatus(res) == PGRES_COMMAND_OK)
  		{
  			/*
! 			 * storeHandler didn't get called, so we need to convert the
! 			 * command status string to a tuple manually
  			 */
  			TupleDesc	tupdesc;
  			AttInMetadata *attinmeta;
--- 975,982 
  		else if (PQresultStatus(res) == PGRES_COMMAND_OK)
  		{
  			/*
! 			 * storeRow didn't get called, so we need to convert the command
! 			 * status string to a tuple manually
  			 */
  			TupleDesc	

Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-31 Thread Merlin Moncure
On Mon, Jul 30, 2012 at 10:26 PM, Jan Wieck janwi...@yahoo.com wrote:
 On 7/30/2012 10:31 PM, Leon Smith wrote:

 This is not necessarily true,  on multiple levels.   I mean,  some of
 the programs I write are highly concurrent,  and this form of batching
 would have almost no risk of stalling the network buffer.And
 the possible use case would be when you are dealing with very small
 rows,  when there would typically be several rows inside a single
 network packet or network buffer.


 By "highly concurrent" do you mean multi-threaded? Like one thread reads the
 rows in batches and pushes them into a queue while another thread processes
 them from that queue?

 If that is the case, then you just added a useless layer of buffering and
 the need for thread-to-thread context switches on top of PQsetSingleRowMode.
 Libpq's "receiver thread" is the kernel itself. Libpq already tries to never
 read partial kernel buffers. It always makes sure that there are at least 8K of
 free space in the inBuffer. In the case you describe above, where several
 rows fit into a single packet, libpq will receive them with a single
 read(2) system call; the application can then get them as fast as possible,
 without causing any further context switches, because they are already in the
 inBuffer.

Yeah: with asynchronous query processing the query gets sent and
control returns immediately to your code: that's the whole point.
Even if some data races to the network buffer, libpq doesn't 'see' any
data until you tell it to by asking for a result (which can block) or
draining the buffers with PQconsumeInput.  So there is no race in the
traditional sense and I'm ok with the PQsetSingleRowMode as such.

Removing malloc/free on row iteration seems only to be possible via
one of two methods: either a) you introduce a non-PGresult based
method of data extraction or b) you preserve the PGresult across row
iterations.

merlin



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-30 Thread Leon Smith
Hey, this thread was pointed out to me just a few days ago, but I'll start
by saying that I think it is on exactly the right track.  I don't
like the callback API, and think that PQsetSingleRowMode should be offered
in place of it.  But I do have one question.

On Sat, Jun 16, 2012 at 10:22 AM, Marko Kreen mark...@gmail.com wrote:

 The function can be called only after PQsend* and before any
 rows have arrived.  This guarantees there will be no surprises
 to PQexec* users who expect full resultset at once.


Ok, I'm guessing you mean before you call PQgetResult or
PQgetRowData, or maybe before a call to PQgetResult or PQgetRowData
returns a result or partial result.  Because it would be a race
condition if you meant exactly what you said.  (Though I don't understand
how this could possibly be implemented without some source of concurrency,
which libpq doesn't do.)  Maybe this is a little overly pedantic, but I
do want to confirm the intention here.

One other possibility: Tom Lane fretted ever so slightly about the use of
malloc/free per row... what about, instead of PQsetSingleRowMode, a
PQsetChunkedRowMode that takes a chunkSize parameter?  A chunkSize of 0
would be equivalent to what we have today, a chunkSize of 1 means you get
what you have from PQsetSingleRowMode, and larger chunkSizes would wait
until n rows have been received before returning them all in a single
result.  I don't know that this suggestion is all that important, but
it seems like an obvious generalization that might possibly be useful.
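Since no PQsetChunkedRowMode exists, the batching described above could also be emulated in application code on top of single-row mode. A rough sketch, assuming single-row mode is already set on the connection; `process_batch` is a hypothetical application hook, not a libpq function:

```c
#include <libpq-fe.h>

static void
process_in_chunks(PGconn *conn)
{
	enum { CHUNK = 64 };		/* application-chosen batch size */
	PGresult   *rows[CHUNK];
	int			n = 0;
	PGresult   *res;

	while ((res = PQgetResult(conn)) != NULL)
	{
		if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
		{
			rows[n++] = res;
			if (n == CHUNK)
			{
				/* process_batch(rows, n);  -- hypothetical hook */
				while (n > 0)
					PQclear(rows[--n]);
			}
		}
		else
			PQclear(res);		/* final PGRES_TUPLES_OK (or error) result */
	}

	/* flush the final, possibly partial chunk */
	if (n > 0)
	{
		/* process_batch(rows, n); */
		while (n > 0)
			PQclear(rows[--n]);
	}
}
```

This gets the batching effect without a new libpq entry point, at the cost of holding CHUNK one-row PGresults at a time.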

Best,
Leon


Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-30 Thread Jan Wieck

On 7/30/2012 8:11 PM, Leon Smith wrote:

One other possibility,  Tom Lane fretted ever so slightly about the use
of malloc/free per row... what about instead of PQsetSingleRowMode,  you
have PQsetChunkedRowMode that takes a chunkSize parameter.   A chunkSize
= 0 would be equivalent to what we have today,   a chunkSize of 1 means
you get what you have from PQsetSingleRowMode,  and larger chunkSizes
would wait until n rows have been received before returning them all in
a single result.  I don't know that this suggestion is all that
important, but it seems like an obvious generalization that might
possibly be useful.


It is questionable whether that actually adds any useful functionality. Any 
collecting of multiple rows only runs the risk of stalling the reception of 
the following rows while processing the current batch. Processing each row as 
soon as it is available ensures making the most use of the network buffers.


Collecting multiple rows, like the FETCH command for cursors does, 
makes sense when each batch introduces a network round trip, as FETCH does. 
But does it make any sense for a true streaming mode, like what is 
discussed here?



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-30 Thread Leon Smith
On Mon, Jul 30, 2012 at 9:59 PM, Jan Wieck janwi...@yahoo.com wrote:

 On 7/30/2012 8:11 PM, Leon Smith wrote:

 One other possibility,  Tom Lane fretted ever so slightly about the use
 of malloc/free per row... what about instead of PQsetSingleRowMode,  you
 have PQsetChunkedRowMode that takes a chunkSize parameter.   A chunkSize
 = 0 would be equivalent to what we have today,   a chunkSize of 1 means
 you get what you have from PQsetSingleRowMode,  and larger chunkSizes
 would wait until n rows have been received before returning them all in
 a single result.  I don't know that this suggestion is all that
 important, but it seems like an obvious generalization that might
 possibly be useful.


 It is questionable if that actually adds any useful functionality.


This is true, I'm not sure my suggestion is necessarily useful.   I'm just
throwing it out there.


 Any collecting of multiple rows only runs the risk of stalling the
 reception of the following rows while processing the batch. Processing each
 row as soon as it is available ensures making the most use of the network buffers.


This is not necessarily true, on multiple levels.  I mean, some of the
programs I write are highly concurrent, and this form of batching would
have almost no risk of stalling the network buffer.  And the possible use
case would be when you are dealing with very small rows, where there would
typically be several rows inside a single network packet or network buffer.



 Collecting multiple rows, like in the FETCH command for cursors does,
 makes sense when each batch introduces a network round trip, like for the
 FETCH command. But does it make any sense for a true streaming mode, like
 what is discussed here?


Maybe?  I mean, I anticipate that there are (probably) still use cases
for FETCH, even when the row-at-a-time interface is a viable option and
the transport between postgres and the client has reasonable flow control.

Leon


Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-30 Thread Jan Wieck

On 7/30/2012 10:31 PM, Leon Smith wrote:

This is not necessarily true,  on multiple levels.   I mean,  some of
the programs I write are highly concurrent,  and this form of batching
would have almost no risk of stalling the network buffer.And
the possible use case would be when you are dealing with very small
rows,  when there would typically be several rows inside a single
network packet or network buffer.


By "highly concurrent" do you mean multi-threaded? Like one thread reads 
the rows in batches and pushes them into a queue while another thread 
processes them from that queue?


If that is the case, then you just added a useless layer of buffering 
and the need for thread-to-thread context switches on top of 
PQsetSingleRowMode. Libpq's "receiver thread" is the kernel itself. 
Libpq already tries to never read partial kernel buffers. It always 
makes sure that there are at least 8K of free space in the inBuffer. 
In the case you describe above, where several rows fit into a single 
packet, libpq will receive them with a single read(2) system call; the 
application can then get them as fast as possible, without causing any 
further context switches, because they are already in the inBuffer.


I've written that sort of code myself in the past. Look at the Slony 
worker thread prior to 2.2. We switched to the COPY protocol instead of 
waiting for single-row mode and already got rid of all that extra 
buffering (and then some more).



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-24 Thread Merlin Moncure
On Mon, Jul 23, 2012 at 5:07 PM, Marko Kreen mark...@gmail.com wrote:
 Does PQgetRowData() break the ability to call PQgetvalue() against the
 result as well as other functions like PQgetisnull()?  If so, I
 object.

 I don't get what you are objecting to.  PQgetRowData()
 is called instead of PQgetResult(); it returns a zero-row PGresult
 for the row header and a list of PGdataValues that point to the actual
 data.  You call the value functions, they don't crash, but as
 the result has zero rows (the data is not copied into it)
 they don't do anything useful either.

 The whole point of the API is to avoid the unnecessary copying.

 You can mix calls to PQgetRowData() and PQgetResult()
 during one resultset; it works fine, although it does not seem
 that useful.

 I guess you fear that some code could unexpectedly see the
 new behavior, and that's exactly why the row-callback API
 needs to be scrapped - it allowed such a possibility.

 But with PQsetSingleRowMode and PQgetRowData, the
 new behavior is seen only by new code that clearly
 expects it, so I don't see what the problem is.

Well, for one, it breaks libpqtypes (see line 171 @
http://libpqtypes.esilo.com/browse_source.html?file=exec.c), or at
least makes it unable to use the row-processing mode.  libpqtypes
wraps the libpq getter functions and exposes a richer API using only
the result object.  When writing libpqtypes we went through great
pains to keep access to server-side data through the existing result
API.  Note, I'm not arguing that compatibility with libpqtypes is a
desired objective, but the wrapping model it uses is pretty
reasonably going to be adopted by other code -- the awkwardness there
should be a red flag of sorts.

I'm arguing that *all* data access must continue to go through the
result object; bypassing the result to get at the data breaks the
result abstraction in the libpq API.  I bet you can still maintain
data access through the result object while avoiding extra copies.  For
example, maybe PQsetSingleRowMode should return a result object?

merlin



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-24 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes:
 I'm arguing that *all* data getting must continue to do so through the
 result object, and bypassing the result to get at data is breaking the
 result abstraction in the libpq api.

That's a fair point, but the single-row mode without PQgetRowData still
fits that model, doesn't it?  From the point of view of libpqtypes it
just looks like you got a lot of one-row query results.

regards, tom lane



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-24 Thread Merlin Moncure
On Tue, Jul 24, 2012 at 10:49 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Merlin Moncure mmonc...@gmail.com writes:
 I'm arguing that *all* data getting must continue to do so through the
 result object, and bypassing the result to get at data is breaking the
 result abstraction in the libpq api.

 That's a fair point, but the single-row mode without PQgetRowData still
 fits that model, doesn't it?  From the point of view of libpqtypes it
 just looks like you got a lot of one-row query results.

Sure: Marko's exec_query_single_row example looks like 100% reasonable
libpq code.  That said, I'd still spend a few cycles to think this
through and make sure we aren't walling ourselves off from 'copy free'
behavior, even if that's reserved for a future improvement.  In
particular, I'd like to explore if PQsetSingleRowMode() should be
returning a result.

merlin



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-24 Thread Marko Kreen
On Tue, Jul 24, 2012 at 6:18 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Mon, Jul 23, 2012 at 5:07 PM, Marko Kreen mark...@gmail.com wrote:
 Does PQgetRowData() break the ability to call PQgetvalue() against the
 result as well as other functions like PQgetisnull()?  If so, I
 object.

 I don't get what you are objecting to.  The PQgetRowData()
 is called instead of PQgetResult() and it returns zero-row PGresult
 for row header and list of PGdataValues that point to actual
 data. You call the value functions, they don't crash, but as
 the result has zero rows (the data is not copied to it)
 they don't do anything useful either.

 The whole point of the API is to avoid the unnecessary copying.

 You can mix the calls to PQgetRowData() and PQgetResult()
 during one resultset, it works fine although does not seem
 that useful.

 I guess you fear that some code can unexpectedly see
 new behavior, and that's exactly why the row-callback API
 needs to be scrapped - it allowed such a possibility.

 But with PQsetSingleRowMode and PQgetRowData, the
 new behavior is seen only by new code that clearly
 expects it, so I don't see what the problem is.

 Well, for one, it breaks libpqtypes (see line 171@
 http://libpqtypes.esilo.com/browse_source.html?file=exec.c), or at
 least makes it unable to use the row-processing mode.   libpqtypes
 wraps the libpq getter functions and exposes a richer api using only
 the result object.  When writing libpqtypes we went through great
 pains to keep access to server side data through the existing result
 API.  Note, I'm not arguing that compatibility with libpqtypes is a
 desired objective, but the wrapping model that it uses is reasonably
 likely to be used by other code -- the awkwardness there
 should be a red flag of sorts.

 I'm arguing that *all* data getting must continue to do so through the
 result object, and bypassing the result to get at data is breaking the
 result abstraction in the libpq api.  I bet you can still maintain
 data access through result object while avoiding extra copies.

Well, the main problem this week is whether we should
apply PQsetSingleRowMode() from single-row-mode1
or from single-row-mode2 branch?

The PQgetRowData() is unimportant as it just exposes
the rowBuf to the user and that's all.

Do you have opinion about that?

  For example, maybe PQsetSingleRowMode should return a result object?

What do you mean by that?  And have you thought about both
sync and async usage patterns?

-- 
marko



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-24 Thread Merlin Moncure
On Tue, Jul 24, 2012 at 11:08 AM, Marko Kreen mark...@gmail.com wrote:
 I'm arguing that *all* data getting must continue to do so through the
 result object, and bypassing the result to get at data is breaking the
 result abstraction in the libpq api.  I bet you can still maintain
 data access through result object while avoiding extra copies.

 Well, the main problem this week is whether we should
 apply PQsetSingleRowMode() from single-row-mode1
 or from single-row-mode2 branch?

 The PQgetRowData() is unimportant as it just exposes
 the rowBuf to the user and that's all.

right. branch 1 (containing PQgetRowData) seems wrong to me.  so, if
given that choice, I'd argue for branch 2, forcing a PGresult pull on
each row.  However, what you were gunning for via branch 1, namely the
extra performance from removing the extra allocs, is important and
useful; hopefully we can get the best of both worlds, or punt and
settle on branch 2.

  For example, maybe PQsetSingleRowMode should return a result object?

 What do you mean by that?  And have you thought about
 sync and async usage patterns?

No, I haven't -- at least not very well.  The basic idea is that
PQsetSingleRowMode turns into PQgetSingleRowResult() and returns a
result object.  For row by row an extra API call gets called (maybe
PQgetNextRow(PGconn, PGresult)) that does the behind the scenes work
under the existing result object.  This is the same general structure
you have in branch 2, but the result allocation is moved out of the
loop.  Presumably sync and async would then follow the same pattern,
but we'd still have to be able to guarantee non-blocking behavior in
the async api.
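A hedged sketch of what that proposal might look like; PQgetSingleRowResult
and PQgetNextRow are hypothetical names from this discussion, not existing
libpq functions:

```c
/* Hypothetical API sketch only: PQgetSingleRowResult() and PQgetNextRow()
 * do not exist in libpq.  The point is that the PGresult allocation
 * happens once, outside the row loop. */
#include <stdio.h>
#include <libpq-fe.h>

static void proposed_row_loop(PGconn *conn)
{
	/* would not block: switches mode and builds the reusable result */
	PGresult *res = PQgetSingleRowResult(conn);

	/* each call would refill the same PGresult with the next row */
	while (PQgetNextRow(conn, res))
		printf("%s\n", PQgetvalue(res, 0, 0));

	PQclear(res);
}
```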

merlin



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-24 Thread Marko Kreen
On Tue, Jul 24, 2012 at 7:08 PM, Marko Kreen mark...@gmail.com wrote:
 The PQgetRowData() is unimportant as it just exposes
 the rowBuf to the user and that's all.

So to get back to the more interesting PQgetRowData() discussion...

Did you notice that it's up to the app to decide whether it calls
PQgetResult() or PQgetRowData() after PQsetSingleRowMode()?
So there is no way for it to get into conflicts with anything.
If libpqtypes keeps using PQgetResult it keeps getting
PGresult.  No problem.

The PQgetRowData() is meant for use-cases where PGresult is *not* the
main data structure the app operates on.  If the app just does a dumb
copy out of the PGresult as soon as possible, it can move to PQgetRowData().

If the app wants to do complex operations with the PGresult, or just
store it, then it can keep doing so.  Nobody forces the use
of PQgetRowData().
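As a hedged sketch of that use-case (the PQgetRowData signature and the
PGdataValue fields follow the patch branch and libpq-int.h, so they may
differ):

```c
/* Sketch per the patch branch: each call yields a zero-row header
 * PGresult plus PGdataValue pointers into libpq's network buffer.
 * The pointed-to data is only valid until the next fetch, so it must
 * be consumed (or copied) immediately. */
#include <stdio.h>
#include <libpq-fe.h>
#include <internal/libpq-int.h>	/* PGdataValue, per the patch */

static void rowdata_loop(PGconn *conn)
{
	PGresult *hdr;
	PGdataValue *cols;
	int i;

	while (PQgetRowData(conn, &hdr, &cols))
	{
		for (i = 0; i < PQnfields(hdr); i++)
		{
			if (cols[i].len < 0)
				printf("\\N\t");	/* negative length marks NULL */
			else
				printf("%.*s\t", cols[i].len, cols[i].value);
		}
		printf("\n");
		PQclear(hdr);
	}
}
```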


Now, about the idea of a more optimized PGresult: one way of doing
it would be to create a zero-copy PGresult that points into the network
buffer like PQgetRowData() does.  But this breaks all expectations
about PGresult lifetime rules, so it seems like a bad idea.

The second way is to optimize the copying step: basically just
do one malloc and 2 or 3 memcpys, for the struct, headers
and data.  This leaves a standalone PGresult around that
behaves as expected.  But for apps that don't care about
PGresult it is still an unnecessary copy.  So even if we
optimize PGresult that way, it still seems worthwhile
to have PQgetRowData() around.


Hm, I think it's possible to rig the test to do a dummy
copy of the PGresult, to see what kind of speed is
achievable.  Will try.

-- 
marko



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-24 Thread Marko Kreen
On Tue, Jul 24, 2012 at 7:25 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Tue, Jul 24, 2012 at 11:08 AM, Marko Kreen mark...@gmail.com wrote:
 I'm arguing that *all* data getting must continue to do so through the
 result object, and bypassing the result to get at data is breaking the
 result abstraction in the libpq api.  I bet you can still maintain
 data access through result object while avoiding extra copies.

 Well, the main problem this week is whether we should
 apply PQsetSingleRowMode() from single-row-mode1
 or from single-row-mode2 branch?

 The PQgetRowData() is unimportant as it just exposes
 the rowBuf to the user and that's all.

 right. branch 1 (containing PQgetRowData) seems wrong to me.

Indeed, you are missing the fact that PQgetResult works the same
in both branches.

And please see the benchmark I posted.

Even better, run it yourself...

 What do you mean by that?  And have you thought about both
 sync and async usage patterns?

 No, I haven't -- at least not very well.  The basic idea is that
 PQsetSingleRowMode turns into PQgetSingleRowResult() and returns a
 result object.  For row by row an extra API call gets called (maybe
 PQgetNextRow(PGconn, PGresult)) that does the behind the scenes work
 under the existing result object.  This is the same general structure
 you have in branch 2, but the result allocation is moved out of the
 loop.  Presumably sync and async would then follow the same pattern,
 but we'd still have to be able to guarantee non-blocking behavior in
 the async api.

Well, as discussed previously it's worthwhile to keep standard functions
like PQisBusy() and PQgetResult() working sensibly in the new mode,
and currently they do so.

If you add new functions, you also need to define the behavior
when new and old ones are mixed, and it gets messy quickly.

-- 
marko



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-24 Thread Merlin Moncure
On Tue, Jul 24, 2012 at 11:39 AM, Marko Kreen mark...@gmail.com wrote:
 On Tue, Jul 24, 2012 at 7:25 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Tue, Jul 24, 2012 at 11:08 AM, Marko Kreen mark...@gmail.com wrote:
 I'm arguing that *all* data getting must continue to do so through the
 result object, and bypassing the result to get at data is breaking the
 result abstraction in the libpq api.  I bet you can still maintain
 data access through result object while avoiding extra copies.

 Well, the main problem this week is whether we should
 apply PQsetSingleRowMode() from single-row-mode1
 or from single-row-mode2 branch?

 The PQgetRowData() is unimportant as it just exposes
 the rowBuf to the user and that's all.

 right. branch 1 (containing PQgetRowData) seems wrong to me.

 Indeed, you are missing the fact that PQgetResult works the same
 in both branches.

 And please see the benchmark I posted.

 Even better, run it yourself...

 What do you mean by that?  And have you thought about both
 sync and async usage patterns?

 No, I haven't -- at least not very well.  The basic idea is that
 PQsetSingleRowMode turns into PQgetSingleRowResult() and returns a
 result object.  For row by row an extra API call gets called (maybe
 PQgetNextRow(PGconn, PGresult)) that does the behind the scenes work
 under the existing result object.  This is the same general structure
 you have in branch 2, but the result allocation is moved out of the
 loop.  Presumably sync and async would then follow the same pattern,
 but we'd still have to be able to guarantee non-blocking behavior in
 the async api.

 Well, as discussed previously it's worthwhile to keep standard functions
 like PQisBusy() and PQgetResult() working sensibly in the new mode,
 and currently they do so.

 If you add new functions, you also need to define the behavior
 when new and old ones are mixed, and it gets messy quickly.

Right, I'm aware that you can use 'slow' method in branch 1.  I also
saw the benchmarks and agree that single row mode should be at least
competitive with current methods for pretty much all cases.

But, the faster rowbuf method is a generally incompatible way of
dealing with data vs current libpq -- this is bad.  If it's truly
impossible to get those benefits without bypassing the result API,
then I remove my objection on the grounds that it's optional behavior
(my gut tells me it is possible though).

I think the dummy copy of PGresult is plausible (if by that you mean
optimizing PQgetResult when in single row mode).  That would be even
better: you'd remove the need for the rowbuf mode.

merlin



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-24 Thread Marko Kreen
On Tue, Jul 24, 2012 at 7:52 PM, Merlin Moncure mmonc...@gmail.com wrote:
 But, the faster rowbuf method is a generally incompatible way of
 dealing with data vs current libpq -- this is bad.  If it's truly
 impossible to get those benefits without bypassing the result API,
 then I remove my objection on the grounds that it's optional behavior
 (my gut tells me it is possible though).

Um, please clarify what are you talking about here?

What is the incompatibility of PGresult from branch 1?

-- 
marko



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-24 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes:
 I think the dummy copy of PGresult is plausible (if by that you mean
 optimizing PQgetResult when in single row mode).  That would be even
 better: you'd remove the need for the rowbuf mode.

I haven't spent any time looking at this, but my gut tells me that a big
chunk of the expense is copying the PGresult's metadata (the column
names, types, etc).  It has to be possible to make that cheaper.

One idea is to rearrange the internal storage so that that part reduces
to one memcpy().  Another thought is to allow PGresults to share
metadata by treating the metadata as a separate reference-counted
object.  The second could be a bit hazardous though, as we advertise
that PGresults are independent objects that can be manipulated by
separate threads.  I don't want to introduce mutexes into PGresults,
but I'm not sure reference-counted metadata can be safe without them.
So maybe the memcpy idea is the only workable one.

regards, tom lane



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-24 Thread Merlin Moncure
On Tue, Jul 24, 2012 at 11:57 AM, Marko Kreen mark...@gmail.com wrote:
 On Tue, Jul 24, 2012 at 7:52 PM, Merlin Moncure mmonc...@gmail.com wrote:
 But, the faster rowbuf method is a generally incompatible way of
 dealing with data vs current libpq -- this is bad.  If it's truly
 impossible to get those benefits without bypassing the result API,
 then I remove my objection on the grounds that it's optional behavior
 (my gut tells me it is possible though).

 Um, please clarify what are you talking about here?

 What is the incompatibility of PGresult from branch 1?

Incompatibility in terms of usage -- we should be getting data with
PQgetvalue.  I think you're suspecting that I incorrectly believe you're
forced to use the rowbuf API -- I don't (although I wasn't clear on
that earlier).  Basically I'm saying that we should only buy into that
if all other alternative routes to getting the faster performance are
exhausted.

On Tue, Jul 24, 2012 at 11:59 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Merlin Moncure mmonc...@gmail.com writes:
 I think the dummy copy of PGresult is plausible (if by that you mean
 optimizing PQgetResult when in single row mode).  That would be even
 better: you'd remove the need for the rowbuf mode.

 I haven't spent any time looking at this, but my gut tells me that a big
 chunk of the expense is copying the PGresult's metadata (the column
 names, types, etc).  It has to be possible to make that cheaper.

 One idea is to rearrange the internal storage so that that part reduces
 to one memcpy().  Another thought is to allow PGresults to share
 metadata by treating the metadata as a separate reference-counted
 object.  The second could be a bit hazardous though, as we advertise
 that PGresults are independent objects that can be manipulated by
 separate threads.  I don't want to introduce mutexes into PGresults,
 but I'm not sure reference-counted metadata can be safe without them.
 So maybe the memcpy idea is the only workable one.

Yeah -- we had a very similar problem in libpqtypes and we solved it
exactly as you're thinking.  libpqtypes has to create a result with
each row iteration potentially (we expose rows and composites as on
the fly created result objects) and stores some extra non-trivial data
with the result.  We solved it with the optimized-memcpy method (look
here: http://libpqtypes.esilo.com/browse_source.html?file=libpqtypes.h
and you'll see all the important structs like PGtypeHandler are
somewhat haphazardly designed to be run through a memcpy).  We
couldn't do anything about internal libpq issues though, but some
micro optimization of PQsetResultAttrs (which is called via
PQcopyResult) might fit the bill.

The 'source' result (or source data that would be copied into the
destination result) would be stored in the PGconn, right? So, the idea
is that when you set up single row mode the connection generates a
template PGresult which is then copied out repeatedly during row-by-row
processing.  I like it, but only if we're reasonably confident the
PGresult can be sufficiently optimized like that.

merlin



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-24 Thread Merlin Moncure
On Tue, Jul 24, 2012 at 1:33 PM, Merlin Moncure mmonc...@gmail.com wrote:
 The 'source' result (or source data that would be copied into the
 destination result) would be stored in the PGconn, right? So, the idea
 is that when you set up single row mode the connection generates a
 template PGresult which is then copied out repeatedly during row-by-row
 processing.  I like it, but only if we're reasonably confident the
 PGresult can be sufficiently optimized like that.

hm, PGresAttDesc is unfortunately in the public header and as such
probably can't be changed?

merlin



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-24 Thread Merlin Moncure
On Tue, Jul 24, 2012 at 1:49 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Tue, Jul 24, 2012 at 1:33 PM, Merlin Moncure mmonc...@gmail.com wrote:
 The 'source' result (or source data that would be copied into the
 destination result) would be stored in the PGconn, right? So, the idea
 is that when you set up single row mode the connection generates a
 template PGresult which is then copied out repeatedly during row-by-row
 processing.  I like it, but only if we're reasonably confident the
 PGresult can be sufficiently optimized like that.

 hm, PGresAttDesc is unfortunately in the public header and as such
 probably can't be changed?

It can't -- at least not without breaking compatibility.  So, as
attractive as it sounds, it looks like a memcpy-based PGresult copy is
out unless we break the rules and change it anyway (with the probably
safe assumption that the only userland code that cares is libpqtypes,
which we'd certainly change).

However, it's not clear that a shared metadata implementation would
require a mutex -- if you stored the shared data in the conn and were
willing to walk the results in the event the PGconn was freed before
the results, you'd probably be ok.  That's really unpleasant though.

Either way, it looks like there are plausible paths to optimizing
repeated result fetch without having to expose an alternate data-fetching
API, and the proposed implementation doesn't appear to be a wall
in terms of getting there. So I'm firmly in the non-exposed-rowbuf
camp, even if we can't expose an optimal implementation in time for
9.2.

merlin



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-24 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes:
 Either way, it looks like there are plausible paths to optimizing
 repeated result fetch without having to expose an alternate data-fetching
 API, and the proposed implementation doesn't appear to be a wall
 in terms of getting there. So I'm firmly in the non-exposed-rowbuf
 camp, even if we can't expose an optimal implementation in time for
 9.2.

Yeah, the schedule argument is a strong one.  If we put in PQgetRowData
now, we'll be stuck with it forevermore.  It would be better to push
that part of the patch to 9.3 so we can have more time to think it over
and investigate alternatives.  The single-row mode is already enough to
solve the demonstrated client-side performance issues we know about.
So IMO it would be a lot smarter to be spending our time right now on
making sure we have *that* part of the patch right.

In particular I agree that PQsetSingleRowMode is a bit ugly, so I'm
wondering about your thoughts on providing PQgetSingleRowResult instead.
I don't see how to make that work in async mode.  I think the library
needs to be aware of whether it's supposed to return a single-row result
in advance of actually doing so --- for instance, how can PQisBusy
possibly give the correct answer if it doesn't know whether having
received the first row is sufficient?  (Yeah, I know we could invent
a separate flavor of PQisBusy for single-row usage, but ick.  There are
implementation problems too, such as possibly having already copied a
lot of rows into a work-in-progress PGresult.)

The thing I fundamentally don't like about PQsetSingleRowMode is that
there's such a narrow window of time to use it correctly -- as soon as
you've done even one PQconsumeInput, it's too late.  (Or maybe not, if
the server is slow, which makes it timing-dependent whether you'll
observe a bug.  Maybe we should add more internal state so that we can
consistently throw error if you've done any PQconsumeInput?)  The most
obvious alternative is to invent new versions of the PQsendXXX functions
with an additional flag parameter; but there are enough of them to make
that not very attractive either.

regards, tom lane



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-24 Thread Marko Kreen
 Hm, I think it's possible to rig the test to do a dummy
 copy of the PGresult, to see what kind of speed is
 achievable.  Will try.

I added a new method (-x) to rowdump where it asks for row
with PQgetRowData() and then tries to emulate super-efficient
PGresult copy, then loads data from that PGresult.

Quick reference:
rowdump1 - single-row-mode1 [~ libpq 9.2]
rowdump2 - single-row-mode2 [~ libpq 9.1]

-s - single row mode with PQgetResult()
-z - single row mode with PQgetRowData()
-x - simulated optimized PQgetResult()

----------------------------------------------------------------
QUERY: select 1,200,30,rpad('x',30,'z') from
generate_series(1,500)
./rowdump1 -s:   6.28   6.25   6.39  avg:  6.31 [ 100.00 % ]
./rowdump2 -s:   7.49   7.40   7.57  avg:  7.49 [ 118.71 % ]
./rowdump1 -z:   2.86   2.77   2.79  avg:  2.81 [ 44.50 % ]
./rowdump1 -x:   3.46   3.27   3.29  avg:  3.34 [ 52.96 % ]
QUERY: select
rpad('x',10,'z'),rpad('x',20,'z'),rpad('x',30,'z'),rpad('x',40,'z'),rpad('x',50,'z'),rpad('x',60,'z')
from generate_series(1,300)
./rowdump1 -s:   7.76   7.76   7.68  avg:  7.73 [ 100.00 % ]
./rowdump2 -s:   8.24   8.12   8.66  avg:  8.34 [ 107.85 % ]
./rowdump1 -z:   5.34   5.07   5.23  avg:  5.21 [ 67.41 % ]
./rowdump1 -x:   5.53   5.61   5.61  avg:  5.58 [ 72.20 % ]
QUERY: select
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100
from generate_series(1,80)
./rowdump1 -s:   7.49   7.66   7.59  avg:  7.58 [ 100.00 % ]
./rowdump2 -s:   7.56   8.12   7.95  avg:  7.88 [ 103.91 % ]
./rowdump1 -z:   2.77   2.76   2.76  avg:  2.76 [ 36.46 % ]
./rowdump1 -x:   3.07   3.05   3.18  avg:  3.10 [ 40.90 % ]
QUERY: select 1000,rpad('x', 400, 'z'),rpad('x', 4000, 'z') from
generate_series(1,10)
./rowdump1 -s:   2.66   2.62   2.67  avg:  2.65 [ 100.00 % ]
./rowdump2 -s:   3.11   3.14   3.11  avg:  3.12 [ 117.74 % ]
./rowdump1 -z:   2.49   2.46   2.47  avg:  2.47 [ 93.33 % ]
./rowdump1 -x:   2.59   2.57   2.57  avg:  2.58 [ 97.23 % ]
----------------------------------------------------------------

It shows that even if the actual fast row copy turns out slower
than this one here, it's still quite competitive with
PQgetRowData(), as long as it's not too slow.

So the optimized PQgetResult() may be good enough, and we
can drop the idea of PQgetRowData().

Code attached, also in https://github.com/markokr/pqtest repo.

-- 
marko


pg1 = /opt/apps/pgsql92mode1
pg2 = /opt/apps/pgsql92mode2

X1 = -DHAVE_ROWDATA -I$(pg1)/include/internal -I$(pg1)/include/server

CFLAGS = -O -g -Wall

all: rowdump1 rowdump2

rowdump1: rowdump.c
	$(CC) -I$(pg1)/include $(CFLAGS) -o $@ $< -L$(pg1)/lib \
		-Wl,-rpath=$(pg1)/lib -lpq $(X1)

rowdump2: rowdump.c
	$(CC) -I$(pg2)/include $(CFLAGS) -o $@ $< -L$(pg2)/lib \
		-Wl,-rpath=$(pg2)/lib -lpq

clean:
	rm -f rowdump1 rowdump2 time.tmp README.html

html: README.html

README.html: README.rst
	rst2html $< > $@


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <getopt.h>

#include <libpq-fe.h>

#ifdef HAVE_ROWDATA
#include <internal/libpq-int.h>
#endif

struct Context {
	PGconn *db;
	int count;

	char *buf;
	int buflen;
	int bufpos;
};

/* print error and exit */
static void die(PGconn *db, const char *msg)
{
	if (db)
		fprintf(stderr, "%s: %s\n", msg, PQerrorMessage(db));
	else
		fprintf(stderr, "%s\n", msg);
	exit(1);
}

/* write out buffer */
static void out_flush(struct Context *ctx)
{
	int out;
	if (!ctx->buf)
		return;

	out = write(1, ctx->buf, ctx->bufpos);
	if (out != ctx->bufpos)
		die(NULL, "failed to write file");
	ctx->bufpos = 0;
	ctx->buflen = 0;
	free(ctx->buf);
	ctx->buf = NULL;
}

/* add char to buffer */
static void out_char(struct Context *ctx, char c)
{
	if (ctx->bufpos + 1 > ctx->buflen) {
		if (!ctx->buf) {
			ctx->buflen = 16;
			ctx->buf = malloc(ctx->buflen);
			if (!ctx->buf)
				die(NULL, "failed to allocate buffer");
		} else {
			ctx->buflen *= 2;
			ctx->buf = realloc(ctx->buf, ctx->buflen);
			if (!ctx->buf)
				die(NULL, "failed to resize buffer");
		}
	}

	ctx->buf[ctx->bufpos++] = c;
}

/* quote string for copy */
static void proc_value(struct Context *ctx, const char *val, int vlen)
{
	int i;
	char c;

	for (i = 0; i < vlen; i++) {
		c = val[i];
		switch (c) {
		case '\\':
			out_char(ctx, '\\');
			out_char(ctx, '\\');
			break;
		case '\t':
			out_char(ctx, '\\');
			out_char(ctx, 't');
			break;
		case '\n':
			out_char(ctx, '\\');
			out_char(ctx, 'n');
			break;
		case '\r':
			out_char(ctx, '\\');
			out_char(ctx, 'r');
			break;
		default:
			out_char(ctx, c);
			break;
		}
	}
}

/* quote one row for copy from regular PGresult */
static void proc_row(struct Context *ctx, PGresult *res, int tup)
{
	int n = PQnfields(res);
	const char *val;
	int i, vlen;

	ctx->count++;

	for 

Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-24 Thread Merlin Moncure
On Tue, Jul 24, 2012 at 2:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Merlin Moncure mmonc...@gmail.com writes:
 Either way, it looks like there are plausible paths to optimizing
 repeated result fetch without having to expose an alternate data-fetching
 API, and the proposed implementation doesn't appear to be a wall
 in terms of getting there. So I'm firmly in the non-exposed-rowbuf
 camp, even if we can't expose an optimal implementation in time for
 9.2.

 Yeah, the schedule argument is a strong one.  If we put in PQgetRowData
 now, we'll be stuck with it forevermore.  It would be better to push
 that part of the patch to 9.3 so we can have more time to think it over
 and investigate alternatives.  The single-row mode is already enough to
 solve the demonstrated client-side performance issues we know about.
 So IMO it would be a lot smarter to be spending our time right now on
 making sure we have *that* part of the patch right.

 In particular I agree that PQsetSingleRowMode is a bit ugly, so I'm
 wondering about your thoughts on providing PQgetSingleRowResult instead.
 I don't see how to make that work in async mode.  I think the library
 needs to be aware of whether it's supposed to return a single-row result
 in advance of actually doing so --- for instance, how can PQisBusy
 possibly give the correct answer if it doesn't know whether having
 received the first row is sufficient?

Well (Marko probably wants to slap me with a halibut by now), the
original intent was for it not to have to: PQgetSingleRowResult is
more 'construct result for single row iteration', so it would never
block -- it only sets the internal single row mode and instantiates
the result object.  After that, you can do do standard
consumeinput/isbusy processing on the conn.  The actual iteration
routine would be like PQiterateResult which you could guard with
PQisBusy.  Like I said: it's the same general structure but the result
construction is moved out of the loop.

I don't think this breaks result scoping rules... the conn keeps a copy
of the result during 'result construction', which we are going to
define as ending when the query terminates.  Until the query resolves,
the result remains 'under construction' and avoids the expectation of
const-ness that you normally get, so you don't get to interleave
threads reading the result with threads iterating from the connection
(which is fine), and you have to avoid doing stupid things like closing
the connection before all the data has been read through.

(but maybe all this is moot per Marko's latest benchmarks)

merlin



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-24 Thread Marko Kreen
On Tue, Jul 24, 2012 at 11:34 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Tue, Jul 24, 2012 at 2:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 In particular I agree that PQsetSingleRowMode is a bit ugly, so I'm
 wondering about your thoughts on providing PQgetSingleRowResult instead.
 I don't see how to make that work in async mode.  I think the library
 needs to be aware of whether it's supposed to return a single-row result
 in advance of actually doing so --- for instance, how can PQisBusy
 possibly give the correct answer if it doesn't know whether having
 received the first row is sufficient?

 Well (Marko probably wants to slap me with a halibut by now), the
 original intent was for it not to have to: PQgetSingleRowResult is
 more 'construct result for single row iteration', so it would never
 block -- it only sets the internal single row mode and instantiates
 the result object.  After that, you can do standard
 consumeinput/isbusy processing on the conn.  The actual iteration
 routine would be like PQiterateResult which you could guard with
 PQisBusy.  Like I said: it's the same general structure but the result
 construction is moved out of the loop.

If you really don't like PQsetSingleRowMode, then I would prefer
new PQsend* functions to new fetch functions.

Because while send is a one-shot affair, receive is a complex state
machine which requires multiple function calls.
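To illustrate that state machine on the receive side, here is a standard
non-blocking consume loop using only long-existing libpq calls (error
handling trimmed to the minimum):

```c
/* Sketch of the async receive state machine: pump PQconsumeInput() and
 * PQisBusy() around PQgetResult(), driven by readiness of the socket
 * returned by PQsocket().  Assumes a query was already sent with
 * PQsendQuery() on a non-blocking connection. */
#include <stdio.h>
#include <sys/select.h>
#include <libpq-fe.h>

static void pump_results(PGconn *conn)
{
	PGresult *res;
	int sock = PQsocket(conn);
	fd_set rfds;

	for (;;)
	{
		/* wait until the socket is readable, then absorb available data */
		FD_ZERO(&rfds);
		FD_SET(sock, &rfds);
		if (select(sock + 1, &rfds, NULL, NULL, NULL) < 0)
			return;
		if (!PQconsumeInput(conn))
			return;			/* connection trouble */

		/* drain every result that can be returned without blocking */
		while (!PQisBusy(conn))
		{
			res = PQgetResult(conn);
			if (res == NULL)
				return;		/* query fully processed */
			printf("status: %s\n", PQresStatus(PQresultStatus(res)));
			PQclear(res);
		}
	}
}
```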

-- 
marko



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-24 Thread Marko Kreen
On Tue, Jul 24, 2012 at 10:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Merlin Moncure mmonc...@gmail.com writes:
 Either way, it looks like there's plausible paths to optimizing
 repeated result fetch without having expose an alternate data-fetching
 API and that the proposed implementation doesn't appear to be a wall
 in terms of getting there. So I'm firmly in the non-exposed-rowbuf
 camp, even if we can't expose an optimal implementation in time for
 9.2.

 Yeah, the schedule argument is a strong one.  If we put in PQgetRowData
 now, we'll be stuck with it forevermore.  It would be better to push
 that part of the patch to 9.3 so we can have more time to think it over
 and investigate alternatives.  The single-row mode is already enough to
 solve the demonstrated client-side performance issues we know about.
 So IMO it would be a lot smarter to be spending our time right now on
 making sure we have *that* part of the patch right.

Just to show agreement: both PQgetRowData() and optimized PGresult
do not belong to 9.2.

Only open questions are:

* Is there a better API than PQsetSingleRowMode()?  New PQsend*
  functions are my alternative.

* Should we roll back the rowBuf change?  I think not, as per the benchmark
  it performs better than the old code.

 The thing I fundamentally don't like about PQsetSingleRowMode is that
 there's such a narrow window of time to use it correctly -- as soon as
 you've done even one PQconsumeInput, it's too late.  (Or maybe not, if
 the server is slow, which makes it timing-dependent whether you'll
 observe a bug.  Maybe we should add more internal state so that we can
 consistently throw error if you've done any PQconsumeInput?)  The most
 obvious alternative is to invent new versions of the PQsendXXX functions
 with an additional flag parameter; but there are enough of them to make
 that not very attractive either.

It belongs logically together with PQsend, so if you don't like
current separation, then proper fix is to make them single
function call.
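In blocking code the pairing is short, which is part of the argument; a minimal sketch using the API as proposed, with PQsetSingleRowMode() called in the only valid window, directly after the send:

```c
#include <stdio.h>
#include <libpq-fe.h>

/* Sketch: blocking single-row retrieval with the proposed API.
 * Error handling is trimmed for brevity. */
static int dump_first_column(PGconn *db, const char *sql)
{
	PGresult *res;

	if (!PQsendQuery(db, sql))
		return -1;
	if (!PQsetSingleRowMode(db))	/* must happen before any input is consumed */
		return -1;

	while ((res = PQgetResult(db)) != NULL) {
		if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
			printf("%s\n", PQgetvalue(res, 0, 0));
		else if (PQresultStatus(res) != PGRES_TUPLES_OK)
			fprintf(stderr, "%s", PQerrorMessage(db));
		PQclear(res);		/* each row's memory is released at once */
	}
	return 0;
}
```

The final PGRES_TUPLES_OK result carries zero rows and merely terminates the set, so the app's memory high-water mark stays at roughly one row.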

-- 
marko



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-24 Thread Merlin Moncure
On Tue, Jul 24, 2012 at 3:52 PM, Marko Kreen mark...@gmail.com wrote:
 * Is there a better API than PQsetSingleRowMode()?  New PQsend*
   functions are my alternative.

I would prefer to have PQsetSingleRowMode() over new flavor of PQsend.

To consolidate my argument above: since you're throwing PQgetResult in
the main iteration loop you're potentially walling yourself off from
one important optimization: not copying the result at all and reusing
the old one; that's going to produce the fastest possible code since
you're recovering all the attribute structures that have already been
set up unless you're willing to do the following:

Reading your original patch, what if at the line inside PQgetResult:

res = pqSingleRowResult(conn);

you instead manipulated the result the connection already had and
directly returned it -- discarding the result data but not the
attributes etc?  Actually, you could even keep your API 'as is' if
you're willing to adjust the behavior of PQclear while the connection
is doing row by row results processing to be a no-op unless done.

Single row results' data would then be volatile across iterations,
not const -- even if you save off the pointer the connection can and
will rewrite the data portion of the PGresult.  Any pointers to
PQgetvalue'd values would have to be copied off between iterations
naturally (or the user can copy off using the user-facing copy result
function).  This would be extremely efficient since we'd only even
need to extend PGresult buffer if a particular row was longer than the
longest one already fetched.

If all that's a bridge too far, we can look at re-jiggering the API
like I was thinking earlier to make the adjusted result scoping more
user visible or run your non-rowbuf-exposing patch as-is and hope for
optimizations down the line.

merlin



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-24 Thread Marko Kreen
On Wed, Jul 25, 2012 at 12:35 AM, Merlin Moncure mmonc...@gmail.com wrote:
 On Tue, Jul 24, 2012 at 3:52 PM, Marko Kreen mark...@gmail.com wrote:
 * Is there a better API than PQsetSingleRowMode()?  New PQsend*
   functions are my alternative.

 I would prefer to have PQsetSingleRowMode() over new flavor of PQsend.

 To consolidate my argument above: since you're throwing PQgetResult in
 the main iteration loop you're potentially walling yourself off from
 one important optimization: not copying the result at all and reusing
 the old one; that's going to produce the fastest possible code since
 you're recovering all the attribute structures that have already been
 set up unless you're willing to do the following:

I am not forgetting such optimization, I've already implemented it:
it's called PQgetRowData().  Main reason why it works, and so simply,
is that it does not try to give you PGresult.

PGresult carries various historical assumptions:
- Fully user-controlled lifetime.
- Zero-terminated strings.
- Aligned binary values.

Breaking them all does not seem like a good idea.

Trying to make them work with zero-copy seems
not worth the pain.

And if you are ready to introduce new accessor functions,
why care about PGresult at all?  Why not give user
PQgetRowData()?

Btw, PQgetRowData() and any other potential zero-copy API
is not incompatible with both slow PQgetResult() or optimized
PQgetResult().

So if we give only PQgetResult() in 9.2, I do not see that we
are locked out from any interesting optimizations.

-- 
marko



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-24 Thread Merlin Moncure
On Tue, Jul 24, 2012 at 4:59 PM, Marko Kreen mark...@gmail.com wrote:
 So if we give only PQgetResult() in 9.2, I do not see that we
 are locked out from any interesting optimizations.

Well, you are locked out of having PQgetResult reuse the conn's result
since that would then introduce potentially breaking changes to user
code.  Here are the approaches I see:

1) introduce rowbuf (but not user exposed rowbuf) patch:
Not the fastest method.  Users would begin using the feature and API
behaviors would be locked in -- the only possible optimization route would
be to try and make PQcopyResult as fast as possible.

2) expose PQgetRowData
Very fast, but foreign and somewhat awkward.  Existing libpq wrappers
(libpqtypes, libpqxx etc) could not be converted to use without
extensive revision, if at all, and would be stuck with the slower
version of iteration.  Also a side benefit here is that optimizing
result copying later has usefulness outside of row by row processing.

3) as #1, but instead of copying result, overwrite the data area.
this is bending the rule 'user defined lifetime of result' since each
iteration clears the data area of the PGresult. This is almost as fast
as #2, and existing libpq wrappers would be trivially converted to the
API.

4) as #3, but introduce a new iteration function
(PQiterateResult(conn, result)) that would be called instead of a
paired PQgetResult/PQclear.  This would also be fast, and could almost
lay directly on top of #2 as an alternate optimization strategy -- the
set result mode would have to be modified (or an additional function
introduced) to return a result.

I feel like you're partial to #2 and appreciate that, but I'm really
quite skeptical about it as noted.  #3 and #4 are not as well thought
out as what you've put together and would need some extra work and
buy-in to get out the door, so #2 seems like the lowest common
denominator (it would permanently preclude #3 and would require #4 to
introduce two new functions instead of just one).  #1 of course would
bolt on to #2.

merlin



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-24 Thread Merlin Moncure
On Tue, Jul 24, 2012 at 5:29 PM, Merlin Moncure mmonc...@gmail.com wrote:
 so #2 seems like the lowest common
 denominator (it would permanently preclude #3 and would require #4 to
 introduce two new functions instead of just one).  #1 of course would
 bolt on to #2.

oops, got #1 and #2 backwards there.

merlin



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-24 Thread Marko Kreen
On Wed, Jul 25, 2012 at 1:29 AM, Merlin Moncure mmonc...@gmail.com wrote:
 On Tue, Jul 24, 2012 at 4:59 PM, Marko Kreen mark...@gmail.com wrote:
 So if we give only PQgetResult() in 9.2, I do not see that we
 are locked out from any interesting optimizations.

 Well, you are locked out of having PQgetResult reuse the conn's result
 since that would then introduce potentially breaking changes to user
 code.

You can specify special flags to PQsend or have special PQgetResultWeird()
calls to get PGresults with unusual behavior.  Like I did with PQgetRowData().

I see no reason here to reject PQgetResult() that returns normal PGresult.

-- 
marko



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-24 Thread Merlin Moncure
On Tuesday, July 24, 2012, Marko Kreen mark...@gmail.com wrote:
 On Wed, Jul 25, 2012 at 1:29 AM, Merlin Moncure mmonc...@gmail.com
wrote:
 On Tue, Jul 24, 2012 at 4:59 PM, Marko Kreen mark...@gmail.com wrote:
 So if we give only PQgetResult() in 9.2, I do not see that we
 are locked out from any interesting optimizations.

 Well, you are locked out of having PQgetResult reuse the conn's result
 since that would then introduce potentially breaking changes to user
 code.

 You can specify special flags to PQsend or have special PQgetResultWeird()
 calls to get PGresults with unusual behavior.  Like I did with
PQgetRowData().

 I see no reason here to reject PQgetResult() that returns normal PGresult.

Yeah -- I agree.  So, given the scheduling, I think we should go with
non-PQgetRowData patch and reserve optimized path for 9.3.

merlin


Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-23 Thread Marko Kreen

Here is a simple test program that takes a SELECT
query, reads it and outputs a COPY-formatted stream
to standard output, to simulate some activity.

It operates in 3 modes, specified by command-line switches:

-f   Load full resultset at once - old way.
-s   Single-Row mode using PQgetResult().
-z   Single-Row mode using PQgetRowData().

It is compiled with 2 different libpqs that correspond to
single-row-modeX branches in my github repo:

rowdump1 - libpq with rowBuf + PQgetRowData().   rowBuf is
   required for PQgetRowData.
   [ https://github.com/markokr/postgres/tree/single-row-mode1 ]

rowdump2 - Plain libpq patched for single-row mode.
   No PQgetRowData() here.
   [ https://github.com/markokr/postgres/tree/single-row-mode2 ]

Notes:

* Hardest part is picking realistic queries that matter.
  It's possible to construct artificial queries that make
  results go either way.

* It does not make sense to compare -f with the others.  But it
  does make sense to compare -f from differently patched libpqs
  to detect any potential slowdowns.

* The time measured is User Time of client process.

---
QUERY: select 1,200,30,rpad('x',30,'z') from
generate_series(1,500)
./rowdump1 -f:   3.90   3.90   3.93  avg:  3.91
./rowdump2 -f:   4.03   4.13   4.05  avg:  4.07
./rowdump1 -s:   6.26   6.33   6.49  avg:  6.36
./rowdump2 -s:   7.48   7.46   7.50  avg:  7.48
./rowdump1 -z:   2.88   2.90   2.79  avg:  2.86
QUERY: select
rpad('x',10,'z'),rpad('x',20,'z'),rpad('x',30,'z'),rpad('x',40,'z'),rpad('x',50,'z'),rpad('x',60,'z')
from generate_series(1,300)
./rowdump1 -f:   6.29   6.36   6.14  avg:  6.26
./rowdump2 -f:   6.79   6.69   6.72  avg:  6.73
./rowdump1 -s:   7.71   7.72   7.80  avg:  7.74
./rowdump2 -s:   8.14   8.16   8.57  avg:  8.29
./rowdump1 -z:   6.45   5.15   5.16  avg:  5.59
QUERY: select
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100
from generate_series(1,80)
./rowdump1 -f:   5.68   5.66   5.72  avg:  5.69
./rowdump2 -f:   5.69   5.84   5.67  avg:  5.73
./rowdump1 -s:   7.68   7.76   7.67  avg:  7.70
./rowdump2 -s:   7.57   7.54   7.62  avg:  7.58
./rowdump1 -z:   2.78   2.82   2.72  avg:  2.77
QUERY: select 1000,rpad('x', 400, 'z'),rpad('x', 4000, 'z') from
generate_series(1,10)
./rowdump1 -f:   2.71   2.66   2.58  avg:  2.65
./rowdump2 -f:   3.11   3.14   3.16  avg:  3.14
./rowdump1 -s:   2.64   2.61   2.64  avg:  2.63
./rowdump2 -s:   3.15   3.11   3.11  avg:  3.12
./rowdump1 -z:   2.53   2.51   2.46  avg:  2.50
---

Test code attached.  Please play with it.

By this test, both rowBuf and PQgetRowData() look good.

-- 
marko


pg1 = /opt/apps/pgsql92mode1
pg2 = /opt/apps/pgsql92mode2

CFLAGS = -O -g -Wall

all: rowdump1 rowdump2

rowdump1: rowdump.c
	$(CC) -I$(pg1)/include $(CFLAGS) -o $@ $< -L$(pg1)/lib \
		-Wl,-rpath=$(pg1)/lib -lpq -DHAVE_ROWDATA

rowdump2: rowdump.c
	$(CC) -I$(pg2)/include $(CFLAGS) -o $@ $< -L$(pg2)/lib \
		-Wl,-rpath=$(pg2)/lib -lpq

clean:
rm -f rowdump1 rowdump2



xtest.sh
Description: Bourne shell script
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <getopt.h>

#include <libpq-fe.h>

struct Context {
	PGconn *db;
	int count;

	char *buf;
	int buflen;
	int bufpos;
};

static void die(PGconn *db, const char *msg)
{
	if (db)
		fprintf(stderr, "%s: %s\n", msg, PQerrorMessage(db));
	else
		fprintf(stderr, "%s\n", msg);
	exit(1);
}

static void out_flush(struct Context *ctx)
{
	int out;
	if (!ctx->buf)
		return;

	out = write(1, ctx->buf, ctx->bufpos);
	if (out != ctx->bufpos)
		die(NULL, "failed to write file");
	ctx->bufpos = 0;
	ctx->buflen = 0;
	free(ctx->buf);
	ctx->buf = NULL;
}

static void out_char(struct Context *ctx, char c)
{
	if (ctx->bufpos + 1 > ctx->buflen) {
		if (!ctx->buf) {
			ctx->buflen = 16;
			ctx->buf = malloc(ctx->buflen);
			if (!ctx->buf)
				die(NULL, "failed to allocate buffer");
		} else {
			ctx->buflen *= 2;
			ctx->buf = realloc(ctx->buf, ctx->buflen);
			if (!ctx->buf)
				die(NULL, "failed to resize buffer");
		}
	}

	ctx->buf[ctx->bufpos++] = c;
}

static void proc_value(struct Context *ctx, const char *val, int vlen)
{
	int i;
	char c;

	for (i = 0; i < vlen; i++) {
		c = val[i];
		switch (c) {
		case '\\':
			out_char(ctx, '\\');
			out_char(ctx, '\\');
			break;
		case '\t':
			out_char(ctx, '\\');
			out_char(ctx, 't');
			break;
		case '\n':
			out_char(ctx, '\\');
			out_char(ctx, 'n');
			break;
		case '\r':
			out_char(ctx, '\\');
			out_char(ctx, 'r');
			break;
		default:
			out_char(ctx, c);
			break;
		}
	}
}

static void proc_row(struct Context *ctx, PGresult *res, int tup)
{
	int n = PQnfields(res);
	const char *val;
	int 

Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-23 Thread Merlin Moncure
On Mon, Jul 23, 2012 at 2:05 PM, Marko Kreen mark...@gmail.com wrote:

 Here is a simple test program that takes a SELECT
 query, reads it and outputs a COPY-formatted stream
 to standard output, to simulate some activity.

 It operates in 3 modes, specified by command-line switches:

 -f   Load full resultset at once - old way.
 -s   Single-Row mode using PQgetResult().
 -z   Single-Row mode using PQgetRowData().

 It is compiled with 2 different libpqs that correspond to
 single-row-modeX branches in my github repo:

 rowdump1 - libpq with rowBuf + PQgetRowData().   rowBuf is
required for PQgetRowData.
[ https://github.com/markokr/postgres/tree/single-row-mode1 ]

 rowdump2 - Plain libpq patched for single-row mode.
No PQgetRowData() here.
[ https://github.com/markokr/postgres/tree/single-row-mode2 ]

 Notes:

 * Hardest part is picking realistic queries that matter.
   It's possible to construct artificial queries that make
   results go either way.

 * It does not make sense to compare -f with the others.  But it
   does make sense to compare -f from differently patched libpqs
   to detect any potential slowdowns.

 * The time measured is User Time of client process.

 ---
 QUERY: select 1,200,30,rpad('x',30,'z') from
 generate_series(1,500)
 ./rowdump1 -f:   3.90   3.90   3.93  avg:  3.91
 ./rowdump2 -f:   4.03   4.13   4.05  avg:  4.07
 ./rowdump1 -s:   6.26   6.33   6.49  avg:  6.36
 ./rowdump2 -s:   7.48   7.46   7.50  avg:  7.48
 ./rowdump1 -z:   2.88   2.90   2.79  avg:  2.86
 QUERY: select
 rpad('x',10,'z'),rpad('x',20,'z'),rpad('x',30,'z'),rpad('x',40,'z'),rpad('x',50,'z'),rpad('x',60,'z')
 from generate_series(1,300)
 ./rowdump1 -f:   6.29   6.36   6.14  avg:  6.26
 ./rowdump2 -f:   6.79   6.69   6.72  avg:  6.73
 ./rowdump1 -s:   7.71   7.72   7.80  avg:  7.74
 ./rowdump2 -s:   8.14   8.16   8.57  avg:  8.29
 ./rowdump1 -z:   6.45   5.15   5.16  avg:  5.59
 QUERY: select
 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100
 from generate_series(1,80)
 ./rowdump1 -f:   5.68   5.66   5.72  avg:  5.69
 ./rowdump2 -f:   5.69   5.84   5.67  avg:  5.73
 ./rowdump1 -s:   7.68   7.76   7.67  avg:  7.70
 ./rowdump2 -s:   7.57   7.54   7.62  avg:  7.58
 ./rowdump1 -z:   2.78   2.82   2.72  avg:  2.77
 QUERY: select 1000,rpad('x', 400, 'z'),rpad('x', 4000, 'z') from
 generate_series(1,10)
 ./rowdump1 -f:   2.71   2.66   2.58  avg:  2.65
 ./rowdump2 -f:   3.11   3.14   3.16  avg:  3.14
 ./rowdump1 -s:   2.64   2.61   2.64  avg:  2.63
 ./rowdump2 -s:   3.15   3.11   3.11  avg:  3.12
 ./rowdump1 -z:   2.53   2.51   2.46  avg:  2.50
 ---

 Test code attached.  Please play with it.

 By this test, both rowBuf and PQgetRowData() look good.

I agree on performance grounds.   It's important for libpq to be fast.

It seems odd (but maybe ok) that you have to set the single row mode
on the connection only to have the server reset it whenever you call a
send function: maybe rename to PQsetResultSingleRowMode?

Does PQgetRowData() break the ability to call PQgetvalue() against the
result as well as other functions like PQgetisnull()?  If so, I
object.

merlin



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-23 Thread Marko Kreen
On Tue, Jul 24, 2012 at 12:27 AM, Merlin Moncure mmonc...@gmail.com wrote:
 On Mon, Jul 23, 2012 at 2:05 PM, Marko Kreen mark...@gmail.com wrote:
 It seems odd (but maybe ok) that you have to set the single row mode
 on the connection only to have the server reset it whenever you call a
 send function: maybe rename to PQsetResultSingleRowMode?

The server does not reset it; it's a purely client-side flag.  It is reset
by the next PQsend*/PQexec* call.  If there are several resultsets
sent by the server for one query, they all share the same setting.

I don't think extra-long function names that describe exactly
the function behavior are an improvement over shorter but inexact
names, as you need to read the docs anyway to know
the real behavior.  But it's a matter of taste, so it can be
renamed if people want it.

An alternative is to create parallel PQsend* functions that set the flag.
That gets rid of the possibility of any extra activity between PQsend
and PQsetSingleRowMode().  But it seems messy to do that
just for single-row mode; instead it makes sense to have PQsend*
variants that take a flags argument to tune their behavior.  But that
is worth doing only if we have more flags than just one.

 Does PQgetRowData() break the ability to call PQgetvalue() against the
 result as well as other functions like PQgetisnull()?  If so, I
 object.

I don't get what you are objecting to.  PQgetRowData()
is called instead of PQgetResult() and it returns a zero-row PGresult
for the row header and a list of PGdataValues that point to the actual
data.  You can call the value functions and they don't crash, but as
the result has zero rows (the data is not copied into it)
they don't do anything useful either.

The whole point of the API is to avoid the unnecessary copying.

You can mix the calls to PQgetRowData() and PQgetResult()
during one resultset, it works fine although does not seem
that useful.

I guess you fear that some code could unexpectedly see
the new behavior, and that is exactly why the row-callback API
needs to be scrapped - it allowed such a possibility.

But with PQsetSingleRowMode and PQgetRowData, the
new behavior is seen only by new code that clearly
expects it, so I don't see what the problem is.
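For illustration, usage might look like the sketch below.  PQgetRowData() never shipped in a libpq release; the signature and the PGdataValue layout here are assumptions reconstructed from the description above (a zero-row header PGresult plus value pointers into the network buffer), so treat this as a sketch of the proposal, not real API:

```c
/* Hypothetical sketch of the proposed PQgetRowData(), called in place
 * of PQgetResult().  Signature and PGdataValue layout are assumed from
 * the single-row-mode1 branch discussion, not shipped libpq API. */
PGresult *hdr;		/* zero-row result: row description only */
PGdataValue *cols;	/* values point into libpq's network buffer */
int i;

while (PQgetRowData(db, &hdr, &cols) > 0) {
	for (i = 0; i < PQnfields(hdr); i++) {
		if (cols[i].len < 0)
			continue;	/* SQL NULL */
		/* cols[i].value is NOT zero-terminated and stays valid
		 * only until the next fetch call, so copy it out if it
		 * must survive the iteration */
		fwrite(cols[i].value, 1, cols[i].len, stdout);
	}
	PQclear(hdr);
}
```

The zero-copy property comes from cols pointing straight into the buffer, which is also why the values are volatile across iterations.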

-- 
marko



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-21 Thread Marko Kreen

Here are 2 approaches for getting to a state where only PQsetSingleRowMode()
is available.  Both are on top of the REL9_2_STABLE branch.

a) Remove callback hooks, keep rowBuf, implement single-row-mode on
   top of that.   This was posted before, I just additionally removed
   the PQgetRowData() function.

git pull git://github.com/markokr/postgres.git single-row-mode1
https://github.com/markokr/postgres/commits/single-row-mode1

   Commits:
libpq: Single-row based processing
libpq, dblink: Remove row processor API

   Advantage: makes easier to play with PQgetRowData() or potatial
   faster PGresult creation methods.  Smaller change compared to
   libpq from 9.2beta than b).

b) Revert row-callback changes completely, implement single-row-mode on
   top of virgin libpq.  Only problem here was keeping fixes implemented
   as part of row-callback patch.  Single-row-mode itself is quite simple.

git pull git://github.com/markokr/postgres.git  single-row-mode1
https://github.com/markokr/postgres/commits/single-row-mode1

   Feature patch:

https://github.com/markokr/postgres/commit/b5e822125c655f189875401c61317242705143b9

   Commits:
dblink: revert conversion to row processor API patch
libpq: revert row processor API patch
libpq: random fixes
libpq: single-row mode
dblink: use single-row-mode

   Advantage: smaller change compared to libpq from 9.1 than a).

As the patch has suffered a lot from trying to provide both macro- and
micro-optimization (on-the-fly row processing vs. more efficient row
processing), maybe b) is the safer choice for 9.2?

In case somebody wants to look at the patches without git (or the web),
I attach them as a tgz too.

-- 
marko



single-row.tgz
Description: GNU Unix tar archive



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-21 Thread Tom Lane
Marko Kreen mark...@gmail.com writes:
 Here are 2 approaches for getting to a state where only PQsetSingleRowMode()
  is available.  Both are on top of the REL9_2_STABLE branch.

 a) Remove callback hooks, keep rowBuf, implement single-row-mode on
top of that.   This was posted before, I just additionally removed
the PQgetRowData() function.

 b) Revert row-callback changes completely, implement single-row-mode on
top of virgin libpq.  Only problem here was keeping fixes implemented
as part of row-callback patch.  Single-row-mode itself is quite simple.

Yeah, I was wondering if we shouldn't revisit the whole patch given that
where we're ending up looks little like where we thought we were going.
I hadn't had time to pursue the idea, but I'm glad you did.  Will look
at these.

One reason to stick with approach (a) is that it gives us the option
to easily add PQgetRowData(), in case future testing shows that that's
actually worth the risk and usage restrictions.  While I'm a bit dubious
of that, I'm prepared to be proven wrong.

regards, tom lane



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-16 Thread Marko Kreen
On Mon, Jul 16, 2012 at 4:11 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Marko Kreen mark...@gmail.com writes:
 Now, looking at the problem with some perspective, the solution
 is obvious: when in single-row mode, the PQgetResult() must return
 proper PGresult for that single row.  And everything else follows that.

 Such API is implemented in attached patch:

 I'm starting to look at this patch now.  I think we could drop the
 PQgetRowData() API: it complicates matters for little gain that I can
 see.  The argument for it was to avoid the cost of creating a PGresult
 per row, but we're already going to pay the cost of creating a
 PGresult in order to return the PGRES_SINGLE_TUPLE status.

No.  Please look again, it is supposed to be called instead of PQgetResult().

-- 
marko



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-16 Thread Tom Lane
Marko Kreen mark...@gmail.com writes:
 On Mon, Jul 16, 2012 at 4:11 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm starting to look at this patch now.  I think we could drop the
 PQgetRowData() API: it complicates matters for little gain that I can
 see.  The argument for it was to avoid the cost of creating a PGresult
 per row, but we're already going to pay the cost of creating a
 PGresult in order to return the PGRES_SINGLE_TUPLE status.

 No.  Please look again, it is supposed to be called instead of PQgetResult().

Mm.  I still think we should drop it, because it's still a dangerous API
that's not necessary for the principal benefit of this feature.

regards, tom lane



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-16 Thread Marko Kreen
On Mon, Jul 16, 2012 at 4:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Marko Kreen mark...@gmail.com writes:
 On Mon, Jul 16, 2012 at 4:11 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm starting to look at this patch now.  I think we could drop the
 PQgetRowData() API: it complicates matters for little gain that I can
 see.  The argument for it was to avoid the cost of creating a PGresult
 per row, but we're already going to pay the cost of creating a
 PGresult in order to return the PGRES_SINGLE_TUPLE status.

 No.  Please look again, it is supposed to be called instead of PQgetResult().

 Mm.  I still think we should drop it, because it's still a dangerous API
 that's not necessary for the principal benefit of this feature.

Yes, it is a secondary feature, but it fits the needs of the actual target
audience of the single-row feature - various high-level wrappers of libpq.

Also it is needed for high-performance situations, where the
single-row-mode fits well even for C clients, except the
advantage is negated by new malloc-per-row overhead.

-- 
marko



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-16 Thread Tom Lane
Marko Kreen mark...@gmail.com writes:
 On Mon, Jul 16, 2012 at 4:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Mm.  I still think we should drop it, because it's still a dangerous API
 that's not necessary for the principal benefit of this feature.

 Yes, it is a secondary feature, but it fits the needs of the actual target
 audience of the single-row feature - various high-level wrappers of libpq.

 Also it is needed for high-performance situations, where the
 single-row-mode fits well even for C clients, except the
 advantage is negated by new malloc-per-row overhead.

Absolutely no evidence has been presented that there's any useful
performance gain to be had there.  Moreover, if there were, we could
probably work a bit harder at making PGresult creation cheaper, rather
than having to expose a dangerous API.

regards, tom lane



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-16 Thread Marko Kreen
On Mon, Jul 16, 2012 at 6:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Marko Kreen mark...@gmail.com writes:
 On Mon, Jul 16, 2012 at 4:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Mm.  I still think we should drop it, because it's still a dangerous API
 that's not necessary for the principal benefit of this feature.

 Yes, it is a secondary feature, but it fits the needs of the actual target
 audience of the single-row feature - various high-level wrappers of libpq.

 Also it is needed for high-performance situations, where the
 single-row-mode fits well even for C clients, except the
 advantage is negated by new malloc-per-row overhead.

 Absolutely no evidence has been presented that there's any useful
 performance gain to be had there.  Moreover, if there were, we could
 probably work a bit harder at making PGresult creation cheaper, rather
 than having to expose a dangerous API.

Ok, I'm more interested in primary feature,
so no more objections from me.

-- 
marko



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-15 Thread Tom Lane
Marko Kreen mark...@gmail.com writes:
 Now, looking at the problem with some perspective, the solution
 is obvious: when in single-row mode, the PQgetResult() must return
 proper PGresult for that single row.  And everything else follows that.

 Such API is implemented in attached patch:

I'm starting to look at this patch now.  I think we could drop the
PQgetRowData() API: it complicates matters for little gain that I can
see.  The argument for it was to avoid the cost of creating a PGresult
per row, but we're already going to pay the cost of creating a
PGresult in order to return the PGRES_SINGLE_TUPLE status.  And as was
pointed out upthread, any per-tuple malloc costs are going to be in
the noise compared to the server-side effort expended to create the
tuple, anyway.  The point of this feature is to avoid accumulating the
entire resultset in memory, not to micro-optimize linear-time costs.

Moreover, if the argument for changing 9.2 at this late date is to get
rid of a fragile, breakable API, surely an API that's designed around
returning pointers into the library's network buffer ought to be a
prime target.

And lastly, since the proposed patch for dblink doesn't use
PQgetRowData, there's not even much reason to think that it's
bug-free.

regards, tom lane



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-06-17 Thread Simon Riggs
On 16 June 2012 23:09, Tom Lane t...@sss.pgh.pa.us wrote:
 Marko Kreen mark...@gmail.com writes:
 Now, looking at the problem with some perspective, the solution
 is obvious: when in single-row mode, the PQgetResult() must return
 proper PGresult for that single row.  And everything else follows that.

 * PQgetRowData(): can be called instead of PQgetResult() to get raw row data
  in the buffer, for more efficient processing.  This is an optional feature
  that provides the original row-callback promise of avoiding an unnecessary
  row data copy.

 * Although PQgetRowData() makes callback API unnecessary, it is still
  fully compatible with it - the callback should not see any difference
  whether the resultset is processed in single-row mode or
  old single-PGresult mode.  Unless it wants to - it can check
  PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE.

 I guess this raises the question of whether we ought to revert the
 row-callback patch entirely and support only this approach.  IMO
 it is (barely) not too late to do that for 9.2, if we want to.
 If we don't want to, then this is just another new feature and
 should be considered for 9.3.

 What I like about this is the greatly simpler and harder-to-misuse
 API.  The only arguable drawback is that there's still at least one
 malloc/free cycle per tuple, imposed by the creation of a PGresult
 for each one, whereas the callback approach avoids that.  But worrying
 about that could be considered to be vast overoptimization; the backend
 has certainly spent a lot more overhead than that generating the tuple.

I prefer the description of Marko's API to the one we have now.

Adopting one API in 9.2 and another in 9.3 would be fairly bad.
Perhaps we can have both?

Can we see a performance test?  The commit message says "Add a row
processor API to libpq for better handling of large result sets", so
the idea is that we do this many, many times; we need to double-check
that the extra overhead is not a problem in cases where the dumping
overhead is significant.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-06-17 Thread Marko Kreen
On Sat, Jun 16, 2012 at 7:58 PM, Marko Kreen mark...@gmail.com wrote:
 So my preference would be to simply remove the callback API
 but keep the processing and provide PQgetRowData() instead.

This is implemented in the attached patch.  It also
converts dblink to use the single-row API.

The patch should be applied on top of the previous
single-row patch.

Both can also be seen here:

  https://github.com/markokr/postgres/commits/single-row

-- 
marko


remove-rowproc.diff.gz
Description: GNU Zip compressed data



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-06-17 Thread Marko Kreen
On Sun, Jun 17, 2012 at 2:07 PM, Simon Riggs si...@2ndquadrant.com wrote:
 I prefer the description of Marko's API than the one we have now.

 Adopting one API in 9.2 and another in 9.3 would be fairly bad.
 Perhaps we can have both?

I see no reason to keep the (public) callback API around,
unless we just don't bother to remove it now.

 Can we see a performance test?  The commit message says "Add a row
 processor API to libpq for better handling of large result sets", so
 the idea is that we do this many, many times; we need to double-check
 that the extra overhead is not a problem in cases where the dumping
 overhead is significant.

Not sure what you want to performance-test.

PQgetRowData() uses exactly the same pipeline
that the callbacks used.  It adds a few more C calls;
I'm not sure it makes sense to benchmark them.

The recent dblink change replaced a palloc() + copy +
zero-termination dance with PQgetResult(), which
does a malloc() + copy dance internally.  This malloc
vs. palloc difference might be benchmarkable, but it heads
into micro-benchmarking territory, as the big win came
from avoiding the buffering of rows.  So yeah, using
PQgetRowData() might be a tiny bit faster, but I don't
expect much difference.

But all this affects new users only.  The thing that affects
everybody was the 2-step row-processing change
that was done during the rowproc patch.

I did benchmark it, and it seems there are column-size
+ column-count patterns where the new way is faster,
and some patterns where the old way is faster.  But the
difference did not rise above test noise, so I concluded
it is insignificant and that the malloc+copy avoidance is worth it.

Of course, any additional benchmarking is welcome, so feel
free to pick any situation you care about.

-- 
marko



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-06-17 Thread Simon Riggs
On 17 June 2012 19:37, Marko Kreen mark...@gmail.com wrote:
 On Sun, Jun 17, 2012 at 2:07 PM, Simon Riggs si...@2ndquadrant.com wrote:
 I prefer the description of Marko's API than the one we have now.

 Adopting one API in 9.2 and another in 9.3 would be fairly bad.
 Perhaps we can have both?

 I see no reason to keep the (public) callback API around,
 unless we just don't bother to remove it now.

OK by me.

 Can we see a performance test?  The commit message says "Add a row
 processor API to libpq for better handling of large result sets", so
 the idea is that we do this many, many times; we need to double-check
 that the extra overhead is not a problem in cases where the dumping
 overhead is significant.
...
 I did benchmark it, and it seems there are column-size
 + column-count patterns where the new way is faster,
 and some patterns where the old way is faster.  But the
 difference did not rise above test noise, so I concluded
 it is insignificant and that the malloc+copy avoidance is worth it.

As long as we've checked that's fine.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-06-16 Thread Marko Kreen
Demos:

https://github.com/markokr/libpq-rowproc-demos/blob/master/demo-onerow-sync.c
https://github.com/markokr/libpq-rowproc-demos/blob/master/demo-onerow-async.c

A few clarifications below.

On Fri, Jun 15, 2012 at 9:21 PM, Marko Kreen mark...@gmail.com wrote:
 Now, looking at the problem with some perspective, the solution
 is obvious: when in single-row mode, the PQgetResult() must return
 proper PGresult for that single row.  And everything else follows that.

 Such an API is implemented in the attached patch:

 * PQsetSingleRowMode(conn): sets single-row mode.

The function can be called only after PQsend* and before any
rows have arrived.  This guarantees there will be no surprises
for PQexec* users who expect the full resultset at once.  It also
guarantees that the user will process the resultset with a
PQgetResult() loop, either sync or async.  The next PQexec/PQsend
call will reset the flag, so it is active only for the duration of
processing the results from one command.

Currently it returns FALSE if called in the wrong place, and does
nothing.  The only question I see here is whether it should set an
error state on the connection or not.  That does not seem to be an
improvement.
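
A short sketch of that placement rule (assuming a connected PGconn *conn; the query text is a placeholder):

```c
#include <stdio.h>
#include <libpq-fe.h>

/* Sketch: PQsetSingleRowMode() must come right after PQsend*, before
 * any result data is consumed; the next command resets the mode.
 * Assumes conn is a connected PGconn. */
static int
start_single_row_query(PGconn *conn, const char *query)
{
    if (!PQsendQuery(conn, query))
    {
        fprintf(stderr, "send failed: %s", PQerrorMessage(conn));
        return 0;
    }
    if (!PQsetSingleRowMode(conn))
    {
        /* returns FALSE when called at the wrong time; per the text
         * above, no error state is set on the connection */
        fprintf(stderr, "could not enable single-row mode\n");
        return 0;
    }
    return 1;                   /* now drain rows with PQgetResult() */
}
```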

 * PQgetRowData(): can be called instead of PQgetResult() to get raw row data
  in the buffer, for more efficient processing.  This is an optional feature
  that provides the original row-callback promise of avoiding an unnecessary
  row data copy.

 * Although PQgetRowData() makes callback API unnecessary, it is still
  fully compatible with it - the callback should not see any difference
  whether the resultset is processed in single-row mode or
  old single-PGresult mode.  Unless it wants to - it can check
  PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE.

PQgetResult() is compatible with the callbacks; PQgetRowData()
bypasses them.
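
For the async side, a sketch of the PQconsumeInput()/PQisBusy() loop this implies (assuming a query was already sent on conn in single-row mode; wait_for_socket() and process_row() are hypothetical helpers, not part of libpq):

```c
#include <libpq-fe.h>

/* Hypothetical helpers, not part of libpq */
extern void wait_for_socket(PGconn *conn);  /* select()/poll() on PQsocket() */
extern void process_row(const PGresult *res);

/* Sketch: in single-row mode PQisBusy() reports "not busy" after each
 * row, so the standard async loop yields one PGRES_SINGLE_TUPLE result
 * per row, followed by a final PGRES_TUPLES_OK. */
static void
drain_rows_async(PGconn *conn)
{
    for (;;)
    {
        if (PQisBusy(conn))
        {
            wait_for_socket(conn);
            if (!PQconsumeInput(conn))
                return;                 /* connection trouble */
            continue;
        }

        PGresult *res = PQgetResult(conn);
        if (res == NULL)
            return;                     /* command fully processed */
        if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
            process_row(res);
        PQclear(res);
    }
}
```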

-- 
marko



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-06-16 Thread Tom Lane
Marko Kreen mark...@gmail.com writes:
 Now, looking at the problem with some perspective, the solution
 is obvious: when in single-row mode, the PQgetResult() must return
 proper PGresult for that single row.  And everything else follows that.
 
 * PQgetRowData(): can be called instead of PQgetResult() to get raw row data
  in the buffer, for more efficient processing.  This is an optional feature
  that provides the original row-callback promise of avoiding an unnecessary
  row data copy.
 
 * Although PQgetRowData() makes callback API unnecessary, it is still
  fully compatible with it - the callback should not see any difference
  whether the resultset is processed in single-row mode or
  old single-PGresult mode.  Unless it wants to - it can check
  PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE.

I guess this raises the question of whether we ought to revert the
row-callback patch entirely and support only this approach.  IMO
it is (barely) not too late to do that for 9.2, if we want to.
If we don't want to, then this is just another new feature and
should be considered for 9.3.

What I like about this is the greatly simpler and harder-to-misuse
API.  The only arguable drawback is that there's still at least one
malloc/free cycle per tuple, imposed by the creation of a PGresult
for each one, whereas the callback approach avoids that.  But worrying
about that could be considered to be vast overoptimization; the backend
has certainly spent a lot more overhead than that generating the tuple.

regards, tom lane



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-06-16 Thread Marko Kreen
On Sat, Jun 16, 2012 at 6:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I guess this raises the question of whether we ought to revert the
 row-callback patch entirely and support only this approach.  IMO
 it is (barely) not too late to do that for 9.2, if we want to.
 If we don't want to, then this is just another new feature and
 should be considered for 9.3.

I think the row-callback is a dangerous API that does not solve any
important problems.

But I do like the 2-phase processing the rowproc patch introduced,
and having a way to bypass the unnecessary malloc()+copy.

So my preference would be to simply remove the callback API
but keep the processing and provide PQgetRowData() instead.

The win it brings is significantly smaller, though, thanks
to the single-row PQgetResult().  So if it does not sound interesting
to others, it can be dropped.  The single-row processing
is the important feature we need; the rest is extra.

-- 
marko



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-06-15 Thread Merlin Moncure
On Fri, Jun 15, 2012 at 1:21 PM, Marko Kreen mark...@gmail.com wrote:
 The row-processor API is now in 9.2, but it solves only the
 different-row-storage problem, not the one-row-at-a-time
 problem, as libpq is still in control until all rows are received.

 This means libpq still cannot be used to implement the iterative
 result processing that almost all high-level languages use.

 We discussed a potential API for fetching a single row at a time,
 but did not reach a conclusion.  Basic arguments were:

 1) Tom: PQisBusy must keep its current behaviour.  Thus PQgetResult()
    must also keep its current behaviour:
    * PQisBusy() -> 0: need to call PQgetResult(), which returns a PGresult
    * PQisBusy() -> 1: need to call PQconsumeInput()
    * PQisBusy() must be callable several times in a row, thus be
      stateless from the client's POV.

 2) Me: behaviour must not be controlled by a callback, but by the client
    code that uses PQgetResult() + PQisBusy().

 Now, looking at the problem with some perspective, the solution
 is obvious: when in single-row mode, the PQgetResult() must return
 proper PGresult for that single row.  And everything else follows that.

 Such an API is implemented in the attached patch:

 * PQsetSingleRowMode(conn): sets single-row mode.

 * PQisBusy(): stops after each row in single-row mode, sets PGASYNC_ROW_READY.
  Thus keeping the property of being repeatedly callable.

 * PQgetResult(): returns a copy of the row if PGASYNC_ROW_READY.  Sets the
  row's resultStatus to PGRES_SINGLE_TUPLE.  This needs to be different from
  PGRES_TUPLES_OK to detect the end of the resultset.

 * PQgetRowData(): can be called instead of PQgetResult() to get raw row data
  in the buffer, for more efficient processing.  This is an optional feature
  that provides the original row-callback promise of avoiding an unnecessary
  row data copy.

 * Although PQgetRowData() makes callback API unnecessary, it is still
  fully compatible with it - the callback should not see any difference
  whether the resultset is processed in single-row mode or
  old single-PGresult mode.  Unless it wants to - it can check
  PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE.

 There is some duplicate code here that can be refactored (callback exec),
 but I did not do it yet to avoid affecting existing code too much.

 --
 marko

 PS. If you squint, it seems like a fix of the existing API instead of a new
 feature, so perhaps it can still fit into 9.2?

+1 on rushing row processing into 9.2, but only if the API feels
right (I'll spend some time reviewing it).  I found the lack of iterative
row processing to be really unfortunate.

merlin
