subject:"\[HACKERS\] PATCH\: Batch\/pipelining support for libpq"

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-10-04 Thread Vaishnavi Prabakaran

On Mon, Oct 2, 2017 at 8:31 PM, Daniel Gustafsson  wrote:

> > On 13 Sep 2017, at 07:44, Vaishnavi Prabakaran <
> vaishnaviprabaka...@gmail.com> wrote:
> >
> > On Wed, Sep 13, 2017 at 3:33 PM, Craig Ringer  > wrote:
> >
> > I really do not like calling it "commit" as that conflates with a
> database commit.
> >
> > A batch can embed multiple BEGINs and COMMITs. It's entirely possible
> for an earlier part of the batch to succeed and commit, then a later part
> to fail, if that's the case. So that name is IMO wrong.
> >
> > Ok, SendQueue seems ok to me as well. Will change it in next version.
> >
> > +"a"?
> >
> > Hmm, Can you explain the question please. I don't understand.
> >
> > s/of new query/of a new query/
> >
> > Thanks for explaining. Will change this too in next version.
>
> Based on the discussions in this thread, and that a new version hasn’t been
> submitted, I’m marking this Returned with Feedback.  Please re-submit the
> new
> version in an upcoming commitfest when ready.



Thanks for the suggestion and, OK I will create a new patch in upcoming
commitfest with attached patch addressing above review comments.

Thanks & Regards,
Vaishnavi,
Fujitsu Australia.


0001-Pipelining-batch-support-for-libpq-code-v14.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-10-02 Thread Daniel Gustafsson

> On 13 Sep 2017, at 07:44, Vaishnavi Prabakaran 
>  wrote:
> 
> On Wed, Sep 13, 2017 at 3:33 PM, Craig Ringer  > wrote:
> 
> I really do not like calling it "commit" as that conflates with a database 
> commit.
> 
> A batch can embed multiple BEGINs and COMMITs. It's entirely possible for an 
> earlier part of the batch to succeed and commit, then a later part to fail, 
> if that's the case. So that name is IMO wrong.
> 
> Ok, SendQueue seems ok to me as well. Will change it in next version. 
>  
> +"a"?
> 
> Hmm, Can you explain the question please. I don't understand.
> 
> s/of new query/of a new query/
> 
> Thanks for explaining. Will change this too in next version. 

Based on the discussions in this thread, and that a new version hasn’t been
submitted, I’m marking this Returned with Feedback.  Please re-submit the new
version in an upcoming commitfest when ready.

cheers ./daniel

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-09-13 Thread Craig Ringer

On 13 September 2017 at 13:44, Vaishnavi Prabakaran <
vaishnaviprabaka...@gmail.com> wrote:


> Thanks for explaining. Will change this too in next version.
>
>
Thankyou, a lot, for picking up this patch.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-09-12 Thread Vaishnavi Prabakaran

On Wed, Sep 13, 2017 at 3:33 PM, Craig Ringer  wrote:

>
> I really do not like calling it "commit" as that conflates with a database
> commit.
>
> A batch can embed multiple BEGINs and COMMITs. It's entirely possible for
> an earlier part of the batch to succeed and commit, then a later part to
> fail, if that's the case. So that name is IMO wrong.
>

Ok, SendQueue seems ok to me as well. Will change it in next version.



>>> +"a"?
>>>
>>
>> Hmm, Can you explain the question please. I don't understand.
>>
>
> s/of new query/of a new query/
>
>
Thanks for explaining. Will change this too in next version.

Thanks & Regards,
Vaishnavi,
Fujitsu Australia.

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-09-12 Thread Craig Ringer

On 13 September 2017 at 13:06, Vaishnavi Prabakaran <
vaishnaviprabaka...@gmail.com> wrote:

>
>
> On Wed, Aug 23, 2017 at 7:40 PM, Andres Freund  wrote:
>
>>
>>
>>
>> > Am failing to see the benefit in allowing user to set
>> > PQBatchAutoFlush(true|false) property? Is it really needed?
>>
>> I'm inclined not to introduce that for now. If somebody comes up with a
>> convincing usecase and numbers, we can add it later. Libpq API is set in
>> stone, so I'd rather not introduce unnecessary stuff...
>>
>>
> Thanks for reviewing the patch and yes ok.
>
>
>>
>>
>> > +   
>> > +Much like asynchronous query mode, there is no performance
>> disadvantage to
>> > +using batching and pipelining. It increases client application
>> complexity
>> > +and extra caution is required to prevent client/server deadlocks
>> but
>> > +can sometimes offer considerable performance improvements.
>> > +   
>>
>> That's not necessarily true, is it? Unless you count always doing
>> batches of exactly size 1.
>>
>
> Client application complexity is increased in batch mode,because
> application needs to remember the query queue status. Results processing
> can be done at anytime, so the application needs to know till what query,
> the results are consumed.
>
>

Yep. Also, the client/server deadlocks at issue here are a buffer
management issue, and deadlock is probably not exactly the right word. Your
app has to process replies from the server while it's sending queries,
otherwise it can get into a state where it has no room left in its send
buffer, but the server isn't consuming its receive buffer because the
server's send buffer is full. To allow the system to make progress, the
client must read from the client receive buffer.

This isn't an issue when using libpq normally.

PgJDBC has similar issues with its batch mode, but in PgJDBC it's much
worse because there's no non-blocking send available. In libpq you can at
least set your sending socket to non-blocking.



>
> > +   
>> > +Use batches when your application does lots of small
>> > +INSERT, UPDATE and
>> > +DELETE operations that can't easily be
>> transformed into
>> > +operations on sets or into a
>> > +COPY
>> operation.
>> > +   
>>
>> Aren't SELECTs also a major beneficiarry of this?
>>
>
Yes, many individual SELECTs that cannot be assembled into a single more
efficient query would definitely also benefit.


> Hmm, though SELECTs also benefit from batch mode, doing multiple selects
> in batch mode will fill up the memory rapidly and might not be as
> beneficial as other operations listed.
>

Depends on the SELECT. With wide results you'll get less benefit, but even
then you can gain if you're on a high latency network. With "n+1" patterns
and similar, you'll see huge gains.


> Maybe note that multiple batches can be "in flight"?
>> I.e. PQbatchSyncQueue() is about error handling, nothing else? Don't
>> have a great idea, but we might want to rename...
>>
>>
> This function not only does error handling, but also sends the "Sync"
> message to backend. In batch mode, "Sync" message is not sent with every
> query but will
> be sent only via this function to mark the end of implicit transaction.
> Renamed it to PQbatchCommitQueue. Kindly let me know if you think of any
> other better name.
>

I really do not like calling it "commit" as that conflates with a database
commit.

A batch can embed multiple BEGINs and COMMITs. It's entirely possible for
an earlier part of the batch to succeed and commit, then a later part to
fail, if that's the case. So that name is IMO wrong.


>>
>> > +
>> > + 
>> > +  PQbatchSyncQueue
>> > +  
>> > +   PQbatchSyncQueue
>> > +  
>> > + 
>>
>> I wonder why this isn't framed as PQbatchIssue/Send/...()? Syncing seems
>> to mostly make sense from a protocol POV.
>>
>>
> Renamed to PQbatchCommitQueue.
>
>
Per above, strong -1 on that. But SendQueue seems OK, or FlushQueue?


>
>> > + *   Put an idle connection in batch mode. Commands submitted after
>> this
>> > + *   can be pipelined on the connection, there's no requirement to
>> wait for
>> > + *   one to finish before the next is dispatched.
>> > + *
>> > + *   Queuing of new query or syncing during COPY is not allowed.
>>
>> +"a"?
>>
>
> Hmm, Can you explain the question please. I don't understand.
>

s/of new query/of a new query/


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-09-12 Thread Vaishnavi Prabakaran

On Wed, Aug 23, 2017 at 7:40 PM, Andres Freund  wrote:

>
>
>
> > Am failing to see the benefit in allowing user to set
> > PQBatchAutoFlush(true|false) property? Is it really needed?
>
> I'm inclined not to introduce that for now. If somebody comes up with a
> convincing usecase and numbers, we can add it later. Libpq API is set in
> stone, so I'd rather not introduce unnecessary stuff...
>
>
Thanks for reviewing the patch and yes ok.


>
>
> > +   
> > +Much like asynchronous query mode, there is no performance
> disadvantage to
> > +using batching and pipelining. It increases client application
> complexity
> > +and extra caution is required to prevent client/server deadlocks but
> > +can sometimes offer considerable performance improvements.
> > +   
>
> That's not necessarily true, is it? Unless you count always doing
> batches of exactly size 1.
>

Client application complexity is increased in batch mode,because
application needs to remember the query queue status. Results processing
can be done at anytime, so the application needs to know till what query,
the results are consumed.


> +   
> > +Use batches when your application does lots of small
> > +INSERT, UPDATE and
> > +DELETE operations that can't easily be
> transformed into
> > +operations on sets or into a
> > +COPY
> operation.
> > +   
>
> Aren't SELECTs also a major beneficiarry of this?
>


Hmm, though SELECTs also benefit from batch mode, doing multiple selects in
batch mode will fill up the memory rapidly and might not be as beneficial
as other operations listed.



> > +   
> > +Batching is less useful when information from one operation is
> required by the
> > +client before it knows enough to send the next operation.
>
> s/less/not/
>
>
Corrected.


>
> > +   
> > +
> > + The batch API was introduced in PostgreSQL 10.0, but clients using
> PostgresSQL 10.0 version of libpq can
> > + use batches on server versions 8.4 and newer. Batching works on
> any server
> > + that supports the v3 extended query protocol.
> > +
> > +   
>
> Where's the 8.4 coming from?
>
>

I guess it is 7.4 where "PQsendQueryParams" is introduced, and not 8.4.
Corrected.


> +   
> > +
> > + It is best to use batch mode with libpq
> in
> > + non-blocking mode.
> If used in
> > + blocking mode it is possible for a client/server deadlock to
> occur. The
> > + client will block trying to send queries to the server, but the
> server will
> > + block trying to send results from queries it has already processed
> to the
> > + client. This only occurs when the client sends enough queries to
> fill its
> > + output buffer and the server's receive buffer before switching to
> > + processing input from the server, but it's hard to predict exactly
> when
> > + that'll happen so it's best to always use non-blocking mode.
> > +
> > +   
>
> Mention that nonblocking only actually helps if send/recv is done as
> required, and can essentially require unbound memory?  We probably
> should either document or implement some smarts about when to signal
> read/write readyness. Otherwise we e.g. might be receiving tons of
> result data without having sent the next query - or the other way round.
>
>

Added a statement for caution in documentation and again this is one of the
reason why SELECT query is not so beneficial in batch mode.



> Maybe note that multiple batches can be "in flight"?
> I.e. PQbatchSyncQueue() is about error handling, nothing else? Don't
> have a great idea, but we might want to rename...
>
>
This function not only does error handling, but also sends the "Sync"
message to backend. In batch mode, "Sync" message is not sent with every
query but will
be sent only via this function to mark the end of implicit transaction.
Renamed it to PQbatchCommitQueue. Kindly let me know if you think of any
other better name.



> > +
> > + 
> > +  The client must not assume that work is committed when it
> > +  sends a COMMIT, only when
> the
> > +  corresponding result is received to confirm the commit is
> complete.
> > +  Because errors arrive asynchronously the application needs to be
> able to
> > +  restart from the last received committed
> change and
> > +  resend work done after that point if something goes wrong.
> > + 
> > +
>
> This seems fairly independent of batching.
>
>
Yes and the reason why is it explicitly specified for batch mode is that if
more than one explicit transactions are used in Single batch, then failure
of one transaction will lead to skipping the consequent transactions until
the end of current batch is reached. This behavior is specific to batch
mode, so adding a precautionary note here is needed I think.



> > +   
> > +
> > +   
> > +Interleaving result processing and query dispatch
> > +
> > +
> > + To avoid deadlocks on large batches the client should be
>

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-08-23 Thread Andres Freund

Hi,

On 2017-08-10 15:23:06 +1000, Vaishnavi Prabakaran wrote:
> Andres Freund wrote :
> > If you were to send a gigabyte of queries, it'd buffer them all up in
> memory... So some
> >more intelligence is going to be needed.
> 
> Firstly, sorry for the delayed response as I got busy with other
> activities.

No worries - development of new features was slowed down anyway, due to
the v10 feature freeze.


> To buffer up the queries before flushing them to the socket, I think
> "conn->outCount>=65536" is ok to use, as 64k is considered to be safe in
> Windows as per comments in pqSendSome() API.
> 
> Attached the code patch to replace pqFlush calls with pqBatchFlush in the
> asynchronous libpq function calls flow.
> Still pqFlush is used in "PQbatchSyncQueue" and
> "PQexitBatchMode" functions.


> Am failing to see the benefit in allowing user to set
> PQBatchAutoFlush(true|false) property? Is it really needed?

I'm inclined not to introduce that for now. If somebody comes up with a
convincing usecase and numbers, we can add it later. Libpq API is set in
stone, so I'd rather not introduce unnecessary stuff...



> +   
> +Much like asynchronous query mode, there is no performance disadvantage 
> to
> +using batching and pipelining. It increases client application complexity
> +and extra caution is required to prevent client/server deadlocks but
> +can sometimes offer considerable performance improvements.
> +   

That's not necessarily true, is it? Unless you count always doing
batches of exactly size 1.


> +   
> +Batching is most useful when the server is distant, i.e. network latency
> +(ping time) is high, and when many small operations are 
> being performed in
> +rapid sequence. There is usually less benefit in using batches when each
> +query takes many multiples of the client/server round-trip time to 
> execute.
> +A 100-statement operation run on a server 300ms round-trip-time away 
> would take
> +30 seconds in network latency alone without batching; with batching it 
> may spend
> +as little as 0.3s waiting for results from the server.
> +   

I'd add a remark that this is frequently beneficial even in cases of
minimal latency - as e.g. shown by the numbers I presented upthread.


> +   
> +Use batches when your application does lots of small
> +INSERT, UPDATE and
> +DELETE operations that can't easily be transformed 
> into
> +operations on sets or into a
> +COPY operation.
> +   

Aren't SELECTs also a major beneficiarry of this?


> +   
> +Batching is less useful when information from one operation is required 
> by the
> +client before it knows enough to send the next operation.

s/less/not/


> +   
> +
> + The batch API was introduced in PostgreSQL 10.0, but clients using 
> PostgresSQL 10.0 version of libpq can
> + use batches on server versions 8.4 and newer. Batching works on any 
> server
> + that supports the v3 extended query protocol.
> +
> +   

Where's the 8.4 coming from?


> +   
> +The client uses libpq's asynchronous query functions to dispatch work,
> +marking the end of each batch with PQbatchSyncQueue.
> +And to get results, it uses PQgetResult and
> +PQbatchProcessQueue. It may eventually exit
> +batch mode with PQexitBatchMode once all results are
> +processed.
> +   
> +
> +   
> +
> + It is best to use batch mode with libpq in
> + non-blocking mode. If 
> used in
> + blocking mode it is possible for a client/server deadlock to occur. The
> + client will block trying to send queries to the server, but the server 
> will
> + block trying to send results from queries it has already processed to 
> the
> + client. This only occurs when the client sends enough queries to fill 
> its
> + output buffer and the server's receive buffer before switching to
> + processing input from the server, but it's hard to predict exactly when
> + that'll happen so it's best to always use non-blocking mode.
> +
> +   

Mention that nonblocking only actually helps if send/recv is done as
required, and can essentially require unbound memory?  We probably
should either document or implement some smarts about when to signal
read/write readyness. Otherwise we e.g. might be receiving tons of
result data without having sent the next query - or the other way round.


> +   
> +Issuing queries
> +
> +
> + After entering batch mode the application dispatches requests
> + using normal asynchronous libpq functions 
> such as 
> + PQsendQueryParams, 
> PQsendPrepare,
> + PQsendQueryPrepared, 
> PQsendDescribePortal,
> + PQsendDescribePrepared.
> + The asynchronous requests are followed by a  + 
> linkend="libpq-PQbatchSyncQueue">PQbatchSyncQueue(conn)
>  call to mark
> + the end of the batch. The client does not need to 
> call
> + PQgetResult immediately after dispatching each
> + operation. Result

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-08-09 Thread Vaishnavi Prabakaran

Andres Freund wrote :
> If you were to send a gigabyte of queries, it'd buffer them all up in
memory... So some
>more intelligence is going to be needed.

Firstly, sorry for the delayed response as I got busy with other
activities.

To buffer up the queries before flushing them to the socket, I think
"conn->outCount>=65536" is ok to use, as 64k is considered to be safe in
Windows as per comments in pqSendSome() API.

Attached the code patch to replace pqFlush calls with pqBatchFlush in the
asynchronous libpq function calls flow.
Still pqFlush is used in "PQbatchSyncQueue" and
"PQexitBatchMode" functions.

Am failing to see the benefit in allowing user to set
PQBatchAutoFlush(true|false) property? Is it really needed?

Thanks & Regards,
Vaishnavi,
Fujitsu Australia.


0001-Pipelining-batch-support-for-libpq-code-v12.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-22 Thread Daniel Verite

Andres Freund wrote:

> > One option may be to leave that decision to the user by providing a
> > PQBatchAutoFlush(true|false) property, along with a PQBatchFlush()
> > function.
> 
> What'd be the difference between PQflush() and PQbatchFlush()?

I guess no difference, I was just not seeing that libpq already provides
this functionality...

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-22 Thread Andres Freund

Hi,

On 2017-06-22 13:43:35 +0200, Daniel Verite wrote:
> But OTOH there are certainly batch workloads where it will be preferrable
> for the first query to reach the server ASAP, rather than waiting to be
> coalesced with the next ones.

Is that really something people expect from a batch API?  I suspect it's
not really, and nothing would stop one from adding PQflush() or similar
calls if desirable anyway.

FWIW, the way I did that in the hack clearly isn't ok: If you were to
send a gigabyte of queries, it'd buffer them all up in memory... So some
more intelligence is going to be needed.

> libpq is not going to know what's best.
> One option may be to leave that decision to the user by providing a
> PQBatchAutoFlush(true|false) property, along with a PQBatchFlush()
> function.

What'd be the difference between PQflush() and PQbatchFlush()?

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-22 Thread Daniel Verite

Andres Freund wrote:

-   if (pqFlush(conn) < 0)
-   goto sendFailed;
+   if (conn->batch_status == PQBATCH_MODE_OFF)
+   {
+   /*
+* Give the data a push.  In nonblock mode, don't complain if
we're unable
+* to send it all; PQgetResult() will do any additional
flushing needed.
+*/
+   if (pqFlush(conn) < 0)
+   goto sendFailed;
+   }

Seems to be responsible for roughly an 1.7x speedup in tps and equivalent
decrease in latency, based on the "progress" info.
I wonder how much of that corresponds to a decrease in the number of
packets versus the number of syscalls. Both matter, I guess.

But OTOH there are certainly batch workloads where it will be preferrable
for the first query to reach the server ASAP, rather than waiting to be
coalesced with the next ones.
libpq is not going to know what's best.
One option may be to leave that decision to the user by providing a
PQBatchAutoFlush(true|false) property, along with a PQBatchFlush()
function. Maybe we could even let the user set the size of the sending
buffer, so those who really want to squeeze performance may tune it
for their network and workload.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-22 Thread Daniel Verite

Craig Ringer wrote:

> The kernel will usually do some packet aggregation unless we use
> TCP_NODELAY (which we don't and shouldn't)

Not sure. As a point of comparison, Oracle has it as a tunable
parameter (TCP.NODELAY), and they changed its default from
No to Yes starting from their 10g R2.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-21 Thread Andres Freund

On 2017-06-21 18:07:21 -0700, Andres Freund wrote:
> On 2017-06-22 09:03:05 +0800, Craig Ringer wrote:
> > On 22 June 2017 at 08:29, Andres Freund  wrote:
> > 
> > > I.e. we're doing tiny write send() syscalls (they should be coalesced)
> > 
> > That's likely worth doing, but can probably wait for a separate patch.
> 
> I don't think so, we should get this right, it could have API influence.
> 
> 
> > The kernel will usually do some packet aggregation unless we use
> > TCP_NODELAY (which we don't and shouldn't), and the syscall overhead
> > is IMO not worth worrying about just yet.
> 
> 1)
>   /*
>* Select socket options: no delay of 
> outgoing data for
>* TCP sockets, nonblock mode, 
> close-on-exec. Fail if any
>* of this fails.
>*/
>   if (!IS_AF_UNIX(addr_cur->ai_family))
>   {
>   if (!connectNoDelay(conn))
>   {
>   pqDropConnection(conn, 
> true);
>   conn->addr_cur = 
> addr_cur->ai_next;
>   continue;
>   }
>   }
> 
> 2) Even if nodelay weren't set, this can still lead to smaller packets
>being sent, because you start sending normal sized tcp packets,
>rather than jumbo ones, even if configured (pretty common these
>days).
> 
> 3) Syscall overhead is actually quite significant.

Proof of the pudding:

pgbench of 10 pgbench select statements in a batch:

as submitted by Daniel:

pgbench -h localhost -M prepared -S -n -c 16 -j 16 -T 1 -P 1 -f 
~/tmp/pgbench-select-only-batch.sq
progress: 1.0 s, 24175.5 tps, lat 0.647 ms stddev 0.782
progress: 2.0 s, 27737.6 tps, lat 0.577 ms stddev 0.625
progress: 3.0 s, 28853.3 tps, lat 0.554 ms stddev 0.619
progress: 4.0 s, 26660.8 tps, lat 0.600 ms stddev 0.776
progress: 5.0 s, 30023.8 tps, lat 0.533 ms stddev 0.484
progress: 6.0 s, 29959.3 tps, lat 0.534 ms stddev 0.450
progress: 7.0 s, 29944.9 tps, lat 0.534 ms stddev 0.536
progress: 8.0 s, 30137.7 tps, lat 0.531 ms stddev 0.533
progress: 9.0 s, 30285.2 tps, lat 0.528 ms stddev 0.479
progress: 10.0 s, 30228.7 tps, lat 0.529 ms stddev 0.460
progress: 11.0 s, 29921.4 tps, lat 0.534 ms stddev 0.613
progress: 12.0 s, 29982.4 tps, lat 0.533 ms stddev 0.510
progress: 13.0 s, 29247.4 tps, lat 0.547 ms stddev 0.526
progress: 14.0 s, 28757.3 tps, lat 0.556 ms stddev 0.635
progress: 15.0 s, 29035.3 tps, lat 0.551 ms stddev 0.523
^C

sample vmstat:
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa st
19  0  0 488992 787332 2355867600 0 0 9720 455099 65 35  0  
0  0

(i.e. ~450k context switches)

hackily patched:
pgbench -h localhost -M prepared -S -n -c 16 -j 16 -T 1 -P 1 -f 
~/tmp/pgbench-select-only-batch.sq
progress: 1.0 s, 40545.2 tps, lat 0.386 ms stddev 0.625
progress: 2.0 s, 48158.0 tps, lat 0.332 ms stddev 0.277
progress: 3.0 s, 50125.7 tps, lat 0.319 ms stddev 0.204
progress: 4.0 s, 50740.6 tps, lat 0.315 ms stddev 0.250
progress: 5.0 s, 50795.6 tps, lat 0.315 ms stddev 0.246
progress: 6.0 s, 51195.6 tps, lat 0.312 ms stddev 0.207
progress: 7.0 s, 50746.7 tps, lat 0.315 ms stddev 0.264
progress: 8.0 s, 50619.1 tps, lat 0.316 ms stddev 0.250
progress: 9.0 s, 50619.4 tps, lat 0.316 ms stddev 0.228
progress: 10.0 s, 46967.8 tps, lat 0.340 ms stddev 0.499
progress: 11.0 s, 50480.1 tps, lat 0.317 ms stddev 0.239
progress: 12.0 s, 50242.5 tps, lat 0.318 ms stddev 0.286
progress: 13.0 s, 49912.7 tps, lat 0.320 ms stddev 0.266
progress: 14.0 s, 49841.7 tps, lat 0.321 ms stddev 0.271
progress: 15.0 s, 49807.1 tps, lat 0.321 ms stddev 0.248
^C

sample vmstat:
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa st
23  0  0 482008 787312 2355899600 0 0 8219 105097 87 14  0  
0  0

(i.e. ~100k context switches)

That's *localhost*.


It's completely possible that I've screwed something up here, I didn't
test it besides running pgbench, but the send/recv'd data looks like
it's similar amounts of data, just fewer syscalls.

Greetings,

Andres Freund
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e498ad61e5..aeed1649ce 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -2352,14 +2352,17 @@ doCustom(TState *thread, CState *st, StatsData *agg)
 if (debug)
 	fprintf(stderr, "client %d receiving\n", st->id);
 
-if (!PQconsumeInput(st->con))
-{/* there's something wrong */
-	commandFailed(st, "perhaps the backend died while processing");
-

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-21 Thread Craig Ringer

On 22 June 2017 at 09:07, Andres Freund  wrote:
> On 2017-06-22 09:03:05 +0800, Craig Ringer wrote:
>> On 22 June 2017 at 08:29, Andres Freund  wrote:
>>
>> > I.e. we're doing tiny write send() syscalls (they should be coalesced)
>>
>> That's likely worth doing, but can probably wait for a separate patch.
>
> I don't think so, we should get this right, it could have API influence.
>
>
>> The kernel will usually do some packet aggregation unless we use
>> TCP_NODELAY (which we don't and shouldn't), and the syscall overhead
>> is IMO not worth worrying about just yet.
>
> 1)
> /*
>  * Select socket options: no delay of 
> outgoing data for
>  * TCP sockets, nonblock mode, 
> close-on-exec. Fail if any
>  * of this fails.
>  */
> if (!IS_AF_UNIX(addr_cur->ai_family))
> {
> if (!connectNoDelay(conn))
> {
> 
> pqDropConnection(conn, true);
> conn->addr_cur = 
> addr_cur->ai_next;
> continue;
> }
> }
>
> 2) Even if nodelay weren't set, this can still lead to smaller packets
>being sent, because you start sending normal sized tcp packets,
>rather than jumbo ones, even if configured (pretty common these
>days).
>
> 3) Syscall overhead is actually quite significant.

Fair enough, and *headdesk* re not checking NODELAY. I thought I'd
checked for our use of that before, but I must've remembered wrong.

We could use TCP_CORK but it's not portable and it'd be better to just
collect up a buffer to dispatch.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-21 Thread Andres Freund

On 2017-06-22 09:03:05 +0800, Craig Ringer wrote:
> On 22 June 2017 at 08:29, Andres Freund  wrote:
> 
> > I.e. we're doing tiny write send() syscalls (they should be coalesced)
> 
> That's likely worth doing, but can probably wait for a separate patch.

I don't think so, we should get this right, it could have API influence.

> The kernel will usually do some packet aggregation unless we use
> TCP_NODELAY (which we don't and shouldn't), and the syscall overhead
> is IMO not worth worrying about just yet.

1)
/*
 * Select socket options: no delay of 
outgoing data for
 * TCP sockets, nonblock mode, 
close-on-exec. Fail if any
 * of this fails.
 */
if (!IS_AF_UNIX(addr_cur->ai_family))
{
if (!connectNoDelay(conn))
{
pqDropConnection(conn, 
true);
conn->addr_cur = 
addr_cur->ai_next;
continue;
}
}

2) Even if nodelay weren't set, this can still lead to smaller packets
   being sent, because you start sending normal sized tcp packets,
   rather than jumbo ones, even if configured (pretty common these
   days).

3) Syscall overhead is actually quite significant.

- Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-21 Thread Craig Ringer

On 22 June 2017 at 08:29, Andres Freund  wrote:

> I.e. we're doing tiny write send() syscalls (they should be coalesced)

That's likely worth doing, but can probably wait for a separate patch.
The kernel will usually do some packet aggregation unless we use
TCP_NODELAY (which we don't and shouldn't), and the syscall overhead
is IMO not worth worrying about just yet.

> and then completely unnecessarily call recv() over and over again
> without polling.  To me it looks very much like we're just doing either
> exactly once per command...

Yeah, that looks suspect.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-21 Thread Andres Freund

On 2017-06-21 16:40:48 -0700, Andres Freund wrote:
> On 2017-06-20 17:51:23 +0200, Daniel Verite wrote:
> > Andres Freund wrote:
> > 
> > > FWIW, I still think this needs a pgbench or similar example integration,
> > > so we can actually properly measure the benefits.
> > 
> > Here's an updated version of the patch I made during review,
> > adding \beginbatch and \endbatch to pgbench.
> > The performance improvement appears clearly
> > with a custom script of this kind:
> >   \beginbatch
> >  UPDATE pgbench_branches SET bbalance = bbalance + 1 WHERE bid = 0;
> >  ..above repeated 1000 times...
> >   \endbatch
> > 
> > versus the same with a BEGIN; END; pair instead of \beginbatch \endbatch
> > 
> > On localhost on my desktop I tend to see a 30% difference in favor
> > of the batch mode with that kind of test.
> > On slower networks there are much bigger differences.
> 
> This is seriously impressive.  Just using the normal pgbench mixed
> workload, wrapping a whole transaction into a batch *doubles* the
> throughput.  And that's locally over a unix socket - the gain over
> actual network will be larger.

I've not analyzed this further, but something with the way network is
done isn't yet quite right either in the pgbench patch or in the libpq
patch.  You'll currently get IO like:

sendto(3, "B\0\0\0\22\0P1_2\0\0\0\0\0\0\1\0\0D\0\0\0\6P\0E\0\0\0\t\0"..., 36, 
MSG_NOSIGNAL, NULL, 0) = 36
sendto(3, "B\0\0\0\35\0P1_4\0\0\0\0\1\0\0\0\0073705952\0\1\0\0D\0"..., 47, 
MSG_NOSIGNAL, NULL, 0) = 47
sendto(3, "B\0\0\0\35\0P1_6\0\0\0\0\1\0\0\0\0077740854\0\1\0\0D\0"..., 47, 
MSG_NOSIGNAL, NULL, 0) = 47
sendto(3, "B\0\0\0\35\0P1_8\0\0\0\0\1\0\0\0\0071570280\0\1\0\0D\0"..., 47, 
MSG_NOSIGNAL, NULL, 0) = 47
sendto(3, "B\0\0\0\36\0P1_10\0\0\0\0\1\0\0\0\0072634305\0\1\0\0D"..., 48, 
MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\36\0P1_12\0\0\0\0\1\0\0\0\0078960656\0\1\0\0D"..., 48, 
MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\36\0P1_14\0\0\0\0\1\0\0\0\0073030370\0\1\0\0D"..., 48, 
MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\35\0P1_16\0\0\0\0\1\0\0\0\006376125\0\1\0\0D\0"..., 47, 
MSG_NOSIGNAL, NULL, 0) = 47
sendto(3, "B\0\0\0\36\0P1_18\0\0\0\0\1\0\0\0\0072982423\0\1\0\0D"..., 48, 
MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\36\0P1_20\0\0\0\0\1\0\0\0\0073860195\0\1\0\0D"..., 48, 
MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\36\0P1_22\0\0\0\0\1\0\0\0\0072794433\0\1\0\0D"..., 48, 
MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\36\0P1_24\0\0\0\0\1\0\0\0\0075475271\0\1\0\0D"..., 48, 
MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\23\0P1_25\0\0\0\0\0\0\1\0\0D\0\0\0\6P\0E\0\0\0\t"..., 37, 
MSG_NOSIGNAL, NULL, 0) = 37
sendto(3, "S\0\0\0\4", 5, MSG_NOSIGNAL, NULL, 0) = 5
recvfrom(3, "2\0\0\0\4n\0\0\0\4C\0\0\0\nBEGIN\0002\0\0\0\4T\0\0\0!\0"..., 
16384, 0, NULL, NULL) = 775
recvfrom(3, 0x559a02667ff2, 15630, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667fb1, 15695, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667f6c, 15764, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667f2b, 15829, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667eea, 15894, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667ea9, 15959, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667e68, 16024, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667e24, 16092, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667de3, 16157, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667da2, 16222, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667d5d, 16291, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667d1c, 16356, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667d06, 16378, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)

I.e. we're doing tiny write send() syscalls (they should be coalesced)
and then completely unnecessarily call recv() over and over again
without polling.  To me it looks very much like we're just doing either
exactly once per command...

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-21 Thread Craig Ringer

On 22 Jun. 2017 07:40, "Andres Freund"  wrote:

On 2017-06-20 17:51:23 +0200, Daniel Verite wrote:
>   Andres Freund wrote:
>
> > FWIW, I still think this needs a pgbench or similar example integration,
> > so we can actually properly measure the benefits.
>
> Here's an updated version of the patch I made during review,
> adding \beginbatch and \endbatch to pgbench.
> The performance improvement appears clearly
> with a custom script of this kind:
>   \beginbatch
>  UPDATE pgbench_branches SET bbalance = bbalance + 1 WHERE bid = 0;
>  ..above repeated 1000 times...
>   \endbatch
>
> versus the same with a BEGIN; END; pair instead of \beginbatch \endbatch
>
> On localhost on my desktop I tend to see a 30% difference in favor
> of the batch mode with that kind of test.
> On slower networks there are much bigger differences.

This is seriously impressive.  Just using the normal pgbench mixed
workload, wrapping a whole transaction into a batch *doubles* the
throughput.  And that's locally over a unix socket - the gain over
actual network will be larger.


In my original tests I got over a 300x improvement on WAN :) . I should
check if the same applies with pgbench.

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-21 Thread Andres Freund

On 2017-06-20 17:51:23 +0200, Daniel Verite wrote:
>   Andres Freund wrote:
> 
> > FWIW, I still think this needs a pgbench or similar example integration,
> > so we can actually properly measure the benefits.
> 
> Here's an updated version of the patch I made during review,
> adding \beginbatch and \endbatch to pgbench.
> The performance improvement appears clearly
> with a custom script of this kind:
>   \beginbatch
>  UPDATE pgbench_branches SET bbalance = bbalance + 1 WHERE bid = 0;
>  ..above repeated 1000 times...
>   \endbatch
> 
> versus the same with a BEGIN; END; pair instead of \beginbatch \endbatch
> 
> On localhost on my desktop I tend to see a 30% difference in favor
> of the batch mode with that kind of test.
> On slower networks there are much bigger differences.

This is seriously impressive.  Just using the normal pgbench mixed
workload, wrapping a whole transaction into a batch *doubles* the
throughput.  And that's locally over a unix socket - the gain over
actual network will be larger.

\set nbranches 1 * :scale
\set ntellers 10 * :scale
\set naccounts 10 * :scale
\set aid random(1, :naccounts)
\set bid random(1, :nbranches)
\set tid random(1, :ntellers)
\set delta random(-5000, 5000)
\beginbatch
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, 
:aid, :delta, CURRENT_TIMESTAMP);
END;
\endbatch

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-20 Thread Daniel Verite

Andres Freund wrote:

> FWIW, I still think this needs a pgbench or similar example integration,
> so we can actually properly measure the benefits.

Here's an updated version of the patch I made during review,
adding \beginbatch and \endbatch to pgbench.
The performance improvement appears clearly
with a custom script of this kind:
  \beginbatch
 UPDATE pgbench_branches SET bbalance = bbalance + 1 WHERE bid = 0;
 ..above repeated 1000 times...
  \endbatch

versus the same with a BEGIN; END; pair instead of \beginbatch \endbatch

On localhost on my desktop I tend to see a 30% difference in favor
of the batch mode with that kind of test.
On slower networks there are much bigger differences.

The latest main patch (v10) must also be slightly updated for HEAD,
because of this:
error: patch failed: src/interfaces/libpq/exports.txt:171
v11 attached without any other change.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index c3bd4f9..366f278 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -4656,6 +4656,526 @@ int PQflush(PGconn *conn);
 
  
 
+ 
+  Batch mode and query pipelining
+
+  
+   libpq
+   batch mode
+  
+
+  
+   libpq
+   pipelining
+  
+
+  
+   libpq supports queueing up queries into
+   a pipeline to be executed as a batch on the server. Batching queries allows
+   applications to avoid a client/server round-trip after each query to get
+   the results before issuing the next query.
+  
+
+  
+   An example of batch use may be found in the source distribution in
+   src/test/modules/test_libpq/testlibpqbatch.c.
+  
+
+  
+   When to use batching
+
+   
+Much like asynchronous query mode, there is no performance disadvantage to
+using batching and pipelining. It increases client application complexity
+and extra caution is required to prevent client/server deadlocks but
+can sometimes offer considerable performance improvements.
+   
+
+   
+Batching is most useful when the server is distant, i.e. network latency
+(ping time) is high, and when many small operations are 
being performed in
+rapid sequence. There is usually less benefit in using batches when each
+query takes many multiples of the client/server round-trip time to execute.
+A 100-statement operation run on a server 300ms round-trip-time away would 
take
+30 seconds in network latency alone without batching; with batching it may 
spend
+as little as 0.3s waiting for results from the server.
+   
+
+   
+Use batches when your application does lots of small
+INSERT, UPDATE and
+DELETE operations that can't easily be transformed into
+operations on sets or into a
+COPY operation.
+   
+
+   
+Batching is less useful when information from one operation is required by 
the
+client before it knows enough to send the next operation. The client must
+introduce a synchronisation point and wait for a full client/server
+round-trip to get the results it needs. However, it's often possible to
+adjust the client design to exchange the required information server-side.
+Read-modify-write cycles are especially good candidates; for example:
+
+ BEGIN;
+ SELECT x FROM mytable WHERE id = 42 FOR UPDATE;
+ -- result: x=2
+ -- client adds 1 to x:
+ UPDATE mytable SET x = 3 WHERE id = 42;
+ COMMIT;
+
+could be much more efficiently done with:
+
+ UPDATE mytable SET x = x + 1 WHERE id = 42;
+
+   
+
+   
+
+ The batch API was introduced in PostgreSQL 10.0, but clients using 
PostgresSQL 10.0 version of libpq can
+ use batches on server versions 8.4 and newer. Batching works on any server
+ that supports the v3 extended query protocol.
+
+   
+
+  
+
+  
+   Using batch mode
+
+   
+To issue batches the application must switch
+a connection into batch mode. Enter batch mode with PQenterBatchMode(conn)
 or test
+whether batch mode is active with PQbatchStatus(conn). 
In batch mode only asynchronous operations are permitted, and
+COPY is not recommended as it most likely will trigger 
failure in batch processing. 
+Using any synchronous command execution functions such as 
PQfn,
+PQexec or one of its sibling functions are error 
conditions.
+Functions allowed in batch mode are described in . 
+   
+
+   
+The client uses libpq's asynchronous query functions to dispatch work,
+marking the end of each batch with PQbatchSyncQueue.
+And to get results, it uses PQgetResult and
+PQbatchProcessQueue. It may eventually exit
+batch mode with PQexitBatchMode once all results are
+processed.
+   
+
+   
+
+ It is best to use batch mode with libpq in
+ non-blocking mode. If used 
in
+ blocking mode it is possible for a client/server deadlock to occur. The
+ client will block trying

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-19 Thread Michael Paquier

On Tue, Jun 20, 2017 at 10:43 AM, Craig Ringer  wrote:
> Especially with a 6-week-old baby now

Congratulations!
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-19 Thread Craig Ringer

On 20 June 2017 at 06:49, Andres Freund  wrote:
> On 2017-04-05 15:45:26 -0700, Andres Freund wrote:
>> Hi,
>>
>> On 2017-04-05 17:00:42 +1000, Vaishnavi Prabakaran wrote:
>> > Regarding test patch, I have corrected the test suite after David Steele's
>> > comments.
>> > Also, I would like to mention that a companion patch was submitted by David
>> > Steele up-thread.
>> >
>> > Attached the latest code and test patch.
>>
>> My impression is that this'll need a couple more rounds of review. Given
>> that this'll establish API we'll pretty much ever going to be able to
>> change/remove, I think it'd be a bad idea to rush this into v10.
>> Therefore I propose moving this to the next CF.
>
> Craig, Vaishnavi, everyone else: Are you planning to continue to work on
> this for v11?  I'm willing to do another round, but only if it's
> worthwhile.

I'm happy to work on review, and will try to make some time, but have
to focus primarily on logical rep infrastructure. This patch was a
proof of concept and fun hack for me and while I'm glad folks are
interested, it's not something I can dedicate much time to. Especially
with a 6-week-old baby now

> FWIW, I still think this needs a pgbench or similar example integration,
> so we can actually properly measure the benefits.

I agree. I originally wanted to patch psql, but it's pretty intrusive.
pgbench is likely a better target. Also pg_restore.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-19 Thread Vaishnavi Prabakaran

On Tue, Jun 20, 2017 at 8:49 AM, Andres Freund  wrote:

> On 2017-04-05 15:45:26 -0700, Andres Freund wrote:
> > Hi,
> >
> > On 2017-04-05 17:00:42 +1000, Vaishnavi Prabakaran wrote:
> > > Regarding test patch, I have corrected the test suite after David
> Steele's
> > > comments.
> > > Also, I would like to mention that a companion patch was submitted by
> David
> > > Steele up-thread.
> > >
> > > Attached the latest code and test patch.
> >
> > My impression is that this'll need a couple more rounds of review. Given
> > that this'll establish API we'll pretty much ever going to be able to
> > change/remove, I think it'd be a bad idea to rush this into v10.
> > Therefore I propose moving this to the next CF.
>
> Craig, Vaishnavi, everyone else: Are you planning to continue to work on
> this for v11?  I'm willing to do another round, but only if it's
> worthwhile.
>

Yes, am willing to continue working on this patch for v11.


>
> FWIW, I still think this needs a pgbench or similar example integration,
> so we can actually properly measure the benefits.
>

I will investigate on this further.


Thanks & Regards,
Vaishnavi,
Fujitsu Australia.

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-19 Thread Andres Freund

On 2017-04-05 15:45:26 -0700, Andres Freund wrote:
> Hi,
> 
> On 2017-04-05 17:00:42 +1000, Vaishnavi Prabakaran wrote:
> > Regarding test patch, I have corrected the test suite after David Steele's
> > comments.
> > Also, I would like to mention that a companion patch was submitted by David
> > Steele up-thread.
> > 
> > Attached the latest code and test patch.
> 
> My impression is that this'll need a couple more rounds of review. Given
> that this'll establish API we'll pretty much ever going to be able to
> change/remove, I think it'd be a bad idea to rush this into v10.
> Therefore I propose moving this to the next CF.

Craig, Vaishnavi, everyone else: Are you planning to continue to work on
this for v11?  I'm willing to do another round, but only if it's
worthwhile.

FWIW, I still think this needs a pgbench or similar example integration,
so we can actually properly measure the benefits.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-04-05 Thread Andres Freund

Hi,

On 2017-04-05 17:00:42 +1000, Vaishnavi Prabakaran wrote:
> Regarding test patch, I have corrected the test suite after David Steele's
> comments.
> Also, I would like to mention that a companion patch was submitted by David
> Steele up-thread.
> 
> Attached the latest code and test patch.

My impression is that this'll need a couple more rounds of review. Given
that this'll establish API we'll pretty much ever going to be able to
change/remove, I think it'd be a bad idea to rush this into v10.
Therefore I propose moving this to the next CF.

- Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-04-03 Thread Andres Freund

On 2017-04-04 08:57:33 +0900, Michael Paquier wrote:
> On Tue, Apr 4, 2017 at 8:26 AM, Andres Freund  wrote:
> > On 2017-04-04 09:24:23 +1000, Vaishnavi Prabakaran wrote:
> >> Just quickly, Is it not ok to consider only the code patch for this CF
> >> without test patch?
> >
> > I'd say no, it's not acceptable.  This is too much new code for it not
> > to be tested.
> 
> Doesn't it depend actually?

Well, I didn't make a general statement, I made one about this patch.
And this would add a significant bunch of untested code, and it'll likely
take years till it gets decent coverage outside.


> In the case of this patch, it seems to me that we would have a far
> better portable set of tests if we had a dedicated set of subcommands
> available at psql level, particularly for Windows/MSVC.

That's a really large scope creep imo.  Adding a bunch of user-facing
psql stuff doesn't compare in complexity to running a test across
platforms.  We can just do that from regess.c or such, if that ends up
being a problem..

> If that's a  requirement for this patch so let it be. I am not saying that 
> tests
> are not necessary. They are of course, but in this case having a bit
> more infrastructure would be more be more helpful for users and the
> tests themselves.

I'm not following.


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-04-03 Thread Michael Paquier

On Tue, Apr 4, 2017 at 8:26 AM, Andres Freund  wrote:
> On 2017-04-04 09:24:23 +1000, Vaishnavi Prabakaran wrote:
>> Just quickly, Is it not ok to consider only the code patch for this CF
>> without test patch?
>
> I'd say no, it's not acceptable.  This is too much new code for it not
> to be tested.

Doesn't it depend actually? I would think that the final patch may not
include all the tests implemented if:
- The thread on which a patch has been developed had a set of tests
done and posted with it.
- Including them does not make sense if we have a way to run those
tests more efficiently. Sometimes a bunch of benchmarks or tests are
run on a patch bu for the final result keeping them around does not
make much sense.

In the case of this patch, it seems to me that we would have a far
better portable set of tests if we had a dedicated set of subcommands
available at psql level, particularly for Windows/MSVC. If that's a
requirement for this patch so let it be. I am not saying that tests
are not necessary. They are of course, but in this case having a bit
more infrastructure would be more be more helpful for users and the
tests themselves.

Note that I won't complain either if this set of C tests are included
at the end.
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-04-03 Thread Andres Freund

On 2017-04-04 09:24:23 +1000, Vaishnavi Prabakaran wrote:
> Hi,
> On Tue, Apr 4, 2017 at 7:05 AM, Andres Freund  wrote:
> 
> > On 2017-04-03 14:10:47 +1000, Vaishnavi Prabakaran wrote:
> > > > The CF has been extended until April 7 but time is still growing short.
> > > > Please respond with a new patch by 2017-04-04 00:00 AoE (UTC-12) or
> > this
> > > > submission will be marked "Returned with Feedback".
> > > >
> > > >
> > > Thanks for the information, attached the latest patch resolving one
> > > compilation warning. And, please discard the test patch as it will be
> > > re-implemented later separately.
> >
> > Hm.  If the tests aren't ready yet, it seems we'll have to move this to
> > the next CF.
> >
> >
> Thanks for your review and I will address your review comments and send the
> newer version of patch shortly.

Cool.

> Just quickly, Is it not ok to consider only the code patch for this CF
> without test patch?

I'd say no, it's not acceptable.  This is too much new code for it not
to be tested.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-04-03 Thread Vaishnavi Prabakaran

Hi,
On Tue, Apr 4, 2017 at 7:05 AM, Andres Freund  wrote:

> On 2017-04-03 14:10:47 +1000, Vaishnavi Prabakaran wrote:
> > > The CF has been extended until April 7 but time is still growing short.
> > > Please respond with a new patch by 2017-04-04 00:00 AoE (UTC-12) or
> this
> > > submission will be marked "Returned with Feedback".
> > >
> > >
> > Thanks for the information, attached the latest patch resolving one
> > compilation warning. And, please discard the test patch as it will be
> > re-implemented later separately.
>
> Hm.  If the tests aren't ready yet, it seems we'll have to move this to
> the next CF.
>
>
Thanks for your review and I will address your review comments and send the
newer version of patch shortly.
Just quickly, Is it not ok to consider only the code patch for this CF
without test patch?

Thanks & Regards,
Vaishnavi,
Fujitsu Australia.

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-04-03 Thread Andres Freund

On 2017-04-03 14:10:47 +1000, Vaishnavi Prabakaran wrote:
> > The CF has been extended until April 7 but time is still growing short.
> > Please respond with a new patch by 2017-04-04 00:00 AoE (UTC-12) or this
> > submission will be marked "Returned with Feedback".
> >
> >
> Thanks for the information, attached the latest patch resolving one
> compilation warning. And, please discard the test patch as it will be
> re-implemented later separately.

Hm.  If the tests aren't ready yet, it seems we'll have to move this to
the next CF.


> + 
> +  Batch mode and query pipelining
> +
> +  
> +   libpq
> +   batch mode
> +  
> +
> +  
> +   libpq
> +   pipelining
> +  
> +
> +  
> +   libpq supports queueing up multiple queries 
> into
> +   a pipeline to be executed as a batch on the server. Batching queries 
> allows
> +   applications to avoid a client/server round-trip after each query to get
> +   the results before issuing the next query.
> +  

"queueing .. into a pipeline" sounds weird to me - but I'm not a native
speaker.  Also batching != pipelining.


> +  
> +   When to use batching
> +
> +   
> +Much like asynchronous query mode, there is no performance disadvantage 
> to
> +using batching and pipelining. It increases client application complexity
> +and extra caution is required to prevent client/server deadlocks but
> +offers considerable performance improvements.
> +   

s/offers/can sometimes offer/


> +  
> +   Using batch mode
> +
> +   
> +To issue batches the application must switch
> +libpq into batch mode.

s/libpq/a connection/?


> Enter batch mode with  +
> linkend="libpq-PQbatchBegin">PQbatchBegin(conn) 
> or test
> +whether batch mode is active with  +
> linkend="libpq-PQbatchStatus">PQbatchStatus(conn).
>  In batch mode only  +linkend="libpq-async">asynchronous operations are permitted, and
> +COPY is not recommended as it most likely will 
> trigger failure in batch processing. 
> +(The restriction on COPY is an implementation
> +limit; the PostgreSQL protocol and server can support batched
> COPY).

Hm, I'm unconvinced that that's a useful parenthetical in the libpq
docs.


> +   
> +The client uses libpq's asynchronous query functions to dispatch work,
> +marking the end of each batch with PQbatchQueueSync.
> +Concurrently, it uses PQgetResult and
> +PQbatchQueueProcess to get results.

"Concurrently" imo is a dangerous word, somewhat implying multi-threading.


> +   
> +
> + It is best to use batch mode with libpq in
> + non-blocking mode. If 
> used in
> + blocking mode it is possible for a client/server deadlock to occur. The
> + client will block trying to send queries to the server, but the server 
> will
> + block trying to send results from queries it has already processed to 
> the
> + client. This only occurs when the client sends enough queries to fill 
> its
> + output buffer and the server's receive buffer before switching to
> + processing input from the server, but it's hard to predict exactly when
> + that'll happen so it's best to always use non-blocking mode.
> +
> +   

Such deadlocks are possible just as well with non-blocking mode, unless
one can avoid sending queries and switching to receiving results anytime
in the application code.


> +
> + Batched operations will be executed by the server in the order the 
> client
> + sends them. The server will send the results in the order the statements
> + executed. The server may begin executing the batch before all commands
> + in the batch are queued and the end of batch command is sent. If any
> + statement encounters an error the server aborts the current transaction 
> and
> + skips processing the rest of the batch. Query processing resumes after 
> the
> + end of the failed batch.
> +

What if a batch contains transaction boundaries?


> +   
> +Processing results
> +
> +
> + The client interleaves result
> + processing with sending batch queries, or for small batches may
> + process all results after sending the whole batch.
> +

That's a very long  text, is it not?


> +
> + To get the result of the first batch entry the client must call  + 
> linkend="libpq-PQbatchQueueProcess">PQbatchQueueProcess.
>  It must then call

What does 'QueueProcess' mean? Shouldn't it be 'ProcessQueue'? You're
not enquing a process or processing, right?


> + PQgetResult and handle the results until
> + PQgetResult returns null (or would return null if
> + called).

What is that parenthetical referring to?  IIRC we don't provide any
external way to determine PQgetResult would return NULL.


Have you checked how this API works with PQsetSingleRowMode()?

> +
> + To enter single-row mode, call PQsetSingleRowMode 
> immediately after a
> + successful call of PQbatchQueueProcess. This mode 
> selection is effective 
> +

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-04-02 Thread Vaishnavi Prabakaran

On Sat, Apr 1, 2017 at 2:03 AM, David Steele  wrote:

> Hi,
>
> On 3/30/17 2:12 PM, Daniel Verite wrote:
>
>> Vaishnavi Prabakaran wrote:
>>
>> Hmm, With batch mode, after sending COPY command to server(and server
>>> started processing the query and goes into COPY state) , client does not
>>> immediately read the result , but it keeps sending other queries to the
>>> server. By this time, server already encountered the error
>>> scenario(Receiving different message during COPY state) and sent error
>>> messages
>>>
>>
>> IOW, the test intentionally violates the protocol and then all goes wonky
>> because of that.
>> That's why I was wondering upthread what's it's supposed to test.
>> I mean, regression tests are meant to warn against a desirable behavior
>> being unknowingly changed by new code into an undesirable behavior.
>> Here we have the undesirable behavior to start with.
>> What kind of regression could we fear from that?
>>
>
Yes, completely agree, demonstrating the undesirable behavior is not needed
as documentation gives enough warning to user.
The test patch is decided not to go in for now, but will be re-implemented
with PSQL commands later. So, during the re-implementation of test patch, I
will remove this test. Thanks .


>
> The CF has been extended until April 7 but time is still growing short.
> Please respond with a new patch by 2017-04-04 00:00 AoE (UTC-12) or this
> submission will be marked "Returned with Feedback".
>
>
Thanks for the information, attached the latest patch resolving one
compilation warning. And, please discard the test patch as it will be
re-implemented later separately.


Thanks & Regards,
Vaishnavi,
Fujitsu Australia.


0001-Pipelining-batch-support-for-libpq-code-v9.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-31 Thread David Steele


Hi,

On 3/30/17 2:12 PM, Daniel Verite wrote:

Vaishnavi Prabakaran wrote:


Hmm, With batch mode, after sending COPY command to server(and server
started processing the query and goes into COPY state) , client does not
immediately read the result , but it keeps sending other queries to the
server. By this time, server already encountered the error
scenario(Receiving different message during COPY state) and sent error
messages


IOW, the test intentionally violates the protocol and then all goes wonky
because of that.
That's why I was wondering upthread what's it's supposed to test.
I mean, regression tests are meant to warn against a desirable behavior
being unknowingly changed by new code into an undesirable behavior.
Here we have the undesirable behavior to start with.
What kind of regression could we fear from that?


The CF has been extended until April 7 but time is still growing short. 
Please respond with a new patch by 2017-04-04 00:00 AoE (UTC-12) or this 
submission will be marked "Returned with Feedback".


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-30 Thread Daniel Verite

Vaishnavi Prabakaran wrote:

> Hmm, With batch mode, after sending COPY command to server(and server
> started processing the query and goes into COPY state) , client does not
> immediately read the result , but it keeps sending other queries to the
> server. By this time, server already encountered the error
> scenario(Receiving different message during COPY state) and sent error
> messages

IOW, the test intentionally violates the protocol and then all goes wonky
because of that.
That's why I was wondering upthread what's it's supposed to test.
I mean, regression tests are meant to warn against a desirable behavior
being unknowingly changed by new code into an undesirable behavior.
Here we have the undesirable behavior to start with.
What kind of regression could we fear from that?

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-29 Thread Vaishnavi Prabakaran

On Thu, Mar 30, 2017 at 12:08 PM, Michael Paquier  wrote:

> On Wed, Mar 29, 2017 at 12:40 PM, Vaishnavi Prabakaran
>  wrote:
> > Michael Paquier wrote:
> >>Could you as well update src/tools/msvc/vcregress.pl, aka the routine
> >>modulescheck so as this new test is skipped. I am sure that nobody
> >>will scream if this test is not run on Windows, but the buildfarm will
> >>complain if the patch is let in its current state.
> >
> > Thanks for the finding. Hmm, modulescheck will not pick up test_libpq as
> > "subdircheck" will fail for this module(because it does not have
> > "sql"/"expected" folders in it) and hence it will be skipped.
> > But, Modified "Mkvcbuild.pm" to exclude "test_libpq" module, as this test
> > anyways will not be run in Windows and also because it uses linux
> specific
> > APIs(gettimeofday , timersub).
>
> src/port/gettimeofday.c extends things on Windows, and we could
> consider to just drop the timing portion for simplicity (except
> Windows I am not seeing any platform missing timersub). But that's
> just a point of detail. Or we could just drop those tests from the
> final patch, and re-implement them after having some psql-level
> subcommands. That would far better for portability.
>

Yes agree, having tests with psql meta commands will be more easy to
understand also.  And, that is one of the reason I separated the patch into
two - code and test . So, yes, we can drop the test patch for now.


>
> > There are 2 cases tested here:
> >
> > 1. Example for the case - Using COPY command in batch mode will trigger
> > failure. (Specified in documentation)
> > Failure can be seen only when processing the copy command's results. That
> > is, after dispatching COPY command, server goes into COPY state , but the
> > client dispatched another query in batch mode.
> >
> > Below error messages belongs to this case :
> > Error status and message got from server due to COPY command failure are
> :
> > PGRES_FATAL_ERROR ERROR:  unexpected message type 0x50 during COPY from
> > stdin
> > CONTEXT:  COPY batch_demo, line 1
> >
> > message type 0x5a arrived from server while idle
>
> Such messages are really confusing to the user as they refer to the
> protocol going out of sync. I would think that something like "cannot
> process COPY results during a batch processing" would be cleaner.
> Isn't some more error handling necessary in GetCopyStart()?


Hmm, With batch mode, after sending COPY command to server(and server
started processing the query and goes into COPY state) , client does not
immediately read the result , but it keeps sending other queries to the
server. By this time, server already encountered the error
scenario(Receiving different message during COPY state) and sent error
messages. So, when client starts reading the result, already error messages
sent by server are present in socket.  I think trying to handle during
result processing is more like masking the server's error message with new
error message and I think it is not appropriate.



>
>
> Attached the latest code and test patches.
>
> For me the test is still broken:
> ok 1 - testlibpqbatch disallowed_in_batch
> ok 2 - testlibpqbatch simple_batch
> ok 3 - testlibpqbatch multi_batch
> ok 4 - testlibpqbatch batch_abort
> ok 5 - testlibpqbatch timings
> not ok 6 - testlibpqbatch copyfailure
>
> Still broken here. I can see that this patch is having enough
> safeguards in PQbatchBegin() and PQsendQueryStart(), but this issue is
> really pointing out at something...
>
>
I will check this one with fresh setup again and I guess that this should
not block the code patch.

Thanks & Regards,
Vaishnavi
Fujitsu Australia.

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-29 Thread Michael Paquier

On Wed, Mar 29, 2017 at 12:40 PM, Vaishnavi Prabakaran
 wrote:
> Michael Paquier wrote:
>>Could you as well update src/tools/msvc/vcregress.pl, aka the routine
>>modulescheck so as this new test is skipped. I am sure that nobody
>>will scream if this test is not run on Windows, but the buildfarm will
>>complain if the patch is let in its current state.
>
> Thanks for the finding. Hmm, modulescheck will not pick up test_libpq as
> "subdircheck" will fail for this module(because it does not have
> "sql"/"expected" folders in it) and hence it will be skipped.
> But, Modified "Mkvcbuild.pm" to exclude "test_libpq" module, as this test
> anyways will not be run in Windows and also because it uses linux specific
> APIs(gettimeofday , timersub).

src/port/gettimeofday.c extends things on Windows, and we could
consider to just drop the timing portion for simplicity (except
Windows I am not seeing any platform missing timersub). But that's
just a point of detail. Or we could just drop those tests from the
final patch, and re-implement them after having some psql-level
subcommands. That would far better for portability.

> There are 2 cases tested here:
>
> 1. Example for the case - Using COPY command in batch mode will trigger
> failure. (Specified in documentation)
> Failure can be seen only when processing the copy command's results. That
> is, after dispatching COPY command, server goes into COPY state , but the
> client dispatched another query in batch mode.
>
> Below error messages belongs to this case :
> Error status and message got from server due to COPY command failure are :
> PGRES_FATAL_ERROR ERROR:  unexpected message type 0x50 during COPY from
> stdin
> CONTEXT:  COPY batch_demo, line 1
>
> message type 0x5a arrived from server while idle

Such messages are really confusing to the user as they refer to the
protocol going out of sync. I would think that something like "cannot
process COPY results during a batch processing" would be cleaner.
Isn't some more error handling necessary in GetCopyStart()?

> Attached the latest code and test patches.

For me the test is still broken:
ok 1 - testlibpqbatch disallowed_in_batch
ok 2 - testlibpqbatch simple_batch
ok 3 - testlibpqbatch multi_batch
ok 4 - testlibpqbatch batch_abort
ok 5 - testlibpqbatch timings
not ok 6 - testlibpqbatch copyfailure

Still broken here. I can see that this patch is having enough
safeguards in PQbatchBegin() and PQsendQueryStart(), but this issue is
really pointing out at something...
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-28 Thread Daniel Verite

Michael Paquier wrote:

> # Running: testlibpqbatch dbname=postgres 1 copyfailure
> dispatching SELECT query failed: cannot queue commands during COPY
> 
> COPYBUF: 5
> 
> Error status and message got from server due to COPY command failure
> are : PGRES_FATAL_ERROR ERROR:  unexpected message type 0x50 during
> COPY from stdin
> CONTEXT:  COPY batch_demo, line 1

Same result here.

BTW the doc says: 
  "In batch mode only asynchronous operations are permitted, and COPY is
   not recommended as it most likely will trigger failure in batch
processing"
Yet it seems that the test assumes that it should work nonetheless.
I don't quite understand what we expect of this test, given what's
documented. Or what am I missing?

While looking at the regress log, I noticed multiple spurious
test_singlerowmode tests among the others. Should be fixed with:

--- a/src/test/modules/test_libpq/testlibpqbatch.c
+++ b/src/test/modules/test_libpq/testlibpqbatch.c
@@ -1578,6 +1578,7 @@ main(int argc, char **argv)
run_batch_abort = 0;
run_timings = 0;
run_copyfailure = 0;
+   run_singlerowmode = 0;
if (strcmp(argv[3], "disallowed_in_batch") == 0)
run_disallowed_in_batch = 1;
else if (strcmp(argv[3], "simple_batch") == 0)

There's also the fact that this test doesn't care whether it fails 
(compare with all the "goto fail" of the other tests).

And it happens that PQsetSingleRowMode() fails after the call to
PQbatchQueueProcess() that instantiates a PGresult
of status PGRES_BATCH_END to indicate the end of the batch,

To me this shows how this way of setting the single row mode is not ideal.
In order to avoid the failure, the loop should know in advance 
what kind of result it's going to dequeue before calling PQgetResult(),
which doesn't feel right.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-28 Thread Michael Paquier

On Tue, Mar 28, 2017 at 1:57 PM, Vaishnavi Prabakaran
 wrote:
> Thanks Craig and Michael for confirming that "PQsetSingleRowMode" should be
> called right after "PQbatchQueueProcess".
>
> Michael Paquier wrote:
>
>> It seems to me that
>>this should also be effective only during the fetching of one single
>>result set. When the client moves on to the next item in the queue we
>>should make necessary again a call to PQsetSingleRowMode().
>
> Yes, the current behavior(with V6 code patch) is exactly the same as you
> described above. PQsetSingleRowMode() should be called each time after
> "PQbatchQueueProcess" to set result processing to single-row mode.

Okay, that's fine for me then.

> Attached the latest patch and here is the RT run result:
> ok 1 - testlibpqbatch disallowed_in_batch
> ok 2 - testlibpqbatch simple_batch
> ok 3 - testlibpqbatch multi_batch
> ok 4 - testlibpqbatch batch_abort
> ok 5 - testlibpqbatch timings
> ok 6 - testlibpqbatch copyfailure
> ok 7 - testlibpqbatch test_singlerowmode
> ok
> All tests successful.
> Files=1, Tests=7,  5 wallclock secs ( 0.01 usr  0.00 sys +  1.79 cusr  0.35
> csys =  2.15 CPU)
> Result: PASS

ok 1 - testlibpqbatch disallowed_in_batch
ok 2 - testlibpqbatch simple_batch
ok 3 - testlibpqbatch multi_batch
ok 4 - testlibpqbatch batch_abort
ok 5 - testlibpqbatch timings
not ok 6 - testlibpqbatch copyfailure
Hm... Not here. I saw something like that a couple of days ago on my
macos laptop and that was related to a variable not initialized. From
001_libpq_async_main.log:
7-03-28 17:05:49.159 JST [31553] t/001_libpq_async.pl LOG:  execute
: copy batch_demo(id) to stdout;
2017-03-28 17:05:49.159 JST [31553] t/001_libpq_async.pl LOG:  execute
: copy batch_demo(itemno) FROM stdin;
2017-03-28 17:05:49.160 JST [31553] t/001_libpq_async.pl ERROR:
unexpected message type 0x50 during COPY from stdin
2017-03-28 17:05:49.160 JST [31553] t/001_libpq_async.pl CONTEXT:
COPY batch_demo, line 1

>From regress_log_001_libpq_async :
ok 5 - testlibpqbatch timings
# Running: testlibpqbatch dbname=postgres 1 copyfailure
dispatching SELECT query failed: cannot queue commands during COPY

COPYBUF: 5

Error status and message got from server due to COPY command failure
are : PGRES_FATAL_ERROR ERROR:  unexpected message type 0x50 during
COPY from stdin
CONTEXT:  COPY batch_demo, line 1

So it seems to me that you are still missing something..

 src/test/modules/Makefile|1 +
 src/test/modules/test_libpq/.gitignore   |5 +
 src/test/modules/test_libpq/Makefile |   25 +
 src/test/modules/test_libpq/README   |1 +
 src/test/modules/test_libpq/t/001_libpq_async.pl |   26 +
 src/test/modules/test_libpq/testlibpqbatch.c | 1661 ++
 6 files changed, 1719 insertions(+)
Could you as well update src/tools/msvc/vcregress.pl, aka the routine
modulescheck so as this new test is skipped. I am sure that nobody
will scream if this test is not run on Windows, but the buildfarm will
complain if the patch is let in its current state.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-27 Thread Vaishnavi Prabakaran

Thanks Craig and Michael for confirming that "PQsetSingleRowMode" should be
called right after "PQbatchQueueProcess".

Michael Paquier wrote:

> It seems to me that
>this should also be effective only during the fetching of one single
>result set. When the client moves on to the next item in the queue we
>should make necessary again a call to PQsetSingleRowMode().

Yes, the current behavior(with V6 code patch) is exactly the same as you
described above. PQsetSingleRowMode() should be called each time after
"PQbatchQueueProcess"
to set result processing to single-row mode.

>src/test/modules/Makefile should include test_libpq.

Yes, added test_libpq to the list in Makefile.

Daniel Verite wrote:

>> Please let me know if you think this is not enough but wanted to update
>> section 33.6 also?

>Yes, if the right place to call PQsetSingleRowMode() is immediately
>after PQbatchQueueProcess(), I think we need to update
>"33.6. Retrieving Query Results Row-By-Row"
>with that information, otherwise what it says is just wrong
>when in batch mode.

Yes, I have updated Chapter 33.6 by adding note for batch mode.

>Also, in 33.5, I suggest to not use the "currently executing
>query" as a synonym for the "query currently being processed"
>to avoid any confusion between what the backend is executing
>and a prior query whose results are being collected by the client
>at the same moment.

Yes correct, I modified the words to "query currently being processed" as
suggested.



> One bit of functionality that does not work in batch mode and is left
> as is by the patch is the PQfn() fast path interface and the large object
> functions that use it.
>
> Currently, calling lo_* functions inside a batch will fail with a message
> that depends on whether the internal lo_initialize() has been successfully
> called before.
>
> If it hasn't, PQerrorMessage() will be:
>"Synchronous command execution functions are not allowed in batch mode"
> which is fine, but it comes by happenstance from lo_initialize()
> calling PQexec() to fetch the function OIDs from pg_catalog.pg_proc.
>
> OTOH, if lo_initialize() has succeeded before, a call to a large
> object function will fail with a different message:
>   "connection in wrong state"
> which is emitted by PQfn() based on conn->asyncStatus != PGASYNC_IDLE
>
> Having an unified error message would be more user friendly.
>
>
Thanks for finding out this and yes, added a new check in PQfn() to give
the same error message - "Synchronous command execution functions are not
allowed in batch mode" when called in batch mode.



> Concerning the doc, when saying in 33.5.2:
>  "In batch mode only asynchronous operations are permitted"
> the server-side functions are indeed ruled out, since PQfn() is
> synchronous, but maybe we should be a bit more explicit
> about that?


> Chapter 34.3 (Large Objects / Client Interfaces) could also
> mention the incompatibility with batch mode.
>
>
Updated 33.5.2 to be more clear about what functions are allowed and what
are not allowed. Updated Chapter 33.3(Large Objects/ Client Interfaces) to
let the user know about the incompatibility with batch mode .


Attached the latest patch and here is the RT run result:
ok 1 - testlibpqbatch disallowed_in_batch
ok 2 - testlibpqbatch simple_batch
ok 3 - testlibpqbatch multi_batch
ok 4 - testlibpqbatch batch_abort
ok 5 - testlibpqbatch timings
ok 6 - testlibpqbatch copyfailure
ok 7 - testlibpqbatch test_singlerowmode
ok
All tests successful.
Files=1, Tests=7,  5 wallclock secs ( 0.01 usr  0.00 sys +  1.79 cusr  0.35
csys =  2.15 CPU)
Result: PASS

Thanks & Regards,
Vaishnavi,
Fujitsu Australia.


0001-Pipelining-batch-support-for-libpq-code-v7.patch
Description: Binary data


0002-Pipelining-batch-support-for-libpq-test-v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-27 Thread Daniel Verite

Vaishnavi Prabakaran wrote:

> Am going to include the test which you shared in the test patch. Please let
> me know if you want to cover anymore specific cases to gain confidence.

One bit of functionality that does not work in batch mode and is left
as is by the patch is the PQfn() fast path interface and the large object
functions that use it.

Currently, calling lo_* functions inside a batch will fail with a message
that depends on whether the internal lo_initialize() has been successfully
called before.

If it hasn't, PQerrorMessage() will be:
   "Synchronous command execution functions are not allowed in batch mode"
which is fine, but it comes by happenstance from lo_initialize()
calling PQexec() to fetch the function OIDs from pg_catalog.pg_proc.

OTOH, if lo_initialize() has succeeded before, a call to a large
object function will fail with a different message:
  "connection in wrong state"
which is emitted by PQfn() based on conn->asyncStatus != PGASYNC_IDLE

Having an unified error message would be more user friendly.

Concerning the doc, when saying in 33.5.2:
 "In batch mode only asynchronous operations are permitted"
the server-side functions are indeed ruled out, since PQfn() is
synchronous, but maybe we should be a bit more explicit
about that?

Chapter 34.3 (Large Objects / Client Interfaces) could also
mention the incompatibility with batch mode.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-27 Thread Daniel Verite

Vaishnavi Prabakaran wrote:

> Please let me know if you think this is not enough but wanted to update
> section 33.6 also?

Yes, if the right place to call PQsetSingleRowMode() is immediately
after PQbatchQueueProcess(), I think we need to update
"33.6. Retrieving Query Results Row-By-Row"
with that information, otherwise what it says is just wrong
when in batch mode.

Also, in 33.5, I suggest to not use the "currently executing
query" as a synonym for the "query currently being processed"
to avoid any confusion between what the backend is executing
and a prior query whose results are being collected by the client
at the same moment.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-27 Thread Michael Paquier

On Mon, Mar 27, 2017 at 4:42 PM, Craig Ringer  wrote:
> On 27 March 2017 at 15:26, Michael Paquier  wrote:
>> On Sat, Mar 25, 2017 at 9:50 PM, Craig Ringer  wrote:
>>> I'm fairly confident that I overlooked single row mode entirely in the
>>> original patch, though it's long enough ago that it's hard for me to
>>> remember exactly.
>>>
>>> I don't really have much of an opinion on the best handling of it.
>>>
>>> I would expect to be setting single-row mode just before I requested
>>> the *result* from the next pending query, since it relates to result
>>> processing rather than query dispatch. But that's about all the
>>> opinion I have here.
>>
>> Yeah, I think that it makes sense to allow users to switch to single
>> row mode before requesting a result in the queue. It seems to me that
>> this should also be effective only during the fetching of one single
>> result set. When the client moves on to the next item in the queue we
>> should make necessary again a call to PQsetSingleRowMode().
>>
>> Regarding the patch, I have spotted the following things in the last version:
>> - src/test/modules/Makefile should include test_libpq.
>> - Regression tests from 0002 are failing:
>> ok 1 - testlibpqbatch disallowed_in_batch
>> ok 2 - testlibpqbatch simple_batch
>> ok 3 - testlibpqbatch multi_batch
>> ok 4 - testlibpqbatch batch_abort
>> ok 5 - testlibpqbatch timings
>> not ok 6 - testlibpqbatch copyfailure
>
> There are only a few more days left of this commit fest.
>
> Things are sounding pretty ready here, with some final cleanups
> pending. It'd be cool to get this into Pg 10 :)

Yes, that's one of the items I'd like to help with by the end of the CF.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-27 Thread Craig Ringer

On 27 March 2017 at 15:26, Michael Paquier  wrote:
> On Sat, Mar 25, 2017 at 9:50 PM, Craig Ringer  wrote:
>> I'm fairly confident that I overlooked single row mode entirely in the
>> original patch, though it's long enough ago that it's hard for me to
>> remember exactly.
>>
>> I don't really have much of an opinion on the best handling of it.
>>
>> I would expect to be setting single-row mode just before I requested
>> the *result* from the next pending query, since it relates to result
>> processing rather than query dispatch. But that's about all the
>> opinion I have here.
>
> Yeah, I think that it makes sense to allow users to switch to single
> row mode before requesting a result in the queue. It seems to me that
> this should also be effective only during the fetching of one single
> result set. When the client moves on to the next item in the queue we
> should make necessary again a call to PQsetSingleRowMode().
>
> Regarding the patch, I have spotted the following things in the last version:
> - src/test/modules/Makefile should include test_libpq.
> - Regression tests from 0002 are failing:
> ok 1 - testlibpqbatch disallowed_in_batch
> ok 2 - testlibpqbatch simple_batch
> ok 3 - testlibpqbatch multi_batch
> ok 4 - testlibpqbatch batch_abort
> ok 5 - testlibpqbatch timings
> not ok 6 - testlibpqbatch copyfailure

There are only a few more days left of this commit fest.

Things are sounding pretty ready here, with some final cleanups
pending. It'd be cool to get this into Pg 10 :)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-27 Thread Michael Paquier

On Sat, Mar 25, 2017 at 9:50 PM, Craig Ringer  wrote:
> I'm fairly confident that I overlooked single row mode entirely in the
> original patch, though it's long enough ago that it's hard for me to
> remember exactly.
>
> I don't really have much of an opinion on the best handling of it.
>
> I would expect to be setting single-row mode just before I requested
> the *result* from the next pending query, since it relates to result
> processing rather than query dispatch. But that's about all the
> opinion I have here.

Yeah, I think that it makes sense to allow users to switch to single
row mode before requesting a result in the queue. It seems to me that
this should also be effective only during the fetching of one single
result set. When the client moves on to the next item in the queue we
should make necessary again a call to PQsetSingleRowMode().

Regarding the patch, I have spotted the following things in the last version:
- src/test/modules/Makefile should include test_libpq.
- Regression tests from 0002 are failing:
ok 1 - testlibpqbatch disallowed_in_batch
ok 2 - testlibpqbatch simple_batch
ok 3 - testlibpqbatch multi_batch
ok 4 - testlibpqbatch batch_abort
ok 5 - testlibpqbatch timings
not ok 6 - testlibpqbatch copyfailure
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-25 Thread Craig Ringer

On 24 March 2017 at 23:21, David Steele  wrote:
> Hi Vaishnavi,
>
> On 3/19/17 9:32 PM, Vaishnavi Prabakaran wrote:
>>
>> On Fri, Mar 17, 2017 at 12:37 AM, Daniel Verite >
>> Please let me know if you think this is not enough but wanted to update
>> section 33.6 also?
>
>
> Daniel, any input here?
>
>> > I would also like to hear Craig's opinion on it before applying this
>> fix
>> > to the original patch, just to make sure am not missing anything
>> here.
>
>
> Craig?

I'm fairly confident that I overlooked single row mode entirely in the
original patch, though it's long enough ago that it's hard for me to
remember exactly.

I don't really have much of an opinion on the best handling of it.

I would expect to be setting single-row mode just before I requested
the *result* from the next pending query, since it relates to result
processing rather than query dispatch. But that's about all the
opinion I have here.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-24 Thread David Steele


Hi Vaishnavi,

On 3/19/17 9:32 PM, Vaishnavi Prabakaran wrote:

On Fri, Mar 17, 2017 at 12:37 AM, Daniel Verite  I would also like to hear Craig's opinion on it before applying this fix
> to the original patch, just to make sure am not missing anything here.


Craig?



Am going to include the test which you shared in the test patch. Please
let me know if you want to cover anymore specific cases to gain
confidence.


I have marked this submission "Waiting for Author" since it appears a 
new patch is required based on input and adding a new test.


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-19 Thread Vaishnavi Prabakaran

On Fri, Mar 17, 2017 at 12:37 AM, Daniel Verite 
wrote:

> Vaishnavi Prabakaran wrote:
>
> > So, attached the alternative fix for this issue.
> > Please share me your thoughts.
>
> I assume you prefer the alternative fix because it's simpler.
>

I would like add one more reason for this fix, I think "PQsetSingleRowMode"
should be called only when the result is ready to be processed and before
starting to consume result as it is documented currently as follows -
"To enter single-row mode, call PQsetSingleRowMode immediately after a
successful call of PQsendQuery (or a sibling function). This mode selection
is effective only for the currently executing query. Then call PQgetResult
repeatedly..."

I agree that first fix (you shared) will allow user to set single-row mode
after PQsendQuery, but it also allows user to set this mode at any time of
batch processing(not necessarily "immediately after PQsendQuery"), also
"mode selection is effective only for the currently executing query" will
be false. Please note that I don't see any problem with this deviation. I
like to outline that documentation here anyways needs an update/note.

 Before going further, I would like to mention that I have modified the
documentation of batch processing( in v6 code patch) as below:
"To enter single-row mode, call PQsetSingleRowMode immediately after a
successful call of PQbatchQueueProcess. This mode selection is effective
only for the currently executing query. For more information on the
use of PQsetSingleRowMode
, refer to Section 33.6, “Retrieving Query Results Row-By-Row”. "

Please let me know if you think this is not enough but wanted to update
section 33.6 also?

>
> > I would also like to hear Craig's opinion on it before applying this fix
> > to the original patch, just to make sure am not missing anything here.
>
> +1
>
> The main question is whether the predicates enforced
> by PQsetSingleRowMode() apply in batch mode in all cases
> when it's legit to call that function. Two predicates
> that may be problematic are:
> if (conn->asyncStatus != PGASYNC_BUSY)
> return 0;
> and
> if (conn->result)
> return 0;
>
> The general case with batch mode is that, from the doc:
> "The client interleaves result processing with sending batch queries"
>

While sending batch queries in middle of result processing, only the query
is appended to the list of queries maintained for batch processing and no
current connection attribute impacting result processing will be changed.
So, calling the PQsetSingleRowMode in-between result processing will fail
as it tries to set single-row mode for currently executing query for which
result processing is already started.

Note that I've not even tested that here,

I've tested
> batching a bunch of queries in a first step and getting the results
> in a second step.
> I am not confident that the above predicates will be true
> in all cases.

Also your alternative fix assumes that we add
> a user-visible exception to PQsetSingleRowMode in batch mode,
> whereby it must not be called as currently documented:
>   "call PQsetSingleRowMode immediately after a successful call of
>PQsendQuery (or a sibling function)"

My gut feeling is that it's not the right direction, I prefer making
> the single-row a per-query attribute internally and keep
> PQsetSingleRowMode's contract unchanged from the
> user's perspective.
>
>
Am going to include the test which you shared in the test patch. Please let
me know if you want to cover anymore specific cases to gain confidence.

Thanks & Regards,
Vaishnavi,
Fujitsu Australia.

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-16 Thread Daniel Verite

Vaishnavi Prabakaran wrote:

> So, attached the alternative fix for this issue.
> Please share me your thoughts.

I assume you prefer the alternative fix because it's simpler.

> I would also like to hear Craig's opinion on it before applying this fix
> to the original patch, just to make sure am not missing anything here.

+1

The main question is whether the predicates enforced
by PQsetSingleRowMode() apply in batch mode in all cases
when it's legit to call that function. Two predicates
that may be problematic are:
if (conn->asyncStatus != PGASYNC_BUSY)
return 0;
and
if (conn->result)
return 0;

The general case with batch mode is that, from the doc:
"The client interleaves result processing with sending batch queries"
Note that I've not even tested that here, I've tested
batching a bunch of queries in a first step and getting the results
in a second step.
I am not confident that the above predicates will be true
in all cases. Also your alternative fix assumes that we add
a user-visible exception to PQsetSingleRowMode in batch mode,
whereby it must not be called as currently documented:
  "call PQsetSingleRowMode immediately after a successful call of 
   PQsendQuery (or a sibling function)"
My gut feeling is that it's not the right direction, I prefer making
the single-row a per-query attribute internally and keep
PQsetSingleRowMode's contract unchanged from the
user's perspective.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-15 Thread Vaishnavi Prabakaran

On Tue, Mar 14, 2017 at 5:50 PM, Vaishnavi Prabakaran <
vaishnaviprabaka...@gmail.com> wrote:

>
>
> On Tue, Mar 14, 2017 at 4:19 AM, Daniel Verite 
> wrote:
>
>>
>> I mean the next iteration of the above while statement. Referring
>> to the doc, that would be the "next batch entry":
>>
>>   " To get the result of the first batch entry the client must call
>>PQbatchQueueProcess. It must then call PQgetResult and handle the
>>results until PQgetResult returns null (or would return null if
>>called). The result from the next batch entry may then be retrieved
>>using PQbatchQueueProcess and the cycle repeated"
>>
>> Attached is a bare-bones testcase showing the problem.
>> As it is, it works, retrieving results for three "SELECT 1"
>> in the same batch.  Now in order to use the single-row
>> fetch mode, consider adding this:
>>
>> r = PQsetSingleRowMode(conn);
>> if (r!=1) {
>>   fprintf(stderr, "PQsetSingleRowMode() failed for i=%d\n", i);
>> }
>>
>> When inserted after the call to PQbatchQueueProcess,
>> which is what I understand you're saying works for you,
>> it fails for me when starting to get the results of the 2nd query
>> and after.
>>
>>
>
> Thanks for explaining the issue in detail and yes, I do see the issue
> using your attached test file.
>
> And I think the problem is with the "parseInput" call at the end of
> PQbatchQueueProcess().
> I don't see the need for "parseInput" to cover the scope of
> PQbatchQueueProcess(), also anyways we do it in PQgetResult().
> This function call changes the asyncstatus of connection to
> READY(following command complete message from backend), so eventually
> PQsetSingleRowMode() is failing. So, attached the alternative fix for this
> issue.
>
Please share me your thoughts.
>
> I would also like to hear Craig's opinion on it before applying this fix
> to the original patch, just to make sure am not missing anything here.
>
>
Attached the code patch applied with the fix explained above and moving the
CF status seeking review.

Thanks & Regards,
Vaishnavi,
Fujitsu Australia.


0001-Pipelining-batch-support-for-libpq-code-v6.patch
Description: Binary data


0002-Pipelining-batch-support-for-libpq-test-v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-14 Thread Vaishnavi Prabakaran

On Tue, Mar 14, 2017 at 4:19 AM, Daniel Verite 
wrote:

>
> I mean the next iteration of the above while statement. Referring
> to the doc, that would be the "next batch entry":
>
>   " To get the result of the first batch entry the client must call
>PQbatchQueueProcess. It must then call PQgetResult and handle the
>results until PQgetResult returns null (or would return null if
>called). The result from the next batch entry may then be retrieved
>using PQbatchQueueProcess and the cycle repeated"
>
> Attached is a bare-bones testcase showing the problem.
> As it is, it works, retrieving results for three "SELECT 1"
> in the same batch.  Now in order to use the single-row
> fetch mode, consider adding this:
>
> r = PQsetSingleRowMode(conn);
> if (r!=1) {
>   fprintf(stderr, "PQsetSingleRowMode() failed for i=%d\n", i);
> }
>
> When inserted after the call to PQbatchQueueProcess,
> which is what I understand you're saying works for you,
> it fails for me when starting to get the results of the 2nd query
> and after.
>
>

Thanks for explaining the issue in detail and yes, I do see the issue using
your attached test file.

And I think the problem is with the "parseInput" call at the end of
PQbatchQueueProcess().
I don't see the need for "parseInput" to cover the scope of
PQbatchQueueProcess(), also anyways we do it in PQgetResult().
This function call changes the asyncstatus of connection to READY(following
command complete message from backend), so eventually PQsetSingleRowMode()
is failing. So, attached the alternative fix for this issue.
Please share me your thoughts.

I would also like to hear Craig's opinion on it before applying this fix to
the original patch, just to make sure am not missing anything here.

Thanks & Regards,
Vaishnavi,
Fujitsu Australia.

0002-single-row-mode-fix.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-13 Thread Daniel Verite

Vaishnavi Prabakaran wrote:

> >  while (QbatchQueueProcess(conn)) {
> >r = PQsetSingleRowMode(conn);
> >if (r!=1) {
> >   fprintf(stderr, "PQsetSingleRowMode() failed");
> >}
> >..

> Thanks for investigating the problem, and could you kindly explain what
> "next iteration" you mean here? Because I don't see any problem in
> following sequence of calls - PQbatchQueueProcess(),PQsetSingleRowMode()
> , PQgetResult()

I mean the next iteration of the above while statement. Referring
to the doc, that would be the "next batch entry":

  " To get the result of the first batch entry the client must call
   PQbatchQueueProcess. It must then call PQgetResult and handle the
   results until PQgetResult returns null (or would return null if
   called). The result from the next batch entry may then be retrieved
   using PQbatchQueueProcess and the cycle repeated"

Attached is a bare-bones testcase showing the problem.
As it is, it works, retrieving results for three "SELECT 1"
in the same batch.  Now in order to use the single-row
fetch mode, consider adding this:

r = PQsetSingleRowMode(conn);
if (r!=1) {
  fprintf(stderr, "PQsetSingleRowMode() failed for i=%d\n", i);
}

When inserted after the call to PQbatchQueueProcess,
which is what I understand you're saying works for you,
it fails for me when starting to get the results of the 2nd query
and after. 

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

test-singlerow-batch.c
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-12 Thread Craig Ringer

On 13 March 2017 at 08:54, Vaishnavi Prabakaran
 wrote:

> Before going with this fix, I would like you to consider the option of
> asking batch processing users(via documentation) to set single-row mode
> before calling PQgetResult().
> Either way we need to fix the documentation part, letting users know how
> they can activate single-row mode while using batch processing.
> Let me know your thoughts.

Thanks for looking into this, it's clear that I didn't cover the combo
of single-row-mode and batch mode adequately.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-12 Thread Vaishnavi Prabakaran

On Sat, Mar 11, 2017 at 12:52 AM, Daniel Verite 
wrote:

>   Hi,
>
> I notice that PQsetSingleRowMode() doesn't work when getting batch results.
>
> The function is documented as:
> " int PQsetSingleRowMode(PGconn *conn);
>
>   This function can only be called immediately after PQsendQuery or one
>   of its sibling functions, before any other operation on the connection
>   such as PQconsumeInput or PQgetResult"
>
> But PQbatchQueueProcess() unconditionally clears conn->singleRowMode,
> so whatever it was when sending the query gets lost, and besides
> other queries might have been submitted in the meantime.
>
> Also if trying to set that mode when fetching like this
>
>  while (QbatchQueueProcess(conn)) {
>r = PQsetSingleRowMode(conn);
>if (r!=1) {
>   fprintf(stderr, "PQsetSingleRowMode() failed");
>}
>..
>
> it might work the first time, but on the next iterations, conn->asyncStatus
> might be PGASYNC_READY, which is a failure condition for
> PQsetSingleRowMode(), so that won't do.
>

Thanks for investigating the problem, and could you kindly explain what
"next iteration" you mean here? Because I don't see any problem in
following sequence of calls - PQbatchQueueProcess(),PQsetSingleRowMode()
, PQgetResult(). Am I missing anything?
Please note that it is MUST to call PQgetResult immediately after
PQbatchQueueProcess() as documented.

>
> ISTM that the simplest fix would be that when in batch mode,
> PQsetSingleRowMode() should register that the last submitted query
> does request that mode, and when later QbatchQueueProcess dequeues
> the batch entry for the corresponding query, this flag would be popped off
> and set as the current mode.
>
> Please find attached the suggested fix, against the v5 of the patch.
>

Before going with this fix, I would like you to consider the option of
asking batch processing users(via documentation) to set single-row mode
before calling PQgetResult().
Either way we need to fix the documentation part, letting users know how
they can activate single-row mode while using batch processing.
Let me know your thoughts.

Best Regards,
Vaishnavi,
Fujitsu Australia.

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-10 Thread Daniel Verite

  Hi,

I notice that PQsetSingleRowMode() doesn't work when getting batch results.  

The function is documented as:
" int PQsetSingleRowMode(PGconn *conn);

  This function can only be called immediately after PQsendQuery or one
  of its sibling functions, before any other operation on the connection
  such as PQconsumeInput or PQgetResult"

But PQbatchQueueProcess() unconditionally clears conn->singleRowMode,
so whatever it was when sending the query gets lost, and besides
other queries might have been submitted in the meantime.

Also if trying to set that mode when fetching like this

 while (QbatchQueueProcess(conn)) {
   r = PQsetSingleRowMode(conn);
   if (r!=1) {
  fprintf(stderr, "PQsetSingleRowMode() failed");
   }
   .. 

it might work the first time, but on the next iterations, conn->asyncStatus
might be PGASYNC_READY, which is a failure condition for
PQsetSingleRowMode(), so that won't do.

ISTM that the simplest fix would be that when in batch mode,
PQsetSingleRowMode() should register that the last submitted query
does request that mode, and when later QbatchQueueProcess dequeues
the batch entry for the corresponding query, this flag would be popped off
and set as the current mode.

Please find attached the suggested fix, against the v5 of the patch.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index 3c0be46..8ddf63d 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -1425,7 +1425,7 @@ PQmakePipelinedCommand(PGconn *conn)
 	}
 	entry->next = NULL;
 	entry->query = NULL;
-
+	entry->singleRowMode = false;
 	return entry;
 }
 
@@ -1783,16 +1783,28 @@ PQsetSingleRowMode(PGconn *conn)
 	/*
 	 * Only allow setting the flag when we have launched a query and not yet
 	 * received any results.
+	 * In batch mode, store the flag in the queue for applying it later.
 	 */
 	if (!conn)
 		return 0;
-	if (conn->asyncStatus != PGASYNC_BUSY)
-		return 0;
 	if (conn->queryclass != PGQUERY_SIMPLE &&
 		conn->queryclass != PGQUERY_EXTENDED)
 		return 0;
 	if (conn->result)
 		return 0;
+	if (conn->batch_status == PQBATCH_MODE_OFF)
+	{
+		if (conn->asyncStatus != PGASYNC_BUSY)
+			return 0;
+	}
+	else
+	{
+		/* apply to the last submitted query in the batch, or fail */
+		if (conn->cmd_queue_tail != NULL)
+			conn->cmd_queue_tail->singleRowMode = true;
+		else
+			return 0;
+	}
 
 	/* OK, set flag */
 	conn->singleRowMode = true;
@@ -2120,9 +2132,8 @@ PQbatchQueueProcess(PGconn *conn)
 	 * Initialize async result-accumulation state */
 	pqClearAsyncResult(conn);
 
-	/* reset single-row processing mode */
-	conn->singleRowMode = false;
-
+	/* Set single-row processing mode */
+	conn->singleRowMode = next_query->singleRowMode;
 
 	conn->last_query = next_query->query;
 	next_query->query = NULL;
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 33f212f..af4f753 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -315,6 +315,7 @@ typedef struct pgCommandQueueEntry
 {
 	PGQueryClass queryclass;	/* Query type; PGQUERY_SYNC for sync msg */
 	char	   *query;			/* SQL command, or NULL if unknown */
+	bool	   singleRowMode;   /* apply single row mode to this query */
 	struct pgCommandQueueEntry *next;
 }	PGcommandQueueEntry;
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-09 Thread Daniel Verite

Vaishnavi Prabakaran wrote:

>if (PQbatchStatus(st->con) != PQBATCH_MODE_ON)
> 
>But, it is better to use if (PQbatchStatus(st->con) ==
> PQBATCH_MODE_OFF) for this verification. Reason is there is one more state
> in batch mode - PQBATCH_MODE_ABORTED. So, if the batch mode is in aborted
> status, this check will still assume that the connection is not in batch
> mode.
> In a same way, "if(PQbatchStatus(st->con) == PQBATCH_MODE_ON)" check is
> better to be modified as "PQbatchStatus(st->con) != PQBATCH_MODE_OFF".

Agreed, these two tests need to be changed to account for the
PQBATCH_MODE_ABORTED case. Fixed in new patch.

> 2.  @@ -2207,7 +2227,47 @@ doCustom(TState *thread, CState *st, StatsData
> *agg)
> + if (PQbatchStatus(st->con) != PQBATCH_MODE_OFF)
> + {
> + commandFailed(st, "already in batch mode");
> + break;
> + }
> 
> This is not required as below PQbatchBegin() internally does this
> verification.
> 
> + PQbatchBegin(st->con);

The point of this test is to specifically forbid a sequence like this
\beginbatch
...(no \endbatch)
\beginbatch
...
and according to the doc for PQbatchBegin, it looks like the return
code won't tell us:
   "Causes a connection to enter batch mode if it is currently idle or
   already in batch mode.
int PQbatchBegin(PGconn *conn);
   Returns 1 for success"

My understanding is that "already in batch mode" is not an error.

   "Returns 0 and has no effect if the connection is not currently
   idle, i.e. it has a result ready, is waiting for more input from
   the server, etc. This function does not actually send anything to
   the server, it just changes the libpq connection state"

> 3. It is better to check the return value of PQbatchBegin() rather than
> assuming success. E.g: PQbatchBegin() will return false if the connection
> is in copy in/out/both states.

Given point #2 above, I think both tests are needed:
   if (PQbatchStatus(st->con) != PQBATCH_MODE_OFF)
{
   commandFailed(st, "already in batch mode");
   break;
}
if (PQbatchBegin(st->con) == 0)
{
   commandFailed(st, "failed to start a batch");
   break;
}

> > 3. In relation to #2, PQsendQuery() is not forbidden in batch mode
> > although I don't think it can work with it, as it's based on the old
> > protocol.
> >
> It is actually forbidden and you can see the error message "cannot
> PQsendQuery in batch mode, use PQsendQueryParams" when you use
> PQsendQuery().

Sorry for blaming the innocent piece of code, looking closer
it's pgbench that does not display the message.
With the simple query protocol, sendCommand() does essentially:

  r = PQsendQuery(st->con, sql);

... other stuff...

  if (r == 0)
  {
if (debug)
  fprintf(stderr, "client %d could not send %s\n",
  st->id, command->argv[0]);
st->ecnt++;
return false;
  }

So only in debug mode does it inform the user that it failed, and
even then, it does not display PQerrorMessage(st->con).

In non-debug mode it's opaque to the user. If it always fail, it appears
to loop forever. The caller has this comment that is relevant to the problem:

if (!sendCommand(st, command))
  {
/*
 * Failed. Stay in CSTATE_START_COMMAND state, to
 * retry. ??? What the point or retrying? Should
 * rather abort?
 */
return;
  }

I think it doesn't bother anyone up to now because the immediate
failure modes of PQsendQuery() are resource allocation or protocol
failures that tend to never happen.

Anyway currently \beginbatch avoids the problem by checking
whether querymode == QUERY_SIMPLE, to fail gracefully
instead of letting the endless loop happen.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index f6cb5d4..f93673e 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -269,7 +269,8 @@ typedef enum
 *
 * CSTATE_START_COMMAND starts the execution of a command.  On a SQL
 * command, the command is sent to the server, and we move to
-* CSTATE_WAIT_RESULT state.  On a \sleep meta-command, the timer is 
set,
+* CSTATE_WAIT_RESULT state unless in batch mode.
+* On a \sleep meta-command, the timer is set,
 * and we enter the CSTATE_SLEEP state to wait for it to expire. Other
 * meta-commands are executed immediately.
 *
@@ -1882,11 +1883,24 @@ sendCommand(CState *st, Command *command)
if (commands[j]->type != SQL_COMMAND)
continue;
preparedStatementName(name, st->use_file, j);
-   res = PQprepare(st->con, name,
- commands[j]->argv[0], 
commands[j]->argc - 1, NULL);
-   if

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-07 Thread Vaishnavi Prabakaran

On Wed, Mar 8, 2017 at 3:52 AM, Daniel Verite 
wrote:

> Vaishnavi Prabakaran wrote:
>
> > Yes, I have created a new patch entry into the commitfest 2017-03 and
> > attached the latest patch with this e-mail.
>
> Please find attached a companion patch implementing the batch API in
> pgbench, exposed as \beginbatch and \endbatch meta-commands
> (without documentation).
>
> The idea for now is to make it easier to exercise the API and test
> how batching performs. I guess I'll submit the patch separately in
> a future CF, depending on when/if the libpq patch goes in.
>
>

Thanks for the companion patch and here are some comments:

1. I see, below check is used to verify if the connection is not in batch
mode:
if (PQbatchStatus(st->con) != PQBATCH_MODE_ON)

But, it is better to use if (PQbatchStatus(st->con) ==
PQBATCH_MODE_OFF) for this verification. Reason is there is one more state
in batch mode - PQBATCH_MODE_ABORTED. So, if the batch mode is in aborted
status, this check will still assume that the connection is not in batch
mode.

In a same way, "if(PQbatchStatus(st->con) == PQBATCH_MODE_ON)" check is
better to be modified as "PQbatchStatus(st->con) != PQBATCH_MODE_OFF".

2.  @@ -2207,7 +2227,47 @@ doCustom(TState *thread, CState *st, StatsData
*agg)
+ if (PQbatchStatus(st->con) != PQBATCH_MODE_OFF)
+ {
+ commandFailed(st, "already in batch mode");
+ break;
+ }

This is not required as below PQbatchBegin() internally does this
verification.

+ PQbatchBegin(st->con);

3. It is better to check the return value of PQbatchBegin() rather than
assuming success. E.g: PQbatchBegin() will return false if the connection
is in copy in/out/both states.

> While developing this, I noted a few things with 0001-v4:
>
> 1. lack of initialization for count in PQbatchQueueCount.
> Trivial fix:
>
> --- a/src/interfaces/libpq/fe-exec.c
> +++ b/src/interfaces/libpq/fe-exec.c
> @@ -1874,7 +1874,7 @@ PQisBusy(PGconn *conn)
>  int
>  PQbatchQueueCount(PGconn *conn)
>  {
> -   int count;
> +   int count = 0;
> PGcommandQueueEntry *entry;
>
>
Thanks for your review and yes, Corrected.

> 2. misleading error message in PQexecStart. It gets called by a few other
> functions than PQexec, such as PQprepare. As I understand it, the intent
> here is to forbid the synchronous functions in batch mode, so this error
> message should not single out PQexec.
>
> @@ -1932,6 +2425,13 @@ PQexecStart(PGconn *conn)
> if (!conn)
> return false;
>
> +   if (conn->asyncStatus == PGASYNC_QUEUED || conn->batch_status !=
> PQBATCH_MODE_OFF)
> +   {
> +   printfPQExpBuffer(>errorMessage,
> + libpq_gettext("cannot
> PQexec in batch mode\n"));
> +   return false;
> +   }
> +
>
>

Hmm, this error message goes with the flow of other error messages in the
same function. E.g: "PQexec not allowed during COPY BOTH" . But, anyways I
modified the
error message to be more generic.

> 3. In relation to #2, PQsendQuery() is not forbidden in batch mode
> although I don't think it can work with it, as it's based on the old
> protocol.
>
>
>
It is actually forbidden and you can see the error message "cannot
PQsendQuery in batch mode, use PQsendQueryParams" when you use
PQsendQuery().
Attached the updated patch.

Thanks & Regards,
Vaishnavi
Fujitsu Australia.

0001-Pipelining-batch-support-for-libpq-code-v5.patch
Description: Binary data

0002-Pipelining-batch-support-for-libpq-test-v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-07 Thread Craig Ringer

On 8 March 2017 at 00:52, Daniel Verite  wrote:
> Vaishnavi Prabakaran wrote:
>
>> Yes, I have created a new patch entry into the commitfest 2017-03 and
>> attached the latest patch with this e-mail.
>
> Please find attached a companion patch implementing the batch API in
> pgbench, exposed as \beginbatch and \endbatch meta-commands
> (without documentation).
>
> The idea for now is to make it easier to exercise the API and test
> how batching performs. I guess I'll submit the patch separately in
> a future CF, depending on when/if the libpq patch goes in.

That's great, thanks, and thanks also for the fixes. Any chance you
can attach your updated patch?

I looked at modifying psql to support batching when run
non-interactively, but it would've required major restructuring of its
control loop and I ran out of time. I didn't think of modifying
pgbench. Great to see.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-07 Thread Daniel Verite

Vaishnavi Prabakaran wrote:

> Yes, I have created a new patch entry into the commitfest 2017-03 and
> attached the latest patch with this e-mail.

Please find attached a companion patch implementing the batch API in
pgbench, exposed as \beginbatch and \endbatch meta-commands
(without documentation).

The idea for now is to make it easier to exercise the API and test
how batching performs. I guess I'll submit the patch separately in
a future CF, depending on when/if the libpq patch goes in.

While developing this, I noted a few things with 0001-v4:

1. lack of initialization for count in PQbatchQueueCount.
Trivial fix:

--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -1874,7 +1874,7 @@ PQisBusy(PGconn *conn)
 int
 PQbatchQueueCount(PGconn *conn)
 {
-   int count;
+   int count = 0;
PGcommandQueueEntry *entry;

2. misleading error message in PQexecStart. It gets called by a few other
functions than PQexec, such as PQprepare. As I understand it, the intent
here is to forbid the synchronous functions in batch mode, so this error
message should not single out PQexec.

@@ -1932,6 +2425,13 @@ PQexecStart(PGconn *conn)
if (!conn)
return false;
 
+   if (conn->asyncStatus == PGASYNC_QUEUED || conn->batch_status !=
PQBATCH_MODE_OFF)
+   {
+   printfPQExpBuffer(>errorMessage,
+ libpq_gettext("cannot
PQexec in batch mode\n"));
+   return false;
+   }
+

3. In relation to #2, PQsendQuery() is not forbidden in batch mode
although I don't think it can work with it, as it's based on the old
protocol. 


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index f6cb5d4..9b2fce8 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -269,7 +269,8 @@ typedef enum
 *
 * CSTATE_START_COMMAND starts the execution of a command.  On a SQL
 * command, the command is sent to the server, and we move to
-* CSTATE_WAIT_RESULT state.  On a \sleep meta-command, the timer is 
set,
+* CSTATE_WAIT_RESULT state unless in batch mode.
+* On a \sleep meta-command, the timer is set,
 * and we enter the CSTATE_SLEEP state to wait for it to expire. Other
 * meta-commands are executed immediately.
 *
@@ -1882,11 +1883,24 @@ sendCommand(CState *st, Command *command)
if (commands[j]->type != SQL_COMMAND)
continue;
preparedStatementName(name, st->use_file, j);
-   res = PQprepare(st->con, name,
- commands[j]->argv[0], 
commands[j]->argc - 1, NULL);
-   if (PQresultStatus(res) != PGRES_COMMAND_OK)
-   fprintf(stderr, "%s", 
PQerrorMessage(st->con));
-   PQclear(res);
+   if (PQbatchStatus(st->con) != PQBATCH_MODE_ON)
+   {
+   res = PQprepare(st->con, name,
+   
commands[j]->argv[0], commands[j]->argc - 1, NULL);
+   if (PQresultStatus(res) != 
PGRES_COMMAND_OK)
+   fprintf(stderr, "%s", 
PQerrorMessage(st->con));
+   PQclear(res);
+   }
+   else
+   {
+   /*
+* In batch mode, we use asynchronous 
functions. If a server-side
+* error occurs, it will be processed 
later among the other results.
+*/
+   if (!PQsendPrepare(st->con, name,
+  
commands[j]->argv[0], commands[j]->argc - 1, NULL))
+   fprintf(stderr, "%s", 
PQerrorMessage(st->con));
+   }
}
st->prepared[st->use_file] = true;
}
@@ -2165,7 +2179,13 @@ doCustom(TState *thread, CState *st, StatsData *agg)
return;
}
else
-   st->state = CSTATE_WAIT_RESULT;
+   {
+   /* Wait for results, unless in 
batch mode */
+

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-02-22 Thread Craig Ringer

On 22 Feb. 2017 14:14, "Vaishnavi Prabakaran" 
wrote:

Thanks for reviewing the patch.
>

Thanks for picking it up! I've wanted to see this process for some time,
but just haven't had the bandwidth for it.

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-02-20 Thread Michael Paquier

On Fri, Feb 17, 2017 at 2:17 PM, Prabakaran, Vaishnavi
 wrote:
> On 22 November 2016 at 18:32, Craig Ringer wrote:
> I am interested in this patch and addressed various below comments from 
> reviewers. And, I have separated out code and test module into 2 patches. So, 
> If needed, test patch can be enhanced more, meanwhile code patch can be 
> committed.

Cool. Nice to see a new version of this patch.
(You may want to your replies breath a bit, for example by adding an
extra newline to the paragraph you are answering to.)

>>Renaming and refactoring new APIs
>> +PQisInBatchMode   172
>>+PQqueriesInBatch  173
>>+PQbeginBatchMode  174
>>+PQendBatchMode175
>>+PQsendEndBatch176
>>+PQgetNextQuery177
>>+PQbatchIsAborted  178
>>This set of routines is a bit inconsistent. Why not just prefixing them with 
>>PQbatch? Like that for example:
>> PQbatchStatus(): consists of disabled/inactive/none, active, error. This 
>> covers both PQbatchIsAborted() and PQisInBatchMode().
>>PQbatchBegin()
>>PQbatchEnd()
>>PQbatchQueueSync() or PQbatchQueueFlush, same as PQsendEndBatch() to add and 
>>process a sync message into the queue.
>
> Renamed and modified batch status APIs as below
> PQisInBatchMode & PQbatchIsAborted ==> PQbatchStatus
> PQqueriesInBatch ==> PQbatchQueueCount
> PQbeginBatchMode ==> PQbatchBegin
> PQendBatchMode ==> PQbatchEnd
> PQsendEndBatch ==> PQbatchQueueSync
> PQgetNextQuery ==> PQbatchQueueProcess

Thanks. This seems a way cleaner interface to me. Others may have a
different opinion so let's see if there are some.

>>PQbatchQueueCount(): returns N>0 if there are N entries, 0 if empty,-1 on 
>>failure
>>PQbatchQueueProcess(): returns 1 if process can begin, 0 if not, -1 on 
>>failure (OOM)
>
> I think it is still ok to keep the current behaviour like other ones present 
> in the same file. E.g:"PQsendPrepare" "PQsendQueryGuts"
>
>>PQqueriesInBatch() (Newname(NN):PQbatchQueueCount)doesn't work as documented.
>>It says:
>>"Returns the number of queries still in the queue for this batch"
>>but in fact it's implemented as a Boolean.
>
> Modified the logic to count number of entries in pending queue and return the 
> count
>
>>The changes in src/test/examples/ are not necessary anymore. You moved all 
>>the tests to test_libpq (for the best actually).
> Removed these unnecessary changes from src/test/examples folder and corrected 
> the path mentioned in comments section of testlibpqbatch.c

>>But with the libpq batch API, maybe this could be modernized
>>with meta-commands like this:
>>\startbatch
>>...
>>\endbatch
>
> I think it is a separate patch candidate.

Definitely agreed on that. Let's not complicate things further more.

>> It may be a good idea to check for PG_PROTOCOL_MAJOR < 3 and issue an error 
>> for the new routines.
>
> All the APIs which supports asynchronous command processing can be executed 
> only if PG_PROTOCOL_MAJOR >= 3. So, adding it to new routines are not really 
> needed.

OK. Let's not complicate the patch more than necessary.

After an extra lookup at the patch, here are some comments.

In the docs when listing any of the PQBATCH_MODE_* variables, you
should have a markup  around them. It would be cleaner to
make a list of the potential values that can be returned by
PQbatchStatus() using .

+   while (queue != NULL)
+   {
+   PGcommandQueueEntry *prev = queue;
+
+   queue = queue->next;
+   if (prev->query)
+   free(prev->query);
+   free(prev);
+   }
+   conn->cmd_queue_recycle = NULL
This could be useful as a separate routine.

+/* PQmakePipelinedCommand
+ * Get a new command queue entry, allocating it if required. Doesn't add it to
+ * the tail of the queue yet, use PQappendPipelinedCommand once the command has
+ * been written for that. If a command fails once it's called this, it should
+ * use PQrecyclePipelinedCommand to put it on the freelist or release it.
+ *
This comment block is not project-like. Please use an empty line as
first line with only "/*". The same comment applies to a bunch of the
routines introduced by this patch.

Not sure there is much point in having PQstartProcessingNewQuery. It
only does two things in two places, so that's not worth the loss in
readability.

+   if (conn->batch_status != PQBATCH_MODE_OFF)
+   {
+   pipeCmd = PQmakePipelinedCommand(conn);
+
+   if (pipeCmd == NULL)
+   return 0;   /* error msg already set */
+
+   last_query = >query;
+   queryclass = >queryclass;
+   }
+   else
+   {
+   last_query = >last_query;
+   queryclass = >queryclass;
+   }
This setup should happen further down.

conn->in_batch is undefined, causing the patch to fail to compile. And
actually you should not need it.

-   conn->asyncStatus = PGASYNC_READY;
+   conn->asyncStatus = PGASYNC_READY_MORE;
return;

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-02-16 Thread Prabakaran, Vaishnavi

On 22 November 2016 at 18:32, Craig Ringer wrote:
> On 22 November 2016 at 15:14, Haribabu Kommi
>  wrote:
> >
> > On Fri, Nov 18, 2016 at 7:18 PM, Craig Ringer 
> wrote:
> >>
> >> The latest is what's attached upthread and what's in the git repo
> >> also referenced upthread.
> >>
> >> I haven't been able to update in response to more recent review due
> >> to other development commitments. At this point I doubt I'll be able
> >> to get update it again in time for v10, so anyone who wants to adopt
> >> it is welcome.
> >
> >
> > Currently patch status is marked as "returned with feedback" in the
> > 11-2016 commitfest. Anyone who wants to work on it can submit the
> > updated patch by taking care of all review comments and change the
> > status of the patch at any time.
> >
> > Thanks for the patch.
>
> Thanks. Sorry I haven't had time to work on it. Priorities.
Hi,
I am interested in this patch and addressed various below comments from 
reviewers. And, I have separated out code and test module into 2 patches. So, 
If needed, test patch can be enhanced more, meanwhile code patch can be 
committed.

>Renaming and refactoring new APIs
> +PQisInBatchMode   172
>+PQqueriesInBatch  173
>+PQbeginBatchMode  174
>+PQendBatchMode175
>+PQsendEndBatch176
>+PQgetNextQuery177
>+PQbatchIsAborted  178
>This set of routines is a bit inconsistent. Why not just prefixing them with 
>PQbatch? Like that for example:
> PQbatchStatus(): consists of disabled/inactive/none, active, error. This 
> covers both PQbatchIsAborted() and PQisInBatchMode().
>PQbatchBegin()
>PQbatchEnd()
>PQbatchQueueSync() or PQbatchQueueFlush, same as PQsendEndBatch() to add and 
>process a sync message into the queue.

Renamed and modified batch status APIs as below
PQisInBatchMode & PQbatchIsAborted ==> PQbatchStatus
PQqueriesInBatch ==> PQbatchQueueCount
PQbeginBatchMode ==> PQbatchBegin
PQendBatchMode ==> PQbatchEnd
PQsendEndBatch ==> PQbatchQueueSync
PQgetNextQuery ==> PQbatchQueueProcess

>PQbatchQueueCount(): returns N>0 if there are N entries, 0 if empty,-1 on 
>failure
>PQbatchQueueProcess(): returns 1 if process can begin, 0 if not, -1 on failure 
>(OOM)
I think it is still ok to keep the current behaviour like other ones present in 
the same file. E.g:"PQsendPrepare" "PQsendQueryGuts"

>PQqueriesInBatch() (Newname(NN):PQbatchQueueCount)doesn't work as documented.
>It says:
>"Returns the number of queries still in the queue for this batch"
>but in fact it's implemented as a Boolean.
Modified the logic to count number of entries in pending queue and return the 
count

>The changes in src/test/examples/ are not necessary anymore. You moved all the 
>tests to test_libpq (for the best actually).
Removed these unnecessary changes from src/test/examples folder and corrected 
the path mentioned in comments section of testlibpqbatch.c

> +   while (queue != NULL)
>+  {
>   PGcommandQueueEntry *prev = queue;
>+   queue = queue->next;
>+   free(prev);
>+   }
>This should free prev->query.
Both prev->query and prev is freed. Also, this applies to "cmd_queue_recycle" 
too.

>Running directly make check in src/test/modules/test_libpq/ does not work
Modified "check" rule in makefile

>You could just remove the VERBOSE flag in the tests, having a test more 
>talkative is always better.
Removed ifdef VERBOSE checks.

>But with the libpq batch API, maybe this could be modernized
>with meta-commands like this:
>\startbatch
>...
>\endbatch
I think it is a separate patch candidate.

> It is possible to guess each one of those errors(occurred in 
> PQbatchQueueProcess API) with respectively
> PQgetResult == NULL, PQisInBatchMode() and PQqueriesInBatch().
> Definitely it should be mentioned in the docs that it is possible to make a 
> difference between all those states.
Updated documentation section of PQbatchQueueProcess() with these details.

> +   entry = PQmakePipelinedCommand(conn);
>+   entry->queryclass = PGQUERY_SYNC;
>+   entry->query = NULL;
>PQmakePipelinedCommand() returns NULL, and boom.
Corrected to return false if PQmakePipelinedCommand() returns NULL.

> +   boolin_batch;   /* connection is in batch (pipelined) mode */
>+   boolbatch_aborted;  /* current batch is aborted, discarding until 
>next Sync */
>Having only one flag would be fine. batch_aborted is set of used only
>when in_batch is used, so both have a strong link
Yes, agree that tracking the batch status via one flag is more clean
So, Added new enum typedef enum
{
PQBATCH_MODE_OFF,
PQBATCH_MODE_ON,
PQBATCH_MODE_ABORTED
} PQBatchStatus;
and " PQBatchStatus batch_status"  member of pg_conn is used to track the 
status of batch mode.

>/* OK, it's launched! */
>-   conn->asyncStatus = PGASYNC_BUSY;
>+   if (conn->in_batch)
>+   PQappendPipelinedCommand(conn, pipeCmd);
>+   else
>+

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-02-13 Thread Iwata, Aya

Hi,

On 18 November 2016 at 08:18, Craig Ringer wrote:

>At this point I doubt I'll be able to

>get update it again in time for v10, so anyone who wants to adopt it

>is welcome.

I am interested in pipeline/batch support for ECPG, and found this thread.

I updated Craig's patch [1] to apply this one to HEAD. Moreover, I fixed an 
easy typo.

First, I'm changing PQqueriesInBatch() to work as documented.

After that, I plan to reflect contents of reviews in the patch.

On Mon, Oct 3, 2016 at 11:52 PM, Daniel Verite wrote:

> Wouldn't pgbench benefit from it?

> It was mentioned some time ago [1], in relationship to the

> \into construct, how client-server latency was important enough to

> justify the use of a "\;" separator between statements, to send them

> as a group.

>

> But with the libpq batch API, maybe this could be modernized

> with meta-commands like this:

>   \startbatch

>   ...

>   \endbatch

I'm planning to work on meta-commands to support batch mode after committing 
this patch successfully.

[1]https://github.com/2ndQuadrant/postgres/tree/dev/libpq-async-batch

Regards,

Aya Iwata

FUJITSU

0001-Pipelining-batch-support-for-libpq-v3.patch
Description: 0001-Pipelining-batch-support-for-libpq-v3.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-11-21 Thread Craig Ringer

On 22 November 2016 at 15:14, Haribabu Kommi  wrote:
>
> On Fri, Nov 18, 2016 at 7:18 PM, Craig Ringer  wrote:
>>
>> The latest is what's attached upthread and what's in the git repo also
>> referenced upthread.
>>
>> I haven't been able to update in response to more recent review due to
>> other development commitments. At this point I doubt I'll be able to
>> get update it again in time for v10, so anyone who wants to adopt it
>> is welcome.
>
>
> Currently patch status is marked as "returned with feedback" in the 11-2016
> commitfest. Anyone who wants to work on it can submit the updated patch
> by taking care of all review comments and change the status of the patch
> at any time.
>
> Thanks for the patch.

Thanks. Sorry I haven't had time to work on it. Priorities.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-11-21 Thread Haribabu Kommi

On Fri, Nov 18, 2016 at 7:18 PM, Craig Ringer  wrote:

> The latest is what's attached upthread and what's in the git repo also
> referenced upthread.
>
> I haven't been able to update in response to more recent review due to
> other development commitments. At this point I doubt I'll be able to
> get update it again in time for v10, so anyone who wants to adopt it
> is welcome.
>

Currently patch status is marked as "returned with feedback" in the 11-2016
commitfest. Anyone who wants to work on it can submit the updated patch
by taking care of all review comments and change the status of the patch
at any time.

Thanks for the patch.

Regards,
Hari Babu
Fujitsu Australia

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-11-18 Thread Craig Ringer

On 18 November 2016 at 14:04, Tsunakawa, Takayuki
 wrote:
> Hello, Craig,
>
> I'm sorry to be late to review your patch.  I've just been able to read the 
> HTML doc first.  Can I get the latest .patch file for reading and running the 
> code?

The latest is what's attached upthread and what's in the git repo also
referenced upthread.

I haven't been able to update in response to more recent review due to
other development commitments. At this point I doubt I'll be able to
get update it again in time for v10, so anyone who wants to adopt it
is welcome.

> Here are some comments and questions.  I tried to avoid the same point as 
> other reviewers, but there may be an overlap.
>
>
> (1)
> The example
>  UPDATE mytable SET x = x + 1;
> should be
>  UPDATE mytable SET x = x + 1 WHERE id = 42;

Good catch.

> (2)
> "The server usually begins executing the batch before all commands in the 
> batch are queued and the end of batch command is sent."
>
> Does this mean that the app developer cannot control or predict how many TCP 
> transmissions a batch is sent with?

That's not what that sentence means since the TCP layer is much lower
level, but what you say is also true.

All the docs are saying there is that there's no explicit control over
when we start sending the batch to the server. How that translates to
individual TCP packets, etc, is not its problem.

>  For example, if I want to insert 10 rows into a table in bulk, can I send 
> those 10 rows (and the end of batch command) efficiently in one TCP 
> transmission, or are they split by libpq into multiple TCP transmissions?

libpq neither knows nor cares about individual TCP packets. It sends
things to the kernel and lets the kernel deal with that.

That said, you can use socket options TCP_CORK and TCP_NODELAY to get
some control over how and when data is sent. If you know you're about
to send more, you might cork the socket to give the kernel a hint that
it should expect more data to send.

> (3)
> "To avoid deadlocks on large batches the client should be structured around a 
> nonblocking I/O loop using a function like select, poll, epoll, 
> WaitForMultipleObjectEx, etc."
>
> Can't we use some (new) platform-independent API instead of using poll() or 
> WaitForMultipleObject()?  e.g. some thin wrapper around pqWait().  It seems a 
> bit burdonsome to have to use an OS-specific API to just wait for libpq.  
> Apart from that, it does not seem possible to wait for the socket in 64-bit 
> apps on Windows, because SOCKET is 64-bit while PQsocket() returns int.

IMO this problem is out of scope for this patch. A wait abstraction
might be nice, but next thing we know we'll be reinventing APR or
NSPR, I think that's a totally different problem.

Not being able to get a win32 SOCKET from libpq seems like a bit of an
oversight, and it'd definitely be good to address that to make async
mode more usable on Windows. There's some other ugliness in PQsocket
already per the comments there. I think it should be a separate patch,
though.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-11-17 Thread Tsunakawa, Takayuki

Hello, Craig,

I'm sorry to be late to review your patch.  I've just been able to read the 
HTML doc first.  Can I get the latest .patch file for reading and running the 
code?

Here are some comments and questions.  I tried to avoid the same point as other 
reviewers, but there may be an overlap.


(1)
The example
 UPDATE mytable SET x = x + 1;
should be
 UPDATE mytable SET x = x + 1 WHERE id = 42;


(2)
"The server usually begins executing the batch before all commands in the batch 
are queued and the end of batch command is sent."

Does this mean that the app developer cannot control or predict how many TCP 
transmissions a batch is sent with?  For example, if I want to insert 10 rows 
into a table in bulk, can I send those 10 rows (and the end of batch command) 
efficiently in one TCP transmission, or are they split by libpq into multiple 
TCP transmissions?


(3)
"To avoid deadlocks on large batches the client should be structured around a 
nonblocking I/O loop using a function like select, poll, epoll, 
WaitForMultipleObjectEx, etc."

Can't we use some (new) platform-independent API instead of using poll() or 
WaitForMultipleObject()?  e.g. some thin wrapper around pqWait().  It seems a 
bit burdonsome to have to use an OS-specific API to just wait for libpq.  Apart 
from that, it does not seem possible to wait for the socket in 64-bit apps on 
Windows, because SOCKET is 64-bit while PQsocket() returns int.

[winsock2.h]
/*
 * The new type to be used in all
 * instances which refer to sockets.
 */
typedef UINT_PTRSOCKET;

Regards
Takayuki Tsunakawa


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-10-14 Thread Craig Ringer

On 14 October 2016 at 22:15, Shay Rojansky  wrote:

> Unless I'm mistaken TCP_CORK is not necessarily going to work across all
> platforms (e.g. Windows), although SO_LINGER (which is more standard) also
> helps here.

Yeah, true.

You can also twiddle TCP_NODELAY on and off on other platforms.

Anyway, my point is that it's not likely to be crucial, especially
since even without socket options the host will often do packet
combining if the queue isn't empty.

> Unless I'm mistaken, in the extended protocol you can't combine multiple ;
> delimited queries in a single Parse - that's a feature supported only by the
> Query message of the simple protocol. But then, if you're in the simple
> protocol Sync doesn't play any role, does it? I mean, a Query message
> behaves as though it's implicitly followed by a Sync message - an error in a
> previous Query message doesn't cause later messages to be skipped.

Right, good point. So that concern isn't relevant.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-10-14 Thread Shay Rojansky

>
> >  Of course, this is a
> > relatively minor performance issue (especially when compared to the
> overall
> > performance benefits provided by batching), and providing an API
> distinction
> > between adding a Sync and flushing the buffer may over-complicate the
> API. I
> > just thought I'd mention it.
>
> Unnecessary IMO. If we really want to add it later we'd probably do so
> by setting a "no flush on endbatch" mode and adding an explicit flush
> call. But I expect TCP_CORK will satisfy all realistic needs.
>

Unless I'm mistaken TCP_CORK is not necessarily going to work across all
platforms (e.g. Windows), although SO_LINGER (which is more standard) also
helps here.

> When it's about to send a batch, Npgsql knows whether it's in an
> (explicit)
> > transaction or not (by examining the transaction indicator on the last
> > ReadyForQuery message it received). If it's not in an (explicit)
> > transaction, it automatically inserts a Sync message after every
> Execute. If
> > some statement happens to be a BEGIN, it will be executed as a normal
> > statement and so on. The only issue is that if an error occurs after a
> > sneaked-in BEGIN, all subsequent statements will fail, and all have the
> Sync
> > messages Npgsql inserted. The end result will be a series of errors that
> > will be raised up to the user, but this isn't fundamentally different
> from
> > the case of a simple auto-commit batch containing multiple failures
> (because
> > of unique constraint violation or whatever) - multiple errors is
> something
> > that will have to be handled in any case.
>
> I'm a bit hesitant about how this will interact with multi-statements
> containing embedded BEGIN or COMMIT, where a single protocol message
> contains multiple ; delimited queries. But at least at this time of
> night I can't give a concrete problematic example.


Unless I'm mistaken, in the extended protocol you can't combine multiple ;
delimited queries in a single Parse - that's a feature supported only by
the Query message of the simple protocol. But then, if you're in the simple
protocol Sync doesn't play any role, does it? I mean, a Query message
behaves as though it's implicitly followed by a Sync message - an error in
a previous Query message doesn't cause later messages to be skipped...

Note that Npgsql only rarely uses the simple protocol for user messages.
Npgsql is a binary-only driver, and the simple protocol doesn't support
requesting binary results. So only the extended protocol is used, except
for some edge cases where it's possible to use the simple protocol for
efficiency - statements with no parameters and where the ExecuteNonQuery
API is used (i.e. the user won't access any results).

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-10-14 Thread Craig Ringer

On 14 October 2016 at 18:09, Shay Rojansky  wrote:
>> > It has recently come to my attention that this implementation is
>> > problematic
>> > because it forces the batch to occur within a transaction; in other
>> > words,
>> > there's no option for a non-transactional batch.
>>
>> That's not strictly the case. If you explicitly BEGIN and COMMIT,
>> those operations are honoured within the batch.
>
>
> I wasn't precise in my formulation (although I think we understand each
> other)... The problem I'm trying to address here is the fact that in the
> "usual" batching implementation (i.e. where a single Sync message is sent at
> the end of the batch), there's no support for batches which have no
> transactions whatsoever (i.e. where each statement is auto-committed and
> errors in earlier statements don't trigger skipping of later statements).

Right, you can't use implicit transactions delimited by protocol
message boundaries when batching.

>> The design I have in libpq allows for this by allowing the client to
>> delimit batches without ending batch mode, concurrently consuming a
>> stream of multiple batches. Each endbatch is a Sync. So a client that
>> wants autocommit-like behavour can send a series of 1-query batches.
>
> Ah, I see. libpq's API is considerably more low-level than what Npgsql needs
> to provide. If I understand correctly, you allow users to specify exactly
> where to insert Sync messages (if at all)

They don't get total control, but they can cause a Sync to be emitted
after any given statement when in batch mode.

> so that any number of statements
> arbitrarily interleaved with Sync messages can be sent without starting to
> read any results.

Right.

> if so, then the user indeed has everything they need to
> control the exact transactional behavior they want (including full
> auto-commit) without compromising on performance in any way (i.e. by
> increasing roundtrips).

Right.

> The only minor problem I can see is that PQsendEndBatch not only adds a Sync
> message to the buffer, but also flushes it.

It only does a non-blocking flush.

> This means that you may be
> forcing users to needlessly flush the buffer just because they wanted to
> insert a Sync. In other words, users can't send the following messages in a
> single buffer/packet:
> Prepare1/Bind1/Describe1/Execute1/Sync1/Prepare2/Bind2/Describe2/Execute2/Sync2
> - they have to be split into different packets.

Oh, right. That's true, but I'm not really sure we care. I suspect
that the OS will tend to coalesce them anyway, since we're not
actually waiting until the socket sends the message. At least when the
socket isn't able to send as fast as input comes in. It might matter
for local performance in incredibly high throughput applications, but
I suspect there will be a great many other things that come first.

Anyway, the client can already control this with TCP_CORK.

>  Of course, this is a
> relatively minor performance issue (especially when compared to the overall
> performance benefits provided by batching), and providing an API distinction
> between adding a Sync and flushing the buffer may over-complicate the API. I
> just thought I'd mention it.

Unnecessary IMO. If we really want to add it later we'd probably do so
by setting a "no flush on endbatch" mode and adding an explicit flush
call. But I expect TCP_CORK will satisfy all realistic needs.

> That's a good point. I definitely don't want to depend on client-side
> parsing of SQL in any way (in fact a planned feature is to allow using
> Npgsql without any sort of client-side parsing/manipulation of SQL).
> However, the fact that BEGIN/COMMIT can be sent in batches doesn't appear
> too problematic to me.
>
> When it's about to send a batch, Npgsql knows whether it's in an (explicit)
> transaction or not (by examining the transaction indicator on the last
> ReadyForQuery message it received). If it's not in an (explicit)
> transaction, it automatically inserts a Sync message after every Execute. If
> some statement happens to be a BEGIN, it will be executed as a normal
> statement and so on. The only issue is that if an error occurs after a
> sneaked-in BEGIN, all subsequent statements will fail, and all have the Sync
> messages Npgsql inserted. The end result will be a series of errors that
> will be raised up to the user, but this isn't fundamentally different from
> the case of a simple auto-commit batch containing multiple failures (because
> of unique constraint violation or whatever) - multiple errors is something
> that will have to be handled in any case.

I'm a bit hesitant about how this will interact with multi-statements
containing embedded BEGIN or COMMIT, where a single protocol message
contains multiple ; delimited queries. But at least at this time of
night I can't give a concrete problematic example.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-10-14 Thread Shay Rojansky

>
> > It has recently come to my attention that this implementation is
> problematic
> > because it forces the batch to occur within a transaction; in other
> words,
> > there's no option for a non-transactional batch.
>
> That's not strictly the case. If you explicitly BEGIN and COMMIT,
> those operations are honoured within the batch.
>

I wasn't precise in my formulation (although I think we understand each
other)... The problem I'm trying to address here is the fact that in the
"usual" batching implementation (i.e. where a single Sync message is sent
at the end of the batch), there's no support for batches which have no
transactions whatsoever (i.e. where each statement is auto-committed and
errors in earlier statements don't trigger skipping of later statements).

What's missing is implicit transactions. Usually if you send a series
> of messages they will each get their own implicit transaction. If you
> batch them, the whole lot gets an implicit transaction. This is
> because the PostgreSQL v3 protocol conflates transaction delineation
> with protocol error recovery into a single Sync message type.
>

That's right. I sent a message complaining about this conflation a while
ago:
https://www.postgresql.org/message-id/CADT4RqDdo9EcFbxwB_YO2H3BVZ0t-1qqZ%3D%2B%2BdVMnYaN6BpyUGQ%40mail.gmail.com.
There weren't any responses, although I'll add a note on the wiki on this
as a requested feature for the v4 protocol.

If the mythical v4 protocol ever happens I'd want to split Sync into
> two messages, one which is a protocol synchronisation message and
> another that is a transaction delimiter. Or give it flags or whatever.
>

Totally agree.


> In the mean time:
>
> > This can be a problem for
> > several reasons: users may want to sent off a batch of inserts, not
> caring
> > whether one of them fails (e.g. because of a unique constraint
> violation).
> > In other words, in some scenarios it may be appropriate for later batched
> > statements to be executed when an earlier batched statement raised an
> error.
> > If Sync is only sent at the very end, this isn't possible.
>
> Right, and that remains the case even with explicit transaction
> delineation, because the first ERROR causes processing of all
> subsequent messages to be skipped.
>
> The design I have in libpq allows for this by allowing the client to
> delimit batches without ending batch mode, concurrently consuming a
> stream of multiple batches. Each endbatch is a Sync. So a client that
> wants autocommit-like behavour can send a series of 1-query batches.


> I think I'll need to document this a bit better since it's more subtle
> than I properly explained.
>

Ah, I see. libpq's API is considerably more low-level than what Npgsql
needs to provide. If I understand correctly, you allow users to specify
exactly where to insert Sync messages (if at all), so that any number of
statements arbitrarily interleaved with Sync messages can be sent without
starting to read any results. If so, then the user indeed has everything
they need to control the exact transactional behavior they want (including
full auto-commit) without compromising on performance in any way (i.e. by
increasing roundtrips).

The only minor problem I can see is that PQsendEndBatch not only adds a
Sync message to the buffer, but also flushes it. This means that you may be
forcing users to needlessly flush the buffer just because they wanted to
insert a Sync. In other words, users can't send the following messages in a
single buffer/packet:
Prepare1/Bind1/Describe1/Execute1/Sync1/Prepare2/Bind2/Describe2/Execute2/Sync2
- they have to be split into different packets. Of course, this is a
relatively minor performance issue (especially when compared to the overall
performance benefits provided by batching), and providing an API
distinction between adding a Sync and flushing the buffer may
over-complicate the API. I just thought I'd mention it.


> Yes, that's what I suggest, and basically what the libpq batch
> interface does, though it expects the client to deal with the
> transaction boundaries.
>
> You will need to think hard about transaction boundaries as they
> relate to multi-statements unless nPgSQL parses out each statement
> from multi-statement strings like PgJDBC does. Otherwise a user can
> sneak in:
>
> somestatement; BEGIN; someotherstatement;
>
> or
>
> somestatement; CoMMiT; otherstatement;
>

That's a good point. I definitely don't want to depend on client-side
parsing of SQL in any way (in fact a planned feature is to allow using
Npgsql without any sort of client-side parsing/manipulation of SQL).
However, the fact that BEGIN/COMMIT can be sent in batches doesn't appear
too problematic to me.

When it's about to send a batch, Npgsql knows whether it's in an (explicit)
transaction or not (by examining the transaction indicator on the last
ReadyForQuery message it received). If it's not in an (explicit)
transaction, it automatically inserts a Sync message after every

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-10-13 Thread Craig Ringer

On 12 October 2016 at 19:51, Shay Rojansky  wrote:
> Hi all. I thought I'd share some experience from Npgsql regarding
> batching/pipelining - hope this isn't off-topic.

Not at all.

> Npgsql has supported batching for quite a while, similar to what this patch
> proposes - with a single Sync message is sent at the end.

PgJDBC does too. The benefits of it there are what prompted me to do
this, not only so libpq users can use it directly but so psqlODBC,
psycopg2, etc can benefit from it if they choose to expose it.

> It has recently come to my attention that this implementation is problematic
> because it forces the batch to occur within a transaction; in other words,
> there's no option for a non-transactional batch.

That's not strictly the case. If you explicitly BEGIN and COMMIT,
those operations are honoured within the batch.

What's missing is implicit transactions. Usually if you send a series
of messages they will each get their own implicit transaction. If you
batch them, the whole lot gets an implicit transaction. This is
because the PostgreSQL v3 protocol conflates transaction delineation
with protocol error recovery into a single Sync message type.

It's very similar to the behaviour of multi-statements, where:

psql -c "CREATE TABLE fred(x integer); DROP TABLE notexists;"

doesn't create "fred", but

psql -c "BEGIN; CREATE TABLE fred(x integer); COMMIT; BEGIN; DROP
TABLE notexists; COMMMIT;"

... does. And in fact it's for almost the same reason. They're sent as
a single SimpleQuery message by psql and split up client side, but the
effect is the same as two separate Query messages followed by a Sync.

It isn't simple to manage this client-side, because libpq doesn't know
whether a given command string may contain transaction delimiting
statements or not and can't reliably look for them without client-side
parsing of the SQL. So it can't dispatch its own BEGIN/COMMIT around
statements in a batch that it thinks might be intended to be
autocommit, and anyway that'd result in sending 3 queries for every 1
client query, which would suck.

If the mythical v4 protocol ever happens I'd want to split Sync into
two messages, one which is a protocol synchronisation message and
another that is a transaction delimiter. Or give it flags or whatever.

In the mean time:

> This can be a problem for
> several reasons: users may want to sent off a batch of inserts, not caring
> whether one of them fails (e.g. because of a unique constraint violation).
> In other words, in some scenarios it may be appropriate for later batched
> statements to be executed when an earlier batched statement raised an error.
> If Sync is only sent at the very end, this isn't possible.

Right, and that remains the case even with explicit transaction
delineation, because the first ERROR causes processing of all
subsequent messages to be skipped.

The design I have in libpq allows for this by allowing the client to
delimit batches without ending batch mode, concurrently consuming a
stream of multiple batches. Each endbatch is a Sync. So a client that
wants autocommit-like behavour can send a series of 1-query batches.

I think I'll need to document this a bit better since it's more subtle
than I properly explained.

> Another example
> of a problem (which actually happened) is that transactions acquire
> row-level locks, and so may trigger deadlocks if two different batches
> update the same rows in reverse order. Both of these issues wouldn't occur
> if the batch weren't implicitly batched.

Same solution as above.

> My current plan is to modify the batch implementation based on whether we're
> in an (explicit) transaction or not. If we're in a transaction, then it
> makes perfect sense to send a single Sync at the end as is being proposed
> here - any failure would cause the transaction to fail anyway, so skipping
> all subsequent statements until the batch's end makes sense. However, if
> we're not in an explicit transaction, I plan to insert a Sync message after
> each individual Execute, making non-transactional batched statements more or
> less identical in behavior to non-transactional unbatched statements. Note
> that this mean that a batch can generate multiple errors, not just one.

Yes, that's what I suggest, and basically what the libpq batch
interface does, though it expects the client to deal with the
transaction boundaries.

You will need to think hard about transaction boundaries as they
relate to multi-statements unless nPgSQL parses out each statement
from multi-statement strings like PgJDBC does. Otherwise a user can
sneak in:

somestatement; BEGIN; someotherstatement;

or

somestatement; CoMMiT; otherstatement;

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-10-13 Thread Jim Nasby


On 10/4/16 11:54 PM, Michael Paquier wrote:

+   
+Much like asynchronous query mode, there is no performance disadvantage to
+using batching and pipelining. It somewhat increased client application
+complexity and extra caution is required to prevent client/server network
+deadlocks, but can offer considerable performance improvements.
+   
I would reword that a bit "it increases client application complexity
and extra caution is required to prevent client/server deadlocks but
offers considerable performance improvements".


Unrelated, but another doc bug, on line 4647:

 + The batch API was introduced in PostgreSQL 9.6, but clients 
using it can


That should read 10.0 (or just 10?)
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-10-12 Thread Shay Rojansky

Hi all. I thought I'd share some experience from Npgsql regarding
batching/pipelining - hope this isn't off-topic.

Npgsql has supported batching for quite a while, similar to what this patch
proposes - with a single Sync message is sent at the end.

It has recently come to my attention that this implementation is
problematic because it forces the batch to occur within a transaction; in
other words, there's no option for a non-transactional batch. This can be a
problem for several reasons: users may want to sent off a batch of inserts,
not caring whether one of them fails (e.g. because of a unique constraint
violation). In other words, in some scenarios it may be appropriate for
later batched statements to be executed when an earlier batched statement
raised an error. If Sync is only sent at the very end, this isn't possible.
Another example of a problem (which actually happened) is that transactions
acquire row-level locks, and so may trigger deadlocks if two different
batches update the same rows in reverse order. Both of these issues
wouldn't occur if the batch weren't implicitly batched.

My current plan is to modify the batch implementation based on whether
we're in an (explicit) transaction or not. If we're in a transaction, then
it makes perfect sense to send a single Sync at the end as is being
proposed here - any failure would cause the transaction to fail anyway, so
skipping all subsequent statements until the batch's end makes sense.
However, if we're not in an explicit transaction, I plan to insert a Sync
message after each individual Execute, making non-transactional batched
statements more or less identical in behavior to non-transactional
unbatched statements. Note that this mean that a batch can generate
multiple errors, not just one.

I'm sharing this since it may be relevant to the libpq batching
implementation as well, and also to get any feedback regarding how Npgsql
should act.

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-10-04 Thread Michael Paquier

On Mon, Oct 3, 2016 at 12:48 PM, Craig Ringer  wrote:
> On 3 October 2016 at 10:10, Michael Paquier  wrote:
>> On Tue, Sep 6, 2016 at 8:01 PM, Craig Ringer  wrote:
>>> On 6 September 2016 at 16:10, Daniel Verite  wrote:
 Craig Ringer wrote:

> Updated patch attached.

 Please find attached a couple fixes for typos I've came across in
 the doc part.
>>>
>>> Thanks, will apply and post a rebased patch soon, or if someone picks
>>> this up in the mean time they can apply your diff on top of the patch.
>>
>> Could you send an updated patch then? At the same time I am noticing
>> that git --check is complaining... This patch has tests and a
>> well-documented feature, so I'll take a look at it soon at the top of
>> my list. Moved to next CF for now.
>
> Thanks.
>
> I'd really like to teach psql in non-interactive mode to use it, but
> (a) I'm concerned about possible subtle behaviour differences arising
> if we do that and (b) I won't have the time. I think it's mostly of
> interest to app authors and driver developers and that's what it's
> aimed at. pg_restore could benefit a lot too.

Looking at it now... The most interesting comments are first.

I wanted to congratulate you. I barely see a patch with this level of
details done within the first versions. Anybody can review the patch
just by looking at the code and especially the docs without looking at
the thread. There are even tests to show what this does, for the
client.

+PQisInBatchMode   172
+PQqueriesInBatch  173
+PQbeginBatchMode  174
+PQendBatchMode175
+PQsendEndBatch176
+PQgetNextQuery177
+PQbatchIsAborted  178
This set of routines is a bit inconsistent. Why not just prefixing
them with PQbatch? Like that for example:
PQbatchStatus(): consists of disabled/inactive/none, active, error.
This covers both PQbatchIsAborted() and PQisInBatchMode().
PQbatchBegin()
PQbatchEnd()
PQbatchQueueSync() or PQbatchQueueFlush, same as PQsendEndBatch() to
add and process a sync message into the queue.
PQbatchQueueCount(): returns N>0 if there are N entries, 0 if empty,
-1 on failure
PQbatchQueueProcess(): returns 1 if process can begin, 0 if not, -1 on
failure (OOM)

+   
+Much like asynchronous query mode, there is no performance disadvantage to
+using batching and pipelining. It somewhat increased client application
+complexity and extra caution is required to prevent client/server network
+deadlocks, but can offer considerable performance improvements.
+   
I would reword that a bit "it increases client application complexity
and extra caution is required to prevent client/server deadlocks but
offers considerable performance improvements".

+("ping time") is high, and when many small operations are being
performed in
Nit: should use  here. Still not quoting it would be fine.

+ After entering batch mode the application dispatches requests
+ using normal asynchronous libpq functions like
+ PQsendQueryParams,
PQsendPrepare,
+ etc. The asynchronous requests are followed by a init;
+$node->start;
+
+my $port = $node->port;
+
+$node->stop('fast');
Er... This does nothing..

The changes in src/test/examples/ are not necessary anymore. You moved
all the tests to test_libpq (for the best actually).

+   while (queue != NULL)
+   {
+   PGcommandQueueEntry *prev = queue;
+   queue = queue->next;
+   free(prev);
+   }
This should free prev->query.

/* blah comment
* blah2 comment
*/
A lot of comment blocks are like that, those should be reformated.

Running directly make check in src/test/modules/test_libpq/ does not work:
# Postmaster PID for node "main" is 10225
# Running: testlibpqbatch dbname=postgres 1 disallowed_in_batch
Command 'testlibpqbatch' not found in [...PATH list ...]
The problem is that testlibpqbatch is not getting compiled but I think
it should.

+ * testlibpqbatch.c
+ * Test of batch execution funtionality
[...]
+   fprintf(stderr, "%s is not a recognised test name\n", argv[3]);
s/funtionality/functionality/
s/recognised/recognized/

+   if (!PQsendQueryParams(conn, "SELECT $1", 1, dummy_param_oids,
+  dummy_params, NULL, NULL, 0))
+   {
+   fprintf(stderr, "dispatching first SELECT failed: %s\n",
PQerrorMessage(conn));
+   goto fail;
+   }
+
+   if (!PQsendEndBatch(conn))
+   {
+   fprintf(stderr, "Ending first batch failed: %s\n",
PQerrorMessage(conn));
+   goto fail;
+   }
+
+   if (!PQsendQueryParams(conn, "SELECT $1", 1, dummy_param_oids,
+  dummy_params, NULL, NULL, 0))
+   {
+   fprintf(stderr, "dispatching second SELECT failed: %s\n",
PQerrorMessage(conn));
+   goto fail;
+   }
May be better to use a loop here and set in the queue a bunch of queries..

You could just remove the VERBOSE flag in the tests, having a

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-10-04 Thread Gavin Flower


On 04/10/16 20:15, Michael Paquier wrote:

On Mon, Oct 3, 2016 at 11:52 PM, Daniel Verite  wrote:

Wouldn't pgbench benefit from it?
It was mentioned some time ago [1], in relationship to the
\into construct, how client-server latency was important enough to
justify the use of a "\;" separator between statements, to send them
as a group.

But with the libpq batch API, maybe this could be modernized
with meta-commands like this:
   \startbatch
   ...
   \endbatch

Or just \batch [on|off], which sounds like a damn good idea to me for
some users willing to test some workloads before integrating it in an
application.


+1

'\batch' is a bit easier, to find, & to remember than '\startbatch'



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-10-04 Thread Craig Ringer

On 4 Oct. 2016 15:15, "Michael Paquier"  wrote:
>
> On Mon, Oct 3, 2016 at 11:52 PM, Daniel Verite 
wrote:
> > Wouldn't pgbench benefit from it?
> > It was mentioned some time ago [1], in relationship to the
> > \into construct, how client-server latency was important enough to
> > justify the use of a "\;" separator between statements, to send them
> > as a group.
> >
> > But with the libpq batch API, maybe this could be modernized
> > with meta-commands like this:
> >   \startbatch
> >   ...
> >   \endbatch
>
> Or just \batch [on|off], which sounds like a damn good idea to me for
> some users willing to test some workloads before integrating it in an
> application.

A batch jsnt necessarily terminated by a commit, so I'm more keen on
start/end batch. It's more in line with begin/commit. Batch is not only a
mode, you also have to delineate batches.

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-10-04 Thread Michael Paquier

On Mon, Oct 3, 2016 at 11:52 PM, Daniel Verite  wrote:
> Wouldn't pgbench benefit from it?
> It was mentioned some time ago [1], in relationship to the
> \into construct, how client-server latency was important enough to
> justify the use of a "\;" separator between statements, to send them
> as a group.
>
> But with the libpq batch API, maybe this could be modernized
> with meta-commands like this:
>   \startbatch
>   ...
>   \endbatch

Or just \batch [on|off], which sounds like a damn good idea to me for
some users willing to test some workloads before integrating it in an
application.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-10-03 Thread Daniel Verite

Craig Ringer wrote:

> I think it's mostly of interest to app authors and driver developers
> and that's what it's aimed at. pg_restore could benefit a lot too.

Wouldn't pgbench benefit from it?
It was mentioned some time ago [1], in relationship to the 
\into construct, how client-server latency was important enough to
justify the use of a "\;" separator between statements, to send them
as a group.

But with the libpq batch API, maybe this could be modernized
with meta-commands like this:
  \startbatch
  ...
  \endbatch
which would essentially call PQbeginBatchMode() and PQsendEndBatch().
Inside the batch section, collecting results would be interleaved
with sending queries.
Interdepencies between results and subsequent queries could
be handled or ignored, depending on how sophisticated we'd want
this.

This might also draw more users to the batch API, because
it would make it easier to check how exactly it affects
the performance of specific sequences of SQL statements to be
grouped in a batch.
For instance it would make sense for programmers to benchmark mock-ups
of their code with pgbench with/without batching, before embarking on
adapting it from blocking mode to asynchronous/non-blocking mode.

[1]
https://www.postgresql.org/message-id/alpine.DEB.2.20.1607140925400.1962%40sto

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-10-02 Thread Craig Ringer

On 3 October 2016 at 10:10, Michael Paquier  wrote:
> On Tue, Sep 6, 2016 at 8:01 PM, Craig Ringer  wrote:
>> On 6 September 2016 at 16:10, Daniel Verite  wrote:
>>> Craig Ringer wrote:
>>>
 Updated patch attached.
>>>
>>> Please find attached a couple fixes for typos I've came across in
>>> the doc part.
>>
>> Thanks, will apply and post a rebased patch soon, or if someone picks
>> this up in the mean time they can apply your diff on top of the patch.
>
> Could you send an updated patch then? At the same time I am noticing
> that git --check is complaining... This patch has tests and a
> well-documented feature, so I'll take a look at it soon at the top of
> my list. Moved to next CF for now.

Thanks.

I'd really like to teach psql in non-interactive mode to use it, but
(a) I'm concerned about possible subtle behaviour differences arising
if we do that and (b) I won't have the time. I think it's mostly of
interest to app authors and driver developers and that's what it's
aimed at. pg_restore could benefit a lot too.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-10-02 Thread Michael Paquier

On Tue, Sep 6, 2016 at 8:01 PM, Craig Ringer  wrote:
> On 6 September 2016 at 16:10, Daniel Verite  wrote:
>> Craig Ringer wrote:
>>
>>> Updated patch attached.
>>
>> Please find attached a couple fixes for typos I've came across in
>> the doc part.
>
> Thanks, will apply and post a rebased patch soon, or if someone picks
> this up in the mean time they can apply your diff on top of the patch.

Could you send an updated patch then? At the same time I am noticing
that git --check is complaining... This patch has tests and a
well-documented feature, so I'll take a look at it soon at the top of
my list. Moved to next CF for now.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-09-06 Thread Craig Ringer

On 6 September 2016 at 16:10, Daniel Verite  wrote:
> Craig Ringer wrote:
>
>> Updated patch attached.
>
> Please find attached a couple fixes for typos I've came across in
> the doc part.

Thanks, will apply and post a rebased patch soon, or if someone picks
this up in the mean time they can apply your diff on top of the patch.

> Also it appears that PQqueriesInBatch() doesn't work as documented.
> It says:
>  "Returns the number of queries still in the queue for this batch"
> but in fact it's implemented as a boolean:

Whoops. Will fix.

I think the function is useful and necessary. There's no reason not to
expose that, but also it's good for when your query dispatch isn't as
tightly coupled to your query handling as in the example, so your app
might need to keep processing until it sees the end of queued results.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-09-06 Thread Daniel Verite

Craig Ringer wrote:

> Updated patch attached.

Please find attached a couple fixes for typos I've came across in
the doc part.

Also it appears that PQqueriesInBatch() doesn't work as documented.
It says:
 "Returns the number of queries still in the queue for this batch"
but in fact it's implemented as a boolean:

+/* PQqueriesInBatch
+ *   Return true if there are queries currently pending in batch mode
+ */+int
+PQqueriesInBatch(PGconn *conn)
+{
+   if (!PQisInBatchMode(conn))
+   return false;
+
+   return conn->cmd_queue_head != NULL;
+}

However, is this function really needed? It doesn't seem essential to
the API. You don't call it in the test program either.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 73c6c03..af4f922 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -4653,7 +4653,7 @@ int PQflush(PGconn *conn);
 
 could be much more efficiently done with:
 
- UPDATE mytable SET x = x + 1;
+ UPDATE mytable SET x = x + 1 WHERE id = 42;
 

 
@@ -4696,7 +4696,7 @@ int PQflush(PGconn *conn);
  non-blocking mode. If used 
in
  blocking mode it is possible for a client/server deadlock to occur. The
  client will block trying to send queries to the server, but the server 
will
- block trying to send results from queries it's already processed to the
+ block trying to send results from queries it has already processed to the
  client. This only occurs when the client sends enough queries to fill its
  output buffer and the server's receive buffer before switching to
  processing input from the server, but it's hard to predict exactly when
@@ -5015,7 +5015,7 @@ int PQgetNextQuery(PGconn *conn);
  
   
   Returns the number of queries still in the queue for this batch, not
-  including any query that's currently having results being processsed.
+  including any query that's currently having results being processed.
   This is the number of times PQgetNextQuery has to be
   called before the query queue is empty again.
 
@@ -5037,7 +5037,7 @@ int PQqueriesInBatch(PGconn *conn);
 
  
   
-   Returns 1 if the batch curently being received on a
+   Returns 1 if the batch currently being received on a
libpq connection in batch mode is
aborted, 0

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-08-23 Thread Craig Ringer

On 23 August 2016 at 08:27, Craig Ringer  wrote:

> On 10 August 2016 at 14:44, Michael Paquier 
> wrote:
>
>>

> I am looking a bit more seriously at this patch and assigned myself as
>> a reviewer.
>>
>
> Much appreciated.
>
>
>>

> macos complains here. You may want to replace %06lds by just %06d.
>>
>
> Yeah, or cast to a type known to be big enough. Will amend.
>

I used an upcast to (long), because on Linux it's a long. I don't see the
point of messing about adding a configure test for something this trivial.

> This patch generates a core dump, use for example pg_ctl start -w and
>> you'll bump into the trace above. There is something wrong with the
>> queue handling.
>>
>
> Huh. I didn't see that here (Fedora 23). I'll look more closely.
>
>
>> Do you have plans for a more generic structure for the command queue list?
>>
>
> No plans, no. This was a weekend experiment that turned into a useful
> patch and I'm having to scrape up time for it amongst much more important
> things like logical failover / sequence decoding and various other
> replication work.
>
> Thanks for the docs review too, will amend.
>
>
>> +   fprintf(stderr, "internal error, COPY in batch mode");
>> +   abort();
>> I don't think that's a good idea.
>
>
My thinking there was that it's a "shouldn't happen" case. It's a problem
in library logic. I'd use an Assert() here in the backend.

I could printfPQExpBuffer(...) an error and return failure instead if you
think that's more appropriate. I'm not sure how the app would handle it
correctly, but OTOH it's generally better for libraries not to call
abort(). So I'll do that, but since it's an internal error that's not meant
to happen I'll skip the gettext calls.

> Error messages should also use libpq_gettext, and perhaps
>> be stored in conn->errorMessage as we do so for OOMs happening on
>> client-side and reporting them back even if they are not expected
>> (those are blocked PQsendQueryStart in your patch).
>>
>
I didn't get that last part, re PQsendQueryStart.

> src/test/examples is a good idea to show people what this new API can
>> do, but this is never getting compiled. It could as well be possible
>> to include tests in src/test/modules/, in the same shape as what
>> postgres_fdw is doing by connecting to itself and link it to libpq. As
>> this patch complicates quote a lot fe-exec.c, I think that this would
>> be worth it. Thoughts?
>
>
I think it makes sense to use the TAP framework. Added
src/test/modules/test_libpq/ with a test for async mode. Others can be
added/migrated based on that. I thought it made more sense for the tests to
live there than in src/interfaces//libpq/ since they need test client
programs and shouldn't pollute the library directory.

I've made the docs changes too. Thanks.

I fixed the list handling error. I'm amazed it appears to run fine, and
without complaint from valgrind, here, since it was an accidentally
_deleted_ line.

Re lists, I looked at simple_list.c and it's exceedingly primitive. Using
it would mean more malloc()ing since we'll have a list cell and then a
struct pointed to it, and won't recycle members, but... whatever. It's not
going to matter a great deal. The reason I did it with an embedded list
originally was because that's how it's done for PGnotify, but that's not
exactly new code

The bigger problem is that simple_list also uses pg_malloc, which won't set
conn->errorMessage, it'll just fprintf() and exit(1). I'm not convinced
it's appropriate to use that for libpq.

For now I've left list handling unchanged. If it's to move to a generic
list, it should probably be one that knows how to use
pg_malloc_extended(size, MCXT_ALLOC_NO_OOM) and emit its own
libpq-error-handling-aware error. I'm not sure whether that list should use
cell heads embedded in the structures it manages or pointing to them,
either.

Updated patch attached.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From fad0def5570907f5c8e3b6d65d57e5e7678e7383 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Fri, 20 May 2016 12:45:18 +0800
Subject: [PATCH] Pipelining (batch) support for libpq

Allow libpq clients to avoid excessive round trips by pipelining multiple
commands into batches. A sync is only sent at the end of a batch. Commands
in a batch succeed or fail together.

Adds TAP tests for libpq at src/test/modules/test_libpq .

Includes a test program in src/test/modules/test_libpq/testlibpqbatch.c
---
 doc/src/sgml/libpq.sgml  |  478 
 src/interfaces/libpq/.gitignore  |1 +
 src/interfaces/libpq/Makefile|5 +
 src/interfaces/libpq/exports.txt |7 +
 src/interfaces/libpq/fe-connect.c|   17 +
 src/interfaces/libpq/fe-exec.c   |  572 -

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-08-22 Thread Craig Ringer

On 10 August 2016 at 14:44, Michael Paquier 
wrote:

> On Fri, Jun 3, 2016 at 8:51 PM, Dmitry Igrishin  wrote:
> >> BTW, I've publushed the HTML-ified SGML docs to
> >> http://2ndquadrant.github.io/postgres/libpq-batch-mode.html as a
> preview.
> > Typo detected: "Returns 1 if the batch curently being received" --
> "curently".
>
> I am looking a bit more seriously at this patch and assigned myself as
> a reviewer.
>

Much appreciated.


> testlibpqbatch.c:1239:73: warning: format specifies type 'long' but
> the argument has type '__darwin_suseconds_t' (aka 'int') [-Wformat]
> printf("batch insert elapsed:  %ld.%06lds\n",
> elapsed_time.tv_sec, elapsed_time.tv_usec);
> macos complains here. You may want to replace %06lds by just %06d.
>

Yeah, or cast to a type known to be big enough. Will amend.


> This patch generates a core dump, use for example pg_ctl start -w and
> you'll bump into the trace above. There is something wrong with the
> queue handling.
>

Huh. I didn't see that here (Fedora 23). I'll look more closely.


> Do you have plans for a more generic structure for the command queue list?
>

No plans, no. This was a weekend experiment that turned into a useful patch
and I'm having to scrape up time for it amongst much more important things
like logical failover / sequence decoding and various other replication
work.

Thanks for the docs review too, will amend.


> +   fprintf(stderr, "internal error, COPY in batch mode");
> +   abort();
> I don't think that's a good idea. defaultNoticeProcessor can be
> overridden to allow applications to have error messages sent
> elsewhere. Error messages should also use libpq_gettext, and perhaps
> be stored in conn->errorMessage as we do so for OOMs happening on
> client-side and reporting them back even if they are not expected
> (those are blocked PQsendQueryStart in your patch).
>
> src/test/examples is a good idea to show people what this new API can
> do, but this is never getting compiled. It could as well be possible
> to include tests in src/test/modules/, in the same shape as what
> postgres_fdw is doing by connecting to itself and link it to libpq. As
> this patch complicates quote a lot fe-exec.c, I think that this would
> be worth it. Thoughts?


I didn't think it added much complexity to fe-exec.c personally. A lot of
the appeal is that it has very minor impact on anything that isn't using it.

I think it makes sense to (ab)use the recovery module tests for this,
invoking the test program from there.

Ideally I'd like to teach pgsql and pg_restore how to use async mode, but
that's a whole separate patch.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-08-10 Thread Michael Paquier

On Fri, Jun 3, 2016 at 8:51 PM, Dmitry Igrishin  wrote:
>> BTW, I've publushed the HTML-ified SGML docs to
>> http://2ndquadrant.github.io/postgres/libpq-batch-mode.html as a preview.
> Typo detected: "Returns 1 if the batch curently being received" -- "curently".

I am looking a bit more seriously at this patch and assigned myself as
a reviewer.

testlibpqbatch.c:1239:73: warning: format specifies type 'long' but
the argument has type '__darwin_suseconds_t' (aka 'int') [-Wformat]
printf("batch insert elapsed:  %ld.%06lds\n",
elapsed_time.tv_sec, elapsed_time.tv_usec);
macos complains here. You may want to replace %06lds by just %06d.

(lldb) bt
* thread #1: tid = 0x, 0x000108bcdd56
libpq.5.dylib`closePGconn(conn=0x7fd642d002d0) + 438 at
fe-connect.c:3010, stop reason = signal SIGSTOP
  * frame #0: 0x000108bcdd56
libpq.5.dylib`closePGconn(conn=0x7fd642d002d0) + 438 at
fe-connect.c:3010
frame #1: 0x000108bc9db0
libpq.5.dylib`PQfinish(conn=0x7fd642d002d0) + 32 at
fe-connect.c:3072
frame #2: 0x000108bc9ede
libpq.5.dylib`PQping(conninfo="dbname=postgres port=5432 host='/tmp'
connect_timeout=5") + 46 at fe-connect.c:539
frame #3: 0x000108bb5210
pg_ctl`test_postmaster_connection(pm_pid=78218, do_checkpoint='\0') +
976 at pg_ctl.c:681
frame #4: 0x000108bb388e pg_ctl`do_start + 302 at pg_ctl.c:915
frame #5: 0x000108bb29b4 pg_ctl`main(argc=5,
argv=0x7fff5704e5c0) + 2836 at pg_ctl.c:2416
frame #6: 0x7fff8b8b65ad libdyld.dylib`start + 1
(lldb) down 1
frame #0: 0x000108bcdd56
libpq.5.dylib`closePGconn(conn=0x7fd642d002d0) + 438 at
fe-connect.c:3010
   3007queue = conn->cmd_queue_recycle;
   3008{
   3009PGcommandQueueEntry *prev = queue;
-> 3010queue = queue->next;
   3011free(prev);
   3012}
   3013conn->cmd_queue_recycle = NULL;
This patch generates a core dump, use for example pg_ctl start -w and
you'll bump into the trace above. There is something wrong with the
queue handling.

Do you have plans for a more generic structure for the command queue list?

+   libpq supports queueing up mulitiple queries into
s/mulitiple/multiple/.

+  
+   An example of batch use may be found in the source distribution in
+   src/test/examples/libpqbatch.c.
+  
You mean testlibpqbatch.c here.

+   
+Batching less useful when information from one operation is required by the
+client before it knows enough to send the next operation. The client must
"Batching *is* less useful".

src/test/examples/.gitignore needs a new entry for the new test binary.

+   fprintf(stderr, "internal error, COPY in batch mode");
+   abort();
I don't think that's a good idea. defaultNoticeProcessor can be
overridden to allow applications to have error messages sent
elsewhere. Error messages should also use libpq_gettext, and perhaps
be stored in conn->errorMessage as we do so for OOMs happening on
client-side and reporting them back even if they are not expected
(those are blocked PQsendQueryStart in your patch).

src/test/examples is a good idea to show people what this new API can
do, but this is never getting compiled. It could as well be possible
to include tests in src/test/modules/, in the same shape as what
postgres_fdw is doing by connecting to itself and link it to libpq. As
this patch complicates quote a lot fe-exec.c, I think that this would
be worth it. Thoughts?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-06-03 Thread Dmitry Igrishin

2016-05-24 5:01 GMT+03:00 Craig Ringer :
>
> On 24 May 2016 at 00:00, Michael Paquier  wrote:
>>
>> On Mon, May 23, 2016 at 8:50 AM, Andres Freund  wrote:
>
>
>>
>> > yay^2.
>>
>> I'll follow this mood. Yeha.
>
>
>
> BTW, I've publushed the HTML-ified SGML docs to
> http://2ndquadrant.github.io/postgres/libpq-batch-mode.html as a preview.
Typo detected: "Returns 1 if the batch curently being received" -- "curently".

-- 
// Dmitry.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-05-26 Thread Tsunakawa, Takayuki

From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Craig Ringer
I'll follow this mood. Yeha.


BTW, I've publushed the HTML-ified SGML docs to 
http://2ndquadrant.github.io/postgres/libpq-batch-mode.html as a preview.


Sorry for my late reply.  Fantastic performance improvement, nice 
documentation, and amazing rapid development!  I think I’ll join the review & 
testing in 2016/9 CF.

Regards
Takayuki Tsunakawa

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-05-24 Thread Jim Nasby


On 5/23/16 4:19 AM, Craig Ringer wrote:

+Batching less useful when information from one operation is required by the


SB "Batching is less useful".
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-05-23 Thread Tom Lane

Craig Ringer  writes:
> On 24 May 2016 at 00:00, Michael Paquier  wrote:
>> Did you consider the use of simple_list.c instead of introducing a new
>> mimic as PGcommandQueueEntry? It would be cool avoiding adding new
>> list emulations on frontends.

> I'd have to extend simple_list to add a generic object version, like

> struct my_list_elem
> {
> PG_SIMPLE_LIST_ATTRS;
> mytype mycol;
> myothertype myothercol;
> }

> Objections?

That doesn't look exactly "generic".

> I could add a void* version that's a simple clone of the string version,
> but having to malloc both a list cell and its contents separately is
> annoying.

I'd be okay with a void* version, but I'm not sure about this.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-05-23 Thread Craig Ringer

On 24 May 2016 at 00:00, Michael Paquier  wrote:


>
> Did you consider the use of simple_list.c instead of introducing a new
> mimic as PGcommandQueueEntry? It would be cool avoiding adding new
> list emulations on frontends.
>

I'd have to extend simple_list to add a generic object version, like


struct my_list_elem
{
PG_SIMPLE_LIST_ATTRS;
mytype mycol;
myothertype myothercol;
}

Objections?

I could add a void* version that's a simple clone of the string version,
but having to malloc both a list cell and its contents separately is
annoying.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-05-23 Thread Craig Ringer

On 24 May 2016 at 00:00, Michael Paquier  wrote:

> On Mon, May 23, 2016 at 8:50 AM, Andres Freund  wrote:
>

> > yay^2.
>
> I'll follow this mood. Yeha.

BTW, I've publushed the HTML-ified SGML docs to
http://2ndquadrant.github.io/postgres/libpq-batch-mode.html as a preview.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-05-23 Thread Craig Ringer

On 24 May 2016 at 00:00, Michael Paquier  wrote:

> On Mon, May 23, 2016 at 8:50 AM, Andres Freund 
> wrote:
>
>> This should be very useful for optimising FDWs, Postgres-XC, etc.
> >
> > And optimizing normal clients.
> >
> > Not easy, but I'd be very curious how much psql's performance improves
> > when replaying a .sql style dump, and replaying from a !tty fd, after
> > hacking it up to use the batch API.
>

I didn't, but agree it'd be interesting. So would pg_restore, for that
matter, though its use of COPY for the bulk of its work means it wouldn't
make tons of difference.

I think it'd be safe to enable it automatically in psql's
--single-transaction mode. It's also safe to send anything after an
explicit BEGIN and until the next COMMIT as a batch from libpq, and since
it parses the SQL enough to detect statement boundaries already that
shouldn't be too hard to handle.

However: psql is synchronous, using the PQexec family of blocking calls.
It's all fairly well contained in SendQuery and PSQLexec, but making it use
batching still require restructuring those to use the asynchronous
nonblocking API and append the query to a pending-list, plus the addition
of a select() loop to handle results and dispatch more work. MainLoop()
isn't structured around a select or poll, it loops over gets. So while it'd
be interesting to see what difference batching made the changes to make
psql use it would be a bit more invasive. Far from a rewrite, but to avoid
lots of code duplication it'd have to change everything to use nonblocking
mode and a select loop, which is a big change for such a core tool.

This is a bit of a side-project and I've got to get back to "real work" so
I don't expect to do a proper patch for psql any time soon. I'd rather not
try to build too much on this until it's seen some review and I know the
API won't need a drastic rewrite anyway. I'll see if I can do a hacked-up
version one evening to see what it does for performance though.

Did you consider the use of simple_list.c instead of introducing a new
> mimic as PGcommandQueueEntry? It would be cool avoiding adding new
> list emulations on frontends.
>

Nope. I didn't realise it was there; I've done very little on the C client
and library side so far. So thanks, I'll update it accordingly.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-05-23 Thread Michael Paquier

On Mon, May 23, 2016 at 8:50 AM, Andres Freund  wrote:
> On 2016-05-23 17:19:09 +0800, Craig Ringer wrote:
>> Following on from the foreign table batch inserts thread[1], here's a patch
>> to add support for pipelining queries into asynchronous batches in libpq.
>
> Yay!
>> I'm measuring 300x (not %) performance improvements doing batches on
>> servers over the Internet, so this seems pretty worthwhile. It turned out
>> to be way less invasive than I expected too.
>
> yay^2.

I'll follow this mood. Yeha.

>> (I intentionally didn't add any way for clients to annotate each work-item
>> in a batch with their own private data. I think that'd be really useful and
>> would make implementing clients easier, but should be a separate patch).
>>
>> This should be very useful for optimising FDWs, Postgres-XC, etc.
>
> And optimizing normal clients.
>
> Not easy, but I'd be very curious how much psql's performance improves
> when replaying a .sql style dump, and replaying from a !tty fd, after
> hacking it up to use the batch API.

Did you consider the use of simple_list.c instead of introducing a new
mimic as PGcommandQueueEntry? It would be cool avoiding adding new
list emulations on frontends.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-05-23 Thread Andres Freund

Hi,

On 2016-05-23 17:19:09 +0800, Craig Ringer wrote:
> Hi all
> Following on from the foreign table batch inserts thread[1], here's a patch
> to add support for pipelining queries into asynchronous batches in libpq.

Yay!


> I'm measuring 300x (not %) performance improvements doing batches on
> servers over the Internet, so this seems pretty worthwhile. It turned out
> to be way less invasive than I expected too.

yay^2.


> (I intentionally didn't add any way for clients to annotate each work-item
> in a batch with their own private data. I think that'd be really useful and
> would make implementing clients easier, but should be a separate patch).
> 
> This should be very useful for optimising FDWs, Postgres-XC, etc.

And optimizing normal clients.

Not easy, but I'd be very curious how much psql's performance improves
when replaying a .sql style dump, and replaying from a !tty fd, after
hacking it up to use the batch API.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] PATCH: Batch/pipelining support for libpq

2016-05-23 Thread Craig Ringer

Hi all

Following on from the foreign table batch inserts thread[1], here's a patch
to add support for pipelining queries into asynchronous batches in libpq.

Attached, and also available at
https://github.com/2ndQuadrant/postgres/tree/dev/libpq-async-batch (subject
to rebasing and force pushes).

It's cleaned up over the draft I posted on that thread and has error
recovery implemented. I've written and included the SGML docs for it. The
test program is now pretty comprehensive, more so than for anything else in
libpq anyway. I'll submit it to the next CF as a 9.7/10.0 candidate.

I'm measuring 300x (not %) performance improvements doing batches on
servers over the Internet, so this seems pretty worthwhile. It turned out
to be way less invasive than I expected too.

(I intentionally didn't add any way for clients to annotate each work-item
in a batch with their own private data. I think that'd be really useful and
would make implementing clients easier, but should be a separate patch).

This should be very useful for optimising FDWs, Postgres-XC, etc.


[1]
http://www.postgresql.org/message-id/CAMsr+YFgDUiJ37DEfPRk8WDBuZ58psdAYJd8iNFSaGxtw=w...@mail.gmail.com

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 711a6951f57d84309bd2d7477a145ee3412b7967 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Fri, 20 May 2016 12:45:18 +0800
Subject: [PATCH] Pipelining (batch) support for libpq

Allow libpq clients to avoid excessive round trips by pipelining multiple
commands into batches. A sync is only sent at the end of a batch. Commands
in a batch succeed or fail together.

Includes a test program in src/test/examples/testlibpqbatch.c
---
 doc/src/sgml/libpq.sgml |  478 +
 src/interfaces/libpq/exports.txt|7 +
 src/interfaces/libpq/fe-connect.c   |   16 +
 src/interfaces/libpq/fe-exec.c  |  572 +++-
 src/interfaces/libpq/fe-protocol2.c |6 +
 src/interfaces/libpq/fe-protocol3.c |   17 +-
 src/interfaces/libpq/libpq-fe.h |   13 +-
 src/interfaces/libpq/libpq-int.h|   37 +-
 src/test/examples/.gitignore|1 +
 src/test/examples/Makefile  |2 +-
 src/test/examples/testlibpqbatch.c  | 1254 +++
 11 files changed, 2360 insertions(+), 43 deletions(-)
 create mode 100644 src/test/examples/testlibpqbatch.c

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 3829a14..d6e896f 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -4568,6 +4568,484 @@ int PQflush(PGconn *conn);
 
  
 
+ 
+  Batch mode and query pipelining
+
+  
+   libpq
+   batch mode
+  
+
+  
+   libpq
+   pipelining
+  
+
+  
+   libpq supports queueing up mulitiple queries into
+   a pipeline to be executed as a batch on the server. Batching queries allows
+   applications to avoid a client/server round-trip after each query to get
+   the results before issuing the next query.
+  
+
+  
+   An example of batch use may be found in the source distribution in
+   src/test/examples/libpqbatch.c.
+  
+
+  
+   When to use batching
+
+   
+Much like asynchronous query mode, there is no performance disadvantage to
+using batching and pipelining. It somewhat increased client application
+complexity and extra caution is required to prevent client/server network
+deadlocks, but can offer considerable performance improvements.
+   
+
+   
+Batching is most useful when the server is distant, i.e. network latency
+("ping time") is high, and when many small operations are being performed in
+rapid sequence. There is usually less benefit in using batches when each
+query takes many multiples of the client/server round-trip time to execute.
+A 100-statement operation run on a server 300ms round-trip-time away would take
+30 seconds in network latency alone without batching; with batching it may spend
+as little as 0.3s waiting for results from the server.
+   
+
+   
+Use batches when your application does lots of small
+INSERT, UPDATE and
+DELETE operations that can't easily be transformed into
+operations on sets or into a
+COPY operation.
+   
+
+   
+Batching less useful when information from one operation is required by the
+client before it knows enough to send the next operation. The client must
+introduce a synchronisation point and wait for a full client/server
+round-trip to get the results it needs. However, it's often possible to
+adjust the client design to exchange the required information server-side.
+Read-modify-write cycles are especially good candidates; for example:
+
+ BEGIN;
+ SELECT x FROM mytable WHERE id = 42 FOR UPDATE;
+ -- result: x=2
+ -- client adds 1 to x:
+ UPDATE mytable SET x = 3 WHERE id = 42;
+ COMMIT;
+
+could be much more efficiently done with:
+
+ UPDATE

94 matches

Mail list logo