Re: [HACKERS] foreign table batch inserts

2016-06-01 Thread Etsuro Fujita

On 2016/05/31 14:53, Amit Langote wrote:

On 2016/05/30 22:59, Craig Ringer wrote:

On 30 May 2016 at 16:17, Etsuro Fujita  wrote:



That's a good point, but the basic idea is to send the local query
almost as-is to the remote server if possible.  For example, if the local
query is "INSERT INTO foreign_table(a,b,c) VALUES (1, 2, 3), (4, 5, 6)",
send the remote query "INSERT INTO remote_table(a,b,c) VALUES (1, 2, 3),
(4, 5, 6)" to the remote server, where remote_table is the table name for
the foreign table on the remote server.  So, would the query string
length really be a problem in many cases?  Maybe I'm missing something, though.




FDWs don't operate at that level. They don't see the original query string.
They're plan nodes that operate with a row-by-row push/pull model. The
foreign table node in question has no idea you're doing a multivalued
insert and doesn't care if it's INSERT INTO ... SELECT, INSERT INTO ...
VALUES, or COPY.



IIUC, what Fujita-san seems to be referring to here is safe push-down of an
insert's query or values expression (and hence of the whole insert itself),
considered during the *planning* step.


That's exactly what I have in mind.  Thanks for the explanation!


Although that sounds like a
different optimization from what's being discussed on this thread.  The
latter certainly seems to have its benefits in cases where push-down fails,
which might well be the majority of cases.


Agreed.

Best regards,
Etsuro Fujita






Re: [HACKERS] foreign table batch inserts

2016-05-30 Thread Amit Langote
On 2016/05/30 22:59, Craig Ringer wrote:
> On 30 May 2016 at 16:17, Etsuro Fujita  wrote:
>>
>> That's a good point, but the basic idea is to send the local query
>> almost as-is to the remote server if possible.  For example, if the local
>> query is "INSERT INTO foreign_table(a,b,c) VALUES (1, 2, 3), (4, 5, 6)",
>> send the remote query "INSERT INTO remote_table(a,b,c) VALUES (1, 2, 3),
>> (4, 5, 6)" to the remote server, where remote_table is the table name for
>> the foreign table on the remote server.  So, would the query string
>> length really be a problem in many cases?  Maybe I'm missing something, though.
>> 
> 
> FDWs don't operate at that level. They don't see the original query string.
> They're plan nodes that operate with a row-by-row push/pull model. The
> foreign table node in question has no idea you're doing a multivalued
> insert and doesn't care if it's INSERT INTO ... SELECT, INSERT INTO ...
> VALUES, or COPY.

IIUC, what Fujita-san seems to be referring to here is safe push-down of an
insert's query or values expression (and hence of the whole insert itself),
considered during the *planning* step.  Although that sounds like a
different optimization from what's being discussed on this thread.  The
latter certainly seems to have its benefits in cases where push-down fails,
which might well be the majority of cases.

Thanks,
Amit






Re: [HACKERS] foreign table batch inserts

2016-05-30 Thread Craig Ringer
On 30 May 2016 at 16:17, Etsuro Fujita  wrote:


>
> That's a good point, but the basic idea is to send the local query
> almost as-is to the remote server if possible.  For example, if the local
> query is "INSERT INTO foreign_table(a,b,c) VALUES (1, 2, 3), (4, 5, 6)",
> send the remote query "INSERT INTO remote_table(a,b,c) VALUES (1, 2, 3),
> (4, 5, 6)" to the remote server, where remote_table is the table name for
> the foreign table on the remote server.  So, would the query string
> length really be a problem in many cases?  Maybe I'm missing something, though.
> 
>


FDWs don't operate at that level. They don't see the original query string.
They're plan nodes that operate with a row-by-row push/pull model. The
foreign table node in question has no idea you're doing a multivalued
insert and doesn't care if it's INSERT INTO ... SELECT, INSERT INTO ...
VALUES, or COPY.

That's why I think using batching is the way to go here. Each operation
remains isolated, but you don't force a round trip for each one; you just
queue them up on the wire and flush only at end-of-statement. A failure
will cause the statement to ERROR and abort the tx, so the effect is the
same, though the failure might surface a bit later than if you forced a
flush each time.
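
To make that concrete, here is a minimal sketch of the send side using the
draft batch API from the proof-of-concept patch posted elsewhere in this
thread (PQbeginBatchMode and PQendBatch are names from that WIP patch and
may well change; results are then drained in send order, as the fuller
example in the patch email shows):

PQbeginBatchMode(conn);

/* Parse once, then queue one Bind/Execute per row; nothing forces a
   round trip until the batch is ended. */
PQsendPrepare(conn, "ins",
              "INSERT INTO remote_table(a,b,c) VALUES ($1,$2,$3)", 3, NULL);

for (i = 0; i < nrows; i++)
    PQsendQueryPrepared(conn, "ins", 3, rowvals[i], NULL, NULL, 0);

/* One Sync for the whole statement; an error in any queued insert makes
   the server skip to the Sync, and the statement ERRORs as usual. */
PQendBatch(conn);
PQflush(conn);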

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] foreign table batch inserts

2016-05-30 Thread Etsuro Fujita

On 2016/05/27 8:49, Michael Paquier wrote:

On Thu, May 26, 2016 at 4:25 AM, Etsuro Fujita
 wrote:



Honestly, I didn't have any idea for executing such an insert efficiently,
but I had been thinking about executing an insert into a foreign table
efficiently by sending the whole insert to the remote server, if possible.
For example, if the insert is of the form:

INSERT INTO foreign_table(a,b,c) VALUES (1, 2, 3), (4, 5, 6) or
INSERT INTO foreign_table(a,b,c) SELECT a,b,c FROM foreign_table2

where foreign_table and foreign_table2 belong to the same foreign server,
then we could send the whole insert to the remote server.

Wouldn't that make sense?



Query strings have a limited length, an assumption that holds for
many code paths in the backend code, so doing that with a long string
would introduce more pain in the logic than anything else, as this
would become more data-type-sensitive.


That's a good point, but the basic idea is to send the local query
almost as-is to the remote server if possible.  For example, if the
local query is "INSERT INTO foreign_table(a,b,c) VALUES (1, 2, 3), (4,
5, 6)", send the remote query "INSERT INTO remote_table(a,b,c) VALUES
(1, 2, 3), (4, 5, 6)" to the remote server, where remote_table is the
table name for the foreign table on the remote server.  So, would the
query string length really be a problem in many cases?  Maybe I'm missing
something, though.


Best regards,
Etsuro Fujita






Re: [HACKERS] foreign table batch inserts

2016-05-26 Thread Michael Paquier
On Thu, May 26, 2016 at 4:25 AM, Etsuro Fujita
 wrote:
> On 2016/05/18 7:08, Michael Paquier wrote:
>>
>> On Wed, May 18, 2016 at 6:00 AM, Manuel Kniep  wrote:
>>>
>>> I realized that inserts into foreign tables are only done row by row.
>>> Consider copying data from one local table to a foreign table with
>>>
>>> INSERT INTO foreign_table(a,b,c) SELECT a,b,c FROM local_table;
>>>
>>> When the foreign server is, for example, in another datacenter with long
>>> latency, this has an enormous performance cost.
>
>
>> I am adding Fujita-san to the loop here; he is
>> quite involved with postgres_fdw these days, so perhaps he has some
>> input to offer.
>
>
> Honestly, I didn't have any idea for executing such an insert efficiently,
> but I had been thinking about executing an insert into a foreign table
> efficiently by sending the whole insert to the remote server, if possible.
> For example, if the insert is of the form:
>
> INSERT INTO foreign_table(a,b,c) VALUES (1, 2, 3), (4, 5, 6) or
> INSERT INTO foreign_table(a,b,c) SELECT a,b,c FROM foreign_table2
>
> where foreign_table and foreign_table2 belong to the same foreign server,
> then we could send the whole insert to the remote server.
>
> Wouldn't that make sense?

Query strings have a limited length, an assumption that holds for
many code paths in the backend code, so doing that with a long string
would introduce more pain in the logic than anything else, as this
would become more data-type-sensitive.
-- 
Michael




Re: [HACKERS] foreign table batch inserts

2016-05-26 Thread Etsuro Fujita

On 2016/05/18 7:08, Michael Paquier wrote:

On Wed, May 18, 2016 at 6:00 AM, Manuel Kniep  wrote:

I realized that inserts into foreign tables are only done row by row.
Consider copying data from one local table to a foreign table with

INSERT INTO foreign_table(a,b,c) SELECT a,b,c FROM local_table;

When the foreign server is, for example, in another datacenter with long latency,
this has an enormous performance cost.



I am adding Fujita-san to the loop here; he is
quite involved with postgres_fdw these days, so perhaps he has some
input to offer.


Honestly, I didn't have any idea for executing such an insert
efficiently, but I had been thinking about executing an insert into a
foreign table efficiently by sending the whole insert to the remote
server, if possible.  For example, if the insert is of the form:


INSERT INTO foreign_table(a,b,c) VALUES (1, 2, 3), (4, 5, 6) or
INSERT INTO foreign_table(a,b,c) SELECT a,b,c FROM foreign_table2

where foreign_table and foreign_table2 belong to the same foreign 
server, then we could send the whole insert to the remote server.


Wouldn't that make sense?

Best regards,
Etsuro Fujita






Re: [HACKERS] foreign table batch inserts

2016-05-23 Thread Craig Ringer
On 20 May 2016 at 23:18, Craig Ringer  wrote:

> On 20 May 2016 at 15:35, Craig Ringer  wrote:
>
>
>>
>> You can, however, omit Sync from between messages and send a series of
>> protocol messages, like
>>
>> Parse/Bind/Execute/Bind/Execute/Bind/Execute/Sync
>>
>> to avoid round-trip overheads.
>>
>>
> I implemented what I think is a pretty solid proof of concept of this for
> kicks this evening. Attached, including a basic test program.
> The performance difference over higher-latency links is huge; see below.
>

I finished it off and submitted it.


http://www.postgresql.org/message-id/flat/CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=b4dm...@mail.gmail.com#CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=b4dm...@mail.gmail.com


https://commitfest.postgresql.org/10/634/

I'll use the other thread for the patch from now on.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] foreign table batch inserts

2016-05-20 Thread Craig Ringer
On 20 May 2016 at 15:35, Craig Ringer  wrote:


>
> You can, however, omit Sync from between messages and send a series of
> protocol messages, like
>
> Parse/Bind/Execute/Bind/Execute/Bind/Execute/Sync
>
> to avoid round-trip overheads.
>
>
I implemented what I think is a pretty solid proof of concept of this for
kicks this evening. Attached, including a basic test program.
The performance difference over higher-latency links is huge; see below.

Demo/test program in src/test/examples/testlibpqbatch.c.

github: https://github.com/2ndQuadrant/postgres/tree/dev/libpq-async-batch



I still need to add the logic for handling an error during a batch by
discarding all input until the next Sync, but otherwise I think it's pretty
reasonable.

The time difference for 10k inserts on the local host over a unix socket
shows a solid improvement:

batch insert elapsed:  0.244293s
sequential insert elapsed: 0.375402s

... but over, say, a connection to a random AWS RDS instance fired up for
the purpose, about 320 ms away, the difference is huge:

batch insert elapsed:  9.029995s
sequential insert elapsed: (I got bored after 10 minutes; it should take a
bit less than an hour based on the latency numbers)

With 500 rows on the remote AWS RDS instance, once the I/O quota is already
saturated:

batch insert elapsed:  1.229024s
sequential insert elapsed: 156.962180s

which is an improvement by a factor of over 120.

I didn't compare vs COPY. I'm sure COPY will be faster, but COPY doesn't
let you do INSERT ... ON CONFLICT, do UPDATE, do DELETE, etc. Not without
temp tables and a bunch of hoop jumping anyway. If COPY solved everything
there'd be no point having pipelining.

No docs yet, but if folks think the interface is reasonable I can add them
easily, since the comments on each of the new functions should be easy to
adapt into the SGML docs.

With a bit of polishing I think this can probably go in the next CF, though
I only wrote it as an experiment. Can I get opinions on the API?

The TL;DR API, using the usual async libpq routines, is:


PQbeginBatchMode(conn);

PQsendQueryParams(conn, "BEGIN", 0, NULL, NULL, NULL, NULL, 0);

PQsendPrepare(conn, "my_update", "UPDATE ...");

PQsetnonblocking(conn, 1);

while (!all_responses_received)
{
   select(...)

   if (can-write)
   {
 if (app-has-more-data-to-send)
 {
   PQsendQueryPrepared(conn, "my_update", params-go-here);
 }
 else if (havent-sent-commit-yet)
 {
   PQsendQueryParams(conn, "COMMIT", ...);
 }
 else if (havent-sent-endbatch-yet)
 {
   PQendBatch(conn);
 }
 PQflush(conn);
   }

   if (can-read)
   {
 PQconsumeInput(conn);
 if (PQisBusy(conn))
   continue;
 res = PQgetResult(conn);
 if (res == NULL)
 {
   PQgetNextQuery(conn);
   continue;
 }
 /* process results in the same order we sent the commands */
 /* client keeps track of that, libpq just supplies the results */
 ...
   }
}

PQendBatch(conn);




Note that:

* PQsendQuery cannot be used, as it uses the simple query protocol; use
PQsendQueryParams instead;
* Batch supports PQsendQueryParams, PQsendPrepare, PQsendQueryPrepared,
PQsendDescribePrepared, PQsendDescribePortal;
* You don't call PQgetResult after dispatching each query
* Multiple batches may be pipelined; you don't have to wait for one to end
to start another (an advantage over JDBC's API)
* non-blocking mode isn't required, but is strongly advised

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From f0ca25bdc2bacf65530e4f180fdfc7c219866541 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Fri, 20 May 2016 12:45:18 +0800
Subject: [PATCH] Draft of libpq async pipelining support

Now with test in src/test/examples/testlibpqbatch.c
---
 src/interfaces/libpq/exports.txt|   6 +
 src/interfaces/libpq/fe-connect.c   |  16 +
 src/interfaces/libpq/fe-exec.c  | 540 ++--
 src/interfaces/libpq/fe-protocol2.c |   6 +
 src/interfaces/libpq/fe-protocol3.c |  13 +-
 src/interfaces/libpq/libpq-fe.h |  12 +-
 src/interfaces/libpq/libpq-int.h|  36 +-
 src/test/examples/Makefile  |   2 +-
 src/test/examples/testlibpqbatch.c  | 690 
 9 files changed, 1278 insertions(+), 43 deletions(-)
 create mode 100644 src/test/examples/testlibpqbatch.c

diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 21dd772..e297c4b 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -171,3 +171,9 @@ PQsslAttributeNames   168
 PQsslAttribute169
 PQsetErrorContextVisibility 170
 PQresultVerboseErrorMessage 171
+PQisInBatchMode   172
+PQqueriesInBatch  173
+PQbeginBatchMode  174
+PQendBatchMode175
+PQendBatch176
+PQgetNextQuery  

Re: [HACKERS] foreign table batch inserts

2016-05-20 Thread Craig Ringer
On 20 May 2016 at 08:47, Tsunakawa, Takayuki  wrote:

> From: pgsql-hackers-ow...@postgresql.org [mailto:
> pgsql-hackers-ow...@postgresql.org] On Behalf Of Craig Ringer
>
> Well, there's FE/BE level batching/pipelining already. Just no access to
> it from libpq.
>
>
>
> Oh, really.  The Bind ('B') appears to take one set of parameter values,
> not multiple sets (an array).  Anyway, I had to say "I want a batch update
> API in libpq" to use it in ODBC and ECPG.
>
>
Right, and there's no protocol-level support for array-valued batches.

You can, however, omit Sync from between messages and send a series of
protocol messages, like

Parse/Bind/Execute/Bind/Execute/Bind/Execute/Sync

to avoid round-trip overheads.
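
Schematically, three parameterised inserts that would otherwise cost three
round trips collapse into one flush and one Sync (C = client, S = server;
these are the actual v3 protocol message names):

C: Parse ("", "INSERT INTO t(a,b,c) VALUES ($1,$2,$3)")
C: Bind/Execute (1, 2, 3)
C: Bind/Execute (4, 5, 6)
C: Bind/Execute (7, 8, 9)
C: Sync                          <-- single flush, single round trip
S: ParseComplete, then BindComplete/CommandComplete per Execute,
   then ReadyForQuery for the Sync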


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] foreign table batch inserts

2016-05-19 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Craig Ringer
Well, there's FE/BE level batching/pipelining already. Just no access to it 
from libpq.

Oh, really.  The Bind ('B') appears to take one set of parameter values, not 
multiple sets (an array).  Anyway, I had to say "I want a batch update API in 
libpq" to use it in ODBC and ECPG.

Regards
Takayuki Tsunakawa



Re: [HACKERS] foreign table batch inserts

2016-05-19 Thread Craig Ringer
On 19 May 2016 at 14:08, Tsunakawa, Takayuki  wrote:


>
>
> Yes, I want FE-BE protocol-level batch inserts/updates/deletes, too.  I
> was just about to start thinking about how to implement it because of a
> recent user question in pgsql-odbc.  The OP uses Microsoft SQL Server
> Integration Services (SSIS) to migrate data to PostgreSQL.  He asked for a
> method to speed up multi-row inserts, because ODBC's multi-row insert API
> takes as long as performing single-row inserts separately.  This may
> prevent the migration to PostgreSQL.
>

Well, there's FE/BE level batching/pipelining already. Just no access to it
from libpq.



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] foreign table batch inserts

2016-05-19 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Craig Ringer
On 19 May 2016 at 01:39, Michael Paquier  wrote:
On Wed, May 18, 2016 at 12:27 PM, Craig Ringer  wrote:
> On 18 May 2016 at 06:08, Michael Paquier  wrote:
>> > Wouldn’t it make sense to do the insert batch-wise, e.g. 100 rows?
>>
>> Using a single query string with multiple VALUES rows, perhaps, but then
>> the query string length limit comes into consideration, particularly for
>> large text values... The query used for the insertion is a prepared
>> statement (writable foreign tables have been supported since 9.3), which
>> actually keeps the code quite simple.
>
> This should be done the way PgJDBC does batches. It'd require a libpq
> enhancement, but it's one we IMO need anyway: allow pipelined query
> execution from libpq.

That's also something that would be useful for the ODBC driver. Since
it is using libpq as a hard dependency and does not speak the protocol
directly, it is doing additional round trips to the server for this
exact reason when preparing a statement.


Yes, I want FE-BE protocol-level batch inserts/updates/deletes, too.  I was 
just about to start thinking about how to implement it because of a recent user 
question in pgsql-odbc.  The OP uses Microsoft SQL Server Integration Services 
(SSIS) to migrate data to PostgreSQL.  He asked for a method to speed up 
multi-row inserts, because ODBC's multi-row insert API takes as long as 
performing single-row inserts separately.  This may prevent the 
migration to PostgreSQL.

And it's also useful for ECPG.  Our customer wanted ECPG to support multi-row 
inserts in order to migrate to PostgreSQL, because their embedded-SQL apps use 
that feature with a commercial database.

If you take on this feature, I can help by reviewing and testing, 
implementing the ODBC and ECPG sides, etc.

Regards
Takayuki Tsunakawa



Re: [HACKERS] foreign table batch inserts

2016-05-18 Thread Craig Ringer
On 19 May 2016 at 01:39, Michael Paquier  wrote:

> On Wed, May 18, 2016 at 12:27 PM, Craig Ringer 
> wrote:
> > On 18 May 2016 at 06:08, Michael Paquier 
> wrote:
> >> > Wouldn’t it make sense to do the insert batch-wise, e.g. 100 rows?
> >>
> >> Using a single query string with multiple VALUES rows, perhaps, but then
> >> the query string length limit comes into consideration, particularly for
> >> large text values... The query used for the insertion is a prepared
> >> statement (writable foreign tables have been supported since 9.3), which
> >> actually keeps the code quite simple.
> >
> > This should be done the way PgJDBC does batches. It'd require a libpq
> > enhancement, but it's one we IMO need anyway: allow pipelined query
> > execution from libpq.
>
> That's also something that would be useful for the ODBC driver. Since
> it is using libpq as a hard dependency and does not speak the protocol
> directly, it is doing additional round trips to the server for this
> exact reason when preparing a statement.
>

Good to know. It'll hurt especially badly when statement-level rollback is
enabled, since psqlODBC creates savepoints then, and it'd be able to get rid
of an extra pair of round trips.

It looks like there's plenty of use for this. FDWs, psqlODBC, client
applications doing batches, and Postgres-XL would all benefit from it.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] foreign table batch inserts

2016-05-18 Thread Michael Paquier
On Wed, May 18, 2016 at 12:27 PM, Craig Ringer  wrote:
> On 18 May 2016 at 06:08, Michael Paquier  wrote:
>> > Wouldn’t it make sense to do the insert batch-wise, e.g. 100 rows?
>>
>> Using a single query string with multiple VALUES rows, perhaps, but then
>> the query string length limit comes into consideration, particularly for
>> large text values... The query used for the insertion is a prepared
>> statement (writable foreign tables have been supported since 9.3), which
>> actually keeps the code quite simple.
>
> This should be done the way PgJDBC does batches. It'd require a libpq
> enhancement, but it's one we IMO need anyway: allow pipelined query
> execution from libpq.

That's also something that would be useful for the ODBC driver. Since
it is using libpq as a hard dependency and does not speak the protocol
directly, it is doing additional round trips to the server for this
exact reason when preparing a statement.

> [design follows]
> This would require libpq to be smarter about how it tracks queries. Right
> now it keeps track of current query, query results, etc directly in the
> connection object, and it sends a Sync after each operation then expects to
> wait in a busy state until it gets the results from that operation.

Yep.

> Instead we'd have to have a FIFO queue of messages libpq expects responses
> for. Variants of PQsendPrepare, PQsendQueryPrepared, PQsendDescribePrepared,
> etc would not  send a Sync message and would append an entry to the expected
> result queue instead of setting the current query, etc on the connection.
> They'd still mark the connection as busy, so no non-queue-aware calls could
> be run until the queue is consumed and empty.

Yep. That's exactly the ODBC regression, which becomes a huge problem
with more latency.
-- 
Michael




Re: [HACKERS] foreign table batch inserts

2016-05-18 Thread Craig Ringer
On 18 May 2016 at 06:08, Michael Paquier  wrote:


> > Wouldn’t it make sense to do the insert batch-wise, e.g. 100 rows?
>
> Using a single query string with multiple VALUES rows, perhaps, but then
> the query string length limit comes into consideration, particularly for
> large text values... The query used for the insertion is a prepared
> statement (writable foreign tables have been supported since 9.3), which
> actually keeps the code quite simple.
>

This should be done the way PgJDBC does batches. It'd require a libpq
enhancement, but it's one we IMO need anyway: allow pipelined query
execution from libpq.

[design follows]

What this should be doing is:

- send Parse to create an unnamed prepared statement; then
- loop, and:
  - send a Bind & an Execute for the query with values if the send buffer
is not full
  - If there are no more values to send, send a Sync message
  - Receive and process results if the receive buffer is not empty
  - Check each result and mark it off against the list of dispatched queries
  - If an ERROR is received, bail out
  - If a Sync is received, check that all results have been retrieved as
expected then return OK

This would require libpq to be smarter about how it tracks queries. Right
now it keeps track of current query, query results, etc directly in the
connection object, and it sends a Sync after each operation then expects to
wait in a busy state until it gets the results from that operation.

Instead we'd have to have a FIFO queue of messages libpq expects responses
for. Variants of
PQsendPrepare, PQsendQueryPrepared, PQsendDescribePrepared, etc would not
 send a Sync message and would append an entry to the expected result queue
instead of setting the current query, etc on the connection. They'd still
mark the connection as busy, so no non-queue-aware calls could be run until
the queue is consumed and empty.
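
Purely as an illustration of the bookkeeping involved (these are not the
patch's actual data structures), the queued entries might look like:

typedef enum
{
    BATCH_ENTRY_QUERY,      /* expect query results, then NULL from PQgetResult */
    BATCH_ENTRY_DESCRIBE,   /* expect a statement/portal description */
    BATCH_ENTRY_SYNC        /* expect ReadyForQuery from the server */
} BatchEntryType;

typedef struct BatchQueueEntry
{
    BatchEntryType type;
    struct BatchQueueEntry *next;   /* singly linked FIFO, popped as responses arrive */
} BatchQueueEntry;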

These functions might return some kind of handle value that can be used to
identify the queue entry they created; it'd be pretty useless at the
moment, but needed if we ever get "cancel queries up to X" functionality on
the protocol or if we later added buffering of multiple query results.

A new PQsendSync or similar would be added to send a synchronisation point,
which would go into the FIFO. Clients would call that after enqueueing a
batch of work, e.g. after sending a commit for a batched xact. That's
required for error recovery.

Clients would use PQgetResults as before. When it returns null, they'd call
a new PQnextResult(...) function to initiate processing of the next
operation's input; this would pop the next operation from the FIFO, or
return null if there's nothing more in the queue. PQisBusy returns true
until there are no items left in the queue.

We'd still use the connection object for result sets, fetching rows, etc,
as there can still only be one "current" query for which a response is
being received from the server. Nothing much would change with PQgetResult
etc. There wouldn't be any PQgetResult variant to wait for results of the
nth query or for some kind of query handle, because we need the client to
consume the results of all prior queries. The client must process query
results in FIFO order. We could have per-query result buffers, etc, but it
seems pretty pointless; the client can do this for itself if it wants.

If the server sends an error, libpq would pop queries off the queue until
it reaches the Sync there, and consume input on the socket until it gets
to a Sync on the wire. PQgetResult for each queued operation so skipped
would return a state indicating that it didn't execute because of an error
in a prior operation.

Such an API would benefit immensely from the "cancel up to" functionality
we discussed here recently; without it, it's hard to cancel anything
reliably and know what exactly you're cancelling, but it doesn't need it.
The cancel problem isn't much worse than before.

If we wanted to allow batch execution from the sync API we'd need a new
function that takes a prepared query and an array of values and manages the
send and receive buffer polling using the async API internally, since we
need to use nonblocking sockets to avoid deadlocking.
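
A hypothetical signature for such a function (name and shape invented here
purely for illustration; nothing like this is in the patch):

/* Run one prepared statement over nSets parameter sets, interleaving
 * sends and receives internally on a nonblocking socket so that neither
 * side's buffer can fill up and deadlock.  paramValues[i] is the usual
 * PQexecPrepared-style value array for set i.  Returns one PGresult per
 * parameter set, in send order. */
PGresult **PQexecPreparedBatch(PGconn *conn,
                               const char *stmtName,
                               int nSets,
                               int nParams,
                               const char *const *paramValues[]);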

I don't think this would look that different to current libpq code to the
user. Ignoring the details about error handling on command dispatch, etc.
The app would just call a series of PQqueuePrepare, PQqueueQueryPrepared,
etc (bikeshed as desired) then PQsendSync(...). Then it'd call PQgetResults
until it returns null, call PQgetNextResult(...) and resume calling
PQgetResults(...). Repeat until PQgetNextResult(...) returns null, and
check that the most recent result was a PGRES_SYNC_OK, which is what we'll
return from processing a PQsendSync(...) result.

If the client wants to be totally nonblocking it can do the PQconsumeInput
and PQflush dance as normal. It's strongly preferable for the client to use
non-blocking writes, because if it doesn't then it risks creating a
deadlock, with client and server each blocked on writes to the other.

Re: [HACKERS] foreign table batch inserts

2016-05-17 Thread Michael Paquier
On Wed, May 18, 2016 at 6:00 AM, Manuel Kniep  wrote:
> I realized that inserts into foreign tables are only done row by row.
> Consider copying data from one local table to a foreign table with
>
> INSERT INTO foreign_table(a,b,c) SELECT a,b,c FROM local_table;
>
> When the foreign server is, for example, in another datacenter with long
> latency, this has an enormous performance cost.
>
> Wouldn’t it make sense to do the insert batch-wise, e.g. 100 rows?

Using a single query string with multiple VALUES rows, perhaps, but then
the query string length limit comes into consideration, particularly for
large text values... The query used for the insertion is a prepared
statement (writable foreign tables have been supported since 9.3), which
actually keeps the code quite simple.
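
For context, the status quo is effectively one synchronous round trip per
row, something like this (illustrative only, not postgres_fdw's actual
code, and the statement name is made up):

for (i = 0; i < nrows; i++)
{
    /* each row pays a full Bind/Execute/Sync round trip */
    PGresult *res = PQexecPrepared(conn, "fdw_insert_stmt", 3,
                                   rowvals[i], NULL, NULL, 0);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        /* report the error */ ;
    PQclear(res);
}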

> Are there any plans to do that, or am I missing something?

Not that I know of. I am adding Fujita-san to the loop here; he is
quite involved with postgres_fdw these days, so perhaps he has some
input to offer.
-- 
Michael




[HACKERS] foreign table batch inserts

2016-05-17 Thread Manuel Kniep
Hi,

I realized that inserts into foreign tables are only done row by row.
Consider copying data from one local table to a foreign table with

INSERT INTO foreign_table(a,b,c) SELECT a,b,c FROM local_table;

When the foreign server is, for example, in another datacenter with long latency,
this has an enormous performance cost.

Wouldn’t it make sense to do the insert batch-wise, e.g. 100 rows?
Are there any plans to do that, or am I missing something?

regards

Manuel Kniep


