Re: postgres_fdw insert extremely slow

2020-11-29 Thread Craig Ringer
On Thu, Nov 26, 2020 at 12:37 AM Mats Julian Olsen 
wrote:

>
> We have just set up postgres_fdw between two postgres databases, x and y,
> with the plan to periodically insert data from x into y.
>
> We've successfully set up the connection with a few options:
> `use_remote_estimate 'true'` and `fetch_size '5'` (the latter we've
> played around with). We've run ANALYZE on the foreign server.
>
> SELECTs against the foreign table returns in milliseconds, however an
> INSERT of 100 rows takes 10 seconds. we have roughly 200 000 rows for the
> initial sync, which translates into ~6 hours.
>
> Is this the expected performance of postgre_fdw? Is there anything we've
> overlooked when setting this up? Very curious to hear experiences from the
> community when doing read/write and not just read from foreign sources.
>

Are your inserts run in individual transactions or grouped into one
transaction? If the latter, commit time will be a factor.

What's the round-trip time (ping time) to the foreign server? Since
postgres_fdw runs each individual insert as a separate statement, you're
going to face insert times of (n * RTT) for inserts. Assuming negligible
time for insert execution on the foreign server, your runtime is 21600
seconds for 20 rows, i.e. 9.25 rows/second or 0.108 seconds/row. That
would be consistent with a 90-100ms ping time to the foreign server.

You'll be pleased to know that there is currently work ongoing in
pgsql-hackers to add the capability to batch INSERTs in postgres_fdw to
improve performance on higher latency links. See
https://www.postgresql.org/message-id/flat/20200628151002.7x5laxwpgvkyiu3q%40development
. That could well reduce your RTTs immensely. Try the patch out if you can
and report back please.

If you can get the client application to manage the foreign insert
directly, then handle commit consistency using two-phase commit, you should
be able to do the insert in half an hour or less instead (assuming ~10ms
execution time per insert and 90ms RTT). If you use `COPY`, or if you can
use JDBC to benefit from PgJDBC's support for the JDBC addBatch() and
executeBatch() APIs, you should be able to get it down way lower than that.
Assuming your RTT latency is 90ms and you spend 10ms executing each insert,
your insert time might well go down to 0.010 * 20 + 90*2 = 2180 seconds
or about 36 minutes. If you can insert a row in 3ms with COPY, 13 minutes.

There's work ongoing on making libpq (which underlies postgres_fdw) capable
of running multiple statements at the same time, i.e. "pipelining". That
won't immediately benefit postgres_fdw because using it in postgres_fdw
would require changes to the whole postgres executor as well. But if
adopted, it'd allow postgres_fdw to achieve that sort of performance
transparently.


Re: postgres_fdw insert extremely slow

2020-11-27 Thread David G. Johnston
On Fri, Nov 27, 2020 at 2:00 PM pabloa98  wrote:

> I would like to suggest for postgres_fdw: If the foreign database is
> PostgreSQL,
>

Just to be clear, the "postgres" part of the name means the remote database
must be a PostgreSQL database, there is no "if".  Likewise, for the
extension mysql_fdw the remote database is "MySQL".


> the link should just pass through all the CRUD SQL commands to the other
> database.
>
> If the other database is of a version so different that cannot make sense
> of the CRUD SQL command, it will generate an error and that's it.
>
> This would be very useful to keep datasets synchronized.
>
>
We already offer a tool for that, dblink.

https://www.postgresql.org/docs/current/dblink.html

But the generalized behavior of FDW doesn't work to just send a raw SQL
command across, even for CRUD.
e.g.,
UPDATE remote_tbl
SET ...
FROM local_tbl
WHERE remote_tbl.col_id=local_tbl.col_id;
or
INSERT INTO remote_tbl SELECT * FROM local_tbl;

Not saying that there isn't room for improvement here but I'm doubting it's
simple.

David J.


Re: postgres_fdw insert extremely slow

2020-11-27 Thread pabloa98
I would like to suggest for postgres_fdw: If the foreign database is
PostgreSQL, the link should just pass through all the CRUD SQL commands to
the other database.

If the other database is of a version so different that cannot make sense
of the CRUD SQL command, it will generate an error and that's it.

This would be very useful to keep datasets synchronized.

Pablo


Re: postgres_fdw insert extremely slow

2020-11-27 Thread Mats Julian Olsen

On 27.11.2020 10:11, pabloa98 wrote:



On Thu, Nov 26, 2020 at 8:25 PM Laurenz Albe > wrote:


On Thu, 2020-11-26 at 09:07 -0800, Adrian Klaver wrote:
> So even if Mats where to break this query:
>
> INSERT INTO foreign.labels (address, labels)
> SELECT address_id, ARRAY_AGG(name) AS labels
> FROM labels
> GROUP BY 1
> LIMIT 100;
>
> down into something like this:
>
> INSERT INTO foreign.labels (address, labels)
> VALUES (), (), (), ();
>
> postgres_fdw would send it as individual INSERTs?

Yes, that's the way the FDW API works.

I can confirm that it is NOT like FDW API work. I have the same 
problem and in the fdw database it receives:


INSERT INTO foreign.labels (address, labels) VALUES ();

One value at a time.

Same scenario, same performance. No network related. It is just a 
batch insert is split into 1 insert by row.



Pablo


Thank you all for your replies.

We've decided not to use postgres_fdw for the time being because of this 
behavior. Hope to revisit in the future.


Best,

--
Mats
CTO @ Dune Analytics
We're hiring: https://careers.duneanalytics.com



Re: postgres_fdw insert extremely slow

2020-11-27 Thread pabloa98
On Thu, Nov 26, 2020 at 8:25 PM Laurenz Albe 
wrote:

> On Thu, 2020-11-26 at 09:07 -0800, Adrian Klaver wrote:
> > So even if Mats where to break this query:
> >
> > INSERT INTO foreign.labels (address, labels)
> > SELECT address_id, ARRAY_AGG(name) AS labels
> > FROM labels
> > GROUP BY 1
> > LIMIT 100;
> >
> > down into something like this:
> >
> > INSERT INTO foreign.labels (address, labels)
> > VALUES (), (), (), ();
> >
> > postgres_fdw would send it as individual INSERTs?
>
> Yes, that's the way the FDW API works.
>
> I can confirm that it is NOT like FDW API work. I have the same problem
and in the fdw database it receives:

INSERT INTO foreign.labels (address, labels) VALUES ();

One value at a time.

Same scenario, same performance. No network related. It is just a batch
insert is split into 1 insert by row.


Pablo


Re: postgres_fdw insert extremely slow

2020-11-26 Thread Laurenz Albe
On Thu, 2020-11-26 at 09:07 -0800, Adrian Klaver wrote:
> So even if Mats where to break this query:
> 
> INSERT INTO foreign.labels (address, labels)
> SELECT address_id, ARRAY_AGG(name) AS labels
> FROM labels
> GROUP BY 1
> LIMIT 100;
> 
> down into something like this:
> 
> INSERT INTO foreign.labels (address, labels)
> VALUES (), (), (), ();
> 
> postgres_fdw would send it as individual INSERTs?

Yes, that's the way the FDW API works.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: postgres_fdw insert extremely slow

2020-11-26 Thread Adrian Klaver

On 11/25/20 11:18 AM, Tom Lane wrote:

Mats Julian Olsen  writes:

I've got some more numbers here:
...
To me this does indicate some sort of networking issue, but I'm
wondering if INSERTs are treated differently than SELECTs in
postgres_fdw? The only feasibly explanation I have is that postgres_fdw
does many more network calls for INSERT than for SELECT, e.g. something
like 1 for SELECT and `n` for INSERT?


I don't have the code in front of me, but from memory, postgres_fdw
will issue an INSERT statement to the remote for each row it has to
insert.  Maybe you are indeed just dealing with spectacularly bad
network round trip times.

You could try turning on log_statement and/or log_duration on the
remote to see if that sheds any more light about how much time is
spent executing each insertion vs. the network delay.

There's been some recent discussion about teaching postgres_fdw to
batch insertions, which would likely be helpful in your situation.
I don't know how close that is to committable, but in any case
it couldn't see the light of day earlier than v14.  In the meantime,
if you're sufficiently desperate maybe you could switch to using
dblink with manually-issued multi-row INSERTs.  (This needn't
preclude continuing to access the table with postgres_fdw when
that's suitable.)


So even if Mats where to break this query:

INSERT INTO foreign.labels (address, labels)
SELECT address_id, ARRAY_AGG(name) AS labels
FROM labels
GROUP BY 1
LIMIT 100;

down into something like this:

INSERT INTO foreign.labels (address, labels)
VALUES (), (), (), ();

postgres_fdw would send it as individual INSERTs?



regards, tom lane




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: postgres_fdw insert extremely slow

2020-11-25 Thread Tom Lane
Mats Julian Olsen  writes:
> I've got some more numbers here:
> ...
> To me this does indicate some sort of networking issue, but I'm 
> wondering if INSERTs are treated differently than SELECTs in 
> postgres_fdw? The only feasibly explanation I have is that postgres_fdw 
> does many more network calls for INSERT than for SELECT, e.g. something 
> like 1 for SELECT and `n` for INSERT?

I don't have the code in front of me, but from memory, postgres_fdw
will issue an INSERT statement to the remote for each row it has to
insert.  Maybe you are indeed just dealing with spectacularly bad
network round trip times.

You could try turning on log_statement and/or log_duration on the
remote to see if that sheds any more light about how much time is
spent executing each insertion vs. the network delay.

There's been some recent discussion about teaching postgres_fdw to
batch insertions, which would likely be helpful in your situation.
I don't know how close that is to committable, but in any case
it couldn't see the light of day earlier than v14.  In the meantime,
if you're sufficiently desperate maybe you could switch to using
dblink with manually-issued multi-row INSERTs.  (This needn't
preclude continuing to access the table with postgres_fdw when
that's suitable.)

regards, tom lane




Re: postgres_fdw insert extremely slow

2020-11-25 Thread Mats Julian Olsen



On 25.11.2020 18:12, Tom Lane wrote:

Mats Julian Olsen  writes:

On 25.11.2020 17:58, Tom Lane wrote:

... Have you tried looking into pg_locks on the
remote server while this query is running?

Thanks Tom, I'll try to spin up a regular Postgres instance on both rds
and ec2 and see if that helps. As for the locks, I can not see any
blocked activity on the remote server while the query runs.

Another place to check is pg_stat_activity, specifically wait_event_type
and wait_event.


Thanks Tom, I've now setup the fdw from a local postgres instance and 
seeing similar timings for the selects and inserts.


I've got some more numbers here:

x (aurora aws eu-west-1) => y  (gcp gce us-central-1):  ~15 s

local (eu) => y: ~15 s

local (eu) => test 1 (postgres aws rds eu-west-1): ~4 s

local (eu) => test 2 (postgres google cloud sql, us-central-1): ~15s

local (eu) => local (eu): < 1s

SELECTs in all instances are sub-second.


To me this does indicate some sort of networking issue, but I'm 
wondering if INSERTs are treated differently than SELECTs in 
postgres_fdw? The only feasibly explanation I have is that postgres_fdw 
does many more network calls for INSERT than for SELECT, e.g. something 
like 1 for SELECT and `n` for INSERT?


Best,


Mats





Re: postgres_fdw insert extremely slow

2020-11-25 Thread Tom Lane
Mats Julian Olsen  writes:
> On 25.11.2020 17:58, Tom Lane wrote:
>> ... Have you tried looking into pg_locks on the
>> remote server while this query is running?

> Thanks Tom, I'll try to spin up a regular Postgres instance on both rds 
> and ec2 and see if that helps. As for the locks, I can not see any 
> blocked activity on the remote server while the query runs.

Another place to check is pg_stat_activity, specifically wait_event_type
and wait_event.

regards, tom lane




Re: postgres_fdw insert extremely slow

2020-11-25 Thread Mats Julian Olsen



On 25.11.2020 17:58, Tom Lane wrote:

Mats Julian Olsen  writes:

Postgres version(s)?

x: Aurora PostgreSQL 11.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC)
4.9.3, 64-bit (RDS)
y: PostgreSQL 12.2 (Ubuntu 12.2-2.pgdg19.10+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, 64-bit (GCP)

Hmm, I wonder if Aurora could be doing something funny here?

Anyway, to answer your question, no that's not the expected level of
performance.  postgres_fdw is certainly not inexpensive compared to
local table access, but I'd still think inserts should take only a
small number of milliseconds.  It seems like something must be
blocking the query.  Have you tried looking into pg_locks on the
remote server while this query is running?

regards, tom lane
Thanks Tom, I'll try to spin up a regular Postgres instance on both rds 
and ec2 and see if that helps. As for the locks, I can not see any 
blocked activity on the remote server while the query runs.


--
Mats
CTO @ Dune Analytics
We're hiring: https://careers.duneanalytics.com





Re: postgres_fdw insert extremely slow

2020-11-25 Thread Mats Julian Olsen



On 25.11.2020 17:57, Adrian Klaver wrote:

On 11/25/20 8:48 AM, Mats Julian Olsen wrote:

Apologies for the sloppiness!

Postgres version(s)?


x: Aurora PostgreSQL 11.7 on x86_64-pc-linux-gnu, compiled by gcc 
(GCC) 4.9.3, 64-bit (RDS)


y: PostgreSQL 12.2 (Ubuntu 12.2-2.pgdg19.10+1) on 
x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 
20191008, 64-bit (GCP)


So they are separated by what network distance?

By quite a lot! One server is in eu-west-1 and the other in gcp 
us-central1-a. Note however, that SELECTs across the network are very 
fast, so it doesn't seem like this is a network issue.




Without the actual query and EXPLAIN ANALYZE on it this will be 
difficult to answer with any detail.


Also would be nice to have the table schema.



Here's the query, schema and the EXPLAIN ANALYZE


FYI, you will get more responses by including below information 
directly in the email. To that end:


On x

CREATE TABLE labels (
    id integer NOT NULL,
    address_id bytea NOT NULL,
    name text NOT NULL,
    author character varying(50) NOT NULL,
    type text NOT NULL,
    source text,
    updated_at timestamp with time zone DEFAULT now() NOT NULL,
    CONSTRAINT lowercase_name CHECK ((name = lower(name))),
    CONSTRAINT lowercase_type CHECK ((type = lower(type))),
    CONSTRAINT whitespace_name CHECK ((regexp_replace(btrim(name, ' 
'::text), '(\s+)'::text, ' '::text, 'g'::text) = name)),
    CONSTRAINT whitespace_type CHECK ((regexp_replace(btrim(type, ' 
'::text), '(\s+)'::text, ' '::text, 'g'::text) = type))

);
CREATE INDEX labels_label ON labels USING BTREE (name);
CREATE INDEX labels_label_type ON labels USING BTREE (type);
CREATE INDEX labels_address ON labels USING HASH (address_id);
CREATE INDEX labels_source ON labels USING BTREE (source);

On y

CREATE TABLE labels.labels (
    address bytea PRIMARY KEY,
    labels  text[]
);



query: https://gist.github.com/mewwts/704663e19a21bc51864f3298f09a18c0


INSERT INTO foreign.labels (address, labels)
SELECT address_id, ARRAY_AGG(name) AS labels
FROM labels
GROUP BY 1
LIMIT 100;



plan: https://explain.depesz.com/s/RQFQ


Insert on labels  (cost=0.42..26.19 rows=100 width=53) (actual 
time=11541.205..11541.205 rows=0 loops=1)
   ->  Limit  (cost=0.42..25.19 rows=100 width=53) (actual 
time=0.068..3.549 rows=100 loops=1)
 ->  GroupAggregate  (cost=0.42..26502.02 rows=106996 
width=53) (actual time=0.066..3.449 rows=100 loops=1)

   Group Key: labels_1.address_id
   ->  Index Only Scan using 
labels_address_id_type_name_key on labels labels_1 
(cost=0.42..24068.85 rows=219145 width=31) (actual time=0.054..0.414 
rows=201 loops=1)

 Heap Fetches: 0
 Planning Time: 0.102 ms
 Execution Time: 12797.143 ms





Best,

Thank you for inlining this!

--
Mats
CTO @ Dune Analytics
We're hiring: https://careers.duneanalytics.com





Re: postgres_fdw insert extremely slow

2020-11-25 Thread Tom Lane
Mats Julian Olsen  writes:
>> Postgres version(s)?

> x: Aurora PostgreSQL 11.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 
> 4.9.3, 64-bit (RDS)
> y: PostgreSQL 12.2 (Ubuntu 12.2-2.pgdg19.10+1) on x86_64-pc-linux-gnu, 
> compiled by gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, 64-bit (GCP)

Hmm, I wonder if Aurora could be doing something funny here?

Anyway, to answer your question, no that's not the expected level of
performance.  postgres_fdw is certainly not inexpensive compared to
local table access, but I'd still think inserts should take only a
small number of milliseconds.  It seems like something must be
blocking the query.  Have you tried looking into pg_locks on the
remote server while this query is running?

regards, tom lane




Re: postgres_fdw insert extremely slow

2020-11-25 Thread Adrian Klaver

On 11/25/20 8:48 AM, Mats Julian Olsen wrote:

Apologies for the sloppiness!

Postgres version(s)?


x: Aurora PostgreSQL 11.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 
4.9.3, 64-bit (RDS)


y: PostgreSQL 12.2 (Ubuntu 12.2-2.pgdg19.10+1) on x86_64-pc-linux-gnu, 
compiled by gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, 64-bit (GCP)


So they are separated by what network distance?





Without the actual query and EXPLAIN ANALYZE on it this will be 
difficult to answer with any detail.


Also would be nice to have the table schema.



Here's the query, schema and the EXPLAIN ANALYZE


FYI, you will get more responses by including below information directly 
in the email. To that end:


On x

CREATE TABLE labels ( 

id integer NOT NULL, 

address_id bytea NOT NULL, 

name text NOT NULL, 

author character varying(50) NOT NULL, 

type text NOT NULL, 

source text, 

updated_at timestamp with time zone DEFAULT now() NOT NULL, 

CONSTRAINT lowercase_name CHECK ((name = lower(name))), 

CONSTRAINT lowercase_type CHECK ((type = lower(type))), 

CONSTRAINT whitespace_name CHECK ((regexp_replace(btrim(name, ' 
'::text), '(\s+)'::text, ' '::text, 'g'::text) = name)),
CONSTRAINT whitespace_type CHECK ((regexp_replace(btrim(type, ' 
'::text), '(\s+)'::text, ' '::text, 'g'::text) = type))
); 


CREATE INDEX labels_label ON labels USING BTREE (name);
CREATE INDEX labels_label_type ON labels USING BTREE (type);
CREATE INDEX labels_address ON labels USING HASH (address_id);
CREATE INDEX labels_source ON labels USING BTREE (source);

On y

CREATE TABLE labels.labels (
address bytea PRIMARY KEY,
labels  text[]
);



query: https://gist.github.com/mewwts/704663e19a21bc51864f3298f09a18c0


INSERT INTO foreign.labels (address, labels)
SELECT address_id, ARRAY_AGG(name) AS labels
FROM labels
GROUP BY 1
LIMIT 100;



plan: https://explain.depesz.com/s/RQFQ


Insert on labels  (cost=0.42..26.19 rows=100 width=53) (actual 
time=11541.205..11541.205 rows=0 loops=1)
   ->  Limit  (cost=0.42..25.19 rows=100 width=53) (actual 
time=0.068..3.549 rows=100 loops=1)
 ->  GroupAggregate  (cost=0.42..26502.02 rows=106996 width=53) 
(actual time=0.066..3.449 rows=100 loops=1)

   Group Key: labels_1.address_id
   ->  Index Only Scan using 
labels_address_id_type_name_key on labels labels_1  (cost=0.42..24068.85 
rows=219145 width=31) (actual time=0.054..0.414 rows=201 loops=1)

 Heap Fetches: 0
 Planning Time: 0.102 ms
 Execution Time: 12797.143 ms





Best,




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: postgres_fdw insert extremely slow

2020-11-25 Thread Mats Julian Olsen

Apologies for the sloppiness!

Postgres version(s)?


x: Aurora PostgreSQL 11.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 
4.9.3, 64-bit (RDS)


y: PostgreSQL 12.2 (Ubuntu 12.2-2.pgdg19.10+1) on x86_64-pc-linux-gnu, 
compiled by gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, 64-bit (GCP)




Without the actual query and EXPLAIN ANALYZE on it this will be 
difficult to answer with any detail.


Also would be nice to have the table schema.



Here's the query, schema and the EXPLAIN ANALYZE

query: https://gist.github.com/mewwts/704663e19a21bc51864f3298f09a18c0

plan: https://explain.depesz.com/s/RQFQ


Best,

--
Mats
CTO @ Dune Analytics
We're hiring: https://careers.duneanalytics.com





Re: postgres_fdw insert extremely slow

2020-11-25 Thread Adrian Klaver

On 11/25/20 8:37 AM, Mats Julian Olsen wrote:

Helle pgsql-general,

We have just set up postgres_fdw between two postgres databases, x and 
y, with the plan to periodically insert data from x into y.


We've successfully set up the connection with a few options: 
`use_remote_estimate 'true'` and `fetch_size '5'` (the latter we've 
played around with). We've run ANALYZE on the foreign server.


SELECTs against the foreign table returns in milliseconds, however an 
INSERT of 100 rows takes 10 seconds. we have roughly 200 000 rows for 
the initial sync, which translates into ~6 hours.


Postgres version(s)?

Without the actual query and EXPLAIN ANALYZE on it this will be 
difficult to answer with any detail.


Also would be nice to have the table schema.



Is this the expected performance of postgre_fdw? Is there anything we've 
overlooked when setting this up? Very curious to hear experiences from 
the community when doing read/write and not just read from foreign sources.


Best regards,

--
Mats
CTO @ Dune Analytics
We're hiring: https://careers.duneanalytics.com 




--
Adrian Klaver
adrian.kla...@aklaver.com




postgres_fdw insert extremely slow

2020-11-25 Thread Mats Julian Olsen
Helle pgsql-general,

We have just set up postgres_fdw between two postgres databases, x and y,
with the plan to periodically insert data from x into y.

We've successfully set up the connection with a few options:
`use_remote_estimate 'true'` and `fetch_size '5'` (the latter we've
played around with). We've run ANALYZE on the foreign server.

SELECTs against the foreign table returns in milliseconds, however an
INSERT of 100 rows takes 10 seconds. we have roughly 200 000 rows for the
initial sync, which translates into ~6 hours.

Is this the expected performance of postgre_fdw? Is there anything we've
overlooked when setting this up? Very curious to hear experiences from the
community when doing read/write and not just read from foreign sources.

Best regards,

-- 
Mats
CTO @ Dune Analytics
We're hiring: https://careers.duneanalytics.com