On Thu, Nov 26, 2020 at 12:37 AM Mats Julian Olsen <m...@duneanalytics.com>
wrote:

>
> We have just set up postgres_fdw between two postgres databases, x and y,
> with the plan to periodically insert data from x into y.
>
> We've successfully set up the connection with a few options:
> `use_remote_estimate 'true'` and `fetch_size '50000'` (the latter we've
> played around with). We've run ANALYZE on the foreign server.
>
> SELECTs against the foreign table return in milliseconds; however, an
> INSERT of 100 rows takes 10 seconds. We have roughly 200,000 rows for the
> initial sync, which translates into ~6 hours.
>
> Is this the expected performance of postgres_fdw? Is there anything we've
> overlooked when setting this up? We're very curious to hear from the
> community about experiences doing read/write, not just read, from foreign
> sources.
>

Are your inserts run as individual transactions or grouped into one
transaction? If the former, per-insert commit time will be a factor.

What's the round-trip time (ping time) to the foreign server? Since
postgres_fdw runs each individual insert as a separate statement, you're
going to face a total insert time of roughly (n * RTT). Assuming negligible
execution time on the foreign server, your runtime of 21600 seconds for
200000 rows works out to about 9.26 rows/second, or 0.108 seconds/row. That
would be consistent with a 90-100 ms ping time to the foreign server.
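The arithmetic works out like this (a toy model; the function name is just
for illustration, and the 0.108 s/row figure comes from the numbers above):

```python
# Toy model of row-at-a-time inserts over postgres_fdw: one synchronous
# round trip per row, so total time scales linearly with row count.
def fdw_insert_runtime(rows, seconds_per_row):
    return rows * seconds_per_row

total = fdw_insert_runtime(200_000, 0.108)  # 0.108 s/row observed above
print(total, total / 3600)  # ~21600 s, i.e. ~6 hours
```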

You'll be pleased to know that there is work ongoing in pgsql-hackers to
add the capability to batch INSERTs in postgres_fdw, to improve performance
on higher-latency links. See
https://www.postgresql.org/message-id/flat/20200628151002.7x5laxwpgvkyiu3q%40development
. That could well cut your round-trip count immensely. Try the patch out if
you can and report back please.

If you can get the client application to manage the foreign insert
directly, handling commit consistency with two-phase commit, and pay the
round-trip cost only a handful of times rather than per row, you should be
able to do the insert in about half an hour instead: assuming ~10 ms
execution time per insert and a 90 ms RTT, that's roughly 0.010 * 200000 =
2000 seconds, or about 33 minutes. If you use `COPY`, or if you can use
JDBC to benefit from PgJDBC's support for the JDBC addBatch() and
executeBatch() APIs, you should be able to get it down lower still: at
3 ms per row with COPY, about 600 seconds, or 10 minutes.

There's ongoing work on making libpq (which underlies postgres_fdw) capable
of running multiple statements at the same time, i.e. "pipelining". That
won't immediately benefit postgres_fdw, because using it there would
require changes to the whole postgres executor as well. But if adopted,
it'd allow postgres_fdw to achieve that sort of performance transparently.
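As a rough sketch of why pipelining helps, here's a toy latency model (the
90 ms RTT and 10 ms execution figures are the same assumptions used earlier
in this reply):

```python
def sequential_time(rows, rtt_s, exec_s):
    """One synchronous round trip per statement, as postgres_fdw does today."""
    return rows * (rtt_s + exec_s)

def pipelined_time(rows, rtt_s, exec_s):
    """Statements stream to the server back-to-back; the link latency is
    paid roughly once instead of once per statement."""
    return rows * exec_s + rtt_s

# 200,000 rows, 90 ms RTT, 10 ms execution per insert:
print(sequential_time(200_000, 0.090, 0.010))  # ~20000 s, about 5.5 hours
print(pipelined_time(200_000, 0.090, 0.010))   # ~2000 s, about 33 minutes
```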
