The default value for fdw_tuple_cost is 0.01, which seems way too low.  If
I set up a loop-back foreign server with a large fetch_size, then tests
like:

select * from pgbench_accounts except select * from
loopback.pgbench_accounts

vs

select * from pgbench_accounts except select * from pgbench_accounts

indicate that 0.1 is about the lowest value for fdw_tuple_cost that could
make sense, and a reasonable default would probably be 0.25.  Yes, it is
only a default, but the default should make sense for at least some
situation, and I can't imagine any situation in which 0.01 makes sense.
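
For anyone who wants to try a comparison along these lines, here is a rough
sketch of a loop-back setup.  The server name, the schema name, the dbname,
and the fetch_size value are all assumptions chosen for illustration, and a
non-superuser would likely also need password options in the user mapping.

```sql
-- Sketch only: names, dbname, and fetch_size are illustrative assumptions.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Foreign server pointing back at the same cluster, with a large fetch_size
-- so that per-batch round-trip overhead is mostly amortized away.
CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (dbname 'postgres', fetch_size '100000');
CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;

-- Expose the local pgbench_accounts table as loopback.pgbench_accounts.
CREATE SCHEMA loopback;
IMPORT FOREIGN SCHEMA public LIMIT TO (pgbench_accounts)
    FROM SERVER loopback INTO loopback;

-- Compare the run time of the foreign variant against the purely local one.
EXPLAIN (ANALYZE)
select * from pgbench_accounts except select * from loopback.pgbench_accounts;
EXPLAIN (ANALYZE)
select * from pgbench_accounts except select * from pgbench_accounts;

-- Try a candidate per-tuple cost on this server and look at the plans again.
ALTER SERVER loopback OPTIONS (ADD fdw_tuple_cost '0.25');
```

The option is added with ADD here because it has not been set on the server
before; changing it again afterwards would need SET instead.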

In the documentation for fdw_startup_cost, it says "This represents the
additional overhead of establishing a connection, parsing and planning the
query on the remote side, etc.".  I think that "establishing a connection"
should be stricken. Either you need a connection or you don't; there is
nothing the optimizer can do about this.  And if you do need one, you
establish it only once (at most), not once per query sent to the remote side.
I think the implementation correctly doesn't try to account for the
overhead of establishing a connection, so the docs should remove that claim.

With regard to use_remote_estimate, the documentation says "Running ANALYZE
on the foreign table is the way to update the local statistics; this will
perform a scan of the remote table and then calculate and store statistics
just as though the table were local. Keeping local statistics can be a
useful way to reduce per-query planning overhead for a remote table — but
if the remote table is frequently updated, the local statistics will soon
be obsolete."  This makes it sound like local stats are basically equivalent
to use_remote_estimate, other than the staleness issue.  But they are far
from equivalent.  use_remote_estimate has implicit knowledge of the indexes
on the foreign server (implicit via the reduced cost estimates derived from
the foreign side for parameterized queries), whereas planning with local stats
of foreign tables just assumes there are no indexes.  Perhaps
adding something like "Also, local statistics do not contain information on
the available indexes on the remote side, while use_remote_estimate does
take these into account"?
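
A rough way to see the difference, assuming a foreign table named
loopback.pgbench_accounts on a loop-back server (hypothetical names) and the
standard pgbench tables, is to compare the plans for a join that could be
parameterized on the remote primary key.  This is only a sketch; the plans
actually chosen will depend on table sizes and cost settings.

```sql
-- Sketch only: loopback.pgbench_accounts is an assumed foreign table name.

-- Case 1: local statistics only.  ANALYZE scans the remote table and stores
-- statistics locally, but nothing about the remote indexes is recorded.
ANALYZE loopback.pgbench_accounts;
EXPLAIN
SELECT * FROM pgbench_branches b
JOIN loopback.pgbench_accounts a ON a.aid = b.bid;

-- Case 2: remote estimates.  The planner asks the remote server for cost
-- estimates, so a cheap index lookup on aid shows up as a cheap
-- parameterized foreign scan.
ALTER FOREIGN TABLE loopback.pgbench_accounts
    OPTIONS (ADD use_remote_estimate 'true');
EXPLAIN
SELECT * FROM pgbench_branches b
JOIN loopback.pgbench_accounts a ON a.aid = b.bid;
```

Comparing the estimated cost of the foreign scan in the two plans should show
whether the remote index on aid was taken into account.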

Cheers,

Jeff
