Hi Konstantin,
(2018/06/22 15:26), Konstantin Knizhnik wrote:
On 21.06.2018 20:08, Tom Lane wrote:
Konstantin Knizhnik <k.knizh...@postgrespro.ru> writes:
The following very simple test reduce the problem with wrong cost
estimation:
create foreign table t1_fdw(x integer, y integer) server pg_fdw options
(table_name 't1', use_remote_estimate 'false');
create foreign table t2_fdw(x integer) server pg_fdw options (table_name
't2', use_remote_estimate 'false');
It is possible to force Postgres to use correct plan by setting
"fdw_startup_cost" to some very large value (100000000 for example).
...
Also correct plan is used when use_remote_estimate is true.
If you are unhappy about the results with use_remote_estimate off, don't
run it that way. The optimizer does not have a crystal ball.
As I wrote, use_remote_estimate can not be used because in this case
query compilation time is unacceptable (10 seconds, while time of query
execution itself is ~200msec).
So the problem can be addressed in two ways:
1. Try to reduce time of remote estimation. I wonder why postgres_fdw
sends so much queries to remote server. For join of two tables there are
7 queries.
I suspect that for ~20 joined tables in the original query number of
calls is more than hundred, so on wonder that it takes so much time.
2. Try to make optimizer make better estimation of join cost based on
local statistic (please notice that ANALYZE is explicitly called for all
foreign tables and number of rows in the result was correctly calculated).
To make local estimates more accurate, I think we need other information
on remote tables such as remote indexes.
What do you think: which of this two direction is more perspective? Or
it is better to address both of them?
I'd vote for #2. One idea for that is to introduce CREATE FOREIGN INDEX
to have information on remote indexes on the local side, which I
proposed before. I have been putting it on hold since then, though.
Best regards,
Etsuro Fujita