Ronan Dunklau <ronan.dunk...@dalibo.com> writes: > While working on adapting the Multicorn FDW for 9.6, I noticed that there is > a > regression with regards to estimating the remote relation width.
Hm. In the old code it was basically a chance artifact that this example happens to produce the "right" width estimate. There are plenty of other cases where the per-column width estimates were already the relevant ones, for example if you join the foreign table to another one. postgres_fdw is falling down on the job by not making the per-column width estimates consistent with the overall relation width estimate. We could imagine extending use_remote_estimate mode to collect per-column width estimates from the remote end, but that would add quite a lot of cost. It's not really necessary either, IMO, because you can instead ANALYZE the foreign table to cause column width estimates to be computed and stored locally. If you do that in this example, you find another interesting thing about HEAD's behavior: regression=# EXPLAIN SELECT * FROM foreign_table; QUERY PLAN ---------------------------------------------------------------------- Foreign Scan on foreign_table (cost=100.00..101.03 rows=1 width=32) (1 row) regression=# ANALYZE foreign_table; ANALYZE regression=# EXPLAIN SELECT * FROM foreign_table; QUERY PLAN ------------------------------------------------------------------------- Foreign Scan on foreign_table (cost=100.00..101.03 rows=1 width=40004) (1 row) The width estimate is now based on the decompressed/detoasted column width, which is really the right thing here because that is the representation we'll be working with for any local operations --- estimating the size of a hash table using the remote's toasted column width, for example, is just wrong. So I don't see any basis for arguing that 472 is the "right" width to use for the foreign table. In both HEAD and 9.5, join cases (post-ANALYZE) look pretty wacko: regression=# EXPLAIN SELECT * FROM foreign_table cross join int8_tbl; QUERY PLAN ----------------------------------------------------------------------------- Nested Loop (cost=100.00..102.13 rows=5 width=40020) -> Foreign Scan on foreign_table (cost=100.00..101.03 rows=1 width=472) -> Seq Scan on int8_tbl (cost=0.00..1.05 rows=5 width=16) (3 rows) The top-level estimate here is actually right, IMV, but the width estimate for the ForeignScan is not. In view of this, I'm a bit tempted to double down on the ANALYZE dependency by having postgres_fdw not pay attention to the remote's width estimates at all, but just use whatever column width data is cached locally, and sum those numbers to get a relation width. That would be more reliable if we've done an ANALYZE recently, and I'm not convinced it'd be worse if we have not. Anyway, the bottom line is that the behavior of 9.5 and before is not so great in this area that I feel a need to be bug-compatible with it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers