Re: [HACKERS] Possible regression regarding estimating relation width in FDWs

Tom Lane Tue, 24 May 2016 14:15:14 -0700

Ronan Dunklau <ronan.dunk...@dalibo.com> writes:
> While working on adapting the Multicorn FDW for 9.6, I noticed that there is 
> a  
> regression with regards to estimating the remote relation width.


Hm.  In the old code it was basically a chance artifact that this example
happens to produce the "right" width estimate.  There are plenty of other
cases where the per-column width estimates were already the relevant ones,
for example if you join the foreign table to another one.  postgres_fdw
is falling down on the job by not making the per-column width estimates
consistent with the overall relation width estimate.

We could imagine extending use_remote_estimate mode to collect per-column
width estimates from the remote end, but that would add quite a lot of
cost.  It's not really necessary either, IMO, because you can instead
ANALYZE the foreign table to cause column width estimates to be computed
and stored locally.

If you do that in this example, you find another interesting thing about
HEAD's behavior:

regression=# EXPLAIN SELECT * FROM foreign_table;
                              QUERY PLAN                              
----------------------------------------------------------------------
 Foreign Scan on foreign_table  (cost=100.00..101.03 rows=1 width=32)
(1 row)

regression=# ANALYZE foreign_table;
ANALYZE
regression=# EXPLAIN SELECT * FROM foreign_table;
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Foreign Scan on foreign_table  (cost=100.00..101.03 rows=1 width=40004)
(1 row)

The width estimate is now based on the decompressed/detoasted column
width, which is really the right thing here because that is the
representation we'll be working with for any local operations ---
estimating the size of a hash table using the remote's toasted column
width, for example, is just wrong.  So I don't see any basis for arguing
that 472 is the "right" width to use for the foreign table.

In both HEAD and 9.5, join cases (post-ANALYZE) look pretty wacko:

regression=# EXPLAIN SELECT * FROM foreign_table cross join int8_tbl;
                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Nested Loop  (cost=100.00..102.13 rows=5 width=40020)
   ->  Foreign Scan on foreign_table  (cost=100.00..101.03 rows=1 width=472)
   ->  Seq Scan on int8_tbl  (cost=0.00..1.05 rows=5 width=16)
(3 rows)

The top-level estimate here is actually right, IMV, but the width estimate
for the ForeignScan is not.

In view of this, I'm a bit tempted to double down on the ANALYZE
dependency by having postgres_fdw not pay attention to the remote's width
estimates at all, but just use whatever column width data is cached
locally, and sum those numbers to get a relation width.  That would be
more reliable if we've done an ANALYZE recently, and I'm not convinced
it'd be worse if we have not.

Anyway, the bottom line is that the behavior of 9.5 and before is not
so great in this area that I feel a need to be bug-compatible with it.

                        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Possible regression regarding estimating relation width in FDWs

Reply via email to