Re: On disable_cost

David Rowley Wed, 31 Jul 2024 19:01:39 -0700

On Thu, 1 Aug 2024 at 04:23, Robert Haas <robertmh...@gmail.com> wrote:
> OK, here's a new patch version.


I think we're going down the right path here.

I've reviewed both patches, here's what I noted down during my review:

0. I've not seen any mention so far about postgres_fdw's
use_remote_estimate.  Maybe changing the costs is fixing an issue that
existed before. I'm just not 100% sure on that.

Consider:

CREATE EXTENSION postgres_fdw;

DO $d$
    BEGIN
        EXECUTE $$CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
            OPTIONS (use_remote_estimate 'true',
                     dbname '$$||current_database()||$$',
                     port '$$||current_setting('port')||$$'
            )$$;
    END;
$d$;
CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;

create table t (a int);
create foreign table ft (a int) server loopback OPTIONS (table_name 't');

alter system set enable_seqscan=0;
select pg_Reload_conf();
set enable_seqscan=1;
explain select * from ft;

patched:
 Foreign Scan on ft  (cost=100.00..671.00 rows=2550 width=4)

master:
 Foreign Scan on ft  (cost=10000000100.00..10000000671.00 rows=2550 width=4)

I kinda think that might be fixing an issue that I don't recall being
reported before. I think we shouldn't really care that much about what
nodes are disabled on the remote server and not having disabled_cost
applied to that gives us that.

1. The final sentence of the function header comment needs to be
updated in estimate_path_cost_size().

2. Does cost_tidscan() need to update the header comment to say
tidquals must not be empty?

3. final_cost_nestloop() seems to initially use the disabled_nodes
from initial_cost_nestloop() but then it goes off and calculates it
again itself. One of these seems redundant. The "We could include
disable_cost in the preliminary estimate" comment explains why it was
originally left to final_cost_nestloop(), so maybe worth sticking to
that? I don't quite know the full implications, but it does not seem
worth risking a behaviour change here.

4. I wonder if it's worth doing a quick refactor of the code in
initial_cost_mergejoin() to get rid of the duplicate code in the "if
(outersortkeys)" and "if (innersortkeys)" branches. It seems ok to do
outer_path = &sort_path. Likewise for inner_path.

5. final_cost_hashjoin() does the same thing as #3

6. createplan.c adds #include "nodes/print.h" but doesn't seem to add
any code that might use anything in there.

7. create_lockrows_path() needs to propagate disabled_nodes.

create table a (a int);
set enable_seqscan=0;

explain select * from a for update limit 1;

 Limit  (cost=0.00..0.02 rows=1 width=10)
   ->  LockRows  (cost=0.00..61.00 rows=2550 width=10)
         ->  Seq Scan on a  (cost=0.00..35.50 rows=2550 width=10)
               Disabled Nodes: 1
(4 rows)


explain select * from a limit 1;

 Limit  (cost=0.00..0.01 rows=1 width=4)
   Disabled Nodes: 1
   ->  Seq Scan on a  (cost=0.00..35.50 rows=2550 width=4)
         Disabled Nodes: 1
(4 rows)

8. There's something weird with CTEs too.

create table b(a int);
set enable_sort=0;

Patched:

explain with cte as materialized (select * from b order by a) select *
from cte order by a desc;

 Sort  (cost=381.44..387.82 rows=2550 width=4)
   Disabled Nodes: 1
   Sort Key: cte.a DESC
   CTE cte
     ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
           Disabled Nodes: 1
           Sort Key: b.a
           ->  Seq Scan on b  (cost=0.00..35.50 rows=2550 width=4)
   ->  CTE Scan on cte  (cost=0.00..51.00 rows=2550 width=4)
(9 rows)

master:

explain with cte as materialized (select * from a order by a) select *
from cte order by a desc;

 Sort  (cost=20000000381.44..20000000387.82 rows=2550 width=4)
   Sort Key: cte.a DESC
   CTE cte
     ->  Sort  (cost=10000000179.78..10000000186.16 rows=2550 width=4)
           Sort Key: a.a
           ->  Seq Scan on a  (cost=0.00..35.50 rows=2550 width=4)
   ->  CTE Scan on cte  (cost=0.00..51.00 rows=2550 width=4)
(7 rows)

I'd expect the final sort to have disabled_nodes == 2 since
disabled_cost has been added twice in master.

9. create_set_projection_path() needs to propagate disabled_nodes too:

explain select b from (select a,generate_series(1,2) as b from b) a  limit 1;

 Limit  (cost=0.00..0.03 rows=1 width=4)
   ->  Subquery Scan on a  (cost=0.00..131.12 rows=5100 width=4)
         ->  ProjectSet  (cost=0.00..80.12 rows=5100 width=8)
               ->  Seq Scan on b  (cost=0.00..35.50 rows=2550 width=0)
                     Disabled Nodes: 1

10. create_setop_path() needs to propagate disabled_nodes.

explain select * from b except select * from b limit 1;

 Limit  (cost=0.00..0.80 rows=1 width=8)
   ->  HashSetOp Except  (cost=0.00..160.25 rows=200 width=8)
         ->  Append  (cost=0.00..147.50 rows=5100 width=8)
               Disabled Nodes: 2
               ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..61.00
rows=2550 width=8)
                     Disabled Nodes: 1
                     ->  Seq Scan on b  (cost=0.00..35.50 rows=2550 width=4)
                           Disabled Nodes: 1
               ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..61.00
rows=2550 width=8)
                     Disabled Nodes: 1
                     ->  Seq Scan on b b_1  (cost=0.00..35.50 rows=2550 width=4)
                           Disabled Nodes: 1
(12 rows)

11. create_modifytable_path() needs to propagate disabled_nodes.

explain with cte as (update b set a = a+1 returning *) select * from
cte limit 1;

 Limit  (cost=41.88..41.90 rows=1 width=4)
   CTE cte
     ->  Update on b  (cost=0.00..41.88 rows=2550 width=10)
           ->  Seq Scan on b  (cost=0.00..41.88 rows=2550 width=10)
                 Disabled Nodes: 1
   ->  CTE Scan on cte  (cost=0.00..51.00 rows=2550 width=4)
(6 rows)

12. For the 0002 patch, I do agree that having this visible in EXPLAIN
is a must.  I'd much rather see:  Disabled: true/false.  And just
display this when the disabled_nodes is greater than the sum of the
subpaths.  That might be much more complex to implement, but it's
going to make it much easier to track down the disabled nodes in very
large plans.

David

Re: On disable_cost

Reply via email to