Over on [1], Tim reported that the planner is making some bad choices when the plan contains a WindowFunc which requires reading all of, or a large portion of, the WindowAgg's subnode in order to produce the first WindowAgg row.
For example:

EXPLAIN (ANALYZE, TIMING OFF)
SELECT COUNT(*) OVER ()
FROM tenk1 t1
INNER JOIN tenk1 t2 ON t1.unique1 = t2.tenthous
LIMIT 1;

With master, we get the following plan:

                                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..0.67 rows=1 width=8) (actual time=47.491..47.492 rows=1 loops=1)
   ->  WindowAgg  (cost=0.29..3815.00 rows=10000 width=8) (actual time=47.489..47.490 rows=1 loops=1)
         ->  Nested Loop  (cost=0.29..3690.00 rows=10000 width=0) (actual time=0.026..42.972 rows=10000 loops=1)
               ->  Seq Scan on tenk1 t2  (cost=0.00..445.00 rows=10000 width=4) (actual time=0.009..1.734 rows=10000 loops=1)
               ->  Index Only Scan using tenk1_unique1 on tenk1 t1  (cost=0.29..0.31 rows=1 width=4) (actual time=0.003..0.004 rows=1 loops=10000)
                     Index Cond: (unique1 = t2.tenthous)
                     Heap Fetches: 0
 Planning Time: 0.420 ms
 Execution Time: 48.107 ms

You can see that the time to get the first WindowAgg row (47.489 ms) is not well aligned to the startup cost (0.29).  This effectively causes the planner to choose a Nested Loop plan, as it thinks it'll read just 1 row from the join.  Due to the OVER (), we'll read all rows!  Not good.  It's not hard to imagine that a slightly different schema could yield a *far* worse plan if it opted to use a non-parameterised nested loop plan and proceeded to read all rows from it.

With the attached patch, that turns into:

                                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=928.02..928.02 rows=1 width=8) (actual time=29.308..29.310 rows=1 loops=1)
   ->  WindowAgg  (cost=928.02..928.07 rows=10000 width=8) (actual time=29.306..29.308 rows=1 loops=1)
         ->  Hash Join  (cost=395.57..803.07 rows=10000 width=0) (actual time=10.674..22.032 rows=10000 loops=1)
               Hash Cond: (t1.unique1 = t2.tenthous)
               ->  Index Only Scan using tenk1_unique1 on tenk1 t1  (cost=0.29..270.29 rows=10000 width=4) (actual time=0.036..4.961 rows=10000 loops=1)
                     Heap Fetches: 0
               ->  Hash  (cost=270.29..270.29 rows=10000 width=4) (actual time=10.581..10.582 rows=10000 loops=1)
                     Buckets: 16384  Batches: 1  Memory Usage: 480kB
                     ->  Index Only Scan using tenk1_thous_tenthous on tenk1 t2  (cost=0.29..270.29 rows=10000 width=4) (actual time=0.055..5.437 rows=10000 loops=1)
                           Heap Fetches: 0
 Planning Time: 2.415 ms
 Execution Time: 30.554 ms

I'm not sure if we should consider backpatching a fix for this bug.  We tend not to commit stuff that would destabilise plans in the back branches.  On the other hand, it's fairly hard to imagine how we could make this much worse even given bad estimates.

I do think we should fix this in v16, however.  I'll add this to the "Older bugs affecting stable branches" section of the PG 16 open items list.

David

[1] https://postgr.es/m/17862-1ab8f74b0f7b0...@postgresql.org
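To make the costing idea above concrete, here is a rough, standalone sketch of the general principle (all names are made up for illustration; this is not the code in the attached patch): when the WindowFunc must read its entire input before it can emit its first row, the WindowAgg's startup cost needs to be based on the subpath's total cost rather than its startup cost.

/*
 * Hypothetical, self-contained sketch.  Not PostgreSQL code; the names and
 * numbers are only for illustration of the costing principle.
 */
#include <stdbool.h>
#include <stdio.h>

typedef struct PathCosts
{
    double startup_cost;    /* cost before the first row can be returned */
    double total_cost;      /* cost to return all rows */
} PathCosts;

/*
 * Derive WindowAgg costs from the input (subpath) costs.  "needs_all_input"
 * would be true for a window function evaluated over the whole partition
 * with an unbounded frame, such as COUNT(*) OVER ().
 */
static PathCosts
window_agg_costs(PathCosts input, double per_row_cost, double nrows,
                 bool needs_all_input)
{
    PathCosts result;

    /*
     * If producing the first output row requires reading the whole input,
     * the startup cost must include the input's total cost; otherwise
     * inheriting the input's startup cost is fine.
     */
    result.startup_cost = needs_all_input ? input.total_cost
                                          : input.startup_cost;
    result.total_cost = input.total_cost + per_row_cost * nrows;

    return result;
}

int
main(void)
{
    /* roughly the Nested Loop input costs from the first plan above */
    PathCosts nestloop = {0.29, 3690.00};
    PathCosts old_way = window_agg_costs(nestloop, 0.0125, 10000, false);
    PathCosts new_way = window_agg_costs(nestloop, 0.0125, 10000, true);

    printf("old: startup=%.2f total=%.2f\n", old_way.startup_cost, old_way.total_cost);
    printf("new: startup=%.2f total=%.2f\n", new_way.startup_cost, new_way.total_cost);
    return 0;
}

Plugging in the Nested Loop costs from the first plan, the old approach yields a startup cost of 0.29 while the adjusted approach yields 3690.00, which is enough for the LIMIT to favour the hash join plan shown above.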
fix_bug_17862.patch