> On Nov 22, 2025, at 02:52, Tom Lane <[email protected]> wrote:
>
> "feichanghong" <[email protected]> writes:
>> We may consider checking in cardinality estimation whether the restrictlist
>> fully covers a unique index. If so, we can directly set the estimated rows
>> to 1. The attached patch provides a very early demo of this approach.
>
> I think this is far harder than you believe; a simplistic approach
> like this will mainly result in breaking things. The reason is that
> we need to have the same estimate of the size of a join relation
> regardless of how it is implemented (and, indeed, that size estimate
> is made before we ever consider individual join paths). Otherwise the
> fundamental model of generating different join paths and comparing
> them for merit breaks down, either at this join level or higher ones.
> But the conclusion that "where t1.a = t2.a and t1.b = 0" means that
> t1's rowcount is 1 only applies if the join is implemented as an inner
> indexscan. If we choose some other method, say a hash join based on
> a seqscan of t1, having forced that rowcount to 1 would produce
> completely false estimates.
It should be noted that the row estimate for a base relation is adjusted
to 1 only when the baserestrictinfo fully covers a unique index, for
example "WHERE a = 1 AND b = 0"; see set_baserel_size_estimates for
details. In contrast, "WHERE t1.a = t2.a AND t1.b = 0" only affects the
ppi_rows of a parameterized path; see get_parameterized_baserel_size
for details.
In addition, for index scans and bitmap scans I also adjusted the
indexSelectivity in genericcostestimate.
For example, in the queries below: in the first, the baserestrictinfo
covers the primary key, so the scan is estimated at rows=1. In the second,
it does not; the planner chooses a hash join whose seq scan of t1 is
estimated at rows=98045, while the join itself is still estimated at
rows=1 because we adjust the join row estimate through the parameterized
path.
```sql
postgres=# explain analyze select * from t1 where a = 1 and b = 0;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..21367.77 rows=1 width=8) (actual time=0.027..260.106 rows=1.00 loops=1)
   Filter: ((a = 1) AND (b = 0))
   Rows Removed by Filter: 1099999
   Buffers: shared hit=4868
 Planning Time: 0.111 ms
 Execution Time: 260.136 ms
(6 rows)
postgres=# set enable_indexscan to off;
SET
postgres=# set enable_bitmapscan to off;
SET
postgres=# set max_parallel_workers_per_gather to 0;
SET
postgres=# explain analyze select * from t1, t2 where t1.a = t2.a and t1.b = 0;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.02..18986.82 rows=1 width=12) (actual time=0.042..280.086 rows=1.00 loops=1)
   Hash Cond: (t1.a = t2.a)
   Buffers: shared hit=4869
   ->  Seq Scan on t1  (cost=0.00..18617.81 rows=98045 width=8) (actual time=0.025..264.100 rows=100000.00 loops=1)
         Filter: (b = 0)
         Rows Removed by Filter: 1000000
         Buffers: shared hit=4868
   ->  Hash  (cost=1.01..1.01 rows=1 width=4) (actual time=0.010..0.012 rows=1.00 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         Buffers: shared hit=1
         ->  Seq Scan on t2  (cost=0.00..1.01 rows=1 width=4) (actual time=0.006..0.007 rows=1.00 loops=1)
               Buffers: shared hit=1
 Planning Time: 0.210 ms
 Execution Time: 280.137 ms
(14 rows)
```
> Now, what you are proposing would make the EXPLAIN output cosmetically
> better, in that instead of
>
> Nested Loop (cost=0.43..1.57 rows=32 width=12)
> -> Seq Scan on t2 (cost=0.00..1.01 rows=1 width=4)
> -> Index Only Scan using t1_pkey on t1 (cost=0.43..4.76 rows=32 width=8)
> Index Cond: ((a = t2.a) AND (b = 0))
>
> we would get a rowcount estimate of "1" for the parameterized t1 scan.
> But the estimate for the output of the nestloop would have to remain
> the same. And the t1 scan is not where the problem is: it already
> made the right choice there. To the extent that this EXPLAIN output
> is problematic, it's because if this join is part of a bigger query
> then we may make poor choices at upper join levels due to having a bad
> estimate of this join's output size. I don't see how this line of
> work can fix that. (Yes, I see the hack you put into
> set_joinrel_size_estimates. It's a hack not a workable solution,
> because it will distort the size estimates in many cases that are
> not quite what you have here.)
For join cardinality estimation, we still follow the principle that the
row estimate must remain consistent regardless of which path is chosen;
this is enforced in set_joinrel_size_estimates. The only adjustment is:
if pushing some join clauses down to a lower-level node yields a more
accurate row estimate (for example, the clause "t1.a = t2.a" in
"t1.a = t2.a AND t1.b = 0"), then we use the pushed-down row estimate for
the lower node and exclude that join clause when computing the join
selectivity, so the clause is not counted twice.
Of course, this is just a demo of an initial idea, and there are many
aspects that need improvement. Many thanks for your suggestions and
comments.
Best Regards,
Fei Changhong