Re: Optimize cardinality estimation when unique keys are fully covered

feichanghong Fri, 21 Nov 2025 09:12:31 -0800


> On Nov 22, 2025, at 00:54, feichanghong <[email protected]> wrote:
> 
> Hi,
> 
> Recently, I ran into a cardinality estimation problem where the query
> predicates fully cover a primary key or unique index. The issue can be
> reproduced with the following case:
> 
> ```sql
> create table t1(a int, b int, primary key(a, b));
> 
> DO $$
> DECLARE
>     i integer;
> BEGIN
>     FOR i IN 1..100000 LOOP
>         INSERT INTO t1 VALUES (i, 0);
>     END LOOP;
> END;
> $$ LANGUAGE plpgsql;
> 
> 
> DO $$
> DECLARE
>     i integer;
> BEGIN
>     FOR j IN 1..10 LOOP
>         FOR i IN 1..100000 LOOP
>             INSERT INTO t1 VALUES (i%10+10*(j-1), i);
>         END LOOP;
>     END LOOP;
> END;
> $$ LANGUAGE plpgsql;
> 
> 
> 
> create table t2(a int);
> insert into t2 select 1;
> 
> analyze t1, t2;
> 
> postgres=# explain analyze select * from t1 where a = 1 and b = 0;
>                                                      QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------
>  Bitmap Heap Scan on t1  (cost=4.65..2174.00 rows=833 width=8) (actual 
> time=0.060..0.061 rows=1.00 loops=1)
>    Recheck Cond: ((a = 1) AND (b = 0))
>    Heap Blocks: exact=1
>    Buffers: shared hit=4
>    ->  Bitmap Index Scan on t1_pkey  (cost=0.00..4.44 rows=833 width=0) 
> (actual time=0.042..0.043 rows=1.00 loops=1)
>          Index Cond: ((a = 1) AND (b = 0))
>          Index Searches: 1
>          Buffers: shared hit=3
>  Planning Time: 0.146 ms
>  Execution Time: 0.105 ms
> (10 rows)
> 
> postgres=# explain analyze select * from t1, t2 where t1.a = t2.a and t1.b = 
> 0;
>                                                          QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=0.43..4.81 rows=32 width=12) (actual time=0.067..0.069 
> rows=1.00 loops=1)
>    Buffers: shared hit=5
>    ->  Seq Scan on t2  (cost=0.00..1.01 rows=1 width=4) (actual 
> time=0.024..0.025 rows=1.00 loops=1)
>          Buffers: shared hit=1
>    ->  Index Only Scan using t1_pkey on t1  (cost=0.43..108.23 rows=32 
> width=8) (actual time=0.036..0.036 rows=1.00 loops=1)
>          Index Cond: ((a = t2.a) AND (b = 0))
>          Heap Fetches: 0
>          Index Searches: 1
>          Buffers: shared hit=4
>  Planning Time: 0.257 ms
>  Execution Time: 0.114 ms
> (11 rows)
> 
> ```
> 
> It can be observed that the Index Cond fully covers the primary key index.
> Naturally, the correct rows estimate should be 1, but the optimizer
> estimates the two conditions independently, resulting in an overestimated
> row count. This overestimation has a greater impact in scenarios with
> partitioned tables and multi-table joins, making the planner more inclined
> to choose hash or merge joins.
> 
> We may consider checking in cardinality estimation whether the restrictlist
> fully covers a unique index. If so, we can directly set the estimated rows
> to 1. The attached patch provides a very early demo of this approach.
> 
Apologies for the garbled text due to my email client. Please find the
reproduction SQL below again:


```sql
create table t1(a int, b int, primary key(a, b));

DO $$
DECLARE
    i integer;
BEGIN
    FOR i IN 1..100000 LOOP
        INSERT INTO t1 VALUES (i, 0);
    END LOOP;
END;
$$ LANGUAGE plpgsql;


DO $$
DECLARE
    i integer;
BEGIN
    FOR j IN 1..10 LOOP
        FOR i IN 1..100000 LOOP
            INSERT INTO t1 VALUES (i%10+10*(j-1), i);
        END LOOP;
    END LOOP;
END;
$$ LANGUAGE plpgsql;



create table t2(a int);
insert into t2 select 1;

analyze t1, t2;

postgres=# explain analyze select * from t1 where a = 1 and b = 0;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=4.65..2174.00 rows=833 width=8) (actual 
time=0.060..0.061 rows=1.00 loops=1)
   Recheck Cond: ((a = 1) AND (b = 0))
   Heap Blocks: exact=1
   Buffers: shared hit=4
   ->  Bitmap Index Scan on t1_pkey  (cost=0.00..4.44 rows=833 width=0) (actual 
time=0.042..0.043 rows=1.00 loops=1)
         Index Cond: ((a = 1) AND (b = 0))
         Index Searches: 1
         Buffers: shared hit=3
 Planning Time: 0.146 ms
 Execution Time: 0.105 ms
(10 rows)

postgres=# explain analyze select * from t1, t2 where t1.a = t2.a and t1.b = 0;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.43..4.81 rows=32 width=12) (actual time=0.067..0.069 
rows=1.00 loops=1)
   Buffers: shared hit=5
   ->  Seq Scan on t2  (cost=0.00..1.01 rows=1 width=4) (actual 
time=0.024..0.025 rows=1.00 loops=1)
         Buffers: shared hit=1
   ->  Index Only Scan using t1_pkey on t1  (cost=0.43..108.23 rows=32 width=8) 
(actual time=0.036..0.036 rows=1.00 loops=1)
         Index Cond: ((a = t2.a) AND (b = 0))
         Heap Fetches: 0
         Index Searches: 1
         Buffers: shared hit=4
 Planning Time: 0.257 ms
 Execution Time: 0.114 ms
(11 rows)

```


Best Regards,
Fei Changhong

Re: Optimize cardinality estimation when unique keys are fully covered

Reply via email to