Hello, I'm running the Docker postgres:17.0 image and trying to test the behavior of adding a new index to a table. Specifically, I wanted to verify that my new index is actually used by looking at the output of `EXPLAIN ANALYZE`. However, I found that my index is often not used, and I would like to understand the query planner's rationale when choosing an index.
**Reproduction steps**

```
postgres=# select version();
                                                          version
---------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 17.0 (Debian 17.0-1.pgdg120+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
(1 row)
```

1. Create the database:

```sql
CREATE DATABASE test
    LOCALE_PROVIDER icu
    ICU_LOCALE "en-US-x-icu"
    LOCALE "en_US.utf8"
    TEMPLATE template0;
```

2. Create the table and indexes:

```sql
CREATE TABLE test_table (
    col_a INT,
    col_b INT NOT NULL
);

CREATE INDEX IF NOT EXISTS idx_col_a_btree ON test_table(col_b);
CREATE INDEX IF NOT EXISTS idx_col_a_brin ON test_table USING brin (col_b);
CREATE INDEX IF NOT EXISTS idx_col_b_a ON test_table(col_a, col_b);
```

3. Load 10 million rows into the table:

```sql
DO $$
DECLARE
    batch_count INT := 0;
    b_var INT := 0;
    a_var INT := 1;
    prev_a INT := 1;
    a_null BOOLEAN := FALSE;
    batch_size INT := 1000;
BEGIN
    FOR i IN 1..10000000 LOOP
        IF batch_count = batch_size THEN
            b_var := b_var + 1;
            a_null := NOT a_null;
            IF NOT a_null THEN
                a_var := prev_a + 1;
            ELSE
                prev_a := a_var;
                a_var := NULL;
            END IF;
            batch_count := 0;
        END IF;
        INSERT INTO test_table (col_a, col_b) VALUES (a_var, b_var);
        batch_count := batch_count + 1;
    END LOOP;
END $$;
```

4. When running the following query, I would expect the index `idx_col_b_a` to be used:

```sql
SELECT min(col_b) FROM test_table WHERE col_a > 4996;
```

I have a range-based filter on `col_a` and am aggregating the result with `min(col_b)`. Both columns are covered by `idx_col_b_a`.
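For reference, here is a Python mirror of the loading loop (my own sketch, not part of the repro) that shows the data pattern without needing a database: `col_a` advances by 1 every 2000 rows (with a batch of NULLs in between) while `col_b` advances by 1 every 1000 rows, so the two columns are strongly correlated, and the rows matching `col_a > 4996` sit at the very end of the `col_b` order:

```python
# Python mirror of the PL/pgSQL DO block above (illustration only):
# same variables, same batching, yielding (col_a, col_b) pairs.
def generate_rows(total=10_000_000, batch_size=1000):
    batch_count = 0
    b_var = 0
    a_var = 1
    prev_a = 1
    a_null = False
    for _ in range(total):
        if batch_count == batch_size:
            b_var += 1
            a_null = not a_null
            if not a_null:
                a_var = prev_a + 1
            else:
                prev_a = a_var
                a_var = None
            batch_count = 0
        yield a_var, b_var
        batch_count += 1

# Scan in col_b order (which equals insertion order here) and locate
# the rows satisfying the filter col_a > 4996.
matches = 0
first_match_position = None
for position, (a, b) in enumerate(generate_rows(), start=1):
    if a is not None and a > 4996:
        matches += 1
        if first_match_position is None:
            first_match_position = position

print(matches)                   # 4000 rows satisfy col_a > 4996
print(first_match_position - 1)  # 9992000 rows precede the first match
```

These two numbers line up with the plans below: the estimated/actual `rows=4000` for the filter, and the `Rows Removed by Filter: 9992000` before the first match is found.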
However, `EXPLAIN ANALYZE` indicates otherwise:

```
postgres=# explain analyze select min(col_b) from test_table where col_a > 4996;
                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=63.86..63.87 rows=1 width=4) (actual time=587.550..587.550 rows=1 loops=1)
   InitPlan 1
     ->  Limit  (cost=0.43..63.86 rows=1 width=4) (actual time=587.542..587.543 rows=1 loops=1)
           ->  Index Scan using idx_col_a_btree on test_table  (cost=0.43..259400.27 rows=4090 width=4) (actual time=587.541..587.541 rows=1 loops=1)
                 Filter: (col_a > 4996)
                 Rows Removed by Filter: 9992000
 Planning Time: 0.305 ms
 Execution Time: 587.579 ms
(8 rows)
```

Instead of using `idx_col_b_a`, it does an index scan on `idx_col_a_btree`. This is a problem because of the way the data is structured in my table: higher `col_a` values are associated with higher `col_b` values. As a result, the index scan ends up walking through most of the index before finding the first record that matches the criterion `col_a > 4996`.

When I `DROP` the `idx_col_a_btree` index, the resulting query plan looks much better because it uses the correct index on `col_b`:

```
postgres=# explain analyze select min(col_b) from test_table where col_a > 4996;
                                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=102.23..102.24 rows=1 width=4) (actual time=0.591..0.592 rows=1 loops=1)
   ->  Index Only Scan using idx_col_b_a on test_table  (cost=0.43..92.01 rows=4090 width=4) (actual time=0.021..0.341 rows=4000 loops=1)
         Index Cond: (col_a > 4996)
         Heap Fetches: 0
 Planning Time: 0.283 ms
 Execution Time: 0.613 ms
(6 rows)
```

I tried fiddling with the table statistics and with `random_page_cost`, but neither seemed to make a difference. Is there some nuance here that I'm missing?
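My understanding (which may be off) is that the `Limit`/`InitPlan` shape in the first plan comes from PostgreSQL rewriting `min(col_b)` into an ordered scan of a `col_b` index that stops at the first row passing the filter. In Python terms, the chosen plan behaves roughly like this sketch (my own model, not PostgreSQL code):

```python
# Rough model of the first plan: walk rows in col_b order (like a b-tree
# index scan on col_b) and return the first row passing the filter --
# that first match is min(col_b), so the scan can stop there (Limit 1).
def min_via_ordered_scan(rows_in_col_b_order, predicate):
    for col_a, col_b in rows_in_col_b_order:
        if predicate(col_a):   # Filter: (col_a > 4996)
            return col_b       # Limit 1: first match is the minimum
    return None

# Toy data: with correlated data the only matching row sits at the end,
# so every preceding row is scanned and discarded first.
rows = [(1, 10), (2, 20), (5000, 30)]
result = min_via_ordered_scan(rows, lambda a: a is not None and a > 4996)
print(result)  # 30
```

If that model is right, the planner's assumption that a match will turn up early in the `col_b` order is exactly what my correlated data violates, which would explain the 9,992,000 discarded rows.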
Why is the query planner using an index that drastically worsens the performance of the query?