Hi,

We encountered the following unexpected situation when doing a query on a table 
containing a jsonb column with larg(ish) (64kb minified) JSONB payloads which 
has a predicate on the jsonb column. We created an expression index (see below) 
and expected that the query will use the index. The query might only filter out 
a small part of actual data but since it is a count we expected it will use the 
index to avoid costly rechecks on the actual jsonb data.

The table structure is:

                                              Table "state.state_a"
   Column   |     Type     | Collation | Nullable | Default | Storage  | 
Compression | Stats target | Description
------------+--------------+-----------+----------+---------+----------+-------------+--------------+-------------
 vehicle_id | uuid         |           | not null |         | plain    |        
     |              |
 code       | character(1) |           | not null |         | extended |        
     |              |
 document   | jsonb        |           | not null |         | extended |        
     |              |
Indexes:
...
    "state_a_text_idx" btree ((document ->> 'country_composition_iso'::text))

With an expression index on the predicate. The query in question is simply:

explain (analyse, buffers) select count(vehicle_id) from state.state_a where 
document->>'country_composition_iso' = 'CH';
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=6447.74..6447.75 rows=1 width=8) (actual 
time=86404.994..86404.995 rows=1.00 loops=1)
   Buffers: shared hit=960991 read=537879
   ->  Seq Scan on state_a  (cost=0.00..5823.77 rows=249585 width=16) (actual 
time=2.513..86328.761 rows=249460.00 loops=1)
         Filter: ((document ->> 'country_composition_iso'::text) = 'CH'::text)
         Rows Removed by Filter: 5
         Buffers: shared hit=960991 read=537879
 Planning Time: 0.094 ms
 Execution Time: 86405.033 ms
(8 rows)

Which uses a seq scan and as you can see takes quite a long time. Disabling 
seq_scans actually uses the index as expected and is way faster:

SET enable_seqscan = OFF;

 explain (analyse, buffers) select count(vehicle_id) from state.state_a where 
document->>'country_composition_iso' = 'CH';
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=7501.12..7501.13 rows=1 width=8) (actual 
time=170.223..170.224 rows=1.00 loops=1)
   Buffers: shared hit=344 read=1952
   ->  Index Scan using state_a_text_idx on state_a  (cost=0.42..6877.16 
rows=249585 width=16) (actual time=2.236..147.748 rows=249460.00 loops=1)
         Index Cond: ((document ->> 'country_composition_iso'::text) = 
'CH'::text)
         Index Searches: 1
         Buffers: shared hit=344 read=1952
 Planning Time: 0.099 ms
 Execution Time: 170.266 ms
(8 rows)

>From what I can see is that the cost estimate for the seq scan is very low and 
>it seems that it does not include the fact that the filter must load the data 
>from the associated toast table. Looking at the relpages:
 relname | relpages | reltuples
---------+----------+-----------
 state_a |     2079 |    249465
 pg_toast_439105 |   530117 | 2.245078e+06

Suggest that for the cost estimation cost estimation of the seq scan only the 
state_a table and its pages is considered.  My questions would be:


  *
Is the analysis correct that the root cause is in fact the misestimation of the 
costs due to not considering the toast table?
  *
If so, is it possible to fix this?
  *
Our workaround currently is to set the seq_page_cost higher to avoid this 
scenario. Any recommendations how to better workaround this?

Thanks for your help!


Related PG version:
                                   version
------------------------------------------------------------------------------
 PostgreSQL 18.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.2.0, 64-bit
(1 row)


Cheers,

Jan

Reply via email to