Hello all,

We’re seeing intermittently very poor performance of a query when, occasionally, 
a poor query plan is chosen. We’re using Postgres 16.9.
One suspicious factor in the EXPLAIN ANALYZE output is a badly wrong estimate of 
the number of rows returned from a text[] column queried with ‘&&’.

After playing around with a simple recreate (details below), it seems ANALYZE 
of the table is affected by the number of rows in the table. The 
`most_common_elems` statistic is [null] when the table contains 15,873 or more 
rows at the time it’s analyzed. With fewer rows it’s collected correctly.

Is there any good explanation for this behaviour? Preferably we’d like some way 
for proper `most_common_elems` statistics to be collected in our production 
database, in the hope that this influences the planner to always select a good plan.
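
If it matters, the only knob we can see is the per-column statistics target. A 
sketch against the recreate table below (not yet verified to change the outcome):

ALTER TABLE public.test ALTER COLUMN tags SET STATISTICS 1000;
ANALYZE public.test;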

In our production system there are ~150,000 rows in a table that includes a 
`text[]` column, where each row’s array contains a single ~19-character string, 
unique within the table. The full query joins against a couple more tables, and 
there’s a GIN index on the text[] column. If necessary I can get into details of 
the real system, but I hope the simple recreate below will be sufficient to 
understand the problem:
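
The recreate below omits the index; for reference, the production index is the 
equivalent of the following (hypothetical name, shown against the recreate table):

CREATE INDEX test_tags_gin ON public.test USING gin (tags);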



CREATE TABLE IF NOT EXISTS public.test(
    id SERIAL PRIMARY KEY,
    tags text[]
);

INSERT INTO public.test (tags)
    SELECT ARRAY[TO_CHAR(n, 'fm00000000')]
    FROM (SELECT generate_series(1, 15_873) AS n) AS s;

ANALYZE public.test;

SELECT * FROM pg_stat_user_tables WHERE relname = 'test';
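
The element statistics reported below were read back from pg_stats with something 
along these lines:

SELECT most_common_elems, most_common_elem_freqs
FROM pg_stats
WHERE schemaname = 'public' AND tablename = 'test' AND attname = 'tags';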

EXPLAIN (ANALYZE,BUFFERS,VERBOSE)
        SELECT * FROM test WHERE tags && ARRAY['00000002'];




Results
-------
table with 15_000 rows has most_common_elems after ANALYZE 
(most_common_elem_freqs : 6.666667e-05)
table with 15_872 rows has most_common_elems after ANALYZE 
(most_common_elem_freqs : 6.300403e-05)
table with 15_873 rows has [null] most_common_elems after ANALYZE

table with 100_000 rows has [null] most_common_elems after ANALYZE



Query plans show an estimate of 1 row when the `most_common_elems` statistic is 
available, or the hardcoded default of 1/200 of the estimated table size when 
`most_common_elems` is null.
Here 79 rows are estimated (15,873 × 0.005 ≈ 79), because the table contained 
15,873 rows and stats weren’t available.

Query plan
-----------
Seq Scan on public.test  (cost=0.00..463.41 rows=79 width=37) (actual 
time=9.934..17.190 rows=1 loops=1)
  Output: id, tags
  Filter: (test.tags && '{00000002}'::text[])
  Rows Removed by Filter: 15872
  Buffers: shared hit=268
Planning:
  Buffers: shared hit=75
Planning Time: 2.060 ms
Execution Time: 17.205 ms


Full version
------------
"PostgreSQL 16.9 (Debian 16.9-1.pgdg120+1) on aarch64-unknown-linux-gnu, 
compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit"


Regards,
Mark Frost
IBM

