Hello all,

We're seeing intermittently very poor performance of a query: occasionally a poor query plan is chosen. We're using Postgres 16.9. One suspicious factor in the EXPLAIN ANALYZE output is a badly wrong estimate of the number of rows returned by a filter on a text[] column using '&&'.
After playing around with a simple recreate (details below), it seems ANALYZE of the table is affected by the number of rows in the table. The `most_common_elems` statistic is [null] when there are 15,873 or more rows in the table at the time it is analyzed. With fewer rows it's analyzed correctly.

Is there any good explanation for this behaviour? Preferably we'd like some way for proper `most_common_elems` statistics to be collected in our production database, in the hope that this influences a good query plan to always be selected.

In our production system there are ~150,000 rows in a table that includes a `text[]` column, where each row has an array containing a single 19ish-character string, unique within the table. The full query joins against a couple more tables, and there is a GIN index on the text[] column. If necessary, I can get into details of the real system, but hope the simple recreate will be sufficient to understand the problem:

CREATE TABLE IF NOT EXISTS public.test(
    id SERIAL PRIMARY KEY,
    tags text[]
);

INSERT INTO public.test (tags)
SELECT ARRAY[TO_CHAR(n, 'fm00000000')]
FROM (SELECT generate_series(1, 15_873) AS n);

ANALYZE public.test;

SELECT * FROM pg_stat_user_tables WHERE relname = 'test';

EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT * FROM test WHERE tags && ARRAY['00000002'];

Results
-------
table with 15_000 rows has most_common_elems after ANALYZE (most_common_elem_freqs : 6.666667e-05)
table with 15_872 rows has most_common_elems after ANALYZE (most_common_elem_freqs : 6.300403e-05)
table with 15_873 rows has [null] most_common_elems after ANALYZE
table with 100_000 rows has [null] most_common_elems after ANALYZE

(A query to read these element statistics from pg_stats is included at the end of this mail, for reference.)

Query plans show an estimate of 1 row when `most_common_elems` statistics are available, or the hard-coded default of 1/200 of the estimated table size when most_common_elems is null. Here 79 rows are estimated, for a table containing 15,873 rows with no stats available.

Query plan
-----------
Seq Scan on public.test  (cost=0.00..463.41 rows=79 width=37) (actual time=9.934..17.190 rows=1 loops=1)
  Output: id, tags
  Filter: (test.tags && '{00000002}'::text[])
  Rows Removed by Filter: 15872
  Buffers: shared hit=268
Planning:
  Buffers: shared hit=75
Planning Time: 2.060 ms
Execution Time: 17.205 ms

Full version
------------
"PostgreSQL 16.9 (Debian 16.9-1.pgdg120+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit"

Regards,
Mark Frost
IBM

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: Building C, IBM Hursley Office, Hursley Park Road, Winchester, Hampshire SO21 2JN
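
P.S. For reference, the element statistics quoted under "Results" can be inspected via the standard pg_stats view. A minimal sketch of such a query, assuming the table and column names from the recreate above:

-- most_common_elems / most_common_elem_freqs are the per-column element
-- statistics gathered by ANALYZE for array columns
SELECT most_common_elems, most_common_elem_freqs
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'test'
  AND attname = 'tags';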