Thanks, Tom, for the clear explanation. Unfortunately I don't see any actual improvement. I'm using PG 11 and I ran the following commands:
    ALTER TABLE subscriptions ALTER tags SET STATISTICS 1000;
    ANALYZE subscriptions;

However the bias remains pretty much the same (slightly worse after the change). Any ideas?

On Sun, Feb 2, 2020 at 6:11 PM Tom Lane <t...@sss.pgh.pa.us> wrote:

> Marco Colli <collimarc...@gmail.com> writes:
> > Let's say that you have a simple query like the following on a large table
> > (for a multi-tenant application):
> > SELECT "subscribers".* FROM "subscribers" WHERE "subscribers"."project_id"
> > = 123 AND (tags @> ARRAY['de']::varchar[]);
>
> > If you run EXPLAIN ANALYZE you can see that stats are completely wrong.
> > For example I get an expected count of 3,500 rows whereas the actual
> > result is 20 rows. This also results in bad query plans...
>
> > In a previous discussion someone said that this wrong estimate is because
> > @> uses a fixed selectivity of 0.001, **regardless of actual data**!!
> > Is that true?
>
> Hasn't been true since 9.2.
>
> You might get some insight from looking into the most_common_elems,
> most_common_elem_freqs, and elem_count_histogram fields of the pg_stats
> view.
>
> It seems likely to me that increasing the statistics target for this array
> column would help. IIRC, estimates for values that don't show up in
> most_common_elems are going to depend on the lowest frequency that *does*
> show up there ... so if you want better resolution for non-common values,
> you need more entries.
>
>                         regards, tom lane
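
PS: In case it helps the discussion, here is a small sketch of what I can run to check what the planner actually knows about the column (assuming the subscriptions/tags names from my commands above):

    -- check whether the higher statistics target actually took effect
    SELECT attstattarget
    FROM pg_attribute
    WHERE attrelid = 'subscriptions'::regclass AND attname = 'tags';

    -- inspect the per-element statistics Tom mentioned
    SELECT most_common_elems, most_common_elem_freqs, elem_count_histogram
    FROM pg_stats
    WHERE tablename = 'subscriptions' AND attname = 'tags';

If the element I filter on still doesn't appear in most_common_elems even with the larger target, then, as Tom explained, the estimate presumably falls back on the lowest frequency that does appear there, which might explain why the numbers barely move.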