On Thu, 21 Nov 2024 at 13:03, Toto guyoyg <thomas.bes...@hotmail.fr> wrote: > > Offending O(n²) query:
I disagree with the O(n^2) claims. The number of live matched rows in a single table's bitmap scan may be anywhere from 0 (leading to O(1) complexity in the rescan) to 970_662_608_670 (= 226 tuples per page * (2^32 - 1) pages), so such a rescan's complexity - given O(n) complexity for always choosing to use a linear scan on the array to find matches - would better be parameterized as O(n*m), for m live tuples in the generated bitmap, or even O(n*m + k), as the bitmap may contain k dead tuples which can also take some measurable amount of effort to scan. > 4. The EXPLAIN ANALYZE output shows that the planner always supposes that > arrays are of size 10, instead of using the estimated sizes of subqueries > they are created from, or actual size provided as argument. Did you recheck this on a recently released version of PostgreSQL? IIRC this was updated in PG17, and with `= ANY (ARRAY[...])` expressions you will get the expected number of rows for that array expression (assuming accurate statistics on your table). > It looks like this could be improved/fixed by either/all of: > > 1. Using a hashset (or sort + binary search) for recheck (past a certain > array length or even always) instead of always searching linearly IIUC, hashed "= ANY()" expressions were already implemented with commit 50e17ad2 (released in PG14) - look for EEOP_HASHED_SCALARARRAYOP in expression handling code. > 2. Fixing planner assuming that all arrays are of size 10, using instead > actual or estimated sizes. IIUC, this was also already implemented with commit 9391f715 (released in PG17). > Taking it into account for recheck cond cost estimation. (This wouldn't solve > recheck perf, but at least would make the planner aware that lossy heap scans > and Hash index bitmap heap scans are n times more expensive than exact BTree > bitmap heap scans, and hopefully make it avoid picking them.) > > Is my understanding correct? I believe your understanding may be quite out of date. Not all planner or executor features and optimizations are explicitly present in the output of EXPLAIN, and the examples all indicate you may be working with an outdated view of Postgres' capabilities. Kind regards, Matthias van de Meent Neon (https://neon.tech)