Matthias van de Meent <boekewurm+postg...@gmail.com> writes: > On Thu, 21 Nov 2024 at 13:03, Toto guyoyg <thomas.bes...@hotmail.fr> wrote: >> It looks like this could be improved/fixed by either/all of: >> >> 1. Using a hashset (or sort + binary search) for recheck (past a certain >> array length or even always) instead of always searching linearly
> IIUC, hashed "= ANY()" expressions were already implemented with > commit 50e17ad2 (released in PG14) - look for > EEOP_HASHED_SCALARARRAYOP in expression handling code. I checked into that and verified that this test case isn't reaching EEOP_HASHED_SCALARARRAYOP, because that commit only addressed the scenario where the array argument of =ANY is a constant. (A post-const-folding constant, but still a constant.) To apply that optimization here, where the array is an output of a subplan, we'd need some mechanism whereby ExecEvalHashedScalarArrayOp could find out when the array has changed underneath it. That's probably possible (by somehow extending the chgParam signaling logic), but the details aren't obvious. The planner's decision about when to apply hashing would become much less obvious, too. >> 2. Fixing planner assuming that all arrays are of size 10, using instead >> actual or estimated sizes. > IIUC, this was also already implemented with commit 9391f715 (released in > PG17). That's not helpful here either, since the arrays are built on-the-fly rather than being stored in table columns (where stats could be collected on them). You could imagine installing bespoke logic for array_agg() that looks at the estimated rowcount for the sub-select, perhaps. The bottom line here is that improving these queries would take a significant amount of work, and they just aren't very compelling examples that seem to justify such effort. Folding the output of a subquery into an array is largely an anti-pattern in SQL in the first place. regards, tom lane