On 2024-06-28 21:05, Matthias van de Meent wrote: > On Fri, 28 Jun 2024 at 10:59, Jelte Fennema-Nio <postg...@jeltef.nl> wrote: >> >> On Fri, 28 Jun 2024 at 00:41, Peter Geoghegan <p...@bowt.ie> wrote: >> > Typically, no, it won't be. But there's really no telling for sure. >> > The access patterns for a composite index on '(a, b)' with a qual >> > "WHERE b = 5" are identical to a qual explicitly written "WHERE a = >> > any(<every possible value in 'a'>) AND b = 5". >> >> Hmm, that's true. But in that case the explain plan gives a clear hint >> that something like that might be going on, because you'll see: >> >> Index Cond: a = any(<every possible value in 'a'>) AND b = 5 >> >> That does make me think of another way, and maybe more "correct" way, >> of representing Masahiros situation than adding a new "Skip Scan Cond" >> row to the EXPLAIN output. We could explicitly include a comparison to >> all prefix columns in the Index Cond: >> >> Index Cond: ((test.id1 = 1) AND (test.id2 = ANY) AND (test.id3 = 101)) >> >> Or if you want to go with syntactically correct SQL we could do the >> following: >> >> Index Cond: ((test.id1 = 1) AND ((test.id2 IS NULL) OR (test.id2 IS >> NOT NULL)) AND (test.id3 = 101)) >> >> An additional benefit this provides is that you now know which >> additional column you should use a more specific filter on to speed up >> the query. In this case test.id2 >> >> OTOH, maybe it's not a great way because actually running that puts >> the IS NULL+ IS NOT NULL query in the Filter clause (which honestly >> surprises me because I had expected this "always true expression" >> would have been optimized away by the planner). >> >> > EXPLAIN (VERBOSE, ANALYZE) SELECT id1, id2, id3 FROM test WHERE id1 = 1 >> > AND (id2 IS NULL OR id2 IS NOT NULL) AND id3 = 101; >> QUERY PLAN >> ───────────────────────────────────────────────────── >> Index Only Scan using test_idx on public.test (cost=0.42..12809.10 >> rows=1 width=12) (actual time=0.027..11.234 rows=1 loops=1) >> Output: id1, id2, id3 >> Index Cond: ((test.id1 = 1) AND (test.id3 = 101)) >> Filter: ((test.id2 IS NULL) OR (test.id2 IS NOT NULL)) >> >> > What about cases where we legitimately have to vary our strategy >> > during the same index scan? >> >> Would my new suggestion above address this? >> >> > In fact, that'd make sense even today, without skip scan (just with >> > the 17 work on nbtree SAOP scans). Even with regular SAOP nbtree index >> > scans, the number of primitive scans is hard to predict, and quite >> > indicative of what's really going on with the scan. >> >> *googles nbtree SAOP scans and finds the very helpful[1]* >> >> Yes, I feel like this should definitely be part of the ANALYZE output. >> Seeing how Lukas has to look at pg_stat_user_tables to get this >> information seems quite annoying[2] and only possible on systems that >> have no concurrent queries. >> >> So it sounds like we'd want a "Primitive Index Scans" counter in >> ANALYZE too. In addition to the number of filtered rows by, which if >> we go with my proposal above should probably be called "Rows Removed >> by Index Cond". > > This all just made me more confident that this shows a need to enable > index AMs to provide output for EXPLAIN: The knowledge about how index > scankeys are actually used is exclusively known to the index AM, > because the strategies are often unique to the index AM (or even > chosen operator classes), and sometimes can only be applied at > runtime: while the index scankeys' sizes, attribute numbers and > operators are known in advance (even if not all arguments are filled > in; `FROM a JOIN b ON b.id = ANY (a.ref_array)`), the AM can at least > show what strategy it is likely going to choose, and how (in)efficient > that strategy could be. > > Right now, Masahiro-san's patch tries to do that with an additional > field in IndexPath populated (by proxy) exclusively in btcostestimate. > I think that design is wrong, because it wires explain- and > btree-specific data through the planner, adding overhead everywhere > which is only used for btree- and btree-compatible indexes. > > I think the better choice would be adding an IndexAmRoutine->amexplain > support function, which would get called in e.g. explain.c's > ExplainIndexScanDetails to populate a new "Index Scan Details" (name > to be bikeshed) subsection of explain plans. This would certainly be > possible, as the essentials for outputting things to EXPLAIN are > readily available in the explain.h header.
Yes, that's one of my concerns. I agree to add IndexAmRoutine->amexplain is better because we can support several use cases. Although I'm not confident to add only IndexAmRoutine->amexplain is enough now, I'll make a PoC patch to confirm it. Regards, -- Masahiro Ikeda NTT DATA CORPORATION