On Wed, Oct 12, 2016 at 4:19 PM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote: > Here it is, now split into two parts: one patch > to add an amskip operation, and another to consider using it to > implement DISTINCT when it was already has an index only scan path on > an index that supports skipping.
Those patches add an amskip operation and then teach the planner and executor how to do this, given a table foo (a, b) with an index on (a) or (a, b): SELECT DISTINCT foo FROM a Index Only Scan for distinct values of (a) using foo_pkey on foo In just the right circumstances that is vastly faster than scanning every tuple and uniquifying. I was thinking that the next step in this experiment might be to introduce a kind of plan that can handle queries where the index's leading column is not in the WHERE clause, building on the above plan, and behaving conceptually a little bit like a Nested Loop, like so: SELECT * FROM foo WHERE b = 42 Index Skip Scan -> Index Only Scan for distinct values of (a) using foo_pkey on foo -> Index Only Scan using foo_pkey on foo Index Cond: ((a = outer.a) AND (b = 42)) I suppose you could instead have a single node that knows how to perform the whole scan itself so that you don't finish up searching for each distinct value in both the inner and outer subplan, but it occurred to me that building the plan out of Lego bricks like this might generalise better to more complicated queries. I suppose it might also parallelise nicely with a Parallel Index Only Scan as the outer plan. Maybe something similar is possible for handling "GROUP BY a". Worth pursuing? Does amskip suck? Does anyone have better ideas, either for how to do the low level skip or the higher level Index Skip Scan, or perhaps a completely different way of looking at this? -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers