Re: Projection pushdown to index access method
Robert Haas writes: > On Tue, Sep 19, 2023 at 12:35 PM Chris Cleveland > wrote: >> I'm working on an index access method. I have a function which can appear in >> a projection list which should be evaluated by the access method itself. >> Example: >> ... >> How do I get the system to pull the value from the index instead of trying >> to calculate it? > I don't see how you can do this in general, because there's no > guarantee that the plan will be an Index Scan or Index Only Scan > instead of a Seq Scan or Bitmap Heap/Index Scan. Yeah. There is some adjacent functionality for indexed expressions, which maybe you could use, but it has a lot of shortcomings yet. For example: regression=# create or replace function f(x int) returns int as $$begin return x+1; end$$ language plpgsql strict immutable cost 1000; CREATE FUNCTION regression=# create table mytable (id int, x int); CREATE TABLE regression=# create index on mytable(x, f(x)); CREATE INDEX regression=# set enable_seqscan TO 0; SET regression=# set enable_bitmapscan TO 0; SET regression=# explain verbose select f(x) from mytable; QUERY PLAN - Index Only Scan using mytable_x_f_idx on public.mytable (cost=0.15..5728.06 rows=2260 width=4) Output: (f(x)) (2 rows) If you examine the plan tree closely you can confirm that it is pulling f(x) from the index rather than recomputing it. So maybe you could get somewhere by pretending that my_special_function(body) is an indexed expression. However, there are a couple of big gotchas, which this example illustrates: 1. The index has to also provide x (or for you, "body") or else the planner fails to detect that an IOS is applicable. This comes back to the point Robert made about the planner preferring to think about pulling individual Vars from tables: we don't believe the index is usable in an IOS unless it provides all the Vars the query needs from that table. This wouldn't be hard to fix exactly; the problem is to fix it without spending exponential amounts of planning time in check_index_only. We'd have to detect that all uses of "x" appear in the context "f(x)" in order to realize that we don't need to be able to fetch "x" itself. 2. Costing doesn't account for the fact that we've avoided runtime computation of f(), thus the IOS plan may not be preferred over other plan shapes, which is why I had to force it above. Again, this is pretty closely tied to the fact that we don't recognize until very late in the game that we can get f(x) from the index. 3. This only works for an index-only scan, not regular index scans. There's some early discussion happening about unifying IOS and regular scans a bit more, which perhaps would allow relaxing that (and maybe even solve issue #1?). But it's a long way off yet. If my_special_function() is supposed to always be applied to an indexed column, then issue #1 would fortuitously not be a problem for you. But #2 is a pain, and #3 might be a deal-breaker for you. regards, tom lane
Re: Projection pushdown to index access method
On Tue, Sep 19, 2023 at 12:35 PM Chris Cleveland wrote: > I'm working on an index access method. I have a function which can appear in > a projection list which should be evaluated by the access method itself. > Example: > > SELECT title, my_special_function(body) > FROM books > WHERE book_id <===> 42; > > "<===>" is the operator that invokes the access method. The value returned by > my_special_function() gets calculated during the index scan, and depends on > information that exists only in the index. > > How do I get the system to pull the value from the index instead of trying to > calculate it? I don't see how you can do this in general, because there's no guarantee that the plan will be an Index Scan or Index Only Scan instead of a Seq Scan or Bitmap Heap/Index Scan. > So far, I have created a CustomScan and set it using set_rel_pathlist_hook. > The hook function gives us a PlannerInfo, RelOptInfo, Index, and > RangeTblEntry. So far as I can tell, only RelOptInfo.reltarget.exprs gives us > any info on the SELECT expressions, but unfortunately, the exprs are Var > nodes that contain the (title, body) columns from above, and do not say > anything about my_special_function(). So what does the EXPLAIN plan look like? I'm not quite sure what's happening here, but the planner likes to make plans that just fetch attributes from all the relations being joined (here, there's just one) and then perform the calculation of any expressions at the very end, as the final step, or at least as the final step at that subquery level. And if it plans to ask your custom scan for title, body, and book_id and then compute my_special_function(body) after the fact, the thing you want to happen is not going to happen. If the planner can be induced to ask your custom scan for my_special_function(body), then I *think* you should be able to arrange to get that value any way you like and just return it. But I don't quite know how to induce the planner to do that -- and especially if this query involved more than one table, because of the planner's tendency to postpone expression evaluation until after joins are done. > I know this is possible, because the docs for PathTarget say this: > > PathTarget > * > * This struct contains what we need to know during planning about the > * targetlist (output columns) that a Path will compute. Each RelOptInfo > * includes a default PathTarget, which its individual Paths may simply > * reference. However, in some cases a Path may compute outputs different > * from other Paths, and in that case we make a custom PathTarget for it. > * For example, an indexscan might return index expressions that would > * otherwise need to be explicitly calculated. It's just worth keeping in mind that the planner and the executor are very tightly bound together here. This may be one of those cases getting the executor to do what you want is the easy part, and getting the planner to produce a plan that tells it to do that is the hard part. -- Robert Haas EDB: http://www.enterprisedb.com