Hi, Thanks for the previous feedbacks!
> The way the hook is used seems pretty inconvenient, though. I see the problem, and I agree. I looked into how other hooks work, and I am wondering if it looks ok if we: pass a pointer to the hook, and let the hook check if there is any information applicable. If there is none, the hook just returns False and we let the rest of the code handle. If it is true, we get the selectivity from the hook and return it. So something like ``` if (clauselist_selectivity_hook && (*clauselist_selectivity_hook) (root, clauses, varRelid, jointype, sjinfo, use_extended_stats, &s1)) { return s1; } ``` What I am trying to mock is the get_index_stats_hook ( https://github.com/taminomara/psql-hooks/blob/master/Detailed.md#get_index_stats_hook). Am I understanding your idea correctly and does this look somehow better? Best regards, Xiaozhe On Wed, Nov 17, 2021 at 7:47 PM Tomas Vondra <tomas.von...@enterprisedb.com> wrote: > On 11/17/21 16:39, Xiaozhe Yao wrote: > > Hi Tom, > > > > Thanks for your feedback. I completely agree with you that a > > higher-level hook is better suited for this case. I have adjusted the > > PoC patch to this email. > > > > Now it is located in the clauselist_selectivity_ext function, where we > > first check if the hook is defined. If so, we let the hook estimate the > > selectivity and return the result. With this one, I can also develop > > extensions to better estimate the selectivity. > > > > I think clauselist_selectivity is the right level, because this is > pretty similar to what extended statistics are doing. I'm not sure if > the hook should be called in clauselist_selectivity_ext or in the plain > clauselist_selectivity. But it should be in clauselist_selectivity_or > too, probably. > > The way the hook is used seems pretty inconvenient, though. I mean, if > you do this > > if (clauselist_selectivity_hook) > return clauselist_selectivity_hook(...); > > then what will happen when the ML model has no information applicable to > a query? This is called for all relations, all conditions, etc. and > you've short-circuited all the regular code, so the hook will have to > copy all of that. Seems pretty silly and fragile. > > IMO the right approach is what statext_clauselist_selectivity is doing, > i.e. estimate clauses, mark them as estimated in a bitmap, and let the > rest of the existing code take care of the remaining clauses. So more > something like > > if (clauselist_selectivity_hook) > s1 *= clauselist_selectivity_hook(..., &estimatedclauses); > > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c index d263ecf082..94f7993529 100644 --- a/src/backend/optimizer/path/clausesel.c +++ b/src/backend/optimizer/path/clausesel.c @@ -26,6 +26,9 @@ #include "utils/lsyscache.h" #include "utils/selfuncs.h" +/* Hooks for plugins to get control when we ask for selectivity */ +clauselist_selectivity_hook_type clauselist_selectivity_hook = NULL; + /* * Data structure for accumulating info about possible range-query * clause pairs in clauselist_selectivity. @@ -130,6 +133,18 @@ clauselist_selectivity_ext(PlannerInfo *root, ListCell *l; int listidx; + /* + * If we have a hook for selectivity estimation and it returns True, then go for it. + */ + if (clauselist_selectivity_hook && + (*clauselist_selectivity_hook) (root, clauses, varRelid, jointype, sjinfo, use_extended_stats, &s1)) + { + /* + * The hook takes control of estimating the selectivity + * and saves the estimated selectivity to s1. + */ + return s1; + } /* * If there's exactly one clause, just go directly to * clause_selectivity_ext(). None of what we might do below is relevant. @@ -370,7 +385,19 @@ clauselist_selectivity_or(PlannerInfo *root, Bitmapset *estimatedclauses = NULL; ListCell *lc; int listidx; - + + /* + * If we have a hook for selectivity estimation and it returns True, then go for it. + */ + if (clauselist_selectivity_hook && + (*clauselist_selectivity_hook) (root, clauses, varRelid, jointype, sjinfo, use_extended_stats, &s1)) + { + /* + * The hook takes control of estimating the selectivity + * and saves the estimated selectivity to s1. + */ + return s1; + } /* * Determine if these clauses reference a single relation. If so, and if * it has extended statistics, try to apply those. diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h index 41b49b2662..7e10144822 100644 --- a/src/include/optimizer/optimizer.h +++ b/src/include/optimizer/optimizer.h @@ -202,4 +202,17 @@ extern int locate_var_of_level(Node *node, int levelsup); extern List *pull_var_clause(Node *node, int flags); extern Node *flatten_join_alias_vars(Query *query, Node *node); +/* Hooks for external selectivity estimation */ +typedef bool (*clauselist_selectivity_hook_type) ( + PlannerInfo *root, + List *clauses, + int varRelid, + JoinType jointype, + SpecialJoinInfo *sjinfo, + bool use_extended_stats, + Selectivity *s); + +extern PGDLLIMPORT clauselist_selectivity_hook_type clauselist_selectivity_hook; + + #endif /* OPTIMIZER_H */