On 20.03.2025 15:32, Andrei Lepikhov wrote:
I quite frequently need the number of distinct values (or groups) predicted during the Memoize node creation to understand why caching is sometimes employed or not. But I had thought about an alternative way: having an extensible EXPLAIN (thanks to Robert), we may save optimisation-stage data (I have the same necessity in the case of IncrementalSort, for example) and put it into the Plan node on-demand. So, the way I want to go is a Plan::extlist node and create_plan hook, which may allow copying best_path data to the final plan. So, here, we will add a new parameter and avoid touching the core code. But I would give +1 to current approach if it were done in a shorter time.
Then before proceeding further, I think we need to benchmark this change to ensure there are no performance regressions. If performance issues arise, then using extensible EXPLAIN might be the only viable approach.
I have addressed most of the review comments except for the ExplainPropertyText change. I am attaching v3 of the patch with these updates. If anyone notices any performance issues, please let me know. Issue with ExplainPropertyText for this thread I'll fix in the next patches if it will be necessary.
So far, I have tested it on small machines. There are no performance issues yet. I'll run benchmarks on larger ones soon.
-- Best regards, Ilia Evdokimov, Tantor Labs LLC.
From 02a9b4793deb74f2e67dacabec259c4cd929d995 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov <ilya.evdoki...@tantorlabs.ru> Date: Thu, 20 Mar 2025 16:58:48 +0300 Subject: [PATCH v3] Show ndistinct and est_entries in EXPLAIN for Memoize --- src/backend/commands/explain.c | 12 ++++++++++++ src/backend/optimizer/path/costsize.c | 3 +++ src/backend/optimizer/plan/createplan.c | 10 +++++++--- src/backend/optimizer/util/pathnode.c | 3 +++ src/include/nodes/pathnodes.h | 2 ++ src/include/nodes/plannodes.h | 6 ++++++ 6 files changed, 33 insertions(+), 3 deletions(-) diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 33a16d2d8e2..4d8e94d3feb 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -3629,6 +3629,11 @@ show_memoize_info(MemoizeState *mstate, List *ancestors, ExplainState *es) { ExplainPropertyText("Cache Key", keystr.data, es); ExplainPropertyText("Cache Mode", mstate->binary_mode ? "binary" : "logical", es); + if (es->costs) + { + ExplainPropertyFloat("Estimated Capacity", "", ((Memoize *) plan)->est_entries, 0, es); + ExplainPropertyFloat("Estimated Distinct Lookup Keys", "", ((Memoize *) plan)->est_unique_keys, 0, es); + } } else { @@ -3636,6 +3641,13 @@ show_memoize_info(MemoizeState *mstate, List *ancestors, ExplainState *es) appendStringInfo(es->str, "Cache Key: %s\n", keystr.data); ExplainIndentText(es); appendStringInfo(es->str, "Cache Mode: %s\n", mstate->binary_mode ? "binary" : "logical"); + if (es->costs) + { + ExplainIndentText(es); + appendStringInfo(es->str, "Estimated Capacity: %u, Estimated Distinct Lookup Keys: %0.0f\n", + ((Memoize *) plan)->est_entries, + ((Memoize *) plan)->est_unique_keys); + } } pfree(keystr.data); diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index 256568d05a2..4e42afbf53a 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -2604,6 +2604,9 @@ cost_memoize_rescan(PlannerInfo *root, MemoizePath *mpath, mpath->est_entries = Min(Min(ndistinct, est_cache_entries), PG_UINT32_MAX); + /* Remember ndistinct for a potential EXPLAIN later */ + mpath->est_unique_keys = ndistinct; + /* * When the number of distinct parameter values is above the amount we can * store in the cache, then we'll have to evict some entries from the diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index 75e2b0b9036..596b82c5dc9 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -284,7 +284,8 @@ static Material *make_material(Plan *lefttree); static Memoize *make_memoize(Plan *lefttree, Oid *hashoperators, Oid *collations, List *param_exprs, bool singlerow, bool binary_mode, - uint32 est_entries, Bitmapset *keyparamids); + uint32 est_entries, Bitmapset *keyparamids, + double est_unique_keys); static WindowAgg *make_windowagg(List *tlist, WindowClause *wc, int partNumCols, AttrNumber *partColIdx, Oid *partOperators, Oid *partCollations, int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations, @@ -1703,7 +1704,8 @@ create_memoize_plan(PlannerInfo *root, MemoizePath *best_path, int flags) plan = make_memoize(subplan, operators, collations, param_exprs, best_path->singlerow, best_path->binary_mode, - best_path->est_entries, keyparamids); + best_path->est_entries, keyparamids, + best_path->est_unique_keys); copy_generic_path_info(&plan->plan, (Path *) best_path); @@ -6636,7 +6638,8 @@ materialize_finished_plan(Plan *subplan) static Memoize * make_memoize(Plan *lefttree, Oid *hashoperators, Oid *collations, List *param_exprs, bool singlerow, bool binary_mode, - uint32 est_entries, Bitmapset *keyparamids) + uint32 est_entries, Bitmapset *keyparamids, + double est_unique_keys) { Memoize *node = makeNode(Memoize); Plan *plan = &node->plan; @@ -6654,6 +6657,7 @@ make_memoize(Plan *lefttree, Oid *hashoperators, Oid *collations, node->binary_mode = binary_mode; node->est_entries = est_entries; node->keyparamids = keyparamids; + node->est_unique_keys = est_unique_keys; return node; } diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index 93e73cb44db..1fbcda99067 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -1701,6 +1701,9 @@ create_memoize_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath, Assert(enable_memoize); pathnode->path.disabled_nodes = subpath->disabled_nodes; + /* Estimated number of distinct memoization keys, computed using estimate_num_groups() */ + pathnode->est_unique_keys = 0; + /* * Add a small additional charge for caching the first entry. All the * harder calculations for rescans are performed in cost_memoize_rescan(). diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index c24a1fc8514..0946ccea994 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -2040,6 +2040,8 @@ typedef struct MemoizePath uint32 est_entries; /* The maximum number of entries that the * planner expects will fit in the cache, or 0 * if unknown */ + double est_unique_keys; /* Estimated number of distinct memoization keys, + * used for cache size evaluation. Kept for EXPLAIN */ } MemoizePath; /* diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h index f78bffd90cf..b38d4d91fb4 100644 --- a/src/include/nodes/plannodes.h +++ b/src/include/nodes/plannodes.h @@ -1060,6 +1060,12 @@ typedef struct Memoize /* paramids from param_exprs */ Bitmapset *keyparamids; + + /* + * Estimated number of distinct memoization keys, + * used for cache size evaluation. Kept for EXPLAIN + */ + double est_unique_keys; } Memoize; /* ---------------- -- 2.34.1