> On Aug 20, 2025, at 07:34, Jeff Davis <[email protected]> wrote:
>
>
> <v1-0001-Use-per-PlanState-memory-context-for-work-mem.patch>
I understand this is not a final patch, so I will focus on the design:
1. This patch adds ps_WorkMem to PlanState and makes it the parent of the other
per-node memory contexts, so all memory used by a node is grouped and
measurable, which is good.
I am thinking that this change may also open up a chance to free per-node
memory earlier, once a node finishes. Before this patch, the parent/grandparent
of those node execution memory contexts is the portal context, so the memory is
only released when the whole query is done. With this change, maybe we can free
per-node memory earlier, which could benefit large, complex query executions.
I know some memory must be retained until the entire query finishes, but
per-node memory, such as a hash table, might be destroyed immediately after its
node finishes.
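For example, something along these lines could be called from a node's End
routine (name and placement are hypothetical, just a sketch of the idea;
whether it is safe depends on rescan/EXPLAIN requirements):

/*
 * Hypothetical helper (sketch only): drop a finished node's work-mem
 * context, and with it children such as hash table contexts, instead of
 * waiting for the portal/query context to be destroyed at end of query.
 */
static void
ReleaseNodeWorkMem(PlanState *node)
{
    if (node->ps_WorkMem != NULL)
    {
        MemoryContextDelete(node->ps_WorkMem);
        node->ps_WorkMem = NULL;
    }
}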
2. For the new function ValidateWorkMemParent(), the logic is: if a parent is
given, the parent must be a child of the node's work-mem context; otherwise the
node's ps_WorkMem is used as the parent. Can you add a comment explaining why
it is designed like this, and when a parent should be explicitly specified?
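If I read the behavior right, it boils down to something like this (a sketch
with a made-up name, not the actual patch code):

/* Sketch of how I read the parent-selection behavior (hypothetical name). */
static MemoryContext
ResolveWorkMemParent(PlanState *ps, MemoryContext parent)
{
    MemoryContext workmem = GetWorkMem(ps);

    /* default: hang the new context directly off ps_WorkMem */
    if (parent == NULL)
        return workmem;

    /* an explicit parent must already sit somewhere under ps_WorkMem */
    {
        MemoryContext cxt = parent;

        while (cxt != NULL && cxt != workmem)
            cxt = cxt->parent;
        Assert(cxt == workmem);
    }

    return parent;
}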
3. The new function WorkMemClamBlockSize() changes the existing logic, because
it always uses Min(maxBlockSize, size). But in nodeAgg.c, the old logic is:
/*
 * Like CreateWorkExprContext(), use smaller sizings for smaller work_mem,
 * to avoid large jumps in memory usage.
 */
maxBlockSize = pg_prevpower2_size_t(work_mem * (Size) 1024 / 16);

/* But no bigger than ALLOCSET_DEFAULT_MAXSIZE */
maxBlockSize = Min(maxBlockSize, ALLOCSET_DEFAULT_MAXSIZE);

/* and no smaller than ALLOCSET_DEFAULT_INITSIZE */
maxBlockSize = Max(maxBlockSize, ALLOCSET_DEFAULT_INITSIZE);

aggstate->hash_tablecxt =
    BumpContextCreate(aggstate->ss.ps.state->es_query_cxt,
                      "HashAgg table context",
                      ALLOCSET_DEFAULT_MINSIZE,
                      ALLOCSET_DEFAULT_INITSIZE,
                      maxBlockSize);
The new function doesn't cover the "no smaller than ALLOCSET_DEFAULT_INITSIZE"
part. Is this logic change intended?
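If the lower bound is still wanted, the clamp could look like this (a
hypothetical helper, name and signature made up only to illustrate the point):

/* Hypothetical sketch: clamp to both the old upper and lower bounds. */
static Size
ClampBlockSizeBothWays(Size size, Size maxBlockSize)
{
    size = Min(size, maxBlockSize);                 /* no bigger than the max */
    size = Max(size, ALLOCSET_DEFAULT_INITSIZE);    /* and no smaller than INITSIZE */
    return size;
}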
4. You add a MemoryContextSwitchTo(GetWorkMem(ps)) wrapper around all
tuplesort_begin_heap() calls:
oldcontext = MemoryContextSwitchTo(GetWorkMem(&aggstate->ss.ps));
aggstate->sort_out = tuplesort_begin_heap(tupDesc,
                                          sortnode->numCols,
                                          sortnode->sortColIdx,
                                          sortnode->sortOperators,
                                          sortnode->collations,
                                          sortnode->nullsFirst,
                                          work_mem,
                                          NULL, TUPLESORT_NONE);
MemoryContextSwitchTo(oldcontext);
Should we just pass a memory context into tuplesort_begin_heap()? Also, for the
"work_mem" argument, should we change it to GetWorkMemLimit()?
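What I have in mind is roughly this (a hypothetical extra parameter at the end,
not what tuplesort.h has today):

/* Hypothetical: let the caller supply the context the sort state lives in. */
extern Tuplesortstate *tuplesort_begin_heap(TupleDesc tupDesc,
                                            int nkeys, AttrNumber *attNums,
                                            Oid *sortOperators, Oid *sortCollations,
                                            bool *nullsFirstFlags,
                                            int workMem, SortCoordinate coordinate,
                                            int sortopt,
                                            MemoryContext sortContext);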
5. For the new function CheckWorkMemLimit():
bool
CheckWorkMemLimit(PlanState *ps)
{
    size_t allocated = MemoryContextMemAllocated(GetWorkMem(ps), true);

    return allocated <= ps->ps_WorkMemLimit;
}
It calls GetWorkMem(), and GetWorkMem() will create the work-mem context if it
is NULL. That means that when a caller calls CheckWorkMemLimit() on a node that
has no work-mem context yet, one will be created automatically. Won't that lead
to unneeded work-mem creation? If the work-mem context is NULL, that means it
has not been used, so the limit cannot have been exceeded.
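For example (sketch only), the check could short-circuit when the context does
not exist yet:

bool
CheckWorkMemLimit(PlanState *ps)
{
    /* no work-mem context yet means nothing has been allocated */
    if (ps->ps_WorkMem == NULL)
        return true;

    return MemoryContextMemAllocated(ps->ps_WorkMem, true) <= ps->ps_WorkMemLimit;
}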
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/