> On Aug 20, 2025, at 07:34, Jeff Davis <[email protected]> wrote:
>
>
> <v1-0001-Use-per-PlanState-memory-context-for-work-mem.patch>
I understand this is not a final patch, so I will focus on the design:
1. This patch adds ps_WorkMem to PlanState and makes it the parent of the other
per-node memory contexts, so all memory used by a node is grouped and
measurable, which is good.
I am thinking that this change may also open up a chance to free per-node
memory earlier, once a node finishes. Before this patch, the parent/grandparent
of those node execution memory contexts is the portal context, so the memory is
only released when the whole query is done. With this change, maybe we can free
per-node memory earlier, which could benefit large, complex query executions.
I know some memory must be retained until the entire query finishes, but
per-node memory, such as a hash table, might be destroyed immediately after its
node finishes.
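For example, something along these lines could be called from a node's End
routine (name and placement are hypothetical, just a sketch of the idea;
whether it is safe depends on rescan/EXPLAIN requirements):

/*
 * Hypothetical helper (sketch only): drop a finished node's work-mem
 * context, and with it children such as hash table contexts, instead of
 * waiting for the portal/query context to be destroyed at end of query.
 */
static void
ReleaseNodeWorkMem(PlanState *node)
{
    if (node->ps_WorkMem != NULL)
    {
        MemoryContextDelete(node->ps_WorkMem);
        node->ps_WorkMem = NULL;
    }
}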
2. For the new function ValidateWorkMemParent(), the logic is: if a parent is
given, the parent must be a child of the node's work-mem context; otherwise the
node's ps_WorkMem is used as the parent. Can you add a comment explaining why
it is designed like this, and when a parent should be explicitly specified?
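If I read the behavior right, it boils down to something like this (a sketch
with a made-up name, not the actual patch code):

/* Sketch of how I read the parent-selection behavior (hypothetical name). */
static MemoryContext
ResolveWorkMemParent(PlanState *ps, MemoryContext parent)
{
    MemoryContext workmem = GetWorkMem(ps);

    /* default: hang the new context directly off ps_WorkMem */
    if (parent == NULL)
        return workmem;

    /* an explicit parent must already sit somewhere under ps_WorkMem */
    {
        MemoryContext cxt = parent;

        while (cxt != NULL && cxt != workmem)
            cxt = cxt->parent;
        Assert(cxt == workmem);
    }

    return parent;
}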
3. The new function WorkMemClamBlockSize() changes the existing logic, because
it always uses Min(maxBlockSize, size). But in nodeAgg.c, the old logic is:
/*
 * Like CreateWorkExprContext(), use smaller sizings for smaller work_mem,
 * to avoid large jumps in memory usage.
 */
maxBlockSize = pg_prevpower2_size_t(work_mem * (Size) 1024 / 16);

/* But no bigger than ALLOCSET_DEFAULT_MAXSIZE */
maxBlockSize = Min(maxBlockSize, ALLOCSET_DEFAULT_MAXSIZE);

/* and no smaller than ALLOCSET_DEFAULT_INITSIZE */
maxBlockSize = Max(maxBlockSize, ALLOCSET_DEFAULT_INITSIZE);

aggstate->hash_tablecxt =
    BumpContextCreate(aggstate->ss.ps.state->es_query_cxt,
                      "HashAgg table context",
                      ALLOCSET_DEFAULT_MINSIZE,
                      ALLOCSET_DEFAULT_INITSIZE,
                      maxBlockSize);
The new function doesn't cover the "no smaller than ALLOCSET_DEFAULT_INITSIZE"
part. Is this logic change intended?
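If the lower bound is still wanted, the clamp could look like this (a
hypothetical helper, name and signature made up only to illustrate the point):

/* Hypothetical sketch: clamp to both the old upper and lower bounds. */
static Size
ClampBlockSizeBothWays(Size size, Size maxBlockSize)
{
    size = Min(size, maxBlockSize);                 /* no bigger than the max */
    size = Max(size, ALLOCSET_DEFAULT_INITSIZE);    /* and no smaller than INITSIZE */
    return size;
}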
4. You add a MemoryContextSwitchTo(GetWorkMem(ps)) wrapper around all
tuplesort_begin_heap() calls:
oldcontext = MemoryContextSwitchTo(GetWorkMem(&aggstate->ss.ps));
aggstate->sort_out = tuplesort_begin_heap(tupDesc,
                                          sortnode->numCols,
                                          sortnode->sortColIdx,
                                          sortnode->sortOperators,
                                          sortnode->collations,
                                          sortnode->nullsFirst,
                                          work_mem,
                                          NULL, TUPLESORT_NONE);
MemoryContextSwitchTo(oldcontext);
Should we just pass a memory context into tuplesort_begin_heap()? Also, for the
"work_mem" argument, should we change it to GetWorkMemLimit()?
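What I have in mind is roughly this (a hypothetical extra parameter at the end,
not what tuplesort.h has today):

/* Hypothetical: let the caller supply the context the sort state lives in. */
extern Tuplesortstate *tuplesort_begin_heap(TupleDesc tupDesc,
                                            int nkeys, AttrNumber *attNums,
                                            Oid *sortOperators, Oid *sortCollations,
                                            bool *nullsFirstFlags,
                                            int workMem, SortCoordinate coordinate,
                                            int sortopt,
                                            MemoryContext sortContext);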
5. For the new function CheckWorkMemLimit():
bool
CheckWorkMemLimit(PlanState *ps)
{
    size_t allocated = MemoryContextMemAllocated(GetWorkMem(ps), true);

    return allocated <= ps->ps_WorkMemLimit;
}
It calls GetWorkMem(), and GetWorkMem() will create the work-mem context if it
is NULL. That means that when a caller calls CheckWorkMemLimit() on a node that
has no work-mem context yet, one will be created automatically. Won't that lead
to unneeded work-mem creation? If the work-mem context is NULL, that means it
has not been used, so the limit cannot have been exceeded.
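For example (sketch only), the check could short-circuit when the context does
not exist yet:

bool
CheckWorkMemLimit(PlanState *ps)
{
    /* no work-mem context yet means nothing has been allocated */
    if (ps->ps_WorkMem == NULL)
        return true;

    return MemoryContextMemAllocated(ps->ps_WorkMem, true) <= ps->ps_WorkMemLimit;
}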
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/