On Fri, Jun 20, 2025 at 9:30 PM Amit Langote <amitlangot...@gmail.com> wrote:
> On Thu, May 22, 2025 at 5:12 PM Amit Langote <amitlangot...@gmail.com> wrote:
> > I have pushed out the revert now.
> >
> > Note that I’ve only reverted the changes related to deferring locks on
> > prunable partitions. I’m planning to leave the preparatory commits
> > leading up to that one in place unless anyone objects. For reference,
> > here they are in chronological order (the last 3 are bug fixes):
> >
> > bb3ec16e14d Move PartitionPruneInfo out of plan nodes into PlannedStmt
> > d47cbf474ec Perform runtime initial pruning outside ExecInitNode()
> > cbc127917e0 Track unpruned relids to avoid processing pruned relations
> > 75dfde13639 Fix an oversight in cbc127917 to handle MERGE correctly
> > cbb9086c9ef Fix bug in cbc127917 to handle nested Append correctly
> > 28317de723b Ensure first ModifyTable rel initialized if all are pruned
> >
> > I think separating initial pruning from plan node initialization is
> > still worthwhile on its own, as evidenced by the improvements in
> > cbc127917e.
>
> I've been thinking about how to address the concerns Tom raised about
> the reverted patch.  Here's a summary of where my thinking currently
> stands.
>
> * CachedPlan invalidation handling:
>
> The first issue is the part of the old design where a CachedPlan
> invalidated during executor startup -- while locking unpruned
> partitions -- was modified in place to replace the stale PlannedStmts
> in its stmt_list with new ones obtained by replanning all queries in
> the enclosing CachedPlanSource's query_list. I did that mainly to
> ensure that replanning happens as soon as the executor discovers the
> plan is invalid, instead of returning to the caller and requiring them
> to go back to plancache.c to trigger replanning. There were many
> issues with making that approach work in practice, because different
> callers of the executor have different ways of running plans from a
> CachedPlan -- with pquery.c in particular being hard to refactor
> cleanly to support that flow.
>
> The first alternative I came up with is to place only the query whose
> PlannedStmt is being initialized into a standalone CachedPlanSource
> and create a corresponding standalone CachedPlan. "Standalone" here
> means that both objects are "saved" independently of the original
> CachedPlanSource and CachedPlan, but are still tracked by the
> invalidation callbacks.
>
> But thinking about it more recently, what's actually important is not
> whether we construct a new CachedPlan at all, but simply that we
> replan just the one query that needs to be run, and use the resulting
> PlannedStmt directly. The planner will have taken all required locks,
> so we don't need to register the plan with the invalidation machinery
> -- concurrent invalidations can't affect correctness.
>
> In that case, the replanned PlannedStmt can be treated as transient
> executor-local state, with no need to carry any of the plan cache
> infrastructure along with it.  To support that, I further assume that,
> because replanning and execution happen essentially back-to-back,
> there's no opportunity for role-based or xmin-based invalidation (as
> is checked for a CachedPlan in CheckCachedPlan()) to affect the plan
> in between. If that reasoning holds, then we don't need to register
> the replanned statement with the invalidation machinery at all.
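>
> As a rough sketch of that flow (the function name is hypothetical,
> and this glosses over the re-analysis that RevalidateCachedQuery()
> normally performs before replanning):
>
>     /*
>      * Hypothetical sketch: replan only the query at query_index and
>      * use the result directly as transient executor-local state.
>      * The planner acquires all the locks it needs, so the resulting
>      * PlannedStmt is never registered with the invalidation
>      * machinery.
>      */
>     static PlannedStmt *
>     ReplanSingleQuery(CachedPlanSource *plansource, int query_index,
>                       ParamListInfo boundParams)
>     {
>         Query  *query = list_nth_node(Query, plansource->query_list,
>                                       query_index);
>
>         /* Plan a copy, as plancache.c's BuildCachedPlan does. */
>         return pg_plan_query(copyObject(query),
>                              plansource->query_string,
>                              plansource->cursor_options,
>                              boundParams);
>     }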
>
> Because we wouldn't have touched the original CachedPlan at all, the
> stale PlannedStmts in it wouldn't be replaced until the next
> GetCachedPlan() call triggers replanning. I'm willing to accept that
> as a tradeoff for a less invasive design to handle replanning in the
> executor.
>
> Finally, it's worth noting that the executor is always passed the
> entire CachedPlan, regardless of which individual statement is being
> executed. Without per-statement validity tracking, it's hard for the
> executor to tell whether replanning is actually needed for a given
> query when the CachedPlan is marked invalid (is_valid=false), making
> it impossible to selectively replan just one. To support that, what I
> would need is validity tracking at the level of individual
> PlannedStmts -- and perhaps even Querys -- in the source's query_list,
> with the current is_valid flag effectively serving as the logical AND
> of all the individual flags. We didn't need that in the old design,
> because we'd replace all statements to mark the CachedPlan valid again
> -- though Tom was right to point out flaws in the assumption that
> setting is_valid like that was actually safe.
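>
> To illustrate, the per-statement tracking could look something like
> this (a sketch only; the names are made up):
>
>     /* Hypothetical per-statement validity, kept alongside stmt_list. */
>     typedef struct CachedPlanValidity
>     {
>         int     nstmts;       /* list_length(cplan->stmt_list) */
>         bool   *stmt_valid;   /* one flag per PlannedStmt */
>     } CachedPlanValidity;
>
>     /* is_valid effectively becomes the AND of the individual flags. */
>     static bool
>     CachedPlanIsValid(const CachedPlanValidity *validity)
>     {
>         for (int i = 0; i < validity->nstmts; i++)
>         {
>             if (!validity->stmt_valid[i])
>                 return false;
>         }
>         return true;
>     }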
>
> * ExecutorStart() interface damage control:
>
> The other aspect I’ve been thinking about is how to contain the
> changes required inside ExecutorStart(), and limit the disruption to
> ExecutorStart_hooks in particular, while keeping changes for outside
> callers narrowly scoped. In the previous patch, pruning, locking, and
> invalidation checking were all done inside InitPlan(), which is called
> by standard_ExecutorStart() -- an implementation choice that was
> potentially disruptive to extensions using ExecutorStart_hook. Since
> such hooks are expected to call standard_ExecutorStart() to perform
> core plan initialization, they would have to check afterward whether
> the plan had actually been initialized successfully, in case an
> invalidation occurred during InitPlan(). That wasn’t optional, and it
> made it easy for hook authors to miss the fact that
> standard_ExecutorStart() could return without initializing the plan,
> breaking expectations that were previously reliable.
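>
> Concretely, a hook under the reverted patch had to follow roughly
> this pattern (sketched from memory; the exact signature in the patch
> may have differed):
>
>     static bool
>     my_ExecutorStart(QueryDesc *queryDesc, int eflags)
>     {
>         if (!standard_ExecutorStart(queryDesc, eflags))
>             return false;   /* plan invalidated, NOT initialized */
>
>         /* ... setup that assumes an initialized plan ... */
>         return true;
>     }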
>
> Separately, for top-level callers of the executor, the patch
> introduced a new entry point, ExecutorStartCachedPlan(), to avoid
> requiring each caller to implement its own replanning loop. But that
> approach was also awkward, since it required switching to a
> nonstandard function just to get correct behavior.
>
> What I’m thinking now is that we should instead move the logic for
> pruning, deferred locking, and replanning directly into
> ExecutorStart() itself. In the reverted patch, callers were affected
> mainly because they had to choose between ExecutorStart() and
> ExecutorStartCachedPlan(); that divergence from the standard API made
> things awkward at the call site.
>
> In contrast, the design I’m proposing avoids any need for new executor
> entry points -- ExecutorStart() retains its original signature and
> behavior, with the added benefit that replanning and pruning are now
> handled internally before hooks or standard initialization logic are
> invoked. The design requires moving some code from
> standard_ExecutorStart() -- specifically the code that sets up the
> EState and parameters -- and from InitPlan() -- namely, the parts that
> initialize the range table and partition pruning state and then
> perform ExecDoInitialPruning().
>
> Callers of ExecutorStart() do still need to pass the CachedPlan, the
> CachedPlanSource, and the query_index in the QueryDesc they build via
> CreateQueryDesc(), but the executor's entry-point signatures
> themselves remain unchanged.
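>
> In code terms, the proposed shape is roughly the following, where
> ExecutorPrepPlan() is a made-up name for the moved pruning, locking,
> and replanning logic:
>
>     void
>     ExecutorStart(QueryDesc *queryDesc, int eflags)
>     {
>         /*
>          * Prune, lock, and if necessary replan, using the CachedPlan,
>          * CachedPlanSource, and query_index carried in the QueryDesc,
>          * before any hook or standard initialization logic runs.
>          */
>         ExecutorPrepPlan(queryDesc, eflags);    /* hypothetical */
>
>         if (ExecutorStart_hook)
>             (*ExecutorStart_hook) (queryDesc, eflags);
>         else
>             standard_ExecutorStart(queryDesc, eflags);
>     }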
>
> Importantly, this restructuring would not require any behavioral
> changes for existing ExecutorStart_hook implementations. From a hook’s
> point of view, this is a code motion change only. Hooks are still
> invoked at the same point, but they’re now guaranteed to receive a
> plan that is valid and ready for execution. This avoids the control
> flow surprises introduced by the reverted patch -- specifically, the
> need for hooks to detect whether standard_ExecutorStart() had
> completed successfully -- while preserving the executor’s API and
> execution contract as they exist in master.
>
> I’ll hold off on writing any code for now -- just wanted to lay out
> this direction and hear what others think, especially Tom.

The refinements described in my email quoted above might help
mitigate some of the executor-related issues that Tom raised.
However, I'm starting to wonder whether it's worth reconsidering our
decision to handle pruning, locking, and validation entirely at
executor startup, which was the approach taken in the reverted patch.

The alternative approach, doing initial pruning and locking within
plancache.c itself (which I floated a while ago), might be worth
revisiting. It avoids the complications we've discussed around the
executor API and preserves the clear separation of concerns that
plancache.c provides, though it does introduce some new layering
concerns, which I describe further below.

To support this, we'd need a mechanism to pass pruning results to the
executor alongside each PlannedStmt. For each PartitionPruneInfo in
the plan, that would include the corresponding PartitionPruneState and
the bitmapset of surviving relids determined by initial pruning. Given
that a CachedPlan can contain multiple PlannedStmts, this would
effectively be a list of pruning results, one per statement. One
reasonable way to handle that might be to define a parallel data
structure, separate from PlannedStmt, constructed by plancache.c and
carried via QueryDesc. The memory and lifetime management would mirror
how ParamListInfo is handled today, leaving the executor API unchanged
and avoiding intrusive changes to PlannedStmt.
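
To make that concrete, the parallel structure might look roughly like
this (all names hypothetical):

    /* Initial-pruning output for one PartitionPruneInfo in a plan. */
    typedef struct PlanPruneResult
    {
        PartitionPruneState *prunestate;    /* built by plancache.c */
        Bitmapset  *surviving_relids;       /* relids left after
                                             * initial pruning */
    } PlanPruneResult;

    /* Carried in QueryDesc; memory and lifetime managed the way
     * ParamListInfo is today. */
    typedef struct CachedPlanPruneResults
    {
        List   *stmt_prune_results;     /* per PlannedStmt: a List of
                                         * PlanPruneResult, one per
                                         * PartitionPruneInfo */
    } CachedPlanPruneResults;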

However, one potentially problematic aspect of this design is managing
the lifecycle of the relations referenced by PartitionPruneState.
Currently, partitioned table relations are opened by the executor
after entering ExecutorStart() and closed automatically by
ExecEndPlan(), so the pruning states are cleaned up implicitly. If we
perform initial pruning earlier, we'd need to keep these relations
open longer, necessitating an explicit cleanup call (e.g., a new
FinishPartitionPruneState()) invoked from ExecutorEnd() or even from
higher-level callers of the executor. This
introduces some questionable layering by shifting responsibility for
relation management tasks, which ideally belong within the executor,
into its callers.
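
For illustration, the explicit cleanup might amount to something like
this (open_partrels is a field invented for the sketch; today the
executor closes these relations itself):

    void
    FinishPartitionPruneState(PartitionPruneState *prunestate)
    {
        ListCell   *lc;

        /*
         * Close the partitioned table relations that were opened for
         * initial pruning and had to stay open past executor startup;
         * currently ExecEndPlan() closes them implicitly.
         */
        foreach(lc, prunestate->open_partrels)  /* hypothetical field */
            relation_close((Relation) lfirst(lc), NoLock);
    }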

My sense is that the complexity involved in carrying pruning results
via this parallel data structure was one of the concerns Tom raised
previously, alongside the significant pruning code refactoring that
the earlier patch required. The latter, at least, should no longer be
necessary given recent code improvements.

I think that covers all the approaches I've been able to come up
with, and I would really appreciate others' thoughts on these
alternatives.

-- 
Thanks, Amit Langote

