On Wed, May 21, 2025 at 12:38 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> Amit Langote <amitlangot...@gmail.com> writes:
> > Thanks for pointing out the hole in the current handling of
> > CachedPlan->stmt_list. You're right that the approach of preserving
> > the list structure while replacing its contents in-place doesn’t hold
> > up when the rewriter adds or removes statements dynamically. There
> > might be other cases that neither of us have tried.  I don’t think
> > that mechanism is salvageable.
>
> > To address the issue without needing a full revert, I’m considering
> > dropping UpdateCachedPlan() and removing the associated MemoryContext
> > dance to preserve CachedPlan->stmt_list structure. Instead, the
> > executor would replan the necessary query into a transient list of
> > PlannedStmts, leaving the original CachedPlan untouched. That avoids
> > mutating shared plan state during execution and still enables deferred
> > locking in the vast majority of cases.
>
> Yeah, I think messing with the CachedPlan is just fundamentally wrong.
> It breaks the invariant that the executor should not scribble on what
> it's handed --- maybe not as obviously as some other cases, but it's
> still not a good design.

Fair enough. I’ll revert this and some related changes shortly.  WIP
patch attached.

> I kind of feel that we ought to take two steps back and think
> about what it even means to have a generic plan in this situation.
> Perhaps we should simply refuse to use that code path if there are
> prunable partitioned tables involved?

Sorry, I’m not sure I fully understand -- especially what you mean by
“that code path.” If you're referring to the generic plan creation and
reuse path in general, I'd point out that initial runtime pruning was
introduced largely to improve the efficiency of generic plan execution
(albeit without addressing the locking bottleneck at the time -- David
Rowley had explored that earlier). So simply disallowing generic plans
when partitions are involved feels like an odd direction, given that a
major motivation for initial pruning was to make those cases faster.

Custom plans can win when parameters are available, of course, but
there's a major use case involving stable expressions like now() with
time-based partitions, where plan_cache_mode = auto will still choose
a generic plan. So I wouldn’t say that optimizing generic plan
execution -- especially the goal of this project -- is wasted effort
in practice.

> > Let me know what you think -- I’ll hold off on posting a revert or a
> > replacement until we’ve agreed on the path forward.
>
> I had not looked at 525392d57 in any detail before (the claim in
> the commit message that I reviewed it is a figment of someone's
> imagination).

Apologies if I gave the misleading impression that you were on board
with the current design. I meant only to acknowledge your earlier
engagement with the general idea, which I appreciated. I marked it as
“(old versions)” in the commit metadata to reflect that -- clearly I
should’ve been more precise.  I know that the meaning of Reviewed-by
and other tags is evolving and I clearly haven't kept up.

>  Now that I have, I'm still going to argue for revert.
> Aside from the points above, I really hate what's been done to the
> fundamental executor APIs.  The fact that ExecutorStart callers have
> to know about this is as ugly as can be.  I also don't like the
> fact that it's added overhead in cases where there can be no benefit
> (notice that my test case doesn't even involve a partitioned table).

I tried to keep the overhead low by ensuring that the only additional
thing we'd be doing in the regular path is a CachedPlan->is_valid
boolean check in a couple of places, and that further work would only
happen if invalidation actually occurred. That said, I realize the
patch makes invalidation handling apply in more cases than before,
which may itself be seen as added overhead. But I may have
misunderstood your concern -- perhaps it's more about the layering
violation than the raw cycles?

> I still like the core idea of deferring locking, but I don't like
> anything about this implementation of it.  It seems like there has
> to be a better and simpler way.

It's good to hear that you still like the core idea -- I’d really
appreciate it if you're willing to continue bearing with me as I try
to rework this in a way that's cleaner and better aligned with the
overall design. I'd welcome any thoughts you have along the way. I
know this has been a difficult project, and I don't mean to come
across as taking any of it lightly. I'm still hopeful there's a path
forward, but I completely understand the need to reset here.

-- 
Thanks, Amit Langote

Attachment: v1-0001-Revert-Don-t-lock-partitions-pruned-by-initial-pr.patch
Description: Binary data

Reply via email to