Re: Proposal: Progressive explain

Robert Haas Wed, 19 Mar 2025 13:35:55 -0700

On Wed, Mar 19, 2025 at 1:47 PM Rafael Thofehrn Castro
<rafaelt...@gmail.com> wrote:
> Sending a new version as rebase was required.


Reading this thread, it seems to me that there has been a good deal of
discussion about things like the name of the feature and what the UI
ought to be, but not much discussion of whether the feature is
actually safe, and not much detailed review of the code. I'm willing
to bet that everybody wants some version of this feature if we can
convince ourselves that it's not going to do horrible things like
cause server crashes, and that even people who don't get their first
choice in terms of how the feature is named or how the GUCs work will
still be pretty happy to have it overall. However, if it breaks stuff
and there's no easy way to fix the breakage, that's going to be a
problem.

In broad strokes, the danger here is that doing stuff in the middle of
query execution that we currently only do at the end of query
execution will turn out to be problematic. The biggest problem, I
think, is whether it's safe to do all of the things that EXPLAIN does
while we're at some random point in query execution. It looks to me
like the wrap/unwrap stuff is more-or-less consistent with previous
discussions of how a feature of this kind should work, though I don't
recall the previous discussion and I think the patch should contain
some comments about why it works the way that it works. I do notice
that WrapExecProcNodeWithExplain does not walk the ps->initPlan list,
which I think is an oversight.

Without having the prior discussion near to hand, I *think* that the
reason we wanted to do this wrap/unwrap stuff is to make it so that
the progressive EXPLAIN code could only execute when entering a new
plan node rather than at any random point inside of that plan node,
and that does seem a lot safer than the alternative. For example, I
think it means that we won't start trying to do progressive EXPLAIN
while already holding some random LWLock, which is very good. However,
this still means we could do a bunch of catalog access (and thus
potentially receive invalidations) at places where that can't happen
today. I'm not sure that's a problem -- the same thing could happen at
any place in the executor where we evaluate a user-supplied
expression, and there are many such places -- but it's certainly worth
a few senior people thinking real carefully about it and trying to
imagine whether there's any scenario in which it might break something
that works today.

One way in which this proposal seems safer than previous proposals is
that previous proposals have involved session A poking session B and
trying to get session B to emit an EXPLAIN on the fly with no prior
setup. That would be very useful, but I think it's more difficult and
more risky than this proposal, where all the configuration happens in
the session that is going to emit the EXPLAIN output. It knows from
the beginning that it's going to maybe be doing this, and so it can do
whatever setup it likes to accomplish that goal. So I think this
design looks pretty good from that point of view.

I don't understand how this would be safe against interrupts or
errors. If a running query is interrupted, what would cause
ProgressiveExplainCleanup() to be called? If the answer is "nothing,"
why isn't that unsafe?

ExecProcNodeOriginal looks like it's basically the same thing as
ExecProcNodeReal, except that it's for a different purpose. But you
would be hard-pressed to guess which one is used for what purpose
based on the field names or the comments. Maybe it's also worth
worrying about whether this is a scalable design. Can we find a way to
use the existing fields here instead of having to add a new one?

The documentation for the progressive_explain = { off | explain |
analyze } option seems like it should go into more detail about how
the "explain" and "analyze" values are different. I'm not 100% sure I
know the answer, and I'm not the least-experienced person who will
ever read this documentation.

WrapMultiExecProcNodesWithExplain seems like a poor choice of name. It
invites confusion with MultiExecProcNode, to which it is unrelated.

I just went to some trouble to start breaking up the monolith that is
explain.c, so I'm not that happy about seeing this patch dump another
800+ lines of source code into that file. Probably we should have a
new source file for some or this, or maybe even more than one.

The changes to explain.h add three new data types. Two of them begin
with an uppercase letter and one with a lowercase letter. That seems
inconsistent. I also don't think that the declaration of char plan[]
is per PostgreSQL coding style. I believe we always write char
plan[FLEXIBLE_ARRAY_MEMBER]. Also, maybe it should be named something
other than plan, since it's really a string-ified explain-y kind of
thing, not literally a Plan. Also, can we please not have structure
members with single letter names? "h" and "p" are going to be
completely ungreppable, and I like grepping.

It looks very strange to me that ProgressiveExplainPrint() seems to
have a theoretical possibility of generating the output and then
throwing it away if we end up with entry == NULL. I guess maybe that
case is not supposed to happen because ProgressiveExplainInit() is
supposed to create the entry, but then why isn't this an elog(ERROR)
or something instead of a no-op?

It seems like when we replace a longer entry with a shorter one, we
forget that it was originally longer. Hence, if the length of a
progressive EXPLAIN is alternately 2922 characters and 2923
characters, we'll reallocate on every other progressive EXPLAIN
instead of at most once.

I'll try to look at this some more tomorrow. It seems like a very
interesting patch, but time is very short for this release and it
doesn't look to me like we have all the kinks sorted out here just
yet.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Proposal: Progressive explain

Reply via email to