On Wed, Mar 18, 2026 at 3:00 PM Lukas Fittl <[email protected]> wrote:
> I think we have similar problems elsewhere in Postgres where a large
> user input causes an even larger log output - e.g. a case I'm familiar
> with are complicated queries with long IN list inputs and their
> associated EXPLAIN plans being logged by auto_explain - I recently had
> a case where someone reported an OOM due to auto_explain trying to log
> a > 100 MB sized query plan.
>
> I think its actually less a problem with plan advice, since presumably
> we won't have ORMs generating plan advice, and even if we do it'd be
> per-table - so I think its unlikely a genuine use case would use 1000s
> of advice tags.
>
> That said, I also don't think super long long messages are actually
> helpful. I do wonder if we should have a more coarse GUC that limits
> DETAIL lines of any kind to a maximum length (e.g. 100 kB) across the
> board instead of special casing every emitter.

I think it would be difficult for generic code to do something
sensible when the message is really long. I mean, it could just cut it
off after N lines, but then you have no idea how much more there would
have been, and you probably want to tell the user something about
that. You could add a completely generic message to the end like "plus
10525 more lines of output that were truncated for display," but
that's pretty unsatisfying. If you want to show something contextually
appropriate, the implementation needs to be separate for each case
even if the limit is common. Anyway, the question here is not about
such a generic mechanism, but about whether somebody wants to argue
for sticking a limit of 100 on feedback messages on the theory that
log spam is bad, or whether it's fine as-is either because (a) the
likelihood of a significant number of people hitting that limit is
thought to be too low to worry about or (b) the likelihood of someone
wanting all of those messages (e.g. for machine-parsing) is thought to
be high enough that a limit is actually worse than no limit. I do not
really have a horse in the race, so if nobody else has a strong
opinion, I'm going to leave it alone for now and consider changing it
if a strong opinion materializes later.

> 1) What if we return the utilized advice string as a separate DefElem
> with a list of strings, and then the feedback just has to reference an
> index into that list? (though I suppose that doesn't actually save
> memory, now that I think that through -- unless we assume the caller
> already has the advice string, but I don't think we can rely on that)

I think that if we're using Integer nodes, it's just never going to be
very economical. We could use an IntList, which I believe would be
better, but there are a few complications that make me not like this
idea very much. One, what the feedback is actually about is a
reconstruction of a particular advice item into string form, not the
original string, e.g. if you input SEQ_SCAN(foo bar), the feedback
will be on SEQ_SCAN(foo) and SEQ_SCAN(bar). Two, even if you ignore
that, this would leave consumers of the data with the problem of
finding the end of the advice item unless you stored both starting and
ending indexes, which would have its own costs.

> 2) We could consider having separate DefElems for the different flag
> types (i.e. "feedback_failed", "feedback_match_full", etc), and then a
> list of strings attached to each - that'd save the nested DefElem and
> the Integer node

But it would also very often duplicate a bunch of the strings, which
seems likely to work out to a loss more often than not. You could
avoid that by have a list of strings per unique flag combination, but
that would be extra work to compute and I think it would be less
convenient for consumers. One user-visible consequence would be that
the advice feedback in EXPLAIN output would have much less to do with
the original order of the advice string.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


Reply via email to