On Wed, Mar 18, 2026 at 3:00 PM Lukas Fittl <[email protected]> wrote: > I think we have similar problems elsewhere in Postgres where a large > user input causes an even larger log output - e.g. a case I'm familiar > with are complicated queries with long IN list inputs and their > associated EXPLAIN plans being logged by auto_explain - I recently had > a case where someone reported an OOM due to auto_explain trying to log > a > 100 MB sized query plan. > > I think its actually less a problem with plan advice, since presumably > we won't have ORMs generating plan advice, and even if we do it'd be > per-table - so I think its unlikely a genuine use case would use 1000s > of advice tags. > > That said, I also don't think super long long messages are actually > helpful. I do wonder if we should have a more coarse GUC that limits > DETAIL lines of any kind to a maximum length (e.g. 100 kB) across the > board instead of special casing every emitter.
I think it would be difficult for generic code to do something sensible when the message is really long. I mean, it could just cut it off after N lines, but then you have no idea how much more there would have been, and you probably want to tell the user something about that. You could add a completely generic message to the end like "plus 10525 more lines of output that were truncated for display," but that's pretty unsatisfying. If you want to show something contextually appropriate, the implementation needs to be separate for each case even if the limit is common. Anyway, the question here is not about such a generic mechanism, but about whether somebody wants to argue for sticking a limit of 100 on feedback messages on the theory that log spam is bad, or whether it's fine as-is either because (a) the likelihood of a significant number of people hitting that limit is thought to be too low to worry about or (b) the likelihood of someone wanting all of those messages (e.g. for machine-parsing) is thought to be high enough that a limit is actually worse than no limit. I do not really have a horse in the race, so if nobody else has a strong opinion, I'm going to leave it alone for now and consider changing it if a strong opinion materializes later. > 1) What if we return the utilized advice string as a separate DefElem > with a list of strings, and then the feedback just has to reference an > index into that list? (though I suppose that doesn't actually save > memory, now that I think that through -- unless we assume the caller > already has the advice string, but I don't think we can rely on that) I think that if we're using Integer nodes, it's just never going to be very economical. We could use an IntList, which I believe would be better, but there are a few complications that make me not like this idea very much. One, what the feedback is actually about is a reconstruction of a particular advice item into string form, not the original string, e.g. if you input SEQ_SCAN(foo bar), the feedback will be on SEQ_SCAN(foo) and SEQ_SCAN(bar). Two, even if you ignore that, this would leave consumers of the data with the problem of finding the end of the advice item unless you stored both starting and ending indexes, which would have its own costs. > 2) We could consider having separate DefElems for the different flag > types (i.e. "feedback_failed", "feedback_match_full", etc), and then a > list of strings attached to each - that'd save the nested DefElem and > the Integer node But it would also very often duplicate a bunch of the strings, which seems likely to work out to a loss more often than not. You could avoid that by have a list of strings per unique flag combination, but that would be extra work to compute and I think it would be less convenient for consumers. One user-visible consequence would be that the advice feedback in EXPLAIN output would have much less to do with the original order of the advice string. -- Robert Haas EDB: http://www.enterprisedb.com
