On Fri, Mar 27, 2026 at 4:31 AM Lukas Fittl <[email protected]> wrote:
> I've been thinking more about 0003 (pg_collect_advice) today, and I'm
> getting increasingly skeptical that we should try to get that into 19.
> It just feels like there is more design work to be done here, and I
> don't see the pressing need to have this in place.
>
> Instead, I wonder if we should just add a "debug_log_plan_advice"
> setting to pg_plan_advice, that always logs the plan advice when
> enabled. Basically like "always_store_advice_details", but emit a log
> line in addition to doing the work. That could either be enabled on a
> single session with a sufficiently high client_min_messages to consume
> it directly, or written to the log for a few minutes when trying to
> capture production activity (on small production systems where the
> logging overhead is acceptable).

Sure, we could do something like that. It means that people have to do
log parsing to get the advice out cleanly, but it avoids the concerns
about memory utilization, and it's simple to code.

> For the other one 0004 (pg_stash_advice) I feel its worth trying to
> get it in, if we can figure out the remaining issues. I'll try to do
> another pass on that tomorrow after some sleep.

Let me just talk a little about the way that I see the space of
possible designs here. Let's set aside what we do or do not have time
to code for a moment and just think about what makes some kind of
sense in theory. I believe we can divide this up along a few different
axes:

1. Where is advice stored for on-the-fly retrieval? Possible answers
include: in shared memory, in local memory, in a table, on disk. But
realistically, I doubt that "in a table" is workable. Even if we
hard-coded a direct index lookup, i.e., without going through the
query planner, I think this would be a fairly expensive thing to do
for every query, and if this is going to be usable on production
systems, it has to be fast. I am absolutely certain that "on disk" is
not a realistic option. Local memory is an option. It has the
advantage of making server-lifespan memory leaks impossible, and the
further advantage of avoiding contention between different backends,
since each backend would have its own copy of the data. The big
disadvantage is memory consumption. If we think the advice store is
going to be small, then that might be fine; but if somebody ends up
with thousands of advice strings stored, duplicating them across
hundreds (or, gulp, thousands) of backends is pretty bad.
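
To put rough numbers on the duplication concern, here's a
back-of-the-envelope sketch. All of the figures (entry count, entry
size, backend count) are hypothetical, just to show the scaling:

```python
# Cost of duplicating a per-backend advice store, with made-up numbers.
advice_strings = 5_000        # stored advice strings
avg_entry_bytes = 1_024       # query ID + advice text + hash table overhead
backends = 500                # active backends, each with its own copy

per_backend_mb = advice_strings * avg_entry_bytes / (1024 * 1024)
total_gb = per_backend_mb * backends / 1024

print(f"per backend: {per_backend_mb:.1f} MiB")   # ~4.9 MiB
print(f"cluster-wide: {total_gb:.1f} GiB")        # ~2.4 GiB
```

A few MiB per backend sounds harmless; multiplied across hundreds of
backends, it isn't.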

2. What provision do we have for durability? Possible answers include:
in a table, on disk, nothing. I went with nothing, partly on the
theory that it gives users more flexibility. We don't really care
where they store their query IDs and advice strings, as long as they
have a way to feed them into the mechanism. But of course I was also
trying to save myself implementation complexity. There are some tricky
things about a table: as implemented, the advice stores are
cluster-wide objects, but tables are database objects. If we're
supposed to automatically load advice strings from a table that might
be in another database into an in-memory store, well, we can't. That
could be fixed by scoping stores to a specific database. That would
inconvenience only users who have the same schema in multiple
databases and want to share stashes across them, which is probably
not a common case. A disk file is also an option. It requires
inventing some kind of custom format that we can generate and parse,
which has some complexity, but reading from a table has some
complexity too; I'm not sure which is messier.

3. How do we limit memory usage? One possible approach involves
limiting the size of the hash table by entry count or by memory
consumption, but I don't think that's very valuable on its own,
because presumably all it will do is annoy people who hit
the limit. It could make sense if we switched to a design where the
superuser creates the stash, assigns it a memory limit, and then
there's a way to give permission to use that stash to some particular
user who is bound by the memory limit. In that kind of design, the
person setting aside the memory has privileges superior to those of
the person using it, so the cap serves a real purpose. A
complicating factor here is that dshash doesn't seem to be well set up
to enforce memory limits. You can do it if each dshash uses a separate
DSA, but that's clunky, too. A completely different direction is to
treat the in-memory component of the system as a cache, backed by a
table or file from which cold entries are retrieved as needed. The
problem with this - and I think it's a pretty big problem - is that
performance will probably fall off a cliff as soon as you overflow the
cache. I mean, it might not, if most of the requests can be satisfied
from cache and a small percentage get fetched from cold storage, but
if it's a case where a uniformly-accessed working set is 10% larger
than the cache, the cache hit ratio will go down the tubes and so will
performance.
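
To illustrate the cliff, here's a toy LRU simulation (nothing to do
with the actual extension code): under strictly cyclic access, an LRU
cache that is even 10% smaller than the working set gets a hit ratio
of zero, because each insertion evicts exactly the entry that will be
needed soonest. Under random uniform access the degradation is
gentler, roughly cache_size/working_set, but the cyclic case shows
how abrupt the worst case can be:

```python
from collections import OrderedDict

def lru_hit_ratio(cache_size, working_set, passes=20):
    """Simulate an LRU cache under cyclic access over `working_set`
    distinct keys, measuring the hit ratio after the first (cold) pass."""
    cache = OrderedDict()
    hits = accesses = 0
    for p in range(passes):
        for key in range(working_set):
            if p > 0:                 # don't count the cold first pass
                accesses += 1
                hits += key in cache
            if key in cache:
                cache.move_to_end(key)
            else:
                cache[key] = True
                if len(cache) > cache_size:
                    cache.popitem(last=False)   # evict least-recently-used

    return hits / accesses

print(lru_hit_ratio(100, 100))  # working set fits: 1.0
print(lru_hit_ratio(100, 110))  # 10% too big: 0.0 -- the cliff
```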

4. How do we match advice strings to queries? The query ID is the
obvious answer, but you could also think of having an option to match
on query text, for cases when query IDs collide. You could do
something like store a tag in the query string or in a GUC and look
that up, instead of the query ID, but then I'm not sure why you don't
just store the advice string instead of the tag, and then you don't
need a hash table lookup anymore. You could do some kind of pattern
matching on the query string rather than using the query ID, but that
feels like it would be hard to get right, and also probably more
expensive. There are probably other options here but I don't know what
they are. I doubt that it makes sense from a performance standpoint to
delegate out to a function written in a high-level language, and if
you want to delegate to C code then you can forget about
pg_stash_advice and use the add_advisor hook directly. I really
feel like I'm probably missing some possible techniques, here. I
wonder if other people will come up with some clever ideas.
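
As a sketch of the query ID approach with a collision guard, here is
a toy model (not the extension's actual code; the advice strings and
the idea of optionally storing the exact query text alongside an
entry are both hypothetical):

```python
# Toy model: match stored advice to an incoming query. Primary lookup
# is by query ID; an optionally stored query text rejects collisions.
stash = {
    # query_id -> (advice string, optional exact query text)
    0x5A3C: ("SeqScan(t1) HashJoin(t1 t2)", "SELECT * FROM t1 JOIN t2"),
    0x91F0: ("IndexScan(t3)", None),   # None: trust the query ID alone
}

def lookup_advice(query_id, query_text):
    entry = stash.get(query_id)
    if entry is None:
        return None
    advice, expected_text = entry
    if expected_text is not None and expected_text != query_text:
        return None   # query ID collided with a different statement
    return advice

print(lookup_advice(0x5A3C, "SELECT * FROM t1 JOIN t2"))  # advice found
print(lookup_advice(0x5A3C, "SELECT 1"))                  # collision: None
```

The text comparison costs a strcmp-like pass over the query string,
but only on entries that opt into it.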

5. What should the security model be? Right now it's as simple as can
be: the superuser gets to decide who can use the mechanism. But also,
not to be overlooked, an individual session can always opt out of
automatic advice application by clearing the stash_name GUC. It
shouldn't be possible to force wrong answers on another session even
if you can impose arbitrary plan_advice, but there is a good chance
that you could find a way to make their session catastrophically slow,
so it's good that users can opt themselves out of the mechanism.
Nonetheless, if we really want to have mutually untrusting users be
able to use this facility, then stashes should have owners, and you
should only be able to access a stash whose owner has authorized you
to do so. This is all a bit awkward because there's no way for an
extension to integrate with the built-in permissions mechanisms for
handling e.g. dropped users, and in-memory objects are a poor fit
anyway. Also, all of this is bound to have a performance cost: if
every access to a stash involves permissions checking, that's going to
add a possibly-significant amount of overhead to a code path that
might be very hot. And, in many environments, that permissions check
would be a complete waste of cycles. Things could maybe be simplified
by deciding that stash access can't be delegated: a stash has one and
only one owner, and it can affect only that user and not any other.
That is unlike what we do for built-in objects, but it simplifies the
permissions check to strcmp(), which is super-cheap compared to
catalog lookups. All in all, I don't really know which way to jump
here: what I've got right now looks almost stupidly primitive, and I
suspect it is, but adding complexity along what seem to be the obvious
lines isn't an obvious win, either.
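
The one-owner-per-stash simplification can be sketched like this (a
toy model, with invented names; the point is just that the check
reduces to a cheap string comparison, the moral equivalent of a C
strcmp(), with no catalog lookup):

```python
# Toy model of the "one owner per stash" security design.
class Stash:
    def __init__(self, name, owner):
        self.name = name       # stash name, e.g. as set in a GUC
        self.owner = owner     # the one and only role that may use it
        self.entries = {}

def can_use(stash, current_role):
    # The whole permissions check: compare role names. Cheap enough
    # for a hot path, at the cost of ruling out delegation entirely.
    return stash.owner == current_role

s = Stash("prod_advice", owner="app_user")
print(can_use(s, "app_user"))    # True
print(can_use(s, "other_user"))  # False
```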

6. How do we tell users when there's a problem? Right now, the only
available answer is to set pg_plan_advice.feedback_warnings, which I
don't think is unreasonable. If you're using advice as intended, i.e.
only for particular queries that really need it, then it shouldn't be
generating any output unless something's gone wrong, but
I also don't think everyone is going to want this information going to
the main log file. Somehow percolating the advice feedback back to the
advisor that provided it -- pg_stash_advice or whatever -- feels like
a thing that is probably worth doing. pg_stash_advice could then
summarize the feedback and provide reports: hey, look, of the 100
queries for which you stashed advice, 97 of them always showed /*
matched */ for every item of advice, but the last 3 had varying
numbers of other results. Being able to query that via SQL seems like
it would be quite useful. I tend to think that it's almost a given
that we're going to eventually want something like this, but the
details aren't entirely clear to me. It could for example be developed
as a separate extension that can work for any advisor, rather than
being coupled to pg_stash_advice specifically -- but I'm also not
saying that's better, and I think it might be worse. Figuring out
exactly what we want to do here is a lot less critical than the items
mentioned above because, no matter what we do about any of those
things, this can always be added afterwards if and when desired. But,
I'm mentioning it for completeness.
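
The kind of summarization I have in mind could be as simple as this
sketch (hypothetical data and outcome labels, loosely modeled on the
/* matched */ feedback comments):

```python
from collections import Counter

# Hypothetical per-execution feedback rows as an advisor might
# collect them: (query_id, outcome) for each item of advice applied.
feedback = [
    (101, "matched"), (101, "matched"),
    (102, "matched"),
    (103, "matched"), (103, "failed"),
]

def summarize(rows):
    """Per query ID, count how often each outcome was reported, so a
    report can flag the queries whose advice didn't always match."""
    summary = {}
    for query_id, outcome in rows:
        summary.setdefault(query_id, Counter())[outcome] += 1
    return summary

report = summarize(feedback)
suspect = sorted(q for q, c in report.items() if set(c) != {"matched"})
print(suspect)   # queries worth a second look: [103]
```

Exposing something like `report` through a set-returning SQL function
is the part that would make it queryable.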

If there are other design axes we should be talking about, I'm keen to
hear them. This is just what came to mind off-hand. Of course, I'm
also interested in everyone's views on what the right decisions are
from among the available options (or other options they may have in
mind).

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com