On Mon, Mar 16, 2026 at 1:51 PM Robert Haas <[email protected]> wrote:
> On Fri, Mar 13, 2026 at 4:39 AM Lukas Fittl <[email protected]> wrote:
> > - pgpa_parser uses pgpa_yyerror, but that function doesn't use
> > elog(ERROR) like the main parser does - I think it'd be more
> > consistent to use YYABORT explicitly, e.g. how we do it in cubeparse
>
> Perhaps, but I think this needs more thought, because you haven't done
> anything about the calls in pgpa_scanner.l. I think right now, the
> operating principle is that parsing continues after an error and that
> we mustn't crash as a result, though the resulting tree will probably
> be invalid for practical use. If we're going to change that to stop on
> the spot, I think we should probably do it for both lexing and
> parsing, and think about whether that leads to any other changes or
> simplifications.

Fair - I don't think there is a practical impact from not doing the
YYABORT calls, since as you say care is taken to not crash in that
case.
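
For readers following along, the cubeparse-style approach the quoted
review refers to looks roughly like this (a sketch only, not actual
pgpa_parser code; pgpa_yyerror is from the patch, the rule and helper
names are made up for illustration):

```c
/* Sketch of a bison grammar action that stops parsing on the spot,
 * in the style of contrib/cube's cubeparse.y.  Illustrative only:
 * validate_dimensions() and make_box() are hypothetical helpers. */
box_term:
    paren_list COMMA paren_list
    {
        if (!validate_dimensions($1, $3))
        {
            pgpa_yyerror(yyscanner, "dimension mismatch");
            YYABORT;    /* abort instead of continuing with a bad tree */
        }
        $$ = make_box($1, $3);
    }
;
```

As Robert notes, doing this only in the grammar and not in the
corresponding flex rules would leave the behavior inconsistent.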

> > From a usability perspective, I do wonder about two things when it
> > comes to users specifying advice directly (which will happen in
> > practice, e.g. when debugging plan problems):
> >
> > 1) The lack of being able to target scan advice (e.g. SEQ_SCAN) to a
> > partition parent is frustrating - I believe we discussed partitioning
> > earlier in the thread in terms of gathering+applying it, but I do
> > wonder if we shouldn't at least allow users to specify a partitioned
> > table/partitioned index, instead of only the children. See attached
> > nocfbot-0002 for an idea what would be enough, I think.
>
> I'm not on board with this without a lot more study. I've been down
> this road before, and it can easily end in tears. Examining the
> partition structure on the fly can have a performance cost, and it
> might even have security ramifications or at least bugs if there are
> concurrent modifications to the partitioning structure happening.
> Also, the test_plan_advice framework doesn't do much to tell us
> whether this actually works. Also, I understand the frustration and
> I'm sure we'll want to introduce various forms of wildcards, but I
> think there will be a lot of opinions about what that should actually
> like. One can, as you've done here, follow index links from child to
> parent. One can do a wildcard match on the index name. One could want
> to specify an index on a particular column rather than a specific
> name, to survive index renamings. I wouldn't be surprised if there are
> other ideas, too. Three weeks before feature freeze is not the time to
> be taking an opinionated position on what the best answers will
> ultimately turn out to be. It's easy to write a tool that will spit
> out matching index advice for all indexes involved in a partitioning
> hierarchy, and I think that's what people should do for now.

That's fair. I would like us to do something about this in the PG20
release cycle - for my part, I think it's reasonable to follow the
declarative partitioning parent-child relationship for indexes if
present - assuming we can sort out the performance/etc. aspects of
that.

For 19 I think we might want to consider calling this out more
explicitly in the documentation under the "Scan Method Advice"
paragraph, i.e. that one cannot specify partition parent table names
(at least not ones that have no data of their own) and instead one has
to specify the partitions individually. Otherwise I think users will
just be confused by the Append node that says "Disabled: true" and the
advice that didn't match.
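
To make that documentation note concrete, something like the following
could accompany it (the table names are hypothetical; SEQ_SCAN is the
advice keyword discussed upthread, and the exact advice syntax is
defined by the patch set):

```sql
-- Given a partitioned table "measurement" with partitions
-- measurement_y2025 and measurement_y2026, advice targeting the
-- partition parent does not match anything:
--
--   SEQ_SCAN(measurement)
--
-- Instead, each partition has to be named individually:
--
--   SEQ_SCAN(measurement_y2025) SEQ_SCAN(measurement_y2026)
```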

> > 2) I find the join strategy advice hard to use - pg_hint_plan has
> > hints (e.g. HashJoin) that allow saying "use a hash join when joining
> > these two rels in any order", vs pg_plan_advice requires setting a
> > specific outer rel, which only makes sense when you want to fully
> > specify every aspect of the plan. I suspect users who directly write
> > advice will struggle with specifying join strategy advice as it is
> > right now. We could consider having a different syntax for saying "I
> > want a hash join when these two rels are joined, but I don't care
> > about the order", e.g. "USE_HASH_JOIN(a b)". If you think that's
> > worthwhile I'd be happy to take a stab at a patch.
>
> I'd be inclined to classify that as a design error in pg_hint_plan,
> but maybe I'm just not understanding something. Under what
> circumstances would you know that you wanted two tables to be joined
> via a hash join but not care which one was on which side of the join?

I think the common case would be someone sees the planner picked a
Nested Loop, and instead wants to see the plan that prefers a Hash
Join (or Merge Join), e.g. to understand costing differences. That's
how I usually use pg_hint_plan, to understand what the alternate plan
looked like that the planner didn't pick, but where costs were close.
Setting "enable_nestloop = off" at the top level often doesn't work
that well for complex plans, hence my historical use of pg_hint_plan's
HASHJOIN/MERGEJOIN (or NO_NESTLOOP) hints for this purpose.
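
For context, this is the pg_hint_plan usage I mean (standard
pg_hint_plan syntax; the table names are just placeholders):

```sql
-- pg_hint_plan: ask for a hash join between a and b without
-- constraining which relation ends up on which side:
/*+ HashJoin(a b) */
EXPLAIN SELECT * FROM a JOIN b USING (id);

-- Adding Leading() pins the join order as well, which is closer to
-- what pg_plan_advice's fully-specified advice expresses:
/*+ Leading((a b)) HashJoin(a b) */
EXPLAIN SELECT * FROM a JOIN b USING (id);
```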

> Also, there's a definitional question here. What exactly does
> USE_HASH_JOIN(a b) mean? Possible definitions:
>
> ...
>
> Now, of course, I got to pick these examples, so I picked examples
> that prove my point. Maybe there are examples where a "one side or the
> other" constraint actually works better. But I don't know what those
> examples are. When I've experimented with this kind of thing, I've found
> that I never get the results that I want because the planner just does
> something stupid that technically satisfies the constraint but is
> nothing like what I actually meant. If you know of examples where my
> definitions suck and the "one side or the other" definition produces
> great results, I'd love to hear about them ...

Thanks for the detailed walkthrough - I now see your implementation
choice for this more clearly. I've also re-read the
documentation section on join methods and I think that is clear enough
in terms of how it works.

I don't think a change here is necessary. I think for the use case I
described I will just resort to testing both variants (i.e. being more
specific which shape of plan I want), which I think aligns with the
goals of pg_plan_advice as compared to pg_hint_plan.

Later in the release cycle I'll see if I can put together a community
resource that compares pg_hint_plan to pg_plan_advice, and where they
differ. I suspect many end users will have similar questions, and
whilst I don't think explaining the differences belongs in the regular
Postgres docs, it could fit the wiki as a cheatsheet of sorts.

> although I would have
> loved to hear about them even more 4.5 months ago when I first posted
> this patch set and already had the phrase "useless in practice" in the
> README on exactly this topic. This is exactly why I put the patches up
> for design review before they were fully baked.

Understood - I'll admit I had mainly looked at the high-level join
logic before (and at the join hooks in detail when doing the
pg_hint_plan testing), but had not fully understood how you dealt with
join hierarchies and how they are specified in the advice. I had
previously looked at examples where multiple tables were listed,
assuming it worked like pg_hint_plan, but that's not the case.

>
> > For v20-0001, from a quick conceptual review:
> >
> > I find the two separate GUC mechanisms for local backend vs shared
> > memory a bit confusing as a user (which one should I be using?).
> > Enabling the shared memory mechanism on a system-wide basis seems like
> > it'd likely have too high overhead anyway for production systems?
> > (specifically worried about the advice capturing for each plan, though
> > I haven't benchmarked it)
> >
> > I wonder if we shouldn't keep this simpler for now, and e.g. only do
> > the backend local version to start - we could iterate a bit on
> > system-wide collection out-of-core, e.g. I'm considering teaching
> > pg_stat_plans to optionally collect plan advice the first time it sees
> > a plan ID (plan advice is roughly a superset of what we consider a
> > plan ID there), and then we could come back to this for PG20.
>
> The shared version is rather useful for testing, though. That's
> actually why I created it initially: turn on the shared collector, run
> the regression tests, and then use SQL to look through the collector
> results for interesting things. You can't do that with the local
> collector.

Right - I can see the usefulness for testing, but I worry that people
will use it on production systems and then experience unexpected
performance issues. That said, we could address that with a warning in
the docs noting it's not intended for production use.
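
The testing workflow you describe could be documented alongside such a
warning. Sketched below with hypothetical identifiers (the actual GUC
and function names are defined by the patch set and may well differ):

```sql
-- Hypothetical names throughout; illustrative of the workflow only.
SET pg_plan_advice.use_shared_collector = on;  -- enable shared collection

-- ... run the regression tests or a test workload ...

-- Then inspect the collected advice via SQL:
SELECT * FROM pg_plan_advice_collected();
```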

I'm out of time today to think through the naming for 0003 further,
but I'll see if I can find more time this week.

Thanks,
Lukas

-- 
Lukas Fittl
