Hi Tomas,

Thanks for the detailed comments. I do not think I can answer all of the questions yet, but I would first like to align on my current understanding and reply to the parts where I have some thoughts. For the remaining parts, I need more analysis/research before making stronger claims.
> On May 12, 2026, at 21:42, Tomas Vondra <[email protected]> wrote:
>
> After reading this, my thinking was "we should assume the estimates are
> accurate" because with bogus estimates we can end up with arbitrarily
> bad plans. Garbage in, garbage out.
>
> But maybe it's not as black-and-white? I agree if one method is much
> more robust (i.e. performs better with estimates that are close enough
> to the actual values), then that seems like an important feature for
> this type of heuristics.
>
> I wonder how "bad" the estimates are in the presented example, both in
> the good and bad runs.

I agree with this point. My earlier benchmark was more of an end-to-end test with the current Postgres estimates, including cases where the estimates can be quite wrong. I used that setup because complex join estimates seem very hard to make accurate in all cases, so these errors are likely to remain part of the practical problem.

But for evaluating the join-order algorithm itself, I agree we also need a setup where the estimates are close to the actual values, for example by first measuring actual row counts/cardinalities and then injecting those values instead of the normal planner estimates. The two setups answer different questions: the current results show end-to-end behavior with the current Postgres estimator, while the additional setup would better isolate the search algorithm.

If the estimates are very wrong, the selected plan may be dominated by estimator error rather than by the search method, and in that case I am not yet sure how much a win/loss between two heuristics proves about the join search algorithm itself. On the other hand, if the estimates are close enough and one method is still more robust, I agree that would be an important property for this kind of heuristic.

For the presented examples, I need to look more carefully at the good and bad runs.
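To make the "how bad are the estimates" question concrete, one option I am considering is to compare estimated vs. actual row counts per plan node (e.g. collected from EXPLAIN ANALYZE) using the q-error metric that is standard in the cardinality-estimation literature. A minimal sketch; the `q_error` helper name, node labels, and all numbers below are made up for illustration:

```python
# Hypothetical sketch: quantify how far off the planner's estimates are
# using q-error, the multiplicative error metric max(est/actual, actual/est).
# A perfect estimate has q-error 1.0.

def q_error(estimated_rows: float, actual_rows: float) -> float:
    """Multiplicative estimation error; 1.0 means a perfect estimate."""
    est = max(estimated_rows, 1.0)  # clamp to avoid division by zero
    act = max(actual_rows, 1.0)
    return max(est / act, act / est)

# (node label, estimated rows, actual rows), e.g. taken from the
# EXPLAIN ANALYZE output of one join query (numbers are made up)
nodes = [
    ("scan t1",           1000,   1100),
    ("join t1-t2",        5000,   4800),
    ("join (t1-t2)-t3",    200, 150000),  # off by ~3 orders of magnitude
]

worst = max(q_error(est, act) for _, est, act in nodes)
print(f"worst q-error: {worst:.1f}")
```

The worst per-node q-error of a "bad" run could then be compared against the "good" runs to see whether the regressions correlate with estimator error.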
From a first look at some severe regressions, the issue seems to be that some intermediate join row estimates are off by orders of magnitude, but I do not want to claim that as the full answer before doing a more systematic analysis.

> IMHO it's futile to search for a perfect heuristics. If we had that, we
> wouldn't need the DP mode at all. This can't be the criteria, because no
> heuristics would pass that.

Agreed. I did not mean that the goal should be a heuristic with no bad cases; that would not be realistic. What I had in mind was a weaker, practical goal: assuming planning time stays acceptable and plan advice can still be used to correct bad cases, can the default setting reduce the probability of severe regressions? For example, using purely made-up numbers: if the new algorithm's default setting has severe regressions in 10% of cases, can we reduce that to 5% or lower? The point is to reduce the probability of bad cases, not to eliminate them. I am not fully sure yet whether this is the right goal/criterion, though.

> Regarding the two proposed directions, it's not very clear to me why the
> first direction would require broader changes, particularly changes that
> would be unnecessary for the second direction (DP for prefix).
>
> To me it seems the second approach is actually an advanced version of
> the first one. How do you pick the prefix to handle by DP if not by some
> heuristics, requiring these new signals?

That's fair. The DP-prefix approach still needs a heuristic to choose the frontier seed, so it does not avoid the ranking problem. What I meant was mostly about scope, not a strict dependency between the two directions. The current GOO variants mostly use signals already available in Postgres at that point, such as estimated cost and estimated result size. If those signals are wrong, different variants may still share similar failure patterns. To get a substantially different signal, we might need something outside the current GOO patch.
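To make it concrete which signals I mean, here is a hedged, self-contained sketch of a GOO-style greedy step that uses only estimated sizes and selectivities. This is an illustration, not the patch code; the `goo_order` function, the `cards`/`sel` inputs, and all numbers are made up:

```python
# Sketch of a GOO-style greedy join order: repeatedly join the pair of
# groups with the smallest estimated intermediate result size.
# Assumption: independent join selectivities, one per join edge.

def goo_order(cards, sel):
    """cards: {base rel: estimated rows}.
    sel: {frozenset({a, b}): selectivity} for each join edge."""
    # each group: frozenset of base rels -> (plan tree, estimated size)
    groups = {frozenset([r]): (r, cards[r]) for r in cards}

    def result_size(g1, g2):
        size = groups[g1][1] * groups[g2][1]
        for a in g1:
            for b in g2:
                size *= sel.get(frozenset([a, b]), 1.0)
        return size

    def joinable(g1, g2):
        return any(frozenset([a, b]) in sel for a in g1 for b in g2)

    while len(groups) > 1:
        pairs = [(g1, g2) for g1 in groups for g2 in groups
                 if g1 != g2 and joinable(g1, g2)]
        g1, g2 = min(pairs, key=lambda p: result_size(*p))
        tree, size = (groups[g1][0], groups[g2][0]), result_size(g1, g2)
        del groups[g1], groups[g2]
        groups[g1 | g2] = (tree, size)
    return next(iter(groups.values()))

# made-up example: t2-t3 is the cheapest first join
cards = {"t1": 1000, "t2": 100, "t3": 10}
sel = {frozenset(["t1", "t2"]): 0.01, frozenset(["t2", "t3"]): 0.1}
plan, est_rows = goo_order(cards, sel)
print(plan, est_rows)
```

The point of the sketch is that every decision flows from the same size estimates, so if one intermediate estimate is off by orders of magnitude, any variant of this rule that ranks by size/cost can be misled in the same way.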
For example, the CIDR 2021 paper "Simplicity Done Right for Join Ordering" seems like one possible direction for the first approach. My current reading is that it does not only change a local choice inside the current GOO rule: it also changes the way candidate joins are formed/connected, and it may need extra statistics about join attributes, such as the maximum value frequency, to make the risk signal work. That seems broader than only changing the GOO search rule.

That is why I originally looked at the DP-prefix direction first: not because I think it is definitely the better direction, but because it seemed more contained to the join search algorithm and could initially reuse existing signals like cost/result size. Better signals could still be used later for prefix/frontier selection if that direction turns out to be useful. I am probably still missing some of the tradeoffs, and I need to study the paper, the failure cases, and other people's views more carefully. After this discussion, it may make more sense to first make the pure GOO baseline and the evaluation framework clearer, and then decide which follow-up direction is worth pursuing.

> TBH I don't quite understand what the proposed approach with "using
> exact DP for a prefix of the problem" is meant to do. As of now we split
> the join problem into smaller parts per join_collapse_limit (=8). If
> these problems are "too large" for DP (geqo_threshold=12), we use the
> heuristics (GEQO) mode. Or do I misremember this?
>
> Maybe I'm just "too used" to this, but this seems reasonable to me. Try
> searching for a "perfect" solution first (within reason), and only for
> large problems fall back to something approximate. Sensible, no?
>
> To get to the GEQO/GOO code, people have to adjust the limits, so that
> (join_collapse_limit >= geqo_threshold). AFAIK almost no one does that,
> so most join problems are smaller than geqo_threshold and so handled by
> regular DP (for smaller subproblems).
> But let's say someone adjusts the GUCs, gets to GOO, and it handles a
> prefix of the problem using the DP approach. How is that different from
> keeping (join_collapse_limit < geqo_threshold) and not even getting
> to GOO? Why not just leave join_collapse_limit at a low value?

This is a very good question. I need to better understand the current implementation and think about this more carefully before giving an answer.

> Right. When replacing one heuristics with a different one, there will
> almost certainly be regressions. Each heuristics will explore a slightly
> different subset of the solution space, and it's a matter of luck which
> gets a substantially better solution.
>
> I don't have a fully formed idea how to evaluate this, but I think the
> only way to "prove" the new heuristics is better is to test a lot of
> complex join queries, and look at the overall statistics. See how long
> the planning took, see how many queries got faster/slower, etc.

Agreed. I think we need to test many more large join queries. As mentioned above, I think it would be useful to keep the current-estimate and accurate-estimate evaluations separate, because they answer related but different questions.

> The JOB is certainly one option to do that, and it's valuable because
> the queries are meant to be realistic / from actual application. But
> there's not all that many of them.

Yes. In addition to JOB and JOB-Complex, the benchmark repository also has an extended IMDB-backed workload, `imdb_ceb_3k`, with 842 queries involving at least 12 base relations [1]. It uses the same IMDB schema/data family, so it should be a useful larger test set after the smaller JOB/JOB-Complex subset. I have not run that full set for this patch yet, but I plan to use it for the next round of evaluation.

> I think it would be useful to write a script that generates joins of
> arbitrary complexity (number of relations, how connected they are), and
> see how it works on those.
> It could even generate data to get estimates
> with adjustable inaccuracy.

That makes sense, but I need to think more about it. I am not sure yet whether this would be easy to design in a useful way.

References:
[1] Benchmark workloads: https://github.com/Reminiscent/join-order-benchmark/blob/main/WORKLOADS.md

--
Best regards,
Chengpeng Yan
