On Thu, May 28, 2026 at 6:11 PM Tomas Vondra <[email protected]> wrote: > Determine if the query includes a starjoin (or multiple), and remember > the relids included in the starjoin cluster. Pick a "canonical" join > order for each starjoin cluster we found (e.g. with dimensions in relid > order).
I like the idea of trying to avoid considering a bunch of join orders that are not meaningfully different from each other, but I think we're only really guaranteed that two join orders are equivalent if both of them are exactly one-to-one, with the row count neither increasing nor decreasing. (Even then, it's not 100% equivalent, but probably close enough.) I don't remember off-hand whether your definition of starjoin also includes joins that can reduce the row count. If it does, then I think we might need to be smarter than just putting the dimensions in relid order. If it does not, then maybe we need to think about whether the optimization is going to apply to a sufficiently broad range of cases. Maybe it's fine: if a fact table is joined to lots of dimension tables, then some of them might have restriction clauses but probably many of them won't. This is making me wonder whether we could improve on GEQO by transforming it from a "plan this query" thing to a "restrict the join order" thing. For instance, suppose we have two tables T1 and T2 and we wonder whether swapping the order in which they are joined matters. We pick a few random partial join orders of tables in the query excluding T1 and T2 and then check whether appending T1 and then T2 produces significantly different results from appending T2 and then T1. So for example say we pick (), (T3), (T3-T4), and (T5-T3). We then test the cost/row count of T1-T2 vs. T2-T1, the cost/row count of T3-T1-T2 vs. T3-T2-T1, similarly for T3-T4-T1-T2 vs. T3-T4-T2-T1, similarly for T5-T3-T1-T2 vs. T5-T3-T2-T1. If we see that swapping the order of T1 and T2 routinely has minimal effect on the outcome, then it's probably safe to pick one or the other to be joined first and disregard all the join orders where the two are swapped. By repeating this kind of testing for various pairs of tables, we could possibly prune the join search space for large join problems considerably, and then still do an exhaustive test of the remaining join orders. Maybe that would produce better results than GEQO currently does (or maybe not). Although I kind of like this idea and think it's interesting, it's probably not going to be a good enough idea to compete with what you're doing, because your idea is working even when the number of dimensions is very small, and this seems like it would be a significantly more expensive test. So the question is probably whether there are cases where your cheaper test loses too much. I'm not sure of the answer. My intuition is that join planning is pretty hard and that shortcuts will tend to have cases where they behave really badly, but my intuition is also that we are wasting a whole lot of CPU cycles testing cases that are nearly equivalent to each other. I'm not sure which of those two intuitions is more correct in this case. -- Robert Haas EDB: http://www.enterprisedb.com
