On 2/4/25 20:43, Jeff Davis wrote: > On Tue, 2025-02-04 at 15:00 +0100, Tomas Vondra wrote: >> This is a surprisingly common query pattern in OLTP applications, >> thanks >> to normalization. > > +1. Creating a small lookup table should be encouraged rather than > penalized. > > Your test data includes a fact table with 10k rows and no index on the > filter condition. In OLTP applications the fact table might often fit > in memory, but I'd still expect it to have an index on the filter > condition. That might not change your overall point, but I'm curious > why you constructed the test that way? >
No particular reason. I think I intended to make it a lookup by PK (which would match the use case examples), and I forgot about that. But yeah, I would expect an index too. > >> There's a lot of stuff that could / should be improved on the current >> patch. For (1) we might add support for more complex cases with >> snowflake schemas [3] or with multiple fact tables. At the same time >> (1) >> needs to be very cheap, so that it does not regress every non- >> starjoin >> query. > > The patch only considers the largest table as the fact table, which is > a good heuristic of course. However, I'm curious if other approaches > might work. For instance, could we consider the table involved in the > most join conditions to be the fact table? > > If you base it on the join conditions rather than the size of the > table, then detection of the star join would be based purely on the > query structure (not stats), which would be nice for predictability. > Right, there may be other (possibly better) ways to detect the star join shape. I was thinking about also requiring for foreign keys on the join clauses - in DWH systems FKeys are sometimes omitted, which would break the heuristics, but in OLTP it's common to still have them. I think the cost of the heuristic will be an important metric - I don't know if the number of join conditions is more expensive to determine than what the patch does now, though. >> But the bigger question is whether it makes sense to have such fast- >> path >> modes for certain query shapes. > > We should explore what kinds of surprising cases it might create, or > what maintenance headaches might come up with future planner changes. > But the performance numbers you posted suggest that we should do > something here. > Yes, it seems like an interesting opportunity for starjoin queries. It's a pretty common query pattern, but it also happens to be very expensive to plan because the dimensions can be reordered almost arbitrarily. regards -- Tomas Vondra