On 1/31/26 12:18, Andrei Lepikhov wrote:
> On 29/1/26 06:04, Alexandra Wang wrote:
>> Hi hackers,
>>
>> As promised in my previous email, I'm sharing a proof-of-concept patch
>> exploring join statistics for correlated columns across relations.
>> This is a POC at this point, but I hope the performance numbers below
>> give a better idea of both the potential usefulness of join statistics
>> and the complexity of implementing them.
> I wonder why you chose the JOIN operator only?
>
> It seems to me that any relational operator produces relational output
> that can be treated as a table. The extended statistics code may be
> adopted to such relations.
> I think it may be a VIEW that you can declare (manually or
> automatically) and allow Postgres to build statistics on this 'virtual'
> table. So, the main focus may shift to the question: how to provably
> match a query subtree to a specific statistic.
>

Because for each "supported" operator we need to know two things:

(1) how to sample it efficiently

(2) how to apply it in selectivity estimation

We can't add support for everything at once, and for some cases we may
not even know answers to (1) and/or (2).

We can't simply store an opaque VIEW, and build the stats by simply
executing it (and sampling the results). The whole premise of extended
stats is that people define them to fix incorrect estimates. And with
incorrect estimates the plan may be terrible, and the VIEW may not even
complete.


regards

-- 
Tomas Vondra
