On Fri, Mar 17, 2017 at 1:14 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> After a bit more thought, it seems like the bug here is that "the
> fraction of the LHS that has a non-matching row" is not one minus
> "the fraction of the LHS that has a matching row". In fact, in
> this example, *all* LHS rows have both matching and non-matching
> RHS rows. So the problem is that neqjoinsel is doing something
> that's entirely insane for semijoin cases.

Thanks for the analysis. I had a niggling feeling that there might be
something of this sort going on, but I was not sure.

> It would not be too hard to convince me that neqjoinsel should
> simply return 1.0 for any semijoin/antijoin case, perhaps with
> some kind of discount for nullfrac. Whether or not there's an
> equal row, there's almost always going to be non-equal row(s).
> Maybe we can think of a better implementation but that seems
> like the zero-order approximation.

Yeah, it's not obvious how to do better than that considering only one
clause at a time. Of course, what we really want to know is
P(x<>y|z=t), but don't ask me how to compute that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
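
For anyone following along, here is a toy illustration of the point quoted above. It is only a sketch in Python, not PostgreSQL code: the function name semijoin_fractions and the sample data are made up for illustration, and NULLs are ignored entirely. It counts, for each LHS row, whether any RHS row is equal to it and whether any RHS row is unequal to it.

```python
def semijoin_fractions(lhs, rhs):
    """Return (fraction of lhs rows with an equal rhs row,
               fraction of lhs rows with an unequal rhs row).

    Hypothetical helper for illustration; brute force, no NULL handling.
    """
    n = len(lhs)
    has_equal = sum(1 for x in lhs if any(x == y for y in rhs))
    has_unequal = sum(1 for x in lhs if any(x != y for y in rhs))
    return has_equal / n, has_unequal / n


# Made-up data: every LHS value appears in the RHS, and the RHS has
# more than one distinct value.
lhs = [1, 2, 3]
rhs = [1, 2, 3]

eq_frac, ne_frac = semijoin_fractions(lhs, rhs)

# Estimate of the "<>" semijoin fraction obtained by taking one minus
# the "=" semijoin fraction.
one_minus_eq = 1.0 - eq_frac
```

With this data every LHS row has both an equal and an unequal RHS row, so both fractions are 1.0, while the "one minus" estimate is 0.0. Returning 1.0 for the semijoin case, as suggested above, matches the true fraction in this toy case.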