On Thu, Jul 20, 2017 at 11:04 AM, Thomas Munro
> On Fri, Jun 2, 2017 at 4:16 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> I don't think it does really. The thing about a <> semijoin is that it
>> will succeed unless *every* join key value from the inner query is equal
>> to the outer key value (or is null). That's something we should consider
>> to be of very low probability typically, so that the <> selectivity should
>> be estimated as nearly 1.0. If the regular equality selectivity
>> approaches 1.0, or when there are expected to be very few rows out of the
>> inner query, then maybe the <> estimate should start to drop off from 1.0,
>> but it surely doesn't move linearly with the equality selectivity.
> Ok, here I go like a bull in a china shop: please find attached a
> draft patch. Is this getting warmer?
> In the comment for JOIN_SEMI I mentioned a couple of refinements I
> thought of but my intuition was that we don't go for such sensitive
> and discontinuous treatment of stats; so I made the simplifying
> assumption that RHS always has more than 1 distinct value in it.
> Anti-join <> returns all the nulls from the LHS, and then it only
> returns other LHS rows if there is exactly one distinct non-null value
> in RHS and it happens to be that one. But if we make the same
> assumption I described above, namely that there are always at least 2
> distinct values on the RHS, then the join selectivity is just
The patch looks good to me.
+ * For semi-joins, if there is more than one distinct key in the RHS
+ * relation then every non-null LHS row must find a match since it can
+ * only be equal to one of them.
The word "match" confusing. Google's dictionary entry gives "be equal
to (something) in quality or strength." as its meaning. May be we want
to reword it as "... LHS row must find a joining row in RHS ..."?
The Postgres Database Company
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription: