Re: [HACKERS] Erronous sort used in query plan

Shane Ambler Sun, 07 Jan 2007 13:05:19 -0800

Tom Lane wrote:

Shane Ambler <[EMAIL PROTECTED]> writes:

I am putting together searches on the catalog info and came up with aselect that was rather slow and I noticed that in the explain analyzethere is a sort step on one of the left joins which I don't thinkbelongs there.


Well, it's certainly necessary in context because it's preparing the
data for the merge join immediately above it.  The correct question
is why is the thing using a merge join here, when a hash join would be
cheaper?

I dug through this and found out that the hash join is estimated as
cheaper, right up till the last step of cost_hashjoin:

    /*
     * Bias against putting larger relation on inside.  We don't want an
     * absolute prohibition, though, since larger relation might have better
     * bucketsize --- and we can't trust the size estimates unreservedly,
     * anyway.  Instead, inflate the run cost by the square root of the size
     * ratio.  (Why square root?  No real good reason, but it seems
     * reasonable...)
     *
     * Note: before 7.4 we implemented this by inflating startup cost; but if
     * there's a disable_cost component in the input paths' startup cost, that
     * unfairly penalizes the hash.  Probably it'd be better to keep track of
     * disable penalty separately from cost.
     */
    if (innerbytes > outerbytes && outerbytes > 0)
        run_cost *= sqrt(innerbytes / outerbytes);

In this example, the data volume from the join of everything else is
estimated as less than what needs to be fetched from pg_proc, and so
this bias kicks in, and the cost estimate roughly doubles.
Unfortunately, because it's a LEFT JOIN, we'll never consider hashjoin
in the other direction and so the hash loses out to the mergejoin.

It seems clear to me that we ought not impose a bias unless the join
type is such that both directions of hashing are feasible.  I wonder
also if the bias is too large ... but there's not really evidence for
or against that in this example.  The point is that this code implicitly
assumes both directions will be tried, and they won't.

I think that the selected sort (or at least the merge join) is incorrect- the column sorted (or both actually) is linking the current record inpg_operator with the oid in the pg_proc - it will only return one row.

If one of the pg_type joins is changed, it then sorts on the oid ofpg_operator as the foreign table - again this will only return one row.

I would think that the foreign oid would indicate to the planner that itwill only find one foreign row to link with.

I can see that the error I made created a funny (probably uselessactually) link that would throw things out, but I would expect it tocreate bad planning for the two joins that are in error not anon-related one to one join. If a sort/merge join was created from thisand used to satisfy this join I would accept that as part of what Iunintentionally requested but instead it generates a sort/merge join ona join that links one current record to one foreign record and hasnothing in common with the joins in error.




--

Shane Ambler
[EMAIL PROTECTED]

Get Sheeky @ http://Sheeky.Biz

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
      choose an index scan if your joining column's datatypes do not
      match

Re: [HACKERS] Erronous sort used in query plan

Reply via email to