On Thu, Jul 15, 2004 at 12:52:38AM -0400, Tom Lane wrote:
> No, it's not missing anything.  The number being reported here is the
> number of rows pulled from the plan node --- but this plan node is on
> the inside of a merge join, and one of the properties of merge join is
> that it will do partial rescans of its inner input in the presence of
> equal keys in the outer input.  If you have, say, 10 occurrences of
> "42" in the outer input, then any "42" rows in the inner input have to
> be rescanned 10 times.  EXPLAIN ANALYZE will count each of them as 10
> rows returned by the input node.

OK, that makes sense, although it seems to me that loops= should have been
something larger than 1 if the data was scanned multiple times.
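To make sure I've understood the counting, here's a toy model of it (not
Postgres internals; it only counts the matching inner groups, and assumes
both inputs are already sorted):

```python
from itertools import groupby

def merge_join_inner_rows(outer, inner):
    """Rows the inner plan node of a merge join would report returning,
    counting the partial rescans caused by duplicate keys in the outer
    input.  Simplified: non-matching inner rows are ignored."""
    # Size of each group of equal keys in the (sorted) inner input.
    inner_counts = {k: len(list(g)) for k, g in groupby(inner)}
    rows_from_inner = 0
    for k, g in groupby(outer):
        m = len(list(g))                       # duplicates in the outer input
        rows_from_inner += m * inner_counts.get(k, 0)  # inner group rescanned m times
    return rows_from_inner

# 10 occurrences of 42 in the outer input, 3 matching inner rows:
# the inner group is scanned 10 times, so the inner node reports
# 30 rows even though only 3 distinct rows exist.
print(merge_join_inner_rows([42] * 10, [42, 42, 42]))  # 30
```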

> The large multiple here (80-to-one overscan) says that you've got a lot
> of duplicate values in the outer input.  This is generally a good
> situation to *not* use a mergejoin in ;-).  We do have some logic in the
> planner that attempts to estimate the extra cost involved in such
> rescanning, but I'm not sure how accurate the cost model is.

Hmm, I'm not sure I have the terminology right here -- the "outer input" in
"A left join B" is A, right? But yes, I do have a lot of duplicates; that
matches my data well.

> Raising shared_buffers seems unlikely to help.  I do agree with raising
> sort_mem --- not so much to make the merge faster as to encourage the
> thing to try a hash join instead.

sort_mem is already 16384, which I thought would be plenty -- I tried
increasing it to 65536, which made exactly zero difference. :-)
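For reference, this is the kind of session-level experiment I mean (the
table and column names are placeholders, not my actual schema; sort_mem is
in kilobytes, and enable_mergejoin is a real planner setting useful for
diagnosis, not production):

```sql
-- Raise sort_mem for this session only, then see whether the planner
-- switches to a hash join:
SET sort_mem = 65536;
EXPLAIN ANALYZE
SELECT * FROM a LEFT JOIN b USING (key);

-- To check whether a hash join would in fact be faster regardless of
-- the cost estimates, forbid the merge join temporarily:
SET enable_mergejoin = off;
EXPLAIN ANALYZE
SELECT * FROM a LEFT JOIN b USING (key);
```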

/* Steinar */
Homepage: http://www.sesse.net/
