On 27-8-2005 16:27, Tom Lane wrote:
Arjen van der Meijden <[EMAIL PROTECTED]> writes:

Is a nested loop normally so much (3x) more costly than a hash join? Or is it just this query that gets estimated wronly?

There's been some discussion that we are overestimating the cost of
nestloops in general, because we don't take into account that successive
scans of the inner relation are likely to find many pages already in
cache from the earlier scans.  So far no one's come up with a good cost
model to use for this, though.

Ah, that explains. I take it, there already is an estimation for the cost of "the amount of pages that will be loaded for this operation". For indexed lookups this will probably be something like "the amount of expected pages to fetch * the random page cost"?

And appareantly for the nested loop its something like "iterations * amount of pages per iteration * random page cost" ? The naive approach seems to me, is to just calculate the probable amount of pages to fetch from disk rather than from cache.

In this case there are 7692207 rows in 60569 pages and on average 234 rows per product (per nested loop) in the estimation. It estimates that it'll have to do 565 iterations. In worst case for the first 234 rows, no pages are already cached and the rows are all in a seperate page. So thats 234 pages to fetch. In the second iteration, you know already 234 pages are fetched and that's about 0.386% of the total pages. So the expected amount of pages for the next 234 pages expected to be in cache is 234 * 0.00386 = 1. After that you'll have 234 + 233 pages in cache, etc, etc. Following that approach, the 565th iteration only has to pull in about 27 new pages in the worst case of all records being perfectly scattered over the pages, not 234.

Of course this has to be adjusted for the amount of available buffers and cache and the expected amount of pages to fetch for the iterations, which may be less than 234.

When a fetch of a random page costs 4 and one from cache 0.01, there is quite a large difference: 565 * (234 * 4) = 530535 vs 215864,93

Actually the likeliness of a page being in cache is a bit higher, since the expectation increases for each newly fetched page, not for batches of 234. I didn't use that in my calculation here.

Anyway, this is probably been thought over already and there may be many flaws in it. If not, please think it over.

Best regards,

Arjen

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to [EMAIL PROTECTED] so that your
      message can get through to the mailing list cleanly

Reply via email to