Re: [PERFORM] Inefficient queryplan for query with intersectable

Arjen van der Meijden Sun, 28 Aug 2005 07:47:22 -0700

On 27-8-2005 16:27, Tom Lane wrote:

Arjen van der Meijden <[EMAIL PROTECTED]> writes:

Is a nested loop normally so much (3x) more costly than a hash join? Oris it just this query that gets estimated wronly?


There's been some discussion that we are overestimating the cost of
nestloops in general, because we don't take into account that successive
scans of the inner relation are likely to find many pages already in
cache from the earlier scans.  So far no one's come up with a good cost
model to use for this, though.

Ah, that explains. I take it, there already is an estimation for thecost of "the amount of pages that will be loaded for this operation".For indexed lookups this will probably be something like "the amount ofexpected pages to fetch * the random page cost"?

And appareantly for the nested loop its something like "iterations *amount of pages per iteration * random page cost" ?The naive approach seems to me, is to just calculate the probable amountof pages to fetch from disk rather than from cache.

In this case there are 7692207 rows in 60569 pages and on average 234rows per product (per nested loop) in the estimation. It estimates thatit'll have to do 565 iterations.In worst case for the first 234 rows, no pages are already cached andthe rows are all in a seperate page. So thats 234 pages to fetch.In the second iteration, you know already 234 pages are fetched andthat's about 0.386% of the total pages. So the expected amount of pagesfor the next 234 pages expected to be in cache is 234 * 0.00386 = 1.After that you'll have 234 + 233 pages in cache, etc, etc.Following that approach, the 565th iteration only has to pull in about27 new pages in the worst case of all records being perfectly scatteredover the pages, not 234.

Of course this has to be adjusted for the amount of available buffersand cache and the expected amount of pages to fetch for the iterations,which may be less than 234.

When a fetch of a random page costs 4 and one from cache 0.01, there isquite a large difference: 565 * (234 * 4) = 530535 vs 215864,93

Actually the likeliness of a page being in cache is a bit higher, sincethe expectation increases for each newly fetched page, not for batchesof 234. I didn't use that in my calculation here.

Anyway, this is probably been thought over already and there may be manyflaws in it. If not, please think it over.


Best regards,

Arjen

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to [EMAIL PROTECTED] so that your
      message can get through to the mailing list cleanly

Re: [PERFORM] Inefficient queryplan for query with intersectable

Reply via email to