Re: Improved Cost Calculation for IndexOnlyScan

Heikki Linnakangas Tue, 29 Sep 2020 01:09:41 -0700

On 29/09/2020 10:06, Hamid Akhtar wrote:

In one of my earlier emails [1], I mentioned that there seems to be a problem with how the cost for index only scans is being calculated.
[1] https://www.postgresql.org/message-id/CANugjhsnh0OBMOYc7qKcC%2BZsVvAXCeF7QiidLuFvg6zmHy1C7A%40mail.gmail.com
My concern is that there seems to be a bigger disconnect between the cost of index only scan and the execution time. Having tested this on 3 different systems, docker, laptop and a server with RAID 5 SSD configured, at the threshold where index only scan cost exceeds that of sequential scan, index only is still around 30% faster than the sequential scan.

A 30% discrepancy doesn't sound too bad, to be honest. The exact threshold depends on so many factors.

My initial hunch was that perhaps we need to consider a different approach when considering cost for index only scan. However, the solution seems somewhat simple.
cost_index function in costsize.c, in case of indexonlyscan, multiplies the number of pages fetched by a factor of (1.0 - baserel->allvisfrac) which is then used to calculate the max_IO_cost and min_IO_cost.
This is very similar to how the cost estimate methods for indexes internally call the genericcostestimate function. This function primarily gets the number of pages for the indexes and multiplies that with random page cost (spc_random_page_cost) to get the total disk access cost.
I believe that in case of index only scan, we should adjust the spc_random_page_cost in context of baserel->allvisfrac so that it accounts for random pages for only the fraction that needs to be read for the relation and excludes the index page fetches.

That doesn't sound right to me. The genericcostestimate() function calculates the number of *index* pages fetched. It makes no difference if it's an Index Scan or an Index Only Scan.
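
To make the distinction concrete, here is a minimal sketch of the two separate page counts involved. It is a simplified illustration, not the actual costsize.c code: the function names heap_pages_to_fetch and index_io_cost are hypothetical, rounding and caching effects are left out, and 4.0 is used only because it is the default random_page_cost.

```python
def heap_pages_to_fetch(pages_fetched: float, allvisfrac: float,
                        index_only: bool) -> float:
    """Heap pages the scan still has to visit (simplified).

    For an Index Only Scan, heap pages marked all-visible are skipped,
    so only the (1.0 - allvisfrac) fraction is charged.
    """
    if index_only:
        return pages_fetched * (1.0 - allvisfrac)
    return pages_fetched


def index_io_cost(num_index_pages: float,
                  random_page_cost: float = 4.0) -> float:
    """Index page I/O cost (simplified).

    The scan type is deliberately not a parameter: the index pages are
    charged the same way for an Index Scan and an Index Only Scan.
    """
    return num_index_pages * random_page_cost


if __name__ == "__main__":
    # Hypothetical numbers, chosen only for illustration.
    print(heap_pages_to_fetch(1000, 0.75, index_only=False))
    print(heap_pages_to_fetch(1000, 0.75, index_only=True))
    print(index_io_cost(50))
```

In this sketch allvisfrac only changes the heap side; the index side is identical for both scan types, which is why scaling the index page cost by allvisfrac would be mixing two unrelated quantities.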

genericcostestimate() could surely be made smarter. Currently, it multiplies the number of index pages fetched with random_page_cost, even though a freshly created index is mostly physically ordered by the keys. seq_page_cost with some fudge factor might be more appropriate, whether or not it's an Index Only Scan. Not sure what the exact formula should be, just replacing random_page_cost with seq_page_cost is surely not right either.
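
As a rough sketch of the range in question: charging every index page at random_page_cost gives an upper bound, and charging every page at seq_page_cost gives a lower bound. The blended variant below is purely hypothetical; the ordered_frac parameter and the linear interpolation are assumptions made for illustration, not a proposed formula and not anything that exists in PostgreSQL. The defaults 1.0 and 4.0 are the default seq_page_cost and random_page_cost.

```python
def index_io_cost_bounds(num_index_pages: float,
                         seq_page_cost: float = 1.0,
                         random_page_cost: float = 4.0) -> tuple:
    """Return (all-sequential, all-random) I/O cost for the index pages."""
    return (num_index_pages * seq_page_cost,
            num_index_pages * random_page_cost)


def blended_index_io_cost(num_index_pages: float,
                          ordered_frac: float,
                          seq_page_cost: float = 1.0,
                          random_page_cost: float = 4.0) -> float:
    """Hypothetical fudge factor: interpolate between the two page costs.

    ordered_frac is an assumed measure of how physically ordered the
    index is: 1.0 means fully ordered, 0.0 means fully scattered.
    """
    if not 0.0 <= ordered_frac <= 1.0:
        raise ValueError("ordered_frac must be between 0.0 and 1.0")
    per_page = (ordered_frac * seq_page_cost
                + (1.0 - ordered_frac) * random_page_cost)
    return num_index_pages * per_page


if __name__ == "__main__":
    low, high = index_io_cost_bounds(100)
    print(low, high)
    print(blended_index_io_cost(100, ordered_frac=0.5))
```

With ordered_frac at 0.0 the sketch reproduces the current behaviour of charging random_page_cost for every index page; at 1.0 it degenerates to the plain seq_page_cost replacement that is surely not right either.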


- Heikki
