On Fri, Oct 6, 2017 at 9:21 PM, Dilip Kumar <dilipbal...@gmail.com> wrote: > On Fri, Oct 6, 2017 at 7:24 PM, Dilip Kumar <dilipbal...@gmail.com> wrote: >> On Fri, Oct 6, 2017 at 6:08 PM, Alexander Kuzmenkov >> <a.kuzmen...@postgrespro.ru> wrote: >>> >>>> Analysis: The estimated value of the lossy_pages is way higher than >>>> its actual value and reason is that the total_pages calculated by the >>>> "Mackert and Lohman formula" is not correct. >>> >>> >>> I think the problem might be that the total_pages includes cache effects and >>> rescans. For bitmap entries we should use something like relation pages * >>> selectivity. >> >> I have noticed that for the TPCH case if I use "pages * selectivity" >> it give me better results, but IMHO directly multiplying the pages >> with selectivity may not be the correct way to calculate the number of >> heap pages it can only give the correct result when all the TID being >> fetched are clustered. But on the other hand "Mackert and Lohman >> formula" formulae consider that all the TID's are evenly distributed >> across the heap pages which can also give the wrong estimation like we >> are seeing in our TPCH case. > > I agree with the point that the total_pages included the cache effects > and rescan when loop_count > 1, that can be avoided if we always > calculate heap_pages as it is calculated in the else part > (loop_count=0). Fortunately, in all the TPCH query plan what I posted > up thread bitmap scan was never at the inner side of the NLJ so > loop_count was always 0. I will fix this.
I have fixed the issue. Now, for calculating the lossy pages it will not consider the rescan. As mentioned above it will not affect the TPCH plan so haven't measured the performance again. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
improve_bitmap_cost_v6.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers