On Fri, Oct 6, 2017 at 7:24 PM, Dilip Kumar <dilipbal...@gmail.com> wrote: > On Fri, Oct 6, 2017 at 6:08 PM, Alexander Kuzmenkov > <a.kuzmen...@postgrespro.ru> wrote: >> >>> Analysis: The estimated value of the lossy_pages is way higher than >>> its actual value and reason is that the total_pages calculated by the >>> "Mackert and Lohman formula" is not correct. >> >> >> I think the problem might be that the total_pages includes cache effects and >> rescans. For bitmap entries we should use something like relation pages * >> selectivity. > > I have noticed that for the TPCH case if I use "pages * selectivity" > it give me better results, but IMHO directly multiplying the pages > with selectivity may not be the correct way to calculate the number of > heap pages it can only give the correct result when all the TID being > fetched are clustered. But on the other hand "Mackert and Lohman > formula" formulae consider that all the TID's are evenly distributed > across the heap pages which can also give the wrong estimation like we > are seeing in our TPCH case.
I agree with the point that the total_pages included the cache effects and rescan when loop_count > 1, that can be avoided if we always calculate heap_pages as it is calculated in the else part (loop_count=0). Fortunately, in all the TPCH query plan what I posted up thread bitmap scan was never at the inner side of the NLJ so loop_count was always 0. I will fix this. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers