On Fri, Jan 06, 2006 at 01:24:41AM -0500, Greg Stark wrote:
> > 5% based on block-based sampling is reasonable; that means a straight 5% of
> > the on-disk size of the table, so 5GB for a 100GB table. With random-row
> > sampling, that would require as much as 25% of the table, making it easier
> > to just scan the whole thing.
>
> Postgres's current sample sizes are clearly geared towards the histograms,
> where they are entirely realistic. All of the distinct estimates are clearly
> just ad hoc attempts based on the existing sampling.
>
> Is a mechanism that is only 5x faster than reading the whole table (assuming
> random_page_cost of 4) and is off by more than a factor of three 10% of the
> time really worth it?
Before we start debating the merits of proposals based on random reads, can
someone confirm that the sampling code actually does read randomly? I looked
at it yesterday; there is a comment stating that the blocks to be scanned are
passed to the analyze function in physical order, and AFAICT the function that
chooses blocks does so strictly by applying a probability test to each block
number as it increments a counter. It appears that the reading is actually
sequential, not random, which makes all the random_page_cost hand-waving null
and void.
--
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
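(The access pattern described above can be sketched as follows. This is a
hypothetical Python illustration of Knuth-style selection sampling, not
PostgreSQL's actual analyze code: a counter walks the block numbers in
physical order and each block is accepted with probability
needed/remaining, so the accepted blocks always come out in ascending
order and the resulting reads are sequential.)

```python
import random

def sample_blocks(total_blocks, sample_size, rng=random.random):
    """Yield sample_size distinct block numbers, each equally likely,
    by scanning block numbers 0..total_blocks-1 in physical order
    (selection sampling). Output is strictly ascending, so the
    corresponding disk reads are sequential, not random."""
    chosen = 0
    for blockno in range(total_blocks):
        remaining = total_blocks - blockno   # blocks not yet considered
        needed = sample_size - chosen        # blocks still to pick
        # Accept this block with probability needed/remaining.
        if remaining * rng() < needed:
            yield blockno
            chosen += 1
            if chosen == sample_size:
                return

blocks = list(sample_blocks(1000, 50))
# The sample is already in ascending physical order:
assert blocks == sorted(blocks)
```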