Greg Stark <[EMAIL PROTECTED]> writes:

> Perhaps what this indicates is that the real meat is in track sampling,
> not block sampling.
Fwiw, I've done a little benchmarking, and I'm starting to think this isn't
a bad idea. I see a dramatic speed improvement for samples of 1-10% as the
block size increases. Presumably this is what Hannu described: larger blocks
reduce the number of tracks needed to cover the sample.

I see improvements with block sizes up to around 256M or so, but my data is
pretty questionable, since I'm busy watching TV in MythTV in another window.
It's on another drive, but it still seems to make the numbers jump around a
bit.

I expect there's a trade-off between keeping enough blocks for the sample to
be representative on the one hand, and large blocks being much faster to
read in on the other. I would suggest setting the block size in the block
sampling algorithm to something like max(8k, sqrt(table size)). That gives
8k blocks for anything up to 64M (since sqrt(64M) = 8k), but takes better
advantage of the speed increase available from sequential i/o for larger
tables; from my experiments, about a 50% increase in speed.

Actually, maybe something even more aggressive would be better, say
(table size)^0.75, so it kicks in well before the 64M mark and gets to
larger block sizes on reasonably sized tables.

Note, this doesn't mean anything like changing the page size. It just means
selecting more blocks that hopefully lie on the same track when possible.

--
greg
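
P.S. In case the formula is clearer as code, here's a minimal sketch of the
chunk-size rule, assuming sizes are measured in bytes. The function and
macro names are made up for illustration; this isn't anything in the tree.

#include <math.h>
#include <stdint.h>

#define MIN_CHUNK_BYTES (8 * 1024)      /* one default 8k page */

/*
 * Pick the sampling chunk size, in bytes, for a table of table_bytes
 * bytes: max(8k, sqrt(table size)), rounded down to whole 8k pages.
 */
static uint64_t
sample_chunk_bytes(uint64_t table_bytes)
{
    uint64_t chunk = (uint64_t) sqrt((double) table_bytes);

    /* the more aggressive variant would be:
     * chunk = (uint64_t) pow((double) table_bytes, 0.75); */

    if (chunk < MIN_CHUNK_BYTES)
        chunk = MIN_CHUNK_BYTES;

    /* keep each chunk an exact multiple of the 8k page size */
    chunk -= chunk % MIN_CHUNK_BYTES;

    return chunk;
}

So a 64M table still gets plain 8k chunks, a 1G table gets 32k chunks, and
the sampler reads each chosen chunk of consecutive pages sequentially rather
than individual 8k pages scattered around the file.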