On Thu, Dec 12, 2013 at 07:22:59AM +1300, Gavin Flower wrote:
> Surely we want to sample a 'constant fraction' (obviously, in
> practice you have to sample an integral number of rows in a page!)
> of rows per page? The simplest way, as Tom suggests, is to use all
> the rows in a page.
> 
> However, if you wanted the same number of rows from a greater number
> of pages, you could (for example) select a quarter of the rows from
> each page.  In which case, when this is a fractional number: take
> the integral number of rows, plus one extra row with a probability
> equal to the fraction (here 0.25).
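
A minimal sketch of the rounding rule described above, in Python rather
than anything resembling the actual ANALYZE code. The names rows_to_take
and sample_page are hypothetical, and the page is assumed to be a plain
list of rows:

```python
import random

def rows_to_take(rows_in_page, fraction, rng):
    """How many rows to sample from one page (hypothetical helper)."""
    expected = rows_in_page * fraction   # e.g. 33 * 0.25 = 8.25
    whole = int(expected)                # integral part, here 8
    remainder = expected - whole         # fractional part, here 0.25
    # Take one extra row with probability equal to the fractional part.
    extra = 1 if rng.random() < remainder else 0
    return whole + extra

def sample_page(rows, fraction, rng):
    """Pick that many rows uniformly at random from the page."""
    return rng.sample(rows, rows_to_take(len(rows), fraction, rng))
```

Averaged over many pages the number of rows taken comes out to the
requested fraction, even though each page contributes a whole number.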

In this discussion we've mostly taken "block" to mean one PostgreSQL
block of 8kB.  But when reading from a disk, once you've read one block
you can read the following ones practically for free.

So I wonder if you could make your sampling always read 16 consecutive
blocks, but then use only 25-50% of the tuples.  That way you get many
more tuples for the same number of disk seeks.

Have a nice day,
-- 
Martijn van Oosterhout   <klep...@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer

