It looks like this is a fairly well-understood problem, since in the real world it's often cheaper to survey people within a small geographic area or time interval too. These Wikipedia pages sound interesting and have some external references:
http://en.wikipedia.org/wiki/Cluster_sampling
http://en.wikipedia.org/wiki/Multistage_sampling

I suspect the hard part will be characterising the nature of the non-uniformity in the sample you get by taking whole blocks. Some of it may come from how the rows were loaded (e.g. older rows were loaded by pg_restore while newer rows were inserted retail), and some from the way Postgres works (e.g. hotter rows sit on blocks with fewer rows in them, while colder rows are more densely packed).

I've felt for a long time that Postgres would make an excellent test bed for some aspiring statistics research group.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
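To make the cluster-sampling idea concrete, here is a toy sketch (not anything from the Postgres source) of first-stage block sampling with a ratio estimator, using a made-up table whose "hot" blocks hold a few large values and whose "cold" blocks are densely packed with small ones -- exactly the skew described above. All names and numbers here are hypothetical illustration.

```python
import random

def cluster_sample_mean(blocks, m, rng=random):
    """Estimate the mean of all row values by sampling m whole blocks.

    blocks: list of lists, one inner list of numeric row values per block.
    Uses the ratio estimator sum(sampled values) / sum(sampled row counts),
    which compensates for blocks holding different numbers of rows;
    a naive mean-of-block-means would be biased toward sparse (hot) blocks.
    """
    chosen = rng.sample(blocks, m)          # stage 1: pick blocks uniformly
    total = sum(sum(b) for b in chosen)     # stage 2 here is "take every row"
    nrows = sum(len(b) for b in chosen)
    return total / nrows

# Hypothetical skewed table: 20 hot blocks (5 rows of 100.0 each) and
# 80 cold blocks (200 rows of 1.0 each).
rng = random.Random(42)
blocks = [[100.0] * 5 for _ in range(20)] + [[1.0] * 200 for _ in range(80)]

true_mean = sum(sum(b) for b in blocks) / sum(len(b) for b in blocks)
est = cluster_sample_mean(blocks, 30, rng)
```

With enough sampled blocks the ratio estimate lands near the true mean (about 1.61 here), but its variance is driven by how many hot blocks the first stage happens to hit -- which is the non-uniformity that would need characterising.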