> http://en.wikipedia.org/wiki/Cluster_sampling
> http://en.wikipedia.org/wiki/Multistage_sampling

> I suspect the hard part will be characterising the nature of the
> non-uniformity in the sample generated by taking a whole block. Some
> of it may come from how the rows were loaded (e.g. older rows were
> loaded by pg_restore but newer rows were inserted retail) or from the
> way Postgres works (e.g. hotter rows are on blocks with fewer rows in
> them and colder rows are more densely packed).
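
To make that concrete, here is a small, purely invented simulation (Python; the hot/cold layout and every number are made up, not measured from any real table): sampling blocks uniformly and then a fixed number of rows per block over-represents rows that live in sparse blocks, and weighting each sampled row by its block's row count removes most of that bias.

  import random

  random.seed(1)

  # Invented layout: hot blocks are sparse and hold recent (large) values,
  # cold blocks are densely packed with older (small) values.
  blocks = []
  for b in range(2000):
      if b % 10 == 0:                  # 10% "hot" blocks
          blocks.append([100.0] * 20)  # 20 live rows each
      else:
          blocks.append([1.0] * 200)   # 200 live rows each

  all_rows = [v for blk in blocks for v in blk]
  true_mean = sum(all_rows) / len(all_rows)

  # Two-stage sample: blocks uniformly at random, then a fixed number of
  # rows per chosen block.  A row in a sparse block is more likely to be
  # picked, so treating the result as uniform over rows is biased.
  ROWS_PER_BLOCK = 10
  sampled = []
  for blk in random.sample(blocks, 200):
      for v in random.sample(blk, ROWS_PER_BLOCK):
          sampled.append((v, len(blk)))    # keep the block's row count

  naive_mean = sum(v for v, _ in sampled) / len(sampled)

  # Inverse-probability correction: weight each row by its block's row
  # count, cancelling the 1/len(blk) factor from stage two.
  weighted_mean = (sum(v * n for v, n in sampled)
                   / sum(n for _, n in sampled))

  print("true mean      %7.3f" % true_mean)
  print("naive mean     %7.3f  (over-represents sparse, hot blocks)" % naive_mean)
  print("weighted mean  %7.3f" % weighted_mean)

On this made-up layout the naive per-row mean comes out several times too large, while the weighted estimate lands close to the true value.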

I would have thought that, as VACUUM reclaims space, this evens out in the long run and on average, so that it could simply be ignored?

> I've felt for a long time that Postgres would make an excellent test
> bed for some aspiring statistics research group.

I would say "applied statistics" rather than "research". Nevertheless, I can ask my research-statistician colleagues next door for their opinion on this sampling question.

--
Fabien.

