On 7/23/13 2:16 AM, Satoshi Nagayasu wrote:
> I've been working on new pgstattuple function to allow
> block sampling [1] in order to reduce block reads while
> scanning a table. A PoC patch is attached.

Take a look at all of the messages linked in

Jaime and I tried to do what you're working on then, including a random
block sampling mechanism modeled on the stats_target mechanism.  We
didn't do that as part of pgstattuple though, which was a mistake.

Noah created some test cases as part of his thorough review, and our code
did not compute the correct results on them.  Getting the results correct for all
of the various types of PostgreSQL tables and indexes ended up being
much harder than the sampling part.  See
in particular for that.

> This new function, pgstattuple2(), samples only 3,000 blocks
> (which accounts for 24MB) from the table randomly, and estimates
> several parameters of the entire table.
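The extrapolation step being described can be sketched roughly as follows. This is an illustrative Python model, not the patch's actual C code; the function and variable names are hypothetical. The idea is just: sum a statistic over a uniform random sample of blocks, then scale by total_blocks / sampled_blocks.

```python
import random

# Hypothetical sketch of block-sample extrapolation (not the patch's code):
# estimate a whole-table tuple count from a uniform random sample of blocks,
# scaling the sample total by total_blocks / sampled_blocks.
def estimate_table_stats(block_tuple_counts, total_blocks, sample_size, seed=0):
    """block_tuple_counts: dict mapping block number -> live tuples in it."""
    rng = random.Random(seed)
    sampled = rng.sample(sorted(block_tuple_counts), min(sample_size, total_blocks))
    sample_total = sum(block_tuple_counts[b] for b in sampled)
    scale = total_blocks / len(sampled)
    return round(sample_total * scale)

# A perfectly uniform table (100 tuples per block) extrapolates exactly:
table = {b: 100 for b in range(10000)}
print(estimate_table_stats(table, 10000, 3000))  # -> 1000000
```

On skewed tables the estimate carries sampling error, which is exactly why the amount of sampling should be tunable rather than hard-wired at 3,000 blocks.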

There should be an input parameter to the function controlling how much
sampling to do, and if its scale can be made to look like ANALYZE's
statistics target, that would help too.
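For reference, ANALYZE sizes its row sample as 300 * statistics_target (see src/backend/commands/analyze.c). A sketch of how a block-sampling parameter could reuse that scale; `sample_blocks_for_target` and its arguments are hypothetical names, not anything in the patch:

```python
# PostgreSQL's ANALYZE heuristic: sample 300 * statistics_target rows.
def sample_rows_for_target(statistics_target):
    return 300 * statistics_target

# Hypothetical translation of that row budget into a block budget,
# assuming a caller-supplied estimate of tuples per block.
def sample_blocks_for_target(statistics_target, tuples_per_block):
    rows = sample_rows_for_target(statistics_target)
    return -(-rows // tuples_per_block)  # ceiling division

print(sample_rows_for_target(100))        # default target -> 30000 rows
print(sample_blocks_for_target(100, 10))  # -> 3000 blocks
```

That would let a pgstattuple sampling knob behave consistently with the default_statistics_target setting users already know.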

I have a project for this summer that includes reviving this topic and
making sure it works on some real-world systems.  If you want to work on
this too, I can easily combine that project into what you're doing.

Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)