On 2/22/15 5:41 PM, Tomas Vondra wrote:
Otherwise, the code looks OK to me. Now, there are a few features I'd
like to have for production use (to minimize the impact):

1) no index support:-(

    I'd like to see support for more relation types (at least btree
    indexes). Are there any plans for that? Do we have an idea on how to
    compute that?

It'd be cleaner if we had an actual AM function for this, but see below.

2) sampling just a portion of the table

    For example, being able to sample just 5% of blocks, making it less
    obtrusive, especially on huge tables. Interestingly, there's a
    TABLESAMPLE patch in this CF, so maybe it's possible to reuse some
    of the methods (e.g. functions behind SYSTEM sampling)?

3) throttling

    Another feature minimizing impact of running this on production might
    be some sort of throttling, e.g. saying 'limit the scan to 4 MB/s'
    or something along those lines.

4) prefetch

    fbstat_heap is using visibility map to skip fully-visible pages,
    which is nice, but if we skip too many pages it breaks readahead
    similarly to bitmap heap scan. I believe this is another place where
    effective_io_concurrency (i.e. prefetch) would be appropriate.
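
To illustrate items 2 and 3, here is a minimal sketch of block sampling combined with a rate limit. It is not code from the patch or from the TABLESAMPLE work. The names sample_blocks and Throttle are hypothetical, the per-block coin flip is just one simple way to pick roughly 5% of blocks, and the 8192-byte page size is the PostgreSQL default assumed for the arithmetic.

```python
import random
import time

BLOCK_SIZE = 8192  # assumed page size in bytes (PostgreSQL default)


def sample_blocks(total_blocks, fraction, seed=None):
    """Return block numbers in ascending order, keeping each block
    independently with probability `fraction` (hypothetical helper)."""
    rng = random.Random(seed)
    return [blk for blk in range(total_blocks) if rng.random() < fraction]


class Throttle:
    """Sleep as needed so the average read rate since creation stays
    at or below max_bytes_per_sec (hypothetical helper)."""

    def __init__(self, max_bytes_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.rate = max_bytes_per_sec
        self.clock = clock
        self.sleep = sleep
        self.start = clock()
        self.bytes_read = 0

    def account(self, nbytes):
        """Record nbytes as read; sleep if we are ahead of the budget.
        Returns the number of seconds slept."""
        self.bytes_read += nbytes
        # Earliest point in time at which this many bytes fits the budget.
        earliest = self.start + self.bytes_read / self.rate
        delay = earliest - self.clock()
        if delay > 0:
            self.sleep(delay)
            return delay
        return 0.0


def scan_sampled(total_blocks, fraction, max_bytes_per_sec, read_block, seed=None):
    """Read a sample of blocks via the caller-supplied read_block(blkno),
    throttled to max_bytes_per_sec. Returns the list of blocks read."""
    throttle = Throttle(max_bytes_per_sec)
    blocks = sample_blocks(total_blocks, fraction, seed)
    for blkno in blocks:
        read_block(blkno)
        throttle.account(BLOCK_SIZE)
    return blocks
```

With a 4 MB/s limit and 8 kB pages this allows 512 pages per second, so the limit only matters once the sample is large. The caller would also need to scale the measured free space by the inverse of the sampling fraction to estimate the whole table.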

All of those wishes are solved in one way or another by vacuum and/or analyze. If we had a hook in the tuple scanning loop and another at the end of vacuum, you could just piggy-back on it. But really all we'd need for vacuum to be able to report this info is one more field in LVRelStats, a call to GetRecordedFreeSpace for all-visible pages, and some logic to deal with pages skipped because we couldn't get the vacuum lock.
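
Roughly, the accounting I have in mind looks like the toy model below. This is only a sketch in Python, not vacuum code: Page, RelStats and vacuum_pass are made-up names, recorded_free stands in for what GetRecordedFreeSpace would return, and counting the pages we couldn't lock separately (rather than guessing their free space) is just one assumed way to handle them.

```python
from dataclasses import dataclass


@dataclass
class Page:
    all_visible: bool    # visibility map says the page can be skipped
    lock_acquired: bool  # whether the lock needed to scan the page was obtained
    actual_free: int     # free bytes seen when the page is really scanned
    recorded_free: int   # free bytes recorded in the free space map (may be stale)


@dataclass
class RelStats:
    free_space: int = 0           # the "one more field": accumulated free bytes
    scanned_pages: int = 0
    skipped_all_visible: int = 0
    skipped_no_lock: int = 0      # pages whose free space is unknown


def vacuum_pass(pages):
    """Accumulate free space over one pass without reading skipped pages."""
    stats = RelStats()
    for page in pages:
        if page.all_visible:
            # Skipped page: fall back on the recorded (possibly stale) value.
            stats.free_space += page.recorded_free
            stats.skipped_all_visible += 1
        elif not page.lock_acquired:
            # Assumption: report these separately instead of estimating.
            stats.skipped_no_lock += 1
        else:
            stats.free_space += page.actual_free
            stats.scanned_pages += 1
    return stats
```

The number of pages skipped for lack of a lock tells the reader how much of the reported total is missing.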

Should we just add this to vacuum instead?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

