On 10/20/14, 3:11 PM, Andres Freund wrote:
> On 2014-10-19 20:43:29 -0500, Jim Nasby wrote:
>> On 10/19/14, 11:41 AM, Andres Freund wrote:
>>> On 2014-10-18 21:36:48 -0500, Jim Nasby wrote:
The "weird" part is that if it's not doing a freeze it will just punt
on a page if it can't get the cleanup lock.

>>> I don't think that's particularly weird. Otherwise vacuum can get stuck
>>> behind a single very hot page, leading to much, much more bloat.

>>>> I have to believe that could seriously screw up autovacuum scheduling.

>>> Why?

>> I'm worried there could be some pathological cases where we'd skip a
>> large number of pages, perhaps if a vacuum scan and a seqscan ended up
>> running alongside each other.

> I've seen little evidence of that. The reverse, a stuck autovacuum, is
> imo much more likely. For this to be an actual problem you'd need to
> encounter many pages that are not locked, but are pinned. That state
> doesn't exist for very long.

How would you actually get evidence of this... we don't log it. :) (See my 
proposal at 
http://www.postgresql.org/message-id/54446c10.2080...@bluetreble.com)
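
For context, the punt happens in lazy vacuum's heap scan. Here's a
simplified sketch modeled on vacuumlazy.c; the skipped_pin_pages counter
and the log line at the end are the hypothetical additions I'm proposing,
not existing code:

for (blkno = 0; blkno < nblocks; blkno++)
{
    Buffer      buf;

    buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
                             RBM_NORMAL, vac_strategy);

    if (!ConditionalLockBufferForCleanup(buf))
    {
        if (!scan_all)              /* not a freeze/anti-wraparound scan */
        {
            ReleaseBuffer(buf);
            skipped_pin_pages++;    /* hypothetical: remember the punt */
            continue;
        }
        /* freeze scans must process every page, so wait out the pin */
        LockBufferForCleanup(buf);
    }

    /* ... prune, remove dead tuples, and freeze as usual ... */
}

/* hypothetical addition to the existing per-table vacuum log output */
ereport(elevel,
        (errmsg("skipped %u pages due to buffer pins", skipped_pin_pages)));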

>> Perhaps this is just paranoia, but we have no idea how bad things
>> might be, because we don't have any logging for how many pages we
>> skipped because we couldn't lock them.

> But so what? If we skip individual pages it won't be too bad, and very
> likely waiting for a long time is going to be more painful. The page
> won't be marked 'all visible', so the next vacuum will come around to it
> again. And it'll also get cleaned up by opportunistic HOT pruning.

Probably true. Hopefully we can start logging it and then we'll know for sure.


>> Ultimately, our current method for determining when and what to
>> vacuum is rather crude, and likely results in wasted effort during
>> scans as well as in not firing autovacuum often enough. Keep in mind
>> that autovacuum started as a user-space utility, and the best it could
>> possibly do was keep a table of stats counters.

> I agree that we should trigger autovacuum more often. It's
> *intentionally* not triggered *at all* for insert-only workloads (if you
> discount anti-wraparound vacuums). I think it's time to change that. For
> that we'd need to make vacuums that don't delete any tuples cheaper. We
> already rescan only the changed parts of the heap, but we always scan
> indexes fully...

Or maybe vacuum isn't the right way to handle some of these scenarios. It's 
become the catch-all for all of this stuff, but maybe that doesn't make sense 
anymore. Certainly when it comes to dealing with inserts there's no reason we 
*have* to do anything other than set hint bits and possibly freeze xmin.
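
To make the insert-only point concrete, the trigger logic (simplified from
relation_needs_vacanalyze() in autovacuum.c) counts only dead tuples, which
inserts never produce:

/*
 * Simplified sketch of the autovacuum trigger: an insert-only table
 * never passes this test, so it just waits for anti-wraparound vacuum.
 */
vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;

/* n_dead_tuples comes from the stats collector; inserts don't bump it */
dovacuum = (tabentry->n_dead_tuples > vacthresh);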

>> Instead of relying on those crude methods, if we reliably tracked
>> certain txids on a per-block basis in a fork, we could cheaply scan
>> the fork and make an extremely informed decision about how much a
>> vacuum would gain us, and exactly which blocks it should hit.

>> Let me use freezing as an example. If we had a reliable list of the
>> lowest txid for each block of a relation, we could do a freeze scan
>> that hits only the blocks whose minimum txid falls within our freeze
>> range. The same could be done for multixacts.

> It'd also become a prime contention point, because you'd need to
> constantly update it. In contrast, a simple 'is frozen' bit (akin to
> is_visible) changes only infrequently, and only in one direction.

Actually, the contention on the freeze data would very possibly be minimal, 
because it probably doesn't change very often. Even if it did, it's OK if the 
value isn't 100% accurate, so long as the recorded XID is guaranteed to be 
older than what's actually on the page.
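
Here's a rough sketch of what I'm picturing. Every name in it is
hypothetical (none of this exists today); the only real pieces are the
circular-comparison helpers, which are also what would handle wraparound:

typedef struct BlockXidEntry
{
    TransactionId   min_xid;    /* oldest xmin on the heap block */
    MultiXactId     min_mxid;   /* oldest multixact, same idea */
    TransactionId   max_xid;    /* newest xmin, for other heuristics */
} BlockXidEntry;                /* 12 bytes, so ~680 entries per 8K page */

/* hypothetical freeze pass driven by the fork instead of the heap */
for (blkno = 0; blkno < nblocks; blkno++)
{
    BlockXidEntry *entry = xidfork_get_entry(onerel, blkno);  /* hypothetical */

    /*
     * A stale entry is fine as long as it errs old: the worst case is
     * that we visit a block we didn't strictly need to.
     */
    if (TransactionIdPrecedes(entry->min_xid, freeze_cutoff_xid) ||
        MultiXactIdPrecedes(entry->min_mxid, freeze_cutoff_mxid))
        lazy_freeze_block(onerel, blkno);                     /* hypothetical */
}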

>> If we stored 3 txids for each block in a fork, we could fit
>> information for ~680 heap blocks in each fork block. So in a database
>> with 680GB of heap data, we could fully determine every *block* (not
>> table) we needed to vacuum by scanning just 1GB of data. That would
>> allow for far better autovacuum scheduling than what we do today.

> It's not that simple. Wraparounds and locking complicate it
> significantly.

I realize what I'm talking about isn't trivial (though I'm confused by your 
comment about wraparound, since presumably TransactionIdPrecedes() and its 
ilk solve that problem...)
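
For what it's worth, the space math above holds up: 3 txids at 4 bytes each 
is 12 bytes per heap block, so one 8192-byte fork page (~8168 usable bytes 
after the page header) covers ~680 heap blocks, about 5.3MB of heap. A 680GB 
heap is ~89 million blocks, needing ~131k fork pages: almost exactly 1GB.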

My ultimate point here is that we're using what are (today) very crude 
methods to control what gets vacuumed and when, and I think that now that we 
have relation forks we could do *much* better without a tremendous amount of 
work. But to make a big advance here we'll need to take a step back and 
rethink some things (like the assumption that vacuum is the only way to 
handle these problems).

Let me put some thought into this.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


