Re: [HACKERS] Proposal for background vacuum full/cluster

Paul Tillotson Wed, 20 Apr 2005 17:07:48 -0700

Jim C. Nasby wrote:

I talked to a few people on IRC about this and they didn't think I was
nuts, so maybe this is something practical...
In a nutshell, my idea is to use the normal transactional/XID code to relocate tuples in the heap. Think of doing an UPDATE field=field if you could tell update what page to put the new tuple on.

Be careful not to fire UPDATE triggers on the tuple while doing so.

Of course, it's not quite that simple. For starters, you'd want to do a
conventional vacuum before this, both to free as much space as possible
and to update the FSM. It might also be necessary to prevent backends
from using the pages at the end of the heap (which you're trying to
empty). I'm guessing that could be done just by removing the pages from
the FSM. You'd also need to vacuum after emptying these pages to reclaim
the disk space. To facilitate these things, it might be useful to be
able to vacuum parts of the heap. So as pages are emptied at the end of
the heap, they can be vacuumed and reclaimed while the pages are still
probably in cache (and without requiring a re-vacuum of the entire
table).

Keep in mind that the transaction that does the update can't also vacuum it's own tuples. You'd have to end one transaction, then wait until every transaction running while the updater ran finishes, then start the transaction that vacuums. Obviously your command would need to be able to start and end transactions. (Meaning that it can't be a user defined function, and it probably can't be a normal self-contained command in postgres.)

Taking this technique one step further, it should also be possible to
cluster in the background without blocking everything. One way to do
this would be to empty the first page in the heap by moving it's tuples
elsewhere, and vacuuming that page (but not putting it in the FSM). Once
that page is available, you can start reading in from the clustering
index and moving those tuples to the first page.

One thing that might be an issue for both ideas is index bloat. But
since reindex is a non-blocking operation, it doesn't seem unreasonable
to either do that automatically or have the user do it.

Is this TODOable?

I asked about something like this on the -hackers list a while back, but didn't get any response from any of the knowledgeable hackers.

Are you thinking of coding this, or just suggesting it for others? I was thinking of coding something like this but found that I didn't understand enough of the internals of how the vacuum command actually works to be able to write this. I'd be willing to devote perhaps a few hours a week to it if you want to help me.

Regards,

Paul Tillotson

-----------------

P. S.

The last time I thought about it, I decided that the best solution is probably one that works just like vacuum full except that it scans the table in reverse order. It would do something like this:

- Wait for exclusive lock. - Start at the end of the table -- call this page I. - If page I is completely empty, shrink the heap and go to step 1 again. (Page I is not empty now.) - *Scan forward in the table until you find a page that is empty. (Call it J) If no such page is found, there is no more free space in the table. Exit. - Move the tuples from page I to page J. - Drop the exclusive lock. and go to step one.

*On subsequent iterations of the loop, do not reset J. i.e., start scanning at the last place that free space was known to exist. Presumably no useful amount of free space will get created in the table while this algorithm is running.


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] Proposal for background vacuum full/cluster

Reply via email to