On Wed, 2007-02-07 at 14:17 -0500, Tom Lane wrote: > "Simon Riggs" <[EMAIL PROTECTED]> writes: > > The basic idea is that when a tuple is UPDATEd we can, in certain > > circumstances, avoid inserting index tuples for a tuple. Such tuples are > > marked HEAP_ONLY_TUPLE, but their storage is otherwise identical to > > other tuples. > > What is VACUUM FULL going to do when it wants to move one of these things?
In addition to others suggested, one option is to rework VACUUM FULL: Use case for VACUUM FULL is very low these days. It has an appallingly long execution time. This can be speeded up by dropping and re-creating indexes, says the manual, but it is still lengthy. It is even faster to drop the indexes, do a CREATE TABLE AS SELECT * FROM table, drop the old table and then rebuild the indexes. When moving into the new relation the space requirement is not double, since we only copy useful data/space and we also do this using the space the indexes occupied, so the actual space overhead isn't that high. VACUUM FULL also generates lots of WAL and ties up lots of memory while it operates - and it uses memory without any constraint. The CTAS technique doesn't carry across all of the visible tuple chains, but then to be brutally frank, by the time VACUUM FULL has actually finished executing, it is very frequently the oldest transaction anyway, so we needn't really have gone to all the trouble of moving the tuple chains. So the main use case for VACUUM FULL is when the space to be freed inside the table is low enough to make defraging the table quicker than a CTAS, yet still high enough that we were worried enough to do a VACUUM FULL. Thats a very narrow use case, and if it exists at all there's a narrow time window associated with it - only a heavily updated/deleted table needs vacuuming anyway - that means most often be performing the VF when we're already into the zone where the CTAS approach is quicker. VACUUM FULL also forces us to handle various failure cases that leave half-moved tuples scattered across tables. (Incomplete VACUUM FULLs are actually fairly common because of its incredibly long run time). So the radical suggestion is to continue to support the VACUUM FULL command, but using a newly coded technique that is both higher performance and more robust. We can either make VACUUM FULL wait until it actually is the oldest transaction, or we can mark the table in some way so that a lock cannot be obtained upon it by an earlier Xid. We should be able to compact the files of a large table one physical file at a time, so the space overhead is only ever MAX_PHYSICAL_FILESIZE and the space overhead may become a net space gain as the VF continues. This is of course (almost) identical to the approach already in use for CLUSTER, so it seems like that should be acceptable. As a result, its really not that much code and can still be accomplished on time. Note also that we would not have to drop and re-add Foreign Keys, since nothing can have changed while we have the table locked. Doing this also frees up two heap info bits and simplifies many of the HeapTupleSatisfies code, which could probably use the helping hand. Tuples moved to the new files would retain their info bit settings as they are copied across. So overall, it seems a lot easier to completely replace VF than to fight through its complexities and failure cases. If we do the above, then we'll speed up VACUUM FULL and we'll be able to handle HOT tuples easily. > > CREATE INDEX requires some careful work to allow it to identify and > > correctly index HEAP_ONLY_TUPLEs that need to become ~HEAP_ONLY_TUPLE as > > a result of the new index. > > I think you've glossed over the CREATE INDEX problem much too easily. > The difficulty with that is going to be that de-HOT-ifying a tuple > is going to require multiple updates that can't possibly be put into > a single WAL record, and I don't think that WAL replay can clean up > after an incomplete update (since it can't run user-defined functions > and hence cannot be expected to compute index entries for itself). > So I don't think you can do that while preserving crash safety. No intention to gloss, just wanted to get past first post and onto the really complex stuff. You're absolutely right that this is where much of the thinking/work needs to take place. > > Removing the root tuple will require a VACUUM *FULL*. > > That seems unacceptable ... it won't take too long for your table to > fill up with stubs, and we don't want to return to the bad old days > when periodic VACUUM FULL was unavoidable. Completely agree. I wanted to start right at the very beginning, so everybody would understand the issues, rather than jump straight in again with additional complexity. > ISTM we could fix that by extending the index VACUUM interface to > include two concepts: aside from "remove these TIDs when you find them", > there could be "replace these TIDs with those TIDs when you find them". > This would allow pointer-swinging to one of the child tuples, after > which the old root could be removed. This has got the same atomicity > problem as for CREATE INDEX, because it's the same thing: you're > de-HOT-ifying the child. So if you can solve the former, I think you > can make this work too. This is looking like the best option out of the many, since it doesn't have any serious restrictions or penalties. Let's see what Pavan thinks, since he's been working on this aspect. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster