On Fri, 2006-11-10 at 16:46 +0100, Zeugswetter Andreas ADI SD wrote:

> > I'm not sure this really solves that problem because there
> > are still DELETEs to consider, but it does remove one factor
> > that exacerbates it unnecessarily.
>
> Yea, so you still need to vacuum the large table regularly.
HOT covers the use-case of heavy updating, which in many common cases
occurs on tables with few inserts/deletes. HOT would significantly
reduce the need to vacuum, since deletes and wraparound issues would be
the only remaining reasons to do so. [I have some ideas for how to
optimize tables with heavy INSERT/DELETE activity, but that case is
much less prevalent than heavy UPDATEs.]

> > I think the vision is that the overflow table would never be very
> > large because it can be vacuumed very aggressively. It has only
> > tuples that are busy and will need vacuuming as soon as a
> > transaction ends, unlike the main table, which is mostly tuples
> > that don't need vacuuming.
>
> OK, but then you have to provide an extra VACUUM that does only that
> (and it randomly touches heap pages, and only does partial work
> there).

Sure, HOT needs a specially optimised VACUUM.

> > So a heap that's double the necessary size takes twice as long as
> > necessary to scan. The fact that the overflow tables are taking up
> > space isn't interesting if they don't have to be scanned.
>
> The overflow does have to be read for each seq scan, and it was
> stated that it would be accessed with random access (following the
> tuple chain). But maybe we can read the overflow as if it were an
> additional segment file?

Not without taking a write-avoiding lock on the table, unfortunately.

> > Hitting the overflow tables should be quite rare; it only comes
> > into play when looking at concurrently updated tuples. It certainly
> > happens, but most tuples in the table will be committed and not
> > concurrently updated by anyone else.
>
> The first update moves the row to overflow; only the second update
> after that might be able to pull it back. So on average you would
> have at least 66% of all rows updated since the last vacuum sitting
> in the overflow.
>
> The problem with needing very frequent vacuums is that you might not
> be able to do any work because of long transactions.
HOT doesn't need more frequent VACUUMs; rather, its VACUUM is efficient
enough that we can afford to run it more often when needed to avoid
I/O. Space usage in the overflow relation is at its worst for an
enormous table with a low volume of random updates, but note that it is
*never* worse than current space usage. In the best case, which is
actually fairly common in practice (a small number of rows of a large
table being updated by a steady stream of concurrent updates), the
overflow relation needs only a few hundred tuples, so regular vacuuming
will be both easy and effective.

As an aside, note that HOT works best in real-world situations rather
than benchmarks such as TPC, where the I/Os are deliberately randomised
to test the scalability of the RDBMS. But even then, HOT works better.

The long-running transaction issue remains unsolved in this proposal,
but I have some ideas for later.

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
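[The sizing argument above can be made concrete with a toy model. This
is my own illustration, not PostgreSQL code: every UPDATE pushes one
superseded row version into the overflow, a version becomes reclaimable
once it is `horizon` updates old (a stand-in for "no open snapshot can
still see it"), and an aggressive partial VACUUM runs every
`vacuum_every` updates. The parameter values are invented for
illustration.]

```python
from collections import deque

def peak_overflow(n_updates, vacuum_every, horizon):
    """Peak tuple count in a modelled overflow relation.

    Nothing here depends on the size of the main table, which is
    exactly the point: overflow occupancy tracks the number of row
    versions still visible to open transactions, not table size.
    """
    overflow = deque()          # timestamps of superseded versions
    peak = 0
    for t in range(n_updates):
        overflow.append(t)      # each UPDATE adds one version
        peak = max(peak, len(overflow))
        if t % vacuum_every == 0:
            # aggressive partial vacuum: reclaim versions no open
            # transaction can still see
            while overflow and overflow[0] <= t - horizon:
                overflow.popleft()
    return peak

# Steady-state occupancy is bounded by roughly horizon + vacuum_every
# tuples, i.e. "a few hundred", however long the update stream runs.
print(peak_overflow(100_000, vacuum_every=100, horizon=200))
```

In this model the peak settles at `horizon + vacuum_every` tuples, so
the overflow stays small whenever snapshots are short-lived, and the
long-transaction problem shows up as `horizon` growing without bound.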