On Fri, 2006-11-10 at 16:46 +0100, Zeugswetter Andreas ADI SD wrote:

> > I'm not sure this really solves that problem because there
> > are still DELETEs to consider, but it does remove one factor
> > that exacerbates it unnecessarily.
>
> Yea, so you still need to vacuum the large table regularly.
HOT covers the use-case of heavy updating, which in many common cases
occurs on tables with few inserts/deletes. HOT would significantly
reduce the need to vacuum, since deletes and wraparound issues would be
the only remaining reasons to do so. [I have some ideas for how to
optimize tables with heavy INSERT/DELETE activity, but that case is
much less prevalent than heavy UPDATEs.]

> > I think the vision is that the overflow table would never be very
> > large because it can be vacuumed very aggressively. It has only
> > tuples that are busy and will need vacuuming as soon as a
> > transaction ends, unlike the main table, which is mostly tuples
> > that don't need vacuuming.
>
> OK, but then you have to provide an extra VACUUM that does only that
> (and it randomly touches heap pages, and only does partial work
> there).

Sure, HOT needs a specially optimised VACUUM.

> > So a heap that's double the necessary size takes twice as long as
> > necessary to scan. The fact that the overflow tables are taking up
> > space isn't interesting if they don't have to be scanned.
>
> The overflow does have to be read for each seq scan, and it was
> stated that it would be accessed with random access (following the
> tuple chain). But maybe we can read the overflow as if it were an
> additional segment file?

Not without taking a write-avoiding lock on the table, unfortunately.

> > Hitting the overflow tables should be quite rare; it only comes
> > into play when looking at concurrently updated tuples. It certainly
> > happens, but most tuples in the table will be committed and not
> > concurrently updated by anyone else.
>
> The first update moves the row to overflow; only the second update
> after that might be able to pull it back. So on average you would
> have at least 66% of all rows updated since the last vacuum sitting
> in the overflow.
>
> The problem with needing very frequent vacuums is that you might not
> be able to do any work because of long transactions.
HOT doesn't need more frequent VACUUMs; rather, its VACUUM is efficient
enough that we can afford to run it more often when needed to avoid
I/O. Space usage in the overflow relation is at its worst for an
enormous table with a low volume of random updates, but note that it is
*never* worse than current space usage. In the best case, which is
actually fairly common in practice (a small number of rows of a large
table being updated by a steady stream of concurrent updates), the
overflow relation needs only a few hundred tuples, so regular vacuuming
will be both easy and effective.

As an aside, note that HOT works best in real-world situations rather
than benchmarks such as TPC, where the I/Os are deliberately randomised
to test the scalability of the RDBMS. But even then, HOT works better.

The long-running transaction issue remains unsolved in this proposal,
but I have some ideas for later.

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
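[The sizing argument above can be made concrete with a toy model. This
is my own illustration, not PostgreSQL code: every UPDATE pushes one
superseded row version into the overflow, a version becomes reclaimable
once it is `horizon` updates old (a stand-in for "no open snapshot can
still see it"), and an aggressive partial VACUUM runs every
`vacuum_every` updates. The parameter values are invented for
illustration.]

```python
from collections import deque

def peak_overflow(n_updates, vacuum_every, horizon):
    """Peak tuple count in a modelled overflow relation.

    Nothing here depends on the size of the main table, which is
    exactly the point: overflow occupancy tracks the number of row
    versions still visible to open transactions, not table size.
    """
    overflow = deque()          # timestamps of superseded versions
    peak = 0
    for t in range(n_updates):
        overflow.append(t)      # each UPDATE adds one version
        peak = max(peak, len(overflow))
        if t % vacuum_every == 0:
            # aggressive partial vacuum: reclaim versions no open
            # transaction can still see
            while overflow and overflow[0] <= t - horizon:
                overflow.popleft()
    return peak

# Steady-state occupancy is bounded by roughly horizon + vacuum_every
# tuples, i.e. "a few hundred", however long the update stream runs.
print(peak_overflow(100_000, vacuum_every=100, horizon=200))
```

In this model the peak settles at `horizon + vacuum_every` tuples, so
the overflow stays small whenever snapshots are short-lived, and the
long-transaction problem shows up as `horizon` growing without bound.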