On Thu, Nov 24, 2016 at 04:06:14PM +0000, Greg Stark wrote:
> For snapshot isolation Oracle has yet a *third* copy of the data in a
> space called the "rollback segment(s)". When you update a row in a
> block you save the whole block in the rollback segment. When you try
> to access a block you check if the CSN -- which is basically
> equivalent to our LSN -- is newer than your snapshot and if it is you
> fetch the old version of the block from the rollback.

I assume they can do this with good performance because, unlike UNDO,
they don't need to fsync the pages kept for snapshot visibility --- if
the server crashes, you don't need them.

> Essentially their MVCC is done on a per-block level rather than a
> per-row level and they keep only the newest version of the block in
> the table, the rest are in the rollback segment.  For what it's worth
> I think our approach is cleaner and more flexible. They had a lot of
> trouble with their approach over the years and it works well only
> because they invested an enormous amount of development in it and also
> because people throw a lot of hardware at it too.
> I think the main use case we have trouble with is actually the "update
> every row in the table" type of update which requires we write to
> every block, plus a second copy of every block, plus write full pages
> of both copies, then later set hint bits dirtying pages again and
> generating more full pages writes, then later come along and vacuum
> which requires two more writes of every block, etc. If we had a
> solution for the special case of an update that replaces every row in
> a page that I think would complement HOT nicely and go a long way
> towards fixing our issues.

Any ideas how we could optimize the update-all workload?  Create a new
heap file and indexes?

Our previous worst-case workload was a single row that was updated
repeatedly, but HOT fixed that for non-indexed rows, and WARM will
improve it for indexed rows.

One of the big takeaways when writing my MVCC talk is that no matter how
much we optimize UPDATE, we still need cleanup for deletes and aborted

I am hopeful we can continue reducing the number of times we write a
page  for maintenance purposes.

  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to