On 23 November 2016 at 04:28, Peter Geoghegan <p...@heroku.com> wrote:
> On Tue, Nov 22, 2016 at 7:01 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> This basic DO-UNDO-REDO protocol has been well-understood for
>> decades.
> FWIW, while this is basically true, the idea of repurposing UNDO to be
> usable for MVCC is definitely an Oracleism. Mohan's ARIES paper says
> nothing about MVCC.

FWIW, Oracle does not use the undo log for snapshot fetches. It's used
only for transaction rollback and recovery.

For snapshot isolation Oracle has yet a *third* copy of the data in a
space called the "rollback segment(s)". When you update a row in a
block you save the whole block in the rollback segment. When you try
to access a block you check if the CSN -- which is basically
equivalent to our LSN -- is newer than your snapshot and if it is you
fetch the old version of the block from the rollback.
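
A minimal sketch of the per-block read path described above, in Python.
The names (BlockVersion, RollbackSegment, update_block, read_block) and
the plain integer change numbers are illustrative assumptions, not
Oracle's actual structures:

```python
from dataclasses import dataclass, field


@dataclass
class BlockVersion:
    csn: int    # change number stamped when this version was written
    rows: list  # contents of the whole block at that point


@dataclass
class RollbackSegment:
    # Hypothetical store: block id -> older whole-block copies, oldest first.
    old_versions: dict = field(default_factory=dict)

    def save(self, block_id, version):
        self.old_versions.setdefault(block_id, []).append(version)

    def fetch_visible(self, block_id, snapshot_csn):
        # Walk from the newest saved copy to the oldest and return the
        # first one that is not newer than the snapshot.
        for version in reversed(self.old_versions.get(block_id, [])):
            if version.csn <= snapshot_csn:
                return version
        return None  # no copy old enough is retained


def update_block(table, rollback, block_id, new_rows, new_csn):
    # Save the whole current block, then overwrite it in place so the
    # table only ever holds the newest version.
    rollback.save(block_id, table[block_id])
    table[block_id] = BlockVersion(csn=new_csn, rows=new_rows)


def read_block(table, rollback, block_id, snapshot_csn):
    current = table[block_id]
    if current.csn <= snapshot_csn:
        return current  # table copy is visible to this snapshot
    # Block is newer than the snapshot: fall back to the rollback copy.
    return rollback.fetch_visible(block_id, snapshot_csn)
```

Note that visibility here is decided once per block, by comparing a
single number, rather than once per row.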

Essentially their MVCC is done on a per-block level rather than a
per-row level and they keep only the newest version of the block in
the table; the rest are in the rollback segment. For what it's worth
I think our approach is cleaner and more flexible. They had a lot of
trouble with their approach over the years and it works well only
because they invested an enormous amount of development in it and also
because people throw a lot of hardware at it too.

I think the main use case we have trouble with is actually the "update
every row in the table" type of update. It requires that we write to
every block, plus a second copy of every block, plus full-page writes
of both copies. Later, setting hint bits dirties the pages again and
generates more full-page writes, and later still vacuum comes along
and requires two more writes of every block, etc. If we had a solution
for the special case of an update that replaces every row in a page, I
think that would complement HOT nicely and go a long way towards
fixing our issues.
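
To put a rough number on that, here is a back-of-the-envelope tally in
Python. Every per-phase count is an assumption read off the list above
(only the "two more writes" for vacuum is stated outright), so treat
the total as an illustration of the shape of the problem, not a
measurement:

```python
# Rough page-write tally for an "update every row in the table".
# All counts are assumed, per original heap block.
PHASES = [
    ("write the updated original block", 1),
    ("write the second copy holding the new row versions", 1),
    ("full-page writes of both copies", 2),
    ("hint-bit pass dirtying both copies again", 2),
    ("full-page writes caused by the hint-bit pass", 2),
    ("vacuum writes", 2),
]


def writes_per_block(phases=PHASES):
    # Sum the assumed page writes across all phases for one block.
    return sum(count for _, count in phases)


def total_page_writes(n_blocks, phases=PHASES):
    # Scale the per-block figure to a table of n_blocks blocks.
    return n_blocks * writes_per_block(phases)
```

Changing any entry in PHASES (for example dropping the hint-bit
full-page writes) shows how much each phase contributes to the total.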

Incidentally the "Interested Transaction List" is for locking rows for
updates, and it's basically similar to what we've discussed before:
having a "most frequent xmin" in the page header and then a bit
indicating that the xmin is missing from the row header. Except in
their case they don't need it for the actual xmin/xmax, because their
visibility is done per-block; they need it only for the transient lock
state.
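
The "most frequent xmin" idea can be sketched as follows, in Python.
This is a hypothetical layout, not anything that exists in PostgreSQL;
the names (Page, RowHeader, common_xmin, xmin_in_page_header) are made
up, and for simplicity the first xmin seen on the page is treated as
the common one rather than tracking the truly most frequent value:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RowHeader:
    xmin: Optional[int] = None         # None when omitted from the row
    xmin_in_page_header: bool = False  # the "xmin is missing" bit


@dataclass
class Page:
    common_xmin: Optional[int] = None  # hypothetical page-header slot
    rows: List[RowHeader] = field(default_factory=list)

    def add_row(self, xmin):
        # Simplification: the first xmin on the page becomes the common one.
        if self.common_xmin is None:
            self.common_xmin = xmin
        if xmin == self.common_xmin:
            # Row stores only the bit; the value lives in the page header.
            self.rows.append(RowHeader(xmin=None, xmin_in_page_header=True))
        else:
            self.rows.append(RowHeader(xmin=xmin))

    def effective_xmin(self, index):
        # Resolve a row's xmin from the page header or from the row itself.
        row = self.rows[index]
        return self.common_xmin if row.xmin_in_page_header else row.xmin
```

Rows inserted by the common transaction carry only the flag, which is
where the space saving would come from.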


Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)