Re: [HACKERS] pg_sequence catalog

Craig Ringer Wed, 31 Aug 2016 07:36:07 -0700

On 31 August 2016 at 22:01, Tom Lane wrote:

> Personally, my big beef with the current approach to sequences is that
> we eat a whole relation (including a whole relfilenode) per sequence.
> I wish that we could reduce a sequence to just a single row in a
> catalog, including the nontransactional state.


I'd be happy to see incremental improvement in this space as Peter has
suggested, though I certainly see the value of something like seqam
too.

It sounds like you're thinking of something like a normal(ish) heap
tuple where we just overwrite some fields in-place without fiddling
with xmin/xmax or making a new row version. Right? Like we currently
overwrite the lone Form_pg_sequence on the 1-page sequence relations.

I initially thought that TRUNCATE ... RESTART IDENTITY would be
somewhat of a problem with this. We effectively have a temporary
"timeline" fork in the sequence value where it's provisionally
restarted and we start using values from the restarted sequence within
the xact that restarted it. But actually, it'd fit pretty well.
TRUNCATE ... RESTART IDENTITY would write a new row version with a new
xmin, and set xmax on the old sequence row. nextval(...) within the
truncating xact would update the new row's non-transactional fields
when it allocated new sequence chunks. On commit, everyone starts
using the new row due to normal transactional visibility rules. On
rollback everyone ignores it like they would any other dead tuple from
an aborted xact and uses the old tuple's nontransactional fields. It
Just Works(TM).
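
To make that concrete, here's a toy model in Python. It's only a sketch
of the idea, nothing like real heap code: ToySequence, SeqRow and the
xid bookkeeping are all made up for illustration, and visibility is
reduced to "created by me or by a committed xact, and not superseded by
me or by a committed xact".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SeqRow:
    xmin: int                   # xact that created this row version
    xmax: Optional[int] = None  # xact that superseded it, if any
    last_value: int = 0         # non-transactional field, overwritten in place


class ToySequence:
    """Hypothetical one-row-per-sequence catalog entry (illustration only)."""

    def __init__(self, start: int = 1):
        self.start = start
        self.committed = {0}  # xid 0 stands in for an always-committed creator
        self.rows = [SeqRow(xmin=0, last_value=start - 1)]

    def _visible(self, row: SeqRow, xid: int) -> bool:
        created = row.xmin == xid or row.xmin in self.committed
        superseded = row.xmax is not None and (
            row.xmax == xid or row.xmax in self.committed
        )
        return created and not superseded

    def _current_row(self, xid: int) -> SeqRow:
        visible = [r for r in self.rows if self._visible(r, xid)]
        assert len(visible) == 1, "exactly one row version should be visible"
        return visible[0]

    def nextval(self, xid: int) -> int:
        row = self._current_row(xid)
        row.last_value += 1  # in-place overwrite, no new row version
        return row.last_value

    def restart(self, xid: int) -> None:
        # What TRUNCATE ... RESTART IDENTITY would do in this model:
        # set xmax on the old row and write a new version with a new xmin.
        old = self._current_row(xid)
        old.xmax = xid
        self.rows.append(SeqRow(xmin=xid, last_value=self.start - 1))

    def commit(self, xid: int) -> None:
        self.committed.add(xid)

    def rollback(self, xid: int) -> None:
        # Nothing to undo: an xid that never commits leaves its new row
        # invisible and its xmax on the old row ineffective.
        pass


seq = ToySequence()
seq.nextval(10)   # 1, advances the original row
seq.nextval(10)   # 2
seq.restart(20)   # xact 20 forks a provisional "timeline"
seq.nextval(20)   # 1, from the new row version
seq.rollback(20)
seq.nextval(30)   # 3, the old row's value survived the rollback
```

Had xact 20 committed instead, the next caller would see only the new
row and get 2.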

nextval(...) takes AccessShareLock on a sequence relation. TRUNCATE
... RESTART IDENTITY takes AccessExclusiveLock. So we can never have
nextval(...) advancing the "old" timeline in other xacts at the same
time as we consume values on the restarted sequence inside the xact
that did the restarting. We still need the new "timeline" though,
because we have to retain the old value for rollback.
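
The locking argument boils down to a two-entry conflict table. A minimal
sketch, covering only the two lock modes mentioned here (the real lock
table has more modes; CONFLICTS and can_grant are names invented for the
example):

```python
# Which held modes block a requested mode, for the two modes discussed.
CONFLICTS = {
    "AccessShareLock": {"AccessExclusiveLock"},
    "AccessExclusiveLock": {"AccessShareLock", "AccessExclusiveLock"},
}


def can_grant(requested: str, held_by_others: list) -> bool:
    """True if no lock held by another xact conflicts with the request."""
    return not any(held in CONFLICTS[requested] for held in held_by_others)


# Concurrent nextval() callers don't block each other ...
can_grant("AccessShareLock", ["AccessShareLock"])       # True
# ... but none can run while the truncating xact holds its lock.
can_grant("AccessShareLock", ["AccessExclusiveLock"])   # False
```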

It feels intuitively pretty gross to effectively dirty-read and write
a few fields of a tuple. But that's what we do all the time with
xmin/xmax etc., so it's not really that different.

It'd certainly make sequence decoding easier too. A LOT easier.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


