On 18.11.2013 11:48, Andres Freund wrote:
On 2013-11-18 10:54:42 +0200, Heikki Linnakangas wrote:
On 15.11.2013 20:21, Andres Freund wrote:
Well, it exposes log_sequence_tuple() - together with the added "am
private" column of pg_squence that allows to do quite a bit of different
things. I think unless we really implement pluggable RMGRs - which I
don't really see coming - we cannot be radically different.

The proposed abstraction leaks like a sieve. The plugin should not need to
touch anything but the private amdata field. log_sequence_tuple() is way too
low-level.

Why? It's *less* low level than a good part of other crash recovery code
we have. I have quite some doubts about the layering, but it's imo
pretty sensible to centralize the wal logging code that plugins couldn't
write.

It doesn't go far enough, it's still too *low*-level. The sequence AM implementation shouldn't need to have direct access to the buffer page at all.

If you look at what e.g the _alloc callback currently gets passed.
a) Relation: Important for metadata like TupleDesc, name etc.
b) SeqTable entry: Important to setup state for cur/lastval
c) Buffer of the tuple: LSN etc.
d) HeapTuple itself: a place to store the tuples changed state

And if you then look at what gets modified in that callback:
Form_pg_sequence->amdata
                 ->is_called
                 ->last_value
                 ->log_cnt
SeqTable        ->last
SeqTable        ->cached
SeqTable        ->last_valid

You need is_called, last_valid because of our current representation of
sequences state (which we could change, to remove that need). The rest
contains values that need to be set depending on how you want your
sequence to behave:
* Amdata is obvious.
* You need log_cnt to influence/remember how big the chunks are you WAL log at
   once are.  Which e.g. you need to control if want a sequence that
   doesn't leak values in crashes
* cached is needed to control the per-backend caching.

I don't think the sequence AM should be in control of 'cached'. The caching is done outside the AM. And log_cnt probably should be passed to the _alloc function directly as an argument, ie. the server code asks the AM to allocate N new values in one call.

I'm thinking that the alloc function should look something like this:

seqam_alloc(Relation seqrel, int nrequested, Datum am_private)

And it should return:

int64 value - the first value allocated.
int nvalues - the number of values allocated.
am_private - updated private data.

The backend code handles the caching and logging of values. When it has exhausted all the cached values (or doesn't have any yet), it calls the AM's alloc function to get a new batch. The AM returns the new batch, and updates its private state as necessary. Then the backend code updates the relation file with the new values and the AM's private data. WAL-logging and checkpointing is the backend's responsibility.

Just as a thought-experiment, imagine that we decided to re-implement
sequences so that all the sequence values are stored in one big table, or
flat-file in the data directory, instead of the current implementation where
we have a one-block relation file for each sequence. If the sequence access
method API is any good, it should stay unchanged. That's clearly not the
case with the proposed API.

I don't think we can entirely abstract away the reality that now they are
based on relations and might not be at some later point. Otherwise we'd
have to invent an API that copied all the data that's stored in struct
RelationData. Like name, owner, ...
Which non SQL accessible API we provide has a chance of providing that
level of consistency in the face of fundamental refactoring? I'd say
none.

I'm not thinking that we'd change sequences to not be relations. I'm thinking that we might not want to store the state as a one-page file anymore. In fact that was just discussed in the other thread titled "init_sequence spill to hash table".

b) It's not actually easy to get similar semantics in "userspace". How
would you emulate the crash-safe but non-transactional semantics of
sequences without copying most of sequence.c? Without writing
XLogInsert()s which we cannot do from a out-of-tree code?

heap_inplace_update()

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to