On 11/05/2014 05:07 PM, Petr Jelinek wrote:
On 05/11/14 13:45, Heikki Linnakangas wrote:
In fact, if the seqam manages the current value outside the database
(e.g. a "remote" seqam that gets the value from another server),
nextval() never needs to write a WAL record.

Sure it does, you need to keep the current state in Postgres also, at
least the current value so that you can pass correct input to
sequence_alloc(). And you need to do this in a crash-safe way, so WAL is
necessary.

Why does sequence_alloc need the current value? If it's a "remote" seqam, the current value is kept in the remote server, and the last value that was given to this PostgreSQL server is irrelevant.

That's what irks me about this API. The method for acquiring a new value isn't fully abstracted behind the AM interface, as sequence.c still needs to track it itself. That's useful for the local AM, of course, and maybe some others, but for the rest it's totally useless.
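
To illustrate the point about abstraction, here is a minimal sketch in Python (not PostgreSQL code) of an interface where acquiring the next value sits entirely behind the AM. All names (SeqAM, LocalSeqAM, RemoteSeqAM, make_fake_remote) are hypothetical and not part of the patch; the "remote" side is just an in-process counter standing in for another server.

```python
import itertools
from abc import ABC, abstractmethod


class SeqAM(ABC):
    """Hypothetical sequence AM: the AM alone decides where state lives."""

    @abstractmethod
    def nextval(self) -> int:
        ...


class LocalSeqAM(SeqAM):
    """Tracks the last value itself, so it has local state to persist."""

    def __init__(self, start: int = 1, increment: int = 1) -> None:
        self.increment = increment
        self.last_value = start - increment

    def nextval(self) -> int:
        self.last_value += self.increment
        return self.last_value


class RemoteSeqAM(SeqAM):
    """Asks an external allocator and keeps no current value on this side."""

    def __init__(self, fetch) -> None:
        self._fetch = fetch  # stand-in for a call to a remote server

    def nextval(self) -> int:
        return self._fetch()


def make_fake_remote(start: int = 1000):
    """Stand-in for a remote server that owns the counter."""
    counter = itertools.count(start)
    return lambda: next(counter)
```

With this shape the caller never sees a "current value" unless the AM chooses to keep one, which is roughly what a fully abstracted interface would look like.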

For the amdata handling (which is the AM's private data variable) the
API assumes that (Datum) 0 is NULL. This seems to work well for
reloptions, so it should work here also, and it simplifies things a little
compared to passing pointers to pointers around and making sure
everything is allocated, etc.

Sadly the fact that amdata is not fixed size and can be NULL made the
page updates of the sequence relation considerably more complex than
they used to be.

It would be nice if the seqam could define exactly the columns it needs,
with any datatypes. There would be a set of common attributes:
sequence_name, start_value, cache_value, increment_by, max_value,
min_value, is_cycled. The local seqam would add "last_value", "log_cnt"
and "is_called" to that. A remote seqam that calls out to some other
server might store the remote server's hostname etc.

There could be a seqam function that returns a TupleDesc with the
required columns, for example.

Wouldn't that somewhat bloat the catalog if we had a new catalog table for
each sequence AM?

No, that's not what I meant. The number of catalog tables would be the
same as today. Sequences look much like any other relation, with entries
in pg_attribute catalog table for all the attributes for each sequence.
Currently, all sequences have the same set of attributes, sequence_name,
last_value and so forth. What I'm proposing is that there would be a set of
attributes that are common to all sequences, but in addition to that
there could be any number of AM-specific attributes.
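
As a rough sketch of that idea (in Python rather than C, purely for illustration): each AM contributes its own columns on top of the common set. The function name seqam_tupledesc, the type names, and the remote_host column are assumptions made for this example, not something defined by the patch.

```python
from typing import Dict, List, Tuple

Column = Tuple[str, str]  # (attribute name, illustrative type name)

# Attributes named in this thread as common to all sequences.
COMMON_COLUMNS: List[Column] = [
    ("sequence_name", "name"),
    ("start_value", "bigint"),
    ("cache_value", "bigint"),
    ("increment_by", "bigint"),
    ("max_value", "bigint"),
    ("min_value", "bigint"),
    ("is_cycled", "boolean"),
]

# AM-specific attributes; "remote_host" is a made-up example column.
AM_COLUMNS: Dict[str, List[Column]] = {
    "local": [
        ("last_value", "bigint"),
        ("log_cnt", "bigint"),
        ("is_called", "boolean"),
    ],
    "remote": [("remote_host", "text")],
}


def seqam_tupledesc(am: str) -> List[Column]:
    """Return the common columns followed by the AM's own columns."""
    return COMMON_COLUMNS + AM_COLUMNS[am]
```

Both AMs share the same leading attributes, so code that only cares about the common part would not need to know which AM a sequence uses.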

Oh, that's an interesting idea, so the AM interfaces would basically return
an updated tuple and there would be some description function that returns
a tupledesc.

Yeah, something like that.

I am a bit worried that this would kill any possibility of
ALTER SEQUENCE USING access_method. Plus I don't think it actually
solves any real problem - serializing the internal C structs into bytea
is not any harder than serializing them into a tuple IMHO.

I agree that serialization to bytea isn't that difficult, but it's still nicer to work directly with the correct data types. And it makes the internal state easily accessible for monitoring and debugging purposes.
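
A small sketch of the difference, again in Python for brevity. The packed layout ("<qq?") and the helper names are assumptions for illustration and do not reflect the patch's actual format.

```python
import struct

# Assumed layout for illustration only: two 64-bit integers and a bool.
LOCAL_STATE = struct.Struct("<qq?")


def to_bytea(last_value: int, log_cnt: int, is_called: bool) -> bytes:
    """Pack the AM state into one opaque blob, like a bytea amdata column."""
    return LOCAL_STATE.pack(last_value, log_cnt, is_called)


def from_bytea(blob: bytes) -> tuple:
    """Only code that knows the layout can decode the blob again."""
    return LOCAL_STATE.unpack(blob)


def to_columns(last_value: int, log_cnt: int, is_called: bool) -> dict:
    """The same state as named, typed attributes."""
    return {
        "last_value": last_value,
        "log_cnt": log_cnt,
        "is_called": is_called,
    }
```

Both forms are easy to produce, but only the second can be inspected without knowing the AM's private layout.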

It also does not really solve the amdata being dynamic
size "issue".

Yes it would. There would not be a single amdata attribute, but the AM
could specify any number of custom attributes, which could be fixed size
or varlen. It would be solely the AM's responsibility to set the values
of those attributes.


That's not the issue I was referring to. I was talking about the page
replacement code, which is not as simple now that we have a potentially
dynamic-size tuple; if tuples were different for different AMs, the
code would still have to be able to handle that case. Setting the values
in the tuple itself is not too complicated.

I don't see the problem with that. We deal with variable-sized tuples in heap pages all the time. The max size of amdata (or the extra AM-specific columns) is going to be determined by the block size, though.
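
The size limit amounts to a check like the one below, sketched in Python. The overhead figure is a placeholder for page and tuple headers, not an exact number, and the function names are made up for this example; 8192 is the default block size.

```python
BLOCK_SIZE = 8192       # default PostgreSQL block size
ASSUMED_OVERHEAD = 64   # placeholder for page/tuple headers, not exact


def max_am_bytes(common_bytes: int,
                 block_size: int = BLOCK_SIZE,
                 overhead: int = ASSUMED_OVERHEAD) -> int:
    """Bytes left for AM-specific data once the common part is stored."""
    return max(0, block_size - overhead - common_bytes)


def fits(common_bytes: int, am_bytes: int,
         block_size: int = BLOCK_SIZE,
         overhead: int = ASSUMED_OVERHEAD) -> bool:
    """True if the whole sequence tuple still fits in a single block."""
    return am_bytes <= max_am_bytes(common_bytes, block_size, overhead)
```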

- Heikki

