I'm planning to develop an extension, and I'm here for getting some
help. But I would like to share the problem I intend to solve. Maybe my
desired solution is not a good option.
What I have:
* a lot of data being generated every day, which are mainly
queried by an immutable column of type date or timestamp;
* as a standard, almost every table has a bigserial id column
as a primary key;
* data is huge enough to demand table partitioning, which is
implemented as suggested in Postgres documentation, by using triggers
and table inheritance. A function called by cron deal with creation of
What I would like to develop first is a custom type (let's call it
datedserial) for replacing bigserial as the primary key:
* the type would be 8 bytes long, being 4 dedicated to storing
the Date, and 4 dedicated to storing a serial within that day;
* the text representation of the type would show its date and
its serial number (something like '2015-10-02.0000007296' as a canonical
form, but which could accept inputs like '20151002.0000007296');
* as a consequence of this internal representation, the serial
part could not be greater than 4 billion and some;
* support for operator classes allowing the type being used in
GIN and GIST indexes would be optional for now.
That would allow me to have a compact primary key which I could use
to partition the table based on the object's date. That would also allow
me to partition detail tables on the foreign key column having this data
type. Besides that, just by examining the value, mainly when used as a
foreign key, I could infer where the record belongs to.
When I have a working custom data type, I would go to the next and
harder part. I would like to create a new structure like a sequence, and
it should behave exactly like sequences, but separated by a date space.
So I would have functions similar to the following:
* createsequencegroup(sequence_group_name text): create a new
named structure for managing the sequence group;
* nextval(sequence_group_name text, context_date date): return
next value of the sequence (as a datedserial) belonging to the sequence
group and associated with the context date. The value returned have the
context_date in its date part and the next value for that date in the
sequence part. The first call for a specific date would return 1 for the
sequence part. Concerning to concurrency and transactions, the function
behaves exactly like nextval(sequence_group_name text);
* currval(sequence_group_name text, context_date date): the
currval function counterpart;
* setval(sequence_group_name text, context_date date, int4
value): the setval function counterpart;
* freeze_before(sequence_group_name text, freeze_date date):
disallow using the sequence group with context dates before the freeze_date.
I would consider extending the data type to allow including
information about the cluster which generated the values. This way, the
user could set a configuration entry defining a byte value for
identifying the cluster among others involved in replication, so that
the sequence group could have different sequences not only for different
dates, but for different nodes as well.
As I've said, I would like to package the resulting work as an
For now, I would like some help about where to start. I've
downloaded the postgres source code and have successfully compiled it
using my Ubuntu desktop, although have not tested the resulting binary.
Should I create a folder in the contrib directory and use another
extension as a starting point? Is this the recommended path? Or is this
too much and I should create a separate project?
Thanks in advance.
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription: