[HACKERS] Generating Lots of PKs with nextval(): A Feature Proposal

2010-05-14 Thread Peter Crabtree
Recently, in preparation for migrating an application to postgres, I
got to this part of the manual (which is *excellent* so far, by the
way):

http://www.postgresql.org/docs/8.4/interactive/functions-sequence.html

A quick check with the folks on #postgresql confirmed my
understanding, which was that the locking semantics of setval() and
nextval() make this unsafe:

SELECT setval('my_seq', nextval('my_seq') + 500);
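
To make the problem concrete, here is one possible interleaving, with
made-up values (just a sketch):

-- Session A:
SELECT nextval('my_seq');             -- returns, say, 1000; A plans to use
                                      -- the next 500 values
-- Session B, before A's setval() runs:
SELECT nextval('my_seq');             -- returns 1001, which A believes it owns
-- Session A:
SELECT setval('my_seq', 1000 + 500);  -- nextval() now continues from 1501, but
                                      -- A and B both think 1001 is theirs

Nothing holds the sequence between the inner nextval() and the outer
setval(), so the "reserved" range was never actually reserved.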

Now, I was reminded that I could simply do this:

SELECT nextval('my_seq') FROM generate_series(1, 500);

But of course then I would have no guarantee of getting a contiguous
block of ids, which matters when I'm doing a mass insert of records that
refer to each other's ids (example: storing a directed, linear graph).
In that case I either have to do a correlated update on the client side
after transferring the keys back (consider the cost of doing that for a
few million records: roughly 4 MB of keys per million records against,
in extreme cases, 12 MB of data to be inserted -- 33% overhead in the
worst case, presuming symmetric bandwidth), or I have to insert into a
temporary table, have the backend do the remapping, and then insert from
there into the real table. Both options are imperfect in terms of
performance and complexity.
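
For concreteness, the second option looks roughly like this (a sketch
only; the staging table and the loading step are made up, and my_table
matches the example further down):

CREATE TEMP TABLE staging (pk bigint, next_node bigint);

-- bulk-load the rows keyed 0 .. n-1 into staging (COPY, INSERT ... SELECT, etc.)

-- map each local id to a freshly allocated sequence value
CREATE TEMP TABLE id_map AS
SELECT pk AS old_pk, nextval('my_seq') AS new_pk
FROM staging;

-- rewrite both columns through the map while inserting into the real table
INSERT INTO my_table (pk, next_node)
SELECT m1.new_pk, m2.new_pk
FROM staging s
JOIN id_map m1 ON m1.old_pk = s.pk
JOIN id_map m2 ON m2.old_pk = s.next_node;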

Thus, before I start work on it, I propose an extension to the current
nextval():

SELECT nextval('my_seq', 500);

This would advance the my_seq sequence by its increment * 500 and return
the first valid key. This both makes client code that needs a bunch of
PKs simpler to implement, and improves performance, since the client can
simply replace all its local PKs (presuming they're currently a
contiguous block from 0 to n-1) with my_starting_pk + current_pk, so
this:

 pk | next_node
----+-----------
  0 |         1
  1 |         2
  2 |         0

can be easily updated like this:

SELECT nextval('my_seq', (SELECT count(*) FROM my_table));
UPDATE my_table SET pk = currval('my_seq') + pk, next_node =
currval('my_seq') + next_node;

to something like this:

   pk   | next_node
--------+-----------
 521650 |    521651
 521651 |    521652
 521652 |    521650

This is a net gain of performance and ease of implementation in many
cases where a large number of ids from a sequence are needed -- with a
small added benefit of the keys being guaranteed to be contiguous.

I don't see any technical problems with this; postgres can already
pre-allocate more than one key, but the number is semi-static (the CACHE
parameter to CREATE SEQUENCE). This might break existing user code if
someone has defined their own nextval(regclass, integer), but I don't
see any way around that.
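
For reference, the existing pre-allocation knob looks like this; the
cache size is fixed when the sequence is defined and applies to every
session:

CREATE SEQUENCE my_seq CACHE 500;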

Finally, I've checked sequence.c -- this looks pretty straightforward
to implement, but I figured checking with this list was wise before
starting work. Apologies if I've been overly wordy.

Peter



Re: [HACKERS] Generating Lots of PKs with nextval(): A Feature Proposal

2010-05-14 Thread Peter Crabtree
On Fri, May 14, 2010 at 5:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Peter Crabtree peter.crabt...@gmail.com writes:
 Now, I was reminded that I could simply do this:

 SELECT nextval('my_seq') FROM generate_series(1, 500);

 But of course then I would have no guarantee that I would get a
 contiguous block of ids,

 The existing cache behavior will already handle that for you,
 I believe.  I don't really see a need for new features here.

I don't see how that works for this case, because the cache setting
is static, and also shared between sessions. So if I have 10 records
one time, and 100 records the next, and 587 the third time, what
should my CACHE be set to for that sequence?

And if I do ALTER SEQUENCE SET CACHE each time, I have either killed
concurrency (because I'm locking other sessions out of using that
sequence until I'm finished with it), or I have a race condition (if
someone else issues an ALTER SEQUENCE before I call nextval()). The
same problem exists with using ALTER SEQUENCE SET INCREMENT BY.
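
To spell out the race with a hypothetical interleaving (a sketch only):

-- Session A:
ALTER SEQUENCE my_seq INCREMENT BY 500;
-- Session B, before A calls nextval():
ALTER SEQUENCE my_seq INCREMENT BY 1;
-- Session A:
SELECT nextval('my_seq');   -- advances by 1, not 500; A never got its block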

Peter



Re: [HACKERS] Generating Lots of PKs with nextval(): A Feature Proposal

2010-05-14 Thread Peter Crabtree
On Fri, May 14, 2010 at 5:29 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, May 14, 2010 at 5:04 PM, hubert depesz lubaczewski
 dep...@depesz.com wrote:
 On Fri, May 14, 2010 at 02:07:27PM -0500, Kenneth Marshall wrote:
 Hi Peter,

 All you need to do is define your own sequence with an
 increment of 500. Look at:

 http://www.postgresql.org/docs/8.4/static/sql-createsequence.html

 This is often not enough. For example - I want the standard increment of 1,
 but right now I'm importing a large batch of objects, and it would be
 simpler for me to get the same number of ids. Preferably in one block.

 This is not achievable now. I know I can 'alter sequence set increment
 by' - but this will also affect concurrent sessions, which might not be
 a problem, but it's a side effect that I don't want.

 +1 for original proposition, would love to get it.

 If we do this, I'm inclined to think that the extra argument to
 nextval() should be treated as overriding the base increment rather
 than specifying a multiplier for it.  Other than that nitpick, it
 sounds like a reasonable thing to allow.


After giving it some thought, that sounds better. You gain some
functionality that way (temporarily overriding the increment) and lose
none.
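
To spell out the difference -- neither call exists today, this is only
the two readings being discussed:

CREATE SEQUENCE demo_seq INCREMENT BY 10;

-- multiplier reading: the second argument scales the sequence's own
-- increment, so nextval('demo_seq', 500) would advance the sequence by
-- 10 * 500 = 5000
-- override reading (as suggested above): the second argument replaces
-- the increment for that one call, so nextval('demo_seq', 500) would
-- advance the sequence by exactly 500, reserving a block of 500
-- consecutive values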

Peter
