On Wed, Apr 12, 2023, at 6:24 PM, Thorsten Glaser wrote:
> On Wed, 12 Apr 2023, Kirk Wolak wrote:
> 
> >The solution exists.  Pre-fetch the IDs, assign them and insert them with
> >the IDs.  Then you have 100% control.
> >SELECT NEXTVAL('tbl_seq') from GENERATE_SERIES(1, <total_needed>);
> 
> This would be a nice solution… but the PK could be always generated,
> or not even sequential (UUIDs for example). If you’re developing
> schema-first the application would not even (need to) know about how
> the IDs are generated as it’s the DB that generates them.
> 
> AIUI the OP’s an SQLAlchemy developer, which is an ORM. Therefore he
> needs a generic solution.

im my view I already have the solution which is the form I referred to in my 
earlier email:

INSERT INTO mytable (a, b) 
SELECT p1, p2 FROM (VALUES (10, 11, 1), (12, 13, 2)) AS sel(p1, p2, num) ORDER 
BY num 
RETURNING mytable.id

the columns we want to INSERT, a and b, come from p1 and p2 in the SELECT. the 
third value in each table-row value inside the VALUES clause we call the 
"sentinel" and we apply it as an incrementing number which we also ORDER BY.


per Tom Lane's email at 
https://www.postgresql.org/message-id/29386.1528813619%40sss.pgh.pa.us in 2018, 
assuming mytable.id is serial (or I would assume IDENTITY), this resolves to 
something equivalent to:

INSERT INTO mytable (id, a, b)
SELECT nextval("mytable_id_seq"), p1, p2 FROM (
    SELECT p1, p2 FROM (VALUES (10, 11, 1), (12, 13, 2)) AS sel(p1, p2, num) 
ORDER BY num
) AS ss RETURNING mytable.id


So the above is one part we want to confirm is accurate, hasn't changed, is not 
going to change much as far as its ultimate behavior.  It means that as rows 
are generated from the SELECT, the ascending values for "mytable_id_seq" will 
correspond to the ascending values for "num".  That's the important part; we 
are sending in an number that controls the sorting of records as they are 
delivered outwards, and the generated identifiers, assuming they are 
incrementing, will sort in that same way as each value is associated with the 
rows emanating from the inner SELECT.  it does not matter if the sequence has 
gaps in it from other transactions or whatever, only that it increments in the 
same direction as "num" which we pass in.

In this sequence of events, the INSERT conceptually hasn't even happened for a 
particular row before the row has been associated with a sequence value.  
INSERT can do whatever it wants, it can break out into 20 threads and jumble up 
all the rows or whatever.  When it sends them out in RETURNING, whatever order 
it comes back is no problem; we just want those "mytable_id_seq" values back.  
We sort by that when we get the rows back.  This ordering matches the one we 
put in.  So the way these threads often seem to discuss how INSERTS and 
RETURNING are both not ordered are missing this particular point, that we 
really just need a way to relate server generated values to the rows we put in. 
 

This form comes from two places, one is from Microsoft SQL Server which 
documents as well as guaranteed to maintain order at 
https://learn.microsoft.com/en-us/sql/t-sql/statements/insert-transact-sql?view=sql-server-ver16;
 "INSERT queries that use SELECT with ORDER BY to populate rows guarantees how 
identity values are computed but not the order in which the rows are inserted." 
; for PostgreSQL, the 2018 thread above seems to confirm a similar behavior for 
PostgreSQL.

So we were looking for feedback from PG developers on the above assumptions.  
Beyond that, throughout all the theoretical talk about "well PG might change 
this someday", while PG most certainly can and maybe will change some of these 
assumptions, it would produce a lot of surprises for many users who knowingly 
or unknowingly have relied on various orderings of this way and it would likely 
lead to some very good documentation about this area of functionality, maybe 
even some new syntaxes or keywords to help with this extremely common use case. 
 Documentation for these things is what the various threads on this subject 
often ask for.  The first such thread I can find asking about this, which also 
ended kind of ambiguously, was in 2012 here: 
https://www.mail-archive.com/pgsql-hackers@postgresql.org/msg204156.html  .   
Then there's the 2018 thread, and also a bunch on stackoverflow.  Everyone is 
left guessing or not totally sure since there's no official documentation for 
this.

We're going to document in SQLAlchemy the approaches we are using for different 
backends and I wouldnt be surprised if when people google for "PostgreSQL 
INSERT RETURNING ORDERING" our own docs are going to come up near the top.  
That seems to happen a lot.



Reply via email to