On Wed, Apr 12, 2023, at 6:24 PM, Thorsten Glaser wrote:
> On Wed, 12 Apr 2023, Kirk Wolak wrote:
>
> >The solution exists. Pre-fetch the IDs, assign them and insert them with
> >the IDs. Then you have 100% control.
> >SELECT NEXTVAL('tbl_seq') from GENERATE_SERIES(1, <total_needed>);
>
> This would be a nice solution… but the PK could be always generated,
> or not even sequential (UUIDs for example). If you’re developing
> schema-first the application would not even (need to) know about how
> the IDs are generated as it’s the DB that generates them.
>
> AIUI the OP’s an SQLAlchemy developer, which is an ORM. Therefore he
> needs a generic solution.
im my view I already have the solution which is the form I referred to in my
earlier email:
INSERT INTO mytable (a, b)
SELECT p1, p2 FROM (VALUES (10, 11, 1), (12, 13, 2)) AS sel(p1, p2, num) ORDER
BY num
RETURNING mytable.id
the columns we want to INSERT, a and b, come from p1 and p2 in the SELECT. the
third value in each table-row value inside the VALUES clause we call the
"sentinel" and we apply it as an incrementing number which we also ORDER BY.
per Tom Lane's email at
https://www.postgresql.org/message-id/29386.1528813619%40sss.pgh.pa.us in 2018,
assuming mytable.id is serial (or I would assume IDENTITY), this resolves to
something equivalent to:
INSERT INTO mytable (id, a, b)
SELECT nextval("mytable_id_seq"), p1, p2 FROM (
SELECT p1, p2 FROM (VALUES (10, 11, 1), (12, 13, 2)) AS sel(p1, p2, num)
ORDER BY num
) AS ss RETURNING mytable.id
So the above is one part we want to confirm is accurate, hasn't changed, is not
going to change much as far as its ultimate behavior. It means that as rows
are generated from the SELECT, the ascending values for "mytable_id_seq" will
correspond to the ascending values for "num". That's the important part; we
are sending in an number that controls the sorting of records as they are
delivered outwards, and the generated identifiers, assuming they are
incrementing, will sort in that same way as each value is associated with the
rows emanating from the inner SELECT. it does not matter if the sequence has
gaps in it from other transactions or whatever, only that it increments in the
same direction as "num" which we pass in.
In this sequence of events, the INSERT conceptually hasn't even happened for a
particular row before the row has been associated with a sequence value.
INSERT can do whatever it wants, it can break out into 20 threads and jumble up
all the rows or whatever. When it sends them out in RETURNING, whatever order
it comes back is no problem; we just want those "mytable_id_seq" values back.
We sort by that when we get the rows back. This ordering matches the one we
put in. So the way these threads often seem to discuss how INSERTS and
RETURNING are both not ordered are missing this particular point, that we
really just need a way to relate server generated values to the rows we put in.
This form comes from two places, one is from Microsoft SQL Server which
documents as well as guaranteed to maintain order at
https://learn.microsoft.com/en-us/sql/t-sql/statements/insert-transact-sql?view=sql-server-ver16;
"INSERT queries that use SELECT with ORDER BY to populate rows guarantees how
identity values are computed but not the order in which the rows are inserted."
; for PostgreSQL, the 2018 thread above seems to confirm a similar behavior for
PostgreSQL.
So we were looking for feedback from PG developers on the above assumptions.
Beyond that, throughout all the theoretical talk about "well PG might change
this someday", while PG most certainly can and maybe will change some of these
assumptions, it would produce a lot of surprises for many users who knowingly
or unknowingly have relied on various orderings of this way and it would likely
lead to some very good documentation about this area of functionality, maybe
even some new syntaxes or keywords to help with this extremely common use case.
Documentation for these things is what the various threads on this subject
often ask for. The first such thread I can find asking about this, which also
ended kind of ambiguously, was in 2012 here:
https://www.mail-archive.com/[email protected]/msg204156.html .
Then there's the 2018 thread, and also a bunch on stackoverflow. Everyone is
left guessing or not totally sure since there's no official documentation for
this.
We're going to document in SQLAlchemy the approaches we are using for different
backends and I wouldnt be surprised if when people google for "PostgreSQL
INSERT RETURNING ORDERING" our own docs are going to come up near the top.
That seems to happen a lot.