Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence

2022-07-20 Thread Karsten Hilbert
Am Wed, Jul 20, 2022 at 09:15:29AM + schrieb Sebastien Flaesch:

> Thomas, we already have a similar solution.
> The idea is to use the native PostgreSQL SERIAL type.

Which does not guarantuee gaplessness.

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B




Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence

2022-07-20 Thread Sebastien Flaesch
Thomas, we already have a similar solution.
The idea is to use the native PostgreSQL SERIAL type.
Seb

From: Thomas Kellerer 
Sent: Wednesday, July 20, 2022 8:56 AM
To: pgsql-general@lists.postgresql.org 
Subject: Re: Concurrent INSERT statements with RETURNING clause resetting 
SERIAL sequence

EXTERNAL: Do not click links or open attachments if you do not recognize the 
sender.

Sebastien Flaesch schrieb am 19.07.2022 um 18:50:
> Tom,
>
> /If that's the behavior you want, you can build it out of standard SQL 
> facilities (e.g. update a one-row table).
> /
>
> Can you elaborate please?
>
> Do you mean the code should use an UPDATE on a one-row table to acquire a 
> lock?

I assume something like this:

https://urldefense.com/v3/__https://blog.sql-workbench.eu/post/gapless-sequence/__;!!I_DbfM1H!F7_2cNahve0cmwPMP6QBBwwpyP6UAum4ukFj71_21ebcxTKXZFtU0_3O6l1lfG5jYiKjO7wEzRt_E1GbJ9Q$






Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence

2022-07-20 Thread Francisco Olarte
Hi Gavan.

On Wed, 20 Jul 2022 at 00:10, Gavan Schneider  wrote:
> On 20 Jul 2022, at 4:08, Francisco Olarte wrote:
> As a remark, in Spain bill numbers need to be gapless increasing. I
> have done it with 
>
> One answer to this problem has been around for a while, and my version is 
> shown below.

If I read it correctly, leaving the 700 stuff aside, this function
gives you the first free transaction_ref. This is not valid for my
problem, I need them increasing, 1-3-2 is not a valid sequence. The
trick I use is that I can have "dummy" records, which do not have real
data ( or dates, which is what, among other things, makes me need them
increasing ), so I generate 1-3-4 and then insert 2 in batch which
values adequate for legal (i.e., I can just use 1.date for 2.date ).

I think what you do is generate account numbers which should be
gapless in the medium term and should be increasing as needed, wherte
the gapless is more important to you than the increasing.

> No extra table is needed, just a windowed look at the table where the 
> reference is used.

 It is normally never needed, thay are used solely for performance and
to avoid locking. A current_transaction_seq

> My usage is at human speed so performance is not an issue but it should be 
> pretty good if there are minimal holes. What is not addressed is that a 
> reference can be reissued upto the time the calling process commits an entry 
> in the table and takes that reference out of circulation. There are different 
> approaches to handling/preventing such collisions.

Your construct seems pretty expensive, and I'm not sure how much
locking it does at serializable. Also, given it needs recovery ( the
single row table plus back-fill does not, if you consider the
back-filling "bussines as usual" ). Also the reissuing of a number is
a no-go in my automated systems, it would need extensive recovery, in
that case I can use a gap-less approach by simpy selecting max(id) (
and given it is indexed, using a select for update on the max record
by ordering desc and limit 1, if I'm not too confused this would give
no gaps, increasing at the expense of reduced concurrency ).

...
> $COMMENT$
> Determines the next available reference number, making sure to fill any holes.

No going back allowed in my case. Simplifying it, a bill may reference
an older one in its data, and its number MUST be less. The gap problem
is there MUST be NO gaps WHEN I "print" them. And I can use dummies,
but I SHOULD not use them. In practice, you rarely use more than a
couple of dummies a year. I could strictly print dummies when I need
them, but they MUST be always the same, and the easier way is to
insert them.

Francisco Olarte.




Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence

2022-07-20 Thread Thomas Kellerer
Sebastien Flaesch schrieb am 19.07.2022 um 18:50:
> Tom,
>
> /If that's the behavior you want, you can build it out of standard SQL 
> facilities (e.g. update a one-row table).
> /
>
> Can you elaborate please?
>
> Do you mean the code should use an UPDATE on a one-row table to acquire a 
> lock?

I assume something like this:

https://blog.sql-workbench.eu/post/gapless-sequence/






Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence

2022-07-20 Thread Sebastien Flaesch

Changing a bit the CASE/WHEN logic in the RETURNING clause solves the issue 
when concurrently inserting rows without specifying explicitly a value for the 
serial column (INSERTs specifying values for the serial column are seldom used 
by concurrent programs inserting many rows):

insert into tab1 (name) VALUES ('aaa')
returning tab1.pkey,
  (select case
 when tab1.pkey>(select last_value from 
public.tab1_pkey_seq)
  then setval('public.tab1_pkey_seq',tab1.pkey,true)
 else 0
   end)


Example (assuming there is no unique constraint!):

  1.  INSERT without value for serial column => sequence=0+1 => last_value = 1
  2.  INSERT without value for serial column => sequence=1+1 => last_value = 2
  3.  INSERT with value 2 for serial column => tab1.pkey(2) > last_value(2) ? 
false => no sequence reset
  4.  INSERT without value for serial column => sequence=2+1 => last_value = 3
  5.  INSERT with value 4 for serial column => tab1.pkey(4) > last_value ? true 
=> setval(seqname,4,true)
  6.  INSERT without value for serial column => sequence=4+1 => last_value = 5

This will also save setval() calls for each INSERT not specifying a value 
explicitly for the serial column.

Stop me if I am wrong... 

Seb




Seb

From: Tom Lane 
Sent: Tuesday, July 19, 2022 5:41 PM
To: Sebastien Flaesch 
Cc: pgsql-general 
Subject: Re: Concurrent INSERT statements with RETURNING clause resetting 
SERIAL sequence

EXTERNAL: Do not click links or open attachments if you do not recognize the 
sender.

Sebastien Flaesch  writes:
> I try to update the underlying sequence of a SERIAL column, by using a 
> RETURNING clause in my INSERT statement, which is checking that the column 
> value is greater than the last_value of my sequence, and reset the sequence 
> with setval() if needed.

It's not too surprising that that doesn't work, if you're coding it
based on this assumption:

> The whole INSERT statement (including the code in the RETURNING clause), 
> should execute in a ATOMIC manner.

Sequence-related actions are always carried out immediately, they do
not participate in any atomicity guarantees about the calling transaction.
Without this, any sequence update would have to block all concurrent
uses of that sequence until they see whether the first update commits.

If that's the behavior you want, you can build it out of standard SQL
facilities (e.g. update a one-row table).  The point of sequence objects
is exactly to provide a feature with better concurrent performance,
at the cost of no rollback guarantees.

So, there's no bug here, and calling it one isn't going to change
anybody's mind about that.

regards, tom lane


Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence

2022-07-19 Thread Gavan Schneider

On 20 Jul 2022, at 4:08, Francisco Olarte wrote:


As a remark, in Spain bill numbers need to be gapless increasing. I
have done it with a sequence ( with cache 1, so I doubt it is much
more performant than a single row table, as it has a backing element
which needs to be saved ), and just used another process which swept
the values and filled the gap a posteriori ( there is a way to fill a
gap, has it legal uses, you can use it for that as long as you do not
have too many, basically insert cancelled bills in the gaps ). I
probably would do it with the one-row table now for simplicity, I
doubt I gained enough performance to justify the 20-something perl
lines plus the crontab entry to use sequences. As beard turns grey I
apreciate more the simpler ways to do things.

One answer to this problem has been around for a while, and my version 
is shown below. No extra table is needed, just a windowed look at the 
table where the reference is used. My usage is at human speed so 
performance is not an issue but it should be pretty good if there are 
minimal holes. What is not addressed is that a reference can be reissued 
upto the time the calling process commits an entry in the table and 
takes that reference out of circulation. There are different approaches 
to handling/preventing such collisions.


CREATE OR REPLACE FUNCTION accounts.next_active_reference()
RETURNS integer
LANGUAGE 'sql'
VOLATILE  LEAKPROOF STRICT
PARALLEL UNSAFE
COST 3000 -- pure guesstimate
AS $BODY$
SELECT L.transaction_ref + 1 AS start
FROM accounts.transaction_refs AS L
LEFT OUTER JOIN accounts.transaction_refs AS R
ON L.transaction_ref + 1 = R.transaction_ref
WHERE R.transaction_ref IS NULL
		AND L.transaction_ref >700 -- historic reasons only, added to existing 
system

ORDER BY L.transaction_ref
LIMIT 1;
$BODY$;

COMMENT ON FUNCTION accounts.next_active_reference() IS
$COMMENT$
	Determines the next available reference number, making sure to fill any 
holes.
	The holes in the sequence prior to 700 are ignored (pure history), and 
this code
	will make sure any out of sequence data blocks will not cause 
conflicts.

Credits:
	Ref: 

	2022 update: this link is now dead, only reporting "There is nothing 
here".

$COMMENT$;

Regards

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and 
wrong.

— H. L. Mencken, 1920


Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence

2022-07-19 Thread Francisco Olarte
On Tue, 19 Jul 2022 at 18:50, Sebastien Flaesch
 wrote:
>> If that's the behavior you want, you can build it out of standard SQL 
>> facilities (e.g. update a one-row table).
> Can you elaborate please?
> Do you mean the code should use an UPDATE on a one-row table to acquire a 
> lock?

That's the usual SQL way. You can, using the appropiate isolation
level, select the max() from the table using the IDs  ( but this will
probably lock it whole ), you can use a one-row table with a current
int column as a sequence, and set current=current+1, but this will
lock everyone inserting ( and need an extra table ). You can use a
name-current unique sequences table and update that ( it saves tables
but it can worsen locking, as many engines locks whole pages ). The
consecutive gapless sequence problem surfaces a lot, but there is no
good way to solve it without locking, as once you get the value
everybody else needing it has to wait until you decide whether to
commit on rollback.

Sequences are for when you need autogenerated keys. They use
consecutive integer blocks nearly everywhere because they are easy to
code, play well with btree indexes, correlate well with insertion
times, but they give up the "no hole" approach to improve performance.
They even can return values unordered in different sessions (
sequences acquire, and burn,  blocks in a locking way, when you need
high concurrency you make them return big blocks, but them values from
different sessions can be delivered out of order ).

If you need correlative numbers, byte the bullet and code it, trying
to do it with sequences will probably end with an approach which will
fail in some corner cases and be slower than just coding it with a
single row table ( just remember to check it, so you do not have the
fun which ensues in oracle when someone puts two rows in dual in the
old days ). All the queries you have posted are probably more complex
than a couple procedures to update and read a table, even using eval
to simulate named sequences if you want ( just remember to always
pre-increment before reading, there is a reason everybody does it that
way ).

As a remark, in Spain bill numbers need to be gapless increasing. I
have done it with a sequence ( with cache 1, so I doubt it is much
more performant than a single row table, as it has a backing element
which needs to be saved ), and just used another process which swept
the values and filled the gap a posteriori ( there is a way to fill a
gap, has it legal uses, you can use it for that as long as you do not
have too many, basically insert cancelled bills in the gaps ). I
probably would do it with the one-row table now for simplicity, I
doubt I gained enough performance to justify the 20-something perl
lines plus the crontab entry to use sequences. As beard turns grey I
apreciate more the simpler ways to do things.

Francisco Olarte.




Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence

2022-07-19 Thread Sebastien Flaesch
Tom,
If that's the behavior you want, you can build it out of standard SQL 
facilities (e.g. update a one-row table).
Can you elaborate please?

Do you mean the code should use an UPDATE on a one-row table to acquire a lock?

Seb


Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence

2022-07-19 Thread Tom Lane
Sebastien Flaesch  writes:
> I try to update the underlying sequence of a SERIAL column, by using a 
> RETURNING clause in my INSERT statement, which is checking that the column 
> value is greater than the last_value of my sequence, and reset the sequence 
> with setval() if needed.

It's not too surprising that that doesn't work, if you're coding it
based on this assumption:

> The whole INSERT statement (including the code in the RETURNING clause), 
> should execute in a ATOMIC manner.

Sequence-related actions are always carried out immediately, they do
not participate in any atomicity guarantees about the calling transaction.
Without this, any sequence update would have to block all concurrent
uses of that sequence until they see whether the first update commits.

If that's the behavior you want, you can build it out of standard SQL
facilities (e.g. update a one-row table).  The point of sequence objects
is exactly to provide a feature with better concurrent performance,
at the cost of no rollback guarantees.

So, there's no bug here, and calling it one isn't going to change
anybody's mind about that.

regards, tom lane




Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence

2022-07-19 Thread Sebastien Flaesch
Hello!

I try to update the underlying sequence of a SERIAL column, by using a 
RETURNING clause in my INSERT statement, which is checking that the column 
value is greater than the last_value of my sequence, and reset the sequence 
with setval() if needed.

When running several client processes in concurrently, using this INSERT 
statement, I get duplicate primary key errors.

The SQL code of my INSERT looks like this:

insert into tab1 (name) VALUES ('aaa')
returning tab1.pkey,
  (select case
 when tab1.pkey>=(select last_value from 
public.tab1_pkey_seq)
  then setval('public.tab1_pkey_seq',tab1.pkey,true)
 else 0
   end)

To me something is wrong with the behavior of PostgreSQL:

The whole INSERT statement (including the code in the RETURNING clause), should 
execute in a ATOMIC manner.

Obviously if several processes execute that code in parallel, and if the select 
last_value / setval() get mixed up, we are in trouble...

Can something confirm this is a PostgreSQL bug, or help me to find the right 
code to avoid the issue?

To reproduce:

  1.  Create tab1 table and stored procedure for testing with insert-tab1.sql
  2.  Run several psql processes in parallel, calling the stored procedure: sh 
./start2.sh (Linux/bash)

Thanks!

Seb


inserts-tab1.sql
Description: inserts-tab1.sql


call-proc.sql
Description: call-proc.sql


start2.sh
Description: start2.sh