Re: [HACKERS] Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

2007-03-02 Thread Bruno Wolff III
On Thu, Mar 01, 2007 at 11:26:23 +0100,
  Florian G. Pflug [EMAIL PROTECTED] wrote:
 
 But just postponing nextval() until after the uniqueness checks
 only decreases the *probability* of non-monotonic values, and
 *does not* prevent them. Consider two transactions:
 
 A: begin ;
 B: Begin ;
 A: insert ... -- IDENTITY generates value 1
 B: insert .. -- IDENTITY generates value 2
 A: rollback ;
 B: commit ;
 
 Now there is a record with IDENTITY 2, but not with 1. The *only*
 way to fix this is to *not* use a sequence, but rather do
 lock table t in exclusive mode ;
 select max(identity)+1 from t ;
 to generate the identity - but of course this prevents any concurrent
 inserts, which will make this unusable for any larger database.

While this demonstrates that you can get holes in the sequence, it doesn't
show an example that is not monotonic.

 Note that this is not a deficiency of postgres sequences - there is no
 way to guarantee strictly monotonic values while allowing concurrent
 selects at the same time. (Other than lazily assigning the values, but
 this needs to be done by the application)

Within a single session, and barring wrap-around, you will get monotonically
increasing values. You are correct that there is no such guarantee between
separate sessions that overlap in time.
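The interleaving described above can be simulated outside the database; here is a toy Python sketch (a hypothetical in-memory sequence, not PostgreSQL code) showing that a rollback leaves a hole while the assignment order stays monotonic:

```python
import itertools

class Sequence:
    """Toy stand-in for a PostgreSQL sequence: nextval() is never
    rolled back, even if the calling transaction aborts."""
    def __init__(self):
        self._counter = itertools.count(1)
    def nextval(self):
        return next(self._counter)

seq = Sequence()
committed = []

# Transactions A and B interleave as in the example above.
a_val = seq.nextval()    # A: insert ... -- IDENTITY generates value 1
b_val = seq.nextval()    # B: insert ... -- IDENTITY generates value 2
# A: rollback -- a_val is discarded; the sequence is not rewound
committed.append(b_val)  # B: commit

assert committed == [2]  # a hole: no committed row with value 1
assert a_val < b_val     # but values were handed out monotonically
```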

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

2007-03-01 Thread Florian G. Pflug

Zoltan Boszormenyi wrote:

The GENERATED column is an ease-of-use feature
with possibly having less work, whereas the IDENTITY
column is mandatory for some applications (e.g. accounting
and billing is stricter in some countries) where you simply
cannot skip a value in the sequence; strict monotonicity is
not enough.


But just postponing nextval() until after the uniqueness checks
only decreases the *probability* of non-monotonic values, and
*does not* prevent them. Consider two transactions:

A: begin ;
B: Begin ;
A: insert ... -- IDENTITY generates value 1
B: insert .. -- IDENTITY generates value 2
A: rollback ;
B: commit ;

Now there is a record with IDENTITY 2, but not with 1. The *only*
way to fix this is to *not* use a sequence, but rather do
lock table t in exclusive mode ;
select max(identity)+1 from t ;
to generate the identity - but of course this prevents any concurrent
inserts, which will make this unusable for any larger database.

Note that this is not a deficiency of postgres sequences - there is no
way to guarantee strictly monotonic values while allowing concurrent
selects at the same time. (Other than lazily assigning the values, but
this needs to be done by the application)

I agree that it'd be nice to generate the identity columns as late as
possible to prevent needless gaps, but not if the price is a far more
intrusive patch, or much higher complexity.

greetings, Florian Pflug




Re: [HACKERS] Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

2007-03-01 Thread Zoltan Boszormenyi

Florian G. Pflug wrote:

Zoltan Boszormenyi wrote:

The GENERATED column is an ease-of-use feature
with possibly having less work, whereas the IDENTITY
column is mandatory for some applications (e.g. accounting
and billing is stricter in some countries) where you simply
cannot skip a value in the sequence; strict monotonicity is
not enough.


But just postponing nextval() until after the uniqueness checks
only decreases the *probability* of non-monotonic values, and
*does not* prevent them. Consider two transactions:

A: begin ;
B: Begin ;
A: insert ... -- IDENTITY generates value 1
B: insert .. -- IDENTITY generates value 2
A: rollback ;
B: commit ;


I can understand that. But your example is faulty,
you can't have a transaction inside a transaction.
Checkpoints are another story. 8-)

You can have some application tricks to
have a continuous sequence today with regular
serials, but only if you don't have a unique index
that doesn't use the serial column. Insert
a record into that table outside the transaction,
making note of the serial value.

If subsequent processing fails (because of a unique
or check constraint, etc.) you have to go back to the main
table and modify the record, indicating that the record
doesn't represent valid data. But you must keep it with
the serial value it was assigned. I have seen systems
requiring this. My point is that with the identity
column, you will be able to define a unique index
on the table that excludes the identity column.
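The application-level trick described above might be sketched like this (a toy in-memory model; the table, the `valid` flag, and the helper names are illustrative, not from the original mail):

```python
# Toy model of the workaround: claim the serial first, and on a later
# failure mark the row invalid instead of deleting it, so the serial
# column stays gap-free.
invoices = {}        # serial -> row
_next_serial = [1]

def claim_serial():
    """Step 1: insert a placeholder row outside the main transaction,
    noting the serial value (stands in for a committed INSERT)."""
    serial = _next_serial[0]
    _next_serial[0] += 1
    invoices[serial] = {"valid": True, "data": None}
    return serial

def finish(serial, data, process):
    """Step 2: run the real processing; on failure keep the row but
    mark it as not representing valid data."""
    try:
        invoices[serial]["data"] = process(data)
    except Exception:
        invoices[serial]["valid"] = False

s1 = claim_serial()
finish(s1, "ok", lambda d: d.upper())
s2 = claim_serial()
finish(s2, "bad", lambda d: 1 / 0)   # later processing fails

# No gaps: serials 1 and 2 both exist; the failed one is marked invalid.
assert sorted(invoices) == [1, 2]
assert invoices[1]["valid"] and not invoices[2]["valid"]
```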


Now there is a record with IDENTITY 2, but not with 1. The *only*
way to fix this is to *not* use a sequence, but rather do
lock table t in exclusive mode ;
select max(identity)+1 from t ;
to generate the identity - but of course this prevents any concurrent
inserts, which will make this unusable for any larger database.

Note that this is not a deficiency of postgres sequences - there is no
way to guarantee strictly monotonic values while allowing concurrent
selects at the same time. (Other than lazily assigning the values, but
this needs to be done by the application)


Agreed.


I agree that it'd be nice to generate the identity columns as late as
possible to prevent needless gaps, but not if the price is a far more
intrusive patch, or much higher complexity.


Intrusive, hm? The catalog has to indicate that the column
is IDENTITY, otherwise you cannot know it.

The cost I am thinking of now is an extra heap_update()
after heap_insert() without generating the identity value,
and inserting index tuples into the indexes that don't
contain the identity column.

Best regards,
Zoltán Böszörményi




Re: [HACKERS] Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

2007-03-01 Thread Zoltan Boszormenyi

Zoltan Boszormenyi wrote:

Florian G. Pflug wrote:

Zoltan Boszormenyi wrote:

The GENERATED column is an ease-of-use feature
with possibly having less work, whereas the IDENTITY
column is mandatory for some applications (e.g. accounting
and billing is stricter in some countries) where you simply
cannot skip a value in the sequence; strict monotonicity is
not enough.


But just postponing nextval() until after the uniqueness checks
only decreases the *probability* of non-monotonic values, and
*does not* prevent them. Consider two transactions:

A: begin ;
B: Begin ;
A: insert ... -- IDENTITY generates value 1
B: insert .. -- IDENTITY generates value 2
A: rollback ;
B: commit ;


I can understand that. But your example is faulty,
you can't have a transaction inside a transaction.
Checkpoints are another story. 8-)

You can have some application tricks to
have a continuous sequence today with regular
serials, but only if you don't have a unique index
that doesn't use the serial column. Insert
a record into that table outside the transaction,
making note of the serial value.

If subsequent processing fails (because of a unique
or check constraint, etc.) you have to go back to the main
table and modify the record, indicating that the record
doesn't represent valid data. But you must keep it with
the serial value it was assigned. I have seen systems
requiring this. My point is that with the identity
column, you will be able to define a unique index
on the table that excludes the identity column.


Now there is a record with IDENTITY 2, but not with 1. The *only*
way to fix this is to *not* use a sequence, but rather do
lock table t in exclusive mode ;
select max(identity)+1 from t ;
to generate the identity - but of course this prevents any concurrent
inserts, which will make this unusable for any larger database.

Note that this is not a deficiency of postgres sequences - there is no
way to guarantee strictly monotonic values while allowing concurrent
selects at the same time. (Other than lazily assigning the values, but
this needs to be done by the application)


Agreed.


I agree that it'd be nice to generate the identity columns as late as
possible to prevent needless gaps, but not if the price is a far more
intrusive patch, or much higher complexity.


Intrusive, hm? The catalog has to indicate that the column
is IDENTITY, otherwise you cannot know it.

The cost I am thinking of now is an extra heap_update()
after heap_insert() without generating the identity value,
and inserting index tuples into the indexes that don't
contain the identity column.


And as far as I have tested the current state, there is no cost
if you don't use GENERATED or IDENTITY.
The extra heap_update() would be performed only
if you have an IDENTITY column.


Best regards,
Zoltán Böszörményi







Re: [HACKERS] Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

2007-03-01 Thread Florian G. Pflug

Zoltan Boszormenyi wrote:

Florian G. Pflug wrote:

Zoltan Boszormenyi wrote:

The GENERATED column is an ease-of-use feature
with possibly having less work, whereas the IDENTITY
column is mandatory for some applications (e.g. accounting
and billing is stricter in some countries) where you simply
cannot skip a value in the sequence; strict monotonicity is
not enough.


But just postponing nextval() until after the uniqueness checks
only decreases the *probability* of non-monotonic values, and
*does not* prevent them. Consider two transactions:

A: begin ;
B: Begin ;
A: insert ... -- IDENTITY generates value 1
B: insert .. -- IDENTITY generates value 2
A: rollback ;
B: commit ;


I can understand that. But your example is faulty,
you can't have a transaction inside a transaction.
Checkpoints are another story. 8-)


A: and B: are meant to denote *two* *different*
transactions running concurrently.


You can have some application tricks to
have a continuous sequence today with regular
serials, but only if you don't have a unique index
that doesn't use the serial column. Insert
a record into that table outside the transaction,
making note of the serial value.

If subsequent processing fails (because of a unique
or check constraint, etc.) you have to go back to the main
table and modify the record, indicating that the record
doesn't represent valid data. But you must keep it with
the serial value it was assigned. I have seen systems
requiring this. My point is that with the identity
column, you will be able to define a unique index
on the table that excludes the identity column.


Yes, of course you can prevent gaps by just filling them
with garbage/invalid records or whatever. But I don't see
why this is useful - either you want, say, your invoice
numbers to be continuous because it's required by law - or
you don't. But if the law requires your invoice numbers to be
continuous, surely just filling the gaps with fake invoices
is just as illegal as having gaps in the first place.


I agree that it'd be nice to generate the identity columns as late as
possible to prevent needless gaps, but not if the price is a far more
intrusive patch, or much higher complexity.


Intrusive, hm? The catalog has to indicate that the column
is IDENTITY, otherwise you cannot know it.

The cost I am thinking of now is an extra heap_update()
after heap_insert() without generating the identity value,
and inserting index tuples into the indexes that don't
contain the identity column.


I'll have to admit that I haven't actually looked at your patch -
so sorry if I misunderstood things. I got the impression that
Tom's main complaint was that you are shuffling too much existing code
around in your patch, and I figured that this is partly because you
try to generate the IDENTITY value as late as possible. Since doing
this won't prevent gaps, but just reduces the probability of creating
them, I thought that a way around Tom's concerns might be to drop
that requirement.

I will shut up now, at least until I have read the patch ;-)

greetings, Florian Pflug





Re: [HACKERS] Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

2007-03-01 Thread Zoltan Boszormenyi

Florian G. Pflug wrote:

Yes, of course you can prevent gaps by just filling them
with garbage/invalid records or whatever. But I don't see
why this is useful - either you want, say, your invoice
numbers to be continuous because it's required by law - or
you don't. But if the law requires your invoice numbers to be
continuous, surely just filling the gaps with fake invoices
is just as illegal as having gaps in the first place.


Not fake invoices - ones cancelled (storno) for whatever reason.
But you have to keep the record to show you didn't delete anything.

Best regards,
Zoltán Böszörményi




Re: [HACKERS] Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

2007-03-01 Thread Zoltan Boszormenyi

Zoltan Boszormenyi wrote:

The cost I am thinking of now is an extra heap_update()
after heap_insert() without generating the identity value,
and inserting index tuples into the indexes that don't
contain the identity column.


And as far as I have tested the current state, there is no cost
if you don't use GENERATED or IDENTITY.
The extra heap_update() would be performed only
if you have an IDENTITY column.


The modification I imagined is actually working:
- skip indexes using the identity columns
- do a simple_heap_update() after all other columns are
 assigned and index tuples are inserted
- do ExecInsertIndexTuples() on the indexes referencing
 the IDENTITY column

However, I get warning messages like:

WARNING:  detected write past chunk end in ExecutorState 0xaaff68

How can I prevent them?

Best regards,
Zoltán Böszörményi




Re: [HACKERS] Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

2007-03-01 Thread Tom Lane
Zoltan Boszormenyi [EMAIL PROTECTED] writes:
 However, I get warning messages like:
 WARNING:  detected write past chunk end in ExecutorState 0xaaff68
 How can I prevent them?

Find the memory-clobbering bug in your patch.

regards, tom lane



Re: [HACKERS] Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

2007-03-01 Thread Zoltan Boszormenyi

Tom Lane wrote:

Zoltan Boszormenyi [EMAIL PROTECTED] writes:
  

However, I get warning messages like:
WARNING:  detected write past chunk end in ExecutorState 0xaaff68
How can I prevent them?



Find the memory-clobbering bug in your patch.

regards, tom lane
  


Thanks, I found it.

Best regards,
Zoltán Böszörményi




[HACKERS] Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

2007-02-28 Thread Zoltan Boszormenyi

Hi,

I would like to be able to tighten the conditions
for generating IDENTITY columns so that the
events below run in this order:

- assign values for regular columns (with or without DEFAULT)
- NOT NULL checks on regular columns
- CHECK constraints whose expression contains only regular columns

- assign values for GENERATED columns
- NOT NULL checks on GENERATED columns
- CHECK constraints whose expression may contain regular
 or GENERATED columns

- UNIQUE index checks that involve only regular columns
- UNIQUE index checks that may involve regular or GENERATED columns

- assign values for IDENTITY column
- NOT NULL on IDENTITY
- CHECK constraints on IDENTITY
- UNIQUE index checks that can reference IDENTITY column

At this point the heap tuple and the index tuple can be inserted
without further checks.
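The intent of this ordering - draw the IDENTITY value only after every check that could still reject the tuple - can be sketched in Python (a toy model, not executor code; the function and field names are illustrative):

```python
import itertools

seq = itertools.count(1)  # stands in for the IDENTITY sequence

def insert_row(row, checks):
    """Sketch of the proposed order: run every check that does not
    need the IDENTITY value first; only draw from the sequence once
    the tuple can no longer be rejected for other reasons."""
    for check in checks:      # NOT NULL, CHECK, UNIQUE on other columns
        if not check(row):
            raise ValueError("rejected before consuming a sequence value")
    row["id"] = next(seq)     # IDENTITY assigned last
    return row

ok = insert_row({"x": 1}, [lambda r: r["x"] is not None])
try:
    insert_row({"x": None}, [lambda r: r["x"] is not None])
except ValueError:
    pass                      # rejected, sequence untouched
next_ok = insert_row({"x": 2}, [lambda r: r["x"] is not None])

assert ok["id"] == 1
assert next_ok["id"] == 2     # the failed insert left no gap
```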

Currently tuple->t_self is required by ExecInsertIndexTuples(),
and I don't see any way to make IDENTITY work the way it's
intended other than to mix heap_insert()/heap_update() and
ExecInsertIndexTuples() together and use the result in
ExecInsert() and ExecUpdate().

Would it be acceptable?

Best regards,
Zoltán Böszörményi




Re: [HACKERS] Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

2007-02-28 Thread Tom Lane
Zoltan Boszormenyi [EMAIL PROTECTED] writes:
 Would it be acceptable?

No, because you can't create index entries when you haven't yet got the
TID for the heap tuple.  What do you propose doing, insert a dummy index
entry and then go back to fill it in later?  Aside from approximately
doubling the work involved, this is fundamentally broken because no
other backend could know what to do upon encountering the dummy index
entry --- there's no way for it to check if the entry references a live
tuple or not.  Not to mention that a crash here will leave a permanently
dummy index entry that there's no way to vacuum.

The other rearrangements you suggest are not any more acceptable;
we are not going to restructure the entire handling of defaults and
check constraints around a single badly-designed SQL2003 feature.

regards, tom lane



Re: [HACKERS] Is there a way to run heap_insert() AFTER ExecInsertIndexTuples() ?

2007-02-28 Thread Zoltan Boszormenyi

Hi,

Tom Lane wrote:

Zoltan Boszormenyi [EMAIL PROTECTED] writes:
  

Would it be acceptable?



No, because you can't create index entries when you haven't yet got the
TID for the heap tuple.  What do you propose doing, insert a dummy index
entry and then go back to fill it in later?  Aside from approximately
  


No, I was thinking about breaking up e.g. heap_insert()
so it can be mixed with ExecInsertIndexTuples(), so that I get a
pinned buffer and have the heap tuple's t_self set first,
then do the uniqueness checks step by step.
BTW, I can use modify_tuple() after doing
RelationGetBufferForTuple() and RelationPutHeapTuple(),
right?


doubling the work involved, this is fundamentally broken because no
  


Well, the work wouldn't be doubled, as all the unique indexes
have to be checked anyway in the current scheme, too, to have
the tuple accepted into the database.


other backend could know what to do upon encountering the dummy index
entry --- there's no way for it to check if the entry references a live
tuple or not.  Not to mention that a crash here will leave a permanently
dummy index entry that there's no way to vacuum.

The other rearrangements you suggest are not any more acceptable;
we are not going to restructure the entire handling of defaults and
check constraints around a single badly-designed SQL2003 feature.
  


My IDENTITY/GENERATED patch currently breaks up the
checks this way (CHECK constraints are prohibited
for the special-case columns):

- normal columns are assigned values (maybe using DEFAULT)
- check NOT NULLs and CHECKs for normal columns

( Up to this point this works the same way as before if you
  use neither IDENTITY nor GENERATED. )

- assign GENERATED columns their values
- check NOT NULLs for GENERATED
- assign IDENTITY its value
- check NOT NULL for IDENTITY

and

- check UNIQUE for everything

Identity would be special, so it doesn't inflate the sequence
if avoidable. Currently a value is consumed whenever UNIQUE fails
for any index, which still very much makes it unusable.

What I would like to achieve is for IDENTITY to skip
a sequence value and fail to be INSERTed only if the IDENTITY
column's unique check fails. That pretty much means
that there is already a record with that IDENTITY value,
regardless of whether the UNIQUE index is defined on only the IDENTITY
column or the IDENTITY column is part of a multi-column
UNIQUE index.

If I could break up the order of events the way I described
in my first mail, I could re-enable having CHECK constraints
on both IDENTITY and GENERATED columns.

The point with GENERATED is that you have to have
all other columns assigned values BEFORE
being able to compute a GENERATED column
that references other columns in its expression, so
you _have to_ break up the current order of computing
DEFAULTs. I know a computed column could be done
either in the application or with SELECTs, but compare
the amount of work: if you do it in the SELECT, you have to
compute the expressions every time the SELECT is run,
making it slower. Doing it on UPDATE or INSERT
makes it LESS work in a workload with fewer INSERTs/UPDATEs
and heavy SELECTs. Of course, for a heavy UPDATE
workload it makes it more work, but only if you actually
use GENERATED columns. It means exactly the same
amount of work if you use IDENTITY as with SERIAL;
it's just done in a different order.

The GENERATED column is an ease-of-use feature
with possibly having less work, whereas the IDENTITY
column is mandatory for some applications (e.g. accounting
and billing is stricter in some countries) where you simply
cannot skip a value in the sequence; strict monotonicity is
not enough.

Best regards,
Zoltán Böszörményi

