Re: [HACKERS] Behavior of GENERATED columns per SQL2003

2007-05-14 Thread Bruce Momjian

URL added to TODO item.  Patch rejected for 8.3.

---

Tom Lane wrote:
 I've been studying the SQL spec in a bit more detail and I'm suddenly
 thinking that we've got the behavior all wrong in the current
 GENERATED/IDENTITY patch.  In particular, it looks to me like we've
 been implementing GENERATED ALWAYS AS (expr) according to the rules
 that the spec in fact lays down only for GENERATED ALWAYS AS IDENTITY.
 You'd think the two constructs would be pretty closely related but
 the behaviors specified by the spec are light-years apart.  If you
 look closely, a generated column in the meaning of section 4.14.8
 is one that has GENERATED ALWAYS AS (expr), and identity columns are
 *not* in this class.
 
 It looks to me like the behavior the spec intends for a generated column
 is actually that it can be implemented as a virtual column occupying
 no space on disk and instead computed on-the-fly when retrieved.
 Identity columns can have their values overridden by the
 user (it's a little harder if GENERATED ALWAYS, but still possible),
 and they don't change during an UPDATE unless specifically forced to.
 In contrast, generated columns cannot be overridden by
 assignment, and are recomputed from their base columns during updates.
 This realization also explains the following, otherwise rather strange,
 facts:
 
 * There is no GENERATED BY DEFAULT AS (expr) in the spec.
 
 * GENERATED expressions are specifically disallowed from containing
   subselects, calling functions that access any SQL-data, or being
   nondeterministic; hence their values depend solely on the regular
   columns in the same row.
 
 * While identity columns are updated (if needed) before execution of
   BEFORE triggers, generated columns are updated after BEFORE triggers;
   hence a BEFORE trigger can override the value in one case and not the
   other.  (The current patch gets this wrong, btw.)
 
 * Generated columns are forcibly updated when their base columns change
   as a result of FK constraints (such as ON UPDATE CASCADE).
 
 It looks to me like a BEFORE trigger is actually the only place that can
 (transiently) see values of a generated column that are different from
 the result of applying the generation expression on the rest of the row.
 It's unclear whether that's intentional or an oversight.
 
 Is anyone familiar with a database that implements SQL-spec generated
 columns?  Do they actually store the columns?
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 3: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Behavior of GENERATED columns per SQL2003

2007-05-11 Thread Richard Huxton

Tom Lane wrote:

Simon Riggs [EMAIL PROTECTED] writes:

On Thu, 2007-05-10 at 10:11 -0400, Tom Lane wrote:

As for GENERATED ALWAYS AS (expr), now that we understand that it's not
supposed to define a virtual column, what's the point?



We do need virtual columns, whether the spec requires them or not.


Agreed, they seem more useful than what the spec's got in mind.  You can
fake a virtual column using a view, but it's a whole lot more painful
than faking a GENERATED column using a trigger (at least if you wish the
view to be updatable).


If they could play a part in foreign keys that would be useful too.

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Behavior of GENERATED columns per SQL2003

2007-05-10 Thread Zoltan Boszormenyi

Tom Lane írta:

After some more study of the SQL spec, the distinction between GENERATED
ALWAYS AS IDENTITY and GENERATED BY DEFAULT AS IDENTITY is not what
I thought it was.

* As far as I can find from the spec, there is *no* difference between
the two cases for INSERT commands.  The rule is that you ignore any
user-supplied data and use the default (ie, nextval()) unless OVERRIDING
SYSTEM VALUE is specified.  It is not an error to try to insert data
into an identity column, it's just ignored unless OVERRIDING SYSTEM
VALUE.

* The difference for UPDATE commands is that you can update a BY DEFAULT
identity column to anything you want, whereas for an ALWAYS identity
it's an error to update to anything but DEFAULT (which causes a fresh
nextval() to be assigned).  Both behaviors are different from a
generated column, which is updated whether you mention it or not.
  


The quoted SIGMOD paper mentioned that specifying a value
for a generated column should raise an error in INSERT but
this behaviour is not mentioned by the standard.
BTW, do you know what's a self-referencing column?
I haven't found a definition of it and there are places where the standard
uses this term on behaviour that would be natural for generated columns.
E.g. page 860 in latest drafts, section 10.14, or SQL:2003, section 14.8,
about INSERT statement:  the value the user specified should be stored if
some underlying column of Ci is a self-referencing column and
OVERRIDING SYSTEM VALUE is specified.


This means that GENERATED BY DEFAULT AS IDENTITY is not at all
equivalent to our historical behavior for SERIAL columns and hence we
cannot merge the two cases.
  


Yes, they are equivalent if you read 5IWD2-02-Foundation-2006-04.pdf
or 5CD2-02-Foundation-2006-01.pdf, i.e. the latest two drafts.
(The latter seems to be misnamed considering that www.wiscorp.com
refreshed the sql200n.zip on 2007-03-11.) Page 860, section 14.10, INSERT.
The value the user provides should be accepted for storage if:
- the column is an identity column and you provide
 OVERRIDING SYSTEM VALUE, or
- the column is an GENERATED BY DEFAULT AS IDENTITY
 and you provide neither OVERRIDING USER VALUE nor
 the DEFAULT specification for the column.

I think the babble about OVERRIDING USER VALUE
is somewhat controversial. Why would you want to do e.g.
INSERT INTO tabname (id, ...) OVERRIDING USER VALUE (N, ...);
where N is an explicit constant? And I haven't even implemented
handling it. Anyway, without specifying OVERRIDING USER VALUE
the GENERATED BY DEFAULT AS IDENTITY is
equivalent with traditional SERIAL in PostgreSQL.
Implementing OVERRIDING USER VALUE behaviour means
that GENERATED BY DEFAULT AS IDENTITY (or SERIAL) would be
marked as an identity as well, not as a column simply having a DEFAULT 
clause.

Otherwise OVERRIDING USER VALUE would override every
user-specified value for regular columns having a DEFAULT expression.


The lack of any behavioral difference for INSERT seems surprising
and counterintuitive; have I just missed something in the spec?
  


No, I was just ahead of the times and read newer drafts than SQL:2003.


BTW, I found what they did about the problem that generated columns
are out of sync with their underlying columns during BEFORE-trigger
execution: in 11.39

	12)If BEFORE is specified, then: 
	   ...

   c) The triggered action shall not contain a field
  reference that references a field in the new transition
  variable corresponding to a generated column of T.
  


I vaguely remember reading it, although the idea seem to have remained
in my mind. :-)


IOW they just pretend you can't look.  So I think we need not worry
about leaving the values out-of-date until after the triggers fire.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

  



--
--
Zoltán Böszörményi
Cybertec Geschwinde  Schönig GmbH
http://www.postgresql.at/


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Behavior of GENERATED columns per SQL2003

2007-05-10 Thread Zoltan Boszormenyi

Zoltan Boszormenyi írta:

The quoted SIGMOD paper mentioned that specifying a value
for a generated column should raise an error in INSERT but
this behaviour is not mentioned by the standard.


I found it now, I haven't read hard enough before.
SQL:2003, section 14.8, syntax rules:

10) If contextually typed table value constructor CTTVC is specified, 
then every contextually typed row
   value constructor element simply contained in CTTVC whose 
positionally corresponding column name
   in insert column list references a column of which some underlying 
column is a generated column shall

   be a default specification.

So, I can only omit the generated column or specify DEFAULT.
Anything else should raise an error. Should it be done in analyze.c
or in rewriteHandler.c?

--
--
Zoltán Böszörményi
Cybertec Geschwinde  Schönig GmbH
http://www.postgresql.at/


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Behavior of GENERATED columns per SQL2003

2007-05-10 Thread Tom Lane
Zoltan Boszormenyi [EMAIL PROTECTED] writes:
 Tom Lane írta:
 This means that GENERATED BY DEFAULT AS IDENTITY is not at all
 equivalent to our historical behavior for SERIAL columns and hence we
 cannot merge the two cases.

 Yes, they are equivalent if you read 5IWD2-02-Foundation-2006-04.pdf
 or 5CD2-02-Foundation-2006-01.pdf, i.e. the latest two drafts.

Hm.  So what we've got here is that the committee has decided the 2003
spec is broken, and they may someday come out with a revised definition
that might, or might not, bear any resemblance to the current 200n
working papers.  According to some off-list discussion, nobody is
entirely sure what the current draft is trying to say anyway.

That about tears it for me: I think we should reject at least the
IDENTITY parts of this patch, and very likely the entire thing.  I've
spent more than three days now trying to get it into a committable form,
time I can't really afford to spend right now on a patch that adds such
a marginal feature.

AFAICS the only actual new feature that IDENTITY adds is the ability to
make the default expression silently override user-specified insertion
data, as in fact was the use-case argued by you here:
http://archives.postgresql.org/pgsql-hackers/2006-08/msg00038.php

Now we find that as the spec is actually written, it doesn't work that
way: if you try to specify a value other than DEFAULT for an inserted
column, you get a syntax error, unless you specify OVERRIDING SYSTEM
VALUE or OVERRIDING USER VALUE, neither of which are going to be found
in legacy apps ported from other DBMSes, and neither of which add any
actual new functionality (if you have to write OVERRIDING USER VALUE,
you could just as well not specify the column).  So I'm seeing a lot
of complexity and a lot of confusion added for not much, not even
considering the risks of trying to track a moving standards target.

As for GENERATED ALWAYS AS (expr), now that we understand that it's not
supposed to define a virtual column, what's the point?  You can get the
same behavior with a trivial BEFORE INSERT/UPDATE trigger that
recomputes the derived value, and you don't have to buy into the rather
ill-defined spec behavior (in particular the point that the generated
column is effectively undefined during trigger firing seems really
poorly done).  In fact, given that the only plausible use-cases involve
expressions that are expensive to compute, a trigger can probably do
*better* than the built-in feature, since it can make use of application
knowledge about when a recomputation is really necessary.  The current
patch recomputes the expression on every UPDATE, and would have a hard
time being any brighter than that, given that we don't know what BEFORE
triggers might do to the row.

So at this point I'm feeling that we've poured a lot of effort down a
hole :-(.  We are not really close to having a patch that implements
the current 200n draft (in particular note that OVERRIDING USER VALUE
is not a no-op according to this morning's understanding of the draft).
Even if we were, I'd vote against implementing a part of the draft
that's clearly still in flux --- if they change it again, we'd be stuck.
But the real bottom line is that I don't see enough use-case for these
features to justify the work done already, let alone a bunch more work.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Behavior of GENERATED columns per SQL2003

2007-05-10 Thread Simon Riggs
On Thu, 2007-05-10 at 10:11 -0400, Tom Lane wrote:

 As for GENERATED ALWAYS AS (expr), now that we understand that it's not
 supposed to define a virtual column, what's the point?  You can get the
 same behavior with a trivial BEFORE INSERT/UPDATE trigger that
 recomputes the derived value, and you don't have to buy into the rather
 ill-defined spec behavior (in particular the point that the generated
 column is effectively undefined during trigger firing seems really
 poorly done).  In fact, given that the only plausible use-cases involve
 expressions that are expensive to compute, a trigger can probably do
 *better* than the built-in feature, since it can make use of application
 knowledge about when a recomputation is really necessary.  The current
 patch recomputes the expression on every UPDATE, and would have a hard
 time being any brighter than that, given that we don't know what BEFORE
 triggers might do to the row.

We do need virtual columns, whether the spec requires them or not. They
would allow us to completely remove the column value when using
value-list based partitioning, giving considerable space savings for
VLDBs.

ISTM that we should interpret this as a requirement for a virtual
column. We can always move from that to a stored column if the spec
becomes more specific, though it would be harder to move the other way.
And as you point out, storing the value would make no sense.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Behavior of GENERATED columns per SQL2003

2007-05-10 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Thu, 2007-05-10 at 10:11 -0400, Tom Lane wrote:
 As for GENERATED ALWAYS AS (expr), now that we understand that it's not
 supposed to define a virtual column, what's the point?

 We do need virtual columns, whether the spec requires them or not.

Agreed, they seem more useful than what the spec's got in mind.  You can
fake a virtual column using a view, but it's a whole lot more painful
than faking a GENERATED column using a trigger (at least if you wish the
view to be updatable).

 ISTM that we should interpret this as a requirement for a virtual
 column. We can always move from that to a stored column if the spec
 becomes more specific, though it would be harder to move the other way.

If you're suggesting commandeering the spec's GENERATED ALWAYS syntax
to represent virtual columns, when the committee has made it clear that
that's not what they intend, I say that's sheer folly.  What will you do
when they tweak the spec to the point where a virtual column clearly
doesn't satisfy it?  If we want a nonstandard feature we should use a
nonstandard syntax for it.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Behavior of GENERATED columns per SQL2003

2007-05-09 Thread Tom Lane
After some more study of the SQL spec, the distinction between GENERATED
ALWAYS AS IDENTITY and GENERATED BY DEFAULT AS IDENTITY is not what
I thought it was.

* As far as I can find from the spec, there is *no* difference between
the two cases for INSERT commands.  The rule is that you ignore any
user-supplied data and use the default (ie, nextval()) unless OVERRIDING
SYSTEM VALUE is specified.  It is not an error to try to insert data
into an identity column, it's just ignored unless OVERRIDING SYSTEM
VALUE.

* The difference for UPDATE commands is that you can update a BY DEFAULT
identity column to anything you want, whereas for an ALWAYS identity
it's an error to update to anything but DEFAULT (which causes a fresh
nextval() to be assigned).  Both behaviors are different from a
generated column, which is updated whether you mention it or not.

This means that GENERATED BY DEFAULT AS IDENTITY is not at all
equivalent to our historical behavior for SERIAL columns and hence we
cannot merge the two cases.

The lack of any behavioral difference for INSERT seems surprising
and counterintuitive; have I just missed something in the spec?

BTW, I found what they did about the problem that generated columns
are out of sync with their underlying columns during BEFORE-trigger
execution: in 11.39

12)If BEFORE is specified, then: 
   ...
   c) The triggered action shall not contain a field
  reference that references a field in the new transition
  variable corresponding to a generated column of T.

IOW they just pretend you can't look.  So I think we need not worry
about leaving the values out-of-date until after the triggers fire.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] Behavior of GENERATED columns per SQL2003

2007-05-07 Thread Tom Lane
I've been studying the SQL spec in a bit more detail and I'm suddenly
thinking that we've got the behavior all wrong in the current
GENERATED/IDENTITY patch.  In particular, it looks to me like we've
been implementing GENERATED ALWAYS AS (expr) according to the rules
that the spec in fact lays down only for GENERATED ALWAYS AS IDENTITY.
You'd think the two constructs would be pretty closely related but
the behaviors specified by the spec are light-years apart.  If you
look closely, a generated column in the meaning of section 4.14.8
is one that has GENERATED ALWAYS AS (expr), and identity columns are
*not* in this class.

It looks to me like the behavior the spec intends for a generated column
is actually that it can be implemented as a virtual column occupying
no space on disk and instead computed on-the-fly when retrieved.
Identity columns can have their values overridden by the
user (it's a little harder if GENERATED ALWAYS, but still possible),
and they don't change during an UPDATE unless specifically forced to.
In contrast, generated columns cannot be overridden by
assignment, and are recomputed from their base columns during updates.
This realization also explains the following, otherwise rather strange,
facts:

* There is no GENERATED BY DEFAULT AS (expr) in the spec.

* GENERATED expressions are specifically disallowed from containing
  subselects, calling functions that access any SQL-data, or being
  nondeterministic; hence their values depend solely on the regular
  columns in the same row.

* While identity columns are updated (if needed) before execution of
  BEFORE triggers, generated columns are updated after BEFORE triggers;
  hence a BEFORE trigger can override the value in one case and not the
  other.  (The current patch gets this wrong, btw.)

* Generated columns are forcibly updated when their base columns change
  as a result of FK constraints (such as ON UPDATE CASCADE).

It looks to me like a BEFORE trigger is actually the only place that can
(transiently) see values of a generated column that are different from
the result of applying the generation expression on the rest of the row.
It's unclear whether that's intentional or an oversight.

Is anyone familiar with a database that implements SQL-spec generated
columns?  Do they actually store the columns?

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Behavior of GENERATED columns per SQL2003

2007-05-07 Thread David Fuhry
Oracle 10g, MySQL 5, and SQL Server 2005 don't appear to support the 
syntax.  The SQL:2003 SIGMOD paper [1] indicates pretty clearly that 
their intention is for the values of generated columns to be stored on disk:


... commonly used expressions are evaluated once and their results 
stored for future use


Generated columns can lead to higher performance... because of reduced 
computation


-Dave

[1] Eisenberg, A., Melton, J., Kulkarni, K., Michels, J., and Zemke, F. 
2004. SQL:2003 has been published. SIGMOD Rec. 33, 1 (Mar. 2004), 119-126.

http://www.sigmod.org/record/issues/0403/E.JimAndrew-standard.pdf

Tom Lane wrote:

I've been studying the SQL spec in a bit more detail and I'm suddenly
thinking that we've got the behavior all wrong in the current
GENERATED/IDENTITY patch.  In particular, it looks to me like we've
been implementing GENERATED ALWAYS AS (expr) according to the rules
that the spec in fact lays down only for GENERATED ALWAYS AS IDENTITY.
You'd think the two constructs would be pretty closely related but
the behaviors specified by the spec are light-years apart.  If you
look closely, a generated column in the meaning of section 4.14.8
is one that has GENERATED ALWAYS AS (expr), and identity columns are
*not* in this class.

It looks to me like the behavior the spec intends for a generated column
is actually that it can be implemented as a virtual column occupying
no space on disk and instead computed on-the-fly when retrieved.
Identity columns can have their values overridden by the
user (it's a little harder if GENERATED ALWAYS, but still possible),
and they don't change during an UPDATE unless specifically forced to.
In contrast, generated columns cannot be overridden by
assignment, and are recomputed from their base columns during updates.
This realization also explains the following, otherwise rather strange,
facts:

* There is no GENERATED BY DEFAULT AS (expr) in the spec.

* GENERATED expressions are specifically disallowed from containing
  subselects, calling functions that access any SQL-data, or being
  nondeterministic; hence their values depend solely on the regular
  columns in the same row.

* While identity columns are updated (if needed) before execution of
  BEFORE triggers, generated columns are updated after BEFORE triggers;
  hence a BEFORE trigger can override the value in one case and not the
  other.  (The current patch gets this wrong, btw.)

* Generated columns are forcibly updated when their base columns change
  as a result of FK constraints (such as ON UPDATE CASCADE).

It looks to me like a BEFORE trigger is actually the only place that can
(transiently) see values of a generated column that are different from
the result of applying the generation expression on the rest of the row.
It's unclear whether that's intentional or an oversight.

Is anyone familiar with a database that implements SQL-spec generated
columns?  Do they actually store the columns?

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


---(end of broadcast)---
TIP 6: explain analyze is your friend