Re: [HACKERS] Behavior of GENERATED columns per SQL2003
URL added to TODO item. Patch rejected for 8.3. --- Tom Lane wrote: I've been studying the SQL spec in a bit more detail and I'm suddenly thinking that we've got the behavior all wrong in the current GENERATED/IDENTITY patch. In particular, it looks to me like we've been implementing GENERATED ALWAYS AS (expr) according to the rules that the spec in fact lays down only for GENERATED ALWAYS AS IDENTITY. You'd think the two constructs would be pretty closely related but the behaviors specified by the spec are light-years apart. If you look closely, a generated column in the meaning of section 4.14.8 is one that has GENERATED ALWAYS AS (expr), and identity columns are *not* in this class. It looks to me like the behavior the spec intends for a generated column is actually that it can be implemented as a virtual column occupying no space on disk and instead computed on-the-fly when retrieved. Identity columns can have their values overridden by the user (it's a little harder if GENERATED ALWAYS, but still possible), and they don't change during an UPDATE unless specifically forced to. In contrast, generated columns cannot be overridden by assignment, and are recomputed from their base columns during updates. This realization also explains the following, otherwise rather strange, facts: * There is no GENERATED BY DEFAULT AS (expr) in the spec. * GENERATED expressions are specifically disallowed from containing subselects, calling functions that access any SQL-data, or being nondeterministic; hence their values depend solely on the regular columns in the same row. * While identity columns are updated (if needed) before execution of BEFORE triggers, generated columns are updated after BEFORE triggers; hence a BEFORE trigger can override the value in one case and not the other. (The current patch gets this wrong, btw.) * Generated columns are forcibly updated when their base columns change as a result of FK constraints (such as ON UPDATE CASCADE). It looks to me like a BEFORE trigger is actually the only place that can (transiently) see values of a generated column that are different from the result of applying the generation expression on the rest of the row. It's unclear whether that's intentional or an oversight. Is anyone familiar with a database that implements SQL-spec generated columns? Do they actually store the columns? regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Behavior of GENERATED columns per SQL2003
Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: On Thu, 2007-05-10 at 10:11 -0400, Tom Lane wrote: As for GENERATED ALWAYS AS (expr), now that we understand that it's not supposed to define a virtual column, what's the point? We do need virtual columns, whether the spec requires them or not. Agreed, they seem more useful than what the spec's got in mind. You can fake a virtual column using a view, but it's a whole lot more painful than faking a GENERATED column using a trigger (at least if you wish the view to be updatable). If they could play a part in foreign keys that would be useful too. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Behavior of GENERATED columns per SQL2003
Tom Lane írta: After some more study of the SQL spec, the distinction between GENERATED ALWAYS AS IDENTITY and GENERATED BY DEFAULT AS IDENTITY is not what I thought it was. * As far as I can find from the spec, there is *no* difference between the two cases for INSERT commands. The rule is that you ignore any user-supplied data and use the default (ie, nextval()) unless OVERRIDING SYSTEM VALUE is specified. It is not an error to try to insert data into an identity column, it's just ignored unless OVERRIDING SYSTEM VALUE. * The difference for UPDATE commands is that you can update a BY DEFAULT identity column to anything you want, whereas for an ALWAYS identity it's an error to update to anything but DEFAULT (which causes a fresh nextval() to be assigned). Both behaviors are different from a generated column, which is updated whether you mention it or not. The quoted SIGMOD paper mentioned that specifying a value for a generated column should raise an error in INSERT but this behaviour is not mentioned by the standard. BTW, do you know what's a self-referencing column? I haven't found a definition of it and there are places where the standard uses this term on behaviour that would be natural for generated columns. E.g. page 860 in latest drafts, section 10.14, or SQL:2003, section 14.8, about INSERT statement: the value the user specified should be stored if some underlying column of Ci is a self-referencing column and OVERRIDING SYSTEM VALUE is specified. This means that GENERATED BY DEFAULT AS IDENTITY is not at all equivalent to our historical behavior for SERIAL columns and hence we cannot merge the two cases. Yes, they are equivalent if you read 5IWD2-02-Foundation-2006-04.pdf or 5CD2-02-Foundation-2006-01.pdf, i.e. the latest two drafts. (The latter seems to be misnamed considering that www.wiscorp.com refreshed the sql200n.zip on 2007-03-11.) Page 860, section 14.10, INSERT. The value the user provides should be accepted for storage if: - the column is an identity column and you provide OVERRIDING SYSTEM VALUE, or - the column is an GENERATED BY DEFAULT AS IDENTITY and you provide neither OVERRIDING USER VALUE nor the DEFAULT specification for the column. I think the babble about OVERRIDING USER VALUE is somewhat controversial. Why would you want to do e.g. INSERT INTO tabname (id, ...) OVERRIDING USER VALUE (N, ...); where N is an explicit constant? And I haven't even implemented handling it. Anyway, without specifying OVERRIDING USER VALUE the GENERATED BY DEFAULT AS IDENTITY is equivalent with traditional SERIAL in PostgreSQL. Implementing OVERRIDING USER VALUE behaviour means that GENERATED BY DEFAULT AS IDENTITY (or SERIAL) would be marked as an identity as well, not as a column simply having a DEFAULT clause. Otherwise OVERRIDING USER VALUE would override every user-specified value for regular columns having a DEFAULT expression. The lack of any behavioral difference for INSERT seems surprising and counterintuitive; have I just missed something in the spec? No, I was just ahead of the times and read newer drafts than SQL:2003. BTW, I found what they did about the problem that generated columns are out of sync with their underlying columns during BEFORE-trigger execution: in 11.39 12)If BEFORE is specified, then: ... c) The triggered action shall not contain a field reference that references a field in the new transition variable corresponding to a generated column of T. I vaguely remember reading it, although the idea seem to have remained in my mind. :-) IOW they just pretend you can't look. So I think we need not worry about leaving the values out-of-date until after the triggers fire. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend -- -- Zoltán Böszörményi Cybertec Geschwinde Schönig GmbH http://www.postgresql.at/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Behavior of GENERATED columns per SQL2003
Zoltan Boszormenyi írta: The quoted SIGMOD paper mentioned that specifying a value for a generated column should raise an error in INSERT but this behaviour is not mentioned by the standard. I found it now, I haven't read hard enough before. SQL:2003, section 14.8, syntax rules: 10) If contextually typed table value constructor CTTVC is specified, then every contextually typed row value constructor element simply contained in CTTVC whose positionally corresponding column name in insert column list references a column of which some underlying column is a generated column shall be a default specification. So, I can only omit the generated column or specify DEFAULT. Anything else should raise an error. Should it be done in analyze.c or in rewriteHandler.c? -- -- Zoltán Böszörményi Cybertec Geschwinde Schönig GmbH http://www.postgresql.at/ ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Behavior of GENERATED columns per SQL2003
Zoltan Boszormenyi [EMAIL PROTECTED] writes: Tom Lane írta: This means that GENERATED BY DEFAULT AS IDENTITY is not at all equivalent to our historical behavior for SERIAL columns and hence we cannot merge the two cases. Yes, they are equivalent if you read 5IWD2-02-Foundation-2006-04.pdf or 5CD2-02-Foundation-2006-01.pdf, i.e. the latest two drafts. Hm. So what we've got here is that the committee has decided the 2003 spec is broken, and they may someday come out with a revised definition that might, or might not, bear any resemblance to the current 200n working papers. According to some off-list discussion, nobody is entirely sure what the current draft is trying to say anyway. That about tears it for me: I think we should reject at least the IDENTITY parts of this patch, and very likely the entire thing. I've spent more than three days now trying to get it into a committable form, time I can't really afford to spend right now on a patch that adds such a marginal feature. AFAICS the only actual new feature that IDENTITY adds is the ability to make the default expression silently override user-specified insertion data, as in fact was the use-case argued by you here: http://archives.postgresql.org/pgsql-hackers/2006-08/msg00038.php Now we find that as the spec is actually written, it doesn't work that way: if you try to specify a value other than DEFAULT for an inserted column, you get a syntax error, unless you specify OVERRIDING SYSTEM VALUE or OVERRIDING USER VALUE, neither of which are going to be found in legacy apps ported from other DBMSes, and neither of which add any actual new functionality (if you have to write OVERRIDING USER VALUE, you could just as well not specify the column). So I'm seeing a lot of complexity and a lot of confusion added for not much, not even considering the risks of trying to track a moving standards target. As for GENERATED ALWAYS AS (expr), now that we understand that it's not supposed to define a virtual column, what's the point? You can get the same behavior with a trivial BEFORE INSERT/UPDATE trigger that recomputes the derived value, and you don't have to buy into the rather ill-defined spec behavior (in particular the point that the generated column is effectively undefined during trigger firing seems really poorly done). In fact, given that the only plausible use-cases involve expressions that are expensive to compute, a trigger can probably do *better* than the built-in feature, since it can make use of application knowledge about when a recomputation is really necessary. The current patch recomputes the expression on every UPDATE, and would have a hard time being any brighter than that, given that we don't know what BEFORE triggers might do to the row. So at this point I'm feeling that we've poured a lot of effort down a hole :-(. We are not really close to having a patch that implements the current 200n draft (in particular note that OVERRIDING USER VALUE is not a no-op according to this morning's understanding of the draft). Even if we were, I'd vote against implementing a part of the draft that's clearly still in flux --- if they change it again, we'd be stuck. But the real bottom line is that I don't see enough use-case for these features to justify the work done already, let alone a bunch more work. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Behavior of GENERATED columns per SQL2003
On Thu, 2007-05-10 at 10:11 -0400, Tom Lane wrote: As for GENERATED ALWAYS AS (expr), now that we understand that it's not supposed to define a virtual column, what's the point? You can get the same behavior with a trivial BEFORE INSERT/UPDATE trigger that recomputes the derived value, and you don't have to buy into the rather ill-defined spec behavior (in particular the point that the generated column is effectively undefined during trigger firing seems really poorly done). In fact, given that the only plausible use-cases involve expressions that are expensive to compute, a trigger can probably do *better* than the built-in feature, since it can make use of application knowledge about when a recomputation is really necessary. The current patch recomputes the expression on every UPDATE, and would have a hard time being any brighter than that, given that we don't know what BEFORE triggers might do to the row. We do need virtual columns, whether the spec requires them or not. They would allow us to completely remove the column value when using value-list based partitioning, giving considerable space savings for VLDBs. ISTM that we should interpret this as a requirement for a virtual column. We can always move from that to a stored column if the spec becomes more specific, though it would be harder to move the other way. And as you point out, storing the value would make no sense. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Behavior of GENERATED columns per SQL2003
Simon Riggs [EMAIL PROTECTED] writes: On Thu, 2007-05-10 at 10:11 -0400, Tom Lane wrote: As for GENERATED ALWAYS AS (expr), now that we understand that it's not supposed to define a virtual column, what's the point? We do need virtual columns, whether the spec requires them or not. Agreed, they seem more useful than what the spec's got in mind. You can fake a virtual column using a view, but it's a whole lot more painful than faking a GENERATED column using a trigger (at least if you wish the view to be updatable). ISTM that we should interpret this as a requirement for a virtual column. We can always move from that to a stored column if the spec becomes more specific, though it would be harder to move the other way. If you're suggesting commandeering the spec's GENERATED ALWAYS syntax to represent virtual columns, when the committee has made it clear that that's not what they intend, I say that's sheer folly. What will you do when they tweak the spec to the point where a virtual column clearly doesn't satisfy it? If we want a nonstandard feature we should use a nonstandard syntax for it. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Behavior of GENERATED columns per SQL2003
After some more study of the SQL spec, the distinction between GENERATED ALWAYS AS IDENTITY and GENERATED BY DEFAULT AS IDENTITY is not what I thought it was. * As far as I can find from the spec, there is *no* difference between the two cases for INSERT commands. The rule is that you ignore any user-supplied data and use the default (ie, nextval()) unless OVERRIDING SYSTEM VALUE is specified. It is not an error to try to insert data into an identity column, it's just ignored unless OVERRIDING SYSTEM VALUE. * The difference for UPDATE commands is that you can update a BY DEFAULT identity column to anything you want, whereas for an ALWAYS identity it's an error to update to anything but DEFAULT (which causes a fresh nextval() to be assigned). Both behaviors are different from a generated column, which is updated whether you mention it or not. This means that GENERATED BY DEFAULT AS IDENTITY is not at all equivalent to our historical behavior for SERIAL columns and hence we cannot merge the two cases. The lack of any behavioral difference for INSERT seems surprising and counterintuitive; have I just missed something in the spec? BTW, I found what they did about the problem that generated columns are out of sync with their underlying columns during BEFORE-trigger execution: in 11.39 12)If BEFORE is specified, then: ... c) The triggered action shall not contain a field reference that references a field in the new transition variable corresponding to a generated column of T. IOW they just pretend you can't look. So I think we need not worry about leaving the values out-of-date until after the triggers fire. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
[HACKERS] Behavior of GENERATED columns per SQL2003
I've been studying the SQL spec in a bit more detail and I'm suddenly thinking that we've got the behavior all wrong in the current GENERATED/IDENTITY patch. In particular, it looks to me like we've been implementing GENERATED ALWAYS AS (expr) according to the rules that the spec in fact lays down only for GENERATED ALWAYS AS IDENTITY. You'd think the two constructs would be pretty closely related but the behaviors specified by the spec are light-years apart. If you look closely, a generated column in the meaning of section 4.14.8 is one that has GENERATED ALWAYS AS (expr), and identity columns are *not* in this class. It looks to me like the behavior the spec intends for a generated column is actually that it can be implemented as a virtual column occupying no space on disk and instead computed on-the-fly when retrieved. Identity columns can have their values overridden by the user (it's a little harder if GENERATED ALWAYS, but still possible), and they don't change during an UPDATE unless specifically forced to. In contrast, generated columns cannot be overridden by assignment, and are recomputed from their base columns during updates. This realization also explains the following, otherwise rather strange, facts: * There is no GENERATED BY DEFAULT AS (expr) in the spec. * GENERATED expressions are specifically disallowed from containing subselects, calling functions that access any SQL-data, or being nondeterministic; hence their values depend solely on the regular columns in the same row. * While identity columns are updated (if needed) before execution of BEFORE triggers, generated columns are updated after BEFORE triggers; hence a BEFORE trigger can override the value in one case and not the other. (The current patch gets this wrong, btw.) * Generated columns are forcibly updated when their base columns change as a result of FK constraints (such as ON UPDATE CASCADE). It looks to me like a BEFORE trigger is actually the only place that can (transiently) see values of a generated column that are different from the result of applying the generation expression on the rest of the row. It's unclear whether that's intentional or an oversight. Is anyone familiar with a database that implements SQL-spec generated columns? Do they actually store the columns? regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Behavior of GENERATED columns per SQL2003
Oracle 10g, MySQL 5, and SQL Server 2005 don't appear to support the syntax. The SQL:2003 SIGMOD paper [1] indicates pretty clearly that their intention is for the values of generated columns to be stored on disk: ... commonly used expressions are evaluated once and their results stored for future use Generated columns can lead to higher performance... because of reduced computation -Dave [1] Eisenberg, A., Melton, J., Kulkarni, K., Michels, J., and Zemke, F. 2004. SQL:2003 has been published. SIGMOD Rec. 33, 1 (Mar. 2004), 119-126. http://www.sigmod.org/record/issues/0403/E.JimAndrew-standard.pdf Tom Lane wrote: I've been studying the SQL spec in a bit more detail and I'm suddenly thinking that we've got the behavior all wrong in the current GENERATED/IDENTITY patch. In particular, it looks to me like we've been implementing GENERATED ALWAYS AS (expr) according to the rules that the spec in fact lays down only for GENERATED ALWAYS AS IDENTITY. You'd think the two constructs would be pretty closely related but the behaviors specified by the spec are light-years apart. If you look closely, a generated column in the meaning of section 4.14.8 is one that has GENERATED ALWAYS AS (expr), and identity columns are *not* in this class. It looks to me like the behavior the spec intends for a generated column is actually that it can be implemented as a virtual column occupying no space on disk and instead computed on-the-fly when retrieved. Identity columns can have their values overridden by the user (it's a little harder if GENERATED ALWAYS, but still possible), and they don't change during an UPDATE unless specifically forced to. In contrast, generated columns cannot be overridden by assignment, and are recomputed from their base columns during updates. This realization also explains the following, otherwise rather strange, facts: * There is no GENERATED BY DEFAULT AS (expr) in the spec. * GENERATED expressions are specifically disallowed from containing subselects, calling functions that access any SQL-data, or being nondeterministic; hence their values depend solely on the regular columns in the same row. * While identity columns are updated (if needed) before execution of BEFORE triggers, generated columns are updated after BEFORE triggers; hence a BEFORE trigger can override the value in one case and not the other. (The current patch gets this wrong, btw.) * Generated columns are forcibly updated when their base columns change as a result of FK constraints (such as ON UPDATE CASCADE). It looks to me like a BEFORE trigger is actually the only place that can (transiently) see values of a generated column that are different from the result of applying the generation expression on the rest of the row. It's unclear whether that's intentional or an oversight. Is anyone familiar with a database that implements SQL-spec generated columns? Do they actually store the columns? regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq ---(end of broadcast)--- TIP 6: explain analyze is your friend