Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
Noah, * Noah Misch (n...@leadboat.com) wrote: On Mon, Feb 14, 2011 at 04:06:59PM -0500, Stephen Frost wrote: I've attached a new version of the patch that attempts to flesh out the comments based on your feedback. Does it improve things? Yes, much better, thanks! Offhand, I can't think of any concrete implementation along those lines that would simplify the overall task. Did you have something more specific in mind? Sadly, no. :) For now it seemed like premature abstraction. Fair enough. All in all, this patch looks good to me. Looks like that might be moot, however, based on Robert's comments. Still, thanks for it, I certainly look forward to having it eventually. :) Thanks again, Stephen signature.asc Description: Digital signature
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Sun, Feb 13, 2011 at 02:53:07PM -0500, Robert Haas wrote: On Sun, Feb 13, 2011 at 6:50 AM, Noah Misch n...@leadboat.com wrote: Yikes. ?I didn't like that Assert much, but maybe we need it, because this is scary. Can you elaborate on the fear-inducing aspect? ?I don't mind re-adding the Assert, but it seems that some positive understanding of the assumption's validity is in order. Well, it's pretty obvious that if somehow we were to go down that code path and not get back a value that was identical to the one we had before, we'd have a corrupted table. Certainly. Understand that the first increment of this patch already made a similar assumption about RelabelType, and CoerceToDomain is no less stable in this respect. We're exposed to this level of responsibility already. We had better get the code right, but what else is new? It seems that just about any refactoring of tablecmds.c might create such a possibility. Uhhh. Example? Assertions are a good idea when the reasons why something is true are far-removed (in the code) from the places where they need to be true. Yes. Anyway, since we both clearly like the assertion, I've re-added it. I'm somewhat uncomfortable also with the dependency of this code on very subtle semantic differences between if (newrel) and if (tab-newvals != NIL). I think this would blow up and die if for any reason newrel were non-NULL but tab-newvals were NIL. Actually, doesn't that happen if we're adding or dropping an OID column? Adding or dropping OIDs as the sole operation of the ALTER TABLE does result in newrel != NULL and tab-newvals == NIL, but the code handles that fine. The loop just re-forms the tuple from its original values/nulls lists. I think it might be cleaner to have tab-verify_constraints rather than tab-verify. Then ATRewriteTables() could test if (tab-verify_constraints || tab-new_notnull), and you wouldn't need the bit that sets at-verify if at-new_notnull is already set. I wouldn't care for that change. Despite the name, NOT NULL and FOREIGN KEY constraints wouldn't be represented. Casting to a domain to check the domain constraints is a stretch of the term, and it will be fully obsolete when we optimize xml - text and such. However, I've moved the setting of tab-verify to the points where we set tab-new_notnull, rather than doing it later in that one place. Correct me if I'm wrong here, but we could handle the domains-without-contraints part here with about three additional lines of code, right? It's only the domains-with-constraints part that requires the rest of this. Correct. I'm half-tempted to put that part off to 9.2, in the hopes of getting a more substantial solution that can also handle things like text - xml which we don't have time to re-engineer right now. I see. Anyway, I've attached an updated version with these changes: - Restored the transform expression walk to its own function - Assert re-added - tab-verify set alongside tab-new_notnull, not later nm diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 452ced6..466be25 100644 diff --git a/src/backend/commands/index 1db42d0..06942c3 100644 *** a/src/backend/commands/tablecmds.c --- b/src/backend/commands/tablecmds.c *** *** 71,76 --- 71,77 #include storage/smgr.h #include utils/acl.h #include utils/builtins.h + #include utils/datum.h #include utils/fmgroids.h #include utils/inval.h #include utils/lsyscache.h *** *** 142,147 typedef struct AlteredTableInfo --- 143,149 List *newvals;/* List of NewColumnValue */ boolnew_notnull;/* T if we added new NOT NULL constraints */ boolrewrite;/* T if we a rewrite is forced */ + boolverify; /* T if we shall verify constraints */ Oid newTableSpace; /* new tablespace; 0 means no change */ /* Objects to rebuild after completing ALTER TYPE operations */ List *changedConstraintOids; /* OIDs of constraints to rebuild */ *** *** 336,342 static void ATPrepAlterColumnType(List **wqueue, AlteredTableInfo *tab, Relation rel, bool recurse, bool recursing, AlterTableCmd *cmd, LOCKMODE lockmode); ! static bool ATColumnChangeRequiresRewrite(Node *expr, AttrNumber varattno); static void ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel, const char *colName, TypeName *typeName, LOCKMODE lockmode); static void ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab, LOCKMODE lockmode); --- 338,345 AlteredTableInfo *tab, Relation rel, bool recurse, bool
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
Noah, I'm even less familiar w/ this code than Robert, but figured I'd give a shot at reviewing this anyway. I definitely like avoiding table rewrites if I can get away with it. :) First question is- why do you use #ifdef USE_ASSERT_CHECKING ..? assert_enabled exists and will work the way you expect regardless, no? Strikes me as unlikely that the checks would be a real performance problem.. In ATColumnChangeSetWorkLevel(), I'm really not a huge fan of using a passed-in argument to move through a list with.. I'd suggest using a local variable that is set from what's passed in. I do see that's inheirited, but still, you've pretty much redefined that entire function anyway.. Also, I feel like that while(!tab-rewrite) really deserves more explanation of what's happening than the function-level comment above gives it. I'd prefer to see a comment above the while() explaining that we're moving through a list to see if there's any level at which expr is something complicated or is referring to a domain which has constraints on it (presuming that I've followed what's going on correctly, that is..). It also seems like it'd make more sense to me to be a while() controlled by (IsA(expr, Var) ((Var *) expr)-varattno == varattno), since that's really the normal stopping point. These are all more stylistic issues than anything else. Last, but not least, I do worry about how this may impact contrib modules, external projects, or user-added data types, such as PostGIS, hstore, and ip4r. Could we/should we limit this to only PG data types that we 'know' are binary compatible? Is there any way or reason such external modules could be fouled up by this? Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
Hi Stephen, Thanks for jumping in on this one. On Mon, Feb 14, 2011 at 01:12:21PM -0500, Stephen Frost wrote: First question is- why do you use #ifdef USE_ASSERT_CHECKING ..? The other six code sites checking assert_enabled directly do the same. assert_enabled exists and will work the way you expect regardless, no? Yes. Strikes me as unlikely that the checks would be a real performance problem.. Agreed. In ATColumnChangeSetWorkLevel(), I'm really not a huge fan of using a passed-in argument to move through a list with.. I'd suggest using a local variable that is set from what's passed in. I do see that's inheirited, but still, you've pretty much redefined that entire function anyway.. Could do that. However, the function would reference the original argument just once, to make that copy. I'm not seeing a win. Also, I feel like that while(!tab-rewrite) really deserves more explanation of what's happening than the function-level comment above gives it. I'd prefer to see a comment above the while() explaining that we're moving through a list to see if there's any level at which expr is something complicated or is referring to a domain which has constraints on it (presuming that I've followed what's going on correctly, that is..). The way I like to think about it is that we're recursively walking an expression tree to determine which of three categories it falls into: always produces the same value without error; always produces the same value on success, but may throw an error; neither of the above. We have a whitelist of node types that are acceptable, and anything else makes us assume the worst. (The nodes we accept are simple enough that the recursion degenerates to iteration.) Here's the comment explaining the algorithm, from an earlier version of the patch: + /* GetCoerceExemptions() + *Assess invariants of a coercion expression. + * + * Various common expressions arising from type coercion are subject to + * optimizations. For example, a simple varchar - text cast will never change + * the underlying data (COERCE_EXEMPT_NOCHANGE) and never yield an error + * (COERCE_EXEMPT_NOERROR). A varchar(8) - varchar(4) will never change the + * data, but it may yield an error. Given a varno and varattno denoting the + * source datum, determine which invariants hold for an expression by walking it + * per these rules: + * + *1. A Var with the varno/varattno in question has both invariants. + *2. A RelabelType node inherits the invariants of its sole argument. + *3. A CoerceToDomain node inherits any COERCE_EXEMPT_NOCHANGE invariant from + *its sole argument. When GetDomainConstraints() == NIL, it also inherits + *COERCE_EXEMPT_NOERROR. Otherwise, COERCE_EXEMPT_NOERROR becomes false. + *4. All other nodes have neither invariant. + * + * Returns a bit string that may contain the following bits: + *COERCE_EXEMPT_NOCHANGE: expression result will always have the same binary + *representation as a Var expression having the given varno and + *varattno + *COERCE_EXEMPT_NOERROR: expression will never throw an error + */ The return value changed, but the rest remains accurate. Here's a similar explanation from the design thread; it covers a more general case: http://archives.postgresql.org/message-id/20101231013534.ga7...@tornado.leadboat.com Should I restore some of that? Any other particular text that would have helped? It also seems like it'd make more sense to me to be a while() controlled by (IsA(expr, Var) ((Var *) expr)-varattno == varattno), since that's really the normal stopping point. If we can optimize to some extent, that is the stopping point. If not, tab-rewrite is the stopping point. I picked the no-optimization case as normal and used that as the loop condition, but one could go either way. These are all more stylistic issues than anything else. Last, but not least, I do worry about how this may impact contrib modules, external projects, or user-added data types, such as PostGIS, hstore, and ip4r. Could we/should we limit this to only PG data types that we 'know' are binary compatible? Is there any way or reason such external modules could be fouled up by this? External modules are safe. If a binary coercion cast (CREATE CAST ... WITHOUT FUNCTION) implements the type conversion for an ALTER TABLE ... SET DATA TYPE, coerce_to_target_type() will inject a RelabelType node, and this code will pick up on that and avoid the rewrite. If such a cast were defined erroneously, the user is no less in trouble. He'd get a table rewrite, but the rewrite would just transfer the bits unchanged. The validity of this optimization does not rely on any user-settable property of a data type, but it does lean heavily on the execution semantics of specific nodes. Thanks, nm -- Sent via pgsql-hackers mailing list
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
* Noah Misch (n...@leadboat.com) wrote: On Mon, Feb 14, 2011 at 01:12:21PM -0500, Stephen Frost wrote: First question is- why do you use #ifdef USE_ASSERT_CHECKING ..? The other six code sites checking assert_enabled directly do the same. Wow, I could have sworn that I looked at what others were doing and didn't see that. Sorry for the noise. In ATColumnChangeSetWorkLevel(), I'm really not a huge fan of using a passed-in argument to move through a list with.. I'd suggest using a local variable that is set from what's passed in. I do see that's inheirited, but still, you've pretty much redefined that entire function anyway.. Could do that. However, the function would reference the original argument just once, to make that copy. I'm not seeing a win. Perhaps I'm just deficient in this area, but I think of arguments, unless specifically intended otherwise, to be 'read-only' and based on that assumption, seeing it come up as an lvalue hits me as wrong. I'm happy enough to let someone else decide if they agree with me or you though. :) The way I like to think about it is that we're recursively walking an expression tree to determine which of three categories it falls into: always produces the same value without error; always produces the same value on success, but may throw an error; neither of the above. We have a whitelist of node types that are acceptable, and anything else makes us assume the worst. (The nodes we accept are simple enough that the recursion degenerates to iteration.) It's that degeneration that definitely hits me as making the whole thing look a bit 'funny'. When I first looked at the loop, I was looking for a tree structure and trying to figure out how it could work with just a simple while(). http://archives.postgresql.org/message-id/20101231013534.ga7...@tornado.leadboat.com Should I restore some of that? Any other particular text that would have helped? I definitely think the examples given, enumerating the types of nodes that matter for this (and why they're the ones we look for), and the rules that are followed would help a great deal. Anyone else who comes across this code may be wondering do I need to do something here for this new node type that I just added. It also seems like it'd make more sense to me to be a while() controlled by (IsA(expr, Var) ((Var *) expr)-varattno == varattno), since that's really the normal stopping point. If we can optimize to some extent, that is the stopping point. If not, tab-rewrite is the stopping point. I picked the no-optimization case as normal and used that as the loop condition, but one could go either way. I would think you could still short-circuit the loop when you've discovered a case where we have to rewrite the table anyway. Having the comments updated to reflect what's going on and why stopping on tab-rewrite, in particular, makes sense is more important though. The validity of this optimization does not rely on any user-settable property of a data type, but it does lean heavily on the execution semantics of specific nodes. After thinking through this and diving into coerce_to_target_type() a bit, I'm finally coming to understand how this is working. The gist of it, if I follow correctly, is that if the planner doesn't think we have to do anything but copy this value to make it the new data type, then we're good to go. That makes sense, when you think about it, but boy does it go the long way around to get there. Essentially, coerce_to_target_type() is returning an expression tree and ATColumnChangeSetWorkLevel() is checking to see if that expression tree is copy the value. Maybe this is a bit much, but if coerce_to_target_type() knows the expression given to it is a straight-up copy, perhaps it could pass that information along in an easier to use format than an expression tree? That would obviate the need for ATColumnChangeSetWorkLevel() entirely, if I understand correctly. Of course, coerce_to_target_type() is used by lots of other places, almost all of which probably have to have an expression tree to stuff into a plan, so maybe a simpler function could be defined which operates at the level that ATColumnChangeSetWorkLevel() needs? Or perhaps other places would benefit from knowing that a given conversion is an actual no-op rather than copying the value? Just my 2c, I don't believe the patch could cause problems now that I'm understanding it better, but it sure does seem excessive to use an expression tree to figure out when something is a no-op. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Mon, Feb 14, 2011 at 04:06:59PM -0500, Stephen Frost wrote: * Noah Misch (n...@leadboat.com) wrote: On Mon, Feb 14, 2011 at 01:12:21PM -0500, Stephen Frost wrote: In ATColumnChangeSetWorkLevel(), I'm really not a huge fan of using a passed-in argument to move through a list with.. I'd suggest using a local variable that is set from what's passed in. I do see that's inheirited, but still, you've pretty much redefined that entire function anyway.. Could do that. However, the function would reference the original argument just once, to make that copy. I'm not seeing a win. Perhaps I'm just deficient in this area, but I think of arguments, unless specifically intended otherwise, to be 'read-only' and based on that assumption, seeing it come up as an lvalue hits me as wrong. I'm happy enough to let someone else decide if they agree with me or you though. :) Same here. If there's a general project tendency either way, I'll comply. I haven't noticed one, but I haven't been looking. I've attached a new version of the patch that attempts to flesh out the comments based on your feedback. Does it improve things? The validity of this optimization does not rely on any user-settable property of a data type, but it does lean heavily on the execution semantics of specific nodes. After thinking through this and diving into coerce_to_target_type() a bit, I'm finally coming to understand how this is working. The gist of it, if I follow correctly, is that if the planner doesn't think we have to do anything but copy this value to make it the new data type, then we're good to go. That makes sense, when you think about it, but boy does it go the long way around to get there. Essentially. The planner is not yet involved. We're looking at an expression tree after parse analysis, before planning. Essentially, coerce_to_target_type() is returning an expression tree and ATColumnChangeSetWorkLevel() is checking to see if that expression tree is copy the value. Maybe this is a bit much, but if coerce_to_target_type() knows the expression given to it is a straight-up copy, perhaps it could pass that information along in an easier to use format than an expression tree? That would obviate the need for ATColumnChangeSetWorkLevel() entirely, if I understand correctly. PostgreSQL has a strong tradition of passing around expression trees and walking them to (re-)discover facts. See the various clauses.h functions. Also, when you have a USING clause, coerce_to_target_type() doesn't know the whole picture. We could teach it to discover the whole picture, but that would amount to a very similar tree walk in a different place. Of course, coerce_to_target_type() is used by lots of other places, almost all of which probably have to have an expression tree to stuff into a plan, so maybe a simpler function could be defined which operates at the level that ATColumnChangeSetWorkLevel() needs? Offhand, I can't think of any concrete implementation along those lines that would simplify the overall task. Did you have something more specific in mind? Or perhaps other places would benefit from knowing that a given conversion is an actual no-op rather than copying the value? RelabelType itself does not cause a copy; the existing datum passes through. In the case of ALTER TABLE, we do eventually copy the datum via heap_form_tuple. There may be other places that would benefit from similar analysis. For that reason, I originally had ATColumnChangeSetWorkLevel() in parse_coerce.c with the name GetCoerceExemptions(). I'm not aware of any specific applications, though. For now it seemed like premature abstraction. Thanks again, nm diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 452ced6..466be25 100644 diff --git a/src/backend/commands/index 1db42d0..ab0bcda 100644 *** a/src/backend/commands/tablecmds.c --- b/src/backend/commands/tablecmds.c *** *** 71,76 --- 71,77 #include storage/smgr.h #include utils/acl.h #include utils/builtins.h + #include utils/datum.h #include utils/fmgroids.h #include utils/inval.h #include utils/lsyscache.h *** *** 142,147 typedef struct AlteredTableInfo --- 143,149 List *newvals;/* List of NewColumnValue */ boolnew_notnull;/* T if we added new NOT NULL constraints */ boolrewrite;/* T if we a rewrite is forced */ + boolverify; /* T if we shall verify constraints */ Oid newTableSpace; /* new tablespace; 0 means no change */ /* Objects to rebuild after completing ALTER TYPE operations */ List *changedConstraintOids; /* OIDs of constraints to rebuild */ *** *** 336,342 static void ATPrepAlterColumnType(List **wqueue,
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Mon, Feb 14, 2011 at 7:52 AM, Noah Misch n...@leadboat.com wrote: I'm half-tempted to put that part off to 9.2, in the hopes of getting a more substantial solution that can also handle things like text - xml which we don't have time to re-engineer right now. I see. After sleeping on it, I think this route makes most sense. The ability to downgrade a rewrite to a scan is really a separate feature, and I'd like to see us implement that in a more complete way when/if we're going to do it; and I'd rather commit it at the beginning of a development cycle when we have more time to find any lurking bugs. So I've committed a change that just handles the unconstrained domain case. I think for 9.2 we should revisit the following areas: 1. Downgrading rewrites to scans (vs. skipping them altogether). One idea is that we might modify CREATE CAST so that you can do this: CREATE CAST (source_type AS target_type) WITH [ CHECK ] FUNCTION function_name (argument_type [, ...]) [ AS ASSIGNMENT | AS IMPLICIT ]; The inclusion of the keyword check there would inform the system that the binary representation can't change, but (as distinguished from WITHOUT FUNCTION) an error might be thrown. Of course, I'm not quite sure how to get this information over to the alter table machinery cleanly. 2. Detecting binary-coercible cases that involve typemods, rather than just type OIDs. 3. Avoiding index rebuilds. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Sun, Feb 13, 2011 at 12:04:20AM -0500, Robert Haas wrote: On Sat, Feb 12, 2011 at 10:45 AM, Noah Misch n...@leadboat.com wrote: That said, I've tried both constructions, and I marginally prefer the end result with AlteredTableInfo.verify. ?I've inlined ATColumnChangeRequiresRewrite into ATPrepAlterColumnType; it would need to either pass back two bools or take an AlteredTableInfo arg to mutate, so this seemed cleaner. I think I like the idea of passing it the AlteredTableInfo. Okay. I've omitted the assertion that my previous version added to ATRewriteTable; it was helpful for other scan-only type changes, but it's excessive for domains alone. ?Otherwise, the differences are cosmetic. So in the case of a constrained domain, it looks like we're going to evaluate the changed columns, but if no error is thrown, we're going to assume they match the original ones and throw out the data? Correct. We can see that a RelabelType changes no values by inspecting ExecEvalRelabelType. Likewise, by inspecting ExecEvalCoerceToDomain, we can know that a CoerceToDomain node may introduce errors but never modified values. Yikes. I didn't like that Assert much, but maybe we need it, because this is scary. Can you elaborate on the fear-inducing aspect? I don't mind re-adding the Assert, but it seems that some positive understanding of the assumption's validity is in order. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Sun, Feb 13, 2011 at 6:50 AM, Noah Misch n...@leadboat.com wrote: Yikes. I didn't like that Assert much, but maybe we need it, because this is scary. Can you elaborate on the fear-inducing aspect? I don't mind re-adding the Assert, but it seems that some positive understanding of the assumption's validity is in order. Well, it's pretty obvious that if somehow we were to go down that code path and not get back a value that was identical to the one we had before, we'd have a corrupted table. It seems that just about any refactoring of tablecmds.c might create such a possibility. Assertions are a good idea when the reasons why something is true are far-removed (in the code) from the places where they need to be true. I'm somewhat uncomfortable also with the dependency of this code on very subtle semantic differences between if (newrel) and if (tab-newvals != NIL). I think this would blow up and die if for any reason newrel were non-NULL but tab-newvals were NIL. Actually, doesn't that happen if we're adding or dropping an OID column? I think it might be cleaner to have tab-verify_constraints rather than tab-verify. Then ATRewriteTables() could test if (tab-verify_constraints || tab-new_notnull), and you wouldn't need the bit that sets at-verify if at-new_notnull is already set. That part makes the semantics of tab-verify depend on which point in processing we're at, which I have found to be a recipe for confusion in other parts of the source base (planner, I'm looking at you). Correct me if I'm wrong here, but we could handle the domains-without-contraints part here with about three additional lines of code, right? It's only the domains-with-constraints part that requires the rest of this. I'm half-tempted to put that part off to 9.2, in the hopes of getting a more substantial solution that can also handle things like text - xml which we don't have time to re-engineer right now. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Fri, Feb 11, 2011 at 02:49:27PM -0500, Robert Haas wrote: You might want to consider a second boolean in lieu of a three way enum. I'm not sure if that's cleaner but if it lets you write: if (blah) at-verify = true; instead of: if (blah) at-worklevel = Min(at-worklevel, WORK_VERIFY); ...then I think that might be cleaner. Good point; the Max() calls did not make much sense all by themselves. The point was to make sure nothing decreased the worklevel. Wrapping them in a macro, say, ATRequireWork, probably would have helped. That said, I've tried both constructions, and I marginally prefer the end result with AlteredTableInfo.verify. I've inlined ATColumnChangeRequiresRewrite into ATPrepAlterColumnType; it would need to either pass back two bools or take an AlteredTableInfo arg to mutate, so this seemed cleaner. I've omitted the assertion that my previous version added to ATRewriteTable; it was helpful for other scan-only type changes, but it's excessive for domains alone. Otherwise, the differences are cosmetic. The large block in ATRewriteTable is now superfluous. For easier review, I haven't removed it. I missed a typo in the last patch: T if we a rewrite is forced. Not changed in this patch as I assume you'll want to commit it separately. nm diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 452ced6..466be25 100644 diff --git a/src/backend/commands/index 1db42d0..2e10661 100644 *** a/src/backend/commands/tablecmds.c --- b/src/backend/commands/tablecmds.c *** *** 142,147 typedef struct AlteredTableInfo --- 142,148 List *newvals;/* List of NewColumnValue */ boolnew_notnull;/* T if we added new NOT NULL constraints */ boolrewrite;/* T if we a rewrite is forced */ + boolverify; /* T if we shall verify constraints */ Oid newTableSpace; /* new tablespace; 0 means no change */ /* Objects to rebuild after completing ALTER TYPE operations */ List *changedConstraintOids; /* OIDs of constraints to rebuild */ *** *** 336,342 static void ATPrepAlterColumnType(List **wqueue, AlteredTableInfo *tab, Relation rel, bool recurse, bool recursing, AlterTableCmd *cmd, LOCKMODE lockmode); - static bool ATColumnChangeRequiresRewrite(Node *expr, AttrNumber varattno); static void ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel, const char *colName, TypeName *typeName, LOCKMODE lockmode); static void ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab, LOCKMODE lockmode); --- 337,342 *** *** 3242,3247 ATRewriteTables(List **wqueue, LOCKMODE lockmode) --- 3242,3251 heap_close(rel, NoLock); } + /* New NOT NULL constraints always require a scan. */ + if (tab-new_notnull) + tab-verify = true; + /* * We only need to rewrite the table if at least one column needs to * be recomputed, or we are adding/removing the OID column. *** *** 3313,3319 ATRewriteTables(List **wqueue, LOCKMODE lockmode) * Test the current data within the table against new constraints * generated by ALTER TABLE commands, but don't rebuild data. */ ! if (tab-constraints != NIL || tab-new_notnull) ATRewriteTable(tab, InvalidOid, lockmode); /* --- 3317,3323 * Test the current data within the table against new constraints * generated by ALTER TABLE commands, but don't rebuild data. */ ! if (tab-verify) ATRewriteTable(tab, InvalidOid, lockmode); /* *** *** 3387,3393 ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode) Relationnewrel; TupleDesc oldTupDesc; TupleDesc newTupDesc; - boolneedscan = false; List *notnull_attrs; int i; ListCell *l; --- 3391,3396 *** *** 3446,3452 ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode) switch (con-contype) { case CONSTR_CHECK: - needscan = true; con-qualstate = (List *) ExecPrepareExpr((Expr *) con-qual, estate);
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Sat, Feb 12, 2011 at 10:45 AM, Noah Misch n...@leadboat.com wrote: That said, I've tried both constructions, and I marginally prefer the end result with AlteredTableInfo.verify. I've inlined ATColumnChangeRequiresRewrite into ATPrepAlterColumnType; it would need to either pass back two bools or take an AlteredTableInfo arg to mutate, so this seemed cleaner. I think I like the idea of passing it the AlteredTableInfo. I've omitted the assertion that my previous version added to ATRewriteTable; it was helpful for other scan-only type changes, but it's excessive for domains alone. Otherwise, the differences are cosmetic. So in the case of a constrained domain, it looks like we're going to evaluate the changed columns, but if no error is thrown, we're going to assume they match the original ones and throw out the data? Yikes. I didn't like that Assert much, but maybe we need it, because this is scary. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Thu, Jan 27, 2011 at 2:48 PM, Noah Misch n...@leadboat.com wrote: On Wed, Jan 26, 2011 at 07:31:40AM -0500, Robert Haas wrote: I'd also suggest that this big if-block you changed to a case statement could just as well stay as an if-block. There are only three cases, and we want to avoid rearranging things more than necessary. It complicates both review and back-patching to no good end. Okay. I've also left out the large reindent in ATRewriteTable for now. Easy to re-add it later if desired. I think you should collect up what's left of ALTER TABLE 0 and the stuff on this thread, rebase it, and submit it as a single patch on this thread that applies directly against the master branch. We may decide to split it back up again in some other way, but I think the current division isn't actually buying us much. Done as attached. This preserves compatibility with our current handling of composite type dependencies. The rest you've seen before. OK, I finally got back to this. Sorry for the delay. I still feel that it's possible to accomplish the same goal with less complexity. See what you think of the attached. Upon examination, it appeared to me that trying to make the AlteredTableInfo contain an enum indicating whether we were doing a scan, a rewrite, or nothing was requiring more code change than I could really justify. What I ended up doing is replacing the 'bool new_changedoids' member with a 'bool rewrite' member. I'm pretty sure this is safe. The only reason we ever tested the new_changeoids member was to see whether we needed to do a rewrite; it wasn't used for anything else. The new member gets set either when we need to do a rewrite, or when a column type changes in a fashion that requires a rewrite. It's pretty easy to verify that this doesn't result in any behavior change, except for one corner case: currently, if you add or remove OIDs to/from a table and in the same ALTER TABLE command add a constraint that involves creating an index, such as a primary key, the new primary key index will get built, thrown away, and rebuilt a second time. With this change, we just build it once, which seems like an improvement, even though the case is vanishingly unlikely to ever happen in practice. I also have to say that after playing with a little bit, I find the new debugging messages you added to be quite invaluable for understanding what's really going on. The old output told you basically nothing useful, even if you cranked it all the way up to DEBUG3. This is much better. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company alter-type-2-rmh.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
Hi Robert, On Fri, Feb 11, 2011 at 10:27:11AM -0500, Robert Haas wrote: On Thu, Jan 27, 2011 at 2:48 PM, Noah Misch n...@leadboat.com wrote: Done as attached. ?This preserves compatibility with our current handling of composite type dependencies. ?The rest you've seen before. OK, I finally got back to this. Sorry for the delay. I still feel that it's possible to accomplish the same goal with less complexity. See what you think of the attached. It's a nice, clean patch. However, it achieves that by targeting a smaller goal. Specifically, it omits: 1) Test cases and DEBUG messages sufficient to facilitate them 2) Skip rewrite for T - constraint-free domain over T 3) Downgrade rewrite to scan for T - constrained domain over T Upon examination, it appeared to me that trying to make the AlteredTableInfo contain an enum indicating whether we were doing a scan, a rewrite, or nothing was requiring more code change than I could really justify. Offhand, I count 12 changed lines to introduce the enum. There may be other good reasons not to do it, but surely that wasn't it? What I ended up doing is replacing the 'bool new_changedoids' member with a 'bool rewrite' member. I'm pretty sure this is safe. The only reason we ever tested the new_changeoids member was to see whether we needed to do a rewrite; it wasn't used for anything else. The new member gets set either when we need to do a rewrite, or when a column type changes in a fashion that requires a rewrite. It's pretty easy to verify that this doesn't result in any behavior change, except for one corner case: currently, if you add or remove OIDs to/from a table and in the same ALTER TABLE command add a constraint that involves creating an index, such as a primary key, the new primary key index will get built, thrown away, and rebuilt a second time. With this change, we just build it once, which seems like an improvement, even though the case is vanishingly unlikely to ever happen in practice. This is fairly similar to the design in my first patch. The name was different (new_bits), and that patch had an extra bool for scan-only cases. I won't object to moving back to this, but I did find that your enum suggestion worked out significantly better. Even supposing we push off all scan-only cases to another patch, it would be good to have the tablecmds.c-internal representation of that in mind. No sense in simplifying a 12-line change to an 8-line change, only to redo it next patch. I also have to say that after playing with a little bit, I find the new debugging messages you added to be quite invaluable for understanding what's really going on. The old output told you basically nothing useful, even if you cranked it all the way up to DEBUG3. This is much better. Thanks. I added them solely to make automated testing possible, but they did turn out to have user value. I'll certainly make use of them when rewriting an inheritance tree of tables or a large table with many indexes. Hunk-specific comments follow, largely concerning documentation. I really, really don't want to get mired in a discussion of exact documentation text. In fact I think I'd rather hunt crocodiles barefoot, armed with nothing but ALTER TABLE ... DISABLE TRIGGER ALL than have such a discussion. But I'll articulate the rationale behind my original constructions, in case they include facts you did not consider when rewriting them. diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml index 9f02674..e3ceea8 100644 --- a/doc/src/sgml/ref/alter_table.sgml +++ b/doc/src/sgml/ref/alter_table.sgml @@ -766,9 +766,13 @@ ALTER TABLE replaceable class=PARAMETERname/replaceable para Adding a column with a non-null default or changing the type of an existing column will require the entire table and indexes to be rewritten. -This might take a significant amount of time for a large table; and it will -temporarily require double the disk space. Adding or removing a system -literaloid/ column likewise requires rewriting the entire table. +As an exception, if the old type and new types are binary compatible and In the documentation for CREATE CAST, we define binary compatible using Two types that are binary coercible both ways are also referred to as binary compatible. This feature does not require binary compatibility, merely a one-way binary coercion. Generally, though, I like where you're going. My version was accurate but obtuse. +the literalUSING/ clause does not change the column contents, the This is probably fine, but note that USING col || '' does not change the column contents, yet we won't optimize it. +table rewrite will be skipped, but the index rebuild is still required. This wrongly suggests that the rebuild mentioned earlier in the paragraph, affecting all indexes, will take place. It's a more-limited rebuild. +Adding or removing a
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Fri, Feb 11, 2011 at 1:08 PM, Noah Misch n...@leadboat.com wrote: It's a nice, clean patch. However, it achieves that by targeting a smaller goal. Specifically, it omits: 1) Test cases and DEBUG messages sufficient to facilitate them That was an intentional omission. 2) Skip rewrite for T - constraint-free domain over T 3) Downgrade rewrite to scan for T - constrained domain over T These were not. Upon examination, it appeared to me that trying to make the AlteredTableInfo contain an enum indicating whether we were doing a scan, a rewrite, or nothing was requiring more code change than I could really justify. Offhand, I count 12 changed lines to introduce the enum. There may be other good reasons not to do it, but surely that wasn't it? I was unable to see what were getting out of that logic, and I couldn't readily verify that it was correct. @@ -766,9 +766,13 @@ ALTER TABLE replaceable class=PARAMETERname/replaceable para Adding a column with a non-null default or changing the type of an existing column will require the entire table and indexes to be rewritten. - This might take a significant amount of time for a large table; and it will - temporarily require double the disk space. Adding or removing a system - literaloid/ column likewise requires rewriting the entire table. + As an exception, if the old type and new types are binary compatible and In the documentation for CREATE CAST, we define binary compatible using Two types that are binary coercible both ways are also referred to as binary compatible. This feature does not require binary compatibility, merely a one-way binary coercion. Generally, though, I like where you're going. My version was accurate but obtuse. OK, I have touched up that language in the attached version. + the literalUSING/ clause does not change the column contents, the This is probably fine, but note that USING col || '' does not change the column contents, yet we won't optimize it. True; but I think we can let the fine user figure that out for themselves. :-) + table rewrite will be skipped, but the index rebuild is still required. This wrongly suggests that the rebuild mentioned earlier in the paragraph, affecting all indexes, will take place. It's a more-limited rebuild. Ah, OK. I've changed the wording there. This presents three options without any indication of how to choose between them. A user wanting just a rewrite should look no further than VACUUM FULL. We should document that ALTER TABLE can rewrite, both for general user planning and to dispel thoughts of doing both a VACUUM FULL and an ALTER TABLE in short succession. However, it doesn't help to advertise ALTER TABLE or CLUSTER as rewrite tools per se. I was wondering if we should try to be more clear about that, but I think it's a separate issue from this patch. --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -71,6 +71,7 @@ #include storage/smgr.h #include utils/acl.h #include utils/builtins.h +#include utils/datum.h This header is not needed in your version, is it? Looks like it isn't, thanks. Moving this from parse_coerce.c to tablecmds.c does make sense. OK, glad you agree. It seemed sensible to me.. I think what I'd like to do if there are no major objections is commit this as-is, and then if you could perhaps provide a patch containing the set of changes that are necessary to recapture the cases I inadvertently failed to handle, namely: 2) Skip rewrite for T - constraint-free domain over T 3) Downgrade rewrite to scan for T - constrained domain over T Then I'll review that separately. I think this change stands on its own, and committing it in steps will be simple for me than doing the whole thing in one go. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company alter-type-2-rmh-v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Fri, Feb 11, 2011 at 1:08 PM, Noah Misch n...@leadboat.com wrote: Even supposing we push off all scan-only cases to another patch, it would be good to have the tablecmds.c-internal representation of that in mind. No sense in simplifying a 12-line change to an 8-line change, only to redo it next patch. Incidentally, I don't really agree with this, as a philosophical point. There can be a lot of point to simplifying things, even if it means redoing a little work, if it makes them easier to understand, both for the people reviewing at the time and for the benefit of people reading the commit log in the future. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Fri, Feb 11, 2011 at 01:55:45PM -0500, Robert Haas wrote: On Fri, Feb 11, 2011 at 1:08 PM, Noah Misch n...@leadboat.com wrote: Even supposing we push off all scan-only cases to another patch, it would be good to have the tablecmds.c-internal representation of that in mind. ?No sense in simplifying a 12-line change to an 8-line change, only to redo it next patch. Incidentally, I don't really agree with this, as a philosophical point. There can be a lot of point to simplifying things, even if it means redoing a little work, if it makes them easier to understand, both for the people reviewing at the time and for the benefit of people reading the commit log in the future. Good to know. I can envision that perspective, and I share it when the savings is rather more substantial, say 10% of the patch or 100 lines. Below that threshold, the energy I expend grasping two interface changes in one patch series exceeds my annoyance at the premature generality. On Fri, Feb 11, 2011 at 01:34:14PM -0500, Robert Haas wrote: On Fri, Feb 11, 2011 at 1:08 PM, Noah Misch n...@leadboat.com wrote: Upon examination, it appeared to me that trying to make the AlteredTableInfo contain an enum indicating whether we were doing a scan, a rewrite, or nothing was requiring more code change than I could really justify. Offhand, I count 12 changed lines to introduce the enum. ?There may be other good reasons not to do it, but surely that wasn't it? I was unable to see what were getting out of that logic, and I couldn't readily verify that it was correct. The value probably gets clearer when you need to use all three states. I think what I'd like to do if there are no major objections is commit this as-is, and then if you could perhaps provide a patch containing the set of changes that are necessary to recapture the cases I inadvertently failed to handle, namely: 2) Skip rewrite for T - constraint-free domain over T 3) Downgrade rewrite to scan for T - constrained domain over T Then I'll review that separately. I think this change stands on its own, and committing it in steps will be simple for me than doing the whole thing in one go. That works for me. Know that, barring other suggestions, the followup patch will replace AlteredTableInfo.rewrite with the enum field. Just want to make sure that's not a surprise. ... And now that I've read your second reply, I probably didn't even have to mention it. Thanks again, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Fri, Feb 11, 2011 at 2:17 PM, Noah Misch n...@leadboat.com wrote: Good to know. I can envision that perspective, and I share it when the savings is rather more substantial, say 10% of the patch or 100 lines. Below that threshold, the energy I expend grasping two interface changes in one patch series exceeds my annoyance at the premature generality. It's a fair point. One other thing that may be worth mentioning is that you're diving into the community at a fairly high level. It's obvious that you have a pretty solid understanding of the code, better than mine in some areas, and I don't want to be the dumb guy who won't commit your patch because I'm too, like, dumb. OTOH, as you've probably realized, our community dynamic is that the committer is the one who gets yelled at when something is broken. Then I'll review that separately. I think this change stands on its own, and committing it in steps will be simple for me than doing the whole thing in one go. That works for me. Know that, barring other suggestions, the followup patch will replace AlteredTableInfo.rewrite with the enum field. Just want to make sure that's not a surprise. ... And now that I've read your second reply, I probably didn't even have to mention it. It's all good. You might want to consider a second boolean in lieu of a three way enum. I'm not sure if that's cleaner but if it lets you write: if (blah) at-verify = true; instead of: if (blah) at-worklevel = Min(at-worklevel, WORK_VERIFY); ...then I think that might be cleaner. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Sun, Feb 06, 2011 at 02:15:47AM -0500, Robert Haas wrote: On Sat, Feb 5, 2011 at 7:44 PM, Noah Misch n...@leadboat.com wrote: But this is a little unsatisfying, for two reasons. ?First, the error message will be subtly wrong: we can make it complain about a table or a type, but not a foreign table. ?At a quick look, it likes the right fix might be to replace the second and third arguments to find_composite_type_dependencies() with a Relation. Seems like a clear improvement. That didn't quite work, because there's a caller in typecmds.c that doesn't have the relation handy. So I made it take a relkind and a name, which works fine. Hmm, indeed. In get_rels_with_domain(), it's a scalar type. Second, I wonder if we shouldn't refactor things so that all the checks fire in ATRewriteTables() rather than doing them in different places. ?Seems like that might be cleaner. Offhand, this seems reasonable, too. ?I assumed there was some good reason it couldn't be done there for non-tables, but nothing comes to mind. Actually, thinking about this more, I'm thinking if we're going to change anything, it seems we ought to go the other way. If we ever actually did support recursing into wherever the composite type dependencies take us, we'd want to detect that before phase 3 and add work-queue items for each table that we needed to futz with. The attached patch takes this approach. It's very slightly more code, but it reduces the amount of spooky action at a distance. Comments? Your patch improves the code. My standard for commending a refactor-only patch is rather high, though, and this patch doesn't reach it. The ancestral code placement wasn't obviously correct, but neither is this. So I'd vote -0. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Sun, Feb 6, 2011 at 4:15 AM, Noah Misch n...@leadboat.com wrote: That didn't quite work, because there's a caller in typecmds.c that doesn't have the relation handy. So I made it take a relkind and a name, which works fine. Hmm, indeed. In get_rels_with_domain(), it's a scalar type. Yeeah, that's actually a little ugly. It's actually a domain over a composite type, not a composite type proper, IIUC. Better ideas? The attached patch takes this approach. It's very slightly more code, but it reduces the amount of spooky action at a distance. Comments? Your patch improves the code. My standard for commending a refactor-only patch is rather high, though, and this patch doesn't reach it. The ancestral code placement wasn't obviously correct, but neither is this. So I'd vote -0. Well, what's your suggestion, then? Your patch pops the test up from ATRewriteTable() up to ATRewriteTables(), but that's not obviously correct either, and it's an awkward place to do it because we don't have the Relation object handy at the point where the check needs to be done. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Sun, Feb 06, 2011 at 07:54:52AM -0500, Robert Haas wrote: On Sun, Feb 6, 2011 at 4:15 AM, Noah Misch n...@leadboat.com wrote: That didn't quite work, because there's a caller in typecmds.c that doesn't have the relation handy. ?So I made it take a relkind and a name, which works fine. Hmm, indeed. ?In get_rels_with_domain(), it's a scalar type. Yeeah, that's actually a little ugly. It's actually a domain over a composite type, not a composite type proper, IIUC. Better ideas? There are no domains over composite types. get_rels_with_domain() finds all relations having columns of the (scalar) domain type. It then calls find_composite_type_dependencies to identify uses of the composite types discovered in the previous step. Honestly, RELKIND_COMPOSITE_TYPE is a reasonable choice despite the technical mismatch. One more-correct approach would be to have two arguments, a catalog OID (pg_type or pg_class, currently) and a relkind, 0 when the catalog OID != pg_class. Might be an improvement, albeit a minor one. The attached patch takes this approach. ?It's very slightly more code, but it reduces the amount of spooky action at a distance. Comments? Your patch improves the code. ?My standard for commending a refactor-only patch is rather high, though, and this patch doesn't reach it. ?The ancestral code placement wasn't obviously correct, but neither is this. ?So I'd vote -0. Well, what's your suggestion, then? Your patch pops the test up from ATRewriteTable() up to ATRewriteTables(), but that's not obviously correct either, and it's an awkward place to do it because we don't have the Relation object handy at the point where the check needs to be done. Agreed. I couldn't think of any grand improvements, so my patch bore the conceptually-smallest change I could come up with to handle that particular issue. That is, I felt it was the change that could most easily be verified as rejecting the same cases as the old test. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Sun, Feb 06, 2011 at 08:40:44AM -0500, Noah Misch wrote: On Sun, Feb 06, 2011 at 07:54:52AM -0500, Robert Haas wrote: Yeeah, that's actually a little ugly. It's actually a domain over a composite type, not a composite type proper, IIUC. Better ideas? There are no domains over composite types. get_rels_with_domain() finds all relations having columns of the (scalar) domain type. It then calls find_composite_type_dependencies to identify uses of the composite types discovered in the previous step. Honestly, RELKIND_COMPOSITE_TYPE is a reasonable choice despite the technical mismatch. One more-correct approach would be to have two arguments, a catalog OID (pg_type or pg_class, currently) and a relkind, 0 when the catalog OID != pg_class. Might be an improvement, albeit a minor one. Scratch that. How about classid and objid arguments, passing them to getObjectionDescription() internally? We already do something very similar in ATExecAlterColumnType for a related case. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Sun, Feb 6, 2011 at 8:58 AM, Noah Misch n...@leadboat.com wrote: On Sun, Feb 06, 2011 at 08:40:44AM -0500, Noah Misch wrote: On Sun, Feb 06, 2011 at 07:54:52AM -0500, Robert Haas wrote: Yeeah, that's actually a little ugly. It's actually a domain over a composite type, not a composite type proper, IIUC. Better ideas? There are no domains over composite types. get_rels_with_domain() finds all relations having columns of the (scalar) domain type. It then calls find_composite_type_dependencies to identify uses of the composite types discovered in the previous step. Honestly, RELKIND_COMPOSITE_TYPE is a reasonable choice despite the technical mismatch. One more-correct approach would be to have two arguments, a catalog OID (pg_type or pg_class, currently) and a relkind, 0 when the catalog OID != pg_class. Might be an improvement, albeit a minor one. Scratch that. How about classid and objid arguments, passing them to getObjectionDescription() internally? We already do something very similar in ATExecAlterColumnType for a related case. That's not quite so good for translators, I think. Another option is that we could just say relation (table, foreign table, etc...) or type. We use the word relation as a more generic version of table in a few other places. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Sun, Feb 6, 2011 at 12:13 PM, Robert Haas robertmh...@gmail.com wrote: That's not quite so good for translators, I think. Another option is that we could just say relation (table, foreign table, etc...) or type. We use the word relation as a more generic version of table in a few other places. Or how about passing an ObjectType? Then we could specify OBJECT_TABLE, OBJECT_FOREIGN_TABLE, or OBJECT_TYPE. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Sun, Feb 06, 2011 at 12:54:19PM -0500, Robert Haas wrote: On Sun, Feb 6, 2011 at 12:13 PM, Robert Haas robertmh...@gmail.com wrote: That's not quite so good for translators, I think. Another option is that we could just say relation (table, foreign table, etc...) or type. ?We use the word relation as a more generic version of table in a few other places. Seems fine. Or how about passing an ObjectType? Then we could specify OBJECT_TABLE, OBJECT_FOREIGN_TABLE, or OBJECT_TYPE. Could this be done without a several-line blob of code at each call site to determine the answer? If and only if so, this sounds better. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Sun, Feb 6, 2011 at 8:18 PM, Noah Misch n...@leadboat.com wrote: Or how about passing an ObjectType? Then we could specify OBJECT_TABLE, OBJECT_FOREIGN_TABLE, or OBJECT_TYPE. Could this be done without a several-line blob of code at each call site to determine the answer? If and only if so, this sounds better. Yeah, that's a problem. New thought: how about we go back more or less to the original coding, except replacing the second argument (only) with a Relation? In other words, callers will pass either a Relation (which might be a table or foreign table) or a type name. Not particularly elegant, but no worse than what we had before. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Mon, Feb 07, 2011 at 12:04:02AM -0500, Robert Haas wrote: On Sun, Feb 6, 2011 at 8:18 PM, Noah Misch n...@leadboat.com wrote: Or how about passing an ObjectType? ?Then we could specify OBJECT_TABLE, OBJECT_FOREIGN_TABLE, or OBJECT_TYPE. Could this be done without a several-line blob of code at each call site to determine the answer? ?If and only if so, this sounds better. Yeah, that's a problem. New thought: how about we go back more or less to the original coding, except replacing the second argument (only) with a Relation? In other words, callers will pass either a Relation (which might be a table or foreign table) or a type name. Not particularly elegant, but no worse than what we had before. Sounds good. Thanks. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
Robert, Thanks for the obviously thought-out review. On Sat, Feb 05, 2011 at 01:29:35AM -0500, Robert Haas wrote: On Thu, Jan 27, 2011 at 2:48 PM, Noah Misch n...@leadboat.com wrote: Done as attached. ?This preserves compatibility with our current handling of composite type dependencies. ?The rest you've seen before. OK, so I took a look at this in more detail today. The current logic for table rewrites looks like this: 1. If we're changing the data type of a column, or adding a column with a default, or adding/dropping OIDs, rewrite the table. Stop. 2. Otherwise, if we're adding a constraint or NOT NULL, scan the table and check constraints. 3. If we're changing tablespaces, copy the table block-by-block. Correct. It's perhaps obvious, but rewriting the table will always reindex it. It seems to me that the revised logic needs to look like this: 1. If we're changing the data type of a column and the existing contents are not binary coercible to the new contents, or if we're adding a column with a default or adding/dropping OIDs, rewrite the table. Stop. 2. Otherwise, if we're adding a constraint or NOT NULL, scan the table and check constraints. With this patch, step 2 changes changes to Otherwise, if we're adding a constraint or NOT NULL, or changing a column to a binary-compatible domain with a domain CHECK constraint, scan the table and check constraints. 3. If we're changing the data type of a column in the table, reindex the table. Rebuild indexes that depend on a changing column. Other indexes can stay. 4. If we're changing tablespaces, copy the table block-by-block. I might be missing something, but I don't see that the patch includes step #3, which I think is necessary. For example, citext is binary coercible to text, but you can't reuse the index because the comparison function is different. Of course, if you're changing the type of a column to its already-current type, you can skip #3, but if that's the only case we can optimize, it's not much of an accomplishment. I guess this gets back to the ALTER TYPE 7 patch, which I haven't looked at in detail, but I have a feeling it may be controversial. It's there, but it's happening rather implicitly. ATExecAlterColumnType builds lists of indexes and constraints that depend on changing columns. Specifically, it stashes their OIDs and the SQL to recreate them. ATPostAlterTypeCleanup drops those objects by OID, then parses the SQL statements, now based on the updated table definition. ATExecAddIndex and ATExecAddConstraint use those parsed statements to recreate the objects. The key is the skip_build logic in ATExecAddIndex: if ATRewriteTables will rewrite the table (and therefore *all* indexes), we skip the build at that earlier stage to avoid building the same index twice. The only thing I had to do was update the skip_build condition so it continues to mirror the corresponding test in ATRewriteTables. Originally I had this patch doing a full reindex, with an eye to having the next patch reduce the scope to dependent indexes. However, all the infrastructure was already there, and it actually made this patch smaller to skip directly to what it does today. ALTER TYPE 7 additionally skips builds of indexes that depend on a changing column but can be proven compatible. So it's in the business of, for example figuring out that text and varchar are compatible but text and citext are not. Another problem here is that if you have to do both #2 and #3, you might have been better off (or just as well off) doing #1, unless you can somehow jigger things so that the same scan does both the constraint checks and the index rebuild. That doesn't look simple. We have no such optimization during #1, either, so #2+#3 is never worse. In particular, #1 scans the table (# indexes total) + 1 times, while #2+#3 scans it (# of indexes depending on changed columns) + 1 times. There are some nice optimization opportunities here, to be sure. As a specific first step, teach index_build to create multiple indexes with a single scan, then have reindex_relation use that. Probably not simple. Combining that with the ATRewriteTable scan would be less simple still. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Sat, Feb 5, 2011 at 3:05 AM, Noah Misch n...@leadboat.com wrote: Correct. It's perhaps obvious, but rewriting the table will always reindex it. Right. 3. If we're changing the data type of a column in the table, reindex the table. Rebuild indexes that depend on a changing column. Other indexes can stay. Good point. 4. If we're changing tablespaces, copy the table block-by-block. I might be missing something, but I don't see that the patch includes step #3, which I think is necessary. For example, citext is binary coercible to text, but you can't reuse the index because the comparison function is different. Of course, if you're changing the type of a column to its already-current type, you can skip #3, but if that's the only case we can optimize, it's not much of an accomplishment. I guess this gets back to the ALTER TYPE 7 patch, which I haven't looked at in detail, but I have a feeling it may be controversial. It's there, but it's happening rather implicitly. I see now. So you're actually not really making any change to that machinery. It's sufficient to just skip the rewrite of the heap when it isn't needed, and without any particular code change the indexes will sort themselves out. We have no such optimization during #1, either, so #2+#3 is never worse. In particular, #1 scans the table (# indexes total) + 1 times, while #2+#3 scans it (# of indexes depending on changed columns) + 1 times. OK. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Thu, Jan 27, 2011 at 2:48 PM, Noah Misch n...@leadboat.com wrote: Done as attached. Looking at this still more, it appears that independent of any change this patch may wish to make, there's a live bug here related to the foreign table patch I committed back in December. Creating a foreign table creates an eponymous rowtype, which can be used as a column in a regular table. You can then change the data type of a column in the foreign table, read from the regular table, and crash the server. The simple fix for this is to just change the code in ATPrepAlterColumnType to handle the foreign table case also: if (tab-relkind == RELKIND_COMPOSITE_TYPE) { /* * For composite types, do this check now. Tables will check * it later when the table is being rewritten. */ find_composite_type_dependencies(rel-rd_rel-reltype, NULL, RelationGetRelationName(rel)); } But this is a little unsatisfying, for two reasons. First, the error message will be subtly wrong: we can make it complain about a table or a type, but not a foreign table. At a quick look, it likes the right fix might be to replace the second and third arguments to find_composite_type_dependencies() with a Relation. Second, I wonder if we shouldn't refactor things so that all the checks fire in ATRewriteTables() rather than doing them in different places. Seems like that might be cleaner. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Sat, Feb 05, 2011 at 10:03:59AM -0500, Robert Haas wrote: Looking at this still more, it appears that independent of any change this patch may wish to make, there's a live bug here related to the foreign table patch I committed back in December. Creating a foreign table creates an eponymous rowtype, which can be used as a column in a regular table. You can then change the data type of a column in the foreign table, read from the regular table, and crash the server. The simple fix for this is to just change the code in ATPrepAlterColumnType to handle the foreign table case also: if (tab-relkind == RELKIND_COMPOSITE_TYPE) { /* * For composite types, do this check now. Tables will check * it later when the table is being rewritten. */ find_composite_type_dependencies(rel-rd_rel-reltype, NULL, RelationGetRelationName(rel)); } But this is a little unsatisfying, for two reasons. First, the error message will be subtly wrong: we can make it complain about a table or a type, but not a foreign table. At a quick look, it likes the right fix might be to replace the second and third arguments to find_composite_type_dependencies() with a Relation. Seems like a clear improvement. Second, I wonder if we shouldn't refactor things so that all the checks fire in ATRewriteTables() rather than doing them in different places. Seems like that might be cleaner. Offhand, this seems reasonable, too. I assumed there was some good reason it couldn't be done there for non-tables, but nothing comes to mind. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Sat, Feb 5, 2011 at 7:44 PM, Noah Misch n...@leadboat.com wrote: But this is a little unsatisfying, for two reasons. First, the error message will be subtly wrong: we can make it complain about a table or a type, but not a foreign table. At a quick look, it likes the right fix might be to replace the second and third arguments to find_composite_type_dependencies() with a Relation. Seems like a clear improvement. That didn't quite work, because there's a caller in typecmds.c that doesn't have the relation handy. So I made it take a relkind and a name, which works fine. Second, I wonder if we shouldn't refactor things so that all the checks fire in ATRewriteTables() rather than doing them in different places. Seems like that might be cleaner. Offhand, this seems reasonable, too. I assumed there was some good reason it couldn't be done there for non-tables, but nothing comes to mind. Actually, thinking about this more, I'm thinking if we're going to change anything, it seems we ought to go the other way. If we ever actually did support recursing into wherever the composite type dependencies take us, we'd want to detect that before phase 3 and add work-queue items for each table that we needed to futz with. The attached patch takes this approach. It's very slightly more code, but it reduces the amount of spooky action at a distance. The existing coding is basically relying on the assumption that operations which require finding composite type dependencies also require a table rewrite. That was probably true at one point in time, but it's not really quite right. It already requires compensating code foreign tables and composite types (which can require this treatment even though there's nothing to rewrite) and your proposed changes to avoid table rewrites in cases where a type is changed to a compatible type would break it in the opposite direction (the check would still be needed even if the parent table doesn't need a rewrite, because the rowtype could be indexed in some fashion that depends on the type of the column being changed). Comments? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 6a17399..83f8be0 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -3378,23 +3378,6 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode) } /* - * If we change column data types or add/remove OIDs, the operation has to - * be propagated to tables that use this table's rowtype as a column type. - * newrel will also be non-NULL in the case where we're adding a column - * with a default. We choose to forbid that case as well, since composite - * types might eventually support defaults. - * - * (Eventually we'll probably need to check for composite type - * dependencies even when we're just scanning the table without a rewrite, - * but at the moment a composite type does not enforce any constraints, - * so it's not necessary/appropriate to enforce them just during ALTER.) - */ - if (newrel) - find_composite_type_dependencies(oldrel-rd_rel-reltype, - oldrel-rd_rel-relkind, - RelationGetRelationName(oldrel)); - - /* * Generate the constraint and default execution states */ @@ -4055,7 +4038,7 @@ ATExecAddColumn(AlteredTableInfo *tab, Relation rel, Oid typeOid; int32 typmod; Form_pg_type tform; - Expr *defval; + Expr *defval = NULL; attrdesc = heap_open(AttributeRelationId, RowExclusiveLock); @@ -4304,6 +4287,20 @@ ATExecAddColumn(AlteredTableInfo *tab, Relation rel, tab-new_changeoids = true; /* + * If we're adding an OID column, the operation has to be propagated to + * tables that use this table's rowtype as a column type. But we don't + * currently have the infrastructure for that, so just throw an error. + * We also forbid the case where we're adding a column with a default, since + * composite types might eventually support defaults, and ALTER TABLE .. + * ADD COLUMN .. DEFAULT would be expected to initialize the newly added + * column to the default in each instantiation of the rowtype. + */ + if (isOid || defval) + find_composite_type_dependencies(rel-rd_rel-reltype, + rel-rd_rel-relkind, + RelationGetRelationName(rel)); + + /* * Add needed dependency entries for the new column. */ add_column_datatype_dependency(myrelid, newattnum, attribute.atttypid); @@ -4975,6 +4972,16 @@ ATExecDropColumn(List **wqueue, Relation rel, const char *colName, /* Tell Phase 3 to physically remove the OID column */ tab-new_changeoids = true; + + /* + * The OID removal operation needs to be propagated to tables that use + * this table's rowtype as a column type. But we don't currently have + * the infrastructure for that, so just throw an error if it would be + * required. + */ +
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Thu, Jan 27, 2011 at 2:48 PM, Noah Misch n...@leadboat.com wrote: Done as attached. This preserves compatibility with our current handling of composite type dependencies. The rest you've seen before. OK, so I took a look at this in more detail today. The current logic for table rewrites looks like this: 1. If we're changing the data type of a column, or adding a column with a default, or adding/dropping OIDs, rewrite the table. Stop. 2. Otherwise, if we're adding a constraint or NOT NULL, scan the table and check constraints. 3. If we're changing tablespaces, copy the table block-by-block. It seems to me that the revised logic needs to look like this: 1. If we're changing the data type of a column and the existing contents are not binary coercible to the new contents, or if we're adding a column with a default or adding/dropping OIDs, rewrite the table. Stop. 2. Otherwise, if we're adding a constraint or NOT NULL, scan the table and check constraints. 3. If we're changing the data type of a column in the table, reindex the table. 4. If we're changing tablespaces, copy the table block-by-block. I might be missing something, but I don't see that the patch includes step #3, which I think is necessary. For example, citext is binary coercible to text, but you can't reuse the index because the comparison function is different. Of course, if you're changing the type of a column to its already-current type, you can skip #3, but if that's the only case we can optimize, it's not much of an accomplishment. I guess this gets back to the ALTER TYPE 7 patch, which I haven't looked at in detail, but I have a feeling it may be controversial. Another problem here is that if you have to do both #2 and #3, you might have been better off (or just as well off) doing #1, unless you can somehow jigger things so that the same scan does both the constraint checks and the index rebuild. That doesn't look simple. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Wed, Jan 26, 2011 at 07:31:40AM -0500, Robert Haas wrote: I'd also suggest that this big if-block you changed to a case statement could just as well stay as an if-block. There are only three cases, and we want to avoid rearranging things more than necessary. It complicates both review and back-patching to no good end. Okay. I've also left out the large reindent in ATRewriteTable for now. Easy to re-add it later if desired. I think you should collect up what's left of ALTER TABLE 0 and the stuff on this thread, rebase it, and submit it as a single patch on this thread that applies directly against the master branch. We may decide to split it back up again in some other way, but I think the current division isn't actually buying us much. Done as attached. This preserves compatibility with our current handling of composite type dependencies. The rest you've seen before. Thanks, nm diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml index 1d52be6..4efb02e 100644 *** a/doc/src/sgml/ref/alter_table.sgml --- b/doc/src/sgml/ref/alter_table.sgml *** *** 403,411 ALTER TABLE replaceable class=PARAMETERname/replaceable for details on the available parameters. Note that the table contents will not be modified immediately by this command; depending on the parameter you might need to rewrite the table to get the desired effects. ! That can be done with xref linkend=SQL-CLUSTER ! or one of the forms of commandALTER ! TABLE/ that forces a table rewrite. /para note --- 403,411 for details on the available parameters. Note that the table contents will not be modified immediately by this command; depending on the parameter you might need to rewrite the table to get the desired effects. ! That can be done with link linkend=SQL-VACUUMVACUUM ! FULL/, xref linkend=SQL-CLUSTER or one of the forms ! of commandALTER TABLE/ that forces a table rewrite. /para note *** *** 746,759 ALTER TABLE replaceable class=PARAMETERname/replaceable /para para ! Adding a column with a non-null default or changing the type of an ! existing column will require the entire table and indexes to be rewritten. ! This might take a significant amount of time for a large table; and it will ! temporarily require double the disk space. Adding or removing a system literaloid/ column likewise requires rewriting the entire table. /para para Adding a literalCHECK/ or literalNOT NULL/ constraint requires scanning the table to verify that existing rows meet the constraint. /para --- 746,770 /para para ! Adding a column with a non-null default will rewrite the entire table and ! all indexes. Changing the type of an existing column will do the same ! unless a binary-coercible cast implements the type conversion. Refer to ! xref linkend=SQL-CREATECAST for further information. A rewrite might ! take a significant amount of time for a large table, and it will temporarily ! require double the disk space. Adding or removing a system literaloid/ column likewise requires rewriting the entire table. /para para + Similar to the behavior of link linkend=SQL-VACUUMVACUUM FULL/, the + rewriting process eliminates any dead space in the table. Prior + to productnamePostgreSQL/ 9.0, literalSET DATA TYPE/ + outpaced literalVACUUM FULL/ at this task, so it was useful even absent + the need for a column type change. This speed advantage no longer holds, + and literalSET DATA TYPE/ may not even rewrite the table. +/para + +para Adding a literalCHECK/ or literalNOT NULL/ constraint requires scanning the table to verify that existing rows meet the constraint. /para *** *** 777,797 ALTER TABLE replaceable class=PARAMETERname/replaceable /para para - The fact that literalSET DATA TYPE/ requires rewriting the whole table - is sometimes an advantage, because the rewriting process eliminates - any dead space in the table. For example, to reclaim the space occupied - by a dropped column immediately, the fastest way is: - programlisting - ALTER TABLE table ALTER COLUMN anycol TYPE anytype; - /programlisting - where literalanycol/ is any remaining table column and - literalanytype/ is the same type that column already has. - This results in no semantically-visible change in the table, - but the command forces rewriting, which gets rid of no-longer-useful - data. -/para - -para The literalUSING/literal option of literalSET DATA TYPE/ can actually specify any expression involving the old values of the row; that is, it can refer to other columns as well as the one being converted. This allows --- 788,793
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Tue, Jan 25, 2011 at 10:22 PM, Noah Misch n...@leadboat.com wrote: I'm fine with this patch. OK, committed. The next patch removed new_changeoids, so we would instead be keeping it with this as the only place referencing it. [...] The at2v2 patch would then morph to do something like: if (worklevel != WORK_NONE) tab-new_changetypes = true; Well, I'm not too keen on either of those things. The second one, especially, looks like the sense of the Boolean is clearly being abused, so either the Boolean needs to be renamed or some other change is required. I'd also suggest that this big if-block you changed to a case statement could just as well stay as an if-block. There are only three cases, and we want to avoid rearranging things more than necessary. It complicates both review and back-patching to no good end. I think you should collect up what's left of ALTER TABLE 0 and the stuff on this thread, rebase it, and submit it as a single patch on this thread that applies directly against the master branch. We may decide to split it back up again in some other way, but I think the current division isn't actually buying us much. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Mon, Jan 24, 2011 at 7:10 PM, Noah Misch n...@leadboat.com wrote: * at1.1-default-composite.patch Remove the error when the user adds a column with a default value to a table whose rowtype is used in a column elsewhere. Can we fix this without moving the logic around quite so much? I'm worried that could introduce bugs. It strikes me that the root of the problem here is that this test is subtly wrong: if (newrel) find_composite_type_dependencies(oldrel-rd_rel-reltype, RelationGetRelationName(oldrel), NULL); So what this is saying is: If the user has performed an operation that requires a rewrite, then we can't carry out that operation if the rowtype is used elsewhere, because we wouldn't be able to propagate the rewrite to those other objects. That's correct, unless the operation in question is one which isn't supported by composite types anyway. We trigger a rewrite if there is a has-OIDs change or if tab-newvals contains any elements, which can happen if either there is a type change or if a column with a default is added. So it seems to me that we could fix this with something like the attached. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company defaults-are-not-so-evil.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Mon, Jan 24, 2011 at 7:10 PM, Noah Misch n...@leadboat.com wrote: * at1.2-doc-set-data-type.patch The documentation used ALTER TYPE when it meant SET DATA TYPE, a subform of ALTER TABLE or ALTER FOREIGN TABLE. Fixes just that. Committed this part. For reasons involving me being tired, I initially thought that only the first part was correct, which is why I did it as two commits. But it's obviously right, so now it's all committed. I back-patched the ALTER TABLE part to 9.0.X so it'll show up in the web site docs after the next minor release. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Tue, Jan 25, 2011 at 06:40:08PM -0500, Robert Haas wrote: On Mon, Jan 24, 2011 at 7:10 PM, Noah Misch n...@leadboat.com wrote: * at1.1-default-composite.patch Remove the error when the user adds a column with a default value to a table whose rowtype is used in a column elsewhere. Can we fix this without moving the logic around quite so much? I'm worried that could introduce bugs. It strikes me that the root of the problem here is that this test is subtly wrong: if (newrel) find_composite_type_dependencies(oldrel-rd_rel-reltype, RelationGetRelationName(oldrel), NULL); Correct. So it seems to me that we could fix this with something like the attached. Thoughts? I'm fine with this patch. A few notes based on its context in the larger project: --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -3366,14 +3367,14 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode) } /* - * If we need to rewrite the table, the operation has to be propagated to - * tables that use this table's rowtype as a column type. + * If we change column data types or add/remove OIDs, the operation has to + * be propagated to tables that use this table's rowtype as a column type. * * (Eventually this will probably become true for scans as well, but at * the moment a composite type does not enforce any constraints, so it's * not necessary/appropriate to enforce them just during ALTER.) */ - if (newrel) + if (tab-new_changetypes || tab-new_changeoids) The next patch removed new_changeoids, so we would instead be keeping it with this as the only place referencing it. find_composite_type_dependencies(oldrel-rd_rel-reltype, RelationGetRelationName(oldrel), NULL); @@ -6347,6 +6348,7 @@ ATPrepAlterColumnType(List **wqueue, newval-expr = (Expr *) transform; tab-newvals = lappend(tab-newvals, newval); + tab-new_changetypes = true; The at2v2 patch would then morph to do something like: if (worklevel != WORK_NONE) tab-new_changetypes = true; That weakens the name new_changetypes a bit. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
Robert Haas robertmh...@gmail.com writes: On Sun, Jan 9, 2011 at 5:01 PM, Noah Misch n...@leadboat.com wrote: As unintended fallout, it's no longer an error to add oids or a column with a default value to a table whose rowtype is used in columns elsewhere. This seems best. Defaults on the origin table do not even apply to new inserts into such a column, and the rowtype does not gain an OID column via its table. I think this should be split into two patches (we can discuss both on this thread, no need to start any new ones), one that implements just the above improvement and another that accomplishes the main purpose of the patch. I haven't been paying much attention to this thread, but I happened to read the above, and quite frankly it scares the cr*p out of me. I don't believe that Noah even begins to be qualified to understand the implications of adding/removing an OID column, and I clearly remember the non-obvious bugs we've had in the past from that. Before this goes in I want to see a convincing explanation (not an unsupported assertion) why this isn't going to re-introduce those old bugs. I'm also pretty unclear on why it's a good idea to let uses of a rowtype be different from the table on which it's allegedly based. To the extent that the current behavior allows that, isn't that a bug rather than a feature we should extend? # The fact that ALTER TYPE requires rewriting the whole table is sometimes an advantage, because the rewriting process # eliminates any dead space in the table. I'm not sure what we can recommend here instead. New-style VACUUM FULL. I don't think that a patch that makes it harder to use ALTER TABLE this way is a problem in itself, now that we've got that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Sun, Jan 23, 2011 at 12:06 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Sun, Jan 9, 2011 at 5:01 PM, Noah Misch n...@leadboat.com wrote: As unintended fallout, it's no longer an error to add oids or a column with a default value to a table whose rowtype is used in columns elsewhere. This seems best. Defaults on the origin table do not even apply to new inserts into such a column, and the rowtype does not gain an OID column via its table. I think this should be split into two patches (we can discuss both on this thread, no need to start any new ones), one that implements just the above improvement and another that accomplishes the main purpose of the patch. I haven't been paying much attention to this thread, but I happened to read the above, and quite frankly it scares the cr*p out of me. I don't believe that Noah even begins to be qualified to understand the implications of adding/removing an OID column, and I clearly remember the non-obvious bugs we've had in the past from that. Before this goes in I want to see a convincing explanation (not an unsupported assertion) why this isn't going to re-introduce those old bugs. Because all of our old bugs now have regression tests that will break if we reintroduce them? I guess that probably falls into the category of wishful thinking. I'm also pretty unclear on why it's a good idea to let uses of a rowtype be different from the table on which it's allegedly based. To the extent that the current behavior allows that, isn't that a bug rather than a feature we should extend? It's not clear to me what it would mean for OIDs or default values to propagate themselves to the table's row type. Check constraints, foreign keys, unique constraints, etc. don't either. In fact, not even the NOT NULL property flows through. On the surface, preventing these details from interfering with ALTER TABLE commands that can't actually break anything seems like removing an annoying implementation restriction, but I guess one could make the argument that we actually intend to make those flow through some day. But if so, this is the first I'm hearing of it. # The fact that ALTER TYPE requires rewriting the whole table is sometimes an advantage, because the rewriting process # eliminates any dead space in the table. I'm not sure what we can recommend here instead. New-style VACUUM FULL. I don't think that a patch that makes it harder to use ALTER TABLE this way is a problem in itself, now that we've got that. Cool. That'll reclaim space from dropped columns and stuff? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Sun, Jan 23, 2011 at 12:06:43PM -0500, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Sun, Jan 9, 2011 at 5:01 PM, Noah Misch n...@leadboat.com wrote: As unintended fallout, it's no longer an error to add oids or a column with a default value to a table whose rowtype is used in columns elsewhere. This seems best. Defaults on the origin table do not even apply to new inserts into such a column, and the rowtype does not gain an OID column via its table. I think this should be split into two patches (we can discuss both on this thread, no need to start any new ones), one that implements just the above improvement and another that accomplishes the main purpose of the patch. I haven't been paying much attention to this thread, but I happened to read the above, and quite frankly it scares the cr*p out of me. I don't believe that Noah even begins to be qualified to understand the implications of adding/removing an OID column, and I clearly remember the non-obvious bugs we've had in the past from that. Before this goes in I want to see a convincing explanation (not an unsupported assertion) why this isn't going to re-introduce those old bugs. Turns out that we do set HEAP_HASOID and allocate space for an OID in the composite-type datums. We don't actually assign an OID, and I can't see any way to read it from the composite. It seems most consistent with the verdict of commit 6d1e361 to continue rejecting these cases. Thanks for the heads-up. I'm also pretty unclear on why it's a good idea to let uses of a rowtype be different from the table on which it's allegedly based. To the extent that the current behavior allows that, isn't that a bug rather than a feature we should extend? From the perspective of defining the behavior afresh, I'd agree. I haven't seen any stirrings of actually implementing this, though. Like Robert, I'm doubting there's a user benefit from rejecting these cases in preparation for the day that they would actually require it. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Sun, Jan 9, 2011 at 5:01 PM, Noah Misch n...@leadboat.com wrote: This patch removes ALTER TYPE rewrites in cases we can already prove valid. I add a function GetCoerceExemptions() that walks an Expr according to rules discussed in the design thread, simplified slightly pending additions in the next patch. See the comment at that function for a refresher. I use it to populate two new bools to AlteredTableInfo, new_bits and mayerror. new_bits is a superset of new_changedoids, so I subsume that. I change ATRewriteTable to act on those and support the notion of evaluating the transformation expressions when we're not rewriting the table. This helps on conversions like varchar(X)-text, xml-text, and conversions between domains and their base types. This certainly looks like a worthwhile thing to do, and it doesn't seem to need a lot of code, which is great. But I confess I'm not confident I really understand what this patch is changing or why it's changing it. I think the problem is partly that the terminology used is not very consistent: + if (!(exempt COERCE_EXEMPT_NOCHANGE)) + tab-new_bits = true; + if (!(exempt COERCE_EXEMPT_NOERROR)) + tab-mayerror = true; These are the same two bits of status that are elsewhere called always-noop and never-error. Or maybe not quite the same... but close. A related problem is that I think only three of the four combinations are actually interesting: if there are new bits... that is, if !COERCE_EXEMPT_NOCHANGE... i.e. not always-noop, then the state of the other bit is irrelevant. I think maybe this ought to just be rephrased as an enum with three elements, describing the operation to be performed: rewrite, check, nothing. As unintended fallout, it's no longer an error to add oids or a column with a default value to a table whose rowtype is used in columns elsewhere. This seems best. Defaults on the origin table do not even apply to new inserts into such a column, and the rowtype does not gain an OID column via its table. I think this should be split into two patches (we can discuss both on this thread, no need to start any new ones), one that implements just the above improvement and another that accomplishes the main purpose of the patch. Patches that do two or three or four things are quite a bit harder to understand than patches that just do one thing. On a related note, it is very helpful to avoid introducing unrelated changes into a patch. Comment updates should reflect changes you made, rather than editorialization on what's already there. There is some value to the latter, but it makes it harder to understand what the patch is doing. Also, you need to update the ALTER TABLE documentation. The whole notes section needs to be gone over, but the following part in particular seems problematic, since we're proposing to break this: # The fact that ALTER TYPE requires rewriting the whole table is sometimes an advantage, because the rewriting process # eliminates any dead space in the table. For example, to reclaim the space occupied by a dropped column immediately, the # fastest way is: # # ALTER TABLE table ALTER COLUMN anycol TYPE anytype; I'm not sure what we can recommend here instead. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers