Re: [HACKERS] plpgsql versus domains
Andrew Gierth and...@tao11.riddles.org.uk writes: Tom == Tom Lane t...@sss.pgh.pa.us writes: Tom This is the first attempt at weaponizing the memory context Tom reset/delete feature, and I'm fairly happy with it, except for one Tom thing: I had to #include utils/memnodes.h into typcache.h in order Tom to preserve the intended property that the callback structs could Tom be included directly into structs using them. Now, that's not Tom awful in itself, because typcache.h isn't used everywhere; but if Tom this feature gets popular we'll find memnodes.h being included Tom pretty much everywhere, which is not so great. I'm thinking about Tom moving struct MemoryContextCallback and the extern for Tom MemoryContextRegisterResetCallback into palloc.h to avoid this. Tom Maybe that's too far in the other direction. Thoughts? Tom Compromise ideas? This was pretty much my first thought on looking at the callback patch. It may seem logical to put the reset callback stuff in memutils/memnodes alongside context creation and reset and so on, but in fact the places that are going to want to use callbacks are primarily _not_ the places that are doing their own context management - since those could do their own cleanup - but rather places that are allocating things in contexts supplied by others. So palloc.h is the place for it. That argument sounds good to me ;-). Moved. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql versus domains
Andres Freund and...@2ndquadrant.com writes: On 2015-02-26 13:53:18 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: You're probably going to kill me because of the increased complexity, but how about making typecache.c smarter, more in the vein of relcache.c, and store the relevant info in there? I thought that's what I was proposing in point #1. Sure, but my second point was to avoid duplicating that information into -fcinfo and such and instead reference the typecache entry from there. So that if the typecache entry is being rebuilt (a new mechanism I'm afraid), whatever uses -fcinfo gets the updated functions/definition. Here's a draft patch for that. This fixes the delayed-domain-constraint problem pretty thoroughly, in that any attempt to check a domain's constraints will use fully up-to-date info. This is the first attempt at weaponizing the memory context reset/delete feature, and I'm fairly happy with it, except for one thing: I had to #include utils/memnodes.h into typcache.h in order to preserve the intended property that the callback structs could be included directly into structs using them. Now, that's not awful in itself, because typcache.h isn't used everywhere; but if this feature gets popular we'll find memnodes.h being included pretty much everywhere, which is not so great. I'm thinking about moving struct MemoryContextCallback and the extern for MemoryContextRegisterResetCallback into palloc.h to avoid this. Maybe that's too far in the other direction. Thoughts? Compromise ideas? Also, instrumenting the code showed that TypeCacheConstrCallback gets called quite a lot during the standard regression tests, which is why I went out of my way to make it quick. Almost all of those cache flushes are from non-domain-related updates on pg_type or pg_constraint, so they're not really necessary from a logical perspective, and they're surely going to hurt performance for heavy users of domains. I think to fix this we'd have to split pg_constraint into two catalogs, one for table constraints and one for domain constraints; which would be a good thing anyway from a normal-form-theory perspective. And we'd have to get rid of pg_type.typnotnull and instead store domain NOT NULL constraints in this hypothetical domain constraint catalog. I don't plan to do anything about that myself right now, because I'm not sure that production databases would have the kind of traffic on pg_type and pg_constraint that the regression tests exhibit. But at some point we might have to fix it. regards, tom lane diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 07ab4b4..7455020 100644 *** a/src/backend/commands/tablecmds.c --- b/src/backend/commands/tablecmds.c *** ATExecAddColumn(List **wqueue, AlteredTa *** 4824,4830 { defval = (Expr *) build_column_default(rel, attribute.attnum); ! if (!defval GetDomainConstraints(typeOid) != NIL) { Oid baseTypeId; int32 baseTypeMod; --- 4824,4830 { defval = (Expr *) build_column_default(rel, attribute.attnum); ! if (!defval DomainHasConstraints(typeOid)) { Oid baseTypeId; int32 baseTypeMod; *** ATColumnChangeRequiresRewrite(Node *expr *** 7778,7784 { CoerceToDomain *d = (CoerceToDomain *) expr; ! if (GetDomainConstraints(d-resulttype) != NIL) return true; expr = (Node *) d-arg; } --- 7778,7784 { CoerceToDomain *d = (CoerceToDomain *) expr; ! if (DomainHasConstraints(d-resulttype)) return true; expr = (Node *) d-arg; } diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c index b77e1b4..d800033 100644 *** a/src/backend/commands/typecmds.c --- b/src/backend/commands/typecmds.c *** *** 59,65 #include executor/executor.h #include miscadmin.h #include nodes/makefuncs.h - #include optimizer/planner.h #include optimizer/var.h #include parser/parse_coerce.h #include parser/parse_collate.h --- 59,64 *** domainAddConstraint(Oid domainOid, Oid d *** 3081,3206 return ccbin; } - /* - * GetDomainConstraints - get a list of the current constraints of domain - * - * Returns a possibly-empty list of DomainConstraintState nodes. - * - * This is called by the executor during plan startup for a CoerceToDomain - * expression node. The given constraints will be checked for each value - * passed through the node. - * - * We allow this to be called for non-domain types, in which case the result - * is always NIL. - */ - List * - GetDomainConstraints(Oid typeOid) - { - List *result = NIL; - bool notNull = false; - Relation conRel; - - conRel = heap_open(ConstraintRelationId, AccessShareLock); - - for (;;) - { - HeapTuple tup; - HeapTuple conTup; - Form_pg_type typTup; - ScanKeyData key[1]; - SysScanDesc scan; - - tup =
Re: [HACKERS] plpgsql versus domains
Tom == Tom Lane t...@sss.pgh.pa.us writes: Tom This is the first attempt at weaponizing the memory context Tom reset/delete feature, and I'm fairly happy with it, except for one Tom thing: I had to #include utils/memnodes.h into typcache.h in order Tom to preserve the intended property that the callback structs could Tom be included directly into structs using them. Now, that's not Tom awful in itself, because typcache.h isn't used everywhere; but if Tom this feature gets popular we'll find memnodes.h being included Tom pretty much everywhere, which is not so great. I'm thinking about Tom moving struct MemoryContextCallback and the extern for Tom MemoryContextRegisterResetCallback into palloc.h to avoid this. Tom Maybe that's too far in the other direction. Thoughts? Tom Compromise ideas? This was pretty much my first thought on looking at the callback patch. It may seem logical to put the reset callback stuff in memutils/memnodes alongside context creation and reset and so on, but in fact the places that are going to want to use callbacks are primarily _not_ the places that are doing their own context management - since those could do their own cleanup - but rather places that are allocating things in contexts supplied by others. So palloc.h is the place for it. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql versus domains
On Thu, Feb 26, 2015 at 1:53 PM, Tom Lane t...@sss.pgh.pa.us wrote: To some extent this is a workaround for the fact that plpgsql does type conversions the way it does (ie, by I/O rather than by using the parser's cast mechanisms). We've talked about changing that, but people seem to be afraid of the compatibility consequences, and I'm not planning to fight two compatibility battles concurrently ;-) A sensible plan, but since we're here: - I do agree that changing the way PL/pgsql does those type conversions is scary. I can't predict what will break, and there's no getting around the scariness of that. - On the other hand, I do think it would be good to change it, because this sucks: rhaas=# do $$ declare x int; begin x := (3.0::numeric)/(1.0::numeric); end $$; ERROR: invalid input syntax for integer: 3. CONTEXT: PL/pgSQL function inline_code_block line 1 at assignment I didn't realize that this issue had even been discussed before... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql versus domains
Robert Haas robertmh...@gmail.com writes: On Thu, Feb 26, 2015 at 1:53 PM, Tom Lane t...@sss.pgh.pa.us wrote: To some extent this is a workaround for the fact that plpgsql does type conversions the way it does (ie, by I/O rather than by using the parser's cast mechanisms). We've talked about changing that, but people seem to be afraid of the compatibility consequences, and I'm not planning to fight two compatibility battles concurrently ;-) A sensible plan, but since we're here: - I do agree that changing the way PL/pgsql does those type conversions is scary. I can't predict what will break, and there's no getting around the scariness of that. - On the other hand, I do think it would be good to change it, because this sucks: rhaas=# do $$ declare x int; begin x := (3.0::numeric)/(1.0::numeric); end $$; ERROR: invalid input syntax for integer: 3. CONTEXT: PL/pgSQL function inline_code_block line 1 at assignment If we did want to open that can of worms, my proposal would be to make plpgsql type conversions work like so: * If there is a cast pathway known to the core SQL parser, use that mechanism. * Otherwise, attempt to convert via I/O as we do today. This seems to minimize the risk of breaking things, although there would probably be corner cases that work differently (for instance I bet boolean to text might turn out differently). In the very long run we could perhaps deprecate and remove the second phase. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql versus domains
2015-02-27 21:57 GMT+01:00 Tom Lane t...@sss.pgh.pa.us: Robert Haas robertmh...@gmail.com writes: On Thu, Feb 26, 2015 at 1:53 PM, Tom Lane t...@sss.pgh.pa.us wrote: To some extent this is a workaround for the fact that plpgsql does type conversions the way it does (ie, by I/O rather than by using the parser's cast mechanisms). We've talked about changing that, but people seem to be afraid of the compatibility consequences, and I'm not planning to fight two compatibility battles concurrently ;-) A sensible plan, but since we're here: - I do agree that changing the way PL/pgsql does those type conversions is scary. I can't predict what will break, and there's no getting around the scariness of that. - On the other hand, I do think it would be good to change it, because this sucks: rhaas=# do $$ declare x int; begin x := (3.0::numeric)/(1.0::numeric); end $$; ERROR: invalid input syntax for integer: 3. CONTEXT: PL/pgSQL function inline_code_block line 1 at assignment If we did want to open that can of worms, my proposal would be to make plpgsql type conversions work like so: * If there is a cast pathway known to the core SQL parser, use that mechanism. * Otherwise, attempt to convert via I/O as we do today. This seems to minimize the risk of breaking things, although there would probably be corner cases that work differently (for instance I bet boolean to text might turn out differently). In the very long run we could perhaps deprecate and remove the second phase. +1 There can be similar solution like plpgsql/sql identifiers priority configuration. Some levels - and what you are proposing should be default. It works perfectly - and from my view and what I know from my neighborhood, there are no issues. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql versus domains
Hi 2015-02-26 18:31 GMT+01:00 Tom Lane t...@sss.pgh.pa.us: At the behest of Salesforce, I've been looking into improving plpgsql's handling of variables of domain types, which is currently pretty awful. It's really slow, because any assignment to a domain variable goes through the slow double-I/O-conversion path in exec_cast_value, even if no actual work is needed. And I noticed that the domain's constraints get looked up only once per function per session; for example this script gets unexpected results: create domain di as int; create function foo1(int) returns di as $$ declare d di; begin d := $1; return d; end $$ language plpgsql immutable; select foo1(0); -- call once to set up cache alter domain di add constraint pos check (value 0); select 0::di; -- fails, as expected select foo1(0); -- should fail, does not \c - select foo1(0); -- now it fails The reason for this is that domain_in looks up the domain's constraints and caches them under the calling FmgrInfo struct. That behavior was designed to ensure we'd do just one constraint-lookup per query, which I think is reasonable. But plpgsql sets up an FmgrInfo in the variable's PLpgSQL_type record, which persists for the whole session unless the function's parse tree is flushed for some reason. So the constraints only get looked up once. The rough sketch I have in mind for fixing this is: 1. Arrange to cache the constraints for domain types in typcache.c, so as to reduce the number of trips to the actual catalogs. I think typcache.c will have to flush these cache items upon seeing any sinval change in pg_constraint, but that's still cheaper than searching pg_constraint for every query. 2. In plpgsql, explicitly model a domain type as being its base type plus some constraints --- that is, stop depending on domain_in() to enforce the constraints, and do it ourselves. This would mean that the FmgrInfo in a PLpgSQL_type struct would reference the base type's input function rather than domain_in, so we'd not have to be afraid to cache that. The constraints would be represented as a separate list, which we'd arrange to fetch from the typcache once per transaction. (I think checking for new constraints once per transaction is sufficient for reasonable situations, though I suppose that could be argued about.) The advantage of this approach is first that we could avoid an I/O conversion when the incoming value to be assigned matches the domain's base type, which is the typical case; and second that a domain without any CHECK constraints would become essentially free, which is also a common case. I like this variant There can be some good optimization with scalar types: text, varchars, numbers and reduce IO cast. 3. We could still have domains.c responsible for most of the details; the domain_check() API may be good enough as-is, though it seems to lack any simple way to force re-lookup of the domain constraints once per xact. Thoughts, better ideas? Given the lack of field complaints, I don't feel a need to try to back-patch a fix for this, but I do want to see it fixed for 9.5. +1 Regards Pavel regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql versus domains
On 2015-02-26 13:53:18 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: Hm. A bit sad to open code that in plpgsql instead of making it available more generally. The actual checking wouldn't be open-coded, it would come from domain_check() (or possibly a modified version of that). It is true that plpgsql would know more about domains than it does today, but It'd still teach plpgsql more about types than it knows right now. But on a second thought that ship has sailed long ago - and the amount of knowledge seems relatively small. There's much more stuff about composites there already. and I'm not planning to fight two compatibility battles concurrently ;-) Heh ;) You're probably going to kill me because of the increased complexity, but how about making typecache.c smarter, more in the vein of relcache.c, and store the relevant info in there? I thought that's what I was proposing in point #1. Sure, but my second point was to avoid duplicating that information into -fcinfo and such and instead reference the typecache entry from there. So that if the typecache entry is being rebuilt (a new mechanism I'm afraid), whatever uses -fcinfo gets the updated functions/definition. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql versus domains
Hi, On 2015-02-26 12:31:46 -0500, Tom Lane wrote: It's really slow, because any assignment to a domain variable goes through the slow double-I/O-conversion path in exec_cast_value, even if no actual work is needed. Hm, that's surely not nice for some types. Doing syscache lookups every time can't help either. And I noticed that the domain's constraints get looked up only once per function per session; for example this script gets unexpected results: The reason for this is that domain_in looks up the domain's constraints and caches them under the calling FmgrInfo struct. That's probably far from the only such case we have :( 1. Arrange to cache the constraints for domain types in typcache.c, so as to reduce the number of trips to the actual catalogs. I think typcache.c will have to flush these cache items upon seeing any sinval change in pg_constraint, but that's still cheaper than searching pg_constraint for every query. That sounds sane. 2. In plpgsql, explicitly model a domain type as being its base type plus some constraints --- that is, stop depending on domain_in() to enforce the constraints, and do it ourselves. This would mean that the FmgrInfo in a PLpgSQL_type struct would reference the base type's input function rather than domain_in, so we'd not have to be afraid to cache that. The constraints would be represented as a separate list, which we'd arrange to fetch from the typcache once per transaction. Hm. A bit sad to open code that in plpgsql instead of making it available more generally. 3. We could still have domains.c responsible for most of the details; the domain_check() API may be good enough as-is, though it seems to lack any simple way to force re-lookup of the domain constraints once per xact. Thoughts, better ideas? You're probably going to kill me because of the increased complexity, but how about making typecache.c smarter, more in the vein of relcache.c, and store the relevant info in there? And then, to avoid repeated lookups, store a reference to that in fcinfo? The lifetime of objects wouldn't be entirely trivial, but it doesn't sound impossible. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql versus domains
2015-02-26 19:53 GMT+01:00 Tom Lane t...@sss.pgh.pa.us: Andres Freund and...@2ndquadrant.com writes: On 2015-02-26 12:31:46 -0500, Tom Lane wrote: 2. In plpgsql, explicitly model a domain type as being its base type plus some constraints --- that is, stop depending on domain_in() to enforce the constraints, and do it ourselves. This would mean that the FmgrInfo in a PLpgSQL_type struct would reference the base type's input function rather than domain_in, so we'd not have to be afraid to cache that. The constraints would be represented as a separate list, which we'd arrange to fetch from the typcache once per transaction. Hm. A bit sad to open code that in plpgsql instead of making it available more generally. The actual checking wouldn't be open-coded, it would come from domain_check() (or possibly a modified version of that). It is true that plpgsql would know more about domains than it does today, but I'm not sure I see another use-case for this type of logic. To some extent this is a workaround for the fact that plpgsql does type conversions the way it does (ie, by I/O rather than by using the parser's cast mechanisms). We've talked about changing that, but people seem to be afraid of the compatibility consequences, and I'm not planning to fight two compatibility battles concurrently ;-) I understand, but in this case, a compatibility can be enforced simply - we can support a only know cast mode and IO cast mode. IO cast is simple for lot of people, but there is a lot of performance issues and few errors related to this topic. But this is offtopic now. But, what can be related - for plpgsql_check is necessary some simple check of a assigning - that should to return error or warning This part of plpgsql_check is too complex now. Regards Pavel You're probably going to kill me because of the increased complexity, but how about making typecache.c smarter, more in the vein of relcache.c, and store the relevant info in there? I thought that's what I was proposing in point #1. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql versus domains
Andres Freund and...@2ndquadrant.com writes: On 2015-02-26 12:31:46 -0500, Tom Lane wrote: 2. In plpgsql, explicitly model a domain type as being its base type plus some constraints --- that is, stop depending on domain_in() to enforce the constraints, and do it ourselves. This would mean that the FmgrInfo in a PLpgSQL_type struct would reference the base type's input function rather than domain_in, so we'd not have to be afraid to cache that. The constraints would be represented as a separate list, which we'd arrange to fetch from the typcache once per transaction. Hm. A bit sad to open code that in plpgsql instead of making it available more generally. The actual checking wouldn't be open-coded, it would come from domain_check() (or possibly a modified version of that). It is true that plpgsql would know more about domains than it does today, but I'm not sure I see another use-case for this type of logic. To some extent this is a workaround for the fact that plpgsql does type conversions the way it does (ie, by I/O rather than by using the parser's cast mechanisms). We've talked about changing that, but people seem to be afraid of the compatibility consequences, and I'm not planning to fight two compatibility battles concurrently ;-) You're probably going to kill me because of the increased complexity, but how about making typecache.c smarter, more in the vein of relcache.c, and store the relevant info in there? I thought that's what I was proposing in point #1. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers