Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions
On Tue, Mar 28, 2023 at 3:25 PM Isaac Morland wrote: > On Mon, 19 Dec 2022 at 17:57, Corey Huinker > wrote: > >> >> Attached is my work in progress to implement the changes to the CAST() >> function as proposed by Vik Fearing. >> >> CAST(expr AS typename NULL ON ERROR) >> will use error-safe functions to do the cast of expr, and will return >> NULL if the cast fails. >> >> CAST(expr AS typename DEFAULT expr2 ON ERROR) >> will use error-safe functions to do the cast of expr, and will return >> expr2 if the cast fails. >> > > Is there any difference between NULL and DEFAULT NULL? > What I think you're asking is: is there a difference between these two statements: SELECT CAST(my_string AS integer NULL ON ERROR) FROM my_table; SELECT CAST(my_string AS integer DEFAULT NULL ON ERROR) FROM my_table; And as I understand it, the answer would be no, there is no practical difference. The first case is just a convenient shorthand, whereas the second case tees you up for a potentially complex expression. Before you ask, I believe the ON ERROR syntax could be made optional. As I implemented it, both cases create a default expression which then typecast to integer, and in both cases that expression would be a const-null, so the optimizer steps would very quickly collapse those steps into a plain old constant.
Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions
On Tue, Mar 28, 2023 at 2:53 PM Gregory Stark (as CFM) wrote: > On Tue, 3 Jan 2023 at 14:16, Tom Lane wrote: > > > > Vik Fearing writes: > > > > > I don't think we should add that syntax until I do get it through the > > > committee, just in case they change something. > > > > Agreed. So this is something we won't be able to put into v16; > > it'll have to wait till there's something solid from the committee. > > I guess I'll mark this Rejected in the CF then. Who knows when the SQL > committee will look at this... > > -- > Gregory Stark > As Commitfest Manager > Yes, for now. I'm in touch with the pg-people on the committee and will resume work when there's something to act upon.
Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions
On Mon, 19 Dec 2022 at 17:57, Corey Huinker wrote: > > Attached is my work in progress to implement the changes to the CAST() > function as proposed by Vik Fearing. > > CAST(expr AS typename NULL ON ERROR) > will use error-safe functions to do the cast of expr, and will return > NULL if the cast fails. > > CAST(expr AS typename DEFAULT expr2 ON ERROR) > will use error-safe functions to do the cast of expr, and will return > expr2 if the cast fails. > Is there any difference between NULL and DEFAULT NULL?
Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions
On Tue, 3 Jan 2023 at 14:16, Tom Lane wrote: > > Vik Fearing writes: > > > I don't think we should add that syntax until I do get it through the > > committee, just in case they change something. > > Agreed. So this is something we won't be able to put into v16; > it'll have to wait till there's something solid from the committee. I guess I'll mark this Rejected in the CF then. Who knows when the SQL committee will look at this... -- Gregory Stark As Commitfest Manager
Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions
Vik Fearing writes: > I have not posted my paper to the committee yet, but I plan to do so > before the working group's meeting early February. Just like with > posting patches here, I cannot guarantee that it will get accepted but I > will be arguing for it. > I don't think we should add that syntax until I do get it through the > committee, just in case they change something. Agreed. So this is something we won't be able to put into v16; it'll have to wait till there's something solid from the committee. regards, tom lane
Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions
On 1/3/23 19:14, Tom Lane wrote: Corey Huinker writes: On Mon, Jan 2, 2023 at 10:57 AM Tom Lane wrote: While I approve of trying to get some functionality in this area, I'm not sure that extending CAST is a great idea, because I'm afraid that the SQL committee will do something that conflicts with it. I'm going off the spec that Vik presented in https://www.postgresql.org/message-id/f8600a3b-f697-2577-8fea-f40d3e18b...@postgresfriends.org which is his effort to get it through the SQL committee. I'm pretty certain that sending something to pgsql-hackers will have exactly zero impact on the SQL committee. Is there anything actually submitted to the committee, and if so what's its status? I have not posted my paper to the committee yet, but I plan to do so before the working group's meeting early February. Just like with posting patches here, I cannot guarantee that it will get accepted but I will be arguing for it. I don't think we should add that syntax until I do get it through the committee, just in case they change something. -- Vik Fearing
Re: Error-safe user functions
On Sun, Dec 25, 2022 at 12:13 PM Tom Lane wrote: > Here's a proposed patch for converting regprocin and friends > to soft error reporting. I'll say at the outset that it's an > engineering compromise, and it may be worth going further in > future. But I doubt it's worth doing more than this for v16, > because the next steps would be pretty invasive. I don't know that I feel particularly good about converting some errors to be reported softly and others not, especially since the dividing line around which things fall into which category is pretty much "well, whatever seemed hard we didn't convert". We could consider hanging it to report everything as a hard error until we can convert everything, but I'm not sure that's better. On another note, I can't help noticing that all of these patches seem to have been committed without any documentation changes. Maybe that's because there's nothing user-visible that makes any use of these features yet, but if that's true, then we probably ought to add something so that the changes are testable. And having done that we need to explain to users what the behavior actually is: that input validation errors are trapped but other kinds of failures like out of memory are not; that most core data types report all input validation errors softly, and the exceptions; and that for non-core data types the behavior depends on how the extension is coded. I think it's really a mistake to suppose that users won't care about or don't need to know these kinds of details. In my experience, that's just not true. -- Robert Haas EDB: http://www.enterprisedb.com
Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions
Corey Huinker writes: > On Mon, Jan 2, 2023 at 10:57 AM Tom Lane wrote: >> While I approve of trying to get some functionality in this area, >> I'm not sure that extending CAST is a great idea, because I'm afraid >> that the SQL committee will do something that conflicts with it. > I'm going off the spec that Vik presented in > https://www.postgresql.org/message-id/f8600a3b-f697-2577-8fea-f40d3e18b...@postgresfriends.org > which is his effort to get it through the SQL committee. I'm pretty certain that sending something to pgsql-hackers will have exactly zero impact on the SQL committee. Is there anything actually submitted to the committee, and if so what's its status? regards, tom lane
Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions
On Mon, Jan 2, 2023 at 10:57 AM Tom Lane wrote: > Corey Huinker writes: > > The proposed changes are as follows: > > CAST(expr AS typename) > > continues to behave as before. > > CAST(expr AS typename ERROR ON ERROR) > > has the identical behavior as the unadorned CAST() above. > > CAST(expr AS typename NULL ON ERROR) > > will use error-safe functions to do the cast of expr, and will return > > NULL if the cast fails. > > CAST(expr AS typename DEFAULT expr2 ON ERROR) > > will use error-safe functions to do the cast of expr, and will return > > expr2 if the cast fails. > > While I approve of trying to get some functionality in this area, > I'm not sure that extending CAST is a great idea, because I'm afraid > that the SQL committee will do something that conflicts with it. > If we know that they are about to standardize exactly this syntax, > where is that information available? If we don't know that, > I'd prefer to invent some kind of function or other instead of > extending the grammar. > > regards, tom lane > I'm going off the spec that Vik presented in https://www.postgresql.org/message-id/f8600a3b-f697-2577-8fea-f40d3e18b...@postgresfriends.org which is his effort to get it through the SQL committee. I was alreading thinking about how to get the SQLServer TRY_CAST() function into postgres, so this seemed like the logical next step. While the syntax may change, the underlying infrastructure would remain basically the same: we would need the ability to detect that a typecast had failed, and replace it with the default value, and handle that at parse time, or executor time, and handle array casts where the array has the default but the underlying elements can't. It would be simple to move the grammar changes to their own patch if that removes a barrier for people.
Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions
On 2023-01-02 Mo 10:57, Tom Lane wrote: > Corey Huinker writes: >> The proposed changes are as follows: >> CAST(expr AS typename) >> continues to behave as before. >> CAST(expr AS typename ERROR ON ERROR) >> has the identical behavior as the unadorned CAST() above. >> CAST(expr AS typename NULL ON ERROR) >> will use error-safe functions to do the cast of expr, and will return >> NULL if the cast fails. >> CAST(expr AS typename DEFAULT expr2 ON ERROR) >> will use error-safe functions to do the cast of expr, and will return >> expr2 if the cast fails. > While I approve of trying to get some functionality in this area, > I'm not sure that extending CAST is a great idea, because I'm afraid > that the SQL committee will do something that conflicts with it. > If we know that they are about to standardize exactly this syntax, > where is that information available? If we don't know that, > I'd prefer to invent some kind of function or other instead of > extending the grammar. +1 cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions
On Tue, 20 Dec 2022 at 04:27, Corey Huinker wrote: > > > Attached is my work in progress to implement the changes to the CAST() > function as proposed by Vik Fearing. > > This work builds upon the Error-safe User Functions work currently ongoing. > > The proposed changes are as follows: > > CAST(expr AS typename) > continues to behave as before. > > CAST(expr AS typename ERROR ON ERROR) > has the identical behavior as the unadorned CAST() above. > > CAST(expr AS typename NULL ON ERROR) > will use error-safe functions to do the cast of expr, and will return > NULL if the cast fails. > > CAST(expr AS typename DEFAULT expr2 ON ERROR) > will use error-safe functions to do the cast of expr, and will return > expr2 if the cast fails. > > There is an additional FORMAT parameter that I have not yet implemented, my > understanding is that it is largely intended for DATE/TIME field conversions, > but others are certainly possible. > CAST(expr AS typename FORMAT fmt DEFAULT expr2 ON ERROR) > > What is currently working: > - Any scalar expression that can be evaluated at parse time. These tests from > cast.sql all currently work: > > VALUES (CAST('error' AS integer)); > VALUES (CAST('error' AS integer ERROR ON ERROR)); > VALUES (CAST('error' AS integer NULL ON ERROR)); > VALUES (CAST('error' AS integer DEFAULT 42 ON ERROR)); > > SELECT CAST('{123,abc,456}' AS integer[] DEFAULT '{-789}' ON ERROR) as > array_test1; > > - Scalar values evaluated at runtime. > > CREATE TEMPORARY TABLE t(t text); > INSERT INTO t VALUES ('a'), ('1'), ('b'), (2); > SELECT CAST(t.t AS integer DEFAULT -1 ON ERROR) AS foo FROM t; > foo > - > -1 >1 > -1 >2 > (4 rows) > > > Along the way, I made a few design decisions, each of which is up for debate: > > First, I created OidInputFunctionCallSafe, which is to OidInputFunctionCall > what InputFunctionCallSafe is to InputFunctionCall. Given that the only place > I ended up using it was stringTypeDatumSafe(), it may be possible to just > move that code inside stringTypeDatumSafe. > > Next, I had a need for FuncExpr, CoerceViaIO, and ArrayCoerce to all report > if their expr argument failed, and if not, just past the evaluation of expr2. > Rather than duplicate this logic in several places, I chose instead to modify > CoalesceExpr to allow for an error-test mode in addition to its default > null-test mode, and then to provide this altered node with two expressions, > the first being the error-safe typecast of expr and the second being the > non-error-safe typecast of expr2. > > I still don't have array-to-array casts working, as the changed I would > likely need to make to ArrayCoerce get somewhat invasive, so this seemed like > a good time to post my work so far and solicit some feedback beyond what I've > already been getting from Jeff Davis and Michael Paquier. > > I've sidestepped domains as well for the time being as well as avoiding JIT > issues entirely. > > No documentation is currently prepared. All but one of the regression test > queries work, the one that is currently failing is: > > SELECT CAST('{234,def,567}'::text[] AS integer[] DEFAULT '{-1011}' ON ERROR) > as array_test2; > > Other quirks: > - an unaliased CAST ON DEFAULT will return the column name of "coalesce", > which internally is true, but obviously would be quite confusing to a user. > > As a side observation, I noticed that the optimizer already tries to resolve > expressions based on constants and to collapse expression trees where > possible, which makes me wonder if the work done to do the same in > transformTypeCast/ and coerce_to_target_type and coerce_type isn't also > wasted. CFBot shows some compilation errors as in [1], please post an updated version for the same: [02:53:44.829] time make -s -j${BUILD_JOBS} world-bin [02:55:41.164] llvmjit_expr.c: In function ‘llvm_compile_expr’: [02:55:41.164] llvmjit_expr.c:928:6: error: ‘v_resnull’ undeclared (first use in this function); did you mean ‘v_resnullp’? [02:55:41.164] 928 | v_resnull = LLVMBuildLoad(b, v_reserrorp, ""); [02:55:41.164] | ^ [02:55:41.164] | v_resnullp [02:55:41.164] llvmjit_expr.c:928:6: note: each undeclared identifier is reported only once for each function it appears in [02:55:41.164] llvmjit_expr.c:928:35: error: ‘v_reserrorp’ undeclared (first use in this function); did you mean ‘v_reserror’? [02:55:41.164] 928 | v_resnull = LLVMBuildLoad(b, v_reserrorp, ""); [02:55:41.164] | ^~~ [02:55:41.164] | v_reserror [02:55:41.165] make[2]: *** [: llvmjit_expr.o] Error 1 [02:55:41.165] make[2]: *** Waiting for unfinished jobs [02:55:45.495] make[1]: *** [Makefile:42: all-backend/jit/llvm-recurse] Error 2 [02:55:45.495] make: *** [GNUmakefile:21: world-bin-src-recurse] Error 2 [1] - https://cirrus-ci.com/task/6687753371385856?logs=gcc_warning#L448 Regards, Vignesh
Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions
Corey Huinker writes: > The proposed changes are as follows: > CAST(expr AS typename) > continues to behave as before. > CAST(expr AS typename ERROR ON ERROR) > has the identical behavior as the unadorned CAST() above. > CAST(expr AS typename NULL ON ERROR) > will use error-safe functions to do the cast of expr, and will return > NULL if the cast fails. > CAST(expr AS typename DEFAULT expr2 ON ERROR) > will use error-safe functions to do the cast of expr, and will return > expr2 if the cast fails. While I approve of trying to get some functionality in this area, I'm not sure that extending CAST is a great idea, because I'm afraid that the SQL committee will do something that conflicts with it. If we know that they are about to standardize exactly this syntax, where is that information available? If we don't know that, I'd prefer to invent some kind of function or other instead of extending the grammar. regards, tom lane
Re: Error-safe user functions
On 2022-12-28 We 01:00, Amul Sul wrote: > On Tue, Dec 27, 2022 at 11:17 PM Tom Lane wrote: >> Andrew Dunstan writes: >>> Here's a patch that covers the ltree and intarray contrib modules. >> I would probably have done this a little differently --- I think >> the added "res" parameters aren't really necessary for most of >> these. But it's not worth arguing over. >> > Also, it would be good if we can pass "escontext" through the "state" > argument of makepool() like commit 78212f210114 done for makepol() of > tsquery.c. Attached patch is the updated version that does the same. > Thanks, I have done both of these things. Looks like we're now done with this task, thanks everybody. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Error-safe user functions
On Tue, Dec 27, 2022 at 11:17 PM Tom Lane wrote: > > Andrew Dunstan writes: > > Here's a patch that covers the ltree and intarray contrib modules. > > I would probably have done this a little differently --- I think > the added "res" parameters aren't really necessary for most of > these. But it's not worth arguing over. > Also, it would be good if we can pass "escontext" through the "state" argument of makepool() like commit 78212f210114 done for makepol() of tsquery.c. Attached patch is the updated version that does the same. Regards, Amul v2-ltree-intarray-error-safe.patch Description: Binary data
Re: Error-safe user functions
Andrew Dunstan writes: > On Dec 27, 2022, at 12:47 PM, Tom Lane wrote: >> Andrew Dunstan writes: >>> I think that would leave just hstore to be done. >> Yeah, that matches my scoreboard. Are you going to look at >> hstore, or do you want me to? > Go for it. Done. regards, tom lane
Re: Error-safe user functions
> On Dec 27, 2022, at 12:47 PM, Tom Lane wrote: > > Andrew Dunstan writes: >> Here's a patch that covers the ltree and intarray contrib modules. > > I would probably have done this a little differently --- I think > the added "res" parameters aren't really necessary for most of > these. But it's not worth arguing over. I’ll take another look > >> I think that would leave just hstore to be done. > > Yeah, that matches my scoreboard. Are you going to look at > hstore, or do you want me to? > > Go for it. Cheers Andrew
Re: Error-safe user functions
Andrew Dunstan writes: > Here's a patch that covers the ltree and intarray contrib modules. I would probably have done this a little differently --- I think the added "res" parameters aren't really necessary for most of these. But it's not worth arguing over. > I think that would leave just hstore to be done. Yeah, that matches my scoreboard. Are you going to look at hstore, or do you want me to? regards, tom lane
Re: Error-safe user functions
On 2022-12-26 Mo 18:00, Tom Lane wrote: > I wrote: >> (Perhaps we should go further than this, and convert all these >> functions to just be DirectInputFunctionCallSafe wrappers >> around the corresponding input functions? That would save >> some duplicative code, but I've not done it here.) > I looked closer at that idea, and realized that it would do more than > just save some code: it'd cause the to_regfoo functions to accept > numeric OIDs, as they did not before (and are documented not to). > It is unclear to me whether that inconsistency with the input > functions is really desirable or not --- but I don't offhand see a > good argument for it. If we change this though, it should probably > happen in a separate commit. Accordingly, here's a delta patch > doing that. > > +1 for doing this. The code simplification is nice too. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Error-safe user functions
On 2022-12-26 Mo 14:12, Andrew Dunstan wrote: > On 2022-12-26 Mo 12:47, Tom Lane wrote: >> Here's a proposed patch for making tsvectorin and tsqueryin >> report errors softly. We have to take the changes down a >> couple of levels of subroutines, but it's not hugely difficult. > > Great! > > >> With the other patches I've posted recently, this covers all >> of the core datatype input functions. There are still half >> a dozen to tackle in contrib. >> >> > > Yeah, I'm currently looking at those in ltree. > > Here's a patch that covers the ltree and intarray contrib modules. I think that would leave just hstore to be done. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com diff --git a/contrib/intarray/_int_bool.c b/contrib/intarray/_int_bool.c index 3ed88af76d..77bbf77c43 100644 --- a/contrib/intarray/_int_bool.c +++ b/contrib/intarray/_int_bool.c @@ -149,8 +149,8 @@ pushquery(WORKSTATE *state, int32 type, int32 val) /* * make polish notation of query */ -static int32 -makepol(WORKSTATE *state) +static bool +makepol(WORKSTATE *state, int32 *res, struct Node *escontext) { int32 val, type; @@ -179,7 +179,7 @@ makepol(WORKSTATE *state) else { if (lenstack == STACKDEPTH) - ereport(ERROR, + ereturn(escontext, false, (errcode(ERRCODE_STATEMENT_TOO_COMPLEX), errmsg("statement too complex"))); stack[lenstack] = val; @@ -187,8 +187,10 @@ makepol(WORKSTATE *state) } break; case OPEN: -if (makepol(state) == ERR) - return ERR; +if (!makepol(state, res, escontext)) + return false; +if (*res == ERR) + return true; while (lenstack && (stack[lenstack - 1] == (int32) '&' || stack[lenstack - 1] == (int32) '!')) { @@ -202,14 +204,14 @@ makepol(WORKSTATE *state) lenstack--; pushquery(state, OPR, stack[lenstack]); }; -return END; +*res = END; +return true; break; case ERR: default: -ereport(ERROR, +ereturn(escontext, false, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("syntax error"))); -return ERR; } } @@ -218,7 +220,8 @@ makepol(WORKSTATE *state) lenstack--; pushquery(state, OPR, stack[lenstack]); }; - return END; + *res = END; + return true; } typedef struct @@ -483,6 +486,8 @@ bqarr_in(PG_FUNCTION_ARGS) ITEM *ptr; NODE *tmp; int32 pos = 0; + int32 polres; + struct Node *escontext = fcinfo->context; #ifdef BS_DEBUG StringInfoData pbuf; @@ -495,14 +500,15 @@ bqarr_in(PG_FUNCTION_ARGS) state.str = NULL; /* make polish notation (postfix, but in reverse order) */ - makepol(); + if (!makepol(, , escontext)) + PG_RETURN_NULL(); if (!state.num) - ereport(ERROR, + ereturn(escontext, (Datum) 0, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("empty query"))); if (state.num > QUERYTYPEMAXITEMS) - ereport(ERROR, + ereturn(escontext, (Datum) 0, (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), errmsg("number of query items (%d) exceeds the maximum allowed (%d)", state.num, (int) QUERYTYPEMAXITEMS))); diff --git a/contrib/intarray/expected/_int.out b/contrib/intarray/expected/_int.out index a09d40efa1..c953065a5c 100644 --- a/contrib/intarray/expected/_int.out +++ b/contrib/intarray/expected/_int.out @@ -398,6 +398,21 @@ SELECT '1&(2&(4&(5|!6)))'::query_int; 1 & 2 & 4 & ( 5 | !6 ) (1 row) +-- test non-error-throwing input +SELECT str as "query_int", + pg_input_is_valid(str,'query_int') as ok, + pg_input_error_message(str,'query_int') as errmsg +FROM (VALUES ('1&(2&(4&(5|6)))'), + ('1#(2&(4&(5&6)))'), + ('foo')) + AS a(str); +query_int| ok |errmsg +-++-- + 1&(2&(4&(5|6))) | t | + 1#(2&(4&(5&6))) | f | syntax error + foo | f | syntax error +(3 rows) + CREATE TABLE test__int( a int[] ); \copy test__int from 'data/test__int.data' ANALYZE test__int; diff --git a/contrib/intarray/sql/_int.sql b/contrib/intarray/sql/_int.sql index b26fc57e4d..4c9ba4c1fb 100644 --- a/contrib/intarray/sql/_int.sql +++ b/contrib/intarray/sql/_int.sql @@ -75,6 +75,17 @@ SELECT '1&2&4&5&6'::query_int; SELECT '1&(2&(4&(5|6)))'::query_int; SELECT '1&(2&(4&(5|!6)))'::query_int; +-- test non-error-throwing input + +SELECT str as "query_int", + pg_input_is_valid(str,'query_int') as ok, + pg_input_error_message(str,'query_int') as errmsg +FROM (VALUES ('1&(2&(4&(5|6)))'), + ('1#(2&(4&(5&6)))'), + ('foo')) + AS a(str); + + CREATE TABLE test__int( a int[] ); \copy test__int from 'data/test__int.data' diff --git a/contrib/ltree/expected/ltree.out b/contrib/ltree/expected/ltree.out index c6d8f3ef75..b95be71c78 100644 --- a/contrib/ltree/expected/ltree.out +++ b/contrib/ltree/expected/ltree.out @@ -8084,3 +8084,28 @@ SELECT count(*) FROM _ltreetest WHERE t ? '{23.*.1,23.*.2}' ; 15 (1 row) +-- test
Re: Error-safe user functions
I wrote: > (Perhaps we should go further than this, and convert all these > functions to just be DirectInputFunctionCallSafe wrappers > around the corresponding input functions? That would save > some duplicative code, but I've not done it here.) I looked closer at that idea, and realized that it would do more than just save some code: it'd cause the to_regfoo functions to accept numeric OIDs, as they did not before (and are documented not to). It is unclear to me whether that inconsistency with the input functions is really desirable or not --- but I don't offhand see a good argument for it. If we change this though, it should probably happen in a separate commit. Accordingly, here's a delta patch doing that. regards, tom lane diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 836b9254fb..3bf8d021c3 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -24100,8 +24100,7 @@ SELECT collation for ('foo' COLLATE "de_DE"); obtained by casting the string to type regclass (see ); however, this function will return NULL rather than throwing an error if the name is -not found. Also unlike the cast, this does not accept -a numeric OID as input. +not found. @@ -24118,8 +24117,7 @@ SELECT collation for ('foo' COLLATE "de_DE"); obtained by casting the string to type regcollation (see ); however, this function will return NULL rather than throwing an error if the name is -not found. Also unlike the cast, this does not accept -a numeric OID as input. +not found. @@ -24136,8 +24134,7 @@ SELECT collation for ('foo' COLLATE "de_DE"); obtained by casting the string to type regnamespace (see ); however, this function will return NULL rather than throwing an error if the name is -not found. Also unlike the cast, this does not accept -a numeric OID as input. +not found. @@ -24154,8 +24151,7 @@ SELECT collation for ('foo' COLLATE "de_DE"); obtained by casting the string to type regoper (see ); however, this function will return NULL rather than throwing an error if the name is -not found or is ambiguous. Also unlike the cast, this does not accept -a numeric OID as input. +not found or is ambiguous. @@ -24172,8 +24168,7 @@ SELECT collation for ('foo' COLLATE "de_DE"); obtained by casting the string to type regoperator (see ); however, this function will return NULL rather than throwing an error if the name is -not found. Also unlike the cast, this does not accept -a numeric OID as input. +not found. @@ -24190,8 +24185,7 @@ SELECT collation for ('foo' COLLATE "de_DE"); obtained by casting the string to type regproc (see ); however, this function will return NULL rather than throwing an error if the name is -not found or is ambiguous. Also unlike the cast, this does not accept -a numeric OID as input. +not found or is ambiguous. @@ -24208,8 +24202,7 @@ SELECT collation for ('foo' COLLATE "de_DE"); obtained by casting the string to type regprocedure (see ); however, this function will return NULL rather than throwing an error if the name is -not found. Also unlike the cast, this does not accept -a numeric OID as input. +not found. @@ -24226,8 +24219,7 @@ SELECT collation for ('foo' COLLATE "de_DE"); obtained by casting the string to type regrole (see ); however, this function will return NULL rather than throwing an error if the name is -not found. Also unlike the cast, this does not accept -a numeric OID as input. +not found. @@ -24244,8 +24236,7 @@ SELECT collation for ('foo' COLLATE "de_DE"); obtained by casting the string to type regtype (see ); however, this function will return NULL rather than throwing an error if the name is -not found. Also unlike the cast, this does not accept -a numeric OID as input. +not found. diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c index 14d76c856d..3635a94633 100644 --- a/src/backend/utils/adt/regproc.c +++ b/src/backend/utils/adt/regproc.c @@ -118,24 +118,15 @@ Datum to_regproc(PG_FUNCTION_ARGS) { char *pro_name = text_to_cstring(PG_GETARG_TEXT_PP(0)); - List *names; - FuncCandidateList clist; + Datum result; ErrorSaveContext escontext = {T_ErrorSaveContext}; - /* - * Parse the name into components and see if it matches any pg_proc - * entries in the current search path. - */ - names = stringToQualifiedNameList(pro_name, (Node *) ); - if (names == NIL)
Re: Error-safe user functions
On 2022-12-26 Mo 12:47, Tom Lane wrote: > Here's a proposed patch for making tsvectorin and tsqueryin > report errors softly. We have to take the changes down a > couple of levels of subroutines, but it's not hugely difficult. Great! > > With the other patches I've posted recently, this covers all > of the core datatype input functions. There are still half > a dozen to tackle in contrib. > > Yeah, I'm currently looking at those in ltree. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Error-safe user functions
Here's a proposed patch for making tsvectorin and tsqueryin report errors softly. We have to take the changes down a couple of levels of subroutines, but it's not hugely difficult. A couple of points worthy of comment: * To reduce API changes, I made the functions in tsvector_parser.c and tsquery.c pass around the escontext pointer in TSVectorParseState and TSQueryParserState respectively. This is a little duplicative, but since those structs are private within those files, there's no easy way to share the same pointer except by adding it as a new parameter to all those functions. This also means that if any of the outside callers of parse_tsquery (in to_tsany.c) wanted to do soft error handling and wanted their custom PushFunctions to be able to report such errors, they'd need to pass the escontext via their "opaque" passthrough structs, making for yet a third copy. Still, I judged adding an extra parameter to dozens of functions wasn't a better way. * There are two places in tsquery parsing that emit nuisance NOTICEs about empty queries. I chose to suppress those when soft error handling has been requested. Maybe we should rethink whether we want them at all? With the other patches I've posted recently, this covers all of the core datatype input functions. There are still half a dozen to tackle in contrib. regards, tom lane diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c index edeffacc2d..c0030ddaec 100644 --- a/src/backend/tsearch/to_tsany.c +++ b/src/backend/tsearch/to_tsany.c @@ -594,7 +594,8 @@ to_tsquery_byid(PG_FUNCTION_ARGS) query = parse_tsquery(text_to_cstring(in), pushval_morph, PointerGetDatum(), - 0); + 0, + NULL); PG_RETURN_TSQUERY(query); } @@ -630,7 +631,8 @@ plainto_tsquery_byid(PG_FUNCTION_ARGS) query = parse_tsquery(text_to_cstring(in), pushval_morph, PointerGetDatum(), - P_TSQ_PLAIN); + P_TSQ_PLAIN, + NULL); PG_RETURN_POINTER(query); } @@ -667,7 +669,8 @@ phraseto_tsquery_byid(PG_FUNCTION_ARGS) query = parse_tsquery(text_to_cstring(in), pushval_morph, PointerGetDatum(), - P_TSQ_PLAIN); + P_TSQ_PLAIN, + NULL); PG_RETURN_TSQUERY(query); } @@ -704,7 +707,8 @@ websearch_to_tsquery_byid(PG_FUNCTION_ARGS) query = parse_tsquery(text_to_cstring(in), pushval_morph, PointerGetDatum(), - P_TSQ_WEB); + P_TSQ_WEB, + NULL); PG_RETURN_TSQUERY(query); } diff --git a/src/backend/utils/adt/tsquery.c b/src/backend/utils/adt/tsquery.c index a206926042..1097294d55 100644 --- a/src/backend/utils/adt/tsquery.c +++ b/src/backend/utils/adt/tsquery.c @@ -16,6 +16,7 @@ #include "libpq/pqformat.h" #include "miscadmin.h" +#include "nodes/miscnodes.h" #include "tsearch/ts_locale.h" #include "tsearch/ts_type.h" #include "tsearch/ts_utils.h" @@ -58,10 +59,16 @@ typedef enum /* * get token from query string * - * *operator is filled in with OP_* when return values is PT_OPR, - * but *weight could contain a distance value in case of phrase operator. - * *strval, *lenval and *weight are filled in when return value is PT_VAL + * All arguments except "state" are output arguments. * + * If return value is PT_OPR, then *operator is filled with an OP_* code + * and *weight will contain a distance value in case of phrase operator. + * + * If return value is PT_VAL, then *lenval, *strval, *weight, and *prefix + * are filled. + * + * If PT_ERR is returned then a soft error has occurred. If state->escontext + * isn't already filled then this should be reported as a generic parse error. */ typedef ts_tokentype (*ts_tokenizer) (TSQueryParserState state, int8 *operator, int *lenval, char **strval, @@ -93,6 +100,9 @@ struct TSQueryParserStateData /* state for value's parser */ TSVectorParseState valstate; + + /* context object for soft errors - must match valstate's escontext */ + Node *escontext; }; /* @@ -194,7 +204,7 @@ parse_phrase_operator(TSQueryParserState pstate, int16 *distance) if (ptr == endptr) return false; else if (errno == ERANGE || l < 0 || l > MAXENTRYPOS) - ereport(ERROR, + ereturn(pstate->escontext, false, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("distance in phrase operator must be an integer value between zero and %d inclusive", MAXENTRYPOS))); @@ -301,10 +311,8 @@ gettoken_query_standard(TSQueryParserState state, int8 *operator, } else if (t_iseq(state->buf, ':')) { - ereport(ERROR, - (errcode(ERRCODE_SYNTAX_ERROR), - errmsg("syntax error in tsquery: \"%s\"", - state->buffer))); + /* generic syntax error message is fine */ + return PT_ERR; } else if (!t_isspace(state->buf)) { @@ -320,12 +328,17 @@ gettoken_query_standard(TSQueryParserState state, int8 *operator, state->state = WAITOPERATOR;
Re: Error-safe user functions
On 2022-12-25 Su 12:13, Tom Lane wrote: > Robert Haas writes: >> On Fri, Dec 16, 2022 at 1:31 PM Tom Lane wrote: >>> The reg* functions probably need a unified plan as to how far >>> down we want to push non-error behavior. >> I would be in favor of an aggressive approach. > Here's a proposed patch for converting regprocin and friends > to soft error reporting. I'll say at the outset that it's an > engineering compromise, and it may be worth going further in > future. But I doubt it's worth doing more than this for v16, > because the next steps would be pretty invasive. It's a judgement call, but I'm not too fussed about stopping here for v16. I see the reg* items as probably the lowest priority to fix. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Error-safe user functions
I got annoyed by the fact that types cid, xid, xid8 don't throw error even for obvious garbage, because they just believe the result of strtoul or strtoull without any checking. That was probably up to project standards when cidin and xidin were written; but surely it's not anymore, especially when we can piggyback on work already done for type oid. Anybody have an objection to the following? One note is that because we already had test cases checking that xid would accept hex input, I made the common subroutines use "0" not "10" for strtoul's last argument, meaning that oid will accept hex now too. regards, tom lane diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c index 7cded73e6e..c67a79344a 100644 --- a/src/backend/utils/adt/numutils.c +++ b/src/backend/utils/adt/numutils.c @@ -477,6 +477,180 @@ invalid_syntax: "bigint", s))); } +/* + * Convert input string to an unsigned 32 bit integer. + * + * Allows any number of leading or trailing whitespace characters. + * + * If endloc isn't NULL, store a pointer to the rest of the string there, + * so that caller can parse the rest. Otherwise, it's an error if anything + * but whitespace follows. + * + * typname is what is reported in error messges. + * + * If escontext points to an ErrorSaveContext node, that is filled instead + * of throwing an error; the caller must check SOFT_ERROR_OCCURRED() + * to detect errors. + */ +uint32 +uint32in_subr(const char *s, char **endloc, + const char *typname, Node *escontext) +{ + uint32 result; + unsigned long cvt; + char *endptr; + + /* Ensure that empty-input is handled consistently across platforms */ + if (*s == '\0') + ereturn(escontext, 0, +(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("invalid input syntax for type %s: \"%s\"", + typname, s))); + + errno = 0; + cvt = strtoul(s, , 0); + + /* + * strtoul() normally only sets ERANGE. On some systems it also may set + * EINVAL, which simply means it couldn't parse the input string. This is + * handled by the second "if" consistent across platforms. + */ + if (errno && errno != ERANGE && errno != EINVAL) + ereturn(escontext, 0, +(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("invalid input syntax for type %s: \"%s\"", + typname, s))); + + if (endptr == s) + ereturn(escontext, 0, +(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("invalid input syntax for type %s: \"%s\"", + typname, s))); + + if (errno == ERANGE) + ereturn(escontext, 0, +(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), + errmsg("value \"%s\" is out of range for type %s", + s, typname))); + + if (endloc) + { + /* caller wants to deal with rest of string */ + *endloc = endptr; + } + else + { + /* allow only whitespace after number */ + while (*endptr && isspace((unsigned char) *endptr)) + endptr++; + if (*endptr) + ereturn(escontext, 0, + (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("invalid input syntax for type %s: \"%s\"", + typname, s))); + } + + result = (uint32) cvt; + + /* + * Cope with possibility that unsigned long is wider than uint32, in which + * case strtoul will not raise an error for some values that are out of + * the range of uint32. + * + * For backwards compatibility, we want to accept inputs that are given + * with a minus sign, so allow the input value if it matches after either + * signed or unsigned extension to long. + * + * To ensure consistent results on 32-bit and 64-bit platforms, make sure + * the error message is the same as if strtoul() had returned ERANGE. + */ +#if PG_UINT32_MAX != ULONG_MAX + if (cvt != (unsigned long) result && + cvt != (unsigned long) ((int) result)) + ereturn(escontext, 0, +(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), + errmsg("value \"%s\" is out of range for type %s", + s, typname))); +#endif + + return result; +} + +/* + * Convert input string to an unsigned 64 bit integer. + * + * Allows any number of leading or trailing whitespace characters. + * + * If endloc isn't NULL, store a pointer to the rest of the string there, + * so that caller can parse the rest. Otherwise, it's an error if anything + * but whitespace follows. + * + * typname is what is reported in error messges. + * + * If escontext points to an ErrorSaveContext node, that is filled instead + * of throwing an error; the caller must check SOFT_ERROR_OCCURRED() + * to detect errors. + */ +uint64 +uint64in_subr(const char *s, char **endloc, + const char *typname, Node *escontext) +{ + uint64 result; + char *endptr; + + /* Ensure that empty-input is handled consistently across platforms */ + if (*s == '\0') + ereturn(escontext, 0, +(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("invalid input syntax for type %s: \"%s\"", + typname, s))); + + errno = 0; + result = strtou64(s, , 0); + + /* + * strtoul[l]() normally only sets ERANGE. On
Re: Error-safe user functions
Robert Haas writes: > On Fri, Dec 16, 2022 at 1:31 PM Tom Lane wrote: >> The reg* functions probably need a unified plan as to how far >> down we want to push non-error behavior. > I would be in favor of an aggressive approach. Here's a proposed patch for converting regprocin and friends to soft error reporting. I'll say at the outset that it's an engineering compromise, and it may be worth going further in future. But I doubt it's worth doing more than this for v16, because the next steps would be pretty invasive. I've converted all the errors thrown directly within regproc.c, and also converted parseTypeString, typeStringToTypeName, and stringToQualifiedNameList to report their own errors softly. This affected some outside callers, but not so many of them that I think it's worth inventing compatibility wrappers. I dealt with lookup failures by just changing the input functions to call the respective lookup functions with missing_ok = true, and then throw their own error softly on failure. Also, I've changed to_regproc() and friends to return NULL in exactly the same cases that are now soft errors for the input functions. Previously they were a bit inconsistent about what triggered hard errors vs. returning NULL. (Perhaps we should go further than this, and convert all these functions to just be DirectInputFunctionCallSafe wrappers around the corresponding input functions? That would save some duplicative code, but I've not done it here.) What's not fixed here: 1. As previously discussed, parse errors in type names are thrown by the main grammar, so getting those to not be hard errors seems like too big a lift for today. 2. Errors about invalid type modifiers (reported by typenameTypeMod or type-specific typmodin routines) are not trapped either. Fixing this would require extending the soft-error conventions to typmodin routines, which maybe will be worth doing someday but it seems pretty far down the priority list. Specifying a typmod is surely not main-line usage for regtypein. 3. Throwing our own error has the demerit that it might be different from what the underlying lookup function would have reported. This is visible in some changes in existing regression test cases, such as -ERROR: schema "ng_catalog" does not exist +ERROR: relation "ng_catalog.pg_class" does not exist This isn't wrong, exactly, but the loss of specificity is a bit annoying. 4. This still fails to trap errors about "too many dotted names" and "cross-database references are not implemented", which are thrown in DeconstructQualifiedName, LookupTypeName, RangeVarGetRelid, and maybe some other places. 5. We also don't trap errors about "the schema exists but you don't have USAGE permission to do a lookup in it", because LookupExplicitNamespace still throws that even when passed missing_ok = true. The obvious way to fix #3,#4,#5 is to change pretty much all of the catalog lookup infrastructure to deal in escontext arguments instead of "missing_ok" booleans. That might be worth doing --- it'd have benefits beyond the immediate problem, I think --- but I feel it's a bigger lift than we want to undertake for v16. It'd be better to spend the time we have left for v16 on building features that use soft error reporting than on refining corner cases in the reg* functions. So I think we should stop more or less here, possibly after changing the to_regfoo functions to be simple wrappers around the soft input functions. regards, tom lane diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c index 109bdfb33f..a1df8b1ddc 100644 --- a/src/backend/catalog/objectaddress.c +++ b/src/backend/catalog/objectaddress.c @@ -2182,7 +2182,7 @@ pg_get_object_address(PG_FUNCTION_ARGS) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("name or argument lists may not contain nulls"))); - typename = typeStringToTypeName(TextDatumGetCString(elems[0])); + typename = typeStringToTypeName(TextDatumGetCString(elems[0]), NULL); } else if (type == OBJECT_LARGEOBJECT) { @@ -2238,7 +2238,8 @@ pg_get_object_address(PG_FUNCTION_ARGS) (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("name or argument lists may not contain nulls"))); args = lappend(args, - typeStringToTypeName(TextDatumGetCString(elems[i]))); + typeStringToTypeName(TextDatumGetCString(elems[i]), +NULL)); } } else diff --git a/src/backend/parser/parse_type.c b/src/backend/parser/parse_type.c index f7ad689459..8f3850aa4e 100644 --- a/src/backend/parser/parse_type.c +++ b/src/backend/parser/parse_type.c @@ -727,10 +727,15 @@ pts_error_callback(void *arg) * Given a string that is supposed to be a SQL-compatible type declaration, * such as "int4" or "integer" or "character varying(32)", parse * the string and return the result as a TypeName. - * If the string cannot be parsed as a type, an error is raised. + * + * If the string
Re: Error-safe user functions
On 2022-12-24 Sa 10:42, Tom Lane wrote: > Andrew Dunstan writes: >> As I was giving this a final polish I noticed this in jspConvertRegexFlags: >> /* >> * We'll never need sub-match details at execution. While >> * RE_compile_and_execute would set this flag anyway, force it on here to >> * ensure that the regex cache entries created by makeItemLikeRegex are >> * useful. >> */ >> cflags |= REG_NOSUB; >> Clearly the comment would no longer be true. I guess I should just >> remove this? > Yeah, we can just drop that I guess. I'm slightly worried that we might > need it again after some future refactoring; but it's not really worth > devising a re-worded comment to justify keeping it. > > Also, I realized that I failed in my reviewerly duty by not noticing > that you'd forgotten to pg_regfree the regex after successful > compilation. Running something like this exposes the memory leak > very quickly: > > select pg_input_is_valid('$ ? (@ like_regex "pattern" flag "smixq")', > 'jsonpath') > from generate_series(1,1000); > > The attached delta patch takes care of it. (Per comment at pg_regcomp, > we don't need this after a failure return.) > > Thanks, pushed with those changes. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Error-safe user functions
Ted Yu writes: > On Fri, Dec 23, 2022 at 1:22 PM Tom Lane wrote: >> Yes, it is. We don't want a query-cancel transformed into a soft error. > The same regex, without user interruption, would exhibit an `invalid > regular expression` error. On what grounds do you claim that? The timing of arrival of the SIGINT is basically chance --- it might happen while we're inside backend/regex, or not. I mean, sure you could claim that a bad regex might run a long time and thereby be more likely to cause the user to issue a query cancel, but that's a stretched line of reasoning. regards, tom lane
Re: Error-safe user functions
Andrew Dunstan writes: > As I was giving this a final polish I noticed this in jspConvertRegexFlags: > /* > * We'll never need sub-match details at execution. While > * RE_compile_and_execute would set this flag anyway, force it on here to > * ensure that the regex cache entries created by makeItemLikeRegex are > * useful. > */ > cflags |= REG_NOSUB; > Clearly the comment would no longer be true. I guess I should just > remove this? Yeah, we can just drop that I guess. I'm slightly worried that we might need it again after some future refactoring; but it's not really worth devising a re-worded comment to justify keeping it. Also, I realized that I failed in my reviewerly duty by not noticing that you'd forgotten to pg_regfree the regex after successful compilation. Running something like this exposes the memory leak very quickly: select pg_input_is_valid('$ ? (@ like_regex "pattern" flag "smixq")', 'jsonpath') from generate_series(1,1000); The attached delta patch takes care of it. (Per comment at pg_regcomp, we don't need this after a failure return.) regards, tom lane diff --git a/src/backend/utils/adt/jsonpath_gram.y b/src/backend/utils/adt/jsonpath_gram.y index 8c3a0c7623..30179408f5 100644 --- a/src/backend/utils/adt/jsonpath_gram.y +++ b/src/backend/utils/adt/jsonpath_gram.y @@ -560,6 +560,8 @@ makeItemLikeRegex(JsonPathParseItem *expr, JsonPathString *pattern, (errcode(ERRCODE_INVALID_REGULAR_EXPRESSION), errmsg("invalid regular expression: %s", errMsg))); } + + pg_regfree(_tmp); } *result = v;
Re: Error-safe user functions
On Sat, Dec 24, 2022 at 4:38 AM Andrew Dunstan wrote: > > On 2022-12-24 Sa 04:51, Ted Yu wrote: > > > > > > On Fri, Dec 23, 2022 at 1:22 PM Tom Lane wrote: > > > > Ted Yu writes: > > > In makeItemLikeRegex : > > > > > + /* See regexp.c for explanation */ > > > + CHECK_FOR_INTERRUPTS(); > > > + pg_regerror(re_result, _tmp, errMsg, > > > sizeof(errMsg)); > > > + ereturn(escontext, false, > > > > > Since an error is returned, I wonder if the > > `CHECK_FOR_INTERRUPTS` call is > > > still necessary. > > > > Yes, it is. We don't want a query-cancel transformed into a soft > > error. > > > > regards, tom lane > > > > Hi, > > For this case (`invalid regular expression`), the potential user > > interruption is one reason for stopping execution. > > I feel surfacing user interruption somehow masks the underlying error. > > > > The same regex, without user interruption, would exhibit an `invalid > > regular expression` error. > > I think it would be better to surface the error. > > > > > > All that this patch is doing is replacing a call to > RE_compile_and_cache, which calls CHECK_FOR_INTERRUPTS, with similar > code, which gives us the opportunity to call ereturn instead of ereport. > Note that where escontext is NULL (the common case), ereturn functions > identically to ereport. So unless you want to argue that the logic in > RE_compile_and_cache is wrong I don't see what we're arguing about. If > instead I had altered the API of RE_compile_and_cache to include an > escontext parameter we wouldn't be having this argument at all. The only > reason I didn't do that was the point Tom quite properly raised about > why we're doing any caching here anyway. > > > cheers > > > andrew > > -- > Andrew Dunstan > EDB: https://www.enterprisedb.com Andrew: Thanks for the response.
Re: Error-safe user functions
On 2022-12-23 Fr 13:53, Tom Lane wrote: > Andrew Dunstan writes: >> On 2022-12-22 Th 11:44, Tom Lane wrote: >>> (I wonder why this is using RE_compile_and_cache at all, really, >>> rather than some other API. There doesn't seem to be value in >>> forcing the regex into the cache at this point.) >> I agree. The attached uses pg_regcomp instead. I had a lift a couple of >> lines from regexp.c, but not too many. > LGTM. No further comments. > > As I was giving this a final polish I noticed this in jspConvertRegexFlags: /* * We'll never need sub-match details at execution. While * RE_compile_and_execute would set this flag anyway, force it on here to * ensure that the regex cache entries created by makeItemLikeRegex are * useful. */ cflags |= REG_NOSUB; Clearly the comment would no longer be true. I guess I should just remove this? cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Error-safe user functions
On 2022-12-24 Sa 04:51, Ted Yu wrote: > > > On Fri, Dec 23, 2022 at 1:22 PM Tom Lane wrote: > > Ted Yu writes: > > In makeItemLikeRegex : > > > + /* See regexp.c for explanation */ > > + CHECK_FOR_INTERRUPTS(); > > + pg_regerror(re_result, _tmp, errMsg, > > sizeof(errMsg)); > > + ereturn(escontext, false, > > > Since an error is returned, I wonder if the > `CHECK_FOR_INTERRUPTS` call is > > still necessary. > > Yes, it is. We don't want a query-cancel transformed into a soft > error. > > regards, tom lane > > Hi, > For this case (`invalid regular expression`), the potential user > interruption is one reason for stopping execution. > I feel surfacing user interruption somehow masks the underlying error. > > The same regex, without user interruption, would exhibit an `invalid > regular expression` error. > I think it would be better to surface the error. > > All that this patch is doing is replacing a call to RE_compile_and_cache, which calls CHECK_FOR_INTERRUPTS, with similar code, which gives us the opportunity to call ereturn instead of ereport. Note that where escontext is NULL (the common case), ereturn functions identically to ereport. So unless you want to argue that the logic in RE_compile_and_cache is wrong I don't see what we're arguing about. If instead I had altered the API of RE_compile_and_cache to include an escontext parameter we wouldn't be having this argument at all. The only reason I didn't do that was the point Tom quite properly raised about why we're doing any caching here anyway. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Error-safe user functions
On Fri, Dec 23, 2022 at 1:22 PM Tom Lane wrote: > Ted Yu writes: > > In makeItemLikeRegex : > > > + /* See regexp.c for explanation */ > > + CHECK_FOR_INTERRUPTS(); > > + pg_regerror(re_result, _tmp, errMsg, > > sizeof(errMsg)); > > + ereturn(escontext, false, > > > Since an error is returned, I wonder if the `CHECK_FOR_INTERRUPTS` call > is > > still necessary. > > Yes, it is. We don't want a query-cancel transformed into a soft error. > > regards, tom lane > Hi, For this case (`invalid regular expression`), the potential user interruption is one reason for stopping execution. I feel surfacing user interruption somehow masks the underlying error. The same regex, without user interruption, would exhibit an `invalid regular expression` error. I think it would be better to surface the error. Cheers
Re: Error-safe user functions
Ted Yu writes: > On Fri, Dec 23, 2022 at 1:22 PM Tom Lane wrote: >> Ted Yu writes: >>> + /* See regexp.c for explanation */ >>> + CHECK_FOR_INTERRUPTS(); >> Yes, it is. We don't want a query-cancel transformed into a soft error. > `ereturn(escontext` calls appear in multiple places in the patch. > What about other callsites (w.r.t. checking interrupt) ? What about them? The reason this one is special is that backend/regexp might return a failure code that's specifically "I gave up because there's a query cancel pending". We don't want to report that as a soft error. It's true that we might cancel the query for real a bit later on even if this check weren't here, but that doesn't mean it's okay to go down the soft error path and hope that there'll be a CHECK_FOR_INTERRUPTS sometime before there's any visible evidence that we did the wrong thing. regards, tom lane
Re: Error-safe user functions
On Fri, Dec 23, 2022 at 1:22 PM Tom Lane wrote: > Ted Yu writes: > > In makeItemLikeRegex : > > > + /* See regexp.c for explanation */ > > + CHECK_FOR_INTERRUPTS(); > > + pg_regerror(re_result, _tmp, errMsg, > > sizeof(errMsg)); > > + ereturn(escontext, false, > > > Since an error is returned, I wonder if the `CHECK_FOR_INTERRUPTS` call > is > > still necessary. > > Yes, it is. We don't want a query-cancel transformed into a soft error. > > regards, tom lane > Hi, `ereturn(escontext` calls appear in multiple places in the patch. What about other callsites (w.r.t. checking interrupt) ? Cheers
Re: Error-safe user functions
Ted Yu writes: > In makeItemLikeRegex : > + /* See regexp.c for explanation */ > + CHECK_FOR_INTERRUPTS(); > + pg_regerror(re_result, _tmp, errMsg, > sizeof(errMsg)); > + ereturn(escontext, false, > Since an error is returned, I wonder if the `CHECK_FOR_INTERRUPTS` call is > still necessary. Yes, it is. We don't want a query-cancel transformed into a soft error. regards, tom lane
Re: Error-safe user functions
On Fri, Dec 23, 2022 at 9:20 AM Andrew Dunstan wrote: > > On 2022-12-22 Th 11:44, Tom Lane wrote: > > Andrew Dunstan writes: > >> Yeah, I started there, but it's substantially more complex - unlike cube > >> the jsonpath scanner calls the error routines as well as the parser. > >> Anyway, here's a patch. > > I looked through this and it seems generally OK. A minor nitpick is > > that we usually write "(Datum) 0" not "(Datum) NULL" for dont-care Datum > > values. > > > Fixed in the new version attached. > > > > A slightly bigger issue is that makeItemLikeRegex still allows > > an error to be thrown from RE_compile_and_cache if a bogus regex is > > presented. But that could be dealt with later. > > > I'd rather fix it now while we're paying attention. > > > > > > (I wonder why this is using RE_compile_and_cache at all, really, > > rather than some other API. There doesn't seem to be value in > > forcing the regex into the cache at this point.) > > > > > > > I agree. The attached uses pg_regcomp instead. I had a lift a couple of > lines from regexp.c, but not too many. > > > cheers > > > andrew > > > -- > Andrew Dunstan > EDB: https://www.enterprisedb.com Hi, In makeItemLikeRegex : + /* See regexp.c for explanation */ + CHECK_FOR_INTERRUPTS(); + pg_regerror(re_result, _tmp, errMsg, sizeof(errMsg)); + ereturn(escontext, false, Since an error is returned, I wonder if the `CHECK_FOR_INTERRUPTS` call is still necessary. Cheers
Re: Error-safe user functions
Andrew Dunstan writes: > On 2022-12-22 Th 11:44, Tom Lane wrote: >> (I wonder why this is using RE_compile_and_cache at all, really, >> rather than some other API. There doesn't seem to be value in >> forcing the regex into the cache at this point.) > I agree. The attached uses pg_regcomp instead. I had a lift a couple of > lines from regexp.c, but not too many. LGTM. No further comments. regards, tom lane
Re: Error-safe user functions
On 2022-12-22 Th 11:44, Tom Lane wrote: > Andrew Dunstan writes: >> Yeah, I started there, but it's substantially more complex - unlike cube >> the jsonpath scanner calls the error routines as well as the parser. >> Anyway, here's a patch. > I looked through this and it seems generally OK. A minor nitpick is > that we usually write "(Datum) 0" not "(Datum) NULL" for dont-care Datum > values. Fixed in the new version attached. > A slightly bigger issue is that makeItemLikeRegex still allows > an error to be thrown from RE_compile_and_cache if a bogus regex is > presented. But that could be dealt with later. I'd rather fix it now while we're paying attention. > > (I wonder why this is using RE_compile_and_cache at all, really, > rather than some other API. There doesn't seem to be value in > forcing the regex into the cache at this point.) > > I agree. The attached uses pg_regcomp instead. I had a lift a couple of lines from regexp.c, but not too many. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com diff --git a/src/backend/utils/adt/jsonpath.c b/src/backend/utils/adt/jsonpath.c index 91af030095..6c7a5b9854 100644 --- a/src/backend/utils/adt/jsonpath.c +++ b/src/backend/utils/adt/jsonpath.c @@ -66,16 +66,19 @@ #include "funcapi.h" #include "lib/stringinfo.h" #include "libpq/pqformat.h" +#include "nodes/miscnodes.h" #include "miscadmin.h" #include "utils/builtins.h" #include "utils/json.h" #include "utils/jsonpath.h" -static Datum jsonPathFromCstring(char *in, int len); +static Datum jsonPathFromCstring(char *in, int len, struct Node *escontext); static char *jsonPathToCstring(StringInfo out, JsonPath *in, int estimated_len); -static int flattenJsonPathParseItem(StringInfo buf, JsonPathParseItem *item, +static bool flattenJsonPathParseItem(StringInfo buf, int *result, + struct Node *escontext, + JsonPathParseItem *item, int nestingLevel, bool insideArraySubscript); static void alignStringInfoInt(StringInfo buf); static int32 reserveSpaceForItemPointer(StringInfo buf); @@ -95,7 +98,7 @@ jsonpath_in(PG_FUNCTION_ARGS) char *in = PG_GETARG_CSTRING(0); int len = strlen(in); - return jsonPathFromCstring(in, len); + return jsonPathFromCstring(in, len, fcinfo->context); } /* @@ -119,7 +122,7 @@ jsonpath_recv(PG_FUNCTION_ARGS) else elog(ERROR, "unsupported jsonpath version number: %d", version); - return jsonPathFromCstring(str, nbytes); + return jsonPathFromCstring(str, nbytes, NULL); } /* @@ -165,24 +168,29 @@ jsonpath_send(PG_FUNCTION_ARGS) * representation of jsonpath. */ static Datum -jsonPathFromCstring(char *in, int len) +jsonPathFromCstring(char *in, int len, struct Node *escontext) { - JsonPathParseResult *jsonpath = parsejsonpath(in, len); + JsonPathParseResult *jsonpath = parsejsonpath(in, len, escontext); JsonPath *res; StringInfoData buf; - initStringInfo(); - enlargeStringInfo(, 4 * len /* estimation */ ); - - appendStringInfoSpaces(, JSONPATH_HDRSZ); + if (SOFT_ERROR_OCCURRED(escontext)) + return (Datum) NULL; if (!jsonpath) - ereport(ERROR, + ereturn(escontext, (Datum) NULL, (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg("invalid input syntax for type %s: \"%s\"", "jsonpath", in))); - flattenJsonPathParseItem(, jsonpath->expr, 0, false); + initStringInfo(); + enlargeStringInfo(, 4 * len /* estimation */ ); + + appendStringInfoSpaces(, JSONPATH_HDRSZ); + + if (!flattenJsonPathParseItem(, NULL, escontext, + jsonpath->expr, 0, false)) + return (Datum) NULL; res = (JsonPath *) buf.data; SET_VARSIZE(res, buf.len); @@ -225,9 +233,10 @@ jsonPathToCstring(StringInfo out, JsonPath *in, int estimated_len) * Recursive function converting given jsonpath parse item and all its * children into a binary representation. */ -static int -flattenJsonPathParseItem(StringInfo buf, JsonPathParseItem *item, - int nestingLevel, bool insideArraySubscript) +static bool +flattenJsonPathParseItem(StringInfo buf, int *result, struct Node *escontext, + JsonPathParseItem *item, int nestingLevel, + bool insideArraySubscript) { /* position from beginning of jsonpath data */ int32 pos = buf->len - JSONPATH_HDRSZ; @@ -295,16 +304,22 @@ flattenJsonPathParseItem(StringInfo buf, JsonPathParseItem *item, int32 left = reserveSpaceForItemPointer(buf); int32 right = reserveSpaceForItemPointer(buf); -chld = !item->value.args.left ? pos : - flattenJsonPathParseItem(buf, item->value.args.left, - nestingLevel + argNestingLevel, - insideArraySubscript); +if (!item->value.args.left) + chld = pos; +else if (! flattenJsonPathParseItem(buf, , escontext, + item->value.args.left, + nestingLevel + argNestingLevel, + insideArraySubscript)) + return false; *(int32 *) (buf->data + left) = chld - pos; -chld =
Re: Error-safe user functions
On 2022-12-22 Th 01:10, Tom Lane wrote: > Andrew Dunstan writes: >> And here's another for contrib/seg >> I'm planning to commit these two in the next day or so. > I didn't look at the jsonpath one yet. The seg patch passes > an eyeball check, with one minor nit: in seg_atof, > > + *result = float4in_internal(value, NULL, "real", value, escontext); > > don't we want to use "seg" as the type_name? > > Even more nitpicky, in > > +seg_yyerror(SEG *result, struct Node *escontext, const char *message) > { > + if (SOFT_ERROR_OCCURRED(escontext)) > + return; > > I'd be inclined to add some explanation, say > > +seg_yyerror(SEG *result, struct Node *escontext, const char *message) > { > + /* if we already reported an error, don't overwrite it */ > + if (SOFT_ERROR_OCCURRED(escontext)) > + return; > > Thanks for the review. Fixed both of these and pushed. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Error-safe user functions
Andrew Dunstan writes: > Yeah, I started there, but it's substantially more complex - unlike cube > the jsonpath scanner calls the error routines as well as the parser. > Anyway, here's a patch. I looked through this and it seems generally OK. A minor nitpick is that we usually write "(Datum) 0" not "(Datum) NULL" for dont-care Datum values. A slightly bigger issue is that makeItemLikeRegex still allows an error to be thrown from RE_compile_and_cache if a bogus regex is presented. But that could be dealt with later. (I wonder why this is using RE_compile_and_cache at all, really, rather than some other API. There doesn't seem to be value in forcing the regex into the cache at this point.) regards, tom lane
Re: Error-safe user functions
Andrew Dunstan writes: > And here's another for contrib/seg > I'm planning to commit these two in the next day or so. I didn't look at the jsonpath one yet. The seg patch passes an eyeball check, with one minor nit: in seg_atof, + *result = float4in_internal(value, NULL, "real", value, escontext); don't we want to use "seg" as the type_name? Even more nitpicky, in +seg_yyerror(SEG *result, struct Node *escontext, const char *message) { + if (SOFT_ERROR_OCCURRED(escontext)) + return; I'd be inclined to add some explanation, say +seg_yyerror(SEG *result, struct Node *escontext, const char *message) { + /* if we already reported an error, don't overwrite it */ + if (SOFT_ERROR_OCCURRED(escontext)) + return; regards, tom lane
Re: Error-safe user functions
On 2022-12-18 Su 09:42, Andrew Dunstan wrote: > On 2022-12-14 We 17:37, Tom Lane wrote: >> Andrew Dunstan writes: >>> Thanks, I have been looking at jsonpath, but I'm not quite sure how to >>> get the escontext argument to the yyerror calls in jsonath_scan.l. Maybe >>> I need to specify a lex-param setting? >> You want a parse-param option in jsonpath_gram.y, I think; adding that >> will persuade Bison to change the signatures of relevant functions. >> Compare the mods I made in contrib/cube in ccff2d20e. >> >> > > Yeah, I started there, but it's substantially more complex - unlike cube > the jsonpath scanner calls the error routines as well as the parser. > > > Anyway, here's a patch. > > And here's another for contrib/seg I'm planning to commit these two in the next day or so. cheers andew -- Andrew Dunstan EDB: https://www.enterprisedb.com From d7a0d93ee90ccc61b7179f236dcd1a824f30c8e6 Mon Sep 17 00:00:00 2001 From: Andrew Dunstan Date: Wed, 21 Dec 2022 18:14:03 -0500 Subject: [PATCH] Provide error safety for contrib/seg's input function --- contrib/seg/expected/seg.out | 20 contrib/seg/seg.c| 4 ++-- contrib/seg/segdata.h| 5 +++-- contrib/seg/segparse.y | 34 +++--- contrib/seg/segscan.l| 10 +++--- contrib/seg/sql/seg.sql | 13 + 6 files changed, 68 insertions(+), 18 deletions(-) diff --git a/contrib/seg/expected/seg.out b/contrib/seg/expected/seg.out index 2320464dd4..7a06113ed8 100644 --- a/contrib/seg/expected/seg.out +++ b/contrib/seg/expected/seg.out @@ -1273,3 +1273,23 @@ FROM test_seg WHERE s @> '11.2..11.3' OR s IS NULL ORDER BY s; || (144 rows) +-- test non error throwing API +SELECT str as seg, + pg_input_is_valid(str,'seg') as ok, + pg_input_error_message(str,'seg') as errmsg +FROM unnest(ARRAY['-1 .. 1'::text, + '100(+-)1', + '', + 'ABC', + '1 e7', + '1e700']) str; + seg| ok |errmsg +--++--- + -1 .. 1 | t | + 100(+-)1 | t | + | f | bad seg representation + ABC | f | bad seg representation + 1 e7 | f | bad seg representation + 1e700| f | "1e700" is out of range for type real +(6 rows) + diff --git a/contrib/seg/seg.c b/contrib/seg/seg.c index a7effc1b19..7f9fc24eb4 100644 --- a/contrib/seg/seg.c +++ b/contrib/seg/seg.c @@ -108,8 +108,8 @@ seg_in(PG_FUNCTION_ARGS) seg_scanner_init(str); - if (seg_yyparse(result) != 0) - seg_yyerror(result, "bogus input"); + if (seg_yyparse(result, fcinfo->context) != 0) + seg_yyerror(result, fcinfo->context, "bogus input"); seg_scanner_finish(); diff --git a/contrib/seg/segdata.h b/contrib/seg/segdata.h index f4eafc865d..3d6e3e3f24 100644 --- a/contrib/seg/segdata.h +++ b/contrib/seg/segdata.h @@ -16,9 +16,10 @@ extern int significant_digits(const char *s); /* in segscan.l */ extern int seg_yylex(void); -extern void seg_yyerror(SEG *result, const char *message) pg_attribute_noreturn(); +extern void seg_yyerror(SEG *result, struct Node *escontext, + const char *message); extern void seg_scanner_init(const char *str); extern void seg_scanner_finish(void); /* in segparse.y */ -extern int seg_yyparse(SEG *result); +extern int seg_yyparse(SEG *result, struct Node *escontext); diff --git a/contrib/seg/segparse.y b/contrib/seg/segparse.y index 0156c3e027..19f1dba50f 100644 --- a/contrib/seg/segparse.y +++ b/contrib/seg/segparse.y @@ -7,7 +7,9 @@ #include #include "fmgr.h" +#include "nodes/miscnodes.h" #include "utils/builtins.h" +#include "utils/float.h" #include "segdata.h" @@ -19,7 +21,7 @@ #define YYMALLOC palloc #define YYFREE pfree -static float seg_atof(const char *value); +static bool seg_atof(char *value, float *result, struct Node *escontext); static int sig_digits(const char *value); @@ -35,6 +37,7 @@ static char strbuf[25] = { /* BISON Declarations */ %parse-param {SEG *result} +%parse-param {struct Node *escontext} %expect 0 %name-prefix="seg_yy" @@ -77,7 +80,7 @@ range: boundary PLUMIN deviation result->lower = $1.val; result->upper = $3.val; if ( result->lower > result->upper ) { - ereport(ERROR, + errsave(escontext, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("swapped boundaries: %g is greater than %g", result->lower, result->upper))); @@ -121,7 +124,10 @@ range: boundary PLUMIN deviation boundary: SEGFLOAT { /* temp variable avoids a gcc 3.3.x bug on Sparc64 */ - float val = seg_atof($1); + float val; + + if (!seg_atof($1, , escontext)) + YYABORT; $$.ext = '\0'; $$.sigd = sig_digits($1); @@ -130,7 +136,10 @@ boundary: SEGFLOAT | EXTENSION SEGFLOAT { /* temp variable avoids a gcc 3.3.x bug on Sparc64 */ - float val = seg_atof($2); +
CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions
Attached is my work in progress to implement the changes to the CAST() function as proposed by Vik Fearing. This work builds upon the Error-safe User Functions work currently ongoing. The proposed changes are as follows: CAST(expr AS typename) continues to behave as before. CAST(expr AS typename ERROR ON ERROR) has the identical behavior as the unadorned CAST() above. CAST(expr AS typename NULL ON ERROR) will use error-safe functions to do the cast of expr, and will return NULL if the cast fails. CAST(expr AS typename DEFAULT expr2 ON ERROR) will use error-safe functions to do the cast of expr, and will return expr2 if the cast fails. There is an additional FORMAT parameter that I have not yet implemented, my understanding is that it is largely intended for DATE/TIME field conversions, but others are certainly possible. CAST(expr AS typename FORMAT fmt DEFAULT expr2 ON ERROR) What is currently working: - Any scalar expression that can be evaluated at parse time. These tests from cast.sql all currently work: VALUES (CAST('error' AS integer)); VALUES (CAST('error' AS integer ERROR ON ERROR)); VALUES (CAST('error' AS integer NULL ON ERROR)); VALUES (CAST('error' AS integer DEFAULT 42 ON ERROR)); SELECT CAST('{123,abc,456}' AS integer[] DEFAULT '{-789}' ON ERROR) as array_test1; - Scalar values evaluated at runtime. CREATE TEMPORARY TABLE t(t text); INSERT INTO t VALUES ('a'), ('1'), ('b'), (2); SELECT CAST(t.t AS integer DEFAULT -1 ON ERROR) AS foo FROM t; foo - -1 1 -1 2 (4 rows) Along the way, I made a few design decisions, each of which is up for debate: First, I created OidInputFunctionCallSafe, which is to OidInputFunctionCall what InputFunctionCallSafe is to InputFunctionCall. Given that the only place I ended up using it was stringTypeDatumSafe(), it may be possible to just move that code inside stringTypeDatumSafe. Next, I had a need for FuncExpr, CoerceViaIO, and ArrayCoerce to all report if their expr argument failed, and if not, just past the evaluation of expr2. Rather than duplicate this logic in several places, I chose instead to modify CoalesceExpr to allow for an error-test mode in addition to its default null-test mode, and then to provide this altered node with two expressions, the first being the error-safe typecast of expr and the second being the non-error-safe typecast of expr2. I still don't have array-to-array casts working, as the changed I would likely need to make to ArrayCoerce get somewhat invasive, so this seemed like a good time to post my work so far and solicit some feedback beyond what I've already been getting from Jeff Davis and Michael Paquier. I've sidestepped domains as well for the time being as well as avoiding JIT issues entirely. No documentation is currently prepared. All but one of the regression test queries work, the one that is currently failing is: SELECT CAST('{234,def,567}'::text[] AS integer[] DEFAULT '{-1011}' ON ERROR) as array_test2; Other quirks: - an unaliased CAST ON DEFAULT will return the column name of "coalesce", which internally is true, but obviously would be quite confusing to a user. As a side observation, I noticed that the optimizer already tries to resolve expressions based on constants and to collapse expression trees where possible, which makes me wonder if the work done to do the same in transformTypeCast/ and coerce_to_target_type and coerce_type isn't also wasted. From 897c9c68a29ad0fa57f28734df0c77553e026d80 Mon Sep 17 00:00:00 2001 From: coreyhuinker Date: Sun, 18 Dec 2022 16:20:01 -0500 Subject: [PATCH 1/3] add OidInputFunctionCall --- src/backend/utils/fmgr/fmgr.c | 13 + src/include/fmgr.h| 4 2 files changed, 17 insertions(+) diff --git a/src/backend/utils/fmgr/fmgr.c b/src/backend/utils/fmgr/fmgr.c index 0d37f69298..e9a19ce653 100644 --- a/src/backend/utils/fmgr/fmgr.c +++ b/src/backend/utils/fmgr/fmgr.c @@ -1701,6 +1701,19 @@ OidInputFunctionCall(Oid functionId, char *str, Oid typioparam, int32 typmod) return InputFunctionCall(, str, typioparam, typmod); } +bool +OidInputFunctionCallSafe(Oid functionId, char *str, Oid typioparam, + int32 typmod, fmNodePtr escontext, + Datum *result) +{ + FmgrInfo flinfo; + + fmgr_info(functionId, ); + + return InputFunctionCallSafe(, str, typioparam, typmod, + escontext, result); +} + char * OidOutputFunctionCall(Oid functionId, Datum val) { diff --git a/src/include/fmgr.h b/src/include/fmgr.h index b7832d0aa2..b835ef72b5 100644 --- a/src/include/fmgr.h +++ b/src/include/fmgr.h @@ -706,6 +706,10 @@ extern bool InputFunctionCallSafe(FmgrInfo *flinfo, char *str, Datum *result); extern Datum OidInputFunctionCall(Oid functionId, char *str, Oid typioparam, int32 typmod); +extern bool +OidInputFunctionCallSafe(Oid functionId, char *str, Oid typioparam, + int32 typmod, fmNodePtr escontext, + Datum *result); extern char *OutputFunctionCal
Re: Error-safe user functions
On Mon, Dec 19, 2022 at 4:27 PM Tom Lane wrote: > In [1] I wrote > > >>> I'm a little concerned about the cost-benefit of fixing the reg* types. > >>> The ones that accept type names actually use the core grammar to parse > >>> those. Now, we probably could fix the grammar to be non-throwing, but > >>> it'd be very invasive and I'm not sure about the performance impact. > >>> It might be best to content ourselves with soft reporting of lookup > >>> failures, as opposed to syntax problems. Ah right. I agree that invading the main grammar doesn't seem terribly appealing. Setting regtypein aside could be a sensible choice, then. Another option might be to have some way of parsing type names outside of the main grammar, which would be more work and would require keeping things in sync, but perhaps it would end up being less ugly -- Robert Haas EDB: http://www.enterprisedb.com
Re: Error-safe user functions
Robert Haas writes: > On Mon, Dec 19, 2022 at 11:44 AM Tom Lane wrote: >> ... I guess you didn't read my remarks upthread about regtypein. >> I do not want to try to make gram.y+scan.l non-error-throwing. > Huh, for some reason I'm not seeing an email about that. Do you have a link? In [1] I wrote >>> I'm a little concerned about the cost-benefit of fixing the reg* types. >>> The ones that accept type names actually use the core grammar to parse >>> those. Now, we probably could fix the grammar to be non-throwing, but >>> it'd be very invasive and I'm not sure about the performance impact. >>> It might be best to content ourselves with soft reporting of lookup >>> failures, as opposed to syntax problems. regards, tom lane [1] https://www.postgresql.org/message-id/1863335.1670783397%40sss.pgh.pa.us
Re: Error-safe user functions
On Mon, Dec 19, 2022 at 11:44 AM Tom Lane wrote: > > It also doesn't seem too bad from an implementation point of view to > > try to cover all the caes. > > ... I guess you didn't read my remarks upthread about regtypein. > I do not want to try to make gram.y+scan.l non-error-throwing. Huh, for some reason I'm not seeing an email about that. Do you have a link? -- Robert Haas EDB: http://www.enterprisedb.com
Re: Error-safe user functions
Robert Haas writes: > On Fri, Dec 16, 2022 at 1:31 PM Tom Lane wrote: >> The reg* functions probably need a unified plan as to how far >> down we want to push non-error behavior. The rest of these >> I think just require turning the crank along the same lines >> as in functions already dealt with. > I would be in favor of an aggressive approach. I agree that anything based on implementation concerns is going to look pretty unprincipled to end users. However ... > It also doesn't seem too bad from an implementation point of view to > try to cover all the caes. ... I guess you didn't read my remarks upthread about regtypein. I do not want to try to make gram.y+scan.l non-error-throwing. regards, tom lane
Re: Error-safe user functions
On Fri, Dec 16, 2022 at 1:31 PM Tom Lane wrote: > The reg* functions probably need a unified plan as to how far > down we want to push non-error behavior. The rest of these > I think just require turning the crank along the same lines > as in functions already dealt with. I would be in favor of an aggressive approach. For example, let's look at regclassin(). It calls oidin(), stringToQualifiedNameList(), makeRangeVarFromNameList(), and RangeVarGetRelidExtended(). Basically, oidin() could fail if the input, known to be all digits, is out of range; stringToQualifiedNameList() could fail due to mismatched delimiters or improperly-separated names; makeRangeVarFromNameList() doesn't want to have more than three name components (db.schema.relation); and RangeVarGetRelidExtended() doesn't like cross-database references or non-existent relations. Now, one option here would be to distinguish between something that could be valid in some database but isn't in this one, like a non-existent relation name, and one that couldn't ever work anywhere, like a relation name with four parts or bad quoting. You could decide that the former kind of error will be reported softly but the latter is hard error. But I think that is presuming that we know how users will want to use this functionality, and I don't think we do. I also think that it will be confusing to users. Finally, I think it's different from what we do for other data types. You could equally well argue that, for int4in, we ought to treat '99' and 'potato' differently, one a hard error and the other soft. I think it's hard to puzzle out a decision that makes any sense there, and I don't think this case is much different. I don't think it's too hard to mentally separate errors about the validity of the input from, say, out of memory errors -- but one distinguishing between one kind of input validity check and another seems like a muddle. It also doesn't seem too bad from an implementation point of view to try to cover all the caes. The stickiest case looks to be RangeVarGetRelidExtended() and we might need to give a bit of thought to how to handle that one. The others don't seem like a big issue, and oidin() is already done. > While it'd be good to get all of these done before v16 feature > freeze, I can't see that any of them represent blockers for > building features based on soft input error handling. +1. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Error-safe user functions
On 2022-12-14 We 17:37, Tom Lane wrote: > Andrew Dunstan writes: >> Thanks, I have been looking at jsonpath, but I'm not quite sure how to >> get the escontext argument to the yyerror calls in jsonath_scan.l. Maybe >> I need to specify a lex-param setting? > You want a parse-param option in jsonpath_gram.y, I think; adding that > will persuade Bison to change the signatures of relevant functions. > Compare the mods I made in contrib/cube in ccff2d20e. > > Yeah, I started there, but it's substantially more complex - unlike cube the jsonpath scanner calls the error routines as well as the parser. Anyway, here's a patch. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com diff --git a/src/backend/utils/adt/jsonpath.c b/src/backend/utils/adt/jsonpath.c index 91af030095..6c7a5b9854 100644 --- a/src/backend/utils/adt/jsonpath.c +++ b/src/backend/utils/adt/jsonpath.c @@ -66,16 +66,19 @@ #include "funcapi.h" #include "lib/stringinfo.h" #include "libpq/pqformat.h" +#include "nodes/miscnodes.h" #include "miscadmin.h" #include "utils/builtins.h" #include "utils/json.h" #include "utils/jsonpath.h" -static Datum jsonPathFromCstring(char *in, int len); +static Datum jsonPathFromCstring(char *in, int len, struct Node *escontext); static char *jsonPathToCstring(StringInfo out, JsonPath *in, int estimated_len); -static int flattenJsonPathParseItem(StringInfo buf, JsonPathParseItem *item, +static bool flattenJsonPathParseItem(StringInfo buf, int *result, + struct Node *escontext, + JsonPathParseItem *item, int nestingLevel, bool insideArraySubscript); static void alignStringInfoInt(StringInfo buf); static int32 reserveSpaceForItemPointer(StringInfo buf); @@ -95,7 +98,7 @@ jsonpath_in(PG_FUNCTION_ARGS) char *in = PG_GETARG_CSTRING(0); int len = strlen(in); - return jsonPathFromCstring(in, len); + return jsonPathFromCstring(in, len, fcinfo->context); } /* @@ -119,7 +122,7 @@ jsonpath_recv(PG_FUNCTION_ARGS) else elog(ERROR, "unsupported jsonpath version number: %d", version); - return jsonPathFromCstring(str, nbytes); + return jsonPathFromCstring(str, nbytes, NULL); } /* @@ -165,24 +168,29 @@ jsonpath_send(PG_FUNCTION_ARGS) * representation of jsonpath. */ static Datum -jsonPathFromCstring(char *in, int len) +jsonPathFromCstring(char *in, int len, struct Node *escontext) { - JsonPathParseResult *jsonpath = parsejsonpath(in, len); + JsonPathParseResult *jsonpath = parsejsonpath(in, len, escontext); JsonPath *res; StringInfoData buf; - initStringInfo(); - enlargeStringInfo(, 4 * len /* estimation */ ); - - appendStringInfoSpaces(, JSONPATH_HDRSZ); + if (SOFT_ERROR_OCCURRED(escontext)) + return (Datum) NULL; if (!jsonpath) - ereport(ERROR, + ereturn(escontext, (Datum) NULL, (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg("invalid input syntax for type %s: \"%s\"", "jsonpath", in))); - flattenJsonPathParseItem(, jsonpath->expr, 0, false); + initStringInfo(); + enlargeStringInfo(, 4 * len /* estimation */ ); + + appendStringInfoSpaces(, JSONPATH_HDRSZ); + + if (!flattenJsonPathParseItem(, NULL, escontext, + jsonpath->expr, 0, false)) + return (Datum) NULL; res = (JsonPath *) buf.data; SET_VARSIZE(res, buf.len); @@ -225,9 +233,10 @@ jsonPathToCstring(StringInfo out, JsonPath *in, int estimated_len) * Recursive function converting given jsonpath parse item and all its * children into a binary representation. */ -static int -flattenJsonPathParseItem(StringInfo buf, JsonPathParseItem *item, - int nestingLevel, bool insideArraySubscript) +static bool +flattenJsonPathParseItem(StringInfo buf, int *result, struct Node *escontext, + JsonPathParseItem *item, int nestingLevel, + bool insideArraySubscript) { /* position from beginning of jsonpath data */ int32 pos = buf->len - JSONPATH_HDRSZ; @@ -295,16 +304,22 @@ flattenJsonPathParseItem(StringInfo buf, JsonPathParseItem *item, int32 left = reserveSpaceForItemPointer(buf); int32 right = reserveSpaceForItemPointer(buf); -chld = !item->value.args.left ? pos : - flattenJsonPathParseItem(buf, item->value.args.left, - nestingLevel + argNestingLevel, - insideArraySubscript); +if (!item->value.args.left) + chld = pos; +else if (! flattenJsonPathParseItem(buf, , escontext, + item->value.args.left, + nestingLevel + argNestingLevel, + insideArraySubscript)) + return false; *(int32 *) (buf->data + left) = chld - pos; -chld = !item->value.args.right ? pos : - flattenJsonPathParseItem(buf, item->value.args.right, - nestingLevel + argNestingLevel, - insideArraySubscript); +if (!item->value.args.right) + chld = pos; +else if (! flattenJsonPathParseItem(buf, , escontext, + item->value.args.right, + nestingLevel +
Re: Error-safe user functions
On 2022-12-15 Th 09:12, Robert Haas wrote: > On Wed, Dec 14, 2022 at 6:24 PM Tom Lane wrote: >> I've gone through these now and revised/pushed most of them. > Tom, I just want to extend huge thanks to you for working on this > infrastructure. jsonpath aside, I think this is going to pay dividends > in many ways for many years to come. It's something that we've needed > for a really long time, and I'm very happy that we're moving forward > with it. > > Thanks so much. > Robert beat me to it, but I will heartily second this. Many thanks. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Error-safe user functions
I wrote: > I'm going to step back from this for now and get on with other work, > but before that I thought there was one more input function I should > look at: xml_in, because xml.c is such a hairy can of worms. Pushed that. For the record, my list of input functions still needing attention stands at Core: jsonpath_in regclassin regcollationin regconfigin regdictionaryin regnamespacein regoperatorin regoperin regprocedurein regprocin regrolein regtypein tsqueryin tsvectorin Contrib: hstore: hstore_in intarray: bqarr_in isn: ean13_in isbn_in ismn_in issn_in upc_in ltree: ltree_in lquery_in ltxtq_in seg: seg_in The reg* functions probably need a unified plan as to how far down we want to push non-error behavior. The rest of these I think just require turning the crank along the same lines as in functions already dealt with. While it'd be good to get all of these done before v16 feature freeze, I can't see that any of them represent blockers for building features based on soft input error handling. regards, tom lane
Re: Error-safe user functions
Robert Haas writes: > Tom, I just want to extend huge thanks to you for working on this > infrastructure. Thanks. I agree it's an important bit of work. I'm going to step back from this for now and get on with other work, but before that I thought there was one more input function I should look at: xml_in, because xml.c is such a hairy can of worms. It turns out to be not too bad, given our design principle that only "bad input" errors should be reported softly. xml_parse() now has two different ways of reporting errors depending on whether they're hard or soft, but it didn't take an undue amount of refactoring to make that work. While fixing that, my attention was drawn to wellformed_xml(), whose error handling is unbelievably horrid: it traps any longjmp whatsoever (query cancel, for instance) and reports it as ill-formed XML. 0002 attached makes use of this new code to get rid of the need for any PG_TRY there at all; instead, soft errors result in a "false" return but hard errors are allowed to propagate. xml_is_document was much more careful, but we can change it the same way to save code and cycles. regards, tom lane diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c index 3884babc1b..4b9877ee0b 100644 --- a/src/backend/utils/adt/xml.c +++ b/src/backend/utils/adt/xml.c @@ -119,9 +119,10 @@ struct PgXmlErrorContext static xmlParserInputPtr xmlPgEntityLoader(const char *URL, const char *ID, xmlParserCtxtPtr ctxt); +static void xml_errsave(Node *escontext, PgXmlErrorContext *errcxt, + int sqlcode, const char *msg); static void xml_errorHandler(void *data, xmlErrorPtr error); -static void xml_ereport_by_code(int level, int sqlcode, -const char *msg, int code); +static int errdetail_for_xml_code(int code); static void chopStringInfoNewlines(StringInfo str); static void appendStringInfoLineSeparator(StringInfo str); @@ -143,7 +144,8 @@ static bool print_xml_decl(StringInfo buf, const xmlChar *version, pg_enc encoding, int standalone); static bool xml_doctype_in_content(const xmlChar *str); static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg, - bool preserve_whitespace, int encoding); + bool preserve_whitespace, int encoding, + Node *escontext); static text *xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt); static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, ArrayBuildState *astate, @@ -261,14 +263,18 @@ xml_in(PG_FUNCTION_ARGS) xmltype*vardata; xmlDocPtr doc; + /* Build the result object. */ vardata = (xmltype *) cstring_to_text(s); /* - * Parse the data to check if it is well-formed XML data. Assume that - * ERROR occurred if parsing failed. + * Parse the data to check if it is well-formed XML data. + * + * Note: we don't need to worry about whether a soft error is detected. */ - doc = xml_parse(vardata, xmloption, true, GetDatabaseEncoding()); - xmlFreeDoc(doc); + doc = xml_parse(vardata, xmloption, true, GetDatabaseEncoding(), + fcinfo->context); + if (doc != NULL) + xmlFreeDoc(doc); PG_RETURN_XML_P(vardata); #else @@ -323,9 +329,10 @@ xml_out_internal(xmltype *x, pg_enc target_encoding) return buf.data; } - xml_ereport_by_code(WARNING, ERRCODE_INTERNAL_ERROR, - "could not parse XML declaration in stored value", - res_code); + ereport(WARNING, + errcode(ERRCODE_INTERNAL_ERROR), + errmsg_internal("could not parse XML declaration in stored value"), + errdetail_for_xml_code(res_code)); #endif return str; } @@ -392,7 +399,7 @@ xml_recv(PG_FUNCTION_ARGS) * Parse the data to check if it is well-formed XML data. Assume that * xml_parse will throw ERROR if not. */ - doc = xml_parse(result, xmloption, true, encoding); + doc = xml_parse(result, xmloption, true, encoding, NULL); xmlFreeDoc(doc); /* Now that we know what we're dealing with, convert to server encoding */ @@ -754,7 +761,7 @@ xmlparse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace) xmlDocPtr doc; doc = xml_parse(data, xmloption_arg, preserve_whitespace, - GetDatabaseEncoding()); + GetDatabaseEncoding(), NULL); xmlFreeDoc(doc); return (xmltype *) data; @@ -895,7 +902,7 @@ xml_is_document(xmltype *arg) PG_TRY(); { doc = xml_parse((text *) arg, XMLOPTION_DOCUMENT, true, - GetDatabaseEncoding()); + GetDatabaseEncoding(), NULL); result = true; } PG_CATCH(); @@ -1500,17 +1507,26 @@ xml_doctype_in_content(const xmlChar *str) /* - * Convert a C string to XML internal representation + * Convert a text object to XML internal representation + * + * data is the source data (must not be toasted!), encoding is its encoding, + * and xmloption_arg and preserve_whitespace are options for the + * transformation. + * + * Errors normally result in ereport(ERROR), but if escontext is an + * ErrorSaveContext, then "safe" errors are reported there
Re: Error-safe user functions
On Wed, Dec 14, 2022 at 6:24 PM Tom Lane wrote: > I've gone through these now and revised/pushed most of them. Tom, I just want to extend huge thanks to you for working on this infrastructure. jsonpath aside, I think this is going to pay dividends in many ways for many years to come. It's something that we've needed for a really long time, and I'm very happy that we're moving forward with it. Thanks so much. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Error-safe user functions
On Thu, Dec 15, 2022 at 11:16 AM Tom Lane wrote: > > Amul Sul writes: > > There are other a bunch of hard errors from get_multirange_io_data(), > > get_range_io_data() and its subroutine can hit, shouldn't we care > > about those? > > I think those are all "internal" errors, ie not reachable as a > consequence of bad input data. Do you see a reason to think > differently? Make sense, I was worried about the internal errors as well as an error that the user can cause while declaring multi-range e.g. shell type, but realized that case gets checked at creating that multi-range type. Regards, Amul
Re: Error-safe user functions
Amul Sul writes: > There are other a bunch of hard errors from get_multirange_io_data(), > get_range_io_data() and its subroutine can hit, shouldn't we care > about those? I think those are all "internal" errors, ie not reachable as a consequence of bad input data. Do you see a reason to think differently? regards, tom lane
Re: Error-safe user functions
On Thu, Dec 15, 2022 at 9:03 AM Tom Lane wrote: > > Here are some proposed patches for converting range_in and multirange_in. > > 0001 tackles the straightforward part, which is trapping syntax errors > and called-input-function errors. The only thing that I think might > be controversial here is that I chose to change the signatures of > the exposed functions range_serialize and make_range rather than > inventing xxx_safe variants. I think this is all right, because > AFAIK the only likely reason for extensions to call either of those > is that custom types' canonical functions would need to call > range_serialize --- and those will need to be touched anyway, > see 0002. > > What 0001 does not cover is trapping errors occurring in range > canonicalize functions. I'd first thought maybe doing that wasn't > worth the trouble, but it's not really very hard to fix the built-in > canonicalize functions, as shown in 0002. Probably extensions would > not find it much harder, and in any case they're not really required > to make their errors soft. > > Any objections? > There are other a bunch of hard errors from get_multirange_io_data(), get_range_io_data() and its subroutine can hit, shouldn't we care about those? Regards, Amul
Re: Error-safe user functions
Here are some proposed patches for converting range_in and multirange_in. 0001 tackles the straightforward part, which is trapping syntax errors and called-input-function errors. The only thing that I think might be controversial here is that I chose to change the signatures of the exposed functions range_serialize and make_range rather than inventing xxx_safe variants. I think this is all right, because AFAIK the only likely reason for extensions to call either of those is that custom types' canonical functions would need to call range_serialize --- and those will need to be touched anyway, see 0002. What 0001 does not cover is trapping errors occurring in range canonicalize functions. I'd first thought maybe doing that wasn't worth the trouble, but it's not really very hard to fix the built-in canonicalize functions, as shown in 0002. Probably extensions would not find it much harder, and in any case they're not really required to make their errors soft. Any objections? regards, tom lane diff --git a/src/backend/utils/adt/multirangetypes.c b/src/backend/utils/adt/multirangetypes.c index 307d087c97..ed26cfba2d 100644 --- a/src/backend/utils/adt/multirangetypes.c +++ b/src/backend/utils/adt/multirangetypes.c @@ -120,6 +120,7 @@ multirange_in(PG_FUNCTION_ARGS) char *input_str = PG_GETARG_CSTRING(0); Oid mltrngtypoid = PG_GETARG_OID(1); Oid typmod = PG_GETARG_INT32(2); + Node *escontext = fcinfo->context; TypeCacheEntry *rangetyp; int32 ranges_seen = 0; int32 range_count = 0; @@ -133,6 +134,7 @@ multirange_in(PG_FUNCTION_ARGS) const char *range_str_begin = NULL; int32 range_str_len; char *range_str; + Datum range_datum; cache = get_multirange_io_data(fcinfo, mltrngtypoid, IOFunc_input); rangetyp = cache->typcache->rngtype; @@ -144,7 +146,7 @@ multirange_in(PG_FUNCTION_ARGS) if (*ptr == '{') ptr++; else - ereport(ERROR, + ereturn(escontext, (Datum) 0, (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg("malformed multirange literal: \"%s\"", input_str), @@ -157,7 +159,7 @@ multirange_in(PG_FUNCTION_ARGS) char ch = *ptr; if (ch == '\0') - ereport(ERROR, + ereturn(escontext, (Datum) 0, (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg("malformed multirange literal: \"%s\"", input_str), @@ -186,7 +188,7 @@ multirange_in(PG_FUNCTION_ARGS) parse_state = MULTIRANGE_AFTER_RANGE; } else - ereport(ERROR, + ereturn(escontext, (Datum) 0, (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg("malformed multirange literal: \"%s\"", input_str), @@ -204,10 +206,14 @@ multirange_in(PG_FUNCTION_ARGS) repalloc(ranges, range_capacity * sizeof(RangeType *)); } ranges_seen++; - range = DatumGetRangeTypeP(InputFunctionCall(>typioproc, - range_str, - cache->typioparam, - typmod)); + if (!InputFunctionCallSafe(>typioproc, + range_str, + cache->typioparam, + typmod, + escontext, + _datum)) + PG_RETURN_NULL(); + range = DatumGetRangeTypeP(range_datum); if (!RangeIsEmpty(range)) ranges[range_count++] = range; parse_state = MULTIRANGE_AFTER_RANGE; @@ -256,7 +262,7 @@ multirange_in(PG_FUNCTION_ARGS) else if (ch == '}') parse_state = MULTIRANGE_FINISHED; else - ereport(ERROR, + ereturn(escontext, (Datum) 0, (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg("malformed multirange literal: \"%s\"", input_str), @@ -280,7 +286,7 @@ multirange_in(PG_FUNCTION_ARGS) ptr++; if (*ptr != '\0') - ereport(ERROR, + ereturn(escontext, (Datum) 0, (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg("malformed multirange literal: \"%s\"", input_str), @@ -807,7 +813,7 @@ multirange_get_union_range(TypeCacheEntry *rangetyp, multirange_get_bounds(rangetyp, mr, 0, , ); multirange_get_bounds(rangetyp, mr, mr->rangeCount - 1, , ); - return make_range(rangetyp, , , false); + return make_range(rangetyp, , , false, NULL); } @@ -2696,7 +2702,8 @@ range_merge_from_multirange(PG_FUNCTION_ARGS) multirange_get_bounds(typcache->rngtype, mr, mr->rangeCount - 1, , ); - result = make_range(typcache->rngtype, , , false); + result = make_range(typcache->rngtype, , , + false, NULL); } PG_RETURN_RANGE_P(result); diff --git a/src/backend/utils/adt/multirangetypes_selfuncs.c b/src/backend/utils/adt/multirangetypes_selfuncs.c index 919c8889d4..8fd6250e4a 100644 --- a/src/backend/utils/adt/multirangetypes_selfuncs.c +++ b/src/backend/utils/adt/multirangetypes_selfuncs.c @@ -221,7 +221,8 @@ multirangesel(PG_FUNCTION_ARGS) upper.val = ((Const *) other)->constvalue; upper.infinite = false; upper.lower = false; - constrange = range_serialize(typcache->rngtype, , , false); + constrange =
Re: Error-safe user functions
I wrote: > Amul Sul writes: >> Attaching a complete set of the patches changing function till this >> except bpcharin, byteain jsonpath_in that Andrew is planning to look >> in. I have skipped reg* functions. > I'll take a look at these shortly, unless Andrew is already on it. I've gone through these now and revised/pushed most of them. I do not think that we need to touch any unimplemented I/O functions. They can just as well be unimplemented for soft-error cases too; I can't see any use-case where it could be useful to have them not complain. So I discarded v1-0004-Change-brin_bloom_summary_in-to-allow-non-throw-e.patch v1-0005-Change-brin_minmax_multi_summary_in-to-allow-non-.patch v1-0009-Change-gtsvectorin-to-allow-non-throw-error-repor.patch v1-0018-Change-pg_mcv_list_in-to-allow-non-throw-error-re.patch v1-0019-Change-pg_ndistinct_in-to-allow-non-throw-error-r.patch As for the rest, some were closer to being committable than others. You need to be more careful about handling error cases in subroutines: you can't just ereturn from a subroutine and figure you're done, because the caller will keep plugging along if you don't do something to teach it not to. What that would often lead to is the caller finding what it thinks is a new error condition, and overwriting the original message with something that's much less on-point. This is comparable to cascading errors from a compiler: anybody who's dealt with those knows that errors after the first one are often just noise. So we have to be careful to quit after we log the first error. Also, I ended up ripping out the changes in line_construct, because as soon as I tried to test them I tripped over the fact that lseg_sl was still throwing hard errors, before we ever get to line_construct. Perhaps it is worth fixing all that but I have to deem it very low priority, because the two-input-points formulation isn't the mainstream code path. (I kind of wonder too if there isn't a better, more numerically robust conversion method ...) In any case I'm pretty sure those changes in float.h would have drawn Andres' ire. We don't want to be adding arguments to float_overflow_error/float_underflow_error; if that were acceptable they'd not have looked like that to begin with. Anyway, thanks for the work! That moved us a good ways. I think I'm going to go fix bpcharin and varcharin, because those are the last of the SQL-spec-defined types. regards, tom lane
Re: Error-safe user functions
Andrew Dunstan writes: > Thanks, I have been looking at jsonpath, but I'm not quite sure how to > get the escontext argument to the yyerror calls in jsonath_scan.l. Maybe > I need to specify a lex-param setting? You want a parse-param option in jsonpath_gram.y, I think; adding that will persuade Bison to change the signatures of relevant functions. Compare the mods I made in contrib/cube in ccff2d20e. regards, tom lane
Re: Error-safe user functions
On 2022-12-14 We 11:00, Tom Lane wrote: > Amul Sul writes: >> Attaching a complete set of the patches changing function till this >> except bpcharin, byteain jsonpath_in that Andrew is planning to look >> in. I have skipped reg* functions. > I'll take a look at these shortly, unless Andrew is already on it. > > Thanks, I have been looking at jsonpath, but I'm not quite sure how to get the escontext argument to the yyerror calls in jsonath_scan.l. Maybe I need to specify a lex-param setting? cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Error-safe user functions
Amul Sul writes: > Attaching a complete set of the patches changing function till this > except bpcharin, byteain jsonpath_in that Andrew is planning to look > in. I have skipped reg* functions. I'll take a look at these shortly, unless Andrew is already on it. regards, tom lane
Re: Error-safe user functions
On Mon, Dec 12, 2022 at 12:00 AM Tom Lane wrote: > > Andrew Dunstan writes: > > Maybe as we work through the remaining input functions (there are about > > 60 core candidates left on my list) we should mark them with a comment > > if no adjustment is needed. > > I did a quick pass through them last night. Assuming that we don't > need to touch the unimplemented input functions (eg for pseudotypes), > I count these core functions as still needing work: > > aclitemin > bit_in > box_in > bpcharin > byteain > cash_in > cidin > cidr_in > circle_in > inet_in > int2vectorin > jsonpath_in > line_in > lseg_in > macaddr8_in > macaddr_in Attaching patches changing these functions except bpcharin, byteain, jsonpath_in, and cidin. I am continuing work on the next items below: > multirange_in > namein > oidin > oidvectorin > path_in > pg_lsn_in > pg_snapshot_in > point_in > poly_in > range_in > regclassin > regcollationin > regconfigin > regdictionaryin > regnamespacein > regoperatorin > regoperin > regprocedurein > regprocin > regrolein > regtypein > tidin > tsqueryin > tsvectorin > uuid_in > varbit_in > varcharin > xid8in > xidin > xml_in > > and these contrib functions: > > hstore: > hstore_in > intarray: > bqarr_in > isn: > ean13_in > isbn_in > ismn_in > issn_in > upc_in > ltree: > ltree_in > lquery_in > ltxtq_in > seg: > seg_in > > Maybe we should have a conversation about which of these are > highest priority to get to a credible feature. We clearly need > to fix the remaining SQL-spec types (varchar and bpchar, mainly). > At the other extreme, likely nobody would weep if we never fixed > int2vectorin, for instance. > > I'm a little concerned about the cost-benefit of fixing the reg* types. > The ones that accept type names actually use the core grammar to parse > those. Now, we probably could fix the grammar to be non-throwing, but > it'd be very invasive and I'm not sure about the performance impact. > It might be best to content ourselves with soft reporting of lookup > failures, as opposed to syntax problems. > Regards, Amul From 4c4c18bd8104114351ca58a73a9d92fbb0c85dd2 Mon Sep 17 00:00:00 2001 From: Amul Sul Date: Tue, 13 Dec 2022 17:29:04 +0530 Subject: [PATCH v1 13/14] Change macaddr8_in to allow non-throw error reporting --- src/backend/utils/adt/mac8.c | 36 +- src/test/regress/expected/macaddr8.out | 25 ++ src/test/regress/sql/macaddr8.sql | 6 + 3 files changed, 49 insertions(+), 18 deletions(-) diff --git a/src/backend/utils/adt/mac8.c b/src/backend/utils/adt/mac8.c index 24d219f6386..adb253a6ac0 100644 --- a/src/backend/utils/adt/mac8.c +++ b/src/backend/utils/adt/mac8.c @@ -35,7 +35,8 @@ #define lobits(addr) \ ((unsigned long)(((addr)->e<<24) | ((addr)->f<<16) | ((addr)->g<<8) | ((addr)->h))) -static unsigned char hex2_to_uchar(const unsigned char *ptr, const unsigned char *str); +static unsigned char hex2_to_uchar(const unsigned char *ptr, const unsigned char *str, + Node *escontext); static const signed char hexlookup[128] = { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, @@ -58,7 +59,8 @@ static const signed char hexlookup[128] = { * the entire string, which is used only for error reporting. */ static inline unsigned char -hex2_to_uchar(const unsigned char *ptr, const unsigned char *str) +hex2_to_uchar(const unsigned char *ptr, const unsigned char *str, + Node *escontext) { unsigned char ret = 0; signed char lookup; @@ -88,13 +90,10 @@ hex2_to_uchar(const unsigned char *ptr, const unsigned char *str) return ret; invalid_input: - ereport(ERROR, + ereturn(escontext, 0, (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg("invalid input syntax for type %s: \"%s\"", "macaddr8", str))); - - /* We do not actually reach here */ - return 0; } /* @@ -104,6 +103,7 @@ Datum macaddr8_in(PG_FUNCTION_ARGS) { const unsigned char *str = (unsigned char *) PG_GETARG_CSTRING(0); + Node *escontext = fcinfo->context; const unsigned char *ptr = str; macaddr8 *result; unsigned char a = 0, @@ -136,32 +136,32 @@ macaddr8_in(PG_FUNCTION_ARGS) switch (count) { case 1: -a = hex2_to_uchar(ptr, str); +a = hex2_to_uchar(ptr, str, escontext); break; case 2: -b = hex2_to_uchar(ptr, str); +b = hex2_to_uchar(ptr, str, escontext); break; case 3: -c = hex2_to_uchar(ptr, str); +c = hex2_to_uchar(ptr, str, escontext); break; case 4: -d = hex2_to_uchar(ptr, str); +d = hex2_to_uchar(ptr, str, escontext); break; case 5: -e = hex2_to_uchar(ptr, str); +e = hex2_to_uchar(ptr, str, escontext); break; case 6: -f = hex2_to_uchar(ptr, str); +f = hex2_to_uchar(ptr, str, escontext); break; case 7: -g = hex2_to_uchar(ptr, str); +g = hex2_to_uchar(ptr, str, escontext); break; case 8: -h = hex2_to_uchar(ptr, str); +h = hex2_to_uchar(ptr, str,
Re: Error-safe user functions
Andres Freund writes: > On 2022-12-11 12:24:11 -0500, Tom Lane wrote: >> I spent some time looking at this, and was discouraged to conclude >> that the notational mess would probably be substantially out of >> proportion to the value. The main problem is that we'd have to change >> the API of pushJsonbValue, which has more than 150 call sites, most >> of which would need to grow a new test for failure return. Maybe >> somebody will feel like tackling that at some point, but not me today. > Could we address this more minimally by putting the error state into the > JsonbParseState and add a check for that error state to convertToJsonb() or > such (by passing in the JsonbParseState)? We'd need to return immediately in > pushJsonbValue() if there's already an error, but that that's not too bad. We could shoehorn error state into the JsonbParseState, although the fact that that stack normally starts out empty is a bit of a problem. I think you'd have to push a dummy entry if you want soft errors, store the error state pointer into that, and have pushState() copy down the parent's error pointer. Kind of ugly, but do-able. Whether it's better than replacing that argument with a pointer-to-struct- that-includes-the-stack-and-the-error-pointer wasn't real clear to me. What seemed like a mess was getting the calling code to quit early. I'm not convinced that just putting an immediate exit into pushJsonbValue would be enough, because the callers tend to assume a series of calls will behave as they expect. Probably some of the call sites could ignore the issue, but you'd still end with a lot of messy changes I fear. regards, tom lane
Re: Error-safe user functions
Hi, On 2022-12-11 12:24:11 -0500, Tom Lane wrote: > Andrew Dunstan writes: > > On 2022-12-10 Sa 14:38, Tom Lane wrote: > >> I have not done anything here about errors within JsonbValueToJsonb. > >> There would need to be another round of API-extension in that area > >> if we want to be able to trap its errors. As you say, those are mostly > >> about exceeding implementation size limits, so I suppose one could argue > >> that they are not so different from palloc failure. It's still annoying. > >> If people are good with the changes attached, I might take a look at > >> that. > > > Awesome. > > I spent some time looking at this, and was discouraged to conclude > that the notational mess would probably be substantially out of > proportion to the value. The main problem is that we'd have to change > the API of pushJsonbValue, which has more than 150 call sites, most > of which would need to grow a new test for failure return. Maybe > somebody will feel like tackling that at some point, but not me today. Could we address this more minimally by putting the error state into the JsonbParseState and add a check for that error state to convertToJsonb() or such (by passing in the JsonbParseState)? We'd need to return immediately in pushJsonbValue() if there's already an error, but that that's not too bad. Greetings, Andres Freund
Re: Error-safe user functions
Andrew Dunstan writes: > Maybe as we work through the remaining input functions (there are about > 60 core candidates left on my list) we should mark them with a comment > if no adjustment is needed. I did a quick pass through them last night. Assuming that we don't need to touch the unimplemented input functions (eg for pseudotypes), I count these core functions as still needing work: aclitemin bit_in box_in bpcharin byteain cash_in cidin cidr_in circle_in inet_in int2vectorin jsonpath_in line_in lseg_in macaddr8_in macaddr_in multirange_in namein oidin oidvectorin path_in pg_lsn_in pg_snapshot_in point_in poly_in range_in regclassin regcollationin regconfigin regdictionaryin regnamespacein regoperatorin regoperin regprocedurein regprocin regrolein regtypein tidin tsqueryin tsvectorin uuid_in varbit_in varcharin xid8in xidin xml_in and these contrib functions: hstore: hstore_in intarray: bqarr_in isn: ean13_in isbn_in ismn_in issn_in upc_in ltree: ltree_in lquery_in ltxtq_in seg: seg_in Maybe we should have a conversation about which of these are highest priority to get to a credible feature. We clearly need to fix the remaining SQL-spec types (varchar and bpchar, mainly). At the other extreme, likely nobody would weep if we never fixed int2vectorin, for instance. I'm a little concerned about the cost-benefit of fixing the reg* types. The ones that accept type names actually use the core grammar to parse those. Now, we probably could fix the grammar to be non-throwing, but it'd be very invasive and I'm not sure about the performance impact. It might be best to content ourselves with soft reporting of lookup failures, as opposed to syntax problems. regards, tom lane
Re: Error-safe user functions
On 2022-12-11 Su 12:24, Tom Lane wrote: > Andrew Dunstan writes: >> On 2022-12-10 Sa 14:38, Tom Lane wrote: >>> I have not done anything here about errors within JsonbValueToJsonb. >>> There would need to be another round of API-extension in that area >>> if we want to be able to trap its errors. As you say, those are mostly >>> about exceeding implementation size limits, so I suppose one could argue >>> that they are not so different from palloc failure. It's still annoying. >>> If people are good with the changes attached, I might take a look at >>> that. >> Awesome. > I spent some time looking at this, and was discouraged to conclude > that the notational mess would probably be substantially out of > proportion to the value. The main problem is that we'd have to change > the API of pushJsonbValue, which has more than 150 call sites, most > of which would need to grow a new test for failure return. Maybe > somebody will feel like tackling that at some point, but not me today. > > Yes, I had similar feelings when I looked at it. I don't think this needs to hold up proceeding with the SQL/JSON rework, which I think can reasonably restart now. Maybe as we work through the remaining input functions (there are about 60 core candidates left on my list) we should mark them with a comment if no adjustment is needed. I'm going to look at jsonpath and the text types next, I somewhat tied up this week but might get to relook at pushJsonbValue later in the month. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Error-safe user functions
Andrew Dunstan writes: > On 2022-12-10 Sa 14:38, Tom Lane wrote: >> I have not done anything here about errors within JsonbValueToJsonb. >> There would need to be another round of API-extension in that area >> if we want to be able to trap its errors. As you say, those are mostly >> about exceeding implementation size limits, so I suppose one could argue >> that they are not so different from palloc failure. It's still annoying. >> If people are good with the changes attached, I might take a look at >> that. > Awesome. I spent some time looking at this, and was discouraged to conclude that the notational mess would probably be substantially out of proportion to the value. The main problem is that we'd have to change the API of pushJsonbValue, which has more than 150 call sites, most of which would need to grow a new test for failure return. Maybe somebody will feel like tackling that at some point, but not me today. regards, tom lane
Re: Error-safe user functions
On 2022-12-10 Sa 19:00, Tom Lane wrote: > Looking at my notes, there's one other loose end > I forgot to mention: > > * Note: pg_unicode_to_server() will throw an error for a > * conversion failure, rather than returning a failure > * indication. That seems OK. > > We ought to do something about that, but I'm not sure how hard we > ought to work at it. Perhaps it's sufficient to make a variant of > pg_unicode_to_server that just returns true/false instead of failing, > and add a JsonParseErrorType for "untranslatable character" to let > json_errdetail return a reasonably on-point message. Seems reasonable. > We could imagine > extending the ErrorSaveContext infrastructure into the encoding > conversion modules, and maybe at some point that'll be worth doing, > but in this particular context it doesn't seem like we'd be getting > a very much better error message. The main thing that we would get > from such an extension is a chance to capture the report from > report_untranslatable_char. But what that adds is the ability to > identify exactly which character couldn't be translated --- and in > this use-case there's always just one. > > Yeah, probably overkill for now. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Error-safe user functions
Andrew Dunstan writes: > On 2022-12-10 Sa 14:38, Tom Lane wrote: >> Seeing that SQL/JSON is one of the major drivers of this whole project, >> it seemed a little sad to me that jsonb couldn't manage to implement >> what is required. So I spent a bit of time poking at it. Attached >> is an extended version of your patch that also covers jsonb. > Many thanks for doing this, it looks good. Cool, thanks. Looking at my notes, there's one other loose end I forgot to mention: * Note: pg_unicode_to_server() will throw an error for a * conversion failure, rather than returning a failure * indication. That seems OK. We ought to do something about that, but I'm not sure how hard we ought to work at it. Perhaps it's sufficient to make a variant of pg_unicode_to_server that just returns true/false instead of failing, and add a JsonParseErrorType for "untranslatable character" to let json_errdetail return a reasonably on-point message. We could imagine extending the ErrorSaveContext infrastructure into the encoding conversion modules, and maybe at some point that'll be worth doing, but in this particular context it doesn't seem like we'd be getting a very much better error message. The main thing that we would get from such an extension is a chance to capture the report from report_untranslatable_char. But what that adds is the ability to identify exactly which character couldn't be translated --- and in this use-case there's always just one. regards, tom lane
Re: Error-safe user functions
On 2022-12-10 Sa 14:38, Tom Lane wrote: > Andrew Dunstan writes: >> OK, json is a fairly easy case, see attached. But jsonb is a different >> kettle of fish. Both the semantic routines called by the parser and the >> subsequent call to JsonbValueToJsonb() can raise errors. These are >> pretty much all about breaking various limits (for strings, objects, >> arrays). There's also a call to numeric_in, but I assume that a string >> that's already parsed as a valid json numeric literal won't upset >> numeric_in. > Um, nope ... > > regression=# select '1e100'::jsonb; > ERROR: value overflows numeric format > LINE 1: select '1e100'::jsonb; >^ Oops, yeah. >> Many of these occur several calls down the stack, so >> adjusting everything to deal with them would be fairly invasive. Perhaps >> we could instead document that this class of input error won't be >> trapped, at least for jsonb. > Seeing that SQL/JSON is one of the major drivers of this whole project, > it seemed a little sad to me that jsonb couldn't manage to implement > what is required. So I spent a bit of time poking at it. Attached > is an extended version of your patch that also covers jsonb. > > The main thing I soon realized is that the JsonSemAction API is based > on the assumption that semantic actions will report errors by throwing > them. This is a bit schizophrenic considering the parser itself carefully > hands back error codes instead of throwing anything (excluding palloc > failures of course). What I propose in the attached is that we change > that API so that action functions return JsonParseErrorType, and add > an enum value denoting "I already logged a suitable error, so you don't > have to". It was a little tedious to modify all the existing functions > that way, but not hard. Only the ones used by jsonb_in need to do > anything except "return JSON_SUCCESS", at least for now. Many thanks for doing this, it looks good. > I have not done anything here about errors within JsonbValueToJsonb. > There would need to be another round of API-extension in that area > if we want to be able to trap its errors. As you say, those are mostly > about exceeding implementation size limits, so I suppose one could argue > that they are not so different from palloc failure. It's still annoying. > If people are good with the changes attached, I might take a look at > that. > > Awesome. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Error-safe user functions
Corey Huinker writes: > On Sat, Dec 10, 2022 at 9:20 AM Tom Lane wrote: >> A fallback we can offer to anyone with such a problem is "write a >> plpgsql function and wrap the potentially-failing bit in an exception >> block". Then they get to pay the cost of the subtransaction, while >> we're not imposing one on everybody else. > That exception block will prevent parallel plans. I'm not saying it isn't > the best way forward for us, but wanted to make that side effect clear. Hmm. Apropos of that, I notice that domain_in is marked PARALLEL SAFE, which seems like a bad idea if it could invoke not-so-parallel-safe expressions. Do we need to mark it less safe, and if so how much less? Anyway, assuming that people are okay with the Not Our Problem approach, the patch is pretty trivial, as attached. I started to write an addition to the CREATE DOMAIN man page recommending that domain CHECK constraints not throw errors, but couldn't get past the bare recommendation. Normally I'd want to explain such a thing along the lines of "For example, X won't work" ... but we don't yet have any committed features that depend on this. I'm inclined to leave it like that for now. If we don't remember to fix it once we do have some features, I'm sure somebody will ask a question about it eventually. regards, tom lane diff --git a/doc/src/sgml/ref/create_domain.sgml b/doc/src/sgml/ref/create_domain.sgml index 82a0b87492..73f9f28d6c 100644 --- a/doc/src/sgml/ref/create_domain.sgml +++ b/doc/src/sgml/ref/create_domain.sgml @@ -239,6 +239,11 @@ INSERT INTO tab (domcol) VALUES ((SELECT domcol FROM tab WHERE false)); DOMAIN), adjust the function definition, and re-add the constraint, thereby rechecking it against stored data. + + + It's also good practice to ensure that domain CHECK + expressions will not throw errors. + diff --git a/src/backend/utils/adt/domains.c b/src/backend/utils/adt/domains.c index 3de0cb01a2..99aeaddb5d 100644 --- a/src/backend/utils/adt/domains.c +++ b/src/backend/utils/adt/domains.c @@ -126,9 +126,14 @@ domain_state_setup(Oid domainType, bool binary, MemoryContext mcxt) * This is roughly similar to the handling of CoerceToDomain nodes in * execExpr*.c, but we execute each constraint separately, rather than * compiling them in-line within a larger expression. + * + * If escontext points to an ErrorStateContext, any failures are reported + * there, otherwise they are ereport'ed. Note that we do not attempt to do + * soft reporting of errors raised during execution of CHECK constraints. */ static void -domain_check_input(Datum value, bool isnull, DomainIOData *my_extra) +domain_check_input(Datum value, bool isnull, DomainIOData *my_extra, + Node *escontext) { ExprContext *econtext = my_extra->econtext; ListCell *l; @@ -144,11 +149,14 @@ domain_check_input(Datum value, bool isnull, DomainIOData *my_extra) { case DOM_CONSTRAINT_NOTNULL: if (isnull) - ereport(ERROR, +{ + errsave(escontext, (errcode(ERRCODE_NOT_NULL_VIOLATION), errmsg("domain %s does not allow null values", format_type_be(my_extra->domain_type)), errdatatype(my_extra->domain_type))); + goto fail; +} break; case DOM_CONSTRAINT_CHECK: { @@ -179,13 +187,16 @@ domain_check_input(Datum value, bool isnull, DomainIOData *my_extra) econtext->domainValue_isNull = isnull; if (!ExecCheck(con->check_exprstate, econtext)) - ereport(ERROR, + { + errsave(escontext, (errcode(ERRCODE_CHECK_VIOLATION), errmsg("value for domain %s violates check constraint \"%s\"", format_type_be(my_extra->domain_type), con->name), errdomainconstraint(my_extra->domain_type, con->name))); + goto fail; + } break; } default: @@ -200,6 +211,7 @@ domain_check_input(Datum value, bool isnull, DomainIOData *my_extra) * per-tuple memory. This avoids leaking non-memory resources, if * anything in the expression(s) has any. */ +fail: if (econtext) ReScanExprContext(econtext); } @@ -213,6 +225,7 @@ domain_in(PG_FUNCTION_ARGS) { char *string; Oid domainType; + Node *escontext = fcinfo->context; DomainIOData *my_extra; Datum value; @@ -245,15 +258,18 @@ domain_in(PG_FUNCTION_ARGS) /* * Invoke the base type's typinput procedure to convert the data. */ - value = InputFunctionCall(_extra->proc, - string, - my_extra->typioparam, - my_extra->typtypmod); + if (!InputFunctionCallSafe(_extra->proc, + string, + my_extra->typioparam, + my_extra->typtypmod, + escontext, + )) + PG_RETURN_NULL(); /* * Do the necessary checks to ensure it's a valid domain value. */ - domain_check_input(value, (string == NULL), my_extra); + domain_check_input(value, (string == NULL), my_extra, escontext); if (string ==
Re: Error-safe user functions
Andrew Dunstan writes: > OK, json is a fairly easy case, see attached. But jsonb is a different > kettle of fish. Both the semantic routines called by the parser and the > subsequent call to JsonbValueToJsonb() can raise errors. These are > pretty much all about breaking various limits (for strings, objects, > arrays). There's also a call to numeric_in, but I assume that a string > that's already parsed as a valid json numeric literal won't upset > numeric_in. Um, nope ... regression=# select '1e100'::jsonb; ERROR: value overflows numeric format LINE 1: select '1e100'::jsonb; ^ > Many of these occur several calls down the stack, so > adjusting everything to deal with them would be fairly invasive. Perhaps > we could instead document that this class of input error won't be > trapped, at least for jsonb. Seeing that SQL/JSON is one of the major drivers of this whole project, it seemed a little sad to me that jsonb couldn't manage to implement what is required. So I spent a bit of time poking at it. Attached is an extended version of your patch that also covers jsonb. The main thing I soon realized is that the JsonSemAction API is based on the assumption that semantic actions will report errors by throwing them. This is a bit schizophrenic considering the parser itself carefully hands back error codes instead of throwing anything (excluding palloc failures of course). What I propose in the attached is that we change that API so that action functions return JsonParseErrorType, and add an enum value denoting "I already logged a suitable error, so you don't have to". It was a little tedious to modify all the existing functions that way, but not hard. Only the ones used by jsonb_in need to do anything except "return JSON_SUCCESS", at least for now. (I wonder if pg_verifybackup's parse_manifest.c could use a second look at how it's handling errors, given this API. I didn't study it closely.) I have not done anything here about errors within JsonbValueToJsonb. There would need to be another round of API-extension in that area if we want to be able to trap its errors. As you say, those are mostly about exceeding implementation size limits, so I suppose one could argue that they are not so different from palloc failure. It's still annoying. If people are good with the changes attached, I might take a look at that. regards, tom lane diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c index fee2ffb55c..e6896eccfe 100644 --- a/src/backend/utils/adt/json.c +++ b/src/backend/utils/adt/json.c @@ -81,9 +81,10 @@ json_in(PG_FUNCTION_ARGS) /* validate it */ lex = makeJsonLexContext(result, false); - pg_parse_json_or_ereport(lex, ); + if (!pg_parse_json_or_errsave(lex, , fcinfo->context)) + PG_RETURN_NULL(); - /* Internal representation is the same as text, for now */ + /* Internal representation is the same as text */ PG_RETURN_TEXT_P(result); } @@ -1337,7 +1338,7 @@ json_typeof(PG_FUNCTION_ARGS) /* Lex exactly one token from the input and check its type. */ result = json_lex(lex); if (result != JSON_SUCCESS) - json_ereport_error(result, lex); + json_errsave_error(result, lex, NULL); tok = lex->token_type; switch (tok) { diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c index 9e14922ec2..7c1e5e6144 100644 --- a/src/backend/utils/adt/jsonb.c +++ b/src/backend/utils/adt/jsonb.c @@ -33,6 +33,7 @@ typedef struct JsonbInState { JsonbParseState *parseState; JsonbValue *res; + Node *escontext; } JsonbInState; /* unlike with json categories, we need to treat json and jsonb differently */ @@ -61,15 +62,15 @@ typedef struct JsonbAggState Oid val_output_func; } JsonbAggState; -static inline Datum jsonb_from_cstring(char *json, int len); -static size_t checkStringLen(size_t len); -static void jsonb_in_object_start(void *pstate); -static void jsonb_in_object_end(void *pstate); -static void jsonb_in_array_start(void *pstate); -static void jsonb_in_array_end(void *pstate); -static void jsonb_in_object_field_start(void *pstate, char *fname, bool isnull); +static inline Datum jsonb_from_cstring(char *json, int len, Node *escontext); +static bool checkStringLen(size_t len, Node *escontext); +static JsonParseErrorType jsonb_in_object_start(void *pstate); +static JsonParseErrorType jsonb_in_object_end(void *pstate); +static JsonParseErrorType jsonb_in_array_start(void *pstate); +static JsonParseErrorType jsonb_in_array_end(void *pstate); +static JsonParseErrorType jsonb_in_object_field_start(void *pstate, char *fname, bool isnull); static void jsonb_put_escaped_value(StringInfo out, JsonbValue *scalarVal); -static void jsonb_in_scalar(void *pstate, char *token, JsonTokenType tokentype); +static JsonParseErrorType jsonb_in_scalar(void *pstate, char *token, JsonTokenType tokentype); static void jsonb_categorize_type(Oid typoid, JsonbTypeCategory *tcategory, Oid
Re: Error-safe user functions
On Sat, Dec 10, 2022 at 9:20 AM Tom Lane wrote: > Alvaro Herrera writes: > > On 2022-Dec-09, Tom Lane wrote: > >> ... So I think it might be > >> okay to say "if you want soft error treatment for a domain, > >> make sure its check constraints don't throw errors". > > > I think that's fine. If the user does, say "CHECK (value > 0)" and that > > results in a soft error, that seems to me enough support for now. If > > they want to do something more elaborate, they can write C functions. > > Maybe eventually we'll want to offer some other mechanism that doesn't > > require C, but let's figure out what the requirements are. I don't > > think we know that, at this point. > > A fallback we can offer to anyone with such a problem is "write a > plpgsql function and wrap the potentially-failing bit in an exception > block". Then they get to pay the cost of the subtransaction, while > we're not imposing one on everybody else. > > regards, tom lane > That exception block will prevent parallel plans. I'm not saying it isn't the best way forward for us, but wanted to make that side effect clear.
Re: Error-safe user functions
so 10. 12. 2022 v 15:35 odesílatel Andrew Dunstan napsal: > > On 2022-12-09 Fr 10:37, Andrew Dunstan wrote: > > I am currently looking at the json types. I think that will be enough to > > let us rework the sql/json patches as discussed a couple of months ago. > > > > OK, json is a fairly easy case, see attached. But jsonb is a different > kettle of fish. Both the semantic routines called by the parser and the > subsequent call to JsonbValueToJsonb() can raise errors. These are > pretty much all about breaking various limits (for strings, objects, > arrays). There's also a call to numeric_in, but I assume that a string > that's already parsed as a valid json numeric literal won't upset > numeric_in. Many of these occur several calls down the stack, so > adjusting everything to deal with them would be fairly invasive. Perhaps > we could instead document that this class of input error won't be > trapped, at least for jsonb. We could still test for well-formed jsonb > input, just as I propose for json. That means that we would not be able > to trap one of these errors in the ON ERROR clause of JSON_TABLE. I > think we can probably live with that. > > Thoughts? > +1 Pavel > > > cheers > > > andrew > > > -- > Andrew Dunstan > EDB: https://www.enterprisedb.com >
Re: Error-safe user functions
On 2022-12-09 Fr 10:37, Andrew Dunstan wrote: > I am currently looking at the json types. I think that will be enough to > let us rework the sql/json patches as discussed a couple of months ago. > OK, json is a fairly easy case, see attached. But jsonb is a different kettle of fish. Both the semantic routines called by the parser and the subsequent call to JsonbValueToJsonb() can raise errors. These are pretty much all about breaking various limits (for strings, objects, arrays). There's also a call to numeric_in, but I assume that a string that's already parsed as a valid json numeric literal won't upset numeric_in. Many of these occur several calls down the stack, so adjusting everything to deal with them would be fairly invasive. Perhaps we could instead document that this class of input error won't be trapped, at least for jsonb. We could still test for well-formed jsonb input, just as I propose for json. That means that we would not be able to trap one of these errors in the ON ERROR clause of JSON_TABLE. I think we can probably live with that. Thoughts? cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com From 945ede5fa99f5aa9fb0740fe04303f37aa511528 Mon Sep 17 00:00:00 2001 From: Andrew Dunstan Date: Sat, 10 Dec 2022 08:18:57 -0500 Subject: [PATCH] adjustments for json_in --- src/backend/utils/adt/json.c | 7 --- src/backend/utils/adt/jsonfuncs.c | 26 +- src/include/utils/jsonfuncs.h | 14 +- src/test/regress/expected/json.out | 19 +++ src/test/regress/sql/json.sql | 5 + 5 files changed, 54 insertions(+), 17 deletions(-) diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c index fee2ffb55c..d5c48c99c3 100644 --- a/src/backend/utils/adt/json.c +++ b/src/backend/utils/adt/json.c @@ -81,9 +81,10 @@ json_in(PG_FUNCTION_ARGS) /* validate it */ lex = makeJsonLexContext(result, false); - pg_parse_json_or_ereport(lex, ); + if ( ! pg_parse_json_or_errsave(lex, , fcinfo->context)) + PG_RETURN_NULL(); - /* Internal representation is the same as text, for now */ + /* Internal representation is the same as text */ PG_RETURN_TEXT_P(result); } @@ -1337,7 +1338,7 @@ json_typeof(PG_FUNCTION_ARGS) /* Lex exactly one token from the input and check its type. */ result = json_lex(lex); if (result != JSON_SUCCESS) - json_ereport_error(result, lex); + json_errsave_error(result, lex, NULL); tok = lex->token_type; switch (tok) { diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c index bfc3f02a86..1a3cec59cb 100644 --- a/src/backend/utils/adt/jsonfuncs.c +++ b/src/backend/utils/adt/jsonfuncs.c @@ -490,21 +490,28 @@ static void transform_string_values_object_field_start(void *state, char *fname, static void transform_string_values_array_element_start(void *state, bool isnull); static void transform_string_values_scalar(void *state, char *token, JsonTokenType tokentype); + /* - * pg_parse_json_or_ereport + * pg_parse_json_or_errsave * * This function is like pg_parse_json, except that it does not return a * JsonParseErrorType. Instead, in case of any failure, this function will - * ereport(ERROR). + * call errsave(escontext), which will call ereport(ERROR) if escontext is NULL. + * Otherwise, returns a boolean indicating success or failure. */ -void -pg_parse_json_or_ereport(JsonLexContext *lex, JsonSemAction *sem) +bool +pg_parse_json_or_errsave(JsonLexContext *lex, JsonSemAction *sem, + Node *escontext) { JsonParseErrorType result; result = pg_parse_json(lex, sem); if (result != JSON_SUCCESS) - json_ereport_error(result, lex); + { + json_errsave_error(result, lex, escontext); + return false; + } + return true; } /* @@ -608,17 +615,18 @@ jsonb_object_keys(PG_FUNCTION_ARGS) * Report a JSON error. */ void -json_ereport_error(JsonParseErrorType error, JsonLexContext *lex) +json_errsave_error(JsonParseErrorType error, JsonLexContext *lex, + Node *escontext) { if (error == JSON_UNICODE_HIGH_ESCAPE || error == JSON_UNICODE_CODE_POINT_ZERO) - ereport(ERROR, + errsave(escontext, (errcode(ERRCODE_UNTRANSLATABLE_CHARACTER), errmsg("unsupported Unicode escape sequence"), errdetail_internal("%s", json_errdetail(error, lex)), report_json_context(lex))); else - ereport(ERROR, + errsave(escontext, (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg("invalid input syntax for type %s", "json"), errdetail_internal("%s", json_errdetail(error, lex)), @@ -1260,7 +1268,7 @@ get_array_start(void *state) error = json_count_array_elements(_state->lex, ); if (error != JSON_SUCCESS) -json_ereport_error(error, _state->lex); +json_errsave_error(error, _state->lex, NULL); if (-_state->path_indexes[lex_level] <= nelements) _state->path_indexes[lex_level] += nelements; diff --git a/src/include/utils/jsonfuncs.h b/src/include/utils/jsonfuncs.h
Re: Error-safe user functions
Alvaro Herrera writes: > On 2022-Dec-09, Tom Lane wrote: >> ... So I think it might be >> okay to say "if you want soft error treatment for a domain, >> make sure its check constraints don't throw errors". > I think that's fine. If the user does, say "CHECK (value > 0)" and that > results in a soft error, that seems to me enough support for now. If > they want to do something more elaborate, they can write C functions. > Maybe eventually we'll want to offer some other mechanism that doesn't > require C, but let's figure out what the requirements are. I don't > think we know that, at this point. A fallback we can offer to anyone with such a problem is "write a plpgsql function and wrap the potentially-failing bit in an exception block". Then they get to pay the cost of the subtransaction, while we're not imposing one on everybody else. regards, tom lane
Re: Error-safe user functions
On 2022-Dec-09, Tom Lane wrote: > I think though that it might be okay to just define this as > Not Our Problem. Although we don't seem to try to enforce it, > non-immutable domain check constraints are strongly deprecated > (the CREATE DOMAIN man page says that we assume immutability). > And not throwing errors is something that we usually consider > should ride along with immutability. So I think it might be > okay to say "if you want soft error treatment for a domain, > make sure its check constraints don't throw errors". I think that's fine. If the user does, say "CHECK (value > 0)" and that results in a soft error, that seems to me enough support for now. If they want to do something more elaborate, they can write C functions. Maybe eventually we'll want to offer some other mechanism that doesn't require C, but let's figure out what the requirements are. I don't think we know that, at this point. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "Estoy de acuerdo contigo en que la verdad absoluta no existe... El problema es que la mentira sí existe y tu estás mintiendo" (G. Lama)
Re: Error-safe user functions
Andrew Dunstan writes: > On 2022-12-09 Fr 10:16, Tom Lane wrote: >> As I said, I'll take a look at the datetime area. Do we >> have any volunteers for other input functions? > I am currently looking at the json types. I think that will be enough to > let us rework the sql/json patches as discussed a couple of months ago. Cool. I've finished up what I wanted to do with the datetime code. It occurred to me that we're going to have a bit of a problem with domain_in. We can certainly make it pass back any soft errors from the underlying type's input function, and we can make it return a soft error if a domain constraint evaluates to false. However, what happens if some function in a check constraint throws an error? Our only hope of trapping that, given that it's a general user-defined expression, would be a subtransaction. Which is exactly what we don't want here. I think though that it might be okay to just define this as Not Our Problem. Although we don't seem to try to enforce it, non-immutable domain check constraints are strongly deprecated (the CREATE DOMAIN man page says that we assume immutability). And not throwing errors is something that we usually consider should ride along with immutability. So I think it might be okay to say "if you want soft error treatment for a domain, make sure its check constraints don't throw errors". Thoughts? regards, tom lane
Re: Error-safe user functions
On Fri, Dec 9, 2022 at 11:17 AM Amul Sul wrote: > On Fri, Dec 9, 2022 at 9:08 PM Andrew Dunstan wrote: > > > > > > On 2022-12-09 Fr 10:16, Tom Lane wrote: > > > Andrew Dunstan writes: > > >> On 2022-12-08 Th 21:59, Tom Lane wrote: > > >>> Yeah, I was planning to take a look at that before walking away from > > >>> this stuff. (I'm sure not volunteering to convert ALL the input > > >>> functions, but I'll do the datetime code.) > > >> Awesome. Perhaps if there are no more comments you can commit what you > > >> currently have so people can start work on other input functions. > > > Pushed. > > > > > > Great! > > > > > > > As I said, I'll take a look at the datetime area. Do we > > > have any volunteers for other input functions? > > > > > > > > > > > > I am currently looking at the json types. I think that will be enough to > > let us rework the sql/json patches as discussed a couple of months ago. > > > > I will pick a few other input functions, thanks. > > Regards, > Amul > I can do a few as well, as I need them done for the CAST With Default effort. Amul, please let me know which ones you pick so we don't duplicate work.
Re: Error-safe user functions
On Fri, Dec 9, 2022 at 9:08 PM Andrew Dunstan wrote: > > > On 2022-12-09 Fr 10:16, Tom Lane wrote: > > Andrew Dunstan writes: > >> On 2022-12-08 Th 21:59, Tom Lane wrote: > >>> Yeah, I was planning to take a look at that before walking away from > >>> this stuff. (I'm sure not volunteering to convert ALL the input > >>> functions, but I'll do the datetime code.) > >> Awesome. Perhaps if there are no more comments you can commit what you > >> currently have so people can start work on other input functions. > > Pushed. > > > Great! > > > > As I said, I'll take a look at the datetime area. Do we > > have any volunteers for other input functions? > > > > > > > I am currently looking at the json types. I think that will be enough to > let us rework the sql/json patches as discussed a couple of months ago. > I will pick a few other input functions, thanks. Regards, Amul
Re: Error-safe user functions
On 2022-12-09 Fr 10:16, Tom Lane wrote: > Andrew Dunstan writes: >> On 2022-12-08 Th 21:59, Tom Lane wrote: >>> Yeah, I was planning to take a look at that before walking away from >>> this stuff. (I'm sure not volunteering to convert ALL the input >>> functions, but I'll do the datetime code.) >> Awesome. Perhaps if there are no more comments you can commit what you >> currently have so people can start work on other input functions. > Pushed. Great! > As I said, I'll take a look at the datetime area. Do we > have any volunteers for other input functions? > > I am currently looking at the json types. I think that will be enough to let us rework the sql/json patches as discussed a couple of months ago. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Error-safe user functions
Andrew Dunstan writes: > On 2022-12-08 Th 21:59, Tom Lane wrote: >> Yeah, I was planning to take a look at that before walking away from >> this stuff. (I'm sure not volunteering to convert ALL the input >> functions, but I'll do the datetime code.) > Awesome. Perhaps if there are no more comments you can commit what you > currently have so people can start work on other input functions. Pushed. As I said, I'll take a look at the datetime area. Do we have any volunteers for other input functions? regards, tom lane
Re: Error-safe user functions
On 2022-12-08 Th 21:59, Tom Lane wrote: > Andres Freund writes: >> On 2022-12-08 17:57:09 -0500, Tom Lane wrote: >>> Given that this additional experimentation didn't find any holes >>> in the API design, I think this is pretty much ready to go. >> One interesting area is timestamp / datetime related code. There's been some >> past efforts in the area, mostly in 5bc450629b3. See the RETURN_ERROR macro >> in >> formatting.c. >> This is not directly about type input functions, but it looks to me that the >> functionality in the patchset should work. > Yeah, I was planning to take a look at that before walking away from > this stuff. (I'm sure not volunteering to convert ALL the input > functions, but I'll do the datetime code.) > Awesome. Perhaps if there are no more comments you can commit what you currently have so people can start work on other input functions. Thanks for your work on this. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Error-safe user functions
Andres Freund writes: > On 2022-12-08 17:57:09 -0500, Tom Lane wrote: >> Given that this additional experimentation didn't find any holes >> in the API design, I think this is pretty much ready to go. > One interesting area is timestamp / datetime related code. There's been some > past efforts in the area, mostly in 5bc450629b3. See the RETURN_ERROR macro in > formatting.c. > This is not directly about type input functions, but it looks to me that the > functionality in the patchset should work. Yeah, I was planning to take a look at that before walking away from this stuff. (I'm sure not volunteering to convert ALL the input functions, but I'll do the datetime code.) You're right that formatting.c is doing stuff that's not exactly an input function, but I don't see why we can't apply the same API concepts to it. regards, tom lane
Re: Error-safe user functions
Hi, On 2022-12-08 17:57:09 -0500, Tom Lane wrote: > Given that this additional experimentation didn't find any holes > in the API design, I think this is pretty much ready to go. One interesting area is timestamp / datetime related code. There's been some past efforts in the area, mostly in 5bc450629b3. See the RETURN_ERROR macro in formatting.c. This is not directly about type input functions, but it looks to me that the functionality in the patchset should work. I certainly have the hope that it'll make the code look a bit less ugly... It looks like a fair bit of work to convert this code, so I don't think we should tie converting formatting.c to the patchset. But it might be a good idea for Tom to skim the code to see whether there's any things impacting the design. Greetings, Andres Freund
Re: Error-safe user functions
On 2022-12-08 Th 17:57, Tom Lane wrote: > Andres Freund writes: >> On 2022-12-08 16:00:10 -0500, Robert Haas wrote: >>> Yes, I think just putting "struct Node;" in as many places as >>> necessary is the way to go. Or even: >> +1 > OK, here's a v5 that does it like that. > > I've spent a little time pushing ahead on other input functions, > and realized that my original plan to require a pre-encoded typmod > for these test functions was not very user-friendly. So in v5 > you can write something like > > pg_input_is_valid('1234.567', 'numeric(7,4)') > > 0004 attached finishes up the remaining core numeric datatypes > (int*, float*, numeric). I ripped out float8in_internal_opt_error > in favor of a function that uses the new APIs. Great, that takes care of some of the relatively urgent work. > > 0005 converts contrib/cube, which I chose to tackle partly because > I'd already touched it in 0004, partly because it seemed like a > good idea to verify that extension modules wouldn't have any > problems with this apprach, and partly because I wondered whether > an input function that uses a Bison/Flex parser would have big > problems getting converted. This one didn't, anyway. Cool > > Given that this additional experimentation didn't find any holes > in the API design, I think this is pretty much ready to go. > > I will look in more detail tomorrow, but it LGTM on a quick look. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Error-safe user functions
Andres Freund writes: > On 2022-12-08 16:00:10 -0500, Robert Haas wrote: >> Yes, I think just putting "struct Node;" in as many places as >> necessary is the way to go. Or even: > +1 OK, here's a v5 that does it like that. I've spent a little time pushing ahead on other input functions, and realized that my original plan to require a pre-encoded typmod for these test functions was not very user-friendly. So in v5 you can write something like pg_input_is_valid('1234.567', 'numeric(7,4)') 0004 attached finishes up the remaining core numeric datatypes (int*, float*, numeric). I ripped out float8in_internal_opt_error in favor of a function that uses the new APIs. 0005 converts contrib/cube, which I chose to tackle partly because I'd already touched it in 0004, partly because it seemed like a good idea to verify that extension modules wouldn't have any problems with this apprach, and partly because I wondered whether an input function that uses a Bison/Flex parser would have big problems getting converted. This one didn't, anyway. Given that this additional experimentation didn't find any holes in the API design, I think this is pretty much ready to go. regards, tom lane diff --git a/doc/src/sgml/ref/create_type.sgml b/doc/src/sgml/ref/create_type.sgml index 693423e524..994dfc6526 100644 --- a/doc/src/sgml/ref/create_type.sgml +++ b/doc/src/sgml/ref/create_type.sgml @@ -900,6 +900,17 @@ CREATE TYPE name function is written in C. + + In PostgreSQL version 16 and later, + it is desirable for base types' input functions to + return soft errors using the + new errsave()/ereturn() + mechanism, rather than throwing ereport() + exceptions as in previous versions. + See src/backend/utils/fmgr/README for more + information. + + diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile index 4368c30fdb..7c594be583 100644 --- a/src/backend/nodes/Makefile +++ b/src/backend/nodes/Makefile @@ -56,6 +56,7 @@ node_headers = \ nodes/bitmapset.h \ nodes/extensible.h \ nodes/lockoptions.h \ + nodes/miscnodes.h \ nodes/replnodes.h \ nodes/supportnodes.h \ nodes/value.h \ diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl index 7212bc486f..08992dfd47 100644 --- a/src/backend/nodes/gen_node_support.pl +++ b/src/backend/nodes/gen_node_support.pl @@ -68,6 +68,7 @@ my @all_input_files = qw( nodes/bitmapset.h nodes/extensible.h nodes/lockoptions.h + nodes/miscnodes.h nodes/replnodes.h nodes/supportnodes.h nodes/value.h @@ -89,6 +90,7 @@ my @nodetag_only_files = qw( executor/tuptable.h foreign/fdwapi.h nodes/lockoptions.h + nodes/miscnodes.h nodes/replnodes.h nodes/supportnodes.h ); diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c index f5cd1b7493..eb489ea3a7 100644 --- a/src/backend/utils/error/elog.c +++ b/src/backend/utils/error/elog.c @@ -71,6 +71,7 @@ #include "libpq/libpq.h" #include "libpq/pqformat.h" #include "mb/pg_wchar.h" +#include "nodes/miscnodes.h" #include "miscadmin.h" #include "pgstat.h" #include "postmaster/bgworker.h" @@ -611,6 +612,128 @@ errfinish(const char *filename, int lineno, const char *funcname) CHECK_FOR_INTERRUPTS(); } + +/* + * errsave_start --- begin a "soft" error-reporting cycle + * + * If "context" isn't an ErrorSaveContext node, this behaves as + * errstart(ERROR, domain), and the errsave() macro ends up acting + * exactly like ereport(ERROR, ...). + * + * If "context" is an ErrorSaveContext node, but the node creator only wants + * notification of the fact of a soft error without any details, we just set + * the error_occurred flag in the ErrorSaveContext node and return false, + * which will cause us to skip the remaining error processing steps. + * + * Otherwise, create and initialize error stack entry and return true. + * Subsequently, errmsg() and perhaps other routines will be called to further + * populate the stack entry. Finally, errsave_finish() will be called to + * tidy up. + */ +bool +errsave_start(struct Node *context, const char *domain) +{ + ErrorSaveContext *escontext; + ErrorData *edata; + + /* + * Do we have a context for soft error reporting? If not, just punt to + * errstart(). + */ + if (context == NULL || !IsA(context, ErrorSaveContext)) + return errstart(ERROR, domain); + + /* Report that a soft error was detected */ + escontext = (ErrorSaveContext *) context; + escontext->error_occurred = true; + + /* Nothing else to do if caller wants no further details */ + if (!escontext->details_wanted) + return false; + + /* + * Okay, crank up a stack entry to store the info in. + */ + + recursion_depth++; + + /* Initialize data for this error frame */ + edata = get_error_stack_entry(); + edata->elevel = LOG; /* signal all is well to errsave_finish */ + set_stack_entry_domain(edata, domain); + /* Select default errcode based on the assumed elevel of ERROR */ +
Re: Error-safe user functions
Hi, On 2022-12-08 16:00:10 -0500, Robert Haas wrote: > On Thu, Dec 8, 2022 at 11:32 AM Tom Lane wrote: > > If we go with "struct Node *" then we can solve such problems by > > just repeating "struct Node;" forward-declarations in as many > > headers as we have to. > > Yes, I think just putting "struct Node;" in as many places as > necessary is the way to go. Or even: +1 > struct Node; > typedef struct Node Node; That doesn't work well, because C99 doesn't allow typedefs to be redeclared in the same scope. IIRC C11 added suppport for it, and a lot of compilers already supported it before. Greetings, Andres Freund
Re: Error-safe user functions
On Thu, Dec 8, 2022 at 11:32 AM Tom Lane wrote: > If we go with "struct Node *" then we can solve such problems by > just repeating "struct Node;" forward-declarations in as many > headers as we have to. Yes, I think just putting "struct Node;" in as many places as necessary is the way to go. Or even: struct Node; typedef struct Node Node; which I think then allows for Node * to be used later. A small problem with typedef struct Something *SomethingElse is that it can get hard to keep track of whether some identifier is a pointer to a struct or just a struct. This doesn't bother me as much as it does some other hackers, from what I gather anyway, but I think we should be pretty judicious in using typedef that way. "SomethingPtr" really has no advantage over "Something *". It is neither shorter nor clearer. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Error-safe user functions
Andres Freund writes: > On 2022-12-07 17:32:21 -0500, Tom Lane wrote: >> +typedef struct Node *NodePtr; > Seems like it'd be easier to just forward declare the struct, and use the > non-typedef'ed name in the header than to have to deal with these > interdependencies and the differing typenames. I've been having second thoughts about how to handle this issue. As we convert more and more datatypes, references to "Node *" are going to be needed in assorted headers that don't currently have any reason to #include nodes.h. Rather than bloating their include footprints, we'll want to use the alternate spelling, whichever it is. (I already had to do this in array.h.) Some of these headers might be things that are also read by frontend compiles, in which case they won't have access to elog.h either, so that NodePtr in this formulation won't work for them. (I ran into a variant of that with an early draft of this patch series.) If we stick with NodePtr we'll probably end by putting that typedef into c.h so that it's accessible in frontend as well as backend. I don't have a huge problem with that, but I concede it's a little ugly. If we go with "struct Node *" then we can solve such problems by just repeating "struct Node;" forward-declarations in as many headers as we have to. This is a bit ugly too, but maybe less so, and it's a method we use elsewhere. The main downside I can see to it is that we will probably not find out all the places where we need such declarations until we get field complaints that "header X doesn't compile for me". elog.h will have a struct Node declaration, and that will be visible in every backend compilation we do as well as every cpluspluscheck/headerscheck test. Another notational point I'm wondering about is whether we want to create hundreds of direct references to fcinfo->context. Is it time to invent #define PG_GET_CONTEXT()(fcinfo->context) and write that instead in all these input functions? Thoughts? regards, tom lane
Re: Error-safe user functions
On 2022-Dec-07, David G. Johnston wrote: > Are you suggesting we should not go down the path that v8-0003 does in the > monitoring section cleanup thread? I find the usability of Chapter 54 > System Views to be superior to these two run-on chapters and would rather > we emulate it in both these places - for what is in the end very little > additional effort, all mechanical in nature. I think the new 9.26 is much better now than what we had there two days ago. Maybe it would be even better with your proposed changes, but let's see what you come up with. As for Chapter 54, while it's a lot better than what we had previously, I have a complaint about the new presentation: the overview table appears (at least in the HTML presentation) in a separate page from the initial page of the chapter. So to get the intended table of contents I have to move forward from the unintended table of contents (i.e. from https://www.postgresql.org/docs/devel/views.html forward to https://www.postgresql.org/docs/devel/views-overview.html ). This seems pointless. I think it would be better if we just removed the line , which would put that table in the "front page". I also have an issue with Chapter 28, more precisely 28.2.2, where we have a similar TOC-style tables (Tables 28.1 and 28.2), but these ones seem inferior to the new table in Chapter 54 in that the outgoing links are in random positions in the text of the table. It would be better to put those in a column of their own, so that they are all vertically aligned and easier to spot/click. Not sure if you've been here already. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "En las profundidades de nuestro inconsciente hay una obsesiva necesidad de un universo lógico y coherente. Pero el universo real se halla siempre un paso más allá de la lógica" (Irulan)
Re: Error-safe user functions
On Wed, Dec 7, 2022 at 12:17 PM Tom Lane wrote: > Corey Huinker writes: > > In my attempt to implement CAST...DEFAULT, I noticed that I immediately > > needed an > > OidInputFunctionCallSafe, which was trivial but maybe something we want > to > > add to the infra patch, but the comments around that function also > somewhat > > indicate that we might want to just do the work in-place and call > > InputFunctionCallSafe directly. Open to both ideas. > > I'm a bit skeptical of that. IMO using OidInputFunctionCall is only > appropriate in places that will be executed just once per query. > That is what's happening when the expr of the existing CAST ( expr AS typename ) is a constant and we want to just resolve the constant at parse time. > >
Re: Error-safe user functions
Andres Freund writes: > I wonder if there are potential use-cases for levels other than ERROR. I can > potentially see us wanting to defer some FATALs, e.g. when they occur in > process exit hooks. I thought about that early on, and concluded not. The whole thing is moot for levels less than ERROR, of course, and I'm having a hard time seeing how it could be useful for FATAL or PANIC. Maybe I just lack imagination, but if a call is specifying FATAL rather than just ERROR then it seems to me it's already a special snowflake rather than something we could fold into a generic non-error behavior. > For a moment I was worried that it could lead to odd behaviour that we don't > do get_error_stack_entry() when !details_wanted, due to not raising an error > we'd otherwise raise. But that's a should-never-be-reached case, so ... I don't see how. Returning false out of errsave_start causes the errsave macro to immediately give control back to the caller, which will go on about its business. > It seems somewhat ugly transport this knowledge via edata->elevel, but it's > not too bad. The LOG-vs-ERROR business, you mean? Yeah. I considered adding another bool flag to ErrorData, but couldn't convince myself it was worth the trouble. If we find a problem we can do that sometime in future. >> +/* >> + * We cannot include nodes.h yet, so make a stub reference. (This is also >> + * used by fmgr.h, which doesn't want to depend on nodes.h either.) >> + */ >> +typedef struct Node *NodePtr; > Seems like it'd be easier to just forward declare the struct, and use the > non-typedef'ed name in the header than to have to deal with these > interdependencies and the differing typenames. Meh. I'm a little allergic to writing "struct foo *" in function argument lists, because I so often see gcc pointing out that if struct foo isn't yet known then that can silently mean something different than you intended. With the typedef, it either works or is an error, no halfway about it. And the struct way isn't really much better in terms of having two different notations to use rather than only one. > Perhaps worth noting here that the reason why the errsave_start/errsave_finish > split exist differs a bit from the reason in ereport_domain()? "Over there" > it's just about not wanting to incur overhead when the message isn't logged, > but here we'll always have >= ERROR, but ->details_wanted can still lead to > not wanting to incur the overhead. Hmmm ... it seems like the same reason to me, we don't want to incur the overhead if the "start" function says not to. > Is it ok that we throw an error in lookup_rowtype_tupdesc()? Yeah, that should fall in the category of internal errors I think. I don't see how you'd reach that from a bad input string. (Or to be more precise, the point of pg_input_is_valid is to tell you whether the input string is valid, not to tell you whether the type name is valid; if you're worried about the latter you need a separate and earlier test.) regards, tom lane
Re: Error-safe user functions
Hi, On 2022-12-07 17:32:21 -0500, Tom Lane wrote: > I already pushed the elog-refactoring patch, since that seemed > uncontroversial. 0001 attached covers the same territory as before, > but I regrouped the rest so that 0002 installs the new test support > functions, then 0003 adds both the per-datatype changes and > corresponding test cases for bool, int4, arrays, and records. > The idea here is that 0003 can be pointed to as a sample of what > has to be done to datatype input functions, while the preceding > patches can be cited as relevant documentation. (I've not decided > whether to squash 0001 and 0002 together or commit them separately. I think they make sense as is. > Does it make sense to break 0003 into 4 separate commits, or is > that overkill?) I think it'd be fine either way. > + * If "context" is an ErrorSaveContext node, but the node creator only wants > + * notification of the fact of a soft error without any details, just set > + * the error_occurred flag in the ErrorSaveContext node and return false, > + * which will cause us to skip the remaining error processing steps. > + * > + * Otherwise, create and initialize error stack entry and return true. > + * Subsequently, errmsg() and perhaps other routines will be called to > further > + * populate the stack entry. Finally, errsave_finish() will be called to > + * tidy up. > + */ > +bool > +errsave_start(NodePtr context, const char *domain) I wonder if there are potential use-cases for levels other than ERROR. I can potentially see us wanting to defer some FATALs, e.g. when they occur in process exit hooks. > +{ > + ErrorSaveContext *escontext; > + ErrorData *edata; > + > + /* > + * Do we have a context for soft error reporting? If not, just punt to > + * errstart(). > + */ > + if (context == NULL || !IsA(context, ErrorSaveContext)) > + return errstart(ERROR, domain); > + > + /* Report that a soft error was detected */ > + escontext = (ErrorSaveContext *) context; > + escontext->error_occurred = true; > + > + /* Nothing else to do if caller wants no further details */ > + if (!escontext->details_wanted) > + return false; > + > + /* > + * Okay, crank up a stack entry to store the info in. > + */ > + > + recursion_depth++; > + > + /* Initialize data for this error frame */ > + edata = get_error_stack_entry(); For a moment I was worried that it could lead to odd behaviour that we don't do get_error_stack_entry() when !details_wanted, due to not raising an error we'd otherwise raise. But that's a should-never-be-reached case, so ... > +/* > + * errsave_finish --- end a "soft" error-reporting cycle > + * > + * If errsave_start() decided this was a regular error, behave as > + * errfinish(). Otherwise, package up the error details and save > + * them in the ErrorSaveContext node. > + */ > +void > +errsave_finish(NodePtr context, const char *filename, int lineno, > +const char *funcname) > +{ > + ErrorSaveContext *escontext = (ErrorSaveContext *) context; > + ErrorData *edata = [errordata_stack_depth]; > + > + /* verify stack depth before accessing *edata */ > + CHECK_STACK_DEPTH(); > + > + /* > + * If errsave_start punted to errstart, then elevel will be ERROR or > + * perhaps even PANIC. Punt likewise to errfinish. > + */ > + if (edata->elevel >= ERROR) > + { > + errfinish(filename, lineno, funcname); > + pg_unreachable(); > + } It seems somewhat ugly transport this knowledge via edata->elevel, but it's not too bad. > +/* > + * We cannot include nodes.h yet, so make a stub reference. (This is also > + * used by fmgr.h, which doesn't want to depend on nodes.h either.) > + */ > +typedef struct Node *NodePtr; Seems like it'd be easier to just forward declare the struct, and use the non-typedef'ed name in the header than to have to deal with these interdependencies and the differing typenames. > +/*-- > + * Support for reporting "soft" errors that don't require a full transaction > + * abort to clean up. This is to be used in this way: > + * errsave(context, > + * errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), > + * errmsg("invalid input syntax for type %s: > \"%s\"", > + * "boolean", in_str), > + * ... other errxxx() fields as needed ...); > + * > + * "context" is a node pointer or NULL, and the remaining auxiliary calls > + * provide the same error details as in ereport(). If context is not a > + * pointer to an ErrorSaveContext node, then errsave(context, ...) > + * behaves identically to ereport(ERROR, ...). If context is a pointer > + * to an ErrorSaveContext node, then the information provided by the > + * auxiliary calls is stored in the context node and control returns > + *
Re: Error-safe user functions
Andrew Dunstan writes: > On 2022-12-07 We 17:32, Tom Lane wrote: >> Does it make sense to break 0003 into 4 separate commits, or is >> that overkill?) > No strong opinion about 0001 and 0002. I'm happy enough with them as > they are, but if you want to squash them that's ok. I wouldn't break up > 0003. I think we're going to end up committing the remaining work in > batches, although they would probably be a bit more thematically linked > than these. Yeah, we certainly aren't likely to do this work as one-commit-per-datatype going forward. I'm just wondering how to do these initial commits so that they provide good reference material. regards, tom lane
Re: Error-safe user functions
On 2022-12-07 We 17:32, Tom Lane wrote: > OK, here's a v4 that I think is possibly committable. > > I've changed all the comments and docs to use the "soft error" > terminology, but since using "soft" in the actual function names > didn't seem that appealing, they still use "safe". > > I already pushed the elog-refactoring patch, since that seemed > uncontroversial. 0001 attached covers the same territory as before, > but I regrouped the rest so that 0002 installs the new test support > functions, then 0003 adds both the per-datatype changes and > corresponding test cases for bool, int4, arrays, and records. > The idea here is that 0003 can be pointed to as a sample of what > has to be done to datatype input functions, while the preceding > patches can be cited as relevant documentation. (I've not decided > whether to squash 0001 and 0002 together or commit them separately. > Does it make sense to break 0003 into 4 separate commits, or is > that overkill?) > No strong opinion about 0001 and 0002. I'm happy enough with them as they are, but if you want to squash them that's ok. I wouldn't break up 0003. I think we're going to end up committing the remaining work in batches, although they would probably be a bit more thematically linked than these. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Error-safe user functions
OK, here's a v4 that I think is possibly committable. I've changed all the comments and docs to use the "soft error" terminology, but since using "soft" in the actual function names didn't seem that appealing, they still use "safe". I already pushed the elog-refactoring patch, since that seemed uncontroversial. 0001 attached covers the same territory as before, but I regrouped the rest so that 0002 installs the new test support functions, then 0003 adds both the per-datatype changes and corresponding test cases for bool, int4, arrays, and records. The idea here is that 0003 can be pointed to as a sample of what has to be done to datatype input functions, while the preceding patches can be cited as relevant documentation. (I've not decided whether to squash 0001 and 0002 together or commit them separately. Does it make sense to break 0003 into 4 separate commits, or is that overkill?) Thoughts? regards, tom lane diff --git a/doc/src/sgml/ref/create_type.sgml b/doc/src/sgml/ref/create_type.sgml index 693423e524..994dfc6526 100644 --- a/doc/src/sgml/ref/create_type.sgml +++ b/doc/src/sgml/ref/create_type.sgml @@ -900,6 +900,17 @@ CREATE TYPE name function is written in C. + + In PostgreSQL version 16 and later, + it is desirable for base types' input functions to + return soft errors using the + new errsave()/ereturn() + mechanism, rather than throwing ereport() + exceptions as in previous versions. + See src/backend/utils/fmgr/README for more + information. + + diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile index 4368c30fdb..7c594be583 100644 --- a/src/backend/nodes/Makefile +++ b/src/backend/nodes/Makefile @@ -56,6 +56,7 @@ node_headers = \ nodes/bitmapset.h \ nodes/extensible.h \ nodes/lockoptions.h \ + nodes/miscnodes.h \ nodes/replnodes.h \ nodes/supportnodes.h \ nodes/value.h \ diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl index 7212bc486f..08992dfd47 100644 --- a/src/backend/nodes/gen_node_support.pl +++ b/src/backend/nodes/gen_node_support.pl @@ -68,6 +68,7 @@ my @all_input_files = qw( nodes/bitmapset.h nodes/extensible.h nodes/lockoptions.h + nodes/miscnodes.h nodes/replnodes.h nodes/supportnodes.h nodes/value.h @@ -89,6 +90,7 @@ my @nodetag_only_files = qw( executor/tuptable.h foreign/fdwapi.h nodes/lockoptions.h + nodes/miscnodes.h nodes/replnodes.h nodes/supportnodes.h ); diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c index f5cd1b7493..a36aeb832e 100644 --- a/src/backend/utils/error/elog.c +++ b/src/backend/utils/error/elog.c @@ -71,6 +71,7 @@ #include "libpq/libpq.h" #include "libpq/pqformat.h" #include "mb/pg_wchar.h" +#include "nodes/miscnodes.h" #include "miscadmin.h" #include "pgstat.h" #include "postmaster/bgworker.h" @@ -611,6 +612,128 @@ errfinish(const char *filename, int lineno, const char *funcname) CHECK_FOR_INTERRUPTS(); } + +/* + * errsave_start --- begin a "soft" error-reporting cycle + * + * If "context" isn't an ErrorSaveContext node, this behaves as + * errstart(ERROR, domain), and the errsave() macro ends up acting + * exactly like ereport(ERROR, ...). + * + * If "context" is an ErrorSaveContext node, but the node creator only wants + * notification of the fact of a soft error without any details, just set + * the error_occurred flag in the ErrorSaveContext node and return false, + * which will cause us to skip the remaining error processing steps. + * + * Otherwise, create and initialize error stack entry and return true. + * Subsequently, errmsg() and perhaps other routines will be called to further + * populate the stack entry. Finally, errsave_finish() will be called to + * tidy up. + */ +bool +errsave_start(NodePtr context, const char *domain) +{ + ErrorSaveContext *escontext; + ErrorData *edata; + + /* + * Do we have a context for soft error reporting? If not, just punt to + * errstart(). + */ + if (context == NULL || !IsA(context, ErrorSaveContext)) + return errstart(ERROR, domain); + + /* Report that a soft error was detected */ + escontext = (ErrorSaveContext *) context; + escontext->error_occurred = true; + + /* Nothing else to do if caller wants no further details */ + if (!escontext->details_wanted) + return false; + + /* + * Okay, crank up a stack entry to store the info in. + */ + + recursion_depth++; + + /* Initialize data for this error frame */ + edata = get_error_stack_entry(); + edata->elevel = LOG; /* signal all is well to errsave_finish */ + set_stack_entry_domain(edata, domain); + /* Select default errcode based on the assumed elevel of ERROR */ + edata->sqlerrcode = ERRCODE_INTERNAL_ERROR; + + /* + * Any allocations for this error state level should go into the caller's + * context. We don't need to pollute ErrorContext, or even require it to + * exist, in this code path. + */ + edata->assoc_context =
Re: Error-safe user functions
"David G. Johnston" writes: > On Wed, Dec 7, 2022 at 8:04 AM Andrew Dunstan wrote: >> I'm not sure InputFunctionCallSoft would be an improvement. Maybe >> InputFunctionCallSoftError would be clearer, but I don't know that it's >> much of an improvement either. The same goes for the other visible changes. > InputFunctionCallSafe -> TryInputFunctionCall I think we are already using "TryXXX" for code that involves catching ereport errors. Since the whole point here is that we are NOT doing that, I think this naming would be more confusing than helpful. > Unrelated observation: "Although the error stack is not large, we don't > expect to run out of space." -> "Because the error stack is not large, > assume that we will not run out of space and panic if we are wrong."? That doesn't seem to make the point I wanted to make. I've adopted your other suggestions in the v4 I'm preparing now. regards, tom lane
Re: Error-safe user functions
I wrote: > Andres Freund writes: >> Is there a guarantee that input functions are stable or immutable? > There's a project policy that that should be true. That justifies > marking things like record_in as stable --- if the per-column input > functions could be volatile, record_in would need to be as well. > There are other dependencies on it; see e.g. aab353a60, 3db6524fe. I dug in the archives and found the thread leading up to aab353a60: https://www.postgresql.org/message-id/flat/AANLkTik8v7O9QR9jjHNVh62h-COC1B0FDUNmEYMdtKjR%40mail.gmail.com regards, tom lane
Re: Error-safe user functions
Andres Freund writes: > Is there a guarantee that input functions are stable or immutable? There's a project policy that that should be true. That justifies marking things like record_in as stable --- if the per-column input functions could be volatile, record_in would need to be as well. There are other dependencies on it; see e.g. aab353a60, 3db6524fe. > We don't > have any volatile input functions in core PG: Indeed, because type_sanity.sql checks that. regards, tom lane