Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions

2023-03-28 Thread Corey Huinker
On Tue, Mar 28, 2023 at 3:25 PM Isaac Morland 
wrote:

> On Mon, 19 Dec 2022 at 17:57, Corey Huinker 
> wrote:
>
>>
>> Attached is my work in progress to implement the changes to the CAST()
>> function as proposed by Vik Fearing.
>>
>> CAST(expr AS typename NULL ON ERROR)
>> will use error-safe functions to do the cast of expr, and will return
>> NULL if the cast fails.
>>
>> CAST(expr AS typename DEFAULT expr2 ON ERROR)
>> will use error-safe functions to do the cast of expr, and will return
>> expr2 if the cast fails.
>>
>
> Is there any difference between NULL and DEFAULT NULL?
>

What I think you're asking is: is there a difference between these two
statements:

SELECT CAST(my_string AS integer NULL ON ERROR) FROM my_table;


SELECT CAST(my_string AS integer DEFAULT NULL ON ERROR) FROM my_table;


And as I understand it, the answer would be no, there is no practical
difference. The first case is just a convenient shorthand, whereas the
second case tees you up for a potentially complex expression. Before you
ask, I believe the ON ERROR syntax could be made optional. As I implemented
it, both cases create a default expression which then typecast to integer,
and in both cases that expression would be a const-null, so the optimizer
steps would very quickly collapse those steps into a plain old constant.


Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions

2023-03-28 Thread Corey Huinker
On Tue, Mar 28, 2023 at 2:53 PM Gregory Stark (as CFM) 
wrote:

> On Tue, 3 Jan 2023 at 14:16, Tom Lane  wrote:
> >
> > Vik Fearing  writes:
> >
> > > I don't think we should add that syntax until I do get it through the
> > > committee, just in case they change something.
> >
> > Agreed.  So this is something we won't be able to put into v16;
> > it'll have to wait till there's something solid from the committee.
>
> I guess I'll mark this Rejected in the CF then. Who knows when the SQL
> committee will look at this...
>
> --
> Gregory Stark
> As Commitfest Manager
>

Yes, for now. I'm in touch with the pg-people on the committee and will
resume work when there's something to act upon.


Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions

2023-03-28 Thread Isaac Morland
On Mon, 19 Dec 2022 at 17:57, Corey Huinker  wrote:

>
> Attached is my work in progress to implement the changes to the CAST()
> function as proposed by Vik Fearing.
>
> CAST(expr AS typename NULL ON ERROR)
> will use error-safe functions to do the cast of expr, and will return
> NULL if the cast fails.
>
> CAST(expr AS typename DEFAULT expr2 ON ERROR)
> will use error-safe functions to do the cast of expr, and will return
> expr2 if the cast fails.
>

Is there any difference between NULL and DEFAULT NULL?


Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions

2023-03-28 Thread Gregory Stark (as CFM)
On Tue, 3 Jan 2023 at 14:16, Tom Lane  wrote:
>
> Vik Fearing  writes:
>
> > I don't think we should add that syntax until I do get it through the
> > committee, just in case they change something.
>
> Agreed.  So this is something we won't be able to put into v16;
> it'll have to wait till there's something solid from the committee.

I guess I'll mark this Rejected in the CF then. Who knows when the SQL
committee will look at this...

-- 
Gregory Stark
As Commitfest Manager




Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions

2023-01-03 Thread Tom Lane
Vik Fearing  writes:
> I have not posted my paper to the committee yet, but I plan to do so 
> before the working group's meeting early February.  Just like with 
> posting patches here, I cannot guarantee that it will get accepted but I 
> will be arguing for it.

> I don't think we should add that syntax until I do get it through the 
> committee, just in case they change something.

Agreed.  So this is something we won't be able to put into v16;
it'll have to wait till there's something solid from the committee.

regards, tom lane




Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions

2023-01-03 Thread Vik Fearing

On 1/3/23 19:14, Tom Lane wrote:

Corey Huinker  writes:

On Mon, Jan 2, 2023 at 10:57 AM Tom Lane  wrote:

While I approve of trying to get some functionality in this area,
I'm not sure that extending CAST is a great idea, because I'm afraid
that the SQL committee will do something that conflicts with it.



I'm going off the spec that Vik presented in
https://www.postgresql.org/message-id/f8600a3b-f697-2577-8fea-f40d3e18b...@postgresfriends.org
which is his effort to get it through the SQL committee.


I'm pretty certain that sending something to pgsql-hackers will have
exactly zero impact on the SQL committee.  Is there anything actually
submitted to the committee, and if so what's its status?


I have not posted my paper to the committee yet, but I plan to do so 
before the working group's meeting early February.  Just like with 
posting patches here, I cannot guarantee that it will get accepted but I 
will be arguing for it.


I don't think we should add that syntax until I do get it through the 
committee, just in case they change something.

--
Vik Fearing





Re: Error-safe user functions

2023-01-03 Thread Robert Haas
On Sun, Dec 25, 2022 at 12:13 PM Tom Lane  wrote:
> Here's a proposed patch for converting regprocin and friends
> to soft error reporting.  I'll say at the outset that it's an
> engineering compromise, and it may be worth going further in
> future.  But I doubt it's worth doing more than this for v16,
> because the next steps would be pretty invasive.

I don't know that I feel particularly good about converting some
errors to be reported softly and others not, especially since the
dividing line around which things fall into which category is pretty
much "well, whatever seemed hard we didn't convert". We could consider
hanging it to report everything as a hard error until we can convert
everything, but I'm not sure that's better.

On another note, I can't help noticing that all of these patches seem
to have been committed without any documentation changes. Maybe that's
because there's nothing user-visible that makes any use of these
features yet, but if that's true, then we probably ought to add
something so that the changes are testable. And having done that we
need to explain to users what the behavior actually is: that input
validation errors are trapped but other kinds of failures like out of
memory are not; that most core data types report all input validation
errors softly, and the exceptions; and that for non-core data types
the behavior depends on how the extension is coded. I think it's
really a mistake to suppose that users won't care about or don't need
to know these kinds of details. In my experience, that's just not
true.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions

2023-01-03 Thread Tom Lane
Corey Huinker  writes:
> On Mon, Jan 2, 2023 at 10:57 AM Tom Lane  wrote:
>> While I approve of trying to get some functionality in this area,
>> I'm not sure that extending CAST is a great idea, because I'm afraid
>> that the SQL committee will do something that conflicts with it.

> I'm going off the spec that Vik presented in
> https://www.postgresql.org/message-id/f8600a3b-f697-2577-8fea-f40d3e18b...@postgresfriends.org
> which is his effort to get it through the SQL committee.

I'm pretty certain that sending something to pgsql-hackers will have
exactly zero impact on the SQL committee.  Is there anything actually
submitted to the committee, and if so what's its status?

regards, tom lane




Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions

2023-01-03 Thread Corey Huinker
On Mon, Jan 2, 2023 at 10:57 AM Tom Lane  wrote:

> Corey Huinker  writes:
> > The proposed changes are as follows:
> > CAST(expr AS typename)
> > continues to behave as before.
> > CAST(expr AS typename ERROR ON ERROR)
> > has the identical behavior as the unadorned CAST() above.
> > CAST(expr AS typename NULL ON ERROR)
> > will use error-safe functions to do the cast of expr, and will return
> > NULL if the cast fails.
> > CAST(expr AS typename DEFAULT expr2 ON ERROR)
> > will use error-safe functions to do the cast of expr, and will return
> > expr2 if the cast fails.
>
> While I approve of trying to get some functionality in this area,
> I'm not sure that extending CAST is a great idea, because I'm afraid
> that the SQL committee will do something that conflicts with it.
> If we know that they are about to standardize exactly this syntax,
> where is that information available?  If we don't know that,
> I'd prefer to invent some kind of function or other instead of
> extending the grammar.
>
> regards, tom lane
>

I'm going off the spec that Vik presented in
https://www.postgresql.org/message-id/f8600a3b-f697-2577-8fea-f40d3e18b...@postgresfriends.org
which is his effort to get it through the SQL committee. I was
alreading thinking about how to get the SQLServer TRY_CAST() function into
postgres, so this seemed like the logical next step.

While the syntax may change, the underlying infrastructure would remain
basically the same: we would need the ability to detect that a typecast had
failed, and replace it with the default value, and handle that at parse
time, or executor time, and handle array casts where the array has the
default but the underlying elements can't.

It would be simple to move the grammar changes to their own patch if that
removes a barrier for people.


Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions

2023-01-03 Thread Andrew Dunstan


On 2023-01-02 Mo 10:57, Tom Lane wrote:
> Corey Huinker  writes:
>> The proposed changes are as follows:
>> CAST(expr AS typename)
>> continues to behave as before.
>> CAST(expr AS typename ERROR ON ERROR)
>> has the identical behavior as the unadorned CAST() above.
>> CAST(expr AS typename NULL ON ERROR)
>> will use error-safe functions to do the cast of expr, and will return
>> NULL if the cast fails.
>> CAST(expr AS typename DEFAULT expr2 ON ERROR)
>> will use error-safe functions to do the cast of expr, and will return
>> expr2 if the cast fails.
> While I approve of trying to get some functionality in this area,
> I'm not sure that extending CAST is a great idea, because I'm afraid
> that the SQL committee will do something that conflicts with it.
> If we know that they are about to standardize exactly this syntax,
> where is that information available?  If we don't know that,
> I'd prefer to invent some kind of function or other instead of
> extending the grammar.


+1


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions

2023-01-03 Thread vignesh C
On Tue, 20 Dec 2022 at 04:27, Corey Huinker  wrote:
>
>
> Attached is my work in progress to implement the changes to the CAST() 
> function as proposed by Vik Fearing.
>
> This work builds upon the Error-safe User Functions work currently ongoing.
>
> The proposed changes are as follows:
>
> CAST(expr AS typename)
> continues to behave as before.
>
> CAST(expr AS typename ERROR ON ERROR)
> has the identical behavior as the unadorned CAST() above.
>
> CAST(expr AS typename NULL ON ERROR)
> will use error-safe functions to do the cast of expr, and will return 
> NULL if the cast fails.
>
> CAST(expr AS typename DEFAULT expr2 ON ERROR)
> will use error-safe functions to do the cast of expr, and will return 
> expr2 if the cast fails.
>
> There is an additional FORMAT parameter that I have not yet implemented, my 
> understanding is that it is largely intended for DATE/TIME field conversions, 
> but others are certainly possible.
> CAST(expr AS typename FORMAT fmt DEFAULT expr2 ON ERROR)
>
> What is currently working:
> - Any scalar expression that can be evaluated at parse time. These tests from 
> cast.sql all currently work:
>
> VALUES (CAST('error' AS integer));
> VALUES (CAST('error' AS integer ERROR ON ERROR));
> VALUES (CAST('error' AS integer NULL ON ERROR));
> VALUES (CAST('error' AS integer DEFAULT 42 ON ERROR));
>
> SELECT CAST('{123,abc,456}' AS integer[] DEFAULT '{-789}' ON ERROR) as 
> array_test1;
>
> - Scalar values evaluated at runtime.
>
> CREATE TEMPORARY TABLE t(t text);
> INSERT INTO t VALUES ('a'), ('1'), ('b'), (2);
> SELECT CAST(t.t AS integer DEFAULT -1 ON ERROR) AS foo FROM t;
>  foo
> -
>   -1
>1
>   -1
>2
> (4 rows)
>
>
> Along the way, I made a few design decisions, each of which is up for debate:
>
> First, I created OidInputFunctionCallSafe, which is to OidInputFunctionCall 
> what InputFunctionCallSafe is to InputFunctionCall. Given that the only place 
> I ended up using it was stringTypeDatumSafe(), it may be possible to just 
> move that code inside stringTypeDatumSafe.
>
> Next, I had a need for FuncExpr, CoerceViaIO, and ArrayCoerce to all report 
> if their expr argument failed, and if not, just past the evaluation of expr2. 
> Rather than duplicate this logic in several places, I chose instead to modify 
> CoalesceExpr to allow for an error-test mode in addition to its default 
> null-test mode, and then to provide this altered node with two expressions, 
> the first being the error-safe typecast of expr and the second being the 
> non-error-safe typecast of expr2.
>
> I still don't have array-to-array casts working, as the changed I would 
> likely need to make to ArrayCoerce get somewhat invasive, so this seemed like 
> a good time to post my work so far and solicit some feedback beyond what I've 
> already been getting from Jeff Davis and Michael Paquier.
>
> I've sidestepped domains as well for the time being as well as avoiding JIT 
> issues entirely.
>
> No documentation is currently prepared. All but one of the regression test 
> queries work, the one that is currently failing is:
>
> SELECT CAST('{234,def,567}'::text[] AS integer[] DEFAULT '{-1011}' ON ERROR) 
> as array_test2;
>
> Other quirks:
> - an unaliased CAST ON DEFAULT will return the column name of "coalesce", 
> which internally is true, but obviously would be quite confusing to a user.
>
> As a side observation, I noticed that the optimizer already tries to resolve 
> expressions based on constants and to collapse expression trees where 
> possible, which makes me wonder if the work done to do the same in 
> transformTypeCast/ and coerce_to_target_type and coerce_type isn't also 
> wasted.

CFBot shows some compilation errors as in [1], please post an updated
version for the same:
[02:53:44.829] time make -s -j${BUILD_JOBS} world-bin
[02:55:41.164] llvmjit_expr.c: In function ‘llvm_compile_expr’:
[02:55:41.164] llvmjit_expr.c:928:6: error: ‘v_resnull’ undeclared
(first use in this function); did you mean ‘v_resnullp’?
[02:55:41.164] 928 | v_resnull = LLVMBuildLoad(b, v_reserrorp, "");
[02:55:41.164] | ^
[02:55:41.164] | v_resnullp
[02:55:41.164] llvmjit_expr.c:928:6: note: each undeclared identifier
is reported only once for each function it appears in
[02:55:41.164] llvmjit_expr.c:928:35: error: ‘v_reserrorp’ undeclared
(first use in this function); did you mean ‘v_reserror’?
[02:55:41.164] 928 | v_resnull = LLVMBuildLoad(b, v_reserrorp, "");
[02:55:41.164] | ^~~
[02:55:41.164] | v_reserror
[02:55:41.165] make[2]: *** [: llvmjit_expr.o] Error 1
[02:55:41.165] make[2]: *** Waiting for unfinished jobs
[02:55:45.495] make[1]: *** [Makefile:42: all-backend/jit/llvm-recurse] Error 2
[02:55:45.495] make: *** [GNUmakefile:21: world-bin-src-recurse] Error 2

[1] - https://cirrus-ci.com/task/6687753371385856?logs=gcc_warning#L448

Regards,
Vignesh




Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions

2023-01-02 Thread Tom Lane
Corey Huinker  writes:
> The proposed changes are as follows:
> CAST(expr AS typename)
> continues to behave as before.
> CAST(expr AS typename ERROR ON ERROR)
> has the identical behavior as the unadorned CAST() above.
> CAST(expr AS typename NULL ON ERROR)
> will use error-safe functions to do the cast of expr, and will return
> NULL if the cast fails.
> CAST(expr AS typename DEFAULT expr2 ON ERROR)
> will use error-safe functions to do the cast of expr, and will return
> expr2 if the cast fails.

While I approve of trying to get some functionality in this area,
I'm not sure that extending CAST is a great idea, because I'm afraid
that the SQL committee will do something that conflicts with it.
If we know that they are about to standardize exactly this syntax,
where is that information available?  If we don't know that,
I'd prefer to invent some kind of function or other instead of
extending the grammar.

regards, tom lane




Re: Error-safe user functions

2022-12-28 Thread Andrew Dunstan


On 2022-12-28 We 01:00, Amul Sul wrote:
> On Tue, Dec 27, 2022 at 11:17 PM Tom Lane  wrote:
>> Andrew Dunstan  writes:
>>> Here's a patch that covers the ltree and intarray contrib modules.
>> I would probably have done this a little differently --- I think
>> the added "res" parameters aren't really necessary for most of
>> these.  But it's not worth arguing over.
>>
> Also, it would be good if we can pass "escontext" through the "state"
> argument of makepool() like commit 78212f210114 done for makepol() of
> tsquery.c. Attached patch is the updated version that does the same.
>


Thanks, I have done both of these things. Looks like we're now done with
this task, thanks everybody.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Error-safe user functions

2022-12-27 Thread Amul Sul
On Tue, Dec 27, 2022 at 11:17 PM Tom Lane  wrote:
>
> Andrew Dunstan  writes:
> > Here's a patch that covers the ltree and intarray contrib modules.
>
> I would probably have done this a little differently --- I think
> the added "res" parameters aren't really necessary for most of
> these.  But it's not worth arguing over.
>

Also, it would be good if we can pass "escontext" through the "state"
argument of makepool() like commit 78212f210114 done for makepol() of
tsquery.c. Attached patch is the updated version that does the same.

Regards,
Amul


v2-ltree-intarray-error-safe.patch
Description: Binary data


Re: Error-safe user functions

2022-12-27 Thread Tom Lane
Andrew Dunstan  writes:
> On Dec 27, 2022, at 12:47 PM, Tom Lane  wrote:
>> Andrew Dunstan  writes:
>>> I think that would leave just hstore to be done.

>> Yeah, that matches my scoreboard.  Are you going to look at
>> hstore, or do you want me to?

> Go for it. 

Done.

regards, tom lane




Re: Error-safe user functions

2022-12-27 Thread Andrew Dunstan



> On Dec 27, 2022, at 12:47 PM, Tom Lane  wrote:
> 
> Andrew Dunstan  writes:
>> Here's a patch that covers the ltree and intarray contrib modules.
> 
> I would probably have done this a little differently --- I think
> the added "res" parameters aren't really necessary for most of
> these.  But it's not worth arguing over.

I’ll take another look 


> 
>> I think that would leave just hstore to be done.
> 
> Yeah, that matches my scoreboard.  Are you going to look at
> hstore, or do you want me to?
> 
>  

Go for it. 

Cheers

Andrew 



Re: Error-safe user functions

2022-12-27 Thread Tom Lane
Andrew Dunstan  writes:
> Here's a patch that covers the ltree and intarray contrib modules.

I would probably have done this a little differently --- I think
the added "res" parameters aren't really necessary for most of
these.  But it's not worth arguing over.

> I think that would leave just hstore to be done.

Yeah, that matches my scoreboard.  Are you going to look at
hstore, or do you want me to?

regards, tom lane




Re: Error-safe user functions

2022-12-27 Thread Andrew Dunstan


On 2022-12-26 Mo 18:00, Tom Lane wrote:
> I wrote:
>> (Perhaps we should go further than this, and convert all these
>> functions to just be DirectInputFunctionCallSafe wrappers
>> around the corresponding input functions?  That would save
>> some duplicative code, but I've not done it here.)
> I looked closer at that idea, and realized that it would do more than
> just save some code: it'd cause the to_regfoo functions to accept
> numeric OIDs, as they did not before (and are documented not to).
> It is unclear to me whether that inconsistency with the input
> functions is really desirable or not --- but I don't offhand see a
> good argument for it.  If we change this though, it should probably
> happen in a separate commit.  Accordingly, here's a delta patch
> doing that.
>
>   


+1 for doing this. The code simplification is nice too.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Error-safe user functions

2022-12-27 Thread Andrew Dunstan

On 2022-12-26 Mo 14:12, Andrew Dunstan wrote:
> On 2022-12-26 Mo 12:47, Tom Lane wrote:
>> Here's a proposed patch for making tsvectorin and tsqueryin
>> report errors softly.  We have to take the changes down a
>> couple of levels of subroutines, but it's not hugely difficult.
>
> Great!
>
>
>> With the other patches I've posted recently, this covers all
>> of the core datatype input functions.  There are still half
>> a dozen to tackle in contrib.
>>
>>  
>
> Yeah, I'm currently looking at those in ltree.
>
>

Here's a patch that covers the ltree and intarray contrib modules. I
think that would leave just hstore to be done.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/contrib/intarray/_int_bool.c b/contrib/intarray/_int_bool.c
index 3ed88af76d..77bbf77c43 100644
--- a/contrib/intarray/_int_bool.c
+++ b/contrib/intarray/_int_bool.c
@@ -149,8 +149,8 @@ pushquery(WORKSTATE *state, int32 type, int32 val)
 /*
  * make polish notation of query
  */
-static int32
-makepol(WORKSTATE *state)
+static bool
+makepol(WORKSTATE *state, int32 *res, struct Node *escontext)
 {
 	int32		val,
 type;
@@ -179,7 +179,7 @@ makepol(WORKSTATE *state)
 else
 {
 	if (lenstack == STACKDEPTH)
-		ereport(ERROR,
+		ereturn(escontext, false,
 (errcode(ERRCODE_STATEMENT_TOO_COMPLEX),
  errmsg("statement too complex")));
 	stack[lenstack] = val;
@@ -187,8 +187,10 @@ makepol(WORKSTATE *state)
 }
 break;
 			case OPEN:
-if (makepol(state) == ERR)
-	return ERR;
+if (!makepol(state, res, escontext))
+	return false;
+if (*res == ERR)
+	return true;
 while (lenstack && (stack[lenstack - 1] == (int32) '&' ||
 	stack[lenstack - 1] == (int32) '!'))
 {
@@ -202,14 +204,14 @@ makepol(WORKSTATE *state)
 	lenstack--;
 	pushquery(state, OPR, stack[lenstack]);
 };
-return END;
+*res = END;
+return true;
 break;
 			case ERR:
 			default:
-ereport(ERROR,
+ereturn(escontext, false,
 		(errcode(ERRCODE_SYNTAX_ERROR),
 		 errmsg("syntax error")));
-return ERR;
 		}
 	}
 
@@ -218,7 +220,8 @@ makepol(WORKSTATE *state)
 		lenstack--;
 		pushquery(state, OPR, stack[lenstack]);
 	};
-	return END;
+	*res = END;
+	return true;
 }
 
 typedef struct
@@ -483,6 +486,8 @@ bqarr_in(PG_FUNCTION_ARGS)
 	ITEM	   *ptr;
 	NODE	   *tmp;
 	int32		pos = 0;
+	int32   polres;
+	struct Node *escontext = fcinfo->context;
 
 #ifdef BS_DEBUG
 	StringInfoData pbuf;
@@ -495,14 +500,15 @@ bqarr_in(PG_FUNCTION_ARGS)
 	state.str = NULL;
 
 	/* make polish notation (postfix, but in reverse order) */
-	makepol();
+	if (!makepol(, , escontext))
+		PG_RETURN_NULL();
 	if (!state.num)
-		ereport(ERROR,
+		ereturn(escontext, (Datum) 0,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("empty query")));
 
 	if (state.num > QUERYTYPEMAXITEMS)
-		ereport(ERROR,
+		ereturn(escontext, (Datum) 0,
 (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
  errmsg("number of query items (%d) exceeds the maximum allowed (%d)",
 		state.num, (int) QUERYTYPEMAXITEMS)));
diff --git a/contrib/intarray/expected/_int.out b/contrib/intarray/expected/_int.out
index a09d40efa1..c953065a5c 100644
--- a/contrib/intarray/expected/_int.out
+++ b/contrib/intarray/expected/_int.out
@@ -398,6 +398,21 @@ SELECT '1&(2&(4&(5|!6)))'::query_int;
  1 & 2 & 4 & ( 5 | !6 )
 (1 row)
 
+-- test non-error-throwing input
+SELECT str as "query_int",
+   pg_input_is_valid(str,'query_int') as ok,
+   pg_input_error_message(str,'query_int') as errmsg
+FROM (VALUES ('1&(2&(4&(5|6)))'),
+ ('1#(2&(4&(5&6)))'),
+ ('foo'))
+  AS a(str);
+query_int| ok |errmsg
+-++--
+ 1&(2&(4&(5|6))) | t  | 
+ 1#(2&(4&(5&6))) | f  | syntax error
+ foo | f  | syntax error
+(3 rows)
+
 CREATE TABLE test__int( a int[] );
 \copy test__int from 'data/test__int.data'
 ANALYZE test__int;
diff --git a/contrib/intarray/sql/_int.sql b/contrib/intarray/sql/_int.sql
index b26fc57e4d..4c9ba4c1fb 100644
--- a/contrib/intarray/sql/_int.sql
+++ b/contrib/intarray/sql/_int.sql
@@ -75,6 +75,17 @@ SELECT '1&2&4&5&6'::query_int;
 SELECT '1&(2&(4&(5|6)))'::query_int;
 SELECT '1&(2&(4&(5|!6)))'::query_int;
 
+-- test non-error-throwing input
+
+SELECT str as "query_int",
+   pg_input_is_valid(str,'query_int') as ok,
+   pg_input_error_message(str,'query_int') as errmsg
+FROM (VALUES ('1&(2&(4&(5|6)))'),
+ ('1#(2&(4&(5&6)))'),
+ ('foo'))
+  AS a(str);
+
+
 
 CREATE TABLE test__int( a int[] );
 \copy test__int from 'data/test__int.data'
diff --git a/contrib/ltree/expected/ltree.out b/contrib/ltree/expected/ltree.out
index c6d8f3ef75..b95be71c78 100644
--- a/contrib/ltree/expected/ltree.out
+++ b/contrib/ltree/expected/ltree.out
@@ -8084,3 +8084,28 @@ SELECT count(*) FROM _ltreetest WHERE t ? '{23.*.1,23.*.2}' ;
 15
 (1 row)
 
+-- test 

Re: Error-safe user functions

2022-12-26 Thread Tom Lane
I wrote:
> (Perhaps we should go further than this, and convert all these
> functions to just be DirectInputFunctionCallSafe wrappers
> around the corresponding input functions?  That would save
> some duplicative code, but I've not done it here.)

I looked closer at that idea, and realized that it would do more than
just save some code: it'd cause the to_regfoo functions to accept
numeric OIDs, as they did not before (and are documented not to).
It is unclear to me whether that inconsistency with the input
functions is really desirable or not --- but I don't offhand see a
good argument for it.  If we change this though, it should probably
happen in a separate commit.  Accordingly, here's a delta patch
doing that.

regards, tom lane

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 836b9254fb..3bf8d021c3 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24100,8 +24100,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
 obtained by casting the string to type regclass (see
 ); however, this function will return
 NULL rather than throwing an error if the name is
-not found.  Also unlike the cast, this does not accept
-a numeric OID as input.
+not found.

   
 
@@ -24118,8 +24117,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
 obtained by casting the string to type regcollation (see
 ); however, this function will return
 NULL rather than throwing an error if the name is
-not found.  Also unlike the cast, this does not accept
-a numeric OID as input.
+not found.

   
 
@@ -24136,8 +24134,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
 obtained by casting the string to type regnamespace (see
 ); however, this function will return
 NULL rather than throwing an error if the name is
-not found.  Also unlike the cast, this does not accept
-a numeric OID as input.
+not found.

   
 
@@ -24154,8 +24151,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
 obtained by casting the string to type regoper (see
 ); however, this function will return
 NULL rather than throwing an error if the name is
-not found or is ambiguous.  Also unlike the cast, this does not accept
-a numeric OID as input.
+not found or is ambiguous.

   
 
@@ -24172,8 +24168,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
 obtained by casting the string to type regoperator (see
 ); however, this function will return
 NULL rather than throwing an error if the name is
-not found.  Also unlike the cast, this does not accept
-a numeric OID as input.
+not found.

   
 
@@ -24190,8 +24185,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
 obtained by casting the string to type regproc (see
 ); however, this function will return
 NULL rather than throwing an error if the name is
-not found or is ambiguous.  Also unlike the cast, this does not accept
-a numeric OID as input.
+not found or is ambiguous.

   
 
@@ -24208,8 +24202,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
 obtained by casting the string to type regprocedure (see
 ); however, this function will return
 NULL rather than throwing an error if the name is
-not found.  Also unlike the cast, this does not accept
-a numeric OID as input.
+not found.

   
 
@@ -24226,8 +24219,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
 obtained by casting the string to type regrole (see
 ); however, this function will return
 NULL rather than throwing an error if the name is
-not found.  Also unlike the cast, this does not accept
-a numeric OID as input.
+not found.

   
 
@@ -24244,8 +24236,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
 obtained by casting the string to type regtype (see
 ); however, this function will return
 NULL rather than throwing an error if the name is
-not found.  Also unlike the cast, this does not accept
-a numeric OID as input.
+not found.

   
  
diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index 14d76c856d..3635a94633 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -118,24 +118,15 @@ Datum
 to_regproc(PG_FUNCTION_ARGS)
 {
 	char	   *pro_name = text_to_cstring(PG_GETARG_TEXT_PP(0));
-	List	   *names;
-	FuncCandidateList clist;
+	Datum		result;
 	ErrorSaveContext escontext = {T_ErrorSaveContext};
 
-	/*
-	 * Parse the name into components and see if it matches any pg_proc
-	 * entries in the current search path.
-	 */
-	names = stringToQualifiedNameList(pro_name, (Node *) );
-	if (names == NIL)

Re: Error-safe user functions

2022-12-26 Thread Andrew Dunstan


On 2022-12-26 Mo 12:47, Tom Lane wrote:
> Here's a proposed patch for making tsvectorin and tsqueryin
> report errors softly.  We have to take the changes down a
> couple of levels of subroutines, but it's not hugely difficult.


Great!


>
> With the other patches I've posted recently, this covers all
> of the core datatype input functions.  There are still half
> a dozen to tackle in contrib.
>
>   


Yeah, I'm currently looking at those in ltree.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Error-safe user functions

2022-12-26 Thread Tom Lane
Here's a proposed patch for making tsvectorin and tsqueryin
report errors softly.  We have to take the changes down a
couple of levels of subroutines, but it's not hugely difficult.

A couple of points worthy of comment:

* To reduce API changes, I made the functions in
tsvector_parser.c and tsquery.c pass around the escontext pointer
in TSVectorParseState and TSQueryParserState respectively.
This is a little duplicative, but since those structs are private
within those files, there's no easy way to share the same
pointer except by adding it as a new parameter to all those
functions.  This also means that if any of the outside callers
of parse_tsquery (in to_tsany.c) wanted to do soft error handling
and wanted their custom PushFunctions to be able to report such
errors, they'd need to pass the escontext via their "opaque"
passthrough structs, making for yet a third copy.  Still,
I judged adding an extra parameter to dozens of functions wasn't
a better way.

* There are two places in tsquery parsing that emit nuisance
NOTICEs about empty queries.  I chose to suppress those when
soft error handling has been requested.  Maybe we should rethink
whether we want them at all?

With the other patches I've posted recently, this covers all
of the core datatype input functions.  There are still half
a dozen to tackle in contrib.

regards, tom lane

diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c
index edeffacc2d..c0030ddaec 100644
--- a/src/backend/tsearch/to_tsany.c
+++ b/src/backend/tsearch/to_tsany.c
@@ -594,7 +594,8 @@ to_tsquery_byid(PG_FUNCTION_ARGS)
 	query = parse_tsquery(text_to_cstring(in),
 		  pushval_morph,
 		  PointerGetDatum(),
-		  0);
+		  0,
+		  NULL);
 
 	PG_RETURN_TSQUERY(query);
 }
@@ -630,7 +631,8 @@ plainto_tsquery_byid(PG_FUNCTION_ARGS)
 	query = parse_tsquery(text_to_cstring(in),
 		  pushval_morph,
 		  PointerGetDatum(),
-		  P_TSQ_PLAIN);
+		  P_TSQ_PLAIN,
+		  NULL);
 
 	PG_RETURN_POINTER(query);
 }
@@ -667,7 +669,8 @@ phraseto_tsquery_byid(PG_FUNCTION_ARGS)
 	query = parse_tsquery(text_to_cstring(in),
 		  pushval_morph,
 		  PointerGetDatum(),
-		  P_TSQ_PLAIN);
+		  P_TSQ_PLAIN,
+		  NULL);
 
 	PG_RETURN_TSQUERY(query);
 }
@@ -704,7 +707,8 @@ websearch_to_tsquery_byid(PG_FUNCTION_ARGS)
 	query = parse_tsquery(text_to_cstring(in),
 		  pushval_morph,
 		  PointerGetDatum(),
-		  P_TSQ_WEB);
+		  P_TSQ_WEB,
+		  NULL);
 
 	PG_RETURN_TSQUERY(query);
 }
diff --git a/src/backend/utils/adt/tsquery.c b/src/backend/utils/adt/tsquery.c
index a206926042..1097294d55 100644
--- a/src/backend/utils/adt/tsquery.c
+++ b/src/backend/utils/adt/tsquery.c
@@ -16,6 +16,7 @@
 
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
+#include "nodes/miscnodes.h"
 #include "tsearch/ts_locale.h"
 #include "tsearch/ts_type.h"
 #include "tsearch/ts_utils.h"
@@ -58,10 +59,16 @@ typedef enum
 /*
  * get token from query string
  *
- * *operator is filled in with OP_* when return values is PT_OPR,
- * but *weight could contain a distance value in case of phrase operator.
- * *strval, *lenval and *weight are filled in when return value is PT_VAL
+ * All arguments except "state" are output arguments.
  *
+ * If return value is PT_OPR, then *operator is filled with an OP_* code
+ * and *weight will contain a distance value in case of phrase operator.
+ *
+ * If return value is PT_VAL, then *lenval, *strval, *weight, and *prefix
+ * are filled.
+ *
+ * If PT_ERR is returned then a soft error has occurred.  If state->escontext
+ * isn't already filled then this should be reported as a generic parse error.
  */
 typedef ts_tokentype (*ts_tokenizer) (TSQueryParserState state, int8 *operator,
 	  int *lenval, char **strval,
@@ -93,6 +100,9 @@ struct TSQueryParserStateData
 
 	/* state for value's parser */
 	TSVectorParseState valstate;
+
+	/* context object for soft errors - must match valstate's escontext */
+	Node	   *escontext;
 };
 
 /*
@@ -194,7 +204,7 @@ parse_phrase_operator(TSQueryParserState pstate, int16 *distance)
 if (ptr == endptr)
 	return false;
 else if (errno == ERANGE || l < 0 || l > MAXENTRYPOS)
-	ereport(ERROR,
+	ereturn(pstate->escontext, false,
 			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 			 errmsg("distance in phrase operator must be an integer value between zero and %d inclusive",
 	MAXENTRYPOS)));
@@ -301,10 +311,8 @@ gettoken_query_standard(TSQueryParserState state, int8 *operator,
 }
 else if (t_iseq(state->buf, ':'))
 {
-	ereport(ERROR,
-			(errcode(ERRCODE_SYNTAX_ERROR),
-			 errmsg("syntax error in tsquery: \"%s\"",
-	state->buffer)));
+	/* generic syntax error message is fine */
+	return PT_ERR;
 }
 else if (!t_isspace(state->buf))
 {
@@ -320,12 +328,17 @@ gettoken_query_standard(TSQueryParserState state, int8 *operator,
 		state->state = WAITOPERATOR;
 	

Re: Error-safe user functions

2022-12-26 Thread Andrew Dunstan


On 2022-12-25 Su 12:13, Tom Lane wrote:
> Robert Haas  writes:
>> On Fri, Dec 16, 2022 at 1:31 PM Tom Lane  wrote:
>>> The reg* functions probably need a unified plan as to how far
>>> down we want to push non-error behavior.
>> I would be in favor of an aggressive approach.
> Here's a proposed patch for converting regprocin and friends
> to soft error reporting.  I'll say at the outset that it's an
> engineering compromise, and it may be worth going further in
> future.  But I doubt it's worth doing more than this for v16,
> because the next steps would be pretty invasive.


It's a judgement call, but I'm not too fussed about stopping here for
v16. I see the reg* items as probably the lowest priority to fix.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Error-safe user functions

2022-12-25 Thread Tom Lane
I got annoyed by the fact that types cid, xid, xid8 don't throw
error even for obvious garbage, because they just believe the
result of strtoul or strtoull without any checking.  That was
probably up to project standards when cidin and xidin were
written; but surely it's not anymore, especially when we can
piggyback on work already done for type oid.

Anybody have an objection to the following?  One note is that
because we already had test cases checking that xid would
accept hex input, I made the common subroutines use "0" not
"10" for strtoul's last argument, meaning that oid will accept
hex now too.

regards, tom lane

diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 7cded73e6e..c67a79344a 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -477,6 +477,180 @@ invalid_syntax:
 	"bigint", s)));
 }
 
+/*
+ * Convert input string to an unsigned 32 bit integer.
+ *
+ * Allows any number of leading or trailing whitespace characters.
+ *
+ * If endloc isn't NULL, store a pointer to the rest of the string there,
+ * so that caller can parse the rest.  Otherwise, it's an error if anything
+ * but whitespace follows.
+ *
+ * typname is what is reported in error messges.
+ *
+ * If escontext points to an ErrorSaveContext node, that is filled instead
+ * of throwing an error; the caller must check SOFT_ERROR_OCCURRED()
+ * to detect errors.
+ */
+uint32
+uint32in_subr(const char *s, char **endloc,
+			  const char *typname, Node *escontext)
+{
+	uint32		result;
+	unsigned long cvt;
+	char	   *endptr;
+
+	/* Ensure that empty-input is handled consistently across platforms */
+	if (*s == '\0')
+		ereturn(escontext, 0,
+(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid input syntax for type %s: \"%s\"",
+		typname, s)));
+
+	errno = 0;
+	cvt = strtoul(s, , 0);
+
+	/*
+	 * strtoul() normally only sets ERANGE.  On some systems it also may set
+	 * EINVAL, which simply means it couldn't parse the input string. This is
+	 * handled by the second "if" consistent across platforms.
+	 */
+	if (errno && errno != ERANGE && errno != EINVAL)
+		ereturn(escontext, 0,
+(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid input syntax for type %s: \"%s\"",
+		typname, s)));
+
+	if (endptr == s)
+		ereturn(escontext, 0,
+(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid input syntax for type %s: \"%s\"",
+		typname, s)));
+
+	if (errno == ERANGE)
+		ereturn(escontext, 0,
+(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("value \"%s\" is out of range for type %s",
+		s, typname)));
+
+	if (endloc)
+	{
+		/* caller wants to deal with rest of string */
+		*endloc = endptr;
+	}
+	else
+	{
+		/* allow only whitespace after number */
+		while (*endptr && isspace((unsigned char) *endptr))
+			endptr++;
+		if (*endptr)
+			ereturn(escontext, 0,
+	(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+	 errmsg("invalid input syntax for type %s: \"%s\"",
+			typname, s)));
+	}
+
+	result = (uint32) cvt;
+
+	/*
+	 * Cope with possibility that unsigned long is wider than uint32, in which
+	 * case strtoul will not raise an error for some values that are out of
+	 * the range of uint32.
+	 *
+	 * For backwards compatibility, we want to accept inputs that are given
+	 * with a minus sign, so allow the input value if it matches after either
+	 * signed or unsigned extension to long.
+	 *
+	 * To ensure consistent results on 32-bit and 64-bit platforms, make sure
+	 * the error message is the same as if strtoul() had returned ERANGE.
+	 */
+#if PG_UINT32_MAX != ULONG_MAX
+	if (cvt != (unsigned long) result &&
+		cvt != (unsigned long) ((int) result))
+		ereturn(escontext, 0,
+(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("value \"%s\" is out of range for type %s",
+		s, typname)));
+#endif
+
+	return result;
+}
+
+/*
+ * Convert input string to an unsigned 64 bit integer.
+ *
+ * Allows any number of leading or trailing whitespace characters.
+ *
+ * If endloc isn't NULL, store a pointer to the rest of the string there,
+ * so that caller can parse the rest.  Otherwise, it's an error if anything
+ * but whitespace follows.
+ *
+ * typname is what is reported in error messges.
+ *
+ * If escontext points to an ErrorSaveContext node, that is filled instead
+ * of throwing an error; the caller must check SOFT_ERROR_OCCURRED()
+ * to detect errors.
+ */
+uint64
+uint64in_subr(const char *s, char **endloc,
+			  const char *typname, Node *escontext)
+{
+	uint64		result;
+	char	   *endptr;
+
+	/* Ensure that empty-input is handled consistently across platforms */
+	if (*s == '\0')
+		ereturn(escontext, 0,
+(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid input syntax for type %s: \"%s\"",
+		typname, s)));
+
+	errno = 0;
+	result = strtou64(s, , 0);
+
+	/*
+	 * strtoul[l]() normally only sets ERANGE.  On 

Re: Error-safe user functions

2022-12-25 Thread Tom Lane
Robert Haas  writes:
> On Fri, Dec 16, 2022 at 1:31 PM Tom Lane  wrote:
>> The reg* functions probably need a unified plan as to how far
>> down we want to push non-error behavior.

> I would be in favor of an aggressive approach.

Here's a proposed patch for converting regprocin and friends
to soft error reporting.  I'll say at the outset that it's an
engineering compromise, and it may be worth going further in
future.  But I doubt it's worth doing more than this for v16,
because the next steps would be pretty invasive.

I've converted all the errors thrown directly within regproc.c,
and also converted parseTypeString, typeStringToTypeName, and
stringToQualifiedNameList to report their own errors softly.
This affected some outside callers, but not so many of them
that I think it's worth inventing compatibility wrappers.

I dealt with lookup failures by just changing the input functions
to call the respective lookup functions with missing_ok = true,
and then throw their own error softly on failure.

Also, I've changed to_regproc() and friends to return NULL
in exactly the same cases that are now soft errors for the
input functions.  Previously they were a bit inconsistent
about what triggered hard errors vs. returning NULL.
(Perhaps we should go further than this, and convert all these
functions to just be DirectInputFunctionCallSafe wrappers
around the corresponding input functions?  That would save
some duplicative code, but I've not done it here.)

What's not fixed here:

1. As previously discussed, parse errors in type names are
thrown by the main grammar, so getting those to not be
hard errors seems like too big a lift for today.

2. Errors about invalid type modifiers (reported by
typenameTypeMod or type-specific typmodin routines) are not
trapped either.  Fixing this would require extending the
soft-error conventions to typmodin routines, which maybe will
be worth doing someday but it seems pretty far down the
priority list.  Specifying a typmod is surely not main-line
usage for regtypein.

3. Throwing our own error has the demerit that it might be
different from what the underlying lookup function would have
reported.  This is visible in some changes in existing
regression test cases, such as

-ERROR:  schema "ng_catalog" does not exist
+ERROR:  relation "ng_catalog.pg_class" does not exist

This isn't wrong, exactly, but the loss of specificity is
a bit annoying.

4. This still fails to trap errors about "too many dotted names"
and "cross-database references are not implemented", which are
thrown in DeconstructQualifiedName, LookupTypeName,
RangeVarGetRelid, and maybe some other places.

5. We also don't trap errors about "the schema exists but
you don't have USAGE permission to do a lookup in it",
because LookupExplicitNamespace still throws that even
when passed missing_ok = true.

The obvious way to fix #3,#4,#5 is to change pretty much all
of the catalog lookup infrastructure to deal in escontext
arguments instead of "missing_ok" booleans.  That might be
worth doing --- it'd have benefits beyond the immediate
problem, I think --- but I feel it's a bigger lift than we
want to undertake for v16.  It'd be better to spend the time
we have left for v16 on building features that use soft error
reporting than on refining corner cases in the reg* functions.

So I think we should stop more or less here, possibly after
changing the to_regfoo functions to be simple wrappers
around the soft input functions.

regards, tom lane

diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index 109bdfb33f..a1df8b1ddc 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -2182,7 +2182,7 @@ pg_get_object_address(PG_FUNCTION_ARGS)
 			ereport(ERROR,
 	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 	 errmsg("name or argument lists may not contain nulls")));
-		typename = typeStringToTypeName(TextDatumGetCString(elems[0]));
+		typename = typeStringToTypeName(TextDatumGetCString(elems[0]), NULL);
 	}
 	else if (type == OBJECT_LARGEOBJECT)
 	{
@@ -2238,7 +2238,8 @@ pg_get_object_address(PG_FUNCTION_ARGS)
 		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 		 errmsg("name or argument lists may not contain nulls")));
 			args = lappend(args,
-		   typeStringToTypeName(TextDatumGetCString(elems[i])));
+		   typeStringToTypeName(TextDatumGetCString(elems[i]),
+NULL));
 		}
 	}
 	else
diff --git a/src/backend/parser/parse_type.c b/src/backend/parser/parse_type.c
index f7ad689459..8f3850aa4e 100644
--- a/src/backend/parser/parse_type.c
+++ b/src/backend/parser/parse_type.c
@@ -727,10 +727,15 @@ pts_error_callback(void *arg)
  * Given a string that is supposed to be a SQL-compatible type declaration,
  * such as "int4" or "integer" or "character varying(32)", parse
  * the string and return the result as a TypeName.
- * If the string cannot be parsed as a type, an error is raised.
+ *
+ * If the string 

Re: Error-safe user functions

2022-12-24 Thread Andrew Dunstan


On 2022-12-24 Sa 10:42, Tom Lane wrote:
> Andrew Dunstan  writes:
>> As I was giving this a final polish I noticed this in jspConvertRegexFlags:
>>     /*
>>      * We'll never need sub-match details at execution.  While
>>      * RE_compile_and_execute would set this flag anyway, force it on here to
>>      * ensure that the regex cache entries created by makeItemLikeRegex are
>>      * useful.
>>      */
>>     cflags |= REG_NOSUB;
>> Clearly the comment would no longer be true. I guess I should just
>> remove this?
> Yeah, we can just drop that I guess.  I'm slightly worried that we might
> need it again after some future refactoring; but it's not really worth
> devising a re-worded comment to justify keeping it.
>
> Also, I realized that I failed in my reviewerly duty by not noticing
> that you'd forgotten to pg_regfree the regex after successful
> compilation.  Running something like this exposes the memory leak
> very quickly:
>
> select pg_input_is_valid('$ ? (@ like_regex "pattern" flag "smixq")', 
> 'jsonpath')
>   from generate_series(1,1000);
>
> The attached delta patch takes care of it.  (Per comment at pg_regcomp,
> we don't need this after a failure return.)
>
>   


Thanks, pushed with those changes.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Error-safe user functions

2022-12-24 Thread Tom Lane
Ted Yu  writes:
> On Fri, Dec 23, 2022 at 1:22 PM Tom Lane  wrote:
>> Yes, it is.  We don't want a query-cancel transformed into a soft error.

> The same regex, without user interruption, would exhibit an `invalid
> regular expression` error.

On what grounds do you claim that?  The timing of arrival of the SIGINT
is basically chance --- it might happen while we're inside backend/regex,
or not.  I mean, sure you could claim that a bad regex might run a long
time and thereby be more likely to cause the user to issue a query
cancel, but that's a stretched line of reasoning.

regards, tom lane




Re: Error-safe user functions

2022-12-24 Thread Tom Lane
Andrew Dunstan  writes:
> As I was giving this a final polish I noticed this in jspConvertRegexFlags:

>     /*
>      * We'll never need sub-match details at execution.  While
>      * RE_compile_and_execute would set this flag anyway, force it on here to
>      * ensure that the regex cache entries created by makeItemLikeRegex are
>      * useful.
>      */
>     cflags |= REG_NOSUB;

> Clearly the comment would no longer be true. I guess I should just
> remove this?

Yeah, we can just drop that I guess.  I'm slightly worried that we might
need it again after some future refactoring; but it's not really worth
devising a re-worded comment to justify keeping it.

Also, I realized that I failed in my reviewerly duty by not noticing
that you'd forgotten to pg_regfree the regex after successful
compilation.  Running something like this exposes the memory leak
very quickly:

select pg_input_is_valid('$ ? (@ like_regex "pattern" flag "smixq")', 
'jsonpath')
  from generate_series(1,1000);

The attached delta patch takes care of it.  (Per comment at pg_regcomp,
we don't need this after a failure return.)

regards, tom lane

diff --git a/src/backend/utils/adt/jsonpath_gram.y b/src/backend/utils/adt/jsonpath_gram.y
index 8c3a0c7623..30179408f5 100644
--- a/src/backend/utils/adt/jsonpath_gram.y
+++ b/src/backend/utils/adt/jsonpath_gram.y
@@ -560,6 +560,8 @@ makeItemLikeRegex(JsonPathParseItem *expr, JsonPathString *pattern,
 	(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
 	 errmsg("invalid regular expression: %s", errMsg)));
 		}
+
+		pg_regfree(_tmp);
 	}
 
 	*result = v;


Re: Error-safe user functions

2022-12-24 Thread Ted Yu
On Sat, Dec 24, 2022 at 4:38 AM Andrew Dunstan  wrote:

>
> On 2022-12-24 Sa 04:51, Ted Yu wrote:
> >
> >
> > On Fri, Dec 23, 2022 at 1:22 PM Tom Lane  wrote:
> >
> > Ted Yu  writes:
> > > In makeItemLikeRegex :
> >
> > > +   /* See regexp.c for explanation */
> > > +   CHECK_FOR_INTERRUPTS();
> > > +   pg_regerror(re_result, _tmp, errMsg,
> > > sizeof(errMsg));
> > > +   ereturn(escontext, false,
> >
> > > Since an error is returned, I wonder if the
> > `CHECK_FOR_INTERRUPTS` call is
> > > still necessary.
> >
> > Yes, it is.  We don't want a query-cancel transformed into a soft
> > error.
> >
> > regards, tom lane
> >
> > Hi,
> > For this case (`invalid regular expression`), the potential user
> > interruption is one reason for stopping execution.
> > I feel surfacing user interruption somehow masks the underlying error.
> >
> > The same regex, without user interruption, would exhibit an `invalid
> > regular expression` error.
> > I think it would be better to surface the error.
> >
> >
>
> All that this patch is doing is replacing a call to
> RE_compile_and_cache, which calls CHECK_FOR_INTERRUPTS, with similar
> code, which gives us the opportunity to call ereturn instead of ereport.
> Note that where escontext is NULL (the common case), ereturn functions
> identically to ereport. So unless you want to argue that the logic in
> RE_compile_and_cache is wrong I don't see what we're arguing about. If
> instead I had altered the API of RE_compile_and_cache to include an
> escontext parameter we wouldn't be having this argument at all. The only
> reason I didn't do that was the point Tom quite properly raised about
> why we're doing any caching here anyway.
>
>
> cheers
>
>
> andrew
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com

Andrew:

Thanks for the response.


Re: Error-safe user functions

2022-12-24 Thread Andrew Dunstan


On 2022-12-23 Fr 13:53, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 2022-12-22 Th 11:44, Tom Lane wrote:
>>> (I wonder why this is using RE_compile_and_cache at all, really,
>>> rather than some other API.  There doesn't seem to be value in
>>> forcing the regex into the cache at this point.)
>> I agree. The attached uses pg_regcomp instead. I had a lift a couple of
>> lines from regexp.c, but not too many.
> LGTM.  No further comments.
>
>   


As I was giving this a final polish I noticed this in jspConvertRegexFlags:


    /*
     * We'll never need sub-match details at execution.  While
     * RE_compile_and_execute would set this flag anyway, force it on
here to
     * ensure that the regex cache entries created by makeItemLikeRegex are
     * useful.
     */
    cflags |= REG_NOSUB;


Clearly the comment would no longer be true. I guess I should just
remove this?


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Error-safe user functions

2022-12-24 Thread Andrew Dunstan


On 2022-12-24 Sa 04:51, Ted Yu wrote:
>
>
> On Fri, Dec 23, 2022 at 1:22 PM Tom Lane  wrote:
>
> Ted Yu  writes:
> > In makeItemLikeRegex :
>
> > +                       /* See regexp.c for explanation */
> > +                       CHECK_FOR_INTERRUPTS();
> > +                       pg_regerror(re_result, _tmp, errMsg,
> > sizeof(errMsg));
> > +                       ereturn(escontext, false,
>
> > Since an error is returned, I wonder if the
> `CHECK_FOR_INTERRUPTS` call is
> > still necessary.
>
> Yes, it is.  We don't want a query-cancel transformed into a soft
> error.
>
>                         regards, tom lane
>
> Hi,
> For this case (`invalid regular expression`), the potential user
> interruption is one reason for stopping execution.
> I feel surfacing user interruption somehow masks the underlying error.
>
> The same regex, without user interruption, would exhibit an `invalid
> regular expression` error.
> I think it would be better to surface the error.
>
>

All that this patch is doing is replacing a call to
RE_compile_and_cache, which calls CHECK_FOR_INTERRUPTS, with similar
code, which gives us the opportunity to call ereturn instead of ereport.
Note that where escontext is NULL (the common case), ereturn functions
identically to ereport. So unless you want to argue that the logic in
RE_compile_and_cache is wrong I don't see what we're arguing about. If
instead I had altered the API of RE_compile_and_cache to include an
escontext parameter we wouldn't be having this argument at all. The only
reason I didn't do that was the point Tom quite properly raised about
why we're doing any caching here anyway.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Error-safe user functions

2022-12-24 Thread Ted Yu
On Fri, Dec 23, 2022 at 1:22 PM Tom Lane  wrote:

> Ted Yu  writes:
> > In makeItemLikeRegex :
>
> > +   /* See regexp.c for explanation */
> > +   CHECK_FOR_INTERRUPTS();
> > +   pg_regerror(re_result, _tmp, errMsg,
> > sizeof(errMsg));
> > +   ereturn(escontext, false,
>
> > Since an error is returned, I wonder if the `CHECK_FOR_INTERRUPTS` call
> is
> > still necessary.
>
> Yes, it is.  We don't want a query-cancel transformed into a soft error.
>
> regards, tom lane
>
Hi,
For this case (`invalid regular expression`), the potential user
interruption is one reason for stopping execution.
I feel surfacing user interruption somehow masks the underlying error.

The same regex, without user interruption, would exhibit an `invalid
regular expression` error.
I think it would be better to surface the error.

Cheers


Re: Error-safe user functions

2022-12-23 Thread Tom Lane
Ted Yu  writes:
> On Fri, Dec 23, 2022 at 1:22 PM Tom Lane  wrote:
>> Ted Yu  writes:
>>> +   /* See regexp.c for explanation */
>>> +   CHECK_FOR_INTERRUPTS();

>> Yes, it is.  We don't want a query-cancel transformed into a soft error.

> `ereturn(escontext` calls appear in multiple places in the patch.
> What about other callsites (w.r.t. checking interrupt) ?

What about them?  The reason this one is special is that backend/regexp
might return a failure code that's specifically "I gave up because
there's a query cancel pending".  We don't want to report that as a soft
error.  It's true that we might cancel the query for real a bit later on
even if this check weren't here, but that doesn't mean it's okay to go
down the soft error path and hope that there'll be a CHECK_FOR_INTERRUPTS
sometime before there's any visible evidence that we did the wrong thing.

regards, tom lane




Re: Error-safe user functions

2022-12-23 Thread Ted Yu
On Fri, Dec 23, 2022 at 1:22 PM Tom Lane  wrote:

> Ted Yu  writes:
> > In makeItemLikeRegex :
>
> > +   /* See regexp.c for explanation */
> > +   CHECK_FOR_INTERRUPTS();
> > +   pg_regerror(re_result, _tmp, errMsg,
> > sizeof(errMsg));
> > +   ereturn(escontext, false,
>
> > Since an error is returned, I wonder if the `CHECK_FOR_INTERRUPTS` call
> is
> > still necessary.
>
> Yes, it is.  We don't want a query-cancel transformed into a soft error.
>
> regards, tom lane
>
Hi,
`ereturn(escontext` calls appear in multiple places in the patch.
What about other callsites (w.r.t. checking interrupt) ?

Cheers


Re: Error-safe user functions

2022-12-23 Thread Tom Lane
Ted Yu  writes:
> In makeItemLikeRegex :

> +   /* See regexp.c for explanation */
> +   CHECK_FOR_INTERRUPTS();
> +   pg_regerror(re_result, _tmp, errMsg,
> sizeof(errMsg));
> +   ereturn(escontext, false,

> Since an error is returned, I wonder if the `CHECK_FOR_INTERRUPTS` call is
> still necessary.

Yes, it is.  We don't want a query-cancel transformed into a soft error.

regards, tom lane




Re: Error-safe user functions

2022-12-23 Thread Ted Yu
On Fri, Dec 23, 2022 at 9:20 AM Andrew Dunstan  wrote:

>
> On 2022-12-22 Th 11:44, Tom Lane wrote:
> > Andrew Dunstan  writes:
> >> Yeah, I started there, but it's substantially more complex - unlike cube
> >> the jsonpath scanner calls the error routines as well as the parser.
> >> Anyway, here's a patch.
> > I looked through this and it seems generally OK.  A minor nitpick is
> > that we usually write "(Datum) 0" not "(Datum) NULL" for dont-care Datum
> > values.
>
>
> Fixed in the new version attached.
>
>
> > A slightly bigger issue is that makeItemLikeRegex still allows
> > an error to be thrown from RE_compile_and_cache if a bogus regex is
> > presented.  But that could be dealt with later.
>
>
> I'd rather fix it now while we're paying attention.
>
>
> >
> > (I wonder why this is using RE_compile_and_cache at all, really,
> > rather than some other API.  There doesn't seem to be value in
> > forcing the regex into the cache at this point.)
> >
> >
>
>
> I agree. The attached uses pg_regcomp instead. I had a lift a couple of
> lines from regexp.c, but not too many.
>
>
> cheers
>
>
> andrew
>
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com

Hi,
In makeItemLikeRegex :

+   /* See regexp.c for explanation */
+   CHECK_FOR_INTERRUPTS();
+   pg_regerror(re_result, _tmp, errMsg,
sizeof(errMsg));
+   ereturn(escontext, false,

Since an error is returned, I wonder if the `CHECK_FOR_INTERRUPTS` call is
still necessary.

 Cheers


Re: Error-safe user functions

2022-12-23 Thread Tom Lane
Andrew Dunstan  writes:
> On 2022-12-22 Th 11:44, Tom Lane wrote:
>> (I wonder why this is using RE_compile_and_cache at all, really,
>> rather than some other API.  There doesn't seem to be value in
>> forcing the regex into the cache at this point.)

> I agree. The attached uses pg_regcomp instead. I had a lift a couple of
> lines from regexp.c, but not too many.

LGTM.  No further comments.

regards, tom lane




Re: Error-safe user functions

2022-12-23 Thread Andrew Dunstan

On 2022-12-22 Th 11:44, Tom Lane wrote:
> Andrew Dunstan  writes:
>> Yeah, I started there, but it's substantially more complex - unlike cube
>> the jsonpath scanner calls the error routines as well as the parser.
>> Anyway, here's a patch.
> I looked through this and it seems generally OK.  A minor nitpick is
> that we usually write "(Datum) 0" not "(Datum) NULL" for dont-care Datum
> values.  


Fixed in the new version attached.


> A slightly bigger issue is that makeItemLikeRegex still allows
> an error to be thrown from RE_compile_and_cache if a bogus regex is
> presented.  But that could be dealt with later.


I'd rather fix it now while we're paying attention.


>
> (I wonder why this is using RE_compile_and_cache at all, really,
> rather than some other API.  There doesn't seem to be value in
> forcing the regex into the cache at this point.)
>
>   


I agree. The attached uses pg_regcomp instead. I had a lift a couple of
lines from regexp.c, but not too many.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/src/backend/utils/adt/jsonpath.c b/src/backend/utils/adt/jsonpath.c
index 91af030095..6c7a5b9854 100644
--- a/src/backend/utils/adt/jsonpath.c
+++ b/src/backend/utils/adt/jsonpath.c
@@ -66,16 +66,19 @@
 #include "funcapi.h"
 #include "lib/stringinfo.h"
 #include "libpq/pqformat.h"
+#include "nodes/miscnodes.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
 #include "utils/json.h"
 #include "utils/jsonpath.h"
 
 
-static Datum jsonPathFromCstring(char *in, int len);
+static Datum jsonPathFromCstring(char *in, int len, struct Node *escontext);
 static char *jsonPathToCstring(StringInfo out, JsonPath *in,
 			   int estimated_len);
-static int	flattenJsonPathParseItem(StringInfo buf, JsonPathParseItem *item,
+static bool	flattenJsonPathParseItem(StringInfo buf, int *result,
+	 struct Node *escontext,
+	 JsonPathParseItem *item,
 	 int nestingLevel, bool insideArraySubscript);
 static void alignStringInfoInt(StringInfo buf);
 static int32 reserveSpaceForItemPointer(StringInfo buf);
@@ -95,7 +98,7 @@ jsonpath_in(PG_FUNCTION_ARGS)
 	char	   *in = PG_GETARG_CSTRING(0);
 	int			len = strlen(in);
 
-	return jsonPathFromCstring(in, len);
+	return jsonPathFromCstring(in, len, fcinfo->context);
 }
 
 /*
@@ -119,7 +122,7 @@ jsonpath_recv(PG_FUNCTION_ARGS)
 	else
 		elog(ERROR, "unsupported jsonpath version number: %d", version);
 
-	return jsonPathFromCstring(str, nbytes);
+	return jsonPathFromCstring(str, nbytes, NULL);
 }
 
 /*
@@ -165,24 +168,29 @@ jsonpath_send(PG_FUNCTION_ARGS)
  * representation of jsonpath.
  */
 static Datum
-jsonPathFromCstring(char *in, int len)
+jsonPathFromCstring(char *in, int len, struct Node *escontext)
 {
-	JsonPathParseResult *jsonpath = parsejsonpath(in, len);
+	JsonPathParseResult *jsonpath = parsejsonpath(in, len, escontext);
 	JsonPath   *res;
 	StringInfoData buf;
 
-	initStringInfo();
-	enlargeStringInfo(, 4 * len /* estimation */ );
-
-	appendStringInfoSpaces(, JSONPATH_HDRSZ);
+	if (SOFT_ERROR_OCCURRED(escontext))
+		return (Datum) NULL;
 
 	if (!jsonpath)
-		ereport(ERROR,
+		ereturn(escontext, (Datum) NULL,
 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
  errmsg("invalid input syntax for type %s: \"%s\"", "jsonpath",
 		in)));
 
-	flattenJsonPathParseItem(, jsonpath->expr, 0, false);
+	initStringInfo();
+	enlargeStringInfo(, 4 * len /* estimation */ );
+
+	appendStringInfoSpaces(, JSONPATH_HDRSZ);
+
+	if (!flattenJsonPathParseItem(, NULL, escontext,
+  jsonpath->expr, 0, false))
+		return (Datum) NULL;
 
 	res = (JsonPath *) buf.data;
 	SET_VARSIZE(res, buf.len);
@@ -225,9 +233,10 @@ jsonPathToCstring(StringInfo out, JsonPath *in, int estimated_len)
  * Recursive function converting given jsonpath parse item and all its
  * children into a binary representation.
  */
-static int
-flattenJsonPathParseItem(StringInfo buf, JsonPathParseItem *item,
-		 int nestingLevel, bool insideArraySubscript)
+static bool
+flattenJsonPathParseItem(StringInfo buf,  int *result, struct Node *escontext,
+		 JsonPathParseItem *item, int nestingLevel,
+		 bool insideArraySubscript)
 {
 	/* position from beginning of jsonpath data */
 	int32		pos = buf->len - JSONPATH_HDRSZ;
@@ -295,16 +304,22 @@ flattenJsonPathParseItem(StringInfo buf, JsonPathParseItem *item,
 int32		left = reserveSpaceForItemPointer(buf);
 int32		right = reserveSpaceForItemPointer(buf);
 
-chld = !item->value.args.left ? pos :
-	flattenJsonPathParseItem(buf, item->value.args.left,
-			 nestingLevel + argNestingLevel,
-			 insideArraySubscript);
+if (!item->value.args.left)
+	chld = pos;
+else if (! flattenJsonPathParseItem(buf, , escontext,
+	item->value.args.left,
+	nestingLevel + argNestingLevel,
+	insideArraySubscript))
+	return false;
 *(int32 *) (buf->data + left) = chld - pos;
 
-chld = 

Re: Error-safe user functions

2022-12-23 Thread Andrew Dunstan


On 2022-12-22 Th 01:10, Tom Lane wrote:
> Andrew Dunstan  writes:
>> And here's another for contrib/seg
>> I'm planning to commit these two in the next day or so.
> I didn't look at the jsonpath one yet.  The seg patch passes
> an eyeball check, with one minor nit: in seg_atof,
>
> + *result = float4in_internal(value, NULL, "real", value, escontext);
>
> don't we want to use "seg" as the type_name?
>
> Even more nitpicky, in
>
> +seg_yyerror(SEG *result, struct Node *escontext, const char *message)
>  {
> + if (SOFT_ERROR_OCCURRED(escontext))
> + return;
>
> I'd be inclined to add some explanation, say
>
> +seg_yyerror(SEG *result, struct Node *escontext, const char *message)
>  {
> + /* if we already reported an error, don't overwrite it */
> + if (SOFT_ERROR_OCCURRED(escontext))
> + return;
>
>   


Thanks for the review.


Fixed both of these and pushed.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Error-safe user functions

2022-12-22 Thread Tom Lane
Andrew Dunstan  writes:
> Yeah, I started there, but it's substantially more complex - unlike cube
> the jsonpath scanner calls the error routines as well as the parser.
> Anyway, here's a patch.

I looked through this and it seems generally OK.  A minor nitpick is
that we usually write "(Datum) 0" not "(Datum) NULL" for dont-care Datum
values.  A slightly bigger issue is that makeItemLikeRegex still allows
an error to be thrown from RE_compile_and_cache if a bogus regex is
presented.  But that could be dealt with later.

(I wonder why this is using RE_compile_and_cache at all, really,
rather than some other API.  There doesn't seem to be value in
forcing the regex into the cache at this point.)

regards, tom lane




Re: Error-safe user functions

2022-12-21 Thread Tom Lane
Andrew Dunstan  writes:
> And here's another for contrib/seg
> I'm planning to commit these two in the next day or so.

I didn't look at the jsonpath one yet.  The seg patch passes
an eyeball check, with one minor nit: in seg_atof,

+   *result = float4in_internal(value, NULL, "real", value, escontext);

don't we want to use "seg" as the type_name?

Even more nitpicky, in

+seg_yyerror(SEG *result, struct Node *escontext, const char *message)
 {
+   if (SOFT_ERROR_OCCURRED(escontext))
+   return;

I'd be inclined to add some explanation, say

+seg_yyerror(SEG *result, struct Node *escontext, const char *message)
 {
+   /* if we already reported an error, don't overwrite it */
+   if (SOFT_ERROR_OCCURRED(escontext))
+   return;

regards, tom lane




Re: Error-safe user functions

2022-12-21 Thread Andrew Dunstan

On 2022-12-18 Su 09:42, Andrew Dunstan wrote:
> On 2022-12-14 We 17:37, Tom Lane wrote:
>> Andrew Dunstan  writes:
>>> Thanks, I have been looking at jsonpath, but I'm not quite sure how to
>>> get the escontext argument to the yyerror calls in jsonath_scan.l. Maybe
>>> I need to specify a lex-param setting?
>> You want a parse-param option in jsonpath_gram.y, I think; adding that
>> will persuade Bison to change the signatures of relevant functions.
>> Compare the mods I made in contrib/cube in ccff2d20e.
>>
>>  
>
> Yeah, I started there, but it's substantially more complex - unlike cube
> the jsonpath scanner calls the error routines as well as the parser.
>
>
> Anyway, here's a patch.
>
>

And here's another for contrib/seg

I'm planning to commit these two in the next day or so.


cheers


andew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
From d7a0d93ee90ccc61b7179f236dcd1a824f30c8e6 Mon Sep 17 00:00:00 2001
From: Andrew Dunstan 
Date: Wed, 21 Dec 2022 18:14:03 -0500
Subject: [PATCH] Provide error safety for contrib/seg's input function

---
 contrib/seg/expected/seg.out | 20 
 contrib/seg/seg.c|  4 ++--
 contrib/seg/segdata.h|  5 +++--
 contrib/seg/segparse.y   | 34 +++---
 contrib/seg/segscan.l| 10 +++---
 contrib/seg/sql/seg.sql  | 13 +
 6 files changed, 68 insertions(+), 18 deletions(-)

diff --git a/contrib/seg/expected/seg.out b/contrib/seg/expected/seg.out
index 2320464dd4..7a06113ed8 100644
--- a/contrib/seg/expected/seg.out
+++ b/contrib/seg/expected/seg.out
@@ -1273,3 +1273,23 @@ FROM test_seg WHERE s @> '11.2..11.3' OR s IS NULL ORDER BY s;
||  
 (144 rows)
 
+-- test non error throwing API
+SELECT str as seg,
+   pg_input_is_valid(str,'seg') as ok,
+   pg_input_error_message(str,'seg') as errmsg
+FROM unnest(ARRAY['-1 .. 1'::text,
+  '100(+-)1',
+  '',
+  'ABC',
+  '1 e7',
+  '1e700']) str;
+   seg| ok |errmsg 
+--++---
+ -1 .. 1  | t  | 
+ 100(+-)1 | t  | 
+  | f  | bad seg representation
+ ABC  | f  | bad seg representation
+ 1 e7 | f  | bad seg representation
+ 1e700| f  | "1e700" is out of range for type real
+(6 rows)
+
diff --git a/contrib/seg/seg.c b/contrib/seg/seg.c
index a7effc1b19..7f9fc24eb4 100644
--- a/contrib/seg/seg.c
+++ b/contrib/seg/seg.c
@@ -108,8 +108,8 @@ seg_in(PG_FUNCTION_ARGS)
 
 	seg_scanner_init(str);
 
-	if (seg_yyparse(result) != 0)
-		seg_yyerror(result, "bogus input");
+	if (seg_yyparse(result, fcinfo->context) != 0)
+		seg_yyerror(result, fcinfo->context, "bogus input");
 
 	seg_scanner_finish();
 
diff --git a/contrib/seg/segdata.h b/contrib/seg/segdata.h
index f4eafc865d..3d6e3e3f24 100644
--- a/contrib/seg/segdata.h
+++ b/contrib/seg/segdata.h
@@ -16,9 +16,10 @@ extern int	significant_digits(const char *s);
 
 /* in segscan.l */
 extern int	seg_yylex(void);
-extern void seg_yyerror(SEG *result, const char *message) pg_attribute_noreturn();
+extern void seg_yyerror(SEG *result, struct Node *escontext,
+		const char *message);
 extern void seg_scanner_init(const char *str);
 extern void seg_scanner_finish(void);
 
 /* in segparse.y */
-extern int	seg_yyparse(SEG *result);
+extern int	seg_yyparse(SEG *result, struct Node *escontext);
diff --git a/contrib/seg/segparse.y b/contrib/seg/segparse.y
index 0156c3e027..19f1dba50f 100644
--- a/contrib/seg/segparse.y
+++ b/contrib/seg/segparse.y
@@ -7,7 +7,9 @@
 #include 
 
 #include "fmgr.h"
+#include "nodes/miscnodes.h"
 #include "utils/builtins.h"
+#include "utils/float.h"
 
 #include "segdata.h"
 
@@ -19,7 +21,7 @@
 #define YYMALLOC palloc
 #define YYFREE   pfree
 
-static float seg_atof(const char *value);
+static bool seg_atof(char *value, float *result, struct Node *escontext);
 
 static int sig_digits(const char *value);
 
@@ -35,6 +37,7 @@ static char strbuf[25] = {
 
 /* BISON Declarations */
 %parse-param {SEG *result}
+%parse-param {struct Node *escontext}
 %expect 0
 %name-prefix="seg_yy"
 
@@ -77,7 +80,7 @@ range: boundary PLUMIN deviation
 		result->lower = $1.val;
 		result->upper = $3.val;
 		if ( result->lower > result->upper ) {
-			ereport(ERROR,
+			errsave(escontext,
 	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 	 errmsg("swapped boundaries: %g is greater than %g",
 			result->lower, result->upper)));
@@ -121,7 +124,10 @@ range: boundary PLUMIN deviation
 boundary: SEGFLOAT
 	{
 		/* temp variable avoids a gcc 3.3.x bug on Sparc64 */
-		float		val = seg_atof($1);
+		float		val;
+
+		if (!seg_atof($1, , escontext))
+			YYABORT;
 
 		$$.ext = '\0';
 		$$.sigd = sig_digits($1);
@@ -130,7 +136,10 @@ boundary: SEGFLOAT
 	| EXTENSION SEGFLOAT
 	{
 		/* temp variable avoids a gcc 3.3.x bug on Sparc64 */
-		float		val = seg_atof($2);
+		

CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions

2022-12-19 Thread Corey Huinker
Attached is my work in progress to implement the changes to the CAST()
function as proposed by Vik Fearing.

This work builds upon the Error-safe User Functions work currently ongoing.

The proposed changes are as follows:

CAST(expr AS typename)
continues to behave as before.

CAST(expr AS typename ERROR ON ERROR)
has the identical behavior as the unadorned CAST() above.

CAST(expr AS typename NULL ON ERROR)
will use error-safe functions to do the cast of expr, and will return
NULL if the cast fails.

CAST(expr AS typename DEFAULT expr2 ON ERROR)
will use error-safe functions to do the cast of expr, and will return
expr2 if the cast fails.

There is an additional FORMAT parameter that I have not yet implemented, my
understanding is that it is largely intended for DATE/TIME field
conversions, but others are certainly possible.
CAST(expr AS typename FORMAT fmt DEFAULT expr2 ON ERROR)

What is currently working:
- Any scalar expression that can be evaluated at parse time. These tests
from cast.sql all currently work:

VALUES (CAST('error' AS integer));
VALUES (CAST('error' AS integer ERROR ON ERROR));
VALUES (CAST('error' AS integer NULL ON ERROR));
VALUES (CAST('error' AS integer DEFAULT 42 ON ERROR));

SELECT CAST('{123,abc,456}' AS integer[] DEFAULT '{-789}' ON ERROR) as
array_test1;

- Scalar values evaluated at runtime.

CREATE TEMPORARY TABLE t(t text);
INSERT INTO t VALUES ('a'), ('1'), ('b'), (2);
SELECT CAST(t.t AS integer DEFAULT -1 ON ERROR) AS foo FROM t;
 foo
-
  -1
   1
  -1
   2
(4 rows)


Along the way, I made a few design decisions, each of which is up for
debate:

First, I created OidInputFunctionCallSafe, which is to OidInputFunctionCall
what InputFunctionCallSafe is to InputFunctionCall. Given that the only
place I ended up using it was stringTypeDatumSafe(), it may be possible to
just move that code inside stringTypeDatumSafe.

Next, I had a need for FuncExpr, CoerceViaIO, and ArrayCoerce to all report
if their expr argument failed, and if not, just past the evaluation of
expr2. Rather than duplicate this logic in several places, I chose instead
to modify CoalesceExpr to allow for an error-test mode in addition to its
default null-test mode, and then to provide this altered node with two
expressions, the first being the error-safe typecast of expr and the second
being the non-error-safe typecast of expr2.

I still don't have array-to-array casts working, as the changed I would
likely need to make to ArrayCoerce get somewhat invasive, so this seemed
like a good time to post my work so far and solicit some feedback beyond
what I've already been getting from Jeff Davis and Michael Paquier.

I've sidestepped domains as well for the time being as well as avoiding JIT
issues entirely.

No documentation is currently prepared. All but one of the regression test
queries work, the one that is currently failing is:

SELECT CAST('{234,def,567}'::text[] AS integer[] DEFAULT '{-1011}' ON
ERROR) as array_test2;

Other quirks:
- an unaliased CAST ON DEFAULT will return the column name of "coalesce",
which internally is true, but obviously would be quite confusing to a user.

As a side observation, I noticed that the optimizer already tries to
resolve expressions based on constants and to collapse expression trees
where possible, which makes me wonder if the work done to do the same in
transformTypeCast/ and coerce_to_target_type and coerce_type isn't also
wasted.
From 897c9c68a29ad0fa57f28734df0c77553e026d80 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Sun, 18 Dec 2022 16:20:01 -0500
Subject: [PATCH 1/3] add OidInputFunctionCall

---
 src/backend/utils/fmgr/fmgr.c | 13 +
 src/include/fmgr.h|  4 
 2 files changed, 17 insertions(+)

diff --git a/src/backend/utils/fmgr/fmgr.c b/src/backend/utils/fmgr/fmgr.c
index 0d37f69298..e9a19ce653 100644
--- a/src/backend/utils/fmgr/fmgr.c
+++ b/src/backend/utils/fmgr/fmgr.c
@@ -1701,6 +1701,19 @@ OidInputFunctionCall(Oid functionId, char *str, Oid typioparam, int32 typmod)
 	return InputFunctionCall(, str, typioparam, typmod);
 }
 
+bool
+OidInputFunctionCallSafe(Oid functionId, char *str, Oid typioparam,
+		 int32 typmod, fmNodePtr escontext,
+		 Datum *result)
+{
+	FmgrInfo			flinfo;
+
+	fmgr_info(functionId, );
+
+	return InputFunctionCallSafe(, str, typioparam, typmod,
+ escontext, result);
+}
+
 char *
 OidOutputFunctionCall(Oid functionId, Datum val)
 {
diff --git a/src/include/fmgr.h b/src/include/fmgr.h
index b7832d0aa2..b835ef72b5 100644
--- a/src/include/fmgr.h
+++ b/src/include/fmgr.h
@@ -706,6 +706,10 @@ extern bool InputFunctionCallSafe(FmgrInfo *flinfo, char *str,
   Datum *result);
 extern Datum OidInputFunctionCall(Oid functionId, char *str,
   Oid typioparam, int32 typmod);
+extern bool
+OidInputFunctionCallSafe(Oid functionId, char *str, Oid typioparam,
+		 int32 typmod, fmNodePtr escontext,
+		 Datum *result);
 extern char *OutputFunctionCal

Re: Error-safe user functions

2022-12-19 Thread Robert Haas
On Mon, Dec 19, 2022 at 4:27 PM Tom Lane  wrote:
> In [1] I wrote
>
> >>> I'm a little concerned about the cost-benefit of fixing the reg* types.
> >>> The ones that accept type names actually use the core grammar to parse
> >>> those.  Now, we probably could fix the grammar to be non-throwing, but
> >>> it'd be very invasive and I'm not sure about the performance impact.
> >>> It might be best to content ourselves with soft reporting of lookup
> >>> failures, as opposed to syntax problems.

Ah right.  I agree that invading the main grammar doesn't seem
terribly appealing. Setting regtypein aside could be a sensible
choice, then. Another option might be to have some way of parsing type
names outside of the main grammar, which would be more work and would
require keeping things in sync, but perhaps it would end up being less
ugly

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Error-safe user functions

2022-12-19 Thread Tom Lane
Robert Haas  writes:
> On Mon, Dec 19, 2022 at 11:44 AM Tom Lane  wrote:
>> ... I guess you didn't read my remarks upthread about regtypein.
>> I do not want to try to make gram.y+scan.l non-error-throwing.

> Huh, for some reason I'm not seeing an email about that. Do you have a link?

In [1] I wrote

>>> I'm a little concerned about the cost-benefit of fixing the reg* types.
>>> The ones that accept type names actually use the core grammar to parse
>>> those.  Now, we probably could fix the grammar to be non-throwing, but
>>> it'd be very invasive and I'm not sure about the performance impact.
>>> It might be best to content ourselves with soft reporting of lookup
>>> failures, as opposed to syntax problems.

regards, tom lane

[1] https://www.postgresql.org/message-id/1863335.1670783397%40sss.pgh.pa.us




Re: Error-safe user functions

2022-12-19 Thread Robert Haas
On Mon, Dec 19, 2022 at 11:44 AM Tom Lane  wrote:
> > It also doesn't seem too bad from an implementation point of view to
> > try to cover all the caes.
>
> ... I guess you didn't read my remarks upthread about regtypein.
> I do not want to try to make gram.y+scan.l non-error-throwing.

Huh, for some reason I'm not seeing an email about that. Do you have a link?

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: Error-safe user functions

2022-12-19 Thread Tom Lane
Robert Haas  writes:
> On Fri, Dec 16, 2022 at 1:31 PM Tom Lane  wrote:
>> The reg* functions probably need a unified plan as to how far
>> down we want to push non-error behavior.  The rest of these
>> I think just require turning the crank along the same lines
>> as in functions already dealt with.

> I would be in favor of an aggressive approach.

I agree that anything based on implementation concerns is going
to look pretty unprincipled to end users.  However ...

> It also doesn't seem too bad from an implementation point of view to
> try to cover all the caes.

... I guess you didn't read my remarks upthread about regtypein.
I do not want to try to make gram.y+scan.l non-error-throwing.

regards, tom lane




Re: Error-safe user functions

2022-12-19 Thread Robert Haas
On Fri, Dec 16, 2022 at 1:31 PM Tom Lane  wrote:
> The reg* functions probably need a unified plan as to how far
> down we want to push non-error behavior.  The rest of these
> I think just require turning the crank along the same lines
> as in functions already dealt with.

I would be in favor of an aggressive approach. For example, let's look
at regclassin(). It calls oidin(), stringToQualifiedNameList(),
makeRangeVarFromNameList(), and RangeVarGetRelidExtended(). Basically,
oidin() could fail if the input, known to be all digits, is out of
range; stringToQualifiedNameList() could fail due to mismatched
delimiters or improperly-separated names; makeRangeVarFromNameList()
doesn't want to have more than three name components
(db.schema.relation); and RangeVarGetRelidExtended() doesn't like
cross-database references or non-existent relations.

Now, one option here would be to distinguish between something that
could be valid in some database but isn't in this one, like a
non-existent relation name, and one that couldn't ever work anywhere,
like a relation name with four parts or bad quoting. You could decide
that the former kind of error will be reported softly but the latter
is hard error. But I think that is presuming that we know how users
will want to use this functionality, and I don't think we do. I also
think that it will be confusing to users. Finally, I think it's
different from what we do for other data types. You could equally well
argue that, for int4in, we ought to treat '99' and 'potato'
differently, one a hard error and the other soft. I think it's hard to
puzzle out a decision that makes any sense there, and I don't think
this case is much different. I don't think it's too hard to mentally
separate errors about the validity of the input from, say, out of
memory errors -- but one distinguishing between one kind of input
validity check and another seems like a muddle.

It also doesn't seem too bad from an implementation point of view to
try to cover all the caes. The stickiest case looks to be
RangeVarGetRelidExtended() and we might need to give a bit of thought
to how to handle that one. The others don't seem like a big issue, and
oidin() is already done.

> While it'd be good to get all of these done before v16 feature
> freeze, I can't see that any of them represent blockers for
> building features based on soft input error handling.

+1.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Error-safe user functions

2022-12-18 Thread Andrew Dunstan

On 2022-12-14 We 17:37, Tom Lane wrote:
> Andrew Dunstan  writes:
>> Thanks, I have been looking at jsonpath, but I'm not quite sure how to
>> get the escontext argument to the yyerror calls in jsonath_scan.l. Maybe
>> I need to specify a lex-param setting?
> You want a parse-param option in jsonpath_gram.y, I think; adding that
> will persuade Bison to change the signatures of relevant functions.
> Compare the mods I made in contrib/cube in ccff2d20e.
>
>   


Yeah, I started there, but it's substantially more complex - unlike cube
the jsonpath scanner calls the error routines as well as the parser.


Anyway, here's a patch.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/src/backend/utils/adt/jsonpath.c b/src/backend/utils/adt/jsonpath.c
index 91af030095..6c7a5b9854 100644
--- a/src/backend/utils/adt/jsonpath.c
+++ b/src/backend/utils/adt/jsonpath.c
@@ -66,16 +66,19 @@
 #include "funcapi.h"
 #include "lib/stringinfo.h"
 #include "libpq/pqformat.h"
+#include "nodes/miscnodes.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
 #include "utils/json.h"
 #include "utils/jsonpath.h"
 
 
-static Datum jsonPathFromCstring(char *in, int len);
+static Datum jsonPathFromCstring(char *in, int len, struct Node *escontext);
 static char *jsonPathToCstring(StringInfo out, JsonPath *in,
 			   int estimated_len);
-static int	flattenJsonPathParseItem(StringInfo buf, JsonPathParseItem *item,
+static bool	flattenJsonPathParseItem(StringInfo buf, int *result,
+	 struct Node *escontext,
+	 JsonPathParseItem *item,
 	 int nestingLevel, bool insideArraySubscript);
 static void alignStringInfoInt(StringInfo buf);
 static int32 reserveSpaceForItemPointer(StringInfo buf);
@@ -95,7 +98,7 @@ jsonpath_in(PG_FUNCTION_ARGS)
 	char	   *in = PG_GETARG_CSTRING(0);
 	int			len = strlen(in);
 
-	return jsonPathFromCstring(in, len);
+	return jsonPathFromCstring(in, len, fcinfo->context);
 }
 
 /*
@@ -119,7 +122,7 @@ jsonpath_recv(PG_FUNCTION_ARGS)
 	else
 		elog(ERROR, "unsupported jsonpath version number: %d", version);
 
-	return jsonPathFromCstring(str, nbytes);
+	return jsonPathFromCstring(str, nbytes, NULL);
 }
 
 /*
@@ -165,24 +168,29 @@ jsonpath_send(PG_FUNCTION_ARGS)
  * representation of jsonpath.
  */
 static Datum
-jsonPathFromCstring(char *in, int len)
+jsonPathFromCstring(char *in, int len, struct Node *escontext)
 {
-	JsonPathParseResult *jsonpath = parsejsonpath(in, len);
+	JsonPathParseResult *jsonpath = parsejsonpath(in, len, escontext);
 	JsonPath   *res;
 	StringInfoData buf;
 
-	initStringInfo();
-	enlargeStringInfo(, 4 * len /* estimation */ );
-
-	appendStringInfoSpaces(, JSONPATH_HDRSZ);
+	if (SOFT_ERROR_OCCURRED(escontext))
+		return (Datum) NULL;
 
 	if (!jsonpath)
-		ereport(ERROR,
+		ereturn(escontext, (Datum) NULL,
 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
  errmsg("invalid input syntax for type %s: \"%s\"", "jsonpath",
 		in)));
 
-	flattenJsonPathParseItem(, jsonpath->expr, 0, false);
+	initStringInfo();
+	enlargeStringInfo(, 4 * len /* estimation */ );
+
+	appendStringInfoSpaces(, JSONPATH_HDRSZ);
+
+	if (!flattenJsonPathParseItem(, NULL, escontext,
+  jsonpath->expr, 0, false))
+		return (Datum) NULL;
 
 	res = (JsonPath *) buf.data;
 	SET_VARSIZE(res, buf.len);
@@ -225,9 +233,10 @@ jsonPathToCstring(StringInfo out, JsonPath *in, int estimated_len)
  * Recursive function converting given jsonpath parse item and all its
  * children into a binary representation.
  */
-static int
-flattenJsonPathParseItem(StringInfo buf, JsonPathParseItem *item,
-		 int nestingLevel, bool insideArraySubscript)
+static bool
+flattenJsonPathParseItem(StringInfo buf,  int *result, struct Node *escontext,
+		 JsonPathParseItem *item, int nestingLevel,
+		 bool insideArraySubscript)
 {
 	/* position from beginning of jsonpath data */
 	int32		pos = buf->len - JSONPATH_HDRSZ;
@@ -295,16 +304,22 @@ flattenJsonPathParseItem(StringInfo buf, JsonPathParseItem *item,
 int32		left = reserveSpaceForItemPointer(buf);
 int32		right = reserveSpaceForItemPointer(buf);
 
-chld = !item->value.args.left ? pos :
-	flattenJsonPathParseItem(buf, item->value.args.left,
-			 nestingLevel + argNestingLevel,
-			 insideArraySubscript);
+if (!item->value.args.left)
+	chld = pos;
+else if (! flattenJsonPathParseItem(buf, , escontext,
+	item->value.args.left,
+	nestingLevel + argNestingLevel,
+	insideArraySubscript))
+	return false;
 *(int32 *) (buf->data + left) = chld - pos;
 
-chld = !item->value.args.right ? pos :
-	flattenJsonPathParseItem(buf, item->value.args.right,
-			 nestingLevel + argNestingLevel,
-			 insideArraySubscript);
+if (!item->value.args.right)
+	chld = pos;
+else if (! flattenJsonPathParseItem(buf, , escontext,
+	item->value.args.right,
+	nestingLevel + 

Re: Error-safe user functions

2022-12-17 Thread Andrew Dunstan


On 2022-12-15 Th 09:12, Robert Haas wrote:
> On Wed, Dec 14, 2022 at 6:24 PM Tom Lane  wrote:
>> I've gone through these now and revised/pushed most of them.
> Tom, I just want to extend huge thanks to you for working on this
> infrastructure. jsonpath aside, I think this is going to pay dividends
> in many ways for many years to come. It's something that we've needed
> for a really long time, and I'm very happy that we're moving forward
> with it.
>
> Thanks so much.
>

Robert beat me to it, but I will heartily second this. Many thanks.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Error-safe user functions

2022-12-16 Thread Tom Lane
I wrote:
> I'm going to step back from this for now and get on with other work,
> but before that I thought there was one more input function I should
> look at: xml_in, because xml.c is such a hairy can of worms.

Pushed that.  For the record, my list of input functions still needing
attention stands at

Core:

jsonpath_in
regclassin
regcollationin
regconfigin
regdictionaryin
regnamespacein
regoperatorin
regoperin
regprocedurein
regprocin
regrolein
regtypein
tsqueryin
tsvectorin

Contrib:

hstore:
hstore_in
intarray:
bqarr_in
isn:
ean13_in
isbn_in
ismn_in
issn_in
upc_in
ltree:
ltree_in
lquery_in
ltxtq_in
seg:
seg_in

The reg* functions probably need a unified plan as to how far
down we want to push non-error behavior.  The rest of these
I think just require turning the crank along the same lines
as in functions already dealt with.

While it'd be good to get all of these done before v16 feature
freeze, I can't see that any of them represent blockers for
building features based on soft input error handling.

regards, tom lane




Re: Error-safe user functions

2022-12-15 Thread Tom Lane
Robert Haas  writes:
> Tom, I just want to extend huge thanks to you for working on this
> infrastructure.

Thanks.  I agree it's an important bit of work.

I'm going to step back from this for now and get on with other work,
but before that I thought there was one more input function I should
look at: xml_in, because xml.c is such a hairy can of worms.  It
turns out to be not too bad, given our design principle that only
"bad input" errors should be reported softly.  xml_parse() now has
two different ways of reporting errors depending on whether they're
hard or soft, but it didn't take an undue amount of refactoring to
make that work.

While fixing that, my attention was drawn to wellformed_xml(),
whose error handling is unbelievably horrid: it traps any longjmp
whatsoever (query cancel, for instance) and reports it as ill-formed XML.
0002 attached makes use of this new code to get rid of the need for any
PG_TRY there at all; instead, soft errors result in a "false" return
but hard errors are allowed to propagate.  xml_is_document was much more
careful, but we can change it the same way to save code and cycles.

regards, tom lane

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 3884babc1b..4b9877ee0b 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -119,9 +119,10 @@ struct PgXmlErrorContext
 
 static xmlParserInputPtr xmlPgEntityLoader(const char *URL, const char *ID,
 		   xmlParserCtxtPtr ctxt);
+static void xml_errsave(Node *escontext, PgXmlErrorContext *errcxt,
+		int sqlcode, const char *msg);
 static void xml_errorHandler(void *data, xmlErrorPtr error);
-static void xml_ereport_by_code(int level, int sqlcode,
-const char *msg, int code);
+static int	errdetail_for_xml_code(int code);
 static void chopStringInfoNewlines(StringInfo str);
 static void appendStringInfoLineSeparator(StringInfo str);
 
@@ -143,7 +144,8 @@ static bool print_xml_decl(StringInfo buf, const xmlChar *version,
 		   pg_enc encoding, int standalone);
 static bool xml_doctype_in_content(const xmlChar *str);
 static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
-		   bool preserve_whitespace, int encoding);
+		   bool preserve_whitespace, int encoding,
+		   Node *escontext);
 static text *xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt);
 static int	xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
    ArrayBuildState *astate,
@@ -261,14 +263,18 @@ xml_in(PG_FUNCTION_ARGS)
 	xmltype*vardata;
 	xmlDocPtr	doc;
 
+	/* Build the result object. */
 	vardata = (xmltype *) cstring_to_text(s);
 
 	/*
-	 * Parse the data to check if it is well-formed XML data.  Assume that
-	 * ERROR occurred if parsing failed.
+	 * Parse the data to check if it is well-formed XML data.
+	 *
+	 * Note: we don't need to worry about whether a soft error is detected.
 	 */
-	doc = xml_parse(vardata, xmloption, true, GetDatabaseEncoding());
-	xmlFreeDoc(doc);
+	doc = xml_parse(vardata, xmloption, true, GetDatabaseEncoding(),
+	fcinfo->context);
+	if (doc != NULL)
+		xmlFreeDoc(doc);
 
 	PG_RETURN_XML_P(vardata);
 #else
@@ -323,9 +329,10 @@ xml_out_internal(xmltype *x, pg_enc target_encoding)
 		return buf.data;
 	}
 
-	xml_ereport_by_code(WARNING, ERRCODE_INTERNAL_ERROR,
-		"could not parse XML declaration in stored value",
-		res_code);
+	ereport(WARNING,
+			errcode(ERRCODE_INTERNAL_ERROR),
+			errmsg_internal("could not parse XML declaration in stored value"),
+			errdetail_for_xml_code(res_code));
 #endif
 	return str;
 }
@@ -392,7 +399,7 @@ xml_recv(PG_FUNCTION_ARGS)
 	 * Parse the data to check if it is well-formed XML data.  Assume that
 	 * xml_parse will throw ERROR if not.
 	 */
-	doc = xml_parse(result, xmloption, true, encoding);
+	doc = xml_parse(result, xmloption, true, encoding, NULL);
 	xmlFreeDoc(doc);
 
 	/* Now that we know what we're dealing with, convert to server encoding */
@@ -754,7 +761,7 @@ xmlparse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace)
 	xmlDocPtr	doc;
 
 	doc = xml_parse(data, xmloption_arg, preserve_whitespace,
-	GetDatabaseEncoding());
+	GetDatabaseEncoding(), NULL);
 	xmlFreeDoc(doc);
 
 	return (xmltype *) data;
@@ -895,7 +902,7 @@ xml_is_document(xmltype *arg)
 	PG_TRY();
 	{
 		doc = xml_parse((text *) arg, XMLOPTION_DOCUMENT, true,
-		GetDatabaseEncoding());
+		GetDatabaseEncoding(), NULL);
 		result = true;
 	}
 	PG_CATCH();
@@ -1500,17 +1507,26 @@ xml_doctype_in_content(const xmlChar *str)
 
 
 /*
- * Convert a C string to XML internal representation
+ * Convert a text object to XML internal representation
+ *
+ * data is the source data (must not be toasted!), encoding is its encoding,
+ * and xmloption_arg and preserve_whitespace are options for the
+ * transformation.
+ *
+ * Errors normally result in ereport(ERROR), but if escontext is an
+ * ErrorSaveContext, then "safe" errors are reported there 

Re: Error-safe user functions

2022-12-15 Thread Robert Haas
On Wed, Dec 14, 2022 at 6:24 PM Tom Lane  wrote:
> I've gone through these now and revised/pushed most of them.

Tom, I just want to extend huge thanks to you for working on this
infrastructure. jsonpath aside, I think this is going to pay dividends
in many ways for many years to come. It's something that we've needed
for a really long time, and I'm very happy that we're moving forward
with it.

Thanks so much.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Error-safe user functions

2022-12-14 Thread Amul Sul
On Thu, Dec 15, 2022 at 11:16 AM Tom Lane  wrote:
>
> Amul Sul  writes:
> > There are other a bunch of hard errors from get_multirange_io_data(),
> > get_range_io_data() and its subroutine can hit, shouldn't we care
> > about those?
>
> I think those are all "internal" errors, ie not reachable as a
> consequence of bad input data.  Do you see a reason to think
> differently?

Make sense, I was worried about the internal errors as well as an
error that the user can cause while declaring multi-range e.g. shell
type, but realized that case gets checked at creating that multi-range
type.

Regards,
Amul




Re: Error-safe user functions

2022-12-14 Thread Tom Lane
Amul Sul  writes:
> There are other a bunch of hard errors from get_multirange_io_data(),
> get_range_io_data() and its subroutine can hit, shouldn't we care
> about those?

I think those are all "internal" errors, ie not reachable as a
consequence of bad input data.  Do you see a reason to think
differently?

regards, tom lane




Re: Error-safe user functions

2022-12-14 Thread Amul Sul
On Thu, Dec 15, 2022 at 9:03 AM Tom Lane  wrote:
>
> Here are some proposed patches for converting range_in and multirange_in.
>
> 0001 tackles the straightforward part, which is trapping syntax errors
> and called-input-function errors.  The only thing that I think might
> be controversial here is that I chose to change the signatures of
> the exposed functions range_serialize and make_range rather than
> inventing xxx_safe variants.  I think this is all right, because
> AFAIK the only likely reason for extensions to call either of those
> is that custom types' canonical functions would need to call
> range_serialize --- and those will need to be touched anyway,
> see 0002.
>
> What 0001 does not cover is trapping errors occurring in range
> canonicalize functions.  I'd first thought maybe doing that wasn't
> worth the trouble, but it's not really very hard to fix the built-in
> canonicalize functions, as shown in 0002.  Probably extensions would
> not find it much harder, and in any case they're not really required
> to make their errors soft.
>
> Any objections?
>

There are other a bunch of hard errors from get_multirange_io_data(),
get_range_io_data() and its subroutine can hit, shouldn't we care
about those?


Regards,
Amul




Re: Error-safe user functions

2022-12-14 Thread Tom Lane
Here are some proposed patches for converting range_in and multirange_in.

0001 tackles the straightforward part, which is trapping syntax errors
and called-input-function errors.  The only thing that I think might
be controversial here is that I chose to change the signatures of
the exposed functions range_serialize and make_range rather than
inventing xxx_safe variants.  I think this is all right, because
AFAIK the only likely reason for extensions to call either of those
is that custom types' canonical functions would need to call
range_serialize --- and those will need to be touched anyway,
see 0002.

What 0001 does not cover is trapping errors occurring in range
canonicalize functions.  I'd first thought maybe doing that wasn't
worth the trouble, but it's not really very hard to fix the built-in
canonicalize functions, as shown in 0002.  Probably extensions would
not find it much harder, and in any case they're not really required
to make their errors soft.

Any objections?

regards, tom lane

diff --git a/src/backend/utils/adt/multirangetypes.c b/src/backend/utils/adt/multirangetypes.c
index 307d087c97..ed26cfba2d 100644
--- a/src/backend/utils/adt/multirangetypes.c
+++ b/src/backend/utils/adt/multirangetypes.c
@@ -120,6 +120,7 @@ multirange_in(PG_FUNCTION_ARGS)
 	char	   *input_str = PG_GETARG_CSTRING(0);
 	Oid			mltrngtypoid = PG_GETARG_OID(1);
 	Oid			typmod = PG_GETARG_INT32(2);
+	Node	   *escontext = fcinfo->context;
 	TypeCacheEntry *rangetyp;
 	int32		ranges_seen = 0;
 	int32		range_count = 0;
@@ -133,6 +134,7 @@ multirange_in(PG_FUNCTION_ARGS)
 	const char *range_str_begin = NULL;
 	int32		range_str_len;
 	char	   *range_str;
+	Datum		range_datum;
 
 	cache = get_multirange_io_data(fcinfo, mltrngtypoid, IOFunc_input);
 	rangetyp = cache->typcache->rngtype;
@@ -144,7 +146,7 @@ multirange_in(PG_FUNCTION_ARGS)
 	if (*ptr == '{')
 		ptr++;
 	else
-		ereport(ERROR,
+		ereturn(escontext, (Datum) 0,
 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
  errmsg("malformed multirange literal: \"%s\"",
 		input_str),
@@ -157,7 +159,7 @@ multirange_in(PG_FUNCTION_ARGS)
 		char		ch = *ptr;
 
 		if (ch == '\0')
-			ereport(ERROR,
+			ereturn(escontext, (Datum) 0,
 	(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
 	 errmsg("malformed multirange literal: \"%s\"",
 			input_str),
@@ -186,7 +188,7 @@ multirange_in(PG_FUNCTION_ARGS)
 	parse_state = MULTIRANGE_AFTER_RANGE;
 }
 else
-	ereport(ERROR,
+	ereturn(escontext, (Datum) 0,
 			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
 			 errmsg("malformed multirange literal: \"%s\"",
 	input_str),
@@ -204,10 +206,14 @@ multirange_in(PG_FUNCTION_ARGS)
 			repalloc(ranges, range_capacity * sizeof(RangeType *));
 	}
 	ranges_seen++;
-	range = DatumGetRangeTypeP(InputFunctionCall(>typioproc,
- range_str,
- cache->typioparam,
- typmod));
+	if (!InputFunctionCallSafe(>typioproc,
+			   range_str,
+			   cache->typioparam,
+			   typmod,
+			   escontext,
+			   _datum))
+		PG_RETURN_NULL();
+	range = DatumGetRangeTypeP(range_datum);
 	if (!RangeIsEmpty(range))
 		ranges[range_count++] = range;
 	parse_state = MULTIRANGE_AFTER_RANGE;
@@ -256,7 +262,7 @@ multirange_in(PG_FUNCTION_ARGS)
 else if (ch == '}')
 	parse_state = MULTIRANGE_FINISHED;
 else
-	ereport(ERROR,
+	ereturn(escontext, (Datum) 0,
 			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
 			 errmsg("malformed multirange literal: \"%s\"",
 	input_str),
@@ -280,7 +286,7 @@ multirange_in(PG_FUNCTION_ARGS)
 		ptr++;
 
 	if (*ptr != '\0')
-		ereport(ERROR,
+		ereturn(escontext, (Datum) 0,
 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
  errmsg("malformed multirange literal: \"%s\"",
 		input_str),
@@ -807,7 +813,7 @@ multirange_get_union_range(TypeCacheEntry *rangetyp,
 	multirange_get_bounds(rangetyp, mr, 0, , );
 	multirange_get_bounds(rangetyp, mr, mr->rangeCount - 1, , );
 
-	return make_range(rangetyp, , , false);
+	return make_range(rangetyp, , , false, NULL);
 }
 
 
@@ -2696,7 +2702,8 @@ range_merge_from_multirange(PG_FUNCTION_ARGS)
 		multirange_get_bounds(typcache->rngtype, mr, mr->rangeCount - 1,
 			  , );
 
-		result = make_range(typcache->rngtype, , , false);
+		result = make_range(typcache->rngtype, , ,
+			false, NULL);
 	}
 
 	PG_RETURN_RANGE_P(result);
diff --git a/src/backend/utils/adt/multirangetypes_selfuncs.c b/src/backend/utils/adt/multirangetypes_selfuncs.c
index 919c8889d4..8fd6250e4a 100644
--- a/src/backend/utils/adt/multirangetypes_selfuncs.c
+++ b/src/backend/utils/adt/multirangetypes_selfuncs.c
@@ -221,7 +221,8 @@ multirangesel(PG_FUNCTION_ARGS)
 			upper.val = ((Const *) other)->constvalue;
 			upper.infinite = false;
 			upper.lower = false;
-			constrange = range_serialize(typcache->rngtype, , , false);
+			constrange = 

Re: Error-safe user functions

2022-12-14 Thread Tom Lane
I wrote:
> Amul Sul  writes:
>> Attaching a complete set of the patches changing function till this
>> except bpcharin, byteain jsonpath_in that Andrew is planning to look
>> in. I have skipped reg* functions.

> I'll take a look at these shortly, unless Andrew is already on it.

I've gone through these now and revised/pushed most of them.

I do not think that we need to touch any unimplemented I/O functions.
They can just as well be unimplemented for soft-error cases too;
I can't see any use-case where it could be useful to have them not
complain.  So I discarded

v1-0004-Change-brin_bloom_summary_in-to-allow-non-throw-e.patch
v1-0005-Change-brin_minmax_multi_summary_in-to-allow-non-.patch
v1-0009-Change-gtsvectorin-to-allow-non-throw-error-repor.patch
v1-0018-Change-pg_mcv_list_in-to-allow-non-throw-error-re.patch
v1-0019-Change-pg_ndistinct_in-to-allow-non-throw-error-r.patch

As for the rest, some were closer to being committable than others.
You need to be more careful about handling error cases in subroutines:
you can't just ereturn from a subroutine and figure you're done,
because the caller will keep plugging along if you don't do something
to teach it not to.  What that would often lead to is the caller
finding what it thinks is a new error condition, and overwriting the
original message with something that's much less on-point.  This is
comparable to cascading errors from a compiler: anybody who's dealt
with those knows that errors after the first one are often just noise.
So we have to be careful to quit after we log the first error.

Also, I ended up ripping out the changes in line_construct, because
as soon as I tried to test them I tripped over the fact that lseg_sl
was still throwing hard errors, before we ever get to line_construct.
Perhaps it is worth fixing all that but I have to deem it very low
priority, because the two-input-points formulation isn't the mainstream
code path.  (I kind of wonder too if there isn't a better, more
numerically robust conversion method ...)  In any case I'm pretty sure
those changes in float.h would have drawn Andres' ire.  We don't want
to be adding arguments to float_overflow_error/float_underflow_error;
if that were acceptable they'd not have looked like that to begin with.

Anyway, thanks for the work!  That moved us a good ways.

I think I'm going to go fix bpcharin and varcharin, because those
are the last of the SQL-spec-defined types.

regards, tom lane




Re: Error-safe user functions

2022-12-14 Thread Tom Lane
Andrew Dunstan  writes:
> Thanks, I have been looking at jsonpath, but I'm not quite sure how to
> get the escontext argument to the yyerror calls in jsonath_scan.l. Maybe
> I need to specify a lex-param setting?

You want a parse-param option in jsonpath_gram.y, I think; adding that
will persuade Bison to change the signatures of relevant functions.
Compare the mods I made in contrib/cube in ccff2d20e.

regards, tom lane




Re: Error-safe user functions

2022-12-14 Thread Andrew Dunstan


On 2022-12-14 We 11:00, Tom Lane wrote:
> Amul Sul  writes:
>> Attaching a complete set of the patches changing function till this
>> except bpcharin, byteain jsonpath_in that Andrew is planning to look
>> in. I have skipped reg* functions.
> I'll take a look at these shortly, unless Andrew is already on it.
>
>   


Thanks, I have been looking at jsonpath, but I'm not quite sure how to
get the escontext argument to the yyerror calls in jsonath_scan.l. Maybe
I need to specify a lex-param setting?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Error-safe user functions

2022-12-14 Thread Tom Lane
Amul Sul  writes:
> Attaching a complete set of the patches changing function till this
> except bpcharin, byteain jsonpath_in that Andrew is planning to look
> in. I have skipped reg* functions.

I'll take a look at these shortly, unless Andrew is already on it.

regards, tom lane




Re: Error-safe user functions

2022-12-13 Thread Amul Sul
On Mon, Dec 12, 2022 at 12:00 AM Tom Lane  wrote:
>
> Andrew Dunstan  writes:
> > Maybe as we work through the remaining input functions (there are about
> > 60 core candidates left on my list) we should mark them with a comment
> > if no adjustment is needed.
>
> I did a quick pass through them last night.  Assuming that we don't
> need to touch the unimplemented input functions (eg for pseudotypes),
> I count these core functions as still needing work:
>
> aclitemin
> bit_in
> box_in
> bpcharin
> byteain
> cash_in
> cidin
> cidr_in
> circle_in
> inet_in
> int2vectorin
> jsonpath_in
> line_in
> lseg_in
> macaddr8_in
> macaddr_in

Attaching patches changing these functions except bpcharin,
byteain, jsonpath_in, and cidin. I am continuing work on the next
items below:

> multirange_in
> namein
> oidin
> oidvectorin
> path_in
> pg_lsn_in
> pg_snapshot_in
> point_in
> poly_in
> range_in
> regclassin
> regcollationin
> regconfigin
> regdictionaryin
> regnamespacein
> regoperatorin
> regoperin
> regprocedurein
> regprocin
> regrolein
> regtypein
> tidin
> tsqueryin
> tsvectorin
> uuid_in
> varbit_in
> varcharin
> xid8in
> xidin
> xml_in
>
> and these contrib functions:
>
> hstore:
> hstore_in
> intarray:
> bqarr_in
> isn:
> ean13_in
> isbn_in
> ismn_in
> issn_in
> upc_in
> ltree:
> ltree_in
> lquery_in
> ltxtq_in
> seg:
> seg_in
>
> Maybe we should have a conversation about which of these are
> highest priority to get to a credible feature.  We clearly need
> to fix the remaining SQL-spec types (varchar and bpchar, mainly).
> At the other extreme, likely nobody would weep if we never fixed
> int2vectorin, for instance.
>
> I'm a little concerned about the cost-benefit of fixing the reg* types.
> The ones that accept type names actually use the core grammar to parse
> those.  Now, we probably could fix the grammar to be non-throwing, but
> it'd be very invasive and I'm not sure about the performance impact.
> It might be best to content ourselves with soft reporting of lookup
> failures, as opposed to syntax problems.
>

Regards,
Amul
From 4c4c18bd8104114351ca58a73a9d92fbb0c85dd2 Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Tue, 13 Dec 2022 17:29:04 +0530
Subject: [PATCH v1 13/14] Change macaddr8_in to allow non-throw error
 reporting

---
 src/backend/utils/adt/mac8.c   | 36 +-
 src/test/regress/expected/macaddr8.out | 25 ++
 src/test/regress/sql/macaddr8.sql  |  6 +
 3 files changed, 49 insertions(+), 18 deletions(-)

diff --git a/src/backend/utils/adt/mac8.c b/src/backend/utils/adt/mac8.c
index 24d219f6386..adb253a6ac0 100644
--- a/src/backend/utils/adt/mac8.c
+++ b/src/backend/utils/adt/mac8.c
@@ -35,7 +35,8 @@
 #define lobits(addr) \
   ((unsigned long)(((addr)->e<<24) | ((addr)->f<<16) | ((addr)->g<<8) | ((addr)->h)))
 
-static unsigned char hex2_to_uchar(const unsigned char *ptr, const unsigned char *str);
+static unsigned char hex2_to_uchar(const unsigned char *ptr, const unsigned char *str,
+   Node *escontext);
 
 static const signed char hexlookup[128] = {
 	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
@@ -58,7 +59,8 @@ static const signed char hexlookup[128] = {
  * the entire string, which is used only for error reporting.
  */
 static inline unsigned char
-hex2_to_uchar(const unsigned char *ptr, const unsigned char *str)
+hex2_to_uchar(const unsigned char *ptr, const unsigned char *str,
+			  Node *escontext)
 {
 	unsigned char ret = 0;
 	signed char lookup;
@@ -88,13 +90,10 @@ hex2_to_uchar(const unsigned char *ptr, const unsigned char *str)
 	return ret;
 
 invalid_input:
-	ereport(ERROR,
+	ereturn(escontext, 0,
 			(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
 			 errmsg("invalid input syntax for type %s: \"%s\"", "macaddr8",
 	str)));
-
-	/* We do not actually reach here */
-	return 0;
 }
 
 /*
@@ -104,6 +103,7 @@ Datum
 macaddr8_in(PG_FUNCTION_ARGS)
 {
 	const unsigned char *str = (unsigned char *) PG_GETARG_CSTRING(0);
+	Node	   *escontext = fcinfo->context;
 	const unsigned char *ptr = str;
 	macaddr8   *result;
 	unsigned char a = 0,
@@ -136,32 +136,32 @@ macaddr8_in(PG_FUNCTION_ARGS)
 		switch (count)
 		{
 			case 1:
-a = hex2_to_uchar(ptr, str);
+a = hex2_to_uchar(ptr, str, escontext);
 break;
 			case 2:
-b = hex2_to_uchar(ptr, str);
+b = hex2_to_uchar(ptr, str, escontext);
 break;
 			case 3:
-c = hex2_to_uchar(ptr, str);
+c = hex2_to_uchar(ptr, str, escontext);
 break;
 			case 4:
-d = hex2_to_uchar(ptr, str);
+d = hex2_to_uchar(ptr, str, escontext);
 break;
 			case 5:
-e = hex2_to_uchar(ptr, str);
+e = hex2_to_uchar(ptr, str, escontext);
 break;
 			case 6:
-f = hex2_to_uchar(ptr, str);
+f = hex2_to_uchar(ptr, str, escontext);
 break;
 			case 7:
-g = hex2_to_uchar(ptr, str);
+g = hex2_to_uchar(ptr, str, escontext);
 break;
 			case 8:
-h = hex2_to_uchar(ptr, str);
+h = hex2_to_uchar(ptr, str, 

Re: Error-safe user functions

2022-12-11 Thread Tom Lane
Andres Freund  writes:
> On 2022-12-11 12:24:11 -0500, Tom Lane wrote:
>> I spent some time looking at this, and was discouraged to conclude
>> that the notational mess would probably be substantially out of
>> proportion to the value.  The main problem is that we'd have to change
>> the API of pushJsonbValue, which has more than 150 call sites, most
>> of which would need to grow a new test for failure return.  Maybe
>> somebody will feel like tackling that at some point, but not me today.

> Could we address this more minimally by putting the error state into the
> JsonbParseState and add a check for that error state to convertToJsonb() or
> such (by passing in the JsonbParseState)? We'd need to return immediately in
> pushJsonbValue() if there's already an error, but that that's not too bad.

We could shoehorn error state into the JsonbParseState, although the
fact that that stack normally starts out empty is a bit of a problem.
I think you'd have to push a dummy entry if you want soft errors,
store the error state pointer into that, and have pushState() copy
down the parent's error pointer.  Kind of ugly, but do-able.  Whether
it's better than replacing that argument with a pointer-to-struct-
that-includes-the-stack-and-the-error-pointer wasn't real clear to me.

What seemed like a mess was getting the calling code to quit early.
I'm not convinced that just putting an immediate exit into pushJsonbValue
would be enough, because the callers tend to assume a series of calls
will behave as they expect.  Probably some of the call sites could
ignore the issue, but you'd still end with a lot of messy changes
I fear.

regards, tom lane




Re: Error-safe user functions

2022-12-11 Thread Andres Freund
Hi,

On 2022-12-11 12:24:11 -0500, Tom Lane wrote:
> Andrew Dunstan  writes:
> > On 2022-12-10 Sa 14:38, Tom Lane wrote:
> >> I have not done anything here about errors within JsonbValueToJsonb.
> >> There would need to be another round of API-extension in that area
> >> if we want to be able to trap its errors.  As you say, those are mostly
> >> about exceeding implementation size limits, so I suppose one could argue
> >> that they are not so different from palloc failure.  It's still annoying.
> >> If people are good with the changes attached, I might take a look at
> >> that.
> 
> > Awesome.
> 
> I spent some time looking at this, and was discouraged to conclude
> that the notational mess would probably be substantially out of
> proportion to the value.  The main problem is that we'd have to change
> the API of pushJsonbValue, which has more than 150 call sites, most
> of which would need to grow a new test for failure return.  Maybe
> somebody will feel like tackling that at some point, but not me today.

Could we address this more minimally by putting the error state into the
JsonbParseState and add a check for that error state to convertToJsonb() or
such (by passing in the JsonbParseState)? We'd need to return immediately in
pushJsonbValue() if there's already an error, but that that's not too bad.

Greetings,

Andres Freund




Re: Error-safe user functions

2022-12-11 Thread Tom Lane
Andrew Dunstan  writes:
> Maybe as we work through the remaining input functions (there are about
> 60 core candidates left on my list) we should mark them with a comment
> if no adjustment is needed.

I did a quick pass through them last night.  Assuming that we don't
need to touch the unimplemented input functions (eg for pseudotypes),
I count these core functions as still needing work:

aclitemin
bit_in
box_in
bpcharin
byteain
cash_in
cidin
cidr_in
circle_in
inet_in
int2vectorin
jsonpath_in
line_in
lseg_in
macaddr8_in
macaddr_in
multirange_in
namein
oidin
oidvectorin
path_in
pg_lsn_in
pg_snapshot_in
point_in
poly_in
range_in
regclassin
regcollationin
regconfigin
regdictionaryin
regnamespacein
regoperatorin
regoperin
regprocedurein
regprocin
regrolein
regtypein
tidin
tsqueryin
tsvectorin
uuid_in
varbit_in
varcharin
xid8in
xidin
xml_in

and these contrib functions:

hstore:
hstore_in
intarray:
bqarr_in
isn:
ean13_in
isbn_in
ismn_in
issn_in
upc_in
ltree:
ltree_in
lquery_in
ltxtq_in
seg:
seg_in

Maybe we should have a conversation about which of these are
highest priority to get to a credible feature.  We clearly need
to fix the remaining SQL-spec types (varchar and bpchar, mainly).
At the other extreme, likely nobody would weep if we never fixed
int2vectorin, for instance.

I'm a little concerned about the cost-benefit of fixing the reg* types.
The ones that accept type names actually use the core grammar to parse
those.  Now, we probably could fix the grammar to be non-throwing, but
it'd be very invasive and I'm not sure about the performance impact.
It might be best to content ourselves with soft reporting of lookup
failures, as opposed to syntax problems.

regards, tom lane




Re: Error-safe user functions

2022-12-11 Thread Andrew Dunstan


On 2022-12-11 Su 12:24, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 2022-12-10 Sa 14:38, Tom Lane wrote:
>>> I have not done anything here about errors within JsonbValueToJsonb.
>>> There would need to be another round of API-extension in that area
>>> if we want to be able to trap its errors.  As you say, those are mostly
>>> about exceeding implementation size limits, so I suppose one could argue
>>> that they are not so different from palloc failure.  It's still annoying.
>>> If people are good with the changes attached, I might take a look at
>>> that.
>> Awesome.
> I spent some time looking at this, and was discouraged to conclude
> that the notational mess would probably be substantially out of
> proportion to the value.  The main problem is that we'd have to change
> the API of pushJsonbValue, which has more than 150 call sites, most
> of which would need to grow a new test for failure return.  Maybe
> somebody will feel like tackling that at some point, but not me today.
>
>   


Yes, I had similar feelings when I looked at it. I don't think this
needs to hold up proceeding with the SQL/JSON rework, which I think can
reasonably restart now.

Maybe as we work through the remaining input functions (there are about
60 core candidates left on my list) we should mark them with a comment
if no adjustment is needed.

I'm going to look at jsonpath and the text types next, I somewhat tied
up this week but might get to relook at pushJsonbValue later in the month.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Error-safe user functions

2022-12-11 Thread Tom Lane
Andrew Dunstan  writes:
> On 2022-12-10 Sa 14:38, Tom Lane wrote:
>> I have not done anything here about errors within JsonbValueToJsonb.
>> There would need to be another round of API-extension in that area
>> if we want to be able to trap its errors.  As you say, those are mostly
>> about exceeding implementation size limits, so I suppose one could argue
>> that they are not so different from palloc failure.  It's still annoying.
>> If people are good with the changes attached, I might take a look at
>> that.

> Awesome.

I spent some time looking at this, and was discouraged to conclude
that the notational mess would probably be substantially out of
proportion to the value.  The main problem is that we'd have to change
the API of pushJsonbValue, which has more than 150 call sites, most
of which would need to grow a new test for failure return.  Maybe
somebody will feel like tackling that at some point, but not me today.

regards, tom lane




Re: Error-safe user functions

2022-12-11 Thread Andrew Dunstan


On 2022-12-10 Sa 19:00, Tom Lane wrote:
> Looking at my notes, there's one other loose end
> I forgot to mention:
>
>  * Note: pg_unicode_to_server() will throw an error for a
>  * conversion failure, rather than returning a failure
>  * indication.  That seems OK.
>
> We ought to do something about that, but I'm not sure how hard we
> ought to work at it.  Perhaps it's sufficient to make a variant of
> pg_unicode_to_server that just returns true/false instead of failing,
> and add a JsonParseErrorType for "untranslatable character" to let
> json_errdetail return a reasonably on-point message. 


Seems reasonable.


>  We could imagine
> extending the ErrorSaveContext infrastructure into the encoding
> conversion modules, and maybe at some point that'll be worth doing,
> but in this particular context it doesn't seem like we'd be getting
> a very much better error message.  The main thing that we would get
> from such an extension is a chance to capture the report from
> report_untranslatable_char.  But what that adds is the ability to
> identify exactly which character couldn't be translated --- and in
> this use-case there's always just one.
>
>   


Yeah, probably overkill for now.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Error-safe user functions

2022-12-10 Thread Tom Lane
Andrew Dunstan  writes:
> On 2022-12-10 Sa 14:38, Tom Lane wrote:
>> Seeing that SQL/JSON is one of the major drivers of this whole project,
>> it seemed a little sad to me that jsonb couldn't manage to implement
>> what is required.  So I spent a bit of time poking at it.  Attached
>> is an extended version of your patch that also covers jsonb.

> Many thanks for doing this, it looks good.

Cool, thanks.  Looking at my notes, there's one other loose end
I forgot to mention:

 * Note: pg_unicode_to_server() will throw an error for a
 * conversion failure, rather than returning a failure
 * indication.  That seems OK.

We ought to do something about that, but I'm not sure how hard we
ought to work at it.  Perhaps it's sufficient to make a variant of
pg_unicode_to_server that just returns true/false instead of failing,
and add a JsonParseErrorType for "untranslatable character" to let
json_errdetail return a reasonably on-point message.  We could imagine
extending the ErrorSaveContext infrastructure into the encoding
conversion modules, and maybe at some point that'll be worth doing,
but in this particular context it doesn't seem like we'd be getting
a very much better error message.  The main thing that we would get
from such an extension is a chance to capture the report from
report_untranslatable_char.  But what that adds is the ability to
identify exactly which character couldn't be translated --- and in
this use-case there's always just one.

regards, tom lane




Re: Error-safe user functions

2022-12-10 Thread Andrew Dunstan


On 2022-12-10 Sa 14:38, Tom Lane wrote:
> Andrew Dunstan  writes:
>> OK, json is a fairly easy case, see attached. But jsonb is a different
>> kettle of fish. Both the semantic routines called by the parser and the
>> subsequent call to JsonbValueToJsonb() can raise errors. These are
>> pretty much all about breaking various limits (for strings, objects,
>> arrays). There's also a call to numeric_in, but I assume that a string
>> that's already parsed as a valid json numeric literal won't upset
>> numeric_in.
> Um, nope ...
>
> regression=# select '1e100'::jsonb;
> ERROR:  value overflows numeric format
> LINE 1: select '1e100'::jsonb;
>^


Oops, yeah.


>> Many of these occur several calls down the stack, so
>> adjusting everything to deal with them would be fairly invasive. Perhaps
>> we could instead document that this class of input error won't be
>> trapped, at least for jsonb.
> Seeing that SQL/JSON is one of the major drivers of this whole project,
> it seemed a little sad to me that jsonb couldn't manage to implement
> what is required.  So I spent a bit of time poking at it.  Attached
> is an extended version of your patch that also covers jsonb.
>
> The main thing I soon realized is that the JsonSemAction API is based
> on the assumption that semantic actions will report errors by throwing
> them.  This is a bit schizophrenic considering the parser itself carefully
> hands back error codes instead of throwing anything (excluding palloc
> failures of course).  What I propose in the attached is that we change
> that API so that action functions return JsonParseErrorType, and add
> an enum value denoting "I already logged a suitable error, so you don't
> have to".  It was a little tedious to modify all the existing functions
> that way, but not hard.  Only the ones used by jsonb_in need to do
> anything except "return JSON_SUCCESS", at least for now.


Many thanks for doing this, it looks good.

> I have not done anything here about errors within JsonbValueToJsonb.
> There would need to be another round of API-extension in that area
> if we want to be able to trap its errors.  As you say, those are mostly
> about exceeding implementation size limits, so I suppose one could argue
> that they are not so different from palloc failure.  It's still annoying.
> If people are good with the changes attached, I might take a look at
> that.
>
>   


Awesome.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Error-safe user functions

2022-12-10 Thread Tom Lane
Corey Huinker  writes:
> On Sat, Dec 10, 2022 at 9:20 AM Tom Lane  wrote:
>> A fallback we can offer to anyone with such a problem is "write a
>> plpgsql function and wrap the potentially-failing bit in an exception
>> block".  Then they get to pay the cost of the subtransaction, while
>> we're not imposing one on everybody else.

> That exception block will prevent parallel plans. I'm not saying it isn't
> the best way forward for us, but wanted to make that side effect clear.

Hmm.  Apropos of that, I notice that domain_in is marked PARALLEL SAFE,
which seems like a bad idea if it could invoke not-so-parallel-safe
expressions.  Do we need to mark it less safe, and if so how much less?

Anyway, assuming that people are okay with the Not Our Problem approach,
the patch is pretty trivial, as attached.  I started to write an addition
to the CREATE DOMAIN man page recommending that domain CHECK constraints
not throw errors, but couldn't get past the bare recommendation.  Normally
I'd want to explain such a thing along the lines of "For example, X won't
work" ... but we don't yet have any committed features that depend on
this.  I'm inclined to leave it like that for now.  If we don't remember
to fix it once we do have some features, I'm sure somebody will ask a
question about it eventually.

regards, tom lane

diff --git a/doc/src/sgml/ref/create_domain.sgml b/doc/src/sgml/ref/create_domain.sgml
index 82a0b87492..73f9f28d6c 100644
--- a/doc/src/sgml/ref/create_domain.sgml
+++ b/doc/src/sgml/ref/create_domain.sgml
@@ -239,6 +239,11 @@ INSERT INTO tab (domcol) VALUES ((SELECT domcol FROM tab WHERE false));
DOMAIN), adjust the function definition, and re-add the
constraint, thereby rechecking it against stored data.
   
+
+  
+   It's also good practice to ensure that domain CHECK
+   expressions will not throw errors.
+  
  
 
  
diff --git a/src/backend/utils/adt/domains.c b/src/backend/utils/adt/domains.c
index 3de0cb01a2..99aeaddb5d 100644
--- a/src/backend/utils/adt/domains.c
+++ b/src/backend/utils/adt/domains.c
@@ -126,9 +126,14 @@ domain_state_setup(Oid domainType, bool binary, MemoryContext mcxt)
  * This is roughly similar to the handling of CoerceToDomain nodes in
  * execExpr*.c, but we execute each constraint separately, rather than
  * compiling them in-line within a larger expression.
+ *
+ * If escontext points to an ErrorStateContext, any failures are reported
+ * there, otherwise they are ereport'ed.  Note that we do not attempt to do
+ * soft reporting of errors raised during execution of CHECK constraints.
  */
 static void
-domain_check_input(Datum value, bool isnull, DomainIOData *my_extra)
+domain_check_input(Datum value, bool isnull, DomainIOData *my_extra,
+   Node *escontext)
 {
 	ExprContext *econtext = my_extra->econtext;
 	ListCell   *l;
@@ -144,11 +149,14 @@ domain_check_input(Datum value, bool isnull, DomainIOData *my_extra)
 		{
 			case DOM_CONSTRAINT_NOTNULL:
 if (isnull)
-	ereport(ERROR,
+{
+	errsave(escontext,
 			(errcode(ERRCODE_NOT_NULL_VIOLATION),
 			 errmsg("domain %s does not allow null values",
 	format_type_be(my_extra->domain_type)),
 			 errdatatype(my_extra->domain_type)));
+	goto fail;
+}
 break;
 			case DOM_CONSTRAINT_CHECK:
 {
@@ -179,13 +187,16 @@ domain_check_input(Datum value, bool isnull, DomainIOData *my_extra)
 	econtext->domainValue_isNull = isnull;
 
 	if (!ExecCheck(con->check_exprstate, econtext))
-		ereport(ERROR,
+	{
+		errsave(escontext,
 (errcode(ERRCODE_CHECK_VIOLATION),
  errmsg("value for domain %s violates check constraint \"%s\"",
 		format_type_be(my_extra->domain_type),
 		con->name),
  errdomainconstraint(my_extra->domain_type,
 	 con->name)));
+		goto fail;
+	}
 	break;
 }
 			default:
@@ -200,6 +211,7 @@ domain_check_input(Datum value, bool isnull, DomainIOData *my_extra)
 	 * per-tuple memory.  This avoids leaking non-memory resources, if
 	 * anything in the expression(s) has any.
 	 */
+fail:
 	if (econtext)
 		ReScanExprContext(econtext);
 }
@@ -213,6 +225,7 @@ domain_in(PG_FUNCTION_ARGS)
 {
 	char	   *string;
 	Oid			domainType;
+	Node	   *escontext = fcinfo->context;
 	DomainIOData *my_extra;
 	Datum		value;
 
@@ -245,15 +258,18 @@ domain_in(PG_FUNCTION_ARGS)
 	/*
 	 * Invoke the base type's typinput procedure to convert the data.
 	 */
-	value = InputFunctionCall(_extra->proc,
-			  string,
-			  my_extra->typioparam,
-			  my_extra->typtypmod);
+	if (!InputFunctionCallSafe(_extra->proc,
+			   string,
+			   my_extra->typioparam,
+			   my_extra->typtypmod,
+			   escontext,
+			   ))
+		PG_RETURN_NULL();
 
 	/*
 	 * Do the necessary checks to ensure it's a valid domain value.
 	 */
-	domain_check_input(value, (string == NULL), my_extra);
+	domain_check_input(value, (string == NULL), my_extra, escontext);
 
 	if (string == 

Re: Error-safe user functions

2022-12-10 Thread Tom Lane
Andrew Dunstan  writes:
> OK, json is a fairly easy case, see attached. But jsonb is a different
> kettle of fish. Both the semantic routines called by the parser and the
> subsequent call to JsonbValueToJsonb() can raise errors. These are
> pretty much all about breaking various limits (for strings, objects,
> arrays). There's also a call to numeric_in, but I assume that a string
> that's already parsed as a valid json numeric literal won't upset
> numeric_in.

Um, nope ...

regression=# select '1e100'::jsonb;
ERROR:  value overflows numeric format
LINE 1: select '1e100'::jsonb;
   ^

> Many of these occur several calls down the stack, so
> adjusting everything to deal with them would be fairly invasive. Perhaps
> we could instead document that this class of input error won't be
> trapped, at least for jsonb.

Seeing that SQL/JSON is one of the major drivers of this whole project,
it seemed a little sad to me that jsonb couldn't manage to implement
what is required.  So I spent a bit of time poking at it.  Attached
is an extended version of your patch that also covers jsonb.

The main thing I soon realized is that the JsonSemAction API is based
on the assumption that semantic actions will report errors by throwing
them.  This is a bit schizophrenic considering the parser itself carefully
hands back error codes instead of throwing anything (excluding palloc
failures of course).  What I propose in the attached is that we change
that API so that action functions return JsonParseErrorType, and add
an enum value denoting "I already logged a suitable error, so you don't
have to".  It was a little tedious to modify all the existing functions
that way, but not hard.  Only the ones used by jsonb_in need to do
anything except "return JSON_SUCCESS", at least for now.

(I wonder if pg_verifybackup's parse_manifest.c could use a second
look at how it's handling errors, given this API.  I didn't study it
closely.)

I have not done anything here about errors within JsonbValueToJsonb.
There would need to be another round of API-extension in that area
if we want to be able to trap its errors.  As you say, those are mostly
about exceeding implementation size limits, so I suppose one could argue
that they are not so different from palloc failure.  It's still annoying.
If people are good with the changes attached, I might take a look at
that.

regards, tom lane

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index fee2ffb55c..e6896eccfe 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -81,9 +81,10 @@ json_in(PG_FUNCTION_ARGS)
 
 	/* validate it */
 	lex = makeJsonLexContext(result, false);
-	pg_parse_json_or_ereport(lex, );
+	if (!pg_parse_json_or_errsave(lex, , fcinfo->context))
+		PG_RETURN_NULL();
 
-	/* Internal representation is the same as text, for now */
+	/* Internal representation is the same as text */
 	PG_RETURN_TEXT_P(result);
 }
 
@@ -1337,7 +1338,7 @@ json_typeof(PG_FUNCTION_ARGS)
 	/* Lex exactly one token from the input and check its type. */
 	result = json_lex(lex);
 	if (result != JSON_SUCCESS)
-		json_ereport_error(result, lex);
+		json_errsave_error(result, lex, NULL);
 	tok = lex->token_type;
 	switch (tok)
 	{
diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index 9e14922ec2..7c1e5e6144 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -33,6 +33,7 @@ typedef struct JsonbInState
 {
 	JsonbParseState *parseState;
 	JsonbValue *res;
+	Node	   *escontext;
 } JsonbInState;
 
 /* unlike with json categories, we need to treat json and jsonb differently */
@@ -61,15 +62,15 @@ typedef struct JsonbAggState
 	Oid			val_output_func;
 } JsonbAggState;
 
-static inline Datum jsonb_from_cstring(char *json, int len);
-static size_t checkStringLen(size_t len);
-static void jsonb_in_object_start(void *pstate);
-static void jsonb_in_object_end(void *pstate);
-static void jsonb_in_array_start(void *pstate);
-static void jsonb_in_array_end(void *pstate);
-static void jsonb_in_object_field_start(void *pstate, char *fname, bool isnull);
+static inline Datum jsonb_from_cstring(char *json, int len, Node *escontext);
+static bool checkStringLen(size_t len, Node *escontext);
+static JsonParseErrorType jsonb_in_object_start(void *pstate);
+static JsonParseErrorType jsonb_in_object_end(void *pstate);
+static JsonParseErrorType jsonb_in_array_start(void *pstate);
+static JsonParseErrorType jsonb_in_array_end(void *pstate);
+static JsonParseErrorType jsonb_in_object_field_start(void *pstate, char *fname, bool isnull);
 static void jsonb_put_escaped_value(StringInfo out, JsonbValue *scalarVal);
-static void jsonb_in_scalar(void *pstate, char *token, JsonTokenType tokentype);
+static JsonParseErrorType jsonb_in_scalar(void *pstate, char *token, JsonTokenType tokentype);
 static void jsonb_categorize_type(Oid typoid,
   JsonbTypeCategory *tcategory,
   Oid 

Re: Error-safe user functions

2022-12-10 Thread Corey Huinker
On Sat, Dec 10, 2022 at 9:20 AM Tom Lane  wrote:

> Alvaro Herrera  writes:
> > On 2022-Dec-09, Tom Lane wrote:
> >> ...  So I think it might be
> >> okay to say "if you want soft error treatment for a domain,
> >> make sure its check constraints don't throw errors".
>
> > I think that's fine.  If the user does, say "CHECK (value > 0)" and that
> > results in a soft error, that seems to me enough support for now.  If
> > they want to do something more elaborate, they can write C functions.
> > Maybe eventually we'll want to offer some other mechanism that doesn't
> > require C, but let's figure out what the requirements are.  I don't
> > think we know that, at this point.
>
> A fallback we can offer to anyone with such a problem is "write a
> plpgsql function and wrap the potentially-failing bit in an exception
> block".  Then they get to pay the cost of the subtransaction, while
> we're not imposing one on everybody else.
>
> regards, tom lane
>

That exception block will prevent parallel plans. I'm not saying it isn't
the best way forward for us, but wanted to make that side effect clear.


Re: Error-safe user functions

2022-12-10 Thread Pavel Stehule
so 10. 12. 2022 v 15:35 odesílatel Andrew Dunstan 
napsal:

>
> On 2022-12-09 Fr 10:37, Andrew Dunstan wrote:
> > I am currently looking at the json types. I think that will be enough to
> > let us rework the sql/json patches as discussed a couple of months ago.
> >
>
> OK, json is a fairly easy case, see attached. But jsonb is a different
> kettle of fish. Both the semantic routines called by the parser and the
> subsequent call to JsonbValueToJsonb() can raise errors. These are
> pretty much all about breaking various limits (for strings, objects,
> arrays). There's also a call to numeric_in, but I assume that a string
> that's already parsed as a valid json numeric literal won't upset
> numeric_in. Many of these occur several calls down the stack, so
> adjusting everything to deal with them would be fairly invasive. Perhaps
> we could instead document that this class of input error won't be
> trapped, at least for jsonb. We could still test for well-formed jsonb
> input, just as I propose for json. That means that we would not be able
> to trap one of these errors in the ON ERROR clause of JSON_TABLE. I
> think we can probably live with that.
>
> Thoughts?
>

+1

Pavel



>
>
> cheers
>
>
> andrew
>
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>


Re: Error-safe user functions

2022-12-10 Thread Andrew Dunstan

On 2022-12-09 Fr 10:37, Andrew Dunstan wrote:
> I am currently looking at the json types. I think that will be enough to
> let us rework the sql/json patches as discussed a couple of months ago.
>

OK, json is a fairly easy case, see attached. But jsonb is a different
kettle of fish. Both the semantic routines called by the parser and the
subsequent call to JsonbValueToJsonb() can raise errors. These are
pretty much all about breaking various limits (for strings, objects,
arrays). There's also a call to numeric_in, but I assume that a string
that's already parsed as a valid json numeric literal won't upset
numeric_in. Many of these occur several calls down the stack, so
adjusting everything to deal with them would be fairly invasive. Perhaps
we could instead document that this class of input error won't be
trapped, at least for jsonb. We could still test for well-formed jsonb
input, just as I propose for json. That means that we would not be able
to trap one of these errors in the ON ERROR clause of JSON_TABLE. I
think we can probably live with that.

Thoughts?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com
From 945ede5fa99f5aa9fb0740fe04303f37aa511528 Mon Sep 17 00:00:00 2001
From: Andrew Dunstan 
Date: Sat, 10 Dec 2022 08:18:57 -0500
Subject: [PATCH] adjustments for json_in

---
 src/backend/utils/adt/json.c   |  7 ---
 src/backend/utils/adt/jsonfuncs.c  | 26 +-
 src/include/utils/jsonfuncs.h  | 14 +-
 src/test/regress/expected/json.out | 19 +++
 src/test/regress/sql/json.sql  |  5 +
 5 files changed, 54 insertions(+), 17 deletions(-)

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index fee2ffb55c..d5c48c99c3 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -81,9 +81,10 @@ json_in(PG_FUNCTION_ARGS)
 
 	/* validate it */
 	lex = makeJsonLexContext(result, false);
-	pg_parse_json_or_ereport(lex, );
+	if ( ! pg_parse_json_or_errsave(lex, , fcinfo->context))
+		PG_RETURN_NULL();
 
-	/* Internal representation is the same as text, for now */
+	/* Internal representation is the same as text */
 	PG_RETURN_TEXT_P(result);
 }
 
@@ -1337,7 +1338,7 @@ json_typeof(PG_FUNCTION_ARGS)
 	/* Lex exactly one token from the input and check its type. */
 	result = json_lex(lex);
 	if (result != JSON_SUCCESS)
-		json_ereport_error(result, lex);
+		json_errsave_error(result, lex, NULL);
 	tok = lex->token_type;
 	switch (tok)
 	{
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index bfc3f02a86..1a3cec59cb 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -490,21 +490,28 @@ static void transform_string_values_object_field_start(void *state, char *fname,
 static void transform_string_values_array_element_start(void *state, bool isnull);
 static void transform_string_values_scalar(void *state, char *token, JsonTokenType tokentype);
 
+
 /*
- * pg_parse_json_or_ereport
+ * pg_parse_json_or_errsave
  *
  * This function is like pg_parse_json, except that it does not return a
  * JsonParseErrorType. Instead, in case of any failure, this function will
- * ereport(ERROR).
+ * call errsave(escontext), which will call ereport(ERROR) if escontext is NULL.
+ * Otherwise, returns  a boolean indicating success or failure.
  */
-void
-pg_parse_json_or_ereport(JsonLexContext *lex, JsonSemAction *sem)
+bool
+pg_parse_json_or_errsave(JsonLexContext *lex, JsonSemAction *sem,
+		 Node *escontext)
 {
 	JsonParseErrorType result;
 
 	result = pg_parse_json(lex, sem);
 	if (result != JSON_SUCCESS)
-		json_ereport_error(result, lex);
+	{
+		json_errsave_error(result, lex, escontext);
+		return false;
+	}
+	return true;
 }
 
 /*
@@ -608,17 +615,18 @@ jsonb_object_keys(PG_FUNCTION_ARGS)
  * Report a JSON error.
  */
 void
-json_ereport_error(JsonParseErrorType error, JsonLexContext *lex)
+json_errsave_error(JsonParseErrorType error, JsonLexContext *lex,
+   Node *escontext)
 {
 	if (error == JSON_UNICODE_HIGH_ESCAPE ||
 		error == JSON_UNICODE_CODE_POINT_ZERO)
-		ereport(ERROR,
+		errsave(escontext,
 (errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
  errmsg("unsupported Unicode escape sequence"),
  errdetail_internal("%s", json_errdetail(error, lex)),
  report_json_context(lex)));
 	else
-		ereport(ERROR,
+		errsave(escontext,
 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
  errmsg("invalid input syntax for type %s", "json"),
  errdetail_internal("%s", json_errdetail(error, lex)),
@@ -1260,7 +1268,7 @@ get_array_start(void *state)
 
 			error = json_count_array_elements(_state->lex, );
 			if (error != JSON_SUCCESS)
-json_ereport_error(error, _state->lex);
+json_errsave_error(error, _state->lex, NULL);
 
 			if (-_state->path_indexes[lex_level] <= nelements)
 _state->path_indexes[lex_level] += nelements;
diff --git a/src/include/utils/jsonfuncs.h b/src/include/utils/jsonfuncs.h

Re: Error-safe user functions

2022-12-10 Thread Tom Lane
Alvaro Herrera  writes:
> On 2022-Dec-09, Tom Lane wrote:
>> ...  So I think it might be
>> okay to say "if you want soft error treatment for a domain,
>> make sure its check constraints don't throw errors".

> I think that's fine.  If the user does, say "CHECK (value > 0)" and that
> results in a soft error, that seems to me enough support for now.  If
> they want to do something more elaborate, they can write C functions.
> Maybe eventually we'll want to offer some other mechanism that doesn't
> require C, but let's figure out what the requirements are.  I don't
> think we know that, at this point.

A fallback we can offer to anyone with such a problem is "write a
plpgsql function and wrap the potentially-failing bit in an exception
block".  Then they get to pay the cost of the subtransaction, while
we're not imposing one on everybody else.

regards, tom lane




Re: Error-safe user functions

2022-12-10 Thread Alvaro Herrera
On 2022-Dec-09, Tom Lane wrote:

> I think though that it might be okay to just define this as
> Not Our Problem.  Although we don't seem to try to enforce it,
> non-immutable domain check constraints are strongly deprecated
> (the CREATE DOMAIN man page says that we assume immutability).
> And not throwing errors is something that we usually consider
> should ride along with immutability.  So I think it might be
> okay to say "if you want soft error treatment for a domain,
> make sure its check constraints don't throw errors".

I think that's fine.  If the user does, say "CHECK (value > 0)" and that
results in a soft error, that seems to me enough support for now.  If
they want to do something more elaborate, they can write C functions.
Maybe eventually we'll want to offer some other mechanism that doesn't
require C, but let's figure out what the requirements are.  I don't
think we know that, at this point.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Estoy de acuerdo contigo en que la verdad absoluta no existe...
El problema es que la mentira sí existe y tu estás mintiendo" (G. Lama)




Re: Error-safe user functions

2022-12-09 Thread Tom Lane
Andrew Dunstan  writes:
> On 2022-12-09 Fr 10:16, Tom Lane wrote:
>> As I said, I'll take a look at the datetime area.  Do we
>> have any volunteers for other input functions?

> I am currently looking at the json types. I think that will be enough to
> let us rework the sql/json patches as discussed a couple of months ago.

Cool.  I've finished up what I wanted to do with the datetime code.

It occurred to me that we're going to have a bit of a problem
with domain_in.  We can certainly make it pass back any soft
errors from the underlying type's input function, and we can
make it return a soft error if a domain constraint evaluates
to false.  However, what happens if some function in a check
constraint throws an error?  Our only hope of trapping that,
given that it's a general user-defined expression, would be
a subtransaction.  Which is exactly what we don't want here.

I think though that it might be okay to just define this as
Not Our Problem.  Although we don't seem to try to enforce it,
non-immutable domain check constraints are strongly deprecated
(the CREATE DOMAIN man page says that we assume immutability).
And not throwing errors is something that we usually consider
should ride along with immutability.  So I think it might be
okay to say "if you want soft error treatment for a domain,
make sure its check constraints don't throw errors".

Thoughts?

regards, tom lane




Re: Error-safe user functions

2022-12-09 Thread Corey Huinker
On Fri, Dec 9, 2022 at 11:17 AM Amul Sul  wrote:

> On Fri, Dec 9, 2022 at 9:08 PM Andrew Dunstan  wrote:
> >
> >
> > On 2022-12-09 Fr 10:16, Tom Lane wrote:
> > > Andrew Dunstan  writes:
> > >> On 2022-12-08 Th 21:59, Tom Lane wrote:
> > >>> Yeah, I was planning to take a look at that before walking away from
> > >>> this stuff.  (I'm sure not volunteering to convert ALL the input
> > >>> functions, but I'll do the datetime code.)
> > >> Awesome. Perhaps if there are no more comments you can commit what you
> > >> currently have so people can start work on other input functions.
> > > Pushed.
> >
> >
> > Great!
> >
> >
> > > As I said, I'll take a look at the datetime area.  Do we
> > > have any volunteers for other input functions?
> > >
> > >
> >
> >
> > I am currently looking at the json types. I think that will be enough to
> > let us rework the sql/json patches as discussed a couple of months ago.
> >
>
> I will pick a few other input functions, thanks.
>
> Regards,
> Amul
>

I can do a few as well, as I need them done for the CAST With Default
effort.

Amul, please let me know which ones you pick so we don't duplicate work.


Re: Error-safe user functions

2022-12-09 Thread Amul Sul
On Fri, Dec 9, 2022 at 9:08 PM Andrew Dunstan  wrote:
>
>
> On 2022-12-09 Fr 10:16, Tom Lane wrote:
> > Andrew Dunstan  writes:
> >> On 2022-12-08 Th 21:59, Tom Lane wrote:
> >>> Yeah, I was planning to take a look at that before walking away from
> >>> this stuff.  (I'm sure not volunteering to convert ALL the input
> >>> functions, but I'll do the datetime code.)
> >> Awesome. Perhaps if there are no more comments you can commit what you
> >> currently have so people can start work on other input functions.
> > Pushed.
>
>
> Great!
>
>
> > As I said, I'll take a look at the datetime area.  Do we
> > have any volunteers for other input functions?
> >
> >
>
>
> I am currently looking at the json types. I think that will be enough to
> let us rework the sql/json patches as discussed a couple of months ago.
>

I will pick a few other input functions, thanks.

Regards,
Amul




Re: Error-safe user functions

2022-12-09 Thread Andrew Dunstan


On 2022-12-09 Fr 10:16, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 2022-12-08 Th 21:59, Tom Lane wrote:
>>> Yeah, I was planning to take a look at that before walking away from
>>> this stuff.  (I'm sure not volunteering to convert ALL the input
>>> functions, but I'll do the datetime code.)
>> Awesome. Perhaps if there are no more comments you can commit what you
>> currently have so people can start work on other input functions.
> Pushed.  


Great!


> As I said, I'll take a look at the datetime area.  Do we
> have any volunteers for other input functions?
>
>   


I am currently looking at the json types. I think that will be enough to
let us rework the sql/json patches as discussed a couple of months ago.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Error-safe user functions

2022-12-09 Thread Tom Lane
Andrew Dunstan  writes:
> On 2022-12-08 Th 21:59, Tom Lane wrote:
>> Yeah, I was planning to take a look at that before walking away from
>> this stuff.  (I'm sure not volunteering to convert ALL the input
>> functions, but I'll do the datetime code.)

> Awesome. Perhaps if there are no more comments you can commit what you
> currently have so people can start work on other input functions.

Pushed.  As I said, I'll take a look at the datetime area.  Do we
have any volunteers for other input functions?

regards, tom lane




Re: Error-safe user functions

2022-12-09 Thread Andrew Dunstan


On 2022-12-08 Th 21:59, Tom Lane wrote:
> Andres Freund  writes:
>> On 2022-12-08 17:57:09 -0500, Tom Lane wrote:
>>> Given that this additional experimentation didn't find any holes
>>> in the API design, I think this is pretty much ready to go.
>> One interesting area is timestamp / datetime related code. There's been some
>> past efforts in the area, mostly in 5bc450629b3. See the RETURN_ERROR macro 
>> in
>> formatting.c.
>> This is not directly about type input functions, but it looks to me that the
>> functionality in the patchset should work.
> Yeah, I was planning to take a look at that before walking away from
> this stuff.  (I'm sure not volunteering to convert ALL the input
> functions, but I'll do the datetime code.)
>

Awesome. Perhaps if there are no more comments you can commit what you
currently have so people can start work on other input functions.


Thanks for your work on this.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Error-safe user functions

2022-12-08 Thread Tom Lane
Andres Freund  writes:
> On 2022-12-08 17:57:09 -0500, Tom Lane wrote:
>> Given that this additional experimentation didn't find any holes
>> in the API design, I think this is pretty much ready to go.

> One interesting area is timestamp / datetime related code. There's been some
> past efforts in the area, mostly in 5bc450629b3. See the RETURN_ERROR macro in
> formatting.c.
> This is not directly about type input functions, but it looks to me that the
> functionality in the patchset should work.

Yeah, I was planning to take a look at that before walking away from
this stuff.  (I'm sure not volunteering to convert ALL the input
functions, but I'll do the datetime code.)

You're right that formatting.c is doing stuff that's not exactly
an input function, but I don't see why we can't apply the same
API concepts to it.

regards, tom lane




Re: Error-safe user functions

2022-12-08 Thread Andres Freund
Hi,

On 2022-12-08 17:57:09 -0500, Tom Lane wrote:
> Given that this additional experimentation didn't find any holes
> in the API design, I think this is pretty much ready to go.

One interesting area is timestamp / datetime related code. There's been some
past efforts in the area, mostly in 5bc450629b3. See the RETURN_ERROR macro in
formatting.c.

This is not directly about type input functions, but it looks to me that the
functionality in the patchset should work.

I certainly have the hope that it'll make the code look a bit less ugly...


It looks like a fair bit of work to convert this code, so I don't think we
should tie converting formatting.c to the patchset. But it might be a good
idea for Tom to skim the code to see whether there's any things impacting the
design.

Greetings,

Andres Freund




Re: Error-safe user functions

2022-12-08 Thread Andrew Dunstan


On 2022-12-08 Th 17:57, Tom Lane wrote:
> Andres Freund  writes:
>> On 2022-12-08 16:00:10 -0500, Robert Haas wrote:
>>> Yes, I think just putting "struct Node;" in as many places as
>>> necessary is the way to go. Or even:
>> +1
> OK, here's a v5 that does it like that.
>
> I've spent a little time pushing ahead on other input functions,
> and realized that my original plan to require a pre-encoded typmod
> for these test functions was not very user-friendly.  So in v5
> you can write something like
>
> pg_input_is_valid('1234.567', 'numeric(7,4)')
>
> 0004 attached finishes up the remaining core numeric datatypes
> (int*, float*, numeric).  I ripped out float8in_internal_opt_error
> in favor of a function that uses the new APIs.


Great, that takes care of some of the relatively urgent work.


>
> 0005 converts contrib/cube, which I chose to tackle partly because
> I'd already touched it in 0004, partly because it seemed like a
> good idea to verify that extension modules wouldn't have any
> problems with this apprach, and partly because I wondered whether
> an input function that uses a Bison/Flex parser would have big
> problems getting converted.  This one didn't, anyway.


Cool


>
> Given that this additional experimentation didn't find any holes
> in the API design, I think this is pretty much ready to go.
>
>   


I will look in more detail tomorrow, but it LGTM on a quick look.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Error-safe user functions

2022-12-08 Thread Tom Lane
Andres Freund  writes:
> On 2022-12-08 16:00:10 -0500, Robert Haas wrote:
>> Yes, I think just putting "struct Node;" in as many places as
>> necessary is the way to go. Or even:

> +1

OK, here's a v5 that does it like that.

I've spent a little time pushing ahead on other input functions,
and realized that my original plan to require a pre-encoded typmod
for these test functions was not very user-friendly.  So in v5
you can write something like

pg_input_is_valid('1234.567', 'numeric(7,4)')

0004 attached finishes up the remaining core numeric datatypes
(int*, float*, numeric).  I ripped out float8in_internal_opt_error
in favor of a function that uses the new APIs.

0005 converts contrib/cube, which I chose to tackle partly because
I'd already touched it in 0004, partly because it seemed like a
good idea to verify that extension modules wouldn't have any
problems with this apprach, and partly because I wondered whether
an input function that uses a Bison/Flex parser would have big
problems getting converted.  This one didn't, anyway.

Given that this additional experimentation didn't find any holes
in the API design, I think this is pretty much ready to go.

regards, tom lane

diff --git a/doc/src/sgml/ref/create_type.sgml b/doc/src/sgml/ref/create_type.sgml
index 693423e524..994dfc6526 100644
--- a/doc/src/sgml/ref/create_type.sgml
+++ b/doc/src/sgml/ref/create_type.sgml
@@ -900,6 +900,17 @@ CREATE TYPE name
function is written in C.
   
 
+  
+   In PostgreSQL version 16 and later,
+   it is desirable for base types' input functions to
+   return soft errors using the
+   new errsave()/ereturn()
+   mechanism, rather than throwing ereport()
+   exceptions as in previous versions.
+   See src/backend/utils/fmgr/README for more
+   information.
+  
+
  
 
  
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index 4368c30fdb..7c594be583 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -56,6 +56,7 @@ node_headers = \
 	nodes/bitmapset.h \
 	nodes/extensible.h \
 	nodes/lockoptions.h \
+	nodes/miscnodes.h \
 	nodes/replnodes.h \
 	nodes/supportnodes.h \
 	nodes/value.h \
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index 7212bc486f..08992dfd47 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -68,6 +68,7 @@ my @all_input_files = qw(
   nodes/bitmapset.h
   nodes/extensible.h
   nodes/lockoptions.h
+  nodes/miscnodes.h
   nodes/replnodes.h
   nodes/supportnodes.h
   nodes/value.h
@@ -89,6 +90,7 @@ my @nodetag_only_files = qw(
   executor/tuptable.h
   foreign/fdwapi.h
   nodes/lockoptions.h
+  nodes/miscnodes.h
   nodes/replnodes.h
   nodes/supportnodes.h
 );
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index f5cd1b7493..eb489ea3a7 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -71,6 +71,7 @@
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
+#include "nodes/miscnodes.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/bgworker.h"
@@ -611,6 +612,128 @@ errfinish(const char *filename, int lineno, const char *funcname)
 	CHECK_FOR_INTERRUPTS();
 }
 
+
+/*
+ * errsave_start --- begin a "soft" error-reporting cycle
+ *
+ * If "context" isn't an ErrorSaveContext node, this behaves as
+ * errstart(ERROR, domain), and the errsave() macro ends up acting
+ * exactly like ereport(ERROR, ...).
+ *
+ * If "context" is an ErrorSaveContext node, but the node creator only wants
+ * notification of the fact of a soft error without any details, we just set
+ * the error_occurred flag in the ErrorSaveContext node and return false,
+ * which will cause us to skip the remaining error processing steps.
+ *
+ * Otherwise, create and initialize error stack entry and return true.
+ * Subsequently, errmsg() and perhaps other routines will be called to further
+ * populate the stack entry.  Finally, errsave_finish() will be called to
+ * tidy up.
+ */
+bool
+errsave_start(struct Node *context, const char *domain)
+{
+	ErrorSaveContext *escontext;
+	ErrorData  *edata;
+
+	/*
+	 * Do we have a context for soft error reporting?  If not, just punt to
+	 * errstart().
+	 */
+	if (context == NULL || !IsA(context, ErrorSaveContext))
+		return errstart(ERROR, domain);
+
+	/* Report that a soft error was detected */
+	escontext = (ErrorSaveContext *) context;
+	escontext->error_occurred = true;
+
+	/* Nothing else to do if caller wants no further details */
+	if (!escontext->details_wanted)
+		return false;
+
+	/*
+	 * Okay, crank up a stack entry to store the info in.
+	 */
+
+	recursion_depth++;
+
+	/* Initialize data for this error frame */
+	edata = get_error_stack_entry();
+	edata->elevel = LOG;		/* signal all is well to errsave_finish */
+	set_stack_entry_domain(edata, domain);
+	/* Select default errcode based on the assumed elevel of ERROR */
+	

Re: Error-safe user functions

2022-12-08 Thread Andres Freund
Hi,

On 2022-12-08 16:00:10 -0500, Robert Haas wrote:
> On Thu, Dec 8, 2022 at 11:32 AM Tom Lane  wrote:
> > If we go with "struct Node *" then we can solve such problems by
> > just repeating "struct Node;" forward-declarations in as many
> > headers as we have to.
> 
> Yes, I think just putting "struct Node;" in as many places as
> necessary is the way to go. Or even:

+1


> struct Node;
> typedef struct Node Node;

That doesn't work well, because C99 doesn't allow typedefs to be redeclared in
the same scope. IIRC C11 added suppport for it, and a lot of compilers already
supported it before.

Greetings,

Andres Freund




Re: Error-safe user functions

2022-12-08 Thread Robert Haas
On Thu, Dec 8, 2022 at 11:32 AM Tom Lane  wrote:
> If we go with "struct Node *" then we can solve such problems by
> just repeating "struct Node;" forward-declarations in as many
> headers as we have to.

Yes, I think just putting "struct Node;" in as many places as
necessary is the way to go. Or even:

struct Node;
typedef struct Node Node;

which I think then allows for Node * to be used later.

A small problem with typedef struct Something *SomethingElse is that
it can get hard to keep track of whether some identifier is a pointer
to a struct or just a struct. This doesn't bother me as much as it
does some other hackers, from what I gather anyway, but I think we
should be pretty judicious in using typedef that way. "SomethingPtr"
really has no advantage over "Something *". It is neither shorter nor
clearer.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Error-safe user functions

2022-12-08 Thread Tom Lane
Andres Freund  writes:
> On 2022-12-07 17:32:21 -0500, Tom Lane wrote:
>> +typedef struct Node *NodePtr;

> Seems like it'd be easier to just forward declare the struct, and use the
> non-typedef'ed name in the header than to have to deal with these
> interdependencies and the differing typenames.

I've been having second thoughts about how to handle this issue.
As we convert more and more datatypes, references to "Node *" are
going to be needed in assorted headers that don't currently have
any reason to #include nodes.h.  Rather than bloating their include
footprints, we'll want to use the alternate spelling, whichever
it is.  (I already had to do this in array.h.)  Some of these headers
might be things that are also read by frontend compiles, in which
case they won't have access to elog.h either, so that NodePtr in
this formulation won't work for them.  (I ran into a variant of that
with an early draft of this patch series.)

If we stick with NodePtr we'll probably end by putting that typedef
into c.h so that it's accessible in frontend as well as backend.
I don't have a huge problem with that, but I concede it's a little ugly.

If we go with "struct Node *" then we can solve such problems by
just repeating "struct Node;" forward-declarations in as many
headers as we have to.  This is a bit ugly too, but maybe less so,
and it's a method we use elsewhere.  The main downside I can see
to it is that we will probably not find out all the places where
we need such declarations until we get field complaints that
"header X doesn't compile for me".  elog.h will have a struct Node
declaration, and that will be visible in every backend compilation
we do as well as every cpluspluscheck/headerscheck test.

Another notational point I'm wondering about is whether we want
to create hundreds of direct references to fcinfo->context.
Is it time to invent

#define PG_GET_CONTEXT()(fcinfo->context)

and write that instead in all these input functions?

Thoughts?

regards, tom lane




Re: Error-safe user functions

2022-12-08 Thread Alvaro Herrera
On 2022-Dec-07, David G. Johnston wrote:

> Are you suggesting we should not go down the path that v8-0003 does in the
> monitoring section cleanup thread?  I find the usability of Chapter 54
> System Views to be superior to these two run-on chapters and would rather
> we emulate it in both these places - for what is in the end very little
> additional effort, all mechanical in nature.

I think the new 9.26 is much better now than what we had there two days
ago.  Maybe it would be even better with your proposed changes, but
let's see what you come up with.

As for Chapter 54, while it's a lot better than what we had previously,
I have a complaint about the new presentation: the overview table
appears (at least in the HTML presentation) in a separate page from the
initial page of the chapter.  So to get the intended table of contents I
have to move forward from the unintended table of contents (i.e. from
https://www.postgresql.org/docs/devel/views.html forward to
https://www.postgresql.org/docs/devel/views-overview.html ).  This seems
pointless.  I think it would be better if we just removed the line
, which would put that table in the "front page".

I also have an issue with Chapter 28, more precisely 28.2.2, where we
have a similar TOC-style tables (Tables 28.1 and 28.2), but these ones
seem inferior to the new table in Chapter 54 in that the outgoing links
are in random positions in the text of the table.  It would be better to
put those in a column of their own, so that they are all vertically
aligned and easier to spot/click.  Not sure if you've been here already.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"En las profundidades de nuestro inconsciente hay una obsesiva necesidad
de un universo lógico y coherente. Pero el universo real se halla siempre
un paso más allá de la lógica" (Irulan)




Re: Error-safe user functions

2022-12-07 Thread Corey Huinker
On Wed, Dec 7, 2022 at 12:17 PM Tom Lane  wrote:

> Corey Huinker  writes:
> > In my attempt to implement CAST...DEFAULT, I noticed that I immediately
> > needed an
> > OidInputFunctionCallSafe, which was trivial but maybe something we want
> to
> > add to the infra patch, but the comments around that function also
> somewhat
> > indicate that we might want to just do the work in-place and call
> > InputFunctionCallSafe directly. Open to both ideas.
>
> I'm a bit skeptical of that.  IMO using OidInputFunctionCall is only
> appropriate in places that will be executed just once per query.
>

That is what's happening when the expr of the existing CAST ( expr AS
typename ) is a constant and we want to just resolve the constant at parse
time.


>
>


Re: Error-safe user functions

2022-12-07 Thread Tom Lane
Andres Freund  writes:
> I wonder if there are potential use-cases for levels other than ERROR. I can
> potentially see us wanting to defer some FATALs, e.g. when they occur in
> process exit hooks.

I thought about that early on, and concluded not.  The whole thing is
moot for levels less than ERROR, of course, and I'm having a hard
time seeing how it could be useful for FATAL or PANIC.  Maybe I just
lack imagination, but if a call is specifying FATAL rather than just
ERROR then it seems to me it's already a special snowflake rather
than something we could fold into a generic non-error behavior.

> For a moment I was worried that it could lead to odd behaviour that we don't
> do get_error_stack_entry() when !details_wanted, due to not raising an error
> we'd otherwise raise. But that's a should-never-be-reached case, so ...

I don't see how.  Returning false out of errsave_start causes the
errsave macro to immediately give control back to the caller, which
will go on about its business.

> It seems somewhat ugly transport this knowledge via edata->elevel, but it's
> not too bad.

The LOG-vs-ERROR business, you mean?  Yeah.  I considered adding another
bool flag to ErrorData, but couldn't convince myself it was worth the
trouble.  If we find a problem we can do that sometime in future.

>> +/*
>> + * We cannot include nodes.h yet, so make a stub reference.  (This is also
>> + * used by fmgr.h, which doesn't want to depend on nodes.h either.)
>> + */
>> +typedef struct Node *NodePtr;

> Seems like it'd be easier to just forward declare the struct, and use the
> non-typedef'ed name in the header than to have to deal with these
> interdependencies and the differing typenames.

Meh.  I'm a little allergic to writing "struct foo *" in function argument
lists, because I so often see gcc pointing out that if struct foo isn't
yet known then that can silently mean something different than you
intended.  With the typedef, it either works or is an error, no halfway
about it.  And the struct way isn't really much better in terms of
having two different notations to use rather than only one.

> Perhaps worth noting here that the reason why the errsave_start/errsave_finish
> split exist differs a bit from the reason in ereport_domain()? "Over there"
> it's just about not wanting to incur overhead when the message isn't logged,
> but here we'll always have >= ERROR, but ->details_wanted can still lead to
> not wanting to incur the overhead.

Hmmm ... it seems like the same reason to me, we don't want to incur the
overhead if the "start" function says not to.

> Is it ok that we throw an error in lookup_rowtype_tupdesc()?

Yeah, that should fall in the category of internal errors I think.
I don't see how you'd reach that from a bad input string.

(Or to be more precise, the point of pg_input_is_valid is to tell
you whether the input string is valid, not to tell you whether the
type name is valid; if you're worried about the latter you need
a separate and earlier test.)

regards, tom lane




Re: Error-safe user functions

2022-12-07 Thread Andres Freund
Hi,

On 2022-12-07 17:32:21 -0500, Tom Lane wrote:
> I already pushed the  elog-refactoring patch, since that seemed
> uncontroversial.  0001 attached covers the same territory as before,
> but I regrouped the rest so that 0002 installs the new test support
> functions, then 0003 adds both the per-datatype changes and
> corresponding test cases for bool, int4, arrays, and records.
> The idea here is that 0003 can be pointed to as a sample of what
> has to be done to datatype input functions, while the preceding
> patches can be cited as relevant documentation.  (I've not decided
> whether to squash 0001 and 0002 together or commit them separately.

I think they make sense as is.


> Does it make sense to break 0003 into 4 separate commits, or is
> that overkill?)

I think it'd be fine either way.


> + * If "context" is an ErrorSaveContext node, but the node creator only wants
> + * notification of the fact of a soft error without any details, just set
> + * the error_occurred flag in the ErrorSaveContext node and return false,
> + * which will cause us to skip the remaining error processing steps.
> + *
> + * Otherwise, create and initialize error stack entry and return true.
> + * Subsequently, errmsg() and perhaps other routines will be called to 
> further
> + * populate the stack entry.  Finally, errsave_finish() will be called to
> + * tidy up.
> + */
> +bool
> +errsave_start(NodePtr context, const char *domain)

I wonder if there are potential use-cases for levels other than ERROR. I can
potentially see us wanting to defer some FATALs, e.g. when they occur in
process exit hooks.


> +{
> + ErrorSaveContext *escontext;
> + ErrorData  *edata;
> +
> + /*
> +  * Do we have a context for soft error reporting?  If not, just punt to
> +  * errstart().
> +  */
> + if (context == NULL || !IsA(context, ErrorSaveContext))
> + return errstart(ERROR, domain);
> +
> + /* Report that a soft error was detected */
> + escontext = (ErrorSaveContext *) context;
> + escontext->error_occurred = true;
> +
> + /* Nothing else to do if caller wants no further details */
> + if (!escontext->details_wanted)
> + return false;
> +
> + /*
> +  * Okay, crank up a stack entry to store the info in.
> +  */
> +
> + recursion_depth++;
> +
> + /* Initialize data for this error frame */
> + edata = get_error_stack_entry();

For a moment I was worried that it could lead to odd behaviour that we don't
do get_error_stack_entry() when !details_wanted, due to not raising an error
we'd otherwise raise. But that's a should-never-be-reached case, so ...


> +/*
> + * errsave_finish --- end a "soft" error-reporting cycle
> + *
> + * If errsave_start() decided this was a regular error, behave as
> + * errfinish().  Otherwise, package up the error details and save
> + * them in the ErrorSaveContext node.
> + */
> +void
> +errsave_finish(NodePtr context, const char *filename, int lineno,
> +const char *funcname)
> +{
> + ErrorSaveContext *escontext = (ErrorSaveContext *) context;
> + ErrorData  *edata = [errordata_stack_depth];
> +
> + /* verify stack depth before accessing *edata */
> + CHECK_STACK_DEPTH();
> +
> + /*
> +  * If errsave_start punted to errstart, then elevel will be ERROR or
> +  * perhaps even PANIC.  Punt likewise to errfinish.
> +  */
> + if (edata->elevel >= ERROR)
> + {
> + errfinish(filename, lineno, funcname);
> + pg_unreachable();
> + }

It seems somewhat ugly transport this knowledge via edata->elevel, but it's
not too bad.



> +/*
> + * We cannot include nodes.h yet, so make a stub reference.  (This is also
> + * used by fmgr.h, which doesn't want to depend on nodes.h either.)
> + */
> +typedef struct Node *NodePtr;

Seems like it'd be easier to just forward declare the struct, and use the
non-typedef'ed name in the header than to have to deal with these
interdependencies and the differing typenames.


> +/*--
> + * Support for reporting "soft" errors that don't require a full transaction
> + * abort to clean up.  This is to be used in this way:
> + *   errsave(context,
> + *   errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
> + *   errmsg("invalid input syntax for type %s: 
> \"%s\"",
> + *  "boolean", in_str),
> + *   ... other errxxx() fields as needed ...);
> + *
> + * "context" is a node pointer or NULL, and the remaining auxiliary calls
> + * provide the same error details as in ereport().  If context is not a
> + * pointer to an ErrorSaveContext node, then errsave(context, ...)
> + * behaves identically to ereport(ERROR, ...).  If context is a pointer
> + * to an ErrorSaveContext node, then the information provided by the
> + * auxiliary calls is stored in the context node and control returns
> + * 

Re: Error-safe user functions

2022-12-07 Thread Tom Lane
Andrew Dunstan  writes:
> On 2022-12-07 We 17:32, Tom Lane wrote:
>> Does it make sense to break 0003 into 4 separate commits, or is
>> that overkill?)

> No strong opinion about 0001 and 0002. I'm happy enough with them as
> they are, but if you want to squash them that's ok. I wouldn't break up
> 0003. I think we're going to end up committing the remaining work in
> batches, although they would probably be a bit more thematically linked
> than these.

Yeah, we certainly aren't likely to do this work as
one-commit-per-datatype going forward.  I'm just wondering
how to do these initial commits so that they provide
good reference material.

regards, tom lane




Re: Error-safe user functions

2022-12-07 Thread Andrew Dunstan


On 2022-12-07 We 17:32, Tom Lane wrote:
> OK, here's a v4 that I think is possibly committable.
>
> I've changed all the comments and docs to use the "soft error"
> terminology, but since using "soft" in the actual function names
> didn't seem that appealing, they still use "safe".
>
> I already pushed the  elog-refactoring patch, since that seemed
> uncontroversial.  0001 attached covers the same territory as before,
> but I regrouped the rest so that 0002 installs the new test support
> functions, then 0003 adds both the per-datatype changes and
> corresponding test cases for bool, int4, arrays, and records.
> The idea here is that 0003 can be pointed to as a sample of what
> has to be done to datatype input functions, while the preceding
> patches can be cited as relevant documentation.  (I've not decided
> whether to squash 0001 and 0002 together or commit them separately.
> Does it make sense to break 0003 into 4 separate commits, or is
> that overkill?)
>

No strong opinion about 0001 and 0002. I'm happy enough with them as
they are, but if you want to squash them that's ok. I wouldn't break up
0003. I think we're going to end up committing the remaining work in
batches, although they would probably be a bit more thematically linked
than these.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Error-safe user functions

2022-12-07 Thread Tom Lane
OK, here's a v4 that I think is possibly committable.

I've changed all the comments and docs to use the "soft error"
terminology, but since using "soft" in the actual function names
didn't seem that appealing, they still use "safe".

I already pushed the  elog-refactoring patch, since that seemed
uncontroversial.  0001 attached covers the same territory as before,
but I regrouped the rest so that 0002 installs the new test support
functions, then 0003 adds both the per-datatype changes and
corresponding test cases for bool, int4, arrays, and records.
The idea here is that 0003 can be pointed to as a sample of what
has to be done to datatype input functions, while the preceding
patches can be cited as relevant documentation.  (I've not decided
whether to squash 0001 and 0002 together or commit them separately.
Does it make sense to break 0003 into 4 separate commits, or is
that overkill?)

Thoughts?

regards, tom lane

diff --git a/doc/src/sgml/ref/create_type.sgml b/doc/src/sgml/ref/create_type.sgml
index 693423e524..994dfc6526 100644
--- a/doc/src/sgml/ref/create_type.sgml
+++ b/doc/src/sgml/ref/create_type.sgml
@@ -900,6 +900,17 @@ CREATE TYPE name
function is written in C.
   
 
+  
+   In PostgreSQL version 16 and later,
+   it is desirable for base types' input functions to
+   return soft errors using the
+   new errsave()/ereturn()
+   mechanism, rather than throwing ereport()
+   exceptions as in previous versions.
+   See src/backend/utils/fmgr/README for more
+   information.
+  
+
  
 
  
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index 4368c30fdb..7c594be583 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -56,6 +56,7 @@ node_headers = \
 	nodes/bitmapset.h \
 	nodes/extensible.h \
 	nodes/lockoptions.h \
+	nodes/miscnodes.h \
 	nodes/replnodes.h \
 	nodes/supportnodes.h \
 	nodes/value.h \
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index 7212bc486f..08992dfd47 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -68,6 +68,7 @@ my @all_input_files = qw(
   nodes/bitmapset.h
   nodes/extensible.h
   nodes/lockoptions.h
+  nodes/miscnodes.h
   nodes/replnodes.h
   nodes/supportnodes.h
   nodes/value.h
@@ -89,6 +90,7 @@ my @nodetag_only_files = qw(
   executor/tuptable.h
   foreign/fdwapi.h
   nodes/lockoptions.h
+  nodes/miscnodes.h
   nodes/replnodes.h
   nodes/supportnodes.h
 );
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index f5cd1b7493..a36aeb832e 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -71,6 +71,7 @@
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
+#include "nodes/miscnodes.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/bgworker.h"
@@ -611,6 +612,128 @@ errfinish(const char *filename, int lineno, const char *funcname)
 	CHECK_FOR_INTERRUPTS();
 }
 
+
+/*
+ * errsave_start --- begin a "soft" error-reporting cycle
+ *
+ * If "context" isn't an ErrorSaveContext node, this behaves as
+ * errstart(ERROR, domain), and the errsave() macro ends up acting
+ * exactly like ereport(ERROR, ...).
+ *
+ * If "context" is an ErrorSaveContext node, but the node creator only wants
+ * notification of the fact of a soft error without any details, just set
+ * the error_occurred flag in the ErrorSaveContext node and return false,
+ * which will cause us to skip the remaining error processing steps.
+ *
+ * Otherwise, create and initialize error stack entry and return true.
+ * Subsequently, errmsg() and perhaps other routines will be called to further
+ * populate the stack entry.  Finally, errsave_finish() will be called to
+ * tidy up.
+ */
+bool
+errsave_start(NodePtr context, const char *domain)
+{
+	ErrorSaveContext *escontext;
+	ErrorData  *edata;
+
+	/*
+	 * Do we have a context for soft error reporting?  If not, just punt to
+	 * errstart().
+	 */
+	if (context == NULL || !IsA(context, ErrorSaveContext))
+		return errstart(ERROR, domain);
+
+	/* Report that a soft error was detected */
+	escontext = (ErrorSaveContext *) context;
+	escontext->error_occurred = true;
+
+	/* Nothing else to do if caller wants no further details */
+	if (!escontext->details_wanted)
+		return false;
+
+	/*
+	 * Okay, crank up a stack entry to store the info in.
+	 */
+
+	recursion_depth++;
+
+	/* Initialize data for this error frame */
+	edata = get_error_stack_entry();
+	edata->elevel = LOG;		/* signal all is well to errsave_finish */
+	set_stack_entry_domain(edata, domain);
+	/* Select default errcode based on the assumed elevel of ERROR */
+	edata->sqlerrcode = ERRCODE_INTERNAL_ERROR;
+
+	/*
+	 * Any allocations for this error state level should go into the caller's
+	 * context.  We don't need to pollute ErrorContext, or even require it to
+	 * exist, in this code path.
+	 */
+	edata->assoc_context = 

Re: Error-safe user functions

2022-12-07 Thread Tom Lane
"David G. Johnston"  writes:
> On Wed, Dec 7, 2022 at 8:04 AM Andrew Dunstan  wrote:
>> I'm not sure InputFunctionCallSoft would be an improvement. Maybe
>> InputFunctionCallSoftError would be clearer, but I don't know that it's
>> much of an improvement either. The same goes for the other visible changes.

> InputFunctionCallSafe -> TryInputFunctionCall

I think we are already using "TryXXX" for code that involves catching
ereport errors.  Since the whole point here is that we are NOT doing
that, I think this naming would be more confusing than helpful.

> Unrelated observation: "Although the error stack is not large, we don't
> expect to run out of space." -> "Because the error stack is not large,
> assume that we will not run out of space and panic if we are wrong."?

That doesn't seem to make the point I wanted to make.

I've adopted your other suggestions in the v4 I'm preparing now.

regards, tom lane




Re: Error-safe user functions

2022-12-07 Thread Tom Lane
I wrote:
> Andres Freund  writes:
>> Is there a guarantee that input functions are stable or immutable?

> There's a project policy that that should be true.  That justifies
> marking things like record_in as stable --- if the per-column input
> functions could be volatile, record_in would need to be as well.
> There are other dependencies on it; see e.g. aab353a60, 3db6524fe.

I dug in the archives and found the thread leading up to aab353a60:

https://www.postgresql.org/message-id/flat/AANLkTik8v7O9QR9jjHNVh62h-COC1B0FDUNmEYMdtKjR%40mail.gmail.com

regards, tom lane




Re: Error-safe user functions

2022-12-07 Thread Tom Lane
Andres Freund  writes:
> Is there a guarantee that input functions are stable or immutable?

There's a project policy that that should be true.  That justifies
marking things like record_in as stable --- if the per-column input
functions could be volatile, record_in would need to be as well.
There are other dependencies on it; see e.g. aab353a60, 3db6524fe.

> We don't
> have any volatile input functions in core PG:

Indeed, because type_sanity.sql checks that.

regards, tom lane




  1   2   >