Re: Underscores in numeric literals

2023-02-04 Thread Dean Rasheed
On Thu, 2 Feb 2023 at 22:40, Peter Eisentraut
 wrote:
>
> On 31.01.23 17:09, Dean Rasheed wrote:
> > On Tue, 31 Jan 2023 at 15:28, Peter Eisentraut
> >  wrote:
> >>
> >> Did you have any thoughts about what to do with the float types?  I
> >> guess we could handle those in a separate patch?
> >>
> >
> > I was assuming that we'd do nothing for float types, because anything
> > we did would necessarily impact their performance.
>
> Yeah, as long as we are using strtof() and strtod() we should just leave
> it alone.  If we have break that open and hand-code something, we can
> reconsider it.
>
> So I think you could go ahead with committing your patch and we can
> consider this topic done for now.
>

Done.

Regards,
Dean




Re: Underscores in numeric literals

2023-02-02 Thread Peter Eisentraut

On 31.01.23 17:09, Dean Rasheed wrote:

On Tue, 31 Jan 2023 at 15:28, Peter Eisentraut
 wrote:


Did you have any thoughts about what to do with the float types?  I
guess we could handle those in a separate patch?



I was assuming that we'd do nothing for float types, because anything
we did would necessarily impact their performance.


Yeah, as long as we are using strtof() and strtod() we should just leave 
it alone.  If we have break that open and hand-code something, we can 
reconsider it.


So I think you could go ahead with committing your patch and we can 
consider this topic done for now.






Re: Underscores in numeric literals

2023-01-31 Thread Dean Rasheed
On Tue, 31 Jan 2023 at 15:28, Peter Eisentraut
 wrote:
>
> Did you have any thoughts about what to do with the float types?  I
> guess we could handle those in a separate patch?
>

I was assuming that we'd do nothing for float types, because anything
we did would necessarily impact their performance.

Regards,
Dean




Re: Underscores in numeric literals

2023-01-31 Thread Peter Eisentraut

On 23.01.23 21:45, Dean Rasheed wrote:

On Wed, 4 Jan 2023 at 09:28, Dean Rasheed  wrote:


In addition, I think that strip_underscores() could then go away if
numeric_in() were made to handle underscores.

Essentially then, that would move all responsibility for parsing
underscores and non-decimal integers to the datatype input functions,
or their support routines, rather than having it distributed.


Here's an update with those changes.


This looks good to me.

Did you have any thoughts about what to do with the float types?  I 
guess we could handle those in a separate patch?






Re: Underscores in numeric literals

2023-01-23 Thread Dean Rasheed
On Wed, 4 Jan 2023 at 09:28, Dean Rasheed  wrote:
>
> In addition, I think that strip_underscores() could then go away if
> numeric_in() were made to handle underscores.
>
> Essentially then, that would move all responsibility for parsing
> underscores and non-decimal integers to the datatype input functions,
> or their support routines, rather than having it distributed.
>

Here's an update with those changes.

Regards,
Dean
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
new file mode 100644
index 0ccddea..9509d0f
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -729,6 +729,20 @@ $function$
 
 
 
+ For visual grouping, underscores can be inserted between digits.  These
+ have no further effect on the value of the literal.  For example:
+1_500_000_000
+0b10001000_
+0o_1_755
+0x_
+1.618_034
+
+ Underscores are not allowed at the start or end of a numeric constant or
+ a group of digits (that is, immediately before or after a period or the
+ e), and more than one underscore in a row is not allowed.
+
+
+
  integer
  bigint
  numeric
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
new file mode 100644
index abad216..3766762
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -528,6 +528,7 @@ T653	SQL-schema statements in external r
 T654	SQL-dynamic statements in external routines			NO	
 T655	Cyclically dependent routines			YES	
 T661	Non-decimal integer literals			YES	SQL:202x draft
+T662	Underscores in integer literals			YES	SQL:202x draft
 T811	Basic SQL/JSON constructor functions			NO	
 T812	SQL/JSON: JSON_OBJECTAGG			NO	
 T813	SQL/JSON: JSON_ARRAYAGG with ORDER BY			NO	
diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c
new file mode 100644
index f1967a3..5020b9f
--- a/src/backend/parser/parse_node.c
+++ b/src/backend/parser/parse_node.c
@@ -19,6 +19,7 @@
 #include "catalog/pg_type.h"
 #include "mb/pg_wchar.h"
 #include "nodes/makefuncs.h"
+#include "nodes/miscnodes.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/subscripting.h"
 #include "parser/parse_coerce.h"
@@ -385,47 +386,11 @@ make_const(ParseState *pstate, A_Const *
 			{
 /* could be an oversize integer as well as a float ... */
 
-int			base = 10;
-char	   *startptr;
-int			sign;
-char	   *testvalue;
+ErrorSaveContext escontext = {T_ErrorSaveContext};
 int64		val64;
-char	   *endptr;
 
-startptr = aconst->val.fval.fval;
-if (startptr[0] == '-')
-{
-	sign = -1;
-	startptr++;
-}
-else
-	sign = +1;
-if (startptr[0] == '0')
-{
-	if (startptr[1] == 'b' || startptr[1] == 'B')
-	{
-		base = 2;
-		startptr += 2;
-	}
-	else if (startptr[1] == 'o' || startptr[1] == 'O')
-	{
-		base = 8;
-		startptr += 2;
-	}
-	else if (startptr[1] == 'x' || startptr[1] == 'X')
-	{
-		base = 16;
-		startptr += 2;
-	}
-}
-
-if (sign == +1)
-	testvalue = startptr;
-else
-	testvalue = psprintf("-%s", startptr);
-errno = 0;
-val64 = strtoi64(testvalue, , base);
-if (errno == 0 && *endptr == '\0')
+val64 = pg_strtoint64_safe(aconst->val.fval.fval, (Node *) );
+if (!escontext.error_occurred)
 {
 	/*
 	 * It might actually fit in int32. Probably only INT_MIN
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
new file mode 100644
index 1e821d4..b2216a9
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -37,10 +37,12 @@
 
 #include "common/string.h"
 #include "gramparse.h"
+#include "nodes/miscnodes.h"
 #include "parser/parser.h"		/* only needed for GUC variables */
 #include "parser/scansup.h"
 #include "port/pg_bitutils.h"
 #include "mb/pg_wchar.h"
+#include "utils/builtins.h"
 }
 
 %{
@@ -395,19 +397,19 @@ hexdigit		[0-9A-Fa-f]
 octdigit		[0-7]
 bindigit		[0-1]
 
-decinteger		{decdigit}+
-hexinteger		0[xX]{hexdigit}+
-octinteger		0[oO]{octdigit}+
-bininteger		0[bB]{bindigit}+
+decinteger		{decdigit}(_?{decdigit})*
+hexinteger		0[xX](_?{hexdigit})+
+octinteger		0[oO](_?{octdigit})+
+bininteger		0[bB](_?{bindigit})+
 
-hexfail			0[xX]
-octfail			0[oO]
-binfail			0[bB]
+hexfail			0[xX]_?
+octfail			0[oO]_?
+binfail			0[bB]_?
 
 numeric			(({decinteger}\.{decinteger}?)|(\.{decinteger}))
 numericfail		{decdigit}+\.\.
 
-real			({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+real			({decinteger}|{numeric})[Ee][-+]?{decinteger}
 realfail		({decinteger}|{numeric})[Ee][-+]
 
 decinteger_junk	{decinteger}{ident_start}
@@ -1364,12 +1366,11 @@ litbufdup(core_yyscan_t yyscanner)
 static int
 process_integer_literal(const char *token, YYSTYPE *lval, int base)
 {
-	int			val;
-	char	   *endptr;
+	ErrorSaveContext escontext = {T_ErrorSaveContext};
+	int32		val;
 
-	errno = 0;
-	val = strtoint(base == 10 ? token : token + 2, , base);
-	if (*endptr != '\0' || errno == ERANGE)
+	val 

Re: Underscores in numeric literals

2023-01-04 Thread Dean Rasheed
Oh, one other minor nit -- in parser/scan.l:

-real   ({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+real   ({decinteger}|{numeric})[Ee][-+]?{decinteger}+

the final "+" isn't necessary now.

Regards,
Dean




Re: Underscores in numeric literals

2023-01-04 Thread Dean Rasheed
On Wed, 28 Dec 2022 at 14:28, Andrew Dunstan  wrote:
>
> On 2022-12-27 Tu 09:55, Tom Lane wrote:
> > We already accept that numeric input is different from numeric
> > literals: you can't write Infinity or NaN in SQL without quotes.
> > So I don't see an argument that we have to allow this in numeric
> > input for consistency.
>
> That's almost the same, but not quite, ISTM. Those are things you can't
> say without quotes, but here unless I'm mistaken you'd be disallowing
> this style if you use quotes. I get the difficulties with input
> functions, but it seems like we'll be building lots of grounds for
> confusion.
>

Yeah, it's easy to see why something like 'NaN' needs quotes, but it
would be harder to explain why something like 1000_000 mustn't have
quotes, and couldn't be used as input to COPY.

My feeling is that we should try to make the datatype input functions
accept anything that is legal syntax as a numeric literal, even if the
reverse isn't always possible.

That said, I think it's very important to minimise any performance
hit, especially in the existing case of inputs with no underscores.

Looking at the patch's changes to pg_strtointNN(), I think there's
more that can be done to reduce that performance hit. As it stands,
every input character is checked to see if it's an underscore, and
then there's a new check at the end to ensure that the input string
doesn't have a trailing underscore. Both of those can be avoided by
rearranging things a little, as in the attached v2 patch.

In the v2 patch, each input character is only compared with underscore
if it's not a digit, so in the case of an input with no underscores or
trailing spaces, the new checks for underscores are never executed.

In addition, if an underscore is seen, it now checks that the next
character is a digit. This eliminates the possibility of two
underscores in a row, and also of a trailing underscore, and so there
is no need for the final check for trailing underscores.

Thus, if the input consists only of digits, it never has to test for
underscores at all, and the performance hit for this case is
minimised.

My other concern with this patch is that the responsibility for
handling underscores is distributed over a couple of different places.
I had the same concern about the non-decimal integer patch, but at the
time I couldn't see any way round it. Now that we have soft error
handling though, I think that there is a way to improve this,
centralising the logic for both underscore and non-decimal handling to
one place for each datatype, reducing code duplication and the chances
of bugs.

For example, make_const() in the T_Float case has gained new code to
parse both the sign and base-prefix of the input, duplicating the
logic in pg_strtointNN(). That can now be avoided by having it call
pg_strtoint64_safe() with an ErrorSaveContext, instead of strtoi64().
In the process, it would then gain the ability to handle underscores,
so they wouldn't need to be stripped off elsewhere.

Similarly, process_integer_literal() could be made to call
pg_strtoint32_safe() with an ErrorSaveContext instead of strtoint(),
and it then wouldn't need to strip off underscores, or be passed the
number's base, since pg_strtoint32_safe() would handle all of that.

In addition, I think that strip_underscores() could then go away if
numeric_in() were made to handle underscores.

Essentially then, that would move all responsibility for parsing
underscores and non-decimal integers to the datatype input functions,
or their support routines, rather than having it distributed.

Regards,
Dean
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
new file mode 100644
index 956182e..27e53b4
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -729,6 +729,20 @@ $function$
 
 
 
+ For visual grouping, underscores can be inserted between digits.  These
+ have no further effect on the value of the literal.  For example:
+1_500_000_000
+0b10001000_
+0o_1_755
+0x_
+1.618_034
+
+ Underscores are not allowed at the start or end of a numeric constant or
+ a group of digits (that is, immediately before or after a period or the
+ e), and more than one underscore in a row is not allowed.
+
+
+
  integer
  bigint
  numeric
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
new file mode 100644
index abad216..3766762
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -528,6 +528,7 @@ T653	SQL-schema statements in external r
 T654	SQL-dynamic statements in external routines			NO	
 T655	Cyclically dependent routines			YES	
 T661	Non-decimal integer literals			YES	SQL:202x draft
+T662	Underscores in integer literals			YES	SQL:202x draft
 T811	Basic SQL/JSON constructor functions			NO	
 T812	SQL/JSON: JSON_OBJECTAGG			NO	
 T813	SQL/JSON: JSON_ARRAYAGG with ORDER BY			NO	
diff --git a/src/backend/parser/scan.l 

Re: Underscores in numeric literals

2022-12-28 Thread Andrew Dunstan


On 2022-12-27 Tu 09:55, Tom Lane wrote:
> We already accept that numeric input is different from numeric
> literals: you can't write Infinity or NaN in SQL without quotes.
> So I don't see an argument that we have to allow this in numeric
> input for consistency.
>

That's almost the same, but not quite, ISTM. Those are things you can't
say without quotes, but here unless I'm mistaken you'd be disallowing
this style if you use quotes. I get the difficulties with input
functions, but it seems like we'll be building lots of grounds for
confusion.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Underscores in numeric literals

2022-12-27 Thread Justin Pryzby
On Tue, Dec 27, 2022 at 09:55:32AM -0500, Tom Lane wrote:
> Peter Eisentraut  writes:
> > Here is a patch to add support for underscores in numeric literals, for 
> > visual grouping, like
> 
> >  1_500_000_000
> >  0b10001000_
> >  0o_1_755
> >  0x_
> >  1.618_034
> 
> > per SQL:202x draft.
> 
> > This adds support in the lexer as well as in the integer type input 
> > functions.
> > TODO: float/numeric type input support
> 
> Hmm ... I'm on board with allowing this in SQL if the committee says
> so.

> I'm not especially on board with accepting it in datatype input
> functions.  There's been zero demand for that AFAIR.  Moreover,
> I don't think we need the inevitable I/O performance hit, nor the
> increased risk of accepting garbage, nor the certainty of
> inconsistency with other places that don't get converted (because
> they depend on strtoul() or whatever).

+1 to accept underscores only in literals and leave input functions
alone.

(When I realized that python3.6 changed to accept things like
int("3_5"), I felt compelled to write a wrapper to check for embedded
underscores and raise an exception in that case.  And I'm sure it
affected performance.)

-- 
Justin




Re: Underscores in numeric literals

2022-12-27 Thread Tom Lane
Peter Eisentraut  writes:
> Here is a patch to add support for underscores in numeric literals, for 
> visual grouping, like

>  1_500_000_000
>  0b10001000_
>  0o_1_755
>  0x_
>  1.618_034

> per SQL:202x draft.

> This adds support in the lexer as well as in the integer type input 
> functions.
> TODO: float/numeric type input support

Hmm ... I'm on board with allowing this in SQL if the committee says
so.  I'm not especially on board with accepting it in datatype input
functions.  There's been zero demand for that AFAIR.  Moreover,
I don't think we need the inevitable I/O performance hit, nor the
increased risk of accepting garbage, nor the certainty of
inconsistency with other places that don't get converted (because
they depend on strtoul() or whatever).

We already accept that numeric input is different from numeric
literals: you can't write Infinity or NaN in SQL without quotes.
So I don't see an argument that we have to allow this in numeric
input for consistency.

regards, tom lane




Underscores in numeric literals

2022-12-27 Thread Peter Eisentraut
Here is a patch to add support for underscores in numeric literals, for 
visual grouping, like


1_500_000_000
0b10001000_
0o_1_755
0x_
1.618_034

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input 
functions.


TODO: float/numeric type input support

I did some performance tests similar to what was done in [0] and didn't 
find any problematic deviations.  Other tests would be welcome.


[0]: 
https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb...@enterprisedb.comFrom 36aef78423e9adc6ebe72fb2a3cf43e385a2caca Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 27 Dec 2022 10:10:18 +0100
Subject: [PATCH v1] Underscores in numeric literals

Add support for underscores in numeric literals, for visual grouping,
like

1_500_000_000
0b10001000_
0o_1_755
0x_
1.618_034

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

TODO: float/numeric type input support

Discussion: 
https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb...@enterprisedb.com
---
 doc/src/sgml/syntax.sgml  |  14 ++
 src/backend/catalog/sql_features.txt  |   1 +
 src/backend/parser/scan.l |  63 ++--
 src/backend/utils/adt/numutils.c  | 144 --
 src/fe_utils/psqlscan.l   |  16 +-
 src/interfaces/ecpg/preproc/pgc.l |  16 +-
 src/pl/plpgsql/src/expected/plpgsql_trap.out  |   2 +-
 src/pl/plpgsql/src/sql/plpgsql_trap.sql   |   2 +-
 src/test/regress/expected/int2.out|  44 ++
 src/test/regress/expected/int4.out|  44 ++
 src/test/regress/expected/int8.out|  44 ++
 src/test/regress/expected/numerology.out  |  92 ++-
 src/test/regress/expected/partition_prune.out |   6 +-
 src/test/regress/sql/int2.sql |  14 ++
 src/test/regress/sql/int4.sql |  14 ++
 src/test/regress/sql/int8.sql |  14 ++
 src/test/regress/sql/numerology.sql   |  24 ++-
 src/test/regress/sql/partition_prune.sql  |   6 +-
 18 files changed, 509 insertions(+), 51 deletions(-)

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 956182e7c6..27e53b4b46 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -728,6 +728,20 @@ Numeric Constants
  
 
 
+
+ For visual grouping, underscores can be inserted between digits.  These
+ have no further effect on the value of the literal.  For example:
+1_500_000_000
+0b10001000_
+0o_1_755
+0x_
+1.618_034
+
+ Underscores are not allowed at the start or end of a numeric constant or
+ a group of digits (that is, immediately before or after a period or the
+ e), and more than one underscore in a row is not allowed.
+
+
 
  integer
  bigint
diff --git a/src/backend/catalog/sql_features.txt 
b/src/backend/catalog/sql_features.txt
index abad216b7e..3766762ae3 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -528,6 +528,7 @@ T653SQL-schema statements in external routines  
YES
 T654   SQL-dynamic statements in external routines NO  
 T655   Cyclically dependent routines   YES 
 T661   Non-decimal integer literalsYES SQL:202x draft
+T662   Underscores in integer literals YES SQL:202x draft
 T811   Basic SQL/JSON constructor functionsNO  
 T812   SQL/JSON: JSON_OBJECTAGGNO  
 T813   SQL/JSON: JSON_ARRAYAGG with ORDER BY   NO  
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 9ad9e0c8ba..a1ea94ef06 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -124,6 +124,7 @@ static void addlit(char *ytext, int yleng, core_yyscan_t 
yyscanner);
 static void addlitchar(unsigned char ychar, core_yyscan_t yyscanner);
 static char *litbufdup(core_yyscan_t yyscanner);
 static unsigned char unescape_single_char(unsigned char c, core_yyscan_t 
yyscanner);
+static char *strip_underscores(const char *in);
 static int process_integer_literal(const char *token, YYSTYPE *lval, int 
base);
 static void addunicode(pg_wchar c, yyscan_t yyscanner);
 
@@ -395,19 +396,19 @@ hexdigit  [0-9A-Fa-f]
 octdigit   [0-7]
 bindigit   [0-1]
 
-decinteger {decdigit}+
-hexinteger 0[xX]{hexdigit}+
-octinteger 0[oO]{octdigit}+
-bininteger 0[bB]{bindigit}+
+decinteger {decdigit}(_?{decdigit})*
+hexinteger 0[xX](_?{hexdigit})+
+octinteger 0[oO](_?{octdigit})+
+bininteger 0[bB](_?{bindigit})+
 
-hexfail0[xX]
-octfail