Another modest update, because the copyright year update prevented
the previous patches from applying cleanly.
I also did a bit of work on the ecpg scanner so that it handles
some errors on par with the main scanner.
There is still no automated testing of this in ecpg, but I have a bunch
of single-line test files that can provoke various errors. I will keep
these around and maybe put them into something more formal in the future.
On 30.12.21 10:43, Peter Eisentraut wrote:
There has been some other refactoring going on, which made this patch
set out of date. So here is an update.
The old pg_strtouint64() has been removed, so there is no longer a
naming concern with patch 0001. That one should be good to go.
I also found that pg_atoi(), yet another way to parse integers, has
mostly outlived its usefulness, so 0002 and 0003 remove its last two
callers and then the function itself.
The remaining patches are as before, with some of the review comments
applied. I still need to write some lexing unit tests for ecpg, which I
haven't gotten to yet. This affects patches 0004 and 0005.
As mentioned before, patches 0006 and 0007 are more feature previews at
this point.
On 01.12.21 16:47, Peter Eisentraut wrote:
On 25.11.21 18:51, John Naylor wrote:
If we're going to change the comment anyway, "the parser" sounds more
natural. Aside from that, 0001 and 0002 can probably be pushed now,
if you like.
done
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -365,6 +365,10 @@ real ({integer}|{decimal})[Ee][-+]?{digit}+
realfail1 ({integer}|{decimal})[Ee]
realfail2 ({integer}|{decimal})[Ee][-+]
+integer_junk {integer}{ident_start}
+decimal_junk {decimal}{ident_start}
+real_junk {real}{ident_start}
A comment might be good here to explain these are only in ECPG for
consistency with the other scanners. Not really important, though.
Yeah, it's a bit weird that not all the symbols are used in ecpg.
I'll look into explaining this better.
0006
+{hexfail} {
+ yyerror("invalid hexadecimal integer");
+ }
+{octfail} {
+ yyerror("invalid octal integer");
}
-{decimal} {
+{binfail} {
+ yyerror("invalid binary integer");
+ }
It seems these could use SET_YYLLOC(), since the error cursor doesn't
match other failure states:
ok
We might consider some tests for ECPG since lack of coverage has been
a problem.
right
Also, I'm curious: how does the spec work as far as deciding the year
of release, or feature-freezing of new items?
The schedule has recently been extended again, so the current plan is
for SQL:202x with x=3, with feature freeze in mid-2022.
So the feature patches in this thread are in my mind now targeting
PG15+1. But the preparation work (up to v5-0005, and some other
number parsing refactoring that I'm seeing) could be considered for PG15.
I'll move this to the next CF and come back with an updated patch set
in a little while.
From e7aad2b81e9be2b53dad73c66e692a80fc2f81e1 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <pe...@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v7 1/7] Move scanint8() to numutils.c
Move scanint8() to numutils.c and rename to pg_strtoint64(). We
already have a "16" and "32" version of that, and the code inside the
functions was aligned, so this move makes all three versions
consistent. The API is also changed to no longer provide the errorOK
case. Callers that need to check for errors without throwing can use
strtoi64() instead.
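For callers that want to detect bad input themselves instead of having ereport() raise an error, the make_const() hunk below follows a common pattern: reset errno, call the strtol-family function, then check both errno and the end pointer. A minimal standalone sketch of that pattern follows. It uses plain C99 strtoll() as a stand-in for PostgreSQL's strtoi64(), assumes long long is 64 bits wide, and the helper name try_parse_int64() is hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of the patch): parse a whole string as a
 * signed 64-bit integer, returning false instead of raising an error.
 */
static bool
try_parse_int64(const char *str, int64_t *result)
{
    char       *endptr;
    long long   val;

    /* strtoll() sets errno only on failure and never clears it */
    errno = 0;
    val = strtoll(str, &endptr, 10);

    /* reject "no digits at all", overflow, and trailing characters */
    if (endptr == str || errno != 0 || *endptr != '\0')
        return false;

    *result = (int64_t) val;
    return true;
}
```

Resetting errno first matters: a stale ERANGE left over from some earlier library call would otherwise make valid input look like an overflow.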
Discussion:
https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb...@enterprisedb.com
---
src/backend/parser/parse_node.c | 12 ++-
src/backend/replication/pgoutput/pgoutput.c | 9 ++-
src/backend/utils/adt/int8.c | 90 +--------------------
src/backend/utils/adt/numutils.c | 84 +++++++++++++++++++
src/bin/pgbench/pgbench.c | 4 +-
src/include/utils/builtins.h | 1 +
src/include/utils/int8.h | 25 ------
7 files changed, 103 insertions(+), 122 deletions(-)
delete mode 100644 src/include/utils/int8.h
diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c
index ba9baf140c..8dd821b761 100644
--- a/src/backend/parser/parse_node.c
+++ b/src/backend/parser/parse_node.c
@@ -26,7 +26,6 @@
#include "parser/parse_relation.h"
#include "parser/parsetree.h"
#include "utils/builtins.h"
-#include "utils/int8.h"
#include "utils/lsyscache.h"
#include "utils/syscache.h"
#include "utils/varbit.h"
@@ -353,7 +352,6 @@ make_const(ParseState *pstate, A_Const *aconst)
{
Const *con;
Datum val;
- int64 val64;
Oid typeid;
int typelen;
bool typebyval;
@@ -384,8 +382,15 @@ make_const(ParseState *pstate, A_Const *aconst)
break;
case T_Float:
+ {
/* could be an oversize integer as well as a float ... */
- if (scanint8(aconst->val.fval.val, true, &val64))
+
+ int64 val64;
+ char *endptr;
+
+ errno = 0;
+ val64 = strtoi64(aconst->val.fval.val, &endptr, 10);
+ if (errno == 0 && *endptr == '\0')
{
/*
* It might actually fit in int32. Probably only INT_MIN can
@@ -425,6 +430,7 @@ make_const(ParseState *pstate, A_Const *aconst)
typebyval = false;
}
break;
+ }
case T_String:
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index af8d51aee9..0570caa351 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -21,7 +21,6 @@
#include "replication/logicalproto.h"
#include "replication/origin.h"
#include "replication/pgoutput.h"
-#include "utils/int8.h"
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -205,7 +204,8 @@ parse_output_parameters(List *options, PGOutputData *data)
/* Check each param, whether or not we recognize it */
if (strcmp(defel->defname, "proto_version") == 0)
{
- int64 parsed;
+ unsigned long parsed;
+ char *endptr;
if (protocol_version_given)
ereport(ERROR,
@@ -213,12 +213,14 @@ parse_output_parameters(List *options, PGOutputData *data)
errmsg("conflicting or redundant options")));
protocol_version_given = true;
- if (!scanint8(strVal(defel->arg), true, &parsed))
+ errno = 0;
+ parsed = strtoul(strVal(defel->arg), &endptr, 10);
+ if (errno || *endptr != '\0')
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid proto_version")));
- if (parsed > PG_UINT32_MAX || parsed < 0)
+ if (parsed > PG_UINT32_MAX)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("proto_version \"%s\" out of range",
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index ad19d154ff..4a87114a4f 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -24,7 +24,6 @@
#include "nodes/supportnodes.h"
#include "optimizer/optimizer.h"
#include "utils/builtins.h"
-#include "utils/int8.h"
typedef struct
@@ -45,99 +44,14 @@ typedef struct
* Formatting and conversion routines.
*---------------------------------------------------------*/
-/*
- * scanint8 --- try to parse a string into an int8.
- *
- * If errorOK is false, ereport a useful error message if the string is bad.
- * If errorOK is true, just return "false" for bad input.
- */
-bool
-scanint8(const char *str, bool errorOK, int64 *result)
-{
- const char *ptr = str;
- int64 tmp = 0;
- bool neg = false;
-
- /*
- * Do our own scan, rather than relying on sscanf which might be broken
- * for long long.
- *
- * As INT64_MIN can't be stored as a positive 64 bit integer, accumulate
- * value as a negative number.
- */
-
- /* skip leading spaces */
- while (*ptr && isspace((unsigned char) *ptr))
- ptr++;
-
- /* handle sign */
- if (*ptr == '-')
- {
- ptr++;
- neg = true;
- }
- else if (*ptr == '+')
- ptr++;
-
- /* require at least one digit */
- if (unlikely(!isdigit((unsigned char) *ptr)))
- goto invalid_syntax;
-
- /* process digits */
- while (*ptr && isdigit((unsigned char) *ptr))
- {
- int8 digit = (*ptr++ - '0');
-
- if (unlikely(pg_mul_s64_overflow(tmp, 10, &tmp)) ||
- unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
- goto out_of_range;
- }
-
- /* allow trailing whitespace, but not other trailing chars */
- while (*ptr != '\0' && isspace((unsigned char) *ptr))
- ptr++;
-
- if (unlikely(*ptr != '\0'))
- goto invalid_syntax;
-
- if (!neg)
- {
- /* could fail if input is most negative number */
- if (unlikely(tmp == PG_INT64_MIN))
- goto out_of_range;
- tmp = -tmp;
- }
-
- *result = tmp;
- return true;
-
-out_of_range:
- if (!errorOK)
- ereport(ERROR,
- (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
- errmsg("value \"%s\" is out of range for type %s",
- str, "bigint")));
- return false;
-
-invalid_syntax:
- if (!errorOK)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
- errmsg("invalid input syntax for type %s: \"%s\"",
- "bigint", str)));
- return false;
-}
-
/* int8in()
*/
Datum
int8in(PG_FUNCTION_ARGS)
{
- char *str = PG_GETARG_CSTRING(0);
- int64 result;
+ char *num = PG_GETARG_CSTRING(0);
- (void) scanint8(str, false, &result);
- PG_RETURN_INT64(result);
+ PG_RETURN_INT64(pg_strtoint64(num));
}
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 898a9e3f9a..e82d23a325 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -325,6 +325,90 @@ pg_strtoint32(const char *s)
return 0; /* keep compiler quiet */
}
+/*
+ * Convert input string to a signed 64 bit integer.
+ *
+ * Allows any number of leading or trailing whitespace characters. Will throw
+ * ereport() upon bad input format or overflow.
+ *
+ * NB: Accumulate input as a negative number, to deal with two's complement
+ * representation of the most negative number, which can't be represented as a
+ * positive number.
+ */
+int64
+pg_strtoint64(const char *s)
+{
+ const char *ptr = s;
+ int64 tmp = 0;
+ bool neg = false;
+
+ /*
+ * Do our own scan, rather than relying on sscanf which might be broken
+ * for long long.
+ *
+ * As INT64_MIN can't be stored as a positive 64 bit integer, accumulate
+ * value as a negative number.
+ */
+
+ /* skip leading spaces */
+ while (*ptr && isspace((unsigned char) *ptr))
+ ptr++;
+
+ /* handle sign */
+ if (*ptr == '-')
+ {
+ ptr++;
+ neg = true;
+ }
+ else if (*ptr == '+')
+ ptr++;
+
+ /* require at least one digit */
+ if (unlikely(!isdigit((unsigned char) *ptr)))
+ goto invalid_syntax;
+
+ /* process digits */
+ while (*ptr && isdigit((unsigned char) *ptr))
+ {
+ int8 digit = (*ptr++ - '0');
+
+ if (unlikely(pg_mul_s64_overflow(tmp, 10, &tmp)) ||
+ unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+ goto out_of_range;
+ }
+
+ /* allow trailing whitespace, but not other trailing chars */
+ while (*ptr != '\0' && isspace((unsigned char) *ptr))
+ ptr++;
+
+ if (unlikely(*ptr != '\0'))
+ goto invalid_syntax;
+
+ if (!neg)
+ {
+ /* could fail if input is most negative number */
+ if (unlikely(tmp == PG_INT64_MIN))
+ goto out_of_range;
+ tmp = -tmp;
+ }
+
+ return tmp;
+
+out_of_range:
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("value \"%s\" is out of range for type %s",
+ s, "bigint")));
+
+invalid_syntax:
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid input syntax for type %s: \"%s\"",
+ "bigint", s)));
+
+ return 0; /* keep compiler quiet */
+}
+
/*
* pg_itoa: converts a signed 16-bit integer to its string representation
* and returns strlen(a).
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 97f2a1f80a..f166a77e3a 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -787,8 +787,8 @@ is_an_int(const char *str)
/*
* strtoint64 -- convert a string to 64-bit integer
*
- * This function is a slightly modified version of scanint8() from
- * src/backend/utils/adt/int8.c.
+ * This function is a slightly modified version of pg_strtoint64() from
+ * src/backend/utils/adt/numutils.c.
*
* The function returns whether the conversion worked, and if so
* "*result" is set to the result.
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 7ac4780e3f..191cc854a3 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -46,6 +46,7 @@ extern int namestrcmp(Name name, const char *str);
extern int32 pg_atoi(const char *s, int size, int c);
extern int16 pg_strtoint16(const char *s);
extern int32 pg_strtoint32(const char *s);
+extern int64 pg_strtoint64(const char *s);
extern int pg_itoa(int16 i, char *a);
extern int pg_ultoa_n(uint32 l, char *a);
extern int pg_ulltoa_n(uint64 l, char *a);
diff --git a/src/include/utils/int8.h b/src/include/utils/int8.h
deleted file mode 100644
index f0386c4008..0000000000
--- a/src/include/utils/int8.h
+++ /dev/null
@@ -1,25 +0,0 @@
-/*-------------------------------------------------------------------------
- *
- * int8.h
- * Declarations for operations on 64-bit integers.
- *
- *
- * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
- * Portions Copyright (c) 1994, Regents of the University of California
- *
- * src/include/utils/int8.h
- *
- * NOTES
- * These data types are supported on all 64-bit architectures, and may
- * be supported through libraries on some 32-bit machines. If your machine
- * is not currently supported, then please try to make it so, then post
- * patches to the postgresql.org hackers mailing list.
- *
- *-------------------------------------------------------------------------
- */
-#ifndef INT8_H
-#define INT8_H
-
-extern bool scanint8(const char *str, bool errorOK, int64 *result);
-
-#endif /* INT8_H */
base-commit: bed6ed3de9b3e62d8c6ee034513d04d769091927
--
2.34.1
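The negative-accumulation trick in pg_strtoint64() can be illustrated outside the backend. This is a sketch under assumptions: it uses <stdint.h> types, replaces pg_mul_s64_overflow()/pg_sub_s64_overflow() with explicit range checks, and returns false where the real function calls ereport(). The name sketch_strtoint64() is hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Accumulate the value as a negative number so that INT64_MIN, whose
 * magnitude has no positive int64 counterpart, parses without overflow.
 */
static bool
sketch_strtoint64(const char *s, int64_t *result)
{
    const char *ptr = s;
    int64_t     tmp = 0;
    bool        neg = false;

    /* skip leading whitespace */
    while (*ptr && isspace((unsigned char) *ptr))
        ptr++;

    /* optional sign */
    if (*ptr == '-')
    {
        neg = true;
        ptr++;
    }
    else if (*ptr == '+')
        ptr++;

    /* require at least one digit */
    if (!isdigit((unsigned char) *ptr))
        return false;

    while (isdigit((unsigned char) *ptr))
    {
        int         digit = *ptr++ - '0';

        /* tmp * 10 - digit must stay >= INT64_MIN */
        if (tmp < INT64_MIN / 10 || tmp * 10 < INT64_MIN + digit)
            return false;
        tmp = tmp * 10 - digit;
    }

    /* allow trailing whitespace, but nothing else */
    while (*ptr && isspace((unsigned char) *ptr))
        ptr++;
    if (*ptr != '\0')
        return false;

    if (!neg)
    {
        /* -INT64_MIN is not representable */
        if (tmp == INT64_MIN)
            return false;
        tmp = -tmp;
    }

    *result = tmp;
    return true;
}
```

Accumulating downward means only the final negation for positive input can overflow, and that single case is checked explicitly.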
From 15bc1f99665a2c52adb2282a4e65d0a628ecaf9b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <pe...@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v7 2/7] Remove one use of pg_atoi()
There was no real need to use this here instead of a simpler API.
---
src/backend/utils/adt/jsonpath_gram.y | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/backend/utils/adt/jsonpath_gram.y b/src/backend/utils/adt/jsonpath_gram.y
index 7a251b892d..7311d12e35 100644
--- a/src/backend/utils/adt/jsonpath_gram.y
+++ b/src/backend/utils/adt/jsonpath_gram.y
@@ -232,7 +232,7 @@ array_accessor:
;
any_level:
- INT_P { $$ = pg_atoi($1.val, 4, 0); }
+ INT_P { $$ = pg_strtoint32($1.val); }
| LAST_P { $$ = -1; }
;
--
2.34.1
From dcbc44a62d06d660314305dff4919041b7408f63 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <pe...@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v7 3/7] Remove pg_atoi()
The last caller was int2vectorin(), and having such a general function
for one user didn't seem useful, so just put the required parts inline
and remove the function.
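A standalone approximation of the inlined int2vectorin() loop, for illustration only: it assumes int16 maps to int16_t, uses a hypothetical MAX_VALUES in place of FUNC_MAX_ARGS, and returns -1 where the backend would raise an error. Like the patch, it accepts only a plain space (not other whitespace) directly after each number.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_VALUES 100          /* hypothetical stand-in for FUNC_MAX_ARGS */

/*
 * Hypothetical helper: parse space-separated int16 values with strtol(),
 * checking for missing digits, range errors, and junk after each number.
 * Returns the number of values parsed, or -1 on bad input.
 */
static int
parse_int16_list(const char *s, int16_t *values)
{
    int         n;

    for (n = 0; n < MAX_VALUES; n++)
    {
        long        l;
        char       *endp;

        while (*s && isspace((unsigned char) *s))
            s++;
        if (*s == '\0')
            break;

        errno = 0;
        l = strtol(s, &endp, 10);

        if (endp == s)
            return -1;          /* no digits */
        if (errno == ERANGE || l < SHRT_MIN || l > SHRT_MAX)
            return -1;          /* out of range for int16 */
        if (*endp && *endp != ' ')
            return -1;          /* trailing junk after the number */

        values[n] = (int16_t) l;
        s = endp;
    }

    /* reject leftover input, e.g. more than MAX_VALUES numbers */
    while (*s && isspace((unsigned char) *s))
        s++;
    if (*s)
        return -1;

    return n;
}
```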
---
src/backend/utils/adt/int.c | 32 ++++++++++--
src/backend/utils/adt/numutils.c | 88 --------------------------------
src/include/utils/builtins.h | 1 -
3 files changed, 28 insertions(+), 93 deletions(-)
diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 8bd234c11c..42ddae99ef 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -146,15 +146,39 @@ int2vectorin(PG_FUNCTION_ARGS)
result = (int2vector *) palloc0(Int2VectorSize(FUNC_MAX_ARGS));
- for (n = 0; *intString && n < FUNC_MAX_ARGS; n++)
+ for (n = 0; n < FUNC_MAX_ARGS; n++)
{
+ long l;
+ char *endp;
+
while (*intString && isspace((unsigned char) *intString))
intString++;
if (*intString == '\0')
break;
- result->values[n] = pg_atoi(intString, sizeof(int16), ' ');
- while (*intString && !isspace((unsigned char) *intString))
- intString++;
+
+ errno = 0;
+ l = strtol(intString, &endp, 10);
+
+ if (intString == endp)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid input syntax for type %s: \"%s\"",
+ "smallint", intString)));
+
+ if (errno == ERANGE || l < SHRT_MIN || l > SHRT_MAX)
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("value \"%s\" is out of range for type %s", intString,
+ "smallint")));
+
+ if (*endp && *endp != ' ')
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid input syntax for type %s: \"%s\"",
+ "integer", intString)));
+
+ result->values[n] = l;
+ intString = endp;
}
while (*intString && isspace((unsigned char) *intString))
intString++;
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index e82d23a325..cc3f95d399 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -85,94 +85,6 @@ decimalLength64(const uint64 v)
return t + (v >= PowersOfTen[t]);
}
-/*
- * pg_atoi: convert string to integer
- *
- * allows any number of leading or trailing whitespace characters.
- *
- * 'size' is the sizeof() the desired integral result (1, 2, or 4 bytes).
- *
- * c, if not 0, is a terminator character that may appear after the
- * integer (plus whitespace). If 0, the string must end after the integer.
- *
- * Unlike plain atoi(), this will throw ereport() upon bad input format or
- * overflow.
- */
-int32
-pg_atoi(const char *s, int size, int c)
-{
- long l;
- char *badp;
-
- /*
- * Some versions of strtol treat the empty string as an error, but some
- * seem not to. Make an explicit test to be sure we catch it.
- */
- if (s == NULL)
- elog(ERROR, "NULL pointer");
- if (*s == 0)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
- errmsg("invalid input syntax for type %s: \"%s\"",
- "integer", s)));
-
- errno = 0;
- l = strtol(s, &badp, 10);
-
- /* We made no progress parsing the string, so bail out */
- if (s == badp)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
- errmsg("invalid input syntax for type %s: \"%s\"",
- "integer", s)));
-
- switch (size)
- {
- case sizeof(int32):
- if (errno == ERANGE
-#if defined(HAVE_LONG_INT_64)
- /* won't get ERANGE on these with 64-bit longs... */
- || l < INT_MIN || l > INT_MAX
-#endif
- )
- ereport(ERROR,
- (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
- errmsg("value \"%s\" is out of range for type %s", s,
- "integer")));
- break;
- case sizeof(int16):
- if (errno == ERANGE || l < SHRT_MIN || l > SHRT_MAX)
- ereport(ERROR,
- (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
- errmsg("value \"%s\" is out of range for type %s", s,
- "smallint")));
- break;
- case sizeof(int8):
- if (errno == ERANGE || l < SCHAR_MIN || l > SCHAR_MAX)
- ereport(ERROR,
- (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
- errmsg("value \"%s\" is out of range for 8-bit integer", s)));
- break;
- default:
- elog(ERROR, "unsupported result size: %d", size);
- }
-
- /*
- * Skip any trailing whitespace; if anything but whitespace remains before
- * the terminating character, bail out
- */
- while (*badp && *badp != c && isspace((unsigned char) *badp))
- badp++;
-
- if (*badp && *badp != c)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
- errmsg("invalid input syntax for type %s: \"%s\"",
- "integer", s)));
-
- return (int32) l;
-}
-
/*
* Convert input string to a signed 16 bit integer.
*
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 191cc854a3..58abf4364a 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -43,7 +43,6 @@ extern void namestrcpy(Name name, const char *str);
extern int namestrcmp(Name name, const char *str);
/* numutils.c */
-extern int32 pg_atoi(const char *s, int size, int c);
extern int16 pg_strtoint16(const char *s);
extern int32 pg_strtoint32(const char *s);
extern int64 pg_strtoint64(const char *s);
--
2.34.1
From fb224fec2251b61cc5cf57806b6741db8f8cc58c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <pe...@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v7 4/7] Add test case for trailing junk after numeric literals
PostgreSQL currently accepts numeric literals with trailing
non-digits, such as 123abc, where the abc is treated as the next token.
This may be a bit surprising. This commit adds test cases for this;
subsequent commits intend to change this behavior.
Discussion:
https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb...@enterprisedb.com
---
src/test/regress/expected/numerology.out | 62 ++++++++++++++++++++++++
src/test/regress/sql/numerology.sql | 16 ++++++
2 files changed, 78 insertions(+)
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index 44d6c435de..2ffc73e854 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -2,6 +2,68 @@
-- NUMEROLOGY
-- Test various combinations of numeric types and functions.
--
+--
+-- Trailing junk in numeric literals
+--
+SELECT 123abc;
+ abc
+-----
+ 123
+(1 row)
+
+SELECT 0x0o;
+ x0o
+-----
+ 0
+(1 row)
+
+SELECT 1_2_3;
+ _2_3
+------
+ 1
+(1 row)
+
+SELECT 0.a;
+ a
+---
+ 0
+(1 row)
+
+SELECT 0.0a;
+ a
+-----
+ 0.0
+(1 row)
+
+SELECT .0a;
+ a
+-----
+ 0.0
+(1 row)
+
+SELECT 0.0e1a;
+ a
+---
+ 0
+(1 row)
+
+SELECT 0.0e;
+ e
+-----
+ 0.0
+(1 row)
+
+SELECT 0.0e+a;
+ERROR: syntax error at or near "+"
+LINE 1: SELECT 0.0e+a;
+ ^
+PREPARE p1 AS SELECT $1a;
+EXECUTE p1(1);
+ a
+---
+ 1
+(1 row)
+
--
-- Test implicit type conversions
-- This fails for Postgres v6.1 (and earlier?)
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
index fddb58f8fd..fb75f97832 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -3,6 +3,22 @@
-- Test various combinations of numeric types and functions.
--
+--
+-- Trailing junk in numeric literals
+--
+
+SELECT 123abc;
+SELECT 0x0o;
+SELECT 1_2_3;
+SELECT 0.a;
+SELECT 0.0a;
+SELECT .0a;
+SELECT 0.0e1a;
+SELECT 0.0e;
+SELECT 0.0e+a;
+PREPARE p1 AS SELECT $1a;
+EXECUTE p1(1);
+
--
-- Test implicit type conversions
-- This fails for Postgres v6.1 (and earlier?)
--
2.34.1
From ac3b6ac952624ded1c9aefe4f3e8a6715f4bb1d9 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <pe...@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v7 5/7] Reject trailing junk after numeric literals
After this, the PostgreSQL lexers no longer accept numeric literals
with trailing non-digits, such as 123abc, which would be scanned as
two tokens: 123 and abc. This is undocumented and surprising, and it
might also interfere with some extended numeric literal syntax being
contemplated for the future.
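To make the new rules concrete, here is a rough C model of which decimal literals the {integer_junk}, {decimal_junk}, {real_junk}, {realfail1}, and {realfail2} rules now reject. It is a hand-written approximation, not derived mechanically from the flex rules. It ignores parameters such as $1a and assumes the input string starts at a numeric literal. is_ident_start() mirrors scan.l's [A-Za-z\200-\377_]. All names are hypothetical.

```c
#include <stdbool.h>

/* mirrors ident_start: [A-Za-z\200-\377_] */
static bool
is_ident_start(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
        c == '_' || c >= 0x80;
}

static bool
is_digit(unsigned char c)
{
    return c >= '0' && c <= '9';
}

/*
 * Return true if the numeric literal at the start of s would now be
 * rejected as "trailing junk after numeric literal".
 */
static bool
numeric_literal_has_junk(const char *s)
{
    const unsigned char *p = (const unsigned char *) s;

    while (is_digit(*p))
        p++;

    /* '.' starts a fraction, except in "1..10", which lexes as 1, dot_dot */
    if (*p == '.' && p[1] != '.')
    {
        p++;
        while (is_digit(*p))
            p++;
    }

    if (*p == 'e' || *p == 'E')
    {
        const unsigned char *q = p + 1;

        if (*q == '+' || *q == '-')
            q++;
        if (!is_digit(*q))
            return true;        /* {realfail1} or {realfail2} */
        while (is_digit(*q))
            q++;
        p = q;
    }

    /* {integer_junk}, {decimal_junk}, {real_junk} */
    return is_ident_start(*p);
}
```

The cases mirror the regression tests added by the previous patch: 123abc, 0x0o, 1_2_3, 0.a, .0a, 0.0e1a, 0.0e, and 0.0e+a are all rejected, while 1..10 still scans as a number followed by dot_dot.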
Discussion:
https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb...@enterprisedb.com
---
src/backend/parser/scan.l | 32 +++++++---
src/fe_utils/psqlscan.l | 25 +++++---
src/interfaces/ecpg/preproc/pgc.l | 22 +++++++
src/test/regress/expected/numerology.out | 77 +++++++++---------------
src/test/regress/sql/numerology.sql | 1 -
5 files changed, 91 insertions(+), 66 deletions(-)
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index f555ac6e6d..ab24bf70db 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -399,7 +399,12 @@ real ({integer}|{decimal})[Ee][-+]?{digit}+
realfail1 ({integer}|{decimal})[Ee]
realfail2 ({integer}|{decimal})[Ee][-+]
+integer_junk {integer}{ident_start}
+decimal_junk {decimal}{ident_start}
+real_junk {real}{ident_start}
+
param \${integer}
+param_junk \${integer}{ident_start}
other .
@@ -974,6 +979,10 @@ other .
yylval->ival = atol(yytext + 1);
return PARAM;
}
+{param_junk} {
+ SET_YYLLOC();
+ yyerror("trailing junk after parameter");
+ }
{integer} {
SET_YYLLOC();
@@ -996,19 +1005,24 @@ other .
return FCONST;
}
{realfail1} {
- /*
- * throw back the [Ee], and figure out whether what
- * remains is an {integer} or {decimal}.
- */
- yyless(yyleng - 1);
SET_YYLLOC();
- return process_integer_literal(yytext, yylval);
+ yyerror("trailing junk after numeric literal");
}
{realfail2} {
- /* throw back the [Ee][+-], and proceed
as above */
- yyless(yyleng - 2);
SET_YYLLOC();
- return process_integer_literal(yytext, yylval);
+ yyerror("trailing junk after numeric literal");
+ }
+{integer_junk} {
+ SET_YYLLOC();
+ yyerror("trailing junk after numeric literal");
+ }
+{decimal_junk} {
+ SET_YYLLOC();
+ yyerror("trailing junk after numeric literal");
+ }
+{real_junk} {
+ SET_YYLLOC();
+ yyerror("trailing junk after numeric literal");
}
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index 941ed06553..0394edb15f 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -337,7 +337,12 @@ real ({integer}|{decimal})[Ee][-+]?{digit}+
realfail1 ({integer}|{decimal})[Ee]
realfail2 ({integer}|{decimal})[Ee][-+]
+integer_junk {integer}{ident_start}
+decimal_junk {decimal}{ident_start}
+real_junk {real}{ident_start}
+
param \${integer}
+param_junk \${integer}{ident_start}
/* psql-specific: characters allowed in variable names */
variable_char [A-Za-z\200-\377_0-9]
@@ -839,6 +844,9 @@ other .
{param} {
ECHO;
}
+{param_junk} {
+ ECHO;
+ }
{integer} {
ECHO;
@@ -855,17 +863,18 @@ other .
ECHO;
}
{realfail1} {
- /*
- * throw back the [Ee], and figure out whether what
- * remains is an {integer} or {decimal}.
- * (in psql, we don't actually care...)
- */
- yyless(yyleng - 1);
ECHO;
}
{realfail2} {
- /* throw back the [Ee][+-], and proceed as above */
- yyless(yyleng - 2);
+ ECHO;
+ }
+{integer_junk} {
+ ECHO;
+ }
+{decimal_junk} {
+ ECHO;
+ }
+{real_junk} {
ECHO;
}
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
index 39e578e868..25fb3b43b3 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -365,7 +365,12 @@ real ({integer}|{decimal})[Ee][-+]?{digit}+
realfail1 ({integer}|{decimal})[Ee]
realfail2 ({integer}|{decimal})[Ee][-+]
+integer_junk {integer}{ident_start}
+decimal_junk {decimal}{ident_start}
+real_junk {real}{ident_start}
+
param \${integer}
+param_junk \${integer}{ident_start}
/* special characters for other dbms */
/* we have to react differently in compat mode */
@@ -917,6 +922,9 @@ cppline {space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
base_yylval.ival = atol(yytext+1);
return PARAM;
}
+{param_junk} {
+ mmfatal(PARSE_ERROR, "trailing junk after parameter");
+ }
{ip} {
base_yylval.str = mm_strdup(yytext);
@@ -957,6 +965,20 @@ cppline {space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
} /* <C,SQL> */
<SQL>{
+/*
+ * Note that some trailing junk is valid in C (such as 100LL), so we contain
+ * this to SQL mode.
+ */
+{integer_junk} {
+ mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+ }
+{decimal_junk} {
+ mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+ }
+{real_junk} {
+ mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+ }
+
:{identifier}((("->"|\.){identifier})|(\[{array}\]))* {
base_yylval.str = mm_strdup(yytext+1);
return CVARIABLE;
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index 2ffc73e854..77d4843417 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -6,64 +6,45 @@
-- Trailing junk in numeric literals
--
SELECT 123abc;
- abc
------
- 123
-(1 row)
-
+ERROR: trailing junk after numeric literal at or near "123a"
+LINE 1: SELECT 123abc;
+ ^
SELECT 0x0o;
- x0o
------
- 0
-(1 row)
-
+ERROR: trailing junk after numeric literal at or near "0x"
+LINE 1: SELECT 0x0o;
+ ^
SELECT 1_2_3;
- _2_3
-------
- 1
-(1 row)
-
+ERROR: trailing junk after numeric literal at or near "1_"
+LINE 1: SELECT 1_2_3;
+ ^
SELECT 0.a;
- a
----
- 0
-(1 row)
-
+ERROR: trailing junk after numeric literal at or near "0.a"
+LINE 1: SELECT 0.a;
+ ^
SELECT 0.0a;
- a
------
- 0.0
-(1 row)
-
+ERROR: trailing junk after numeric literal at or near "0.0a"
+LINE 1: SELECT 0.0a;
+ ^
SELECT .0a;
- a
------
- 0.0
-(1 row)
-
+ERROR: trailing junk after numeric literal at or near ".0a"
+LINE 1: SELECT .0a;
+ ^
SELECT 0.0e1a;
- a
----
- 0
-(1 row)
-
+ERROR: trailing junk after numeric literal at or near "0.0e1a"
+LINE 1: SELECT 0.0e1a;
+ ^
SELECT 0.0e;
- e
------
- 0.0
-(1 row)
-
+ERROR: trailing junk after numeric literal at or near "0.0e"
+LINE 1: SELECT 0.0e;
+ ^
SELECT 0.0e+a;
-ERROR: syntax error at or near "+"
+ERROR: trailing junk after numeric literal at or near "0.0e+"
LINE 1: SELECT 0.0e+a;
- ^
+ ^
PREPARE p1 AS SELECT $1a;
-EXECUTE p1(1);
- a
----
- 1
-(1 row)
-
+ERROR: trailing junk after parameter at or near "$1a"
+LINE 1: PREPARE p1 AS SELECT $1a;
+ ^
--
-- Test implicit type conversions
-- This fails for Postgres v6.1 (and earlier?)
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
index fb75f97832..be7d6dfe0c 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -17,7 +17,6 @@
SELECT 0.0e;
SELECT 0.0e+a;
PREPARE p1 AS SELECT $1a;
-EXECUTE p1(1);
--
-- Test implicit type conversions
--
2.34.1
From d40d84e76525f732ee8a07ffd62c68db5368c842 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <pe...@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v7 6/7] Non-decimal integer literals
Add support for hexadecimal, octal, and binary integer literals:
0x42F
0o273
0b100101
per SQL:202x draft.
This adds support in the lexer as well as in the integer type input
functions.
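The radix handling can be sketched as follows. This is not the patch's numutils.c code. It parses an unsigned value into uint64_t, treats the 0x/0o/0b prefixes and hex digits case-insensitively as the documentation describes, and reports overflow or an invalid digit by returning false. The names digit_value() and parse_radix_literal() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* value of digit character c in the given base, or -1 if invalid */
static int
digit_value(char c, int base)
{
    int         v;

    if (c >= '0' && c <= '9')
        v = c - '0';
    else if (c >= 'a' && c <= 'f')
        v = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F')
        v = c - 'A' + 10;
    else
        return -1;
    return v < base ? v : -1;
}

static bool
parse_radix_literal(const char *s, uint64_t *result)
{
    int         base = 10;
    uint64_t    val = 0;

    /* detect a radix prefix */
    if (s[0] == '0')
    {
        switch (s[1])
        {
            case 'x':
            case 'X':
                base = 16;
                break;
            case 'o':
            case 'O':
                base = 8;
                break;
            case 'b':
            case 'B':
                base = 2;
                break;
            default:
                break;
        }
        if (base != 10)
            s += 2;
    }

    /* at least one digit is required, also after a prefix */
    if (*s == '\0')
        return false;

    for (; *s; s++)
    {
        int         d = digit_value(*s, base);

        if (d < 0)
            return false;
        /* val * base + d must not exceed UINT64_MAX */
        if (val > (UINT64_MAX - (uint64_t) d) / (uint64_t) base)
            return false;
        val = val * (uint64_t) base + (uint64_t) d;
    }

    *result = val;
    return true;
}
```

Checking overflow before multiplying avoids relying on unsigned wraparound to detect it after the fact.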
Discussion:
https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb...@enterprisedb.com
---
doc/src/sgml/syntax.sgml | 26 ++++
src/backend/catalog/information_schema.sql | 6 +-
src/backend/catalog/sql_features.txt | 1 +
src/backend/parser/scan.l | 101 +++++++++++----
src/backend/utils/adt/numutils.c | 140 +++++++++++++++++++++
src/fe_utils/psqlscan.l | 80 +++++++++---
src/interfaces/ecpg/preproc/pgc.l | 116 +++++++++--------
src/test/regress/expected/int2.out | 19 +++
src/test/regress/expected/int4.out | 19 +++
src/test/regress/expected/int8.out | 19 +++
src/test/regress/expected/numerology.out | 59 ++++++++-
src/test/regress/sql/int2.sql | 7 ++
src/test/regress/sql/int4.sql | 7 ++
src/test/regress/sql/int8.sql | 7 ++
src/test/regress/sql/numerology.sql | 21 +++-
15 files changed, 529 insertions(+), 99 deletions(-)
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index d66560b587..a4f04199c6 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -694,6 +694,32 @@ <title>Numeric Constants</title>
</literallayout>
</para>
+ <para>
+ Additionally, non-decimal integer constants can be used in these forms:
+<synopsis>
+0x<replaceable>hexdigits</replaceable>
+0o<replaceable>octdigits</replaceable>
+0b<replaceable>bindigits</replaceable>
+</synopsis>
+ <replaceable>hexdigits</replaceable> is one or more hexadecimal digits
+ (0-9, A-F), <replaceable>octdigits</replaceable> is one or more octal
+ digits (0-7), <replaceable>bindigits</replaceable> is one or more binary
+ digits (0 or 1). Hexadecimal digits and the radix prefixes can be in
+ upper or lower case. Note that only integers can have non-decimal forms,
+ not numbers with fractional parts.
+ </para>
+
+ <para>
+ These are some examples of this:
+<literallayout>0b100101
+0B10011001
+0o273
+0O755
+0x42f
+0XFFFF
+</literallayout>
+ </para>
+
<para>
<indexterm><primary>integer</primary></indexterm>
<indexterm><primary>bigint</primary></indexterm>
diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index b4f348a24d..1957fc6e2d 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -119,7 +119,7 @@ CREATE FUNCTION _pg_numeric_precision(typid oid, typmod int4) RETURNS integer
WHEN 1700 /*numeric*/ THEN
CASE WHEN $2 = -1
THEN null
- ELSE (($2 - 4) >> 16) & 65535
+ ELSE (($2 - 4) >> 16) & 0xFFFF
END
WHEN 700 /*float4*/ THEN 24 /*FLT_MANT_DIG*/
WHEN 701 /*float8*/ THEN 53 /*DBL_MANT_DIG*/
@@ -147,7 +147,7 @@ CREATE FUNCTION _pg_numeric_scale(typid oid, typmod int4) RETURNS integer
WHEN $1 IN (1700) THEN
CASE WHEN $2 = -1
THEN null
- ELSE ($2 - 4) & 65535
+ ELSE ($2 - 4) & 0xFFFF
END
ELSE null
END;
@@ -163,7 +163,7 @@ CREATE FUNCTION _pg_datetime_precision(typid oid, typmod int4) RETURNS integer
WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */
THEN CASE WHEN $2 < 0 THEN 6 ELSE $2 END
WHEN $1 IN (1186) /* interval */
- THEN CASE WHEN $2 < 0 OR $2 & 65535 = 65535 THEN 6 ELSE $2 & 65535 END
+ THEN CASE WHEN $2 < 0 OR $2 & 0xFFFF = 0xFFFF THEN 6 ELSE $2 & 0xFFFF END
ELSE null
END;
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index b8a78f4d41..545cb45131 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -526,6 +526,7 @@ T652 SQL-dynamic statements in SQL routines NO
T653 SQL-schema statements in external routines YES
T654 SQL-dynamic statements in external routines NO
T655 Cyclically dependent routines YES
+T661 Non-decimal integer literals YES SQL:202x draft
T811 Basic SQL/JSON constructor functions NO
T812 SQL/JSON: JSON_OBJECTAGG NO
T813 SQL/JSON: JSON_ARRAYAGG with ORDER BY NO
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index ab24bf70db..2e1aa62d81 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -124,7 +124,7 @@ static void addlit(char *ytext, int yleng, core_yyscan_t yyscanner);
static void addlitchar(unsigned char ychar, core_yyscan_t yyscanner);
static char *litbufdup(core_yyscan_t yyscanner);
static unsigned char unescape_single_char(unsigned char c, core_yyscan_t yyscanner);
-static int process_integer_literal(const char *token, YYSTYPE *lval);
+static int process_integer_literal(const char *token, YYSTYPE *lval, int base);
static void addunicode(pg_wchar c, yyscan_t yyscanner);
#define yyerror(msg) scanner_yyerror(msg, yyscanner)
@@ -385,26 +385,41 @@ operator {op_chars}+
* Unary minus is not part of a number here. Instead we pass it separately to
* the parser, and there it gets coerced via doNegate().
*
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
*
* {realfail1} and {realfail2} are added to prevent the need for scanner
* backup when the {real} rule fails to match completely.
*/
-digit [0-9]
-
-integer {digit}+
-decimal (({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail {digit}+\.\.
-real ({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1 ({integer}|{decimal})[Ee]
-realfail2 ({integer}|{decimal})[Ee][-+]
-
-integer_junk {integer}{ident_start}
-decimal_junk {decimal}{ident_start}
+decdigit [0-9]
+hexdigit [0-9A-Fa-f]
+octdigit [0-7]
+bindigit [0-1]
+
+decinteger {decdigit}+
+hexinteger 0[xX]{hexdigit}+
+octinteger 0[oO]{octdigit}+
+bininteger 0[bB]{bindigit}+
+
+hexfail 0[xX]
+octfail 0[oO]
+binfail 0[bB]
+
+numeric (({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail {decdigit}+\.\.
+
+real ({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1 ({decinteger}|{numeric})[Ee]
+realfail2 ({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk {decinteger}{ident_start}
+hexinteger_junk {hexinteger}{ident_start}
+octinteger_junk {octinteger}{ident_start}
+bininteger_junk {bininteger}{ident_start}
+numeric_junk {numeric}{ident_start}
real_junk {real}{ident_start}
-param \${integer}
-param_junk \${integer}{ident_start}
+param \${decinteger}
+param_junk \${decinteger}{ident_start}
other .
@@ -984,20 +999,44 @@ other .
yyerror("trailing junk after parameter");
}
-{integer} {
+{decinteger} {
+ SET_YYLLOC();
+ return process_integer_literal(yytext, yylval, 10);
+ }
+{hexinteger} {
+ SET_YYLLOC();
+ return process_integer_literal(yytext + 2, yylval, 16);
+ }
+{octinteger} {
+ SET_YYLLOC();
+ return process_integer_literal(yytext + 2, yylval, 8);
+ }
+{bininteger} {
+ SET_YYLLOC();
+ return process_integer_literal(yytext + 2, yylval, 2);
+ }
+{hexfail} {
+ SET_YYLLOC();
+ yyerror("invalid hexadecimal integer");
+ }
+{octfail} {
SET_YYLLOC();
- return process_integer_literal(yytext, yylval);
+ yyerror("invalid octal integer");
}
-{decimal} {
+{binfail} {
+ SET_YYLLOC();
+ yyerror("invalid binary integer");
+ }
+{numeric} {
SET_YYLLOC();
yylval->str = pstrdup(yytext);
return FCONST;
}
-{decimalfail} {
+{numericfail} {
/* throw back the .., and treat as integer */
yyless(yyleng - 2);
SET_YYLLOC();
- return process_integer_literal(yytext, yylval);
+ return process_integer_literal(yytext, yylval, 10);
}
{real} {
SET_YYLLOC();
@@ -1012,11 +1051,23 @@ other .
SET_YYLLOC();
yyerror("trailing junk after numeric literal");
}
-{integer_junk} {
+{decinteger_junk} {
+ SET_YYLLOC();
+ yyerror("trailing junk after numeric literal");
+ }
+{hexinteger_junk} {
+ SET_YYLLOC();
+ yyerror("trailing junk after numeric literal");
+ }
+{octinteger_junk} {
+ SET_YYLLOC();
+ yyerror("trailing junk after numeric literal");
+ }
+{bininteger_junk} {
SET_YYLLOC();
yyerror("trailing junk after numeric literal");
}
-{decimal_junk} {
+{numeric_junk} {
SET_YYLLOC();
yyerror("trailing junk after numeric literal");
}
@@ -1312,17 +1363,17 @@ litbufdup(core_yyscan_t yyscanner)
}
/*
- * Process {integer}. Note this will also do the right thing with {decimal},
+ * Process {*integer}. Note this will also do the right thing with {numeric},
* ie digits and a decimal point.
*/
static int
-process_integer_literal(const char *token, YYSTYPE *lval)
+process_integer_literal(const char *token, YYSTYPE *lval, int base)
{
int val;
char *endptr;
errno = 0;
- val = strtoint(token, &endptr, 10);
+ val = strtoint(token, &endptr, base);
if (*endptr != '\0' || errno == ERANGE)
{
/* integer too large (or contains decimal pt), treat it as a float */
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index cc3f95d399..37364921d5 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -85,6 +85,17 @@ decimalLength64(const uint64 v)
return t + (v >= PowersOfTen[t]);
}
+static const int8 hexlookup[128] = {
+ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, -1, -1, -1, -1, -1,
+ -1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+ -1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+};
+
/*
* Convert input string to a signed 16 bit integer.
*
@@ -120,6 +131,48 @@ pg_strtoint16(const char *s)
goto invalid_syntax;
/* process digits */
+ if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+ {
+ ptr += 2;
+ while (*ptr && isxdigit((unsigned char) *ptr))
+ {
+ int8 digit = hexlookup[(unsigned char) *ptr];
+
+ if (unlikely(pg_mul_s16_overflow(tmp, 16, &tmp)) ||
+ unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+ goto out_of_range;
+
+ ptr++;
+ }
+ }
+ else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+ {
+ ptr += 2;
+
+ while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+ {
+ int8 digit = (*ptr++ - '0');
+
+ if (unlikely(pg_mul_s16_overflow(tmp, 8, &tmp)) ||
+ unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+ goto out_of_range;
+ }
+ }
+ else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+ {
+ ptr += 2;
+
+ while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+ {
+ int8 digit = (*ptr++ - '0');
+
+ if (unlikely(pg_mul_s16_overflow(tmp, 2, &tmp)) ||
+ unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+ goto out_of_range;
+ }
+ }
+ else
+ {
while (*ptr && isdigit((unsigned char) *ptr))
{
int8 digit = (*ptr++ - '0');
@@ -128,6 +181,7 @@ pg_strtoint16(const char *s)
unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
goto out_of_range;
}
+ }
/* allow trailing whitespace, but not other trailing chars */
while (*ptr != '\0' && isspace((unsigned char) *ptr))
@@ -196,6 +250,48 @@ pg_strtoint32(const char *s)
goto invalid_syntax;
/* process digits */
+ if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+ {
+ ptr += 2;
+ while (*ptr && isxdigit((unsigned char) *ptr))
+ {
+ int8 digit = hexlookup[(unsigned char) *ptr];
+
+ if (unlikely(pg_mul_s32_overflow(tmp, 16, &tmp)) ||
+ unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+ goto out_of_range;
+
+ ptr++;
+ }
+ }
+ else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+ {
+ ptr += 2;
+
+ while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+ {
+ int8 digit = (*ptr++ - '0');
+
+ if (unlikely(pg_mul_s32_overflow(tmp, 8, &tmp)) ||
+ unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+ goto out_of_range;
+ }
+ }
+ else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+ {
+ ptr += 2;
+
+ while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+ {
+ int8 digit = (*ptr++ - '0');
+
+ if (unlikely(pg_mul_s32_overflow(tmp, 2, &tmp)) ||
+ unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+ goto out_of_range;
+ }
+ }
+ else
+ {
while (*ptr && isdigit((unsigned char) *ptr))
{
int8 digit = (*ptr++ - '0');
@@ -204,6 +300,7 @@ pg_strtoint32(const char *s)
unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
goto out_of_range;
}
+ }
/* allow trailing whitespace, but not other trailing chars */
while (*ptr != '\0' && isspace((unsigned char) *ptr))
@@ -280,6 +377,48 @@ pg_strtoint64(const char *s)
goto invalid_syntax;
/* process digits */
+ if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+ {
+ ptr += 2;
+ while (*ptr && isxdigit((unsigned char) *ptr))
+ {
+ int8 digit = hexlookup[(unsigned char) *ptr];
+
+ if (unlikely(pg_mul_s64_overflow(tmp, 16, &tmp)) ||
+ unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+ goto out_of_range;
+
+ ptr++;
+ }
+ }
+ else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+ {
+ ptr += 2;
+
+ while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+ {
+ int8 digit = (*ptr++ - '0');
+
+ if (unlikely(pg_mul_s64_overflow(tmp, 8, &tmp)) ||
+ unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+ goto out_of_range;
+ }
+ }
+ else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+ {
+ ptr += 2;
+
+ while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+ {
+ int8 digit = (*ptr++ - '0');
+
+ if (unlikely(pg_mul_s64_overflow(tmp, 2, &tmp)) ||
+ unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+ goto out_of_range;
+ }
+ }
+ else
+ {
while (*ptr && isdigit((unsigned char) *ptr))
{
int8 digit = (*ptr++ - '0');
@@ -288,6 +427,7 @@ pg_strtoint64(const char *s)
unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
goto out_of_range;
}
+ }
/* allow trailing whitespace, but not other trailing chars */
while (*ptr != '\0' && isspace((unsigned char) *ptr))
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index 0394edb15f..09155a3d5d 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -323,26 +323,41 @@ operator {op_chars}+
* Unary minus is not part of a number here. Instead we pass it separately to
* the parser, and there it gets coerced via doNegate().
*
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
*
* {realfail1} and {realfail2} are added to prevent the need for scanner
* backup when the {real} rule fails to match completely.
*/
-digit [0-9]
-
-integer {digit}+
-decimal (({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail {digit}+\.\.
-real ({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1 ({integer}|{decimal})[Ee]
-realfail2 ({integer}|{decimal})[Ee][-+]
-
-integer_junk {integer}{ident_start}
-decimal_junk {decimal}{ident_start}
+decdigit [0-9]
+hexdigit [0-9A-Fa-f]
+octdigit [0-7]
+bindigit [0-1]
+
+decinteger {decdigit}+
+hexinteger 0[xX]{hexdigit}+
+octinteger 0[oO]{octdigit}+
+bininteger 0[bB]{bindigit}+
+
+hexfail 0[xX]
+octfail 0[oO]
+binfail 0[bB]
+
+numeric (({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail {decdigit}+\.\.
+
+real ({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1 ({decinteger}|{numeric})[Ee]
+realfail2 ({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk {decinteger}{ident_start}
+hexinteger_junk {hexinteger}{ident_start}
+octinteger_junk {octinteger}{ident_start}
+bininteger_junk {bininteger}{ident_start}
+numeric_junk {numeric}{ident_start}
real_junk {real}{ident_start}
-param \${integer}
-param_junk \${integer}{ident_start}
+param \${decinteger}
+param_junk \${decinteger}{ident_start}
/* psql-specific: characters allowed in variable names */
variable_char [A-Za-z\200-\377_0-9]
@@ -848,13 +863,31 @@ other .
ECHO;
}
-{integer} {
+{decinteger} {
+ ECHO;
+ }
+{hexinteger} {
+ ECHO;
+ }
+{octinteger} {
+ ECHO;
+ }
+{bininteger} {
+ ECHO;
+ }
+{hexfail} {
ECHO;
}
-{decimal} {
+{octfail} {
ECHO;
}
-{decimalfail} {
+{binfail} {
+ ECHO;
+ }
+{numeric} {
+ ECHO;
+ }
+{numericfail} {
/* throw back the .., and treat as integer */
yyless(yyleng - 2);
ECHO;
@@ -868,10 +901,19 @@ other .
{realfail2} {
ECHO;
}
-{integer_junk} {
+{decinteger_junk} {
+ ECHO;
+ }
+{hexinteger_junk} {
+ ECHO;
+ }
+{octinteger_junk} {
+ ECHO;
+ }
+{bininteger_junk} {
ECHO;
}
-{decimal_junk} {
+{numeric_junk} {
ECHO;
}
{real_junk} {
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
index 25fb3b43b3..58d1a00d65 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -57,7 +57,7 @@ static bool include_next;
#define startlit() (literalbuf[0] = '\0', literallen = 0)
static void addlit(char *ytext, int yleng);
static void addlitchar(unsigned char);
-static int process_integer_literal(const char *token, YYSTYPE *lval);
+static int process_integer_literal(const char *token, YYSTYPE *lval, int base);
static void parse_include(void);
static bool ecpg_isspace(char ch);
static bool isdefine(void);
@@ -351,26 +351,41 @@ operator {op_chars}+
* Unary minus is not part of a number here. Instead we pass it separately to
* the parser, and there it gets coerced via doNegate().
*
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
*
* {realfail1} and {realfail2} are added to prevent the need for scanner
* backup when the {real} rule fails to match completely.
*/
-digit [0-9]
-
-integer {digit}+
-decimal (({digit}*\.{digit}+)|({digit}+\.{digit}*))
-decimalfail {digit}+\.\.
-real ({integer}|{decimal})[Ee][-+]?{digit}+
-realfail1 ({integer}|{decimal})[Ee]
-realfail2 ({integer}|{decimal})[Ee][-+]
-
-integer_junk {integer}{ident_start}
-decimal_junk {decimal}{ident_start}
+decdigit [0-9]
+hexdigit [0-9A-Fa-f]
+octdigit [0-7]
+bindigit [0-1]
+
+decinteger {decdigit}+
+hexinteger 0[xX]{hexdigit}+
+octinteger 0[oO]{octdigit}+
+bininteger 0[bB]{bindigit}+
+
+hexfail 0[xX]
+octfail 0[oO]
+binfail 0[bB]
+
+numeric (({decinteger}\.{decinteger}?)|(\.{decinteger}))
+numericfail {decdigit}+\.\.
+
+real ({decinteger}|{numeric})[Ee][-+]?{decdigit}+
+realfail1 ({decinteger}|{numeric})[Ee]
+realfail2 ({decinteger}|{numeric})[Ee][-+]
+
+decinteger_junk {decinteger}{ident_start}
+hexinteger_junk {hexinteger}{ident_start}
+octinteger_junk {octinteger}{ident_start}
+bininteger_junk {bininteger}{ident_start}
+numeric_junk {numeric}{ident_start}
real_junk {real}{ident_start}
-param \${integer}
-param_junk \${integer}{ident_start}
+param \${decinteger}
+param_junk \${decinteger}{ident_start}
/* special characters for other dbms */
/* we have to react differently in compat mode */
@@ -400,9 +415,6 @@ include_next [iI][nN][cC][lL][uU][dD][eE]_[nN][eE][xX][tT]
import [iI][mM][pP][oO][rR][tT]
undef [uU][nN][dD][eE][fF]
-/* C version of hex number */
-xch 0[xX][0-9A-Fa-f]*
-
ccomment "//".*\n
if [iI][fF]
@@ -415,7 +427,7 @@ endif [eE][nN][dD][iI][fF]
struct [sS][tT][rR][uU][cC][tT]
exec_sql {exec}{space}*{sql}{space}*
-ipdigit ({digit}|{digit}{digit}|{digit}{digit}{digit})
+ipdigit ({decdigit}|{decdigit}{decdigit}|{decdigit}{decdigit}{decdigit})
ip {ipdigit}\.{ipdigit}\.{ipdigit}\.{ipdigit}
/* we might want to parse all cpp include files */
@@ -933,17 +945,20 @@ cppline {space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
} /* <SQL> */
<C,SQL>{
-{integer} {
- return process_integer_literal(yytext, &base_yylval);
+{decinteger} {
+ return process_integer_literal(yytext, &base_yylval, 10);
}
-{decimal} {
+{hexinteger} {
+ return process_integer_literal(yytext + 2, &base_yylval, 16);
+ }
+{numeric} {
base_yylval.str = mm_strdup(yytext);
return FCONST;
}
-{decimalfail} {
+{numericfail} {
/* throw back the .., and treat as integer */
yyless(yyleng - 2);
- return process_integer_literal(yytext, &base_yylval);
+ return process_integer_literal(yytext, &base_yylval, 10);
}
{real} {
base_yylval.str = mm_strdup(yytext);
@@ -952,27 +967,43 @@ cppline {space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
{realfail1} {
/*
* throw back the [Ee], and figure out whether what
- * remains is an {integer} or {decimal}.
+ * remains is a {decinteger} or {numeric}.
*/
yyless(yyleng - 1);
- return process_integer_literal(yytext, &base_yylval);
+ return process_integer_literal(yytext, &base_yylval, 10);
}
{realfail2} {
/* throw back the [Ee][+-], and proceed as above */
yyless(yyleng - 2);
- return process_integer_literal(yytext, &base_yylval);
+ return process_integer_literal(yytext, &base_yylval, 10);
}
} /* <C,SQL> */
<SQL>{
-/*
- * Note that some trailing junk is valid in C (such as 100LL), so we contain
- * this to SQL mode.
- */
-{integer_junk} {
+{octinteger} {
+ return process_integer_literal(yytext + 2, &base_yylval, 8);
+ }
+{bininteger} {
+ return process_integer_literal(yytext + 2, &base_yylval, 2);
+ }
+
+ /*
+ * Note that some trailing junk is valid in C (such as 100LL), so we contain
+ * this to SQL mode.
+ */
+{decinteger_junk} {
mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
}
-{decimal_junk} {
+{hexinteger_junk} {
+ mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+ }
+{octinteger_junk} {
+ mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+ }
+{bininteger_junk} {
+ mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+ }
+{numeric_junk} {
mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
}
{real_junk} {
@@ -1033,19 +1064,6 @@ cppline {space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
return S_ANYTHING;
}
<C>{ccomment} { ECHO; }
-<C>{xch} {
- char* endptr;
-
- errno = 0;
- base_yylval.ival = strtoul((char *)yytext,&endptr,16);
- if (*endptr != '\0' || errno == ERANGE)
- {
- errno = 0;
- base_yylval.str = mm_strdup(yytext);
- return SCONST;
- }
- return ICONST;
- }
<C>{cppinclude} {
if (system_includes)
{
@@ -1570,17 +1588,17 @@ addlitchar(unsigned char ychar)
}
/*
- * Process {integer}. Note this will also do the right thing with {decimal},
+ * Process {*integer}. Note this will also do the right thing with {numeric},
* ie digits and a decimal point.
*/
static int
-process_integer_literal(const char *token, YYSTYPE *lval)
+process_integer_literal(const char *token, YYSTYPE *lval, int base)
{
int val;
char *endptr;
errno = 0;
- val = strtoint(token, &endptr, 10);
+ val = strtoint(token, &endptr, base);
if (*endptr != '\0' || errno == ERANGE)
{
/* integer too large (or contains decimal pt), treat it as a float */
diff --git a/src/test/regress/expected/int2.out b/src/test/regress/expected/int2.out
index 55ea7202cd..220e1493e8 100644
--- a/src/test/regress/expected/int2.out
+++ b/src/test/regress/expected/int2.out
@@ -306,3 +306,22 @@ FROM (VALUES (-2.5::numeric),
2.5 | 3
(7 rows)
+-- non-decimal literals
+SELECT int2 '0b100101';
+ int2
+------
+ 37
+(1 row)
+
+SELECT int2 '0o273';
+ int2
+------
+ 187
+(1 row)
+
+SELECT int2 '0x42F';
+ int2
+------
+ 1071
+(1 row)
+
diff --git a/src/test/regress/expected/int4.out b/src/test/regress/expected/int4.out
index 9d20b3380f..6fdbd58b40 100644
--- a/src/test/regress/expected/int4.out
+++ b/src/test/regress/expected/int4.out
@@ -437,3 +437,22 @@ SELECT lcm((-2147483648)::int4, 1::int4); -- overflow
ERROR: integer out of range
SELECT lcm(2147483647::int4, 2147483646::int4); -- overflow
ERROR: integer out of range
+-- non-decimal literals
+SELECT int4 '0b100101';
+ int4
+------
+ 37
+(1 row)
+
+SELECT int4 '0o273';
+ int4
+------
+ 187
+(1 row)
+
+SELECT int4 '0x42F';
+ int4
+------
+ 1071
+(1 row)
+
diff --git a/src/test/regress/expected/int8.out b/src/test/regress/expected/int8.out
index 36540ec456..edd15a4353 100644
--- a/src/test/regress/expected/int8.out
+++ b/src/test/regress/expected/int8.out
@@ -932,3 +932,22 @@ SELECT lcm((-9223372036854775808)::int8, 1::int8); -- overflow
ERROR: bigint out of range
SELECT lcm(9223372036854775807::int8, 9223372036854775806::int8); -- overflow
ERROR: bigint out of range
+-- non-decimal literals
+SELECT int8 '0b100101';
+ int8
+------
+ 37
+(1 row)
+
+SELECT int8 '0o273';
+ int8
+------
+ 187
+(1 row)
+
+SELECT int8 '0x42F';
+ int8
+------
+ 1071
+(1 row)
+
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index 77d4843417..d95b24c7b3 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -3,14 +3,33 @@
-- Test various combinations of numeric types and functions.
--
--
--- Trailing junk in numeric literals
+-- numeric literals
--
+SELECT 0b100101;
+ ?column?
+----------
+ 37
+(1 row)
+
+SELECT 0o273;
+ ?column?
+----------
+ 187
+(1 row)
+
+SELECT 0x42F;
+ ?column?
+----------
+ 1071
+(1 row)
+
+-- error cases
SELECT 123abc;
ERROR: trailing junk after numeric literal at or near "123a"
LINE 1: SELECT 123abc;
^
SELECT 0x0o;
-ERROR: trailing junk after numeric literal at or near "0x"
+ERROR: trailing junk after numeric literal at or near "0x0o"
LINE 1: SELECT 0x0o;
^
SELECT 1_2_3;
@@ -45,6 +64,42 @@ PREPARE p1 AS SELECT $1a;
ERROR: trailing junk after parameter at or near "$1a"
LINE 1: PREPARE p1 AS SELECT $1a;
^
+SELECT 0b;
+ERROR: invalid binary integer at or near "0b"
+LINE 1: SELECT 0b;
+ ^
+SELECT 1b;
+ERROR: trailing junk after numeric literal at or near "1b"
+LINE 1: SELECT 1b;
+ ^
+SELECT 0b0x;
+ERROR: trailing junk after numeric literal at or near "0b0x"
+LINE 1: SELECT 0b0x;
+ ^
+SELECT 0o;
+ERROR: invalid octal integer at or near "0o"
+LINE 1: SELECT 0o;
+ ^
+SELECT 1o;
+ERROR: trailing junk after numeric literal at or near "1o"
+LINE 1: SELECT 1o;
+ ^
+SELECT 0o0x;
+ERROR: trailing junk after numeric literal at or near "0o0x"
+LINE 1: SELECT 0o0x;
+ ^
+SELECT 0x;
+ERROR: invalid hexadecimal integer at or near "0x"
+LINE 1: SELECT 0x;
+ ^
+SELECT 1x;
+ERROR: trailing junk after numeric literal at or near "1x"
+LINE 1: SELECT 1x;
+ ^
+SELECT 0x0y;
+ERROR: trailing junk after numeric literal at or near "0x0y"
+LINE 1: SELECT 0x0y;
+ ^
--
-- Test implicit type conversions
-- This fails for Postgres v6.1 (and earlier?)
diff --git a/src/test/regress/sql/int2.sql b/src/test/regress/sql/int2.sql
index 613b344704..0dee22fe6d 100644
--- a/src/test/regress/sql/int2.sql
+++ b/src/test/regress/sql/int2.sql
@@ -112,3 +112,10 @@ CREATE TABLE INT2_TBL(f1 int2);
(0.5::numeric),
(1.5::numeric),
(2.5::numeric)) t(x);
+
+
+-- non-decimal literals
+
+SELECT int2 '0b100101';
+SELECT int2 '0o273';
+SELECT int2 '0x42F';
diff --git a/src/test/regress/sql/int4.sql b/src/test/regress/sql/int4.sql
index 55ec07a147..2a69b1614e 100644
--- a/src/test/regress/sql/int4.sql
+++ b/src/test/regress/sql/int4.sql
@@ -176,3 +176,10 @@ CREATE TABLE INT4_TBL(f1 int4);
SELECT lcm((-2147483648)::int4, 1::int4); -- overflow
SELECT lcm(2147483647::int4, 2147483646::int4); -- overflow
+
+
+-- non-decimal literals
+
+SELECT int4 '0b100101';
+SELECT int4 '0o273';
+SELECT int4 '0x42F';
diff --git a/src/test/regress/sql/int8.sql b/src/test/regress/sql/int8.sql
index 32940b4daa..b7ad696dd8 100644
--- a/src/test/regress/sql/int8.sql
+++ b/src/test/regress/sql/int8.sql
@@ -250,3 +250,10 @@ CREATE TABLE INT8_TBL(q1 int8, q2 int8);
SELECT lcm((-9223372036854775808)::int8, 1::int8); -- overflow
SELECT lcm(9223372036854775807::int8, 9223372036854775806::int8); -- overflow
+
+
+-- non-decimal literals
+
+SELECT int8 '0b100101';
+SELECT int8 '0o273';
+SELECT int8 '0x42F';
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
index be7d6dfe0c..0e12bcc7b7 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -3,10 +3,16 @@
-- Test various combinations of numeric types and functions.
--
+
--
--- Trailing junk in numeric literals
+-- numeric literals
--
+SELECT 0b100101;
+SELECT 0o273;
+SELECT 0x42F;
+
+-- error cases
SELECT 123abc;
SELECT 0x0o;
SELECT 1_2_3;
@@ -18,6 +24,19 @@
SELECT 0.0e+a;
PREPARE p1 AS SELECT $1a;
+SELECT 0b;
+SELECT 1b;
+SELECT 0b0x;
+
+SELECT 0o;
+SELECT 1o;
+SELECT 0o0x;
+
+SELECT 0x;
+SELECT 1x;
+SELECT 0x0y;
+
+
--
-- Test implicit type conversions
-- This fails for Postgres v6.1 (and earlier?)
--
2.34.1
From ac104eaa206f6b98631a2ef18bfdb0afb494bb9c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <pe...@eisentraut.org>
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v7 7/7] WIP: Underscores in numeric literals
Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb...@enterprisedb.com
---
src/backend/parser/Makefile | 2 +-
src/backend/parser/scan.l | 26 +++++++++++++++---
src/test/regress/expected/numerology.out | 34 +++++++++++++++++++++---
src/test/regress/sql/numerology.sql | 7 ++++-
4 files changed, 59 insertions(+), 10 deletions(-)
diff --git a/src/backend/parser/Makefile b/src/backend/parser/Makefile
index 5ddb9a92f0..827bc4c189 100644
--- a/src/backend/parser/Makefile
+++ b/src/backend/parser/Makefile
@@ -56,7 +56,7 @@ gram.c: BISON_CHECK_CMD = $(PERL) $(srcdir)/check_keywords.pl $< $(top_srcdir)/s
scan.c: FLEXFLAGS = -CF -p -p
-scan.c: FLEX_NO_BACKUP=yes
+#scan.c: FLEX_NO_BACKUP=yes
scan.c: FLEX_FIX_WARNING=yes
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 2e1aa62d81..5b574c4233 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -395,10 +395,10 @@ hexdigit [0-9A-Fa-f]
octdigit [0-7]
bindigit [0-1]
-decinteger {decdigit}+
-hexinteger 0[xX]{hexdigit}+
-octinteger 0[oO]{octdigit}+
-bininteger 0[bB]{bindigit}+
+decinteger {decdigit}(_?{decdigit})*
+hexinteger 0[xX](_?{hexdigit})+
+octinteger 0[oO](_?{octdigit})+
+bininteger 0[bB](_?{bindigit})+
hexfail 0[xX]
octfail 0[oO]
@@ -1372,6 +1372,24 @@ process_integer_literal(const char *token, YYSTYPE *lval, int base)
int val;
char *endptr;
+ if (strchr(token, '_'))
+ {
+ char *newtoken = palloc(strlen(token));
+ const char *p1;
+ char *p2;
+
+ p1 = token;
+ p2 = newtoken;
+ while (*p1)
+ {
+ if (*p1 != '_')
+ *p2++ = *p1;
+ p1++;
+ }
+ *p2 = '\0';
+ token = newtoken;
+ }
+
errno = 0;
val = strtoint(token, &endptr, base);
if (*endptr != '\0' || errno == ERANGE)
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index d95b24c7b3..7289a325fc 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -23,6 +23,36 @@ SELECT 0x42F;
1071
(1 row)
+SELECT 1_000_000;
+ ?column?
+----------
+ 1000000
+(1 row)
+
+SELECT 1_2_3;
+ ?column?
+----------
+ 123
+(1 row)
+
+SELECT 0x1EEE_FFFF;
+ ?column?
+-----------
+ 518979583
+(1 row)
+
+SELECT 0o2_73;
+ ?column?
+----------
+ 187
+(1 row)
+
+SELECT 0b_10_0101;
+ ?column?
+----------
+ 37
+(1 row)
+
-- error cases
SELECT 123abc;
ERROR: trailing junk after numeric literal at or near "123a"
@@ -32,10 +62,6 @@ SELECT 0x0o;
ERROR: trailing junk after numeric literal at or near "0x0o"
LINE 1: SELECT 0x0o;
^
-SELECT 1_2_3;
-ERROR: trailing junk after numeric literal at or near "1_"
-LINE 1: SELECT 1_2_3;
- ^
SELECT 0.a;
ERROR: trailing junk after numeric literal at or near "0.a"
LINE 1: SELECT 0.a;
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
index 0e12bcc7b7..f35ff31d9a 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -12,10 +12,15 @@
SELECT 0o273;
SELECT 0x42F;
+SELECT 1_000_000;
+SELECT 1_2_3;
+SELECT 0x1EEE_FFFF;
+SELECT 0o2_73;
+SELECT 0b_10_0101;
+
-- error cases
SELECT 123abc;
SELECT 0x0o;
-SELECT 1_2_3;
SELECT 0.a;
SELECT 0.0a;
SELECT .0a;
--
2.34.1