Re: Non-decimal integer literals

2023-01-24 Thread Ranier Vilela
On Tue, 24 Jan 2023 at 07:24, Dean Rasheed wrote:

> On Tue, 24 Jan 2023 at 00:47, Ranier Vilela  wrote:
> >
> > On 13.01.23 11:01, Dean Rasheed wrote:
> > > So I'm feeling quite good about the end result -- I set out hoping not
> > > to make performance noticeably worse, but ended up making it
> > > significantly better.
> > Hi Dean, thanks for your work.
> >
> > But since PG_RETURN_NULL is a simple return,
> > isn't the "value" var now leaked?
> >
>
> That originates from a prior commit:
>
> ccff2d20ed Convert a few datatype input functions to use "soft" error
> reporting.
>
> and see also a bunch of follow-on commits for other input functions.
>
> It will only return NULL if the input is invalid and escontext is
> non-NULL. You only identified a fraction of the cases where that would
> happen. If we really cared about not leaking memory for invalid
> inputs, we'd have to look at every code path using ereturn()
> (including lower-level functions, and not just in numeric.c). I think
> that would be a waste of time, and counterproductive -- trying to
> immediately free memory for all possible invalid inputs would likely
> complicate a lot of code, and slow down parsing of valid inputs.
> Better to leave it until the owning memory context is freed.
>
Thank you for the explanation.

regards,
Ranier Vilela


Re: Non-decimal integer literals

2023-01-24 Thread Dean Rasheed
On Tue, 24 Jan 2023 at 00:47, Ranier Vilela  wrote:
>
> On 13.01.23 11:01, Dean Rasheed wrote:
> > So I'm feeling quite good about the end result -- I set out hoping not
> > to make performance noticeably worse, but ended up making it
> > significantly better.
> Hi Dean, thanks for your work.
>
> But since PG_RETURN_NULL is a simple return,
> isn't the "value" var now leaked?
>

That originates from a prior commit:

ccff2d20ed Convert a few datatype input functions to use "soft" error reporting.

and see also a bunch of follow-on commits for other input functions.

It will only return NULL if the input is invalid and escontext is
non-NULL. You only identified a fraction of the cases where that would
happen. If we really cared about not leaking memory for invalid
inputs, we'd have to look at every code path using ereturn()
(including lower-level functions, and not just in numeric.c). I think
that would be a waste of time, and counterproductive -- trying to
immediately free memory for all possible invalid inputs would likely
complicate a lot of code, and slow down parsing of valid inputs.
Better to leave it until the owning memory context is freed.
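
The trade-off described above can be illustrated with a toy arena. This is only a sketch under stated assumptions: Arena, arena_alloc(), arena_reset() and parse_digits_sketch() are hypothetical names invented for this example and are not PostgreSQL's MemoryContext API or numeric_in(). The parser returns early on invalid input without freeing its scratch buffer, and the owner of the arena releases everything in one step later.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a memory context: every allocation is tracked. */
typedef union ArenaBlock
{
    union ArenaBlock *next;
    max_align_t align;          /* keeps the payload suitably aligned */
} ArenaBlock;

typedef struct Arena
{
    ArenaBlock *head;
    size_t      nblocks;
} Arena;

static void *
arena_alloc(Arena *arena, size_t size)
{
    ArenaBlock *block = malloc(sizeof(ArenaBlock) + size);

    if (block == NULL)
        return NULL;
    block->next = arena->head;
    arena->head = block;
    arena->nblocks++;
    return block + 1;           /* payload starts right after the header */
}

/* Free everything the arena owns in one pass. */
static void
arena_reset(Arena *arena)
{
    while (arena->head != NULL)
    {
        ArenaBlock *next = arena->head->next;

        free(arena->head);
        arena->head = next;
    }
    arena->nblocks = 0;
}

/* Parse decimal digits; on invalid input, return early with no cleanup. */
static bool
parse_digits_sketch(Arena *arena, const char *str, unsigned *result)
{
    size_t      len = strlen(str);
    char       *scratch = arena_alloc(arena, len + 1);
    unsigned    acc = 0;

    assert(result != NULL);
    if (scratch == NULL)
        return false;
    memcpy(scratch, str, len + 1);

    if (len == 0)
        return false;
    for (size_t i = 0; i < len; i++)
    {
        if (!isdigit((unsigned char) scratch[i]))
            return false;       /* "soft" error: scratch stays in the arena */
        acc = acc * 10 + (unsigned) (scratch[i] - '0');
    }
    *result = acc;
    return true;
}
```

The error path stays a plain return, so the valid-input path carries no extra cleanup logic; the cost is that the scratch buffer lives until arena_reset() is called.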

Regards,
Dean




Re: Non-decimal integer literals

2023-01-23 Thread Ranier Vilela
 On 13.01.23 11:01, Dean Rasheed wrote:
> So I'm feeling quite good about the end result -- I set out hoping not
> to make performance noticeably worse, but ended up making it
> significantly better.
Hi Dean, thanks for your work.

But since PG_RETURN_NULL is a simple return,
isn't the "value" var now leaked?

If not, sorry for the noise.

regards,
Ranier Vilela


avoid_leak_value_numeric.patch
Description: Binary data


Re: Non-decimal integer literals

2023-01-23 Thread Dean Rasheed
On Mon, 23 Jan 2023 at 20:00, Joel Jacobson  wrote:
>
> Nice! This also simplifies dealing with non-negative integers represented
> as byte arrays, common in e.g. cryptography code.
>

Ah, interesting. I hadn't thought of that use-case.

> create function numeric_from_bytes(bytea) returns numeric language sql as $$
> select ('0'||right($1::text,-1))::numeric
> $$;
>
> Would we want a built-in function for this?

Not sure. It does feel a bit niche. It's quite common in other
programming languages, but that doesn't mean that a lot of Postgres
users need it. Perhaps start a new thread to gauge people's interest?

Regards,
Dean




Re: Non-decimal integer literals

2023-01-23 Thread Joel Jacobson
On Fri, Jan 13, 2023, at 07:01, Dean Rasheed wrote:
> Attachments:
> * 0001-Add-non-decimal-integer-support-to-type-numeric.patch

Nice! This also simplifies dealing with non-negative integers represented
as byte arrays, common in e.g. cryptography code.

Before, one had to implement numeric_from_bytes(bytea) in plpgsql [1],
which can now be greatly simplified:

create function numeric_from_bytes(bytea) returns numeric language sql as $$
select ('0'||right($1::text,-1))::numeric
$$;

\timing
select numeric_from_bytes(('\x'||repeat('0123456789abcdef',1000))::bytea);
Time: 484.223 ms -- HEAD + plpgsql numeric_from_bytes()
Time: 19.790 ms -- 0001 + simplified numeric_from_bytes()

About 25x faster!

Would we want a built-in function for this?
To avoid the text casts, but also to improve user-friendliness,
since the improved solution is still a hack that a user needing it has to
somehow come up with or find.
The topic "Convert hex in text representation to decimal number" is an old one
on Stackoverflow [2], posted 11 years ago, with a myriad of hackish solutions,
one of which had a bug that I reported.
Many other modern languages seem to have this as a built-in or in their stdlibs:
Python3:
classmethod int.from_bytes(bytes, byteorder='big', *, signed=False)
Rust:
pub const fn from_be_bytes(bytes: [u8; 8]) -> u64

/Joel

[1] https://gist.github.com/joelonsql/f54552db1f0fd6d9b3397d255e51f58a
[2] https://stackoverflow.com/questions/8316164/convert-hex-in-text-representation-to-decimal-number
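
For comparison with the Python and Rust functions mentioned above, here is a minimal C sketch of the same big-endian bytes-to-integer idea. It is limited to 8 bytes (a uint64_t), unlike the numeric-based SQL function, and u64_from_be_bytes_sketch() is a hypothetical name for illustration, not an existing PostgreSQL or libc function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Interpret up to 8 bytes as an unsigned big-endian integer.
 * Returns false if the input is too long to fit in a uint64_t.
 */
static bool
u64_from_be_bytes_sketch(const unsigned char *bytes, size_t len, uint64_t *out)
{
    uint64_t    acc = 0;

    assert(bytes != NULL && out != NULL);
    if (len > 8)
        return false;

    /* Most significant byte first: shift in one byte per step. */
    for (size_t i = 0; i < len; i++)
        acc = (acc << 8) | bytes[i];

    *out = acc;
    return true;
}
```

An arbitrary-length version would need a big-number accumulator, which is what the cast to numeric provides in the SQL function.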




Re: Non-decimal integer literals

2023-01-23 Thread Dean Rasheed
On Mon, 23 Jan 2023 at 15:55, Peter Eisentraut
 wrote:
>
> On 13.01.23 11:01, Dean Rasheed wrote:
> > So I'm feeling quite good about the end result -- I set out hoping not
> > to make performance noticeably worse, but ended up making it
> > significantly better.
>
> This is great!  How do you want to proceed?  You also posted an updated
> patch in the "underscores" thread and suggested some additional work
> there.  In which order should these be addressed, in your opinion?
>

I think it makes most sense if I push 0001 now, and then merge 0002
into the underscores patch. I think at least one of the suggested
changes to the underscores patch required 0002 to work.

Regards,
Dean




Re: Non-decimal integer literals

2023-01-23 Thread Peter Eisentraut

On 13.01.23 11:01, Dean Rasheed wrote:

So I'm feeling quite good about the end result -- I set out hoping not
to make performance noticeably worse, but ended up making it
significantly better.


This is great!  How do you want to proceed?  You also posted an updated 
patch in the "underscores" thread and suggested some additional work 
there.  In which order should these be addressed, in your opinion?






Re: Non-decimal integer literals

2023-01-13 Thread Dean Rasheed
On Wed, 14 Dec 2022 at 05:47, Peter Eisentraut
 wrote:
>
> committed

Now that we have this for integer types, I think it's worth doing for
numeric as well, since the parser will now pass such things through to
numeric_in() when they don't fit in an int64, and it seems plausible
that at least some people might use non-decimal integers beyond
INT64MIN/MAX. Also, without such support in numeric_in(), the feature
looks a little incomplete:

SELECT -0x8000000000000000;
       ?column?
----------------------
 -9223372036854775808
(1 row)

SELECT 0x8000000000000000;
ERROR:  invalid input syntax for type numeric: "0x8000000000000000"
LINE 1: select 0x8000000000000000;
               ^

One concern I had was what the performance would be like. I don't
really expect people to pass in the kinds of truly huge values that
numeric supports, but it can't be ruled out. So I gave it a go, to see
how hard it would be, and what the worst-case performance looks like.
(I included underscore-handling too, so that I could measure that at
the same time.)

The base-conversion algorithm is O(N^2), and the worst case before
overflow is with hex strings with around 108,000 digits, oct strings
with around 145,000 digits, or binary strings with around 435,000
digits. Each of those takes around 400ms to parse on my machine.
That's around the level at which I might consider adding
CHECK_FOR_INTERRUPTS()'s, but I think that it's probably not worth it,
given how unrealistic such huge inputs are in practice.
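
To see where the quadratic cost comes from, here is a minimal schoolbook sketch, not the algorithm used in the attached patch. It converts a hex string into little-endian base-10000 digits; every input digit triggers a pass over all output digits produced so far. The function name and NBASE_SKETCH are hypothetical, chosen for this example only.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define NBASE_SKETCH 10000

/*
 * Convert hex digits to little-endian base-10000 digits.
 * Returns the number of output digits (0 for the value zero),
 * or -1 on invalid input or when the output array is too small.
 */
static int
hex_to_base10000_sketch(const char *hex, unsigned *digits, int maxdigits)
{
    int         ndigits = 0;

    assert(hex != NULL && digits != NULL);
    if (*hex == '\0')
        return -1;

    for (; *hex != '\0'; hex++)
    {
        unsigned char c = (unsigned char) *hex;
        unsigned    carry;

        if (!isxdigit(c))
            return -1;
        carry = isdigit(c) ? (unsigned) (c - '0')
            : (unsigned) (tolower(c) - 'a' + 10);

        /* digits = digits * 16 + carry: one full pass per input digit */
        for (int i = 0; i < ndigits; i++)
        {
            unsigned    v = digits[i] * 16 + carry;

            digits[i] = v % NBASE_SKETCH;
            carry = v / NBASE_SKETCH;
        }
        if (carry > 0)
        {
            if (ndigits >= maxdigits)
                return -1;
            digits[ndigits++] = carry;
        }
    }
    return ndigits;
}
```

With N input digits the output grows to O(N) digits, and each of the N steps walks that whole array, giving O(N^2) work overall.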

The other important thing is that this shouldn't impact the
performance when parsing regular decimal inputs. The bulk of the
non-decimal integer parsing is handled by a separate function, which
is called directly from numeric_in(), since non-decimal handling isn't
required at the set_var_from_str() level (used by the float4/8 ->
numeric conversion functions). I also re-arranged the numeric_in()
code somewhat, and was able to make substantial savings by reducing
the number of pg_strncasecmp() calls, and avoiding those calls
entirely for regular numbers that aren't NaN or Inf. Testing that with
COPY with a few million numbers of different sizes, I observed a
10-15% performance increase.

So I'm feeling quite good about the end result -- I set out hoping not
to make performance noticeably worse, but ended up making it
significantly better.

Regards,
Dean
From f129bcdaeaaa62d8ddaf6a8e6441183f46097687 Mon Sep 17 00:00:00 2001
From: Dean Rasheed 
Date: Fri, 13 Jan 2023 09:20:17 +
Subject: [PATCH 1/2] Add non-decimal integer support to type numeric.

This enhances the numeric type input function, adding support for
hexadecimal, octal, and binary integers of any size, up to the limits
of the numeric type.

Since 6fcda9aba8, such non-decimal integers have been accepted by the
parser as integer literals and passed through to numeric_in(). This
commit gives numeric_in() the ability to handle them.

While at it, simplify the handling of NaN and infinities, reducing the
number of calls to pg_strncasecmp(), and arrange for pg_strncasecmp()
to not be called at all for regular numbers. This gives a significant
performance improvement for decimal inputs, more than offsetting the
small performance hit of checking for non-decimal input.
---
 src/backend/utils/adt/numeric.c  | 355 +++
 src/test/regress/expected/numeric.out|  62 +++-
 src/test/regress/expected/numerology.out |  48 +--
 src/test/regress/sql/numeric.sql |  10 +
 4 files changed, 380 insertions(+), 95 deletions(-)

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index a6409ecbee..ed592841dc 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -500,6 +500,11 @@ static void zero_var(NumericVar *var);
 static bool set_var_from_str(const char *str, const char *cp,
 			 NumericVar *dest, const char **endptr,
 			 Node *escontext);
+static bool set_var_from_non_decimal_integer_str(const char *str,
+ const char *cp, int sign,
+ int base, NumericVar *dest,
+ const char **endptr,
+ Node *escontext);
 static void set_var_from_num(Numeric num, NumericVar *dest);
 static void init_var_from_num(Numeric num, NumericVar *dest);
 static void set_var_from_var(const NumericVar *value, NumericVar *dest);
@@ -625,6 +630,8 @@ numeric_in(PG_FUNCTION_ARGS)
 	Node	   *escontext = fcinfo->context;
 	Numeric		res;
 	const char *cp;
+	const char *numstart;
+	int			sign;
 
 	/* Skip leading spaces */
 	cp = str;
@@ -636,70 +643,130 @@ numeric_in(PG_FUNCTION_ARGS)
 	}
 
 	/*
-	 * Check for NaN and infinities.  We recognize the same strings allowed by
-	 * float8in().
+	 * Process the number's sign. This duplicates logic in set_var_from_str(),
+	 * but it's worth doing here, since it simplifies the handling of
+	 * infinities and non-decimal integers.
 	 */
-	if (pg_strncasecmp(cp, "NaN", 3) == 0)
-	{
-		res = make_result(&const_nan);
-		cp += 3;
-	}
-	else if 

Re: Non-decimal integer literals

2022-12-13 Thread Peter Eisentraut

On 08.12.22 12:16, Peter Eisentraut wrote:

On 29.11.22 21:22, David Rowley wrote:

There seems to be a small bug in the pg_strtointXX functions in the
code that checks that there's at least 1 digit.  This causes 0x to be
a valid representation of zero.  That does not seem to be allowed by
the parser, so I think we should likely reject it in COPY too.
-- probably shouldn't work
postgres=# copy a from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.

0x
\.

COPY 1


Fixed in new patch.  I moved the "require at least one digit" checks 
after the loops over the digits, to make it easier to write one check 
for all bases.


This patch also incorporates your changes to the digit analysis
algorithm.  I didn't check it carefully, but all the tests still pass. ;-)


committed




Re: Non-decimal integer literals

2022-12-08 Thread Peter Eisentraut

On 29.11.22 21:22, David Rowley wrote:

There seems to be a small bug in the pg_strtointXX functions in the
code that checks that there's at least 1 digit.  This causes 0x to be
a valid representation of zero.  That does not seem to be allowed by
the parser, so I think we should likely reject it in COPY too.
-- probably shouldn't work
postgres=# copy a from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.

0x
\.

COPY 1


Fixed in new patch.  I moved the "require at least one digit" checks 
after the loops over the digits, to make it easier to write one check 
for all bases.
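
The idea of doing the "at least one digit" check once, after the digit loop, can be sketched as follows. This is a simplified illustration and not the code from the patch: parse_prefixed_sketch() and digit_value() are hypothetical names, signs are not handled, and overflow is deliberately ignored (unsigned arithmetic simply wraps).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Value of c as a digit in the given base, or -1 if it is not one. */
static int
digit_value(char c, int base)
{
    int         v;

    if (c >= '0' && c <= '9')
        v = c - '0';
    else if (c >= 'a' && c <= 'f')
        v = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F')
        v = c - 'A' + 10;
    else
        return -1;
    return (v < base) ? v : -1;
}

static bool
parse_prefixed_sketch(const char *s, unsigned long *result)
{
    int         base = 10;
    const char *firstdigit;
    unsigned long acc = 0;

    assert(s != NULL && result != NULL);

    /* Pick the base from an optional 0x / 0o / 0b prefix. */
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
        base = 16, s += 2;
    else if (s[0] == '0' && (s[1] == 'o' || s[1] == 'O'))
        base = 8, s += 2;
    else if (s[0] == '0' && (s[1] == 'b' || s[1] == 'B'))
        base = 2, s += 2;

    firstdigit = s;
    while (digit_value(*s, base) >= 0)
    {
        acc = acc * (unsigned long) base + (unsigned long) digit_value(*s, base);
        s++;
    }

    /* One check covers every base: no digits consumed, or trailing junk. */
    if (s == firstdigit || *s != '\0')
        return false;

    *result = acc;
    return true;
}
```

Because the check compares the end pointer with the position of the first digit, a bare prefix such as "0x" is rejected by the same test that rejects an empty decimal string.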


This patch also incorporates your changes to the digit analysis
algorithm.  I didn't check it carefully, but all the tests still pass. ;-)
From 76510f2077d3075653a9bbe899b9d4752953d30e Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 8 Dec 2022 12:10:41 +0100
Subject: [PATCH v12] Non-decimal integer literals

Add support for hexadecimal, octal, and binary integer literals:

0x42F
0o273
0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

Discussion: 
https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb...@enterprisedb.com
---
 doc/src/sgml/syntax.sgml   |  34 
 src/backend/catalog/information_schema.sql |   6 +-
 src/backend/catalog/sql_features.txt   |   1 +
 src/backend/parser/parse_node.c|  37 +++-
 src/backend/parser/scan.l  | 101 ---
 src/backend/utils/adt/numutils.c   | 185 +---
 src/fe_utils/psqlscan.l|  78 +++--
 src/interfaces/ecpg/preproc/pgc.l  | 106 ++-
 src/test/regress/expected/int2.out |  92 ++
 src/test/regress/expected/int4.out |  92 ++
 src/test/regress/expected/int8.out |  92 ++
 src/test/regress/expected/numerology.out   | 193 -
 src/test/regress/sql/int2.sql  |  26 +++
 src/test/regress/sql/int4.sql  |  26 +++
 src/test/regress/sql/int8.sql  |  26 +++
 src/test/regress/sql/numerology.sql|  51 +-
 16 files changed, 1028 insertions(+), 118 deletions(-)

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 93ad71737f..956182e7c6 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -694,6 +694,40 @@ Numeric Constants
 
 
 
+
+ Additionally, non-decimal integer constants can be used in these forms:
+
+0xhexdigits
+0ooctdigits
+0bbindigits
+
+ hexdigits is one or more hexadecimal digits
+ (0-9, A-F), octdigits is one or more octal
+ digits (0-7), bindigits is one or more binary
+ digits (0 or 1).  Hexadecimal digits and the radix prefixes can be in
+ upper or lower case.  Note that only integers can have non-decimal forms,
+ not numbers with fractional parts.
+
+
+
+ These are some examples of this:
+0b100101
+0B10011001
+0o273
+0O755
+0x42f
+0XFFFF
+
+
+
+
+ 
+  Nondecimal integer constants are currently only supported in the range
+  of the bigint type (see ).
+ 
+
+
 
  integer
  bigint
diff --git a/src/backend/catalog/information_schema.sql 
b/src/backend/catalog/information_schema.sql
index 18725a02d1..95c27a625e 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -119,7 +119,7 @@ CREATE FUNCTION _pg_numeric_precision(typid oid, typmod 
int4) RETURNS integer
  WHEN 1700 /*numeric*/ THEN
   CASE WHEN $2 = -1
THEN null
-   ELSE (($2 - 4) >> 16) & 65535
+   ELSE (($2 - 4) >> 16) & 0xFFFF
END
  WHEN 700 /*float4*/ THEN 24 /*FLT_MANT_DIG*/
  WHEN 701 /*float8*/ THEN 53 /*DBL_MANT_DIG*/
@@ -147,7 +147,7 @@ CREATE FUNCTION _pg_numeric_scale(typid oid, typmod int4) 
RETURNS integer
WHEN $1 IN (1700) THEN
 CASE WHEN $2 = -1
  THEN null
- ELSE ($2 - 4) & 65535
+ ELSE ($2 - 4) & 0xFFFF
  END
ELSE null
   END;
@@ -163,7 +163,7 @@ CREATE FUNCTION _pg_datetime_precision(typid oid, typmod 
int4) RETURNS integer
WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */
THEN CASE WHEN $2 < 0 THEN 6 ELSE $2 END
WHEN $1 IN (1186) /* interval */
-   THEN CASE WHEN $2 < 0 OR $2 & 65535 = 65535 THEN 6 ELSE $2 & 65535 END
+   THEN CASE WHEN $2 < 0 OR $2 & 0xFFFF = 0xFFFF THEN 6 ELSE $2 & 0xFFFF END
ELSE null
   END;
 
diff --git a/src/backend/catalog/sql_features.txt 
b/src/backend/catalog/sql_features.txt
index 8704a42b60..abad216b7e 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catal

Re: Non-decimal integer literals

2022-11-30 Thread Tom Lane
David Rowley  writes:
> I agree that it should be a separate patch.  But thinking about what
> Tom mentioned in [1], I had in mind this patch would need to wait
> until the new standard is out so that we have a more genuine reason
> for breaking existing queries.

Well, we already broke them in v15: that example now gives

regression=# select 0x42e;
ERROR:  trailing junk after numeric literal at or near "0x"
LINE 1: select 0x42e;
   ^

So there's probably no compatibility reason not to drop the
other shoe.

regards, tom lane




Re: Non-decimal integer literals

2022-11-30 Thread David Rowley
On Thu, 1 Dec 2022 at 00:34, Dean Rasheed  wrote:
> So something
> like:
>
> // Accumulate positive value using unsigned int, with approximate
> // overflow check. If acc >= 1 - INT_MIN / 10, then acc * 10 is
> // sure to exceed -INT_MIN.
> unsigned int cutoff = 1 - INT_MIN / 10;
> unsigned int acc = 0;
>
> while (*ptr && isdigit((unsigned char) *ptr))
> {
> if (unlikely(acc >= cutoff))
> goto out_of_range;
> acc = acc * 10 + (*ptr - '0');
> ptr++;
> }
>
> and similar for other bases, allowing the coding for all bases to be
> kept similar.

Seems like a good idea to me. Couldn't the cutoff check just be "acc >
INT_MAX / 10"?

> I think it's probably best to consider this as a follow-on patch
> though. It shouldn't delay getting the main feature committed.

I agree that it should be a separate patch.  But thinking about what
Tom mentioned in [1], I had in mind this patch would need to wait
until the new standard is out so that we have a more genuine reason
for breaking existing queries.

I've drafted up a full patch for improving the current base-10 code,
so I'll go post that on another thread.

David

[1] https://postgr.es/m/3260805.1631106...@sss.pgh.pa.us




Re: Non-decimal integer literals

2022-11-30 Thread Dean Rasheed
On Wed, 30 Nov 2022 at 05:50, David Rowley  wrote:
>
> I spent a bit more time trying to figure out why the compiler does
> imul instead of bit shifting and it just seems to be down to a
> combination of signed-ness plus the overflow check. See [1]. Neither
> of the two compilers I tested could use bit shifting with a signed
> type when overflow checking is done, which is what we're doing in the
> new code.
>

Ah, interesting. That makes me think that it might be possible to get
some performance gains for all bases (including 10) by separating the
overflow check from the multiplication, and giving the compiler the
best chance to decide on the optimal way to do the multiplication. For
example, on my Intel box, GCC prefers a pair of LEA instructions over
an IMUL, to multiply by 10.

I like your previous idea of using an unsigned integer for the
accumulator, because then the overflow check in the loop doesn't need
to be exact, as long as an exact check is done later. That way, there
are fewer conditional branches in the loop, and the possibility for
the compiler to choose the fastest multiplication method. So something
like:

// Accumulate positive value using unsigned int, with approximate
// overflow check. If acc >= 1 - INT_MIN / 10, then acc * 10 is
// sure to exceed -INT_MIN.
unsigned int cutoff = 1 - INT_MIN / 10;
unsigned int acc = 0;

while (*ptr && isdigit((unsigned char) *ptr))
{
if (unlikely(acc >= cutoff))
goto out_of_range;
acc = acc * 10 + (*ptr - '0');
ptr++;
}

and similar for other bases, allowing the coding for all bases to be
kept similar.
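
For reference, the fragment above can be fleshed out into a complete function along these lines. This is only a sketch assuming a 32-bit int; parse_int_sketch() is a hypothetical name, not PostgreSQL's pg_strtoint32(), and the exact range check is done once after the loop.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

static bool
parse_int_sketch(const char *s, int *result)
{
    /* If acc >= cutoff, then acc * 10 is sure to exceed -INT_MIN. */
    const unsigned int cutoff = (unsigned int) (1 - INT_MIN / 10);
    /* Magnitude of INT_MIN, computed without signed overflow. */
    const unsigned int neg_limit = (unsigned int) INT_MAX + 1u;
    unsigned int acc = 0;
    bool        neg = false;

    assert(s != NULL && result != NULL);

    if (*s == '-')
    {
        neg = true;
        s++;
    }
    else if (*s == '+')
        s++;

    if (!isdigit((unsigned char) *s))
        return false;           /* require at least one digit */

    while (isdigit((unsigned char) *s))
    {
        if (acc >= cutoff)      /* approximate check, cheap in the loop */
            return false;
        acc = acc * 10 + (unsigned int) (*s - '0');
        s++;
    }
    if (*s != '\0')
        return false;           /* trailing junk */

    /* Exact range check, done once after the loop. */
    if (neg)
    {
        if (acc > neg_limit)
            return false;
        *result = (acc == neg_limit) ? INT_MIN : -(int) acc;
    }
    else
    {
        if (acc > (unsigned int) INT_MAX)
            return false;
        *result = (int) acc;
    }
    return true;
}
```

The in-loop check only guarantees that the unsigned accumulator cannot wrap around; values slightly beyond the signed range can still get through the loop and are rejected by the final comparison.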

I think it's probably best to consider this as a follow-on patch
though. It shouldn't delay getting the main feature committed.

Regards,
Dean




Re: Non-decimal integer literals

2022-11-29 Thread David Rowley
On Wed, 23 Nov 2022 at 22:19, John Naylor  wrote:
>
>
> On Wed, Nov 23, 2022 at 3:54 PM David Rowley  wrote:
> >
> > Going by [1], clang will actually use multiplication by 16 to
> > implement the former. gcc is better and shifts left by 4, so likely
> > won't improve things for gcc.  It seems worth doing it this way for
> > anything that does not have HAVE__BUILTIN_OP_OVERFLOW anyway.
>
> FWIW, gcc 12.2 generates an imul on my system when compiling in situ.

I spent a bit more time trying to figure out why the compiler does
imul instead of bit shifting and it just seems to be down to a
combination of signed-ness plus the overflow check. See [1]. Neither
of the two compilers I tested could use bit shifting with a signed
type when overflow checking is done, which is what we're doing in the
new code.

In clang 15, multiplication is done in both smultiply16 and
umultiply16. These both check for overflow. The versions without the
overflow checks both use bit shifting. With GCC, only smultiply16 does
multiplication. The other 3 variants all use bit shifting.
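
The contents of the Compiler Explorer link are not reproduced here, so the following is only a guess at the shape of the four variants being compared. The names smultiply16 and umultiply16 come from the text above; the _checked/_unchecked suffixes and the function bodies are assumptions. __builtin_mul_overflow is the GCC/Clang builtin, which returns true when the multiplication overflows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Signed multiply by 16 with an overflow check. */
static bool
smultiply16_checked(int64_t a, int64_t *out)
{
    assert(out != NULL);
    return __builtin_mul_overflow(a, (int64_t) 16, out);
}

/* Unsigned multiply by 16 with an overflow check. */
static bool
umultiply16_checked(uint64_t a, uint64_t *out)
{
    assert(out != NULL);
    return __builtin_mul_overflow(a, (uint64_t) 16, out);
}

/* Signed multiply without a check; overflow here is undefined behavior. */
static int64_t
smultiply16_unchecked(int64_t a)
{
    return a * 16;
}

/* Unsigned multiply without a check; wraps modulo 2^64. */
static uint64_t
umultiply16_unchecked(uint64_t a)
{
    return a * 16;
}
```

Compiling something like this with optimization enabled and inspecting the assembly is one way to check which variants end up as a shift and which as a multiply on a given compiler.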

David

[1] https://godbolt.org/z/EG9jKMjq5




Re: Non-decimal integer literals

2022-11-29 Thread David Rowley
On Tue, 29 Nov 2022 at 03:00, Peter Eisentraut
 wrote:
> Fixed in new patch.

There seems to be a small bug in the pg_strtointXX functions in the
code that checks that there's at least 1 digit.  This causes 0x to be
a valid representation of zero.  That does not seem to be allowed by
the parser, so I think we should likely reject it in COPY too.

-- Does not work.
postgres=# select 0x + 1;
ERROR:  invalid hexadecimal integer at or near "0x"
LINE 1: select 0x + 1;


postgres=# create table a (a int);
CREATE TABLE

-- probably shouldn't work
postgres=# copy a from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 0x
>> \.
COPY 1

David




Re: Non-decimal integer literals

2022-11-29 Thread David Rowley
On Tue, 29 Nov 2022 at 23:11, Dean Rasheed  wrote:
>
> On Wed, 23 Nov 2022 at 08:56, David Rowley  wrote:
> >
> > On Wed, 23 Nov 2022 at 21:54, David Rowley  wrote:
> > > I wonder if you'd be better off with something like:
> > >
> > > while (*ptr && isxdigit((unsigned char) *ptr))
> > > {
> > > > if (unlikely(tmp & UINT64CONST(0xF000000000000000)))
> > > goto out_of_range;
> > >
> > > tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++];
> > > }
> >
> > Here's a delta diff with it changed to work that way.
> >
>
> This isn't correct, because those functions are meant to accumulate a
> negative number in "tmp".

Looks like I didn't quite look at that code closely enough.

To make that work we could just form the non-decimal versions in an
unsigned integer of the given size and then check if that's become
greater than -PG_INTXX_MIN after the loop.  We'd then just need to
convert it back to its negative form.

i.e:

uint64 tmp2 = 0;
ptr += 2;
while (*ptr && isxdigit((unsigned char) *ptr))
{
if (unlikely(tmp2 & UINT64CONST(0xF000000000000000)))
goto out_of_range;

tmp2 = (tmp2 << 4) | hexlookup[(unsigned char) *ptr++];
}

if (tmp2 > -PG_INT64_MIN)
goto out_of_range;
tmp = -((int64) tmp2);
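
A self-contained version of this idea might look like the sketch below. It is an illustration only, with hypothetical names (parse_neg_hex_sketch(), hexval()) and standard C types in place of PostgreSQL's int64/uint64 and hexlookup[]. One deliberate difference from the fragment above: the limit is computed as (uint64_t) INT64_MAX + 1 instead of negating the minimum value, because negating INT64_MIN in signed arithmetic overflows.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value of one hex digit; the caller guarantees isxdigit(c). */
static unsigned
hexval(unsigned char c)
{
    if (c >= '0' && c <= '9')
        return (unsigned) (c - '0');
    return (unsigned) (tolower(c) - 'a') + 10;
}

/*
 * Parse hex digits (without the "0x" prefix) and store the NEGATED value,
 * mirroring input functions that accumulate a negative number.
 */
static bool
parse_neg_hex_sketch(const char *digits, int64_t *result)
{
    const uint64_t limit = (uint64_t) INT64_MAX + 1;    /* |INT64_MIN| */
    uint64_t    acc = 0;

    assert(digits != NULL && result != NULL);
    if (!isxdigit((unsigned char) *digits))
        return false;           /* require at least one digit */

    while (isxdigit((unsigned char) *digits))
    {
        /* If the top nibble is occupied, shifting left by 4 would lose bits. */
        if (acc & UINT64_C(0xF000000000000000))
            return false;
        acc = (acc << 4) | hexval((unsigned char) *digits++);
    }
    if (*digits != '\0')
        return false;

    /* Exact range check once, after the loop. */
    if (acc > limit)
        return false;
    *result = (acc == limit) ? INT64_MIN : -(int64_t) acc;
    return true;
}
```

The loop body contains a single mask test and a shift, and the sign handling happens entirely outside the loop.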

David




Re: Non-decimal integer literals

2022-11-29 Thread Dean Rasheed
On Wed, 23 Nov 2022 at 08:56, David Rowley  wrote:
>
> On Wed, 23 Nov 2022 at 21:54, David Rowley  wrote:
> > I wonder if you'd be better off with something like:
> >
> > while (*ptr && isxdigit((unsigned char) *ptr))
> > {
> > if (unlikely(tmp & UINT64CONST(0xF000000000000000)))
> > goto out_of_range;
> >
> > tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++];
> > }
>
> Here's a delta diff with it changed to work that way.
>

This isn't correct, because those functions are meant to accumulate a
negative number in "tmp".

The overflow check can't just ignore the final digit either, so I'm
not sure how much this would end up saving once those issues are
fixed.

Regards,
Dean




Re: Non-decimal integer literals

2022-11-28 Thread David Rowley
On Sat, 26 Nov 2022 at 05:13, Peter Eisentraut
 wrote:
>
> On 24.11.22 10:13, David Rowley wrote:
> > I
> > remember many years ago and several jobs ago when working with SQL
> > Server being able to speed up importing data using hexadecimal
> > DATETIMEs. I can't think why else you might want to represent a
> > DATETIME as a hexstring, so I assumed this was a large part of the use
> > case for INTs in PostgreSQL. Are you telling me that better
> > performance is not something anyone will want out of this feature?
>
> This isn't about datetimes but about integers.

I'm aware. My aim was to show that hex is commonly used as a more
efficient way of getting integer numbers in and out of computers.

Likely it's better for me to quantify this performance increase claim
with some actual performance results.

Here's master (@f0cd57f85) doing copy ab2 from '/tmp/ab.csv';

ab2 is a table with no indexes and just 2 int columns.

  16.55%  postgres  [.] CopyReadLine
   7.82%  postgres  [.] pg_strtoint32
   7.60%  postgres  [.] CopyReadAttributesText
   7.06%  postgres  [.] NextCopyFrom
   4.40%  postgres  [.] CopyFrom

The copy completes in 2512.5278 ms (average time over 10 runs)

Patching master with your v11 patch and copying in hex numbers instead
of decimal numbers shows:

  14.39%  postgres  [.] CopyReadLine
   8.60%  postgres  [.] pg_strtoint32
   6.95%  postgres  [.] NextCopyFrom
   6.79%  postgres  [.] CopyReadAttributesText
   4.81%  postgres  [.] CopyFrom

This shows that we're spending proportionally less time in
CopyReadLine() and proportionally more time in pg_strtoint32(). There
are probably two things going on there, CopyReadLine is likely faster
due to having to read fewer bytes and pg_strtoint32() is likely slower
due to additional branching and code size.

This (copy ab2 from '/tmp/abhex.csv') saw an average time of 2720.1387
ms over 10 runs.

Patching master with your v11 patch +
more_efficient_hex_oct_and_binary_processing.diff

  15.68%  postgres  [.] CopyReadLine
   7.75%  postgres  [.] NextCopyFrom
   7.73%  postgres  [.] pg_strtoint32
   6.25%  postgres  [.] CopyReadAttributesText
   4.76%  postgres  [.] CopyFrom

The average time to import the hex version of the csv file comes down
to 2385.7298 ms over 10 runs.

I didn't run any tests to see how much the performance of importing
the decimal representation slowed down from the v11 patch. I assume
there will be a small performance hit due to the extra processing done
in pg_strtoint32()

David




Re: Non-decimal integer literals

2022-11-28 Thread Peter Eisentraut

On 23.11.22 17:25, Dean Rasheed wrote:

Taking a quick look, I noticed that there are no tests for negative
values handled in the parser.

Giving that a spin shows that make_const() fails to correctly identify
the base of negative non-decimal integers in the T_Float case, causing
it to fall through to numeric_in() and fail:


Fixed in new patch.
From 2d7f41981187df904e3d985f2770d9b5c26e9d7c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 28 Nov 2022 09:24:20 +0100
Subject: [PATCH v11] Non-decimal integer literals

Add support for hexadecimal, octal, and binary integer literals:

0x42F
0o273
0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

Discussion: 
https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb...@enterprisedb.com
---
 doc/src/sgml/syntax.sgml   |  34 
 src/backend/catalog/information_schema.sql |   6 +-
 src/backend/catalog/sql_features.txt   |   1 +
 src/backend/parser/parse_node.c|  37 +++-
 src/backend/parser/scan.l  | 101 ---
 src/backend/utils/adt/numutils.c   | 170 --
 src/fe_utils/psqlscan.l|  78 +++--
 src/interfaces/ecpg/preproc/pgc.l  | 106 ++-
 src/test/regress/expected/int2.out |  80 +
 src/test/regress/expected/int4.out |  80 +
 src/test/regress/expected/int8.out |  80 +
 src/test/regress/expected/numerology.out   | 193 -
 src/test/regress/sql/int2.sql  |  22 +++
 src/test/regress/sql/int4.sql  |  22 +++
 src/test/regress/sql/int8.sql  |  22 +++
 src/test/regress/sql/numerology.sql|  51 +-
 16 files changed, 974 insertions(+), 109 deletions(-)

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 93ad71737f51..956182e7c6a8 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -694,6 +694,40 @@ Numeric Constants
 
 
 
+
+ Additionally, non-decimal integer constants can be used in these forms:
+
+0xhexdigits
+0ooctdigits
+0bbindigits
+
+ hexdigits is one or more hexadecimal digits
+ (0-9, A-F), octdigits is one or more octal
+ digits (0-7), bindigits is one or more binary
+ digits (0 or 1).  Hexadecimal digits and the radix prefixes can be in
+ upper or lower case.  Note that only integers can have non-decimal forms,
+ not numbers with fractional parts.
+
+
+
+ These are some examples of this:
+0b100101
+0B10011001
+0o273
+0O755
+0x42f
+0XFFFF
+
+
+
+
+ 
+  Nondecimal integer constants are currently only supported in the range
+  of the bigint type (see ).
+ 
+
+
 
  integer
  bigint
diff --git a/src/backend/catalog/information_schema.sql 
b/src/backend/catalog/information_schema.sql
index 18725a02d1fb..95c27a625e7e 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -119,7 +119,7 @@ CREATE FUNCTION _pg_numeric_precision(typid oid, typmod 
int4) RETURNS integer
  WHEN 1700 /*numeric*/ THEN
   CASE WHEN $2 = -1
THEN null
-   ELSE (($2 - 4) >> 16) & 65535
+   ELSE (($2 - 4) >> 16) & 0xFFFF
END
  WHEN 700 /*float4*/ THEN 24 /*FLT_MANT_DIG*/
  WHEN 701 /*float8*/ THEN 53 /*DBL_MANT_DIG*/
@@ -147,7 +147,7 @@ CREATE FUNCTION _pg_numeric_scale(typid oid, typmod int4) 
RETURNS integer
WHEN $1 IN (1700) THEN
 CASE WHEN $2 = -1
  THEN null
- ELSE ($2 - 4) & 65535
+ ELSE ($2 - 4) & 0xFFFF
  END
ELSE null
   END;
@@ -163,7 +163,7 @@ CREATE FUNCTION _pg_datetime_precision(typid oid, typmod 
int4) RETURNS integer
WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */
THEN CASE WHEN $2 < 0 THEN 6 ELSE $2 END
WHEN $1 IN (1186) /* interval */
-   THEN CASE WHEN $2 < 0 OR $2 & 65535 = 65535 THEN 6 ELSE $2 & 65535 END
+   THEN CASE WHEN $2 < 0 OR $2 & 0xFFFF = 0xFFFF THEN 6 ELSE $2 & 0xFFFF END
ELSE null
   END;
 
diff --git a/src/backend/catalog/sql_features.txt 
b/src/backend/catalog/sql_features.txt
index 8704a42b60a8..abad216b7ee4 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -527,6 +527,7 @@ T652SQL-dynamic statements in SQL routines  
NO
 T653   SQL-schema statements in external routines  YES 
 T654   SQL-dynamic statements in external routines NO  
 T655   Cyclically dependent routines   YES 
+T661   Non-decimal integer literalsYES SQL:202x draft
 T811   Basic SQL/JSON constructor functions

Re: Non-decimal integer literals

2022-11-25 Thread Peter Eisentraut

On 24.11.22 10:13, David Rowley wrote:

On Thu, 24 Nov 2022 at 21:35, Peter Eisentraut
 wrote:

My code follows the style used for parsing the decimal integers.
Keeping that consistent is valuable I think.  I think the proposed
change makes the code significantly harder to understand.  Also, what
you are suggesting here would amount to an attempt to make parsing
hexadecimal integers even faster than parsing decimal integers.  Is that
useful?


Isn't it being faster one of the major use cases for this feature?


Never thought about it that way.


I
remember many years ago and several jobs ago when working with SQL
Server being able to speed up importing data using hexadecimal
DATETIMEs. I can't think why else you might want to represent a
DATETIME as a hexstring, so I assumed this was a large part of the use
case for INTs in PostgreSQL. Are you telling me that better
performance is not something anyone will want out of this feature?


This isn't about datetimes but about integers.





Re: Non-decimal integer literals

2022-11-24 Thread David Rowley
On Thu, 24 Nov 2022 at 21:35, Peter Eisentraut
 wrote:
> My code follows the style used for parsing the decimal integers.
> Keeping that consistent is valuable I think.  I think the proposed
> change makes the code significantly harder to understand.  Also, what
> you are suggesting here would amount to an attempt to make parsing
> hexadecimal integers even faster than parsing decimal integers.  Is that
> useful?

Isn't being faster one of the major use cases for this feature?  I
remember many years ago and several jobs ago when working with SQL
Server being able to speed up importing data using hexadecimal
DATETIMEs. I can't think why else you might want to represent a
DATETIME as a hexstring, so I assumed this was a large part of the use
case for INTs in PostgreSQL. Are you telling me that better
performance is not something anyone will want out of this feature?

David




Re: Non-decimal integer literals

2022-11-24 Thread Peter Eisentraut

On 23.11.22 09:54, David Rowley wrote:

On Wed, 23 Nov 2022 at 02:37, Peter Eisentraut
 wrote:

Here is a new patch.


This looks like quite an inefficient way to convert a hex string into an int64:

 while (*ptr && isxdigit((unsigned char) *ptr))
 {
 int8 digit = hexlookup[(unsigned char) *ptr];

 if (unlikely(pg_mul_s64_overflow(tmp, 16, &tmp)) ||
 unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
 goto out_of_range;

 ptr++;
 }

I wonder if you'd be better off with something like:

 while (*ptr && isxdigit((unsigned char) *ptr))
 {
 if (unlikely(tmp & UINT64CONST(0xF000000000000000)))
 goto out_of_range;

 tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++];
 }

Going by [1], clang will actually use multiplication by 16 to
implement the former. gcc is better and shifts left by 4, so likely
won't improve things for gcc.  It seems worth doing it this way for
anything that does not have HAVE__BUILTIN_OP_OVERFLOW anyway.


My code follows the style used for parsing the decimal integers. 
Keeping that consistent is valuable I think.  I think the proposed 
change makes the code significantly harder to understand.  Also, what 
you are suggesting here would amount to an attempt to make parsing 
hexadecimal integers even faster than parsing decimal integers.  Is that 
useful?





Re: Non-decimal integer literals

2022-11-23 Thread Dean Rasheed
On Tue, 22 Nov 2022 at 13:37, Peter Eisentraut
 wrote:
>
> On 15.11.22 11:31, Peter Eisentraut wrote:
> > On 14.11.22 08:25, John Naylor wrote:
> >> Regarding the patch, it looks good overall. My only suggestion would
> >> be to add a regression test for just below and just above overflow, at
> >> least for int2.
> >
> This was a valuable suggestion, because this found some breakage.  In
> particular, the handling of grammar-level literals that overflow to
> "Float" was not correct.  (The radix prefix was simply stripped and
> forgotten.)  So I added a bunch more tests for this.  Here is a new patch.

Taking a quick look, I noticed that there are no tests for negative
values handled in the parser.

Giving that a spin shows that make_const() fails to correctly identify
the base of negative non-decimal integers in the T_Float case, causing
it to fall through to numeric_in() and fail:

SELECT -0x8000;

ERROR:  invalid input syntax for type numeric: "-0x8000"
   ^
Regards,
Dean
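
To illustrate the kind of check being described (this is not the make_const() code, just a standalone sketch with a hypothetical function name), base detection has to step over an optional sign before it looks for the radix prefix:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the radix (2, 8, 10 or 16) implied by a
 * literal, skipping one optional leading sign before inspecting the prefix.
 */
static int
literal_radix(const char *s)
{
    assert(s != NULL);

    if (*s == '-' || *s == '+')
        s++;

    if (s[0] == '0')
    {
        switch (s[1])
        {
            case 'x':
            case 'X':
                return 16;
            case 'o':
            case 'O':
                return 8;
            case 'b':
            case 'B':
                return 2;
        }
    }

    return 10;
}
```

A version that tests s[0] and s[1] without skipping the sign would classify any negative non-decimal literal as decimal, which matches the symptom reported above.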




Re: Non-decimal integer literals

2022-11-23 Thread John Naylor
On Wed, Nov 23, 2022 at 3:54 PM David Rowley  wrote:
>
> Going by [1], clang will actually use multiplication by 16 to
> implement the former. gcc is better and shifts left by 4, so likely
> won't improve things for gcc.  It seems worth doing it this way for
> anything that does not have HAVE__BUILTIN_OP_OVERFLOW anyway.

FWIW, gcc 12.2 generates an imul on my system when compiling in situ. I've
found it useful to run godbolt locally* and load the entire PG file (nicer
to read than plain objdump) -- compilers can make different decisions when
going from isolated snippets to within full functions.

* clone from https://github.com/compiler-explorer/compiler-explorer
install npm 16
run "make" and when finished will show the localhost url
add the right flags, which in this case was

-Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement
-Werror=vla -Wendif-labels -Wmissing-format-attribute
-Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security
-fno-strict-aliasing -fwrapv -fexcess-precision=standard
-Wno-format-truncation -Wno-stringop-truncation -O2
-I/path/to/srcdir/src/include -I/path/to/builddir/src/include  -D_GNU_SOURCE

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Non-decimal integer literals

2022-11-23 Thread David Rowley
On Wed, 23 Nov 2022 at 21:54, David Rowley  wrote:
> I wonder if you'd be better off with something like:
>
> while (*ptr && isxdigit((unsigned char) *ptr))
> {
> if (unlikely(tmp & UINT64CONST(0xF000000000000000)))
> goto out_of_range;
>
> tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++];
> }

Here's a delta diff with it changed to work that way.

David
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 2942b7c449..ce305b611d 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -136,13 +136,10 @@ pg_strtoint16(const char *s)
ptr += 2;
while (*ptr && isxdigit((unsigned char) *ptr))
{
-   int8 digit = hexlookup[(unsigned char) *ptr];
-
-   if (unlikely(pg_mul_s16_overflow(tmp, 16, &tmp)) ||
-   unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+   if (unlikely(tmp & 0xF000))
goto out_of_range;
 
-   ptr++;
+   tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++];
}
}
else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
@@ -151,11 +148,10 @@ pg_strtoint16(const char *s)
 
while (*ptr && (*ptr >= '0' && *ptr <= '7'))
{
-   int8 digit = (*ptr++ - '0');
-
-   if (unlikely(pg_mul_s16_overflow(tmp, 8, &tmp)) ||
-   unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+   if (unlikely(tmp & 0xE000))
goto out_of_range;
+
+   tmp = (tmp << 3) | (*ptr++ - '0');
}
}
else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
@@ -164,11 +160,10 @@ pg_strtoint16(const char *s)
 
while (*ptr && (*ptr >= '0' && *ptr <= '1'))
{
-   int8 digit = (*ptr++ - '0');
-
-   if (unlikely(pg_mul_s16_overflow(tmp, 2, &tmp)) ||
-   unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+   if (unlikely(tmp & 0x8000))
goto out_of_range;
+
+   tmp = (tmp << 1) | (*ptr++ - '0');
}
}
else
@@ -255,13 +250,10 @@ pg_strtoint32(const char *s)
ptr += 2;
while (*ptr && isxdigit((unsigned char) *ptr))
{
-   int8 digit = hexlookup[(unsigned char) *ptr];
-
-   if (unlikely(pg_mul_s32_overflow(tmp, 16, &tmp)) ||
-   unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+   if (unlikely(tmp & 0xF000))
goto out_of_range;
 
-   ptr++;
+   tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++];
}
}
else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
@@ -270,11 +262,10 @@ pg_strtoint32(const char *s)
 
while (*ptr && (*ptr >= '0' && *ptr <= '7'))
{
-   int8 digit = (*ptr++ - '0');
-
-   if (unlikely(pg_mul_s32_overflow(tmp, 8, &tmp)) ||
-   unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+   if (unlikely(tmp & 0xE000))
goto out_of_range;
+
+   tmp = (tmp << 3) | (*ptr++ - '0');
}
}
else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
@@ -283,11 +274,10 @@ pg_strtoint32(const char *s)
 
while (*ptr && (*ptr >= '0' && *ptr <= '1'))
{
-   int8 digit = (*ptr++ - '0');
-
-   if (unlikely(pg_mul_s32_overflow(tmp, 2, &tmp)) ||
-   unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+   if (unlikely(tmp & 0x8000))
goto out_of_range;
+
+   tmp = (tmp << 1) | (*ptr++ - '0');
}
}
else
@@ -382,13 +372,10 @@ pg_strtoint64(const char *s)
ptr += 2;
while (*ptr && isxdigit((unsigned char) *ptr))
{
-   int8 digit = hexlookup[(unsigned char) *ptr];
-
-   if (unlikely(pg_mul_s64_overflow(tmp, 16, &tmp)) ||
-   unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+   if (unlikely(tmp & UINT64CONST(0xF000000000000000)))
goto out_of_range;
 
-   ptr++;
+   tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++];
}
  

Re: Non-decimal integer literals

2022-11-23 Thread David Rowley
On Wed, 23 Nov 2022 at 02:37, Peter Eisentraut
 wrote:
> Here is a new patch.

This looks like quite an inefficient way to convert a hex string into an int64:

while (*ptr && isxdigit((unsigned char) *ptr))
{
int8 digit = hexlookup[(unsigned char) *ptr];

if (unlikely(pg_mul_s64_overflow(tmp, 16, &tmp)) ||
unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
goto out_of_range;

ptr++;
}

I wonder if you'd be better off with something like:

while (*ptr && isxdigit((unsigned char) *ptr))
{
if (unlikely(tmp & UINT64CONST(0xF000000000000000)))
goto out_of_range;

tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++];
}

Going by [1], clang will actually use multiplication by 16 to
implement the former. gcc is better and shifts left by 4, so likely
won't improve things for gcc.  It seems worth doing it this way for
anything that does not have HAVE__BUILTIN_OP_OVERFLOW anyway.

David

[1] https://godbolt.org/z/jz6Th6jnM




Re: Non-decimal integer literals

2022-11-22 Thread John Naylor
On Tue, Nov 22, 2022 at 8:36 PM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:
>
> On 15.11.22 11:31, Peter Eisentraut wrote:
> > On 14.11.22 08:25, John Naylor wrote:
> >> Regarding the patch, it looks good overall. My only suggestion would
> >> be to add a regression test for just below and just above overflow, at
> >> least for int2.
> >
> > ok
>
> This was a valuable suggestion, because this found some breakage.  In
> particular, the handling of grammar-level literals that overflow to
> "Float" was not correct.  (The radix prefix was simply stripped and
> forgotten.)  So I added a bunch more tests for this.  Here is a new patch.

Looks good to me.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Non-decimal integer literals

2022-11-22 Thread Peter Eisentraut

On 15.11.22 11:31, Peter Eisentraut wrote:

On 14.11.22 08:25, John Naylor wrote:
Regarding the patch, it looks good overall. My only suggestion would 
be to add a regression test for just below and just above overflow, at 
least for int2.


ok


This was a valuable suggestion, because this found some breakage.  In 
particular, the handling of grammar-level literals that overflow to 
"Float" was not correct.  (The radix prefix was simply stripped and 
forgotten.)  So I added a bunch more tests for this.  Here is a new patch.
From c0daab31eb145fbe54c2822bc093d774b993cd3d Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 22 Nov 2022 14:22:09 +0100
Subject: [PATCH v10] Non-decimal integer literals

Add support for hexadecimal, octal, and binary integer literals:

0x42F
0o273
0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

Discussion: 
https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb...@enterprisedb.com
---
 doc/src/sgml/syntax.sgml   |  34 +
 src/backend/catalog/information_schema.sql |   6 +-
 src/backend/catalog/sql_features.txt   |   1 +
 src/backend/parser/parse_node.c|  24 ++-
 src/backend/parser/scan.l  | 101 +---
 src/backend/utils/adt/numutils.c   | 170 +++--
 src/fe_utils/psqlscan.l|  78 +++---
 src/interfaces/ecpg/preproc/pgc.l  | 106 +++--
 src/test/regress/expected/int2.out |  80 ++
 src/test/regress/expected/int4.out |  80 ++
 src/test/regress/expected/int8.out |  80 ++
 src/test/regress/expected/numerology.out   | 127 ++-
 src/test/regress/sql/int2.sql  |  22 +++
 src/test/regress/sql/int4.sql  |  22 +++
 src/test/regress/sql/int8.sql  |  22 +++
 src/test/regress/sql/numerology.sql|  37 -
 16 files changed, 881 insertions(+), 109 deletions(-)

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 93ad71737f51..956182e7c6a8 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -694,6 +694,40 @@ Numeric Constants
 
 
 
+
+ Additionally, non-decimal integer constants can be used in these forms:
+
+0xhexdigits
+0ooctdigits
+0bbindigits
+
+ hexdigits is one or more hexadecimal digits
+ (0-9, A-F), octdigits is one or more octal
+ digits (0-7), bindigits is one or more binary
+ digits (0 or 1).  Hexadecimal digits and the radix prefixes can be in
+ upper or lower case.  Note that only integers can have non-decimal forms,
+ not numbers with fractional parts.
+
+
+
+ These are some examples of this:
+0b100101
+0B10011001
+0o273
+0O755
+0x42f
+0XFFFF
+
+
+
+
+ 
+  Nondecimal integer constants are currently only supported in the range
+  of the bigint type (see ).
+ 
+
+
 
  integer
  bigint
diff --git a/src/backend/catalog/information_schema.sql 
b/src/backend/catalog/information_schema.sql
index 18725a02d1fb..95c27a625e7e 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -119,7 +119,7 @@ CREATE FUNCTION _pg_numeric_precision(typid oid, typmod 
int4) RETURNS integer
  WHEN 1700 /*numeric*/ THEN
   CASE WHEN $2 = -1
THEN null
-   ELSE (($2 - 4) >> 16) & 65535
+   ELSE (($2 - 4) >> 16) & 0xFFFF
END
  WHEN 700 /*float4*/ THEN 24 /*FLT_MANT_DIG*/
  WHEN 701 /*float8*/ THEN 53 /*DBL_MANT_DIG*/
@@ -147,7 +147,7 @@ CREATE FUNCTION _pg_numeric_scale(typid oid, typmod int4) 
RETURNS integer
WHEN $1 IN (1700) THEN
 CASE WHEN $2 = -1
  THEN null
- ELSE ($2 - 4) & 65535
+ ELSE ($2 - 4) & 0xFFFF
  END
ELSE null
   END;
@@ -163,7 +163,7 @@ CREATE FUNCTION _pg_datetime_precision(typid oid, typmod 
int4) RETURNS integer
WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */
THEN CASE WHEN $2 < 0 THEN 6 ELSE $2 END
WHEN $1 IN (1186) /* interval */
-   THEN CASE WHEN $2 < 0 OR $2 & 65535 = 65535 THEN 6 ELSE $2 & 65535 
END
+   THEN CASE WHEN $2 < 0 OR $2 & 0xFFFF = 0xFFFF THEN 6 ELSE $2 & 
0xFFFF END
ELSE null
   END;
 
diff --git a/src/backend/catalog/sql_features.txt 
b/src/backend/catalog/sql_features.txt
index da7c9c772e09..e897e28ed148 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -527,6 +527,7 @@ T652SQL-dynamic statements in SQL routines  
NO
 T653   SQL-schema statements in external routines  YES 
 T654   SQL-dynamic statements in external routines NO   

Re: Non-decimal integer literals

2022-11-15 Thread Peter Eisentraut

On 14.11.22 08:25, John Naylor wrote:
Regarding the patch, it looks good overall. My only suggestion would be 
to add a regression test for just below and just above overflow, at 
least for int2.


ok


Minor nits:

- * Process {integer}.  Note this will also do the right thing with 
{decimal},
+ * Process {*integer}.  Note this will also do the right thing with 
{numeric},


I scratched my head for a while, thinking this was Flex syntax, until I 
realized my brain was supposed to do shell-globbing first, at which 
point I could see it was referring to multiple Flex rules. I'd try to 
rephrase.


ok


+T661 Non-decimal integer literals YES SQL:202x draft

Is there an ETA yet?


March 2023

Also, it's not this patch's job to do it, but there are at least a half 
dozen places that open-code turning a hex char into a number. It might 
be a good easy "todo item" to unify that.


right





Re: Non-decimal integer literals

2022-11-13 Thread John Naylor
On Mon, Oct 10, 2022 at 9:17 PM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> Taking another look around ecpg to see how this interacts with C-syntax
> integer literals.  I'm not aware of any particular issues, but it's
> understandably tricky.

I can find no discussion in the archives about the commit that added "xch":

commit 6fb3c3f78fbb2296894424f6e3183d339915eac7
Author: Michael Meskes 
Date:   Fri Oct 15 19:02:08 1999 +

*** empty log message ***

...and I can't think of why bounds checking a C literal was done like this.

Regarding the patch, it looks good overall. My only suggestion would be to
add a regression test for just below and just above overflow, at least for
int2.

Minor nits:

- * Process {integer}.  Note this will also do the right thing with
{decimal},
+ * Process {*integer}.  Note this will also do the right thing with
{numeric},

I scratched my head for a while, thinking this was Flex syntax, until I
realized my brain was supposed to do shell-globbing first, at which point I
could see it was referring to multiple Flex rules. I'd try to rephrase.

+T661 Non-decimal integer literals YES SQL:202x draft

Is there an ETA yet?

Also, it's not this patch's job to do it, but there are at least a half
dozen places that open-code turning a hex char into a number. It might be a
good easy "todo item" to unify that.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Non-decimal integer literals

2022-10-11 Thread Junwang Zhao
On Tue, Oct 11, 2022 at 4:59 PM Peter Eisentraut
 wrote:
>
> On 11.10.22 05:29, Junwang Zhao wrote:
> > What do you think about moving this code into a static inline function? Like:
> >
> > static inline char*
> > process_digits(char *ptr, int32 *result)
> > {
> > ...
> > }
>
> How would you handle the different ways each branch checks for valid
> digits and computes the value of each digit?
>

Didn't notice that, sorry for the noise ;(


-- 
Regards
Junwang Zhao




Re: Non-decimal integer literals

2022-10-11 Thread Peter Eisentraut

On 11.10.22 05:29, Junwang Zhao wrote:

What do you think about moving this code into a static inline function? Like:

static inline char*
process_digits(char *ptr, int32 *result)
{
...
}


How would you handle the different ways each branch checks for valid 
digits and computes the value of each digit?






Re: Non-decimal integer literals

2022-10-10 Thread Junwang Zhao
Hi Peter,

  /* process digits */
+ if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+ {
+ ptr += 2;
+ while (*ptr && isxdigit((unsigned char) *ptr))
+ {
+ int8 digit = hexlookup[(unsigned char) *ptr];
+
+ if (unlikely(pg_mul_s16_overflow(tmp, 16, &tmp)) ||
+ unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+ goto out_of_range;
+
+ ptr++;
+ }
+ }
+ else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+ {
+ ptr += 2;
+
+ while (*ptr && (*ptr >= '0' && *ptr <= '7'))
+ {
+ int8 digit = (*ptr++ - '0');
+
+ if (unlikely(pg_mul_s16_overflow(tmp, 8, &tmp)) ||
+ unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+ goto out_of_range;
+ }
+ }
+ else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
+ {
+ ptr += 2;
+
+ while (*ptr && (*ptr >= '0' && *ptr <= '1'))
+ {
+ int8 digit = (*ptr++ - '0');
+
+ if (unlikely(pg_mul_s16_overflow(tmp, 2, &tmp)) ||
+ unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+ goto out_of_range;
+ }
+ }
+ else
+ {
  while (*ptr && isdigit((unsigned char) *ptr))
  {
  int8 digit = (*ptr++ - '0');
@@ -128,6 +181,7 @@ pg_strtoint16(const char *s)
  unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
  goto out_of_range;
  }
+ }

What do you think about moving this code into a static inline function? Like:

static inline char*
process_digits(char *ptr, int32 *result)
{
...
}

On Mon, Oct 10, 2022 at 10:17 PM Peter Eisentraut
 wrote:
>
> On 16.02.22 11:11, Peter Eisentraut wrote:
> > The remaining patches are material for PG16 at this point, and I will
> > set the commit fest item to returned with feedback in the meantime.
>
> Time to continue with this.
>
> Attached is a rebased and cleaned up patch for non-decimal integer
> literals.  (I don't include the underscores-in-numeric literals patch.
> I'm keeping that for later.)
>
> Two open issues from my notes:
>
> Technically, numeric_in() should be made aware of this, but that seems
> relatively complicated and maybe not necessary for the first iteration.
>
> Taking another look around ecpg to see how this interacts with C-syntax
> integer literals.  I'm not aware of any particular issues, but it's
> understandably tricky.
>
> Other than that, this seems pretty complete as a start.



-- 
Regards
Junwang Zhao




Re: Non-decimal integer literals

2022-10-10 Thread Peter Eisentraut

On 16.02.22 11:11, Peter Eisentraut wrote:
The remaining patches are material for PG16 at this point, and I will 
set the commit fest item to returned with feedback in the meantime.


Time to continue with this.

Attached is a rebased and cleaned up patch for non-decimal integer 
literals.  (I don't include the underscores-in-numeric literals patch. 
I'm keeping that for later.)


Two open issues from my notes:

Technically, numeric_in() should be made aware of this, but that seems 
relatively complicated and maybe not necessary for the first iteration.


Taking another look around ecpg to see how this interacts with C-syntax 
integer literals.  I'm not aware of any particular issues, but it's 
understandably tricky.


Other than that, this seems pretty complete as a start.
From d0bc72fa4c339ba2ea0bb8d1e5a3923d76ee8105 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 10 Oct 2022 16:03:15 +0200
Subject: [PATCH v9] Non-decimal integer literals

Add support for hexadecimal, octal, and binary integer literals:

0x42F
0o273
0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

Discussion: 
https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb...@enterprisedb.com
---
 doc/src/sgml/syntax.sgml   |  26 
 src/backend/catalog/information_schema.sql |   6 +-
 src/backend/catalog/sql_features.txt   |   1 +
 src/backend/parser/scan.l  |  99 +++
 src/backend/utils/adt/numutils.c   | 140 +
 src/fe_utils/psqlscan.l|  78 +---
 src/interfaces/ecpg/preproc/pgc.l  | 108 +---
 src/test/regress/expected/int2.out |  19 +++
 src/test/regress/expected/int4.out |  19 +++
 src/test/regress/expected/int8.out |  19 +++
 src/test/regress/expected/numerology.out   |  59 -
 src/test/regress/sql/int2.sql  |   7 ++
 src/test/regress/sql/int4.sql  |   7 ++
 src/test/regress/sql/int8.sql  |   7 ++
 src/test/regress/sql/numerology.sql|  21 +++-
 15 files changed, 523 insertions(+), 93 deletions(-)

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 93ad71737f51..bba78c22f1a9 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -694,6 +694,32 @@ Numeric Constants
 
 
 
+
+ Additionally, non-decimal integer constants can be used in these forms:
+
+0xhexdigits
+0ooctdigits
+0bbindigits
+
+ hexdigits is one or more hexadecimal digits
+ (0-9, A-F), octdigits is one or more octal
+ digits (0-7), bindigits is one or more binary
+ digits (0 or 1).  Hexadecimal digits and the radix prefixes can be in
+ upper or lower case.  Note that only integers can have non-decimal forms,
+ not numbers with fractional parts.
+
+
+
+ These are some examples of this:
+0b100101
+0B10011001
+0o273
+0O755
+0x42f
+0XFFFF
+
+
+
 
  integer
  bigint
diff --git a/src/backend/catalog/information_schema.sql 
b/src/backend/catalog/information_schema.sql
index 18725a02d1fb..95c27a625e7e 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -119,7 +119,7 @@ CREATE FUNCTION _pg_numeric_precision(typid oid, typmod 
int4) RETURNS integer
  WHEN 1700 /*numeric*/ THEN
   CASE WHEN $2 = -1
THEN null
-   ELSE (($2 - 4) >> 16) & 65535
+   ELSE (($2 - 4) >> 16) & 0xFFFF
END
  WHEN 700 /*float4*/ THEN 24 /*FLT_MANT_DIG*/
  WHEN 701 /*float8*/ THEN 53 /*DBL_MANT_DIG*/
@@ -147,7 +147,7 @@ CREATE FUNCTION _pg_numeric_scale(typid oid, typmod int4) 
RETURNS integer
WHEN $1 IN (1700) THEN
 CASE WHEN $2 = -1
  THEN null
- ELSE ($2 - 4) & 65535
+ ELSE ($2 - 4) & 0xFFFF
  END
ELSE null
   END;
@@ -163,7 +163,7 @@ CREATE FUNCTION _pg_datetime_precision(typid oid, typmod 
int4) RETURNS integer
WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */
THEN CASE WHEN $2 < 0 THEN 6 ELSE $2 END
WHEN $1 IN (1186) /* interval */
-   THEN CASE WHEN $2 < 0 OR $2 & 65535 = 65535 THEN 6 ELSE $2 & 65535 
END
+   THEN CASE WHEN $2 < 0 OR $2 & 0xFFFF = 0xFFFF THEN 6 ELSE $2 & 
0xFFFF END
ELSE null
   END;
 
diff --git a/src/backend/catalog/sql_features.txt 
b/src/backend/catalog/sql_features.txt
index da7c9c772e09..e897e28ed148 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -527,6 +527,7 @@ T652SQL-dynamic statements in SQL routines  
NO
 T653   SQL-schema statements in external routines  YES 
 T654   SQL-dynamic statements in external routines 

Re: Non-decimal integer literals

2022-02-16 Thread Peter Eisentraut



On 13.02.22 13:14, John Naylor wrote:

On Wed, Jan 26, 2022 at 10:10 PM Peter Eisentraut
 wrote:

[v8 patch]


0001-0004 seem pretty straightforward.


These have been committed.



0005:

  {realfail1} {
- /*
- * throw back the [Ee], and figure out whether what
- * remains is an {integer} or {decimal}.
- */
- yyless(yyleng - 1);
   SET_YYLLOC();
- return process_integer_literal(yytext, yylval);
+ yyerror("trailing junk after numeric literal");
   }

realfail1 has been subsumed by integer_junk and decimal_junk, so that
pattern can be removed.


Committed with that change.

I found that the JSON path lexer has the same trailing-junk issue.  I 
have researched the relevant ECMA standard and it explicitly points out 
that this is not allowed.  I will look into that separately.  I'm just 
pointing that out here because grepping for "realfail1" will still show 
a hit after this.


The remaining patches are material for PG16 at this point, and I will 
set the commit fest item to returned with feedback in the meantime.



0006:

Minor nit -- the s/decimal/numeric/ change doesn't seem to have any
notational advantage, but it's not worse, either.


I did that because with the addition of non-decimal literals, the word 
"decimal" becomes ambiguous or misleading.  (It doesn't mean "uses 
decimal digits" but "has a decimal point".)  (Of course, "numeric" isn't 
entirely free of ambiguity, but there are only so many words available 
in this space. ;-) )



0007:

I've attached an addendum to restore the no-backtrack property.


Thanks, that is helpful.


Will the underscore syntax need treatment in the input routines as well?


Yeah, various additional work is required for this.




Re: Non-decimal integer literals

2022-02-14 Thread Christoph Berg
Re: Peter Eisentraut
> This adds support in the lexer as well as in the integer type input
> functions.
> 
> Those core parts are straightforward enough, but there are a bunch of other
> places where integers are parsed, and one could consider in each case
> whether they should get the same treatment, for example the replication
> syntax lexer, or input function for oid, numeric, and int2vector.

One thing I always found weird is that timeline IDs appear most
prominently as hex numbers in WAL filenames, but they are printed as
decimal in the log ("new timeline id nn"), and have to be specified as
decimal in recovery_target_timeline.

Perhaps both these could make use of 0xhex numbers as well.

Christoph




Re: Non-decimal integer literals

2022-02-13 Thread John Naylor
On Wed, Jan 26, 2022 at 10:10 PM Peter Eisentraut
 wrote:
> [v8 patch]

0001-0004 seem pretty straightforward.

0005:

 {realfail1} {
- /*
- * throw back the [Ee], and figure out whether what
- * remains is an {integer} or {decimal}.
- */
- yyless(yyleng - 1);
  SET_YYLLOC();
- return process_integer_literal(yytext, yylval);
+ yyerror("trailing junk after numeric literal");
  }

realfail1 has been subsumed by integer_junk and decimal_junk, so that
pattern can be removed.

 {
+/*
+ * Note that some trailing junk is valid in C (such as 100LL), so we contain
+ * this to SQL mode.
+ */

It seems Flex doesn't like C comments after the "%%", so this stanza
was indented in 0006. If these are to be committed separately, that
fix should happen here.

0006:

Minor nit -- the s/decimal/numeric/ change doesn't seem to have any
notational advantage, but it's not worse, either.

0007:

I've attached an addendum to restore the no-backtrack property.

Will the underscore syntax need treatment in the input routines as well?

-- 
John Naylor
EDB: http://www.enterprisedb.com
diff --git a/src/backend/parser/Makefile b/src/backend/parser/Makefile
index 827bc4c189..5ddb9a92f0 100644
--- a/src/backend/parser/Makefile
+++ b/src/backend/parser/Makefile
@@ -56,7 +56,7 @@ gram.c: BISON_CHECK_CMD = $(PERL) $(srcdir)/check_keywords.pl 
$< $(top_srcdir)/s
 
 
 scan.c: FLEXFLAGS = -CF -p -p
-#scan.c: FLEX_NO_BACKUP=yes
+scan.c: FLEX_NO_BACKUP=yes
 scan.c: FLEX_FIX_WARNING=yes
 
 
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 5b574c4233..3b311ac2dd 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -400,9 +400,9 @@ hexinteger  0[xX](_?{hexdigit})+
 octinteger 0[oO](_?{octdigit})+
 bininteger 0[bB](_?{bindigit})+
 
-hexfail0[xX]
-octfail0[oO]
-binfail0[bB]
+hexfail0[xX]_?
+octfail0[oO]_?
+binfail0[bB]_?
 
 numeric(({decinteger}\.{decinteger}?)|(\.{decinteger}))
 numericfail{decdigit}+\.\.


Re: Non-decimal integer literals

2022-01-26 Thread Andrew Dunstan


On 1/25/22 13:43, Alvaro Herrera wrote:
> On 2022-Jan-24, Peter Eisentraut wrote:
>
>> +decinteger  {decdigit}(_?{decdigit})*
>> +hexinteger  0[xX](_?{hexdigit})+
>> +octinteger  0[oO](_?{octdigit})+
>> +bininteger  0[bB](_?{bindigit})+
> I think there should be test cases for literals that these seemingly
> strange expressions reject, which are a number with trailing _ (0x123_),
> and one with consecutive underscores __ (0x12__34).
>
> I like the idea of these literals.  I would have found them useful on
> many occasions.


+1. I can't remember the number of times I have miscounted a long string
of digits in a literal.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Non-decimal integer literals

2022-01-26 Thread Peter Eisentraut

On 26.01.22 01:02, Tom Lane wrote:

Robert Haas  writes:

On Tue, Jan 25, 2022 at 5:34 AM Peter Eisentraut
 wrote:

Which part exactly?  There are several different changes proposed here.



I was just going based on the description of the feature in your
original post. If someone is hoping that int4in() will accept only
^\d+$ then they will be disappointed by this patch.


Maybe I misunderstood, but I thought this was about what you could
write as a SQL literal, not about the I/O behavior of the integer
types.  I'd be -0.1 on changing the latter.


I think it would be strange if I/O routines would accept a different 
syntax from the literals.  Also, the behavior of a cast from string/text 
to a numeric type is usually defined in terms of what the literal syntax 
is, so they need to be aligned.





Re: Non-decimal integer literals

2022-01-25 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jan 25, 2022 at 5:34 AM Peter Eisentraut
>  wrote:
>> Which part exactly?  There are several different changes proposed here.

> I was just going based on the description of the feature in your
> original post. If someone is hoping that int4in() will accept only
> ^\d+$ then they will be disappointed by this patch.

Maybe I misunderstood, but I thought this was about what you could
write as a SQL literal, not about the I/O behavior of the integer
types.  I'd be -0.1 on changing the latter.

regards, tom lane




Re: Non-decimal integer literals

2022-01-25 Thread Alvaro Herrera
On 2022-Jan-24, Peter Eisentraut wrote:

> +decinteger   {decdigit}(_?{decdigit})*
> +hexinteger   0[xX](_?{hexdigit})+
> +octinteger   0[oO](_?{octdigit})+
> +bininteger   0[bB](_?{bindigit})+

I think there should be test cases for literals that these seemingly
strange expressions reject, which are a number with trailing _ (0x123_),
and one with consecutive underscores __ (0x12__34).

I like the idea of these literals.  I would have found them useful on
many occasions.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
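
To make the two rejected cases concrete, here is a hand-written matcher for the hexinteger pattern 0[xX](_?{hexdigit})+ quoted above. It is a sketch with a hypothetical function name; unlike the lexer rule, it checks a whole string rather than the longest matching prefix.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: true if the whole string matches
 * 0[xX](_?{hexdigit})+, i.e. each digit may be preceded by at most one
 * underscore and the literal must end in a digit.
 */
static bool
is_hexinteger(const char *s)
{
    assert(s != NULL);

    if (s[0] != '0' || (s[1] != 'x' && s[1] != 'X'))
        return false;
    s += 2;

    /* The pattern requires at least one (_?digit) group. */
    if (*s == '\0')
        return false;

    while (*s)
    {
        if (*s == '_')
            s++;                /* at most one underscore per digit */
        if (!isxdigit((unsigned char) *s))
            return false;       /* rejects "__" and a trailing "_" */
        s++;
    }

    return true;
}
```

With this, "0x12_34" and "0x_12" are accepted, while "0x123_" and "0x12__34" are rejected, which are the cases suggested for the regression tests.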




Re: Non-decimal integer literals

2022-01-25 Thread Robert Haas
On Tue, Jan 25, 2022 at 5:34 AM Peter Eisentraut
 wrote:
> On 24.01.22 19:53, Robert Haas wrote:
> > On Mon, Jan 24, 2022 at 3:41 AM Peter Eisentraut
> >  wrote:
> >> Rebased patch set
> >
> > What if someone finds this new behavior too permissive?
>
> Which part exactly?  There are several different changes proposed here.

I was just going based on the description of the feature in your
original post. If someone is hoping that int4in() will accept only
^\d+$ then they will be disappointed by this patch.

Maybe nobody is hoping that, though.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Non-decimal integer literals

2022-01-25 Thread Peter Eisentraut

On 24.01.22 19:53, Robert Haas wrote:
> On Mon, Jan 24, 2022 at 3:41 AM Peter Eisentraut
>  wrote:
>> Rebased patch set
>
> What if someone finds this new behavior too permissive?

Which part exactly?  There are several different changes proposed here.




Re: Non-decimal integer literals

2022-01-24 Thread Robert Haas
On Mon, Jan 24, 2022 at 3:41 AM Peter Eisentraut
 wrote:
> Rebased patch set

What if someone finds this new behavior too permissive?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Non-decimal integer literals

2022-01-24 Thread Peter Eisentraut
);
return PARAM;
}
+{param_junk}   {
+   mmfatal(PARSE_ERROR, "trailing junk after parameter");
+   }
 
 {ip}   {
base_yylval.str = mm_strdup(yytext);
@@ -957,6 +965,20 @@ cppline
{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 } /*  */
 
 {
+/*
+ * Note that some trailing junk is valid in C (such as 100LL), so we contain
+ * this to SQL mode.
+ */
+{integer_junk} {
+   mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+   }
+{decimal_junk} {
+   mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+   }
+{real_junk}{
+   mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+   }
+
 :{identifier}((("->"|\.){identifier})|(\[{array}\]))*  {
base_yylval.str = mm_strdup(yytext+1);
return CVARIABLE;
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index 2ffc73e854..77d4843417 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -6,64 +6,45 @@
 -- Trailing junk in numeric literals
 --
 SELECT 123abc;
- abc 
------
- 123
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "123a"
+LINE 1: SELECT 123abc;
+               ^
 SELECT 0x0o;
- x0o 
------
-   0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0x"
+LINE 1: SELECT 0x0o;
+               ^
 SELECT 1_2_3;
- _2_3 
--------
-    1
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "1_"
+LINE 1: SELECT 1_2_3;
+               ^
 SELECT 0.a;
- a 
----
- 0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.a"
+LINE 1: SELECT 0.a;
+               ^
 SELECT 0.0a;
-  a  
------
- 0.0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.0a"
+LINE 1: SELECT 0.0a;
+               ^
 SELECT .0a;
-  a  
------
- 0.0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near ".0a"
+LINE 1: SELECT .0a;
+               ^
 SELECT 0.0e1a;
- a 
----
- 0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.0e1a"
+LINE 1: SELECT 0.0e1a;
+               ^
 SELECT 0.0e;
-  e  
------
- 0.0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.0e"
+LINE 1: SELECT 0.0e;
+               ^
 SELECT 0.0e+a;
-ERROR:  syntax error at or near "+"
+ERROR:  trailing junk after numeric literal at or near "0.0e+"
 LINE 1: SELECT 0.0e+a;
-                   ^
+               ^
 PREPARE p1 AS SELECT $1a;
-EXECUTE p1(1);
- a 
----
- 1
-(1 row)
-
+ERROR:  trailing junk after parameter at or near "$1a"
+LINE 1: PREPARE p1 AS SELECT $1a;
+                             ^
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
diff --git a/src/test/regress/sql/numerology.sql 
b/src/test/regress/sql/numerology.sql
index fb75f97832..be7d6dfe0c 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -17,7 +17,6 @@
 SELECT 0.0e;
 SELECT 0.0e+a;
 PREPARE p1 AS SELECT $1a;
-EXECUTE p1(1);
 
 --
 -- Test implicit type conversions
-- 
2.34.1

From 0132fb1da543b429b9001f1a682d21b1f510a3ef Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v8 6/7] Non-decimal integer literals

Add support for hexadecimal, octal, and binary integer literals:

0x42F
0o273
0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

Discussion: 
https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb...@enterprisedb.com
---
 doc/src/sgml/syntax.sgml   |  26 
 src/backend/catalog/information_schema.sql |   6 +-
 src/backend/catalog/sql_features.txt   |   1 +
 src/backend/parser/scan.l  | 101 +++
 src/backend/utils/adt/numutils.c   | 140 +
 src/fe_utils/psqlscan.l|  80 +---
 src/interfaces/ecpg/preproc/pgc.l  | 116 +
 src/test/regress/expected/int2.out |  19 +++
 src/test/regress/expected/int4.out |  19 +++
 src/test/regress/expected/int8.out |  19 +++
 src/test/regress/expected/numerology.out   |  59 -
 src/test/regress/sql/int2.sql  |   7 ++
 src/test/regress/sql/int4.sql  |   7 ++
 src/test/regress/sql/int8.sql  |   7 +

Re: Non-decimal integer literals

2022-01-13 Thread Peter Eisentraut
   mmfatal(PARSE_ERROR, "trailing junk after parameter");
+   }
 
 {ip}   {
base_yylval.str = mm_strdup(yytext);
@@ -957,6 +965,20 @@ cppline
{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 } /*  */
 
 {
+/*
+ * Note that some trailing junk is valid in C (such as 100LL), so we contain
+ * this to SQL mode.
+ */
+{integer_junk} {
+   mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+   }
+{decimal_junk} {
+   mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+   }
+{real_junk}{
+   mmfatal(PARSE_ERROR, "trailing junk after numeric literal");
+   }
+
 :{identifier}((("->"|\.){identifier})|(\[{array}\]))*  {
base_yylval.str = mm_strdup(yytext+1);
return CVARIABLE;
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index 2ffc73e854..77d4843417 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -6,64 +6,45 @@
 -- Trailing junk in numeric literals
 --
 SELECT 123abc;
- abc 
------
- 123
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "123a"
+LINE 1: SELECT 123abc;
+               ^
 SELECT 0x0o;
- x0o 
------
-   0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0x"
+LINE 1: SELECT 0x0o;
+               ^
 SELECT 1_2_3;
- _2_3 
--------
-    1
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "1_"
+LINE 1: SELECT 1_2_3;
+               ^
 SELECT 0.a;
- a 
----
- 0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.a"
+LINE 1: SELECT 0.a;
+               ^
 SELECT 0.0a;
-  a  
------
- 0.0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.0a"
+LINE 1: SELECT 0.0a;
+               ^
 SELECT .0a;
-  a  
------
- 0.0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near ".0a"
+LINE 1: SELECT .0a;
+               ^
 SELECT 0.0e1a;
- a 
----
- 0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.0e1a"
+LINE 1: SELECT 0.0e1a;
+               ^
 SELECT 0.0e;
-  e  
------
- 0.0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.0e"
+LINE 1: SELECT 0.0e;
+               ^
 SELECT 0.0e+a;
-ERROR:  syntax error at or near "+"
+ERROR:  trailing junk after numeric literal at or near "0.0e+"
 LINE 1: SELECT 0.0e+a;
-                   ^
+               ^
 PREPARE p1 AS SELECT $1a;
-EXECUTE p1(1);
- a 
----
- 1
-(1 row)
-
+ERROR:  trailing junk after parameter at or near "$1a"
+LINE 1: PREPARE p1 AS SELECT $1a;
+                             ^
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
diff --git a/src/test/regress/sql/numerology.sql 
b/src/test/regress/sql/numerology.sql
index fb75f97832..be7d6dfe0c 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -17,7 +17,6 @@
 SELECT 0.0e;
 SELECT 0.0e+a;
 PREPARE p1 AS SELECT $1a;
-EXECUTE p1(1);
 
 --
 -- Test implicit type conversions
-- 
2.34.1

From d40d84e76525f732ee8a07ffd62c68db5368c842 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v7 6/7] Non-decimal integer literals

Add support for hexadecimal, octal, and binary integer literals:

0x42F
0o273
0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

Discussion: 
https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb...@enterprisedb.com
---
 doc/src/sgml/syntax.sgml   |  26 
 src/backend/catalog/information_schema.sql |   6 +-
 src/backend/catalog/sql_features.txt   |   1 +
 src/backend/parser/scan.l  | 101 +++
 src/backend/utils/adt/numutils.c   | 140 +
 src/fe_utils/psqlscan.l|  80 +---
 src/interfaces/ecpg/preproc/pgc.l  | 116 +
 src/test/regress/expected/int2.out |  19 +++
 src/test/regress/expected/int4.out |  19 +++
 src/test/regress/expected/int8.out |  19 +++
 src/test/regress/expected/numerology.out   |  59 -
 src/test/regress/sql/int2.sql  |   7 ++
 src/test/regress/sql/int4.sql  |   7 ++
 src/test/regress/sql/int8.sql  |   7 ++
 src/test/regress/sql/numerology.sql|  21 +++-
 15 files changed, 529 insertions(+), 99 deletions(-)

diff --g

Re: Non-decimal integer literals

2021-12-30 Thread Peter Eisentraut
teger_junk} {
+   SET_YYLLOC();
+   yyerror("trailing junk after numeric literal");
+   }
+{decimal_junk} {
+   SET_YYLLOC();
+   yyerror("trailing junk after numeric literal");
+   }
+{real_junk}{
+   SET_YYLLOC();
+   yyerror("trailing junk after numeric literal");
}
 
 
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index db8a8dfaf2..09709e6151 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -337,6 +337,10 @@ real   
({integer}|{decimal})[Ee][-+]?{digit}+
 realfail1  ({integer}|{decimal})[Ee]
 realfail2  ({integer}|{decimal})[Ee][-+]
 
+integer_junk   {integer}{ident_start}
+decimal_junk   {decimal}{ident_start}
+real_junk  {real}{ident_start}
+
 param  \${integer}
 
 /* psql-specific: characters allowed in variable names */
@@ -855,17 +859,18 @@ other .
ECHO;
}
 {realfail1}{
-   /*
-* throw back the [Ee], and figure out whether what
-* remains is an {integer} or {decimal}.
-* (in psql, we don't actually care...)
-*/
-   yyless(yyleng - 1);
ECHO;
}
 {realfail2}{
-   /* throw back the [Ee][+-], and proceed as above */
-   yyless(yyleng - 2);
+   ECHO;
+   }
+{integer_junk} {
+   ECHO;
+   }
+{decimal_junk} {
+   ECHO;
+   }
+{real_junk}{
ECHO;
}
 
diff --git a/src/interfaces/ecpg/preproc/pgc.l 
b/src/interfaces/ecpg/preproc/pgc.l
index a2f8c7f3d8..110478059b 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -365,6 +365,10 @@ real   
({integer}|{decimal})[Ee][-+]?{digit}+
 realfail1  ({integer}|{decimal})[Ee]
 realfail2  ({integer}|{decimal})[Ee][-+]
 
+integer_junk   {integer}{ident_start}
+decimal_junk   {decimal}{ident_start}
+real_junk  {real}{ident_start}
+
 param  \${integer}
 
 /* special characters for other dbms */
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index 32c6d80c03..2f176ccb52 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -6,57 +6,41 @@
 -- Trailing junk in numeric literals
 --
 SELECT 123abc;
- abc 
------
- 123
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "123a"
+LINE 1: SELECT 123abc;
+               ^
 SELECT 0x0o;
- x0o 
------
-   0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0x"
+LINE 1: SELECT 0x0o;
+               ^
 SELECT 1_2_3;
- _2_3 
--------
-    1
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "1_"
+LINE 1: SELECT 1_2_3;
+               ^
 SELECT 0.a;
- a 
----
- 0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.a"
+LINE 1: SELECT 0.a;
+               ^
 SELECT 0.0a;
-  a  
------
- 0.0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.0a"
+LINE 1: SELECT 0.0a;
+               ^
 SELECT .0a;
-  a  
------
- 0.0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near ".0a"
+LINE 1: SELECT .0a;
+               ^
 SELECT 0.0e1a;
- a 
----
- 0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.0e1a"
+LINE 1: SELECT 0.0e1a;
+               ^
 SELECT 0.0e;
-  e  
------
- 0.0
-(1 row)
-
+ERROR:  trailing junk after numeric literal at or near "0.0e"
+LINE 1: SELECT 0.0e;
+               ^
 SELECT 0.0e+a;
-ERROR:  syntax error at or near "+"
+ERROR:  trailing junk after numeric literal at or near "0.0e+"
 LINE 1: SELECT 0.0e+a;
-                   ^
+               ^
 --
 -- Test implicit type conversions
 -- This fails for Postgres v6.1 (and earlier?)
-- 
2.34.1

From 8cf484ed47263ecf257e3770715cfa83394f1fa4 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v6 6/7] No

Re: Non-decimal integer literals

2021-12-01 Thread Peter Eisentraut

On 25.11.21 18:51, John Naylor wrote:
> If we're going to change the comment anyway, "the parser" sounds more
> natural. Aside from that, 0001 and 0002 can probably be pushed now, if
> you like.

done

> --- a/src/interfaces/ecpg/preproc/pgc.l
> +++ b/src/interfaces/ecpg/preproc/pgc.l
> @@ -365,6 +365,10 @@ real ({integer}|{decimal})[Ee][-+]?{digit}+
>   realfail1 ({integer}|{decimal})[Ee]
>   realfail2 ({integer}|{decimal})[Ee][-+]
>
> +integer_junk {integer}{ident_start}
> +decimal_junk {decimal}{ident_start}
> +real_junk {real}{ident_start}
>
> A comment might be good here to explain these are only in ECPG for
> consistency with the other scanners. Not really important, though.

Yeah, it's a bit weird that not all the symbols are used in ecpg.  I'll
look into explaining this better.

> 0006
>
> +{hexfail} {
> + yyerror("invalid hexadecimal integer");
> + }
> +{octfail} {
> + yyerror("invalid octal integer");
>    }
> -{decimal} {
> +{binfail} {
> + yyerror("invalid binary integer");
> + }
>
> It seems these could use SET_YYLLOC(), since the error cursor doesn't
> match other failure states:

ok

> We might consider some tests for ECPG since lack of coverage has been a
> problem.

right

> Also, I'm curious: how does the spec work as far as deciding the year of
> release, or feature-freezing of new items?

The schedule has recently been extended again, so the current plan is
for SQL:202x with x=3, with feature freeze in mid-2022.

So the feature patches in this thread are in my mind now targeting
PG15+1.  But the preparation work (up to v5-0005, and some other number
parsing refactoring that I'm seeing) could be considered for PG15.

I'll move this to the next CF and come back with an updated patch set in
a little while.





Re: Non-decimal integer literals

2021-12-01 Thread Peter Eisentraut

On 25.11.21 16:46, Zhihong Yu wrote:
> For patch 3,
>
> +int64
> +pg_strtoint64(const char *s)
>
> How about naming the above function pg_scanint64()?
> pg_strtoint64xx() can be named pg_strtoint64() - this would align with
> existing function:
>
> pg_strtouint64(const char *str, char **endptr, int base)

That would be one way.  But the existing pg_strtointNN() functions are
pretty widely used, so I would tend toward finding another name for the
less used pg_strtouint64(), maybe pg_strtouint64x() ("extended").






Re: Non-decimal integer literals

2021-11-25 Thread John Naylor
Hi Peter,

0001

-/* we no longer allow unary minus in numbers.
- * instead we pass it separately to parser. there it gets
- * coerced via doNegate() -- Leon aug 20 1999
+/*
+ * Numbers
+ *
+ * Unary minus is not part of a number here.  Instead we pass it separately to
+ * parser, and there it gets coerced via doNegate().

If we're going to change the comment anyway, "the parser" sounds more
natural. Aside from that, 0001 and 0002 can probably be pushed now, if you
like. I don't have any good ideas about 0003 at the moment.

0005

--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -365,6 +365,10 @@ real ({integer}|{decimal})[Ee][-+]?{digit}+
 realfail1 ({integer}|{decimal})[Ee]
 realfail2 ({integer}|{decimal})[Ee][-+]

+integer_junk {integer}{ident_start}
+decimal_junk {decimal}{ident_start}
+real_junk {real}{ident_start}

A comment might be good here to explain these are only in ECPG for
consistency with the other scanners. Not really important, though.

0006

+{hexfail} {
+ yyerror("invalid hexadecimal integer");
+ }
+{octfail} {
+ yyerror("invalid octal integer");
  }
-{decimal} {
+{binfail} {
+ yyerror("invalid binary integer");
+ }

It seems these could use SET_YYLLOC(), since the error cursor doesn't match
other failure states:

+SELECT 0b;
+ERROR:  invalid binary integer at or near "SELECT 0b"
+LINE 1: SELECT 0b;
+        ^
+SELECT 1b;
+ERROR:  trailing junk after numeric literal at or near "1b"
+LINE 1: SELECT 1b;
+               ^

We might consider some tests for ECPG since lack of coverage has been a
problem.

Also, I'm curious: how does the spec work as far as deciding the year of
release, or feature-freezing of new items?
--
John Naylor
EDB: http://www.enterprisedb.com


Re: Non-decimal integer literals

2021-11-25 Thread Zhihong Yu
On Thu, Nov 25, 2021 at 5:18 AM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> On 01.11.21 07:09, Peter Eisentraut wrote:
> > Here is an updated patch for this.  It's the previous patch polished a
> > bit more, and it contains changes so that numeric literals reject
> > trailing identifier parts without whitespace in between, as discussed.
> > Maybe I should split that into incremental patches, but for now I only
> > have the one.  I don't have a patch for the underscores in numeric
> > literals yet.  It's in progress, but not ready.
>
> Here is a progressed version of this work, split into more incremental
> patches.  The first three patches are harmless code cleanups.  Patch 3
> has an interesting naming conflict, noted in the commit message; ideas
> welcome.  Patches 4 and 5 handle the rejection of trailing junk after
> numeric literals, as discussed.  I have expanded that compared to the v4
> patch to also cover non-integer literals.  It also comes with more tests
> now.  Patch 6 is the titular introduction of non-decimal integer
> literals, unchanged from before.

Hi,
For patch 3,

+int64
+pg_strtoint64(const char *s)

How about naming the above function pg_scanint64()?
pg_strtoint64xx() can be named pg_strtoint64() - this would align with
existing function:

pg_strtouint64(const char *str, char **endptr, int base)

Cheers


Re: Non-decimal integer literals

2021-11-25 Thread Peter Eisentraut

On 01.11.21 07:09, Peter Eisentraut wrote:
Here is an updated patch for this.  It's the previous patch polished a 
bit more, and it contains changes so that numeric literals reject 
trailing identifier parts without whitespace in between, as discussed. 
Maybe I should split that into incremental patches, but for now I only 
have the one.  I don't have a patch for the underscores in numeric 
literals yet.  It's in progress, but not ready.


Here is a progressed version of this work, split into more incremental 
patches.  The first three patches are harmless code cleanups.  Patch 3 
has an interesting naming conflict, noted in the commit message; ideas 
welcome.  Patches 4 and 5 handle the rejection of trailing junk after 
numeric literals, as discussed.  I have expanded that compared to the v4 
patch to also cover non-integer literals.  It also comes with more tests 
now.  Patch 6 is the titular introduction of non-decimal integer 
literals, unchanged from before.
From 39aed9c0516fcf0a6b3372361ecfcf4874614578 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 24 Nov 2021 09:10:32 +0100
Subject: [PATCH v5 1/6] Improve some comments in scanner files

---
 src/backend/parser/scan.l | 14 --
 src/fe_utils/psqlscan.l   | 14 --
 src/interfaces/ecpg/preproc/pgc.l | 16 +---
 3 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 6e6824faeb..4e02815803 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -174,7 +174,7 @@ extern void core_yyset_column(int column_no, yyscan_t 
yyscanner);
  *   bit string literal
  *   extended C-style comments
  *   delimited identifiers (double-quoted identifiers)
- *   hexadecimal numeric string
+ *   hexadecimal byte string
  *   standard quoted strings
  *   quote stop (detect continued strings)
  *   extended quoted strings (support backslash escape sequences)
@@ -262,7 +262,7 @@ quotecontinuefail   {whitespace}*"-"?
 xbstart[bB]{quote}
 xbinside   [^']*
 
-/* Hexadecimal number */
+/* Hexadecimal byte string */
 xhstart[xX]{quote}
 xhinside   [^']*
 
@@ -341,7 +341,6 @@ xcstart \/\*{op_chars}*
 xcstop \*+\/
 xcinside   [^*/]+
 
-digit  [0-9]
 ident_start[A-Za-z\200-\377_]
 ident_cont [A-Za-z\200-\377_0-9\$]
 
@@ -380,15 +379,18 @@ self  [,()\[\].;\:\+\-\*\/\%\^\<\>\=]
 op_chars   [\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
 operator   {op_chars}+
 
-/* we no longer allow unary minus in numbers.
- * instead we pass it separately to parser. there it gets
- * coerced via doNegate() -- Leon aug 20 1999
+/*
+ * Numbers
+ *
+ * Unary minus is not part of a number here.  Instead we pass it separately to
+ * parser, and there it gets coerced via doNegate().
  *
  * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1} and {realfail2} are added to prevent the need for scanner
  * backup when the {real} rule fails to match completely.
  */
+digit  [0-9]
 
 integer{digit}+
 decimal(({digit}*\.{digit}+)|({digit}+\.{digit}*))
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index 0fab48a382..9aac166aa0 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -112,7 +112,7 @@ extern void psql_yyset_column(int column_no, yyscan_t 
yyscanner);
  *   bit string literal
  *   extended C-style comments
  *   delimited identifiers (double-quoted identifiers)
- *   hexadecimal numeric string
+ *   hexadecimal byte string
  *   standard quoted strings
  *   quote stop (detect continued strings)
  *   extended quoted strings (support backslash escape sequences)
@@ -200,7 +200,7 @@ quotecontinuefail   {whitespace}*"-"?
 xbstart[bB]{quote}
 xbinside   [^']*
 
-/* Hexadecimal number */
+/* Hexadecimal byte string */
 xhstart[xX]{quote}
 xhinside   [^']*
 
@@ -279,7 +279,6 @@ xcstart \/\*{op_chars}*
 xcstop \*+\/
 xcinside   [^*/]+
 
-digit  [0-9]
 ident_start[A-Za-z\200-\377_]
 ident_cont [A-Za-z\200-\377_0-9\$]
 
@@ -318,15 +317,18 @@ self  [,()\[\].;\:\+\-\*\/\%\^\<\>\=]
 op_chars   [\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
 operator   {op_chars}+
 
-/* we no longer allow unary minus in numbers.
- * instead we pass it separately to parser. there it gets
- * coerced via doNegate() -- Leon aug 20 1999
+/*
+ * Numbers
+ *
+ * Unary minus is not part of a number here.  Instead we pass it separately to
+ * parser, and there it gets coerced via doNegate().
  *
  * {decimalfa

Re: Non-decimal integer literals

2021-11-01 Thread Peter Eisentraut

On 28.09.21 17:30, Peter Eisentraut wrote:
> On 09.09.21 16:08, Vik Fearing wrote:
>>> Even without that point, this patch *is* going to break valid queries,
>>> because every one of those cases is a valid number-followed-by-identifier
>>> today,
>>
>> Ah, true that.  So if this does go in, we may as well add the
>> underscores at the same time.
>
> Yeah, looks like I'll need to look into the identifier lexing issues
> previously discussed.  I'll attack that during the next commit fest.

Here is an updated patch for this.  It's the previous patch polished a
bit more, and it contains changes so that numeric literals reject
trailing identifier parts without whitespace in between, as discussed.
Maybe I should split that into incremental patches, but for now I only
have the one.  I don't have a patch for the underscores in numeric
literals yet.  It's in progress, but not ready.
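
The rejection of trailing identifier parts can be sketched outside the scanner as follows. This is a rough Python model, not the flex code: it emulates longest-match selection between the {integer}, {decimal} and {real} patterns and their "junk" variants (the same pattern followed by one identifier-start character). The function name scan_number is made up, identifier-start is restricted to ASCII here, and the {realfail1}/{realfail2} rules are not modelled.

```python
import re

DIGIT = r"[0-9]"
INTEGER = rf"{DIGIT}+"
DECIMAL = rf"(?:{DIGIT}*\.{DIGIT}+|{DIGIT}+\.{DIGIT}*)"
REAL = rf"(?:{INTEGER}|{DECIMAL})[Ee][-+]?{DIGIT}+"
IDENT_START = r"[A-Za-z_]"  # assumption: ASCII only, high-bit bytes omitted

LITERALS = (INTEGER, DECIMAL, REAL)


def scan_number(text: str):
    """Return (kind, matched_text) for the longest match at the start of text.

    kind is "literal" or "junk"; None is returned if no number starts here.
    """
    best_len, best_kind = 0, None
    for pattern in LITERALS:
        for kind, full in (("literal", pattern), ("junk", pattern + IDENT_START)):
            m = re.match(full, text)
            if m and m.end() > best_len:
                best_len, best_kind = m.end(), kind
    if best_kind is None:
        return None
    return best_kind, text[:best_len]


if __name__ == "__main__":
    for sample in ("123 abc", "123abc", "1e5", "0.0e1a", ".0a"):
        print(f"{sample!r}: {scan_number(sample)}")
```

The longest-match step matters: for an input such as 1e5 the integer-plus-junk pattern matches only "1e", so the longer {real} match wins and the token is still a valid literal.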
From 6e081c44c04201ee9ded9dc6b689824ccabdfc28 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sun, 31 Oct 2021 15:42:18 +0100
Subject: [PATCH v4] Non-decimal integer literals

Add support for hexadecimal, octal, and binary integer literals:

0x42F
0o273
0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

Discussion: 
https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb...@enterprisedb.com
---
 doc/src/sgml/syntax.sgml   |  26 ++
 src/backend/catalog/information_schema.sql |   6 +-
 src/backend/catalog/sql_features.txt   |   1 +
 src/backend/parser/scan.l  | 103 -
 src/backend/utils/adt/int8.c   |  54 +++
 src/backend/utils/adt/numutils.c   |  97 +++
 src/fe_utils/psqlscan.l|  81 
 src/interfaces/ecpg/preproc/pgc.l  |  95 +++
 src/test/regress/expected/int2.out |  19 
 src/test/regress/expected/int4.out |  75 +++
 src/test/regress/expected/int8.out |  19 
 src/test/regress/sql/int2.sql  |   7 ++
 src/test/regress/sql/int4.sql  |  26 ++
 src/test/regress/sql/int8.sql  |   7 ++
 14 files changed, 531 insertions(+), 85 deletions(-)

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index d66560b587..a4f04199c6 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -694,6 +694,32 @@ Numeric Constants
 
 
 
+
+ Additionally, non-decimal integer constants can be used in these forms:
+
+0xhexdigits
+0ooctdigits
+0bbindigits
+
+ hexdigits is one or more hexadecimal digits
+ (0-9, A-F), octdigits is one or more octal
+ digits (0-7), bindigits is one or more binary
+ digits (0 or 1).  Hexadecimal digits and the radix prefixes can be in
+ upper or lower case.  Note that only integers can have non-decimal forms,
+ not numbers with fractional parts.
+
+
+
+ These are some examples of this:
+0b100101
+0B10011001
+0o273
+0O755
+0x42f
+0X
+
+
+
 
  integer
  bigint
diff --git a/src/backend/catalog/information_schema.sql 
b/src/backend/catalog/information_schema.sql
index 11d9dd60c2..ce88c483a2 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -119,7 +119,7 @@ CREATE FUNCTION _pg_numeric_precision(typid oid, typmod int4) RETURNS integer
  WHEN 1700 /*numeric*/ THEN
   CASE WHEN $2 = -1
THEN null
-   ELSE (($2 - 4) >> 16) & 65535
+   ELSE (($2 - 4) >> 16) & 0xFFFF
END
  WHEN 700 /*float4*/ THEN 24 /*FLT_MANT_DIG*/
  WHEN 701 /*float8*/ THEN 53 /*DBL_MANT_DIG*/
@@ -147,7 +147,7 @@ CREATE FUNCTION _pg_numeric_scale(typid oid, typmod int4) RETURNS integer
WHEN $1 IN (1700) THEN
 CASE WHEN $2 = -1
  THEN null
- ELSE ($2 - 4) & 65535
+ ELSE ($2 - 4) & 0xFFFF
  END
ELSE null
   END;
@@ -163,7 +163,7 @@ CREATE FUNCTION _pg_datetime_precision(typid oid, typmod int4) RETURNS integer
WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */
THEN CASE WHEN $2 < 0 THEN 6 ELSE $2 END
WHEN $1 IN (1186) /* interval */
-   THEN CASE WHEN $2 < 0 OR $2 & 65535 = 65535 THEN 6 ELSE $2 & 65535 END
+   THEN CASE WHEN $2 < 0 OR $2 & 0xFFFF = 0xFFFF THEN 6 ELSE $2 & 0xFFFF END
ELSE null
   END;
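
For readers following the mask change in information_schema.sql: 65535 is 0xFFFF, the low 16 bits. The Python sketch below mirrors the two numeric expressions from the hunk. It assumes the typmod packs precision into the high 16 bits and scale into the low 16 bits, offset by 4; the packing helper and all names are illustrative and not taken from the patch.

```python
HEADER_OFFSET = 4   # the constant both SQL expressions subtract
LOW_16_BITS = 0xFFFF  # same value as the decimal 65535 being replaced


def numeric_typmod(precision: int, scale: int) -> int:
    """Pack precision and scale in the layout the expressions assume."""
    return ((precision << 16) | scale) + HEADER_OFFSET


def numeric_precision(typmod: int):
    """Mirror of: CASE WHEN $2 = -1 THEN null ELSE (($2 - 4) >> 16) & mask."""
    return None if typmod == -1 else ((typmod - HEADER_OFFSET) >> 16) & LOW_16_BITS


def numeric_scale(typmod: int):
    """Mirror of: CASE WHEN $2 = -1 THEN null ELSE ($2 - 4) & mask."""
    return None if typmod == -1 else (typmod - HEADER_OFFSET) & LOW_16_BITS


if __name__ == "__main__":
    tm = numeric_typmod(10, 2)
    print(tm, numeric_precision(tm), numeric_scale(tm))
```

Written in hexadecimal, the mask makes the "low 16 bits" intent visible, which is presumably why these catalog functions were chosen to exercise the new literal syntax.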
 
diff --git a/src/backend/catalog/sql_features.txt 
b/src/backend/catalog/sql_features.txt
index 9f424216e2..d6359503f3 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -526,6 +526,7 @@ T652SQL-dynamic statements in SQL routines  
NO
 T653   SQL-schema statements in external routines  YES

Re: Non-decimal integer literals

2021-09-28 Thread Peter Eisentraut



On 09.09.21 16:08, Vik Fearing wrote:
>> Even without that point, this patch *is* going to break valid queries,
>> because every one of those cases is a valid number-followed-by-identifier
>> today,
>
> Ah, true that.  So if this does go in, we may as well add the
> underscores at the same time.

Yeah, looks like I'll need to look into the identifier lexing issues
previously discussed.  I'll attack that during the next commit fest.

>> so I kind of wonder why we're in such a hurry to adopt something
>> that hasn't even made it past draft-standard status.
>
> I don't really see a hurry here.  I am fine with waiting until the draft
> becomes final.

Right, the point is to explore this now so that it can be ready when the
standard is ready.





Re: Non-decimal integer literals

2021-09-28 Thread Peter Eisentraut

On 07.09.21 13:50, Zhihong Yu wrote:
>     On 16.08.21 17:32, John Naylor wrote:
>      > The one thing that jumped out at me on a cursory reading is
>      > the {integer} rule, which seems to be used nowhere except to
>      > call process_integer_literal, which must then inspect the token text to
>      > figure out what type of integer it is. Maybe consider 4 separate
>      > process_*_literal functions?
>
>     Agreed, that can be done in a simpler way.  Here is an updated patch.
>
> Hi,
> Minor comment:
>
> +SELECT int4 '0o112';
>
> Maybe involve digits of up to 7 in the octal test case.

Good point, here is a lightly updated patch.
From 43957a1f48ed6f750f231ef8e3533d74d7ac4cc9 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 28 Sep 2021 17:14:44 +0200
Subject: [PATCH v3] Non-decimal integer literals

Add support for hexadecimal, octal, and binary integer literals:

0x42F
0o273
0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

Discussion: 
https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb...@enterprisedb.com
---
 doc/src/sgml/syntax.sgml   | 26 ++
 src/backend/catalog/information_schema.sql |  6 +-
 src/backend/catalog/sql_features.txt   |  1 +
 src/backend/parser/scan.l  | 87 +--
 src/backend/utils/adt/int8.c   | 54 
 src/backend/utils/adt/numutils.c   | 97 ++
 src/fe_utils/psqlscan.l| 55 
 src/interfaces/ecpg/preproc/pgc.l  | 64 +-
 src/test/regress/expected/int2.out | 19 +
 src/test/regress/expected/int4.out | 37 +
 src/test/regress/expected/int8.out | 19 +
 src/test/regress/sql/int2.sql  |  7 ++
 src/test/regress/sql/int4.sql  | 11 +++
 src/test/regress/sql/int8.sql  |  7 ++
 14 files changed, 425 insertions(+), 65 deletions(-)

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index d66560b587..a4f04199c6 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -694,6 +694,32 @@ Numeric Constants
 
 
 
+
+ Additionally, non-decimal integer constants can be used in these forms:
+
+0xhexdigits
+0ooctdigits
+0bbindigits
+
+ hexdigits is one or more hexadecimal digits
+ (0-9, A-F), octdigits is one or more octal
+ digits (0-7), bindigits is one or more binary
+ digits (0 or 1).  Hexadecimal digits and the radix prefixes can be in
+ upper or lower case.  Note that only integers can have non-decimal forms,
+ not numbers with fractional parts.
+
+
+
+ These are some examples of this:
+0b100101
+0B10011001
+0o273
+0O755
+0x42f
+0X
+
+
+
 
  integer
  bigint
diff --git a/src/backend/catalog/information_schema.sql 
b/src/backend/catalog/information_schema.sql
index 11d9dd60c2..ce88c483a2 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -119,7 +119,7 @@ CREATE FUNCTION _pg_numeric_precision(typid oid, typmod int4) RETURNS integer
  WHEN 1700 /*numeric*/ THEN
   CASE WHEN $2 = -1
THEN null
-   ELSE (($2 - 4) >> 16) & 65535
+   ELSE (($2 - 4) >> 16) & 0xFFFF
END
  WHEN 700 /*float4*/ THEN 24 /*FLT_MANT_DIG*/
  WHEN 701 /*float8*/ THEN 53 /*DBL_MANT_DIG*/
@@ -147,7 +147,7 @@ CREATE FUNCTION _pg_numeric_scale(typid oid, typmod int4) RETURNS integer
WHEN $1 IN (1700) THEN
 CASE WHEN $2 = -1
  THEN null
- ELSE ($2 - 4) & 65535
+ ELSE ($2 - 4) & 0xFFFF
  END
ELSE null
   END;
@@ -163,7 +163,7 @@ CREATE FUNCTION _pg_datetime_precision(typid oid, typmod int4) RETURNS integer
WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */
THEN CASE WHEN $2 < 0 THEN 6 ELSE $2 END
WHEN $1 IN (1186) /* interval */
-   THEN CASE WHEN $2 < 0 OR $2 & 65535 = 65535 THEN 6 ELSE $2 & 65535 END
+   THEN CASE WHEN $2 < 0 OR $2 & 0xFFFF = 0xFFFF THEN 6 ELSE $2 & 0xFFFF END
ELSE null
   END;
 
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 9f424216e2..d6359503f3 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -526,6 +526,7 @@ T652   SQL-dynamic statements in SQL routines  NO
 T653   SQL-schema statements in external routines  YES 
 T654   SQL-dynamic statements in external routines NO  
 T655   Cyclically dependent routines   YES 
+T661   Non-decimal integer literalsYES SQL:202x draft
 T811   Basic SQL/JSO
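
For readers following the information_schema.sql hunks above: the masks
pull two 16-bit fields out of a numeric typmod.  Below is a rough C
sketch of that arithmetic.  The helper names are made up for
illustration, and the packing in make_numeric_typmod is only assumed to
be the inverse of the decoding expressions in the patch; it is not
copied from the PostgreSQL source.

```c
#include <stdint.h>

/*
 * Hypothetical helper: assumed inverse of the two decoders below.
 * Intended for small non-negative precision and scale values only.
 */
static int32_t make_numeric_typmod(int precision, int scale)
{
    return (int32_t) (((precision & 0xFFFF) << 16) | (scale & 0xFFFF)) + 4;
}

/*
 * Mirrors (($2 - 4) >> 16) & 0xFFFF from _pg_numeric_precision.
 * The caller must handle typmod == -1 separately, as the SQL does.
 */
static int numeric_typmod_precision(int32_t typmod)
{
    return ((typmod - 4) >> 16) & 0xFFFF;
}

/* Mirrors ($2 - 4) & 0xFFFF from _pg_numeric_scale. */
static int numeric_typmod_scale(int32_t typmod)
{
    return (typmod - 4) & 0xFFFF;
}
```

Under the assumed packing, precision 10 and scale 2 give typmod 655366,
which decodes back to 10 and 2.  Writing the mask as 0xFFFF rather than
65535 makes the "low 16 bits" intent visible at a glance.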

Re: Non-decimal integer literals

2021-09-09 Thread Vik Fearing
On 9/8/21 3:14 PM, Tom Lane wrote:
> Vik Fearing  writes:
> 
>> Is there any hope of adding the optional underscores?  I see a potential
>> problem there as SELECT 1_a; is currently parsed as SELECT 1 AS _a; when
>> it should be parsed as SELECT 1_ AS a; or perhaps even as an error since
>> 0x1_a would be a valid number with no alias.
> 
> Even without that point, this patch *is* going to break valid queries,
> because every one of those cases is a valid number-followed-by-identifier
> today,

Ah, true that.  So if this does go in, we may as well add the
underscores at the same time.

> AFAIR we've seen exactly zero field demand for this feature,

I have often wanted something like this, even if I didn't bring it up on
this list.  I have had customers who have wanted this, too.  My response
has always been to show these exact problems to explain why it's not
possible, but if it's going to be in the standard then I favor doing it.

I have never really had a use for octal, but sometimes binary and hex
make things much clearer.  Having a grouping separator for large numbers
is even more useful.

> so I kind of wonder why we're in such a hurry to adopt something
> that hasn't even made it past draft-standard status.

I don't really see a hurry here.  I am fine with waiting until the draft
becomes final.
-- 
Vik Fearing




Re: Non-decimal integer literals

2021-09-08 Thread Tom Lane
Vik Fearing  writes:
> On 8/16/21 11:51 AM, Peter Eisentraut wrote:
>> Here is a patch to add support for hexadecimal, octal, and binary
>> integer literals:
>> 
>>     0x42E
>>     0o112
>>     0b100101
>> 
>> per SQL:202x draft.

> Is there any hope of adding the optional underscores?  I see a potential
> problem there as SELECT 1_a; is currently parsed as SELECT 1 AS _a; when
> it should be parsed as SELECT 1_ AS a; or perhaps even as an error since
> 0x1_a would be a valid number with no alias.

Even without that point, this patch *is* going to break valid queries,
because every one of those cases is a valid number-followed-by-identifier
today, e.g.

regression=# select 0x42e;
 x42e 
------
    0
(1 row)

AFAIR we've seen exactly zero field demand for this feature,
so I kind of wonder why we're in such a hurry to adopt something
that hasn't even made it past draft-standard status.

regards, tom lane
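
The pre-patch behavior shown above (0x42e read as the number 0 with
alias x42e) can be modeled roughly as follows.  This is a simplified
sketch, not the real flex rules: it only handles a run of decimal digits
followed directly by an identifier, using character classes that follow
ident_start and ident_cont from scan.l.  The function names are
hypothetical.

```c
#include <stddef.h>

/* Follows ident_start in scan.l: letters, underscore, high-bit bytes. */
static int is_ident_start(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           c == '_' || c >= 0x80;
}

/* Follows ident_cont in scan.l: ident_start plus digits and '$'. */
static int is_ident_cont(unsigned char c)
{
    return is_ident_start(c) || (c >= '0' && c <= '9') || c == '$';
}

/*
 * Split input the way a decimal-only lexer would: leading decimal
 * digits, then an identifier that starts right after them.
 * Returns 1 if both parts were found, 0 otherwise.  Both buffers must
 * hold at least one byte; longer parts are silently truncated.
 */
static int split_number_and_alias(const char *input,
                                  char *num, size_t numsz,
                                  char *alias, size_t aliassz)
{
    size_t i = 0, n = 0, a = 0;

    while (input[i] >= '0' && input[i] <= '9')
    {
        if (n + 1 < numsz)
            num[n++] = input[i];
        i++;
    }
    num[n] = '\0';
    alias[0] = '\0';

    if (n == 0 || !is_ident_start((unsigned char) input[i]))
        return 0;

    while (is_ident_cont((unsigned char) input[i]))
    {
        if (a + 1 < aliassz)
            alias[a++] = input[i];
        i++;
    }
    alias[a] = '\0';
    return 1;
}
```

Feeding it "0x42e" yields number "0" and alias "x42e"; feeding it "1_a"
yields number "1" and alias "_a", which is the case Vik raised about
underscores.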




Re: Non-decimal integer literals

2021-09-08 Thread Vik Fearing
On 8/16/21 11:51 AM, Peter Eisentraut wrote:
> Here is a patch to add support for hexadecimal, octal, and binary
> integer literals:
> 
>     0x42E
>     0o112
>     0b100101
> 
> per SQL:202x draft.

Is there any hope of adding the optional underscores?  I see a potential
problem there as SELECT 1_a; is currently parsed as SELECT 1 AS _a; when
it should be parsed as SELECT 1_ AS a; or perhaps even as an error since
0x1_a would be a valid number with no alias.

(The standard does not allow identifiers to begin with _ but we do...)
-- 
Vik Fearing




Re: Non-decimal integer literals

2021-09-07 Thread Zhihong Yu
On Tue, Sep 7, 2021 at 4:13 AM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> On 16.08.21 17:32, John Naylor wrote:
> > The one thing that jumped out at me on a cursory reading is
> > the {integer} rule, which seems to be used nowhere except to
> > call process_integer_literal, which must then inspect the token text to
> > figure out what type of integer it is. Maybe consider 4 separate
> > process_*_literal functions?
>
> Agreed, that can be done in a simpler way.  Here is an updated patch.
>
Hi,
Minor comment:

+SELECT int4 '0o112';

Maybe the octal test case could include digits up to 7.

Thanks
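
To illustrate why the boundary digit matters, here is a minimal sketch
of an octal digit check.  It is not taken from the patch: the function
name is hypothetical, it expects the digits without the 0o prefix, and
it does no overflow checking.

```c
#include <stddef.h>

/*
 * Convert a string of octal digits to a long.
 * Returns 1 on success, 0 if the string is empty or contains a
 * character outside 0-7.  No overflow check; illustration only.
 */
static int octal_digits_to_long(const char *digits, long *result)
{
    long value = 0;

    if (digits[0] == '\0')
        return 0;

    for (size_t i = 0; digits[i] != '\0'; i++)
    {
        if (digits[i] < '0' || digits[i] > '7')
            return 0;
        value = value * 8 + (digits[i] - '0');
    }

    *result = value;
    return 1;
}
```

A test input such as 112 never exercises the upper bound of the digit
range, so an off-by-one in the comparison would go unnoticed; 755 (or a
rejected input such as 778) does exercise it.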


Re: Non-decimal integer literals

2021-09-07 Thread Peter Eisentraut

On 16.08.21 17:32, John Naylor wrote:
> The one thing that jumped out at me on a cursory reading is
> the {integer} rule, which seems to be used nowhere except to
> call process_integer_literal, which must then inspect the token text to
> figure out what type of integer it is. Maybe consider 4 separate
> process_*_literal functions?

Agreed, that can be done in a simpler way.  Here is an updated patch.

From f90826f77d8067a1641f60dd75d5ea1d83466ea9 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 7 Sep 2021 13:10:18 +0200
Subject: [PATCH v2] Non-decimal integer literals

Add support for hexadecimal, octal, and binary integer literals:

0x42E
0o112
0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.

Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb...@enterprisedb.com
---
 doc/src/sgml/syntax.sgml             | 26
 src/backend/catalog/sql_features.txt |  1 +
 src/backend/parser/scan.l            | 87 ++---
 src/backend/utils/adt/int8.c         | 54
 src/backend/utils/adt/numutils.c     | 97
 src/fe_utils/psqlscan.l              | 55 +++-
 src/interfaces/ecpg/preproc/pgc.l    | 64 +++---
 src/test/regress/expected/int2.out   | 19 ++
 src/test/regress/expected/int4.out   | 37 +++
 src/test/regress/expected/int8.out   | 19 ++
 src/test/regress/sql/int2.sql        |  7 ++
 src/test/regress/sql/int4.sql        | 11
 src/test/regress/sql/int8.sql        |  7 ++
 13 files changed, 422 insertions(+), 62 deletions(-)

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index d66560b587..8fb4b1228d 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -694,6 +694,32 @@ Numeric Constants
 
 
 
+
+ Additionally, non-decimal integer constants can be used in these forms:
+
+0xhexdigits
+0ooctdigits
+0bbindigits
+
+ hexdigits is one or more hexadecimal digits
+ (0-9, A-F), octdigits is one or more octal
+ digits (0-7), bindigits is one or more binary
+ digits (0 or 1).  Hexadecimal digits and the radix prefixes can be in
+ upper or lower case.  Note that only integers can have non-decimal forms,
+ not numbers with fractional parts.
+
+
+
+ These are some examples of this:
+0b100101
+0B10011001
+0o112
+0O755
+0x42e
+0X
+
+
+
 
  integer
  bigint
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 9f424216e2..d6359503f3 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -526,6 +526,7 @@ T652   SQL-dynamic statements in SQL routines  NO
 T653   SQL-schema statements in external routines  YES 
 T654   SQL-dynamic statements in external routines NO  
 T655   Cyclically dependent routines   YES 
+T661   Non-decimal integer literalsYES SQL:202x draft
 T811   Basic SQL/JSON constructor functionsNO  
 T812   SQL/JSON: JSON_OBJECTAGGNO  
 T813   SQL/JSON: JSON_ARRAYAGG with ORDER BY   NO  
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 6e6824faeb..a78fe7a2ed 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -124,7 +124,7 @@ static void addlit(char *ytext, int yleng, core_yyscan_t yyscanner);
 static void addlitchar(unsigned char ychar, core_yyscan_t yyscanner);
 static char *litbufdup(core_yyscan_t yyscanner);
 static unsigned char unescape_single_char(unsigned char c, core_yyscan_t yyscanner);
-static int process_integer_literal(const char *token, YYSTYPE *lval);
+static int process_integer_literal(const char *token, YYSTYPE *lval, int base);
 static void addunicode(pg_wchar c, yyscan_t yyscanner);
 
 #define yyerror(msg)  scanner_yyerror(msg, yyscanner)
@@ -262,7 +262,7 @@ quotecontinuefail   {whitespace}*"-"?
 xbstart[bB]{quote}
 xbinside   [^']*
 
-/* Hexadecimal number */
+/* Hexadecimal byte string */
 xhstart[xX]{quote}
 xhinside   [^']*
 
@@ -341,7 +341,7 @@ xcstart \/\*{op_chars}*
 xcstop \*+\/
 xcinside   [^*/]+
 
-digit  [0-9]
+
 ident_start[A-Za-z\200-\377_]
 ident_cont [A-Za-z\200-\377_0-9\$]
 
@@ -380,24 +380,39 @@ self  [,()\[\].;\:\+\-\*\/\%\^\<\>\=]
 op_chars   [\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
 operator   {op_chars}+
 
-/* we no longer allow unary minus in numbers.
- * instead we pass it separately to parser. there it gets
- * coerced via doNegate() -- Leon aug 20 1999
+/*
+ * Numbers
  *
- * {decimalfail} is used because we

Re: Non-decimal integer literals

2021-08-16 Thread John Naylor
On Mon, Aug 16, 2021 at 5:52 AM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:
>
> Here is a patch to add support for hexadecimal, octal, and binary
> integer literals:
>
>  0x42E
>  0o112
>  0b100101
>
> per SQL:202x draft.
>
> This adds support in the lexer as well as in the integer type input
> functions.

The one thing that jumped out at me on a cursory reading is the {integer}
rule, which seems to be used nowhere except to
call process_integer_literal, which must then inspect the token text to
figure out what type of integer it is. Maybe consider 4 separate
process_*_literal functions?

--
John Naylor
EDB: http://www.enterprisedb.com
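
One way to read the suggestion: if each lexer rule already knows which
radix it matched, the conversion routine does not need to re-inspect
the token.  The following is only a sketch of that idea under stated
assumptions.  The enum, the function name, and the use of strtol are
all illustrative and are not what the patch does; the real function
works on YYSTYPE and returns a parser token.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical result codes for this sketch. */
typedef enum
{
    LIT_FITS_INT,       /* value fits in an int, stored in *value */
    LIT_TOO_LARGE,      /* valid digits, but out of int range */
    LIT_INVALID         /* no digits, or trailing junk */
} LiteralStatus;

/*
 * Convert a literal whose radix is supplied by the caller.
 * For base 2, 8 and 16 the token is assumed to start with a
 * two-character prefix (0b, 0o, 0x in either case) that the
 * matching lexer rule has already recognized.
 */
static LiteralStatus convert_integer_literal(const char *token, int base,
                                             int *value)
{
    const char *digits = (base == 10) ? token : token + 2;
    char       *end;
    long        parsed;

    errno = 0;
    parsed = strtol(digits, &end, base);

    if (end == digits || *end != '\0')
        return LIT_INVALID;
    if (errno == ERANGE || parsed > INT_MAX || parsed < INT_MIN)
        return LIT_TOO_LARGE;

    *value = (int) parsed;
    return LIT_FITS_INT;
}
```

With this shape, the hexadecimal rule would call
convert_integer_literal(yytext, 16, ...) and so on, so one function
serves all four rules without string inspection.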


Non-decimal integer literals

2021-08-16 Thread Peter Eisentraut
Here is a patch to add support for hexadecimal, octal, and binary 
integer literals:


0x42E
0o112
0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input 
functions.


Those core parts are straightforward enough, but there are a bunch of 
other places where integers are parsed, and one could consider in each 
case whether they should get the same treatment, for example the 
replication syntax lexer, or the input functions for oid, numeric, and 
int2vector.  There are also some opportunities to move some code around, 
for example scanint8() could be in numutils.c.  I have also looked with 
some suspicion at some details of the number lexing in ecpg, but haven't 
found anything I could break yet.  Suggestions are welcome.
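
On the input-function side, the first step is recognizing the radix
prefix in the string.  A minimal sketch of that step follows, assuming
the prefixes are accepted in either case as the documentation hunk
says.  The function name is hypothetical and this is not the code in
numutils.c.

```c
#include <stddef.h>

/*
 * Inspect the start of s for a radix prefix.
 * Returns 16, 8 or 2 and points *digits past the prefix, or returns 10
 * and points *digits at s when no prefix is present.
 * Sign handling and digit validation are left to the caller.
 */
static int radix_from_prefix(const char *s, const char **digits)
{
    if (s[0] == '0' && s[1] != '\0')
    {
        switch (s[1])
        {
            case 'x':
            case 'X':
                *digits = s + 2;
                return 16;
            case 'o':
            case 'O':
                *digits = s + 2;
                return 8;
            case 'b':
            case 'B':
                *digits = s + 2;
                return 2;
            default:
                break;
        }
    }

    *digits = s;
    return 10;
}
```

An input such as '0o112' would then be handed to a base-8 conversion
starting at "112", while a plain '0' or '0123' stays decimal.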
From f2a9b37968a55bf91feb2b4753745c9f5a64be2e Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 16 Aug 2021 09:32:14 +0200
Subject: [PATCH v1] Non-decimal integer literals

Add support for hexadecimal, octal, and binary integer literals:

0x42E
0o112
0b100101

per SQL:202x draft.

This adds support in the lexer as well as in the integer type input
functions.
---
 doc/src/sgml/syntax.sgml             | 26
 src/backend/catalog/sql_features.txt |  1 +
 src/backend/parser/scan.l            | 70 ++--
 src/backend/utils/adt/int8.c         | 54
 src/backend/utils/adt/numutils.c     | 97
 src/fe_utils/psqlscan.l              | 55 +++-
 src/interfaces/ecpg/preproc/pgc.l    | 64 +++---
 src/test/regress/expected/int2.out   | 19 ++
 src/test/regress/expected/int4.out   | 37 +++
 src/test/regress/expected/int8.out   | 19 ++
 src/test/regress/sql/int2.sql        |  7 ++
 src/test/regress/sql/int4.sql        | 11
 src/test/regress/sql/int8.sql        |  7 ++
 13 files changed, 412 insertions(+), 55 deletions(-)

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index d66560b587..8fb4b1228d 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -694,6 +694,32 @@ Numeric Constants
 
 
 
+
+ Additionally, non-decimal integer constants can be used in these forms:
+
+0xhexdigits
+0ooctdigits
+0bbindigits
+
+ hexdigits is one or more hexadecimal digits
+ (0-9, A-F), octdigits is one or more octal
+ digits (0-7), bindigits is one or more binary
+ digits (0 or 1).  Hexadecimal digits and the radix prefixes can be in
+ upper or lower case.  Note that only integers can have non-decimal forms,
+ not numbers with fractional parts.
+
+
+
+ These are some examples of this:
+0b100101
+0B10011001
+0o112
+0O755
+0x42e
+0X
+
+
+
 
  integer
  bigint
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 9f424216e2..d6359503f3 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -526,6 +526,7 @@ T652   SQL-dynamic statements in SQL routines  NO
 T653   SQL-schema statements in external routines  YES 
 T654   SQL-dynamic statements in external routines NO  
 T655   Cyclically dependent routines   YES 
+T661   Non-decimal integer literalsYES SQL:202x draft
 T811   Basic SQL/JSON constructor functionsNO  
 T812   SQL/JSON: JSON_OBJECTAGGNO  
 T813   SQL/JSON: JSON_ARRAYAGG with ORDER BY   NO  
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 6e6824faeb..83458ffb30 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -262,7 +262,7 @@ quotecontinuefail   {whitespace}*"-"?
 xbstart[bB]{quote}
 xbinside   [^']*
 
-/* Hexadecimal number */
+/* Hexadecimal byte string */
 xhstart[xX]{quote}
 xhinside   [^']*
 
@@ -341,7 +341,7 @@ xcstart \/\*{op_chars}*
 xcstop \*+\/
 xcinside   [^*/]+
 
-digit  [0-9]
+
 ident_start[A-Za-z\200-\377_]
 ident_cont [A-Za-z\200-\377_0-9\$]
 
@@ -380,24 +380,41 @@ self  [,()\[\].;\:\+\-\*\/\%\^\<\>\=]
 op_chars   [\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
 operator   {op_chars}+
 
-/* we no longer allow unary minus in numbers.
- * instead we pass it separately to parser. there it gets
- * coerced via doNegate() -- Leon aug 20 1999
+/*
+ * Numbers
+ *
+ * Unary minus is not part of a number here.  Instead we pass it separately to
+ * parser, and there it gets coerced via doNegate().
  *
- * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
+ * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10.
  *
  * {realfail1}