On Wed, Feb 18, 2026 at 6:58 AM Aleksander Alekseev
<[email protected]> wrote:
>
> Hi,
>
> > I only rebased v3 and improved the commit messages, but I didn't
> > account for Masahiko Sawada's feedback for 0002. Andrey, are you still
> > working on this or others can pick it up?
> >
> > The patch is not on the commitfest, so I'm about to add it.
>
> Here is patch v5 where I accounted for the previous feedback from
> Masahiko Sawada and also made some other changes, see below.
>
> > How about the error message like "invalid input length for type uuid"?
> > I think "uuid" should be lower case as it indicates PostgreSQL uuid
> > data type, and it's better to use %s format instead of directly
> > writing "uuid" (see string_to_uuid() for example).
>
> Makes sense. Fixed.
>
> > As for the errdetail message, should we add "bytea" also after "got %d"?
>
> You probably meant "got %d bytes", not "got %d bytea". I believe the
> current message is fine, but maybe native speakers will correct us.
>
> > We already have tests for casting bytes to integer data types in
> > strings.sql. I suggest moving the casting tests from bytea to uuid
> > into therel.
>
> I disagree on the grounds that there are zero tests related to UUID in
> strings.sql; uuid.sql is a more appropriate place for these tests IMO.
> However if someone seconds the idea we can easily move the tests at
> any time.
>
> > For the uuid.sql file, we could add a test to verify that
> > a UUID value remains unchanged when it's cast to bytea and back to
> > UUID. For example,
> >
> > SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;
>
> Good point. Added.
>
> > base32hex_encode() doesn't seem to add '=' paddings, but is it
> > intentional? I don't see any description in RFC 4648 that we can omit
> > '=' paddings.
>
> You are right, both base32 and base32hex should add paddings;
> substring() can be used if necessary. Fixed.
>
> > I think the patch should add tests not only for uuid data type but
> > also for general cases like other encodings.
>
> Yes, and the good place for these tests would be closer to other tests
> for encode() and decode() i.e. strings.sql. Fixed.
>
> While working on it I noticed some inconsistencies between base32hex
> implementation and our current implementation of base64. As an
> example, we don't allow `=` input:
>
> ```
> =# SELECT decode('=', 'base64');
> ERROR: unexpected "=" while decoding base64 sequence
> ```
>
> ... while base32hex did. I fixed such inconsistencies too.
>
> > In uuid.sql tests, how about adding some tests to check if base32hex
> > maintains the sortability of UUIDv7 data?
>
> Agree. Added.
>
> > I think we should update the documentation in the uuid section about
> > casting data between bytea and uuid. For references, we have a similar
> > description for bytea and integer[1].
>
> Fair point. Fixed.
>
Thank you for updating the patch!
I've reviewed both patches and have some comments.
* 0001 patch
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+ errmsg("invalid input length for type %s", "uuid"),
+ errdetail("Expected %d bytes, got %d.", UUID_LEN, len)));
I think we need to handle plural and singular forms depending on the
value. Or we can change it to "Expected size %d, got %d".
* 0002 patch:
errhint("Valid encodings are \"%s\", \"%s\", \"%s\",
and \"%s\".",
"base64", "base64url", "escape", "hex")));
We need to add 'base32hex' here.
---
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("unexpected \"=\" while decoding
%s sequence",
+ "base32hex")));
I think we can directly write 'base32hex' in the error message.
---
+ /* Verify no extra bits remain (padding bits should be zero) */
+ if (bits_in_buffer > 0 && (bits_buffer & ((1ULL << bits_in_buffer)
- 1)) != 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid base32hex end sequence"),
+ errhint("Input data has non-zero padding bits.")));
This code checks if the remaining bits of the input data are all zero.
IIUC we don't have a similar check for base64 and base64url. For
instance, the following input data is accepted:
=# select decode('AB', 'base64');
decode
--------
\x00
(1 row)
I think it's better to have consistent behavior across our encoding.
I've attached a patch for the 0002 patch part that fixes the above
points (except for the last point) and has some minor fixes as well.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
diff --git a/doc/src/sgml/func/func-binarystring.sgml b/doc/src/sgml/func/func-binarystring.sgml
index 51be0463eec..7aa3805f1ec 100644
--- a/doc/src/sgml/func/func-binarystring.sgml
+++ b/doc/src/sgml/func/func-binarystring.sgml
@@ -815,14 +815,16 @@
The <literal>base32hex</literal> format is that of
<ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-7">
RFC 4648 Section 7</ulink>. It uses the extended hex alphabet
- (0-9, A-V) which preserves sort order when encoding binary data.
- The <function>encode</function> function produces padded output,
- while <function>decode</function> accepts both padded and unpadded
- input. Decoding is case-insensitive and ignores whitespace characters.
+ (<literal>0</literal>-<literal>9</literal> and
+ <literal>A</literal>-<literal>V</literal>) which preserves sort order
+ when encoding binary data. The <function>encode</function> function
+ produces padded output, while <function>decode</function> accepts both
+ padded and unpadded input. Decoding is case-insensitive and ignores
+ whitespace characters.
</para>
<para>
This format can be used for encoding UUIDs in a compact, sortable format:
- <literal>substring(encode(uuid_value :: bytea, 'base32hex') from 1 for 26)</literal>
+ <literal>substring(encode(uuid_value::bytea, 'base32hex') FROM 1 FOR 26)</literal>
produces a 26-character string compared to the standard 36-character
UUID representation.
</para>
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index 6153a5cb5f2..ea11bc3f3b5 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -65,8 +65,8 @@ binary_encode(PG_FUNCTION_ARGS)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("unrecognized encoding: \"%s\"", namebuf),
- errhint("Valid encodings are \"%s\", \"%s\", \"%s\", and \"%s\".",
- "base64", "base64url", "escape", "hex")));
+ errhint("Valid encodings are \"%s\", \"%s\", \"%s\", \"%s\", and \"%s\".",
+ "base64", "base64url", "base32hex", "escape", "hex")));
dataptr = VARDATA_ANY(data);
datalen = VARSIZE_ANY_EXHDR(data);
@@ -115,8 +115,8 @@ binary_decode(PG_FUNCTION_ARGS)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("unrecognized encoding: \"%s\"", namebuf),
- errhint("Valid encodings are \"%s\", \"%s\", \"%s\", and \"%s\".",
- "base64", "base64url", "escape", "hex")));
+ errhint("Valid encodings are \"%s\", \"%s\", \"%s\", \"%s\", and \"%s\".",
+ "base64", "base64url", "base32hex", "escape", "hex")));
dataptr = VARDATA_ANY(data);
datalen = VARSIZE_ANY_EXHDR(data);
@@ -873,15 +873,11 @@ base32hex_encode(const char *src, size_t srclen, char *dst)
/* Handle remaining bits (if any) */
if (bits_in_buffer > 0)
- {
dst[output_pos++] = base32hex_table[(bits_buffer << (5 - bits_in_buffer)) & 0x1F];
- }
/* Add padding to make length a multiple of 8 (per RFC 4648) */
while (output_pos % 8 != 0)
- {
dst[output_pos++] = '=';
- }
return output_pos;
}
@@ -909,8 +905,8 @@ base32hex_decode(const char *src, size_t srclen, char *dst)
if (c == '=')
{
/*
- * Padding is only valid at positions 2, 4, 5, or 7 within an
- * 8-character group (corresponding to 1, 2, 3, or 4 input bytes).
+ * The first padding is only valid at positions 2, 4, 5, or 7 within
+ * an 8-character group (corresponding to 1, 2, 3, or 4 input bytes).
* We only check the position for the first '=' character.
*/
if (!end)
@@ -918,8 +914,7 @@ base32hex_decode(const char *src, size_t srclen, char *dst)
if (pos != 2 && pos != 4 && pos != 5 && pos != 7)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("unexpected \"=\" while decoding %s sequence",
- "base32hex")));
+ errmsg("unexpected \"=\" while decoding base32hex sequence")));
end = true;
}
pos++;
@@ -930,9 +925,8 @@ base32hex_decode(const char *src, size_t srclen, char *dst)
if (end)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("invalid symbol \"%.*s\" found while decoding %s sequence",
- pg_mblen((const char *) &c), (const char *) &c,
- "base32hex")));
+ errmsg("invalid symbol \"%.*s\" found while decoding base32hex sequence",
+ pg_mblen((const char *) &c), (const char *) &c)));
/* Decode base32hex character (0-9, A-V, case-insensitive) */
if (c >= '0' && c <= '9')
@@ -944,9 +938,8 @@ base32hex_decode(const char *src, size_t srclen, char *dst)
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("invalid symbol \"%.*s\" found while decoding %s sequence",
- pg_mblen((const char *) &c), (const char *) &c,
- "base32hex")));
+ errmsg("invalid symbol \"%.*s\" found while decoding base32hex sequence",
+ pg_mblen((const char *) &c), (const char *) &c)));
/* Add 5 bits to buffer */
bits_buffer = (bits_buffer << 5) | val;
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index 910757537e7..53b1a14c895 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2600,10 +2600,10 @@ SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
-- report an error with a hint listing valid encodings when an invalid encoding is specified
SELECT encode('\x01'::bytea, 'invalid'); -- error
ERROR: unrecognized encoding: "invalid"
-HINT: Valid encodings are "base64", "base64url", "escape", and "hex".
+HINT: Valid encodings are "base64", "base64url", "base32hex", "escape", and "hex".
SELECT decode('00', 'invalid'); -- error
ERROR: unrecognized encoding: "invalid"
-HINT: Valid encodings are "base64", "base64url", "escape", and "hex".
+HINT: Valid encodings are "base64", "base64url", "base32hex", "escape", and "hex".
--
-- base32hex encoding/decoding
--
@@ -2698,6 +2698,12 @@ SELECT decode('24', 'base32hex'); -- OK, padding `=` are optional
\x11
(1 row)
+SELECT decode('24=======', 'base32hex'); -- OK
+ decode
+--------
+ \x11
+(1 row)
+
SELECT decode('24h36h2lco', 'base32hex'); -- OK, the encoding is case-insensitive
decode
----------------
@@ -2710,6 +2716,9 @@ SELECT decode('W', 'base32hex'); -- error
ERROR: invalid symbol "W" found while decoding base32hex sequence
SELECT decode('24H36H0=24', 'base32hex'); -- error
ERROR: invalid symbol "2" found while decoding base32hex sequence
+SELECT decode('11=', 'base32hex'); -- error
+ERROR: invalid base32hex end sequence
+HINT: Input data has non-zero padding bits.
--
-- base64url encoding/decoding
--
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index b5237e85172..6ebc192a9b1 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -856,10 +856,12 @@ SELECT decode('24H36H2L', 'base32hex'); -- \x1122334455
SELECT decode('24H36H2LCO======', 'base32hex'); -- \x112233445566
SELECT decode('24', 'base32hex'); -- OK, padding `=` are optional
+SELECT decode('24=======', 'base32hex'); -- OK
SELECT decode('24h36h2lco', 'base32hex'); -- OK, the encoding is case-insensitive
SELECT decode('=', 'base32hex'); -- error
SELECT decode('W', 'base32hex'); -- error
SELECT decode('24H36H0=24', 'base32hex'); -- error
+SELECT decode('11=', 'base32hex'); -- error
--