On Sun, Jun 28, 2026 at 7:20 PM Haibo Yan <[email protected]> wrote:
>
> On Thu, Jun 25, 2026 at 3:16 PM Masahiko Sawada <[email protected]> wrote:
> >
> > On Thu, Jun 25, 2026 at 2:31 PM Haibo Yan <[email protected]> wrote:
> > >
> > > On Thu, Jun 25, 2026 at 11:28 AM Masahiko Sawada <[email protected]>
> > > wrote:
> > >>
> > >> Hi all,
> > >>
> > >> I'd like to propose the $subject.
> > >>
> > >> Since commit ec8719ccbfcd made hex_decode_safe() SIMD-aware, decoding
> > >> a run of hex digits is now fast. The attached patch reuses
> > >> hex_decode_safe() in the UUID input function to speed up parsing.
> > >>
> > >> We accept several textual forms of a UUID[1]. The fast path handles
> > >> the common ones: 32 hex digits, the canonical 8x-4x-4x-4x-12x form
> > >> (where "nx" means n hex digits), and either of those wrapped in
> > >> braces. Otherwise, it falls back to the ordinary scalar UUID parse.
> > >>
> > >> I've benchmarked the parse speed using the following query:
> > >>
> > >> CREATE TEMP TABLE u AS SELECT gen_random_uuid()::text AS t FROM
> > >> generate_series(1, 1000000);
> > >> EXPLAIN (ANALYZE, TIMING OFF) SELECT t::uuid FROM u;
> > >>
> > >> I compared the execution time of the second query, which measures
> > >> uuid_in() alone, with/without SIMD optimization. Here are results (the
> > >> median of 5 runs):
> > >>
> > >> HEAD: 208.879 ms
> > >> Patched: 40.983 ms
> > >>
> > >> The improvements look promising to me. But in a realistic pipeline the
> > >> parse is a small fraction of the work, so end-to-end gains could be
> > >> much smaller.
> > >>
> > >> Feedback is very welcome.
> > >>
> > > I may be missing something, but I wonder whether the fast path is relying
> > > on slightly different input semantics from the existing UUID parser.
> > >
> > > In particular, hex_decode_safe() is not a strict “32 hex characters only”
> > > decoder. It skips whitespace, which is fine for its existing callers,
> > > but I don’t think UUID input should treat whitespace inside the UUID body
> > > as ignorable.
> >
> > Good catch! hex_decode_safe() skips whitespaces so the patch accepts
> > the following UUID value, which is bad:
> >
> > select '019f00b5-7f8a-722f-b707-59f0ed25cd  '::uuid;
> >                  uuid
> > --------------------------------------
> >  019f00b5-7f8a-722f-b707-59f0ed25cd00
> > (1 row)
> >
> > > Also, since hex_decode_safe() returns void, the UUID fast path
> > > cannot verify that exactly UUID_LEN bytes were produced.
> >
> > IIUC hex_decode_safe() does return the output length in bytes. So I
> > think we can fallback to the scalar UUID parser if
> > esctx.error_occurred is true or if the returned value is not 16.
> >
>
> You’re right, I misread that part. Checking both esctx.error_occurred and
> the returned length sounds good to me.
>
> > >
> > > So I think it would be safer either to pre-validate that the 32 source
> > > characters are all hex digits before calling hex_decode_safe(), or to use
> > > a UUID-specific strict hex decoder for this path. After that, a comment
> > > explaining why hex_decode_safe() is safe here would make the invariant
> > > much clearer.
> >
> > IIUC hex_decode_simd_helper() accepts only hex digits so we could
> > re-use it for UUID parsing. Let me check if the above idea of using
> > the return value works for us first.
> >
>
> That sounds reasonable. My main concern was to keep the fast path’s accepted
> input set identical to the scalar UUID parser. Falling back when the decoded
> length is not UUID_LEN, together with regression tests for whitespace cases,
> should address that.
>
> > >
> > > Could you also add a few regression tests for invalid inputs that contain
> > > whitespace inside otherwise fast-path-looking UUID strings? For example:
> > >
> > > ---------------------------------------------------------------
> > >
> > > SELECT 'a0eebc99 9c0b4ef8bb6d6bb9bd380a11'::uuid;
> > > SELECT 'a0eebc999c0b4ef8bb6d6bb9bd380a1 '::uuid;
> > > SELECT '{a0eebc999c0b4ef8bb6d6bb9bd380a1 }'::uuid;
> > > SELECT 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a1 '::uuid;
> > > ---------------------------------------------------------------
> > >
> > > These should continue to be rejected in the same way as the scalar parser.
> > > Regards,
> >
> > Agreed.
> >
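For reference, the pre-validation alternative discussed above could look
roughly like the following. This is only a standalone sketch and is not part
of the attached patch: uuid_body_is_strict_hex(), is_hex_digit() and
UUID_HEX_DIGITS are hypothetical names, and the check is a plain scalar loop
rather than anything SIMD-based.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define UUID_HEX_DIGITS 32		/* hypothetical constant, not in the patch */

/* Return true if c is an ASCII hex digit, independent of the locale. */
static bool
is_hex_digit(unsigned char c)
{
	return (c >= '0' && c <= '9') ||
		(c >= 'a' && c <= 'f') ||
		(c >= 'A' && c <= 'F');
}

/*
 * Hypothetical helper: return true only if the first UUID_HEX_DIGITS bytes
 * of s are all hex digits.  Whitespace is rejected like any other non-hex
 * byte.  A string shorter than UUID_HEX_DIGITS is rejected at its
 * terminating NUL, so the loop never reads past the end of the string.
 */
static bool
uuid_body_is_strict_hex(const char *s)
{
	assert(s != NULL);

	for (size_t i = 0; i < UUID_HEX_DIGITS; i++)
	{
		if (!is_hex_digit((unsigned char) s[i]))
			return false;
	}
	return true;
}
```

Under these assumptions a caller would run the check on the 32 compacted
characters and go straight to the scalar parser when it fails. The cost is an
extra pass over the input, which is why checking the returned length is the
approach tried first.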
I've attached the updated patch.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

From 7e8055206ce3c3abef611028c1dfdca1d4fde0c0 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <[email protected]>
Date: Thu, 25 Jun 2026 10:03:44 -0700
Subject: [PATCH v2] Optimize UUID parse using SIMD.

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
---
 src/backend/utils/adt/uuid.c       | 102 +++++++++++++++++++++++++++--
 src/test/regress/expected/uuid.out |  55 ++++++++++++++++
 src/test/regress/sql/uuid.sql      |  16 +++++
 3 files changed, 168 insertions(+), 5 deletions(-)

diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 6ee3752ac78..6e7b841bde4 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -19,7 +19,9 @@
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
+#include "nodes/miscnodes.h"
 #include "port/pg_bswap.h"
+#include "utils/builtins.h"
 #include "utils/fmgrprotos.h"
 #include "utils/guc.h"
 #include "utils/skipsupport.h"
@@ -122,13 +124,10 @@ uuid_out(PG_FUNCTION_ARGS)
 }
 
 /*
- * We allow UUIDs as a series of 32 hexadecimal digits with an optional dash
- * after each group of 4 hexadecimal digits, and optionally surrounded by {}.
- * (The canonical format 8x-4x-4x-4x-12x, where "nx" means n hexadecimal
- * digits, is the only one used for output.)
+ * General UUID parser.
  */
 static void
-string_to_uuid(const char *source, pg_uuid_t *uuid, Node *escontext)
+string_to_uuid_scalar(const char *source, pg_uuid_t *uuid, Node *escontext)
 {
 	const char *src = source;
 	bool		braces = false;
@@ -177,6 +176,99 @@ syntax_error:
 					"uuid", source)));
 }
 
+/*
+ * Fast path for the common UUID shapes, built on our SIMD-aware hex decoder.
+ *
+ * This handles a bare string of 32 hex digits and the canonical
+ * 8x-4x-4x-4x-12x form (where "nx" means n hex digits), each optionally
+ * wrapped in braces. Any other shape, or any decoding error, is handed off to
+ * string_to_uuid_scalar() so that parsing and error reporting stay identical
+ * to the scalar implementation.
+ */
+#ifndef USE_NO_SIMD
+static void
+string_to_uuid_fast(const char *source, pg_uuid_t *uuid, Node *escontext)
+{
+	const char *body = source;
+	size_t		len = strlen(source);
+	const char *hexsrc = NULL;
+	char		hexbuf[32];
+	uint64		written;
+	ErrorSaveContext esctx = {T_ErrorSaveContext};
+
+	/* Strip one optional surrounding brace pair */
+	if (len >= 2 && source[0] == '{' && source[len - 1] == '}')
+	{
+		body = source + 1;
+		len -= 2;
+	}
+
+	if (len == 32)
+	{
+		/*
+		 * Body is already 32 contiguous hex digits -- decode straight from
+		 * the input. hex_decode_safe() reads exactly body[0..31], so it never
+		 * touches the trailing NUL or '}'.
+		 */
+		hexsrc = body;
+	}
+	else if (len == 36 && body[8] == '-' && body[13] == '-' &&
+			 body[18] == '-' && body[23] == '-')
+	{
+		/*
+		 * Canonical 8x-4x-4x-4x-12x form; compact the digits into hexbuf with
+		 * fixed-offset copies, dropping the dashes.
+		 */
+		memcpy(&hexbuf[0], &body[0], 8);
+		memcpy(&hexbuf[8], &body[9], 4);
+		memcpy(&hexbuf[12], &body[14], 4);
+		memcpy(&hexbuf[16], &body[19], 4);
+		memcpy(&hexbuf[20], &body[24], 12);
+		hexsrc = hexbuf;
+	}
+
+	if (hexsrc == NULL)
+	{
+		/* Uncommon shape; let the general parser handle it */
+		string_to_uuid_scalar(source, uuid, escontext);
+		return;
+	}
+
+	/*
+	 * Decode the UUID hex data using our hex decoder that is SIMD-aware. We
+	 * give it a private error context so that a decode failure is swallowed
+	 * here and reported by the scalar path instead, keeping the error message
+	 * identical.
+	 */
+	written = hex_decode_safe(hexsrc, 32, (char *) uuid->data, (Node *) &esctx);
+
+	/*
+	 * Fall back to the scalar path on any error. We must also reject a short
+	 * result: hex_decode_safe() skips whitespace, so it can succeed yet
+	 * write fewer than UUID_LEN bytes, whereas the UUID grammar forbids
+	 * whitespace.
+	 */
+	if (esctx.error_occurred || written != UUID_LEN)
+		string_to_uuid_scalar(source, uuid, escontext);
+}
+#endif
+
+/*
+ * We allow UUIDs as a series of 32 hexadecimal digits with an optional dash
+ * after each group of 4 hexadecimal digits, and optionally surrounded by {}.
+ * (The canonical format 8x-4x-4x-4x-12x, where "nx" means n hexadecimal
+ * digits, is the only one used for output.)
+ */
+static void
+string_to_uuid(const char *source, pg_uuid_t *uuid, Node *escontext)
+{
+#ifdef USE_NO_SIMD
+	string_to_uuid_scalar(source, uuid, escontext);
+#else
+	string_to_uuid_fast(source, uuid, escontext);
+#endif
+}
+
 Datum
 uuid_recv(PG_FUNCTION_ARGS)
 {
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 9c5dda9e9ab..928e71c7ad3 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -340,5 +340,60 @@ SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;
  t
 (1 row)
 
+-- Test UUID shapes for which the parser uses the SIMD path.
+SELECT '5b35380a-7143-4912-9b55-f322699c6770'::uuid;
+                 uuid                 
+--------------------------------------
+ 5b35380a-7143-4912-9b55-f322699c6770
+(1 row)
+
+SELECT '{5b35380a-7143-4912-9b55-f322699c6770}'::uuid;
+                 uuid                 
+--------------------------------------
+ 5b35380a-7143-4912-9b55-f322699c6770
+(1 row)
+
+SELECT '5b35380a714349129b55f322699c6770'::uuid;
+                 uuid                 
+--------------------------------------
+ 5b35380a-7143-4912-9b55-f322699c6770
+(1 row)
+
+SELECT '{5b35380a714349129b55f322699c6770}'::uuid;
+                 uuid                 
+--------------------------------------
+ 5b35380a-7143-4912-9b55-f322699c6770
+(1 row)
+
+-- Test that the SIMD-optimized UUID parser correctly rejects invalid UUID
+-- string formats.
+SELECT '5b35380a714349129b55f32  99c6770'::uuid;
+ERROR:  invalid input syntax for type uuid: "5b35380a714349129b55f32  99c6770"
+LINE 1: SELECT '5b35380a714349129b55f32  99c6770'::uuid;
+               ^
+SELECT '5b35380a-7143-4912-9b55-f322699c67  '::uuid;
+ERROR:  invalid input syntax for type uuid: "5b35380a-7143-4912-9b55-f322699c67  "
+LINE 1: SELECT '5b35380a-7143-4912-9b55-f322699c67  '::uuid;
+               ^
+SELECT '  35380a-7143-4912-9b55-f322699c6770'::uuid;
+ERROR:  invalid input syntax for type uuid: "  35380a-7143-4912-9b55-f322699c6770"
+LINE 1: SELECT '  35380a-7143-4912-9b55-f322699c6770'::uuid;
+               ^
+SELECT 'AZ35380a-7143-4912-9b55-f322699c6770'::uuid;
+ERROR:  invalid input syntax for type uuid: "AZ35380a-7143-4912-9b55-f322699c6770"
+LINE 1: SELECT 'AZ35380a-7143-4912-9b55-f322699c6770'::uuid;
+               ^
+SELECT '{AZ35380a-7143-4912-9b55-f322699c6770}'::uuid;
+ERROR:  invalid input syntax for type uuid: "{AZ35380a-7143-4912-9b55-f322699c6770}"
+LINE 1: SELECT '{AZ35380a-7143-4912-9b55-f322699c6770}'::uuid;
+               ^
+SELECT '{AZ35380a714349129b55f322699c6770}'::uuid;
+ERROR:  invalid input syntax for type uuid: "{AZ35380a714349129b55f322699c6770}"
+LINE 1: SELECT '{AZ35380a714349129b55f322699c6770}'::uuid;
+               ^
+SELECT '{AZ35380a714349129b55f322699c67  }'::uuid;
+ERROR:  invalid input syntax for type uuid: "{AZ35380a714349129b55f322699c67  }"
+LINE 1: SELECT '{AZ35380a714349129b55f322699c67  }'::uuid;
+               ^
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 8cc2ad40614..d67d3d2ded9 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -161,5 +161,21 @@ SELECT '\x019a2f859ced7225b99d9c55044a2563'::bytea::uuid;
 SELECT '\x1234567890abcdef'::bytea::uuid; -- error
 SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;
 
+-- Test UUID shapes for which the parser uses the SIMD path.
+SELECT '5b35380a-7143-4912-9b55-f322699c6770'::uuid;
+SELECT '{5b35380a-7143-4912-9b55-f322699c6770}'::uuid;
+SELECT '5b35380a714349129b55f322699c6770'::uuid;
+SELECT '{5b35380a714349129b55f322699c6770}'::uuid;
+
+-- Test that the SIMD-optimized UUID parser correctly rejects invalid UUID
+-- string formats.
+SELECT '5b35380a714349129b55f32  99c6770'::uuid;
+SELECT '5b35380a-7143-4912-9b55-f322699c67  '::uuid;
+SELECT '  35380a-7143-4912-9b55-f322699c6770'::uuid;
+SELECT 'AZ35380a-7143-4912-9b55-f322699c6770'::uuid;
+SELECT '{AZ35380a-7143-4912-9b55-f322699c6770}'::uuid;
+SELECT '{AZ35380a714349129b55f322699c6770}'::uuid;
+SELECT '{AZ35380a714349129b55f322699c67  }'::uuid;
+
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
-- 
2.54.0
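
For anyone who wants to experiment with the shape detection outside the
backend, here is a rough standalone sketch of the same idea. It is not the
patch itself: uuid_fast_shape_decode() and hex_nibble() are hypothetical
names, and a strict scalar nibble decoder stands in for hex_decode_safe(), so
the whitespace-skipping behavior and the written-length check do not arise
here. A false return only means "not handled by the fast path"; the real code
then falls back to the general parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define UUID_LEN 16

/* Map one ASCII hex digit to its value, or -1 for any other byte. */
static int
hex_nibble(unsigned char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical standalone version of the fast path.  Returns true and fills
 * out[] for 32 bare hex digits or the canonical 8x-4x-4x-4x-12x form, each
 * optionally wrapped in braces.  Returns false for every other input, in
 * which case out[] may have been partially written.
 */
static bool
uuid_fast_shape_decode(const char *source, unsigned char out[UUID_LEN])
{
	const char *body = source;
	size_t		len;
	char		hexbuf[32];

	assert(source != NULL && out != NULL);
	len = strlen(source);

	/* Strip one optional surrounding brace pair */
	if (len >= 2 && source[0] == '{' && source[len - 1] == '}')
	{
		body = source + 1;
		len -= 2;
	}

	if (len == 32)
		memcpy(hexbuf, body, 32);
	else if (len == 36 && body[8] == '-' && body[13] == '-' &&
			 body[18] == '-' && body[23] == '-')
	{
		/* Compact the five digit groups, dropping the dashes */
		memcpy(&hexbuf[0], &body[0], 8);
		memcpy(&hexbuf[8], &body[9], 4);
		memcpy(&hexbuf[12], &body[14], 4);
		memcpy(&hexbuf[16], &body[19], 4);
		memcpy(&hexbuf[20], &body[24], 12);
	}
	else
		return false;			/* uncommon shape */

	/* Strict decode: any non-hex byte, including whitespace, is rejected */
	for (size_t i = 0; i < UUID_LEN; i++)
	{
		int			hi = hex_nibble((unsigned char) hexbuf[2 * i]);
		int			lo = hex_nibble((unsigned char) hexbuf[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return false;
		out[i] = (unsigned char) ((hi << 4) | lo);
	}
	return true;
}
```

The fixed dash offsets (8, 13, 18, 23) are the ones used in the patch; the
"dash after every group of four digits" forms deliberately fall through to
the false return, matching the fast path's decision to leave them to the
general parser.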
