On Thu, Mar 26, 2026 at 1:32 AM Andrey Borodin <[email protected]> wrote:
>
> > On 26 Mar 2026, at 04:40, Tom Lane <[email protected]> wrote:
> >
> > I wonder whether this discovery puts enough of a hole in the
> > value-proposition for base32hex that we should just revert
> > this patch altogether. "It works except in some locales"
> > isn't a very appetizing prospect, so the whole idea is starting
> > to feel more like a foot-gun than a widely-useful feature.
>
> To be precise, this discovery cast shadows on argument "[base32hex is
> ]lexicographically sortable format that preserves temporal ordering for
> UUIDv7". And, actually, any UUID. But I do not think it invalidates the
> argument completely.
>
> It's taken from RFC[0], actually, that states:
>     One property with this alphabet, which the base64 and base32
>     alphabets lack, is that encoded data maintains its sort order when
>     the encoded data is compared bit-wise.
>
> RFC does not give any other benefits.
> Personally, I like that it's compact, visually better than base64, and
> RFC-compliant.
> And IMO argument "base32hex is lexicographically sortable format that
> preserves ordering for UUID in C locale" is still very strong.
> Though, there's a little footy shooty in last 3 words.

Yeah, I still find base32hex useful. As I mentioned in another email,
I think the documentation should note that "base32hex is
lexicographically sortable format that preserves ordering for UUID in
C locale". I've attached a patch that adds this note.

Feedback is very welcome.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

From 515c666b60f7f81f6b2a004ebfb91b358188470c Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <[email protected]>
Date: Thu, 26 Mar 2026 10:17:23 -0700
Subject: [PATCH v1] doc: Add note about collation requirements for
 base32hex sortability.

While fixing the base32hex UUID sortability test in commit
89210037a0a, it turned out that the expected lexicographical order is
only maintained under the C collation (or an equivalent byte-wise
collation). Since this is not just a testing quirk but could be a real
trap users might fall into when sorting encoded data in their
databases, we added a note to the documentation to make this
requirement explicitly clear.

Reviewed-by:
Discussion: https://postgr.es/m/cad21aoawx1d6basguqxm0mzpxpwb07kgaoaaahjnhhenbdy...@mail.gmail.com
---
 doc/src/sgml/func/func-binarystring.sgml | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/doc/src/sgml/func/func-binarystring.sgml b/doc/src/sgml/func/func-binarystring.sgml
index 0aaf9bc68f1..9f731d7bca0 100644
--- a/doc/src/sgml/func/func-binarystring.sgml
+++ b/doc/src/sgml/func/func-binarystring.sgml
@@ -790,6 +790,14 @@
         produces a 26-character string compared to the standard 36-character
         UUID representation.
        </para>
+       <note>
+        <para>
+         To maintain the lexicographical sort order of the encoded data,
+         ensure that the text is sorted using the C collation
+         (e.g., using <literal>COLLATE "C"</literal>). Natural language
+         collations may sort characters differently and break the ordering.
+        </para>
+       </note>
       </listitem>
      </varlistentry>
 
-- 
2.53.0
