After digging into it, I see that you are completely correct. I had to do a bit more reading to understand the relationship between UTF-8 and wchar, but ultimately the existing locale support works for my use case.
Therefore I have updated the patch with three much smaller changes:

* Support for `-` in addition to `_`
* Expanding the limit to 512 chars (from the existing 256); again it's not uncommon for non-English strings to be much longer
* Fixed the documentation to expand on what the ltree label's relationship to the DB locale is

Thank you,
Garen

On Wed, Oct 5, 2022 at 3:56 PM Tom Lane <t...@sss.pgh.pa.us> wrote:

> Garen Torikian <gjtorik...@gmail.com> writes:
> >> Perhaps the docs are a bit unclear about that, but it's not
> >> restricted to ASCII alphanumerics. AFAICS the code will accept
> >> whatever iswalpha() and iswdigit() will accept in the database's
> >> default locale.
>
> > Sorry but I don't think that is correct. Here is the single
> > definition check of what constitutes a valid character:
>
> > https://github.com/postgres/postgres/blob/c3315a7da57be720222b119385ed0f7ad7c15268/contrib/ltree/ltree.h#L129
>
> > As you can see, there are no `is_*` calls at all.
>
> Did you chase down what t_isalpha and t_isdigit do?
>
> regards, tom lane
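To make the rule discussed in this thread concrete, here is a minimal sketch of the label check as described above: a character is accepted if iswalpha() or iswdigit() accepts it in the current locale, or if it is `_` or `-`, and a label may be at most 512 characters. This is not the ltree code and not the patch itself. The names `SKETCH_MAX_LABEL_CHARS`, `sketch_label_char_ok` and `sketch_label_ok` are hypothetical, the input is assumed to be already converted to a wide string, and rejecting the empty label is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical constant: the 512-character limit proposed in this mail. */
#define SKETCH_MAX_LABEL_CHARS 512

/*
 * Hypothetical helper, not part of ltree: accept whatever iswalpha() and
 * iswdigit() accept in the current locale, plus '_' and '-'.
 */
bool
sketch_label_char_ok(wchar_t c)
{
    return iswalpha((wint_t) c) || iswdigit((wint_t) c) ||
           c == L'_' || c == L'-';
}

/*
 * Hypothetical helper: check one label, given as a NUL-terminated wide
 * string. The limit counts characters, not bytes. Treating the empty
 * label as invalid is an assumption of this sketch.
 */
bool
sketch_label_ok(const wchar_t *label)
{
    size_t len;
    size_t i;

    assert(label != NULL);

    len = wcslen(label);
    if (len == 0 || len > SKETCH_MAX_LABEL_CHARS)
        return false;

    for (i = 0; i < len; i++)
    {
        if (!sketch_label_char_ok(label[i]))
            return false;
    }
    return true;
}
```

Because iswalpha() and iswdigit() depend on the locale in effect, which non-ASCII characters pass this check depends on the locale, which is the point made in the quoted reply about the database's default locale.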
0002-Expand-character-set-for-ltree-labels.patch