After digging into it, I see that you are completely correct. I had to do a
bit more reading to understand the relationship between UTF-8 and wchar_t,
but ultimately the existing locale support works for my use case.

I have therefore updated the patch with three much smaller changes:

* Allow `-` in labels, in addition to `_`
* Raise the label length limit from the existing 256 characters to 512;
  again, it's not uncommon for non-English strings to be much longer
* Expand the documentation to explain how ltree labels relate to the
  database locale
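As a rough illustration of what these changes mean for label validation (and of Tom's point below that validity is decided by the locale's iswalpha()/iswdigit() rather than by an ASCII table), here is a standalone C sketch. It is not the ltree code: `label_is_valid` and `LABEL_MAX_CHARS` are hypothetical names, and it decodes with mbrtowc() in the process's LC_CTYPE locale as a stand-in for PostgreSQL's own t_isalpha()/t_isdigit() helpers, which consult the database locale.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical constant mirroring the proposed 512-character label limit. */
#define LABEL_MAX_CHARS 512

/*
 * Sketch only: return true if `label` (a multibyte string in the current
 * LC_CTYPE locale) has 1..LABEL_MAX_CHARS characters, each of which is a
 * locale-defined letter or digit, '_' or '-'.
 */
bool label_is_valid(const char *label)
{
    mbstate_t state;
    size_t remaining = strlen(label);
    size_t nchars = 0;

    memset(&state, 0, sizeof state);
    while (remaining > 0)
    {
        wchar_t wc;
        size_t len = mbrtowc(&wc, label, remaining, &state);

        if (len == (size_t) -1 || len == (size_t) -2)
            return false;   /* invalid or truncated multibyte sequence */
        if (len == 0)
            break;          /* decoded a NUL; not expected, since remaining stops before the terminator */

        /* Classification is delegated to the locale, not to an ASCII table. */
        if (!(iswalpha((wint_t) wc) || iswdigit((wint_t) wc) ||
              wc == L'_' || wc == L'-'))
            return false;

        /* Count characters, not bytes, against the limit. */
        if (++nchars > LABEL_MAX_CHARS)
            return false;

        label += len;
        remaining -= len;
    }
    return nchars > 0;      /* empty labels are rejected */
}
```

A real program would call setlocale(LC_CTYPE, "") (or select a specific UTF-8 locale) before validating; which non-ASCII letters are accepted then depends entirely on that locale, and the length check counts decoded characters rather than bytes.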

Thank you,
Garen

On Wed, Oct 5, 2022 at 3:56 PM Tom Lane <t...@sss.pgh.pa.us> wrote:

> Garen Torikian <gjtorik...@gmail.com> writes:
> >> Perhaps the docs are a bit unclear about that, but it's not
> >> restricted to ASCII alphanumerics.  AFAICS the code will accept
> >> whatever iswalpha() and iswdigit() will accept in the database's
> >> default locale.
>
> > Sorry but I don't think that is correct. Here is the single
> > definition check of what constitutes a valid character:
> >
> > https://github.com/postgres/postgres/blob/c3315a7da57be720222b119385ed0f7ad7c15268/contrib/ltree/ltree.h#L129
>
> > As you can see, there are no `is_*` calls at all.
>
> Did you chase down what t_isalpha and t_isdigit do?
>
>                         regards, tom lane
>

Attachment: 0002-Expand-character-set-for-ltree-labels.patch
