Hi Tom,

> Perhaps the docs are a bit unclear about that, but it's not
> restricted to ASCII alphanumerics. AFAICS the code will accept
> whatever iswalpha() and iswdigit() will accept in the database's
> default locale.

Sorry, but I don't think that is correct. Here is the single definition that checks what constitutes a valid character:

https://github.com/postgres/postgres/blob/c3315a7da57be720222b119385ed0f7ad7c15268/contrib/ltree/ltree.h#L129

As you can see, there are no `is_*` calls at all. Where in this contrib package do you see `iswalpha`? Perhaps I missed it.

> That seems really pretty random.

OK. I am trying to avoid a situation where other users want delimiters other than `-`, given how commonly it appears in words (e.g., compound ones).

On Wed, Oct 5, 2022 at 2:59 PM Tom Lane <t...@sss.pgh.pa.us> wrote:

> Garen Torikian <gjtorik...@gmail.com> writes:
> > I am submitting a patch to expand the label requirements for ltree.
> >
> > The current format is restricted to alphanumeric characters, plus _.
> > Unfortunately, for non-English labels, this set is insufficient.
>
> Hm? Perhaps the docs are a bit unclear about that, but it's not
> restricted to ASCII alphanumerics. AFAICS the code will accept
> whatever iswalpha() and iswdigit() will accept in the database's
> default locale. There's certainly work that could/should be done
> to allow use of not-so-default locales, but that's not specific
> to ltree. I'm not sure that doing an application-side encoding
> is attractive compared to just using that ability directly.
>
> If you do want to do application-side encoding, I'm unsure why
> punycode would be the choice anyway, as opposed to something
> that can fit in the existing restrictions.
>
> > On top of this, I added support for two more characters: # and ;, which are
> > used for HTML entities.
>
> That seems really pretty random.
>
> regards, tom lane