On Wed, Oct 5, 2022 at 3:53 PM Tom Lane <t...@sss.pgh.pa.us> wrote:

> I happened to wonder why various places are testing things like
>
> #define ISWORDCHR(c)    (t_isalpha(c) || t_isdigit(c))
>
> rather than using an isalnum-equivalent test.  The direct answer
> is that ts_locale.c/.h provides no such test function, which
> apparently is because there's not a lot of potential callers in
> the core code.  However, both pg_trgm and ltree could benefit
> from adding one.
>
> There's no semantic hazard here: the documentation I consulted
> is all pretty explicit that is[w]alnum is true exactly when
> either is[w]alpha or is[w]digit are.  For example, POSIX saith
>
>     The iswalpha() and iswalpha_l() functions shall test whether wc is a
>     wide-character code representing a character of class alpha in the
>     current locale, or in the locale represented by locale, respectively;
>     see XBD Locale.
>
>     The iswdigit() and iswdigit_l() functions shall test whether wc is a
>     wide-character code representing a character of class digit in the
>     current locale, or in the locale represented by locale, respectively;
>     see XBD Locale.
>
>     The iswalnum() and iswalnum_l() functions shall test whether wc is a
>     wide-character code representing a character of class alpha or digit
>     in the current locale, or in the locale represented by locale,
>     respectively; see XBD Locale.
>
> While I didn't try to actually measure it, these functions don't
> look remarkably cheap.  Doing char2wchar() twice when we only need
> to do it once seems silly, and the libc functions themselves are
> probably none too cheap for multibyte characters either.
>
> Hence, I propose the attached.  I got rid of some places that were
> unnecessarily checking pg_mblen before applying t_iseq(), too.
>
>                         regards, tom lane
>
>
I see this is already committed, but I'm curious: why do t_isalpha and
t_isdigit each carry a /* TODO */ comment? That unfinished business
isn't explained anywhere in the file.
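
For anyone skimming the thread, here is a rough standalone sketch of the
"convert once, classify once" idea from the quoted message. It uses only
standard C (mbrtowc and iswalnum), not the ts_locale.c helpers, and the
name sketch_isalnum is made up for illustration; it is not the committed
code.

```c
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical stand-in for an isalnum-style test on the first multibyte
 * character of a NUL-terminated string.  The character is converted to a
 * wide character once and classified once, instead of converting it
 * separately for an alpha test and a digit test.
 *
 * Assumption: the caller has already selected the desired locale with
 * setlocale(); without that, the default "C" locale applies.
 */
static int
sketch_isalnum(const char *ptr)
{
    mbstate_t   state;
    wchar_t     wc;
    size_t      len;

    memset(&state, 0, sizeof(state));

    /* One multibyte-to-wide conversion for the leading character. */
    len = mbrtowc(&wc, ptr, strlen(ptr), &state);

    /* Treat invalid, incomplete, or empty input as "not alphanumeric". */
    if (len == (size_t) -1 || len == (size_t) -2 || len == 0)
        return 0;

    /* One classification call covers both the alpha and digit classes. */
    return iswalnum((wint_t) wc) != 0;
}
```

Per the POSIX text quoted above, this should give the same answer as
testing iswalpha() || iswdigit() on the same wide character.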
