On Wed, Oct 5, 2022 at 3:53 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> I happened to wonder why various places are testing things like
>
> #define ISWORDCHR(c) (t_isalpha(c) || t_isdigit(c))
>
> rather than using an isalnum-equivalent test. The direct answer
> is that ts_locale.c/.h provides no such test function, which
> apparently is because there's not a lot of potential callers in
> the core code. However, both pg_trgm and ltree could benefit
> from adding one.
>
> There's no semantic hazard here: the documentation I consulted
> is all pretty explicit that is[w]alnum is true exactly when
> either is[w]alpha or is[w]digit are. For example, POSIX saith
>
> The iswalpha() and iswalpha_l() functions shall test whether wc is a
> wide-character code representing a character of class alpha in the
> current locale, or in the locale represented by locale, respectively;
> see XBD Locale.
>
> The iswdigit() and iswdigit_l() functions shall test whether wc is a
> wide-character code representing a character of class digit in the
> current locale, or in the locale represented by locale, respectively;
> see XBD Locale.
>
> The iswalnum() and iswalnum_l() functions shall test whether wc is a
> wide-character code representing a character of class alpha or digit
> in the current locale, or in the locale represented by locale,
> respectively; see XBD Locale.
>
> While I didn't try to actually measure it, these functions don't
> look remarkably cheap. Doing char2wchar() twice when we only need
> to do it once seems silly, and the libc functions themselves are
> probably none too cheap for multibyte characters either.
>
> Hence, I propose the attached. I got rid of some places that were
> unnecessarily checking pg_mblen before applying t_iseq(), too.
>
> regards, tom lane
>

I see this is already committed, but I'm curious, why do t_isalpha and t_isdigit have the pair of /* TODO */ comments? This unfinished business isn't explained anywhere in the file.
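
As an aside, the equivalence quoted above can be illustrated with a small standalone sketch. This is not the committed patch: the helper names below are made up for illustration, and it calls the plain <wctype.h> functions directly instead of going through t_isalpha/t_isdigit and char2wchar().

```c
#include <assert.h>
#include <stdbool.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: two classification calls, like the quoted ISWORDCHR. */
bool
word_char_two_calls(wint_t c)
{
    return iswalpha(c) || iswdigit(c);
}

/* Hypothetical helper: one classification call doing the same job. */
bool
word_char_one_call(wint_t c)
{
    return iswalnum(c) != 0;
}

/*
 * Check that both forms agree for every ASCII code point.  No setlocale()
 * call is made here, so this runs in the default "C" locale.
 */
void
assert_forms_agree_on_ascii(void)
{
    for (wint_t c = 0; c < 128; c++)
        assert(word_char_two_calls(c) == word_char_one_call(c));
}
```

Calling assert_forms_agree_on_ascii() from a test program should pass silently. The saving under discussion comes from converting and classifying each character once rather than twice, which this sketch does not attempt to measure.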