pgsql: Make regex "max_chr" depend on encoding, not provider.

Jeff Davis Mon, 01 Dec 2025 11:09:32 -0800

Make regex "max_chr" depend on encoding, not provider.

The regex mechanism scans through the first "max_chr" character values
to cache character property ranges (isalpha, etc.). For single-byte
encodings, there's no sense in scanning beyond UCHAR_MAX; but for
UTF-8 it makes sense to cache higher code point values (though not all
of them; only up to MAX_SIMPLE_CHR).

Prior to 5a38104b36, the logic about how many character values to scan
was based on the pg_regex_strategy, which was dependent on the
provider. Commit 5a38104b36 preserved that logic exactly, allowing
different providers to define the "max_chr".

Now, change it to depend only on the encoding and on whether
ctype_is_c is set. For this specific calculation, distinguishing
between providers creates more complexity than it's worth.
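
As a rough illustration of the rule described above, the selection
might look like the sketch below. This is not the committed code. All
sketch_* names are hypothetical stand-ins, and the 0x7FF cap and the
127 limit for the ctype_is_c case are assumptions made for the example
rather than values stated in this message.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are not PostgreSQL's real types. */
typedef unsigned int sketch_wchar;

typedef enum
{
	SKETCH_ENC_UTF8,
	SKETCH_ENC_SINGLE_BYTE
} sketch_encoding;

/* Assumed cap for the UTF-8 case (stand-in for MAX_SIMPLE_CHR). */
#define SKETCH_MAX_SIMPLE_CHR 0x7FF

/*
 * Pick the highest character value to scan when caching character
 * property ranges. Only the encoding and ctype_is_c matter here;
 * the locale provider is deliberately not an input.
 */
static sketch_wchar
sketch_max_chr(sketch_encoding enc, bool ctype_is_c)
{
	if (ctype_is_c)
		return 127;				/* assumption: ASCII only in the C case */
	if (enc == SKETCH_ENC_UTF8)
		return SKETCH_MAX_SIMPLE_CHR;
	return UCHAR_MAX;			/* single-byte: nothing above UCHAR_MAX */
}

/* Number of values the caching scan would probe: 0..max_chr inclusive. */
static unsigned int
sketch_scan_count(sketch_encoding enc, bool ctype_is_c)
{
	sketch_wchar max_chr = sketch_max_chr(enc, ctype_is_c);

	/* No branch above should exceed the assumed UTF-8 cap. */
	assert(max_chr <= SKETCH_MAX_SIMPLE_CHR);
	return max_chr + 1;
}
```

With these assumptions, two providers running under the same encoding
and the same ctype_is_c setting always scan the same number of
character values.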

Discussion: https://postgr.es/m/[email protected]
Reviewed-by: Chao Li <[email protected]>

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/19b966243c38196a33b033fb0c259dcf760c0d69

Modified Files
--------------
src/backend/regex/regc_pg_locale.c     | 18 ++++++++++--------
src/backend/utils/adt/pg_locale_libc.c |  2 --
src/include/utils/pg_locale.h          |  6 ------
3 files changed, 10 insertions(+), 16 deletions(-)
