It's good that masking with 0x1fffff now occurs only if PCRE_NO_UTF32_CHECK is specified. Unicode conformance can be improved further, and the code made slightly smaller, faster, and more flexible, with a simple change to pcre_internal.h. By default, PCRE_NO_UTF32_CHECK should disable checking without enabling masking; masking should be enabled only by a compile-time option. The definition of UTF32_MASK can be replaced by the following:
#if defined PCRE_MASK_UTF32_BEYOND_1FFFFF
#define ADJUST_UTF32_CODE_UNIT(c) ((c) & 0x1fffffu)
#else
#define ADJUST_UTF32_CODE_UNIT(c) (c)
#endif

And these macros can be revised as follows:

#define GETCHAR(c, eptr) \
  c = ADJUST_UTF32_CODE_UNIT(*(eptr));

#define GETCHARTEST(c, eptr) \
  c = *eptr; \
  if (utf) c = ADJUST_UTF32_CODE_UNIT(c);

#define GETCHARINC(c, eptr) \
  c = ADJUST_UTF32_CODE_UNIT(*eptr++);

#define GETCHARINCTEST(c, eptr) \
  c = *eptr++; \
  if (utf) c = ADJUST_UTF32_CODE_UNIT(c);

#define RAWUCHAR(eptr) \
  ADJUST_UTF32_CODE_UNIT(*(eptr))

#define RAWUCHARINC(eptr) \
  ADJUST_UTF32_CODE_UNIT(*(eptr)++)

#define RAWUCHARTEST(eptr) \
  (utf ? (ADJUST_UTF32_CODE_UNIT(*(eptr))) : *(eptr))

#define RAWUCHARINCTEST(eptr) \
  (utf ? (ADJUST_UTF32_CODE_UNIT(*(eptr)++)) : *(eptr)++)

Best wishes,

Tom

文林 Wenlin Institute, Inc.
Software for Learning Chinese
Web: http://www.wenlin.com
Telephone: 1-877-4-WENLIN (1-877-493-6546)
☯
