Re: [pcre-dev] Using PCRE upon Asian and other two-byte national codings

Zoltán Herczeg Sun, 24 Nov 2013 00:58:36 -0800

Hi,

Currently, PCRE character tables can only hold the lowercase / flipped-case 
mappings and the various type bits for the first 256 characters. Supporting the 
whole 64K character set in 16-bit mode would take 409600 bytes of memory, which 
is less than half a megabyte. Today, even smartphones can afford that cost. The 
trade-off would be that the same tables could no longer be shared between the 
8/16/32-bit modes, since the lowercase / flipped-case tables would depend on 
the native character width. Hence a table covering only 256 characters would be 
bigger in 16/32-bit mode than it is now. (Note: the table size would always be 
a multiple of 256. This would let us leave 8-bit mode unchanged, and we could 
also support character sets that have fewer than 64K characters in 16-bit mode 
and especially in 32-bit mode, where there are 4096M possible characters.)
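
For the record, the figures above can be reproduced with a rough sketch like 
the one below. It assumes the extended tables keep the four-part layout of the 
current ones (lowercase map, flipped-case map, ten class bitmaps, one byte of 
type flags per character) and that only the two case maps grow with the code 
unit width. The helper name table_bytes is made up for illustration; this is 
not code from PCRE.

```python
# Sketch only: estimates the size of PCRE-style character tables.
# The four-part layout mirrors the current tables; scaling it with the
# code unit width is an assumption, not the actual PCRE implementation.

# space, xdigit, digit, upper, lower, word, graph, print, punct, cntrl
CLASS_BITMAPS = 10


def table_bytes(num_chars, unit_bytes):
    """Return the estimated table size in bytes (hypothetical helper)."""
    lower_case = num_chars * unit_bytes          # lowercase map, one code unit per char
    flip_case = num_chars * unit_bytes           # flipped-case map, one code unit per char
    class_bits = CLASS_BITMAPS * num_chars // 8  # one bit per char per class
    char_types = num_chars                       # one byte of type flags per char
    return lower_case + flip_case + class_bits + char_types


if __name__ == "__main__":
    print(table_bytes(256, 1))    # 256 characters, 8-bit mode (today's tables)
    print(table_bytes(65536, 2))  # full 64K character set, 16-bit mode
    print(table_bytes(256, 2))    # 256 characters, 16-bit mode
    print(table_bytes(256, 4))    # 256 characters, 32-bit mode
```

Under these assumptions the 8-bit case comes out at 1088 bytes, the full 64K 
set in 16-bit mode at 409600 bytes, and a 256-character table grows to 1600 
bytes in 16-bit mode and 2624 bytes in 32-bit mode.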


I am sure we cannot do this for 8.34 (this is not an easy task), but if this is 
important for many people, we might think about this later.

Regards,
Zoltan

p...@hermes.cam.ac.uk wrote:
>On Sat, 23 Nov 2013, Zoltán Herczeg wrote:
>
>> PCRE supports 2 or 4 byte character encodings, but character
>> properties are only supported for 0-255 character codes. 
>
>I think I had better clarify that, for the record. The 16-bit and 32-bit
>PCRE libraries do support Unicode character properties, just like the
>8-bit library. However, locale-based properties apply only to 0-255
>character codes.
>
>Philip
>
>-- 
>Philip Hazel


-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
