Re: [pcre-dev] Locale-aware (Turkish) Unicode caseless matching

Zoltán Herczeg Fri, 24 May 2013 10:50:51 -0700

Hi,

The other thing is the "only two option bits remained" issue. Designing a new
API has been on the table for a long time, but we never actually started on it.
Personally, I would do that first, and introduce pcre2.h, before adding any
new feature to PCRE.


Regards,
Zoltan

[email protected] wrote:
>On Thu, 23 May 2013, Giuseppe D'Angelo wrote:
>
>> No, because it hasn't to do with Unicode Properties -- it has to do
>> with setting up the case folding tables in a different way depending
>> on the user's language (if it's Turkish, "I" shouldn't fold to --
>> match caselessly -- "i").
>
>We can take our time discussing this, because it is now too late for any
>new changes to the forthcoming 8.33 release.
>
>The problem I see with local changes is the problem that Unicode is
>supposed to solve: what to do if a document is written in more than one
>language? If part is in English and part is in Turkish - which is quite
>possible in an English book that is discussing Turkish literature (or
>vice versa) - how can you have English rules for some parts and Turkish
>rules for others?
>
>PCRE currently gets its case-folding rules from the Unicode tables. I
>would be very loath to start introducing locale-specific exceptions -
>where does this end? I also see these problems:
>
>1. If the tables are modified at PCRE build time you get the best
>performance, but you are then restricted to one locale.
>
>2. If the tables are not modified, but special cases are detected at run
>time, performance is hit - for all users, not just those in the special
>locales.
>
>Perhaps whoever represents Turkish on the Unicode consortium should be
>lobbying for new characters I and i that do not fold to each other.
>
>Philip
>
>--
>Philip Hazel
>
>--
>## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
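
To make the quoted discussion concrete: the Turkish rule Giuseppe describes
("I" should not fold to "i") and Philip's second option (detecting special
cases at run time) could look roughly like the Python sketch below. This is
only an illustration, not PCRE code: the names TURKISH_OVERRIDES, fold_char
and caseless_equal are hypothetical, and str.lower() stands in for the
Unicode-derived folding tables. The extra lookup in fold_char runs for every
character, which is the per-character cost Philip mentions for run-time
detection.

```python
# Hypothetical sketch; none of these names exist in PCRE.

# Turkish exceptions to language-neutral folding (assumed minimal set).
TURKISH_OVERRIDES = {
    "I": "\u0131",   # LATIN CAPITAL LETTER I -> LATIN SMALL LETTER DOTLESS I
    "\u0130": "i",   # LATIN CAPITAL LETTER I WITH DOT ABOVE -> LATIN SMALL LETTER I
}


def fold_char(ch: str, turkish: bool = False) -> str:
    """Fold one character; str.lower() stands in for the default tables."""
    # Run-time special-case check, evaluated for every character.
    if turkish and ch in TURKISH_OVERRIDES:
        return TURKISH_OVERRIDES[ch]
    return ch.lower()


def caseless_equal(a: str, b: str, turkish: bool = False) -> bool:
    """Compare two strings character by character after folding."""
    if len(a) != len(b):
        return False
    return all(
        fold_char(x, turkish) == fold_char(y, turkish)
        for x, y in zip(a, b)
    )
```

With the default rules, caseless_equal("I", "i") holds; with turkish=True it
does not, while "I" then matches the dotless "\u0131" instead. A single flag
per call also shows the mixed-language problem: one comparison cannot apply
English rules to part of a string and Turkish rules to the rest.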



-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
