[pcre-dev] [Bug 1208] New: Case folding in PCRE

Giuseppe D'Angelo Tue, 07 Feb 2012 14:07:15 -0800

------- You are receiving this mail because: -------
You are on the CC list for the bug.


http://bugs.exim.org/show_bug.cgi?id=1208
           Summary: Case folding in PCRE
           Product: PCRE
           Version: 8.30
          Platform: Other
        OS/Version: Linux
            Status: NEW
          Severity: wishlist
          Priority: low
         Component: Code
        AssignedTo: [email protected]
        ReportedBy: [email protected]
                CC: [email protected]


Hi,

I was wondering about the current (or planned) status of case folding in PCRE
for case-insensitive matching with Unicode.

For instance, "ß" (U+00DF LATIN SMALL LETTER SHARP S) should match "ss" (or
even "SS" when matching case-insensitively); "µ" (U+00B5 MICRO SIGN) should
match "μ" (U+03BC GREEK SMALL LETTER MU) or "Μ" (U+039C GREEK CAPITAL LETTER
MU). The CaseFolding.txt file from Unicode says:

# If all characters are mapped according to the full mapping below, then
# case differences (according to UnicodeData.txt and SpecialCasing.txt)
# are eliminated.

The relevant entries for the examples above are:

0053; C; 0073; # LATIN CAPITAL LETTER S
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
00B5; C; 03BC; # MICRO SIGN
039C; C; 03BC; # GREEK CAPITAL LETTER MU
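
To make the expected behaviour concrete, here is a minimal sketch in Python (not
PCRE code) that applies full case folding using only the four entries quoted
above. The helper names load_full_folding, fold and caseless_equal are made up
for this illustration, and a real implementation would load the whole
CaseFolding.txt file rather than four lines.

```python
# Full case folding restricted to the four CaseFolding.txt entries quoted above.
# Line format: <code>; <status>; <mapping>; # <name>
ENTRIES = """\
0053; C; 0073; # LATIN CAPITAL LETTER S
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
00B5; C; 03BC; # MICRO SIGN
039C; C; 03BC; # GREEK CAPITAL LETTER MU
"""


def load_full_folding(text):
    """Build a char -> folded-string table from the C (common) and F (full) entries."""
    table = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        code, status, mapping = [field.strip() for field in data.split(";")[:3]]
        if status in ("C", "F"):
            folded = "".join(chr(int(cp, 16)) for cp in mapping.split())
            table[chr(int(code, 16))] = folded
    return table


FOLD = load_full_folding(ENTRIES)


def fold(s):
    """Fold each character; characters without an entry map to themselves."""
    return "".join(FOLD.get(ch, ch) for ch in s)


def caseless_equal(a, b):
    """Two strings match caselessly when their folded forms are identical."""
    return fold(a) == fold(b)


print(caseless_equal("\u00df", "SS"))      # sharp s against "SS"
print(caseless_equal("\u00b5", "\u039c"))  # micro sign against capital mu
```

Note that the "F" entry maps one character to two, so a matcher that only
compares character by character cannot express the sharp s case.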

From what I can see right now, PCRE doesn't seem to do this. For starters, am
I wrong? If not, what is the overall status of such a feature? For instance, how
are the four different Turkish "i" letters (I, i, İ, ı) handled?
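
For reference, this is a small sketch of what default (non-Turkic) full case
folding does with those four letters, using Python's built-in str.casefold()
as a stand-in. It says nothing about what PCRE does. The helper name
folded_code_points is made up for this illustration, and the Turkic-specific
"T" mappings in CaseFolding.txt are not applied by str.casefold().

```python
# The four Turkish "i" letters: I, i, dotted capital I, dotless small i.
TURKISH_I = ["\u0049", "\u0069", "\u0130", "\u0131"]


def folded_code_points(ch):
    """Return the default full case folding of ch as a list of U+XXXX strings."""
    return ["U+%04X" % ord(c) for c in ch.casefold()]


for ch in TURKISH_I:
    print("U+%04X -> %s" % (ord(ch), " ".join(folded_code_points(ch))))
```

Under the default folding, U+0049 and U+0069 fold together, U+0130 folds to
the two code points U+0069 U+0307, and U+0131 folds only to itself.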

Thanks,
Giuseppe D'Angelo


-- 
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email
-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
