------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=933
           Summary: Multibyte symbols in bracket expressions are treated as
                    separate 1-byte symbols
           Product: PCRE
           Version: N/A
          Platform: Other
        OS/Version: Linux
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
        AssignedTo: [email protected]
        ReportedBy: [email protected]
                CC: [email protected]


In UTF-8 locales, bracket expressions containing non-ASCII characters are
matched as if each byte of those characters were a separate single-byte
character.

For example, '[бв]', which is encoded as \xd0\xb1\xd0\xb2, is treated as
matching any one of the bytes \xd0, \xb1 or \xb2, rather than either of the
two-byte sequences \xd0\xb1 or \xd0\xb2.
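
The difference between the two interpretations can be illustrated outside of
PCRE. The following is only a sketch using Python's standard re module, not
pcregrep itself: a bytes pattern stands in for the byte-wise matching
described above, and a str pattern stands in for the expected per-character
matching. The sample string "абвг" is an assumption made for the
illustration and is not taken from the attached file.

```python
import re

# Hypothetical sample input (not the attached random-symbols.txt).
text = "абвг"
data = text.encode("utf-8")  # b'\xd0\xb0\xd0\xb1\xd0\xb2\xd0\xb3'

# Byte-wise interpretation: the class is the set of bytes {0xd0, 0xb1, 0xb2},
# so every lead byte 0xd0 matches, even inside 'а' and 'г'.
byte_class = re.compile("[бв]".encode("utf-8"))
byte_matches = byte_class.findall(data)

# Character-wise interpretation: the class is the set of characters {б, в}.
char_matches = re.findall("[бв]", text)

print(byte_matches)  # six single-byte matches
print(char_matches)  # two character matches
```

In the byte-wise case the lead byte \xd0 of every Cyrillic letter in the
input matches, which is the kind of output described in this report.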

Try running “pcregrep -o '[бв]' random-symbols.txt” on the attached file.

Observed on libpcre versions 7.9 and 8.00, Gentoo Linux on AMD64.


-- 
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email
-- 
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev 
