------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=865
           Summary: \b Does not work for non ascii characters in UTF-8
           Product: PCRE
           Version: N/A
          Platform: x86
        OS/Version: Linux
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
        AssignedTo: [email protected]
        ReportedBy: [email protected]
                CC: [email protected]


\b does not work in UTF-8 mode with characters that are encoded in more than
one byte: it does not find 'words' that begin with a non-ASCII letter.

For example, I use the following expression to find the words in a text
(pasted from C source, hence the doubled backslashes):

"\\b([\\p{N}\\p{L}]+)[^\\p{N}\\p{L}]*"

It does not find the German words meaning 'change' and 'about' in:

Änderung oder änderung. Über eine Menge Worte muß man schreiben.
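
The behaviour would be explained if \b were evaluated with an ASCII-only
notion of a word character while \p{L} uses Unicode properties. The sketch
below only models that assumption; it is not PCRE's code, and the names
utf8_decode, is_ascii_word and ascii_word_boundary are made up for this
illustration. It decodes the characters on either side of a byte offset and
reports a boundary only when exactly one of them is in [A-Za-z0-9_].

```c
#include <assert.h>
#include <stddef.h>

/* Decode the UTF-8 sequence starting at s[i] into *cp and return its length
   in bytes. Assumes well-formed input; no validation is done. */
static size_t utf8_decode(const unsigned char *s, size_t i, unsigned *cp)
{
    unsigned char c = s[i];
    if (c < 0x80) { *cp = c; return 1; }
    if ((c & 0xE0) == 0xC0) {
        *cp = ((c & 0x1Fu) << 6) | (s[i + 1] & 0x3Fu);
        return 2;
    }
    if ((c & 0xF0) == 0xE0) {
        *cp = ((c & 0x0Fu) << 12) | ((s[i + 1] & 0x3Fu) << 6)
            | (s[i + 2] & 0x3Fu);
        return 3;
    }
    *cp = ((c & 0x07u) << 18) | ((s[i + 1] & 0x3Fu) << 12)
        | ((s[i + 2] & 0x3Fu) << 6) | (s[i + 3] & 0x3Fu);
    return 4;
}

/* Assumed ASCII-only definition of a word character: [A-Za-z0-9_]. */
static int is_ascii_word(unsigned cp)
{
    return (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z') ||
           (cp >= '0' && cp <= '9') || cp == '_';
}

/* Return 1 if a \b-style boundary exists at byte offset pos, which must be
   the start of a character or equal to len; return 0 otherwise. */
int ascii_word_boundary(const char *text, size_t len, size_t pos)
{
    const unsigned char *s = (const unsigned char *)text;
    unsigned cp;
    int prev_is_word = 0, next_is_word = 0;

    assert(text != NULL && pos <= len);

    if (pos > 0) {
        size_t p = pos - 1;
        while (p > 0 && (s[p] & 0xC0) == 0x80)
            p--;                    /* step back over continuation bytes */
        utf8_decode(s, p, &cp);
        prev_is_word = is_ascii_word(cp);
    }
    if (pos < len) {
        utf8_decode(s, pos, &cp);
        next_is_word = is_ascii_word(cp);
    }
    return prev_is_word != next_is_word;
}
```

Under this model, calling ascii_word_boundary("\xC3\x84nderung oder", 14, 0)
reports no boundary before the 'Ä', while offset 2 (just before the 'n')
does count as one, so a match could only start inside the word. If that is
what happens, a lookaround built from the same properties, such as
(?<![\p{N}\p{L}])(?=[\p{N}\p{L}]), might serve in place of \b; I have not
verified this.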


-- 
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email
-- 
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev 
