http://bugs.exim.org/show_bug.cgi?id=1074

           Summary: Incorrect length check in match_ref(...)
           Product: PCRE
           Version: 8.11
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
        AssignedTo: [email protected]
        ReportedBy: [email protected]
                CC: [email protected]


I had a discussion with Philip, and it turned out that some Unicode
uppercase-lowercase pairs have different lengths in UTF-8, such as the
following pair: 570 - 11365 (I have no idea about the glyph, but it doesn't
matter).

Their UTF-8 representations (as C hex escapes):

  \xc8\xba = 570
  \xe2\xb1\xa5 = 11365

The following regular expression incorrectly reports a match:

  const char* pattern = "(\xc8\xba\xc8\xba\xc8\xba)?\\1"

on the string:

  const char* input = "\xc8\xba\xc8\xba\xc8\xba\xe2\xb1\xa5\xe2\xb1\xa5"

The input is char 570 repeated 3 times, followed by char 11365 repeated
twice. The pattern also contains char 570 repeated 3 times.

Output: 0, 12, 0, 6

match_ref does an early byte-length check, which is invalid in this case
because three copies of char 570 occupy the same number of bytes (6) as two
copies of char 11365, even though the character counts differ (3 versus 2).
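
To make the length coincidence concrete, here is a minimal sketch in C. It is not PCRE code: `utf8_char_count`, `captured` and `remaining` are hypothetical names introduced only for illustration, and the counting helper assumes valid UTF-8 input. It shows that the captured text and the rest of the subject have the same byte length but different character counts, so a check on byte length alone cannot rule out the match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of PCRE): count UTF-8 characters by
   counting every byte that is not a continuation byte (10xxxxxx).
   Assumes the input is valid, NUL-terminated UTF-8. */
static size_t utf8_char_count(const char *s)
{
    size_t count = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* The two halves of the subject string from the report. */
static const char captured[]  = "\xc8\xba\xc8\xba\xc8\xba";  /* char 570 (U+023A), 3 times   */
static const char remaining[] = "\xe2\xb1\xa5\xe2\xb1\xa5";  /* char 11365 (U+2C65), 2 times */
```

With these definitions, `strlen(captured)` and `strlen(remaining)` are both 6, while `utf8_char_count` returns 3 for the first string and 2 for the second. A back-reference comparison therefore has to walk the two strings character by character rather than rely on byte lengths being equal.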
