http://bugs.exim.org/show_bug.cgi?id=976

           Summary: pcre_exec after pcre_study work incorrectly.
           Product: PCRE
           Version: N/A
          Platform: Other
        OS/Version: Windows
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
        AssignedTo: [email protected]
        ReportedBy: [email protected]
                CC: [email protected]

For 17 letters of the Cyrillic block encoded in UTF-8, pcre_exec returns
different results with and without pcre_study data when the options
PCRE_UTF8 and PCRE_CASELESS are used.

Simple code example to demonstrate the problem:

#include <pcre.h>
#include <iostream>

// Match one two-byte UTF-8 character against a one-character pattern,
// with or without study data, and return the pcre_exec result.
int pcre_test(const char* pattern, const char* str, bool study)
{
    const char* err = NULL;
    int num = 0;
    const int options = PCRE_UTF8 | PCRE_CASELESS;
    pcre* regexp = pcre_compile(pattern, options, &err, &num, NULL);
    if (regexp == NULL)
        return PCRE_ERROR_NULL;  // compilation failed; err holds the message
    pcre_extra* extra = study ? pcre_study(regexp, 0, &err) : NULL;
    int vec[256] = {0};
    // Subject length is 2: every test string is a single two-byte character.
    const int res = pcre_exec(regexp, extra, str, 2, 0, 0, vec,
                              sizeof(vec)/sizeof(int));
    pcre_free(regexp);
    pcre_free(extra);  // PCRE 8.20 and later provide pcre_free_study() for this
    return res;
}

int main()
{
    // Uppercase letters, UTF-8 encoded.
    const char* upc8[] = {"\xd0\x81", "\xd0\xa0", "\xd0\xa1", "\xd0\xa2",
                          "\xd0\xa3", "\xd0\xa4", "\xd0\xa5", "\xd0\xa6",
                          "\xd0\xa7", "\xd0\xa8", "\xd0\xa9", "\xd0\xac",
                          "\xd0\xab", "\xd0\xaa", "\xd0\xad", "\xd0\xae",
                          "\xd0\xaf" };
    // The corresponding lowercase letters, in the same order.
    const char* lowc8[] = {"\xd1\x91", "\xd1\x80", "\xd1\x81", "\xd1\x82",
                           "\xd1\x83", "\xd1\x84", "\xd1\x85", "\xd1\x86",
                           "\xd1\x87", "\xd1\x88", "\xd1\x89", "\xd1\x8c",
                           "\xd1\x8b", "\xd1\x8a", "\xd1\x8d", "\xd1\x8e",
                           "\xd1\x8f" };
    for (size_t ii = 0; ii < sizeof(upc8)/sizeof(upc8[0]); ++ii)
    {
        // The lowercase letter is the pattern, the uppercase letter the subject.
        if (pcre_test(lowc8[ii], upc8[ii], true) !=
            pcre_test(lowc8[ii], upc8[ii], false))
        {
            std::cout << "!?\n";
        }
    }
    return 0;
}

"!?" is printed 17 times, once for each pair of letters.
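One property of the 17 test pairs may help narrow this down. In every pair,
the uppercase and lowercase letters start with different UTF-8 bytes (0xD0
versus 0xD1), whereas for the Cyrillic letters U+0410 to U+041F both cases
start with 0xD0. Whether this is what makes the studied match fail is an
assumption, not something the report confirms. The sketch below does not use
PCRE at all; decode2, lead_bytes_differ and count_differing are hypothetical
helper names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decode a two-byte UTF-8 sequence (110xxxxx 10xxxxxx) into a code point.
inline std::uint32_t decode2(const char* s)
{
    const unsigned char b0 = static_cast<unsigned char>(s[0]);
    const unsigned char b1 = static_cast<unsigned char>(s[1]);
    assert((b0 & 0xE0) == 0xC0 && (b1 & 0xC0) == 0x80);  // two-byte form only
    return (static_cast<std::uint32_t>(b0 & 0x1F) << 6) | (b1 & 0x3Fu);
}

// True if the two encoded characters start with different bytes.
inline bool lead_bytes_differ(const char* a, const char* b)
{
    return static_cast<unsigned char>(a[0]) != static_cast<unsigned char>(b[0]);
}

// Count the pairs in two parallel arrays whose lead bytes differ.
inline std::size_t count_differing(const char* const* upper,
                                   const char* const* lower,
                                   std::size_t n)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i)
    {
        if (lead_bytes_differ(upper[i], lower[i]))
            ++count;
    }
    return count;
}
```

Passing the upc8 and lowc8 arrays from the report to count_differing shows how
many of the pairs have this property; a pair such as "\xd0\x9f" / "\xd0\xbf"
(U+041F / U+043F), which is not in the report's list, shares its lead byte.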
