http://bugs.exim.org/show_bug.cgi?id=1537

           Summary: pcre_exec does not fill offsets for certain regexps
           Product: PCRE
           Version: 8.36
          Platform: x86-64
        OS/Version: All
            Status: NEW
          Severity: bug
          Priority: high
         Component: Code
        AssignedTo: [email protected]
        ReportedBy: [email protected]
                CC: [email protected]


In the PCRE documentation, in the chapter "How pcre_exec() returns captured
substrings", the returned offsets are described as follows:

    The first pair of integers, ovector[0] and ovector[1], identify the
    portion of the subject string matched by the entire pattern. The next
    pair is used for the first capturing subpattern, and so on. The value
    returned by pcre_exec() is one more than the highest numbered pair that
    has been set. For example, if two substrings have been captured, the
    returned value is 3.
    /.../
    It is possible for capturing subpattern number n+1 to match some part of
    the subject when subpattern n has not been used at all. For example, if
    the string "abc" is matched against the pattern (a|(z))(bc) the return
    from the function is 4, and subpatterns 1 and 3 are matched, but 2 is
    not. When this happens, both values in the offset pairs corresponding to
    unused subpatterns are set to -1.

Now consider the following pattern:

/(?:((abcd))|(((?:(?:(?:(?:abc|(?:abcdef))))b)abcdefghi)abc)|((*ACCEPT)))/

Running pcretest with this pattern and the data "1234abcd", with a breakpoint
on pcre_exec, we have this just before the call:

#1  0x0000000100007ea6 in main (argc=1606415904, argv=0x7fff5fbff620) at pcretest.c:5207
5207        PCRE_EXEC(count, re, extra, bptr, len, start_offset,
(gdb) l 5207
5202          }
5203    #endif
5204
5205        else
5206          {
5207          PCRE_EXEC(count, re, extra, bptr, len, start_offset,
5208            options | g_notempty, use_offsets, use_size_offsets);
5209          if (count == 0)
5210            {
5211            fprintf(outfile, "Matched, but too many substrings\n");

So we are about to call pcre_exec, and the offsets are in use_offsets.
Since pcre_exec does not require any initialization of the offsets array, I've
filled the array with junk data:

(gdb) x/12wx use_offsets
0x100100a00:    0xbeef5555    0xbeef5555    0xbeef5555    0xbeef5555
0x100100a10:    0xbeef5555    0xbeef5555    0xbeef5555    0xbeef5555
0x100100a20:    0xbeef5555    0xbeef5555    0xbeef5555    0xbeef5555

Now, after returning from the pcre_exec call, we get:

0x000000010000a32a in main (argc=1, argv=0x7fff5fbff648) at pcretest.c:5207
5207        PCRE_EXEC(count, re, extra, bptr, len, start_offset,
Value returned is $17 = 6
(gdb) p count
$19 = 6

So pcre_exec returned 6, which means we should expect 5 subpattern offset
pairs plus the one pair for the whole match. However, if we look at the
offsets data, we get this:

(gdb) x/12wx use_offsets
0x100100a00:    0x00000000    0x00000000    0xbeef5555    0xbeef5555
0x100100a10:    0xbeef5555    0xbeef5555    0xbeef5555    0xbeef5555
0x100100a20:    0xbeef5555    0xbeef5555    0x00000000    0x00000000

So only the first pair and the last pair of offsets were written; the four
pairs in between still contain the junk data. Not only is this unexpected, it
is dangerous if the client trusts pcre_exec and passes the offset table as is
to pcre_get_substring_list(): that function does not check the offsets and
calculates substring sizes directly from them, so junk values there can make
it read or copy far outside the subject string. In fact, pcretest itself
proceeds with:

 0: 
ERROR: bad negative value -1091611307 for offset 2
ERROR: bad negative value -1091611307 for offset 3
 1: <unset>
ERROR: bad negative value -1091611307 for offset 4
ERROR: bad negative value -1091611307 for offset 5
 2: <unset>
ERROR: bad negative value -1091611307 for offset 6
ERROR: bad negative value -1091611307 for offset 7
 3: <unset>
ERROR: bad negative value -1091611307 for offset 8
ERROR: bad negative value -1091611307 for offset 9
 4: <unset>
 5: 

This is not what one expects from simply matching a pattern.
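
To make the danger concrete, here is a minimal sketch of the kind of check a
cautious caller could run on the offsets before handing them to
pcre_get_substring_list(). The names pair_state, classify_pair and
count_garbage_pairs are hypothetical helpers of my own, not part of PCRE, and
the validity rule (0 <= start <= end <= subject length) is an assumption about
what a usable pair looks like.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper types and functions; none of this is part of PCRE. */
enum pair_state { PAIR_SET, PAIR_UNSET, PAIR_GARBAGE };

/* Classify capture pair number `pair` of an ovector filled by pcre_exec().
 * Assumption: a usable pair satisfies 0 <= start <= end <= subject_len,
 * and an unset group is reported as (-1, -1), as the documentation says. */
enum pair_state classify_pair(const int *ovector, int pair, int subject_len)
{
    assert(ovector != NULL && pair >= 0 && subject_len >= 0);

    int start = ovector[2 * pair];
    int end   = ovector[2 * pair + 1];

    if (start == -1 && end == -1)
        return PAIR_UNSET;
    if (start < 0 || end < start || end > subject_len)
        return PAIR_GARBAGE;
    return PAIR_SET;
}

/* Count the pairs among the first `rc` (the pcre_exec() return value)
 * that are neither properly set nor marked unset. */
int count_garbage_pairs(const int *ovector, int rc, int subject_len)
{
    int garbage = 0;
    for (int i = 0; i < rc; i++)
        if (classify_pair(ovector, i, subject_len) == PAIR_GARBAGE)
            garbage++;
    return garbage;
}
```

Fed the twelve values from the dump above, with a return value of 6 and a
subject length of 8, this would flag pairs 1 to 4 as garbage and accept only
pairs 0 and 5.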
I would expect pcre_exec to initialize all of the offsets up to the returned
count with correct values, or at least with -1 as described in the
documentation quoted above. If that is not the intended behaviour, then the
documentation should state clearly that the offset array must be zeroed out
prior to calling pcre_exec.
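
Until this is resolved, a caller-side stop-gap might look like the sketch
below. prefill_ovector is a hypothetical helper, not a PCRE function. It
assumes that pcre_exec leaves the skipped pairs untouched, as the dump above
suggests, so that they read back as (-1, -1). I use -1 rather than 0 here
because a (0, 0) pair would look like an empty match at offset 0 instead of an
unset group.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE: mark every slot as "unset" so
 * that pairs pcre_exec() never writes read back as (-1, -1).
 *
 * Intended use (sketch):
 *     int ovector[30];
 *     prefill_ovector(ovector, 30);
 *     rc = pcre_exec(re, extra, subject, length, 0, 0, ovector, 30);
 */
void prefill_ovector(int *ovector, int ovecsize)
{
    assert(ovector != NULL && ovecsize >= 0);
    for (int i = 0; i < ovecsize; i++)
        ovector[i] = -1;
}
```

This only hides the symptom on the caller side; it does not change what
pcre_exec itself writes.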
