http://bugs.exim.org/show_bug.cgi?id=1295

Summary: add 32-bit library
Product: PCRE
Version: N/A
Platform: Other
OS/Version: Linux
Status: NEW
Severity: bug
Priority: medium
Component: Code
AssignedTo: [email protected]
ReportedBy: [email protected]
CC: [email protected]

PCRE should add support for 32-bit character strings. As with the 8- and 16-bit libraries, there will be two modes of operation:

(A) UTF-32 strings
(B) arbitrary 32-bit character strings

Like the 16-bit support, this is done by adding a new libpcre32 library alongside the existing 8- and 16-bit libraries. The API is exactly the same as that of the existing 16-bit library, except that all instances of "pcre16" are replaced with "pcre32" and the character data type is an unsigned 32-bit integer type.

The existing PCRE_INFO_FIRSTCHAR and PCRE_INFO_LASTLITERAL return the first/req char in an integer, but also return negative numbers in some special cases (i.e. to indicate that no such char was set). An int cannot carry both a full unsigned 32-bit character and those negative values, so it was necessary to add new PCRE_INFO_FIRSTLITERAL and PCRE_INFO_LASTLITERAL2 items that return those characters in an unsigned 32-bit integer, plus PCRE_INFO_FIRSTLITERALSET and PCRE_INFO_LASTLITERAL2SET items that return the information previously communicated via the negative values.

All tests pass. (Some tests have different output in 16-bit and 32-bit mode, so I've split the output into -16 and -32 files for those tests, and also added new tests 23..26 for things that are specific to just 16-bit or 32-bit, and to non-UTF vs. UTF.)

Since UTF-32 only occupies 21 bits of each 32-bit character, it's useful for applications to use the upper bits to store extra information (flags, etc.). Since it's more efficient to pass the unmodified strings to pcre32, I aim to make pcre32 mask out those upper bits. This is implemented in the code but hasn't been debugged yet, and it does not work yet.
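The two 32-bit-specific points above (reporting a character and its "set" state separately instead of using a negative sentinel, and masking the upper bits in UTF-32 mode) can be sketched roughly as follows. This is only an illustration, not code from the pcre32 branch: every name here (UTF32_CODEPOINT_MASK, utf32_strip_flags, first_literal_info, unit_matches_first_literal) is hypothetical, and masking only in UTF mode is my assumption, based on all 32 bits being significant in mode (B).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this is part of the PCRE API. */

/* UTF-32 code points go up to U+10FFFF, so they fit in the low 21 bits. */
#define UTF32_CODEPOINT_MASK UINT32_C(0x001FFFFF)

/* Drop any application-private flag bits stored above bit 20. */
static uint32_t utf32_strip_flags(uint32_t unit)
{
    return unit & UTF32_CODEPOINT_MASK;
}

/*
 * A single signed int cannot hold every 32-bit character and still keep
 * negative values free for "not set". Reporting the character and the
 * "set" state as two separate fields avoids that collision.
 */
typedef struct {
    uint32_t value;  /* the literal character, meaningful only if is_set */
    int      is_set; /* nonzero when a literal was recorded */
} first_literal_info;

/*
 * Return nonzero if `unit` is compatible with the recorded first literal.
 * Assumption: flag bits are masked only in UTF mode, because in arbitrary
 * 32-bit mode every bit of the unit is part of the character.
 */
static int unit_matches_first_literal(const first_literal_info *info,
                                      uint32_t unit, int utf_mode)
{
    assert(info != NULL);
    if (!info->is_set)
        return 1; /* no literal recorded, so nothing to rule out */
    if (utf_mode)
        unit = utf32_strip_flags(unit);
    return unit == info->value;
}
```

With this split, 0xFFFFFFFF remains an ordinary character in mode (B), while the "not set" case is carried by is_set alone.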
To allow arbitrary 32-bit character strings for goal (B), I had to make some extra changes to the code in places where characters were previously passed around in an int, with negative values reserved for special purposes. Matching itself hasn't been tested yet beyond the existing tests, so I suspect there will be more work to do, possibly a full audit of the code for signed/unsigned conversions and for assignments from one integer type to another (truncation). Compiler warning flags can help us there.

The JIT compiler also works in pcre32. I only had to comment out the use of the fast_forward_first_two_chars() function, since I couldn't figure out how to port it to 32-bit. Help is appreciated there (and for everything else too :-).

The docs have already been updated to include the 32-bit library (except for the new pcre32_fullinfo values), but the HTML docs haven't been updated yet (is there some automation for that?).

To check out the code, get the "pcre32" branch from my gitorious repository at https://gitorious.org/~chpe/pcre/chpe-pcre . (It will be rebased frequently to pick up updates from svn.)

(BTW, I've also set up a manually updated git-svn clone of the PCRE svn repository at https://gitorious.org/pcre/pcre .)
