http://bugs.exim.org/show_bug.cgi?id=897

--- Comment #1 from Philip Hazel <[email protected]> 2009-10-19 19:39:34 ---
On Mon, 19 Oct 2009, Pavel Kostromitinov wrote:

> However, having to deal with international characters almost constantly, I
> would really appreciate something like a compile-time option (for compiling
> pcre) to force it into using Unicode properties always.
> I cannot just replace all the "\b" with complex constructions based on \p{},
> since I don't write patterns myself - end-users do it. And parsing their
> patterns just to make correct replacement doesn't look appealing to me either.
>
> At least, I would greatly appreciate a hint on where should I look in pcre
> sources to try and change this behaviour myself.

Look at all the places in pcre_exec.c where one of the following opcodes
is mentioned:

  OP_WORD_BOUNDARY, OP_NOT_WORD_BOUNDARY, OP_DIGIT, OP_NOT_DIGIT,
  OP_WHITESPACE, OP_NOT_WHITESPACE, OP_WORDCHAR, OP_NOT_WORDCHAR.

There are 44 places in the code where you would have to make changes.
They would be quite substantial changes because not only does the current
code use a look-up table, it knows that it just needs to test one byte
from the subject instead of looking for a general UTF-8 character.

I suppose a compile-time option would be better than a runtime option,
because that would save testing the option many times during a run.
However, in theory there would still have to be a test for UTF-8 mode at
run time. Some of the tests are inside loops - you don't want to test the
flag every time round the loop, so two copies of the loop will probably
be needed.

Hmmm.... Maybe the compile-time option should be "force UTF-8 mode always
and use Unicode properties always". Then a lot of testing for UTF-8 mode
could be cut out and the PCRE_UTF8 option would be redundant.

I have just (today) released PCRE 8.00 and I don't plan on working on
PCRE now for some time, except to fix any important bugs that show up.
I have, however, noted this item for thinking about sometime in the future.

Philip
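
For readers following the thread, here is a minimal sketch of the "two copies of the loop" point made above. It is not taken from pcre_exec.c. Every name in it (is_word_byte, is_word_codepoint, decode_utf8, match_word_run) is hypothetical, and the Unicode test is a crude stand-in that treats the whole Cyrillic block as word characters where a real build would consult Unicode property tables. The UTF-8 flag is tested once, outside the loops: the byte-mode loop looks at one byte per step, while the UTF-8 loop has to decode a whole character first.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-mode test: examines exactly one byte (ASCII word characters only).
   Stands in for a single look-up-table probe. */
static int is_word_byte(unsigned char b)
{
    return (b >= '0' && b <= '9') || (b >= 'A' && b <= 'Z') ||
           (b >= 'a' && b <= 'z') || b == '_';
}

/* ASSUMPTION: crude stand-in for a Unicode property test. Accepts ASCII
   word characters plus everything in U+0400..U+04FF (Cyrillic block). */
static int is_word_codepoint(uint32_t c)
{
    if (c < 0x80)
        return is_word_byte((unsigned char)c);
    return c >= 0x0400 && c <= 0x04FF;
}

/* Decode one UTF-8 character starting at s. The input is assumed to be
   valid and complete; no error checking is done. Stores the code point
   in *c and returns the number of bytes consumed. */
static size_t decode_utf8(const unsigned char *s, uint32_t *c)
{
    if (s[0] < 0x80) {
        *c = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {
        *c = ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {
        *c = ((uint32_t)(s[0] & 0x0F) << 12) |
             ((uint32_t)(s[1] & 0x3F) << 6) | (uint32_t)(s[2] & 0x3F);
        return 3;
    }
    *c = ((uint32_t)(s[0] & 0x07) << 18) | ((uint32_t)(s[1] & 0x3F) << 12) |
         ((uint32_t)(s[2] & 0x3F) << 6) | (uint32_t)(s[3] & 0x3F);
    return 4;
}

/* Return how many bytes at the start of s form a run of word characters.
   The utf8 flag is checked once; each mode gets its own copy of the loop. */
size_t match_word_run(const unsigned char *s, size_t len, int utf8)
{
    size_t i = 0;

    if (utf8) {
        /* UTF-8 copy: decode a full character, then test the code point. */
        while (i < len) {
            uint32_t c;
            size_t n = decode_utf8(s + i, &c);
            if (!is_word_codepoint(c))
                break;
            i += n;
        }
    } else {
        /* Byte copy: one byte, one test, no decoding. */
        while (i < len && is_word_byte(s[i]))
            i++;
    }
    return i;
}
```

With this sketch, a subject beginning with Cyrillic letters yields a non-empty run only when utf8 is non-zero; in byte mode the first byte already fails the test, which mirrors the behaviour the original report complains about.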
