------- You are receiving this mail because: ------- You are on the CC list for the bug.
http://bugs.exim.org/show_bug.cgi?id=897 --- Comment #13 from Philip Hazel <[email protected]> 2009-12-17 17:44:22 --- On Wed, 16 Dec 2009, Pavel Kostromitinov wrote: > Well, I added an attachment describing what I meant. It's a patch for > pcre_internal.h and pcre_exec.c, seemingly handling all what is needed > (in pcre_exec()) for UTF_USES_UCP to work. > > The only functions added are ucp_XXX at the end of pcre_exec, and they > are only called if they are needed. Surely, one could do it all as > #defines or just inline it, but this way it looks much cleaner in my > opinion. Adding the macros as you have done is indeed a nice tidy way of wrapping up all the different cases. I have not looked at them in detail, but they certainly seem to be reasonably at a quick glance, apart from one thing: #define CHAR_IS_WORDCHAR(c, utf8) (utf8 ? ucp_wordchar(c) : ((c < 256) && (md->ctypes[c] & ctype_word))) I really do not like using a function here. I am sure it will lose performance. The function is so short: int ucp_wordchar(int c) const ucd_record *prop = GET_UCD(c); return (_pcre_ucp_gentype[prop->chartype] == ucp_L) || (_pcre_ucp_gentype[prop->chartype] == ucp_N) || (c == '_'); I am sure there will be performance hit that can easily be avoided by making the macro generate inline code. The variable "prop" can be generally available in the function, and the macro can generate this for the UCD case: prop = GET_UCD(c), (_pcre_ucp_gentype[prop->chartype] == ucp_L) || (_pcre_ucp_gentype[prop->chartype] == ucp_N) || (c == '_') If a function is used, it would have to be called _pcre_ucp_wordchar() because it is an external (in the C sense) function. Also, it would have to be put into a separate source file because, assuming pcre_dfa_exec() is updated as well, some users may use only pcre_dfa_exec() and not pcre_exec(), and you do not want them to load unnecessary code. As a style point, I would never write (md->ctypes[c] & ctype_word) as a true/false value. I would always write ((md->ctypes[c] & ctype_word) != 0) partly to emphasize that it is a truth value, and partly to ensure that the value actually is 0 or 1 so that if it is part of a larger boolean expression, it will work as expected. Regards, Philip -- Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email -- ## List details at http://lists.exim.org/mailman/listinfo/pcre-dev
