On Thu, 30 Apr 2009, David Dennerline wrote:

> Would there be any problem with adding a #ifdef SUPPORT_UCP to prevent
> including the Unicode character table in pcre_ucd.c? I tried doing this
> because I don't need UTF-8 support and it decreases binary size by 50KB. I
> saw that GET_UCD() is called in pcre_dfa_exec() only if SUPPORT_UCP is
> defined.

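For concreteness, the kind of guard being proposed might look roughly like the sketch below. This is not the actual contents of pcre_ucd.c: the names demo_ucd_table and demo_ucd_table_size are hypothetical and the table entries are placeholders. Only the SUPPORT_UCP macro comes from the discussion.

```c
#include <stddef.h>

/* Hypothetical stand-in for the large Unicode property table.
   The name and the four placeholder bytes are assumptions made for
   illustration; they are not PCRE's real data. */
#ifdef SUPPORT_UCP
static const unsigned char demo_ucd_table[] = { 0x01, 0x02, 0x03, 0x04 };
#endif

/* Returns the number of table bytes compiled into this module.
   With SUPPORT_UCP undefined, the table is not compiled at all. */
size_t demo_ucd_table_size(void)
{
#ifdef SUPPORT_UCP
  return sizeof(demo_ucd_table);
#else
  return 0;
#endif
}
```

Compiling this with and without -DSUPPORT_UCP and comparing the object file sizes shows the effect on the module itself. Whether the final binary shrinks depends on whether the linker would have pulled the module in anyway.
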
I don't see any problem, but then again, I don't see the need. Surely if
there are no references to the module, it won't get included in the
binary? I thought that was how libraries worked?

> The program compiles and links correctly, but I wanted to double-check to
> see if there would be any potential instability.

You did not say which operating system you are using. Is it Windows? I
know nothing about Windows, never having used it.

I have just done an experiment on Linux, and when I compile with UCP
support disabled, adding #ifdef SUPPORT_UCP makes no difference at all to
the size of the binaries for pcretest and pcregrep (though it does reduce
the size of the pcre_ucd.o compiled module). The binaries are, however,
noticeably smaller than when UCP support is enabled (by more than 50K
because a lot of other code is cut out as well as the tables).

> Second, has there ever been any discussion or any plans on trying to
> implement a hybrid NFA/DFA engine that would improve performance for
> applications that do not require back-references (i.e., substitution) or
> other non-DFA friendly constructs. Something like Henry Spencer's Tcl
> regular expression parser.

There has been no discussion or planning that I am aware of. A while
before I retired (18 months ago) I did start thinking about the
possibility of turning the compiled regex into a proper state table for a
traditional finite state machine that would probably execute faster than
pcre_dfa_exec(). However, I did not get very far (it was very tricky, as I
recall) and I have not picked this up again since.

Philip

-- 
Philip Hazel

-- 
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev
