Thanks for taking the time to post a useful discussion.

On Tue, 8 Nov 2011, Graycode wrote:

> I think it would be great if PCRE could invent a pcre_app_config()
> function whereby the application could specify its default
> limitations and configuration options. It should include things like
> the memory allocation / free vectors, match_limit_recursion,
> match_limit, etc. These are all currently present in PCRE, either
> as static variables or as members of the extra structure. All I'm
> suggesting is that a pcre_app_config() could establish default
> handing, and that new function could spread those settings back
> out into static variables that are private in the library.

I am not very knowledgeable about threads, but it seems to me that this
would not work, at least not in a Unix/Linux world (which is where I
operate) because the static variables would be shared by all threads.
Unless I am missing something (and that may well be true!) there is no
concept of "static variables that are private to the library" in
Unix/Linux. (I also suspect that in most Unix/Linux systems PCRE is
installed as a shared library.)

I do already have an item on the Wish List that reads as follows:

. Write a wrapper to maintain a structure with specified runtime
  parameters, such as recurse limit, and pass these to PCRE each time
  it is called. Also maybe malloc and free.

In a threaded world, such a wrapper would have to keep the data in
thread-local storage, possibly passed as an argument. Not really sure
how this would work.

> I suggest not requiring that pcre_callout be set that way. I think
> it's not-the-same kind of configuration option because it's more
> likely to have a different value for different threads of an
> application. Consider adding a pcre_callout call-back function
> pointer as a member of the extra structure that the application
> can assign, next to the callout_data pointer that's already there.

Yes, I think that is something that I will do.
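
As a rough illustration of the Wish List wrapper idea mentioned earlier,
here is a minimal sketch in which the runtime parameters live in a
structure owned by the caller and passed as an argument, so no static
variables are involved. Everything in it is hypothetical: wrap_params,
wrap_extra, wrap_make_extra and the WRAP_* flags are invented names, and
wrap_extra is only a stand-in for the limit fields of PCRE's extra
structure so that the sketch builds without pcre.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-caller parameter set; nothing here is part of PCRE. */
typedef struct {
    unsigned long match_limit;           /* 0 = keep the library default */
    unsigned long match_limit_recursion; /* 0 = keep the library default */
} wrap_params;

/* Stand-in flag bits saying which limit fields have been set. */
#define WRAP_HAS_MATCH_LIMIT           0x1ul
#define WRAP_HAS_MATCH_LIMIT_RECURSION 0x2ul

/* Stand-in for the limit-carrying fields of PCRE's extra structure. */
typedef struct {
    unsigned long flags;
    unsigned long match_limit;
    unsigned long match_limit_recursion;
} wrap_extra;

/*
 * Build a fresh extra block from the caller's parameters. No static
 * state is read or written, so two threads holding different
 * wrap_params cannot interfere with each other.
 */
static wrap_extra wrap_make_extra(const wrap_params *p)
{
    wrap_extra e = {0, 0, 0};

    assert(p != NULL);
    if (p->match_limit != 0) {
        e.flags |= WRAP_HAS_MATCH_LIMIT;
        e.match_limit = p->match_limit;
    }
    if (p->match_limit_recursion != 0) {
        e.flags |= WRAP_HAS_MATCH_LIMIT_RECURSION;
        e.match_limit_recursion = p->match_limit_recursion;
    }
    return e;
}
```

A real wrapper would presumably fill in the library's own extra
structure in the same way and hand it to pcre_exec() on each call. Each
thread could keep its own wrap_params, either on its stack or in
thread-local storage.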

> Trying to carry the memory management vectors through the PCRE code
> by starting with a pcre_compile3() seems difficult and may be more
> trouble than it's worth.

It would not, in fact, be difficult. There is already a local structure
that is carried through the code (it contains "static" variables);
adding one or more fields to it is straightforward. The same is true of
pcre_exec. The difficulty is in how to get the new data into
pcre_compile and other functions. The only way I can see of doing this
compatibly is to invent pcre_compile3 (etc).

> Keep in mind that the thread that invokes pcre_compile2() may not be
> the same thread that will call pcre_exec() to use it.

Good point.

> In our case all the setups including compile() are done by one thread,
> and later the exec() using the compiled expressions are done by
> multiple other threads. Releasing the compiled expression is also
> done by the same thread that compiled them. That could matter a lot
> depending on whether threaded memory is like fork() or other.

Indeed, and I'd rather not tangle with those issues because they may
differ from OS to OS.

> By the way, we do make use of (and rely upon) the PCRE memory
> management vectors.

I suspected that somebody might; it's good to know that the facility is
used. That is more of an incentive to improve it if possible.

Philip

PS: I've just answered a post (Bugzilla 1049) about UTF-16. Taking that
along with this issue, it is almost making a case that the current API
has been pushed to its limits and that a totally new API should be
created. I am not really happy about this for all sorts of reasons (not
least because of the amount of work!) However, the current API has
lasted a long time ... I think the last incompatible change was in 1998
or thereabouts.

--
Philip Hazel

--
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
