Hello, This is my first mail to this list. I am doing some hacking on GLib (part of GNOME), specifically their GRegex module which is built on top of this (incredible) library.
I was digging into pcre_compile.c (here<http://vcs.pcre.org/viewvc/code/trunk/pcre_compile.c?revision=1233&view=markup>) to understand ownership of 'errorptr'. I know now that ownership is not transferred to the caller. I found this line in pcre_compile2 (): >> *errorptr = find_error_text(errorcode); Following the trail, I found this comment: /* The texts of compile-time error messages. These are "char *" because they are passed to the outside world. Do not ever re-use any error number, because they are documented. Always add a new error instead. Messages marked DEAD below are no longer used. This used to be a table of strings, but in order to reduce the number of relocations needed when a shared library is loaded dynamically, it is now one long string. We cannot use a table of offsets, because the lengths of inserts such as XSTRING(MAX_NAME_SIZE) are not known. Instead, we simply count through to the one we want - this isn't a performance issue because these strings are used only when there is a compilation error. Each substring ends with \0 to insert a null character. This includes the final substring, so that the whole string ends with \0\0, which can be detected when counting through. */ I have no doubt that this design exists for very good reasons, as explained in the comment. (I had never considered relocation impact from a huge list of char pointers.) Regarding the note about "performance issue", I have an idea to improve. What if we create a (static) list of offsets that is only initialised at first compile error? Something like: >> static int error_text_offsets[78] = { -1 }; I am willing to write the code and submit a patch. However, since I am new to this project, I don't know enough about its style (example: cathedral vs. bazaar?). Also, I wanted to have my idea considered before writing code. If you think this idea is worthy, there are two areas of concern I can think of: Sizing error_text_offsets ================= I am unsure if error_text_offsets should be sized (1) static/precisely, (2) static/liberally, or (3) dynamically ((3a) precisely or (3b) liberally). (1) static/precisely: we need to keep error_text_offsets precisely sized to error_texts. Today, it looks to be 78. This uses the least amount of memory, but incurs higher risk/maintenance costs. I'm all about low-maintenance ("future proofing") where possible/reasonable. This would probably require a test to be written to ensure the precise size is correct. Imagine a scenario where error_texts has an addition, but error_text_offsets does not grow. I have not yet looked if PCRE includes testing as part of its release procedure. I assume yes. (2) static/liberally: something big, but not crazy, like 255. ex: static int error_text_offsets[255] If error_texts grows over time, we have space. Low maintenance, but takes a bit more memory. (3) dynamically: static int *error_text_offsets = NULL allocate on first use via malloc() or PRCE's built-in/override method Again: do we allocate precisely or liberally? (a) Precisely can be done via static constant or at runtime (counting the number of embedded strings). (b) Liberally: Allocate something big enough, like 255, to prevent double scan/two-pass over error_texts. First pass: Calculate size of offsets array. Second pass: Calculate and store offsets. Multithreading ========== Finally, what about multithreading (MT)? Imagine the scenario where two parallel threads call pcre_compile2 () independently. Both their compiles fail. The init routine for error_text_offsets needs to be MT-aware. Generally, (in C) I don't know how to properly handle that issue when initialising this list the first time. (I know how to do it in Java/C#, but there you get a huge framework library with every install. C is bit trickier due to cross platform issues -- pthreads, etc.) Plus, I know nothing about how PRCE handles/supports multithreading. Thanks, Arpe -- ## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
