On 1 January 2013 14:12, Philip Hazel <[email protected]> wrote:
> On Mon, 31 Dec 2012, Kevin Connor Arpe wrote:
>
>> Apologies, I should be clearer. By "first" I do not mean multiple errors
>> in the same pattern. I mean multiple, sequential calls to pcre_compile().
>> Imagine the scenario above where user is entering regex in a GUI. This
>> causes continuous recompile -- after each keystroke. Each recompile will
>> be different (probably), and may potentially fail. Another case: You have
>> a big file of regexes that you want to try to compile (test, etc).
>
> I don't see how that would work with a shared library. With a static
> library, yes, you could modify the data in the module. Note that there
> are no static variables in PCRE other than those that are data tables
> that are never changed.
>
> [A thought: perhaps I don't understand shared libraries. Does each user
> get their own static section? If so, what I wrote above is nonsense.]

Each user (that is, each process that loads the library) gets its own
writable data section; read-only data sections are instead shared
between the users. With the proposed approach, if I understood it
correctly, every user would have to build the table of offsets the
first time it encounters a pattern compilation error. Yet all those
tables would be identical and effectively read-only (you build it once,
then only ever read from it). That doesn't sound like a good idea.

>> So when I say "first", I literally mean the first time error_texts is ever
>> scanned (after PCRE lib is loaded into memory). At that point, we build
>> the indexer. For subsequent compiles that fail, we will have faster
>> error lookup.
>
> I really don't believe you would notice much difference. Especially in
> the example you gave of a human interacting. The time taken for a modern
> cpu to scan through no more than 75 messages is minuscule. Using
> pcretest interactively, for example, gives instant responses, even on
> this old desktop computer of mine.

Indeed, for 75 strings I don't think the difference would be
noticeable. If we can figure out how to build the offset table
statically [1], then it becomes a plain space/time tradeoff: O(1)
instead of O(n) time per lookup, at an additional memory cost of O(n),
roughly 4n bytes.

> Another way of speeding it up - though again I do not believe it is
> worth doing - would be to store the messages as a concatenated sequence
> of "BCPL strings", that is, with a byte containing the length of the
> string at the start. Then skipping over them is even faster, and there
> would be no time wasted doing indexing in the cases when only one regex
> is being compiled.

But again, this requires some sort of preprocessing to deal with the
XSTRING(...) expansions, whose values are fixed at configure time. And
if we accept doing that preprocessing, then I think we can just as well
build the array of offsets statically.
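
To make the tradeoff concrete, here is a minimal sketch of one way the
compiler itself could build such an offset table, using an X-macro and
offsetof(). This is only an illustration under stated assumptions: every
name in it (DEMO_MESSAGES, demo_lookup_scan, demo_lookup_indexed, and so
on) is hypothetical, the messages are samples rather than PCRE's real
error_texts table, and I have not tried it against the PCRE sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stringify a macro's value, in the spirit of the XSTRING() mentioned above. */
#define DEMO_STR(s) #s
#define DEMO_XSTRING(s) DEMO_STR(s)
#define DEMO_MAX_NAME 32 /* stand-in for a configure-time constant */

/* Hypothetical sample messages; NOT PCRE's real error_texts table. */
#define DEMO_MESSAGES(X) \
  X(0, "no error") \
  X(1, "\\ at end of pattern") \
  X(2, "numbers out of order in {} quantifier") \
  X(3, "name is too long (maximum " DEMO_XSTRING(DEMO_MAX_NAME) " characters)")

/* Baseline layout: NUL-separated texts that must be scanned linearly.
   The array's implicit final NUL acts as an empty end-of-table string. */
static const char demo_texts[] =
#define X(id, text) text "\0"
  DEMO_MESSAGES(X)
#undef X
  ;

/* Same texts as exactly-sized char-array members, so offsetof() can
   compute each start position at compile time. */
struct demo_pool {
#define X(id, text) char m##id[sizeof(text)];
  DEMO_MESSAGES(X)
#undef X
};

static const struct demo_pool demo_pool_data = {
#define X(id, text) text,
  DEMO_MESSAGES(X)
#undef X
};

/* Read-only offset table: 2 bytes per message here, no pointers stored. */
static const unsigned short demo_offsets[] = {
#define X(id, text) offsetof(struct demo_pool, m##id),
  DEMO_MESSAGES(X)
#undef X
};

#define DEMO_COUNT (sizeof demo_offsets / sizeof demo_offsets[0])

/* O(n): skip n messages one at a time. */
const char *demo_lookup_scan(size_t n)
{
  const char *s = demo_texts;
  for (; n > 0 && *s != '\0'; n--)
    s += strlen(s) + 1; /* step over one message and its NUL */
  return *s != '\0' ? s : "error text not found";
}

/* O(1): jump straight to the message through the static offset table. */
const char *demo_lookup_indexed(size_t n)
{
  if (n >= DEMO_COUNT)
    return "error text not found";
  return (const char *)&demo_pool_data + demo_offsets[n];
}

/* Sanity check: both lookups must return the same text for every index. */
void demo_check_agreement(void)
{
  for (size_t n = 0; n < DEMO_COUNT; n++)
    assert(strcmp(demo_lookup_scan(n), demo_lookup_indexed(n)) == 0);
}
```

In this sketch both the string pool and the offset table are const data
holding plain integers rather than pointers, so they should be able to
live in a read-only section that is shared instead of being rebuilt by
each user. The stringified DEMO_MAX_NAME is sized correctly because
sizeof is evaluated only after macro expansion and string-literal
concatenation. Whether this is worth the extra macro machinery for about
75 messages is a separate question.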

Cheers,
--
Giuseppe D'Angelo

[1] See also
* http://www.macieira.org/blog/2011/07/table-driven-methods-with-no-relocations/
* http://websvn.kde.org/trunk/KDE/kdesdk/scripts/generate_string_table.pl?view=markup

--
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
