Re: [pcre-dev] Matching file contents as one string using PCRE_DOTALL

pcunite Fri, 10 May 2013 07:10:51 -0700

> Philip
> Some comments about that regex, independent of DOTALL:

My personal thanks for taking the time to comment in this thread. Using your
library I've been able to achieve my goals. However, as you've noted ... it
could be better. I'll briefly tell you what I'm doing and this might explain
the approach I've taken.


PCRE is the backend to a frontend GUI that accepts DOS-style wildcards and
Boolean terms, so the GUI has to convert that simple syntax into PCRE syntax. I
never know what the user will type, or in what order, so I have built a parser
that does its best to convert it sanely. Here are some example conversions.
Grouping allowed me to make a smaller parser.

Typed into GUI ------------------> My parsed conversions

File names:
-------------
*.txt ---------------------------> (?=.*\.txt$)
txt* ----------------------------> (?=^txt.*)
*.txt  OR *.exe -----------------> (?=.*\.txt$)|(?=.*\.exe$)
*.txt NOT *.exe -----------------> ((?=.*\.txt$))(?!.*\.exe$)

File contents
-------------
hello ---------------------------> (?=.*hello.*)
hello  OR world -----------------> (?=.*hello.*)|(?=.*world.*)
hello NOT world -----------------> ((?=.*hello.*))(?!.*world.*)
hello AND world -----------------> (?=.*hello.*)(?=.*world.*)
hello AND world NOT sky ---------> ((?=.*hello.*)(?=.*world.*))(?!.*sky.*)
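
To make the file-contents conversions concrete, here is a minimal sketch that runs three of the patterns above against a multi-line string. It uses Python's re module as a stand-in for PCRE (an assumption; re.DOTALL plays the role of PCRE_DOTALL), and contents_match is a hypothetical helper name, not part of either library.

```python
import re

# Patterns copied verbatim from the "File contents" table above.
OR_PATTERN = r"(?=.*hello.*)|(?=.*world.*)"
NOT_PATTERN = r"((?=.*hello.*))(?!.*world.*)"
AND_NOT_PATTERN = r"((?=.*hello.*)(?=.*world.*))(?!.*sky.*)"

def contents_match(pattern: str, contents: str, dotall: bool = True) -> bool:
    """Hypothetical helper: test a whole file's contents as one string."""
    flags = re.DOTALL if dotall else 0
    # re.match only tries offset 0, so every lookahead scans from the start.
    return re.match(pattern, contents, flags) is not None

print(contents_match(AND_NOT_PATTERN, "hello\nworld"))       # both terms, no "sky"
print(contents_match(AND_NOT_PATTERN, "hello\nworld\nsky"))  # rejected by the NOT term
# Without DOTALL, '.' stops at the newline and "world" is never reached.
print(contents_match(AND_NOT_PATTERN, "hello\nworld", dotall=False))
```

The sketch anchors at offset 0 on purpose. With an unanchored search, the contents "world hello" can slip past the NOT pattern, because the engine retries at offset 1, where "world" is no longer ahead. Whether that matters with PCRE depends on how the match is invoked (for example with PCRE_ANCHORED or a leading ^), so it is worth checking.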

As you can see, I use ^ or $ when they are looking for file names - there are
no newlines to worry about. You may also notice that I'm building my strings
starting with (? and appending =.* or !.* as appropriate. Your suggestion to
use a lazy star (.*?) would not be hard for me to add.
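
The string building described above can be sketched roughly as follows. This is not the actual parser; lookahead and all_of are hypothetical names, the lazy flag only switches the leading .* to .*?, and escaping the term with re.escape assumes the user typed literal text (Python is used here for illustration, PCRE has its own quoting rules).

```python
import re

def lookahead(term: str, negate: bool = False, lazy: bool = False) -> str:
    """Hypothetical builder: start with "(?" and append "=.*" or "!.*"."""
    piece = "(?"
    piece += "!" if negate else "="
    piece += ".*?" if lazy else ".*"
    # Assumption: the typed term is literal text, so escape regex metacharacters.
    piece += re.escape(term)
    return piece + ".*)"

def all_of(*terms: str, lazy: bool = False) -> str:
    """AND is plain concatenation of positive lookaheads."""
    return "".join(lookahead(t, lazy=lazy) for t in terms)

# "hello AND world NOT sky" from the table above.
pattern = "(" + all_of("hello", "world") + ")" + lookahead("sky", negate=True)
print(pattern)  # ((?=.*hello.*)(?=.*world.*))(?!.*sky.*)
```

Because the lazy star is a one-token change inside the builder, switching between the two forms for testing stays cheap.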

I don't want any form of lookbehind occurring at all (to minimize stack use).
Also, if you're telling me I can drop the trailing .* when looking inside
files and it increases performance, then I'm all for that! However, it will be
a while before I can create a parser that is highly efficient. Because of the
grouping, I thought the trailing .* was what made things work in testing. I'll
double check.
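
One way to do that double check is to run both forms of a pattern over the same inputs and compare the verdicts. A trailing .* can always match the empty string, so dropping it should not change which inputs pass. The sketch below again assumes Python's re as a stand-in for PCRE; the sample strings and the verdicts helper are made up for illustration, and it says nothing about speed.

```python
import re

# Made-up sample "file contents" for the comparison.
SAMPLES = ["hello world", "world\nhello", "hello\nsky\nworld", "nothing here", ""]

def verdicts(pattern: str) -> list:
    """Hypothetical helper: anchored match result for every sample."""
    return [re.match(pattern, s, re.DOTALL) is not None for s in SAMPLES]

with_trailing = r"((?=.*hello.*)(?=.*world.*))(?!.*sky.*)"
without_trailing = r"((?=.*hello)(?=.*world))(?!.*sky)"

# Same grouping, only the trailing .* removed inside each lookahead.
print(verdicts(with_trailing) == verdicts(without_trailing))
```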


-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
