This patch is inspired by the work that I did for ufdbGuard and a few emails 
with Amos.

Attached is a patch for squid 3.1.12 to optimise lists of regular expressions.
The optimisations are:
* initial .* is stripped
* RE-1 RE-2 ... RE-n are joined into one large RE: (RE-1)|(RE-2)|...|(RE-n)
* -i ... -i options are optimised: the second one is ignored, same for +i
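
As an illustration only (this is not the code from the patch), the three optimisations can be sketched in standalone C++ roughly as below. The names stripLeadingWildcards, joinPatterns, groupTokens, Group and matches are made up for this sketch, and the BUFSIZ limit that the patch applies to the joined RE is left out for brevity.

```cpp
#include <regex.h>
#include <string>
#include <vector>

// Hypothetical helper: drop leading ".*" pairs, which are redundant for an
// unanchored search. A pattern made only of wildcards is returned unchanged.
static std::string stripLeadingWildcards(const std::string &re)
{
    std::string::size_type pos = 0;
    while (re.compare(pos, 2, ".*") == 0)
        pos += 2;
    return pos == re.size() ? re : re.substr(pos);
}

// Hypothetical helper: join RE-1 ... RE-n into (RE-1)|(RE-2)|...|(RE-n).
static std::string joinPatterns(const std::vector<std::string> &res)
{
    std::string joined;
    for (const std::string &re : res) {
        if (!joined.empty())
            joined += '|';
        joined += '(' + stripLeadingWildcards(re) + ')';
    }
    return joined;
}

// One joined RE plus the case sensitivity it must be compiled with.
struct Group {
    bool icase;
    std::string re;
};

// Hypothetical helper: split a token list at -i / +i and join each run of
// REs. A repeated -i (or +i) does not change the flags, so it does not
// start a new group.
static std::vector<Group> groupTokens(const std::vector<std::string> &tokens)
{
    std::vector<Group> groups;
    std::vector<std::string> pending;
    bool icase = false;
    auto flush = [&]() {
        if (!pending.empty())
            groups.push_back({icase, joinPatterns(pending)});
        pending.clear();
    };
    for (const std::string &t : tokens) {
        if (t == "-i" || t == "+i") {
            const bool wanted = (t == "-i");
            if (wanted != icase) {
                flush();        // compile what was collected under the old flags
                icase = wanted;
            }
        } else {
            pending.push_back(t);
        }
    }
    flush();
    return groups;
}

// Compile with the same POSIX flags as RegexData.cc and test one subject.
static bool matches(const std::string &re, const char *subject, bool icase)
{
    regex_t comp;
    const int flags = REG_EXTENDED | REG_NOSUB | (icase ? REG_ICASE : 0);
    if (regcomp(&comp, re.c_str(), flags) != 0)
        return false;
    const bool hit = regexec(&comp, subject, 0, nullptr, 0) == 0;
    regfree(&comp);
    return hit;
}
```

With this sketch, the tokens "-i axx bxx cxx -i dxx exx fxx" end up in a single case-insensitive group (axx)|(bxx)|(cxx)|(dxx)|(exx)|(fxx), so only one regexec() call is needed per URL instead of six.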

The only modified file is src/acl/RegexData.cc

Attached are the patch (RegexData.cc.patch) and the files for a unit test:
squidtest.conf
re.4lines       - used in squidtest.conf; contains REs
re.200lines     - used in squidtest.conf; contains REs
unittest_re_optim_wget - script with wget commands to trigger squid to evaluate REs

unittest_re_optim_wget contains instructions on how to set up and perform the
unit test.

I am not subscribed to the squid-dev mailing list.
Please reply to my email address also.

Marcus Kool
[email protected]

Amos Jeffries wrote:
On 01/06/11 09:18, Marcus Kool wrote:
Hi,

after some emails with Amos I agreed to make a patch for
squid to optimise lists of regular expressions. The
optimisations are:
* initial .* is stripped
* RE-1 RE-2 ... RE-n are joined into one large RE: (RE-1)|(RE-2)|...|(RE-n)
* -i ... -i options are optimised: the second one is ignored, same for +i

The only modified file is src/acl/RegexData.cc

My question for submitting the patch:
how do you want the patch? Is the output of the following command OK?
LC_ALL=C TZ=UTC0 diff -Naur src/acl/RegexData.cc src/acl/RegexData.cc.orig

That should be fine.


I used a test set: a squid.conf, two files with regular expressions
and a file with wget commands to test URLs.
Do you want/need these?

That would be helpful for unit-tests. So yes, thank you.


How should I post the patch?

As an attachment please, with a [PATCH] subject prefix and a description suitable for a commit message. Send it from an email address you are happy to have added permanently to the credits records.


I am not subscribed to the squid-dev mailing list. Please reply
to my email address also.

Thanks

Marcus Kool

Amos
abc.com
urlfilterdb.com/secret
xs4all.nl/verysecret
cnn.com/public
-i
abc.example.com/scripts/cgi-bin/40example.cgi
-i
foo\.example\.com/html/index\.php
-i
foo\.example\.com/html/asfsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
01john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/01example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/skdfhsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
02john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/02example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/234second\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
03john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/03example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/sdfsaassecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
04john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/04example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/345nsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
05john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/05example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/asfkdhsadsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
06john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/06example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/2345234nnsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
07john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/07example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/asd0second\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
08john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/08example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/sdgw1second\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
09john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/09example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/safn2nsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
10john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/10example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/345n2second\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
11john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/11example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/sdfn3nsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
12john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/12example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/sdfbdfsbdsf9second\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
13john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/13example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/nsdnfds92nsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
14john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/14example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/ndsnsdansdasecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
15john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/15example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/dsfn3n3nsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
16john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/16example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/nfsdnaosecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
17john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/17example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/oodfmsdjsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
18john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/18example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/dsansdnn3second\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
19john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/19example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/n31n1n2nsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
20john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/20example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/nsdadndnxnxnsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
21john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/21example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/fjfkdkdkdsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
22john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/22example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/ndndnddndndsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
23john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/23example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/mmckcmcmcsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
24john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/24example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/gdgdgdgdsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
25john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/25example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/krkrkrkrsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
26john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/26example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/utututututsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
27john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/27example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/kfkfkfkfksecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
28john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/28example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/kkkkksecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
29john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/29example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/qqqqnsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
30john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/30example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/kkkkskskssecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
31john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/31example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/33k3k3second\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
32john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/32example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/44k44k4k4second\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
33john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/33example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/00d0d0d0second\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
34john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/34example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/2k2k2k2second\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
35john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/35example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/aaananasecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
36john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/36example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/kwkwkwkwkwsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
37john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/37example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/qkqkqkqkqjsdsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
38john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/38example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/oododofofnsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
39john.*doe.example.com/.*/index.php
abc.example.com/scripts/cgi-bin/39example.cgi
foo\.example\.com/html/index\.php
foo\.example\.com/html/oeoeoeoekkfnsecond\.php
foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php
40john.*doe.example.com/.*/index.php
--- src/acl/RegexData.cc        2011-06-02 13:18:27.000000000 +0000
+++ src/acl/RegexData.cc.orig   2011-05-28 13:54:06.000000000 +0000
@@ -4,12 +4,6 @@
  * DEBUG: section 28    Access Control
  * AUTHOR: Duane Wessels
  *
- * Regular Expression Optimisation added by Marcus Kool, June 2011.
- * optimisations:
- *     initial .* is stripped
- *     RE-1 RE-2 ... RE-n are joined into one large RE: (RE-1)|(RE-2)|...|(RE-n)
- *     -i ... -i options are optimised: the second one is ignored
- *
  * SQUID Web Proxy Cache          http://www.squid-cache.org/
  * ----------------------------------------------------------
  *
@@ -119,262 +113,49 @@
     return W;
 }
 
-
-_SQUID_INLINE_ static char * removeUnnecessaryWildcards( char * t )
-{
-    char * orig = t;
-
-    /* NOTE: an initial '.' might seem unnecessary but is not; 
-     * it can be a valid requirement that cannot be optimised 
-     */
-    while (*t == '.'  &&  *(t+1) == '*') {
-       t += 2;
-    }
-
-    if (*t == '\0') {
-       debugs(28, 0, "" << cfg_filename << " line " << config_lineno << ": " << config_input_line);
-       debugs(28, 0, "WARNING: regular expression '" << orig << "' has only wildcards and matches all strings." );
-        return orig;
-    }
-
-    if (t != orig) {
-       debugs(28, 0, "" << cfg_filename << " line " << config_lineno << ": " << config_input_line);
-       debugs(28, 0, "WARNING: regular expression '" << orig << "' has unnecessary wildcard(s)" );
-    }
- 
-    return t;
-}
-
-
-static relist ** compileRE( relist **Tail, char * RE, int flags )
-{
-    int errcode;
-    relist *q;
-    regex_t comp;
-
-    if (RE == NULL  ||  *RE == '\0')
-        return Tail;
-
-    if ((errcode = regcomp(&comp, RE, flags)) != 0) {
-       char errbuf[256];
-       regerror(errcode, &comp, errbuf, sizeof errbuf);
-       debugs(28, 0, "" << cfg_filename << " line " << config_lineno << ": " << config_input_line);
-       debugs(28, 0, "compileRE: invalid regular expression: '" << RE << "': " << errbuf);
-       return NULL;
-    }
-    debugs(28, 2, "compileRE: compiled '" << RE << "' with flags " << flags );
-
-    q = (relist *) memAllocate(MEM_RELIST);
-    q->pattern = xstrdup(RE);
-    q->regex = comp;
-    *(Tail) = q;
-    Tail = &q->next;
-
-    return Tail;
-}
-
-
-static int compileOptimisedREs( relist **curlist, wordlist * wl )
-{
-    relist **Tail;
-    relist *newlist;
-    relist **newlistp;
-    int numREs = 0;
-    int totalNumREs = 0;
-    int flags = REG_EXTENDED | REG_NOSUB;
-    int largeREindex = 0;
-    char largeRE[BUFSIZ];
-
-    largeRE[0] = '\0';
-    newlist = NULL;
-    newlistp = &newlist;
-
-    while (wl != NULL) {
-       int RElen;
-       RElen = strlen( wl->key );
-
-        if (strcmp(wl->key, "-i") == 0) {
-           if (flags & REG_ICASE) {
-               /* optimisation of  -i ... -i */
-               debugs(28, 3, "compileOptimisedREs: optimisation of -i ... -i" );
-           }
-           else {
-               debugs(28, 2, "compileOptimisedREs: -i" );
-               newlistp = compileRE( newlistp, largeRE, flags );
-               if (newlistp == NULL) {
-                   aclDestroyRegexList( newlist );
-                   return 0;
-               }
-               if (numREs > 1)
-                   debugs(28, 2, "compileOptimisedREs: " << numREs << " REs are optimised into one RE." );
-               flags |= REG_ICASE;
-               totalNumREs += numREs;
-               largeREindex = numREs = 0;
-               largeRE[largeREindex] = '\0';
-           }
-        }
-        else if (strcmp(wl->key, "+i") == 0) {
-           if ((flags & REG_ICASE) == 0) {
-               /* optimisation of  +i ... +i */
-               debugs(28, 3, "compileOptimisedREs: optimisation of +i ... +i" );
-           }
-           else {
-               debugs(28, 2, "compileOptimisedREs: +i" );
-               newlistp = compileRE( newlistp, largeRE, flags );
-               if (newlistp == NULL) {
-                   aclDestroyRegexList( newlist );
-                   return 0;
-               }
-               if (numREs > 1)
-                   debugs(28, 2, "compileOptimisedREs: " << numREs << " REs are optimised into one RE." );
-               flags &= ~REG_ICASE;
-               totalNumREs += numREs;
-               largeREindex = numREs = 0;
-               largeRE[largeREindex] = '\0';
-           }
-        }
-       else if (RElen > BUFSIZ-1) {
-            debugs(28, 0, "" << cfg_filename << " line " << config_lineno << ": " << config_input_line);
-            debugs(28, 0, "compileOptimisedREs: regular expression is larger than " << BUFSIZ-1 << " characters: " << wl->key );
-            debugs(28, 0, "compileOptimisedREs: the above regular expression is skipped" );
-       }
-       else if (RElen + largeREindex + 3 < BUFSIZ-1) {
-           debugs(28, 4, "compileOptimisedREs: adding RE '" << wl->key << "'" );
-           if (largeREindex > 0)
-               largeRE[largeREindex++] = '|';
-           largeRE[largeREindex++] = '(';
-           for (char * t = wl->key; *t != '\0'; t++)
-               largeRE[largeREindex++] = *t;
-           largeRE[largeREindex++] = ')';
-           largeRE[largeREindex] = '\0';
-           numREs++;
-       } else {
-           debugs(28, 2, "compileOptimisedREs: buffer full, generating new optimised RE..." );
-           newlistp = compileRE( newlistp, largeRE, flags );
-           if (newlistp == NULL) {
-               aclDestroyRegexList( newlist );
-               return 0;
-           }
-           if (numREs > 1)
-               debugs(28, 2, "compileOptimisedREs: " << numREs << " REs are optimised into one RE." );
-           totalNumREs += numREs;
-           largeREindex = numREs = 0;
-           largeRE[largeREindex] = '\0';
-           continue;    /* do the loop again to add the RE to largeRE */
-       }
-       wl = wl->next;
-    }
-
-    newlistp = compileRE( newlistp, largeRE, flags );
-    if (newlistp == NULL) {
-       aclDestroyRegexList( newlist );
-        return 0;
-    }
-
-    if (numREs > 1)
-       debugs(28, 2, "compileOptimisedREs: " << numREs << " REs are optimised into one RE." );
-
-    /* no errors, so put the new list at the tail */
-    if (*curlist == NULL) {
-       *curlist = newlist;
-    }
-    else {
-       for (Tail = curlist; *Tail != NULL; Tail = &((*Tail)->next))
-           ;
-       (*Tail) = newlist;
-    }
-    
-    if (totalNumREs > 100)  {
-       debugs(28, 0, "" << cfg_filename << " line " << config_lineno << ": " << config_input_line );
-       debugs(28, 0, "compileOptimisedREs: there are " << totalNumREs << " regular expressions. " 
-                     "This is considered bad use of REs. "
-                     "Consider using less REs or use rules without expressions like 'dstdomain'." );
-    }
-
-    return 1;
-}
-
-
-static void compileUnoptimisedREs( relist **curlist, wordlist * wl )
+static void aclParseRegexList(relist **curlist);
+void
+aclParseRegexList(relist **curlist)
 {
-    int totalNumREs = 0;
     relist **Tail;
-    relist **newTail;
+    relist *q = NULL;
+    char *t = NULL;
+    regex_t comp;
+    int errcode;
     int flags = REG_EXTENDED | REG_NOSUB;
 
-    for (Tail = curlist; *Tail != NULL; Tail = &((*Tail)->next))
-        ;
-
-    while (wl != NULL) {
-        int RElen;
-       RElen = strlen( wl->key );
-        if (strcmp(wl->key, "-i") == 0) {
+    for (Tail = (relist **)curlist; *Tail; Tail = &((*Tail)->next));
+    while ((t = ConfigParser::strtokFile())) {
+        if (strcmp(t, "-i") == 0) {
             flags |= REG_ICASE;
+            continue;
         }
-        else if (strcmp(wl->key, "+i") == 0) {
+
+        if (strcmp(t, "+i") == 0) {
             flags &= ~REG_ICASE;
+            continue;
         }
-       else if (RElen > BUFSIZ-1) {
-            debugs(28, 0, "" << cfg_filename << " line " << config_lineno << ": " << config_input_line);
-            debugs(28, 0, "compileUnoptimisedREs: regular expression is larger than " << BUFSIZ-1 << " characters: " << wl->key );
-            debugs(28, 0, "compileUnoptimisedREs: the above regular expression is skipped" );
-       } else {
-           newTail = compileRE( Tail, wl->key , flags );
-            totalNumREs++;
-           if (newTail == NULL) {
-               debugs(28, 0, "compileUnoptimisedREs: the above regular expression is skipped" );
-           }
-           else
-               Tail = newTail;
-       }
-       wl = wl->next;
-    }
-
-    if (totalNumREs > 100) {
-       debugs(28, 0, "" << cfg_filename << " line " << config_lineno << ": " << config_input_line );
-       debugs(28, 0, "compileUnoptimisedREs: there are " << totalNumREs << " regular expressions. "
-                     "This is considered bad use of REs. "
-                     "Consider using less REs or use rules without expressions like 'dstdomain'." );
-    }
-}
 
+        if ((errcode = regcomp(&comp, t, flags)) != 0) {
+            char errbuf[256];
+            regerror(errcode, &comp, errbuf, sizeof errbuf);
+            debugs(28, 0, "" << cfg_filename << " line " << config_lineno << ": " << config_input_line);
+            debugs(28, 0, "aclParseRegexList: Invalid regular expression '" << t << "': " << errbuf);
+            continue;
+        }
 
-static void aclParseRegexList(relist **curlist)
-{
-    char *t;
-    wordlist *wl = NULL;
-
-    while ((t = ConfigParser::strtokFile()) != NULL) {
-        t = removeUnnecessaryWildcards(t);
-       if (strlen(t) > BUFSIZ-1) {
-            debugs(28, 0, "" << cfg_filename << " line " << config_lineno << ": " << config_input_line );
-            debugs(28, 0, "aclParseRegexList: regular expression is larger than " << BUFSIZ-1 << " characters: '" << t << "'" );
-            debugs(28, 0, "aclParseRegexList: the above regular expression is skipped" );
-       }
-       else {
-            debugs(28, 4, "aclParseRegexList: buffering RE '" << t << "'" );
-           wordlistAdd(&wl, t);
-       }
-    }
-
-    if (!compileOptimisedREs(curlist, wl)) {
-       debugs(28, 0, "aclParseRegexList: optimisation of regular expressions failed; using fallback method without optimisation" );
-        compileUnoptimisedREs(curlist, wl);
+        q = (relist *)memAllocate(MEM_RELIST);
+        q->pattern = xstrdup(t);
+        q->regex = comp;
+        *(Tail) = q;
+        Tail = &q->next;
     }
-
-    wordlistDestroy(&wl);
 }
 
 void
 ACLRegexData::parse()
 {
     aclParseRegexList(&data);
-
-#ifdef _SQUID_VERY_VERBOSE_DEBUGGING
-    for (relist * l = data;  l != NULL;  l = l->next) {
-        debugs( 28, 2, "ACLRegexData::parse result: '" << l->pattern << "'" );
-    }
-#endif
 }
 
 bool
# section 3 is options parsing
# section 28 is ACL
debug_options ALL,1 3,9 28,9

visible_hostname squidunittest.com
max_filedescriptors 1024

via off
follow_x_forwarded_for deny all
forwarded_for off

http_port 33129

icp_port 0
htcp_port 0

#  TAG: hierarchy_stoplist
#       A list of words which, if found in a URL, cause the object to
#       be handled directly by this cache.  In other words, use this
#       to not query neighbor caches for certain objects.  You may
#       list this option multiple times.
#
# We recommend you to use at least the following line.
hierarchy_stoplist cgi-bin ?

#  TAG: no_cache
#       A list of ACL elements which, if matched, cause the reply to
#       immediately removed from the cache.  In other words, use this
#       to force certain objects to never be cached.
#
#       You must use the word 'DENY' to indicate the ACL names which should
#       NOT be cached.
#
#We recommend you to use the following two lines.
acl QUERY urlpath_regex cgi-bin \?
no_cache deny QUERY

acl microsoft1 dstdomain .microsoft.com .windowsupdate.com .windows.com

acl iabc url_regex -i aaa bbb ccc
acl iabc url_regex -i xaaa xbbb xccc x?ddd
acl iabcIde url_regex -i aaaa bbbb cccc +i dddd eeee
acl iabcidef url_regex -i axx bxx cxx -i dxx exx fxx
acl owc1 url_regex -i .*ABCDEF ..*GHIJKLM
acl simple1 url_regex -i abc.com
acl simple2 url_regex .example.com/abc
acl long1 url_regex -i abcabc -i defdef -i defghi kkklll mmmnnn ooo ppp qqq sss hhh uuu zzz
acl error1 url_regex -i err -i erro -i error where-is-the-error-in-the-long-expression abc[missingbracket errors
acl URLREGEX1 url_regex "/local/test/etc/re.4lines"
acl URLREGEX2 url_regex "/local/test/etc/re.200lines"

# OPTIONS WHICH AFFECT THE CACHE SIZE
# -----------------------------------------------------------------------------
cache_mem 8 MB

cache_swap_low 90
cache_swap_high 91


# LOGFILE PATHNAMES AND CACHE DIRECTORIES
# -----------------------------------------------------------------------------

cache_dir aufs /local/test/cache 128 16 128

logformat combha %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %>Hs %<st %Ss:%Sh %>ha

access_log      /local/test/logs/testaccess.log combha
cache_log       /local/test/logs/testcache.log
cache_store_log none

pid_filename /local/test/logs/testsquid.pid

refresh_pattern  ^ftp:        600     20%     10080
refresh_pattern  ^gopher:     600     0%      600
refresh_pattern  .            0       20%     4320 

shutdown_lifetime 2 seconds



acl  nohackers01  dstdomain .xupiter.com

#Recommended minimum configuration:
acl mynet1 src 10.8.0.0/24
acl mynet2 src 10.9.0.0/24
acl mynet3 src 10.0.8.138/32
acl manager proto cache_object
acl localhost src 127.0.0.1/32
acl SSL_ports port 443 563
acl Safe_ports port 80          # http
acl Safe_ports port 21          # ftp
acl Safe_ports port 443 563     # https, snews
acl Safe_ports port 322 554     # rtsps, rtsp
acl Safe_ports port 70          # gopher
acl Safe_ports port 210         # wais
acl Safe_ports port 1025-65535  # unregistered ports
acl Safe_ports port 280         # http-mgmt
acl Safe_ports port 488         # gss-http
acl Safe_ports port 591         # filemaker
acl Safe_ports port 777         # multiling http
acl CONNECT method CONNECT


#Recommended minimum configuration:
#
# Only allow cachemgr access from localhost
http_access allow manager localhost
http_access deny manager

# Deny requests to unknown ports
http_access deny !Safe_ports

# Deny CONNECT to other than SSL ports
http_access deny CONNECT !SSL_ports

#
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
#
http_access deny nohackers01
http_access allow localhost
http_access allow mynet1
http_access allow mynet2
http_access allow mynet3
http_access deny all

cache_mgr [email protected]

cache_effective_user squid
cache_effective_group squid

cachemgr_passwd unittest all


# we use these always_direct directives to make sure that Squid evaluates the url_regex rules
always_direct allow microsoft1
always_direct allow iabc 
always_direct allow iabcIde 
always_direct allow iabcidef 
always_direct allow owc1 
always_direct allow simple1
always_direct allow simple2
always_direct allow long1 
always_direct allow error1
always_direct allow URLREGEX1
always_direct allow URLREGEX2 

# We want to see the whole URL:
strip_query_terms off

#!/bin/sh
#
# unittest_re_optim_wget - test the RE optimisation patch
#
# To see all debug output, the squid.conf file should contain
# debug_options ALL,1 28,9
# The squid.conf of this unit test assumes that there is a squid tree in /local/test;
# the configuration file needs to be edited if another directory is used.
# squid.conf has various url_regex directives and 2 references to files with REs:
#    acl URLREGEX1 url_regex "/local/test/etc/re.4lines"
#    acl URLREGEX2 url_regex "/local/test/etc/re.200lines"
#
# NOTE: use "squid -X -f /local/test/etc/squidtest.conf"  to see the debug output during startup

# To support multiple Squid instances, squidtest.conf has
#    http_port 33129
http_proxy=localhost:33129
export http_proxy

# squidtest.conf has url_regex ACLs which are used in 'always_direct allow foo' directives,
# and there is a standard acl: acl QUERY urlpath_regex cgi-bin \?
# The following wget commands trigger the RE matching functions.

wget -q -O ttt01 http://www.example.com/abc/def?x=0
# aclRegexData::match: match '(cgi-bin)|(\?)' found in '/abc/def?x=0'
# aclRegexData::match: match '(.example.com/abc)' found in 'http://www.example.com/abc/def?x=0'

wget -q -O ttt02 http://www.example.com/cgi-bin/report.pl?a=9
# aclRegexData::match: match '(cgi-bin)|(\?)' found in '/cgi-bin/report.pl?a=9'

wget -q -O ttt03 http://www.example.com/-1-2-exx-3-a.html
# aclRegexData::match: match '(axx)|(bxx)|(cxx)|(dxx)|(exx)|(fxx)' found in 'http://www.example.com/-1-2-exx-3-a.html'

wget -q -O ttt04 http://40john-doe.example.com/foo/bar/index.php
# aclRegexData::match: match '(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(31john.*doe.example.com/.*/index.php)|(abc.example.com/scripts/cgi-bin/31example.cgi)|(foo\.example\.com/html/index\.php)|(foo\.example\.com/html/33k3k3second\.php)|(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(32john.*doe.example.com/.*/index.php)|(abc.example.com/scripts/cgi-bin/32example.cgi)|(foo\.example\.com/html/index\.php)|(foo\.example\.com/html/44k44k4k4second\.php)|(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(33john.*doe.example.com/.*/index.php)|(abc.example.com/scripts/cgi-bin/33example.cgi)|(foo\.example\.com/html/index\.php)|(foo\.example\.com/html/00d0d0d0second\.php)|(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(34john.*doe.example.com/.*/index.php)|(abc.example.com/scripts/cgi-bin/34example.cgi)|(foo\.example\.com/html/index\.php)|(foo\.example\.com/html/2k2k2k2second\.php)|(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(35john.*doe.example.com/.*/index.php)|(abc.example.com/scripts/cgi-bin/35example.cgi)|(foo\.example\.com/html/index\.php)|(foo\.example\.com/html/aaananasecond\.php)|(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(36john.*doe.example.com/.*/index.php)|(abc.example.com/scripts/cgi-bin/36example.cgi)|(foo\.example\.com/html/index\.php)|(foo\.example\.com/html/kwkwkwkwkwsecond\.php)|(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(37john.*doe.example.com/.*/index.php)|(abc.example.com/scripts/cgi-bin/37example.cgi)|(foo\.example\.com/html/index\.php)|(foo\.example\.com/html/qkqkqkqkqjsdsecond\.php)|(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(38john.*doe.example.com/.*/index.php)|(abc.example.com/scripts/cgi-bin/38example.cgi)|(foo\.example\.com/html/index\.php)|(foo\.example\.com/html/oododofofnsecond\.php)|(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(39john.*doe.example.com/.*/index.php)|(abc.example.com/scripts/cgi-bin/39example.cgi)|(foo\.example\.com/html/index\.php)|(foo\.example\.com/html/oeoeoeoekkfnsecond\.php)|(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(40john.*doe.example.com/.*/index.php)' found in 'http://40john-doe.example.com/foo/bar/index.php'

wget -q -O ttt05 http://www.foo.example.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm.php?var=6
# aclRegexData::match: match '(cgi-bin)|(\?)' found in '/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm.php?var=6'

# 2011/06/02 14:10:59.889| aclRegexData::match: match '(abc.example.com/scripts/cgi-bin/40example.cgi)| ... (edited) ... |(foo\.example\.com/html/utututututsecond\.php)|(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(27john.*doe.example.com/.*/index.php)|(abc.example.com/scripts/cgi-bin/27example.cgi)|(foo\.example\.com/html/index\.php)|(foo\.example\.com/html/kfkfkfkfksecond\.php)|(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(28john.*doe.example.com/.*/index.php)|(abc.example.com/scripts/cgi-bin/28example.cgi)|(foo\.example\.com/html/index\.php)|(foo\.example\.com/html/kkkkksecond\.php)|(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(29john.*doe.example.com/.*/index.php)|(abc.example.com/scripts/cgi-bin/29example.cgi)|(foo\.example\.com/html/index\.php)|(foo\.example\.com/html/qqqqnsecond\.php)|(foo\.example\.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm\.php)|(30john.*doe.example.com/.*/index.php)|(abc.example.com/scripts/cgi-bin/30example.cgi)|(foo\.example\.com/html/index\.php)|(foo\.example\.com/html/kkkkskskssecond\.php)' found in 'http://www.foo.example.com/html/another-very-long-url-to-test-buffers-of-the-re-optimisation-algorithm.php?var=6'

wget -q -O ttt06 http://www.foo.example.com/error.html
# aclRegexData::match: match 'err' found in 'http://www.foo.example.com/error.html'

rm -f ./ttt??
