I am the author of ufdbGuard, a free URL filter for Squid.
You may want to check it out: ufdbGuard is multithreaded and supports
POSIX regular expressions.

If you do not want to use ufdbGuard, here is a tip:
ufdbGuard composes large REs from sets of "simple" REs:
largeRE = (RE1)|(RE2)|...|(REn)
Matching one combined RE instead of many individual ones reduces the CPU time
spent in the RE matching logic considerably.
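
For illustration, here is a minimal Perl sketch of that idea (the file name
patterns.txt and the test URL are placeholders, not anything taken from
ufdbGuard):

#!/usr/bin/perl
# Sketch: combine many simple REs into one large alternation.
# Assumes one RE per line in patterns.txt (placeholder name).
use strict;
use warnings;

open my $fh, '<', 'patterns.txt' or die "patterns.txt: $!";
chomp(my @simple_res = <$fh>);
close $fh;

# largeRE = (RE1)|(RE2)|...|(REn)
my $large_re = join '|', map { "($_)" } grep { length } @simple_res;
my $compiled = qr/$large_re/;

# One match against the combined RE instead of n separate matches.
my $url = 'http://www.example.com/ads/banner.gif';
print "match\n" if $url =~ $compiled;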

Marcus


Henrik K wrote:
On Mon, Nov 01, 2010 at 03:00:21PM +0000, decl...@is.bbc.co.uk wrote:
Besides that, I have a laaarge url_regexp file to process, and I was
wondering if there was any benefit to trying to break this regexp out to a
perl helper process (and if anyone has a precooked setup doing this that I
can borrow).

The golden rule is to run as few regexps as possible, no matter how big they
are.

Dump your regexps through Regexp::Assemble:
http://search.cpan.org/dist/Regexp-Assemble/Assemble.pm
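
Something along these lines should do it (a rough sketch, not a drop-in
script; the input and output file names are only placeholders):

#!/usr/bin/perl
# Sketch: assemble a file of url regexps into a single optimised RE.
use strict;
use warnings;
use Regexp::Assemble;

my $ra = Regexp::Assemble->new;

open my $in, '<', 'url_regexp.txt' or die "url_regexp.txt: $!";
while (my $line = <$in>) {
    chomp $line;
    next unless length $line;
    $ra->add($line);                # add each individual regexp
}
close $in;

# The assembled pattern comes out as one (possibly very long) line,
# which Squid then reads as a single url_regex entry.
open my $out, '>', 'assembled_regexp.txt' or die "assembled_regexp.txt: $!";
print {$out} $ra->as_string, "\n";
close $out;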

Then compile Squid with PCRE support (LDFLAGS="-lpcre -lpcreposix") for
added performance.

I've only modified Squid2 myself, but for Squid3 you probably need to change
this in cache_cf.cc:

- while (fgets(config_input_line, BUFSIZ, fp)) {
+ while (fgets(config_input_line, 65535, fp)) {

... because Squid cannot otherwise read a huge regexp on a single line.
Of course your script must not feed it so many regexps that the assembled
line goes over that limit.

I'm also assuming you've converted as many rules as possible to dstdomain
etc, which is the first thing to do.
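
For the record, the dstdomain part looks roughly like this in squid.conf
(the ACL name and file path are just examples; the file holds one domain per
line, e.g. ".example.com"):

acl blocked_domains dstdomain "/etc/squid/blocked_domains.txt"
http_access deny blocked_domains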


