On 31Jan2016 09:49, Paul Rubin <no.email@nospam.invalid> wrote:
> Cameron Simpson <c...@zip.com.au> writes:
>> Adzapper. It has many many regexps matching URLs. (Actually a more
>> globlike syntax, but it gets turned into a regexp.) You plug it into
>> your squid proxy.
>
> Oh cool, is that out there in circulation?

Yes:

 http://adzapper.sourceforge.net/

which includes the installation instructions (install script, add a line to squid.conf).

However, my publication workflow is broken (and SourceForge isn't what it used to be), so I need to improve the update process. In the meantime I'm happy to send the latest copy to people by private email.

> It sounds like the approach of merging all the regexes into one and
> compiling to a FSM could be a big win.  I wouldn't expect too big a
> state space explosion.

Perhaps so. The existing script (a) merges the regexps for successive patterns of the same class and (b) uses Perl's "study" function, which examines a string that will have several regexps applied to it. I gather it notes things like character positions, which are then used in the matching process. Since the zapper applies all the rules to most URLs, this is a performance win.
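
To make (a) concrete, here is a minimal Python sketch of the merging idea. It is not the adzapper code (which is Perl). The rule table, the class names and the helper names merge_rules and classify are all made up for illustration, and fnmatch.translate is only a stand-in for the zapper's own glob-like syntax. Note too that Python's re module is a backtracking matcher, so this shows the "one merged regexp per class" step, not compilation to a true FSM.

```python
import fnmatch
import re

# Hypothetical rule table of (class, glob-like pattern) pairs.
# These are illustrative only, not taken from the adzapper rule set.
RULES = [
    ("AD", "*://ads.example.com/*"),
    ("AD", "*/banners/*.gif"),
    ("COUNTER", "*://counter.example.net/*"),
]


def merge_rules(rules):
    """Group patterns by class and compile one alternation per class."""
    by_class = {}
    for cls, glob in rules:
        # fnmatch.translate turns a shell-style glob into regexp source.
        by_class.setdefault(cls, []).append(fnmatch.translate(glob))
    # One compiled regexp per class instead of one per pattern.
    return {cls: re.compile("|".join(parts)) for cls, parts in by_class.items()}


def classify(url, merged):
    """Return the first class whose merged regexp matches url, else None."""
    for cls, rx in merged.items():
        if rx.match(url):
            return cls
    return None


if __name__ == "__main__":
    merged = merge_rules(RULES)
    for url in ("http://ads.example.com/x.js", "http://example.org/index.html"):
        print(url, "->", classify(url, merged))
```

With this shape each URL is tested against one regexp per class rather than one per pattern, which is where the saving would come from when most URLs run through every rule.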

Cheers,
Cameron Simpson <c...@zip.com.au>
--
https://mail.python.org/mailman/listinfo/python-list
