On 31Jan2016 09:49, Paul Rubin <no.email@nospam.invalid> wrote:
Cameron Simpson <c...@zip.com.au> writes:
Adzapper. It has many many regexps matching URLs. (Actually a more
globlike syntax, but it gets turned into a regexp.) You plug it into
your squid proxy.
Oh cool, is that out there in circulation?
Yes:
http://adzapper.sourceforge.net/
which includes the installation instructions (install script, add a line to
squid.conf).
However my publication workflow is broken. (And SourceForge isn't what it used
to be.) I need to get the update process improved. I'm happy to send the latest
copy to people by private email.
It sounds like the approach of merging all the regexes into one and
compiling to a FSM could be a big win. I wouldn't expect too big a
state space explosion.
Perhaps so. The existing script (a) merges the regexps for successive patterns
of the same class and (b) uses Perl's "study" function, which examines a string
that will have several regexps applied to it. It notes things like character
positions, I gather, which are then used in the matching process. Since the
zapper applies all the rules to most URLs, this is a performance win.
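For list readers who want to try idea (a) in Python, here is a minimal sketch. It is not the adzapper code. The rule list, the class names "AD" and "PASS", and the helpers build_matchers and classify are all hypothetical. Python's shell-style fnmatch.translate stands in for adzapper's own glob-like syntax, which is an assumption. Only runs of consecutive same-class rules are merged, so first-match-wins priority between classes is kept. Two caveats: Python's re is a backtracking engine, so one merged pattern is not the DFA/FSM Paul describes, and re has no counterpart to Perl's "study".

```python
import fnmatch
import re
from itertools import groupby

# Hypothetical rules: (class, glob) pairs in priority order. These are
# illustrative only, not real adzapper rules, and fnmatch's shell-style
# globs are an assumed stand-in for adzapper's own pattern syntax.
RULES = [
    ("AD", "http://ads.example.com/*"),
    ("AD", "http://*/banners/*.gif"),
    ("PASS", "http://www.example.com/*"),
    ("AD", "http://*/adserver/*"),
]


def build_matchers(rules):
    """Merge each run of consecutive same-class rules into one regexp.

    Only *successive* rules are merged, so the original rule order (and
    therefore first-match-wins priority between classes) is preserved.
    """
    matchers = []
    for cls, run in groupby(rules, key=lambda rule: rule[0]):
        # fnmatch.translate turns a glob into a regexp string that is
        # already anchored at the end; wrap each one as an alternative.
        alternatives = "|".join(
            "(?:%s)" % fnmatch.translate(glob) for _, glob in run
        )
        matchers.append((cls, re.compile(alternatives)))
    return matchers


def classify(url, matchers):
    """Return the class of the first merged group that matches, else None."""
    for cls, regexp in matchers:
        if regexp.match(url):  # match() anchors at the start of the URL
            return cls
    return None
```

With these sample rules, classify("http://www.example.com/banners/x.gif", build_matchers(RULES)) gives "AD" even though the PASS rule would also match, because the AD run comes first.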
Cheers,
Cameron Simpson <c...@zip.com.au>