On Wednesday, March 23, 2005, 6:04:10 PM, Darrell wrote:

Dsic> Pete, 

Dsic> Doesn't Sniffer have a certain level of support for regexes? I know we have
Dsic> had good luck with regexes like this one, which catch obfuscation techniques
Dsic> with viagra, in Declude. We found it easier to use regexes than to list all of
Dsic> the different variations.

Dsic> (?:\b|\s)[_\W]{0,3}(?:\\\/|V)[_\W]{0,3}[ij1!|l\xEC\xED\xEE\xEF][_\W]{0,3}[a4[EMAIL PROTECTED],3}[xyz]?[gj][_\W]{0,3}r[_\W]{0,[EMAIL PROTECTED],3}x?[_\W]{0,3}(?:\b|\s)

The compiler and scanner we use have a limited regex capability. Some
of the features you've used here were deliberately kept out of the
engine. Later versions of the engine (under development) will have
more of these features - eventually including all of the features
found in most regex systems, and then moving beyond them.

Slick regex patterns like the one you have here are often useful for
describing patterns, but not always as useful for rapidly developing
and modifying dynamic pattern matching schemes.

For example, the regex you have stated here will match a wide range
of permutations in a single statement. That is, after all, a strength
of regex. However, in practice we often find that most of the
possible patterns are simply never seen "in the wild", or that some
specific variations are problematic. In these cases it is better to
use a small catalog of simpler patterns, because they can be
implemented and understood incrementally, and they can easily be
excluded one by one if needed. Adding that kind of flexibility to the
regex you have here could make it even harder to understand and to
encode correctly. Since we have a very small staff creating and
modifying hundreds of rules per day, seconds count. I have to admit
that it would take me a few minutes to fully understand what the
above regex really does, and if I modified it I would be much more
likely to introduce an error than I would be using our simpler
coding scheme.
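To make the catalog idea concrete, here is a minimal Python sketch, assuming a hypothetical `Rule` record with an ID, one simple pattern, and an on/off flag. The rule IDs, the patterns, and the `matching_rules` and `disable` helpers are illustrative inventions, not Message Sniffer's actual rule format. Each rule covers one spelling variant, so a problematic variant can be switched off without touching the others:

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """One simple, independently manageable pattern (hypothetical structure)."""
    rule_id: int
    pattern: str
    enabled: bool = True

    def compile(self) -> re.Pattern:
        return re.compile(self.pattern, re.IGNORECASE)


# A small catalog of variants assumed to be seen "in the wild"
# (illustrative examples only, not real Message Sniffer rules).
CATALOG = [
    Rule(1, r"v1agra"),
    Rule(2, r"vi@gra"),
    Rule(3, r"v\|agra"),   # literal "v|agra"
    Rule(4, r"via-gra"),
]


def matching_rules(text: str, catalog: list[Rule]) -> list[int]:
    """Return the IDs of the enabled rules that match the text."""
    return [r.rule_id for r in catalog if r.enabled and r.compile().search(text)]


def disable(catalog: list[Rule], rule_id: int) -> None:
    """Exclude a single rule without touching any of the others."""
    for r in catalog:
        if r.rule_id == rule_id:
            r.enabled = False


msg = "Cheap VIA-GRA and v1agra here"
print(matching_rules(msg, CATALOG))  # [1, 4]
disable(CATALOG, 4)                  # say rule 4 turned out to be problematic
print(matching_rules(msg, CATALOG))  # [1]
```

Each entry is trivial to read, and turning one off is a one-line change, which is the trade-off described above: more entries, but each one is cheap to add, verify, or exclude.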

That's not to say that we won't be introducing more complex pattern
matching capabilities - we certainly will. However, the syntax for
these rules will be less concerned with an economy of keystrokes and
more concerned with reliable, rapid generation and modification.

For example, the coding system we have planned will be able to break
down the pattern you've represented into a number of functional units
that can be mixed and re-used in a hierarchical structure. This will
allow both the robots and the humans to understand and manipulate the
patterns very easily.
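The general idea of reusable functional units can be approximated with ordinary regex by composing a pattern from small named pieces. This is a minimal Python sketch, not the planned Message Sniffer syntax (which this message does not describe); the unit names, the `UNITS` table, and the `spell` and `compile_word` helpers are hypothetical, and the units cover only part of what the quoted pattern matches:

```python
import re

# Hypothetical functional units (illustrative; not Message Sniffer's syntax).
# Each unit is small enough to read, test, and change on its own.
UNITS: dict[str, str] = {
    "sep": r"[_\W]{0,3}",   # up to three underscores or non-word characters
    "v": r"(?:v|\\/)",      # 'v' or the look-alike '\/'
    "i": r"[i1!|l]",        # common substitutes for 'i'
    "a": r"[a4@]",          # common substitutes for 'a'
    "g": r"[gj]",
    "r": r"r",
}


def spell(word: str, units: dict[str, str]) -> str:
    """Build a pattern for `word` by joining its letter units with `sep`."""
    return units["sep"].join(units[letter] for letter in word)


def compile_word(word: str, units: dict[str, str]) -> re.Pattern:
    """Compile the composed pattern, ignoring case."""
    return re.compile(spell(word, units), re.IGNORECASE)


viagra = compile_word("viagra", UNITS)
print(bool(viagra.search("V_i_a_g_r_a")))  # True
print(bool(viagra.search(r"\/1@gr4")))     # True
print(bool(viagra.search("vitamin")))      # False
```

Because the `a` unit appears twice in "viagra" and could be shared by other words, widening or tightening it in one place changes every pattern built from it. A single hand-written regex offers no such point of reuse.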

Regex (as written) is a good way to represent some patterns
efficiently, but it has the downside that the syntax can become
arbitrarily difficult, and it does not naturally represent the
conceptual structures that might be found in the patterns to be
matched, so those structures cannot be readily reused.

Best,

_M

This E-Mail came from the Message Sniffer mailing list. For information and 
(un)subscription instructions go to 
http://www.sortmonster.com/MessageSniffer/Help/Help.html
