Re: One-line URI body spam

darxus Wed, 19 Oct 2011 19:21:54 -0700

On 10/19, Alex wrote:
> body            __SHORT_BODY    /.{1,150}$/

That will match anything that ends in 1 to 150 characters of anything.  So
it'll match any email that has 1 or more characters.
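
A quick way to see this outside SpamAssassin is a throwaway Perl script. This is only a sketch: the two sample strings are made up, and a plain Perl string is not necessarily what SA hands to a body rule.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Hypothetical sample bodies: one short, one far longer than 150 characters.
my $short = "Hi there";
my $long  = "x" x 500;

# The quoted pattern has no anchor at the start, so it only needs
# the last 1 to 150 characters before the end in order to match.
my $short_hit = ($short =~ /.{1,150}$/) ? 1 : 0;
my $long_hit  = ($long  =~ /.{1,150}$/) ? 1 : 0;

print "short: $short_hit, long: $long_hit\n";
```

Both strings match, so the rule says nothing about how long the body is.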


> describe        __SHORT_BODY    Short email body
> body            __BODY_URI      m{https?://.{1,50}$}

That will match any email that ends with http:// followed by 1 to 50
characters of anything, including spaces and other stuff that's not part of
the URL.  "$" is not "I want stuff to stop matching here."  It's the end:
either of the line or of the email, depending on how SA handles newlines.
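
Here is a small Perl sketch of both effects, using made-up sample strings (a.example is just a placeholder host). It runs the quoted pattern against plain strings, which may not be exactly how SA applies it.

```perl
#!/usr/bin/perl
use strict; use warnings;

# The quoted pattern, with a capture group added so we can see what it grabbed.
my $re = qr{(https?://.{1,50}$)};

# Hypothetical sample: a URL followed by ordinary words.
my $trailing_words = "see http://a.example/ and then some words";
my $captured = ($trailing_words =~ $re) ? $1 : "";

# Hypothetical sample: same URL, but more than 50 characters follow it.
my $far_from_end = "http://a.example/ " . ("word " x 20);
my $far_hit = ($far_from_end =~ $re) ? 1 : 0;

print "captured: '$captured'\n";
print "far from end matched: $far_hit\n";
```

The first string matches and the "URL" swallows the trailing words.  The second contains a perfectly good short URL but does not match, because the end of the string is more than 50 characters away.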

> describe        __BODY_URI      Message body contains URI
> meta            LOC_SHORT       (__SHORT_BODY && __BODY_URI)
> describe        LOC_SHORT       Contains short body and URI
> score           LOC_SHORT       0.2
> 
> I'd appreciate it if someone could help me create rules to identify a
> message body less than 150 chars and contains URL less than 50 chars.

Some quick untested thoughts:

body            __LONG_BODY     /.{151}/
describe        __LONG_BODY     Has a body of more than 150 characters
body            __BODY_URI      m{https?://\S{1,49}(\s|$)}
describe        __BODY_URI      Message body contains a URI
meta            LOC_SHORT       ( ! __LONG_BODY && __BODY_URI)
describe        LOC_SHORT       Contains short body and short URI
score           LOC_SHORT       0.2
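
The same logic can be tried in plain Perl before it goes anywhere near SA.  This is a sketch under assumptions: loc_short() is a made-up helper that just combines the two patterns above, the samples are invented single-line strings, and it does not go through SpamAssassin's own body processing.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Hypothetical helper mirroring the meta rule: ( ! __LONG_BODY && __BODY_URI )
sub loc_short {
  my ($body) = @_;
  my $long_body = ($body =~ /.{151}/) ? 1 : 0;
  my $body_uri  = ($body =~ m{https?://\S{1,49}(\s|$)}) ? 1 : 0;
  return (!$long_body && $body_uri) ? 1 : 0;
}

# Made-up samples: [description, text]
my @samples = (
  ["short body, short URL", "Check this out http://example.com/abc"],
  ["long body, short URL",  ("x" x 200) . " http://example.com/"],
  ["short body, long URL",  "http://example.com/" . ("a" x 60)],
  ["short body, no URL",    "hello"],
);

for my $s (@samples) {
  printf "%-22s => %d\n", $s->[0], loc_short($s->[1]);
}
```

Only the first sample should come out as 1.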

You might be able to do:
body            __SHORT_BODY    /(?!.{1,150})/
But I'm new to this "negative look-ahead assertion" thing.
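
A quick experiment in plain Perl suggests the look-ahead needs an anchor to be useful.  Without one, the regex engine can always find a spot (the very end of the string) that is not followed by 1 to 150 characters, so it matches everything.  The anchored variant below is my own guess, not something taken from the SA docs, and it is only tested here against single plain strings.

```perl
#!/usr/bin/perl
use strict; use warnings;

my $short = "x" x 150;   # exactly 150 characters
my $long  = "x" x 151;   # one character too many

my $unanchored = qr/(?!.{1,150})/;   # the pattern as written above
my $anchored   = qr/^(?!.{151})/;    # assumed alternative: anchor at the start

my %result = (
  unanchored_short => ($short =~ $unanchored) ? 1 : 0,
  unanchored_long  => ($long  =~ $unanchored) ? 1 : 0,
  anchored_short   => ($short =~ $anchored)   ? 1 : 0,
  anchored_long    => ($long  =~ $anchored)   ? 1 : 0,
);

print "$_: $result{$_}\n" for sort keys %result;
```

The unanchored pattern matches both strings; the anchored one matches only the 150-character string.  Whether "^" means the same thing inside a body rule depends on how SA splits up the body, so treat this as a starting point.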

Happy to work on this more.

Regexes can be some scary dense logic.  I recommend creating a tiny perl
script, with a sample bit of text to match, and working up the regex 1
character at a time.

Start with:

#!/usr/bin/perl
use strict; use warnings;
my $body = "http://www.example.com";
if ($body =~ m{http}) {
  print "Matched.\n";
} else {
  print "Didn't match.\n";
}

And work up from there.  I often have to do stuff like this when working
with regexes.  And don't forget to test against an example string that the
regex shouldn't match.
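
Once the regex has grown a bit, the same script can carry a few samples that should match and a few that shouldn't.  A minimal sketch, with invented sample strings and the URI pattern from earlier in this mail:

```perl
#!/usr/bin/perl
use strict; use warnings;

my $re = qr{https?://\S{1,49}(?:\s|$)};

# Made-up cases: [text, 1 if it should match, 0 if it should not]
my @cases = (
  ["http://www.example.com",        1],
  ["see https://example.org/x now", 1],
  ["no link here",                  0],
  ["htp://typo.example",            0],
);

my $failures = 0;
for my $case (@cases) {
  my ($text, $want) = @$case;
  my $got = ($text =~ $re) ? 1 : 0;
  $failures++ if $got != $want;
  printf "%-4s %s\n", ($got == $want ? "ok" : "FAIL"), $text;
}
print "$failures unexpected result(s)\n";
```

Any line marked FAIL means the regex and your expectations disagree, which is exactly what you want to find out before the rule goes live.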

-- 
"...and he that hath no sword, let him sell his garment, and buy one."
- Luke 22:36, King James Bible
http://www.ChaosReigns.com
