On 10/19, Alex wrote:
> body __SHORT_BODY /.{1,150}$/

That will match anything that ends in 1 to 150 characters of anything. So it'll match any email that has 1 or more characters.
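To make that concrete, here is a small sketch that runs the same pattern against plain strings outside SpamAssassin. The sample strings and the match_start helper are my own illustration, not anything SA provides, and SA may split the body into lines or paragraphs before matching, so treat this as a check of the regex only:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern from the quoted rule, tested outside SpamAssassin.
my $re = qr/.{1,150}$/;

# Hypothetical helper: returns the offset where the match starts,
# or -1 if there is no match.
sub match_start {
    my ($text) = @_;
    return $text =~ $re ? $-[0] : -1;
}

my $short = "x" x 10;     # 10-character sample body
my $long  = "x" x 500;    # 500-character sample body

# Both match. For the long string the match simply starts later,
# covering only the final 150 characters.
print "short: match starts at ", match_start($short), "\n";   # 0
print "long:  match starts at ", match_start($long),  "\n";   # 350
print "empty: match starts at ", match_start(""),     "\n";   # -1
```

The 500-character string matches starting at offset 350, which shows the pattern constrains only the tail of the text, not its total length.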

> describe __SHORT_BODY Short email body
> body __BODY_URI m{https?://.{1,50}$}

That will match any email that ends with http:// (or https://) followed by 1 to 50 characters of anything, including spaces and other stuff that is not part of the URL. "$" does not mean "I want stuff to stop matching here." It means the end: either of the line or of the email, depending on how SA handles newlines.

> describe __BODY_URI Message body contains URI
> meta LOC_SHORT (__SHORT_BODY && __BODY_URI)
> describe LOC_SHORT Contains short body and URI
> score LOC_SHORT 0.2
>
> I'd appreciate it if someone could help me create rules to identify a
> message body less than 150 chars and contains URL less than 50 chars.

Some quick untested thoughts:

  body     __LONG_BODY /.{151}/
  describe __LONG_BODY Has a body of more than 150 characters
  body     __BODY_URI  m{https?://\S{1,49}(\s|$)}
  describe __BODY_URI  Message body contains a URI
  meta     LOC_SHORT   (!__LONG_BODY && __BODY_URI)
  describe LOC_SHORT   Contains short body and short URI
  score    LOC_SHORT   0.2

You might be able to do:

  body __SHORT_BODY /^(?!.{151})/

But I'm new to this "negative look-ahead assertion" thing. Note the ^ anchor: without it, the look-ahead succeeds at the very end of any string (where nothing follows), so the rule would always match. Happy to work on this more.

Regexes can be some scary dense logic. I recommend creating a tiny perl script, with a sample bit of text to match, and working up the regex 1 character at a time. Start with:

  #!/usr/bin/perl
  use strict;
  use warnings;

  my $body = "http://www.example.com";
  if ($body =~ m{http}) {
      print "Matched.\n";
  } else {
      print "Didn't match.\n";
  }

And work up from there. I often have to do stuff like this when working with regexes. And don't forget to test on an example string that the regex shouldn't match.

-- 
"...and he that hath no sword, let him sell his garment, and buy one."
  - Luke 22:36, King James Bible
http://www.ChaosReigns.com