On Mon, 22 Mar 2010, Alex wrote:
rawbody __BODY_ONLY_URI /^[^a-z]{0,10}(http:\/\/|www\.)(\w+\.)+(com|net|org|biz|cn|ru)\/?[^ ]{0,20}[^a-z]{0,10}$/msi
This allows for some amount (up to ten chars?) of text before and
after the URI if I'm reading that right, correct?

Nope. With the /ms flags ^ and $ at beginning and end match the *whole* body as a single 'string' and permit 'any character' (. or [^x]) matches to also match newlines. So the above regex translates to:

/^ - Beginning of body
[^a-z]{0,10} - match 0-10 non-alpha characters *including* newlines
(http:\/\/|www\.) - match a uri beginning with http *or* www
(\w+\.)+ - match one or more occurrences of a word followed by .
        (this will match 'domain.' *or* 'www.domain.')
(com|net|org|biz|cn|ru) - match TLD (adjust to fit your mail)
\/? - match a slash if there is one
[^ ]{0,20} - match 0-20 non-blank characters (page name, if given)
[^a-z]{0,10} - match 0-10 non-alpha chars including newlines
     (did I TYPO in my OP and leave out the '^'?)
$ - match end of body
/msi
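
For anyone who wants to poke at the pieces outside SpamAssassin, here is a rough Python sketch of the same pattern. It assumes the whole body arrives as one string; how SpamAssassin actually hands rawbody text to the regex is not modelled. The function name and the sample bodies are made up for illustration.

```python
import re

# The pattern from the rule, split across string literals for readability.
# Assumption: \A and \Z pin the match to the start and end of the string,
# modelling the "beginning of body" / "end of body" reading above.
# (With re.MULTILINE, ^ and $ would also match at each line break.)
BODY_ONLY_URI = re.compile(
    r"\A[^a-z]{0,10}"                      # up to 10 non-alpha chars, newlines included
    r"(http://|www\.)"                     # URI starts with http:// or www.
    r"(\w+\.)+(com|net|org|biz|cn|ru)/?"   # dotted words, TLD, optional slash
    r"[^ ]{0,20}"                          # up to 20 non-space chars (page name)
    r"[^a-z]{0,10}\Z",                     # up to 10 trailing non-alpha chars
    re.IGNORECASE,
)


def is_body_only_uri(body: str) -> bool:
    """Return True if the body is little more than a single URI."""
    return BODY_ONLY_URI.search(body) is not None


if __name__ == "__main__":
    samples = [
        "\n\nhttp://example.com/page.html\n\n",
        "www.example.org",
        "Hi Bob, please look at http://example.com/ when you can.",
        "http://example.com/" + "x" * 50,
    ]
    for body in samples:
        print(repr(body[:40]), is_body_only_uri(body))
```

The last two samples fail for different reasons: one has ordinary text before the URI, the other runs past the 20 + 10 characters allowed after it.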

Is it possible to determine the beginning of the line with a body rule?

Insert '\n' into the above regex where you want to match newline.....
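
As a small illustration of that, here is a hypothetical variant (sketched in Python, not a tested SpamAssassin rule) with '\n' placed directly in front of the URI, so the URI has to start a new line. The names are invented for the example.

```python
import re

# Hypothetical variant: a literal newline must come right before the URI.
# Note that a URI on the very first line has no preceding "\n" and so
# would not be matched by this particular form.
LINE_START_URI = re.compile(
    r"\n(?:http://|www\.)(?:\w+\.)+(?:com|net|org|biz|cn|ru)",
    re.IGNORECASE,
)


def uri_starts_a_line(body: str) -> bool:
    """Return True if some line after the first begins with a URI."""
    return LINE_START_URI.search(body) is not None


if __name__ == "__main__":
    print(uri_starts_a_line("hello\nhttp://example.com/"))
    print(uri_starts_a_line("hello http://example.com/"))
```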

I didn't think that was possible. I believe this is also what this is trying to do?

It's possible, but NOT what this regex does. Essentially this regex matches against a complete body that consists of nothing more than a single URI on a line, with possible blank lines before or after. Rather than test for newlines, I test for non-alpha so that a stray space or tab or LF code does not fail to match.

This simple regex can also be 'dressed up' with elements of the form
(\<[^\>\<]+\> +)+ to match any HTML code inserted before or after the URI. A regex could also check for a link consisting of text enclosed by <a href=...> ... </a>
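
A rough Python sketch of the 'dressed up' form, handling tags before the URI only. The tag element is the one given above; capping its repetition at {0,5} is my own assumption, to keep the count bounded, and the whole-string anchors \A and \Z are likewise an assumption about how the body is presented.

```python
import re

# Sketch only: the tag element "<...> " may repeat 0-5 times before the URI.
# The {0,5} cap is an assumed bound, not part of the original rule.
URI_WITH_LEADING_TAGS = re.compile(
    r"\A[^a-z]{0,10}"
    r"(?:<[^><]+> +){0,5}"                 # HTML tags, each followed by spaces
    r"(http://|www\.)(\w+\.)+(com|net|org|biz|cn|ru)/?"
    r"[^ ]{0,20}[^a-z]{0,10}\Z",
    re.IGNORECASE,
)


def is_tagged_body_only_uri(body: str) -> bool:
    """Return True for a lone URI, optionally preceded by a few HTML tags."""
    return URI_WITH_LEADING_TAGS.search(body) is not None


if __name__ == "__main__":
    print(is_tagged_body_only_uri("<html> <body> http://example.com/\n"))
    print(is_tagged_body_only_uri("<p> Hello there http://example.com/"))
```

Tags after the URI would need a mirrored element in front of the final [^a-z]{0,10}, which is left out here.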

The key is to be sure that you don't use '*' or '+' in any context where it could 'run away' and try to match large message bodies. This way, as soon as the body exceeds 40 characters on either side of an unbroken string of characters, it stops the test. Relatively efficient for a rawbody test.

- C
