Re: Yahoo/URL spam
Hi,

I'm having some additional difficulty with body URI rules and hoped someone could help.

rawbody __BODY_ONLY_URI /^[^a-z]{0,10}(http:\/\/|www\.)(\w+\.)+(com|net|org|biz|cn|ru)\/?[^ ]{0,20}[a-z]{0,10}$/msi

This doesn't seem to catch a quoted-printable body, and I can't figure out how to adapt it to allow for the 'Content-Transfer-Encoding:' that precedes the URL, if that's even the right approach. Here's an example:

http://pastebin.com/NDR1n4sN

Ideas greatly appreciated.

Thanks,
Alex
Re: [sa] Re: Yahoo/URL spam
On 3/23/2010 2:49 PM the voices made Charles Gregory write:

On Tue, 23 Mar 2010, Alex wrote:

This is what I have:

/^[^a-z]{0,10}(http:\/\/|www\.)(\w+\.)+(com|net|org|biz|cn|ru)\/?[^ ]{0,20}[a-z]{0,10}$/msi

My bad. I got an option wrong. Please remove the 'm' above. I always get it backwards. According to 'man perlre' (the definitive resource for SA regexes!) the 'm' makes '^' match every newline! We want it to only match the beginning of the body. So just remove it, and, as noted by others, add the '^' that was missing... like so

... ]{0,20}[^a-z]{0,10}$/si

Hello,

You might want to change (\w+\.)+ to ([\w-]+\.)+ to account for domains like polster-jj.de

-- MG
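A quick way to check the hyphen suggestion is a small sketch in Python. The assumptions are mine: Python's re module stands in for Perl's regex engine, the pattern is cut down to just the host part, and 'de' is added to the TLD list only so the polster-jj.de example can match at all.

```python
import re

# Cut-down host patterns for illustration only; 'de' is added to the
# TLD list (an assumption) so the example domain can be tested.
WORD_ONLY = re.compile(r"^(http://|www\.)(\w+\.)+(com|net|org|biz|cn|ru|de)$", re.I)
WITH_HYPHEN = re.compile(r"^(http://|www\.)([\w-]+\.)+(com|net|org|biz|cn|ru|de)$", re.I)

hyphenated = "www.polster-jj.de"
plain = "http://www.example.com"

# \w does not include '-', so the word-only version stops at the hyphen.
word_only_hits_hyphenated = WORD_ONLY.search(hyphenated) is not None
with_hyphen_hits_hyphenated = WITH_HYPHEN.search(hyphenated) is not None

# Both versions still accept a host without hyphens.
word_only_hits_plain = WORD_ONLY.search(plain) is not None
with_hyphen_hits_plain = WITH_HYPHEN.search(plain) is not None

print(word_only_hits_hyphenated, with_hyphen_hits_hyphenated)
print(word_only_hits_plain, with_hyphen_hits_plain)
```

The hyphen sits last inside the brackets so it is read as a literal character rather than as a range.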
Re: Yahoo/URL spam
Hi Charles,

/^[^a-z]{0,10}(http:\/\/|www\.)(\w+\.)+(com|net|org|biz|cn|ru)\/?[^ ]{0,20}[^a-z]{0,10}$/msi

This allows for some amount (up to ten chars?) of text before and after the URI if I'm reading that right, correct?

Nope. With the /ms flags ^ and $ at beginning and end match the *whole* body as a single 'string' and permit 'any character' (. or [^x]) matches to also match newlines. So the above regex translates to:

This was very helpful, thanks.

I might be doing something wrong, or there's a typo somewhere. It seems to catch situations where there's more than just a URL in the body, such as additional text. This is what I have:

/^[^a-z]{0,10}(http:\/\/|www\.)(\w+\.)+(com|net|org|biz|cn|ru)\/?[^ ]{0,20}[a-z]{0,10}$/msi

Thanks again,
Alex
Re: Yahoo/URL spam
On Tue, 2010-03-23 at 13:18 -0400, Alex wrote:

Hi Charles,

/^[^a-z]{0,10}(http:\/\/|www\.)(\w+\.)+(com|net|org|biz|cn|ru)\/?[^ ]{0,20}[^a-z]{0,10}$/msi

This is what I have:

/^[^a-z]{0,10}(http:\/\/|www\.)(\w+\.)+(com|net|org|biz|cn|ru)\/?[^ ]{0,20}[a-z]{0,10}$/msi

The original had [^a-z] as the final character class.

John.

--
John Horne, University of Plymouth, UK
Tel: +44 (0)1752 587287  Fax: +44 (0)1752 587001
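For anyone wondering what difference the single '^' makes, here is a rough sketch in Python. It assumes Python's re module behaves like Perl here, with re.S and re.I playing the role of /s and /i; the /m flag is left out to isolate the trailing character class, and the sample bodies are made up.

```python
import re

HEAD = r"^[^a-z]{0,10}(http://|www\.)(\w+\.)+(com|net|org|biz|cn|ru)/?[^ ]{0,20}"

# Version as retyped, with [a-z] at the end (trailing letters only).
TYPO = re.compile(HEAD + r"[a-z]{0,10}$", re.S | re.I)
# Version as first posted, with [^a-z] at the end (trailing non-letters).
ORIGINAL = re.compile(HEAD + r"[^a-z]{0,10}$", re.S | re.I)

clean_body = "http://www.example.com/page.html\n"
trailing_blank = "http://www.example.com/page.html \n"

# Both accept a body that is just the URL and a newline.
typo_clean = TYPO.search(clean_body) is not None
original_clean = ORIGINAL.search(clean_body) is not None

# A stray space after the URL stops [^ ]{0,20}; only the [^a-z]
# version can absorb the space and the newline before '$'.
typo_blank = TYPO.search(trailing_blank) is not None
original_blank = ORIGINAL.search(trailing_blank) is not None

print(typo_clean, original_clean, typo_blank, original_blank)
```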
Re: [sa] Re: Yahoo/URL spam
On Tue, 23 Mar 2010, Alex wrote:

This is what I have:

/^[^a-z]{0,10}(http:\/\/|www\.)(\w+\.)+(com|net|org|biz|cn|ru)\/?[^ ]{0,20}[a-z]{0,10}$/msi

My bad. I got an option wrong. Please remove the 'm' above. I always get it backwards. According to 'man perlre' (the definitive resource for SA regexes!) the 'm' makes '^' match every newline! We want it to only match the beginning of the body. So just remove it, and, as noted by others, add the '^' that was missing... like so

... ]{0,20}[^a-z]{0,10}$/si

- Charles
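The effect of dropping the 'm' can be reproduced outside SA with a short sketch in Python. This assumes Python's re.M, re.S and re.I behave like Perl's /m, /s and /i for this pattern (they do for '^' and '$' as used here); the sample bodies are invented, and how SA actually feeds text to a rawbody rule is not modeled.

```python
import re

# The rule's pattern with the [^a-z] correction, in Python syntax.
PATTERN = (r"^[^a-z]{0,10}(http://|www\.)(\w+\.)+(com|net|org|biz|cn|ru)"
           r"/?[^ ]{0,20}[^a-z]{0,10}$")

with_m = re.compile(PATTERN, re.M | re.S | re.I)   # like /msi
without_m = re.compile(PATTERN, re.S | re.I)       # like /si

url_only = "\nhttp://www.example.com/page.html\n"
url_in_text = ("Hi Bob,\n"
               "here is the link you asked for:\n"
               "http://www.example.com/page.html\n"
               "See you on Monday.\n")

# With 'm', '^' and '$' match at every line boundary, so a URL on a
# line of its own is enough, even inside a longer message.
m_hits_text = with_m.search(url_in_text) is not None

# Without 'm', '^' is pinned to the start of the body and '$' to its end.
no_m_hits_text = without_m.search(url_in_text) is not None
no_m_hits_url_only = without_m.search(url_only) is not None

print(m_hits_text, no_m_hits_text, no_m_hits_url_only)
```

That matches what Alex saw: the /msi version fires on bodies with more than just a URL.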
Re: Yahoo/URL spam
On Mon, 22 Mar 2010, Alex wrote:

rawbody __BODY_ONLY_URI /^[^a-z]{0,10}(http:\/\/|www\.)(\w+\.)+(com|net|org|biz|cn|ru)\/?[^ ]{0,20}[^a-z]{0,10}$/msi

This allows for some amount (up to ten chars?) of text before and after the URI if I'm reading that right, correct?

Nope. With the /ms flags ^ and $ at beginning and end match the *whole* body as a single 'string' and permit 'any character' (. or [^x]) matches to also match newlines. So the above regex translates to:

/^ - Beginning of body
[^a-z]{0,10} - match 0-10 non-alpha characters *including* newlines
(http:\/\/|www\.) - match a uri beginning with http *or* www
(\w+\.)+ - match multiple occurrences of word followed by . (this will match 'domain.' *or* 'www.domain.')
(com|net|biz|org|cn|ru) - match TLD (adjust to fit your mail)
\/? - match a slash if there is one
[^ ]{0,20} - match 0-20 non-blank characters (page name, if given)
[^a-z]{0,10} - match 0-10 non-alpha chars including newlines (did I TYPO in my OP and leave out the '^'?)
$ - match end of body
/msi

Is it possible to determine the beginning of the line with a body rule?

Insert '\n' into the above regex where you want to match newline.

I didn't think that was possible. I believe this is also what this is trying to do?

It's possible, but NOT what this regex does. Essentially this regex matches against a complete body that consists of nothing more than a single URI on a line, with possible blank lines before or after. Rather than test for newlines, I test for non-alpha so that a stray space or tab or LF code does not fail to match.

This simple regex can also be 'dressed up' with elements of the form (\<[^\<\>]+\> +)+ to match any HTML code inserted before or after the URI. A regex could also check for a link consisting of text enclosed by <a href=...> ... </a>

The key is to be sure that you don't use '*' or '+' in any context where it could 'run away' and try to match large message bodies. This way as soon as the body exceeds 40 characters on either side of an unbroken string of characters it stops the test. Relatively efficient for a rawbody test.

- C
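The breakdown above can be tried out with a small sketch in Python. Assumptions, stated plainly: Python's re module stands in for Perl, the 'm' flag is left off so that '^' and '$' anchor to the whole string, the helper name body_is_only_uri is made up for illustration, and the test bodies are invented. It says nothing about how SA itself presents the body to a rawbody rule.

```python
import re

# Verbose form of the rule; whitespace outside character classes is
# ignored under re.X, but the space inside [^ ] is kept.
BODY_ONLY_URI = re.compile(
    r"""
    ^                         # beginning of body
    [^a-z]{0,10}              # 0-10 non-alpha characters, newlines included
    (http://|www\.)           # URI starts with http:// or www.
    (\w+\.)+                  # one or more 'word.' groups
    (com|net|org|biz|cn|ru)   # TLD
    /?                        # optional slash
    [^ ]{0,20}                # 0-20 non-blank characters (page name)
    [^a-z]{0,10}              # 0-10 trailing non-alpha characters
    $                         # end of body
    """,
    re.S | re.I | re.X,
)


def body_is_only_uri(body: str) -> bool:
    """Return True when the whole body is little more than one URI."""
    return BODY_ONLY_URI.search(body) is not None


samples = [
    "\n\nhttp://www.example.com/\n\n",            # URI with blank lines
    "www.example.ru",                             # bare www. form
    "Check this out: http://www.example.com/",    # text before the URI
    "http://www.example.com/\nCall me when you have a moment, thanks.\n",
]

for body in samples:
    print(repr(body), body_is_only_uri(body))
```

Every quantifier is bounded, so the regex gives up after a few dozen characters instead of scanning a long body.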
Re: Yahoo/URL spam
Hi,

Lots of ham may contain a URI, but how much ham contains ONLY a URI? Rough outline of rule, untested.

rawbody __BODY_ONLY_URI /^[^a-z]{0,10}(http:\/\/|www\.)(\w+\.)+(com|net|org|biz|cn|ru)\/?[^ ]{0,20}[a-z]{0,10}$/msi

Combine that with 'frequent abusers' like Yahoo, and you've got something you can give a few points

This allows for some amount (up to ten chars?) of text before and after the URI if I'm reading that right, correct?

Is it possible to determine the beginning of the line with a body rule? I didn't think that was possible. I believe this is also what this is trying to do?

Thanks,
Alex
Re: Yahoo/URL spam
On Thu, 18 Mar 2010, Ned Slider wrote:

If that's not an option, how about a meta rule for FROM_YAHOO and __HAS_ANY_URI (this rule exists in SA).

Lots of ham may contain a URI, but how much ham contains ONLY a URI? Rough outline of rule, untested.

rawbody __BODY_ONLY_URI /^[^a-z]{0,10}(http:\/\/|www\.)(\w+\.)+(com|net|org|biz|cn|ru)\/?[^ ]{0,20}[a-z]{0,10}$/msi

Combine that with 'frequent abusers' like Yahoo, and you've got something you can give a few points.

There will probably need to be a variant on this to account for HTML mail and/or the 'standard' footers inserted by free mail agents. Which, incidentally, surprises me here. I thought Yahoo always added a tagline?

- C
Yahoo/URL spam
Hi, I'm having a real problem with this persistent spam that contains just a URL as the body, and is always from yahoo. I've got an example here: http://pastebin.com/UqzhDHEu 'example.com' is my change. I'm using SA v3.2.5 with postfix/amavis. I'm concerned that the bayes score is always low. I can't determine any other patterns from this message to key on for other rules. Ideas most welcome. Thanks! Best, Alex
Re: Yahoo/URL spam
On Thu, 2010-03-18 at 18:05 -0400, Alex wrote: Hi, I'm having a real problem with this persistent spam that contains just a URL as the body, and is always from yahoo. I've got an example here: http://pastebin.com/UqzhDHEu 'example.com' is my change. I'm using SA v3.2.5 with postfix/amavis. I'm concerned that the bayes score is always low. I can't determine any other patterns from this message to key on for other rules. Ideas most welcome. There's something odd about the message as posted: I'm getting hits on MISSING_SUBJECT and MISSING_DATE (SA 3.3.0). Martin
Re: Yahoo/URL spam
On Thu, 18 Mar 2010 22:31:04 + Martin Gregorie mar...@gregorie.org wrote: There's something odd about the message as posted: I'm getting hits on MISSING_SUBJECT and MISSING_DATE (SA 3.3.0). Some of the wrapped headers aren't properly indented. Probably happened on editing.