Before I help you with your shell and regex issues, I should point out that this is not a very strong rule. It will hit ham.
On 04/21/2011 02:54 PM, Kevin Miller wrote:
> I'm trying to write a local rule that will scan for 5 or more
> instances of "<br>" but not having much luck. I'm testing first on
> the CLI, just trying to get the syntax down.
> What works:
> I have a file called DomainLiterals.txt with repeating characters
> and it returns expected results:
> mkm@mis-mkm-lnx:~$ egrep \[10.]{3} DomainLiterals.txt
> you can add a line containing only [10.10.10.10] to
> /etc/mail/local-host-names where 10.10.10.10 is the IP address you

The regex '\[10.]{3}' is not doing what you want. It un-escapes from the command line as '[10.]{3}', which will match any of these:

  111    ...    000    10.    .01

since it is asking for three of any character matching one, zero, or dot. The grouping symbol you are looking for is the parenthesis, not the square bracket, and the dot (when outside a square bracket) must be escaped, as it otherwise means "any single character."

> However, doing this fails:
> mxg:/var/spool/MailScanner/quarantine/20110421/nonspam # egrep \[<br>]{5,}
> p3LJZSnX024470
> -bash: br: No such file or directory
>
> The file p3LJZSnX024470 is just a plain text file in a quarantine
> directory.

Again, you have a CLI escaping issue AND a regex issue. If you are not quoting that query, you need to escape almost every single punctuation character listed there. Alternatively, you could put that query in quotes.

"egrep \[<br>]{5,} p3L..." tells the shell to run egrep with the query "[", read its input from a file named "br", and redirect its output to "]{5,}"; your email file is left over as an ordinary argument. The shell gives up at the input redirection because there is no file named "br", which is the error you saw.

"egrep '[<br>]{5,}' p3L..." prevents the shell from trying to interpret your query but still has a bad query, as it looks for five or more consecutive occurrences of any character listed between the square brackets, so "<b>brr</b>" will match up to the slash.

> What am I missing? I'll turn this into a body rule once I get the
> syntax right then test it for a day or so w/a score of .01. If I'm not
> hitting legitimate mail I'll bump it up.

On top of all of this, egrep does not use Perl-compatible regular expressions (PCRE), though the regexps I've used so far are compatible with POSIX regexps as well as PCRE. See 'man perlre' (or your favorite website) for help on PCREs. Try using either grep -P (requires libpcre3) or pcregrep (which you may have to install) or else perl itself, like:

  perl -ne 'print if /whatever/' < DomainLiterals.txt

As to what that should be searching for, I suspect you want a multi-line expression (which none of the above shell commands will help you with, since they parse one line at a time). Try this:

  header  LOCAL_10_10_10_10  X-Spam-Relays-Untrusted =~ /^[^\[]+ ip=(?:10\.){3}/
  rawbody LOCAL_5X_BR_TAGS   /(?:<br\/?>[\s\r\n]{0,4}){5}/mi

That second one will also match <br/> and allows for a few spaces, tabs, or linebreaks in between the <br> tags. For a more strict version of what you're looking for, try this:

  rawbody LOCAL_5X_BR_TAGS   /(?:<br>){5}/i

Note that you need rawbody since body rules will strip HTML.

Again, this rule will hit some hams. It is also not terribly CPU-efficient. Better solution: put some examples up on a pastebin and link them to us so we can help you find more diagnostic (and simpler) patterns to nail them with.
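
P.S. If you want to see the difference between a character class and a group without fighting the shell, here is a minimal sketch. It uses Python's re module as a stand-in for Perl (my choice for illustration; the constructs used here behave the same way in both). The names CLASS_10, GROUP_10, CLASS_BR, GROUP_BR, hits and hits_any_line, and the sample strings, are all made up for this example. It also mimics grep's line-at-a-time behaviour to show why a run of tags spread over several lines is missed.

```python
import re

# Character class: three characters, each one of '1', '0' or '.'.
CLASS_10 = re.compile(r"[10.]{3}")
# Non-capturing group: the literal text "10." three times in a row.
GROUP_10 = re.compile(r"(?:10\.){3}")

# Character class: five or more of '<', 'b', 'r', '>' in any order.
CLASS_BR = re.compile(r"[<br>]{5,}")
# Group: five <br> or <br/> tags, each followed by up to four whitespace chars.
# (No "\/" needed here: Python patterns have no "/" delimiter to escape.)
GROUP_BR = re.compile(r"(?:<br/?>[\s\r\n]{0,4}){5}", re.IGNORECASE)


def hits(pattern, text):
    """Return the matched text, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group(0) if m else None


def hits_any_line(pattern, text):
    """Mimic grep: test each line separately instead of the whole text."""
    return any(pattern.search(line) for line in text.splitlines())


if __name__ == "__main__":
    ham = "<b>brr</b>"                     # hypothetical harmless markup
    spam = "<br>\n<BR/> <br><br>\r\n<br>"  # hypothetical run of five tags

    print(hits(CLASS_10, "111"), hits(GROUP_10, "111"))
    print(hits(GROUP_10, "[10.10.10.10]"))
    print(hits(CLASS_BR, ham), hits(GROUP_BR, ham))
    print(hits(GROUP_BR, spam) is not None, hits_any_line(GROUP_BR, spam))
```

The class versions fire on "111" and on the harmless "<b>brr</b>", while the grouped versions only fire on the real repeats. The last line shows the grouped pattern matching the whole text but not any single line of it, which is the line-at-a-time limitation of the shell tools.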