Amir 'CG' Caspi wrote:
> Spample:
> http://pastebin.com/0jEMBA1X
> 
> The other unfortunate thing is that this SHOULD have popped
> HTML_COMMENT_GIBBERISH (my own home-baked version since there's not a
> public one), but it didn't pop that one either.  Of course, as I have
> posted previously, I've had problems getting SA to hit
> HTML_COMMENT_GIBBERISH even when it should, i.e. when feeding the mail
> into regexpal.com, it says there are hits, but SA, for some unknown
> reason, does not.  So, I guess it's not surprising that
> HTML_COMMENT_GIBBERISH didn't hit, but I still don't know why not.
> 
> (For posterity, my HTML_COMMENT_GIBBERISH rule is the following:
> rawbody HTML_COMMENT_GIBBERISH      
> /<!--\s*(?:[\w'"?.:;-]+\s+){100,}\s*-->/im
> 
> I'm sure this isn't quite the best one,

This will probably hit on a lot of mail sent from Outlook as it
regularly includes comments longer than 100 characters.  Yes, I found
this out the hard way...

I've gone through several "gibberish comment" rules over time, but the
only one that I've still got active is this:

rawbody LONG_COMMENT    m|<!--[^>{};]{200,}-->|s

IIRC the exclusions are there to prevent hits on legitimately commented
JavaScript, which may (sort of) reasonably run over 200 characters.  I'd
give JavaScript in email a solid 10 points on its own if it didn't show
up so often in ham.  :(
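
For anyone who wants to poke at the LONG_COMMENT pattern outside SA, here is a minimal sketch in Python.  The pattern is copied from the rule above; the function name and the three sample strings are made up for illustration, and this only tests the regex against one whole string.  It does not model how SA feeds text to rawbody rules.

```python
import re

# Python translation of the rule's pattern m|<!--[^>{};]{200,}-->|s.
# The pattern contains no '.', so the /s modifier has no effect on it;
# the negated character class already matches newlines.
LONG_COMMENT = re.compile(r"<!--[^>{};]{200,}-->")


def hits_long_comment(raw_body: str) -> bool:
    """Return True if raw_body contains a comment the pattern matches."""
    return LONG_COMMENT.search(raw_body) is not None


# Hypothetical samples, invented for this sketch.
gibberish = "<!-- " + "lorem ipsum " * 20 + "-->"    # 241 chars of plain words
short_note = "<!-- header table starts here -->"     # well under 200 chars
commented_js = "<!-- " + "var x = 1; " * 30 + "-->"  # long, but full of ';'

print(hits_long_comment(gibberish))     # True
print(hits_long_comment(short_note))    # False
print(hits_long_comment(commented_js))  # False
```

The last sample shows what the exclusions buy: a single ';' (or '>', '{', '}') anywhere in the comment breaks the run, so the comment never reaches 200 consecutive allowed characters.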

The other thing I've been seeing more recently is *very* large spams
(600-900K, a handful over 1M) with several very large comments.  After
my own round of questions here about these, I used this rule for a while:

body OVERSIZE_COMMENT eval:html_text_match('comment', '(?s)^(?=.{32000})')
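
The pattern in that rule is just a length check written as a lookahead: it matches at the start of the text only if at least 32000 characters follow.  A rough sketch in Python, assuming the eval hands the pattern the collected comment text as a single string (the function name here is hypothetical):

```python
import re

# Pattern copied from the rule.  (?s) lets '.' match newlines too, and
# the lookahead consumes nothing; it only asserts 32000 chars exist.
OVERSIZE = re.compile(r"(?s)^(?=.{32000})")


def is_oversize(comment_text: str) -> bool:
    """Return True if comment_text is at least 32000 characters long."""
    return OVERSIZE.search(comment_text) is not None


print(is_oversize("x" * 31999))    # False
print(is_oversize("x" * 32000))    # True
print(is_oversize("a\n" * 16000))  # True, newlines count because of (?s)
```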

-kgd
