Amir 'CG' Caspi wrote:
> Spample:
> http://pastebin.com/0jEMBA1X
>
> The other unfortunate thing is that this SHOULD have popped
> HTML_COMMENT_GIBBERISH (my own home-baked version since there's not a
> public one), but it didn't pop that one either. Of course, as I have
> posted previously, I've had problems getting SA to hit
> HTML_COMMENT_GIBBERISH even when it should, i.e. when feeding the mail
> into regexpal.com, it says there are hits, but SA, for some unknown
> reason, does not. So, I guess it's not surprising that
> HTML_COMMENT_GIBBERISH didn't hit, but I still don't know why not.
>
> (For posterity, my HTML_COMMENT_GIBBERISH rule is the following:
> rawbody HTML_COMMENT_GIBBERISH
> /<!--\s*(?:[\w'"?.:;-]+\s+){100,}\s*-->/im
>
> I'm sure this isn't quite the best one,

This will probably hit on a lot of mail sent from Outlook, as it regularly includes comments longer than 100 characters. Yes, I found this out the hard way...

I've gone through several "gibberish comment" rules over time, but the only one that I've still got active is this:

rawbody LONG_COMMENT m|<!--[^>{};]{200,}-->|s

IIRC the exclusions are to prevent hits on legitimately commented JavaScript, which may (sort of) reasonably run over 200 characters. I'd give JavaScript in email a solid 10 points on its own if it didn't show up so often in ham. :(

The other thing I've been seeing more recently is *very* large spams (600-900K, a handful over 1M) with several very large comments. After my own round of questions here about these, I used this rule for a while:

body OVERSIZE_COMMENT eval:html_text_match('comment', '(?s)^(?=.{32000})')

-kgd
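
For anyone who wants to poke at these patterns outside SA, here is a rough sketch in Python that runs the three regexes from this thread against a few made-up comments. The sample strings and the `hits` helper are my own inventions for illustration, not anything taken from the spample. Python's `re` is close to Perl's regex syntax but not identical, and SA may not hand the whole body to a rule as one string, so agreement here (or on regexpal) does not guarantee a hit in SA.

```python
import re

# The three patterns from the rules in this thread, carried over to Python's
# re module. Flags mirror the originals: /im on the first, /s on the second,
# and the inline (?s) on the third.
GIBBERISH = re.compile(r"""<!--\s*(?:[\w'"?.:;-]+\s+){100,}\s*-->""", re.I | re.M)
LONG_COMMENT = re.compile(r"<!--[^>{};]{200,}-->", re.S)
OVERSIZE = re.compile(r"(?s)^(?=.{32000})")


def hits(pattern, text):
    """Return True if the compiled pattern matches anywhere in text."""
    return pattern.search(text) is not None


# Hypothetical samples, invented for this sketch.
# 120 plain words inside one comment: the kind of filler both rules target.
word_salad = "<!-- " + "lorem ipsum dolor " * 40 + "-->"
# A long commented-out script: contains {, } and ; so LONG_COMMENT skips it.
script = "<!-- " + "function f() { var i = new Image(); i.src = 'x'; } " * 10 + "-->"

if __name__ == "__main__":
    for name, sample in (("word_salad", word_salad), ("script", script)):
        print(name, "GIBBERISH:", hits(GIBBERISH, sample),
              "LONG_COMMENT:", hits(LONG_COMMENT, sample))
    # OVERSIZE only looks at length: 32000 or more characters of any kind.
    print("OVERSIZE at 32000 chars:", hits(OVERSIZE, "x" * 32000))
```

The exclusion set in LONG_COMMENT is doing the work on the script sample: the first `{` ends the run of allowed characters well before 200, so the rule never fires on it.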