On 04/02/2012 09:40 AM, Kris Deugau wrote:
> Can anyone point out what bit of stupidity I'm committing in trying
> to use this:
> 
> rawbody OVERSIZE_COMMENT        m|<!--(?!-->).{32000,}|s
> 
> to match messages that are mostly very very long HTML comment(s)?
> 
> Testing the same regex against the whole raw message outside of SA
> seems to fire just fine.
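For reference, the standalone test described above can be reproduced with a small Python sketch, using `re.S` in place of Perl's `/s`. The message builder, the 40,000-character comment and the 2048-character chunk size are all hypothetical. The chunked check only illustrates a property of the regex: a pattern that needs 32,000+ characters can never fire if a rule sees the text in smaller pieces. It says nothing definitive about how SpamAssassin actually feeds rawbody rules.

```python
import re

# The pattern from the rawbody rule above; re.S plays the role of Perl's /s
# so that "." also matches newlines inside the comment.
OVERSIZE_COMMENT = re.compile(r"<!--(?!-->).{32000,}", re.S)


def build_test_message(comment_len: int) -> str:
    """Build a hypothetical raw message whose HTML body is mostly one long comment."""
    headers = (
        "From: sender@example.com\n"
        "To: rcpt@example.com\n"
        "Subject: test\n"
        "Content-Type: text/html\n"
        "\n"
    )
    # Wrap the comment filler every 76 characters, as a mail client might.
    filler = "x" * comment_len
    wrapped = "\n".join(filler[i:i + 76] for i in range(0, len(filler), 76))
    return headers + "<html><body><!--" + wrapped + "--><p>hi</p></body></html>\n"


def matches_whole(raw: str) -> bool:
    """Test the pattern against the entire raw message as one string."""
    return OVERSIZE_COMMENT.search(raw) is not None


def matches_any_chunk(raw: str, chunk_size: int) -> bool:
    """Test the pattern against consecutive fixed-size pieces of the message.

    chunk_size is a hypothetical parameter modelling a scanner that hands
    the rule the message in pieces rather than all at once.
    """
    return any(
        OVERSIZE_COMMENT.search(raw[i:i + chunk_size])
        for i in range(0, len(raw), chunk_size)
    )


if __name__ == "__main__":
    raw = build_test_message(40_000)
    print(matches_whole(raw))            # True
    print(matches_any_chunk(raw, 2048))  # False
```

No piece shorter than 32,004 characters can contain `<!--` followed by 32,000 more characters, so the chunked check is always false here, however the comment is laid out.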

There are already a few rules that do this sort of thing.  Use them as
models:

% grep html_text_match..comment 20_html_tests.cf
body HTML_COMMENT_SHORT eval:html_text_match('comment', '<!(?!-).{0,6}>')
body HTML_COMMENT_SAVED_URL eval:html_text_match('comment', '<!-- saved from url=\(\d{4}\)')
body __COMMENT_EXISTS eval:html_text_match('comment', '<!.*?>')

Try this:

body OVERSIZE_COMMENT  eval:html_text_match('comment', '<!--(?!.?-->).{512,}-->')
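A rough Python model of how that rule is meant to behave. It assumes that `html_text_match('comment', ...)` returns true when the regex matches any single comment the HTML parser collected, with the comment delimiters included, as the existing patterns suggest. The comment collector and the helper names are hypothetical stand-ins, not SpamAssassin's parser or plugin code, and Python's `re` substitutes for Perl's regex engine.

```python
from __future__ import annotations

import re

# Body of the suggested OVERSIZE_COMMENT rule. No re.S flag, mirroring a plain
# Perl match without /s, so "." will not cross a newline in this sketch.
OVERSIZE_COMMENT = re.compile(r"<!--(?!.?-->).{512,}-->")

# Hypothetical stand-in for the HTML parser: grab each <!-- ... --> comment,
# delimiters included. SpamAssassin's real parser is more careful than this.
COMMENT_RE = re.compile(r"<!--.*?-->", re.S)


def collect_comments(html: str) -> list[str]:
    """Return every <!-- ... --> comment in the HTML, in document order."""
    return COMMENT_RE.findall(html)


def html_text_match(comments: list[str], pattern: re.Pattern) -> bool:
    """Hypothetical model of html_text_match('comment', ...): true if the
    pattern matches any one collected comment."""
    return any(pattern.search(c) for c in comments)


short_comment = "<!-- saved from url=(0014)about:internet -->"
long_comment = "<!--" + "x" * 600 + "-->"
html = "<html><body>" + short_comment + "<p>hi</p>" + long_comment + "</body></html>"

print(html_text_match(collect_comments(html), OVERSIZE_COMMENT))  # True
```

Because each comment is tested on its own, the `{512,}` bound applies per comment rather than to the whole message body.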

Anything more than 512 chars won't make the rule any more useful, but it
will end up being computationally expensive (I've played with this idea).
Also, I'd say this is more of a ham indicator than a spam indicator.
