On 04/02/2012 09:40 AM, Kris Deugau wrote:
> Can anyone point out what bit of stupidity I'm committing in trying
> to use this:
>
> rawbody OVERSIZE_COMMENT m|<!--(?!-->).{32000,}|s
>
> to match messages that are mostly very very long HTML comment(s)?
>
> Testing the same regex against the whole raw message outside of SA
> seems to fire just fine.

There are already a few rules that do this sort of thing. Use them as models:

% grep html_text_match..comment 20_html_tests.cf
body HTML_COMMENT_SHORT eval:html_text_match('comment', '<!(?!-).{0,6}>')
body HTML_COMMENT_SAVED_URL eval:html_text_match('comment', '<!-- saved from url=\(\d{4}\)')
body __COMMENT_EXISTS eval:html_text_match('comment', '<!.*?>')

Try this:

body OVERSIZE_COMMENT eval:html_text_match('comment', '<!--(?!.?-->).{512,}-->')

Any more than 512 chars isn't going to be helpful but will end up being computationally expensive (I've played with this idea). Also, I'd say this is more of a ham indicator than a spam indicator.
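
For anyone who wants to poke at the suggested pattern outside of SA first, here is a minimal sketch in Python. It only exercises the regex against plain strings; it does not reproduce how html_text_match('comment', ...) collects comment text, and the helper name is_oversize_comment and the sample strings are made up for illustration.

```python
import re

# Pattern copied from the suggested OVERSIZE_COMMENT rule. No DOTALL flag is
# set, so '.' stops at newlines (the suggested rule carries no /s modifier).
OVERSIZE_COMMENT = re.compile(r"<!--(?!.?-->).{512,}-->")


def is_oversize_comment(text: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in text."""
    return OVERSIZE_COMMENT.search(text) is not None


# Made-up samples: a short comment, a long one, and an empty comment
# followed by a long run of text that happens to end in '-->'.
short_comment = "<!-- " + "x" * 100 + " -->"
long_comment = "<!-- " + "x" * 600 + " -->"
empty_then_text = "<!---->" + "x" * 600 + "-->"

if __name__ == "__main__":
    for name, sample in [
        ("short_comment", short_comment),
        ("long_comment", long_comment),
        ("empty_then_text", empty_then_text),
    ]:
        print(name, is_oversize_comment(sample))
```

The third sample is there to show what the (?!.?-->) lookahead is for: it keeps an empty or one-character comment from serving as the start of a match that then runs on to some later "-->".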