At 1:41 PM -0600 08/10/2013, Amir 'CG' Caspi wrote:
(The HTML comment gibberish rule would be a big step here, since that's one of the few things that would distinguish this from ham... unlikely that a real person would embed tens of KB of comment gibberish.)

OK, I'm trying to test an HTML comment gibberish rule and having some problems. I'm using the following test spam, the same I showed before:
http://pastebin.com/VCtvzjzV

I'm testing the following rule:

# HTML comment gibberish
rawbody HTML_COMMENT_GIBBERISH  /<!--\s*(?:[\w'"?.:;-]+\s+){100,}\s*-->/im
tflags HTML_COMMENT_GIBBERISH   multiple
describe HTML_COMMENT_GIBBERISH lots of spammy text in HTML comment
score HTML_COMMENT_GIBBERISH    0.001

Now, when I run this test spam through SA, I do get a hit, but only a single hit: the rule fires on the final HTML comment (the one beginning with "Simpsons"). However, there are two other HTML comments earlier in this email, and for some reason they are not hitting, even though I've set "tflags multiple" on the rule. (I was considering having a meta rule that scored extra for multiple comments.)
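
For reference, the kind of meta rule I have in mind would look roughly like the sketch below. The name HTML_COMMENT_GIBBERISH_MULTI and the 0.001 score are placeholders for testing, not something already in my config. It relies on the documented behavior that a rule with the "multiple" tflag evaluates to its hit count inside a meta expression.

```
# Hypothetical meta rule: fires when the rawbody rule above hits more than once.
# The rule name and score are placeholders for testing only.
meta HTML_COMMENT_GIBBERISH_MULTI     (HTML_COMMENT_GIBBERISH > 1)
describe HTML_COMMENT_GIBBERISH_MULTI more than one gibberish HTML comment
score HTML_COMMENT_GIBBERISH_MULTI    0.001
```

With only one hit being recorded on the test spam, a rule like this would not be expected to fire.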

My regex is valid and appropriate for those comments... I tested it at regexpal.com, which shows that all three comments match just fine (all three get highlighted).

So... why is SA hitting only on the final comment, and ignoring the first two? (I tried using a meta rule that popped if this hit more than once, and that meta rule did not pop. SA debug output shows only this one comment hitting, not the other two.) If my regex is fine and I've got tflags=multiple, what's preventing the other comments from hitting?

Thanks.

                                                --- Amir
