The following seems to be a really common way to end a spam message, yet no
rule catches it.  I've been trying to write such a rule, but I just don't know
enough about regexps to get it to work rather than blow up in my face.  I
assume you have to use a rawbody rule to catch this, but rawbody is still
line-oriented, and the pattern has to span multiple lines to work.

Sample junk:

</BODY></HTML>
<br>
baboon poppy burnett lengthwise hoar deadline diophantine infra pius malay
dire silica diffract alcestis beachcomb hammerhead scintillate pussy giant
genii deformation bodice anvil chromic scatterbrain contingent diploid
humphrey considerate globule beauteous drunkard bellini buoy palmyra
revolutionary cranelike cried amelia govern <br>
freeman fifty fragment gingham marks comptroller alfonso madmen rumford
erbium cochran continuity bainite antiquarian maurine demonstrable bambi
serf fickle denote arteriole janeiro cease drunk ifni chartroom antony coma
molten dupont allegory greengrocer airstrip drip crewel beriberi agone
calico instantiate <br>

---------------------------

This is in a valid HTML section of a MIME message.  There will be a MIME
boundary terminator later, so in general the message is not malformed except
for the tons of random words after the </HTML> tag.

I have seen this both with and without the <br> lines.  So a good rule has
to be able to skip an optional <br> on one or more lines after </HTML>, then
look for 20 or so lowercase-letters-only words, possibly mixed with more
<br> tags.
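
To make that description concrete, here is a rough sketch of the pattern as a
Python regexp, tried outside SpamAssassin.  The function name and the threshold
of 20 words are just illustrative assumptions, and it treats the whole body as
one string, which sidesteps the multi-line question rather than answering it.

```python
import re

# Hypothetical pattern, not an existing SpamAssassin rule:
#   </HTML> (any case), optional <br> tags, then at least 20
#   lowercase-only words, with <br> tags allowed between them.
TRAILING_WORDS = re.compile(
    r"(?i:</html>)\s*"            # closing tag, case-insensitive
    r"(?:(?i:<br>)\s*)*"          # optional <br> lines after it
    r"(?:[a-z]+\s+"               # one lowercase-only word plus whitespace
    r"(?:(?i:<br>)\s*)*"          # optional <br> tags mixed in
    r"){20,}"                     # assumed threshold: 20 or more words
)


def has_trailing_word_salad(raw_body: str) -> bool:
    """Return True if random lowercase words follow the </HTML> tag."""
    return TRAILING_WORDS.search(raw_body) is not None
```

Since \s matches newlines, the pattern crosses line breaks on its own once it
is given the whole body at once.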

Suggestions on how to scan across multiple lines while parsing the rawbody?
I'd think the rule would be simple if I could just figure out how to write
it!  :-)

        Loren
