On Wed, 29 Jan 2014, Joe Quinn wrote:
On 1/29/2014 11:53 AM, Andy Jezierski wrote:
I've been noticing a lot of spam getting through with the same traits, a
bunch of random words within brackets. They all seem to come after the
</body> or the </html> tag. Anyone much more knowledgeable than me care
to assist with a rule to detect them?
Example:
</html>
</body>
<style>
<geehrter>
<convaincre>
<eingerichtet>
<piuttosto>
<meny>
...etc snipped.
I've been seeing that as well. They all seem to begin with <style>, which
keeps that crap from being rendered by mail client HTML parsers.
You can probably exploit the fact that nobody is ever going to write a style
block that doesn't match /[{}]/, but I haven't been able to experiment yet
with any rules.
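
A rough sketch of that brace check, written in Python for illustration rather
than as a SpamAssassin rule. The names STYLE_BLOCK and braceless_style_blocks
are made up for this example, and treating an unterminated <style> as running
to the end of the message is an assumption based on the sample above.

```python
import re

# Hypothetical helper, not part of SpamAssassin.
# Capture the body of each <style> block; if the block is never closed
# (as in the spam sample), capture everything up to the end of the text.
STYLE_BLOCK = re.compile(
    r"<style\b[^>]*>(.*?)(?:</style\s*>|\Z)",
    re.IGNORECASE | re.DOTALL,
)


def braceless_style_blocks(html):
    """Return bodies of <style> blocks that have content but no '{' or '}'."""
    suspicious = []
    for match in STYLE_BLOCK.finditer(html):
        body = match.group(1)
        # Real CSS rules need braces; a non-empty body without any is odd.
        if body.strip() and not re.search(r"[{}]", body):
            suspicious.append(body)
    return suspicious


if __name__ == "__main__":
    sample = "</html>\n</body>\n<style>\n<geehrter>\n<convaincre>\n"
    print(len(braceless_style_blocks(sample)))
```

One caveat with this approach: a legitimate style block that contains only an
@import line has no braces either, so it would be flagged as well.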
There is already a style gibberish rule.
http://ruleqa.spamassassin.org/20140128-r1562007-n/STYLE_GIBBERISH/detail
I wouldn't recommend going the more general route of counting invalid HTML
tags, simply due to the enormity of trying to maintain such a rule over time.
Not in a rule certainly. That would be more proper in a plugin. Agreed
that maintenance of the list of valid HTML tags would be an ongoing issue
unless the list is available in machine-parseable form somewhere and a
code generator based on that is used to support the plugin.
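
To make the plugin idea concrete, here is a minimal sketch in Python (a real
SpamAssassin plugin would be written in Perl). KNOWN_TAGS is a hand-written,
deliberately incomplete placeholder for the generated list discussed above,
and the name unknown_tags is hypothetical. The scan is a plain regex rather
than an HTML parser on purpose, since a parser may treat everything after
<style> as stylesheet text and never report the bogus tags.

```python
import re

# Assumption: illustrative subset only, NOT a complete list of HTML tags.
# In the scheme described above this set would be generated from a
# machine-parseable source instead of being maintained by hand.
KNOWN_TAGS = frozenset({
    "html", "head", "body", "style", "title", "p", "a", "div", "span",
    "br", "img", "table", "tr", "td", "b", "i", "font",
})

# Opening tags only: '<' followed directly by a letter.
OPEN_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b[^>]*>")


def unknown_tags(html):
    """Return lower-cased names of opening tags that are not in KNOWN_TAGS."""
    names = (name.lower() for name in OPEN_TAG.findall(html))
    return [name for name in names if name not in KNOWN_TAGS]


if __name__ == "__main__":
    sample = "</html>\n</body>\n<style>\n<geehrter>\n<convaincre>\n<meny>\n"
    print(unknown_tags(sample))
```

A plugin along these lines would presumably score on the count or ratio of
unknown tags rather than on any single hit, but that threshold is not
something this thread settles.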
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Maxim IX: Never turn your back on an enemy.
-----------------------------------------------------------------------
3 days until the 11th anniversary of the loss of STS-107 Columbia