On Wed, 29 Jan 2014, Joe Quinn wrote:

On 1/29/2014 11:53 AM, Andy Jezierski wrote:
 I've been noticing a lot of spam getting through with the same traits, a
 bunch of random words within brackets.  They all seem to come after the
 </body> or the </html> tag.  Anyone much more knowledgeable than me care
 to assist with a rule to detect them?

 Example:

 </html>

 </body>
 <style>
 <geehrter>
 <convaincre>
 <eingerichtet>
 <piuttosto>
 <meny>

...etc snipped.

I've been seeing that as well. They all seem to begin with <style> too, which keeps that crap from being rendered by mail client HTML parsers.

You can probably exploit the fact that nobody is ever going to write a style block that doesn't match /[{}]/, but I haven't been able to experiment yet with any rules.
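For anyone who wants to experiment with that idea outside SpamAssassin first, here is a rough Python sketch. It is only an illustration, not a SpamAssassin rule: the helper name has_braceless_style and the regex are assumptions of mine, and an unclosed <style> is treated as running to the end of the message, which matches the samples quoted above.

```python
import re

# Hypothetical helper for experimentation; not part of SpamAssassin.
# Captures everything after an opening <style ...> tag up to the matching
# </style>, or up to the end of the text if the block is never closed.
STYLE_BLOCK = re.compile(
    r"<style\b[^>]*>(.*?)(?:</style\s*>|\Z)",
    re.IGNORECASE | re.DOTALL,
)


def has_braceless_style(html: str) -> bool:
    """Return True if any non-empty <style> block contains no '{' or '}'."""
    for match in STYLE_BLOCK.finditer(html):
        body = match.group(1)
        # Real CSS needs braces for its rule sets, so a non-blank block
        # without either brace is suspicious.
        if body.strip() and not re.search(r"[{}]", body):
            return True
    return False
```

Running it over a corpus of ham and spam before turning it into a rule would show whether legitimate mail ever trips it, for example a style block that holds nothing but a comment.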

There is already a style gibberish rule.

http://ruleqa.spamassassin.org/20140128-r1562007-n/STYLE_GIBBERISH/detail

I wouldn't recommend going the more general route of counting invalid HTML tags, simply because of the effort needed to maintain such a rule over time.

Not in a rule, certainly. That would be more appropriate in a plugin. Agreed that maintaining the list of valid HTML tags would be an ongoing issue unless the list is available somewhere in machine-parseable form and a code generator based on it is used to support the plugin.
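To make the tag-counting idea concrete, a minimal Python sketch follows. It is not a SpamAssassin plugin (those are written in Perl), and everything in it is an assumption for illustration: the KNOWN_TAGS set is deliberately tiny and incomplete, standing in for a list that would be generated from a machine-parseable source, and the name unknown_tags is made up. It scans with a regex rather than an HTML parser because a real parser would treat everything after <style> as stylesheet text and never see the bogus tags.

```python
import re

# Illustrative, deliberately incomplete allow-list (an assumption).
# A real list would be generated from a machine-readable source.
KNOWN_TAGS = frozenset({
    "html", "head", "body", "style", "title", "p", "a", "div", "span",
    "br", "img", "table", "tr", "td", "b", "i",
})

# Matches the name of any opening or closing tag, e.g. <p ...> or </p>.
TAG_NAME = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)")


def unknown_tags(html: str) -> list[str]:
    """Return the lower-cased names of tags not present in KNOWN_TAGS."""
    names = (name.lower() for name in TAG_NAME.findall(html))
    return [name for name in names if name not in KNOWN_TAGS]
```

A plugin built along these lines would presumably score on the count or the ratio of unknown tags rather than on any single hit, since legitimate mail sometimes carries vendor-specific tags.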

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Maxim IX: Never turn your back on an enemy.
-----------------------------------------------------------------------
 3 days until the 11th anniversary of the loss of STS-107 Columbia
