[qmailtoaster] Re: Special-case spam filtering based on envelope sender

Eric Shubert Sat, 23 Aug 2014 14:15:24 -0700

On 08/23/2014 12:07 PM, Eric Shubert wrote:

On 08/23/2014 11:26 AM, Eric Shubert wrote:

It appears that these spams are using random text that's hidden inside
of html in order to beat the bayes filter. At least that's my guess.


I'm guessing that if we write a filter/editor that strips out all
unviewable text from html content in a message before sending it to
sa-learn, the bayes filter will be effective once again.

Thoughts on this? Anyone know of a filter we can pipe messages through
on their way to sa-learn?
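
A rough sketch of what such a filter might look like, using only the Python standard library. Everything here is an assumption for illustration: the names (visible_text, HIDDEN_COLORS), the short list of colors treated as invisible, and the idea that checking font color and inline style color is enough. It only returns the visible text of an HTML fragment; unpacking the MIME parts and handing the result to sa-learn is left open.

```python
import re
from html.parser import HTMLParser

# Assumption: only these literal values count as "invisible" text colors.
HIDDEN_COLORS = {"white", "#fff", "#ffffff"}
# Void elements have no closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
# Matches a "color:" declaration but not "background-color:".
COLOR_IN_STYLE = re.compile(r"(?:^|;)\s*color\s*:\s*([^;]+)", re.I)


def _hides_text(tag, attrs):
    """Return True if this start tag sets a text color we treat as hidden."""
    attrs = {name: (value or "") for name, value in attrs}
    if tag == "font" and attrs.get("color", "").strip().lower() in HIDDEN_COLORS:
        return True
    match = COLOR_IN_STYLE.search(attrs.get("style", ""))
    return bool(match) and match.group(1).strip().lower() in HIDDEN_COLORS


class VisibleText(HTMLParser):
    """Collects text that is not inside an element flagged as hidden."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []    # (tag, hides) for every open element
        self.hidden = 0    # how many open elements currently hide text
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        hides = _hides_text(tag, attrs)
        self.stack.append((tag, hides))
        self.hidden += hides

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                for _, hides in self.stack[i:]:
                    self.hidden -= hides
                del self.stack[i:]
                break

    def handle_data(self, data):
        if not self.hidden:
            self.parts.append(data)


def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

A spammer can hide text in many other ways (tiny fonts, display:none, near-white colors), so this would only catch the crudest cases.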


It looks as though search engines also consider hidden text to be spam.
http://www.seologic.com/faq/hidden-text


Ok, so all of these that I've examined have
<font color="white">
in them to hide text at the end of the email.

You can quickly check to see if there's hidden text by selecting the
text (it changes color then). Viewing the source will show the technique
that's being used to hide the text. Actually,

<font color="white">

is a pretty unsophisticated technique from what I've read about it.
Fortunately it should be pretty easy to identify as well.


Looking into the SA rules, I see this:
body HTML_FONT_LOW_CONTRAST     eval:html_test('font_low_contrast')

describe HTML_FONT_LOW_CONTRAST HTML font color similar or identical to background

I would expect this to be finding such a thing. This is included in the
/var/lib/spamassassin/3.003002/updates_spamassassin_org/20_html_tests.cf
file. The Mail::SpamAssassin::Plugin::HTMLEval plugin is loaded according
to --lint.

So now I'm wondering, why isn't this rule firing for these messages? Is
the test so lame that it doesn't pick up the <font color="white"> as
being low contrast? I do see some HTML_FONT_LOW_CONTRAST occurrences in
the spamd log (maillog for me) files, so the rule is firing sometimes.
The scoring is:

score HTML_FONT_LOW_CONTRAST 0.713 0.001 0.786 0.001

That might be lower than it should be, but on these messages I'm seeing,
this rule isn't firing at all. Why not?
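
For reference, the four values on a score line correspond to SpamAssassin's four score sets: Bayes and network tests both off, network tests only, Bayes only, and both on. A small sketch of picking the value that applies to a given setup follows; the function name effective_score is made up for illustration.

```python
def effective_score(score_line, use_bayes, use_net):
    """Pick the score that applies for a given Bayes / network-test setup.

    Expects a line of the form 'score RULE_NAME v0 [v1 v2 v3]'.
    """
    parts = score_line.split()
    name = parts[1]
    values = [float(v) for v in parts[2:]]
    if len(values) == 1:
        # A single value is used for every score set.
        values = values * 4
    # Score set index: 0 = neither, 1 = net only, 2 = Bayes only, 3 = both.
    index = (2 if use_bayes else 0) + (1 if use_net else 0)
    return name, values[index]


line = "score HTML_FONT_LOW_CONTRAST 0.713 0.001 0.786 0.001"
print(effective_score(line, use_bayes=True, use_net=True))
```

So with Bayes and network tests both enabled, this rule contributes only 0.001 even when it does fire.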


I just received two more of these spams. This time, they both use
<div style="color:white">

to hide the (random) text. That's a *little* more sophisticated. Still,
the rule didn't fire.
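
To check a saved message source for the two techniques seen so far, something like the following could work. It is only a sketch: the function name hiding_techniques, the pattern labels, and the assumption that "white", "#fff" and "#ffffff" are the only colors worth flagging are all mine, and regular expressions are a blunt tool for HTML.

```python
import re

# Assumption: only pure white is flagged, written as a name or hex value.
WHITE = r"(?:white|#fff(?:fff)?)\b"

PATTERNS = {
    # e.g. <font color="white">
    "font color attribute": re.compile(
        r"<font\b[^>]*\bcolor\s*=\s*[\"']?" + WHITE, re.I),
    # e.g. <div style="color:white">, but not background-color
    "inline style color": re.compile(
        r"<\w+\b[^>]*\bstyle\s*=\s*[\"'][^\"']*(?<![-\w])color\s*:\s*" + WHITE,
        re.I),
}


def hiding_techniques(html):
    """Return the labels of the hiding techniques found in the HTML source."""
    return [label for label, pattern in PATTERNS.items() if pattern.search(html)]
```

Run against the HTML part of each sample, this would at least show which of the two tricks a given spam uses.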

So I think I'm on the right track with this rule. Just need to figure
out why it's not firing, and probably will need to adjust the scoring
upwards as well.


Stay tuned. (Or dig in yourself if you'd like a little challenge!)

--
-Eric 'shubes'

