On Thu, 2019-03-21 at 09:23 -0700, John Hardin wrote:
> On Thu, 21 Mar 2019, Savvas Karagiannidis wrote:
> 
> > What should be considered is the message's language. All messages
> > that were 
> > false positives had the following mime encoding (messages were
> > actually in 
> > greek):
> > 
> > Content-Type: text/[plain|html]; charset="windows-1253" or
> > Content-Type: text/[plain|html]; charset="iso-8859-7"
> > 
> > while all messages that were actual spam and were properly detected
> > had:
> > 
> > Content-Type: text/[plain|html]; charset="utf-8"
> 
> It should be fairly easy to add an exclusion based on that
> information. 
> However, that information may well be leveraged by spammers who are
> using that obfuscation...
> 
FWIW roughly 10% of my spam corpus uses <font> tags to set white text.
The ratio of "white" to "#ffffff" is roughly 1/3 to 2/3. I should say that
some of these messages are quite old - I keep them as test data when
I'm writing new rules: they are NOT used for Bayes training.

My mail archive contains 192540 messages. In theory it contains no spam
apart, that is, from a small amount that has eeled its way in. 145
messages in it contain 'color="white"' and 2293 contain
'color="#ffffff"'. The combination makes up 1.27% of the archived
messages.
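
For anyone who wants to repeat the count on their own archive, here is a
rough Python sketch. It is not the tool I used: the function name
white_font_share and the regex are made up for illustration, and it
assumes you already have the message bodies as strings. The last lines
just re-check the percentage above, on the assumption that no message
contains both forms.

```python
import re

# Matches color="white" or color="#ffffff" with either quote style, any case.
WHITE_FONT = re.compile(r'color=["\'](?:white|#ffffff)["\']', re.IGNORECASE)


def white_font_share(bodies):
    """Return (hits, total, percent) for bodies that set a white font colour."""
    bodies = list(bodies)
    hits = sum(1 for body in bodies if WHITE_FONT.search(body))
    total = len(bodies)
    percent = 100.0 * hits / total if total else 0.0
    return hits, total, percent


# Re-check of the archive figures: 145 + 2293 hits out of 192540 messages.
archive_percent = 100.0 * (145 + 2293) / 192540
print(f"{archive_percent:.2f}%")
```

Counting each message once, as the function does, avoids double counting
when a message uses both spellings.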

My take is that it may deserve a small score,
but it is probably best used as a subrule.
 

Martin

