(I'm dropping Xavier Leroy from the CC list.)

On Wed, Mar 16, 2005 at 12:54:33PM +1300, Tony Meyer wrote:

>>> I just got this "interesting" spam, you might be interested
>>> in. Clever way to evade bayesian filters that classify "I don't
>>> know" (which looks to me like is the guaranteed score here...)

>> I meant "if you get the details right", obviously. This particular
>> mail screwed the details (still sent HTML, still a spammy sentence
>> in the first link in the HTML (!), ...), but I wouldn't be
>> surprised if a mail with "the details right" would evade most bayesian
>> filtering. Am I missing something?

> I wrote a whole response to this before I figured out that these are
> ascii art pictures and not images (they look like images to me).

> The message will have to be HTML for this to work, though, or the ascii
> art is too huge to fit anything in (or the ways that it can be drawn
> are limited).  (Looking at the plain-text version of the message, I
> can't make out the words at all).

No? I don't run any HTML engine on my mails - ever (unless I have
vaguely checked the HTML code and it comes from a clueless family
member, and even then, it goes through "lynx -dump"), and the ascii
art was clear and readable to me. Maybe a question of habit (I suppose
you did use a fixed pitch font to look at the plain-text version? If
not, this would probably explain it), or my bad eyesight "averaged
out" the "pixels". ;-)

> I get a fair number of these sorts of messages now.

> What did the message score for you?

I don't have a well-trained spambayes, so I cannot give precise
figures.

> Your message scored 0.997712 for me, which isn't too bad considering
> it was to you and not me.

What does it score if you remove the spammy parts? I mean:

 - the following line:

   <a href="http://vietnamese.com.medattuneto.com/?Bggiw/x";>more convenience: LOow price meds<BR>

 - all HTML tags, and the text/html MIME declaration

 - maybe "satisfaction" from the title?
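To make the experiment concrete, here is a rough Python sketch of the
three removals above, using only the standard library. Everything in it
is an assumption for illustration: the function name
strip_spammy_parts and its parameters are made up, it only handles a
single-part message whose body is not transfer-encoded, and it does not
call spambayes at all. The cleaned message would still have to be fed
to the classifier by whatever means you normally use.

```python
import re
from email import message_from_string
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Keep the character data of an HTML document and drop every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_spammy_parts(raw, link_marker, subject_word):
    """Return a copy of the raw message without the obvious spam clues.

    Hypothetical helper, not part of spambayes:
      link_marker  - substring identifying the spammy <a href=...> link
      subject_word - word to delete from the Subject header
    """
    msg = message_from_string(raw)
    if msg.is_multipart():
        raise ValueError("this sketch only handles single-part messages")
    body = msg.get_payload()

    # 1. Drop the link and its text, up to the <BR> that follows it.
    link_re = re.compile(
        r"<a\s[^>]*" + re.escape(link_marker) + r"[^>]*>.*?<br\s*/?>",
        re.IGNORECASE | re.DOTALL,
    )
    body = link_re.sub("", body)

    # 2. Drop all remaining HTML tags and declare the body as plain text.
    parser = _TextOnly()
    parser.feed(body)
    parser.close()
    msg.set_payload("".join(parser.chunks))
    del msg["Content-Type"]
    msg["Content-Type"] = "text/plain"

    # 3. Drop the chosen word from the Subject header, if there is one.
    if "Subject" in msg:
        subject = re.sub(
            r"\b" + re.escape(subject_word) + r"\b",
            "",
            msg["Subject"],
            flags=re.IGNORECASE,
        )
        msg.replace_header("Subject", " ".join(subject.split()))

    return msg.as_string()
```

Comparing the score of the original with the score of the output of
strip_spammy_parts(raw, "medattuneto.com", "satisfaction") would show
how much of the spam score comes from those clues rather than from the
ASCII art itself.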

> (All but five of the spam clues were hapaxes, so this is almost
> certainly because I received something very similar that was unsure
> and then trained on).

I see. Were some of them the "random" words making up the ASCII art?
Then you may have gotten the very same spam before :)

> There are filters that make an effort to look at the message in 'eye
> space' (i.e. as the user sees it),

Really going to eye space for that kind of thing needs OCR... That's a
whole new level of complexity thrown in.

> If this sort of thing works, then more of that might be necessary,
> although I think there are other ways of countering this.

What ways are you thinking about?

-- 
Lionel
_______________________________________________
spambayes-dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/spambayes-dev