(I'm dropping Xavier Leroy from the CC list.)

On Wed, Mar 16, 2005 at 12:54:33PM +1300, Tony Meyer wrote:

>>> I just got this "interesting" spam, you might be interested
>>> in. Clever way to evade bayesian filters that classify "I don't
>>> know" (which looks to me like is the guaranteed score here...)

>> I meant "if you get the details right", obviously. This particular
>> mail screwed the details (still sent HTML, still a spammy sentence
>> in the first link in the HTML (!), ...), but I wouldn't be
>> surprised a mail with "the details right" would evade most bayesian
>> filtering. Am I missing something?

> I wrote a whole response to this before I figured out that these are
> ascii art pictures and not images (they look like images to me).

> The message will have to HTML for this to work, though, or the ascii
> art is too huge to fit anything in (or the ways that it can be drawn
> are limited). (Looking at the plain-text version of the message, I
> can't make out the words at all).

No? I don't run any HTML engine on my mails - ever (unless I have
vaguely checked the HTML code and it comes from a clueless family
member, and even then, it goes through "lynx -dump"), and the ascii
art was clear and readable to me. Maybe a question of habit (I suppose
you did use a fixed pitch font to look at the plain-text version? If
not, this would probably explain it), or my bad eyesight "averaged
out" the "pixels". ;-)

> I get a fair number of these sorts of messages now.

> What did the message score for you?

I don't have a well-trained spambayes, so I cannot give precise
figures.

> Your message scored 0.997712 for me, which isn't too bad considering
> it was to you and not me.

What does it score if you remove the spammy parts? I mean:

 - the following line:
   <a href="http://vietnamese.com.medattuneto.com/?Bggiw/x">more convenience: LOow price meds<BR>
 - all HTML tags, and the text/html MIME declaration
 - maybe "satisfaction" from the title?

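For anyone who wants to try that experiment, here is a minimal sketch of how the body could be cleaned before re-scoring, using only the Python standard library. It is not how spambayes tokenizes mail; the names `strip_html` and `remove_spammy_parts` are hypothetical, and the sample body and its example.invalid link are made up for illustration. It only covers the first two items (dropping the offending line and the HTML tags); changing the MIME declaration and the subject would have to be done on the headers separately.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects only the text between tags; the tags themselves are dropped."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(html):
    """Return the text content of an HTML fragment, without any tags."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


def remove_spammy_parts(body, drop_substrings):
    """Drop every line containing one of drop_substrings, then strip tags.

    Hypothetical helper: which substrings count as "spammy" is the
    caller's assumption, not something the filter decides.
    """
    kept = [
        line
        for line in body.splitlines(keepends=True)
        if not any(s in line for s in drop_substrings)
    ]
    return strip_html("".join(kept))


if __name__ == "__main__":
    # Made-up sample standing in for the spam body.
    sample = (
        "ascii art\n"
        '<a href="http://example.invalid/x">LOow price meds<BR>\n'
        "more art <b>here</b>\n"
    )
    print(remove_spammy_parts(sample, ["price meds"]))
```

The cleaned text could then be fed back to the classifier to see how much of the 0.997712 came from those parts rather than from the ASCII art itself.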
> (All but five of the spam clues were hapaxes, so this is almost
> certainly because I received something very similar that was unsure
> and then trained on).

I see. Were some of them the "random" words making up the ASCII art?
Then you may have gotten the very same spam before :)

> There are filters that make an effort to look at the message in 'eye
> space' (i.e. as the user sees it),

Really going to eye space for that kind of thing needs OCR... That's a
wholly new level of complexity thrown in.

> If this sort of thing works, then more of that might be necessary,
> although I think there are other ways of countering this.

What ways are you thinking about?

-- 
Lionel
_______________________________________________
spambayes-dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/spambayes-dev
