Re: Default Bayes Database

David F. Skoll Fri, 10 May 2013 14:59:24 -0700

On Fri, 10 May 2013 23:14:36 +0200
Karsten Bräckelmann <guent...@rudersport.de> wrote:


> I happened to be the lucky recipient of specific spam campaigns in
> languages I do not speak. By campaign I mean quite a few samples
> within a specific, relatively short time period. This definitely
> happened with French, Spanish, and Turkish. Odds are high for any word
> in those languages being on the seriously spammy side. Unlike for
> anyone actually speaking these languages...

We (probably) have a much larger sample population, so this tends not
to be as much of a problem for us.

> I do receive quite specific campaigns, plain text, no obfuscation,
> offering private health insurance ("Private Krankenversicherung" in
> German). That is a totally valid phrase. Unlike English, German tends
> to concatenate words to form specifics -- "Krankenversicherung" is
> pretty much a word-by-word translation of "health insurance". This
> makes the word rarer; "health" on its own, by comparison, hardly
> gives a hint. And the totally legit word is spammy for me, because I
> usually do not talk about that topic in mail. My next door neighbor
> probably would disagree...

Again, the key is a large sample size.

> "Your ham is someone else's spam" on a different level: There are
> quite a few reports in Bugzilla where an obfuscation pattern matches
> a legit word in non-English languages.

These are edge cases that are pretty easily handled with personal
Bayes databases or whitelisting if the system keeps getting it wrong.

> Accents are good for obfuscation. But accents also are entirely legit.

And we can tell which is which, based on a large sample size.

> PayPal. And them notifying their customers about changes in the terms
> of use. And actually sending out the full terms of use in the same
> mail. In this case, again, German -- but they managed to score a
> whopping 12.2 once for me. Yes, of course, BAYES_99.

Was this with your personal Bayes data?  Even that can be wrong sometimes...

Regards,

David.
