J Anjen <scruple532 at ...> writes:

> The signup process for the devl and tech mailing lists
> have been screwed up lately so I'm not sure if this
> will make it or not.
> 
> Come to #Frost on TOR if you want to coordinate.
> 
> Here's a copy of the script I've been working on, it's
> not done yet and it doesn't have Bayesian although
> Reverend might work if I can get more documentation.
--snip--

Thanks for making the effort, but please realise that Bayesian or any
other kind of body content filtering can never work on its own against
completely freeform spam, so I wouldn't spend too much time on this
approach if I were you.

It does work on email spam, because email spam needs some structure to
serve its purpose: there's usually a site URL; the headers are
virtually always forged in detectable ways; if there's an originating
IP you can see whether it's in cable modem zombie land; phishing is
generally disguised in a few common ways; there are often symbols or
unusual letter patterns to get past naive string filters;
viruses/worms generally use a few static attachment names or are all
the same size; uncommon words are often randomly appended as hash
busters; and so on.
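
To illustrate what I mean by structure, here is a rough Python sketch
of a few such checks. It is only an illustration, not a real filter:
the function name, the signal names, the attachment list and the
"no Received header" rule are all my own assumptions, chosen to keep
the example short.

```python
import re
from email import message_from_string

URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)

# Hypothetical attachment names, for illustration only.
SUSPECT_ATTACHMENTS = {"document.pif", "invoice.exe"}


def structural_signals(raw):
    """Return the set of structural hints found in one raw email."""
    msg = message_from_string(raw)
    signals = set()
    body_parts = []
    for part in msg.walk():
        # Static attachment names are one of the hints mentioned above.
        name = part.get_filename()
        if name and name.lower() in SUSPECT_ATTACHMENTS:
            signals.add("suspect_attachment")
        if part.get_content_type() == "text/plain":
            payload = part.get_payload()
            if isinstance(payload, str):
                body_parts.append(payload)
    body = "\n".join(body_parts)
    # Email spam usually has to point the reader at a site.
    if URL_RE.search(body):
        signals.add("has_url")
    # Assumed stand-in for "headers forged in detectable ways".
    if msg.get("Received") is None:
        signals.add("no_received_header")
    return signals
```

A message whose body is a single symbol and whose headers look normal
gives an empty set here, which is exactly the situation with Frost
spam: there is nothing structural to hang a rule on.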

The only purpose of Frost spam, on the other hand, is to annoy you, so
it can be absolutely anything. So far we've had racist jokes, blank
posts, random single symbols and one-liners from some story or other,
but it could be anything else. How do you propose to filter text
pulled from random blog searches, or other people's valid posts being
reposted, for example? How about Markov chain output? You can't, so
don't bother trying.
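
The reposting case is easy to demonstrate. Below is a toy word-count
Bayesian scorer in Python, a minimal sketch under my own assumptions
(add-one smoothing, no class priors, made-up training posts, and
function names invented for this example). It is not the script under
discussion and not any particular library's method.

```python
import math
from collections import Counter


def train(docs):
    """Count word occurrences over a list of posts."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def spam_log_odds(text, spam_counts, ham_counts):
    """Sum of per-word log-likelihood ratios; above 0 leans spam."""
    vocab = set(spam_counts) | set(ham_counts)
    # Add-one smoothing so unseen words do not give log(0).
    spam_total = sum(spam_counts.values()) + len(vocab)
    ham_total = sum(ham_counts.values()) + len(vocab)
    score = 0.0
    for word in text.lower().split():
        if word not in vocab:
            continue  # words never seen in training carry no evidence
        p_spam = (spam_counts[word] + 1) / spam_total
        p_ham = (ham_counts[word] + 1) / ham_total
        score += math.log(p_spam) - math.log(p_ham)
    return score


# Made-up training data, for illustration only.
ham_posts = ["the new freenet build fixes the insert bug",
             "frost boards load faster after the update"]
spam_posts = ["stupid joke stupid joke",
              "random symbols random junk"]
ham_counts = train(ham_posts)
spam_counts = train(spam_posts)
```

With this data, spam_log_odds("stupid joke", spam_counts, ham_counts)
comes out positive, but a spammer reposting ham_posts[0] word for word
scores negative. The filter has been trained to let exactly that text
through.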

The *only* viable approach to the spam problem I have seen is a full
web of trust system. When an identity must behave for a significant
time before any number of people will read it, and is ignored across
whole trust webs as soon as it misbehaves, a trusted identity
effectively becomes expensive, so spamming takes a lot more effort
and is much less effective.
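
As a very rough sketch of the idea in Python (my own simplification,
not an existing Frost feature or protocol): each identity publishes
+1/-1 ratings of others, a reader follows positive ratings outward for
a few hops, and anyone rated negatively by an identity the reader
already trusts is dropped. The function name, the rating values and
the hop limit are all assumptions. A real system would also need
weighting, so that a single trusted identity cannot block everyone
else on its own.

```python
from collections import deque


def visible_identities(me, ratings, max_hops=3):
    """Return the identities `me` would read.

    ratings maps rater -> {target: +1 or -1}.
    """
    # Breadth-first walk along positive ratings only.
    hops = {me: 0}
    queue = deque([me])
    while queue:
        rater = queue.popleft()
        if hops[rater] >= max_hops:
            continue
        for target, value in ratings.get(rater, {}).items():
            if value > 0 and target not in hops:
                hops[target] = hops[rater] + 1
                queue.append(target)
    trusted = set(hops)
    # One negative rating from any trusted identity hides the target.
    blocked = {
        target
        for rater in trusted
        for target, value in ratings.get(rater, {}).items()
        if value < 0
    }
    return trusted - blocked - {me}
```

An identity nobody has vouched for never shows up at all, and one
that was vouched for disappears for the whole web as soon as a
trusted identity marks it down.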

Bob


