--- Nick Croft wrote:
> Just wondering if someone could enlighten me as to the meaning,
> purpose or origin of those strange nonsense words in spam.  I'm busy
> refining spamc & bogofilter atm and I see a lot of those words.

It's got to do with fooling Bayesian statistical filters, I think. See
this article for a description:
http://www.paulgraham.com/spam.html

I guess what normally happens is that if a message is identified as
spam, its keywords are added to the filter's database of spam
keywords. These messages make sure that a bunch of nonsense is added
to the database at the same time, eventually clogging it up.
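
To make that guess a bit more concrete, here is a minimal Python sketch
of the sort of per-word bookkeeping a Bayesian-style filter might do.
The names (TokenDB, train, spam_probability) are made up for
illustration and are not taken from bogofilter or SpamAssassin, and the
probability is a simplified ratio rather than the formula in the
article above.

```python
from collections import Counter
import re


def tokenize(text):
    """Split a message into lowercase word tokens."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


class TokenDB:
    """Hypothetical per-token counts for spam and non-spam (ham) messages."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def train(self, text, is_spam):
        # Every distinct token in the message is counted once,
        # including any nonsense words that came along with it.
        target = self.spam if is_spam else self.ham
        target.update(set(tokenize(text)))

    def spam_probability(self, token):
        # Simplified ratio; tokens never seen before get a neutral 0.5.
        s, h = self.spam[token], self.ham[token]
        if s + h == 0:
            return 0.5
        return s / (s + h)


db = TokenDB()
db.train("cheap viagra now xqzvt blorfle", is_spam=True)
db.train("meeting notes for the linux user group", is_spam=False)

print(db.spam_probability("viagra"))   # seen only in spam
print(db.spam_probability("blorfle"))  # nonsense word is now in the spam table
print(len(db.spam))                    # distinct spam tokens stored so far
```

In this toy version each nonsense word becomes one more entry in the
spam table, which is the "clogging" effect I'm guessing at. Whether
real filters suffer from it in practice is a separate question.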

That's just a guess - I'm no expert on this stuff. Surely it's easy
enough to filter out the nonsense, and _then_ run the message through
the filter. 

If it were me, I'd be looking for _collocations_ of keywords (e.g.
'prescription' only if it co-occurs within a given span with 'viagra'
or 'valium'...). But like I said, this isn't my field.
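
A rough Python sketch of what I mean by a collocation check follows.
The function name cooccurs_within, the word list, and the default span
of 5 tokens are all my own assumptions for illustration, not something
an existing filter is known to do.

```python
import re


def tokenize(text):
    """Split a message into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cooccurs_within(text, keyword, partners, span=5):
    """Return True if `keyword` occurs within `span` tokens of any partner word."""
    tokens = tokenize(text)
    partner_positions = [i for i, t in enumerate(tokens) if t in partners]
    for i, t in enumerate(tokens):
        if t == keyword and any(abs(i - j) <= span for j in partner_positions):
            return True
    return False


# Hypothetical partner words; a real list would have to be learned or curated.
DRUG_WORDS = {"viagra", "valium"}

spammy = "No prescription needed for viagra"
innocent = "My doctor renewed my prescription today"

print(cooccurs_within(spammy, "prescription", DRUG_WORDS))
print(cooccurs_within(innocent, "prescription", DRUG_WORDS))
```

The idea is that 'prescription' on its own would not count against a
message, only 'prescription' close to one of the partner words.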

Are there any filters that attempt to scan for grammatical patterns,
like for example imperatives ("Visit our exciting web-site!") or
clauses without a finite verb ("Free viagra content here!")? Or do
they all just work at the vocabulary level?

mark

=====
mark a. bell
http://www.users.bigpond.com/m487396

-- 
SLUG - Sydney Linux User's Group - http://slug.org.au/
More Info: http://lists.slug.org.au/listinfo/slug
