--- Nick Croft wrote:
> Just wondering if someone could enlighten me as to the meaning, purpose or
> origin of those strange nonsense words in spam. I'm busy refining spamc
> & bogofilter atm and I see a lot of those words.

It's got to do with fooling Bayesian statistical filters, I think. See this article for a description:

http://www.paulgraham.com/spam.html

I guess what normally happens is that if a message is identified as spam, its keywords are added to the filter's database of spam keywords. These messages make sure that a bunch of nonsense is added to the database at the same time, eventually clogging it up with nonsense. That's just a guess - I'm no expert on this stuff.

Surely it's easy enough to filter out the nonsense, and _then_ run the message through the filter. If it were me, I'd be looking for _collocations_ of keywords (e.g. 'prescription' only if it co-occurs within a given span with 'viagra' or 'valium'...). But like I said, this isn't my field.

Are there any filters that attempt to scan for grammatical patterns, for example imperatives ("Visit our exciting web-site!") or clauses without a finite verb ("Free viagra content here!")? Or do they all just work at the vocabulary level?

mark

=====
mark a. bell
http://www.users.bigpond.com/m487396

--
SLUG - Sydney Linux User's Group - http://slug.org.au/
More Info: http://lists.slug.org.au/listinfo/slug
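
For what it's worth, here is a rough sketch of the collocation idea from the message above: only treat 'prescription' as suspicious when it turns up within a few words of 'viagra' or 'valium'. The function names, the tokenizer and the default span of 5 are all made up for illustration; this is not how spamc or bogofilter work, just one way the check might look.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def cooccurs_within(tokens, keyword, partners, span=5):
    """Return True if `keyword` occurs within `span` tokens of any partner word.

    `span` and the partner list are illustrative assumptions, not tuned values.
    """
    partner_positions = [i for i, tok in enumerate(tokens) if tok in partners]
    for i, tok in enumerate(tokens):
        if tok == keyword and any(abs(i - j) <= span for j in partner_positions):
            return True
    return False

# Hypothetical partner words, taken from the example in the message.
PARTNERS = {"viagra", "valium"}

spammy = tokenize("Get your prescription for cheap Viagra today")
harmless = tokenize("Please pick up my prescription at the pharmacy")

print(cooccurs_within(spammy, "prescription", PARTNERS))
print(cooccurs_within(harmless, "prescription", PARTNERS))
```

A real filter would presumably feed a hit like this in as one more feature alongside the per-word statistics rather than use it as a yes/no verdict on its own.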
