Talking of which, does anyone know any good or interesting approaches to
identifying these junk strings?

A checksum-based spam filter (e.g. Vipul's Razor) could be modified to
checksum only the recognized words in an email.  Anything not found in a
standard wordlist would be stripped before the checksum was generated.
This would help for a while, and I'd be interested to hear about anything
out there that does it, but the spammers could counter it easily enough
by tacking on half a dozen common words selected at random instead of
junk strings.
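
A minimal sketch of the wordlist idea, to make it concrete.  This is not
how Vipul's Razor actually computes its signatures; the tiny WORDLIST and
the function name known_word_checksum are made up for illustration, and a
real system would load a full dictionary file instead.

```python
import hashlib
import re

# Hypothetical stand-in for a standard wordlist (a real one would be
# loaded from a dictionary file).
WORDLIST = {"buy", "cheap", "pills", "now", "click", "here", "house"}


def known_word_checksum(body, wordlist=WORDLIST):
    """Hash only the tokens found in the wordlist; drop everything else."""
    tokens = re.findall(r"[a-z]+", body.lower())
    kept = [t for t in tokens if t in wordlist]
    return hashlib.sha1(" ".join(kept).encode("utf-8")).hexdigest()


# Two copies of the same spam with different junk strings appended.
a = known_word_checksum("Buy cheap pills now! xqzvtk jhwpl")
b = known_word_checksum("Buy cheap pills now! mmrrgt oiuqwz")

# The same spam with a random dictionary word appended instead.
c = known_word_checksum("Buy cheap pills now! house")

print(a == b)  # junk is stripped, so the checksums match
print(a == c)  # a real word survives the filter, so they differ
```

The last comparison shows the weakness described above: a single real
word appended at random is enough to change the checksum.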

I presume that algorithms have been developed in the area of detecting
copyright violations which look at percentage overlap between different
bits of text, but I'm far from clear on how you could do that efficiently.
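
One way to measure percentage overlap is to break each text into
overlapping runs of k words ("shingles") and compare the two sets.  The
sketch below is only an assumption about what such a comparison could
look like, not a description of any particular copyright-detection tool;
the function names and the choice of k=3 are arbitrary.  It compares one
pair of texts and does nothing about the efficiency problem of comparing
against a large corpus.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word runs in text (lowercased)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def overlap(a, b, k=3):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


s1 = "buy cheap pills now click here today xqzvtk"
s2 = "buy cheap pills now click here today mmrrgt"
s3 = "minutes of the monthly meeting are attached for review"

print(overlap(s1, s2))  # high: only the trailing junk differs
print(overlap(s1, s3))  # no shared three-word runs
```

Unlike an exact checksum, this score degrades gradually, so a few random
words tacked on the end lower the overlap only slightly.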

Anyone have any pointers?

Andrew



On Sun, 15 Jun 2003 [EMAIL PROTECTED] wrote:

> Date: Sun, 15 Jun 2003 12:25:16 +1000
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: Re: [SLUG] Meaning of Nonsense in  s p a m
>
>
> I've always presumed that it was some type
> of cookie; in the obvious way, to validate the
> email, but also if you complained to the isp
> they would be able to know who complained if
> the isp then showed the complaint to the
> spammer.
>
>
> But yeah, trying to fool filters is probably the
> main purpose.
>
> Matt
>
>
>

--

No added Sugar.  Not tested on animals.  If irritation occurs,
discontinue use.

-------------------------------------------------------------------
Andrew McNaughton           In Sydney
                            Working on a Product Recommender System
[EMAIL PROTECTED]
Mobile: +61 422 753 792     http://staff.scoop.co.nz/andrew/cv.doc



-- 
SLUG - Sydney Linux User's Group - http://slug.org.au/
More Info: http://lists.slug.org.au/listinfo/slug
