> What about having a tool that a corpus could be run through, which uses
> the existing code in SpamAssassin to split the input into tokens,
> hash them, and then write them to a database of tokens keyed on the hash?

Way too complicated.

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting
