> What about having a tool that a corpus could be run through which uses
> the existing code in SpamAssassin to split up the input into tokens,
> hash them, and then writes to a database of tokens keyed on the hash.

Way too complicated.

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/  and open source consulting
