http://bugzilla.spamassassin.org/show_bug.cgi?id=3331
------- Additional Comments From [EMAIL PROTECTED] 2004-05-02 14:53 -------

What about a standalone tool that you could run a corpus through? It would use the existing SpamAssassin code to split the input into tokens and hash them, then write each token to a separate database keyed on its hash.

Because it would be a standalone program with its own database, it would have no effect on performance. It would still be available whenever someone wants to analyze a Bayes db in terms of the original tokens. A few extra entries wouldn't hurt, and a few missing uninteresting tokens probably wouldn't either. This approach avoids any problems with configuration, backup/restore, etc., and the tool would not have to be run often.

If the tool is set up to add entries to an existing dictionary, someone could run it periodically on saved ham and spam to keep the dictionary built up. Alternatively, they could maintain their research corpus and, whenever they want to analyze Bayes results for it, run the tool over the corpus to build a fresh database.
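A minimal sketch of such a tool in Python, under several assumptions. The regex tokenizer is a hypothetical stand-in for SpamAssassin's own tokenizer, which the real tool would reuse so that the hashes line up with the Bayes db. The hash is assumed to be the last 5 bytes of the token's SHA-1. The dictionary is kept in SQLite. The names `tokenize`, `hash_token`, `open_dictionary`, `add_corpus` and `lookup` are illustrative only.

```python
import hashlib
import re
import sqlite3
from typing import Iterable, List, Optional

# Hypothetical stand-in for SpamAssassin's tokenizer: runs of 3-40
# non-whitespace characters. A real tool would call SpamAssassin's own code.
TOKEN_RE = re.compile(r"\S{3,40}")


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text)


def hash_token(token: str) -> bytes:
    # Assumption: the Bayes db keys tokens on the last 5 bytes of their SHA-1.
    return hashlib.sha1(token.encode("utf-8")).digest()[-5:]


def open_dictionary(path: str = ":memory:") -> sqlite3.Connection:
    # Separate database from the Bayes db, so normal scanning is unaffected.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tokens ("
        "hash BLOB PRIMARY KEY, token TEXT NOT NULL)"
    )
    return conn


def add_corpus(conn: sqlite3.Connection, messages: Iterable[str]) -> None:
    # INSERT OR IGNORE keeps the first token seen for each hash, so
    # rerunning over newly saved ham/spam only adds missing entries.
    rows = ((hash_token(t), t) for msg in messages for t in tokenize(msg))
    conn.executemany(
        "INSERT OR IGNORE INTO tokens (hash, token) VALUES (?, ?)", rows
    )
    conn.commit()


def lookup(conn: sqlite3.Connection, token_hash: bytes) -> Optional[str]:
    # Map a hash from the Bayes db back to its original token, if known.
    row = conn.execute(
        "SELECT token FROM tokens WHERE hash = ?", (token_hash,)
    ).fetchone()
    return row[0] if row else None
```

Usage: `add_corpus(conn, ["Cheap meds online now"])` followed by `lookup(conn, hash_token("Cheap"))` returns `"Cheap"`, while a hash that never appeared in the corpus returns `None`, which is the "a few tokens missing" case above. Pointing `open_dictionary` at an existing file builds the dictionary up over time; pointing it at a new file gives a fresh database for a research corpus.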
