http://bugzilla.spamassassin.org/show_bug.cgi?id=3331





------- Additional Comments From [EMAIL PROTECTED]  2004-05-02 14:53 -------
What about a tool that a corpus could be run through, which uses the existing
code in SpamAssassin to split the input into tokens, hash them, and then write
each token to a database keyed on its hash?
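
A rough sketch of what such a tool might look like, in Python rather than
SpamAssassin's own Perl. Everything named here is hypothetical: the whitespace
tokenizer merely stands in for SpamAssassin's real tokenizer, the hash (last 5
bytes of SHA-1) is an assumed scheme chosen for illustration, and the SQLite
table layout is invented for this example.

```python
import hashlib
import re
import sqlite3


def tokenize(text):
    # Stand-in tokenizer (assumption): whitespace-delimited runs.
    # A real tool would reuse SpamAssassin's own tokenizer here.
    return re.findall(r"\S+", text)


def hash_token(token):
    # Assumed hashing scheme for illustration: last 5 bytes of SHA-1.
    return hashlib.sha1(token.encode("utf-8")).digest()[-5:]


def open_dictionary(path):
    # Separate database, independent of the Bayes db itself.
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS tokens ("
        "hash BLOB PRIMARY KEY, token TEXT NOT NULL)"
    )
    return db


def add_message(db, text):
    # Record hash -> original token; hashes already present are left alone.
    rows = [(hash_token(t), t) for t in set(tokenize(text))]
    db.executemany(
        "INSERT OR IGNORE INTO tokens (hash, token) VALUES (?, ?)", rows
    )
    db.commit()


def lookup(db, token_hash):
    # Map a hash seen in the Bayes db back to its original token, if known.
    row = db.execute(
        "SELECT token FROM tokens WHERE hash = ?", (token_hash,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    db = open_dictionary(":memory:")
    add_message(db, "Buy cheap pills now")
    print(lookup(db, hash_token("cheap")))
```

Because inserts ignore hashes that are already present, the same database can
be added to repeatedly or rebuilt from scratch over a corpus.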

As a standalone program with a separate database it would have no effect on
performance, but it would be available to be run when someone wants to be able
to analyze a Bayes db in terms of the original tokens. If there are a few extra
entries, that doesn't hurt, and if a few uninteresting tokens are missing that
probably doesn't hurt either.

This way there are no problems with configuration, backup/restore, etc., and the
tool would not have to be run often. If it is set up to add entries to an
existing dictionary, someone could run it periodically on saved ham and spam to
keep the dictionary built up. Alternatively, they could maintain their research
corpus and, when they want to analyze Bayes results for it, run the tool over
the corpus to make a fresh database.




