http://bugzilla.spamassassin.org/show_bug.cgi?id=3331





------- Additional Comments From [EMAIL PROTECTED]  2004-04-29 10:34 -------
I like the general idea of having the actual tokens available somehow for
analysis. I'm concerned about possible effects on performance if they are in the
main database. Michael's thoughts about getting them in a separate optional
table may be the way to go.

An important part of the optimizations was getting the records to be all fixed
length. A quote from the MySQL manual section on optimization:

"For MyISAM tables that change a lot, you should try to avoid all
variable-length columns (VARCHAR, BLOB, and TEXT). The table will use dynamic
record format if it includes a single variable-length column."
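
To illustrate why fixed-length records matter, here is a rough sketch in
Python. It is not MyISAM's on-disk format and not the actual Bayes schema; the
field layout (a 5-byte hash plus three counters) and the function names are
assumptions made up for the example. The point is only that with a fixed
record size, row n sits at a computable offset and can be updated in place.

```python
import struct

# Hypothetical fixed-length layout (an assumption for illustration only):
# 5-byte token hash, spam count, ham count, last-access time.
RECORD = struct.Struct("<5sIII")

def pack_records(rows):
    """Serialize rows back to back; every row occupies RECORD.size bytes."""
    return b"".join(RECORD.pack(*row) for row in rows)

def read_record(buf, n):
    """Row n starts at n * RECORD.size, so no scan of earlier rows is needed."""
    return RECORD.unpack_from(buf, n * RECORD.size)

def update_record(buf, n, row):
    """Overwrite row n in place; the buffer never grows and no row moves."""
    RECORD.pack_into(buf, n * RECORD.size, *row)
```

Adding a variable-length text column to such a row would break the offset
arithmetic, which is roughly the cost the manual is warning about.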

Note that such a table of tokens would contain information that is more
privacy-sensitive than the Bayes database, which has only hashes. That might
affect sharing of database information.

That said, I would like to see a suggestion that dealt with all the issues in a
sufficiently compelling way to get Theo to change his mind about his -1. I just
don't know what all would be in such a suggestion.

There may be something coming out of the idea that all we are interested in
having is the ability to usually look up the original text given a hash when
doing some analysis, but the mapping does not have to be maintained, expired,
etc., to the same degree of precision and reliability as the rest of the
database.
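
A minimal sketch of what such a best-effort side table could look like, in
Python with SQLite standing in for the real backend. Everything here is an
assumption for illustration: the table name token_text, the helper names, and
the use of a truncated SHA-1 as the hash. The main database would never depend
on this table; writes are allowed to fail silently and lookups may miss.

```python
import hashlib
import sqlite3

def token_hash(token):
    # Assumption: a truncated SHA-1 stands in for whatever hash the
    # Bayes store actually uses.
    return hashlib.sha1(token.encode("utf-8")).digest()[-5:]

def open_side_table(path=":memory:"):
    # Separate, optional store; the main Bayes tables are untouched.
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS token_text ("
        "hash BLOB PRIMARY KEY, text TEXT NOT NULL)"
    )
    return db

def remember(db, token):
    # Best effort: a failed write is ignored rather than propagated,
    # since the mapping is only an aid for later analysis.
    try:
        db.execute(
            "INSERT OR IGNORE INTO token_text (hash, text) VALUES (?, ?)",
            (token_hash(token), token),
        )
        db.commit()
    except sqlite3.Error:
        pass

def lookup(db, h):
    # Returns the original text if it was recorded, else None.
    row = db.execute(
        "SELECT text FROM token_text WHERE hash = ?", (h,)
    ).fetchone()
    return row[0] if row else None
```

Because the table is optional and unexpired, it can simply be deleted before a
database is shared, which sidesteps part of the privacy concern above.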




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
