I started looking over the details of what we do with Bayes and MySQL, and I have some questions.

The tables defined in sql/bayes_mysql.sql all have a username field that is varchar(200).

Why do we need a long username string in every record? Why is it varchar(200) when the username field in the MySQL userpref table is varchar(100)? If there really has to be a user id in each record, why not use the integer prefid field from the userpref table?
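To make the integer-id idea concrete, here is a minimal sketch of the kind of layout I have in mind. It is only an illustration: the table names (demo_user, demo_token), the column names, and the helper functions are all hypothetical, not what is in sql/bayes_mysql.sql, and it uses Python's built-in sqlite3 module as a stand-in so it can be run without a MySQL server. It assumes one integer id per user, and says nothing about how MySQL itself would perform.

```python
import sqlite3


def build_demo():
    """Create a hypothetical schema keyed by an integer user id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Hypothetical: the username string is stored once per user.
        CREATE TABLE demo_user (
            id       INTEGER PRIMARY KEY,
            username VARCHAR(100) NOT NULL UNIQUE
        );
        -- Hypothetical: token rows carry only the integer id.
        CREATE TABLE demo_token (
            user_id    INTEGER NOT NULL REFERENCES demo_user(id),
            token      VARCHAR(200) NOT NULL,
            spam_count INTEGER NOT NULL DEFAULT 0,
            ham_count  INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (user_id, token)
        );
    """)
    conn.execute("INSERT INTO demo_user (id, username) VALUES (1, 'sidney')")
    conn.executemany(
        "INSERT INTO demo_token (user_id, token, spam_count, ham_count) "
        "VALUES (?, ?, ?, ?)",
        [(1, "offer", 12, 0), (1, "meeting", 0, 7)],
    )
    return conn


def lookup_tokens(conn, username, tokens):
    """Resolve the username to its id once, then fetch all tokens in one query."""
    row = conn.execute(
        "SELECT id FROM demo_user WHERE username = ?", (username,)
    ).fetchone()
    if row is None or not tokens:
        return {}
    marks = ",".join("?" for _ in tokens)
    rows = conn.execute(
        "SELECT token, spam_count, ham_count FROM demo_token "
        f"WHERE user_id = ? AND token IN ({marks})",
        (row[0], *tokens),
    ).fetchall()
    return {token: (spam, ham) for token, spam, ham in rows}
```

With this layout the username is looked up once per message, and every token query after that uses the short integer key. Whether that actually buys anything in MySQL is exactly what I am asking.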

In the tables that do have a username field, that field is declared as either the key or the prefix of the key. With MySQL, is a selection based on the username just as fast as splitting the data for each user into a separate location, whether that is a separate table or a separate database per user? Does MySQL optimize the storage by not storing the actual key with the record? I guess I'm asking whether there is some MySQL optimization that isn't apparent to me that makes sense of having the username in every record.

When SpamAssassin queries the database for each token in a message, all of the queries are for the same user. Does MySQL automatically optimize things so that the records for that user, or at least the index entries for those records, get cached on the first query and are then read from memory on the subsequent queries? If not, shouldn't the database be structured so as to keep each user's data together?

 -- sidney


