Mark wrote:
-----Original Message-----
From: Mark [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 15 November 2006 18:15
To: 'users@spamassassin.apache.org'
Subject: RE: Bayes column 'token'
Well, bayes_mysql.sql does not specify a collation; so, like
you said, the collation will be your MySQL server's default. And
string comparisons in MySQL are case-insensitive under the default
collation. It might indeed be a good idea to convert to "latin1_bin"
or some such.
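A quick way to see the difference, as a sketch only: the statements below assume a MySQL server with the latin1 character set available, and they assume "latin1_swedish_ci" is the server default collation (check your own server; it may differ).

```sql
-- Show the collation actually in effect for each column of bayes_token.
SHOW FULL COLUMNS FROM bayes_token;

-- Case-insensitive collation: 'a' and 'A' compare as equal (result 1).
SELECT _latin1'a' COLLATE latin1_swedish_ci = _latin1'A' AS same_ci;

-- Binary collation: 'a' and 'A' compare as different (result 0).
SELECT _latin1'a' COLLATE latin1_bin = _latin1'A' AS same_bin;
```

With a case-insensitive collation on `token`, two tokens that differ only in letter case would count as the same key.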
Will there be any problem if I convert the current data to the new
collation?
I see no indication (or reason) in the code that tokens are
to be handled in a case-insensitive manner. The opposite, rather.
So, I'm inclined to say that "latin1_bin" collation is better.
I don't wanna be responsible for messing up your database, though. :)
So I will test this a bit on my VMware box.
Did the testing; and it works very smoothly with latin1_bin.
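For reference, a conversion of an existing table might look roughly like the sketch below. It assumes the `token` column already uses the latin1 character set and still has the definition from bayes_mysql.sql; it is not taken from the SpamAssassin distribution, so try it on a copy first.

```sql
-- Take a backup first (for example with mysqldump) before altering the table.

-- Change only the collation of the token column; the column type and
-- default are repeated from bayes_mysql.sql so they stay as they were.
ALTER TABLE bayes_token
  MODIFY token char(5) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL default '';

-- Confirm the new collation took effect.
SHOW FULL COLUMNS FROM bayes_token LIKE 'token';
```

Because a binary collation is stricter than a case-insensitive one, rows that were distinct before the change remain distinct afterwards.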
A table with PRIMARY on `id` and `token` should not have an INDEX
on `id` and `token` added as well.
I don't understand what you mean.
The couple (id, token) is PRIMARY, not INDEX...
Where exactly is the problem?
PRIMARY, like UNIQUE, always implies INDEX, too. So, adding
an extra INDEX for `id` and `token` basically gives you a double
INDEX for them.
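To see which indexes a table really has, including the one that comes with the primary key, something like this should do (a sketch; it only assumes the table is named `bayes_token` as in bayes_mysql.sql):

```sql
-- List every index on the table. The primary key appears with
-- Key_name = 'PRIMARY', one row per column (id, then token).
SHOW INDEX FROM bayes_token;

-- Show the full table definition, including all key clauses.
SHOW CREATE TABLE bayes_token;
```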
There's a double INDEX for `atime` too. So, I'd say, in
bayes_mysql.sql, replace this:
CREATE TABLE bayes_token (
  id int(11) NOT NULL default '0',
  token char(5) NOT NULL default '',
  spam_count int(11) NOT NULL default '0',
  ham_count int(11) NOT NULL default '0',
  atime int(11) NOT NULL default '0',
  PRIMARY KEY (id, token),
  INDEX bayes_token_idx1 (token),
  INDEX bayes_token_idx2 (id, atime)
) TYPE=MyISAM;
With:
CREATE TABLE bayes_token (
  id int(11) NOT NULL default '0',
  token char(5) COLLATE latin1_bin NOT NULL default '',
  spam_count int(11) NOT NULL default '0',
  ham_count int(11) NOT NULL default '0',
  atime int(11) NOT NULL default '0',
  PRIMARY KEY (id, token),
  INDEX bayes_token_idx1 (atime)
) TYPE=MyISAM;
Those are multi-column indexes, not duplicates.
INDEX bayes_token_idx1 (id, atime)
is NOT the same as:
INDEX bayes_token_idx1 (id)
INDEX bayes_token_idx2 (atime)
Unless you've verified that the SQL used by the Bayes modules doesn't need these indexes, you
probably shouldn't change these.
(sorry I didn't notice this earlier in the thread)
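One way to check is to run EXPLAIN on the kinds of queries involved. The two queries below are hypothetical examples, not copied from the Bayes modules, and the literal values are made up; they only illustrate that a multi-column index is used from its leftmost column onward.

```sql
-- Hypothetical query: filters on id and a range of atime.
-- A composite index on (id, atime) can serve both conditions.
EXPLAIN SELECT token FROM bayes_token WHERE id = 1 AND atime < 1163600000;

-- Hypothetical query: looks up a token without naming an id.
-- The primary key (id, token) starts with id, so it is not suited to an
-- efficient lookup on token alone; a separate index on (token) is.
EXPLAIN SELECT id FROM bayes_token WHERE token = 'abcde';
```

The `key` column of the EXPLAIN output shows which index MySQL chose; the choice can depend on the table contents, so it is worth running this against real data.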